4: Utilizing Best Practices for Preparing and Analyzing Data
4.6 Identify required data structures and functions
4: Utilizing Best Practices for Preparing and Analyzing Data
4.6 Identify required data structures and functions - Video Tutorials & Practice Problems
Video duration:
10m
Play a video:
<v Instructor>With an appropriate model design in place,</v> we need to now analyze the layouts of a data preparation sheet and determine an appropriate hierarchy for our naming convention, identify the required inputs and any constraints required for data entry, leverage your Excel skills to select appropriate data structures for accessing and processing our data, and also determine how our formulas may be copied, moved, or filled within our model so we can include appropriate cell reference types. Now let's review these items in detail. The content on our prep budget sheet can be broken down into multiple columnar groups. These include; categories, actuals, budget, forecast, budget radiation, and finally, calculated budget. We can further break down these logical groups into another hierarchical level referenced by our level two column headers. For example, budget year for calculated budget, base metric, base value, percent change, and dollar change for budget variation, current year for forecast and so on. Similarly, the content can also be divided into horizontal groups for revenues and operating expenses. So with this structure in mind, we can define our naming convention using the falling acronyms for our level one column headers and names for our level two column headers. We can also use the rev and apex codes for our horizontal groups. So when naming specific items within our content, we can use the following syntax, the sheet acronym PB followed by an underscore, then the acronym for our level one column header, which is A in this case for actuals, followed by another underscore then the data structure prefix, like range or spill, followed by our level two column header, which in this case is two years ago. Finally, we can add the rev or apex suffix to the name, in order to identify the specific horizontal group for this object. For consistency, we can easily use this name for similar items within another horizontal group. For that, we just need to change the suffix for the new name as relevant to the new horizontal group. For cases where the defined name spans across multiple columnar groups, for example, the totals rows, we can skip the columnar groups acronym within the name and continue with the rest of the syntax. These names can still be copied across other horizontal groups by just changing the suffix or when required, by changing the data structure prefix in addition to the suffix for our name. Please note that within this naming syntax we can swap the locations for our level one column header acronym and the suffix for the relevant horizontal group. However, it is usually better to have the higher group count within the hierarchical section of her names as opposed to the suffix. This helps to have a better organization for our names especially when reviewing names within the name manager list or, using the auto-complete list when typing formulas. Next, let's look at the required inputs for the sheet and any data entry constraints that we might need to include. For the calculations on this specific sheet we'll need some input variables as defined on the input sheet. These include; the analysis date, current year, and the budget year. Users will also be able to enter the percent and dollar variances within columns M and N which we can pre format appropriately. As for the base metric dropdown we have used Excel data validation to limit the responses and generate appropriate error messages in case of data entry errors. Now let's determine the appropriate data structures for this layout, which might identify additional inputs or define names that we will need for this sheet. To start with, let's look at the revenue and operating expense categories. With the latest versions of Excel 365, we have a couple of options here. We can either use a predefined manually enter list of categories, or we can use dynamic array formulas to automatically generate and update the relevant lists based on the source data. Since Emily's company maintains the latest copy of Excel 365, we will use dynamic array formulas for this model. Now let's mark this data structure within our model. Now since we have used a dynamic array formula for our categories on the left, we can author the forum less for other calculated values in the same row by referencing the spilled ranges for our categories. By doing so, we will effectively write dynamic array formulas for all calculations in this row. This will also maintain the same vertical sequence for the categories and their corresponding calculated values. So we can now mark these spilled ranges on our layout. Similarly, we can identify the spill ranges for our operating expense data. In addition to these bill ranges we will probably need to define additional range names for the manually enter data within our rows. This would include; the base metric, percent change, and dollar change fields on this sheet. Similar to the calculated values, we will need separate ranges when they have these columns for the revenue and operating expense categories. So with that, we have now identified most of our data structures for the sheet. Next, let's look at some expected functions and the copy, move, and fill patterns for our formulas. Since I've already reviewed our source data, I know that we will need to use the unique and filter functions for our category arrays on the left. We may also want to combine these functions with the sort function to list our categories in an alphabetical order. Next, our actuals and budget calculations will most likely use the sumifs function as we need to aggregate the source data while filtering with the category spill range on the left and the corresponding year at the top. It is important to note that in this case, we will need to reference the spill range on the left using absolute cell references or define names so that it doesn't change as formulas are copied down or across. Additionally, the year at the top will need to be referenced using a combination of absolute and relative cell references so that the column reference changes when the formulas are copied across, but the row reference does not change when the same formulas are copied down. Now let's record an expected function for the actuals and budget calculations. Moving further to the right, we will need some date functions for our forecast column as we need to prorate our actuals for the number of days left in the year beyond our analysis date. The base value calculation we'll probably need to use the switch function as we qualify the selected base metric before returning an appropriate result. And finally, our calculated budget formula, we'll use the logical if function to determine if the user has entered a dollar change value to then appropriately calculate the resulting budget value. Now once we've authored all our calculations we will probably define names for all spill ranges so they can be easily referenced for further analytics. We should also add define names for all spill ranges created for the calculated values within each columnar groups of our layout. This will ensure that our references do not get updated as formulas are copied across or down and that the formulas are easy to author and support in the future. So with that, we now have a good understanding of our model design, data structures, and some key functions we want to use. Now let's use this information to start populating our model with the desired data.