2: Importing and transforming data using Power Query Editor
2.14 Organize and manage queries
2: Importing and transforming data using Power Query Editor
2.14 Organize and manage queries - Video Tutorials & Practice Problems
Video duration:
15m
Play a video:
<v Instructor>In this sub-lesson,</v> which is the final sub-lesson in our discussion of the Power Query Editor, we will first talk about query best practices. We'll discuss disabling query loads from making it into the actual data model itself. We'll talk about creating and managing groups so you can better organize your queries. We'll discuss managing query properties. Then we'll take a look at applying step properties. We'll look at rearranging applied steps, which we've already taken a peek at. And we'll also view our query dependencies. So first, let's take a look at query best practices. So a few things to remember that we had covered through some of the sub-lessons here were to number one, bring only the data that you need into Power BI. So a couple techniques that you can use to do that are to first remove the rows that are not needed. So if you've got rows of data that are not needed for analytics, don't bring them in. Secondly, you can go through and remove columns. So if there's columns that you have at data source that you do not need, once again, do not bring them into Power BI unless you actually need them. Next best practice is to go through and choose appropriate names for your columns. And lastly, what we wanna do is, on our columns, is go through and ensure that we choose appropriate data types on a column-by-column basis. These four principles will take you a long ways in creating the most efficient and effective queries that you possibly can inside the Power Query Editor. Okay, so now let's go ahead and open up our Power BI desktop file that will support this particular sub-lesson. So we can see our GitHub files here, and I'm gonna open up the file here that is sub-lesson 14. So let's just go ahead and open that desktop file up, and we will wait a moment. Okay so we've got our Power BI desktop open. Once again we're gonna launch the Power Query Editor by using the transform data command, and we are brought into the Power Query Editor. So the first thing that we're gonna do in this particular sub-lesson's demonstration, is go and organize our queries here. So right now, we can just see there's a group here called Other Queries, and everything right now is just folded in underneath. What we're gonna do here is create several different groups for us to put our queries in. So the way that we're going to do this is we're going to right-click in the empty space down here and say New Group. So I'm gonna go ahead and add a couple different groups in this. I'm gonna create one called Facts. I will then go and create another group here called Dimensions, like so. And then I'll go and create one last group called Staging, like so. Okay, so now I have three different groups that I can use to organize my queries. So we have not yet really discussed the term facts or dimensions, what they mean, we're gonna talk about them in lesson number three, but back here in the Power Query side, we're gonna have to start arranging our things using the terminology that we'll be using in dimensional modeling on a go-forward basis here in Power BI. So what we're gonna do here is any of the queries that are used to average average, sum, min-max or count, those are queries that we do aggregations on, and those are called facts. So the only real fact query that we have here is this one called Sales. So I'm gonna click on Sales, and I'm just gonna drag it up and drop it into that fact group right there. The next group I have here is what are called dimensions. So these are the ways that we're gonna filter and aggregate and label reports that we go ahead and build in Power BI Desktop based on the fact tables. So the dimensions that we have in our model are Order Date, so I'm just gonna put that up here. Product, that goes in here as well. Sales Territories, that goes there as well. And that should do it for our dimensions. The other queries we have here are what are referred to as staging queries. So they only come as far as the Power Query Editor, and they're used to help the transformation process out. That's it, so they're not going to work their way into the actual data model itself. So we're just gonna take Country Population Data and put it in Staging as well, because it was just a sample to show us how to pull web data down. The United States Sales, we're gonna move that into Staging as well, as we will with Product Level and Other Countries Sales. So now I have all of my queries organized into different groups. So what I do here is just quickly click Close and Apply, and then we're gonna go back to the Power BI Desktop and show you what things look like from the desktop perspective. So let us go ahead and let things load up. And then what we're gonna do here is concentrate on the Fields pane on the right-hand side here. So let's go ahead and take a look at this. We can see that we have a number of tables being delivered from Power Query, and some of these tables, we don't actually need. So Country Population Data, we don't want for doing our further analytics. Other Countries Sales, we don't want it, because that data is actually inside of Sales. Product we want to keep. Product Levels we can get rid of, because the two columns we cared about inside Product Level are inside the Product table. The Sales query we want to keep, or the Sales table, I should say, we want to keep. Sales Territory, we want to keep, and United States Sales, once again, we can get rid of this table, because its data is inside of this table here called Sales. So let's show you how to go ahead and keep these tables from being delivered into the actual data model itself. So I'm gonna click on the Transform Data tab here, and we're gonna get back into the Power Query Editor. Now, these four queries here that I put in the Staging area, we actually don't want these being delivered back or over into the actual data model itself. So I'm gonna do is right-click on each one of these things and go down to this Enable Load checkbox. It's gonna give me a warning that disabling this load from the data model may break reports or visualizations. Good news is we haven't actually created any visualizations yet, so we're not worried about breaking anything. So I'm gonna go ahead and click Continue. And then I'm gonna go ahead and do that for these other three queries. Okay, so I repeated the same steps that I did on Country Population Data for the other three queries. Now we can see that all four of these are italicized, which means they have been disabled from being loaded over onto our data model. So what we're gonna show you here now is I'm gonna click Close and Apply here again and show you what happened to our data model. So let's go Close and Apply, and let Power Query go through and do the loading of the data into the model. And what we're gonna see here now is we're left with four tables and four tables only. And these are the four tables that we care about from an analytic perspective as we go forward in this particular class. Okay, so I'm gonna move back now into the Power Query Editor to show you what the data model looks like here now. And there we go. So we've actually got our queries now grouped into appropriate groupings, and we've turned off some of the queries that we don't want to actually get delivered to the data model, these four queries in particular. So this is a fairly small model. We're not bringing lots of data in here, but we can already see how this is starting to grow on us a little bit, and getting into some good organizational habits early on will certainly pay dividends as our models start to grow. Okay, so next, let's focus in on some other things we can do from a housekeeping perspective to our queries. So I'm gonna focus in on the Product query this time. And now what I'm gonna do is right-click on Product queries, and go down to the bottom here to Properties. Now the Properties dialog will come up and here's where I could rename the query if I wanted to, and actually add a description. So I'm gonna add a description in here for this query, just so that we have a little bit of metadata around just in case somebody else comes in to support this file, or I want to come in at some later date and see what exactly happened on this, and maybe where did I get this file from? So I just dropped a comment in here saying this file comes from Bob in marketing and it will be delivered monthly. So now, if somebody comes into the Power Query Editor and wants to understand where the Product query is coming from, at least we have some commentary in here. Now, is this verbose enough? Maybe, maybe not for this example here, but I just want to show you the ability to actually add some descriptions in. And if I wanted to, I could actually uncheck this checkbox right here if I didn't want the Product query to get delivered to the data model. So I have the ability to enable and disable that here in the actual Query Properties. I don't want to do that. I'm just gonna go ahead and click OK, 'cause I've done everything want to do to this. So you could go through each one of the queries here and put a description around it and build that documentation up directly inside of the Power Query Editor. In my mind, that's where the documentation really belongs. There's certainly nothing wrong with outside or external documentation. Maybe you have some Vizio diagrams, some Word documents that support how the solution was all pulled together, but the nuts and bolts of how all this work really belongs inside or Power Query, because this is where people are gonna go first to start dissecting things, and if all the information is there for where things come from, then you're in a much better spot from a supportability standpoint. All right, so next we're going to draw our attention over here to the Applied Steps, which are part of the Query Settings. And what we're gonna do here is we can do something similar in that we can actually rename these applied steps and give them a description. So what I'm gonna do here is I'm just gonna focus in on this applied step that was called Cleaned Text. And if you recall, this enabled us to get rid of some of the special characters and character term length feeds that were sitting in our data. So what I'm gonna do here is I'm just gonna right-click on Cleaned Text and go down to Properties, and that will go ahead and bring up the Properties dialog box, and here's where I'm gonna rename the step so it makes a little bit more sense and actually add a description into things as well. Okay, so I had done is I renamed that step from Cleaned Text to Removed Non-printable Characters and added a description here as well that says data source was found to have non-printable characters that we needed to remove. So I've added a little bit more description around this particular step here. So I'm gonna go ahead and click OK. And give it a moment, and then it's gonna make my Applied Steps pane a little bit bigger here, the Query Settings pane. And I can see now that the step has been renamed to Removed Non-printable Characters, and if I hover my mouse over top of it, I'm gonna see the commentary that I put in there as well. So each one of these steps can be documented individually. And in order to build an application that's truly maintainable over the course of time in my mind, it's a really good practice to go through and rename these steps as they're created and put some documentation in where appropriate to help support your application. Okay, and lastly, what we can do here is if we wanted to, we could move some of our applied steps around. So this is something you have to tread cautiously around, because you could break the dependencies that are set up in this applied step. So if I wanted to, I could move this Removed Columns from here up to the previous step. That looks like it went and broke things, so that's not necessarily a good move, so now I'm going to go ahead and take that and move it back down here. So I have the ability to move things around. I just need to be careful about how I'd move things around, because you have to keep in mind, you're setting up a dependency chain here. So what we can also do is we can actually go through and remove certain steps. So we've done this in the past here. I'm just gonna click on this Removed Columns, because the two columns we got rid of were the Length Before and Length After, so I'm just gonna go ahead and click that, and I'll go through and actually repeat the step, but I wanted to show you that you have the ability to actually go ahead and remove steps from the applied steps process. So here, I'm gonna go ahead and just say remove those columns. And there's my applied step added back in, once again reminding you of the fact that there is no Undo button inside of the Power Query Editor, so you have to be careful about how you go ahead and do your work here. So one other thing I wanted to show you as well is you can see that the Product query has a lot of applied steps here. So if you want to see the culmination of all the M code that was developed, we showed you how you can see it as we go through each applied step in the formula bar here, but what if we want to see what all of it looks like? What I can do is go to the Home tab, and under the Query Group here, there's a command called Advanced Editor. If I click on that, that will bring me to all the M code that was generated as we went through and built this Product query up and did the transformations. So you can go in here and play around with this M code if you see fit. Not something we really advise on too early in the process, if you're just getting used to the Power Query Editor. We tend to recommend you stay away a little bit from the M query language until you get comfortable with Power Query itself. But as you progress in your Power BI journey, you will eventually come to a point where understanding the M language is certainly handy to- Handy as you do more work with the Power Query Editor itself. Okay, and lastly what we want to show here is, in the View tab, there's this group over here called Dependencies, and the command called Query Dependencies. If I click on this, I can see all of the data sources that I have in my file and how they were all brought together. So let's just suppose, let us go in in a little deeper here to zoom in here. If I want to take a look at this Sales query right here, I can see how it was all built up. I can see that some of the Sales query came from the United States sales, which has had its load disabled, and its source was this file right here called unitedstate.xls. So I can see the dependencies here. I can go over here and see the Other Countries Sales query came from a folder of data. So I can see how everything came together to build out sales, which is quite nice. I can also, if I want take a look at the Product query, so keeping in mind the Product query was built from the Product file and Product level, so I can see those two dependencies there. And I can also see the commentary that we added into the Product query as well to remind me this file came from Bob in marketing, and that we're gonna deliver it on a monthly basis. So the one nice thing here about, or one of the many nice things about query dependencies here is that it lets you see visually how everything has been pulled together inside of the Power Query Editor. I'm just gonna go ahead and close that. All right, so that was just a brief tour through some of the things that allow you to do a little bit of organization inside of the Power Query in terms of documentation, putting your queries into proper groups, naming your applied steps, and then pulling that all together and showing the query dependencies. All of these things here go a long ways towards creating an application that is a lot more maintainable over the course of time.