2: Importing and transforming data using Power Query Editor
2.9 Manage columns
2: Importing and transforming data using Power Query Editor
2.9 Manage columns - Video Tutorials & Practice Problems
Video duration:
5m
Play a video:
<v Instructor>In this sub lesson, we're gonna explore</v> how to find columns in your queries, because often the number of columns in a query may be quite large and can make it quite difficult to find columns, and we're gonna explore techniques to remove columns from our queries so that we're delivering the right number of columns over to our final datasets in the actual data model. So what this looks like, once again, we have five columns in this query, and there are seven rows. But we're not really concerned about the rows of data this time, 'cause we, in a previous sub lesson, removed rows. This time we're concerned about getting rid of columns that we don't need. So let's just suppose that columns four and five contain information that has no analytical value. So what we can do is take advantage of the techniques inside of Power Query to get rid of those columns and pare the data set down so that we only bring the calls of information in that we actually need. So let's go ahead now and get our Power BI desktop file open that supports this particular sub lesson. Okay, so now, in the GitHub files, I'm going to go ahead and open up the file that corresponds to sub lesson number nine. So let's go ahead and double click on that and wait for the Power BI desktop to open up. Okay, so now that my Power BI desktop file's open, I'm gonna focus on getting myself into the Power Query Editor through the Transform Data command. And I want to go down to the other country's sales query here to get rid of some of the columns out of it. The columns that I want to get rid of are source.name. And if I scroll over just a little bit more here I want to get rid of due date, ship date and English promotion name. So there's a couple different ways to go about removing these columns. What I'm gonna do here is I'm just going to click on the control key in my keyboard and click on English promotion name, ship date, due date, move my scroll bar over to the right and source that name has already been selected. So what I'm gonna do here now is I'm going to go over to the manage column group of commands here, off the home tab and click the drop down box here, and save remove columns. So I go ahead and do that. The columns that I just selected are actually removed out of my dataset, which is ultimately a good thing but I want to talk about a better more durable way to actually manage this process. So what I'm gonna do here is I'm actually gonna remove this remove columns step right here. So I'm just going to click the little X right next to removed columns, to get us back to the state we were in previously. And that is something that you can do in your applied steps. You want to get rid of something, you can go ahead and just click the X, and that gets rid of a step. One of the thing to note about the Power Query Editor, which I haven't mentioned yet, is that there is no undo and redo buttons. So you need to be careful about the things that you're getting rid of. 'Cause there's no really easy way to go ahead and recover the steps that you dismissed. So let's go back into this example here and we still want to get rid of those four columns. Let's do it a different way here. So we can do is go up to the manage columns group, and under choose columns here, we can click this dropdown box right next to it. And when an opens up we can click on this option here called choose columns. And what this allows us to do is actually explicitly choose the columns that we want to keep in our dataset and remove the ones that we don't. So what I'm gonna do here is I'm gonna click on source.name, due date, ship date and English promotion names. So the ones that actually want to keep we're actually instead of doing it the other way, we're saying here's the ones that want to delete. Now I'm seeing here's the ones I want to keep, so I'll go ahead and click, Okay. And you'll notice in the applied steps, the step has a different name now, it says removed other columns. And the reason this is the preferable way of doing things is let's just suppose that our other country's file suddenly gets an additional column across all the files. So maybe there's a new column called ABC one, two, three and that's a column we actually don't want over in our data model. If we had used the first technique where we went and explicitly got rid of columns by choice we chose the columns we wanted to get rid of and said, remove columns. That column that gets added in the data set, would flow its way into our data model, and we may not notice it, if we use the technique which we just use here where we did the choose columns and select what we actually wanted to keep to the dataset, that would prevent that brand new column the gut added to our files, from being moved through the Power Query Editor into the actual data model itself. So it's actually a little safer way to actually go ahead and remove your columns. So the last thing that we want to show you here, is the way to actually go ahead and find columns. So none of these queries are particularly large but if you had a query that was pulling in data that maybe had 30, 40, 50, 60 columns or maybe more, it can often be hard to go and find the columns that you want to get to. So once again, under this managed columns grouping, and click choose columns and there's a box of that allows you to say, go to column. And in here we can actually use it to go and search out the column name we want to find. We can go and use the search bar here to go ahead and find it. We can resort the order based on name, as opposed to the natural order of that being stored in the file. So a couple different ways to actually get at the columns themselves (indistinct). This is really handy when you have a query that has an awful lot of columns, and you're trying to find something and you want to avoid scrolling right to left constantly to go and search and eyeball something out. Okay, so quite simply, that's it for this particular sub lesson. Now you've learned how to remove columns from information from our quarries, which is another best practice that we want to implement is only bringing the data in that we actually need to see. We've already seen how to do that through rows. Now we've done it through columns.