2.7 Remove columns - Video Tutorials & Practice Problems
Video duration:
7m
Play a video:
<v Illustrator>One of the first things we encourage</v> people to do when bringing data in from a source system is to only bring the data that you need. So that means removing columns and filtering rows. In this particular sub lesson, we are gonna look at removing columns of information. So there are a few different ways to go about performing this particular task. So let's begin. Okay. So the first thing we're gonna do is we're going to go in and choose the columns that we want to actually go and remove from our data source. So, your first temptation may be to do something along these lines. So let's say that we didn't want the due date and we didn't want the ship date and purchase order and a couple of the columns that are through here, which we'll go ahead and pick on mass. So if we wanna choose multiple columns to get rid of what we can do is highlight on the column. So click on the column names. So it turns green like this, and we can see it and then click the control key down and start choosing the columns that you no longer want to see as part of your data set. So due date, ship date, I'm gonna get rid of the purchase order number. I'm going to get rid of the account number, scrolling over a little bit more. I am going to go ahead and get rid of the store number. Then I get rid of person full name. Then I go ahead and get rid of this National ID. I'm gonna leave our product number. I'm gonna get rid of this order total and leave unit price and order quantity. So the only columns I really wanna see are the order quantity, unit price, the product number, sales territory, the order status and type which I have right here. And then my order date. So what I'm gonna do here is I'm just gonna go ahead and go to my home tab and go to the columns, manage columns group right here and click on this remove columns. And I'm just gonna say remove columns. And just like that, we'll see an additional step was added into my applied steps called remove columns. And now you can see that I only have the columns that I wanted to keep. So the ones that I did not highlight back on the previous step. So this may be your first temptation when wanting to going ahead and getting rid of columns. But this isn't necessarily the best way of doing things. 'Cause what you've done here, is you basically said, I want to get rid of these columns and you've gone through and declared the columns that you want to get rid of. But what happens now with this data source, if suddenly a new column is added back at source. And you actually don't want that in your dataset? Well, the problem is that new column will show up at source, come into our query now, and it will actually be included in this list because we didn't explicitly remove it. Now, we may be able to go back into Power Query and remove it at a later date, but we don't necessarily want that type of maintainability challenges built into our queries. So what we're gonna do here is I'm actually going to get rid of this step right here. So I'm gonna go to this remove columns that I had right here, and I'm just gonna delete this step. Go ahead and do that and now our data's back at the state it was when we first started this exercise. So now we'll show you the preferred way of actually doing the removal of columns. What we're gonna do is we're gonna tell the creator that the columns we want to keep and say remove the other columns. So, what we're gonna do here is we are going to first do it the way you maybe tempted to do it and then I'm gonna show you a more efficient way of doing it is I could go in here. If I know I wanna keep the order date and I'm going to hit the control key, and I'm gonna keep the order status type. I know that I also wanna keep the sales territory right here. I also want to keep the product number. I also want the unit price and order quantity. So I'm just clicking the control. And these are the columns that I wanna keep. So, with that, what I can do is I'm just gonna scroll to the left. I'm gonna go to the remove columns and I'm going to say, remove other columns. So what that's doing is saying, I wanna keep these columns and get rid of all the other ones. The nice thing about this methodology here is going back to the example we used at the beginning, is if a data source actually had a new column added to it and it worked its way into Power Query, this methodology is gonna keep it from actually flowing into your target. So, that's a much, much nicer way of doing things. And will save you on the maintenance. Now, using the control key and clicking your way through all the columns was not so bad, but let's show you an even better way of doing things. So once again, I'm gonna go ahead and get rid of that removed other column step. And this time I'm gonna go under managed columns and click choose columns. So I go to choose columns and I'll pull up a dialog box, which is a whole lot easier to navigate than shuffling back and forth between all of the columns in the data preview pane. And in this case, what your gonna do is I'm just gonna turn off all the columns and now only select the ones that I want. So now I want order date, order status and type. I want the sales territory, the unit price, the order quantity. Oh, and I forgot the product number. So I want that one as well. So I go ahead like that and now I've got a list that I can go through and choose ones I want. One added benefit of this particular methodology of doing things, if I have a really long list of columns and a source, and I've worked with a lot of spreadsheets that have a lot of data and a lot of columns in them, I can actually use the search functionality go and find those particular columns I want to take a look at. Also, over here on the right hand side, before we finish up, if I can click this and I can have this list sorted in natural order name, which means the order that the columns were presented to me in source. So that's basically we see in behind here in the preview pane, or I can actually go ahead and order this list by name. So if I knew what cool names I wanted to keep, I could put it in alphabet order and then sift by way through the list, much easier that way. This list was really small, so I was able to just go on the screen and choose the ones I want. But the salient point here is you have a lot of different ways that you can go in and find the columns you wanna keep. So I'm just gonna go ahead and click okay. And the net result of this is I have the exact same remove other columns step that I had back in the previous step. This time I went through and actually use that dialog box. If I wanted, I could go back. So let's just say I wanna go back and edit that list right next to this edit or remove other columns. I can click this little gear and it's gonna bring me right back to this choose column. So in this step I could go through and add and remove columns as I saw fit. Okay. So let's just go ahead and cancel that out. And, that is it. So just a few different ways to go through and remove the columns that you want. A couple of different techniques. I do like to show you the hard way of doing things and then how to progressively make things easier, because that's where I find most people do their learning when they're starting to work with Power Query. All right. So that brings us to the end of this sub lesson.