2: Importing and transforming data using Power Query Editor
2.5 Load a folder of data
2: Importing and transforming data using Power Query Editor
2.5 Load a folder of data - Video Tutorials & Practice Problems
Video duration:
9m
Play a video:
<v ->In this sub lesson,</v> we're gonna discuss the benefits of using a folder, we'll discuss the requirements in behind using a folder as a connection, and we're gonna demo loading a folder of data. So let's go ahead and get ourselves over onto our Power BI Desktop. Okay, so let's go ahead and get our sub lesson five file opened up here, and get the Power BI Desktop up and running. So what we're gonna do here, once again, is we're gonna make ourselves a connection into a folder. Okay, so my Power BI Desktop is now open and I'm gonna change things up just a little bit this time. So instead of getting us into the power creator first to actually make this connection, we're actually gonna go to the get data command, that is in the data group here. So I'm gonna click on get data, and I wanna make a connection into a folder, but I can see that is not part of the common data sources. So I'm just gonna click more, and wait for our get data dialog box to open up. And what I'm gonna do here is, it's about the, what the fifth connector down here, and it's the one called folder. So I connect on that, click the connect button, and what we're gonna get prompted here for is a folder location. So what I'm gonna do here is just click the browse button, and I'm just gonna go down to this PC, find my C drive, go down into the MSPressIntroToPowerBI here, click on data, cause I want to go in the data folder. I'm gonna click on sales, and then I wanna click on this folder here called other countries. But before I click okay here, I actually want to show you the contents of the folder that we're going to make a connection into first here. So what I'm gonna do here is just get my file explorer open, and within the MicrosoftPressIntroToPowerBI folder here, I'm gonna click on data, go into my sales folder, and then open up this folder here called other countries. In here we can see that we have five different files listed here, we've got a sales file for each one of the countries that we deal with as Adventure Works. And where the file folder data type connector is very useful is if we've got files that are the same, and we want to actually treat them as if they're one file. So if we are able to meet those particular criteria, we're able to load all five of these files and treat them as if they're one file, and we'll go through some of the rules momentarily, but these are the files that are contained within that folder. So let's get ourselves back over to the power BI desktop. I'm going to go ahead and click okay on other countries, because that's what I want to connect into. I'll click okay, and PowerBI is going to go into the folder, and indeed find the five files that we just mentioned a moment ago, and what we're going to do here next, is down at the very bottom of the screen. We're going to click on this combine, and there's a little arrow right next to thing. We're going to say combine and transform data. So what that's gonna allow us to do is combine all those files together, as if they're one and then get ready to perform some subsequent transformations. So the next step is that we get brought to a combine files data dialog box. And what this allows us to do is we need to make a few choices here. So what we want to do here first is we're going to choose a sample file. And by default, it's the first file in the directory. So if I go ahead and open that, it's letting us know that Australia sales file is the first one in the directory, and it is the one that is going to be treated as the template file. So what that means is power BI will take a look at that first file and say all other files must have the exact same structure as that first file or they will not be loaded, or our load will fail. We'll discuss what that means here momentarily. So I already know that the five files in that directory are exactly the same. So I'm going to leave it as the first file. The next thing we're going to do here, is go down and choose what do we actually want to bring in? So each one of the five files has to have an excel table called TBL sales in em, cause that's what I'm going to connect into. So in the first file, which is the Australia file, I'm going to connect into a table called TBL sales, which has these columns in it. So as long as all the other files meet this criteria, they will be loaded or our load will fail. So let's discuss what that means. So moving down towards the bottom of the screen here, we can see that there is a checkbox that says skip files with errors. So what this is doing here is seeing, if the files are different from that first file. So any of the files are different, I can click that box and say, skip any of the files that have errors, and just allow our data load process to continue on, and silently skip over that file that had an error in it. Alternatively, I can uncheck that. And if one of the files in that folder does not meet the file format criteria, then our entire load will fail. So these are pretty extreme situations that are either it fails or, sorry, I shouldn't say fails. It either succeeds silently and misses data, that was, may need to be loaded up, or it fails the load completely. So it's up to you to decide how do you want to go about dealing with this. For now, I'm going to leave this unchecked, because I already know that all my files actually meet that criteria. So I'm going to click okay, and give power query editor a moment here, because it's going to go and take a look at the five files in that particular folder. And then it's going to go ahead and bring those all into the power creator itself, and I can see here, the power creator is open, but it actually didn't pop up. So I'm just going to force the pop-up of the power creator screen. And we can see here now, that we now have a query called other countries. So the name of the query was borrowed from the folder name. So the first thing we're going to do here is we're going to go into the properties dialog, and we're going to change this to sales, because this is other country sales. And once again, we want to be clear as to what this query is actually achieving. Okay. So I've renamed it here, and I can see over in my queries pane, the name didn't take. So I forgot to hit enter, so I'll go ahead and enter. And now we can see the query name over here is also called other country sales. So just a couple things to note here, in the data preview pane, you're going to see a column here, that is called source.name. So if I click this little dropdown box in this particular column here, we're going to see that, we can see all the different excel files that are being brought into the mix. The only challenge here is we only see the Australia sales file at this point. But if we look a little further down here in this dialog box, it says the list may be incomplete. And that is because if you recall earlier on, we mentioned that power query does it's profiling based on the first thousand rows, and the Australia sales Excel spreadsheet has more than a thousand rows. So I can click on this load more, to the right hand side here, and it's actually going to load the rest of the data in. So now I can see that I've got an Australia sales excel file, a Canada sales excel file, France, Germany, United Kingdom. So I can see that all the different files are being brought in. Now, this column right here is useful in that it actually lets us know which one of the files did this row of data come from. So it's useful when we're doing our data transformations, and data preparation. It's not really that useful a column when it comes to doing analytics. So in a subsequent sub lesson, we're going to actually hide this column, or get rid of it from our dataset. Okay, so we've got all of the files in that particular folder loaded up. And what we're going to do next here, is we'll just briefly explain the benefits of this. So if you had not been aware of the folder connector, you may have been tempted to go into the folder that contained the five files, and make individual query connection to those, and that might've worked to start with, but what happens now when additional countries start getting added into the mix? You would then have to go back into the power creator, and continuously make connections into existing files. And then if a file was ever removed from that directory, you'd have to go in again, and actually make some changes to power creator, to exclude that particular file, cause if it's suddenly gone, the power creator would fail, because it's looking for a file that is no longer there. So the nice thing about this folder connector, is that you can move files in and out at your leisure, and as long as they follow that file format, power query will just ingest those files, and treat them as all if they're one, and we find the most common use case for this is with finance departments. Because often what we see is somebody may come along and say, I would like access to sales information or GL information, and I'd like to make a connection into the ERP directly. And finance a lot of times for various reasons won't let you do that, but they may say to you, okay I'll give you that information. What we're going to do is I'm going to run a report on a monthly basis, and I'll take that report and actually dump the contents of it to a folder, and the file will be an excel spreadsheet. And they'll do that in let's just say, January of 2021, and February of 2021, and March of 21, et cetera, and so forth. So every month you can expect a brand new file with a different name to show up in that folder. So the nice thing about that is that the delivery of the files is in a common format. So everything typically comes in the same, and then you can agree on a common place to land that information, put it in a folder, and as long as it shows up on that periodic basis, power creator can just go ahead and pull that information in at its leisure. And it's very dynamic in nature. Okay. So that brings us to the end of this sub lesson, where we talked about loading a folder of data.