2: Importing and transforming data using Power Query Editor
2.4 Load data from files
2: Importing and transforming data using Power Query Editor
2.4 Load data from files - Video Tutorials & Practice Problems
Video duration:
14m
Play a video:
Video transcript
<v ->In this sub-lesson we're gonna discuss file data sources.</v> Possible file locations where those files may exist. And then we're gonna demo loading the following. We're gonna do a couple of Excel spreadsheets and we'll do a text file. But first for possible file locations. Files locations may be local to your computer. They maybe in some network share in your on-premise sources. You may have information sitting in OneDrive for business. It may be in OneDrive personal, SharePoint on premise or perhaps SharePoint online. So a lot of different places where your files can be located and there's advantages and disadvantages to each. So some of the individual file data sources we can connect into are the Excel spreadsheet, tech/CSV which of the two we're gonna connect into here. XML files, JSON files, PDF files. In a subsequent sub-lesson will connect into a folder. And then we can also connect into SharePoint folders. So let's go ahead and quickly take a look at the architecture of Power BI so we can see where those files may be located. So as you may recall from a previous lesson, this is the Power BI high-level architecture. And above the dotted line are things that are considered to be in the cloud and below the dotted line in middle of the screen are things that are considered to be on premise. Now your files may exist up in the cloud. So they could be in OneDrive personal, OneDrive business or SharePoint online. Or your files may exist on premise. So they may be local to your machine or in some network share or on the on-premise version of SharePoint. So either way in Power BI we can connect to those different file data types and bring the information into Power BI to be further manipulated and transformed and brought into our data models. So let's go ahead and get ourselves into the Power BI desktop and make a connection into our file based sources. Okay, so we're gonna go to the files that we obtained from our GitHub repository. And I'm gonna open this one up called sales data Mart SL0204 start. So let's go ahead and get that file opened up. Okay, so our desktop file is now open up. And we can see that we are on a blank canvas here. Just wanna point out that on the right hand side of the screen we've already got the country population data that we obtained from our website. Plus then we preloaded an order date query and a United States sales query. Which are in the form of tables here in the data model now. So what we're gonna do next here is we're gonna go to the home tab under the queries group, click on transform data like so. And then get ourselves into the power query editor. Where we're gonna make connection into the next three files that we want to bring in as queries. Okay, so the first file that we're gonna bring in is one called product. So, so far we've got an order date file that has been preloaded. We have our United States sales file which was preloaded for us. Plus the one that mentioned that we did earlier on from our website. What we're gonna do here next is click on this new source, choose Excel. And then we're gonna go into our data directory here. So we'll go to our Ms Press, into our Power BI. Going into the data folder. We'll go into our dimensions folder and then find this file here called product. We will then go ahead and click open. And what's gonna happen next is we're gonna get brought to our navigator dialog box. So, what we can do here is power query editors proposing that there's two things that we can connect to into that Excel spreadsheet. So if I click on TBL product here which is option number one. We can see this what the data looks like. And this is in fact the data that we're gonna want. Or we can take a look at this product sheet information. We can see that it kind of looks like the data we want. But there's some extraneous information at the very top of it here. So, what we're gonna do here, is I wanna just quickly jump into Excel to show you the difference between these two connections that we can make into an Excel spreadsheet. Cause one is an Excel table that looks like this. And one is the sheet of data called product sheet. So let's go ahead and get that Excel spreadsheet open really quickly here. Okay, so I've opened the actual so spreadsheet that we had as our source. And just want to point out a couple of things. So in the bottom left hand side of the screen, we can see that we've got a sheet called a product sheet down in the bottom here. And this was one of the things that the navigator picked up on. Also, if I put my cursor inside of this a table of data right here. I can see at the very top of the screen, I've got a tab called table design. And if I just look a little over to the left side inside here, I can see the table is called TBL product. So this is the other thing that power query was asking me if I wanted to connect into. The difference between the two here is that we can see if we connected in the sheet of data. If you're a callback in the navigator. This information at the very top which is above the table, was being included in. And we'd have to get rid of that in power query if we didn't want that information as part of our dataset. Which we don't. Alternatively, if we connect in the table which is this information here plus a little bit more information that's further down that I haven't highlighted, power query will connect into the table of data only. And this is a preferable way to connect into data in an Excel spreadsheet, because this is data that somebody has gone through and put a little bit of care and attention into formatting instruction the data in a certain way that's gonna make it a little bit easier to ingest into power query editor. Now, we don't always have this luxury. Oftentimes we'll have to connect into a sheet of data and do a bunch of cleanup with power query editor as we see fit. And sometimes we just have to take the data that we're provided. But ideal, if you have things in an Excel table. It just makes things a little bit easier on the ingestion side of things. Although it's not mandatory. So like I said, I just wanted to point out the contents of this Excel spreadsheet and what the navigators actually take a look at. So let's get ourselves back over to the navigate. Okay, so this is where we left off in the power query editor. And once again, we can see the product sheet which we saw on the Excel spreadsheet which is including extraneous information at the top and TBL product is only pointing in that table of data. So this is what I wanna connect into. So I'm gonna connect that little check box there and click okay. And now power query editor is going to go ahead and bring that information in. The last thing I want to do at this particular query, is I'm actually gonna go over and rename it. We can see that a borrowed the name TBL product from the actual Excel table. I'm gonna get rid of that TBL in front of things and call this product. And we'd like to do this because we like to make our query names descriptive for ourselves. If we were to come in and take a look at this, you know, days, weeks or months later. Or if we actually pass this off to somebody else, we want our query names to be descriptive. We're also going to see later on when we get into our data model and into the Power BI service. That there are a lot of very good benefits of good naming conventions at the query level which eventually translates into tables of data and the column level. Okay, so these are our individual products that we have in our product catalog. So what we're gonna do next here, is we are going to bring a text file of data. I'm gonna click on new source, go down a little bit lower here. And we're gonna see text/CSV. So if I click on that, I can see in my dimensions folder I've got a file called product level. So I'm just gonna go ahead and open that up. And the power query editor is gonna open that up and we can see the contents of that file. So we can see that it's a tab delimited file. The data type detection that has done so far is based on the first 200 rows. I can play around with that if I want. But I'm gonna leave the default. And we can see that there's three columns of information here. We've got a product subcategory key, a subcategory and category. The reason that we wanna load this file up, is that we want a way to aggregate and build hierarchies around our product information. So we'd like to roll individual products into subcategories, such as mountain bikes and road bikes. Or on the component side, handlebars, bottom brackets. And then at the top level, we wanna roll things into product categories such as bikes, components and clothing. So this is gonna enable us later on to build hierarchy. So I'm just gonna go ahead and click okay. And the power query editor will go ahead and ingest that file in and we can see that we've got the three columns we saw back in the preview. And everything's been loaded in. And just to walk you through the applied steps here again. Just to remind you of all this. Is we can click on the source here and it's gonna let us know the source file location. So it's on the C Drive where we brought our files from GitHub. Next thing that happened is, this first row right here which has our header information was promoted into the top. And then we'll here that this first column is a text data type. We'll discuss data types more in a later sub-lesson. Is then the power query editor is actually going through and making as best guess on the data types that exist in these columns based on the column profiling of the top 1000 rows. So these first three steps are handled for you automatically by the power query editor. But as a recommendation to go in and just check these things out and make sure at least the data types were set properly. But once again, we'll talk about that in a later sub-lesson. Okay, so the last file we're gonna connect into is a Excel spreadsheet for sales territory. So, we can see so far that we're gonna be able to analyze our sales data by dates and products. We also wanna take a look at doing information analysis by our sales territories. So what I'm gonna do this time as click new source, go to Excel. We'll find the Excel spreadsheet here called sales territory. We'll go ahead and open it up. We'll wait for the navigator dialog box to open. And we can see here there's only one thing that we can connect into in this Excel spreadsheet and that's sheet number one. So we're just gonna go ahead and click the checkbox next to it. And we can see a preview of the information that we wanna connect into. So I'll go ahead and click okay. And wait for the power query editor to finish its work. And now we can see that we have a new query called sheet one. And what we wanna do here is we're gonna rename that to sales territory. Okay, so we go ahead and do our rename here. We'll go ahead and hit enter. And now we're gonna see the query name is called sales territories. So, the last thing that we're gonna do here is just a quick, a couple of transformations here. And we can see that the very first row here says, sales territories. And this should really be our header information. So, what we're gonna do here is just do a light transform. And we're gonna go up to the transform tab at the very top here. And what we're gonna do is go to this command under the table group. And say, use first row is header. What that's gonna do is push that sales territory into our header information. So, the power query editor was not able to make that determination. But we know that this is the column header for that. So we added an additional applied steps to go ahead and do that. Okay, so those are the file-based sources that we're gonna connect to and to load. What we're gonna do next here is move into a folder of data to bring in the rest of our sales information. And that will provide us all the information we need from the extract point of view for our power query editor. But before we go, what we wanna do here is I wanna show you the mechanics of what is happening in the actual power query editor. So, I'm gonna go ahead and save my work. So I'm gonna go ahead and click save. And we're gonna once again get provided with that dialog box I mentioned earlier. What I'm gonna do here now, is I'm just gonna quickly click apply later. So instead of saying apply, I wanna say apply later here. So I'm just gonna go ahead and do that. What I'm gonna do next here, is I'm actually gonna close my power query editor. So I'm gonna go ahead and close it. It's gonna ask me if I wanna apply my changes. And I'm gonna say not now. Cause I wanna explain the mechanics of what's happening here. So, if we are now back over on our Power BI Desktop, we can see that we have three tables of information that have been loaded through to our data model. So we've got country with population data, order date and United States sales. So what has happened here is these three tables have gone through the extract process, the transform process and the load process over the data model. So let's just recap what that looks like from a diagrammatic perspective. So if you recall back to our Power BI end to end process flow diagram, we first made connections into our data sources. So in this sub-lesson, we made connections into three different files. We did that extract the E above the line there. We then move data into the power query editor and we did some very light transformations on that data. And if you recall back a moment ago, we had three of the queries loaded in as tables into our data model. And the ones that we just worked on in this particular sub-lesson here, we actually haven't loaded over into our data model. So, so far we've only done an E and a T step to those. So, what we're gonna do here now is I'm gonna just get us back into the power query editor and actually go through and perform that load step over to the data model. So let's just go ahead and do that. So, I'm back on my Power BI Desktop now. I wanna get myself back over into the power query editor. Cause I wanna actually bring those other three queries through into my data model. So I'm gonna click on transform data and it takes me back to my power query editor. And at this point in time the product, product level and sales territory which we just did here in the sub-lesson. I want to load those through to my data model. So at this time what I'm gonna do is I'm gonna click on this close and apply button in the upper left hand corner. And what that is doing here now, is it's gonna close the power query editor and apply all the query changes that have been made over into the data model. So now what you can see on the right-hand side of the screen under the fields pane is we actually now have those three queries loaded in as tables of information. So I've got product, product level and sales territories. So just recapping again on our Power BI end to end process flow diagram. We have now taken a series of queries from our data sources. We've extracted. We've done some transformations. Although we'll start doing more in later sub-lessons. And now we've loaded everything through to our data model. So we've basically walked through the entire ETL process. All right. So, that then brings us to the end of this particular sub-lesson.