6.3 Import a dataset - Video Tutorials & Practice Problems
Okay, so now let's look at the dataset we're going to be using today. I downloaded the code from here and made a couple of changes, if you want to see. I'm using the tips.csv dataset. There are a number of other free datasets available, and you can just download them here if you want to practice on other publicly available datasets. The one we're going to be using today is inside the challenges folder, the same location that the notebook is in, and it's called tips.csv. We'll open this up. CSV stands for comma-separated values. If you haven't seen this before, it's pretty much like Excel spreadsheet data, where each row is a different data point, a different item, and each column is a different attribute of that data. The first row of the data shows the headings.
A little bit about this dataset: it represents all of the tips that were collected at a chain restaurant in a mall in the U.S.A. It's supposedly a pretty popular chain restaurant; I'm not sure which one it is. It shows the tipping behavior of the patrons there over a two-month period in the 1990s. It's a little dated, so if we get some interesting results out of this, we also have to take into account what dataset we're working with and how much that dataset can actually inform current trends. So it's a little dated, but it's interesting. Here are some of the features. Smoking refers to whether they were in a smoking or nonsmoking section. I'm sure they didn't confirm the sex of the people, so it's probably more appropriate now to say gender presentation. And then there's the day of the week and the time of day; we'll see what those end up being.
The first thing we're going to do is import the dataset. If I want to say that's what we're doing, I can have some sort of heading, "Import dataset", and run that. I need to change the cell to Markdown, and then I'll import it.
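To make the CSV layout described above concrete, here is a minimal sketch using Python's standard csv module on a small inline sample. The column names and the three rows are assumptions made up for illustration; they are not read from the actual tips.csv file, and the transcript itself does not list the column names.

```python
import csv
import io

# Hypothetical sample in the shape described above: a header row, then one
# row per data point. Column names and values are illustrative assumptions.
SAMPLE_CSV = """total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
"""

# DictReader treats the first row as the headings and maps each later row
# to a dictionary keyed by those headings.
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
header = reader.fieldnames
rows = list(reader)

print(header)   # the column headings from the first row
print(rows[0])  # the first data point, one value per column
```

Note that the csv module leaves every value as a string; turning columns into numbers is one of the things pandas takes care of when it reads the file.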
So we're going to import pandas, and that's the library we're going to be using today. As I said in the earlier lesson, or sub-lesson, pandas is a Python library that is good for modeling data in something called DataFrames, and that's the same data structure that R uses to model data. So: import pandas. Then we can say tips = pandas.read_csv. I just pressed Tab to open up the suggestions, and read_csv is here. Then I'm going to pass in the name of the file that we want to read. I can just say "tips.csv" because the notebook and the file are located in the same directory. If the file were located in a different directory, I'd have to put the path to it. Maybe it's in a data folder; I could do that. Or if I had to go up into the parent folder, I could use these two dots to say go up one level, and from there you can find the file. But it's just in this folder, so we can do that.
Then let's just look at tips. With Python in Jupyter notebooks, the output will always be the last evaluated result, so I don't actually have to use the print function if it's the last thing being evaluated. Here I have a whole bunch of data: there are 244 rows and seven columns. This is taking up a lot of space in the file, though, so I'm just going to get the beginning of it with tips.head(). I can also run the cell using the keyboard shortcut Shift+Return, or Shift+Enter. So head will show me just the beginning, just the first few rows. I can also look at tail and see the last few rows, and if I put a value into head, I get just that number of rows.
So we have the dataset imported already. That's great. If we want to get a little more information from it, here are a few functions and properties you can use. tips.shape shows how many rows and how many columns there are. tips.dtypes shows the type of data in each column.
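The steps walked through above can be sketched as follows. So that the snippet runs without the course files, it reads from a small inline sample instead of tips.csv; the column names and the six rows are illustrative assumptions, so the shape printed here is not the 244 rows reported in the video.

```python
import io

import pandas

# Hypothetical stand-in for tips.csv (column names and rows are assumptions).
SAMPLE_CSV = """total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
23.68,3.31,Male,No,Sun,Dinner,2
24.59,3.61,Female,No,Sun,Dinner,4
25.29,4.71,Male,No,Sun,Dinner,4
"""

# In the notebook this would be pandas.read_csv("tips.csv"). For a file in
# another folder, pass a relative path such as "data/tips.csv" or
# "../tips.csv" (".." means go up one level).
tips = pandas.read_csv(io.StringIO(SAMPLE_CSV))

first_rows = tips.head()    # first five rows by default
first_two = tips.head(2)    # just the number of rows asked for
last_rows = tips.tail()     # last five rows by default

n_rows, n_cols = tips.shape  # shape is a property: (rows, columns)
column_types = tips.dtypes   # dtypes is a property: one type per column

print(first_two)
print(n_rows, n_cols)
print(column_types)
```

In a notebook cell the print calls are optional for the last line, since the last evaluated expression is displayed automatically; they are used here so the sketch also works as a plain script.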
And then tips.describe() will actually give us some statistics on all of the numerical columns. Next, we're going to look at visualizing this data to get a better sense of what data it contains.
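As a rough illustration of describe, the sketch below runs it on the same kind of inline sample. The column names and values are assumptions for illustration only, so the statistics will not match those of the real dataset.

```python
import io

import pandas

# Hypothetical stand-in for tips.csv (column names and rows are assumptions).
SAMPLE_CSV = """total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
23.68,3.31,Male,No,Sun,Dinner,2
24.59,3.61,Female,No,Sun,Dinner,4
25.29,4.71,Male,No,Sun,Dinner,4
"""

tips = pandas.read_csv(io.StringIO(SAMPLE_CSV))

# By default describe() summarizes only the numerical columns, reporting
# count, mean, std, min, the 25%/50%/75% quartiles, and max for each one.
summary = tips.describe()

print(summary)
print(summary.loc["mean", "tip"])  # pick out one statistic for one column
```

The text columns (sex, smoker, day, time in this sample) are left out of the default summary, which is why only three columns appear in the result.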