4.2 Read an Excel Spreadsheet into R - Video Tutorials & Practice Problems
Video duration:
4m
Play a video:
<v Voiceover>Up until recently, it was</v> difficult getting data out of Excel. But there are now new packages that make it easier. The first example we'll see is Read Excel by Hadley Wickham. As an example, we will use this file, Excel example dot slsx, which has three tabs. Has tomato, which has a mix of categorical and numeric variable; has wine, which is all numeric and ACS which is also a mix. Let's get started by reading in the tomato variable. This file's hosted on my website. And whereas we'll sfv files you could do re dot table with a URL, that won't work with Read Excel. Because it actually checks to see if the file exists. There is a patch out there that is gonna make it so you can read from a URL but it's not quite there yet. So we need to go download this file from my website. You could use your browser and go to jaredlander.com/data and download it manually. Or we could have R do it for us. So we say download dot file, we give it the URL, which is HTTP://www.jaredlander.com/data/ExcelExample.xlsx, that's capital E for both of them. The desk file where we are saving it to is just simply the same name and we'll put it in our working directory. And mode equals WB because excel files are binary files. It downloads the file as we could see in our get pane, we have a new file there. To read this we load the package Read Excel. So library, read, excel. We're gonna save this as tomato excel gets read underscore excel since this is Hadley Wickham, there will be lots of underscores. Then in quotes the path to the file, which is excel example dot xlsx. We tell it sheet equals one. We look at the head of that file. And we have our nice tomato data set. Nice, simple and easy. If we wanted to, we could've specified the sheet by name. So we will read in the wine tab, wine excel gets read underscore excel, again pass it the path to the file and sheet equals wine. We'll look at head of wine excel and there it is. It's not showing all the variables 'cause they don't fit on the screen but that's because it's saved as a tbl format. What's nice about read excel is that it works with both the modern excel format, the xlsx and the old excel format, the xls. That is incredibly great to have and it has no external dependencies, doesn't require java, doesn't require perl, doesn't require python; it's just an R package. There's another package which came out roughly the same time, maybe a little bit earlier, called open xlsx and while this only works with modern excel files, it lets you write to excel files which is a nice touch. Let's see how it reads in data. We load in the package. And we say tomato excel one gets read dot xlsx, pass it the file name and tell it which sheet. We can look at this and see we have the nice tomato data set. Now this package, however, doesn't handle missing strings. Sometimes when you get a data file, they'll use dots as missings or 99s. Read underscore excel gives you the option to account for that; read xlsx doesn't. So there's pros and cons; you'll probably want to use read excel for reading excel files and open xlsx for writing to excel files. Either way the situation for reading excel data into R has vastly improved in even just the past year. Getting excel data into R is now easier than ever before.