7.9 Use tidyr
Video duration: 2m
For years, the gold standard for reshaping data, whether moving it from wide to long or from long to wide, was the reshape2 package by Hadley Wickham. He has since come out with the tidyr package, which makes this a little easier. To look at it, we will load the tidyr package, and to take advantage of piping, we will also load the magrittr package.

To illustrate, we'll look at the airquality data, which is currently in wide format. In the reshape2 package, converting from wide to long was done with melt; in tidyr the function is called gather. We will write airGather <- (read aloud as "air gather gets"). The first argument to gather is the data, so we can simply pipe airquality into gather. The next argument is the name of the key column that will be generated in the new data set. We will call it metric, and we can type it without any quotes. The next argument is the name of the column that will be created in the new data set to hold the values. We will simply call it value. The remaining arguments are the columns you are gathering, which here means all the other columns: Ozone, Solar.R, Wind and Temp. You can specify those explicitly, or you can specify the columns you are not going to gather. That would be -c(Month, Day): wrapping the names in c() means we only have to write the minus sign once, and it excludes the two columns Month and Day. We run that and check the head, and we see our data is now in long format. We'll also check the tail, and we can see it works very nicely.

Going the opposite direction, from long to wide, was done in reshape2 with dcast. Now we use the function spread. So we will write airSpread <-, and again we'll pipe in the data, which is now airGather, to the spread function. The next arguments are the columns in the current data set holding the key and the value, so that is metric and value. We run this and see our data is back in wide format. tidyr isn't necessarily any faster than reshape2, but its syntax is meant to be easier. Either way, both are great solutions for transforming data.
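The steps narrated above can be sketched in R as follows. This is a minimal sketch, assuming the tidyr and magrittr packages are installed; the object names airGather, airGather2 and airSpread are our own rendering of the spoken names (airGather2 is a hypothetical extra object used only to show that listing the columns explicitly gives the same result as excluding Month and Day).

```r
# Minimal sketch of the wide -> long -> wide round trip on the built-in
# airquality data (columns: Ozone, Solar.R, Wind, Temp, Month, Day).
library(tidyr)
library(magrittr)

# Wide to long: gather every column except Month and Day into key/value
# pairs. "metric" holds the former column names, "value" the readings.
airGather <- airquality %>%
    gather(key = metric, value = value, -c(Month, Day))

head(airGather)
tail(airGather)

# Equivalent (hypothetical name airGather2): list the gathered columns
# explicitly instead of excluding Month and Day.
airGather2 <- airquality %>%
    gather(key = metric, value = value, Ozone, Solar.R, Wind, Temp)

# Long to wide: spread the metric/value pairs back into separate columns.
# Month and Day identify each row, so each pair lands back in its own row.
airSpread <- airGather %>%
    spread(key = metric, value = value)

head(airSpread)
```

The long version has one row per day per measurement (153 days times 4 measurements), and spreading it returns the original 153 rows, with the measurement columns ordered alphabetically rather than in their original order.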