5.9 Make line plots - Video Tutorials & Practice Problems
Video duration: 8m
Video transcript
A very popular type of graph is the line plot. This is especially useful when dealing with data that has some sort of natural continuity to it, such as time on the x axis. To examine this, we will look at the economics data available in ggplot2. So examining it first, and I'll use tab completion to make sure I don't have any typos, we see we have information such as date, population, and unemployment. So to start illustrating line plots, let's look at population over time. So we do ggplot, economics, say aes, x equals date, y equals pop, and plus geom underscore line. And here we have a nice plot with date on the x axis and population on the y axis, and it shows the trend over time. Now that looks good, but we can do so much more with line plots. One particular use of a line plot is showing some sort of trend repeatedly. For instance, you might have the months along the x axis and a separate line for each year, so you can see how each year did on a month-to-month basis compared to the others. To do this, though, we need to manipulate the economics data a little bit to add some more variables. While base R has plenty of functions for manipulating dates, Hadley Wickham's lubridate package just makes it a little bit easier. So to start, we will load his package: require lubridate. And now we can go ahead and create a new year variable and a month variable. We can do this by typing economics, dollar, year, now this is a new variable we are creating, it hasn't existed yet, and that gets the year of the date. Now lubridate has a function called year, and all you do is feed it a date and it returns the year, nice and simple. In a similar fashion, we'll create a month variable, and this takes the function month of economics, dollar, date. Now if we look at the head of economics, and of course we don't want a nine, we want a parenthesis, we can see we have two new columns for year and month.
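The steps narrated so far might look roughly like this in R. This is a sketch reconstructed from the narration, not the code shown on screen; the object name p is an assumption added so the plot can be stored, and library() is used where the video says require.

```r
library(ggplot2)    # provides the economics data set and the plotting functions
library(lubridate)  # the video uses require(lubridate), which also works

# Inspect the data: columns include date, pop and unemploy
head(economics)

# Basic line plot of population over time
# (p is a name chosen here for illustration)
p <- ggplot(economics, aes(x = date, y = pop)) + geom_line()
print(p)

# Add year and month columns derived from the date column.
# Assigning to economics creates a modified copy in the workspace.
economics$year <- year(economics$date)
economics$month <- month(economics$date)

# The two new columns now appear at the end
head(economics)
```

At this point month holds the month number (1 to 12) rather than a name.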
Now, showing all this data would be a bit too much for a single plot, so we want to create a new data set that just has the data since the year 2000. We will call this new data frame econ2000, and it takes a subset of the economics data. We create that subset using the square brackets; remember, the first argument says which rows to keep. To figure this out we will say which, economics, dollar, year, is greater than or equal to 2000. This does a check on each row, finds the rows that have years greater than or equal to 2000, and keeps track of them. And the second argument to the square brackets is blank, because we want to keep all the columns. Our original data set had 478 rows, this truncated data set has 87 rows, so there's much less data to work with. Let's clear out the console, so we save ourselves some room. Now we want to start building up a nice, good-looking graph, but before we do that, we should make sure we know exactly what we're going to see. So let's once again look at the head of the economics data, actually we'll look at econ2000, and we notice that month is here as numbers. It gives the number of the month, like January's the first month, February's the second month, and we really don't want that. What we would like is for it to be January, February, April, and so on. So what we can do is come in here to econ2000 and overwrite the month variable, by saying month of econ2000, dollar, date, and then we do comma, label equals true. And looking at the data again, of course we need to make sure there's not a nine there, we see, hey, the months are now text. That just makes things a little prettier. So once again let's clear the console and start building up this plot. A lot of the things we want to use for this plot are going to be in the scales package, which comes with ggplot2 but needs to be loaded separately. First let's save a basic plot: g gets ggplot, econ2000, and the aesthetics are x goes to month, y goes to pop.
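The subsetting and relabeling described above could be sketched as follows. This is a reconstruction from the narration, under the assumption that the year and month columns are rebuilt first so the block runs on its own. Row counts depend on the installed version of ggplot2 and may not match the 478 and 87 quoted in the video.

```r
library(ggplot2)
library(lubridate)
library(scales)  # installed alongside ggplot2, but loaded separately

# Rebuild the helper columns from the earlier step
economics$year <- year(economics$date)
economics$month <- month(economics$date)

# Keep only rows from 2000 onward; the blank after the comma keeps all columns
econ2000 <- economics[which(economics$year >= 2000), ]

# Compare the sizes of the full and truncated data sets
nrow(economics)
nrow(econ2000)

# Overwrite month with labeled months instead of numbers.
# By default lubridate returns abbreviated names (Jan, Feb, ...) as an ordered factor.
econ2000$month <- month(econ2000$date, label = TRUE)
head(econ2000)

# Save the base plot; layers are added to g in later steps
g <- ggplot(econ2000, aes(x = month, y = pop))
```

Because month() with label = TRUE returns an ordered factor, the months stay in calendar order on the x axis rather than being sorted alphabetically.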
Then we'll do g gets g plus, because we're continuing to build it up and save it, geom underscore line, and we're going to give it a few more aesthetics. In particular, we will make color equal to the factor of year. Year looks like a numeric variable to ggplot, so if we didn't turn it into a factor we would see all blues, just different shades of blue. Turning it into a factor makes year a discrete variable, and therefore you'll have a green, red, blue, yellow, all different colors. We're also going to set the group aesthetic to year, to say that the lines should be broken up according to year. Now if we want to, we could go ahead and view this graph and see we're getting something that looks useful, but it's really not that pretty. So we want to continue building it up. The first thing we want to do is give the legend a better name, just call it Year with a capital Y, none of this other stuff. We will do g gets g plus scale, the legend is a type of scale, it shows you how data relates to visuals, then color, because we're dealing with a color scale, and discrete, because our color in this instance is a discrete variable. And we give it the argument name equals Year, and that should change the name of the legend right here to Year. We run that. We also don't like the formatting of the y axis labels; they're hard to read. Putting a comma after every three digits will make it a lot better. So we do g gets g plus scale y, this time, and it's continuous, and here we say labels equals comma. This comma is a special function from the scales package that will format the numbers on the y axis. Lastly we add some good labels, and we do this using the labs function. We say title equals population growth, x equals month, and y equals population. We're now ready to see what the graph looks like.
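Put together, the layers described in this part of the video might look like the sketch below. The data preparation is repeated so the block runs on its own; the exact capitalization of the title and axis labels is an assumption, since the narration only gives the words.

```r
library(ggplot2)
library(lubridate)
library(scales)

# Rebuild the data from the earlier steps
economics$year <- year(economics$date)
econ2000 <- economics[which(economics$year >= 2000), ]
econ2000$month <- month(econ2000$date, label = TRUE)

# Base plot: month on the x axis, population on the y axis
g <- ggplot(econ2000, aes(x = month, y = pop))

# One line per year; factor(year) gives distinct colors instead of a blue gradient
g <- g + geom_line(aes(color = factor(year), group = year))

# Rename the legend
g <- g + scale_color_discrete(name = "Year")

# Format the y axis labels with commas every three digits (comma is from scales)
g <- g + scale_y_continuous(labels = comma)

# Title and axis labels (capitalization assumed)
g <- g + labs(title = "Population Growth", x = "Month", y = "Population")

print(g)
```

Mapping color to factor(year) while leaving group as the numeric year works because group only needs to tell the lines apart, whereas color needs a discrete variable to pick separate hues.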
You can see most things are starting to look a lot better so far: the years have been put in order, we have proper labels, we have good formatting, we have a title. However, the dates seem to be a little mushed together. There is a way to fix that. We can change this by controlling the angle of the x axis text. We do this with g gets g plus theme, axis dot text dot x, Hadley is nothing if not verbose. That takes the value of element underscore text, and here we set the angle equal to 90 degrees and the hjust equal to one. We run that, now we run the plot again, and we can see the values are nicely spun on their sides. The angle adjusted the rotation and hjust moved them to the right. If we want that to be centered we can change the hjust, and we could change the vjust, and that will give us fine-tuned control over every element of this plot. So here we have seen much more than just a simple line graph. We've seen all sorts of elements that can go into a line graph: we've seen geoms, we've seen aesthetics, and two different types of scales, both discrete and continuous. We've seen labels, we've seen themes and angles. Lots of different things go into this plot, and ggplot really allows detailed, fine-tuned control over every single element.
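As a recap, the complete plot including the rotated x axis text might be written as below. Again this is a sketch pieced together from the narration rather than the on-screen code, with the label capitalization assumed.

```r
library(ggplot2)
library(lubridate)
library(scales)

# Data preparation from the earlier steps
economics$year <- year(economics$date)
econ2000 <- economics[which(economics$year >= 2000), ]
econ2000$month <- month(econ2000$date, label = TRUE)

# Plot built up layer by layer, as in the video
g <- ggplot(econ2000, aes(x = month, y = pop))
g <- g + geom_line(aes(color = factor(year), group = year))
g <- g + scale_color_discrete(name = "Year")
g <- g + scale_y_continuous(labels = comma)
g <- g + labs(title = "Population Growth", x = "Month", y = "Population")

# Rotate the x axis text 90 degrees; hjust = 1 right-justifies the rotated labels
g <- g + theme(axis.text.x = element_text(angle = 90, hjust = 1))

print(g)
```

Trying other values, such as hjust = 0.5 or adding vjust = 0.5, is a quick way to see how the two justification settings move the rotated labels.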