5.13 Use Web graphics
Video duration: 29m
D3 is a JavaScript library for plotting that really ushered in the age of interactive graphics. So naturally, people in the R community went to Hadley Wickham asking him to port ggplot to D3. His creation, ggvis, is very similar to ggplot, though not quite the same. The beautiful thing about it is that it produces native web graphics: you write R code, but it comes out as JavaScript and can easily be displayed in a browser.

Let's load ggvis with require(ggvis). As you can see from the startup message, this is a brand new package that is changing rapidly, so the way you use it today might be a little different from the way you use it in two months. That's one of the pitfalls of using bleeding-edge software.

To illustrate, we will use the cocaine data set that comes with ggvis. We load it with data(cocaine) and take a look at it. We have the state, the potency, the weight, the month and the price.

First things first, let's look at a scatter plot of price against weight. To remind ourselves how we would do this in ggplot, let's load ggplot2 and make a quick, simple plot: ggplot(cocaine, aes(x=weight, y=price)) + geom_point(). It opens in the Plots pane of RStudio, and we have this nice scatter plot.

The syntax for ggvis is similar, but again a little different. The first big change is that it makes use of pipes, as seen in the section on dplyr. The next big change is that you don't need the aes function to map columns to aesthetics. Here's how that works. First you type the name of the data frame you will be using, in this case cocaine, and pipe it to the ggvis function. The first argument of ggvis is the data set, which got piped in, so we're done with that. Now we can specify the data mappings right inside the ggvis function, with no separate aes function. We want to map the weight column to the x-axis.
The way you do that in ggvis is with formula notation, so you say x=~weight. This is the same formula notation you would see in an lm formula; here it basically says that weight comes from the cocaine data frame. Next we map price to the y-axis with y= ~price (I'll put a space there so it's easier to see) and close off the function.

Now pipe this to a layer function. In ggplot you would say geom_point; in ggvis it's layer_points. Note that it's plural: layer_points, not layer_point. Let's run this.

Immediately it has a different aesthetic. It looks quite different from the ggplot graph. Also note that the ggplot graphic came up in the Plots pane of RStudio, while this one comes up in the Viewer pane, which is essentially a little web browser. That's because this is a web graphic rather than an ordinary R plot, so it behaves a little differently, and you can view it natively in a browser. To see it in your browser directly, click the "Show in new window" button; that launches your default browser, and there's your graphic. It's stored as a temporary file, but you could easily export it.

Mapping other variables to other aesthetics is just as easy. Let's return to our scatter plot: cocaine %>% ggvis(x= ~weight, y= ~price, fill= ~potency). Fill is the color of the dot. In ggplot it would have been color, but in ggvis it's fill, because you're filling the dot in. Again, we pipe that to layer_points. Now the dots are on a gradient color scale, and there's a legend built right in, just as in ggplot, though it looks a little different. The color is controlled by fill because the outer part of the circle is a separate property, called the stroke.
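For comparison, here is a minimal sketch of the two scatter plots described above, side by side, assuming both ggplot2 and ggvis are installed. The object names gg_scatter and gv_scatter are illustrative only. The mappings follow the narration: weight on x, price on y, and potency on fill (color in ggplot2).

```r
library(ggplot2)
library(ggvis)   # ggvis also re-exports the magrittr pipe %>%

# The cocaine data ships with ggvis: state, potency, weight, month, price.
data(cocaine, package = "ggvis")

# ggplot2: data is the first argument, mappings go inside aes(),
# and layers are added with +. Point color maps to potency.
gg_scatter <- ggplot(cocaine, aes(x = weight, y = price, color = potency)) +
  geom_point()

# ggvis: data is piped in, mappings use ~ formulas directly in ggvis(),
# and layers are chained with %>%. Note the plural layer_points().
gv_scatter <- cocaine %>%
  ggvis(x = ~weight, y = ~price, fill = ~potency) %>%
  layer_points()

# Printing gg_scatter draws in the Plots pane; printing gv_scatter
# renders a web graphic in the Viewer pane (or a browser).
```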
So let's say we want the same thing, but with the dots all black and only the outline color coded. We say cocaine %>% ggvis(x=~weight, y=~price, stroke=~potency) (I'm no longer putting in the spaces, since the tilde should be familiar by now) and pipe it to layer_points. It's harder to tell, but you can see that the points are black and their outlines range from lighter blue to darker blue. That's the outer layer. Fill is the color of the dot itself, or the bar itself, and stroke is the color of the outline.

You might want to hard code one of the aesthetics, perhaps a color: say you want the dots to always be green. In that case you use a slightly different syntax for assigning the value. We pipe cocaine to ggvis, and inside we write x=~weight, y=~price and fill:="green", with green in quotes. This distinction is very important: when you are mapping a variable to an aesthetic, you use =~; when you are assigning a fixed value to an aesthetic, you use :=. Then we pipe it to layer_points, and we get these nice green dots. That's about what we'd expect, and it will come naturally to you if you've used ggplot before. Of course, ggplot is covered extensively in previous LiveLessons and throughout R for Everyone, particularly chapter seven.

Just as with ggplot, it's possible to add multiple layers. Let's say we want our scatter plot plus a smoothing curve. We build it as usual, cocaine %>% ggvis(x=~weight, y=~price, fill=~potency), pipe that to layer_points, and pipe that to layer_smooth. This is a perfect example of how things differ a little from ggplot: layer_smooth, singular, doesn't work. We need layer_smooths, with an s on the end.
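The difference between mapping (=~) and setting (:=), and the way layers chain together, can be sketched as follows, assuming ggvis is installed. The object names are illustrative, and the smoothing example leaves out the fill mapping used in the lesson to keep the sketch minimal.

```r
library(ggvis)
data(cocaine, package = "ggvis")

# Mapping (=~): the stroke color varies with each row's potency.
outlined <- cocaine %>%
  ggvis(x = ~weight, y = ~price, stroke = ~potency) %>%
  layer_points()

# Setting (:=): every point gets the same literal fill value.
all_green <- cocaine %>%
  ggvis(x = ~weight, y = ~price, fill := "green") %>%
  layer_points()

# Multiple layers: points plus a smoothing line (plural layer_smooths()).
with_smooth <- cocaine %>%
  ggvis(x = ~weight, y = ~price) %>%
  layer_points() %>%
  layer_smooths()
```

Writing fill = "green" (plain = without the tilde) would not give a constant green; the := operator is what marks a value as set rather than mapped.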
Running that, we have the scatter plot from before, color coded by potency with a legend, plus a best-fit line. It's a nice, simple way to combine multiple layers in a single plot.

A very neat feature of ggvis is that, since it's based on web graphics, you can add a bit of interactivity. You can get quite elaborate with it, such as mouseovers that show information when you hover over a point. For now, we'll just include some sliders that control the size and the opacity of the dots.

We start as usual with cocaine %>% ggvis. At this point we don't have to write x=~weight: ggvis knows the first mapping is x, so we can just say ~weight and then ~price. I'll still say fill=~potency by name, because, as anywhere else in R, you can match arguments either positionally or by name. For the size of the points, I'll write size := input_slider(10, 100), so that slider runs from 10 to 100. Then for opacity, opacity := input_slider, with a range suited to opacity, which runs from 0 to 1. Close off ggvis, break to the next line, and add layer_points.

We run these two lines of code. We're currently on the Plots pane, but we want the Viewer pane, which now has an interactive graphic. Remember, the Viewer pane is actually a web browser, and it's running a Shiny application. Shiny is a really cool web application framework from the RStudio team. So this is a little webpage with our graphic and these sliders. The slider on the left controls the size of the dots; let's put it up to 100, and see how the dots got bigger. The slider on the right controls the opacity; sliding it toward the left makes the points more transparent. This is a really cool feature. The only downside is that if you embed this in a PDF, it obviously won't be interactive.
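A sketch of the slider plot just described, assuming ggvis is installed. The narration gives the size range (10 to 100) but not the opacity range, so input_slider(0, 1) for opacity is an assumption based on opacity running from 0 to 1.

```r
library(ggvis)
data(cocaine, package = "ggvis")

slider_plot <- cocaine %>%
  ggvis(~weight, ~price,                    # x and y matched by position
        fill = ~potency,                    # mapped by name
        size := input_slider(10, 100),      # point size, 10 to 100
        opacity := input_slider(0, 1)) %>%  # assumed range: opacity runs 0 to 1
  layer_points()

# Printing slider_plot launches a local Shiny app in the Viewer pane;
# press Esc in the console to stop it.
```

Building the object does not start anything; the Shiny app only runs when slider_plot is printed.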
Still, it's great for data exploration and for putting on a website. Let's fiddle with it a little more: make the points smaller, then make them less transparent. It's a really great way to see what's happening in your data and explore further. When we're done with the visualization, we hit Escape, which drops us out of the app and reverts to the previous graph.

Of course, scatter plots aren't the only thing you can make with ggvis. Perhaps the most popular graph is the histogram, and it's just as easy here as in ggplot. As a reminder, in ggplot it would be ggplot(cocaine, aes(x=price)) + geom_histogram(), which gives us this nice histogram. In ggvis, you start with the data set and pipe it to ggvis, do the mapping with x=~price, and pipe that to layer_histograms. Again, it's plural: histograms. Run that, and we have a ggvis histogram.

There are many more things you can do with ggvis, all sorts of graphics. But keep in mind as you go forward that the way you control ggvis might change slightly as Hadley continues to improve it.

Another exciting R package that lets you write R code yet generates D3 graphics is rCharts. It was written by Ramnath Vaidyanathan while he was a professor at McGill University in Montreal. It is highly experimental and isn't even on CRAN yet. Its website is ramnathv.github.io/rCharts, with a capital C. As you can see there, you should install it with devtools, straight from GitHub. So we say require(devtools), then install_github("rCharts", "ramnathv"): first the name of the repo, "rCharts", then the name of the user who owns it, "ramnathv". It downloads the code, builds an R package and installs it for you. You can use this to install any package straight from GitHub without going through CRAN, and you can also install from Bitbucket, from a generic Git source, or from SVN.
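The two technical steps from this segment, the histogram comparison and the GitHub install, can be sketched as below, assuming ggplot2 and ggvis are installed. The helper ensure_rcharts() is a hypothetical convenience wrapper, not part of any package; note that current devtools versions take a single "user/repo" string rather than the two separate arguments used in the lesson.

```r
library(ggplot2)
library(ggvis)
data(cocaine, package = "ggvis")

# Histogram of price: geom_histogram() in ggplot2, layer_histograms() in ggvis.
gg_hist <- ggplot(cocaine, aes(x = price)) + geom_histogram()
gv_hist <- cocaine %>% ggvis(x = ~price) %>% layer_histograms()

# Hypothetical helper: install rCharts from GitHub only if it is missing.
# Recent devtools versions take one "user/repo" string.
ensure_rcharts <- function() {
  if (!requireNamespace("rCharts", quietly = TRUE)) {
    devtools::install_github("ramnathv/rCharts")
  }
  # Returns TRUE (invisibly) when rCharts is now available.
  invisible(requireNamespace("rCharts", quietly = TRUE))
}
```

Calling ensure_rcharts() followed by require(rCharts) ends up in the same place as the install steps above, without re-downloading on later runs.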
Devtools is a really great package that helps you build packages, and it also helps you install packages from sources other than CRAN, which a lot of packages are doing these days. Right now the computer has already downloaded the package and is building and installing it. Now we're done and ready to go. We load rCharts like any other package, with require(rCharts), because even though it was installed from GitHub, it is now just a regular R package.

Going back to Ramnath's website, you can see that rCharts is built on a few different JavaScript charting libraries: there's the Polychart library, the Morris library and a number of others. So when you use rCharts, the function name depends on which plotting library you want to use. We'll go through some of the examples available on his site.

First up, let's look at the iris data set. It's a famous statistical data set: small, but it will get the point across. It's just a bunch of measurements. The first thing I want to do, to make it a little easier to use in JavaScript, is get rid of the periods in the names. We'll do a quick regular expression substitution: names(iris) <- gsub("\\.", "", names(iris)). That's a global substitution searching for a period, which in regular expressions needs to be escaped with a double backslash, replacing it with nothing, and searching in names(iris). Now if we look at head(iris), none of the names have periods.

Let's make a scatter plot of sepal length on sepal width, faceted by species. We say rPlot, with a capital P, and SepalLength ~ SepalWidth. Again, the tilde means sepal length is plotted against sepal width: sepal length goes on the y-axis and sepal width on the x-axis.
The vertical bar (the pipe character) means faceting: break up the plot by species. This is similar to lattice. We will also color code it by species, and the type will be point. Notice that when we specify the color, you enter the column name as a string, but the faceting is part of the formula interface.

Let's take a look. And notice we made a big mistake: we didn't specify the data. We gave it a formula and some columns, but we never told it which data to use. Just like in ggplot, you need to specify the data, so we go back in and add data=iris.

Again, instead of the Plots pane, it came up in the Viewer pane, because rCharts generates JavaScript, so the result is essentially a webpage. We can scroll around, expand it a little, or pop it into its own window. Let's check that out. Here we have a nice scatter plot, color coded and faceted by species. A nice built-in feature is that when you hover over a point, it shows you information about that point. This is a really great way to make interactive web graphs easily.

The rPlot function also builds bar charts. To do this, let's build a new data frame called hairEye from as.data.frame(HairEyeColor). Looking at this data set, we have the hair color, the eye color, the sex, and the frequency (the Freq column) of people in the data set. So let's plot it. Again it's rPlot: Freq is our y variable and Hair is our x variable, faceted by Eye, and we also color code by Eye. The type is bar, and, as we just learned, it's very important to specify the data set, data=hairEye. And now we have this nice bar chart. I'll expand its dimensions so we can see it better: a bar chart faceted and color coded by eye color. Nice, simple, easy.

Another JavaScript library that rCharts can write to is Morris.
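Going back over the two rPlot examples above, here is a sketch that separates the base R data preparation from the charting calls. The rCharts part is wrapped in requireNamespace() because rCharts is installed from GitHub and may not be present; the object names irisPlot and hairPlot are illustrative.

```r
# Remove the periods from the iris column names ("Sepal.Length" -> "SepalLength").
# The double backslash escapes the period, which otherwise matches any character.
names(iris) <- gsub("\\.", "", names(iris))

# HairEyeColor is a 4 x 4 x 2 table; as.data.frame() gives one row per cell
# with columns Hair, Eye, Sex and Freq.
hairEye <- as.data.frame(HairEyeColor)

# The charts themselves need rCharts (installed from GitHub).
if (requireNamespace("rCharts", quietly = TRUE)) {
  library(rCharts)
  # SepalLength on y, SepalWidth on x, one panel per Species.
  irisPlot <- rPlot(SepalLength ~ SepalWidth | Species, data = iris,
                    color = "Species", type = "point")
  # Freq by Hair color, faceted and colored by Eye color.
  hairPlot <- rPlot(Freq ~ Hair | Eye, data = hairEye,
                    color = "Eye", type = "bar")
  # Typing irisPlot or hairPlot at the console renders the chart.
}
```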
To show it off, we'll make a few line charts using the economics data set from ggplot2: data(economics, package="ggplot2"). head(economics) shows all sorts of economic information: the date the information is for, personal consumption expenditures, the total population in thousands, the personal savings rate, the median duration of unemployment in weeks, and the number of unemployed people in thousands. It runs from 1967 to 2007.

First things first, JavaScript doesn't necessarily handle dates the same way R does, so let's convert the date column from an actual Date to character: economics$date <- as.character(economics$date). Once again, we take a look to make sure it worked. It looks the same to us, but it's now stored as character, which is easier to plot.

This also lets us show how you can save a plot to a variable and keep working on it. We say m1 <- mPlot. Since we're using Morris, it's mPlot, as opposed to rPlot from before. Here we no longer use the formula interface; we give the column names as character strings. x is "date", and on the y-axis we plot two lines, y=c("psavert", "uempmed"): the personal savings rate and the median duration of unemployment. Then type="Line" with a capital L, and data=economics. We could plot m1 right now, but first let's customize it a little: m1$set(pointSize=0, lineWidth=1). Now we type m1. It has to do some reshaping, but then it displays this nice graphic in the Viewer. As we hover over it, tooltips still pop up with information. We can scroll across to see the whole data set, or pop it into its own browser window to see it all at once. This is a lovely chart that gives you a lot of good information.

Another powerful feature of rCharts is its ability to plot maps. Maps have become incredibly popular these days.
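Returning to the Morris line chart above, this sketch shows the date conversion and the mPlot call, assuming ggplot2 is installed; the charting part only runs if rCharts is available. Working on a plain data.frame copy named econ is an assumption on my part, since newer ggplot2 versions ship economics as a tibble.

```r
data(economics, package = "ggplot2")

# Work on a plain data.frame copy and convert the Date column to character
# before handing it to JavaScript, as in the lesson.
econ <- as.data.frame(economics)
econ$date <- as.character(econ$date)

if (requireNamespace("rCharts", quietly = TRUE)) {
  library(rCharts)
  # Two lines: personal savings rate and median duration of unemployment.
  m1 <- mPlot(x = "date", y = c("psavert", "uempmed"),
              type = "Line", data = econ)
  m1$set(pointSize = 0, lineWidth = 1)   # no point markers, thin lines
  # Typing m1 at the console renders the Morris chart.
}
```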
Maps are a great way to display information, and rCharts makes them very easy. We create a new object called map1 that gets Leaflet$new(), which constructs a new Leaflet object. Then we call map1$setView and give it the latitude and longitude of where to center the map. Let's place it over central London: map1$setView(c(51.505, -0.09), zoom=13), with the zoom set to 13 so it's nice and tight. We could set other options, or just go ahead and plot it by typing map1 to see what it looks like. Now we have this beautiful map of London; we can drag it around and zoom in. It's a really great, easy way to create interactive maps in R. I can't express enough how amazing this tool is.

I've used rCharts myself to great effect. Every month at the New York City R meetup, we get pizza delivered from a different pizzeria, and I have the attendees judge the pizza. The results are all available at bit.ly/pizzapoll. This bar chart, which is highly interactive, was generated with rCharts code. The data itself is at www.jaredlander.com/data/PizzaPollData.php, and anyone at home can go ahead and use it. As you can see, it's all JSON data, and that gets pulled into the chart to make this display. So let me show you how I made this chart in R.

First, we need to load a few packages: rjson, which lets us read JSON, and plyr, to manipulate the data a bit. Now let's read the data: pizzaJson <- fromJSON(file="http://www.jaredlander.com/data/PizzaPollData.php"). The JSON comes across as a list, and we want it formatted as a nice data frame, so we say pizza <- ldply(pizzaJson, as.data.frame). Checking the head of the data, we see the poll ID, the answer and the number of votes. So for this poll, zero people said excellent, six said good and four said average.
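The JSON-to-data-frame step can be tried without the network on a tiny string shaped like a list of records. The field names and values below are invented for illustration and are not the real poll data; the point is only how fromJSON() and ldply() fit together, assuming rjson and plyr are installed.

```r
library(rjson)
library(plyr)

# Invented two-record JSON string for illustration only (not the real poll data).
json_text <- '[{"Answer": "Good", "Votes": 6},
               {"Answer": "Average", "Votes": 4}]'

# rjson::fromJSON() returns a list with one element (a named list) per record.
pizzaJson <- fromJSON(json_text)

# ldply() applies as.data.frame() to each record and stacks the rows.
pizza_demo <- ldply(pizzaJson, as.data.frame)
```

With the real data, the only change is reading from the URL with fromJSON(file=...) as in the lesson.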
The remaining columns are the question that was asked, the name of the pizza place, the time, the total number of votes for that poll, and the percentage that answer got.

Now let's build the plot: pizzaPlot <- nPlot(Percent ~ Place, data=pizza, type="multiBarChart", group="Answer"). That puts percent on the y-axis and place on the x-axis, with the bars grouped by answer. Then we set a few more options. For the x-axis, pizzaPlot$xAxis(axisLabel="Pizza Place", rotateLabels=-45). For the y-axis, pizzaPlot$yAxis(axisLabel="Percent"). And finally pizzaPlot$chart(reduceXTicks=FALSE). Now we can plot it by typing pizzaPlot.

The Viewer refreshed, but there's nothing there. This can be disconcerting, but we didn't do anything wrong: as of the filming of this video, a bug in RStudio keeps nPlots from displaying in the Viewer. However, you can click the button to pop it out and load it in your browser, and there's your nice chart. Just as on my website, you can change it to stacked, turn off the "never again" answers and the "excellent" answers, and change it back to grouped. It's fully interactive. RStudio is aware of the problem and is working on it, but for now you can just open the chart in a browser.

rCharts is a powerful new package in rapid development that makes it very simple to generate beautiful interactive plots with just a few lines of R code.
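The chart-building steps above can be collected into one function so they are easy to rerun when new poll results arrive. build_pizza_plot() is a hypothetical helper, not part of rCharts; it assumes its input has the Percent, Place and Answer columns described in the lesson and that rCharts is installed.

```r
# Hypothetical helper collecting the nPlot steps from the lesson.
build_pizza_plot <- function(pizza) {
  stopifnot(requireNamespace("rCharts", quietly = TRUE))
  p <- rCharts::nPlot(Percent ~ Place, data = pizza,
                      type = "multiBarChart", group = "Answer")
  p$xAxis(axisLabel = "Pizza Place", rotateLabels = -45)  # slanted x labels
  p$yAxis(axisLabel = "Percent")
  p$chart(reduceXTicks = FALSE)   # label every pizzeria instead of a subset
  p
}
```

Usage would be pizzaPlot <- build_pizza_plot(pizza) followed by typing pizzaPlot, keeping in mind the Viewer issue mentioned above.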