11.2 Use rbokeh - Video Tutorials & Practice Problems
Video duration: 8m
Video transcript
Voiceover: While R has always been known for providing great graphing facilities, even base graphics but especially ggplot2, there's been a whole new load of JavaScript libraries that are just amazing for doing graphics. One such library which has an R binding is Bokeh. To use this package you may need to install it from GitHub using devtools::install_github, then in quotes, "bokeh/rbokeh". We already have it installed so we'll just comment that out, but that's how you can go ahead and install it if it's not on CRAN.
We will look at our canonical plotting data set, the diamonds data from the ggplot2 package. We'll say data(diamonds, package = 'ggplot2'). Looking at the head of diamonds, we see it has all this good information about diamonds. So let's load up the rbokeh package: library, and we'll use snippets, rbokeh.
So let's say we want to make a scatterplot of price on carat. In Bokeh, you initialize the plot using the figure function. You could put other arguments in here such as height or width, but we'll just leave it at that. You then use pipes, like the ones from magrittr or dplyr, to pipe that into layer objects. All the layer objects start with ly_, and we are looking for points because we're making a scatterplot. We put in the argument for the x axis, for the y axis, and we say data = diamonds. This will be familiar to anyone who has used ggplot, building up layer by layer and specifying aesthetic mappings. It gives us a message that we didn't specify xlim or ylim, but it will calculate them itself. And now it drew the plot inside of the viewer pane of RStudio, and if you're not running RStudio, this will pop up in your browser.
But let's say you want to now color code the points according to cut. So you say figure, pipe, ly_points, carat, price, data = diamonds, then we can specify color = cut. It does some calculations and then it displays a graph with the points color coded according to cut, and it even builds us a legend automatically.
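The steps so far can be sketched roughly as follows. This is a reconstruction from the narration, not the exact code typed in the video; the object names `p_scatter` and `p_color` are made up for illustration, and it assumes the rbokeh, magrittr, and ggplot2 packages are already installed.

```r
# Install from GitHub only if rbokeh is not already available
# devtools::install_github("bokeh/rbokeh")

library(rbokeh)
library(magrittr)  # loaded explicitly so %>% is definitely available

# The diamonds data ships with ggplot2
data(diamonds, package = "ggplot2")
head(diamonds)

# Scatterplot of price on carat: figure() starts the plot,
# ly_points() adds the point layer
p_scatter <- figure() %>%
  ly_points(carat, price, data = diamonds)

# Same plot with points color coded by cut; a legend is added automatically
p_color <- figure() %>%
  ly_points(carat, price, data = diamonds, color = cut)

# Typing p_scatter or p_color at the console renders the plot
# (RStudio viewer pane, or the browser outside RStudio)
```

Storing each figure in a variable is a choice made here for convenience; in the video the pipelines are run directly so the plot appears right away.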
Making it more powerful, we can tell it to print more information when you hover over a point. Figure, pipe, ly_points, carat, price, data = diamonds, color = cut, hover = list, carat, price. While the graph looks the same, if we hover over a point, it prints out the carat and price. If you hover over multiple points, it prints out the information for each of them.
But now let's say we want to also add a smooth lowess curve. So we actually copy and paste this plot, and then pipe that to ly_lines. Here we say we're gonna use lowess, x = carat, y = price, and we need to specify data = diamonds. Whereas ggplot would just use the data set you specified initially as the default data for each of the layers, in Bokeh you need to keep specifying the data set for each layer. Let's run these lines, and we now have the plot, but there's a lowess curve put in there also. This package is already turning into a natural evolution of ggplot for the modern web browser.
Let's say we want to vary the shape of the points according to cut as well. So we say figure, pipe, ly_points, carat, price, data = diamonds, color = cut, and instead of saying shape, we say glyph, because the glyph is how they specify the object. And that's also cut. We run this, and the points are color coded and shape coded, and even the legend reflects that. We have an overplotting issue, so let's add some transparency. I'm going to copy and paste this line, and then say alpha equals 1/3. We can see now that the points were made a little transparent, and this makes it easier to read the data when there's overplotting.
A very common statistical plot is the histogram. Let's see how Bokeh does that. We start as always with figure, pipe that into ly_hist, and we tell it we want to just calculate the histogram for price, and data = diamonds. And we have this nice histogram. Now if we want to, we can plot the density right on top of the histogram.
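A rough sketch of the hover, lowess, glyph, transparency, and histogram steps described above. The variable names are illustrative only. The narration does not spell out the exact lowess call, so this version makes an assumption: it computes the curve directly on the two columns and hands the result to ly_lines, which may differ from what is typed in the video.

```r
library(rbokeh)
library(magrittr)  # for %>%
data(diamonds, package = "ggplot2")

# Show carat and price when hovering over a point
p_hover <- figure() %>%
  ly_points(carat, price, data = diamonds, color = cut,
            hover = list(carat, price))

# Add a smooth lowess curve as a second layer.
# Assumption: the curve is computed on the columns directly here.
p_smooth <- p_hover %>%
  ly_lines(lowess(diamonds$carat, diamonds$price))

# Vary both color and point shape (glyph) by cut,
# with alpha = 1/3 to reduce overplotting
p_glyph <- figure() %>%
  ly_points(carat, price, data = diamonds,
            color = cut, glyph = cut, alpha = 1/3)

# Histogram of price
p_hist <- figure() %>%
  ly_hist(price, data = diamonds)

# Type any of these object names at the console to render the plot
```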
So we say figure, pipe it to ly_hist, price, data = diamonds, then we'll say freq equals FALSE so that it will be on the same scale as the density, and then we pipe this into ly_density, and we say price, and data = diamonds. We now have a density plot to go along with our histogram.
Now inside the RStudio viewer, you don't get the full advantage of Bokeh. Let's pop it out into a browser. Here, we can see that if we scroll, it zooms in on the data, and we can move the data around. We can save the image, we can resize it, we can use our mouse wheel to zoom, there's all sorts of different stuff we can do. So using rbokeh gives you the full functionality of Bokeh, especially when you open it in a browser.
Another, albeit controversial, statistical plot is the boxplot. Of course, Bokeh does this very easily, with ly_boxplot. We're going to make a calculation on price, and we're going to do it separately for each cut. And we say data = diamonds. We get a few more messages; it's just telling us what it's going to do automatically. And we do get this boxplot here.
But it would be nice if we could format the axis labels, and maybe format the values along the axis. So let's see what we can do. I'm going to copy and paste this line since that part will stay the same, and then I'll pipe it to x_axis, and say label equals, then in quotes, Diamond Cut. Then pipe that to y_axis. First thing we do is say label = Price with an uppercase P, then I'll tell it to use the number formatter, numeral, then I'll specify the format to be "0,000". We run those lines of code, and we see our axes are labelled as we wanted, and the y axis now has attractive looking numbers.
We've just barely scratched the surface of what rbokeh can do. It gives you access to the full suite of functionality of Bokeh and lets you easily, in a ggplot2-like fashion, create beautiful web graphics.
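The histogram-with-density, boxplot, and axis formatting steps might look roughly like this. As before, this is a sketch based on the narration rather than the code shown on screen, and the object names are invented for illustration.

```r
library(rbokeh)
library(magrittr)  # for %>%
data(diamonds, package = "ggplot2")

# Histogram on the density scale (freq = FALSE) with a density curve on top
p_density <- figure() %>%
  ly_hist(price, data = diamonds, freq = FALSE) %>%
  ly_density(price, data = diamonds)

# Boxplot of price, computed separately for each cut
p_box <- figure() %>%
  ly_boxplot(cut, price, data = diamonds)

# Same boxplot with axis labels and comma-formatted y axis values
p_box_labeled <- p_box %>%
  x_axis(label = "Diamond Cut") %>%
  y_axis(label = "Price", number_formatter = "numeral", format = "0,000")

# Type any of these object names at the console to render the plot;
# opening it in a browser gives the full zoom, pan, and save tools
```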