5.10 Create small multiples
Edward Tufte, the modern guru of data visualization, is a big fan of what he calls small multiples: plotting the same information again and again for slightly different levels of a certain variable. In ggplot2 this is called faceting, and it is a great way to quickly see lots of data.

So we'll build up our base plot, which is g gets ggplot, diamonds, aes, x equals carat, y equals price. Now let's say g plus geom_point, and here we might as well add another aesthetic, color equals color, and we're going to add something new: facet_wrap, tilde, color. What this says is, break up the plot into discrete units, one for each level of color, and then form these into a grid. Let's look at it. We see there's an individual plot for each level of color: one for D, one for E, one for F, G, H, I and so on. There's a legend because we color coded the data, but that legend isn't necessary because each color already has its own plot. The wonderful thing about a small multiples plot is that you only need to learn how to interpret the plot once, and then you can quickly interpret each pane of the plot.

facet_wrap takes the data and makes a long strip of panes out of it, and then just wraps the strip as necessary to fit in the plot. This is different from facet_grid, which is used to make a two-dimensional display. So for that we'll do g plus geom_point, let's add the color aesthetic again because it looks nice, and now we'll do facet_grid. We will say cut, going down, and clarity, going across. What this will do is make a grid where each pane represents a combination of cut and clarity. It takes a little longer to compute because there's a lot of data being processed, but ggplot's still pretty fast. So let's zoom in and look at this. Each pane shows just the diamonds for that combination of the margins.
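The two scatterplot examples narrated so far can be sketched in R roughly as follows. This is a reconstruction from the spoken description, not the exact code shown on screen; the object names p_wrap and p_grid are illustrative and do not come from the video.

```r
library(ggplot2)  # provides ggplot(), the facet functions, and the diamonds data

# Base plot: carat on the x-axis, price on the y-axis
g <- ggplot(diamonds, aes(x = carat, y = price))

# facet_wrap: one pane per level of color, wrapped into rows as needed
# (p_wrap is an illustrative name, not from the video)
p_wrap <- g + geom_point(aes(color = color)) + facet_wrap(~ color)

# facet_grid: cut going down the rows, clarity going across the columns
# (p_grid is an illustrative name, not from the video)
p_grid <- g + geom_point(aes(color = color)) + facet_grid(cut ~ clarity)

# Draw both plots; the grid version takes longer because it has more panes
print(p_wrap)
print(p_grid)
```

Storing each plot in a variable before printing is only a convenience for the sketch; typing the expression directly at the console draws the plot in the same way.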
So for instance, all of the diamonds in this pane have cut Good and clarity IF. If we expand this out, we can see the legend, and it redraws when you expand it, so you can see how the diamonds are color coded according to the legend. This is amazingly powerful. Doing this in base graphics would take many lines of code, yet it really only took one, maybe two lines of code in ggplot.

Faceting isn't restricted to dot plots. It works just as well with histograms, or any other layer for that matter. So let's look at ggplot, diamonds, aes, x equals carat, plus geom_histogram, plus facet_wrap, on color. About this tilde: in facet_wrap there's only one variable, so you put the tilde in front of the variable, and what it's saying is break up the plot according to color. For facet_grid you have two variables: the one on the left side of the tilde goes down, and the one on the right side goes across the graph. That's again the formula notation, where it's one variable versus another.

Well, let's look at the small multiples for histograms. That gives us a lot of messages saying it had to choose its binning for each individual pane. And here we see a nice, beautiful graph showing a faceted histogram.

Small multiples are a great way to quickly convey small changes in the data based on a given variable. It is a powerful tool liked by Edward Tufte, Andrew Gelman, Hadley Wickham and many other luminaries in the field, and it can be used to great effect.
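The faceted histogram described in the voiceover might look like the sketch below. The first plot follows the narration; the second adds an explicit binwidth, which is an assumption made here for illustration (the value 0.1 is not from the video) and is one way to avoid the binning messages. The names p_hist and p_hist_binned are likewise illustrative.

```r
library(ggplot2)

# As narrated: histogram of carat, one pane per level of color.
# With no binning argument, ggplot2 picks a default and reports it in a message.
p_hist <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram() +
  facet_wrap(~ color)

# Assumed variant: set binwidth explicitly (0.1 carat is an arbitrary choice)
# so ggplot2 does not have to choose the binning itself.
p_hist_binned <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.1) +
  facet_wrap(~ color)

print(p_hist)
print(p_hist_binned)
```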