5.4 Make boxplots with base graphics
Video duration: 1m
<v Voiceover>One graph</v> that is commonly taught in intro statistics classes, and often instills fear in students, is the box plot. This, again, is a plot of one variable, and it tries to give a sense of how the data is spread out. This plot is a source of great debate in the statistics community. Some people love it, some people disdain it. Either way, it's an important plot to learn. So, again, using the carat variable from the diamonds data, we're going to draw a box plot. In base R, this is done simply with the boxplot function, and its main argument is the variable, which in this case is diamonds$carat. Running that, we see the box plot. The box, which is where the plot gets its name, represents the middle 50 percent of the data: the top line is the third quartile, the bottom line is the first quartile, and the thick black line in the middle is the median. The whiskers extend from the box to the most extreme data points that lie within 1.5 times the inter-quartile range of it. The inter-quartile range is simply the third quartile minus the first quartile. It's how spread out the middle 50 percent is. And all the dots represent outliers that lie more than 1.5 times the inter-quartile range beyond the box. While this graph does show the data nice and succinctly, it's important to remember that this box represents only 50 percent of the data. That means the other 50 percent falls outside the box and is summarized only by the whiskers and the outlier dots, so there's a lot of information to miss out on. Something to consider as you draw these plots.
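The steps described in the video can be sketched roughly as follows. This is a minimal sketch, assuming the diamonds data comes from the ggplot2 package and that the package is installed. The names q, iqr, upper_fence, lower_fence and bs are made up for illustration. Note that boxplot draws the box at the "hinges", which can differ very slightly from the quartiles returned by quantile.

```r
# Assumption: ggplot2 is installed; it ships the diamonds data set.
data(diamonds, package = "ggplot2")

# Draw the box plot of carat with base graphics.
boxplot(diamonds$carat, ylab = "carat")

# Recompute the pieces of the plot by hand.
q <- quantile(diamonds$carat, c(0.25, 0.5, 0.75))  # Q1, median, Q3
iqr <- IQR(diamonds$carat)                         # Q3 minus Q1
upper_fence <- q[["75%"]] + 1.5 * iqr              # whisker cannot pass this
lower_fence <- q[["25%"]] - 1.5 * iqr

# The numbers boxplot() uses internally.
bs <- boxplot.stats(diamonds$carat)
bs$stats        # lower whisker, lower hinge, median, upper hinge, upper whisker
length(bs$out)  # how many points are drawn as outlier dots
```

Comparing bs$stats with q and the two fences shows that the whiskers stop at actual data points inside the fences, and that everything in bs$out lies beyond them.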