So in earlier videos, we talked a lot about the different charts and graphs that we use to visualize data. And we said that a histogram was one of the ways to visualize quantitative data. In more recent videos, we've talked a lot about frequency distributions. These are tables that organize the frequency across different classes of numbers or labels, usually just numbers. The problem with these tables, though, is they're kind of just boring.
There's just a bunch of numbers and columns, and it's hard to see the different patterns and trends that you'll have to identify in the data. That's exactly why we use histograms. What I'm going to show you in this video is how to create a histogram out of a dataset. We're going to take this table and turn it into a chart and a graph with a bunch of bars and label the numbers over here.
I'm going to show you how to do this. There are a couple of important definitions you'll need to know when it comes to the patterns and trends. Let's go ahead and get started here. So remember that a histogram is essentially just a bar graph or a bar chart, but for quantitative data. We use vertical bars to graph frequencies, that's little f, across different classes.
In other words, a histogram is just a graphical representation of a frequency distribution. We're gonna take this table over here and turn it into a graph. How do we do that? Well, basically, a graph is going to have two axes, the x axis and the y axis. Let's take a look first at our dataset.
We have this data of students that are studying for their exam, and we have the time in minutes. We've actually already seen this exact dataset before. Before you actually start with the histogram, you should always build a frequency distribution. I'm assuming that you already know how to do this. We've actually seen this exact frequency distribution before.
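As a quick refresher on that step, here's a minimal sketch of counting values into classes. The study times below are invented for illustration and are not the dataset from the video, and the function name `frequency_distribution` and its parameters are hypothetical. The classes are assumed to start at 20 and be 10 wide, matching the 20 to 29 and 30 to 39 limits mentioned in this lesson.

```python
# Hypothetical study times in minutes, invented for illustration only.
study_minutes = [25, 31, 38, 42, 44, 47, 49, 52, 55, 58, 63, 66, 71]


def frequency_distribution(values, start, width, n_classes):
    """Count how many values fall in each class.

    Returns a list of ((lower_limit, upper_limit), frequency) pairs.
    Values outside the covered range are ignored.
    """
    counts = [0] * n_classes
    for v in values:
        idx = (v - start) // width  # which class this value belongs to
        if 0 <= idx < n_classes:
            counts[idx] += 1
    return [
        ((start + i * width, start + (i + 1) * width - 1), counts[i])
        for i in range(n_classes)
    ]


# Assumed classes: 20-29, 30-39, ..., 70-79.
table = frequency_distribution(study_minutes, start=20, width=10, n_classes=6)
for (lower, upper), f in table:
    print(f"{lower}-{upper}: {f}")
```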
Alright. So there's nothing new here. So all we're gonna have to do is take this data and then turn it into a graph. In order to do that, I'm gonna need the x and y axes.
The classes or bins will go on the horizontal axis. If you use the class limits, like 20 to 29 and 30 to 39, you're just going to cram this x axis over here. A lot of times what you'll see is these things written as class midpoints. Remember, you can always calculate this by adding the lower and upper class limits and dividing by two, so for 20 to 29 that's 49 divided by 2, or 24.5, and we've already figured out what those numbers are.
The class midpoints are going to go here on the x axis. It's going to be 24.5, and then you're just going to write all the rest of them. What about the frequencies? That's just gonna go on the vertical axis. The frequencies over here, that's your f, are going to go on the y axis.
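The midpoint calculation can be sketched in a few lines. Only the 20 to 29 and 30 to 39 classes are named in this lesson, so the remaining class limits below are an assumption that the classes keep going in steps of ten, and `class_midpoint` is just an illustrative name.

```python
# Class limits as (lower, upper) pairs. Only 20-29 and 30-39 are stated in
# the lesson; the rest are assumed to continue in steps of ten.
class_limits = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]


def class_midpoint(lower, upper):
    """Midpoint of a class: add the two limits and divide by two."""
    return (lower + upper) / 2


midpoints = [class_midpoint(lower, upper) for lower, upper in class_limits]
print(midpoints)
```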
So this is going to be your frequency. You're just going to start with zero, one, two, three, four, and five. Now that you have your axes labeled, the next thing you have to do is just draw a bunch of bars that correspond to this data over here.
The 24.5 class is gonna have a height of one because that's its frequency. These bars are supposed to touch. So the next one, which is 34.5, is going to have a frequency of two. The next one's going to have a frequency of four. The next one's going to be 54.5, which has a height of three, so it's going to look something like this. The next one is going to have a height of two. And finally, the last one is going to be one.
So this is going to be essentially what your histogram looks like. You can shade in these bars if you really want to. You don't have to, but this is essentially what this histogram is. So clearly, we can see that there's sort of a pattern emerging with the data.
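If you'd rather see the bars come out of the numbers directly, here's a rough text-only sketch of the same drawing steps. The frequencies are the ones read off above (one, two, four, three, two, one). The midpoints 44.5, 64.5, and 74.5 are assumed from the pattern, since only 24.5, 34.5, and 54.5 are said out loud, and `render_histogram` is a made-up helper, not a library function.

```python
midpoints = [24.5, 34.5, 44.5, 54.5, 64.5, 74.5]  # 44.5, 64.5, 74.5 assumed
frequencies = [1, 2, 4, 3, 2, 1]


def render_histogram(midpoints, frequencies, width=6):
    """Build a text histogram with vertical bars that touch each other."""
    rows = []
    # Work from the tallest frequency down to 1, one text row per level.
    for level in range(max(frequencies), 0, -1):
        cells = [
            "#" * width if f >= level else " " * width
            for f in frequencies
        ]
        rows.append(f"{level:>2} |" + "".join(cells))
    # The horizontal axis sits at frequency zero.
    rows.append("   +" + "-" * (width * len(frequencies)))
    # Class midpoints, centered under each bar.
    rows.append("    " + "".join(f"{m:^{width}}" for m in midpoints))
    return "\n".join(rows)


print(render_histogram(midpoints, frequencies))
```

Each bar is the same width with no gap between neighbors, which is the "bars are supposed to touch" rule from the drawing.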
And you may have been able to tell this with some of the numbers here, but oftentimes with larger data, it's going to be harder to tell very quickly that there are different patterns or trends going on. Alright. So now let's take a look at our problem here. Is this distribution normal? Is it skewed? Is it uniform? Or is it none of these? So I want to talk about the different shapes of distributions that you're going to see very often, and there are basically four of them.
Histograms have four common distribution shapes. The first one, which we're going to talk about a lot, is a normal distribution. It starts really low, peaks towards the middle, and then drops off again. In other words, it's a bell shape, and it's basically symmetrical. An example of this would be something like test scores, where some people score poorly, some people score very well, but most people are usually somewhere in the middle. The next two are called skewed.
The first one is skewed right. And this is always confusing to me because when I think of skewed right, I think it's going to peak to the right, but it's actually the opposite. Data peaks to the left and it trails off to the right. That's what skewed right means.
An example of this is like annual incomes. Most people earn something like 50,000 or 100,000. But there are a few people who earn like a million, and that pushes that data way off to the right side there. So it gets skewed. The opposite of that is skewed left, which is the reverse.
The data peaks to the right and trails off to the left. An example of this is like life expectancies. Most people live until their later years in life, and fewer people die young. The last one is basically a uniform distribution. This is where there's no clear winner. The classes have equal, or roughly equal, frequencies. An example of this is like a dice roll. So if you roll a bunch of dice, the faces are one, two, three, four, five, six, and they're all equally probable, so they're going to form a uniform distribution.
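One way to sanity-check the direction of a skew without drawing anything is a common rule of thumb that isn't covered in this lesson: a long tail drags the mean toward it, so a mean above the median hints at skewed right and a mean below the median hints at skewed left. Here's a small sketch of that idea. All the numbers are invented for illustration, `skew_direction` is a hypothetical helper, and the rule can fail on unusual data, so treat it as a hint and not a test.

```python
from statistics import mean, median


def skew_direction(values):
    """Rule-of-thumb skew check: compare the mean with the median.

    "roughly symmetric" covers both bell-shaped and uniform data, so this
    does not tell a normal distribution apart from a uniform one.
    """
    m, md = mean(values), median(values)
    if m > md:
        return "skewed right"  # a few large values pull the mean up
    if m < md:
        return "skewed left"   # a few small values pull the mean down
    return "roughly symmetric"


# Invented example values, not real data.
incomes = [40_000, 45_000, 50_000, 55_000, 60_000, 1_000_000]
lifespans = [30, 70, 75, 80, 82, 85]
die_faces = [1, 2, 3, 4, 5, 6]

print(skew_direction(incomes))
print(skew_direction(lifespans))
print(skew_direction(die_faces))
```

With real measurements the mean and median will almost never be exactly equal, so in practice you'd want some tolerance before calling something symmetric.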
Going back