2. Describing Data with Tables and Graphs

Histograms: Videos & Practice Problems

Topic summary

Histograms are graphical representations of frequency distributions for quantitative data, using vertical bars to display frequencies across different classes or bins. To create a histogram, label the x-axis with class midpoints and the y-axis with frequencies. Common distribution shapes include normal, skewed right, skewed left, and uniform. A normal distribution appears bell-shaped, while skewed distributions trail off in one direction. Understanding these concepts is essential for analyzing data patterns and trends effectively.

Intro to Histograms

Video transcript

So in earlier videos, we talked a lot about the different charts and graphs we use to visualize data, and we said that a histogram is one of the ways to visualize quantitative data. In more recent videos, we've talked a lot about frequency distributions. These are tables that organize frequencies across different classes of numbers or labels, usually numbers. The problem with these tables, though, is that they're kind of boring.

They're just a bunch of numbers in columns, and it's hard to see the patterns and trends you'll need to identify in the data. That's exactly why we use histograms. In this video, I'm going to show you how to create a histogram from a dataset: we'll take this table, turn it into a graph made of bars, and label the axes with the numbers from the table.

There are also a couple of important definitions you'll need to know when it comes to patterns and trends. Let's go ahead and get started. Remember that a histogram is essentially just a bar graph, or bar chart, for quantitative data. We use vertical bars to graph frequencies (that's little f) across different classes.

In other words, a histogram is just a graphical representation of a frequency distribution. We're going to take this table and turn it into a graph. How do we do that? Well, a graph has two axes: the horizontal x-axis and the vertical y-axis. First, let's take a look at our dataset.

We have data on students studying for an exam, showing the time each one studied in minutes. We've actually seen this exact dataset before. Before you start a histogram, you should always build a frequency distribution; I'm assuming you already know how to do this, and we've already built this exact frequency distribution in an earlier video.
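The tallying step that produces a frequency distribution can be sketched in a few lines of Python. This is a minimal sketch, assuming whole-number data and equal-width classes written like 20 to 29, 30 to 39, and so on; the function name `frequency_distribution` and the sample times are hypothetical and are not the dataset from the video.

```python
def frequency_distribution(data, first_lower, width, num_classes):
    """Tally whole-number values into equal-width classes.

    Class k covers first_lower + k*width through first_lower + (k+1)*width - 1,
    e.g. 20-29, 30-39, ... when first_lower=20 and width=10.
    """
    freqs = [0] * num_classes
    for x in data:
        k = (x - first_lower) // width  # index of the class that contains x
        if not 0 <= k < num_classes:
            raise ValueError(f"{x} falls outside the classes")
        freqs[k] += 1
    return freqs


# Hypothetical study times in minutes (NOT the dataset from the video).
times = [25, 31, 38, 42, 45, 47, 55, 58, 61]
for k, f in enumerate(frequency_distribution(times, 20, 10, 6)):
    lower = 20 + 10 * k
    print(f"{lower}-{lower + 9}: {f}")
```

The frequencies always add up to the number of data values, which is a quick check that no value was missed or counted twice.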

Alright, so there's nothing new here. All we have to do is turn this table into a graph, and to do that, we need the x- and y-axes.

The classes, or bins, go on the horizontal axis. If you write out the class limits, like 20 to 29 and 30 to 39, you'll cram a lot onto the x-axis, so what you'll often see instead is each class written as its class midpoint. Remember, you can always calculate a midpoint by adding the lower and upper class limits and dividing by two, and we've already figured out what those numbers are.
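The midpoint calculation is simple arithmetic, sketched below. It assumes the classes continue in steps of 10 from 20 to 29 up through 70 to 79, which matches the six bars described in this video; the helper name `class_midpoints` is hypothetical.

```python
def class_midpoints(limits):
    """Return (lower + upper) / 2 for each (lower, upper) class-limit pair."""
    return [(lower + upper) / 2 for lower, upper in limits]


# Assumed class limits: 20-29, 30-39, ..., 70-79.
limits = [(lower, lower + 9) for lower in range(20, 80, 10)]
print(class_midpoints(limits))  # [24.5, 34.5, 44.5, 54.5, 64.5, 74.5]
```

Because every class has the same width, consecutive midpoints are exactly one class width (10) apart.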

The class midpoints go on the x-axis: 24.5 for the first class, then 34.5, 44.5, and so on for the rest. What about the frequencies? Those go on the vertical axis: the f column of the table becomes the y-axis.

So the y-axis is your frequency. Start at zero and count up: one, two, three, four, and five. Now that the axes are labeled, all that's left is to draw bars that correspond to the data in the table.

The 24.5 class gets a bar with a height of one, because that's its frequency. These bars are supposed to touch. The next one, 34.5, has a frequency of two, and the one after that has a frequency of four. The next one, 54.5, has a height of three, so it looks something like this. The next one has a height of two, and finally, the last one has a height of one.

So this is essentially what your histogram looks like. You can shade in the bars if you want to, but you don't have to. Now we can clearly see a pattern emerging in the data.
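To see how the bars stack up without graphing software, here is a minimal text-mode sketch. It assumes the six midpoints run from 24.5 to 74.5 in steps of 10 and uses the frequencies read off in the transcript (1, 2, 4, 3, 2, 1); `text_histogram` is a hypothetical helper. Each bar is a solid block the full width of its column, so adjacent bars touch, as they should in a histogram.

```python
def text_histogram(midpoints, freqs, col_width=6):
    """Draw vertical, touching bars with '#'; returns the rows top to bottom."""
    rows = []
    # One row per frequency level, from the tallest bar down to 1.
    for level in range(max(freqs), 0, -1):
        cells = ("#" * col_width if f >= level else " " * col_width for f in freqs)
        rows.append(f"{level:>2} |" + "".join(cells))
    # Baseline at frequency 0, then the class midpoints along the x-axis.
    rows.append(f"{0:>2} +" + "-" * (col_width * len(freqs)))
    rows.append("    " + "".join(f"{m:^{col_width}}" for m in midpoints))
    return rows


midpoints = [24.5, 34.5, 44.5, 54.5, 64.5, 74.5]  # assumed x-axis labels
freqs = [1, 2, 4, 3, 2, 1]                        # frequencies from the transcript
print("\n".join(text_histogram(midpoints, freqs)))
```

With matplotlib installed, `plt.bar(midpoints, freqs, width=10, edgecolor="black")` draws the same picture; setting the bar width equal to the class width is what makes neighboring bars touch.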

You might have been able to tell this from the numbers alone, but with larger datasets it's often much harder to spot patterns or trends quickly. Alright, now let's look at our problem. Is this distribution normal? Is it skewed? Is it uniform? Or is it none of these? To answer that, I want to talk about the distribution shapes you'll see most often, and there are basically four of them.

The first one, which we're going to talk about a lot, is the normal distribution. It starts low, peaks toward the middle, and then drops off again. In other words, it's bell-shaped and basically symmetrical. An example would be test scores: some people score poorly and some score very well, but most people land somewhere in the middle. The next shape is called skewed.

The first kind is skewed right. This always confuses me, because when I hear "skewed right," I think the data will peak on the right, but it's actually the opposite: the data peak on the left and trail off to the right. That's what skewed right means.

An example is annual income. Most people earn something in the range of 50,000 to 100,000, but a few people earn something like a million, and those values stretch the data way out to the right, so the distribution is skewed right. The opposite is skewed left.

Here the data peak on the right and trail off to the left. An example is life expectancy: most people live into their later years and relatively few die young, so the tail stretches to the left. The last shape is the uniform distribution, where there's no clear winner: the classes have equal, or roughly equal, frequencies. An example is rolling a die. The faces one through six are all equally likely, so if you roll a fair die many times, the frequencies form a roughly uniform distribution.
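The four shapes can also be checked with a rough rule of thumb. This is a hypothetical heuristic written for illustration, not a formal statistical test: `describe_shape` calls a histogram uniform when every class is within 20% of the tallest one, requires a single peak for the other three labels, and chooses between skewed right, bell-shaped, and skewed left by where that peak sits. The 20% tolerance and the margin of one sixth of the classes around the center are arbitrary assumptions.

```python
def describe_shape(freqs, tol=0.2):
    """Rough label for a histogram's shape from its class frequencies."""
    top = max(freqs)
    # Uniform: no class falls more than tol below the tallest class.
    if min(freqs) >= (1 - tol) * top:
        return "uniform"
    peak = freqs.index(top)  # first class with the highest frequency
    # Single peak: frequencies never drop before the peak or rise after it.
    rising = all(a <= b for a, b in zip(freqs[:peak], freqs[1:peak + 1]))
    falling = all(a >= b for a, b in zip(freqs[peak:], freqs[peak + 1:]))
    if not (rising and falling):
        return "none of these"
    center = (len(freqs) - 1) / 2
    margin = len(freqs) / 6
    if peak < center - margin:
        return "skewed right"  # peak on the left, tail trails to the right
    if peak > center + margin:
        return "skewed left"   # peak on the right, tail trails to the left
    return "roughly bell-shaped (normal)"


print(describe_shape([1, 2, 4, 3, 2, 1]))        # frequencies from the transcript
print(describe_shape([5, 4, 2, 1, 1, 0]))        # made-up right-skewed counts
print(describe_shape([10, 9, 11, 10, 10, 10]))   # made-up die-roll counts
```

A real analysis would look at the picture and, for normality, use dedicated methods; the point here is only that each shape is defined by where the peak sits and which way the tail trails.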

Problem

Use the frequency histogram below to determine (a) the number of classes and (b) the class width.

(a) 5 classes; (b) 2

(a) 5 classes; (b) 3

(a) 4 classes; (b) 3

(a) 14 classes; (b) 2
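Reading a histogram works in reverse: the number of classes is the number of bars, and for equal-width classes the class width is the distance between neighboring midpoints (or neighboring lower class limits). The sketch below applies this to the six midpoints assumed for the study-time example in the video, not to the practice problem's histogram, which is not reproduced here; `classes_and_width` is a hypothetical helper.

```python
def classes_and_width(midpoints):
    """Return (number of classes, class width) from equal-width class midpoints."""
    # Round away tiny floating-point noise before comparing the gaps.
    gaps = {round(b - a, 9) for a, b in zip(midpoints, midpoints[1:])}
    if len(gaps) != 1:
        raise ValueError("need two or more equal-width classes")
    return len(midpoints), gaps.pop()


print(classes_and_width([24.5, 34.5, 44.5, 54.5, 64.5, 74.5]))  # (6, 10.0)
```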

Here’s what students ask on this topic:

What is a histogram and how is it used in data analysis?

A histogram is a type of bar graph used to represent the frequency distribution of quantitative data. It displays data using vertical bars, where the height of each bar corresponds to the frequency of data within a specific range or class. The x-axis represents the class midpoints, while the y-axis shows the frequency. Histograms are essential in data analysis as they help visualize the distribution of data, identify patterns, and understand the shape of the data distribution. Common shapes include normal (bell-shaped), skewed right, skewed left, and uniform. Recognizing these shapes aids in statistical analysis and decision-making by providing insights into data trends and variability.

Created using AI

How do you create a histogram from a frequency distribution?

To create a histogram from a frequency distribution, start by labeling the x-axis with class midpoints, which are calculated by averaging the upper and lower class limits. The y-axis should be labeled with frequencies. Plot vertical bars for each class, where the height of each bar corresponds to the frequency of that class. Ensure the bars touch each other to represent continuous data. This graphical representation helps visualize the distribution of data, making it easier to identify patterns and trends. Histograms are particularly useful for large datasets, where patterns may not be immediately apparent from the frequency distribution table alone.

What are the common shapes of histograms and what do they indicate?

Histograms commonly exhibit four shapes: normal, skewed right, skewed left, and uniform. A normal distribution is bell-shaped, indicating that the data are symmetrically distributed around the mean; test scores are a common example. A skewed right distribution peaks on the left and trails off to the right, as in income data, where most values are moderate and a few are very high. A skewed left distribution peaks on the right and trails off to the left, as in life expectancy data. A uniform distribution has equal, or roughly equal, frequencies across classes, as with repeated dice rolls, so no class stands out. Understanding these shapes helps in analyzing data patterns and making informed decisions.

Why are histograms preferred over frequency distribution tables for visualizing data?

Histograms are preferred over frequency distribution tables for visualizing data because they provide a clear and immediate visual representation of data distribution. While frequency tables list numbers and frequencies, histograms use bars to depict these frequencies, making it easier to identify patterns, trends, and outliers. The visual format of histograms allows for quick assessment of data shape, such as normal, skewed, or uniform distributions, which is crucial for statistical analysis and decision-making. This graphical approach is particularly beneficial for large datasets, where patterns may not be easily discernible from tables alone.

How can you determine if a histogram represents a normal distribution?

To determine if a histogram represents a normal distribution, look for a bell-shaped curve where data is symmetrically distributed around the mean. In a normal distribution, the histogram will start with low frequencies, peak towards the middle, and then decrease symmetrically on both sides. The highest frequency should be near the center, with frequencies tapering off evenly towards the tails. This shape indicates that most data points are clustered around the mean, with fewer occurrences as you move away from the center. Recognizing a normal distribution is important for statistical analysis, as many statistical tests assume normality.

Your Statistics for Business tutor

Patrick Ford

Physics and Math Lead Instructor