3. Describing Data Numerically

Mean

Mean: Videos & Practice Problems

Topic summary

The mean, or average, is calculated by summing all values in a dataset and dividing by the total number of values. For example, for the dataset {5, 10, 12, 14, 3}, the mean is $(5 + 10 + 12 + 14 + 3) / 5$ = 8.8. The mean serves as a measure of central tendency, summarizing the dataset with a single value. Extreme values, or outliers, can significantly affect the mean, as seen when adding a value like 76, shifting the mean to 20. The notation for the mean includes x̄ for samples and μ for populations.

concept

Calculating the Mean

Video transcript

In the last couple of videos, we talked a lot about how we can visualize different data using various charts and graphs. However, often when we interpret data, we'll have to do it more numerically. So we're going to shift focus in this video. I'm going to start talking about how we can calculate certain important variables from the datasets. The first one we'll talk about is called the mean.

And luckily, that's a word you've probably heard and seen before. In this video, I'm going to show you how to calculate the mean from datasets. We'll discuss some different notations that we need to know, some important conceptual information, and then we'll just do some examples. Let's get started here. The mean, which you've probably heard of at some point in math class, maybe even in grade school, is basically just the average of a dataset.

All right? And when you take an average of a set of numbers, all you're going to do is add up all those values, and then you're going to divide by the total number of values. Alright? So let's just go ahead and look at this example here. Let's say I have a sample of numbers. I've got 5, 10, 12, 14, and 3. I've got five numbers in this dataset. And to calculate a mean or an average (we're going to use the word mean here), I'm just going to take all those numbers and add them together first. So 5 plus 10 plus 12 plus 14 plus 3. Then I'm going to divide it by the total number of values that I have, which in this case is five.

Alright? So what happens here is when you plug this into your calculator, make sure you do that top part in parentheses because of the order of operations, but you should get a number that's 44 divided by five. Then when you calculate that, you're just going to get 8.8. Alright? So in other words, this dataset here, whatever it represents, has a mean of 8.8.
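As a minimal sketch of the calculator step just described, here is the same arithmetic in Python. Python is an assumption made for illustration (the video uses a calculator), and the variable names are made up for this example. It shows why the parentheses matter: without them, only the last value gets divided by five.

```python
# The sample from the video: five values.
data = [5, 10, 12, 14, 3]

# Without parentheses, division happens first, so only the 3 is divided by 5.
without_parens = 5 + 10 + 12 + 14 + 3 / 5

# With parentheses, the whole sum (44) is divided by 5.
with_parens = (5 + 10 + 12 + 14 + 3) / 5

# The same idea written generally: sum of the values over the number of values.
mean = sum(data) / len(data)

print(round(without_parens, 1))  # 41.6, not the mean
print(with_parens)               # 8.8
print(mean)                      # 8.8
```

Writing the calculation as `sum(data) / len(data)` avoids the parentheses problem entirely, because the sum is finished before the division starts.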

Alright? That's how you calculate that. Now a lot of times in math, we're going to take these complicated lists of instructions, and we'll turn them into shorter equations with symbols. So I want to talk to you about what the mean is. The mean, when you see it, you're going to see this sort of symbol here, x with a little bar on top of it, and we just call it x bar.

And basically, the equation here mathematically can be represented as $\bar{x} = \frac{\sum_i x_i}{n}$. This expression simply means the summation of all x values divided by n, the total number of values.

We've seen that before. Alright. So that's what that equation means. So this x bar here, when we calculated this, was just a mean of 8.8. Alright.

So what does that mean? Well, the mean is what we call a measure of center or measure of central tendency. And basically, it's a fancy way of saying it summarizes a dataset in one central value. We have numbers in the sample that range from three all the way to fourteen. That's where my min and my max are, and I calculated a number of 8.8, which is, more or less, in the center or in the middle. That's what a mean as a measure of center means. Alright? So that's really it for this first example. Let's go ahead and move on to our second one over here.

So what I want you to do is imagine that this sample of data over here is actually part of a larger population. So now we actually have an extra number in the mix here. We've added this seventy-six. But ultimately, whether you're dealing with a population or a sample, the mean is always calculated the same way. You're just going to add up everything and divide by the total number.

So we've got five plus ten plus twelve plus fourteen plus three plus seventy-six. Again, we're going to put that all in parentheses. And now when you divide by the total number, what do you think you're dividing by? Is it five? Well, be careful here because we've added another number into the mix. There's actually six data values here. So one of the things that you might see is you might see little n's with samples and you might see big N's with populations. But ultimately, you calculate the mean the exact same way. So you're just going to divide by six over here. This ends up being 120 over six, which when you calculate the mean is going to be 20.

Alright? So just some notation here, whenever you see a population, you may see some different symbols attached to this. You may see 'mu' instead of x bar, and then you may see big N instead of little n. So basically, if you see this equation here where mu is equal to Sigma X over big N, don't freak out because all that's happening here is you're just calculating the mean. You don't really need to know when to use one versus the other.

And if you're ever unsure, x bar is probably your safest bet here. Alright? So just wanted to let you know that. Okay. Cool.

So let's talk about these means here for a second. When we calculated the sample mean, we got 8.8. Then when we threw in this extra number of seventy-six, we got a mean of 20. So what's going on there? Basically, what happens is while the mean uses all the data values, any extreme values that you have, any outliers like the seventy-six over here, are going to significantly change your mean.

Alright? And we threw in the seventy-six. That's a number that's so big relative to the other numbers that it kind of shifts the mean, and you still end up with a number that's sort of in between three and seventy-six, but that seventy-six has shifted the mean by a lot. And now you have a mean that's twenty instead of 8.8. So that's something you'll have to be aware of.
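To see that shift concretely, here is a small Python sketch using the same two datasets from the video. The helper name `mean` and the variable names are illustrative assumptions, not anything from the course itself.

```python
def mean(values):
    """Add up all the values and divide by how many there are."""
    return sum(values) / len(values)

sample = [5, 10, 12, 14, 3]
with_outlier = sample + [76]  # the same values plus the extreme value

print(mean(sample))        # 8.8
print(mean(with_outlier))  # 20.0

# Count how many values sit below the new mean.
below = [x for x in with_outlier if x < mean(with_outlier)]
print(len(below))  # 5
```

Five of the six values end up below the new mean of 20, which is one way to see how far a single outlier can pull it.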

Alright? So that's it. That's it for the introduction to how to calculate the mean. Let's go ahead and take a look at some practice problems.

Problem

A retail store sells different models of smartphone (prices shown below). Find the mean price.

$847.83

$820.00

$799.00

$809.50

example

Calculating the Mean Example 1

Video transcript

Welcome back, everyone. Let's go ahead and work this out together here. So the data below shows heart rates in terms of beats per minute from a sample of adult males and females. So the question we're trying to ask is, does there appear to be a difference in mean heart rates between these two samples? So a lot of times in these sorts of problems, we're going to have to first figure out what to calculate and then sort of interpret that.

So we're going to calculate the mean heart rates for both of these samples, then we're going to have to compare the two. Alright? That's what's going on here. Again, to calculate sample means, we're going to use this equation. So add up all the values and divide by the total number.

In this case, what we have here is that n equals 10 for both of these samples. In order to calculate the average for males, or the mean for males, I want to use all of the data in this row here, which is 10 numbers. And then for females, I want to use all the data in this row, which is, again, 10 numbers. So don't confuse the two and just try to divide by 20, because each sample mean uses only the numbers from its own row: 10 for males and 10 for females. Alright?

So what I'm going to do here is I'm going to put a little column over here where we're going to calculate the x bars for both of these things. So let's go ahead and start with the first one, the x bar for males. We're going to use all of the top row information here. Remember, I have to add up everything and divide by the total number. So this is going to be $(84 + 70 + 68 + 59 + 61 + 77 + 90 + 65 + 56 + 72) / 10$. Alright? So remember, when you do all this in one step, you should really enclose the sum in parentheses, otherwise you'll get the wrong answer.

And you just divide it by 10. So in other words, the sample mean over here for the heart rate for males is going to turn out to be 70.2. Alright? So that's 70.2 over here. I'm going to put that over here.

Okay. Let's do the same exact thing, but for females now. So we're going to have x bar for females. We're going to use all the blue information, and this is going to be $(80 + 73 + 88 + 91 + 69 + 85 + 91 + 81 + 79 + 77) / 10$. Alright?

Again, put the whole sum in parentheses and divide by 10. What you should get over here is 81.4. So this is going to be the sample mean heart rate for females. So we have 70.2 versus 81.4. And by the way, this is pretty accurate when you actually compare these mean heart rates.

And basically, what happens here is we can clearly tell that the average or mean heart rate for females is higher than it is for males. This is just a biological difference, but it's actually true. So in other words, our question is, does there appear to be a difference in mean heart rates? And yes, there is. And the way we write this is that x bar for females is greater than the x bar for males.
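The comparison can be sketched in a few lines of Python. The two lists hold the heart-rate values read out in the transcript, and the variable names are assumptions chosen for this illustration.

```python
# Heart rates in beats per minute, 10 values per sample.
males = [84, 70, 68, 59, 61, 77, 90, 65, 56, 72]
females = [80, 73, 88, 91, 69, 85, 91, 81, 79, 77]

# Each sample mean uses only its own row, so each divides by 10, not 20.
x_bar_males = sum(males) / len(males)
x_bar_females = sum(females) / len(females)

print(x_bar_males)                  # 70.2
print(x_bar_females)                # 81.4
print(x_bar_females > x_bar_males)  # True
```

Keeping the two samples in separate lists makes it hard to divide by the wrong count by accident, since `len` is taken from the same list that was summed.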

All right? So again, all these things involve a calculation, and then you have to interpret that data. And that's basically what it's telling us over here. So this would be the answer to your question. Alright?

Thanks for watching. Let me know if that makes sense. And let's move on.

Here’s what students ask on this topic:

What is the formula for calculating the mean of a dataset?

The formula for calculating the mean, or average, of a dataset is to sum all the values and then divide by the total number of values. In mathematical terms, the mean is represented as $\bar{x}$ for a sample, and the formula is $\frac{\sum x}{n}$, where $\sum$ denotes the sum of all data points $x$, and $n$ is the number of data points. For a population, the mean is denoted by $\mu$ and the formula is $\frac{\sum x}{N}$, where $N$ is the total number of values in the population.

Created using AI

How does an outlier affect the mean of a dataset?

An outlier is a data point that is significantly different from other values in a dataset. When an outlier is present, it can have a substantial impact on the mean, as the mean takes into account all values in the dataset. For example, if you have a dataset {5, 10, 12, 14, 3} with a mean of 8.8, adding an outlier like 76 will shift the mean to 20. This is because the mean is sensitive to extreme values, which can skew the central tendency, making it less representative of the dataset as a whole. Therefore, it's important to consider the presence of outliers when interpreting the mean.

Created using AI

What is the difference between the sample mean and the population mean?

The sample mean and the population mean are both measures of central tendency, but they are used in different contexts. The sample mean, denoted as $\bar{x}$, is calculated from a subset of the entire population and is used to estimate the population mean. The formula for the sample mean is $\frac{\sum x}{n}$, where $n$ is the number of observations in the sample. The population mean, denoted by $\mu$, is the average of all values in the entire population, calculated using the formula $\frac{\sum x}{N}$, where $N$ is the total number of values in the population. The sample mean is often used when it is impractical to collect data from the entire population.

Created using AI

Why is the mean considered a measure of central tendency?

The mean is considered a measure of central tendency because it provides a single value that summarizes the center of a dataset. It is calculated by averaging all the data points, which gives an indication of where the values are centered. For example, in a dataset {5, 10, 12, 14, 3}, the mean is 8.8, which is a central value around which the data points are distributed. The mean is useful for understanding the overall trend of the data and is often used in statistical analysis to compare different datasets. However, it is important to note that the mean can be affected by outliers, which can skew the central value.

Created using AI

How do you calculate the mean if you have a large dataset?

To calculate the mean of a large dataset, you follow the same basic steps as with a smaller dataset. First, sum all the values in the dataset. Then, divide this sum by the total number of values. For example, if the sum of the values is $\sum x$ and the number of values is $n$, the mean is calculated using the formula $\frac{\sum x}{n}$. In practice, for large datasets, you might use statistical software or a calculator to handle the arithmetic efficiently. This approach ensures accuracy and saves time, especially when dealing with thousands or millions of data points.
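As one possible illustration of using software for a large dataset, here is a Python sketch. The choice of Python, the helper name `running_mean`, and the stand-in dataset (the whole numbers from 1 to 1,000,000) are all assumptions made for this example. The helper keeps only a running total and a count, so the values can be read one at a time; `statistics.fmean` from Python's standard library is shown alongside it as a ready-made alternative.

```python
from statistics import fmean

def running_mean(values):
    """Compute the mean in one pass, tracking only a total and a count."""
    total = 0
    count = 0
    for value in values:
        total += value
        count += 1
    if count == 0:
        raise ValueError("the mean of an empty dataset is undefined")
    return total / count

# Stand-in for a large dataset: one million values.
big = range(1, 1_000_001)

print(running_mean(big))  # 500000.5
print(fmean(big))         # 500000.5
```

Both approaches follow the same two steps as the small example: sum everything, then divide by the number of values.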

Created using AI

Your Statistics for Business tutor

Patrick Ford

Physics and Math Lead Instructor