4 & 5. Statistics, Quality Assurance and Calibration Methods

Detection of Gross Errors


Topic summary

Created using AI

Grubbs test and Q test are methods for identifying outliers in normally distributed data sets. Grubbs test calculates $g = \frac{| x - μ |}{σ}$, where $x$ is the suspected outlier, $μ$ is the mean, and $σ$ is the standard deviation, and compares it to a critical value $g_c$. If $g > g_c$, the outlier is discarded. The Q test, suitable for small data sets, calculates $Q = \frac{\text{gap}}{\text{range}}$ and compares it to a critical value in the same way. Both tests help maintain data integrity by identifying significant deviations.

When working with data sets, it is important to identify and remove outliers so that the mean and standard deviation reflect the rest of the data.

Grubbs Test vs. Q Test

concept


Video transcript

The two tests on this page are methods we can use to determine whether a value in a given dataset should be ignored. We'll look first at Grubbs' test. Grubbs' test is used to detect a single outlier in a single-variable dataset that follows an approximately normal distribution. In Grubbs' test, we first have to calculate our g calculated: the questionable value (our potential outlier) minus the mean, in absolute-value brackets, divided by the standard deviation.

Now, we compare our g calculated to our g table. The table lists the number of observations alongside the g table value, sometimes called g critical, at a particular confidence level: 90%, 95%, or 99%. If our g table value is less than our g calculated, the outlier needs to be discarded, and we need to recalculate the mean and the standard deviation from the remaining data. If instead g table is greater than g calculated, the suspected outlier is fine at that level of confidence.

So, we can retain it and hold on to our mean and our standard deviation. The Q test is another method, one that's talked about less often. It's another way of finding outliers in very small, normally distributed datasets. Here, the number of measurements is normally between 3 and 7 values. It can exceed that, but the Q test is usually reserved for very few measurements.

Now here, we're going to say q calculated equals gap divided by range: $Q = \frac{\text{gap}}{\text{range}}$. What does that mean? Well, your gap is $| x_{1} - x_{n+1} |$, in absolute-value brackets. $x_{1}$ is just the suspected outlier that we're looking at. We're trying to determine if this is the number we need to ignore.

Then $x_{n+1}$ is the next closest data point. That's the measurement that's closest to that outlier. Then range, your range is just your largest value minus your smallest value in your dataset. For the Q test, you need to take all your measurements and organize them from smallest to largest value. Then your range is just that largest value minus your smallest value.

We'll see how to utilize this later on as we do a question on the q test. Now just like the Grubbs' test, we compare our q calculated in this case to our q table. Again, we have a number of measurements which you can compare to different levels of confidence. Here, again, if your table value is lower than your calculated value, in this case q, we disregard that value. It is an outlier and it cannot be included with our data measurements.

If your q table or your q critical value happens to be greater than your q calculated, then we can hold on to that suspected outlier and say that it does belong with the other measurements. Again, Grubbs' test is the more commonly used test to find the outlier. Q test is normally not discussed as much and it's usually reserved for very small amounts of measurements. Just remember these 2 different types of tests that are great at finding an outlier within a given dataset.

example

Q Test

Video transcript

So here it says: wishing to measure the amount of caffeine in a cup of coffee, you pour 10 cups. From the data provided, perform a Q test to determine if the outlier can be retained or disregarded. All right. What we need to do here is organize this list of measurements from smallest to largest. We can see that the smallest number is 72.

Then the next smallest number looks like it is 77, then 78, then 78 again, then 79, then 81, 81 again, 82 twice, and then finally 83. I've organized it from smallest measurement to largest measurement. Remember, that's important because that'll give us the range that we need. Now, we're going to figure out our q calculated. Remember, our q calculated will be the outlier that we are investigating minus the number that's closest to it in absolute brackets divided by our range.

Remember, that's gap divided by range. Alright. So the number that we're looking at is the outlier, the one that's farthest from everyone else.

So let's see. Going through the sorted list pair by pair, the difference between 72 and 77 is 5. The differences between the remaining neighbors are 1, 0, 1, 2, 0, 1, 0, and 1.

The outlier is the one that's farthest from all the other measurements, and we can see that that would be 72. 72 is the farthest from everyone else. It is 5 away from 77, its nearest neighbor. That's the outlier that we're investigating. So q calculated is $|72 - 77|$ divided by the range.

Remember, the range is your largest value minus your smallest value: $83 - 72 = 11$. So q calculated comes out to $5/11 \approx 0.455$. Now, if you take a look at the q table that we have on the previous page, let's say we want to look at it at the 99% confidence level. We want to have 99% confidence in whether this value should be kept or not.

Alright. Looking at the number of measurements we have, we have 10 measurements. Looking at that table, we're looking at 10 measurements and scroll all the way to the right till you get to the 99% confidence value. There, we're going to see that q table or q critical equals 0.568. So comparing our q calculated to our q table, what can we say?

Well, we're going to say that our q table is a larger value than our q calculated. Therefore, you have to retain the value. That value of 72, we keep it at the 99% confidence level that we looked up in our q table. That's all we'd have to do in order to figure out our q calculated and then compare it to our q table. Now that you've seen this one, look to see if you can figure out the example 2 that's on the bottom of the page.

Don't worry if you get stuck. Just come back and see how I approach that same question. Just remember some of the techniques we used here for the q test.
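
The arithmetic in this example can be written out in one line, using only the values quoted above. The subscripts "calc" and "table" are just labels chosen here for the calculated and tabulated values:

$$Q_{\text{calc}} = \frac{|72 - 77|}{83 - 72} = \frac{5}{11} \approx 0.455 < Q_{\text{table}} = 0.568 \quad (n = 10,\ 99\%\ \text{confidence})$$

Because the calculated value is below the table value, 72 is retained.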

example

Grubbs Test

Video transcript

Here it says white blood cells are the defending cells of the human immune system and fight against infectious diseases. Provided below are the normal white blood cell counts for a healthy adult woman. Determine if the current white blood cell count is reasonable by Grubbs' test. For Grubbs' test, we first need to calculate our g calculated. G calculated equals our questionable value, which in this case is today's white blood cell count, minus the mean, in absolute-value brackets, divided by the standard deviation s.

For our mean, we add up each one of these 7 values and divide by 7. When we do that, we get $5.2857 \times 10^{6}$. Our standard deviation equals, remember, the square root of the summation of each measurement minus the mean, squared, divided by n minus 1. Here, we just have to take each measurement, subtract the average, square that, and then add them all up. So it's pretty long and drawn out.

Plus $(4.910 \times 10^{6} - 5.2857 \times 10^{6})^{2}$. As you can see, it's a lot of writing for these numbers. Okay, almost done. Just gotta finish the other 2. So remember, we're inputting all the values, all 7 of them.

Plus, finally, this last one. And then we divide by the number of measurements minus 1, which is 7 minus 1. All of that gives us our standard deviation. Our questionable value is $6.1 \times 10^{6}$, so g calculated is $6.1 \times 10^{6}$ minus our average, in absolute-value brackets, divided by our standard deviation. That gives me a g calculated equal to 2.04595.

So our g calculated is 2.04595. Let's compare it to our g table at 95% confidence. Go to the previous page and look up that value. For 7 measurements at 95% confidence, our g table value is 2.020. We see that our g calculated is larger.

So what does that mean? That means, therefore, I have to disregard that value. This value here is too high, so we have to disregard it. As for why it could be high: since we're dealing with the immune system and white blood cells, it could be that she has a cold and her body is increasing the number of white blood cells in order to combat whatever infectious disease she may have. So we all know, hopefully, how the immune system works.

When we have an infection, our white blood cell count spikes in order to combat that infection. That would explain why her white blood cell count for that day would be a little bit higher than normal. The other values would be roughly what you'd expect her white blood cell count to be on a typical day. By using our Grubbs' test, we can see that, hey, her white blood cell count is higher than usual.

Maybe she's fighting some type of infection. Guys, hopefully you're able to follow along in terms of the Grubbs test. Remember, we have both the Grubbs test and the q test. Both look for the outlier within a given data set to see if it can be disregarded or retained within our calculations. In this example, because we have to disregard that value, that would mean that you'd have to calculate a new standard deviation as well as a new average for this data set.

Here, we're not asked to figure that out. We're just asked to see if we have to ignore the $6.1 \times 10^{6}$, which we found out that we do because g calculated is larger than g table.
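
For reference, the calculation described in this example can be laid out as follows. The symbols $\bar{x}$ (mean) and $s$ (standard deviation) are notation chosen here to match the spoken "mean or average" and "standard deviation s". Only the numbers quoted in the transcript are used, and the individual measurements are not reproduced:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = 5.2857 \times 10^{6}, \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}{n - 1}}, \qquad n = 7$$

$$G_{\text{calc}} = \frac{|6.1 \times 10^{6} - 5.2857 \times 10^{6}|}{s} = 2.04595 > G_{\text{table}} = 2.020 \quad (95\%\ \text{confidence})$$

The transcript does not state $s$ itself. Back-calculating from the quoted numbers gives roughly $s \approx 0.8143 \times 10^{6} / 2.04595 \approx 3.98 \times 10^{5}$, which should be read as an inferred value rather than one given in the lesson.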

Here’s what students ask on this topic:

What is the Grubbs test and how is it used to detect outliers?

The Grubbs test is a statistical method used to identify a single outlier in a normally distributed data set. To perform the Grubbs test, you first calculate the Grubbs statistic (G) using the formula:

$G = \frac{| x - μ |}{σ}$

where $x$ is the questionable value, $μ$ is the mean of the data set, and $σ$ is its standard deviation (the sample standard deviation $s$, computed with $n - 1$ in the denominator). You then compare the calculated G value to a critical value from the Grubbs table, which depends on the number of observations and the desired confidence level. If the calculated G is greater than the critical value, the data point is considered an outlier and should be discarded.
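
As a rough illustration of this procedure, here is a minimal Python sketch. The function names and the data are hypothetical and made up for illustration; the critical value is not computed but passed in by the caller, who is assumed to have looked it up in a Grubbs table (2.020 is the value quoted in this lesson for 7 measurements at 95% confidence).

```python
import statistics


def grubbs_statistic(values, suspect):
    """Return G = |suspect - mean| / s for the given measurements."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return abs(suspect - mean) / s


def grubbs_decision(values, suspect, g_critical):
    """Return (G, discard); discard is True when G exceeds the table value."""
    g = grubbs_statistic(values, suspect)
    return g, g > g_critical


# Hypothetical measurements, invented for this sketch (not from the lesson).
data = [5.0, 5.1, 4.9, 5.2, 5.0, 4.8, 6.5]

# g_critical must come from a Grubbs table for the matching n and confidence.
g, discard = grubbs_decision(data, suspect=6.5, g_critical=2.020)
print(f"G = {g:.3f}, discard = {discard}")
```

If the value is discarded, the mean and standard deviation would then be recalculated from the remaining measurements, as described in the lesson.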

How does the Q test differ from the Grubbs test in detecting outliers?

The Q test is another method for detecting outliers, but it is typically used for small data sets (3 to 7 values) that are normally distributed. The Q test calculates the Q statistic using the formula:

$Q = \frac{| x_{1} - x_{n+1} |}{\text{range}}$

where $x_{1}$ is the suspected outlier, $x_{n+1}$ is the next closest data point, and the range is the difference between the largest and smallest values in the data set. The calculated Q value is then compared to a critical value from the Q table. If the calculated Q is greater than the critical value, the data point is considered an outlier and should be discarded. Unlike the Grubbs test, the Q test is less commonly used and is reserved for very small data sets.

When should you use the Grubbs test versus the Q test?

The choice between the Grubbs test and the Q test depends on the size of your data set. Both tests assume the data follow a normal distribution. The Grubbs test is more commonly used and is suitable for larger data sets; it is designed to detect a single outlier in such data sets. The Q test is typically reserved for very small data sets, usually between 3 and 7 values. If you have a small data set and suspect an outlier, the Q test may be more appropriate. For larger data sets, the Grubbs test is generally preferred.

What are the steps to perform the Q test for outliers?

To perform the Q test for outliers, follow these steps:

1. Organize your data set from smallest to largest value.
2. Identify the suspected outlier ($x_{1}$) and the next closest data point ($x_{n+1}$).
3. Calculate the range of the data set by subtracting the smallest value from the largest value.
4. Calculate the Q statistic using the formula:

$Q = \frac{| x_{1} - x_{n+1} |}{\text{range}}$

5. Compare the calculated Q value to the critical value from the Q table, based on the number of measurements and the desired confidence level.
6. If the calculated Q is greater than the critical value, the suspected outlier should be discarded. If it is less, the data point can be retained.
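
The steps above can be sketched in Python roughly as follows. The function name `q_test` is hypothetical. The sketch assumes the suspected outlier sits at one end of the sorted data and picks whichever end has the larger gap to its neighbor, which is a choice made here rather than something the lesson specifies. The critical value is supplied by the caller from a Q table. The data and the 0.568 critical value are the ones quoted in the caffeine example on this page.

```python
def q_test(values, q_critical):
    """Apply the Q test to the more isolated end of the sorted data.

    Returns (suspect, q, discard).
    """
    data = sorted(values)  # smallest to largest
    if len(data) < 3:
        raise ValueError("the Q test needs at least 3 measurements")

    data_range = data[-1] - data[0]  # largest minus smallest
    if data_range == 0:
        raise ValueError("all measurements are identical")

    # Gap between each end value and its nearest neighbor.
    low_gap = data[1] - data[0]
    high_gap = data[-1] - data[-2]

    # Assumption: test whichever end is farther from its neighbor.
    if low_gap >= high_gap:
        suspect, gap = data[0], low_gap
    else:
        suspect, gap = data[-1], high_gap

    q = gap / data_range
    return suspect, q, q > q_critical


# Caffeine measurements from the worked example, 10 cups.
caffeine = [72, 77, 78, 78, 79, 81, 81, 82, 82, 83]

# 0.568 is the table value the lesson quotes for 10 measurements at 99%.
suspect, q, discard = q_test(caffeine, q_critical=0.568)
print(f"suspect = {suspect}, Q = {q:.3f}, discard = {discard}")
```

Passing the critical value in as an argument keeps the sketch independent of any particular table; a fuller version might look it up from the number of measurements and the confidence level.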

What is the importance of detecting outliers in a data set?

Detecting outliers in a data set is crucial for maintaining data integrity and ensuring accurate statistical analysis. Outliers can significantly skew the results of your analysis, leading to incorrect conclusions. By identifying and removing outliers, you can improve the reliability and validity of your data. Outliers may result from measurement errors, data entry errors, or genuine variability in the data. Identifying these outliers helps in understanding the underlying patterns and trends in the data, leading to more accurate and meaningful interpretations. Methods like the Grubbs test and Q test are essential tools for detecting and handling outliers effectively.

Your Analytical Chemistry tutor

Jules Bruno

General Chemistry, Analytical Chemistry and GOB lead instructor
