4 & 5. Statistics, Quality Assurance and Calibration Methods

Detection of Gross Errors

Topic summary

Created using AI

The Grubbs test and the Q test are methods for identifying outliers in normally distributed data sets. The Grubbs test calculates $g = \frac{| x - μ |}{σ}$, where $x$ is the questionable value, $μ$ is the mean, and $σ$ is the standard deviation of the data set, and compares it to a critical value $g_c$. If $g > g_c$, the outlier is discarded. The Q test, suitable for small data sets, calculates $Q = \frac{gap}{range}$ and compares it to a critical value in the same way. Both tests help maintain data integrity by identifying significant deviations.

When dealing with data sets it becomes important to eliminate outliers in order to have the most accurate standard deviation.

Grubbs Test vs. Q Test

Video transcript

The two tests on this page are methods we can use to determine whether a value within a given dataset should be ignored or not. We'll look first at Grubbs' test. Grubbs' test is used to detect a single outlier in a single-variable dataset that follows a normal distribution. In Grubbs' test, we first have to calculate our g calculated: the questionable value (our potential outlier) minus the mean, or average, in absolute brackets, divided by the standard deviation.

Next, we compare our g calculated to our g table. The table lists the number of observations alongside the g table value, sometimes called g critical, for a particular confidence level: 90%, 95%, or 99%. If the g table value is less than our g calculated, the outlier needs to be discarded. If the table value is greater than g calculated, the suspected outlier is fine; it's within the normal level of confidence, so we can retain it and hold on to our mean and our standard deviation.

The Q test is another method, one that's usually not talked about as much. It's a way of finding outliers in very small, normally distributed datasets. Here, the number of measurements is normally between 3 and 7. It can exceed that, but the Q test is usually reserved for very few measurements. Q calculated equals your gap divided by your range. What does that mean? Your gap is, in absolute brackets, $x_{1}$ minus $x_{n+1}$. $x_{1}$ is the suspected outlier, the number we're trying to decide whether to ignore. $x_{n+1}$ is the next closest data point, the measurement closest to that outlier. The range is just the largest value minus the smallest value in your dataset.

For the Q test, take all your measurements and organize them from smallest to largest value. Your range is then that largest value minus your smallest value. We'll see how to use this later on as we do a question on the Q test. Just like in Grubbs' test, we compare our Q calculated to our Q table. Again, the table lists the number of measurements against different levels of confidence. If your table value is lower than your calculated Q, we disregard the suspected value. It is an outlier and cannot be included with our data measurements. If your Q table, or Q critical, value is greater than your Q calculated, we can hold on to that suspected outlier and say that it does belong with the other measurements. Again, Grubbs' test is the more commonly used test for finding an outlier. The Q test is discussed less often and is usually reserved for very small numbers of measurements. Remember these two tests, which are great at finding an outlier within a given dataset.

example

Q Test

Video transcript

Wishing to measure the amount of caffeine in a cup of coffee, you pour 10 cups. From the data provided, perform a Q test to determine if the outlier can be retained or disregarded. What we need to do here is organize this list of measurements from smallest to largest. We can see that the smallest number is 72. The next smallest number is 77, then 78, then 78 again, then 79, then 81, 81 again, 82 twice, and finally 83. I've organized it from smallest measurement to largest measurement. Remember, that's important because it gives us the range that we need.

Now we're going to figure out our Q calculated. Our Q calculated will be the outlier that we're investigating minus the number that's closest to it, in absolute brackets, divided by our range. So remember, Q calculated is gap divided by range. The number that we're looking at is the outlier, the one that's farthest from everyone else. So let's look at the differences between neighboring values. From 72 to 77 the difference is 5. From 77 to 78 it's 1. From 78 to 78 it's 0. From 78 to 79 it's 1. From 79 to 81 it's 2. From 81 to 81 it's 0. From 81 to 82 it's 1. From 82 to 82 it's 0. From 82 to 83 it's 1. The outlier is the one that's farthest from all the other measurements, and we can see that that would be 72. Its difference is 5 away from 77.

The outlier that we're investigating is 72, and we subtract the value that's closest to it, which is 77, then divide by the range. Remember, the range is your largest value minus your smallest value. Your largest value is 83 and your smallest value is 72. So Q calculated is |72 − 77| divided by (83 − 72), which comes out to 0.455.

Now take a look at the Q table that we have on the previous page. Let's say we want 99% confidence in deciding whether this value should be kept or not. We have 10 measurements, so in that table we go to the row for 10 measurements and scroll all the way to the right until we get to the 99% confidence value. There, we see that Q critical equals 0.568.

So comparing our Q calculated to our Q critical, what can we say? Our Q critical is larger than our Q calculated. Therefore, you have to retain the value. At 99% confidence, based on our Q table, we keep the 72. That's all we have to do: figure out our Q calculated and then compare it to our Q table. Now that you've seen this one, see if you can figure out example 2 at the bottom of the page. Don't worry if you get stuck. Just come back and see how I approach that same question. Just remember the techniques we used here for the Q test.
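The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, the readings are entered in the sorted order read out above, and the critical value is simply the one quoted from the Q table.

```python
# Caffeine readings from the worked example (10 cups).
caffeine = [72, 77, 78, 78, 79, 81, 81, 82, 82, 83]

ordered = sorted(caffeine)

# The suspected outlier is the lowest value; its nearest neighbor is next in line.
suspect, neighbor = ordered[0], ordered[1]

gap = abs(suspect - neighbor)           # |72 - 77| = 5
data_range = ordered[-1] - ordered[0]   # 83 - 72 = 11
q_calc = gap / data_range

# Q critical quoted in the example for 10 measurements at 99% confidence.
q_table = 0.568

decision = "discard" if q_calc > q_table else "retain"
print(f"Q calculated = {q_calc:.3f}")   # prints: Q calculated = 0.455
print(decision)                         # prints: retain
```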

example

Grubbs Test

Video transcript

Here it says white blood cells are the defending cells of the human immune system and fight against infectious diseases. Provided below are the normal white blood cell counts for a healthy adult woman. Determine if the current white blood cell count is reasonable by Grubbs' test. For Grubbs' test, we first need to calculate our g calculated. g calculated equals our questionable value, which in this case is today's white blood cell count, minus the mean or average, in absolute brackets, divided by the standard deviation s.

For our mean or average, we add up each one of these 7 values and divide by 7. When we do that, we get $5.2857 \times 10^{6}$. For our standard deviation, remember, it's the square root of a summation: we take each measurement, subtract the average from it, square the result, and then add all of those squares together. It's pretty long and drawn out.

For example: $(4.910 \times 10^{6} - 5.2857 \times 10^{6})^{2} + \dots + (\text{last value} - 5.2857 \times 10^{6})^{2}$

Once we finish that sum, we divide by the number of measurements minus 1, which is 7 minus 1, and take the square root. That gives us our standard deviation.
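Written out in full, the calculation described here is the usual sample standard deviation. The symbols $x_{i}$, $\bar{x}$, and $n$ are introduced only for this sketch: $x_{i}$ is each measurement, $\bar{x}$ is the mean, and $n$ is the number of measurements.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}}{n - 1}}, \qquad n = 7, \quad \bar{x} = 5.2857 \times 10^{6}$$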

Now plugging that into our formula for g calculated: $\frac{| 6.1 \times 10^{6} - 5.2857 \times 10^{6} |}{s}$

This gives us a g calculated equal to 2.04595. Let's just compare it to our g table at 95% confidence. For 7 measurements under 95% confidence, our g table value is 2.020. We see that our g calculated is larger. So what does that mean? Therefore, I have to disregard that value. The value here is too high, so we have to disregard it.
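As a quick check of the numbers, the comparison can be written out as follows. The individual counts are not listed in this transcript, so the value of $s$ shown is only what the quoted g calculated and the rounded mean imply, not a value computed from the data.

$$\frac{0.8143 \times 10^{6}}{s} = 2.04595 \quad \Rightarrow \quad s \approx 3.98 \times 10^{5}$$

$$g_{calculated} = 2.04595 > g_{table} = 2.020 \quad \Rightarrow \quad \text{disregard } 6.1 \times 10^{6}$$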

Why could it be high? Since we're dealing with the immune system and white blood cells, it could be that she has a cold, and her body is increasing the number of white blood cells to combat whatever infectious disease she may have. Hopefully, we all know how the immune system works: when we have an infection, our white blood cell count spikes in order to combat it. That would explain why her white blood cell count for that day is a little higher than normal. The other values represent what you'd expect her white blood cell count to be on a typical day.

By using our Grubbs' test, we can see that her white blood cell count is higher than usual, suggesting she's fighting some type of infection. Remember, we have both the Grubbs' test and the Q test. Both look for the outlier within a given dataset to see if it can be disregarded or retained within our calculations. In this example, because we have to disregard that value, you'd have to calculate a new average as well as a new standard deviation for this data set. Here, we're not asked to figure that out, but we've concluded that we have to ignore the $6.1 \times 10^{6}$ because g calculated is larger than g table.
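Recalculating after a value has been rejected might look like the sketch below. The readings are hypothetical numbers invented for illustration (the actual white blood cell counts are not listed in this transcript), and the variable names are assumptions.

```python
from statistics import mean, stdev

# Hypothetical readings, invented for illustration only.
readings = [5.0, 5.1, 4.9, 5.2, 5.0, 5.1, 6.1]
rejected = 6.1  # the value an outlier test told us to disregard

# Drop a single occurrence of the rejected value and keep the rest.
kept = list(readings)
kept.remove(rejected)

# Recompute the average and the sample standard deviation (n - 1 denominator).
new_mean = mean(kept)
new_s = stdev(kept)

print(f"new mean = {new_mean:.3f}")
print(f"new standard deviation = {new_s:.3f}")
```

Both numbers change: the mean moves toward the remaining values and the standard deviation shrinks, which is why the rejected point has to be left out of later calculations.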

Here’s what students ask on this topic:

What is the Grubbs test and how is it used to detect outliers?

The Grubbs test is a statistical method used to identify a single outlier in a normally distributed data set. To perform the Grubbs test, you first calculate the Grubbs statistic (G) using the formula:

$G = \frac{| x - μ |}{σ}$

where $x$ is the questionable value, $μ$ is the mean of the data set, and $σ$ is its standard deviation (written $s$ in the worked example). You then compare the calculated G value to a critical value from the Grubbs table, which depends on the number of observations and the desired confidence level. If the calculated G is greater than the critical value, the data point is considered an outlier and should be discarded.
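A minimal Python sketch of this calculation is shown below. The function name `grubbs_statistic` and the readings are hypothetical, made up for illustration. The sketch assumes the mean and the sample standard deviation are computed from all values, including the suspected one, and it takes the critical value as an input read from a table rather than computing it. The 2.020 used here is the table value quoted in the worked example for 7 measurements at 95% confidence.

```python
from statistics import mean, stdev


def grubbs_statistic(values, suspect):
    """Return G = |suspect - mean| / s for the given data set."""
    x_bar = mean(values)  # mean of all values, suspect included (assumption)
    s = stdev(values)     # sample standard deviation (n - 1 denominator)
    return abs(suspect - x_bar) / s


# Hypothetical readings, invented for illustration only.
readings = [5.0, 5.1, 4.9, 5.2, 5.0, 5.1, 6.1]
g_calc = grubbs_statistic(readings, suspect=6.1)

# Critical value for 7 measurements at 95% confidence, as quoted in the example.
g_table = 2.020

decision = "discard" if g_calc > g_table else "retain"
print(f"G calculated = {g_calc:.3f}")
print(decision)
```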

How does the Q test differ from the Grubbs test in detecting outliers?

The Q test is another method for detecting outliers, but it is typically used for small, normally distributed data sets (usually 3 to 7 values, although it can be applied to somewhat larger sets). The Q test calculates the Q statistic using the formula:

$Q = \frac{| x_{1} - x_{n+1} |}{Range}$

where $x_{1}$ is the suspected outlier, $x_{n+1}$ is the next closest data point, and the range is the difference between the largest and smallest values in the data set. The calculated Q value is then compared to a critical value from the Q table. If the calculated Q is greater than the critical value, the data point is considered an outlier and should be discarded. Unlike the Grubbs test, the Q test is less commonly used and is reserved for very small data sets.

When should you use the Grubbs test versus the Q test?

The choice between the Grubbs test and the Q test depends mainly on the size of your data set, since both tests assume normally distributed data. The Grubbs test is more commonly used and is suitable for larger data sets. It is designed to detect a single outlier in such data sets. The Q test, on the other hand, is typically reserved for very small data sets, usually between 3 and 7 values. If you have a small data set and suspect an outlier, the Q test may be more appropriate. For larger data sets, the Grubbs test is generally preferred.

What are the steps to perform the Q test for outliers?

To perform the Q test for outliers, follow these steps:

1. Organize your data set from smallest to largest value.
2. Identify the suspected outlier ($x_{1}$) and the next closest data point ($x_{n+1}$).
3. Calculate the range of the data set by subtracting the smallest value from the largest value.
4. Calculate the Q statistic using the formula:

$Q = \frac{| x_{1} - x_{n+1} |}{Range}$

5. Compare the calculated Q value to the critical value from the Q table, based on the number of measurements and the desired confidence level.
6. If the calculated Q is greater than the critical value, the suspected outlier should be discarded. If it is less, the data point can be retained.
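The steps above can be sketched as a small reusable function. The name `q_test` and its return shape are hypothetical, chosen for illustration. The sketch assumes the suspect is whichever end of the sorted data sits farther from its neighbor, and it expects the critical value to be looked up in a Q table and passed in.

```python
def q_test(values, q_critical):
    """Apply the Q test to the more isolated end of the data.

    Returns (suspect, q_calculated, discard).
    """
    ordered = sorted(values)                 # smallest to largest
    if len(ordered) < 3:
        raise ValueError("the Q test needs at least 3 values")

    data_range = ordered[-1] - ordered[0]    # largest minus smallest
    if data_range == 0:
        raise ValueError("all values are identical, so the range is zero")

    low_gap = ordered[1] - ordered[0]        # gap at the low end
    high_gap = ordered[-1] - ordered[-2]     # gap at the high end

    # Assumption: test whichever endpoint is farther from its neighbor.
    if low_gap >= high_gap:
        suspect, gap = ordered[0], low_gap
    else:
        suspect, gap = ordered[-1], high_gap

    q_calc = gap / data_range
    return suspect, q_calc, q_calc > q_critical


# Coffee data and Q critical (10 measurements, 99% confidence) from the example.
suspect, q_calc, discard = q_test(
    [81, 72, 78, 83, 77, 82, 79, 81, 78, 82], q_critical=0.568
)
print(suspect, round(q_calc, 3), "discard" if discard else "retain")
```

Passing the critical value in as an argument keeps the sketch independent of any particular Q table or confidence level.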

What is the importance of detecting outliers in a data set?

Detecting outliers in a data set is crucial for maintaining data integrity and ensuring accurate statistical analysis. Outliers can significantly skew the results of your analysis, leading to incorrect conclusions. By identifying and removing outliers, you can improve the reliability and validity of your data. Outliers may result from measurement errors, data entry errors, or genuine variability in the data. Identifying these outliers helps in understanding the underlying patterns and trends in the data, leading to more accurate and meaningful interpretations. Methods like the Grubbs test and Q test are essential tools for detecting and handling outliers effectively.

Your Analytical Chemistry tutor

Jules Bruno

General Chemistry, Analytical Chemistry and GOB lead instructor
