Deming Regression

August 2024

Suppose you have been monitoring your blood pressure using a cuff that goes on your arm. You want to compare your monitor to one that goes on your wrist. You can compare whether these two methods give similar results by using Deming regression.

You might think this sounds like simple linear regression. It is not. Both techniques relate one response variable (Y) to one predictor variable (X), but simple linear regression assumes that only Y has measurement error, while Deming regression assumes that both X and Y have measurement error. This makes Deming regression a good technique for comparing two measurement methods, since both methods have measurement error.

In This Issue:

Overview of Deming Regression
Example
Determining the Measurement Errors for X and Y
Calculating Lambda
Collecting the Data for Deming Regression
Performing the Regression
Deming Regression Output
Summary
Quick Links


Overview of Deming Regression

Deming regression is a technique used to fit a straight line to two continuous variables, where both variables, X and Y, are measured with error. It is often used to compare measurement methods. It is used with paired measurements (x_i and y_i), each of which is a true value (X_i or Y_i) plus a measurement error (s_i or t_i):

x_i = X_i + s_i

y_i = Y_i + t_i

The equation that Deming regression fits is:

Y’_i = b₀ + b₁X’_i

where b₀ and b₁ are the intercept and slope coefficients, and X’_i and Y’_i are the estimates of the true values X_i and Y_i.

The first step is to ensure that the two test methods are in statistical control. When this is true, you can estimate the measurement error associated with both test methods. This allows you to estimate lambda, which is the ratio of the measurement error variances of the two methods. Then you can run the Deming regression to calculate the coefficients in the above equation and use various statistical tests to see if the two methods give similar results.

Example

Suppose you and one of your suppliers have the “same” test method for determining a key quality characteristic of a part. You want to use Deming regression to determine if the two test methods are comparable. To do this, you know you will need estimates of the measurement error for each test method. After that, you will take 30 parts and test each part with both test methods. The tests are nondestructive. Then you will run the Deming regression and reach a conclusion about whether the two test methods are comparable or not.

For this article, the supplier will be Test 1 and represented by X. You (the customer) will be Test 2 and represented by Y. Let’s walk through how this happens with Deming regression.

Determining the Measurement Errors for X and Y

To run the Deming regression, you need estimates of the measurement errors for each of the two tests. You can get these estimates from Gage R&R studies you have done on the tests. If you don’t have measurement error estimates from past Gage R&R studies, I recommend you use the EMP consistency study. This is one of Dr. Donald Wheeler’s Evaluating the Measurement Process techniques. Part of that analysis involves constructing an individuals (X-mR) control chart from repeated measurements by one operator on one part. If the moving range chart is in statistical control, you can estimate the measurement error. Table 1 shows the repeated measurements for Test 1 (the X values from your supplier) for one part.

Table 1: Repeated Measurements Results for Test 1 (X)

Sample No.	X	Sample No.	X
1	1.354	16	1.355
2	1.351	17	1.354
3	1.353	18	1.353
4	1.355	19	1.351
5	1.351	20	1.354
6	1.350	21	1.355
7	1.353	22	1.354
8	1.354	23	1.353
9	1.351	24	1.351
10	1.354	25	1.352
11	1.353	26	1.355
12	1.351	27	1.354
13	1.352	28	1.353
14	1.351	29	1.354
15	1.354	30	1.355

These 30 results are used to construct the individuals control chart. Please see our SPC Knowledge Base article for more information on how to construct and interpret an individuals control chart. Figure 1 shows the X and mR chart for the Test 1 data.

Figure 1: X-mR Chart for Test 1 (X) Data

Since there are no points beyond the control limits nor any patterns in the two charts, both are said to be in statistical control. This test method is consistent and predictable. Please note this:

Before running the Deming regression, the two test methods must be in statistical control. If they are not, then you do not know what the methods will produce in the future.

This is often overlooked in doing tests like the Deming regression. Since the test is in statistical control, you can estimate the measurement error using the average moving range on the mR chart. The average moving range (R) is 0.0018. Dividing it by 1.128, the bias correction constant (d₂) for ranges of two measurements, gives the measurement error (s) for Test 1:

s = R/1.128 = 0.0018/1.128 = 0.0016
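
The calculation above can be sketched in Python. This is a minimal illustration and not the SPC for Excel implementation: the function name xmr_summary is hypothetical, and the chart factors 2.66 and 3.267 are the usual individuals-chart constants for moving ranges of two values, which this article does not state. The sketch only checks points against the control limits; it does not look for run patterns.

```python
from statistics import mean

# Table 1: 30 repeated measurements of one part with Test 1 (X)
test1 = [1.354, 1.351, 1.353, 1.355, 1.351, 1.350, 1.353, 1.354, 1.351, 1.354,
         1.353, 1.351, 1.352, 1.351, 1.354, 1.355, 1.354, 1.353, 1.351, 1.354,
         1.355, 1.354, 1.353, 1.351, 1.352, 1.355, 1.354, 1.353, 1.354, 1.355]

def xmr_summary(values):
    """Summarize an individuals (X-mR) chart for one series of repeated measurements."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = mean(values)
    mr_bar = mean(moving_ranges)
    # Assumed standard chart factors for moving ranges of size two
    x_lcl, x_ucl = x_bar - 2.66 * mr_bar, x_bar + 2.66 * mr_bar
    mr_ucl = 3.267 * mr_bar
    outside = (sum(not (x_lcl <= v <= x_ucl) for v in values)
               + sum(r > mr_ucl for r in moving_ranges))
    return {
        "x_bar": x_bar,
        "mr_bar": mr_bar,
        "x_limits": (x_lcl, x_ucl),
        "mr_ucl": mr_ucl,
        "sigma": mr_bar / 1.128,      # measurement error estimate
        "points_outside": outside,    # points beyond the limits on either chart
    }

summary = xmr_summary(test1)
print(f"average moving range  = {summary['mr_bar']:.4f}")
print(f"measurement error (s) = {summary['sigma']:.4f}")
print(f"points outside limits = {summary['points_outside']}")
```

For the Table 1 data, the average moving range and measurement error round to the 0.0018 and 0.0016 used above, and no points fall outside the limits. The same function can be applied to the repeated measurements from the second test method.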

This process should be repeated for Test 2 – your test method. The data for the repeated measurements on a single part for Test 2 are shown in Table 2 below.

Table 2: Repeated Measurements Results for Test 2 (Y)

Sample No.	Y	Sample No.	Y
1	1.350	16	1.350
2	1.350	17	1.352
3	1.356	18	1.350
4	1.350	19	1.350
5	1.350	20	1.350
6	1.350	21	1.350
7	1.350	22	1.356
8	1.356	23	1.354
9	1.356	24	1.354
10	1.351	25	1.350
11	1.350	26	1.350
12	1.350	27	1.350
13	1.356	28	1.354
14	1.356	29	1.356
15	1.355	30	1.350

You may construct the individuals control chart on the data in Table 2. We will not do that here to save some time and space. But Test 2 is in statistical control. The average moving range for Test 2 is 0.0022. The measurement error (t) for Test 2 is then:

t = R/1.128 = 0.0022/1.128 = 0.002

At this point, we have the two estimates of the measurement error for the tests and the knowledge that both tests are in statistical control. We can move forward with the Deming regression.

Calculating Lambda

Lambda (λ) is the ratio of the measurement errors (as variances) between the two tests:

λ = s²/t²

Note that the ratio is X (Test 1) to Y (Test 2). Some sources use the ratio of Y variance to X variance. The equations below involving lambda use the ratio as shown in the equation above, i.e., X to Y. This value is assumed constant in the Deming regression. For our two tests, the value of lambda is:

λ = s²/t² = (0.0016)²/(0.002)² = 0.64
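
As a quick check of the arithmetic, the ratio can be computed in a few lines of Python. The function name lambda_ratio is hypothetical. The second calculation is an added illustration of how sensitive lambda is to rounding: it starts from the reported average moving ranges (0.0018 and 0.0022) instead of the rounded measurement errors.

```python
def lambda_ratio(sd_x, sd_y):
    """Ratio of measurement error variances, X (Test 1) over Y (Test 2)."""
    return sd_x ** 2 / sd_y ** 2

# Rounded measurement errors, as used in the text
lam = lambda_ratio(0.0016, 0.002)

# Same ratio starting from the reported average moving ranges,
# without rounding s and t first
lam_from_mr = lambda_ratio(0.0018 / 1.128, 0.0022 / 1.128)

print(f"lambda from rounded s and t      = {lam:.2f}")
print(f"lambda from average moving ranges = {lam_from_mr:.2f}")
```

The first value is the 0.64 used in the rest of this article. The second comes out slightly higher, about 0.67, so it is worth carrying a few extra digits in s and t when lambda matters to your conclusion.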

Collecting the Data for Deming Regression

The next step is to collect the data for the Deming regression. Thirty parts that reflect the specification range are collected. Each part is numbered and run in each of the two tests (the supplier’s and the customer’s test). The results are shown in Table 3.

Table 3: Data for 30 Parts from Supplier (X) and Customer (Y)

Part No.	X	Y	Part No.	X	Y
1	1.327	1.338	16	1.295	1.294
2	1.296	1.306	17	1.381	1.375
3	1.350	1.337	18	1.336	1.319
4	1.299	1.312	19	1.368	1.382
5	1.386	1.393	20	1.326	1.306
6	1.388	1.393	21	1.351	1.363
7	1.335	1.337	22	1.334	1.350
8	1.337	1.331	23	1.368	1.369
9	1.298	1.294	24	1.382	1.381
10	1.352	1.350	25	1.362	1.370
11	1.310	1.306	26	1.360	1.363
12	1.308	1.313	27	1.344	1.357
13	1.332	1.331	28	1.333	1.337
14	1.385	1.381	29	1.315	1.319
15	1.296	1.300	30	1.349	1.356

Performing the Regression

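The following equations give the estimates of the coefficients that minimize the Deming regression sum of squares:

b₁ = [(λv − u) + √((u − λv)² + 4λr²)] / (2λr)

b₀ = ȳ − b₁x̄

where λ is lambda from the previous section, b₁ and b₀ are the estimates of the slope and intercept, x̄ and ȳ are the means of x_i and y_i respectively, and

u = Σ(x_i − x̄)²

v = Σ(y_i − ȳ)²

r = Σ(x_i − x̄)(y_i − ȳ)

Using x̄ = 1.340 and ȳ = 1.342, the values for u, v, and r for the data in Table 3 are: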

u = 0.0252587

v = 0.0272687

r = 0.0251167

With lambda = 0.64, the values of the coefficients are:

b₀ = -0.0389

b₁ = 1.031

The best fit equation is then given by:

y = -0.0389 + 1.031x
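
The fit can be reproduced from Table 3 with a short Python sketch. It assumes the usual closed-form Deming solution with lambda defined as the X error variance over the Y error variance, where u and v are the sums of squared deviations of X and Y about their means and r is the sum of cross products. The function name deming_fit is hypothetical; this is not the SPC for Excel code.

```python
from math import sqrt

# Table 3: supplier (X) and customer (Y) results for parts 1 to 30
x = [1.327, 1.296, 1.350, 1.299, 1.386, 1.388, 1.335, 1.337, 1.298, 1.352,
     1.310, 1.308, 1.332, 1.385, 1.296, 1.295, 1.381, 1.336, 1.368, 1.326,
     1.351, 1.334, 1.368, 1.382, 1.362, 1.360, 1.344, 1.333, 1.315, 1.349]
y = [1.338, 1.306, 1.337, 1.312, 1.393, 1.393, 1.337, 1.331, 1.294, 1.350,
     1.306, 1.313, 1.331, 1.381, 1.300, 1.294, 1.375, 1.319, 1.382, 1.306,
     1.363, 1.350, 1.369, 1.381, 1.370, 1.363, 1.357, 1.337, 1.319, 1.356]

def deming_fit(x, y, lam):
    """Deming regression with lam = (X error variance) / (Y error variance)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    u = sum((xi - x_bar) ** 2 for xi in x)                        # sum of squares of X
    v = sum((yi - y_bar) ** 2 for yi in y)                        # sum of squares of Y
    r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # cross products
    b1 = ((lam * v - u) + sqrt((u - lam * v) ** 2 + 4 * lam * r ** 2)) / (2 * lam * r)
    b0 = y_bar - b1 * x_bar
    return {"u": u, "v": v, "r": r, "b0": b0, "b1": b1}

fit = deming_fit(x, y, lam=0.64)
for name, value in fit.items():
    print(f"{name} = {value:.6g}")
```

With lambda = 0.64, this reproduces the u, v, and r values above and gives a slope of about 1.031 and an intercept of about -0.0389.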

You can get an idea of how good the fit is by plotting Y vs X and the best-fit line. This is done in Figure 3.

Figure 3: Deming Regression Results

Figure 3 shows that most of the points lie near the best-fit line, which suggests that the two test methods are correlated and give comparable results.

Deming Regression Output

In addition to the best-fit equation and Figure 3, there is other output that often accompanies Deming regression in software to help you analyze the results. We will look at the output from the SPC for Excel software which contains Deming regression.

*Regression Coefficients*
	Coefficient	Standard Error	df	t Stat	p-value	LCL	UCL
Intercept	-0.0389	0.0652	29	-0.597	0.5552	-0.172	0.0944
X	1.031	0.0484	29	21.31	0.0000	0.932	1.129

The table above contains information on the regression coefficients. The “Coefficient” column contains the values of b₀ and b₁ calculated above. The standard error is estimated using a jackknifing technique. This is a resampling method used to estimate the standard error of a statistic, in our case, the coefficients. The technique involves leaving out one observation at a time and recalculating the coefficients from the remaining observations. The spread of these leave-one-out coefficients is then used to estimate the standard errors.
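
A leave-one-out jackknife can be sketched as follows. This is an illustration under stated assumptions, not the SPC for Excel algorithm: it uses the common jackknife formula SE = sqrt((n - 1)/n × Σ(θ_i − θ̄)²), where θ_i is the coefficient with observation i removed and θ̄ is the average of those values, and it assumes the closed-form Deming fit with lambda = 0.64. The function names are hypothetical, and the software may use a different jackknife variant, so the results need not match the table exactly.

```python
from math import sqrt

# Table 3: supplier (X) and customer (Y) results for parts 1 to 30
x = [1.327, 1.296, 1.350, 1.299, 1.386, 1.388, 1.335, 1.337, 1.298, 1.352,
     1.310, 1.308, 1.332, 1.385, 1.296, 1.295, 1.381, 1.336, 1.368, 1.326,
     1.351, 1.334, 1.368, 1.382, 1.362, 1.360, 1.344, 1.333, 1.315, 1.349]
y = [1.338, 1.306, 1.337, 1.312, 1.393, 1.393, 1.337, 1.331, 1.294, 1.350,
     1.306, 1.313, 1.331, 1.381, 1.300, 1.294, 1.375, 1.319, 1.382, 1.306,
     1.363, 1.350, 1.369, 1.381, 1.370, 1.363, 1.357, 1.337, 1.319, 1.356]

def deming_fit(x, y, lam):
    """Return (intercept, slope) for lam = X error variance / Y error variance."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    u = sum((xi - x_bar) ** 2 for xi in x)
    v = sum((yi - y_bar) ** 2 for yi in y)
    r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ((lam * v - u) + sqrt((u - lam * v) ** 2 + 4 * lam * r ** 2)) / (2 * lam * r)
    return y_bar - b1 * x_bar, b1

def jackknife_se(x, y, lam):
    """Jackknife standard errors of (intercept, slope), leaving out one pair at a time."""
    n = len(x)
    leave_one_out = [deming_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:], lam)
                     for i in range(n)]
    errors = []
    for k in range(2):                      # 0 = intercept, 1 = slope
        values = [fit[k] for fit in leave_one_out]
        centre = sum(values) / n
        errors.append(sqrt((n - 1) / n * sum((val - centre) ** 2 for val in values)))
    return tuple(errors)

se_b0, se_b1 = jackknife_se(x, y, lam=0.64)
print(f"SE(intercept) = {se_b0:.4f}")
print(f"SE(slope)     = {se_b1:.4f}")
```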

The rest of the table is the typical regression coefficient table where:

df: degrees of freedom, N – 1, where N is the number of parts in the study; some sources recommend using N – 2.
t Stat: the t statistic given by coefficient/standard error (i.e., 1.031/0.0484 = 21.31).
p-value: the probability of getting a t statistic at least that extreme if the null hypothesis is true.
LCL: the lower confidence limit.
UCL: the upper confidence limit.

Let’s explore the p-value a little more. With the intercept (b₀), we are testing two hypotheses:

H₀: b₀ = 0

H₁: b₀ ≠ 0

H₀ is the null hypothesis; H₁ is the alternative hypothesis. For the intercept, we calculated a t statistic (-0.597). The p-value represents the probability that we would get a t statistic at least this far from 0 if the null hypothesis is true. If the p-value is sufficiently small (often, ≤ 0.05), we conclude that the probability of getting this t statistic is small and the null hypothesis is rejected: the intercept is different from 0. If the p-value is larger, we conclude that the probability of getting this t statistic is large and the null hypothesis is not rejected: the intercept could equal 0.

The p-value for the intercept is 0.5552. Here, the p-value is large, so we do not reject the null hypothesis and conclude the intercept could be 0.

The same approach is used for the coefficient. We are testing the null hypothesis that the coefficient is 0. The p-value for the coefficient is very small (0.0000). We conclude that we would be very unlikely to get this t statistic if the null hypothesis is true, so we reject the null hypothesis and conclude that the coefficient is significantly different from 0.

The p-values can be calculated using the Excel function T.DIST.2T. The two entries are the absolute value of the t statistic and the degrees of freedom.

Now, let’s explore the confidence limits a little more. The confidence interval for the intercept is -0.172 to 0.0944. Note that this confidence interval contains 0. This means that it is possible for the intercept to be 0. This matches what we concluded from the p-value. The confidence interval for the coefficient is 0.932 to 1.129. Note that it does not include 0, so we conclude that the coefficient is significantly different from 0.

The confidence intervals are calculated as b +/- t(SE), where b is the intercept or coefficient, t is the t value, and SE is the standard error for b. The t value can be calculated using Excel’s T.INV.2T function, where the inputs are the alpha value (usually 0.05) and the degrees of freedom.
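
For readers working outside Excel, the p-value and confidence interval calculations described above can be sketched with the Python standard library alone. The function names t_dist_2t and t_inv_2t are hypothetical stand-ins for the Excel functions: the first integrates the t density numerically with Simpson's rule and the second inverts it by bisection. The inputs are the rounded values from the coefficient table, so the results agree with the table only to about three decimals.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_dist_2t(t_stat, df, steps=2000):
    """Two-tailed p-value, analogous to Excel's T.DIST.2T(ABS(t), df)."""
    a = abs(t_stat)
    if a == 0:
        return 1.0
    h = a / steps                        # Simpson's rule on [0, |t|]; steps must be even
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_area = 2 * total * h / 3     # P(-|t| < T < |t|)
    return max(0.0, 1.0 - central_area)

def t_inv_2t(alpha, df):
    """Two-tailed critical value, analogous to Excel's T.INV.2T(alpha, df)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):                  # bisection: the p-value falls as t grows
        mid = (lo + hi) / 2
        if t_dist_2t(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

df = 29
p_intercept = t_dist_2t(-0.597, df)      # t statistic for the intercept
t_crit = t_inv_2t(0.05, df)              # 95% confidence level

slope, se_slope = 1.031, 0.0484          # values from the coefficient table
ci_slope = (slope - t_crit * se_slope, slope + t_crit * se_slope)

print(f"p-value for intercept = {p_intercept:.4f}")
print(f"t critical value      = {t_crit:.3f}")
print(f"95% CI for slope      = {ci_slope[0]:.3f} to {ci_slope[1]:.3f}")
```

If SciPy is available, 2 * scipy.stats.t.sf(abs(t), df) and scipy.stats.t.ppf(1 - alpha / 2, df) give the same two quantities without the hand-written integration.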

So, Figure 3, together with the p-values and the confidence intervals, implies that the two methods give comparable results.

You can also do hypothesis testing to directly test if the Deming regression slope is 1. The null hypothesis is that b₁ = 1 and the alternate hypothesis is that b₁ ≠ 1. The t test statistic in this case is:

t = (b₁ – 1)/SE(b₁)

Likewise, you can directly test that the two means, x̄ and ȳ, are the same. In this case, the test statistic is:

t = (ȳ – x̄)/SE(ȳ – x̄)
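
Both test statistics can be sketched in Python. The slope test uses b₁ – 1 = 0.0305 and SE(b₁) = 0.0484 as reported in the software output. For the means test, the article does not say how the standard error of the difference is obtained; the sketch assumes it is the standard error of the paired differences (standard deviation of y_i – x_i divided by √N), which reproduces the reported values to the digits shown.

```python
from math import sqrt
from statistics import mean, stdev

# Table 3: supplier (X) and customer (Y) results for parts 1 to 30
x = [1.327, 1.296, 1.350, 1.299, 1.386, 1.388, 1.335, 1.337, 1.298, 1.352,
     1.310, 1.308, 1.332, 1.385, 1.296, 1.295, 1.381, 1.336, 1.368, 1.326,
     1.351, 1.334, 1.368, 1.382, 1.362, 1.360, 1.344, 1.333, 1.315, 1.349]
y = [1.338, 1.306, 1.337, 1.312, 1.393, 1.393, 1.337, 1.331, 1.294, 1.350,
     1.306, 1.313, 1.331, 1.381, 1.300, 1.294, 1.375, 1.319, 1.382, 1.306,
     1.363, 1.350, 1.369, 1.381, 1.370, 1.363, 1.357, 1.337, 1.319, 1.356]

# Slope test: t = (b1 - 1) / SE(b1), using the reported (rounded) values
slope_minus_one, se_slope = 0.0305, 0.0484
t_slope = slope_minus_one / se_slope

# Means test, assuming a paired comparison of the two methods
diffs = [yi - xi for xi, yi in zip(x, y)]
mean_diff = mean(diffs)
se_diff = stdev(diffs) / sqrt(len(diffs))
t_means = mean_diff / se_diff

print(f"slope test: t = {t_slope:.3f}")
print(f"means test: difference = {mean_diff:.5f}, SE = {se_diff:.5f}, t = {t_means:.3f}")
```

The slope statistic differs slightly from the software output in the third decimal because the inputs here are rounded.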

The output from the SPC for Excel program is given below for the hypothesis testing.

*Hypothesis Testing*
	Parameter	Standard Error	df	t Stat	p-value	LCL	UCL
Slope Test	0.0305	0.0484	29	0.632	0.5326	-0.0838	0.145
Means Test	0.00200	0.00162	29	1.232	0.2280	-0.00184	0.00584

You interpret this table the same way you did for the regression coefficients. The hypotheses for the slope test are:

H₀: b₁ – 1 = 0

H₁: b₁ – 1 ≠ 0

Note that the slope test has a high p-value and the confidence limit contains 0. This means that b₁ – 1 can be 0.

The null hypothesis for the means test is that the difference in the two means is 0; the alternate hypothesis is that the difference in the two means is not 0. The means test has a high p-value as well and the confidence interval contains 0. So, you conclude that there is no evidence that the means differ.

The slope test and the means test further support the conclusion that the two methods are comparable. Note that if either of these tests fails, the two methods are not comparable.

Another way to judge how good a model the Deming regression produces is to see if the residuals are normally distributed. There are a number of ways of doing this. One option in the SPC for Excel program is to determine if the standardized residuals are normally distributed by doing a normal probability plot. Residuals are the differences between the actual values and the predicted values. The standardized residuals are the residuals divided by the standard deviation of the residuals. Figure 4 is a normal probability plot of the standardized residuals.

Figure 4: Normal Probability Plot of Standardized Residuals

If the points lie along the straight line, the residuals are approximately normally distributed.
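
The coordinates for such a plot can be sketched in Python. Several choices here are assumptions and may differ from what the software does: the residuals are taken as y_i minus the fitted value b₀ + b₁x_i, the slope is the reported b₁ = 1.0305 (1.031 to three decimals), the intercept is recomputed as ȳ – b₁x̄ so the residuals average to zero, and the plotting positions (i – 0.375)/(N + 0.25) are one common convention.

```python
from statistics import NormalDist, mean, stdev

# Table 3: supplier (X) and customer (Y) results for parts 1 to 30
x = [1.327, 1.296, 1.350, 1.299, 1.386, 1.388, 1.335, 1.337, 1.298, 1.352,
     1.310, 1.308, 1.332, 1.385, 1.296, 1.295, 1.381, 1.336, 1.368, 1.326,
     1.351, 1.334, 1.368, 1.382, 1.362, 1.360, 1.344, 1.333, 1.315, 1.349]
y = [1.338, 1.306, 1.337, 1.312, 1.393, 1.393, 1.337, 1.331, 1.294, 1.350,
     1.306, 1.313, 1.331, 1.381, 1.300, 1.294, 1.375, 1.319, 1.382, 1.306,
     1.363, 1.350, 1.369, 1.381, 1.370, 1.363, 1.357, 1.337, 1.319, 1.356]

b1 = 1.0305                       # reported slope
b0 = mean(y) - b1 * mean(x)       # intercept recomputed from the means

# Residual = actual value minus predicted value
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sd = stdev(residuals)
standardized = sorted(res / sd for res in residuals)

# Theoretical normal quantiles for the sorted standardized residuals
n = len(standardized)
normal_scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
                 for i in range(1, n + 1)]

for z, res in zip(normal_scores, standardized):
    print(f"{z:6.2f}  {res:6.2f}")
```

Plotting the second column against the first gives the normal probability plot; points that track a straight line support the normality assumption.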

The conclusion for these two test methods is they produced comparable results.

Summary

This publication introduced Deming regression. This technique is used to determine if two test methods produce comparable results. You start by ensuring both test methods are consistent and predictable and use those results to determine the measurement error and set the value of lambda. You then run paired measurements through both tests and perform the Deming regression to estimate the coefficients as well as perform the hypothesis tests. The slope test and means test indicate whether the two methods are comparable.

Quick Links

Thanks so much for reading our SPC Knowledge Base. We hope you find it informative and useful. Happy charting and may the data always support your position.

Sincerely,

Dr. Bill McNeese
BPI Consulting, LLC


6 Comments


Cesana Bruno Mario

9 months ago

Very nice introduction to Deming’s regression.
However, the fact that we do not reject the null hypotheses intercept equal to zero and slope equal to 1 is not sufficient for concluding that the two measurement methods are in agreement. Indeed, the 95%CI of the slope (and intercept) are too wide. It is necessary to fix “a priori” two “agreement thresholds” (0.9 and 1.1 for example for the slope) and the conclusion of agreement is possible if the 95%CI are within these thresholds like an equivalence test.

William

Author

Reply to Cesana Bruno Mario

9 months ago

Good point on setting what is an acceptable difference for the two methods to be in agreement. I agree you need to add that to the process. You would have to use your knowledge of the process to set those limits. You could run a paired equivalence test to see if the means are within an acceptable range. Thanks for the comment.

Raphy Bar

9 months ago

What about the nonparametric Passing-Bablok procedure, which also compares two methods? Will this be incorporated in SPC for Excel?

William

Author

Reply to Raphy Bar

9 months ago

I am not familiar with the Passing-Bablok procedure. I will look into it. Thanks.

Cesana Bruno Mario

Reply to William

9 months ago

The Passing – Bablok regression compares two methods by considering them in agreement if the 95%CI of the slope includes 1 and the intercept includes 0.
So, there is the problem I mentioned before: a 95% CI that is too wide, which actually argues against agreement, wrongly allows one to conclude agreement. Furthermore, the testing procedure in the agreement setting is an equivalence test, and the required sample sizes have to be calculated accordingly. I think that the Passing and Bablok regression is a procedure that has to be forgotten in an agreement analysis.
You can see my papers: Cesana Antonelli Ferraro Critical appraisal of the CLSI guideline EP09c 10.1515_clinical chemistry and laboratory medicine-2024-0595
Cesana Antonelli Sample_Size_for_Agreement_Studies_on_Quantitative variables_Epidemiology Biostatistics and Public Health 2024 for the sample size; paper correcting Shieh’s recent proposal.

Martin Gibson

9 months ago

Interesting article. I think this is what Professor Peter Sprent (University of Dundee) called stochastic regression circa 1969.
