Deming Regression

August 2024

Suppose you have been monitoring your blood pressure using a cuff that goes on your arm. You want to compare your monitor to one that goes on your wrist. You can compare whether these two methods give similar results by using Deming regression.

You might think this sounds like simple linear regression. It is not. Both techniques compare one response variable (Y) to one predictor variable (X). In simple linear regression, only Y has measurement error. In Deming regression, both X and Y have measurement error. This makes Deming regression a great technique for comparing two measurement methods. Both have measurement error.

Overview of Deming Regression

Deming regression is a technique used to fit a straight line to two continuous variables, where both variables, X and Y, are measured with error. It is often used to compare measurement methods. It is used with paired measurements (xi and yi) and their associated errors (si and ti):

xi = Xi + si

yi = Yi + ti

The equation that Deming regression fits is:

Y’i = b0 + b1X’i

where b0 and b1 are the intercept and slope coefficients, and X’i and Y’i are the estimates of the true values of Xi and Yi.

The first step is to ensure that the two test methods are in statistical control. When this is true, you can estimate the measurement error associated with each test method. This allows you to estimate lambda, which is the ratio of the two measurement system variances. Then you can run the Deming regression to calculate the coefficients in the above equation and use various statistical tests to see if the two methods give similar results.

Example

Suppose you and one of your suppliers have the “same” test method for determining a key quality characteristic of a part. You want to use Deming regression to determine if the two test methods are comparable. To do this, you know you will need estimates of the measurement error for each test method. After that, you will take 30 parts and test them with each test method. The tests are nondestructive. Then you will run the Deming regression and reach a conclusion about whether the two test methods are comparable or not.

For this article, the supplier will be Test 1 and represented by X. You (the customer) will be Test 2 and represented by Y. Let’s walk through how this happens with Deming regression.

Determining the Measurement Errors for X and Y

To run the Deming regression, you need estimates of the measurement errors for each of the two tests. You can get these estimates from Gage R&R studies you have done on the tests. If you don’t have measurement error estimates from the past Gage R&R studies, I recommend you use the EMP consistency study. This is one of Dr. Donald Wheeler’s Evaluating the Measurement Process techniques. Part of that analysis involves constructing an individuals (X-mR) control chart from repeated measurements by one operator on one part. If the moving range chart is in statistical control, you can estimate the measurement error. Table 1 shows the repeated measurements for Test 1 (the X values from your supplier) for one part.

Table 1: Repeated Measurements Results for Test 1 (X)

Sample No.   X        Sample No.   X
1            1.354    16           1.355
2            1.351    17           1.354
3            1.353    18           1.353
4            1.355    19           1.351
5            1.351    20           1.354
6            1.350    21           1.355
7            1.353    22           1.354
8            1.354    23           1.353
9            1.351    24           1.351
10           1.354    25           1.352
11           1.353    26           1.355
12           1.351    27           1.354
13           1.352    28           1.353
14           1.351    29           1.354
15           1.354    30           1.355

These 30 results are used to construct the individuals control chart. Please see our SPC Knowledge Base article for more information on how to construct and interpret an individuals control chart. Figure 1 shows the X-mR chart for the Test 1 data.

Figure 1: X-mR Chart for Test 1 (X) Data

Since there are no points beyond the control limits and no patterns in either chart, both charts are in statistical control. This test method is consistent and predictable. Please note this:

Before running the Deming regression, the two test methods must be in statistical control. If they are not, then you do not know what the methods will produce in the future.

This is often overlooked when doing tests like the Deming regression. Since the test is in statistical control, you can estimate the measurement error using the average moving range from the mR chart. The average moving range (R̄) is 0.0018. The measurement error (s) for Test 1 is then:

s = R̄/1.128 = 0.0018/1.128 = 0.0016

where 1.128 is the bias correction factor (d2) for moving ranges of size 2.
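If you want to check this calculation yourself, here is a minimal Python sketch (the function name measurement_error is illustrative, not part of any package) that computes the average moving range and the resulting measurement error from the Table 1 data:

test1 = [1.354, 1.351, 1.353, 1.355, 1.351, 1.350, 1.353, 1.354, 1.351, 1.354,
         1.353, 1.351, 1.352, 1.351, 1.354, 1.355, 1.354, 1.353, 1.351, 1.354,
         1.355, 1.354, 1.353, 1.351, 1.352, 1.355, 1.354, 1.353, 1.354, 1.355]

def measurement_error(values, d2=1.128):
    # Average moving range divided by d2 (d2 = 1.128 for moving ranges of size 2).
    # Valid only if the mR chart is in statistical control.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    r_bar = sum(moving_ranges) / len(moving_ranges)
    return r_bar / d2

print(round(measurement_error(test1), 4))   # about 0.0016, matching the value above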

This process should be repeated for Test 2 – your test method. The data for the repeated measurements on a single part for Test 2 are shown in Table 2 below.

Table 2: Repeated Measurements Results for Test 2 (Y)

Sample No.   Y        Sample No.   Y
1            1.350    16           1.350
2            1.350    17           1.352
3            1.356    18           1.350
4            1.350    19           1.350
5            1.350    20           1.350
6            1.350    21           1.350
7            1.350    22           1.356
8            1.356    23           1.354
9            1.356    24           1.354
10           1.351    25           1.350
11           1.350    26           1.350
12           1.350    27           1.350
13           1.356    28           1.354
14           1.356    29           1.356
15           1.355    30           1.350

You may construct the individuals control chart on the data in Table 2. We will not do that here to save some time and space. But Test 2 is in statistical control. The average moving range for Test 2 is 0.0022. The measurement error (t) for Test 2 is then:

t = R̄/1.128 = 0.0022/1.128 = 0.002

At this point, we have the two estimates of the measurement error for the tests and the knowledge that both tests are in statistical control. We can move forward with the Deming regression.

Calculating Lambda

Lambda (λ) is the ratio of the measurement errors (as variances) of the two tests:

λ = s²/t²

Note that the ratio is X (Test 1) to Y (Test 2). Some sources use the ratio of Y variance to X variance. The equations below involving lambda use the ratio as shown in the equation above, i.e., X to Y. This value is assumed constant in the Deming regression. For our two tests, the value of lambda is:

λ = s²/t² = (0.0016)²/(0.002)² = 0.64
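As a quick check, a short Python sketch reproduces this value (the variable names are illustrative):

s, t = 0.0016, 0.002      # measurement errors for Test 1 (X) and Test 2 (Y)
lam = s**2 / t**2         # lambda = ratio of the error variances, X to Y
print(lam)                # 0.64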

Collecting the Data for Deming Regression

The next step is to collect the data for the Deming regression. Thirty parts that reflect the specification range are collected. Each part is numbered and run in each of the two tests (the supplier’s and the customer’s test). The results are shown in Table 3.

Table 3: Data for 30 Parts from Supplier (X) and Customer (Y)

Part No.   X       Y        Part No.   X       Y
1          1.327   1.338    16         1.295   1.294
2          1.296   1.306    17         1.381   1.375
3          1.350   1.337    18         1.336   1.319
4          1.299   1.312    19         1.368   1.382
5          1.386   1.393    20         1.326   1.306
6          1.388   1.393    21         1.351   1.363
7          1.335   1.337    22         1.334   1.350
8          1.337   1.331    23         1.368   1.369
9          1.298   1.294    24         1.382   1.381
10         1.352   1.350    25         1.362   1.370
11         1.310   1.306    26         1.360   1.363
12         1.308   1.313    27         1.344   1.357
13         1.332   1.331    28         1.333   1.337
14         1.385   1.381    29         1.315   1.319
15         1.296   1.300    30         1.349   1.356

Performing the Regression

The following equations give the estimates of the coefficients that minimize the Deming regression sum of squares:

b1 = [(λv – u) + √((u – λv)² + 4λr²)]/(2λr)

b0 = ȳ – b1x̄

where b1 and b0 are the estimates of the slope (β1) and intercept (β0), x̄ and ȳ are the means of the xi and yi values, and

u = Σ(xi – x̄)²

v = Σ(yi – ȳ)²

r = Σ(xi – x̄)(yi – ȳ)

Using x̄ = 1.340 and ȳ = 1.342, the values of u, v, and r for the data in Table 3 are:

u = 0.0252587

v = 0.0272687

r = 0.0251167

With lambda = 0.64, the values of the coefficients are:

b0 = -0.0389

b1 = 1.031

The best fit equation is then given by:

y = -0.0389 + 1.031x
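If you want to reproduce these coefficients outside of a dedicated package, the following Python sketch (the deming_fit helper is an illustrative name, not a library function) applies the formulas above to the Table 3 data:

import math

x = [1.327, 1.296, 1.350, 1.299, 1.386, 1.388, 1.335, 1.337, 1.298, 1.352,
     1.310, 1.308, 1.332, 1.385, 1.296, 1.295, 1.381, 1.336, 1.368, 1.326,
     1.351, 1.334, 1.368, 1.382, 1.362, 1.360, 1.344, 1.333, 1.315, 1.349]
y = [1.338, 1.306, 1.337, 1.312, 1.393, 1.393, 1.337, 1.331, 1.294, 1.350,
     1.306, 1.313, 1.331, 1.381, 1.300, 1.294, 1.375, 1.319, 1.382, 1.306,
     1.363, 1.350, 1.369, 1.381, 1.370, 1.363, 1.357, 1.337, 1.319, 1.356]

def deming_fit(x, y, lam):
    # Deming regression with lam = (X error variance)/(Y error variance).
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    u = sum((xi - xbar) ** 2 for xi in x)                        # sum of squares for x
    v = sum((yi - ybar) ** 2 for yi in y)                        # sum of squares for y
    r = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))   # cross product
    b1 = ((lam * v - u) + math.sqrt((u - lam * v) ** 2 + 4 * lam * r ** 2)) / (2 * lam * r)
    b0 = ybar - b1 * xbar
    return b0, b1

b0, b1 = deming_fit(x, y, lam=0.64)
print(round(b0, 4), round(b1, 3))   # about -0.0389 and 1.031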

You can get an idea of how good the fit is by plotting Y vs X and the best-fit line. This is done in Figure 3.

Figure 3: Deming Regression Results

Figure 3 shows that most of the points lie near the best-fit line, implying that the two test methods are strongly correlated and give comparable results.

Deming Regression Output

In addition to the best-fit equation and Figure 3, there is other output that often accompanies Deming regression in software to help you analyze the results. We will look at the output from the SPC for Excel software which contains Deming regression.

Regression Coefficients

 

            Coefficient   Standard Error   df   t Stat   p-value   LCL      UCL
Intercept   -0.0389       0.0652           29   -0.597   0.5552    -0.172   0.0944
X            1.031        0.0484           29   21.31    0.0000     0.932   1.129

The table above contains information on the regression coefficients. The “Coefficient” column contains the values of b0 and b1 calculated above. The standard error is estimated using a jackknifing technique. This is a resampling method used to estimate the standard error of a statistic – in our case, the coefficients. The technique involves leaving out one observation at a time, recalculating the coefficients each time, and using the spread of those results to estimate the standard errors.
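Here is a rough Python sketch of that idea (not the SPC for Excel code), reusing the deming_fit helper and the x, y data from the earlier sketch; the exact standard errors may differ slightly from the table depending on implementation details:

import math

def jackknife_se(x, y, lam):
    # Leave one observation out at a time, refit, and use the spread of the
    # leave-one-out estimates to approximate the standard errors.
    n = len(x)
    fits = [deming_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:], lam) for i in range(n)]
    ses = []
    for k in range(2):                     # k = 0 for b0, k = 1 for b1
        vals = [f[k] for f in fits]
        mean = sum(vals) / n
        ses.append(math.sqrt((n - 1) / n * sum((v - mean) ** 2 for v in vals)))
    return ses                             # [SE(b0), SE(b1)]

se_b0, se_b1 = jackknife_se(x, y, lam=0.64)   # should be close to 0.0652 and 0.0484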

The rest of the table is the typical regression coefficient table where:

  • df: degrees of freedom, N – 1, where N is the number of parts in the study; some sources recommend using N – 2.
  • t Stat: the t statistic given by coefficient/standard error (i.e., 1.031/.0484 = 21.31).
  • p-value: the probability of getting that t statistic if the null hypothesis is true.
  • LCL: the lower confidence limit.
  • UCL: the upper confidence limit.

Let’s explore the p-value a little more. With the intercept (b0), we are testing two hypotheses:

H0: b0 = 0

H1: b0 ≠ 0

H0 is the null hypothesis; H1 is the alternative hypothesis. For the intercept, we calculated a t statistic (-0.597). The p-value represents the probability of getting this value of the t statistic if the null hypothesis is true. If the p-value is sufficiently small (often ≤ 0.05), we conclude that getting this t statistic by chance is unlikely and the null hypothesis is rejected – the intercept is different from 0. If the p-value is larger, there is no evidence against the null hypothesis, and the intercept could be 0.

The p-value for the intercept is 0.5552. Here, the p-value is large, so we do not reject the null hypothesis and conclude the intercept could be 0.

The same approach is used for the slope coefficient. We are testing the null hypothesis that the coefficient is 0. The p-value for the coefficient is very small (0.0000). We conclude that it is very unlikely we would get this t statistic if the null hypothesis were true, so we reject the null hypothesis and conclude that the coefficient is significantly different from 0.

The p-values can be calculated using the Excel function T.DIST.2T. The two entries are the absolute value of the t statistic and the degrees of freedom.

Now, let’s explore the confidence limits a little more. The confidence interval for the intercept is -0.172 to 0.0944. Note that this confidence interval contains 0. This means that it is possible for the intercept to be 0, which matches what we concluded from the p-value. The confidence interval for the coefficient is 0.932 to 1.129. Note that it does not include 0, so we conclude that the coefficient is significantly different from 0.

The confidence intervals are calculated as b ± t(SE), where b is the intercept or coefficient, t is the t value, and SE is the standard error of b. The t value can be calculated using Excel’s T.INV.2T function, where the inputs are the alpha value (usually 0.05) and the degrees of freedom.
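For readers who work in Python rather than Excel, scipy.stats.t provides the same two-sided probabilities and critical values; this sketch reproduces the p-value and confidence interval for the slope coefficient:

from scipy.stats import t as t_dist

b1, se_b1, df, alpha = 1.031, 0.0484, 29, 0.05
t_stat = b1 / se_b1                                   # about 21.3, as in the table
p_value = 2 * t_dist.sf(abs(t_stat), df)              # two-sided p-value (like T.DIST.2T)
t_crit = t_dist.ppf(1 - alpha / 2, df)                # critical t (like T.INV.2T), about 2.045
lcl, ucl = b1 - t_crit * se_b1, b1 + t_crit * se_b1   # about 0.932 to 1.130
print(round(p_value, 4), round(lcl, 3), round(ucl, 3))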

So Figure 3, along with the p-values and the confidence intervals, implies that the two methods give comparable results.

You can also do hypothesis testing to directly test if the Deming regression slope is 1. The null hypothesis is that b1 = 1 and the alternate hypothesis is that b1 ≠ 1. The t test statistic in this case is:

t = (b1 – 1)/SE(b1)

Likewise, you can directly test that the two means, x̄ and ȳ, are the same. In this case, the test statistic is:

t = (ȳ – x̄)/SE(ȳ – x̄)
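As an illustration, here is a short Python sketch of these two tests, taking the standard errors from the jackknife output shown in the table below (the variable names are illustrative):

from scipy.stats import t as t_dist

df = 29
b1, se_b1 = 1.031, 0.0484
t_slope = (b1 - 1) / se_b1                   # about 0.63
p_slope = 2 * t_dist.sf(abs(t_slope), df)    # about 0.53 - no evidence the slope differs from 1

ybar, xbar, se_diff = 1.342, 1.340, 0.00162
t_means = (ybar - xbar) / se_diff            # about 1.23
p_means = 2 * t_dist.sf(abs(t_means), df)    # about 0.23 - no evidence the means differ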

The output from the SPC for Excel program is given below for the hypothesis testing.

Hypothesis Testing

 

             Parameter   Standard Error   df   t Stat   p-value   LCL        UCL
Slope Test   0.0305      0.0484           29   0.632    0.5326    -0.0838    0.145
Means Test   0.00200     0.00162          29   1.232    0.2280    -0.00184   0.00584

You interpret this table the same way you did for the regression coefficients. The hypothesis tests for the slope are:

H0: b1 – 1 = 0

H1: b1 – 1 ≠ 0

Note that the slope test has a high p-value and the confidence interval contains 0. This means that b1 – 1 could be 0.

The null hypothesis for the means test is that the difference in the two means is 0; the alternate hypothesis is that the difference in the two means is not 0. The means test has a high p-value as well, and the confidence interval contains 0. So, you conclude that the means are the same.

The slope test and the means test further confirm that the two methods are comparable. Note that if either of these tests fails, the two methods are not comparable.

Another way to determine how good a model the Deming regression produces is to see if the residuals are normally distributed. There are a number of ways of doing this. One option in the SPC for Excel program is to determine if the standardized residuals are normally distributed by making a normal probability plot. Residuals are the differences between the actual values and the predicted values. The standardized residuals are the residuals divided by the standard deviation of the residuals. Figure 4 is a normal probability plot of the standardized residuals.

Figure 4: Normal Probability Plot of Standardized Residuals

If the points lie along the straight line, the residuals are approximately normally distributed.
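If you want to produce a similar plot yourself, here is a Python sketch (assuming the x, y, b0, and b1 objects from the earlier regression sketch) that computes the standardized residuals and draws a normal probability plot:

import statistics
import matplotlib.pyplot as plt
from scipy import stats

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]      # actual minus predicted
std_resid = [r / statistics.stdev(residuals) for r in residuals]

stats.probplot(std_resid, dist="norm", plot=plt)               # points near the line suggest normality
plt.title("Normal Probability Plot of Standardized Residuals")
plt.show()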

The conclusion for these two test methods is that they produce comparable results.

Summary

This publication introduced Deming regression. This technique is used to determine if two test methods produce comparable results. You start by ensuring both test methods are consistent and predictable and use those results to determine the measurement error and set the value of lambda. You then run paired measurements through both tests and perform the Deming regression to estimate the coefficients as well as perform the hypothesis tests. The slope test and means test confirm if the two methods are comparable.

   


Thanks so much for reading our SPC Knowledge Base. We hope you find it informative and useful. Happy charting and may the data always support your position.

Sincerely,

Dr. Bill McNeese
BPI Consulting, LLC


Comments
Cesana Bruno Mario

Very nice introduction to Deming’s regression.
However, the fact that we do not reject the null hypotheses intercept equal to zero and slope equal to 1 is not sufficient for concluding that the two measurement methods are in agreement. Indeed, the 95%CI of the slope (and intercept) are too wide. It is necessary to fix “a priori” two “agreement thresholds” (0.9 and 1.1 for example for the slope) and the conclusion of agreement is possible if the 95%CI are within these thresholds like an equivalence test.

Raphy Bar

What about the nonparametric Passing-Bablok procedure, which also compares two methods? Will this be incorporated in SPC for Excel?

Cesana Bruno Mario

The Passing-Bablok regression compares two methods by considering them in agreement if the 95% CI of the slope includes 1 and that of the intercept includes 0.
So, there is the problem I mentioned before: a 95% CI that is too wide points toward a lack of agreement yet wrongly allows one to conclude agreement. Furthermore, the testing procedure in the agreement setting is an equivalence test, and accordingly the required sample sizes have to be calculated. I think that the Passing-Bablok regression is a procedure that should be forgotten in an agreement analysis.
You can see my papers: Cesana Antonelli Ferraro Critical appraisal of the CLSI guideline EP09c 10.1515_clinical chemistry and laboratory medicine-2024-0595
Cesana Antonelli Sample_Size_for_Agreement_Studies_on_Quantitative variables_Epidemiology Biostatistics and Public Health 2024 for the sample size; paper correcting Shieh’s recent proposal.

Martin Gibson

Interesting article. I think this is what Professor Peter Sprent (University of Dundee) called stochastic regression circa 1969.
