Talk:Deming regression
This article is rated C-class on Wikipedia's content assessment scale.
Why Calculate Deming Regression
Can someone explain in the article why Deming regression is useful? It seems more logical to calculate the error for both X and Y than just for Y. But does it actually make a better fit? In what cases does it perform better than OLS?
How to Calculate Deming Regression
This is a request made after much frustration with this topic: Would someone who knows how to calculate a Deming regression please share that information? What is given in this Deming article is the standard textbook explanation. Frankly, despite 25 years as a professional clinical chemist, I have yet to find anyone who can actually do the Deming regression. Can anyone help? Otherwise, discussion of something like Deming regression that nobody in real life can actually do is a futile theoretical undertaking. Is there a "spreadsheet" available to do it?
I speculate that the Deming line is the average of the slopes and intercepts of the two regression lines that one gets when one calculates regression lines for the same X vs Y data upon reversing the axes (X vs Y, then Y vs X). But then, I am not sure.—Preceding unsigned comment added by Realusernamesareallused (talk • contribs) 21:05, 8 April 2007
Try this article: Cornbleet, P. J.; Gochman, N. "Incorrect Least-Squares Regression Coefficients in Method-Comparison Analysis." Clin Chem. 1979 Mar;25(3):432–438.
See http://peltiertech.com/WordPress/deming-regression/ ; Jon has a free utility for Excel that you can download. Haven't tried it myself, but Jon's stuff is usually pretty good. Beyond that, why on earth do you care how the calculation is done? If, like me, you don't have the math skills to do it, either you get a program (and there are a lot out there for a few hundred bucks) or you get someone to do it. The idea that you or I should understand the math is silly.
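For those who do want to see the calculation spelled out: the closed-form solution from the article fits in a short script. Below is a minimal sketch in Python (the function name `deming` and the argument defaults are my own, not from any of the tools mentioned above); it computes the sample moments and applies the standard slope formula with the error-variance ratio δ:

```python
import math

def deming(x, y, delta=1.0):
    """Deming regression of y on x.

    delta is the ratio of the y-error variance to the x-error
    variance; delta = 1 gives orthogonal regression.
    Returns (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # sample variances and covariance (divisor n - 1)
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    # positive root of the quadratic in the slope
    slope = (syy - delta * sxx
             + math.sqrt((syy - delta * sxx) ** 2
                         + 4 * delta * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    return intercept, slope
```

On exactly collinear data, e.g. `deming([1, 2, 3], [2, 4, 6])`, this recovers intercept 0 and slope 2, so it is easy to sanity-check against a spreadsheet.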
The easiest way to think of this calculation is when the x and y variables have the same uncertainty, that is, when the error-variance ratio δ equals one.
In which case, the errors are measured perpendicularly to the fitted line. (Surely this is wrong. If the x and y variables have the same uncertainty, then errors are measured at 45° to the horizontal. They are only measured perpendicular to the fitted line if the fitted line has a gradient of 1 or −1. The figure in the article is also wrong, I think.) The usual linear regression has the error measured vertically (in the y-direction), or sometimes horizontally, in the x-direction: just swap the usage of x and y for that. I have routinely used perpendicular errors when I can't say that either the x or the y values are "exact".
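The perpendicularity claim can be checked numerically. The sketch below is my own construction: it uses the article's x* formula with δ = 1 to locate the fitted point for an observation, and shows that the residual vector is orthogonal to the line's direction vector regardless of the line's slope (not only at gradient ±1):

```python
def residual_dot_direction(x, y, b0, b1):
    """For delta = 1 the fitted point for observation (x, y) is
    x_star = x + (b1 / (b1**2 + 1)) * (y - b0 - b1*x), and
    (x_star, b0 + b1*x_star) is the orthogonal projection of
    (x, y) onto the line y = b0 + b1*x.  Returns the dot product
    of the residual vector with the line's direction vector
    (1, b1); zero means perpendicular."""
    x_star = x + (b1 / (b1 ** 2 + 1)) * (y - b0 - b1 * x)
    y_star = b0 + b1 * x_star
    return (x - x_star) * 1 + (y - y_star) * b1
```

For example, `residual_dot_direction(3.0, 1.0, 0.5, 2.0)` evaluates to 0 even though the line's gradient is 2, which supports the original statement over the capitalized objection.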
In the usual ritual, suppose you have many measurements of weight and height. You could take the height measurements to the nearest inch (or centimetre) and calculate average weights for each grouping. With many observations in each grouping, and narrow groups, you could say that there is a uniform distribution of heights inside each group, and thus the average height of each such group would be very close to the middle of that group's span. (Say values of 60.15233, 60.49335, 60.3031, 60.7799, to indicate excessive precision of measurement, in the height group 60 to 61, whose span middle is 60.5. You should be careful about the measurement and the subsequent grouping and rounding: does a group named 60 represent values 59.5 to 60.5, or 60 to 61? For example, a name value of 2011 represents a time span centred in the middle of the year.) For narrow groups, one would presume that the deviation from uniform spread would be slight. So then you say that the expected value of the height in each group is exactly the middle of the span.
So, you now say that the grouped data are a set of exact x-variables (because the span of each group is exact although the measurements falling into them are inexact), with a y-variable (average weight) along which you will assess variation for the usual linear regression. You could do this the other way round, and group the weight measurements instead, in which case the weight grouping would become exact and the heights in each weight group averaged. There is often no reason beyond habit or chance to prefer one way to the other, though with a bit of thought, a rationale may arise.
But you could also perform the regression with the full set of individual measurements. This involves many more numbers, but computers are usually good at that. If so, you can no longer honestly claim exactness in one axis. You can ignore this point, or assert that one axis has measurements far more precise than the other (and this may be so) and stick with the standard linear regression: many do so anyway. Or use the perpendicular method; or finally, equipped with knowledge of the ratio of the accuracies in x and y, when it is not close to one, use the full method. Remember that the ratio will depend on the units employed. The same heights may be measured in inches, feet, centimetres, metres, millimetres or furlongs; whatever. This will affect the scale along the corresponding axis even though the data are the same. This ratio refers to the relative accuracies of the individual (x,y) measurements (and in principle different measurements in the set could have different accuracies; this escalates to weighting each observation according to its reliability), presumed the same (or close enough to the same) for all (x,y) values, and not to the variation of the variables across their ranges. That is, x values might be measured accurately or less accurately, and quite separately, x values may range over a wide or narrow span. NickyMcLean (talk) 22:38, 7 September 2011 (UTC)
Hi, I love the article. It seems, though, that if you solve the minimization problem listed you get two solutions for β₁, one for each sign of the square root:

β̂₁ = [s_yy − δ·s_xx ± √((s_yy − δ·s_xx)² + 4δ·s_xy²)] / (2·s_xy)
Why only take the positive solution? If you are using this for clinical instrument validation, that makes sense (since if your correlation is negative your instruments are clearly not aligned), but is there a better reason not to use the negative solution in a more general regression context?
Hi, the formula generally yields the extrema. The positive solution is the location of the minimum of the squared sum of deviations; the negative one is the maximum, "the worst solution". Regards: Imre
— Preceding unsigned comment added by Boros.i (talk • contribs) 05:27, 5 May 2014 (UTC)
I'm attempting to implement Deming regression, but am not particularly skilled with statistics, and thus find the solution section of the page rather inadequate. I've determined that the s_xx and s_yy values are sample variances, and that s_xy is a sample covariance (they would be the population variances and covariance if the data were the entire population and the divisor were N rather than N − 1). It would greatly help less sophisticated readers if the page said that, especially since there is a page showing how to calculate them.
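Concretely, those quantities are just the usual sample moments. A minimal sketch (the function name is mine); note that any common divisor, N or N − 1, cancels out of the slope formula, since that formula is homogeneous of degree zero in (s_xx, s_yy, s_xy):

```python
def sample_moments(x, y):
    """Sample variances s_xx, s_yy and sample covariance s_xy,
    all with divisor n - 1, as used in the Deming slope formula.
    Scaling all three by a common constant leaves the slope
    unchanged, so the choice of n vs n - 1 does not matter there."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    return sxx, syy, sxy
```

For instance, `sample_moments([1, 2, 3], [2, 4, 6])` gives (1.0, 4.0, 2.0).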
Regarding the calculation of :
- Shouldn't the denominator include a factor? It is included in the inverse calculation of .
- More generally, the calculation is more than a bit opaque. I would be helped by an English explanation, geometric justification and/or any refactoring of the formula.
Complaint by 120.149.105.18
User:120.149.105.18 made an edit, Special:Diff/1187893747, that states for the "Solution" section:
The equation below for the slope is faulty: see Linnet, K. Estimation of the linear relationship between the measurements of two methods with proportional errors. Statistics in Medicine 1990, 9 (12), 1463-1473. DOI: https://doi.org/10.1002/sim.4780091210 and NCSS Statistical Software. Deming Regression. https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Deming_Regression.pdf https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/PASS/Deming_Regression.pdf (accessed December, 2023).
I don't have the energy right now to check the references. Well, I did click on the NCSS one, but as James Synge said, the very opaque way the article calculates "the following quantities" does not let my hungry brain line them up with the NCSS formulae quickly. Artoria2e5 🌉 05:16, 9 March 2024 (UTC)