Talk:Confounding

This is the talk page for discussing improvements to the Confounding article.
This is not a forum for general discussion of the article's subject.

Addition of History Section

Added a section detailing the historical evolution of confounding, up to and including the modern causal interpretations. There are likely other sources that could be added here as well.

Added a paragraph to the causal definition.

Added a paragraph to the control of confounding section.

AForns (talk) 04:17, 20 August 2014 (UTC)

Reorganization of Introduction

Some elements of the introduction belong in existing sub-sections. I have reorganized several elements to improve the flow of the article, including:

The latter part of the introduction starting with: "In the case of risk assessments evaluating the magnitude and nature of risk to human health..." has been moved to the Types of Confounding sub-section.

The concrete example portion of the introduction has been moved to the Examples sub-section.

The segue for the causal definition was moved to the introduction (did not belong in the Examples).

Various simplifications were made to the causal definition.

Moved the causal definition and control of confounding sections below the introduction, since they are set-up material that should precede the examples.

AForns (talk) 10:20, 11 August 2014 (UTC)

Inclusion of Causal Definition and Example

Perhaps the first step in bringing the present article up to standards is providing an update that reflects modern understanding of causal calculus. With the community's approval and edits, I propose the following amendments that formalize and clarify the causal notion of confounding (see below):

(added to the end of the present section 2 or the beginning of the proposed section 3, below):

The above correlation-based definition, however, is metaphorical at best – a growing number of analysts agree that confounding is a causal concept and, as such, cannot be described in terms of correlations or associations [1][2][3] (see Causal Definition).

3. Causal Definition

The concept of confounding can be formalized, and managed, when information is available about the data-generating model (as in the Figure above). To be more specific, let X be some independent variable, Y some dependent variable, and M a causal model that asserts the cause-effect relationships between variables in the system. To estimate the effect of exposure X on outcome Y, the statistician must suppress the effects of extraneous variables that influence both X and Y. We say that X and Y are confounded by some other variable Z whenever Z is a cause of both X and Y.

In the causal framework, let $P(y \mid do(x))$ denote the probability of the event $Y = y$ under the hypothetical intervention $X = x$. X and Y are not confounded in causal model M if and only if the following holds:

$$P(y \mid do(x)) = P(y \mid x) \qquad (1)$$

for all values X = x and Y = y, where $P(y \mid x)$ is the conditional probability upon observing X = x. Intuitively, this equality states that X and Y are not confounded whenever the observationally witnessed association between them is the same as the association that would be measured in a controlled experiment, with X randomized.
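As a minimal sketch of how equation (1) can be checked when the model M is fully specified, the Python code below uses a hypothetical binary model with edges Z → X, Z → Y and X → Y and made-up probabilities (all names and numbers are illustrative assumptions, not taken from any source). The interventional quantity is obtained by deleting the factor P(x | z) from the factorization P(z) P(x | z) P(y | x, z), which is valid here because Z is the only parent of X. Since Y is binary, comparing the probabilities of Y = 1 is enough.

```python
from fractions import Fraction as F

# Hypothetical binary model with edges Z -> X, Z -> Y and X -> Y (made-up numbers).
P_Z = {1: F(1, 2), 0: F(1, 2)}                 # P(Z = z)
CONFOUNDED = {1: F(4, 5), 0: F(1, 5)}          # P(X = 1 | Z = z): Z influences X
RANDOMIZED = {1: F(1, 2), 0: F(1, 2)}          # P(X = 1 | Z = z): coin flip
P_Y1 = {(1, 1): F(3, 5), (1, 0): F(9, 10),     # P(Y = 1 | X = x, Z = z)
        (0, 1): F(2, 5), (0, 0): F(7, 10)}

def p_x_given_z(x, z, p_x1):
    return p_x1[z] if x == 1 else 1 - p_x1[z]

def p_y1_given_x(x, p_x1):
    """Observational P(Y = 1 | X = x) = sum_z P(Y = 1 | x, z) P(z | x)."""
    joint = {z: p_x_given_z(x, z, p_x1) * P_Z[z] for z in P_Z}  # P(X = x, Z = z)
    p_x = sum(joint.values())
    return sum(P_Y1[(x, z)] * joint[z] / p_x for z in P_Z)

def p_y1_do_x(x):
    """Interventional P(Y = 1 | do(X = x)): the factor P(x | z) is deleted."""
    return sum(P_Y1[(x, z)] * P_Z[z] for z in P_Z)

def confounded(p_x1):
    """True if equation (1) fails for some x (checking Y = 1 suffices for binary Y)."""
    return any(p_y1_given_x(x, p_x1) != p_y1_do_x(x) for x in (0, 1))
```

With Z driving treatment, P(Y = 1 | X = 1) = 33/50 while P(Y = 1 | do(X = 1)) = 3/4, so the pair is confounded; with coin-flip assignment the two quantities coincide, matching the intuition about randomized experiments.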

4. Control of Confounding

Consider the scenario of a physician deciding whether to administer drug X to a patient of gender Z. The physician knows that gender differences influence a patient's choice of drug as well as their chances of recovery. In this scenario, gender Z is a confounder of the effect of drug X on recovery outcome Y, since Z is a cause of both X and Y:

Causal diagram of Gender as common cause of Drug use and Recovery

Consequently, we will encounter the inequality:

$$P(y \mid do(x)) \neq P(y \mid x) \qquad (2)$$

since the observational quantity also reflects the association between X and Z (and, through Z, the influence of Z on Y), whereas the interventional quantity does not: it is the causal effect of X on Y itself. Clearly the statistician wants the causal quantity, but when only observational data are available, the standard way to estimate it without bias is to "adjust" for all confounding factors, namely, conditioning on their various values and averaging the result. In the case of a single confounder Z, this leads to the "adjustment formula":

$$P(y \mid do(x)) = \sum_{z} P(y \mid x, z)\, P(z) \qquad (3)$$

which gives an unbiased estimate of the causal effect of X on Y. The same adjustment formula works when there are multiple confounders, except that in this case the set Z of variables that guarantees unbiased estimates must be chosen with caution. The criterion for a proper choice of variables is called the Back-Door criterion [4][5]: it requires that the chosen set Z "blocks" (or intercepts) every path from X to Y that ends with an arrow into X, and that Z contain no descendant of X. Such sets are called "Back-Door admissible" and may include variables that are not common causes of X and Y, but merely proxies thereof.
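The following Python sketch checks the Back-Door criterion by brute force on a small directed acyclic graph, encoded (an assumption of this sketch) as a dict mapping each node to the set of its children. It follows the standard statement of the criterion: Z must contain no descendant of X, and every path between X and Y that starts with an arrow into X must be blocked, where a path is blocked if it passes through a non-collider that is in Z, or through a collider that is not in Z and has no descendant in Z. Path enumeration is exponential, so this is only meant for toy graphs; the function and graph names are hypothetical.

```python
def parents(dag, v):
    """Nodes with an edge into v."""
    return {u for u, children in dag.items() if v in children}

def descendants(dag, v):
    """All nodes reachable from v along directed edges (excluding v)."""
    seen, stack = set(), [v]
    while stack:
        for child in dag.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def backdoor_paths(dag, x, y):
    """All simple paths between x and y whose first edge points into x."""
    paths = []

    def extend(path):
        last = path[-1]
        if last == y:
            paths.append(path)
            return
        for nxt in set(dag.get(last, ())) | parents(dag, last):
            if nxt not in path:
                extend(path + [nxt])

    for p in parents(dag, x):
        extend([x, p])
    return paths

def is_blocked(dag, path, z):
    """True if conditioning on the set z blocks the given path."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        is_collider = node in dag.get(prev, ()) and node in dag.get(nxt, ())
        if is_collider:
            if node not in z and not (descendants(dag, node) & z):
                return True
        elif node in z:
            return True
    return False

def satisfies_backdoor(dag, x, y, z):
    z = set(z)
    if z & descendants(dag, x):
        return False
    return all(is_blocked(dag, p, z) for p in backdoor_paths(dag, x, y))

# Hypothetical example graphs (node -> set of children).
PHYSICIAN = {"Z": {"X", "Y"}, "X": {"Y"}, "Y": set()}
PROXY = {"U": {"X", "W"}, "W": {"Y"}, "X": {"Y"}, "Y": set()}
M_BIAS = {"A": {"X", "M"}, "B": {"M", "Y"}, "M": set(), "X": {"Y"}, "Y": set()}
```

In these examples, {Z} is admissible in the physician graph; W is admissible in PROXY even though it only lies on the path X ← U → W → Y and is not a common cause; and in M_BIAS the empty set is admissible, while adjusting for the collider M opens a path that was previously blocked.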

Returning to the physician example, since Z complies with the Back-Door requirement (i.e., it intercepts the one Back-Door path X $\leftarrow$ Z $\rightarrow$ Y), the Back-Door adjustment formula is valid:

$$\begin{aligned}P(Y={\text{recovered}} \mid do(X={\text{give drug}})) &= P(Y={\text{recovered}} \mid X={\text{give drug}}, Z={\text{male}})\,P(Z={\text{male}})\\ &\quad + P(Y={\text{recovered}} \mid X={\text{give drug}}, Z={\text{female}})\,P(Z={\text{female}})\end{aligned} \qquad (4)$$

In this way the physician can predict the likely effect of administering the drug from observational studies in which the conditional probabilities appearing on the right-hand side of the equation can be estimated by regression.
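To make the adjustment concrete, here is a minimal Python sketch that evaluates equation (4) from a table of hypothetical observational counts, using empirical frequencies in place of a regression model and assuming, as in the diagram, that gender is the only confounder. The counts and function names are made up for illustration; they are chosen so that the crude comparison and the adjusted comparison point in opposite directions.

```python
from fractions import Fraction as F

# Hypothetical observational counts: (gender, drug given?, recovered?) -> patients.
COUNTS = {
    ("male", True, True): 18, ("male", True, False): 12,
    ("male", False, True): 7, ("male", False, False): 3,
    ("female", True, True): 2, ("female", True, False): 8,
    ("female", False, True): 9, ("female", False, False): 21,
}
GENDERS = ("male", "female")
FIELDS = ("gender", "drug", "recovered")

def count(**fixed):
    """Number of patients whose cells match every field=value in `fixed`."""
    return sum(n for cell, n in COUNTS.items()
               if all(cell[FIELDS.index(k)] == v for k, v in fixed.items()))

def p_recovered_given_drug(drug):
    """Crude observational P(recovered | X = drug), ignoring gender."""
    return F(count(drug=drug, recovered=True), count(drug=drug))

def p_recovered_do_drug(drug):
    """Equation (4): sum over z of P(recovered | X = drug, Z = z) P(Z = z)."""
    total = count()
    return sum(F(count(gender=g, drug=drug, recovered=True), count(gender=g, drug=drug))
               * F(count(gender=g), total)
               for g in GENDERS)
```

With these counts the crude comparison favors the drug (1/2 versus 2/5 recovered), while the adjusted comparison reverses it (2/5 versus 1/2), because most treated patients belong to the stratum with the higher baseline recovery rate.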

References

[1] Pearl, J. (2009). Simpson's Paradox, Confounding, and Collapsibility. In Causality: Models, Reasoning and Inference (2nd ed.). New York, NY, USA: Cambridge University Press.
[2] VanderWeele, T. J., & Shpitser, I. (2013). On the definition of a confounder. Annals of Statistics, 41, 196–220.
[3] Greenland, S., Robins, J. M., & Pearl, J. (1999). Confounding and Collapsibility in Causal Inference. Statistical Science, 14(1), 29–46.
[4] Pearl, J. (1993). Aspects of Graphical Models Connected With Causality. Statistical Science.
[5] Pearl, J. (2009). Causal Diagrams and the Identification of Causal Effects. In Causality: Models, Reasoning and Inference (2nd ed.). New York, NY, USA: Cambridge University Press.

Additionally, I'd be happy to recruit several experts to touch up the other elements of the page, though I believe the above represents the first, best, and gentlest improvement to the article. Your thoughts and comments are appreciated.

AForns (talk) 06:35, 20 July 2014 (UTC)

Three-body problem in causation. Some examples may require an arbitrary estimation (the real world is a special case). I see no errors. It certainly has a generous component of English. 75.156.178.30 (talk) 17:08, 25 July 2014 (UTC)

Sorry to bring negative energy into this talk page but this article is a complete joke and should be flagged for deletion unless it gets serious help

The terms "negative" and "positive" correlation should never be used as they are relative terms and are not descriptive at all. Much better terms to use are "directly" and "inversely". 64.134.69.103 (talk) 22:21, 13 January 2014 (UTC)

A correlation near one means "almost directly". A correlation near negative one means "almost inversely". 75.156.178.30 (talk) 16:18, 25 July 2014 (UTC)

Page still incorrect and more confusing than ever - read this instead

Do not use the Wikipedia page if you want to understand confounding. Some of what is written is correct, but the most critical definition is INCORRECT. I don't have time to fix the page, but here is a simple explanation of the concept of confounding.

A confounder (or confounding variable) is something that is correlated with the independent (causative) variable you are investigating, and causes or prevents the effect (dependent variable) you are investigating. Because it is associated with both of them, it will interfere with the ability of statistical tests to correctly indicate the impact of your causative variable; that is, the confounder will cause biased estimates of the impact of your causative variable.

Note that a true confounder is itself another causative/preventive variable. (Variables that are only correlated with the effect won't cause confounding.) For instance, drinking and smoking are correlated; people who do one tend to do the other. Today we know that tobacco worsens heart disease, but alcohol is protective against heart disease. Tobacco's effect is bigger than alcohol's, so together they cause net harm. Early studies of alcohol use and heart disease indicated that alcohol CAUSED heart disease because researchers had no data on smoking. Once both factors were included (along with other important variables), the truth was understood. In this example, tobacco use was the confounder for early studies investigating the influence of alcohol on heart disease.

I'm a health economist, and we call confounders that, or confounding variables. Never heard the term lurking variable except on Wikipedia. —Preceding unsigned comment added by Scientist99 (talk • contribs) 23:18, 18 January 2008 (UTC)
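The smoking and alcohol story above can be illustrated with a small simulation. This is a sketch under an assumed linear model with made-up coefficients (smoking raises a disease score by 2.0 per unit, alcohol lowers it by 0.5, and alcohol use is correlated with smoking); it is not fitted to any real study, and the function names are hypothetical. Under these assumptions the population value of the crude slope of disease on alcohol is -0.5 + 2.0 × 0.8 = 1.1, so alcohol looks harmful, while adjusting for smoking (computed here with the Frisch–Waugh–Lovell residual trick) recovers a coefficient near the assumed -0.5.

```python
import random

def mean(v):
    return sum(v) / len(v)

def slope(x, y):
    """Least-squares slope of y on x (model includes an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(y, x):
    """Residuals of y after a simple linear regression on x."""
    b, mx, my = slope(x, y), mean(x), mean(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]

def simulate_alcohol_study(n=50_000, seed=0):
    rng = random.Random(seed)
    smoking = [rng.gauss(0, 1) for _ in range(n)]
    # Alcohol use correlated with smoking (assumed coefficient 0.8, so Var(alcohol) = 1).
    alcohol = [0.8 * s + rng.gauss(0, 0.6) for s in smoking]
    # Assumed true effects: smoking +2.0 (harmful), alcohol -0.5 (protective).
    disease = [2.0 * s - 0.5 * a + rng.gauss(0, 1) for s, a in zip(smoking, alcohol)]
    crude = slope(alcohol, disease)  # smoking omitted: confounded estimate
    # Coefficient on alcohol in a regression that also includes smoking.
    adjusted = slope(residualize(alcohol, smoking), residualize(disease, smoking))
    return crude, adjusted
```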

Actually, while I agree that this page is atrociously written and wrong, your alternative definition is not any better. Confounding is very easy to define: there is confounding between variables x and y if p(y | x) is not equal to p(y | do(x)). Confounders are very hard to define, and in fact defining them is currently an open problem (I cowrote a paper on the subject currently in review). Many seemingly reasonable definitions of 'confounder' fail. IlyaShpitser (talk) 19:52, 26 October 2011 (UTC)

Personally, that definition does not help me, because I have no idea what "do" in p(y|do(x)) means. That notation is also used in the Wikipedia page and is the reason I came here. The Wikipedia page does explain what dependent probability is, which I think is very common. I have never seen "do" before. At least a reference should be given. DavidFarago (talk) 16:39, 9 February 2023 (UTC)

The desire to merge the articles on 'lurking variable', 'confounding variable' and 'confounding' came about because of the bias toward pure mathematical knowledge, where I believe the term 'lurking variable' is used more than 'confounding variable' to describe a similar concept. Supposed (talk) 13:23, 22 January 2008 (UTC)

Fair comment - I propose to re-write this page completely in the next two weeks. I will put my text here first, and if no one objects strongly, I will replace it. Astaines (talk) 21:54, 2 August 2008 (UTC)

Hi. I'd be interested in contributing to that re-write, so can we make sure any structure for a new article is discussed first? Thanks. Davwillev (talk) 16:37, 6 August 2008 (UTC)

Ice Cream Murder

Are ice-creams and murder rates really a suitable example for comparison? --74.120.133.109 16:19, 13 August 2007 (UTC)

If they correlate, then as far as I know they correlate only because of a confounding factor like the weather. So, yes, it's a useful example. 75.156.178.30 (talk) 15:42, 25 July 2014 (UTC)
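For linear relationships, the ice cream example can be expressed with a partial correlation. As an assumed illustration (the numbers are made up), suppose ice cream sales and murders each correlate 0.7 with temperature; if temperature is their only link, their raw correlation would be 0.7 × 0.7 = 0.49, and the partial correlation after adjusting both for temperature is zero. The function name is hypothetical; the formula is the standard first-order partial correlation.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of X and Y after linearly adjusting both for Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

For example, `partial_correlation(0.49, 0.7, 0.7)` is zero up to rounding, whereas a raw correlation of 0.6 would leave a partial correlation of about 0.22 that temperature does not explain.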

Confounding factor should redirect here.--BigaZon 17:02, 18 February 2006 (UTC)

This is silly. Why are you placing the term 'confounding' under a 'lurking variable' umbrella? It's a term on its own that needs expanding; there could be an article on socio-economic confounding, which is a huge topic. I can't find anywhere in the discussion of socio-economic confounding where 'lurking variable' is mentioned. It's a facet of wikipedia. Confounding should have its own article. 152.78.123.52 23:53, 9 May 2006 (UTC)

No, Confounding factor deals with the statistical use of the term, interchangeable with "lurking variable" or "confounding variable", not "socio-economic" confounding... see introductory texts like [http://www.amazon.com/gp/product/0716747731/sr=8-1/qid=1153095480/ref=pd_bbs_1/102-8581294-8937716?ie=UTF8 "The Practice of Statistics"] and many others. Glad this now redirects properly. If you decide to make an article on some other type of confounding, you might want to make a disambiguation page. --BigaZon 00:31, 17 July 2006 (UTC)

I don't know why lurking variable can't mean unknown confounding variable. There's bound to be some mathematical difference between variables that affect significance and not the confidence interval. (I think I got that right; it could be the other way around.) 75.156.178.30 (talk) 15:42, 25 July 2014 (UTC)

Old requested move

I think "confounding variable" or "confound" is more commonly used than "lurking variable". Is anyone opposed to a name change? --Jcbutler 20:18, 11 February 2007 (UTC)

I agree. RandomMonitor 11:21, 1 March 2007 (UTC)

Me too - I was looking for information on this subject and I typed "confounding variable". Mrweetoes 21:28, 4 March 2007 (UTC)

Since "Confounding variable" already exists as a redirect, it requires an administrator to move. I've listed it as a request for move. Whosasking 20:09, 5 March 2007 (UTC)

I've moved the page, per the request at WP:RM and consensus here. Cheers. -GTBacchus (talk) 03:05, 11 March 2007 (UTC)

I noticed calls for improvement to this article from medicine and probability. From a medicine standpoint it seems like it needs to mention the immense effect of confounding variables on medical care and public health policy, with links to the Women's Health Initiative, Social Darwinism, and Evidence Based Medicine. Does that sound about right? Flkevin 02:27, 17 May 2007 (UTC)

Requested move

Confounding variable → Confounding — I recommend that this article be renamed as 'Confounding'. This should redirect from the 'Confounding variable' and 'Lurking variable' articles at least. This is standard terminology in Epidemiology (my own field) and in the quantitative sociology literature. Anyone know what economists call it? Astaines 23:18, 10 November 2007 (UTC)

I agree. 'Confounding' can occur in the absence of a known variable. Davwillev (talk) 20:47, 16 January 2008 (UTC)

I've moved the page, per the above discussion. Please let me know if I can be of further assistance. -GTBacchus (talk) 05:16, 24 January 2008 (UTC)

Major edit

Hi, I've rewritten the section on managing confounding. It was confusing and not very relevant. —Preceding unsigned comment added by Astaines (talk • contribs) 00:13, 28 November 2007 (UTC)

Yes, you've removed many, many interesting things. And the current revision is actually more confusing. For example:

In a typical situation there are far more controls than cases. It is very useful to have a guide for selecting controls.

Uh, what "guide"? I love Wikipedia, but those other people that rob and delete content under the guise of "improvements" are a huge pain. How about contacting the author before doing large changes? I will reinstate the old version of that section.--Keimzelle (talk) 18:45, 15 January 2008 (UTC)

Examples?

Are the Clever Hans effect, Hawthorne effect and Placebo effect examples of confounding factors? —Preceding unsigned comment added by 159.83.196.1 (talk) 21:55, 8 March 2011 (UTC)

No. Irrelevant. Jimjamjak (talk) 15:55, 12 December 2011 (UTC)

Expert-subject

I have moved this Talk contribution to the bottom and replaced the "accuracy" template in the article with an "expert" template as this seems a better one to use for the points here and above. Melcombe (talk) 11:45, 22 July 2011 (UTC)

I tend to agree with the comments at Talk:Confounding#Page still incorrect and more confusing than ever - read this instead.

Certainly in AP Statistics a distinction is made between lurking variables and confounding variables -- they are not considered to be exactly the same.

See for example the following [http://www.amazon.com/Barrons-Statistics-Martin-Sternstein-Ph-D/dp/0764140892/ref=sr_1_1?ie=UTF8&s=books&qid=1287717188&sr=1-1 book] (Topic 8: Planning and Conducting Experiments)

In terms of this statement:

The methodologies of scientific studies therefore need to control for these factors to avoid a type 1 error; an erroneous 'false positive' conclusion that the dependent variables are in a causal relationship with the independent variable.

When you make a type I error that means that you reject the null hypothesis when the null hypothesis in fact holds. However, the null hypothesis does not necessarily say anything about cause and effect.

So making a type I error does not necessarily mean that the researcher infers a false cause and effect relationship.

Jjjjjjjjjj (talk) 03:18, 22 October 2010 (UTC)

Correct, type I errors say nothing about cause and effect - that depends on whether causation is posited in the hypothesis. The misunderstanding here, I think, is the statement that the variables are in a causal relationship; the author could have said that the variables are correlated. That said, I'm not sure about the value of the observation. Paulwhaley (talk) 18:44, 30 April 2013 (UTC)

Risk assessment

I am very confused by the content pertaining to "risk assessment" in this article. It seems to me that the studies that are being described are not risk assessments, but epidemiological studies. The two references cited in the second paragraph are articles describing epidemiological studies. I am not denying that epidemiological data may be extensively used in some kinds of human health risk assessments, but I think that the article presents a very confused view of this fact. If I have misunderstood, please could someone clarify what kind of risk assessment this is actually referring to and cite something appropriate? If not, I would suggest that this content is essentially incorrect and I will remove it - there is a considerable amount of detail in other parts of the article dealing with epidemiological studies. Jimjamjak (talk) 15:39, 12 December 2011 (UTC)

Peer review

This is a very strange point: "Peer review is a process that can assist in reducing instances of confounding. It is a process of evaluating the provision, work process, or output of an individual or collective operating in the same field as the reviewer(s)." I would suggest that this abbreviated overview of peer review is not relevant to the page. Yes - peer review of an article based on an analysis where confounding variables were not identified may be one way in which that analysis can be improved, but then so would review by a colleague, and I don't think we need to describe that process here. Jimjamjak (talk) 15:51, 12 December 2011 (UTC)

You answered your own question in "Unidentified confounding variables". Peer review contributes to understanding causation. People have written books about causation. And if you do not get the mathematics, it's a three-body problem. Some statisticians may not be aware of relevant stratifications that have been done. So the trick in doing a good job of it is not only getting good raw data, but getting good approximations of the answer to a three-body problem in causation. 75.156.178.30 (talk) 16:43, 25 July 2014 (UTC)

Statistical significance

I would argue that the concept of statistical significance has no relevance here: "These two variables have a positive, and potentially statistically significant, correlation with each other". Jimjamjak (talk) 15:54, 12 December 2011 (UTC)

Ice cream example

I quite like the use of the ice cream example in this page, but I feel it is rather long-winded at present. I think that the same message could be put across with much less text, using the same example. Jimjamjak (talk) 15:58, 12 December 2011 (UTC)

Organization

The sections seem to go in no particular order. Also, the section "Decreasing the potential for confounding to occur" overlaps somewhat with "Experimental controls". It'd be helpful if someone could reorganize these sections such that there is more logical flow between them. My suggestion:

Move ice cream example to introduction
Move risk assessment section into examples, supplement with other real-world examples
Move types of confounding section to top, right after intro
Combine the controls section with the decreasing confounding section

Anyone can feel free to do this, or some other reasonable reorganization. Unfortunately, I don't have the time to do it myself right now. 138.16.21.199 (talk) 08:03, 10 December 2012 (UTC)

Two dubious statements

I have added "dubious" tags to these two statements, for the following reasons:

multivariate analyses reveal much less information about the strength of the confounding variable than do stratification methods.

As far as I've ever known, they find out exactly the same things if done correctly, though one or the other may be more convenient or easier to do in certain contexts.

The best available defense against this possibility [confounding] is often to ... conduct a randomized study of a sufficiently large sample taken as a whole, such that all confounding variables (known and unknown) will be distributed by chance across all study groups.

Unless I'm misunderstanding the intent of this sentence, it's just wrong. If the confounding variable is in fact a confounding variable -- if it is correlated with both the dependent variable and an independent variable of interest -- then this is exactly the approach that leads to spurious results, as the independent variable picks up effects that are really due to the omitted, correlated, confounding variable. Duoduoduo (talk) 18:33, 20 December 2012 (UTC)

Okay, I see the intent of the second quote above: it is intended to say that a potentially confounding variable is made to be not confounding by making it uncorrelated with the dummy variable for the control group. I'll try to find a way to clarify this in the article. My first dubious-tag discussed above still applies. Duoduoduo (talk) 16:51, 21 December 2012 (UTC)
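A small simulation can illustrate this reading: random assignment makes a potential confounder independent of treatment, so it no longer acts as a confounder. In this hypothetical model (all numbers and names are assumptions for illustration), a binary variable Z raises the recovery probability by 0.4 and the true effect of treatment is +0.3. When Z drives assignment, the crude difference in recovery rates has expected value 0.82 - 0.28 = 0.54; with coin-flip assignment it is close to 0.3.

```python
import random

def risk_difference(n, randomized, seed=0):
    """Crude difference in recovery rates, treated minus untreated, in one simulated study."""
    rng = random.Random(seed)
    outcomes = {True: [], False: []}
    for _ in range(n):
        z = rng.random() < 0.5                        # hypothetical binary confounder
        p_treat = 0.5 if randomized else (0.8 if z else 0.2)
        x = rng.random() < p_treat                    # treatment assignment
        y = rng.random() < 0.2 + 0.3 * x + 0.4 * z    # assumed true treatment effect: +0.3
        outcomes[x].append(y)
    treated, untreated = outcomes[True], outcomes[False]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)
```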

I think a property of confounding should be removed

I agree that 2) a confounder C should be associated with variable V and 3) with outcome O independently of T, and that 4) it should not be in the causal pathway between V and O. However, I'm not convinced of 1), the marginal association between C and O. Suppose that C is whether a person is blue or green, that green people would have lower outcomes O in the absence of the treatment V but are treated in a higher percentage than blue people, and that the treatment is effective (it leads to higher O), so that the two groups have the same marginal distribution of O. In this case, conditions 2 (treatment V is associated with being green, so with C), 3 (controlling for treatment, being green is associated with lower outcomes, so C and O are dependent after controlling for V) and 4 (treatment does not affect color, so C is not in the causal pathway between V and O) are satisfied, but 1 is not (blue and green have the same marginal distribution of O, so C and O are not marginally associated). However, in this case the estimate of the effect of the treatment would be biased downward if we didn't correct for color, because most of the treated would belong to the green group, i.e. the group with lower average potential values (i.e. conditional on the treatment V).

Edit: To be more precise, the 4 conditions may be sufficient, but they are not necessary. Given that C may be independent of O conditional on V, while we don't know how things would be for different values of V (so we should stress that conditioning is not only on factual but also on counterfactual values), I think the conditions should be rephrased as: (A) a confounder C should be associated with variable V and (B) with potential outcomes O(V), for each possible value of the variable V, and (C) it should not be in the causal pathway between V and O.

— Preceding unsigned comment added by borisba (talk) 10:49, 16 September 2016 (UTC)
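The blue/green scenario can be checked with exact arithmetic. The numbers below are a hypothetical instance (not from any study), and the names are illustrative: green individuals have worse outcomes without treatment but are treated more often, and the treatment raises the probability of a good outcome by 0.5 in both groups. Color and outcome are then marginally independent, yet the crude treatment effect (8/25 = 0.32) understates the stratum-specific effect (0.5), consistent with the downward bias described.

```python
from fractions import Fraction as F

COLORS = ("green", "blue")
P_COLOR = {"green": F(1, 2), "blue": F(1, 2)}              # P(C = c)
P_TREATED = {"green": F(4, 5), "blue": F(1, 5)}            # P(V = 1 | C = c)
P_GOOD = {("green", 1): F(3, 5), ("green", 0): F(1, 10),   # P(O = 1 | C = c, V = v)
          ("blue", 1): F(9, 10), ("blue", 0): F(2, 5)}

def p_v(v, c):
    return P_TREATED[c] if v == 1 else 1 - P_TREATED[c]

def p_good_given_color(c):
    """Marginal P(O = 1 | C = c), averaging over treatment."""
    return sum(P_GOOD[(c, v)] * p_v(v, c) for v in (0, 1))

def p_good_given_treatment(v):
    """Crude P(O = 1 | V = v), ignoring color."""
    weights = {c: p_v(v, c) * P_COLOR[c] for c in COLORS}  # P(C = c, V = v)
    return sum(P_GOOD[(c, v)] * w for c, w in weights.items()) / sum(weights.values())

def crude_effect():
    return p_good_given_treatment(1) - p_good_given_treatment(0)

def adjusted_effect():
    """Standardized over color: sum_c [P(O=1|c,V=1) - P(O=1|c,V=0)] P(c)."""
    return sum((P_GOOD[(c, 1)] - P_GOOD[(c, 0)]) * P_COLOR[c] for c in COLORS)
```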

Removed stunning POV text

Removed stunningly POV text. A big chunk was simply author’s personal agenda meandering waffle, with no references whatsoever. It was truly embarrassing for an encyclopaedia entry.

E.g.

It is not at all ethically obvious that the sufferers of medical conditions should be denied the opportunity to act as philanthropists by being denied the right to participate in such research. Surely it is their informed choice?

— Preceding unsigned comment added by Kevin aylward (talk • contribs) 17:01, 19 January 2019 (UTC)

Thanks for removing that. Checking the history, that section was added in Jan 2018 by an anonymous editor. Such things do get missed occasionally, unfortunately. --Qwfp (talk) 17:12, 19 January 2019 (UTC)

Positive and negative confounding

I think the article should mention positive confounding (increasing the observed effect), negative confounding (decreasing the observed effect), and qualitative confounding (a reversal of effect, or Simpson's Paradox). Agnerf (talk) 16:31, 23 March 2019 (UTC)
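Under one reading of these definitions (an assumption: "increasing" and "decreasing" are taken to mean moving away from or toward the null on a difference scale, where no effect is 0), the three cases can be distinguished by comparing a crude effect estimate with an adjusted one. The function name is hypothetical, and other conventions (for example, on a ratio scale where the null is 1) would need a different comparison.

```python
def classify_confounding(crude, adjusted):
    """Classify confounding by comparing crude and adjusted effects on a difference scale (null = 0)."""
    if crude * adjusted < 0:
        return "qualitative"   # the direction of the effect reverses
    if abs(crude) > abs(adjusted):
        return "positive"      # crude estimate is further from the null
    if abs(crude) < abs(adjusted):
        return "negative"      # crude estimate is closer to the null
    return "none"
```

For instance, a crude risk difference of 0.32 against an adjusted 0.5 would be classified as negative confounding, and a crude 0.1 against an adjusted -0.1 as qualitative.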

Would suicide's effect on unemployment be a good example?

Hi all. Early in the article, where confounding is first explained, would the following be a clear example? "Being unemployed raises an individual's risk of suicide. People who have a mental illness are more likely to be unemployed. They are also more likely to commit suicide. Thus mental illness is a confounding factor in the relationship between unemployment and suicide." (ref: Blakely TA, Collings SCD, Atkinson J. Unemployment and suicide. Evidence for a causal association? Journal of Epidemiology & Community Health 2003;57:863-600. https://jech.bmj.com/content/57/8/863.info) Snowinmelbourne (talk) 10:51, 11 June 2020 (UTC) Richard Snow ("Snowinmelbourne")

Internal or external validity?

The lead mentions internal validity only, but the Artifacts section mentions both, in language that leaves it unclear which one (if not both) was meant. The Crab Who Played With The Sea (talk) 11:54, 10 March 2023 (UTC)

Causal inference replacing statistics

I have replaced "statistics" with "causal inference" in the short description and in the first sentence of the introduction. Causal inference has some overlap with statistics, but it is interdisciplinary and really is a separate field. (This change involved replacing the link statistics with causal inference.) Johsebb (talk) 17:17, 13 June 2023 (UTC)

History

I have revised the history section to correct two points relating to statistics:

Fisher used the word "confounding" in his 1935 book "The Design of Experiments" to denote any source of error in his ideal of randomized experiment.

This is incorrect: Fisher was not concerned with "any source of error", but rather with the control of heterogeneity as a kind of nuisance factor, and viewed blocking as a way to deal with it. Confounding is a natural consequence of blocking, and the design challenge is to control confounding by choosing blocks appropriately. As far as I know, this has nothing to do with confounding as applied to causal inference, which is the focus of this article.

According to Vandenbroucke (2004) it was Kish who used the word "confounding" in the modern sense of the word, to mean "incomparability" of two or more groups (e.g., exposed and unexposed) in an observational study.

The use of the term "confounding" in causal inference is not more "modern" than Fisher's use -- confounding with blocks is very much a current area of research. I've reworded this accordingly.

I note that Vandenbroucke says that Fisher's discussion is

quite complicated to read and understand. No less than 40 pages of the 260 pages of his book are devoted to "confounded designs". The book is not chiefly remembered for it.

It's a fairly clear admission that Vandenbroucke doesn't understand Fisher. Moreover, his last assertion is clearly incorrect.

Finally, I have done a bit of re-paragraphing, and I corrected a reference to "Greenland, Pearl and Robins, 1999" (Pearl should be after Robins). Johsebb (talk) 17:22, 15 June 2023 (UTC)

Lurking variable vs. confounding variable

Some sources online say that lurking variables affect both the explanatory and response variables, while confounding variables only affect the response variable. This article does not discuss this distinction, and in general this article should be reworked.

I also have no clue what I'm talking about and I'm not qualified to talk about statistics, but I would like someone to deal with this. ItsTact (talk) 15:06, 8 November 2023 (UTC)
