Saturday, March 18, 2017

Causal inference (2): Confounding and adjustment

In my last post, I reviewed, in a non math-y way, Judea Pearl's definition of causality in terms of action: setting a variable from one value to another while leaving other variables in the system constant (Pearl, 2000; 2009).  Defining causality in terms of action implies that causality is different from association, which means that the concepts of association, such as correlation, regression, and adjustment, can never, by themselves, establish causality.  We can only identify a particular causal relationship by making assumptions, and our causal inferences are only as good as our justifications for these assumptions.

In this post, I will extend these ideas to multiple-variable designs and explain how we can estimate causal effects even in the presence of spurious causal influences.


Multiple variable designs


Each additional variable we add to a given model dramatically increases the number of possible causal relationships.  Continuing an example from my last post, let's assume that we're still interested in whether ice cream attitudes cause ice cream purchases, but, having just read about the Theory of Planned Behavior (Ajzen, 1991), we allow that it's pretty unlikely that attitudes are the only psychological variable that causes purchases.  We therefore add two additional psychological variables to our model: norms (to what extent do others view ice cream consumption as acceptable?) and perceived control (to what extent do we believe that we can control our ice cream consumption?).  Unfortunately, we haven't actually read any of Ajzen's writings about the Theory of Planned Behavior, so we have no idea of the precise causal relationships between our four variables.

This is a problem.  Given a state of ignorance, here is a graph of the possible causal relationships between our four variables:


This is not even exhaustive: Not shown are the six causal relationships between the unmeasured causes of attitudes, norms, perceived control and ice cream purchases.  In total, there are 18 possible causal relationships between our four variables!
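To spell out the arithmetic behind that count (as I'm tallying it): each ordered pair of the four variables could carry a directed arrow, and each unordered pair of their four unmeasured causes could be connected by a bidirected arrow, giving

$$4 \times 3 + \binom{4}{2} = 12 + 6 = 18$$

possible causal relationships.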

Even if we assume that we are really only interested in estimating one of these 18 relationships, our problem is completely intractable -- unless we are able to delete some of these 17 extraneous relationships by assuming that they are zero.  In other words, we must make a large number of causal assumptions to make our problem tractable, and each of these assumptions must be justified based on prior knowledge.  This is a formidable task, and we are unlikely to reach a point where we can justifiably delete all 17 paths besides the one of interest.


Identification


Fortunately, we are not limited to only estimating causal effects in systems where our cause and effect variables are isolated from everything else.  Given a particular graph, Tian and Pearl's identification condition (Tian & Pearl, 2002), as well as a variety of other criteria that are subsumed as special cases (e.g., the Causal Markov Condition), can tell us whether a given causal effect is identifiable from non-experimental data generated by that causal structure.

The intuition behind identification is that, although there is a bright line separating causal concepts from associative concepts, causal influences have associative consequences, and there are certain situations in which the associations we can observe are consistent with only one value of a particular causal effect.  Once we have determined that our causal assumptions pin down the effect in this way, we can use standard techniques for analyzing associative data, such as multiple regression, to estimate our causal effect of interest.


Identification through adjustment


One way of determining whether identification is possible hinges on the idea of adjustment.  That is, this method asks the following question: For a given causal effect, is there a set of variables in our causal graph such that, if we observe those variables and adjust our expectations accordingly, the adjustment process allows us to uniquely estimate our causal effect?

Adjustment relies on the notion of conditional independence, which is the idea that sometimes observing a particular variable gives us enough information about other variables that, once we have accounted for that information, the other variables are essentially random with respect to each other.  In the context of linear relationships between variables, adjustment corresponds precisely to the idea of "controlling for" extraneous influences in linear regression.
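
As a small illustration of what "accounting for" a variable buys us, here is a quick simulation (the variable names and coefficients are arbitrary, chosen only for illustration) of a chain A → B → C.  A and C are correlated, but once B is added to the regression, the coefficient on A collapses toward zero: given B, A tells us nothing more about C.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# A causes B, B causes C; there is no direct A -> C effect.
A = rng.normal(size=n)
B = 0.8 * A + rng.normal(size=n)
C = 0.5 * B + rng.normal(size=n)

def ols(y, *xs):
    """Least-squares slopes of y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones_like(y), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

print(ols(C, A))     # ~0.4: A is associated with C through B
print(ols(C, A, B))  # coefficient on A ~0: C is conditionally independent of A given B
```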

The assumptions embodied in causal graphs sometimes imply relationships of conditional independence.  For example, let's say that we finally get around to reading Ajzen's work and come up with the following causal graph (even though we read it, evidently we didn't internalize Ajzen's work very well).

"U" with a subscript indicates an unknown cause

We can find variables whose observation brings about relationships of conditional independence by finding variables that, if observed and accounted for, "block" the associative influence of a particular causal pathway.  There are three rules determining the effects of adjusting for a particular variable:

(1) If a variable emits an arrow along the pathway, adjusting for that variable blocks the pathway
(2) "Colliders", or variables that are sandwiched between two inward-facing arrows (→variable←) are blocked by default
(3) Adjusting for either a collider or one of its effects unblocks the pathway

The third rule might seem counter-intuitive.  What's happening is that, if two causes both create an effect, observing the value of that effect tells you something about the likely values of both causes, creating dependence between them.  For example, we know that both joy and sadness can cause tears.  Observing that someone is crying increases the likelihood of both joy and sadness; and if we then learn that the person is not sad, joy becomes more likely, so the two causes are no longer independent.  This result is known as Berkson's paradox, or conditioning on a collider.
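
A toy simulation makes this concrete (the variables and probabilities below are made up purely for illustration): joy and sadness are generated independently, and tears occur whenever either is present.  Overall the two causes are uncorrelated, but among people observed crying they become negatively correlated.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Two independent causes and their common effect (a collider).
joy = rng.random(n) < 0.3
sadness = rng.random(n) < 0.3
tears = joy | sadness

print(np.corrcoef(joy, sadness)[0, 1])                # ~0: independent in the full population
print(np.corrcoef(joy[tears], sadness[tears])[0, 1])  # < 0: dependent once we condition on tears
```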

To illustrate these rules in action: the pathway U_A ⟷ U_N → Norms → Purchases ← U_P is blocked as long as we do not adjust for the collider, purchases.  Adjusting for purchases or its effect, perceived control, unblocks this pathway.  In contrast, the Attitudes → Purchases → Control pathway is unblocked by default; adjusting for purchases blocks it.
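
These pathway-tracing rules can also be checked in software.  Below is a sketch using networkx's d-separation utility (nx.is_d_separator in recent releases; older releases call it nx.d_separated).  The graph is my encoding of the figure above, with the U_A ⟷ U_N arc represented by an explicit latent common cause L:

```python
import networkx as nx

# My encoding of the assumed causal graph; each U_* is an unmeasured cause.
G = nx.DiGraph([
    ("L", "U_A"), ("L", "U_N"),                 # stands in for U_A <-> U_N
    ("U_A", "Attitudes"), ("U_N", "Norms"),
    ("U_P", "Purchases"), ("U_C", "Control"),
    ("Norms", "Attitudes"), ("Norms", "Purchases"),
    ("Attitudes", "Purchases"), ("Purchases", "Control"),
])

# The U_A <-> U_N -> Norms -> Purchases <- U_P pathway is blocked by default...
print(nx.is_d_separator(G, {"U_A"}, {"U_P"}, set()))          # True
# ...but adjusting for the collider (purchases) or its effect (control) unblocks it.
print(nx.is_d_separator(G, {"U_A"}, {"U_P"}, {"Purchases"}))  # False
print(nx.is_d_separator(G, {"U_A"}, {"U_P"}, {"Control"}))    # False
```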

These rules let us select variables that allow the identification of a particular causal pathway.  For example, if we are interested in identifying the Attitudes → Purchases causal relationship, we can trace the unblocked pathways between attitudes and purchases that end in an arrow pointing into attitudes to determine the causal influences confounding our causal effect of interest.  Selecting nodes that block all of these extraneous pathways gives us a set of nodes that allows identification of our target pathway.  If no such nodes exist, identification by adjustment is not possible.

This is the intuition behind the back-door criterion, which tells us that a set of variables identifies a causal pathway between $X$ and $Y$ if:

(1) None of the variables is an effect of $X$
(2) The variables block all "back-door" pathways from $Y$ to $X$ (i.e., all pathways that end in an arrow pointing into $X$)

In our model above, there are two back-door pathways from purchases to attitudes:

Purchases ← Norms → Attitudes
Purchases ← Norms ← U_N ⟷ U_A → Attitudes

Adjusting for norms is sufficient to block both of these pathways, which identifies the causal relationship between attitudes and purchases.  In the context of linear relationships, adjusting for norms corresponds to adding norms to a linear model that predicts purchases from attitudes.
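
Here is a minimal simulation of that idea under the graph assumed above (all coefficients and noise scales are invented for illustration; the true Attitudes → Purchases effect is set to 0.5).  Regressing purchases on attitudes alone absorbs the back-door association through norms and over-estimates the effect; adding norms to the model recovers the coefficient we built in.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Latent common cause standing in for the U_A <-> U_N correlation.
L = rng.normal(size=n)
u_n = L + rng.normal(size=n)
u_a = L + rng.normal(size=n)

norms = u_n
attitudes = 0.7 * norms + u_a
purchases = 0.5 * attitudes + 0.6 * norms + rng.normal(size=n)  # true attitude effect = 0.5

def ols(y, *xs):
    """Least-squares slopes of y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones_like(y), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

print(ols(purchases, attitudes))         # > 0.5: confounded by the back-door paths through norms
print(ols(purchases, attitudes, norms))  # ~[0.5, 0.6]: adjusting for norms identifies the effect
```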

If you have taken a course in experiment design, you might be thinking that I just pulled a fast one on you.  After all, confounding is supposed to be the kiss of death for causal inference, even in experimental contexts.  What makes adjustment special here?

What gives adjustment its power is not any sort of statistical law, but rather the set of causal assumptions that govern our causal graph.  As I emphasized in my first post, causal inference is only as sound as the assumptions upon which it rests.  If our assumptions are flawed, our inferences will be flawed.

In multiple-variable designs, we must make a huge number of causal assumptions to allow any causal relationship to be identified.  Those assumptions need to be justified out in the open on the basis of prior knowledge.  So indeed, confounding can be the kiss of death for causal inference in experimental contexts because it can render the assumptions that we can usually make in these contexts (e.g., uncorrelated errors) untenable.

Conclusion


In multiple variable designs, the number of potential causal relationships increases dramatically above what is possible in the two-variable case.  This can make the identification of causal relationships intractable unless we are willing to make a large number of causal assumptions.

Given a sufficient number of assumptions, it is sometimes possible to use criteria like the back-door criterion to identify variables that, if adjusted for, allow the identification of causal relationships.  Even in these situations, our inferences are only as good as the assumptions upon which the inferences rest.


References

Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50, 179–211.

Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press.

Pearl, J. (2009). Causal inference in statistics: An overview. Statistics Surveys, 3, 96–146.

Tian, J., & Pearl, J. (2002). A general identification condition for causal effects. In AAAI/IAAI (pp. 567–573).
