# Causation

## The Axiomatic And Epistemological Turn: 1985–2004

Although there will always be those unwilling to give up on a reductive analysis of causation, by the mid-1980s it was reasonably clear that such an account was not forthcoming. What has emerged as an alternative, however, is a rich axiomatic theory that clarifies the role of manipulation in much the way Woodward wants and connects rather than reduces causation to probabilistic independence, as Nancy Cartwright insisted. The modern theory of causation is truly interdisciplinary and fundamentally epistemological in focus. That is, it allows a rigorous and systematic investigation of what can and cannot be learned about causation from statistical evidence. Its intellectual beginnings go back to at least the early twentieth century.

### Path analysis.

Sometime around 1920 the brilliant geneticist Sewall Wright realized that standard statistical tools were too thin to represent the causal mechanisms he wanted to model. He invented "path analysis" to fill the gap. Path analytic models are causal graphs (like those shown in Figs. 1 and 2) that quantify the strength of each arrow, or direct cause, which allowed Wright to quantify and estimate from data the relative strength of two or more mechanisms by which one quantity might affect another. By midcentury prominent economists (e.g., Herbert Simon and Herman Wold) and sociologists (e.g., Hubert Blalock and Otis Dudley Duncan) had adopted this representation. In several instances they made important contributions, either by expanding the representational power of path models or by articulating how one might distinguish one causal model from another with statistical evidence.

Path models, however, did nothing much to help model the asymmetry of causation.

In the simplest possible path model representing that X is a cause of Y (Fig. 3), we write Y as a linear function of X and an "error" term ε that represents all other unobserved causes of Y besides X (that is, Y = βX + ε). The real-valued coefficient β quantifies X's effect on Y. Nothing save convention, however, prevents us from inverting the equation and rewriting the statistical model as:
X = αY + δ, where α = 1/β and δ = −(1/β)ε

Figure 4. The asymmetry of common cause and common effect
SOURCE: Courtesy of the author

This algebraically equivalent model makes it appear as if Y is the cause of X instead of vice versa. Equations are symmetrical, but causation is not.

### Philosophy.

In the early 1980s two philosophers of causation, David Papineau and Daniel Hausman, paying no real attention to path analysis, nevertheless provided major insights into how to incorporate causal asymmetry into path models and probabilistic accounts of causation. Papineau, in a 1985 article titled "Causal Asymmetry," considered the difference between (1) two effects of a common cause and (2) two causes of a common effect (Fig. 4). He argued that two effects of a common cause (tar-stained fingers and lung cancer) are associated in virtue of having a common cause (smoking) but that two causes of a common effect (smoking and asbestos) are not associated in virtue of having a common effect (lung cancer). In fact he could have argued that the two effects of a common cause C are associated in virtue of C, but are independent conditional on C, whereas the two causes of a common effect E are not associated in virtue of E but are associated conditional on E.
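The contrast between a common cause and a common effect can be checked numerically. The sketch below is only an illustration under assumptions that are not in the text: all variables are linear with independent standard-normal noise, and partial correlation is used as a stand-in for conditional association (reasonable in the linear-Gaussian case, not in general). The function names are hypothetical.

```python
import math
import random

def corr(xs, ys):
    """Pearson correlation of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after linearly controlling for zs."""
    rxy, rxz, ryz = corr(xs, ys), corr(xs, zs), corr(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def simulate(n=20000, seed=0):
    """Simulate both structures and report marginal and conditional association."""
    rng = random.Random(seed)

    def noise():
        return [rng.gauss(0, 1) for _ in range(n)]

    # (1) Common cause: C -> A and C -> B (assumed unit coefficients).
    c = noise()
    a = [ci + u for ci, u in zip(c, noise())]
    b = [ci + u for ci, u in zip(c, noise())]

    # (2) Common effect: A2 -> E and B2 -> E, with A2 and B2 generated independently.
    a2, b2 = noise(), noise()
    e = [x + y + u for x, y, u in zip(a2, b2, noise())]

    return {
        "common_cause_marginal": corr(a, b),
        "common_cause_given_C": partial_corr(a, b, c),
        "common_effect_marginal": corr(a2, b2),
        "common_effect_given_E": partial_corr(a2, b2, e),
    }

results = simulate()
for name, value in results.items():
    print(f"{name}: {value:+.3f}")
```

Under these assumed models the population values are 0.5 and 0 for the common-cause pair (associated, then independent given C) and 0 and −0.5 for the common-effect pair (independent, then associated given E); the simulated estimates should land close to those figures.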

Daniel Hausman, in a 1984 article (and more fully in a 1998 book Causal Asymmetries), generalized this insight still further by developing a theory of causal asymmetry based on "causal connection." X and Y are causally connected if and only if X is a cause of Y, Y a cause of X, or there is some common cause of both X and Y. Hausman connects causation to probability by assuming that two quantities are associated if they are causally connected and independent if they are not. How does he get the asymmetry of causation? By showing that when X is a cause of Y, anything else causally connected to X is also connected to Y but not vice versa.

Papineau and Hausman handle the asymmetry of causation by considering not just the relationship between the cause and effect but also the way a cause and effect relate to other quantities in an expanded system. How does this help locate the asymmetry in the path analytic representation of causation? First, consider the apparent symmetry in the statistical model in Figure 3. X and ε are not causally connected and have Y as a common effect. Thus, following both Papineau and Hausman, we will assume that X and ε are independent and that in any path model properly representing a direct causal relation C → E, C and the error term for E will be independent. But now consider the equation X = αY + δ, which we used to make it appear that Y → X. Because of the way δ is defined, Y and δ will be associated, except for extremely rare cases.
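The association between Y and the rewritten error term can be made explicit with a short covariance calculation. This is a sketch that assumes the forward model is Y = βX + ε with β ≠ 0 and with X and ε uncorrelated; the symbols follow the inverted equation given earlier.

```latex
\begin{align*}
Y &= \beta X + \varepsilon, \qquad \operatorname{Cov}(X, \varepsilon) = 0,\\
\delta &= -\tfrac{1}{\beta}\,\varepsilon,\\
\operatorname{Cov}(Y, \delta)
  &= \operatorname{Cov}\!\left(\beta X + \varepsilon,\; -\tfrac{1}{\beta}\,\varepsilon\right)
   = -\operatorname{Cov}(X, \varepsilon) - \tfrac{1}{\beta}\operatorname{Var}(\varepsilon)
   = -\tfrac{1}{\beta}\operatorname{Var}(\varepsilon).
\end{align*}
```

In this linear setting the covariance is nonzero whenever the error term has positive variance, so the reversed equation fails the requirement that a cause be independent of the error term of its effect.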

### Statistics and computer science.

Path analytic models have two parts, a path diagram and a statistical model. A path diagram is just a directed graph, a mathematical object very familiar to computer scientists and somewhat familiar to statisticians. As we have seen, association and independence are intimately connected to causation, and they happen to be one of the fundamental topics in probability and statistics.

Paying little attention to causation, in the 1970s and early 1980s the statisticians Phil Dawid, David Spiegelhalter, Nanny Wermuth, David Cox, Steffen Lauritzen, and others developed a branch of statistics called graphical models that represented the independence relationships among a set of random variables with undirected and directed graphs. Computer scientists interested in studying how robots might learn began to use graphical models to represent and selectively update their uncertainty about the world, especially Judea Pearl and his colleagues at the University of California, Los Angeles (UCLA). By the late 1980s Pearl had developed a very powerful theory of reasoning with uncertainty using Bayesian Networks and the Directed Acyclic Graphs (DAGs) attached to them. Although in 1988 he eschewed interpreting Bayesian Networks causally, Pearl made a major epistemological breakthrough by beginning the study of indistinguishability. He and Thomas Verma characterized when two Bayesian Networks with different DAGs entail the same independencies and are thus empirically indistinguishable on evidence consisting of independence relations.
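The indistinguishability result is usually stated as a graphical criterion: two DAGs entail the same independencies exactly when they share the same adjacencies and the same unshielded colliders (pairs of nonadjacent variables with arrows into a common child). The sketch below checks that criterion; the edge-set representation and the function names are assumptions made for illustration, not notation from the text.

```python
from itertools import combinations

def skeleton(edges):
    """Adjacencies of a DAG, ignoring arrow direction."""
    return {frozenset(edge) for edge in edges}

def unshielded_colliders(edges):
    """All (parents, child) triples a -> child <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    nodes = {node for edge in edges for node in edge}
    colliders = set()
    for child in nodes:
        parents = sorted(cause for cause, effect in edges if effect == child)
        for a, b in combinations(parents, 2):
            if frozenset((a, b)) not in skel:
                colliders.add((frozenset((a, b)), child))
    return colliders

def markov_equivalent(dag1, dag2):
    """True if the two DAGs share a skeleton and their unshielded colliders."""
    return (skeleton(dag1) == skeleton(dag2)
            and unshielded_colliders(dag1) == unshielded_colliders(dag2))

# Each DAG is a set of (cause, effect) pairs over the hypothetical variables X, Y, Z.
chain = {("X", "Y"), ("Y", "Z")}      # X -> Y -> Z
reverse = {("Z", "Y"), ("Y", "X")}    # X <- Y <- Z
fork = {("Y", "X"), ("Y", "Z")}       # X <- Y -> Z
collider = {("X", "Y"), ("Z", "Y")}   # X -> Y <- Z

print(markov_equivalent(chain, reverse))   # True
print(markov_equivalent(chain, fork))      # True
print(markov_equivalent(chain, collider))  # False
```

The chain, its reversal, and the fork cannot be told apart by independence facts alone, whereas the collider can, which is the graphical counterpart of the common-cause versus common-effect asymmetry discussed above.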

### Philosophy again.

In the mid-1980s Peter Spirtes, Clark Glymour, and Richard Scheines (SGS hereafter), philosophers working at Carnegie Mellon, recognized that path analysis was a special case of Pearl's theory of DAGs. Following Hausman, Papineau, Cartwright, and others trying to connect rather than reduce causation to probabilistic independence, they explicitly axiomatized the connection between causation and probabilistic independence in accord with Pearl's theory and work by the statisticians Harry Kiiveri and Terrence Speed. Their theory of causation is explicitly nonreductionist. Instead of trying to define causation in terms of probability, counterfactuals, or some other relation, they are intentionally agnostic about the metaphysics of the subject. Instead, their focus is on the epistemology of causation, in particular on exploring what can and cannot be learned about causal structure from statistics concerning independence and association. SGS formulate several axioms connecting causal structure to probability, but one is central:

Causal Markov Axiom: Every variable is probabilistically independent of all of its noneffects (direct or indirect), conditional on its immediate causes.
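To see what the axiom asserts for a particular causal graph, one can enumerate, for each variable, its noneffects and its immediate causes. The sketch below does this for the smoking example used earlier; the graph encoding and the function names are hypothetical, and immediate causes are left out of the reported noneffects because conditioning on them makes their independence trivial.

```python
def descendants(edges, node):
    """All direct and indirect effects of `node` in the graph."""
    found, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for cause, effect in edges:
            if cause == current and effect not in found:
                found.add(effect)
                frontier.append(effect)
    return found

def markov_independencies(edges):
    """Map each variable to (noneffects, immediate causes) as the axiom pairs them.

    The axiom says the variable is independent of the first set
    conditional on the second. Variables with no noneffects are omitted.
    """
    nodes = {node for edge in edges for node in edge}
    result = {}
    for node in nodes:
        parents = {cause for cause, effect in edges if effect == node}
        non_effects = nodes - {node} - descendants(edges, node) - parents
        if non_effects:
            result[node] = (non_effects, parents)
    return result

# Smoking is a common cause of tar-stained fingers and lung cancer.
graph = {("Smoking", "Tar-Stained Fingers"), ("Smoking", "Lung Cancer")}

for variable, (non_effects, parents) in sorted(markov_independencies(graph).items()):
    print(f"{variable} is independent of {sorted(non_effects)} given {sorted(parents)}")
```

For this graph the axiom yields the screening-off claim from the common-cause discussion: tar-stained fingers and lung cancer are independent conditional on smoking.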

Figure 5. Ideal interventions in SGS theory
SOURCE: Courtesy of the author

The axiom has been the source of a vigorous debate (see the British Journal for the Philosophy of Science between 1999 and 2002), but it is only half of the SGS theory. The second half involves explicitly modeling the idea of a manipulation or intervention. All manipulability theories conceive of interventions as coming from outside the system. SGS model an intervention by adding a new variable external to the system that

1. is a direct cause of exactly the variable it targets and
2. is the effect of no variable in the system

and by assuming that the resulting system still satisfies the Causal Markov Axiom.

If the intervention completely determines the variable it targets, then the intervention is ideal. Since an ideal intervention determines its target and thus overrides any influence the variable might have gotten from its other causes, SGS model the intervened system by "x-ing out" the arrows into the variable ideally intervened upon. In Figure 5a, for example, we show the causal graph relating room temperature and wearing sweaters. In Figure 5b we show the system in which we have intervened upon room temperature with I1 and in Figure 5c the system after an ideal intervention I2 on Sweaters On.
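The "x-ing out" operation amounts to a small edit of the causal graph: delete every arrow into the target and add one arrow from the new intervention variable. The sketch below assumes that Figure 5a contains the single arrow Room Temperature → Sweaters On (the figure itself is not reproduced here) and uses a hypothetical edge-set encoding.

```python
def intervene(edges, target, name):
    """Return the graph after an ideal intervention `name` on `target`.

    Arrows into the target are removed, since an ideal intervention
    overrides the target's other causes, and the intervention is added
    as a direct cause of exactly the target.
    """
    surviving = {(cause, effect) for cause, effect in edges if effect != target}
    return surviving | {(name, target)}

# Assumed content of Figure 5a: room temperature causes sweater wearing.
figure_5a = {("Room Temperature", "Sweaters On")}

# Figure 5b: intervening on the cause leaves its arrow to the effect in place.
figure_5b = intervene(figure_5a, "Room Temperature", "I1")

# Figure 5c: intervening on the effect cuts it off from its former cause.
figure_5c = intervene(figure_5a, "Sweaters On", "I2")

print(sorted(figure_5b))
print(sorted(figure_5c))
```

After the intervention on room temperature, sweater wearing still depends on it; after the intervention on sweaters, the two variables are no longer connected, which is why manipulating sweaters tells us nothing about room temperature.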

This basic perspective on causation, elaborated powerfully and presented elegantly by Judea Pearl (2000), has also been adopted by other prominent computer scientists (David Heckerman and Greg Cooper), psychologists (Alison Gopnik and Patricia Cheng), economists (David Bessler, Clive Granger, and Kevin Hoover), epidemiologists (Sander Greenland and Jamie Robins), biologists (William Shipley), statisticians (Steffen Lauritzen, Thomas Richardson, and Larry Wasserman), and philosophers (James Woodward and Daniel Hausman).

How is the theory epistemological? Researchers have been able to characterize precisely, for many different sets of assumptions above and beyond the Causal Markov Axiom, the class of causal systems that is empirically indistinguishable, and they have also been able to automate discovery procedures that can efficiently search for such indistinguishable classes of models, including models with hidden common causes. Even in such cases, we can still sometimes tell just from the independencies and associations among the variables measured that one variable is not a cause of another, that two variables are effects of an unmeasured common cause, or that one variable is a definite cause of another. We even have an algorithm for deciding, from data and the class of models that are indistinguishable on these data, when the effect of an intervention can be predicted and when it cannot.

Figure 6. Non-screening off
SOURCE: Courtesy of the author

Like anything new, the theory has its detractors. The philosopher Nancy Cartwright, although having herself contributed heavily to the axiomatic theory, has been a vocal critic of its core axiom, the Causal Markov Axiom. Cartwright maintains that common causes do not always screen off their effects. Her chief counterexample involves a chemical factory, but the example is formally identical to another that is easier to understand. Consider a TV with a balky on/off switch. When turned to "on," the switch does not always make the picture and sound come on, but whenever it makes the sound come on, it also makes the picture come on (Fig. 6). The problem is this: knowing the state of the switch does not make the sound and the picture independent. Even having been told that the switch is on, for example, also being told that the sound is on adds information about whether the picture is also on.

The response of SGS and many others (e.g., Hausman and Woodward) is that it only appears as if we do not have screening off because we are not conditioning on all the common causes, especially those more proximate to the effects in question. They argue that we must condition on the Circuit Closed, and not just on the Switch, in order to screen off Sound and Picture.
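The TV example and the proposed reply can be worked through with exact probabilities. The sketch below conditions throughout on the Switch being on; the probabilities `p_closed` and `p_works` are hypothetical numbers chosen for illustration, and `p_works = 1` corresponds to the text's case in which sound and picture always come on together.

```python
from fractions import Fraction
from itertools import product

def joint(p_closed=Fraction(4, 5), p_works=Fraction(1)):
    """Joint distribution over (circuit, sound, picture), given Switch = on.

    p_closed: assumed chance that the balky switch closes the circuit.
    p_works:  assumed chance that sound (or picture) comes on once the
              circuit is closed, independently for each of the two.
    """
    dist = {}
    for circuit, sound, picture in product([True, False], repeat=3):
        p = p_closed if circuit else 1 - p_closed
        for is_on in (sound, picture):
            if circuit:
                p *= p_works if is_on else 1 - p_works
            else:
                p *= 0 if is_on else 1  # open circuit: nothing comes on
        dist[(circuit, sound, picture)] = p
    return dist

def prob_picture(dist, circuit=None, sound=None):
    """P(picture on | given values); None if the condition has probability 0."""
    def matches(c, s):
        return (circuit is None or c == circuit) and (sound is None or s == sound)
    denom = sum(p for (c, s, _), p in dist.items() if matches(c, s))
    if denom == 0:
        return None
    return sum(p for (c, s, pic), p in dist.items() if matches(c, s) and pic) / denom

tv = joint()
print(prob_picture(tv))                            # 4/5: switch on only
print(prob_picture(tv, sound=True))                # 1: sound adds information
print(prob_picture(tv, circuit=True))              # 1: circuit closed
print(prob_picture(tv, circuit=True, sound=True))  # 1: sound adds nothing more
```

Given only the switch, learning that the sound is on raises the probability of the picture from 4/5 to 1, which is Cartwright's failure of screening off. Given that the circuit is closed, the sound carries no further information, which is the reply attributed to SGS, Hausman, and Woodward.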

A deeper puzzle along these same lines arises from quantum mechanics. A famous thought experiment, called the Einstein-Podolsky-Rosen experiment, considered a coupled system of quantum particles that are separated gently and allowed to diverge. Each particle is in superposition, that is, it has no definite spin until it is measured. J. S. Bell's famous inequality shows that no matter how far apart we allow them to drift, the measurements on one particle will be highly correlated with the other, even after we condition on the state of the original coupled system. There are no extra hidden variables (common causes) we could introduce to screen off the measurements of the distant particles. Although the details are quite important and nothing if not controversial, it looks as if the Causal Markov Axiom might not hold in quantum mechanical systems. Why it should hold in macroscopic systems when it might not hold for their constituents is a mystery.

The SGS model of an intervention incorporates many controversial assumptions. In a 2003 tour de force, however, James Woodward works through all the philosophical reasons why the basic model of intervention adopted by the interdisciplinary view is reasonable. For example, Woodward considers why a manipulation must be modeled as a direct cause of only the variable it targets. Not just any manipulation of our roomful of sweater-wearing people will settle the question of whether sweater wearing causes the room temperature. If we make people take off their sweaters by blowing super-hot air on them—sufficient to also heat the room—then we have not independently manipulated just the sweaters. Similarly, if we are testing to see if confidence improves athletic performance, we cannot intervene to improve confidence with a muscle relaxant that also reduces motor coordination. These manipulations are "fat hand"—they directly cause more than they should.

Woodward covers many issues like this one and develops a rich philosophical theory of intervention that is not reductive but is illuminating and rigorously connects the wide range of ideas that have been associated with causation. For example, the idea of an independent manipulation illuminates and solves the problems we pointed out earlier when discussing the counterfactual theory of causation. Instead of assessing counterfactuals like (1) George would not have plunged into the East River had he not jumped off the Brooklyn Bridge and (2) George would not have jumped off the bridge had he not plunged into the East River, we should assess counterfactuals about manipulations: (1) George would not have plunged into the East River had he been independently manipulated to not jump off the Brooklyn Bridge and (2) George would not have jumped off the bridge had he been independently manipulated not to have plunged into the East River. The difference is in how we interpret "independently manipulated." In the case of (2) we mean that we assign George to not plunging but leave everything else as it was, for example, by catching George just before he dunks. On this way of conceiving of the counterfactual, George would still have jumped off the bridge, and so we can recover the asymmetry of causation once we augment the counterfactual theory with the idea of an independent manipulation, as Woodward argues.