# Probability

## Interpretations Of Probability

The classical interpretation, historically the first, can be found in the works of Pascal, Huygens, Bernoulli, and Leibniz, and it was famously presented by Laplace (1814). It assigns probabilities in the absence of any evidence and in the presence of symmetrically balanced evidence. In such circumstances, probability is shared equally among all the possible outcomes—the so-called principle of indifference. Thus, according to the classical interpretation, the probability of an event is simply the fraction of the total number of possibilities in which the event occurs. This interpretation was inspired by, and typically applied to, games of chance that by their very design create such circumstances—for example, the classical probability of a fair die landing with an even number showing up is 3/6. Notoriously, the interpretation falters when there are competing sets of possible outcomes. What is the probability that the die lands 6 when tossed? If we list the possible outcomes as {1, 2, 3, 4, 5, 6}, the answer appears to be 1/6. But if we list them as {6, not-6}, the answer appears to be 1/2.
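The partition problem can be made concrete with a small sketch. The helper `classical_probability` is a hypothetical name introduced here for illustration; it simply applies the principle of indifference to whatever list of outcomes it is given, which is exactly why the answer depends on how the outcomes are listed.

```python
from fractions import Fraction

def classical_probability(outcomes, event):
    """Share probability equally among the listed outcomes (principle of
    indifference) and return the fraction of outcomes in which `event` holds."""
    outcomes = list(outcomes)
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

die = [1, 2, 3, 4, 5, 6]

# A fair die showing an even number: 3 of the 6 possibilities.
p_even = classical_probability(die, lambda o: o % 2 == 0)

# The same question about a 6, asked under two competing partitions.
p_six_fine = classical_probability(die, lambda o: o == 6)
p_six_coarse = classical_probability(["6", "not-6"], lambda o: o == "6")

print(p_even, p_six_fine, p_six_coarse)  # 1/2 1/6 1/2
```

Nothing in the calculation itself says which partition is the right one; that choice has to come from outside the principle of indifference.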

The logical interpretation retains the classical interpretation's idea that probabilities are determined a priori by the space of possibilities. But the logical interpretation is more general in two important ways: the possibilities may be assigned unequal weights, and probabilities can be computed whatever the evidence may be, symmetrically balanced or not. Indeed, the logical interpretation seeks to determine universally the degree of support or confirmation that a piece of evidence E confers upon a given hypothesis H. Rudolf Carnap (1950) thus hoped to offer an "inductive logic" that generalized deductive logic and its relation of "implication" (the strongest relation of support).

A central problem with Carnap's program is that changing the language in which hypotheses and items of evidence are expressed will typically change the confirmation relations between them. Moreover, deductive logic can be characterized purely syntactically: one can determine whether E implies H, or whether H is a tautology, merely by inspecting their symbolic structure and ignoring their content. Nelson Goodman showed, however, that inductive logic must be sensitive to the meanings of words, for syntactically parallel inferences can differ wildly in their inductive strength. So inductive logic is apparently not of a piece with deductive logic after all.

Frequency interpretations date back to Venn (1876). Gamblers, actuaries, and scientists have long understood that relative frequencies are intimately related to probabilities. Frequency interpretations posit the most intimate relationship of all: identity. Thus, the probability of heads on a coin that lands heads in 7 out of 10 tosses is 7/10. In general, the probability of an outcome A in a reference class B is the proportion of occurrences of A within B.

Frequentism still has the ascendancy among scientists who seek to capture an objective notion of probability independent of individuals' beliefs. It is also the philosophical position that lies in the background of the classical approach of Ronald A. Fisher, Jerzy Neyman, and Egon S. Pearson that is used in most statistics textbooks. Frequentism faces some major objections, however. For example, a coin that is tossed exactly once yields a relative frequency of heads of either 0 or 1, whatever its true bias: the infamous problem of the single case. Some frequentists (notably Hans Reichenbach and Richard von Mises) go on to consider infinite reference classes of hypothetical occurrences. Probabilities are then defined as limiting relative frequencies in infinite sequences of trials. If there are in fact only finitely many trials of the relevant type, this requires the actual sequence to be extended to a hypothetical or virtual infinite sequence. This creates new difficulties. For instance, there is apparently no fact of the matter of how the coin in my pocket would have landed if it had been tossed once, let alone an indefinitely large number of times.

A well-known problem for any version of frequentism is that relative frequencies must be relativized to a reference class. Suppose that you are interested in the probability that you will live to age eighty. Which reference class should you consult? The class of all people? All people of your gender? All people who share your lifestyle? Each choice may yield a different frequency. Narrowing the class to those who share all of your properties leaves only you, and then the problem of the single case returns.
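A minimal sketch of finite frequentism, the single-case problem, and the reference class problem follows. The function name `relative_frequency` and the eight records in `people` are invented purely for illustration; they are not data from any study.

```python
from fractions import Fraction

def relative_frequency(reference_class, has_attribute):
    """Proportion of members of the reference class that have the attribute."""
    members = list(reference_class)
    hits = sum(1 for m in members if has_attribute(m))
    return Fraction(hits, len(members))

# A coin that lands heads in 7 out of 10 tosses.
ten_tosses = "HHTHHTHHTH"
p_heads = relative_frequency(ten_tosses, lambda t: t == "H")

# The single case: one toss can only give a frequency of 0 or 1.
p_single = relative_frequency("T", lambda t: t == "H")

# Hypothetical toy records: the same attribute, different reference classes.
people = [
    {"active": False, "reached_80": False},
    {"active": False, "reached_80": False},
    {"active": False, "reached_80": True},
    {"active": True, "reached_80": True},
    {"active": True, "reached_80": True},
    {"active": True, "reached_80": False},
    {"active": True, "reached_80": True},
    {"active": True, "reached_80": True},
]
reached_80 = lambda p: p["reached_80"]
p_everyone = relative_frequency(people, reached_80)
p_active = relative_frequency([p for p in people if p["active"]], reached_80)

print(p_heads, p_single, p_everyone, p_active)  # 7/10 0 5/8 4/5
```

The two answers for the toy population differ only because the reference class differs, and the identity of frequency with probability does not say which class to use.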

Propensity interpretations, like frequency interpretations, regard probability as an objective feature of the world. Probability is thought of as a physical propensity or disposition or tendency of a given type of physical situation to yield an outcome of a certain kind, or to yield a long-run (perhaps infinite) relative frequency of such an outcome. This view, which originated with Karl Popper (1959), was motivated by the desire to make sense of single-case probability attributions, particularly those found in quantum mechanics, on which frequentism apparently foundered (see Gillies for a useful survey).

A prevalent objection is that it is not informative to be told that probabilities are propensities. For example, what exactly is the property in virtue of which this coin, when suitably tossed, has a "propensity" of 1/2 to land heads? Indeed, some authors regard it as mysterious whether propensities even obey the axioms of probability in the first place. To the extent that propensity theories are parasitic on long-run frequencies, they also seem to inherit some of the problems of frequentism.

Subjectivist interpretations, pioneered by Frank P. Ramsey (1926) and Bruno de Finetti (1937), regard probabilities as degrees of belief, or credences, of appropriate agents. These agents cannot be actual people, since, as psychologists have repeatedly shown, people typically violate probability theory in various ways, often spectacularly so. Instead, we have to imagine the agents to be ideally rational. Ramsey thus regarded probability theory as the "logic of partial belief." Underpinning subjectivism are so-called Dutch Book arguments. They begin by identifying agents' degrees of belief with their betting dispositions, and they then prove that anyone whose degrees of belief violate the axioms of probability is "incoherent," that is, susceptible to guaranteed losses at the hands of a cunning bettor. Equally important, but often neglected, is the converse theorem that adhering to the probability axioms protects one from such an ill fate. Subjectivism has proven to be influential, especially among social scientists, Bayesian statisticians, and philosophers.
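The simplest Dutch Book can be sketched as follows. It assumes, as the argument does, that an agent with credence c in a proposition regards c times the stake as a fair price for a bet paying the stake if the proposition is true. The function name and the particular credences are hypothetical, and the sketch covers only the case where the credences over a two-cell partition sum to more than 1 (when they sum to less than 1, the bettor buys the bets from the agent instead).

```python
from fractions import Fraction

def net_payoff_by_world(credences, stake=1):
    """Agent buys one bet per cell of a partition, each at price credence * stake.

    Exactly one cell is true in any world, so the agent collects one stake
    whatever happens; the net payoff is that stake minus the total price paid.
    """
    price_paid = sum(credences.values()) * stake
    return {world: stake - price_paid for world in credences}

# Credences in A and not-A that sum to 6/5: a violation of the axioms.
incoherent = {"A": Fraction(3, 5), "not-A": Fraction(3, 5)}
# Credences that sum to 1.
coherent = {"A": Fraction(3, 5), "not-A": Fraction(2, 5)}

sure_loss = net_payoff_by_world(incoherent)
break_even = net_payoff_by_world(coherent)

print(sure_loss["A"], sure_loss["not-A"])    # -1/5 -1/5
print(break_even["A"], break_even["not-A"])  # 0 0
```

The coherent case shows only that this particular book fails against coherent credences; the converse theorem mentioned above is the general result and is not established by one example.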

A more general approach, again originating with Ramsey, begins with certain axioms on rational preferences. One example is transitivity: if you prefer A to B and B to C, then you prefer A to C. It can be shown that if you obey these axioms, then you can be represented by a probability function (encapsulating your credences about various propositions) and a utility function (encapsulating the strengths of your desires that these propositions come about). This means that you will rate the choiceworthiness of an action open to you according to its expected utility: a weighted average of the utilities of the possible outcomes associated with that action, with the corresponding probabilities providing the weights. This is the centerpiece of decision theory.
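As a rough illustration of expected utility as a probability-weighted average, consider the sketch below. The actions, credences, and utilities are made-up numbers chosen only to show the arithmetic, and `expected_utility` is a hypothetical helper name.

```python
from fractions import Fraction

def expected_utility(outcomes):
    """Weighted average of utilities, with probabilities as the weights.

    `outcomes` is a list of (probability, utility) pairs whose probabilities
    sum to 1.
    """
    return sum(p * u for p, u in outcomes)

# Hypothetical credence of 3/10 in rain, with invented utilities.
p_rain, p_dry = Fraction(3, 10), Fraction(7, 10)
actions = {
    "take umbrella": [(p_rain, 8), (p_dry, 6)],
    "leave umbrella": [(p_rain, 0), (p_dry, 10)],
}

utilities = {name: expected_utility(outs) for name, outs in actions.items()}
best = max(utilities, key=utilities.get)

print(utilities["take umbrella"], utilities["leave umbrella"], best)
# 33/5 7 leave umbrella
```

With a higher credence in rain the ranking would flip, which is the sense in which credences and utilities jointly determine choice.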

Radical subjectivists such as de Finetti recognize no constraints on initial (or "prior") subjective probabilities beyond their conforming to axioms (A1) to (A3). But they typically advocate a learning rule for updating probabilities in the light of new evidence. Suppose that you initially have credences given by a probability function $P_{\text{initial}}$, and that you become certain of E (where E is the strongest such proposition). What should your new probability function $P_{\text{new}}$ be? The favored updating rule among Bayesians is conditionalization, where $P_{\text{new}}$ is related to $P_{\text{initial}}$ as follows:

(Conditionalization) $P_{\text{new}}(X) = P_{\text{initial}}(X \mid E)$ (provided $P_{\text{initial}}(E) > 0$)
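Conditionalization can be sketched over a finite set of possibilities, assuming credences are stored as a mapping from each possibility to its probability. The function name `conditionalize` and the die example are illustrative choices, not part of the original presentation. The proviso that the evidence has positive initial probability appears as an explicit check, since the ratio is undefined otherwise.

```python
from fractions import Fraction

def conditionalize(p_initial, evidence):
    """Return the new credences after becoming certain of `evidence`.

    `p_initial` maps each possibility to its initial probability;
    `evidence` is the set of possibilities compatible with what was learned.
    """
    p_e = sum(p for w, p in p_initial.items() if w in evidence)
    if p_e == 0:
        raise ValueError("conditionalization is undefined when P_initial(E) = 0")
    # Possibilities ruled out by E drop to 0; the rest are rescaled by 1 / P(E).
    return {w: (p / p_e if w in evidence else Fraction(0))
            for w, p in p_initial.items()}

# Equal initial credence in each face of a die, then learning "the face is even".
p_initial = {face: Fraction(1, 6) for face in range(1, 7)}
p_new = conditionalize(p_initial, {2, 4, 6})

print(p_new[6], p_new[1], sum(p_new.values()))  # 1/3 0 1
```

The updated credences still sum to 1, so the result is again a probability function over the same possibilities.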

Radical subjectivism has faced the charge of being too permissive. It apparently licenses credences that we would ordinarily regard as crazy. For example, you can assign without its censure a probability of 0.999 to your being the only thinking being in the universe—provided that you remain coherent (and update by conditionalization). It also seems to allow fallacious inference rules, such as the gambler's fallacy (believing, for instance, that after a surprisingly long run of heads, a fair coin is more likely to land tails). A standard defense (e.g., Howson and Urbach) appeals to famous convergence-to-truth and merger-of-opinion results. Their upshot is that in the long run, the effect of choosing one prior rather than another is attenuated: successive conditionalizations on the evidence will, with probability 1, make a given agent eventually converge to the truth, and thus initially discrepant agents eventually come to agreement. Some authors object that these theorems tell us nothing about how quickly the convergence occurs; in particular, they do not explain the unanimity that we in fact often reach, and often rather rapidly.
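The attenuation of priors can be illustrated, though not proven, with a toy case. The sketch assumes two agents whose credences about a coin's bias are Beta distributions, for which conditionalizing on h heads and t tails turns Beta(a, b) into Beta(a + h, b + t). The agents, their priors, and the 7-in-10 data pattern are all hypothetical.

```python
from fractions import Fraction

def posterior_mean(a, b, heads, tails):
    """Mean of the Beta(a + heads, b + tails) posterior for the chance of heads."""
    return Fraction(a + heads, a + b + heads + tails)

def disagreement(prior_1, prior_2, heads, tails):
    """Absolute gap between two agents' posterior means on the same evidence."""
    return abs(posterior_mean(*prior_1, heads, tails)
               - posterior_mean(*prior_2, heads, tails))

sceptic = (1, 9)    # prior mean 1/10 for heads
believer = (9, 1)   # prior mean 9/10 for heads

# Both agents see the same tosses: 7 heads in every 10, for 0, 10, 100, 1000 tosses.
gaps = [disagreement(sceptic, believer, 7 * k, 3 * k) for k in (0, 1, 10, 100)]

print(gaps)
# [Fraction(4, 5), Fraction(2, 5), Fraction(4, 55), Fraction(4, 505)]
```

In this example the gap after n tosses is 8 / (10 + n), so it shrinks toward 0 but at a rate fixed by how far apart the priors started. That is the point of the objection: convergence in the limit says little about agreement after a realistic amount of evidence.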