Susan Haack is generally a good philosopher (I interviewed her a few years ago). She’s made important strides in unifying disparate positions in epistemology (and I am very fond of unification efforts in philosophy: I think they are generally on the right track, in every domain; as for example my unification of moral theory). But every philosopher makes a boner mistake or two (me included; and I’m always happy to discover and correct them). And for Haack’s part, it’s that she doesn’t understand Bayesian epistemology. So much so, that she ironically denounces it, while going on to insist we replace it with her otherwise-correct foundherentist epistemology…which can be entirely described by Bayes’ Theorem.

This is a common mistake made by people who don’t like Bayes’ Theorem. I made it myself, repeatedly, when I was just as hostile to Bayes’ Theorem as an epistemological model, until I realized I was doing it…and then I became a Bayesian. All of us who make that boner mistake say “Bayes’ Theorem entails x; x is bad; if we fix x, we get y,” and unbeknownst to us, y is just a restatement of Bayes’ Theorem. This happens because the person who thinks BT entails x is wrong. And if they understood how in fact it does not entail x, they would see that y is just a correct form of Bayes’ Theorem. Hence, ironically, they are correcting their own mistake in understanding Bayes’ Theorem, while mistakenly thinking they are replacing Bayes’ Theorem. In other words, we unconsciously straw man BT, then build a steel man, and never discover that our steel man is the actual BT.

The spoiler is that all correct epistemologies are mathematically modeled by BT. All. There is no valid epistemology that is not simply described by BT. When you translate Haack’s epistemology from English into mathematical notation, you end up with BT. They all reduce to it. And thus, you might then notice, they all reduce to each other. Hence the merits of unification efforts—even flawed epistemologies, when you correct them, become a system modeled by BT. For instance, unifying intuitionism with empiricism, by merging the realities and virtues of each while discarding their flaws, ends you up with an engine of reason that accurately describes reality (e.g. how intuition actually works, hence my discussion of intuition science in Sense and Goodness without God III.9.1, pp. 178-80) and is entirely modeled mathematically by Bayes’ Theorem. Science has shown our intuitive and perceptual brain mechanisms use calculations that, though not necessarily Bayesian, increasingly approximate Bayesian reasoning, and become Bayesian at the point of their idealization: if we projected forward, in an imaginary evolution, to a maximally accurate mechanism in the brain, that end-point would be a perfect application of BT.

Today I’ll demonstrate this with Haack’s epistemology, starting from her critique of BT, and then what she intends to replace it with and why she thinks her replacement is better. A lot can be learned about how BT actually works, and how an epistemology can be correctly built out of it, by following this thought process.

Context

Susan Haack knows Bayesians keep telling her she doesn’t understand Bayesian epistemology. In the second edition of her seminal work Evidence and Inquiry (2009) she wrote that she “wasn’t greatly surprised” that “mainstream epistemologists” thought she was “just willfully blind to the epistemological power of Bayes’ Theorem” (p. 22, still the only mention of Bayesian reasoning in the whole book; to the contrary, she denounces even probability as being relevant to epistemology, e.g. p. 20, so you can see how far wrong she already is). I don’t think she’s willfully blind (I wasn’t, when I was as hostile as she is to the notion; I was simply mistaken). And I think her critics are as mistaken as she is in how to explain the problem. It’s not that Bayes’ Theorem has some sort of mystical “epistemological power.” It’s that even her epistemology can be written out on a chalkboard in mathematical notation using Bayes’ Theorem. Her critics might not have conceptualized it that way yet, but really, that’s what they mean by its epistemological power. We’re all talking about the same thing—BT—we just don’t realize it yet. It’s the engine driving everyone’s car. They’re all just describing it with different words, and thus mistakenly thinking they’re describing a different thing.

Hence I’ve demonstrated that Inference to the Best Explanation is Bayesian, and the Hypothetico-Deductive Method is Bayesian (Proving History, pp. 100-03 and 104-06, respectively); and Historical Method as a whole (ibid., passim; and see Nod to Tucker and Tucker’s Review); and all empirical arguments whatever (ibid., pp. 106-14). It all ends up boiling down to BT, in different descriptions and applications.

In her much more recent treatise on legal epistemology, Evidence Matters (2014), Haack asserts that “subjective Bayesianism is still dismayingly prevalent among evidence scholars,” even though, she insists, “probabilistic conceptions of degrees of proof” are “fatally flawed” (p. xviii). So she intends to replace it with her two-step process of first showing that epistemology is really all about “degrees to which a conclusion must be warranted by the evidence presented” and then showing that her “foundherentist epistemology” solves the problem she claims exists with that (in particular, the problem that “degrees of epistemic warrant simply don’t conform to the axioms of the standard calculus of probabilities,” and therefore “degrees of proof cannot plausibly be construed probabilistically”). But only someone who isn’t correctly applying “the axioms of the standard calculus of probabilities” could conclude such a thing. And that’s what happens with Haack, as I’ll show.

One caveat I must make is that Evidence Matters is about legal epistemology, which I think Haack might not realize is confounded with risk theory. We do not set standards in law that track what actually warrants human belief, but standards that manage risk. We can all have a fully justified belief in a conclusion that could nevertheless never be proved in a court of law by the standards therein. Because what the law wants to do is not simply warrant belief, but also reduce the frequency of bad outcomes (innocent people being jailed, for example), and thus it manages risk by setting higher standards when worse outcomes are risked. Thus, civil court requires a much lower burden of evidence than criminal court, because the penalties—the costs of being wrong—are correspondingly less. Likewise, private organizations will have their own fact-finding standards (e.g. to determine if a student cheated or an employee stole or harassed someone) that are even lower than those of civil court, yet still adequate to warrant some measure of human belief, all simply because the risk of their being wrong is likewise lower.

This makes the law not the best model for a general epistemological theory, except insofar as we want to talk about not epistemology but decision theory: what level of certainty should we have before making a certain decision based on our belief. Which is a values decision, not an epistemic one, and as such will always be calibrated to the cost of being wrong, which actually has less to do with whether the proposition at issue is true (a proposition’s being false can have either enormous or trivial consequences; a thing merely being true or false does not entail one or the other, but only in relation to a whole system of facts apart from it). And yet, contra Haack, this still necessarily requires us to formulate the degree of warrant in our believing anything as a probability. Otherwise, we cannot predict the frequency of risked outcomes (how often we will be wrong, when applying a given epistemic standard to a given quality of evidence). And there is no way to get a probability of a belief being true as a conclusion, without obeying probability theory. And there is no way to apply probability theory without Bayes’ Theorem describing what you are doing (as I demonstrate is a necessary fact of logic for all empirical propositions in Proving History, pp. 106-14).

Nevertheless, hereafter, I’m no longer talking about standards in law, which are matched to risk tolerance and are not actually models of belief formation in general. I’m only talking from now on of the general claims Haack is making about Bayes’ Theorem, and the relationship between evidence and probability generally. That she might be confusing risk tolerance with belief warrant in her book I’ll set aside as possible but not what I’m concerned with here. Similarly, some of what is in the law (from statutory evidence standards, judicial precedents, and jury instructions) is actually illogical and probably wrong; yet Haack treats it all as flawlessly perfect and the true normative standard for all epistemology, which is a bit absurd. For example, she makes remarks about jury instructions that never consider the very real possibility that our legal system is not correctly instructing juries how to weigh evidence. I will not address this aspect of her book either, though I think it’s responsible for a lot of her mistakes in it.

Haack Against BT

Haack’s principal attack in “Degrees of Warrant Aren’t Mathematical Probabilities” is on p. 62, where she claims probability theory cannot explain degrees of warrant because:

(a) Mathematical probabilities form a continuum from 0 to 1; but because of the several determinants of evidential quality, there is no guarantee of a linear ordering of degrees of warrant. [Which she says was already argued by J.M. Keynes in his 1921 A Treatise on Probability.]

(b) The mathematical probabilities of p and of not-p must add up to 1; but when there is no evidence, or only very weak evidence, either way, neither p nor not-p is warranted to any degree.

(c) The mathematical probability of (p & q) is the product of the probability of p and the probability of q—which, unless both have a probability of 1, is always less than either; but combined evidence may warrant a claim to a higher degree than any of its components alone would do. [Which she says was also argued by L.J. Cohen in his 1977 The Probable and the Provable.]

Every one of these statements is wrong.

The Problem of Diminishing Probabilities Is Actually Solved by BT

As to (c): Cohen has been refuted on this point by Tucker, for example, in “The Generation of Knowledge from Multiple Testimonies” (Social Epistemology 2015), whose conclusion can easily be generalized to all forms of evidence that meet the same generic conditions, not just eyewitness testimony (Cohen has also been refuted on many other points of probability theory by numerous experts, and IMO should no longer be cited as an authority on the subject: see David Kaye’s Paradoxes, Gedanken Experiments and the Burden of Proof, as a whole, and also his footnote 5 on p. 635). In effect, Tucker shows that the problem Cohen reports is actually eliminated by Bayesian reasoning. With Bayes’ Theorem, multiple items of evidence can multiply together to produce an ever-smaller conjunctive probability and still increase the probability of the conclusion—even quite substantially.

Bayes’ Theorem has this effect in two ways. First, because we are comparing the likelihoods across hypotheses, and it’s the ratio of them that produces the probability of a belief being true; so the likelihoods can be vanishingly small probabilities and it makes no difference—the ratio can still be enormous. And second, the low innate probability of what’s being testified to can make matching testimonies highly unlikely on their falsity, and thus their conjunction increases the posterior probability, not the other way around.

The first point is fundamental to Bayes’ Theorem (see If You Learn Nothing Else, point one). Suppose we want to know how likely it is that a particular document was forged. If we have three items of evidence, and the probability those items would exist if the document was forged is 0.7, 0.5, and 0.3, respectively, then the probability of all three items being observed (assuming they are independent) is 0.7 x 0.5 x 0.3 = 0.105, which is much less probable than any one of those items alone, which seems counter-intuitive. But we aren’t done. Bayes’ Theorem requires us to input also what the probability is of those same three items of evidence on the conclusion that the document wasn’t forged. If those probabilities are 0.6, 0.4, and 0.1, respectively, and thus their conjunction is 0.6 x 0.4 x 0.1 = 0.024, the ratio between this conjunct probability and the other is 0.105/0.024 = 4.375. The document is over four times more likely to have been forged on the conjunction of this evidence being observed.

In other words, even though the probability of all three items of evidence on the forgery hypothesis is a mere 10%, which is itself three or more times less than the probability of encountering any of those items of evidence alone, the probability of all three on the non-forgery hypothesis is over four times less than even that mere 10%. And mathematically the effect is that the odds that the document was forged will increase, in fact by more than four times, even though that conjunction of evidence was only 10% expected on the forgery theory. And if we kept adding evidence of the same character, both “probabilities of the evidence” would continue to decrease, yet the probability of forgery would at the same time just as continually increase. That’s always the effect of adding supporting evidence in BT.
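
To make the arithmetic concrete, here is a minimal sketch in Python. This is my own illustration, not anything from Haack or the literature cited here; the per-item likelihoods are the values from the example above, and the 50/50 prior is an assumption added just to complete the calculation:

```python
# A minimal sketch of the forgery example above. The three per-item
# likelihoods are the illustrative values from the text; the 50/50 prior
# is an assumed placeholder, not part of the original example.

def posterior(prior_h, likelihoods_h, likelihoods_not_h):
    """Posterior probability of h, given independent items of evidence."""
    p_e_h = 1.0
    for p in likelihoods_h:
        p_e_h *= p          # P(e|h): the conjunction shrinks with each item
    p_e_not_h = 1.0
    for p in likelihoods_not_h:
        p_e_not_h *= p      # P(e|~h): here it shrinks even faster
    joint_h = prior_h * p_e_h
    return joint_h / (joint_h + (1.0 - prior_h) * p_e_not_h)

# The likelihood ratio: 0.105 / 0.024 = 4.375, as in the text.
print(round((0.7 * 0.5 * 0.3) / (0.6 * 0.4 * 0.1), 3))             # 4.375
# With the assumed 50/50 prior, the posterior probability of forgery:
print(round(posterior(0.5, [0.7, 0.5, 0.3], [0.6, 0.4, 0.1]), 3))  # 0.814

# Keep adding evidence of the same character (each new item carrying that
# same 4.375 ratio): every conjunction gets less probable, yet the
# posterior keeps rising, exactly as described above.
p = 0.5
for _ in range(4):
    p = posterior(p, [0.35], [0.08])  # 0.35 / 0.08 = 4.375
    print(round(p, 4))                # 0.814, 0.9503, 0.9882, 0.9973
```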

This fact I simplify out of common math problems for historians by factoring out “coefficients of contingency” (see Proving History, index, along with all of pp. 214-28). Which simplifies the math, without changing anything, but it can disguise the fact that those coefficients are there, e.g. that conjunctions of evidence are always less probable than single items of it, because there are so many possible ways things can play out (see my discussion regarding “the exact text” of Mark’s Gospel in Proving History, p. 219; and also the role of inexact prediction in simplifying Bayesian calculations: ibid., pp. 214-28). But that being the case makes no difference to the posterior probability, which is always based on the ratio of the conjunct probabilities of the evidence, not the conjunct probabilities of the evidence alone.

Haack seems to not know this. She seems to think that because the conjunct probability of added evidence goes down, the probability of the conclusion must go down too. That’s false. That does not have to happen in Bayesian reasoning. It can—if the ratio between those conjunct probabilities (on h and on ~h) goes down, then so does the probability of the conclusion (h); but if that ratio goes up, the probability of the conclusion always goes up as well—no matter how astronomically improbable that conjunction of evidence becomes. Bayes’ Theorem thus solves the problem of diminishing probabilities. It is disturbing that Haack doesn’t know this. It’s fundamental to Bayesianism. And it is in fact the very thing that made the work of Thomas Bayes groundbreaking.

The second way Bayes’ Theorem doesn’t have the effect Haack claims relates to the innate improbability of chance conjunctions. This, too, has been known for hundreds of years, ever since Laplace demonstrated it (as discussed by Tucker, ibid., p. 4):

The generation of knowledge from testimonies has been formalized at least since Laplace’s treatise on probabilities (1840, 136–156). Laplace demonstrated first that the posterior probability of a hypothesis supported by a single testimony is identical to the reliability of the testimony. Single testimonies transmit their epistemic properties. But in the groundbreaking last couple of pages of the chapter on testimonies, Laplace formalized the generation of knowledge from multiple testimonies by employing Bayes’ theorem. He demonstrated that in a draw of one from one hundred numbers (where the prior probability of each number is 1:100), when two witnesses testify that the same number was drawn and their reliabilities are respectively 0.9 and 0.7, the posterior probability of the truth of the testimonies leaps to 2079/2080. Laplace showed that multiple testimonies can generate knowledge that has higher reliability than their own. Low prior probability increases the posterior probability of what independent testimonies agree on. It is possible to generate knowledge even from unreliable testimonies, if the prior probability of the hypothesis they testify to is sufficiently low. Lewis (1962, 346) and Olsson (2005, 24–26) reached identical conclusions to Laplace’s.

Thus, as Tucker goes on to point out (after surveying more of the modern literature confirming the mathematical point), emphasis mine:

Courts convict beyond reasonable doubt on the exclusive basis of the multiple testimonies of criminals who are individually unreliable, as long as the prior probabilities of their testimonies are low and their testimonies are independent. Historians look for testimonies in archives and for corroborating independent testimonies in corresponding archives, irrespective of the individual reliabilities of the sources. Investigative journalists, likewise, search assiduously for second corroborating independent testimonial sources as do intelligence analysts. Common to all these expert institutional practices is the inference of knowledge from multiple testimonies that can be individually unreliable or whose reliability cannot be estimated.

As Laplace explained two hundred years ago (here pp. 122-23 of the linked English translation), regarding those witnesses to a random number being drawn from a hundred different numbers:

Two witnesses of this drawing announce that number 2 has been drawn, and one asks for the resultant probability of the totality of these testimonies. One may form these two hypotheses: the witnesses speak the truth; the witnesses deceive. In the first hypothesis the number 2 is drawn and the probability of this event is 1/100. It is necessary to multiply it by the product of the veracities of the witnesses, veracities which we will suppose to be 9/10 and 7/10: one will have then 63/10,000 for the probability of the event observed in this hypothesis. In the second, the number 2 is not drawn and the probability of this event is 99/100. But the agreement of the witnesses requires then that in seeking to deceive they both choose the number 2 from the 99 numbers not drawn: the probability of this choice if the witnesses do not have a secret agreement is the product of the fraction 1/99 by itself; it becomes necessary then to multiply these two probabilities together, and by the product of the probabilities 1/10 and 3/10 that the witnesses deceive; one will have thus 1/330,000 for the probability of the event observed in the second hypothesis. Now one will have the probability of the fact attested or of the drawing of number 2 in dividing the probability relative to the first hypothesis by the sum of the probabilities relative to the two hypotheses; this probability will be then 2079/2080, and the probability of the failure to draw this number and of the falsehood of the witnesses will be 1/2080.

In other words, with one witness (one piece of evidence), the report is only as credible as the witness; and on the prior alone, the number’s being drawn was ninety-nine times more likely to be false than true. But with two witnesses (two pieces of evidence) the truth of the report is almost assured: it becomes more than two thousand times more likely to be true than false. Thus, once again, conjunctions of evidence, when we factor in the improbability of chance conjunctions (as we must, since BT requires all its probabilities to be conditioned on all known data), actually increase the probability of hypotheses, not the other way around as Haack alleged.
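
Laplace’s numbers are easy to verify. Here is a minimal sketch in Python, using exact fractions and only the values quoted above:

```python
from fractions import Fraction as F

# Verifying Laplace's two-witness example with exact fractions.
prior = F(1, 100)            # prior probability that number 2 was drawn
r1, r2 = F(9, 10), F(7, 10)  # veracities (reliabilities) of the two witnesses

# Joint probability of (2 drawn) and (both testify "2"): both speak the truth.
joint_true = prior * r1 * r2                            # 63/10000

# Joint probability of (2 not drawn) and (both testify "2"): both deceive,
# and each independently picks the same wrong number out of the 99 not drawn.
joint_false = (1 - prior) * F(1, 99) * F(1, 99) * (1 - r1) * (1 - r2)  # 1/330000

print(joint_true / (joint_true + joint_false))  # 2079/2080, as Laplace said
```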

So when we use BT correctly, we do not assign the degree or weight of evidence as being equal to merely the probability of that evidence on a theory (the probability that that evidence would be observed), but as being equal to the ratio of the probabilities of the evidence on a theory and its denial. Both must be calculated. And placed in ratio to each other. (And all facts weighed in.) This is called the “likelihood ratio,” and this is how we measure evidential weight in Bayesian epistemology. Since this fact completely eliminates Haack’s third objection, we can call strike one on her, and look at her other two attempts at bat.

BT Explains the Condition of No Evidence Equaling No Warrant

As to (b): Haack’s statement that “when there is no evidence, or only very weak evidence, either way, neither p nor not-p is warranted to any degree” is both false and moot.

First: It’s false because in the absence of specific evidence, conclusions remain warranted on considerations of prior probability. For example, I may have no evidence bearing on whether my car’s disappearance was caused by magic or theft, but it is in no way true that my belief that “it wasn’t magic” is not warranted in “any degree.” In fact, in that instance, my belief that not-p can be extremely well warranted—and on no evidence whatever as to what specifically happened to my car. Likewise, my belief that my car was stolen can be almost certainly true and I can be well warranted in concluding so, without any evidence that my car was stolen—other than its absence.

Unless, of course, priors are decent that I misplaced the car, that it was legally towed, or whatever else, but that just expands the same point to multiple hypotheses: in the absence of specific evidence, we are warranted in believing the prior probabilities, all the way down the line. For example, if 2 in 3 times cars go missing in my neighborhood it’s theft and the other 1 in 3 it’s “legally towed,” I am fully warranted in believing it’s twice as likely my car was stolen than towed. Of course, that’s still close enough to warrant checking for evidence of legal towing before being highly confident it was stolen; but that gets us back to the issue of decision theory vs. our confidence in whichever hypothesis.
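
In Bayesian terms: when no specific evidence discriminates between the hypotheses, the likelihood ratio is 1, and the posterior simply equals the prior; the base rates alone carry the warrant. A minimal sketch in Python (the 2/3 and 1/3 base rates are the illustrative values above):

```python
# When evidence is equally expected on both hypotheses (likelihood ratio = 1),
# Bayes' Theorem returns the prior unchanged: base rates alone set the warrant.

def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    joint_h = prior_h * p_e_given_h
    return joint_h / (joint_h + (1 - prior_h) * p_e_given_not_h)

p_theft = 2 / 3  # base rate of theft among missing cars in my neighborhood
# "My car is missing" is equally expected whether it was stolen or towed:
print(round(posterior(p_theft, 1.0, 1.0), 3))  # 0.667: the prior, unchanged
```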

We depend on this fact for almost the entirety of our lives. We don’t re-verify whether objects magically disappear every time we lose something. We don’t constantly check on our car to make sure it hasn’t been vaporized, but instead we confidently, and with full warrant, believe it’s still where we parked it. We’ve learned what’s usual and operate on warranted beliefs about what’s likely and unlikely all the time, without checking the evidence in a specific case. Only when the cost of being wrong is too high to risk, or when the warrantable hypotheses are close in probability (and we need to know which is true), or when we’re just idly curious, do we aim to look for specific evidence that increases the probability of our being right. And we do that by looking for evidence that’s extremely improbable unless we are right. That’s what “good evidence” means in mathematical terms.

As I pointed out in Everyone Is a Bayesian, you can’t pretend you aren’t relying on these prior probability assumptions all the time throughout the whole of your life and in all of your beliefs. Because you are.

Second: Even when true it’s moot, because “neither p nor not-p is warranted” is a mathematical statement fully describable on Bayes’ Theorem as “the posterior probability that p is 0.5.” Those two sentences are semantically identical. So even if we charitably assume Haack did not really mean to say that background probabilities are irrelevant to determining warrant, but that by “absence of evidence” she meant even background evidence, her statement carries no argument whatever against Bayesian or any kind of valid probabilistic epistemology.

For example, suppose someone looks at our book collection, picks one up, and tells us we should sell it because it’s frong; and we’re interrupted by some event and don’t get to find out what “frong” meant. We have no background knowledge to go on. We can imagine many hypotheses as to what they meant (valuable? abominable? something entailing either? both?). But we have no idea which one they meant. On that occasion, every hypothesis has the same prior probability as every other. Assume we could break it down to definitely only two: “primarily” valuable or “primarily” abominable.

In that event we would be warranted only in believing it’s equally likely they meant valuable or abominable. And therefore, we are not warranted in believing p and we are not warranted in believing not-p. Exactly as Haack imagines. In the language of Bayes’ Theorem, this describes the condition P(p|e.b) = 0.50. (Indeed, in plain language philosophy, I’d even say BT conditions like P(p|e.b) = 0.60 would correspond to the “no belief warranted” position, owing to the high probability of being wrong, which we translate as suspecting p but being too uncertain to be sure.) Since Bayes’ Theorem accounts for and explains exactly the condition Haack says would obtain, this fact completely eliminates Haack’s second objection, so we can call strike two on her, and look at her one remaining attempt at bat.

BT Accommodates Any Ordering of Degrees of Warrant

As to (a): We can order degrees of belief any way we want to with probabilities. Contrary to what Haack claims, Keynes did not argue otherwise. In the section she cites, Keynes describes a case where a woman was one of fifty contestants for a prize and because of contingencies was prevented from having a chance to be selected a winner and therefore claimed damages equal to the “expected value,” which would normally be [chance of winning] x [the value of the prize]. The problem at issue was that the winners were not selected at random. So the expected value equation actually doesn’t apply. They were selected according to the personal tastes of a contest judge (who was selecting the “most beautiful” photographs to be the winners). The problem here is not a defect of probability theory but decision theory: how do courts treat unknowable propositions.

In terms of the law, really, the only thing that should have mattered would be whether the contest judge would have chosen her picture, which cannot be known by probabilistic reasoning (maybe it theoretically could be with the right data and technologies, but neither was available); it can only be known by haling the contest judge into court and asking them to testify as to whether they would have chosen her photograph. And even they might be uncertain as to the answer (our aesthetic intuitions can alter daily and be mysterious even to us), but it’s also possible her photograph was so awful they could confidently assert under oath they’d never have selected her, or so extraordinarily beautiful they could confidently assert under oath they’d definitely have picked her as a winner. Whereas if they asserted they weren’t sure, that would entail a roughly 50% chance they’d have selected her (since if it was less, they’d assert they would not; and if it was more, they’d assert they would have).

There is nothing in Keynes’s case that renders any problem whatever for applying probability theory or Bayes’ Theorem to this case or any other. All it describes is a condition of high uncertainty. Which in real life can be modeled with wide margins of error (the probability the judge would have picked her will be a range between the lowest and highest probabilities of selecting her that the contest judge themselves believes reasonably possible; see below for how this works). But because courts don’t track ordinary reality (they are not concerned with what probabilities an event had, but in making binary decisions: pay x or not pay x, etc.), they need special rules for making decisions. And under those rules, if the contest judge testified to not knowing whether they’d have picked her, that would legally entail a 50/50 chance either way, and thus an expected value of 0.5 x [prize value]; and if they testified to knowing they would or would not, that would entail a 1 or 0 chance, respectively. Not because that’s a mathematically realistic description of what actually happened, but simply because that’s an expediency that serves the needs of the court.

As far as human warrant goes, we’d never accept a fallacy such that our only options are 1, 0, or 0.50. But neither would we ever accept a fallacy such as “guilty or not guilty.” Criminal court makes no more logical sense. Obviously every assertion of guilt or innocence is predicated on varying degrees of certainty. There is no such thing as “either guilty or innocent” in human knowledge. No matter what we think we know, there is always a nonzero probability of being wrong (and the exceptions are irrelevant to the present point), and often that probability of being wrong is high enough to worry about, and so on. But courts can’t operate that way. And the fact that courts have to make binary decisions does not mean beliefs are binary decisions. The courts are not brains. They are machines for dealing out justice. And imperfectly at that.

In the real world, for Keynes’s case, we’d believe the truth falls somewhere unknown to us within a range of probabilities (a maximum and minimum), based on what the contest judge says they would have done, and our confidence in their reliability (the probability we believe they’d accurately report on this). And because we’d have thus selected the lowest and highest probabilities we (because they) reasonably believe possible, we would be fully warranted in saying that’s the lowest and highest probabilities we reasonably believe possible. This is no challenge to BT. We can run BT for maximums and minimums and for ranges of values, even values with different probabilities of being correct. (I’ll explain this point in detail below.)

Even if the contest judge were dead and couldn’t be asked, we’d simply have to declare the absence of knowledge in the matter. What the law requires may have no bearing on what’s actually sound reasoning as to belief. The law has other concerns. As I’ve already noted. But in ordinary reality, it is often the case that we just don’t know something. And we can represent that in probability theory.

In fact we are fully able to model all manner of degrees of uncertainty, and even the complete lack of pertinent knowledge, when modeling a belief’s probability of being true using BT (see Proving History, index, “a fortiori, method of” and “margin of error”). So this eliminates Haack’s first objection too, and we can now call three strikes on her. She’s out.

Not Getting It

Haack’s principal problem seems to be an inability to correctly translate English into Math. For example, in her defense of her indefensible assertion (b), she asserts that, “It’s not enough that one party produce better evidence than the other; what’s required is that the party with the burden of proof produce evidence good enough to warrant the conclusion to the required degree.” Evidently she doesn’t realize that “evidence good enough to warrant the conclusion to the required degree” translates mathematically as “evidence entailing a Bayesian likelihood ratio large enough to meet the court’s arbitrarily chosen standard of probability.” And that is a statement of Bayes’ Theorem—simply placed in conjunction with the arbitrary decisions of the legal system.

The courts have to make an arbitrary call as to “how probable” the conclusion must be to warrant a binary judgment. Because courts have to convert reality, which is always understood on a continuum of probabilities, into the absurdity of what is actually a black-or-white fallacy of “either true or false,” and they have to do this because they have to make a decision in light of what the voting community says is an acceptable risk of being wrong. Which is a question of decision theory. Not epistemology. Epistemologically, the strength of evidence is always a likelihood ratio. It’s always Bayesian.

Haack similarly keeps talking about “degrees of rational credibility or warrant” without realizing that simply translates into Math as “our probability of being right,” which indeed simply means “the frequency with which we will be right, when given that kind and quality of evidence (e) and background knowledge (b).” Thus, her own epistemology is just a disguised Bayesianism. Whenever she talks about a piece of evidence making her more confident in a belief, she is actually saying that that evidence increased the probability of her being right (about that belief being true), and conversely of course, it decreased the probability of her being wrong (and we more commonly intuit increased confidence in terms of a reduced chance of being wrong). So she is doing probability theory and doesn’t even know it. Foundherentism just adds the wisdom that both experiential data and coherence are evidence to place in e.

Haack’s Cases

I thoroughly demonstrate that even so-called “subjective Bayesianism” is just a new frequentism in disguise in Proving History (pp. 265-80). Every time someone asserts a “degree of belief” that p of, say, 80%, they are literally saying they expect they’ll be wrong about beliefs like p 1 out of 5 times, where “like” means “based on evidence and background knowledge of analogous weight.” In other words, they are saying they think there is a four in five chance they are right in that case, and all cases relevantly similar.

That’s a frequency measure. And these frequency measures of our accuracy will always converge on the real probability of a thing being true, the more evidence we acquire. Stochastically, at least, since every random selection of evidence will bias the conclusion in different ways, and intelligent agents may even be actively selecting evidence for us for that very reason. So, just as the second law of thermodynamics says systems trend toward increasing disorder as time is added (low-probability reversals of that direction are statistically inevitable, yet the overall trend holds), so also subjective probabilities trend toward the objective probabilities as evidence is added (and again, low-probability reversals of that direction are statistically inevitable, yet the overall trend holds).
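
A toy simulation illustrates the point. This is a minimal sketch in Python under an assumed simple model (estimating a coin’s objective bias from accumulating flips); the true bias of 0.7 and the checkpoints are my illustrative assumptions:

```python
import random

# A toy simulation of the convergence claim: a Bayesian estimate of an
# objective frequency, updated as evidence accumulates.
random.seed(1)
true_p = 0.7  # the objective probability we are trying to estimate
heads = 0
for flips in range(1, 10001):
    heads += random.random() < true_p
    if flips in (10, 100, 1000, 10000):
        # Laplace's rule of succession: the Bayesian estimate so far
        print(flips, round((heads + 1) / (flips + 2), 3))
# The estimate wanders early (those low-probability reversals), but it
# trends toward 0.7 as evidence is added.
```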

This also means that errors in estimating frequencies from available evidence will translate into erroneous subjective probabilities. And always, GIGO. Thus, finding that someone did the math wrong is never an argument against the correct model being Bayesian. Yet that’s all Haack ever finds. If she finds anything pertinent at all.

The Commonwealth Case

Haack uses two legal cases to try to show Bayesianism doesn’t work, Commonwealth v. Sacco and Vanzetti (1920-27) and People v. Collins (1968). Her examples, I suspect, face the other two problems I noted: she confuses what courts are doing (which is risk management) with what actually constitutes valid belief formation; and she naively assumes the courts are always following epistemically correct principles (i.e. that the principles they apply aren’t ever mistaken). But my interest is in how her examples don’t reveal anything wrong with BT to begin with.

For the Commonwealth case, Haack critiques an attempted Bayesian analysis of it by Jay Kadane and David Schum (pp. 65ff.). Their book on this may well suck. I have no opinion on that. But all her pertinent statements about it are false. Haack asserts:

First: the fact that Kadane and Schum offer a whole range of very different conclusions, all of them probabilistically consistent, reveals that probabilistic consistency is not sufficient to guarantee rational or reasonable degrees of belief.

This is a straw man fallacy. When subjective Bayesians talk about probabilistic consistency, they mean in a subject’s total system of beliefs. When Kadane and Schum show, for example, two different models based on a different prior probability assignment, they are not claiming those two prior probability assignments are consistent with each other. For Haack to act like that’s what they are saying is kind of appalling. But in any event, there are three ways she is wrong to think this.

First, there is no inconsistency in declaring two values for every probability to describe the margins of error at your desired confidence. That polls give us results like “45-55% at 99.9% confidence” is not a mathematical inconsistency. To the contrary, those three numbers are entailed by mathematical consistency. The confidence interval (e.g. 45-55%) will widen as we increase the confidence level (e.g. 99.9%), and narrow as we lower that level, in a logically necessary mathematical relationship. And the statement “45-55% at 99.9% confidence” means that we are 99.9% certain that whatever the actual probability is, it is between 45% and 55%; or in other words, there is only a 1 in 1000 chance the probability isn’t in that interval.

This translates to all epistemic belief even in colloquial terms (see Proving History, index, “margin of error”). If, when estimating the prior probability that p, I choose a minimum and maximum probability, such that the minimum I choose is as low as I can reasonably believe possible, and the maximum I choose is as high as I can reasonably believe possible (and I do the same with the likelihoods), then any posterior probability that results will also be a range, from a low to a high value, and the low value will be the lowest probability that p that I can reasonably believe possible, and the high value will be the highest probability that p that I can reasonably believe possible. Because the confidence level (“what I can reasonably believe possible”) transfers from the premises to the conclusion. That’s not being inconsistent. To the contrary, insisting we know exactly what the probability is would produce an inconsistent system of probabilities. The only way to maintain mathematical consistency is to admit margins of error, and that that margin only applies at our chosen confidence level (whose complement is, again, our probability of being wrong), and consistency also requires us to admit that the probability might lie outside our interval, but that it’s just very unlikely to.

So the fact that I will have two prior probabilities (and thus two posterior probabilities, and two different systems of calculation accordingly) is not being inconsistent. It’s mathematically required of any coherent system of epistemic probability. They are not inconsistent because they are not the same thing: the minimum is my minimum, the maximum is my maximum. Those are measuring two different things.
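
Here is a minimal sketch in Python of how such a minimum and maximum propagate through Bayes’ Theorem; every specific number is an assumed placeholder, there only to show the mechanics:

```python
# Propagating margins of error through Bayes' Theorem: run it once with the
# inputs least favorable to h, and once with the most favorable, and you get
# a posterior range. All numbers are assumed placeholders for illustration.

def posterior(prior_h, p_e_h, p_e_not_h):
    joint_h = prior_h * p_e_h
    return joint_h / (joint_h + (1 - prior_h) * p_e_not_h)

# Lowest posterior I can reasonably believe possible: the weakest prior and
# likelihood for h, the strongest likelihood for ~h.
low = posterior(prior_h=0.3, p_e_h=0.5, p_e_not_h=0.4)
# Highest posterior I can reasonably believe possible: the reverse.
high = posterior(prior_h=0.5, p_e_h=0.7, p_e_not_h=0.2)

print(round(low, 3), round(high, 3))  # 0.349 0.778: the posterior range
```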

Second, in choosing those boundaries of our confidence, whereby we need a minimum and maximum in each case, Kadane and Schum are saying that you have to pick one, and whichever you pick, the consequences they calculate follow. How you pick one will require further epistemic examination of your system of beliefs. They cannot lay out the entirety of every reader’s belief system all the way down to all raw sensory data over the whole of their life in a book. To expect them to is absurd.

Obviously, for your minimum and maximum, you have to choose one of the systems they built, based on what you know (what evidence and background knowledge you have). And that’s going to be a further system of consistent probabilities, down to and including the frequency of raw sensory data (see my Epistemological End Game). Although of course we do this with intuitive shortcuts, and if we are honest, we account for the margins of error those shortcuts introduce (future AI might do it directly, as present AI already does; unless the shortcuts increase efficiency by enough to outweigh the costs of the errors introduced by using them). But showing us what then results, given what inputs we are confident in, is not being inconsistent. Philosophers argue this way all the time (“I don’t believe x, but even if you believe x, y then follows”). It’s disingenuous to characterize that as being inconsistent.

Third, it’s also not inconsistent to argue, for example, “the prior probability that p is at least x,” and calculate for x, to produce the conclusion, “the posterior probability that p is at least y” (see Proving History, index, “a fortiori, method of”). You do not have to believe the probability is x. All you have to consistently believe is that the probability is x or higher. Thus, I can consistently say the prior probability that a meteorite will strike my house tomorrow is “less” than 1 in 1000 (and therefore arrive at a conclusion that the probability that my house will still be there tomorrow is “more” than some resulting value), even as I believe the probability that a meteorite will strike my house tomorrow is not 1 in 1000 but, say, 1 in 1,000,000,000. Because 1 in 1,000,000,000 is less than 1 in 1000, and thus perfectly consistent with the statement “less than 1 in 1000.”
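
A minimal sketch in Python of that a fortiori consistency, using the meteorite numbers above:

```python
# The a fortiori method: assert only a bound on an input, and you get a
# bound on the output, consistent with any actual belief below that bound.

prior_strike_bound = 1 / 1000            # "less than 1 in 1000"
prior_strike_actual = 1 / 1_000_000_000  # my actual belief

# The bound entails my house survives with probability *more than*:
print(1 - prior_strike_bound)     # 0.999
# And my actual belief remains perfectly consistent with that bound:
assert prior_strike_actual < prior_strike_bound
print(1 - prior_strike_actual)    # 0.999999999
```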

Haack does not demonstrate that Kadane and Schum meant anything other than any or all of these three things, when they presented multiple calculations to select from (I am not asserting they didn’t, only that she needs to show that to make the assertions she does). Therefore, in these bare statements, she has no argument against even their subjective Bayesianism, much less a fully adequate Bayesian epistemology. For instance, I believe most subjective Bayesians make the same mistake Haack does: thinking their probabilistic “degrees of belief” are not frequencies, when in fact they are—since those “degrees of belief” are actually stating the frequencies of their being right given comparably weighted evidence. And this is what’s wrong with Haack’s second objection to the Kadane and Schum analysis: that they don’t acknowledge (nor does Haack realize) that what they mean by “personal, subjective, or epistemic probabilities” are frequencies, such that every P(p|e) simply states the frequency with which they will be right that p when they know anything comparable to e.

Finally, in claiming that all their math is “mostly decorative” because the inputs simply represent the subjective judgment of experts, she is being disingenuous again. It is not mere decoration to show that certain inputs necessarily entail certain outputs. Showing the public what their beliefs logically entail is not “decoration,” it’s a description of the entire field and purpose of philosophy. Moreover, Haack has nothing to offer to replace such “subjective judgment of experts.” She, too, can employ in her epistemology nothing other than the subjective judgment of experts. That she uses the word “weight” to describe those judgments rather than “probability” is merely semantics. Or, perhaps we should say, merely decoration.

This is demonstrated when she introduces yet another “subjective” expert, Felix Frankfurter, to rebut Kadane and Schum—thus accomplishing only a demonstration that Kadane and Schum produced the wrong inputs, and not that Bayes’ Theorem was the wrong model to accomplish the task. You could use their Bayesian model with Frankfurter’s inputs, arguing correctly that his inputs have more coherence with the evidence and our background knowledge of the world, and have exactly the solution Haack claims her foundherentism produces. Thus, again, she is a Bayesian and doesn’t even know it.

For example, she describes Frankfurter’s observation that one witness testified to accurately identifying a man she saw in a car moving so quickly, at a distance so great, that her telling the truth is highly improbable. In Bayesian terms, b, our background knowledge, which includes our knowledge of human visual acuity and its relation to objects moving at speed, at distance, and under cover of a vehicle fuselage, renders the probability of that testimony being accurate very small, much smaller than Kadane and Schum claimed. This simply demonstrates an incoherence in Kadane and Schum’s analysis: an inconsistency between the probabilities they input, and the probabilities they know to be the case regarding human visual acuity. That they didn’t realize this (by not putting two and two together) simply introduced an error in judgment that Frankfurter corrected. That’s how Bayes’ Theorem works.

It’s as much a fallacy to say “invalid inputs into Bayes’ Theorem get invalid results, therefore Bayes’ Theorem is invalid” as it is to say “invalid inputs into standard deductive logic get invalid results, therefore standard deductive logic is invalid.” Yet Haack’s entire argument against Bayesianism here relies on exactly that fallacy. Often opponents of Bayesian reasoning do this: they don’t realize that their proposed solution to a supposed fault of BT is actually a correct application of BT. I noted this, for example (and it’s relevant to this case as well), with respect to C. B. McCullagh and the murder of King William II in Proving History (pp. 273-76), which he claimed BT couldn’t solve, but his method could; I showed that “his method” was simply nothing more than a correct application of BT (not just with that example, but I proved this to be the case in general on pp. 98-100, where I analyze his Inference to the Best Explanation model).

The Collins Case

In her second example (pp. 71ff.), Haack discusses a misuse of evidence. But this repeats that same fallacy above: just because “legal probabilism can seduce us into forgetting that the statistical evidence in a case should be treated as one piece of evidence among many” it does not follow that legal probabilism is wrong. She is saying it’s wrong to make that specific error. But BT also says that. So she is not saying anything contrary to BT. She is in fact just instructing legal probabilists to be better Bayesians.

Here she specifically critiques a paper on this case by M.O. Finkelstein and W.B. Fairley. There they even tell her that by subjective probability estimates they mean an actual frequency. Her complaint is that the frequency is derived from an estimated set of counterfactuals. Which she doesn’t like. Ironically, in a book in which she makes countless assertions based on her own frequency estimates regarding sets of counterfactuals. All philosophy is built on statistical estimates regarding sets of counterfactuals. Most human judgment in life is built on statistical estimates regarding sets of counterfactuals.

Haack asks how we do this. The answer is simple: all else being equal (e.g. starting with a neutral prior), what all logically possible legal cases can have in common is the likelihood ratio, otherwise known as the Bayes’ Factor (which measures how strongly the evidence weighs toward the conclusion, e.g. four times, a hundred times, a million times, etc.), such that when we assert that we believe there is, say, an 80% chance a party is guilty on the same quality of evidence (the quality being that Bayes’ Factor the evidence produces, and that alone; everything else about the evidence can infinitely vary), we are saying that 8 out of 10 times, we’ll be right (and 2 out of 10 times we will unknowingly convict an innocent person). And this is exactly the same thing Haack would say, only in terms of how confident she is that we are not convicting the innocent. Which is just English for how probable she thinks it is that we are not (which is the frequency that we will not).
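
In the odds form of Bayes’ Theorem this is trivial to see. A minimal sketch in Python (the neutral prior is from the text; the Bayes’ Factor of 4 is simply the value an 80% posterior implies):

```python
from fractions import Fraction as F

# The odds form of Bayes' Theorem: posterior odds = prior odds x Bayes' Factor.
# From a neutral (1:1) prior, a Bayes' Factor of 4 yields exactly the 80%
# figure in the text: right 8 out of 10 times on evidence of that quality,
# unknowingly convicting an innocent person the other 2.

def posterior_from_odds(prior_odds, bayes_factor):
    post_odds = prior_odds * bayes_factor
    return post_odds / (1 + post_odds)  # convert odds back to a probability

print(posterior_from_odds(F(1, 1), F(4, 1)))  # 4/5, i.e. 80%
```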

Haack does not understand that this is what Finkelstein and Fairley are saying, and consequently her entire analysis of their case is impertinent. We can disregard it. She can’t critique them until she correctly describes what she is critiquing. Moreover, her entire critique from there on out is all about simply saying they overlooked background knowledge that changes the probabilities they imagined applied to the case. In other words, all she is doing is correcting their application of BT, by fixing their inputs. She at no point shows BT is the wrong model to analyze the case with.

Foundherentism Is Just a Description of BT

Haack’s own foundherentism simply says that coherence is evidence, as well as experiential data, but that just restates BT by including evidence (in either e or b) that others may have been improperly excluding. Similarly she says a belief becomes more warranted as evidence increases in supportiveness, independent security, or comprehensiveness. But that’s just three different ways evidence can have a high likelihood ratio (as others have already shown). In other words, that’s just Bayes’ Theorem. Just as I showed for a similar colloquialization known as the Inference to the Best Explanation (Proving History, pp. 98-100).

As shown by R.H. Kramer at Machine Epistemology, by “supportiveness” Haack means “how well the belief in question is supported by his experiential evidence and reasons,” which is simply a restatement of evidential weight, which is measured in BT by the Likelihood Ratio (aka Bayes’ Factor): “how well” a belief is supported by any evidence equals how many times more likely that belief is to be true when that evidence is present. And by “independent security” she means “how justified [our] reasons are, independent of the belief in question,” which is simply a restatement of prior probability being conditioned on prior knowledge (of the whole world, of logic, of past similar cases, and so on), or (depending on how you build your model; since all priors are the posteriors of previous runs of the equation), it’s a restatement of the effect of innate probability on expected evidence (as Laplace showed with respect to multiple testimonies, as I described above). So, still BT.

Finally, by “comprehensiveness” she means “how much of the relevant evidence [our] evidence includes,” which is just a restatement of the fact that the evidence we lack is also evidence, which BT definitely takes into account when properly applied. In particular, BT takes into account missing evidence in two ways. First, the likelihood (the expectancy) that the evidence we lack would be lacking is a probability multiplied in. So, if it is only 30% likely on hypothesis h that we’d be missing that evidence, and 60% likely on ~h, the fact that that evidence is unavailable to us cuts the odds on h in half (and thus, when h is already unlikely, roughly halves its probability); although other evidence can easily reverse that factor, so it is never determinative of the conclusion by itself. Then, secondly, if we lack the evidence simply because we chose not to go check it, its absence is 100% expected on both h and ~h, since the fact that we didn’t check is in b, and all the probabilities in BT are conditioned on b. But now we know we have not checked the pertinent evidence, so that fact has to be included in our estimates of how likely it would be that we would have the sample of evidence we do on either theory, and this can tip far in favor of ~h if our sample is at all likely to be biased in favor of h.
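
Here is a minimal sketch in Python of that first mechanism; the 30% and 60% expectancies are the values from the example above, and the 1:9 starting odds are an assumed placeholder:

```python
from fractions import Fraction as F

# Missing evidence as evidence: if the absence of some evidence is 30%
# expected on h and 60% expected on ~h, that absence carries a likelihood
# ratio of 1/2, cutting the odds on h in half.

prior_odds = F(1, 9)                      # i.e. P(h) = 1/10
absence_factor = F(30, 100) / F(60, 100)  # likelihood ratio = 1/2
post_odds = prior_odds * absence_factor   # 1/18
print(post_odds / (1 + post_odds))        # 1/19: P(h) roughly halved
```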

Thus, for instance, that we would have someone’s testimony to someone else having committed a crime can be highly expected on that testimony being false (because people frequently lie or are mistaken, particularly if they have a grudge, or their testimony is to behavior that is highly improbable on the totality of our background knowledge). If we know there is evidence that would corroborate or impeach their testimony and chose not to check it, we might not be able to use BT to justify believing the testimony. Depending on the case, the probability of being wrong may be too great. (As Laplace’s example illustrated, for instance; but even a result of, say “80% likely to be true” is often too disturbingly low to act on—again, risk theory: what, honestly, should we risk on a 1 in 5 chance of being wrong?) Whereas, knowing we have all the accessible evidence allows us to assign an expectancy to the missing evidence that may get us more effect than the 1/1 “no effect” Bayes’ Factor we got from the evidence being missing solely because we chose it to be. For instance, if we check and don’t find an expected record of the event (a record that surely, or very likely, would have been generated had it occurred), its absence is much less likely on h than ~h, and it therefore substantially decreases the probability of h (see Proving History, “Bayesian Analysis of the Argument from Silence,” pp. 117-19). The “absence” of that record from our evidence merely because we didn’t look for it, by contrast, has no such effect.

This is just one of many ways BT explains the effect of evidence presence, evidence absence, and evidence biasing (such as only examining a sample of the available evidence) on how warranted we are in believing something, and it properly tells us what it means to be more warranted: to face a lower probability of being wrong. Which we need to know when we apply our warrant to risk. Because to know how likely it is that the risked harm will occur requires knowing how likely it is we are wrong. Thus, it is not even possible to make rational decisions without a Bayesian epistemology. And everyone is using one, whether they know it or not, and whether they are doing it well or poorly. It’s better to know it. So you can learn how to do it well.

Conclusion

All assertions of fact are probabilistic; you cannot, no matter how hard you try, actually mean anything but a probability when you assert a degree of belief. And you cannot, no matter how hard you try, avoid relying on an intuitively estimated prior probability and likelihood ratio when you estimate your confidence in a thing. I demonstrate this, and that it entails Bayes’ Theorem is the only valid way left to justify your beliefs, in Proving History (pp. 106-14).

Haack has not presented any case to the contrary. She doesn’t understand Bayes’ Theorem. She asserts arguments against it that in fact don’t undermine it at all, and she proposes to replace it with itself, as her own epistemology is ultimately just another colloquial reformulation of Bayes’ Theorem. Hopefully some day she will notice. As I eventually did. Because once I agreed with her. Then I realized I was wrong.

 
