Susan Haack is generally a good philosopher (I interviewed her a few years ago). She’s made important strides in unifying disparate positions in epistemology (and I am very fond of unification efforts in philosophy: I think they are generally on the right track, in every domain; as for example my unification of moral theory). But every philosopher makes a boner mistake or two (me included; and I’m always happy to discover and correct them). And for Haack’s part, it’s that she doesn’t understand Bayesian epistemology. So much so, that she ironically denounces it, while going on to insist we replace it with her otherwise-correct foundherentist epistemology…which can be entirely described by Bayes’ Theorem.
This is a common mistake made by people who don’t like Bayes’ Theorem. I made it myself, repeatedly, when I was just as hostile to Bayes’ Theorem as an epistemological model, until I realized I was doing it…and then I became a Bayesian. All of us who make that boner mistake say “Bayes’ Theorem entails x; x is bad; if we fix x, we get y,” and unbeknownst to us, y is just a restatement of Bayes’ Theorem. This happens because the person who thinks BT entails x is wrong. And if they understood how in fact it does not entail x, they would see that y is just a correct form of Bayes’ Theorem. Hence, ironically, they are correcting their own mistake in understanding Bayes’ Theorem, while mistakenly thinking they are replacing Bayes’ Theorem. In other words, we unconsciously straw man BT, and then build a steel man, and never discover that our steel man is the actual BT.
The spoiler is that all correct epistemologies are mathematically modeled by BT. All. There is no valid epistemology that is not simply described by BT. When you translate Haack’s epistemology from English into mathematical notation, you end up with BT. They all reduce to it. And thus, you might then notice, they all reduce to each other. Hence the merits of unification efforts—even flawed epistemologies, when you correct them, become a system modeled by BT. For instance, unifying intuitionism with empiricism, by merging the realities and virtues of each while discarding their flaws, ends you up with an engine of reason that accurately describes reality (e.g. how intuition actually works, hence my discussion of intuition science in Sense and Goodness without God III.9.1, pp. 178-80) and is entirely modeled mathematically by Bayes’ Theorem. Science has shown our intuitive and perceptual brain mechanisms use calculations that, though not necessarily Bayesian, increasingly approximate Bayesian reasoning, and become Bayesian at the point of their idealization, i.e. if we projected forward in an imaginary evolution to a maximally accurate mechanism in the brain, that end-point would be a perfect application of BT.
Today I’ll demonstrate this with Haack’s epistemology, starting from her critique of BT, and then what she intends to replace it with and why she thinks her replacement is better. A lot can be learned about how BT actually works, and how an epistemology can be correctly built out of it, by following this thought process.
Context
Susan Haack knows Bayesians keep telling her she doesn’t understand it. In the second edition of her seminal work Evidence and Inquiry (2009) she wrote that she “wasn’t greatly surprised” that “mainstream epistemologists” thought she was “just willfully blind to the epistemological power of Bayes’ Theorem” (p. 22, still the only mention of Bayesian reasoning in the whole book; to the contrary, she denounces even probability as being relevant to epistemology, e.g. p. 20, so you can see how far wrong she already is). I don’t think she’s willfully blind (I wasn’t, when I was as hostile as she is to the notion; I was simply mistaken). And I think her critics are as mistaken as she is in how to explain the problem. It’s not that Bayes’ Theorem has some sort of mystical “epistemological power.” It’s that even her epistemology can be written out on a chalkboard in mathematical notation using Bayes’ Theorem. They might not have conceptualized it that way yet, but really, that’s what her critics mean by its epistemological power. We’re all talking about the same thing—BT—we just don’t realize it yet. It’s the engine driving everyone’s car. They’re all just describing it with different words, and thus mistakenly thinking they’re describing a different thing.
Hence I’ve demonstrated that Inference to the Best Explanation is Bayesian, and the Hypothetico-Deductive Method is Bayesian (Proving History, pp. 100-03 and 104-06, respectively); and Historical Method as a whole (ibid., passim; and see Nod to Tucker and Tucker’s Review); and all empirical arguments whatever (ibid., pp. 106-14). It all ends up boiling down to BT, in different descriptions and applications.
In her much more recent treatise on legal epistemology, Evidence Matters (2014), Haack asserts that “subjective Bayesianism is still dismayingly prevalent among evidence scholars,” even though, she insists, “probabilistic conceptions of degrees of proof” are “fatally flawed” (p. xviii). So she intends to replace it with her two-step process of first showing that epistemology is really all about “degrees to which a conclusion must be warranted by the evidence presented” and then showing that her “foundherentist epistemology” solves the problem she claims exists with that (in particular, the problem that “degrees of epistemic warrant simply don’t conform to the axioms of the standard calculus of probabilities,” and therefore “degrees of proof cannot plausibly be construed probabilistically”). But only someone who isn’t correctly applying “the axioms of the standard calculus of probabilities” could conclude such a thing. And that’s what happens with Haack, as I’ll show.
One caveat I must make is that Evidence Matters is about legal epistemology, which I think Haack might not realize is confounded with risk theory. We do not set standards in law that track what actually warrants human belief, but standards that manage risk. We can all have a fully justified belief in a conclusion, yet that could never be proved in a court of law by the standards therein. Because what the law wants to do is not simply warrant belief, but also reduce the frequency of bad outcomes (innocent people being jailed, for example), and thus it manages risk by setting higher standards when worse outcomes are risked. Thus, civil court requires a much lower burden of evidence than criminal court, because the penalties—the costs of being wrong—are correspondingly less. Likewise, private organizations will have their own fact-finding standards (e.g. to determine if a student cheated or an employee stole or harassed someone) that are even lower than those of civil court, yet still adequate to warrant some measure of human belief, all simply because the cost of their being wrong is likewise lower.
This makes the law not the best model for a general epistemological theory, except insofar as we want to talk about not epistemology but decision theory: what level of certainty should we have before making a certain decision based on our belief. Which is a values decision, not an epistemic one, and as such will always be calibrated to the cost of being wrong, which actually has less to do with whether the proposition at issue is true (a proposition’s being false can have either enormous or trivial consequences; a thing merely being true or false does not entail one or the other, but only in relation to a whole system of facts apart from it). And yet, contra Haack, this still necessarily requires us to formulate the degree of warrant in our believing anything as a probability. Otherwise, we cannot predict the frequency of risked outcomes (how often we will be wrong, when applying a given epistemic standard to a given quality of evidence). And there is no way to get a probability of a belief being true as a conclusion, without obeying probability theory. And there is no way to apply probability theory without Bayes’ Theorem describing what you are doing (as I demonstrate is a necessary fact of logic for all empirical propositions in Proving History, pp. 106-14).
Nevertheless, hereafter, I’m no longer talking about standards in law, which are matched to risk tolerance and not actually models of belief formation in general. I’m only talking from now on of the general claims Haack is making about Bayes’ Theorem, and the relationship between evidence and probability generally. That she might be confusing risk tolerance with belief warrant in her book I’ll set aside as possible, but it’s not what I’m concerned with here. Similarly, some of what is in the law (from statutory evidence standards, judicial precedents, and jury instructions) is actually illogical and probably wrong; yet Haack treats it all as flawlessly perfect and the true normative standard for all epistemology, which is a bit absurd. For example, she makes remarks about jury instructions that never consider the very real possibility that our legal system is not correctly instructing juries how to weigh evidence. I will not address this aspect of her book either, though I think it’s responsible for a lot of her mistakes in it.
Haack Against BT
Haack’s principal attack in “Degrees of Warrant Aren’t Mathematical Probabilities” is on p. 62, where she claims probability theory cannot explain degrees of warrant because:
(a) Mathematical probabilities form a continuum from 0 to 1; but because of the several determinants of evidential quality, there is no guarantee of a linear ordering of degrees of warrant. [Which she says was already argued by J.M. Keynes in his 1921 A Treatise on Probability.]
(b) The mathematical probability of (p and not-p) must add up to 1; but when there is no evidence, or only very weak evidence, either way, neither p nor not-p is warranted to any degree.
(c) The mathematical probability of (p & q) is the product of the probability of p and the probability of q—which, unless both have a probability of 1, is always less than either; but combined evidence may warrant a claim to a higher degree than any of its components alone would do. [Which she says was also argued by L.J. Cohen in his 1977 The Provable and the Probable.]
Every one of these statements is wrong.
The Problem of Diminishing Probabilities Is Actually Solved by BT
As to (c): Cohen has been refuted on this point by Tucker, for example, in “The Generation of Knowledge from Multiple Testimonies” (Social Epistemology 2015), whose conclusion can easily be generalized to all forms of evidence that meet the same generic conditions, not just eyewitness testimony (Cohen has also been refuted on many other points of probability theory by numerous experts, and IMO should no longer be cited as an authority on the subject: see David Kaye’s Paradoxes, Gedanken Experiments and the Burden of Proof, as a whole, and also his footnote 5 on p. 635). In effect, Tucker shows that the problem Cohen reports is actually eliminated by Bayesian reasoning. With Bayes’ Theorem, multiple items of evidence can multiply together to produce an ever-smaller conjunctive probability and still increase the probability of the conclusion—even quite substantially.
Bayes’ Theorem has this effect in two ways. First, because we are comparing the likelihoods across hypotheses, and it’s the ratio of them that produces the probability of a belief being true; so they can be vanishingly small probabilities and it makes no difference—the ratio can still be enormous. And second, the low innate probability of what’s being testified to can make matching testimonies highly unlikely on their falsity, and thus their conjunction increases the posterior probability, not the other way around.
The first point is fundamental to Bayes’ Theorem (see If You Learn Nothing Else, point one). Suppose we want to know how likely it is that a particular document was forged. If we have three items of evidence, and the probability those items would exist if the document was forged is 0.7, 0.5, and 0.3, respectively, then the probability of all three items being observed is 0.7 x 0.5 x 0.3 = 0.105, which is much less probable than any one of those items alone, which seems counter-intuitive. But we aren’t done. Bayes’ Theorem requires us to input also what the probability is of those same three items of evidence on the conclusion that the document wasn’t forged. If those probabilities are 0.6, 0.4, and 0.1, respectively, and thus their conjunction is 0.6 x 0.4 x 0.1 = 0.024, the ratio between this conjunct probability and the other is 0.105/0.024 = 4.375. The document is over four times more likely to have been forged on the conjunction of this evidence being observed.
In other words, even though the probability of all three items of evidence on the forgery hypothesis is a mere 10%, which is roughly three to seven times less than the probability of encountering any one of those items of evidence alone, the probability of all three on the non-forgery hypothesis is over four times less than even that mere 10%. And mathematically the effect is that the odds that the document is forged will be increased—multiplied, in fact, by more than four—even though that conjunction of evidence was only 10% expected on the forgery theory. And if we kept adding evidence of the same character, both “probabilities of the evidence” would continue to decrease, yet the probability of forgery would at the same time just as continually increase. That’s always the effect of adding supporting evidence in BT.
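To make that arithmetic concrete, here is a minimal sketch in Python of the calculation just described. The likelihood values are the ones from the example above; the even prior odds are my own assumption, added purely for illustration (the point at issue is the likelihood ratio, not the prior).

```python
# A minimal sketch of the forged-document example above, assuming
# (for illustration only) that forgery and non-forgery start at even odds.

def posterior_odds(prior_odds, likelihoods_h, likelihoods_not_h):
    """Multiply prior odds by the likelihood ratio of the conjoined evidence."""
    p_e_given_h = 1.0
    p_e_given_not_h = 1.0
    for lh, lnh in zip(likelihoods_h, likelihoods_not_h):
        p_e_given_h *= lh
        p_e_given_not_h *= lnh
    bayes_factor = p_e_given_h / p_e_given_not_h
    return prior_odds * bayes_factor, p_e_given_h, p_e_given_not_h

odds, p_e_h, p_e_nh = posterior_odds(
    prior_odds=1.0,                      # assumed 50/50 prior, purely for illustration
    likelihoods_h=[0.7, 0.5, 0.3],       # P(each item | forged)
    likelihoods_not_h=[0.6, 0.4, 0.1],   # P(each item | not forged)
)
print(p_e_h)               # 0.105 — the conjunction is improbable even if forged
print(p_e_nh)              # 0.024 — but far more improbable if not forged
print(p_e_h / p_e_nh)      # 4.375 — the likelihood ratio favoring forgery
print(odds / (1 + odds))   # ≈ 0.814 — posterior probability of forgery on even priors
```

Notice that however small both conjunctions get, only their ratio (together with the prior) determines the result.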
This fact I simplify out of common math problems for historians by factoring out “coefficients of contingency” (see Proving History, index, along with all of pp. 214-28). Which simplifies the math, without changing anything, but it can disguise the fact that those coefficients are there, e.g. that conjunctions of evidence are always less probable than single items of it, because there are so many possible ways things can play out (see my discussion regarding “the exact text” of Mark’s Gospel in Proving History, p. 219; and also the role of inexact prediction in simplifying Bayesian calculations: ibid., pp. 214-28). But that being the case makes no difference to the posterior probability, which is always based on the ratio of the conjunct probabilities of the evidence, not the conjunct probabilities of the evidence alone.
Haack seems to not know this. She seems to think that because the conjunct probability of added evidence goes down, the probability of the conclusion must go down as well. That’s false. That does not happen in Bayesian reasoning. It can—if the ratio between those conjunct probabilities (on h and on ~h) goes down, then so does the probability of the conclusion (h); but if that ratio goes up, the probability of the conclusion always goes up as well—no matter how astronomically improbable that conjunction of evidence becomes. Bayes’ Theorem thus solves the problem of diminishing probabilities. It is disturbing that Haack doesn’t know this. It’s fundamental to Bayesianism. And is in fact the very thing that made the work of Thomas Bayes groundbreaking.
The second way Bayes’ Theorem doesn’t have the effect Haack claims relates to the innate improbability of chance conjunctions. This, too, has been known for hundreds of years, ever since Laplace demonstrated it (as discussed by Tucker, ibid., p. 4):
The generation of knowledge from testimonies has been formalized at least since Laplace’s treatise on probabilities (1840, 136–156). Laplace demonstrated first that the posterior probability of a hypothesis supported by a single testimony is identical to the reliability of the testimony. Single testimonies transmit their epistemic properties. But in the groundbreaking last couple of pages of the chapter on testimonies, Laplace formalized the generation of knowledge from multiple testimonies by employing Bayes’ theorem. He demonstrated that in a draw of one from one hundred numbers (where the prior probability of each number is 1:100), when two witnesses testify that the same number was drawn and their reliabilities are respectively 0.9 and 0.7, the posterior probability of the truth of the testimonies leaps to 2079/2080. Laplace showed that multiple testimonies can generate knowledge that has higher reliability than their own. Low prior probability increases the posterior probability of what independent testimonies agree on. It is possible to generate knowledge even from unreliable testimonies, if the prior probability of the hypothesis they testify to is sufficiently low. Lewis (1962, 346) and Olsson (2005, 24–26) reached identical conclusions to Laplace’s.
Thus, as Tucker goes on to point out (after surveying more of the modern literature confirming the mathematical point), emphasis mine:
Courts convict beyond reasonable doubt on the exclusive basis of the multiple testimonies of criminals who are individually unreliable, as long as the prior probabilities of their testimonies are low and their testimonies are independent. Historians look for testimonies in archives and for corroborating independent testimonies in corresponding archives, irrespective of the individual reliabilities of the sources. Investigative journalists, likewise, search assiduously for second corroborating independent testimonial sources as do intelligence analysts. Common to all these expert institutional practices is the inference of knowledge from multiple testimonies that can be individually unreliable or whose reliability cannot be estimated.
As Laplace explained two hundred years ago (here pp. 122-23 of the linked English translation), regarding those witnesses to a random number being drawn from a hundred different numbers:
Two witnesses of this drawing announce that number 2 has been drawn, and one asks for the resultant probability of the totality of these testimonies. One may form these two hypotheses: the witnesses speak the truth; the witnesses deceive. In the first hypothesis the number 2 is drawn and the probability of this event is 1/100. It is necessary to multiply it by the product of the veracities of the witnesses, veracities which we will suppose to be 9/10 and 7/10: one will have then 63/10,000 for the probability of the event observed in this hypothesis. In the second, the number 2 is not drawn and the probability of this event is 99/100. But the agreement of the witnesses requires then that in seeking to deceive they both choose the number 2 from the 99 numbers not drawn: the probability of this choice if the witnesses do not have a secret agreement is the product of the fraction 1/99 by itself; it becomes necessary then to multiply these two probabilities together, and by the product of the probabilities 1/10 and 3/10 that the witnesses deceive; one will have thus 1/330,000 for the probability of the event observed in the second hypothesis. Now one will have the probability of the fact attested or of the drawing of number 2 in dividing the probability relative to the first hypothesis by the sum of the probabilities relative to the two hypotheses; this probability will be then 2079/2080, and the probability of the failure to draw this number and of the falsehood of the witnesses will be 1/2080.
In other words, with one witness (one piece of evidence), the report is only as credible as that witness: Laplace’s math gives it a posterior of 9 in 10, exactly the witness’s own reliability (as Tucker noted, single testimonies merely transmit their epistemic properties). But with two witnesses (two pieces of evidence) the truth of the report is almost assured: it becomes more than two thousand times more likely to be true than false. Thus, once again, conjunctions of evidence, when we factor in the improbability of chance conjunctions (as we must, since BT requires all its probabilities to be conditioned on all known data), actually increase the probability of hypotheses, not the other way around as Haack alleged.
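For anyone who wants to check those numbers, here is a minimal sketch in Python that reproduces Laplace’s calculation exactly, along with the single-witness case for comparison:

```python
from fractions import Fraction as F

# A sketch reproducing Laplace's two-witness calculation quoted above.
prior = F(1, 100)             # prior probability that the number 2 was drawn
r1, r2 = F(9, 10), F(7, 10)   # the witnesses' reliabilities (veracities)

# Hypothesis 1: number 2 was drawn and both witnesses speak the truth.
p_true = prior * r1 * r2                                            # 63/10000

# Hypothesis 2: some other number was drawn and both witnesses, lying,
# independently happen to pick the same number 2 out of the 99 not drawn.
p_false = (1 - prior) * F(1, 99) * F(1, 99) * (1 - r1) * (1 - r2)   # 1/330000

print(p_true / (p_true + p_false))        # 2079/2080, just as Laplace reports

# With only the first witness, the same reasoning gives a posterior equal
# to that witness's own reliability:
p_true_1 = prior * r1
p_false_1 = (1 - prior) * F(1, 99) * (1 - r1)
print(p_true_1 / (p_true_1 + p_false_1))  # 9/10
```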
So when we use BT correctly, we do not assign the degree or weight of evidence as being equal to merely the probability of that evidence on a theory (the probability that that evidence would be observed), but as being equal to the ratio of the probabilities of the evidence on a theory and its denial. Both must be calculated. And placed in ratio to each other. (And all facts weighed in.) This is called the “likelihood ratio,” and this is how we measure evidential weight in Bayesian epistemology. Since this fact completely eliminates Haack’s third objection, we can call strike one on her, and look at her other two attempts at bat.
BT Explains the Condition of No Evidence Equaling No Warrant
As to (b): Haack’s statement that “when there is no evidence, or only very weak evidence, either way, neither p nor not-p is warranted to any degree” is both false and moot.
First: It’s false because in the absence of specific evidence, conclusions remain warranted on considerations of prior probability. For example, I may have no evidence bearing on whether my car’s disappearance was caused by magic or theft, but it is in no way true that my belief that “it wasn’t magic” is not warranted in “any degree.” In fact, in that instance, my belief that not-p can be extremely well warranted—and on no evidence whatever as to what specifically happened to my car. Likewise, my belief that my car was stolen can be almost certainly true and I can be well warranted in concluding so, without any evidence that my car was stolen—other than its absence.
Unless, of course, priors are decent that I misplaced the car, that it was legally towed, or whatever else, but that just expands the same point to multiple hypotheses: in the absence of specific evidence, we are warranted in believing the prior probabilities, all the way down the line. For example, if 2 in 3 times cars go missing in my neighborhood it’s theft and the other 1 in 3 it’s “legally towed,” I am fully warranted in believing it’s twice as likely my car was stolen than towed. Of course, that’s still close enough to warrant checking for evidence of legal towing before being highly confident it was stolen; but that gets us back to the issue of decision theory vs. our confidence in whichever hypothesis.
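In Bayesian terms this is simply the case where the specific evidence (“the car is gone”) is equally expected on every live hypothesis, so every likelihood ratio is 1 and the posterior just reproduces the priors. A toy sketch in Python, using the made-up base rates from the example above:

```python
# Toy sketch: with no case-specific evidence, "the car is simply gone"
# is equally expected on both hypotheses, so the posterior equals the prior.
priors = {"stolen": 2/3, "towed": 1/3}        # the base rates assumed in the example
likelihoods = {"stolen": 1.0, "towed": 1.0}   # P("car is gone" | hypothesis)

unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalized.values())
posteriors = {h: p / total for h, p in unnormalized.items()}
print(posteriors)   # {'stolen': 0.666..., 'towed': 0.333...}: warrant tracks the priors
```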
We depend on this fact for almost the entirety of our lives. We don’t re-verify whether objects magically disappear every time we lose something. We don’t constantly check on our car to make sure it hasn’t been vaporized, but instead we confidently, and with full warrant, believe it’s still where we parked it. We’ve learned what’s usual and operate on warranted beliefs about what’s likely and unlikely all the time, without checking the evidence in a specific case. Only when the cost of being wrong is too high to risk, or when the warrantable hypotheses are close in probability (and we need to know which is true), or when we’re just idly curious, do we aim to look for specific evidence that increases the probability of our being right. And we do that by looking for evidence that’s extremely improbable unless we are right. That’s what “good evidence” means in mathematical terms.
As I pointed out in Everyone Is a Bayesian, you can’t pretend you aren’t relying on these prior probability assumptions all the time throughout the whole of your life and in all of your beliefs. Because you are.
Second: Even when true it’s moot, because “neither p nor not-p is warranted” is a mathematical statement fully describable on Bayes’ Theorem as “the posterior probability that p is 0.5.” Those two sentences are literally identical in meaning. So even if we charitably assume Haack did not really mean to say that background probabilities are irrelevant to determining warrant, but that by “absence of evidence” she meant even background evidence, her statement carries no argument whatever against Bayesian or any kind of valid probabilistic epistemology.
For example, suppose someone looks at our book collection, picks one up, and tells us we should sell it because it’s frong; and we’re interrupted by some event and don’t get to find out what “frong” meant. We have no background knowledge to go on. We can imagine many hypotheses as to what they meant (valuable? abominable? something entailing either? both?). But we have no idea which one they meant. On that occasion, every hypothesis has the same prior probability as every other. Assume we could break it down to definitely only two: “primarily” valuable or “primarily” abominable.
In that event we would be warranted only in believing it’s equally likely they meant valuable or abominable. And therefore, we are not warranted in believing p and we are not warranted in believing not-p. Exactly as Haack imagines. In the language of Bayes’ Theorem, this describes the condition P(p|e.b) = 0.50. (Indeed, in plain language philosophy, I’d even say BT conditions like P(p|e.b) = 0.60 would correspond to the “no belief warranted” position, owing to the high probability of being wrong, which we translate as suspecting p but being too uncertain to be sure.) Since Bayes’ Theorem accounts for and explains exactly the condition Haack says would obtain, this fact completely eliminates Haack’s second objection, so we can call strike two on her, and look at her one remaining attempt at bat.
BT Accommodates Any Ordering of Degrees of Warrant
As to (a): We can order degrees of belief any way we want to with probabilities. Contrary to what Haack claims, Keynes did not argue otherwise. In the section she cites, Keynes describes a case where a woman was one of fifty contestants for a prize, was prevented by contingencies from having a chance to be selected a winner, and therefore claimed damages equal to the “expected value,” which would normally be [chance of winning] x [the value of the prize]. The problem at issue was that the winners were not selected at random. So the expected value equation actually doesn’t apply. They were selected according to the personal tastes of a contest judge (who was selecting the “most beautiful” photographs to be the winners). The problem here is not a defect of probability theory but of decision theory: how courts are to treat unknowable propositions.
In terms of the law, really, the only thing that should have mattered would be whether the contest judge would have chosen her picture, which cannot be known by probabilistic reasoning (maybe it theoretically could be with the right data and technologies, but neither was available); it can only be known by haling the contest judge into court and asking them to testify as to the fact of whether they would have chosen her photograph. And even they might be uncertain as to the answer (our aesthetic intuitions can alter daily and be mysterious even to us), but also it’s possible her photograph was so awful they could confidently assert under oath they’d never have selected her, or so extraordinarily beautiful they could confidently assert under oath they’d definitely have picked her as a winner. Whereas if they asserted they weren’t sure, that would entail a roughly 50% chance they’d have selected her (since if it was less, they’d assert they would not; and if it was more, they’d assert they would have).
There is nothing in Keynes’s case that renders any problem whatever for applying probability theory or Bayes’ Theorem to this case or any other. All it describes is a condition of high uncertainty. Which in real life can be modeled with wide margins of error (the probability the judge would have picked her will be a range between the lowest and highest probabilities of selecting her that the contest judge themselves believes reasonably possible; see below for how this works). But because courts don’t track ordinary reality (they are not concerned with what probabilities an event had, but in making binary decisions: pay x or not pay x, etc.), they need special rules for making decisions. And under those rules, if the contest judge testified to not knowing whether they’d have picked her, that would legally entail a 50/50 chance either way, and thus an expected value of 0.5 x [prize value]; and if they testified to knowing they would or would not, that would entail a 1 or 0 chance, respectively. Not because that’s a mathematically realistic description of what actually happened, but simply because that’s an expediency that serves the needs of the court.
As far as human warrant, we’d never accept a fallacy such that our only options are 1, 0, or 0.50. But neither would we ever accept a fallacy such as “guilty or not guilty,” either. Criminal court makes no more logical sense. Obviously every assertion of guilt or innocence is predicated on varying degrees of certainty. There is no such thing as “either guilty or innocent” in human knowledge. No matter what we think we know, there is always a nonzero probability of being wrong (and the exceptions are irrelevant to the present point), and often that probability of being wrong is high enough to worry about, and so on. But courts can’t operate that way. And the fact that courts have to make binary decisions does not mean beliefs are binary decisions. The courts are not brains. They are machines for dealing out justice. And imperfectly at that.
In the real world, for Keynes’s case, we’d believe the truth falls somewhere unknown to us within a range of probabilities (a maximum and minimum), based on what the contest judge says they would have done, and our confidence in their reliability (the probability we believe they’d accurately report on this). And because we’d have thus selected the lowest and highest probabilities we (because they) reasonably believe possible, we would be fully warranted in saying that’s the lowest and highest probabilities we reasonably believe possible. This is no challenge to BT. We can run BT for maximums and minimums and for ranges of values, even values with different probabilities of being correct. (I’ll explain this point in detail below.)
Even if we were in a state whereby the contest judge was dead and couldn’t be asked, we’d have to declare the absence of knowledge in the matter. What the law requires may have no bearing on what’s actually sound reasoning as to belief. The law has other concerns. As I’ve already noted. But in ordinary reality, it is often the case that we just don’t know something. And we can represent that in probability theory.
In fact we are fully able to represent all manner of degrees of uncertainty, and even the complete lack of pertinent knowledge, when modeling a belief’s probability of being true using BT (see Proving History, index, “a fortiori, method of” and “margin of error”). So this eliminates Haack’s first objection, and we now can call three strikes on her. She’s out.
Not Getting It
Haack’s principal problem seems to be an inability to correctly translate English into Math. For example, in her defense of her indefensible assertion (b), she asserts that, “It’s not enough that one party produce better evidence than the other; what’s required is that the party with the burden of proof produce evidence good enough to warrant the conclusion to the required degree.” Evidently she doesn’t realize that “evidence good enough to warrant the conclusion to the required degree” translates mathematically as “evidence entailing a Bayesian likelihood ratio large enough to meet the court’s arbitrarily chosen standard of probability.” And that is a statement of Bayes’ Theorem—simply placed in conjunction with the arbitrary decisions of the legal system.
The courts have to make an arbitrary call as to “how probable” the conclusion must be to warrant a binary judgment. Because courts have to convert reality, which is always understood on a continuum of probabilities, into the absurdity of what is actually a black-or-white fallacy of “either true or false,” and they have to do this because they have to make a decision in light of what the voting community says is an acceptable risk of being wrong. Which is a question of decision theory. Not epistemology. Epistemologically, the strength of evidence is always a likelihood ratio. It’s always Bayesian.
Haack similarly keeps talking about “degrees of rational credibility or warrant” without realizing that simply translates into Math as “our probability of being right,” which indeed simply means “the frequency with which we will be right, when given that kind and quality of evidence (e) and background knowledge (b).” Thus, her own epistemology is just a disguised Bayesianism. Whenever she talks about a piece of evidence making her more confident in a belief, she is actually saying that that evidence increased the probability of her being right (about that belief being true), and conversely of course, it decreased the probability of her being wrong (and we more commonly intuit increased confidence in terms of a reduced chance of being wrong). So she is doing probability theory and doesn’t even know it. Foundherentism just adds the wisdom that both experiential data and coherence are evidence to place in e.
Haack’s Cases
I thoroughly demonstrate that even so-called “subjective Bayesianism” is just a new frequentism in disguise in Proving History (pp. 265-80). Every time someone asserts a “degree of belief” that p of, say, 80%, they are literally saying they expect they’ll be wrong about beliefs like p 1 out of 5 times, where “like” means “based on evidence and background knowledge of analogous weight.” In other words, they are saying they think there is a four in five chance they are right in that case, and all cases relevantly similar.
That’s a frequency measure. And these frequency measures of our accuracy will always converge on the real probability of a thing being true, the more evidence we acquire. Stochastically, at least—since every random selection of evidence will bias the conclusion in different ways, and intelligent agents may even be actively selecting evidence for us for that very reason. So just as the second law of thermodynamics says systems trend toward increasing disorder as time is added (low-probability reversals of that trend are statistically inevitable, yet the overall trend holds), so also subjective probabilities trend toward the objective probabilities as evidence is added (low-probability reversals of that trend are statistically inevitable, yet the overall trend holds).
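That convergence claim can be illustrated with a toy simulation, a minimal sketch in Python; the coin, its true bias of 0.7, and the flat grid prior are my own arbitrary choices, used purely to show the trend described above:

```python
import random

# Toy simulation: a Bayesian estimate of a coin's true bias (here 0.7)
# drifts toward the objective value as evidence (flips) accumulates,
# despite unlucky runs along the way.
random.seed(1)
true_bias = 0.7
grid = [i / 100 for i in range(101)]     # candidate values of the bias
posterior = [1 / len(grid)] * len(grid)  # start from a flat prior

for n in range(1, 1001):
    heads = random.random() < true_bias
    likelihoods = [b if heads else (1 - b) for b in grid]
    posterior = [p * l for p, l in zip(posterior, likelihoods)]
    total = sum(posterior)
    posterior = [p / total for p in posterior]
    if n in (10, 100, 1000):
        estimate = sum(b * p for b, p in zip(grid, posterior))
        print(n, round(estimate, 3))     # the posterior mean drifts toward 0.7
```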
This also means that errors in estimating frequencies from available evidence will translate into erroneous subjective probabilities. And always, GIGO. Thus, finding that someone did the math wrong, is never an argument against the correct model being Bayesian. Yet that’s all Haack ever finds. If she finds anything pertinent at all.
The Commonwealth Case
Haack uses two legal cases to try and show Bayesianism doesn’t work, Commonwealth v. Sacco and Vanzetti (1920-27) and People v. Collins (1968). Her examples I suspect face the other two problems I noted—she confuses what courts are doing (which is risk management), with what actually constitutes valid belief formation; and she naively assumes the courts are always following epistemically correct principles (i.e. that the principles they apply aren’t ever mistaken). But my interest is in how her examples don’t reveal anything wrong with BT to begin with.
For the Commonwealth case, Haack critiques an attempted Bayesian analysis of it by Jay Kadane and David Schum (pp. 65ff.). Their book on this may well suck. I have no opinion on that. But all her pertinent statements about it are false. Haack asserts:
First: the fact that Kadane and Schum offer a whole range of very different conclusions, all of them probabilistically consistent, reveals that probabilistic consistency is not sufficient to guarantee rational or reasonable degrees of belief.
This is a straw man fallacy. When subjective Bayesians talk about probabilistic consistency, they mean in a subject’s total system of beliefs. When Kadane and Schum show, for example, two different models based on a different prior probability assignment, they are not claiming those two prior probability assignments are consistent with each other. For Haack to act like that’s what they are saying is kind of appalling. But in any event, there are three ways she is wrong to think this.
First, there is no inconsistency in declaring two values for every probability to describe the margins of error at your desired confidence. That polls give us results like “45-55% at 99.9% confidence” is not a mathematical inconsistency. To the contrary, those three numbers are entailed by mathematical consistency. The confidence interval (e.g. 45-55%) will widen as we increase the confidence level (e.g. 99.9%), and narrow as we lower that level, in a logically necessary mathematical relationship. And the statement “45-55% at 99.9% confidence” means that we are 99.9% certain that whatever the actual probability is, it is between 45% and 55%; or in other words, there is only a 1 in 1000 chance the probability isn’t in that interval.
This translates to all epistemic belief even in colloquial terms (see Proving History, index, “margin of error”). If when estimating the prior probability that p, I choose a minimum and maximum probability, such that the minimum I choose is as low as I can reasonably believe possible, and the maximum I choose is as high as I can reasonably believe possible (and I do the same with the likelihoods), then any posterior probability that results will also be a range, from a low to a high value, and the low value will be the lowest probability that p that I can reasonably believe possible, and the high value will be the highest probability that p that I can reasonably believe possible. Because the confidence level (“what I can reasonably believe possible”) transfers from the premises to the conclusion. That’s not being inconsistent. To the contrary, insisting we know exactly what the probability is would produce an inconsistent system of probabilities. The only way to maintain mathematical consistency is to admit margins of error, and that that margin only applies at our chosen confidence level (the probability, again, of being wrong), and consistency also requires us to admit that the probability might lie outside our interval, but that it’s just very unlikely to.
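Here is a minimal sketch in Python of that procedure. All the input numbers are hypothetical, chosen only to show how minimum and maximum premises propagate to a minimum and maximum posterior; note that for the run least favorable to h you take the highest value of P(e|~h) you can reasonably believe possible, and the lowest for the run most favorable to h.

```python
def posterior(prior, p_e_h, p_e_nh):
    """Posterior probability of h, from a prior and the two likelihoods."""
    return (prior * p_e_h) / (prior * p_e_h + (1 - prior) * p_e_nh)

# Run least favorable to h: lowest prior and P(e|h), highest P(e|~h) deemed reasonably possible.
low = posterior(prior=0.3, p_e_h=0.5, p_e_nh=0.4)
# Run most favorable to h: highest prior and P(e|h), lowest P(e|~h) deemed reasonably possible.
high = posterior(prior=0.6, p_e_h=0.8, p_e_nh=0.1)

print(round(low, 3), round(high, 3))   # 0.349 0.923 — the posterior margin of error
```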
So the fact that I will have two prior probabilities (and thus two posterior probabilities, and two different systems of calculation accordingly) is not being inconsistent. It’s mathematically required of any coherent system of epistemic probability. They are not inconsistent because they are not the same thing: the minimum is my minimum, the maximum is my maximum. Those are measuring two different things.
Second, in choosing those boundaries of our confidence, whereby we need a minimum and maximum in each case, Kadane and Schum are saying that you have to pick one, and whichever you pick, the consequences they calculate follow. How you pick one will require further epistemic examination of your system of beliefs. They cannot lay out the entirety of every reader’s belief system all the way down to all raw sensory data over the whole of their life in a book. To expect them to is absurd.
Obviously, for your minimum and maximum, you have to choose one of the systems they built, based on what you know (what evidence and background knowledge you have). And that’s going to be a further system of consistent probabilities, down to and including the frequency of raw sensory data (see my Epistemological End Game). Although of course we do this with intuitive shortcuts, and if we are honest, we account for the margins of error that introduces (future AI might do it directly, as present AI already does; unless the shortcuts increase efficiency by enough to outweigh the costs of the errors introduced by using them). But showing us what then results, given what inputs we are confident in, is not being inconsistent. Philosophers argue this way all the time (“I don’t believe x, but even if you believe x, y then follows”). It’s disingenuous to characterize that as being inconsistent.
Third, it’s also not inconsistent to argue, for example, “the prior probability that p is at least x,” and calculate for x, to produce the conclusion, “the posterior probability that p is at least y” (see Proving History, index, “a fortiori, method of”). You do not have to believe the probability is x. All you have to consistently believe is that the probability is x or higher. Thus, I can consistently say the prior probability that a meteorite will strike my house tomorrow is “less” than 1 in 1000 (and therefore arrive at a conclusion that the probability that my house will still be there tomorrow is “more” than some resulting value), even as I believe the probability that a meteorite will strike my house tomorrow is not 1 in 1000 but, say, 1 in 1,000,000,000. Because 1 in 1,000,000,000 is less than 1 in 1000, and thus perfectly consistent with the statement “less than 1 in 1000.”
Haack does not demonstrate that Kadane and Schum meant anything other than any or all of these three things, when they presented multiple calculations to select from (I am not asserting they didn’t, only that she needs to show that to make the assertions she does). Therefore, in these bare statements, she has no argument against even their subjective Bayesianism, much less a fully adequate Bayesian epistemology. For instance, I believe most subjective Bayesians make the same mistake Haack does: thinking their probabilistic “degrees of belief” are not frequencies, when in fact they are—since those “degrees of belief” are actually stating the frequencies of their being right given comparably weighted evidence. And this is what’s wrong with Haack’s second objection to the Kadane and Schum analysis: that they don’t acknowledge (nor does Haack realize) that what they mean by “personal, subjective, or epistemic probabilities” are frequencies, such that every P(p|e) simply states the frequency with which they will be right that p when they know anything comparable to e.
Finally, in claiming that all their math is “mostly decorative” because the inputs simply represent the subjective judgment of experts, Haack is being disingenuous again. It is not mere decoration to show that certain inputs necessarily entail certain outputs. Showing the public what their beliefs logically entail is not “decoration,” it’s a description of the entire field and purpose of philosophy. Moreover, Haack has nothing to offer to replace such “subjective judgment of experts.” She, too, can simply employ in her epistemology nothing other than the subjective judgment of experts. That she uses the word “weight” to describe those judgments rather than “probability” is merely semantics. Or, perhaps we should say, merely decoration.
This is demonstrated when she introduces yet another “subjective” expert, Felix Frankfurter, to rebut Kadane and Schum—thus accomplishing only a demonstration that Kadane and Schum produced the wrong inputs, and not that Bayes’ Theorem was the wrong model to accomplish the task. You could use their Bayesian model with Frankfurter’s inputs, arguing correctly that his inputs have more coherence with the evidence and our background knowledge of the world, and have exactly the solution Haack claims her foundherentism produces. Thus, again, she is a Bayesian and doesn’t even know it.
For example, she describes Frankfurter’s observation that one witness testified to accurately identifying a man she saw in a car moving so quickly, and at a distance so great, that her having done so is highly improbable. In Bayesian terms, b, our background knowledge, which includes our knowledge of human visual acuity and its relation to objects moving at speed, at distance, and under cover of a vehicle fuselage, renders the probability that her testimony is accurate very small, much smaller than Kadane and Schum claimed. This simply demonstrates an incoherence in Kadane and Schum’s analysis: an inconsistency between the probabilities they input, and the probabilities they know to be the case regarding human visual acuity. That they didn’t realize this (by not putting two and two together) simply introduced an error in judgment that Frankfurter corrected. That’s how Bayes’ Theorem works.
It’s as much a fallacy to say that “invalid inputs into Bayes’ Theorem gets invalid results, therefore Bayes’ Theorem is invalid” as to say that “invalid inputs into standard deductive logic gets invalid results, therefore standard deductive logic is invalid.” Yet Haack’s entire argument against Bayesianism here relies on exactly that fallacy. Often opponents of Bayesian reasoning do this: they don’t realize that their proposed solution to a supposed fault of BT is actually a correct application of BT. I noted this, for example (and it’s relevant to this case as well) with respect to C. B. McCullagh and the murder of King William II in Proving History (pp. 273-76), which he claimed BT couldn’t solve, but his method could; I showed that “his method” was simply nothing more than a correct application of BT (not just with that example, but I proved this to be the case in general on pp. 98-100, where I analyze his Inference to the Best Explanation model).
The Collins Case
In her second example (pp. 71ff.), Haack discusses a misuse of evidence. But this repeats that same fallacy above: just because “legal probabilism can seduce us into forgetting that the statistical evidence in a case should be treated as one piece of evidence among many” it does not follow that legal probabilism is wrong. She is saying it’s wrong to make that specific error. But BT also says that. So she is not saying anything contrary to BT. She is in fact just instructing legal probabilists to be better Bayesians.
Here she specifically critiques a paper on this case by M.O. Finkelstein and W.B. Fairley, in which they even tell her that by subjective probability estimates they mean an actual frequency. Her complaint is that the frequency is derived from an estimated set of counterfactuals. Which she doesn’t like. Ironically, in a book in which she makes countless assertions based on her own frequency estimates regarding sets of counterfactuals. All philosophy is built on statistical estimates regarding sets of counterfactuals. Most human judgment in life is built on statistical estimates regarding sets of counterfactuals.
Haack asks how we do this. The answer is simple: all else being equal (e.g. starting with a neutral prior), what all logically possible legal cases can have in common is the likelihood ratio, otherwise known as the Bayes’ Factor (which measures how strongly the evidence weighs toward the conclusion, e.g. four times, a hundred times, a million times, etc.), such that when we assert that we believe there is, say, an 80% chance a party is guilty on the same quality of evidence (the quality being that Bayes’ Factor the evidence produces, and that alone; everything else about the evidence can infinitely vary), we are saying that 8 out of 10 times, we’ll be right (and 2 out of 10 times we will unknowingly convict an innocent person). And this is exactly the same thing Haack would say, only in terms of how confident she is that we are not convicting the innocent. Which is just English for how probable she thinks it is that we are not (which is the frequency that we will not).
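A minimal sketch in Python of that frequency reading of an 80% verdict; the neutral prior odds and the Bayes Factor of 4 are assumptions, chosen only to produce the 80% figure used above:

```python
# The "quality" of the evidence is its Bayes Factor; on a neutral prior,
# a Bayes Factor of 4 yields an 80% probability of guilt, i.e. being right
# 8 of every 10 times we convict on evidence of exactly that weight.
prior_odds = 1.0        # assumed neutral prior (the "all else being equal" above)
bayes_factor = 4.0      # evidence four times likelier on guilt than on innocence
posterior_odds = prior_odds * bayes_factor
p_guilty = posterior_odds / (1 + posterior_odds)
print(p_guilty)         # 0.8
print(1 - p_guilty)     # 0.2 — the long-run rate of convicting the innocent
                        # if we always convicted on evidence exactly this strong
```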
Haack does not understand that this is what Finkelstein and Fairley are saying, and consequently her entire analysis of their case is impertinent. We can disregard it. She can’t critique them until she correctly describes what she is critiquing. Moreover, her entire critique from there on out is all about simply saying they overlooked background knowledge that changes the probabilities they imagined applied to the case. In other words, all she is doing is correcting their application of BT, by fixing their inputs. She at no point shows BT is the wrong model to analyze the case with.
Foundherentism Is Just a Description of BT
Haack’s own foundherentism simply says that coherence is evidence, as well as experiential data, but that just restates BT by including evidence (in either e or b) that others may have been improperly excluding. Similarly she says a belief becomes more warranted as evidence increases in supportiveness, independent security, or comprehensiveness. But that’s just three different ways evidence can have a high likelihood ratio (as others have already shown). In other words, that’s just Bayes’ Theorem. Just as I showed for a similar colloquialization known as the Inference to the Best Explanation (Proving History, pp. 98-100).
As shown by R.H. Kramer at Machine Epistemology, by “supportiveness” Haack means “how well the belief in question is supported by his experiential evidence and reasons,” which is simply a restatement of evidential weight, which is measured in BT by the Likelihood Ratio (aka Bayes’ Factor): “how well” a belief is supported by any evidence equals how many times more likely that belief is to be true when that evidence is present. And by “independent security” she means “how justified [our] reasons are, independent of the belief in question,” which is simply a restatement of prior probability being conditioned on prior knowledge (of the whole world, of logic, of past similar cases, and so on), or (depending on how you build your model; since all priors are the posteriors of previous runs of the equation), it’s a restatement of the effect of innate probability on expected evidence (as Laplace showed with respect to multiple testimonies, as I described above). So, still BT.
Finally, by “comprehensiveness” she means “how much of the relevant evidence [our] evidence includes,” which is just a restatement of the fact that the evidence we lack is also evidence, which BT definitely takes into account when properly applied. In particular, BT takes into account missing evidence in two ways. First, the likelihood (the expectancy) that the evidence we lack would be lacking is a probability multiplied in. So, if it is only 30% likely on hypothesis h that we’d be missing that evidence, and 60% likely on ~h, the fact that that evidence is unavailable to us cuts the odds on h in half; although other evidence can easily reverse that factor, so it is never determinative of the conclusion by itself. Then, secondly, if we lack the evidence simply because we chose not to go check it, its absence is 100% expected on both h and ~h, since the fact that we didn’t check is in b, and all the probabilities in BT are conditioned on b. But now we know we have not checked the pertinent evidence, so that fact has to be included in our estimates of how likely it would be that we would have the sample of evidence we do on either theory, and this can tip far in favor of ~h if our sample is at all likely to be biased in favor of h.
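A minimal sketch in Python of those two cases; the 50/50 prior is my assumption, used only so the effect of the evidence’s absence is easy to see:

```python
def update(prior, p_e_h, p_e_nh):
    """Posterior probability of h given the evidence, from prior and likelihoods."""
    return (prior * p_e_h) / (prior * p_e_h + (1 - prior) * p_e_nh)

# Case 1: the record's absence is 30% expected on h but 60% expected on ~h.
# Its Bayes Factor is 0.3/0.6 = 0.5, so the odds on h are cut in half (1:1 down to 1:2).
print(update(0.5, 0.3, 0.6))   # ≈ 0.333

# Case 2: the evidence is "missing" only because we never looked for it,
# so its absence is 100% expected on both hypotheses and nothing changes.
print(update(0.5, 1.0, 1.0))   # 0.5
```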
Thus, for instance, that we would have someone’s testimony to someone else having committed a crime can be highly expected on that testimony being false (because people frequently lie or are mistaken, particularly if they have a grudge, or their testimony is to behavior that is highly improbable on the totality of our background knowledge). If we know there is evidence that would corroborate or impeach their testimony and chose not to check it, we might not be able to use BT to justify believing the testimony. Depending on the case, the probability of being wrong may be too great. (As Laplace’s example illustrated, for instance; but even a result of, say “80% likely to be true” is often too disturbingly low to act on—again, risk theory: what, honestly, should we risk on a 1 in 5 chance of being wrong?) Whereas, knowing we have all the accessible evidence allows us to assign an expectancy to the missing evidence that may get us more effect than the 1/1 “no effect” Bayes’ Factor we got from the evidence being missing solely because we chose it to be. For instance, if we check and don’t find an expected record of the event (a record that surely, or very likely, would have been generated had it occurred), its absence is much less likely on h than ~h, and it therefore substantially decreases the probability of h (see Proving History, “Bayesian Analysis of the Argument from Silence,” pp. 117-19). The “absence” of that record from our evidence merely because we didn’t look for it, by contrast, has no such effect.
This is just one of many ways BT explains the effect of evidence presence, evidence absence, and evidence biasing (such as only examining a sample of the available evidence) on how warranted we are in believing something, and it properly tells us what it means to be more warranted: to face a lower probability of being wrong. Which we need to know when we apply our warrant to risk. Because to know how likely it is that the risked harm will occur requires knowing how likely it is we are wrong. Thus, it is not even possible to make rational decisions without a Bayesian epistemology. And everyone is using one, whether they know it or not, and whether they are doing it well or poorly. It’s better to know it. So you can learn how to do it well.
Conclusion
All assertions of fact are probabilistic; you cannot, no matter how hard you try, actually in fact mean anything but a probability when you assert a degree of belief. And you cannot, no matter how hard you try, avoid relying on an intuitively estimated prior probability and likelihood ratio when you estimate your confidence in a thing. I demonstrate this, and that they entail Bayes’ Theorem is the only valid way left to justify your beliefs, in Proving History (pp. 106-14).
Haack has not presented any case to the contrary. She doesn’t understand Bayes’ Theorem. She asserts arguments against it that in fact don’t undermine it at all, and she proposes to replace it with itself, as her own epistemology is ultimately just another colloquial reformulation of Bayes’ Theorem. Hopefully some day she will notice. As I eventually did. Because once I agreed with her. Then I realized I was wrong.
Under “BT Explains the Condition of No Evidence Equaling No Warrant,” then under “Second,” did you mean “a posterior probability that p is 0.5”?
The original was fine as English goes, but I like your wording better. So I emended it to “the posterior…is 0.5.” Helpful suggestion. Thanks.
“Finding that someone did the math wrong, is never an argument against the correct model being Bayesian.” Amen. So sick of hearing that!
Fyi’s
Maybe look at this site instead.
https://errorstatistics.com
Mayo has a book coming out called “Statistical Inference as Severe Testing”
Also, despite what inductivists say, Critical Rationalism (Popper) has not been debunked/discredited … they only wish. Here is a corroboration account equal to the popular Bayesian alternative to Duhem-Quine.
“Corroboration and auxiliary hypotheses: Duhem’s thesis revisited.” Darrell Rowbottom.
Also two books by David Miller on critical rationalism.
Those all do the same thing: their criticisms of Bayesianism simply argue for better Bayesianism. The unification of frequentist statistics with Bayesian models is obviously correct (and already commonly employed, and inherent in my epistemology as laid out in Proving History). And the existence of objective probabilities toward which subjective probabilities are approximations is not in doubt to anyone who adopts any form of scientific realism.
Mayo’s may argue that way but I have seen her actually say that Bayesians are actually moving in the direction of classical error statistics and at that point why bother with Bayesianism? But you better check with her on that one!
The other case is totally anti-Bayesian and against anything to do with justificationist theories of knowledge and the use of induction that way. After all, we are talking about the legacy of Karl Popper!
It doesn’t matter. Everything they propose to replace it with, is describable by BT. That they don’t know that is folly, but makes no difference to the truth of it.
“Critical Rationalism: A restatement and defense” 1994
“Out Of Error: Further Essays on Critical Rationalism” 2006
What _is_ your take on Popper, Dr. Carrier?
I do not understand your comment about being “folly.” What is folly and why? Also, I do not see how corroboration accounts can even be translated into Bayesian accounts. It seems to me that the concepts are antithetical to each other, where being well corroborated is not the same as being probable.
And that’s folly.
They actually are compatible. And when one completes any epistemology, no matter how much it uses frequentism, it always ends up doing so in an ultimately Bayesian framework.
People don’t realize this because they don’t see how what they describe in English, and the assumptions they don’t state but require even in their math, translates into BT. The above article is just one example of countless one could give of exactly this going on.
Thank you. I am still a bit skeptical but you are not the only one who thinks so. Statistical scientist Andrew Gelman seems to think so as well:
http://andrewgelman.com/2004/10/14/bayes_and_poppe/
But David Miller, a close colleague of Karl Popper and a professional philosopher, devotes a large section to criticizing Bayesianism. See section 6.5, “Pure Bayesianism,” in his book “Critical Rationalism: A Restatement and Defense.” So I guess I have to chew on this for a while to see what I am not understanding. Thx.
Also, this new article by Gelman claiming that current Bayesianism is not inductive but deductive and compatible with Popper
More on this with a critical rationalist arguing against Bayesianism:
“Why Bayesian Rationality Is Empty, Perfect Rationality Doesn’t Exist, Ecological Rationality Is Too Simple, and Critical Rationality Does the Job”
http://www.rmm-journal.de/downloads/004_albert.pdf
Bayesian reasoning is a component of critical rationality. See CFAR.
This sounds wrong. I think you meant “less than” not “more than”:
“Because 1 in 1,000,000,000 is more than 1 in 1000, and thus perfectly consistent with the statement “more than 1 in 1000.””
Correct. Fixed. Thanks!
Dear Richard Carrier,
(I know you won’t publish my comment, but I’d really like to get an answer somehow, so I’ve provided an email where you can reach me, in case you don’t want to respond on your blog.)
When talking about probabilities I’ve seen you write that 0.5 means ignorance or something like that, but I don’t see how that is the case.
Let’s take flipping a coin as an example. To me it seems that saying that the prior equals 0.5, that is the prior probability of the coin being fair, has nothing to do with ignorance. Let’s describe the true bias of the coin with the parameter φ. A φ=0 would mean the coin would always land tails, φ=1 would mean always heads, φ=0.5 would mean a perfectly fair coin.
Now the question is what P(fair) means exactly. Let’s say we’re fine with concluding the coin is fair if 0.45 < φ < 0.55. So what you're saying is that P(0.45 < φ < 0.55) = 0.5 … and that supposedly means we're ignorant about the coin's bias (φ), but that is not the case at all.
Let's say the only other option is that the coin is not fair, so that P(not fair) = 0.5.
This means that the prior probability density looks like this:
p(φ) = 0.555… for 0 ≤ φ ≤ 0.45 (integrates to 0.25)
p(φ) = 5 for 0.45 < φ < 0.55 (integrates to 0.5)
p(φ) = 0.555… for 0.55 ≤ φ ≤ 1 (integrates to 0.25)
to satisfy that this function integrates to 1.
This has introduced a clear bias. And as you narrow down the "fair range" of φ to the point where it has to equal 0.5 for the coin to be considered fair, you're essentially saying that you are absolutely certain that the coin is fair — the complete opposite of ignorance.
So it seems to me that you cannot say P(fair)=0.5 means ignorance.
What specifies the ignorance about the true φ is the width of the prior probability density function: for example a flat prior gives equal credibility to each possible value of φ … that is ignorance.
The result of applying Bayes' theorem will then be a posterior probability distribution that will have narrowed down in width, i.e. ignorance has been reduced.
You are confusing epistemic with physical probability.
When it is the case that you cannot argue, for lack of any knowledge, that the probability is greater than 50% and likewise you cannot argue for less than 50%, then so far as you know, it is 50%. Until you get more information (like starting to flip the coin and see if it’s loaded; if you never get to do that, you never get to know).
Now, if you can argue it’s different, then the epistemic probability is no longer 50/50.
This is essentially the principle behind Laplace’s Rule of Succession (that article covers the underlying math). Assuming an equal probability for all probabilities gets you that rule. See also William Faris’s book review of Probability Theory: The Logic of Science (by E. T. Jaynes), in Notices of the American Mathematical Society 53, no. 1 (January 2006): 33–42. Which I cite in Proving History.
Margins of error can then be described around that 50/50 (in fact, must); but the further you get from it, the less likely it becomes that a coin is “that loaded,” based on the logic of configurations (there are more ways the world can be configured to get a 50/50 coin +/-10% than a 50/50 coin +/-30%, for example). That is, until you start getting information to the contrary. Then that changes things.
This is all conditioned on b, our background knowledge, which includes logic, permutation theory, and whatever physical facts pertain (e.g. “sided” objects; people as psychological agents; discovered causes as world events; and so on). So it’s never “total ignorance” (that’s an impossible state; you couldn’t even do math in a state of total ignorance, as you wouldn’t even know math). But it’s pertinent ignorance: epistemically, when you don’t know whether the odds are good or bad, you are in the condition of Buridan’s Ass: so far as you know, it’s equal odds either way. Until data starts to change that conclusion. And this follows even when you run a complete probability distribution as you suggest. As Laplace discovered two hundred years ago.
Note, (a) “The result of applying Bayes’ theorem will then be a posterior probability distribution that will have narrowed down in width, i.e. ignorance has been reduced” is impossible without a defined prior (so what prior will it be?) and (b) Laplace did indeed apply Bayes’ Theorem to answering the question in (a) and got the result I report (the linked Wikipedia article walks you through the math).
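For anyone who wants to see the Rule of Succession in action, here is a minimal sketch in Python (purely illustrative; the function name and example numbers are mine, not anything from Proving History): with a uniform prior over the unknown bias, the probability of success on the next trial after s successes in n trials works out to (s+1)/(n+2), which is exactly 0.5 when you have no data at all.

# Minimal illustrative sketch of Laplace's Rule of Succession.
# Prior: the unknown bias phi is uniform on [0, 1], i.e. Beta(1, 1).
# After s successes in n trials the posterior is Beta(s + 1, n - s + 1),
# and its mean is the probability that the next trial succeeds.

def rule_of_succession(successes, trials):
    """Posterior predictive probability of success on the next trial."""
    return (successes + 1) / (trials + 2)

print(rule_of_succession(0, 0))   # no data at all -> 0.5
print(rule_of_succession(7, 10))  # 7 heads in 10 flips -> ~0.667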
(Please delete my previous message, as greater and less than signs destroyed parts of my post.)
“When it is the case that you cannot argue, for lack of any knowledge, that the probability is greater than 50% and likewise you cannot argue for less than 50%, then so far as you know, it is 50%.”
But that is true for any probability, be it 1%, 13.62% or 50%. I cannot argue that it’s less than 1% or more than 1% so it is 1%? That argument is invalid.
Not being able to argue for more or less than X% in my book is exactly the ignorance I was talking about. This ignorance is properly reflected e.g. by a flat or beta(1,1) or Jeffreys … prior. Saying it is 50% is the complete opposite of that, as I’ve tried to explain before.
Even if we can argue differently (that less or more than X%), you still should use a prior density function that reflects the remaining uncertainty or ignorance. For example a beta(20,20) prior could be the choice given some knowledge about the production process of the coin: values near 0.5 are way more credible for φ than at either extreme. When doing 10 coin flips then this prior will give a very different result than e.g. a flat prior we could use if we had no clue where the coin came from.
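To make that comparison concrete, here is a minimal sketch (purely illustrative; it assumes scipy is available and uses a hypothetical 7 heads in 10 flips) of how the flat beta(1,1) prior and the informative beta(20,20) prior give different posteriors from the same data:

# Illustrative sketch: same data, two different priors (Beta-Binomial model).
from scipy.stats import beta

heads, flips = 7, 10  # hypothetical data

for a0, b0, label in [(1, 1, "flat beta(1,1) prior"),
                      (20, 20, "informative beta(20,20) prior")]:
    a, b = a0 + heads, b0 + (flips - heads)  # conjugate update
    post = beta(a, b)
    print(label, "-> posterior mean %.3f, sd %.3f" % (post.mean(), post.std()))

# flat prior        -> Beta(8, 4):   mean ~0.667, sd ~0.131
# beta(20,20) prior -> Beta(27, 23): mean ~0.540, sd ~0.070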
I am not sure how the “Rule of succession” applies here, since we don’t necessarily have a few observations. Also, it deals with P(X=heads) or P(X=tails), which is very different from P(fair), or more precisely e.g. P(0.45 < φ < 0.55).
But if you *know* that some random coin can land on either side, then you could start out with a simplified P(heads) = 1-P(tails) = (heads + 1) / (flips + 2), sure.
I do understand your point about background knowledge, e.g. like the physics of a two-sided coin-shaped metal object tells us that it will likely result in an unbiased and random outcome, but the coin could still be from a magician’s shop and be constructed such that it looks like a normal coin but will actually almost always land on heads.
Laplace also used a flat aka uniform prior to describe uncertainty (or to describe our ignorance about the process), yes, which results in a posterior probability distribution. The single-number result however is still just P(head) or the expected value of that distribution. It does not tell us anything about uncertainty or ignorance.
Let’s say such a result was 0.666… What does this mean? A single look at the posterior PDF would tell us how certain we were that 0.666… was a credible value for φ. It could be a very narrow peak near that value (meaning high credibility) or it could be a very wide one. With more interesting processes there could even be multiple asymmetric peaks, etc.
What I’m saying is that “P(anything) = 0.5” cannot inform about ignorance.
The math has been done. The mathematicians didn’t get the result you are claiming. I don’t know what to tell you. The math is the math.
But if you don’t get the math, we can try psychological access to the concept of epistemic probability:
You are faced with an unknown binary event. You have no information that makes either option more likely than the other. You run a casino. Someone says they will bet you $1,000 on one of the two options, as soon as you set the odds—that is, they pick which option they will bet on after you set the odds. Explain why you would pick any value other than 1:1.
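If it helps to see that in numbers, here is a minimal sketch in Python (purely illustrative; the payout convention and figures are hypothetical): whatever implied probability q the casino posts for option A, the bettor simply takes whichever side is underpriced, and only q = 0.5 (1:1 odds) leaves them no expected profit.

# Illustrative sketch of the casino bet. The house posts an implied
# probability q for option A: a winning $1 bet on A returns 1/q, and a
# winning $1 bet on B returns 1/(1-q). By hypothesis the house has no
# information favoring either option, so each wins with probability 0.5.

def bettor_best_expected_profit(q, stake=1000.0):
    """Bettor picks whichever side is underpriced after seeing the odds."""
    ev_a = 0.5 * stake / q - stake          # expected profit from betting on A
    ev_b = 0.5 * stake / (1.0 - q) - stake  # expected profit from betting on B
    return max(ev_a, ev_b)

for q in (0.3, 0.4, 0.5, 0.6):
    print(q, round(bettor_best_expected_profit(q), 2))
# Only q = 0.5 gives 0.0; every other posted odds hands the bettor
# positive expected value.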
“The mathematicians didn’t get the result you are claiming. I don’t know what to tell you. The math is the math.”
What do you mean? I didn’t contradict any of the maths. I pointed out to you what ignorance means (put simply it’s the width of the PDF) and not some single probability like .5. The maths you linked to does not deal with inference about hypotheses (like the coin is *fair*), but successive probability of the outcome (like X=1) of a Bernoulli process. In fact, the “Mathematical details” section of the linked Wikipedia article matches what I’ve been saying about the uncertainty about the true value φ or as I called it, our ignorance about it.
It specifically says: “The proportion p is assigned a uniform distribution to describe the uncertainty about its true value.”
“You have no information that makes either option more likely than the other.”
And this ignorance is the only thing I’m talking about. Mathematically this is not sufficiently described by a 0.5 probability, 1:1 odds or the expected value of a PDF being 0.5, but only the full prior PDF itself.
You see, in your example, it could actually be the 1000th time this bet was made and the true value of φ could have been inferred to equal 0.5 (well, there’d still be some uncertainty…). Or we actually knew that the only credible values for φ are very close to 0.5 beforehand. The PDF shows that. It shows our ignorance or certainty. The odds or probability of some outcome do not. They do not tell us how ignorant you are about the true value of φ.
Then you don’t understand epistemic probability.
The 1:1 odds of being right on a bet like that exactly is what epistemic probability is about. That’s what you get as the prior odds in an Odds Form of Bayes’ Theorem for any binary undecidable on a state of effective ignorance.
If you don’t understand that, I can’t help you.
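For anyone unfamiliar with it, the Odds Form just multiplies the prior odds by the likelihood ratio. A minimal sketch (purely illustrative; the numbers are hypothetical):

# Odds Form of Bayes' Theorem (illustrative sketch):
#   posterior odds = prior odds * likelihood ratio,
# where the likelihood ratio is P(evidence|h) / P(evidence|not-h).

def posterior_odds(prior_odds, p_e_given_h, p_e_given_not_h):
    return prior_odds * (p_e_given_h / p_e_given_not_h)

# On effective ignorance the prior odds are 1:1, i.e. 1.0:
print(posterior_odds(1.0, 0.9, 0.3))  # -> 3.0, i.e. a posterior probability of 0.75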
(Not sure what happened with my final post. Trying again..)
Okay then please let me ask this final question.
The odds you gave for the example above (where we are totally ignorant about the binary outcome) are 1:1.
What are the odds if we know with *certainty* that the event produces outcomes that match that of a perfectly fair coin?
If your answer is again 1:1 then by simple logic these odds cannot say anything about certainty or ignorance.
I get the impression that you don’t like to continue this discussion. I thank you for your time, although I’d have liked to talk some more about my point that the probability of a hypothesis or model, e.g. of the coin being fair (that’s what we’re actually interested in!) if φ is between 0.45 and 0.55, or biased if φ falls outside this narrow range, has a different meaning from the probability of some outcome. If you used the same uniform prior as before, then we’d have to start at P(fair) = 0.1, since that’s what the flat prior over that range integrates to, and after observing 7 heads for N=10 this would only rise to about 13%. If you started with P(fair) = 0.5, your prior wouldn’t look at all uniform but more like a beta(23,23) distribution.
But it’s fine if you don’t want to.
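For what it’s worth, the figures in that example do check out under the stated assumptions; a minimal sketch (purely illustrative, assuming scipy):

# Illustrative check: with a uniform prior on phi, P(fair) = P(0.45 < phi < 0.55)
# starts at 0.1; after observing 7 heads in 10 flips the posterior is Beta(8, 4),
# under which that interval carries roughly 13%.
from scipy.stats import beta

prior = beta(1, 1)              # uniform prior on phi
posterior = beta(1 + 7, 1 + 3)  # 7 heads, 3 tails

print(prior.cdf(0.55) - prior.cdf(0.45))          # ~0.10
print(posterior.cdf(0.55) - posterior.cdf(0.45))  # ~0.13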
The fact that you are asking this question proves you don’t know what epistemic probability is. It’s the probability precisely when we don’t know anything with certainty.
We aren’t concerned about certainty. We are literally only concerned with what odds you would assign when offered the imagined bet. That’s literally what you do every day of your life: set odds on bets on the outcome of events. The bets may be in the form of your life, a certain amount of time or money wasted, emotional costs, etc. But it’s always a bet on odds. And all we want to know is: because you have to bet, and cannot be certain, what do you set the odds at under those conditions?
You have to do it. So you can’t just sit it out. You are automatically doing it the moment you make any choice whatever—even just in deciding what to believe, how to act, what to say. Epistemic probability is what we use to deal with this unavoidable requirement of betting and setting odds in every decision of our lives no matter how minor. It does not have anything to do with being certain about anything. It has to do with the consequences of our uncertainty, and how we measure the differential probability of those consequences given the little we know.
(I’m sorry but I have to reply again. I have to confirm that I’m not misrepresenting your views!)
“It’s the probability precisely when we don’t know anything with certainty.”
I’m sorry if I wasn’t clear. With certainty I was referring to a situation where you have very strong evidence, and epistemic probabilities certainly do not only apply to situations of total ignorance (they actually don’t in that specific case). Or are you saying that when evidence is gathered at some point epistemic probability stops being applicable? I’m certain that is not the case. 😉
I’ve stumbled upon “Epistemic Probability And Epistemic Weight” by Gunnar Berglund today, and he defines the epistemic probability P as the *expected value* of the *posterior* distribution given e.g. a uniform prior distribution. He goes on to define the epistemic weight W of the evidence with respect to the hypothesis, which depends on the *standard deviation* of the posterior distribution!
(These would just be the mathematical definitions without any philosophical interpretation.)
He writes: “It is important to always characterise our knowledge position by stating both the epistemic probability […] and the epistemic weight.”
This makes sense, clears up my confusion and also mostly matches what I’ve argued, but when talking maths I’d still just provide the probability distributions since they contain all the information.
To answer your question “what do you set the odds at under those conditions?”:
Given total ignorance, i.e. we do not even know whether both outcomes are possible, there are no odds.
Given some random Bernoulli process, i.e. we only know that neither outcome is impossible, the odds result from the expected value of the posterior distribution. The expected value unsurprisingly turns out to be 0.5 or 1:1 in this case.
tl/dr: The odds cannot say anything about ignorance or uncertainty. The posterior probability distributions do.
You can’t get a posterior probability without a prior.
So you always have to pick a prior.
And when you do, when you don’t know that it’s higher or lower (because no information available to you makes either option more likely), your prior is 1:1.
That’s just the way it is. Otherwise, you can’t do any Bayesian calculation whatever. You can never get a final odds or a posterior probability.
You can calculate the posterior for every possible prior. But when you then merge those results to get a single concluding outcome, you end up with the same result as if your prior were 1:1 (with transferred margins of error). Hence Laplace’s Rule of Succession.
“You can calculate the posterior for every possible prior. But when you then merge those results to get a single concluding outcome, you end up with the same result as if your prior were 1:1 (with transferred margins of error). Hence Laplace’s Rule of Succession.”
I don’t dispute the math.
I am not sure how familiar you are with uninformative, reference, objective … priors. A uniform prior is not the only choice, and not the best in some cases. But of course you cannot introduce bias towards either heads or tails without good reason.
But anyway, given the hypothesis of “heads” you also have 1:1 odds after doing 1000 coin tosses that came up 500 times heads and 500 times tails.
That is why 0.5, 50%, 1:1 or “the middleground” say nothing about ignorance. That is my main point.
Oh, and given other hypotheses, starting with 1:1 odds could mean something completely different (higher certainty, ignorance, or bias).
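The point about 1:1 odds before any flips versus after a 500/500 split can be made concrete with a minimal sketch (purely illustrative, assuming scipy): the central value is 0.5 in both cases, but the posterior width is very different.

# Illustrative sketch: probability of heads sits at 0.5 both before any flips
# and after 1000 flips split 500/500, but the spread of the posterior differs.
from scipy.stats import beta

prior = beta(1, 1)          # no data yet
posterior = beta(501, 501)  # after 500 heads and 500 tails

print(prior.mean(), round(prior.std(), 3))          # 0.5, ~0.289
print(posterior.mean(), round(posterior.std(), 3))  # 0.5, ~0.016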
It sounds to me like you are not speaking regular English. You are trying to import obscurely defined terms in a specialized dialect into regular plain language philosophy. And that’s just not applicable. We are not concerned with the technical things you are talking about. We are simply saying things in ordinary English, and as such, what we are saying already subsumes and includes what you are saying. Getting any more precise than that is simply unnecessary to any point I make in Proving History. It may be handy in obscure contexts that don’t relate to what historians and everyday reasoners need to do, but we aren’t concerned with those obscure contexts here. See, for example, my discussion of the coin example you just gave, in Chapter 6. That’s what we are talking about.
I am using fairly simple terms, but I will try one last time to get through to you:
Haack wrote: “when there is no evidence […] neither p nor not-p is warranted”. You wrote “[this] is a mathematical statement fully describable on Bayes’ Theorem as “the posterior probability that p is 0.5.””
If p means heads, what is the posterior probability after we’ve flipped a coin 1000 times and it came up heads/tails exactly 500/500 times? It’s still 0.5.
So again: “0.5” is only part of the information. What Haack (and you as well) seem to be missing is that the posterior is a probability distribution, whose standard deviation or variance tells you how certain you are about it based on the evidence.
Assuming a random event with binary outcome, before you have any evidence, the posterior will have very high standard deviation.
After observing a couple of outcomes this standard deviation will have decreased significantly.
I have explained this like 10 times now but you kinda keep evading it.
In your other objection to Haack, you could have calculated: P(forged|evidence). That is what Haack confuses with P(evidence|forged) after all. I cannot believe that someone talking about probability does not understand that?!
We can calculate that, given (eN means evidence #N):
P(eN|forged) = 0.7, 0.5 and 0.3
P(eN|not forged) = 0.6, 0.4, and 0.1
By Bayes’ Theorem and using P(forged) = 0.5 prior, we get:
P(forged|eN) = 0.5385, 0.5556 and 0.75
P(forged|e1 and e2) = 0.5932
P(forged|e1 and e3) = 0.7778
P(forged|e2 and e3) = 0.7895
P(forged|e1 and e2 and e3) = 0.814
Simply applying the maths gives us the straightforward answer that with all 3 pieces of evidence, the probability of the document being forged is 81.4%, and of course higher than with any combination of fewer such pieces of evidence.
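Anyone who wants to verify that arithmetic can do so with a minimal sketch (purely illustrative; it assumes, as the calculation above implicitly does, that the three pieces of evidence are independent conditional on forged/not forged):

# Illustrative reproduction of the forgery calculation above.
p_e_given_forged     = [0.7, 0.5, 0.3]
p_e_given_not_forged = [0.6, 0.4, 0.1]
prior_forged = 0.5

def p_forged_given(evidence_indices):
    """Posterior probability of forgery given the listed pieces of evidence."""
    num = prior_forged
    den = 1.0 - prior_forged
    for i in evidence_indices:
        num *= p_e_given_forged[i]
        den *= p_e_given_not_forged[i]
    return num / (num + den)

print(round(p_forged_given([2]), 3))        # e3 alone       -> 0.75
print(round(p_forged_given([0, 1, 2]), 3))  # e1, e2, and e3 -> 0.814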
You are the one who isn’t listening.
The probability distribution is communicated colloquially in plain English with the margins of error and confidence level (“as far as I can reasonably believe” = the widest margins of error I can reasonably believe). Everything outside that margin is beyond reasonable belief (by definition; one can then debate whether the margins assigned are at the limit of what’s reasonable, but that then requires evidence, which requires knowledge). Everything you are doing is a waste of time and of no use to regular people or historians. It’s great for when you want to do hyper-precise science papers. But we aren’t doing hyper-precise science. We are talking about conditions considerably below the precision available to the sciences. And when we do that, we do just what you just did: assume a prior of 0.5 (with a margin of error around that, representing the limits of what we can reasonably credit) when we have no information that makes it higher or lower. That’s what the rest of us mean when we model ignorance mathematically. And you just did that yourself (“By Bayes’ Theorem and using P(forged) = 0.5 prior, we get…”).
Yes, certainly, the more evidence we have (the less ignorant we are) the more certain our results (the smaller the margins of error). But that isn’t what you were talking about before, and it isn’t anything I have ever objected to. Instead, you have gone back to saying exactly what I had been saying: that when we don’t know P(x) is higher or lower, we have to use 0.5. I even add that we then also include a margin of error around that, which you didn’t even do in your own supposedly superior example, an example that simply exactly replicates everything I have been doing!
Bayesianism still has problems with induction … or should I say that induction still has problems with Bayesianism.
http://www.pitt.edu/~jdnorton/papers/material_theory/1.%20Material%20Theory.pdf
This is a very general account which is not directed specifically at Richard Carrier’s wonderful set of books on the historicity of Jesus question. I am on his side when it comes to Jesus just being fiction. However, it is a deeper dive into the topics of induction and current trends in Bayesianism.
Norton doesn’t know how Bayesian reasoning is properly applied. Maybe he is being misled by some bad method someone else proposed, but what he describes as a Bayesian approach is not sound. He says that a Bayesian scientist looks for evidence that’s highly likely on the tested hypothesis. That’s incorrect, because that’s only a tertiary goal. Bayesian scientists actually look for evidence that’s highly unlikely on any alternative hypothesis. Finding that is what produces the high likelihood ratio we need. They likewise look for evidence that’s highly unlikely on the tested hypothesis, and increase that hypothesis’s probability by failing to find it. This is fundamental to the scientific method: science tests theories by trying to prove them false and failing; not by trying to find evidence that fits the hypothesis—the latter is the method distinctive of medieval pre-scientific thinking; yes, we need that evidence, too, but finding it is not what makes a test scientific. Because it is by failing to find such evidence, despite a search so diligent that it should have been found (thus guaranteeing not finding it is highly unlikely unless the hypothesis is true), that we warrant a high likelihood ratio and thus a high posterior probability. Likewise, conversely, for alternative hypotheses: because finding that unlikely evidence tanks their posteriors.
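A toy illustration of that last point (purely hypothetical numbers of my own): suppose some item of evidence e would almost certainly be found, after a diligent search, if the tested hypothesis were false, but would rarely exist if it were true. Then not finding e strongly supports the hypothesis.

# Toy numbers only: failing to find expected counter-evidence raises a
# hypothesis's probability via the likelihood ratio.
p_prior_h = 0.5          # prior probability of the tested hypothesis h
p_find_e_if_h = 0.05     # e rarely exists (so is rarely found) if h is true
p_find_e_if_not_h = 0.9  # a diligent search would almost surely find e if h is false

# We searched diligently and did NOT find e:
p_not_e_if_h = 1 - p_find_e_if_h          # 0.95
p_not_e_if_not_h = 1 - p_find_e_if_not_h  # 0.10

posterior_h = (p_prior_h * p_not_e_if_h) / (
    p_prior_h * p_not_e_if_h + (1 - p_prior_h) * p_not_e_if_not_h)
print(round(posterior_h, 3))  # ~0.905: not finding e strongly supports h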
Ironically, Norton’s own proposal is the correct approach. He doesn’t even notice that his replacement method is fully and properly Bayesian. It looks like he has been misled by some weirdly abstract Bayesian method someone else somewhere used that isn’t even based on comparing physical models. That would not be correct Bayesian epistemology, which must be built out of comparing physical models derived from background knowledge, exactly the way Norton says conclusions in chemistry should be reached.