Greg Mayer posted at Jerry Coyne’s blog on “Why I am not a Bayesian.” In his explanation, he goes wrong at three key points. And they are illustrative of common mistakes people make in trying to understand or apply Bayesian reasoning. In reality, Mayer is a Bayesian. He just doesn’t understand why. Here is the breakdown.
Error the First
Mayer’s assertion (quoting Royall) that “pure ignorance cannot be represented by a probability distribution” is false. It is in fact self-refuting. “I don’t know whether h is more likely than ~h” is by definition a probability distribution (it entails P(h|b) = 0.50). I made this point already in Proving History, where I also provide a demonstration refuting Mayer-Royall style claims generally (pp. 83-85 and 110-14).
One need merely point out: if you cannot define the prior probability, then you can never know whether any likelihood ratio warrants belief. Ever. Saying a claim is ten times more likely to be true given a new piece of evidence doesn’t tell you whether the claim is even likely at all. Ten times a 1% prior, for instance, still gets you only about 10% … which means the claim is still probably false. Thus likelihood ratios are logically useless in the absence of a prior probability to update.
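To make that arithmetic concrete, here is a minimal sketch (in Python, with illustrative numbers of my own choosing) of what a likelihood ratio actually does to a prior, using the odds form of Bayes’ Theorem:

```python
# Odds form of Bayes' Theorem: posterior odds = prior odds x likelihood ratio.
# Illustrative numbers only: a 1% prior and a 10-to-1 Bayes factor favoring h.
prior = 0.01                       # P(h|b)
bayes_factor = 10                  # P(e|h.b) / P(e|~h.b)

prior_odds = prior / (1 - prior)   # 1 : 99
posterior_odds = prior_odds * bayes_factor
posterior = posterior_odds / (1 + posterior_odds)

print(f"P(h|e.b) = {posterior:.3f}")   # ~0.092: better evidence, still probably false
```

Without the prior, the Bayes factor of 10 by itself tells you nothing about whether h is now believable.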
But more damning is the fact that all prior probabilities are the posterior probabilities of prior likelihood ratios. In other words, every prior is the outcome of a preceding likelihood ratio. So if you believe in likelihood ratios (as Mayer says, “only the likelihoods make a difference … So, why not just use the likelihoods?”), you have to believe in priors. Because the one creates the other, and by transitive logic, the other can be reverse-engineered into the one. (See “iteration” in the index of Proving History.) It’s just a needless waste of time to do that all the way down to the raw uninterpreted data of human sensation (which is what you would do, if you wanted to build a Bayesian argument all the way down to its foundational ratios in undeniable data).
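To illustrate that iteration (this is only a sketch with made-up Bayes factors, not anyone’s actual data): each posterior simply becomes the prior for the next piece of evidence.

```python
# Iteration: the posterior from each update becomes the prior for the next.
def update(prior, bayes_factor):
    odds = prior / (1 - prior) * bayes_factor
    return odds / (1 + odds)

p = 0.5                        # start from indifference
for bf in [3.0, 0.5, 8.0]:     # hypothetical likelihood ratios from successive evidence
    p = update(p, bf)
    print(f"after Bayes factor {bf}: P(h) = {p:.3f}")

# Whatever P(h) you end with is the prior you bring to the next problem.
```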
Because we don’t need to. Science conclusively establishes priors all the time (and so can human experience of all varieties and in all fields, especially when employing a fortiori reasoning: see “a fortiori” in the index to Proving History). In fact prior probabilities in the sciences are called base rates. The prior probability that you have cancer, for example, has a well-documented value. So before a given test for cancer is even run, we already know the prior probability that you have cancer. The results of that test then tell you the updated probability that you have cancer given that new piece of information: the test coming up positive or negative. And how much the resulting likelihood ratio alters the prior then depends on the false positive and false negative rates of the test.
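For instance, here is the standard diagnostic-test calculation in sketch form (the prevalence, sensitivity, and false positive rate below are illustrative, not real clinical figures):

```python
# Base rates as priors: the standard diagnostic-test calculation.
prevalence = 0.01        # prior P(cancer), i.e. the base rate
sensitivity = 0.90       # P(positive | cancer)
false_positive = 0.09    # P(positive | no cancer)

p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)
p_cancer_given_positive = sensitivity * prevalence / p_positive

print(f"P(cancer | positive) = {p_cancer_given_positive:.3f}")   # ~0.092
# Even a seemingly accurate test leaves cancer improbable here, because the
# base rate (the prior) is low. Dropping it is the Base Rate Fallacy.
```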
Ignoring base rates—ignoring priors, in other words, as Mayer wants—is actually an established logical error called the Base Rate Fallacy.
Scientists should not be promoting fallacious reasoning on science blogs.
Just saying.
Error the Second
Mayer writes that “In the end, only the likelihoods make a difference; but this is less a defense of Bayesianism than a surrender to likelihood” because adding in priors means to “boldly embrace subjectivity,” but “then, since everyone has their own prior, the only thing we can agree upon are the likelihoods. So, why not just use the likelihoods?”
That last question I already answered above: the answer is, because you can’t. It’s logically impossible to know how likely h is with only the likelihoods. For even a million-to-one favorable Bayes factor can still end up giving you a posterior probability of under 1%, in other words a claim that is still almost certainly false. So even saying “we have a million-to-one likelihood ratio in favor of h” tells you nothing about whether you should believe h.
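A quick sketch of that fact (the prior here is hypothetical, chosen only to show the scale of the problem):

```python
# Even a million-to-one Bayes factor cannot rescue a sufficiently small prior.
prior = 1e-9                       # hypothetical, purely for illustration
bayes_factor = 1e6

odds = prior / (1 - prior) * bayes_factor
posterior = odds / (1 + odds)
print(f"posterior = {posterior:.4%}")   # ~0.1%: still almost certainly false
```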
But that’s his first mistake, which I already addressed.
Mayer’s second mistake appears here at the point where he complains, in effect, that priors are so subjective anyone can make up any prior they want, willy nilly (tell that to oncologists, and get laughed at). Again I already refuted that in Proving History (pp. 81-85). Key point there: he is falling victim to an equivocation fallacy, conflating “subjective” with “arbitrary.” There can be subjective elements in estimating a starting prior (not always significantly, though, e.g. base rates of cancer), but you can’t just start with any prior. You have to justify your priors with prior data (e.g. Proving History, pp. 229-56; data is even extendable with previously confirmed hypotheses: pp. 257-65)—otherwise, all hypotheses start out equally likely. Which is a prior probability.
You can’t escape the consequences of your own reasoning.
Uncertainty is then accounted for with margins of error, which capture which possible prior probabilities the available data can and cannot support (Proving History, pp. 265-80, with pp. 257-65). So it is simply not the case that “everyone has their own prior” in any sense sufficient to carry Mayer’s conclusion. Though people can sometimes differ on where their intuition puts a prior, everyone who is basing their intuition on actual data (as in background knowledge, the b in a Bayesian equation on which all the probabilities in it are conditioned—including the likelihoods, incidentally!) will be plotting priors in the same region. In other words, within the same margins of error objectively supported by the data. (See “disagreement” in the index to Proving History.)
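In practice that just means running the same update at both ends of the supported range and arguing a fortiori from the end least favorable to your conclusion. A minimal sketch, with made-up numbers:

```python
# Margins of error on a prior: test the conclusion at both ends of the range.
def posterior(prior, bayes_factor):
    odds = prior / (1 - prior) * bayes_factor
    return odds / (1 + odds)

low, high = 0.2, 0.4     # the range of priors the background data arguably supports
bf = 5.0                 # likelihood ratio from the new evidence

print(f"least favorable prior: P(h|e.b) = {posterior(low, bf):.2f}")
print(f"most favorable prior:  P(h|e.b) = {posterior(high, bf):.2f}")
# If the conclusion survives even the least favorable prior, disagreement over
# where exactly the prior sits inside the margin no longer matters.
```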
The same subjectivity also operates when assigning likelihoods. But the same rules apply there as well.
The crucial lesson, though, is that Bayesian reasoning forces you to make this fact explicit. Whereas all other methods conceal it, and thus allow people like Mayer to pretend they aren’t being just as subjective as Bayesianism requires. Bayesianism is honest. Everything else is in some measure or other a lie. Maybe a lie you tell yourself, but a lie all the same. And that fact is illustrated by Mayer’s third mistake…
Error the Third
Mayer says:
The problem with Bayesianism is that it asks the wrong question. It asks, ‘How should I modify my current beliefs in the light of the data?’, rather than ‘Which hypotheses are best supported by the data?’. Bayesianism tells me (and me alone) what to believe, while likelihood tells us (all of us) what the data say.
He mistakenly thinks these are saying different things. They are not. And that betrays the fact that he doesn’t really understand Bayesian reasoning. What does it mean to say “this hypothesis (h) is better supported by this evidence (e)”? That sentence is vacuous. Unless you can explain how e makes h more likely. Because if e does not make h more likely than it does ~h, there can be no intelligible sense in which e supports h over ~h. Ooops. Guess what that means. ‘Which hypotheses are best supported by the data?’ = ‘How should I modify my current beliefs in the light of the data?’.
That Mayer doesn’t notice this, is kind of embarrassing.
Similarly, “Bayesianism tells me (and me alone) what to believe, while likelihood tells us (all of us) what the data say” is likewise vacuous. Not only because likelihoods are just as subjective as priors (for all the reasons I’ve surveyed throughout this article already), but more importantly because: if you can’t explain to someone why b + e tells “you” what to believe, then it doesn’t in fact tell you what to believe. Sorry, but it doesn’t. Whereas if you can explain to someone why b + e tells “you” what to believe, then you’ve just told them what to believe. Ooops. Guess what that means. ‘Bayesianism tells me (and me alone) what to believe’ = ‘likelihood tells us (all of us) what the data say’.
Because there is no way data can tell you “alone” what to believe—unless you have access to data others do not, in which case obviously your job as a scientist is to provide others with that data. And then it’s no longer telling you “alone” what to believe. And when you can’t do that, you are facing a universal problem in epistemology that has nothing to do with Bayesianism: sometimes you know stuff other people don’t, and therefore sometimes you are warranted in believing something others are not. That’s just true. You can’t escape it by denying Bayesianism.
Hence, if you can explain to someone why b + e tells “you” what to believe, then you’ve just told them what to believe, unless they can show you that you have left something out of b or e, or inserted something in them that doesn’t actually exist. In which case your conclusion will change into alignment with theirs, once you adjust your b + e to align with theirs. (Again, see “disagreement” in the index to Proving History.) This is in fact what is wrong with bad uses of Bayes’ Theorem, as for example to prove God exists: always they are fucking with the data. Stop fucking with the data, and Bayesianism gives you atheism. Every time. The problem with bad Bayesian arguments is not Bayesianism. The problem with bad Bayesian arguments is that they are bad. (Proving History, pp. 67, 79-80, 91, and 305 n. 33.)
So much for the distinctions Mayer thought he was making. They dissolve on simple analysis.
The Major Point
Mayer needs to learn why Prior Assumptions Matter. [See also The Fallacy of Arbitrary Priors.]
The real embarrassment here is how he is already always a Bayesian in everything he does—and doesn’t know it. Why, for example, for any study does he not consider the likelihood that the CIA or aliens meddled with the experiment or observations and thus the data was unknowingly faked? Well, because he assumes—subjectively!—that those hypotheses have vanishingly small priors. So Mayer is already relying on his feared subjective priors; he just won’t admit it or doesn’t realize it. (Proving History, pp. 104-05.)
How would Mayer defend his refusal to take those hypotheses seriously, even though they have exactly the same likelihoods as any theory any scientist tests? (Because, after all, they are engineered to predict exactly all the same evidence: see “gerrymandering” in the index of Proving History.) He would appeal to background knowledge (i.e., b) which establishes a very low prior frequency of the CIA doing that, or of aliens doing anything, much less that. He would, in other words, admit to relying on subjective priors. Only, he would then have to admit they are not arbitrarily subjective: if someone came along and said “I have my own prior for alien meddling, and it’s 90%” he would duly point out that they have no data on which to base such a high base rate for alien meddling in human affairs at all (much less in science), whereas Mayer has tons of data (a vast database of human experience, including his own) in which the evident frequency of alien meddling is effectively zilch. So if it’s happening, it’s happening exceedingly rarely.
Attempting to bypass that with a Cartesian Demon then fails because it reduces the prior probability even further by adding too many improbable unevidenced assumptions to the hypothesis (and the prior probability of each multiplies against the rest to produce diminishing probabilities overall). (See “Ockham’s Razor” in the index to Proving History.) And note only the prior is thus reduced (unless you dig all the way down to raw sensory data before running your Bayesian series). Hence Mayer’s proposed “likelihoodism” can’t explain why Cartesian Demons aren’t the cause of everything ever. That’s a serious epistemological defect. He might want to tend to that.
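The arithmetic behind that point is simple enough to sketch (the probabilities assigned below are hypothetical, just to show the effect):

```python
# Each independent, unevidenced assumption added to a hypothesis multiplies
# against the rest, collapsing the joint prior. Numbers are hypothetical.
assumptions = {
    "a demon exists": 0.01,
    "it can control all our perceptions": 0.1,
    "it wants to deceive us about this in particular": 0.1,
}

joint_prior = 1.0
for claim, p in assumptions.items():
    joint_prior *= p
    print(f"after assuming '{claim}': prior = {joint_prior:.6f}")

# The gerrymandered hypothesis matches the evidence perfectly (likelihood ~1),
# so only a method that uses priors can explain why we still reject it.
```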
But to bring this point back to reality: there is a hypothesis the Mayers of the world fail to take into account in published research, one that Frequentism can’t take into account but Bayesianism can: human fraud.
Guess what: that’s a thing. Numerous high-profile examples have appeared in science news and journals in just the last three years. Unlike aliens and a meddlesome CIA, scientific fraud has a real, measurable frequency. It therefore has a base rate. And a minimum one at that: by the conjunction rule (whose violation is the famous Conjunction Fallacy, illustrated by the Linda problem), the frequency of fraud cannot be lower than the frequency of fraud that gets caught, so the detected rate is only a floor, and there is necessarily at least as much fraud as has been caught, and almost certainly more. One then must use broader background data to try to estimate how much uncaught fraud there is, and come up with a defensible, data-based, high-confidence margin of error for it. And until that’s been done, guess what? Ooops. You have to subjectively guesstimate what it is. Otherwise, no conclusion in science is defensible, because it could all be fraudulent (as Creationists and Climate Denialists would love to hear). So Mayer must be assuming the total base rate of fraud in science—in other words, its Bayesian prior probability—is sufficiently low as to sustain his trust in published scientific conclusions. And he has to be assuming that subjectively.
Because this certainly isn’t being accounted for in science papers. They might tell you that the null hypothesis is to be rejected with 95% certainty and therefore we can be kinda-sorta 95% sure the hypothesis is verified by the data. But that’s simply never actually true, because they didn’t account for the probability of fraud. Add that in, and the probability that their conclusion is true is not what any paper ever says, but somewhat lower. How much lower? Well, Mayer has to rely on his subjective intuition to say. But it won’t be arbitrary. Mayer would argue against someone who claimed to have “their own prior” for fraud that’s 90% (like, say, Creationists and Denialists). And he would argue the point with data. And he would be right. Just as in the case of the aliens and meddling CIA. And so we all do. For everything. All the time. In science and out.
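A sketch of how that adjustment would go (both numbers below are guesses of my own, which is exactly the point: someone has to estimate the fraud prior, explicitly or not):

```python
# Folding an estimated fraud rate into a paper's nominal confidence.
nominal_confidence = 0.95   # what the paper reports
p_fraud = 0.02              # guessed base rate of undetected fraud

# Pessimistically assume a fraudulent result tells us nothing about the hypothesis.
adjusted = nominal_confidence * (1 - p_fraud)
print(f"adjusted confidence: {adjusted:.3f}")   # ~0.931, not 0.95
```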
Thus, we are all Bayesians. Including Mayer. He’s just, you know, in denial.
I’d like to see a response to the argument given for Royall’s claim that “pure ignorance can never be represented by a probability distribution.” You’ve given a good argument for the opposite claim, but you haven’t shown how the argument given FOR the claim has gone wrong. This puts me not in a state of knowing you’re right, but instead, in a state of paradox so to speak–I’m looking at two apparently good arguments for incompatible claims.
I quote the argument given for the claim below:
Let’s look at simple genetic example: a gene with two alleles (forms) at the locus (say alleles A and a). The two alleles have frequencies p + q = 1, and, if there are no evolutionary forces acting on the population and mating is at random, then the three genotypes (AA, Aa, and aa) will have the frequencies p², 2pq and q², respectively. If I am addressing the frequency of allele a, and I am a Bayesian, then I assign equal prior probability to all possible values of q, so
P(q>.5) = .5
But this implies that the frequency of the aa genotype has a non-uniform prior probability distribution
P(q²>.25) = .5.
My ignorance concerning q has become rather definite knowledge concerning q² (which, if there is genetic dominance at the locus, would be the frequency of recessive homozygotes; as in Mendel’s short pea plants, this is a very common way in which we observe the data). This apparent conversion of ‘ignorance’ to ‘knowledge’ will be generally so: prior probabilities are not invariant to parameter transformation (in this case, the transformation is the squaring of q). And even more generally, there will be no unique, objective distribution for ignorance. Lacking a genuine prior distribution (which we do have in the diagnosis example above), reasonable men may disagree on how to represent their ignorance. As Royall (1997) put it, “pure ignorance cannot be represented by a probability distribution”.
“This puts me not in a state of knowing you’re right…”
Ironically, this answers your own question.
If you literally know nothing either way, then for you, the epistemic probability I’m right is 50%. By definition. That’s literally what it means to say “you literally know nothing either way.” The one is just a translation of the other from English into mathematical notation.
If you know anything that argues against me, and only then, then you have reason to believe it’s less than 50%. If you know anything that argues for me, and only then, then you have reason to believe it’s more than 50%.
That’s how probability works. As far as how it translates (how degrees of belief covertly reduce to claims about frequency), see PH, pp. 265-80.
To the example (and to provide you some information that I think should bump your 50% for me well above that), note you are not quoting Royall (incidentally, I cannot find that quote in an electronic edition of Royall; a page number would be helpful if anyone has found it). So we are actually talking about Mayer’s attempt at an example. But he doesn’t give any justification for “reasonable men may disagree on how to represent their ignorance,” e.g. he doesn’t give us any examples of how anyone would model the situation differently than he just did, or why their model would be as correct as his. This is my point: he seems to think priors are arbitrary, when in fact he just demonstrated (unwittingly) that they are not.
Indeed, for him to say he extracted knowledge from ignorance betrays ignorance of the fact that probability theory is knowledge, not ignorance. He gets the .25 from knowledge: knowledge of the number of permutations in the scenario, and knowledge of what logic then entails by that. He is not getting it from ignorance. In fact, the .25 is a description of the state of ignorance he is talking about: if you don’t know the frequency is higher or lower than 50% for each (as in, you have no information at all arguing that it is either), then by definition you don’t know the frequency of that one permutation is higher or lower than .25.
At best, he is identifying a common place where errors in Bayesian modeling can occur: the frequency of one allele combination in his scenario cannot be 50%, even if you don’t know what its frequency is, because probability theory tells you that is impossible unless one of the two in the combination has a greater or lower frequency than even. So if someone were to claim that, all else being equal, one of those combinations occurs 50% of the time, they would simply be wrong. Because of the laws of probability. Unless they had evidence that one of the components of that combination had a frequency that was not 50/50. But then we have knowledge again.
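To see concretely what a uniform prior on q does and does not entail for q² (nothing arbitrary is happening; the consequence follows deductively from the stated state of information), here is a quick simulation sketch:

```python
import random

# If q gets a uniform prior on [0, 1], the implied distribution of q**2 is
# fixed by that very choice: non-uniform, but in no way arbitrary.
random.seed(1)
samples = [random.random() for _ in range(100_000)]

p_q_above_half = sum(q > 0.5 for q in samples) / len(samples)
p_q2_above_quarter = sum(q * q > 0.25 for q in samples) / len(samples)
median_q2 = sorted(q * q for q in samples)[len(samples) // 2]

print(f"P(q > 0.5)    ~ {p_q_above_half:.3f}")      # ~0.5
print(f"P(q^2 > 0.25) ~ {p_q2_above_quarter:.3f}")  # ~0.5 (it is the same event)
print(f"median of q^2 ~ {median_q2:.3f}")           # ~0.25, not 0.5
```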
Kris,
You raise a valid point that over a continuous hypothesis space, an apparently obvious choice of ignorance prior may (and often will) turn out to lack transformation invariance.
Edwin Jaynes showed how to find the appropriate unique distribution in such cases that lack transformation invariance using group theory (see this paper, for example). The principle of indifference (used to obtain a uniform distribution over a discrete hypothesis space) is a special case of this method.
In many other cases where there seems to be ‘not enough information to determine a prior’, one may also find the appropriate unique distribution using the method of maximum entropy.
It is always a mistake to think that there isn’t enough information to conduct a probability calculation – probability theory is the machine we use to quantify what we do know, so what we don’t know can’t possibly interfere!
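For example, a minimal pure-Python sketch of the maximum entropy method for Jaynes’ dice problem (a six-sided die constrained only to have mean 4.5; the numbers are purely illustrative):

```python
import math

# MaxEnt over faces 1..6 with a mean constraint: the solution has the form
# p_i proportional to exp(lam * i); we find lam by bisection on the mean.
faces = range(1, 7)
target_mean = 4.5

def mean_for(lam):
    weights = [math.exp(lam * i) for i in faces]
    return sum(i * w for i, w in zip(faces, weights)) / sum(weights)

lo, hi = -10.0, 10.0
for _ in range(100):                 # mean_for is increasing in lam
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if mean_for(mid) < target_mean else (lo, mid)

lam = (lo + hi) / 2
weights = [math.exp(lam * i) for i in faces]
probs = [w / sum(weights) for w in weights]
print([round(p, 3) for p in probs])  # non-uniform, but uniquely determined
```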
I am confused by your easy switching between the terms “likelihood” and “likelihood ratio”. They are not the same thing yet your argument implies that they are. Please explain.
I would also appreciate it if you described the probability distribution function for ignorance. You claim that it has a probability distribution so this should be easy.
On the second question, read the referenced pages in PH that already answer your question. You can get a glimpse by reading my response to Kris Rhodes in this thread, but if you want the complete demonstration, it’s in PH.
On the first question, I need an example of where you see this. I only use “likelihood” by itself in the following cases: (1) I repeatedly directly quote Mayer, who uses “likelihoods,” which being in the plural is a valid stand-in for ratio; (2) I do the same, so, ditto; and (3) I once talk about factoring in the “likelihood” for the CIA meddling hypothesis, which is a correct usage (factoring in that likelihood changes the ratios, the point I was making).
“Attempting to bypass that with a Cartesian Demon then fails because it reduces the prior probability even further by adding too many improbable unevidenced assumptions to the hypothesis (and the prior probability of each multiplies against the rest to produce diminishing probabilities overall)”
I assume this would also be the Bayesian reply to presuppositionalist apologetics (e.g., Sye, Hovind, etc.)?
Only if they resort to a Cartesian Demon.
Presuppositionalism is incoherent, so it doesn’t really need a Bayesian rebuttal. It is plain illogical from the word go.
But usually what they are arguing is for (some variant of) the hypothesis that God causes logic to be correct, and then they sort of try to argue that P(e|h) = 1 (where e = logic is correct) and P(e|~h) = 0 (again where e = logic is correct). They go wrong precisely there. Not only does a P(e|~h) = 0 require a demonstration of logical impossibility (and they never present any), because only the logically impossible has a P of 0 (everything else has a nonzero probability, even if small: see PH, axiom 4, pp. 23-26); but also, they don’t even ever demonstrate that P(e|~h) is low. They never explain what a universe without logic in it would look like, or why a godless universe would allow logical contradictions to exist in it.
Whereas, since the LNC (the Law of Non-Contradiction) is simply a physical description of all universes that contain distinctions (start here for that analysis), and universes without gods in them can contain distinctions (in fact, nearly all of them imaginable do), the presuppositionalist claim that a godless universe would not have distinctions in it is false (and therefore the claim that it would not be described by the LNC is false).
The more so when we realize L (the existence of living observers) is in b (our background knowledge) which entails P(e|~h) = 1. Because P(e|~h) is properly P(e|~h.b) and therefore P(e|~h.L), and L can never be true in a universe without distinctions–and this can be demonstrated as a logically necessary truth–therefore if L, even if ~God, then always LNC; because L will never, as in literally 0% of the time, observe itself in a universe not governed by LNC, regardless of whether a god had anything to do with that universe (e.g. even if there are godless universes not described by the LNC, L will never observe itself being in one; so L observing that it is in a universe described by the LNC cannot tell you whether that universe is connected to a god or not, as it could just as easily be one of the imaginable godless universes described by the LNC).
Another thing that bothers me about the “priors are subjective” argument is that it ignores the evidential side of things. The only bad priors you cannot overcome by piling more and more evidence on top of them are p = 0 and p = 1, and if you’re assigning those probabilities to anything… you should rethink your life choices.
Subjective priors are not a problem, if you take some care.
Note: Forgot to hyperlink Mayer’s article! Fixed.
I agree that Bayesian logic is sound. But I still don’t see it as the best tool for making a positive argument in this case.
Using it to refute the arguments of historicists is another matter. It is really clear that there is a massive inconsistency at the core of modern Christianity and Judaism. An inconsistency that priests try to skim over by telling us ‘God moves in mysterious ways’ with a smug little smirk.
I find the argument for trinitarianism to be most peculiar. The support for it in scripture is pretty much non-existent yet it is central. And the position of the messiah in Judaism is equally odd. Shouldn’t such a central character be mentioned rather more often?
The approach I take is a little different: First, if someone is arguing vehemently against something it is quite probably because it is true. Second, any ancient text that has survived was almost certainly written as a commentary on the times in which it was written rather than the time it purports to describe.
“I still don’t see it as the best tool for making a positive argument in this case.”
Which case?
Hello Dr Carrier, I’m a big fan of your work on infidels.org and just discovered your blog. I just recently came across a criticism of your work on the resurrection on a website. What are your thoughts on it? I’d like to know. If for some reason you can’t reply here, you can mail the reply to me at blueisthecolor@mail.com I’ll post the links below. http://www.rightreason.org/2011/richard-carrier-on-the-resurrection-part-1/
http://www.rightreason.org/2012/richard-carrier-on-the-resurrection-part-2/
Thanks. I’ll be hoping for a reply soon.
Really desperately bizarre that.
There are hundreds upon hundreds of these kinds of things. I can only bother to answer a rare few.
No, I don’t consider that one worth responding to. What they say is either already refuted in my writings, or their fallacy is easily spotted.
I’m not a whiz yet at sitting down and swiftly plugging numbers into the equation, but I’m becoming more and more of a formal Bayesian all the time, since I’ve got evidence that I’ve been a Bayesian all along, just as the title of this entry says we all are. But I can already see that it sure seems that the anti-Bayesians are all reacting to the fact that there’s that estimate in there. With an estimate as part of the formula, one of the things that becomes clarified about the very way we think is that there’s guesswork involved in everything we do, and those guesses are based on earlier guesses. That’s not acceptable to a lot of people. I really think that’s what it reduces down to.
That’s really well put.
You’re quite right I think. People don’t want to admit how much guessing actually goes into all their beliefs and conclusions and decisions.
What I am interested in is how much of what happened is actually knowable at this point. And for that I think that Claude Shannon rather than Bayes is likely to be the better guide.
We have signal here and we have noise. And we have a set of historical claims that were written with the express purpose of establishing some sort of basis for a spiritual claim. And then we have the effect of censorship. How much of the original signal can we recover?
A fuzzy signal indeed. Hence the wide margins of error.