Understanding Bayesian History

9 October 2012

So far I know of only two critiques of my argument in Proving History that actually exhibit signs of having read the book (all other critiques can be rebutted with three words: read the book; although in all honesty, even the two critiques that engage the book can be refuted with five words: read the book more carefully).

As to the first of those two, I have already shown why the criticisms of James McGrath are off the mark (in McGrath on Proving History), but they at least engage with some of the content of my book and are thus helpful to address. I was then directed to a series of posts at Irreducible Complexity, a blog written by an atheist and evolutionary scientist named Ian who specializes in applying mathematical analyses to evolution, but who also has a background and avid interest in New Testament studies.

Ian’s critiques have been summarized and critiqued in turn by MalcolmS in comments on my reply to McGrath, an effort I appreciate greatly. I have added my own observations to those in that same thread. All of that is a bit clunky and out of order, however, so I will here replicate it all in a more linear way. (If anyone knows of any other critiques of Proving History besides these two, which actually engage the content of the book, please post links in comments here. But only articles and blog posts. I haven’t time to wade through remarks buried in comment threads; although you are welcome to pose questions here, which may be inspired by comments elsewhere.)

Ian’s posts (there are now two, A Mathematical Review of “Proving History” by Richard Carrier and An Introduction to Probability Theory and Why Bayes’s Theorem is Unhelpful in History; he has promised a third) are useful at least in covering a lot of the underlying basics of probability theory, although in terms that might lose a humanities major. But when he gets to discussing the argument of my book, he ignores key sections of Proving History where I actually already refute his arguments (since they aren’t original; I was already well aware of these kinds of arguments and addressed them in the book).

When Ian isn’t ignoring the refutations of his own arguments in the very book he’s critiquing, he is ignoring how applications of Bayes’ Theorem in the humanities must necessarily differ from applications in science (again for reasons I explain in the book), or he is being pointlessly pedantic and ignoring the fact that humanities majors need a more colloquial instruction and much simpler techniques than, for instance, a mathematical evolutionist employs.

To illustrate these points I will reproduce in bold the observations of MalcolmS on what Ian argues (which also does a fine job of summarizing Ian’s substantive points; my thanks to him for all of it), and then follow with my own remarks, which I have also expanded upon here (saying a bit more than I did in the original comments).

1. Your form of Bayes’s theorem is “confusing and unnecessarily complex.” His preferred form of BT is P(H|E) = P(E|H)P(H)/P(E).

He has 2 objections to your form of the formula: (1) The denominator has been expanded, which he feels is unnecessary. [I pointed out to him that most textbooks actually state BT with the expanded denominator and that most applications of the theorem that one encounters use it in that form as well, but he replied that in his own field (AI) he only needs P(E), so maybe this is just his personal preference from his own experience. Moreover, you give both forms of BT in the appendix.]

And (2) adding the background [i.e., the term b] explicitly is “highly idiosyncratic,” “condescending,” and “irksome,” reminiscent of William Lane Craig. [I agree that it is unusual and unnecessary, but it is not wrong, as even he acknowledges. Moreover, it is easier to transition to the form of the equation that you actually are using, whereby some of the evidence is incorporated into the background, but you never make this explicit.]

2. He also criticizes you for failing to explain the derivation of BT or discussing the definition of conditional probability generally: “While Carrier devotes a lot of ink to describing the terms of his long-form BT, he nowhere attempts to describe what Bayes’s Theorem is doing. Why are we dividing probabilities? What does his long denominator represent?” Consequently BT becomes “a kind a magic black box.”

He then states cryptically: “In this Carrier allows himself to sidestep the question whether these necessarily true conclusions are meaningful in a particular domain. A discussion both awkward for his approach, and one surely that would have been more conspicuously missing if he’d have described why BT is the way it is.”

[I’m not sure what point Ian is making here, but I think he is alluding to the difficulty of calculating P(E), which he discusses in his 2nd post. As I’ll point out later, his criticism is based on a misunderstanding of how you are applying BT.]

MalcolmS finds these two objections to be too trivial. I think they are outright pedantic and belie an abandonment of pedagogical goals and therefore are not really a worthwhile criticism. For example, a humanities major is not going to understand what P(E) is or how it derives from the sum of all probabilities; they are also going to have a much easier time estimating P(E|H) and P(E|~H) because estimating those probabilities is something they have already been doing their whole lives–they just didn’t know that that’s what they are doing. Much of my book is about pointing this out.

This is also why I keep the term b [for background knowledge] in all equations (as I even explain in an endnote, which clearly Ian did not read: see note 10, p. 301): so that laymen won’t lose sight of its role at every stage. Mathematicians like Ian don’t need it there. But historians are not mathematicians like Ian. This is also why I don’t waste the reader’s time by explaining how Bayes’ Theorem (or BT) was derived or proved; instead, I refer interested readers to the literature that does that (e.g., note 9, pp. 300-01). That’s how progress works: you don’t repeat, but build on existing work. I don’t have to prove BT or explain how it was derived; that’s already been done. I just have to reference that work and then show how BT can be applied to my field (history).

3. His next criticism is one that I partially share: “Carrier correctly states that he is allowed to divide content between evidence and background knowledge any way he chooses, provided he is consistent. But then fails to do so throughout the book.” He cites as an example, p. 51, where the prior is defined to explicitly include evidence in it. [The prior should be the probability that the hypothesis is true before any consideration of the evidence.] He continues with this quote from your book [which I also find objectionable]: “For example, if someone claims they were struck by lightening five times … the prior probabilty they are telling the truth is not the probability of being struck by lightening five times, but the probability that someone in general who claims such a thing would be telling the truth.”

This is his response: “This is not wrong, per se, but highly bizarre. One can certainly bundle the claim and the event like that, but if you do so Bayes’s Theorem cannot be used to calculate the probability that the claim is true based on that evidence. The quote is valid, but highly misleading in a book which is seeking to examine the historicity of documentary claims.”

Ian’s point is that BT is defined as P(H|E) = P(E|H)P(H)/P(E), with the prior by definition being P(H), i.e., without any conditioning on the evidence. In the example of the claim of being struck by lightning 5 times, the hypothesis H would normally be “someone was struck by lightning 5 times” and the evidence E would be “he claims to have been struck by lightning 5 times.” Then the prior would indeed be the probability of being struck by lightning 5 times. You instead have as your prior the conditional probability that someone is telling the truth when he claims “in general such a thing,” which would be (part of) the evidence for the claim.

[Effectively what you are doing is treating the evidence E (“someone claims to have been struck by lightning 5 times”) as if it were an intersection of E with a larger set, F (“someone claims such a thing”), that is more general, and then absorbing F into B. The form of BT that you are using is then: P(H|EB) = P(E|HFB)P(H|FB)/P(E|FB).

Now, this trick may be useful for actually calculating P(H|BE), since then you avoid having to calculate P(H|B) or P(E|B), but you haven’t been entirely upfront with the reader about what you are doing.] Moreover, as he points out in the quote above, you still are left with P(H|FB), which is very similar to P(H|EB), so you haven’t really used BT to solve the problem.

[In the actual applications in your book, what you generally do is use BT in this way to reduce the problem to conditional probabilities with more general evidence, and then use an empirical frequency to estimate them. This avoids the problem that he raises about having to estimate, say, the probability of the NT existing.]

Indeed. And this likewise ignores the fact that historians need to do different things than scientists. Thus the way I demarcate b from e is what is most useful to historians–and again, I even explain this explicitly in an endnote (note 10, p. 301) and discuss it several other times in the book (see “background knowledge vs. evidence” in the index, p. 333). Ian is simply ignoring all of that, and thus not responding to what my book actually argues.

Historians are testing two competing hypotheses: that a claim is true vs. the claim is fabricated (or in error etc.), but to a historian that means the actual hypotheses being tested are “the event happened vs. a mistake/fabrication happened,” which gives us the causal model “the claim exists because the event happened vs. the claim exists because a mistake/fabrication happened.” In this model, b contains the background evidence relating to context (who is making this claim, where, to what end, what kind of claim is it, etc.), which gives us a reference class that gives us a ratio of how often such claims typically turn out to be true, vs. fabricated (etc.), which historians can better estimate because they’ve been dealing with this kind of data for years. We can then introduce additional indicators that distinguish this claim from those others, to update our priors. And we can do that anywhere in the chain of indicators. So you can start with a really general reference class, or a really narrow one–and which you should prefer depends on the best data you have for building a prior, which historians rarely have any control over, so they need more flexibility in deciding that (I discuss this extensively in chapter 6, pp. 229-56).

You could, if you wanted, build out the whole Bayesian chain (e.g. see endnote 11, page 301), all the way from raw data, but why should historians trouble themselves with that? They already have context-determined estimates of the global reliability of statements based on their experience. If they get into an argument over conflicting estimates there, then they can dig into the underlying assumptions and build out the whole Bayesian case from raw data, or at least from further down the chain of underlying assumptions. But it’s a massively inefficient waste of their time to ask them to do that all the time, or even a lot of the time.

Ultimately, all Bayesian arguments start in the middle somewhere. If they didn’t, they’d all have priors of 0.5 (or whatever equally ramifies the spread of all possible hypotheses). Ian might prefer to start somewhere past the assembly of raw sensory data and toward the frequency-sets of basic event-occurrences (so maybe he would try to answer the question “Did Winston Churchill exist?” by starting with questions like “What is the physical probability that a man named Winston Churchill would be born in pre-WWII England?,” which would be a ridiculous way to proceed). But even that is doing what I am doing (he, too, is skipping a step: in this case, how we know the frequency data about names is correct, given a certain body of sensations Ian experiences, and so on). Historians usually skip all the science steps. Because they’re doing history. Not science (in the narrow sense; I discuss the relation between history and science on pp. 45-49). But one can always go back in and check those steps. If you had to for some reason.

In short, historians need to be more flexible in how they model questions in Bayes’ Theorem. Ian’s pedantry wouldn’t help them at all. Because it really doesn’t matter how you build the model–as long as you can articulate what you are doing, and it’s correct (as I explain in chapter six, especially). Because then it can be vetted and critiqued. Which is all we want. And that is all my book arms the historian to do. And that’s all she needs in order to get started.

Indeed, having these conversations (about what models to use and what frequencies fall out of them, and thus how to define h and demarcate e from b in any given case) is precisely what historians need to be doing. My book gives them the starting point for doing that. Because otherwise it won’t be the same for every question, because the data-availability differs for each case and thus historians have to demarcate differently in different cases. Scientists don’t face this problem, because they always have old data (which entails a prior) and new data (which gives likelihoods) and then only address problems that have tons of precise data to work from. Historians can almost never do any of those things. They have to adapt their application of Bayesian reasoning to the conditions they are actually in. Proving History explains why. A lot. Ian, apparently, just ignores that.

MalcolmS also observes how pedantic and insubstantial these criticisms are…

So far all of his criticisms have been stylistic (either about how equations were expressed or how they were explained), rather than truly mathematical. The rest of his post is no different.

4. He takes you to task for using Bayes’s formula as a synonym for Bayesian reasoning. In particular, he ridicules this quote from your book: “any historical reasoning that cannot be validly described by Bayes’s Theorem is itself invalid.” His objection seems to be that, while BT is of course true, there are other equations that one could derive from the definition of conditional probability that couldn’t be derived from BT.

[Actually, one could derive the formula for conditional probability from BT if one had some sort of definition of conditional probability that implied P(AB|B) = P(A|B) and P(A|AB) = 1: If one assumes BT (i.e., P(H|E) = P(E|H)P(H)/P(E)) then P(H|E) = P(HE|E) = P(E|EH)P(HE)/P(E) = P(HE)/P(E). But this is besides the point, since you only limited your claim to “historical reasoning,” a term which you unfortunately didn’t define.]

He further states your attempt to prove this “laughable” assertion is not credible, but gives no other reason than what I just stated above.

Which is an example of a non-critique critique: saying something is wrong, but giving no reason why, nor even interacting with the argument you are gainsaying at all. Ian’s overall claim is that Bayes’ Theorem can’t be used to reach historical conclusions because the probabilities are all unknown. But if that’s true for BT, it’s true for all probabilistic reasoning about history, which means all reasoning about history whatever.

I demonstrate (with a formal deductive syllogism even: pp. 106-14; supporting the informal arguments on pp. 62-65 and 81-93) that all historical arguments are fundamentally Bayesian (whether historians realize this or not), so if Ian were correct that no conclusions about history can be reached by Bayesian reasoning, then he is saying no conclusions about history can be reached. Period. Such radical skepticism about history I have refuted before (in Rosenberg on History, where I also show how, if that were true, science is also impossible, as it depends on historical facts, i.e. data and reports about things that happened or were observed, so if you can’t do history, you can’t do science).

That Ian totally ignores this, and doesn’t address my syllogistic argument at all, makes his critique here useless. Indeed, that is the point of my formalizing an argument: so critics will be able to identify any errors that invalidate the conclusion. If he’s not even going to do that, then he isn’t taking the book’s argument seriously. And so neither should we take his critique seriously.

One wonders what method he thinks would replace Bayes’ Theorem, that historians can use. Since all historical arguments consist of deriving conclusions from statements of probability, is there any logically valid way for them to derive those conclusions other than Bayes’ Theorem? (Or anything that reduces to it? See pp. 92-93; and again, pp. 106-14.) If you want to be a useful critic, you have to answer that question. I suspect any sincere effort to do so will result in realizing the answer is no.

5. He then goes on to discuss your “cheeky” proposal to unify frequentist and Bayesian interpretations of probability. His criticisms here are that your proposal is “unnuanced” and presented as if were original, when it is not. (Not that he is accusing you of taking credit for others’ ideas but rather of being possibly unaware of previous work in the field.) He also states that this “hubris” is typical of “a tone of arrogance and condescension that I consistently perceived throughout the book.”

Which is just ad hominem. I’m quite sure I don’t know all the arguments published on the debate between Frequentists and Bayesians (it must be in the thousands, counting books and articles), as I’m sure neither does Ian. Or any living person probably. But certainly, if anyone has articulated the same conclusion as mine before, I’d love to accumulate those references (it seems Ian claims they exist, but then fails to adduce a single example). So by all means, if anyone knows, post them in comments here.

That is neither here nor there. The real issue is whether my resolution of that debate is correct. Whether Ian dislikes my tone or thinks it’s arrogant or condescending is not a valid rebuttal to whether it is correct. I also don’t think he’s making an objective assessment, since I am responding to the debate as framed in recent literature by leading professors of mathematics (some of which I actually cite in the book), so he is here actually critiquing them for not knowing the solution I propose. After all, if even they don’t know about this supposedly condescendingly unoriginal argument of mine (and if they did, they’d have resolved the debate with it in the literature I cite), then why is it condescending for anyone like me to suggest it?

6. As a final comment on the mathematics he raises 2 issues but doesn’t elaborate on either: “I felt there were severe errors with his arguments a fortiori…and his set-theoretic treatment of reference classes was likewise muddled (though in the latter case it coincidentally did not seem to result in incorrect conclusions).” This is the entire extent of his discussion of these, from his perspective, problems.

I readily concede that my colloquial discourse will lead to ambiguities that chafe at mathematicians; but this is precisely the kind of shit they need to get over, because they are simply not going to be able to communicate with people in the humanities if they don’t learn how to strategically use ambiguity to increase the intelligibility of the concepts they want to relate.

It’s like Heisenberg’s Uncertainty Principle: you can have precision with unintelligibility to almost everyone but extremely erudite specialists, or you can have ambiguity but with intelligibility to everyone else. The more ambiguity, the greater the clarity, but the lower the precision. This is a fundamental principle of all nonfiction literature, especially any popularization of scientific or mathematical concepts to a nonscientific, nonmathematical public.

I have a particular audience. I am writing for them. And they are not mathematicians or scientists. However, I always think there are several points where I could be a better writer. Because I always know there is room for improvement. It would be more helpful to see someone articulate a point I make in my book better than I did. I would love that. And if anyone points me to any examples of that, I’ll definitely blog about it.

7. In his conclusion he has some positive things to say about the book:

“Outside the chapters on the mathematics, I enjoyed the book, and found it entertaining to consider some of the historical content in mathematical terms….History and biblical criticism would be better if historians had a better understanding of probability….

I am also rather sympathetic to many of Carrier’s opinions, and therefore predisposed towards his conclusions. So while I consistently despaired of his claims to have shown his results mathematically, I agree with some of the conclusions, and I think that gestalts in favour of those conclusions can be supported by probability theory.”

But here is his final critique:

“But ultimately I think the book is disingenuous. It doesn’t read as a mathematical treatment of the subject, and I can’t help but think that Carrier is using Bayes’s Theorem in much the same way that apologists such as William Lane Craig use it: to give their arguments a veneer of scientific rigour that they hope cannot be challenged by their generally more math-phobic peers.”

As you can see, he hasn’t presented any concrete objection to the mathematics in the book – just the way the mathematics was presented and explained and the overall tone of the book.

So it would seem. That’s not really a substantive critique.

Moreover, the difference between me and W.L. Craig is revealed by all of Ian’s qualifying remarks–like “coincidentally did not seem to result in incorrect conclusions,” a backhanded way to admit I’m actually using it correctly, unlike Craig, thus negating his analogy. The whole point of my book is to prevent Craig-style abuses by making it clear how to use BT correctly and how to spot its being used incorrectly. And indeed, I repeatedly emphasize that anyone who wants to use it needs to be clear in how they are using it so it can be vetted and critiqued, thus avoiding the “dazzling with numbers” tactic by arming the reader with the ability to see through it. (Hence I somehow managed to “psychically” refute Ian’s argument in note 33, p. 305, before he had even made it; likewise my remarks on pp. 90-92. In other words, Ian didn’t really read the book very carefully, as he is clearly unaware of my rebuttals to his own arguments.)

From the whole of his initial critique, Ian doesn’t seem to have as much experience as I do in trying to explain Bayesian reasoning to nonmathematicians. Much of my book was formed in response to the difficulties I faced when doing that. Things Ian thinks would be a better way to proceed, I have discovered first-hand are often the worst way to proceed. In communicating ideas to humanities majors especially, I have learned you have to approach explanations in very different ways than trained mathematicians do; and that often, mathematicians do not understand this.

Now I’ll turn my attention to Ian’s 2nd post…

Most of his post is taken up with an explanation of conditional probability and Bayes’s theorem, which is actually pretty good; it would’ve been a good idea to devote a few pages in your book to something like this. But so far there’s no criticism of your book, or even really much mention of it. I’ll start the numbering over again from 1 to list his criticisms, which eventual start to appear.

1. For historical questions, there usually is no easy way to calculate, or even estimate in many cases, P(E). He says, “I’ve never seen any credible way of doing so. What would it mean to find the probability of the New Testament, say?…I’m not sure I can imagine a way of calculating either P(H∩E) or P(E|H) for a historical event. How would we credibly calculate the probability of the New Testament, given the Historical Jesus? Or the probably of having both New Testament and Historical Jesus in some universe of possibilities?”

[He writes as if he didn’t actually read your book, although I know that he has, because he is going by his own knowledge of using Bayes’s theorem and not looking at the examples where you apply it. As I pointed out in a previous post, you get around the issue of estimating P(E) and P(H) by conditioning on general statements of the evidence, so that you’re calculating P(H|F) and P(E|(~H)&F). These need to be estimated somehow, but may be easier since the evidence F is more commonly encountered. It’s funny that he observed you doing this in his first post but then never thought through the implications for the rest of his posts. I suggest that if you explained what you were doing mathematically and how it differed from the way scientists usually state and apply Bayes’s theorem there would not be so much confusion.]

MalcolmS is right: my book is actually articulating ways to get around the very problem Ian’s talking about (which I certainly acknowledge: note, for example, my discussion of it on pp. 110-14). Where we can’t, we can’t claim to know anything (Axiom 3, page 23; also Axiom 5, page 28). His question about how we derive “the probability of the New Testament” is unclear (what exactly does he mean?), but I address something quite close to it on pp. 77-79, using the Gospel of Mark rather than the whole NT (and I get even more specific in my discussion of emulation criteria later on: pp. 192ff.), which appears to completely answer his question. So why, then, does he not know that I answered his question? If he is ignoring my answer, then he is not critiquing my book, but some straw man of it.

In any event, the problem he is talking about (and that I also talk about in the book) is addressed by (a) being less ambitious in what you will attempt to prove (a lesson historians often need to learn anyway) and (b) being more clear and precise in laying out what evidence it is that you think produces a differential in the consequent probabilities (most evidence simply will not, and therefore can be ignored). Thus “the whole NT” is irrelevant to historicity; likewise even “the exact content of Mark.” We will need to get much more specific than that. What is it in Mark that makes any significant difference? And why? And how much? Whatever your answers to those questions are–literally, whatever they are–we can model your answer using BT. And in my next book I do that.

Ironically, as I noted before, Ian is committing the very mistake here that I warn against in the book: if we cannot estimate P(E|H), then historical knowledge is simply impossible. Because all historical conclusions implicitly rely on estimates of P(E|H) (and/or P(E|~H)), or their differential (using the “Odds Form” of BT: see “Bayes’ Theorem, odds form” in the index, p. 333). That’s all historical conclusions ever reached before now, and all that will ever be reached by anyone ever. Thus, if BT can’t solve this problem, no method can. And if Ian thinks otherwise, it’s his task to produce that method, a method by which (a) a historian can get a conclusion about history without (b) ever relying on any implicit assumption about any P(E|H) or P(E|~H); or, for that matter, P(H). Good luck with that. Because it can’t be done. If he’d tried it, he’d know.

2. He then asks whether using the expanded-denominator version of the formula [which he again annoyingly attributes to you and Craig as if you are the only ones who write it this way, and he’s not talking about the inclusion of the background here either] could ameliorate this problem with estimating P(E):

“This is just a further manipulation. The bottom of this equation is still just P(E), we’ve just come up with a different way to calculate it, one involving more terms. We’d be justified in doing so, only if these terms were obviously easier to calculate, or could be calculated with significantly lower error than P(E).”

[He doesn’t seem to think that P(E|~H) is often easier to calculate than P(E), nor does he notice the advantage of using a variable, P(E|~H), that is independent of the other 2, P(H) and P(E|H), unlike P(E), which isn’t. Perhaps if he looked at your examples or even the standard example of medical testing he’d see why the expanded form often works better.]

Here MalcolmS has already effectively rebutted this point. Ian just fails to grasp the way BT has to be employed in the humanities, and what it takes to translate how people in the humanities already reason into terms definable within BT. The utility of knowing the whole structure of Bayes’ Theorem is so we can understand the logical structure of our own thought–and thus make better arguments, and better identify the flaws in others.

3. His next criticism is a bit bizarre, as he complains about having to use estimates: “If these terms are estimates, then we’re just using more estimates that we haven’t justified. We’re still having to calculate P(E|H), and now P(E|~H) too. I cannot conceive of a way to do this that isn’t just unredeemable guesswork. And it is telling nobody I’ve seen advocate Bayes’s Theorem in history has actually worked through such a process with anything but estimates.”

[Of course you use estimates – even in the sciences one does. Unless you’re doing a problem with dice or cards, the numbers one plugs in are always estimates. And you present several examples where you attempt to justify your estimates. It would help if he actually addressed one to show why it was nothing but “unredeemable [sic] guesswork.”]

He sums up in a similar vein: “So ultimately we end up with this situation. Bayes’s Theorem is used in these kind of historical debates to feed in random guesses and pretend the output is meaningful.”

Here MalcolmS has already effectively rebutted this point, too. Ian seems to be conflating “not knowing x precisely” with “not knowing x at all.” I explicitly address this fallacy several times in the book (early in chapter six, and in my discussion of arguing a fortiori: pp. 85ff.). In short, historians don’t need the kind of precision Ian seems to want. In fact, as I explain in the book, that they can’t get it even if they wanted it is precisely what demarcates history from science (pp. 45-49).

4. As a teaser for what he intends to write about in a later post, he states why he thinks a fortiori reasoning doesn’t work:

“But, you might say, in Carrier’s book he pretty much admits that numerical values are unreliable, and suggests that we can make broad estimates, erring on the side of caution and do what he calls an a fortiori argument – if a result comes from putting in unrealistically conservative estimates, then that result can only get stronger if we make the estimates more accurate. This isn’t true, unfortunately, but for that, we’ll have to delve into the way these formulas impact errors in the estimates. We can calculate the accuracy of the output, given the accuracy of each input, and it isn’t very helpful for a fortiori reasoning.”

[His characterization of your a fortiori argument – “if a result comes from putting in unrealistically conservative estimates, then that result can only get stronger if we make the estimates more accurate” – is easily demonstrably true: P(H|E) is monotonically increasing in P(H) and P(E|H) and decreasing in P(E|~H), so it follows immediately that if one takes a maximum estimate for P(H) and P(E|H) and a minimum one for P(E|~H) then the estimate for P(H|E) using Bayes’s theorem is a maximum, and similarly one derives a minimum for P(H|E). Furthermore, tightening the possible ranges for these variables yields a tighter range for P(H|E), so the a fortiori argument is in fact valid.

I think what he is intending to say is that while one may in principle get a possible range for P(H|E) from possible ranges for the other probabilities, in practice, the range turns out to be too wide to be useful. Whether that turns out to be the case will be seen when you actually try to apply it.]

His final comment supports my reading of him: “It doesn’t take much uncertainty on the input before you loose any plausibility for your output.”

[If this were true it would true for all uses of BT, so how does he account for its use in science? It’s not like those estimates for false positives in DNA testing (in his example) lack errors.]

Here MalcolmS has already effectively rebutted this point as well. Like he says, we’ll have to see what Ian comes up with. But I suspect he is ignoring everything I explained in the book about the fact that historians have to live with some measure of ambiguity and uncertainty, far beyond what scientists deal with. That’s what makes history different from science (again: pp. 45-49; and again, Axioms 3 and 5, per above). Ian seems either to want history to be physics, or to think I want history to be physics. The one is foolish; the other betrays a failure to really read my book (e.g. pp. 60-67).

5. He hints at another problem here but says he’ll explain some other time:

“[I]n subjective historical work, sets that seem not to overlap can be imagined to overlap in some situations. This is another problem for historical use of probability theory, but to do it justice we’ll need to talk about philosophical vagueness and how we deal with that in mathematics.”

As MalcolmS says, Ian’s criticism on this point is too vague to even know what he means. At present no response is required.

6. There are 4 footnotes to the post, only the last of which could be taken as a criticism of using BT with uncertainty (specifically the form of BT with the denominator expanded). Here it is in full:

“You’ll notice, however, that P(E|H)P(H) is on both the top and the bottom of the fraction now. So it may seem that we’re using the same estimate twice, cutting down the number of things to find. This is only partially helpful, though. If I write a follow up post on errors and accuracy, I’ll show that errors on top and bottom pull in different directions, and so while you have fewer numbers to estimate, any errors in those estimates are compounded.”

[Since P(E|H)P(H) appears in both the numerator and denominator, an increase in it would increase both the numerator and denominator, so the effects from each would offset, not compound! But P(H) appears in the second term in the denominator, so it is not quite that simple. Changes in P(E|H) would partially cancel in the numerator and denominator, making P(H|E) less sensitive to changes in it, but for P(H) depending on the ratio of P(E|H) and P(E|~H), the effect of changes in the prior on the top and bottom of the fraction can indeed compound, but not for the reason he stated.]

That is his whole criticism. Basically he doubts that BT can be practically used in BT because he feels the inputs are too subjective and have uncertainties that are too wide. We’ll have to wait for your next book to see if you can pull it off.

The quoted argument doesn’t seem mathematically informed, unless he means errors in P(E|H) and P(E|~H) can pull in different directions; otherwise, P(H) and P(~H) always sum to 1, so they actually consist of a single estimate, not two, so they can’t pull against each other. If you estimate P(H) against yourself, you have also estimated P(~H) against yourself, by definition. And if you do the same with both P(E|H) and P(E|~H), they can’t ever pull in opposite directions, either. The compounded error between them will then only make your argument more a fortiori. So there isn’t any discernible criticism here that I can make out. (You can see how I already address this whole issue on pp. 85-93.)

Overall Conclusion: So far I am not impressed. It doesn’t look like Ian’s taking any argument in the book seriously. His critiques are almost like some Cato Institute effort at refuting a policy that they don’t really have any valid argument against but they have to refute it anyway so they come up with whatever trivia and handwaving they can think of. The fact that almost all of Ian’s critique is already rebutted in the book itself (often directly and explicitly) only strengthens the analogy.

46 Comments

Austerity on October 9, 2012 at 4:07 pm

I have one narrow criticism that, after reading the above, might be related to your discussion of point 3 in Ian’s first post. It might best be summarized as advice to prefer clarity when it comes to stating hypotheses.

I thought about posting this a couple of times but decided both times that maybe the distinction I am recommending is so subtle as to be no distinction at all. That perhaps it was just me being momentarily dense. It was something I had to stop and think about for a minute though, to make sure you were saying something sensible.

I have the Kindle edition and the Kindle tells me it is on page 133. The discussion regards Mark’s narrative following the Psalms.

You write, “In Bayesian terms, the probability of all these coincidences with the Psalms is much lower on the hypothesis that they really happened than on the hypothesis that Mark is creating a narrative out of the Psalms.”

So the evidence, e, is “coincidences with the Psalms are in Mark,” and the hypotheses are h1=”they really happened,” and h2=”Mark invented them by following the Psalms.” Shortly after you say that P(e|INVENTED) >>> P(e|HAPPENED).

This gave me pause. My thinking went in a few steps.

1) With the wording of the paragraph, P(e|HAPPENED) should represent how likely our evidence is given the hypothesis that the events really happened.

It wasn’t clear to me why it would be significantly lower than the INVENTED case. Especially if our background includes the fact that a Jesus cult formed. That seems like exactly the kind of thing someone would want to include. “You’re telling me the way he died followed the Psalms? I’m definitely including that!”

Now, HAPPENED might have a low prior but you are clearly talking about the consequent here, so I don’t think this is what you could really mean.

2) So I thought maybe you just meant that some-passion-or-other HAPPENED. Then P(e|HAPPENED) would be lower because there are a lot of ways that some-passion-or-other could happen and one that follows the Psalms is only one among many.

The problem here is that then HAPPENED isn’t mutually exclusive with INVENTED. Some-passion-or-other could happen and still have Mark invent his particular narrative. (i.e. P(H) + P(~H) won’t add up to 1)

3) Finally I decided you must mean the first hypothesis is h1=”Mark accurately recorded real events.” (I told you it was subtle!) This has the advantage that, by being non-specific about what exactly he is recording (other than it being true), it implicitly pulls in all the other ways some-passion-or-other could happen, while still being exclusive with INVENTED (or, technically, with the hypothesis that at least some of Mark was invented).

You are probably thinking something like, “So Mark accurately recorded real events? In other words , ‘they really happened’!” But, I think this distinction makes it more clear what is really going on, which might be important since your audience is people without any Bayes background. I think maybe being colloquial, in this case, gets in the way of making the use of BT clear.

Or maybe not. I know it is pretty subtle but, like I said, it gave me a bit of pause, and at first I thought you might have made a blunder. I was thinking there was a later instance of this kind of potentially confusing language, but I couldn’t locate it. It isn’t easy to thumb through a book on a Kindle.
Reply
- Richard Carrier on October 10, 2012 at 10:27 am
  
  You’re right, my wording is confusing there. This is an example of where my writing could be improved (as I mentioned in the article above).
  
  If we assume h is literally that the coincidences all happened as described, then P(e|h) = 1 and (as you note) the problem is moved to the prior (where the prior probability of such coincidences is then low; the math then works out the same–but the model should still then be described differently).
  
  So I should have explained “happened” means simply that Jesus was crucified etc. (and not the presumption of any specifics; which throughout the book I explain is the better way to form a hypothesis: otherwise, as in the scenario above, we are gerrymandering–see “gerrymandering” in the index, but esp. the basic explanation of the problem of trying to move improbabilities from the consequent to the prior on pp. 80-81).
  
  Then P(e|h) = the probability that “Jesus was crucified etc.” would generate any such set of coincidences, which is low; whereas P(e|~h) = the probability that “the author is inventing some set of coincidences” would generate some comparable set of coincidences, which is high. Here, in parallel, we don’t have ~h assert the exact coincidences, only the general hypothesis that the author is inventing coincidences (regardless of what they turn out to be; hence, pp. 77-79, with “coefficient of contingency” in the index).
  
  You note the problem of excluded middle: that “Jesus was crucified etc.” is compatible with “the author is inventing some set of coincidences” and therefore there is a third hypothesis that has to be considered. This would be P(e|hspecial): that “Jesus was crucified etc.” and that Mark’s account of it is completely (or mostly) fabricated. But P(e|h) = the probability that “Jesus was crucified etc.” would generate any such set of coincidences, which probability is the same whether Mark is fabricating his account or not (although by definition it is here assumed not); and P(e|~h) = the probability that “the author is inventing some set of coincidences” would generate some comparable set of coincidences, which probability is the same whether Jesus was crucified or not (and here, no assumption is stated regarding that). So there is no middle being excluded; the model simply isn’t even testing whether Jesus was crucified, but whether Jesus being crucified would cause e.
  
  This means P(e|hspecial) is already implicitly included in P(e|~h). In other words, ~h only asserts that Mark fabricated, not that Jesus wasn’t crucified; so h can then be restated as “Mark didn’t fabricate” rather than “Jesus was crucified etc.” (it’s just that “Mark didn’t fabricate” entails that “Jesus was crucified etc.” must have caused e). And that is essentially the solution you worked out. You have provided a guideline for a much better rewrite of that page.
  
  (And the cool thing is how well you thought like a Bayesian to work out the correct model! That’s exactly the sort of thing I want historians to know how to do.)
  
  BTW, one could still unpack that third hypothesis if they wanted to and have a three-hypothesis test. P(e|hspecial) would differ from P(e|~h) only by the degree to which it would be improbable for Jesus to actually have been crucified etc. and for Mark to fabricate an account of it anyway, which might not be very much different (i.e. on minimal historicity, whereby we assume most information was lost, thereby explaining the widespread contemporary silence about it even among Christians, it would not be very unlikely that Mark would fabricate a crucifixion account even for a real crucifixion; whereas theories that Mark would have good information, which would make the probability of his “fabricating anyway” lower, would then struggle to explain why we had to wait for Mark to write about it, which silence would in turn be less probable, and so on–but this then gets into more complex arrays of hypotheses to test, and over-complicates the problem for the purposes of the point being made on p. 133).
  Reply
youngalexander on October 9, 2012 at 4:41 pm

MalcolmS: “3. His next criticism is a bit bizarre, as he complains about having to use estimates …”

Indeed.

“Of course you use estimates – even in the sciences one does. Unless you’re doing a problem with dice or cards, the numbers one plugs in are always estimates.”

Of course they are. An observation of some property of a physical phenomena results in an estimate of its value within certain limits. It would be meaningless otherwise. Such errors are simply an integral part of the measurement procedure. When the data is subsequently processed the errors are as well.

As you reply, Ian appears to be complaining of the size of the errors in history, but that is irrelevant. He gives the impression of one who is recoiling from the ‘shock’ of the new, ie. the application of BT to history.
Reply
- Richard Carrier on October 10, 2012 at 10:42 am
  
  That or he is recoiling from the shock of realizing that historical knowledge is a lot less certain than scientific knowledge. Most of us already got that memo a long time ago. But perhaps it’s startling to scientists?
  
  I know the general public has a hard time with this–the levels of uncertainty that actually exist for historical claims (often even what we consider well nigh certain claims) would disturb people if the risk of being wrong were greater. For example, if we said the odds were 1 in 100 that Jesus didn’t exist, we’d confidently say it was certain that Jesus existed. But if we said the odds were 1 in 100 that your car will explode the next time you sit in it, you would not confidently say it was certain your car was safe. To the contrary, you sure as hell would never get into it again. Translating one’s willingness to grant 1 in 100 as a real risk from the one case to the other is intuitively difficult for people. And notably, a 1 in 100 chance of being wrong would be rejected by every scientific peer review process there is.
  
  Thus, historians, and people generally, have become complacent with a double standard, assuming their probabilities equate to certainties, when in fact they don’t really, not by scientific standards, nor by any standard that relates to a real risk. It only gets worse when we are looking at claims in history that have, say, a 1 in 5 chance of being false. Which would not be unusual. But that is still different from a 4 in 5 chance of being false. Thus, even highly uncertain outcomes in historical analysis make a difference to what we assert–a concept that might be hard for scientists to grasp, who are accustomed to simply throwing out all such uncertain results–being, as they are, unpublishable. But “unpublishable in a science journal” does not equate to “communicating no knowledge about the world at all.”
  
  You just have to be more comfortable with ambiguity and uncertainty than that. Obviously, not when the risk of being wrong is high. But that isn’t commonly the case in history. Although it is sometimes the case. So one has to be careful before using a historical claim on which to base a philosophy or social policy, for example. And that is why scientists have such high standards of documentation, since science depends on historical claims, and thus it must ensure those claims maintain extremely low probabilities of being false…the fact that people didn’t have that sensibility in antiquity is precisely what puts us in realms of uncertainty about such things as whether Jesus existed or how Christianity began, something you’d think God would anticipate and fix right from the get-go; just one more evidence Christianity had no God behind it.
  Reply
alexanderjohannesen on October 9, 2012 at 4:49 pm

That settles it; I’ll buy your book right now. Ian’s criticism is damn outright puzzling, but I think he actually explained his error the best;

“It doesn’t read as a mathematical treatment of the subject”

Um, yes? It isn’t, nor intended to be, nor advertised as such. Someone is, indeed, being pedantic, nitpicking at the straw outline of a straw man.
Reply
IrrcoIan on October 10, 2012 at 5:31 am

There’s a style of rebuttal by exhaustive quoting. Which is fair enough, but degenerates into picking fault with the character of what is said. So although there are lots of things in this response that in my opinion are wrong, misunderstood or unfair (discussing through an intermediary is probably not useful), I want to focus on the book.

It might be worth me summing up my conclusions on the book.

1. I agree with a lot of your conclusions. Particularly with regard to criteria. I probably liked the book partly because the you chose was on things were you confirmed many of my biases.

2. I think probabilistic reasoning under a Bayesian interpretation can help build stronger intuitions about history, and are valuable skills to have for anyone in the arts. I’m not convinced it is the most important thing a historian should learn, but if learned, and learned rigorously, can be nothing but helpful. Another way to state this: Malcolm on my blog and you here (and in the book) make the point that, even with the loosest assumptions and the worst approximations, we’re no worse off with probability theory than making the same assumptions and approximations narratively. Which is what most historians do anyway. I agree, totally, but that doesn’t mean it is anything more than trivially more accurate. Can you quantify the increase in accuracy? I don’t think so, not without quantifiable test cases.

3. Pedagogy: the book seemed to be aimed at helping folks interested in this area understand the applicability of Bayes’s Theorem to the field. It didn’t read as a book that assumed that knowledge and suggested an application, but read as an introduction to the topic. So I think it is a valid and non-pedantic criticism to say that it’s lack of any explanation of what probability is, what conditional probability is, what Bayes’s theorem is doing and why, and how it is derived, is important. The fact that such foundational content is relegated to footnote references to other books made me doubt your sincerity in actually wanting to help your readers get to grips with the topic.

4. Technical accuracy: It is understandable if you’re approaching the math as an amateur, but you give the impression you simply don’t understand certain things you discuss. Issues like Frequentist/Bayesian issues: where the controversy isn’t even discussed in recognizable terms, let alone your solution being valid. It isn’t that I’m claiming your solution is not novel, but that you haven’t even stated the problem properly. Similarly with Bayes under sets of evidence: the problem of independence of multiple pieces of evidence. Statistical sensitivity to reference classes. Changes of reference class when updating for new evidence. These are tricky, tricky issues, that when you do this for real, you have to spend time on. Ignoring them has sent innocent people to jail.

5. The upshot: Doing this non-rigorously (you make the point that you don’t want to use this scientifically, or mathematically) gives the massive danger of choosing inputs to the process (probability estimates, error ranges, reference classes, choice of evidential features, phrasing of the evidence classes, phrasing of the hypothesis class — all of which are under your control) which give the output you want. There are simply so many people who’ve used Bayes’s Theorem as a black box to give them back their pre-conceptions. This happens *a lot* in science, where statistical analysis demonstrate the researchers biases, which then disappear when repeated by others without the same bias. I agree with you that being clear about what those inputs are is important, but the choosing of the actual numbers is perhaps the most obvious and least important. From what I’ve seen over years of this, you are falling into the trap of thinking you’ve done more than restate your conclusions.

So tl:dr – I liked the book; but I got the sense it was more polemic and less pedagogic than you wanted it to appear; you displayed a tendency to use terminology sloppily, over-extend your knowledge (clearly you have a good grasp of some elements of the math), and make generalizations without enumerating the conditions for them to hold; and the book did very little to show that you’d properly wrestled with the errors inherent in your selection biases and numeric estimates, and therefore that you conclusions were anything more than what you put in.

There is a big literature dealing with some of these issues, and while I would never expect you to have mastered it (nor have I, or perhaps anyone), it is important to know where the main problems are.

Unfortunately, the general style of internet arguments degenerates rapidly into people insisting others prove their misunderstandings wrong. Which by definition is hard. So I’ve struggled to address, for example, Malcolm, on my blog, without saying “go away and do some of this for real problems where you can quantitatively and objectively check your answers, and then you’ll see how tricky it is.”
Reply
- Richard Carrier on October 10, 2012 at 11:43 am
  
  Thank you for these clarifications. They didn’t come through in your blog, so it’s helpful to have them.
  
  Malcolm on my blog and you here (and in the book) make the point that, even with the loosest assumptions and the worst approximations, we’re no worse off with probability theory than making the same assumptions and approximations narratively. Which is what most historians do anyway. I agree, totally, but that doesn’t mean it is anything more than trivially more accurate. Can you quantify the increase in accuracy? I don’t think so, not without quantifiable test cases.
  
  I think you are confusing here two different kinds of improvement: improvement in probability estimates for correctly reasoned arguments; and improvement from incorrectly reasoned arguments to correctly reasoned arguments.
  
  I don’t claim much in the way of the former (that can perhaps arise from the process of using BT as a tool for better mediating disagreement, as I explain, for example, in chapter 6, pp. 208-14, and chapter 3, pp. 88-93; and also by an understanding of BT teaching historians the importance of getting more data, as has happened in archaeology; and in the simple sense of forcing historians to actually confront what their probability estimates are, and what that means as far as odds of being wrong and the attending humility that often will entail), but I am much more concerned with the latter (hence re-read pp. 92-93 in particular).
  
  Historians are notorious for making illogical arguments and thinking they’ve made a good case (not only is this massively documented in Fischer’s Historians’ Fallacies but it’s what I essentially document throughout chapter 5), or rejecting logical arguments by resorting to fallacious dismissals of them (rejecting sound arguments from silence, or not knowing when an argument from silence is sound, or how to test the soundness of an argument from silence, is a very common example, hence understanding BT can help historians do all of those things, and thus get out of the rut of relying on their uninformed and often illogical gut intuition: hence pp. 117-19).
  
  This is where BT can cause tremendous improvement in how historians reason, argue, and debate. Fischer notes in his book that someone needs to discover the logic of history, and that (as of his writing) no one had done that yet. Proving History does that. Alhough I wasn’t the first to think of it, as essentially the same case is made in Aviezer Tucker’s Our Knowledge of the Past: A Philosophy of Historiography, only he doesn’t convert it into practical advice or explain much of how historians can use BT or even write for historians, he just shows that historical reasoning is already necessarily Bayesian (sadly, I was unaware of his book until after mine came out, so I didn’t get it included in the endnotes, where I’d surely want it; it probably wouldn’t have influenced PH, though, as everything useful in it I had already thought of independently, which IMO makes him Wallace to my Darwin…or vice versa).
  
  So I think it is a valid and non-pedantic criticism to say that it’s lack of any explanation of what probability is, what conditional probability is, what Bayes’s theorem is doing and why, and how it is derived, is important. The fact that such foundational content is relegated to footnote references to other books made me doubt your sincerity in actually wanting to help your readers get to grips with the topic.
  
  Except I do explain what conditional probability is (pp. 79-81), and I explain as much as a historian needs to know about what BT is doing (chapter 3). The practical fact is that historians don’t care about the history of BT or its logical foundations or indeed almost any of what you wrote about on your blog. You and I find that interesting. But they don’t, and quickly get bored with it. I know this from experience. And the fact is, they really don’t need to know it. All they need to know is what I provide in chapters three and four. Anyone who wants to know more, can follow the references provided.
  
  I would love to find a really good textbook on probability theory that isn’t unintelligible or almost entirely useless or irrelevant to humanities majors, so I can recommend it as a twin to mine. So far the best I have is McKellar’s Math Doesn’t Suck, which is perfect except that it only treats probability very briefly and rudimentarily. If she ever produces a book just on statistics (which will certainly cover all the basics of probability theory), that would be the dream. Or if anyone else does that. If you know of any already, do let me know. But it really has to be something comparable (as in, aimed at teaching nonmathematicians; a blizzard of differential equations doesn’t do that).
  
  It is understandable if you’re approaching the math as an amateur, but you give the impression you simply don’t understand certain things you discuss.
  
  If there are better ways to argue a point in my book, please improve on my work by producing it. I am actively looking for blogs and blog articles that do that, and if they are good enough and accomplish the task, I will definitely blog about them (if I know of them; so usually, someone has to tell me about them).
  
  Otherwise, all I need to know is what I try to convey. Errors I need to correct; but someone voicing impressions that don’t relate to any actual errors to correct, is not useful.
  
  Issues like Frequentist/Bayesian issues: where the controversy isn’t even discussed in recognizable terms, let alone your solution being valid. It isn’t that I’m claiming your solution is not novel, but that you haven’t even stated the problem properly. Similarly with Bayes under sets of evidence: the problem of independence of multiple pieces of evidence. Statistical sensitivity to reference classes. Changes of reference class when updating for new evidence. These are tricky, tricky issues, that when you do this for real, you have to spend time on. Ignoring them has sent innocent people to jail.
  
  I don’t think historians are worried about going to jail over any of this, so the hyperbole is unwarranted. I didn’t write a book for lawyers or risk managers.
  
  As to the more specific points, I would love–really love–good blog posts covering every one of those issues you list, in terms a nonmathematician can understand. Produce them, and I’ll blog them. Produce enough of them, and you’ll have a book you can market to historians who want to go beyond what I cover or to understand it better.
  
  That’s how a field makes progress. And having scientists working to help historians is precisely the kind of interdisciplinary work I want to see more of. You just have to understand that they speak a different language and lack almost all the assumptions and mathematical background you take for granted, and have different needs and interests than you (as for example our different interest in how BT was proved and what it is doing mathematically, vs. that being of no use or interest to historians who just want to know how to use it and benefit from it; although a good blog post/book chapter on that subject would still be great, in any work that aims to expand on mine and help historians dig deeper into the whole matter of Bayesian reasoning and the pitfalls and wonders of probability theory).
  
  Doing this non-rigorously (you make the point that you don’t want to use this scientifically, or mathematically) gives the massive danger of choosing inputs to the process (probability estimates, error ranges, reference classes, choice of evidential features, phrasing of the evidence classes, phrasing of the hypothesis class — all of which are under your control) which give the output you want.
  
  That’s true of all methods whatever. Historians are already doing this, all the time. It’s precisely because they don’t know how to logically vet or model what they are doing that they (a) don’t know they are doing it or (b) don’t know how to detect or prove when it’s being done. Knowing BT arms you against exactly those problems, arming you against misuses of BT. If you understand the mechanics of BT, you can then spot and therefore defend yourself against exactly the dangers you refer to. And if you still miss a problem or mistake, for its complexity perhaps, someone else can come along and catch it and explain the problem. Then you redo. And progress has occurred. I fully expect that to happen to me. It’s precisely what I want.
  
  But if we don’t even get started on this process, we will experience no progress whatever. Thus, we have to start somewhere. The potential for errors is irrelevant, precisely because that potential exists now (no matter what method you use; again, as Fischer and I demonstrate), yet we have no tool to get beyond it. But now we do. That’s why we need to get started on learning and applying that tool.
  
  Indeed, I expect there are many ways mathematicians could help historians do this even better, by covering the logic of everything you mention here, in terms understandable to and usable by historians (rather than in esoteric terms only intelligible or useful to scientists or mathematicians; for example, you have to cast aside science-level quality standards and ask instead how we deal with highly uncertain information and small datasets, since the latter is what historians face–in fact, the latter is what most people face, most of the time, so really scientists and mathematicians are leaving neglected a vast realm in great need of improved reasoning, all because of a misapplied standard that disregards everything that wouldn’t pass scientific peer review as being moot, when in fact it’s not moot, it’s just not that certain, and yet nevertheless is far from being moot, but is often essential to daily life and a great many professions).
  
  There are simply so many people who’ve used Bayes’s Theorem as a black box to give them back their pre-conceptions.
  
  And when you understand BT, you can call them out on this when it happens. It’s when you don’t understand BT that you are in danger of being misled or bamboozled by this.
  
  And again, this trick is used even without BT, thus avoiding BT won’t avoid the problem anyway (see, again, note 33, p. 305, and my remarks on pp. 90-92.). Whereas since any argument can be modeled with BT, you can use BT to vet even other methods to see if the same tricks are being pulled there as well.
  
  And again, even my own errors can be more easily detected (and thus, once pointed out, more easily corrected) the more people who understand the mechanics of BT. I consider that a tremendously useful feature. Because I actually like catching and correcting my own errors.
  
  But that actually has to be done. That is, you can’t just vaguely handwave about there being errors, and then never point out where these errors are or why they are errors (or how to fix them, which would be useful, too).
  Reply
IrrcoIan on October 10, 2012 at 1:32 pm

Thanks Richard.

I wonder where the most constructive area to focus is. Some of your response seems fair enough, other bits I still just don’t buy. But no point just restating my objections.

not “improvement in probability estimates for correctly reasoned arguments;” but “improvement from incorrectly reasoned arguments to correctly reasoned arguments.”

So, am I understanding you right, you don’t see using Bayes’s Theorem is a way of arriving at estimates of confidence in conclusions starting from estimates of its inputs. But rather you see it as a tool to look at both sides of the equation at the same time to figure out if they are consistent?

A fan of yours who interacted on my blog said this today:

“I think I understand the debate well enough to say that Carrier states that he doesn’t view or explain BT as “a strict input-output process” and I agree with him.”

I replied that, if that is what you’re saying, I agree too.

I got the sense from the book you thought that we could estimate the inputs independent of estimates of the conclusion and derive important insights on the conclusion. If you’re not saying that, then a chunk of my objections are rendered moot.

—

“That is, you can’t just vaguely handwave about there being errors, and then never point out where these errors”

Yeah, it is tough to write in detail in a review. So I picked issues that were maybe more pedantic, but I wanted to communicate my feeling that you were using the math polemically rather than mathematically.

In the comments to the two posts there are discussions about sensitivity to reference classes, the way BT is ill-conditioned for certain probability bounds, the problem of adding evidence to small probabilities. I’d also say my issue with your characterization of Bayesian/Frequentist postilions is quite specific: you do not describe them in terms that describes why they are not trivially unified (if the problem were as you describe it, you wouldn’t be the first to describe your solution to it!).

If you want to unpack any of these, can I suggest we pick one and discuss, because I find it quite tough to interleave multiple issues.
Reply
- Richard Carrier on October 12, 2012 at 8:49 am
  
  So, am I understanding you right, you don’t see using Bayes’s Theorem is a way of arriving at estimates of confidence in conclusions starting from estimates of its inputs. But rather you see it as a tool to look at both sides of the equation at the same time to figure out if they are consistent?
  
  That’s not quite what I said. But yes, that is also one other useful thing we can do with it (as I explain on p. 214).
  
  But more exactly, I said two things: (1) that in some cases, BT can be used to improve our estimates (the kinds of cases I spelled out in my reply, for example) and (2) that the main thing understanding BT does is help us argue correctly instead of incorrectly. In a very abstract sense you can say that any fallacious argument (even by someone who knows nothing about BT) can be modeled with BT and that will show that there is an inconsistency as you suggest, and then understanding BT can help a historian fix or avoid that inconsistency. But doing either requires understanding BT, and doing both is badly needed in the field of history.
  
  And of course once historians can do (2), they can do (1), through debate. That’s the other benefit: once a historian sees the Bayesian model for what an opponent is arguing, he can start challenging the premises even of a consistent argument, and a debate can ensue as to how credible those estimates are and whether they should be changed; and once changed, what effect that has on the conclusion. This is something historians, right now, cannot coherently do at all. They do it in some sort of vague, hard-to-pin-down, get-nowhere battle of intuitions. BT will allow them to actually see how to proceed and make arguments in this process that can actually derive from observations and not just intuitions.
  
  This is what I explain in the sections on mediating disagreement (in chapters 3 and 6).
  
  I got the sense from the book you thought that we could estimate the inputs independent of estimates of the conclusion and derive important insights on the conclusion. If you’re not saying that, then a chunk of my objections are rendered moot.
  
  If you mean, you thought historians approach hypotheses with no idea whether they may or may not be true until they run them through BT, then that doesn’t even correctly describe how scientists behave. Even scientists often have a good idea of how the results of a study or equation will come out, but do it anyway just to be sure (because sometimes they find they are wrong; yet usually, depending on their experience, they were ballpark right, certainly with the same accuracy as historians, who almost never get results more accurate than ballpark anyway).
  
  Yes, historians, like scientists, sometimes don’t have any idea what a result will be until they run the numbers. But usually, once they see the data, they have an idea. (That’s why any history has ever been done successfully until now: obviously historians have been using some arguments that BT would verify are correct.) It’s then just a question of making sure they’re right. And BT can do that; although it can also give you surprises (since people are often bad at estimating the effect on a larger conclusion of a firm probability belief they have).
  
  Above all, even if all you do is use BT to get the result you want, by using it you’ve let the cat out of the bag. Now others can come in and point out that your inputs are implausible. Which will result in a debate over how we can know that (or not know it’s not that, as the case may be). Which will result in progress toward a consensus that will be less dependent on what a proponent wants to be the case and more dependent on what the data can actually support. Again, I cover this in the sections on mediating disagreement in chapters 3 and 6.
  
  But this is all within the paradigm of historians who use BT. I’m trying to get historians into that paradigm. Right now, they are using not BT but a slew of, let’s say, “wonky theorems,” often or wholly fallacious formulas of getting from premises to conclusions (I give examples all through chapter 5), so that even sound inputs aren’t getting them sound outputs; and because their reasoning’s logic is not made explicit with any model, critics have no clear way to identify what they are doing wrong, or even to detect that something wrong was done. Thus, “mediating disagreement” has no objective procedure, and progress is near impossible (beyond random drift).
  
  By contrast, BT reduces all arguments to a debate over just three numbers. That’s a tremendous advance.
  
  Indeed, usually historians do not disagree egregiously over one or two of those numbers, so often BT can reduce an argument to a debate over just one number, or two. It’s easy to see how progress can then ensue–or even at worst, how historians can settle on where their disagreement actually is (e.g. faith-based historians will over-estimate the prior probability of miracles, secular historians will not; but even there, the only way faith-based historians can get the results they want is by so hugely over-estimating the prior probability of miracles that their position can easily be exposed as ridiculous…by anyone who understands BT: see my deployment of this point in my use of BT to refute the McGrews’ use of BT to defend the resurrection of Jesus in The Christian Delusion, although I never name the McGrews’, I address their every argument, as well as the standard args. of Craig and Habermas and Licona, and put the effect on the Bayesian model in my endnotes; I expand that argument to even more devastating effect in my Bayesian analysis of the origins of Christianity in The End of Christianity, where I don’t even need probability estimates, just what could be reduced to completely nonnumerical statements of relative probability [like C > D and C = D], another example of how the logic of BT can be used to good effect without even using numbers).
  
  In the comments to the two posts there are discussions about sensitivity to reference classes, the way BT is ill-conditioned for certain probability bounds, the problem of adding evidence to small probabilities. I’d also say my issue with your characterization of Bayesian/Frequentist postilions is quite specific: you do not describe them in terms that describes why they are not trivially unified (if the problem were as you describe it, you wouldn’t be the first to describe your solution to it!). If you want to unpack any of these, can I suggest we pick one and discuss, because I find it quite tough to interleave multiple issues.
  
  I like them all (as well as the others you listed before). You should pick the one you are most enthusiastic or comfortable blogging on, and do a blog post on it, and let me know the URL. Since you are certainly more expert in these details, and there could be great benefit in this. I’m most interested in how to improve on PH. So a blog that shows how to frame a problem correctly or what the pitfalls are and how to avoid them (even if you aren’t sure how a historian would do that, just outline what in general would need to be done, maybe give an example of how it’s done in a scientific case–I might then be able to give examples from the historical field that are analogous).
  Reply
Richard Martin on October 10, 2012 at 5:02 pm

Hi Richard,

In response to your point about the lack of good introductory texts on probability and statistics for the humanities, I would like to propose the following as providing considerable insight into concepts of everyday probability and risk, both in business and life in general.

Phil Rosenzweig, The Halo Effect
Douglas W. Hubbard, How to Measure Anything
Leonard Mlodinow, The Drunkard’s Walk – How Randomness Rules Our Lives
Dan Gardner, Risk
Dan Gardner, Future Babble: Why Expert Predictions Fail – and Why We Believe Them Anyway
Ian Hacking, Probability and Inductive Logic
Morris Kline, Mathematics for the Non-mathematician (there are chapters on probability and statistics).

A book that just came out is by Nate Silver, The Signal and the Noise: Why So Many Predictions Fail – But Some Don’t

Dan Ariely’s book Predictably Irrational provides a look at why people have such a hard time with probability and rational thinking, as does Daniel Kahneman’s Thinking Fast and Slow.

More advanced but more complete is Jonathan Baron’s Thinking and Deciding, 4th edition.

I hope that helps,

Richard Martin
Reply
- Richard Carrier on October 12, 2012 at 9:01 am
  
  I’ll look into those. Thanks.
  Reply
jt512 on October 10, 2012 at 9:06 pm

I’m not sure what either Ian or MalcolmS is getting at with P(E|H)P(H) being in both the top and the bottom of the fraction in Bayes’ Theorem. Errors in P(E|H)P(H) neither “pull in opposite directions” nor “offset.”

The effect of each term in Bayes’ Theorem on the posterior probability of H becomes clear by examining the odds form of Bayes’ Theorem, because each term only appears once in the equation:

odds(H|E) = P(E|H)/P(E|~H) × odds(H)

Jay
Reply
- Richard Carrier on October 12, 2012 at 9:11 am
  
  Well, no, the same terms are all there (the numerator and denominator of a fraction can pull in opposite directions). But that’s the same thing I mentioned: odds(H), which is P(H)/P(~H), can look like it has two terms that can be independently in error, but as we know, there is really only one term here, since P(H) and P(~H) must sum to one, so there is only one error that can happen here, and if it’s against you, you are arguing a fortiori, and all is well. Then there is P(E|H)/P(E|~H), and those can vary independently of each other, and thus could, in principle, create a compound error (if each errs in your favor, you have a double error in your favor), but if you are arguing a fortiori, you are going to make sure they both err in the other direction, and thus the compound error only makes your argument even more a fortiori, which is what you want.
  
  When the errors are as much against you as you can reasonably believe them to be, then the conclusion is as much against you as you can reasonably believe it to be (by commutative logic). Thus, you simply cannot reasonably believe the result is any more against you than that. Now, yes, there are lots of ways you can be so badly misinformed or ignorant that even a reasonable belief is false, but that’s true for all beliefs whatever, whether derived by BT or not. We can only do the best we can do, whatever our methods. And BT reminds us that there is always some probability of being wrong–since the conclusion of BT is not “the probability of h is x” but “the probability of h is x given b and e,” where b plus e represents the sum of all you know at that point in time. The resulting probability has a converse that is the probability that you are still wrong (and that will be because of some fact you did not know about and thus did not include in b or e…but that’s the whole point of empirical methods being tentative and revisable, and why science, and thus also history, can advance in knowledge).
  Reply
malcolms on October 10, 2012 at 11:23 pm

I have a couple of quick remarks before I move onto something more substantive in my next comment.

1. I feel it is bad form for you to have responded to Ian’s 2 posts only via the filter of my comments on your post. Originally you had said that you would reply to his criticisms only if he posted a comment on your blog, so I summarized his arguments in the comments for you to respond, and you did so. Nothing wrong with that. But when you decided to devote a blog post to his review, I think it would’ve been better to directly quote from his blog. As it is now, one can’t even tell for sure from your post if you’ve even read his blog. Maybe I’ve mischaracterized or misrepresented him, or at least Ian might feel that way. If the situation were reversed you’d probably want that courtesy as well.

I realize that you said, “I will reproduce in bold the observations of MalcolmS on what Ian argues (which also does a fine job of summarizing Ian’s substantive points),” but I still don’t think this goes far enough.

2. I also feel that we (I’m guilty of it, too) may have been too harsh on Ian in the description of Ian’s criticisms of the way you’ve presented the mathematics as not being “mathematical.” That is, originally we became aware of his posts through some commenters on your blog who felt that Ian had demolished the mathematical content in your book. A read through of the review, though, revealed that this was not the case, as most (if not all) of the criticisms did not involve any actual mathematical errors. However, his post was merely titled a “Mathematical Review” of the book, and even in mathematics, a review of a book (such as you’ll find, e.g., in the “Bulletin of the AMS,” or even Amazon book reviews), usually addresses the author’s choice of content and way of presenting it, not the accuracy of the assertions, so while many of his criticisms may have been merely stylistic, they still would be considered mathematical.
Reply
- Richard Carrier on October 12, 2012 at 9:21 am
  
  1. I hope Ian will correct me if I (or you) mistook any of his arguments. You did such a good job, I could not improve on what you said (beyond what I said in turn, if that even counts as improvement). But yes, we could both be wrong about that. And Ian is welcome to point out any instances of it. I would be more than comfortable with our roles reversed in this instance.
  
  2. I agree. We both approached this from a context Ian might not have intended. Our points remained valid, but mine at least were phrased a little too harshly, for thinking he was accomplishing what naysayers said he was, when in fact he himself didn’t say that’s what he was doing (or at least not in the posts themselves…I didn’t read his comments elsewhere). I have fallen victim to this effect before.
  
  To you an Ian, I apologize for the mistake. I don’t know how to avoid it in future (since once an argument has been framed by a third party, there is no obvious way to guarantee you’ll realize that that has been done and you are still looking at it from that POV), but I’ll try to be more mindful of it.
  Reply
- IrrcoIan on October 12, 2012 at 12:33 pm
  
  “mine at least were phrased a little too harshly”
  
  Definitely don’t worry about tone towards me. As long as we’re all willing to communicate, a seasoning of gentle hostility is fine by me. It only becomes a problem, imho, when used as an excuse to prevent discussion.
  
  I didn’t have a problem with this post, or how it was structured.
  
  I would say that Malcolm was very useful in pointing out sloppy mistakes on my blog, and helping me communicate more clearly. So kudos for that. Due to his critique, some of the things quoted in this post, I’ve now rephrased and improved.
  Reply
malcolms on October 11, 2012 at 5:24 am

A few of us were continuing the conversation over on Ian’s blog after I had posted my comments on yours. A couple of problems with applying BT came up, which I would like to illustrate with some examples you discuss in your book.

First of all, there’s the reference-class problem. You of course discuss this in your book, and Ian mentions it in his comment above, but I think it would help to examine it in a particular case, say, the case of a claim that there was a global 3-hour darkness during the day either in 1983 or 2000 years ago. The hypothesis is that such a darkness actually occurred (i.e., the claim of it having occurred is true), so your prior, employing the method you prefer for solving these problems, would be the a priori probability that such a claim would be true, before any consideration of evidence used to support or refute this claim.

But what other claims would be in this reference class? Only claims of 3-hour worldwide darkness? How many of them have their been? You would of course agree that this class is too narrowly defined. OK, then how broad should we make it? All claims of darkness over a large area for an extended period of time? There are still not that many cases (can you think of any?). Then we would have to expand it to include “comparable” cases, but how do we know that another claim is comparable? Do we mean claims of events that are equally improbable? That would require estimating the probability of the 3-hour worldwide darkness, without conditioning on the claim, with all its concomitant difficulties (recall, you’re trying to avoid this in the first place). Moreover, one would have to estimate the probability of all other claims to determine whether they lie within this class or not, a probably impossible task. Your example of a person who claims to have been struck by lightning 5 times also suffers from this deficiency.

In your worked example you just picked a number out of thin air – 1% – without any real attempt to argument for it, other than to say that it is “small” (apparently because you are implicitly doing some sort of application of BT involving the raw prior probability of such a darkness). The exact value was not of concern to you because you were trying to show that the same prior could give rise to different results depending on the likelihood of the evidence, conditioning on various hypotheses, but you seem to have overlooked that in at least one case the value of P(H|B) is crucial (which will my next point below).

So how would you go about trying to estimate, to the best of your ability, P(H|B) for either of these examples? If you can’t do this, how could you possibly hope to use BT to estimate how likely it is that Jesus existed?

Moreover, the case of the claimed darkness in 1983 illustrates another difficulty in applying this method: A fortori arguments only work if you can bound both the numerator and denominator away from zero. I stated in my earlier comment, which you quoted above, that a fortiori reasoning should always be valid, but tha is only partially true. While it is true that bounds on the 3 inputs to BT will always yield a bound on P(H|E), it is not always true that improving those bounds will improve the final result; in particular, if P(H) or P(E|H) has a lower bound of zero, changing the upper bound of P(E|~H) won’t have any impact on the lower bound for P(H|E) (which will also be 0) and similarly if P(E|~H) has a lower bound of zero, changing the upper bound of P(E|H)or P(H) won’t have any impact on the upper bound for P(H|E), which would be 1. The worst case situation would be where both P(E|~H) and either P(E|H)or P(H) are so small that they can’t be bounded from below. In that case we will know absolutely nothing (0<=P(H|E)<=1) and changing the nonzero bounds won't help.

We can see this in the example from 1983, where both P(H) (the prior probability that the claim of the darkness is true) and P(E|~H) (the probability that all this photographic, etc., evidence existed despite there having been no such darkness) would both be very small. Could you put a lower limit on either and say, "well, the probability has to be larger than X"? If not, then you will know nothing. (Moreover, your estimate of P(E|H)~1 is way too large as well, since the evidence would include your own experience that you never heard of this event before, despite having even been alive at that time.) It was only by plugging in an artificial value of 0.01 for P(H) that you managed to get a value for P(H|E) close to 1. In your book you claimed that you were being conservative, but this is only true for the Gospel example. In the 1983 case a conservative estimate would be a smaller one, not a larger one. How small would P(E|~H) have to be before you would really start to believe in such an event? Everybody would probably have a different answer – I don't even know what mine would be.

I'll mention here one other point with this example (in the Gospel case) that Ian raised concerning the miraculous. In particular, if your hypothesis includes the possibility of a supernatural darkness, which I think you allowed in your book, then when calculating P(E|H) you would have to consider the possibility that a supernatural agent structured the evidence to be the way that it is, with reports only appearing in one source. How would one calculate the probability that God prevented people all over the earth from recording this event, if one is conditioning on God having caused the darkness?
Reply
- Richard Carrier on October 12, 2012 at 10:28 am
  
  But what other claims would be in this reference class? Only claims of 3-hour worldwide darkness? How many of them have their been? You would of course agree that this class is too narrowly defined. OK, then how broad should we make it?
  
  As broadly as you can reasonably believe acceptable. Or even more broadly than that, if the conclusion still comes out a fortiori in your favor (and that is obviously what I did in the book, since the prior I chose was obviously vastly larger than anyone could ever reasonably believe it to be, no matter what reference class they preferred…you call it “artificial,” which is correct, but moot, since it still works, so it doesn’t matter how artificial it is, as long as it is defensible as an a fortiori estimate).
  
  All claims of darkness over a large area for an extended period of time? There are still not that many cases (can you think of any?).
  
  Of course Christian apologists do: hence the claim that the Evangelists just meant a dark cloud front. Certainly, if you start from that assumption, the math comes out differently; it’s just that if you did a Bayesian analysis on what the Evangelists meant by their words, it comes out as almost certainly not this.
  
  Then we would have to expand it to include “comparable” cases, but how do we know that another claim is comparable? Do we mean claims of events that are equally improbable? That would require estimating the probability of the 3-hour worldwide darkness, without conditioning on the claim, with all its concomitant difficulties (recall, you’re trying to avoid this in the first place). Moreover, one would have to estimate the probability of all other claims to determine whether they lie within this class or not, a probably impossible task.
  
  Indeed. That’s why historians can’t and don’t need to be that precise. They can argue a fortiori and get along fine. Hence my example of the asteroid (p. 85).
  
  Really, scientists do this, too: when they exclude, for example, “magic” as an explanation of a drug study’s results. Try getting an exact estimate of the prior probability that magic has relevantly affected the data of any scientific study. Scientists, if pressed, will admit they have no idea what that prior is or how to calculate it, but that they have more than enough reason to believe it is sufficiently low that they can safely disregard it–unless a difference in consequent probabilities came along that was superbly huge; then we’d have to start thinking more seriously about the prior probability of magic.
  
  So how would you go about trying to estimate, to the best of your ability, P(H|B) for either of these examples? If you can’t do this, how could you possibly hope to use BT to estimate how likely it is that Jesus existed?
  
  That last is actually a lot easier than you think. But you’ll have to wait for my next book to see why.
  
  If we ignore what I will actually do and ask instead what we would do with no clear data set to start from, then even the worst case would leave us only to build out what the difference in consequent probabilities is (which means, what it is at best; I have found the odds form approach is most agreeable here) and then explain what prior you would have to accept in order to accept historicity (and what prior is needed to be highly confident in historicity, which is not the same thing). This then re-frames the debate around whether any of those priors is really credible or not, and thus whether asserting historicity requires an act of desperation, or whether asserting mythicism does, or whether agnosticism is what falls out as the most credible position. Either way, material progress…and finally some facts and numbers historians can start debating.
  
  Moreover, the case of the claimed darkness in 1983 illustrates another difficulty in applying this method: A fortori arguments only work if you can bound both the numerator and denominator away from zero. I stated in my earlier comment, which you quoted above, that a fortiori reasoning should always be valid, but that is only partially true. While it is true that bounds on the 3 inputs to BT will always yield a bound on P(H|E), it is not always true that improving those bounds will improve the final result; in particular, if P(H) or P(E|H) has a lower bound of zero, changing the upper bound of P(E|~H) won’t have any impact on the lower bound for P(H|E) (which will also be 0) and similarly if P(E|~H) has a lower bound of zero, changing the upper bound of P(E|H)or P(H) won’t have any impact on the upper bound for P(H|E), which would be 1. The worst case situation would be where both P(E|~H) and either P(E|H)or P(H) are so small that they can’t be bounded from below. In that case we will know absolutely nothing (0<=P(H|E)<=1) and changing the nonzero bounds won't help.
  
  This seems to confuse physical with epistemic probabilities. No epistemic probability can ever be zero (except in very specific cases not applicable here: see Axiom 4, pp. 23-26, with endnotes; also, see my “nonzero” remarks on pp. 55, 62, 80, 83, 94, 246-50, 260, 268).
  
  Certainly, when we know only that (0<=P(H|E)<=1), we know nothing. Historians often do face that reality, and usually accept it (although for some reason religiously charged subjects seem impervious to that humility, sometimes even when approached by secular historians). But we're often not in so dire a state of ignorance. In all other cases, we only care about one bound, not the other. Because we only care what the probability is that we are wrong. In particular, how high that probability can reasonably be. The other bound (how much more probably right we might be than that) is generally of no use knowing, even if it could be known (and it usually can't). For example, I would guess the probability of supernatural phenomena (suitably defined) is actually in fact zero (see The God Impossible), but I don’t know that for sure. What I want to know is how likely it is that I am wrong about that (or in particular, how likely it is that I am wrong to say there is no supernatural phenomena in this universe). And that only relates to the other bound: which, arguing a fortiori, is the highest probability I can reasonably believe that bound to be.
  
  I haven’t explored that, since I’ve never had to–nothing has come even close to it, so I can use wildly exaggerated bounds and still conclude no miracle occurred in any given case (the Gospel darkness, for example). But if evidence started getting strong, then I’d have to start seriously examining what that bound is, which amounts to examining at what point I would believe in the supernatural (in effect, what differential in consequent probabilities would it take), and I can certainly countenance there being such a point (since I believe the resort to excuses would certainly get implausible at some point, as I discuss in Defining the Supernatural).
  
  So if we only ask what that one bound is for all three key terms (P(E|H), P(E|~H), and P(H)), and indeed even allow exaggerations beyond that bound, always a fortiori, then there is never a problem–unless the result comes out ambiguous or close and we want to be more certain, then we can take away the exaggerations and try to generate some numbers closer to the actual data, whatever that happens to be.
  
  For example, if h is “my wallet was stolen” and I want to argue for that a fortiori, I would pick a prior I know is too low (let’s say that turns out to be 0.6) and a ratio of consequents I knew was as much against h as I could reasonably believe possible (let’s say, P(E|H) = P(E|~H)). The result would be my wallet was probably stolen, but I couldn’t be entirely certain (since there would be a 40% chance it wasn’t). Any improvement of the numbers toward what their actual values were would increase that probability (and thus make h even more likely) but at the cost of less confidence (since my model is getting less a fortiori and thus approaching greater possibility for error; cf. p. 87). Unless, of course, I could improve those numbers with minimal loss of confidence: e.g., if my starting estimates were not merely a fortiori, but exaggeratedly a fortiori, such that moving them over would actually keep them a fortiori (as we could surely do in the Gospel darkness case). But any further, and I can no longer have confidence in my numbers, and thus in the premises, and thus in the conclusion. The resulting probability would therefore be useless to me (hence: pp. 111-14).
  
  I’ll mention here one other point with this example (in the Gospel case) that Ian raised concerning the miraculous. In particular, if your hypothesis includes the possibility of a supernatural darkness, which I think you allowed in your book, then when calculating P(E|H) you would have to consider the possibility that a supernatural agent structured the evidence to be the way that it is, with reports only appearing in one source. How would one calculate the probability that God prevented people all over the earth from recording this event, if one is conditioning on God having caused the darkness?
  
  That’s called Cartesian Demon reasoning and has been a standard question in epistemology for hundreds of years (thus it is not a problem at all unique to BT, but affects all methods and epistemologies whatever). You can always ask that question. For example, “what if God arranged all my drug study data to hide the fact that the drug killed everyone who took it?”; “What if God arranged all my Large Hadron Collider data to hide the fact that it generated a dozen unicorns and a flying turtle?” Etc.
  
  The short answer is that Cartesian Demons have vanishingly small priors and therefore can be ruled out (to see why, revisit my discussion of ad hoc theory enhancement: pp. 80-81, and “gerrymandering” in the index; also, think of the many degrees of CD: from Punked, to The Truman Show, to The Matrix, to the scenario which you just described, which must necessarily have a prior even lower than those).
  
  That does mean CDs can always evade detection, but then they are defined in exactly that way (a CD is by definition an entity that always evades detection). It remains the case that they are very unlikely to exist. That does not mean the probability of their existing is zero, however. People often have a hard time grasping the distinction. And a lot of it has to do with problems in the definition of knowledge standardly used in philosophy, as justified true belief. We can only ever have justified true belief in the improbability of CDs; we can never have justified true belief in the impossibility of CDs (as such knowledge is logically impossible; unless someone, someday, proves CDs to be logically impossible).
  
  And, of course, weaker CDs can be exposed eventually or in principle (as they are in Punked, The Truman Show, and The Matrix, and God could likewise expose himself or if God is not perfect, someone else might manage to expose him–there are some amusing ancient Jewish legends along those lines). But the stronger the CD, the more a priori improbable that CD is (since you have to heap on more and more undocumented assumptions to strengthen that CD from detection). You can also always invent a CD to explain away the exposure of another CD (The Truman Show inside The Matrix…think, Inception), but you can see why the prior probability of that scenario is vastly less (since when two very small prior probabilities multiply, the result is an improbability many, many, many times smaller than either).
  Reply
- malcolms on October 13, 2012 at 11:00 pm
  
  Dr. Carrier,
  
  You still don’t seem to grasp the full extent of this a fortiori reasoning problem. I’ll try to make my point again, hopefully more clearly.
  
  First of all, even though probabilities are not 0 (unless something really is logically impossible, in which it probably would not be an interesting historical question), when one does an a fortiori argument one needs to use ranges for the inputs (P(H), P(E|~H), P(E|H)). In the case where one cannot give a minimum estimate for a probability the lower bound would be 0. For example, if we say that we know P(H) is very small, say, lower than 1%, then our range for P(H) would be 0<P(H)<0.01, so one still has to deal with a zero bound in calculations, even though the probability itself would never be zero.
  
  So if the lower bound for P(H) or P(E|H) is zero, then the lower bound for P(H|E) is zero. So far no problem. But what if the lower bound for P(E|~H) is also zero? In that case, this gives an upper bound for P(H|E) of 1 (because a lower bound for P(E|~H) gives an upper bound for P(H|E)). Thus in the case where both P(H) (or P(E|H)) and P(E|~H) are known to be small, but we can't say which one is larger, we will get a range for P(H|E) of 0 to 1.
  
  Here's a numerical example. Let's say we can say with confidence that P(H) < 5% and P(E|~H) < 2%. (P(E|H) doesn't matter so much in this example, so let's say it is exactly 1.) What is our possible range for P(H|E)? 0 to 1, i.e., we know nothing. Now let's improve our bounds on P(H) and P(E|~H) to, say, 1% for both. Now what is our possible range for P(H|E)? Still 0 to 1. So tightening our range didn't help at all.
  
  When I made my previous post I had claimed that your example of a claim of worldwide darkness in 1983 falls into this type of situation, but I had overlooked the fact that in your book you explicitly stated that P(H) < P(E|~H)/1000. In that kind of case, where one has a bound of the ratio P(H)/P(E|~H), then this problem is avoided. But having such an estimate for the ratio is crucial; without it one couldn't say anything about P(H|E).
  
  For that specific example in your book I also would dispute the estimate of P(E|H) ~ 1, since one must take into account that one had never heard about this claim until recently, and also the fact that one had not witnessed the darkness (or a report of it) even having been alive at the time. For example, if one said that P(E|H) = 0.1%, then one gets about 50% for P(H|E), and this is true no matter how small you choose P(H) and P(E|~H) so long as their ratio is 1:1000. Thus whether H is almost certain, fifty-fifty, or even improbable depends critically on the exact ratio of P(H)/P(E|~H) to P(E|H), which would require a more elaborate argument one than you present in your book.
  
  Therefore I assert that this case is actually one of them where BT can't help us too much in getting a good handle on whether the claim is true or not. Of course it's not just BT that has this problem – any attempt to reason this out logically would founder on the same problems as well. In fact, BT has an advantage here in that it allows one to see precisely how and why this claim is difficult to evaluate, whereas if one were just to respond intuitively one might not grasp how sensitive one's answer is to small changes in assumptions.
  Reply
  - Richard Carrier on October 15, 2012 at 11:41 am
    
    MalcolmS: …when one does an a fortiori argument one needs to use ranges for the inputs (P(H), P(E|~H), P(E|H)).
    
    You don’t need to give a range. Because the other end of the margin is irrelevant. Only one margin is relevant: the one that draws the line between whether you are right or wrong, and does so beyond where that line actually is, in favor of your being wrong. Hence producing a conclusion a fortiori (“from the stronger reason”). The other margin does not produce an a fortiori argument but an a tenuiori argument, which is a fallacy (because that line between whether you are right or wrong is the weakest, being much less likely correct, making your argument look the stronger but actually making it weaker). You therefore don’t need to bother with that other bound.
    
    That doesn’t mean you can’t work with different ranges, if there is something useful in doing so, e.g. to show what the conclusion is with different assumptions.
    
    For example, I estimate the prior probability of a passage about Jesus in a non-Christian source before the 4th century being an interpolation is a fortiori better than 1 in 200. We could say the other bound is then 1 in 1 (i.e. that they are all interpolations and no evidence could ever show otherwise), but what use is that? None. Obviously if the most against-interpolation bound (1 in 200) produces a result, in the face of the evidence for a specific passage, that that passage is probably an interpolation, then using the other bound (1 in 1) will make the probability that that passage is an interpolation much higher (in fact, it automatically becomes 100% no matter what evidence you did or didn’t produce). But of what use is knowing that? None. Only the a fortiori bound produces a conclusion that can be at all persuasive. Moreover, only the a fortiori bound produces a conclusion that we can have a high confidence in. I cannot have a high confidence in a conclusion produced from the other bound (of 1 in 1, or even anything close to 1 in 1). So the conclusion that comes from that bound produces no appreciable confidence. It’s therefore useless even to us, much less for the task of persuading others. And this is true even in scenarios like you describe.
    
    But what if the lower bound for P(E|~H) is also zero?
    
    It depends on whether the “lower” bound here is a fortiori or not. If not, it’s moot (since, as I just explained, such a bound cannot produce an a fortiori argument nor generate confidence and is therefore useless). But if it is a fortiori, as are all the other bounds being used, and the scenario is as you describe:
    
    Thus in the case where both P(H) (or P(E|H)) and P(E|~H) are known to be small, but we can’t say which one is larger, we will get a range for P(H|E) of 0 to 1.
    
    If we assume P(E|H) is high, then this translates to: we don’t know. You have just mathematically modeled a class of historical claims whose truth is unknowable on present evidence. That is not a problem. Because we already know that most historical claims are such. Unless you want to claim all historical claims are described by this scenario; but clearly you can’t be saying that. So what use is this as an objection to anything I argue in PH? That some claims are undecidable is already affirmed repeatedly in PH.
    
    But let’s instead imagine a scenario in which all three are low (P(H), P(E|H) and P(E|~H)), that is, the a fortiori bound for each is somewhere uncertainly close to zero.
    
    Note then that the key statement here would be “we can’t say which one is larger.” In the case of the likelihoods, that translates to: we do not know the ratio of likelihoods would be in the Odds Form of BT–in fact, not only do we not know them, but we don’t even know them a fortiori. Which means, for all we know, that ratio is 1 to 1: we have no knowledge establishing it is any higher, nor any knowledge establishing it is any lower–because if we had either, “we can’t say which one is larger” would be false, and the scenario would not apply.
    
    But if for all we know that ratio is 1/1, and the ratio of priors is approaching 0, then we should conclude H is probably false. Until we get information that allows us to argue that the ratio of likelihoods (i.e. the evidence) favors H over ~H.
    
    Indeed, that’s the definition of “having evidence for H” : having a ratio of likelihoods that favors H.
    
    The question then is how much evidence favoring H do we need in order to argue that H is true when H has a very small prior. That is the unusual scenario you are describing. And that gets into a whole slew of other questions.
    
    If we are dealing with an absurd claim, then we will have a vanishingly small prior and a vanishingly small likelihood ratio. We should conclude against H.
    
    That leaves only one scenario to be concerned with: one where the a fortiori prior for H approaches very near to 0 but the ratio of likelihoods is vastly in favor of H. In that case we can’t use “1” and “0” because now we are dealing with a case where we have to start taking seriously the boundaries of our epistemic certainty. For example, as I said before, though I suspect the prior probability of miracles is 0, my confidence in that bound is not epistemically high. It therefore can never be an a fortiori bound.
    
    Thus, if I were faced with a case where the prior is vanishingly small but the evidence is extraordinarily strong (this has never happened, and is a very bizarre scenario–a red flag for any philosophical argument: if you have to create a completely unrealistic scenario in order to make a point, odds are the point wasn’t worth making, especially if your aim is to discuss how to approach reality), then I have to ask what I think the epistemic prior probability of (let’s say) a miracle really is, in other words how much evidence will finally convince me miracles exist (which translates to: how improbable P(E|~H) must be, relative to P(E|H)).
    
    I know there is some such probability, as I have described scenarios that would persuade me before (in Why I Am Not a Christian, for example), so all I need do is figure what my probability estimates really translated to in those cases, in particular the a fortiori bounds, and then benchmark back to the case at hand. And that can then be open to further debate, if someone wants to insist I’m wrong to set that benchmark. And so progressive debate ensues.
    
    I actually discuss these kinds of bizarre scenarios (and how they differ from actual scenarios of low P(H) and P(E|~H)) in PH, pp. 246-55 (but see again pp. 243-46 for context). The example of Matthias the Galilean industrial mechanic, and what evidence would it take to persuade a historian that such a man existed, is exactly on this issue (being a realistic example): of starting with very low priors, but then getting good evidence (yet notice here the most plausible a fortiori prior, the lowest we can reasonably believe it to be, will not be zero, or anywhere near as low as in the case of successful alchemy or sorcery, the counter-examples I explore).
    
    So, to adapt the Matthias the Galilean industrial mechanic example to your numbers:
    
    Here’s a numerical example. Let’s say we can say with confidence that P(H) < 5% and P(E|~H) < 2%.
    
    This cannot be a fortiori. Because your P(H) < 5% is the wrong bound (the useless one). The bound we want is the lowest we can reasonably believe P(H) to be, which I proposed is P(H) > 0.000001. I also explored the highest reasonable prior and found it to be 0.002, but that is too high, because I know the actual prior is less than that, and so I cannot use that with any confidence. I can accept debate over the other bound of 0.000001, however, since one might make an evidence-based case that that is too low (in fact, I actually do believe it is too low), but even then all they would do is end up making a case that the a fortiori bound is somewhere else (the whole point of that section: these are the kinds of debates historians should be having), although I suspect it will still be closer to 0.000001 than to 0.002.
    
    Your P(E|~H) < 2% however would then be the right bound, since to make an a fortiori argument we want to know the highest this probability could reasonably be. And if that’s what we had, a 1 in 50 chance a source is lying or in error (or whatever), and we were confident the odds were at least that high, then we wouldn’t have sufficient evidence to believe in Matthias the Galilean industrial mechanic; we would believe that the source probably made him up. But of course that assumes that’s what we premised, that the source is that unreliable on claims like this. And that might be very arguable; indeed, it might not be a reasonable belief at all, no confidence being warranted in so high an estimate of the likelihood of fabrication on that kind of point (and so on).
    
    For example, finding a sarcophagus in the Palestinian region for Matthias the Galilean industrial mechanic, much like the one we have found in Turkey (cf. n. 36, p. 331), would not have a 1 in 50 chance of being forged or in error; the odds of that would be many millions to one. It would therefore more than overwhelm even an a fortiori prior of 0.000001. When we turn to the case of a historian referring to him, then the matter may be more complicated, and may end in uncertainty–for example we could conclude that the historian’s reliability on such details must be at least X (X being a frequency of correctness on such points, and thus the prior) in order for us to be confident that Matthias the Galilean industrial mechanic existed (rather than “was made up” etc.).
    
    This X might be inside the range of uncertainty and thus not capable of making an argument a fortiori. In which case we would state as much: in colloquial terms, we would say he might have existed, that it’s plausible but we’re nor sure; in exact terms, we’d say that we can be highly confident he existed only if we adopt assumptions (about the prior probability and/or the likelihood of fabrication or error) in which we are not highly confident, which means we can be confident neither that he did exist, nor that he didn’t (that latter distinguishing a case like this, from a more absurd case like alchemy: see again the distinction drawn between plausible unknowns and effective impossibilities in Axiom 5, pp. 26-29).
    
    Now what is our possible range for P(H|E)? Still 0 to 1. So tightening our range didn’t help at all.
    
    Here you are making a moot point. That the “possible range” includes many other values is of no use knowing. Because that “possible range” will include things like “the frequency of interpolated passages about Jesus in non-Christian literature before the 4th century is 100%” and no a fortiori argument can proceed from a premise like that. So that our “possible range” includes it is irrelevant. We don’t care what the “possible range” is. We only care what the a fortiori result is. And that only uses one bound for each value. It therefore does not produce a range, other than “X or less” or “X or more” (depending on whether we are arguing a fortiori for or against H).
    
    When I made my previous post I had claimed that your example of a claim of worldwide darkness in 1983 falls into this type of situation, but I had overlooked the fact that in your book you explicitly stated that P(H) < P(E|~H)/1000. In that kind of case, where one has a bound of the ratio P(H)/P(E|~H), then this problem is avoided. But having such an estimate for the ratio is crucial; without it one couldn't say anything about P(H|E).
    
    Indeed. This is why I also discuss the Odds Form in the book (someone having rightly convinced me of its importance) and why I discuss the tactic of employing artificial ratios even when using the straight form (as in my discussion of neighbors with criminal records on pp. 74-76).
    
    So that can’t be the problem.
    
    That leaves only this…
    
    For that specific example in your book I also would dispute the estimate of P(E|H) ~ 1, since one must take into account that one had never heard about this claim until recently, and also the fact that one had not witnessed the darkness (or a report of it) even having been alive at the time. For example, if one said that P(E|H) = 0.1%, then one gets about 50% for P(H|E), and this is true no matter how small you choose P(H) and P(E|~H) so long as their ratio is 1:1000. Thus whether H is almost certain, fifty-fifty, or even improbable depends critically on the exact ratio of P(H)/P(E|~H) to P(E|H), which would require a more elaborate argument one than you present in your book.
    
    You are introducing elements not stipulated in the analogy. That makes this a straw man argument. I never said anything about “one had not witnessed the darkness despite being alive at the time” nor does “one had never heard about this claim until recently” make a difference if, for example, you are in school and hearing all sorts of things for the first time. In other words, if “hearing about this claim for the first time” is not unexpected, it makes no difference to the consequent probability; you have to stipulate that it is unexpected, which changes the scenario.
    
    Obviously, if you change the scenario that has been stipulated, then you change how it gets modeled in BT. That is not an argument against a fortiori reasoning.
    
    If we stipulated the scenario you do, that we have no personal memory of the event even though we should have, and only just now are hearing about it and that this would be strange, then indeed we may be in a state of indecision, given all the other evidence there is. That simply isn’t the scenario I posited. But we could posit it, as an example of a bizarre scenario where we might not be able to know what is true. That just wouldn’t be relevant to the point I was making with the analogy there (which was to illustrate what it would take to convince us, not what it would take to produce uncertainty). Nor would that scenario be analogous to any actual situation we are ever in (since I cannot think of a single “comparably incredible claim” for which I have such a comparably vast scale of evidence contradicting my own memory, outside of a Philip K. Dick novel).
    
    In short, I see no objection here to a fortiori reasoning. All I see is a recognition that some claims are unknowable (both in practice, and in extremely bizarre fiction). Which is already argued in Proving History.
sc_7bca544d596f84a5f56d0f9674c0e22e on October 11, 2012 at 6:48 am

Professor Carrier, maybe you should change your main claim, and assert that you are not doing Bayesian statistics at all, but instead what you are doing is formally correct fuzzy logic, as defined by Lofti Zadeh and others.

As soon you assert that you are manipulating “truth values” (subjective statistics), and not actual statistics, many of the objections raised by statisticians may disappear.

Just a thought.
Reply
- Richard Carrier on October 12, 2012 at 10:36 am
  
  Because it’s not exactly the same thing. Fuzzy logic involves a much broader and more complex system of rules, terms, and procedures, almost none of which is of practical use to historians. Although I would expect any Bayesian model can be described in fuzzy logic, that does not mean doing so is useful (it’s just too much work for no practical gain: see my discussion of this problem in the case of Dempster-Shafer theory, in note 19, page 303). This is rather like trying to drive to work by building a car, rather than just using the car already parked in your driveway.
  
  And I never claim to be doing statistics, either. I occasionally use analogies from statistics, but I never say I am doing “Bayesian statistics” (that phrase appears nowhere in Proving History, for example; except in the titles of other people’s books I cite). I am talking about Bayesian reasoning. There is a significant difference (not only between that and “Bayesian statistics,” but also between that and “fuzzy logic,” even though they might conceptually overlap).
  Reply
IrrcoIan on October 11, 2012 at 3:39 pm

I added another post on error, its causes and effects: http://irrco.wordpress.com/2012/10/11/the-effect-of-error-in-bayess-theorem/
Reply
- Richard Carrier on October 12, 2012 at 11:06 am
  
  Thank you.
  
  In effect I have responded to the point of that post already upthread.
  
  But overall, the point you make there doesn’t contradict anything I argue in Proving History, where I repeatedly note the fact that there will be cases too close to call and therefore in which nothing can be known (and that in fact this happens a lot in history, ancient history especially). You essentially just built out a detailed mathematical demonstration of why and when that’s true.
  
  Your examples are also useful starting points, and similar to examples I gave myself (e.g., pp. 250-56 and 212-14, and my whole discussion of Axiom 5, pp. 26-19). If historians were concerned, for example, they can now start discussing whether your method of deriving the prior probability that Caesar was in Alexandria at a given date is actually the right way to go about it, or if there is a much more practical approach (having more to do with the prior probability that a source claiming he was there is reliable)? To see this problem magnified, look at what I consider to be the wholly incorrect way Michael Martin goes about adducing a prior probability that Jesus was God’s Atonement Sacrifice in The Empty Tomb, pp. 43-54, which is similar to what you were attempting for Caesar, only much more obviously incorrect.
  
  The one main flaw in your post is what seems to be an unjustified inference from (a) we will have all these errors [true], to (B) therefore these errors will always stack up to such a large cumulative error as to make knowledge of historical facts impossible [false].
  
  When that does happen, then yes, historical knowledge is impossible. And indeed, you just defined how historians can determine what is unknowable (like Caesar dicing with Maxsuma). But it simply doesn’t always happen. We obviously can know some of the dates that Caesar was in Alexandria, for example, to at least some reasonably high probability. Even after accounting for all likely sources of error.
  
  You also seem to think that this problem is only introduced by using BT. In fact, BT makes exactly zero difference to any of this. These errors (and their effects on our uncertainty) exist no matter what. Thus, historians are saddled with them even if they never use BT. Their only recourse is to argue fallaciously, or nonfallaciously. If they choose the former, they are writing fiction. If they choose the latter, they are going to be following BT (whether they are aware of it or not). Best to just get everyone on the latter page, so they know what they are doing and how to do it consistently and better. In other words, BT forces historians to confront what their sources of error really are, and how to compensate for them (or if, in a particular case, they can).
  
  Which brings me to this:
  
  In reality Carrier (and anyone else doing this) will also be choosing the definitions, and choosing the reference classes, and there is no similar a fortiori process for determining which are the least favourable definitions to ones cause, and which reference classes are the most troubling, and adopting those
  
  There actually is. I discuss the criteria of better and worse reference classes in chapter six (and show how sometimes starting with a different reference class won’t logically get a different result anyway, since all the data has to go in eventually). Likewise I frequently talk about the relative merits of simple vs. complicated hypotheses and the way definitions (i.e. context) affects estimates, and so on. Moreover, once we get to this point, of arguing over whether we are using a valid definition or reference class, or whether there is a better way to model the problem, we’re making progress. It is precisely a knowledge and understanding of BT that makes that possible.
  
  And as soon as I (or anyone else) chooses a definition and a reference class, others can then start examining whether I am making a sound or unsound start, and argue for a better definition or reference class, or ask if different ones get different results and why (and likewise start looking at what possible sources of error there are and what effect they have). In other words, this starts a useful dialogue that can make progress on almost any question in the field, as now we know what we’re supposed to be arguing about.
  
  Finally, your endnote 3 seems to ignore my discussion on pp. 240-43 (or any of my other discussions of staged iteration: see “iteration, method of” in the index). It seems like you are saying I never discuss this procedure, when in fact I recommend it several times.
  Reply
- IrrcoIan on October 12, 2012 at 11:38 am
  
  “The one main flaw in your post is what seems to be an unjustified inference from (a) we will have all these errors [true], to (B) therefore these errors will always stack up to such a large cumulative error as to make knowledge of historical facts impossible [false].”
  
  But unfortunately, unless you actually do the error analysis numerically, you’ll never know the difference. And you don’t. Unless I’ve missed it.
  Reply
  - Richard Carrier on October 12, 2012 at 12:15 pm
    
    But unfortunately, unless you actually do the error analysis numerically, you’ll never know the difference…
    
    If that were true, then all knowledge would be impossible. Humans don’t do numerical analyses of daily or professional judgments, for example. Nor need they. Only when the probabilities get dangerously close to uncertainty do we need to take greater care figuring out what they are. Otherwise, no amount of accumulated error is likely to make me wrong to say “the odds of an asteroid hitting my house today are less than 1 in 1000,” for example. So, too, in historical judgments. Where there is knowledge to be had.
    
    In general, we can look at all possible known sources of error and judge at-sight whether they will make a difference or not, were we to work up their numerical effect. And when it’s obvious they won’t, we don’t need to waste the time doing it. Just as scientists don’t waste any time trying to work out the prior probability that magic is affecting their data.
    
    In less clear cases, we just adopt less certain conclusions. For example, I can show that the actual rate of interpolated references to Jesus in pagan literature of the first three centuries is higher than 1 in 10 and of interpolated passages in the NT over that same period is higher than 1 in 400, and if we include interpolated phrases in existing passages, it’s over 1 in 200. So if I adopt a prior for interpolated references to Jesus in pagan literature of 1 in 200, there is no amount of error that is going to make me wrong to say “the prior probability a reference to Jesus in pagan literature is interpolated is no less than 1 in 200” (because in actual fact, it’s 1 in 10, so any adjustment of the prior toward reality would increase my result twenty times over; there is no likely accumulation of errors that is going to make me wrong by a factor of twenty–one would have to imagine an extremely implausible set of circumstances to get that kind of effect, which borders on Cartesian Demon reasoning).
    
    Likewise any other questions. It is simply not necessary to know exact numbers in order to know general facts like these. That’s why human reasoning is even possible in the first place. We couldn’t find our way to the bathroom otherwise.
- IrrcoIan on October 12, 2012 at 12:15 pm
  
  “You also seem to think that this problem is only introduced by using BT. In fact, BT makes exactly zero difference to any of this.”
  
  Not quite. My central point is that your numeric results cannot honestly be interpreted as probabilities of anything meaningful, because you are running BT informally, on poorly specified data, with no way of properly quantifying and therefore controlling for errors, and no way of empirically confirming your inputs.
  
  To the extent that you want to use BT to show flaws in bad arguments. That’s fine.
  
  But you seem to want to make positive arguments too, based on probabilistic calculations.
  
  If that is the case, then the sensitivity of your conclusion to errors in your input data is crucially important, it makes a big difference.
  
  The problem is, of course, you’ll never actually see the errors if you can’t do proper quantification. Because you’re never going to use it on anything you can go and check quantitatively.
  
  Your results are essentially unfalsifiable, at least quantitatively. And because there are a small number of boolean hypotheses you’re interested in, and you’re generating probability results, you’re never going to have enough results to confirm that your method works.
  
  As such I think using BT and all this messing around with numbers is at best unnecessary, and at worst tendentious.
  
  You could have achieved the same methodological critiques in your book without having to dress them up in the language of probability theory. And dressing them up probability theory means you’re here defending your use of probabilistic reasoning, rather than defending your critiques, which is a shame.
  
  If it helps your intuition to think that way, that’s fine. But I don’t think you are doing what you seem to want to claim to be doing.
  Reply
  - Richard Carrier on October 12, 2012 at 12:53 pm
    
    “You also seem to think that this problem is only introduced by using BT. In fact, BT makes exactly zero difference to any of this.”
    
    Not quite. My central point is that your numeric results cannot honestly be interpreted as probabilities of anything meaningful, because you are running BT informally, on poorly specified data, with no way of properly quantifying and therefore controlling for errors, and no way of empirically confirming your inputs.
    
    Which is a description of all historical reasoning generally. In fact, of almost all human reasoning generally (in daily life and professions).
    
    The bottom line is: we are already estimating priors and consequents and coming to conclusions from those estimates. Every historical argument ever made does that. We’re just being even more imprecise than we need to be and often illogical about it. We should recognize what we are doing, do it correctly, and get better at doing it. That’s the sum of Proving History‘s argument.
    
    To the extent that you want to use BT to show flaws in bad arguments. That’s fine.
    
    But you seem to want to make positive arguments too, based on probabilistic calculations.
    
    If that is the case, then the sensitivity of your conclusion to errors in your input data is crucially important, it makes a big difference.
    
    That’s always true, and just as true, whether we use BT or not. The advantage of BT is that now we know we’re doing it, and what we can do to do it better, or how much uncertainty we should actually accept knowing that we can’t.
    
    Your results are essentially unfalsifiable, at least quantitatively.
    
    If that were true, then all historical knowledge would be impossible (this follows from what I demonstrate on pp. 106-14).
    
    For example, the statement “there is less than a 1 in 1000 chance an asteroid will hit my house today” is certainly falsifiable. Even though it isn’t anywhere near exact. I give other examples relating to history in the book. For example, regarding silent reading in antiquity (Axiom 11, pp. 33-34, with note 10, pp. 298-99); and the libraries example (pp. 229ff.), and my discussion of Galilean industrial mechanics (pp. 250ff.), and the deleting of the sun (pp. 41ff.), and so on.
    
    I even discuss the problem of falsifiability in the graverobbing analogy (pp. 212-14).
    
    And because there are a small number of boolean hypotheses you’re interested in, and you’re generating probability results, you’re never going to have enough results to confirm that your method works.
    
    That depends on what you mean by “works.” How do you think any reasoning about history works? If correct reasoning about history is not Bayesian reasoning, then what is it? By what “working” method can historians claim to know anything?
    
    Can they do it without probabilistic reasoning?
    
    I’d love to see you demonstrate that.
    
    And if they have to use probabilistic reasoning (and they do), are they not then logically bound by Bayes’ rule? Is there any possible way they can argue without following Bayes’ rule?
    
    Let’s see an example of how a historian would do that.
    
    Until then, I don’t think you understand what is going on here.
- IrrcoIan on October 12, 2012 at 12:46 pm
  
  “It is simply not necessary to know exact numbers in order to know general facts like these. That’s why human reasoning is even possible in the first place. We couldn’t find our way to the bathroom otherwise.”
  
  Conversations like this can easily slip into violent agreement!
  
  Of course this is true. But I simply don’t see how it addresses the point.
  Reply
  - Richard Carrier on October 15, 2012 at 11:51 am
    
    Carrier: “It is simply not necessary to know exact numbers in order to know general facts like these. That’s why human reasoning is even possible in the first place. We couldn’t find our way to the bathroom otherwise.”
    
    Ian: Of course this is true. But I simply don’t see how it addresses the point.
    
    That depends on what the point was.
    
    If it was that some cases are undecidable on BT, that point is moot. Because that simply translates what is already true generally: with or without BT, some cases are undecidable. BT just tells us when and why.
    
    But if it was that BT cannot usefully model historical reasoning because the probabilities are always fuzzy, it certainly addresses the point. Because “being fuzzy” does not translate to “are not known to any useful degree.”
    
    Almost all human reasoning, and historical reasoning especially, deals in fuzzy probabilities. That does not make such knowledge impossible–not generally, nor in the hands of BT.
    
    So on neither the first point nor the second do I see any relevant objection being made.
    
    That leaves the possibility that your point was something other than those two things. In which case, you’ll need to explain what your point is, in such terms as to make clear how it is not either of those.
F on October 11, 2012 at 9:16 pm

I readily concede that my colloquial discourse will lead to ambiguities that chafe at mathematicians; but this is precisely the kind of shit they need to get over, because they are simply not going to be able to communicate with people in the humanities if they don’t learn how to strategically use ambiguity to increase the intelligibility of the concepts they want to relate.

T-shirt.
Reply
brettongarcia on October 12, 2012 at 12:44 am

Oddly enough, Historical Jesus SUPPORTERS have recently used Bayes. And I noted some problems with their application.

A few months ago I interfaced on a blog with Dr. James Tabor, the archeologist who is doing the “Jesus Tomb”/Talpiot A and B excavations. He’s been looking at the fact that the names on the six or seven bone boxes in the tombs, seem to rather exactly match Jesus’ family. Then Tabor cites Bayes mathematicians; who asserted that given the popularity of each individual name, then the likelihood of all of them coming together at random, the chance of this particular grouping of names not being of the Jesus family, is extremely small. The conclusion being that these two tombs are the authentic tombs of Jesus and/or his family.

But here was one objection I posed to the input into this application of Bayes: though to be sure, it was extremely unlikely at first that these particular names (and no others?) would come together at random, the odds of names like “Jesus” and “Joseph” and “Mary” and so forth occuring in one family … go up astronomically … shortly after the death of Jesus. As Christianity became somewhat more popular, there would be many more parents naming their children, after New Testament figures.

Applying Bayes to historical situations is quite complex; and much depends on how we think of each situation, and how many different scenarios we consider.
Reply
- Richard Carrier on October 12, 2012 at 12:00 pm
  
  I’ve blogged about that before: see The Jesus Tomb and Bayes’ Theorem.
  
  Although their error was not the one you suggest (which would not likely affect their data: the find unmistakably predates 70 AD, and such an effect on name frequencies would be too small to have any observable effect by then, e.g. given there were over a million Jews in Judea, even assuming ten thousand Christians in Judea by the year 70, an absurd over-estimate, Christian names could have only a hundredth part effect on name frequencies in Judea, too small to matter).
  Reply
sawells on October 12, 2012 at 1:42 am

I think you mean Wallace to your Darwin, not Watson; not that it isn’t a lovely mental image (Astounding, my dear Charles! Elementary, my dear Watson…).
Reply
- Richard Carrier on October 12, 2012 at 12:03 pm
  
  Oh doi. Thank you. Fixed!
  Reply
malcolms on October 12, 2012 at 9:41 am

BTW, Ian has followed up with that promised blog post that expands on his previous hints about mathematical difficulties in applying BT in history. The main 2 points he makes, about reference classes and error ranges when the inputs are very small, are what I was discussing in my previous post (which I see is still awaiting moderation).

Here’s the link: http://irrco.wordpress.com/2012/10/11/the-effect-of-error-in-bayess-theorem/

These are actually much more serious issues than the ones he concentrated on in his first 2 posts.
Reply
IrrcoIan on October 12, 2012 at 11:41 am

“There actually is. I discuss the criteria of better and worse reference classes in chapter six”

Yes, you do. I respectfully suggest you haven’t grasped what I’m talking about in that section. I go on to refer to your method for choosing a reference class, but that’s not a solution to the *sensitivity* of the result to changes in reference class.
Reply
- Richard Carrier on October 12, 2012 at 12:28 pm
  
  You didn’t give any relevant examples of that being a fatal problem, though. You discuss miracle claims, but those have vanishingly small priors and terrible evidence, two facts together that entail the only way errors could make us wrong to reject them is if the accumulated error is causing our estimates to be off by over a thousand times (in coincidentally the one convenient direction), and there is simply no plausible litany of errors that can do that in any relevant case (at least, none you list), much less an undetectable mistake in selecting a reference class (since we can rule out detectable mistakes, so those aren’t at issue).
  
  Why not apply your argument to the actual example I employ in chapter six: the probability that a newly excavated Roman city in Italy had a public library. I provide various possible numbers there that you can work with, and discuss a variety of possible sources of error, that you could expand on. Then you can show whether it is therefore impossible to know whether any Roman city had a library, because of the possible accumulation of errors that aren’t being numerically estimated make the prior probability indeterminate.
  
  And do that without proposing anything like a Cartesian Demon.
  
  Even better if you can come up with a better way historians can approach that problem than I map out.
  Reply
IrrcoIan on October 12, 2012 at 12:51 pm

“The short answer is that Cartesian Demons have vanishingly small priors and therefore can be ruled out ”

Not in situations where we’re explicitly being asked to determine if a supernatural event occurred. Its a different kettle of fish to finding correlations in gene sequences and ignoring the possibility that a deceptive God is playing with our instruments. We’re not talking about a Cartesian Demon here, but a teleological purpose for the evidence, that is related to the hypothesis.

I think there’s a very good reason to enforce methodological naturalism on the discussion in all cases and to explicitly disregard any supernatural event.

Sure, you lose any leverage over true believers, then, but I suspect you didn’t have much of that to start with.
Reply
- Richard Carrier on October 15, 2012 at 11:57 am
  
  Carrier: “The short answer is that Cartesian Demons have vanishingly small priors and therefore can be ruled out ”
  
  Ian: Not in situations where we’re explicitly being asked to determine if a supernatural event occurred.
  
  Untrue. The nature of the claim makes no difference. Either way you are still positing an entity with an extremely low prior. Indeed, a lower one, since a God who acts like a Cartesian Demon is inherently less likely than a God in general (this is thus just another instance of gerrymandering that halves the prior, at best: as I explain on pp. 80-81, and that’s even assuming that there is a straight 50/50 chance that if God exists, he acts like this, which IMO is an absurdly generous hypothesis, especially if one tries to make it compatible with the definition of any God anyone actually believes in).
  
  Its a different kettle of fish to finding correlations in gene sequences and ignoring the possibility that a deceptive God is playing with our instruments. We’re not talking about a Cartesian Demon here, but a teleological purpose for the evidence, that is related to the hypothesis. I think there’s a very good reason to enforce methodological naturalism on the discussion in all cases and to explicitly disregard any supernatural event. Sure, you lose any leverage over true believers, then, but I suspect you didn’t have much of that to start with.
  
  I fully agree with all of this. But that last point has to have a logically valid reason. BT provides that reason. And the first point makes no difference to what I was saying: you do not know the prior probability of a meddling god, yet you know it is low enough to exclude it. Thus “not knowing the prior” is not a valid argument against modeling reasoning with BT, whether in science or history.
  Reply
Mark Erickson on October 13, 2012 at 8:59 pm

I’m the “fan” Ian mentions above. To define myself, and in regards to Ian’s most recent post saying we’re all just tribalists, I really enjoyed reading Proving History and the same goes for this blog. Dr. Carrier is a good writer, both for clarity and entertainment, and his insights are valuable in my opinion. But I don’t uncritically accept whatever he says, and I’m open to correction, as I think I’ve shown in comments on Ian’s blog.

More importantly, this thread is great stuff, really digging in and mucking around in the weeds, but, it’s still in the weeds. I’d like to ask Carrier to rise up out of the weeds and give a big picture summary of this debate. It seems clear to me from both the book and lots on this thread that Carrier’s main point is to improve the results of historical debate, by exposing fallacious reasoning and providing a framework to moderate disputes. BT accomplishes this not so much based on mathematical rigor, but from explicitly stating premises and arguments. Put another way, BT for history is more logical and less statistical. Ian seems stuck in the statistical weeds. He may have very good points, but viewed from afar, it is just a small russle of the grass. Is that right? I also suggest including this in one of your upcoming posts, as very few will have gotten through all the weeds above. Thanks.
Reply
- Richard Carrier on October 15, 2012 at 9:39 am
  
  BT accomplishes this not so much based on mathematical rigor, but from explicitly stating premises and arguments.
  
  Spot on.
  Reply
Elle87 on November 3, 2012 at 10:05 am

Just to let you know, one of the main reviews on amazon is a pretty negative one. Figured you may want to have a brief look at it since its author claims to be a mathematician.

http://www.amazon.com/review/R392IPXC3QP131/ref=cm_cr_pr_viewpnt#R392IPXC3QP131
Reply
- Richard Carrier on November 6, 2012 at 9:34 am
  
  That’s a pretty lame critique.
  
  He gives no examples of what he claims to find when he says “his methods press well beyond the confines of any form of axiomatic probability theory” … okay, where exactly do I do that? And in what way would that even be bad, since not everything is about axiomatic probability theory?
  
  I have no idea what he means by “in general, heuristic arguments are not only invalid but also useless from a purely logical framework.” What is he critiquing with that statement? What heuristic arguments? What in my book is he claiming is a heuristic argument? And what does he mean by saying all heuristic arguments are invalid and useless? (Really? All heuristic arguments, by definition?)
  
  His claim that “most irritating was Carrier’s insistence that proving something to be unlikely is equivalent to proving something false” seems to be ignorant of basic epistemology (if being unlikely is not what we mean by saying something is false, then nothing can ever be claimed to be false, since everything has a nonzero probability of being true: Axiom 4, pp. 23-25).
  
  His reference to the Banach-Tarski Paradox has no discernible relevance to anything I argue in my book. He doesn’t even provide an explanation of what relevance he thinks it has to anything I actually argue in my book.
  
  As far as his claiming “I would consider his entire premise suspect, due to his insistence on applying subjective quantities to an objective theorem and general lack of mathematical rigour,” that suggests he didn’t actually read the book, which addresses that objection extensively and in detail (so if he has no argument against what my book says about that, and it appears he does not, then it appears he did not actually read the book).
  
  And as far as his vague and undefended complaint against my “fast-and-loose treatment of mathematics and logic,” I think my article here (above) addresses that more than adequately.
  Reply
blotonthelandscape on November 5, 2012 at 7:38 am

New book on Bayes: http://www.amazon.co.uk/The-BUGS-Book-Introduction-Statistical/dp/1584888490. It’s a companion to BUGS http://www.mrc-bsu.cam.ac.uk/bugs/, free Bayes software, although might be a bit advanced for the lay-user.
Reply
- Richard Carrier on November 6, 2012 at 9:48 am
  
  Yes. Far too advanced for most.
  Reply