An interesting peer-reviewed article has been published that tests my concept (developed and argued in Proving History) that all historical reasoning is already in fact Bayesian (historians just don’t know it). It does so by applying the concept to a major challenge to the reigning consensus in Biblical studies regarding the origins of the Israelites, a challenge that eventually won out, completely flipping the mainstream consensus after decades of debate. The same model may fit the parallel paradigm shift on the historicity of Moses (which is directly related to what this article looks at). And so it might fit the future of Jesus mythicism, now that that thesis has been defended under peer review (for the first time, in 2014). So I’ll summarize this article and its significant points.
I’ll eventually discuss here how this relates to everything from superstring theory and steampunk to the resurrection of Jesus and ancient aliens. No, seriously. I get them in there. It makes sense. Trust me. So buckle in and start the slow ride…
Introducing the Research
The article is by Efraim Wallach, titled “Bayesian Representation of a Prolonged Archaeological Debate,” in Synthese 195.1 (January 2018): 401-31. The abstract reads:
This article examines the effect of material evidence upon historiographic hypotheses. Through a series of successive Bayesian conditionalizations, I analyze the extended competition among several hypotheses that offered different accounts of the transition between the Bronze Age and the Iron Age in Palestine and in particular to the “emergence of Israel.” The model reconstructs, with low sensitivity to initial assumptions, the actual outcomes including a complete alteration of the scientific consensus. Several known issues of Bayesian confirmation, including the problem of old evidence, the introduction and confirmation of novel theories and the sensitivity of convergence to uncertain and disputed evidence are discussed in relation to the model’s result and the actual historical process. The most important result is that convergence of probabilities and of scientific opinion is indeed possible when advocates of rival hypotheses hold similar judgment about the factual content of evidence, even if they differ sharply in their historiographic interpretation. This speaks against the contention that understanding of present remains is so irrevocably biased by theoretical and cultural presumptions as to make an objective assessment impossible.
His last point is that progress toward reversing the consensus is possible (albeit slow) as long as the relevant consensus group does not disagree on the bare facts (and only disagrees on their interpretation). For example, how we interpret Paul’s two references to “Brothers of the Lord” is a disagreement of the second kind; almost everyone in that debate agrees at least on the bare facts (that the passages exist, and there are only the two; their formation in Greek; their context; and so on). However, some people in that debate have the wrong facts, for instance insisting Paul uses the phrase “Brothers in the Lord” for cult kinship rather than biological; Paul never used that phrase, anywhere (OHJ, n. 94, pp. 584-85). That would be a disagreement of the first kind, on the bare facts. Likewise, people who claim eyewitnesses wrote the Gospels, or that we “have” Aramaic sources from first century Palestine attesting to a historical Jesus (like, incredibly, Bart Ehrman), are disagreeing on the bare facts.
Take note of this. Because the scholars who endeavor to lie about the facts (such as what the facts are generally, or what evidence and arguments I make in my book) are thereby establishing themselves as no longer valid members of any credible consensus group. Only agreement on the facts can establish reliable consensus-based knowledge. Interpretations may continue to vary, but that then becomes a function of how much time someone spends actually examining the arguments on either side of different interpretations, and how many peers switch sides, thus creating a bandwagon effect as holdouts start to move once a certain percentage of their peers move before them. Which is also why it’s so important for historicity defenders to threaten anyone who changes their opinion on this with open ridicule, loss of social status, or the destruction of their careers. They need to stem the tide of a bandwagon effect. As happened for Moses (both the attempts to stem that tide, and their eventual failure).
Wallach also found some experts arguing my thesis that historical methods are really Bayesian, besides myself and Aviezer Tucker. In particular Merrilee Salmon, in “‘Deductive’ versus ‘Inductive’ Archaeology,” American Antiquity 41 (1976): 376-81, further developed in her book Philosophy and Archaeology (1982); and Alison Wylie, in “Explaining Confirmation Practice,” Philosophy of Science 55.2 (June 1988): 292-303, influencing her subsequent arguments on archaeological methodology in Thinking from Things (2002). I had already noted in Proving History (n. 8, ch. 3) that the use of Bayesian methods was already beginning in that branch of history (archaeology), citing the oft-employed textbook by Caitlin Buck, William Cavanagh, and Clifford Litton, Bayesian Approach to Interpreting Archaeological Data (1996); and the test article by W.G. Cavanagh et al., “Empirical Bayesian Methods for Archaeological Survey Data: An Application from the Mesa Verde Region,” in American Antiquity 72.2 (April 2007): 241-72. I also noted some slightly clumsy attempts were being made to apply Bayesian reasoning in manuscript and textual studies (n. 7), e.g. Winsome Munro, “Interpolation in the Epistles: Weighing Probability,” New Testament Studies 36 (1990): 431-43; and James Albertson, “An Application of Mathematical Probability to Manuscript Discoveries,” Journal of Biblical Literature 78 (1959): 133-41.
Wallach on Bayesian Reasoning
Wallach models two generations of the debate over whether, basically, the Bible is right that the Israelites came from Egypt and conquered Palestine, or whether in fact the Israelites were just another native Palestinian tribe of Canaanites who absorbed surrounding tribes and then later invented the myth of their conquering from outside. Some alternative theories to those came and went over the course of the 20th century, which Wallach also includes. The observed phenomenon is that the field started convinced of the Biblical hypothesis, and ended up convinced of the Native hypothesis instead. Wallach asks: Is the process that the experts in that field underwent in that period Bayesian? He finds the answer is yes.
He points out the significance of this to Bayesian modeling of all historical argument, showing that it actually is a correct account, and that objections to it fail—Bayesianism does not require excess formalism or precision, and is not hamstrung by the role of theory-laden reasoning or subjectivity in assessing evidence (two points he cites me as having also made in Proving History). As Wallach says (my emphasis):
I construct the interplay between historiographic hypotheses and archaeological evidence in a simplified Bayesian model of successive conditionalization. Under the model’s fairly moderate assumptions, the actual two-generation process that led to a profound change in the beliefs of a disciplinary community can be reconstructed.
[And]
On the basis of this case study I argue that the relativist opinions are unwarranted. I demonstrate, in particular, that posterior probabilities of historiographic hypotheses can converge even if they start from deep and entrenched discord about prior assumptions.
This refutes the claim that prior assumptions predetermine the outcome, and the claim that Bayesian reasoning requires excessive precision (or indeed any precision at all). In actual fact, imprecision can be perfectly well modeled by Bayesian reasoning, thus creating no difference between Bayesian reasoning and any other form of credible historical reasoning. Except one thing: with Bayesian reasoning, the inferences you are relying on are exposed to criticism and check.
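To see what that convergence from clashing priors looks like in miniature, here is a toy sketch of my own (the numbers are invented, not Wallach’s): two researchers who start with wildly different priors, but agree on the likelihood of each item of evidence, end up in the same place after successive conditionalization.

```python
# Toy sketch (invented numbers): two researchers start with very different
# priors for hypothesis H but agree on how likely each item of evidence is
# under H versus not-H. Successive conditionalization drives their
# posteriors together.
def update(prob_h, likelihood_h, likelihood_not_h):
    """One Bayesian step: P(H|E) from P(H), P(E|H), and P(E|~H)."""
    numerator = prob_h * likelihood_h
    return numerator / (numerator + (1 - prob_h) * likelihood_not_h)

# Agreed likelihoods for each item of evidence: (P(E|H), P(E|~H)).
evidence = [(0.8, 0.2), (0.7, 0.3), (0.9, 0.3), (0.8, 0.2), (0.6, 0.2),
            (0.8, 0.4), (0.9, 0.3), (0.7, 0.2), (0.8, 0.2), (0.9, 0.45)]

optimist, skeptic = 0.9, 0.01  # wildly different starting priors for H
for lh, lnh in evidence:
    optimist = update(optimist, lh, lnh)
    skeptic = update(skeptic, lh, lnh)

# Both end near certainty despite the hundredfold gap in where they started.
print(f"optimist ends at {optimist:.3f}, skeptic ends at {skeptic:.3f}")
```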
Standard reasoning is vague as to how or why conclusions follow from premises, or even what the conclusions are (certainty? likelihood? what likelihood?), shielding them from analysis. You can’t fact-check a historian’s reasoning, if they can’t even tell you what their reasoning is. It’s like trying to guess at the fuel efficiency of a car without getting to pop the hood and see what the engine looks like. All Bayesian reasoning does is force you to pop the hood and show everyone what the engine looks like. You can’t hide any mistakes then. If you are relying on an invalid inference, it will be immediately exposed. If you are erring in your demarcation of evidence (putting facts in that don’t belong, or leaving facts out that do), it will become obvious. Otherwise, Bayesian reasoning literally is standard reasoning: it’s what historians are already always doing. They just don’t know it. They have no idea what the engine looks like or if it’s well built. They just drive the car and assume it works.
Wallach uses his study to illustrate ways to solve several popular Bayesian problems: the problem of novel theories, the problem of old evidence, the problem of disputed evidence, and the problem of surprising evidence or diverse evidence. His paper is of interest even just to anyone interested in any of those problems. By using a real-world example, he gets to why certain solutions to those problems work and others don’t. His paper is thus also interesting even to anyone interested in that event in the historiography of the Bible: how, and why, the consensus so radically changed on where the Israelites came from.
Bayesian Formalism
Section 1 of Wallach’s paper introduces the issues and previews the conclusion. Section 2 is “a minimal introduction to Bayesian formalism” as needed to understand the rest of his paper. Section 3 “delineates both the hypotheses about the Bronze/Iron Age transition in Palestine that were advanced in the 20th century and the relevant archaeological findings” and Section 4 “describes the methodology employed in the model.” Section 5 “comprises three Bayesian simulations, each one a ‘contest’ between pairs of hypotheses, together with sensitivity checks to the model’s assumptions.” Section 6 “discusses the results of these simulations” and Section 7 “concludes with some general remarks.”
His discussion of the formalism is I believe too terrifyingly formalized for pretty much any humanities major to stomach. But it is essentially right. He covers how Bayesian modeling works, and what shortcuts reality tends to require in its use, and what debates occupy even Bayesians about modeling. I don’t think such high formalism is needed in history, but it is needed in papers like Wallach’s, because he needs to demonstrate it works even under such rigorous formalism. You have to build an actual engine, and prove it works, before you can trust mere sketches of the engine. But historians can rely on sketches, now confident the engine is real and functions. The sketch just captures and includes the high formalism behind it. (For what a sketch, or “low formalism,” looks like, see my article What Is Bayes’ Theorem & How Do You Use It?. My book OHJ then provides an extended example. I sketch some others in Proving History.)
Objective or Subjective Bayesianism
Wallach confronts the dispute between Objective and Subjective Bayesians and takes the subjective position, simply because he will be using a variety of starting assumptions to test how sensitive the conclusion is to variations in those assumptions. And that works the same way no matter whether you are building your probabilities objectively or subjectively. So the distinction hardly matters for what he aims to accomplish and what he concludes. But there is another reason the distinction doesn’t matter: as I show in chapter six of Proving History, all that Subjective Bayesians are really saying is that in most cases we can only approximate objective probabilities; but the more data we get, the more our subjective estimates will approach the objective reality.
For example, Wallach says Objective Bayesians “state that priors should reflect real-world probabilities, pointing out that convergence under a finite series of evidence is not assured if scientists are allowed to assign wildly arbitrary priors.” But of course, Subjective Bayesians do not and cannot honestly “assign wildly arbitrary priors.” That’s a straw man. Of course, everyone is always assuming some sort of prior probabilities, in every argument they make, to any empirical conclusion whatever, no matter how insistent they are that they are not. (See my discussion in If You Learn Nothing Else.) But more importantly, priors are always highly constrained by objective evidence. (See my discussion of “arbitrary priors” in my article Two Bayesian Fallacies.)
It’s just that, in history, the scale of uncertainty is too high for the comfort of, say, physicists, who stare in horror at not having vast reams of data to work with. Thus, subjective estimates in history are constrained by objective facts; they just aren’t as constrained by objective facts as probabilities in the hard sciences are, because there just aren’t as many facts to work from. The end result is really just wide margins of error: wider than would be tolerated by a science journal, but narrower than would warrant claiming the result is “wildly arbitrary.” Welcome to the field of history. We’ve been comfortable with this fact for centuries. (See my article History as a Science.)
For example, Wallach laments that it’s “difficult to see how the objective probability of…string theory being true should be determined, let alone that of complex nonquantitative hypotheses” like how the Israelites came to exist. It actually isn’t all that hard…as Wallach himself shows by how he decides on priors for those “complex nonquantitative hypotheses” he’s testing. He allows wide margins of error, and constrains his choices by appeal to objective facts, beyond which probabilities outside his selected range would be obviously unbelievable, rather like assigning a “33% probability” to a missing body having gone missing by being reanimated from the dead. Were that true, then a third of all bodies that have ever in history gone missing were reanimated from the dead. It’s objectively obvious why that’s impossible. It’s quite clear how objective facts do constrain what probability assignments we can deem within the realm of credibility. Even when those assignments are subjective.
A subjective assignment of probability would be, yes, a measure of how likely things seem or feel to us, in lieu of drawing that probability from an objective data set (e.g. a complete list of all missing bodies in the whole of human history and how they went missing; which data is not available, at least not in such precise terms). But what we are doing when we estimate in that way is trying to guess what that objective probability most likely would be if we had access to the requisite data. We are thus, in fact, actually just trying to figure out the objective probability. We just don’t have good data to do that with. And to account for that, we allow very wide margins of error. But not so wide as to be absurd (like “the prior probability a missing body was reanimated from the dead is 0-to-30%”). Reality does constrain us (a third of all missing bodies, clearly were not reanimated; we’d have noticed by now).
So if someone were to ask (as Sheldon Cooper is actually doing in the screen capture I lead my What Is Bayes’ Theorem article with) what the prior probability is that some form of string theory is true, we actually do have some objective facts to estimate that from; they are just highly uncertain. If you were to poll the opinions of qualified physicists (those who actually study string theory or its competitors, or work in fundamental theory enough to have well informed opinions on the debate), and crowdsource a probability estimate therefrom (by averaging the answers), you’d end up with a prior probability that’s probably as close to the actual value as any available facts allow. This wouldn’t be arbitrary, much less “wildly.” It would reflect the combined knowledge of thousands of relevant experts, and an emergent measure of how well string theory performs and is likely to perform in explaining reality. So, too, if one could have done that for the theories Wallach is testing. He can’t, though, since nearly everyone he’d need to poll is now dead; and they would have polled differently even at different times of their lives. What he does instead is “get a sense” from the available data of how such polls most likely would have gone, and to account for uncertainty, he builds in a wide but plausible margin of error.
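For illustration only, here is a minimal sketch (the poll numbers are invented, and this is my gloss, not Wallach’s procedure) of crowdsourcing a prior by averaging expert estimates and keeping a wide margin of error:

```python
# Minimal sketch (hypothetical numbers): crowdsource a prior probability
# from an expert poll by averaging the answers, and report a wide margin
# of error reflecting the spread of opinion.
poll = [0.05, 0.10, 0.20, 0.15, 0.02, 0.30, 0.08]  # invented expert estimates

prior = sum(poll) / len(poll)     # simple average as the point estimate
low, high = min(poll), max(poll)  # crude margin of error from the spread

print(f"crowdsourced prior = {prior:.2f} (plausible range {low:.2f} to {high:.2f})")
```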
That’s just Objective Bayesianism with less data. And IMO, that’s all Subjective Bayesianism ever is. (When honest; crank Bayesianism is also a thing; no different than crank logic or crank science or crank statistics or anything else pretending to be real but really just a scam.)
On Competing Models in Israelite Studies
Wallach models the fates and confidence in four hypotheses: the Conquest Hypothesis (anything at least loosely matching the Bible), the Immigration Hypothesis (matching the Bible sans war), the Revolt Hypothesis (matching the Bible sans immigration), and the Autochthonic Hypothesis (the Israelites were always there, and just culturally evolved over time). The last of these is now the mainstream consensus. The first of these was the mainstream consensus a century ago. The transition in consensus went from the first, through diverse attempts at the second and third, to the last, in the end a complete reversal of view.
The Autochthonic Hypothesis Wallach correctly outlines as follows:
Indigenous inhabitants of Late-Bronze Canaan were the major origin of the [proto-]Israelite population. The demographic source of this population were local agrarian residents (Dever 1998, 2003) or “internal nomads” that existed in Palestine for hundreds of years as a part of a dimorphic society (Finkelstein 1995, 1998a) and not outsiders who penetrated the land either by military conquest or by peaceful migration. They were driven to settle in the hill country of Palestine by the instability of the Late Bronze Age that followed the weakening and eventual withdrawal of Egyptian rule during the 12th century B.C.E. Rather than a revolt against the established order this process was a reaction to its disappearance. The coalescence of this diverse population was prolonged and gradual, and a national identity with more-or-less shared narratives did not materialize until later in the Iron Age.
He thus has a defined set of hypotheses properly demarcated, produces a reasonable range of estimates for their priors, and isolates 24 items of evidence “that span the period between the late 1920s and the late 1980s and had pronounced influence upon the debate among” the four competing hypotheses, and tests out their effects with various Bayesian models to show how their actual impact tracks Bayesian reasoning (though the historians and archaeologists exhibiting this behavior of course were not aware of this).
Wallach notably points out the Bible itself had no real effect. As the historiography of this debate shows, it could be interpreted in ways that would support any of the hypotheses. Certainly there were “experts” who wanted the Bible to be literally true. But there were also experts who wanted to genuinely test what the Bible says against material evidence, and when the two conflicted, the obvious solution was to reinterpret the Bible (as either lying, mythologizing, fictionalizing, fossilizing error, or written nonliterally, among various other ways to make the text fit the facts). This was essentially due to a singular defect of the Bible: nothing in it was written by anyone actually alive when anything it says happened (in the relevant period of time). In fact, all relevant content was composed centuries afterward, by authors with no known skills in critical historiography, no known sources, and an obvious propagandist motivation. That’s literally the worst source to have in the whole field of history.
On the Evidence & Its Effects
“As to archaeological evidence,” Wallach says, “it is by nature fragmentary and its interpretation is theory-laden no less than in any other scientific discipline.” But, he finds, the evidence in this case confirms the assertion of Alison Wylie that “the strategies archaeologists developed for exploiting a range of background knowledge can be very effective in establishing a network of evidential constraints.” Key to this was agreement on what the evidence was, even if disagreement persisted as to which hypothesis it supported.
As Wallach puts it:
[M]ost of the evidential results…were undisputed, not only in their “low-level” content—that such and such material remains were found at a certain site—but also in their “middle-level” aspect—“what was here when”: [e.g.] That a particular site was destroyed at a certain time, that another one was uninhabited throughout a specific period, etc.
There were some exceptions, which he analyzes. Not just in that some of the evidence changed (e.g. by being redated), which had an impact on theory choice, but also in that some scholars disagreed over the facts themselves (e.g. what the actual date of a particular find was). He shows the effect of this on the various models. He constructs “three diachronic, contrastive simulations that attempt to reconstruct in Bayesian terms the scholarly debate about what happened in the Bronze/Iron Age transition in Palestine.” These “simulations compare pairwise the Conquest hypothesis to the Immigration hypothesis, then the Immigration hypothesis to the Revolt hypothesis and again the Immigration hypothesis to the Autochthonic hypothesis.”
Moreover, “each simulation begins with the assumption that the degree of belief in the ‘incumbent’ (the current dominant) hypothesis is one hundred times stronger than that in the ‘usurper’ one, an assumption the sensitivity to which is subsequently examined.” In other words, he assumes the consensus position (the Bible Is Right) started with prior odds of effectively 100 to 1 in its favor, and shows how the evidence overwhelmed even that extreme level of confidence and reversed the consensus altogether. This is not surprising, as material evidence is very powerful in any probability matrix. Imagine if, for example, we uncovered a Christian burial conclusively dated to the 50s A.D. containing a lead tablet pronouncing the beliefs of the deceased, and it declared the archangel Jesus to have been slain in the sky by Satan. The impact of this on historicity would be catastrophic.
Alas, unlike the question of how the Israelites came to exist, for which a huge array of material evidence was left for us to examine, the actual truth of Christianity in its first generation is almost completely lost to us (even the letters of Paul are highly compromised, and already decades late). It’s therefore unlikely to follow such a model. Nor can it claim such high priors. Because even historicity lacks any comparable material evidence in its support. The consensus on his historicity is actually demonstrably malformed: it is not based on any principled or disciplined survey of the evidence (see Chapters 1 and 5 of Proving History and Chapters 2 and 7 of OHJ). And there has not been, in fact, any peer reviewed defense of historicity published in almost a hundred years.
This is why the historicity of Jesus hovers around probabilities of high uncertainty. My analysis in OHJ ends up with a best chance Jesus existed of 1 in 3, for example. Which is still a respectably high probability for historicity. Even a single weak item of evidence (something, say, six times more likely on historicity than myth) could reverse it. Because strong evidence means a factor of 10 or 100 or 1000 times more likely, or even a million. Consider the kind of evidence we have for other historical figures, from Hannibal to Julius Caesar, even Socrates, for comparison—my article on Spartacus directs you to all of those and more. We just don’t have any kind of evidence like that for Jesus (pro or con).
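To make that arithmetic explicit, here is a quick sketch using just the numbers above (1 in 3, and a single Bayes factor of 6):

```python
# Sketch using the numbers above: a 1 in 3 probability of historicity is
# odds of 1 to 2; a single item of evidence six times likelier on
# historicity than on myth flips that to 3 to 1 in favor, i.e. 75%.
prior_prob = 1 / 3
prior_odds = prior_prob / (1 - prior_prob)  # = 0.5, i.e. 1 to 2

bayes_factor = 6
posterior_odds = prior_odds * bayes_factor  # = 3, i.e. 3 to 1
posterior_prob = posterior_odds / (1 + posterior_odds)

print(f"posterior probability of historicity = {posterior_prob:.2f}")  # 0.75
```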
Wallach maxes his scale out at 10 to 1 for “strong” evidence, and then uses 3 to 1 for “substantial evidence,” and 1.5 to 1 for weak evidence (what he calls “anecdotal” evidence). That’s an extremely conservative scale, so all his conclusions follow, IMO, a fortiori (see that term in the index of Proving History). Most of the evidence in his examined case, he concludes, doesn’t rate as “strong”; but some does. It’s the cumulative effect of it all that became convincing and reversed the consensus.
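Here is a toy simulation of that cumulative effect (my own illustration: only the 100-to-1 starting odds and the 10 / 3 / 1.5 scale come from Wallach; the particular sequence of Bayes factors is invented):

```python
# Toy sketch: the incumbent hypothesis starts at 100-to-1 odds over a rival;
# a stream of evidence scored on a conservative 10 / 3 / 1.5 scale (the
# particular values here are invented) gradually reverses the odds.
incumbent_odds = 100.0  # odds of incumbent over usurper at the start

# Invented Bayes factors: values below 1 favor the usurper, above 1 the incumbent.
evidence = [1/3, 1/1.5, 1/10, 1/3, 1.5, 1/3, 1/10, 1/1.5, 1/3, 1/3]

for i, bf in enumerate(evidence, 1):
    incumbent_odds *= bf
    print(f"after item {i:2d}: odds for incumbent = {incumbent_odds:.3f}")

# The odds end well below 1: the "usurper" is now favored despite the
# heavily lopsided starting point, and mostly on modest evidence.
```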
Notably, Wallach found something interesting in the literature. “I try to take cues from the way the scholars themselves reacted to the data,” he says. “For example, when a discovery was followed by a flurry of auxiliary (often rather ad hoc) assumptions to reconcile it with one’s favorite hypothesis, I assume it to be strong evidence against that hypothesis.” Which is reasonable. Whereas he found that scholars promoting such evidence would greatly overrate it:
It is worth noting that researchers’ own appreciations of the evidential strength of their findings were often much stronger. Only rarely would a scholar say that his results “increase (diminish) the plausibility of hypothesis X by some degree.” Expressions like “there can be no doubt that this destruction was the deed of the Israelite tribes” (Yadin 1965) or “there is not the slightest doubt that we are now witnessing the beginning of the settlement of the Israelite tribes in the Negev” (Aharoni 1976) (italics mine) are much more frequent. But as we shall see, such strong claims were not always corroborated by later discoveries.
This leads to a very important lesson for historians: we must check ourselves against excessive certainty. I do this in OHJ by setting as extreme a boundary against my own conclusions as is reasonably possible, thus producing the a fortiori result of 1 in 3 for Jesus, in contrast to my own unchecked judgment of 1 in 12,000. I believe only the result of 1 in 3 is confidently defensible to critics. The lower bound of 1 in 12,000 is plausible, but not confidently knowable. In effect, I recognize the margin of error is large. The true probability is somewhere in between 1 in 12,000 and 1 in 3 and we cannot know where in between; which means even I cannot claim to know the probability is less than 1 in 3, only that it is not reasonable to say it’s higher than 1 in 3. Historicists need to adopt the same humility in the face of the extremely ambiguous and problematic evidence we are stuck with for Jesus.
Wallach also runs sensitivity checks against even his own models, seeing what happens when you move the priors or the valuations of the evidence by as much as a factor of ten. He thus discovers what it would take to have gotten a different result, which turns out to require assumptions and estimates so patently absurd that no one committed to any kind of objectivity could possibly agree with them. This demonstrates that even subjective estimates of probability are constrained by reality, and thus are not in fact “wildly arbitrary.”
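A rough sketch of what such a sensitivity check amounts to (again with invented numbers, not Wallach’s actual inputs):

```python
# Minimal sketch of a sensitivity check (invented numbers): perturb the
# starting odds and every Bayes factor in the direction most favorable to
# the incumbent hypothesis, then see whether the final verdict changes.
evidence = [1/3, 1/10, 1/3, 1/1.5, 1/3, 1/10, 1/3, 1/3, 1/10, 1/3]

def final_odds(prior_odds, bayes_factors):
    """Multiply out the odds form of Bayes' theorem over all the evidence."""
    odds = prior_odds
    for bf in bayes_factors:
        odds *= bf
    return odds

baseline = final_odds(100.0, evidence)
# Skewed run: prior made ten times more favorable to the incumbent, and
# every item of evidence weakened (pushed toward a neutral factor of 1).
skewed = final_odds(1000.0, [bf * 2 for bf in evidence])

for name, odds in (("baseline", baseline), ("skewed", skewed)):
    verdict = "incumbent" if odds > 1 else "usurper"
    print(f"{name}: final odds {odds:.5f} -> favors the {verdict}")
# If both runs land on the same side of 1, the reversal is robust to these
# perturbations; if they disagree, you have learned what it would take to
# get a different result, and can judge whether those numbers are credible.
```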
The Problems of Old Evidence & New Theories
Wallach finds:
Using fairly moderate assumptions, a Bayesian model has been constructed for a two-generation process that led to a significant change in the distribution of beliefs within a scientific community. In the model as well as in the actual case, large differences in prior assumed probabilities were “washed out” by the flow of evidence.
That Bayes’ Theorem models what happened also means Bayes’ Theorem could have been used to monitor, understand, perfect, and check the process itself. Archaeologists are now increasingly agreeing with that conclusion (hence textbooks started espousing Bayesian reasoning in archaeology in the late 90s). Historians need to get on board. They will have to overcome their abject terror of math and formal logic first. But every profession that claims to generate accurate knowledge needs to do that. They cannot claim it is not their responsibility to understand the logic of their own arguments. And the logic of their own arguments is mathematical. Because all their arguments are over a probability. And probability is mathematics.
Wallach finds this case also provides examples of how the problem of old evidence is “resolved by considering the learning of the relevance of the ‘old’ information” as itself new information. It then gets incorporated into the model, updating the priors like any evidence should. I think Wallach gets a little more confused in trying to articulate how his case resolves the problem of novel theories. He’s right in his conclusion that it has something to do with allowing sub-theories to emerge from broader covering theories. But more simply, it’s just this: all theories are included in the competing hypothesis set. Including the ones we haven’t thought of yet. He mentions this earlier in the paper (if H is the Conquest Hypothesis, then ~H logically necessarily must contain every other theory possible, and not simply “The Immigration Hypothesis” proposed at the time). But Wallach seems to miss the significance of this point.
The Autochthonic Hypothesis was already sitting inside the “Immigration-or-Other” Hypothesis that was at the start of this process the ~H, the “what would be true” if H were false. Which must always be stated, or else H becomes semantically meaningless. In other words, H can only be intelligible as a hypothesis, if you can articulate what it would mean for it to be false. Which is why all correct empirical reasoning must be counter-factual: you are always only ever proving a theory true, by trying to prove its alternatives false. There is in fact no other way to prove a theory true.
But of course, there are lots of things that can be true if H is false, including things we never thought of before. For example, “the Jews were genetically engineered replicants planted in Palestine by extraterrestrials” is logically possible, and thus in fact included in ~H, even at the beginning of the 20th century. Despite, obviously, no one then ever even imagining such a thing, much less knowing it was possible. But that means it just had a vanishingly small prior—and that has of course remained the case. But if we were invaded by those aliens today and they showed us all their meticulous records of their Bronze Age mission in Palestine, the evidence then would overwhelm even that vanishingly small prior and it would become the consensus conclusion of what happened.
That new theory was always in ~H, and thus always a part of even Wallach’s model. It wouldn’t have been “added” from “outside,” as if in violation of Bayesian formalism, as he incorrectly claims. Bayesian formalism logically requires Wallach to have included that (and every other possible theory, known or unknown) in ~H; otherwise his model is literally invalid. It’s just that its prior was so low, and the evidence so non-existent, that he could safely ignore it, along with every other logical possibility too improbable to trouble himself with that I’m sure he never thought of, like “the Jews were a spontaneously generated mole people who ascended into Palestine from the core of the earth using steam-powered drill machines.”
All these theories are included in ~H. Always and forever. Even when we don’t know it. But our not knowing it is why their priors are so small they won’t even show up in the math, at the resolution used by Wallach. For example, Wallach never rounds any result to the nearest trillionth of a percent, so any theory residing in that zone of probability or below simply won’t show up in his model. And that being the case is no mystery. There really isn’t any problem to solve here.
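A trivial sketch of the point (all numbers invented): the partition always contains every alternative, but anything with a vanishingly small prior simply disappears at the model’s resolution.

```python
# Trivial sketch (invented numbers): ~H is a partition over every possible
# alternative, including theories nobody has thought of yet; anything with
# a vanishingly small prior simply never shows up at the model's resolution.
priors = {
    "Conquest":                 0.50,
    "Immigration":              0.30,
    "Revolt":                   0.10,
    "Autochthonic":             0.099999999,
    "alien replicants":         1e-12,
    "steam-drill mole people":  1e-15,
}

assert abs(sum(priors.values()) - 1.0) < 1e-6  # the partition exhausts all possibilities

for hypothesis, p in priors.items():
    # Rounded to three decimal places, the exotic alternatives read as zero,
    # which is why they can be safely ignored until evidence dramatically favors them.
    print(f"{hypothesis:25s} prior = {p:.3f}")
```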
The Power of Diverse Evidence
I’ve described the power of diverse evidence in Proving History (pp. 98-100). Wallach also says his case study argues that what matters is not so much the “diversity” of evidence as “its independence from the hypothesis being probed and from other evidential claims” and “the consilience of several independent lines of evidence.” I would argue the latter is precisely what is meant by diversity of evidence, but that Wallach is right about the methodological point. When independent lines of evidence converge on a common conclusion, that is more powerful evidence than any one line of evidence alone, and diversity of evidence is just one way this manifests in practice. The effect is simply the accumulation of evidence. Their independence simply makes each item of evidence measure more strongly. As we’d already expect. (See my discussion of how we account for dependent probabilities in A Bayesian Brief.)
For example, in the case of any Caesar’s existence, we have diverse, which really just means highly independent, evidence: at the very least, inscriptions, coins, busts, contemporary manuscripts, preserved eyewitness texts, and subsequent critical texts conveying access to eyewitnesses and documents now lost, and broader courses of historical events that Caesar’s actions are necessary to explain. If all we had were one of those, not the others—for instance, if we had tons and tons of inscriptions, but nothing else—as long as the missing evidence is not surprising (i.e. we fully expect it to be missing, for reasons unrelated to whether that Caesar existed), the conclusion would likely still be the same, or near enough. So diversity of evidence does not by itself increase the probability of a hypothesis. It’s simply our observation of what happens when a lot of independent evidence piles up. And that often happens with diversity, because being diverse, the evidence is often identifiably more independent. For example, inscriptions are unlikely to cause eyewitness texts or vice versa; at least that’s less likely than that all the inscriptions might be causally related. For instance, some singular Johnny Appleseed could have erected hundreds of inscriptions across the Empire praising a made-up person. But we already know that becomes increasingly unlikely, the more numerous and diverse the inscriptions are.
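The arithmetic behind that point, in a small sketch (invented numbers): treating correlated items as independent multiplies their Bayes factors and overstates the case, while recognizing the dependence collapses them toward a single factor.

```python
# Sketch (invented numbers): ten items of evidence that each favor H by a
# factor of 2 would, if genuinely independent, multiply into a large
# combined factor; if they all trace back to one common cause, the whole
# pile counts for little more than a single item.
items = [2.0] * 10  # each item individually favors H by a factor of 2

combined_if_independent = 1.0
for bf in items:
    combined_if_independent *= bf  # 2**10 = 1024

# Crude model of total dependence (one "Johnny Appleseed" behind them all):
combined_if_dependent = items[0]   # roughly the strength of one item

print(f"treated as independent: combined factor = {combined_if_independent:.0f}")
print(f"treated as one correlated batch: combined factor = {combined_if_dependent:.0f}")
```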
This was reflected in the Wallach study in how more and more settlement excavations simply didn’t turn up evidence matching the Conquest Hypothesis (some did, but not enough, and too incongruously), but instead turned up evidence matching the Autochthonic Hypothesis, which for a time was confused with the Immigration and Revolt Hypotheses. Those were two different ways to try to have it “both ways” (to get the Bible to be at least a little bit right, while still explaining the conflicting evidence from excavations). But as more and more settlement excavations (and other evidence) piled up, even those solutions came to be ruled out as unlikely, and the only credible hypothesis left was the Autochthonic. Here, it was mostly similar evidence (one excavation after another), but its sheer quantity became increasingly overwhelming. There wasn’t any plausible Johnny Appleseed to explain that away. Other than the obvious: the evidence is all the same, because the Israelites were always there.
The Problem of Changing Background Knowledge
Wallach makes another interesting observation of use to historians and Bayesians:
Background knowledge is often assumed to be static so that the term indicating it is sometimes omitted under the assumption that it is “built into” the probability distribution. … However, background knowledge is bound to change. Most case studies of Bayesian confirmation involve short periods of conditionalization, so a change in background knowledge may be considered negligible. A protracted conditionalization process complicates the matter, because earlier evidence becomes part of the knowledge base and can influence subjective likelihoods.
This is an important point. History differs from most of science in this respect (though it happens in science from time to time too). The debates and shifts in conclusion can take decades or even lifetimes, and over such spans, all manner of background knowledge can change. Think of how much changed when the Dead Sea Scrolls were discovered…and even then, it took decades for what was discovered there to (a) become accessible, (b) be analyzed by experts and written about, and (c) become widely known in the field. And that’s just one thing that changed our background knowledge considerably, practically rewriting everything we thought about first century Judaism.
History is also a vastly more complicated field than any science. And I don’t mean in the sense of being harder, but in the sense that the data you need to know is far too vast for any one person even to know. When you research something in history, you often have to go and “upload” a ton of background knowledge, stuff that exists “in the literature,” but that in fact probably few even know about (and in many cases, that no one knows about, everyone who collected and wrote it up having died). The average historian of Christianity has not read “the complete works of Philo,” for example, much less every journal article published in the last hundred years analyzing its contents. Yet all that knowledge exists, as the background knowledge of the field. And it will certainly influence probability judgments.
Thus, in practical reality, historians are constantly updating their background knowledge. It can change considerably even in the span of a single day—as you “read up” on relevant material you may have known existed but never had time to absorb, or even material you didn’t know existed until you needed to know if it did, and looked.
And not recognizing this can lead to folly. Look at how Larry Hurtado made huge mistakes, all by arrogantly not actually checking and uploading the correct background knowledge, thus skewing all his probability judgments. And look at the mistakes made by James McGrath and Bart Ehrman, all by simply assuming their background knowledge was complete, rather than having learned (as any real historian would have been taught in grad school) that they need to check before asserting things they haven’t checked before—like that only governments erected inscriptions in antiquity (which is false); or that angels were never called men (which is false); or that we don’t have any birth records or death certificates from antiquity (which is false); or that prefects were often in fact hired as procurators and Tacitus had a particular bee in his bonnet about that, which radically changes how we understand the passage on Pilate in Tacitus (though that’s super obscure background knowledge, so it’s easy not to know that).
So we do need to be more flexible in how we allow changes in background knowledge to update our probability judgments. In history, it’s just easier to rebuild the model, with the new background knowledge now in place and re-informing all our probability judgments. Though formally, you could do it by inputting the new knowledge as “new evidence” (being newly “discovered” by the individual historian), item by item, and updating all your priors with how each new item of background knowledge changes everything. But that would be extraordinarily and unnecessarily complicated, and being overly complicated (especially for a humanities major), easy to screw up. The other method is easier. Not only easier to do, but easier to check for error.
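For what it’s worth, the two routes are formally equivalent when the new items are independent of each other; a toy sketch (invented numbers):

```python
# Toy sketch (invented numbers): folding new background knowledge in item
# by item versus rebuilding the model once with everything in place yields
# the same posterior odds, assuming the items are independent.
prior_odds = 0.5                 # starting odds for some hypothesis H
new_knowledge = [2.0, 0.8, 3.0]  # Bayes factors of newly "discovered" background items

# Route 1: sequential updating, one item at a time.
odds_sequential = prior_odds
for bf in new_knowledge:
    odds_sequential *= bf

# Route 2: rebuild the model once, with all the new knowledge folded in together.
combined = 1.0
for bf in new_knowledge:
    combined *= bf
odds_rebuilt = prior_odds * combined

print(f"{odds_sequential:.2f} {odds_rebuilt:.2f}")  # both 2.40: the routes agree
```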
Conclusion
Wallach’s paper is an interesting demonstration of how to apply Bayesian reasoning to historical problems (and a famous one in Biblical studies at that), and of what we can learn (from modeling existing histories of change in historical opinion) about how a more overt Bayesian process of reasoning actually should operate in practice, solving a number of commonly cited problems. It’s a good read. With a handy bibliography. I recommend it to anyone interested in studying any of these questions.
Dr. Carrier, as a historian, what are your thoughts on the difficulties future historians may have studying our own times, when sources such as correspondence, and records are increasingly becoming digital?
I haven’t thought enough about that, since it isn’t my field. But there are concerns. Increasingly less is being committed to print. Consequently, any catastrophe that results in the loss of the digital record will wipe out a lot of history (though not all of it, as there are still print sources, and equivalent, e.g. microfilm). But that’s not unlike ancient history, where rot wiped out most of its record (vast archives of books and documents rotted away, or burned in accidental fires, etc.). There are projects to store digital data in permanent form (micro-etched metal plates, for example), but I don’t know how widely implemented they are or what’s being stored in them.
“Which is why all correct empirical reasoning must be counter-factual: you are always only ever proving a theory true, by trying to prove its alternatives false (and failing). There is in fact no other way to prove a theory true.”
Is this correct? Shouldn’t it be:
“… you are always only ever proving a theory true, by proving its alternatives false.”
or
“… you are always only ever proving a theory true, by trying to prove it false (and failing).”
If you fail to prove a theory’s alternatives false, doesn’t that mean that you HAVEN’T proven your theory true?
Or am I misunderstanding you?
(I presume this is a typo/editing error, and if so, you can delete this comment; it doesn’t add anything to the discussion if that’s the case.)
Yes! I had two versions of the sentence and they ended up merged and I didn’t notice. Thank you! Fixing…
(And leaving the comment so people who had read the original will know it had an error that was repaired.)
Hi Richard, Congratulations on a very enjoyable article!
A couple of tangential questions:
Armed with the knowledge that the mainstream view has shifted so strongly in the last century, I was curious to see how the public perceives Old Testament historicity. Wikipedia seems in line with the academic literature, but a cursory examination of web hits and YouTube videos shows that most public articles and videos on the topic are apologetic, attempting to refute or ignoring the mainstream view. Has the mainstream view not percolated to the believers? Given how the mainstream view gets foregrounded when it comes to Jesus historicity, it seems like especially motivated reasoning to ignore the mainstream view for Old Testament historicity.
As a mathematician, I wanted to learn more about the specifics of the Bayesian analysis of Wallach. However, that article is not public. (This, presumably, skews the search results I was seeing.) Are there public versions of Wallach’s analysis, beyond your excellent article?
Thanks,
Daniel
No. Sadly. This is a problem with the world of academia, IMO. Academic publishing is for-profit, and no longer sees itself as existing to serve the public good. So publishers take work from content creators (like Wallach), paying them nothing. Then turn around and sell it for outrageous prices. And ban access to knowledge to all but the wealthy. I’m not a fan of their value system. But it’s what we’re stuck with.
You can bypass this by going to a public library. That dying institution that actually still exists for the public good and not for profit. They can usually get any article for you for free, or a very small fee (a few dollars at most), through interlibrary loan. Often even in PDF form emailed to you. But if not that, then xerox.
If you have a university near you, sometimes their libraries are open to the public (and you can make your own xerox copy or download a PDF to a flash drive). Also look for seminaries and theological unions. Their libraries are even more often open to the public. And they might carry Synthese (physically or electronically). You can always check ahead (calling their reference desk or finding and checking their online catalog).
On your bigger question. My impression is the same as yours. Ehrman provides the best explanation (in respect to NT studies, but IMO OT is no different) in the first chapter or so of Jesus, Interrupted. No church has any interest in communicating the truth to its congregations, and every interest in preventing their congregations from learning it. Consequently, an entire industry exists to disinform and confuse the public. People need to learn how to vet the trustworthiness of an authority—and it’s not “the authorities that say what I want or defend the things I value,” but “the authorities who, when I fact check them, turn out to be telling the truth even when it’s uncomfortable.”
But that’s a whole other challenge. Getting people to reason.
If you have access to ResearchGate.net you can request a copy of the article from Wallach. I did so and he sent it next morning, although it’s the pre-print version. Seems to have all the info though.
Richard
I already had that. Thanks. I also have the post print edition now, thanks to a colleague.
In your opinion, do you think that other areas of study are lacking a solid mathematical principle, such as economics, sociology, or psychology as examples, or do you think that they work well the way they are?
They are all evolving and increasing their mathematization. The examples you give are full of experts fully comfortable with mathematics and mathematical modeling. So they are nowhere near analogous to where history is as a field.
Update: Another book has explored applications of Bayesian reasoning now in New Testament studies, and its author has written a really handy article defending and explaining the role Bayes’ Theorem should play from now on in all “criteria driven” reasoning about texts in history.
See “What Bayesian Reasoning Can and Can’t Do for Biblical Research” (27 March 2019) by Christoph Heilig.
Great quote therefrom: