An interesting article has been published under peer review, which tests my concept (developed and argued in Proving History) that all historical reasoning is already in fact Bayesian (historians just don’t know it), by applying it to the analysis of a major challenge to the consensus that won out in Biblical studies, regarding the origins of the Israelites—the mainstream consensus completely flipping, albeit after decades of debate. The model may fit the same paradigm shift on the historicity of Moses (which is directly related to what this article looks at). And so it might fit the future of Jesus mythicism, now that that has been defended under peer review (for the first time, in 2014). So I’ll summarize this article and its significant points.

I’ll eventually discuss here how this relates to everything from superstring theory and steampunk to the resurrection of Jesus and ancient aliens. No, seriously. I get them in there. It makes sense. Trust me. So buckle in and start the slow ride…

Introducing the Research

The article is by Efraim Wallach, titled “Bayesian Representation of a Prolonged Archaeological Debate,” in Synthese 195.1 (January 2018): 401-31. The abstract reads:

This article examines the effect of material evidence upon historiographic hypotheses. Through a series of successive Bayesian conditionalizations, I analyze the extended competition among several hypotheses that offered different accounts of the transition between the Bronze Age and the Iron Age in Palestine and in particular to the “emergence of Israel.” The model reconstructs, with low sensitivity to initial assumptions, the actual outcomes including a complete alteration of the scientific consensus. Several known issues of Bayesian confirmation, including the problem of old evidence, the introduction and confirmation of novel theories and the sensitivity of convergence to uncertain and disputed evidence are discussed in relation to the model’s result and the actual historical process. The most important result is that convergence of probabilities and of scientific opinion is indeed possible when advocates of rival hypotheses hold similar judgment about the factual content of evidence, even if they differ sharply in their historiographic interpretation. This speaks against the contention that understanding of present remains is so irrevocably biased by theoretical and cultural presumptions as to make an objective assessment impossible.

His last point is that progress toward reversing the consensus is possible (albeit slow) as long as the relevant consensus group does not disagree on the bare facts (and only disagrees on their interpretation). For example, how we interpret Paul’s two references to “Brothers of the Lord” is a disagreement of the second kind; almost everyone in that debate agrees at least on the bare facts (that the passages exist, and there are only the two; their formation in Greek; their context; and so on). However, some people in that debate have the wrong facts, for instance insisting Paul uses the phrase “Brothers in the Lord” for cult kinship rather than biological; Paul never used that phrase, anywhere (OHJ, n. 94, pp. 584-85). That would be a disagreement of the first kind, on the bare facts. Likewise people who claim eyewitnesses wrote the Gospels, or that we “have” Aramaic sources from first century Palestine attesting to a historical Jesus (like, incredibly, Bart Ehrman).

Take note of this. Because the scholars who endeavor to lie about the facts (such as what the facts are generally, or what evidence and arguments I make in my book) are thereby establishing themselves as no longer valid members of any credible consensus group. Only agreement on the facts can establish reliable consensus-based knowledge. Interpretations may continue to vary, but that then becomes a function of how much time someone spends actually examining the arguments on either side of different interpretations, and how many peers switch sides, thus creating a bandwagon effect as holdouts start to move once a certain percentage of their peers move before them. Which is also why it’s so important for historicity defenders to try and threaten anyone who changes their opinion on this with open ridicule, loss of social status, or the destruction of their careers. They need to stem the tide of a bandwagon effect. As happened for Moses (both the attempts to stem that tide, and their eventual failure).

Wallach also found some experts arguing my thesis that historical methods are really Bayesian, besides myself and Aviezer Tucker. In particular Merrilee Salmon, in “‘Deductive’ versus ‘Inductive’ Archaeology,” American Antiquity 41 (1976): 376-81, further developed in her book Philosophy and Archeology (1982); and Alison Wylie, in “Explaining Confirmation Practice,” Philosophy of Science 55.2 (June 1988): 292-303, influencing her subsequent arguments on archeological methodology in Thinking from Things (2002). I had already noted in Proving History (n. 8, ch. 3) that the use of Bayesian methods was already beginning in that branch of history (archaeology), citing the oft-employed textbook by Caitlin Buck, William Cavanagh, and Clifford Litton, Bayesian Approach to Interpreting Archaeological Data (1996); and the test article by W.G. Cavanagh et al., “Empirical Bayesian Methods for Archaeological Survey Data: An Application from the Mesa Verde Region,” in American Antiquity 72.2 (April 2007): 241-72. I also noted some slightly clumsy attempts were being made to apply Bayesian reasoning in manuscript and textual studies (n. 7), e.g. Winsome Munro, “Interpolation in the Epistles: Weighing Probability,” New Testament Studies 36 (1990): 431-43; and James Albertson, “An Application of Mathematical Probability to Manuscript Discoveries,” Journal of Biblical Literature 78 (1959): 133-41.

Wallach on Bayesian Reasoning

Wallach models two generations of the debate over whether, basically, the Bible is right that the Israelites came from Egypt and conquered Palestine, or whether in fact the Israelites were just another native Palestinian tribe of Canaanites who absorbed surrounding tribes and then later invented the myth of their conquering from outside. Some alternative theories to those came and went over the course of the 20th century, which Wallach also includes. The observed phenomenon is that the field started convinced of the Biblical hypothesis, and ended up convinced of the Native hypothesis instead. Wallach asks: Is the process that the experts in that field underwent in that period Bayesian? He finds the answer is yes.

He points out the significance of this to Bayesian modeling of all historical argument, showing that it actually is a correct account, and that objections to it fail—Bayesianism does not require excess formalism or precision, and is not hamstrung by the role of theory-laden reasoning or subjectivity in assessing evidence (two points he cites me as having also made in Proving History). As Wallach says (my emphasis):

I construct the interplay between historiographic hypotheses and archaeological evidence in a simplified Bayesian model of successive conditionalization. Under the model’s fairly moderate assumptions, the actual two-generation process that led to a profound change in the beliefs of a disciplinary community can be reconstructed.

[And]

On the basis of this case study I argue that the relativist opinions are unwarranted. I demonstrate, in particular, that posterior probabilities of historiographic hypotheses can converge even if they start from deep and entrenched discord about prior assumptions.

This refutes the claim that prior assumptions predetermine outcome, and the claim that Bayesian reasoning requires excessive precision (or indeed any precision at all). In actual fact, imprecision can be perfectly well modeled by Bayesian reasoning, thus creating no difference between Bayesian reasoning and any other form of credible historical reasoning. Except one thing: with Bayesian reasoning, the inferences you are relying on are exposed to criticism and check.
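The convergence described here can be illustrated with a toy calculation. The sketch below is not Wallach’s model: the two starting priors and the likelihood ratios are invented for illustration, and the only assumption carried over from the paper is that both scholars agree on how strongly each finding favors one hypothesis over the other.

```python
from fractions import Fraction

def conditionalize(prob_h, likelihood_ratio):
    """One Bayesian update in odds form; returns the new probability of H.

    likelihood_ratio = P(evidence | H) / P(evidence | not-H).
    """
    odds = prob_h / (1 - prob_h) * likelihood_ratio
    return odds / (1 + odds)

# Hypothetical starting points: an advocate and a skeptic of hypothesis H.
advocate = Fraction(99, 100)
skeptic = Fraction(1, 100)

# Hypothetical likelihood ratios that both scholars accept for each new finding.
shared_evidence = [Fraction(10), Fraction(3), Fraction(10),
                   Fraction(3), Fraction(10), Fraction(3, 2)]

# Track how far apart the two scholars are after each finding.
gaps = [advocate - skeptic]
for ratio in shared_evidence:
    advocate = conditionalize(advocate, ratio)
    skeptic = conditionalize(skeptic, ratio)
    gaps.append(advocate - skeptic)

for step, gap in enumerate(gaps):
    print(f"after {step} findings the two differ by {float(gap):.4f}")
```

The two start 98 percentage points apart and end less than one point apart, without either of them ever having to agree on a prior.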

Standard reasoning is vague as to how or why conclusions follow from premises, or even what the conclusions are (certainty? likelihood? what likelihood?), shielding them from analysis. You can’t fact-check a historian’s reasoning, if they can’t even tell you what their reasoning is. It’s like trying to guess at the fuel efficiency of a car without getting to pop the hood and see what the engine looks like. All Bayesian reasoning does is force you to pop the hood and show everyone what the engine looks like. You can’t hide any mistakes then. If you are relying on an invalid inference, it will be immediately exposed. If you are erring in your demarcation of evidence (putting facts in that don’t belong, or leaving facts out that do), it will become obvious. Otherwise, Bayesian reasoning literally is standard reasoning: it’s what historians are already always doing. They just don’t know it. They have no idea what the engine looks like or if it’s well built. They just drive the car and assume it works.

Wallach uses his study to illustrate ways to solve several popular Bayesian problems: the problem of novel theories, the problem of old evidence, the problem of disputed evidence, and the problem of surprising evidence or diverse evidence. His paper is of interest even just to anyone interested in any of those problems. By using a real-world example, he gets to why certain solutions to those problems work and others don’t. His paper is thus also interesting even to anyone interested in that event in the historiography of the Bible: how, and why, the consensus so radically changed on where the Israelites came from.

Bayesian Formalism

Section 1 of Wallach’s paper introduces the issues and previews the conclusion. Section 2 is “a minimal introduction to Bayesian formalism” as needed to understand the rest of his paper. Section 3 “delineates both the hypotheses about the Bronze/Iron Age transition in Palestine that were advanced in the 20th century and the relevant archaeological findings” and Section 4 “describes the methodology employed in the model.” Section 5 “comprises three Bayesian simulations, each one a ‘contest’ between pairs of hypotheses, together with sensitivity checks to the model’s assumptions.” Section 6 “discusses the results of these simulations” and Section 7 “concludes with some general remarks.”

His discussion of the formalism is, I believe, too terrifyingly formalized for pretty much any humanities major to stomach. But it is essentially right. He covers how Bayesian modeling works, and what shortcuts reality tends to require in its use, and what debates occupy even Bayesians about modeling. I don’t think such high formalism is needed in history, but it is needed in papers like Wallach’s, because he needs to demonstrate it works even under such rigorous formalism. You have to build an actual engine, and prove it works, before you can trust mere sketches of the engine. But historians can rely on sketches, now confident the engine is real and functions. The sketch just captures and includes the high formalism behind it. (For what a sketch, or “low formalism,” looks like, see my article What Is Bayes’ Theorem & How Do You Use It?. My book OHJ then provides an extended example. I sketch some others in Proving History.)

Objective or Subjective Bayesianism

Wallach confronts the dispute between Objective and Subjective Bayesians and picks the subjective pose, simply because he will be using a variety of starting assumptions to test how sensitive the conclusion is to variations in starting assumption. And that works the same way no matter whether you are building your probabilities objectively or subjectively. So the distinction hardly matters for what he aims to accomplish and what he concludes. But there is another reason the distinction doesn’t matter: as I show in chapter six of Proving History, all that subjective Bayesians really are saying is that we can only in most cases approximate to objective probabilities; but the more data we get, the more our subjective estimates will approach the objective reality.

For example, Wallach says Objective Bayesians “state that priors should reflect real-world probabilities, pointing out that convergence under a finite series of evidence is not assured if scientists are allowed to assign wildly arbitrary priors.” But of course, Subjective Bayesians do not and cannot honestly “assign wildly arbitrary priors.” That’s a straw man. Of course, everyone is always assuming some sort of prior probabilities, in every argument they make, to any empirical conclusion whatever, no matter how insistent they are that they are not. (See my discussion in If You Learn Nothing Else.) But more importantly, priors are always highly constrained by objective evidence. (See my discussion of “arbitrary priors” in my article Two Bayesian Fallacies.)

It’s just that, in history, the scale of uncertainty is too high for the comfort of, say, physicists who stare in horror at not being able to have vast reams of data to work with. Thus, subjective estimates in history are constrained by objective facts, they just aren’t as constrained by objective facts as probabilities in the hard sciences are—because there just aren’t as many facts to work from. The end result is really just wide margins of error: wider than would be tolerated by a science journal; but narrower than would warrant claiming the result is “wildly arbitrary.” Welcome to the field of history. We’ve been comfortable with this fact for centuries. (See my article History as a Science.)

For example, Wallach laments that it’s “difficult to see how the objective probability of…string theory being true should be determined, let alone that of complex nonquantitative hypotheses” like how Israelites came to exist. It actually isn’t all that hard…as Wallach himself shows by how he decides on priors for those “complex nonquantitative hypotheses” he’s testing. He allows wide margins of error, and constrains his choices by appeal to objective facts, beyond which probabilities outside his selected range would be obviously unbelievable, rather like assigning a “33% probability” to a body that went missing having gone missing by being reanimated from the dead. Were that true, then a third of all bodies that have ever in history gone missing were reanimated from the dead. It’s objectively obvious why that’s impossible. It’s quite clear how objective facts do constrain what probability assignments we can deem within the realm of credibility. Even when those assignments are subjective.

A subjective assignment of probability would be, yes, a measure of how likely things seem or feel to us, in lieu of drawing that probability from an objective data set (e.g. a complete list of all missing bodies in the whole of human history and how they went missing; which data is not available, at least not in such precise terms). But what we are doing when we estimate in that way is trying to guess what that objective probability most likely would be if we had access to the requisite data. We are thus, in fact, actually just trying to figure out the objective probability. We just don’t have good data to do that with. And to account for that, we allow very wide margins of error. But not so wide as to be absurd (like “the prior probability a missing body was reanimated from the dead is 0-to-30%”). Reality does constrain us (a third of all missing bodies clearly were not reanimated; we’d have noticed by now).

So if someone were to ask (as Sheldon Cooper is actually doing in the screen capture I lead my What Is Bayes’ Theorem article with) what the prior probability is that some form of string theory is true, we actually do have some objective facts to estimate that from, they just are highly uncertain. If you were to poll the opinions of qualified physicists (those who actually study string theory or its competitors, or work in fundamental theory enough to have well informed opinions on the debate), and crowdsource a probability estimate therefrom (by averaging the answers), you’d end up with a prior probability that’s probably as close to actual as any other facts would deem. This wouldn’t be arbitrary, much less “wildly.” It would reflect the combined knowledge of thousands of relevant experts, and an emergent measure of how well string theory performs and is likely to perform in explaining reality. So, too, if one could have done that for the theories Wallach is testing. He can’t though, since nearly everyone he’d need to poll is now dead; and they would have polled differently even at different times of their lives. What he does instead is “get a sense” from the available data how such polls most likely would have gone, and to account for uncertainty, he builds in a wide but plausible margin of error.

That’s just Objective Bayesianism with less data. And IMO, that’s all Subjective Bayesianism ever is. (When honest; crank Bayesianism is also a thing; no different than crank logic or crank science or crank statistics or anything else pretending to be real but really just a scam.)

On Competing Models in Israelite Studies

Wallach models the fates and confidence in four hypotheses: the Conquest Hypothesis (anything at least loosely matching the Bible), the Immigration Hypothesis (matching the Bible sans war), the Revolt Hypothesis (matching the Bible sans immigration), and the Autochthonic Hypothesis (the Israelites were always there, and just culturally evolved over time). The last of these is now the mainstream consensus. The first of these was the mainstream consensus a century ago. The transition in consensus went from the first, through diverse attempts at the second and third, to the last, in the end a complete reversal of view.

The Autochthonic Hypothesis Wallach correctly outlines as follows:

Indigenous inhabitants of Late-Bronze Canaan were the major origin of the [proto-]Israelite population. The demographic source of this population were local agrarian residents (Dever 1998, 2003) or “internal nomads” that existed in Palestine for hundreds of years as a part of a dimorphic society (Finkelstein 1995, 1998a) and not outsiders who penetrated the land either by military conquest or by peaceful migration. They were driven to settle in the hill country of Palestine by the instability of the Late Bronze Age that followed the weakening and eventual withdrawal of Egyptian rule during the 12th century B.C.E. Rather than a revolt against the established order this process was a reaction to its disappearance. The coalescence of this diverse population was prolonged and gradual, and a national identity with more-or-less shared narratives did not materialize until later in the Iron Age.

He thus has a defined set of hypotheses properly demarcated, produces a reasonable range of estimates for their priors, and isolates 24 items of evidence “that span the period between the late 1920s and the late 1980s and had pronounced influence upon the debate among” the four competing hypotheses, and tests out their effects with various Bayesian models to show how their actual impact tracks Bayesian reasoning (though the historians and archaeologists exhibiting this behavior of course were not aware of this).

Wallach notably points out the Bible itself had no real effect. As the historiography of this debate shows, it could be interpreted in whatever way would support any given hypothesis. Certainly there were “experts” who wanted the Bible to be literally true. But there were also experts who wanted to genuinely test what the Bible says against material evidence, and when the two conflicted, the obvious solution was to reinterpret the Bible (as either lying, mythologizing, fictionalizing, fossilizing error, or written nonliterally, among various other ways to make the text fit the facts). This was essentially due to a singular defect of the Bible: nothing in it was written by anyone actually alive when anything it says happened (in the relevant period of time). In fact, all relevant content was composed centuries afterward. By authors with no known skills in critical historiography, no known sources, and an obvious propagandist motivation. That’s literally the worst source to have in the whole field of history.

On the Evidence & Its Effects

“As to archaeological evidence,” Wallach says, “it is by nature fragmentary and its interpretation is theory-laden no less than in any other scientific discipline.” But, he finds, the evidence in this case confirms the assertion of Alison Wylie that “the strategies archaeologists developed for exploiting a range of background knowledge can be very effective in establishing a network of evidential constraints.” Key to this was agreement on what the evidence was, even if disagreement persisted as to which hypothesis it supported.

As Wallach puts it:

[M]ost of the evidential results…were undisputed, not only in their “low-level” content—that such and such material remains were found at a certain site—but also in their “middle-level” aspect—“what was here when”: [e.g.] That a particular site was destroyed at a certain time, that another one was uninhabited throughout a specific period, etc.

There were some exceptions, which he analyzes. Not just in that some of the evidence changed (e.g. being redated etc.), which had an impact on theory choice. But also in some disagreeing over the facts themselves (e.g. what the actual date of a particular find was). He shows the effect of this on the various models. He constructs “three diachronic, contrastive simulations that attempt to reconstruct in Bayesian terms the scholarly debate about what happened in the Bronze/Iron Age transition in Palestine.” These “simulations compare pairwise the Conquest hypothesis to the Immigration hypothesis, then the Immigration hypothesis to the Revolt hypothesis and again the Immigration hypothesis to the Autochthonic hypothesis.”

Moreover, “each simulation begins with the assumption that the degree of belief in the ‘incumbent’ (the current dominant) hypothesis is one hundred times stronger than that in the ‘usurper’ one, an assumption the sensitivity to which is subsequently examined.” In other words, he assumes the consensus position (the Bible Is Right) started with prior odds of effectively 100 to 1 in its favor, and shows how the evidence overwhelmed even that extreme level of confidence and reversed the consensus altogether. This is not surprising, as material evidence is very powerful in any probability matrix. Imagine if, for example, we uncovered a Christian burial conclusively dated to the 50s A.D. containing a lead tablet pronouncing the beliefs of the deceased, and it declared the archangel Jesus to have been slain in the sky by Satan. The impact of this on historicity would be catastrophic.

Alas, unlike how the Israelites came to exist, a question for which a huge array of material evidence was left for us to examine, the actual truth of Christianity in its first generation is almost completely lost to us (even the letters of Paul are highly compromised, and already decades late). It’s therefore unlikely to follow such a model. Nor can it claim such high priors. Because even historicity lacks any comparable material evidence in its support. The consensus on the historicity of Jesus is actually demonstrably malformed: it is not based on any principled or disciplined survey of the evidence (see Chapters 1 and 5 of Proving History and Chapters 2 and 7 of OHJ). And there has not been, in fact, any peer reviewed defense of historicity published in almost a hundred years.

This is why the historicity of Jesus hovers around probabilities of high uncertainty. My analysis in OHJ ends up with a best chance Jesus existed of 1 in 3, for example. Which is still a respectably high probability for historicity. Even a single weak item of evidence (something, say, six times more likely on historicity than myth) could reverse it. Because strong evidence means a factor of 10 or 100 or 1000 times more likely, or even a million. Consider the kind of evidence we have for other historical figures, from Hannibal to Julius Caesar, even Socrates, for comparison—my article on Spartacus directs you to all of those and more. We just don’t have any kind of evidence like that for Jesus (pro or con).
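The arithmetic behind that reversal is short enough to check directly. The following sketch converts the 1 in 3 probability to odds, applies a single likelihood ratio, and converts back. The function name is illustrative only, and the ratios beyond six are just the round numbers mentioned above, not measurements of any actual evidence.

```python
from fractions import Fraction

def posterior_probability(prior_probability, likelihood_ratio):
    """Apply one likelihood ratio to a prior probability using the odds form of Bayes' Theorem."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

prior = Fraction(1, 3)  # the a fortiori probability of historicity

# How the 1 in 3 moves under a single item of evidence of each strength.
for ratio in (6, 10, 100, 1000):
    p = posterior_probability(prior, ratio)
    print(f"evidence {ratio}x likelier on historicity -> {p} (about {float(p):.3f})")
```

A probability of 1 in 3 is odds of 1 to 2, so one item of evidence six times more likely on historicity turns that into odds of 3 to 1, or a 3 in 4 chance.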

Wallach maxes his scale out at 10 to 1 for “strong” evidence, and then uses 3 to 1 for “substantial evidence,” and 1.5 to 1 for weak evidence (what he calls “anecdotal” evidence). That’s an extremely conservative scale, so all his conclusions follow, IMO, a fortiori (see that term in the index of Proving History). Most of the evidence in his examined case, he concludes, doesn’t rate as “strong”; but some does. It’s the cumulative effect of it all that became convincing and reversed the consensus.
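To get a feel for how conservative that scale is, here is a rough sketch of how many items of each grade it would take to bring a hypothesis that starts 100 to 1 behind up to even odds. It assumes, purely for illustration, that every item is independent of the others and that all items in a run carry the same weight; neither assumption is claimed for Wallach’s actual data.

```python
from fractions import Fraction

# Wallach's three grades of evidence, expressed as likelihood ratios.
SCALE = {
    "strong": Fraction(10),
    "substantial": Fraction(3),
    "anecdotal": Fraction(3, 2),
}

def items_needed(prior_odds, likelihood_ratio):
    """Smallest number of equally weighted, independent items that brings the odds to at least even."""
    odds, count = prior_odds, 0
    while odds < 1:
        odds *= likelihood_ratio
        count += 1
    return count

# The 'usurper' hypothesis starts 100 to 1 behind the 'incumbent'.
prior_odds = Fraction(1, 100)
for grade, ratio in SCALE.items():
    print(f"{grade}: {items_needed(prior_odds, ratio)} items to reach even odds")
```

Two strong items, five substantial items, or a dozen anecdotal ones would each be enough on their own, which is why a long run of individually modest findings can still overturn a consensus.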

Notably, Wallach found something interesting in the literature: “I try to take cues from the way the scholars themselves reacted to the data,” Wallach says, “For example, when a discovery was followed by a flurry of auxiliary (often rather ad hoc) assumptions to reconcile it with one’s favorite hypothesis, I assume it to be strong evidence against that hypothesis,” which is reasonable; whereas he found that scholars promoting such evidence would highly over-rate it:

It is worth noting that researchers’ own appreciations of the evidential strength of their findings were often much stronger. Only rarely would a scholar say that his results “increase (diminish) the plausibility of hypothesis X by some degree.” Expressions like “there can be no doubt that this destruction was the deed of the Israelite tribes” (Yadin 1965) or “there is not the slightest doubt that we are now witnessing the beginning of the settlement of the Israelite tribes in the Negev” (Aharoni 1976) (italics mine) are much more frequent. But as we shall see, such strong claims were not always corroborated by later discoveries.

This leads to a very important lesson for historians: we must check ourselves against excessive certainty. I do this in OHJ by setting as extreme a boundary against my own conclusions as is reasonably possible, thus producing the a fortiori result of 1 in 3 for Jesus, in contrast to my own unchecked judgment of 1 in 12,000. I believe only the result of 1 in 3 is confidently defensible to critics. The lower bound of 1 in 12,000 is plausible, but not confidently knowable. In effect, I recognize the margin of error is large. The true probability is somewhere in between 1 in 12,000 and 1 in 3 and we cannot know where in between; which means even I cannot claim to know the probability is less than 1 in 3, only that it is not reasonable to say it’s higher than 1 in 3. Historicists need to adopt the same humility in the face of the extremely ambiguous and problematic evidence we are stuck with for Jesus.

Wallach also runs sensitivity checks against even his own models, seeing what happens when you move the priors or the valuations of the evidence, by even a factor of ten. He thus discovers what it would take to have gotten a different result, which is revealed to require assumptions and estimates so patently absurd that no one committed to any kind of objectivity could possibly agree with them. Thus demonstrating that even subjective estimates of probability are constrained by reality, and thus not in fact “wildly arbitrary.”
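A sensitivity check of this kind can be sketched in a few lines. The tally of evidence below is hypothetical (it is not Wallach’s list of 24 items), and the items are treated as independent; the point is only to show how one asks what starting assumption would have been needed to get the opposite result.

```python
from fractions import Fraction
from math import prod

# Hypothetical tally, NOT Wallach's actual items: 2 strong, 4 substantial, 6 anecdotal.
evidence = [Fraction(10)] * 2 + [Fraction(3)] * 4 + [Fraction(3, 2)] * 6
combined_ratio = prod(evidence)  # total weight of the evidence for the challenger

def posterior_probability(prior_odds, ratio):
    """Posterior probability of the challenger given its prior odds and the combined ratio."""
    odds = prior_odds * ratio
    return odds / (1 + odds)

# Move the starting handicap by a factor of ten in each direction.
results = {}
for handicap in (10, 100, 1000):
    results[handicap] = posterior_probability(Fraction(1, handicap), combined_ratio)
    print(f"starting {handicap} to 1 behind -> {float(results[handicap]):.4f}")

# The challenger loses only if it starts further behind than the evidence can make up.
print(f"the result flips only past about {float(combined_ratio):,.0f} to 1 against")
```

On these invented numbers the conclusion survives a tenfold change in the prior either way, and reversing it would require starting odds that no one could defend.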

The Problems of Old Evidence & New Theories

Wallach finds:

Using fairly moderate assumptions, a Bayesian model has been constructed for a two-generation process that led to a significant change in the distribution of beliefs within a scientific community. In the model as well as in the actual case, large differences in prior assumed probabilities were “washed out” by the flow of evidence.

That Bayes’ Theorem models what happened also means Bayes’ Theorem could have been used to monitor, understand, perfect, and check the process itself. Archaeologists are now increasingly agreeing with that conclusion (hence textbooks started espousing Bayesian reasoning in archaeology in the late 90s). Historians need to get on board. They will have to overcome their abject terror at math and formal logic first. But every profession that claims to generate accurate knowledge needs to do that. They cannot claim it is not their responsibility to understand the logic of their own arguments. And the logic of their own arguments is mathematical. Because all their arguments are over a probability. And probability is mathematics.

Wallach finds this case also provides examples of how the problem of old evidence is “resolved by considering the learning of the relevance of the ‘old’ information” as itself new information. It then gets incorporated into the model, updating the priors like any evidence should. I think Wallach gets a little more confused in trying to articulate how his case resolves the problem of novel theories. He’s right in his conclusion that it has something to do with allowing sub-theories to emerge from broader covering theories. But more simply, it’s just this: all theories are included in the competing hypothesis set. Including the ones we haven’t thought of yet. He mentions this earlier in the paper (if H is the Conquest Hypothesis, then ~H logically necessarily must contain every other theory possible, and not simply “The Immigration Hypothesis” proposed at the time). But Wallach seems to miss the significance of this point.

The Autochthonic Hypothesis was already sitting inside the “Immigration-or-Other” Hypothesis that was at the start of this process the ~H, the “what would be true” if H were false. Which must always be stated, or else H becomes semantically meaningless. In other words, H can only be intelligible as a hypothesis, if you can articulate what it would mean for it to be false. Which is why all correct empirical reasoning must be counter-factual: you are always only ever proving a theory true, by trying to prove its alternatives false. There is in fact no other way to prove a theory true.

But of course, there are lots of things that can be true if H is false, including things we never thought of before. For example, “the Jews were genetically engineered replicants planted in Palestine by extraterrestrials” is logically possible, and thus in fact included in ~H, even at the beginning of the 20th century. Despite, obviously, no one then ever even imagining such a thing, much less knowing it was possible. But that means it just had a vanishingly small prior—and that has of course remained the case. But if we were invaded by those aliens today and they showed us all their meticulous records of their Bronze Age mission in Palestine, the evidence then would overwhelm even that vanishingly small prior and it would become the consensus conclusion of what happened.

That new theory was always in ~H, and thus always a part of even Wallach’s model. It wouldn’t have been “added” from “outside,” as if in violation of Bayesian formalism, as he incorrectly claims. Bayesian formalism logically requires Wallach to have included that (and every other possible theory, known or unknown) in ~H, otherwise his model is literally invalid. It’s just that its prior was so low, and the evidence so non-existent, that he could safely ignore it, along with every other logical possibility too improbable to trouble himself with that I’m sure he never thought of, like “the Jews were a spontaneously generated mole people who ascended into Palestine from the core of the earth using steam-powered drill machines.”

All these theories are included in ~H. Always and forever. Even when we don’t know it. But our not knowing it is why their priors are so small they won’t even show up in the math, at the resolution used by Wallach. For example, Wallach never rounds any result to the nearest trillionth of a percent, so any theory residing in that zone of probability or below simply won’t show up in his model. And that being the case is no mystery. There really isn’t any problem to solve here.
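A small sketch makes the point concrete. It splits the hypothesis space into the Conquest Hypothesis, the other theories anyone had actually proposed, and a catch-all for everything no one had thought of. All the numbers are invented for illustration (the catch-all’s prior of one in a quadrillion especially), and the hypothesis labels are my shorthand, not Wallach’s.

```python
def posterior(priors, likelihoods):
    """Bayes' Theorem over an exhaustive, mutually exclusive set of hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}

# Hypothetical priors; 'unconsidered' stands in for every theory nobody has imagined yet.
priors = {"conquest": 0.5, "other_known": 0.5 - 1e-15, "unconsidered": 1e-15}

# Hypothetical likelihoods for an ordinary archaeological find.
ordinary_find = {"conquest": 0.1, "other_known": 0.9, "unconsidered": 0.5}
after_ordinary = posterior(priors, ordinary_find)

# Hypothetical likelihoods for evidence all but impossible on any known theory.
alien_records = {"conquest": 1e-20, "other_known": 1e-20, "unconsidered": 1.0}
after_aliens = posterior(priors, alien_records)

# At any normal reporting resolution the catch-all is invisible until the evidence demands it.
print(round(after_ordinary["unconsidered"], 6))
print(round(after_aliens["unconsidered"], 6))
```

Nothing had to be added to the model between the two calculations. The catch-all was in the hypothesis set the whole time; only the evidence changed.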

The Power of Diverse Evidence

I’ve described the power of diverse evidence in Proving History (pp. 98-100). Wallach also says his case study argues that what matters is not so much the “diversity” of evidence as “its independence from the hypothesis being probed and from other evidential claims” and “the consilience of several independent lines of evidence.” I would argue the latter is precisely what is meant by diversity of evidence, but that Wallach is right about the methodological point. When independent lines of evidence converge on a common conclusion, that is more powerful evidence than any one line of evidence alone, and diversity of evidence is just one way this manifests in practice. The effect is simply the accumulation of evidence. Their independence simply makes each item of evidence measure more strongly. As we’d already expect. (See my discussion of how we account for dependent probabilities in A Bayesian Brief.)

For example, in the case of any Caesar’s existence, we have diverse, which really just means highly independent, evidence: at the very least, inscriptions, coins, busts, contemporary manuscripts, preserved eyewitness texts, subsequent critical texts conveying access to eyewitnesses and documents now lost, and broader courses of historical events that Caesar’s actions are necessary to explain. If all we had were one of those and not the others (for instance, if we had tons and tons of inscriptions, but nothing else), then as long as the missing evidence is not surprising (i.e. we fully expect it to be missing, for reasons unrelated to whether that Caesar existed), the conclusion would likely still be the same, or near enough. So diversity of evidence does not by itself increase the probability of a hypothesis. It’s simply our observation of what happens when a lot of independent evidence piles up. And that often happens with diversity, because being diverse, the evidence is often identifiably more independent. For example, inscriptions are unlikely to cause eyewitness texts or vice versa, or at least that is less likely than that all the inscriptions might be causally related, as they would be if some singular Johnny Appleseed erected hundreds of inscriptions across the Empire praising a made-up person. But we already know that becomes increasingly unlikely, the more numerous and diverse the inscriptions are.
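The difference independence makes can be sketched in a few lines of Python. This is a deliberately crude illustration under assumptions I am making up: a prior of 0.5, a likelihood ratio of 10 per item of evidence, and the extreme case where a “Johnny Appleseed” makes five inscriptions count as a single item. The function name is hypothetical. Real dependence is a matter of degree, so treat this as the two ends of a spectrum, not a model of any actual case.

```python
import math


def posterior_from_lrs(prior: float, likelihood_ratios: list[float]) -> float:
    """Multiply the likelihood ratios of independent items into the prior odds."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * math.prod(likelihood_ratios)
    return post_odds / (1.0 + post_odds)


# Hypothetical values, for illustration only.
PRIOR = 0.5      # assumed starting probability
ITEM_LR = 10.0   # assumed likelihood ratio for each item of evidence

# Five genuinely independent items: their likelihood ratios multiply.
independent = posterior_from_lrs(PRIOR, [ITEM_LR] * 5)

# Five items that all trace back to one source: in the extreme case
# they carry the weight of only one item.
single_source = posterior_from_lrs(PRIOR, [ITEM_LR])

print(f"independent:   {independent:.5f}")
print(f"single source: {single_source:.5f}")
```

Notice that nothing in the math cares what kind of evidence each item is. Diversity matters only because it is usually our best reason to believe the items really are independent.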

This was reflected in the Wallach study in how more and more settlement excavations simply didn’t turn up evidence matching the Conquest hypothesis (some did, but not enough, and too incongruously), but instead turned up evidence matching the Autochthonic Hypothesis, which for a time was confused with the Immigration and Revolt Hypotheses, which were two different ways to try and have it “both ways” (to try and get the Bible to be at least a little bit right, while still explaining the conflicting evidence from excavations). But as more and more settlement excavations (and other evidence) piled up, even those solutions came to be ruled out as unlikely, and the only credible hypothesis left was the Autochthonic. Here, it was mostly similar evidence (one excavation after another), but its sheer quantity became increasingly overwhelming. There wasn’t any plausible Johnny Appleseed to explain that away. Other than the obvious: the evidence is all the same, because the Israelites were always there.

The Problem of Changing Background Knowledge

Wallach makes another interesting observation of use to historians and Bayesians:

Background knowledge is often assumed to be static so that the term indicating it is sometimes omitted under the assumption that it is “built into” the probability distribution. … However, background knowledge is bound to change. Most case studies of Bayesian confirmation involve short periods of conditionalization, so a change in background knowledge may be considered negligible. A protracted conditionalization process complicates the matter, because earlier evidence becomes part of the knowledge base and can influence subjective likelihoods.

This is an important point. History differs from most of science in this respect (though it happens in science from time to time too). The debates and shifts in conclusion can take decades or even lifetimes, and over such spans, all manner of background knowledge can change. Think of how much changed when the Dead Sea Scrolls were discovered…and even then, it took decades for what was discovered there (a) to become accessible, (b) to be analyzed by experts and written about, and (c) to become widely known in the field. And that’s just one thing that changed our background knowledge considerably, practically rewriting everything we thought about first century Judaism.

History is also vastly more complicated a field than any science. And I don’t mean in the sense of being harder, but in the sense that the data you need to know is far too vast for any one person to know. When you research something in history, you often have to go and “upload” a ton of background knowledge, stuff that exists “in the literature,” but that in fact probably few even know about (and in many cases, that no one knows about, everyone who collected and wrote it up having died). The average historian of Christianity has not read “the complete works of Philo,” for example, much less every journal article ever published in the last hundred years analyzing its contents. Yet all that knowledge exists, as the background knowledge of the field. And it will certainly influence probability judgments.

Thus, in practical reality, historians are constantly updating their background knowledge. It can change considerably even in the span of a single day—as you “read up” on relevant material you may have known existed but never had time to absorb, or even material you didn’t know existed until you needed to know if it did, and looked.

And not recognizing this can lead to folly. Look at how Larry Hurtado made huge mistakes, all by arrogantly not actually checking and uploading the correct background knowledge, thus skewing all his probability judgments. And look at the mistakes made by James McGrath and Bart Ehrman, all by simply assuming their background knowledge was complete, rather than having learned (as any real historian would have been taught in grad school) that they need to check before asserting things they haven’t checked before: like that only governments erected inscriptions in antiquity (which is false); or that angels were never called men (which is false); or that we don’t have any birth records or death certificates from antiquity (which is false); or not knowing that prefects were often in fact hired as procurators and that Tacitus had a particular bee in his bonnet about that, which radically changes how we understand the passage on Pilate in Tacitus (though that’s super obscure background knowledge, so it’s easy not to know that).

So we do need to be more flexible in how we allow changes in background knowledge to update our probability judgments. In history, it’s just easier to rebuild the model, with the new background knowledge now in place and re-informing all our probability judgments. Though formally, you could do it by inputting the new knowledge as “new evidence” (being newly “discovered” by the individual historian), item by item, and updating all your priors with how each new item of background knowledge changes everything. But that would be extraordinarily and unnecessarily complicated, and being overly complicated (especially for a humanities major), easy to screw up. The other method is easier. Not only easier to do, but easier to check for error.
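For anyone worried that rebuilding the model “cheats” relative to formally updating item by item, here is a small Python sketch of why it doesn’t. The prior and the three likelihood ratios are invented for illustration, the function names are hypothetical, and the sketch assumes the items are conditionally independent and are assigned the same likelihood ratios in both procedures. Under those assumptions the two routes land on exactly the same posterior, which is why the choice between them is only about effort and ease of checking.

```python
from fractions import Fraction
from functools import reduce
from operator import mul


def update(prior: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """One Bayesian update in odds form, returned as a probability."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)


def update_item_by_item(prior: Fraction, likelihood_ratios: list[Fraction]) -> Fraction:
    """Treat each new piece of background knowledge as new evidence, in turn."""
    p = prior
    for lr in likelihood_ratios:
        p = update(p, lr)
    return p


def rebuild_in_one_pass(prior: Fraction, likelihood_ratios: list[Fraction]) -> Fraction:
    """Rebuild the model once, with all the new knowledge already in place."""
    combined = reduce(mul, likelihood_ratios, Fraction(1))
    return update(prior, combined)


# Hypothetical values, for illustration only.
PRIOR = Fraction(1, 4)
NEW_KNOWLEDGE = [Fraction(3), Fraction(1, 2), Fraction(5)]

step_by_step = update_item_by_item(PRIOR, NEW_KNOWLEDGE)
all_at_once = rebuild_in_one_pass(PRIOR, NEW_KNOWLEDGE)

print(step_by_step, all_at_once)
```

Exact fractions are used so the comparison isn’t muddied by floating-point rounding. With three items the step-by-step route is already three chances to slip; with three hundred, the case for just rebuilding makes itself.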

Conclusion

Wallach’s paper is an interesting demonstration of how to apply Bayesian reasoning to historical problems (and a famous one in Biblical studies at that), and of what we can learn (from modeling existing histories of change in historical opinion) about how a more overt Bayesian process of reasoning actually should operate in practice, solving a number of commonly cited problems. It’s a good read. With a handy bibliography. I recommend it to anyone interested in studying any of these questions.
