The Principle of Indifference is important for Bayesian reasoning, and hence for Bayesian epistemology—and hence for epistemology, full stop. Yet it has many critics. The common mistake they all make is similar to the mistake all philosophers make when they Hose Thought Experiments: they screw up what the thing they are talking about means, and thus draw invalid inferences and mistake them for valid ones. Generally, they articulate some consequence of the Principle of Indifference (hereafter, the PoI) that actually entails the principle wouldn’t apply, not that it is invalid; and then incorrectly conclude the principle is invalid. Or else they reach this conclusion only after incorrectly applying the principle, for instance by not logically demarcating the possibility space (see, for example, Maher’s summary of Keynes’ objections to the PoI). I’m going to correct many of these mistakes here, so you will have an article to cite or consult whenever you run into critics of the PoI, or need to hone your own grasp of it.
What Is the Principle of Indifference?
In lay terms, the PoI means this: when you have no information making any logical possibility more or less likely than any other, they are, so far as you know, all equally likely. For example, when you say you don’t have any idea whether a claim’s truth is more likely than 50% or less likely than 50%, you therefore mean it has a probability of 50%. So far as you know, that is. In other words, this is a principle of epistemic probability, not “objective” probability. Because it is a statement about what you know, not about what’s the case apart from what you know. For any claim h, if all you know is that h is as likely true as false, then what you know is, by definition, that h is as likely true as false. And that translates into mathematical terms as “the probability of h is 50%.” That’s simply a description of your state of knowledge at that time.
It’s important to recognize that this means the epistemic probability of h. As in, the probability so far as you know. New information could reveal h has (and maybe has had all along) a different probability—in other words, a different frequency of being the case; at which point you would update your epistemic probability for h. This also means h can have different epistemic probabilities at the same time for different people. For example, if I know nothing about h but you know some pertinent things about h, your epistemic probability for h will be closer to the “true” (or “objective”) probability of h (what I call in Proving History the physical probability: check the index for “epistemic probability”). Conversely, if you are unknowingly relying on a bunch of misinformation about h, you will again be assigning it an epistemic probability different from mine, but this time probably farther from its actual probability. Unbeknownst to you.
In either case, we are assigning a probability based on what we know—or more accurately, on the information we have, since some of it could be misleading or false, but that’s just another category of knowledge: “I know I have information b” remains true even when b is false. Either way, the resulting probability really only measures the probability that we are right or wrong to conclude what we do about h, and not the actual probability of h. Although the more (and more accurate) information we have about h, the closer those two probabilities will be. Which is the point of any epistemology: to find by what method we can get our epistemic probabilities as close to the corresponding actual probabilities as possible, given the information available. And as it happens, epistemologies embracing the PoI do appear to perform better at this; which is the chief reason to adopt it. As Churchill quipped, “Democracy is the worst form of government; except for all the others.” One can say the same of the epistemologies reliant on the PoI.
The PoI translates into frequency terminology as well. For any event or pattern, if you have no information indicating it is more frequent than its converse and no information indicating it is less frequent than its converse, then so far as you know, its frequency is the same as its converse. It happens “as often as not.” Only information indicating otherwise can warrant believing otherwise. And “information” in this sense, crucially, includes logical as well as empirical information. Once you realize or discover some logically necessary fact about a hypothesis, if that fact changes its likelihood, then the PoI no longer applies, or applies only in accord with that newly discovered logic. Likewise any empirical fact. Because then it is no longer the case that “so far as you know” the hypothesis is as likely true as not. And if that’s not the case, then the PoI no longer governs the epistemic probability, or does so only within constraints.
Wikipedia is correct to note that E.T. Jaynes gave the PoI its firmest logical foundations, by demonstrating the basis for both the principle of transformation groups, commonly known as the Jeffreys prior, and the principle of maximum entropy (whereby “the distribution with the maximum entropy is the one that makes the fewest assumptions about the true distribution of data”) and deriving the PoI from each. Maximum entropy is, statistically, always the most likely state; so you should start there, as then you are the least likely to be wrong, and adjust only as data indicate elsewise. It just so happens that when you define that most likely state (the one with “the fewest assumptions”) for any system, you get the result exactly matching the PoI (which shares an important affinity with Ockham’s Razor). Likewise, the underlying logic of the Jeffreys prior is that after you define every possible state of a system, absent any information rendering any one state more likely than any other, certain results are then entailed, which happen to correspond to…you guessed it, the PoI.
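A toy illustration of the maximum-entropy point (this is only a sketch, not Jaynes’s own derivation, and the candidate distributions here are invented for the example): among distributions over the same set of mutually exclusive outcomes, the uniform one, which is what the PoI prescribes, carries the most entropy, i.e. assumes the least.

```python
from math import log

def entropy(p):
    """Shannon entropy (in nats); zero-probability terms contribute nothing."""
    return -sum(x * log(x) for x in p if x > 0)

# Three candidate distributions over four mutually exclusive outcomes (assumed toy values):
candidates = {
    "uniform (the PoI)": [0.25, 0.25, 0.25, 0.25],
    "mildly skewed":     [0.40, 0.30, 0.20, 0.10],
    "nearly certain":    [0.97, 0.01, 0.01, 0.01],
}
for name, dist in candidates.items():
    print(f"{name:18s} entropy = {entropy(dist):.3f}")
# uniform (the PoI)  entropy = 1.386   <- the maximum
# mildly skewed      entropy = 1.280
# nearly certain     entropy = 0.168
```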
So how does anyone challenge this? Several ways, each of which simply gets wrong how to correctly apply the PoI, or when it even can apply.
The Partition Problem
At Statistics How To there is a useful summary of the PoI and why it holds as well as some of the most common criticisms of it. The most important is called the Partition Problem: in any given case “the set of all possibilities can be partitioned in any number of ways, and if probabilities are assigned via the principle of indifference, we will get different answers each time.”
For instance, if you assume the absence of all pertinent information and then demarcate h and ~h as “God exists or God does not exist” you might assume the PoI entails P(h), in this case P(God exists), is 50%. But on another day you might demarcate h and ~h as “the Christian God exists or the Christian God does not exist” and assume the PoI entails the same conclusion. But it cannot be the case that the probability that just any god exists is exactly the same as that the specifically Christian God exists. Since there is a nonzero probability that a non-Christian God exists instead, the probability “that God exists” must necessarily be higher than the probability “that the Christian God exists,” as the former includes the latter—and all other gods. The “and” here is additive: all their probabilities add together. But you can’t add any positive number to x and still have x; you will always have some number larger than x.
This problem arises not only there. Because “God exists” contains a large number of assumptions—each of which must demarcate the probability space. For example, “God exists” entails “the supernatural exists,” but that the supernatural exists cannot be as likely as “the supernatural does not exist” and at the same time “God exists” be as likely as “God does not exist.” Because it is logically possible the supernatural exists and God does not. So P(supernatural) must be higher than P(God). It can’t be the same. And here the PoI should give us a different result: if we assume no information exists to render any of these possibilities more likely than its converse, then so far as we know P(supernatural) must be 50% and so far as we know P(God|supernatural) must be 50%. And that actually gets us the conclusion that P(God) is 25%, not 50% (as 50% x 50% = 25%).
And that’s generally the solution to the Partition Problem: to properly account for dependent probability in any hierarchy of assumptions. Since P(God|~supernatural), i.e. the probability that God exists if the supernatural doesn’t, is zero (unless we change what we mean by “God,” but then we would be talking about a different thing—more on that shortly), then necessarily the PoI first operates on the demarcation between the supernatural and the natural, and then operates on God only inside the domain of the supernatural. In other words, we are talking about P(God|supernatural) and not just P(God). Discovering this fact—which is a logical fact, about the definition of God (and thus what we are “actually” seeking the probability of)—is information that changes the probability.
So in a broad sense, the PoI doesn’t apply: because you know it is not “as likely as not” that God exists, but rather that it’s “as likely as not” that the supernatural exists, and then “as likely as not that God exists if the supernatural exists” (again, assuming we have no other information bearing on either question). But the PoI still narrowly applies: within the probability space where we have no distinguishing information. Within the set of all logically possible worlds in which “the supernatural exists,” we have no information indicating that “God exists” is any more likely than not. But that entails P(God) = 0.25, not 0.50, as just noted. So once we do the math correctly, the Partition Problem solves itself.
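Here is a minimal sketch of that bookkeeping, using the purely hypothetical PoI values just discussed (50% for the supernatural, 50% for God given the supernatural, and 0% for God without it):

```python
# Toy values from the discussion above (assumed for illustration, not empirical):
p_super = 0.5                # PoI split: supernatural vs. not
p_god_given_super = 0.5      # PoI split within the supernatural worlds
p_god_given_not_super = 0.0  # "God" as defined here entails the supernatural

# Law of Total Probability:
p_god = p_god_given_super * p_super + p_god_given_not_super * (1 - p_super)
print(p_god)  # 0.25, not 0.50
```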
Of course, this all assumes “God” is so primitively defined as to create no more logical partitions of the probability space. And that’s actually, in practice, not the case. When people say “God,” in the sense of the supernatural entity, they do not usually mean just any god with just any attributes. For example, they are not usually including a feeble, dumb, morally indifferent god as a possibility. Indeed, usually, they are excluding all finite “gods” and mean only “supremely wise and powerful and good” gods, which is already three partitions (wise vs. not wise, powerful vs. not powerful, good vs. not good) and none of them are subject to the PoI in this case. Because we are not demarcating the space between, for example, “more wise than not” and “less wise than not,” but between “supremely wise” and any degree of wisdom less than that—even gods who are superhumanly wise, as in far wiser than any human who ever existed, but still not “as wise as it is possible to be” (hence “supremely” wise).
If we know nothing that makes any degree of wisdom any more likely than any other, then the probability that a God would be “supremely wise” is in fact supremely low—because there are vastly many more degrees of wisdom a God can have, and if they all have the same probability, the sum of the probabilities of all other possible degrees but “supreme” will be vastly larger than the singular share of probability held by “supreme” wisdom. Again, this is all assuming we have “no information” changing any of these probabilities. So I’m just talking about what the epistemic probability would be if we had no evidence to refer to, nor comprehended any logical facts, that would change it. But even if we could somehow infer that “if there is a God, they must be super-humanly wise,” that still does not get us to “they must be supremely wise.” Indeed, given how badly designed and run everything is, there is rather evidence against that proposition. So the PoI does not apply here. It simply can’t get us to a 50% chance God is supremely wise, even with “no” information apart from the logically necessary.
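The arithmetic behind that point is simple enough to check: if, hypothetically, “degree of wisdom” were partitioned into N equally likely levels, only one of which is “supreme,” the PoI leaves that single level an ever-shrinking share (the level counts below are assumed purely for illustration).

```python
# Toy partitions of "degree of wisdom" into N equally likely levels (assumed counts):
for n_levels in (2, 10, 1_000, 1_000_000):
    print(n_levels, "levels -> P(supremely wise) =", 1 / n_levels)
# The more possible degrees of wisdom there are, the smaller the PoI share
# left over for the single "supreme" degree.
```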
And it only gets worse from there, once we start stacking other attributes, like “supremely powerful,” “supremely good,” “needs blood atonement magic to fix the universe,” “simultaneously has and is his own divine son,” “had to briefly become incarnate as a mortal member of the species Homo sapiens in order to effect his plans,” etc. Indeed, “had to,” “briefly,” “become incarnate,” “as a mortal,” “member of a specific species” are each their own demarcations of the available probability space. We have no information that entails it’s any more likely that God had to “briefly” be incarnate than that he could have been incarnate for, say, thousands of years, or that he had to be incarnate for more than a single terrestrial microsecond of time—or not at all. And so on down the line. This presents an enormous logical problem for Christianity, a point John Loftus lays out in “Christianity Is Wildly Improbable” (Chapter 3 of The End of Christianity).
And it’s not just Christianity that suffers this problem. God in any pertinent definition is hopelessly improbable. The PoI simply can’t get you to a 50% chance God exists, even prior to all empirical information pertaining to the matter. This is why in my chapter on the “Design” argument in The End of Christianity I did not define “God” by any superlative Santa’s bag of luckily convenient attributes, but as simply any universe-creating entity, no matter how stupid or messed up they may be, and whether supernatural or not. But even there I let slide the problem that being a self-existent intelligence does not entail being able to create whole universes, much less of remarkable complexity. “There is or is not a self-existent being” may be by the PoI 50/50, and “That being may or may not be an intelligence” may be by the PoI 50/50, for a combined logical (or “uninformed”) prior probability of 25% (0.50 x 0.50 = 0.25). And that is what I proceeded to use in that chapter. But I was being exceptionally generous to theism. Because “being an intelligence” does not entail even having any ability to create things, much less complex universes. “Being able or unable to create things” would be by the PoI another 50/50, which, honestly, would drop that 25% down to a mere 12.5%. But “Being able or unable to create complex universes” would have to be far less likely than that. The logical distribution of the possibility space simply does not make that a 50/50 proposition. Most logically possible “self-existent creating intelligences” will fall far short of being that capable.
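A minimal sketch of that stacking, using the toy 50/50 splits just described (the attribute list and the even splits are assumptions of the illustration, not results):

```python
# Each attribute, absent any information, splits its sub-space 50/50 per the PoI:
attributes = ["is self-existent", "is an intelligence", "is able to create things"]
prior = 1.0
for attribute in attributes:
    prior *= 0.5
    print(f"after '{attribute}': prior = {prior}")
# after 'is self-existent': prior = 0.5
# after 'is an intelligence': prior = 0.25
# after 'is able to create things': prior = 0.125
```

And as noted above, “is able to create complex universes” would have to take much less than another even split, dropping the prior further still.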
So to apply the PoI properly you have to make sure you are partitioning the probability space in a logically coherent fashion, which means in such a way as to account for dependent probabilities: that God exists is dependent on the supernatural existing; that God can pre-exist all universes is dependent on there being self-existent beings (since it is logically possible that gods can be caused by rather than causes of a universe); that a self-existent being could and would creatively design anything at all (much less universes) is dependent on that self-existent being being an intelligence and having creative powers and having that specific creative desire; and so on. Likewise, a partition must obey all the laws of probability (such as the Law of Total Probability). This can limit the PoI, in some cases even render it impossible to implement. The solution is to dial back your ambitions (e.g. stop trying to argue that your God must be “supremely” anything).
Creating Paradoxes
Consider for example that “God” (upper case) usually means a supernatural entity. But suppose we were to define “god” (lower case) as including non-supernatural entities, such as naturally occurring extraterrestrials, or super-advanced artificial intelligences they once created, maybe even in some other universe, such that our universe was entirely the creative project of either; and we’ll throw in all non-creating ontological equivalents of superheroes, too, like, say, the X-Men or Wonder Woman. Gods all.
If we then demarcated the probability space 50/50 between God and ~God and then demarcated the probability space 50/50 between god and ~god we will have constructed a contradiction: these cannot both be 50/50, as God is a sub-category of god, so P(God) must be less than P(god), not the same. But do note that, by attending to dependent suppositions, we already accounted for that earlier: P(God) is actually P(God|supernatural) whereas P(god) is P(God|supernatural or ~supernatural). So P(God) is, as found before, no more than 25%, which is indeed less than the 50% we would be assigning to P(god).
But what if we then demarcated a third kind of god, let’s call it the “paragod” category, wherein we include ordinary humans merely worshiped as gods? We can’t have P(paragod) be 50%, as god is a sub-category of paragod, so P(god) must be less than P(paragod). Here we have a new demarcation problem: how do we apportion the probability space between the god and paragod categories? And note that we can keep doing this indefinitely. We can keep defining new “god” categories to include “gods who only exist fictionally” and then another definition that includes “pets” and another that extends the definition to include even colorful rocks, and so on. How can the PoI adjudicate between all these different definitions of god?
This problem arises for literally anything. We can define “Jesus” several different ways; yet we can’t demarcate the probability space by 50/50 for every single definition, as that would create those same contradictions. Pick any hypothesis and the problem arises: merely redefine the hypothesis to be more (or less) inclusive than before, and you are faced with the question of why the one should be demarcated equally and not the other. For instance, consider the god and paragod and hyperparagod categories (the latter being the one that includes fictions, pets, and colorful rocks): which of these is subject to the PoI, and how? Of course we resolve this question in practice by appeal to existing information (“background knowledge”). We know fictions, pets, and colorful rocks exist (which fills out the “hyperparagod” category), and we even know ordinary yet deified humans exist (which fills out the “paragod” category), so we know P(hyperparagod) = 100% +/- s (where s is the trivially small probability we are wrong about that). The PoI therefore does not apply here. We have data.
But the question being asked here is: what if we didn’t know anything about all that? What if we had no information as to whether there were fictional gods, pets, colorful rocks, or mundanely deified humans? This would correspond to a case where we have a jar of marbles and no idea how many different colors of marble are in it. How should we assign a prior probability that any marble we draw from that jar will be, say, blue? Even if we demarcate the color space into the four primaries (red, yellow, blue, and green), such that any color we find we expect will fall within one of those four categories, we could use the PoI to get a 25% chance that a marble drawn will be blue. But then what happens when we learn about magenta, a color that actually does not fall into any of those four categories? That color is an equipoise of red and blue that has no corresponding wavelength of light, for which our brain generates a completely new, fifth color rather than a merely blended color (like orange, which is “red and yellow”). And there could logically be yet other colors that similarly can’t be fitted into that fourfold scheme.
Maybe, you might think, the magenta conundrum can be solved by classifying it under red, or under blue, or re-scheming the system into five rather than four, with “every other color” occupying that fifth category. But as that decision would be arbitrary, and each choice produces a different, and hence contradictory, demarcation of the probability space, the PoI would appear to be incapable of application here. And indeed that may be the case: we simply can’t resolve this situation with the PoI. But to be sure of that, we need to be sure we are not disregarding information, or sneaking information in and incoherently acting like we haven’t done that.
For example, in a sense the fivefold scheme is more logically coherent than the fourfold, as the fourfold straightforwardly violates the Law of Excluded Middle: it assumes without basis that there are only four colors, whereas it is logically necessarily the case that there could be other colors. In fact, the PoI by definition requires us to account for that: we cannot “assume we know” how many color categories there are, as that would be assuming information we don’t have; and if we did have such information, the PoI wouldn’t apply anyway. And as it so happens, we have only narrowed the color options in response to acquiring information about the human visual system. So we cannot abandon a catch-all category; and we’d have to assign it an equal likelihood, too, until we have information about the relative improbability (the relative infrequency) of something falling into that category. Moreover, starting out with the fourfold categorization was already assuming the existence of information, in particular our database of personal human experience with both chromatic phenomenology and the biology of our eyes and brain. Really the PoI can only apply outside the application of such information, or after our priors have already been impacted by that information.
The Color Robot
Think of the color problem like this: suppose we were to build a robot that knew nothing about what colors there were, or their distribution, but knew that there were colors, and could only learn about colors by interviewing people who see them (e.g. showing isolated persons objects and logging what colors they identify—and how consistently, thus establishing there is something they must really be seeing). Its most logical starting point would be to adopt the least informative guess that nevertheless satisfied the law of excluded middle, namely: “there are either no colors or there are one or more colors,” and assign each hypothesis a 50% probability in accordance with the PoI, and then, as it acquired information from exploring the probability space, update that probability.
Once this robot started accumulating reports about colors, the “there are no colors” hypothesis will rapidly drop to a vanishingly small probability (never zero, but close enough to act like it; if the robot could arrange to experience colors itself, then it could reach a zero probability in certain circumstances, but that distinction won’t matter for the present point). After a few reports, the robot would also already know it should sub-divide the “there are one or more colors” probability space into, at first, “there are two or more colors,” and then “there are three or more colors,” and so on, as it starts accumulating data about the number of colors.
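A minimal sketch of that first update (the likelihood figures here are invented for illustration; the point is only the direction and speed of the update):

```python
# Hypotheses: H0 = "there are no colors", H1 = "there are one or more colors".
# PoI starting point: 50/50. The likelihoods below are assumed toy values.
p_h0, p_h1 = 0.5, 0.5
p_report_given_h0 = 0.01  # a consistent color report despite there being no colors
p_report_given_h1 = 0.99  # a consistent color report when colors really exist

for n in range(1, 6):  # five independent consistent reports, one Bayesian update each
    joint_h0 = p_h0 * p_report_given_h0
    joint_h1 = p_h1 * p_report_given_h1
    total = joint_h0 + joint_h1
    p_h0, p_h1 = joint_h0 / total, joint_h1 / total
    print(f"after report {n}: P(no colors) = {p_h0:.2e}")
# P(no colors) plummets toward (but never reaches) zero.
```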
Eventually this robot will have a large enough database to indicate that the hypothesis “there are mostly four colors, and on rare occasions some other colors” is the most probable. Somewhere in that process it may have needed the PoI to demarcate possibilities, but only when it genuinely lacked information with which to do so. Yet it might never have needed it at all. Upon the first report of a second color, it would demarcate the sub-space of “there are one or more colors” into “there is only one color” and “there are two or more colors,” but not by using the PoI, as it will have constructed those relative probabilities out of the data directly. Because it is not in an uninformed state at any point. Likewise when it starts splitting the space further, e.g. into “there are three or more colors,” and as it starts building frequency estimates for each color. If it ever does need the PoI, it will only be when it lacks data to demarcate by; and then as that data arises, its frequency estimates will no longer be based on the PoI, but the data.
We are, actually, the end products of essentially that robot: our brains have similarly been building color data since birth. And so when we ask “what is the prior probability that this urn full of marbles will have a blue marble in it” we can already assign a probability based on information, not the PoI. We have a rough idea of how relatively common blue marbles are, or if we lack experience with marbles, at least of how relatively common blue objects are, as opposed to “magenta” or even “colors never before experienced.” So the conceptual problem posed against the PoI here actually never applies. At no point have we ever consciously needed or heeded the PoI in building our priors for colored objects being in urns. So even if that couldn’t be done, owing to conceptual ignorance about how to demarcate the probability space, that doesn’t matter. That the PoI won’t apply where we never need it to is not a valid criticism of it.
The same holds for the gods, paragods, and hyperparagods case: since we have information there, any conundrums that might arise from an uninformed state aren’t pertinent. And even if we want to hypothesize how we’d get from one state to the other (from knowing nothing about whether pets exist, for example, to knowing they do—a mathematical process that indeed has occupied the computational resources of our brains since infancy), and want to know how we would do that through an application of the PoI, we have the robot analogy: we probably never needed the PoI, and even when we did, it will have been washed out by abundant data by now, so is no longer pertinent. Our demarcation of any probability space evolves as we acquire more information. The PoI only applies to what we don’t know, not to what we do, and all it requires is that we keep any demarcation it would entail logically coherent, with itself and the laws of probability, and all the information we do have and are using.
For instance, we have lots of information establishing the definite existence of paragods and hyperparagods. So the PoI simply doesn’t apply to them. And if it ever did, it would have rapidly evolved into the informed prior we are now using anyway, just as with the color robot. Whereas we have no information establishing the existence of gods or Gods, so absent specific evidence brought to bear on either (e.g. imagine yourself in a hypothetical culture never imagining them, and then encountering a culture that did), the PoI there applies. And it applies according to any logically entailed hierarchy of dependent probabilities. Hence the moment we encountered a culture claiming gods or God exist, but before we started examining any evidence regarding the truth of those claims, for us P(god) = 0.50 and P(God) = 0.25, because conceptually “God” is a dependent probability in a way “god” is not (naturally existent gods logically require nothing else be the case than what is already known to be the case, i.e. the known laws of physics, etc., whereas Gods do logically require a wholly unverified regime of facts: the existence of the supernatural).
In short, probability-space must always be demarcated according to the information you have. The PoI only demarcates those probability spaces (or sub-spaces) for which you have no information. In cases where the information we want to abstract away is too immensely complex (such as regarding the number of possible colors), we simply can’t perform any hypothetical information subtraction so as to get to any application for the PoI. In those cases, we can’t use the PoI. But many cases are not so vexed. Whether the supernatural exists or not, for example, is a proper binary split of possibilities, absent any distinguishing information (and by now, in truth, we have a lot of distinguishing information). But even something like whether Jesus exists or not can be reduced to a simple binary. The fact that we “could” define “Jesus exists” in various other ways won’t matter.
For example, if we defined “Jesus exists” such as to include even imagined and not real Jesuses, we already know the PoI would cease to apply: on abundant data we know the existence of imagined Jesuses is as close to 100% certain as makes all odds. Likewise any other definition of “Jesus exists”—except that definition for which (1) there is no information telling one way or the other, but for the information we have abstracted away to then re-examine (as in my book On the Historicity of Jesus, everything surveyed in chapters six through eleven), and (2) which is the most explanatorily basic. That means the definition of “Jesus” that requires the fewest possible assumptions and at the same time leaves us in a genuine state of not knowing whether that Jesus is more likely to exist or not (until, of course, examining the evidence we have flagged as pertaining to that question). That is the only definition of Jesus that will logically split the probability space evenly (see On the Historicity of Jesus, pp. 31-35). It is therefore the only definition of Jesus to which the PoI would apply—and even then, background information quickly changes that probability anyway (see On the Historicity of Jesus, pp. 238-44).
In short, there are situations to which the PoI does not apply, either because overwhelming information already exists or because of conceptual ignorance or the conceptual complexity of a question. In those cases, we actually cannot even clearly define what question it is we are asking (“How many colors could there be?”), so it is not surprising it can’t be assigned a probability. You have to be able to define a thing clearly to get a probability distribution for it. But when you can do that, and you don’t have any dispositive information, the PoI does apply, and indeed must be applied. Because then the PoI simply correctly describes what ignorance entails.
Resolving Paradoxes
For the Partition Problem, the first example given at Statistics How To is the Kukla paradox: “assume for a moment that you have no idea if life is possible on other planets, and life on an Earth like planet is equally possible” and then you can get contradictory likelihoods of life existing depending on how you partition the logical possibilities.
But this only happens here when an equivocation fallacy is introduced: conflating different senses of “life exists.” In one case, “life exists on a given planet,” and in the other case, “life exists anywhere at all.” Those are not the same thing. The probability of the latter must necessarily be much larger than that of the former—because it is logically necessarily the case that the probability of life existing anywhere in n number of planets equals the converse of the product of the probabilities of life not existing on any one planet. Which means P(any life) = 1 – (1 – p)^n, where p is the probability of life per planet and n the number of planets. For example, if there is a 10% chance of life on any one planet and there were only one hundred planets, then the probability that at least one of those planets has life would be the converse of the probability that none of them do, and that comes to 1 – 0.9^100, or about 1 – 0.000027 = 0.99997, or roughly 99.997%. Of course, we know empirically that the probability of life on any one planet must actually be far lower than 10%, much less anywhere near “equally possible” (which would mean 50%). So, again, the PoI wouldn’t really apply anyway.
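A quick check of that formula (the 10% per-planet figure and the hundred planets are just the toy numbers used above):

```python
p = 0.10   # assumed probability of life on any one planet
n = 100    # assumed number of planets
p_life_anywhere = 1 - (1 - p) ** n
print(p_life_anywhere)  # ~0.99997, i.e. about 99.997%
```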
The second paradox listed is the Light and Urn conundrum:
Suppose you have a jar of blue, white, and black marbles, of unknown proportions. One is picked at random, and if it is blue, the light is turned on. If it is black or white, the light stays off (or is turned off). What is the probability the light is on?
Supposedly you could partition the probability space between either “light is on” and “light is off,” in which case the PoI would give you a 50% chance the light is on. Or you could assign each type of marble an equal probability of being drawn (since you have no information entailing there are more of any of the three marbles than any other, so “as far as you know” there are as many of each), in which case the PoI would give you a 33% chance the light is on. In “Explanationist Aid for the Theory of Inductive Logic”, Michael Huemer concludes the latter is the only logically correct demarcation, because the drawing of marbles is causally prior to the condition of the light, and we cannot presume there are more blue marbles than either black or white marbles. Until we have information indicating such a thing, “so far as we know” there are no more blue marbles than black or white—nor less.
This solution actually reduces again to the role of dependent probability: P(light on|marble drawn) is asking what the probability is of the light being on given which marble is drawn. So in fact P(light on) = P(blue marble drawn), and the PoI entails P(blue marble drawn) = P(black marble drawn) = P(white marble drawn). Until we learn anything about there being a different number of any of these colors of marble in the jar. But then the PoI doesn’t apply anymore. This can even be proved in a formal Laplacean fashion: suppose we know the jar contains 99 marbles; we can add up every possible combination of marbles, giving us every logically possible answer to the question, and literally count how many of those possibilities leave the light on and how many leave it off. The answer will be as Huemer finds: one out of every three times. Not one out of every two.
To see why, consider an easier case: there are only 6 marbles in the jar. The PoI entails each possible combination is as likely as any other (until we know anything indicating otherwise). This gives us these ten possible arrangements, and their consequence to the probability the light is on:
- 1 blue, 1 white, 4 black = 1/6
- 1 blue, 2 white, 3 black = 1/6
- 1 blue, 3 white, 2 black = 1/6
- 1 blue, 4 white, 1 black = 1/6
- 2 blue, 1 white, 3 black = 2/6
- 2 blue, 2 white, 2 black = 2/6
- 2 blue, 3 white, 1 black = 2/6
- 3 blue, 1 white, 2 black = 3/6
- 3 blue, 2 white, 1 black = 3/6
- 4 blue, 1 white, 1 black = 4/6
Since each possible arrangement (ten in all) is equally likely (as we can give no reason to assign any a greater likelihood than any other), it logically necessarily follows that the total probability of drawing a blue marble considering every possible arrangement is:
(1/10)(1/6) + (1/10)(1/6) + (1/10)(1/6) + (1/10)(1/6) + (1/10)(2/6) + (1/10)(2/6) + (1/10)(2/6) + (1/10)(3/6) + (1/10)(3/6) + (1/10)(4/6) = 1/60 + 1/60 + 1/60 + 1/60 + 2/60 + 2/60 + 2/60 + 3/60 + 3/60 + 4/60 = 20/60 = 1/3
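That count can be automated for any jar size, which is how the 99-marble case mentioned above works out the same way. A minimal sketch (it assumes, as in the table above, at least one marble of each color):

```python
from fractions import Fraction

def average_p_blue(n_marbles):
    """Average P(blue drawn) over all equally likely compositions of the jar,
    with at least one marble of each of the three colors (as in the table above)."""
    total, count = Fraction(0), 0
    for blue in range(1, n_marbles - 1):
        for white in range(1, n_marbles - blue):
            # black = n_marbles - blue - white (always >= 1 in these loops)
            total += Fraction(blue, n_marbles)
            count += 1
    return total / count

print(average_p_blue(6))   # 1/3 -- the ten-arrangement case just tallied
print(average_p_blue(99))  # 1/3 -- the 99-marble case mentioned earlier
```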
Thus, the PoI only produces a logically valid prior probability when you account for every possible configuration of a system, and assign each an equal likelihood to every other. And there are plenty of short-cuts to do that by. For instance, there are infinitely many “supernatural universes” and infinitely many “nonsupernatural universes,” but you don’t have to count up infinities. You already know there is no reason to assume there are any more or fewer configurations of one than the other, so they’d all add up to 50/50 anyway. Unless of course you did know there had to be more possible configurations of one than the other (by some logical proof, say); in which case that knowledge would be incorporated into your distribution, and the PoI would no longer apply.
Huemer conceptualized this as looking for “the most causally basic” distribution (as here, that would be of the marbles, whose distribution causes the configuration of the light), specifically in what he calls “metaphysical” terms. But all that really is capturing is this role of dependent probability: the probability of the light’s configuration is dependent on the distribution of the marbles. A correct PoI takes this into account. And taking it into account eliminates all Light and Urn paradoxes.
For instance, in the demarcation “either Jesus existed or Jesus did not exist,” there are countless different configurations of facts on which either state would hold, but neither is dependent on anything else the other is not (e.g. either is compatible with the fact that information has been selected and curated; and neither requires that not be the case), so this demarcation cleanly divides the probability space into 50/50. Without specific information indicating it, there is no reason to start by assuming either “Jesus existed” or “Jesus didn’t exist” will be any more likely. Only information can get that result. And by that point you’ve left the PoI behind. Otherwise, there is no logical way to demarcate the probability space as would render either hypothesis subordinate to some other assumption that would divide that space any differently than it would for the competing hypothesis—such can only be done with information (as I show in On the Historicity of Jesus, pp. 52-55).
Likewise the Bertrand Paradox. For instance:
If we know nothing about the nationality of a person, we might argue that the probability is equal that she comes from England or France, and equal that she comes from Scotland or France, and equal that she comes from Britain or France. But from the first two assertions the probability that she belongs to Britain must be at least double the probability that she belongs to France.
But this commits a logical error, indeed, violating the Law of Total Probability via a kind of Masked Man fallacy, if we are supposing we somehow don’t know that England and Scotland are parts of (Great) Britain. The statement that these probabilities are equal entails affirming that none of those nations overlap; once we discovered they overlap, we would discover our error, and revise. The PoI only operates on correctly defined systems, so finding that it creates contradictions when applied to contradictory systems is no revelation. The contradiction there arises from the illogical definition of the system, not from the PoI itself. And of course, once we have information (e.g. the relative population sizes of different nations), then the PoI no longer applies. Because then we have information.
Similarly, Bertrand’s original formulation of this paradox was regarding choosing a “random chord” within a circle, and finding there are many contradictory ways to randomize chord selection (the same problem arises with the many ways to randomize “the size of a cube,” e.g. by height or surface area or volume; and so on). But as several mathematicians have pointed out (here and here), when asking about the epistemic probability of a chord being selected (or a “cube’s size” being selected or anything the like), you need to be able to define what you mean: how is this selection happening? Otherwise you simply haven’t defined the question clearly enough to even know what you are talking about, much less how likely it is. Often information exists that already defines this (e.g. establishing how a chord is being randomly selected or a cube is being sized), in which case the Bertrand Paradox is eliminated. Whereas in the genuine case of our not knowing that, a correct application of the PoI would give us an epistemic probability equal to the average over all possible ways of randomly selecting a chord (or sizing a cube, and so on).
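As a crude illustration of that averaging move (restricted, for simplicity, to the three classical selection methods rather than every possible one):

```python
# P(a random chord is longer than the side of the inscribed equilateral triangle),
# under the three classical selection methods of Bertrand's chord problem:
classical_answers = {
    "random endpoints on the circle": 1 / 3,
    "random point on a random radius": 1 / 2,
    "random midpoint in the disc": 1 / 4,
}
# If we genuinely don't know which selection procedure is in play, and have no reason
# to favor any, the move sketched above is to average over them:
print(sum(classical_answers.values()) / len(classical_answers))  # ~0.361
```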
The end result of that process, incidentally, matches what’s called the Jaynes condition of “not assuming any specificity” as the only correct definition of ignorance: there is actually only one way to randomize chord selection within a circle that does not produce orderly patterns across the circle; all other methods therefore are not fully random, and thus ruled out by the PoI. They require “added assumptions.” The PoI is instructing you not to do that, precisely because you have no information making those assumptions likely. So they ought not enter in—unless and until they are no longer assumptions, but evidenced conditions. In other words, ignorance must remain correctly defined as ignorance. You must not import knowledge you don’t have. Likewise with sizing a cube: absent any reason to do otherwise, you must average the probabilities across all possible ways to size a cube. Because, after all, if you don’t know, you don’t know. And that lack of knowledge must be mathematically represented. Though really, usually we do know (e.g. how we should be randomizing a cube’s size in a given case).
Thus, contrary to critics, the epistemic consequences of the PoI are not “magic.” They are the logically necessary outcome of any sound epistemic definition of ignorance combined with the laws of probability.
What We Learn
“But can’t we just say we don’t know what the probability is?” We should never affirm what we do not know, right? That’s true as a general principle, but it is not relevantly applicable here. Because when you have no dispositive information, you always in fact do know one thing: that at that point some probability is no more likely to be high than low. And that entails a probability of 50% (when demarcating the space between two equally likely possibilities), as that’s the only probability “that is neither high nor low.” Again, until you learn anything indicating otherwise, whether it be an empirical fact or a logical one.
Another way to think of this is: the absence of all causal determinism logically entails randomness. When we define “cause” as any set of circumstances, including any law of logic or physics affecting the given outcome, then you can only have one or the other; there is no third option: either something is causing the outcome; or nothing is. And “nothing determines the outcome” is logically synonymous with “the outcome is completely random.” Which means, by definition, every possible outcome is equally likely. Therefore, if you are in a state whereby there is no reason to believe anything is causally determining the outcome (not even logical necessities), then you are by definition in a state of assuming the outcome is as likely as not completely random. The PoI then follows.
So when you don’t know of anything that would cause the facts to be other than random, so far as you know those facts are random. Only knowledge can change the default state of “it’s random” to “something is determining the outcome nonrandomly.” Thus, in the absence of such knowledge, you are in a state of knowing that it may as well be random. Obviously, not to a certainty. It’s 50/50 whether it’s random or not. But even averaging over all possible conditions, random and nonrandom, that leaves us with the average probability, which will be the same as an equal distribution across every possibility. And that is precisely what is captured by the PoI: it is merely the epistemic definition of the logical consequence of ignorance.
Not only is this simply stating mathematically the consequences of not knowing anything, the same principle entails “knowing” that knowing anything more could change that conclusion—and most importantly, that only knowing anything more could change that conclusion. Since it’s logically necessarily the case that what you are evaluating can deviate from random only if something exists that ensures it, and you have no knowledge of any such thing, you then must represent the absence of that knowledge mathematically. And when you do, what you get is the PoI.
Imagine again a jar, this time of white and black marbles of unknown count and randomly mixed, and you start drawing marbles, and then start inferring the distribution of marbles in the jar from what you observe being drawn. The more draws you make, the lower the probability gets for certain distributions, and the higher the probability gets for others. But this all assumes your prior probability started even: that there are as many white as black marbles. That is the only starting hypothesis that will give you a result (the most likely distribution of marbles after, say, ten draws) that will more frequently approach the real distribution. If you arbitrarily start with an assumption of some other distribution (e.g. “1 in every 10 marbles is white”), and you do that for no reason, you are introducing a larger possible error than if you start by assuming an even distribution. Thus the PoI simply gives you the option that will entail the least epistemic error.
And that is because assuming an even distribution averages out all possible degrees of error: e.g., so far as you know, “it’s no more likely to be 1 in 10 than 9 in 10” so either assumption is equally as likely to be wrong. This follows from the very definition of ignorance. In fact if you were to do the math for every possible starting assumption (every possible distribution of marbles, combined with observing ten random draws), and average over them all, you end up with the same result as if you simply started by assuming a distribution of 50%. All possible errors, averaged over into a combined estimate of probability, just gets you there anyway. So you may as well start there.
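A minimal check of that averaging claim, for a toy ten-marble jar (the jar size is arbitrary):

```python
n = 10
# Every possible composition of the jar, from 0 white marbles up to all 10:
possible_fractions = [white / n for white in range(n + 1)]
# Averaging over all of them (each assumed equally likely, per the PoI):
print(sum(possible_fractions) / len(possible_fractions))  # 0.5
```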
This is indeed why statistics works as a science, as most succinctly explained in William Faris’s review of Jaynes. For example, if you randomly select 100 people out of a given population and ask whether they identify as a man or a woman, statistical science already provides for how to average over every possible distribution and compare each possible “actual” distribution with the observed number answering a certain way, to thereby get a probability that that is the number you would observe. And when you average over all those possible outcomes (applying, again, the Law of Total Probability), what you end up with, mathematically, is the same as if you started by assuming half would have identified as men and half as women, and then corrected that assumption as data came in.
For instance if all 100 answer “man,” the probability that the actual distribution was 50/50 will be extremely low—even if you started out assuming it was 50/50. Indeed you can calculate the probability that all 100 would so answer, for every possible “actual” distribution, and weigh each possible distribution according to how many ways there are to get that result, and calculate the total average probability. And what that yields is the actual distribution, to a calculable probability. And yet that all depends on the PoI: you start out assuming that every possible distribution is equally likely (as you have no information yet suggesting otherwise); the mathematical effect of which is identical to assuming from the start that the distribution was simply 50/50. So the obvious short-cut is simply to assume that—absent any reason to assume otherwise. If you’re wrong, information will correct you.
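A minimal sketch of that calculation (the grid of candidate “true” fractions is an assumed discretization for illustration):

```python
grid = [i / 100 for i in range(101)]      # candidate true fractions answering "man"
prior = [1 / len(grid)] * len(grid)       # PoI: every candidate equally likely
likelihood = [f ** 100 for f in grid]     # chance all 100 people sampled answer "man"
unnormalized = [p * l for p, l in zip(prior, likelihood)]
posterior = [u / sum(unnormalized) for u in unnormalized]
print(posterior[50])   # P(true fraction = 0.50 | data): astronomically small (~5e-31)
print(posterior[100])  # P(true fraction = 1.00 | data): the most probable candidate
```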
Thus the PoI is actually an accurate representation of what you do in fact know, when you otherwise know nothing. It is simply, literally, the definition of ignorance.