Friday, 12 June 2009

A Bayesian Argument Against Induction

Have I gone mad, taking a favoured tool of those who accept induction and using it to argue against them? I don’t believe so; this is quite calmly considered. I am well aware, however, that I am writing using something (maths) I do not have a full grasp of. This may not be madness, but it is fertile ground for internet crankery. Internet cranks start off making simple, basic errors in areas they do not fully understand and, on the basis of these errors, leap to wild conclusions. Now, I haven’t quite gone in for the other aspects of internet crankery, such as branding anyone who disagrees with me as part of a conspiracy, or cackling madly. But I am worried that what follows contains a really basic error. If it does, I would be very grateful to anyone who pointed it out to me.

Induction and Abduction

Take a number of hypotheses that make a prediction, one way or another, about a piece of evidence “E”:

1. H1, H2, H3 and H4, which have been formulated, and
2. A number of hypotheses H5 to HN which haven’t been formulated yet. For brevity let’s call all these “HU”.

Before anything else (evidence, abduction or induction), we allocate a range of probabilities over these hypotheses.

1. We may firmly believe H1: “H1 is true”.
2. We may spread the probability over the four formulated hypotheses: “Either H1, H2, H3 or H4 is true” or
3. We may decide that we should include the unformulated propositions: “Either H1, H2, H3, H4 or HU is true”.

Whatever distribution of probabilities we decide on, the sum of all of them does not depend on that distribution: it is always 1. If we reject a hypothesis we reduce its probability to zero. To keep the sum of probabilities stable we must increase the probability assigned to the remaining hypotheses. Without a principle, inductive evidence or prejudice to guide us, we are free to distribute this increase in probability however we wish over the non-rejected propositions. That is to say that no particular proposition is singled out as being more likely to be true. All that we are forced to accept is a disjunction of the hypotheses left. This is abduction and is deductive:

1. Either H1, H2, H3 or H4
2. It’s not H3
3. Thus either H1, H2 or H4

is perfectly valid.
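As a minimal sketch of the bookkeeping above: the snippet rejects H3 and rescales what is left so the total stays at 1. Rescaling proportionally is only one choice (an assumption made here for illustration); the argument above leaves the redistribution open. The function name `eliminate` and the equal starting probabilities are hypothetical.

```python
from fractions import Fraction

def eliminate(beliefs, rejected):
    """Set the rejected hypothesis to zero and rescale the rest so the total stays 1.

    Proportional rescaling is an illustrative assumption, not the only option.
    """
    remaining = {h: p for h, p in beliefs.items() if h != rejected}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("no probability left on the remaining hypotheses")
    updated = {h: p / total for h, p in remaining.items()}
    updated[rejected] = Fraction(0)
    return updated

# Hypothetical starting point: "Either H1, H2, H3 or H4", weighted equally
prior = {h: Fraction(1, 4) for h in ("H1", "H2", "H3", "H4")}
after = eliminate(prior, "H3")
print(after["H1"], after["H2"], after["H4"], after["H3"])  # 1/3 1/3 1/3 0
```

With equal starting weights the survivors stay equal, which matches the claim that elimination alone singles out no particular survivor.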

Induction, however, confirms a proposition. It does pick out a specific hypothesis and make that hypothesis more likely. If we have evidence for, say, H2 then we must increase the probability we assign to H2. If we have evidence against H3 then we could redistribute all that probability to H1 (say we started off believing "it's either H1 or H3" and falsified H3, we'd then believe "it's definitely H1"). If we have evidence that is solely supportive of H2 then we would have to increase the probability we assign to H2 alone.

Induction, evidence for, picks out a hypothesis, or group of hypotheses, and increases the likelihood we should attach to them. How we spread the necessary reduction in other probabilities is up to us. Abduction, evidence against, picks out a hypothesis or group of hypotheses and decreases the likelihood we should attach to them. How we spread the necessary increase in other probabilities is up to us.

What I hope to show is that any movement in the probability assigned to a hypothesis is either abductive or unrelated to the evidence. If I show this I will consider that I have shown that induction does not exist.


Bayes’ Theorem

To show this I am using the arguments of those who believe in induction against them. Their current favourite tool is Bayes’ Theorem, which states:

P(H¦E) = P(E¦H) P(H) / P(E)

Where:

P(H¦E) = The probability of a hypothesis, “H”, given an item of evidence, “E”

P(E¦H) = The probability of the evidence given the hypothesis

P(H) = The probability of the hypothesis before considering the item of evidence (the “prior probability”)

P(E) = the probability of the evidence arising (without direct reference to the hypothesis)
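A small numerical sketch of the formula may help. All the numbers below are made up for illustration, and P(E) is computed from H and not-H by the law of total probability, which is an added step not spelled out above.

```python
from fractions import Fraction

def posterior(p_e_given_h, p_h, p_e):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    if p_e == 0:
        raise ValueError("P(E) must be non-zero")
    return p_e_given_h * p_h / p_e

# Hypothetical inputs, chosen only to make the arithmetic easy to follow
p_h = Fraction(1, 4)              # prior probability of H
p_e_given_h = Fraction(4, 5)      # probability of E if H is true
p_e_given_not_h = Fraction(1, 5)  # probability of E if H is false

# P(E) = P(E|H) P(H) + P(E|not-H) P(not-H)
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

p_h_given_e = posterior(p_e_given_h, p_h, p_e)
print(p_e, p_h_given_e)  # 7/20 4/7
```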

(For an explanation of Bayes’ theorem that even I can understand (after two or three readings) go to Eliezer Yudkowsky’s )


Competing Predictions

Return to the example list of hypotheses above and let’s say that H1 and H2 predict E whilst H3 and H4 predict not-E. Not having formulated HU, we can’t tell whether those hypotheses predict or forbid E.

We then see E.

This, naturally, should make us downgrade our belief in H3 and H4. It follows that we should increase the level of probability assigned to “H1, H2 and HU” but says nothing about how we should now distribute that probability across these hypotheses. So far, so abductive. This is because we have just considered the negative effect on H3 and H4 and not the positive, inductive, support that E gives any one of H1, H2 or HU.
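To put numbers on this step, here is a rough sketch that updates all the formulated hypotheses at once. It leaves HU out (a simplifying assumption, since nothing is known about what HU predicts), treats "predicts E" as P(E¦H) = 1 and "predicts not-E" as P(E¦H) = 0, and uses made-up priors. The helper `update` is hypothetical.

```python
from fractions import Fraction

def update(priors, likelihoods):
    """Apply Bayes' theorem to every listed hypothesis at once."""
    # P(E) by the law of total probability over the listed hypotheses
    p_e = sum(likelihoods[h] * priors[h] for h in priors)
    if p_e == 0:
        raise ValueError("the evidence has probability zero under these priors")
    return {h: likelihoods[h] * priors[h] / p_e for h in priors}

# Hypothetical priors; HU is omitted for simplicity
priors = {"H1": Fraction(2, 5), "H2": Fraction(1, 5),
          "H3": Fraction(1, 5), "H4": Fraction(1, 5)}
# H1 and H2 predict E; H3 and H4 predict not-E
likelihoods = {"H1": Fraction(1), "H2": Fraction(1),
               "H3": Fraction(0), "H4": Fraction(0)}

posteriors = update(priors, likelihoods)
print(posteriors["H1"], posteriors["H2"])  # 2/3 1/3
print(priors["H1"] / priors["H2"], posteriors["H1"] / posteriors["H2"])  # 2 2
```

In this sketch H3 and H4 drop to zero, while H1 stays exactly twice as probable as H2 before and after seeing E.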


Equal Predictions

How does E inductively affect, say, H1? It’s no good showing that the probability of H1 after the evidence, P(H1¦E), is greater than the probability of H1 before the evidence, P(H1). We know that already by abduction. To show inductive support we need to show that P(H1¦E) is greater than the probability of another unfalsified proposition, say H2. We need to show that P(H1¦E) > P(H2¦E) or, from Bayes’ Theorem:

P(E¦H1) P(H1) / P(E) > P(E¦H2) P(H2) / P(E)

P(E) is on both sides, so we can cancel it out:

P(E¦H1) P(H1) > P(E¦H2) P(H2)

In this part of the argument I am assuming that both equally predict the evidence, thus P(E¦H1) = P(E¦H2) and they can be cancelled out:

P(H1) > P(H2)

So our comparison of posterior probabilities depends upon the prior probabilities. How could they be different? If the difference has absolutely nothing to do with prior evidence then, on the evidence, P(H1) = P(H2) and thus P(H1¦E) = P(H2¦E). So any evidential support of E for H1 over and above H2 depends on prior evidential support for H1 over H2. That prior support can’t be abductive, because abduction increases the probability of H1 and H2 without favouring either. So the effect of the prior evidence on the prior probabilities depends on the prior-prior probabilities going into that particular calculation. The regress goes back, with differentials in prior probabilities depending on previous prior probabilities, until we get back to before any evidence whatsoever. If, and only if, a differential here depends on evidence will any of the later differentials in prior probabilities depend on evidence. Of course, a differential that depends on evidence, yet arises before any evidence, is an absurdity. Thus, where hypotheses predict evidence with equal probability, any support beyond that given by abduction is not from evidence but merely the reinforcement of non-evidential belief.
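The comparison in this section can also be written as a ratio, which is sometimes easier to check than the inequality. This is only a restatement of Bayes’ Theorem as given above, written with the usual vertical bar in place of “¦”, and it assumes P(E), P(H2) and P(E¦H2) are all non-zero.

```latex
\[
\frac{P(H_1 \mid E)}{P(H_2 \mid E)}
  = \frac{P(E \mid H_1)\,P(H_1)/P(E)}{P(E \mid H_2)\,P(H_2)/P(E)}
  = \frac{P(E \mid H_1)}{P(E \mid H_2)} \cdot \frac{P(H_1)}{P(H_2)}
\]
% With equal predictions, P(E | H_1) = P(E | H_2), the first factor is 1:
\[
\frac{P(H_1 \mid E)}{P(H_2 \mid E)} = \frac{P(H_1)}{P(H_2)}
\]
```

So when both hypotheses predict E equally, the ratio of the posteriors is exactly the ratio of the priors.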


Unequal Predictions

If on the evidence P(H1) = P(H2) then P(E¦H1) > P(E¦H2) will result in P(H1¦E) > P(H2¦E).
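A quick check of this claim with made-up numbers: equal priors, H1 predicting E with certainty and H2 predicting E one time in two. Restricting attention to just these two hypotheses is a simplifying assumption, and the helper `update` is hypothetical.

```python
from fractions import Fraction

def update(priors, likelihoods):
    """Apply Bayes' theorem to every listed hypothesis at once."""
    p_e = sum(likelihoods[h] * priors[h] for h in priors)
    if p_e == 0:
        raise ValueError("the evidence has probability zero under these priors")
    return {h: likelihoods[h] * priors[h] / p_e for h in priors}

# Equal priors, unequal predictions (illustrative values only)
priors = {"H1": Fraction(1, 2), "H2": Fraction(1, 2)}
likelihoods = {"H1": Fraction(1), "H2": Fraction(1, 2)}

posteriors = update(priors, likelihoods)
print(posteriors["H1"], posteriors["H2"])  # 2/3 1/3
```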

However any hypothesis that predicts evidence with probability of less than 1 can be readily converted to a hypothesis that gives a definite prediction by adding a hypothesis about the probability predictions. If H2 predicts E will occur one in two times then the following formulations of H2* will predict that E will occur all the time:

H2* : H2 and “whatever combines with H2 to produce E” or

H2* : H2 and “it just happens to be one of those times when H2 does produce E”

For ease, let’s call the phrases in inverted commas “W” (“whatever”), with W2 being the one that goes with H2. H2* becomes “H2 and W2”.
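The construction can be sketched as a toy model. It assumes (this is not stated above) that under H2 the evidence E occurs exactly when W2 holds, and that W2 holds one time in two. The function name and the labels for the variants are hypothetical.

```python
from fractions import Fraction

# Assumption for this toy model: given H2, E occurs exactly when W2 holds,
# and W2 holds one time in two.
P_W2_GIVEN_H2 = Fraction(1, 2)

def p_e_given(variant):
    """Probability of E under each variant of H2 in the toy model."""
    if variant == "H2 and W2":      # this is H2*
        return Fraction(1)
    if variant == "H2 and not-W2":
        return Fraction(0)
    if variant == "H2":
        # Law of total probability over W2 and not-W2
        return P_W2_GIVEN_H2 * 1 + (1 - P_W2_GIVEN_H2) * 0
    raise ValueError(f"unknown variant: {variant}")

print(p_e_given("H2"), p_e_given("H2 and W2"), p_e_given("H2 and not-W2"))  # 1/2 1 0
```

In this model H2 on its own predicts E half the time, while “H2 and W2” predicts E every time.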

The same argument can be applied to H1, to create H1* (or “H1 and W1”) which makes a definite prediction of E. If we do that then, from the analysis of hypotheses that predict the evidence with equal certainty, we know that evidence cannot favour one over the other. So there cannot be any evidential support of H1* over H2*. Can we assess the evidential support of H1 and H2 without the additional factor?

No, because, given E, “H2 and not-W2” is falsified. Thus “H2 and W2”, or H2*, is the only “live” hypothesis. The same argument applies to “H1 and W1”. This gets us back to the equality (on the evidence) of prior probabilities and no variation (on the evidence) of posterior probabilities. Thus evidence fails to favour one hypothesis over any other non-falsified hypothesis. Induction does not exist.

4 comments:

Anonymous said...
This comment has been removed by a blog administrator.
XXX XXX said...

Nah.

Unknown said...

Presumably the H2* is used to make definite, non-probabilistic predictions in future experiments. But H2* does not necessarily predict E for subsequent experiments. W might actually be a specific pattern of pseudo-random noise caused by outside factors, for example.

This means that there are other possible patterns of pseudo-random noise, each of which implies a different sequence of predictions when combined with H2. Some of these new hypotheses predict E for the first experiment. Because of this, "H2 and not-W" is not falsified when E is observed.

Alternatively, H2* is indefinite and probabilistic after the first experiment. But if this is the case, your criticism no longer applies: Simply perform the experiment more than once and you can adjust the posterior probabilities of H2 and H1 like normal.

Frank Butterman said...

This may sound cheap, but Bayesianism works with equations. An equation is, by definition, not inductive. It is, therefore, impossible to use Bayes' theorem to induce anything. An equation is a subset of deductively valid relations, because each side of an equation is, in the parlance of formal logic, semantically equivalent.