What are the odds, II: the Venezuelan presidential election
What's new 2024-08-02
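In a previous blog post, I discussed how, from a Bayesian perspective, learning about some new information $E$ can update one’s perceived odds ${\mathbb P}(H_1)/{\mathbb P}(H_0)$ about how likely an “alternative hypothesis” $H_1$ is, compared to a “null hypothesis” $H_0$. The mathematical formula here is

$$\frac{{\mathbb P}(H_1|E)}{{\mathbb P}(H_0|E)} = \frac{{\mathbb P}(H_1)}{{\mathbb P}(H_0)} \times \frac{{\mathbb P}(E|H_1)}{{\mathbb P}(E|H_0)}. \qquad (1)$$

Thus, provided one has
- (i) A precise formulation of the null hypothesis $H_0$ and the alternative hypothesis $H_1$, and the new information $E$;
- (ii) A reasonable estimate of the prior odds ${\mathbb P}(H_1)/{\mathbb P}(H_0)$ of the alternative hypothesis $H_1$ being true (compared to the null hypothesis $H_0$);
- (iii) A reasonable estimate of the probability ${\mathbb P}(E|H_0)$ that the event $E$ would occur under the null hypothesis $H_0$; and
- (iv) A reasonable estimate of the probability ${\mathbb P}(E|H_1)$ that the event $E$ would occur under the alternative hypothesis $H_1$,

one can then use (1) to obtain a reasonable estimate of the posterior odds ${\mathbb P}(H_1|E)/{\mathbb P}(H_0|E)$ of the alternative hypothesis being true once $E$ has been observed.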
At a qualitative level, the Bayesian identity (1) is telling us the following: if an alternative hypothesis $H_1$ was already somewhat plausible (so that the prior odds were not vanishingly small), and the observed event $E$ was significantly more likely to occur under hypothesis $H_1$ than under $H_0$, then the hypothesis $H_1$ becomes significantly more plausible (in that the posterior odds become quite elevated). This is quite intuitive, but as discussed in the previous post, a lot hinges on how one is defining the alternative hypothesis $H_1$.
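As a minimal sketch of how the Bayesian identity is used in practice, the following Python function multiplies prior odds by the likelihood ratio. The function name, argument names, and the numbers in the usage line are hypothetical and chosen only for illustration; they are not estimates for any real situation.

```python
def posterior_odds(prior_odds: float, p_e_given_h1: float, p_e_given_h0: float) -> float:
    """Posterior odds of the alternative over the null after observing the event.

    Implements: posterior odds = prior odds * P(E | H1) / P(E | H0).
    """
    if p_e_given_h0 <= 0:
        raise ValueError("P(E | H0) must be positive for the odds ratio to be defined")
    likelihood_ratio = p_e_given_h1 / p_e_given_h0
    return prior_odds * likelihood_ratio


# Illustrative numbers only: prior odds of 1:100, and an event that is
# 100 times more likely under the alternative than under the null.
print(posterior_odds(prior_odds=0.01, p_e_given_h1=0.5, p_e_given_h0=0.005))
```

With these made-up inputs the likelihood ratio of roughly 100 lifts the odds from 1:100 to roughly even, which is the qualitative behavior described above.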
In the previous blog post, this calculation was initially illustrated with the following choices of $E$, $H_0$, and $H_1$ (thus fulfilling ingredient (i)):
- $E$ was the event that the October 1, 2022 PCSO Grand Lotto in the Philippines drew the numbers $9, 18, 27, 36, 45, 54$ (that is to say, consecutive multiples of $9$), though not necessarily in that order;
- $H_0$ was the null hypothesis that the lottery was fair and the numbers were drawn uniformly at random (without replacement) from the set $\{1,\dots,55\}$; and
- $H_1$ was the alternative hypothesis that the lottery was rigged by some corrupt officials for their personal gain.
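For orientation, here is a small sketch of the null-hypothesis probability for the lottery example, assuming a draw of 6 numbers without replacement from a pool of 55 (the pool size and draw size are stated here as assumptions of the sketch). Under a fair draw, every unordered set of six numbers is equally likely.

```python
from math import comb


def prob_specific_draw(pool_size: int = 55, numbers_drawn: int = 6) -> float:
    """Probability that a fair draw produces one specific unordered set of numbers."""
    return 1 / comb(pool_size, numbers_drawn)


# Number of possible unordered draws, and the chance of hitting any single one.
print(comb(55, 6))           # 28989675
print(prob_specific_draw())  # about 3.4e-8
```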
In this post, I would like to run the same analysis on a numerical anomaly in the recent Venezuelan presidential election of July 28, 2024. Here are the officially reported vote totals for the two main candidates, incumbent president Nicolás Maduro and opposition candidate Edmundo Gonzáles, in the election:
- Maduro: 5,150,092 votes
- Gonzáles: 4,445,978 votes
- Other: 462,704 votes
- Total: 10,058,774 votes.
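The anomaly can be checked directly from these four numbers: each reported count is the nearest integer to the total multiplied by a percentage with one decimal place (51.2%, 44.2%, and 4.6% respectively). The sketch below verifies this with exact rational arithmetic; the helper names are hypothetical and exist only for this illustration.

```python
from fractions import Fraction

REPORTED = {"Maduro": 5_150_092, "Gonzáles": 4_445_978, "Other": 462_704}
TOTAL = 10_058_774


def round_half_up(x: Fraction) -> int:
    """Nearest integer to a nonnegative rational number (ties round up)."""
    return (2 * x.numerator + x.denominator) // (2 * x.denominator)


def nearest_round_percentage(votes: int, total: int) -> int:
    """Return k such that k/10 percent is the one-decimal percentage closest to votes/total."""
    return round_half_up(Fraction(1000 * votes, total))


def matches_round_percentage(votes: int, total: int) -> bool:
    """True if votes equals the nearest integer to (k/1000) * total for the nearest round k."""
    k = nearest_round_percentage(votes, total)
    return round_half_up(Fraction(k * total, 1000)) == votes


for name, votes in REPORTED.items():
    k = nearest_round_percentage(votes, TOTAL)
    exact = k * TOTAL / 1000
    print(f"{name}: {k / 10:.1f}% of {TOTAL:,} = {exact:,.3f}; reported {votes:,}; "
          f"match = {matches_round_percentage(votes, TOTAL)}")
```

All three counts match, and the three counts also sum exactly to the reported total.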
Let us try to apply the above Bayesian framework to this situation, bearing in mind the caveat that this analysis is only as strong as the inputs supplied and the assumptions made (for instance, to simplify the discussion, we will not discuss information from exit polling, which in this case gave significantly different predictions from the percentages above).
The first step (ingredient (i)) is to formulate the null hypothesis $H_0$, the alternative hypothesis $H_1$, and the event $E$. Here is one possible choice:
- $E$ is the event that the reported vote totals for Maduro, Gonzáles, and Other are all equal to the nearest integer to the total number of voters multiplied by a round percentage with one decimal place (i.e., an integer multiple of $0.1\%$).
- $H_0$ is the null hypothesis that the vote totals were reported accurately (or with only inconsequential inaccuracies).
- $H_1$ is the alternative hypothesis that the vote totals were manipulated by officials from the incumbent administration.
Ingredient (ii), the prior odds ${\mathbb P}(H_1)/{\mathbb P}(H_0)$ that $H_1$ is true over $H_0$, is highly subjective, and an individual’s estimation of (ii) would likely depend on, or at least be correlated with, their opinion of the current Venezuelan administration. Discussion of this ingredient is therefore more political than mathematical, and I will not attempt to quantify it further here.

Now we turn to (iii), the estimation of the probability ${\mathbb P}(E|H_0)$ that $E$ occurs given the hypothesis $H_0$. This cannot be computed exactly without a precise probabilistic model of the voting electorate, but let us make a rough order of magnitude calculation as follows. One can focus on the anomaly just for the number of votes received by Maduro and Gonzáles, since if both of these counts were the nearest integer to a round percentage, then by simple subtraction the number of votes for “other” would also be forced to be the nearest integer to a round percentage, possibly plus or minus one due to carries, so up to a factor of two or so we can ignore the latter anomaly.

As a simple model, suppose that the voting percentages for Maduro and Gonzáles were distributed more or less uniformly in some square $[p-\varepsilon,p+\varepsilon] \times [q-\varepsilon,q+\varepsilon]$, where $p, q$ are some proportions not too close to either $0$ or $1$, and $\varepsilon$ is some reasonably large margin of error (the exact values of these parameters will end up not being too important, nor will the specific shape of the distribution; indeed, the shape and size of the square here only impacts the analysis through the area of the square, and even this quantity cancels itself out in the end). Thus, the number of votes for Maduro is distributed in an interval of length about $2\varepsilon N$, where $N$ is the number of voters, and similarly for Gonzáles, so the total number of different outcomes here is about $(2\varepsilon N)^2$, and by our model we have a uniform distribution amongst all these outcomes. On the other hand, the total number of attainable round percentages for Maduro is about $2000\varepsilon$ (an interval of length $2\varepsilon$ contains about $2\varepsilon/0.001$ multiples of $0.1\%$), and similarly for Gonzáles, so our estimate for ${\mathbb P}(E|H_0)$ is

$${\mathbb P}(E|H_0) \approx \frac{(2000\varepsilon)^2}{(2\varepsilon N)^2} = \frac{1000^2}{N^2} \approx 10^{-8}.$$
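The order-of-magnitude estimate under the null hypothesis can be sketched numerically. The code below assumes the uniform model on a square of side twice the margin of error (so each candidate's count ranges over an interval of about `2 * eps * n_voters` integers, of which about `2000 * eps` are attainable from a round percentage); the function name and the sample margins are hypothetical. It illustrates that the margin of error cancels out, leaving roughly the square of 1000 divided by the number of voters.

```python
import math


def null_probability_estimate(n_voters: int, eps: float) -> float:
    """Rough estimate of P(E | H0) under the uniform-square model (an assumption)."""
    round_outcomes = (2000 * eps) ** 2        # pairs of attainable round percentages
    all_outcomes = (2 * eps * n_voters) ** 2  # pairs of possible vote counts in the square
    return round_outcomes / all_outcomes


N = 10_058_774  # reported total number of voters

# The estimate does not depend on the margin of error eps.
for eps in (0.01, 0.05, 0.10):
    print(f"eps = {eps:.2f}: estimate = {null_probability_estimate(N, eps):.3e}")

# Closed form after cancellation: (1000 / N)^2, on the order of 1e-8.
print(f"(1000 / N)^2 = {(1000 / N) ** 2:.3e}")
```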
This looks quite unlikely! But we are not done yet, because we also need to estimate ${\mathbb P}(E|H_1)$, the probability that the event $E$ would occur under the alternative hypothesis $H_1$. Here one has to be careful, because while it could happen under hypothesis $H_1$ that the vote counts were manipulated to be exactly the nearest integer to a round percentage, this is not the only outcome under this hypothesis, and indeed one could argue that it would not be in the interest of an administration to generate such a striking numerical anomaly. But one can create a reasonable chain of events with which to estimate (from below) this probability by a kind of “Drake equation”. Consider the following variants of $H_1$:
- $H'_1$: $H_1$ is true, and the administration directs election officials to report vote outcomes with some explicitly preferred (round) percentages, regardless of the actual election results.
- $H''_1$: $H'_1$ is true, and the election officials dutifully generate a report by multiplying these preferred percentages by the total number of voters, and rounding to the nearest integer, without any attempt to disguise their actions.
- $H'''_1$: $H''_1$ is true, and the report is published as is, without any concerns being raised by other officials or observers.

Since these variants are nested and $H'''_1$ forces the event $E$ to occur, one has the lower bound

$${\mathbb P}(E|H_1) \geq {\mathbb P}(H'''_1|H_1) = {\mathbb P}(H'_1|H_1) \times {\mathbb P}(H''_1|H'_1) \times {\mathbb P}(H'''_1|H''_1),$$

and the three factors on the right-hand side can be estimated by asking the following questions respectively:
- If one assumes that the administration wishes to manipulate the vote totals, how likely is it a priori (i.e., without being aware of the anomaly $E$) that they would do so by explicitly selecting preferred round percentages and then requesting that election officials report these percentages?
- If one assumes that election officials are being ordered to report vote totals to reflect a preferred round percentage, how likely is it a priori that they would follow the orders without question, and perform simple rounding instead of any more sophisticated numerical manipulation?
- If one assumes that election officials did indeed follow the orders as above, how likely is it a priori that the report would be published as is without any concerns raised by other officials or observers?
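The “Drake equation” style bound can be sketched as a product of conditional probabilities, one per question above, which is then compared with the null-hypothesis estimate. The function names are hypothetical, and the three answers of 0.1 and the null probability of 1e-8 used in the usage lines are placeholders for illustration only; readers should substitute their own estimates.

```python
from math import prod


def chain_lower_bound(conditional_probs: list[float]) -> float:
    """Lower bound for P(E | H1): the product of the conditional probabilities in the chain."""
    for p in conditional_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return prod(conditional_probs)


def likelihood_ratio_lower_bound(conditional_probs: list[float], p_e_given_h0: float) -> float:
    """Lower bound for the factor P(E | H1) / P(E | H0) appearing in the Bayesian identity."""
    return chain_lower_bound(conditional_probs) / p_e_given_h0


# PLACEHOLDER answers to the three questions (illustrative, not estimates from the text).
placeholder_answers = [0.1, 0.1, 0.1]
print(chain_lower_bound(placeholder_answers))                   # about 1e-3
print(likelihood_ratio_lower_bound(placeholder_answers, 1e-8))  # about 1e5
```

Even with fairly pessimistic placeholder answers, the resulting lower bound exceeds the null-hypothesis probability by several orders of magnitude, which is what makes the likelihood ratio large in this kind of argument.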
One can contrast this analysis with that of the Philippine lottery in the original post. In both cases the probability of the observed event under the null hypothesis was extremely small. However, in the case of the Venezuelan election, there is a plausible causal chain that leads to an elevated probability of the observed event under the alternative hypothesis, whereas in the case of the lottery, only extremely implausible chains could be constructed that would lead to the specific outcome of a multiples-of-9 lottery draw for that specific lottery on that specific date.