# Faulty Election Data

##### 13 Jun 2009 16:30 | 28 Comments

By MUHAMMAD SAHIMI in Los Angeles | 13 June 2009

[TEHRAN BUREAU] Iran's Interior Ministry has declared President Mahmoud Ahmadinejad the winner of yesterday's election. The result has been rejected by all three of Mr. Ahmadinejad's opponents: Messrs. Mir Hossein Mousavi, Mahdi Karroubi, and Mohsen Rezaaee.

The best evidence for the validity of the three opponents' arguments for rejecting the results declared by the Interior Ministry is the data the Ministry itself has issued. In the chart below, compiled from the data released by the Ministry and announced by Iran's national television, a perfect linear relation between the votes received by the President and by Mir Hossein Mousavi has been maintained, and Mr. Mousavi's vote is always half of the President's. The vertical axis (y) shows Mr. Mousavi's votes, and the horizontal (x) the President's. R^2 is the coefficient of determination (the square of the correlation coefficient): the closer it is to 1.0, the more perfect the fit, and it is 0.9995, as close to 1.0 as possible for any type of data.
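For reference, a sketch of the standard definition assumed here. For a straight-line least-squares fit with an intercept, R^2 equals the square of the Pearson correlation coefficient r. The symbols are notation introduced for this sketch, not taken from the Ministry's release: x_i is the President's cumulative vote at reporting wave i, y_i is Mr. Mousavi's, and the bars denote averages over the waves.

```latex
r = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i} (x_i - \bar{x})^2}\,\sqrt{\sum_{i} (y_i - \bar{y})^2}},
\qquad
R^2 = r^2
```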

Statistically and mathematically, it is impossible to maintain such perfect linear relations between the votes of any two candidates in any election -- and at all stages of vote counting. This is particularly true about Iran, a large country with a variety of ethnic groups who usually vote for a candidate who is ethnically one of their own. For example, in the present election, Mr. Mousavi is an Azeri and speaks Turkish. The Azeris make up one quarter of all the eligible voters in Iran, and in his trips to Azerbaijan province, where most of the Azeri population lives, Mr. Mousavi had been greeted by huge rallies in support of his campaign. Likewise, Mr. Karroubi, the other reformist candidate, is a Lor. But according to the data released by Iran's Interior Ministry, Mr. Ahmadinejad has far outdone both candidates in their own provinces of birth and among their own ethnic populations.


Nice work! Am going to Email this to some people. Math is always so excellent.

Pete / June 13, 2009 4:08 PM

Sorry, this doesn't seem unusual at all.

Actual vote-share fluctuates from 67.2% to 70.4%. This isn't unusual at all, considering that 20% of returns had already come in. It's almost on the high side of variance.

There could still be fraud, but Benford's law is a better way to go about showing it.

David Shor / June 13, 2009 4:10 PM
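The 67.2% to 70.4% range quoted in the comment above can be checked with a minimal sketch. It assumes "vote share" means Ahmadinejad's share of the combined Ahmadinejad plus Mousavi cumulative total, and it takes the per-wave counts from the table in Bismarck's comment further down this thread; the name `two_candidate_shares` is made up for this illustration.

```python
from itertools import accumulate

# Per-wave increments (Ahmadinejad, Mousavi), copied from the table in
# Bismarck's comment in this thread. Cumulative totals are rebuilt by summing.
INCREMENTS = [
    (7027919, 2955131),
    (3202559, 1673781),
    (3781186, 1946932),
    (1901592, 950273),
    (1061126, 598573),
    (1328542, 804542),
]


def two_candidate_shares(increments):
    """Ahmadinejad's percentage of the two-candidate cumulative vote per wave."""
    cum_a = accumulate(a for a, _ in increments)
    cum_m = accumulate(m for _, m in increments)
    return [100.0 * a / (a + m) for a, m in zip(cum_a, cum_m)]


shares = two_candidate_shares(INCREMENTS)
for wave, share in enumerate(shares, start=1):
    print(f"wave {wave}: {share:.1f}%")
```

The share is computed on the two leading candidates only, so it will differ from any official percentage that includes the other candidates and spoiled ballots.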

This chart is interesting, but missing one bit: Just exactly /how/ unlikely is it?

I'm pretty sure that it is possible to calculate the likelihood that the graph over the various steps of reporting the polls is near linear. It would probably also depend somewhat on what is known about the differences in voting behavior between districts, but I could imagine that there's some statistical analysis to work with that makes only a few minimal assumptions (say, assuming that the poll data has the same variance as in previous elections). Based on that it should be possible to calculate the chance of the data being like that. Please note, I agree with your argument, it is very unlikely data, it would just be nice to know /how unlikely exactly/ it is.

norhuma / June 13, 2009 4:34 PM

As much as I'm sure the Iranian elections are bullshit rigged, the mathematical analysis in this post is flawed. For a very neat explanation of why this is the case, visit http://www.fivethirtyeight.com/ where Nate Silver links to this post and explains the problem.

As usual (speaking as a Sociologist), the problem isn't the mathematical analysis of the data, but an inappropriate understanding of causation (i.e. there is a third factor which leads to this kind of correlation holding true whether the election was rigged or not.)

Mike Thomas / June 13, 2009 4:55 PM

STUPID Engineering!

mohsen / June 13, 2009 4:55 PM

It's strange. I'm looking at this data. It's pretty clear that, while it's almost perfectly linear, there are three problems:

1. We don't understand WHY the relationship is linear.

2. The ratio isn't two to one at each of these points in the election. It's 2.38... then 2.21... then 2.13... then 2.11... then 2.09... then 2.05. So that's not 2:1 at all. Not at any point. I find it amazing that the article claims that two-to-one ratio, because the numbers are right there.

If the correlation were truly 1, then I'd say that there were some major question marks. However, when you graph this, it's NOT linear at all. It's close. Or it LOOKS close. Close is incredibly far.

I'm looking at this right now. I'm no stats guru, but I looked at the mean and calculated standard deviation. It's around .119 (give or take.)

I'm trying to find out what that actually means. As one fellow who posted here said... we need to understand causation here.

Even if my stats approach makes no sense, what I DO know is that it's awfully strange, in that I asked a cousin who is an international election monitor whether this made sense. He said, "Absolutely not. The vote in Iran, from everything we know, reports in such a way that there would be frequent swings."

So there you have it.

Steve C / June 13, 2009 5:47 PM
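The ratios and the standard deviation mentioned in the comment above can be reproduced with a short sketch. It assumes the cumulative totals are the running sums of the per-wave counts listed in Bismarck's comment further down this thread, and that the quoted figure is the sample standard deviation (n - 1 denominator) of the six ratios.

```python
from itertools import accumulate
from statistics import mean, stdev

# Per-wave increments (Ahmadinejad, Mousavi) from Bismarck's comment in this
# thread; summing them rebuilds the cumulative totals shown in the chart.
INCREMENTS = [
    (7027919, 2955131),
    (3202559, 1673781),
    (3781186, 1946932),
    (1901592, 950273),
    (1061126, 598573),
    (1328542, 804542),
]

cum_a = list(accumulate(a for a, _ in INCREMENTS))
cum_m = list(accumulate(m for _, m in INCREMENTS))

# Ahmadinejad-to-Mousavi ratio of cumulative votes at each reporting wave.
ratios = [a / m for a, m in zip(cum_a, cum_m)]
ratio_mean = mean(ratios)
ratio_sd = stdev(ratios)  # sample standard deviation

for wave, ratio in enumerate(ratios, start=1):
    print(f"wave {wave}: {ratio:.2f}:1")
print(f"mean ratio {ratio_mean:.3f}, sample standard deviation {ratio_sd:.3f}")
```

A standard deviation of the running ratio is only a rough descriptive number here, because the six cumulative points are not independent observations.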

Great to see all of the statistics-illiterate people defending this data.

What did the regime do to earn this loyalty?

Josh Scholar / June 13, 2009 7:05 PM

Math is powerful, and it allows others to prove things for themselves. When you wrote "Statistically and mathematically, it is impossible ..." I thought that was an amazing analysis that could end all debate. I forwarded your message on to others.

Unfortunately, your math appears to be wrong. As another reader pointed out, the famous statistician Nate Silver used math to show yours is faulty. See http://bit.ly/2UnFu for his analysis.

This doesn't say anything about the election. It just proves that in all honesty, you should issue a retraction for your math error. Or else, maybe you can crunch other numbers and come up with something solid. Pass them by Nate. I'm sure he would be supportive if they are correct.

I certainly hope Iran's election is resolved in peace and the people get what they want.

Don Gilmore / June 13, 2009 10:01 PM

It's not at all unlikely. Statistics is a black art - and it is completely impossible to say one thing or another about it unless you know the sampling method and the calculation method used to create the samples. Just another example of statistics and people's ignorance of the means and methods of manipulation. No conclusions can be drawn from this without knowing what deviations were applied etc...

Jonathan / June 13, 2009 11:13 PM

This linear correlation says nothing about the validity of the election. Nate Silver covers why here: http://www.fivethirtyeight.com/2009/06/statistical-evidence-does-not-prove.html

Basically, what is said is that most elections do follow a linear relationship, especially since this data was ordered by time, not by geographical location (where more discrepancies might occur). Consider that if you look at the recent US election between Obama and McCain by the same metric, you get a linear relationship as well, with an r-squared of .9959. This linear behavior is inherent in all elections when plotted by time.

This is not to say that there is nothing wrong with the elections - I believe that there was manipulation of the votes. I am only saying that this graph proves nothing.

Scott / June 14, 2009 12:04 AM

First off... the ratio is not 2:1 at all points. Do the division for yourself. Look at the ratios of Ahmadinejad's votes to Mousavi's votes at points a, b, c, d, e and f:

a. 2.38:1

b. 2.21:1

c. 2.13:1

d. 2.11:1

e. 2.09:1

f. 2.05:1

The ratio actually gets progressively smaller at each point, until it arrives at the final ratio. The difference initially CAN be a lot higher because there are fewer votes. The percentages, for example, in the first ten votes can be totally different than the ultimate result. 9 votes for Mousavi; 1 vote for Ahmadinejad. That's a tiny sample of the overall vote. n is extremely small relative to N. The sampling error here is ridiculous.

Basic stats: Sampling error decreases with a larger sample. When the sample finally is N, there is no sampling error at all (obviously.)
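A minimal sketch of this point, assuming the idealized case of simple random sampling without replacement (real early returns are not a random sample, so this is only the textbook version). The function name `standard_error` and the example numbers are hypothetical.

```python
import math


def standard_error(p, n, N):
    """Standard error of a sample proportion p from n of N ballots.

    Assumes simple random sampling without replacement, so the finite
    population correction (N - n) / (N - 1) applies.
    """
    if not 0 < n <= N:
        raise ValueError("n must satisfy 0 < n <= N")
    if N == 1:
        return 0.0
    fpc = (N - n) / (N - 1)
    return math.sqrt(p * (1.0 - p) / n * fpc)


# Hypothetical electorate of one million ballots with a true share of 0.67.
N = 1_000_000
for n in (10, 1_000, 100_000, N):
    print(f"n = {n:>9,}: standard error = {standard_error(0.67, n, N):.5f}")
```

The correction factor is what makes the error vanish once every ballot has been counted (n equal to N).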

If there is a monotonic relationship between two variables, then the correlation can get pretty darn high (e.g., the 'rank-order correlation' is perfect).

We don't need to know anything else. This is how elections are. The farther along that they are, the more that the percentages start to look like the final result.

HOWEVER:

This is still a BS election. The problem is really the polls. Unless a poll isn't a random sample... or unless the sample size is so small that the sampling error is massive... or both... the poll shouldn't be off by more than a few percent. When it's off by 10%, it's a ridiculously terrible poll. Beyond that, either the pollster is a total amateur

Until mid May, Ahmadinejad consistently led in the polls. It was around that time that the numbers began to change. 14 of 20 polls had Mousavi winning... and in 7, it was a landslide. 3 had Ahmadinejad cleaning Mousavi's clock.

The bigger problem, from where I'm sitting, is the fact that the voter participation in this election went from around 45% to 85% (rough figures.) That additional 40%... let me ask you... do you think that "status quo" people get galvanized by elections?!? Think about it. These were students... women... people who wanted reform. The turnout probably increased among the status quo folks, too... as a reaction to a reformist threat.

There is no way in Hell that Ahmadinejad won by that spread. It's almost a joke. It's almost like someone is "taking the piss." It makes educated Iranians look like they've been taken. I can only imagine their grief at seeing their democracy slip away.

By the way... the Iranian people are wonderful: educated, plugged in, nice... just wonderful. They don't deserve this shit.

Steve C / June 14, 2009 1:34 AM

For reference, Nate Silver's post was deeply flawed. He randomly selected states for his graph, which is a completely bogus way to simulate Iran's results, which would have been returned regionally.

John / June 14, 2009 1:55 AM

The basis for this analysis does not provide causation to accompany the correlation. In fact a linear relationship IS the expected result based on the 2008 US Election using minute by minute data from Nov 4:

VoteForAmerica / June 14, 2009 6:15 AM

I think it should be more striking if the chart was non-linear. Do you know the reason why it is possible to forecast election results e.g. in the U.S.? Because of representative counts. I guess there is nothing more representative than a general election.

By the way: This linear curve is meaningless if you don't consider x- and y-errors. (How did you calculate R^2 ?!?!?! Copied it from Wikipedia??)

PersonAbleToPerfomBasicStatistics / June 14, 2009 7:24 AM

Whoops, I seem to have accidentally started something of a controversy with my post, or at least, I have been the first to put my foot in it. Just to be clear, as I said in my first post, I am absolutely in agreement that the election in Iran was unfair (unless people were pretending to want the other guy, and then nevertheless voting for MA, which is not impossible at all).

However, the legitimate point which Nate picked up in his post, which stands apart from his American example, was that there are other factors which can cause this kind of relationship. Someone helpfully referred to the main issue of my post, and indeed Nate's, which is causation. Simple correlation is not the same as causation if there are confounding factors and hidden variables.

As to Josh Scholar, why not just keep the debate at the level of statistics, rather than personal attacks? I'm sure you didn't mean to come off that way. Without appealing to authority at all here, I am a top-of-my-class sociology graduate who heavily majored in statistics and modeling. I'm just saying that there is no reason to throw around a charged phrase like "statistically illiterate" when there is a legitimate discussion going on.

Mike Thomas / June 14, 2009 9:28 AM

haha, what is this? stupid excel screenshot? haha

tombstone / June 14, 2009 10:32 AM

Causation isn't the real issue.

Think about the way elections work for a second. Picture it.

First ten votes... basically a small (probably non-random) sample "n" with massive error. The more votes cast, the more the vote begins to mimic the final vote, which... in essence... is the full population "N" sample.

A little surge here or there along the way isn't going to make a dent! The more votes that come in, the less of a dent. Even if a large area that votes overwhelmingly for the trailing candidate reports later in the election... it doesn't make much of a dent. We're looking at a few data points in the election! If we looked at the entire election, we'd see some variance, especially early on. But through the course of the voting, of course the correlation is going to be high!

It's hard to explain this, but if you start thinking about more and more votes, it will begin to make some sense.

Again, the ratio DOES change... and this article is false. It is not 2:1 the whole way through. The data listed in the chart SHOWS that! It starts out around 2.38:1 for Ahmadinejad and slowly moves toward the final 2.05:1. In fact, as you look at how the ratio decreases progressively at each point, you realize that it is assiduously moving toward the "correct" final result.

That doesn't mean it wasn't rigged. That means that it's likely that those who rigged it weren't TOTAL morons.

Steve C / June 14, 2009 11:20 AM
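The argument in the comment above, that cumulative totals are bound to correlate strongly, can be illustrated with a small simulation. Everything in it is hypothetical: the batch sizes, the range of the leader's share per batch, and the helper names are assumptions chosen for illustration, not Iranian data.

```python
import math
import random
from itertools import accumulate


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_cumulative_r2(n_batches=50, seed=0):
    """R^2 between two candidates' cumulative totals over random batches."""
    rng = random.Random(seed)
    a_counts, b_counts = [], []
    for _ in range(n_batches):
        size = rng.randint(50_000, 500_000)   # hypothetical batch size
        share_a = rng.uniform(0.45, 0.85)     # leader's share varies widely
        a = round(size * share_a)
        a_counts.append(a)
        b_counts.append(size - a)
    cum_a = list(accumulate(a_counts))
    cum_b = list(accumulate(b_counts))
    return pearson_r(cum_a, cum_b) ** 2


r2 = simulate_cumulative_r2()
print(f"R^2 of the cumulative totals: {r2:.4f}")
```

Even though the per-batch share swings between 45% and 85% in this toy setup, both running totals only ever grow, which is what pushes the correlation between them toward 1.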

The Iran Solution...

Contrary to expectations, Ahmadinejad has won the election in Iran, which people in Israel are not entirely unhappy about (see the reports by Haaretz and Der Standard). That smells of massive election fraud, but could also indicate that ...

Chajms Sicht / June 14, 2009 3:36 PM

Steve, you make a good point. Looking back at my post, I think the main problem is that I've mixed up two levels of "causation". Firstly I was talking of the cause of the results; secondly about the cause of the correlation. You've put what I was trying to get at very well indeed. Nice work.

Mike Thomas / June 14, 2009 6:16 PM

Take a look at this website, which specializes in CROWDSOURCING CRISIS INFORMATION online (open source, of course). It has been used in conflicts like Kenya or Madagascar as well...

Welcome to Ushahidi, which means "testimony" in Swahili, where we are building a platform that crowdsources crisis information. Allowing anyone to submit crisis information through text messaging using a mobile phone, email or web form.

http://www.ushahidi.com/

Ushahidi / June 15, 2009 10:38 AM

The problem with the statistics used here is that the data for each point are not independent, because the numbers of each following count include the numbers of the previous counts. Using dependent data will always lead to higher correlations. A better way to treat the data would be to subtract the number of the previous count from the count you're looking at. This would yield the following numbers:

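| Ahmadinejad (A) | Mousavi (M) | Ratio (A/M) |
|---|---|---|
| 7027919 | 2955131 | 2.378208952 |
| 3202559 | 1673781 | 1.913367997 |
| 3781186 | 1946932 | 1.942125354 |
| 1901592 | 950273 | 2.001100736 |
| 1061126 | 598573 | 1.772759546 |
| 1328542 | 804542 | 1.651302232 |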

Then do a plot on those. The correlation is still high (about 0.99), but not incredibly high, considering that the sample sizes for each sample are huge. As you can see, there is also quite some variation in the vote ratio between the two candidates, with a considerably larger share for Mousavi in the results that came in later, compared to the first batch.

Bismarck / June 16, 2009 11:19 AM
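Bismarck's comparison above can be sketched as follows, assuming the numbers in his table are transcribed correctly and that the chart's points are their running sums. `pearson_r` is a helper written for this sketch. Under those assumptions the cumulative R^2 should land close to the article's 0.9995 and the per-wave correlation close to the 0.99 quoted in the comment.

```python
import math
from itertools import accumulate

# Per-wave (non-cumulative) counts from the table in the comment above.
A_WAVES = [7027919, 3202559, 3781186, 1901592, 1061126, 1328542]
M_WAVES = [2955131, 1673781, 1946932, 950273, 598573, 804542]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Cumulative totals, as plotted in the article's chart.
cum_a = list(accumulate(A_WAVES))
cum_m = list(accumulate(M_WAVES))

r2_cumulative = pearson_r(cum_a, cum_m) ** 2
r_per_wave = pearson_r(A_WAVES, M_WAVES)

print(f"cumulative totals: R^2 = {r2_cumulative:.4f}")
print(f"per-wave counts:   r = {r_per_wave:.4f}, R^2 = {r_per_wave ** 2:.4f}")
```

With only six points, and with the per-wave batches differing greatly in size, the per-wave correlation is still driven largely by batch size, so neither number says much on its own about fraud.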

You are obviously not a statistician, Mr. Sahimi, b/c one would never say "statistically impossible".

Jason / June 16, 2009 1:11 PM

Readers of this post should check out the fairly rigorous initial assessment of the data by University of Michigan Political scientist Walter Mebane:

It should add great value to the debate over the election results until we get even better information.

Matt Lenard / June 16, 2009 1:15 PM

The R^2 by itself explains little; where are the t-stats?

Xachariah / June 18, 2009 3:11 AM

u dont know shit about math u idiots

any 60/40% election data in the world will probably (not all) look like this since numbers add up.

The ratio is a more precise indicator:

2.378208952 versus 1.651302232 means fluctuation in a 44% range.

i.e FORTY FOUR PERCENT

FST / June 19, 2009 11:09 AM
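The 44% figure in the comment above can be reproduced with a two-line check, assuming it means the largest per-wave ratio relative to the smallest one in Bismarck's table.

```python
# First and last per-wave ratios (Ahmadinejad / Mousavi) from Bismarck's table.
first_wave_ratio = 7027919 / 2955131
last_wave_ratio = 1328542 / 804542

# How much larger the first ratio is than the last, in percent.
spread_percent = (first_wave_ratio / last_wave_ratio - 1.0) * 100.0
print(f"{first_wave_ratio:.3f} vs {last_wave_ratio:.3f}: {spread_percent:.0f}% apart")
```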

Garbage. Nearly everything in these claims is false, and anybody with half a brain can see it. As usual I see the Western backed people disrupting democracy and rioting. Polls, including one by some Rockefeller funded AntiTerrorism group, reported in AP, made it very clear Ahmadi Nejad was going to win by a very wide margin.

The pretense that Mousavi, who actually led a more repressive government in his time than the present one (another fact absent from the Western reports), was the candidate for the young is also false. He is the candidate for the wealthy, the elite and the upwardly mobile university students. They just want more access to Western goods for their wealth, but Iran is predominantly poor and working class, and a 30 million person march for the government a few months ago dwarfs these puny 1 million anti-democratic demonstrations.

There are a lot of allegations but no real evidence of fraud and the dishonest western press should be ashamed, but it has long since lost all shame as we know.

Rabbit / June 21, 2009 10:35 PM

Look up and read this and see if you can regain a bit of realism and honesty in your opinions people.

Iranian Elections - The 'Stolen Elections' Hoax

By James Petras

Rabbit / June 21, 2009 10:37 PM

I think Nate's analysis doesn't so much refute this one as provide a cause for the presented correlation. You cannot say that the election officials simply crunched the numbers into a first-order function and reported it, but the following would definitely be a valid conclusion:

The numbers provided by official Iranian news agencies have "NOT been reported in the same order as they were collected"; instead, they were reported randomly or in any other geographically non-correlated fashion.

I think it is a very fine point, as Scott pointed out, that the results are reported by time. However, as someone who closely followed the Iranian state media during the coverage of the 1997 presidential election, when Khatami first came to power, I know that in Iran the ballots are counted and reported by geographical district, not randomly. In fact, each wave of the vote count represents one geographical district.

For example, in 1997 election, as I clearly remember, there was a large SPIKE in Khatami's vote during the last wave of the report simply because Tehran and Central provinces were the last to be reported (Khatami was expected to have significantly more support in these provinces than Noori his hardline opponent).

I think it would be very helpful for people who have access to similar official reports of 1997 Iranian election to plot the same graph for comparison and potential conclusion.

In my view, even though the conclusion of this report is rather hasty, the presented graph is a very clear anomaly in the way the ballot counts were reported by Iranian officials.

Afshin / January 28, 2010 1:02 AM
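The point about reporting order in the comment above can be illustrated with a toy example. The district counts are invented for this sketch and the function name is hypothetical; the only claim is that the same final result can be reached along very different paths depending on which districts report first.

```python
from itertools import accumulate

# Hypothetical districts: (votes for leader A, votes for challenger B).
# The last entry is a challenger stronghold. All numbers are invented.
DISTRICTS = [
    (900, 300),
    (800, 350),
    (700, 400),
    (600, 450),
    (400, 700),
]


def running_ratios(districts):
    """A-to-B ratio of cumulative votes after each district reports."""
    cum_a = accumulate(a for a, _ in districts)
    cum_b = accumulate(b for _, b in districts)
    return [a / b for a, b in zip(cum_a, cum_b)]


stronghold_last = running_ratios(DISTRICTS)
stronghold_first = running_ratios(DISTRICTS[::-1])

print("stronghold reports last: ", [f"{r:.2f}" for r in stronghold_last])
print("stronghold reports first:", [f"{r:.2f}" for r in stronghold_first])
```

When districts report in order of decreasing support for the leader, the running ratio can only fall, and in the reverse order it can only rise. This is why the order of reporting matters when reading a chart of cumulative counts.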