Are 81 percent of Elon Musk's Twitter followers fake?
Numbers Rule Your World 2022-05-20
If we believe the analysis by two companies in the business of detecting fake Twitter accounts, around 81 percent of Elon Musk's Twitter followers are "spam" or "fake" (link). With 93 million followers, that is a shocking 75 million "spam"/fake accounts, enough to keep the anti-fake crusader Musk awake at night.
But what the analysis really tells us is that counting "spam"/fake accounts is a highly subjective exercise. Remember when I said in my previous post:
Here comes the hard part. What he wants to know is the proportion of those 9 million accounts that are spam accounts. Who is going to decide whether each of the 9 million accounts is "spam", and how?
Two companies are eager to tell us what "spam" is. Here is some of what they counted among the 75 million spam/fake accounts:
- "inactive" accounts that have not tweeted in the past 90 days (67 million, 71% of Musk's followers)
- accounts using Twitter's default profile images (up to 24 million, 26%)
- accounts that have a "suspiciously small number" of followers (up to 78 million, 83%)
- accounts on an "unusually small number" of lists (up to 89 million, 95%)
- accounts using "spam-correlated keywords" in profile descriptions (up to 68 million, 73%)
- accounts specifying a location that is not a known place name (up to 66 million, 71%)
- some number of accounts that are "protected" (i.e., private): analysts cannot examine these accounts because their data are locked up. In the binary classification of fake or not fake, the private accounts ended up on the fake side of the ledger.
What's the problem? Many of us will disagree with how they define a spam/fake account.
By their own admission, a bot account that automatically shares front-page posts from the Hacker News website counts as "fake". I doubt Musk wants to get rid of this type of account. Some Twitter users (lurkers) never tweet, and they too count as inauthentic. I doubt that users who don't tweet or retweet are responsible for spreading fake news.
The modelers claim to be "conservative": "We intentionally biased to missing some fake/spam accounts rather than accidentally marking any real accounts incorrectly." In other words, they claim that their algorithms almost never classify real accounts as "spam"/"fake".
That's a bit hard to believe. Consider a typical human lurker, someone who consumes content but does not produce it. They may have entered a fake location, their account is unlikely to be on anyone's list, they probably have very few followers, they may have kept Twitter's default profile image, and so on. Such an account may therefore accumulate enough signals to be classified as spam.
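To make the lurker worry concrete, here is a toy sketch of a classifier that counts red-flag signals from the list above and calls an account spam once enough of them fire. Neither company's actual rules are given here, so everything in it is hypothetical: the `Account` fields, the cutoffs `SMALL_FOLLOWERS`, `SMALL_LISTS` and `MIN_SIGNALS`, and the idea of a simple signal count.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical cutoffs, chosen for illustration only.
SMALL_FOLLOWERS = 10   # a "suspiciously small number" of followers
SMALL_LISTS = 1        # an "unusually small number" of lists
MIN_SIGNALS = 3        # signals needed before an account is called spam


@dataclass
class Account:
    default_profile_image: bool
    followers: int
    listed_count: int
    spam_keywords_in_bio: bool
    location_is_known_place: bool


def red_flags(a: Account) -> list[str]:
    """Return the names of the hypothetical signals this account trips."""
    flags = []
    if a.default_profile_image:
        flags.append("default image")
    if a.followers < SMALL_FOLLOWERS:
        flags.append("few followers")
    if a.listed_count < SMALL_LISTS:
        flags.append("on few lists")
    if a.spam_keywords_in_bio:
        flags.append("spam keywords")
    if not a.location_is_known_place:
        flags.append("unknown location")
    return flags


def looks_like_spam(a: Account) -> bool:
    """Label an account spam once it trips MIN_SIGNALS or more signals."""
    return len(red_flags(a)) >= MIN_SIGNALS


# A human lurker: kept the default image, typed a joke location,
# has few followers and sits on no lists. Inactivity is deliberately
# left out, to show the lurker is flagged even without it.
lurker = Account(
    default_profile_image=True,
    followers=3,
    listed_count=0,
    spam_keywords_in_bio=False,
    location_is_known_place=False,
)
print(red_flags(lurker))        # four signals fire
print(looks_like_spam(lurker))  # True
```

The particular cutoffs do not matter much. Signals that each describe a quiet, low-engagement account tend to fire together, so a real person who simply never engages can cross the threshold without being a bot.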
***
But let's take their word and assume that the chance of a false positive is close to zero. They did disclose the chance of a false negative, that is, of classifying a spam/fake account as human: 35%. This means that for every 100 spam accounts, 35 would be called human. So the number of spam accounts is actually higher than reported, since some spam accounts are erroneously labeled human. I'm not sure which analysis this false negative rate applies to. I'm going to assume it's the one in which they applied the model to only the active Musk followers (i.e., 27 million rather than the full 93 million).
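The error rates work in one direction: given the true mix of spam and human accounts, they tell us what the model is expected to report. A minimal sketch of that forward calculation follows, assuming the same error rates apply to every account. The function and its parameter names are hypothetical, not taken from the report.

```python
def expected_labels(true_spam, true_human, fnr, fpr=0.0):
    """Expected counts of accounts labeled spam and labeled human.

    fnr: share of true spam accounts labeled human (false negatives)
    fpr: share of true human accounts labeled spam (false positives)
    """
    labeled_spam = (1 - fnr) * true_spam + fpr * true_human
    labeled_human = fnr * true_spam + (1 - fpr) * true_human
    return labeled_spam, labeled_human


# 100 spam accounts, 35% false negatives, no false positives:
# about 65 are labeled spam and 35 are labeled human.
print(expected_labels(100, 0, fnr=0.35))
```

The report sees only the labeled counts, so estimating the true number of spam accounts means running this calculation backwards.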
The model asserts that 23% of active Musk followers, or about 6 million of the 27 million, are spam accounts. Since they claim that true humans are never misclassified as spam, all 6 million are true spam accounts. But 35% of spam accounts are misclassified as human, so the 21 million "humans" comprise true humans plus misclassified spam accounts: 35% × # spam accounts + # true humans = 21 million. The 6 million flagged accounts are the other 65% of spam accounts: (1 - 35%) × # spam accounts = 6 million. Thus, # spam accounts = 6 / (1 - 35%) ≈ 9 million.
In other words, 9 million of the 27 million active Musk followers are spam accounts: the algorithm finds 6 million of them, while 3 million are erroneously classified as human. The true humans number 27 - 9 = 18 million, so the total number of accounts classified as human is 18 + 3 = 21 million, as expected.
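The same arithmetic can be written as a small function that inverts the model's output. With no false positives it reduces to spam = flagged / (1 - 35%). The optional `fpr` term is the standard correction for an imperfect screening test (known in epidemiology as the Rogan-Gladen estimator). It is included only to show how a nonzero false positive rate would change the answer, and the function name is hypothetical. The inputs are the rounded figures from the text: about 6 million flagged out of about 27 million active followers.

```python
def adjust_for_misclassification(flagged, total, fnr, fpr=0.0):
    """Estimate true spam and true human counts from the model's output.

    flagged: accounts the model labeled spam
    total:   all accounts that were scored
    fnr:     share of true spam labeled human
    fpr:     share of true humans labeled spam (the report claims ~0)
    """
    # flagged = (1 - fnr) * spam + fpr * (total - spam); solve for spam.
    spam = (flagged - fpr * total) / (1 - fnr - fpr)
    human = total - spam
    missed = fnr * spam  # spam accounts hiding among the "humans"
    return spam, human, missed


spam, human, missed = adjust_for_misclassification(6, 27, fnr=0.35)
# about 9.2M spam, 17.8M human, 3.2M spam missed by the model
print(f"spam {spam:.1f}M, human {human:.1f}M, missed spam {missed:.1f}M")
```

The zero false positive assumption matters. With `fpr=0.02`, for example, the estimate drops to (6 - 0.54) / 0.63, or about 8.7 million spam accounts.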
If we adjust the model results for the claimed false negative and false positive rates, the number of real Musk followers is 18 million out of 26.8 million active accounts (67%), or 18 million out of 93.4 million (19%) if, as in the report, all inactive accounts are treated as spam (that is, no humans are inactive).
The report's authors concluded that 23% of the active Musk followers are spam, but they failed to adjust for false negatives. After adjustment, the spam share is 33% among active followers (and roughly 80% of all followers).
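The percentages in the last two paragraphs follow from a few divisions. Here is a short check using the text's rounded figures and the same assumption that every inactive account is spam. The figures are 18 million adjusted humans, 26.8 million active followers and 93.4 million followers in all.

```python
humans = 18.0   # adjusted true humans, millions (all assumed active)
active = 26.8   # active followers, millions
total = 93.4    # all followers, millions

human_share_active = humans / active
human_share_total = humans / total

print(f"humans among actives: {human_share_active:.0%}")
print(f"spam among actives:   {1 - human_share_active:.0%}")
print(f"humans overall:       {human_share_total:.0%}")
print(f"spam overall:         {1 - human_share_total:.0%}")
```

The overall spam share comes out at about 81%, the figure the last paragraph rounds to 80%.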