How does ESPN discuss white and non-white quarterbacks?

Bad Hessian 2013-05-10

It’s pretty apparent that race is a contentious topic in the sports media. I decided to explore popular perceptions of differential treatment of white and non-white quarterbacks in the NFL and algorithmically analyzed more than 36,000 articles from ESPN.com published over the past 17 months.

Background

Attention and concern have been growing over the way white and black quarterbacks in the NFL are portrayed by the media. Hall of Famer Warren Moon has recently claimed that black quarterbacks are stereotyped as having integrity or character issues. Geno Smith was the subject of a withering profile by Nolan Nawrocki (who published a similar profile of Cam Newton two years ago). Drew Magary questioned why no black quarterback has won a Super Bowl in 25 years.

In addition to the supposed “character” and “integrity” issues, black quarterbacks are often described as “athletic” or “mobile” while white quarterbacks are credited as having “vision” or “intelligence.” Black quarterbacks are expected to be dual threats (capable of both called passes and runs) while white quarterbacks are expected to be traditional drop-back pocket passers. Think Peyton Manning vs. Michael Vick. This is not a new observation. Salon commented on it in 2002.

Methods

To explore the extent of this difference, I used Scrapy to download the text of 36,156 articles from ESPN.com. I included all articles mentioning the NFL that were published between January 1, 2012 and May 8, 2013. I excluded Insider content because I am not a subscriber.

I identified all of the quarterbacks on NFL rosters for the 2011-2012 and 2012-2013 seasons (thanks, PFR!) as well as quarterbacks taken in the 2013 draft. I then used a (very) slightly modified version of Neal Caren’s Python project investigating how the NY Times talks about men and women. Most of it is plain Python with a little help from the Natural Language Toolkit.

As is standard practice, each sentence in each article was converted to lower case, stripped of punctuation, and split into a vector of individual words. Each sentence was then checked for mentions of any of the above quarterbacks by name and coded as mentioning a white quarterback, a non-white quarterback, or both. The words used in each coded sentence were then stored.
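The coding step described above can be sketched roughly as follows. This is a minimal illustration, not the actual script: the two name sets are hypothetical stand-ins for the full roster lists, matching on last names alone is an assumption, and a plain regular expression stands in for the Natural Language Toolkit tokenizer.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the roster lists; matching on last name is an assumption.
WHITE_QBS = {"manning", "brady"}
NONWHITE_QBS = {"vick", "newton"}


def tokenize(sentence):
    """Lower-case a sentence and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", sentence.lower())


def code_sentence(words):
    """Return 'white', 'nonwhite', 'both', or None if no quarterback is named."""
    tokens = set(words)
    has_white = bool(tokens & WHITE_QBS)
    has_nonwhite = bool(tokens & NONWHITE_QBS)
    if has_white and has_nonwhite:
        return "both"
    if has_white:
        return "white"
    if has_nonwhite:
        return "nonwhite"
    return None


def count_words(sentences):
    """Count, per code, how many sentences contain each word."""
    counts = {"white": Counter(), "nonwhite": Counter(), "both": Counter()}
    for sentence in sentences:
        words = tokenize(sentence)
        label = code_sentence(words)
        if label is not None:
            # A word counts once per sentence, however often it repeats.
            counts[label].update(set(words))
    return counts
```

Counting each word once per sentence keeps the tallies in the same unit as the tables in the Results section, which report the number of sentences using a word.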

I ran the script iteratively to trim out noise words. Initial runs produced results consisting mainly of other players’ names, names of teams and cities, and so forth. I modified the script to remove most other names, cities, team names, and team nicknames. This was a decision that potentially affected my results, and I’ll discuss why in a moment.
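One simple way to do this kind of trimming is a hand-built exclusion set applied to each tokenized sentence. The sketch below assumes that approach; the words in `NOISE_WORDS` are hypothetical examples, not the list actually used.

```python
# Hypothetical examples of noise words: other players, cities, teams, nicknames.
NOISE_WORDS = {"mccoy", "philadelphia", "eagles", "denver", "broncos"}


def remove_noise(words, noise=NOISE_WORDS):
    """Return the words of a sentence with noise words filtered out."""
    return [word for word in words if word not in noise]


cleaned = remove_noise(["vick", "eagles", "philadelphia", "speed"])
```

Because the exclusion set is built by hand over repeated runs, what ends up in it is a judgment call, which is why the choice can affect the final word lists.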

Finally, I generated the list of words that are most associated with either white or non-white quarterbacks and the ratio at which they are used with each race.
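As a rough sketch of this ranking step, the function below sorts words by the ratio of white to non-white sentence counts. It assumes the ratio is a simple quotient of raw counts, which matches the ordering of the tables in the Results section but is not confirmed as the exact formula. The function name and the treatment of zero counts as an infinite ratio are illustrative choices. The sample counts are taken from those tables.

```python
def rank_by_ratio(white_counts, nonwhite_counts):
    """Return (word, white, nonwhite, ratio) rows, highest white/non-white ratio first."""
    rows = []
    for word in set(white_counts) | set(nonwhite_counts):
        white = white_counts.get(word, 0)
        nonwhite = nonwhite_counts.get(word, 0)
        if white + nonwhite == 0:
            continue  # the word never appeared in a coded sentence
        # Assumption: a word never used with non-white QBs gets an infinite ratio.
        ratio = white / nonwhite if nonwhite else float("inf")
        rows.append((word, white, nonwhite, ratio))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Sentence counts copied from the tables in the Results section.
white = {"famer": 439, "wildcat": 254, "neck": 342, "baylor": 10, "triceps": 7}
nonwhite = {"famer": 0, "wildcat": 10, "neck": 24, "baylor": 154, "triceps": 92}
ranked = rank_by_ratio(white, nonwhite)
```

Reading `ranked` from the top gives the words most associated with white quarterbacks; reading it from the bottom gives those most associated with non-white quarterbacks.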

Results

My findings indicate that differing language is indeed present, though not necessarily in the ways described above. Fourteen percent of all sentences in the articles were coded as discussing a white or a non-white quarterback. There were about 2.4 sentences written about white quarterbacks (81,046 sentences) for every sentence written about a non-white quarterback (34,441 sentences). That gap is not surprising, given that most quarterbacks on NFL rosters are white.

Below are the words that are most likely to be used to describe white and non-white quarterbacks. The numbers beside each word are the number of sentences that used that word to describe a white or non-white quarterback. As you can see, many of the words refer to throwing, which is unsurprising for a quarterback. Most of the words are generally neutral or positive (“MVP”, (hall of) “famer”, “completion”, “bonus”, “win”, etc.), although we do see “sacked”, “sacks”, and “interceptions.” The high rankings of “neck” and “Tennessean” are almost certainly due to the Peyton Manning story. I have no idea what “radio”, “music”, and “podcast” are about.

Most Likely to Appear in Sentences about White QBs

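| Times Assigned to White QBs | Times Assigned to non-White QBs | Word |
|---|---|---|
| 233 | 0 | owner/general |
| 439 | 0 | famer |
| 254 | 10 | wildcat |
| 231 | 11 | vice |
| 2553 | 128 | room |
| 187 | 10 | music |
| 342 | 24 | neck |
| 509 | 49 | radio |
| 230 | 28 | tennessean |
| 255 | 32 | executive |
| 195 | 25 | georgia |
| 416 | 57 | mvp |
| 240 | 40 | completion |
| 254 | 44 | throughout |
| 262 | 46 | overtime |
| 258 | 46 | favorite |
| 500 | 91 | thrown |
| 207 | 38 | listen |
| 516 | 102 | sacked |
| 240 | 48 | podcast |
| 291 | 59 | president |
| 195 | 40 | poor |
| 233 | 48 | rank |
| 249 | 52 | april |
| 651 | 136 | throwing |
| 350 | 74 | march |
| 203 | 43 | relationship |
| 830 | 177 | throw |
| 1051 | 226 | threw |
| 1238 | 269 | interceptions |
| 521 | 117 | throws |
| 560 | 128 | qbr |
| 387 | 90 | shoulder |
| 406 | 95 | stadium |
| 260 | 61 | bonus |
| 198 | 47 | replaced |
| 219 | 52 | completing |
| 474 | 113 | saturday |
| 11914 | 2861 | quarterback |
| 595 | 144 | rating |
| 1087 | 264 | qb |
| 374 | 92 | comments |
| 252 | 62 | wins |
| 1595 | 401 | win |
| 1645 | 414 | backup |
| 812 | 205 | interception |
| 190 | 48 | et |
| 676 | 171 | home |
| 233 | 59 | board |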

Now, for the interesting part. What are the words used to describe non-white quarterbacks? First, note the prevalence of other position names throughout the list. Remember when I said above that I had to trim out a lot of other players’ names to get meaningful results? This was doubly true for non-white quarterbacks. Take what you will from it, but non-white quarterbacks seem much more likely to be discussed in concert with their teammates rather than alone.

Further, we see some words that support the differential treatment discussed above: “talented”, “rushing”, “talent”, “threat” (as in dual-threat), “dynamic”, “runs”, and “speed.” We also see a big focus on injuries (which is almost certainly because of many, many articles on RG III and Ray Lewis) as well as physical descriptions: “triceps”, “acl”, “knee”, “age”, “hamstring”, and “pounds.” The list is pretty different in significant ways, though I am pleased and surprised not to see “character”, “integrity”, or “off-the-field” in the list. Fewer words are offered than for white quarterbacks because the differences level off for non-white quarterbacks (presumably because there are many fewer sentences).

Most Likely to Appear in Sentences about non-White QBs

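| Times Assigned to White QBs | Times Assigned to non-White QBs | Word |
|---|---|---|
| 10 | 154 | baylor |
| 7 | 92 | triceps |
| 17 | 107 | acl |
| 35 | 158 | torn |
| 73 | 169 | restricted |
| 93 | 184 | linebackers |
| 44 | 78 | promising |
| 76 | 126 | tender |
| 104 | 164 | receptions |
| 50 | 75 | barry |
| 75 | 103 | safeties |
| 100 | 137 | talented |
| 58 | 79 | nose |
| 245 | 332 | rushing |
| 197 | 265 | receiving |
| 61 | 82 | opposite |
| 314 | 367 | man |
| 265 | 308 | talent |
| 69 | 78 | fill |
| 203 | 222 | tackles |
| 213 | 230 | catches |
| 115 | 121 | threat |
| 978 | 1001 | linebacker |
| 471 | 480 | knee |
| 105 | 107 | age |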

Limitations

Of course, any research project must also disclose its limitations. First, as this is an algorithmic analysis, I can’t say for certain that these differences are 100% valid, and I cannot attribute them to intent on the writers’ part. Second, the data only comprise about a year and a half of articles, so I can’t say how this language has changed over time. Third, ESPN’s coverage of Manning(s), Brady, RGIII, Tebow, etc. is likely to skew these findings in particular ways. Fourth, I don’t know how Sports Illustrated, NBC Sports, or Fox Sports cover the same issues.

For those with a little statistical know-how, notice that no hypothesis tests or significance levels are offered. However, given the disparity between many of the sentence counts, I’m confident the differences are not spurious. Finally, the method above misses sentences that refer to quarterbacks without using their names. For instance, a sentence about Colin Kaepernick written as “[t]he young 49ers quarterback showed off his speed and athleticism by throwing for two touchdowns and running for another” wouldn’t be detected by this method.

All of my code can be found on my GitHub page. This will be cross-posted on spreadBlog as well as Bad Hessian.