Draft words
Lingua Franca 2014-05-14
Reuben Fischer-Baum, Aaron Gordon, and Billy Haisley, "Which Words Are Used To Describe White And Black NFL Prospects?", Deadspin 5/8/2014
Do NFL scouts talk about white players and black players differently? Are certain words reserved for white players? Are others used primarily to describe black players?
Let's try and find out. We've pulled the text from pre-draft scouting reports from NFL.com (written by the infamous Nolan Nawrocki), CBS, and ESPN, split them by player race, counted the number of times individual words appeared using the Voyant tool, and then calculated the rate at which each word appeared per 10,000 words. (In total we pulled 68,465 words on 99 white players—6,228 unique—and 223,868 words on 288 black players—10,580 unique). You can play with the data in the interactive below; simply plug a single word into the input field, hit search, and see how often the word appeared in black and white scouting reports.
Here's what the "interactive" looks like:


(For readers who are unfamiliar with the culture of American football, "center" is the name of a position on the offensive line, while "safety" is the name of a position in the defensive backfield.)
It's interesting to see such a nimble use of simple "text analytics" in this context. But the most striking part of it, to me, is this:
You can check out the code/documentation for the graphic over on Github.
It's neat to see magazine writers posting their data and code!
So I downloaded the .zip file and subjected the raw word counts to the same ranking method used earlier to rank "Obama's favored (and disfavored) SOTU words" (1/29/2014): the "weighted log-odds-ratio, informative Dirichlet prior" algorithm described on pp. 387-8 of Monroe, Colaresi & Quinn, "Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict", Political Analysis 2009.
For each word in the list, I've printed out seven numbers:
1. The count in the scouting reports on black players
2. The black-player count expressed as frequency per million words
3. The count in the scouting reports on white players
4. The white-player count expressed as frequency per million words
5. The sum of 1 and 3
6. The combined count (5) expressed as frequency per million words of the pooled reports
7. The weighted log-odds ratio after Bayesian shrinkage and regularization
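For readers who want to try something similar, here is a minimal Python sketch of how numbers like these can be computed. It is not the script used for this post. The function names, the `PRIOR_SCALE` value, and the choice to build the prior from the pooled counts of the two sets of reports are all assumptions made for illustration, so the scores it produces will have the same sign as the ones reported here but will not match them exactly. The corpus sizes are the totals quoted from the Deadspin piece.

```python
import math

# Corpus sizes quoted in the Deadspin piece.
N_BLACK = 223_868
N_WHITE = 68_465

# ASSUMPTION: total weight of the Dirichlet prior, in pseudo-words.
# The value actually used for the post is not stated; this is illustrative.
PRIOR_SCALE = 1000.0


def per_million(count, total):
    """Frequency of a word per million words of running text."""
    return 1e6 * count / total


def weighted_log_odds(y_i, y_j, n_i, n_j, alpha_w, alpha_0):
    """Log-odds ratio for one word, divided by its approximate standard deviation.

    y_i, y_j : counts of the word in corpus i and corpus j
    n_i, n_j : total word counts of the two corpora
    alpha_w  : prior pseudo-count for this word
    alpha_0  : sum of prior pseudo-counts over the whole vocabulary
    """
    log_odds_i = math.log((y_i + alpha_w) / (n_i + alpha_0 - y_i - alpha_w))
    log_odds_j = math.log((y_j + alpha_w) / (n_j + alpha_0 - y_j - alpha_w))
    delta = log_odds_i - log_odds_j
    variance = 1.0 / (y_i + alpha_w) + 1.0 / (y_j + alpha_w)
    return delta / math.sqrt(variance)


def score(count_black, count_white):
    """Positive = black-associated, negative = white-associated.

    ASSUMPTION: the prior for each word is its pooled relative frequency,
    scaled so that the pseudo-counts sum to PRIOR_SCALE.
    """
    pooled = count_black + count_white
    alpha_w = PRIOR_SCALE * pooled / (N_BLACK + N_WHITE)
    return weighted_log_odds(count_black, count_white,
                             N_BLACK, N_WHITE, alpha_w, PRIOR_SCALE)


if __name__ == "__main__":
    # Counts for "center": 48 in black-player reports, 59 in white-player reports.
    print(per_million(48, N_BLACK), per_million(59, N_WHITE))
    print(score(48, 59))
```

Dividing the log-odds ratio by its estimated standard deviation is what keeps very rare words from dominating the ranking: a word seen three times in one set of reports and never in the other has a large raw ratio but also a large variance.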
For the two position-words in the examples above, the results are:
center 48 (214.412) 59 (861.754) 107 (366.021) -5.355
safety 148 (661.104) 3 (43.818) 151 (516.534) 4.300
Other offensive-line words tend to be white-associated:
guard 79 (352.887) 84 (1226.9) 163 (557.583) -5.885
And other defensive-backfield words tend to be black-associated:
cornerback 105 (469.026) 3 (43.818) 108 (369.442) 3.509
By this criterion, many of the most white-associated words are connected with the quarterback position:
accuracy 18 (80.4045) 68 (993.208) 86 (294.185) -8.096
pocket 72 (321.618) 94 (1372.96) 166 (567.846) -6.970
arm 172 (768.31) 149 (2176.29) 321 (1098.06) -6.798
placement 37 (165.276) 57 (832.542) 94 (321.551) -5.844
throws 163 (728.108) 121 (1767.33) 284 (971.495) -5.353
pressure 47 (209.945) 57 (832.542) 104 (355.759) -5.226
throwing 21 (93.8053) 39 (569.634) 60 (205.245) -5.181
delivery 6 (26.8015) 25 (365.15) 31 (106.043) -4.981
velocity 9 (40.2023) 27 (394.362) 36 (123.147) -4.892
passing 74 (330.552) 67 (978.602) 141 (482.327) -4.713
mobility 13 (58.0699) 28 (408.968) 41 (140.251) -4.597
This association with the quarterback position is probably the main reason for the difference in (normalized) frequency of "intelligent":
intelligent 15 (67.0038) 17 (248.302) 32 (109.464) -2.748
The black-associated words seem to be connected to a wider range of positions:
burst 360 (1608.09) 41 (598.846) 401 (1371.72) 4.386
return 127 (567.299) 4 (58.424) 131 (448.119) 3.816
coverage 398 (1777.83) 64 (934.784) 462 (1580.39) 3.427
acceleration 100 (446.692) 5 (73.03) 105 (359.179) 3.142
man 178 (795.111) 20 (292.12) 198 (677.31) 3.108
cuts 89 (397.556) 4 (58.424) 93 (318.13) 3.027
leaping 82 (366.287) 4 (58.424) 86 (294.185) 2.860
explosive 169 (754.909) 22 (321.332) 191 (653.364) 2.733
receivers 187 (835.314) 26 (379.756) 213 (728.621) 2.721
runner 231 (1031.86) 36 (525.816) 267 (913.342) 2.703
returner 62 (276.949) 2 (29.212) 64 (218.928) 2.658
The whole list is here.
It's nice to see that the (currently 317) comments on the Deadspin piece are free of racist invective, as far as I can tell. Either someone is policing their comments closely, or (more likely) it's just a different crowd from the people who comment in some other places.
[Tip of the hat to JP Settles]