On the other hand, alone
Language Log 2013-04-22
My faith in the possibility of integrity and self-criticism in humankind got a real boost the other day when I read a post on Lingua Franca in which an editor (who is also a professor in an English department) stopped to think about whether she was in the right about a construction she had been proscribing for years in the journal papers she edited, and decided that she wasn't.
Is it legitimate to say "On the other hand, …" in a text where you have not first used "On the one hand, …"? Professor Anne Curzan thought the answer was no. And for years she told authors to change on the other hand to something like in contrast if they hadn't got a preceding instance of on the one hand somewhere nearby. But then one day she got to thinking: Am I right? Is it really an error to use on the other hand alone? So she did what people interested in grammar only rarely do: she started looking at the evidence, and decided that it refuted her rule.
Her question cries out for empirical investigation. After all, if it's a rule of good English that on the one hand is obligatory as a precursor to a use of on the other hand, then in competently produced prose that has been through an editing process you should always be able to find the former in any document containing the latter. If the rule is a myth, you won't find any such regularity.
When I read Curzan's piece I immediately started writing a little script that assesses text files to see how many on the other hand tokens and how many on the one hand tokens they contain (see the end of this post if you're interested in the code). [The code, and the following paragraphs of the post, were rewritten on 18 April 2013 around 3 p.m. UK time.]
It's a rather simplistic piece of programming. It treats whole files as if they were unified texts, regardless of their actual content. (A text file could of course be composed of several distinct texts having nothing to do with each other.)
A more eyebrow-raising fault is that it ignores the order in which the two phrases occur in the file and relies solely on the count (though as we shall see, this almost certainly does not matter).
It does, however (as suggested by a reader of the first version of this post), allow for both on the one hand and its variant on one hand; and it also allows for both on the other hand and on the other, with the word hand left out under ellipsis. (Such ellipsis does occur, though rather rarely.)
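The order-blindness could be removed by walking through the tokens in the sequence in which they actually occur. The following is a minimal Python sketch of that idea, not the script used for this post. The function name pair_in_order, the sample sentences, and the exact regular expressions are assumptions made for illustration. In particular, it counts the elliptical on the other only when a comma follows, so that on the other side is not taken as an instance.

```python
import re

# Assumed patterns (illustrative only): both variants of each phrase,
# case-insensitive; \s+ lets a phrase run across a line break, and the
# elliptical "on the other" is counted only when a comma follows it.
ONE_HAND = re.compile(r"\bon\s+(?:the\s+)?one\s+hand\b", re.IGNORECASE)
OTHER_HAND = re.compile(r"\bon\s+the\s+other(?:\s+hand\b|\s*,)", re.IGNORECASE)


def pair_in_order(text):
    """Walk the tokens in textual order, pairing each 'on the other (hand)'
    with an earlier 'on (the) one hand' that has not been used yet.
    Returns (matched, orphan_other, orphan_one)."""
    events = [(m.start(), "one") for m in ONE_HAND.finditer(text)]
    events += [(m.start(), "other") for m in OTHER_HAND.finditer(text)]
    events.sort()                  # order of occurrence in the text
    waiting = 0                    # 'one hand' tokens still unpaired
    matched = orphan_other = 0
    for _, kind in events:
        if kind == "one":
            waiting += 1
        elif waiting:
            waiting -= 1
            matched += 1
        else:
            orphan_other += 1
    return matched, orphan_other, waiting


sample = ("On the one hand, it rained. On the other hand, it was warm. "
          "On the other, it was late.\nPrices fell; on the other\n"
          "hand, volume rose. He stood on the other side.")
print(pair_in_order(sample))   # (1, 2, 0)
print(pair_in_order("On the other hand, x. On the one hand, y."))  # (0, 1, 1)
```

The second call shows what a pure count misses: the two phrases are equally frequent, so a count-based check would call the text matched, yet the contrast comes before its supposed set-up. Calling pair_in_order on each story rather than on a whole file would also stop matches from reaching across unrelated stories.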
I ran the script on the widely available Wall Street Journal corpus (WSJ), which Mark Liberman was instrumental in causing to be released to the computational linguistics community in 1993. It consists of 313 files each containing around 180,000 words of random news stories and features from 1987-1989. One of its merits is that each sentence is a separate line of the file. My script would miss occurrences of phrases that were split across lines, but with WSJ, God bless it, that's not a problem.
Ideally one would want to treat each story separately, which would be fairly easy to do, but my script doesn't bother; it would allow on the one hand to count as a match for an on the other (hand) that occurred half a dozen stories later on a different day. This means I may be undercounting unmatched cases of on the other (hand). But for what it's worth, here are the results I got — and I stress that anyone who wants can run an independent verification:
Occurrences of on (the) one hand: 179
Occurrences of on the other (hand): 1786
Unmatched on the other (hand) tokens (excess over on (the) one hand): 1607
Files in which on the other hand tokens were always matched: 4
Files with unmatched on the one hand tokens: 0

That last number is significant. There are no cases in WSJ where the writer began with on (the) one hand but forgot to continue with an on the other (hand) later (though elsewhere on the web, as Philip Minden has pointed out to me, you can find many cases of people writing On the one hand . . . (blah blah blah . . .), but then again . . .).
This indicates that the frequency of carelessly unmatched on (the) one hand cases in texts is probably close to zero. I say this because while orphan cases of on (the) one hand do not occur at all in WSJ, orphan cases of on the other hand are not just frequent, but run at roughly an order of magnitude above the total number of tokens of on the one hand. Almost 90 percent of the tokens of on the other hand (1607 of 1786) have no on (the) one hand anywhere in their file that they could be paired with, whether as a genuine match or an accidental one. If there isn't a matching expression anywhere in the file, then a fortiori there isn't one in the preceding sentence or two within the same article.
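The arithmetic behind those proportions, using only the WSJ counts reported above, is short. The orphans outnumber all tokens of on (the) one hand by a factor of about nine, and they make up just under 90 percent of all tokens of on the other (hand):

```latex
\begin{aligned}
1786 - 179 &= 1607 \\
\frac{1607}{179} &\approx 8.98 \\
\frac{1607}{1786} &\approx 0.8998
\end{aligned}
```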
These results hold for copy-edited, published prose by experienced journalists working for a newspaper of national and international prestige. And they make an overwhelming case against the existence of the rule that Curzan used to believe in. It would be irrational to go on believing that a rule of correct English bars uses of on the other hand that are not matched with preceding tokens of on the one hand. It's not just that exceptions occur, perhaps due to momentary carelessness. Any defender of the rule would face two compelling unexplained facts: (i) Why do the exceptions utterly swamp the cases where the putative rule is respected? And (ii) why are there never any cases of momentary carelessness that lead to unmatched instances of on the one hand?
The generalization that on the other hand always needs a matching instance of on (the) one hand preceding it is simply false, and Curzan was right to change her mind. Kudos to her for being ready to accept that evidence might bear on questions of grammar.
I wrote the script in the C Shell command language, which isn't everybody's favorite. Here's the pseudocode:
Initialize the variables one_hand, otherhand, difference, anomalous_files, and matched_files to zero.
BEGIN FOR-LOOP
  For each file in the list supplied as input:
    set x to the number of occurrences of "on the one hand" or "on one hand" in it, and add the value of x to the value of one_hand;
    set y to the number of occurrences of "on the other hand" or "on the other" in it, and add the value of y to the value of otherhand;
    if the value of y is less than the value of x, add 1 to the value of anomalous_files (and print out the name of the file);
    if the value of y is equal to the value of x, add 1 to the value of matched_files.
END FOR-LOOP
If the value of otherhand exceeds the value of one_hand, set difference to otherhand - one_hand.
Print out:
  "Occurrences of 'on (the) one hand':" one_hand
  "Occurrences of 'on the other (hand)':" otherhand
  "Unmatched 'other hand' tokens (excess over 'one hand'):" difference
  "Files with 'other hand' always matched:" matched_files
  "Files with unmatched 'one hand' tokens:" anomalous_files
For those who would like to have the C Shell code itself, I'm happy to provide it (today especially, for I am mindful of Mark Liberman's very important comments). Here's my current version, which uses egrep to allow for the possibly missing words the and hand. The reason it prints out the names of potentially anomalous files is so that they can be hand-checked for orphan cases of on (the) one hand.
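```csh
#!/bin/csh -f
# Tally uses of on (the) one hand and on the other (hand) in each file.
# Note: egrep piped to wc -l counts matching lines, not tokens; in WSJ each
# sentence is one line, so two tokens in one sentence count only once.

@ one_hand = 0
@ otherhand = 0
@ anomalous_files = 0
@ matched_files = 0
@ difference = 0

# Use the files named on the command line, or else every file in the directory.
if ( $#argv > 0 ) then
    set files = ( $argv )
else
    set files = ( * )
endif

foreach f ( $files )
    @ x = `egrep -i ' on (the )?one hand' $f | wc -l`
    @ one_hand = $one_hand + $x
    @ y = `egrep -i ' on the other( hand)?' $f | wc -l`
    @ otherhand = $otherhand + $y
    # More one-hand lines than other-hand lines: print the file name for hand-checking.
    if ( $y < $x ) then
        @ anomalous_files = $anomalous_files + 1
        echo $f
    endif
    if ( $y == $x ) then
        @ matched_files = $matched_files + 1
    endif
end

if ( $otherhand > $one_hand ) @ difference = $otherhand - $one_hand

echo "Occurrences of 'on (the) one hand': $one_hand"
echo "Occurrences of 'on the other (hand)': $otherhand"
echo "Unmatched 'other hand' tokens (excess over 'one hand'): $difference"
echo "Files with unmatched cases of 'one hand': $anomalous_files"
echo "Files with 'other hand' always matched: $matched_files"
```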
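Since C Shell isn't everybody's favorite, here is a rough Python counterpart to the script above. It is a sketch, not the script that produced the figures in this post. It counts tokens rather than matching lines, uses word boundaries instead of a leading space (so a sentence-initial On at the very start of a line is also caught), and takes the elliptical on the other only before a comma, so its totals on WSJ could differ somewhat. The names tally and main, the file name tally.py, and the wsj/ directory in the usage comment are all hypothetical.

```python
import re
import sys
from pathlib import Path

# Assumed patterns (not the author's egrep ones): both variants of each
# phrase, case-insensitive, \s+ so a phrase may span a line break, and the
# elliptical "on the other" counted only when a comma follows it.
ONE_HAND = re.compile(r"\bon\s+(?:the\s+)?one\s+hand\b", re.IGNORECASE)
OTHER_HAND = re.compile(r"\bon\s+the\s+other(?:\s+hand\b|\s*,)", re.IGNORECASE)


def tally(paths):
    """Per-file counts pooled as in the pseudocode; returns
    (one_hand, otherhand, difference, matched_files, anomalous_files)."""
    one_hand = otherhand = matched_files = anomalous_files = 0
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        x = len(ONE_HAND.findall(text))    # tokens, not matching lines
        y = len(OTHER_HAND.findall(text))
        one_hand += x
        otherhand += y
        if y < x:
            anomalous_files += 1
            print(path)                    # possible orphan 'one hand'
        elif y == x:
            matched_files += 1
    difference = max(otherhand - one_hand, 0)
    return one_hand, otherhand, difference, matched_files, anomalous_files


def main(paths):
    one, other, diff, matched, anomalous = tally(paths)
    print(f"Occurrences of 'on (the) one hand': {one}")
    print(f"Occurrences of 'on the other (hand)': {other}")
    print(f"Unmatched 'other hand' tokens (excess over 'one hand'): {diff}")
    print(f"Files with unmatched cases of 'one hand': {anomalous}")
    print(f"Files with 'other hand' always matched: {matched}")


if __name__ == "__main__":
    # Pass file names explicitly, e.g. python tally.py wsj/*
    main(sys.argv[1:])
```

Unlike the C Shell version, this one does not fall back to every file in the directory; letting the shell expand a glob on the command line does the same job.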