Diagramming sentences
Language Log 2013-04-22
Today's Frazz:
In fact, there are people whose (paid) job it is to "diagram" sentences — some examples of their output are here, here, here, here, and here. These results have engineering as well as scholarly and scientific applications.
But the Reed-Kellogg method of "diagramming" sentences has been intellectually obsolete for a hundred years. It's a shame and a scandal that no other mode of syntactic analysis has any grip on the popular imagination today.
In particular, there's an alternative standard which has been around, with minor modifications and additions, for more than 20 years; which has been applied on a large scale (millions of words) in published "treebanks" of languages as diverse as English, Greek, Chinese, and Arabic; and which is increasingly used for engineering, scientific, and humanistic applications, as documented in thousands of publications. Excellent tutorial materials have been developed for training people to use this standard.
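For readers who have never seen one of these analyses: assuming the standard meant here is labeled bracketing in the Penn Treebank style, a sentence's structure is written as nested parentheses, each opening with a category label. The sketch below reads such a string into nested Python lists. The example sentence and the names `parse_bracketed` and `leaves` are made up for illustration and come from no treebank toolkit; real treebank files also carry function tags and empty elements that this ignores.

```python
from typing import List


def parse_bracketed(text: str) -> list:
    """Parse one labeled-bracket tree into nested lists: [label, child, ...]."""
    # Pad the parentheses so a plain whitespace split yields clean tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node() -> list:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        node = [tokens[pos]]  # category label, e.g. S, NP, VBD
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read_node())  # nested constituent
            else:
                node.append(tokens[pos])  # a word of the sentence
                pos += 1
        pos += 1  # consume the closing ')'
        return node

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def leaves(tree) -> List[str]:
    """Return the words of the sentence in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    words: List[str] = []
    for child in tree[1:]:  # tree[0] is the label
        words.extend(leaves(child))
    return words


# Hypothetical example sentence, analyzed by hand for illustration.
EXAMPLE = "(S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT a) (NN cat))))"
tree = parse_bracketed(EXAMPLE)
print(tree[0], leaves(tree))
```

The point of the sketch is only that the notation is simple enough to read and write by hand, and trivially machine-readable.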
And yet, if you were to ask faculty members in English departments and Schools of Education, I'd be surprised if one person in a hundred has ever even heard of this analytic standard — and I doubt that one in ten thousand would actually know anything substantive about it. So who is doing all the research, and writing all those thousands of scholarly, scientific and technical papers? Computational linguists and computer scientists.
There's something deeply wrong here, and plenty of blame to go around. The people who know about this stuff have done a dreadful job of public relations. The people who don't know, and should, are intellectually irresponsible in this as in many other ways.
Update — I should make it clear that the advantage I'm claiming for the "treebank" style is NOT that its analyses are superior to the long list of alternatives, starting with Reed-Kellogg diagrams. Most of the substance of the analyses would have been familiar to Otto Jespersen, for example. And the now-available explanatory materials are certainly not suitable for use by high-school or even university students.
My point is that this framework is stable and descriptively well-established, and has become a widely accepted basis for computational and historical work. There are translations back and forth with a number of alternative representational formats, including dependency grammar, tree-adjoining grammar, categorial grammar, etc.; there could perfectly well in principle be translations to and from Reed-Kellogg diagrams, if those were somewhat better formalized. There are relatively good parsers, and better ones every year.
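To give a sense of what such a translation involves, here is a rough Python sketch of one common approach to turning a constituency tree into dependency arcs: pick a head child for each phrase, then attach the heads of the other children to it. The `HEAD_CHILD` table is a toy assumption invented for this example, not a published head-rule set, and the tree is a hand-built nested list of the form `[label, child, ...]`.

```python
# Toy head table (an assumption for illustration only): for each phrase
# label, the child labels to look for, in order of preference.
HEAD_CHILD = {"S": ["VP"], "VP": ["VBD", "VB"], "NP": ["NN", "NNS"]}


def to_dependencies(tree):
    """Return (head_word, arcs), where arcs is a list of (dependent, head)."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return children[0], []  # preterminal such as ["NN", "dog"]

    results = [to_dependencies(child) for child in children]
    child_labels = [child[0] for child in children]

    head_index = 0  # fallback when the table has no match: leftmost child
    for wanted in HEAD_CHILD.get(label, []):
        if wanted in child_labels:
            head_index = child_labels.index(wanted)
            break

    head_word = results[head_index][0]
    arcs = []
    for i, (word, sub_arcs) in enumerate(results):
        arcs.extend(sub_arcs)  # keep arcs found inside the child
        if i != head_index:
            arcs.append((word, head_word))  # non-head child depends on head
    return head_word, arcs


# Hypothetical hand-built analysis of "the dog chased a cat".
tree = ["S",
        ["NP", ["DT", "the"], ["NN", "dog"]],
        ["VP", ["VBD", "chased"], ["NP", ["DT", "a"], ["NN", "cat"]]]]
root, arcs = to_dependencies(tree)
print(root, arcs)
```

Arcs here are keyed by the word itself, which breaks as soon as a word repeats in the sentence; a more careful version would use word positions instead.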
So if we accept the premise that some fraction of educated people ought to learn "grammar", in some sense of that word, then some version of the treebank framework is the obvious candidate for the kind of grammar that they should learn.