Cartoonist walks into a language lab…
Language Log 2017-10-05
Bob Mankoff gave a talk here in Madison not long ago. You may recognize Mankoff as the cartoon editor for many years at the New Yorker magazine, now at Esquire. Mankoff’s job involved scanning about a thousand cartoons a week to find 15 or so to publish per issue. He did this for over 20 years, which is a lot of cartoons. More than 950 of his own appeared in the magazine as well. Mankoff has thought a lot about humor in general and cartoon humor in particular, and likes to talk and write about it too.
See, for example, his TED Talk, his appearance on “60 Minutes”, his Google talk, and the documentary “Very Semi-Serious”.
What’s the Language Log connection? Humor often involves language? New Yorker cartoons are usually captioned these days, with fewer in the lovely mute style of a William Steig. A general theory of language use should be able to explain how cartoon captions, a genre of text, are understood. The cartoons illustrate (sic) the dependence of language comprehension on context (the one created by the drawing) and background knowledge (about, for example, rats running mazes, guys marooned on islands, St. Peter’s gate, corporate culture, New Yorkers). The popular Caption Contest is an image-labeling task, generating humorous labels for an incongruous scene.
But it’s Mankoff's excursions into research that are particularly interesting and Language Loggy. Mankoff is the leading figure in Cartoon Science (CartSci), the application of modern research methods to questions about the generation, selection, and evaluation of New Yorker cartoons.
OK, I just invented Cartoon Science, but Mankoff’s involvement in humor research isn’t a joke. He almost completed a Ph.D. in experimental psychology back in the behaviorist era, which is pretty hard core. Before he left the field he co-authored a chapter called “Contingency in behavior theory”, as in contingencies of reinforcement in animal learning. The chapter even included a cartoon.
It probably wasn’t a coincidence that the first of the weekly Caption Contests was a cartoon about a rat lab.*
Mankoff has worked with several research groups over the years, using various methods to investigate what makes something, especially a New Yorker cartoon, funny. He’s a co-author on several research articles. In the early 2000s, he worked with psychologists who recorded eye movements and evoked potentials while subjects read cartoons. A 2011 study (pdf) examined the stereotype that men are funnier than women. Men and women wrote captions for cartoons, and independent raters of both sexes judged the men’s captions funnier, though male raters did so by a wider margin. In a follow-up memory experiment, both male and female subjects tended to misremember funny captions as having been written by men and unfunny ones by women. Interesting, though the sample was small and representative only of the population of UCSD students who participate in psychology experiments for course credit.
The weekly caption contest has yielded a massive amount of data that is being analyzed using NLP and machine learning techniques. (The contest: Entrants submit captions for a cartoon; from the 5000 or so entries the editors pick three finalists; readers pick the winner by voting online.) Just think of the studies that can be done with this goldmine of a data set! Identify the linguistic properties that distinguish the winning captions from the two losers. Build a classifier that estimates relative funniness from properties such as word choice, grammatical complexity, affective valence (“sentiment”), readability, and the structure of the joke. Use the classifier to predict the winners in other weeks, or the rated humorosity of other cartoons.
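To make that concrete, here is a minimal sketch of the classifier idea in Python with scikit-learn. Everything in it is hypothetical: the captions, the winner/loser labels, and the use of TF-IDF word features as a stand-in for the richer feature set (sentiment, readability, joke structure) a real study would extract.

```python
# Hypothetical sketch: separate winning captions from losing finalists
# using word-choice features alone. Captions and labels are invented
# stand-ins, not actual contest data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

captions = [
    "Is never good for you?",            # invented "winner"
    "My lawyer will call your lawyer.",  # invented "winner"
    "Nice weather we're having.",        # invented "loser"
    "I moved here for the schools.",     # invented "loser"
]
labels = [1, 1, 0, 0]  # 1 = weekly winner, 0 = losing finalist

# TF-IDF over unigrams and bigrams stands in for the fuller feature set.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression())
model.fit(captions, labels)

# To "predict the winner" in a new week, score each finalist and take
# the caption with the highest estimated probability of winning.
finalists = ["First finalist.", "Second finalist.", "Third finalist."]
scores = model.predict_proba(finalists)[:, 1]
print(finalists[scores.argmax()])
```

With four training examples this will of course learn nothing real; the point is only the shape of the pipeline.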
Heavy hitters from places like Microsoft, Google, Michigan, Columbia, Yale, and Yahoo have taken swings at this with Mankoff’s help. The results (from the few published studies I found) have been uninspiring. Classifiers used to pick the winning caption yielded the sorta-worked, better-than-chance-but-not-much results that one sees a lot in the classifier line of work. Several properties of winning captions have been identified but they aren’t specific enough to generate good ones. In one study, for example, negative sentiment captions did better than positive ones, a little.
Perhaps this is a job for deep learning. The methods I just described required pre-specifying a set of plausibly relevant cartoon characteristics; a deep learning model could instead discover what matters from exposure to examples. The magazine has published over 80,000 cartoons. Say that our multi-layer network gets drawing-caption pairs as input and learns to classify them as funny or unfunny, with the published pairings standing in for “funny”. The training corpus includes the correct pairings (as published) but also drawings and captions that have been randomly re-paired (foils). During training the model gets feedback about whether each pair is real or a foil, and the weights on connections between units are adjusted by a suitable procedure (probably backpropagation). We train the model on 76,000 real cartoons and then test whether it can correctly classify the remaining 4,000. We repeat this cross-validation many times, withholding a different subset of 4,000 each time. When a model generalizes accurately across validation sets, we declare victory.
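For concreteness, here is what that training scheme might look like as a toy PyTorch sketch. Random vectors stand in for encoded drawings and captions, and the network size, fold count, and training schedule are illustrative assumptions, not anyone’s actual setup.

```python
# Illustrative sketch of the real-vs-foil training scheme. Random
# vectors stand in for drawing and caption encodings; nothing here
# reflects an actual system.
import torch
import torch.nn as nn

torch.manual_seed(0)
N, D = 1000, 64                       # toy corpus size and feature width
drawings = torch.randn(N, D)          # stand-in drawing encodings
captions = torch.randn(N, D)          # stand-in caption encodings

# Real pairs keep the published alignment; foils re-pair at random.
perm = torch.randperm(N)
pairs = torch.cat([torch.cat([drawings, captions], dim=1),
                   torch.cat([drawings, captions[perm]], dim=1)])
labels = torch.cat([torch.ones(N), torch.zeros(N)])  # 1 = real, 0 = foil

# Cross-validate: hold out a different slice each fold (the post's
# 4,000-of-80,000 split, scaled down to the toy corpus).
k = 20
fold = torch.arange(2 * N) % k
for held_out in range(k):
    train, test = fold != held_out, fold == held_out
    model = nn.Sequential(nn.Linear(2 * D, 128), nn.ReLU(),
                          nn.Linear(128, 1))       # fresh model per fold
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(50):                # weights adjusted by backpropagation
        opt.zero_grad()
        loss = loss_fn(model(pairs[train]).squeeze(1), labels[train])
        loss.backward()
        opt.step()
    with torch.no_grad():
        preds = model(pairs[test]).squeeze(1) > 0
        acc = (preds == labels[test].bool()).float().mean().item()
    print(f"fold {held_out}: held-out accuracy {acc:.2f}")
```

Since the inputs here are random, held-out accuracy should hover near chance; “declaring victory” would require accuracy well above it across all the folds.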
Mankoff has spent time at Google world headquarters and at Google DeepMind. I certainly would not want to underestimate what these folks can achieve. (I have no inside knowledge of how far they got with cartoons.) DeepMind’s AlphaGo was retired after it consistently and decisively beat the greatest human Go player. Is labeling New Yorker cartoons harder than playing Go?
Well, yes.
Go has a conventionalized set-up and explicit rules. A captioning model has to figure out what game is being played. Captioning is a type of scene labeling, but that requires recognizing what’s in the scene, which in this case is, literally, ridiculous: exaggerated, crude, eccentric, stylized renderings of the world. Quite different from the naturalistic scenes that have been the focus of so much attention in AI.
OK, we give the model a break: pair descriptions of the cartoons with captions. Then what are the prospects for success? The humor turns on a vast amount of background knowledge. That feeling when you just don’t get it happens when you either don’t know the relevant stuff or can’t figure out what’s relevant to that cartoon. A deep learning network might well acquire the requisite knowledge of the world, but not from 80,000 drawings: insufficient data. Same for analyzing the captions: it’s necessary to know the language. People have acquired most of the relevant knowledge by other means. A network that learned from pairs of drawings (or drawing descriptions) and captions would be acquiring several types of knowledge simultaneously: about pictures, about language, about funny. Set up this way, the model’s task is vastly more difficult than people’s.
That particular defect might be fixable, but training a network to discriminate funny from unfunny cartoons strikes me as a poorly posed problem in any case. Is “funny” even the relevant criterion for a good New Yorker cartoon (or winning caption)? Many are entertaining but hardly funny: they’re rueful, sardonic, or facetious; clever, enlightening; brutal, whimsical, or defiant commentary. “We will now observe a moment of silencing critics of gun violence” (10/4/2017) is a great cartoon but not funny at all. What about Mankoff’s most famous cartoon: “No, Thursday’s out. How about never—is never good for you?” How much do “trenchant” and “funny” overlap?
Mankoff’s conclusion from his explorations in Cartoon Science? “There is no algorithm for humor.” In Madison, as in other talks on YouTube, he said that only humans are capable of humor, in virtue of our special human hardware. Ergo there can’t be an algorithm running on a non-human machine that instantiates the experience. No program will pass the Plotzing Test, whereby an observer is unable to determine whether jokes were generated by computer or human. Mankoff says that humor is like consciousness. He’s a materialist about both. Research on the behavioral and neurophysiological correlates of humor doesn’t explain what humor is, and similarly for consciousness. I'm not endorsing these arguments, but that's the gist of the story.
Mankoff isn’t a philosopher—neither am I—and he certainly doesn’t owe anyone a rigorous analysis. He has concluded that humor is beyond human understanding. Thus there cannot be an effective procedure for being funny. That makes at least three things the brain isn’t powerful enough to understand: how the brain works, consciousness, and humor. [Discuss.]
A materialist might hold that the experiences we associate with consciousness are epiphenomenal and don’t require further explanation, but that is where the analogy to humor seems to fail. Surely there is plenty to explain about the humor experience that is not “epiphenomenal”. It may be easier to explain away qualia than to explain why I find a joke funny and you do not. Either equating humor with consciousness is a category error or the materialist account of consciousness has some ‘splaining to do. (Memo to self: read this.)
Mankoff is clearly wrong about algorithms for humor, which have existed since the dawn of AI. Why, just a few months ago, a neural network generated hysterical names for paints, like “pubic gray”. Was that not AI humor, indistinguishable from human behavior? I’ll take up that question, which is really about when it can be said that an AI program has succeeded at simulating human behavior, in my next post.
* Like other behaviorists who studied animal learning, Mankoff actually worked with pigeons, not rats.