Supporters of Marcus Endicott’s Patreon can access weekly or monthly consultations on this topic.

The Grammar Wars

The most celebrated artificial-intelligence program of its era was named, with unintentional perfection, after a fact about statistics. SHRDLU—the conversational system Terry Winograd built at the MIT Artificial Intelligence Laboratory between 1968 and 1970—took its name from ETAOIN SHRDLU, the first two columns of keys on a Linotype typesetting machine, arranged in the rough order of how often each letter appears in written English. Tired typesetters used to run a finger down those keys to mark a botched line, and the resulting gibberish sometimes slipped into print. That a program built to demonstrate that language was a matter of logic and rules, not frequencies and probabilities, should be christened with the purest possible artifact of letter-frequency counting is one of those jokes history tells on itself. It would take half a century for the punchline to land, but when it did, it would reshape the world.

To understand why SHRDLU mattered—and why its name was an omen—you have to start with the man who supplied its intellectual oxygen, though he never wrote a line of its code and would have disowned the whole enterprise of building it. Noam Chomsky was born in Philadelphia in 1928 to a family of Hebrew scholars, entered the University of Pennsylvania at sixteen, and very nearly drifted out of academic life before the linguist Zellig Harris drew him in. By 1955 he had a Pennsylvania doctorate and a post at MIT, where he was assigned, with some irony, to a machine-translation project he openly regarded as intellectually worthless. Two years later he published a slim monograph, barely more than a hundred pages, that detonated underneath the study of language and left the rubble arranged in an entirely new shape.

Syntactic Structures argued that a language is not a list of sentences a person has heard and imitated but an infinite set generated by a finite system of rules living inside the mind. To make the point that grammar is something separate from meaning, and separate too from anything a frequency table could capture, Chomsky offered a sentence that has since become the most famous nonsense in the history of linguistics: "Colorless green ideas sleep furiously." He set it beside its scrambled mirror image, "Furiously sleep ideas green colorless," and observed that neither string had ever appeared in English before, so any statistical model would have to rate both equally alien. Yet one is unmistakably an English sentence and the other is word salad. A grammar, he concluded, must be doing something a probability count cannot. The deeper claim arrived in 1968, in an essay buried in a philosophy journal, where Chomsky delivered the line that would become the banner of one whole side of a fifty-year war: the notion of the probability of a sentence, he wrote, was "an entirely useless one." It is worth being precise about what he meant, because the caricature is more famous than the argument. Chomsky was not claiming that no statistical method could ever do anything; he was attacking a particular, narrow theory of how children learn language by tallying what they hear. But the line traveled far beyond its context, and it hardened into a creed.

He had reason to feel he was winning. In 1959 he had reviewed Verbal Behavior, the behaviorist B. F. Skinner's attempt to explain language as conditioned response, and the review was so devastating that it is routinely credited with helping launch the cognitive revolution in psychology—one historian of the field called it among the founding documents of cognitive science and, even decades later, the most important refutation of behaviorism ever written. For a generation, the rationalist picture of the mind as a rule-governed symbol processor was simply what serious people believed. The idea that you might instead build intelligence by feeding a machine mountains of text and letting it find the patterns struck most researchers as not merely wrong but unscientific.

SHRDLU was that worldview made flesh. Winograd, a young Coloradan who had come to MIT in 1968 to study under Seymour Papert in Marvin Minsky's lab, built a program that conversed in plain English about a tabletop "blocks world" of colored cubes, pyramids, and boxes manipulated by a simulated robot arm. You could tell it to find a block taller than the one it was holding and put it in the box, and it would do so; you could ask why it had moved something, and it would explain; you could use a pronoun and it would work out what you meant. It remembered what it had done. For early-1970s hardware running on kilobytes of memory, the fluency was astonishing, and SHRDLU became the canonical triumph of what would later be nicknamed Good Old-Fashioned AI—proof, it seemed, that genuine machine understanding was close at hand.

It was not. SHRDLU worked precisely because its universe was tiny, closed, and hand-built; its entire world could be specified with a couple hundred words. Try to enlarge it—add real objects, real properties, the open-ended messiness of an actual room—and the hand-coded rules and knowledge required to keep it coherent exploded past anything a person could manage. The philosopher Hubert Dreyfus seized on SHRDLU as exhibit A in his argument that intelligence is embodied and situated and cannot be fully reduced to formal rules, that a system succeeding inside a toy "micro-world" tells you nothing about the real one. What is remarkable is that Winograd came to agree. The limitations, he later reflected, were a great deal closer than they had appeared at the time, and the charge that his program was too constrained to count as real intelligence struck him as perfectly fair. Most piercing of all was his realization that the conversation had only seemed to work because a human being was supplying the intelligence at the other end. The understanding, he came to think, lived in the listener.

So Winograd did something his peers found almost unthinkable: he walked away. He moved to Stanford in 1973, encountered Dreyfus's critique and then the Chilean philosopher Fernando Flores, and steered his thinking toward phenomenology and a sustained attack on the very rationalist tradition he had embodied. In 1986 he and Flores published Understanding Computers and Cognition, a book steeped in Heidegger that argued, against the prevailing faith of his own field, that one could not build machines that genuinely modeled intelligent behavior. He poured the rest of his career into human-computer interaction and design, helping found Stanford's celebrated "d.school," insisting that the real question was never how you wrote the algorithm but how you understood the people. The journalist John Markoff would later call him the first high-profile deserter from artificial intelligence—a man who had built one of the field's defining programs and then crossed over.

Here the story turns on its hinge. Among the graduate students Winograd advised at Stanford was a young man named Larry Page, who in 1995 was casting about for a dissertation topic—telepresence, self-driving cars, and finally the mathematical structure of the links connecting pages on the young World Wide Web. Page wanted to download the entire web and keep just the links; Winograd, by Page's own recollection, told him to give it a try. Page would later call it the best advice he ever received. The project, nicknamed BackRub, became the PageRank algorithm, and Winograd was the fourth author on the 1998 paper that described it. The algorithm became Google.

And so the symbolic pioneer who had watched his own paradigm hit a wall personally midwifed the commercial triumph of the opposite approach—the brute-force, data-driven, statistical method of finding what you want by counting patterns across an unanalyzed ocean of text. Winograd seemed to grasp the irony himself, admitting that what had surprised him was exactly this: that superficial techniques over enormous bodies of data could deliver what he, raised to demand a complete conceptual model, had assumed they never could.

Chomsky never conceded. At an MIT symposium in 2011 he dismissed the statistical turn as something novel and faintly disreputable in the history of science, a discipline that mistook the mere approximation of unanalyzed data for success. Google's research director, Peter Norvig, answered him in a widely read essay, pointing out that Chomsky was condemning an entire family of methods on the strength of a fifty-year-old example, and that engineering that actually works is at least evidence of a model that captures something real. As if to prove the war would outlive every battlefield, Chomsky returned in 2023, when the rest of the world was reeling from ChatGPT, to publish a withering verdict in The New York Times: such systems, he and his co-authors wrote, were nothing but a lumbering statistical engine for pattern matching, stuck in some prehuman phase of cognition, forever unable to grasp what could not be the case.

He was holding the same line he had drawn in 1957, and he held it with characteristic brilliance and conviction. But the machines he was describing had, by then, been trained on a fact he had spent his life refusing: that probability, the thing he once called entirely useless, could carry an extraordinary amount of the weight of language. A linguist named Fernando Pereira had quietly demonstrated as much back in 2000, training a simple statistical model that judged "Colorless green ideas sleep furiously" to be roughly two hundred thousand times more probable than its scrambled twin—doing precisely what Chomsky had insisted no such model could ever do, with the very sentence he had built to prove it impossible. The grammar wars had begun with a piece of nonsense about sleeping ideas. They would end, more or less, with a machine that had read the whole internet and learned to finish the sentence.

=> The Heretic's Execution

Page updated

Google Sites

Report abuse