In the early months of 1943, two men met for tea most days in the cafeteria of Bell Labs' building on West Street in Manhattan. One was a young American mathematician working on fire-control systems and the secret mathematics of cryptography. The other was an English visitor on a cryptographic mission, sent to compare notes with the US Navy and to inspect a secure speech scrambler that would soon carry calls between Roosevelt and Churchill. They were forbidden to discuss the actual work that had brought them together. So instead, over weak wartime tea, Claude Shannon and Alan Turing talked about something more dangerous: whether a machine could be made to think.
Neither man could have known that he was holding one half of a question that would take the better part of a century to answer. Shannon recalled the conversations decades later, in a 1982 oral history, and was emphatic about what they had not discussed. "We talked not at all about cryptography," he said. "I don't think we exchanged word one about cryptography. We talked much more about things like the human brain and computing machines and that sort of thing." When Shannon floated his nascent ideas about information — the notion that messages of any kind could be measured, quantified, reduced to a common currency — Turing was unconvinced. Shannon remembered the pushback plainly: the Englishman was interested, but he didn't believe the ideas were pointed in the right direction. As one telling captures the mood, Shannon wanted to feed not just numbers but cultural things, even music, to an electronic brain, and Turing waved it off. He wasn't after a powerful brain, he said, just a mundane one, something on the order of the president of the telephone company. It was a joke, but it was also a thesis: intelligence was not magic, just machinery operating at scale.
Five years later, Shannon made good on the half of the conversation that had been his. In two installments published in the Bell System Technical Journal in 1948, under the unassuming title "A Mathematical Theory of Communication," he laid down the foundations of the information age. He put the word "bit" into print, crediting the coinage to his colleague John Tukey, and he introduced the concept of channel capacity and a precise mathematical definition of information itself in terms of entropy. Scientific American would later call it the Magna Carta of the information age, and it is not hyperbole; nearly every digital signal that moves through the modern world moves according to rules Shannon wrote down that year. But buried inside the paper, almost as an aside, was something stranger and, in retrospect, more prophetic.
To show that a source of language could be treated as a statistical process — a Markov chain, in which each symbol depends on the ones before it — Shannon built a ladder of approximations to English and climbed it rung by rung. At the bottom, choosing letters at random with equal probability, he got pure noise: strings like XFOML RXKHRJFFJUJ ZLPWCFWKCYJ. Weighting the letters by how often they actually appear in English helped a little. Accounting for which letters tend to follow which others helped more. By the time he was modeling the probability of each word given the word before it, the output had climbed eerily close to sense: THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD. No word-pair frequency tables existed, so Shannon generated that last passage by hand — opening a book, picking a word, skipping forward to find its next appearance, recording what came after it, and repeating, a patient manual walk through real text. He noted, with evident satisfaction, that the run of ten words beginning "attack on an English writer" was "not at all unreasonable," and that pushing the method any further would become impossibly laborious.
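The book-walking procedure described above amounts to sampling from a table of word pairs. What follows is a minimal sketch of that idea, not a reconstruction of Shannon's actual procedure: the tiny SAMPLE_TEXT corpus, the function names, and the fixed random seed are all assumptions made for illustration, where Shannon worked by hand from a real book.

```python
import random
from collections import Counter, defaultdict

# Hypothetical stand-in corpus; Shannon drew his word pairs from a real book.
SAMPLE_TEXT = (
    "the head of the line is the point of the attack and the "
    "character of the writer is the method of the line"
)

def build_bigram_table(words):
    """Count, for each word, which words follow it and how often."""
    table = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table

def sample_words(table, start, length, rng):
    """Walk the table, drawing each next word in proportion to its count."""
    output = [start]
    for _ in range(length - 1):
        followers = table.get(output[-1])
        if not followers:  # dead end: this word never had a successor
            break
        choices, weights = zip(*followers.items())
        output.append(rng.choices(choices, weights=weights, k=1)[0])
    return output

words = SAMPLE_TEXT.split()
table = build_bigram_table(words)
generated = sample_words(table, "the", 12, random.Random(0))
print(" ".join(generated).upper())
```

Every adjacent pair in the output occurs somewhere in the source text, which is why passages like Shannon's read as locally plausible even though no sentence-level meaning is being modeled.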
It is worth being careful here, because the temptation to overclaim is strong. Shannon never used the phrase "language model," and the framing did not exist in 1948; his actual purpose was engineering, quantifying the redundancy of English so that messages could be compressed and sent more efficiently. And yet what he had built was, mechanically, exactly what a modern language model does. It estimated the probability of the next symbol given the ones before it, and then it sampled. The line from that hand-run experiment to the system that now drafts emails and writes code is not a metaphor; it is the same idea, scaled by seventy years of hardware. The intellectual debt was real enough that the field's standard textbooks and surveys still trace language modeling directly back to this work. Shannon had written the first language model in all but name and then walked past it on his way to something he considered more important.
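The engineering purpose mentioned above, measuring how redundant English is, can also be made concrete. The sketch below estimates entropy from single-letter frequencies only, and compares it with the maximum possible for a 27-symbol alphabet (26 letters plus space). The function names, the alphabet size, and the sample sentence are assumptions for illustration. Single-letter counts capture only part of the structure Shannon measured, so this understates the true redundancy.

```python
import math
from collections import Counter

def letter_entropy(text):
    """Shannon entropy, in bits per symbol, of the single-letter frequencies."""
    letters = [c for c in text.upper() if c.isalpha() or c == " "]
    total = len(letters)
    if total == 0:
        return 0.0
    counts = Counter(letters)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def redundancy(entropy_bits, alphabet_size=27):
    """Fraction of the maximum possible entropy that goes unused."""
    return 1 - entropy_bits / math.log2(alphabet_size)

# Hypothetical sample sentence; any longer English text would give a better estimate.
sample = "it is worth being careful here because the temptation to overclaim is strong"
h = letter_entropy(sample)
print(f"entropy: {h:.2f} bits per letter, redundancy: {redundancy(h):.0%}")
```

A source that used all 27 symbols equally often would have zero redundancy under this measure; the further the observed frequencies fall from uniform, the more a message can in principle be compressed.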
While Shannon was reducing language to a signal, Turing was sharpening the question. In 1950, in the philosophy journal Mind, he published "Computing Machinery and Intelligence" and opened with deceptive simplicity: "I propose to consider the question, 'Can machines think?'" He then set the question aside as too meaningless to deserve discussion and replaced it with a game. The version most people remember, a human and a machine trading typed messages while a judge guesses which is which, is a later simplification. Turing's original was odder and more playful: an imitation game with a man, a woman, and an interrogator, in which the man tries to fool the judge into mistaking him for the woman, and the real question becomes what happens when a machine takes the man's place. His famous prediction is also routinely mangled. He did not say machines would "pass the Turing Test" by the year 2000. He estimated that, in about fifty years' time, a computer with about a gigabit of storage could play the game well enough that an average interrogator would have no more than a seventy percent chance of identifying it correctly after five minutes. That is to say, the machine would need to fool the judge at least thirty percent of the time, in a brief conversation. It was a modest, careful forecast, and his storage estimate turned out to be roughly right.
The deeper achievement of the paper was the second half, in which Turing patiently anticipated and dismantled nine objections to thinking machines, from the theological to the mathematical to the claim, attributed to Lady Lovelace, that a machine can only do what it is told and can originate nothing. His answer was that machines could still take us by surprise, and that the way to build a thinking machine might be not to program an adult mind but to raise a child one — a learning machine that improved through experience. He was describing, decades early, the thing we now call machine learning.
Neither man lived to see the question answered. In 1952 Turing was prosecuted for "gross indecency," accepted chemical castration to avoid prison, and lost his security clearance. On June 8, 1954, he was found dead at home, of cyanide poisoning, a half-eaten apple beside his bed; he had died the day before. The inquest ruled suicide, and the romantic story that he had staged the scene after the poisoned apple in Snow White has hardened into legend. But the apple was never tested, he left no note, he had been making plans and to-do lists, and he routinely handled cyanide in a home electroplating experiment; serious scholars have argued the verdict would not survive a modern inquest. The truth is that we do not know. He was forty-one. The following year, a proposal co-signed by Shannon for a summer workshop at Dartmouth put the phrase "artificial intelligence" into print for the first time, and the workshop itself, held in the summer of 1956, founded the field as a named discipline. Turing's wartime tea companion was helping to christen the science of machine thought two years after the man who posed its defining question was dead.
Shannon outlived him by nearly half a century, riding unicycles through the Bell Labs corridors and building flame-throwing trumpets and maze-solving mechanical mice, a man his colleagues said had earned the right to be unproductive. But the ending was its own quiet cruelty. He spent his final years with Alzheimer's disease and died in 2001, never fully grasping how foundational his work had become, just as the machines built on it were beginning to stir. The signal and the question had been set down in 1948 and 1950 by two men who knew each other, doubted each other, and died too soon to watch their ideas converge. Everything that came afterward — every model that predicts the next word and every argument about whether doing so amounts to thinking — is woven from those two threads.