Supporters of Marcus Endicott’s Patreon can access weekly or monthly consultations on this topic.

Long Road to ChatGPT

Prologue

The Demo That Ate the World - On November 30, 2022, OpenAI quietly launched ChatGPT, which became the fastest-growing consumer app then on record, but its conversational fluency was less a sudden breakthrough than the surfacing of a decades-old argument, running from Shannon and Turing onward, over whether machines could master language through statistical prediction rather than rules.

Act I — The Dream (1948–1990)

The Signal and the Question - In 1948 and 1950, two acquaintances who'd debated machine thinking over wartime tea—Claude Shannon and Alan Turing—independently laid the foundations for modern AI, with Shannon's information theory containing a hand-built statistical model of English that prefigured today's language models and Turing's "imitation game" reframing whether machines can think, before both men died too soon to see their converging ideas bear fruit.
The Translation Memo - In 1949, mathematician and Rockefeller Foundation administrator Warren Weaver wrote a brief memo proposing that machines could translate languages by treating translation as a statistical, code-breaking problem—an idea his field initially abandoned in favor of rule-based methods but that IBM researchers vindicated four decades later, making him an unheralded forefather of how modern language models work.
The Mirror - Joseph Weizenbaum's ELIZA, a simple 1960s pattern-matching program that mimicked a psychotherapist without any real understanding, so unnervingly convinced people they were being understood that its creator spent the rest of his life arguing there are tasks computers ought never be given, regardless of whether they technically can perform them.
The Grammar Wars - Terry Winograd's SHRDLU embodied Chomsky's rule-based, anti-statistical theory of language, yet Winograd himself later abandoned symbolic AI and went on to advise Larry Page on the data-driven PageRank algorithm—a reversal that prefigured how statistical methods would ultimately triumph over the rationalist paradigm Chomsky defended to the end.
The Heretic's Execution - Marvin Minsky, who built one of the first neural network machines before co-authoring the 1969 book Perceptrons, is wrongly remembered as having single-handedly killed neural-network research and broken his rival Frank Rosenblatt, when in fact the field's decline had many causes that predated the book, Rosenblatt had already moved on to neurobiology before drowning in 1971, and the "sterile" multi-layered approach the book doubted would later prove foundational to modern AI.
The Reports - Two skeptical government reports — the American ALPAC report of 1966 on machine translation and Britain's 1973 Lighthill report on artificial intelligence broadly — applied hard-nosed cost-benefit reasoning to overhyped fields and contracted their funding without truly killing them, a chastening that only later, and somewhat inaccurately, got folded into the legend of the "first AI winter."

Act II — The Heretics (1990–2017)

The Statisticians - Frederick Jelinek, a Holocaust survivor and information theorist who joined IBM almost by accident in 1972, pioneered a statistical, data-driven approach to speech recognition and machine translation that defied the rule-based linguistic orthodoxy of his era and ultimately laid the conceptual groundwork for modern language models.
The Winter Gardeners - Three researchers (Bengio, Hinton, and LeCun) earned their reputation as the "godfathers of deep learning" not by inventing backpropagation, which had been discovered repeatedly by forgotten predecessors, but by stubbornly believing in neural networks through decades of unfashionability, until breakthroughs like AlexNet vindicated them and the 2018 Turing Award recognized them, after which some of them came to fear the very technology they had nurtured.
The Paper Nobody Read - Yoshua Bengio's 2003 neural probabilistic language model introduced the idea of representing words as learnable vectors in continuous space—later called word embeddings—solving the generalization problem that doomed n-gram models, but the idea sat waiting roughly a decade for the hardware and data that would finally make it the foundation of virtually every modern language system, from word2vec to the transformers behind ChatGPT.
The Memory Problem - In 1991, Sepp Hochreiter diagnosed why neural networks couldn't retain information across long sequences (the vanishing gradient problem), and in 1997 he and his advisor Jürgen Schmidhuber published the LSTM architecture that solved it—a breakthrough ignored for nearly a decade before powering voice recognition, translation, and other systems until the Transformer's arrival in 2017, all against the backdrop of Schmidhuber's lifelong and frequently vindicated campaign to correct the historical record on AI credit.
The Ignition - In October 2012, graduate student Alex Krizhevsky's neural network (later called AlexNet) won the ImageNet competition by a stunning eleven-point margin—built on Fei-Fei Li's massive labeled-image database and a pair of off-the-shelf GPUs—a result that converted the entire field to deep learning and, through co-creator Ilya Sutskever's later path to OpenAI, supplied the proof, playbook, and people that led to ChatGPT.
The Meaning of Vectors - Tomáš Mikolov's famous Word2Vec "king − man + woman ≈ queen" result is widely misremembered—the analogy actually came from his earlier recurrent neural network language model rather than Word2Vec, the clean result depends partly on a scoring rule that excludes the obvious answer, and Mikolov himself considers his recurrent network, not the more famous embedding toolkit, to be his real contribution.
The Eight - In June 2017, eight Google researchers who insisted on equal, unranked credit published "Attention Is All You Need," introducing the Transformer architecture, which discarded recurrence in favor of pure attention and became the foundation of modern AI systems like GPT and BERT; its authors have since scattered across the industry their paper created.

Act III — The Flood (2017–present)

The Fork - In June 2018 OpenAI quietly released GPT, a decoder-based model trained to predict the next word, while four months later Google released the more celebrated BERT, an encoder-based model that read in both directions and dominated benchmarks—a fork in which the less-acclaimed path would, contingent on a scale not yet built, eventually lead to ChatGPT.
The Scaling Bet - In January 2020, physicist-turned-AI-researcher Jared Kaplan and colleagues at OpenAI discovered that language model performance improves as a smooth, predictable power law in model size, data, and compute—a finding that guided the design of the massive GPT-3 and reshaped the field, even though its recommendation to favor size over data was later corrected by DeepMind's Chinchilla, and the deeper reason scaling works so cleanly remains unexplained.
The Alignment Problem - Reinforcement learning from human feedback, pioneered by Paul Christiano (who developed learning rewards from human preference comparisons) and powered by John Schulman's Proximal Policy Optimization algorithm, matured in OpenAI's InstructGPT, which showed that alignment rather than raw scale made language models useful, directly enabling ChatGPT's record-breaking success.
The Founders' War - In December 2015, OpenAI launched as a nonprofit pledging to develop AI for humanity's benefit rather than profit, but over the following decade its billion-dollar founding promise proved largely unfunded, its founding alliance fractured amid Musk's failed bid for control and 2018 departure, its 2019 shift to "capped-profit" status and Microsoft partnership bound it to corporate interests, Altman survived a chaotic five-day firing in November 2023, and Musk's lawsuit accusing the company of betraying its mission was ultimately dismissed in May 2026 on statute-of-limitations grounds—leaving the central question of whether OpenAI had become what it set out to prevent legally unanswered.
The Schism - Anthropic was founded in early 2021 by Dario Amodei and a core group of researchers who left OpenAI not out of a simple anti-commercial revolt but to make a focused bet that AI scaling and safety must be pursued together, a wager whose proof arrived when they published their "Constitutional AI" paper just weeks after ChatGPT's late-2022 release.
The Doubters - The 2021 "Stochastic Parrots" paper, which argued that large language models merely stitch together linguistic forms without understanding meaning while raising practical concerns about carbon cost, uninspectable training data, and fluent disinformation, became inseparable from the messy, often-conflated firings of Timnit Gebru and Margaret Mitchell from Google, yet its "present-harm" critique stands distinct from the "future-risk" worries that drove figures like Dario Amodei to found Anthropic that same season.
The Reckoning - Geoffrey Hinton, awarded the 2024 Nobel Prize in Physics for the neural network research he'd long championed, accepted the honor with deep ambivalence—having left Google in 2023 to warn freely about AI's existential dangers (which he now puts at a 10–20% chance of causing human extinction within thirty years)—even as his fellow Turing laureates Bengio and especially LeCun diverged sharply on how seriously to take those fears.

Epilogue

The Question That Wouldn't Die - This epilogue argues that although an OpenAI model finally passed Turing's imitation game in 2025, the achievement settled nothing about whether machines can truly think—a tension embodied by Judea Pearl's enduring objection that neural networks remain stuck on the bottom rung of his "Ladder of Causation," performing sophisticated curve-fitting rather than genuine causal understanding.
