Chapter 4: Case Studies

When the dissertation was written, the future of conversational AI seemed to belong to the human face. The reasoning was intuitive and, at the time, almost unarguable. People crave faces. We read them instinctively, trust them in proportion to their warmth, and forgive them their flaws in ways we never forgive a blinking cursor. So it followed that the faceless interfaces of the late 2010s — the text chatbot waiting in a corner of a website, the smart speaker glowing on a kitchen counter — would always feel like placeholders, stopgaps biding their time until a photorealistic, emotionally responsive humanoid arrived to do the job properly. The compelling future, on this view, was embodied, and the natural place to embody it was the immersive, three-dimensional space of virtual reality. That was the bet. Within five years, reality had inverted nearly every term of it.

The inversion began on the last day of November 2022, when a text box appeared on the open web and asked the world what it wanted to talk about. ChatGPT had no face, no body, no voice at first, and no presence beyond a scrolling column of words. By the estimates of the time it became the fastest-adopted consumer application on record, reaching roughly a hundred million monthly users two months after launch, a ramp that veteran analysts admitted they had no precedent for in two decades of watching the internet. By 2026 it counted its weekly users in the hundreds of millions and its monthly app users in the billions. The single largest moment of consumer-AI adoption in the era arrived through plain text and, later, plain voice, through exactly the faceless channel the dissertation had ranked lowest. The thing people found compelling was not a rendered human. It was a fluent one.

The smart speaker deserves a closer look, because the dissertation was half right about it in a way that turned out to be instructive. The prediction that the first generation of voice assistants would underwhelm was vindicated; in their original form they genuinely stalled, and the company that had bet most heavily on them quietly absorbed billions in losses across its devices business while its assistant began to look antique beside a chatbot that could actually reason. But the cure, when it came, had nothing to do with faces. Nobody fixed the smart speaker by giving it a photorealistic head. They fixed it by transplanting a large language model into the same faceless cylinder, and the relaunched, generative-AI-powered assistant that rolled out to households in early 2026 was the old device with a new brain. The lesson was not that embodiment had been undervalued. It was that fluency had been the missing ingredient all along.

The geography of the bet fared no better than its physiognomy. The dissertation assumed that virtual reality would be the decisive container for conversational AI, the venue where embodied agents would finally come into their own. Instead the immersive hardware faltered just as the software flourished. The most lavishly engineered headset of the period launched at thirty-five hundred dollars, shipped modestly, and was cut back within a year, while the company that had staked its very name on the metaverse watched its immersive division accumulate losses on the order of seventy billion dollars before publicly pivoting toward lightweight AI glasses and ambient wearables. Consumer VR as a whole contracted rather than expanded. The conversational revolution, meanwhile, was happening everywhere else: on phones, on laptops, in browser tabs, in the small voice coming through a pair of glasses. The action moved decisively to flat screens and the ambient air, and the headset, far from being the frontier, became one of the weaker delivery channels of the decade.

The fate of the dissertation's own exemplars sharpened the point into something almost elegiac. The Australian government's signature virtual human, the disability-services avatar voiced by a celebrated actress and built on a pioneering New Zealand company's technology, never reached full production; entangled in privacy concerns and political fallout, it was quietly shelved and became, in the years that followed, the canonical example of a high-profile virtual human that failed to launch. The company behind its face fared worse still. Having raised well over a hundred million dollars and shed customers and staff in its final years, it collapsed into receivership in early 2026, owing millions, a pioneer wound down rather than scaled up. Its old Auckland counterpart survived only by abandoning the original premise: it repositioned itself as a vendor of enterprise digital humans whose real value lay not in the rendered face at all but in the language model orchestrated behind it. Of the two companies the dissertation had named as the future, one was dead and the other had lived by becoming a thin layer on top of the very text-based intelligence the chapter had bet against.

And yet the deeper intuition was not wrong, only early and misaddressed. Embodiment did come back. After 2023, faces and personas returned in force — talking-head avatar studios, real-time animation pipelines, and a booming class of AI companion apps that drew tens of millions of users and hundreds of millions of downloads by giving a synthetic voice a name and a character to inhabit. The instinct that people want presence, that they will lean toward an agent with a face and a personality, was genuinely vindicated. It simply arrived as a presentation layer draped over a language model rather than as the foundation of the system, and it arrived almost entirely outside virtual reality, on the same phones and screens where the text revolution had already won. Even the once-proprietary craft of stitching a conversational mind into a real-time three-dimensional character — a poorly documented black art when the dissertation was written, practiced by a precious few — became a productized, well-signposted capability, complete with off-the-shelf engines, sample scenes, and step-by-step tutorials for the very headsets that had otherwise stalled. The blank spot on the map was surveyed and paved.

What the dissertation got right, in the end, was that the face would matter. What it could not foresee was that the face would come last, and that it would matter most as a costume worn by something that had learned, faceless and bodiless, to simply talk.
