This dissertation presents a generalised model for virtual humans, locates dialogue systems within that structure, and provides a roadmap for mass market distribution of virtual humans. It explains how conversational artificial intelligence can be integrated within virtual reality. The methodology combined a systematic study of the scholarly literature and the popular press, from which I identified three major historical virtual human systems, with hands-on virtual human development in three contemporary virtual human systems. From these six virtual human systems, a generalised model of virtual humans was abstracted, the dialogue system was located in that structure, and a plausible pathway to broad consumer adoption of virtual humans was laid out. This dissertation aggregates past work on virtual humans, contributes my reflections on developing virtual humans, and proposes a vision for a future that relies on virtual humans throughout society.
Endicott, M. L. (2021). Virtual human systems: A generalised model (Master’s dissertation, SAE Creative Media Institute). ResearchGate.
Revised Version: 2026
Chapter 1 introduces Endicott's research question, how conversational AI gets "into" and "out of" virtual reality, and frames the dissertation's aim of locating dialogue systems within a generalised model of virtual humans. It recounts the author's path from chatbot work in Bangalore to the emerging virtual human field (e.g., Soul Machines, Westworld) and establishes key definitions, notably the choice of the academic term "virtual human" over the commercial "digital human."
Chapter 2 lays out Endicott's mixed-methods approach to the research question. The quantitative strand draws on a decade of automated Google Scholar and News alerts, processed through word frequency, n-gram, and cluster analysis using his proprietary faceted classification. The qualitative strand is reflective practice across three hands-on VR projects: a photogrammetry-scanned talking Buddha head in Unity, virtual human models in Linden Lab Sansar, and voice-interactive "hosts" in Amazon Sumerian. The investigation's conclusions ultimately extended beyond the original question toward how virtual humans might reach the mass market.
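The word frequency and n-gram steps mentioned for Chapter 2 can be illustrated with a minimal Python sketch. It is not Endicott's pipeline: the tokeniser, the function names, and the three alert headlines are assumptions invented for illustration, and the proprietary faceted classification and the cluster analysis are not represented at all.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_counts(docs, n):
    """Count n-grams (as tuples of n consecutive words) across documents."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        # zip over n staggered copies of the token list yields each n-gram
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Hypothetical alert headlines, invented purely for illustration.
alerts = [
    "Virtual human assistants enter retail",
    "New virtual human toolkit released",
    "Dialogue systems power virtual human hosts",
]

unigrams = ngram_counts(alerts, 1)  # word frequency is the n = 1 case
bigrams = ngram_counts(alerts, 2)

print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

Treating word frequency as the n = 1 case keeps one counting routine for both analyses; a real corpus would also need stop-word handling and de-duplication of repeated alerts.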
Chapter 3 surveys the literature underpinning virtual humans. It traces the psychology of anthropomorphism alongside the two believability benchmarks, the Turing test (and its Loebner Prize history) and Mori's Uncanny Valley. It then walks through the lineage of dialogue systems (Weizenbaum's ELIZA, Colby's PARRY, Wallace's AIML/ALICE, Mauldin's Verbots), historical virtual human systems and toolkits, and applications such as telepresence and human-in-the-loop social simulation (The Sims and Second Life). It closes on the ethical stakes of the field: privacy, deception, virtual influencers, deepfakes, and job displacement.
Chapter 4 presents three first-hand virtual human development case studies: a photogrammetry-scanned talking Buddha head in a VR museum scene built in the desktop Unity engine; a "Digital Human Gathering" social-VR world with imported humanoid models and animated dancers in Linden Lab Sansar; and voice-interactive "hosts" wired to an Amazon Lex chatbot via a state machine in cloud-based Amazon Sumerian. From these, Endicott concludes that the state machine mediates the link between dialogue system and body language, that all three editors are functionally similar, and that the key differences lie in their deployment architectures (desktop, hybrid edge-cloud, and full cloud-cloud). These architectural differences explain why fully real-time, mass-market virtual human experiences remain difficult to deliver.
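The claim that a state machine mediates between the dialogue system and body language can be made concrete with a small Python sketch. Everything in it is hypothetical: the state names, events, and gesture labels are invented, and `stub_dialogue_system` merely stands in for a chatbot call such as Amazon Lex. It does not reproduce the Sumerian scene described in the dissertation.

```python
from dataclasses import dataclass, field

# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("idle", "user_speaks"): "listening",
    ("listening", "reply_ready"): "speaking",
    ("speaking", "speech_done"): "idle",
}

# Hypothetical body-language cue played while the host is in each state.
GESTURES = {
    "idle": "breathe",
    "listening": "nod",
    "speaking": "lip_sync_and_gesture",
}


def stub_dialogue_system(utterance):
    """Stand-in for a real chatbot call; returns a canned reply."""
    return f"You said: {utterance}"


@dataclass
class Host:
    state: str = "idle"
    log: list = field(default_factory=list)

    def fire(self, event):
        """Apply an event; unknown events leave the state unchanged."""
        self.state = TRANSITIONS.get((self.state, event), self.state)
        self.log.append((self.state, GESTURES[self.state]))

    def converse(self, utterance):
        """Run one dialogue turn, driving gestures through the state machine."""
        self.fire("user_speaks")
        reply = stub_dialogue_system(utterance)
        self.fire("reply_ready")
        self.fire("speech_done")
        return reply


host = Host()
reply = host.converse("hello")
print(reply)
print(host.log)
```

In this sketch the dialogue system never touches the animation layer directly: it only produces the reply, and the state machine decides which gesture accompanies each phase of the turn.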
Chapter 5: A Generalised Model
Chapter 5 synthesises the literature and case studies into Endicott's generalised model of virtual humans. It first lays out the technical issues: behaviour realisation via the SAIBA framework and Behavior Markup Language, established BML realisers (EMBR, SmartBody, Greta, Elckerlyc), and the shift from rule-based realisation toward machine-learning-based "behaviour generation," including neural facial behaviour and neural state machines. It then proposes a model that coordinates four elements (facial behaviour, body language, cognitive architecture, and the avatar layer). Endicott argues that jumping the Uncanny Valley requires fusing spoken and body language at the cognitive-architecture level, and that the future "virtual robot" emerges by merging the convincing backstory/knowledgebase of virtual influencers with the real-time authentic behaviour of virtual streamers.
Chapter 6 takes up the second half of the research question: how conversational AI gets out of virtual reality, meaning the delivery of virtual humans to users. It compares the deployment architectures of multiplayer games, social VR, and WebXR. It argues that as virtual humans become more photorealistic, conversational, and real-time, the rising demands on processing (split across CPU, GPU, and data, at the edge or in the cloud) and bandwidth will be met by a new generation of cloud game engines (Sansar, Sumerian, Stadia, Lumberyard, etc.) combined with edge processing (neural chips) and high-bandwidth "last mile" connectivity via 5G. Together these would ultimately enable photorealistic, fully interactive virtual humans within a ubiquitous mixed reality.
Chapter 7 concludes the dissertation by restating its core findings: that in virtual humans the body-language behaviour realiser sits beside the natural-language dialogue system (with rule-based methods giving way to machine-learning approaches), that conversational AI enters virtual reality through dialogue-system-driven virtual humans used for telepresence and simulation, and that these virtual humans cannot exist without an environment (a game engine). It projects that as game engines migrate to the cloud and 5G delivers high bandwidth over the "last mile," virtual humans will converge with cloud gaming to reach mass-market consumers across a soon-to-be-ubiquitous mixed reality, with all the building blocks already on the horizon.