Supporters of Marcus Endicott’s Patreon can access weekly or monthly video consultations on this topic.
China's AI digital human industry has undergone a transformation so rapid that market projections published just three years ago now look quaint. In June 2022, IDC released its China AI Digital Human Market Status and Opportunity Analysis, forecasting that the sector would reach 10.24 billion yuan by 2026. That estimate, authored by researchers including IDC China AI Research Manager Cheng Yin (程荫), was widely cited in Chinese financial media and became a standard reference point for the industry. By the time IDC published its next major assessment in June 2025, however, actual 2024 market revenue had already reached 41.2 billion yuan, roughly four times the original 2026 target, with year-over-year growth of 85.3 percent. IDC now projects the market will reach 250.5 billion yuan by 2029, with a compound annual growth rate of 43.5 percent over the 2024–2029 period. The broader virtual digital human industry, as measured by iiMedia Research using a wider definitional scope that encompasses virtual influencers, virtual idols, and downstream applications, reached a core market size of 339.2 billion yuan in 2024 and is projected to exceed 480 billion yuan in 2025. The China Academy of Information and Communications Technology (CAICT) reported the industry scale at over 300 billion yuan in 2024 with growth above 35 percent, while the China Internet Association's 2024 Digital Human Development Report projected the core market would surpass 400 billion yuan by 2025. The discrepancies between these figures are substantial but methodologically explicable: IDC measures AI digital human platform revenue in a narrow PaaS/SaaS sense, while iiMedia and the China Internet Association capture the full ecosystem including content production, virtual commerce, and downstream services.
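The relationships among IDC's figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and it assumes the 43.5 percent rate compounds annually from the 2024 base over five years.

```python
# Figures quoted in the text, in billions of yuan. Variable names are illustrative.
forecast_2026 = 10.24    # IDC's June 2022 forecast for 2026
actual_2024 = 41.2       # IDC's reported 2024 revenue
projected_2029 = 250.5   # IDC's June 2025 projection for 2029
stated_cagr = 0.435      # stated CAGR for 2024-2029
years = 2029 - 2024      # assumption: five annual compounding periods

# How far the 2024 actual overshot the old 2026 forecast.
overshoot = actual_2024 / forecast_2026

# Growth rate implied by the 2024 and 2029 endpoints.
implied_cagr = (projected_2029 / actual_2024) ** (1 / years) - 1

# 2029 value obtained by compounding the stated rate from the 2024 base.
compounded_2029 = actual_2024 * (1 + stated_cagr) ** years

print(f"2024 actual vs. 2026 forecast: {overshoot:.2f}x")
print(f"implied CAGR 2024-2029: {implied_cagr:.1%}")
print(f"2029 value at stated CAGR: {compounded_2029:.1f} billion yuan")
```

Under those assumptions the endpoints imply a growth rate that rounds to the stated 43.5 percent, and the 2024 figure comes out at roughly four times the earlier 2026 forecast, so the quoted IDC numbers are internally consistent.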
The single most important driver of this acceleration has been generative AI. The integration of large language models into digital human systems beginning in 2023 collapsed production costs, shortened creation timelines from weeks to minutes, and enabled real-time interactive capabilities that were previously impossible. JD.com's Yanxi platform compressed single digital human production costs from tens of thousands of yuan to double digits, a reduction exceeding 90 percent. "Zero-shot" technology introduced in 2024 enabled instant digital human cloning from as little as 30 seconds of image data and 10 seconds of voice data. LLM inference costs dropped approximately 60-fold in a single year across major Chinese cloud providers. One industry estimate projects that generative AI will contribute over 60 percent of value in China's digital human market by 2027. The IDC MarketScape: China AI Digital Human Products 2024 Vendor Assessment, published in August 2024 under document number CHC52437724, evaluated 16 vendors and identified AIGC digital humans, AI video generation, and multimodal large models as the top investment priorities over the next one to three years. That assessment named Baidu, Huawei, SenseTime, iFlytek, Tencent, Xiaoice, Silicon Intelligence (硅基智能), MoFa (Shanghai) Information Technology Co., Ltd. (魔珐(上海)信息科技有限公司), JD.com, and others among the leading market participants. IDC's subsequent June 2025 market share report placed Baidu first at 9.8 percent, followed by Huawei Cloud at 9.7 percent, Xiaoice at 5.1 percent, and SenseTime at 4.3 percent.
Text-to-speech technology has been a particularly consequential area of advancement for the digital human sector. OpenAI's GPT-4o, announced on May 13, 2024, introduced a single end-to-end multimodal neural network capable of processing text, vision, and audio simultaneously, replacing the previous pipeline architecture of Whisper transcription followed by GPT-4 processing followed by separate speech synthesis. This reduced voice response latency from 5.4 seconds to as low as 232 milliseconds and enabled the model to detect tone, emotion, and background noise from audio input while responding with laughter, singing, and emotional expression. Advanced Voice Mode reached ChatGPT Plus subscribers in September 2024, the Realtime API launched on October 1, 2024, and a dedicated GPT-4o-mini-TTS model with prompt-based tone control debuted in March 2025. Chinese companies responded within weeks of the original announcement. ByteDance published its Seed-TTS paper on June 4, 2024, claiming speech virtually indistinguishable from human speech, then upgraded to Doubao-Seed-TTS 2.0 in October 2025 with enhanced emotional expressiveness. iFlytek launched its "Her" system, described as China's first ultra-fast super-human-like interaction system using end-to-end voice-to-voice modeling. MiniMax's Speech-02 achieved the top global ranking on the Artificial Analysis TTS benchmark, supporting 32 languages with 99 percent speaker similarity in zero-shot voice cloning. Alibaba's CosyVoice 2.0, released in December 2024, achieved 150-millisecond streaming latency with MOS scores of 5.53, comparable to professional recordings, and was fully open-sourced under an Apache 2.0 license. Fish Audio's S2 Pro, a 4-billion-parameter model released in March 2026, achieved the highest score on EmergentTTS-Eval, surpassing ElevenLabs, OpenAI, and Google. 
The open-source ecosystem has been particularly robust, with ChatTTS, F5-TTS, GLM-TTS from Zhipu AI, IndexTTS, and GPT-SoVITS all emerging in 2024 and 2025 and making high-quality Chinese TTS widely accessible.
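The architectural shift described above, from a cascaded transcription, language model, and synthesis pipeline to a single end-to-end model, can be illustrated with a toy sketch. Every function and class here is a hypothetical stand-in written for illustration, not any vendor's API; the only figures taken from the text are the 5.4-second and 232-millisecond latencies.

```python
from dataclasses import dataclass


@dataclass
class Audio:
    words: str  # what was said
    tone: str   # paralinguistic cue, e.g. "anxious"


def transcribe(audio: Audio) -> str:
    """Stage 1 stand-in (speech-to-text): keeps the words, drops the tone."""
    return audio.words


def generate_reply(text: str) -> str:
    """Stage 2 stand-in (text-only language model): sees only the transcript."""
    return f"reply to '{text}'"


def synthesize(text: str) -> Audio:
    """Stage 3 stand-in (speech synthesis): speaks in a default neutral tone."""
    return Audio(words=text, tone="neutral")


def cascaded(audio: Audio) -> Audio:
    # Three models run in sequence, so their latencies add up and the
    # speaker's tone is lost before the reply is generated.
    return synthesize(generate_reply(transcribe(audio)))


def end_to_end(audio: Audio) -> Audio:
    # Single-model stand-in: audio goes in and audio comes out, so the
    # tone is still available when the reply is produced.
    return Audio(words=f"reply to '{audio.words}'",
                 tone=f"responsive to {audio.tone}")


# Latency figures quoted in the text: pipeline versus best-case end-to-end.
pipeline_latency_s = 5.4
end_to_end_latency_s = 0.232
speedup = pipeline_latency_s / end_to_end_latency_s

question = Audio(words="is my transfer complete", tone="anxious")
print(cascaded(question).tone)    # tone information did not survive stage 1
print(end_to_end(question).tone)  # tone information reached the reply
print(f"best-case latency ratio: {speedup:.0f}x")
```

The point of the sketch is structural: in the cascaded design anything the transcript cannot carry, such as emotion or background noise, never reaches the model that composes the reply, which is why a digital human built on it tends to sound flat regardless of how good the final synthesis stage is.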
The financial sector remains one of the most mature areas of digital human deployment in China. Shanghai Pudong Development Bank (浦发银行) and Baidu Intelligent Cloud (百度智能云) established the SPDB-Baidu Deep Learning Lab in February 2018 and held a digital human concept launch event on April 23, 2019. Their 3D digital human, named Xiao Pu (小浦), debuted as an "intern" at the 2019 Baidu AI Developer Conference on July 3, 2019, interacting live with Baidu CEO Robin Li. On December 13, 2019, Xiao Pu officially graduated and became the banking industry's first digital employee, with SPDB President Pan Weidong personally awarding the employee badge. During the COVID-19 pandemic in 2020, Xiao Pu provided zero-contact financial services. SPDB's own official communications describe more than 10 types of digital employee positions across 12 digital channels, generating over 3,000 person-years of digital labor by 2022. The named roles include wealth planner, AI trainer, lobby manager, smart teller, digital anchor, digital loan officer, intelligent outbound calling agent, and digital auditor, with additional functions in intelligent customer service, AI marketing, and digital quality inspection.
Tourism and cultural heritage have emerged as a major growth vertical. The Dunhuang Mogao Caves deployed Jiayao (伽瑶), developed with Tencent, as its first official digital cultural ambassador on the Cloud Tour Dunhuang WeChat mini-program. The National Museum of China launched Ai Wenwen (艾雯雯) on July 22, 2022, its 110th anniversary, developed with Tencent SSV Digital Culture Lab; after three upgrades, the system covers over 1.4 million museum artifacts and provides guided tours for the Chinese Civilization Cloud Exhibition. The Capital Museum in Beijing introduced Jing Hui (京慧) as an AI digital human guide for ancient Beijing history in September 2024. The Chengdu Du Fu Thatched Cottage Museum deployed AI Du Fu (AI杜甫), a digital recreation of the Tang Dynasty poet enabling visitor conversations about poetry, in September 2024. The Changchun Puppet Emperor's Palace Museum deployed Guide No. 5 (五号讲解员) with Beijing Bowen Technology for personalized AI-guided tours in September 2024. The Datong Museum deployed an AI digital human guide for the Northern Wei Empress Dowager Feng exhibition in January 2025, developed by Acoustiguide Digital Culture (爱可声). Acoustiguide alone had deployed AI guide systems across more than 50 museums in Beijing, Shanghai, Jiangsu, Anhui, Fujian, and Shanxi by January 2025, with nearly 100 additional partnerships confirmed. The National Museum of Natural History's AI digital human and AR guide was selected among the Top 10 Cases of Cultural and Tourism Digital Innovation in 2024 by China's Ministry of Culture and Tourism.
E-commerce livestreaming has become the most commercially visible application of digital humans in China, though it is also the most contested. The most prominent example is JD.com's Caixiao Dongge (采销东哥), a digital human modeled on founder Liu Qiangdong, which launched on April 16, 2024, attracted over 20 million viewers in under an hour, and generated over 50 million yuan in transactions with user dwell time reaching 5.6 times the daily average. During JD.com's 618 Shopping Festival in 2024, digital humans appeared in over 5,000 brand livestream rooms with cumulative duration exceeding 400,000 hours and more than 100 million viewers. By 2025, JD.com's platform, rebranded as JoyAI, reported cumulative viewing of 17 million and GMV exceeding 700 million yuan, with digital human livestream costs at one-tenth of real-person costs. Baidu partnered with tech entrepreneur Luo Yonghao for a digital human livestream in June 2025 in which Luo's digital twin outperformed his real-person debut, generating 55 million yuan in GMV from over 13 million viewers in six hours with an AI-generated script of 97,000 words and 8,300 AI-driven actions. Platform regulation has varied sharply: Douyin banned fully AI-driven digital human livestreaming in May 2023, requiring real-time human oversight; WeChat Video Accounts effectively classified virtual livestreaming as low-quality content in June 2024; and Kuaishou withdrew traffic support for digital human streams in June 2024, despite its Nüwa platform having previously supported over 2,200 concurrent digital human streams. Taobao has maintained a more permissive approach. Zhejiang Province issued the first provincial-level guidelines for AI digital human livestreaming in September 2024, while Shanghai's plan targets 600 billion yuan in livestream e-commerce by 2026. The industry has experienced severe consolidation: 60 to 70 percent of digital human service providers active in 2023 had disappeared by 2024.
The characterization of the market as "returning to rationality" after a period of excessive hype captures a real dynamic but oversimplifies what is better described as a bifurcation. A 36Kr article from October 2025 documented shrinking company counts in the sector, and Chai Jinxiang, CEO of MoFa (Shanghai) Information Technology Co., Ltd., told Jiemian News that many companies were eliminated because their digital human capabilities could not keep pace with LLM advances. Industry insiders rated digital human interaction quality at only 20 to 30 points out of 100 despite visual appearance scores reaching 70 to 80 points. People's Daily noted in September 2025 that the industry was stepping down from its idol pedestal and deepening its focus on industrial needs. Production economics were part of the problem: the famous virtual influencer Liu Yexi (柳夜熙), who went viral in 2021, required millions of yuan in production investment and hundreds of thousands of yuan per short video. Yet the cooling narrative is only half the story. Enterprise registrations hit a record high of 413,000 new companies in 2024, up 36.9 percent year-over-year. Investment remained active, with 33 deals totaling 110.56 billion yuan in 2024. Patent applications reached 641 in 2024 with 340 granted, an increase of 129.73 percent over 2023. Silicon Intelligence (硅基智能) held 32.2 percent market share in the digital human intelligent agent track and was pursuing an IPO at a 3.15 billion yuan valuation. By end-2024, 21 provincial and 40 municipal governments had issued approximately 300 targeted policies supporting the sector. The most accurate description of the market is a structural transformation in which entertainment-driven virtual idols face severe consolidation while LLM-powered enterprise digital humans in banking, e-commerce, government services, and education are surging, and purchasing decisions have shifted from hype-driven to ROI-driven.
China's regulatory framework for digital humans has developed incrementally over four years and reached a milestone in April 2026. The foundational layer is the Provisions on the Administration of Deep Synthesis Internet Information Services, which took effect on January 10, 2023, and explicitly covers 3D reconstruction and digital simulation for generating digital characters. The Interim Measures for the Management of Generative AI Services followed on August 15, 2023, applying to all generative AI services in China. The Measures for the Labeling of AI-Generated Content took effect on September 1, 2025, mandating explicit and implicit labeling of all AI-generated content. On December 27, 2025, the Cyberspace Administration of China (CAC) published the Interim Measures for the Management of Human-like Interactive AI Services for public comment, regulating AI companions and emotional chatbots with provisions including a two-hour continuous use limit and anti-addiction features. Then on April 3, 2026, the CAC published the draft Administrative Measures for Digital Virtual Human Information Services (《数字虚拟人信息服务管理办法(征求意见稿)》) for public comment, with the comment period running through May 6, 2026. This 27-article draft, organized in five chapters, represents the world's first attempt to create a dedicated legal framework specifically for digital virtual humans. Key provisions include a mandatory continuous "数字人" label throughout any digital human service display, a requirement for separate informed consent when using individuals' sensitive personal information for modeling or image generation, prohibitions on creating identifiable digital humans of specific persons without consent, bans on providing virtual family members or virtual intimate partners to minors, and graduated penalties ranging from warnings to fines of 10,000 to 200,000 yuan.
At the local level, Beijing was first to issue a dedicated digital human industry policy in August 2022 with the Action Plan for Promoting Digital Human Industry Innovation and Development (2022–2025), targeting 50 billion yuan in industry scale by 2025. The first national standard for virtual digital humans, GB/T 46483-2025, led by SenseTime, was published in November 2025, specifying requirements including minimum 200,000 polygons for 3D ultra-realistic models and 90 percent or greater lip-sync accuracy. The National Radio and Television Administration published its own industry standard, GY/T 411-2024, for digital humans in broadcasting in November 2024.
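The two numeric thresholds quoted from GB/T 46483-2025 lend themselves to a simple automated check. The following is a minimal sketch, assuming a vendor records polygon count and measured lip-sync accuracy for each avatar; the `AvatarSpec` class, its field names, and the function are hypothetical and are not taken from the standard, which specifies further requirements not covered here.

```python
from dataclasses import dataclass

# Thresholds as quoted in the text; everything else in this sketch is hypothetical.
MIN_POLYGONS = 200_000         # minimum polygons for 3D ultra-realistic models
MIN_LIP_SYNC_ACCURACY = 0.90   # lip-sync accuracy of 90 percent or greater


@dataclass
class AvatarSpec:
    name: str
    polygon_count: int
    lip_sync_accuracy: float  # fraction between 0 and 1


def threshold_failures(spec: AvatarSpec) -> list[str]:
    """Return a description of each quoted threshold the avatar misses."""
    failures = []
    if spec.polygon_count < MIN_POLYGONS:
        failures.append(
            f"{spec.name}: {spec.polygon_count} polygons is below {MIN_POLYGONS}"
        )
    if spec.lip_sync_accuracy < MIN_LIP_SYNC_ACCURACY:
        failures.append(
            f"{spec.name}: lip-sync accuracy {spec.lip_sync_accuracy:.0%} "
            f"is below {MIN_LIP_SYNC_ACCURACY:.0%}"
        )
    return failures


# Example with made-up avatars: one clears both thresholds, one misses both.
for avatar in (AvatarSpec("guide-a", 250_000, 0.93),
               AvatarSpec("guide-b", 150_000, 0.85)):
    problems = threshold_failures(avatar)
    print(avatar.name, "passes" if not problems else problems)
```

Values exactly at the thresholds pass, reflecting the "minimum" and "or greater" wording quoted above. How the standard defines and measures lip-sync accuracy is not described in the text, so the fraction used here is only a placeholder for whatever test procedure applies.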
The trajectory of China's digital human industry as of April 2026 is one of rapid maturation rather than simple growth or decline. The technology has moved from expensive custom production to AI-automated generation, from scripted avatar performance to real-time LLM-driven interaction, and from novelty marketing to systematic enterprise deployment across finance, commerce, tourism, government services, and education. The regulatory environment has evolved in parallel, with the April 2026 draft regulation signaling that digital humans have moved from a niche technological curiosity to a domain requiring dedicated governance. Market concentration is intensifying around a handful of platform leaders including Baidu, Huawei Cloud, Xiaoice, SenseTime, and Silicon Intelligence, while the long tail of small providers continues to contract. The fundamental challenge remains interaction quality: visual realism has advanced far ahead of conversational naturalness, emotional intelligence, and domain expertise. Those gaps are closing rapidly as large language models improve and Chinese TTS technology achieves parity with or exceeds global competitors, but the industry's near-term trajectory will be determined less by technological capability than by regulatory clarity, platform governance decisions, and the willingness of enterprises to integrate digital humans into core business processes rather than treating them as promotional novelties.
[Apr 2026]