Supporters of Marcus Endicott’s Patreon can access weekly or monthly video consultations on this topic.

The Infrastructure Behind China's Digital Human Industry

China has assembled one of the world's most extensive national ecosystems for the creation, deployment, and governance of digital humans. This ecosystem rests on three reinforcing pillars: a layered policy architecture stretching from national five-year plans to municipal action plans, a physical infrastructure of nearly five million 5G base stations and hyperscale GPU-powered data centers, and a rapidly maturing domestic chip and software stack shaped by both US export restrictions and homegrown algorithmic innovation. iiMedia Research estimated China's core virtual digital human market at roughly 20.5 billion yuan in 2023 and projected it to reach 48 billion yuan by 2025, while the broader associated industry spanning live commerce, virtual influencers, and AI-powered customer service was forecast to approach 640 billion yuan in the same period. What distinguishes China's approach from those of other nations is not any single technology but the deliberate convergence of state industrial policy, municipal experimentation, telecom infrastructure, and platform-economy incentives into a unified deployment environment.
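As a rough check on the growth these estimates imply, the short sketch below computes the compound annual growth rate between the 2023 estimate and the 2025 projection quoted above. The function and variable names are illustrative, and the calculation assumes the two iiMedia figures are directly comparable annual values, which the source does not state.

```python
def compound_annual_growth_rate(start_value: float, end_value: float, years: int) -> float:
    """Return the constant yearly growth rate that turns start_value into end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text, in billion yuan. Treating them as comparable
# annual values is an assumption made for this sketch.
core_market_2023 = 20.5
core_market_2025 = 48.0

implied_cagr = compound_annual_growth_rate(core_market_2023, core_market_2025, years=2)
print(f"Implied annual growth, 2023 to 2025: {implied_cagr:.1%}")
```

Under those assumptions, the projection implies growth of a little over 50 percent per year.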

The national-level policy architecture provides strategic context rather than explicit digital human directives. The Fourteenth Five-Year Plan, adopted by the National People's Congress in March 2021, mentions the word "digital" eighty times and devotes an entire section to accelerating digital development, identifying artificial intelligence as one of seven key digital economy industries and separately calling for technological innovation in real-time three-dimensional graphics, dynamic environment modeling, real-time motion capture, and fast rendering under its virtual and augmented reality priorities. The plan does not, however, use the term "digital human." The companion document issued by the State Council in January 2022, the Fourteenth Five-Year Plan for Digital Economy Development, sets a target for core digital economy industries to account for ten percent of GDP by 2025 and emphasizes construction of integrated big-data center systems and AI research, but it likewise stops short of naming digital humans as a discrete priority. Made in China 2025, signed by Premier Li Keqiang in May 2015, similarly addresses AI only obliquely, situating it within next-generation information technology alongside integrated circuits and communications equipment, with its specific targets centered on intelligent manufacturing rather than AI as a standalone industry. China's dedicated national AI policy arrived separately in the July 2017 New Generation Artificial Intelligence Development Plan, which set explicit scale targets of 150 billion yuan by 2020, 400 billion yuan by 2025, and one trillion yuan by 2030 for the AI industry's core output.

The operational specificity lies one level down, at the municipal and provincial tier. Beijing produced China's first policy document explicitly targeting digital humans: the Action Plan for Promoting the Innovation and Development of the Digital Human Industry (2022 to 2025), issued by the Municipal Bureau of Economy and Information Technology in August 2022. That plan set a target for Beijing's digital human industry to exceed 50 billion yuan by 2025, called for cultivating one or two leading enterprises with revenue above five billion yuan each, and mandated the construction of common technology platforms for cloud rendering, interactive driving, intelligent computing, data openness, and digital asset circulation. By 2023, Beijing's digital human industry had reportedly generated cumulative revenue of roughly 51 billion yuan, suggesting the 2025 target had already been approximately met. Other jurisdictions addressed digital humans within broader plans: Shanghai had issued a metaverse-focused action plan encompassing virtual digital humans a month earlier, in July 2022; Guangdong released a next-generation AI innovation plan in December 2022 with explicit digital human research and development provisions; and Zhejiang issued a metaverse industry development action plan for 2023 to 2025 promoting consumer-facing virtual humans and direct-to-avatar business models. The pattern is consistent: national plans establish AI and digital economy ambitions in broad strokes, while provincial and municipal governments translate these into actionable digital human industry targets.

The concept of the Digital Silk Road, announced by President Xi Jinping in 2015 as a component of the Belt and Road Initiative and elaborated at the Second BRI Forum in 2019, covers 5G network construction, subsea cables, smart city deployments, cloud computing, and AI infrastructure abroad. Seventeen countries had signed Digital Silk Road memoranda of understanding with China by 2021, and associated investments were estimated at 79 billion dollars as of 2018. No official Digital Silk Road document, however, specifically identifies digital human technology export as a DSR objective, and any connection between the two remains inferential rather than documented.

China's 5G network provides the low-latency transport layer on which many real-time digital human applications depend. As of December 31, 2025, China had 4.838 million 5G base stations, a figure that yields 34.4 base stations per ten thousand people and exceeds the national planning target by 8.4 units. This network supports 1.204 billion 5G subscriptions out of 1.827 billion total mobile connections. Coverage extends to all towns and more than 95 percent of administrative villages, 5G-Advanced service has reached over 330 cities, and China holds 42 percent of global 5G standard-essential patents.
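The figures in this paragraph can be cross-checked against one another. The sketch below derives the population base, the planning target, and the 5G share of mobile connections that the quoted numbers imply. The variable names are illustrative, and the derived values are back-of-the-envelope results rather than official statistics.

```python
# Figures quoted in the text for December 31, 2025.
base_stations = 4.838e6        # 5G base stations
density_per_10k = 34.4         # base stations per ten thousand people
excess_over_target = 8.4       # margin by which density exceeds the planning target
subscriptions_5g = 1.204e9     # 5G subscriptions
mobile_connections = 1.827e9   # total mobile connections

# Population base implied by dividing the station count by the density figure.
implied_population = base_stations / density_per_10k * 10_000

# Planning target implied by subtracting the stated margin from the density.
implied_target_per_10k = density_per_10k - excess_over_target

# Share of all mobile connections that are 5G subscriptions.
share_5g = subscriptions_5g / mobile_connections

print(f"Implied population base: {implied_population / 1e9:.3f} billion")
print(f"Implied planning target: {implied_target_per_10k:.1f} per ten thousand people")
print(f"5G share of mobile connections: {share_5g:.1%}")
```

The implied population of about 1.4 billion and the implied target of 26 base stations per ten thousand people suggest the quoted figures are internally consistent, with roughly two thirds of mobile connections on 5G.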

The three dominant cloud providers each offer digital human production and deployment services, though their architectures and commercial models differ. Alibaba Cloud's digital human ecosystem centers on its Lingjing platform for real-time conversational digital humans, integrated with its Intelligent Media Services workflow. In August 2025, Alibaba open-sourced Wan2.2-S2V, a speech-to-video model that converts portrait photographs into animated avatars capable of speaking and singing, building on a broader Wan series that has accumulated over 6.9 million downloads. The underlying large language model is Tongyi Qianwen (Qwen), now in its 3.5 iteration, while Tongyi Wanxiang provides text-to-image generation for avatar creation. Alibaba's ModelScope community, with 2.8 million active developers and over 100 million model downloads, serves as the model-as-a-service substrate. The company's digital human solutions have been deployed prominently in Taobao Live's e-commerce livestreaming, which generated over 400 billion yuan in gross merchandise value in 2020 alone, and in the virtual influencer Dong Dong created for Beijing 2022 Olympic merchandise promotion. Alibaba also partnered with BYOND Asia in October 2025 to deploy Arabic-first digital human avatars for the Middle East market using the Qwen model stack.

Tencent Cloud markets its AI Digital Human product under the product code IVH, offering three tiers: image procurement for the avatar itself, broadcasting service for text-to-video generation with synchronized lip movements, and interactive service for real-time voice dialogue. Available in five styles ranging from 3D photorealistic to 2D cartoon, the platform produces a custom digital human from three minutes of video and one hundred spoken sentences within twenty-four hours, starting at roughly one thousand yuan. It is powered by Tencent's Hunyuan large model family and the TI machine learning platform. Clients include FAW-Volkswagen, CITIC Jiantou Securities, Southern Metropolis Daily, and the National Museum. Tencent has expanded internationally through partnerships with BeLive Technology in Southeast Asia for multilingual digital humans in livestream commerce and customer service. Tencent's total AI spending reached 18 billion yuan in 2025, with plans to double that figure in 2026, though GPU supply constraints kept capital expenditure below internal targets.

Huawei Cloud's entry is MetaStudio, a virtual human production platform leveraging Huawei's Pangu Virtual Human Model. MetaStudio generates a basic virtual human in three hours and a photorealistic 3D avatar in roughly one week, claims lip synchronization accuracy above 95 percent across more than twenty languages, and supports round-the-clock livestreaming with real-time comment replies and object interaction. The platform runs on Huawei's Ascend NPU infrastructure alongside CPUs and GPUs, and is available in the Chinese mainland, Singapore, and Ireland regions. Use cases span agricultural livestreaming, education, healthcare, news broadcasting, and accessibility services for hearing-impaired users.

ByteDance's Volcano Engine constitutes a fourth major infrastructure player. Its virtual digital human platform, which was among the first batch to receive MIIT certification in June 2022, offers three solution types: broadcasting digital humans for scripted video production with lip synchronization accuracy of 98.5 percent, interactive digital humans for real-time customer service with end-to-end latency of 500 milliseconds, and livestreaming digital humans for automated e-commerce broadcasting on Douyin, Taobao, and JD.com. In July 2025, Volcano Engine began closed beta testing of a next-generation digital human platform called Chimera, incorporating enhanced capabilities from the Jimeng AI image and video generation system. The underlying large language model is Doubao, which by December 2024 had surpassed 100 million daily active users and processed over 50 trillion tokens daily. According to IDC's first-quarter 2025 report, Volcano Engine held 46.4 percent of China's public cloud large model invocation volume, surpassing Alibaba Cloud and Baidu Cloud's combined 38.6 percent. Volcano Engine's revenue exceeded 12 billion yuan in 2024, with targets of 25 billion yuan in 2025 and 100 billion yuan by 2030. ByteDance's broader capital expenditure on AI infrastructure exceeded 150 billion yuan in 2025, with plans for 160 billion yuan in 2026, much of it flowing into facilities like the Volcano Cloud Taihang Computing Centre in Datong, Shanxi Province, whose Phase II alone involves an investment of 4.5 billion yuan for 15,604 server cabinets across 205,000 square meters.

The semiconductor layer underpinning digital human workloads is shaped decisively by US export controls. Since the Bureau of Industry and Security's October 2022 restrictions on advanced AI chips and semiconductor manufacturing equipment, the controls have tightened over successive rounds. The October 2023 update closed loopholes that had allowed modified chips like Nvidia's A800 and H800 to reach Chinese buyers, and in April 2025 the Trump administration required export licenses for the Nvidia H20. In December 2025, the administration approved sales of the more powerful Nvidia H200 to selected Chinese customers, subject to a 25 percent revenue share and extensive certification requirements. The cumulative effect has been significant: China's share of worldwide AI computing power dropped from 37.3 percent in March 2022 to 14.1 percent by March 2025, while the United States maintained roughly 75 percent. The export control regime's most consequential effect on digital human deployment lies in constraining inference compute capacity at scale, since digital human workloads are predominantly inference-intensive, requiring real-time speech synthesis, lip synchronization, gesture generation, and natural language understanding for every user interaction. Chinese firms have responded on three fronts: algorithmic efficiency, exemplified by DeepSeek's cost-effective training methods; domestic chip development; and system-level clustering, such as Huawei's CloudMatrix 384 super node with 384 Ascend 910C chips.

ByteDance's Douyin launched the V Project (V计划) in November 2024, enabling content creators to generate AI-powered virtual avatars that replicate their personality and communication style for round-the-clock fan interaction. The system, originally tested as AI Space beginning in early 2024, offers five sub-features: AI Interactive Space, AI Group Chat, AI Private Messaging, AI Comments, and AI Live Streaming, all powered by the Doubao large language model. ByteDance's research capabilities in this domain are substantial: OmniHuman-1, released in early 2025, generates realistic human videos from a single photograph and audio input using a diffusion-transformer model trained on 18,700 hours of human video, while OmniHuman 1.5 extended this to multi-character scenes and videos exceeding one minute. The Seedance video generation model, launched in October 2025 via Volcano Engine, produces five-second 1080p clips for approximately one yuan.

Silicon Intelligence (硅基智能), headquartered in Nanjing and founded in 2017, is China's leading dedicated digital human company by market share, holding 32.2 percent of the domestic market by revenue in 2024. The company has served approximately 40,000 enterprises across 40 industries through its DUIX intelligent interaction platform, which provides digital human capabilities via web SDKs supporting text-to-speech, audio input, and real-time interaction. Its product lines span virtual live streaming, AI video production, and algorithmic technology, and its clients include major banks such as ICBC, Bank of China, and China Merchants Bank. In November 2024, Silicon Intelligence released DUIX ONE, a multimodal large model integrating speech recognition, synthesis, natural language processing, and computer vision, open-sourced as what it described as the first AIGC digital human model. The company filed for a Hong Kong IPO in 2025, reporting revenues of 655 million yuan in 2024 on cumulative losses exceeding 300 million yuan over three years. Its 135 international patents, including 19 US patents, and its deployment of digital humans at the 2025 Osaka World Expo using DeepSeek-powered models reflect ambitions well beyond the Chinese domestic market. Silicon Intelligence should not be confused with Silicon Flow (硅基流动), a Beijing-based AI inference cloud platform founded in 2023 that provides API services for large language models but has no digital human products.

Tencent has built real-time digital twin platforms in Shenzhen, most notably for Bao'an District's smart transportation system in December 2022, fusing IoT data, traffic big data, and high-precision maps. Shenzhen won the City Award at the 2024 Global Smart City Conference partly due to this technology. However, Tencent's own documentation acknowledges that current digital twin city construction mostly serves urban governance operators, with citizens benefiting indirectly. Tencent's digital twin and digital human product lines remain organizationally and technically separate.

The deployment that first brought China's digital human ambitions to international attention was Xinhua News Agency's AI news anchor, unveiled on November 7, 2018, at the Fifth World Internet Conference in Wuzhen, Zhejiang Province, in collaboration with Sogou. Two initial versions debuted: an English-speaking anchor modeled on journalist Zhang Zhao and a Chinese-speaking anchor modeled on Qiu Hao. The technology used Sogou's Avatar system with cloud-based facial landmark localization, face reconstruction, multimodal recognition and synthesis, and transfer learning. By February 2019, the AI anchors had delivered approximately 3,400 news reports totaling over 10,000 minutes of broadcast time. The system was iteratively upgraded: a female AI anchor, Xin Xiaomeng, debuted in February 2019, a Russian-speaking version appeared at the St. Petersburg International Economic Forum in June 2019, and Sogou launched the world's first 3D AI news anchor, Xin Xiaowei, in May 2020, followed by an AI sign language anchor in May 2021. Initial distribution was via WeChat, Weibo, and Xinhua's digital platforms rather than 5G, since China's commercial 5G launch did not occur until October 31, 2019, nearly a year after the Wuzhen debut. After Tencent completed its acquisition of Sogou in September 2021, it became unclear whether the specific Xinhua-Sogou AI anchor partnership continued, and the original technology has been superseded by more capable generative AI systems.

China Mobile's Migu platform represents the most prominent deployment of digital humans in live entertainment and sports broadcasting. For the Beijing 2022 Winter Olympics, Migu created Meet GU, an ultra-realistic digital human modeled on freestyle skier Eileen Gu with sub-millimeter 3D face fitting precision, nearly 700 skeletal control points, and voice cloned from the athlete's actual speech using personalized text-to-speech technology. This marked the first time in Olympic broadcasting history that a digital human appeared as a studio guest. Migu deployed nine pioneering technologies during the Games, including 8K UHD live broadcasting, HDR Vivid, AVS3 encoding, AR studios, AI tactical analysis, and 5G Cloud XR. At the 2022 Qatar World Cup, Migu fielded multiple digital avatars including a sign-language host, an industry first for FIFA coverage, and created a Metaverse World Cup interactive space using VIZRT virtual studio technology. Migu leverages China Mobile's 5G network and has led development of over twenty international and domestic technical standards in ultra-high definition and VR/AR broadcasting. Its distribution infrastructure relies on AVS3 encoding and its self-developed low-latency FLV distribution platform.

In Hangzhou, the municipal government's City Brain platform, a government-led initiative operated by a joint venture with Alibaba, has integrated digital human assistants for citizen services. The tourism-focused digital human is Hang Xiaoyi (杭小忆), launched on May 16, 2022, by the Hangzhou Bureau of Culture, Radio, TV and Tourism in collaboration with China Mobile Hangzhou, providing trip planning, hotel booking, attraction recommendations, and real-time crowd monitoring. Additional City Brain digital assistants include Jingxiao'ai for police affairs, Yibao'er for medical insurance, and Hanghaomeng, described as China's first AI mental health assistant, which has served 1.8 million users. City Brain 3.0 launched on March 31, 2025, and integrates DeepSeek-R1 series models, including the 671-billion-parameter version and distilled Qwen variants, in a government information-security environment, making Hangzhou one of the first Chinese cities to incorporate large language models with self-evolving capabilities into municipal governance infrastructure.

China's Smart Court initiative, driven by the Supreme People's Court since approximately 2014, has been extensive, but the primary AI technology provider is iFlytek rather than Huawei. iFlytek's automatic speech recognition system has been implemented in courts across all 31 provincial-level administrative regions, covering over one thousand courts and nearly five thousand courtrooms as of 2019. iFlytek developed the pioneering 206 System for AI-assisted criminal case adjudication with the Shanghai High Court, deploying over 200 technical personnel. The system achieves above 90 percent accuracy for legal speech recognition and supports models for the minority languages Uyghur and Tibetan as well as for Sichuan dialect. These are AI-assisted transcription, document review, and case classification tools running on iFlytek's cloud platform, not digital human clerks in the visual, interactive sense. iFlytek's role in education is similarly substantial: its AI-led innovative education services were utilized in 3,600 schools across 28 provincial-level administrative regions, including 55 national AI experimental schools, as of December 2022, while its broader smart education platform now reaches over 50,000 primary and secondary schools serving more than 130 million teachers and students. Products like Curiosity Window, which deploys digital human personas of historical figures for interactive learning, and the Chinese Smart Classroom dual-teacher model deployed in over 270 institutions domestically and 300 abroad come closest to the AI teacher characterization, though these are interactive supplements rather than autonomous replacements for human educators. iFlytek chairman Liu Qingfeng confirmed that the company uses Huawei Ascend 910B chips for large language model training, stating that the company raised the chip's efficiency from 20 percent to nearly 80 percent of Nvidia's capabilities.

China's regulatory approach to digital humans has evolved from general AI governance provisions to increasingly targeted rules. The foundation includes the Algorithm Recommendation Provisions, effective March 2022; the Deep Synthesis Provisions, effective January 10, 2023, which require labels on synthesized content and consent for biometric editing; and the Interim Measures for Generative AI Services, effective August 15, 2023, the world's first binding regulation for generative AI, which mandate security assessments and CAC filing for large language models. In March 2025, four agencies jointly released Measures for Labeling AI-Generated Synthetic Content, effective September 1, 2025, mandating both visible and metadata-embedded labels. A nationwide Clear and Bright enforcement campaign from April to June 2025 resulted in 3,500 AI products being taken down, over 960,000 pieces of illegal content removed, and 3,700 accounts penalized.

Two CAC regulations in the 2025 to 2026 period directly address digital human governance. The Provisional Measures on the Administration of Human-like Interactive AI Services, published December 27, 2025, target AI products that simulate human personality and engage in emotional interaction, with notable provisions including mandatory human operator takeover when users express self-harm intentions, a two-hour consecutive use limit enforced by pop-up reminders, and security assessment triggers when registered users reach one million or monthly active users reach 100,000. On April 3, 2026, the CAC published draft Measures on Digital Human Services for public comment through May 6, 2026. This regulation specifically defines digital humans as virtual figures that exist in nonphysical environments and simulate human appearance and behavior using technologies such as computer graphics, digital image processing, and artificial intelligence. It mandates continuous, prominent labeling visible throughout display, requires explicit consent before using any person's likeness or voice, bans the provision of virtual intimate relationships to minors under eighteen, prohibits sexually suggestive, violent, or discriminatory content, and introduces a life-cycle framework covering creation, operation, and dissemination. Separately, the State Council's August 2025 directive on deepening the AI Plus initiative set targets for AI agent integration exceeding 70 percent across six key domains by 2027 and 90 percent by 2030, while an October 2025 revision to the Cybersecurity Law, effective January 1, 2026, incorporated AI provisions into national law for the first time.
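The quantitative triggers in the December 2025 measures on human-like interactive AI services lend themselves to a small rule sketch. The code below encodes only the thresholds summarized above. The class, function, and constant names are hypothetical, and how the regulation actually counts users or measures consecutive use is an assumption made for illustration, not a reading of the legal text.

```python
from dataclasses import dataclass

# Thresholds as summarized in the text. Treating "reach" as greater-than-or-equal
# and measuring use in minutes are assumptions made for this sketch.
REGISTERED_USER_THRESHOLD = 1_000_000
MONTHLY_ACTIVE_USER_THRESHOLD = 100_000
CONSECUTIVE_USE_LIMIT_MINUTES = 120


@dataclass(frozen=True)
class ServiceSnapshot:
    """Hypothetical usage counters for a human-like interactive AI service."""
    registered_users: int
    monthly_active_users: int


def security_assessment_triggered(snapshot: ServiceSnapshot) -> bool:
    """True when either user-count threshold described in the text is reached."""
    return (
        snapshot.registered_users >= REGISTERED_USER_THRESHOLD
        or snapshot.monthly_active_users >= MONTHLY_ACTIVE_USER_THRESHOLD
    )


def use_reminder_due(consecutive_minutes: float) -> bool:
    """True once a session reaches the two-hour consecutive use limit."""
    return consecutive_minutes >= CONSECUTIVE_USE_LIMIT_MINUTES


small_service = ServiceSnapshot(registered_users=250_000, monthly_active_users=40_000)
popular_service = ServiceSnapshot(registered_users=600_000, monthly_active_users=150_000)

print(security_assessment_triggered(small_service))
print(security_assessment_triggered(popular_service))
print(use_reminder_due(90), use_reminder_due(120))
```

Because the two user-count conditions are joined by "or", a service with a modest registered base but heavy monthly usage would still trigger an assessment in this reading.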

China's digital human infrastructure has reached a point of deployment maturity where the binding constraints are shifting from technology availability to regulatory design, compute access, and commercial model viability. The physical infrastructure of nearly 4.84 million 5G base stations, hyperscale data centers from ByteDance, Alibaba, Tencent, and Huawei, and maturing platform services that can generate custom digital humans in hours rather than months has lowered production costs to the point where AI-powered livestreaming hosts are reducing e-commerce operating expenses by roughly 90 percent. The domestic AI chip ecosystem, anchored by Huawei's Ascend series and supplemented by Alibaba's and Baidu's in-house efforts, has made genuine if constrained progress under export restrictions that have reduced China's global AI compute share from 37 to 14 percent in three years. The most significant near-term development may be the interplay between the April 2026 CAC draft regulation and the industry's growth trajectory: if finalized as drafted, the mandatory labeling, consent, and minor protection requirements will impose compliance costs that favor large platform providers with established governance infrastructure over the hundreds of startups that currently populate the market. The Fifteenth Five-Year Plan, released in March 2026, singles out AI as a top national priority and specifically calls for building open-source AI communities, signaling that the state intends to sustain and accelerate this ecosystem rather than merely regulate it.

[Apr 2026]
