The National Radio and Television Administration (NRTA) published GY/T 411-2024, titled Technical Requirements for Digital Human, on November 26, 2024, establishing China's first sector-specific technical specification governing the creation and deployment of digital virtual humans in broadcasting, television, and online audiovisual services. The standard moved from committee review to formal issuance with unusual speed. The NRTA Science and Technology Department posted the approval draft for public notice on November 15, 2024, opening a ten-day comment window that closed on November 24. The draft had already passed review by the National Radio, Film and Television Standardization Technical Committee, and just two days after the comment period ended, the NRTA formally issued the standard, with the publication notice appearing on its website on November 28. The designation GY/T 411-2024 classifies it as a recommended industry standard for the broadcasting and network audiovisual sector, operating one tier below mandatory national standards. The "GY" prefix denotes the broadcasting industry, while "/T" marks it as recommended rather than mandatory. The official English title rendered in the standard document itself is "Technical requirements for digital human," though Chinese-language coverage and the Chinese title use the fuller term "digital virtual human" (数字虚拟人).
The lead drafting organization was the China Broadcasting Design and Research Institute, supported by thirteen co-drafters including the NRTA Broadcasting Science Research Institute, the NRTA Broadcasting and Television Planning Institute, Tencent Cloud Computing (Beijing), the Communication University of China (中国传媒大学), TRS Information Technology (拓尔思信息技术股份有限公司), Sichuan Broadcasting Television Station, Shandong Broadcasting Television Station, and several AI technology firms including Beijing Zhongke Shenzhi Technology and Beijing Qiwei Vision Technology. Tencent Cloud's participation proved commercially significant, as the company subsequently announced that its digital human products fully comply with the new standard.
The standard's architecture rests on a triaxial classification system that categorizes every digital virtual human along three independent dimensions. By appearance, digital humans divide into 2D and 3D types. By interaction mode, they split into non-interactive and interactive variants. By driving mode, they separate into algorithm-driven and real-person-driven categories. Any given digital human occupies one position on each axis, creating a matrix of possible configurations. Application scenarios fall into four major categories: content broadcasting (including news presentation, sign language interpretation, and livestream e-commerce), interactive customer service (virtual assistants and intelligent question-and-answer systems), virtual performance (variety show hosting, virtual concerts, and user proxy avatars), and content creation (film and television production, video creation, advertising, and game development). The standard's technical architecture then specifies five capability layers, each governed by dedicated chapters: appearance, algorithm-driven capabilities, real-person driving capabilities, platform capabilities, and security capabilities.
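The triaxial classification can be sketched as a small data model. This is an illustrative rendering only; the class and field names below are my own, not terms drawn from the standard's text.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product

# Illustrative names only -- the standard defines the axes, not this API.
class Appearance(Enum):
    TWO_D = "2D"
    THREE_D = "3D"

class InteractionMode(Enum):
    NON_INTERACTIVE = "non-interactive"
    INTERACTIVE = "interactive"

class DrivingMode(Enum):
    ALGORITHM = "algorithm-driven"
    REAL_PERSON = "real-person-driven"

@dataclass(frozen=True)
class DigitalHumanProfile:
    """One position on each of the standard's three classification axes."""
    appearance: Appearance
    interaction: InteractionMode
    driving: DrivingMode

# Enumerating the full configuration matrix: 2 x 2 x 2 = 8 combinations.
matrix = [DigitalHumanProfile(a, i, d)
          for a, i, d in product(Appearance, InteractionMode, DrivingMode)]
```

Under this model, a fully synthetic news presenter would occupy (3D, non-interactive, algorithm-driven), while a live virtual performer operated by a motion-capture actor would occupy (3D, interactive, real-person-driven).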
Appearance requirements vary by type and impose considerable specificity. For 3D realistic digital humans, the standard mandates that head models cover the face, oral cavity, upper and lower teeth, tongue, independent left and right eyeballs, eyelids, and lacrimal glands. Hair systems must render hair, eyelashes, and fine facial hair with clearly defined texture. The standard requires support for one-to-one replication of real persons, lighting effects processing including shadows, refraction, and reflection, humanoid skeleton and skinning modeling, and flexible switching between different camera angles and framings. For 2D real-person types, accurate reproduction of facial features, skin tone, teeth, and lighting from photos or video is required. All types must avoid distortion, mosaic artifacts, frame skipping, audio-video delay, and lip-sync inconsistency, and must not infringe third-party rights.
The algorithm-driven capability requirements span text-driven, speech-driven, and video-driven modes, plus speech synthesis, video synthesis, and multimodal integration. Speech synthesis must support end-to-end models including HiFi-GAN, VAE, diffusion models, Glow, and DurIAN, producing output close to real human speech with word-level fine-grained control of volume and duration. The standard requires multi-emotion controllable synthesis that automatically switches emotional tone based on text content, rapid personalized customization from minute-level to hour-level speech corpora, and multiple style profiles spanning broadcasting, narration, poetry, and customer service. Video synthesis requirements set a hard performance floor: under 1080P resolution, the video synthesis real-time rate must not exceed one, meaning synthesis must be at least as fast as real-time playback, with a minimum frame rate of 25 frames per second. Rendering must support both Unreal Engine and Unity. Multimodal requirements demand accurate pronunciation with no missed sounds, phoneme errors, or tone errors, natural lip synchronization, precise and contextually appropriate body movements, and real-time rendering based on physical lighting conditions.
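The 1080P performance floor can be expressed as a simple validation check. The definition of real-time rate as synthesis time divided by output playback time is my reading of the requirement, not the standard's official wording:

```python
def meets_1080p_floor(synthesis_seconds: float,
                      output_seconds: float,
                      fps: float) -> bool:
    """Check the GY/T 411-2024 video-synthesis floor at 1080P resolution.

    Assumes real-time rate = synthesis time / playback time; the standard
    requires this rate not to exceed 1.0 and a frame rate of at least 25 fps.
    """
    if output_seconds <= 0:
        raise ValueError("output duration must be positive")
    realtime_rate = synthesis_seconds / output_seconds
    return realtime_rate <= 1.0 and fps >= 25.0
```

A pipeline that renders 60 seconds of video in 48 seconds at 30 fps passes (rate 0.8); one that needs 90 seconds fails (rate 1.5), as does one that renders in real time but only at 24 fps.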
For real-person driving, the standard addresses body motion capture, expression capture, and capture data handling, requiring that the motion capture actor (动捕演员, also called 中之人, meaning "the person inside") can map actions, expressions, and voice in real time to the digital human. Capture data sampling rate and precision must meet practical usage requirements. Platform requirements mandate support for public cloud, private cloud, or local deployment, and multi-device front-end access spanning PC, mobile, large-screen, web, apps, mini-programs, and H5. A particularly significant provision requires that real-person driving be capable of mixing with algorithm driving, with mutual takeover capability, allowing seamless handoff between human operators and AI.
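The mutual-takeover provision can be pictured as a minimal control-handoff sketch, in which either driving source may assume control at any moment and the other remains available to take back over. The class and method names are hypothetical, invented here for illustration:

```python
from enum import Enum, auto

class Driver(Enum):
    ALGORITHM = auto()    # AI-driven mode
    REAL_PERSON = auto()  # motion-capture actor

class DrivingController:
    """Sketch of the mutual-takeover requirement: real-person driving and
    algorithm driving can mix, and either may seamlessly take over.
    Illustrative API, not defined by the standard."""

    def __init__(self, initial: Driver = Driver.ALGORITHM):
        self.active = initial

    def take_over(self, requester: Driver) -> Driver:
        # Seamless handoff: the requesting driver becomes active immediately,
        # with no interruption of the rendered output.
        self.active = requester
        return self.active

ctrl = DrivingController()
ctrl.take_over(Driver.REAL_PERSON)  # human operator assumes control mid-stream
ctrl.take_over(Driver.ALGORITHM)    # AI resumes when the operator steps back
```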
The standard's security provisions establish requirements across two domains. For data and algorithm security, application entities must collect and use data within legally prescribed purposes and scope, configure access control mechanisms, encrypt business data requiring secured transmission, and must not produce, copy, publish, or disseminate false content. For personal information protection, the standard adopts principles of legality, legitimacy, necessity, and good faith that echo the Personal Information Protection Law (PIPL). Its most consequential provision requires that when editing real human faces, voices, or other biometric identification information, the application entity must notify the individual being edited and obtain their separate consent, applying the elevated consent standard from PIPL Article 29 for sensitive personal information.
GY/T 411-2024 does not exist in isolation. It operates within a multi-layered governance architecture that has expanded dramatically between 2022 and 2026, involving at least five major regulatory bodies. The deep synthesis regulations, jointly issued by the Cyberspace Administration of China (CAC), the Ministry of Industry and Information Technology (MIIT), and the Ministry of Public Security and effective January 10, 2023, established the foundational framework. These 25-article provisions defined "deep synthesis" to encompass technologies using deep learning, VR, and generative algorithms to create text, images, audio, video, and virtual scenes, capturing virtually all digital human technology. Article 14 requires biometric editing consent, Article 16 mandates invisible watermarking, Article 17 requires explicit visible labeling for content that may cause public confusion, and Article 24 specifically notes that online audiovisual services must additionally comply with NRTA regulations, creating the jurisdictional bridge to GY/T 411. The Interim Measures for the Management of Generative AI Services, effective August 15, 2023 and jointly issued by the CAC and six other departments including the NRTA, extended regulation to all generative AI services. By late 2025, 748 generative AI services had completed mandatory registration under these measures. In March 2025, four agencies including the NRTA jointly issued the Measures for Labeling AI-Generated and Synthesized Content, paired with GB 45438-2025, China's first mandatory national standard for AI content labeling, effective September 1, 2025. This standard requires dual explicit and implicit labeling of all AI-generated content including virtual scenes, directly affecting every digital human deployment.
On the national standards track, GB/T 46483-2025, published November 18, 2025, became China's first national standard for virtual digital humans. Managed by TC28/SC24 and led by SenseTime and CESI, it sets quantitative benchmarks: 3D ultra-realistic models require at least 200,000 polygon faces, lip-sync accuracy of 90 percent or above, and emotional interaction success rates of 80 percent or above. Digital human standardization work is distributed across multiple bodies, including TC28/SC24 for information technology, TC260 for cybersecurity, CCSA TC602 for telecommunications (which published the YD/T 4393 series on virtual digital human metrics), and the NRTA's own standardization infrastructure. In January 2026, MIIT opened public comment on a planned mandatory national standard for digital human identity identification, proposing a "one person, one code" system for all commercially deployed digital humans in China, motivated by over 1.14 million digital human-related enterprises and rising fraud concerns.
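The GB/T 46483-2025 thresholds for 3D ultra-realistic models lend themselves to a compliance-gap check that reports which criteria a model misses. The function name and return convention are my own, assuming accuracy and success rates expressed as fractions:

```python
def gbt46483_gaps(polygon_faces: int,
                  lip_sync_accuracy: float,
                  emotion_success_rate: float) -> list[str]:
    """Return the names of any GB/T 46483-2025 ultra-realistic criteria
    a 3D model fails: >= 200,000 polygon faces, lip-sync accuracy >= 0.90,
    emotional-interaction success rate >= 0.80. An empty list means all
    three benchmarks are met. (Illustrative helper, not from the standard.)"""
    thresholds = {
        "polygon_faces": (polygon_faces, 200_000),
        "lip_sync_accuracy": (lip_sync_accuracy, 0.90),
        "emotion_success_rate": (emotion_success_rate, 0.80),
    }
    return [name for name, (value, floor) in thresholds.items()
            if value < floor]
```

A 250,000-face model with 93 percent lip-sync accuracy and an 85 percent emotional-interaction success rate returns an empty list; a 150,000-face model at 95 and 70 percent returns the two failing criteria.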
The regulatory trajectory steepened sharply in late 2025 and early 2026 with two CAC draft regulations that represent the most directly targeted governance instruments for digital humans in China's regulatory history. On December 27, 2025, the CAC released the Provisional Measures on the Administration of Human-like Interactive AI Services, with comments due by January 25, 2026. This 32-article draft explicitly covers digital humans, voice-based companions, and anthropomorphic AI characters, imposing requirements including a two-hour continuous use limit triggering mandatory pause reminders, a prohibition on emotional manipulation to retain users, mandatory parental consent for emotional companionship services provided to minors, and mandatory safety assessments with annual reviews by provincial CAC offices. Then on April 3, 2026, the CAC released the Administrative Measures for the Management of Digital Virtual Human Information Services, a 27-article draft open for comment until May 6, 2026. This represents the single most comprehensive instrument targeting digital humans specifically. Key provisions include mandatory continuous display of a digital human label throughout service provision, prohibition on using digital humans to circumvent facial or voice recognition systems, protection of motion capture actors' rights, a ban on virtual intimate relationships and addiction-inducing services for minors, required consent for using deceased persons' biometric information, and graduated fines of 10,000 to 200,000 yuan for violations. The draft's explicit call to "establish and improve the digital virtual human technology standards system" implicitly legitimizes GY/T 411-2024 and the broader technical standards ecosystem.
The NRTA has also built a broader regulatory apparatus around the standard. The Code of Conduct for Online Anchors, jointly issued with the Ministry of Culture and Tourism on June 22, 2022, was the first NRTA document to formally bring virtual anchors under regulatory oversight, stating that virtual anchors and content synthesized using AI technology shall comply with the code by reference. In September 2023, the NRTA issued a notice establishing VR production technology demonstration categories that included a dedicated Virtual Digital Human Application Demonstration track and an AIGC Application Demonstration track, with ten projects approved across six provinces by November 2024. A separate track addresses deepfake prevention through the Technical Requirements for Deepfake Prevention in Broadcasting and Online Audiovisual, developed under the NRTA's AI Application Key Laboratory. In December 2024, the NRTA Network Audiovisual Program Management Division issued management guidance specifically targeting AI modification of classic films and television dramas, triggering a nationwide cleanup campaign beginning January 2026. The NRTA also launched development of a Digital Human Identity Identification Standard for Broadcasting and Online Audiovisual in March 2024, with participation from over 20 major platforms including Douyin, Kuaishou, iQIYI, Bilibili, and Tencent. Looking ahead, NRTA policy documents reference two instruments under development: Management Provisions on Generative AI Applications in Broadcasting and a Broadcasting Audiovisual AI Ethics Code. Meanwhile, China Media Group has issued its own trial-version AI usage standard banning deep synthesis technology in political, current affairs, and sensitive historical reporting.
GY/T 411-2024 achieved what it set out to do: establish the first technical baseline for digital virtual humans within China's broadcasting ecosystem. Its three-axis classification system, its concrete performance requirements (the 25-frame-per-second minimum and the real-time synthesis benchmark), its biometric consent mandate, and its security provisions created a technical vocabulary that subsequent regulations have adopted. Yet its most revealing feature may be what it omits. The standard deliberately confines itself to technical specifications, leaving content governance, identity management, and behavioral regulation to the cascade of CAC, MIIT, and joint-agency instruments that followed through 2026. China's digital human governance is not converging on a single regime but rather layering technical standards, mandatory labeling requirements, identity systems, and behavioral regulations into an increasingly dense mesh. The April 2026 CAC draft, with its explicit call to build out the digital human technical standards system, signals that GY/T 411-2024 was an opening move rather than a final word: the beginning of a standardization effort whose full architecture is still being assembled.
[Apr 2026]