Generative AI is beginning to enter content creation processes in areas where decision makers can eliminate or minimize copyright-related legal risk.
This year’s Mipcom Cannes gathered experts to discuss the role of generative AI in the TV industry at its “Applied AI Summit” at the MIP Innovation Lab on Oct. 21, with a full day of presentations, panels, technology showcases and roundtable discussions. VIP+ presented insights from its research on the emerging state of generative AI in TV content, as detailed in the June special report “Generative AI in Film & TV.”
Related: Generative AI in TV: VIP+ at Mipcom Cannes Recap
Until more legal clarity arrives, early adoption and industry tests will occur primarily in areas of pre-production, post-production and distribution. U.S. media and entertainment decision makers expect generative AI to be used for concept work, enhanced VFX and content localization (e.g., AI dubbing), according to a VIP+ survey conducted by HarrisX in May 2024.
VIP+’s Mipcom presentation focused on three areas of generative AI with near-term potential for use in TV content creation and distribution processes:
1. AI Voice: AI voices are showing early usefulness as they gain naturalism. Dubbing with synthetic voices is gaining some traction for “lower-stakes” content or platforms, such as localizing news or sports clips for YouTube or programming for FAST channels. In these cases, faster turnaround is the priority, extending audience reach for content that wouldn’t otherwise have been dubbed.
However, AI dubbing is less ready for more premium TV content and distribution. AI voices can still fall short of a human voice actor in delivery. Some imperfections can be corrected with tool features that modify tone and inflection (see the sketch below). But whether that is a more efficient process than recording a voice actor will depend on how much effort those corrections require to produce a speech track of high enough quality for premium content and distribution settings.
For now, AI dubbing for TV content could similarly focus on improving reach and monetization, such as dubbing content for lower-resource languages that wouldn’t otherwise get dubs and testing audience response.
Beyond dubbing, uses of voice clones for narration have emerged for TV and ancillary content, with the consent and compensation of the personality or their estate (e.g., the Al Michaels voice clone that was used to give fans personalized highlights from NBC’s Olympics coverage on Peacock).
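As a loose illustration of the correction workflow described above, here is a minimal, hypothetical sketch of re-rendering flagged lines with adjusted delivery parameters rather than re-recording them. Every name in it (Line, synthesize, the emotion and pace fields) is invented for this example and corresponds to no specific vendor’s API:

```python
# Hypothetical sketch of a revision pass over an AI-dubbed speech track.
# Every name here (Line, synthesize, the emotion/pace fields) is invented
# for illustration and maps to no specific vendor's API.
from dataclasses import dataclass


@dataclass
class Line:
    text: str     # translated dialogue
    voice: str    # cloned or stock voice identity
    emotion: str  # delivery note, e.g. "neutral" or "urgent"
    pace: float   # 1.0 = natural speed; adjust to fit scene timing


def synthesize(line: Line) -> bytes:
    # Placeholder for a vendor text-to-speech call; returns raw audio.
    return f"<audio:{line.voice}:{line.emotion}:{line.text}>".encode()


def revise_flagged(track: list[Line], flagged: dict[int, dict]) -> list[bytes]:
    """Re-render only the lines an editor flagged, with adjusted
    delivery parameters, instead of re-recording a voice actor."""
    for idx, fixes in flagged.items():
        for field, value in fixes.items():
            setattr(track[idx], field, value)
    return [synthesize(line) for line in track]


# Example: line 2 sounded flat, so mark it tense and slow it slightly.
track = [
    Line("We have to go.", "lead_m", "urgent", 1.0),
    Line("Not yet.", "lead_f", "neutral", 1.0),
    Line("They're already here.", "lead_m", "neutral", 1.0),
]
audio = revise_flagged(track, {2: {"emotion": "tense", "pace": 0.9}})
```

The economics question above lives in that flagged dictionary: if too many lines need too many manual adjustment passes, recording a voice actor may remain the cheaper path.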
2. Face-swapping: Deep learning models are very powerful for rendering complex or subtle face modifications. The early potential here is proving out in lip-sync dubbing and face-swapping to achieve de-aging and other effects.
AI lip-sync tools, such as those offered by Flawless and MARZ’s LipDub AI, can synchronize an actor’s mouth and facial movements with a dubbed speech track. This year, major Hollywood studios are testing lip-sync dubbed content, as VIP+ reported in May 2024. The hope is that it pays off by giving audiences a more immersive experience of foreign-language content, making it feel like the viewer is watching content in their native language.
Face-swapping can also be used for cosmetic touch-ups or to change entire face structures to age or de-age actors, among other effects. The same tools have also been discussed as a way to open up editing and eliminate reshoots by allowing actors to redo lines remotely.
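Flawless and MARZ have not published how their tools work internally, but the general shape of an audio-driven lip-sync pass can be sketched as below. Every function is an invented placeholder standing in for a learned model or rendering step:

```python
# Hypothetical skeleton of an audio-driven lip-sync pass: detect the
# actor's face, predict mouth shapes from the dubbed audio, composite
# the re-rendered mouth back into the frame. Every function below is an
# invented placeholder standing in for a learned model or render step.

def detect_face(frame):
    # Placeholder: real systems run a face detector / landmark tracker.
    return {"bbox": (0, 0, 128, 128)}

def predict_mouth(face, dubbed_audio, frame_idx):
    # Placeholder: real systems map audio features at this timestamp to
    # mouth and jaw shapes with a generative model.
    return {"shape_at": frame_idx}

def composite(frame, face, mouth):
    # Placeholder: real systems re-render and blend the mouth region so
    # lighting and skin texture match the surrounding frame.
    return (frame, face["bbox"], mouth["shape_at"])

def lip_sync(frames, dubbed_audio):
    """One pass over the shot: each output frame keeps the original
    performance except for the mouth, which now matches the dub."""
    synced = []
    for i, frame in enumerate(frames):
        face = detect_face(frame)
        mouth = predict_mouth(face, dubbed_audio, i)
        synced.append(composite(frame, face, mouth))
    return synced
```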
3. Video generation: Video generation is rapidly advancing, and studios and some filmmakers are clearly interested in retaining and using these models as production tools. Yet for studio productions, it’s still unclear what using video generation in a workflow would mean professionally, including who is best situated to use it directly. Figuring out how to get the most out of these tools will be an internal focus in the coming months. Any studio productions that use video generation would likely work directly with AI company teams to get the desired outputs from the tools.
Still, it’s important to recognize that these systems differ from traditional camerawork, VFX or animation in key ways: in their photo and physics realism versus the real world, in consistency or continuity, and in controllability. A common critique of text-to-video is that it’s effectively a “slot machine,” in that the output can look excellent but may not fit the exact need of a specific scene or production. Different techniques will have different capabilities, such as video-to-video, recently launched by Runway for Gen-3 Alpha.
Notably, major studios are exploring fine-tuning video models — further training pretrained video generation models on owned IP, such as catalog TV and film content — to have a model solely intended for internal use, as VIP+ first covered in July. The Lionsgate-Runway partnership is the first publicly announced instance of fine-tuning at a studio, but as many as six major Hollywood studios are pursuing video model fine-tuning, a source told VIP+ in September.
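None of these studio fine-tuning pipelines are public. As a loose illustration of what “further training a pretrained model on owned IP” means mechanically, here is a toy PyTorch sketch; the model, dataset and loss are stand-ins, not any real video model or any actual studio pipeline:

```python
# Toy sketch of fine-tuning a pretrained video model on a studio's own
# catalog. The model, dataset and loss are stand-ins: real pipelines
# are not public and would involve a large diffusion model, clip
# latents and captioned catalog footage.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class ToyVideoModel(nn.Module):
    """Stand-in for a pretrained text-to-video model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(64, 64)  # placeholder for a huge denoiser

    def training_loss(self, clips, captions):
        # Real diffusion fine-tuning predicts noise added to clip
        # latents; this placeholder regresses clips onto themselves.
        return ((self.net(clips) - clips) ** 2).mean()


def finetune(model, loader, epochs=3, lr=1e-5):
    # Low learning rate: the goal is to adapt pretrained weights to the
    # catalog's look, not to retrain the model from scratch.
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for clips, captions in loader:
            loss = model.training_loss(clips, captions)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model  # weights stay in-house, for internal use only


# Toy "catalog": 8 flattened clip latents with dummy caption ids.
clips = torch.randn(8, 64)
captions = torch.zeros(8, dtype=torch.long)
loader = DataLoader(TensorDataset(clips, captions), batch_size=2)
model = finetune(ToyVideoModel(), loader)
```

The “solely intended for internal use” framing in the article corresponds to the last comment in the sketch: the fine-tuned weights never leave the studio’s environment.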
Even as generative AI’s performance issues for premium TV get solved, pressing legal questions still present one of the biggest near-term barriers to its use in content production.