Audio inference

Build flexible pipelines to transcribe audio, generate voice or synthesize high-fidelity music.

“Suno has developed proprietary state-of-the-art models that generate music and speech using AI. Modal's superb developer experience enables our team to ship new models to production quickly, and with confidence that we'll scale to thousands of simultaneous users.”

Georg Kucsko, Co-Founder

“At Phonic, we train our own proprietary models for audio generation. We moved all our large-scale audio processing batch jobs to Modal. Our engineers are ecstatic with the result – we can run at a much larger scale than before, no longer have to babysit our batch jobs, and we can ship much faster.”

Moin Nadeem, Co-Founder

“When Substack launched a feature for AI-powered audio transcriptions, the data team picked Modal because it makes it easy to write code that runs on 100s of GPUs in parallel, transcribing podcasts in a fraction of the time.”

Mike Cohen, Head of Data

Cheap, efficient transcription

View Examples

Outperform managed APIs

Get faster speeds at lower cost than popular transcription APIs like AssemblyAI and Deepgram by running open-source models on Modal.

Scale on demand

Distribute transcription tasks across hundreds of containers simultaneously.
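
As a rough illustration of the fan-out idea, here is a minimal sketch that cuts a recording into fixed-length segments and transcribes them in parallel. It uses only the Python standard library, with a thread pool standing in for remote containers. The function names, the 30-second segment length, and the placeholder `transcribe_segment` are assumptions made for this example, not Modal's API; a real worker would run a speech-to-text model on each segment.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(duration_s: float, segment_s: float) -> list[tuple[float, float]]:
    """Cut a recording of duration_s seconds into consecutive (start, end) windows."""
    if segment_s <= 0:
        raise ValueError("segment_s must be positive")
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


def transcribe_segment(segment: tuple[float, float]) -> str:
    """Hypothetical placeholder: a real worker would run a speech-to-text model here."""
    start, end = segment
    return f"[transcript {start:.0f}s-{end:.0f}s]"


def transcribe_parallel(duration_s: float, segment_s: float = 30.0, max_workers: int = 8) -> str:
    """Transcribe all segments concurrently and join the results in original order."""
    segments = split_into_segments(duration_s, segment_s)
    # The thread pool is a local stand-in for many containers working at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        parts = list(pool.map(transcribe_segment, segments))
    return " ".join(parts)
```

Because each segment is independent, the wall-clock time is governed by the slowest segment rather than the full length of the recording, which is why adding workers speeds things up.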

Build your own AI voice

View Examples

Transform text into natural-sounding speech using the latest open-source models.

Deploy text-to-speech models like XTTS directly on Modal's platform.

Cutting-edge hardware access

Tap into Modal's fleet of A100 and H100 GPUs for memory-intensive voice models.

Lightning-fast cold starts

Generate speech on demand without lengthy startup times, thanks to our optimized container file system and engine.

Try it out

View all

Voice chat with LLMs

Build an interactive voice chat app.

Fast podcast transcriptions

Build an end-to-end podcast transcription app that leverages dozens of containers for super-fast processing.

Multilingual voice and text chat

Use the SeamlessM4T model and WebSockets to connect chat users speaking different languages.

Run a music-generating Discord bot

Create your own music samples on Discord.

Ship your first app in minutes.

Get Started

$30 / month free compute