Same engine. One HTTP API plus an MCP server.
Drop the transcribe.so API into AI agents, video editors, meeting bots, voice memo apps, and audio pipelines. Same engine that powers the consumer product, accessed with one Bearer token or via the Model Context Protocol server for Claude and ChatGPT. Free trial credits on signup, no card required.
No credit card required. Pay only for what you use.
See it in action
Real output from a real transcription
Browse chapters, ask questions, and explore search results from an actual transcript.
Building audio infrastructure is not the project you want to be on
- Managed transcription APIs cap at 25 MB and rate-limit the moment you scale past hobby use
- Per-minute pricing without per-language model routing taxes multilingual workloads
- Self-hosted Whisper-large eats GPU budget and breaks on long files or non-English audio
- Polling every few seconds blocks workers and burns rate-limit budget you'd rather spend on real traffic
What you get from the transcribe.so API and MCP server
One Bearer token, three input shapes
POST a YouTube URL, an external audio link, or a file uploaded via a presigned S3 PUT. The response shape is the same regardless. No SDK required; it's pure HTTP.
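A minimal sketch of submitting a job over plain HTTP. The endpoint path (`/v1/transcriptions`) and the `source` field name are assumptions for illustration; only the Bearer-token auth is from the text above.

```python
# Hypothetical request builder for the transcribe.so API.
# The base URL, path, and body schema are assumed, not official.
import json
import urllib.request

API_BASE = "https://api.transcribe.so/v1"  # assumed base URL

def build_transcribe_request(token: str, source: str) -> urllib.request.Request:
    """`source` may be a YouTube URL, an external audio link,
    or the key of a file uploaded via a presigned S3 PUT."""
    body = json.dumps({"source": source}).encode()
    return urllib.request.Request(
        f"{API_BASE}/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_transcribe_request("sk_live_...", "https://youtube.com/watch?v=abc123")
# urllib.request.urlopen(req) would submit the job; not executed here.
```

The same builder covers all three input shapes, since each is ultimately just a string handed to the API.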
MCP server for Claude and ChatGPT
Drop-in for Claude Desktop, ChatGPT Custom GPTs, and any MCP-compatible client. Exposes transcribe, search_library, list_transcriptions, and get_me tools out of the box.
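For Claude Desktop, registration is one entry in its config file. The structure below is the standard `mcpServers` format; the package name and environment variable are hypothetical, so check the docs for the real values.

```json
{
  "mcpServers": {
    "transcribe": {
      "command": "npx",
      "args": ["-y", "@transcribe-so/mcp"],
      "env": { "TRANSCRIBE_API_KEY": "sk_live_..." }
    }
  }
}
```

Once registered, the transcribe, search_library, list_transcriptions, and get_me tools appear in the client automatically.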
Webhooks, not polling
Register a URL and we POST you when transcription completes. HMAC-signed (Stripe-style). Auto-retry with exponential backoff. Works with Cloudflare Workers, Lambda, Vercel, n8n, any HTTP-capable runtime.
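Verifying a Stripe-style HMAC signature takes a few lines of stdlib Python. The header name and hex encoding below are assumptions about the signing scheme; the constant-time comparison is the part that matters.

```python
# Sketch of webhook signature verification, assuming the signature is
# a hex HMAC-SHA256 of the raw request body sent in a header such as
# X-Transcribe-Signature (header name and encoding are assumptions).
import hashlib
import hmac

def verify_webhook(secret: str, raw_body: bytes, signature_header: str) -> bool:
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest prevents timing side channels
    return hmac.compare_digest(expected, signature_header)

secret = "whsec_example"  # per-endpoint secret from the dashboard (assumed)
body = b'{"id":"tr_123","status":"completed"}'
sig = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
```

Always verify against the raw bytes of the body, before any JSON parsing or re-serialization, or the digest won't match.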
Multilingual without compromise
67 languages with measured accuracy per language. The right ASR engine is picked per request, or you can specify one. The same routing powers the consumer dashboard.
Structured outputs, not raw text
/result returns segments, word-level timestamps, speaker labels, chapters, topics, summaries, and cited Q&A, not a wall of text. Skip the post-processing pipeline you'd otherwise build on top of raw ASR.
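Because segments arrive with timestamps, caption export is a short transformation. The segment fields below (`start`, `end`, `text`, in seconds) are an assumed shape for the /result payload; adjust the keys to the real schema.

```python
# Convert timestamped segments (assumed /result shape) into SRT cues.
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def segments_to_srt(segments) -> str:
    cues = []
    for i, seg in enumerate(segments, 1):
        cues.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> "
            f"{srt_timestamp(seg['end'])}\n{seg['text']}"
        )
    return "\n\n".join(cues)

segments = [
    {"start": 0.0, "end": 2.5, "text": "Welcome to the show."},
    {"start": 2.5, "end": 5.0, "text": "Today we talk APIs."},
]
```

The same loop, pointed at word-level timestamps instead of segments, gives you karaoke-style burn-in captions.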
Predictable retries, debuggable failures
Idempotency-Key header support. Stripe-style error envelope with code, message, request_id, and doc_url. Per-key spend visibility in the dashboard.
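The point of the Idempotency-Key header is that retries must reuse the same key, so a resubmitted POST can't create a duplicate job. A minimal sketch, where `send` stands in for whatever performs the actual HTTP call and the retry policy values are illustrative:

```python
# Retry a POST with exponential backoff, pinning one Idempotency-Key
# across all attempts. `send` is any callable that performs the HTTP
# request with the given headers and raises on transient failure.
import time
import uuid

def post_with_retries(send, headers: dict,
                      max_attempts: int = 4, base_delay: float = 0.5):
    # Generate the key once, before the first attempt, and reuse it.
    headers = {**headers,
               "Idempotency-Key": headers.get("Idempotency-Key",
                                              str(uuid.uuid4()))}
    for attempt in range(max_attempts):
        try:
            return send(headers)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
```

On failure, the error envelope's request_id is the value to log: it lets support trace the exact request.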
What people use this for
- AI agents that read audio: drop transcripts into your LLM context and let agents reason over hours of recordings
- Meeting bots: process Zoom and Twilio recordings into searchable notes the moment a call ends
- Voice memo apps on iPhone and Android: auto-generate chapters and topics from raw audio
- Podcast pipelines: process new episodes from RSS feeds into show notes and chapter posts
- Video editors: generate burn-in captions with word-level timestamps, export SRT and VTT directly
- Language learning apps: accurate multilingual transcripts for shadowing and dictation drills
- Customer support: surface call topics and follow-ups from recorded calls
- Journalist workflows: drop interview audio in, get back chapters, quotes, and a searchable archive
FAQ
Frequently asked questions
Want a deeper comparison? Read the launch announcement
Ship it today.
Create a key, paste it into your script, and transcribe within a minute. Or add the MCP server to Claude Desktop or your ChatGPT Custom GPT. Per-key spend visibility and webhook configuration live in the dashboard.