Same engine. One HTTP API plus an MCP server.

Drop the transcribe.so API into AI agents, video editors, meeting bots, voice memo apps, and audio pipelines. Same engine that powers the consumer product, accessed with one Bearer token or via the Model Context Protocol server for Claude and ChatGPT. Free trial credits on signup, no card required.

No credit card required. Pay only for what you use.

See it in action

Real output from a real transcription

Browse chapters, ask questions, and explore search results from an actual transcript.

44 Harsh Truths About The Game Of Life - Naval Ravikant (4K)
Chris Williamson
Contents
8 chapters · 513 sections
1. Happiness Versus Success: Philosophical Reflections on Contentment, Desire, and Motivation
2. Optimizing Sleep: Smart Temperature Regulation and the Foundations of Self-Esteem
3. Decisive Action and Iterative Practice: Keys to Optimal Choices and Mastery
4. Wealth Management: From Materialism to Value Creation and Fair Compensation
5. Evaluating LLMs: Capabilities, Limitations, and Their Role in AI's Evolving Landscape
6. Pathogens, Evolution, and Knowledge: How Humans Adapt and Defend
7. Agency, Power, and the Individual: From Child Development to Cultural Conflict
8. Unseen Trends: Media Oversights, Medical Limitations, and the Primitive State of Modern Biology
Q&A preview
Answer
Naval explains two distinct paths to happiness using the story of Alexander and Diogenes. The first path is through success—conquering the world, satisfying material needs, and getting what you want. The second path, exemplified by Diogenes living in a barrel, is simply not wanting in the first place. As Socrates said when shown luxuries: 'How many things there are in this world that I do not want.' Naval suggests not wanting something is as good as having it—both paths lead to the same destination of contentment [00:38–01:10]. He's not sure which path is more valid, noting it depends on how you define success [01:10–01:25].

Building audio infrastructure is not the project you want to be on

  • Managed transcription APIs cap at 25 MB and rate-limit the moment you scale past hobby use
  • Per-minute pricing without per-language model routing taxes multilingual workloads
  • Self-hosted Whisper-large eats GPU budget and breaks on long files or non-English audio
  • Polling every few seconds blocks workers and burns rate-limit budget you'd rather spend on real traffic

What you get from the transcribe.so API and MCP server

One Bearer token, three input shapes

POST a YouTube URL, an external audio link, or a file uploaded via presigned S3 PUT. Same response shape regardless. No SDK required; pure HTTP.
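A minimal sketch of the three input shapes against `POST /api/v1/transcriptions`. The body field names (`source_url`, `file_key`) are illustrative assumptions, not the documented schema; check /developers/docs for the real field names.

```python
# Sketch: the three input shapes hit the same endpoint with the same auth.
# Field names ("source_url", "file_key") are assumptions for illustration.

def build_transcription_request(token: str, source: dict) -> dict:
    """Assemble the HTTP request for any of the three input shapes."""
    return {
        "method": "POST",
        "url": "https://transcribe.so/api/v1/transcriptions",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "json": source,
    }

# Same request shape regardless of source:
youtube = build_transcription_request("YOUR_API_KEY", {"source_url": "https://www.youtube.com/watch?v=VIDEO_ID"})
audio_link = build_transcription_request("YOUR_API_KEY", {"source_url": "https://example.com/episode.mp3"})
uploaded = build_transcription_request("YOUR_API_KEY", {"file_key": "uploads/abc123.mp3"})  # key from presigned S3 PUT
```

Whatever the source, the response shape coming back is the same, so one polling or webhook handler covers all three.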

MCP server for Claude and ChatGPT

Drop-in for Claude Desktop, ChatGPT Custom GPTs, and any MCP-compatible client. Exposes transcribe, search_library, list_transcriptions, and get_me tools out of the box.
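One way this could look in a Claude Desktop config, assuming the server is exposed over HTTP and bridged with the community `mcp-remote` package. The endpoint URL here is hypothetical; use the one from the dashboard or docs.

```json
{
  "mcpServers": {
    "transcribe": {
      "command": "npx",
      "args": ["mcp-remote", "https://transcribe.so/mcp"]
    }
  }
}
```

Once registered, the transcribe, search_library, list_transcriptions, and get_me tools show up natively in the client.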

Webhooks, not polling

Register a URL and we POST you when transcription completes. HMAC-signed (Stripe-style). Auto-retry with exponential backoff. Works with Cloudflare Workers, Lambda, Vercel, n8n, any HTTP-capable runtime.

Multilingual without compromise

67 languages with measured accuracy per language. The right ASR engine is picked per request, or you can specify one. The same routing powers the consumer dashboard.

Structured outputs, not raw text

/result returns segments, word-level timestamps, speaker labels, chapters, topics, summaries, and cited Q&A rather than a wall of text. Skip the post-processing pipeline you'd otherwise build on top of raw ASR.
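Because /result is structured, exporting captions is a short transform rather than a pipeline. A sketch, assuming segments come back as objects with `start`/`end` seconds and `text` (an illustrative shape; the real schema is in /developers/docs):

```python
# Convert /result-style segments into an SRT caption file.
# The segment shape ({"start", "end", "text"}, seconds as floats) is assumed.

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def segments_to_srt(segments: list[dict]) -> str:
    """Number each segment and join into SRT blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n{seg['text']}"
        )
    return "\n\n".join(blocks) + "\n"

segments = [
    {"start": 0.0, "end": 2.5, "text": "How many things there are"},
    {"start": 2.5, "end": 4.0, "text": "that I do not want."},
]
print(segments_to_srt(segments))
```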

Predictable retries, debuggable failures

Idempotency-Key header support. Stripe-style error envelope with code, message, request_id, and doc_url. Per-key spend visibility in the dashboard.

What people use this for

  • AI agents that read audio: drop transcripts into your LLM context and let agents reason over hours of recordings
  • Meeting bots: process Zoom and Twilio recordings into searchable notes the moment a call ends
  • Voice memo apps on iPhone and Android: auto-generated chapters and topics from raw audio
  • Podcast pipelines: turn new episodes from RSS feeds into show notes and chapter posts
  • Video editors: generate burn-in captions with word-level timestamps and export SRT and VTT directly
  • Language learning apps: accurate multilingual transcripts for shadowing and dictation drills
  • Customer support: surface call topics and follow-ups from recorded calls
  • Journalist workflows: drop interview audio in, get back chapters, quotes, and a searchable archive

FAQ

Frequently asked questions

What's the difference between the HTTP API and the MCP server?

The HTTP API is for any backend or script that can make Bearer-authenticated HTTP calls. The MCP server is for LLM agents (Claude Desktop, ChatGPT Custom GPTs, Cursor, and other MCP-capable clients) that want transcription and library tools surfaced as native MCP actions. Same engine, different surface.

How is it priced?

Same per-minute rate as the consumer dashboard. Wallet-funded, pay only for what you transcribe, no monthly commit. No file-size caps; presigned uploads support files up to 500 MB. Multilingual workloads are not separately taxed.

Is there a free trial?

Free trial credits on signup. Enough to transcribe roughly fifteen minutes on Qwen3-ASR-Flash. No card required. Top up the wallet from the billing page when you're ready to scale.

Can I call it from the browser or serverless runtimes?

Yes. Bearer auth means no cookies and no CSRF. CORS is open on every /api/v1/ endpoint. Webhooks remove the need for long-running polling. Works from Cloudflare Workers, Lambda, Vercel, edge functions, n8n, and anything that can make an HTTP call.

Which languages are supported?

67 languages with FLEURS-measured accuracy per language and per pipeline. Set language: 'auto' to let the engine detect and route, or specify a pipeline to lock in a model.

How do I use it with AI agents?

Three options. (1) Add the MCP server to Claude Desktop or a ChatGPT Custom GPT and the transcription tools appear natively. (2) Use Bearer-authenticated HTTP from any agent framework. (3) Use the public ChatGPT app or Claude connector and the engine works out of the box.

What happens when my wallet runs out?

POST /api/v1/transcriptions returns 402 insufficient_funds with a doc_url pointing to the billing page. In-flight jobs complete normally. Top up and retry. Idempotency keys prevent duplicate submissions.

How do I verify webhook signatures?

The X-Transcribe-Signature header carries t=<unix-seconds>,v1=<hmac_sha256(secret, '${t}.${rawBody}')>. Verify against the raw request body. TypeScript and Python examples at /developers/docs#webhooks.
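The header format above can be verified with the standard library alone. A sketch; the 5-minute staleness tolerance is a conventional choice, not something the docs specify:

```python
import hashlib
import hmac
import time

# Verify an X-Transcribe-Signature header of the form:
#   t=<unix-seconds>,v1=<hmac_sha256(secret, f"{t}.{raw_body}")>
def verify_signature(header: str, raw_body: bytes, secret: str, tolerance: int = 300) -> bool:
    parts = dict(p.split("=", 1) for p in header.split(","))
    t, received = parts["t"], parts["v1"]
    if abs(time.time() - int(t)) > tolerance:
        return False  # stale timestamp: basic replay protection
    expected = hmac.new(
        secret.encode(), f"{t}.".encode() + raw_body, hashlib.sha256
    ).hexdigest()
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(expected, received)

# Signing side (what the sender does), useful for testing the verifier:
def sign(raw_body: bytes, secret: str) -> str:
    t = str(int(time.time()))
    sig = hmac.new(secret.encode(), f"{t}.".encode() + raw_body, hashlib.sha256).hexdigest()
    return f"t={t},v1={sig}"
```

Always verify against the raw request bytes, not a re-serialized JSON body: any reformatting changes the HMAC.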

What are the file size and length limits?

500 MB per file via presigned upload. No length cap; long audio is chunked internally. A 4-hour podcast transcribes in 4–8 minutes wall time on Qwen3-ASR-Flash.

Want a deeper comparison? Read the launch announcement.

Ship it today.

Create a key, paste it into your script, transcribe in a minute. Or add the MCP server to Claude Desktop or your ChatGPT GPT. Per-key spend visibility and webhook configuration in the dashboard.