Turn lectures, podcasts, and videos into a searchable library. In any language.

Paste a YouTube link or upload audio. Ask anything across your whole library and jump to the exact second they said it. Cited answers, chapters, and export-ready subtitles included.

A simple HTTP API lets you plug the same engine into AI agents, automations, meeting bots, and your own apps. Read the API docs · Try it on ChatGPT

No credit card required. Pay only for what you use. 67 languages.

Try this real transcript
44 Harsh Truths About The Game Of Life - Naval Ravikant (4K)
Chris Williamson
Contents
8 chapters · 513 sections
1. Happiness Versus Success: Philosophical Reflections on Contentment, Desire, and Motivation
2. Optimizing Sleep: Smart Temperature Regulation and the Foundations of Self-Esteem
3. Decisive Action and Iterative Practice: Keys to Optimal Choices and Mastery
4. Wealth Management: From Materialism to Value Creation and Fair Compensation
5. Evaluating LLMs: Capabilities, Limitations, and Their Role in AI's Evolving Landscape
6. Pathogens, Evolution, and Knowledge: How Humans Adapt and Defend
7. Agency, Power, and the Individual: From Child Development to Cultural Conflict
8. Unseen Trends: Media Oversights, Medical Limitations, and the Primitive State of Modern Biology
Q&A preview
Answer
Naval explains two distinct paths to happiness using the story of Alexander and Diogenes. The first path is through success—conquering the world, satisfying material needs, and getting what you want. The second path, exemplified by Diogenes living in a barrel, is simply not wanting in the first place. As Socrates said when shown luxuries: 'How many things there are in this world that I do not want.' Naval suggests not wanting something is as good as having it—both paths lead to the same destination of contentment [00:38–01:10]. He's not sure which path is more valid, noting it depends on how you define success [01:10–01:25].


Powered by

OpenAI · Qwen · Mistral
Works with
YouTube
Google Meet
Zoom
Microsoft Teams
Loom
Voice Memos
Video files
Audio files
Export to
CapCut
Final Cut Pro
Premiere Pro
DaVinci Resolve
Copy to
Notion
Apple Notes
Google Keep
OneNote
Evernote
Obsidian
WhatsApp
Slack
Telegram

Supports MP4, MOV, WebM, MP3, WAV, M4A, AAC, FLAC, OGG, and more.

YouTube to text · Speech to text · Audio to text · Video to text · Voice note to text · Google Meet to text · Loom to text · Lecture video to notes · Subtitle generator · Searchable transcripts

How it works

Three steps from a recording to a searchable library

Real screenshots from a real transcript. No mockups, no marketing fluff. This is what you get.

  1. Drop in your audio

    Paste a link or upload a file. Anything you record, download, or screen-capture works.

    Paste a YouTube URL and pick the best speech-to-text model for your language
  2. Get a structured transcript

    Every transcript comes back with chapters, topic summaries, and timestamps so you can jump straight to what matters.

    Auto-generated chapters and topic summaries with timestamps
  3. Ask anything across your library

    Ask a question and get a cited answer with a timestamp. Search across every transcript in your library. Export subtitles if you need them.

    AI Q&A answer with inline timestamps and cited topic cards

No credit card required. Pay only for what you use.

The real cost of long videos

You remember the answer is in there. You just can't find it.

Long lectures and podcasts hold the exact moment you need. The summary won't show it. The timeline won't either. So you scrub, overshoot, and start the video over.

And when the video is in a language your tool barely understands, the transcript is wrong before you even start looking.

Per long video

3-hour lecture you need to study

+ 20–40 min scrubbing to find one explanation

+ Replaying the same 90-second clip three times

+ A summary that skipped the part you wanted

= You learned less than the hours you put in

Per week

5 long videos to get through

+ 20–30 min hunting for moments in each

= 1.5–2.5 hours of study time gone every week

Library, not a one-off transcript

Build a knowledge library you can actually search

Most tools give you one transcript at a time. transcribe.so turns every lecture, podcast, and video into a searchable, askable library that grows with you.

One searchable library across everything

Every YouTube link, lecture, and podcast joins one library. Find a quote across hours of content in seconds.

Ask anything. Jump to the second they said it.

Ask a question and get a cited answer with a timestamp. Click the citation to jump straight to the moment in playback.

Works in any language, automatically

67 languages with measured accuracy per language. We pick the right speech-to-text engine for you, so you focus on studying.

That's roughly 1.5–2.5 hours of study time back every week.

No credit card required. Pay only for what you use.

For power users

Under the hood: the speech-to-text engines we route to

You don't need to pick a model. We route each file to the right engine for your language automatically. If you want to override the default, here's what powers your library: GPT-4o, Qwen3-ASR-Flash, and Voxtral. Chapters, library search, cited Q&A, subtitles, and exports work the same way across all of them.

Premium
GPT-4o Transcribe Diarize
Best-in-class diarization with built-in speaker labels
Built-in speaker identification (who said what)
58 languages, sentence timestamps
Hosted by OpenAI for enterprise reliability
OpenAI
Top-Tier
Qwen3-ASR-Flash
Leaderboard-leading accuracy with word-level timestamps
#1 on the Hugging Face Open ASR Leaderboard (4.25% avg WER)
33 languages, word timestamps (10 langs)
Emotion detection, long-form audio
Alibaba Qwen3
New
Voxtral Mini Transcribe
Word-level timestamps with speaker labels
Word-level timestamps in 13 languages
Speaker labels & context biasing
13 languages, lowest cost per minute
Mistral AI
Search backbone
Semantic Search & AI Q&A
Powers search by meaning and AI Q&A across every transcript, no matter which ASR model produced it.
Hybrid retrieval with second-stage reranking
Citation-grounded answers with timestamps
Find moments by meaning, not just keywords
Frontier embedding + LLM stack

A note from the maker

Hey, I'm Seunghun 👋

In 2023 I left Spotify to work on the problem of finding the useful 90 seconds inside a three-hour podcast. We built goodlisten.co, ran out of runway, and I went back to a desk job.

But I kept needing it myself. English was the easy part. The audio I actually cared about was harder: Korean podcasts where the host slips into English, Japanese conversations with three speakers, Spanish lectures recorded in noisy rooms. I was tired of spending two hours just to find the two minutes that mattered.

So in 2025 I stopped trying to build for “the market” and built the tool I wished existed, for one very specific user: me. If it saves you time, tell me. If it doesn't, tell me directly. That's how it gets better.

Seunghun

Who it's for

Built for learners. With an API for builders.

Whether you're studying from a 3-hour lecture, a foreign-language podcast, or building an AI agent that needs to listen, the same engine handles it.

Students and lifelong learners

Turn every lecture, podcast, and video you study into a searchable library. Get cited answers tied to the exact second they were said.

Learners studying in any language

Korean MOOCs, Japanese podcasts, Spanish talks, or English lectures. We pick the right speech-to-text engine for each of 67 languages, with measured accuracy.

Developers building AI apps

One HTTP API plus a Claude and ChatGPT MCP surface. Plug the same engine into agents, automations, voice memo apps, and meeting bots.

No credit card required. Pay only for what you use.

Use it in your app

One HTTP API. Plus an MCP server for Claude and ChatGPT.

Same engine that powers your library. Chapters, library search, cited Q&A, subtitles, exports. Hit it from any agent, video tool, meeting bot, or voice app.

One curl, full pipeline

curl https://transcribe.so/api/v1/transcriptions \
  -H "Authorization: Bearer tsk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "source": "youtube",
    "url": "https://youtu.be/dQw4w9WgXcQ",
    "pipeline_code": "qwen3-asr-flash-filetrans"
  }'

YouTube, file upload via presigned S3, or any direct audio URL. Same response shape.
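The same request from Python, for anyone wiring this into an agent or backend without shelling out to curl. This sketch mirrors the curl example above exactly; the endpoint, headers, and body fields are copied from it, and nothing about the response format is assumed here.

```python
import json
import urllib.request

API_URL = "https://transcribe.so/api/v1/transcriptions"

def build_request(api_key: str, youtube_url: str,
                  pipeline: str = "qwen3-asr-flash-filetrans") -> urllib.request.Request:
    """Build the same POST request as the curl example above."""
    payload = {
        "source": "youtube",
        "url": youtube_url,
        "pipeline_code": pipeline,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Send it with: urllib.request.urlopen(build_request("tsk_live_...", url))
```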

  • AI agents

    Drop a transcript into your agent's context. Claude, ChatGPT, Cursor, anything that calls HTTP.

  • Video editors and tools

    Word-level timestamps, burn-in captions, SRT/VTT export. Same engine as the dashboard.

  • Meeting bots and call platforms

    Transcribe Zoom, Twilio, or any recording the moment a call ends. Webhook fires when ready.

  • Voice memos, podcasts, language apps

    67 languages with measured accuracy per language. Auto-detect or pin a specific code per request.

Use cases

From transcription to something actually useful

Whether you are publishing, editing, researching, or learning, transcribe.so helps you get usable output from long-form content faster.

Podcast and interview transcription

Search long conversations, find strong quotes, and jump straight to important moments with chapters, citations, and playback.

Subtitle creation for videos

Generate subtitles that are easier to use in your editing workflow, with more control than rough auto-captions.

Learning from YouTube and lectures

Turn long videos into structured content with chapters, cited answers, and searchable playback so you can study faster.

Meeting and recording review

Upload calls, notes, or voice recordings and quickly find decisions, highlights, and follow-up moments without re-listening to everything.

No credit card required. Pay only for what you use.

FAQ

Before you try transcribe.so

Start your library with one link.

Paste any lecture, podcast, or video. Free credits to start. See the exact cost before you confirm.

No credit card required. Pay only for what you use.

Keep scrolling for details

Product features in depth

What you get with every transcript

Every upload joins your searchable library. You get chapters, cited Q&A with timestamps, library-wide search, subtitles for any editor, and full exports. The right engine for your language is picked automatically.

In the box

Library-wide search across every transcript you own
Cited Q&A with timestamps. Click to jump to the second they said it
AI-generated chapters and section detection
AI summary and key takeaways
Speaker labels on multi-speaker audio
Entity extraction (people, places, brands)
67 languages with measured accuracy per language
Subtitle exports (SRT, WebVTT, karaoke VTT, JSON)
Encrypted Cloudflare R2 storage. Your audio is never used for training
Power-user toggle: override the default engine (GPT-4o, Qwen3-ASR-Flash, or Voxtral)

No credit card required. Pay only for what you use.

Subtitles

Subtitles ready for any editor

Every transcript ships with word-level timestamps, formatted as SRT or WebVTT for CapCut, Premiere Pro, DaVinci Resolve, or Final Cut Pro. Pick a platform preset or tune every parameter.

Platform Presets

One-click presets tuned for each platform's readability standards. Each preset controls characters per line, max lines, reading speed (CPS), timing gaps, and more.

YouTube
Long-form captions optimized for readability
20 CPS · 2 lines
TikTok / Shorts
Short, punchy single-line captions
20 CPS · 1 line
Netflix-style
Professional broadcast with strict reading speed
17 CPS · 2 lines
Podcast
Longer segments with speaker labels
15 CPS · 2 lines
Broadcast / TV
Traditional broadcast standards
15 CPS · 2 lines
Custom
Full control over every parameter
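To make the preset numbers concrete: CPS (characters per second) is a cue's visible character count divided by its on-screen duration. Here is a minimal sketch of that check, using CPS and line limits from the table above; the function itself is illustrative, not the product's implementation.

```python
# Reading-speed and line limits taken from the preset table above.
PRESETS = {
    "youtube": {"max_cps": 20, "max_lines": 2},
    "netflix": {"max_cps": 17, "max_lines": 2},
    "podcast": {"max_cps": 15, "max_lines": 2},
}

def cue_ok(text: str, start: float, end: float, preset: str) -> bool:
    """True if the cue fits the preset's reading speed and line count."""
    p = PRESETS[preset]
    lines = text.split("\n")
    # CPS = total visible characters / seconds on screen
    cps = sum(len(line) for line in lines) / (end - start)
    return cps <= p["max_cps"] and len(lines) <= p["max_lines"]
```

For example, a two-line cue of 22 characters shown for 2 seconds reads at 11 CPS, well inside the YouTube preset; the same text crammed into half a second would fail.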

Export Formats

Export in the format your video editor needs. SRT and WebVTT import directly into CapCut, Premiere Pro, DaVinci Resolve, and Final Cut Pro.

SRT
CapCut, Premiere Pro, DaVinci Resolve, Final Cut Pro & more
WebVTT
Web players, CapCut, and editors with styling support
Karaoke VTT
Word-by-word highlight timing
JSON
Full data with word timestamps

Powered by Word-Level Timestamps

Unlike simple text-splitting tools, our subtitle engine uses precise word-level timestamps from your transcription to build optimally timed cues.

Line breaks chosen for readability, not character count
Smart line breaking at natural pauses
CPS-aware reading speed optimization
Automatic gap and duration enforcement
Speaker label support for multi-speaker content
Live preview before export
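The grouping idea above can be sketched in a few lines. This is a toy illustration, not the product's engine: it folds (word, start, end) tuples into SRT cues, starting a new cue when the pause before a word exceeds a gap threshold or a character budget is hit. The thresholds and the tuple shape are assumptions for the example.

```python
def to_srt(words, max_chars=40, gap=0.8):
    """Group (word, start, end) tuples into SRT-formatted cues.

    A new cue starts when adding a word would exceed max_chars,
    or when the silence before it is longer than `gap` seconds.
    """
    def ts(t):  # SRT timestamp: HH:MM:SS,mmm
        h, rem = divmod(int(t * 1000), 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    cues, cur = [], []
    for w, start, end in words:
        if cur and (len(" ".join(x[0] for x in cur)) + 1 + len(w) > max_chars
                    or start - cur[-1][2] > gap):
            cues.append(cur)
            cur = []
        cur.append((w, start, end))
    if cur:
        cues.append(cur)

    # Each cue: index, "start --> end" line, then the text.
    return "\n".join(
        f"{i}\n{ts(c[0][1])} --> {ts(c[-1][2])}\n{' '.join(x[0] for x in c)}\n"
        for i, c in enumerate(cues, 1)
    )
```

Feeding it three words with a long pause before the last one yields two cues, each with start/end timestamps taken from its first and last word.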
Privacy First

Your private files stay private

Worried about uploading sensitive audio? Privacy is built into the platform from the bottom up.

Encrypted Storage

Your files are stored in private Cloudflare R2 buckets with time-limited access links. Only you can view your transcriptions.

Instant Deletion

Delete anytime. All data is instantly removed from our servers. No backups, no retention, completely gone.

Trusted Infrastructure

Inference and embeddings via trusted enterprise providers (OpenAI, Mistral, and partners). Storage on Cloudflare R2. No other third parties involved.

Your Data, Your Control

We don't use your content for AI training. Your transcriptions are private and never shared or made public.

Questions about privacy? Contact us

Export & Share

Copy or export anything you read

Export anything in Markdown. Chapters, search results, and Q&A history all carry timestamps that link back to the source.

Table of Contents
Chapters
Search Results
Q&A History
One-click copy · Markdown download · Playable YouTube links · Direct timestamps · Time ranges