transcribe.so API · v1

API Reference

Everything you need to call the transcribe.so API. Bearer auth, three input sources (same as the dashboard's transcribe form), webhook delivery, errors, idempotency. If you've used Stripe's API the patterns will feel familiar.

Quick start

One curl call from a YouTube URL to a queued transcription:

bash
curl -X POST https://transcribe.so/api/v1/transcriptions \
  -H "Authorization: Bearer tsk_live_REPLACE_ME" \
  -H "Content-Type: application/json" \
  -d '{
    "source": "youtube",
    "url": "https://youtu.be/dQw4w9WgXcQ",
    "pipeline_code": "qwen3-asr-flash-filetrans"
  }'

You'll get an integer id back. Poll GET /api/v1/transcriptions/<id> until status is completed, then fetch /result. Or register a webhook (see below) and we'll POST you when it's done.
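If you do poll, a small helper keeps the loop bounded. A minimal sketch in Python; fetch is any callable that returns the parsed JSON of GET /api/v1/transcriptions/&lt;id&gt;, so you can inject a real HTTP call or a stub:

```python
import time

def poll_transcription(fetch, transcription_id, interval=5.0, max_wait=600.0):
    """Call fetch(transcription_id) until the job reaches a terminal
    status ("completed" or "failed") or max_wait seconds elapse."""
    deadline = time.monotonic() + max_wait
    while True:
        state = fetch(transcription_id)
        if state["status"] in ("completed", "failed"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"transcription {transcription_id} still {state['status']}")
        time.sleep(interval)
```

In production, fetch would be something like `lambda i: requests.get(f"https://transcribe.so/api/v1/transcriptions/{i}", headers=H).json()`.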

Conventions

  • Base URL: https://transcribe.so. Versioned prefix: /api/v1.
  • Auth: every request carries Authorization: Bearer tsk_live_….
  • Content type: JSON in/out, UTF-8.
  • Identifiers: 4821 (plain integer) for transcriptions, tsk_live_… for API keys.
  • Every response includes X-Request-Id. Quote it in support tickets.
  • Rate limit: 60 requests / minute per key. Exceeded → 429 rate_limited.
  • CORS: every /api/v1/* endpoint is open to any origin. Bearer auth, no cookies.
  • Pricing: identical to /pricing. Wallet drains monthly credit first, then top-up balance.

Authentication

The API uses Bearer tokens — no OAuth, no JWT, no cookies. Treat a key like a password.

Get a key

  1. Sign in and visit /settings/api-keys.
  2. Click Create key, give it a name (e.g. n8n-prod).
  3. Copy the plaintext immediately — we show it once and never again. The server only stores sha256(key).

Smoke test

bash
curl -sS https://transcribe.so/api/v1/me \
  -H "Authorization: Bearer $TRANSCRIBE_API_KEY"

Returns the authenticated user, current wallet balance, and remaining monthly credit.

Limits

  • 20 active keys per user
  • 60 requests / minute per key
  • Wallet itself is the spend cap; no separate per-key cap in v1

Three input sources

POST /api/v1/transcriptions accepts the same three sources as the dashboard's /transcriptions page. All three go through the same quote → wallet hold → enqueue path.

  • youtube · Public YouTube URL. duration_seconds not needed; we probe the video.
  • external_url · Direct audio/video URL on a public host. duration_seconds optional; pass it when known to skip a probe round-trip.
  • upload · File on your machine; no public URL. duration_seconds required; S3 isn't probed from the API.

Endpoints

GET /api/v1/me

The authenticated user, wallet, tier, and a self-discovering links map.

200 response

json
{
  "user_id": "49bf19f6-…",
  "email": "you@example.com",
  "wallet_balance_usd": 97.55,
  "monthly_credit_remaining_usd": 0.10,
  "subscription_tier": "free",
  "links": {
    "dashboard":  "https://transcribe.so/transcriptions",
    "api_keys":   "https://transcribe.so/settings/api-keys",
    "billing":    "https://transcribe.so/billing",
    "docs":       "https://transcribe.so/developers/docs",
    "support":    "https://transcribe.so/contact"
  }
}

The links map gives your client a stable spot to surface "manage your key" / "top up" / "see docs" actions without hardcoding URLs.

GET /api/v1/pipelines

Available models with current per-minute pricing — same rates as the dashboard's /pricing page.

200 response

json
{
  "pipelines": [
    {
      "code": "qwen3-asr-flash-filetrans",
      "name": "Qwen3 ASR Flash",
      "retail_usd_per_min": 0.0362,
      "retail_usd_per_hour": 2.17,
      "supported_languages": ["en", "zh", "es", …],
      "word_timestamp_languages": ["en", "zh", …],
      "timestamp_options": ["sentence", "word"]
    }
  ]
}

POST /api/v1/uploads

Step 1 of the upload flow. Returns a short-lived presigned S3 PUT URL.

Body

json
{
  "filename": "podcast.mp3",
  "content_type": "audio/mpeg",
  "file_size": 8421120
}
  • Allowed content_type: any audio/* or video/* mime — audio/mpeg, audio/wav, audio/mp4, audio/x-m4a, audio/aac, audio/ogg, audio/webm, audio/flac, video/mp4, video/webm, video/quicktime, video/x-msvideo.
  • Max file_size: 500 MB.

200 response

json
{
  "upload_id": "user/<uuid>/uploads/1777458021_abe2ea44.mp3",
  "upload_url": "https://s3.transcribe.so/...",
  "expires_in": 900
}

Then PUT the raw file body to upload_url with the same Content-Type header. URL expires in 900s.

For files over ~50 MB or unstable networks, prefer the resumable variant below.

POST /api/v1/uploads/tus

Resumable upload via tusd. Returns a tusd endpoint URL plus a short-lived HMAC ticket. The client uploads with any tus 1.0 client; tusd writes to S3 chunk by chunk and resumes on network drops.

Body

json
{
  "filename": "podcast.mp3",
  "file_size": 187654321
}
  • Max file_size: 500 MB. The token is bound to this size; tusd rejects uploads that exceed it.
  • No content_type needed at this step. The worker sniffs the file type when it processes the upload.

200 response

json
{
  "upload_endpoint": "https://upload.transcribe.so/files/",
  "upload_token": "eyJ1IjoiYWY5...",
  "upload_metadata_key": "upload-token",
  "expires_in": 3600,
  "max_file_size": 524288000
}

Use any tus 1.0 client. Recommended: tus-js-client (browser + Node) and tus-py-client (Python). Put upload_token in Upload-Metadata under upload_metadata_key.
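The only non-obvious wiring is where the ticket goes. A minimal sketch, assuming the response shape above; the filename metadata entry is a common tus convention rather than something this API documents, and the tus-py-client calls in the comment are illustrative:

```python
def tus_metadata(ticket: dict, filename: str) -> dict:
    """Build the Upload-Metadata pairs for a tus client from the
    POST /api/v1/uploads/tus response.  The response names the metadata key
    the token must travel under (upload_metadata_key), so read it from the
    ticket instead of hardcoding "upload-token"."""
    return {
        ticket["upload_metadata_key"]: ticket["upload_token"],
        "filename": filename,  # common tus convention; an assumption, not documented here
    }

# With tus-py-client (pip install tuspy), roughly:
#   from tusclient import client
#   tus = client.TusClient(ticket["upload_endpoint"])
#   uploader = tus.uploader("podcast.mp3",
#                           metadata=tus_metadata(ticket, "podcast.mp3"))
#   uploader.upload()
```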

After the upload finishes

Tusd's Location header has the form <endpoint>/<id>+<resume-token>. Pass upload_id = "tus/<id>+<resume-token>" (or tus/<id> alone — the server normalizes) to POST /api/v1/transcriptions with source: "upload" and duration_seconds. The same quote → wallet-hold → enqueue path the presigned-PUT flow uses.
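The normalization described above is a one-liner; a sketch:

```python
def upload_id_from_location(location: str) -> str:
    """Turn tusd's Location header (<endpoint>/<id>+<resume-token>) into the
    upload_id the transcriptions endpoint expects ("tus/<id>+<resume-token>").
    A Location without the +<resume-token> suffix works too, since the server
    accepts "tus/<id>" alone."""
    return "tus/" + location.rstrip("/").rsplit("/", 1)[-1]
```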

See the resumable upload recipe for a working end-to-end example.

POST /api/v1/transcriptions → 202

Submit a transcription. Three source modes; same dance the dashboard does.

Body — youtube

json
{
  "source": "youtube",
  "url": "https://youtu.be/dQw4w9WgXcQ",
  "pipeline_code": "qwen3-asr-flash-filetrans",
  "language": "auto"
}

Body — external_url

json
{
  "source": "external_url",
  "url": "https://example.com/podcast.mp3",
  "pipeline_code": "qwen3-asr-flash-filetrans",
  "language": "auto",
  "duration_seconds": 1234
}

Body — upload

json
{
  "source": "upload",
  "upload_id": "user/<uuid>/uploads/...mp3",
  "original_filename": "podcast.mp3",
  "duration_seconds": 1234,
  "pipeline_code": "qwen3-asr-flash-filetrans",
  "language": "auto"
}

202 response

json
{
  "id": 4821,
  "status": "processing",
  "stage": "queued",
  "pipeline_code": "qwen3-asr-flash-filetrans",
  "language": "auto",
  "source": "upload",
  "upload_id": "user/...",
  "duration_seconds": 1234,
  "billed_minutes": 20.6,
  "retail_usd": 0.7457
}

For youtube and external_url, the response carries url instead of upload_id.

Send Idempotency-Key on retries (see below).

GET /api/v1/transcriptions

List your transcriptions, newest first. Cursor-paginated.

Query

  • limit — 1–200, default 50
  • cursor — ISO timestamp of the last item from the previous page
  • api_only=true — filter to API-originated jobs
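Those three parameters are enough to walk the whole list. A sketch, assuming each item carries a created_at ISO timestamp to feed back as the next cursor (the field name is an assumption; check the objects your account returns) and that fetch_page wraps the HTTP call and unwraps each page into a list:

```python
def list_all(fetch_page, limit=200):
    """Drain the cursor pagination of GET /api/v1/transcriptions.
    fetch_page(limit, cursor) returns one page (newest first) as a list of
    dicts; pass None as the first cursor.  A short page means we're done."""
    items, cursor = [], None
    while True:
        page = fetch_page(limit, cursor)
        items.extend(page)
        if len(page) < limit:
            return items
        cursor = page[-1]["created_at"]  # assumed timestamp field on each item
```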

GET /api/v1/transcriptions/:id

Single transcription metadata + status.

GET /api/v1/transcriptions/:id/result

Full result body. Only meaningful once status === completed.

json
{
  "id": 4821,
  "status": "completed",
  "segments":  [{ "start_seconds": 0.0, "end_seconds": 4.21, "text": "..." }],
  "chapters":  [{ "start_seconds": 0.0, "end_seconds": 145.6, "title": "...", "summary": "..." }],
  "sections": [{ "section_index": 0, "title": "...", "summary": "...", "start_seconds": 0.0 }],
  "qna":       [{ "question": "...", "answer": "...", "citations": [...] }]
}

GET /api/v1/transcriptions/:id/timestamps

Platform-formatted chapter timestamps. Reads from the LLM-curated posting_chapters cache. Returns 409 not_ready if the requested style hasn't been generated — POST /timestamps/regenerate to populate it.

Required: ?format=<destination>. Each format encodes its rules (character budget, min spacing, first-must-be-0:00, HH:MM:SS) so you paste straight into the destination field.

json
{
  "format": "spotify",
  "style": "balanced",
  "text": "0:00 How Floga's pre-launch hit $100K\n1:20 Why your first hire defines culture\n4:15 ...",
  "char_count": 412,
  "items_used": 12,
  "items_total": 12,
  "truncated": false,
  "ok_to_paste": true,
  "warnings": [],
  "constraints": {
    "maxChars": 4000,
    "minChapters": 3,
    "firstMustBeZero": true,
    "minSpacingSeconds": 30,
    "titleCap": 40,
    "label": "Spotify"
  },
  "source": {
    "kind": "posting_chapters",
    "generated_at": "2026-05-09T12:34:56Z",
    "model": "qwen3.6-plus",
    "regen_count": 0,
    "available_styles": ["balanced", "punchy"]
  }
}

Query

  • format — required. v1 supports youtube, spotify, apple_podcasts, markdown, clip_ideas, show_notes, original.
  • style — optional. balanced (default), punchy, educational. Drives the LLM voice for the curated titles. Ignored for show_notes (sources from chapter summaries) and original (raw sections).

Format reference

  • youtube / spotify / apple_podcasts / markdown — standard 10–30-item chapter list, formatted per platform's first-party rules.
  • clip_ideas — the 3 most shareable moments, rendered as a YouTube-style chapter list (paste into video descriptions for clip lists).
  • show_notes — per-chapter Markdown summaries from transcript_chapters.summary. Always Markdown output, regardless of the requested style.
  • original — every raw transcript_section with its title and timestamp; bypasses LLM curation for max granularity.

409 not_ready envelope

When the requested style isn't cached, the error block includes available_styles so you can fall back to a free read instead of paying for a regenerate:

json
{
  "error": {
    "code": "not_ready",
    "message": "Posting chapters not generated for style=\"punchy\". Cached styles: balanced. Either retry with a cached style or POST /api/v1/transcriptions/4821/timestamps/regenerate to add this one.",
    "request_id": "req_…",
    "doc_url": "https://transcribe.so/developers/docs#endpoints",
    "available_styles": ["balanced"],
    "requested_style": "punchy"
  }
}

POST /api/v1/transcriptions/:id/timestamps/regenerate → 200

Re-runs the LLM curate+polish step. Use to change style, add a refine prompt, or generate posting_chapters for a transcription that predates this feature.

json
{
  "style": "punchy",
  "refine_prompt": "focus on the case studies"
}

Body

  • style — optional. balanced (default), punchy, educational.
  • refine_prompt — optional, ≤200 chars. Free-text steer for the LLM (e.g. "focus on case studies"). Validated for prompt-injection markers.

Latency: 30–90 seconds. Synchronous — the response body contains the freshly-generated chapters. Capped at 10 regenerations per transcription per user.

For long-running jobs in general (transcription itself, not regenerate), prefer webhook subscriptions over loop-polling.

DELETE /api/v1/transcriptions/:id

Permanently deletes the transcription, derived rows, and S3 objects.

POST /api/v1/transcriptions/:id/retry → 202

Restart a failed job. Charges run again from scratch.

POST /api/v1/quotes

Preview cost (and reserve a quoted row) without queueing. Same body shape as POST /transcriptions.

Errors

Every error response uses the same envelope:

json
{
  "error": {
    "code": "insufficient_funds",
    "message": "Wallet balance too low. Top up your wallet at https://transcribe.so/billing.",
    "request_id": "req_a1b2c3d4e5f6",
    "doc_url": "https://transcribe.so/billing"
  }
}
  • message inlines an actionable URL where one applies. Terminal users see the link without parsing JSON.
  • doc_url always points at a stable docs section or dashboard surface for that error.
  • request_id is also returned as X-Request-Id on every response — quote it in support tickets.

  • 401 unauthenticated · Missing Authorization header.
  • 401 invalid_api_key · Key malformed, unknown, revoked, or expired.
  • 404 not_found · Resource doesn't exist or isn't yours.
  • 400 invalid_request · Body / query / path parameter is missing or malformed.
  • 400 unsupported_pipeline · pipeline_code isn't recognized.
  • 402 insufficient_funds · Wallet + monthly credit can't cover the estimated charge.
  • 429 rate_limited · Per-key request rate exceeded (60/min).
  • 500 internal_error · Server bug; safe to retry with backoff. Quote request_id.

Retry guidance

  • 429: back off 60s.
  • 500: exponential backoff (1, 2, 4, 8s, max 60s), cap at 5 attempts. Use the same Idempotency-Key so duplicates don't bill twice.
  • 402: do not retry until the user tops up.
  • 400 / 401 / 404: don't retry; fix the request.
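That policy fits in a dozen lines. A sketch; send() performs one HTTP attempt (carrying the same Idempotency-Key each time) and returns (status_code, body):

```python
import time

def with_retries(send, max_attempts=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Retry per the guidance above: exponential backoff on 500s (1, 2, 4,
    8 s..., capped), a flat 60 s wait on 429, and no retry on any other
    non-5xx status (400/401/402/404 are the caller's problem)."""
    for attempt in range(max_attempts):
        status, body = send()
        if status < 500 and status != 429:
            return status, body
        if status == 429:
            sleep(60.0)
        else:
            sleep(min(base * (2 ** attempt), cap))
    return status, body  # exhausted attempts; hand the last response back
```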

Idempotency

POST endpoints accept an Idempotency-Key header. Use it on any request that creates or starts something, so retries don't double-bill or double-queue.

bash
POST /api/v1/transcriptions
Idempotency-Key: 2026-04-30-podcast-ep-149
  • First request runs normally. Subsequent requests with the same (api_key, idempotency_key) within 24h return the original response unchanged.
  • Reusing the same key with a different body returns 400 invalid_request.
  • 2xx and 4xx responses are cached; 5xx are not (so you can retry past transient bugs).
  • Max key length: 128 chars. Use a UUID, content hash, or stable composite — anything that doesn't change across retries of the same logical request.
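A content hash makes a good key because it is automatically stable across retries of the same logical request. A sketch; the tr- prefix is arbitrary:

```python
import hashlib, json

def idempotency_key(body: dict, prefix: str = "tr") -> str:
    """Derive a stable Idempotency-Key from the request body itself.
    Canonical JSON (sorted keys, no whitespace) keeps the hash independent
    of dict ordering; the result is well under the 128-char limit."""
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    ).hexdigest()
    return f"{prefix}-{digest}"
```

Note the flip side: because the key is derived from the body, submitting the same file twice on purpose requires salting the body (or the prefix) with something unique, e.g. a date.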

Async patterns (don't poll)

Transcriptions are async (~60s for 1-min audio, ~5min for an hour-long podcast). For long-running ops, ranked best-to-worst:

  1. Webhook — best for fire-and-forget pipelines. Register one URL per API key; we deliver an HMAC-signed transcription.completed POST when the job hits a terminal state. No connections held open, no rate-limit pressure, scales to any volume. See Webhooks below for setup.
  2. Long-poll /wait — best for synchronous "create-and-wait" flows where you can hold one HTTP connection. Server holds the response open up to timeout seconds (max 45) and returns as soon as the job finishes. One request per job.
    bash
    curl -H "Authorization: Bearer $TRANSCRIBE_API_KEY" \
      "https://transcribe.so/api/v1/transcriptions/4821/wait?timeout=30&include=chapters,sections,qna"
  3. Email notification — falls out automatically. Every user gets an email when their job completes. No code required.
  4. Loop-polling GET /transcriptions/:id — don't. Naive polling every few seconds wastes tokens, eats rate limit, and gives you no faster signal than the long-poll. If your runtime can't hold a connection, use the webhook.

The synchronous endpoints (POST /timestamps/regenerate, POST /transcriptions) intentionally block until they have something to return; you don't poll those — you await the single response.

Webhooks

Get a signed POST when a transcription finishes — no polling. One webhook per API key.

Events

  • transcription.completed — your transcription reached status: completed.
  • transcription.failed — your transcription reached status: failed.
  • webhook.test — you called POST /api/v1/webhooks/test.

Register

bash
curl -X POST https://transcribe.so/api/v1/webhooks \
  -H "Authorization: Bearer $TRANSCRIBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com/transcribe-so/webhook" }'

Returns the webhook id and a one-time signing_secret. Store it — we never show it again. You can also register a webhook from the dashboard at /settings/api-keys.

Payload

json
{
  "id": "evt_1234",
  "event": "transcription.completed",
  "created": 1777472458,
  "data": {
    "transcription": {
      "id": 4821,
      "status": "completed",
      "stage": "completed",
      "pipeline_code": "qwen3-asr-flash-filetrans",
      "language": "auto",
      "source": "upload",
      "title": "podcast.mp3",
      "duration_seconds": 60,
      "billed_seconds": 60,
      "charge_usd": 0.03,
      "completed_at": "2026-04-29T14:25:27.968Z"
    }
  }
}

Fetch the full result (segments, chapters, sections, qna) via GET /api/v1/transcriptions/:id/result — we don't push the full body inline because it can be large.

Verify the signature

Every delivery carries X-Transcribe-Signature: t=<unix-seconds>,v1=<hex>. The v1 value is hex(hmac_sha256(signing_secret, "<t>.<raw body>")): the timestamp, a literal dot, then the raw request body. Verify against the raw body (re-serializing JSON breaks the HMAC).

typescript
import { createHmac, timingSafeEqual } from "crypto";

function verify(rawBody: string, header: string, secret: string): boolean {
  const m = header.match(/t=(\d+),v1=([0-9a-f]+)/);
  if (!m) return false;
  const [, t, v1] = m;
  // Reject if more than 5 minutes off (replay protection).
  if (Math.abs(Math.floor(Date.now() / 1000) - Number(t)) > 300) return false;
  const expected = createHmac("sha256", secret).update(`${t}.${rawBody}`).digest("hex");
  return expected.length === v1.length &&
    timingSafeEqual(Buffer.from(expected, "utf8"), Buffer.from(v1, "utf8"));
}
python
import hmac, hashlib, re, time

def verify(raw_body: bytes, header: str, secret: str) -> bool:
    m = re.match(r"t=(\d+),v1=([0-9a-f]+)", header)
    if not m: return False
    t, v1 = m.group(1), m.group(2)
    if abs(int(time.time()) - int(t)) > 300: return False
    expected = hmac.new(
        secret.encode(),
        f"{t}.{raw_body.decode()}".encode(),
        hashlib.sha256,
    ).hexdigest()
    return hmac.compare_digest(expected, v1)

Retry

We retry any non-2xx (or network failure) at 1m, 5m, 30m, 3h, 12h. Five attempts max, 10s HTTP timeout each. After 5 consecutive failures across deliveries, the webhook itself is auto-disabled — re-enable it from the dashboard once your endpoint is healthy.

Send a test event

bash
curl -X POST https://transcribe.so/api/v1/webhooks/test \
  -H "Authorization: Bearer $TRANSCRIBE_API_KEY"

Enqueues a synthetic webhook.test delivery — useful to confirm your URL is reachable and signature verification works before any real transcriptions run.
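You can also exercise your verifier entirely offline by producing the header yourself. A sketch that signs a body the way the header format above describes, with a made-up secret:

```python
import hmac, hashlib, time

def sign(raw_body: bytes, secret: str, t=None) -> str:
    """Produce an X-Transcribe-Signature-style header locally:
    v1 = hex hmac-sha256 over "<t>.<raw body>".  Feed the result into your
    verify() handler to test it before pointing a real webhook at it."""
    t = int(time.time()) if t is None else t
    v1 = hmac.new(secret.encode(), f"{t}.".encode() + raw_body, hashlib.sha256).hexdigest()
    return f"t={t},v1={v1}"

# e.g. sign(b'{"event":"webhook.test"}', "whsec_demo")
```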

Pricing

Same per-minute rates as the dashboard. Billed against your wallet — monthly credit first, then top-up balance. No separate API quota, no minimums.

  • GPT-4o Transcribe (timestamps + diarization) + AI analysis · gpt-4o-transcribe-diarize · $0.0538/min · $3.23/hr
  • Qwen3-ASR-Flash-Filetrans (timestamps) + AI analysis · qwen3-asr-flash-filetrans · $0.0176/min · $1.06/hr
  • Voxtral Mini Transcribe with Diarization + AI Analysis · voxtral-mini-transcribe · $0.0187/min · $1.12/hr
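For a rough client-side estimate before calling /quotes: the sample 202 response earlier (1234 s at $0.0362/min → billed_minutes 20.6, retail_usd 0.7457) suggests minutes are rounded to one decimal before multiplying. That rounding rule is inferred from the example, not documented; treat POST /api/v1/quotes as the authoritative number.

```python
def estimate_cost(duration_seconds: int, usd_per_min: float):
    """Client-side cost preview.  Rounds minutes to one decimal, then
    multiplies by the per-minute rate -- an inference from the sample 202
    response, not a documented billing guarantee."""
    billed_minutes = round(duration_seconds / 60, 1)
    return billed_minutes, round(billed_minutes * usd_per_min, 4)
```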

End-to-end walkthrough

Full upload flow with curl. The hardest path — YouTube and external URL skip steps 2-3.

For files over ~50 MB or unstable networks, see the resumable upload recipe instead. Same auth, same continuation step.

bash
# 0. Smoke test
curl -sS https://transcribe.so/api/v1/me \
  -H "Authorization: Bearer $TRANSCRIBE_API_KEY"

# 1. Get a presigned upload URL
SIZE=$(stat -f%z podcast.mp3 2>/dev/null || stat -c%s podcast.mp3)
PRESIGN=$(curl -sS -X POST https://transcribe.so/api/v1/uploads \
  -H "Authorization: Bearer $TRANSCRIBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{ \"filename\": \"podcast.mp3\", \"content_type\": \"audio/mpeg\", \"file_size\": $SIZE }")
UPLOAD_URL=$(echo "$PRESIGN" | jq -r .upload_url)
UPLOAD_ID=$(echo "$PRESIGN" | jq -r .upload_id)

# 2. PUT the file straight to S3
curl -sS -X PUT "$UPLOAD_URL" \
  -H "Content-Type: audio/mpeg" \
  --data-binary @podcast.mp3

# 3. Submit the transcription
DURATION=$(ffprobe -i podcast.mp3 -show_entries format=duration -v quiet -of csv="p=0" | cut -d'.' -f1)
JOB=$(curl -sS -X POST https://transcribe.so/api/v1/transcriptions \
  -H "Authorization: Bearer $TRANSCRIBE_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $(uuidgen)" \
  -d "{
    \"source\": \"upload\",
    \"upload_id\": \"$UPLOAD_ID\",
    \"original_filename\": \"podcast.mp3\",
    \"duration_seconds\": $DURATION,
    \"pipeline_code\": \"qwen3-asr-flash-filetrans\"
  }")
TR_ID=$(echo "$JOB" | jq -r .id)

# 4. Poll until done
while true; do
  STATE=$(curl -sS "https://transcribe.so/api/v1/transcriptions/$TR_ID" \
    -H "Authorization: Bearer $TRANSCRIBE_API_KEY")
  echo "$STATE" | jq -r '"\(.status) · \(.stage)"'
  S=$(echo "$STATE" | jq -r .status)
  [[ "$S" == "completed" || "$S" == "failed" ]] && break
  sleep 5
done

# 5. Pull the result
curl -sS "https://transcribe.so/api/v1/transcriptions/$TR_ID/result" \
  -H "Authorization: Bearer $TRANSCRIBE_API_KEY" | jq

Same flow in Python:

python
import os, time, requests

API = "https://transcribe.so/api/v1"
H = {"Authorization": f"Bearer {os.environ['TRANSCRIBE_API_KEY']}"}

with open("podcast.mp3", "rb") as f:
    body = f.read()

p = requests.post(f"{API}/uploads", headers=H, json={
    "filename": "podcast.mp3",
    "content_type": "audio/mpeg",
    "file_size": len(body),
}).json()

requests.put(p["upload_url"], data=body, headers={"Content-Type": "audio/mpeg"}).raise_for_status()

job = requests.post(f"{API}/transcriptions",
    headers={**H, "Idempotency-Key": "podcast-149"},
    json={
        "source": "upload",
        "upload_id": p["upload_id"],
        "original_filename": "podcast.mp3",
        "duration_seconds": 60,
        "pipeline_code": "qwen3-asr-flash-filetrans",
    },
).json()

while True:
    state = requests.get(f"{API}/transcriptions/{job['id']}", headers=H).json()
    if state["status"] in ("completed", "failed"):
        break
    time.sleep(3)

result = requests.get(f"{API}/transcriptions/{job['id']}/result", headers=H).json()
print(f"segments={len(result['segments'])} chapters={len(result['chapters'])} sections={len(result['sections'])}")

Common failure modes

  • 401 unauthenticated on every call · Missing Authorization header. · Add -H "Authorization: Bearer $KEY".
  • 401 invalid_api_key · Key revoked, expired, or typo. · Recreate at /settings/api-keys.
  • 400 invalid_request: duration_seconds (>0)… · Forgot duration_seconds on source=upload. · Probe with ffprobe; pass it.
  • S3 PUT 403 · Presigned URL expired (900s). · Re-call POST /uploads, PUT promptly.
  • 402 insufficient_funds · Wallet < estimated charge. · Top up via the dashboard, or pick a cheaper pipeline.
  • 429 rate_limited · Exceeded 60 req/min on this key. · Back off 60s; consider a second key for parallel pipelines.

Ready to ship?

Create a key, paste it into your script, and you're transcribing inside a minute.