Paatch Transcriber | API Documentation
Hackathon Edition

Transcription API with Speaker Diarization

Send an audio file, get back a full transcript with speaker identification. Powered by Pyannote (diarization) + Groq Whisper (transcription).

Bearer token auth
JSON, Text, Markdown, SRT
Sync & async modes

Base URL

url
https://transcript.paatch.ai/api/v1

Authentication

All requests require a Bearer token in the Authorization header. Get your API key from the hackathon organizers.

bash
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://transcript.paatch.ai/api/v1/health
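Scripts can attach the same header using only Python's standard library. This is a minimal sketch. `authed_request` and the `PAATCH_API_KEY` environment variable are hypothetical names chosen for illustration and are not part of the API.

```python
import os
import urllib.request
from typing import Optional

API_URL = "https://transcript.paatch.ai/api/v1"


def authed_request(path: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the Bearer token.

    Falls back to the PAATCH_API_KEY environment variable (hypothetical name)
    so the key does not end up in source control.
    """
    key = api_key or os.environ["PAATCH_API_KEY"]
    return urllib.request.Request(
        f"{API_URL}{path}",
        headers={"Authorization": f"Bearer {key}"},
    )
```

To send the request, call `urllib.request.urlopen(authed_request("/health"), timeout=30)`. For 401 or 403 responses, `urlopen` raises `urllib.error.HTTPError`.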

GET /health

Check API status and available services.

bash
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://transcript.paatch.ai/api/v1/health

Response:

json
{
  "status": "ok",
  "version": "1.1.0",
  "services": { "pyannote": true, "groq": true, "ffmpeg": true },
  "features": {
    "autoCompression": "Files > 25MB auto-compressed via ffmpeg",
    "webhookCallbacks": "POST results to your URL when async job completes",
    "speakerRenaming": "Map SPEAKER_00/01 to real names"
  },
  "rateLimit": "10 requests per hour",
  "formats": ["json", "text", "markdown", "srt"],
  "languages": ["fr", "en", "es", "de", "it", "pt", "nl", "ja", "ko", "zh"]
}
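A script can check this payload before it uploads a long recording. This sketch assumes the response shape shown above. The helper names are hypothetical.

```python
def unavailable_services(health: dict) -> list:
    """Names of backend services (pyannote, groq, ffmpeg) that are reported as down."""
    return [name for name, up in health.get("services", {}).items() if not up]


def ready_to_transcribe(health: dict) -> bool:
    """True when status is "ok" and no listed service is down."""
    return health.get("status") == "ok" and not unavailable_services(health)


def supports(health: dict, language: str, fmt: str) -> bool:
    """True if the payload lists both the language code and the output format."""
    return language in health.get("languages", []) and fmt in health.get("formats", [])
```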

POST /transcribe

Upload an audio file and get a full transcript with speaker diarization. Supports synchronous (wait for result) and asynchronous (poll for result) modes.

Parameters

Name | Type | Default | Description
--- | --- | --- | ---
file (required) | file | - | Audio file (WAV, MP3, M4A, OGG, FLAC, MP4). Max 100 MB.
language | string | fr | ISO 639-1 language code (fr, en, es, de, it, pt, nl, ja, ko, zh).
format | string | json | Output format: json, text, markdown, or srt.
async | boolean | false | Set to "true" for async mode. Returns a job ID to poll.
webhook | string | - | URL that receives a POST callback when the job completes (async mode). Retries up to 3 times with exponential backoff (2s, 4s, 8s).
compress | string | auto | Audio compression mode. "auto" compresses files > 25 MB via ffmpeg (mono 16 kHz MP3). "true" always compresses. "false" never compresses.
speakers | string | - | JSON object mapping speaker IDs to names. Example: {"SPEAKER_00": "Alice", "SPEAKER_01": "Bob"}. Applied to all output formats.
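Checking these parameters locally catches typos before starting an upload that can take several minutes. This is a sketch. `build_form_fields` is a hypothetical helper, and its allowed values are copied from the table above. Requiring async mode for `webhook` is an assumption, based on the table describing callbacks only for async jobs.

```python
import json

# Allowed values, copied from the parameter table above
LANGUAGES = {"fr", "en", "es", "de", "it", "pt", "nl", "ja", "ko", "zh"}
FORMATS = {"json", "text", "markdown", "srt"}
COMPRESS_MODES = {"auto", "true", "false"}


def build_form_fields(language="fr", fmt="json", async_mode=False,
                      webhook=None, compress="auto", speakers=None):
    """Validate options locally and return the /transcribe form fields as strings.

    The audio itself is sent separately as the multipart "file" part.
    """
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if compress not in COMPRESS_MODES:
        raise ValueError(f"compress must be one of {sorted(COMPRESS_MODES)}")
    fields = {"language": language, "format": fmt, "compress": compress}
    if async_mode:
        fields["async"] = "true"  # the table says to send the string "true"
    if webhook:
        # Assumption: a webhook without async mode is treated as a client-side mistake
        if not async_mode:
            raise ValueError("webhook requires async_mode=True")
        fields["webhook"] = webhook
    if speakers:
        # speakers travels as one JSON-encoded string, not as nested form fields
        fields["speakers"] = json.dumps(speakers)
    return fields
```

Pass the returned dict as `data=` to `requests.post`, and put the audio in `files={"file": ...}`.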

Synchronous mode (default)

Waits for the full pipeline to complete (typically 2-5 minutes) and returns the result directly.

cURL

bash
curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@recording.m4a" \
  -F "language=fr" \
  -F "format=json" \
  https://transcript.paatch.ai/api/v1/transcribe

Python

python
import requests

with open("recording.m4a", "rb") as audio:
    response = requests.post(
        "https://transcript.paatch.ai/api/v1/transcribe",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": audio},
        data={"language": "fr", "format": "json"},
        timeout=600,  # 10 min timeout for long audio
    )
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body

result = response.json()
print(f"Speakers: {result['speakers']}")
print(f"Turns: {result['turnCount']}")

for turn in result["turns"]:
    print(f"[{turn['speaker']}] {turn['text']}")

JavaScript / Node.js

javascript
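import fs from "node:fs";

// Node 18+ provides fetch, FormData and Blob globally. Save this file as an
// ES module (.mjs) so top-level await works.
// The built-in FormData expects a Blob, not a stream.
const form = new FormData();
form.append("file", new Blob([fs.readFileSync("recording.m4a")]), "recording.m4a");
form.append("language", "fr");
form.append("format", "json");

const response = await fetch("https://transcript.paatch.ai/api/v1/transcribe", {
  method: "POST",
  headers: { "Authorization": "Bearer YOUR_API_KEY" },
  body: form,
});
if (!response.ok) {
  throw new Error(`Transcription failed: HTTP ${response.status}`);
}

const result = await response.json();
console.log("Speakers:", result.speakers);
result.turns.forEach(t => console.log(`[${t.speaker}] ${t.text}`));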

JSON response:

json
{
  "status": "succeeded",
  "speakers": ["SPEAKER_00", "SPEAKER_01"],
  "duration": 1320.5,
  "wordCount": 3147,
  "turnCount": 360,
  "turns": [
    {
      "speaker": "SPEAKER_00",
      "start": 0.52,
      "end": 4.18,
      "text": "Bonjour, comment allez-vous aujourd'hui ?"
    },
    {
      "speaker": "SPEAKER_01",
      "start": 4.35,
      "end": 8.92,
      "text": "Très bien merci, on peut commencer quand vous voulez."
    }
  ],
  "fullText": "Bonjour, comment allez-vous aujourd'hui ? Très bien merci...",
  "compressed": false
}
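Each turn includes a speaker and start/end times, so basic speaker analytics need no extra request. This sketch treats `start` and `end` as seconds, which matches how the Quick Start example below formats `start` into minutes and seconds. `talk_time` and `talk_share` are illustrative names.

```python
from collections import defaultdict


def talk_time(result: dict) -> dict:
    """Seconds spoken per speaker, summed over result["turns"]."""
    totals = defaultdict(float)
    for turn in result["turns"]:
        totals[turn["speaker"]] += turn["end"] - turn["start"]
    return dict(totals)


def talk_share(result: dict) -> dict:
    """Fraction of total speaking time per speaker (silences between turns excluded)."""
    totals = talk_time(result)
    spoken = sum(totals.values())
    return {spk: secs / spoken for spk, secs in totals.items()} if spoken else {}
```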

Async mode

Returns immediately with a job ID. Poll GET /jobs/:jobId to check status.

bash
# Start async job
curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@recording.m4a" \
  -F "language=fr" \
  -F "async=true" \
  https://transcript.paatch.ai/api/v1/transcribe

# Response: { "jobId": "job_abc123", "status": "processing", "pollUrl": "/api/v1/jobs/job_abc123" }

# Poll for result
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://transcript.paatch.ai/api/v1/jobs/job_abc123

GET /jobs/:jobId

Check the status of an async transcription job. Jobs expire after 2 hours.

bash
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://transcript.paatch.ai/api/v1/jobs/job_abc123

Possible responses:

json
// Processing
{ "jobId": "job_abc123", "status": "processing", "progress": "Running speaker diarization..." }

// Succeeded
{ "jobId": "job_abc123", "status": "succeeded", "result": { /* same as sync response */ } }

// Failed
{ "jobId": "job_abc123", "status": "failed", "error": "Diarization failed: ..." }
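Below is a possible polling loop over these three states. `wait_for_job` is a hypothetical helper. `fetch_status` is a function you supply that performs an authenticated GET /jobs/:jobId and returns the parsed JSON, so the loop can be tested without network access. The default timeout matches the 2-hour job expiry. The documentation does not say whether polling counts toward the 10 requests/hour limit, so a webhook may be the safer choice for long jobs.

```python
import time


def wait_for_job(fetch_status, job_id, interval=15.0, timeout=2 * 3600, sleep=time.sleep):
    """Poll the job via fetch_status until it succeeds or fails.

    Returns the "result" object on success. Raises RuntimeError on failure and
    TimeoutError once `timeout` seconds of waiting have elapsed.
    """
    waited = 0.0
    while True:
        job = fetch_status(job_id)
        status = job.get("status")
        if status == "succeeded":
            return job["result"]
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed: {job.get('error')}")
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {status!r} after {waited:.0f}s")
        # Still "processing": show the progress message, then wait
        print(job.get("progress", status))
        sleep(interval)
        waited += interval
```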

Webhook Callbacks

In async mode, provide a webhook URL to receive results automatically when the job completes. The API will retry up to 3 times with exponential backoff (2s, 4s, 8s) if your endpoint is unavailable.

bash
curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@recording.m4a" \
  -F "language=fr" \
  -F "async=true" \
  -F "webhook=https://your-server.com/callback" \
  https://transcript.paatch.ai/api/v1/transcribe

Webhook payload (POST to your URL):

json
// On success
{
  "event": "transcription.completed",
  "jobId": "job_abc123",
  "status": "succeeded",
  "result": { /* same as sync JSON response */ },
  "timestamp": "2025-06-15T14:30:00.000Z"
}

// On failure
{
  "event": "transcription.failed",
  "jobId": "job_abc123",
  "status": "failed",
  "error": "Diarization failed: ...",
  "timestamp": "2025-06-15T14:30:00.000Z"
}

Retry policy: If your webhook endpoint returns a non-2xx status or is unreachable, the API retries up to 3 times with exponential backoff (2s → 4s → 8s). Each request has a 10-second timeout.
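Here is a minimal receiver built on Python's standard `http.server`. It is a sketch under several assumptions. The payload arrives as a JSON request body, as shown above. The documentation describes no signature header, so the sketch does not verify the sender. Because the API retries after timeouts and non-2xx responses, the same event may arrive more than once, so it is de-duplicated by `jobId`. `handle_event`, `WebhookHandler` and `serve` are illustrative names, and the in-memory store would need a real database outside a hackathon.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# jobId -> payload; also serves as an in-memory de-duplication record
processed_jobs = {}


def handle_event(payload: dict) -> bool:
    """Record a webhook payload once per jobId; return False for a repeat delivery."""
    job_id = payload.get("jobId")
    if job_id in processed_jobs:
        return False
    processed_jobs[job_id] = payload
    if payload.get("event") == "transcription.completed":
        print(f"{job_id}: {payload.get('result', {}).get('turnCount')} turns")
    else:
        print(f"{job_id} failed: {payload.get('error')}")
    return True


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except ValueError:  # invalid JSON or undecodable bytes
            payload = None
        if not isinstance(payload, dict):
            self.send_response(400)
            self.end_headers()
            return
        handle_event(payload)
        # Reply with a 2xx promptly: each delivery attempt times out after 10 seconds
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Listen on all interfaces and block forever."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Call `serve(8000)`, then pass the publicly reachable URL of that server as the `webhook` field.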

Auto-Compression (FFmpeg)

The Groq Whisper API has a 25 MB file size limit. To handle larger files seamlessly, the API includes built-in audio compression via FFmpeg.

"auto" (Recommended)

Default. Compresses only if the file exceeds 25 MB. No action needed for smaller files.

"true" (Force)

Always compress, regardless of file size. Useful for faster upload/processing.

"false" (Skip)

Never compress. Files over 25 MB will fail at the Groq transcription step.

Compression settings: mono, 16 kHz, 64 kbps MP3. The JSON response includes a compressed: true/false field.
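The compressed file must still fit under the same 25 MB limit, which caps the usable recording length. At 64 kbps, the MP3 grows by 64,000 / 8 = 8,000 bytes per second, so 25,000,000 bytes hold about 3,125 seconds (roughly 52 minutes). If the limit uses binary megabytes (25 × 2^20 bytes), the figure is about 3,277 seconds. The sketch below assumes decimal megabytes and ignores MP3 header overhead. The documentation does not say what happens when even the compressed file is too large.

```python
# Assumptions: "25 MB" means 25 * 10**6 bytes, and the MP3 is constant-bitrate
# with negligible header/metadata overhead.
GROQ_LIMIT_BYTES = 25 * 10**6
BITRATE_BPS = 64_000  # 64 kbps, from the compression settings above


def estimated_compressed_bytes(duration_s: float) -> float:
    """Approximate compressed size: bits per second * seconds / 8 bits per byte."""
    return duration_s * BITRATE_BPS / 8


def max_duration_s() -> float:
    """Longest recording whose 64 kbps MP3 still fits under the 25 MB limit."""
    return GROQ_LIMIT_BYTES * 8 / BITRATE_BPS
```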

Speaker Renaming

By default, speakers are labeled SPEAKER_00, SPEAKER_01, etc. Use the speakers parameter to map these IDs to real names.

bash
curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@recording.m4a" \
  -F "language=fr" \
  -F 'speakers={"SPEAKER_00": "Alice", "SPEAKER_01": "Bob"}' \
  https://transcript.paatch.ai/api/v1/transcribe

The mapping is applied to all output formats (JSON turns, plain text, markdown, SRT). Unmapped speakers keep their original ID.
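If you only learn who is who after reading a transcript, you can apply the same mapping locally to the JSON turns instead of uploading the file again. This sketch reproduces the documented behavior, where unmapped IDs keep their original value. The server's own implementation may differ, and `rename_speakers` is an illustrative name.

```python
import json


def rename_speakers(turns: list, mapping: dict) -> list:
    """Copy the turns, replacing speaker IDs found in mapping; unmapped IDs stay as-is."""
    return [{**turn, "speaker": mapping.get(turn["speaker"], turn["speaker"])}
            for turn in turns]


mapping = {"SPEAKER_00": "Alice", "SPEAKER_01": "Bob"}
# The same mapping, encoded as the string the speakers form field expects
speakers_field = json.dumps(mapping)
```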

Output Formats

json (application/json)

Structured data with speakers, turns, timestamps, and full text. Best for building apps.

text (text/plain)

Plain text with timestamps and speaker labels. Easy to read and share.

markdown (text/markdown)

Formatted Markdown with metadata table and speaker-labeled paragraphs.

srt (text/srt)

SubRip subtitle format with speaker labels. Import into video editors.
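All formats come from the same transcript, and the rate limit is 10 requests per hour. One way to save requests is to ask for json once and render the other formats locally. The sketch below builds standard SubRip cues (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, text, blank line). The "[SPEAKER_00] text" label style is an assumption, and the API's own SRT output may label speakers differently.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def turns_to_srt(turns: list) -> str:
    """Numbered cues separated by blank lines, one cue per turn."""
    cues = []
    for i, turn in enumerate(turns, start=1):
        cues.append(
            f"{i}\n"
            f"{srt_timestamp(turn['start'])} --> {srt_timestamp(turn['end'])}\n"
            f"[{turn['speaker']}] {turn['text']}\n"
        )
    return "\n".join(cues)
```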

Rate Limits

10 requests per hour per IP address. This is a hackathon API, so be mindful of shared resources. If you hit the limit, you'll receive a 429 response with a retryAfter field (in seconds).

Error Codes

Code | Meaning | What to do
--- | --- | ---
400 | Bad request | Check the file field and the format parameter
401 | Missing API key | Add the Authorization: Bearer <key> header
403 | Invalid API key | Check your API key with the organizers
429 | Rate limit exceeded | Wait retryAfter seconds before retrying
500 | Server error | Retry or contact the organizers
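The table above translates into a small retry policy: retry 429 and 500, and fail fast on other 4xx codes. This is a sketch. `call_with_retries` is a hypothetical helper, and `send` is any function you supply that makes one request and returns `(status_code, body_dict)`. The 60-second fallback when `retryAfter` is missing and the doubling delay for 500 are assumptions, not documented behavior. Each retry is another request against the hourly limit.

```python
import time

RETRYABLE = {429, 500}


def call_with_retries(send, max_attempts=3, sleep=time.sleep, backoff_s=5.0):
    """Call send() -> (status_code, body_dict), following the error table.

    429: sleep for the retryAfter seconds from the response body, then retry.
    500: retry after backoff_s, 2 * backoff_s, ... seconds.
    Any other code >= 400 (400, 401, 403): raise, since retrying will not help.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status < 400:
            return body
        if status not in RETRYABLE or attempt == max_attempts:
            raise RuntimeError(f"HTTP {status}: {body}")
        if status == 429:
            # Assumption: fall back to 60 s if retryAfter is missing
            sleep(float(body.get("retryAfter", 60)))
        else:
            sleep(backoff_s * 2 ** (attempt - 1))
```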

Quick Start (Python)

python
import requests

API_KEY = "YOUR_API_KEY"
API_URL = "https://transcript.paatch.ai/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# 1. Check the API is up
health = requests.get(f"{API_URL}/health", headers=HEADERS, timeout=30)
health.raise_for_status()
print(health.json())


def transcribe(path, fmt):
    """Upload the file once and return the response (each call is one request)."""
    with open(path, "rb") as f:
        response = requests.post(
            f"{API_URL}/transcribe",
            headers=HEADERS,
            files={"file": f},
            data={"language": "fr", "format": fmt},
            timeout=600,
        )
    response.raise_for_status()
    return response


# 2. Transcribe an audio file (synchronous)
result = transcribe("my_recording.m4a", "json").json()

# 3. Print the transcript
for turn in result["turns"]:
    mins = int(turn["start"] // 60)
    secs = int(turn["start"] % 60)
    print(f"[{mins:02d}:{secs:02d}] {turn['speaker']}: {turn['text']}")

# 4. Get plain text version
print(transcribe("my_recording.m4a", "text").text)

# 5. Get Markdown version (write raw bytes so the server's encoding is preserved)
with open("transcript.md", "wb") as f:
    f.write(transcribe("my_recording.m4a", "markdown").content)

How It Works

1. Upload: Your audio file is uploaded to Pyannote's temporary storage.
2. Diarize: Pyannote identifies who speaks when (speaker segments).
3. Transcribe: Groq Whisper converts speech to text with word-level timestamps.
4. Merge: Words are aligned to speaker segments to produce the final transcript.
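The merge step is described here only at a high level. One common approach assigns each word to the diarization segment it overlaps most in time, then joins consecutive words from the same speaker into turns. This sketch is not necessarily the service's algorithm. The input shapes are assumptions: words as dicts with `word`, `start`, `end` (similar to Whisper word timestamps) and segments as dicts with `speaker`, `start`, `end`.

```python
from typing import Optional


def speaker_for(start: float, end: float, segments: list) -> Optional[str]:
    """Speaker whose segment overlaps [start, end] the most; None if nothing overlaps."""
    best, best_overlap = None, 0.0
    for seg in segments:
        overlap = min(end, seg["end"]) - max(start, seg["start"])
        if overlap > best_overlap:
            best, best_overlap = seg["speaker"], overlap
    return best


def merge(words: list, segments: list) -> list:
    """Label each word with a speaker, then join consecutive same-speaker words into turns."""
    turns = []
    for w in words:
        speaker = speaker_for(w["start"], w["end"], segments)
        if speaker is None:
            # Word falls in a gap between segments: keep it with the previous turn
            speaker = turns[-1]["speaker"] if turns else "UNKNOWN"
        if turns and turns[-1]["speaker"] == speaker:
            turns[-1]["end"] = w["end"]
            turns[-1]["text"] += " " + w["word"].strip()
        else:
            turns.append({"speaker": speaker, "start": w["start"],
                          "end": w["end"], "text": w["word"].strip()})
    return turns
```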