Send an audio file, get back a full transcript with speaker identification. Powered by Pyannote (diarization) + Groq Whisper (transcription).
Base URL: `https://transcript.paatch.ai/api/v1`

## Authentication

All requests require a Bearer token in the `Authorization` header. Get your API key from the hackathon organizers.
```bash
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://transcript.paatch.ai/api/v1/health
```

## GET /health

Check API status and available services.
```bash
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://transcript.paatch.ai/api/v1/health
```

Response:

```json
{
  "status": "ok",
  "version": "1.1.0",
  "services": { "pyannote": true, "groq": true, "ffmpeg": true },
  "features": {
    "autoCompression": "Files > 25MB auto-compressed via ffmpeg",
    "webhookCallbacks": "POST results to your URL when async job completes",
    "speakerRenaming": "Map SPEAKER_00/01 to real names"
  },
  "rateLimit": "10 requests per hour",
  "formats": ["json", "text", "markdown", "srt"],
  "languages": ["fr", "en", "es", "de", "it", "pt", "nl", "ja", "ko", "zh"]
}
```

## POST /transcribe

Upload an audio file and get a full transcript with speaker diarization. Supports synchronous (wait for the result) and asynchronous (poll for the result) modes.
| Name | Type | Default | Description |
|---|---|---|---|
| file (required) | file | — | Audio file (WAV, MP3, M4A, OGG, FLAC, MP4). Max 100 MB. |
| language | string | fr | ISO 639-1 language code (fr, en, es, de, it, pt, nl, ja, ko, zh). |
| format | string | json | Output format: json, text, markdown, or srt. |
| async | boolean | false | Set to "true" for async mode. Returns a job ID to poll. |
| webhook | string | — | URL to receive a POST callback when the job completes (async mode only). |
| compress | string | auto | Compression mode. "auto" compresses files > 25 MB via ffmpeg (mono 16 kHz MP3); "true" always compresses; "false" never compresses. |
| speakers | string | — | JSON object mapping speaker IDs to names, e.g. {"SPEAKER_00": "Alice", "SPEAKER_01": "Bob"}. Applied to all output formats. |
### Synchronous mode (default)

Waits for the full pipeline to complete (typically 2-5 minutes) and returns the result directly.
```bash
curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "[email protected]" \
  -F "language=fr" \
  -F "format=json" \
  https://transcript.paatch.ai/api/v1/transcribe
```

```python
import requests

response = requests.post(
    "https://transcript.paatch.ai/api/v1/transcribe",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    files={"file": open("recording.m4a", "rb")},
    data={"language": "fr", "format": "json"},
    timeout=600,  # 10 min timeout for long audio
)
result = response.json()
print(f"Speakers: {result['speakers']}")
print(f"Turns: {result['turnCount']}")
for turn in result["turns"]:
    print(f"[{turn['speaker']}] {turn['text']}")
```
```javascript
// Node example: uses the form-data and node-fetch packages to stream the upload
import fs from "node:fs";
import FormData from "form-data";
import fetch from "node-fetch";

const form = new FormData();
form.append("file", fs.createReadStream("recording.m4a"));
form.append("language", "fr");
form.append("format", "json");

const response = await fetch("https://transcript.paatch.ai/api/v1/transcribe", {
  method: "POST",
  headers: { "Authorization": "Bearer YOUR_API_KEY", ...form.getHeaders() },
  body: form,
});
const result = await response.json();
console.log("Speakers:", result.speakers);
result.turns.forEach(t => console.log(`[${t.speaker}] ${t.text}`));
```
JSON response:
```json
{
  "status": "succeeded",
  "speakers": ["SPEAKER_00", "SPEAKER_01"],
  "duration": 1320.5,
  "wordCount": 3147,
  "turnCount": 360,
  "turns": [
    {
      "speaker": "SPEAKER_00",
      "start": 0.52,
      "end": 4.18,
      "text": "Bonjour, comment allez-vous aujourd'hui ?"
    },
    {
      "speaker": "SPEAKER_01",
      "start": 4.35,
      "end": 8.92,
      "text": "Très bien merci, on peut commencer quand vous voulez."
    }
  ],
  "fullText": "Bonjour, comment allez-vous aujourd'hui ? Très bien merci...",
  "compressed": false
}
```

### Asynchronous mode

Returns immediately with a job ID. Poll `GET /jobs/:jobId` to check status.
```bash
# Start async job
curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "[email protected]" \
  -F "language=fr" \
  -F "async=true" \
  https://transcript.paatch.ai/api/v1/transcribe
# Response: { "jobId": "job_abc123", "status": "processing", "pollUrl": "/api/v1/jobs/job_abc123" }

# Poll for result
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://transcript.paatch.ai/api/v1/jobs/job_abc123
```

## GET /jobs/:jobId

Check the status of an async transcription job. Jobs expire after 2 hours.
```bash
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://transcript.paatch.ai/api/v1/jobs/job_abc123
```

Possible responses:

```json
// Processing
{ "jobId": "job_abc123", "status": "processing", "progress": "Running speaker diarization..." }

// Succeeded
{ "jobId": "job_abc123", "status": "succeeded", "result": { /* same as sync response */ } }

// Failed
{ "jobId": "job_abc123", "status": "failed", "error": "Diarization failed: ..." }
```

## Webhook callbacks

In async mode, provide a webhook URL to receive results automatically when the job completes. The API retries up to 3 times with exponential backoff (2s, 4s, 8s) if your endpoint is unavailable.
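Webhooks are optional; if you can't expose a public URL, polling `GET /jobs/:jobId` works fine. A minimal client-side loop (a sketch; the 10-second interval and 30-minute cap are arbitrary choices, not API requirements):

```python
import time
import requests

API_URL = "https://transcript.paatch.ai/api/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

def poll_job(job_id, interval=10, timeout=1800):
    """Poll GET /jobs/:jobId until the job succeeds, fails, or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = requests.get(f"{API_URL}/jobs/{job_id}", headers=HEADERS).json()
        if job["status"] == "succeeded":
            return job["result"]
        if job["status"] == "failed":
            raise RuntimeError(job["error"])
        time.sleep(interval)  # still "processing"
    raise TimeoutError(f"Job {job_id} did not finish within {timeout}s")
```

Remember that jobs expire after 2 hours, so cap the polling duration accordingly.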
```bash
curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "[email protected]" \
  -F "language=fr" \
  -F "async=true" \
  -F "webhook=https://your-server.com/callback" \
  https://transcript.paatch.ai/api/v1/transcribe
```

Webhook payload (POST to your URL):
```json
// On success
{
  "event": "transcription.completed",
  "jobId": "job_abc123",
  "status": "succeeded",
  "result": { /* same as sync JSON response */ },
  "timestamp": "2025-06-15T14:30:00.000Z"
}

// On failure
{
  "event": "transcription.failed",
  "jobId": "job_abc123",
  "status": "failed",
  "error": "Diarization failed: ...",
  "timestamp": "2025-06-15T14:30:00.000Z"
}
```

Retry policy: if your webhook endpoint returns a non-2xx status or is unreachable, the API retries up to 3 times with exponential backoff (2s → 4s → 8s). Each request has a 10-second timeout.
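A receiver can be anything that accepts a POST and answers with a 2xx quickly. A stdlib-only Python sketch (the port is an illustrative choice, not mandated by the API):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON payload described above
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        if payload.get("event") == "transcription.completed":
            result = payload["result"]
            print(f"Job {payload['jobId']}: {result['turnCount']} turns")
        else:
            print(f"Job {payload.get('jobId')} failed: {payload.get('error')}")
        # Respond 200 promptly so the API does not retry
        self.send_response(200)
        self.end_headers()

# To run: HTTPServer(("", 8080), WebhookHandler).serve_forever()
```

Keep the handler fast (queue heavy processing elsewhere) so it responds within the API's 10-second webhook timeout.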
## Audio compression

The Groq Whisper API has a 25 MB file size limit. To handle larger files seamlessly, the API includes built-in audio compression via FFmpeg.

| Value | Mode | Description |
|---|---|---|
| "auto" | Recommended (default) | Compresses only if the file exceeds 25 MB. No action needed for smaller files. |
| "true" | Force | Always compress, regardless of file size. Useful for faster upload/processing. |
| "false" | Skip | Never compress. Files over 25 MB will fail at the Groq transcription step. |
Compression settings: mono, 16 kHz, 64 kbps MP3. The JSON response includes a compressed: true/false field.
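If you'd rather shrink the file before upload (to save upload time), the same settings can be reproduced client-side. A sketch, assuming ffmpeg is on your PATH; `build_ffmpeg_cmd` is a hypothetical helper, not part of the API:

```python
import subprocess

def build_ffmpeg_cmd(src, dst):
    """ffmpeg arguments matching the API's own compression settings:
    mono, 16 kHz, 64 kbps MP3."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-ac", "1",      # mono
        "-ar", "16000",  # 16 kHz sample rate
        "-b:a", "64k",   # 64 kbps bitrate
        dst,
    ]

# To run: subprocess.run(build_ffmpeg_cmd("recording.m4a", "recording-small.mp3"), check=True)
```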
## Speaker renaming

By default, speakers are labeled SPEAKER_00, SPEAKER_01, etc. Use the speakers parameter to map these IDs to real names.

```bash
curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "[email protected]" \
  -F "language=fr" \
  -F 'speakers={"SPEAKER_00": "Alice", "SPEAKER_01": "Bob"}' \
  https://transcript.paatch.ai/api/v1/transcribe
```

The mapping is applied to all output formats (JSON turns, plain text, markdown, SRT). Unmapped speakers keep their original ID.
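If you only identify the speakers after transcribing, you don't need a second API call: the same mapping can be applied to a JSON result you already have. A sketch (`rename_speakers` is a hypothetical client-side helper, not an API feature):

```python
def rename_speakers(result, mapping):
    """Apply a speaker-ID -> name mapping to an already-fetched JSON result.
    Unmapped speakers keep their original ID, matching the API's behavior."""
    for turn in result["turns"]:
        turn["speaker"] = mapping.get(turn["speaker"], turn["speaker"])
    result["speakers"] = [mapping.get(s, s) for s in result["speakers"]]
    return result
```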
## Output formats

| Format | Content-Type | Description |
|---|---|---|
| json | application/json | Structured data with speakers, turns, timestamps, and full text. Best for building apps. |
| text | text/plain | Plain text with timestamps and speaker labels. Easy to read and share. |
| markdown | text/markdown | Formatted Markdown with a metadata table and speaker-labeled paragraphs. |
| srt | text/srt | SubRip subtitle format with speaker labels. Import into video editors. |
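If you fetched format=json and later need subtitles, the turns convert to SRT mechanically; cue numbering and the `HH:MM:SS,mmm` timestamp syntax are standard SubRip. A client-side sketch (`turns_to_srt` is a hypothetical helper using the JSON turn fields shown earlier):

```python
def srt_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def turns_to_srt(turns):
    """Build an SRT document from JSON turns (speaker, start, end, text)."""
    cues = []
    for i, t in enumerate(turns, start=1):
        cues.append(
            f"{i}\n{srt_timestamp(t['start'])} --> {srt_timestamp(t['end'])}\n"
            f"[{t['speaker']}] {t['text']}\n"
        )
    return "\n".join(cues)
```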
## Rate limits

10 requests per hour per IP address. This is a hackathon API, so be mindful of shared resources. If you hit the limit, you'll receive a 429 response with a retryAfter field (in seconds).
## Errors

| Code | Meaning | What to do |
|---|---|---|
| 400 | Bad request | Check the file field and format parameter |
| 401 | Missing API key | Add an Authorization: Bearer <key> header |
| 403 | Invalid API key | Check your API key with the organizers |
| 429 | Rate limit exceeded | Wait retryAfter seconds, then retry |
| 500 | Server error | Retry or contact the organizers |
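A small wrapper can honor 429 responses automatically. A sketch (`send_with_retry` is a hypothetical helper; `send` is any zero-argument callable returning a requests-style response with `.status_code` and `.json()`):

```python
import time

def send_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() and, on a 429, wait `retryAfter` seconds before retrying."""
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        wait = response.json().get("retryAfter", 60)  # fall back to 60s if absent
        if attempt < max_attempts - 1:
            sleep(wait)  # injectable for testing
    return response
```

Usage: `send_with_retry(lambda: requests.post(url, headers=headers, files=files))`.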
## Complete example

```python
import requests

API_KEY = "YOUR_API_KEY"
API_URL = "https://transcript.paatch.ai/api/v1"

# 1. Check the API is up
health = requests.get(f"{API_URL}/health", headers={"Authorization": f"Bearer {API_KEY}"})
print(health.json())

# 2. Transcribe an audio file (synchronous)
with open("my_recording.m4a", "rb") as f:
    response = requests.post(
        f"{API_URL}/transcribe",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
        data={"language": "fr", "format": "json"},
        timeout=600,
    )
result = response.json()

# 3. Print the transcript
for turn in result["turns"]:
    mins = int(turn["start"] // 60)
    secs = int(turn["start"] % 60)
    print(f"[{mins:02d}:{secs:02d}] {turn['speaker']}: {turn['text']}")

# 4. Get plain text version
with open("my_recording.m4a", "rb") as f:
    response_txt = requests.post(
        f"{API_URL}/transcribe",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
        data={"language": "fr", "format": "text"},
        timeout=600,
    )
print(response_txt.text)

# 5. Get Markdown version
with open("my_recording.m4a", "rb") as f:
    response_md = requests.post(
        f"{API_URL}/transcribe",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
        data={"language": "fr", "format": "markdown"},
        timeout=600,
    )
with open("transcript.md", "w") as f:
    f.write(response_md.text)
```
## How it works

1. Your audio file is uploaded to Pyannote's temporary storage.
2. Pyannote identifies who speaks when (speaker segments).
3. Groq Whisper converts speech to text with word-level timestamps.
4. Words are aligned to speaker segments to produce the final transcript.
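Step 4, the alignment, can be pictured as: assign each word to the diarization segment that contains its midpoint, then merge consecutive same-speaker words into turns. A simplified sketch of that idea (not the server's actual implementation):

```python
def align(words, segments):
    """words: [(start, end, text)]; segments: [(start, end, speaker)].
    Assigns each word to the segment containing its midpoint, then merges
    consecutive same-speaker words into turns."""
    turns = []
    for w_start, w_end, text in words:
        mid = (w_start + w_end) / 2
        speaker = next(
            (spk for s, e, spk in segments if s <= mid <= e),
            "SPEAKER_UNKNOWN",  # word falls outside all segments
        )
        if turns and turns[-1]["speaker"] == speaker:
            turns[-1]["text"] += " " + text
            turns[-1]["end"] = w_end
        else:
            turns.append({"speaker": speaker, "start": w_start, "end": w_end, "text": text})
    return turns
```

This midpoint heuristic is one common choice; production systems also handle overlapping speech and boundary disagreements between the two models.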