App Hackathon Edition

Hackathon API

Transcription API with Speaker Diarization

Send an audio file, get back a full transcript with speaker identification. Powered by Pyannote (diarization) + Groq Whisper (transcription).

Bearer token auth

JSON, Text, Markdown, SRT

Sync & async modes

Base URL

url

https://transcript.paatch.ai/api/v1

Authentication

All requests require a Bearer token in the Authorization header. Get your API key from the hackathon organizers.

bash

curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://transcript.paatch.ai/api/v1/health

GET /health

Check API status and available services.

bash

curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://transcript.paatch.ai/api/v1/health

Response:

json

{
  "status": "ok",
  "version": "1.1.0",
  "services": { "pyannote": true, "groq": true, "ffmpeg": true },
  "features": {
    "autoCompression": "Files > 25MB auto-compressed via ffmpeg",
    "webhookCallbacks": "POST results to your URL when async job completes",
    "speakerRenaming": "Map SPEAKER_00/01 to real names"
  },
  "rateLimit": "10 requests per hour",
  "formats": ["json", "text", "markdown", "srt"],
  "languages": ["fr", "en", "es", "de", "it", "pt", "nl", "ja", "ko", "zh"]
}
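A minimal sketch of how a client might inspect this payload before submitting work. The helper name `unavailable_services` is hypothetical, and treating a non-"ok" status as a single "api" entry is an assumption, not documented behavior.

```python
def unavailable_services(health: dict) -> list[str]:
    """Return the names of services that the /health payload reports as down."""
    if health.get("status") != "ok":
        # Assumption: any status other than "ok" means the API itself is unhealthy.
        return ["api"]
    services = health.get("services", {})
    return sorted(name for name, up in services.items() if not up)


# Sample payload shaped like the documented response, with one service down.
sample = {
    "status": "ok",
    "services": {"pyannote": True, "groq": False, "ffmpeg": True},
}
print(unavailable_services(sample))
```

Pass the decoded JSON body of GET /health to the helper; an empty list means every listed service reported true.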

POST /transcribe

Upload an audio file and get a full transcript with speaker diarization. Supports synchronous (wait for result) and asynchronous (poll for result) modes.

Parameters

Name	Type	Default	Description
`file` (required)	file	—	Audio file (WAV, MP3, M4A, OGG, FLAC, MP4). Max 100 MB.
`language`	string	fr	ISO 639-1 language code (fr, en, es, de, it, pt, nl, ja, ko, zh).
`format`	string	json	Output format: json, text, markdown, or srt.
`async`	boolean	false	Set to "true" for async mode. Returns a job ID to poll.
`webhook`	string	—	URL to receive a POST callback when the job completes (async mode). Retries up to 3 times with exponential backoff (2s, 4s, 8s).
`compress`	string	auto	Audio compression mode. "auto" compresses files > 25MB via ffmpeg (mono 16kHz MP3). "true" always compresses. "false" never compresses.
`speakers`	string	—	JSON object mapping speaker IDs to names. Example: {"SPEAKER_00": "Alice", "SPEAKER_01": "Bob"}. Applied to all output formats.
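The non-file parameters in the table above can be assembled into multipart form fields as sketched here. The helper `build_form_fields` is hypothetical. Rejecting a webhook outside async mode is a client-side choice in this sketch; the API may simply ignore it.

```python
import json

ALLOWED_FORMATS = {"json", "text", "markdown", "srt"}
ALLOWED_COMPRESS = {"auto", "true", "false"}


def build_form_fields(language="fr", fmt="json", use_async=False,
                      webhook=None, compress="auto", speakers=None):
    """Build the form fields (everything except `file`) for POST /transcribe."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if compress not in ALLOWED_COMPRESS:
        raise ValueError(f"unsupported compress mode: {compress!r}")

    fields = {"language": language, "format": fmt, "compress": compress}
    if use_async:
        fields["async"] = "true"  # the API expects the string "true"
    if webhook is not None:
        if not use_async:
            # Client-side guard (assumption): webhooks are documented for async mode only.
            raise ValueError("webhook requires use_async=True")
        fields["webhook"] = webhook
    if speakers:
        # The speakers parameter is a JSON object sent as a string.
        fields["speakers"] = json.dumps(speakers)
    return fields


fields = build_form_fields(use_async=True, speakers={"SPEAKER_00": "Alice"})
print(fields)
```

The returned dictionary can be passed as `data=` to `requests.post`, alongside `files={"file": ...}`.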

Synchronous mode (default)

Waits for the full pipeline to complete (typically 2-5 minutes) and returns the result directly.

cURL

bash

curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@recording.m4a" \
  -F "language=fr" \
  -F "format=json" \
  https://transcript.paatch.ai/api/v1/transcribe

Python

python

import requests

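# Open the file in a context manager so the handle is closed after the upload
with open("recording.m4a", "rb") as audio:
    response = requests.post(
        "https://transcript.paatch.ai/api/v1/transcribe",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": audio},
        data={"language": "fr", "format": "json"},
        timeout=600,  # 10 min timeout for long audio
    )

response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
result = response.json()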
print(f"Speakers: {result['speakers']}")
print(f"Turns: {result['turnCount']}")

for turn in result["turns"]:
    print(f"[{turn['speaker']}] {turn['text']}")

JavaScript / Node.js

javascript

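import { readFile } from "node:fs/promises";

// Node 18+: the built-in fetch expects a Blob for file parts, not a read stream
const form = new FormData();
form.append("file", new Blob([await readFile("recording.m4a")]), "recording.m4a");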
form.append("language", "fr");
form.append("format", "json");

const response = await fetch("https://transcript.paatch.ai/api/v1/transcribe", {
  method: "POST",
  headers: { "Authorization": "Bearer YOUR_API_KEY" },
  body: form,
});

const result = await response.json();
console.log("Speakers:", result.speakers);
result.turns.forEach(t => console.log(`[${t.speaker}] ${t.text}`));

JSON response:

json

{
  "status": "succeeded",
  "speakers": ["SPEAKER_00", "SPEAKER_01"],
  "duration": 1320.5,
  "wordCount": 3147,
  "turnCount": 360,
  "turns": [
    {
      "speaker": "SPEAKER_00",
      "start": 0.52,
      "end": 4.18,
      "text": "Bonjour, comment allez-vous aujourd'hui ?"
    },
    {
      "speaker": "SPEAKER_01",
      "start": 4.35,
      "end": 8.92,
      "text": "Très bien merci, on peut commencer quand vous voulez."
    }
  ],
  "fullText": "Bonjour, comment allez-vous aujourd'hui ? Très bien merci...",
  "compressed": false
}

Async mode

Returns immediately with a job ID. Poll GET /jobs/:jobId to check status.

bash

# Start async job
curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@recording.m4a" \
  -F "language=fr" \
  -F "async=true" \
  https://transcript.paatch.ai/api/v1/transcribe

# Response: { "jobId": "job_abc123", "status": "processing", "pollUrl": "/api/v1/jobs/job_abc123" }

# Poll for result
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://transcript.paatch.ai/api/v1/jobs/job_abc123

GET /jobs/:jobId

Check the status of an async transcription job. Jobs expire after 2 hours.

bash

curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://transcript.paatch.ai/api/v1/jobs/job_abc123

Possible responses:

json

// Processing
{ "jobId": "job_abc123", "status": "processing", "progress": "Running speaker diarization..." }

// Succeeded
{ "jobId": "job_abc123", "status": "succeeded", "result": { /* same as sync response */ } }

// Failed
{ "jobId": "job_abc123", "status": "failed", "error": "Diarization failed: ..." }
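A polling loop over these three states might look like the sketch below. The function `wait_for_job` and its parameters are hypothetical. The 30-second default interval is an assumption: the documentation does not say whether polling requests count toward the rate limit, so a conservative interval seems prudent.

```python
import time


def wait_for_job(fetch_status, interval=30.0, max_wait=7200.0, sleep=time.sleep):
    """Poll until the job succeeds or fails, then return its result.

    fetch_status is any zero-argument callable that returns the decoded JSON
    body of GET /jobs/:jobId. max_wait defaults to the 2-hour job expiry.
    """
    waited = 0.0
    while True:
        job = fetch_status()
        if job["status"] == "succeeded":
            return job["result"]
        if job["status"] == "failed":
            raise RuntimeError(job.get("error", "transcription failed"))
        if waited >= max_wait:
            raise TimeoutError("job did not finish before max_wait elapsed")
        sleep(interval)
        waited += interval


# Simulated responses stand in for real HTTP calls in this example.
responses = iter([
    {"jobId": "job_abc123", "status": "processing"},
    {"jobId": "job_abc123", "status": "processing"},
    {"jobId": "job_abc123", "status": "succeeded", "result": {"turnCount": 2}},
])
naps = []
result = wait_for_job(lambda: next(responses), interval=5, sleep=naps.append)
print(result, naps)
```

In real use, `fetch_status` would wrap a `requests.get` call to the job's poll URL with the Authorization header and return `response.json()`.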

Webhook Callbacks

In async mode, provide a webhook URL to receive the result automatically when the job completes, instead of polling.

bash

curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@recording.m4a" \
  -F "language=fr" \
  -F "async=true" \
  -F "webhook=https://your-server.com/callback" \
  https://transcript.paatch.ai/api/v1/transcribe

Webhook payload (POST to your URL):

json

// On success
{
  "event": "transcription.completed",
  "jobId": "job_abc123",
  "status": "succeeded",
  "result": { /* same as sync JSON response */ },
  "timestamp": "2025-06-15T14:30:00.000Z"
}

// On failure
{
  "event": "transcription.failed",
  "jobId": "job_abc123",
  "status": "failed",
  "error": "Diarization failed: ...",
  "timestamp": "2025-06-15T14:30:00.000Z"
}

Retry policy: If your webhook endpoint returns a non-2xx status or is unreachable, the API retries up to 3 times with exponential backoff (2s → 4s → 8s). Each request has a 10-second timeout.
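A minimal receiver for these callbacks, sketched with Python's standard library only. The names `summarize_event`, `WebhookHandler`, and `serve`, and port 8000, are hypothetical. Because a retry can deliver the same event more than once, a real handler should probably be idempotent per jobId; that is an inference from the retry policy, not a documented guarantee.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_event(payload: dict) -> str:
    """Turn a webhook payload into a one-line summary."""
    event = payload.get("event")
    if event == "transcription.completed":
        turns = payload["result"].get("turnCount")
        return f"{payload['jobId']} succeeded ({turns} turns)"
    if event == "transcription.failed":
        return f"{payload['jobId']} failed: {payload.get('error')}"
    return f"ignored event {event!r}"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except ValueError:
            # Body was not valid JSON.
            self.send_response(400)
            self.end_headers()
            return
        # Answer with a 2xx quickly: a non-2xx reply or a response slower
        # than 10 seconds triggers a retry.
        self.send_response(204)
        self.end_headers()
        print(summarize_event(payload))


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Call `serve()` on a publicly reachable host and pass that host's URL as the `webhook` parameter.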

Auto-Compression (FFmpeg)

The Groq Whisper API has a 25 MB file size limit. To handle larger files seamlessly, the API includes built-in audio compression via FFmpeg.

"auto" (Recommended)

Default. Compresses only if the file exceeds 25 MB. No action needed for smaller files.

"true" (Force)

Always compress, regardless of file size. Useful for faster upload/processing.

"false" (Skip)

Never compress. Files over 25 MB will fail at the Groq transcription step.

Compression settings: mono, 16 kHz, 64 kbps MP3. The JSON response includes a compressed: true/false field.
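The rules above, plus the arithmetic behind the 64 kbps setting, can be sketched as follows. The function names are hypothetical. The sketch assumes "25 MB" means 25 × 1024 × 1024 bytes and ignores MP3 container overhead. How the API handles audio that still exceeds the limit after compression is not documented, so the duration figure is only an estimate.

```python
GROQ_LIMIT_BYTES = 25 * 1024 * 1024  # assumption: 25 MB read as 25 MiB


def will_compress(size_bytes: int, mode: str = "auto") -> bool:
    """Mirror the documented behavior of the compress parameter."""
    if mode == "true":
        return True
    if mode == "false":
        return False
    if mode == "auto":
        return size_bytes > GROQ_LIMIT_BYTES
    raise ValueError(f"unknown compress mode: {mode!r}")


def compressed_size_bytes(duration_s: float, bitrate_kbps: int = 64) -> float:
    """Approximate size of constant-bitrate audio of the given duration."""
    return duration_s * bitrate_kbps * 1000 / 8


def max_duration_s(bitrate_kbps: int = 64) -> float:
    """Longest audio (in seconds) that stays under the limit at this bitrate."""
    return GROQ_LIMIT_BYTES * 8 / (bitrate_kbps * 1000)


print(will_compress(40 * 1024 * 1024))   # True: 40 MiB exceeds the limit
print(compressed_size_bytes(1320.5))     # 10564000.0 bytes for a 22-minute file
print(max_duration_s())                  # 3276.8 seconds
```

Under these assumptions, a 22-minute recording compresses to roughly 10 MB, and audio longer than about 54 minutes would exceed 25 MB even after compression.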

Speaker Renaming

By default, speakers are labeled SPEAKER_00, SPEAKER_01, etc. Use the speakers parameter to map these IDs to real names.

bash

curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@recording.m4a" \
  -F "language=fr" \
  -F 'speakers={"SPEAKER_00": "Alice", "SPEAKER_01": "Bob"}' \
  https://transcript.paatch.ai/api/v1/transcribe

The mapping is applied to all output formats (JSON turns, plain text, markdown, SRT). Unmapped speakers keep their original ID.
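The same mapping can also be applied client-side to a JSON result that is already in hand, which avoids spending another request just to change names. This is a sketch only: the helper `rename_speakers` is hypothetical and handles the `speakers` and `turns` fields, not the text-based formats.

```python
def rename_speakers(result: dict, names: dict) -> dict:
    """Return a copy of a JSON result with speaker IDs replaced by names.

    Unmapped speakers keep their original ID, matching the API's behavior.
    """
    renamed = dict(result)
    renamed["speakers"] = [names.get(s, s) for s in result["speakers"]]
    renamed["turns"] = [
        {**turn, "speaker": names.get(turn["speaker"], turn["speaker"])}
        for turn in result["turns"]
    ]
    return renamed


result = {
    "speakers": ["SPEAKER_00", "SPEAKER_01"],
    "turns": [
        {"speaker": "SPEAKER_00", "start": 0.52, "end": 4.18, "text": "Bonjour"},
        {"speaker": "SPEAKER_01", "start": 4.35, "end": 8.92, "text": "Très bien"},
    ],
}
renamed = rename_speakers(result, {"SPEAKER_00": "Alice"})
print(renamed["speakers"])
```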

Output Formats

json (application/json)

Structured data with speakers, turns, timestamps, and full text. Best for building apps.

text (text/plain)

Plain text with timestamps and speaker labels. Easy to read and share.

markdown (text/markdown)

Formatted Markdown with metadata table and speaker-labeled paragraphs.

srt (text/srt)

SubRip subtitle format with speaker labels. Import into video editors.
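With a tight rate limit, it can be cheaper to request JSON once and derive other formats locally. The sketch below converts JSON turns to SubRip. The function names are hypothetical, and the "SPEAKER: text" label layout is an assumption; the API's own SRT output may format speaker labels differently.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def turns_to_srt(turns: list) -> str:
    """Render turns as numbered SRT cues separated by blank lines."""
    cues = []
    for index, turn in enumerate(turns, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(turn['start'])} --> {srt_timestamp(turn['end'])}\n"
            f"{turn['speaker']}: {turn['text']}\n"
        )
    return "\n".join(cues)


turns = [
    {"speaker": "SPEAKER_00", "start": 0.52, "end": 4.18, "text": "Bonjour"},
    {"speaker": "SPEAKER_01", "start": 4.35, "end": 8.92, "text": "Très bien merci"},
]
print(turns_to_srt(turns))
```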

Rate Limits

10 requests per hour per IP address. This is a hackathon API — be mindful of shared resources. If you hit the limit, you'll receive a 429 response with a retryAfter field (in seconds).
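One way to honor the 429 response is to wait for the advertised delay and try again, as sketched below. The helper `send_with_rate_limit` is hypothetical. It assumes `retryAfter` arrives in the JSON body, and the 60-second fallback for a missing field is an arbitrary choice.

```python
import time


def send_with_rate_limit(send, max_attempts=3, default_wait=60.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is any zero-argument callable returning (status_code, json_body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        # Wait for the number of seconds the API asked for before retrying.
        sleep(float(body.get("retryAfter", default_wait)))


# Simulated replies stand in for real HTTP calls in this example.
replies = iter([(429, {"retryAfter": 120}), (200, {"status": "succeeded"})])
waits = []
status, body = send_with_rate_limit(lambda: next(replies), sleep=waits.append)
print(status, waits)
```

When `send` uploads a file, it should reopen the file on each call so that a retry sends the full content.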

Error Codes

Code	Meaning	What to do
`400`	Bad request	Check file field and format parameter
`401`	Missing API key	Add Authorization: Bearer <key> header
`403`	Invalid API key	Check your API key with the organizers
`429`	Rate limit exceeded	Wait for retryAfter seconds
`500`	Server error	Retry or contact organizers
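The table above can be turned into a small client-side check, sketched here. The exception class, the `check_status` helper, and the choice to mark 429 and 500 as retryable are assumptions drawn from the "What to do" column, not part of the API.

```python
HINTS = {
    400: "Check file field and format parameter",
    401: "Add Authorization: Bearer <key> header",
    403: "Check your API key with the organizers",
    429: "Wait for retryAfter seconds",
    500: "Retry or contact organizers",
}
RETRYABLE = {429, 500}  # assumption: the only codes where a retry can help


class TranscriptionAPIError(Exception):
    def __init__(self, status: int):
        hint = HINTS.get(status, "Unlisted status code")
        super().__init__(f"HTTP {status}: {hint}")
        self.status = status
        self.retryable = status in RETRYABLE


def check_status(status: int) -> None:
    """Raise TranscriptionAPIError for 4xx/5xx status codes."""
    if status >= 400:
        raise TranscriptionAPIError(status)


try:
    check_status(401)
except TranscriptionAPIError as exc:
    print(exc, exc.retryable)
```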

Quick Start (Python)

python

import requests

API_KEY = "YOUR_API_KEY"
API_URL = "https://transcript.paatch.ai/api/v1"

# 1. Check the API is up
health = requests.get(f"{API_URL}/health", headers={"Authorization": f"Bearer {API_KEY}"})
print(health.json())

# 2. Transcribe an audio file (synchronous)
with open("my_recording.m4a", "rb") as f:
    response = requests.post(
        f"{API_URL}/transcribe",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
        data={"language": "fr", "format": "json"},
        timeout=600
    )

result = response.json()

# 3. Print the transcript
for turn in result["turns"]:
    mins = int(turn["start"] // 60)
    secs = int(turn["start"] % 60)
    print(f"[{mins:02d}:{secs:02d}] {turn['speaker']}: {turn['text']}")

# 4. Get plain text version
with open("my_recording.m4a", "rb") as f:
    response_txt = requests.post(
        f"{API_URL}/transcribe",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
        data={"language": "fr", "format": "text"},
        timeout=600
    )
print(response_txt.text)

# 5. Get Markdown version
with open("my_recording.m4a", "rb") as f:
    response_md = requests.post(
        f"{API_URL}/transcribe",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
        data={"language": "fr", "format": "markdown"},
        timeout=600
    )
# Write as UTF-8 so accented characters survive on every platform
with open("transcript.md", "w", encoding="utf-8") as f:
    f.write(response_md.text)

How It Works

STEP 1: Upload

Your audio file is uploaded to Pyannote's temporary storage.

STEP 2: Diarize

Pyannote identifies who speaks when (speaker segments).

STEP 3: Transcribe

Groq Whisper converts speech to text with word-level timestamps.

STEP 4: Merge

Words are aligned to speaker segments to produce the final transcript.
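To illustrate the merge step, here is one plausible alignment strategy: assign each word to the speaker segment it overlaps most in time, then join consecutive words from the same speaker into a turn. This is an assumption made for illustration, not the service's actual algorithm. The function names and the input shapes (`word`, `start`, `end`, `speaker`) are hypothetical.

```python
def assign_speaker(word: dict, segments: list) -> str:
    """Pick the speaker whose segment overlaps the word the most.

    A negative overlap means no intersection; the least negative value then
    selects the nearest segment, so every word gets a speaker.
    """
    def overlap(segment: dict) -> float:
        return min(word["end"], segment["end"]) - max(word["start"], segment["start"])

    return max(segments, key=overlap)["speaker"]


def merge_words(words: list, segments: list) -> list:
    """Group timestamped words into speaker turns."""
    turns = []
    for word in words:
        speaker = assign_speaker(word, segments)
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker as the previous word: extend the current turn.
            turns[-1]["end"] = word["end"]
            turns[-1]["text"] += " " + word["word"]
        else:
            turns.append({"speaker": speaker, "start": word["start"],
                          "end": word["end"], "text": word["word"]})
    return turns


segments = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 2.0},
    {"speaker": "SPEAKER_01", "start": 2.0, "end": 4.0},
]
words = [
    {"word": "Bonjour", "start": 0.5, "end": 1.0},
    {"word": "tout", "start": 1.1, "end": 1.4},
    {"word": "Merci", "start": 2.2, "end": 2.8},
]
print(merge_words(words, segments))
```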