This page outlines the end-to-end flow for creating a Voice File Job, uploading audio, and retrieving results. In the current implementation, the client signals that the upload has finished by calling upload-complete; processing is not triggered by an S3 event.
Poll GET /v1/external/voice-file/jobs/{jobId} with exponential backoff
Start with a 1–2 s interval and double it after each attempt (4 s, 8 s, and so on), capped at 30 s
Stop conditions:
Success: status = COMPLETED
Failure: status = FAILED
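The polling rules above can be sketched as follows. This is a minimal illustration, not the reference client: `backoff_intervals`, `poll_job`, the `fetch_status` callback, the `max_attempts` limit, and the non-terminal status name used in the usage note are all assumptions introduced for this example. Only the COMPLETED and FAILED statuses and the 30 s cap come from this page.

```python
import time
from typing import Callable, Iterator

# Terminal statuses named on this page.
TERMINAL_STATUSES = ("COMPLETED", "FAILED")


def backoff_intervals(start: float = 2.0, factor: float = 2.0,
                      cap: float = 30.0) -> Iterator[float]:
    """Yield wait times that grow by `factor` and never exceed `cap`."""
    delay = start
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def poll_job(fetch_status: Callable[[], str],
             sleep: Callable[[float], None] = time.sleep,
             max_attempts: int = 50) -> str:
    """Call `fetch_status` until it returns a terminal status.

    `fetch_status` is a hypothetical callback that performs
    GET /v1/external/voice-file/jobs/{jobId} and returns the job's status.
    `max_attempts` is an assumed safety limit, not part of the API.
    """
    delays = backoff_intervals()
    for _ in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Not finished yet: wait for the next backoff interval.
        sleep(next(delays))
    raise TimeoutError("job did not reach a terminal status in time")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. For example, with a stubbed status source that returns a placeholder in-progress value twice and then COMPLETED, `poll_job` waits 2 s and 4 s before returning.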
After success:
Always fetch transcript: GET /v1/external/voice-file/jobs/{jobId}/transcript
If you requested translations: GET /v1/external/voice-file/jobs/{jobId}/translations, or use the per-locale endpoint
Get paragraph summaries:
For transcript: GET /v1/external/voice-file/jobs/{jobId}/transcript/paragraph-summary
For translations: GET /v1/external/voice-file/jobs/{jobId}/translations/{locale}/paragraph-summary
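As a rough sketch of the retrieval order above, the helper below builds the list of result paths to request once a job is COMPLETED. The function name `result_paths` and its `locales` parameter are hypothetical; the paths themselves are the ones listed on this page. The per-locale translation endpoint is left out because its exact path is not given here.

```python
from typing import List, Sequence


def result_paths(job_id: str, locales: Sequence[str] = ()) -> List[str]:
    """Return the result endpoints to GET for a completed job.

    `locales` holds the translation locales requested at job creation
    (assumed to be known to the caller); leave it empty if none.
    """
    base = f"/v1/external/voice-file/jobs/{job_id}"
    # The transcript is always fetched.
    paths = [f"{base}/transcript"]
    # Translations are fetched only if they were requested.
    if locales:
        paths.append(f"{base}/translations")
    # Paragraph summaries: one for the transcript, one per locale.
    paths.append(f"{base}/transcript/paragraph-summary")
    for locale in locales:
        paths.append(f"{base}/translations/{locale}/paragraph-summary")
    return paths
```

Each returned path would then be requested with GET against the API host using whatever HTTP client and authentication the integration already uses.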
A job that is processed successfully ends in status COMPLETED. This status means all processing has finished: the transcript, the translations (if requested), and the paragraph summaries for both.
The Paragraph Summary feature summarizes the audio content by breaking the transcript or translation into short, paragraph-level summaries.