Overview

This page outlines the end-to-end flow for creating a Voice File Job, uploading audio, and retrieving results. In the current implementation, the client must signal that the upload has finished by calling the upload-complete endpoint. Processing is not triggered by an S3 event.

What you need

API key: sent in the header Authorization: Bearer {id}.{secret}
transcriptLocaleHints (optional): at most 1 locale
- If omitted, the language is detected automatically
translationLocales (optional): up to 5 locales
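The sketch below builds the request headers and job-creation body and enforces the locale limits above. It is a minimal sketch with several assumptions. The fields are assumed to be sent as JSON arrays of locale strings such as "ko-KR". The helper name build_create_job_request is hypothetical and is not part of the documented API.

```python
from __future__ import annotations

import json


def build_create_job_request(key_id: str, key_secret: str,
                             transcript_locale_hints: list[str] | None = None,
                             translation_locales: list[str] | None = None) -> tuple[dict, str]:
    """Return (headers, json_body) for a create-job call.

    Hypothetical helper: the body shape (JSON arrays of locale strings) is an assumption.
    """
    hints = transcript_locale_hints or []
    targets = translation_locales or []
    if len(hints) > 1:
        raise ValueError("transcriptLocaleHints accepts at most 1 locale")
    if len(targets) > 5:
        raise ValueError("translationLocales accepts at most 5 locales")

    headers = {
        # API key format from this page: Bearer {id}.{secret}
        "Authorization": f"Bearer {key_id}.{key_secret}",
        "Content-Type": "application/json",
    }
    body: dict = {}
    if hints:  # leave the field out to let the service auto-detect the language
        body["transcriptLocaleHints"] = hints
    if targets:
        body["translationLocales"] = targets
    return headers, json.dumps(body)
```

Omitting empty fields, rather than sending empty arrays, keeps the "optional" semantics explicit. Whether the service distinguishes the two is not stated on this page.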

End-to-end checklist

Create a job and receive { id, uploadUri }
Upload the audio file to uploadUri
Call PUT /v1/external/voice-file/jobs/{jobId}/upload-complete
Poll the job status until it is COMPLETED or FAILED
Fetch the transcript and, if requested, the translations
Fetch paragraph summaries of the transcript and of each translation
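The checklist can be sketched as a small client. This is a hedged sketch, not the official SDK, and it rests on several assumptions not confirmed on this page:

- The create-job path is POST /v1/external/voice-file/jobs.
- The audio is uploaded with a plain HTTP PUT to uploadUri.
- Response fields other than id, uploadUri, and status are unknown.

The HTTP layer is injected as a hypothetical `send` callable, so the flow can run without network access. In production it could wrap urllib.request or any HTTP client.

```python
from __future__ import annotations

import json
import time
from typing import Callable, Optional, Tuple

# Injected HTTP layer: send(method, url, headers, body) -> (status_code, parsed JSON or None)
Send = Callable[[str, str, dict, Optional[bytes]], Tuple[int, Optional[dict]]]

JOBS = "/v1/external/voice-file/jobs"  # assumption: POST here creates a job


def run_voice_file_job(send: Send, auth: dict, audio: bytes,
                       options: dict | None = None,
                       sleep: Callable[[float], None] = time.sleep) -> dict:
    """Walk through the checklist and return the transcript and its summary."""
    # Create the job -> { id, uploadUri }
    _, job = send("POST", JOBS, auth, json.dumps(options or {}).encode())
    job_id, upload_uri = job["id"], job["uploadUri"]

    # Upload the audio (assumed: a plain PUT to uploadUri, no API key header)
    status, _ = send("PUT", upload_uri, {}, audio)
    if status >= 300:
        raise RuntimeError(f"upload failed with HTTP {status}")

    # Signal that the upload finished; no S3 event triggers processing
    send("PUT", f"{JOBS}/{job_id}/upload-complete", auth, None)

    # Poll until a terminal status, doubling the delay up to a 30 s cap
    delay = 2.0
    while True:
        _, state = send("GET", f"{JOBS}/{job_id}", auth, None)
        if state["status"] == "COMPLETED":
            break
        if state["status"] == "FAILED":
            raise RuntimeError(f"job {job_id} failed")
        sleep(delay)
        delay = min(delay * 2, 30.0)

    # Fetch results (translations omitted here for brevity)
    _, transcript = send("GET", f"{JOBS}/{job_id}/transcript", auth, None)
    _, summary = send("GET", f"{JOBS}/{job_id}/transcript/paragraph-summary", auth, None)
    return {"jobId": job_id, "transcript": transcript, "summary": summary}
```

Injecting `send` and `sleep` keeps the ordering of the calls testable with a fake transport. The key ordering constraint is that upload-complete is called only after the upload succeeds.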

Polling strategy (recommended)

Poll GET /v1/external/voice-file/jobs/{jobId} with exponential backoff
- Start at a 1–2 s interval, then double it (4 s, 8 s, 16 s) up to a 30 s cap
- Stop conditions:
  - Success: status = COMPLETED
  - Failure: status = FAILED
- After success:
  - Always fetch transcript: GET /v1/external/voice-file/jobs/{jobId}/transcript
  - If you requested translations: GET /v1/external/voice-file/jobs/{jobId}/translations, or the per-locale endpoint
  - Get paragraph summaries:
    - For transcript: GET /v1/external/voice-file/jobs/{jobId}/transcript/paragraph-summary
    - For translations: GET /v1/external/voice-file/jobs/{jobId}/translations/{locale}/paragraph-summary
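The backoff schedule and stop conditions above can be expressed as a small polling helper. This is a sketch under two assumptions:

- `get_status` is a hypothetical callable that wraps GET /v1/external/voice-file/jobs/{jobId} and returns the job's status string.
- The overall `max_wait` deadline is an addition; this page does not specify one.

```python
import time
from typing import Callable, Iterator


def backoff_delays(first: float = 2.0, cap: float = 30.0) -> Iterator[float]:
    """Yield poll delays in seconds: first, then doubling, never above cap."""
    delay = min(first, cap)
    while True:
        yield delay
        delay = min(delay * 2, cap)


def wait_for_job(get_status: Callable[[], str], max_wait: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until COMPLETED or FAILED; raise TimeoutError once max_wait would be exceeded."""
    waited = 0.0
    for delay in backoff_delays():
        status = get_status()
        if status in ("COMPLETED", "FAILED"):  # the two stop conditions
            return status
        if waited + delay > max_wait:
            raise TimeoutError(f"job still {status} after {waited:.0f} s")
        sleep(delay)
        waited += delay
    raise AssertionError("unreachable: backoff_delays() is infinite")
```

The helper returns FAILED instead of raising, so the caller decides how to report a failed job. Only a COMPLETED result should lead to the transcript and summary requests.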

COMPLETED is the terminal status for successful processing. It is set only after everything has finished: the transcript, any requested translations, and the paragraph summaries for each of them.
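Because COMPLETED means every output is ready, a client can fetch all results in one pass once polling stops. The sketch below only builds the GET paths listed above for a job. `result_paths` is a hypothetical helper. It uses the bulk translations endpoint and the per-locale paragraph-summary endpoint. The per-locale translation endpoint is not used, because its path is not given on this page.

```python
from __future__ import annotations


def result_paths(job_id: str, translation_locales: list[str]) -> list[str]:
    """Return the GET paths to call after a job reaches COMPLETED."""
    base = f"/v1/external/voice-file/jobs/{job_id}"
    paths = [
        f"{base}/transcript",                    # always fetched
        f"{base}/transcript/paragraph-summary",  # summaries of the transcript
    ]
    if translation_locales:
        paths.append(f"{base}/translations")     # all requested translations
        for locale in translation_locales:
            paths.append(f"{base}/translations/{locale}/paragraph-summary")
    return paths
```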

File Limits & Requirements

Supported Audio Formats

Based on industry-standard speech-to-text (STT) capabilities, Tiro supports the following formats.

Audio Formats:

Format	MIME Type	Extension
MP3	audio/mpeg	.mp3
WAV	audio/wav	.wav
M4A	audio/mp4	.m4a

Video Formats (audio extraction):

Format	MIME Type	Extension
MP4	video/mp4	.mp4
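A client can reject unsupported files before creating a job by mapping the file extension to the MIME types in the tables above. This is a sketch: `content_type_for` is a hypothetical helper. Whether the upload must carry a matching Content-Type header is an assumption not stated on this page.

```python
from pathlib import Path

# Extension -> MIME type, taken from the supported-format tables
SUPPORTED_FORMATS = {
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".m4a": "audio/mp4",
    ".mp4": "video/mp4",  # video: only the audio track is used
}


def content_type_for(filename: str) -> str:
    """Return the MIME type for a supported file, or raise ValueError."""
    ext = Path(filename).suffix.lower()
    try:
        return SUPPORTED_FORMATS[ext]
    except KeyError:
        raise ValueError(f"unsupported format {ext or '(none)'!r}: {filename}") from None
```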

File Size & Duration Limits

Limit Type	Value	Notes
Max File Size	500 MB	Practical limit for most use cases
Max Duration	3 hours	Covers most meetings and interviews
Min Sample Rate	8 kHz	Minimum for speech recognition
Recommended Sample Rate	16 kHz+	Optimal for accuracy
Max Sample Rate	48 kHz	Studio quality support
Channels	Mono or Stereo	Multi-speaker support
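These limits can be checked locally before upload. The sketch below makes two assumptions:

- The caller already knows the file's duration, sample rate, and channel count, for example from a media-probing tool. Obtaining them is out of scope here.
- 500 MB means 500 × 1024 × 1024 bytes. This page does not specify which.

`check_limits` is a hypothetical helper.

```python
from __future__ import annotations

MAX_BYTES = 500 * 1024 * 1024        # 500 MB (binary megabytes assumed)
MAX_SECONDS = 3 * 60 * 60            # 3 hours
MIN_RATE_HZ, MAX_RATE_HZ = 8_000, 48_000
RECOMMENDED_RATE_HZ = 16_000


def check_limits(size_bytes: int, duration_s: float, sample_rate_hz: int,
                 channels: int) -> list[str]:
    """Return warnings; raise ValueError if a hard limit is exceeded."""
    if size_bytes > MAX_BYTES:
        raise ValueError("file exceeds 500 MB")
    if duration_s > MAX_SECONDS:
        raise ValueError("audio is longer than 3 hours")
    if not MIN_RATE_HZ <= sample_rate_hz <= MAX_RATE_HZ:
        raise ValueError("sample rate must be between 8 kHz and 48 kHz")
    if channels not in (1, 2):
        raise ValueError("only mono or stereo audio is supported")
    warnings = []
    if sample_rate_hz < RECOMMENDED_RATE_HZ:
        # Accepted, but below the recommended rate for accuracy
        warnings.append("sample rate below 16 kHz may reduce accuracy")
    return warnings
```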

Processing Time Estimates

Longer files are processed in parallel, so processing time grows much more slowly than audio duration:

File Duration	Typical Processing Time
< 5 minutes	45-75 seconds
5-20 minutes	1-3 minutes
20-60 minutes	3-6 minutes
1-3 hours	6-18 minutes
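The table can be turned into a rough estimate of how long to keep polling. The sketch below is hypothetical. It encodes the upper bound of each row and applies a 2× safety margin to derive a polling deadline. That margin is an arbitrary assumption, and the table's figures are typical times, not guarantees.

```python
# (upper bound of file duration in minutes, upper bound of typical processing time in seconds)
PROCESSING_TABLE = [
    (5, 75),        # < 5 minutes   -> 45-75 seconds
    (20, 3 * 60),   # 5-20 minutes  -> 1-3 minutes
    (60, 6 * 60),   # 20-60 minutes -> 3-6 minutes
]
LONG_FILE_UPPER_S = 18 * 60  # files of 1 hour or more -> 6-18 minutes


def polling_deadline_s(duration_min: float, margin: float = 2.0) -> float:
    """Suggest a max wait in seconds before treating a job as stuck.

    A duration exactly on a row boundary falls into the longer row (conservative).
    """
    for max_minutes, upper_s in PROCESSING_TABLE:
        if duration_min < max_minutes:
            return upper_s * margin
    return LONG_FILE_UPPER_S * margin
```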

Paragraph Summary Feature

The Paragraph Summary feature splits a transcript or translation into paragraphs and produces a concise summary of each one, so long recordings can be skimmed quickly.

How it works

Transcript Processing: After transcription is complete, the text is automatically split into logical paragraphs using the CompositeTextSplitter.
Summary Generation: Each paragraph is passed through the ParagraphSummarizer to produce a concise summary.
Translation Integration: When translations are requested, summaries are generated separately from each locale's translated text.
Asynchronous Processing: Summaries are generated asynchronously after the transcript or translation is finished. The job reaches COMPLETED only once they are done.
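The flow above can be illustrated with a toy model. CompositeTextSplitter and ParagraphSummarizer are server-side components whose behavior is not documented here. The sketch therefore substitutes a naive blank-line splitter and a first-sentence "summary", purely to show the data flow: one summary per paragraph, computed separately for the transcript and for each translation locale. None of the names below are part of the API.

```python
from __future__ import annotations

import re


def split_paragraphs(text: str) -> list[str]:
    """Toy stand-in for CompositeTextSplitter: split on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def summarize(paragraph: str) -> str:
    """Toy stand-in for ParagraphSummarizer: keep only the first sentence."""
    return re.split(r"(?<=[.!?])\s+", paragraph, maxsplit=1)[0]


def paragraph_summaries(transcript: str, translations: dict[str, str]) -> dict[str, list[str]]:
    """Summaries keyed by 'transcript' and by each translation locale."""
    results = {"transcript": [summarize(p) for p in split_paragraphs(transcript)]}
    for locale, text in translations.items():
        # Summarize the translated text itself; source summaries are not translated
        results[locale] = [summarize(p) for p in split_paragraphs(text)]
    return results
```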

Sequence

See Also

Step-by-Step Tutorial

Complete walkthrough with code examples for processing audio files

Voice File API Reference

Complete API documentation with interactive examples for all Voice File endpoints
