Question 1

What audio formats are accepted?

Accepted Answer

MP3, M4A, WAV and WMA files, up to 100 MB each. For longer recordings, split the audio into 30- to 60-minute segments with a tool such as Audacity or GarageBand.
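If you would rather script the split than use an audio editor, here is a minimal sketch that cuts an uncompressed WAV file into consecutive parts using Python's standard-library `wave` module. It is an illustration, not a KitsAI tool. The `split_wav` helper and the `_partNN` file naming are hypothetical. The sketch also assumes the 100 MB limit means 100,000,000 bytes, which is the more conservative reading. MP3, M4A and WMA files would first need decoding with another tool (ffmpeg, for example), which this sketch does not cover.

```python
import wave
from pathlib import Path

# Assumption: the 100 MB upload limit is read as 100,000,000 bytes (conservative).
UPLOAD_LIMIT_BYTES = 100_000_000
HEADER_MARGIN_BYTES = 1024  # headroom for the WAV header


def split_wav(src, segment_minutes=30, out_dir=".", max_bytes=UPLOAD_LIMIT_BYTES):
    """Split an uncompressed WAV file into consecutive parts.

    Each part lasts at most segment_minutes and, as a safeguard, holds at most
    max_bytes of audio data. Hypothetical helper for illustration only.
    Returns the paths of the written parts, in order.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        bytes_per_frame = params.nchannels * params.sampwidth
        frames_by_time = int(params.framerate * 60 * segment_minutes)
        frames_by_size = (max_bytes - HEADER_MARGIN_BYTES) // bytes_per_frame
        frames_per_part = max(1, min(frames_by_time, frames_by_size))

        index = 1
        while True:
            frames = reader.readframes(frames_per_part)  # one part in memory at a time
            if not frames:
                break  # end of file reached
            part_path = out_dir / f"{src.stem}_part{index:02d}.wav"
            with wave.open(str(part_path), "wb") as writer:
                writer.setparams(params)  # same channels, sample width and rate
                writer.writeframes(frames)  # header frame count is corrected on write
            written.append(part_path)
            index += 1
    return written
```

For example, `split_wav("meeting.wav", segment_minutes=45)` would write `meeting_part01.wav`, `meeting_part02.wav` and so on into the current folder. The size cap matters for WAV. Uncompressed 16-bit, 44.1 kHz stereo audio uses about 10.6 MB per minute, so at that quality the parts come out at roughly 9 minutes each rather than 45.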

Question 2

Does the engine automatically detect the language?

Accepted Answer

Yes. Whisper detects the spoken language automatically and transcribes in that language. Results are best for French and English.

Question 3

How long does transcription take?

Accepted Answer

Processing takes about 10 to 20% of the audio duration: a 10-minute recording is typically ready in 1 to 3 minutes. You can close the tab while you wait; the transcript will appear in your library.

Question 4

Is the text timestamped?

Accepted Answer

Yes. The downloadable text file contains the full text with timestamps, making it easy to navigate long recordings.

Question 5

Is transcription accurate for accents and technical terms?

Accepted Answer

Whisper transcribes regional accents and common terms with high accuracy. For highly specialized terminology, we recommend proofreading the transcript.

Automatic audio transcription with AI: voice to text in minutes

Why choose KitsAI?

Meetings and committees

Interviews and reporting

Voice notes

Podcasts and audio content


Ready to transform your workflow?