Question 1

What is speaker separation?

Accepted Answer

Speaker separation uses speaker diarization to detect how many speakers are in an audio file and who is talking when, then extracts each speaker's segments into a separate track. It is ideal for podcasts, interviews, and meeting recordings where multiple people talk in turn.
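For readers who want a mental model of "one track per speaker", here is a minimal sketch. It assumes diarization yields a list of (start, end, speaker label) segments; that data shape, the labels, and the function name are illustrative assumptions, not the service's actual output format.

```python
from collections import defaultdict

def group_segments_by_speaker(segments):
    """Group (start_sec, end_sec, speaker_label) segments into one list per speaker."""
    tracks = defaultdict(list)
    # Sort by start time so each speaker's segments stay in chronological order.
    for start, end, speaker in sorted(segments):
        tracks[speaker].append((start, end))
    return dict(tracks)

# Hypothetical diarization output for a two-person interview.
segments = [
    (0.0, 4.2, "speaker_1"),
    (4.2, 9.8, "speaker_2"),
    (9.8, 12.5, "speaker_1"),
]
tracks = group_segments_by_speaker(segments)
```

Each key in `tracks` corresponds to one speaker, and its time ranges are the parts of the recording that would end up in that speaker's track.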

Question 2

How is this different from vocal/instrument separation?

Accepted Answer

Vocal/instrument separation splits a song into vocals and accompaniment, so it is meant for music. Speaker separation tells different speakers apart in a conversation. They are two different tools for two different jobs.

Question 3

Can I specify the number of speakers?

Accepted Answer

Yes. You can give an exact number (e.g. 2, 3, or 4 speakers) or leave it blank for auto-detection. If you already know how many speakers are present, specifying the number improves accuracy.

Question 4

Which audio formats and file sizes are supported?

Accepted Answer

MP3, WAV, M4A, FLAC, and OGG files up to 50 MB are supported. For best diarization accuracy, use clear recordings with minimal background music.
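If you prepare files in bulk, a quick local check against the limits above can save a failed upload. This is a rough sketch: the function name is made up for illustration, and it assumes "50 MB" means 50 × 1024 × 1024 bytes, which the page does not specify.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}
MAX_SIZE_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" read as 50 MiB

def check_upload(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()  # case-insensitive extension check
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 50 MB")
    return problems
```

For a real file, the size can be read with `Path(path).stat().st_size` before calling `check_upload`.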

Question 5

How long does speaker separation take?

Accepted Answer

Most recordings finish in 1–3 minutes. Longer files or more speakers take a bit more time; the page polls automatically and shows the result as soon as it is ready.
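The automatic polling mentioned above follows a common pattern: ask for the task status at a fixed interval until it finishes or a time limit passes. The sketch below shows that pattern in general terms. The status strings, the interval, and the `get_status` callable are all assumptions; the page does not document how its own polling is implemented.

```python
import time

def wait_for_result(get_status, interval_sec=5.0, timeout_sec=600.0, sleep=time.sleep):
    """Call get_status() until it returns 'done' or 'failed', or the timeout passes."""
    waited = 0.0
    while waited <= timeout_sec:
        status = get_status()  # hypothetical callable returning a status string
        if status in ("done", "failed"):
            return status
        sleep(interval_sec)
        waited += interval_sec
    return "timeout"
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.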

Question 6

Where can I find my separated tracks later?

Accepted Answer

Each task is saved to your account. You can download every speaker track right on this page, and finished tasks also appear in your voice library history.

AI Speaker Separation


How AI Speaker Separation Works

Upload Audio

Set Speaker Count

AI Diarization

Preview & Download

Why Use Our AI Speaker Separation

Accurate Speaker Diarization

Auto Speaker Detection

One Track per Speaker

Built for Podcasts & Interviews

Fast Cloud Processing

Saved to Your Library

Frequently Asked Questions
