autorag.diarize
Lazy-loaded pyannote speaker-diarization pipeline with CUDA->CPU fallback.
Mirrors whisper_runner.py in structure: a module-level cache, a threading.Lock, and a sticky fallback in which a single CUDA failure switches the process to CPU for the rest of its lifetime.
Public surface:
  - get_pipeline() -> Pipeline | None: None means “no token / load failed”; callers should fall back to single-speaker behavior.
  - diarize_file(path) -> list[(start_s, end_s, speaker_label)]
  - assign_speakers(words, turns) -> list[str]: one label per word, parallel to words; “0” when there are no turns.
- autorag.diarize.assign_speakers(words, turns)
Assign a speaker label to each word.
Strategy: pick the turn with maximum temporal overlap with the word’s [s, e] interval. If no turn overlaps, fall back to the nearest turn (by midpoint distance). If turns is empty, every word becomes “0”.
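The strategy above can be sketched as below. Assumptions: each word is a (start_s, end_s) pair (the real words objects may carry more fields), turns use the (start, end, label) shape returned by diarize_file, "midpoint distance" means the distance between the word's midpoint and the turn's midpoint, and ties go to the earliest turn. The function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Word = Tuple[float, float]       # assumed (start_s, end_s) for each word
Turn = Tuple[float, float, str]  # (start_s, end_s, speaker_label)


def assign_speakers_sketch(words: Sequence[Word], turns: Sequence[Turn]) -> List[str]:
    if not turns:
        return ["0"] * len(words)  # no diarization: single-speaker fallback
    labels: List[str] = []
    for s, e in words:
        best_label: Optional[str] = None
        best_overlap = 0.0
        for ts, te, label in turns:
            overlap = min(e, te) - max(s, ts)  # negative when disjoint
            if overlap > best_overlap:  # strict >, so the earliest turn wins ties
                best_label, best_overlap = label, overlap
        if best_label is None:
            # No positive overlap: pick the turn whose midpoint is closest.
            mid = (s + e) / 2
            nearest = min(turns, key=lambda t: abs((t[0] + t[1]) / 2 - mid))
            best_label = nearest[2]
        labels.append(best_label)
    return labels
```

Note that midpoint distance can disagree with edge distance: a word at 6.0 to 6.4 s lies closer to the end of a 2.0 to 5.0 s turn than to the start of an 8.0 to 9.0 s turn, yet the latter's midpoint is nearer.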
- autorag.diarize.diarize_file(file_path)
Run diarization. Returns sorted [(start, end, label), …] or [] on failure.
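One way to produce the sorted [(start, end, label), …] list is to walk the pipeline's output with itertracks. The sketch assumes pyannote.audio 3.x, where calling a diarization pipeline on a file returns a pyannote.core.Annotation whose itertracks(yield_label=True) yields (segment, track, label) triples. The helper names and the injected run callable are hypothetical.

```python
from typing import Any, Callable, Iterable, List, Tuple


def annotation_to_turns(tracks: Iterable[Tuple[Any, Any, Any]]) -> List[Tuple[float, float, str]]:
    # Each segment exposes .start and .end in seconds; the track id is unused.
    turns = [(float(seg.start), float(seg.end), str(label)) for seg, _track, label in tracks]
    turns.sort()  # by start, then end, then label
    return turns


def diarize_sketch(run: Callable[[str], Any], path: str) -> List[Tuple[float, float, str]]:
    """`run` stands in for a loaded pipeline; any error yields []."""
    try:
        annotation = run(path)
        return annotation_to_turns(annotation.itertracks(yield_label=True))
    except Exception:
        return []
```

With a real pipeline this would be called as diarize_sketch(pipeline, "meeting.wav").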
pyannote/torchaudio reliably decode only a small set of containers (wav, flac); everything else (webm, mp3, m4a, ogg, …) is first transcoded with ffmpeg to a temporary 16 kHz mono wav.
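The transcoding step might look like the sketch below. It assumes the container is identified by file extension (the real code may probe the file instead), and the helper names are hypothetical. The ffmpeg flags are standard: -y overwrites the output, -vn drops any video stream, -ac 1 downmixes to mono, and -ar 16000 resamples to 16 kHz.

```python
import os
import subprocess
import tempfile
from pathlib import Path
from typing import List, Tuple

# Assumption: "decodes reliably" is approximated by these extensions.
NATIVE_EXTS = {".wav", ".flac"}


def needs_transcode(path: str) -> bool:
    return Path(path).suffix.lower() not in NATIVE_EXTS


def ffmpeg_cmd(src: str, dst: str) -> List[str]:
    # Overwrite dst, drop video, downmix to mono, resample to 16 kHz.
    return ["ffmpeg", "-y", "-i", src, "-vn", "-ac", "1", "-ar", "16000", dst]


def to_decodable_wav(path: str) -> Tuple[str, bool]:
    """Return (path_to_decode, is_temp); delete the file afterwards if is_temp."""
    if not needs_transcode(path):
        return path, False
    fd, tmp = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # ffmpeg reopens the path itself
    try:
        subprocess.run(ffmpeg_cmd(path, tmp), check=True, capture_output=True)
    except Exception:
        os.unlink(tmp)  # don't leak the temp file if ffmpeg fails
        raise
    return tmp, True
```

Returning an is_temp flag keeps cleanup with the caller, which can remove the wav in a finally block once diarization has finished.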