autorag.diarize

Lazy-loaded pyannote speaker-diarization pipeline with CUDA->CPU fallback.

Mirrors whisper_runner.py in structure: a module-level cache, a threading.Lock, and a sticky fallback in which a single CUDA failure switches the process to CPU for the rest of its life.
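The pattern described above can be sketched roughly as follows. This is an illustration only, not the module's actual code: the names get_cached, loader, and cuda_available are hypothetical, and the real module presumably detects CUDA itself and may also flip to CPU on inference-time failures rather than only at load time.

```python
import threading
from typing import Any, Callable, Optional

_lock = threading.Lock()
_cached: Optional[Any] = None   # module-level cache (hypothetical name)
_load_attempted = False         # load at most once per process
_force_cpu = False              # sticky: once True, CUDA is never tried again


def get_cached(loader: Callable[[str], Any], cuda_available: bool) -> Optional[Any]:
    """Return the cached object, loading it on first call.

    `loader(device)` is a hypothetical factory that builds the pipeline
    on "cuda" or "cpu" and raises on failure.
    """
    global _cached, _load_attempted, _force_cpu
    with _lock:
        if _load_attempted:
            return _cached
        _load_attempted = True
        device = "cuda" if cuda_available and not _force_cpu else "cpu"
        try:
            _cached = loader(device)
        except Exception:
            _cached = None
            if device == "cuda":
                # One CUDA failure flips the process to CPU for good.
                _force_cpu = True
                try:
                    _cached = loader("cpu")
                except Exception:
                    _cached = None
        return _cached
```

Holding the lock for the whole load means concurrent first callers wait for a single load instead of each building their own copy.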

Public surface:

  • get_pipeline() -> Pipeline | None (None means “no token / load failed”; callers should fall back to single-speaker behavior)

  • diarize_file(path) -> list[(start_s, end_s, speaker_label)]

  • assign_speakers(words, turns) -> list[str] (parallel labels, ‘0’ fallback)

autorag.diarize.assign_speakers(words, turns)[source]

Assign a speaker label to each word.

Strategy: pick the turn with maximum temporal overlap with the word’s [s, e] interval. If no turn overlaps, fall back to the nearest turn (by midpoint distance). If turns is empty, every word becomes “0”.
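A minimal sketch of this strategy, under stated assumptions: each word is taken to be a dict with "s" and "e" keys (the real word type is not documented here), turns are (start, end, label) tuples as returned by diarize_file, and "midpoint distance" is read as the distance between the word's midpoint and the turn's midpoint. The name assign_speakers_sketch is hypothetical, and the real implementation may differ.

```python
from typing import Sequence

Turn = tuple[float, float, str]  # (start_s, end_s, speaker_label)


def assign_speakers_sketch(words: Sequence[dict], turns: Sequence[Turn]) -> list[str]:
    """Label each word with the speaker of the best-matching turn."""
    labels = []
    for w in words:
        s, e = w["s"], w["e"]  # assumed word layout
        if not turns:
            labels.append("0")  # no diarization available
            continue

        def overlap(t: Turn) -> float:
            # Length of the intersection of [s, e] and the turn; <= 0 if disjoint.
            return min(e, t[1]) - max(s, t[0])

        best = max(turns, key=overlap)
        if overlap(best) <= 0:
            # No turn overlaps: fall back to the nearest turn by midpoint.
            mid = (s + e) / 2
            best = min(turns, key=lambda t: abs((t[0] + t[1]) / 2 - mid))
        labels.append(best[2])
    return labels
```

For example, with turns [(0.0, 2.0, "SPEAKER_00"), (2.0, 5.0, "SPEAKER_01")], a word spanning 1.8 to 2.6 overlaps the second turn by 0.6 s and the first by 0.2 s, so it is assigned "SPEAKER_01".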

Parameters:

  • words

  • turns

Return type:

list[str]

autorag.diarize.diarize_file(file_path)[source]

Run diarization. Returns sorted [(start, end, label), …] or [] on failure.

pyannote/torchaudio only decodes a small set of containers reliably (wav, flac); everything else (webm, mp3, m4a, ogg, …) is transcoded to a temporary 16 kHz mono wav with ffmpeg first.
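The transcoding step might look roughly like the sketch below. The helper names (needs_transcode, ffmpeg_cmd, to_wav_16k_mono) and the exact extension check are assumptions for illustration; only the ffmpeg flags themselves (-i, -ac, -ar, -y, -loglevel) are standard, and the module's actual invocation may differ.

```python
import os
import subprocess
import tempfile

# Containers assumed to decode reliably without transcoding.
DIRECT_EXTS = {".wav", ".flac"}


def needs_transcode(path: str) -> bool:
    """True if the file extension is not one of the directly decodable ones."""
    return os.path.splitext(path)[1].lower() not in DIRECT_EXTS


def ffmpeg_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that writes 16 kHz mono audio to dst."""
    return [
        "ffmpeg", "-y",          # overwrite the (empty) temp file
        "-loglevel", "error",    # keep stderr quiet unless something fails
        "-i", src,
        "-ac", "1",              # mono
        "-ar", "16000",          # 16 kHz sample rate
        dst,
    ]


def to_wav_16k_mono(src: str) -> str:
    """Transcode src to a temporary wav file; the caller deletes it afterwards."""
    fd, dst = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        subprocess.run(ffmpeg_cmd(src, dst), check=True)
    except Exception:
        os.unlink(dst)  # do not leak the temp file on failure
        raise
    return dst
```

Building the command in a separate function keeps the flags easy to inspect without ffmpeg installed.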

Parameters:

file_path (str)

Return type:

list[tuple[float, float, str]]

autorag.diarize.get_pipeline()[source]

Return the cached pyannote pipeline, loading on first call.

Returns None if HF_TOKEN is missing or the load fails; callers MUST handle None by skipping diarization.
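A caller-side sketch of that contract is shown below. The function label_words is hypothetical, and the three module functions are passed in as arguments only so the example stays self-contained; real code would import them from autorag.diarize. The single-speaker fallback uses the "0" label described for assign_speakers.

```python
from typing import Any, Callable, Optional, Sequence


def label_words(
    path: str,
    words: Sequence[Any],
    get_pipeline: Callable[[], Optional[Any]],
    diarize_file: Callable[[str], list],
    assign_speakers: Callable[[Sequence[Any], list], list],
) -> list:
    """Return one speaker label per word, skipping diarization if unavailable."""
    if get_pipeline() is None:
        # No token or load failure: treat the recording as single-speaker.
        return ["0"] * len(words)
    turns = diarize_file(path)  # [] on failure
    return assign_speakers(words, turns)
```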

Return type:

Any | None