`autorag.diarize`

Lazy-loaded pyannote speaker-diarization pipeline with CUDA->CPU fallback.

Mirrors whisper_runner.py in structure: a module-level cache guarded by a threading.Lock, plus a one-way device fallback. A single CUDA failure switches the process to CPU for the rest of its lifetime.

Public surface:

get_pipeline() -> Pipeline | None (None means “no token / load failed”; callers should fall back to single-speaker behavior)
diarize_file(path) -> list[(start_s, end_s, speaker_label)]
assign_speakers(words, turns) -> list[str] (one label per word, in the same order as words; “0” is the fallback label when turns is empty)

autorag.diarize.assign_speakers(words, turns)

Assign a speaker label to each word.

Strategy: pick the turn with maximum temporal overlap with the word’s [s, e] interval. If no turn overlaps, fall back to the nearest turn (by midpoint distance). If turns is empty, every word becomes “0”.

Parameters:

words (list[dict[str, Any]])
turns (list[tuple[float, float, str]])

Return type:

list[str]
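
The strategy described above can be sketched as follows. This is an illustration, not the module's actual code: the function name `assign_speakers_sketch` is hypothetical, the word keys `"s"` and `"e"` are assumed from the “[s, e]” notation, and “midpoint distance” is read as the distance between the word's midpoint and each turn's midpoint.

```python
from typing import Any

Turn = tuple[float, float, str]  # (start_s, end_s, speaker_label)


def assign_speakers_sketch(words: list[dict[str, Any]], turns: list[Turn]) -> list[str]:
    """Hypothetical re-implementation of the documented strategy."""
    if not turns:
        # No diarization turns at all: every word gets the fallback label.
        return ["0"] * len(words)

    labels: list[str] = []
    for word in words:
        # Assumption: each word dict carries its interval under "s" and "e".
        s, e = float(word["s"]), float(word["e"])

        best_label = None
        best_overlap = 0.0
        for t_start, t_end, label in turns:
            # Length of the intersection; zero or negative means no overlap.
            overlap = min(e, t_end) - max(s, t_start)
            if overlap > best_overlap:
                best_label, best_overlap = label, overlap

        if best_label is None:
            # Nothing overlaps: fall back to the turn with the closest midpoint.
            mid = (s + e) / 2
            nearest = min(turns, key=lambda t: abs((t[0] + t[1]) / 2 - mid))
            best_label = nearest[2]

        labels.append(best_label)
    return labels
```

In this sketch a zero-length word (s equal to e) never has positive overlap, so it always takes the nearest-turn path.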

autorag.diarize.diarize_file(file_path)

Run diarization. Returns sorted [(start, end, label), …] or [] on failure.

pyannote/torchaudio only decodes a small set of containers reliably (wav, flac); everything else (webm, mp3, m4a, ogg, …) is transcoded to a temporary 16 kHz mono wav with ffmpeg first.

Parameters:

file_path (str)

Return type:

list[tuple[float, float, str]]
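
The transcoding step could look roughly like the sketch below. The helper names (`needs_transcode`, `ffmpeg_cmd`, `to_wav_16k_mono`) are hypothetical and the exact ffmpeg invocation used by the module is not documented here; the flags shown are standard ffmpeg options for mono, 16 kHz, audio-only output.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Assumption: only these containers are passed to pyannote untouched.
DIRECT_EXTS = {".wav", ".flac"}


def needs_transcode(path: str) -> bool:
    """True when the file should be converted to wav before diarization."""
    return Path(path).suffix.lower() not in DIRECT_EXTS


def ffmpeg_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that writes 16 kHz mono audio to dst."""
    return [
        "ffmpeg", "-y",          # overwrite dst if it exists
        "-loglevel", "error",    # keep stderr quiet unless something fails
        "-i", src,
        "-vn",                   # drop any video stream
        "-ac", "1",              # mono
        "-ar", "16000",          # 16 kHz sample rate
        dst,
    ]


def to_wav_16k_mono(src: str) -> str:
    """Transcode src to a temporary wav file and return its path.

    The caller is responsible for deleting the returned file.
    """
    fd, dst = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        subprocess.run(ffmpeg_cmd(src, dst), check=True, capture_output=True)
    except Exception:
        os.unlink(dst)  # do not leave a partial temp file behind
        raise
    return dst
```

A caller following the documented contract would catch the exception from `to_wav_16k_mono` and return `[]`, since `diarize_file` reports failure with an empty list rather than raising.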

autorag.diarize.get_pipeline()

Return the cached pyannote pipeline, loading on first call.

Returns None if HF_TOKEN is missing or load failed; callers MUST handle None by skipping diarization.

Return type:

Any | None
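
A minimal sketch of the lazy-load cache and the one-way CUDA-to-CPU switch is shown below. It is written under several assumptions: the names `get_pipeline_sketch`, `run_with_fallback`, and the injected `loader` and `run` callables are hypothetical stand-ins for the real pyannote calls, a failed load is remembered rather than retried, and CUDA failures are assumed to surface as `RuntimeError`.

```python
import os
import threading
from typing import Any, Callable, Optional

_lock = threading.Lock()
_pipeline: Optional[Any] = None
_load_attempted = False   # assumption: a failed load is cached, not retried
_cuda_failed = False      # once True, the process stays on CPU


def get_pipeline_sketch(loader: Callable[[str], Any]) -> Optional[Any]:
    """Return the cached pipeline, calling loader(token) on first use only."""
    global _pipeline, _load_attempted
    with _lock:
        if _load_attempted:
            return _pipeline
        token = os.environ.get("HF_TOKEN")
        if not token:
            return None  # no token: callers skip diarization
        _load_attempted = True
        try:
            _pipeline = loader(token)
        except Exception:
            _pipeline = None  # load failed: callers skip diarization
        return _pipeline


def run_with_fallback(run: Callable[[str], Any]) -> Any:
    """Call run("cuda") until the first failure, then run("cpu") from then on."""
    global _cuda_failed
    if not _cuda_failed:
        try:
            return run("cuda")
        except RuntimeError:
            with _lock:
                _cuda_failed = True
    return run("cpu")
```

On the calling side, the contract is simply to check for None before doing anything else:

```python
pipeline = get_pipeline_sketch(lambda token: object())  # placeholder loader
if pipeline is None:
    pass  # skip diarization and treat the audio as single-speaker
```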
