Audio pipeline

Three modules sit behind AutoRAG.transcribe:

autorag.whisper_runner — whisperX (faster-whisper + wav2vec2 forced alignment) transcription with frame-accurate word timestamps and a CUDA→CPU fallback.
autorag.diarize — pyannote 3.1 speaker diarization. Sets the speaker field on every WordSpan.
autorag.audio_source — YouTube URL detection and a context manager that downloads remote audio to a temp file while exposing yt-dlp metadata.
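
To make the diarization step above more concrete, here is a minimal sketch of one common way to attach speaker labels to timestamped words: give each word the speaker whose diarization turn overlaps it the most. This is an illustration under stated assumptions, not the actual autorag.diarize code. The WordSpan fields shown here (word, start, end, speaker), the Turn class, and the assign_speakers helper are all hypothetical; the real WordSpan may have a different shape, and the real module may resolve overlaps differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordSpan:
    # Hypothetical shape; the real autorag WordSpan may carry other fields.
    word: str
    start: float  # seconds
    end: float    # seconds
    speaker: Optional[str] = None


@dataclass
class Turn:
    # Hypothetical container for one diarization segment:
    # a speaker label over a time interval in seconds.
    speaker: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[WordSpan], turns: List[Turn]) -> List[WordSpan]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn get speaker=None.
    """
    for w in words:
        best, best_overlap = None, 0.0
        for t in turns:
            o = overlap(w.start, w.end, t.start, t.end)
            if o > best_overlap:
                best, best_overlap = t.speaker, o
        w.speaker = best
    return words


words = [
    WordSpan("hello", 0.10, 0.45),
    WordSpan("there", 0.50, 0.90),
    WordSpan("hi", 1.20, 1.40),
]
turns = [Turn("SPEAKER_00", 0.0, 1.0), Turn("SPEAKER_01", 1.0, 2.0)]
assign_speakers(words, turns)
print([(w.word, w.speaker) for w in words])
```

The nested loop is quadratic in the worst case, which is fine for short clips. For long recordings, sorting the turns and sweeping through words and turns together would be a reasonable refinement.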