Audio pipeline

Three modules sit behind AutoRAG.transcribe:

  • autorag.whisper_runner — whisperX transcription (faster-whisper decoding, then wav2vec2 forced alignment) with frame-accurate word timestamps and a CUDA→CPU fallback.

  • autorag.diarize — pyannote 3.1 speaker diarization; sets the speaker field on every WordSpan.

  • autorag.audio_source — YouTube URL detection and a context manager that downloads remote audio to a temp file while exposing yt-dlp metadata.
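autorag.diarize has to map pyannote's speaker turns onto word-level spans. A common approach is to give each word the speaker whose turns overlap it for the most total time. The sketch below assumes that rule. The `WordSpan` dataclass here is a hypothetical minimal stand-in for the real one, and `assign_speakers` is an illustrative name. Turns are plain `(start, end, label)` tuples, which could be built from a pyannote `Annotation` with `[(seg.start, seg.end, label) for seg, _, label in annotation.itertracks(yield_label=True)]`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class WordSpan:
    # Hypothetical minimal shape; the real autorag WordSpan may carry more fields.
    text: str
    start: float  # seconds
    end: float    # seconds
    speaker: Optional[str] = None


Turn = Tuple[float, float, str]  # (start seconds, end seconds, speaker label)


def assign_speakers(words: List[WordSpan], turns: Iterable[Turn]) -> List[WordSpan]:
    """Set each word's speaker to the label with the largest total overlap."""
    turn_list = list(turns)
    for word in words:
        overlap: Dict[str, float] = {}
        for t_start, t_end, label in turn_list:
            # Length of the intersection of [word.start, word.end] and [t_start, t_end].
            inter = min(word.end, t_end) - max(word.start, t_start)
            if inter > 0:
                overlap[label] = overlap.get(label, 0.0) + inter
        # Words that fall inside no turn (e.g. a gap between turns) keep speaker=None.
        word.speaker = max(overlap, key=overlap.get) if overlap else None
    return words


words = [WordSpan("hello", 0.0, 0.5), WordSpan("there", 0.9, 1.4), WordSpan("um", 5.0, 5.2)]
turns = [(0.0, 1.0, "SPEAKER_00"), (1.0, 3.0, "SPEAKER_01")]
for w in assign_speakers(words, turns):
    print(w.text, w.speaker)
```

The word "there" straddles a speaker change (0.1 s in the first turn, 0.4 s in the second), so it goes to the second speaker. The nested loop is O(words × turns); for long recordings, sorting turns and sweeping both lists once would be the natural refinement.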
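For the YouTube URL detection in autorag.audio_source, checking the parsed host is less brittle than matching the whole string with a regex. This is a sketch under assumptions: the `is_youtube_url` name and the host allow-list are illustrative, and the module's actual rule may accept more forms or also validate the video ID.

```python
from urllib.parse import urlparse

# Assumption: these hosts cover the common YouTube URL forms (watch, shorts, youtu.be links).
_YOUTUBE_HOSTS = {
    "youtube.com",
    "www.youtube.com",
    "m.youtube.com",
    "music.youtube.com",
    "youtu.be",
    "www.youtu.be",
}


def is_youtube_url(value: str) -> bool:
    """Return True if `value` is an http(s) URL whose host is a known YouTube host."""
    try:
        parsed = urlparse(value.strip())
    except ValueError:  # e.g. malformed IPv6 brackets
        return False
    if parsed.scheme not in ("http", "https"):  # urlparse lowercases the scheme
        return False
    # .hostname is lowercased and stripped of any port or credentials.
    return (parsed.hostname or "") in _YOUTUBE_HOSTS


for candidate in ("https://www.youtube.com/watch?v=abc123", "https://youtu.be/abc123", "/tmp/talk.wav"):
    print(candidate, is_youtube_url(candidate))
```

Local paths fall through to `False`, so a caller can route anything that is not a YouTube URL straight to the local-file path.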