autorag.whisper_runner
Lazy-loaded whisperX model cache with CUDA-preferred / CPU-fallback device selection.
The main transcription model (CTranslate2 / faster-whisper backend) is removed from the module cache after each run so that Python's garbage collector can free its VRAM. The smaller wav2vec2 alignment model is offloaded to CPU after aligning and restored on the next call (a PyTorch .to() round-trip). Both are re-created from the local Hugging Face cache on the next pipeline run, which is fast (under 1 s for models that are already downloaded).
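The load-on-demand and evict-after-run pattern described above can be sketched as follows. This is a minimal illustration, not the module's actual code: `_CACHE`, `get_cached`, `evict`, `FakeModel`, and `fake_loader` are hypothetical names, and a stand-in object replaces the real whisperX model so the example runs without a GPU.

```python
import gc
from typing import Callable, Dict, List, Tuple

# Hypothetical module-level cache keyed by (size, device).
# This is an assumption for illustration, not the real autorag internals.
_CACHE: Dict[Tuple[str, str], object] = {}


def get_cached(size: str, device: str, loader: Callable[[str, str], object]) -> object:
    """Load the model on first use, then reuse the cached instance."""
    key = (size, device)
    if key not in _CACHE:
        _CACHE[key] = loader(size, device)
    return _CACHE[key]


def evict(size: str, device: str) -> None:
    """Drop the cache's reference so the garbage collector can reclaim the model."""
    _CACHE.pop((size, device), None)
    gc.collect()


class FakeModel:
    """Stand-in for a real model; only records how it was created."""

    def __init__(self, size: str, device: str) -> None:
        self.size = size
        self.device = device


loads: List[Tuple[str, str]] = []


def fake_loader(size: str, device: str) -> FakeModel:
    loads.append((size, device))  # count how often a load really happens
    return FakeModel(size, device)


first = get_cached("small", "cpu", fake_loader)
second = get_cached("small", "cpu", fake_loader)  # cache hit, no second load
evict("small", "cpu")
third = get_cached("small", "cpu", fake_loader)   # re-created after eviction
```

Eviction only frees memory once no other reference to the model remains, so callers should not hold on to the returned object after a run.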
- autorag.whisper_runner.get_model(size, device_hint=None)
Return a cached whisperX model for size.
device_hint is advisory: it is ignored when the process is already CPU-pinned.
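One way to read the "advisory" behaviour is sketched below. The function `choose_device` and its `cuda_available` and `cpu_pinned` parameters are hypothetical and written as a pure function for clarity; the real module may determine these conditions differently, and the exact meaning of each hint value is an assumption.

```python
from typing import Optional


def choose_device(device_hint: Optional[str], cuda_available: bool, cpu_pinned: bool) -> str:
    """Pick a device string, preferring CUDA and falling back to CPU.

    All three inputs are assumptions made for this sketch.
    """
    if cpu_pinned:
        # A CPU-pinned process ignores the hint entirely.
        return "cpu"
    if device_hint == "cpu":
        # The caller explicitly asked for CPU.
        return "cpu"
    # No hint, or a CUDA hint: use CUDA when present, otherwise fall back.
    return "cuda" if cuda_available else "cpu"


pinned = choose_device("cuda", cuda_available=True, cpu_pinned=True)
preferred = choose_device(None, cuda_available=True, cpu_pinned=False)
fallback = choose_device("cuda", cuda_available=False, cpu_pinned=False)
```

In a real process the availability flag would typically come from `torch.cuda.is_available()`.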
- autorag.whisper_runner.resolved_device()
Return the device most recently used (or the preference if nothing loaded yet).
- Return type:
- autorag.whisper_runner.transcribe_segment(model, file_path, language)
Transcribe file_path and return frame-aligned word dicts.
Each dict has the form {"w": str, "s": float, "e": float, "p": float}. The alignment pass uses wav2vec2 for frame-accurate word timestamps; if alignment fails, the unaligned faster-whisper timestamps are used as a fallback.
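The sketch below shows how a caller might consume the returned word dicts and how an align-or-fall-back step could look. It assumes that "w" is the word text, "s" and "e" are start and end times in seconds, and "p" is a confidence score; the source does not spell these out. The helpers `summarize`, `words_with_fallback`, and `broken_aligner`, and the sample data, are hypothetical.

```python
from typing import Callable, Dict, List, Union

Word = Dict[str, Union[str, float]]


def summarize(words: List[Word], min_p: float = 0.5) -> Dict[str, object]:
    """Join words at or above a confidence threshold and report their time span."""
    kept = [w for w in words if w["p"] >= min_p]
    if not kept:
        return {"text": "", "start": None, "end": None}
    return {
        "text": " ".join(str(w["w"]) for w in kept),
        "start": kept[0]["s"],
        "end": kept[-1]["e"],
    }


def words_with_fallback(unaligned: List[Word],
                        align: Callable[[List[Word]], List[Word]]) -> List[Word]:
    """Return aligned words, or the unaligned ones if the aligner raises."""
    try:
        return align(unaligned)
    except Exception:
        return unaligned


def broken_aligner(words: List[Word]) -> List[Word]:
    raise RuntimeError("alignment failed")  # simulate a failing alignment pass


# Made-up sample data in the documented shape.
words: List[Word] = [
    {"w": "hello", "s": 0.00, "e": 0.42, "p": 0.98},
    {"w": "uh", "s": 0.45, "e": 0.50, "p": 0.21},
    {"w": "world", "s": 0.55, "e": 0.97, "p": 0.93},
]

result = summarize(words)
recovered = words_with_fallback(words, broken_aligner)
```

Catching a broad `Exception` keeps the sketch short; a real pipeline would likely log the failure and catch narrower error types.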