`autorag.whisper_runner`

Lazy-loaded whisperX model cache with CUDA-preferred / CPU-fallback device selection.

The main transcription model (CTranslate2 / faster-whisper backend) is removed from the module cache after each run so that Python's garbage collector can free VRAM. The smaller wav2vec2 alignment model is offloaded to CPU after alignment and moved back on the next call (a PyTorch `.to()` round-trip). Both are re-created from the local Hugging Face cache on the next pipeline run, which is fast (under 1 s for models that are already downloaded).

autorag.whisper_runner.get_model(size, device_hint=None)

Return a cached whisperX model for size.

device_hint is advisory: ignored when the process is already CPU-pinned.

Parameters:

size (str)
device_hint (str | None)

Return type:

Any
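One way to read the "advisory" behavior of device_hint is as a small decision function. The sketch below is an assumption about how such a policy could look, not the module's code: `pick_device`, `cpu_pinned`, and `cuda_available` are hypothetical names, and in practice the CUDA check would typically come from something like `torch.cuda.is_available()`.

```python
from typing import Optional


def pick_device(device_hint: Optional[str], cpu_pinned: bool, cuda_available: bool) -> str:
    """Resolve a device string under a CUDA-preferred, CPU-fallback policy."""
    if cpu_pinned:
        # A CPU-pinned process ignores the hint entirely.
        return "cpu"
    if device_hint == "cpu":
        # An explicit CPU request is honored.
        return "cpu"
    # No hint, or a CUDA hint: prefer CUDA, fall back to CPU if it is unavailable.
    return "cuda" if cuda_available else "cpu"
```

Under this sketch, a "cuda" hint on a machine without CUDA still resolves to "cpu", which matches the fallback described in the module summary.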

autorag.whisper_runner.resolved_device()

Return the device most recently used (or the preferred device if nothing has been loaded yet).

Return type:

str

autorag.whisper_runner.transcribe_segment(model, file_path, language)

Transcribe file_path and return frame-aligned word dicts.

Each dict: {"w": str, "s": float, "e": float, "p": float}. The alignment pass uses wav2vec2 for frame-accurate word timestamps; if it fails, the unaligned faster-whisper timestamps are used as a fallback.

Parameters:

model (Any)
file_path (str)
language (str | None)

Return type:

list[dict[str, Any]]
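A short sketch of how the returned word dicts might be consumed. The helper names are hypothetical, and the key meanings are assumptions: "w" is taken to be the word text, "s" and "e" the start and end times, and "p" a confidence-like score. The sample data is made up for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

Word = Dict[str, Any]


def words_to_text(words: List[Word]) -> str:
    """Join the word texts into one space-separated string."""
    return " ".join(w["w"].strip() for w in words)


def time_span(words: List[Word]) -> Optional[Tuple[float, float]]:
    """Return (start of first word, end of last word), or None for an empty list."""
    if not words:
        return None
    return (words[0]["s"], words[-1]["e"])


def low_confidence(words: List[Word], threshold: float = 0.5) -> List[str]:
    """List the words whose assumed confidence score "p" falls below `threshold`."""
    return [w["w"] for w in words if w["p"] < threshold]


# Made-up sample in the documented shape.
sample = [
    {"w": "hello", "s": 0.0, "e": 0.4, "p": 0.98},
    {"w": "world", "s": 0.5, "e": 0.9, "p": 0.42},
]
print(words_to_text(sample))   # hello world
print(time_span(sample))       # (0.0, 0.9)
print(low_confidence(sample))  # ['world']
```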
