SDK facade (autorag.core)

autorag.core exposes the single public class AutoRAG. Every audio or RAG method performs its heavy imports inside the method body and raises MissingExtraError if the relevant extra is not installed. See Extras model for the mapping of methods to extras.

class autorag.core.AutoRAG(settings=None, store=None, embedder=None, generator=None)

Bases: object

Unified facade for the audio→topics agent and the document-RAG pipeline.

Heavy dependencies (whisper, torch, pyannote, chromadb, …) are loaded lazily on first use, so a base install can import AutoRAG without pulling them in. When a required extra is missing, the method raises MissingExtraError with a hint naming the extra to install.
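The lazy-import behaviour described above can be sketched as follows. This is a minimal illustration, not AutoRAG's actual code: the `require` helper, the `Facade` class, and the wording of the hint are assumptions, and `MissingExtraError` is redefined locally so the snippet runs without the package.

```python
import importlib


class MissingExtraError(ImportError):
    """Local stand-in for autorag.errors.MissingExtraError (illustration only)."""


def require(module_name, extra):
    """Import module_name on demand, or raise with an install hint (hypothetical helper)."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Re-raise as MissingExtraError, keeping the original error as the cause
        raise MissingExtraError(
            f"{module_name!r} is not installed; "
            f"try: pip install 'autorag[{extra}]'"
        ) from exc


class Facade:
    """Toy facade: importing this class never touches the heavy dependency."""

    def transcribe(self, file):
        whisper = require("whisper", "audio")  # the heavy import happens only here
        return whisper
```

Because the stand-in derives from ImportError, as the real exception does, an existing `except ImportError` handler in calling code would also catch it.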

ingest(paths)
Parameters:

paths (list[str | Path])

Return type:

IngestResponse

query(question, top_k=None)
Return type:

QueryResponse

transcribe(file, *, whisper_model='base', language='en')

Run Whisper + diarization on an audio file or YouTube URL.

file is either a local audio file path or a YouTube URL (youtube.com, youtu.be, m.youtube.com, music.youtube.com). YouTube URLs are downloaded to a temporary .webm for the duration of the call.
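One way to tell a YouTube URL from a local path is to compare the URL's host with the four hosts listed above. The sketch below is an assumption about how such a check might look, not the library's implementation; `looks_like_youtube_url` is a hypothetical name, and stripping a leading `www.` is a choice made here for illustration.

```python
from urllib.parse import urlparse

# Hosts named in the transcribe() description
YOUTUBE_HOSTS = {"youtube.com", "youtu.be", "m.youtube.com", "music.youtube.com"}


def looks_like_youtube_url(source):
    """Return True when source is an http(s) URL on a known YouTube host."""
    parsed = urlparse(str(source))
    if parsed.scheme not in ("http", "https"):
        return False  # local paths (including Windows drive letters) end up here
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # assumption: treat www.youtube.com like youtube.com
    return host in YOUTUBE_HOSTS
```

Matching on the parsed hostname rather than on a substring keeps a local path such as `/tmp/youtube.com/a.wav` from being mistaken for a URL.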

Returns raw word spans. Use generate_topics() to build the LLM topic tree, and persist_transcription() / persist_topics() to store results; the persist methods need the separate [rag] extra.

language defaults to English ("en"); pass language=None to let Whisper auto-detect.

Requires pip install 'autorag[audio,diarize]', plus [youtube] when passing a URL.

Return type:

list[WordSpan]
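The four steps named in this entry (transcribe, generate topics, persist the transcription, persist the topics) might be chained as in the sketch below. The method names and keyword arguments come from the signatures on this page; `transcribe_and_store` itself is a hypothetical helper, and the order of the two persist calls follows the recommendation under persist_topics().

```python
def transcribe_and_store(rag, source):
    """Run the documented steps in order on an AutoRAG-like object."""
    # language=None lets Whisper auto-detect instead of assuming English
    words = rag.transcribe(source, whisper_model="base", language=None)
    topics = rag.generate_topics(words)
    # Persist the clip row and words first, then the topic tree
    rag.persist_transcription(source, words)
    rag.persist_topics(source, topics, words=words)
    return words, topics
```

With the [audio,diarize] and [rag] extras installed, this would be called as `transcribe_and_store(AutoRAG(), "talk.mp3")` after `from autorag.core import AutoRAG`.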

generate_topics(words, *, llm_model='gemma4:latest', ollama_base_url=None, num_ctx_l1=8192, num_ctx_fanout=8192, max_concurrency=8, min_subdivide_duration_s=120.0, reasoning=False, boundary_block_seconds=30)

Run LLM topic extraction on pre-computed word spans.

Requires pip install 'autorag[audio,diarize]' (LangChain + Ollama).

Return type:

TopicTree

build_agent(**kwargs)

Return the LangChain Runnable for batched / streaming use.

Same extras as transcribe(). Forwards **kwargs to autorag.agent.build_agent().

Parameters:

kwargs (Any)

Return type:

Runnable[Path | str, TranscriptionResult]
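Since the return value is a LangChain Runnable, the standard `batch()` and `stream()` methods should be available on it. The two helpers below are hypothetical wrappers written against that interface only; what the agent yields while streaming is not specified on this page.

```python
def transcribe_many(agent, sources):
    """Process several paths or URLs with a single batch() call."""
    return agent.batch(list(sources))


def stream_one(agent, source):
    """Yield whatever chunks the Runnable emits for one input."""
    for chunk in agent.stream(source):
        yield chunk
```

A call such as `transcribe_many(AutoRAG().build_agent(), ["a.wav", "b.wav"])` would then return one result per input, in input order.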

transcribe_blocks(file, seconds=10, *, force_retranscribe=False, db_path=None, whisper_model='base', language='en', title=None)

Return the transcription formatted as N-second time blocks.

Resolution order:
1. session_id = derive_session_id(file).

2. If SQLite has a row for session_id with a non-null transcription and force_retranscribe is False, decode it, format it, and return immediately (no [audio] needed).

3. Otherwise run transcribe() and persist_transcription(), then format. Topic generation is not performed here; call generate_topics() and persist_topics() separately.
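The resolution order can be sketched as a small cache-or-compute function. Everything here is illustrative: a plain dict stands in for the SQLite table, and the collaborators (`derive_session_id`, `transcribe`, `format_blocks`) are passed in as callables so the snippet does not depend on the real implementations or their signatures.

```python
def transcribe_blocks_sketch(file, seconds, *, cache, derive_session_id,
                             transcribe, format_blocks,
                             force_retranscribe=False):
    """Follow the three documented steps using injected collaborators."""
    session_id = derive_session_id(file)          # step 1
    words = cache.get(session_id)
    if words is not None and not force_retranscribe:
        return format_blocks(words, seconds)      # step 2: cache hit, no audio work
    words = transcribe(file)                      # step 3: cache miss or forced rerun
    cache[session_id] = words                     # stands in for persist_transcription()
    return format_blocks(words, seconds)
```

The cached path never calls `transcribe`, which mirrors why only the [rag] extra is needed in that case.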

Each non-empty bucket emits one line per speaker turn, MM:SS-MM:SS Speaker K: <words>. See autorag.blocks.format_blocks() for the full algorithm.
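As a rough illustration of the output shape, the sketch below buckets words by start time and emits one line per speaker turn. It is not autorag.blocks.format_blocks(): the `(start_s, speaker, text)` tuple layout is a made-up stand-in for WordSpan, and printing the bucket's bounds as the MM:SS-MM:SS range is an assumption (the real function may print turn bounds instead).

```python
from itertools import groupby


def mmss(seconds):
    """Format a number of seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_blocks_sketch(words, seconds=10):
    """words: (start_s, speaker, text) tuples sorted by start_s (hypothetical layout)."""
    lines = []
    # Bucket index is floor(start / seconds); buckets with no words never appear
    for bucket, in_bucket in groupby(words, key=lambda w: int(w[0] // seconds)):
        start, end = bucket * seconds, (bucket + 1) * seconds
        # Consecutive words from the same speaker form one turn
        for speaker, turn in groupby(in_bucket, key=lambda w: w[1]):
            text = " ".join(w[2] for w in turn)
            lines.append(f"{mmss(start)}-{mmss(end)} Speaker {speaker}: {text}")
    return "\n".join(lines)
```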

Requires pip install 'autorag[rag]' for the cached path; [audio,diarize] (+ [youtube] for URLs) on cache miss.

Return type:

str

persist_transcription(file, words, *, title=None, db_path=None, source_url=None, upload_date=None, duration_s=None)

Write word spans to SQLite (clip row + words). Returns a dict with the clip, session_id, and timings.

Requires pip install 'autorag[rag]' (pydantic_sqlite). duration_s is informational and not persisted.

source_url (optional) seeds session_id from the canonical URL so re-fetching the same URL overwrites the existing row.
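Why seeding from the URL leads to an overwrite can be shown with a deterministic key. In the sketch below the SHA-256 hashing, the 16-character truncation, and the names `session_id_for` and `upsert` are all assumptions made for illustration; the real derive_session_id() may work differently, and URL canonicalization is not shown.

```python
import hashlib


def session_id_for(file, source_url=None):
    """Deterministic key: the same seed string always yields the same ID."""
    seed = source_url if source_url is not None else str(file)
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[:16]


rows = {}  # stands in for the SQLite clip table, keyed by session_id


def upsert(file, words, source_url=None):
    """Store words under the derived key, replacing any earlier row."""
    sid = session_id_for(file, source_url)
    rows[sid] = words  # same key, so a re-fetch replaces the existing entry
    return sid
```

Two downloads of the same video land in different temporary files, so keying on the file path alone would create two rows; keying on the URL keeps a single row.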

upload_date (optional, "YYYYMMDD" from yt-dlp) anchors created_at to the video’s publish date.
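Turning the "YYYYMMDD" string into a timestamp could look like the sketch below. `parse_upload_date` is a hypothetical helper, and attaching UTC is an assumption; this page does not say which timezone, if any, created_at uses.

```python
from datetime import datetime, timezone


def parse_upload_date(upload_date):
    """Parse a yt-dlp style 'YYYYMMDD' string into a UTC midnight datetime."""
    # strptime raises ValueError for anything that is not a valid YYYYMMDD date
    return datetime.strptime(upload_date, "%Y%m%d").replace(tzinfo=timezone.utc)
```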

Use persist_topics() to store the topic tree and embed titles.

Return type:

dict[str, Any]

persist_topics(file, topics, *, words=None, transcript_end_s=None, title=None, provider='ollama', llm_model='gemma4:latest', whisper_model='base', db_path=None, source_url=None, upload_date=None, duration_s=None)

Store the topic tree in SQLite and embed topic titles into Chroma.

Requires pip install 'autorag[rag]' (chromadb + pydantic_sqlite).

Call persist_transcription() first to create the clip row. If the row does not exist yet, this method creates it idempotently.

transcript_end_s: audio end time in seconds used to anchor events. Computed from words[-1] when words is supplied, else 0.0. duration_s is informational and not persisted.
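The fallback rule for transcript_end_s can be written out as below. The `Word` dataclass and its `end` field are hypothetical stand-ins for WordSpan, and letting an explicitly passed value take precedence over `words` is an assumption about the intended behaviour.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical stand-in for WordSpan; the real field names may differ."""
    text: str
    start: float
    end: float


def resolve_transcript_end_s(words=None, transcript_end_s=None):
    """Return the explicit value, else the last word's end time, else 0.0."""
    if transcript_end_s is not None:
        return float(transcript_end_s)
    if words:
        return float(words[-1].end)
    return 0.0
```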

Return type:

dict[str, Any]

exception autorag.core.MissingExtraError

Bases: ImportError

Raised when an AutoRAG method needs an optional extra that isn’t installed.

Errors

Error types for AutoRAG’s extras model.

Every audio / RAG method on AutoRAG imports its heavy dependencies inside the method body and re-raises ModuleNotFoundError as MissingExtraError with a hint naming the install extra that fixes it.

exception autorag.errors.MissingExtraError

Bases: ImportError

Raised when an AutoRAG method needs an optional extra that isn’t installed.