SDK facade (autorag.core)
autorag.core exposes the single public class
AutoRAG. Every audio or RAG method performs its
heavy imports inside the method body and raises
MissingExtraError if the relevant extra is not
installed. See Extras model for the mapping of
methods to extras.
- class autorag.core.AutoRAG(settings=None, store=None, embedder=None, generator=None)
Bases: object
Unified facade for the audio→topics agent and the document-RAG pipeline.
Heavy dependencies (whisper, torch, pyannote, chromadb, …) are loaded lazily on first use, so a base install can import AutoRAG without pulling them in. Methods raise MissingExtraError, with a hint naming the specific extra, when a required extra is missing.
- transcribe(file, *, whisper_model='base', language='en')
Run Whisper + diarization on an audio file or YouTube URL.
file is either a local audio file path or a YouTube URL (youtube.com, youtu.be, m.youtube.com, music.youtube.com). YouTube URLs are downloaded to a temporary .webm file for the duration of the call.
Returns raw word spans. Use generate_topics() for the LLM topic tree, and persist_transcription() / persist_topics() to store results (separate [rag] extra).
language defaults to English ("en"); pass language=None to let Whisper auto-detect.
Requires pip install 'autorag[audio,diarize]', plus [youtube] when passing a URL.
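To make the path-versus-URL distinction concrete, here is a minimal sketch of how an input could be classified against the four hosts listed above. The helper name `looks_like_youtube_url` and the handling of a leading `www.` are assumptions made for illustration; the library's actual check is not shown on this page and may differ.

```python
from urllib.parse import urlparse

# Hosts listed in the transcribe() description above.
YOUTUBE_HOSTS = {"youtube.com", "youtu.be", "m.youtube.com", "music.youtube.com"}


def looks_like_youtube_url(file) -> bool:
    """Hypothetical helper: True when `file` is an http(s) URL on a listed host."""
    parsed = urlparse(str(file))
    if parsed.scheme not in ("http", "https"):
        return False  # local paths carry no http(s) scheme
    host = (parsed.hostname or "").lower()
    # Assumption: a leading "www." is treated the same as the bare domain.
    if host.startswith("www."):
        host = host[len("www."):]
    return host in YOUTUBE_HOSTS
```

With a check like this, a caller could decide up front whether the `[youtube]` extra will be needed before invoking transcribe().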
- generate_topics(words, *, llm_model='gemma4:latest', ollama_base_url=None, num_ctx_l1=8192, num_ctx_fanout=8192, max_concurrency=8, min_subdivide_duration_s=120.0, reasoning=False, boundary_block_seconds=30)
Run LLM topic extraction on pre-computed word spans.
Requires pip install 'autorag[audio,diarize]' (LangChain + Ollama).
- build_agent(**kwargs)
Return the LangChain Runnable for batched / streaming use.
Same extras as transcribe(). Forwards **kwargs to autorag.agent.build_agent().
- Parameters: kwargs (Any)
- Return type: Runnable[Path | str, TranscriptionResult]
- transcribe_blocks(file, seconds=10, *, force_retranscribe=False, db_path=None, whisper_model='base', language='en', title=None)
Return the transcription formatted as N-second time blocks.
- Resolution order:
1. session_id = derive_session_id(file).
2. If SQLite has a row for session_id with a non-null transcription and force_retranscribe is False, decode it and format it. This path returns immediately (no [audio] needed).
3. Otherwise run transcribe() and persist_transcription(), then format. Topic generation is not performed here; call generate_topics() and persist_topics() separately.
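The resolution order above is a cache-first lookup. The sketch below illustrates that control flow with the standard-library sqlite3 module. The table name `clips`, its two columns, the JSON encoding, and the helper `load_or_transcribe` are all assumptions made for illustration; the real schema goes through pydantic_sqlite and is not documented on this page.

```python
import json
import sqlite3


def load_or_transcribe(conn, session_id, run_transcribe, force_retranscribe=False):
    """Hypothetical cache-first lookup; returns (words, source)."""
    row = conn.execute(
        "SELECT transcription FROM clips WHERE session_id = ?", (session_id,)
    ).fetchone()
    # Cache hit: a row exists, its transcription is non-null, and no refresh was forced.
    if row is not None and row[0] is not None and not force_retranscribe:
        return json.loads(row[0]), "cache"
    # Cache miss (or forced): run the expensive step, then store the result.
    words = run_transcribe()
    conn.execute(
        "INSERT OR REPLACE INTO clips (session_id, transcription) VALUES (?, ?)",
        (session_id, json.dumps(words)),
    )
    return words, "fresh"


def fake_transcribe():
    # Stand-in for the Whisper + diarization call; the word shape is illustrative.
    return [{"text": "hello", "start": 0.0, "end": 0.4}]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clips (session_id TEXT PRIMARY KEY, transcription TEXT)")
first = load_or_transcribe(conn, "demo", fake_transcribe)
second = load_or_transcribe(conn, "demo", fake_transcribe)
third = load_or_transcribe(conn, "demo", fake_transcribe, force_retranscribe=True)
```

In this sketch only the miss path calls `run_transcribe`, which mirrors why the cached path needs no audio dependencies.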
Each non-empty bucket emits one line per speaker turn, MM:SS-MM:SS Speaker K: <words>. See autorag.blocks.format_blocks() for the full algorithm.
Requires pip install 'autorag[rag]' for the cached path; [audio,diarize] (+ [youtube] for URLs) on cache miss.
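As a rough illustration of the line format, the sketch below buckets words into N-second windows and emits one line per speaker turn. It is not autorag.blocks.format_blocks(). The word shape, `(start_s, speaker, text)` tuples sorted by start time, and the choice to print the bucket's own start and end as the time range are both assumptions.

```python
from itertools import groupby


def mmss(t: float) -> str:
    """Format a time in seconds as MM:SS."""
    minutes, secs = divmod(int(t), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_blocks_sketch(words, seconds=10):
    """Hypothetical formatter; `words` are (start_s, speaker, text) tuples sorted by start_s."""
    lines = []
    # Group words by the index of the N-second bucket their start time falls in.
    for bucket, in_bucket in groupby(words, key=lambda w: int(w[0] // seconds)):
        in_bucket = list(in_bucket)
        start, end = bucket * seconds, (bucket + 1) * seconds
        # Within a bucket, consecutive words by the same speaker form one turn.
        for speaker, turn in groupby(in_bucket, key=lambda w: w[1]):
            text = " ".join(w[2] for w in turn)
            lines.append(f"{mmss(start)}-{mmss(end)} Speaker {speaker}: {text}")
    return lines


sample = [
    (0.5, 1, "hello"),
    (1.2, 1, "there"),
    (4.0, 2, "hi"),
    (12.3, 2, "so"),
    (13.0, 2, "anyway"),
]
blocks = format_blocks_sketch(sample, seconds=10)
```

Buckets with no words never appear in the grouping, so empty windows produce no lines.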
- persist_transcription(file, words, *, title=None, db_path=None, source_url=None, upload_date=None, duration_s=None)
Write word spans to SQLite (clip row + words). Returns clip + session_id + timings.
Requires pip install 'autorag[rag]' (pydantic_sqlite).
duration_s is informational and not persisted. source_url (optional) seeds session_id from the canonical URL so re-fetching the same URL overwrites the existing row. upload_date (optional, "YYYYMMDD" from yt-dlp) anchors created_at to the video’s publish date.
Use persist_topics() to store the topic tree and embed titles.
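The sketch below illustrates why seeding the identifier from source_url makes re-fetches overwrite the same row, and how a "YYYYMMDD" string maps to a timestamp. The helpers `sketch_session_id` and `sketch_created_at`, the SHA-256 hashing, the 16-character truncation, and the UTC timezone are all assumptions; the library's derive_session_id() is not documented here and may work differently.

```python
import hashlib
from datetime import datetime, timezone


def sketch_session_id(file, source_url=None):
    """Hypothetical: hash the canonical URL when given, else the file path."""
    seed = source_url if source_url is not None else str(file)
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[:16]


def sketch_created_at(upload_date=None):
    """Parse a yt-dlp style 'YYYYMMDD' string; fall back to the current time."""
    if upload_date is None:
        return datetime.now(timezone.utc)
    # Assumption: the publish date is interpreted as midnight UTC.
    return datetime.strptime(upload_date, "%Y%m%d").replace(tzinfo=timezone.utc)
```

Two downloads of the same URL land in different temporary files, but with a URL-based seed both map to one identifier, so the second write replaces the first.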
- persist_topics(file, topics, *, words=None, transcript_end_s=None, title=None, provider='ollama', llm_model='gemma4:latest', whisper_model='base', db_path=None, source_url=None, upload_date=None, duration_s=None)
Store topic tree to SQLite and embed topic titles into Chroma.
Requires pip install 'autorag[rag]' (chromadb + pydantic_sqlite).
Call persist_transcription() first to create the clip row; this method will create it idempotently if needed.
transcript_end_s: audio end time in seconds used to anchor events. Computed from words[-1] when words is supplied, else 0.0.
duration_s is informational and not persisted.
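The transcript_end_s fallback can be sketched in a few lines. The helper name `resolve_transcript_end_s`, the assumption that each word is a mapping with an `"end"` key, and the assumption that an explicitly passed value takes precedence are all illustrative; the page only states the words[-1] and 0.0 cases.

```python
def resolve_transcript_end_s(words=None, transcript_end_s=None):
    """Hypothetical resolution of the audio end time, in seconds."""
    if transcript_end_s is not None:
        return float(transcript_end_s)  # assumption: an explicit value wins
    if words:
        return float(words[-1]["end"])  # end time of the last word span
    return 0.0  # nothing to anchor on
```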
- exception autorag.core.MissingExtraError
Bases: ImportError
Raised when an AutoRAG method needs an optional extra that isn’t installed.
Errors
Error types for AutoRAG’s extras model.
Every audio / RAG method on AutoRAG imports its
heavy dependencies inside the method body and re-raises
ModuleNotFoundError as MissingExtraError with a hint
naming the install extra that fixes it.
- exception autorag.errors.MissingExtraError
Bases: ImportError
Raised when an AutoRAG method needs an optional extra that isn’t installed.
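The lazy-import pattern described in this section can be sketched as follows. The class here is a local stand-in so that the example runs without autorag installed. The helper `require` and the wording of the hint are assumptions, not the library's code.

```python
import importlib


class MissingExtraError(ImportError):
    """Local stand-in for the documented exception, so the sketch runs on its own."""


def require(module_name: str, extra: str):
    """Hypothetical helper: import at call time, translating a missing module into a hint."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise MissingExtraError(
            f"'{module_name}' is not installed; try: pip install 'autorag[{extra}]'"
        ) from exc


def demo():
    # The import happens inside the function body, so merely defining or
    # importing this function never touches the heavy dependency.
    try:
        require("module_that_does_not_exist_xyz", "audio")
    except MissingExtraError as err:
        return str(err)
    return None


hint = demo()
```

Because MissingExtraError subclasses ImportError, existing `except ImportError` handlers in calling code would still catch it.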