CLI reference
autorag is a thin Typer wrapper over
AutoRAG. Every subcommand maps to one (or two)
SDK methods.
autorag transcribe
Runs Whisper transcription plus diarization and produces a WordSpan list.
With --persist (the default), the words are also written to SQLite.
autorag transcribe SOURCE [OPTIONS]

  SOURCE                                 Audio file path or YouTube URL.
  --title                     -t  TEXT   Clip title (defaults to YouTube video
                                         title for URLs, else filename stem /
                                         video id).
  --whisper-model             -w  TEXT   tiny / base / small / medium / large
                                         [default: base]
  --language                  -l  TEXT   Whisper language code [default: en];
                                         pass '' to auto-detect.
  --persist/--no-persist                 Write word spans to SQLite (default: true).
  --db                            PATH   Override database path.
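An empty --language value is easy to lose when the command is assembled programmatically. The following is a minimal Python sketch that builds the argument list for autorag transcribe from the options listed above. The helper name transcribe_argv is hypothetical; only the flags themselves come from this reference.

```python
def transcribe_argv(source, title=None, whisper_model="base",
                    language="en", persist=True, db=None):
    """Build an argv list for `autorag transcribe` (hypothetical helper).

    language=None is this sketch's own convention for "auto-detect";
    it is translated to the empty string the CLI expects.
    """
    argv = ["autorag", "transcribe", source, "--whisper-model", whisper_model]
    # The empty string is kept as its own argv element, so it reaches the
    # CLI intact (no shell quoting involved).
    argv += ["--language", language if language is not None else ""]
    if title is not None:
        argv += ["--title", title]
    argv.append("--persist" if persist else "--no-persist")
    if db is not None:
        argv += ["--db", str(db)]
    return argv
```

Passing the resulting list to subprocess.run without shell=True keeps the empty --language argument from being dropped by shell word splitting.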
autorag generate-topics
Full audio→topics pipeline: transcribe (or read from cache, or accept
a pre-computed --transcription JSON), run the LLM topic
extraction, then persist the transcription and topics.
autorag generate-topics SOURCE [OPTIONS]

  --provider                  -p  TEXT   LLM provider [default: ollama]
  --llm-model                 -m  TEXT   LLM model [default: gemma4:latest]
  --language                  -l  TEXT   Whisper language code [default: en];
                                         pass '' to auto-detect.
  --num-ctx-l1                    INT    LLM context for the Stage 2 L1-boundary
                                         call [default: 8192]; raise to ~16384
                                         for 1hr+ audio (costs one model reload).
  --num-ctx-fanout                INT    LLM context for the batched fan-out
                                         stages 3a/3b/4/5 [default: 8192].
  --max-concurrency               INT    Max parallel LLM calls in batched
                                         stages [default: 4]; match
                                         OLLAMA_NUM_PARALLEL.
  --min-subdivide-duration-s      FLOAT  Minimum L1 span length in seconds
                                         before the L2 subdivide decision runs
                                         [default: 120.0].
  --reasoning/--no-reasoning             Enable chain-of-thought on
                                         thinking-capable models (slower;
                                         default: --no-reasoning).
  --boundary-block-seconds        INT    Time-bucket window (s) for the L1/L2
                                         boundary-prompt transcript [default:
                                         30]; smaller = finer MM:SS anchors but
                                         more prompt tokens.
  --transcription             -T  TEXT   Pre-computed WordSpan JSON (skip Whisper)
  --persist/--no-persist                 Write transcription + topics to
                                         SQLite/Chroma (default: true).
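The guidance for --num-ctx-l1 and --max-concurrency can be sketched as a small Python helper. This is an illustration under stated assumptions, not part of AutoRAG: the name tuning_flags is hypothetical, and the exact one-hour cutoff is one reading of "1hr+ audio".

```python
import os


def tuning_flags(duration_s, env=None):
    """Suggest generate-topics flags for a clip length (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: treat anything of one hour or more as "1hr+ audio".
    num_ctx_l1 = 16384 if duration_s >= 3600 else 8192
    flags = ["--num-ctx-l1", str(num_ctx_l1)]
    # Mirror OLLAMA_NUM_PARALLEL when it is set to a positive integer;
    # otherwise leave --max-concurrency at the CLI default.
    parallel = env.get("OLLAMA_NUM_PARALLEL", "")
    if parallel.isdigit() and int(parallel) > 0:
        flags += ["--max-concurrency", str(int(parallel))]
    return flags
```

The returned list can be appended to an autorag generate-topics argument list before the command is launched.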
Outputs the persisted topic JSON to stdout; a timing breakdown (whisper / agent / cli_store_words / cli_finalize / cli_embed) goes to stderr.
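Because the JSON and the timing breakdown arrive on different streams, a caller can capture them separately. The sketch below uses only the Python standard library; the function name run_and_split is hypothetical, and since the shape of the topic JSON and the timing text is not specified here, both are returned unparsed beyond json.loads.

```python
import json
import subprocess


def run_and_split(argv):
    """Run a command; return (parsed stdout JSON, raw stderr text)."""
    proc = subprocess.run(argv, capture_output=True, text=True, check=True)
    # stdout carries the JSON document; stderr is kept as plain text
    # because its exact format is not assumed here.
    return json.loads(proc.stdout), proc.stderr
```

A call might look like run_and_split(["autorag", "generate-topics", "talk.mp3"]), where talk.mp3 is a placeholder file name. With check=True, a non-zero exit raises subprocess.CalledProcessError instead of returning partial output.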
autorag blocks
A cached, dependency-light view of a previously transcribed clip: lines of
the form "MM:SS-MM:SS Speaker K: …", bucketed into N-second blocks.
autorag blocks SOURCE [OPTIONS]

  --seconds                   -n  INT    Block length [default: 10]
  --force-retranscribe                   Re-run transcription even if cached.
When the clip is already cached, the command reads straight from SQLite,
and only the [rag] extra is needed. On a cache miss, the
[audio] / [diarize] / [youtube] extras are imported lazily
to run the full pipeline first, and the result is then formatted.
Equivalent SDK call: AutoRAG.transcribe_blocks.
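The bucketing into N-second blocks can be sketched as follows. This is an illustration, not AutoRAG's implementation: the Word dataclass is a hypothetical stand-in for WordSpan (the real fields may differ), and emitting one line per block and speaker run is an assumption about the format.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Word:
    """Hypothetical stand-in for a WordSpan."""
    start: float  # word start time in seconds
    text: str
    speaker: int


def mmss(seconds):
    """Format a number of seconds as MM:SS."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def format_blocks(words, seconds=10):
    """Group words into fixed-length time blocks, one line per speaker run."""
    ordered = sorted(words, key=lambda w: w.start)
    lines = []
    # Consecutive words that share a time bucket and a speaker form one line.
    for (bucket, speaker), run in groupby(
        ordered, key=lambda w: (int(w.start // seconds), w.speaker)
    ):
        start = bucket * seconds
        text = " ".join(w.text for w in run)
        lines.append(
            f"{mmss(start)}-{mmss(start + seconds)} Speaker {speaker}: {text}"
        )
    return lines
```

Smaller values of seconds give finer time anchors at the cost of more output lines.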
autorag ingest
autorag ingest PATH [PATH ...]
Ingest one or more files or directories into the vector store.
autorag query
autorag query QUESTION [--top-k K]
Ask a question against the ingested corpus and print the generated answer.
autorag serve
autorag serve [--host HOST] [--port PORT] [--reload]
Run the HTTP API server (default http://127.0.0.1:8000). See
Running the HTTP server.
autorag jobs
Optional async pipeline. Needs the [broker] + [rag] extras and
a running RabbitMQ; it is fully decoupled from the synchronous
commands above (installing or running it changes nothing about
transcribe / generate-topics / serve).
autorag jobs submit SOURCE [OPTIONS]

  SOURCE                                 Audio file path or YouTube URL.
  --title                     -t  TEXT   Clip title.
  --whisper-model             -w  TEXT   tiny / base / small / medium / large
                                         [default: base]
  --llm-model                 -m  TEXT   LLM model [default: gemma4:latest]
  --language                  -l  TEXT   Whisper language code [default: en]

autorag jobs status JOB_ID
submit enqueues the job on the broker and prints
{"job_id": …, "session_id": …}; status prints the job’s
status and per-stage state as JSON. A finished async job writes the
same SQLite / Chroma rows a CLI run would, so /viz and every other
reader work unchanged. Without the extras, the commands exit with an
install hint. See the “Async pipeline” section of Running the HTTP
server and CLAUDE.md for the architecture.
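Chaining submit into status from a script amounts to reading job_id out of the submit output. A minimal Python sketch, assuming submit prints exactly one JSON object on stdout; the helper name status_argv_from_submit and the sample values in the usage note are hypothetical.

```python
import json


def status_argv_from_submit(submit_stdout):
    """Turn the JSON printed by `autorag jobs submit` into a status argv."""
    payload = json.loads(submit_stdout)
    # Only job_id is needed for `jobs status`; session_id is ignored here.
    return ["autorag", "jobs", "status", str(payload["job_id"])]
```

For example, status_argv_from_submit('{"job_id": "abc123", "session_id": "s1"}') yields the argument list for autorag jobs status abc123, which can then be run with subprocess.run.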