CLI reference

autorag is a thin Typer wrapper over the AutoRAG SDK. Every subcommand maps to one or two SDK methods.

autorag transcribe

Runs Whisper transcription plus diarization and produces a list of WordSpan entries. With --persist (the default), the words are written to SQLite.

autorag transcribe SOURCE [OPTIONS]

SOURCE                       Audio file path or YouTube URL.
--title         -t  TEXT     Clip title (defaults to YouTube video
                             title for URLs, else filename stem /
                             video id).
--whisper-model -w  TEXT     tiny / base / small / medium / large
                             [default: base]
--language      -l  TEXT     Whisper language code [default: en];
                             pass '' to auto-detect.
--persist/--no-persist       Write word spans to SQLite (default: true).
--db                PATH     Override database path.
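The options above can also be assembled programmatically when the CLI is driven from a script. The following is a minimal sketch, assuming autorag is on PATH; build_transcribe_cmd and run_transcribe are hypothetical helper names, not part of the SDK, and the sketch makes no assumption about what the command prints.

```python
import subprocess
from typing import List, Optional


def build_transcribe_cmd(
    source: str,
    title: Optional[str] = None,
    whisper_model: str = "base",
    language: str = "en",
    persist: bool = True,
    db: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `autorag transcribe` (hypothetical helper)."""
    cmd = [
        "autorag", "transcribe", source,
        "--whisper-model", whisper_model,
        "--language", language,  # an empty string requests auto-detection
    ]
    if title is not None:
        cmd += ["--title", title]
    cmd.append("--persist" if persist else "--no-persist")
    if db is not None:
        cmd += ["--db", db]
    return cmd


def run_transcribe(source: str, **options) -> str:
    """Run the command and return its raw stdout (needs autorag installed)."""
    result = subprocess.run(
        build_transcribe_cmd(source, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems, which matters for the empty-string --language value.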

autorag generate-topics

Full audio→topics pipeline: transcribe (or read from cache, or accept a pre-computed --transcription JSON), run the LLM topic extraction, then persist the transcription and topics.

autorag generate-topics SOURCE [OPTIONS]

--provider      -p  TEXT     LLM provider [default: ollama]
--llm-model     -m  TEXT     LLM model [default: gemma4:latest]
--language      -l  TEXT     Whisper language code [default: en];
                             pass '' to auto-detect.
--num-ctx-l1        INT      LLM context for the Stage 2 L1-boundary
                             call [default: 8192]; raise to ~16384
                             for 1hr+ audio (costs one model reload).
--num-ctx-fanout    INT      LLM context for the batched fan-out
                             stages 3a/3b/4/5 [default: 8192].
--max-concurrency   INT      Max parallel LLM calls in batched
                             stages [default: 4]; match
                             OLLAMA_NUM_PARALLEL.
--min-subdivide-duration-s   Minimum L1 span length in seconds
                    FLOAT    before the L2 subdivide decision runs
                             [default: 120.0].
--reasoning/--no-reasoning   Enable chain-of-thought on
                             thinking-capable models (slower;
                             default: --no-reasoning).
--boundary-block-seconds     Time-bucket window (s) for the L1/L2
                    INT      boundary-prompt transcript [default:
                             30]; smaller = finer MM:SS anchors but
                             more prompt tokens.
--transcription -T  TEXT     Pre-computed WordSpan JSON (skip Whisper)
--persist/--no-persist       Write transcription + topics to
                             SQLite/Chroma (default: true).
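Two of the options above come with tuning advice: raise --num-ctx-l1 for audio of an hour or more, and match --max-concurrency to OLLAMA_NUM_PARALLEL. A small sketch of how a wrapper script might apply that advice follows; suggest_generate_topics_flags is a hypothetical helper, and the one-hour threshold is read directly from the option description rather than from the implementation.

```python
import os
from typing import List


def suggest_generate_topics_flags(audio_seconds: float) -> List[str]:
    """Pick --num-ctx-l1 and --max-concurrency values (hypothetical helper)."""
    # The reference suggests ~16384 for 1hr+ audio; 8192 is the documented default.
    num_ctx_l1 = 16384 if audio_seconds >= 3600 else 8192
    # Mirror OLLAMA_NUM_PARALLEL when set, else use the documented default of 4.
    max_concurrency = int(os.environ.get("OLLAMA_NUM_PARALLEL", "4"))
    return [
        "--num-ctx-l1", str(num_ctx_l1),
        "--max-concurrency", str(max_concurrency),
    ]
```

The returned list can be appended to an autorag generate-topics SOURCE argument list.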

Outputs the persisted topic JSON to stdout; a timing breakdown (whisper / agent / cli_store_words / cli_finalize / cli_embed) goes to stderr.
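Because the topic JSON and the timing breakdown go to different streams, a caller can capture them separately. The sketch below assumes only what is stated above (JSON on stdout, timing text on stderr); run_and_split is a hypothetical helper, and the shape of the JSON is not assumed.

```python
import json
import subprocess
from typing import Any, List, Tuple


def run_and_split(cmd: List[str]) -> Tuple[Any, str]:
    """Run cmd, parse stdout as JSON, and return (parsed_json, stderr_text)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout), result.stderr


# Intended use (requires autorag on PATH):
#   topics, timing = run_and_split(["autorag", "generate-topics", "talk.mp3"])
```

On the command line the same split is a plain redirect, for example sending stdout to a file while the timing lines stay on the terminal.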

autorag blocks

Cached, dependency-friendly view of a previously transcribed clip: MM:SS-MM:SS Speaker K: lines bucketed into N-second blocks.

autorag blocks SOURCE [OPTIONS]

--seconds       -n  INT      Block length [default: 10]
--force-retranscribe         Re-run transcription even if cached.

Reads straight from SQLite when the clip is already there; only the [rag] extra is needed for a cache hit. On a miss, the [audio] / [diarize] / [youtube] extras are imported lazily to run the full pipeline first, and the result is then formatted. Equivalent SDK call: AutoRAG.transcribe_blocks.
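To illustrate the block format described above, here is a rough sketch of bucketing words into N-second blocks. It is not the AutoRAG implementation: the (start_seconds, speaker_index, text) tuple is a hypothetical stand-in for WordSpan, and the rule that a speaker change inside a block starts a new line is an assumption.

```python
from itertools import groupby
from typing import Iterable, List, Tuple

# Hypothetical word shape: (start_seconds, speaker_index, text).
Word = Tuple[float, int, str]


def mmss(seconds: int) -> str:
    """Format whole seconds as MM:SS."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


def format_blocks(words: Iterable[Word], block_seconds: int = 10) -> List[str]:
    """Group words into fixed-length blocks, one line per block and speaker run."""
    ordered = sorted(words, key=lambda w: w[0])
    lines = []
    # Consecutive words sharing a block index and a speaker form one line.
    for (block, speaker), run in groupby(
        ordered, key=lambda w: (int(w[0] // block_seconds), w[1])
    ):
        start = block * block_seconds
        text = " ".join(w[2] for w in run)
        lines.append(
            f"{mmss(start)}-{mmss(start + block_seconds)} Speaker {speaker}: {text}"
        )
    return lines
```

With the default of 10 seconds, a word starting at 65.0 s lands in the 01:00-01:10 block.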

autorag ingest

Ingest one or more files or directories into the vector store.

autorag ingest PATH [PATH ...]

autorag query

autorag query QUESTION [--top-k K]

Ask a question against the ingested corpus and print the generated answer.

autorag serve

autorag serve [--host HOST] [--port PORT] [--reload]

Run the HTTP API server (default http://127.0.0.1:8000). See Running the HTTP server.

autorag jobs

Optional async pipeline. Needs the [broker] + [rag] extras and a running RabbitMQ; it is fully decoupled from the synchronous commands above (installing or running it changes nothing about transcribe / generate-topics / serve).

autorag jobs submit SOURCE [OPTIONS]

SOURCE                       Audio file path or YouTube URL.
--title         -t  TEXT     Clip title.
--whisper-model -w  TEXT     tiny / base / small / medium / large
                             [default: base]
--llm-model     -m  TEXT     LLM model [default: gemma4:latest]
--language      -l  TEXT     Whisper language code [default: en]

autorag jobs status JOB_ID

submit enqueues the job on the broker and prints {"job_id": …, "session_id": …}; status prints the job’s status + per-stage state as JSON. A finished async job writes the same SQLite / Chroma rows a CLI run would, so /viz and every other reader work unchanged. Without the extras the commands exit with an install hint. See the “Async pipeline” section of Running the HTTP server and CLAUDE.md for the architecture.
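A script that chains submit and status needs to pull the job id out of the submit output. The following sketch assumes only the two keys shown above, job_id and session_id; parse_submit_output and status_cmd are hypothetical helpers, and the schema of the status JSON is deliberately not assumed.

```python
import json
from typing import List, Tuple


def parse_submit_output(stdout: str) -> Tuple[str, str]:
    """Extract (job_id, session_id) from the JSON printed by `autorag jobs submit`."""
    payload = json.loads(stdout)
    # str() because the reference does not state whether the ids are strings.
    return str(payload["job_id"]), str(payload["session_id"])


def status_cmd(job_id: str) -> List[str]:
    """Build the argv list for checking a job (hypothetical helper)."""
    return ["autorag", "jobs", "status", job_id]
```

The list returned by status_cmd can be handed to subprocess.run and its stdout parsed with json.loads.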