CLI reference
=============

``autorag`` is a thin Typer wrapper over
:class:`~autorag.core.AutoRAG`. Every subcommand maps to one (or two)
SDK methods.

``autorag transcribe``
----------------------

Runs Whisper transcription plus diarization on ``SOURCE`` and produces
a ``WordSpan`` list. With ``--persist`` (the default), the words are
written to SQLite.

.. code-block:: text

    autorag transcribe SOURCE [OPTIONS]

    SOURCE                       Audio file path or YouTube URL.
    --title         -t  TEXT     Clip title (defaults to YouTube video
                                 title for URLs, else filename stem /
                                 video id).
    --whisper-model -w  TEXT     tiny / base / small / medium / large
                                 [default: base]
    --language      -l  TEXT     Whisper language code [default: en];
                                 pass '' to auto-detect.
    --persist/--no-persist       Write word spans to SQLite (default: true).
    --db                PATH     Override database path.

``autorag generate-topics``
---------------------------

Full audio→topics pipeline: transcribe (or read from cache, or accept
a pre-computed ``--transcription`` JSON), run the LLM topic
extraction, then persist the transcription and topics unless
``--no-persist`` is given.

.. code-block:: text

    autorag generate-topics SOURCE [OPTIONS]

    --provider      -p  TEXT     LLM provider [default: ollama]
    --llm-model     -m  TEXT     LLM model [default: gemma4:latest]
    --language      -l  TEXT     Whisper language code [default: en];
                                 pass '' to auto-detect.
    --num-ctx-l1        INT      LLM context for the Stage 2 L1-boundary
                                 call [default: 8192]; raise to ~16384
                                 for 1hr+ audio (costs one model reload).
    --num-ctx-fanout    INT      LLM context for the batched fan-out
                                 stages 3a/3b/4/5 [default: 8192].
    --max-concurrency   INT      Max parallel LLM calls in batched
                                 stages [default: 4]; match
                                 OLLAMA_NUM_PARALLEL.
    --min-subdivide-duration-s   Minimum L1 span length in seconds
                        FLOAT    before the L2 subdivide decision runs
                                 [default: 120.0].
    --reasoning/--no-reasoning   Enable chain-of-thought on
                                 thinking-capable models (slower;
                                 default: --no-reasoning).
    --boundary-block-seconds     Time-bucket window (s) for the L1/L2
                        INT      boundary-prompt transcript [default:
                                 30]; smaller = finer MM:SS anchors but
                                 more prompt tokens.
    --transcription -T  TEXT     Pre-computed WordSpan JSON (skip Whisper)
    --persist/--no-persist       Write transcription + topics to
                                 SQLite/Chroma (default: true).

Outputs the persisted topic JSON to stdout; a timing breakdown
(whisper / agent / cli_store_words / cli_finalize / cli_embed) goes to
stderr.
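
As a hedged sketch of driving this command from Python, the helper
below assembles the argument list and keeps the two output streams
apart. It assumes stdout carries nothing but the topic JSON, that the
caller already knows the clip duration, and that
``OLLAMA_NUM_PARALLEL`` is set in the calling process's environment
(Ollama reads it on the server side, so mirroring it here is only a
convenience). ``build_generate_topics_argv`` and ``run_and_split`` are
illustrative names, not part of the SDK.

.. code-block:: python

    import json
    import os
    import subprocess


    def build_generate_topics_argv(source, duration_s=None, language="en"):
        """Assemble an argv list using flags from the option list above."""
        argv = ["autorag", "generate-topics", source, "--language", language]
        # Assumption: the caller measured the duration; raise the L1
        # context for audio of an hour or more, as the help text suggests.
        if duration_s is not None and duration_s >= 3600:
            argv += ["--num-ctx-l1", "16384"]
        # Assumption: OLLAMA_NUM_PARALLEL is visible to this process.
        parallel = os.environ.get("OLLAMA_NUM_PARALLEL")
        if parallel and parallel.isdigit():
            argv += ["--max-concurrency", parallel]
        return argv


    def run_and_split(argv):
        """Run a command; return (parsed stdout JSON, stderr text)."""
        proc = subprocess.run(argv, capture_output=True, text=True, check=True)
        return json.loads(proc.stdout), proc.stderr

Passing ``language=""`` hands the CLI an empty-string argument, which
is the auto-detect case from the option list. The stderr text returned
by ``run_and_split`` is where the timing breakdown ends up.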

``autorag blocks``
------------------

Cached, dependency-light view of a transcribed clip:
``MM:SS-MM:SS Speaker K: …`` lines bucketed into N-second blocks.

.. code-block:: text

    autorag blocks SOURCE [OPTIONS]

    --seconds       -n  INT      Block length [default: 10]
    --force-retranscribe         Re-run transcription even if cached.

Reads straight from SQLite when the clip is already cached; a cache
hit needs only the ``[rag]`` extra. On a miss, the
``[audio]`` / ``[diarize]`` / ``[youtube]`` extras are imported lazily
to run the full pipeline first, and the result is then formatted.
Equivalent SDK call:
:meth:`AutoRAG.transcribe_blocks
<autorag.core.AutoRAG.transcribe_blocks>`.
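
For downstream tooling, the line format can be parsed back into
structured records. The following is a minimal sketch based only on
the ``MM:SS-MM:SS Speaker K: …`` shape described above; it assumes
minutes may run past 59 on long clips and that ``K`` contains no
spaces. ``Block`` and ``parse_block_line`` are hypothetical names, not
part of the SDK.

.. code-block:: python

    import re
    from dataclasses import dataclass
    from typing import Optional

    # One "MM:SS-MM:SS Speaker K: text" line.
    # Assumption: minutes are not capped at two digits; K has no spaces.
    _BLOCK_RE = re.compile(
        r"^(\d+):(\d{2})-(\d+):(\d{2})\s+Speaker\s+(\S+):\s*(.*)$"
    )


    @dataclass
    class Block:
        start_s: int   # block start, in seconds from the clip start
        end_s: int     # block end, in seconds from the clip start
        speaker: str   # the "K" part of "Speaker K"
        text: str      # transcript text for the block


    def parse_block_line(line: str) -> Optional[Block]:
        """Parse one output line; return None if it does not match."""
        match = _BLOCK_RE.match(line.strip())
        if match is None:
            return None
        m1, s1, m2, s2, speaker, text = match.groups()
        return Block(
            start_s=int(m1) * 60 + int(s1),
            end_s=int(m2) * 60 + int(s2),
            speaker=speaker,
            text=text,
        )

Lines that do not fit the pattern come back as ``None``, so callers
can skip headers or blank lines without special-casing them.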

``autorag ingest``
------------------

.. code-block:: text

    autorag ingest PATH [PATH ...]

Ingest one or more files or directories into the vector store.

``autorag query``
-----------------

.. code-block:: text

    autorag query QUESTION [--top-k K]

Ask a question against the ingested corpus and print the generated
answer.

``autorag serve``
-----------------

.. code-block:: text

    autorag serve [--host HOST] [--port PORT] [--reload]

Run the HTTP API server (default ``http://127.0.0.1:8000``). See
:doc:`server`.

``autorag jobs``
----------------

Optional async pipeline. Needs the ``[broker]`` + ``[rag]`` extras and
a running RabbitMQ; it is fully decoupled from the synchronous
commands above (installing or running it changes nothing about
``transcribe`` / ``generate-topics`` / ``serve``).

.. code-block:: text

    autorag jobs submit SOURCE [OPTIONS]

    SOURCE                       Audio file path or YouTube URL.
    --title         -t  TEXT     Clip title.
    --whisper-model -w  TEXT     tiny / base / small / medium / large
                                 [default: base]
    --llm-model     -m  TEXT     LLM model [default: gemma4:latest]
    --language      -l  TEXT     Whisper language code [default: en]

    autorag jobs status JOB_ID

``submit`` enqueues the job on the broker and prints
``{"job_id": …, "session_id": …}``; ``status`` prints the job's
status + per-stage state as JSON. A finished async job writes the
**same** SQLite / Chroma rows a CLI run would, so :doc:`/viz
<visualization>` and every other reader work unchanged. Without the
extras the commands exit with an install hint. See the "Async
pipeline" section of :doc:`server` and ``CLAUDE.md`` for the
architecture.
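
A script that submits a job and waits for it might look like the
sketch below. It relies only on the ``job_id`` / ``session_id`` keys
shown above and assumes each command prints nothing but JSON on
stdout. Because the status payload's schema is not specified here, the
completion check is a caller-supplied predicate; ``parse_submit_output``,
``run_cli`` and ``wait_for_job`` are illustrative names, not SDK
functions.

.. code-block:: python

    import json
    import subprocess
    import time


    def parse_submit_output(stdout):
        """Extract (job_id, session_id) from the ``jobs submit`` JSON."""
        payload = json.loads(stdout)
        return payload["job_id"], payload["session_id"]


    def run_cli(argv):
        """Run a command and return its stdout; raise on a non-zero exit."""
        proc = subprocess.run(argv, capture_output=True, text=True, check=True)
        return proc.stdout


    def wait_for_job(job_id, is_done, run=run_cli, interval_s=5.0, max_polls=120):
        """Poll ``autorag jobs status`` until ``is_done(status)`` is true."""
        for _ in range(max_polls):
            status = json.loads(run(["autorag", "jobs", "status", str(job_id)]))
            if is_done(status):
                return status
            time.sleep(interval_s)
        raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")

The ``run`` parameter exists so the polling loop can be exercised with
a stub instead of a live broker. The predicate passed as ``is_done``
has to be written against whatever fields the status JSON actually
contains.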