CLI (autorag.cli)

The autorag console script is a thin Typer wrapper over AutoRAG. It exposes the commands transcribe, generate-topics, blocks, ingest, query, and serve, plus the optional jobs submit and jobs status subcommands, which run the pipeline asynchronously and require the [broker] extra. The audio commands create and clean up the temporary files used for YouTube URLs, and forward optional metadata (title, upload date, source URL) to AutoRAG.persist_transcription.
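As a minimal sketch of what owning the temp-file lifetime means, assume the CLI downloads a URL source into a temporary directory and deletes it once the command finishes. `local_audio`, `is_url`, and the `download` callable below are hypothetical stand-ins, not AutoRAG's actual helpers; the real downloader is not specified here.

```python
import contextlib
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterator


def is_url(source: str) -> bool:
    # Crude check for illustration; the real CLI may detect YouTube URLs differently.
    return source.startswith(("http://", "https://"))


@contextlib.contextmanager
def local_audio(source: str, download: Callable[[str, Path], Path]) -> Iterator[Path]:
    """Yield a local audio path; downloaded files exist only inside the with-block."""
    if not is_url(source):
        # Local files are used in place and never deleted.
        yield Path(source)
        return
    tmpdir = Path(tempfile.mkdtemp(prefix="autorag-"))
    try:
        # `download` is a hypothetical callable that saves the audio under tmpdir.
        yield download(source, tmpdir)
    finally:
        # Remove the download even if transcription raised.
        shutil.rmtree(tmpdir, ignore_errors=True)
```

Because cleanup sits in a `finally` clause, the download is removed even when the work inside the `with` block fails, while local paths pass through untouched.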

autorag.cli.ingest(paths=<typer.Argument>)

Ingest one or more files/directories into the vector store.

Parameters:

paths (list[Path])

Return type:

None
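Since ingest accepts both files and directories, the arguments have to be flattened into a list of files at some point. The sketch below shows one plausible way to do that with the standard library. It assumes a recursive walk with no file-type filtering, although the real ingest step may filter by extension or loader. `iter_input_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, Iterator


def iter_input_files(paths: Iterable[Path]) -> Iterator[Path]:
    """Expand files and directories into a flat, de-duplicated stream of files."""
    seen: set[Path] = set()
    for path in paths:
        # Directories are walked recursively (sorted for stable order);
        # plain files pass through unchanged.
        if path.is_dir():
            candidates = sorted(p for p in path.rglob("*") if p.is_file())
        else:
            candidates = [path]
        for file in candidates:
            key = file.resolve()  # the same file named twice is yielded once
            if key not in seen:
                seen.add(key)
                yield file
```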

autorag.cli.query(question=<typer.Argument>, top_k=<typer.Option>)

Ask a question against the ingested corpus.

Parameters:
  • question (str)

  • top_k (int)

Return type:

None
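The top_k option bounds how many retrieved chunks back an answer. The sketch below illustrates the idea with plain cosine similarity over in-memory vectors. The scoring function, the `(text, vector)` chunk layout, and `top_k_chunks` are assumptions for illustration, not the vector store AutoRAG actually uses.

```python
import heapq
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(
    query_vec: Sequence[float],
    chunks: list[tuple[str, Sequence[float]]],
    top_k: int = 3,
) -> list[tuple[float, str]]:
    """Return the top_k (score, text) pairs, highest similarity first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[0])
```

For example, with chunks `cats` at `[1.0, 0.0]`, `dogs` at `[0.0, 1.0]`, and `pets` at `[0.7, 0.7]`, the query `[1.0, 0.1]` with `top_k=2` returns `cats` followed by `pets`. Raising `top_k` gives the answer step more context but also more loosely related text.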

autorag.cli.serve(host='127.0.0.1', port=8000, reload=False)

Run the HTTP API server.

Parameters:
  • host (str)

  • port (int)

  • reload (bool)

Return type:

None

autorag.cli.transcribe(source=<typer.Argument>, title=<typer.Option>, whisper_model=<typer.Option>, language=<typer.Option>, persist=<typer.Option>, db_override=<typer.Option>)

Transcribe an audio file or YouTube URL and output word spans as JSON.

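Parameters:
  • source (str)

  • title (str | None)

  • whisper_model (str)

  • language (str)

  • persist (bool)

  • db_override (Path | None)

Return type: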

None

autorag.cli.generate_topics(source=<typer.Argument>, title=<typer.Option>, whisper_model=<typer.Option>, provider=<typer.Option>, llm_model=<typer.Option>, num_ctx_l1=<typer.Option>, num_ctx_fanout=<typer.Option>, max_concurrency=<typer.Option>, min_subdivide_duration_s=<typer.Option>, reasoning=<typer.Option>, boundary_block_seconds=<typer.Option>, language=<typer.Option>, transcription_json=<typer.Option>, persist=<typer.Option>, db_override=<typer.Option>)

Generate topics for an audio file or YouTube URL, transcribing first if not cached.

Parameters:
  • source (str)

  • title (str | None)

  • whisper_model (str)

  • provider (str)

  • llm_model (str)

  • num_ctx_l1 (int)

  • num_ctx_fanout (int)

  • max_concurrency (int)

  • min_subdivide_duration_s (float)

  • reasoning (bool)

  • boundary_block_seconds (int)

  • language (str)

  • transcription_json (str | None)

  • persist (bool)

  • db_override (Path | None)

Return type:

None
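generate_topics (like blocks) reuses a cached transcription when one exists. The sketch below shows that cache-or-compute pattern using the standard sqlite3 module. The `transcriptions` table, its columns, `get_or_transcribe`, and the `transcribe` callable are all hypothetical, since the real schema is not documented here. The `force` flag mirrors the idea behind the blocks command's force_retranscribe option.

```python
import json
import sqlite3
from typing import Callable


def get_or_transcribe(
    conn: sqlite3.Connection,
    source: str,
    transcribe: Callable[[str], list[dict]],
    force: bool = False,
) -> list[dict]:
    """Return cached word spans for `source`, transcribing and persisting on a miss."""
    # Hypothetical schema: one row per source, word spans stored as JSON text.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcriptions "
        "(source TEXT PRIMARY KEY, words TEXT NOT NULL)"
    )
    if not force:
        row = conn.execute(
            "SELECT words FROM transcriptions WHERE source = ?", (source,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit: skip the expensive model run
    words = transcribe(source)  # cache miss (or forced): run transcription
    with conn:  # commits the write, or rolls back if it raises
        conn.execute(
            "INSERT OR REPLACE INTO transcriptions (source, words) VALUES (?, ?)",
            (source, json.dumps(words)),
        )
    return words
```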

autorag.cli.blocks(source=<typer.Argument>, seconds=<typer.Option>, force_retranscribe=<typer.Option>, title=<typer.Option>, db_override=<typer.Option>, whisper_model=<typer.Option>, language=<typer.Option>)

Print the transcription as N-second time blocks, one line per speaker turn.

Reads from the cached SQLite row when present; otherwise runs Whisper transcription and persists the words first. Topic generation is not performed here; use the generate-topics command for that.

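Parameters:
  • source (str)

  • seconds

  • force_retranscribe (bool)

  • title (str | None)

  • db_override (Path | None)

  • whisper_model (str)

  • language (str)

Return type: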

None
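The blocks output groups words into fixed-width time windows and starts a new line whenever the speaker changes. The sketch below shows one way to do that with `itertools.groupby`. The `Word` record (start time, text, speaker label), `format_blocks`, and the `[0s-30s]` header format are assumptions, not the command's actual data model or output.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Word:
    start: float  # seconds from the start of the audio
    text: str
    speaker: str


def format_blocks(words: list[Word], seconds: int) -> list[str]:
    """Render one line per speaker turn, grouped into `seconds`-wide blocks."""
    lines: list[str] = []
    # Words are assumed sorted by start time; block index = floor(start / seconds).
    for block, block_words in groupby(words, key=lambda w: int(w.start // seconds)):
        lo, hi = block * seconds, (block + 1) * seconds
        lines.append(f"[{lo}s-{hi}s]")
        # Consecutive words from the same speaker form one turn (one output line).
        for speaker, turn in groupby(block_words, key=lambda w: w.speaker):
            lines.append(f"  {speaker}: " + " ".join(w.text for w in turn))
    return lines
```

Because `groupby` only merges adjacent items, a speaker who talks, pauses for someone else, and resumes within the same block gets two separate lines, which matches "one line per speaker turn".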

autorag.cli.jobs_submit(source=<typer.Argument>, title=<typer.Option>, whisper_model=<typer.Option>, llm_model=<typer.Option>, language=<typer.Option>)

Enqueue an audio→topics job on the broker and print the job ID.

Parameters:
  • source (str)

  • title (str | None)

  • whisper_model (str)

  • llm_model (str)

  • language (str)

Return type:

None

autorag.cli.jobs_status(job_id=<typer.Argument>)

Print a job's status and per-stage state as JSON.

Parameters:

job_id (str)

Return type:

None
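jobs status reports an overall status together with the state of each stage. This page does not document how AutoRAG derives the overall status or which stages exist. The sketch below therefore assumes two hypothetical stages (`transcribe`, `topics`) and simple state strings, purely to show the shape of such a payload. `new_job`, `overall_status`, and `status_json` are illustrative names only.

```python
import json
import uuid

# Hypothetical stage names for an audio→topics job.
STAGES = ("transcribe", "topics")


def new_job() -> dict:
    """Create a job record with every stage pending and a random hex id."""
    return {"id": uuid.uuid4().hex, "stages": {stage: "pending" for stage in STAGES}}


def overall_status(stages: dict[str, str]) -> str:
    """Collapse per-stage states into one job-level status."""
    states = set(stages.values())
    if "failed" in states:
        return "failed"
    if states == {"done"}:
        return "done"
    if states == {"pending"}:
        return "queued"
    return "running"


def status_json(job: dict) -> str:
    """Serialize the job roughly the way a status command might print it."""
    return json.dumps(
        {"id": job["id"], "status": overall_status(job["stages"]), "stages": job["stages"]},
        indent=2,
    )
```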