Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
Added
observability/grafana/provisioning/dashboards/autorag-queue-wait.json: a second auto-provisioned Grafana dashboard focused on stage-queue idle time. Pairs autorag_queue_wait_duration_seconds_* rollups with rabbitmq_queue_messages_{ready,unacked} to expose total idle seconds per stage, avg wait per message, idle/work ratio, p50/p95/p99 quantiles, and a wait-duration heatmap.
Per-LLM-call OTel spans via a new autorag.otel_callbacks.OTelSpanCallbackHandler, wired through every batched stage’s RunnableConfig in agent._build_stage_closures. One autorag.llm.call span per chat-model run, tagged with llm.stage and stamped with Ollama’s total/load/prompt_eval/eval_duration_ms and token counts so the Jaeger waterfall attributes time to GPU eval vs. prompt-eval vs. model-load vs. network round-trip.
Stepwise whisper-stage spans: autorag.whisper.{get_model,load_audio,load_model,ct2_transcribe,get_align_model,align,offload_align}, autorag.pyannote.{ensure_on_cuda,inference,offload}, and autorag.gpu.preload.{align,pyannote,whisper_ct2}. Attributes include cache.hit, audio.duration_s, model.compute_type, transcribe.realtime_factor, align.restored_from, and preload.cuda_{attempted,succeeded}.
autorag.queue.wait.<stage> retroactive span + autorag.queue.wait.duration histogram (labelled by stage.name). RabbitBroker.publish stamps an autorag-publish-ts-ns AMQP header (plus the coarser pika.BasicProperties.timestamp as fallback); get_batch extracts it onto a new Delivery.publish_ts_ns; stages._handle_one opens the queue-wait span between AMQP-context-attach and the stage span so the Jaeger waterfall reads prev stage → queue.wait → stage. The InMemoryBroker mirrors the stamping so the test path records a non-zero wait.
autorag.otel.bind_current_context(fn): public helper that wraps fn so a worker-thread call inherits the caller’s OTel context. Used at every ThreadPoolExecutor.submit site in the new preload / warm-up / offload fan-outs so child spans parent under the calling thread’s current span instead of becoming orphan roots in Jaeger. Safe no-op when opentelemetry-api is not installed.
New autorag.gpu.preload.fanout parent span around the three boot-time preloads (align / pyannote / whisper-CT2 CPU); paired sibling autorag.gpu.preload.whisper_ct2.cpu and autorag.gpu.preload.whisper_ct2.cuda spans (replacing the single autorag.gpu.preload.whisper_ct2 span; see Changed).
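The context-binding helper in the entries above can be pictured with the standard library alone. The following is a minimal sketch, not the project's implementation: it relies only on the fact that OpenTelemetry's Python context lives in contextvars, and the names bind_context, current_span, and demo are hypothetical stand-ins chosen for illustration.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

# Stand-in for "the active span". OpenTelemetry keeps its context in a ContextVar too.
current_span: contextvars.ContextVar[str] = contextvars.ContextVar(
    "current_span", default="<root>"
)


def bind_context(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap fn so it runs inside a snapshot of the caller's context (hypothetical helper)."""
    snapshot = contextvars.copy_context()  # captured at wrap time, in the calling thread

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Run in a fresh copy so the wrapper may be called from several threads at once.
        return snapshot.copy().run(fn, *args, **kwargs)

    return wrapper


def parent_seen_by_worker() -> str:
    """Report which span the worker thread would parent under."""
    return current_span.get()


def demo() -> tuple[str, str]:
    """Submit the same function unbound and bound, and return what each one saw."""
    token = current_span.set("autorag.gpu.preload.fanout")
    try:
        with ThreadPoolExecutor(max_workers=1) as pool:
            unbound = pool.submit(parent_seen_by_worker).result()
            bound = pool.submit(bind_context(parent_seen_by_worker)).result()
    finally:
        current_span.reset(token)
    return unbound, bound
```

The bound call sees the span that was active when it was submitted. Whether the unbound call sees the default depends on how the interpreter seeds new threads, which is the orphan-root situation the changelog entry describes.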
Changed
whisperX CT2 model now stays resident on CUDA for the worker’s lifetime: whisper_runner.transcribe_segment no longer destroys the cache per call, GpuArbiter._default_offload_whisper no longer drops it on the whisper -> llm flip (only the torch parts of the stack, wav2vec2 align + pyannote, go to CPU), and GpuArbiter._preload_whisper_ct2 now also builds the CUDA fp16 instance up-front when vram_probe shows headroom. autorag.whisper.load_model should fire at most once per worker boot; its presence on job ≥ 2 is a regression signal.
Async-pipeline parallelism: three new ThreadPoolExecutor fan-outs eliminate sequential waits visible in the prior trace. GpuArbiter.preload runs the three CPU-side preloads (align, pyannote, whisper-CT2 int8) concurrently under autorag.gpu.preload.fanout; the CUDA fp16 CT2 build is split into a new _preload_whisper_ct2_cuda that runs after the join (single CUDA-driver step). GpuArbiter._default_offload_whisper offloads wav2vec2 and pyannote concurrently on tenancy flip. agent._run_whisper overlaps the wav2vec2-align and pyannote CPU→CUDA restores with whisper_runner.transcribe_segment (the longest leg) via a 2-worker warm-up pool; warm-up failures fall through to the existing inline restore. The original autorag.gpu.preload.whisper_ct2 span is renamed to sibling .cpu and .cuda spans.
OLLAMA_NUM_PARALLEL default 4→8 in docker-compose.yml, mirrored by the AudioJobRequest.max_concurrency default and the in-process defaults on agent.build_agent / build_topic_runnable / core.AutoRAG.generate_topics / autorag generate-topics --max-concurrency. Each extra slot reserves a KV-cache copy at q8_0; verify VRAM mid-l1 via docker exec autorag-ollama-1 nvidia-smi --query-gpu=memory.used,memory.free --format=csv,noheader and roll back via OLLAMA_NUM_PARALLEL=4 in .env (restart ollama only) if the new autorag.gpu.preload.whisper_ct2.cuda span flips preload.cuda_succeeded false on a tight-VRAM host.
Broker fan-out: summarize split into summarize_l1 ∥ summarize_l2. StageName.summarize is removed; summarize_l1 (L1-node titles/summaries) fans out from l1 in parallel with decide+l2, and summarize_l2 (L2-child titles/summaries) runs after l2. l0 joins both: a new stages._try_emit_l0(ctx, msg) reads the per-job stage_states row and publishes the L0 message exactly once after BOTH summarize stages reach done (and never if either flips to error, because the job is already observably failed). NEXT_STAGE: StageName -> StageName | None becomes NEXT_STAGES: StageName -> list[StageName] to express the fan-out edge; StageOutcome.next_message becomes next_messages: list[StageMessage]. decide and l2 switch from _save_state (whole-row overwrite) to a new _merge_state read-modify-write helper so they don’t clobber the disjoint l1_summaries subkey that the concurrent summarize_l1 writes; l0 then merges l1_summaries + l2_summaries back onto the live l1 node list before aggregating. Single-GPU-worker invariant makes the read+write race-free; a future multi-worker topology would need a BEGIN IMMEDIATE claim on _try_emit_l0.
One-time RabbitMQ cleanup on existing deployments: _declare_topology will auto-create the new stage.summarize_l1 / stage.summarize_l2 queues, but the old stage.summarize / stage.summarize.dlq linger. Drain any in-flight stage.summarize messages before deploying (they’ll fail to validate against the new enum), then run docker exec autorag-rabbitmq-1 rabbitmqctl delete_queue stage.summarize stage.summarize.dlq once on the live broker.
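The fan-out edge and the join described in the broker entry above can be sketched in a few lines. This is an illustration under stated assumptions rather than the project's code: stage names are plain strings instead of the StageName enum, only the edges named in the entry are listed, the decide → l2 edge is inferred from "decide+l2", and the already_emitted flag is a hypothetical way to express "exactly once".

```python
# Edges as described in the changelog entry; stages it does not mention are omitted.
NEXT_STAGES: dict[str, list[str]] = {
    "l1": ["decide", "summarize_l1"],  # the fan-out edge
    "decide": ["l2"],                  # assumed ordering of "decide+l2"
    "l2": ["summarize_l2"],
    "summarize_l1": [],                # l0 is published by the join, not by an edge
    "summarize_l2": [],
}

JOIN_INPUTS = ("summarize_l1", "summarize_l2")


def merge_state(row: dict, patch: dict) -> dict:
    """Read-modify-write: change only the keys in patch and keep the rest of the row."""
    merged = dict(row)
    merged.update(patch)
    return merged


def should_emit_l0(stage_states: dict[str, str], already_emitted: bool) -> bool:
    """True only when both summarize stages are done and l0 was not published yet.

    A stage in the "error" state is not "done", so a failed job never emits l0.
    """
    if already_emitted:
        return False
    return all(stage_states.get(name) == "done" for name in JOIN_INPUTS)
```

A whole-row overwrite would replace the row with the patch alone, dropping any key written by a concurrent stage; the merge keeps keys such as l1_summaries intact.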
Fixed
./scripts/stack.sh up no longer silently degrades when .stack-data is root-owned (the failure mode behind the gpu-worker “attempt to write a readonly database” reconnect loop, where every job sticks on queued). prepare_stack_data now uses chown -R (so existing files inside the dir are also fixed), sudo -n (so the non-tty up path fails fast instead of prompting), adds a docker-based chown fallback that works even when the host user can’t sudo (containers run as root in their own namespace), and dies loudly with an actionable hint if all paths fail instead of pretending the stack came up clean.
Database._upsert_clip_identity no longer raises IndexError when the row already exists. sqlite_utils.insert(ignore=True) reads back the row by last_rowid, which is 0 on the ignored conflict and errors on the lookup; the conflict path is the whole point of ignore=True, so the exception is now swallowed.
0.10.0 - 2026-05-20
Added
OpenTelemetry traces + metrics across the async pipeline (autorag.otel, new [observability] extra, not folded into [all]). Off by default: AUTORAG_OTEL_ENABLED=false makes initialize_otel short-circuit before importing any opentelemetry.* symbols, so base install and [broker]-only install keep booting unchanged. Opt-in adds one autorag.stage.<name> span per stage in services.stages.handle_batch and in the in-process services.runner.run_job_in_process, an autorag.job.submit root span in services.broker.submit_audio_job, nested autorag.gpu.acquire / .evict.llm / .offload.whisper / .enforce_budget spans on GpuArbiter.acquire, and seven custom metrics (autorag.jobs.{submitted,completed}, autorag.stage.{duration,attempts,dlq}, autorag.gpu.tenancy.{transitions,duration}). W3C trace context rides AMQP via a new Delivery.otel_ctx field + manual inject_amqp_headers / extract_amqp_context calls in RabbitBroker (the contrib pika instrumentor wraps only basic_publish / basic_consume, not the pull-mode basic_get the broker uses). The handler-side publish runs in drain_and_dispatch after the stage span has closed, so StageOutcome carries the stage span’s OTel context (captured at outcome-construction time) and drain_and_dispatch re-attaches it around broker.publish via a small _publish_in_ctx helper; without that re-attach the injected traceparent would have no parent and every stage hop would start a fresh trace. The Typer @app.callback skips OTel init when ctx.invoked_subcommand == "serve" so the API process registers spans under autorag-api instead of autorag-cli (initialize_otel is idempotent: first call wins).
New opt-in compose profile observability: otel-collector, jaeger, prometheus (scrapes RabbitMQ’s :15692 rabbitmq_prometheus plugin on /metrics/per-object so queue depth is labeled by queue), and grafana with a starter AutoRAG Pipeline dashboard. All bound to 127.0.0.1. Brought up with ./scripts/stack.sh up --with-observability.
Settings gains otel_enabled, otel_service_name, otel_exporter_endpoint, otel_metric_export_interval_ms, otel_environment, otel_resource_attributes (all AUTORAG_OTEL_*).
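To make the trace-propagation entry above concrete, the sketch below shows what a W3C traceparent value looks like when it travels in a message-headers dict, and why a hop that loses it starts a fresh trace. It uses only the standard library and does not call the OpenTelemetry API or the project's inject_amqp_headers / extract_amqp_context; the function names here (new_traceparent, inject_traceparent, extract_trace_id, hop) are hypothetical.

```python
import re
import secrets
from typing import Optional

# Version 00 traceparent: 00-<32 hex trace id>-<16 hex parent span id>-<2 hex flags>
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(trace_id: Optional[str] = None, sampled: bool = True) -> str:
    """Build a traceparent; reuse trace_id to continue a trace, or mint a new one."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)                # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def inject_traceparent(headers: dict, traceparent: str) -> dict:
    """Return a copy of headers carrying the traceparent value."""
    out = dict(headers)
    out["traceparent"] = traceparent
    return out


def extract_trace_id(headers: dict) -> Optional[str]:
    """Return the trace id from headers, or None when absent or malformed."""
    value = headers.get("traceparent")
    if isinstance(value, bytes):  # some clients may hand header values back as bytes
        value = value.decode("ascii", "replace")
    match = _TRACEPARENT.match(value or "")
    return match.group(1) if match else None


def hop(headers: dict) -> dict:
    """Simulate a consumer that republishes: same trace id, new span id.

    If the incoming headers carry no usable traceparent, a fresh trace begins.
    """
    return inject_traceparent({}, new_traceparent(extract_trace_id(headers)))
```

When the incoming headers are empty, hop mints a new trace id, which is the "every stage hop would start a fresh trace" failure the entry mentions.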
Changed
Single-command host stack. docker-compose.yml is now the one source of truth for the entire heavy stack: it adds an ollama service (sole owner of the server-side tuning contract), a read-only tecnativa/docker-socket-proxy as the whole control plane (ps / logs / restart only; no build/exec/run/published port), and a lean .devcontainer/worker.Dockerfile for the gpu/io workers (deps baked; ./src + pyproject.toml bind-mounted read-only and uv run, so a code edit is a worker restart, not a rebuild). The host brings everything up with ./scripts/stack.sh up; the thin devcontainer joins the shared autorag-net and drives the stack through the proxy (docker compose -p autorag ps|logs|restart). .devcontainer/start-broker.sh / start-ollama.sh are removed; deployment is documented in the new SETUP.md.
[audio] now pins torch~=2.8.0 and adds torchcodec>=0.7,<0.8 (whisperx 3.8.5 transitively caps it <0.8); [diarize] requires pyannote.audio>=4.0.0; the imageio-ffmpeg floor is relaxed to >=0.4.0. worker.Dockerfile installs libpython3.12t64 because torchcodec’s FFmpeg-6 native lib dlopens the shared CPython library, which Ubuntu 24.04’s python3.12 package does not pull in.
A job needing an Ollama model the server has not pulled now fails with an actionable “Ollama model not available … ./scripts/stack.sh up” message instead of an opaque dead-letter traceback (services.stages._legible_error).
autorag-gpu-worker / autorag-io-worker reconnect to RabbitMQ with backoff on a transient broker fault, keeping preloaded models warm (no cold reload) instead of exiting.
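The reconnect-with-backoff behaviour in the last entry can be illustrated with a generic retry loop. This is a sketch under assumptions, not the workers' actual code: the delay schedule (1 s doubling up to 30 s), the attempt limit, and the names backoff_delays and connect_with_backoff are all hypothetical, and the connect callable stands in for whatever opens the broker connection.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 1.0, factor: float = 2.0, cap: float = 30.0) -> Iterator[float]:
    """Yield an endless exponential delay schedule, clamped at cap seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def connect_with_backoff(
    connect: Callable[[], T],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError,),
) -> T:
    """Call connect until it succeeds, sleeping between failed attempts.

    The process keeps running while it waits, so anything already loaded in
    memory stays loaded. The last failure is re-raised once attempts run out.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(next(delays))
    raise AssertionError("unreachable")
```

Passing sleep as a parameter keeps the loop testable without real waiting; a long-running worker would typically use a much larger attempt limit or retry indefinitely.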
Fixed
AutoRAG.persist_topics / autorag generate-topics no longer null the stored transcription when persisting topics. The clip Database now uses column-scoped sqlite_utils upserts instead of a per-instance pydantic_sqlite read-modify-write, so a second Database instance (a separate worker process, or AutoRAG’s second persist call) cannot overwrite the transcript another instance wrote; create_clip is first-writer-wins.
Async /jobs/audio pipeline: the persist stage no longer crashes the persist-only IO worker (it built LLM handlers before the persist branch), and persisted topics are no longer orphaned to a second clip row for YouTube URLs (_default_persist now forwards source_url so the session id canonicalises to the whisper row).
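The difference between a whole-row write and the column-scoped upsert described in the first fix can be shown with the standard sqlite3 module. This is a sketch, not the project's schema or its sqlite_utils calls: the clips table, its columns, and the helper names are hypothetical, and it assumes an SQLite build with UPSERT support (3.24 or later).

```python
import sqlite3

ALLOWED_COLUMNS = {"transcription", "topics"}


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database with a hypothetical clips table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clips ("
        "session_id TEXT PRIMARY KEY, transcription TEXT, topics TEXT)"
    )
    return conn


def create_clip(conn: sqlite3.Connection, session_id: str) -> bool:
    """First-writer-wins insert: returns True only for the call that created the row."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO clips (session_id) VALUES (?)", (session_id,)
    )
    conn.commit()
    return cur.rowcount == 1


def set_column(conn: sqlite3.Connection, session_id: str, column: str, value: str) -> None:
    """Upsert a single column, leaving every other column of the row untouched."""
    if column not in ALLOWED_COLUMNS:  # column names cannot be bound as parameters
        raise ValueError(f"unknown column: {column}")
    conn.execute(
        f"INSERT INTO clips (session_id, {column}) VALUES (?, ?) "
        f"ON CONFLICT(session_id) DO UPDATE SET {column} = excluded.{column}",
        (session_id, value),
    )
    conn.commit()
```

Because each write names only its own column, a writer that persists topics cannot null a transcription that another writer stored earlier.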
0.9.0 - 2026-05-19
Added
Async, RabbitMQ-driven, GPU-aware pipeline: a new optional [broker] extra (pika) and an autorag.services package that runs many audio→topics requests concurrently alongside the unchanged synchronous SDK / CLI / API (which keep their direct in-process path and never need a broker):
RabbitMQ work-queue-per-stage topology + dead-letter exchange with bounded handler-driven retry; a dependency-free InMemoryBroker; submit_audio_job.
One autorag-gpu-worker (owns whisper + every LLM stage) and an autorag-io-worker (owns persist). GpuArbiter CPU-preloads model standbys and smart-unloads the prior GPU tenant on demand, reusing the existing whisper_runner / diarize offload primitives (whisperX CT2 is destroy+rebuilt; it is not a movable torch module).
Durable JobStore (jobs table in the existing SQLite DB, cross-process readable); transcripts travel by session_id reference (services.blobs), never in messages; the evolving tree lives in the job row.
POST /jobs/audio (202 + job_id), GET /jobs/{id}, GET /jobs/{id}/result added to autorag.api, plus optional autorag jobs submit / autorag jobs status CLI subcommands. The handlers import autorag.services lazily → clean 503 when [broker] / [rag] are absent; import autorag.services stays base-install safe (no torch / chromadb / pika).
autorag.agent.build_stage_handlers() exposes the per-stage closures, sharing _build_stage_closures with build_topic_runnable so the distributed and in-process paths build identical warm Ollama chains and the same keep_alive=0 eviction. New AUTORAG_BROKER_URL setting; persistence.load_clip (cross-process clip read); repo-root docker-compose.yml and a best-effort devcontainer start-broker.sh.
Changed
The bundled /viz frontend’s build tooling and runtime stack were upgraded across several majors: Vite 5→8 (Rolldown bundler), TypeScript 5→6, @vitejs/plugin-react 4→6, React 18→19, @react-three/fiber 8→9, @react-three/drei 9→10, three 0.165→0.184, and zustand 4→5. The committed src/autorag/static/viz/ bundle was rebuilt in lockstep; /viz behaviour and the Python public API are unchanged. react / react-dom are now pinned ~19.2.6 (tilde, not caret) because @react-three/fiber@9 peers require react >=19 <19.3.
0.8.0 - 2026-05-16
Added
autorag generate-topics now exposes the LLM tuning knobs that AutoRAG.generate_topics already accepted: --num-ctx-l1, --num-ctx-fanout, --max-concurrency, --min-subdivide-duration-s, and --reasoning / --no-reasoning. Forwarded 1:1 to the facade with the same defaults (8192 / 8192 / 4 / 120.0 / False); ollama_base_url stays env-only via AUTORAG_OLLAMA_BASE_URL.
New boundary_block_seconds tuning kwarg (default 30) on AutoRAG.generate_topics / agent.build_topic_runnable / agent.build_agent, exposed as --boundary-block-seconds on autorag generate-topics. Sizes the time-bucketed transcript fed to the L1/L2 boundary prompts (was the hardcoded private _BOUNDARY_BLOCK_SECONDS); smaller windows give finer MM:SS anchors at the cost of more boundary-prompt tokens.
Changed
Default topic LLM is now gemma4:latest (8B Q4_K_M, ~9.6 GB), replacing qwen2.5:14b-instruct-q8_0, across AutoRAG.generate_topics / agent.build_topic_runnable / build_agent and the autorag generate-topics CLI. gemma4:latest is a thinking-capable model; because all five stages do mechanical JSON extraction, the agent disables thinking by default. New overridable reasoning: bool = False kwarg on build_topic_runnable / build_agent / AutoRAG.generate_topics (sends think: false to Ollama on thinking models; harmless no-op otherwise); pass reasoning=True to trade latency for chain-of-thought. The lighter default also frees VRAM: the 4 agent slots + model now sit at ~11 GB on a 24 GB card (was ~15 GB+ for the qwen 14B).
The topic agent now keeps the Ollama model resident in VRAM for the whole run instead of cold-reloading it (~15 GB) at every stage boundary. All five stages share one num_ctx and keep_alive="5m" (Ollama reloads on any num_ctx change, so a uniform size is what keeps it warm); _build_tree issues one throwaway keep_alive=0 call after the run, or on a stage error, to evict the model so it doesn’t squat VRAM during the downstream embed/viz step. Substantially cuts topic-generation wall-clock.
num_ctx_l1 now defaults to 8192 (was 16384) in AutoRAG.generate_topics / agent.build_topic_runnable / build_agent, so the L1 call shares the fan-out context size. Trade-off: on very long audio (≈1 hr+) the L1 transcript can truncate at 8192 and degrade boundary quality; raise num_ctx_l1 back to 16384 to restore fidelity, at the cost of one model reload at the Stage 2→3a boundary.
Transcription now defaults to English. --language defaults to en on autorag transcribe / generate-topics / blocks, and the language parameter defaults to "en" on AutoRAG.transcribe / AutoRAG.transcribe_blocks / agent.transcribe_audio / agent.build_agent (was Whisper auto-detect). Behavior change for SDK consumers relying on auto-detect: pass language=None (SDK) or --language "" (CLI) to restore it.
0.7.0 - 2026-05-15
Added
GET /viz now renders the interactive 3-D topic constellation: per-level glowing points, clip/cluster coloring, additive knowledge-graph edges, a pointer tooltip, two-way rail↔scene hover sync, and debounced semantic search with click-to-focus. The React rewrite had previously shipped only the left rail, so the page showed no embeddings; the r3f scene (frontend/src/three/) is now implemented and the committed bundle rebuilt. UMAP coordinates are recentred/scaled in three/layout.ts (raw /viz/data coords are not origin-centred), and an error boundary keeps the rail usable if WebGL is unavailable.
Hosted documentation at https://autologger.github.io/AutoRAG/, published to GitHub Pages on every push to main (.github/workflows/docs.yml).
autorag.blocks.mmss(t): public MM:SS second-formatter (promoted from the private _mmss), now exported in autorag.blocks.__all__.
Changed
The topic agent’s L1/L2 boundary detection now feeds the LLM a 30-second time-bucketed transcript via blocks.format_blocks (one MM:SS-MM:SS Speaker K: <words> line per turn) instead of one timestamped line per word, and the boundary LLM emits MM:SS offsets that agent._parse_ts converts back to seconds in code. Cuts boundary-prompt size sharply; AutoRAG.generate_topics / build_agent signatures and the Runnable[list[WordSpan], TopicTree] contract are unchanged.
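The MM:SS round trip mentioned in the 0.7.0 entries (seconds formatted for the prompt, then parsed back from the model's answer) might look like the sketch below. The real autorag.blocks.mmss and agent._parse_ts may differ: truncating to whole seconds, letting minutes run past 59, and the local names mmss and parse_ts are assumptions made for illustration.

```python
def mmss(t: float) -> str:
    """Format a second offset as MM:SS (whole seconds; minutes may exceed 59)."""
    total = max(0, int(t))  # assumption: truncate fractions and clamp negatives to zero
    minutes, seconds = divmod(total, 60)
    return f"{minutes:02d}:{seconds:02d}"


def parse_ts(text: str) -> float:
    """Convert an MM:SS string back to seconds."""
    minutes, seconds = text.strip().split(":")
    return int(minutes) * 60 + float(seconds)
```

For example, mmss(75.4) gives "01:15" and parse_ts("01:15") gives 75.0, so an offset survives the trip to within the one-second resolution of the format.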
Removed
src/autorag/static/viz.html: the original vanilla Three.js /viz page. It was orphaned once /viz switched to the React bundle (viz.py serves static/viz/index.html, never this file) and had been shipping unused in the wheel via the static/ glob.
Fixed
IngestRequest (POST /ingest) is no longer left “not fully defined”: pathlib.Path is imported at runtime again so Pydantic can resolve the paths field. Restores IngestRequest.model_rebuild(), FastAPI OpenAPI schema generation, and the Sphinx autodoc build.
The strict docs build no longer fails under --all-extras: transformers (pulled transitively by langchain_core, a base dep) is now mocked in autodoc_mock_imports, so base+docs and all-extras builds take the same path.
0.6.0 - 2026-05-12
Changed
Replaced openai-whisper with whisperX (faster-whisper / CTranslate2 backend + wav2vec2 forced-alignment pass). Transcription is ~4× faster and word-level timestamps are frame-accurate rather than Whisper-estimated. The [audio] extra now pulls whisperx instead of openai-whisper; the public API (AutoRAG.transcribe, WordSpan shape) is unchanged.
0.5.0 - 2026-05-11
Changed
AutoRAG.generate_topics() now applies collapse_lone_children before returning, so callers always receive a normalized TopicTree regardless of whether persist_topics is called. persist_topics no longer collapses the tree itself.
Fixed
Suppress spurious pyannote UserWarning about std() degrees of freedom from StatsPool on single-frame diarization segments; the warning was harmless (pyannote handles the NaN internally) but polluted log output.
0.4.0 - 2026-05-11
Added
AutoRAG.generate_topics(words, ...) → TopicTree: pure LLM topic extraction on pre-computed list[WordSpan], no audio involved.
AutoRAG.persist_topics(file, topics, ...): stores the topic tree to SQLite and embeds topic titles into Chroma. Call after persist_transcription.
build_topic_runnable() in agent.py: LangChain Runnable[list[WordSpan], TopicTree] (Whisper-free; build_agent wraps it).
agent.transcribe_audio(file) → list[WordSpan] and agent.generate_topics(words) → TopicTree as standalone module-level helpers (lower-level alternatives to the AutoRAG facade).
autorag generate-topics CLI command: transcribes (or reads from cache), generates LLM topics, and persists transcription + topics + embeddings.
Changed
AutoRAG.transcribe() now returns list[WordSpan] instead of TranscriptionResult; call generate_topics() separately for the LLM topic tree.
AutoRAG.persist_transcription() now stores word spans only; call persist_topics() to persist the topic tree and Chroma embeddings.
autorag transcribe CLI now only transcribes and persists word spans (no LLM topic generation). Use autorag generate-topics for the full pipeline.
Removed
abs_s field removed from WordSpan dict construction in agent.py (was redundant with s and was never declared in the WordSpan TypedDict).
0.3.3 - 2026-05-11
Fixed
Whisper and pyannote pipeline VRAM is released immediately after inference: transcribe_segment and _run_diarization now move their models to CPU and call torch.cuda.empty_cache() so Ollama’s LLM stages start with the GPU unencumbered. Both modules restore to CUDA automatically on the next call.
0.3.2 - 2026-05-10
Changed
/viz rail (header / stats / legend / size legend / controls / search / topic list) now renders from the React app, fed by a typed useVizData() hook hitting /viz/data. Color-mode and edges-visible state are held in a Zustand store (frontend/src/state/vizStore.ts) so the canvas (Phase C+) can read the same toggles. Phase B: DOM only; <canvas>, raycast, tooltip, and search wiring are still in the unmodified viz.html until later phases land them in frontend/src/three/.
0.3.1 - 2026-05-10
Changed
/viz is now served from a Vite-built React + TypeScript bundle under src/autorag/static/viz/index.html, mounted alongside a new /viz-assets static route. Source lives in the new top-level frontend/ directory (outside src/autorag/ so uv / ruff / mypy don’t scan TypeScript). Phase A: scaffold + FastAPI wiring only; the existing Three.js scene is preserved in viz.html and will be ported to react-three-fiber in subsequent commits.
0.3.0 - 2026-05-10
Added
transcribe accepts YouTube URLs via the [youtube] extra; URL is downloaded to a temp .webm through autorag.audio_source.resolve_audio_input (lazy yt_dlp import).
AudioSource carries source_url, video_id, title, upload_date, duration_s, and uploader lifted from yt-dlp’s info dict. The CLI forwards these to persist_transcription.
autorag.blocks.format_blocks (re-exported as from autorag import format_blocks) renders a WordSpan list as N-second time blocks with one MM:SS-MM:SS Speaker K: ... line per speaker turn. Pure stdlib, callable from a base install.
AutoRAG.transcribe_blocks(file, seconds=10, ...) returns the same formatted output, reading from the SQLite cache when available and otherwise running the full transcribe + persist pipeline first. Requires [rag] for the cache path, [audio,diarize] (+ [youtube] for URLs) on cache miss.
autorag blocks SOURCE [-n SECONDS] CLI command wrapping transcribe_blocks.
autorag.persistence.derive_session_id(file_or_url) and load_transcription(db, session_id) expose the session-id derivation and the cached-transcription read path as base-safe public helpers.
Changed
session_id is derived deterministically from the canonical YouTube URL (youtu.be / m.youtube.com / www.youtube.com variants collapse to one form) so re-runs overwrite the same SQLite row.
Renamed remaining AUTOLOGGER_* env vars to AUTORAG_*; devcontainer mount updated to match.
Clip created_at and absolute event timestamps anchor to the YouTube upload_date (midnight UTC) when present, instead of the temp-file mtime.
default_title_from(source) moved from cli.py (private _default_title_from) to autorag.audio_source as a public helper.
group_by_speaker moved from agent.py to autorag.blocks and is now part of the public surface; agent._format_transcript re-imports it from there.
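The first entry above says that YouTube URL variants collapse to one canonical form before the session id is derived. A rough sketch of that idea follows; it is not autorag.persistence.derive_session_id. The chosen canonical form, the SHA-256 prefix used as the id, and the function names are all assumptions, and only the two URL shapes shown (youtu.be short links and /watch?v= links) are handled.

```python
import hashlib
from typing import Optional
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def youtube_video_id(url: str) -> Optional[str]:
    """Extract the video id from a youtu.be or /watch?v= URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in YOUTUBE_HOSTS and parsed.path == "/watch":
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


def canonical_url(url: str) -> str:
    """Collapse recognised YouTube variants to one form; leave other inputs unchanged."""
    video_id = youtube_video_id(url)
    return f"https://www.youtube.com/watch?v={video_id}" if video_id else url


def session_id_for(url: str) -> str:
    """Deterministic id: the same video always maps to the same value (hash is an assumption)."""
    return hashlib.sha256(canonical_url(url).encode("utf-8")).hexdigest()[:16]
```

Because the id depends only on the canonical form, extra query parameters such as a start time do not create a second row for the same video.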
0.2.0 - 2026-05-10
Added
SDK facade from autorag import AutoRAG with flat methods (transcribe, build_agent, persist_transcription, ingest, query).
Pip-installable from GitHub: pip install "autorag[...] @ git+https://github.com/AutoLogger/AutoRAG@v0.2.0".
Optional extras: [audio], [diarize], [rag], [server], [all]. MissingExtraError is raised with a friendly hint when an extra is missing.
Speaker diarization via pyannote/speaker-diarization-3.1 (gated by [diarize] + HF_TOKEN). Each WordSpan carries a speaker field.
Unified multi-pass L0/L1/L2 topic agent in src/autorag/agent.py, with boundary detection separated from per-node summarization.
GitHub Actions CI: lint/type-check, full-extras tests, and an SDK base-install regression guard for the lazy-import contract.
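The optional-extras and lazy-import entries above follow a common pattern: import the heavy dependency only when a feature is used, and turn a failed import into a hint about which extra to install. The sketch below shows that pattern in general terms. The MissingExtraError name comes from the changelog, but its base class, its message, and the require helper are assumptions, not the library's actual definitions.

```python
import importlib
from types import ModuleType


class MissingExtraError(ImportError):
    """Raised when a feature needs an optional extra that is not installed (sketch)."""


def require(module_name: str, extra: str) -> ModuleType:
    """Import module_name on demand, or explain which extra provides it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise MissingExtraError(
            f"{module_name!r} is required for this feature; "
            f"install the [{extra}] extra to enable it"
        ) from exc
```

Calling require inside the function that needs the dependency, rather than at module import time, is what keeps a base install importable without the optional packages.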
Changed
All LLM and embedding calls migrated to langchain-ollama.
Topic embeddings moved from a SQLite column into a persistent Chroma store.
Default topic model is qwen2.5:14b-instruct-q8_0.
Removed
Non-Ollama LLM providers.
Unused replace_existing parameter from the transcription flow.