autorag.db
SQLite-backed database for audio clip transcription and topic storage.
- class autorag.db.AudioClip(**data)
Bases: `BaseModel`
One row of the `audio_clips` SQLite table.
The fields `transcription` and `topics` hold JSON-encoded strings; use `Database.get_clip()` to fetch and decode them. The `whisper_model`/`provider`/`llm_model` columns are populated by `Database.finalize_topics()` to record which backends produced the stored data.
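To make the "JSON-encoded string" point concrete, here is a minimal sketch of a row and the decode step. It is an assumption-laden stand-in: `AudioClipRow` and `decode_clip` are hypothetical names, the column types are guessed, and a stdlib `dataclass` replaces the real `BaseModel`. It only illustrates the kind of decoding `Database.get_clip()` is described as doing.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AudioClipRow:
    """Hypothetical stdlib stand-in for one audio_clips row.

    The real AudioClip is a BaseModel; the column types here are assumed.
    """

    session_id: str
    title: Optional[str] = None
    file_path: Optional[str] = None
    created_at: Optional[str] = None
    transcription: Optional[str] = None  # JSON text, not a decoded list
    topics: Optional[str] = None  # JSON text, not a decoded list
    whisper_model: Optional[str] = None  # set by finalize_topics()
    provider: Optional[str] = None  # set by finalize_topics()
    llm_model: Optional[str] = None  # set by finalize_topics()


def decode_clip(row: AudioClipRow) -> dict[str, Any]:
    """Copy the row into a dict, decoding the two JSON text columns.

    Hypothetical helper; not the library's get_clip().
    """
    out: dict[str, Any] = dict(vars(row))
    for name in ("transcription", "topics"):
        raw = out[name]
        out[name] = json.loads(raw) if raw is not None else None
    return out


clip = AudioClipRow(session_id="s1", transcription='[{"text": "hello"}]')
decoded = decode_clip(clip)
```

The stored row keeps the raw JSON text; only the returned dict carries decoded Python objects.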
- class autorag.db.Database(db_path)
Bases: `object`
SQLite façade for AudioRAG clip state, keyed by `session_id`.
Every `audio_clips` read/write goes through the raw `sqlite_utils` table handle, as either a column-scoped upsert or a raw read. It deliberately does not round-trip the row through `pydantic_sqlite`’s model registry. That registry is per-instance and in-process, so a freshly constructed `Database` (in a different worker process, or just a second `AutoRAG` persist call) would not see another instance’s rows. As a result, the old read-modify-write (`model_from_table` plus a full-object `add`) upserted a blank model over the on-disk row, silently nulling the transcript a different process had written. Column-scoped upserts touch only the columns a method owns, so that clobber is impossible by construction, regardless of instance or process.
`pydantic_sqlite` is still used for the separate `jobs` table (see `autorag.services.jobs.JobStore`), where a whole-record write is the intended semantics; `self.db` is kept for that reuse.
Creates the SQLite file (and any missing parent directories) and the `audio_clips` schema on construction.
- Parameters:
- db_path (`Path`)
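The clobber argument above can be reproduced in a few lines. This sketch uses the stdlib `sqlite3` module rather than `sqlite_utils` (whose exact calls are not shown in these docs), an assumed schema built from the documented column names, and hypothetical helpers `open_db`, `upsert_columns` and `replace_whole_row`. Two independent connections stand in for two processes: the column-scoped upsert preserves the other writer's column, while a whole-row write of a mostly blank record nulls it.

```python
import sqlite3
import tempfile
from pathlib import Path

# Assumed schema: column names come from the docs above, types are guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS audio_clips (
    session_id TEXT PRIMARY KEY,
    title TEXT, file_path TEXT, created_at TEXT,
    transcription TEXT, topics TEXT,
    whisper_model TEXT, provider TEXT, llm_model TEXT
)
"""


def open_db(db_path: Path) -> sqlite3.Connection:
    """Create missing parent dirs, the SQLite file and the audio_clips table."""
    db_path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def upsert_columns(conn: sqlite3.Connection, session_id: str, **cols: str) -> None:
    """Column-scoped upsert: create the row if absent, else update only `cols`.

    Column names are interpolated into the SQL, so they must be fixed,
    trusted identifiers (never user input); values are bound as parameters.
    """
    names = ", ".join(cols)
    marks = ", ".join("?" for _ in cols)
    updates = ", ".join(f"{c} = excluded.{c}" for c in cols)
    conn.execute(
        f"INSERT INTO audio_clips (session_id, {names}) VALUES (?, {marks}) "
        f"ON CONFLICT(session_id) DO UPDATE SET {updates}",
        (session_id, *cols.values()),
    )
    conn.commit()


def replace_whole_row(conn: sqlite3.Connection, row: dict) -> None:
    """Anti-pattern: a whole-row write; every column not in `row` becomes NULL."""
    names = ", ".join(row)
    marks = ", ".join("?" for _ in row)
    conn.execute(
        f"INSERT OR REPLACE INTO audio_clips ({names}) VALUES ({marks})",
        tuple(row.values()),
    )
    conn.commit()


def read(conn: sqlite3.Connection) -> tuple:
    return conn.execute(
        "SELECT transcription, topics FROM audio_clips WHERE session_id = 's1'"
    ).fetchone()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "nested" / "clips.db"  # parent dir does not exist yet
    a, b = open_db(path), open_db(path)  # two independent handles, like two processes
    upsert_columns(a, "s1", transcription='["hello"]')  # stage 1 writes its column
    upsert_columns(b, "s1", topics="[]")  # stage 2 writes only its own column
    after_upsert = read(a)
    replace_whole_row(b, {"session_id": "s1", "topics": "[]"})  # blank-model write
    after_replace = read(a)
    a.close()
    b.close()
```

After the two upserts both columns are present; after the whole-row replace the transcript is gone. The `ON CONFLICT ... DO UPDATE` form needs SQLite 3.24 or newer.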
- add_analytics_event(session_id, *, category, message, metadata, marked_at_utc)
Build the analytics-event dict written into a clip’s `topics` JSON.
Does not touch the database itself: callers accumulate the returned dicts and pass them to `finalize_topics()`.
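The "build now, write later" pattern can be sketched as a pure function. The dict layout is not documented here, so the key names below simply mirror the parameters; `build_analytics_event` is a hypothetical name, and treating `marked_at_utc` as a `datetime` is an assumption.

```python
import json
from datetime import datetime, timezone
from typing import Any


def build_analytics_event(
    session_id: str,
    *,
    category: str,
    message: str,
    metadata: dict[str, Any],
    marked_at_utc: datetime,
) -> dict[str, Any]:
    """Hypothetical sketch: build one event dict; performs no database I/O.

    Key names mirror the parameters; the real layout is not documented.
    """
    return {
        "session_id": session_id,
        "category": category,
        "message": message,
        "metadata": dict(metadata),  # shallow copy so later caller edits don't leak in
        "marked_at_utc": marked_at_utc.isoformat(),  # JSON-friendly timestamp
    }


# Callers accumulate events in memory, then hand the whole list to finalize_topics().
events = [
    build_analytics_event(
        "s1",
        category="bookmark",
        message="Interesting point",
        metadata={"speaker": "A"},
        marked_at_utc=datetime(2024, 1, 1, tzinfo=timezone.utc),
    )
]
payload = json.dumps(events)  # must survive JSON encoding into the topics column
```

Keeping the builder free of I/O means a single `finalize_topics()` call does all the writing.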
- create_clip(session_id, *, title, file_path, created_at)
Insert an `AudioClip` row if one doesn’t already exist.
First-writer-wins (`INSERT OR IGNORE`): a no-op when the `session_id` is already present, so a later `create_clip` with a different title/path never overwrites the original and, crucially, never resets the `transcription`/`topics` a different process wrote in between. Only the four identity columns (`session_id`, `title`, `file_path`, `created_at`) are written; the rest default to NULL.
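A minimal sketch of the first-writer-wins behaviour, using stdlib `sqlite3` on an in-memory database with an assumed, trimmed-down schema. `create_clip_sketch` is a hypothetical helper, not the library method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audio_clips (session_id TEXT PRIMARY KEY, title TEXT, "
    "file_path TEXT, created_at TEXT, transcription TEXT, topics TEXT)"
)


def create_clip_sketch(conn, session_id, *, title, file_path, created_at):
    """First-writer-wins insert of the four identity columns (hypothetical helper)."""
    conn.execute(
        "INSERT OR IGNORE INTO audio_clips (session_id, title, file_path, created_at) "
        "VALUES (?, ?, ?, ?)",
        (session_id, title, file_path, created_at),
    )
    conn.commit()


create_clip_sketch(conn, "s1", title="First", file_path="a.wav", created_at="2024-01-01")
# Another stage fills in the transcript between the two create calls.
conn.execute("UPDATE audio_clips SET transcription = ? WHERE session_id = ?", ('["hi"]', "s1"))
conn.commit()
# A later create with different identity data is silently ignored.
create_clip_sketch(conn, "s1", title="Second", file_path="b.wav", created_at="2024-01-02")
row = conn.execute("SELECT title, file_path, transcription FROM audio_clips").fetchone()
```

The second call neither changes the title nor touches the transcript, which is exactly why `INSERT OR IGNORE` is preferred here over `INSERT OR REPLACE`.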
- store_transcription(session_id, words)
Persist a JSON-encoded `WordSpan` list on the clip.
Column-scoped: touches only `transcription`. Create-if-absent (upsert), so a row is materialised even if `create_clip` has not run yet, and a concurrent `finalize_topics` cannot lose it.
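The create-if-absent, single-column write can be sketched with a stdlib `sqlite3` upsert. The `WordSpan` fields (`text`, `start_s`, `end_s`) and the `store_transcription_sketch` helper are hypothetical; the real `WordSpan` definition is not shown in these docs.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass


@dataclass
class WordSpan:
    """Hypothetical word-level span; the real WordSpan fields are not shown here."""

    text: str
    start_s: float
    end_s: float


def store_transcription_sketch(conn, session_id, words):
    """Upsert touching only `transcription`; creates the row if it is missing."""
    payload = json.dumps([asdict(w) for w in words])
    conn.execute(
        "INSERT INTO audio_clips (session_id, transcription) VALUES (?, ?) "
        "ON CONFLICT(session_id) DO UPDATE SET transcription = excluded.transcription",
        (session_id, payload),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audio_clips (session_id TEXT PRIMARY KEY, title TEXT, "
    "transcription TEXT, topics TEXT)"
)
# No create_clip has run: the upsert materialises the row anyway.
store_transcription_sketch(conn, "s1", [WordSpan("hello", 0.0, 0.4)])
# Another writer sets its own column in the meantime.
conn.execute("UPDATE audio_clips SET topics = '[]' WHERE session_id = 's1'")
conn.commit()
# Re-storing the transcript leaves `topics` untouched.
store_transcription_sketch(conn, "s1", [WordSpan("hello", 0.0, 0.4), WordSpan("world", 0.4, 0.9)])
title, transcription, topics = conn.execute(
    "SELECT title, transcription, topics FROM audio_clips"
).fetchone()
```

The identity columns stay NULL until some `create_clip`-style insert fills them; since that insert would be `INSERT OR IGNORE`, it would be a no-op against this already-existing row.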
- finalize_topics(session_id, transcript_end_s, *, events, provider, llm_model, whisper_model)
Flatten topic events, compute durations, and write them to the clip.
Within each L1/L2 level, `duration_s` is derived from the gap to the next sibling (or to `transcript_end_s` for the last node). The `provider`/`llm_model`/`whisper_model` columns record which backends produced the data. Column-scoped upsert (create-if-absent): touches only `topics` and those three backend columns, so the `transcription` written by an earlier stage or process survives.
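The duration rule can be illustrated with a small sketch. The input shape (L1 topics with optional L2 `children`, each carrying a `start_s`) and the helper names `flatten` and `add_durations` are hypothetical. It also assumes that "next sibling" means the next node at the same level across the whole transcript, not only within the same parent; the docs above do not settle that point.

```python
from typing import Any


def flatten(events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical input shape: L1 topics with optional L2 'children'."""
    flat = []
    for top in events:
        flat.append({"level": "L1", "title": top["title"], "start_s": top["start_s"]})
        for child in top.get("children", []):
            flat.append({"level": "L2", "title": child["title"], "start_s": child["start_s"]})
    return flat


def add_durations(nodes: list[dict[str, Any]], transcript_end_s: float) -> list[dict[str, Any]]:
    """Within each level, duration_s runs to the next sibling's start, or to the end."""
    for level in ("L1", "L2"):
        siblings = sorted((n for n in nodes if n["level"] == level), key=lambda n: n["start_s"])
        for cur, nxt in zip(siblings, siblings[1:] + [None]):
            end_s = nxt["start_s"] if nxt is not None else transcript_end_s
            cur["duration_s"] = end_s - cur["start_s"]  # mutates the dicts in `nodes`
    return nodes


events = [
    {"title": "Intro", "start_s": 0.0, "children": [
        {"title": "Greeting", "start_s": 0.0},
        {"title": "Agenda", "start_s": 12.5},
    ]},
    {"title": "Main", "start_s": 30.0, "children": [{"title": "Demo", "start_s": 30.0}]},
]
nodes = add_durations(flatten(events), transcript_end_s=90.0)
durations = {n["title"]: n["duration_s"] for n in nodes}
# durations == {'Intro': 30.0, 'Greeting': 12.5, 'Agenda': 17.5, 'Main': 60.0, 'Demo': 60.0}
```

The last node at each level absorbs the remainder up to `transcript_end_s`, so the durations at a level sum to the span from its first start to the transcript end.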