autorag.persistence

Topic-tree → SQLite/Chroma persistence helpers.

Pure functions extracted from the CLI so the SDK’s autorag.core.AutoRAG.persist_transcription() can reuse them.

autorag.persistence.derive_session_id(file_or_url)

Compute the same session_id AutoRAG.persist_transcription() would write.

Mirrors the inline logic in AutoRAG.persist_transcription():
  • YouTube URL → uuid5(NAMESPACE_URL, _canonical_youtube_url(url))

  • Local Path → uuid5(NAMESPACE_URL, str(path.resolve()))
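The two rules above can be sketched with the standard library's uuid.uuid5. This is a minimal sketch under stated assumptions, not the module's code. canonical_youtube_url_sketch is a hypothetical stand-in for the private _canonical_youtube_url(), and it assumes canonicalisation means reducing watch?v= and youtu.be links to one watch URL. The "starts with http" test is also an assumption; the real function decides via autorag.audio_source.

```python
from __future__ import annotations

import uuid
from pathlib import Path
from urllib.parse import parse_qs, urlparse


def canonical_youtube_url_sketch(url: str) -> str:
    """Hypothetical stand-in for _canonical_youtube_url(): keep only the video id."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        video_id = parsed.path.lstrip("/")
    else:
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    return f"https://www.youtube.com/watch?v={video_id}"


def derive_session_id_sketch(file_or_url: str | Path) -> str:
    text = str(file_or_url)
    # Assumption: any http(s) string is treated as a YouTube URL.
    if isinstance(file_or_url, str) and text.startswith(("http://", "https://")):
        key = canonical_youtube_url_sketch(text)
    else:
        # Local files hash their absolute, symlink-resolved path.
        key = str(Path(file_or_url).resolve())
    # uuid5 is deterministic: the same key always yields the same session_id.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))
```

Because uuid5 is a pure hash of namespace and name, any process can recompute the id without touching the database.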

Only autorag.audio_source is imported. It is base-safe, and yt_dlp stays behind its own lazy import, so this function is safe to call without the [audio] or [rag] extras installed.

Parameters:

file_or_url (str | Path)

Return type:

str

autorag.persistence.load_transcription(db, session_id)

Return the stored word list for session_id, or None if the row is missing or has no transcription.

Parses the JSON string written by Database.store_transcription(). Reads via raw sqlite_utils (matching Database.list_clips()) because pydantic_sqlite’s model registry is in-memory only; reading the table directly lets a freshly opened Database instance see rows it didn’t write.
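A minimal sketch of the "raw read, then parse JSON" pattern, using the standard library's sqlite3 in place of sqlite_utils. The table name clips, the columns session_id and transcription, and the word-dict shape are all hypothetical; the real schema and the WordSpan type may differ.

```python
from __future__ import annotations

import json
import sqlite3
from typing import Any


def load_transcription_sketch(
    conn: sqlite3.Connection, session_id: str
) -> list[dict[str, Any]] | None:
    # Hypothetical schema: clips(session_id PRIMARY KEY, transcription JSON text).
    # Plain SQL needs no in-memory model registry, so any process can read the row.
    row = conn.execute(
        "SELECT transcription FROM clips WHERE session_id = ?", (session_id,)
    ).fetchone()
    if row is None or not row[0]:
        return None  # row missing, or present without a transcription
    return json.loads(row[0])
```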

Parameters:

db (Database)

session_id (str)

Return type:

list[WordSpan] | None

autorag.persistence.load_clip(db, session_id)

Return the clip row as a plain dict, or None if absent.

The cross-process analogue of Database.get_clip(). That method goes through pydantic_sqlite.model_from_table, whose model registry is in-memory only, so a process that did not write the row (e.g. the API reading a row written by a worker) cannot see it. This function instead reads via raw sqlite_utils, exactly like load_transcription() and Database.list_clips().
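The cross-process case can be illustrated with the standard library's sqlite3, which here stands in for sqlite_utils. A fresh connection plays the role of "a process that did not write the row", and sqlite3.Row turns the result into a plain dict. The clips table and its columns are hypothetical.

```python
from __future__ import annotations

import sqlite3
from typing import Any


def load_clip_sketch(db_path: str, session_id: str) -> dict[str, Any] | None:
    # A brand-new connection: nothing about the writer's models is known here.
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows expose keys(), so dict(row) works
    try:
        row = conn.execute(
            "SELECT * FROM clips WHERE session_id = ?", (session_id,)
        ).fetchone()
    finally:
        conn.close()
    return dict(row) if row is not None else None
```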

Parameters:

db (Database)

session_id (str)

Return type:

dict[str, Any] | None

autorag.persistence.collapse_lone_children(tree)

Drop single-child chains so that a subtopic level exists only when it has at least two siblings.
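One plausible reading of "drop single-child chains" is sketched below. The node shape ({"title": ..., "children": [...]}) and treating TopicTree as a list of root nodes are assumptions. The sketch also assumes a lone child is removed and its own children are spliced up into the parent; the real function may instead keep the child's title or merge differently.

```python
from __future__ import annotations

from typing import Any

TopicNode = dict[str, Any]  # hypothetical shape: {"title": str, "children": [...]}


def collapse_lone_children_sketch(tree: list[TopicNode]) -> list[TopicNode]:
    """Return a copy of tree in which no node keeps exactly one child."""
    out = []
    for node in tree:
        node = dict(node)  # shallow copy so the input tree is not mutated
        # Collapse bottom-up, so each child already has 0 or >= 2 children.
        children = collapse_lone_children_sketch(node.get("children", []))
        if len(children) == 1:
            # A lone child is not a real subtopic level: splice its children up.
            children = children[0].get("children", [])
        node["children"] = children
        out.append(node)
    return out
```

Working bottom-up keeps the invariant in one pass: after a node is processed, its children list never has length one.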

Parameters:

tree (TopicTree)

Return type:

TopicTree

autorag.persistence.iter_topics_flat(tree)

Yield (level, node, number_label) tuples in tree order, where number_label is a dotted outline number such as ‘1’, ‘1.2’, or ‘1.2.3’.
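A pre-order walk that builds dotted labels can be sketched as a recursive generator. The node shape and the choice of 1 for the top level are assumptions; the real function's level numbering may start at 0.

```python
from __future__ import annotations

from typing import Any, Iterator


def iter_topics_flat_sketch(
    tree: list[dict[str, Any]], level: int = 1, prefix: str = ""
) -> Iterator[tuple[int, dict[str, Any], str]]:
    for i, node in enumerate(tree, start=1):
        # Append this node's 1-based position to its parent's label.
        label = f"{prefix}.{i}" if prefix else str(i)
        yield level, node, label  # pre-order: a parent precedes its subtopics
        yield from iter_topics_flat_sketch(node.get("children", []), level + 1, label)
```

For a tree Intro(Goals, Scope(Limits)), Wrap-up, this yields the labels 1, 1.1, 1.2, 1.2.1, 2.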

Parameters:

tree (TopicTree)

Return type:

Generator[tuple[int, TopicDict, str], None, None]

autorag.persistence.topics_to_events(db, session_id, tree, *, audio_start, provider, llm_model, topic_category_ids)

Walk the topic tree and produce analytics events for each titled node.

Reads the hierarchical agent’s s / e keys, not start_s / end_s.
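The "titled nodes only, read s / e" rule can be sketched as below. Every event field name here is hypothetical, and the sketch deliberately omits db, audio_start, and topic_category_ids because their roles are not documented in this section.

```python
from __future__ import annotations

from typing import Any, Iterator


def _walk(nodes: list[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    for node in nodes:
        yield node
        yield from _walk(node.get("children", []))


def topics_to_events_sketch(
    session_id: str, tree: list[dict[str, Any]], *, provider: str, llm_model: str
) -> list[dict[str, Any]]:
    events = []
    for node in _walk(tree):
        title = node.get("title")
        if not title:
            continue  # untitled nodes produce no event
        events.append({
            "session_id": session_id,
            "title": title,
            # Hierarchical-agent nodes store times under "s"/"e", not "start_s"/"end_s".
            "start_s": node.get("s"),
            "end_s": node.get("e"),
            "provider": provider,
            "llm_model": llm_model,
        })
    return events
```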

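Parameters:

db (Database)

session_id (str)

tree (TopicTree)

audio_start, provider, llm_model, topic_category_ids (keyword-only)

Return type: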

list[dict[str, Any]]