autorag.persistence
Topic-tree → SQLite/Chroma persistence helpers.
Pure functions extracted from the CLI so the SDK’s
autorag.core.AutoRAG.persist_transcription() can reuse them.
- autorag.persistence.derive_session_id(file_or_url)
Compute the same session_id that AutoRAG.persist_transcription() would write.
Mirrors the inline logic in AutoRAG.persist_transcription():
  - YouTube URL → uuid5(NAMESPACE_URL, _canonical_youtube_url(url))
  - Local Path → uuid5(NAMESPACE_URL, str(path.resolve()))
Only autorag.audio_source is imported (base-safe; yt_dlp stays behind its own lazy import). Safe to call without [audio]/[rag].
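The two derivation rules can be illustrated with a minimal sketch using only the standard library. The names derive_session_id_sketch and canonical_youtube_url_sketch are hypothetical; in particular, the canonicalizer below is only a guess at what the private _canonical_youtube_url helper might do (reduce a URL to its video ID), and returning the UUID as a string is likewise an assumption.

```python
import uuid
from pathlib import Path
from urllib.parse import parse_qs, urlparse


def canonical_youtube_url_sketch(url: str) -> str:
    """Hypothetical stand-in for the private _canonical_youtube_url helper.

    Assumption: canonicalization keeps only the video ID.
    """
    parsed = urlparse(url)
    if parsed.netloc.lower().endswith("youtu.be"):
        video_id = parsed.path.lstrip("/")
    else:
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    return f"https://www.youtube.com/watch?v={video_id}"


def derive_session_id_sketch(file_or_url: str) -> str:
    """Deterministic uuid5 over a canonical URL or a resolved local path."""
    if file_or_url.startswith(("http://", "https://")):
        name = canonical_youtube_url_sketch(file_or_url)
    else:
        # resolve() makes the path absolute, so "a.wav" and "./a.wav" agree.
        name = str(Path(file_or_url).resolve())
    return str(uuid.uuid5(uuid.NAMESPACE_URL, name))
```

Because uuid5 is a pure hash of namespace and name, the same input always yields the same ID, which is what lets a caller predict the session_id before anything is persisted.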
- autorag.persistence.load_transcription(db, session_id)
Return the stored word list for session_id, or None if the row is missing or has no transcription.
Parses the JSON string written by Database.store_transcription(). Reads via raw sqlite_utils (matching Database.list_clips()) so a freshly-opened Database instance can read rows it didn't write, since pydantic_sqlite's model registry is in-memory only.
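The read path described above (fetch the raw row, return None when there is nothing to parse, otherwise decode the JSON) might look roughly like the following sketch. It uses the standard-library sqlite3 module instead of sqlite_utils so that it runs on its own; the table name clips, the column names, and the shape of each word entry are assumptions, not the library's actual schema.

```python
import json
import sqlite3


def load_transcription_sketch(conn: sqlite3.Connection, session_id: str):
    """Return the decoded word list, or None if the row or value is missing."""
    row = conn.execute(
        "SELECT transcription FROM clips WHERE session_id = ?",  # assumed schema
        (session_id,),
    ).fetchone()
    if row is None or not row[0]:
        return None
    return json.loads(row[0])


# Demo data: one row with a transcription, one row without.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clips (session_id TEXT PRIMARY KEY, transcription TEXT)")
words = [{"word": "hello", "start": 0.0, "end": 0.4}]  # assumed word shape
conn.execute("INSERT INTO clips VALUES (?, ?)", ("s1", json.dumps(words)))
conn.execute("INSERT INTO clips VALUES (?, ?)", ("s2", None))
```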
- autorag.persistence.load_clip(db, session_id)
Return the clip row as a plain dict, or None if absent.
The cross-process analogue of Database.get_clip(): that method goes through pydantic_sqlite.model_from_table, whose model registry is in-memory only, so a process that did not write the row (e.g. the API reading a row a worker wrote) cannot see it. This reads via raw sqlite_utils exactly like load_transcription() / Database.list_clips().
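To make the cross-process point concrete, the sketch below writes a row through one connection, closes it, and then reads the row back as a plain dict through a second, unrelated connection. It again substitutes the standard-library sqlite3 module for sqlite_utils, takes a file path rather than a Database instance, and uses an assumed clips table; all names here are hypothetical.

```python
import os
import sqlite3
import tempfile


def load_clip_sketch(db_path: str, session_id: str):
    """Open a fresh connection and return the row as a dict, or None."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows expose column names via keys()
    try:
        row = conn.execute(
            "SELECT * FROM clips WHERE session_id = ?",  # assumed schema
            (session_id,),
        ).fetchone()
        return dict(row) if row is not None else None
    finally:
        conn.close()


# "Worker" side: write a row, commit, and close the connection.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE clips (session_id TEXT PRIMARY KEY, title TEXT)")
writer.execute("INSERT INTO clips VALUES (?, ?)", ("s1", "Intro"))
writer.commit()
writer.close()

# "API" side: a reader that never saw the write still finds the row.
clip = load_clip_sketch(db_path, "s1")
```

Reading straight from the table needs no state shared with the writer, which is the property an in-memory model registry cannot provide.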
- autorag.persistence.collapse_lone_children(tree)
Drop single-child chains so a subtopic level only exists with >=2 siblings.
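One possible reading of this rule is sketched below. It assumes the tree is a list of dict nodes with a children list, and that a lone child is removed while its own children are promoted to the parent; neither the node shape nor the promotion behavior is confirmed by the description above. The sketch also leaves the top-level list untouched even if it holds a single root.

```python
def collapse_lone_children_sketch(nodes):
    """Return a new tree in which no node has exactly one child."""
    result = []
    for node in nodes:
        children = node.get("children", [])
        # Walk down the chain while the level consists of a single child,
        # promoting that child's children in its place.
        while len(children) == 1:
            children = children[0].get("children", [])
        result.append({**node, "children": collapse_lone_children_sketch(children)})
    return result


demo_tree = [
    {"title": "A", "children": [
        {"title": "A1", "children": [
            {"title": "A1a", "children": []},
            {"title": "A1b", "children": []},
        ]},
    ]},
    {"title": "B", "children": [{"title": "B1", "children": []}]},
]
collapsed = collapse_lone_children_sketch(demo_tree)
```

In this demo, A1 is the only child of A, so it is dropped and A1a and A1b become direct children of A; B1 is a lone leaf, so B ends up with no children.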
- autorag.persistence.iter_topics_flat(tree)
Yield (level, node, number_label) tuples, where number_label is a dotted outline number like '1', '1.2', '1.2.3'.
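A depth-first traversal that produces such dotted labels could be sketched as follows. The node shape (dicts with a children list) and the 1-based level are assumptions, and iter_topics_flat_sketch is a hypothetical name rather than the library's implementation.

```python
def iter_topics_flat_sketch(nodes, prefix="", level=1):
    """Yield (level, node, number_label) in depth-first, pre-order sequence."""
    for index, node in enumerate(nodes, start=1):
        label = f"{prefix}.{index}" if prefix else str(index)
        yield level, node, label
        # Children inherit the parent's label as their prefix.
        yield from iter_topics_flat_sketch(node.get("children", []), label, level + 1)


outline = [
    {"title": "A", "children": [
        {"title": "A1", "children": []},
        {"title": "A2", "children": [{"title": "A2a", "children": []}]},
    ]},
    {"title": "B", "children": []},
]
flat = [(level, node["title"], label) for level, node, label in iter_topics_flat_sketch(outline)]
```

Under the 1-based assumption, the level always equals the number of components in the label.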