autorag.chroma_store

Persistent Chroma collection of per-clip topic embeddings.

Backs the /viz page’s search box and caches topic embeddings, so a page load does not have to re-embed every topic on every request.

class autorag.chroma_store.EmbedderEmbeddingFunction(embedder=None)

Bases: EmbeddingFunction[Documents]

Adapt Embedder to Chroma’s EmbeddingFunction protocol.

Parameters:

embedder (Embedder | None)

static name()

Return type:

str
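
The adapter idea can be sketched without importing Chroma. This is a minimal illustration, not the class's actual implementation: the embed() method on Embedder, the ToyEmbedder class, and the "sketch-embedder" name are all assumptions made for the example. The only Chroma-specific detail relied on is that an embedding function is a callable that receives the documents under the parameter name input and returns one vector per document.

```python
from typing import Protocol, Sequence


class Embedder(Protocol):
    """Hypothetical stand-in for autorag's Embedder; the real interface may differ."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class ToyEmbedder:
    """Deterministic toy embedder used only for this illustration."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        # Two crude features per text: character count and word count.
        return [[float(len(t)), float(len(t.split()))] for t in texts]


class SketchEmbeddingFunction:
    """Callable shaped like a Chroma embedding function: documents in, vectors out."""

    def __init__(self, embedder: Embedder) -> None:
        self._embedder = embedder

    def __call__(self, input: Sequence[str]) -> list[list[float]]:
        # Chroma passes the documents under the parameter name `input`.
        return self._embedder.embed(list(input))

    @staticmethod
    def name() -> str:
        # Placeholder identifier; the value returned by the real class is not shown here.
        return "sketch-embedder"


fn = SketchEmbeddingFunction(ToyEmbedder())
vectors = fn(["hello world", "chroma"])
```

Wrapping the embedder this way lets the same object serve both direct embedding calls and Chroma's collection API.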

autorag.chroma_store.default_chroma_dir(db_path)

Return the Chroma persistence directory derived from a SQLite db path.

Parameters:

db_path (Path)

Return type:

Path
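
The exact derivation rule is not stated here. As a purely hypothetical illustration of deriving a sibling directory from a database path, the sketch below appends a ".chroma" suffix to the file stem; both the function name sketch_chroma_dir and the naming rule are assumptions and may not match what default_chroma_dir() does.

```python
from pathlib import Path


def sketch_chroma_dir(db_path: Path) -> Path:
    """Hypothetical rule: place '<db stem>.chroma' next to the SQLite file."""
    # with_name() keeps the parent directory and swaps only the final component.
    return db_path.with_name(db_path.stem + ".chroma")


example = sketch_chroma_dir(Path("data/autorag.db"))
```

Keeping the directory beside the database file means that moving or deleting one is easy to pair with the other.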

class autorag.chroma_store.ChromaStore(persist_dir, embedding_function=None)

Bases: object

Persistent Chroma collection of per-clip topic embeddings.

Parameters:
  • persist_dir (Path)

  • embedding_function (EmbeddingFunction[Documents] | None)

COLLECTION = 'audio_clip_topics'

add_topic_embeddings(clip_id, clip_title, topics, embeddings)

Upsert one document + embedding per topic for clip_id.

IDs have the form "{clip_id}:{topic_index}", where topic_index is the topic’s position within the clip’s filtered (title-bearing) topic list. That position is the stable key, and it matches what autorag.viz._collect_rows_embeddings() reads back.

Parameters:
  • clip_id (str)

  • clip_title

  • topics

  • embeddings

Return type:

None
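
The ID scheme can be illustrated in isolation. The sketch below is not the method's implementation: the helper name build_topic_ids, the dict shape of a topic, and the rule that a topic is "title-bearing" when its "title" value is non-empty are assumptions made for the example.

```python
def build_topic_ids(clip_id: str, topics: list[dict]) -> list[str]:
    """Build one "{clip_id}:{topic_index}" id per title-bearing topic."""
    # Assumption: a topic is "title-bearing" when its "title" is a non-empty string.
    titled = [t for t in topics if t.get("title")]
    # The index counts positions in the filtered list, not in the raw list.
    return [f"{clip_id}:{i}" for i, _ in enumerate(titled)]


topics = [
    {"title": "Intro", "summary": "Opening remarks"},
    {"title": "", "summary": "Untitled filler"},
    {"title": "Pricing", "summary": "Cost discussion"},
]
ids = build_topic_ids("clip42", topics)
```

Here "Pricing" receives index 1 rather than 2, because the untitled topic is dropped before indexing. A reader must apply the same filter to recover the same keys.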

get_clip_embeddings(clip_id)

Return a topic_index -> embedding map for every cached topic of clip_id.

Parameters:

clip_id (str)

Return type:

dict[int, list[float]]
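
One way such a map could be rebuilt from stored rows is to parse the index back out of each "{clip_id}:{topic_index}" ID. This is a sketch under that assumption; the helper name embeddings_by_topic_index and the parallel-list input shape are hypothetical, not the method's actual code.

```python
def embeddings_by_topic_index(
    ids: list[str], embeddings: list[list[float]]
) -> dict[int, list[float]]:
    """Map the topic_index parsed from each id to the embedding at the same position."""
    result: dict[int, list[float]] = {}
    for row_id, vector in zip(ids, embeddings):
        # rpartition splits on the last colon, so a clip_id containing ":" still parses.
        _, _, index_text = row_id.rpartition(":")
        result[int(index_text)] = list(vector)
    return result


mapping = embeddings_by_topic_index(
    ["clip42:0", "clip42:1"],
    [[0.1, 0.2], [0.3, 0.4]],
)
```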

query(query_embedding, top_k)

Return the top_k topics nearest query_embedding in cosine space.

Each returned dict carries the topic’s clip/title/summary metadata and a similarity field computed as 1 - distance.

Parameters:
  • query_embedding

  • top_k

Return type:

list[dict[str, Any]]
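
The relationship between cosine distance and the similarity field can be shown with a small pure-Python sketch. It does not call Chroma; the helper names and the row shape are assumptions for illustration, and the real method delegates the nearest-neighbour search to the collection.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k_by_similarity(
    query: list[float], rows: list[dict], top_k: int
) -> list[dict]:
    """Score every row, then keep the top_k rows with the highest similarity."""
    scored = []
    for row in rows:
        distance = cosine_distance(query, row["embedding"])
        # similarity = 1 - distance, as described for the returned dicts.
        scored.append({"title": row["title"], "similarity": 1.0 - distance})
    scored.sort(key=lambda r: r["similarity"], reverse=True)
    return scored[:top_k]


rows = [
    {"title": "same direction", "embedding": [2.0, 0.0]},
    {"title": "orthogonal", "embedding": [0.0, 3.0]},
    {"title": "opposite", "embedding": [-1.0, 0.0]},
]
hits = top_k_by_similarity([1.0, 0.0], rows, top_k=2)
```

Because cosine distance ranges from 0 to 2, the similarity ranges from 1 (same direction) down to -1 (opposite direction); vector length does not affect it.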

delete_clip(clip_id)

Drop every topic row for clip_id from the collection.

Parameters:

clip_id (str)

Return type:

None

count()

Return the total number of topic rows in the collection.

Return type:

int