autorag.topic_cluster

Semantic clustering and similarity-edge construction for topic embeddings.

autorag.topic_cluster.cluster_embeddings(embeddings, distance_threshold=0.35)

Assign cluster labels to topic embeddings using agglomerative clustering.

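distance_threshold is a cosine distance (range 0 to 2, where distance = 1 - cosine similarity); the default of 0.35 corresponds roughly to a similarity of at least 0.65. Returns an integer array of shape (N,) with labels 0..K-1, where K is the number of clusters found.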

Parameters:
  • embeddings (ndarray)

  • distance_threshold (float)

Return type:

ndarray
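
The documented behaviour can be illustrated with a minimal sketch. This is not the module's actual implementation: the name cluster_embeddings_sketch is hypothetical, the use of SciPy is an assumption, and so is average linkage, since the reference above does not state which linkage is used.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_embeddings_sketch(embeddings, distance_threshold=0.35):
    """Hypothetical stand-in for cluster_embeddings (assumes average linkage)."""
    embeddings = np.asarray(embeddings, dtype=float)
    n = embeddings.shape[0]
    if n < 2:
        # Hierarchical linkage needs at least two points.
        return np.zeros(n, dtype=int)
    # Build the merge tree using cosine distance (1 - cosine similarity).
    tree = linkage(embeddings, method="average", metric="cosine")
    # Cut the tree so that no cluster exceeds the distance threshold.
    labels = fcluster(tree, t=distance_threshold, criterion="distance")
    # fcluster numbers clusters from 1; shift to 0..K-1.
    return labels.astype(int) - 1


# Two tight groups of vectors pointing in nearly orthogonal directions.
points = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]])
labels = cluster_embeddings_sketch(points)
```

With the default threshold, the first two vectors should share one label and the last two another, because the within-group cosine distances are far below 0.35 and the between-group distances are far above it.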

autorag.topic_cluster.build_edges(embeddings, top_n=5, min_similarity=0.6)

Return undirected similarity edges between topics.

For each topic, finds the top_n most similar neighbours with similarity above min_similarity. Returns a deduplicated list of (idx_a, idx_b, similarity) tuples with idx_a < idx_b, so each pair appears once.

Parameters:
  • embeddings (ndarray)

  • top_n (int)

  • min_similarity (float)

Return type:

list[tuple[int, int, float]]
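
The edge construction described above might look like the following NumPy sketch. It is an illustration under assumptions, not the module's code: build_edges_sketch is a hypothetical name, similarity is assumed to be cosine similarity, and "above" is read as a strict comparison.

```python
import numpy as np


def build_edges_sketch(embeddings, top_n=5, min_similarity=0.6):
    """Hypothetical stand-in for build_edges (assumes cosine similarity)."""
    x = np.asarray(embeddings, dtype=float)
    # Normalise rows so that the dot product equals cosine similarity.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.clip(norms, 1e-12, None)
    sim = x @ x.T
    # Exclude self-matches from the neighbour search.
    np.fill_diagonal(sim, -np.inf)

    edges = {}
    for i in range(sim.shape[0]):
        # Indices of the top_n most similar topics, best first.
        neighbours = np.argsort(sim[i])[::-1][:top_n]
        for j in neighbours:
            s = float(sim[i, j])
            if s > min_similarity:
                # Order the pair so (a, b) and (b, a) collapse to one key.
                key = (min(i, int(j)), max(i, int(j)))
                edges[key] = s
    return [(a, b, s) for (a, b), s in sorted(edges.items())]


points = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]])
edges = build_edges_sketch(points)
```

Keying a dictionary on the ordered pair is one simple way to deduplicate: an edge found from either endpoint is stored once, with idx_a < idx_b.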