autorag.ingest
Document loading and chunking primitives for the RAG pipeline.
These functions form the boundary between filesystem inputs (text
files, PDFs, audio clips) and the structured Document / Chunk
shapes consumed by the embedder and vector store.
The current implementations are stubs that raise
NotImplementedError; concrete loaders are wired up via
autorag.core.AutoRAG.
-
autorag.ingest.load_documents(paths)[source]
Load text documents from disk into Document records.
- Parameters:
paths (list[str | Path])
- Return type:
list[Document]
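Since the shipped function is a stub, the following is only a minimal sketch of what a plain-text loader with this signature could look like. The Document dataclass here is a hypothetical stand-in (the fields doc_id, text, and metadata are assumptions, not the library's actual schema), and load_documents_sketch is an illustrative name rather than part of autorag.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class Document:
    """Hypothetical stand-in for autorag's Document; the real fields may differ."""

    doc_id: str
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


def load_documents_sketch(paths: list[str | Path]) -> list[Document]:
    """Read each path as UTF-8 text and wrap it in a Document record."""
    docs: list[Document] = []
    for raw in paths:
        path = Path(raw)
        # Assumption: inputs are plain UTF-8 text files; PDFs would need a parser.
        text = path.read_text(encoding="utf-8")
        docs.append(
            Document(doc_id=path.stem, text=text, metadata={"source": str(path)})
        )
    return docs
```

Keeping the source path in metadata is one way to let later stages trace a chunk back to its file; the real Document shape may carry this differently.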
-
autorag.ingest.load_audio_clips(paths)[source]
Load audio clip metadata for transcript-based ingestion.
- Parameters:
paths (list[str | Path])
- Return type:
list[dict[str, Any]]
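The return type suggests one metadata dictionary per clip. As a rough illustration only, the sketch below reads WAV headers with the standard-library wave module. The dictionary keys (path, sample_rate, channels, duration_s, transcript) and the name load_audio_clips_sketch are assumptions made for this example; the source does not specify which keys the real function returns or which audio formats it supports.

```python
from __future__ import annotations

import wave
from pathlib import Path
from typing import Any


def load_audio_clips_sketch(paths: list[str | Path]) -> list[dict[str, Any]]:
    """Collect basic metadata for WAV clips without decoding the audio."""
    clips: list[dict[str, Any]] = []
    for raw in paths:
        path = Path(raw)
        # Assumption: inputs are uncompressed WAV files readable by `wave`.
        with wave.open(str(path), "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            channels = wav.getnchannels()
        clips.append(
            {
                "path": str(path),
                "sample_rate": rate,
                "channels": channels,
                "duration_s": frames / rate if rate else 0.0,
                # Hypothetical slot for a later transcription step to fill in.
                "transcript": None,
            }
        )
    return clips
```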
-
autorag.ingest.chunk_document(doc, chunk_size, chunk_overlap)[source]
Split a document into overlapping Chunk records.
- Parameters:
doc, chunk_size, chunk_overlap
- Return type:
list[Chunk]
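Overlapping chunking is commonly done with a sliding window whose step is chunk_size minus chunk_overlap. The sketch below shows that approach on raw text, measured in characters. It is not the library's implementation: the Chunk dataclass, its fields, the character-based unit, and the name chunk_text_sketch are all assumptions for illustration, and the real function takes a Document rather than a string.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for autorag's Chunk; the real fields may differ."""

    doc_id: str
    index: int
    text: str
    start: int  # character offset of the chunk within the source text


def chunk_text_sketch(
    doc_id: str, text: str, chunk_size: int, chunk_overlap: int
) -> list[Chunk]:
    """Split text into character windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - chunk_overlap
    chunks: list[Chunk] = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append(Chunk(doc_id, index, text[start : start + chunk_size], start))
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, chunk_text_sketch("d", "abcdefghij", 4, 2) yields chunks with the texts "abcd", "cdef", "efgh", and "ghij". Requiring the overlap to be smaller than the chunk size keeps the step positive, so the loop always advances.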