autorag.ingest

Document loading and chunking primitives for the RAG pipeline.

These functions form the boundary between filesystem inputs (text files, PDFs, audio clips) and the structured Document / Chunk shapes consumed by the embedder and vector store.

The current implementations are stubs that raise NotImplementedError; concrete loaders are wired up via autorag.core.AutoRAG.

autorag.ingest.load_documents(paths)

Load text documents from disk into Document records.

Parameters:

paths (list[str | Path])

Return type:

list[Document]
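Since the shipped function is a stub, the following is only a minimal sketch of what a concrete text loader could look like. The Document dataclass here is a hypothetical stand-in (its fields doc_id, text, and metadata are assumptions, not the library's actual record shape), and load_documents_sketch is an illustrative name rather than part of autorag.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class Document:
    """Hypothetical stand-in for the library's Document record."""
    doc_id: str
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


def load_documents_sketch(paths: list[str | Path]) -> list[Document]:
    """Read each path as UTF-8 text and wrap it in a Document."""
    docs: list[Document] = []
    for raw in paths:
        path = Path(raw)
        # Assumption: inputs are plain UTF-8 text files.
        text = path.read_text(encoding="utf-8")
        docs.append(
            Document(
                doc_id=path.stem,  # file name without its extension
                text=text,
                metadata={"source": str(path)},
            )
        )
    return docs
```

Accepting both str and Path and normalizing with Path(raw) mirrors the documented parameter type list[str | Path].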

autorag.ingest.load_audio_clips(paths)

Load audio clip metadata for transcript-based ingestion.

Parameters:

paths (list[str | Path])

Return type:

list[dict[str, Any]]
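The documented return type is a list of plain dictionaries, but the keys are not specified. As a rough illustration only, the sketch below collects basic metadata from WAV files using the standard-library wave module. The key names (path, sample_rate, channels, duration_s), the WAV-only restriction, and the function name load_audio_clips_sketch are all assumptions made for this example.

```python
from __future__ import annotations

import wave
from pathlib import Path
from typing import Any


def load_audio_clips_sketch(paths: list[str | Path]) -> list[dict[str, Any]]:
    """Collect per-clip metadata; the audio samples themselves are not loaded."""
    clips: list[dict[str, Any]] = []
    for raw in paths:
        path = Path(raw)
        # Assumption: clips are uncompressed WAV files readable by `wave`.
        with wave.open(str(path), "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            clips.append(
                {
                    "path": str(path),
                    "sample_rate": rate,
                    "channels": wav.getnchannels(),
                    "duration_s": frames / rate,  # frame count / frames per second
                }
            )
    return clips
```

A downstream transcription step could then use the path and duration_s entries to decide how to batch clips.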

autorag.ingest.chunk_document(doc, chunk_size, chunk_overlap)

Split a document into overlapping Chunk records.

Parameters:

doc

chunk_size

chunk_overlap

Return type:

list[Chunk]
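One common way to produce overlapping chunks is a fixed-size sliding window, sketched below. This is not the library's implementation (the shipped function is a stub). It assumes chunk_size and chunk_overlap are measured in characters, and the Document and Chunk dataclasses, their fields, and the name chunk_document_sketch are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical stand-in for the library's Document record."""
    doc_id: str
    text: str


@dataclass
class Chunk:
    """Hypothetical stand-in for the library's Chunk record."""
    doc_id: str
    index: int  # position of the chunk within its document
    start: int  # character offset of the chunk in doc.text
    text: str


def chunk_document_sketch(
    doc: Document, chunk_size: int, chunk_overlap: int
) -> list[Chunk]:
    """Split doc.text into windows of chunk_size characters that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[Chunk] = []
    for index, start in enumerate(range(0, len(doc.text), step)):
        chunks.append(
            Chunk(doc.doc_id, index, start, doc.text[start:start + chunk_size])
        )
        # Stop once a window reaches the end, so no trailing chunk is
        # emitted that only repeats already-covered text.
        if start + chunk_size >= len(doc.text):
            break
    return chunks


doc = Document(doc_id="demo", text="abcdefghij")
print([c.text for c in chunk_document_sketch(doc, chunk_size=4, chunk_overlap=1)])
# prints ['abcd', 'defg', 'ghij']
```

Requiring chunk_overlap to be strictly smaller than chunk_size keeps the step positive, which guarantees the loop terminates.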