basic_rag.basic_rag
Basic RAG database
Classes
| RAGDatabase | Simple RAG database |
| TextChunk | A text chunk with metadata |
- class basic_rag.basic_rag.TextChunk(text_chunk: str, file_path: str, start_line: int, end_line: int, file_hash: bytes | None = None)
A text chunk with metadata: the source file path, the start and end lines, and an optional file hash
- class basic_rag.basic_rag.RAGDatabase(db_path: Path, index_path: Path, rate_limit: RateLimiter | float = 1.1, model='mistral-embed', max_n_tokens=16384)
Simple RAG database
implemented as:
- a SQLite database with a single table with columns
id, text_chunk, embedding, file_path, start_line, end_line, file_sha
- a FAISS index
I ended up writing this myself because I could not find a RAG database that was both simple enough (no server needed, no huge framework) and flexible enough.
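The storage layout described above can be sketched with the standard library alone. This is a minimal illustration, not the module's actual implementation: the table name `chunks`, the column types, the float32 BLOB encoding, and the helpers `pack`, `unpack`, `insert_chunk`, and `nearest` are all assumptions. The FAISS index is replaced here by a brute-force search in plain Python so that the example has no third-party dependencies.

```python
import sqlite3
from array import array

# Hypothetical schema mirroring the documented columns; the real table name
# and column types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    text_chunk TEXT NOT NULL,
    embedding BLOB NOT NULL,
    file_path TEXT NOT NULL,
    start_line INTEGER NOT NULL,
    end_line INTEGER NOT NULL,
    file_sha BLOB
)
"""


def pack(vec):
    """Serialize a sequence of floats as float32 bytes for the BLOB column."""
    return array("f", vec).tobytes()


def unpack(blob):
    """Inverse of pack(): decode float32 bytes back into a list of floats."""
    out = array("f")
    out.frombytes(blob)
    return list(out)


def insert_chunk(conn, text, embedding, file_path, start_line, end_line, file_sha=None):
    """Insert one row, commit, and return the new row id."""
    cur = conn.execute(
        "INSERT INTO chunks (text_chunk, embedding, file_path, start_line, end_line, file_sha) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (text, pack(embedding), file_path, start_line, end_line, file_sha),
    )
    conn.commit()
    return cur.lastrowid


def nearest(conn, query, k=1):
    """Brute-force stand-in for the index: ids of the k rows with the smallest squared L2 distance."""
    rows = conn.execute("SELECT id, embedding FROM chunks").fetchall()

    def dist(blob):
        return sum((a - b) ** 2 for a, b in zip(unpack(blob), query))

    ranked = sorted(rows, key=lambda row: dist(row[1]))
    return [row_id for row_id, _ in ranked[:k]]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
first = insert_chunk(conn, "def f(): pass", [1.0, 0.0], "a.py", 1, 1)
second = insert_chunk(conn, "x = 2", [0.0, 1.0], "b.py", 1, 1)
```

Keeping the embedding in the same row as the text means the vector index can, in principle, be rebuilt from the database alone; whether the module does so is not stated here.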
- insert_db(chunk: TextChunk, *, id=None, embedding, do_commit=True, add_to_index=False)
Insert a text chunk into the SQLite database and, when add_to_index is set, into the index
- static get_chunks(file, *, chunk_size=25, overlap=5, filename, hash=None)
Cut a file into chunks
- Parameters:
file – a Path or bytes object
chunk_size – the size of the chunks
overlap – the overlap between the chunks
filename – the filename
hash – the hash of the file (optional)
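As an illustration of how chunk_size and overlap interact, here is a minimal sketch of overlapping chunking. It assumes that both parameters are counted in lines (suggested by the start_line and end_line fields of TextChunk, but not confirmed here) and that line numbers are 1-based and inclusive. The function name `chunk_lines` is hypothetical and not part of the module.

```python
def chunk_lines(text, chunk_size=25, overlap=5):
    """Split text into windows of chunk_size lines sharing overlap lines with the previous window.

    Returns (start_line, end_line, chunk_text) tuples with 1-based, inclusive
    line numbers. Counting in lines is an assumption of this sketch.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    lines = text.splitlines()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(lines), step):
        window = lines[start:start + chunk_size]
        chunks.append((start + 1, start + len(window), "\n".join(window)))
        if start + chunk_size >= len(lines):
            break  # this window already reaches the end of the file
    return chunks


text = "\n".join(f"line {i}" for i in range(1, 61))  # a 60-line file
spans = [(start, end) for start, end, _ in chunk_lines(text)]
# spans == [(1, 25), (21, 45), (41, 60)]
```

With the defaults, each window advances by 20 lines, so consecutive chunks share 5 lines and a passage that straddles a chunk boundary still appears whole in at least one chunk.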
- classmethod get_all_chunks(files: Sequence[Path | bytes], *, chunk_size=25, overlap=5, file_paths: Sequence[str] | None = None, file_shas_to_skip=None)
Cut a list of files into chunks
- Parameters:
files – the files
chunk_size – the size of the chunks
overlap – the overlap between the chunks
file_paths – the filenames (optional; if not provided and the files are Path objects, the paths are used as the filenames)
file_shas_to_skip – the file hashes to skip
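The role of file_shas_to_skip can be sketched as a filter applied before chunking: files whose content hash is already known are dropped, so unchanged files are not chunked or embedded again. The sketch below is an illustration under assumptions. The helper name `select_new_files` is hypothetical, and SHA-256 is assumed as the hash function, since the hash algorithm is not specified here.

```python
import hashlib
from pathlib import Path


def select_new_files(files, file_paths=None, file_shas_to_skip=()):
    """Return (name, digest, data) for every input whose digest is not in file_shas_to_skip."""
    skip = set(file_shas_to_skip)
    if file_paths is None:
        # Fall back to the paths themselves; raw bytes carry no name.
        if not all(isinstance(f, Path) for f in files):
            raise ValueError("file_paths is required when files contains bytes")
        file_paths = [str(f) for f in files]
    selected = []
    for f, name in zip(files, file_paths):
        data = f.read_bytes() if isinstance(f, Path) else f
        digest = hashlib.sha256(data).digest()  # SHA-256 is an assumption of this sketch
        if digest not in skip:
            selected.append((name, digest, data))
    return selected


old = b"print('already indexed')\n"
new = b"print('fresh')\n"
already_indexed = {hashlib.sha256(old).digest()}
kept = [name for name, _, _ in select_new_files([old, new], ["old.py", "new.py"], already_indexed)]
```

Hashing the content rather than comparing paths means a renamed but unchanged file is still recognized as already indexed.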