basic_rag
A simple RAG database
A very simple RAG database that combines a FAISS index and a SQLite database to store chunks of text with their embeddings. Embeddings are computed using the Mistral API.
It is intended for storing a relatively small number of chunks in an on-disk store.
- class basic_rag.RAGDatabase(db_path: Path, index_path: Path, rate_limit: RateLimiter | float = 1.1, model='mistral-embed', max_n_tokens=16384)[source]
Simple RAG database, implemented as:
- a SQLite database with a single table with columns id, text_chunk, embedding, file_path, start_line, end_line, file_sha
- a FAISS index
I ended up writing this from scratch because I could not find a RAG database that was both simple enough (no server needed, no huge framework) and flexible enough.
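To make the storage layout concrete, here is a minimal sketch of the single-table design using only the standard library. The column names come from the description above; the table name `chunks`, the column types, and the float32 BLOB encoding of the embedding are assumptions made for illustration, not the library's actual schema. The FAISS side is omitted.

```python
import sqlite3
import struct

# Hypothetical schema: column names follow the description above,
# but the table name and the column types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    text_chunk TEXT NOT NULL,
    embedding BLOB,
    file_path TEXT,
    start_line INTEGER,
    end_line INTEGER,
    file_sha TEXT
)
"""


def pack_embedding(vector):
    """Serialize a sequence of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob):
    """Inverse of pack_embedding (4 bytes per float32 value)."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


conn = sqlite3.connect(":memory:")  # a file path would give an on-disk store
conn.execute(SCHEMA)
conn.execute(
    "INSERT INTO chunks "
    "(text_chunk, embedding, file_path, start_line, end_line, file_sha) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("def f(): pass", pack_embedding([0.5, -1.0, 2.0]), "example.py", 1, 1, "abc123"),
)
conn.commit()
row = conn.execute("SELECT id, embedding FROM chunks").fetchone()
```

In a design like this, the integer `id` is a natural key to share between the SQLite row and the corresponding vector in the index, although whether the library does so is not stated here.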
- insert_db(chunk: TextChunk, *, id=None, embedding, do_commit=True, add_to_index=False)[source]
Insert a text chunk into the SQLite database and the index.
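The do_commit flag suggests that commits can be deferred when many chunks are inserted in a row. The sketch below illustrates that pattern with plain sqlite3; the helper `insert_chunk` and the reduced two-column table are hypothetical and do not reproduce the actual method, which also handles the embedding and the index.

```python
import sqlite3


def insert_chunk(conn, text, *, do_commit=True):
    """Insert one row and return its id; commit only when asked to."""
    cur = conn.execute("INSERT INTO chunks (text_chunk) VALUES (?)", (text,))
    if do_commit:
        conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
# Reduced, hypothetical table: just enough columns for the example.
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text_chunk TEXT)")

# Batch insert: skip the per-row commit, then commit once at the end.
ids = [insert_chunk(conn, text, do_commit=False) for text in ["a", "b", "c"]]
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
```

Committing once per batch avoids paying the cost of a transaction for every row, which matters mostly for on-disk databases.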
- static get_chunks(file, *, chunk_size=25, overlap=5, filename, hash=None)[source]
Cut a file into chunks
- Parameters:
file – a Path or bytes object
chunk_size – the size of the chunks
overlap – the overlap between the chunks
filename – the filename
hash – the hash of the file (optional)
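As an illustration of overlapping chunks, the following sketch splits text into windows of lines. It assumes that chunk_size and overlap are measured in lines, which the start_line and end_line columns suggest but the documentation does not state; the function `chunk_lines` is hypothetical and is not the library's implementation.

```python
def chunk_lines(text, *, chunk_size=25, overlap=5):
    """Split text into overlapping windows of lines.

    Returns a list of (start_line, end_line, chunk_text) tuples with
    1-based, inclusive line numbers.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    lines = text.splitlines()
    step = chunk_size - overlap  # each window starts this many lines later
    chunks = []
    for start in range(0, len(lines), step):
        window = lines[start:start + chunk_size]
        chunks.append((start + 1, start + len(window), "\n".join(window)))
        if start + chunk_size >= len(lines):
            break  # this window already reaches the end of the text
    return chunks


text = "\n".join(f"line {i}" for i in range(1, 61))
chunks = chunk_lines(text)
spans = [(start, end) for start, end, _ in chunks]
```

With the default values, consecutive chunks share five lines, so a passage that straddles a chunk boundary still appears whole in at least one chunk as long as it is no longer than the overlap plus one line.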
- classmethod get_all_chunks(files: Sequence[Path | bytes], *, chunk_size=25, overlap=5, file_paths: Sequence[str] | None = None, file_shas_to_skip=None)[source]
Cut a list of files into chunks
- Parameters:
files – the files
chunk_size – the size of the chunks
overlap – the overlap between the chunks
file_paths – the filenames (optional; if not provided and the files are Path objects, the paths are used as filenames)
file_shas_to_skip – the file hashes to skip
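The file_shas_to_skip parameter makes it possible to avoid re-chunking files that are already stored. Below is a minimal sketch of that filtering step; the hash algorithm (SHA-256 here) and the helper names `file_sha` and `files_to_process` are assumptions, since the documentation does not say how the hash is computed.

```python
import hashlib


def file_sha(data: bytes) -> str:
    """Hex digest of the file content (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def files_to_process(files, shas_to_skip=None):
    """Return (name, sha) pairs for files whose hash is not in shas_to_skip."""
    skip = set(shas_to_skip or ())
    kept = []
    for name, data in files:
        sha = file_sha(data)
        if sha not in skip:
            kept.append((name, sha))
    return kept


files = [("a.txt", b"alpha"), ("b.txt", b"beta")]
already_indexed = {file_sha(b"alpha")}  # e.g. hashes read back from the database
pending = files_to_process(files, already_indexed)
```

Hashing the content rather than comparing paths means a renamed but unchanged file is still skipped, while an edited file under the same name is processed again.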
Modules
Basic RAG database