Standard RAG pipelines treat documents as flat strings of text. They use "fixed-size chunking" (cutting a document every 500 ...
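For illustration, here is a minimal sketch of the fixed-size chunking described above. The text does not say what unit the 500 is measured in, so this sketch assumes characters; a real pipeline may count tokens instead. The function name `fixed_size_chunks` and the sample text are hypothetical and not taken from the project.

```python
def fixed_size_chunks(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size characters.

    Assumption: the chunk size is measured in characters. The cut points
    depend only on position, so a chunk can end mid-sentence or mid-word.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# Hypothetical sample document made of repeated short sentences.
sample = "Section 1. Scope of the release. " * 40
chunks = fixed_size_chunks(sample, chunk_size=500)

for index, chunk in enumerate(chunks):
    # Show where each chunk ends; the boundary ignores sentence structure.
    print(index, len(chunk), repr(chunk[-20:]))
```

Because the cut points ignore headings, paragraphs, and sentences, related text can end up split across two chunks. That is the sense in which this approach treats a document as a flat string.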
This project is a small pipeline for exploring a corpus of text/PDF documents (e.g., the House Oversight Committee’s Jeffrey Epstein email release). Unzip the contents locally, e.g.: project-root/ ...