Crawl LLMs.txt for better efficiency

@NachoT Great feedback, thank you! I'm glad it worked for you after a little fiddling. You're right, clearer instructions would certainly help; that won't be hard to update here in the future.

I implemented a fairly basic version of the chunking, splitting, and reranking operations. I tested it with a few different LLMs for retrieval and was getting pretty decent results, so I stopped there. It didn't seem to work well with Ollama, though, and I think that was due to context window limits. Handling that would probably be something to implement in the future.
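For anyone curious, the chunk/split/rerank flow could look something like the minimal sketch below. The word-count cap and overlap-based scoring are illustrative stand-ins (a real setup would rerank with an embedding model or cross-encoder), and keeping chunks small is also what helps avoid blowing past a small context window like Ollama's defaults:

```python
def split_into_chunks(text: str, max_words: int = 200) -> list[str]:
    """Split a document into word-bounded chunks under a size cap.

    Capping chunk size also keeps retrieved context from overflowing
    models with small context windows.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def rerank(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Order chunks by simple term overlap with the query.

    This is a toy scorer standing in for a real reranker
    (cross-encoder or embedding similarity).
    """
    q_terms = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_terms & set(c.lower().split())),
        reverse=True,
    )
    return scored[:top_k]


# Example: chunk a (toy) document, then keep the best chunks for a query.
doc = "Archon ingests llms.txt files for retrieval. " * 100
chunks = split_into_chunks(doc, max_words=50)
best = rerank("how does Archon ingest llms.txt", chunks, top_k=2)
```

Swapping `rerank` for a proper model-based scorer is the main upgrade path; the chunking boundary logic stays the same.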

I'll let everyone know once I have an update to test here in the next few days.

–Cheers

Amazing work, as I've already said, @HillviewCap! Is there a specific point you're trying to reach before a PR? I haven't been super active bringing in PRs the last couple of weeks, but pretty soon I'm going to merge some and I'd love to merge yours.

No, I think everything is good to go other than the few items Nacho referenced. I’ll get those updated and submit a PR shortly.


Sounds great, I’m looking forward to it!


Alright hierarchical RAG integration for LLMS.txt documents by HillviewCap · Pull Request #119 · coleam00/Archon

Submitted, and it's a biggun. I added a search feature to the https://llms-text.ai site now that I got the API working the other day. You can search for an llms.txt file and request to add it to your embeddings right from the search results.
