Hello!
I’m looking for someone to nerd out with.
I’m working on an AI app, which can be summarised as Claude + RAG. I love the Projects in Claude, but they are too limited for my taste. I want my app to allow for web scraping and knowledge reuse (you can create a Project but add the things/websites you’ve already added before).
I’m having a lot of fun with the frontend part and know I can implement the server for it. I want to start designing the backend (AI-related endpoints) now, and I have much less experience there, so I would like to brainstorm with you.
- Vector Database and source reuse
In my database (Supabase), I want to have projects + lists of sources, so when the user sends a message to the chat in a specific project, I can get the list of references to use as a query filter for the vector database (VDB). The idea is to not have a separate namespace in the VDB for each project (which would prevent me from reusing sources between projects, at least without duplicating the data), but just one namespace per user, with a slightly more complex query (to say “use those files in the VDB”).
What do you think about this approach? I know queries to the VDB might be slower with more complex filters, but still. It makes sense to me, but I might not understand it well enough.
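To make the “one namespace per user, filter by source IDs” idea concrete, here’s a minimal in-memory sketch. The names (`Chunk`, `search`) are illustrative, not a real VDB client; with Supabase/pgvector the filter would become a `WHERE source_id = ANY(...)` clause next to the vector-distance ordering.

```python
# Sketch: one namespace per user, project picks a subset of sources at query time.
from dataclasses import dataclass

@dataclass
class Chunk:
    source_id: str    # which uploaded file / scraped page this chunk came from
    embedding: list   # embedding vector
    text: str

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def search(user_namespace, query_vec, allowed_source_ids, top_k=3):
    # Filter first (the project's source list), then rank by similarity.
    candidates = [c for c in user_namespace if c.source_id in allowed_source_ids]
    return sorted(candidates, key=lambda c: cosine_sim(c.embedding, query_vec),
                  reverse=True)[:top_k]

# One namespace for the user; two projects can share source "doc-a" with no duplication.
namespace = [
    Chunk("doc-a", [1.0, 0.0], "chunk from doc A"),
    Chunk("doc-b", [0.0, 1.0], "chunk from doc B"),
]
project_sources = {"project-1": {"doc-a"}, "project-2": {"doc-a", "doc-b"}}
hits = search(namespace, [1.0, 0.1], project_sources["project-1"])
```

The point is that source reuse becomes a pure metadata question: two projects reference the same source ID, and nothing in the VDB is copied.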
-
LLM-related Endpoints
-

/send_chat_message
- standard chat functionality - checks user auth, then uses the project to tell the LLM what it can use from the VDB
- Parameters:
    - user_id - from the access token
    - message - obviously
    - project_id
        - to get the list of sources for the VDB
        - to fetch chat history
        - any other files uploaded to specific chat messages in a given conversation
    - conversation_id
        - to check the LLM model set by the user for this conversation (I want them to be able to choose)
        - to get the system prompt or any other additional context specific to the conversation
/upload_to_vector_db
- I planned to use S3 for the files: upload them first, then have an async queue (Redis) grab the file URLs after upload, handle the embeddings, and update the record in Supabase. When a user uploads a file, before it’s ready they will first see “uploading” and then “processing” (statuses on a record in Supabase).
- Additional ideas:
    - I could keep the hash of a file and check for duplicates when another user uploads the same file. Then I could copy the data between those users’ namespaces without paying for the embeddings again. Makes sense, or am I over-engineering?
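The file-hash dedup idea is cheap to prototype. A sketch, assuming plain dicts stand in for the Supabase tables and `embed` stands in for the real (expensive) embedding call:

```python
# Sketch: hash file bytes on upload; if identical content was already embedded
# (by anyone), copy the chunk records instead of re-embedding.
import hashlib

embeddings_by_hash = {}   # file hash -> embedded chunks (shared cache)
user_chunks = {}          # user_id -> that user's chunk records

def embed(data: bytes):
    # Stand-in for the real embedding call (the part that costs money).
    return [[float(b) for b in data[:4]]]

def upload(user_id: str, data: bytes) -> bool:
    """Returns True if existing embeddings were reused."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in embeddings_by_hash:
        chunks = embeddings_by_hash[digest]   # reuse, no embedding cost
        reused = True
    else:
        chunks = embed(data)
        embeddings_by_hash[digest] = chunks
        reused = False
    user_chunks.setdefault(user_id, []).extend(chunks)
    return reused

first = upload("alice", b"same file")
second = upload("bob", b"same file")
```

This doesn’t feel like over-engineering: it’s one extra column (the hash) and one lookup before the embed step, and it also protects against the same user re-uploading a file.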
/scrape_website (single & crawl)
- I was thinking about Crawl4AI
- Parameters:
    - user_id - from the access token
    - url - obviously
    - project_id - optional, if users are adding directly to the project and not to their library of sources
    - mode - single or crawl
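A hedged sketch of how the `mode` parameter could dispatch: “single” fetches one URL, “crawl” does a bounded breadth-first walk over discovered links. `fetch_page` and `extract_links` are stubs here; in practice Crawl4AI would handle the fetching and extraction.

```python
# Sketch of /scrape_website dispatch on mode ("single" vs "crawl").
def scrape_website(url, mode, fetch_page, extract_links, max_pages=10):
    if mode == "single":
        return {url: fetch_page(url)}
    # crawl: breadth-first over discovered links, bounded by max_pages
    seen, queue, pages = set(), [url], {}
    while queue and len(pages) < max_pages:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.add(current)
        pages[current] = fetch_page(current)
        queue.extend(extract_links(current))
    return pages

# Tiny fake site to exercise both modes.
site = {
    "https://example.com": ["https://example.com/a"],
    "https://example.com/a": [],
}
fetch = lambda u: f"content of {u}"
links = lambda u: site.get(u, [])
single = scrape_website("https://example.com", "single", fetch, links)
crawl = scrape_website("https://example.com", "crawl", fetch, links)
```

The `max_pages` bound (and, in a real version, a same-domain check) keeps a crawl from wandering off across the whole web.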
I was wondering whether it makes sense to keep the files or just to get the embeddings and be done with them. However, S3 storage is cheap, and I won’t need to worry about the amount of data for a while, so keeping them won’t hurt. Files can be useful if I ever want to recreate the embeddings using a different chunking strategy, for example.
Regarding scraping, I want to let users hit refresh or have an option to re-scrape on a schedule. In both cases, though, the major concern is not recreating the embeddings for unchanged data. In Flowise, there is a Record Manager option, so I was hoping for something similar. I would rather not route the embedding process via Flowise just for this; I want control over the code. It would probably work, but still - there has to be a better way.
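The record-manager trick is mostly content hashing per chunk, which is small enough to own yourself. A sketch, assuming dicts stand in for the Supabase/VDB state: on re-scrape, only chunks whose hash isn’t already stored get embedded, and chunks whose hash disappeared get deleted.

```python
# Sketch: incremental re-embedding keyed on per-chunk content hashes.
import hashlib

def chunk_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def sync_source(store: dict, source_id: str, new_chunks: list, embed) -> int:
    """Re-sync one source; returns how many chunks were (re)embedded."""
    old = store.setdefault(source_id, {})          # hash -> embedding
    new_hashes = {chunk_hash(c): c for c in new_chunks}
    embedded = 0
    for h, text in new_hashes.items():
        if h not in old:
            old[h] = embed(text)                   # only new/changed chunks cost anything
            embedded += 1
    for h in list(old):
        if h not in new_hashes:
            del old[h]                             # chunk no longer on the page
    return embedded

store = {}
embed = lambda t: [len(t)]                         # stand-in for the embedding call
first_pass = sync_source(store, "site-1", ["intro", "pricing"], embed)
second_pass = sync_source(store, "site-1", ["intro", "pricing v2"], embed)
```

On the second pass only the changed chunk is embedded, which is exactly the property you want for scheduled re-scrapes.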
What other endpoints do you think I would need? I ofc skipped all the standard get_projects etc.; I have Supabase for that. But maybe there are more smart LLM-related things I haven’t even considered.
I invite you to the discussion! I guess what I want to achieve will be similar to many other AI apps, so hopefully we can all learn something interesting while exploring the architecture.