AI Chat App - Backend Architecture - Brainstorming!

Hello!

I’m looking for someone to nerd out with :wink:

I’m working on an AI app, which can be summarised as Claude + RAG. I love Projects in Claude, but they are too limited for my taste. I want my app to allow for web scraping and knowledge reuse (you can create a new Project and reuse the things/websites you’ve already added elsewhere).

I’m having a lot of fun with the frontend part and know I can implement the server for it. I want to start designing the backend (AI-related endpoints) now, and I have much less experience there, so I would like to brainstorm with you.

  1. Vector Database and source reuse

In my database (Supabase), I want to have projects + lists of sources, so when the user sends a message to the chat in a specific project, I can get the list of references to use in the query to the vector database (VDB). The idea is not to have a separate namespace in the VDB for each project (which would prevent me from reusing sources between projects, at least without duplicating the data), but just one namespace per user and a slightly more complex query (to say “use those files in the VDB”).

What do you think about this approach? I know VDB queries might be slower with the more complex filtering, but still. It makes sense to me, but I might not understand it well enough.
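Roughly what I have in mind for the query side - a minimal sketch assuming a Pinecone-style index with per-user namespaces and metadata filters (the index name and the source_id metadata field are made up; with pgvector in Supabase the same idea would just be a WHERE source_id = ANY(...) clause):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="PINECONE_API_KEY")
index = pc.Index("sources")  # one index, one namespace per user


def retrieve_chunks(user_id: str, source_ids: list[str],
                    query_embedding: list[float], top_k: int = 8):
    # The per-project source list becomes a metadata filter instead of a namespace,
    # so the same embedded source can be referenced from any number of projects.
    return index.query(
        namespace=f"user-{user_id}",
        vector=query_embedding,
        top_k=top_k,
        filter={"source_id": {"$in": source_ids}},
        include_metadata=True,
    )
```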

  2. LLM-related Endpoints

    1. /send_chat_message - standard chat functionality - checks user auth, then uses the project to tell the LLM what it can use from the VDB (a rough handler sketch follows after this list)

    • Parameters:
      • user_id - from the access token
      • message - obviously
      • project_id
        • to get the list of sources for the VDB
        • to fetch chat history
        • any other files uploaded to specific chat messages in a given conversation
      • conversation_id
        • to check the LLM model set by the user for this conversation (I want them to be able to choose)
        • to get the system prompt or any other additional context specific to the conversation
    2. /upload_to_vector_db - I planned to use S3 for the files: upload them first, then have an async queue (Redis) grab the file URLs after the upload, handle the embeddings, and update the record in Supabase. When a user uploads a file, before it is ready they will first see “uploading” and then “processing” (statuses on the record in Supabase). A worker sketch follows further down.

    • Additional ideas:
      • I could keep a hash of each file and check for duplicates when another user uploads the same file. Then I could copy the data between those users’ namespaces without paying for the embeddings again. Does that make sense, or am I over-engineering?
    3. /scrape_website (single & crawl)

    • I was thinking about Crawl4AI
    • Parameters
      • user_id - from the access token
      • url - obviously
      • project_id - optional, if users are adding directly to the project and not to their library of sources
      • mode - single or crawl
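To make /send_chat_message a bit more concrete, here is the rough handler sketch I mentioned above (FastAPI + supabase-py + the Anthropic client; the table/column names, the embed() helper, and retrieve_chunks() from the earlier snippet are all assumptions, not a final design):

```python
import os

import anthropic
from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel
from supabase import create_client

app = FastAPI()
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])
llm = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


class ChatRequest(BaseModel):
    message: str
    project_id: str
    conversation_id: str


@app.post("/send_chat_message")
def send_chat_message(req: ChatRequest, authorization: str = Header(...)):
    # user_id comes from the access token, never from the request body
    try:
        user = supabase.auth.get_user(authorization.removeprefix("Bearer "))
    except Exception:
        raise HTTPException(status_code=401, detail="invalid token")
    user_id = user.user.id

    # project_id -> which sources this chat is allowed to retrieve from
    rows = (supabase.table("project_sources").select("source_id")
            .eq("project_id", req.project_id).execute()).data
    source_ids = [row["source_id"] for row in rows]

    # conversation_id -> the model picked by the user, system prompt, chat history
    conv = (supabase.table("conversations").select("model, system_prompt")
            .eq("id", req.conversation_id).single().execute()).data
    history = (supabase.table("messages").select("role, content")
               .eq("conversation_id", req.conversation_id)
               .order("created_at").execute()).data

    # retrieve only from this project's sources (see the filter sketch above);
    # embed() stands in for whatever embedding call the sources were indexed with
    chunks = retrieve_chunks(user_id, source_ids, embed(req.message))
    context = "\n\n".join(m.metadata["text"] for m in chunks.matches)

    reply = llm.messages.create(
        model=conv["model"],
        max_tokens=1024,
        system=f"{conv['system_prompt']}\n\nContext:\n{context}",
        messages=history + [{"role": "user", "content": req.message}],
    )
    return {"reply": reply.content[0].text}
```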

I was wondering whether it makes sense to keep the files or just to get the embeddings and be done with them. However, S3 storage is cheap, and I won’t need to worry about the amount of data for a while, so keeping them won’t hurt. The files can be useful if I ever want to recreate the embeddings with a different chunking strategy, for example.
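For completeness, here is roughly what I picture the worker behind the Redis upload queue doing (plain redis-py + boto3 + supabase-py; the queue name, bucket, sources table, and the chunk_text()/embed()/copy_vectors() helpers are made up, and index is the handle from the first snippet):

```python
import hashlib
import json
import os

import boto3
import redis
from supabase import create_client

r = redis.Redis()
s3 = boto3.client("s3")
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])


def process_uploads():
    while True:
        _, raw = r.brpop("upload_queue")  # blocks until /upload_to_vector_db enqueues a job
        job = json.loads(raw)             # {"source_id": ..., "user_id": ..., "s3_key": ...}

        supabase.table("sources").update({"status": "processing"}).eq("id", job["source_id"]).execute()

        body = s3.get_object(Bucket="my-uploads", Key=job["s3_key"])["Body"].read()
        file_hash = hashlib.sha256(body).hexdigest()

        # dedup idea from above: if anyone already embedded this exact file, copy
        # those vectors into this user's namespace instead of paying for embeddings again
        existing = (supabase.table("sources").select("id, user_id")
                    .eq("file_hash", file_hash).eq("status", "ready")
                    .limit(1).execute()).data
        if existing:
            copy_vectors(existing[0]["user_id"], job["user_id"], job["source_id"])
        else:
            text = body.decode("utf-8", errors="ignore")  # assume plain text for simplicity
            vectors = [{
                "id": f'{job["source_id"]}#{i}',
                "values": embed(chunk),
                "metadata": {"source_id": job["source_id"], "text": chunk},
            } for i, chunk in enumerate(chunk_text(text))]
            index.upsert(vectors=vectors, namespace=f'user-{job["user_id"]}')

        supabase.table("sources").update(
            {"status": "ready", "file_hash": file_hash}
        ).eq("id", job["source_id"]).execute()
```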

Regarding scraping, I want to let users hit refresh or have an option to re-scrape on a schedule. In both cases, the major concern is not recreating the embeddings for unchanged data. In Flowise there is an option to use the Record Manager, so I was hoping for something similar. I would rather not route the embedding process via Flowise just for this; I want to have control over the code. It would probably work, but still - there has to be a better way.
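What I’m picturing as a home-grown Record Manager is basically a content hash per source - a sketch using Crawl4AI’s basic crawler (the content_hash column and the reindex_source() helper are assumptions):

```python
import hashlib

from crawl4ai import AsyncWebCrawler


async def rescrape(url: str, source_id: str, user_id: str) -> str:
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
    content = str(result.markdown)  # the page extracted as markdown

    content_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    stored = (supabase.table("sources").select("content_hash")
              .eq("id", source_id).single().execute()).data

    if stored and stored.get("content_hash") == content_hash:
        return "unchanged"  # same content as last time, skip the embeddings entirely

    # content changed: wipe the old chunks and re-embed, then remember the new hash
    reindex_source(user_id, source_id, content)
    supabase.table("sources").update({"content_hash": content_hash}).eq("id", source_id).execute()
    return "updated"
```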

What other endpoints do you think I would need? I ofc skipped all the standard get_projects etc. - I have Supabase for that. But maybe there are smarter LLM-related things I haven’t even considered.

I invite you to the discussion! I guess what I want to achieve will be similar to many other AI apps, so hopefully, we can all learn something interesting while exploring the architecture :wink:


Wow, I can tell you’ve put a lot of thought into this - I love your design!

I agree that keeping the files in S3 is a good idea, even if it’s just for recreating the embeddings with new strategies in the future. It could also be useful if you make the LLM cite its sources and allow the user to download the cited file.

For re-scraping, I’m thinking you would typically just clear all the embeddings for each page before you reinsert them. The reason being that if you “upsert” the vectors, you might leave old vectors around when the latest version of the page has fewer chunks than a previous version. See what I’m saying? Like if a page is originally 10 chunks but then goes down to 9, you’d only update 9 chunks and the 10th one would remain and be outdated.
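Something like this is what I mean - deterministic chunk IDs per source and a full delete before the fresh upsert (assuming your vector DB supports delete-by-metadata-filter, like Pinecone’s pod-based indexes; on serverless you’d have to list the IDs instead; chunk_text() and embed() are placeholders):

```python
def reindex_source(user_id: str, source_id: str, content: str):
    namespace = f"user-{user_id}"

    # wipe every existing chunk for this source, however many there were, so a page
    # that shrank from 10 chunks to 9 can't leave a stale chunk #10 behind
    index.delete(filter={"source_id": {"$eq": source_id}}, namespace=namespace)

    # then insert whatever the current version of the page produces
    vectors = [{
        "id": f"{source_id}#{i}",
        "values": embed(chunk),
        "metadata": {"source_id": source_id, "text": chunk},
    } for i, chunk in enumerate(chunk_text(content))]
    index.upsert(vectors=vectors, namespace=namespace)
```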

I’m not actually familiar with the Record Manager in Flowise, but I’m curious how it works! I’m sure it would be pretty easy to replicate in code. You’re definitely right that it wouldn’t be worth routing to Flowise just to take advantage of that feature.

Cool idea! I could give people an option to download the files relevant to the answer! A nice addition to listing the URLs of related pages when using scraped content :wink:


[Flowise Record Manager: Stop Duplicate Data Forever! (Leon van Zyl)](https://www.youtube.com/watch?v=sNk6-ISi7i4&t=441s)

It’s quite cool - I missed that when I moved to N8N (I initially started prototyping in Flowise). But I’m not sure it would cover the case of fewer embeddings and removing the leftovers. Fair point that it’s worth considering.
