Allow user file upload for agents?

udd · January 2, 2025, 9:31am

Will there be an option to allow user file uploads (PDF, HTML, CSV, etc.) for agents? As per the docs:

Input Params

{
  "query": "User's input text",
  "user_id": "Unique user identifier",
  "request_id": "Unique request identifier",
  "session_id": "Conversation session identifier"
}

How can I allow user file uploads as well? This is crucial for a few of the agents I have built locally, since they work on external data sources instead of the web.

Will this be possible?

ColeMedin · January 3, 2025, 9:12pm

Great question @udd! This is not implemented yet, but I plan on having it ready by the time the Hackathon is under way. So you can safely assume you’ll be able to upload files on the platform for your agents, as long as the files are small enough (<1 MB).

My plan is to have a fifth parameter here called “files”, which will just be an array of base64 strings, one for each file.
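For anyone wondering what consuming that could look like on the agent side, here is a minimal sketch. It assumes each entry in "files" is a bare base64 string, which is only one reading of the plan above; the finished Studio format may differ. `decode_files` and `MAX_FILE_BYTES` are hypothetical names.

```python
import base64
import binascii

# Assumed cap, mirroring the "<1 MB" figure mentioned above.
MAX_FILE_BYTES = 1_000_000


def decode_files(payload: dict) -> list[bytes]:
    """Decode every base64 string in payload["files"] into raw bytes."""
    decoded = []
    for index, encoded in enumerate(payload.get("files", [])):
        try:
            # validate=True rejects characters outside the base64 alphabet.
            data = base64.b64decode(encoded, validate=True)
        except (binascii.Error, ValueError) as exc:
            raise ValueError(f"files[{index}] is not valid base64") from exc
        if len(data) > MAX_FILE_BYTES:
            raise ValueError(f"files[{index}] exceeds {MAX_FILE_BYTES} bytes")
        decoded.append(data)
    return decoded


# Example request: the four documented fields plus the assumed "files" array.
request = {
    "query": "Summarize the attached CSV",
    "user_id": "user-123",
    "request_id": "req-456",
    "session_id": "sess-789",
    "files": [base64.b64encode(b"name,score\nada,90\n").decode("ascii")],
}
files = decode_files(request)
```

Requests without a "files" key simply yield an empty list, so agents that ignore uploads keep working.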

udd · January 3, 2025, 10:25pm

What will be the compute side restrictions for these scripts? My plan for the agent was to integrate an OpenInterpreter of sorts, which will not use LLM API requests but use CPU compute (LLM will decide the action, action runs on CPU compute). If allowed, I can link e2b to the code for this, which will require another API key for the user.

Another follow-up question: for any external API keys used (e2b etc.), will the user input the API key, or will it be included in the credit-based pricing system for studio.ottomator.ai?

Would it be possible to increase the file size limit to more than 1 MB, or maybe take file input through some other method (for example, gdown with the URL of the file)? Also, could you please clarify the compute restrictions for the tool scripts? The LLM itself only needs API calls, so that part is minimal, but the tools can potentially take up a lot of compute. Please let me know.

P.S. Big fan of your work. I love what you're doing with bolt.diy, and even more so with the ottomator community.

ColeMedin · January 4, 2025, 5:15pm

I do want the agents to mostly remain I/O bound, but as long as what you are doing is efficient, having the agent be mostly CPU bound will work.

For agent authorization to use specific apps like e2b, that is up to you to implement. It could be that the agent asks for the user’s API key, and then your API logic stores that key tied to the user, based on the user ID that is given from the Studio frontend.
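A rough sketch of that idea, keyed on the `user_id` field from the input params. `UserKeyStore` and `handle_request` are hypothetical names, and the in-memory dict is only a stand-in; a real agent would want an encrypted database so keys survive restarts and are not kept in plain text.

```python
from typing import Optional


class UserKeyStore:
    """Hypothetical in-memory store mapping (user_id, service) to an API key."""

    def __init__(self) -> None:
        self._keys: dict[tuple[str, str], str] = {}

    def save(self, user_id: str, service: str, api_key: str) -> None:
        self._keys[(user_id, service)] = api_key

    def get(self, user_id: str, service: str) -> Optional[str]:
        return self._keys.get((user_id, service))


def handle_request(payload: dict, store: UserKeyStore) -> str:
    """Ask for a key on first contact and reuse the stored one afterwards."""
    user_id = payload["user_id"]
    key = store.get(user_id, "e2b")
    if key is None:
        return "Please send your E2B API key so I can run code for you."
    # Only echo the last four characters so the key never appears in chat.
    return f"Using the stored key ending in ...{key[-4:]}"
```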

I am planning on extending the Live Agent Studio to work with these kinds of credentials natively in the future (probably mostly OAuth credentials), but right now that isn’t possible. It’s a difficult problem to tackle, which is why there are big platforms out there like Composio and Arcade.

What kind of file limit are you looking for? I just want to avoid someone spamming the server with a ton of files, but I could probably accommodate what you are looking for! I could also go through file storage like you’re suggesting; I am still debating that.

Thanks for the kind words, that means a lot!

udd · January 4, 2025, 9:13pm

Hi,
First of all, thank you for your prompt responses. I really appreciate you taking time out of your day to answer my questions.

For File Upload: I have built an agent that I use locally for my work, and I was planning to submit that. It helps me analyse large databases (spreadsheets and SQLite). For it to work on Agent Studio, I will have to modify the logic to allow the user to input their database. If you don't want the user to store the file on the server, I could modify the logic so that the file is fetched from a Google Drive link using gdown. The user can put the URL directly in the prompt, and the agent will automatically extract the URL, use gdown to download the file, perform the required operations, and delete the file as soon as the response is generated. It will still require compute power to manipulate the DataFrames, so I wanted to give the user the option to use local compute (for databases of 1-5 MB) or e2b for larger workloads (databases from 5 MB to 1 GB).

However, the server won't often be holding large files, as nothing will get saved.
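The flow described above could be sketched roughly like this. It uses `gdown.download` with `fuzzy=True` so a normal sharing link works; `extract_drive_url`, `run_on_drive_file`, and the `process` callback are hypothetical names, and the regex is only an assumption about what Drive links look like in a prompt.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Assumed shape of a Drive link pasted into the prompt.
DRIVE_URL = re.compile(r"https://drive\.google\.com/\S+")


def extract_drive_url(prompt: str) -> Optional[str]:
    """Return the first Google Drive URL in the prompt, if any."""
    match = DRIVE_URL.search(prompt)
    # Strip punctuation that often trails a pasted link.
    return match.group(0).rstrip(".,)") if match else None


def _gdown_download(url: str, output: str) -> Optional[str]:
    import gdown  # third-party: pip install gdown

    # fuzzy=True lets gdown pull the file ID out of a sharing link.
    return gdown.download(url=url, output=output, quiet=True, fuzzy=True)


def run_on_drive_file(
    prompt: str,
    process: Callable[[Path], object],
    download: Callable[[str, str], Optional[str]] = _gdown_download,
) -> object:
    """Download the linked file, run `process` on it, then discard it."""
    url = extract_drive_url(prompt)
    if url is None:
        raise ValueError("No Google Drive link found in the prompt")
    # The temporary directory, and the download inside it, is removed
    # as soon as this block exits, even if `process` raises.
    with tempfile.TemporaryDirectory() as workdir:
        target = str(Path(workdir) / "user_upload")
        saved = download(url, target)
        if saved is None:
            raise RuntimeError("Download failed; is the file shared publicly?")
        return process(Path(saved))
```

The `download` argument is injectable so the cleanup logic can be exercised without network access; in normal use it can be left at its default.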

Please let me know if this is feasible or whether I should submit a different agent. I have lots of agents locally and don't mind converting them to the format required by studio.ottomator; it would be helpful for the community.

For E2B: Similar to the Google Drive logic, I could have the user enter their API key directly in chat, have the agent extract the API key, and set it as an environment variable from within the agent itself. Please let me know if this would be allowed or whether I should change my approach.
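A minimal sketch of that key handling, under two assumptions that should be checked against the E2B docs: that keys look like `e2b_` followed by letters and digits, and that the SDK reads the `E2B_API_KEY` environment variable. `extract_api_key`, `redact`, and `configure_e2b` are hypothetical helper names.

```python
import os
import re
from typing import Optional

# Assumed key shape: "e2b_" plus at least 16 alphanumeric characters.
KEY_PATTERN = re.compile(r"\be2b_[A-Za-z0-9]{16,}\b")


def extract_api_key(message: str) -> Optional[str]:
    """Return the first thing in the message that looks like an E2B key."""
    match = KEY_PATTERN.search(message)
    return match.group(0) if match else None


def redact(message: str) -> str:
    """Mask any key before the message is logged or sent to the LLM."""
    return KEY_PATTERN.sub("[redacted key]", message)


def configure_e2b(message: str) -> bool:
    """Set the environment variable if the message contains a key."""
    key = extract_api_key(message)
    if key is None:
        return False
    # Assumed variable name. Note that os.environ is process-wide, so if
    # one process serves several users it is safer to pass the key to the
    # client explicitly per request than to share it this way.
    os.environ["E2B_API_KEY"] = key
    return True
```

Redacting the key before the message reaches the chat history keeps it out of stored conversations.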

harusama8121 · January 11, 2025, 11:50am

I was wondering about this too: does Live Agent Studio support file uploads or not? With the agent I use locally, I pass it a PDF file containing certain information, and the text is extracted from it and sent to the LLM. So I am wondering if there is any such option to upload files.

harusama8121 · January 12, 2025, 4:10pm

@ColeMedin is the file upload feature available yet?

ColeMedin · January 13, 2025, 7:47pm

@udd / @harusama8121 Yes, file uploading is now available on the Live Agent Studio! I updated the Developer Guide over there with instructions on how to use it.

ColeMedin · January 13, 2025, 7:48pm

Great thinking here! Local compute for 1-5 MB and then e2b for larger compute sounds like a good plan.
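As a small illustration of that split, the routing could be as simple as the sketch below. The thresholds come from the 1-5 MB and 5 MB-1 GB ranges mentioned earlier in the thread; `choose_backend` is a hypothetical name, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
# Thresholds taken from the ranges discussed in this thread.
LOCAL_LIMIT_BYTES = 5 * 1024 * 1024   # up to 5 MB runs on local compute
SANDBOX_LIMIT_BYTES = 1024 ** 3       # up to 1 GB goes to e2b


def choose_backend(size_bytes: int) -> str:
    """Pick where to run the analysis based on the input file size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= LOCAL_LIMIT_BYTES:
        return "local"
    if size_bytes <= SANDBOX_LIMIT_BYTES:
        return "e2b"
    raise ValueError("File is larger than the 1 GB this agent supports")
```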

And you can certainly use a Google Drive link with gdown and have the user put that into the chat! Similarly with the API key for E2B, you are spot on there.