Here are a few things I ran into trying to get this to work with Ollama:
The check is_ollama = "localhost" in base_url.lower() really screws things up, because accessing Ollama from outside Docker requires the base URL to be http://host.docker.internal:11434/v1. Instead of looking for localhost, maybe look for "11434", or just let us choose our provider from a dropdown.
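A minimal sketch of what a more forgiving check could look like (the is_ollama_url helper and the optional provider argument are hypothetical, not the current Archon code):

    from urllib.parse import urlparse

    # Hypothetical helper -- not the actual Archon check. Treats the base URL as
    # Ollama if the user explicitly picked "ollama" as the provider, or if the URL
    # points at the default Ollama port or a local/Docker-internal host.
    def is_ollama_url(base_url: str, provider: str | None = None) -> bool:
        if provider and provider.lower() == "ollama":
            return True
        parsed = urlparse(base_url.lower())
        return parsed.port == 11434 or parsed.hostname in (
            "localhost", "127.0.0.1", "host.docker.internal",
        )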
For me, using Ollama locally is great for saving on costs while sacrificing speed. I don't have a blazing-fast PC, so when I tried to add the Pydantic AI documentation to Supabase, I let it run for an hour and it was just a mess. I think this task should be separate, and we should be able to select a different model and provider for it. Why are we forced to use the same base for everything?
Revamping the environment variable setup is the next thing on my list. I'm going to make it so you can select your provider first and then provide your base URL, which will solve #1. And then I also want to separate the embedding model from the LLMs, like you're saying for #2!
If you're going to do that, can I make an additional suggestion: when you pick a provider and enter URLs and models for that provider, save those URLs and models per provider so we can quickly switch back and forth without having to re-enter them each time. That would let us switch providers on the fly depending on our needs.
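A rough sketch of how per-provider settings could be persisted (the providers.json file and the field names are made up for illustration, not part of Archon):

    import json
    from pathlib import Path

    CONFIG_PATH = Path("providers.json")  # hypothetical location for saved settings

    def save_provider(name: str, base_url: str, llm_model: str, embedding_model: str) -> None:
        # Load any previously saved providers, update this one, and write back.
        configs = json.loads(CONFIG_PATH.read_text()) if CONFIG_PATH.exists() else {}
        configs[name] = {
            "base_url": base_url,
            "llm_model": llm_model,
            "embedding_model": embedding_model,
        }
        CONFIG_PATH.write_text(json.dumps(configs, indent=2))

    def load_provider(name: str) -> dict:
        # Returns the last-saved settings for a provider, e.g. load_provider("ollama").
        return json.loads(CONFIG_PATH.read_text())[name]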
All I'm trying to do is add the Pydantic AI documentation to Supabase. It shows the crawling and the processing, but never adds anything to the database. Here are my settings:
@ColeMedin So far I can finally get chunks to save to the database, but titles and summaries are not generating; it writes "Error processing title" and "Error processing summary" to those fields instead. Here is the error: 2025-03-16 13:19:50 Error getting title and summary: 'NoneType' object is not subscriptable
Is there a way to incorporate better error handling? If it's hitting an error like this, it shouldn't keep running for the next several minutes inserting garbage into the database. Or, at the very least, there should be an Abort button or an easy way to cancel the run. I had to shut the entire container down to stop it.
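For what it's worth, a hedged sketch of the kind of guard that could turn that 'NoneType' error into a hard failure instead of bad rows, assuming an OpenAI-style response object and a prompt that asks for a JSON object with title and summary keys (the function name is illustrative, not Archon's actual code):

    import json

    def extract_title_and_summary(response) -> dict:
        # A failed call often comes back as None, which is what produces
        # "'NoneType' object is not subscriptable" when it gets indexed.
        if response is None or not getattr(response, "choices", None):
            # Fail loudly instead of writing "Error processing title" placeholders.
            raise RuntimeError("LLM returned no usable response for title/summary")
        content = response.choices[0].message.content
        if not content:
            raise RuntimeError("LLM response had empty content for title/summary")
        return json.loads(content)  # expected shape: {"title": ..., "summary": ...}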
Also, what can we add if we want to slow it down to be on the safe side when it comes to possible rate limiting? For example, OpenRouter's documentation ("API Rate Limits | Configure Usage Limits in OpenRouter") says: "If you are using a free model variant (with an ID ending in :free), then you will be limited to 20 requests per minute and 200 requests per day."
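A minimal throttling sketch, assuming the crawler makes its LLM calls in a simple loop; the 20-requests-per-minute figure comes from the OpenRouter quote above, and throttled() is just a stand-in wrapper, not an existing Archon function:

    import time

    REQUESTS_PER_MINUTE = 20              # OpenRouter's stated cap for :free models
    MIN_INTERVAL = 60.0 / REQUESTS_PER_MINUTE

    _last_call = 0.0

    def throttled(fn, *args, **kwargs):
        # Call fn, sleeping first if needed to stay under the per-minute budget.
        global _last_call
        wait = MIN_INTERVAL - (time.monotonic() - _last_call)
        if wait > 0:
            time.sleep(wait)
        _last_call = time.monotonic()
        return fn(*args, **kwargs)

Wrapping each LLM call (title/summary generation, embeddings) in a helper like this would keep a run under the free-tier per-minute limit; the 200-requests-per-day cap would still need separate bookkeeping.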
@ColeMedin Here's another update. I think we need a community table or list where we can submit which models work and which don't.
For documentation crawling, Ollama's nomic-embed-text seems to work fine as long as it's paired with the "right" primary model.
I have not been able to get it to work with Ollama as the LLM Provider.
I have not been able to get it to work with free OpenRouter models, or even with many of the paid ones. For example, Gemini 2.0 Flash returns a bunch of these errors and the majority of the processes fail: Error processing https://ai.pydantic.dev/examples/question-graph/: list indices must be integers or slices, not str
The following settings have been the most successful so far:
Like I said, it would really be nice to start building a list of which models work best for each task so we're not needlessly wasting credits. I would really like to find less expensive models that work just as well as the ones above.
Your note about using host.docker.internal instead of localhost for Ollama in Archon is good, noted!
Still thinking through how I want to do error handling, but I agree it can be a lot better. I default to continuing the scrape even when the summaries fail, just because they aren't strictly required. Maybe we should crash the script instead, though.
I've never gotten those issues with OpenRouter models before, so I'll have to do some more testing! What error do you get when using Ollama as the primary LLM for the scraping? Is it the same as that list indices error with OpenRouter?
@ColeMedin The error I got with Ollama for both primary and embedding is: 2025-03-16 13:19:50 Error getting title and summary: 'NoneType' object is not subscriptable. Many chunks fail as well; only about 85 were successful, and a lot just timed out because the model I was using runs slowly on my PC.
I've pretty much given up on trying to use Ollama for the primary; I really don't have the hardware I need to run more capable models. After all these tests, OpenAI's 4o-mini only cost $0.04 to get the docs embedded. I wonder if it's worth the trouble to save 4 cents.
@ColeMedin Which Ollama models do you recommend? They have to support tool calling, so there are only a handful of options. How well did they summarize compared to OpenAI?
EDIT: Not sure if my tests are accurate, but it seems like any Ollama model that exceeds VRAM capacity ends up failing. I tested qwen2.5:14b, which is just shy of my VRAM limit and partially runs on CPU, and got the missing titles and summaries. However, a 12b model that fits within GPU memory works. It appears that when a model tries to load into VRAM and falls back to CPU, something gets lost in the process, causing the issue.
That said, I think I'm going to stick with OpenAI as my primary. Ollama just isn't consistent: with OpenAI, inserted document chunks consistently reach 450+; with llama3.1:8b, 385; with mistral-nemo:12b, 416 on the first run and 386 on the second.
Yeah, no matter the application or agent, you'll run into issues with Ollama without the right hardware. Makes sense to stick with OpenAI or OpenRouter instead!
If you do want to keep exploring local LLMs, my general recommendation for less powerful hardware is Qwen 2.5 Instruct 7B. The new Mistral 3.1 Small models are also looking really nice!
I have a pair of bridged A6000 Adas, and the Ollama config is pretty basic, but when I started this was NOT the config I found. Just an FYI for anyone with multiple GPUs: it took me a bit to sort out, so hopefully this helps. It's a bit belt-and-suspenders, but it works better now than when I started.
ollama-gpu:
  profiles: ["gpu-nvidia"]
  <<: *service-ollama
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
  environment:
    - NVIDIA_VISIBLE_DEVICES=0,1   # A6000 GPUs
    - CUDA_DEVICE_ORDER=PCI_BUS_ID
    - OLLAMA_SCHED_SPREAD=1        # Spread the model across both GPUs
    - OLLAMA_NUM_GPU=2             # Explicitly set to 2 GPUs
    - OLLAMA_GPU_LAYERS=0          # 0 = all layers on the A6000 GPUs