GPU is not being fully utilized and ollama/qwen2.5:32b is slow

ollama ps shows 28% CPU / 72% GPU when the model is called via oTToDev on a 3090.
This happens on Windows and Linux.
Screen updates and code generation are very slow.

On Windows, if the model is called via openwebui, ollama ps shows 100% GPU, the GPU graph in Task Manager shows 100% usage, and code and text are generated quickly.

so it seems like this model can run fine on this computer, and that there is something different about how ollama is called.

any ideas on how to get full utilization of the 3090 when using oTToDev?

This is due to the increased context size. I've just put in a PR to allow customising this through the env file, with some recommended sizes for different amounts of VRAM.
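
For anyone wondering what "customising this through the env file" could look like in practice, here is a minimal Python sketch, not the code from the PR. It assumes the context size is read from a `DEFAULT_NUM_CTX` environment variable (the name used later in this thread) and passed to Ollama as the `num_ctx` option in the request body. The helper name `build_ollama_payload` and the 8192 fallback are illustrative assumptions.

```python
import os


def build_ollama_payload(model, prompt, env=None):
    """Build a request body for Ollama's /api/generate with an explicit context size."""
    env = os.environ if env is None else env
    # DEFAULT_NUM_CTX is the variable name from this thread.
    # The 8192 fallback is an assumption for illustration only.
    num_ctx = int(env.get("DEFAULT_NUM_CTX", "8192"))
    return {
        "model": model,
        "prompt": prompt,
        # Ollama reads per-request settings such as num_ctx from "options".
        "options": {"num_ctx": num_ctx},
    }


if __name__ == "__main__":
    print(build_ollama_payload("qwen2.5:32b", "hello"))
```

A smaller `num_ctx` means a smaller KV cache, which is what lets the whole model stay on the GPU.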

Gotcha, I've also seen this if I'm running something simultaneously that accesses local LLMs (in my case, open-webui).

Could you link to the PR directly here? Thanks!

Here you go

Reviewed with change request, looks good once that's addressed.

Manually tested and this seems to work, thanks for the fix!
This is at 6144. At 12288 there's some CPU use. Will try more settings.
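
If you want to check the CPU/GPU split from a script while trying different context sizes, a small parser like the one below might help. It is only a sketch: it assumes the PROCESSOR column of `ollama ps` looks like `28%/72% CPU/GPU` for a split load or `100% GPU` when fully offloaded, and the function name `parse_processor` is made up for this example.

```python
import re


def parse_processor(column):
    """Return (cpu_percent, gpu_percent) from an `ollama ps` PROCESSOR value.

    Assumed formats: "28%/72% CPU/GPU", "100% GPU", "100% CPU".
    """
    text = column.strip()
    split = re.fullmatch(r"(\d+)%/(\d+)%\s+CPU/GPU", text)
    if split:
        return int(split.group(1)), int(split.group(2))
    single = re.fullmatch(r"(\d+)%\s+(CPU|GPU)", text)
    if single:
        percent = int(single.group(1))
        return (percent, 0) if single.group(2) == "CPU" else (0, percent)
    raise ValueError(f"unrecognised PROCESSOR value: {column!r}")


if __name__ == "__main__":
    cpu, gpu = parse_processor("28%/72% CPU/GPU")
    print(f"CPU {cpu}% / GPU {gpu}%")
```

Anything other than 0% CPU suggests part of the model has been pushed off the GPU.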

8192 is as high as I can go with one 3090 at 100% GPU while things are snappy. 9216 is still quick (3% CPU), 10240 is OK, and above that it starts to get slow. Curious what the tradeoffs will be like…

# DEFAULT_NUM_CTX=32768 # Consumes 36GB of VRAM
# DEFAULT_NUM_CTX=24576 # Consumes 32GB of VRAM
# DEFAULT_NUM_CTX=12288 # Consumes 26GB of VRAM
# DEFAULT_NUM_CTX=11264 # Consumes 24GB of VRAM
# DEFAULT_NUM_CTX=10240 # Consumes 24GB of VRAM
# DEFAULT_NUM_CTX=9216 # Consumes 24GB of VRAM
DEFAULT_NUM_CTX=8192 # Consumes 24GB of VRAM
# DEFAULT_NUM_CTX=7168 # Consumes 24GB of VRAM
# DEFAULT_NUM_CTX=6144 # Consumes 24GB of VRAM
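
As a rough illustration of why a larger context pushes work onto the CPU, the KV cache grows linearly with `num_ctx`. The sketch below uses the standard estimate of two tensors (keys and values) per layer per token. The architecture numbers (64 layers, 8 KV heads, head dimension 128, 16-bit cache values) are assumptions about qwen2.5:32b and were not stated in this thread. The result covers only the KV cache, so it will not match the VRAM figures above, which also include model weights and other buffers.

```python
def kv_cache_bytes(num_ctx, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV cache size: keys and values for every layer and token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * num_ctx


if __name__ == "__main__":
    # Assumed architecture values, for illustration only.
    layers, kv_heads, head_dim = 64, 8, 128
    for num_ctx in (6144, 8192, 12288, 32768):
        gib = kv_cache_bytes(num_ctx, layers, kv_heads, head_dim) / 2**30
        print(f"num_ctx={num_ctx}: about {gib:.2f} GiB of KV cache")
```

Under these assumptions, going from 8192 to 32768 tokens adds several GiB on top of the weights, which is consistent with the slowdown reported above once the card's 24GB is exceeded.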

I've made the changes and also updated the docs, it's all in the PR now :slight_smile:

Why are PRs in general taking so long to be merged into develop? There are currently 36 open and waiting…

We don't really have a dev environment at the moment. With the project being so young, it's probably a lack of resources as well. That said, it would be good to have a dev branch where PRs could be merged in faster and then merged into a more stable branch once tested by the masses.

I agree, a development branch will speed up the testing of new features. @ColeMedin what say you?

I agree, and this is something we are talking about setting up as a maintainer team! @bolto90 is right that we are definitely lacking resources right now with how quickly the project is growing, but I'm working hard building up the team so we can continue to get all these awesome PRs merged!

Hi @mahoney, any chance this PR can be merged? All the changes have been made.

Thanks for the ping, been off and online intermittently today. A-okay to merge and I'll do so shortly; if anyone beats me to it, I am :+1:

@bolto90 thanks for the patience. Resolved a conflict in .env.example and merged in at: Merge pull request #328 from aaronbolton/main · coleam00/bolt.new-any-llm@ad8b48e · GitHub