GPU is not being fully utilized and ollama/qwen2.5:32b is slow

yroc · November 18, 2024, 9:32pm

ollama ps shows 28% CPU / 72% GPU when called via oTToDev using a 3090.
happens on windows and linux.
screen updates / code generation are very slow.

on windows, if the model is called via openwebui, ollama ps will show 100% GPU and GPU in task manager shows 100% usage and code / text is generated quickly.

so it seems like this model can run fine on this computer, and that there is something different about how ollama is called.

any ideas on how to get full utilization of the 3090 when using oTToDev?

bolto90 · November 18, 2024, 10:03pm

This is due to the increased context size, I’ve just put a PR in to allow customising this through the env file with some recommended sizes for VRAM

mahoney · November 18, 2024, 11:15pm

Gotcha, I’ve also seen this if I’m running something simultaneously that accesses local LLMs (in my case, open-webui).

Could you link to the PR directly here? Thanks!

bolto90 · November 18, 2024, 11:27pm

Here your go

github.com/coleam00/bolt.new-any-llm

Created DEFAULT_NUM_CTX VAR with a default of 32768

coleam00:main ← aaronbolton:main

opened 08:53PM - 18 Nov 24 UTC

aaronbolton

+12 -1

Adding DEFAULT_NUM_CTX to enable easier adjust on context sizes DEFAULT_NUM_C…TX VAR set to a default of 32768 but can be adjusted in .env.local, also adding some example values to .env.example Example Context Values for qwen2.5-coder:32b 32768 # Consumes 36GB of VRAM 24576 # Consumes 32GB of VRAM 12288 # Consumes 26GB of VRAM 6144 # Consumes 24GB of VRAM

mahoney · November 19, 2024, 1:02am

Reviewed with change request, looks good once that’s addressed.

yroc · November 19, 2024, 4:49am

manually tested and this seems to work, thanks for the fix!
this is at 6144, at 12288 there’s some CPU use. will try more settings

yroc · November 19, 2024, 5:26am

8192 is as high as i can go with one 3090 at 100% GPU while things are snappy. 9216 is still quick (3% CPU), 10240 is ok, above that starts to get slow, curious what the tradeoffs will be like…

# DEFAULT_NUM_CTX=32768 # Consumes 36GB of VRAM
# DEFAULT_NUM_CTX=24576 # Consumes 32GB of VRAM
# DEFAULT_NUM_CTX=12288 # Consumes 26GB of VRAM
# DEFAULT_NUM_CTX=11264 # Consumes 24GB of VRAM
# DEFAULT_NUM_CTX=10240 # Consumes 24GB of VRAM
# DEFAULT_NUM_CTX=9216 # Consumes 24GB of VRAM
DEFAULT_NUM_CTX=8192 # Consumes 24GB of VRAM
# DEFAULT_NUM_CTX=7168 # Consumes 24GB of VRAM
# DEFAULT_NUM_CTX=6144 # Consumes 24GB of VRAM

bolto90 · November 19, 2024, 7:52am

I’ve made the changes and also updated the docs, its all in the PR now

frogman544 · November 19, 2024, 8:21am

Why are PRs taking so long to be merged into develop? (in general) there are currently 36 open and waiting…

bolto90 · November 19, 2024, 11:26am

We dont really have a dev environment at the moment, I think with the project being so young it probably a lack of resources aswell, but saying that it would be good to have a dev branch where PR’s could be merged in faster and then merged into a more stable branch once tested by the masses.

frogman544 · November 19, 2024, 1:25pm

I agree, a development branch will speed up the testing of new features. @ColeMedin what say you?

ColeMedin · November 20, 2024, 8:51pm

I agree and this is something we are talking about setting up as a maintainer team! @bolto90 is right that we are definitely lacking resources right now with how quickly the project is growing, but I’m working hard building the up team so we can continue to get all these awesome PRs merged!

bolto90 · November 21, 2024, 7:59pm

Hi @mahoney any chance this PR can be merged, all the change have been made.

mahoney · November 22, 2024, 12:15am

Thanks for the ping, been off and online intermittently today. A-okay to merge and I’ll do so shortly, if anyone beats me to it I am

mahoney · November 22, 2024, 2:42am

@bolto90 thanks for the patience, resolved a conflict in .env.example, merged in at Merge pull request #328 from aaronbolton/main · coleam00/bolt.new-any-llm@ad8b48e · GitHub