Ollama - Slow and using my CPU, not GPU

leander.starr26 · January 8, 2025, 2:05am

Hi everyone, I'm kind of new to AI and development.

I have an RTX 4070 Ti and I'm trying to run Qwen2.5-Coder 14B or 32B, since I have 64 GB of RAM. At the moment it's using only my CPU. Any idea why that's the case?
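Before tuning anything, it helps to confirm how much of the model Ollama actually placed on the GPU. This sketch is an illustration, not something from the thread. It assumes a local Ollama server on the default port 11434 and reads the size and size_vram fields (both in bytes) that Ollama's /api/ps endpoint reports for each loaded model. The ollama ps command shows the same split in its PROCESSOR column. A share near 0% means the model is running on the CPU.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local Ollama address


def gpu_share(ps_response: dict) -> dict[str, float]:
    """Fraction of each loaded model that sits in VRAM.

    Uses the "size" and "size_vram" fields (bytes) that /api/ps
    reports for every currently loaded model.
    """
    shares = {}
    for model in ps_response.get("models", []):
        size = model.get("size", 0)
        vram = model.get("size_vram", 0)
        # Guard against a zero size so an odd entry cannot divide by zero.
        shares[model["name"]] = vram / size if size else 0.0
    return shares


def fetch_ps(base_url: str = OLLAMA_URL) -> dict:
    """Ask the running Ollama server which models are loaded right now."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp)
```

With a model loaded (for example right after sending it one prompt), gpu_share(fetch_ps()) maps each model name to its VRAM fraction. A value below 1.0 means some layers were offloaded to system RAM and run on the CPU.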

Is it the context size? If so, what context size would fit my GPU, and could I split Qwen2.5 32B between the 16 GB of VRAM and 4 GB of DDR5 system RAM? Thanks!
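Here is a rough way to reason about which context size fits. The quantized weights, plus the KV cache (which grows linearly with the context window), plus some runtime overhead all have to fit in VRAM. If they don't, Ollama offloads part of the model to system RAM, where those layers run on the CPU. The back-of-the-envelope sketch below relies on several assumptions that you should verify against the model's config.json and the tag you actually pulled:

* Qwen2.5 32B has about 32e9 parameters, 64 layers, 8 KV heads and a head dimension of 128.
* A 4-bit quantization uses about 4.5 bits per weight.
* The KV cache is stored in 16 bits.
* Runtime overhead is about 1 GiB.

```python
from dataclasses import dataclass

GIB = 1024**3


@dataclass
class ModelShape:
    """Architecture numbers needed for a rough memory estimate."""
    n_params: float   # total parameter count
    n_layers: int     # number of transformer blocks
    n_kv_heads: int   # key/value heads (grouped-query attention)
    head_dim: int     # dimension of each attention head


# Assumed values for Qwen2.5 32B; check them against the model's config.json.
QWEN25_32B = ModelShape(n_params=32e9, n_layers=64, n_kv_heads=8, head_dim=128)


def weights_bytes(shape: ModelShape, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return shape.n_params * bits_per_weight / 8


def kv_cache_bytes(shape: ModelShape, n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Keys and values for every layer and every token in the window (fp16 by default)."""
    return 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * n_ctx * bytes_per_elem


def largest_ctx_that_fits(
    shape: ModelShape,
    vram_bytes: int,
    bits_per_weight: float = 4.5,   # assumption: typical 4-bit quantization
    overhead_bytes: int = 1 * GIB,  # assumption: compute buffers, driver context
    step: int = 1024,
) -> int:
    """Largest multiple of `step` tokens whose weights + KV cache + overhead fit in VRAM.

    Returns 0 when the weights plus overhead alone already exceed the VRAM.
    """
    free = vram_bytes - weights_bytes(shape, bits_per_weight) - overhead_bytes
    if free <= 0:
        return 0
    tokens = int(free // kv_cache_bytes(shape, 1))
    return tokens // step * step
```

With these assumed numbers, the KV cache for an 8192-token context comes to 2 GiB. The 32B weights alone (about 18 GB) already exceed 16 GiB, so largest_ctx_that_fits(QWEN25_32B, 16 * GIB) returns 0: no context size keeps the whole model on the GPU. Shrinking the context then only reduces how much spills into system RAM.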

leex279 · January 8, 2025, 9:22am

Hi @leander.starr26,

You can set the context size in the .env file. There are some examples at the end. Try it and report back whether it fixes the problem.
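The .env setting belongs to bolt.diy. Independently of bolt.diy, Ollama itself calls the context window num_ctx and accepts it as an option on each request to its REST API. That makes it easy to test a candidate value directly. The minimal sketch below assumes a local Ollama server on the default port 11434. The model tag and the context value in the usage note are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local Ollama address


def build_generate_request(model: str, prompt: str, num_ctx: int) -> dict:
    """Request body for Ollama's /api/generate with an explicit context window."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return a single JSON object instead of streamed chunks
        "options": {"num_ctx": num_ctx},
    }


def generate(model: str, prompt: str, num_ctx: int, base_url: str = OLLAMA_URL) -> str:
    """POST the request (urllib sends POST when data is set) and return the text."""
    body = json.dumps(build_generate_request(model, prompt, num_ctx)).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

For example, generate("qwen2.5-coder:14b", "Say hi", 8192) loads the model with an 8192-token window. Running ollama ps afterwards shows whether that size still fits on the GPU.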

See also: Contribution Guidelines - bolt.diy Docs