GPU is not being fully utilized and ollama/qwen2.5:32b is slow

This is due to the increased context size. I’ve just put in a PR to allow customising it through the env file, along with some recommended sizes for different amounts of VRAM.
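For a rough feel of why context size matters here: the KV cache grows linearly with the context length, and if the model plus cache no longer fits in VRAM, part of the work can end up on the CPU, which would match the low GPU utilization. Below is a back-of-the-envelope sketch, not the code from the PR. The model dimensions (64 layers, 8 KV heads, head dimension 128) and the fp16 cache are my assumptions for qwen2.5:32b, and `kv_cache_bytes` is just a hypothetical helper name, so check the values against the model card before relying on the numbers.

```python
def kv_cache_bytes(num_ctx, n_layers=64, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Estimate KV cache size in bytes for a given context length.

    The defaults are ASSUMED values for qwen2.5:32b with an fp16 cache;
    they are illustrative and not taken from the PR.
    """
    # Each token stores one key and one value vector (factor of 2)
    # per layer, per KV head.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * num_ctx


if __name__ == "__main__":
    for ctx in (2048, 8192, 32768):
        gib = kv_cache_bytes(ctx) / 1024**3
        print(f"context {ctx:>6}: ~{gib:.1f} GiB KV cache")
```

This only covers the cache, so the model weights and runtime overhead come on top of it. The takeaway is that picking a context size that leaves the whole model on the GPU is usually what restores the speed.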
