GPU is not being fully utilized and ollama/qwen2.5:32b is slow

bolto90 · November 18, 2024, 10:03pm

This is due to the increased context size. I've just put in a PR that allows customising it through the env file, with some recommended sizes for different amounts of VRAM.
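
For anyone wanting to try this before the PR lands, here is a rough sketch of the idea, not the PR's actual code. A larger context window needs a larger KV cache, so on a card with limited VRAM part of the model can end up running on the CPU, which would explain the low GPU utilisation. The sketch reads a context size from an environment variable and passes it to Ollama as the `num_ctx` option in the request body. The variable name `OLLAMA_CONTEXT_SIZE`, the fallback of 2048, and the helper names are all assumptions made up for illustration; check the PR for the real key and the recommended values.

```python
import json
import os

# Hypothetical env key: the name used by the PR is not stated in this thread.
ENV_KEY = "OLLAMA_CONTEXT_SIZE"
# Assumed fallback for illustration only, not a recommendation from the PR.
DEFAULT_NUM_CTX = 2048


def resolve_num_ctx(env=None):
    """Return the context size from the environment, or the fallback."""
    env = os.environ if env is None else env
    raw = env.get(ENV_KEY, "").strip()
    if not raw:
        return DEFAULT_NUM_CTX
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"{ENV_KEY} must be a positive integer, got {value}")
    return value


def build_generate_payload(model, prompt, env=None):
    """Build a request body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # num_ctx sets the context window size for this request.
        "options": {"num_ctx": resolve_num_ctx(env)},
    }


if __name__ == "__main__":
    # Only prints the payload; no request is sent.
    print(json.dumps(build_generate_payload("qwen2.5:32b", "Hello"), indent=2))
```

Setting the variable in the env file (or exporting it in the shell) and lowering it until `ollama ps` shows the model fully on the GPU is probably the quickest way to find a size that fits your card.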