I am trying to use llama3.3 and, whether I use the CLI or Open WebUI, it is very slow. My system specs are below. Does anyone know if there is anything I can do to speed it up? Other models like Mistral work fine.
Intel(R) Core™ Ultra 7 265KF, 3900 MHz, 20 Core(s), 20 Logical Processor(s)
64 GB of RAM
Nvidia GeForce RTX 4080 Super
Windows 11
Hi Dale,
Which exact Mistral models are you comparing it to? I'm asking because llama3.3 is a 70B model, which needs a lot of GPU power to run properly. I think the RTX 4080 is nowhere near enough.
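If it helps, you can check which models you have installed and how big each one is straight from the CLI (the sizes shown will depend on which tags you pulled):

    ollama list

The SIZE column makes the difference obvious: a 7B Mistral is around 4GB on disk, while llama3.3 70b is roughly ten times that.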
I assume you are using Ollama to run Llama3.3 70b.
Since the VRAM of the Nvidia GeForce RTX 4080 Super is 16GB and Llama3.3 70b is 43GB in size, the whole model doesn't fit in the VRAM of the RTX 4080 Super, so the overflow runs on the CPU. That's why it is slow. You can either use a quantized version of Llama 3.3 like 70b-instruct-q2_K [ollama run llama3.3:70b-instruct-q2_K], which is 26GB in size, or you can use another LLM such as:
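To confirm that the CPU spillover is the problem, you can pull the quantized tag and then check how Ollama split the model between GPU and CPU while it's loaded (the split you see will depend on your setup):

    ollama pull llama3.3:70b-instruct-q2_K
    ollama run llama3.3:70b-instruct-q2_K
    # in a second terminal, while the model is loaded:
    ollama ps

The PROCESSOR column in ollama ps shows a split like "40%/60% CPU/GPU"; any part of the model that lands on the CPU side slows generation down considerably. A model that fits entirely in VRAM will show "100% GPU" there.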
Thanks for this! llama3.3:70b-instruct-q2_K is significantly faster than the default Llama3.3 70b but still too slow for me. I will check around and review all the information you all provided. This is great, thank you!