New Qwen 2.5 Coder 32b Absolutely CRUSHING it!

ColeMedin · November 12, 2024, 7:34pm

Hey guys!

Just wanted to share quickly that Qwen 2.5 Coder 32b was released yesterday and this is now my favorite model already for working with oTToDev. The 7b parameter version of Qwen 2.5 Coder has been around for a while, but this larger version is brand new and totally crushing it! I’d highly recommend giving it a shot if your machine is capable!

I’ll be making content around it this week. Also I’ll pin this post for the week since honestly this is a pretty big deal considering how well it is performing!

Here is a link to the model in the Ollama library:

Arka · November 12, 2024, 7:39pm

is there a way to use 32b with hyperbolic.xyz api?
my pc almost blew up with this model

dsmflow · November 12, 2024, 8:56pm

Huggingface is hosting their demo. Qwen 2.5 Reference

chidinweke · November 12, 2024, 9:53pm

Running a large language model like Qwen 2.5-Coder 32B on a laptop is likely to be challenging due to the significant computational and memory requirements of such models.

Arka · November 12, 2024, 10:51pm

i run it on pc with rtx 4070, it’s just that i have 32gb ram
ram was used at 100%, gpu at 15%

chidinweke · November 12, 2024, 10:59pm

The RTX 4070 has 12GB of dedicated VRAM. While this is substantial, loading and running a 32B model would likely require more VRAM than is available. Even if you could fit the model into VRAM, the performance would likely be very slow. While the RTX 4070 and 16GB of RAM provide some capability for running large models, the combination of insufficient VRAM, limited system RAM, and the processor’s limitations make it impractical to run Qwen 2.5-Coder 32B on this setup. For practical use, you would likely need a more powerful system with at least 32GB of system RAM and a GPU with more VRAM, such as an RTX 3090 or RTX 4090. Alternatively, you could consider using cloud-based services that provide access to powerful GPUs and the necessary infrastructure to run such large models.

Arka · November 12, 2024, 11:20pm

yep, i just used hyperbolic, although it didn’t work before (just didn’t show available models)
but it’s so lunatic sometimes…

spamming the text like a stutterer

mahoney · November 13, 2024, 3:10am

qwen coder 32b is a stretch for my M1 Max, so I decided to mess around with the 14b and 3b instruct models instead. According to the link @dsmflow posted, the instruct models compete fairly well against the larger base ones. Jury’s out if that’s the case, but doing some testing against oTToDev this week to find out.

wilson · November 13, 2024, 6:20am

Interesting. I’ve heard mixed reviews, but I’m glad to hear some more positive feedback!

ColeMedin · November 13, 2024, 4:19pm

@chidinweke True! There is a 14b parameter version as well, plus you can use Qwen 2.5 Coder 32b through OpenRouter!

ColeMedin · November 13, 2024, 4:19pm

Woahh that is wild, I haven’t seen anything like this before!

chidinweke · November 14, 2024, 12:23pm

@ColeMedin OpenRouter or Deepseek which would you recommend?

Arka · November 14, 2024, 12:33pm

yeah, it’s still do that idk why
2-3 promts is okay, but then…of-of-of-of-of-of-of-of-of-of-of-of-of
maybe the problem on hyperbolic side, i used their api

upd.
it was hyperbolic
i changed base to DeepInfra and it works perfectly
15 prompts so far, there was a lot of errors that everyone is talking about (problems with loading previews, etc), but qwen fixed them all

ColeMedin · November 14, 2024, 2:21pm

OpenRouter will choose the cheapest API under the hood so I would suggest using that!

ColeMedin · November 14, 2024, 2:21pm

Thanks for the update, I’m glad the switch fixed it!

artis.leilands · November 16, 2024, 8:56pm

Hey, how did you get to choose the provider? I tried setting it as staticModels in constants, but it doesn’t show in the list if the provider is any other than OpenRouter.
Seems weird that Hyperbolic would give worse results than other providers because DeepInfra and Fireworks have 33K context and 4K max output but Hyperbolic has 128K context and 8K output. The difference should be significant. Maybe it’s one of the next features of oTToDev – an interface to tune the model parameters and choose the provider?

Arka · November 16, 2024, 10:10pm

i didn’t edit staticModels or constants
i just changed url in .env and api key in .env and it’s just work

the only provider i can’t use is Google AI Studio for Gemini Experimental 1114
all others works fine

dinopio · November 17, 2024, 10:31pm

i got 2x 3090 too with distribution on both but when loading its much larger than the expected size over ollama? how did you tune it to work with 32b?

ColeMedin · November 18, 2024, 1:07am

@dinopio Did you install the default version that Ollama gives?

The default model is actually Q4 quantization so it’s not the actual full size of the model but still performs almost as well. I would make sure you are using that!