Can you confirm that I don’t need to open up and save a new Modelfile for the 32b model, as mentioned in the readme, in order to use its full capacity? Or is the 32b version also limited by default in Ollama?
Can confirm that it is not necessary to manually create a model with a larger context window (as previously described). I believe there was a PR that addressed this.
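For reference, the manual step that should no longer be needed was creating a custom Ollama model from a Modelfile with a larger num_ctx. A minimal sketch (the model tag, custom name, and context value below are just example values, not taken from the readme):

FROM qwen2.5-coder:32b
# Raise the context window; 32768 is an example value, pick what fits your VRAM
PARAMETER num_ctx 32768

Then build it with ollama create qwen2.5-coder-32k -f Modelfile and select that model in the UI.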
For anyone who is interested: I successfully tested running oTToDev using qwen2.5-coder:32b on my Windows 10 system with an RTX 4070 Ti Super (16 GB VRAM) and a Ryzen 7 5700 with 32 GB of system RAM. It wasn’t quick; “Build a todo app in React using Tailwind” took just over 13 minutes.
At its peak it used:
Dedicated GPU memory: 14.3 GB
Shared GPU memory: 11.6 GB
It must have done a fair bit of shuffling between VRAM and Shared GPU RAM because the GPU utilization never got higher than 12%, while the CPU utilization sat at around 95% for most of the time.
I am just stoked that it works. I didn’t have much luck with the smaller models, so I’m glad that I now have a working solution. (Might have to save for an xx90 with 24 GB of VRAM so that it runs in real time.)
Yes, it’s this one, but when the UI loads it, it grows! That’s my issue.
Here is my bug report: Ollama custom Modelfile is listed in the models but reloads it with larger token value · Issue #313 · coleam00/bolt.new-any-llm · GitHub
Even without the custom Modelfile, it still loads the model at double the size.
I’ve just put in a pull request to allow easy changing of the context size, which helps reduce the VRAM requirements.
I applied the changes locally to test and they work great! Nice work!
Sorry if it’s a dumb question, but could you please guide me and tell me what exactly you changed in the .env (or .env.local) to choose specifically Hyperbolic or DeepInfra? My .env.local only contains the OPEN_ROUTER_API_KEY, but that alone doesn’t determine which provider will be used, does it? As I understand it, to choose an alternative provider you need to use dynamic routing, but I can’t seem to figure out how to do it in oTToDev. Thanks in advance!
Hi,
my file name is .env.
I changed these lines (if you don’t have these lines, you should re-download the repository from GitHub; there will be a file .env.example, which you need to rename to .env or .env.local):
OPENAI_LIKE_API_BASE_URL=https://api.deepinfra.com/v1/openai
OPENAI_LIKE_API_KEY=<your DeepInfra API key here>
For Hyperbolic:
OPENAI_LIKE_API_BASE_URL=https://api.hyperbolic.xyz/v1
OPENAI_LIKE_API_KEY=<your Hyperbolic API key here>
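If you want to sanity-check that the base URL and key work before starting oTToDev, here is a minimal standalone sketch (not oTToDev code; it just assumes the provider exposes the standard OpenAI-compatible /models endpoint, which both DeepInfra and Hyperbolic advertise, and that the same OPENAI_LIKE_* variables are exported in your shell):

// check-provider.ts: standalone sanity check, not part of oTToDev
const baseUrl = process.env.OPENAI_LIKE_API_BASE_URL; // e.g. https://api.deepinfra.com/v1/openai
const apiKey = process.env.OPENAI_LIKE_API_KEY;

async function main() {
  if (!baseUrl || !apiKey) {
    throw new Error('Set OPENAI_LIKE_API_BASE_URL and OPENAI_LIKE_API_KEY first');
  }
  // List the models the provider exposes via its OpenAI-compatible API
  const res = await fetch(`${baseUrl}/models`, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  console.log(res.status, await res.json()); // 200 plus a model list means the URL and key are good
}

main().catch(console.error);

Run it with something like npx tsx check-provider.ts on Node 18+; if it prints 200 and a list of models, the OpenAI-like provider in oTToDev should be able to reach the same endpoint.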
Currently I’m using Hyperbolic, because they fixed the Qwen stutters.
P.S. Qwen 32b is amazing. It fixed an error that Claude made and couldn’t fix.
Just checked this out on OpenRouter. It’s cheaper than DeepSeek.
DeepSeek: $0.35/M tokens
Qwen Instruct 32B via OpenRouter: $0.18/M tokens
Just a tad more than half the price.
Yeah the pricing is one of the reasons I love the model so much! For those who can’t run it with Ollama it’s still super affordable.
It works, thank you!!!
It’s acting weird though. I tried setting up a project with Vite and shadcn, and the process was riddled with errors and incorrect structure-setup scripts. Same with Next + Tailwind: the same files are rebuilt over and over again, everything is reinstalled after each prompt, and errors are present here as well. It doesn’t act very smart, to be honest… is it the model or my system somehow?
I think it’s still the Hyperbolic stutters. I’m not sure why, but I had those problems too, in oTToDev and in other coders: it rewrote the same code many times in a row, creating the same files, stuttering weird words, and just consuming tokens.
But then the problem just disappeared; I didn’t do anything.
Try DeepInfra. It has lower output, but maybe it will be enough for your needs.
So is all this hype around the model about general use outside of oTToDev? Can anyone show or share something awesome that has been built with oTToDev and Qwen Instruct 32B?
There’s hype both within and outside of oTToDev, @nickmartin! Generally, smaller models like Qwen-2.5-Coder-32b aren’t going to be strong enough to build huge apps, but it certainly is enough to help you iterate on a proof of concept, like I showed in my video on Qwen!