Howdy good folks.
I’m wondering if we, together, can optimize Qwen2.5-Coder:32b by removing programming languages we generally do not need and, in turn, make the model fit below 7.5GiB. This should make it possible to run on normal GPUs like the 3070 and up. On my laptop with a 3070 and 32GiB of RAM, I’ve seen reasonable performance with models up to 11-12GiB. Over that, it becomes painfully slow.
The reason I’m focusing on Qwen2.5-Coder is that a lot of people are praising it.
I personally don’t have any experience with tuning models like this, but I’m hopeful that somebody here can help.
If GPU power is necessary, perhaps I can set up a Novita.ai instance and share the credentials for it. This way we can create a model that is really optimized for oTToDev. Perhaps it can be named oTToDev2.5-Coder.
Thoughts?
I am not knowledgeable about tuning models either, but I feel a bit reluctant to limit a model to a particular language.
My limited reasoning:
- I doubt that there is a “language” in the model. It’s a model trained on programming. Some, if not many, structures in imperative languages are alike. A model is an associative network in which there are no partitions for particular languages. Thus, I doubt you could even strip out Java, for example.
- Even if this were possible, I am not sure we’d know which languages are relevant. I also use the tool for Java code; there’s definitely JavaScript, TypeScript, CSS, some frameworks like React, Vue, Angular, and of course – cough – Svelte. Then, for data science, there is Python of course, and with WASM making a step into the browser, we must not forget the Rust army :wink:
Therefore, I’d rather wait for somebody to publish a “Vue with TypeScript and Vite” or a “Svelte with JavaScript and JSDoc” model than try to modify an existing one.
Just my $0.02
Thanks for your feedback @mrsimpson
According to the documentation on ollama.com, Qwen2.5-Coder 32B delivers excellent performance across more than 40 programming languages (qwen2.5-coder).
If it’s possible to strip out the ones we know we won’t have any use for, perhaps it’s also possible to make the model smaller. I really need input from people with knowledge in this area.
AFAIK, bolt.new and oTToDev focus mostly on creating web apps. This can, of course, be extended in the future, but for now we should at least support JS and TS and most of the frameworks.
Other than that, I know very little.
That is not how it works, but it would be nice to have a smaller one. You can try pruning, but then you would need additional fine-tuning.
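For illustration, unstructured magnitude pruning in PyTorch looks roughly like the sketch below (the model id and the 30% ratio are just placeholders, not a tested recipe). Note that it only zeroes out weights rather than removing a “language”, and the zeros don’t save memory unless you also move to a sparse or quantized format:

```python
# Minimal sketch of unstructured magnitude pruning; illustrative only,
# loading a 32B model in fp16 like this needs a lot of memory.
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-32B-Instruct",  # assumed HF model id
    torch_dtype=torch.float16,
)

# Zero out the 30% smallest-magnitude weights in every linear layer.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# Quality usually drops here, which is why additional fine-tuning is needed.
model.save_pretrained("qwen2.5-coder-32b-pruned")
```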
You could fine-tune multiple models for different tools, maybe things like Tailwind, etc. Then it may be possible to have different expert models work on each part.
There already are smaller versions than Qwen2.5-coder:32b. There is a 14b and a 7b, but sadly they don’t have the ‘critical mass’ to adequately drive oTToDev at present. Maybe tuning the prompt will help.
I had thought about that a bit too, but that’s not generally how LLMs work. Generally they are defined by the number of training tokens in billions. My thought was: could you take an LLM dataset, vectorize it, and remove duplicate data in vector space… basically generalizing the model, which would reduce the size.
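If it helps, near-duplicate removal in embedding space would look something like this rough sketch (the embedding model and the 0.95 threshold are arbitrary assumptions); note that it shrinks the training data, not an already-trained model:

```python
# Sketch: drop near-duplicate training examples by cosine similarity of embeddings.
from sentence_transformers import SentenceTransformer
import numpy as np

texts = [
    "def add(a, b): return a + b",
    "def add(x, y): return x + y",      # near-duplicate of the first
    "SELECT * FROM users WHERE id = 1",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
emb = model.encode(texts, normalize_embeddings=True)

keep = []
for i, e in enumerate(emb):
    # Keep an example only if it is not too similar to anything already kept.
    if all(np.dot(e, emb[j]) < 0.95 for j in keep):
        keep.append(i)

deduped = [texts[i] for i in keep]
print(deduped)
```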
There are methods people use to deduplicate data in LLMs, though, and of course there is quantization. 4-bit quantization of larger models has shown some impressive results (in some cases <5% loss in accuracy) at a lot less memory.
However, many APIs still don’t support this.
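For local experiments, loading the model in 4-bit with transformers and bitsandbytes looks roughly like this (the model id and quantization settings are assumptions on my part):

```python
# Sketch: load Qwen2.5-Coder in 4-bit (NF4) via transformers + bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-32B-Instruct",  # assumed HF model id
    quantization_config=bnb_config,
    device_map="auto",  # spread across available GPUs/CPU
)
```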
Agreed. In my experience you need at least a 32B-parameter model to be useful, and even 4-bit quantized that still requires something like 40GB of VRAM. And the 70+B models are better.
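For a rough sense of scale, here is a weights-only back-of-the-envelope estimate; it ignores the KV cache, activations, and runtime overhead, so real usage is higher:

```python
# Weights-only VRAM estimate: parameters * bits per weight / 8, in GiB.
def weight_gib(params_billion, bits_per_weight):
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

print(f"32B @ 16-bit ~ {weight_gib(32, 16):.1f} GiB")  # ~59.6 GiB
print(f"32B @ 4-bit  ~ {weight_gib(32, 4):.1f} GiB")   # ~14.9 GiB
print(f"70B @ 4-bit  ~ {weight_gib(70, 4):.1f} GiB")   # ~32.6 GiB
```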
You cannot remove specific knowledge from an LLM; it’s trained as a whole. You can prune the model, but there is no such thing as selective pruning that keeps just one language. It also doesn’t matter what the dataset size is: model size depends on the number of parameters, not the dataset size, so you can train a 3B model with the same amount of training data you’d use for a 70B model. It’s just a matter of how much quality data the dataset has and how much of that knowledge the parameters can capture.
There is another thing: a model’s coding ability doesn’t depend only on the training it received on one particular language, but also on the other languages. It can generalize concepts it has seen in one language and apply them in another, so keeping only one language in the training would only hurt a model trained from scratch.
And all of this is about training from scratch…
But yes, you can use fine-tuning on specific data; the model size will remain the same, though.
If you are interested in optimizing Qwen for bolt.diy, maybe something like Unsloth would be a good route, maybe with ORPO. I have started a dataset for training using their notebook. I was going to give it a go after I grew the dataset some, as we would need more data for fine-tuning; I was going to start with the 14B version, though. If anyone is interested in talking about this more, I am open:
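As a starting point, a minimal QLoRA sketch along the lines of the Unsloth notebooks might look like this; the model id, dataset file, and hyperparameters are placeholders rather than tested settings:

```python
# Sketch of a 4-bit LoRA fine-tune with Unsloth; all names/values are placeholders.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-Coder-14B-Instruct-bnb-4bit",  # assumed model id
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of extra weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical dataset file with a "text" field per example.
dataset = load_dataset("json", data_files="bolt_diy_examples.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=100,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```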