Just wanted to share something I thought was pretty cool. QwQ-Preview seemed pretty decent, but it would not interact with the Bolt.diy Terminal when I originally tested it. I was curious whether an Instruct model would solve this, and it did, but I only found two available, and the interesting thing is how small they are: 3B and 7B. HuggingFace also had GGUF quantized versions (but not through their API, so running locally was really the only option), so I brought them into LM Studio to give them a go.
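If anyone wants to poke at these outside of Bolt.diy, here's a rough sketch of hitting a model loaded in LM Studio through its OpenAI-compatible local server (the endpoint below is the LM Studio default; the model identifier is just a placeholder for whatever name LM Studio shows for the model you loaded):

```python
# Minimal sketch: querying a GGUF model loaded in LM Studio via its
# OpenAI-compatible local server (default http://localhost:1234/v1).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="qwen2.5-coder-7b-instruct",  # placeholder identifier, match your loaded model
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Build a simple blog using Astro."},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```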
Models:
Both were able to interact with the terminal, but the 3B was a bit hit and miss. The 7B model used only 1.4GB of RAM and one-shotted the “Build a simple blog using Astro” prompt. It obviously isn’t perfect and none of the images load, but for the size it’s promising. Maybe with RAG (arguably I’m using the QwQ model wrong anyway, since it’s meant to pass its output back to itself to simulate reasoning, which is sort of a RAG-like loop), or by wiring it into a workflow (n8n, Flowise, etc.), these models really do show some potential.
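That “pass the output back to itself” idea can be approximated in a plain workflow loop, not how QwQ does it internally, just a self-critique pass you could drop into n8n/Flowise or a script. Same placeholder endpoint and model name as above:

```python
# Rough sketch of a self-feedback loop: run the model, append its answer,
# and ask it to critique/refine its own output for a few rounds.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
MODEL = "qwen2.5-coder-7b-instruct"  # placeholder identifier

messages = [{"role": "user", "content": "Build a simple blog using Astro."}]
answer = ""

for _ in range(3):  # a few refinement passes
    reply = client.chat.completions.create(model=MODEL, messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    messages.append({
        "role": "user",
        "content": "Review your previous answer, point out any mistakes, and output an improved version.",
    })

print(answer)  # final refined answer
```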
And I’m curious to see how well QwQ-Preview (32B) does once there’s a quantized Instruct version of it. It might just outperform the other “Coder” models right out of the box, and if it’s used as intended, lol, who knows?