Claude 3.5 Sonnet is still the best for coding right now, only to be trumped by o1 once that becomes available through the API for the general public. Though o1 certainly isn't as fast!
Qwen 2.5 Coder 32b and DeepSeek Coder v2 236b are my favorite open source LLMs for coding. You definitely need beefy hardware to run them yourself, but you can use a provider like OpenRouter to use them through an API for dirt cheap! WAY cheaper than using Claude.
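If it helps anyone, here's a rough sketch of hitting one of those models through OpenRouter's OpenAI-compatible chat endpoint from Python. The model slug is an assumption (check OpenRouter's model list for the exact identifier):

```python
import os
import requests

# Minimal sketch: call Qwen 2.5 Coder through OpenRouter's
# OpenAI-compatible chat completions endpoint.
# NOTE: the model slug below is an assumption -- verify it against
# the OpenRouter model list before relying on it.
API_KEY = os.environ["OPENROUTER_API_KEY"]

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "qwen/qwen-2.5-coder-32b-instruct",  # assumed slug
        "messages": [
            {"role": "user", "content": "Write a package.json for a basic Express app."}
        ],
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```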
Hello, it’s really great to be here. I would like to know if you could direct me to a list of coding LLMs and a tutorial on how to add them to Bolt.diy? That would be really helpful. Thank you in advance.
For online ones you just select from the list and add your API key(s). For local ones you can use Ollama or LM Studio. Do a Google search for each if you need instructions, but it’s dead simple. There are also lots of YT vids.
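For the local route, once Ollama is running it exposes a REST API on localhost:11434 that Bolt.diy (or anything else) can point at. A minimal sketch, assuming you've already pulled a coder model (the tag qwen2.5-coder:7b here is just an example):

```python
import requests

# Minimal sketch: chat with a locally running Ollama server.
# Assumes a coder model has already been pulled (the tag below is an example)
# and the Ollama daemon is listening on its default port.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen2.5-coder:7b",  # example tag, swap for whatever you pulled
        "messages": [
            {"role": "user", "content": "Scaffold a minimal Vite + React project."}
        ],
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=600,
)
response.raise_for_status()
print(response.json()["message"]["content"])
```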
phi-4 is good - but it only has a 16k context window (how much the model can read and write in one pass). Context is crucial for keeping the model aware of all the dependencies and the code. I am starting to have some good success with the new Gemini Flash with 1M context myself. I used to be on that team - so I am a little biased :)
I feel like I should be asking this question every day. What is the best locally hosted model to use? I am setting up a server with 96 threads and 6 TB of persistent memory to run any size LLM and let it churn at its own speed. Any suggestions for the LLM to power bolt.diy?
I downloaded DeepSeek R1 14B and was quite surprised at how bad it was. Not sure if I need to configure further, but out of the box it was unable to even make a working package.json file.
Also, responses to follow-up prompts were almost silly. Like taking a library out in one response, then putting it back in the next with a comment that it was to be removed.
The LLM requires VRAM (GPU) to run. I have managed to run very small local models like Qwen2.5-Coder 0.5B on a laptop, but it was insanely slow. Even my desktop gaming PC with a 2080 was slow. You need a lot of VRAM and a decent CPU to crunch the numbers.
I saw this a couple of days ago and it gives you enough details to understand
Take a look at your system specs. Also be aware of your context length. Both can keep the LLM from delivering a suitable outcome. Make sure you have the Use Context Optimization switch turned on too.
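On the context length point: if you serve the model through Ollama, the default context window is fairly small, and you can request a bigger one per call via the num_ctx option (within whatever the model and your VRAM actually support). A rough sketch, reusing the example tag from above:

```python
import requests

# Minimal sketch: ask Ollama for a larger context window on a single request.
# num_ctx is limited by what the model supports and by available memory,
# so treat the value below as an example, not a recommendation.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5-coder:7b",    # example tag
        "prompt": "Summarize the dependencies in this package.json: ...",
        "options": {"num_ctx": 16384},  # request a 16k-token context window
        "stream": False,
    },
    timeout=600,
)
response.raise_for_status()
print(response.json()["response"])
```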
I have a PowerEdge server and I am hooking up 3 GPU expansion chassis to it. Looking at trying out V100 32 GB and AMD Instinct cards at this point. Been picking up used GV100s whenever I find them under a grand. 3 x 8 x 32 GB = 768 GB when full is the goal, until the higher-memory cards drop in price. Almost 6 TB of persistent memory, so still looking for the best model…
I keep getting rate limited with Claude. I am even debating picking up Perplexity if I can get the 300 Claude calls through them. Maybe a team setup…
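One thing that helps with rate limits while you sort out a provider: wrap your calls in a simple retry with exponential backoff so a 429 just slows the run down instead of killing it. A generic sketch, not tied to any particular client library:

```python
import random
import time

def call_with_backoff(make_request, retry_on=(Exception,), max_retries=5):
    """Retry `make_request` with exponential backoff when it raises `retry_on`.

    `make_request` is any zero-argument callable that performs the API call and
    raises an exception (e.g. on an HTTP 429) when it gets rate limited.
    """
    for attempt in range(max_retries):
        try:
            return make_request()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            # 1s, 2s, 4s, 8s, ... plus jitter so parallel calls don't sync up
            time.sleep((2 ** attempt) + random.uniform(0, 1))
```

In practice you would pass the specific rate-limit exception from whatever client you are using as `retry_on`, so real errors still fail fast instead of being retried.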