- Intro
- I, like many others, am not an expert in models, parameters, or quantization. I’d like some simple advice on best practices for model selection.
- Gold Standard
- Is Anthropic Claude 3.5 Sonnet (new) always going to give the best results, i.e. the gold standard?
- Is there anything else which is better or similar? From other commercial providers?
- What is the best OpenAI model to use, and how does it compare to the gold standard?
- Criteria
- For everything below, how does it compare to the gold standard in terms of:
- quality of results
- cost
- speed
- Bolt.new
- How does oTToDev with Claude 3.5 Sonnet compare to Bolt.new’s official product? (Remember: in terms of results, cost, and speed.)
- Free (Local and Online)
- Which free option will give the best results, and how does it compare to the gold standard?
- What is the best free local model, both if you have lots of GPU power and if you have less?
- What is the best free online model, e.g. via OpenRouter?
- Budget Online
- What is the best online option that is not free but much less expensive than the gold standard?
- How does OpenRouter compare to, say, hosting your own model with something like Novita?
- Uncensored
- Is it relevant at all to Bolt development whether a model is uncensored? For example, would it let you create an AI girlfriend site where other models wouldn’t?
Gold Standard & Comparison:
- Claude 3.5 Sonnet represents one of the current leading models in terms of reasoning, analysis, and general capabilities
- GPT-4 Turbo is comparable in many aspects, sometimes performing better on certain technical/coding tasks
- Both are considered top-tier, with different strengths and slight variations in performance
For development specifically:
- Gold Standard Options:
- Claude 3.5 Sonnet: Excellent reasoning, strong coding abilities, well-suited for complex tasks
- GPT-4 Turbo: Very strong technical capabilities, good for coding
- Results: Both excellent
- Cost: Both premium pricing
- Speed: Claude 3.5 tends to be faster than GPT-4
- Free Options (Local):
- Qwen-2.5-Coder-32b: Best free local option if you have significant GPU power
- Mistral or DeepSeek: Good for limited GPU resources
- Results: roughly 60-80% of “gold standard” capability (see the endpoint sketch at the end of this answer for wiring these up)
- oTToDev vs Bolt.new with Claude 3.5 Sonnet:
You’ll get fairly similar results, though the commercial Bolt.new includes some prompt additions that make it perform a bit better at times.
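To make the free-local vs. budget-online comparison concrete, here is a minimal sketch (not oTToDev’s actual implementation) assuming Node 18+ for the built-in fetch, an Ollama install for the local case, and an OpenRouter account for the hosted case. The model names and the OPENROUTER_API_KEY variable are assumptions to verify against each provider’s docs.

```typescript
// Minimal sketch of an OpenAI-compatible chat request. The same call shape
// works against a local Ollama server (free, your GPU) or a hosted router
// like OpenRouter (pay per token, or a free-tier model); only the base URL,
// API key, and model name change. Model names and the env var below are
// assumptions; check each provider's docs for the exact identifiers.

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

async function chat(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
  apiKey?: string,
): Promise<string> {
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Local servers usually don't need a key; hosted ones do.
      ...(apiKey ? { Authorization: `Bearer ${apiKey}` } : {}),
    },
    body: JSON.stringify({ model, messages }),
  });
  if (!res.ok) {
    throw new Error(`Request failed: ${res.status} ${await res.text()}`);
  }
  const data = await res.json();
  return data.choices[0].message.content;
}

async function main() {
  const messages: ChatMessage[] = [
    { role: "user", content: "Write a TypeScript debounce helper." },
  ];

  // Free local option: Qwen2.5-Coder served by Ollama on its default port
  // (assumes you have already pulled the model, e.g. qwen2.5-coder:32b).
  const local = await chat("http://localhost:11434/v1", "qwen2.5-coder:32b", messages);

  // Budget online option: the same request against OpenRouter. The model
  // slug is an assumption; check OpenRouter's model list for the exact name.
  const hosted = await chat(
    "https://openrouter.ai/api/v1",
    "qwen/qwen-2.5-coder-32b-instruct",
    messages,
    process.env.OPENROUTER_API_KEY,
  );

  console.log({ local, hosted });
}

main().catch(console.error);
```

The practical takeaway is that switching between a free local model and a cheap hosted one is mostly a configuration change (base URL, key, model name), so it is easy to try Qwen2.5-Coder locally first and fall back to OpenRouter when your GPU is the bottleneck.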