Regarding hardware specs of your computer

Hi,

Can’t recall if this was mentioned in Cole’s video or someone else’s, but I remember that ollama’s qwen2.5-coder 32b model has almost the same (or slightly better) capability as ChatGPT. That’s just one example.
The question is, if your computer cannot handle a 32b model (for example), what’s the point of running oTToDev locally?
If you run locally, you want to generate code with the same quality as ChatGPT or other paid tools such as bolt.new.
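
For reference, by running locally I mean pointing at ollama’s local REST API, roughly like the sketch below (assuming the model tag has already been pulled; on my machine I’d have to drop down to a 7b tag):

```ts
// Minimal sketch: asking a local ollama server for a completion.
// Assumes ollama is running on its default port (11434) and that the
// model tag has already been pulled with `ollama pull qwen2.5-coder:32b`.
async function askLocalModel(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen2.5-coder:32b", // I'd have to use a 7b tag on my XPS 17
      prompt,
      stream: false, // one JSON object back instead of a stream
    }),
  });
  const data = await res.json();
  return data.response; // the generated text
}
```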

And to buy or assemble a computer with such specs, you’d need at least USD 5,000.

I use a Dell XPS 17 (9730), which has fairly good specs, but even a 7b model runs with some latency.

Please correct me if I’m missing anything here.


Hey @askar, great question!

Qwen-2.5-coder-32b is a fantastic open source LLM and does indeed have similar capabilities to GPT a lot of the time.

It’s a fair question to ask what the point of oTToDev is, though, if you can’t run those kinds of models. I wouldn’t say it’s $5k to build a computer that can run a 32b parameter model, but it’s definitely at least $2k-$3k, since you’d want something like a 3090 GPU.

The beauty of oTToDev though is you don’t have to run the models yourself to use open source models. Providers like OpenRouter give you access to these models for VERY cheap compared to the large closed source models like Claude 3.5 Sonnet and GPT-4o.
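
To make that concrete: OpenRouter speaks the standard OpenAI-style chat completions API, so wiring it up looks something like this rough sketch (the exact model slug may differ from what OpenRouter currently lists):

```ts
// Rough sketch: calling an open source model through OpenRouter's
// OpenAI-compatible chat completions endpoint. Assumes your key is in
// OPENROUTER_API_KEY; the model slug below may differ from the current catalog.
async function askOpenRouter(prompt: string): Promise<string> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "qwen/qwen-2.5-coder-32b-instruct",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Swap in whatever open source model you want and the rest stays the same.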

And the goal with oTToDev is to have something fully built up for when open source models catch up, because at that point it’ll be a heyday for all of us building up this project.


Hi Cole,

Thanks for the reply.
My apologies if I’m mistaken, but do you mean that in oTToDev one can use multiple open-source models via OpenRouter?

Also, it’d be great if you created a pinned post with recommended hardware specs (or the ones you currently use).
It’s surely worth investing $2k-$3k to run AI models locally, free of charge.

Thanks again!


Regarding OpenRouter, I get it now. Looks interesting.


Yeah OpenRouter is great!

I am actually planning a video diving into hardware requirements for different models and then I would love to create a pinned post along with the video. Still a good amount of research to do to cover different model sizes!


Sounds awesome. Thanks!


I’ve tested a lot of local models, and I can’t run much on my local machine. I do have a custom-built AI machine with 48GB of VRAM and 192GB of RAM, but that’s for another project and only partially up.

I’ve found that most smaller models that can run on a CPU or lower resources don’t really write code well and hallucinate a lot. So, you basically need ~40GB of VRAM to run anything useful, and that’s with 4-bit quantization…
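
A quick back-of-the-envelope estimate lines up with that. Weights dominate VRAM, plus some overhead for KV cache and activations; the 20% figure below is my rough assumption:

```ts
// Rule of thumb: VRAM ≈ params × (bits per weight / 8), plus overhead.
// The 20% overhead for KV cache and activations is a rough assumption;
// real usage varies with context length and runtime.
function estimateVramGB(paramsBillions: number, bitsPerWeight: number): number {
  const weightsGB = paramsBillions * (bitsPerWeight / 8);
  return weightsGB * 1.2;
}

console.log(estimateVramGB(32, 4).toFixed(1)); // ~19.2 GB: tight on a 24GB card
console.log(estimateVramGB(72, 4).toFixed(1)); // ~43.2 GB: the ~40GB territory
```

So a 32b model at 4-bit just squeezes onto a single 24GB card, while the 70b-class models land in that ~40GB range.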

So, I’ve personally been using HuggingFace, which is a free option… but then due to token caps, I started using OpenRouter, which seems to be the best value, as it’s more of a marketplace for compute. You can get both Qwen2.5 and Llama 3.3 models for under 40¢/MTok, which is fantastic.

And while I sometimes run Bolt.New/oTToDev locally for testing, I mostly just run it on Cloudflare Pages, so between that and using an API, it uses very little system resources.

I’d suggest signing up for a HuggingFace account, creating an API key, and using that for free for a while. Try out Llama 3.3, Qwen2.5-Coder-32B, and Qwen2.5-72B-Instruct. They seem pretty good, but with limitations, of course.
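
The serverless Inference API is just a POST with your token, something like this sketch (double-check the exact repo ID on the model page, I’m going from memory):

```ts
// Sketch of HuggingFace's serverless Inference API: one POST with your token.
// Assumes the token is in HF_TOKEN; free-tier rate limits and token caps apply.
async function askHuggingFace(prompt: string): Promise<string> {
  const res = await fetch(
    "https://api-inference.huggingface.co/models/Qwen/Qwen2.5-Coder-32B-Instruct",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.HF_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ inputs: prompt }),
    }
  );
  const data = await res.json();
  return data[0].generated_text; // text-generation results come back as an array
}
```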

Hope that helps!


I built one for $800, but I got creative by using an older off-lease Dell T5500 (a server-ish workstation desktop). It supports two physical Xeon sockets and 192GB of ECC memory (definitely not the fastest), and I used two Tesla M40s (24GB VRAM each). I added a 20TB HDD and a 2TB SSD (it doesn’t support NVMe). And I already had the PC, but I did factor that into the cost (it’s around $100 on eBay, and 80GB of RAM was about $20), so…


@aliasfox Thanks for the information. I’ll try HuggingFace.

Good luck. If you have issues, you know where to come!
