My 2x 3090 GPU LLM Build

Thank you for detailing this!

I actually can’t run it fast at all in Bolt. It runs okay in a VS Code plugin such as Cody or Cline, but with Bolt it’s just a no-go. I guess I must be bottlenecked by my single 16GB stick of RAM, or maybe I’m using the wrong model.
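For what it’s worth, I believe you can check whether Ollama is spilling the model into system RAM with:

ollama ps

If the PROCESSOR column shows a CPU/GPU split instead of 100% GPU, part of the model is running on the CPU, which would explain the slowdown.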

I’ve been advised to use this version on the Ollama Discord (not specifically for Bolt, but because it’s apparently the most optimized at this size).

I followed your video on converting a model for oTToDev, which I did with Qwen 2.5 Coder 32b, but it fails when I use it in bolt.diy.
I just downloaded the model from Ollama and ran the following command:
ollama create -f Qwen2.5Coder qwen2.5-coder-ottodev:32b-base-q6_K
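For reference, as far as I understand, the Modelfile from the video is basically just the base model plus a larger context window (bolt.diy needs a big context). A rough sketch of mine, with the exact num_ctx value being an assumption:

# Rough sketch of the Modelfile (saved as Qwen2.5Coder); num_ctx is an assumption
FROM qwen2.5-coder:32b-base-q6_K
PARAMETER num_ctx 32768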

Would you recommend using the Q4_1 version?

Thanks a lot for taking the time! 🙂


Yeah, the Q6 version is definitely going to require beefier hardware! I generally just use the Q4 version (it’s the default tag you get in Ollama when you pull Qwen 2.5 Coder 32b) since it fits on a single graphics card with 24GB of VRAM like the 3090.
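As a rough sizing sketch (Python; the parameter count and bits-per-weight figures are approximations for the K-quants, and this ignores KV cache and other runtime overhead):

# Back-of-envelope VRAM estimate: weights only, no KV cache or runtime overhead
def approx_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8

print(approx_vram_gb(32.5, 4.85))  # Q4_K_M: ~20 GB -> squeezes onto one 24GB 3090
print(approx_vram_gb(32.5, 6.56))  # Q6_K:   ~27 GB -> spills past a single 24GB card

That’s why Q6 ends up split across GPUs (or worse, partly on the CPU) while Q4 runs comfortably on one 3090.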

Thanks for the share!

I am about to build one for myself, but I’m coming from an Apple Mac Studio Ultra and can’t really decide between a Ryzen 9 and an Intel i9. Usage: local LLMs, OrbStack, Obsidian, UTM or VMware Fusion.

Would you mind sharing why you chose the Ryzen 9 7950X3D over the Intel i9-14900? (2x RTX 3090 24GB)


I’d just be cautious with Intel right now because their 13th/14th Gen CPUs have had some stability issues and require a microcode BIOS update from the motherboard vendor. Hopefully anything you get at this point would already have it applied, but just be aware.

AMD had some issues with USB firmware last Gen, but I think that has been resolved.

And I wasn’t sure which to pick either… With a little research, the points people make are: Intel has cooling issues, is geared specifically towards productivity workloads, and the 14900 is the last chip supported by its socket (LGA 1700), so upgradeability is limited. And the Intel one is only a little cheaper.

I’d probably go with the consensus of the Internet, AMD, but idk. Do your own research and use your best judgement.

Make sure to get a fast NVMe drive and a decent amount of RAM though (32GB at least). For running big models and development, I also got an 18TB drive for storage.

And sadly it looks like you just missed all the good sales.


I would agree with what @aliasfox said, thank you for that input by the way!

In the end, the CPU doesn’t matter a ton for LLM inference, because the moment any part of the model runs on the CPU it’s going to be incredibly slow no matter what CPU you have. You want your GPUs to have enough VRAM to fit any LLM you want to run entirely.
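A quick sanity check that the whole model actually landed on the GPUs on a dual-3090 box (nvidia-smi ships with the NVIDIA driver):

nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv

Ollama splits a model across both cards automatically, so the combined memory.used should roughly match the model size, with nothing offloaded to the CPU.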

I just chose the Ryzen 9 7950X3D because AM5, its socket, is a newer platform with a longer upgrade path.
