I’m hoping this post can serve as a place for people to get inspiration about PC builds for their AI use cases. If you want to be part of this, post your build in the following format, and hopefully we can help people out.
FORMAT:
Cost: (USD, and local currency after if desired)
Platform: (Windows, MacOS, etc)
AI Models Used with minimum 20 tokens/s output: (if any)
PC Specs:
GPU:
CPU:
RAM:
Motherboard:
PSU:
Storage:
Cooling:
Case:
I have been looking at this a lot lately. I think my goal is to shoot for an L40 GPU or possibly a couple of V100 GPUs to run as an AI machine. The cost is quite a bit, but it’s not QUITE out of reasonable reach if I’m patient.
Wow! This is like way over my head. I was just looking into getting a new laptop for around $1,000 with a good NVIDIA video card to use with some of these new AI generative video apps and tools. I wasn’t looking at top-of-the-line video cards, mind you. I could barely understand the specs for what I need to run those types of apps. What you wrote out here has my head spinning.
You may not need those specs - I plan to set up workflows that keep the AI running a lot, generating a number of things both when I’m doing app development and when I’m offline. I’m sure a less beefy system will still be able to run local AI; it’s just a question of power efficiency and model size.
@navyseal4000 Love this, thank you for starting this thread!
I’m curious why you’re looking to get a single 5080/5090 GPU instead of two 3090s? Typically for LLM inference, having more VRAM is actually better for running larger models, even if the two GPUs aren’t individually as strong as the more expensive single one. I’d love to know your thoughts on that!
@ColeMedin There are a couple of primary reasons. First, the 5000-series GPUs are supposed to ship with GDDR7 VRAM, which increases bandwidth by about 1.4x, significantly raises maximum capacity, lowers voltage from 1.35V to 1.2V, and increases transfer speeds. It also uses a simpler data-transfer protocol, further reducing processing time. From what I know, VRAM bandwidth is likely the biggest rate limiter for LLM inference, so the jump from GDDR6 or GDDR6X to GDDR7 should have a big impact. Second, that higher speed is what’s most useful for the models I’m looking to run.
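To put some rough numbers on that: at batch size 1, decoding is mostly memory-bandwidth-bound, since roughly every weight has to be streamed through the GPU for each generated token. Here’s a back-of-envelope sketch; the bandwidth figures, the 0.6 efficiency factor, and the 8B/8-bit workload are ballpark assumptions for illustration, not benchmarks:

```python
# Back-of-envelope: single-stream decode speed for a bandwidth-bound LLM.
# Each generated token streams roughly all model weights once, so:
#   tokens/s ~ effective_bandwidth / model_size_in_bytes

def est_tokens_per_sec(params_b: float, bits_per_weight: float,
                       bandwidth_gbps: float, efficiency: float = 0.6) -> float:
    """Crude upper bound; real throughput is lower (compute, KV cache, etc.)."""
    model_bytes = params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gbps * 1e9 * efficiency / model_bytes

# Assumed ballpark bandwidth figures; check your card's actual spec sheet.
for name, bw_gbps in [("RTX 3090 / GDDR6X", 936), ("RTX 5090 / GDDR7", 1792)]:
    print(f"{name}: ~{est_tokens_per_sec(8, 8, bw_gbps):.0f} tok/s on an 8B 8-bit model")
```

The point of the sketch is just that tokens/s scales roughly linearly with memory bandwidth, which is why the GDDR7 upgrade matters more than raw compute for this workload.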
Eventually I’ll want to get a second GPU and replace the PSU to accommodate it, but I’m not there yet. When I do, it will likely be a bit of an experiment - as I said before, VRAM is my primary concern… I’ve heard of ways to mod VRAM by soldering new modules onto the GPU, but I wouldn’t want to do that to a primary card because of the risks involved, including frying the card with a bad solder job or installing modules incompatible with the VBIOS, which would essentially brick the GPU.
Lambda Labs has a unique offering in this space…lambdalabs dot com. It’s the same “lambda” stack across their entire platform of offerings: desktops (Vector series), rack-mounted (Scalar series), and cloud-hosted. They also aren’t shy about up-front pricing, so there are no illusions about the barrier cost to entry…
PSU: 1000 watt Coolermaster Gold (bought almost 10 years ago and still going strong)
Storage:
3.6TB HDD (ST4000DM000-1F2168)
3.6TB HDD (ST4000DM004-2U9104)
1.8TB External Portable Drive
1.8TB NVMe SSD (Samsung SSD 990 PRO 2TB)
1.8TB NVMe SSD (CT2000P3SSD8)
7.8GB cryptswap partition
79TB NAS (connected over NFS for datasets, models, etc.)
Cooling (All Air Cooled):
CPU Temperature: 36.5°C
NVMe SSDs Temperature:
Samsung SSD 990 PRO: 36.9°C
CT2000P3SSD8: 46.9°C
Case: I judge an AI by what’s on the inside, not the outside.
I can run models up to 20B entirely in VRAM, since the 16GB card is used first. The OS fills both cards evenly, so as long as it doesn’t use more than 4GB, I have 24GB of VRAM left over for models. One day I’ll have the money to get two nice cards, but these do well for me at the moment.
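For anyone wanting to reproduce that kind of split, here’s a minimal llama-cpp-python sketch (assuming a CUDA build of the library); the model filename and the split ratios are placeholders, not my exact setup:

```python
# Minimal sketch: offload a ~20B quantized model across a 16 GB + 12 GB pair.
# tensor_split sets the proportion of the model placed on each GPU;
# tune it until neither card runs out of memory.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example-20b-q4_k_m.gguf",  # hypothetical filename
    n_gpu_layers=-1,             # offload all layers to the GPUs
    tensor_split=[0.57, 0.43],   # roughly the 16:12 VRAM ratio
    n_ctx=4096,
)
out = llm("Why split a model across two GPUs?", max_tokens=64)
print(out["choices"][0]["text"])
```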
Great build and eval rate; I’m shocked at the CPU-only speed… but I have to say, I like to take a note from Rome and the Old World: that which is useful ought also to be beautiful, because truth (found deterministically through the machine) and beauty are both important transcendentals.
Disclaimer: Not recommended for the faint of heart. I built this a little while back, and it appears prices are a little higher now (or I just got some deals), so add 20% or so. I would probably not recommend this for most people.
~$850 Build
Specs:
Dell T5500 ($150 eBay, was ~$90)
Dell T5500 CPU Expansion Board ($30)
192GB DDR3 Memory (was like $40)
2x Nvidia Tesla 24GB M40 GPUs (48GB VRAM total; $185/ea)
2x Intel Xeon X5675 ($20)
18TB HDD ($220)
2TB SSD ($40 Micro Center Deal)
Splitters, Cables, Blower Fan, Etc. ($40)
Notes: I also made a custom 3D-printed shroud to connect the graphics cards and use only a single blower fan, because they have no fans of their own (they’re server GPUs). They also don’t have a video output, so I used a spare slim single-slot GPU I had. If you were on a super budget, you could get an MDD (Amazon-recertified HDD with a 5-year warranty) for very cheap.
I will be trying out the new Llama 3.3 model with 4-bit quantization and will let you know how it works. You can run models under 14B on a laptop or PC easily (even without a GPU, using llama.cpp variants, etc.). I only built this thing to try out bigger models.
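Before I report back, a quick weights-plus-KV-cache fit check suggests a 4-bit 70B should just squeeze into 48GB. Here’s the arithmetic as a small sketch; the 1.1x overhead factor and 4GB KV-cache allowance are rough assumptions, and real runtimes add their own overhead:

```python
# Rough VRAM fit check: weights-only size plus a crude KV-cache allowance.

def fits_in_vram(params_b: float, bits: float, vram_gb: float,
                 kv_cache_gb: float = 4.0, overhead: float = 1.1) -> bool:
    weights_gb = params_b * bits / 8  # 1e9 params * (bits/8) bytes ~= GB
    needed = weights_gb * overhead + kv_cache_gb
    print(f"{params_b:.0f}B @ {bits} bits/weight: ~{needed:.1f} GB needed, "
          f"{vram_gb} GB available")
    return needed <= vram_gb

fits_in_vram(70, 4.5, 48)  # Llama 3.3 70B; Q4_K_M averages ~4.5 bits/weight
```

That lands around 47GB needed against 48GB available, so it fits, but with very little headroom for context.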
I would love some thoughts and feedback on using an Apple Mac with the M4 Max chip for running local LLMs.
The interesting thing for me is the unified memory model, which gives the GPU access to the full memory in the system. I am sure there are pros and cons to this. Here is what I am thinking (with a quick sanity check after the lists).
Cons:
The memory will not be as fast as dedicated graphics memory
Cost of machine is high
Pros:
Lots of unified memory available for running local LLMs
This is a very strong developer machine
It is a laptop, so I can work anywhere
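To make the unified-memory point concrete, here’s a tiny PyTorch sketch using the MPS backend (assumes a recent torch installed via pip; the 8 GiB allocation is an arbitrary demonstration size, not a requirement):

```python
# On Apple Silicon, PyTorch's MPS backend allocates out of the unified
# memory pool, so the GPU can address far more than a typical discrete card.
import torch

if torch.backends.mps.is_available():
    device = torch.device("mps")
    # 2 * 1024**3 float32 elements * 4 bytes = 8 GiB, bigger than many
    # consumer GPUs' VRAM; a 128GB M4 Max could go much larger.
    x = torch.empty(2 * 1024**3, dtype=torch.float32, device=device)
    print(f"Allocated {x.numel() * 4 / 1024**3:.0f} GiB on the MPS device")
else:
    print("MPS backend not available; this machine will fall back to CPU")
```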
Specs for a maxed out MBP:
16-inch MacBook Pro (Space Black)
$7,848.98 (or $654.08/mo. for 12 mo.)
16-inch Liquid Retina XDR display
Nano-texture display
Apple M4 Max chip with 16‑core CPU, 40‑core GPU, 16‑core Neural Engine
128GB unified memory
8TB SSD storage
140W USB-C Power Adapter
Three Thunderbolt 5 ports, HDMI port, SDXC card slot, headphone jack, MagSafe 3 port
Small but big correction (I can no longer edit the post): I meant my budget setup has 2x Tesla P40s, not M40s. Kind of a big deal performance-wise, since the jump from Maxwell to Pascal brought much higher throughput and fast INT8 inference support.