I’m hoping this post can serve as a place for people to get inspiration about PC builds for their AI use cases. If you want to be part of this, post your build in the following format, and hopefully we can help people out.
FORMAT:
Cost: (USD, and local currency after if desired)
Platform: (Windows, MacOS, etc)
AI Models Used with minimum 20 tokens/s output: (if any)
PC Specs:
GPU:
CPU:
RAM:
Motherboard:
PSU:
Storage:
Cooling:
Case:
I have been looking at this a lot lately. I think my goal is to shoot for an L40 GPU or possibly a couple of V100 GPUs to run as an AI machine. The cost is quite a bit, but it’s not QUITE out of reasonable reach if I’m patient.
Wow! This is like way over my head. I was just looking into getting a new laptop for around $1,000 with a good NVIDIA video card to use with some of these new AI generative video apps and tools. I wasn’t looking at top-of-the-line video cards, mind you. I could barely understand the specs for what I need to run those types of apps. What you wrote out here has my head spinning.
You may not need those specs - I plan to set up workflows that keep the AI running a lot, generating a number of things both when I’m doing app development and when I’m offline. I’m sure a less beefy system will still be able to run local AI; it’s just a question of power efficiency and model size.
@navyseal4000 Love this, thank you for starting this thread!
I’m curious why you’re looking to get a single 5080/5090 GPU instead of two 3090s? Typically for LLM inference, having more VRAM is actually better for running larger models, even if the two GPUs aren’t individually as strong as the more expensive single one. I’d love to know your thoughts on that!
@ColeMedin There are a couple of primary reasons. First, the 5000-series GPUs are supposed to ship with GDDR7 VRAM, which increases bandwidth by about 1.4x, significantly raises maximum capacity, lowers voltage from 1.35V to 1.2V, and increases transfer speeds. It also uses a simpler data-transfer protocol, further reducing processing time. From what I know, VRAM bandwidth is likely the biggest rate limiter for LLM inference, so the jump from GDDR6 or GDDR6X to GDDR7 should have a big impact. Second, that higher speed is what’s most useful for the models I’m looking to run.
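To put some rough numbers on that: at batch size 1, decoding is mostly memory-bandwidth-bound, since roughly every weight has to be streamed through the GPU for each generated token. Here’s a back-of-envelope sketch; the bandwidth figures, the 0.6 efficiency factor, and the 8B/8-bit workload are ballpark assumptions for illustration, not benchmarks:

```python
# Back-of-envelope: single-stream decode speed for a bandwidth-bound LLM.
# Each generated token streams roughly all model weights once, so:
#   tokens/s ~ effective_bandwidth / model_size_in_bytes

def est_tokens_per_sec(params_b: float, bits_per_weight: float,
                       bandwidth_gbps: float, efficiency: float = 0.6) -> float:
    """Crude upper bound; real throughput is lower (compute, KV cache, etc.)."""
    model_bytes = params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gbps * 1e9 * efficiency / model_bytes

# Assumed ballpark bandwidth figures; check your card's actual spec sheet.
for name, bw_gbps in [("RTX 3090 / GDDR6X", 936), ("RTX 5090 / GDDR7", 1792)]:
    print(f"{name}: ~{est_tokens_per_sec(8, 8, bw_gbps):.0f} tok/s on an 8B 8-bit model")
```

The point of the sketch is just that tokens/s scales roughly linearly with memory bandwidth, which is why the GDDR7 upgrade matters more than raw compute for this workload.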
Eventually I’ll want to get a second GPU and replace the PSU to accommodate it, but I’m not there yet. When I do, it will likely be a bit of an experiment - as I said before, VRAM is my primary concern… I’ve heard of ways to mod VRAM by soldering new modules onto the GPU, but I wouldn’t want to do that to a primary card because of the risks involved, including frying the card with a bad solder job or installing modules incompatible with the VBIOS, which would essentially brick the GPU.
Lambda Labs has a unique offering in this space…lambdalabs dot com. It’s the same “lambda” stack across their entire platform of offerings: desktops (Vector series), rack-mounted (Scalar series), and cloud-hosted. They also aren’t shy about up-front pricing, so there are no illusions about the barrier cost to entry…
PSU: 1000 watt Coolermaster Gold (bought almost 10 years ago and still going strong)
Storage:
3.6TB HDD (ST4000DM000-1F2168)
3.6TB HDD (ST4000DM004-2U9104)
1.8TB External Portable Drive
1.8TB NVMe SSD (Samsung SSD 990 PRO 2TB)
1.8TB NVMe SSD (CT2000P3SSD8)
7.8GB cryptswap partition
79TB NAS (connected over NFS for datasets, models, etc.)
Cooling (All Air Cooled):
CPU Temperature: 36.5°C
NVMe SSDs Temperature:
Samsung SSD 990 PRO: 36.9°C
CT2000P3SSD8: 46.9°C
Case: I judge an AI by what’s on the inside, not the outside.
I can run models up to 20B entirely in VRAM, since the 16GB card is used first. The OS fills both cards evenly, so as long as it doesn’t use more than 4GB, I have 24GB of VRAM left over for models. One day I’ll have the money to get two nice cards, but these do well for me at the moment.
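For anyone wanting to reproduce that kind of split, here’s a minimal llama-cpp-python sketch (assuming a CUDA build of the library); the model filename and the split ratios are placeholders, not my exact setup:

```python
# Minimal sketch: offload a ~20B quantized model across a 16 GB + 12 GB pair.
# tensor_split sets the proportion of the model placed on each GPU;
# tune it until neither card runs out of memory.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example-20b-q4_k_m.gguf",  # hypothetical filename
    n_gpu_layers=-1,             # offload all layers to the GPUs
    tensor_split=[0.57, 0.43],   # roughly the 16:12 VRAM ratio
    n_ctx=4096,
)
out = llm("Why split a model across two GPUs?", max_tokens=64)
print(out["choices"][0]["text"])
```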
Great build and eval rate; I’m shocked at the CPU-only speed… but I have to say, I like to take a note from Rome and the Old World: that which is useful ought also to be beautiful, because truth (found deterministically through the machine) and beauty are both important transcendentals.
Disclaimer: Not recommended for the faint of heart. I built this a little while back, and it appears prices are a little higher now (or I just got some deals), so add 20% or so. I would probably not recommend this for most people.
~$850 Build
Specs:
Dell T5500 ($150 eBay, was ~$90)
Dell T5500 CPU Expansion Board ($30)
192GB DDR3 Memory (was like $40)
2x Nvidia Tesla 24GB M40 GPUs (48GB VRAM total; $185/ea)
2x Intel Xeon X5675 ($20)
18TB HDD ($220)
2TB SSD ($40 Micro Center Deal)
Splitters, Cables, Blower Fan, Etc. ($40)
Notes: I also made a custom 3D-printed shroud to connect the graphics cards and use only a single blower fan, because they have no fans of their own (they’re server GPUs). They also don’t have a video output, so I used a spare slim single-slot GPU I had. If you were on a super budget, you could get an MDD (Amazon-recertified HDD with a 5-year warranty) for very cheap.
I will be trying out the new Llama 3.3 model with 4-bit quantization and will let you know how it works. You can run models under 14B on a laptop or PC easily (even without a GPU, using llama.cpp variants, etc.). I only built this thing to try out bigger models.
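Before I report back, a quick weights-plus-KV-cache fit check suggests a 4-bit 70B should just squeeze into 48GB. Here’s the arithmetic as a small sketch; the 1.1x overhead factor and 4GB KV-cache allowance are rough assumptions, and real runtimes add their own overhead:

```python
# Rough VRAM fit check: weights-only size plus a crude KV-cache allowance.

def fits_in_vram(params_b: float, bits: float, vram_gb: float,
                 kv_cache_gb: float = 4.0, overhead: float = 1.1) -> bool:
    weights_gb = params_b * bits / 8  # 1e9 params * (bits/8) bytes ~= GB
    needed = weights_gb * overhead + kv_cache_gb
    print(f"{params_b:.0f}B @ {bits} bits/weight: ~{needed:.1f} GB needed, "
          f"{vram_gb} GB available")
    return needed <= vram_gb

fits_in_vram(70, 4.5, 48)  # Llama 3.3 70B; Q4_K_M averages ~4.5 bits/weight
```

That lands around 47GB needed against 48GB available, so it fits, but with very little headroom for context.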
I would love some thoughts and feedback on using an Apple Mac with the M4 Max chip for running local LLMs.
The interesting thing for me is the unified memory model, which gives the GPU access to the full memory in the system. I am sure there are pros and cons to this. Here is what I am thinking (with a quick sanity check after the lists).
Cons:
The memory will not be as fast as dedicated graphics memory
Cost of machine is high
Pros:
Lots of unified memory available for running local LLMs
This is a very strong developer machine
It is a laptop, so I can work anywhere
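To make the unified-memory point concrete, here’s a tiny PyTorch sketch using the MPS backend (assumes a recent torch installed via pip; the 8 GiB allocation is an arbitrary demonstration size, not a requirement):

```python
# On Apple Silicon, PyTorch's MPS backend allocates out of the unified
# memory pool, so the GPU can address far more than a typical discrete card.
import torch

if torch.backends.mps.is_available():
    device = torch.device("mps")
    # 2 * 1024**3 float32 elements * 4 bytes = 8 GiB, bigger than many
    # consumer GPUs' VRAM; a 128GB M4 Max could go much larger.
    x = torch.empty(2 * 1024**3, dtype=torch.float32, device=device)
    print(f"Allocated {x.numel() * 4 / 1024**3:.0f} GiB on the MPS device")
else:
    print("MPS backend not available; this machine will fall back to CPU")
```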
Specs for a maxed out MBP:
16-inch MacBook Pro (Space Black)
$7,848.98 (or $654.08/mo. for 12 mo.)
16-inch Liquid Retina XDR display
Nano-texture display
Apple M4 Max chip with 16‑core CPU, 40‑core GPU, 16‑core Neural Engine
128GB unified memory
8TB SSD storage
140W USB-C Power Adapter
Three Thunderbolt 5 ports, HDMI port, SDXC card slot, headphone jack, MagSafe 3 port
Small but big correction (I can no longer edit the post): I meant my budget setup has 2x Tesla P40s, not M40s. Kind of a big deal performance-wise, since the jump from Maxwell to Pascal brought much higher throughput and fast INT8 inference support.