Required hardware configuration to run oTToDev locally

Please give us the minimal hardware configuration required to run oTToDev locally.


I wouldn't bother unless you have 8 GB of VRAM or more.

It depends a lot on which LLMs you want to run locally! And if you use an API like OpenRouter to access open-source models hosted remotely instead, you can use practically any machine!

But if you want to run local LLMs, I would suggest at least having a dedicated graphics card for smaller LLMs like Qwen 2.5 Coder 7B, and for larger LLMs of 30B+ parameters, at least an RTX 3080 or 3090.

Update …

CPU:

Optimal: Aim for an 11th-gen Intel CPU or a Zen 4-based AMD CPU; their AVX-512 support accelerates the matrix-multiplication operations at the heart of AI inference. Instruction-set features matter more than core count, and DDR5 support on newer platforms also helps performance thanks to higher memory bandwidth.

RAM:

Minimum for a Decent Experience: 16 GB is the starting point for running 7B-parameter models effectively. It's enough to run smaller models comfortably or to manage larger ones with caution.

Disk Space:

Practical Minimum: About 50 GB should suffice, primarily to accommodate the Docker image (around 2 GB+ for ollama-webui) and the model files, without needing a large buffer beyond the essentials.

GPU:

Not Mandatory but Recommended for Enhanced Performance: A GPU can significantly improve model inference performance. However, VRAM requirements depend heavily on precision: a 7B model at FP16 needs roughly 14 GB of VRAM for the weights alone, which already exceeds the capacity of many consumer-grade GPUs.
For Running Quantized Models: GPUs that support 4-bit quantized formats can handle large models far more efficiently, needing significantly less VRAM:
7B model: ~4 GB
13B model: ~8 GB
30B model: ~16 GB
65B model: ~32 GB
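The figures above follow directly from parameter count times bytes per weight. A minimal sketch of that back-of-the-envelope math (the function name is my own; real-world usage adds overhead for activations and the KV cache on top of these weights-only numbers):

```python
def estimate_weight_vram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Weights-only VRAM estimate in GB: parameter count times bytes per weight.

    Actual runtime usage is higher, since activations and the KV cache
    also occupy VRAM.
    """
    return params_billion * bits_per_weight / 8

# 7B at FP16 vs. 4-bit quantization (weights only):
fp16_gb = estimate_weight_vram_gb(7, 16)  # 14.0 GB
q4_gb = estimate_weight_vram_gb(7, 4)     # 3.5 GB
```

Running the same formula for 13B, 30B, and 65B at 4 bits per weight gives roughly 6.5, 15, and 32.5 GB, which matches the rounded figures in the list above.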

Considerations for Larger Models:

Running 13B+ and MoE Models: Only advisable if you have a Mac with plenty of unified memory, a particularly large GPU, or one that supports quantized formats effectively. The memory and compute requirements for these larger models are substantially higher, making them impractical on most consumer hardware unless it is specifically configured for AI.

source


@cjgdef2020 it’s important to understand that there are two key parts to getting oTToDev running successfully.

Part 1 is installing the oTToDev “app stack” on any computer (Mac, Windows, Linux) that has the required dependencies (Node.js, pnpm, etc.) configured. This can be your local computer or a Virtual Private Server (VPS) from a provider like DigitalOcean.

Part 2 is having sufficient video RAM (VRAM) on a dedicated graphics processing unit (GPU), which is required to run the models (LLMs) that power oTToDev’s AI coding.

If you don’t have a GPU with 8 GB of VRAM or more, then the best route is to learn how to use OpenRouter so you can link oTToDev running on your local/VPS machine with an LLM hosted on OpenRouter. This bypasses the need for a pricey GPU in your computer.
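If you go the OpenRouter route, oTToDev talks to it over an OpenAI-compatible chat-completions API. A minimal sketch of what such a request looks like (the model slug and prompt are placeholders, and you supply your own API key; this builds the request without sending it):

```python
import json


def build_openrouter_request(api_key: str, model: str, prompt: str) -> dict:
    """Assemble an OpenAI-compatible chat request for OpenRouter.

    Returns the URL, headers, and JSON body you would pass to any
    HTTP client (requests, httpx, fetch, etc.).
    """
    return {
        "url": "https://openrouter.ai/api/v1/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }


# Example with a placeholder key and model slug:
req = build_openrouter_request(
    "sk-or-YOUR-KEY",
    "qwen/qwen-2.5-coder-32b-instruct",
    "Write a hello-world Express server.",
)
```

Because the endpoint is OpenAI-compatible, tools like oTToDev can point their OpenAI-style base URL at OpenRouter and pick any hosted open-source model by its slug.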