My new “compact” quad-eGPU rig for LLMs
Fits on a 19-inch rack shelf
Hi everyone! I just finished building a custom open-frame chassis that holds 4 eGPUs. You can check it out on YT.
https://youtu.be/vzX-AbquhzI?si=8b7MCMd5GmNR1M51
Setup:
- Minisforum BD795i mini-ITX motherboard, taken from a mini PC I had
- Its PCIe 5.0 x16 slot set to 4x4x4x4 bifurcation mode in the BIOS
- 4× RTX 5060 Ti 16 GB GPUs
- Corsair HX1500i PSU
- OCuLink adapters and cables from AliExpress
This motherboard also has two M.2 PCIe 4.0 x4 slots, so there's potential for 2 more GPUs.
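A quick way to sanity-check the bifurcation setup is to confirm that all four cards enumerate and negotiate the expected x4 link. This is a sketch, assuming an NVIDIA driver is installed on a Linux host (the exact prompt text and model choice are illustrative):

```shell
# Each of the four GPUs should report a x4 link after 4x4x4x4 bifurcation.
nvidia-smi --query-gpu=index,name,pcie.link.width.current,pcie.link.gen.current --format=csv

# Cross-check at the bus level: 10de is NVIDIA's PCI vendor ID;
# LnkSta shows the negotiated speed and width per device.
lspci -d 10de: -vv | grep -E "VGA|LnkSta:"
```

If a card shows up at x1 or not at all, reseating the OCuLink cable or re-checking the BIOS bifurcation setting is usually the first step.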
Benchmark results:
Ollama default settings.
Context window: 8192
Tool: https://github.com/dalist1/ollama-bench
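For anyone who wants to cross-check a row without the benchmark tool, plain Ollama can report similar numbers. A minimal sketch, assuming a recent Ollama version (the `OLLAMA_CONTEXT_LENGTH` variable and the prompt below are illustrative):

```shell
# Set the server-side default context window to match the benchmark.
OLLAMA_CONTEXT_LENGTH=8192 ollama serve &

# --verbose prints load duration, prompt eval rate, and eval rate (tps)
# after the response, roughly matching the columns in the table.
ollama run qwen3-next:80b --verbose "Explain PCIe bifurcation in one paragraph."

# Shows how much of the loaded model is on GPU vs CPU ("GPU Offload %").
ollama ps
```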
| Model | Loading Time (s) | Prompt Tokens | Prompt Speed (tps) | Response Tokens | Response Speed (tps) | GPU Offload % |
|---|---|---|---|---|---|---|
| qwen3-next:80b | 21.49 | 21 | 219.95 | 1581 | 54.54 | 100 |
| llama3.3:70b | 22.50 | 21 | 154.24 | 560 | 9.76 | 100 |
| gpt-oss:120b | 21.69 | 77 | 126.62 | 1135 | 27.93 | 91 |
| MichelRosselli/GLM-4.5-Air:latest | 42.17 | 16 | 28.12 | 1664 | 11.49 | 70 |
| nemotron-3-nano:30b | 42.90 | 26 | 191.30 | 1654 | 103.08 | 100 |
| gemma3:27b | 6.69 | 18 | 289.83 | 1108 | 22.98 | 100 |
💬 Discussion: r/LocalLLM (19 points, 20 comments) 🔗 Source