My new “compact” quad-eGPU rig for LLMs

Fits in a 19-inch rack shelf

Hi everyone! I just finished building my custom open-frame chassis supporting four eGPUs. You can check it out on YouTube.

https://youtu.be/vzX-AbquhzI?si=8b7MCMd5GmNR1M51

Setup:

  • Minisforum BD795i mini-ITX motherboard, pulled from a mini PC I had

  • Its PCIe 5.0 x16 slot set to x4/x4/x4/x4 bifurcation mode in the BIOS

  • 4× RTX 5060 Ti 16GB GPUs

  • Corsair HX1500i PSU

  • OCuLink adapters and cables from AliExpress

This motherboard also has two M.2 PCIe 4.0 x4 slots, so there's potential for two more GPUs.
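For a sense of the per-card bandwidth after bifurcation, here's a back-of-the-envelope sketch. It assumes the OCuLink links train at PCIe 4.0 speeds (16 GT/s per lane, 128b/130b encoding), which matches typical OCuLink adapters but isn't something I've verified on this exact setup:

```python
# Rough per-GPU bandwidth: with x4/x4/x4/x4 bifurcation, each card gets
# 4 lanes. Assumes the links train at PCIe 4.0 (16 GT/s per lane,
# 128b/130b line encoding).

def pcie_bandwidth_gbps(lanes: int, gt_per_s: float = 16.0) -> float:
    """Approximate one-direction bandwidth in GB/s for a PCIe 3.0+ link."""
    encoding = 128 / 130              # 128b/130b line-encoding overhead
    return lanes * gt_per_s * encoding / 8  # GT/s per lane -> GB/s total

per_gpu = pcie_bandwidth_gbps(4)      # x4 link to each eGPU
full_slot = pcie_bandwidth_gbps(16)   # the undivided x16 slot

print(f"per GPU:  ~{per_gpu:.2f} GB/s")   # ~7.88 GB/s
print(f"x16 slot: ~{full_slot:.2f} GB/s") # ~31.51 GB/s
```

So each card sees roughly a quarter of the slot's bandwidth, which mostly matters during model loading and cross-GPU transfers, not steady-state generation.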

Benchmark results:

Ollama default settings.

Context window: 8192

Tool: https://github.com/dalist1/ollama-bench

| Model | Loading Time (s) | Prompt Tokens | Prompt Speed (tps) | Response Tokens | Response Speed (tps) | GPU Offload % |
|---|---:|---:|---:|---:|---:|---:|
| qwen3-next:80b | 21.49 | 21 | 219.95 | 1,581 | 54.54 | 100 |
| llama3.3:70b | 22.50 | 21 | 154.24 | 560 | 9.76 | 100 |
| gpt-oss:120b | 21.69 | 77 | 126.62 | 1,135 | 27.93 | 91 |
| MichelRosselli/GLM-4.5-Air:latest | 42.17 | 16 | 28.12 | 1,664 | 11.49 | 70 |
| nemotron-3-nano:30b | 42.90 | 26 | 191.30 | 1,654 | 103.08 | 100 |
| gemma3:27b | 6.69 | 18 | 289.83 | 1,108 | 22.98 | 100 |
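If you want to reproduce the speed columns without the bench tool, they fall straight out of the counters Ollama returns from `/api/generate` (all durations are in nanoseconds). A minimal sketch, assuming a local Ollama on the default port; the HTTP call is illustrative and the math lives in `tokens_per_second()`:

```python
# Derive loading time, prompt tps, and response tps from Ollama's
# /api/generate response fields (durations reported in nanoseconds).
import json
import urllib.request

def tokens_per_second(token_count: int, duration_ns: int) -> float:
    """tps = tokens / seconds; Ollama reports durations in ns."""
    return token_count / (duration_ns / 1e9)

def bench_once(model: str, prompt: str,
               url: str = "http://localhost:11434/api/generate") -> dict:
    body = json.dumps({"model": model, "prompt": prompt,
                       "stream": False}).encode()
    with urllib.request.urlopen(urllib.request.Request(url, body)) as resp:
        r = json.load(resp)
    return {
        "load_s": r["load_duration"] / 1e9,
        "prompt_tps": tokens_per_second(r["prompt_eval_count"],
                                        r["prompt_eval_duration"]),
        "response_tps": tokens_per_second(r["eval_count"],
                                          r["eval_duration"]),
    }

# e.g. the ~54.5 tps response speed for qwen3-next:80b corresponds to
# 1,581 tokens generated in ~29 s:
# tokens_per_second(1581, 28.99e9) ≈ 54.5
```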


💬 Discussion: r/LocalLLM (19 points, 20 comments)