Costing Only 300,000? Building a Personal AI Supercomputer on 4 Units of 512GB Mac Studio, Local Deployment Guide for Trillion-Parameter Kimi-K2.5
In this era of large-model frenzy, we all share a dream: running a trillion-parameter model locally that rivals GPT-5. But reality is harsh; even at 4-bit quantization, a trillion-parameter model needs hundreds of gigabytes of memory, and H100s and B200s are far too expensive. So what can you do if you can't afford them?
Today, JamePeng will guide you to use 4 fully equipped M3 Ultra Mac Studios, through EXO+MLX and Thunderbolt 5, to create a local AI supercomputer with 2TB of unified memory! The goal is simple: to run the Kimi-K2.5 trillion-parameter large model locally.
Why Go Through All This?
Not just for the cool factor, but also for data privacy and ultimate local control.
The core weapon is EXO (GitHub: exo-explore/exo), which supports RDMA (Remote Direct Memory Access) and can merge the unified memory of the 4 Macs into one massive memory pool.
Hardware list: 4 Mac Studios (M3 Ultra, 512GB memory version), roughly 2TB of unified memory in total, connected over Thunderbolt 5 (up to 120 Gbps), running macOS Tahoe 26.2 or later.
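A quick back-of-the-envelope check shows why the 2TB pool is the whole point. The parameter count and the overhead factor below are illustrative assumptions, not measured figures:

```python
# Rough memory budget for a ~1-trillion-parameter model at 4-bit quantization.
# All figures are illustrative estimates, not measured values.

params = 1.0e12          # ~1T parameters (assumed)
bytes_per_param = 0.5    # 4-bit quantization = 0.5 bytes per parameter
overhead = 1.2           # assumed ~20% extra for KV cache, activations, runtime

weights_gb = params * bytes_per_param / 1e9      # ~500 GB of weights
total_gb = weights_gb * overhead                 # ~600 GB all-in

single_gb = 512          # one Mac Studio
pool_gb = 4 * single_gb  # the four-node cluster

print(f"weights: {weights_gb:.0f} GB, with overhead: {total_gb:.0f} GB")
print(f"fits on one Mac: {total_gb <= single_gb}")   # False
print(f"fits in the pool: {total_gb <= pool_gb}")    # True
```

Even quantized to 4 bits, the model overflows a single 512GB machine, but sits comfortably inside the pooled 2048 GB.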
Step 1: Enable RDMA Support
On each Mac, perform the following:
- Shut down the Mac and enter recovery mode (hold the power button, select "Options" > "Continue")
- Open Terminal and run: bputil -a rdma
- Restart the Mac
- Verify: run system_profiler SPThunderboltDataType and confirm RDMA is enabled
Step 2: Install EXO
macOS app installation: download EXO-version.dmg from GitHub, install and run it, then open the Dashboard and add the IP addresses of the other Macs.
Source code installation:
- Install Homebrew
- git clone https://github.com/exo-explore/exo.git
- cd exo
- pip install -e .
- exo start
Step 3: Physical Connection and Topology
Do not network the machines over Wi-Fi; not even Wi-Fi 7 is enough. Trillion-parameter inference is extremely sensitive to interconnect bandwidth and latency. Use Thunderbolt 5 cables, designate one Mac as the master node and the other three as worker nodes, in either a star or a daisy-chain topology.
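To see why the wired link matters, here is a sketch of what each interconnect adds to every generated token in a 4-node pipeline. The hidden size, link latencies, effective bandwidths, and per-token compute time are all assumptions for illustration, not benchmarks:

```python
# Back-of-the-envelope: per-token network cost of each interconnect.
# All latency/bandwidth/compute figures below are assumptions.

hidden = 7168                 # assumed activation width (Kimi-K2-class model)
act_bytes = hidden * 2        # fp16 activations crossing each link per token
hops = 3                      # 4 nodes in a pipeline = 3 link crossings

links = {                     # name: (assumed one-way latency s, effective bit/s)
    "Thunderbolt 5 RDMA": (10e-6, 80e9),
    "Wi-Fi 7 (best case)": (3e-3, 5e9),
}

compute_s = 0.030             # assumed pure compute time per token (~33 tok/s)

for name, (lat, bw) in links.items():
    net_s = hops * (lat + act_bytes * 8 / bw)
    tok_s = 1.0 / (compute_s + net_s)
    print(f"{name}: +{net_s * 1e3:.2f} ms/token -> {tok_s:.1f} tok/s")
```

Even with these optimistic Wi-Fi numbers the throughput drops noticeably, and real Wi-Fi suffers latency jitter and retransmits far worse than the fixed 3 ms assumed here, which is why the wired Thunderbolt mesh is non-negotiable.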
In the EXO Dashboard, you should see all 4 devices online, with the total memory pool displayed as 2048 GB.
Step 4: Download and Run MLX Community Version Kimi-K2.5
pip install huggingface_hub
huggingface-cli download mlx-community/Kimi-K2.5 --local-dir ./models/mlx-community/Kimi-K2.5
exo run --model ./models/mlx-community/Kimi-K2.5 --quant 4 --shards auto --engine mlx
Command explanation:
- --model: points to the model directory
- --quant 4: uses 4-bit quantization to reduce memory usage
- --shards auto: EXO automatically intelligently splits the model
- --engine mlx: runs inference through MLX on the M3 Ultra's 80-core GPU
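To build intuition for what a flag like --shards auto has to do, here is a hypothetical sketch of memory-proportional layer sharding. This is not EXO's actual algorithm, and the layer count is an assumption; it only illustrates the idea of splitting contiguous layer ranges across nodes by available memory:

```python
# Hypothetical memory-proportional layer sharding (NOT EXO's real algorithm).

def shard_layers(n_layers: int, mem_gb: list[float]) -> list[range]:
    """Assign contiguous layer ranges to nodes, proportional to node memory."""
    total = sum(mem_gb)
    bounds, acc = [0], 0.0
    for m in mem_gb[:-1]:
        acc += m
        bounds.append(round(n_layers * acc / total))
    bounds.append(n_layers)
    return [range(a, b) for a, b in zip(bounds, bounds[1:])]

# Four identical 512 GB nodes, 61 transformer layers (layer count assumed):
shards = shard_layers(61, [512, 512, 512, 512])
print([(s.start, s.stop) for s in shards])
```

With equal-memory nodes this degenerates to a near-even split; with mixed hardware, the node with more free memory would simply receive a longer layer range.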
Final Effect and Testing
When the terminal shows Ready, you have your own AI supercomputer.
Prefill stage: The fans of the 4 Macs start to accelerate slightly (thanks to the energy efficiency of M3 Ultra, they won't take off).
Generation stage: Tokens pop out one after another.
Speed: it cannot match an H100 cluster, but thanks to Thunderbolt 5 RDMA the token generation speed reaches 17-28 tokens/s. For a trillion-parameter model, that is fully interactive!
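A rough roofline check suggests the 17-28 tokens/s figure is in the right ballpark. Decode speed of a mixture-of-experts model is approximately bounded by how fast the active weights stream out of memory; the active-parameter count below is an assumption for a Kimi-K2-class MoE, and the ~819 GB/s figure is the M3 Ultra's advertised unified-memory bandwidth:

```python
# Roofline sanity check on the observed 17-28 tok/s.
# Active-parameter count is an assumption; bandwidth is Apple's spec figure.

active_params = 32e9      # assumed ~32B active parameters per token (MoE)
bytes_per_param = 0.5     # 4-bit quantization
mem_bw = 819e9            # M3 Ultra unified memory bandwidth, ~819 GB/s

bytes_per_token = active_params * bytes_per_param   # ~16 GB read per token
roofline = mem_bw / bytes_per_token                 # upper bound, tok/s

print(f"theoretical ceiling: {roofline:.0f} tok/s")
print(f"observed 17-28 tok/s = {17/roofline:.0%}-{28/roofline:.0%} of ceiling")
```

Landing at roughly a third to a half of the single-node roofline is plausible once interconnect hops, scheduling, and prefill overhead are paid.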
Conclusion
This solution is definitely not cheap, but it proves that with Apple Silicon and the efforts of the open-source community, the future of decentralized AI is coming. We do not need to send data to cloud giants; we can build powerful private inference clusters using the devices at hand.

