I did get it working, with a lot of pain. If you're interested, here's a README I had Claude crank out capturing the gotchas.
# Flash-MoE Setup Guide — The Real One
## What This Is
A step-by-step guide to running Qwen3.5-397B-A17B (397 billion parameter MoE model) locally on an Apple Silicon Mac using [danveloper/flash-moe](https://github.com/danveloper/flash-moe). Written from an actual setup on an M4 Max 64GB MacBook Pro — including every gotcha we hit.
**End result:** ~5 tok/s interactive chat + OpenAI-compatible API server. Zero cloud dependency.
---
## Hardware Requirements
- Apple Silicon Mac (M3 Max, M4 Pro, M4 Max, or better)
- **Minimum 48GB unified memory** (64GB+ recommended for better page cache hit rates)
- **~450GB free disk space during setup** (drops to ~215GB after cleanup)
- 1TB+ SSD (all Apple Silicon Macs qualify)
- macOS 26.2+ (Darwin 25.2.0+)
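
A quick preflight check can catch a machine that falls short before any large download starts. The sketch below is not part of flash-moe; the function names are made up for illustration, and the thresholds are simply the 450GB and 48GB figures from the list above. It assumes disk sizes are quoted in decimal GB and memory in binary GiB, which is how macOS usually reports them.

```python
import os
import shutil

GB = 10**9    # decimal gigabyte, as disk sizes are usually quoted
GIB = 2**30   # binary gibibyte, as installed memory is usually reported

MIN_FREE_DISK_GB = 450  # free space needed during setup (from the list above)
MIN_MEMORY_GB = 48      # minimum unified memory (from the list above)


def total_memory_bytes():
    """Physical memory in bytes, via POSIX sysconf (macOS and Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def check_requirements(free_disk_bytes, memory_bytes):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if free_disk_bytes < MIN_FREE_DISK_GB * GB:
        problems.append(
            f"free disk {free_disk_bytes / GB:.0f} GB < {MIN_FREE_DISK_GB} GB"
        )
    if memory_bytes < MIN_MEMORY_GB * GIB:
        problems.append(
            f"memory {memory_bytes / GIB:.0f} GB < {MIN_MEMORY_GB} GB"
        )
    return problems


if __name__ == "__main__":
    # Check the volume that holds the home directory; adjust the path if the
    # model will live on a different volume.
    free = shutil.disk_usage(os.path.expanduser("~")).free
    issues = check_requirements(free, total_memory_bytes())
    print("OK" if not issues else "\n".join(issues))
```

This only covers disk and memory. The chip and macOS version requirements still need to be checked by hand.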
### Disk Space Budget — Read This First
This is the #1 thing that will bite you. Disk usage rises and falls over the course of setup:
| Phase | Cumulative Disk Used | Notes |
|-------|---------------------|-------|
| Download MLX 4-bit model | ~210 GB | Source safetensors files |
| Git LFS cache (hidden) | ~420 GB | `.git/lfs/` holds a second copy |
| After `git lfs fetch --all` cleanup | ~210 GB | Delete `.git/lfs/` to reclaim |
| After `repack_experts.py` | ~420 GB | 210GB source + 209GB packed experts |
| After deleting source model | **~215 GB** | Final footprint |
**You need ~450GB free to start.** Plan your cleanup steps. On a 1TB drive, that is close to half the disk.
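
To see where the ~450GB figure comes from, it helps to walk the table as data. The sketch below just replays the cumulative numbers from the table and finds the peak; the stage labels are paraphrased and the helper names are illustrative, not anything flash-moe provides.

```python
# (stage, cumulative GB on disk), copied from the table above
STAGES = [
    ("Download MLX 4-bit model", 210),
    ("Git LFS cache (hidden)", 420),
    ("After Git LFS cleanup", 210),
    ("After repack_experts.py", 420),
    ("After deleting source model", 215),
]


def peak_usage(stages):
    """Return the (stage, GB) pair with the highest cumulative usage."""
    return max(stages, key=lambda stage: stage[1])


def headroom(free_gb, stages):
    """GB left over at the worst point, given free_gb at the start."""
    return free_gb - peak_usage(stages)[1]


if __name__ == "__main__":
    for name, used in STAGES:
        print(f"{name:<30} {used:>4} GB")
    print("margin with 450 GB free:", headroom(450, STAGES), "GB")
```

The peak is ~420GB and it is reached twice, so starting with 450GB free leaves only about 30GB of margin. Skipping either cleanup step would push usage past what 450GB allows.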
**Critical cleanup commands** (safe to run at each s...