GitHub Issue

I did get it working, with a lot of pain. If you're interested, here's a README I had Claude crank out capturing the gotchas.

Discovered On Mar 28, 2026
# Flash-MoE Setup Guide: The Real One

## What This Is

A step-by-step guide to running Qwen3.5-397B-A17B (a 397-billion-parameter MoE model) locally on an Apple Silicon Mac using [danveloper/flash-moe](https://github.com/danveloper/flash-moe). Written from an actual setup on an M4 Max 64GB MacBook Pro, including every gotcha we hit.

**End result:** ~5 tok/s interactive chat + OpenAI-compatible API server. Zero cloud dependency.

---

## Hardware Requirements

- Apple Silicon Mac (M3 Max, M4 Pro, M4 Max, or better)
- **Minimum 48GB unified memory** (64GB+ recommended for better page cache hit rates)
- **~450GB free disk space during setup** (drops to ~215GB after cleanup)
- 1TB+ SSD (every Apple Silicon Mac has an internal SSD; check that yours is 1TB or larger)
- macOS 26.2+ (Darwin 25.2.0+)

### Disk Space Budget: Read This First

This is the #1 thing that will bite you. The setup has three phases of disk usage:

| Phase | Cumulative Disk Used | Notes |
|-------|---------------------|-------|
| Download MLX 4-bit model | ~210 GB | Source safetensors files |
| Git LFS cache (hidden) | ~420 GB | `.git/lfs/` holds a second copy |
| After `git lfs fetch --all` cleanup | ~210 GB | Delete `.git/lfs/` to reclaim |
| After `repack_experts.py` | ~420 GB | 210GB source + 209GB packed experts |
| After deleting source model | **~215 GB** | Final footprint |

**You need ~450GB free to start.** Plan your cleanup steps. On a 1TB drive, this means roughly half of your disk must be empty.

**Critical cleanup commands** (safe to run at each s...
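The Hardware Requirements list above can be checked before downloading anything. Below is a minimal Python sketch, assuming macOS's `sysctl -n hw.memsize` (physical memory in bytes) and Python's `platform.mac_ver()`; the helper names (`preflight`, `memory_verdict`, `macos_ok`) are hypothetical and not part of flash-moe. It does not check the chip tier (M3 Max vs. M4 Pro, etc.), only architecture, memory, and OS version.

```python
import platform
import subprocess

# Thresholds copied from the guide's Hardware Requirements list.
MIN_MEMORY_GB = 48          # stated minimum unified memory
RECOMMENDED_MEMORY_GB = 64  # recommended for better page-cache hit rates
MIN_MACOS = (26, 2)         # macOS 26.2+ (Darwin 25.2.0+)


def parse_version(text):
    """Turn a version string like '26.2.1' into a tuple of ints: (26, 2, 1)."""
    return tuple(int(part) for part in text.split(".") if part)


def macos_ok(version_text):
    """True if the macOS version is at least MIN_MACOS ('26' counts as 26.0)."""
    return parse_version(version_text)[:2] >= MIN_MACOS


def memory_verdict(mem_bytes):
    """Classify installed memory against the guide's thresholds."""
    gb = mem_bytes / 2**30  # a "64GB" Mac reports 64 * 2**30 bytes
    if gb < MIN_MEMORY_GB:
        return "too little"
    if gb < RECOMMENDED_MEMORY_GB:
        return "minimum"
    return "recommended"


def query_memory_bytes():
    """macOS only: `sysctl -n hw.memsize` prints physical memory in bytes."""
    result = subprocess.run(
        ["sysctl", "-n", "hw.memsize"], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


def preflight():
    """Return a list of problems; an empty list means the basic checks passed."""
    # A Python running under Rosetta reports x86_64 here, so use a native arm64 Python.
    if platform.system() != "Darwin" or platform.machine() != "arm64":
        return ["not running natively on an Apple Silicon Mac"]
    problems = []
    mem_bytes = query_memory_bytes()
    if memory_verdict(mem_bytes) == "too little":
        problems.append(
            f"only {mem_bytes / 2**30:.0f}GB unified memory; need {MIN_MEMORY_GB}GB+"
        )
    version = platform.mac_ver()[0]
    if not macos_ok(version):
        problems.append(f"macOS {version} is older than 26.2")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("OK" if not issues else "\n".join(issues))
```

A 48GB machine passes as "minimum"; per the guide, 64GB+ mainly buys a better page cache hit rate rather than correctness.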
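The disk-budget table above comes down to one number: the peak cumulative usage across phases. Below is a small Python sketch that encodes the table's rough figures and compares the peak against free space using `shutil.disk_usage`. The helper names are hypothetical (not part of flash-moe), and the 30GB headroom is an assumption chosen so that the ~420GB peak lines up with the guide's "~450GB free" advice.

```python
import shutil

# Cumulative disk usage per phase, in GB, copied from the guide's table.
# These are the guide's rough figures, not measurements.
BUDGET_GB = [
    ("Download MLX 4-bit model", 210),
    ("Git LFS cache (hidden)", 420),
    ("After git lfs fetch --all cleanup", 210),
    ("After repack_experts.py", 420),
    ("After deleting source model", 215),
]

# Assumption: ~30GB of slack on top of the peak, which is how the
# ~420GB peak lines up with the guide's "~450GB free" advice.
HEADROOM_GB = 30


def peak_phase(budget):
    """Return the (phase, GB) entry with the highest cumulative usage."""
    return max(budget, key=lambda entry: entry[1])  # ties keep the first one


def free_gb(path="."):
    """Free space on the volume holding `path`, in decimal GB like the table."""
    return shutil.disk_usage(path).free / 1e9


def check_disk(path=".", headroom_gb=HEADROOM_GB):
    """Compare free space against the peak phase plus headroom."""
    phase, peak = peak_phase(BUDGET_GB)
    needed = peak + headroom_gb
    available = free_gb(path)
    return available >= needed, needed, available, phase


if __name__ == "__main__":
    ok, needed, available, phase = check_disk(".")
    print(f"peak phase: {phase}; need ~{needed}GB, have {available:.0f}GB free")
    print("enough space" if ok else "free up space before downloading")
```

The ~420GB peak is hit twice. By the table's own figures, skipping the `.git/lfs/` cleanup before running `repack_experts.py` would stack the LFS copy on top of the repack phase (210 + 210 + 209 ≈ 630GB), which is why the cleanup order matters.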