Insight for: I did get it working, with a lot of pain; if you're interested, here's a README I had Claude crank out capturing the gotchas.

`Flash-MoE` for running large MoE models (Qwen3.5-397B-A17B) locally on Apple Silicon Macs.

Analyzed: Apr 1, 2026

This issue serves as a 'gotcha' guide for `Flash-MoE`, documenting how complex the setup is for running a massive MoE model locally on Apple Silicon. The primary pain points are the temporary disk space requirement (~450 GB) and the need for at least 48 GB of unified memory. For B2B SaaS, 'zero cloud dependency' is a strong value proposition for data privacy and cost control, but setup requirements this demanding create a high barrier to entry. Enterprises that want to deploy large models on edge devices or developer workstations need deployment processes that are streamlined and less resource-intensive. This points to a market need for more efficient model packaging, automated resource management, and clearer onboarding for local LLM inference.
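As an illustration of the kind of automated resource check described above, here is a minimal preflight sketch. It is not part of `Flash-MoE`; the function name `preflight` and the constants are hypothetical, and the thresholds are simply the approximate figures quoted in the issue (~450 GB of temporary disk space, 48 GB of unified memory).

```python
import os
import shutil

GB = 10**9
REQUIRED_DISK_GB = 450  # approximate temporary disk space quoted in the issue
REQUIRED_MEM_GB = 48    # minimum unified memory quoted in the issue


def preflight(free_disk_bytes, total_mem_bytes):
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    if free_disk_bytes < REQUIRED_DISK_GB * GB:
        problems.append(
            f"need ~{REQUIRED_DISK_GB} GB free disk, "
            f"found {free_disk_bytes / GB:.0f} GB"
        )
    if total_mem_bytes < REQUIRED_MEM_GB * GB:
        problems.append(
            f"need {REQUIRED_MEM_GB} GB+ memory, "
            f"found {total_mem_bytes / GB:.0f} GB"
        )
    return problems


if __name__ == "__main__":
    # Free space on the volume that holds the current directory.
    free = shutil.disk_usage(".").free
    # Physical memory via sysconf; treated as 0 where these names are unavailable.
    try:
        mem = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        mem = 0
    for problem in preflight(free, mem):
        print("WARNING:", problem)
```

Running a check like this before starting a multi-hundred-gigabyte download would surface the two hard requirements up front rather than partway through setup.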

Flash-MoE, Qwen3.5-397B-A17B, MoE model, Apple Silicon Mac, M4 Max 64GB MacBook Pro, ~5 tok/s, interactive chat, OpenAI-compatible API server, zero cloud dependency, unified memory, disk space, MLX 4-bit model, safetensors files, Git LFS cache, repack_experts.py

GitHub Issue

Parent Entity

I did get it working, with a lot of pain; if you're interested, here's a README I had Claude crank out capturing the gotchas.

State: Open