Insight for: I did get it working, with a lot of pain. If you're interested, here's a README I had Claude crank out capturing the gotchas.
`Flash-MoE` for running large MoE models (Qwen3.5-397B-A17B) locally on Apple Silicon Macs.
This issue provides a critical "gotchas" guide for `Flash-MoE` that shows how much setup it takes to run a massive MoE model locally on Apple Silicon. The two biggest pain points are the roughly 450GB of temporary disk space the setup requires and the need for a high-end Mac with 48GB or more of unified memory.

For B2B SaaS, "zero cloud dependency" is a strong value proposition for data privacy and cost control, but setup requirements this demanding create a high barrier to entry. Enterprises that want to deploy large models on edge devices or developer workstations need deployment processes that are streamlined and less resource-intensive. This points to a market need for more efficient model packaging, automated resource management, and a clearer, less painful onboarding path for local LLM inference.
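As an illustration of the "automated resource management" point, a setup script could refuse to start when the host falls short of the two requirements above. The sketch below is hypothetical: it is not part of `Flash-MoE`, and the names `check_resources` and `probe_host` are made up for this example. It assumes the disk figure is in decimal GB (as macOS Finder reports it) and the memory figure is in binary GiB (as RAM is sold), and it reads host values with the standard library's `shutil.disk_usage` and POSIX `os.sysconf`.

```python
import os
import shutil

# Thresholds from the issue summary. Units are an assumption:
# disk in decimal GB (10**9 bytes), memory in GiB (2**30 bytes).
REQUIRED_DISK_BYTES = 450 * 10**9  # ~450 GB temporary disk space
REQUIRED_MEM_BYTES = 48 * 2**30    # 48 GB+ unified memory


def check_resources(free_disk_bytes: int, total_mem_bytes: int) -> list[str]:
    """Return human-readable problems; an empty list means the host looks OK."""
    problems = []
    if free_disk_bytes < REQUIRED_DISK_BYTES:
        short_gb = (REQUIRED_DISK_BYTES - free_disk_bytes) / 10**9
        problems.append(f"need ~{short_gb:.0f} GB more free disk space")
    if total_mem_bytes < REQUIRED_MEM_BYTES:
        problems.append(
            f"only {total_mem_bytes / 2**30:.0f} GiB memory; needs 48 GiB+"
        )
    return problems


def probe_host(path: str = ".") -> tuple[int, int]:
    """Read free disk space at `path` and total physical memory (POSIX only)."""
    free = shutil.disk_usage(path).free
    mem = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return free, mem


if __name__ == "__main__":
    # Hypothetical preflight: report shortfalls before starting a long setup.
    issues = check_resources(*probe_host())
    for msg in issues:
        print("preflight:", msg)
    if not issues:
        print("preflight: OK")
```

Pass the directory where the model files will be written to `probe_host`, since free space is measured per volume. Keeping the threshold check as a pure function separate from the host probing makes it easy to test without a 450GB disk at hand.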
GitHub Issue
SaaS Metrics