Question about Helios-Base speed in Table 3

PKU-YuanGroup/Helios

Status: Open

Opened: Apr 2, 2026

Comments: 1

Hi, thanks for the great work! I have a question about the speed comparison in Table 3.

In Table 3, Helios-Base (14B) achieves **0.54 FPS** while Wan 2.1 14B achieves **0.33 FPS**. However, I'm confused by this, because:

1. **Helios-Base uses 50 sampling steps** (as stated in Section 5.1: "For Stages 1–2, we adopt UniPC scheduler with 50 sampling steps"), which is the same as the original Wan 14B.
2. **Multi-Term Memory Patchification** is designed to compress the historical context XHist. But for pure T2V tasks (where XHist = all zeros, as mentioned in Section 3.1.1: "if XHist is all zeros, the model performs T2V"), there's no history to compress.

**My questions:**

1. Was the 81-frame benchmark in Table 3 evaluated using **autoregressive chunk-by-chunk generation** (e.g., 9 frames per chunk) or **single-pass bidirectional generation**?
2. If it was autoregressive generation, how many frames were generated per chunk? And what is the actual token count reduction from Multi-Term Memory Patchification?
3. If it was single-pass generation, what caused Helios-Base to be faster than Wan 14B? The token compression should only take effect when there is actual history context.

Thanks for your attention!
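To make the gap concrete, here is a quick back-of-the-envelope calculation using only the numbers quoted above. It is a minimal sketch that assumes FPS in Table 3 means generated frames divided by wall-clock seconds, which the paper excerpt quoted here does not confirm. The function and variable names are illustrative and do not come from the Helios codebase.

```python
# Figures quoted from Table 3 and the 81-frame benchmark mentioned above.
HELIOS_BASE_FPS = 0.54
WAN_14B_FPS = 0.33
NUM_FRAMES = 81


def wall_clock_seconds(num_frames: int, fps: float) -> float:
    """Time to generate num_frames, ASSUMING fps = frames / wall-clock seconds."""
    return num_frames / fps


# Relative throughput of Helios-Base over Wan 2.1 14B.
speedup = HELIOS_BASE_FPS / WAN_14B_FPS

# Implied end-to-end generation time for one 81-frame clip under the assumption above.
helios_seconds = wall_clock_seconds(NUM_FRAMES, HELIOS_BASE_FPS)
wan_seconds = wall_clock_seconds(NUM_FRAMES, WAN_14B_FPS)

print(f"speedup: {speedup:.2f}x")
print(f"Helios-Base: {helios_seconds:.1f} s, Wan 2.1 14B: {wan_seconds:.1f} s")
```

With the same 50 sampling steps on both sides, a throughput ratio of roughly 1.6x would have to come from a lower cost per step, which is why the generation mode and the token count per forward pass matter here.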

Python
