Gemini Executive Synthesis

Academic integrity and proper citation practices in MoonshotAI's research papers.

Technical Positioning
Addressing concerns about the originality and proper attribution of research by ensuring all relevant prior work is cited, particularly when similarities to other published papers are noted.
SaaS Insight & Market Implications
This issue raises a serious concern about academic integrity, specifically the absence of citations in MoonshotAI's Attention Residuals paper despite strong similarities to another arXiv publication. According to the reporter, this is not an isolated incident; they point to a previous issue filed against Kimi-Linear. For any B2B SaaS built on research and innovation, a reputation for scientific rigor and ethical conduct is paramount. Allegations of insufficient citation or potential plagiarism can severely damage credibility, affecting partnerships, funding, and talent acquisition. Transparency and proper attribution are non-negotiable for establishing trust in the market and within the scientific community.
Proprietary Technical Taxonomy
参考引用 (citations) · arxiv.org · 几乎一样 (nearly identical)

Raw Developer Origin & Technical Request

GitHub Issue · Mar 17, 2026
Repo: MoonshotAI/Attention-Residuals
Why does the paper contain no citations at all?

arxiv.org/abs/2502.06785 is nearly identical to this paper, yet it is not mentioned anywhere.
The same thing happened before:
github.com/MoonshotAI/Kimi-L...

Developer Debate & Comments

chuanyang-Zheng • Mar 17, 2026
> https://arxiv.org/abs/2502.06785 is nearly identical to this paper, yet it is not mentioned anywhere. The same thing happened before: [MoonshotAI/Kimi-Linear](https://github.com/MoonshotAI/Kimi-Linear/issues/4)

Attention Residual is quadratic attention over the layer dimension, while Hyper Connection is linear attention over the layer dimension. There is earlier work along the layer dimension, for example [Learning Deep Transformer Models for Machine Translation](https://arxiv.org/abs/1906.01787) and [Depth-Wise Attention (DWAtt)](https://arxiv.org/abs/2209.15168). Compared with linear approaches such as Hyper Connection, Attention Residual may show a clear improvement when the number of layers is large (e.g. more than 1000).

The information that attention moves between layers is essentially performing gradient descent: Pre-Norm and Post-Norm are optimization on a manifold, where x is the current point and ffn(x) or att(x) is the corresponding gradient, so this view opens up many interesting directions. Seen this way, Attention Residual behaves like SGD with momentum. We can also model the information flow as gradient descent on a manifold: [GeoNorm: Unify Pre-Norm and Post-Norm with Geodesic Optimization](https://arxiv.org/abs/2601.22095). I work mainly on architecture research; feel free to reach out to discuss.
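To make the distinction in the comment above concrete, here is a minimal sketch of "attention over the layer dimension" as described; it is not the repository's or the paper's implementation. The class name `LayerDimAttentionResidual`, the per-layer MLP blocks, and the single dot-product scoring over stored layer outputs are all simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerDimAttentionResidual(nn.Module):
    """Toy depth-wise attention residual: each block forms its input by
    attending over the outputs of all previous blocks (quadratic in depth)."""

    def __init__(self, d_model: int, n_layers: int):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                          nn.Linear(d_model, d_model))
            for _ in range(n_layers)
        ])
        # projections used to score earlier layer outputs (assumed scheme)
        self.q = nn.Linear(d_model, d_model, bias=False)
        self.k = nn.Linear(d_model, d_model, bias=False)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); `history` stores every layer's output
        history = [x]
        for block in self.blocks:
            h = torch.stack(history, dim=0)                 # (L, batch, seq, d)
            q = self.q(history[-1]).unsqueeze(0)            # (1, batch, seq, d)
            scores = (self.k(h) * q).sum(-1) * self.scale   # (L, batch, seq)
            w = F.softmax(scores, dim=0).unsqueeze(-1)      # attention over layers
            residual = (w * h).sum(dim=0)                   # weighted mix of layers
            out = residual + block(residual)                # standard residual update
            history.append(out)
        return history[-1]

# usage sketch
model = LayerDimAttentionResidual(d_model=64, n_layers=4)
y = model(torch.randn(2, 16, 64))   # -> (2, 16, 64)
```

Because each block re-scores the full history of layer outputs, the cost of forming the residual grows with depth, which is the contrast the comment draws with Hyper Connection, where earlier states are mixed through learned weights at roughly constant per-layer cost.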
xxyh1993 • Mar 31, 2026
Huh? Did we not download the same technical report?
cho104 • Mar 31, 2026
I'm a bit confused by the flow of this thread. The OP originally linked to the "DeepCrossAttention paper" (published Feb 10, 2025). Since that paper's concepts seem very closely related to this repository's "Attention Residuals" (published Mar 16, 2026), the OP's concern about a missing citation feels completely valid. (A similar concern was raised on X: https://x.com/behrouz_ali/status/2033581834953453853 ) The response, however, seems to focus on comparing Attention Residuals to Hyper Connection, bypassing DeepCrossAttention. I checked the edit history, and it doesn't look like the OP ever changed their URL... Was there a mix-up when reading the initial issue?
yisar • Mar 31, 2026
I just posted this offhand. The gist is: there are already papers that apply attention to residual connections, and the results are generally not very good, yet this isn't mentioned at all. The earlier linear-layer paper was likewise nearly identical to RWKV, and reading the two together, the authors have clearly seen similar work.

Adjacent Repository Pain Points

Other highly discussed features and pain points extracted from MoonshotAI/Attention-Residuals.

Extracted Positioning
Community engagement/acknowledgment for MoonshotAI's Attention-Residuals.
Fostering community interaction and acknowledging interest in the Attention-Residuals project, even through informal 'check-in' comments.
Extracted Positioning
Compatibility and synergistic benefits of Attention Residuals with mHC (presumably a Hyper-Connections variant).
Exploring the potential for combining Attention Residuals with mHC to achieve superior performance or efficiency, indicating a focus on architectural integration and optimization.
Extracted Positioning
Implementation code for Full Attention Residuals.
Providing concrete implementation code for Full Attention Residuals to validate theoretical understanding and ensure correct application of the technique, especially where only pseudocode for Block Attention Residuals is available.
Extracted Positioning
Code availability for the 'Attention Residuals' technique.
Providing practical implementation code to enable developers to utilize the 'Attention Residuals' technique, moving beyond theoretical descriptions.
Extracted Positioning
`AttnRes` (Attention-Residuals) framework, specifically its limitations in handling 'attention saturation' and 'phase transitions' during 'long-horizon human–AI interactions.'
Enhancing `AttnRes` to manage complex, extended human-AI interactions by introducing dynamic attention modulation and supervisory interventions.

Engagement Signals

4 Replies
Issue Status: open

Cross-Market Term Frequency

Quantifies the cross-market adoption of foundational terms like 参考引用 (citations) and arxiv.org by tracking occurrence frequency across active SaaS architectures and enterprise developer debates.