DroPE: Extending LLM Context Windows by Removing Positional Encodings After Pretraining
Paper title: Extending the Context of Pretrained LLMs by Dropping Their Positional E
...
An In-Depth Read of DeepSeek's New Engram Paper
Paper title: Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Larg
...
Anthropic Proposes EDL to Quantify the Generalization Ability of Large Models
Paper title: Excess Description Length of Learning Generalizable Predictors
Paper link: h
...
Xiaomi MiMo-V2-Flash Technical Report: MoE Architecture, Hybrid Attention, and Multi-Teacher Online Distillation
Paper title: MiMo-V2-Flash Technical Report
Paper link: https://arxiv.org/pdf/2601.02780
...
NVIDIA Proposes GDPO: A Decoupled-Normalization Policy for Multi-Reward Reinforcement Learning
Paper title: GDPO: Group reward-Decoupled Normalization Policy Optimization for Mult
...
DeepSeek-R1 v2 Released: A Full Walkthrough of the Newly Added Technical Details and Training Pipeline
Paper title: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforceme
...
WAIT, WAIT, WAIT... WHY DO REASONING MODELS LOOP?
Paper title: WAIT, WAIT, WAIT... WHY DO REASONING MODELS LOOP?
Paper link: https://arxiv.
...
New Work from Danqi Chen's Team: The Mechanism Behind the Effectiveness of Negative Reinforcement in LLM Reasoning
Paper title: The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
...
A Deep Dive into Recursive Language Models: Breaking LLM Context Limits via Recursion and Environment Interaction
Paper title: RECURSIVE LANGUAGE MODELS
Paper link: https://arxiv.org/pdf/2512.24601v1
T
...
On KL Regularization in RL Training of Large Language Models
Paper title: A COMEDY OF ESTIMATORS: ON KL REGULARIZATION IN RL TRAINING OF LLMS
Paper link:
...
