0xC001
Sharing machine learning knowledge
DeepSeek-R1 v2 Released: A Full Walkthrough of the New Technical Details and Training Pipeline
Paper title: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
...
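DeepSeek-R1's RL stage is built on GRPO (Group Relative Policy Optimization). As background rather than the paper's implementation, here is a minimal sketch of the group-relative advantage at its core: each prompt gets a group of sampled completions, and each completion's reward is normalized against the group's own mean and standard deviation. Names are illustrative.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """GRPO-style advantages: normalize each completion's reward against
    the mean/std of the group sampled for the same prompt.

    rewards: shape (group_size,), one scalar reward per sampled completion.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 completions for one prompt, rewarded 1 if the final answer is correct.
rewards = np.array([1.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards))  # correct samples get positive advantage
```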
WAIT, WAIT, WAIT... WHY DO REASONING MODELS LOOP?
Paper title: WAIT, WAIT, WAIT... WHY DO REASONING MODELS LOOP?
Paper link: https://arxiv.
...
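The looping in question is the degenerate repetition reasoning models can fall into during long decoding. As background (not the paper's analysis), a minimal detector checks whether the generated suffix repeats with some small period:

```python
def trailing_loop_length(token_ids: list[int], max_period: int = 64) -> int:
    """Return the smallest period p such that the sequence currently ends
    with at least two full repetitions of its last p tokens, or 0 if the
    tail is not looping. A simple detector for degenerate decoding loops.
    """
    n = len(token_ids)
    for p in range(1, min(max_period, n // 2) + 1):
        if token_ids[n - p:] == token_ids[n - 2 * p : n - p]:
            return p
    return 0

# Example: "... A B C A B C" ends with a period-3 loop.
print(trailing_loop_length([7, 1, 2, 3, 1, 2, 3]))  # -> 3
```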
New from Danqi Chen's Team: The Mechanism Behind the Effectiveness of Negative Reinforcement in LLM Reasoning
Paper title: The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
...
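The paper studies training signals built from negative samples alone. The sketch below is an illustrative guess at the simplest such objective, a REINFORCE-style loss restricted to negatively rewarded completions; the function name and reward convention are assumptions, not the paper's formulation.

```python
import torch

def negative_only_pg_loss(logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style loss using only negatively rewarded samples:
    suppress the likelihood of incorrect completions, ignore correct ones.

    logprobs: (batch,) summed token log-probs of each sampled completion.
    rewards:  (batch,) e.g. +1 correct / -1 incorrect.
    """
    neg = rewards < 0
    if not neg.any():
        return logprobs.new_zeros(())
    # Maximizing reward means descending on logprob for negative samples,
    # so the loss to minimize is just their mean log-probability.
    return logprobs[neg].mean()
```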
Deep Dive into Recursive Language Models: Breaking the LLM Context Limit via Recursion and Environment Interaction
Paper title: RECURSIVE LANGUAGE MODELS
Paper link: https://arxiv.org/pdf/2512.24601v1
...
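The idea named in the title is that a language model call need not ingest its whole context: it can treat the context as an external environment and recursively spawn sub-calls on pieces of it. Below is a minimal map-reduce-style sketch under assumed interfaces; `llm(prompt)` is a hypothetical text-in/text-out callable, and fixed-size halving stands in for whatever decomposition strategy the paper actually uses.

```python
def recursive_answer(llm, question: str, context: str, max_chars: int = 8000) -> str:
    """Answer a question over a context that may exceed the LM window by
    recursively mapping the question over halves of the context, then
    reducing the partial answers with one more call.
    """
    if len(context) <= max_chars:
        return llm(f"Context:\n{context}\n\nQuestion: {question}")
    mid = len(context) // 2
    left = recursive_answer(llm, question, context[:mid], max_chars)
    right = recursive_answer(llm, question, context[mid:], max_chars)
    # Reduce step: combine the two partial answers.
    return llm(
        f"Combine these partial answers to: {question}\n"
        f"Answer A: {left}\nAnswer B: {right}"
    )
```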
On KL Regularization in RL Training of Large Language Models
Paper title: A COMEDY OF ESTIMATORS: ON KL REGULARIZATION IN RL TRAINING OF LLMS
...
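The "estimators" in the title are the Monte-Carlo approximations of the KL penalty used in LLM RL objectives. For samples x ~ pi_theta with log-ratio log r = log pi_ref(x) - log pi_theta(x), the three standard estimators of KL(pi_theta || pi_ref), commonly called k1, k2, and k3, are textbook background and sketched below (this is not the paper's contribution):

```python
import numpy as np

def kl_estimators(logp_theta: np.ndarray, logp_ref: np.ndarray):
    """Per-sample estimators of KL(pi_theta || pi_ref), samples from pi_theta.

    k1: -log_r            unbiased, high variance, can go negative
    k2: 0.5 * log_r**2    biased, low variance
    k3: r - 1 - log_r     unbiased and always nonnegative
    """
    log_r = logp_ref - logp_theta
    k1 = -log_r
    k2 = 0.5 * log_r ** 2
    k3 = np.exp(log_r) - 1.0 - log_r
    return k1, k2, k3
```

k3 is unbiased because E[r] = 1 under samples from pi_theta, and nonnegative because r - 1 - log r >= 0 for all r > 0, which is why it is a popular default in LLM RL codebases.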
Deep Dive into Alibaba's New Work Let It Flow: Agentic Crafting on Rock and Roll
Paper title: Building the ROME Model within an Open Agentic Learning Ecosystem
...
DeepSeek's New mHC: Why Constrain Hyper-Connections to a Manifold?
Paper title: mHC: Manifold-Constrained Hyper-Connections
Paper link: https://arxiv.org/pd
...
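Hyper-connections, which mHC constrains, generalize the residual connection: the hidden state is widened into n parallel streams, with learnable weights reading the layer input from the streams, mixing the streams, and writing the layer output back. The static variant below is an illustrative sketch with assumed shapes; it does not include the manifold constraint on the mixing weights that the paper adds.

```python
import torch

class StaticHyperConnection(torch.nn.Module):
    """Residual block over n parallel streams with learnable mixing:
    a minimal static hyper-connection sketch (illustrative, not mHC).
    """

    def __init__(self, n_streams: int, layer: torch.nn.Module):
        super().__init__()
        self.layer = layer
        # Read the layer input from the streams, mix streams with a matrix,
        # and write the layer output back with per-stream weights.
        self.read = torch.nn.Parameter(torch.full((n_streams,), 1.0 / n_streams))
        self.mix = torch.nn.Parameter(torch.eye(n_streams))
        self.write = torch.nn.Parameter(torch.ones(n_streams))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (n_streams, batch, d). Layer input is a weighted sum of streams.
        x = torch.einsum("n,nbd->bd", self.read, h)
        y = self.layer(x)
        h = torch.einsum("mn,nbd->mbd", self.mix, h)
        return h + self.write[:, None, None] * y[None]

block = StaticHyperConnection(2, torch.nn.Linear(16, 16))
out = block(torch.randn(2, 4, 16))  # two streams, batch 4, width 16
```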
New from ByteDance Seed: Tightly Coupling MoE Experts and Routers via an Auxiliary Loss (ERC Loss)
Happy New Year, everyone~
Paper title: Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
...
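The exact ERC formulation can't be reconstructed from the title alone, so for reference the sketch below shows the most common member of the MoE auxiliary-loss family it belongs to, the Switch-Transformer-style load-balancing loss; it is explicitly not the ERC loss, and all names are illustrative.

```python
import torch

def load_balance_aux_loss(router_probs: torch.Tensor, expert_index: torch.Tensor,
                          n_experts: int) -> torch.Tensor:
    """Switch-Transformer-style load-balancing auxiliary loss (NOT ERC).

    router_probs: (tokens, n_experts) softmax outputs of the router.
    expert_index: (tokens,) index of the expert each token was routed to.
    """
    # f_i: fraction of tokens dispatched to expert i.
    f = torch.bincount(expert_index, minlength=n_experts).float()
    f = f / expert_index.numel()
    # P_i: mean router probability assigned to expert i.
    p = router_probs.mean(dim=0)
    return n_experts * torch.sum(f * p)

probs = torch.softmax(torch.randn(32, 4), dim=-1)
idx = probs.argmax(dim=-1)  # top-1 routing
print(load_balance_aux_loss(probs, idx, n_experts=4))
```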
Bottom-up Policy Optimization: The Sub-Policies Hidden Inside Your Language Model
Paper title: Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Sub-Policies
...
New from Google DeepMind: Emergent Temporal Abstractions in Autoregressive Models Enable Hierarchical Reinforcement Learning
Paper title: Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
...
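In hierarchical RL, "temporal abstraction" classically refers to the options framework of Sutton, Precup and Singh (1999): a high-level policy selects temporally extended sub-policies, each running until its own termination condition fires. A minimal rendering of that standard formalism follows (background only, not the paper's construction); `env_step` is an assumed interface.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    """An option in the classical sense: a temporally extended action."""
    policy: Callable[[object], object]    # intra-option policy pi(s) -> action
    terminate: Callable[[object], bool]   # termination condition beta(s) -> bool

def run_option(env_step, state, option: Option, max_steps: int = 100):
    """Execute one option until its termination condition fires.
    env_step(state, action) -> next_state is an assumed interface.
    """
    for _ in range(max_steps):
        state = env_step(state, option.policy(state))
        if option.terminate(state):
            break
    return state
```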