0xC001
Sharing machine learning knowledge
174 articles · 0 comments · 405 likes
NVIDIA proposes end-to-end RL orchestration: an 8B model surpasses GPT-5 on the HLE leaderboard
Paper title: ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orch
...
Is LayerNorm really indispensable? Understanding Derf, which goes beyond normalization layers
Paper title: Stronger Normalization-Free Transformers
Paper link: https://arxiv.org/pdf/2
...
UBC & DeepMind reveal the "short-context dominance" phenomenon: 80% of generation tasks need only the last 96 tokens
Paper title: SHORT-CONTEXT DOMINANCE: HOW MUCH LOCAL CONTEXT NATURAL LANGUAGE ACTUAL
...
Google DeepMind & MIT release a scaling law for agents
Paper title: Towards a Science of Scaling Agent Systems
Paper link: https://arxiv.org/pdf
...
Native Parallel Reasoner: a native parallel reasoning framework based on self-distilled reinforcement learning
Paper title: Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled R
...
The interplay of pre-training, mid-training, and reinforcement learning in reasoning models
Paper title: On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Lan
...
Differential Smoothing: mitigating distribution collapse in RL fine-tuning and improving LLM reasoning
Paper title: Differential Smoothing Mitigates Sharpening and Improves LLM Reasoning
...
Natural Language Actor-Critic (NLAC): scalable off-policy learning in language space
Paper title: Natural Language Actor-Critic: SCALABLE OFF-POLICY LEARNING IN LANGUAGE
...
AAAI 2026: DeltaEdit enables sequential knowledge editing for LLMs
Paper title: On the Superimposed Noise Accumulation Problem in Sequential Knowledge
...
Do your Search-R1 reproductions keep failing? The real culprit behind unstable GRPO training, and how to counter it
Paper title: On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death S
...