0xC001
Sharing machine learning knowledge
205 articles · 0 comments · 473 likes
A Survey of Agentic Reasoning for Large Language Models: Foundations, Evolution, and Collaboration
Paper title: Agentic Reasoning for Large Language Models: Foundations, Evolution, Co
...
JudgeRLVR: Judge First, Then Generate, Breaking the Efficiency Paradox of "Long Chain-of-Thought" in Reasoning Models
Paper title: JudgeRLVR: Judge First, Generate Second for Efficient Reasoning
Paper link:
...
Your GRPO Advantage Estimates Are Biased: Statistical Pitfalls in GRPO and the HA-DW Correction
Paper title: Your Group-Relative Advantage Is Biased
Paper link: https://arxiv.org/pdf/26
...
Meta Proposes Dr. Zero: A Self-Evolving Search Agent Trained with Zero Data
Paper title: Dr. Zero: Self-Evolving Search Agents without Training Data
Paper link: http
...
A Deep Dive into Ministral 3: A Training Methodology for Parameter-Efficient Dense Models Based on Cascade Distillation
Paper title: Ministral 3
Paper link: https://arxiv.org/pdf/2601.08584
TL;DR
Mistral AI
...
Sparse-RL: Breaking the GPU Memory Wall in LLM Reinforcement Learning via Stable Sparse Rollouts
Paper title: Sparse-RL: Breaking the Memory Wall in LLM Reinforcement Learning via S
...
Qwen Releases ArenaRL: Tackling the Reward Modeling Problem for Open-Ended Agents
Paper title: ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative
...
NTU & Tongyi Propose AgentOCR: Reconstructing Agent History via Optical Self-Compression
Paper title: AgentOCR: Reimagining Agent History via Optical Self-Compression
Paper link
...
DroPE: Extending the LLM Context Window by Removing Positional Encodings After Pretraining
Paper title: Extending the Context of Pretrained LLMs by Dropping Their Positional E
...
An In-Depth Reading of DeepSeek's New Paper, Engram
Paper title: Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Larg
...