0xC001
Sharing machine learning knowledge
New work from Danqi Chen's group, Retaining by Doing: why RL mitigates catastrophic forgetting better than SFT
Paper title: Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting
...
Microsoft proposes GAD: on-policy distillation of GPT-5 via generative adversarial distillation
Paper title: Black-Box On-Policy Distillation of Large Language Models
Paper link: https:
...
LightReasoner: a contrastive learning framework where small models guide large-model reasoning
Paper title: LIGHTREASONER: CAN SMALL LANGUAGE MODELS TEACH LARGE LANGUAGE MODELS RE
...
Weibo AI releases a 1.5B small model, achieving SOTA-level reasoning at low cost
Paper title: Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Mode
...
Meta FAIR releases HERO: integrating sparse and dense rewards in LLM reinforcement learning
Paper title: Hybrid Reinforcement: When Reward Is Sparse, It’s Better to Be Dense
...
Going straight for the model's "weak spots": how Alibaba's MIWV uses 1% of the data to beat full-data fine-tuning
Paper title: Importance-Aware Data Selection for Efficient LLM Instruction Tuning
...
Meta AI releases RIFL: rubric-based reinforcement learning to improve LLM instruction following
Paper title: Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM
...
A NeurIPS 2025 full-score paper: the ceiling of RL for LLMs is locked in by the base model
Paper title: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LL
...
Xiaohongshu releases RedOne 2.0: a practical guide to post-training large models for the SNS domain
Paper title: RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Netw
...
EMNLP 2025 main-conference paper walkthrough: Towards Automated Error Discovery
Paper title: Towards Automated Error Discovery
Paper link: https://arxiv.org/pdf/2509.10833
...