DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

General reasoning represents a long-standing and formidable challenge in artificial intelligence. Recent breakthroughs, exemplified by large language models (LLMs) and chain-of-thought prompting, have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent upon extensive human-annotated demonstrations, and models' capabilities are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labeled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification, and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions, and STEM fields, surpassing its counterparts trained via conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically harnessed to guide and enhance the reasoning capabilities of smaller models.

Preprint, 2026, arXiv, open access

Authors: DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Qu, Hui Li, Jianzhong Guo, Jiashi Li, Jiawei Wang, Jingchang Chen, Jingyang Yuan, Junjie Qiu, Junlong Li, J. L. Cai, Jiaqi Ni, Jian Liang, Jin Chen, Kai Dong, Kai Hu, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Liang Zhao, Litong Wang, Liyue Zhang, Lei Xu, Leyi Xia, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Meng Li, Miaojun Wang, Mingming Li, Ning Tian, Panpan Huang, Peng Zhang, Qiancheng Wang, Qinyu Chen, Qiushi Du, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, R. J. Chen, R. L. Jin, Ruyi Chen, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shengfeng Ye, Shiyu Wang, Shuiping Yu, Shunfeng Zhou, Shuting Pan, S. S. Li, Shuang Zhou, Shaoqing Wu, Tao Yun, Tian Pei, Tianyu Sun, T. Wang, Wangding Zeng, Wanjia Zhao, Wen Liu, Wenfeng Liang, Wenjun Gao, Wenqin Yu, Wentao Zhang, W. L. Xiao, Wei An, Xiaodong Liu, Xiaohan Wang, Xiaokang Chen, Xiaotao Nie, Xin Cheng, Xin Liu, Xin Xie, Xingchao Liu, Xinyu Yang, Xinyuan Li, Xuecheng Su, Xuheng Lin, X. Q. Li, Xiangyue Jin, Xiaojin Shen, Xiaosha Chen, Xiaowen Sun, Xiaoxiang Wang, Xinnan Song, Xinyi Zhou, Xianzu Wang, Xinxia Shan, Y. K. Li, Y. Q. Wang, Y. X. Wei, Yang Zhang, Yanhong Xu, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Wang, Yi Yu, Yichao Zhang, Yifan Shi, Yiliang Xiong, Ying He, Yishi Piao, Yisong Wang, Yixuan Tan, Yiyang Ma, Yiyuan Liu, Yongqiang Guo, Yuan Ou, Yuduan Wang, Yue Gong, Yuheng Zou, Yujia He, Yunfan Xiong, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuyang Zhou, Y. X. Zhu, Yanping Huang, Yaohui Li, Yi Zheng, Yuchen Zhu, Yunxian Ma, Ying Tang, Yukun Zha, Yuting Yan, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhenda Xie, Zhengyan Zhang, Zhewen Hao, Zhicheng Ma, Zhigang Yan, Zhiyu Wu, Zihui Gu, Zijia Zhu, Zijun Liu, Zilin Li, Ziwei Xie, Ziyang Song, Zizheng Pan, Zhen Huang, Zhipeng Xu, Zhongyu Zhang, Zhen Zhang
