Deepseek r1 vs OpenAI o1
Posted by: 张旺教授 [☆★★ reputation rank 12 ★★☆] on 2025-01-25 8:40 · 971 reads · 1 like


Notes on Deepseek r1: Just how good it is compared to OpenAI o1

Discussion




Finally, there is a model worthy of the hype it has been getting since Claude 3.6 Sonnet. Deepseek has released something hardly anyone expected: a reasoning model on par with OpenAI’s o1, within a month of the v3 release, under an MIT license and at 1/20th of o1’s cost.

This is easily the best release since GPT-4. It's wild; the general public seems excited about this, while the big AI labs are probably scrambling. It feels like things are about to speed up in the AI world. And it's all thanks to this new DeepSeek-R1 model and how they trained it. 

Some key details from the paper

Pure RL (GRPO) on v3-base to get r1-zero. (No Monte-Carlo Tree Search or Process Reward Modelling)

The model uses “Aha moments” as pivot tokens to reflect and reevaluate answers during CoT.

To overcome r1-zero’s readability issues, v3 was SFT’d on cold-start data.

Distillation works: small models like Qwen and Llama trained on r1-generated data show significant improvements.
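The GRPO mentioned above can be summarized in a few lines: instead of a learned critic, each prompt's sampled completions are scored and normalized against their own group. Here is a minimal sketch of that group-relative advantage step; the function and variable names are illustrative, not taken from the paper's code.

```python
# Sketch of GRPO's group-relative advantage computation, the core idea
# behind the pure-RL training of r1-zero. Names are illustrative.
import statistics

def group_relative_advantages(rewards):
    """Normalize each reward against its own sampling group.

    GRPO drops the value/critic network: for every prompt it samples a
    group of completions, scores them (e.g. with a rule-based reward),
    and baselines each score with the group's mean and std deviation.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Example: four sampled answers to one prompt, scored 1.0 if the final
# answer is correct and 0.0 otherwise.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# correct answers get positive advantage, wrong ones negative
```

These advantages then weight the policy-gradient update for each completion's tokens, so the model is pushed toward answers that beat the group average.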

Here’s the overall r1-zero pipeline:

v3 base + RL (GRPO) → r1-zero

And the r1 training pipeline:

DeepSeek-V3 Base + SFT (Cold Start Data) → Checkpoint 1

Checkpoint 1 + RL (GRPO + Language Consistency) → Checkpoint 2

Checkpoint 2 used to Generate Data (Rejection Sampling)

DeepSeek-V3 Base + SFT (Generated Data + Other Data) → Checkpoint 3

Checkpoint 3 + RL (Reasoning + Preference Rewards) → DeepSeek-R1
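The four stages above can be sketched as plain function composition. The stage names mirror the steps listed in the post; the function bodies are string placeholders standing in for real training runs, not DeepSeek's actual code.

```python
# Hedged sketch of the r1 training pipeline as function composition.
# sft() and rl() are placeholders for real SFT and RL training runs.

def sft(base, data):
    return f"SFT({base}, {data})"

def rl(model, objective):
    return f"RL({model}, {objective})"

def r1_pipeline():
    base = "DeepSeek-V3-Base"
    ckpt1 = sft(base, "cold-start CoT data")                    # fix readability
    ckpt2 = rl(ckpt1, "GRPO + language-consistency reward")     # reasoning RL
    generated = f"rejection-sampled data from {ckpt2}"          # data generation
    ckpt3 = sft(base, generated + " + non-reasoning SFT data")  # SFT on fresh base
    return rl(ckpt3, "reasoning + preference rewards")          # final RL -> R1
```

Note the detail that is easy to miss: the second SFT stage starts again from the v3 base, not from Checkpoint 2, using the data that Checkpoint 2 generated.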

We know the benchmarks, but just how good is it?
Deepseek r1 vs OpenAI o1.

So, for this, I tested r1 and o1 side by side on complex reasoning, math, coding, and creative writing problems, using questions that until now only o1 could solve, or that no model could.

Here’s what I found:

For reasoning, it is much better than any SOTA model before o1. It is better than o1-preview but a notch below o1, which the ARC-AGI benchmark also shows.

Mathematics: the same story; r1 is a killer, but o1 is still better.

Coding: I didn’t get to play much, but on first look, it’s up there with o1, and the fact that it costs 20x less makes it the practical winner.

Writing: This is where R1 takes the lead. It gives the same vibes as early Opus. It’s free, less censored, has much more personality, is easy to steer, and is very creative compared to the rest, even o1-pro.

What interested me was how free the model's responses and thought traces sounded, akin to a human internal monologue. Perhaps this is due to less stringent RLHF than in US models.

The fact that you can get r1 from v3 via pure RL was the most surprising.

For in-depth analysis, commentary, and remarks on the Deepseek r1, check out this blog post: Notes on Deepseek r1

What are your experiences with the new Deepseek r1? Did you find the model useful for your use cases?
