'How DeepSeek-R1 Was Built, for Dummies' bookmarked by ~hackernews | Tefter

January 27

How DeepSeek-R1 Was Built, for Dummies

vellum.ai

TL;DR: DeepSeek just made a breakthrough: you can train a model to match OpenAI o1-level reasoning using pure reinforcement learning (RL) without labeled data (DeepSeek-R1-Zero). But RL alone isn’t perfect: it can lead to challenges like poor readability. A mix of methods in multi-stage training fixes these (...

More like this (1)

March 5

QwQ-32B: Embracing the Power of Reinforcement Learning

qwenlm.github.io

Scaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models. For instance, DeepSeek R...