Engineers behind the viral Chinese artificial intelligence (AI) reasoning model DeepSeek-R1 have detailed the science behind its training.
Upon its release in January, the open-source model developed by Hangzhou-based AI start-up DeepSeek sent shock waves through the industry when it became a challenger to US-based OpenAI’s industry-leading o1 model.
Now, the DeepSeek AI team has revealed how it used reward-based reinforcement learning to train the R1 model to solve problems, allowing the team to bypass some of the costly computational and scaling barriers to teaching AI models to reason like humans.
“General reasoning represents a long-standing and formidable challenge in artificial intelligence,” the team said in a paper published in the peer-reviewed journal Nature on Wednesday.
Reasoning, or the logical process of using existing knowledge and new information to form conclusions, is a cornerstone of human cognition.
It makes complex cognitive tasks such as mathematical problem solving possible, making it a key element in developing more advanced, humanlike AI.