The latest speculation about DeepSeek-R2 – the successor to the R1 reasoning model released in January – surfaced over the weekend, including claims of the product's imminent launch and purported new benchmarks for cost-efficiency and performance.
That reflects heightened online interest in DeepSeek after it generated worldwide attention from late December 2024 to January by consecutively releasing two advanced open-source AI models, V3 and R1, which were built at a fraction of the cost and computing power that major tech companies typically require for large language model (LLM) projects. LLM refers to the technology underpinning generative AI services such as ChatGPT.
According to posts on Chinese stock-trading social-media platform Jiuyangongshe, R2 was said to have been developed with a so-called hybrid mixture-of-experts (MoE) architecture, with a total of 1.2 trillion parameters, making it 97.3 per cent cheaper to build than OpenAI’s GPT-4o.
MoE is a machine-learning approach that divides an AI model into separate sub-networks, or experts – each focused on a subset of the input data – that jointly perform a task. This approach is said to greatly reduce computation costs during pre-training and speed up responses at inference time, because only a few experts are activated for any given input.
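To make the idea concrete, the sketch below shows a generic top-k MoE forward pass in plain NumPy. It is an illustrative toy, not DeepSeek's actual architecture; the layer sizes, number of experts and routing scheme are all assumptions chosen for readability.

```python
# Illustrative mixture-of-experts (MoE) forward pass -- a generic sketch,
# not DeepSeek's implementation. All sizes below are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, n_experts, top_k = 16, 32, 4, 2

# Each "expert" is a small feed-forward sub-network with its own weights.
experts = [
    {"w1": rng.normal(size=(d_model, d_hidden)) * 0.02,
     "w2": rng.normal(size=(d_hidden, d_model)) * 0.02}
    for _ in range(n_experts)
]
# The router (gate) scores how relevant each expert is to a given token.
router = rng.normal(size=(d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route one token vector x to its top-k experts and mix their outputs."""
    scores = x @ router                      # one score per expert
    top = np.argsort(scores)[-top_k:]        # indices of the k best experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                 # softmax over selected experts only
    out = np.zeros_like(x)
    for w, i in zip(weights, top):
        e = experts[i]
        hidden = np.maximum(x @ e["w1"], 0)  # ReLU feed-forward expert
        out += w * (hidden @ e["w2"])
    return out

token = rng.normal(size=d_model)
print(moe_forward(token).shape)  # (16,) -- only 2 of the 4 experts did any work
```

The design point the example illustrates is that the model can hold many experts in total while only a small number are computed for each token, which is where the reported savings in training and inference cost are said to come from.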
In machine learning, parameters are the internal variables a model adjusts during training; together they determine how an input prompt is transformed into the desired output.
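As a rough illustration of what a parameter count measures, the toy example below tallies the trainable values in a single layer; the sizes are arbitrary, and a model reported at 1.2 trillion parameters is the same idea scaled up by many orders of magnitude.

```python
# What "parameters" means in practice: the trainable values of a model.
# The tiny layer sizes here are arbitrary assumptions for illustration only.
import numpy as np

d_in, d_out = 16, 32
weight = np.zeros((d_in, d_out))   # 16 * 32 = 512 parameters
bias = np.zeros(d_out)             # 32 more parameters

n_params = weight.size + bias.size
print(n_params)  # 544 -- each of these values is adjusted during training
```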