DeepSeek Signals Next AI Model Direction With New Training Research
DeepSeek releases new technical research proposing a cost-efficient way to train large AI models, a signal that the Chinese startup is refining the foundations for its next wave of releases amid global compute constraints.
Chinese AI startup DeepSeek has published a new research paper outlining a training technique that could shape its next generation of large language models, offering fresh clues to how the company plans to scale AI systems under tight compute constraints.
The paper, co-authored by DeepSeek founder and CEO Liang Wenfeng, introduces a method called Manifold-Constrained Hyper-Connections (mHC), aimed at improving the efficiency and scalability of foundational AI models.
The approach targets a central challenge for AI developers outside the US: how to train increasingly capable models without relying on massive, high-cost computing infrastructure.
Developed by a 19-member research team in Hangzhou, the technique is designed to reduce memory and compute demands while preserving model performance.
In experiments, the researchers tested mHC on models ranging from 3 billion to 27 billion parameters, finding that the method scaled reliably while adding “negligible computational overhead”.
“Empirical results confirm that mHC enables stable large-scale training with superior scalability compared with conventional hyper-connections,” the authors wrote, attributing the gains to “efficient infrastructure-level optimisations.”
DeepSeek’s research output is closely watched across the AI industry, as the company’s technical papers have often foreshadowed architectural choices behind its subsequent model releases.
Analysts said such publications frequently serve as early indicators of DeepSeek’s internal roadmap.
The latest publication also highlights Liang Wenfeng’s continued hands-on involvement in DeepSeek’s core research, despite his otherwise low public profile.
Liang was listed as the final author and personally uploaded the paper to arXiv, an open-access repository where he has previously posted the company’s most important technical work, including research tied to its R1 and V3 models.
The paper builds on earlier work by ByteDance researchers, who in September 2024 proposed hyper-connections as an enhancement to the residual connections introduced by ResNet, a foundational deep learning architecture developed in 2015 by Microsoft Research Asia scientists, including renowned computer scientist He Kaiming.
ResNet's residual-connection design underpins some of the world's most influential AI systems, from OpenAI's GPT models to Google DeepMind's Nobel Prize-winning AlphaFold.
While hyper-connections helped address long-standing training issues in deep networks, DeepSeek argues that prior approaches failed to fully account for rising memory and cost constraints, limiting their real-world scalability.
Its mHC method introduces an additional “manifold constraint” to keep compute and memory usage in check while preserving performance.
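For readers who want a concrete picture of what a hyper-connection-style layer looks like, below is a minimal, illustrative sketch in PyTorch. It is not DeepSeek's mHC code: the class name ToyHyperConnection, the number of parallel residual streams, and the softmax row-normalisation used here as a stand-in for the paper's manifold constraint are all assumptions made for illustration, based only on the general hyper-connections idea described above.

```python
# Illustrative sketch only -- NOT DeepSeek's mHC implementation.
# A hyper-connection-style block keeps several parallel residual streams and
# mixes them with small learned weights; the softmax below is an assumed,
# simplified stand-in for the paper's manifold constraint.
import torch
import torch.nn as nn


class ToyHyperConnection(nn.Module):
    """Keeps n parallel residual streams and mixes them with learned weights."""

    def __init__(self, n_streams: int = 4):
        super().__init__()
        # Weights that read the sub-layer input out of the n streams.
        self.read = nn.Parameter(torch.zeros(n_streams))
        # Weights that write the sub-layer output back into each stream.
        self.write = nn.Parameter(torch.ones(n_streams))
        # n x n matrix that mixes the residual streams with each other.
        self.mix = nn.Parameter(torch.eye(n_streams))

    def forward(self, streams: torch.Tensor, sublayer: nn.Module) -> torch.Tensor:
        # streams: (batch, n_streams, seq, hidden)
        # Assumed stand-in for a manifold constraint: softmax keeps each mixing
        # row on the probability simplex, so stream magnitudes stay bounded.
        mix = torch.softmax(self.mix, dim=-1)
        read = torch.softmax(self.read, dim=-1)

        layer_in = torch.einsum("n,bnsh->bsh", read, streams)   # collapse streams
        layer_out = sublayer(layer_in)                           # e.g. attention or MLP
        mixed = torch.einsum("mn,bnsh->bmsh", mix, streams)      # mix residual streams
        return mixed + self.write.view(1, -1, 1, 1) * layer_out.unsqueeze(1)


if __name__ == "__main__":
    block = ToyHyperConnection(n_streams=4)
    x = torch.randn(2, 4, 16, 64)             # (batch, streams, seq, hidden)
    y = block(x, nn.Linear(64, 64))           # any width-preserving sub-layer works
    print(y.shape)                            # torch.Size([2, 4, 16, 64])
```

The point the sketch tries to convey, under those assumptions, is that the extra machinery amounts to small per-block mixing weights rather than new full-width layers, which is broadly how such schemes can add flexibility while keeping computational overhead low.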
“mHC will help address current limitations and potentially illuminate new pathways for the evolution of next-generation foundational architectures,” the researchers said.
The release has fuelled industry speculation that DeepSeek may be preparing a major model update in the coming weeks, following a pattern seen last year when the company released R1 ahead of a national holiday.
DeepSeek has not announced any launch timeline.