DeepSeek unveils new AI technique it says improves reasoning in large language models
In collaboration with researchers from Tsinghua University, DeepSeek has developed a technique that combines generative reward modeling (GRM) with self-principled critique tuning, local media reported on Sunday.
The approach is intended to help LLMs deliver more accurate and faster responses to general queries, according to a paper released on Friday.
The researchers said the resulting DeepSeek-GRM models outperformed existing methods and achieved "competitive performance" with strong public reward models. Reward modeling is a process for aligning an LLM's behavior with human preferences.
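For readers unfamiliar with the idea, the sketch below illustrates generative reward modeling in plain Python: rather than producing a bare scalar, a judge model writes out principles and a critique before emitting a numeric score, and several sampled scores are averaged. The `generate()` function is a hypothetical stand-in for any text-generation API, and the prompt format is an assumption for illustration; none of this reflects DeepSeek's actual implementation.

```python
import re
from statistics import mean


def generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns a canned reply so
    the sketch runs end to end. Swap in a real API in practice."""
    return ("Principles: the answer should be factually correct and concise.\n"
            "Critique: the response answers the question but omits context.\n"
            "Score: 7/10")


def generative_reward(question: str, answer: str, samples: int = 3) -> float:
    """Score an answer by asking the judge model to state principles and a
    critique before giving 'Score: X/10', then average over several samples."""
    judge_prompt = (
        "You are a reward model. First state the principles a good answer "
        "should follow, then critique the answer, then give 'Score: X/10'.\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    scores = []
    for _ in range(samples):
        critique = generate(judge_prompt)
        match = re.search(r"Score:\s*(\d+(?:\.\d+)?)\s*/\s*10", critique)
        if match:
            scores.append(float(match.group(1)) / 10.0)
    return mean(scores) if scores else 0.0


if __name__ == "__main__":
    print(generative_reward("What is the capital of France?", "Paris."))
```

Because the judge produces free-form text rather than a single number, the same model can be sampled repeatedly at inference time and its scores aggregated, which is the kind of behavior the paper's reward models are designed to exploit.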
DeepSeek intends to release its GRM models as open source, although a specific timeline for this initiative has not been disclosed.
The paper, published on the online scientific repository arXiv, has sparked increased interest in the firm's future projects, particularly after the global attention garnered by its V3 foundation model and R1 reasoning model.
