International Relations
What’s the ongoing story:
On Monday, the stock market opened with a massive dip, especially the tech-heavy Nasdaq, which dropped by about 3 per cent, its worst performance in two years. The drop has been attributed to the meteoric rise of Chinese AI startup DeepSeek, which has grabbed global attention over the last few weeks after it unveiled two AI models: DeepSeek-V3 and DeepSeek-R1, a reasoning model.

Key Takeaways:

• Owing to its frugal use of scarce computing resources, DeepSeek has been pitted against US AI powerhouse OpenAI, the best-known builder of large language models. DeepSeek-V3, one of the company's first models, earlier this month surpassed GPT-4o and Claude 3.5 Sonnet on numerous benchmarks.

• DeepSeek-V3 stands out because of its architecture, known as Mixture-of-Experts (MoE).

• MoE models work like a team of specialist models answering a question together: a router sends each input to a few relevant experts, instead of a single big model handling everything (see the MoE sketch below).

• DeepSeek-V3 was trained on 14.8 trillion tokens drawn from large, high-quality datasets, giving the model a broad understanding of language and strong task-specific capabilities (the tokenization sketch below shows what a "token" is).

• Additionally, the model uses a technique known as Multi-Head Latent Attention (MLA) to improve efficiency and cut the costs of training and deployment, allowing it to compete with some of the most advanced models of the day (see the MLA sketch below).

• Even as the AI community was marveling at DeepSeek-V3, the company launched its next model, DeepSeek-R1. The new model can "think" before answering, an approach often described as spending extra test-time compute. R1 has the same MoE architecture, and it matches, and often surpasses, the performance of OpenAI's frontier model in tasks like math, coding, and general knowledge. R1 is reportedly 90-95 per cent more affordable than OpenAI-o1.

• R1 is open-source, powerful, and free to use. While o1 is also a thinking model that takes time to mull over prompts before producing its best response, its reasoning stays hidden; with R1, one can see the thinking in action, since the model displays its chain of thought alongside the final output (see the chain-of-thought sketch below).

• R1 arrives at a time when industry giants are pumping billions into AI infrastructure. DeepSeek has essentially delivered a competitive, state-of-the-art model, and, by making it open-source, has invited others to replicate its work.

• The release of R1 raises serious questions about whether such massive expenditures are necessary, and has prompted intense scrutiny of the industry's current approach.

Do You Know:

• DeepSeek is a Chinese AI company based in Hangzhou, founded by entrepreneur Liang Wenfeng, who is also the CEO of the quantitative hedge fund High-Flyer. Liang reportedly began working on AI in 2019 through High-Flyer AI, a unit dedicated to research in this domain. Liang is DeepSeek's controlling shareholder, and according to a Reuters report, High-Flyer owns patents related to the chip clusters used for training AI models.

• What sets DeepSeek's models apart is their performance combined with their open-source release with open weights, which essentially allows anyone to download them and build on top of them (see the open-weights sketch below). DeepSeek-V3 was reportedly trained for a meager $5 million or so, a fraction of the hundreds of millions that OpenAI, Meta, Google and others have pumped into their frontier models.
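To make the Mixture-of-Experts idea concrete, here is a minimal, illustrative sketch in Python using only numpy. The sizes, the purely linear "experts", and the top-k routing rule are toy assumptions chosen for clarity, not DeepSeek's actual design; the point it demonstrates is that only a few experts run per token, which is what saves compute.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 8, 4, 2          # hidden size, expert count, experts used per token

# Each "expert" is a small linear layer; the router is a linear gate.
experts = [rng.standard_normal((D, D)) * 0.1 for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS)) * 0.1

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router                      # score every expert for this token
    top = np.argsort(logits)[-TOP_K:]        # keep only the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts only
    # Only TOP_K of N_EXPERTS experts actually run, which is the MoE saving:
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D)
print(moe_layer(token).shape)                # (8,)
```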
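On the "14.8 trillion tokens" figure: a token is a small chunk of text, often a word piece, and training-set sizes are measured in tokens. The snippet below uses the freely available GPT-2 tokenizer purely for illustration; DeepSeek uses its own tokenizer, so the exact splits and counts would differ.

```python
# pip install transformers
from transformers import AutoTokenizer

# GPT-2's tokenizer is used here only because it is small and public;
# DeepSeek's own tokenizer would split text differently.
tok = AutoTokenizer.from_pretrained("gpt2")

text = "DeepSeek-V3 was trained on 14.8 trillion tokens."
ids = tok(text)["input_ids"]

print(len(ids), "tokens:", tok.convert_ids_to_tokens(ids))
```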
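The following is a rough sketch of the idea behind Multi-Head Latent Attention, again with toy numpy code and made-up sizes. Instead of caching full keys and values for every past token, the layer caches one small shared "latent" vector per token and reconstructs keys and values from it, shrinking the memory that dominates deployment cost. Real MLA is multi-headed and includes details (such as special handling of positional encodings) that are omitted here.

```python
import numpy as np

rng = np.random.default_rng(1)

D, LATENT = 16, 4        # model width and (much smaller) latent width: toy values

# Shared down-projection, plus separate up-projections for keys and values.
w_down = rng.standard_normal((D, LATENT)) * 0.1   # compress token -> latent
w_uk = rng.standard_normal((LATENT, D)) * 0.1     # latent -> key
w_uv = rng.standard_normal((LATENT, D)) * 0.1     # latent -> value

tokens = rng.standard_normal((10, D))             # 10 cached past positions

# Standard attention would cache full keys AND values: 2 x 10 x D numbers.
# MLA caches only the shared latent: 10 x LATENT numbers.
latent_cache = tokens @ w_down                    # shape (10, LATENT)

query = rng.standard_normal(D)
keys = latent_cache @ w_uk                        # reconstruct keys on the fly
values = latent_cache @ w_uv                      # reconstruct values on the fly

scores = keys @ query / np.sqrt(D)
attn = np.exp(scores - scores.max())
attn /= attn.sum()                                # softmax over cached positions
output = attn @ values                            # attention output, shape (D,)

print(latent_cache.size, "cached numbers instead of", 2 * tokens.size)
```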
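To see R1's chain of thought in action, DeepSeek exposes the model through an OpenAI-compatible API in which the reasoning arrives in a separate field from the final answer. This sketch follows DeepSeek's published API documentation at the time of writing (model name `deepseek-reasoner`, field `reasoning_content`); treat the exact names as subject to change, and the API key is a placeholder.

```python
# pip install openai  (DeepSeek's API is OpenAI-client compatible)
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",        # placeholder
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-reasoner",              # the R1 reasoning model
    messages=[{"role": "user", "content": "Is 1729 a sum of two cubes?"}],
)

msg = resp.choices[0].message
print("THINKING:\n", msg.reasoning_content)  # the visible chain of thought
print("ANSWER:\n", msg.content)              # the final response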
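Because the weights are openly published, anyone can download and run (or fine-tune) the models with standard tooling. The sketch below loads one of the small distilled R1 variants from Hugging Face with the transformers library; the checkpoint name is one example from DeepSeek's released family, and the full R1 is far too large to run this way on ordinary hardware.

```python
# pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# A small distilled member of the R1 family; the full model is much larger.
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "What is 17 * 24? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)

# R1-family models print their visible reasoning inside <think>...</think>
# tags before giving the final answer.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```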