Mint Explainer | Why Big Tech is focusing on Indian languages

On Thursday, OpenAI chief executive Sam Altman unveiled GPT-5 with native support for 12 Indian languages. Last year, Google expanded its Gemini AI model’s native support to nine Indian languages. With artificial intelligence startups Anthropic and Perplexity also focusing on Indian languages, the regional-language internet is fast emerging as a huge AI battleground. Mint explains why.

Why are languages important for AI firms?

Foundational AI models are trained on massive troves of data and produce responses in plain text. To train them, AI firms rely on publicly available information, and most information on the internet is in English. As a result, the world’s top AI models are all trained primarily on English-language data. This leads to various biases in the way AI models understand user queries, which makes wider language access a fundamental necessity for AI companies.

Why are Indian languages important for AI firms?

Hindi, according to global language repository Ethnologue, is the world’s third-most spoken language, after English and Mandarin. Cumulatively, 10 Indian languages are spoken by 1.7 billion people, or 21% of the world’s population, ahead of English (1.5 billion speakers) and the varieties of Chinese (1.4 billion). This makes India the world’s single-largest region for tech companies to tap into.

Beyond the numbers, experts underline that each language has its own nuance, regional dialects, biases, and complications. Indian languages, owing to their scale, are crucial resources for AI models that cater to the world.

Are all global firms targeting India?

Yes. Last week, Sam Altman said OpenAI’s latest model, GPT-5, natively supports 12 Indian languages. Last year, Google announced native support for nine Indian languages. Meta, too, said last year that its Llama family of AI models would support eight Indian languages. Anthropic’s Claude supports Hindi and Bangla. Perplexity, another prominent Silicon Valley startup, supports inputs and outputs in Hindi. 

In India, Sarvam unveiled a text-to-speech AI model trained on 11 Indian languages in May. In the same month, conversational voice startup Gnani became one of four startups selected for government backing under the India AI Mission. It announced plans to build a 14-billion-parameter voice AI model. BharatGPT-maker CoRover and Soket are also building AI models natively trained on local languages.

How important is India in terms of business potential?

This is difficult to assess. India is one of the largest user bases for any AI firm. However, diverse consumer behaviour makes it difficult to monetize this market. As a result, India’s contribution to the net revenue of global tech firms has ranged only between 1% and 4%.

AI-first companies, however, believe they can add incrementally to the revenue that global companies have generated from India, since most AI tools and platforms require enterprise-grade subscriptions. With a vast base of users, most tech firms expect India to become a major monetization hub.

Can AI replicate India’s DPI push?

With government backing, India is keen to build foundational models trained natively on Indian languages. Startups and industry veterans say that, in the long run, an AI model trained on most Indian languages could serve as a template for other non-English AI models around the world.

This could be akin to India’s push to offer digital public infrastructure (DPI) to the world, as it did in digital payments with the Unified Payments Interface (UPI). While other nations are also building their own sovereign AI models, India believes it can gain soft power by offering AI models to the Global South.


