BharatGen AI to Support All 22 Scheduled Languages by June 2026

India is setting an ambitious course to make artificial intelligence linguistically inclusive, aiming to extend BharatGen AI’s capabilities to all 22 languages listed in the Eighth Schedule of the Constitution by June 2026. This goal was articulated by Minister of State for Science & Technology and Earth Sciences, Jitendra Singh, during a session in Parliament, reaffirming the government’s commitment to bridging digital divides through accessible AI deployments.

Operational under the Department of Science and Technology’s National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS), the BharatGen initiative represents India’s first government-backed multimodal foundational AI model tailored to native linguistic and cultural contexts. The model is designed to integrate text, speech, and image modalities across all scheduled languages, enhancing use cases ranging from governance and agriculture to healthcare and education.

Presently, BharatGen supports nine major languages: Hindi, Marathi, Tamil, Malayalam, Bengali, Punjabi, Gujarati, Telugu, and Kannada. The roadmap envisions adding Assamese, Maithili, Nepali, Odia, Sanskrit, Sindhi, and others by December 2025, before covering all 22 scheduled languages by mid-2026. The rollout is deliberately staged: strengthen the languages already supported first, then extend coverage to the remaining ones in the next phase.

Execution is coordinated by a consortium of premier institutions, with IIT Bombay’s Technology Innovation Hub (TIH) for IoT & IoE at the helm, supported by IIT Madras, IIT Kanpur, IIIT Hyderabad, IIT Mandi, and IIM Indore. Each partner focuses on specialty areas like speech modeling, tokenization strategy, vision-language processing, inclusive model training, and benchmarking.

Beyond language coverage, BharatGen is anchored in public good: early pilot applications have been tested in sectors like agriculture, defense, and citizen services, with plans for national deployment across states and rural districts once the system scales fully. One tangible example involves AI-powered medical consultations delivered in local dialects—designed to build trust and improve outcomes in remote areas. The platform also integrates with tools like CPGRAMS, providing multilingual grievance redressal systems that enhance public engagement.

Yet developing BharatGen is not without hurdles. As IIT Bombay professor Ganesh Ramakrishnan—principal investigator on the project—has noted, challenges include limited data availability, linguistic diversity, and script variation. To address these, the team curates content through OCR from public-domain books, aligns translations via platforms like UDAAN, and sources datasets from libraries and government repositories across the consortium.

Analysts and observers view BharatGen as a critical step toward “sovereign AI” and language equity, one that aims to reduce dependence on global, English-dominant AI systems and assert national control over data infrastructure. By anchoring development within Indian institutions and serving vernacular use cases, the initiative models a path for culturally contextual technological sovereignty.

As India approaches mid-2026, BharatGen’s expanded language reach might redefine how citizens engage with AI, bridging the urban–rural and English–vernacular gap. The challenge now lies in maintaining quality across languages, ensuring robust evaluation for low-resource scripts, and securing inclusive access. If successful, BharatGen may become a flagship example of AI designed not for the few, but for the many—sovereign, inclusive, and rooted in the linguistic diversity of India.

