Natural Language Processing

Natural Language Processing: Understanding Human Language

Developing advanced language models and conversational AI systems, with a special focus on Indian languages and multilingual applications.

🇮🇳 Indian Languages
🧠 Large Language Models
💬 Conversational AI
📊 Text Analytics

Research Overview

Bridging Language Barriers with AI

Our NLP research focuses on making AI accessible to India's diverse linguistic population while advancing the global state-of-the-art in language understanding.

🌐 Multilingual Language Models

Developing large language models that understand and generate text in multiple Indian languages while accounting for cultural context.

Key Applications:

Hindi-English Translation
Code-Mixed Text Processing
Cross-lingual Information Retrieval
Multilingual Chatbots

💬 Conversational AI

Building intelligent dialogue systems and chatbots that can engage in natural, contextual conversations.

Key Applications:

Customer Service Bots
Educational Assistants
Healthcare Chatbots
Voice Assistants

📊 Text Analytics & Mining

Extracting insights from large-scale text data for business intelligence and social media analysis.

Key Applications:

Sentiment Analysis
Topic Modeling
Social Media Mining
Document Classification

🔍 Information Extraction

Automatically extracting structured information from unstructured text documents and web content.

Key Applications:

Named Entity Recognition
Relation Extraction
Knowledge Graph Construction
Document Understanding

Indian Languages Coverage

We're building comprehensive NLP capabilities for India's major languages, covering over 1.2 billion speakers across the subcontinent.

Language    Speakers  Status
Hindi       600M+     Production Ready
Bengali     300M+     Production Ready
Telugu      95M+      Production Ready
Marathi     85M+      Production Ready
Tamil       80M+      Production Ready
Gujarati    60M+      In Development
Kannada     50M+      In Development
Malayalam   35M+      In Development
Punjabi     30M+      Research Phase
Odia        45M+      Research Phase

Major Research Achievements

Our groundbreaking research has led to several landmark contributions in multilingual NLP and Indian language processing.

IndicBERT

First multilingual BERT model for 12 Indian languages, achieving state-of-the-art performance on multiple NLP tasks.

2020
Impact: Used by 500+ researchers globally

AI4Bharat Initiative

Leading national initiative to democratize AI for Indian languages with open-source tools and datasets.

2021
Impact: 10M+ downloads of language models

Multilingual Speech Recognition

Advanced ASR systems supporting code-mixed speech in Indian languages with 95%+ accuracy.

2022
Impact: Deployed in 100+ applications

IndicGPT

Large generative language model for Indian languages with 13B parameters, supporting creative and informative text generation.

2023
Impact: 1M+ active users

Active Research Projects

Our NLP projects focus on making AI accessible to India's diverse linguistic population while advancing global language understanding capabilities.

IndicBERT: Multilingual Language Model

Production

Large-scale multilingual BERT model supporting 12 Indian languages with state-of-the-art performance.

Funding: ₹4.5 Cr
Duration: 2019-2023
Partners: AI4Bharat, Google Research
10M+ downloads globally

Code-Mixed Speech Recognition

Active

Advanced ASR system for Hindi-English code-mixed speech with 95%+ accuracy in real-world scenarios.

Funding: ₹2.8 Cr
Duration: 2021-2024
Partners: Microsoft Research, IIT Delhi
100+ applications deployed

Multilingual Conversational AI

Active

Intelligent chatbot system supporting natural conversations in multiple Indian languages.

Funding: ₹3.5 Cr
Duration: 2022-2025
Partners: Flipkart, Paytm
1M+ active users

NLP Publications

Our NLP research focuses on multilingual systems, Indian languages, and culturally aware language technologies.

Conference

IndicGPT: Large Language Model for Indian Languages with Cultural Context

89+ citations
Authors: Pushpak Bhattacharyya, Preethi Jyothi, Ganesh Ramakrishnan
ACL 2024

We present IndicGPT, a 13B parameter large language model specifically designed for Indian languages, incorporating cultural context and achieving state-of-the-art performance on multilingual tasks.

Large Language Models, Indian Languages, Multilingual NLP, Cultural Context
Impact: 1M+ active users

Conference

Code-Mixed Sentiment Analysis: A Transformer-Based Approach for Social Media

67+ citations
Authors: Preethi Jyothi, Radhika Mamidi, Monojit Choudhury
EMNLP 2023

A novel transformer architecture for sentiment analysis in code-mixed text, achieving a 15-20% improvement over existing methods on Hindi-English and other Indian language pairs.

Code-Mixing, Sentiment Analysis, Social Media, Transformers
Impact: Deployed in 5+ social platforms

Journal

Multilingual Question Answering for Low-Resource Indian Languages

43+ citations
Authors: Ganesh Ramakrishnan, Pushpak Bhattacharyya, Sunita Sarawagi
TACL 2024

A comprehensive framework for multilingual question answering that leverages cross-lingual transfer learning to support low-resource Indian languages with limited training data.

Question Answering, Low-Resource Languages, Cross-lingual Transfer, Indian Languages
Impact: Educational AI in 10 states

Conference

IndicBERT: A Pre-trained Language Model for Indian Languages

450+ citations
Authors: Divyanshu Kakwani, Anoop Kunchukuttan, Mitesh M. Khapra
Findings of ACL 2020

The first multilingual BERT model for 12 major Indian languages, achieving state-of-the-art performance on multiple downstream NLP tasks and enabling AI for 1.2B+ speakers.

Multilingual BERT, Indian Languages, Pre-trained Models, Transfer Learning
Impact: Most cited Indian NLP paper

Top Publication Venues

Our NLP research appears in premier computational linguistics venues.

25+   ACL      Computational Linguistics
22+   EMNLP    Empirical Methods
18+   NAACL    North American ACL
12+   TACL     Top Journal
8+    CL       Computational Linguistics
15+   COLING   International Conference

200+ Total Papers
12,000+ Total Citations
500M+ Speakers Supported
