Last updated May 29, 2024
India should focus on solving ground-level problems, and open source is the answer.
![Open Source is a Good Start for India](https://i0.wp.com/8f430952.rocketcdn.me/wp-content/uploads/2024/05/Open-Source-is-a-Good-Start-for-India-1300x731.jpg)
Illustration by Nikhil Kumar
Cropin’s Aksara AI model is a perfect example of how you can build a solution on top of open-source models. Aksara is a micro language model built on top of Mistral-7B-v0.1, aiming to democratise agricultural knowledge to empower farmers.
There are other models like OpenHathi and Tamil LLaMA that are built on open-source models trying to break the language barrier.
Sure, there are initiatives and companies that are building LLMs from scratch in India, like Mahindra’s Project Indus, Sarvam AI, and Krutrim AI, but they have yet to be released to the general public, and for now, open-source LLMs are the only way forward.
As Nandan Nilekani rightly pointed out, India’s focus should be on using AI to make a difference in people’s lives. “We are not in the arms race to build the next LLM, let people with capital, let people who want to peddle chips do all that stuff… We are here to make a difference, and our aim is to put this technology in the hands of people,” Nilekani said.
Multiple Languages? Open Source is Here to Help
Apart from cost and other resources, having 22 official languages and hundreds of dialects is a major challenge in building an AI model for India. Here’s where the core features of open source come into play.
To solve this, India can use a Mixture of Experts (MoE) approach to blend available language-specific models, such as Tamil LLaMA and Kannada LLaMA, into a single multilingual model that runs on minimal resources and breaks the language barrier.
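As an illustration, the routing idea behind MoE can be sketched with stand-in per-language "experts" and a naive script-based gate. The expert functions and the Unicode-block heuristic below are assumptions for demonstration, not how Tamil LLaMA or Kannada LLaMA actually work; a real MoE uses a learned gating network.

```python
# Sketch of MoE-style routing between language-specific models.
# The "experts" are placeholder functions standing in for real models.

def tamil_expert(text: str) -> str:
    # Placeholder for a Tamil-specialised model.
    return f"[tamil-model] {text}"

def kannada_expert(text: str) -> str:
    # Placeholder for a Kannada-specialised model.
    return f"[kannada-model] {text}"

EXPERTS = {"ta": tamil_expert, "kn": kannada_expert}

def detect_language(text: str) -> str:
    # Naive gate: route by Unicode block. A real gating network would be
    # learned, but script detection already works well for Indic languages.
    for ch in text:
        if "\u0B80" <= ch <= "\u0BFF":   # Tamil block
            return "ta"
        if "\u0C80" <= ch <= "\u0CFF":   # Kannada block
            return "kn"
    return "ta"  # fall back to a default expert

def generate(text: str) -> str:
    # Only the selected expert runs, so inference cost stays close to a
    # single model even though several models are available.
    return EXPERTS[detect_language(text)](text)
```

Because the gate activates one expert per request, adding more languages grows storage but not per-query compute, which is the appeal for a resource-constrained deployment.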
It is also much easier to train a model when one already exists for a neighbouring language. For example, if you want a model for Awadhi and Hindi LLMs are already available, extending them to Awadhi is far easier than building from scratch.
Open-source LLMs like BLOOM and IndicBERT, which are already pre-trained in multiple Indian languages, are a perfect example of how easy it will be to jumpstart the development of multilingual LLMs.
Initiatives like Core ML from Wadhwani AI are reportedly working on creating reusable libraries and open-sourcing their data and code so that their efforts can be reused for further development.
Another good example is how Google's Flan-T5-XXL was used for legal text analysis, specifically focusing on the Indian Constitution. This is yet another direct indication of how open source is helping Indian citizens.
Costs Are Reduced Drastically
Training a large model like GPT-3 from scratch is estimated to cost anywhere from $4 million to $10 million or more, while some freely available open-source models are on par with or better than GPT-3. For a developing country like India, it makes sense to use such open-source models rather than spend millions (or billions) on training in 22 languages.
Research shows that data scientists spend almost 50% of their time cleaning data. The problem gets even worse when dealing with multiple Indian languages and dialects, each with its own quirks, including sarcasm, ambiguity, and irony.
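As a minimal sketch of what that cleaning involves, the snippet below normalises multilingual text using only Python's standard library. The specific rules (NFC normalisation, stripping zero-width spaces) are illustrative assumptions, not a complete pipeline.

```python
import re
import unicodedata

def clean_text(text: str) -> str:
    # Normalise to NFC so visually identical strings compare equal
    # (precomposed vs decomposed forms are a common source of duplicates).
    text = unicodedata.normalize("NFC", text)
    # Strip zero-width spaces and BOMs that creep into scraped text, but
    # keep ZWJ/ZWNJ (U+200C/U+200D), which carry meaning in Indic scripts.
    text = re.sub(r"[\u200b\ufeff]", "", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()
```

The ZWJ/ZWNJ exception is exactly the kind of language-specific quirk the paragraph above refers to: a rule that is harmless for English corpora can silently corrupt Devanagari or Malayalam conjuncts.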
Opting for a pre-trained open-source model saves a lot of time in building something helpful around it. Building on open-source LLMs also gives you the advantage of transfer learning, where knowledge captured by models pre-trained on large datasets is transferred to new tasks. This can help many new Indian AI startups that lack the resources to train models from scratch.
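The transfer-learning idea can be sketched with a toy example: a frozen "pretrained" feature extractor is reused for a new task, and only a small head is trained on a handful of labelled examples. Everything here (the hashed-bigram extractor, the logistic head) is a stand-in for real model embeddings and fine-tuning, not an actual LLM workflow.

```python
import math

def frozen_features(text: str) -> list[float]:
    # Stand-in for a pretrained encoder: hashed character-bigram counts.
    # In real transfer learning this would be a frozen model's embeddings.
    vec = [0.0] * 16
    for a, b in zip(text, text[1:]):
        vec[hash(a + b) % 16] += 1.0
    return vec

def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_head(examples, epochs=300, lr=0.5):
    # Train only a logistic-regression head; the extractor stays frozen,
    # which is what makes this "transfer" rather than training from scratch.
    w, b = [0.0] * 16, 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = frozen_features(text)
            p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - label  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, text):
    z = sum(wi * xi for wi, xi in zip(w, frozen_features(text))) + b
    return 1 if z > 0 else 0
```

Because only the tiny head is trained, a few labelled examples suffice, which is why transfer learning matters for startups that cannot afford full pre-training runs.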
For India, using open-source LLMs while building AI from scratch in parallel makes sense: open-source models let you leverage AI to solve problems today, while home-grown LLMs can also brighten the Indian AI ecosystem.
When you work with an open-source model, it is already pre-trained, and there is flexibility to train it further on a specific language and dialect. Furthermore, users worldwide can contribute datasets that never made it to your list, making the model far more robust than a closed-source one.
A software engineer who loves to experiment with new-gen AI. He also happens to love testing hardware and sometimes they crash. While reviving his crashed system, you can find him reading literature, manga, or watering plants.
Open Source is a Good Start for India
Sagar Sharma