The development of artificial intelligence in India faces challenges, as MeitY points out

As the dust settles on the general elections and the allocation of Union ministries, the focus will now turn to upcoming regulations and policies that have been stuck in the pipeline over the past year.

Notifying the provisions of the Digital Personal Data Protection Act 2023, developing guidelines for artificial intelligence (AI), amending the IT Act and developing a framework under the Digital India initiative will be the top priorities of the Ministry of Electronics and IT (MeitY). Although MeitY successfully pushed for consensus on digital public infrastructure (DPI) during India’s G20 presidency, comprehensive policies and regulations to support digitalization in India have lagged behind.

Digitization has enabled detailed data collection across sectors through high-performance computing and network effects. However, there are concerns that a few large technology players will monopolize these data sets. In the AI-driven economy, digital platforms with significant proprietary data hold a competitive advantage because they have the resources to ingest data at scale. Startups, by contrast, struggle to collect and refine data for AI applications.

Government agencies, for their part, have started setting data standards and publishing datasets through India’s Open Government Data (OGD) platform, which provides access to open data from 165 government departments across 33 sectors. However, the OGD platform suffers from quality issues, inconsistent metadata schemas and standardization, and a lack of high-value data, which makes preparing and engineering this data costly for AI developers.

UPI-enabled digital payments, which have reached 330 million unique users, have also unlocked the potential for large-scale financial data collection in India. The Account Aggregator (AA) framework, a consent-based data sharing mechanism for the financial sector, builds on this. Sesame, India’s first large language model (LLM) designed specifically for the BFSI sector, was unveiled recently, but for it to draw effectively on diverse user data, UPI would need to penetrate further and more users would need to consent to sharing financial data through the AA framework.

In this context, creating opportunities for open and accessible public data can be a game-changer in the development of the Indian AI ecosystem. With the rapid adoption of applications like ChatGPT, developers building AI for India are now focusing on data representation, i.e. ensuring that LLMs developed for Indian use cases accurately represent India’s diverse population.

India has made significant progress in developing the open data ecosystem. To eliminate language barriers and biases in AI models developed overseas, IIT Madras’ ‘AI4Bharat’, Sarvam AI’s ‘OpenHathi series’ and MeitY’s ‘BhashaDaan Initiative’ are focusing on developing open-source datasets, tools, models and applications for Indian languages. Zomato’s ‘Weather Union’ and ‘Zomato Food Trends’, along with Namma Yatri’s real-time dashboard, show that the private sector also intends to contribute to open data.

At the state level, Tamil Nadu and Punjab are developing state-specific data sharing policies, while Odisha and Karnataka are promoting open data through policies. Public transport agencies in Bengaluru have unveiled plans to make real-time public transport data available to startups for building mobility-as-a-service applications. However, concerted efforts are needed at the national level to harness the potential of open data for responsible AI development in India.

To start, the government must convene states, industry, academia, and other partners across the public data ecosystem to develop technical guidelines, quality standards, and curation methods for publishing AI-ready open data for a wide range of use cases. The creation of a National Data Management Office (NDMO) under the jurisdiction of MeitY, as proposed in the Digital India Bill, is a step in the right direction, but emphasis needs to be placed on ecosystem-wide consultations to ensure a multi-stakeholder approach to AI development.

To promote the availability of good-quality data, a useful benchmark to aim for is the EU’s Metadata Quality Dashboard, which helps data providers assess their metadata on indicators such as accessibility, interoperability and reusability. Increasing DPI penetration in India may also provide the means to implement these metrics. Technology architecture consultancies such as the Center for DPI (CDPI) provide foundational knowledge on creating open data sets, advocating a federated design to avoid data centralization and the use of open standards and APIs for interoperability. The underlying data infrastructure can be designed using these principles to enable stakeholders to contribute effectively to open data initiatives.

However, these measures would be ineffective without the necessary changes to the DPDP Act, 2023 and the IT Rules, 2021 to address consumer concerns arising from blatant data theft by Big Tech. These changes should be prioritized to secure users’ personal data online and to ensure that people can use digital platforms without being exposed to poor data collection practices and copyright infringement.

Finally, as part of the India AI program, the government is also working on creating a data collection platform under a public-private partnership model, which is intended to hold the largest set of anonymized data. However, it has been suggested that these datasets will be shared only with companies that the government deems trustworthy, in order to curb disinformation, deepfakes and AI bias. This is a highly ineffective way of preventing AI-related harm: it is difficult to ensure compliance of AI applications at scale, and the restriction severely limits the potential for equitable use of open data sets. A better solution would be to create independent fact-checking organizations that could immediately report and flag misinformation or deepfakes on platforms, minimizing government intervention and adding no obstacles to AI innovation.

(Rohan Pai is a senior research analyst at the Aapti Institute.)

Disclaimer: The views expressed above are those of the author. They do not necessarily reflect the views of DH.