How open data can save AI

This post originally appeared on the Web Foundation website and was written by Web Foundation Policy Fellow Juan Ortiz Freuler and Policy Director Craig Fagan.

Artificial intelligence (AI) is reshaping our world, changing our economies (think Uber and driverless cars), our politics (think Cambridge Analytica and targeted political adverts), and our societies (think predictive policing). While AI systems offer a host of benefits, the risks cannot be ignored. Two recent Web Foundation studies highlight both the risks and the opportunities by looking at how AI and algorithms are affecting low- and middle-income countries.

AI and algorithms run on the same basic fuel: data. AI systems acquire their intelligence by using data to learn how past problems were solved and then applying that learning to new decisions. The availability of quality data is therefore crucial for AI systems to learn and function effectively.

Data is the fuel that drives AI

Though AI has been around for decades, the discipline is going through a new spring: a surge of research and development triggered by a combination of greater data availability and cheaper, more powerful computers. While existing data is being digitised, falling costs allow chips to be inserted into everyday objects, from exercise bands to public transport, generating entirely new sources of data. To give a sense of the exponential growth in the volume and variety of data, some estimate that 90% of all the data in the world was created in the last two years.

Garbage in, garbage out

But the data being produced and fed to AI systems is frequently incomplete, biased, or otherwise of poor quality. Unless AI algorithms are designed to recognise that they are being fed imperfect data and to take steps to account for it, bad data will undermine the learning process and skew the outputs of AI systems. This can lead to devastating and often discriminatory outcomes, particularly as AI is increasingly used to make decisions in important areas of our lives.

Better data is one way to address these risks. Offering data under an open licence (i.e., one that allows data to be freely used, modified, and shared by anyone for any purpose) is an effective way to improve its quality, because anyone can then inspect it and flag errors or gaps.

Through our Open Data Barometer, which assesses the openness and quality of government data, we continuously monitor whether government datasets are being proactively disclosed in key sectors, including education, health, the environment, and public expenditure. The latest edition suggests that most government datasets (more than 9 in 10) are still not open.

Open data to improve AI

One way to make more data available and to improve data quality is to push governments that use algorithms and AI systems for public service delivery to open up the data these systems rely on. All datasets used that do not contain personally identifiable information should be released in open formats. When datasets are considered too sensitive for release, appropriate metadata should be provided instead. Opening key datasets will help identify potential biases, increase competition between potential service providers, improve public services, and increase citizen trust in government.

As governments adopt algorithms and AI systems to improve service delivery, we should take steps to ensure this is done transparently, in a way that reassures citizens that these systems will produce fair outcomes as well as higher-quality services. Making the underlying data available is a first step towards public understanding of how public-service AI systems make decisions.

If AI systems are to reshape the world, we must have the opportunity to shape them, and a wider group of people must have access to the datasets needed to build AI technologies. The Web Foundation will continue to explore these issues and how to tackle them.

If you are interested in how governments use AI, algorithms or statistical models, please complete our 4-question survey (also available in Spanish) to share your expertise and help shape our research on governments’ use of data and algorithms.

The Web Foundation recently published research mapping academic and commercial AI activity underway in various African countries. For updates on our research, follow us on Twitter at @webfoundation and sign up to our email newsletters.
