Are We Past Time for Transparent Standards in AI Training?

I often look through previous articles I have written to see whether the ideas within them are still relevant or whether the thinking in that area has moved on. The recent discussion around whether a Google AI model could be sentient, as well as some recent mainstream reporting on foundation AI models, brought me back to an article I wrote last year on the hidden ethics in AI and technology naming, and to how bias remains one of the major problems we face in ethical data use and AI.

Just as gender bias is plainly exhibited by intelligent digital assistants (and by our attempts at giving them more physical forms), we have seen many instances of racial bias in intelligent systems adopted by both the public and private sectors. Machines and automation greatly enhance our capability to deliver; where they carry inherent bias, they greatly enhance our capability to deliver harm to our fellow humans.

Whether this bias is consciously or unconsciously built in by a creator or programmer, or whether it emerges through the absorption of poorly constructed, poorly governed or poorly understood data sets, the results can be equally disastrous. This is particularly acute in the public and service sectors, where such automation can serve to further entrench inequalities, even when its stated purpose is to tackle them.

What worried me more than whether the LaMDA model at Google had become sentient (to be clear, I don’t believe it had) was the idea that any emergent intelligence built on the data we would feed it now could not avoid being riddled with all sorts of human bias. Worse, depending on what data it had been exposed to, it could have collected a multitude of potentially harmful biases, many more than a single person could harbour.

I believe it is time for us to establish control over how our new intelligent helpers are ‘raised and educated’. We have clear rules about what we teach our children in schools, how that information is presented and how different topics are linked together. We must apply the same care and attention to our technological children as we do to our human children. We should not continue to allow the creation and education of artificial intelligence to go on in the dark, with opaque governance around the data it is educated with and the way in which it learns. I have never been a particular fan of anthropomorphising AI systems, but we should still treat these systems with respect and not allow them to be taught, in the dark, things that are unfair, unkind or unnecessary.

So, how can we start to deal with this? We can begin by being much more transparent about the quality, lineage and context of the data sets we feed these models. I believe we should have a clear code and set of rules by which we select and approve the data provided to our models. There is a growing number of players in this space, from big general players such as Google, Amazon and IBM to more niche players you may still have heard of, such as OpenAI, Sift and Clearview. Companies will overzealously guard the specifics of their advanced models and the exact nature of the data they consume. We should not let any AI model out of the lab and into service unless we are transparently provided with the details of the data sets it has trained on, and with whatever context has been supplied in the learning method so that the model can ‘understand’ what bias may exist in the data and take that into account.
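What might that transparency look like in practice? As a purely illustrative sketch (not an existing standard), a creator could be required to publish a machine-readable declaration of each training data set’s quality, lineage and context alongside the model. The field names and example values below are hypothetical:

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class DatasetDeclaration:
    """A hypothetical, machine-readable 'datasheet' for one training data set."""
    name: str
    source: str                     # lineage: where the data originated
    collection_method: str          # how it was gathered (scrape, survey, licensed, ...)
    time_period: str                # the period the data covers
    known_gaps: list[str] = field(default_factory=list)    # under-represented groups or contexts
    known_biases: list[str] = field(default_factory=list)  # biases already identified in review
    governance_owner: str = ""      # who is accountable for approving this data set
    approved_for_training: bool = False


# Example declaration, published alongside a model release.
declaration = DatasetDeclaration(
    name="customer-service-transcripts-2021",
    source="Internal contact-centre logs, collected under a consent policy",
    collection_method="Automated export with personal data redacted",
    time_period="2019-2021",
    known_gaps=["Non-English speakers under-represented"],
    known_biases=["Skews towards urban customers"],
    governance_owner="data-ethics-board@example.org",
    approved_for_training=True,
)
print(json.dumps(asdict(declaration), indent=2))
```

Even a record as simple as this forces the questions that matter: where did the data come from, who approved it, and what is already known to be missing or skewed.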

I believe it’s time for us to have independent, external oversight of AI development, holding creators to account for divulging the quality, lineage and context of their data sets and the controls they have in place to ensure that model training is unbiased, fair and ethical. Ambitious? Perhaps, but necessary if we are all to maintain our personal digital sovereignty.

One approach to implementing this could be self-regulation: an industry body that sets the rules and keeps everyone in check. However, schemes like this are voluntary, are likely to be localised to individual countries and generally rely on the participants wanting to be regulated. Adding powers to existing regulators is an alternative, and would also provide a platform for a more global approach as regulators work within existing channels of government co-operation. This approach may also make the rules more transparent, as they would be shaped by legislation and public consultation.

If you are building large models that influence decisions made about individuals by corporations or governments, you can help by asking questions about how the training data is collected and governed, how safeguards are built in, and how the model is challenged and cleansed of bias. You can push your organisation to be a leader in transparency in this space, build trust with people, and so become established as a provider of choice.

Start asking simple questions: are the outcomes of what we are building genuinely fair, kind and necessary for the people they are targeted at?
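One simple way to start challenging a model is to compare its outcomes across groups. The sketch below is a minimal, hypothetical demographic parity check; the group labels and decisions are invented for illustration, and a gap on its own proves nothing, but it does force the question of why the gap exists.

```python
from collections import defaultdict


def approval_rate_by_group(decisions):
    """decisions: iterable of (group, approved) pairs taken from a model's outputs."""
    totals, approved = defaultdict(int), defaultdict(int)
    for group, was_approved in decisions:
        totals[group] += 1
        approved[group] += int(was_approved)
    return {group: approved[group] / totals[group] for group in totals}


# Illustrative outputs from a hypothetical loan-approval model.
decisions = [
    ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False),
]

rates = approval_rate_by_group(decisions)
gap = max(rates.values()) - min(rates.values())
print(rates)
print(f"Demographic parity gap: {gap:.2f}")
```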

Individuals, corporations and governments need to act now. It is not too late, and taking the first incremental steps, which will combine into a powerful, human-protective set of controls, is not hard.

About the Author

John Michaelides is a Data Privacy, Security and Ethics Senior Principal with Slalom UK, a progressive consulting firm pioneering Modern Culture of Data and AI for All.
