Our growing presence in Machine Learning and AI means we need a Data Ethics Policy to ensure Kainos exploits data for outcomes in line with our ethical stance.

In my last blog post I looked at the data exploitation opportunities afforded once personal data is brought under proper governance. In other words, if you’ve found out where personal data resides in the organisation – for example in preparation for GDPR – you’re halfway to actually making better use of that data. However, as recent high-profile news stories about data privacy have highlighted, those exploitation opportunities may immediately run into ethical challenges. As a company with an increasing presence in the market for Machine Learning and AI services, Kainos cannot, ethically or legally, abdicate responsibility for this subject. Two recent incidents crystallised this challenge for me. First, an interview candidate for one of our Data Scientist roles asked me whether Kainos had a Data Ethics Policy; he was concerned about the approach to processing personal data that he had observed elsewhere. Second, while I was discussing analytics for risk assessment with a senior client, the client challenged me on the inadvertent discrimination that such techniques occasionally produce. I had no ready reply.

As others have pointed out, privacy concerns do not prevent the use of Machine Learning techniques, but they do create ethical trade-offs that need to be evaluated in the context of each project. This goes beyond legislative compliance – at Kainos we recognise that much of our success depends on our trusted reputation as a supplier of services, as a business partner and as an employer. So, to complement our existing Code of Ethics and GDPR Guidelines, I have set about drafting a Data Ethics Policy that brings together some existing good practice. Although I lead our Data and Analytics Capability, my background in social sciences has proven helpful in this instance.

The necessity of action

It is not just our corporate reputation that gives us a reason to act; this is about being a good corporate citizen, something for which Kainos is well known in other spheres (such as our work in schools). More organisations need a position on data ethics because the direction of travel is potentially worrying: as individuals, we may become increasingly reliant on the good practice of the data curators who shape the news and political agenda we see, as well as our consumption habits and social interactions. It is no wonder that Aurélie Pols has described Trust as ‘the new currency’.

I have found the Open Data Institute’s Data Ethics Canvas tremendously helpful. The ODI works with companies and governments to encourage open, trustworthy data ecosystems, and it encourages organisations to use ODI materials as inspiration for drafting their own policies. Here are three themes from our policy.

Data sourcing

The first point made by the ODI is that data ethics needs a broad scope, including datasets that contain no personal data. Inadvertent economic discrimination is one example: demographics and geographies with greater technology penetration can skew a dataset towards the needs and behaviours of those sub-groups without this being obvious in the data itself. This is relevant to the work Kainos does on citizen-facing Digital Services – our User Researchers go to extra lengths to counteract biases in qualitative research, and as we increasingly exploit quantitative research, the same caution needs to be applied. Both gaps and biases need to be considered. It may not be possible to fix all the inherent biases and gaps in a dataset in short timescales – even basic data quality issues may be intractable(!). Weighing social utility against data bias may mean it is still better to proceed, with an acknowledgement of limitations, a right of reply for those affected and a commitment to monitor the effects and make further fixes down the line.
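As an illustration of how this kind of skew might be surfaced in a quantitative dataset, here is a minimal Python sketch. It assumes a reference breakdown of the population is available (for example from census figures); the region labels, counts and the 0.8 threshold are hypothetical, chosen only to show the idea. It compares each group's share of the sample with its share of the population, so a group missing entirely (a gap) scores 0 and an under-sampled group (a bias) scores below 1.

```python
from collections import Counter

def representation_ratios(sample_groups, population_shares):
    """For each group in the reference population, return its share of the
    sample divided by its share of the population.
    1.0 = proportionate, < 1.0 = under-represented, 0.0 = absent (a gap)."""
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {
        group: (counts.get(group, 0) / total) / share
        for group, share in population_shares.items()
    }

def flag_under_represented(ratios, threshold=0.8):
    """Return groups whose ratio falls below a (hypothetical) threshold."""
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)

# Hypothetical online survey, where rural areas have lower broadband
# penetration and so respond less often than their population share implies.
responses = ["urban"] * 70 + ["suburban"] * 25 + ["rural"] * 5
population = {"urban": 0.5, "suburban": 0.3, "rural": 0.2}

ratios = representation_ratios(responses, population)
print({group: round(ratio, 2) for group, ratio in ratios.items()})
print(flag_under_represented(ratios))
```

A check like this only detects skew along dimensions someone thought to measure; it does not fix the data, which is why acknowledging limitations and monitoring effects still matter.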

Where Kainos designs and delivers data collection systems, our Privacy by Design guidelines apply. Where we don’t, we are generally reliant on customers to ensure that data is collected and shared lawfully and with consent for the proposed use. Anonymisation can appear to side-step some of these issues, but it also demands a mature understanding, since re-identification through quasi-identifiers (attributes such as postcode, age and sex that are harmless alone but identifying in combination) is a persistent challenge. Like our colleagues at Privitar, we believe privacy protection needs a risk-based approach: with more sophisticated anonymisation techniques, there is a balance to be struck between the likelihood of re-identification and the utility of the datasets for the desired analytical outcome.
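To make the quasi-identifier risk and the risk/utility balance concrete, here is a minimal Python sketch of k-anonymity, one well-known way of measuring re-identification risk. It is not presented as Privitar's method or as our policy's required technique. The records, postcodes and the `generalise` rule are hypothetical. A dataset is k-anonymous if every combination of quasi-identifier values is shared by at least k records; coarsening those values raises k (lower risk) but discards detail (lower utility).

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records sharing the same
    quasi-identifier values. k = 1 means at least one record is unique."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def generalise(record):
    """Hypothetical generalisation rule: keep only the outward part of the
    postcode and band age into decades. Reduces risk, but loses detail."""
    return {
        **record,
        "postcode": record["postcode"].split()[0],
        "age": f"{record['age'] // 10 * 10}s",
    }

# Hypothetical records: no names, yet each raw row is unique on its
# postcode/age/sex combination and could be linked to another dataset.
records = [
    {"postcode": "BT1 1AA", "age": 34, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "BT1 2BB", "age": 37, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "BT1 3CC", "age": 31, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "BT7 1DD", "age": 52, "sex": "M", "diagnosis": "none"},
    {"postcode": "BT7 2EE", "age": 58, "sex": "M", "diagnosis": "asthma"},
]
quasi_ids = ["postcode", "age", "sex"]

print(k_anonymity(records, quasi_ids))                           # raw rows
print(k_anonymity([generalise(r) for r in records], quasi_ids))  # generalised
```

The raw rows give k = 1 (every record is unique), while the generalised rows give k = 2. k-anonymity alone does not stop sensitive attributes leaking when everyone in a group shares the same value, which is one reason a risk-based assessment is needed rather than a single metric.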

In terms of sharing data insight, there are two valid views: ‘data is the new oil’ (it’s an asset on the balance sheet, and data exploitation is a core differentiator) or ‘data is the new infrastructure’ (it should be considered a shared resource that is more powerful when everyone can use it). We do not view this as a primary ethical issue, and we work with organisations that hold either perspective. On data sharing, we work closely with the Government Digital Service (GDS) – for example, we are collaborating on data standards for streaming open datasets.

Data usage

Here we need to go beyond the regulatory requirements around consent and consider how types of data exploitation may be perceived in the public arena. This is not always clear-cut. For example, one of our customers, the Co-op, has published findings from its own research among members on attitudes towards the exploitation of personal data to offer new services and drive community involvement. Co-op members did not speak with one voice on this issue, giving many reasons why they would or would not share some personal data for these purposes.

We will not involve ourselves in political profiling, either directly or indirectly, or in analysis that has the effect of exploiting vulnerable groups in society. Sometimes, however, these decisions require great foresight, because techniques developed for entirely benign uses can be re-purposed beyond our control. This is part of the point of a Data Ethics Policy – we can trace our own actions back to a set of ethical principles.

In the public debate around how much data access to provide to law enforcement agencies (an area generally excluded from GDPR, for example), I personally advocate equipping our police and other agencies with the data necessary to keep citizens safe and protect economic infrastructure.

Embedding a data ethics policy

I am indebted to Emer Coleman, Technology Engagement at Co-op Digital, for her advice on this huge topic. We are fortunate to be starting from a position of a strong ethical culture – embedding this policy is about helping colleagues understand the unintended consequences of design decisions, alongside reiterating Privacy by Design. We have a blanket training programme in place, but this needs to be supplemented by case studies. This again is where the ODI’s material is useful. We will look to adapt the Data Ethics Canvas for our own internal use, for example by developing ways to record risks and set re-review milestones.

This will be a journey for all those in our industry, but the prize is the power of data exploitation to serve everyone.