Being part of Kainos’ Big data & analytics practice for almost a year and a half has raised many questions in my head, not least because it is an emerging area of technology, growing so rapidly that there seem to be new programming models, frameworks and data stores every other week! With these technologies evolving so quickly, and more importance than ever being placed on big data and analytics, the issue of ethics arises.

Ethics is defined by Oxford Dictionaries as ‘moral principles that govern a person’s behaviour or the conducting of an activity’. Technology, and by extension big data, is ethically neutral; how we utilise big data is not! When I search Google for more information on big data ethics, does this give Google the right to try to sell me O’Reilly’s newest ‘Ethics of Big Data’ book? (*Caveat: this is a great book and, if you’re interested in the topic, it definitely deserves a read.) Should it allow them to infer that I’m a software engineer who might like to attend a conference with a big data ethics stream? How does a company or an organisation that intends to use big data analytics to enhance its operations know that its utilisation of the technology is ethical?

Research into the topic of ethical big data has surfaced four common themes which could be used to characterise the issue.

  • Identity
    • Is offline existence identical to online existence?
  • Ownership
    • What does it mean to own data about ourselves?
  • Privacy
    • Who should control access to data about you?
  • Reputation
    • How can we determine what is trustworthy?

To further understand the issue we should explore these in turn. If we look at a person’s identity in the modern world, it is apparent that it is multi-faceted. Records about an individual are stored in many disparate silos, both online and offline. Many data analytics projects involve the collection and aggregation of these records, and with available technology this is quite a common and easy use case. But does the aggregation of these disparate records build up truly valuable information on a person’s identity? For the most part, this correlation also occurs without the direct participation of the people involved.

Davis and Patterson suggest in their book (Ethics of Big Data, O’Reilly 2012) that the issue of privacy and big data ethics boils down to two questions. Does privacy in the real world (offline) mean the same thing as privacy online? And should individuals be able to control data about themselves and, if so, to what degree? The sweet spot lies in finding the balance between the applied benefits of big data technologies and the apparent risk of sharing information widely.

With the evolution of big data technologies, the meaning of reputation and how it can be altered has changed considerably. Previously, an individual’s or even a company’s reputation would most often be formed by speaking to others with personal experience of that individual or company. A little further removed, reputation could also spread by word of mouth from those with firsthand knowledge. If a person was seeking to improve their reputation, this was essentially in their direct control: they could choose to act more morally. Now, a profile of an individual, and by extension their reputation, can be built up quite simply using data analytics. This reputation can be shared with a vast number of people, and the simplicity of having direct control over your own reputation slips away. The matter of managing your online reputation has even spawned companies whose sole mission is reputation management.

If we look at ownership, there is a similar quandary as to what degree of ownership an individual has over information about themselves. For example, in the real world do we “own” the information about our family tree or medical history? Do we then own the online presence of that information? I feel these questions will become clearer as the analytics space matures and structured guidelines or legislation are put in place to ensure ethical online behaviour.

The evolution of big data technologies has pushed the boundaries of what is possible, but with great power comes great responsibility. It is of the utmost importance to ensure that ethical values motivate our actions. It is our responsibility as big data practitioners to be at the forefront of answering these questions and shaping a future of ethical analytics.