My experience as a Data Scientist

Date posted
11 April 2019
Claire Davidson

I have been working as a Data Scientist at Kainos for over two and a half years. Kainos helps its customers create technology solutions that allow them to work smarter and faster. The word 'Kainos' means innovation, so the company is always keen to use new techniques, including artificial intelligence and data analytics, to provide the most benefit to its customers.

Kainos does not just provide a 'shiny' model as an end product. It takes its customers on an analytics journey, moving businesses from descriptive analytics through to the benefits of predictive analytics. This journey is key to ensuring that customers get the most out of their data and that advanced analytics is used in an accurate and appropriate manner for each specific customer.

How I got the role as a Data Scientist

I studied a Masters in Mathematics with Statistics and Operational Research at Queen's University Belfast. In my course, I studied a data mining module which introduced me to some machine learning techniques and inspired me to pursue a career in Data Science.

When I joined Kainos, there were only two other Data Scientists in the company, who were part of a small data team. Over the past two years, this team has grown significantly and now consists of Data Engineers, Data Scientists and Data Analysts spread across the UK and Poland, who deliver data transformation and analytical solutions to a variety of customers.

What I have done as a Data Scientist at Kainos

Throughout my time at Kainos, I have been working on projects involving data mining, data visualisation, machine learning and text analysis. I am currently working on a project with the Driver and Vehicle Standards Agency (DVSA), which aims to keep Britain's roads safe by ensuring all vehicles receive an MOT test of the same high standard across Great Britain.

The project seeks to identify any unusual testing patterns by vehicle testing stations and testers so that garages can be prioritised, allowing efficient allocation of examiners to monitor them. DVSA have the task of monitoring over 58,000 testers and 23,000 garages with only around 265 examiners. This requires one examiner to monitor the standards for over 200 testers.
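The workload figure above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch in Python: the three inputs are the rounded figures quoted in this article ("over" and "around"), so the results are approximate.

```python
# Rounded figures quoted in the article, so the ratios are approximate.
testers = 58_000
garages = 23_000
examiners = 265

testers_per_examiner = testers / examiners
garages_per_examiner = garages / examiners

print(f"Testers per examiner: {testers_per_examiner:.0f}")
print(f"Garages per examiner: {garages_per_examiner:.0f}")
```

That works out at roughly 219 testers and 87 garages for each examiner, which is where "over 200 testers" comes from.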

Their old process was used for a long time and it was very static and easily manipulated. It was also completely dependent on human intervention which required more resources than existed.

I have been part of a team which has completely recreated this enforcement process, using machine learning techniques to identify unusual testing behaviour and prioritise the backlog of visits for the examiners. The new process is dynamic: it changes monthly to accommodate changes in the data and in testing behaviour, so it adapts to the surrounding data trends. It makes efficient use of limited resources and combines data with human intervention to create a more powerful process, one which supports the examiners and allows them to work more quickly and efficiently.
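To give a flavour of what "identify unusual behaviour and prioritise visits" can mean in code, here is a deliberately simple Python sketch. It is not the method used on the DVSA project, which is not described here. It assumes a single made-up feature (a monthly pass rate per garage) and ranks garages by how far they sit from the average. The garage names, the values and the `prioritise` function are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical monthly pass rates per garage (made-up values for illustration).
pass_rates = {
    "garage_a": 0.71,
    "garage_b": 0.69,
    "garage_c": 0.98,
    "garage_d": 0.73,
    "garage_e": 0.70,
    "garage_f": 0.40,
}

def prioritise(rates):
    """Rank garages by how far their pass rate sits from the average."""
    mu = mean(rates.values())
    sigma = stdev(rates.values())
    if sigma == 0:
        # Every garage behaves identically, so nothing stands out.
        return [(name, 0.0) for name in rates]
    scores = {name: abs(rate - mu) / sigma for name, rate in rates.items()}
    # Highest score first: the most unusual garages head the visit backlog.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

backlog = prioritise(pass_rates)
for name, score in backlog:
    print(f"{name}: {score:.2f}")
```

Because the scores are calculated relative to the data supplied, re-running the ranking on each new month of data lets the ordering shift as testing behaviour changes.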

This project won AI & Machine Learning Project of the Year at the UK IT Industry Awards in 2018. The key to its success was the journey that DVSA have taken around data and analytics. A few years ago, DVSA started using their data to produce dashboards and understand how their service was running, in order to find any areas they could improve. They gradually started to answer more complex questions, identify and improve their data and, overall, build up a really strong knowledge of their whole MOT service.

This journey is essential to allow advanced analytics to work effectively. To create a useful model, business knowledge and a solid understanding of the data are key to producing reliable results.

A chance to give back too

Alongside working with our customers, I've also had the opportunity to deliver a teacher development program to primary and secondary school teachers in collaboration with CCEA. This allows teachers to introduce coding and programming into their classrooms. I have been involved in helping build the resources which make up the progression pathway from Key Stage 2 through to Key Stage 3. This course is in high demand and has proven successful in developing teachers' confidence in computational thinking, Scratch and Python.

I became involved in this initiative through the recommendation of a colleague. I was keen to get involved as I have always felt that I had a significant lack of coding experience, and coding was the biggest learning curve for me in this job. Through school and university, there was limited focus on coding. However, on entering the world of work, it became clear very quickly that these skills are imperative for a large range of jobs. Particularly for maths graduates, the career options generally require a high level of ICT skills, with most requiring coding.

Through my own development at Kainos and by being part of this initiative with CCEA, it is clear there is a strong link between maths and coding. Therefore I am keen to help bridge the gap and allow coding and ICT to be used across the curriculum to provide children with skills that will benefit their future careers.

A typical day as a Data Scientist

Working as a data scientist is not all about creating models. In fact, this is a very small part of the overall role. Producing a model alone is not usually beneficial to a customer: it needs to be part of a bigger process, so there is much more required to produce a functioning product. Anyone who has ever worked with data will know that most of your time is spent cleansing and manipulating data. Datasets are often large, which makes them susceptible to errors. Unless the erroneous records are removed, a model could be built which is actually wrong. There are often strange edge cases in the data which need to be considered and dealt with appropriately.
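As a small illustration of the kind of cleansing described above, here is a minimal Python sketch. The records, the field names (`test_id`, `mileage`, `result`) and the rules in `clean` are all made up for this example. Real cleansing rules come from understanding how the data is created within the business.

```python
# Hypothetical test records; field names and values are made up for illustration.
records = [
    {"test_id": 1, "mileage": 42_000, "result": "pass"},
    {"test_id": 2, "mileage": -5, "result": "fail"},      # impossible mileage
    {"test_id": 3, "mileage": 61_500, "result": None},    # missing result
    {"test_id": 1, "mileage": 42_000, "result": "pass"},  # duplicate of test 1
    {"test_id": 4, "mileage": 18_250, "result": "fail"},
]

def clean(rows):
    """Drop duplicates, missing results and implausible mileages."""
    seen = set()
    kept = []
    for row in rows:
        if row["test_id"] in seen:
            continue  # already kept a record with this id
        if row["result"] not in ("pass", "fail"):
            continue  # missing or unrecognised result
        if row["mileage"] is None or row["mileage"] < 0:
            continue  # mileage cannot be missing or negative
        seen.add(row["test_id"])
        kept.append(row)
    return kept

cleaned = clean(records)
print(f"Kept {len(cleaned)} of {len(records)} records")
```

Each rule here is an explicit, reviewable decision, which makes it easier to discuss edge cases with the people who know the data best.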

Building up a strong knowledge of the data, how it is created within the business, how it is used and what is available is key to building suitable advanced analytics solutions. On a typical day, I would have a number of stakeholder meetings to present findings or plan our next steps. I could also be writing SQL to answer a business question, reviewing my peers' work, writing R code to improve the currently implemented model or even creating a dashboard.

There is a large amount of variation in the role of a data scientist! On a daily basis, we will also collaborate with the rest of the data team, assisting the analysts with their queries, data modelling or discussing data structure and quality with the engineers.

How to be successful in a Data Science career

To be successful in companies like Kainos, you need more than just the technical knowledge: 

  • Communication and presentation skills are extremely important to allow you to discuss technical problems with the customer and explain technical results in an understandable way. Storytelling is an important skill where you can spark interest and lead others on a journey through the results. You must be able to communicate effectively to other members of the team as well as work independently when required.
  • Another key skill for a data scientist is curiosity: explore a range of datasets, be aware of a range of machine learning models and ask lots of questions. This analytical mindset is what companies are looking for in their data scientists.
  • The ability to learn new things quickly and to keep upskilling is also important to ensure your career progresses within the IT industry.
  • Experience in a wide range of tools and data sources will also help you to adapt to any work that you may be faced with. This could include programming languages such as Python, R or SQL and visualisation tools such as Power BI or Google Analytics.

If you are from a maths/science background, I would suggest focusing on your coding skills. As you can imagine, joining an IT company with minimal coding experience was quite daunting, and I can remember feeling completely overwhelmed when I had never even heard of the Terminal on a Mac, never mind knowing any of the commands! However, coding just takes practice, and when you combine this with the logic and problem-solving skills from a maths/science background, you can become a successful data scientist.

If you are from a computer science background, then I would suggest focusing on the maths and statistics. Understanding how the models work, producing suitable features for the model and understanding the data and output all require a good foundational knowledge of maths and statistics. We are often asked why you would need to understand the model when the code is already built. The power of understandable AI can sometimes outweigh the benefits of black box models. When you are able to describe to the customer how the model works and some of the underlying details, they are much more likely to trust the results at the end. I would also suggest starting with simple models, such as linear regression and clustering, before diving into deep learning, because quite often in reality the customer's requirement, and what is possible from the data, is just a simple ML model!
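To show how little is hidden inside a simple model, here is a minimal Python sketch of linear regression with one input, fitted by ordinary least squares. The function name `fit_line` and the data points are made up for illustration. In practice you would normally use an established library rather than writing this yourself.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.1f}x + {intercept:.1f}")
```

Every step can be explained to a customer in a sentence or two, which is much harder to do with a black box model.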

Finally, find a role that you like! Data Science varies between companies, and even within a company you may find a variety of data scientists. They are expected to be experts in so many things that spreading expertise across members of the team is vital to a company's success.

So you should understand what you love to do and ensure that the job responsibilities reflect those tasks.

Why you should get involved in Data Science

Everyone uses technology and there are endless opportunities in the IT industry, which will continue to grow in the near future. As companies grow, they need better solutions to stay ahead of their competition, requiring software engineers, architects, researchers, designers and data analysts to produce better applications, better websites or better products.

Data Science as a role is quite new but since data analytics, machine learning and AI are attracting more interest, more of these roles are starting to surface. Companies will use their data to create better products for the benefit of their own business and for their customers. So there has never been a better time to invest in yourself and build the skills required for a career in Data Science. 

About the author

Claire Davidson