This blog post looks at the challenge of taking machine learning (ML) models from a data science notebook running on a laptop to a robust, integrated part of a live service.

The Challenge 

Productionising a machine learning model is partly an organisational and cultural challenge. The area is extremely new and even experts in the field still have much to learn, particularly when taking the output of machine learning efforts to the point where it can provide value to end users.

Data science teams are usually separated from software engineering teams and the gap between the disciplines leads to significant challenges in bringing an ML model into production. This is usually due to gaps in understanding and inadequate processes.

Even basic machine learning models, which may be simple in isolation, still present complexities from the perspective of a traditional software engineering release pipeline – there are more vectors of change to consider than just code.

What about the records on which the model was trained? The data’s schema? If code is not the only way to trigger a new build, then how do we initiate a new release from changing data and how is that new artefact validated?
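One way to picture these extra vectors of change is to fingerprint each of them and compare against the last release. The snippet below is a minimal sketch rather than a prescribed implementation: the function names, the example records and the `tolerance` parameter are all hypothetical, and a real pipeline would fingerprint a stored data snapshot rather than an in-memory list.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Return a stable SHA-256 digest of any JSON-serialisable object."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_vectors(previous: dict, current: dict) -> list:
    """List the change vectors (code, schema, data) whose fingerprint differs."""
    return sorted(k for k in current if current[k] != previous.get(k))


def should_promote(candidate_score: float, live_score: float,
                   tolerance: float = 0.0) -> bool:
    """Hypothetical validation gate: the candidate must not score worse
    than the live model by more than the tolerance."""
    return candidate_score >= live_score - tolerance


# Illustrative values only; none of these come from a real project.
schema = {"age": "int", "postcode": "str"}
records_v1 = [{"age": 41, "postcode": "BT1"}]
records_v2 = records_v1 + [{"age": 29, "postcode": "BT7"}]

last_release = {"code": "abc123",
                "schema": fingerprint(schema),
                "data": fingerprint(records_v1)}
this_run = {"code": "abc123",
            "schema": fingerprint(schema),
            "data": fingerprint(records_v2)}

# A non-empty list means a new build should be initiated.
triggers = changed_vectors(last_release, this_run)
```

Here the code version and schema are unchanged, so only the data vector triggers a build; the resulting artefact would then pass through a gate such as `should_promote` before release.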

Machine learning solutions do not function in isolation; they are highly dependent on a supply of high-quality data. The challenge of making that data available in a way which is performant and secure is non-trivial, but essential to get right if we want to make ML a reliable part of live services.

Figure 1: Disconnect between Data Science and Operations

Although these challenges are substantial, we can apply our experience of continuous integration, testing and deployment in software delivery, adapting these techniques to the task of making machine learning production ready.

Model Ops 

Much as DevOps sought to bridge the gap between development teams and operations, we require an integrated approach that gives the output of data science teams' exploratory data analysis a clear path to live, one which satisfies the non-functional requirements of the overall service.


Figure 2: Automating machine learning delivery

Automation is the key to building a process that allows businesses to adapt quickly in response to change, and to do so in a way that is reliable, repeatable and removes human error wherever possible.

Data science development is generally conducted using data science notebooks. These are ideal for iterative exploratory analysis, enabling data scientists to test multiple approaches when selecting a candidate model.  

Significant problems can arise if model packaging and deployment occur in isolation, with no connection to the training data. Retraining then means reverting to the notebook environment and requires input from the data scientist once more.

We adapt the traditional code versioning approach to include the version history of the training data. This is essential to provide the lineage, auditability and reproducibility that would be impossible with code versioning alone. Production code is kept separate from iterative notebooks.
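As a rough illustration of versioning data alongside code, a lineage record can tie each model artefact to the exact code and data that produced it. This is a sketch under stated assumptions: `ModelLineage`, `build_lineage`, `is_reproduced` and the example values are hypothetical names invented here, and the byte-for-byte check only holds if training is deterministic.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelLineage:
    """Hypothetical lineage record stored next to a released model."""
    model_name: str
    code_version: str   # e.g. a Git commit hash
    data_version: str   # digest of the training data snapshot
    model_digest: str   # digest of the serialised model artefact


def sha256_bytes(payload: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(payload).hexdigest()


def build_lineage(model_name: str, code_version: str,
                  training_data: bytes, model_artefact: bytes) -> ModelLineage:
    """Record which code and which data produced a given artefact."""
    return ModelLineage(model_name, code_version,
                        sha256_bytes(training_data),
                        sha256_bytes(model_artefact))


def is_reproduced(record: ModelLineage, retrained_artefact: bytes) -> bool:
    """Check a retrained artefact against the recorded digest
    (assumes a deterministic training process)."""
    return record.model_digest == sha256_bytes(retrained_artefact)


# Illustrative values only.
record = build_lineage("example-model", "abc123",
                       b"id,score\n1,0.4\n", b"serialised-model-bytes")
manifest = json.dumps(asdict(record), indent=2)
```

The resulting `manifest` can be stored with the release so that an auditor can trace a live model back to a specific commit and data snapshot without opening a notebook.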

Quality Live Services  

Once a model has been promoted to Live, we apply our experience of observability to understand how that model performs and to monitor for model drift or decay. As more data becomes available, model retraining and refinement should be possible.  

Our automated delivery approach allows us to make those refinements through regular, incremental updates which are tightly controlled. 

An approach founded on automation and observability provides a process which can ensure that the Non-Functional Requirements (NFRs) are well understood and built into the delivery pipeline. Key NFRs include:

  • Performance – does the model selection hinder performance, or can we better optimise processing and the associated cost?
  • Security – models are often dependent on sensitive and valuable data, so we must apply enterprise rigour to the handling, storage and usage of that data
  • Accountability – machine learning services must be explainable, auditable and transparent so they can meet regulatory and legal requirements
  • Reliability – we leverage resilient public cloud services to ensure a reliable flow of data through end-to-end pipelines and automate testing at each stage
  • Maintainability – software engineering techniques for versioning, lifecycle management and automated deployment reduce the risk and cost of change and ensure that deliverables are of a high quality

It is also crucial to understand when and when not to build a bespoke machine learning model. It may not be a data scientist’s first instinct to look at available off-the-shelf solutions, e.g. AWS Comprehend for Natural Language Processing (NLP) or the Azure computer vision solution for image analysis.

While it may be possible to develop a model that is better than an off-the-shelf solution, doing so creates a significant overhead to maintain the model and manage the infrastructure on which it runs.

How much “better” does your model need to be to justify this overhead? There are clearly cases where a bespoke model is needed, but in others the available cognitive services will provide an ideal option.  

Our Thinking 

Members of the Kainos team have a wealth of experience in tackling these issues in the wild. 

This talk from our colleague Joe McGrath presents a more in-depth technical discussion of some of the issues on a real project that brought together Data Scientists, Developers and Ops in a unified delivery team: Data Science in a DevOps World

Our data science team is also responsible for risk rating garages across the UK for DVSA, incorporating machine learning into a nationally deployed digital service to improve Britain’s road safety.