The main challenge with MLOps implementation? Data scientists’ limited experience of productionising solutions.
This is not surprising: until recently, Machine Learning (ML) solutions were the exception rather than the rule in most organisations. Data science itself is still a new, fast-evolving and under-served profession.
However, cloud providers and businesses are catching up quickly. Nowadays, most organisations have at least experimented with Data Science and Machine Learning, and cloud providers have democratised access to such experiments and made them easier to implement. So, by the start of 2021, the potential benefits of AI/ML (or the potential of competitors who use AI/ML) are clear. Proof-of-concept and proof-of-value experiments have demonstrated as much: we can now build a basic ML pipeline in Python with literally a few lines of code.
However, there is still a huge difference between an experiment, something done once, and an ML solution that underpins the whole business. As a data scientist implementing Machine Learning, you and your team should ask yourselves the questions below before embarking:
- Throwing data at an algorithm is easy. However, real data is often quirkier than that, and algorithms don’t like it:
  - A few outliers (values far from typical measurements) could skew the algorithm and make it underperform
  - Data features at significantly different scales would undermine a solution
  - Correlated data features could be less than helpful
  - Features with complex non-linear relationships (read: much of real-life complex data!) may require complex feature engineering
  - The above points are just the basics, yet even these few data treatment examples are often not covered (and even less often automated)
- How streamlined are your data pipelines? What if the next improvement needs a change to, or an extension of, the datasets used by the ML engine? The difference between an operationalised solution and an ad-hoc, done-by-hand one is hours versus months for what sounds like a typical operation.
- It’s the same with a model update: what if a better model becomes available? How long would it take to update, and how cumbersome would it be? For an experiment running on a laptop or a Virtual Machine, it is not hard at all. But we are talking about, say, a retail business handling thousands of operations per second(!). Was the solution architected with updating the model “on-the-fly” in mind?
- What about rolling back a model (if the current one is found to have a problem)? How long would it take to roll the previous version out to potentially thousands of compute nodes without disrupting the business for hours, days or months?
- How about operational and cost efficiency? Is your architecture going to be flexible or wasteful and prohibitively expensive?
- What about all the data security considerations? How are they taken care of for every element of the pipeline, for every environment (development, testing, UAT, Live, etc.) and for every significantly different user role? And even more complex: would your model expose PII that could be reconstructed from its outputs?
- Are your data scientists aware of best practices, and are they following them? How long would it take to onboard a new data scientist? What about test-driven development? If a change is made to a solution that is growing in complexity and by now has many thousands of lines of code, would you be able to guarantee that the change is not going to break it, or are you comfortable deploying it “blindly”? Even worse: what if the change broke it in a way you won’t necessarily notice (as in: the ML engine would still produce some results; they just won’t be accurate)?
- Expanding on changes: what if your data changes? It’s easy enough to eyeball your data, even for a few dozen dimensions, but what about going forward, when new live data is arriving daily, possibly in large volumes, and the solution has grown into multiple models with hundreds or thousands of features between them? Who is checking that data? And how often? And if the monitors are automated (they should be!), are they sufficiently comprehensive?
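
To make the last question concrete, here is a minimal sketch of what an automated data monitor could look like, using only the Python standard library. It is an illustration under stated assumptions rather than a recommended implementation: the feature names (`basket_value`, `items`), the function names and the threshold of three standard deviations are all hypothetical, and a production monitor would typically check far more than a shift in the mean.

```python
from statistics import mean, stdev


def summarise(reference):
    """Record per-feature mean and standard deviation from reference (training) data."""
    return {name: (mean(values), stdev(values)) for name, values in reference.items()}


def drifted_features(baseline, batch, threshold=3.0):
    """Return the features whose batch mean lies more than `threshold`
    reference standard deviations away from the reference mean.
    The threshold is an assumed value chosen for illustration."""
    flagged = []
    for name, (ref_mean, ref_std) in baseline.items():
        values = batch.get(name)
        if not values:
            # A missing or empty feature is itself worth an alert.
            flagged.append(name)
            continue
        shift = abs(mean(values) - ref_mean)
        if ref_std == 0:
            # A constant reference feature: any movement counts as drift.
            if shift > 0:
                flagged.append(name)
        elif shift / ref_std > threshold:
            flagged.append(name)
    return flagged


# Hypothetical reference data captured when the model was trained.
reference = {
    "basket_value": [20.0, 22.0, 19.0, 21.0, 18.0],
    "items": [3.0, 4.0, 2.0, 3.0, 3.0],
}
baseline = summarise(reference)

# Hypothetical batch of live data arriving today.
todays_batch = {
    "basket_value": [95.0, 102.0, 99.0],
    "items": [3.0, 2.0, 4.0],
}
print(drifted_features(baseline, todays_batch))  # ['basket_value']
```

Even a check this small, run on every incoming batch, answers "who is checking that data, and how often?" It says nothing about whether the monitoring is comprehensive enough, which is the harder question.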
The good news is that none of the above is overly complex or unachievable. However, there is a significant difference in the ROI achieved by two very different things. On one side is an AI/ML experiment, which is increasingly accessible thanks to the spread of fundamental Data Science skills and the democratisation of AI/ML through cloud provider offerings. On the other is an experienced team of platform and data engineers, ML engineers and data scientists, data analysts, platform architects and security experts, business analysts and delivery managers, who collaborate to build a solution that will support the whole organisation going forward.
Tune in next time…
Hopefully, this has given you an outline of the challenges of embedding MLOps into an organisation. In my next blog, we will discuss the practical steps to get started on your MLOps journey and key tips for delivering value early.
Are you interested in finding out how we have operationalised Data and AI solutions for our customers?
Check out our Machine Learning Operations (MLOps) services page to read our latest case study or speak to an expert.