How to Estimate a Machine Learning Project

Updated Sep 27, 2023 • 9 min read

Artificial Intelligence (AI) may add unparalleled value to your product, but it will certainly make the product harder and less predictable to build.

To meet this challenge, you need to partner with an experienced engineering and project management team that can define and estimate the custom software development process with exceptional precision. Here's an evaluation framework used by Netguru.

Enterprise software development projects are very difficult to estimate, and adding Machine Learning (ML) modules makes them much more challenging. Keep in mind that you are asking engineers to write a program that will generate another program, which will in turn learn to do something useful for your business. It is complicated.

You bring on a ton of unknowns and then you unload them

On the Netguru blog, we have written advice on how to estimate a project and avoid problems. One of the key elements is “getting rid of the unknowns”. However, while Machine Learning end products help us reduce the unknown, the process of building them is long and obscure.

ML projects are not only more complex, they are also built with relatively new and less proven technology. As I noted in "7 Challenges for Machine Learning Projects", TensorFlow only reached version 1.0 in February 2017, and PyTorch, another popular library, was first released to the public in early 2017.

That is why when thinking about AI features, you should look for the most experienced software development company. The Netguru Machine Learning team works on a four-phase estimation framework that includes: Discovery, Exploration, Development, and Improvement.

Phase I: Discovery (1-2 weeks)

The goal is to gather requirements and evaluate whether Machine Learning fits your business goals. You need to test your vision against the judgment of the engineers, who will tell you which problems can be solved with current state-of-the-art methods and which metrics can be used to measure the results.

First, metrics and business goals often differ. Say users rate movies from 1 to 10 stars, and an algorithm can be trained to predict these ratings with 90% accuracy. That sounds great, but from a business perspective it may be more useful to know whether a viewer will watch the whole movie or switch to something else, and that behavior does not necessarily correlate with their star ratings.
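To make the difference concrete, here is a small Python sketch built on hypothetical toy data. The `SESSIONS` records and the 7-star threshold are illustrative assumptions, not real figures. The sketch scores the same predictions twice. First it uses an ML metric, exact star-rating accuracy. Then it asks the business question of whether the viewer finished the movie, treating a high predicted rating as a proxy for finishing.

```python
# Hypothetical toy sessions: (predicted_stars, actual_stars, watched_to_end).
SESSIONS = [
    (8, 8, False),
    (9, 9, True),
    (3, 3, True),
    (6, 6, False),
    (7, 5, True),
]


def rating_accuracy(sessions):
    """ML metric: share of sessions where the predicted star rating is exact."""
    hits = sum(pred == actual for pred, actual, _ in sessions)
    return hits / len(sessions)


def completion_accuracy(sessions, threshold=7):
    """Business metric: how often 'predicted_stars >= threshold' correctly
    guesses whether the viewer watched the whole movie (threshold is assumed)."""
    hits = sum((pred >= threshold) == finished for pred, _, finished in sessions)
    return hits / len(sessions)


print(rating_accuracy(SESSIONS))      # 0.8
print(completion_accuracy(SESSIONS))  # 0.6
```

On this toy data the rating model looks good at 80%. However, guessing completion from the rating is right only 60% of the time. The metric should therefore follow the business goal, not the other way around.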

Second, the development team needs to understand what kind of data you have. Are you collecting it correctly? Or does the data need to be fetched from an outside service?

Third, can we supervise the algorithm? In other words, can we provide the correct answer (a label) for the examples it learns from? This is a critical question, since unsupervised algorithms, which have to find patterns on their own, are much more difficult to train and evaluate. It's like trying to predict a movie's star rating without ever seeing the ratings users actually gave.

Fourth, during this stage we define and estimate the Proof of Concept (PoC), i.e. exactly what we want to achieve. For example: "we want the model to predict whether a user will watch the whole movie, and we want that prediction to be accurate 70% of the time".
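A PoC goal like this can be written down as a metric plus a target and checked mechanically. The sketch below assumes plain accuracy as the metric. `PocGoal`, `meets_goal` and the evaluation data are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PocGoal:
    """A PoC goal expressed as a description plus a target accuracy
    (hypothetical structure, for illustration only)."""
    description: str
    target_accuracy: float


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equally long")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def meets_goal(goal, predictions, labels):
    """True if the measured accuracy reaches the PoC target."""
    return accuracy(predictions, labels) >= goal.target_accuracy


goal = PocGoal("predict whether a user will watch the whole movie", 0.70)

# Hypothetical evaluation run: True means the user watched to the end.
preds = [True, True, False, True, False, False, True, True, False, True]
labels = [True, False, False, True, False, True, True, True, False, True]

print(accuracy(preds, labels))          # 0.8
print(meets_goal(goal, preds, labels))  # True
```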

As you can imagine, depending on the PoC, the project may be trivial for the ML engineers yet bring substantial business value, or the other way around. That is why well-defined goals are so important.

Let me give you another example. If we build an algorithm for detecting cats in images, we might require it to reach 99% accuracy. In medicine, the bar looks very different: in an experiment published in the journal Annals of Oncology, a Machine Learning program distinguished images of cancerous skin changes from benign ones with 95% accuracy, while dermatologists were only 86.6% accurate. The dermatologists' result was the benchmark to beat, and beating it is what could bring revolutionary positive change to predictive medicine.

Phase II: Exploration (4-6 weeks)

At this stage, the objective is to build a Proof-of-Concept model and deploy it as an API. Once we have trained a baseline model that performs the task, we can estimate the target performance of our production-ready solution.
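A common sanity check when judging a first model is to compare it with a trivial baseline that always predicts the most frequent outcome. This is our suggestion and not necessarily part of the framework described here. The function names and data below are hypothetical.

```python
from collections import Counter


def majority_class_baseline(train_labels):
    """Return a 'model' that ignores its input and always predicts the most
    common label seen in training."""
    most_common_label, _count = Counter(train_labels).most_common(1)[0]
    return lambda _example: most_common_label


def evaluate(model, examples, labels):
    """Accuracy of `model` on held-out examples."""
    hits = sum(model(x) == y for x, y in zip(examples, labels))
    return hits / len(labels)


# Hypothetical data: 1 = watched to the end, 0 = switched away.
train_labels = [1, 1, 0, 1, 0, 1, 1, 0]
test_examples = ["s1", "s2", "s3", "s4", "s5"]  # placeholder inputs
test_labels = [1, 0, 1, 1, 0]

baseline = majority_class_baseline(train_labels)
print(evaluate(baseline, test_examples, test_labels))  # 0.6
```

A trained model that barely beats such a floor is an early warning sign, worth discussing before development starts.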

Once again, smartly matching business goals with ML metrics proves beneficial. When testing a recommendation system for an e-commerce site that had no advanced recommendations before, a trained baseline model can be put into production at a very early stage, practically as soon as it performs the task.

On the other hand, the same task may become extremely difficult if we are working on improving an already effective recommendation system.
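One way to see why the second case is harder is to look at how much of the remaining error an improvement has to remove. Assuming accuracy is the metric, the same two-point gain means very different things depending on the starting point. The accuracy figures below are hypothetical.

```python
def relative_error_reduction(old_accuracy, new_accuracy):
    """Share of the old model's errors that the new model eliminates."""
    old_error = 1.0 - old_accuracy
    if old_error == 0:
        raise ValueError("a perfect model has no errors left to reduce")
    new_error = 1.0 - new_accuracy
    return (old_error - new_error) / old_error


# The same +2 percentage points, starting from two hypothetical baselines.
print(f"{relative_error_reduction(0.60, 0.62):.0%}")  # 5%
print(f"{relative_error_reduction(0.90, 0.92):.0%}")  # 20%
```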

At the end of the exploration phase, the team should be able to estimate what performance can be achieved on each of the metrics planned during the discovery phase.

Phase III: Development (3+ months)

It's time for the bespoke software development team to work iteratively until they reach a production-ready solution. Because uncertainty decreases with each step of the project, estimates become more precise at this stage.

When training an algorithm, we can react to each output of our experiments as we watch the computer program write another computer program.

The algorithm learns very fast, so we can test it on a set of data, apply the metric, and see whether the result has reached our goal or, if not, how far away we are. If we want our recommendation module to predict whether a viewer will watch the whole movie with 70% accuracy and our model is still at 55%, we need to readjust the algorithm and run it again until we reach the goal.

If the result stops improving, the engineers may have to apply a different model, change the method, or adjust the data. We repeat this until we reach the goal measured with the metric.
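The loop described above can be sketched as follows. `run_experiment` is a hypothetical callback that stands in for one round of training and evaluation. The `patience` rule stops after a few attempts without progress and signals that the approach needs rethinking. This rule is our illustrative assumption about when to change the model, method, or data.

```python
def iterate_until_goal(run_experiment, target, max_experiments=20, patience=3):
    """Repeat experiments until the metric reaches `target`.

    run_experiment(i) is a hypothetical callback that trains or adjusts the
    model for attempt i and returns its score on held-out data. If `patience`
    attempts in a row fail to beat the best score so far, stop and signal
    that the approach itself (model, method, or data) should change.
    """
    best = float("-inf")
    attempts_without_progress = 0
    history = []
    for i in range(max_experiments):
        score = run_experiment(i)
        history.append(score)
        if score >= target:
            return "goal reached", history
        if score > best:
            best, attempts_without_progress = score, 0
        else:
            attempts_without_progress += 1
            if attempts_without_progress >= patience:
                return "rethink model, method, or data", history
    return "experiment budget exhausted", history


# Hypothetical scores of successive experiments, starting at 55%.
scores = [0.55, 0.61, 0.66, 0.68, 0.71]
status, history = iterate_until_goal(lambda i: scores[i], target=0.70)
print(status, history)  # goal reached [0.55, 0.61, 0.66, 0.68, 0.71]
```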

In this phase, the team works in sprints, deciding after each iteration what to do next. The outcome of a single sprint can be predicted reasonably well; however, planning more than one sprint ahead is a mistake, especially in Machine Learning, where you are often sailing in uncharted waters.

Phase IV: Improvement (indefinite)

When a solution has already been deployed to the production environment, business decision makers are often tempted to end the project in order to cut costs. In Machine Learning, this is often a mistake. The data, such as user preferences and trends, usually changes over time. That is why an AI model needs to be constantly monitored and reviewed to protect it from erosion and degradation.
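Monitoring can start simply. Once the true outcomes are known, you can track accuracy on recent production predictions and raise a flag when it drops noticeably below the accuracy measured at deployment. The class below is a hypothetical sketch. Its window size and tolerance are illustrative assumptions that would need tuning for each project.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent `window` production predictions and
    flag degradation relative to the accuracy measured at deployment.
    The default window and tolerance are illustrative assumptions."""

    def __init__(self, baseline_accuracy, window=100, tolerance=0.05):
        self.baseline_accuracy = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong

    def record(self, prediction, actual):
        """Store whether one prediction turned out to be correct."""
        self.outcomes.append(1 if prediction == actual else 0)

    def current_accuracy(self):
        """Accuracy over the window, or None before any feedback arrives."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self):
        """True if recent accuracy fell more than `tolerance` below baseline."""
        acc = self.current_accuracy()
        return acc is not None and acc < self.baseline_accuracy - self.tolerance


monitor = AccuracyMonitor(baseline_accuracy=0.72, window=10)
# Hypothetical recent feedback: (predicted "watched to end", actual outcome).
recent = [(True, True)] * 6 + [(True, False)] * 4
for predicted, actual in recent:
    monitor.record(predicted, actual)
print(monitor.current_accuracy(), monitor.degraded())  # 0.6 True
```

In practice, the true outcome (whether the viewer finished) arrives with a delay, so the monitor would be fed as feedback comes in. A flag like this is a prompt to investigate and possibly retrain, not an automatic fix.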

Machine Learning projects need time to achieve satisfying results. Even if you are lucky and your algorithms beat the benchmarks immediately, chances are it's a one-off strike, and your program may get completely lost on a different dataset.

That is why the improvement phase is perpetual. It can be done efficiently and does not need as many resources as the previous two phases; nevertheless, it has to be done. Continuous monitoring will not only protect the model from degradation but also help improve it over time.

Machine Learning brings uncertainty to the project. That is why it pays off to involve the best and most experienced engineering team. Defining business goals and metrics, sketching the architecture, and planning technical requirements at the earliest stage will determine the success or failure of your venture.