This article is a continuation of our series on the field of computer vision and the application of computer vision algorithms for tasks such as object detection and facial recognition. We’ll examine the process for training machine learning models and metrics for evaluating machine learning model performance. We’ll learn about deploying machine learning models once your algorithm reaches an acceptable level of performance. Finally, we’ll highlight some considerations that project leaders should take into account to make sure their initiative goes according to plan.
(The previous article in this series can be found here: Computer Vision Projects – Part 1: Your Dataset.)
The computer vision project life cycle mapped to the CRISP-DM.
Step 4: Selecting and Training Machine Learning Models
Once your project has clean and well-labeled training data to feed into one or more computer vision algorithms, it can now move into the training phase of the machine learning project management life cycle.
Here we break down some considerations for this phase:
Hardware Requirements: CPUs, GPUs, and TPUs – Oh My!
Training machine learning models requires massive amounts of computational power. Computer vision is especially computationally intensive because image data is large and noisy, and relevant features must be picked out of that noise.
Image processing of still images or video data requires tens of millions of mathematical operations in order to transform pixel-level data into higher-order concepts interpretable by human beings.
The exact requirements during the training phase depend heavily on your objectives. The pixel resolution at which your computer vision algorithms need to resolve objects largely determines how much computational power you will consume. More advanced classification tasks that break individual images down into overlapping subsets may require larger convolutional neural networks, with a corresponding rise in computational requirements.
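To make the link between resolution and compute concrete, here is a rough back-of-the-envelope sketch. It counts multiply-accumulate operations (MACs) for a single convolutional layer, assuming stride 1 and "same" padding. The function name and the layer sizes are illustrative assumptions, not taken from any particular network.

```python
def conv2d_macs(height, width, in_channels, out_channels, kernel_size):
    """Multiply-accumulate count for one conv layer (stride 1, 'same' padding)."""
    # Each output pixel of each output channel needs
    # kernel_size * kernel_size * in_channels multiply-accumulates.
    macs_per_output = kernel_size * kernel_size * in_channels
    return height * width * out_channels * macs_per_output


# Hypothetical first layer: RGB input, 64 filters of size 3x3.
low_res = conv2d_macs(224, 224, 3, 64, 3)
high_res = conv2d_macs(448, 448, 3, 64, 3)

print(f"224x224 input: {low_res:,} MACs")
print(f"448x448 input: {high_res:,} MACs")
print(f"ratio: {high_res / low_res:.1f}x")
```

Under these assumptions, doubling the resolution in each dimension quadruples the work for this one layer, and a full network repeats that cost across many layers and many training images.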
Project teams must decide whether to feed these power-hungry machine learning algorithms using computational power generated through CPUs, GPUs, or other more specialized hardware like ASICs or TPUs (Tensor Processing Units).
CPUs, or Central Processing Units, are the core components of computer technology. They are most adept at performing a series of mathematical calculations in order, parsing and interpreting long chunks of complex code.
The advantage of training machine learning models on CPUs is that they are cheap to purchase, widely available, and energy-efficient (read: cost-efficient).
The main disadvantage of CPUs is that they have relatively few cores and are optimized for executing instructions in sequence. This design is perfect for digesting sequential code but constrains the highly parallel training calculations of deep learning models, which may require tens of millions of operations to train successfully. The net effect is that training computer vision algorithms on CPUs takes longer, which reduces the speed at which teams can iterate and experiment with their model design.
GPUs, or Graphics Processing Units, were initially designed for use in video games and graphics rendering. This type of processing requires many simple transformations of matrices and vectors, which directly mirrors the types of operations present in machine learning data processing.
The advantage of training machine learning models on GPUs is that they can train models much faster than CPUs. GPUs parallelize the learning process and distribute it among a larger number of processing units. This feature gives project teams a greater ability to rapidly train, test, and iterate on different models and strategies.
The disadvantage of GPU training is largely due to cost. GPUs are typically 2-3x more expensive to purchase and run. Renting instances of GPUs from cloud providers is also more expensive.
In recent years, hardware has evolved to include components specialized for a single application. These components involve specially designed logic boards that are optimized for a specific purpose. An example from outside the machine learning world is hardware optimized for the types of calculations required in bitcoin mining, which improves both performance and energy efficiency.
These types of components exist for machine learning, but they are highly specialized, and deploying ML systems on them requires a greater depth of knowledge.
The landscape of available machine learning tools and computer vision algorithms in use by data scientists during the training phase is beyond the scope of this paper. However, it is important to note that there is a wide variety of options that serve various purposes.
You can view this list of open-source software options for training machine learning models.
Project teams should avoid getting locked into one particular model development solution for a number of reasons:
- Data scientists have strong preferences; some prefer GUI based tools, others prefer writing code, some like R, others Python. Allowing the data science team to use the tools they prefer will maximize their productivity.
- Model development tools and open source solutions evolve rapidly; being locked into one development tool set might constrain future development efforts. First came R, followed by Python. Today we have TensorFlow, Torch, Caffe, and scikit-learn, and tomorrow there will be more.
- Not all tools are equally good for all tasks; certain tools excel at deep learning, others have superior libraries for statistical models, and some computer vision algorithms scale better than others for big data.
- Likewise, some tools are better at enterprise deployment and integration than others; ideally firms should separate model development tools from model deployment solutions and leverage open standards for model representation whenever possible.
- Change is inevitable…
Keep your options open and use a broad variety of toolsets to experience the advantages of each. Be sure to decouple what you need for training the model from what you need for deploying it. Open standards such as the Predictive Model Markup Language (PMML) can help with transferring trained models between different open-source tools.
Remember: training and deploying machine learning models on an enterprise scale is a delicate process. These guidelines will help keep your project nimble, flexible, and resilient to unanticipated forces outside of your control.
Neural Network Design: Variety is the Spice of Life
Machine learning project performance depends on the specific choices made when picking an architecture. The full range of network configurations available for computer vision projects is beyond the scope of this article.
The salient point here is that your team should experiment with different architectures while keeping in mind that many best-in-class ML models are available for a wide variety of computer vision tasks.
Research your specific application and understand the landscape of potential options before investing fully into one particular strategy or network design.
Correct decision making here relies on the experience of your data scientists and machine learning engineers. Your team should understand the tradeoffs between different feature engineering approaches and how the quality of training data affects your machine learning model performance.
If your team is light in terms of skill set in this area, it may be advantageous to outsource model design to an experienced third party.
Accelerate Training Time With Transfer Learning
Model training speeds can vary according to the scale, complexity, and objective of your computer vision project. Teams can use a technique called transfer learning to accelerate the training process under certain situations.
Training a deep learning model from scratch is time-intensive and computationally-expensive. Luckily there exist a number of models that offer a solid pre-built foundation for a given application. Transfer learning involves taking a model that is already fully trained on a high-quality, diverse, and relevant dataset, and augmenting it with images that capture your project-specific problem space.
The advantages of transfer learning include:
- Reduce total time and cost of model training
- Achieve high performance without procuring a massive data set
- Demonstrate satisfactory proof-of-concept quickly
An example might be using a model trained on the ImageNet data set and supplying it with additional photos of specific categories of objects you wish to teach it to classify. The net effect is that you can skip the weeks of training time needed to bring it up to a baseline level of performance and can then add the detection capabilities your specific project requires.
This process will accelerate deploying machine learning models into production where they can make an impact in the real world.
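The core idea can be sketched with a deliberately tiny toy: a "pretrained" feature extractor is kept frozen, and only a small new classification head is trained on project-specific examples. Everything here is an assumption made for illustration. The backbone weights are made up rather than learned from ImageNet, the "images" are two-pixel vectors, and names such as `FROZEN_BACKBONE` and `fine_tune_head` are hypothetical. A real project would load a pretrained network from a deep learning framework instead.

```python
import math

# Stand-in for a pretrained backbone: fixed weights that are never updated.
# In a real project these would come from a network trained on a large
# dataset; the numbers here are invented purely for illustration.
FROZEN_BACKBONE = [[0.5, -0.5], [0.25, 0.75]]


def extract_features(pixels):
    """Apply the frozen backbone (a fixed linear map followed by ReLU)."""
    return [max(0.0, sum(w * p for w, p in zip(row, pixels)))
            for row in FROZEN_BACKBONE]


def predict(head, features):
    """New task-specific head: logistic regression on backbone features."""
    z = head["bias"] + sum(w * f for w, f in zip(head["weights"], features))
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(samples, labels, epochs=200, lr=0.5):
    """Train only the head with SGD; the backbone weights stay untouched."""
    head = {"weights": [0.0, 0.0], "bias": 0.0}
    features = [extract_features(x) for x in samples]
    for _ in range(epochs):
        for f, y in zip(features, labels):
            error = predict(head, f) - y  # gradient of the log loss w.r.t. z
            head["weights"] = [w - lr * error * fi
                               for w, fi in zip(head["weights"], f)]
            head["bias"] -= lr * error
    return head


# Tiny hypothetical dataset: two "images" per class, two pixels each.
samples = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [1, 1, 0, 0]

head = fine_tune_head(samples, labels)
predictions = [round(predict(head, extract_features(x))) for x in samples]
print(predictions)
```

Because only the three head parameters are updated, training needs far less data and compute than learning the whole network, which is the saving that transfer learning aims for.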
Step 5: Evaluating Machine Learning Model Performance
Find Your North Star (Goal KPI Metric)
One of the fundamental requirements of successful computer vision projects is that the outcome is measurable. Team members will be unable to judge the success of their project without evaluating machine learning model performance at every step.
Project managers should ask themselves these questions:
- What is the minimum threshold needed to achieve feasibility from a business strategy / ROI standpoint? Is ‘OK’ enough?
- How much risk are we willing to tolerate when deploying machine learning models that are inaccurate?
(example: an acceptable false-positive rate in breast cancer diagnosis)
- What are the consequences (in terms of finances or the risk to human life) if the computer vision algorithm performs poorly or the model is inaccurate?
Project managers can evaluate machine learning model performance according to a wide number of criteria. Here are some of the most common metrics:
Accuracy is a score used to gain a baseline understanding of performance on classification tasks. This metric refers to the number of correctly classified images divided by the total number of images in the overall testing set.
Accuracy – a metric used for evaluating machine learning model performance.
Warning: accuracy should not be the only metric used. A common mistake is to forget that accuracy depends heavily on the distribution of examples within your training set. If the number of images in your set is skewed heavily towards a class where the model has high predictive performance, you may achieve a high overall accuracy score.
This result can be misleading – your model may still fail to classify every single other type of class in the set. Since these examples only constitute a small percentage of the total dataset, your accuracy score may lead you to conclude that the model is successful.
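A small sketch shows how this can happen. The class names, counts, and the always-predict-the-majority "model" are made up for illustration, and `per_class_accuracy` is a hypothetical helper name.

```python
def accuracy(y_true, y_pred):
    """Correctly classified images divided by the total number of images."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def per_class_accuracy(y_true, y_pred, label):
    """Accuracy restricted to the images whose true class is `label`."""
    preds = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(1 for p in preds if p == label) / len(preds)


# Hypothetical, heavily skewed test set: 95 "concrete" images, 5 "asphalt".
y_true = ["concrete"] * 95 + ["asphalt"] * 5

# A degenerate model that always predicts the majority class.
y_pred = ["concrete"] * 100

print("overall:", accuracy(y_true, y_pred))
for label in ("concrete", "asphalt"):
    print(label, per_class_accuracy(y_true, y_pred, label))
```

The overall score is 95% even though the model never identifies a single asphalt image, which is why a per-class breakdown is worth checking.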
A confusion matrix is a tool used to gain a deeper understanding of the types of classes on which your classification or tagging algorithm underperforms or meets expectations.
Confusion matrix – a metric for evaluating machine learning model performance.
We can see here that the model performs well on classifying Concrete, but that examples from the “Asphalt” and “Building” categories are more difficult to classify (higher disparity between predicted results and actual class identity).
Confusion matrices provide a nuanced understanding of how your ontology (set of category definitions) responds to classification, or where in your dataset you need to supply more training examples. Informed by these results, project managers can retrain their neural network with additional images to improve performance on a class-by-class basis.
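As a minimal sketch of how such a matrix is built, the snippet below counts (actual, predicted) pairs. The labels and predictions are invented for illustration and are not the data behind the figure above.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels]
            for actual in labels]


labels = ["Concrete", "Asphalt", "Building"]

# Made-up ground truth and predictions, for illustration only.
y_true = ["Concrete"] * 4 + ["Asphalt"] * 3 + ["Building"] * 3
y_pred = ["Concrete", "Concrete", "Concrete", "Concrete",
          "Asphalt", "Concrete", "Building",
          "Building", "Asphalt", "Asphalt"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>8}: {row}")
```

Counts on the diagonal are correct classifications. Large off-diagonal counts point to the pairs of classes the model confuses, and therefore to where additional training examples are likely to help.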
Precision and recall are two related metrics used to understand the quality of the classifications made by your vision system.
Precision and recall – two metrics for evaluating machine learning model performance.
Both metrics are derived from counts of the four possible outcomes of a classification: True Positive, True Negative, False Positive, and False Negative.
Precision and recall equations for evaluating machine learning model performance.
Important: keep in mind what these numbers represent in the context of your actual use case. In healthcare, for instance, a diagnostic tool for pathologists should skew heavily towards high recall. You want the vision system to return as many examples of potential cancers as possible, and let the doctor sort through the results to make the final call. In a system for movie recommendations, however, the model should optimize for high precision: any recommendations made should be accurate and relevant rather than complete relative to all available choices.
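The standard definitions are precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below applies them to a hypothetical screening result; the counts are invented for illustration.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from outcome counts.

    precision = TP / (TP + FP): how many flagged items were really positive.
    recall    = TP / (TP + FN): how many real positives were flagged.
    Assumes at least one predicted positive and one actual positive.
    """
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical screening tool: 8 cancers found, 12 false alarms, 2 missed.
precision, recall = precision_recall(8, 12, 2)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this made-up case the tool catches most real cases (high recall) at the price of many false alarms (low precision), a tradeoff that may be acceptable when a doctor reviews every flagged image.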
The Jaccard index, or Intersection over Union (IoU), is a metric used specifically for image and video analysis tasks. The purpose of this metric is to compare the resulting object locations produced by detection algorithms with the actual locations of observed objects in the training and test data. In practice, this process essentially compares the location markers of objects detected by the algorithm with where those objects actually are in the image as observed by a human.
Jaccard Index – a metric for evaluating machine learning model performance.
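For axis-aligned bounding boxes, the metric is the area of overlap divided by the area of the union. The sketch below assumes boxes are given as (x_min, y_min, x_max, y_max) tuples, which is one common convention but not the only one; the coordinates are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so that disjoint boxes yield no intersection.
    intersection = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


predicted = (10, 10, 50, 50)      # box produced by the detector
ground_truth = (30, 30, 70, 70)   # box drawn by a human annotator
print(f"IoU = {iou(predicted, ground_truth):.3f}")
```

A score of 1.0 means the predicted box matches the annotation exactly, and 0.0 means the boxes do not overlap at all. Teams typically pick a threshold above which a detection counts as correct.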
Evaluating machine learning model performance is of critical importance. Companies often lack clarity around which of these metrics their project should aim toward. The result is that they produce models that are effective in the wrong areas. In many cases, this ineffectiveness may pose large risks to individuals or the business itself.
Step 6: Deploying Machine Learning Models In Production
After your model hits or exceeds its performance goals on your test set, it must be moved into a live production environment. Your overall project goals will determine how and where you deploy your AI solution.
Some projects will require software engineering skills to wrap your ML model in software and deploy in the cloud, while others may involve hardware embedding on an IoT device.
Firms should consider these factors when choosing a deployment medium:
- Proximity to data
Note: Data proximity has a direct effect on costs throughout an entire project. While this factor depends on the size of your dataset and ongoing data transfer requirements, firms can minimize these overhead costs by strategically deploying their computer vision solution close to where their data lives.
Here are some options for deployment:
Firms can deploy their solutions on the cloud quickly and easily through the major Infrastructure-as-a-Service providers (AWS, Google Cloud, Azure, etc.). The cloud offers several advantages:
- Redundancy in case of failure, maximum uptime
- Reliable scaling
- Microservice availability
- Proximity to data lakes
And several disadvantages:
- No control over outages or service disruptions when they do occur
- Security of sensitive data
- At the mercy of the cloud providers
- Costs can skyrocket for projects with large datasets, especially if you need to re-train frequently
Firms can also deploy their computer vision solutions over their own local hardware such as servers or desktops. The advantages of deploying on-premise include:
- Total control over data
- Total control of system updates
- Free from cloud
- Once paid for, your computing power is essentially free over your hardware’s lifetime
The disadvantages of on-premise deployment are:
- High overhead investment
- Difficult to scale
- More complex to manage and monitor than cloud-based Infrastructure-as-a-Service
Mobile deployment of computer vision algorithms is possible thanks to the advances in compute capacity and processing speed available to smaller handheld devices. The exact goal of your project will determine whether your model needs to be deployed here.
Edge / IoT / Embedded
Oftentimes a project may require that computer vision systems be installed on Edge devices such as IoT devices, cameras, or drones. These types of deployments require an additional set of skills that may include hardware embedding, among others. Edge deployments often become attractive when latency sensitivity is a priority or during projects in which deployment in the field is advantageous.
Important: Do not underestimate the complexity of implementing your computer vision system in a live environment. Our experience with a wide variety of companies across industries such as healthcare, infrastructure, and finance has shown us that bridging this gap is no easy feat. The following graph indicating the growth in market size for computer vision is especially informative.
Growth in computer vision services highlights difficulty of training and deploying machine learning models into live production.
The noteworthy number here is the growth in the ‘Services’ category. We believe this to be an obvious indicator of the growing skills gap and talent shortage in computer vision system integration. There is growing demand for specialists capable of designing computer vision systems, training machine learning models, and deploying machine learning models successfully in industry.
How to Build Trust When Deploying Machine Learning Models
Organizations implementing brand new computer vision systems face many challenges – one of the biggest is trust. Building organizational trust in a new algorithm is difficult. Project managers often face resistance from employees, lack of confidence in the predictive conclusions of the machine learning model, and anxiety surrounding potential changes to the firm brought about through AI technology.
The way through this minefield is to follow an established framework for guiding your fledgling AI system from adolescence through to adulthood. This process begins the moment the model is deployed in a live environment. Firms must monitor their computer vision algorithms for drastic changes in performance. Performance declines need to be addressed quickly, either through the addition of more training examples, different hyperparameter tuning, or different network design.
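One simple way to watch for such declines is to track accuracy over a sliding window of recent predictions that a human has verified. The sketch below is one possible approach rather than a prescribed framework; the class name, window size, and alert threshold are all hypothetical.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent human-verified predictions."""

    def __init__(self, window_size, alert_threshold):
        # deque with maxlen drops the oldest result once the window is full.
        self.results = deque(maxlen=window_size)
        self.alert_threshold = alert_threshold

    def record(self, was_correct):
        self.results.append(bool(was_correct))

    def rolling_accuracy(self):
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def needs_attention(self):
        # Only alert once the window is full, to avoid noisy early alarms.
        if len(self.results) < self.results.maxlen:
            return False
        return self.rolling_accuracy() < self.alert_threshold


monitor = AccuracyMonitor(window_size=5, alert_threshold=0.8)
for outcome in [True, True, True, True, True, False, False]:
    monitor.record(outcome)

print(monitor.rolling_accuracy(), monitor.needs_attention())
```

When the monitor flags a decline, the team can investigate and respond with the remedies described above, such as adding training examples or retuning hyperparameters.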
The idea that machines will replace humans is a common worry among firms just dipping their toes into the world of AI. This philosophy is dangerous. It discourages a beneficial relationship between an AI system and its human operators.
The idea is not to replace humans, but to augment their powers of observation and prediction with the machine’s. The AI system informs the human, who in turn makes value judgments that an AI system cannot make due to its lack of contextual understanding. The human then returns and adjusts the AI in order to improve it. This cycle repeats ad infinitum until the AI system gets to a place where it performs with accuracy, reliability, and predictability in critical business functions.
How to build organizational trust when deploying machine learning models.
This continuous cycle of improvement is the pathway to trust in AI and the only viable path forward in tomorrow’s business environment.
Managing Cost in Computer Vision Projects
The history of commercialized computer vision implementations is plagued by stories of severe cost overruns and premature project failures. Prospective companies looking to build solutions can mitigate their financial risks through the following guidelines.
Find a reputable partner to help accelerate the first initiatives and control risk and cost. Valuable partners are those who specialize in deep learning and computer vision algorithm design. They should possess a multidisciplinary team spanning every portion of the machine learning project management process. These partners take the form of consulting companies, academic entities, or larger technology firms.
A good option might be to partner with reputable boutique AI-as-a-Service firms with demonstrated proficiency in your industry or use case. These firms can provide you the type of one-on-one relationship that nascent projects require for success.
The financial and business risks associated with AI projects are highest during the first few initiatives. The learning curve is steep, from project management requirements, financial investment, and technical skills investment, all the way through to cultural concerns and human resource management.
AI-as-a-Service companies can provide tactical support for companies in the early stages of their AI strategy by eliminating the need to invest in hiring the full landscape of talent needed to bring computer vision projects across the finish line. This strategy can accelerate opportunities for innovative companies looking to future-proof their businesses at a lower overall cost.
We are here to help.
Bridging the gap between an organization’s business need to define and execute a successful AI program and the ability to hire and maintain the staff needed to do so is where Dynam.AI brings the greatest value to our customers. Our mission is to deliver a competitive advantage to our customers by serving as their trusted Artificial Intelligence services partner. Our end-to-end service ranges from initial data science exploration through the development of advanced deep learning and computer vision algorithms. We specialize in rapid operational deployment and enterprise integration. Our team comprises data scientists, physicists, engineers, and business leaders with over 100 years of combined experience in AI development and implementation.
About the author
Dr. Michael Zeller has over 15 years of experience leading artificial intelligence and machine learning organizations through business expansion and technical success. Before joining Dynam.AI, Dr. Zeller led innovation in artificial intelligence for global software leader Software AG, where his vision was to help organizations deepen and accelerate insights from big data through the power of machine learning. Previously, he was CEO and co-founder of Zementis, a leading provider of software solutions for predictive analytics acquired by Software AG. Dr. Zeller is a member of the Executive Committee of ACM SIGKDD, the premier international organization for data science, and also serves on the Board of Directors of Tech San Diego. He is an advisory board member at Analytics Ventures, Dynam.AI’s founding venture studio.