The Problem With Computer Vision + How To Fix It

An image and a GIF side by side. The image on the left is of different versions of a Coca-Cola glass bottle. The GIF on the right is a digital 3D model of that same Coca-Cola bottle.
The cost and time requirements of using real data for Computer Vision far exceed those of Synthetic Data.

The Image Recognition Status Quo

Computer Vision (CV) has come a long way in the past few decades. From self-driving cars to Optical Character Recognition (OCR), it continues to transform the world around us.

Deriving meaningful information from digital images unlocks limitless automation potential. Yet, for a field of study that has held the attention of AI researchers since the 1960s, mainstream breakthroughs in Computer Vision have fallen short of the lofty advances that were promised.

As it stands, the true potential of Computer Vision is accessible only to a niche group of image recognition experts and machine learning specialists worldwide.

While the Teslas and Googles of the world can spend eye-watering budgets on their AI endeavours to develop next-level consumer products, a large majority of non-technical industries that are ripe for automation with the technology are held back by an unnecessary barrier to adoption: data.

A collage of hundreds of small images.
The collection and labelling of real data to train Computer Vision models is exceedingly onerous.

Attempts to democratise Computer Vision for widespread commercial use have been throttled, time and time again, by a failure to optimise its largest dependency: the sourcing and preparation of high-quality training data.

Most CV solutions on the market today rely on a process that requires massive amounts of real data as input, data that is costly and time-consuming to collect. This makes it impossible to adapt and scale these solutions to the demands of the masses, and unrealistic for most companies to consider them for domain-specific applications.

Simply put, traditional Computer Vision and image recognition technology will remain out of reach for the majority of companies until the data problem is solved.

Traffic on a road with bounding boxes around the vehicles.
The potential of Computer Vision is stifled by the intense data requirements of traditional CV model training.

The Devil is in the Data

Currently, it is estimated that only 1% of AI research is focused on the sourcing and preparation of data for AI models, while the other 99% is focused on AI model training and algorithm optimisation. This is despite the fact that the data preparation stage of traditional Computer Vision, which requires vast amounts of real data, takes up roughly 80% of a developer’s time, with only 20% spent on training the model itself.

This disconnect between where a developer spends their time versus where advances are being made presents a very big problem for the future of Computer Vision.

On the flip side, it presents a very big opportunity for those who are willing to innovate with a more sophisticated and capable approach.

Rather than approaching the problem with both hands tied behind your back (i.e. having a human painstakingly collect and label copious amounts of real data to train a Computer Vision model), take a step into the virtual world: generate that training data synthetically and experience a CV process that is faster, more cost-effective, and truly scalable.

An image of a supermarket aisle with a filter reminiscent of The Matrix applied to it. Alphabetic characters make up the structure of the image.
An image speaks a thousand words and many more thousands of data points.

Breaking the Mold

Synthetic Computer Vision is a groundbreaking approach to image recognition that is powered by Synthetic Data.

Synthetic Data is a virtual recreation of real world data that is used to train Synthetic Computer Vision models to detect real world objects.

For real-world object detection, Synthetic Data encompasses rendered images and videos of a 3D digital twin of a real-world object, together with the virtual scenes it is placed in. This data represents the attributes of the object as well as the environments in which it may be found in real life, and it is used to train Computer Vision models to detect that real-world object.

Using Synthetic Data to train Computer Vision models is known as Synthetic Computer Vision (SCV). Its use is leading to the widespread adoption, accessibility, and scalability of CV technology in ways that traditional CV with real data never could.

A diagram describing how Synthetic Computer Vision uses the power of digitisation and virtual reality to make impact in the real world.
Synthetic Computer Vision uses virtual copies of an object for training to enable real world automation.

SCV simplifies the input stage of image recognition. Instead of manually collecting and labelling thousands of individual data points for one object, you create a computer-generated object and scenery from which you can render vast numbers of images to train a CV model.

Synthetic Computer Vision provides multimodal metadata (2D/3D bounding boxes, depth data, masks, etc.) at virtually zero cost. With SCV, bounding boxes are created programmatically from the get-go, rather than through the long, manual labelling process associated with traditional Computer Vision.
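Because the renderer knows exactly where every vertex of a digital twin sits, a 2D bounding box can be derived with simple geometry instead of being drawn by a human. The sketch below is only an illustration of that idea, not the Neurolabs implementation. It assumes a pinhole camera with focal lengths `fx`, `fy` and principal point `cx`, `cy` (in pixels), and an axis-aligned box already expressed in camera coordinates (metres, with z pointing away from the camera). Real pipelines also handle object rotation, occlusion and clipping to the image borders, which are omitted here, and all function names are hypothetical.

```python
from itertools import product


def box_corners(center, size):
    """Return the 8 corners of an axis-aligned 3D box given its center and (w, h, d) size."""
    ox, oy, oz = center
    hw, hh, hd = (s / 2 for s in size)
    return [
        (ox + sx * hw, oy + sy * hh, oz + sz * hd)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-space point (metres) to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def bbox_2d(center, size, fx, fy, cx, cy):
    """Tight 2D box (x_min, y_min, x_max, y_max) around the projected 3D corners."""
    pixels = [project(p, fx, fy, cx, cy) for p in box_corners(center, size)]
    xs, ys = zip(*pixels)
    return (min(xs), min(ys), max(xs), max(ys))


# A 20 x 30 x 5 cm product box, 2 m in front of a 1280x720 camera.
box = bbox_2d(center=(0.0, 0.0, 2.0), size=(0.20, 0.30, 0.05),
              fx=800.0, fy=800.0, cx=640.0, cy=360.0)
print(box)
```

For this example the label comes out at roughly 81 × 122 pixels, centred on the principal point. The label is exact by construction, because it is computed from the same geometry the renderer uses.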

A virtual recreation of a supermarket shelf with products on it. The products have 3D bounding boxes around them.
Virtual recreations of supermarket products and the retail environment help automate real world shelf monitoring for retailers.

SCV is extremely robust, as it eliminates the human annotation errors that are typical of conventional CV methods.

It is also extremely flexible, as it captures real data variation using an easy-to-manipulate 3D digital object as the training data.

Synthetic Data not only benefits the initial stages of a CV workflow, it streamlines the entire CV process.

A virtual recreation of a box of Kellogg's Corn Flakes.
Creating digital twins of real world objects is the first step in Synthetic Computer Vision.

Synthetic Computer Vision in Action

1. Digital Twins

SCV always starts with high quality data. Step one is to create a digital twin of the real world object.

Take, for example, a supermarket product or Stock-Keeping Unit (SKU). In order to generate Synthetic Data for a product, we first create its virtual doppelgänger using its real world packaging in 3D modelling software.

An image of a product's packaging on the left with a GIF of that same product recreated in 3D in a virtual world on the right.
Creating digital twins from real world products unlocks the limitless automation potential of SCV.

Using Neurolabs, we can upload the digital twin of the product to the platform to be used along with thousands of other products to train a Computer Vision model for our chosen CV use case.

An image of the Neurolabs platform with digital supermarket products visible.
Any object can be digitised and used to train a Synthetic Computer Vision model to detect it.

With a digital twin in hand, we can use it to create a Synthetic Dataset.

2. Synthetic Datasets

Using the same software that we used to create the virtual 3D version of the product, we build virtual scenes, or digital replicas of real-world environments, in which the object can be placed. This helps create environmental context for the training stage.

Once we have our digital twin and virtual scenes, we can render as many images and videos, with infinite variations of the products and their environment, as we want. A collection of these rendered images and videos makes up the Synthetic Dataset which will be used to train the Computer Vision model.

As we are working with Synthetic Data, we aren’t limited by the data collection constraints of reality. We can simulate any position or condition for the product using its digital twin. We can also simulate whatever background we want using variations of the virtual scenes. In essence, the form and quantity of data is limitless.

There are three specialised techniques that the Neurolabs platform provides to generate Synthetic Datasets:

  1. Using Domain Randomisation
  2. Using Pre-Existing Scenes
  3. Using your own scenes, sourced from the 3D modelling software of your choice
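In general terms, Domain Randomisation means sampling scene parameters from broad distributions, so that a model trained on the renders learns to ignore nuisance factors and focus on the object itself. The document does not describe how the Neurolabs platform implements it. The sketch below only illustrates the idea by sampling the kinds of factors the benchmark later in this article mentions (lighting, camera location and object position). The field names and ranges are hypothetical, and the hand-off to an actual renderer is not shown.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    """One randomised scene configuration (hypothetical fields, illustrative units)."""
    light_intensity: float                 # arbitrary renderer units
    light_temperature_k: float             # colour temperature in kelvin
    camera_distance_m: float               # distance from camera to shelf
    camera_yaw_deg: float                  # horizontal viewing angle
    camera_pitch_deg: float                # vertical viewing angle
    object_offset_m: tuple[float, float]   # product shift along / across the shelf
    object_yaw_deg: float                  # product rotation about the vertical axis


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw every nuisance factor independently from a uniform range."""
    return SceneParams(
        light_intensity=rng.uniform(200.0, 1500.0),
        light_temperature_k=rng.uniform(2700.0, 6500.0),
        camera_distance_m=rng.uniform(0.8, 2.5),
        camera_yaw_deg=rng.uniform(-30.0, 30.0),
        camera_pitch_deg=rng.uniform(-15.0, 15.0),
        object_offset_m=(rng.uniform(-0.5, 0.5), rng.uniform(-0.05, 0.05)),
        object_yaw_deg=rng.uniform(-20.0, 20.0),
    )


def sample_dataset(n: int, seed: int = 0) -> list[SceneParams]:
    """Generate n scene configurations; the same seed reproduces the same list."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]


configs = sample_dataset(1000, seed=42)
print(len(configs), configs[0])
```

Seeding `random.Random` makes a dataset reproducible. The same seed yields the same 1,000 scene configurations, which helps when comparing models trained on different versions of a Synthetic Dataset.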

Machine Learning algorithms will create a diverse mix of data for the Synthetic Dataset automatically, cutting even more time from the process.

A digital recreation of a supermarket shelf with products on it.
Randomly generated Synthetic Data used in a Synthetic Dataset on the Neurolabs platform.

3. Model Training

Armed with a Synthetic Dataset, you can now use it as the training data to train a Computer Vision model to detect real products in the real world.

For example, you could use a Synthetic Dataset containing digital versions of supermarket products to help train a CV model to carry out Shelf Monitoring or Shelf Auditing in very different, real-world retail environments.

With the Neurolabs product, the training process is automated and easy to test on the platform.

An image of the Neurolabs platform with product detections being carried out. There is an image on the platform of a supermarket fridge and the products have bounding boxes around them.
Training a precise Computer Vision model is seamless with Synthetic Data.

Real World Deployment

The result is a fully trained Synthetic Computer Vision model that is as simple to deploy to a real world, production environment as making a call to an API endpoint.

Neurolabs makes the whole process simple by providing all of the data generation and SCV model training via our platform. The product then applies an iterative training process to improve the synthetic training data using the models themselves.

Using Synthetic Data in this way allows you to build an SCV solution that excels where conventional solutions are limited in many ways:

  1. Adaptability: The virtual nature of Synthetic Data makes it easy to transfer datasets and models between domains and CV use cases.
  2. Speed: A real-world deployment can be implemented in less than one week, saving you a ton of time and radically cutting costs.
  3. Scale: Easy access to image recognition datasets for over 100,000 SKUs through Neurolabs’ ReShelf product.
  4. Quality: Achieve 96% accuracy for SKU-level product recognition from day 1.

A GIF showcasing real-time shelf monitoring in a supermarket fridge. The products that are available, as well as the gaps where no products are available, have bounding boxes around them.
Using Synthetic Data, a Synthetic Computer Vision model can be deployed at speed to detect any real world object such as grocery store products.

Data Duel: Comparing the Use of Real Data versus Synthetic Data

Neurolabs has been deploying Synthetic Computer Vision models in the real world for two years now.

Our Machine Learning experts pitted a Computer Vision model trained using real data against one trained using Synthetic Data. Specifically, the test focused on the task of object localisation.

Real Data

For the real data, the team used SKU110K, an open-source dataset of mobile images from supermarkets released by Trax in 2019, and benchmarked the performance of a model pre-trained on it. This real dataset contains more than 10,000 manually acquired images, and the estimated cost of collating it is about $20,000. When tested on a new real dataset from a real-world grocery store, the model reached a Mean Average Precision (mAP) of 60%.
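Both models in this comparison are scored with mAP. The document does not say which mAP variant was used, so the following is only a minimal sketch of the core idea. A detection counts as a true positive when its Intersection over Union (IoU) with a not-yet-matched ground-truth box reaches a threshold (0.5 here). AP is the area under the resulting precision-recall curve, computed here with all-point interpolation as in PASCAL VOC, and mAP is the mean AP over classes. The function names and the toy data are illustrative.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """AP for a single class.

    detections:   list of (image_id, score, box)
    ground_truth: dict mapping image_id -> list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    precisions, recalls = [], []
    tp = fp = 0
    # Visit detections from most to least confident.
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_j, best_iou = -1, iou_threshold
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if not matched[img][j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[img][best_j] = True  # each ground-truth box can be matched once
            tp += 1
        else:
            fp += 1  # no unmatched box above the threshold, e.g. a duplicate
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # All-point interpolation: precision at recall r becomes the max precision at any later point.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class, iou_threshold=0.5):
    """mAP: mean of per-class AP. per_class maps class -> (detections, ground_truth)."""
    aps = [average_precision(d, g, iou_threshold) for d, g in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


gt = {"shelf_01": [(0, 0, 10, 10), (20, 20, 30, 30)]}
dets = [("shelf_01", 0.9, (0, 0, 10, 10)),    # true positive
        ("shelf_01", 0.8, (50, 50, 60, 60)),  # false positive
        ("shelf_01", 0.7, (20, 20, 30, 30))]  # true positive
print(mean_average_precision({"product": (dets, gt)}))
```

In the toy example the first and third detections match the two ground-truth boxes and the second is a false positive. That gives AP = 0.5 · 1 + 0.5 · 2/3 = 5/6, or about 0.83.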

A series of supermarket product and shelf images from a real dataset.
A sample of supermarket product and shelf images from the SKU110K real dataset.

Synthetic Data

After generating 1,000 synthetic images using Neurolabs’ Synthetic Data generator, the team observed an mAP of 65% when testing on the real data from the real-world supermarket. The team randomised the lighting, camera locations and object positions, and used physical simulations to create more realistic structure in the datasets. Compared with the real-data result, this is a 5-percentage-point increase in mAP (from 60% to 65%). It also resulted in a 100x decrease in the cost and time associated with the deployment, making the solution much more scalable.

A series of supermarket product and shelf images from a synthetic dataset.
A sample of the rendered images from the Synthetic Dataset that was used.

Conclusion

Using Synthetic Data proved to be the superior option when deploying Computer Vision in a real-world environment. Not only did the mAP improve, but the cost and time involved in the project were radically reduced.

Furthermore, applying Neurolabs’ data mixing and domain adaptation techniques increased the model’s mAP to 80%. This was done using a mix of Synthetic and real data at a ratio of 100:1, i.e. 1,000 synthetic images with 10 annotated real images.
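The document does not describe how Neurolabs’ data mixing works. One common, generic way to combine a large synthetic set with a handful of real images is to build training batches that each contain a fixed number of real samples, reusing the small real set as often as needed. The sketch below assumes that samples are simple identifiers such as file paths. `mixed_batches` and its parameters are hypothetical, and the approach is a stand-in for illustration, not the technique behind the 80% result.

```python
import random


def mixed_batches(synthetic, real, batch_size, real_per_batch, seed=0):
    """Yield shuffled batches of `real_per_batch` real and the rest synthetic samples.

    Each synthetic sample appears once per epoch; the small real set is
    reshuffled and reused whenever it runs out.
    """
    if not 0 <= real_per_batch < batch_size:
        raise ValueError("need 0 <= real_per_batch < batch_size")
    if real_per_batch > 0 and not real:
        raise ValueError("real samples requested but none were given")
    rng = random.Random(seed)
    syn = list(synthetic)
    rng.shuffle(syn)
    syn_per_batch = batch_size - real_per_batch
    real_pool = []
    for start in range(0, len(syn), syn_per_batch):
        batch = syn[start:start + syn_per_batch]
        for _ in range(real_per_batch):
            if not real_pool:  # refill from the real set once it is exhausted
                real_pool = list(real)
                rng.shuffle(real_pool)
            batch.append(real_pool.pop())
        rng.shuffle(batch)
        yield batch


synthetic = [f"syn_{i:04d}.png" for i in range(1000)]
real = [f"real_{i:02d}.png" for i in range(10)]

# 100 synthetic + 1 real per batch reproduces the 100:1 ratio from the text.
batches = list(mixed_batches(synthetic, real, batch_size=101, real_per_batch=1))
print(len(batches), sum(name.startswith("real") for b in batches for name in b))
```

With `batch_size=101` and `real_per_batch=1`, each real image is used exactly once per epoch, matching the 100:1 mix. A smaller batch (say 32 with one real image each) would instead replay every real image several times per epoch, raising its effective weight. Which balance works best is an empirical question that the document does not address.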

A virtual recreation of a packet of Doritos.
Radical improvements to image recognition results start with the underlying training data.

A Virtual Future

Commercial application of Computer Vision will continue to grow in the coming years. The smart use of the technology will become a necessity for any company that wishes to effectively automate visual-based tasks.

The most forward-thinking companies understand the value of investing in the right image recognition tech stack. Innovating in this area will not only create organisation-wide process efficiencies but also create a competitive advantage for the organisations that deploy it.

Synthetic Data is the future for a truly scalable and easily deployable Computer Vision solution.

Synthetic Computer Vision democratises automation potential that should not be reserved for an elite technical few but instead should be readily available to the masses to positively impact the world.

A GIF on the left of a virtual recreation of a supermarket fridge with products in it juxtaposed with a GIF of a real supermarket fridge with real-time product detections being carried out. The products in the real supermarket fridge have bounding boxes around them.
Synthetic Data enables Computer Vision technology to scale to an unprecedented degree.

Written by Luke Hallinan, Product Marketing Manager at Neurolabs, and Patric Fulop, Co-Founder & CTO at Neurolabs.

Retailers worldwide lose a mind-blowing $634 billion annually due to the cost of poor inventory management, with 5% of all sales lost to Out-Of-Stocks alone. 🤯

Neurolabs helps optimise in-store retail execution for supermarkets and CPG brands using a powerful combination of Computer Vision and Synthetic Data, called Synthetic Computer Vision, improving customer experience and increasing revenue. 🤖 🛒
