The Power of Synthetic Data — Part 2

Moving closer to the real world

Introduction

This is the second in a series of blog posts in which we explore the power of synthetically generated datasets as a basis for object detection in real environments. The first post illustrates the power of randomising the parameters of a synthetic scene to generate robust synthetic datasets and validates this representational power on an object recognition task in a simple environment.

In this blog post, we incrementally increase the complexity of our problem and explore how well our synthetically generated datasets scale to more difficult object detection environments.

Real Dataset

Compared to the dataset used in the first post, the new real dataset introduces two sources of difficulty:

  • Occlusions between objects (each occlusion covers at most 25% of an object)
  • Three different instances of the same class (e.g. 3 different bananas) instead of a single instance per class

The images have the same camera view and the same background as the first dataset. The classes remain the same: Orange, Banana, Red Apples, Green Apples, Bun Plain, Bun Cereal, Croissant, Broccoli, Snickers, Bounty.

Synthetic Dataset

Compared to the first post, the synthetic data generation has been extended as follows (a rough sketch of these randomisation ranges is shown after the list):

  • Using three 3D assets for each class
  • Increasing the allowed overlap between objects in an image (at most 25% of an object occluded)
  • Extending the number of objects per image from [2, 4] to [2, 6]
  • Changing the scaling range from [1x, 2x] to [0.75x, 1.75x], to accommodate the larger number of objects in an image
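As a rough sketch, the randomisation ranges above could be expressed as a small sampling routine. The parameter names and the plain-Python sampling below are our own illustration, not the actual generation pipeline.

import random

# Ranges taken from the bullet points above; names and sampling logic
# are illustrative, not the actual generation pipeline.
NUM_OBJECTS_RANGE = (2, 6)    # objects per synthetic image
SCALE_RANGE = (0.75, 1.75)    # uniform object scaling factor
MAX_OCCLUSION = 0.25          # at most 25% of an object may be occluded
ASSETS_PER_CLASS = 3          # three 3D assets per class

def sample_scene_parameters(class_names):
    """Draw one randomised set of scene parameters."""
    objects = []
    for _ in range(random.randint(*NUM_OBJECTS_RANGE)):
        objects.append({
            "class": random.choice(class_names),
            "asset_id": random.randrange(ASSETS_PER_CLASS),
            "scale": random.uniform(*SCALE_RANGE),
        })
    return {"objects": objects, "max_occlusion": MAX_OCCLUSION}

A renderer would then place the sampled objects in the scene and reject layouts where any object ends up more than 25% occluded.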

Synthetic vs. Real

Figure 1. Real vs. synthetic data side-by-side comparison.

Experiments

We use two architectures, a one-stage detector (EfficientDet) and a two-stage detector (Faster R-CNN), to show the robustness of our results. Furthermore, we have not tailored either network to generalise better to the kind of data distribution our synthetic pipeline produces, because we aim to demonstrate the general usability of synthetic datasets regardless of the detection technique used.
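The post does not include training code, but as a hedged sketch of the two-stage setup, fine-tuning Faster R-CNN on our ten classes with torchvision could look roughly like the snippet below (EfficientDet would be configured analogously with its own library). The hyperparameters shown are placeholders, not the ones used in our experiments.

import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 10 + 1  # our 10 product classes plus background

# Start from a COCO-pretrained Faster R-CNN and swap the box predictor
# so it outputs our grocery classes instead of the COCO classes.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# The same optimiser configuration is reused for the real, synthetic and
# mixed runs so that the comparison between them stays fair.
optimizer = torch.optim.SGD(model.parameters(), lr=0.005,
                            momentum=0.9, weight_decay=5e-4)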

Table 1. Experimental results

The experiments in Table 1 use the same hyperparameter configuration for all runs trained with the same model, to ensure a fair comparison between them. For the synthetic data, we average results over 3 runs of data generation.

We can see a considerable difference between the real-only (green) and synthetic-only (red) experiments. This can be attributed to the domain gap between our synthetic data and real environments. The variations of the real environment (natural light variation, object overlap, object instance characteristics, etc.) add layers of difficulty that models trained only on our synthetic data struggle to represent. With similar amounts of data, the real experiments perform better.

The same cannot be said about our mixed-dataset experiments. Adding even a small fraction of real data (as little as 50 images) to the synthetically generated training data narrows the performance gap considerably.

Adding 50 real images to the synthetic dataset gives a boost in performance of 0.287 mAP on our EfficientDet experiments and 0.144 mAP on our Faster R-CNN experiments. Returns diminish as we increase the number of real images: adding 100 real images gives a further increase of only 0.047 mAP and 0.019 mAP respectively over the 50-image experiments.

A mix of 200 real images and 900 synthetic images falls only 0.041 mAP short of the real-only experiment for EfficientDet, while using just 27.21% of the real images available in the real dataset (200 of 735). With Faster R-CNN, the same mix of 200 real images is only 0.014 mAP behind the real-only experiment.
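One straightforward way to build such mixed training sets, assuming PyTorch-style datasets, is to concatenate a small random subset of the real images with the full synthetic set; the helper below is our own illustration rather than the exact pipeline used here.

import random
from torch.utils.data import ConcatDataset, Subset

def make_mixed_dataset(real_dataset, synthetic_dataset, num_real, seed=0):
    """Mix a small random subset of real images with all synthetic images.

    num_real = 50, 100 or 200 corresponds to the mixed experiments above.
    """
    rng = random.Random(seed)
    real_indices = rng.sample(range(len(real_dataset)), num_real)
    return ConcatDataset([Subset(real_dataset, real_indices), synthetic_dataset])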

Figure 2. Validation losses across experiments.

The plots in Figure 2 present the evolution of the validation loss during training with the different datasets. The same validation set, randomly sampled from the real data, has been used across all runs. We use the evolution of the loss to sanity-check the learning process.

The validation loss for training only on synthetic data is the clear outlier in this plot. This is due to the difficulty of fitting the particularities of real data when the model has only seen synthetic data: the two are not drawn from the same distribution.

Top losses

We rank the validation images by their loss for the synthetic-only and mixed-200 experiments. As shown in both images in Figure 3, the hardest images to learn are those with occlusions between objects. Lighting is another factor that challenges model training, as is object texture when it is similar to the background colour.
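As a sketch of how such a top-loss analysis can be run with a torchvision-style detector (which returns a dict of loss components when called in training mode with targets), one can rank the validation images by their summed loss. This is an illustration of the idea rather than the exact code behind Figure 3, and it assumes each target dict carries an "image_id" tensor.

import torch

@torch.no_grad()
def top_loss_images(model, val_loader, device, k=10):
    """Return the k validation samples with the highest total detection loss."""
    model.train()  # torchvision detection models only return losses in train mode
    ranked = []
    for images, targets in val_loader:  # assumes batch_size=1 for per-image losses
        images = [img.to(device) for img in images]
        targets = [{key: v.to(device) for key, v in t.items()} for t in targets]
        loss_dict = model(images, targets)
        ranked.append((sum(loss_dict.values()).item(), targets[0]["image_id"].item()))
    return sorted(ranked, reverse=True)[:k]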

Figure 3. Top-loss validation images. Left: synthetic-only experiment; right: mixed-200 experiment.

Conclusions

We highlight the strong potential of synthetic data, slowly moving away from the lab-like environment and peeking into the real world. In our next blog posts, we plan to move the problem to a generic real-world environment and further increase the complexity of our synthetic data.

Written by Daniela Palcu, Flaviu Samarghitan & Patric Fulop, Computer Vision Team @Neurolabs

Sign up for the Alpha version of our Synthetic Generation and Computer Vision platform!

Using the power of synthetic data to democratise Computer Vision.
