Semantic Segmentation
In this example, we demonstrate the use of Tensorleap on a Computer Vision task: Semantic Segmentation with COCO data. The COCO 14 data files were used for training and validation, combined with a MobileNetV2 backbone and a pix2pix-based decoder.
The model is trained to segment images into three categories: background, person and car. After training, the Mean IoU for person and car was 0.309 and 0.262 respectively.
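For reference, the per-class IoU behind the figures above can be sketched as follows. This is a minimal illustration of the standard intersection-over-union definition, not the platform's own metric code. The class ids (0 = background, 1 = person, 2 = car) and the toy masks are assumptions made for the example, and the exact averaging used to obtain the reported Mean IoU is not stated in this guide.

```python
import numpy as np

def per_class_iou(pred, target, num_classes):
    """Return the IoU of each class id for two integer label masks of equal shape."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            ious.append(float("nan"))  # class absent from both masks
        else:
            inter = np.logical_and(pred_c, target_c).sum()
            ious.append(inter / union)
    return np.array(ious)

# Assumed class ids: 0 = background, 1 = person, 2 = car (toy 2x3 masks).
target = np.array([[0, 1, 1],
                   [0, 2, 2]])
pred = np.array([[0, 1, 0],
                 [0, 2, 2]])

ious = per_class_iou(pred, target, num_classes=3)
print(ious)  # person IoU is 0.5 here: one of two person pixels was found
```

Averaging such per-class values over the validation samples is one common way to arrive at a single Mean IoU per class.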

Population Exploration

The Tensorleap platform tracks how each learned feature, within each layer, responds to each sample. From that information, it constructs a vector that captures how the model perceives each sample. This allows the platform to create a similarity map between samples as they are interpreted by the model. A more intuitive explanation would be that similar samples would activate similar learned features within the model.
The Population Exploration analysis generates a plot that represents the samples' similarity map based on the model's latent space, built using the extracted features from the trained model. An unsupervised clustering algorithm was used to define the colors of the dots. Moving the cursor along different areas reveals samples that have common features, for example:
Black & White Cluster
Ski Cluster
Surf Cluster
Tennis Cluster
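The idea of building a similarity map from per-sample feature vectors can be sketched in a few lines. The platform's actual projection and clustering algorithms are not described in this guide, so the sketch below substitutes PCA (via SVD) and plain k-means as stand-ins, and uses synthetic vectors in place of real model activations.

```python
import numpy as np

def pca_2d(features):
    """Project row vectors onto their top two principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for each point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Synthetic stand-in for per-sample feature vectors: two well-separated groups.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 0.1, size=(20, 16)),
                      rng.normal(5.0, 0.1, size=(20, 16))])

coords = pca_2d(features)     # 2-D positions for the scatter plot
labels = kmeans(coords, k=2)  # cluster index used to color each dot
```

Samples whose feature vectors are close end up near each other in `coords` and share a label, which is what makes clusters such as the ones listed above visible in the plot.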

Cluster Analysis

Tensorleap's Fetch Similars tool is used to return a cluster of samples that are similar to a chosen sample. Next, we will analyze a few clusters from different areas of the model's latent space (presented in the Population Exploration plot).
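Conceptually, fetching similar samples amounts to a nearest-neighbor lookup in the latent space. The sketch below uses cosine similarity for this. The function name `fetch_similar`, the choice of cosine similarity, and the toy embeddings are all assumptions for illustration and do not reflect the platform's API.

```python
import numpy as np

def fetch_similar(embeddings, query_index, top_k=5):
    """Return indices of the top_k samples most cosine-similar to the query sample."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sims = unit @ unit[query_index]
    sims[query_index] = -np.inf  # exclude the query itself
    return np.argsort(-sims)[:top_k]

# Toy 2-D embeddings; sample 2 points in a very different direction.
embeddings = np.array([[1.0, 0.0],
                       [0.9, 0.1],
                       [0.0, 1.0],
                       [0.8, 0.3]])

neighbors = fetch_similar(embeddings, query_index=0, top_k=2)
print(neighbors)  # indices of the two samples closest to sample 0
```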

Tennis Cluster

One of the clusters was a cluster of people playing tennis:
Tennis Cluster
The tool also provides heat-maps of the samples that highlight common features within the cluster. In this cluster, for example, one of the highlighted features is the tennis rackets:

B&W Cluster

As seen in the population exploration plot above, the platform detected a cluster of gray images. Running the Fetch Similars tool resulted in these images:
Fetch Similar Grayscale Cluster
From the fetched images we can see that this cluster indeed contains grayscale images, but also RGB images with little variation in hue. Tensorleap's cluster analysis shows that the vast majority of samples in the cluster are in fact RGB images rather than grayscale images (plotted as red dots below). Moreover, comparing the model's performance on grayscale vs RGB images shows that, on average, RGB images have a lower loss.
RGB vs Grayscale Loss Comparison
Grayed Cluster - RGB (Red) and Grayscale (Blue)
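A comparison like the one above needs a per-sample grayscale flag and a per-sample loss. A minimal sketch, assuming images are H x W x 3 arrays and treating an image as grayscale when its three channels are identical (the guide does not state which test was used; the loss values below are made up for illustration):

```python
import numpy as np

def is_grayscale(image, tol=0):
    """True if an H x W x 3 image has (nearly) identical R, G and B channels."""
    img = image.astype(np.int32)  # avoid uint8 wrap-around when subtracting
    return bool(np.abs(img[..., 0] - img[..., 1]).max() <= tol
                and np.abs(img[..., 1] - img[..., 2]).max() <= tol)

def mean_loss_by_group(images, losses):
    """Average the per-sample loss separately for grayscale and RGB images."""
    flags = np.array([is_grayscale(im) for im in images])
    losses = np.asarray(losses, dtype=float)
    return {
        "grayscale": losses[flags].mean() if flags.any() else float("nan"),
        "rgb": losses[~flags].mean() if (~flags).any() else float("nan"),
    }

# Two toy images: one with equal channels, one with a boosted red channel.
gray = np.full((4, 4, 3), 128, dtype=np.uint8)
color = gray.copy()
color[..., 0] = 200

# Illustrative loss values only, not results from the trained model.
summary = mean_loss_by_group([gray, color], losses=[0.9, 0.5])
print(summary)
```

Raising `tol` above zero would also catch the near-gray RGB images with little hue variation that appear in this cluster.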

'Vehicle-like' Clusters

Our model's latent space includes multiple semantically meaningful vehicle clusters, for example, the Motorcycles/Bicycles and Bus clusters:
Bus Cluster
We can see that the Bus cluster also includes quite a few buildings. Further examination of the heat-maps of samples in this cluster indeed revealed that the model attends not only to bus features but also to features of buildings and towers, as they look alike. For example:
Cluster-Defining Features Heat-Map
Motorcycles/Bicycles cluster:
Motorcycles/Bicycles Cluster
The features defining this cluster are mostly of the wheels and rider, as seen in these heat-maps:

Vehicle SuperCategory Model

Analyzing the model on Tensorleap revealed that the model had difficulty segmenting cars as a separate class from trucks and buses (which are labeled as background). One possible solution, reviewed in this section, is to segment the entire Vehicle SuperCategory together.
After training the model with the Vehicle SuperCategory, the Mean IoU for person and vehicle is 0.319 and 0.312 respectively (compared to 0.309 for person and 0.262 for car previously).
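Switching to a supercategory target is mostly a label-remapping step. The sketch below maps a per-pixel mask of COCO category ids to three training classes. In COCO, the vehicle supercategory covers category ids 2 to 9 (bicycle, car, motorcycle, airplane, bus, train, truck, boat) and person is id 1. Whether the authors merged all eight categories, and the target indices 0/1/2, are assumptions made for this example.

```python
import numpy as np

# COCO category ids in the "vehicle" supercategory:
# bicycle, car, motorcycle, airplane, bus, train, truck, boat.
VEHICLE_IDS = [2, 3, 4, 5, 6, 7, 8, 9]
PERSON_ID = 1

def to_training_mask(coco_mask):
    """Map COCO category ids to 0 = background, 1 = person, 2 = vehicle (assumed indices)."""
    out = np.zeros_like(coco_mask, dtype=np.uint8)
    out[coco_mask == PERSON_ID] = 1
    out[np.isin(coco_mask, VEHICLE_IDS)] = 2
    return out

# Toy mask: background, person, car / bus, truck, dog (id 18, becomes background).
coco_mask = np.array([[0, 1, 3],
                      [6, 8, 18]])

train_mask = to_training_mask(coco_mask)
print(train_mask)
```

With this mapping, buses and trucks no longer count as background, so the model is not penalized for grouping them with cars.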

Cluster Analysis

Running Fetch Similars on one of the vehicle samples now returns a more homogeneous cluster composed of cars and buses:
Vehicles Cluster
The attention maps below show that the model is able to find strong, discriminative features in the analyzed cluster, such as wheels. They also reveal a possible source of confusion: some round objects could be categorized as vehicles because of their similarity to wheels.
Vehicles Cluster Heat-Map

Performance and Metadata Analysis

The Tensorleap Dashboard enables us to analyze how the data is distributed across various features, and to identify trends and factors that might be correlated with the model's performance.
Tensorleap Dashboard
Car Percent and Person Percent are the share of pixels in a sample that are labeled as car or person, respectively. The histograms reveal a strong correlation between each of these variables and the loss. Zooming in on the Car Percent vs Loss histogram shows a significant increase in loss above ~50% Car Percent. One of the bars, at ~80% Car Percent, has a relatively high average loss:
High Loss Bar
Examining this bucket, we can see that it contains close-up photos of cars that present very few features, or car interiors.
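The metadata and histogram described in this section can be approximated offline. The sketch below computes a Car Percent value from a label mask and averages per-sample loss within Car Percent buckets. The class id 2 for car, the bucket edges, and the sample values are illustrative assumptions, not numbers from the dashboard.

```python
import numpy as np

def class_percent(mask, class_id):
    """Share of pixels (0 to 100) in a label mask that belong to class_id."""
    return 100.0 * np.mean(mask == class_id)

def mean_loss_per_bucket(values, losses, edges):
    """Average loss per bucket; bucket i spans edges[i] to edges[i + 1],
    and the first and last buckets also absorb any out-of-range values."""
    values = np.asarray(values, dtype=float)
    losses = np.asarray(losses, dtype=float)
    bucket = np.digitize(values, edges[1:-1])  # bucket index per sample
    return np.array([
        losses[bucket == i].mean() if np.any(bucket == i) else np.nan
        for i in range(len(edges) - 1)
    ])

# Toy mask with assumed class ids: 0 = background, 1 = person, 2 = car.
mask = np.array([[2, 2, 0],
                 [2, 1, 0]])
car_pct = class_percent(mask, class_id=2)

# Illustrative per-sample metadata and losses.
car_percent = [5, 30, 55, 80, 85]
losses = [0.2, 0.3, 0.6, 1.0, 1.2]
edges = [0, 25, 50, 75, 100]

bucket_means = mean_loss_per_bucket(car_percent, losses, edges)
print(car_pct, bucket_means)
```

A bar that stands out in `bucket_means` is the cue to open that bucket and look at its samples, as done above for the ~80% Car Percent bar.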

Metrics Filtering

All metrics dashboards in the Tensorleap platform are interactive and can be filtered.
For example, we can explore this bucket further by filtering the view to show only the samples that belong to it; the dashboard updates accordingly:
Interactive Filtering of High Loss Bar
Exploring the validation samples table, we can see that only 6 samples are in that bucket (marked by the red box below):
Filtered Dashboard
Examining the sample with the highest loss, we can see that it is a close-up of a car's front (as expected given the high Car Percent), and that two cats were mislabeled as car:
High Loss Sample
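The same drill-down, filtering to one bucket and then inspecting its highest-loss sample, can be sketched on a plain table of per-sample records. The record fields, ids, and values below are hypothetical and only illustrate the workflow.

```python
# Hypothetical per-sample records; field names and values are illustrative.
samples = [
    {"id": "s1", "car_percent": 12.0, "loss": 0.21},
    {"id": "s2", "car_percent": 78.5, "loss": 1.40},
    {"id": "s3", "car_percent": 81.0, "loss": 0.95},
    {"id": "s4", "car_percent": 46.0, "loss": 0.37},
]

def filter_bucket(records, key, low, high):
    """Keep records whose value for `key` lies in the half-open range [low, high)."""
    return [r for r in records if low <= r[key] < high]

# Restrict the view to the bucket around 80% Car Percent.
bucket = filter_bucket(samples, "car_percent", 75.0, 85.0)

# Rank the remaining samples by loss, highest first, for manual inspection.
ranked = sorted(bucket, key=lambda r: r["loss"], reverse=True)
worst = ranked[0]
print(len(bucket), worst["id"])
```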
Next we'll present a few additional samples that were detected in a similar way.

Sample Analysis

False and Ambiguous Labels

The Sample Analysis tool allows us to detect ambiguous labels and mislabeled images. These can later be considered for exclusion in order to improve performance.

Examples of Mislabeling

On the right is the model's prediction, which correctly segmented all three people, while the Ground Truth includes just one person:
Mislabeled Ground Truth
Model's Prediction
Another example is a person labeled as car:
Mislabeled Ground Truth
Model's Prediction

Inaccurate Labeling

An example of poor, inaccurate labeling: people in the background are only roughly labeled, and suitcases are labeled as car:
Inaccurate Labeling Example
Suitcases Labeled as Car

Ambiguous Labeling

In the examples below, people pictured in a magazine were labeled as person, and a toy was labeled as car.
Magazine as Person
Toy as Car

Challenging Images

Some samples in the dataset are very challenging for the model. For example, on the left is a crowded image of people, and on the right is a person in very low lighting conditions:
Crowd
Low Light


The Tensorleap platform provides powerful tools for analyzing and understanding deep learning models. This example presented only a few of the types of insights that can be gained using the platform.
For more information, see our additional Examples and Guides.