Dataset Curation

Learn how to get labeling recommendations using Tensorleap's Dataset Curation functionality.

After evaluation has completed, Tensorleap's active learning workflow can be used to prioritize unlabeled data for labeling. This operation analyzes model representations and dataset structure to identify a diverse and informative subset of samples that expands coverage of the problem space and reduces redundancy. The selected data can then be reviewed and labeled to drive targeted model improvement.
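To build intuition for what selecting a diverse subset can mean, the sketch below uses greedy farthest-point selection over embedding vectors. This is an illustrative technique only and is not Tensorleap's actual selection algorithm, which this page does not describe. The function name `select_diverse` and the toy 2-D points are assumptions made for the example.

```python
import math

def select_diverse(labeled, unlabeled, k):
    """Greedy farthest-point selection (illustrative, not Tensorleap's method).

    Repeatedly picks the unlabeled point that is farthest from everything
    already covered (labeled points plus earlier picks). Returns the indices
    of the chosen points in `unlabeled`, in selection order.
    """
    if not unlabeled or k <= 0:
        return []
    # Distance from each unlabeled point to its nearest labeled point.
    min_dist = [
        min((math.dist(u, l) for l in labeled), default=math.inf)
        for u in unlabeled
    ]
    chosen = []
    for _ in range(min(k, len(unlabeled))):
        idx = max(range(len(unlabeled)), key=lambda i: min_dist[i])
        chosen.append(idx)
        # The new pick now "covers" its neighbours, so shrink their distances.
        for i, u in enumerate(unlabeled):
            min_dist[i] = min(min_dist[i], math.dist(u, unlabeled[idx]))
        min_dist[idx] = -1.0  # never pick the same point twice
    return chosen

# Toy 2-D "embeddings": one labeled point and four unlabeled candidates.
labeled = [(0.0, 0.0)]
unlabeled = [(0.1, 0.0), (5.0, 0.0), (5.1, 0.0), (0.0, 4.0)]
picked = select_diverse(labeled, unlabeled, k=2)
```

In this toy case the point next to the labeled sample and the near-duplicate of the first pick are both skipped, which is the redundancy-reducing behavior described above.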

In the Dataset Curation window, you can start a new curation process and review recommendations for any model that has been uploaded and evaluated.

To run the active learning pipeline:

  • Click the button at the top to open the Dataset Curation dialog.

  • Select the model to analyze.

  • Set the number of samples to recommend, or let Tensorleap automatically determine the optimal number.

  • (Optional) Apply dataset filters based on metadata or calculated metrics.

  • Click

Exploring Labeling Recommendations

Once the analysis completes, the recommendations appear in the Previous Recommendations section. These recommendations can be downloaded as a .csv file or applied directly as a filter to the Population Exploration dashlet.
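A downloaded .csv file can be inspected with a few lines of Python before it is handed to a labeling team. The sketch below is a minimal example that makes no assumption about the export's column names, since this page does not document them. The helper names `load_recommendations` and `describe` are hypothetical.

```python
import csv

def load_recommendations(path):
    """Read an exported recommendations .csv into a list of dicts, one per row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def describe(rows):
    """Return (row_count, column_names) so the export can be checked before use."""
    columns = list(rows[0].keys()) if rows else []
    return len(rows), columns
```

For example, `describe(load_recommendations("recommendations.csv"))` reports how many samples were recommended and which columns the export actually contains. The file name here is a placeholder.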

After applying the filter to the Population Exploration view, the selected scenes for labeling are displayed alongside labeled data and unselected samples from the unlabeled dataset. This view allows users to review and compare the selected data against both the existing training data distribution and the remaining unlabeled data.

Using the Color Filter dropdown in the Population Exploration view, points can be colored by labeling state or labeling importance, making it easy to distinguish between selected and unselected scenes:

  • Labeling state shows a categorical split into three groups: labeled samples, unlabeled samples that were not chosen for labeling, and unlabeled samples that were chosen for labeling.

  • Labeling importance shows an importance score for every sample chosen for labeling, ranging from 1 (most important) to 0 (least important).
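The two coloring modes above correspond to a simple data model: a categorical state per sample plus a score in [0, 1] for the chosen ones. The sketch below shows how such records could be summarized and ranked when planning a first labeling batch. The field names (`id`, `state`, `importance`) and the state values are hypothetical and are not taken from Tensorleap's export format.

```python
from collections import Counter

# Hypothetical records; field names and state values are illustrative only.
samples = [
    {"id": "a", "state": "labeled", "importance": None},
    {"id": "b", "state": "selected", "importance": 0.35},
    {"id": "c", "state": "selected", "importance": 0.90},
    {"id": "d", "state": "not_selected", "importance": None},
]

def summarize_states(records):
    """Count how many samples fall into each labeling state."""
    return Counter(r["state"] for r in records)

def rank_selected(records, selected_state="selected"):
    """Return the chosen samples, most important (score closest to 1) first."""
    chosen = [r for r in records if r["state"] == selected_state]
    return sorted(chosen, key=lambda r: r["importance"], reverse=True)

state_counts = summarize_states(samples)
labeling_order = [r["id"] for r in rank_selected(samples)]
```

Ranking by score in this way makes it straightforward to label the highest-importance samples first when the labeling budget is smaller than the full recommendation set.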

The animation below illustrates the full active learning workflow in Tensorleap, showing how dataset curation results are reviewed, explored, and used to guide efficient labeling decisions.
