Evaluate Unlabeled Data

This page describes how to estimate your model's performance on unlabeled data using predicted metrics.

Why Unlabeled Evaluation Matters

Many teams work with large unlabeled datasets — either as part of a data pipeline or when exploring new domains. But without labels, it's hard to assess whether that data contains edge cases, domain shifts, or problematic samples.

Tensorleap allows you to estimate model behavior on unlabeled data by computing Predicted Metrics: model-driven approximations of performance metrics such as loss, IoU, and confidence.
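Tensorleap computes these estimates internally, but the underlying intuition is approachable. As a minimal, illustrative sketch (not Tensorleap's actual method), the snippet below uses prediction entropy as a label-free proxy for loss; `predicted_loss_proxy` is a hypothetical helper name:

```python
import numpy as np

def predicted_loss_proxy(probs: np.ndarray) -> np.ndarray:
    """Entropy of softmax outputs as a label-free stand-in for loss.

    probs: shape (num_samples, num_classes), each row sums to 1.
    Higher entropy means the model is less certain, which tends to
    correlate with higher true loss on that sample.
    """
    eps = 1e-12  # guard against log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

# Illustrative usage with two mock predictions.
probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction -> low proxy loss
    [0.40, 0.35, 0.25],  # uncertain prediction -> high proxy loss
])
print(predicted_loss_proxy(probs))
```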

This enables:

  • Detecting potential errors or low-quality samples

  • Surfacing outliers and high-loss examples

  • Auditing incoming datasets before investing in labeling

How Tensorleap Helps Evaluate Unlabeled Data

Once a model has been evaluated, Tensorleap computes predicted metrics on any connected unlabeled set. These estimates are derived from the model's internal signals — giving you visibility into likely performance without requiring ground truth.
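"Internal signals" can take many forms. One widely used example (an illustrative assumption here, not necessarily what Tensorleap computes) is Monte Carlo dropout, where the spread of predictions across stochastic forward passes serves as an uncertainty estimate; `mc_dropout_uncertainty` is a hypothetical helper:

```python
import torch
import torch.nn as nn

def mc_dropout_uncertainty(model: nn.Module, x: torch.Tensor,
                           passes: int = 20) -> torch.Tensor:
    """Per-sample uncertainty from the spread of predictions across
    stochastic forward passes with dropout left enabled."""
    model.train()  # keep dropout active at inference time
    with torch.no_grad():
        preds = torch.stack([model(x).softmax(dim=-1) for _ in range(passes)])
    model.eval()
    # Variance across passes, averaged over classes: higher -> less reliable.
    return preds.var(dim=0).mean(dim=-1)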

You can:

  • Sort and filter samples based on predicted loss, IoU, or confidence (see the sketch after this list)

  • Visualize distributions to uncover outliers and edge cases

  • Flag suspicious or low-confidence samples for further review or labeling
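In Tensorleap this triage happens interactively in the UI. For intuition, the sketch below shows the equivalent logic in pandas, assuming you have exported per-sample predicted metrics to a table (the column names and threshold are hypothetical):

```python
import pandas as pd

# Hypothetical export of per-sample predicted metrics (column names assumed).
df = pd.DataFrame({
    "sample_id": ["a1", "a2", "a3", "a4"],
    "predicted_loss": [0.02, 1.35, 0.10, 0.88],
    "predicted_confidence": [0.97, 0.41, 0.90, 0.55],
})

# Surface the likeliest problem samples first.
worst = df.sort_values("predicted_loss", ascending=False)

# Flag low-confidence samples for review or labeling.
to_review = df[df["predicted_confidence"] < 0.6]
print(worst.head())
print(to_review["sample_id"].tolist())
```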

Figure: predicted loss for unlabeled samples in a NER dataset (yellow indicates higher predicted loss)

Evaluating Unlabeled Data Walkthrough

Coming Soon
