Unlabeled Data


Last updated 2 years ago


As a data scientist, one of the most important things you can do is label your data samples. This allows you to build models that are more accurate and can be applied to real-world data. However, with the vast amount of data out there, it can be tough to prioritize which samples to label.

Tensorleap constructs the model's most informative latent space and uses the model's learned features to help you prioritize, efficiently, which samples to label.

Integration Script

The unlabeled_preprocessing_func (custom name) is a preprocess function that is called just once, before the data is read, similar to the Preprocess Function. It prepares the unlabeled data for later use in the input encoders.

from code_loader import leap_binder
from code_loader.contract.datasetclasses import PreprocessResponse

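# Preprocessing function: called once, before any samples are read
def unlabeled_preprocessing_func() -> PreprocessResponse:
    ...  # load the unlabeled samples into unlabeled_df (elided here)
    return PreprocessResponse(length=len(unlabeled_df), data=unlabeled_df)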

leap_binder.set_unlabeled_data_preprocess(function=unlabeled_preprocessing_func)
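As a rough illustration of what the elided body might look like, the sketch below collects file paths from a directory and wraps them in a PreprocessResponse. The directory `data/unlabeled`, the helper `list_unlabeled_files`, and the choice of a plain list of paths as `data` are all assumptions made for this example, not part of the Tensorleap API. The fallback dataclass exists only so the sketch runs without code_loader installed, and mirrors just the two fields used on this page.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, List

try:
    from code_loader.contract.datasetclasses import PreprocessResponse
except ImportError:
    # Stand-in used only when code_loader is not installed.
    # It mirrors the two fields shown on this page, nothing more.
    @dataclass
    class PreprocessResponse:
        length: int
        data: Any

# Hypothetical location of the unlabeled samples.
UNLABELED_DIR = Path("data/unlabeled")


def list_unlabeled_files(root: Path) -> List[str]:
    """Return the sorted paths of all regular files directly under root."""
    if not root.is_dir():
        return []
    return sorted(str(p) for p in root.iterdir() if p.is_file())


def unlabeled_preprocessing_func() -> PreprocessResponse:
    # Called once: gather references to the samples, not the samples themselves.
    files = list_unlabeled_files(UNLABELED_DIR)
    return PreprocessResponse(length=len(files), data=files)
```

Registration is unchanged: pass the function to leap_binder.set_unlabeled_data_preprocess as shown above. Storing only references (here, paths) in `data` keeps the one-time preprocess step light and leaves the actual loading to the input encoders.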

This function returns a single PreprocessResponse object.

Fetch Similar

To prioritize unlabeled data, choose a sample within the Population Exploration analysis that corresponds to a desired cluster, and request to fetch similar samples from the unlabeled data.

Once the Fetch Similar process has finished, a similarity map of the found samples is presented. You can set the color and size of the dots according to similarity, to indicate which samples were found to be most similar to the target sample.
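Conceptually, fetching similar samples amounts to ranking unlabeled samples by how close their latent-space representations are to that of the target sample. The sketch below illustrates that idea with cosine similarity over plain Python lists. It is not Tensorleap's implementation: the function names, the choice of cosine similarity, and the toy vectors are assumptions made purely for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    target: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (candidate index, similarity) for the k most similar candidates."""
    scored = [(i, cosine_similarity(target, c)) for i, c in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy latent vectors standing in for model embeddings (hypothetical values).
target_embedding = [1.0, 0.0]
unlabeled_embeddings = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
top_matches = rank_by_similarity(target_embedding, unlabeled_embeddings, k=2)
```

The similarity scores in such a ranking play the same role as the color and size of the dots in the similarity map: higher values mark the unlabeled samples that are the strongest candidates for labeling.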

Fetch Similar from Unlabeled Data
Target Sample
Fetch Similar Results