As a data scientist, one of the most important things you can do is label your data samples. This allows you to build models that are more accurate and can be applied to real-world data. However, with the vast amount of data out there, it can be tough to prioritize which samples to label.
Tensorleap constructs the model's most informative latent space and uses the model's learned features to help you prioritize, efficiently, which samples to label.
unlabeled_preprocessing_func (custom name) is a preprocessing function that is called just once, before the data is read, similar to the Preprocess Function. It prepares the data for later use in input encoders.
from code_loader import leap_binder
from code_loader.contract.datasetclasses import PreprocessingResponse
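# Preprocessing function for the unlabeled data.
# Assumes unlabeled_df (e.g. a DataFrame indexing the unlabeled samples) is defined earlier in the script.
def unlabeled_preprocessing_func() -> PreprocessingResponse:
    return PreprocessingResponse(length=len(unlabeled_df), data=unlabeled_df)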
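The snippet above assumes that unlabeled_df already exists. As a minimal sketch of one way to build such an index, assuming the unlabeled samples are image files stored under a local directory, the file paths can be collected into a list of records. The directory layout, the collect_unlabeled_samples helper, the IMAGE_SUFFIXES set, and the "path" field are all hypothetical and are not part of the Tensorleap API.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical set of file extensions treated as unlabeled image samples.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def collect_unlabeled_samples(root: str) -> List[Dict[str, str]]:
    """Return one record per image file found under `root`, sorted by path."""
    records = []
    # Sorting keeps the sample order (and therefore sample indices) stable between runs.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            records.append({"path": str(path)})
    return records
```

The resulting records could then be wrapped in a DataFrame and assigned to unlabeled_df, so that len(unlabeled_df) gives the length passed to PreprocessingResponse.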
To prioritize unlabeled data, choose a sample in the Population Exploration analysis that belongs to the cluster you are interested in, and request to fetch similar samples from the unlabeled data.
Fetch Similar from Unlabeled Data
Once the Fetch Similar process has finished, a similarity map of the found samples is presented. You can set the color and size of the dots to similarity in order to indicate which samples were found to be the most similar to the target sample.
Target Sample
Fetch Similar Results
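To illustrate the idea behind fetching similar samples, here is a minimal conceptual sketch that ranks unlabeled samples by how close their latent vectors are to a target sample's vector. It is not Tensorleap's implementation: the source does not state which similarity metric the platform uses, so cosine similarity is an assumption, and the function names, sample IDs, and toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    target: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Return the `top_k` candidate IDs with the highest similarity to `target`."""
    scored = [(sid, cosine_similarity(target, vec)) for sid, vec in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy latent vectors (hypothetical): one target sample and three unlabeled samples.
target = [1.0, 0.0]
unlabeled = {"u1": [0.9, 0.1], "u2": [0.0, 1.0], "u3": [-1.0, 0.0]}
ranked = rank_by_similarity(target, unlabeled, top_k=2)
```

In this toy example, u1 points in nearly the same direction as the target and ranks first, while u3 points the opposite way and is excluded from the top two. The similarity score plays the same role as the value used to color and size the dots in the similarity map.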