Unlabeled Data
As a data scientist, one of the most important things you can do is label your data samples. This allows you to build models that are more accurate and can be applied to real-world data. However, with the vast amount of data out there, it can be tough to prioritize which samples to label.
Tensorleap constructs the model's most informative latent space from the features the model has learned, which lets you prioritize which samples to label efficiently.
The unlabeled_data_preprocessing_func (custom name) is a preprocess function that is called just once, before the data is read, similar to the Preprocess Function. It prepares the data for later use in the input encoders.
This function returns a single PreprocessResponse object.
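As an illustration of the shape such a function might take, here is a minimal sketch. It is not Tensorleap's implementation: the PreprocessResponse class below is a local stand-in whose fields (length, data) are assumptions, and list_unlabeled_files and input_encoder are hypothetical helpers added only to make the example runnable. Registering the function with Tensorleap is left out; see the product documentation for the actual call.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class PreprocessResponse:
    """Local stand-in for Tensorleap's PreprocessResponse (field names are assumed)."""
    length: int  # number of unlabeled samples
    data: Any    # whatever the input encoders will need later


def list_unlabeled_files() -> List[str]:
    """Hypothetical helper: returns the paths of the unlabeled samples."""
    return [f"unlabeled/img_{i:04d}.png" for i in range(5)]


def unlabeled_data_preprocessing_func() -> PreprocessResponse:
    # Called once, before any sample is read: collect sample references only.
    paths = list_unlabeled_files()
    return PreprocessResponse(length=len(paths), data={"paths": paths})


def input_encoder(idx: int, preprocess: PreprocessResponse) -> str:
    # Hypothetical encoder: a real one would load the file and return an array.
    return preprocess.data["paths"][idx]
```

Keeping the preprocess step limited to lightweight references (paths, IDs) means the heavy loading happens per sample inside the input encoders, which reuse the same response object.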
To prioritize unlabeled data, choose a sample in the Population Exploration analysis that belongs to the cluster you are interested in, and request to fetch similar samples from the unlabeled data.
Once the Fetch Similar process has finished, a similarity map of the samples found is presented. You can set the color and size of the dots to similarity to indicate which samples were found to be most similar to the target sample.
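To make the idea of ranking unlabeled samples by similarity to a target concrete, here is a small sketch. It assumes each sample is represented by a latent-space vector and uses cosine similarity as the measure; both are assumptions for illustration, since this page does not state which metric Tensorleap uses. The function names and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine of the angle between two vectors; 0.0 if either has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(
    target: Sequence[float],
    candidates: Sequence[Sequence[float]],
    top_k: int,
) -> List[Tuple[int, float]]:
    # Score every candidate against the target and keep the top_k most similar.
    scored = [(i, cosine_similarity(target, c)) for i, c in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy latent vectors: one target sample and three unlabeled samples.
target = [1.0, 0.0]
unlabeled = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
top = rank_by_similarity(target, unlabeled, top_k=2)
```

Each entry in top pairs a sample index with its score, which is the kind of value that the color and size of the dots in a similarity map can encode.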