Writing Integration Code

Integration Script - How to integrate your dataset and dataloaders with Tensorleap

The way to integrate a dataset into Tensorleap is by providing an Integration Script that instructs the platform on how to load, parse and visualize your data when evaluating and analyzing a model.

Integration Script

The Integration Script has the following structure:

Preprocess Function - prepares the data state for fetching into the neural network. Similar to a Pytorch Dataset.
Input Encoders - read and prepare each input for your neural networks. Similar to a __getitem__ that fetches the input.
Ground Truth Encoders - read and prepare each Ground Truth. Similar to a __getitem__ that fetches the ground truth.
Metadata Functions - add extra data to each sample for future analysis.
Visualizer Functions (optional) - An interface that instructs the Tensorleap platform on how to visualize your inputs and outputs.
Prediction - assign meaningful data to the prediction node(s).
Decorators - bind the functions above for Tensorleap to register the dataset structure.

Architecture

The Preprocess Function runs once, and returns a list of data objects of type PreprocessResponse that correspond to the training, validation, test, and unlabelled dataset slices.

The returned PreprocessResponse object is then passed to the Input Encoders, Ground Truth Encoders and Metadata Functions, whose function is to read and prepare the data for a single sample with index idx passed as an argument.

When processing the unlabelled set, the Ground Truth encoders and every metric or loss they're connected to is not called - since Ground Truth by definition is absent.

Finally, the Input Encoders, Ground Truth Encoders, must be declared using the Tensorleap decorators and add_prediction.

The Metadata Functions, Visualizer Functions, Metric Functions are not mandatory, but are recommended to ensure a complete analysis using the Platform

The script can be integrated into the Platform using the CLI.

Examples

Basic Usage

from typing import List, Union

# Tensorleap imports
from code_loader import leap_binder
from code_loader.contract.datasetclasses import PreprocessResponse
from code_loader.contract.visualizer_classes import LeapHorizontalBar
from code_loader.inner_leap_binder.leapbinder_decorators import *

# Preprocess Function:
@tensorleap_preprocess()
def preprocess_func() -> List[PreprocessResponse]:
...
    train = PreprocessResponse(length=len(train_X), data=train_df)
    val = PreprocessResponse(length=len(val_X), data=val_df)
    test = PreprocessResponse(length=len(test_X), data=test_df)

    return [train, val, test]

# Input Encoder(s):
@tensorleap_input_encoder('image')
def input_encoder(idx: int, preprocess: PreprocessResponse) -> np.ndarray:
    return preprocess.data.iloc[idx]['samples'].astype('float32')

# Ground Truth Encoder(s):
@tensorleap_gt_encoder('classes')
def gt_encoder(idx: int, preprocess: PreprocessResponse) -> np.ndarray:
    return preprocess.data.iloc[idx]['ground_truth'].astype('float32')

# Metadata Function(s):
@tensorleap_metadata(name='label')
def metadata_label(idx: int, preprocess: PreprocessResponse) -> Union[int, float, str, bool]:
    return preprocess.data.iloc[idx]['class_name']

@tensorleap_custom_visualizer(name="pet_visualizer", visualizer_type=LeapDataType.HorizontalBar)
def is_pet_visualizer(animal_prediction: np.ndarray) -> LeapHorizontalBar:
    np_labels = np.array(LABELS)
    pet_confidence = animal_prediction[np_labels == 'cat'][0] + animal_prediction[np_labels == 'dog'][0]
    body=np.array([pet_confidence, 1-pet_confidence])
    return LeapHorizontalBar(body=body, labels=['pet', 'not-pet')
    
# Dataset Binders:
LABELS = ['cat', 'dog', 'tiger', 'cow', 'goat', 'zebra', 'horse']
leap_binder.add_prediction(name='animal', labels=LABELS)

Retrieve the Secret

Tensorleap allows you to store sensitive information as a Secret in a secure location called Secret Manager.

The Integration Script has access to the Secret set for this Codebase. The secret is stored at the AUTH_SECRET environment variable, and can be accessed simply by using:

    import os
    auth_secret_string = os.environ['AUTH_SECRET']

Persistent Storage

Persistent storage is data storage that persists across different instances and reboots of job processes. In some cases there is a need to cache data. For example, after preprocessing or for very large files.

Cloud Platform

Tensorleap's cloud persistent storage can be accessed via writing and reading to the /nfs/ path:

    persistent_dir = '/nfs/'

NOTE: Mounted storage is set up only to serve as a cache, and it is regularly cleaned.

On Premise

In case you are running an on-premise solution, you can access your chosen mounted storage.

PreviousTensorleap Integration NextPreprocess Function

Last updated 28 days ago

Was this helpful?