Integration Script
Add a dataset instance prior to building a model
Within the Tensorleap platform, a Dataset contains the Integration Script and properties for reading and encoding the data that will later be used when training, evaluating and analyzing a model.
Additionally, store sensitive information and credentials securely with Secret Manager.

Integration Script

In order for the Tensorleap platform to read and encode the data, it must be supplied with an Integration Script. The script is stored within the Dataset Instance, which the Dataset Block points to, and supplies the input and ground truth encoders.
The Dataset Script has the structure described below:

Architecture

The Preprocess Function runs once and returns a list of data objects of type PreprocessResponse that correspond to the training, validation, and test dataset slices.
The returned PreprocessResponse object is then passed to the Input Encoders, Ground Truth Encoders, and Metadata Functions, which read and prepare the data for a single sample whose index idx is passed as an argument.
Finally, the Input Encoders, Ground Truth Encoders, and Metadata Functions must be bound to the Tensorleap platform. This is done through the leap_binder object, using set_preprocess, set_input, set_ground_truth, set_metadata, and add_prediction_type.
The script can be integrated into the Dataset using either the UI or CLI.
In addition, there is an option to add custom tensor Decoder Functions for more informative visualizations and analysis. The leap_binder object then sets these functions using set_decoder.

Examples

Basic Usage

from typing import List, Union

import numpy as np

# Tensorleap imports
from code_loader import leap_binder
from code_loader.contract.datasetclasses import PreprocessResponse
from code_loader.contract.enums import DatasetMetadataType, Metric
from code_loader.contract.decoder_classes import LeapHorizontalBar

# Preprocess Function:
def preprocess_func() -> List[PreprocessResponse]:
    ...  # load and split the data (elided)
    train = PreprocessResponse(length=len(train_X), data=train_df)
    val = PreprocessResponse(length=len(val_X), data=val_df)
    test = PreprocessResponse(length=len(test_X), data=test_df)

    return [train, val, test]

# Input Encoder(s):
def input_encoder(idx: int, preprocess: PreprocessResponse) -> np.ndarray:
    return preprocess.data.iloc[idx]['samples'].astype('float32')

# Ground Truth Encoder(s):
def gt_encoder(idx: int, preprocess: Union[PreprocessResponse, list]) -> np.ndarray:
    return preprocess.data.iloc[idx]['ground_truth'].astype('float32')

# Metadata Function(s):
def metadata_label(idx: int, preprocess: Union[PreprocessResponse, list]) -> Union[int, float, str, bool]:
    return preprocess.data.iloc[idx]['class_name']

# Dataset Binders:
LABELS = ['cat', 'dog', 'tiger', 'cow', 'goat', 'zebra', 'horse']
leap_binder.set_preprocess(function=preprocess_func)
leap_binder.set_input(function=input_encoder, input_name='image')
leap_binder.set_ground_truth(function=gt_encoder, gt_name='classes')
leap_binder.set_metadata(function=metadata_label, metadata_type=DatasetMetadataType.string, name='label')
leap_binder.add_prediction_type(name='animal', labels=LABELS, metrics=[Metric.Accuracy])

# Decoders:
def is_pet_decoder(animal_prediction: np.ndarray) -> LeapHorizontalBar:
    np_labels = np.array(LABELS)
    pet_confidence = animal_prediction[np_labels == 'cat'][0] + animal_prediction[np_labels == 'dog'][0]
    body = np.array([pet_confidence, 1 - pet_confidence])
    return LeapHorizontalBar(body=body, labels=['pet', 'not-pet'])

leap_binder.set_decoder(
    name='is_pet',
    function=is_pet_decoder,
    decoder_type=LeapHorizontalBar.type
)

Full Examples

Full examples can be found in the Dataset Integration section of the tutorial guides.

Retrieve the Secret

Tensorleap allows you to store sensitive information as a Secret in a secure location called Secret Manager.
The Dataset Script has access to the Secret set for the Dataset Instance. The secret is stored in the AUTH_SECRET environment variable and can be accessed simply by using:
import os
auth_secret_string = os.environ['AUTH_SECRET']
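How the secret is used depends on what it holds. As a minimal sketch, assuming the secret contains a Google Cloud service-account key in JSON form (the bucket name below is hypothetical), it can authenticate a cloud storage client:

import json
import os

from google.cloud import storage  # requires the google-cloud-storage package

# Assumption: AUTH_SECRET holds a service-account key in JSON form
service_account_info = json.loads(os.environ['AUTH_SECRET'])
client = storage.Client.from_service_account_info(service_account_info)
bucket = client.bucket('my-dataset-bucket')  # hypothetical bucket name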

Persistent Storage

Persistent storage is data storage that persists across different instances and reboots of job processes. In some cases there is a need to cache data, for example, preprocessed artifacts or very large files.

Cloud Platform

Tensorleap's cloud persistent storage can be accessed by reading from and writing to the /nfs/ path:
persistent_dir = '/nfs/'
NOTE: Mounted storage is set up only to serve as a cache, and it is regularly cleaned.
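For example, here is a minimal caching sketch, assuming a hypothetical expensive_preprocessing() step and file name; it computes a preprocessed artifact once and reuses it on later runs, recomputing whenever the cache has been cleaned:

import os

import numpy as np

persistent_dir = '/nfs/'
cache_path = os.path.join(persistent_dir, 'preprocessed_samples.npy')  # hypothetical file name

def load_samples() -> np.ndarray:
    # Reuse the cached artifact if a previous job already produced it
    if os.path.exists(cache_path):
        return np.load(cache_path)
    # Otherwise compute it once and cache it for subsequent jobs
    samples = expensive_preprocessing()  # hypothetical preprocessing step
    np.save(cache_path, samples)
    return samples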

On Premise

If you are running an on-premise deployment, you can access your chosen mounted storage.
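The caching pattern shown above applies unchanged; only the base path points to your mount (the path below is a placeholder):

persistent_dir = '/mnt/persistent-storage/'  # placeholder: replace with your mount point
cache_path = os.path.join(persistent_dir, 'preprocessed_samples.npy')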