Integration Script

Add a dataset instance prior to building a model


Last updated 2 years ago


Within the Tensorleap platform, a Dataset Instance contains the Integration Script and properties for reading and encoding the data that will later be used when training, evaluating, and analyzing a model.

Additionally, you can store sensitive information and credentials securely with the Secret Manager.

Integration Script

In order for the Tensorleap platform to read and encode the data, it must be supplied with an Integration Script. The Integration Script is stored within the Dataset Instance, which the Dataset Block points to, and provides the input/ground truth encoders.

The Integration Script has the following structure:

  • Preprocess Function - prepares the data state for fetching into the neural network.

  • Input Encoders - read and prepare each input for your neural networks.

  • Ground Truth Encoders - read and prepare each output.

  • Metadata Functions - add extra data to each sample for future analysis.

  • Visualizer Functions (optional) - custom interpretation of tensors for analysis and visualizations.

  • Prediction - assigns meaningful data to the prediction node(s).

  • Binding Functions - bind the functions above so that Tensorleap can register the dataset structure.

Architecture

The Preprocess Function runs once and returns a list of data objects of type PreprocessResponse that correspond to the training, validation, and test dataset slices.

Each returned object is then passed to the Input Encoders, Ground Truth Encoders, and Metadata Functions, each of which reads and prepares the data for a single sample whose index idx is passed as an argument.

Finally, the Preprocess Function, Input Encoders, Ground Truth Encoders, and Metadata Functions must be bound to the Tensorleap platform. This is done by the leap_binder object with set_preprocess, set_input, set_ground_truth, set_metadata, and add_prediction.

Examples

Basic Usage

from typing import List, Union

import numpy as np

# Tensorleap imports
from code_loader import leap_binder
from code_loader.contract.datasetclasses import PreprocessResponse
from code_loader.contract.enums import DatasetMetadataType, Metric
from code_loader.contract.visualizer_classes import LeapHorizontalBar

# Preprocess Function:
def preprocess_func() -> List[PreprocessResponse]:
    ...  # load the data and build the train/val/test splits (train_X, train_df, etc.)
    train = PreprocessResponse(length=len(train_X), data=train_df)
    val = PreprocessResponse(length=len(val_X), data=val_df)
    test = PreprocessResponse(length=len(test_X), data=test_df)

    return [train, val, test]

# Input Encoder(s):
def input_encoder(idx: int, preprocess: PreprocessResponse) -> np.ndarray:
    return preprocess.data.iloc[idx]['samples'].astype('float32')

# Ground Truth Encoder(s):
def gt_encoder(idx: int, preprocess: Union[PreprocessResponse, list]) -> np.ndarray:
    return preprocess.data.iloc[idx]['ground_truth'].astype('float32')

# Metadata Function(s):
def metadata_label(idx: int, preprocess: Union[PreprocessResponse, list]) -> Union[int, float, str, bool]:
    return preprocess.data.iloc[idx]['class_name']

# Dataset Binders:
LABELS = ['cat', 'dog', 'tiger', 'cow', 'goat', 'zebra', 'horse']
leap_binder.set_preprocess(function=preprocess_func)
leap_binder.set_input(function=input_encoder, input_name='image')
leap_binder.set_ground_truth(function=gt_encoder, gt_name='classes')
leap_binder.set_metadata(function=metadata_label, metadata_type=DatasetMetadataType.string, name='label')
leap_binder.add_prediction(name='animal', labels=LABELS, metrics=[Metric.Accuracy])

# Visualizers
def is_pet_visualizer(animal_prediction: np.ndarray) -> LeapHorizontalBar:
    np_labels = np.array(LABELS)
    pet_confidence = animal_prediction[np_labels == 'cat'][0] + animal_prediction[np_labels == 'dog'][0]
    body = np.array([pet_confidence, 1 - pet_confidence])
    return LeapHorizontalBar(body=body, labels=['pet', 'not-pet'])
    
leap_binder.set_visualizer(
    name='is_pet',
    function=is_pet_visualizer,
    visualizer_type=LeapHorizontalBar.type
)
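Before integrating the script into the platform, it can help to sanity-check the encoders locally. The sketch below is illustrative only: it stands in for code_loader's PreprocessResponse with a minimal namedtuple, and the toy DataFrame, shapes, and class names are all hypothetical.

```python
from collections import namedtuple

import numpy as np
import pandas as pd

# Hypothetical stand-in for code_loader's PreprocessResponse, for local testing only.
PreprocessResponse = namedtuple('PreprocessResponse', ['length', 'data'])

# A toy dataset: two samples with an input vector, a one-hot ground truth, and a label.
train_df = pd.DataFrame({
    'samples': [np.zeros(4), np.ones(4)],
    'ground_truth': [np.array([1, 0]), np.array([0, 1])],
    'class_name': ['cat', 'dog'],
})
train = PreprocessResponse(length=len(train_df), data=train_df)

def input_encoder(idx, preprocess):
    return preprocess.data.iloc[idx]['samples'].astype('float32')

def gt_encoder(idx, preprocess):
    return preprocess.data.iloc[idx]['ground_truth'].astype('float32')

# Check that every sample encodes to the expected dtype and shape.
for idx in range(train.length):
    x, y = input_encoder(idx, train), gt_encoder(idx, train)
    assert x.dtype == np.float32 and x.shape == (4,)
    assert y.dtype == np.float32 and y.shape == (2,)
```

Catching a dtype or shape mismatch this way is faster than discovering it after uploading the script.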

Full Examples

Full examples can be found in the Dataset Integration section of the following guides:

  • MNIST Guide

  • IMDB Guide

Retrieve the Secret

    import os
    auth_secret_string = os.environ['AUTH_SECRET']
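If the Secret holds structured credentials, it can be parsed after retrieval. A sketch, assuming the Secret was stored as a JSON string; the format and the keys below are hypothetical, as Tensorleap does not mandate any particular encoding:

```python
import json
import os

# For local illustration only: on the platform, AUTH_SECRET is injected by the
# Secret Manager, and setdefault leaves a real value untouched. The JSON format
# and key names here are hypothetical.
os.environ.setdefault('AUTH_SECRET', '{"bucket": "my-data", "token": "abc123"}')

auth_secret_string = os.environ['AUTH_SECRET']
credentials = json.loads(auth_secret_string)
bucket_name = credentials['bucket']  # use the parsed fields to access your data source
```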

Persistent Storage

Persistent storage is data storage that persists across different instances and reboots of job processes. In some cases there is a need to cache data, for example after preprocessing or when working with very large files.

Cloud Platform

Tensorleap's cloud persistent storage can be accessed by writing to and reading from the /nfs/ path:

    persistent_dir = '/nfs/'

NOTE: Mounted storage is set up only to serve as a cache, and it is regularly cleaned.

On Premise

In case you are running an on-premise solution, you can access your chosen mounted storage.
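A typical use of the mounted path is caching the result of an expensive preprocessing step. A sketch, assuming a simple compute-on-miss pattern; the file name, the stand-in preprocessing, and the local fallback directory (used when /nfs/ is not mounted, e.g. outside the platform) are all hypothetical:

```python
import os

import numpy as np

# Hypothetical fallback: use a local directory when the /nfs/ mount is absent.
persistent_dir = '/nfs/' if os.path.isdir('/nfs/') else './cache/'
os.makedirs(persistent_dir, exist_ok=True)

cache_file = os.path.join(persistent_dir, 'preprocessed.npy')

def load_samples():
    """Load preprocessed samples from the cache, computing them on a miss."""
    if os.path.exists(cache_file):
        return np.load(cache_file)
    samples = np.random.rand(100, 4).astype('float32')  # stand-in for real preprocessing
    np.save(cache_file, samples)
    return samples

first = load_samples()   # computes the samples and writes the cache file
second = load_samples()  # served from the cache
assert np.array_equal(first, second)
```

Since the mounted storage is cleaned regularly, the compute-on-miss check is what keeps this safe: a deleted cache file is simply recomputed on the next run.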

The Integration Script can be integrated into the Dataset Instance using either the UI or the CLI.

In addition, there is an option to add custom Visualizer Functions for more informative visualizations and analysis. The leap_binder object then registers these functions using set_visualizer.

Tensorleap allows you to store sensitive information as a Secret in a secure location called the Secret Manager.

The Integration Script has access to the Secret set for the Dataset. The Secret is stored in the AUTH_SECRET environment variable and can be retrieved as shown above under Retrieve the Secret.
