Dataset Integration
This section covers the integration of the mnist dataset into Tensorleap. We'll later use this dataset with a classification model.
Below is the full dataset script to be used in the integration. More information about the structure of this script can be found under Dataset Script.
from typing import List
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
# Tensorleap imports
from code_loader import leap_binder
from code_loader.contract.datasetclasses import PreprocessResponse
from code_loader.contract.enums import Metric, DatasetMetadataType
# Preprocess Function
def preprocess_func() -> List[PreprocessResponse]:
    (data_X, data_Y), (test_X, test_Y) = mnist.load_data()
    data_X = np.expand_dims(data_X, axis=-1)  # Reshape :,28,28 -> :,28,28,1
    data_X = data_X / 255  # Normalize to [0,1]
    data_Y = to_categorical(data_Y)  # One-hot vector
    test_X = np.expand_dims(test_X, axis=-1)  # Reshape :,28,28 -> :,28,28,1
    test_X = test_X / 255  # Normalize to [0,1]
    test_Y = to_categorical(test_Y)  # One-hot vector
    # Hold out 20% of the training data for validation (fixed seed for reproducibility).
    train_X, val_X, train_Y, val_Y = train_test_split(data_X, data_Y, test_size=0.2, random_state=42)
    # Generate a PreprocessResponse for each data slice, to later be read by the encoders.
    # The length of each data slice is provided, along with the data dictionary.
    # In this example we pass `images` and `labels`, which are later encoded into the inputs and outputs.
    train = PreprocessResponse(length=len(train_X), data={'images': train_X, 'labels': train_Y})
    val = PreprocessResponse(length=len(val_X), data={'images': val_X, 'labels': val_Y})
    test = PreprocessResponse(length=len(test_X), data={'images': test_X, 'labels': test_Y})
    response = [train, val, test]
    return response
# Input encoder fetches the image with the index `idx` from the `images` array set in
# the PreprocessResponse data. Returns a numpy array containing the sample's image.
def input_encoder(idx: int, preprocess: PreprocessResponse) -> np.ndarray:
    return preprocess.data['images'][idx].astype('float32')
# Ground truth encoder fetches the label with the index `idx` from the `labels` array set in
# the PreprocessResponse's data. Returns a numpy array containing the one-hot vector label of the sample.
def gt_encoder(idx: int, preprocess: PreprocessResponse) -> np.ndarray:
    return preprocess.data['labels'][idx].astype('float32')
# Metadata functions let you attach extra data to each sample for later use in analysis.
# This metadata adds the integer digit of each sample (not a one-hot vector).
def metadata_label(idx: int, preprocess: PreprocessResponse) -> int:
    one_hot_digit = gt_encoder(idx, preprocess)
    digit = one_hot_digit.argmax()
    digit_int = int(digit)
    return digit_int
LABELS = ['0','1','2','3','4','5','6','7','8','9']
# Dataset binding functions to bind the functions above to the `Dataset Instance`.
leap_binder.set_preprocess(function=preprocess_func)
leap_binder.set_input(function=input_encoder, name='image')
leap_binder.set_ground_truth(function=gt_encoder, name='classes')
leap_binder.set_metadata(function=metadata_label, metadata_type=DatasetMetadataType.int, name='label')
leap_binder.add_prediction(name='classes', labels=LABELS)
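To make the shapes in the script above concrete, here is a minimal sketch of the same transformations applied to a handful of synthetic images, so it runs without downloading MNIST or installing TensorFlow. The helper `to_one_hot` is a hypothetical NumPy stand-in for `to_categorical`, and the random images and labels are made up for illustration; none of this is part of the dataset script itself.

```python
import numpy as np


def to_one_hot(labels: np.ndarray, num_classes: int = 10) -> np.ndarray:
    """Hypothetical NumPy stand-in for Keras `to_categorical` on integer labels."""
    return np.eye(num_classes, dtype="float32")[labels]


# Synthetic stand-in for mnist.load_data(): 5 grayscale 28x28 images with uint8 pixels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 1, 7])

# Same steps as preprocess_func: add a channel axis, scale to [0, 1], one-hot encode.
images = np.expand_dims(images, axis=-1) / 255  # (5, 28, 28) -> (5, 28, 28, 1)
one_hot = to_one_hot(labels)                    # (5,) -> (5, 10)

# What the input and ground truth encoders would return for sample 0.
sample_image = images[0].astype("float32")
sample_label = one_hot[0]

# What metadata_label would return for sample 0: the digit recovered from the one-hot vector.
sample_digit = int(sample_label.argmax())

print(sample_image.shape, sample_label.shape, sample_digit)  # (28, 28, 1) (10,) 3
```

The `argmax` call recovers the integer digit from the one-hot vector, which is why the metadata function can reuse the ground truth encoder instead of storing the raw labels separately.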
A Dataset Instance can be added through either the UI or the CLI.
UI
To add a new Dataset Instance:
- 1.
- 2. In the Dataset Editor, enter these properties:
  - Dataset Name: mnist
- 3. Click Save.

Add a New Dataset Instance
After saving the mnist dataset, the platform will automatically parse the dataset script. This process evaluates the script and ensures that all its functions, including the ability to successfully read the data, are working as expected.
Upon successful parsing, the details of the MNIST dataset will be displayed on the right. In case of unsuccessful parsing, errors will be shown instead.
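A similar check can be approximated locally before saving the script. The sketch below is not Tensorleap's parser: `SplitStub` and `smoke_test` are hypothetical names, and `SplitStub` only mimics the two fields of `PreprocessResponse` used in this section (`length` and `data`). It calls every encoder on the first and last sample of every split and collects any errors.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class SplitStub:
    """Hypothetical stand-in for PreprocessResponse: a length plus a data dictionary."""
    length: int
    data: Dict[str, Any]


Encoder = Callable[[int, SplitStub], Any]


def smoke_test(splits: List[SplitStub], encoders: Dict[str, Encoder]) -> List[str]:
    """Call each encoder on the first and last index of each split; return error messages."""
    errors: List[str] = []
    for split_idx, split in enumerate(splits):
        if split.length == 0:
            errors.append(f"split {split_idx}: empty")
            continue
        for name, encoder in encoders.items():
            # sorted(set(...)) avoids calling the same index twice when length == 1.
            for idx in sorted({0, split.length - 1}):
                try:
                    encoder(idx, split)
                except Exception as exc:
                    errors.append(f"split {split_idx}, {name}[{idx}]: {exc!r}")
    return errors


# Toy data: the second split declares a length longer than the data it holds.
good = SplitStub(length=2, data={"images": [[0.0], [1.0]], "labels": [[1, 0], [0, 1]]})
bad = SplitStub(length=3, data={"images": [[0.0], [1.0]], "labels": [[1, 0], [0, 1]]})

encoders = {
    "image": lambda idx, split: split.data["images"][idx],
    "classes": lambda idx, split: split.data["labels"][idx],
}

print(smoke_test([good], encoders))       # []
print(smoke_test([good, bad], encoders))  # two IndexError messages, one per encoder
```

With the real script, the same idea would apply to the list returned by `preprocess_func` and to `input_encoder`, `gt_encoder`, and `metadata_label`, assuming the required packages are installed locally.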
CLI
- 1. Create a folder for our mnist project.
  mkdir mnist
  cd mnist
- 2. Initialize and synchronize the created folder with the Tensorleap platform by running the following command, which sets up the .tensorleap folder within the project folder:
  leap init (PROJECT) (DATASET) (--h5/--onnx)
  with the following parameters:
  - PROJECT = MNIST (project name)
  - DATASET = mnist (dataset name)
  - (--h5/--onnx) = model format: --h5 for TensorFlow (H5) and --onnx for PyTorch (ONNX)
  leap init MNIST mnist --h5
- 3. Next, we need to set your credentials for the leap CLI by running the following command:
  leap login [API_ID] [API_KEY] [ORIGIN]
  The API_ID, API_KEY and ORIGIN, along with the full command, can easily be found by clicking the button within the Resources Management view.
When using the CLI, the Dataset Script is defined within the .tensorleap/dataset.py file, and the Dataset Instance is created or updated upon performing leap push.
- 1. By default, the .tensorleap/dataset.py file has a sample template. Let's replace it with our Dataset Script above. One way to do it is with cat:
  rm .tensorleap/dataset.py
  cat > .tensorleap/dataset.py
  << paste the dataset script above + CTRL-D >>
- 2. Let's test our dataset script using leap check:
  leap check --dataset
- 3. Next, we'll push our dataset to the Tensorleap platform using the following command:
  leap push --dataset
  It should print out:
  New dataset detected. Dataset name: mnist
  Push command successfully complete
Congrats! You have successfully created the mnist Dataset Instance and integrated the Dataset Script. You can view it in the UI in the Resources Management view.
The purpose of this section was to help you define a dataset script and create a dataset instance in Tensorleap.
Now that the mnist dataset has been integrated into Tensorleap, we can use it with a classification model, which we'll build in the next section.