# Writing Integration Code

To integrate a dataset into Tensorleap, provide an [**Integration Script**](#integration-script) that instructs the platform how to load, parse, and visualize your data when **evaluating** and **analyzing** a model.

## Integration Script

The Integration Script has the following structure:

* [**Preprocess Function**](/tensorleap-integration/writing-integration-code/preprocess-function.md) - prepares the dataset state from which samples are later fetched for the neural network. Similar to a PyTorch `Dataset`.
* [**Input Encoders**](/tensorleap-integration/writing-integration-code/input-encoder.md) - read and prepare each input for your neural network. Similar to a `__getitem__` that fetches the input.
* [**Ground Truth Encoders**](/tensorleap-integration/writing-integration-code/ground-truth-encoder.md) - read and prepare each Ground Truth. Similar to a `__getitem__` that fetches the ground truth.
* [**Metadata Functions**](/tensorleap-integration/writing-integration-code/metadata-function.md) - add extra data to each sample for future analysis.
* [**Visualizer Functions**](/tensorleap-integration/writing-integration-code/visualizer-function.md) *(optional)* - an interface that instructs the Tensorleap platform on how to visualize your inputs and outputs.
* [**Decorators**](/tensorleap-integration/python-api/code_loader/decorators.md) - bind the functions above for Tensorleap to register the dataset structure.

### Architecture

The [**Preprocess Function**](/tensorleap-integration/writing-integration-code/preprocess-function.md) runs once and returns a list of **PreprocessResponse** objects that correspond to the *training*, *validation*, *test*, and *unlabelled* dataset slices.

Each returned [**PreprocessResponse**](/tensorleap-integration/python-api/code_loader/datasetclasses/preprocessresponse.md) object is then passed to the [**Input Encoders**](/tensorleap-integration/writing-integration-code/input-encoder.md), [**Ground Truth Encoders**](/tensorleap-integration/writing-integration-code/ground-truth-encoder.md) and [**Metadata Functions**](/tensorleap-integration/writing-integration-code/metadata-function.md). Each of these functions reads and prepares the data for a **single** sample, identified by the index `idx` passed as an argument.

{% hint style="info" %}
When processing the unlabelled set, the Ground Truth Encoders and every metric or loss connected to them are not called, since Ground Truth is absent by definition.
{% endhint %}
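
The call flow described above can be sketched in plain Python. This is only an illustration of the order of calls, not Tensorleap's implementation: `FakeResponse`, `iterate_samples`, and the sample fields `x` and `y` are hypothetical stand-ins, and the real platform uses `PreprocessResponse` and the decorated functions instead.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


# Hypothetical stand-in for PreprocessResponse, used only to illustrate the flow.
@dataclass
class FakeResponse:
    state: str               # 'training', 'validation', 'test' or 'unlabelled'
    data: List[Dict[str, Any]]  # one dict per sample


def preprocess() -> List[FakeResponse]:
    # Runs once and returns one response per dataset slice.
    return [
        FakeResponse('training', [{'x': 1.0, 'y': 0}, {'x': 2.0, 'y': 1}]),
        FakeResponse('unlabelled', [{'x': 3.0}]),
    ]


def input_encoder(idx: int, response: FakeResponse) -> float:
    # Reads and prepares the input of a single sample.
    return response.data[idx]['x']


def gt_encoder(idx: int, response: FakeResponse) -> int:
    # Reads and prepares the ground truth of a single sample.
    return response.data[idx]['y']


def iterate_samples(responses: List[FakeResponse]) -> List[Dict[str, Any]]:
    # Mimics how every per-sample function receives (idx, response).
    rows = []
    for response in responses:
        for idx in range(len(response.data)):
            row = {'state': response.state, 'input': input_encoder(idx, response)}
            # Ground truth encoders are skipped for the unlabelled slice.
            if response.state != 'unlabelled':
                row['gt'] = gt_encoder(idx, response)
            rows.append(row)
    return rows


rows = iterate_samples(preprocess())
```

The point of the sketch is the division of labour: slice-level work happens once in the preprocess step, while encoders stay cheap and index-based so they can be called per sample.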

Finally, the [**Input Encoders**](/tensorleap-integration/writing-integration-code/input-encoder.md) and [**Ground Truth Encoders**](/tensorleap-integration/writing-integration-code/ground-truth-encoder.md) must be declared using the [Tensorleap decorators](/tensorleap-integration/python-api/code_loader/decorators.md).

The [**Metadata Functions**](/tensorleap-integration/writing-integration-code/metadata-function.md), [**Visualizer Functions**](/tensorleap-integration/writing-integration-code/visualizer-function.md) and [**Metric Functions**](/tensorleap-integration/writing-integration-code/custom-metrics.md) are not mandatory, but they are recommended to ensure a complete analysis using the Platform.

The script can be integrated into the Platform using the [CLI](/tensorleap-integration/uploading-with-cli/cli-assets-upload.md).

### Examples

#### Basic Usage

```python
from typing import List, Union

import numpy as np

# Tensorleap imports
from code_loader.contract.datasetclasses import PreprocessResponse
from code_loader.contract.visualizer_classes import LeapHorizontalBar
from code_loader.inner_leap_binder.leapbinder_decorators import *
from code_loader.contract.enums import DataStateType, LeapDataType

# Preprocess Function:
@tensorleap_preprocess()
def preprocess_func() -> List[PreprocessResponse]:
    ...  # elided: load the train/val/test data used below
    train = PreprocessResponse(sample_ids=list(train_X.index), data=train_df, state=DataStateType.training)
    val = PreprocessResponse(sample_ids=list(val_X.index), data=val_df, state=DataStateType.validation)
    test = PreprocessResponse(sample_ids=list(test_X.index), data=test_df, state=DataStateType.test)

    return [train, val, test]

# Input Encoder(s):
@tensorleap_input_encoder('image')
def input_encoder(idx: int, preprocess: PreprocessResponse) -> np.ndarray:
    return preprocess.data.iloc[idx]['samples'].astype('float32')

# Ground Truth Encoder(s):
@tensorleap_gt_encoder('classes')
def gt_encoder(idx: int, preprocess: PreprocessResponse) -> np.ndarray:
    return preprocess.data.iloc[idx]['ground_truth'].astype('float32')

# Metadata Function(s):
@tensorleap_metadata(name='label')
def metadata_label(idx: int, preprocess: PreprocessResponse) -> Union[int, float, str, bool]:
    return preprocess.data.iloc[idx]['class_name']

# Visualizer Function(s):
LABELS = ['cat', 'dog', 'tiger', 'cow', 'goat', 'zebra', 'horse']

@tensorleap_custom_visualizer(name="pet_visualizer", visualizer_type=LeapDataType.HorizontalBar)
def is_pet_visualizer(animal_prediction: np.ndarray) -> LeapHorizontalBar:
    np_labels = np.array(LABELS)
    # Sum the predicted confidences of the two pet classes
    pet_confidence = animal_prediction[np_labels == 'cat'][0] + animal_prediction[np_labels == 'dog'][0]
    body = np.array([pet_confidence, 1 - pet_confidence])
    return LeapHorizontalBar(body=body, labels=['pet', 'not-pet'])
```

### Retrieve the Secret

Tensorleap allows you to store sensitive information as a **Secret** in a secure location called [**Secret Manager**](/user-interface/secrets-management.md).

The **Integration Script** has access to the **Secret** [set for this Codebase](/tensorleap-integration/uploading-with-cli/cli-assets-upload.md#uploading-code-only). The secret is stored in the `AUTH_SECRET` environment variable and can be read as follows:

```python
import os
auth_secret_string = os.environ['AUTH_SECRET']
```
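
If the stored secret is a JSON document (for example a cloud service-account key), it can be useful to parse it once and fail with a clear message when it is missing. The following is a minimal sketch under that assumption: the helper name `load_auth_secret` and the JSON format are hypothetical, and only the `AUTH_SECRET` variable name comes from this page.

```python
import json
import os
from typing import Any, Dict


def load_auth_secret() -> Dict[str, Any]:
    """Read AUTH_SECRET and parse it as JSON (the JSON format is an assumption)."""
    raw = os.environ.get('AUTH_SECRET')
    if raw is None:
        # Raise a descriptive error instead of a bare KeyError.
        raise RuntimeError('AUTH_SECRET is not set; check the Secret attached to this codebase')
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        raise RuntimeError('AUTH_SECRET is not valid JSON') from err
```

If your secret is a plain string such as an API token, skip the JSON parsing and use the value of `os.environ['AUTH_SECRET']` directly.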

### Persistent Storage

Persistent storage is data storage that persists across different instances and reboots of job processes. It is useful when data needs to be cached, for example after preprocessing or when working with very large files.

#### Cloud Platform

Tensorleap's cloud persistent storage can be accessed by reading from and writing to the `/nfs/` path:

```python
persistent_dir = '/nfs/'
```

{% hint style="warning" %}
NOTE: Mounted storage in the cloud is set up only to serve as a cache, and it is regularly cleaned.
{% endhint %}
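
Because the cache may be cleaned at any time, code that uses it should treat a missing file as a normal case and recompute. A minimal sketch of such a pattern is shown below; `load_or_compute` is a hypothetical helper, and the use of `pickle` is an assumption (any serialization format works).

```python
import os
import pickle
from typing import Any, Callable


def load_or_compute(cache_path: str, compute: Callable[[], Any]) -> Any:
    """Return the object cached at cache_path, recomputing it on a cache miss."""
    if os.path.exists(cache_path):
        try:
            with open(cache_path, 'rb') as f:
                return pickle.load(f)
        except (pickle.UnpicklingError, EOFError, OSError):
            pass  # Corrupt or unreadable file: fall through and recompute.

    result = compute()

    # Write to a temporary file first, then rename it into place, so that
    # readers never see a half-written cache file.
    os.makedirs(os.path.dirname(cache_path) or '.', exist_ok=True)
    tmp_path = cache_path + '.tmp'
    with open(tmp_path, 'wb') as f:
        pickle.dump(result, f)
    os.replace(tmp_path, cache_path)
    return result
```

Inside a preprocess function this might be called as `load_or_compute('/nfs/my_project/train.pkl', build_train_data)`, where both the sub-path and `build_train_data` are placeholders for your own names.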

#### On Premise

If you are running an on-premise solution, you can access the mounted storage you chose for your installation.

