# Dataset Parse Process

### Overview

Once the upload of a script integration to Tensorleap has been initiated, a dataset parse process starts.

This process:

* Uploads the dataset code to the Tensorleap server.
* (Optional) If the "Build Dynamic dependencies" flag is on in the [settings](/user-interface/settings.md) page, creates a virtual environment for your data loading from the provided requirements.txt file.

{% hint style="info" %}
Creating a virtual environment for the first time might take some time. A given requirements.txt file is used to build a virtual environment only once; the environment is then cached and reused in all future runs with the same requirements.txt file. To force the environment to be rebuilt, change the requirements.txt file: any change invalidates the cache and triggers the creation of a new environment.
{% endhint %}

* Extracts the [registered](/tensorleap-integration/python-api/code_loader/decorators.md) functions.
* Parses and tests the validity of the data loading:
  * Initializes a [Preprocess function](/tensorleap-integration/writing-integration-code/preprocess-function.md).
  * Loads the first sample from the [preprocess response](/tensorleap-integration/python-api/code_loader/datasetclasses/preprocessresponse.md) and retrieves its [input](/tensorleap-integration/writing-integration-code/input-encoder.md), [ground truth](/tensorleap-integration/writing-integration-code/ground-truth-encoder.md), and [metadata](/tensorleap-integration/writing-integration-code/metadata-function.md) values.
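The validity test described above can be approximated locally with a small sketch. Everything in it is an assumption made for illustration: the functions `preprocess`, `input_encoder`, `gt_encoder`, and `metadata_fn`, the dictionary-based subset layout, and the helper `parse_first_sample` are hypothetical stand-ins, not the `code_loader` API or the server's actual implementation.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins for registered integration functions.
# A real integration would register its own functions instead.
def preprocess() -> List[Dict[str, Any]]:
    """Return a list of subsets; each holds its length and raw data."""
    data = {"values": [[1.0, 2.0], [3.0, 4.0]], "labels": [0, 1]}
    return [{"length": 2, "data": data}]

def input_encoder(idx: int, subset: Dict[str, Any]) -> List[float]:
    return subset["data"]["values"][idx]

def gt_encoder(idx: int, subset: Dict[str, Any]) -> int:
    return subset["data"]["labels"][idx]

def metadata_fn(idx: int, subset: Dict[str, Any]) -> Dict[str, Any]:
    return {"index": idx}

def parse_first_sample(
    preprocess_fn: Callable[[], List[Dict[str, Any]]],
    encoders: Dict[str, Callable[[int, Dict[str, Any]], Any]],
) -> Dict[str, Any]:
    """Run preprocess, then call every encoder on sample 0 of the first subset."""
    subsets = preprocess_fn()
    if not subsets:
        raise ValueError("preprocess returned no subsets")
    first = subsets[0]
    if first["length"] < 1:
        raise ValueError("the first subset is empty")
    # Any exception raised by an encoder surfaces here, mimicking a failed parse.
    return {name: fn(0, first) for name, fn in encoders.items()}

result = parse_first_sample(
    preprocess,
    {"input": input_encoder, "ground_truth": gt_encoder, "metadata": metadata_fn},
)
print(result)
```

Only the first sample is exercised here, mirroring the description above, so a passing check does not guarantee that every later sample loads correctly.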

### Common Run Issues

{% hint style="danger" %}
A dataset parse can fail due to:

* When the Build Dynamic dependencies flag is on:
  * A missing requirements.txt.
  * An invalid requirements.txt. Make sure you can install this environment locally from the file using pip, and remove any redundant or conflicting dependencies.
* A bug in the [integration script](/tensorleap-integration/writing-integration-code.md) data flow. To prevent this, it is strongly recommended to run an [integration test](/tensorleap-integration/integration-test.md) before pushing code to the platform.
* Missing files or folders, or files that cannot be read:
  * Make sure the [leap.yaml](/tensorleap-integration/leap.yaml.md) includes all of the files and folders required to parse your dataset. The [CLI upload](/tensorleap-integration/uploading-with-cli/cli-assets-upload.md#uploading-code-only) prints all the files included in the integration so they can be reviewed, and the [code viewer](/user-interface/project/network/code-integration/code-viewing.md#included-files) within the platform allows browsing through all included files.
  * Make sure the server has [access](/getting-started/tensorleap-setup/installation.md#tensorleap-server) to every file the [integration script](/tensorleap-integration/writing-integration-code.md) tries to load, read, or access. This includes the dataset, configuration files, and any other assets. On an on-prem installation, the `leap server info` command outputs the folders Tensorleap can access under the **datasetvolumes** attribute.
* Out-of-memory (OOM) errors in data loaders that try to load memory-heavy objects. Increase the memory limits for the dataset parse job in the [settings](/user-interface/settings.md).
  {% endhint %}
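Some of the file-related failures above can be caught before uploading. The following is a minimal sketch using only the Python standard library; the helper name `find_unreadable_paths` and the example path list are hypothetical, and the check only reflects the machine it runs on, not what the Tensorleap server can access.

```python
import os
from pathlib import Path
from typing import Iterable, List

def find_unreadable_paths(paths: Iterable[str]) -> List[str]:
    """Return a description of every path that is missing or not readable."""
    problems = []
    for raw in paths:
        path = Path(raw)
        if not path.exists():
            problems.append(f"{raw}: missing")
        elif not os.access(path, os.R_OK):
            problems.append(f"{raw}: not readable")
    return problems

if __name__ == "__main__":
    # Hypothetical list: replace with the files your integration needs.
    for problem in find_unreadable_paths(["requirements.txt", "leap.yaml"]):
        print(problem)
```

For a meaningful result, run the check in an environment that mirrors the server's mounted folders, since a file readable on a development machine may still be outside the folders the server can access.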

