# Dataset Parse Process

<figure><img src="https://3509361326-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F9UXeOlFqlw8pl79U2HGU%2Fuploads%2FtJB5VvmPi6fK11nfHduq%2Fimage.png?alt=media&#x26;token=d7665a70-034a-47c8-b6cf-e198fdae2ba5" alt=""><figcaption><p>Filtering for dataset parse processes within the platform</p></figcaption></figure>

### Overview

Once an upload of a script integration to Tensorleap has been initiated, a dataset parse process starts.

This process:

* Uploads the dataset code to the Tensorleap server.
* (Optionally) if the "Build Dynamic dependencies" flag is on in the [settings](https://docs.tensorleap.ai/user-interface/settings) page, creates a virtual environment for your data loading using the provided requirements.txt file.

{% hint style="info" %}
Creating a virtual environment for the first time might take a while. A given requirements.txt file is only used to build a virtual environment once; the environment is then cached and reused in all future runs with the same requirements.txt file. If you need to re-run the environment creation step, any change to the requirements.txt file will invalidate the cache and trigger a new environment build.
{% endhint %}
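The cache is presumably keyed on the file's content, which is why any edit, however small, triggers a fresh build. As a plain-Python illustration of that idea (not Tensorleap's actual implementation):

```python
import hashlib

def env_cache_key(requirements_text: str) -> str:
    # Any byte-level change to requirements.txt yields a new key,
    # which maps to a new virtual-environment build.
    return hashlib.sha256(requirements_text.encode()).hexdigest()

unchanged = env_cache_key("numpy==1.26.4\n") == env_cache_key("numpy==1.26.4\n")
edited = env_cache_key("numpy==1.26.4\n") == env_cache_key("numpy==1.26.5\n")
print(unchanged, edited)  # True False
```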

* Extracts the [registered](https://docs.tensorleap.ai/tensorleap-integration/python-api/code_loader/decorators) functions.
* Parses and tests the validity of the data loading:
  * Initializes a [Preprocess function](https://docs.tensorleap.ai/tensorleap-integration/writing-integration-code/preprocess-function)
  * Loads the first sample from the [preprocess response](https://docs.tensorleap.ai/tensorleap-integration/python-api/code_loader/datasetclasses/preprocessresponse), and retrieves its [input](https://docs.tensorleap.ai/tensorleap-integration/writing-integration-code/input-encoder), [ground truth](https://docs.tensorleap.ai/tensorleap-integration/writing-integration-code/ground-truth-encoder) & [metadata](https://docs.tensorleap.ai/tensorleap-integration/writing-integration-code/metadata-function) values.

### Common Run Issues

{% hint style="danger" %}
A dataset parse can fail due to:

* When the Build Dynamic dependencies flag is on:
  * A missing requirements.txt file.
  * An invalid requirements.txt file. Make sure you can use pip to install this environment from the file locally, and remove any redundant or conflicting dependencies.
* A bug in the [integration script](https://docs.tensorleap.ai/tensorleap-integration/writing-integration-code) data flow. It is highly recommended to run an [integration test](https://docs.tensorleap.ai/tensorleap-integration/integration-test) before pushing code to the platform to prevent this.
* Missing files/folders or unreadable files:
  * Make sure the [leap.yaml](https://docs.tensorleap.ai/tensorleap-integration/leap.yaml) includes all of the files and folders required to parse your dataset. The [CLI upload](https://docs.tensorleap.ai/tensorleap-integration/uploading-with-cli/cli-assets-upload#uploading-code-only) prints all the files included in the integration so they can be reviewed, and the [code viewer](https://docs.tensorleap.ai/user-interface/network/code-integration/code-viewing#included-files) within the platform allows browsing through all included files.
  * Make sure the server has [access](https://docs.tensorleap.ai/getting-started/tensorleap-setup/installation#tensorleap-server) to every file the [integration script](https://docs.tensorleap.ai/tensorleap-integration/writing-integration-code) tries to load, read, or access. This includes the dataset, configuration files, and any other assets. If using an on-prem installation, the `leap server info` command outputs the folders Tensorleap can access under the **datasetvolumes** attribute.
* Out-of-memory (OOM) errors for data loaders that load memory-heavy objects. Increase the memory limits for the dataset parse job in the [settings](https://docs.tensorleap.ai/user-interface/settings) page.
{% endhint %}
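To catch requirements.txt problems before uploading, you can dry-run the install in a throwaway virtual environment. The paths and the stand-in file below are illustrative; point pip at your project's own requirements.txt:

```shell
# Illustrative stand-in requirements file; use your project's real one.
printf '# e.g. numpy==1.26.4\n' > /tmp/example-requirements.txt

python3 -m venv /tmp/tl_req_check        # fresh, isolated environment
/tmp/tl_req_check/bin/pip install -q -r /tmp/example-requirements.txt \
  && echo "requirements install cleanly" \
  || echo "install failed - resolve conflicts before uploading"
rm -rf /tmp/tl_req_check                 # clean up the throwaway venv
```

If this install fails locally, the dataset parse on the server will fail the same way, so fix the dependency list first.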
