Dataset Parse Process
This page describes the dataset parse process.

Overview
Once an upload of a script integration to Tensorleap has been initiated, a dataset parse process starts.
This process:
Uploads the dataset code to the Tensorleap server.
(Optional) If the "Build Dynamic dependencies" flag is on in the settings page, creates a virtual environment for your data loading using the provided requirements.txt file.
Extracts the registered functions.
Parses and tests the validity of the data loading (a local sketch of these checks follows this list):
Initializes the preprocess function.
Loads the first sample from the preprocess response and retrieves its input, ground truth, and metadata values.
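The sketch below is a rough local approximation of these validity checks, not Tensorleap's actual implementation. The function names (preprocess_func, input_encoder, gt_encoder, metadata_func) are hypothetical placeholders for the functions registered in your integration script.

```python
# A minimal local sketch of the validity checks the dataset parse performs.
# preprocess_func, input_encoder, gt_encoder and metadata_func are hypothetical
# placeholders for the functions registered in your integration script.

def run_parse_checks(preprocess_func, input_encoder, gt_encoder, metadata_func):
    # Initialize the preprocess function and take its first response.
    preprocess_responses = preprocess_func()
    first_response = preprocess_responses[0]

    # Load the first sample and retrieve its input, ground truth and metadata.
    sample_input = input_encoder(0, first_response)
    sample_gt = gt_encoder(0, first_response)
    sample_metadata = metadata_func(0, first_response)

    # If any of these calls raises an exception, the dataset parse on the
    # server would fail at the equivalent step.
    return sample_input, sample_gt, sample_metadata
```

Running a check like this locally surfaces the same failures the server would hit, before any upload.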
Common Run Issues:
A dataset parse can fail due to:
When the Build Dynamic dependencies flag is on:
A missing requirements.txt file.
An invalid requirements.txt file. Make sure you can install this environment from the file locally with pip, and remove any redundant or conflicting dependencies (see the first sketch after this list).
A bug in the integration script's data flow. It is highly recommended to run an integration test before pushing code to the platform, for example by running the parse checks locally as in the sketch above.
Missing files/folders, or specific files that cannot be read:
Make sure the leap.yaml includes all of the files and folders required to parse your dataset. Uploading with the CLI prints all of the files included in the integration so they can be reviewed, and the code viewer within the platform allows browsing through all included files.
Make sure the server has access to every file the integration script tries to load, read, or access. This includes the dataset, configuration files, and any other assets (see the second sketch after this list). If you are using an on-prem installation, the leap server info command outputs the folders Tensorleap can access under the datasetvolumes attribute.
Out-of-memory (OOM) errors for data loaders that try to load memory-heavy objects. Increase the memory limit for the dataset parse job in the settings.
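The first sketch below is one way to reproduce the dynamic-dependencies build locally before uploading: it creates a clean virtual environment and pip-installs requirements.txt into it. The requirements.txt path and the temporary environment directory are assumptions; adjust them to your integration.

```python
# A hedged sketch: reproduce the "Build Dynamic dependencies" step locally by
# installing requirements.txt into a fresh virtual environment.
# The requirements.txt path and the .leap_req_check_venv directory are assumptions.
import subprocess
import sys
import venv
from pathlib import Path

def check_requirements(requirements_path: str = "requirements.txt") -> bool:
    req_file = Path(requirements_path)
    if not req_file.is_file():
        print(f"Missing {req_file} - the parse will fail when the flag is on.")
        return False

    # Create a clean, isolated environment so locally installed packages
    # do not mask missing or conflicting dependencies.
    env_dir = Path(".leap_req_check_venv")
    venv.create(env_dir, with_pip=True, clear=True)
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    python_exe = env_dir / bin_dir / ("python.exe" if sys.platform == "win32" else "python")

    # If this install fails locally, it will also fail on the server.
    result = subprocess.run([str(python_exe), "-m", "pip", "install", "-r", str(req_file)])
    return result.returncode == 0

if __name__ == "__main__":
    ok = check_requirements()
    print("requirements.txt installs cleanly" if ok else "fix requirements.txt before uploading")
```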
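The second sketch checks, before uploading, that every file or folder your integration script reads actually exists and is readable. The example paths are hypothetical; replace them with the assets your script loads.

```python
# A hedged sketch: verify that every asset the integration script needs exists
# and is readable before uploading. The paths below are hypothetical examples.
from pathlib import Path

ASSET_PATHS = [
    "data/train_images",            # hypothetical dataset folder
    "config/dataset_config.json",   # hypothetical configuration file
]

def check_assets(paths=ASSET_PATHS) -> bool:
    all_ok = True
    for path in map(Path, paths):
        if not path.exists():
            print(f"MISSING: {path}")
            all_ok = False
        elif path.is_file():
            try:
                with path.open("rb") as handle:
                    handle.read(1)  # confirm the file can actually be read
            except OSError as err:
                print(f"UNREADABLE: {path} ({err})")
                all_ok = False
    return all_ok

if __name__ == "__main__":
    print("all assets reachable" if check_assets() else "fix missing/unreadable assets before uploading")
```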