Runs and Processes

The Runs & Processes window lets a user inspect processes, get logs, resume failed jobs, and kill processes.

The Runs & Processes window

The Runs & Processes window gives an in-depth view of the Tensorleap processes started by the user: their progress, status, and logs. It also lets you extract logs from failing processes for debugging, re-run failed evaluations, and kill processes.

Overview

The Runs & Processes window presents all of the processes that were started in the platform. The table lists:

  • The Model Name that is used by the process

  • The Model Run that is used by the process

  • The Code Integration Name, branch, and version that is used by the process

  • The Type & status of the process

  • Creation time and total process duration.

The bar at the top of the Runs & Processes window supports filtering by process type, stopping all jobs, and deleting all logged processes from the window.

Each process can be expanded (by clicking the process name) to show additional process attributes.

Process Types in Tensorleap

Filtering, killing, and removing process logs

Filtering a process

To view only some process types, click the filter icon at the top of the Runs & Processes window and select the relevant types.

Killing a process

  • Clicking the top "skull" icon next to the filter kills all processes (useful when multiple processes are stuck)

  • Hovering over an active process reveals an in-line "skull" icon. Pressing it kills that process

Clearing logs and previous processes

The right-most button removes all previous process logs from the platform.

Inspecting a process

Inspecting a process allows a user to monitor logs from the process (live) or download a .tar.gz of the most recent logs once a process is terminated.

How to inspect a process

To inspect a process, click the process in the Runs & Processes window and then click "Inspect Process".

Inspecting an evaluate process

The process inspection view

In the process inspection view we can download the logs (the cloud icon at the top right downloads a .tar.gz file), or review them within the platform.
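Once the archive is downloaded, it can be examined locally with the Python standard library. The following is a minimal sketch: the archive file name and the names of the files inside it are assumptions, since the internal layout of the downloaded .tar.gz is not described here.

```python
import tarfile


def list_log_files(archive_path):
    """Return the sorted names of the regular files inside a .tar.gz archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


def read_log_file(archive_path, member_name):
    """Return the text of one file in the archive without unpacking it to disk."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        handle = archive.extractfile(member_name)
        if handle is None:  # directories and special entries have no content
            raise ValueError(f"{member_name} is not a regular file")
        with handle:
            # Replace undecodable bytes so a stray binary chunk does not abort the read.
            return handle.read().decode("utf-8", errors="replace")
```

For example, `list_log_files("process-logs.tar.gz")` (a hypothetical file name) would return whatever file names the archive actually contains, and any of them can then be passed to `read_log_file`.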

The logs are divided into tabs. Each tab shows information from a Kubernetes pod that takes part in the work needed for this process to complete.

Each pod has two tabs, named describe-POD-NAME and POD-NAME.

  • The describe-POD-NAME tab shows the output of a kubectl describe pod command. It provides information on the status of the pod, whether it encountered an OOM (out-of-memory) event, and other high-level configuration (limits, etc.)

  • The POD-NAME tab contains the most recent logs from the pod.
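When the describe output is long, a quick text search can confirm whether a pod was killed for running out of memory. The sketch below assumes the describe text has been saved or copied into a string and that, as in standard kubectl describe pod output, an out-of-memory kill is reported with the reason OOMKilled. The function name and the sample excerpt are illustrative only.

```python
def find_oom_lines(describe_text):
    """Return (line_number, line) pairs that mention an OOMKilled reason."""
    hits = []
    for number, line in enumerate(describe_text.splitlines(), start=1):
        if "OOMKilled" in line:
            hits.append((number, line.strip()))
    return hits


# Illustrative excerpt only; real describe output contains many more fields.
sample = """\
    State:          Running
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
"""

for number, line in find_oom_lines(sample):
    print(f"line {number}: {line}")
```

An empty result means no OOMKilled reason appears in the text; it does not rule out other failure causes, which still need to be checked in the pod logs.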

An example of a process inspection view

Debugging process errors

Most process errors result in a meaningful notification that points to a specific issue to resolve. To debug the issue further, examine the logs and review any errors that appear.
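For long logs, a small script can surface the lines worth reading first. This is a rough sketch under stated assumptions: the keyword list is a guess at what error lines commonly contain (the log format is not specified here), and the function name and `context` parameter are hypothetical.

```python
def find_error_lines(log_text, keywords=("error", "exception", "traceback"), context=1):
    """Return (line_number, line, preceding_lines) for each line matching a keyword.

    Matching is case-insensitive and substring-based, so "ValueError" matches "error".
    """
    lines = log_text.splitlines()
    needles = tuple(k.lower() for k in keywords)
    matches = []
    for index, line in enumerate(lines):
        if any(needle in line.lower() for needle in needles):
            # Keep a few preceding lines so the match can be read in context.
            before = lines[max(0, index - context):index]
            matches.append((index + 1, line, before))
    return matches
```

Substring matching is deliberately loose, so it can flag harmless lines (for example a message such as "0 errors found"). Treat the result as a starting point for reading the log, not as a verdict.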
