# IMDB Project Walkthrough

The IMDB project, included in the [**Free Trial**](broken://pages/reHfp80krQ2R6gWEaNyr), uses the [**IMDB Dataset**](http://ai.stanford.edu/~amaas/data/sentiment/) (Large Movie Reviews) with a simple sentiment classification model.

The IMDB dataset contains 50,000 movie reviews for natural language processing, used here for binary sentiment classification: 25,000 highly polar reviews are provided for training and 25,000 for testing.

In the following steps, you will train the model, see analytics, and perform basic analyses. For a more in-depth guide to the IMDB use-case, see the full [**IMDB Guide**](/guides/full-guides/imdb-guide.md).

To open the project, in the Welcome screen, go to Projects and click `IMDB`:

![(click-to-zoom)](/files/ILlYu2bMnlMxgbYNLCWl)

The [**Network**](/user-interface/project/network.md) tab displays the model's nodes and connections in a simple convolutional neural network (CNN) model:

![(click-to-zoom)](/files/eurCnZActGN1UfM1try4)

You can **zoom** in and out using the scroll wheel, and **pan** by dragging the background. To see the details of a node, click on it.

The orange node on the left represents the IMDB dataset (the script can be viewed in [**Resources Management**](/user-interface/resources-management.md)).

The light blue nodes seen in the center of the model represent the model's layers, and the colored nodes at the end of the model represent the [**Loss and Optimizer**](/user-interface/project/network/network-mapping/create-a-mapping-deprecated/loss-node.md).

The dark blue nodes represent Tensorleap [**Visualizer**](/user-interface/project/network/network-mapping/create-a-mapping-deprecated/visualizer-node.md) nodes. These nodes extract visualizations from different outputs.

## Training

The pre-saved projects provided are not yet trained. They **must** be trained before analytics and analyses can be presented.

To train the model for 1 epoch, on the top bar click <img src="/files/Nn62uo2iqbsf0ktto6If" alt="" data-size="line">. The Training Model dialog will appear; at the bottom, click <img src="/files/dAwEfcHEyqMU3AsPHnIx" alt="" data-size="line">:

![Training (click-to-zoom)](/files/x2aeBP68k8QyrKM5DjPE)

Once training has initiated, a `PENDING` notification will appear indicating that the training process is initializing. This could take a minute or so. Once the training begins you will see a `STARTED` notification.

Training will take somewhere between 20 minutes and 3 hours, depending on your machine. To track the training status, click <img src="/files/KRNhyA1vp5Sspp4PlJV9" alt="" data-size="line">.

## Metrics and Analytics

To display the model's analytics on the dashboard, on the top left of the dashboard, click <img src="/files/DIDpjjdNuVwZa9XqdZfC" alt="" data-size="line"> to open the Versions view. Expand the version and **make sure** that the current model is selected:

![Dashboard (click-to-zoom)](/files/NQDsaT7DkP5yP3wJ4dgb)

This [**Dashboard**](/user-interface/project/dashboards/dashlets/metrics-dashboard.md) includes the following [**Dashlets**](/user-interface/project/dashboards/dashlets/metrics-dashboard.md#dashlet):

* Top left - Loss (error) vs Batch. See how the loss decreases as training progresses
* Top right - Samples ordered by loss, from high to low
* Bottom left - User Score vs Loss - good performance (low average loss) at the edges of the 1 to 10 scale and poor performance (high average loss) in the middle of the scale
* Bottom right - Accuracy vs Batch. See how the accuracy of the model improves with training

## Population Exploration

Tensorleap's technology creates a latent space that is relatively close to the entire model's latent space. This latent space is composed of feature activations from **all** the model's layers, which distribute the data in the most informative way.

This allows the platform to create a similarity map between samples as they are interpreted by the model. Intuitively, similar samples activate similar learned features within the model.

This similarity map is called a [**Population Exploration**](/user-interface/project/dashboards/dashlets/sample-analysis.md#population-exploration) analysis, and it is performed automatically after each epoch.
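The idea behind the similarity map can be illustrated with a toy sketch: given feature-activation vectors for a few samples (the values below are hypothetical, not Tensorleap's actual embeddings or algorithm), cosine similarity measures how closely two samples activate the same features, which is what places them near each other in the map.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature-activation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical activations (collected across layers) for three reviews.
review_a = [0.9, 0.1, 0.8, 0.2]    # a positive review
review_b = [0.85, 0.15, 0.75, 0.3] # another positive review
review_c = [0.1, 0.9, 0.2, 0.8]    # a negative review

print(cosine_similarity(review_a, review_b))  # close to 1.0 -> same cluster
print(cosine_similarity(review_a, review_c))  # much lower -> different cluster
```

Samples with high mutual similarity end up clustered together in the 2-D projection shown in the Population Exploration panel.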

Below there is a short clip that illustrates the following steps:

* After the **training is completed**, click <img src="/files/1z4Gb30eHCsuE2F46EbU" alt="" data-size="line"> at the top left to see the population exploration analysis.
* Resize the Population Exploration analysis panel.
* Color the dots by their ground-truth label by clicking <img src="/files/jy1gesuvXd2wv7BuSo6R" alt="" data-size="line">, where the `loss` is currently selected, and change it to <img src="/files/rLAZU9BNu8yg6Tw1L5gB" alt="" data-size="line">.

![Population Exploration (click-to-zoom)](/files/T6y50zvBBcqXc3rtNzHT)

The similarity map shows various clusters corresponding to each given label. Within these clusters, there are samples that are misclassified by the model. These samples have different labels (thus different colors) and high loss.

Hovering over the dots shows a preview of the sample:

![Population Exploration Samples Preview (click-to-zoom)](/files/6oJAKHVdrcq2qRDdXRfZ)

{% hint style="info" %}
The **\[OOV]** seen in the samples stands for **Out of Vocabulary**. It indicates that the word is not represented in the **word tokens**, and is therefore ignored by the model.
{% endhint %}
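A rough sketch of how \[OOV] arises: the model's vocabulary is a fixed mapping from frequent words to indices, and any word outside it falls back to a reserved OOV index. The vocabulary below is hypothetical, for illustration only.

```python
# Hypothetical fixed vocabulary: only frequent training words get an index.
vocab = {"this": 1, "movie": 2, "was": 3, "great": 4, "bad": 5}
OOV = 0  # index reserved for out-of-vocabulary words

def tokenize(text):
    """Map each word to its vocabulary index, or OOV if unseen."""
    return [vocab.get(word, OOV) for word in text.lower().split()]

print(tokenize("this movie was phantasmagorical"))
# "phantasmagorical" is not in the vocabulary -> [1, 2, 3, 0]
```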

In the center of each cluster we can see samples with **high loss**, indicated by the **large dots**:

![](/files/xjBC4MpQiCJYlhdRFYgK)

These samples are good candidates for mis-labeling. Previewing and clicking these samples reveals that they are indeed valid candidates:

{% tabs %}
{% tab title="First Candidate" %}
![(click-to-zoom)](/files/jmk8WNpIRcPOakzizj0I)

The sample above shows a very bad review, ending with ***"don't see this film..."***, but its ground truth and score label it as a good film. It is indeed a **mis-label**.
{% endtab %}

{% tab title="Second Candidate" %}
![(click-to-zoom)](/files/XTvyk629Mg8soCaDW9ld)

The sample says a lot of good things about Brad Pitt and about the director, even though it is labeled and scored as a bad film.
{% endtab %}

{% tab title="Third Candidate" %}
![(click-to-zoom)](/files/GfJ8xdIWhTGbZTrM4n93)

This review indicates that the script is bad, even though the review is highly scored. This is actually not a "real" mis-labeling case: the original review had additional paragraphs describing the positive aspects of the movie, but it was truncated due to the input length.

This exposes a problem: long reviews are not well represented within the current model's word limit.
{% endtab %}
{% endtabs %}

Click each sample to show its preview, metadata, and metrics. Click <img src="/files/jYd8FiJyj20JfxF5Si89" alt="" data-size="line"> to analyze it.

![(click-to-zoom)](/files/id5eD7TBXDY4RmAgmj8J)

## Sample Root Cause Analysis

The Sample Analysis tool runs explainability algorithms on selected samples and displays the visualizations correlated with the [**Visualizer**](/user-interface/project/network/network-mapping/create-a-mapping-deprecated/visualizer-node.md) blocks.

To analyze a sample, select it and click <img src="/files/jYd8FiJyj20JfxF5Si89" alt="" data-size="line"> on the right panel. This sends the sample to be analyzed by the platform, and once finished, the results are displayed in the **Analyzer** panel.

These are the results for the sample shown in the image above. This sample represents a good review, but it was predicted as a bad one (shown by the horizontal bars). In addition, we can ask which tokens pushed the `positive` prediction `up`, and which pushed the `negative` prediction `up`. In the sample analysis below, the words ***"this was an excellent film"*** are marked as positive and ***"worst movie ever"*** is marked as negative:

![Sample Analysis (click-to-zoom)](/files/mgYQ0kJ9YzuSobUqqLu6)
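A simplified way to think about per-token attribution (this is an illustration, not Tensorleap's actual explainability algorithm) is a linear view in which each token carries a signed contribution: tokens with positive contributions push the `positive` prediction up, and tokens with negative contributions push the `negative` prediction up. The weights below are hypothetical.

```python
# Hypothetical per-token contributions to the sentiment score.
weights = {"excellent": 2.0, "film": 0.1, "worst": -2.5, "ever": -0.3}

def attribute(tokens):
    """Return (token, contribution) pairs and the overall score."""
    contribs = [(t, weights.get(t, 0.0)) for t in tokens]
    return contribs, sum(c for _, c in contribs)

tokens = "this was an excellent film but the worst movie ever".split()
contribs, score = attribute(tokens)
positive = [t for t, c in contribs if c > 0]  # highlighted as positive
negative = [t for t, c in contribs if c < 0]  # highlighted as negative
print(positive, negative, score)
```

Tokens with zero contribution (e.g. stop words) are left unhighlighted, which mirrors how only the influential words are marked in the Analyzer panel.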

## Summary

Congratulations on completing this short IMDB walkthrough for the Standalone Trial.

Next, you can follow the **full guide** at [**IMDB Guide**](/guides/full-guides/imdb-guide.md), which takes you through dataset integration, model building and importing, as well as reviewing and analyzing additional metrics.

You can also check our [**Full Guides**](/guides.md) for additional and more challenging use cases.

To learn about **integrating your custom data** into the platform, have a look at the [**Integration Script**](/tensorleap-integration/writing-integration-code.md).

For more information about the Standalone Trial, see [**Quickstart Standalone Trial**](broken://pages/reHfp80krQ2R6gWEaNyr).


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.tensorleap.ai/examples/imdb-project-walkthrough.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
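For example, such a request URL can be built with Python's standard library; the question text below is illustrative.

```python
from urllib.parse import urlencode

BASE = "https://docs.tensorleap.ai/examples/imdb-project-walkthrough.md"

def ask_url(question):
    """Build the documentation-query URL with the question URL-encoded."""
    return f"{BASE}?{urlencode({'ask': question})}"

url = ask_url("How do I color samples by ground-truth label?")
print(url)
# Performing an HTTP GET on this URL (e.g. with urllib.request.urlopen)
# returns the answer with relevant excerpts and sources.
```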
