# Aggressors Detection

### **What Are Aggressors and Why They Matter**

In deep learning workflows, understanding *why* a model fails is often harder than spotting *where* it fails. While metrics like accuracy or loss may flag issues, they rarely reveal their underlying causes. This is where **aggressors** come in.

#### **What Is a Model Aggressor?**

An aggressor is a **semantic subgroup** of data where your model consistently underperforms. These patterns—often called *error slices*—might relate to lighting conditions, phrasing, or edge-case user behaviors. They’re rarely visible through aggregate metrics but can lead to persistent failure modes, especially in production.

Aggressors are important because they:

* Reveal systematic model weaknesses
* Often go unnoticed during standard evaluation
* Tend to reappear in real-world deployment
* Are difficult to catch without deeper analysis

#### **Why They’re Worth Addressing**

Ignoring aggressors can lead to:

* Passing benchmarks but failing on edge cases
* Long debug cycles with little progress
* Production failures and wasted resources

Treating them early can:

* Speed up model development
* Improve generalization
* Reduce the cost of failure

### **The Aggressor Lifecycle**

Most teams—explicitly or not—follow the same general process when addressing model failures. This lifecycle includes:

1. **Indicating** a problematic behavior and measuring its severity
2. **Forming** a root cause hypothesis
3. **Validating** that hypothesis
4. **Solving** the issue through model or data changes
5. **Tracking** the outcome over time

#### **One Lifecycle, Two Ways to Execute It**

This process is often done manually, but it’s slow and hard to scale. Tensorleap supports the same lifecycle, but makes each step faster, more structured, and easier to explain.

The next sections walk through each stage, comparing traditional approaches with how Tensorleap streamlines them, and end with a complete example of tackling an aggressor end to end in the platform.

<details>

<summary>Indicating a Model Aggressor</summary>

#### **How Aggressors Are Typically Identified**

Teams often notice problems late — through metric drops, user feedback, or failed test cases. Identifying the underlying data patterns usually requires manual inspection and domain intuition. Even when an issue is spotted, it's hard to estimate how severe it is or whether it's worth prioritizing. This process is slow, reactive, and hard to repeat.

#### **How Tensorleap Helps You Identify Aggressors**

After uploading a model and running an evaluation, Tensorleap automatically analyzes the results and extracts semantic concepts from your data. In the **Insights** panel, these concepts are ranked by severity — based on their average loss or other performance signals.

This not only helps surface underperforming groups early, but also allows you to measure how impactful each group is. You can focus attention on the most critical aggressors without relying on manual slicing or guesswork.

<figure><img src="https://3509361326-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F9UXeOlFqlw8pl79U2HGU%2Fuploads%2F0GiCi2edjgPojLtbz0l1%2Fimage.png?alt=media&#x26;token=f62d2ca4-2157-4d91-9d5c-29a5db6f0ca6" alt=""><figcaption><p>Tensorleap automatically detects aggressors in a model</p></figcaption></figure>
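The ranking idea itself is straightforward: group per-sample losses by semantic concept and sort the groups by mean loss. A minimal Python sketch of that idea, using made-up concept labels and loss values (Tensorleap derives both automatically; the data here is purely illustrative):

```python
from collections import defaultdict

def rank_concepts_by_loss(concepts, losses):
    """Rank semantic concepts by mean per-sample loss, highest (worst) first."""
    grouped = defaultdict(list)
    for concept, loss in zip(concepts, losses):
        grouped[concept].append(loss)
    means = {c: sum(v) / len(v) for c, v in grouped.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)

# Toy data: "night" samples lose more on average than "day" samples.
concepts = ["day", "night", "day", "night", "day"]
losses   = [0.10,  0.90,   0.20,  0.70,   0.15]
print(rank_concepts_by_loss(concepts, losses))  # "night" ranks first
```

The highest-ranked group is the first candidate aggressor to investigate.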

|                           | **Manual Approach**             | **With Tensorleap**                           |
| ------------------------- | ------------------------------- | --------------------------------------------- |
| **When issues surface**   | Late in testing or production   | During validation                             |
| **How they’re found**     | Manual slicing, trial and error | Automatic detection of weak semantic concepts |
| **Effort required**       | High                            | Low                                           |
| **Ability to prioritize** | Low — unclear severity          | High — ranked by average loss and impact      |

</details>

<details>

<summary>Forming a Root Cause Hypothesis</summary>

Tensorleap provides a structured way to explore *why* a concept might be failing, even before inspecting individual samples.

**1. Metadata Correlation**\
For each aggressor, Tensorleap automatically analyzes correlations between the concept and available metadata. This helps surface patterns like:

* “High vegetation density”
* “Low lighting conditions”
* “Specific capture location”

These insights immediately suggest possible causes without requiring manual filtering.

<figure><img src="https://3509361326-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F9UXeOlFqlw8pl79U2HGU%2Fuploads%2FoAhGeGswG6iCD5DtvQQc%2Fimage.png?alt=media&#x26;token=940ac434-9f07-448e-aefb-ec5c66afde19" alt=""><figcaption><p>Aggressors in the Tensorleap platform are automatically characterised using metadata</p></figcaption></figure>
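To make the idea of metadata correlation concrete, here is a minimal sketch of correlating a metadata field with per-sample loss, using Pearson correlation and hypothetical values for a field like vegetation density (the field name and numbers are illustrative, not Tensorleap output):

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-sample metadata and losses: loss rises with vegetation.
vegetation_density = [0.1, 0.2, 0.5, 0.7, 0.9]
per_sample_loss    = [0.12, 0.15, 0.40, 0.55, 0.80]
print(pearson(vegetation_density, per_sample_loss))  # close to 1.0
```

A coefficient near 1.0 (or -1.0) flags the metadata field as a strong candidate cause worth validating.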

**2. Representative Samples**\
Once a pattern is suspected, you can quickly browse representative samples from the aggressor group to validate whether the correlation seems meaningful. This helps ground the hypothesis in real examples.

<figure><img src="https://3509361326-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F9UXeOlFqlw8pl79U2HGU%2Fuploads%2FirB5t68noDe5qgHzvw6e%2FKapture%202025-07-10%20at%2007.11.11.gif?alt=media&#x26;token=d01ac2f7-b696-46be-b821-0f631805731d" alt=""><figcaption><p>Viewing representative samples of an aggressor in the platform</p></figcaption></figure>

**3. Shared Characteristics via Heatmaps**\
Tensorleap provides input-level heatmaps that visualize what the model is focusing on within each concept. This highlights common regions of attention or failure, revealing specific visual traits that unite the group — such as consistent background patterns, object locations, or missing context.

<figure><img src="https://3509361326-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F9UXeOlFqlw8pl79U2HGU%2Fuploads%2FBt6vVlNHvKEpRDtFV1SG%2FKapture%202025-07-10%20at%2007.14.10.gif?alt=media&#x26;token=cd4dedfc-2286-44bc-890e-9118116cd892" alt=""><figcaption><p>Using heatmaps to highlight important features in a detected aggressor</p></figcaption></figure>

Users typically form hypotheses either by induction — reviewing examples until a pattern emerges — or by adding metadata that represents the suspected cause (e.g., vegetation near highways).

|                               | **Manual Approach**             | **With Tensorleap**                             |
| ----------------------------- | ------------------------------- | ----------------------------------------------- |
| **How hypotheses are formed** | Manual inspection, intuition    | Metadata correlation and concept-level analysis |
| **Access to examples**        | Requires filtering or scripting | Immediate access to representative samples      |
| **Pattern recognition**       | Visual guessing                 | Supported by metadata and heatmaps              |
| **Consistency and speed**     | Varies, slow                    | Fast, repeatable, and guided                    |

</details>

<details>

<summary>Validating the Hypothesis</summary>

#### **How Practitioners Typically Approach This Step**

*Testing whether a suspected cause is actually responsible for model underperformance*

In many cases, this step is skipped entirely — teams assume the hypothesis is correct, apply a fix, and only then check if it helped. When validation is attempted, it often involves filtering, scripting, or reviewing small sample sets. This process is time-consuming, manual, and rarely scalable.

#### **How Tensorleap Helps You Validate Hypotheses**

*Test your assumptions efficiently using metadata, concept samples, and population filters*

Tensorleap enables quick, structured validation through:

* **Metadata correlation**: If custom fields are available (e.g. “vegetation near highways”), a user can verify whether they align with the current model performance issue.

<figure><img src="https://3509361326-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F9UXeOlFqlw8pl79U2HGU%2Fuploads%2FdGg5EQtDuS4WxQjDa7w1%2Fimage.png?alt=media&#x26;token=d642186b-3d38-4a75-b434-e4ac82b7c89c" alt=""><figcaption><p>Correlating metadata with model performance. In this example, the more vegetation pixels, the higher the loss (right). Moreover, the amount of vegetation differs between the examined datasets (left).</p></figcaption></figure>

* **Concept sample inspection**: Easily browse many samples within an aggressor to confirm shared patterns or edge cases.
* **Population exploration**: Visualize and filter metadata or performance distributions between the aggressor and the rest of the dataset — helping confirm whether the hypothesis is reflected in the data.

<figure><img src="https://3509361326-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F9UXeOlFqlw8pl79U2HGU%2Fuploads%2FEoOSy0gXGnVJ7q4bcEcO%2Fimage.png?alt=media&#x26;token=d8d92f96-2c98-4647-96d3-7db0db859eda" alt=""><figcaption><p>The detected aggressor exhibits a high amount of vegetation. The left map is colored by vegetation: red pixels have more vegetation than blue ones.</p></figcaption></figure>

|                            | **Manual Approach**                | **With Tensorleap**                             |
| -------------------------- | ---------------------------------- | ----------------------------------------------- |
| **How validation is done** | Filtering, scripting, test subsets | Concept-based metadata and metric exploration   |
| **Confidence in result**   | Often assumed or anecdotal         | Supported by data distributions and visual cues |
| **Effort required**        | High                               | Low                                             |

</details>

<details>

<summary>Solving the Root Cause</summary>

#### **How Practitioners Typically Approach This Step**

*Once a hypothesis is assumed correct, applying the fix can be vague and manual*

In traditional workflows, fixing model weaknesses often comes down to trial-and-error: collecting more data, re-labeling, or engineering new heuristics — without a clear connection to the aggressor’s characteristics. This often results in inefficient use of resources and slow iteration.

#### **How Tensorleap Helps You Act on Aggressors**

*Each detected aggressor is paired with a recommended action to guide resolution*

For every identified aggressor, Tensorleap provides an action suggestion based on its severity, representation, and metadata profile. This can include:

* Removing noisy samples
* Re-labeling subsets
* Augmenting specific classes
* Collecting more examples with similar patterns

One of the most impactful workflows is the ability to retrieve **unlabeled samples** that are semantically similar to the aggressor. These can be prioritized for labeling and added to the training set — targeting the failure mode directly and efficiently.

<figure><img src="https://3509361326-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F9UXeOlFqlw8pl79U2HGU%2Fuploads%2FUiGHmA398WxPtKPBrXZd%2FKapture%202025-07-10%20at%2007.23.04.gif?alt=media&#x26;token=09c129c9-c28f-4f86-bf77-dbe7b58460da" alt=""><figcaption><p>Solving an aggressor by fetching similar samples to label</p></figcaption></figure>

This closes the loop from insight to intervention — and ensures the fix is grounded in the concept that caused the issue.
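Outside the platform, retrieving semantically similar samples usually means nearest-neighbor search in an embedding space. A minimal sketch of that idea, with tiny hypothetical 2-D embeddings (real embeddings are high-dimensional and model-derived; everything here is illustrative):

```python
import numpy as np

def nearest_unlabeled(aggressor_emb, unlabeled_emb, k=2):
    """Return indices of the k unlabeled samples closest (by cosine similarity)
    to the aggressor centroid -- candidates to prioritize for labeling."""
    centroid = aggressor_emb.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    unit = unlabeled_emb / np.linalg.norm(unlabeled_emb, axis=1, keepdims=True)
    sims = unit @ centroid
    return np.argsort(sims)[::-1][:k]

# Hypothetical embeddings: the aggressor cluster sits near (1, 0).
aggressor = np.array([[0.9, 0.1], [1.0, 0.0], [0.95, 0.05]])
unlabeled = np.array([[0.98, 0.02], [0.0, 1.0], [0.9, 0.2], [-1.0, 0.0]])
print(nearest_unlabeled(aggressor, unlabeled))  # samples nearest the cluster
```

The returned indices point at the unlabeled samples most likely to strengthen the training set against this specific failure mode.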

</details>

<details>

<summary>Tracking the Effectiveness of the Solution</summary>

#### **Why This Step Is Often Challenging**

*Fixes are easy to forget — and failure modes can silently return*

In many workflows, once a model issue is addressed, there's no structured way to ensure it stays fixed. As new models are trained and deployed, teams often lose track of past aggressors — and without consistent monitoring, the same failure modes can quietly resurface.

This creates a gap in deep learning development: there's no equivalent of continuous testing to guard against regressions. What CI does for software, Tensorleap brings to model reliability.

#### **How Tensorleap Helps You Track Improvements**

*Prevent regressions and enforce progress across model deployments*

Once an aggressor is identified, Tensorleap allows you to save it as a **test** — a reusable checkpoint that is automatically evaluated with every new model uploaded to the system.

Tensorleap ensures that:

* Fixes are verified consistently with each new deployment
* Regressions are caught early, before rollout

<figure><img src="https://3509361326-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F9UXeOlFqlw8pl79U2HGU%2Fuploads%2F2xBoYhXnaayrq61A82jt%2Fimage.png?alt=media&#x26;token=476babdd-9166-4bea-85ed-834e353d8e81" alt=""><figcaption><p>Tensorleap Testing allows model comparison at the aggressor level to prevent regressions. Here, a new model (orange) was uploaded to the system and tested against known aggressors. The model performed poorly on an aggressor that had previously been solved (scenes with a high amount of vegetation).</p></figcaption></figure>

With concept-based tests, teams can turn reactive debugging into a structured, continuous validation workflow — protecting against repeated mistakes and ensuring long-term improvement.
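Conceptually, such a test works like a unit test in CI: it asserts that a new model's performance on a saved aggressor slice stays within bounds. A minimal sketch of the idea (the slice name, losses, and threshold are hypothetical, not the platform's internal mechanism):

```python
def concept_test(per_sample_loss, threshold):
    """Pass if the mean loss on a saved aggressor slice stays below threshold."""
    mean_loss = sum(per_sample_loss) / len(per_sample_loss)
    return mean_loss <= threshold, mean_loss

# Hypothetical losses of a new model on the "high vegetation" slice.
passed, mean_loss = concept_test([0.21, 0.18, 0.25], threshold=0.30)
assert passed, f"Regression on 'high vegetation' aggressor: mean loss {mean_loss:.2f}"
```

Running such a check on every candidate model turns a one-off fix into a durable guarantee.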

|                                 | **Manual Approach**              | **With Tensorleap**                               |
| ------------------------------- | -------------------------------- | ------------------------------------------------- |
| **How results are measured**    | Global metrics or one-off checks | Concept tests evaluated across model deployments  |
| **Regression detection**        | Often encountered in production  | Automatic — detected before model rollout         |
| **Reusability**                 | Low — repeated effort            | High — one-click reuse of past aggressors         |
| **Confidence in long-term fix** | Unclear                          | High — tracked consistently across model versions |

</details>

<details>

<summary>Putting It All Together: A Full Aggressor Debugging Walkthrough</summary>

Coming Soon

</details>

### **Aggressors Video Tutorial**

{% embed url="https://app.guidde.com/share/playbooks/wd44deRrBVBESEcbGAFcsd?mode=videoOnly&origin=k2buG3CvzZWUzfsWk7HPoOLDKpg2" %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.tensorleap.ai/getting-value-from-tensorleap/agressors-detection.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
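For example, the request can be issued from Python with the standard library. The sketch below only builds the properly encoded URL; the question shown is illustrative, and an actual query would fetch it with `urllib.request.urlopen`:

```python
from urllib.parse import urlencode

BASE = "https://docs.tensorleap.ai/getting-value-from-tensorleap/agressors-detection.md"

def build_ask_url(question: str) -> str:
    """Encode a natural-language question into the `ask` query parameter."""
    return f"{BASE}?{urlencode({'ask': question})}"

url = build_ask_url("How are aggressors ranked in the Insights panel?")
print(url)
# To actually query:  answer = urllib.request.urlopen(url).read().decode()
```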
