Robustness in Optimal Transport Theory: Building Reliable AI Models | by Chaimae | Apr, 2025


Example 1: Image Classification

Imagine an AI model trained to distinguish cats from dogs using thousands of clear, well-composed photos. Now, what happens when users upload blurry images, unusual angles, or photos with filters? A non-robust model might completely fail, while a robust model maintains reasonable performance despite these variations.

Example 2: Transportation Planning

Consider an AI system optimizing delivery routes based on typical traffic patterns. If unexpected road construction or a major event disrupts traffic, a non-robust system would stick to now-inefficient routes, while a robust system would adapt its recommendations accordingly.

Let’s explore the mathematical concepts behind robustness in optimal transport, breaking down the formulas into intuitive components.

In standard optimal transport, the problem is formulated as:

$$\min_{\gamma \in \Pi(\mu,\nu)} \int_{X \times Y} c(x,y)\, d\gamma(x,y)$$

where:
– μ and ν are probability distributions
– Π(μ, ν) is the set of all joint distributions with marginals μ and ν
– c(x, y) is the cost of transporting mass from x to y
– γ is the transport plan we’re trying to optimize

In plain language: We’re finding the most efficient way to transform distribution μ into distribution ν, where “efficient” means minimizing the total transport cost.
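For distributions supported on finitely many points, the problem above becomes a linear program: the plan γ is a matrix whose rows sum to μ and whose columns sum to ν. The following is a minimal sketch under that assumption, using SciPy's general-purpose `linprog` solver. The helper name `discrete_ot` and the toy data are illustrative and do not come from the article.

```python
import numpy as np
from scipy.optimize import linprog


def discrete_ot(mu, nu, cost):
    """Solve min_gamma sum_ij cost[i, j] * gamma[i, j]
    subject to row sums = mu, column sums = nu, gamma >= 0."""
    mu = np.asarray(mu, dtype=float)
    nu = np.asarray(nu, dtype=float)
    cost = np.asarray(cost, dtype=float)
    n, m = cost.shape

    # gamma is flattened row by row, so entry (i, j) sits at index i * m + j.
    # Row-marginal constraints: sum_j gamma[i, j] = mu[i]
    a_rows = np.kron(np.eye(n), np.ones((1, m)))
    # Column-marginal constraints: sum_i gamma[i, j] = nu[j]
    a_cols = np.kron(np.ones((1, n)), np.eye(m))

    result = linprog(
        cost.ravel(),
        A_eq=np.vstack([a_rows, a_cols]),
        b_eq=np.concatenate([mu, nu]),
        bounds=(0, None),
        method="highs",
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x.reshape(n, m), result.fun


# Toy example: three locations on a line, cost = distance travelled.
points = np.array([0.0, 1.0, 2.0])
cost = np.abs(points[:, None] - points[None, :])
mu = [0.5, 0.5, 0.0]  # mass sits at positions 0 and 1
nu = [0.0, 0.5, 0.5]  # mass should end up at positions 1 and 2

plan, total_cost = discrete_ot(mu, nu, cost)
print(plan)
print(total_cost)
```

In this toy case all of the mass has to move one unit to the right on net, so the minimal total cost is 1. For larger problems a dedicated optimal transport solver is usually a better fit than a generic LP.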

To make our models robust, we need to account for uncertainty. One powerful approach is the adversarial formulation:

$$\min_{\gamma} \max_{\tilde{\mu} \in \mathcal{B}(\mu),\, \tilde{\nu} \in \mathcal{B}(\nu)} \int_{X \times Y} c(x,y)\, d\gamma(x,y)$$

where:
– $\mathcal{B}(\mu)$ represents a “ball” of distributions around μ (all distributions within some distance of μ)
– $\mathcal{B}(\nu)$ is similar for ν

In simpler terms: Instead of finding the optimal transport plan for specific distributions μ and ν, we’re finding a plan that works reasonably well for any distributions that are “close” to μ and ν.

This is like preparing for the worst-case scenario within reasonable bounds of what we expect.
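To make the idea of a “ball” concrete, here is a small sketch that checks whether a candidate distribution lies within a given radius of a reference distribution. It assumes the ball is measured with the 1-Wasserstein distance on one-dimensional samples, which is one common choice rather than the only one. The function name `in_wasserstein_ball`, the radius, and the sample values are all illustrative.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def in_wasserstein_ball(center_samples, candidate_samples, radius):
    """Return True if the candidate's empirical distribution is within
    `radius` of the center's, measured in 1-Wasserstein distance (1D only)."""
    return wasserstein_distance(center_samples, candidate_samples) <= radius


center = np.array([0.0, 1.0, 2.0, 3.0])
slightly_shifted = center + 0.1  # small perturbation of the data
strongly_shifted = center + 1.0  # large perturbation of the data

print(in_wasserstein_ball(center, slightly_shifted, radius=0.25))
print(in_wasserstein_ball(center, strongly_shifted, radius=0.25))
```

A robust method only has to guard against candidates that pass this kind of check, which is what keeps the worst case “within reasonable bounds”.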

The Wasserstein distance (whose p = 1 case is also called the Earth Mover’s Distance) plays a central role in robust optimal transport. For two distributions μ and ν and an exponent p ≥ 1, it’s defined as:

$$W_p(\mu,\nu) = \left( \inf_{\gamma \in \Pi(\mu,\nu)} \int_{X \times Y} d(x,y)^p \, d\gamma(x,y) \right)^{1/p}$$

where d(x,y) is the “ground distance” between points x and y.

This distance has a natural interpretation: for p = 1, it measures the minimum “work” required to transform one distribution into another, where work is the amount of mass multiplied by the distance it travels. Larger values of p penalize long moves more heavily.
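In one dimension the definition simplifies considerably: for two samples of equal size with uniform weights, an optimal plan pairs the sorted points in order. The sketch below relies on that fact and compares the p = 1 result with SciPy's `wasserstein_distance`. The helper name `wasserstein_p_1d` and the sample values are illustrative.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def wasserstein_p_1d(x, y, p=1):
    """W_p between two 1D empirical distributions with the same number of
    equally weighted points: match sorted points in order."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample counts")
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))


x = [0.0, 0.0]
y = [1.0, 3.0]

print(wasserstein_p_1d(x, y, p=1))  # average move: (1 + 3) / 2
print(wasserstein_p_1d(x, y, p=2))  # sqrt((1 + 9) / 2), the long move weighs more
print(wasserstein_distance(x, y))   # SciPy's built-in 1-Wasserstein distance
```

With p = 1 both points contribute in proportion to the distance moved, while p = 2 gives a larger value here because the three-unit move is squared before averaging.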

For robustness, we often use a variant called the robust Wasserstein distance:

$$RW_p(\mu,\nu) = \sup_{\tilde{\mu} \in \mathcal{B}(\mu),\, \tilde{\nu} \in \mathcal{B}(\nu)} W_p(\tilde{\mu}, \tilde{\nu})$$

This is the largest Wasserstein distance between any pair of distributions taken from the “balls” around μ and ν, giving us a conservative estimate that accounts for uncertainty.
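The supremum usually cannot be computed by brute force, but it can be bracketed. The sketch below assumes p = 1, one-dimensional samples, and Wasserstein balls of radius `eps` around both distributions. It searches only over shifted copies of the data (a shift by t moves a distribution exactly |t| away in Wasserstein distance, so these candidates stay inside the balls), which gives a lower bound, and it uses the triangle inequality for an upper bound. The function name `robust_w1_lower_bound` and all values are illustrative.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def robust_w1_lower_bound(x, y, eps, n_grid=11):
    """Lower bound on the robust 1-Wasserstein distance: maximize W_1 over
    translated copies of x and y, with shifts restricted to [-eps, eps]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    shifts = np.linspace(-eps, eps, n_grid)
    return max(
        wasserstein_distance(x + s, y + t) for s in shifts for t in shifts
    )


x = [0.0, 1.0, 3.0]
y = [5.0, 6.0, 8.0]
eps = 0.5

nominal = wasserstein_distance(x, y)
lower = robust_w1_lower_bound(x, y, eps)
# Triangle inequality: W(mu~, nu~) <= W(mu~, mu) + W(mu, nu) + W(nu, nu~)
upper = nominal + 2 * eps

print(nominal, lower, upper)
```

In this example the worst case is reached by pushing the two samples apart as far as the balls allow, so the lower and upper bounds coincide. In general the shift family is only a subset of the ball, and the two bounds can differ.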
