MISO: Learning Multiple Initial SOlutions to Optimization Problems

Elad Sharony1,2, Heng Yang2,3, Tong Che2, Marco Pavone2,4, Shie Mannor1,2, Peter Karkus2
1Technion    2NVIDIA Research    3Harvard University    4Stanford University
ICRA 2026

Overview

Many real-world applications—from autonomous driving to robotic manipulation—require solving similar optimization problems sequentially under strict runtime constraints. The performance of local optimizers is highly sensitive to the initial solution: poor initialization can lead to slow convergence or convergence to suboptimal local minima.

We propose MISO (Learning Multiple Initial SOlutions), a framework that trains a single neural network to predict multiple diverse initial solutions for optimization problems. Unlike ensemble methods that require training and running multiple models, MISO uses a single efficient network with diversity-promoting training objectives.

  • Single-optimizer setting: Select the most promising initial solution using a selection function, then run a single optimizer.
  • Multiple-optimizers setting: Initialize multiple parallel optimizers, each with a different predicted solution, and select the best final output (both settings are sketched in code after this list).
  • Guaranteed improvement: By including the default initialization among the predictions, MISO is guaranteed to perform at least as well as the baseline.
  • Efficient inference: Unlike ensembles, MISO's inference time remains nearly constant as $K$ increases.
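As a rough illustration of the two settings, here is a minimal Python sketch. The names `predict_inits`, `J`, and `optimize` are placeholders for the learned predictor, the task objective, and the task-specific local optimizer; they are illustrative only, not part of a released API.

```python
def run_single_optimizer(psi, predict_inits, J, optimize, default_init=None):
    """Single-optimizer setting: select the most promising candidate, then optimize once."""
    candidates = list(predict_inits(psi))              # K predicted initial solutions
    if default_init is not None:                       # keep the default (e.g. warm-start) as well,
        candidates.append(default_init)                # so the result is never worse than the baseline
    best_init = min(candidates, key=lambda x: J(x, psi))   # selection function: argmin of the objective
    return optimize(best_init, psi)


def run_multiple_optimizers(psi, predict_inits, J, optimize):
    """Multiple-optimizers setting: run one optimizer per candidate, return the best result."""
    finals = [optimize(x, psi) for x in predict_inits(psi)]  # independent runs; parallelizable
    return min(finals, key=lambda x: J(x, psi))
```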
Figure 1: Prior work (top) predicts a single initial solution from problem parameters. MISO (middle, bottom) predicts multiple diverse initial solutions using a single neural network. In the single-optimizer setting, a candidate evaluator selects the best initial solution. In the multiple-optimizers setting, all solutions initialize parallel optimizers, with the best final output selected.

Motivation

Real-World Scenario: Autonomous Driving

Consider an autonomous vehicle following a planned trajectory. Suddenly, a pedestrian steps onto the road, causing the high-level planner to abruptly change the reference path. The previous trajectory (warm-start) is now far from optimal—the optimizer must find a completely different solution within milliseconds. Similar abrupt changes occur with traffic light switches, sudden lane closures, or newly detected obstacles.

Traditional initialization approaches fail in these scenarios:

Figure 2: Left: A cost function $c(x)$ with global minima at A and C and a local minimum at B. Right: Predicted initial solutions for different methods. Single-output regression and ensembles converge to the mean (local minimum B), while MISO with winner-takes-all and mixture losses successfully predicts both global optima.
Key Insight: By explicitly training for diversity using winner-takes-all or mixture losses, MISO learns to cover multiple modes of the solution distribution, enabling the optimizer to reach global optima regardless of which mode is active.

Method

Given a parameter vector $\boldsymbol{\psi}$ that defines a problem instance (e.g., current state, goal, obstacles), our multi-output predictor outputs $K$ candidate initial solutions:

$$\{\hat{\boldsymbol{x}}_{k}^{\mathrm{init}}\}_{k=1}^{K} = \boldsymbol{f}(\boldsymbol{\psi}; \boldsymbol{\theta})$$
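As a sketch, the predictor can be a small MLP whose last layer emits all $K$ candidates in one forward pass; the layer sizes below are arbitrary and not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class MultiInitPredictor(nn.Module):
    """Maps problem parameters psi to K candidate initial solutions."""

    def __init__(self, psi_dim: int, sol_dim: int, K: int, hidden: int = 256):
        super().__init__()
        self.K, self.sol_dim = K, sol_dim
        self.net = nn.Sequential(
            nn.Linear(psi_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, K * sol_dim),   # all K candidates from a single forward pass
        )

    def forward(self, psi: torch.Tensor) -> torch.Tensor:
        # psi: (batch, psi_dim) -> candidates: (batch, K, sol_dim)
        return self.net(psi).view(-1, self.K, self.sol_dim)
```

Because all candidates share one backbone, the cost of a forward pass grows only marginally with $K$, which is why the inference time reported below stays nearly constant.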

Training Strategies

To ensure diverse outputs that cover multiple modes, we introduce three training strategies:

Winner-Takes-All (WTA) Loss

Penalize only the prediction closest to the reference solution $\boldsymbol{x}^{\star}$, allowing the remaining outputs to specialize on different modes:

$$\mathcal{L}_{\mathrm{WTA}} = \min_{k} \{\mathcal{L}_{\mathrm{reg}}(\hat{\boldsymbol{x}}_{k}^{\mathrm{init}}, \boldsymbol{x}^{\star})\}$$
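A minimal sketch of the WTA loss, assuming $\mathcal{L}_{\mathrm{reg}}$ is a mean-squared error against the reference solution $\boldsymbol{x}^{\star}$ (the specific regression loss is a design choice, not fixed by the formula):

```python
import torch

def wta_loss(x_hat: torch.Tensor, x_star: torch.Tensor) -> torch.Tensor:
    """Winner-takes-all loss over K candidates.

    x_hat:  (batch, K, sol_dim) predicted initial solutions
    x_star: (batch, sol_dim)    reference solutions
    """
    per_candidate = ((x_hat - x_star.unsqueeze(1)) ** 2).mean(dim=-1)  # (batch, K)
    return per_candidate.min(dim=1).values.mean()  # gradient reaches only the closest candidate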

Pairwise Distance (PD) Loss

Average the regression loss over all $K$ outputs and add a term that encourages dispersion among them:

$$\mathcal{L}_{\mathrm{PD}} = \frac{1}{K} \sum_{k=1}^{K} \mathcal{L}_{\mathrm{reg}}(\hat{\boldsymbol{x}}_{k}^{\mathrm{init}}, \boldsymbol{x}^{\star}) + \alpha_K \frac{1}{K(K-1)} \sum_{k \neq k'} \| \hat{\boldsymbol{x}}_{k}^{\mathrm{init}} - \hat{\boldsymbol{x}}_{k'}^{\mathrm{init}} \|$$
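A corresponding sketch of the PD loss; `alpha_K` plays the role of $\alpha_K$. So that minimizing the loss spreads the candidates apart, the dispersion term is subtracted here (an assumption about the intended sign convention, since the term must be rewarded rather than penalized to encourage diversity):

```python
import torch

def pd_loss(x_hat: torch.Tensor, x_star: torch.Tensor, alpha_K: float) -> torch.Tensor:
    """Regression loss averaged over all K candidates plus a dispersion term.

    x_hat:  (batch, K, sol_dim) predicted initial solutions
    x_star: (batch, sol_dim)    reference solutions
    """
    _, K, _ = x_hat.shape
    reg = ((x_hat - x_star.unsqueeze(1)) ** 2).mean(dim=-1).mean()   # (1/K) sum_k L_reg, averaged over the batch
    pairwise = torch.cdist(x_hat, x_hat)                             # (batch, K, K), zero diagonal
    dispersion = pairwise.sum(dim=(1, 2)) / (K * (K - 1))            # mean over ordered pairs k != k'
    return reg - alpha_K * dispersion.mean()                         # subtract: reward spread (assumed sign)
```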

Mixture Loss

Combine the WTA objective with a per-output dispersion term for finer control over diversity:

$$\mathcal{L}_{\mathrm{MIX}} = \min_{k} \left\{\mathcal{L}_{\mathrm{reg}}(\hat{\boldsymbol{x}}_{k}^{\mathrm{init}}, \boldsymbol{x}^{\star}) + \alpha_K \Phi\left(\mathcal{L}_{\mathrm{PD}, k}\right)\right\}$$

where $\mathcal{L}_{\mathrm{PD},k}$ is the pairwise-distance term associated with output $k$ and $\Phi$ is a transform applied to it.

Selection Function

In the single-optimizer setting, a selection function $\Lambda$ chooses the most promising initial solution. A natural choice is selecting the candidate that minimizes the objective: $\Lambda := \arg\min_{k} J(\hat{\boldsymbol{x}}_{k}^{\mathrm{init}}; \boldsymbol{\psi})$. Alternative criteria include constraint satisfaction, robustness measures, or domain-specific requirements.
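A small sketch of such a selection function. The default follows the argmin-of-objective rule above; the optional `violation` callable (a hypothetical constraint-violation measure, not defined in the text here) illustrates how constraint satisfaction could be folded in:

```python
def select_init(candidates, psi, J, violation=None):
    """Lambda: choose the most promising initial solution among the K candidates.

    candidates: iterable of candidate initial solutions for one problem instance
    J:          callable (candidate, psi) -> scalar objective value
    violation:  optional callable (candidate, psi) -> scalar, 0 when all constraints hold
    """
    if violation is None:
        return min(candidates, key=lambda x: float(J(x, psi)))   # argmin_k J(x_k; psi)
    # Prefer feasible candidates; break ties by objective value.
    return min(candidates, key=lambda x: (float(violation(x, psi)) > 0, float(J(x, psi))))
```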

Experiments

We evaluate MISO across three optimal control benchmark tasks, each using a different local optimization algorithm:

  • Cart-pole, solved with the Box-DDP optimizer
  • Reacher, solved with the MPPI optimizer
  • Driving (nuPlan), solved with the iLQR optimizer

Main Results

Single Optimizer Setting

Table 1: Mean cost (lower is better) for a single optimizer with different initialization methods. Best result per column in bold.

| Method | K | One-off: Reacher | One-off: Cart-pole | One-off: Driving | Sequential: Reacher | Sequential: Cart-pole | Sequential: Driving |
|---|---|---|---|---|---|---|---|
| Warm-start | 1 | 13.48 | 11.69 | 283.86 | 13.48 | 11.69 | 283.86 |
| Regression | 1 | 13.40 | 11.19 | 74.23 | 19.56 | 6.18 | 70.62 |
| Ensemble | 32 | 13.39 | 10.94 | 47.22 | 8.40 | 3.55 | 52.59 |
| MISO WTA | 32 | 13.36 | **10.48** | **30.17** | 2.72 | 0.83 | **30.75** |
| MISO mix | 32 | **12.74** | **10.48** | 33.95 | **2.44** | **0.79** | 33.38 |
Key Results: MISO WTA and MISO mix consistently outperform all baselines. On the challenging driving task, MISO reduces mean cost by 89% compared to warm-start and 42% compared to ensemble methods in sequential optimization.

Scaling with Number of Initial Solutions

Figure 3: Mean cost on the driving task (sequential optimization) for varying $K$: (a) single optimizer, (b) multiple optimizers. MISO scales effectively with the number of predicted initial solutions.

Inference Time Comparison

Figure 4: Mean inference time as $K$ increases. MISO (multi-output) maintains nearly constant inference time, while ensemble inference time grows linearly with $K$.

Mode Frequency Analysis

Figure 5: Heatmap of output selection frequency for MISO WTA on the driving task. All outputs remain active, demonstrating that MISO effectively utilizes all predicted initial solutions.

Qualitative Results

Figure 6: Left: Single optimizer for the driving task. When the high-level planner abruptly modifies the reference path, warm-start and regression fail to adapt, while MISO closely follows the updated reference. Right: MISO's multiple initial solutions for cart-pole capture different modes.

Citation

@article{sharony2024learningmultipleinitialsolutions,
  title={Learning Multiple Initial Solutions to Optimization Problems},
  author={Elad Sharony and Heng Yang and Tong Che and Marco Pavone and Shie Mannor and Peter Karkus},
  journal={arXiv preprint arXiv:2411.02158},
  year={2024},
}

References

  1. Amos, B., et al. (2018). Differentiable MPC for end-to-end planning and control. NeurIPS.
  2. Williams, G., et al. (2015). Model predictive path integral control using covariance variable importance sampling. arXiv:1509.01149.
  3. Li, W., & Todorov, E. (2004). Iterative linear quadratic regulator design for nonlinear biological movement systems. ICINCO.
  4. Caesar, H., et al. (2021). nuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles. CVPR Workshop.
  5. Guzman-Rivera, A., et al. (2012). Multiple choice learning: Learning to produce multiple structured outputs. NeurIPS.