MISO: Learning Multiple Initial SOlutions to Optimization Problems

Elad Sharony1,2, Heng Yang2,3, Tong Che2, Marco Pavone2,4, Shie Mannor1,2, Peter Karkus2
1Technion    2NVIDIA Research    3Harvard University    4Stanford University
ICRA 2026

Overview

Many real-world applications—from autonomous driving to robotic manipulation—require solving similar optimization problems sequentially under strict runtime constraints. The performance of local optimizers is highly sensitive to the initial solution: poor initialization can lead to slow convergence or convergence to suboptimal local minima.

We propose MISO (Learning Multiple Initial SOlutions), a framework that trains a single neural network to predict multiple diverse initial solutions for optimization problems. Unlike ensemble methods that require training and running multiple models, MISO uses a single efficient network with diversity-promoting training objectives.

  • Single-optimizer setting: Select the most promising initial solution using a selection function, then run a single optimizer.
  • Multiple-optimizers setting: Initialize multiple parallel optimizers, each with a different predicted solution, and select the best final output (both settings are sketched in code after this list).
  • Guaranteed improvement: By including the default initialization among the predictions, MISO is guaranteed to perform at least as well as the baseline.
  • Efficient inference: Unlike ensembles, MISO's inference time remains nearly constant as $K$ increases.
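As a rough illustration of the two settings, here is a minimal Python sketch. The names `predict_inits`, `J`, and `optimize` are placeholders for the learned predictor, the task objective, and the task-specific local optimizer; they are illustrative only, not part of a released API.

```python
def run_single_optimizer(psi, predict_inits, J, optimize, default_init=None):
    """Single-optimizer setting: select the most promising candidate, then optimize once."""
    candidates = list(predict_inits(psi))              # K predicted initial solutions
    if default_init is not None:                       # keep the default (e.g. warm-start) as well,
        candidates.append(default_init)                # so the result is never worse than the baseline
    best_init = min(candidates, key=lambda x: J(x, psi))   # selection function: argmin of the objective
    return optimize(best_init, psi)


def run_multiple_optimizers(psi, predict_inits, J, optimize):
    """Multiple-optimizers setting: run one optimizer per candidate, return the best result."""
    finals = [optimize(x, psi) for x in predict_inits(psi)]  # independent runs; parallelizable
    return min(finals, key=lambda x: J(x, psi))
```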
Figure 1: Prior work (top) predicts a single initial solution from problem parameters. MISO (middle, bottom) predicts multiple diverse initial solutions using a single neural network. In the single-optimizer setting, a candidate evaluator selects the best initial solution. In the multiple-optimizers setting, all solutions initialize parallel optimizers, with the best final output selected.

Motivation

Real-World Scenario: Autonomous Driving

Consider an autonomous vehicle following a planned trajectory. Suddenly, a pedestrian steps onto the road, causing the high-level planner to abruptly change the reference path. The previous trajectory (warm-start) is now far from optimal—the optimizer must find a completely different solution within milliseconds. Similar abrupt changes occur with traffic light switches, sudden lane closures, or newly detected obstacles.

Traditional initialization approaches fail in these scenarios:

Figure 2: Left: A cost function $c(x)$ with global minima at A and C and a local minimum at B. Right: Predicted initial solutions for different methods. Single-output regression and ensembles converge to the mean (local minimum B), while MISO with winner-takes-all and mixture losses successfully predicts both global optima.
Key Insight: By explicitly training for diversity using winner-takes-all or mixture losses, MISO learns to cover multiple modes of the solution distribution, enabling the optimizer to reach global optima regardless of which mode is active.

Method

Given a parameter vector $\boldsymbol{\psi}$ that defines a problem instance (e.g., current state, goal, obstacles), our multi-output predictor outputs $K$ candidate initial solutions:

$$\{\hat{\boldsymbol{x}}_{k}^{\mathrm{init}}\}_{k=1}^{K} = \boldsymbol{f}(\boldsymbol{\psi}; \boldsymbol{\theta})$$
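As a sketch, the predictor can be a small MLP whose last layer emits all $K$ candidates in one forward pass; the layer sizes below are arbitrary and not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class MultiInitPredictor(nn.Module):
    """Maps problem parameters psi to K candidate initial solutions."""

    def __init__(self, psi_dim: int, sol_dim: int, K: int, hidden: int = 256):
        super().__init__()
        self.K, self.sol_dim = K, sol_dim
        self.net = nn.Sequential(
            nn.Linear(psi_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, K * sol_dim),   # all K candidates from a single forward pass
        )

    def forward(self, psi: torch.Tensor) -> torch.Tensor:
        # psi: (batch, psi_dim) -> candidates: (batch, K, sol_dim)
        return self.net(psi).view(-1, self.K, self.sol_dim)
```

Because all candidates share one backbone, the cost of a forward pass grows only marginally with $K$, which is why the inference time reported below stays nearly constant.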

Training Strategies

To ensure diverse outputs that cover multiple modes, we introduce three training strategies:

Winner-Takes-All (WTA) Loss

Penalize only the prediction closest to the reference solution $\boldsymbol{x}^{\star}$, allowing the remaining outputs to specialize on different modes:

$$\mathcal{L}_{\mathrm{WTA}} = \min_{k} \{\mathcal{L}_{\mathrm{reg}}(\hat{\boldsymbol{x}}_{k}^{\mathrm{init}}, \boldsymbol{x}^{\star})\}$$
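A minimal sketch of the WTA loss, assuming $\mathcal{L}_{\mathrm{reg}}$ is a mean-squared error against the reference solution $\boldsymbol{x}^{\star}$ (the specific regression loss is a design choice, not fixed by the formula):

```python
import torch

def wta_loss(x_hat: torch.Tensor, x_star: torch.Tensor) -> torch.Tensor:
    """Winner-takes-all loss over K candidates.

    x_hat:  (batch, K, sol_dim) predicted initial solutions
    x_star: (batch, sol_dim)    reference solutions
    """
    per_candidate = ((x_hat - x_star.unsqueeze(1)) ** 2).mean(dim=-1)  # (batch, K)
    return per_candidate.min(dim=1).values.mean()  # gradient reaches only the closest candidate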

Pairwise Distance (PD) Loss

Average the regression loss over all $K$ outputs and add a term that encourages dispersion among them:

$$\mathcal{L}_{\mathrm{PD}} = \frac{1}{K} \sum_{k=1}^{K} \mathcal{L}_{\mathrm{reg}}(\hat{\boldsymbol{x}}_{k}^{\mathrm{init}}, \boldsymbol{x}^{\star}) + \alpha_K \frac{1}{K(K-1)} \sum_{k \neq k'} \| \hat{\boldsymbol{x}}_{k}^{\mathrm{init}} - \hat{\boldsymbol{x}}_{k'}^{\mathrm{init}} \|$$
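A corresponding sketch of the PD loss; `alpha_K` plays the role of $\alpha_K$. So that minimizing the loss spreads the candidates apart, the dispersion term is subtracted here (an assumption about the intended sign convention, since the term must be rewarded rather than penalized to encourage diversity):

```python
import torch

def pd_loss(x_hat: torch.Tensor, x_star: torch.Tensor, alpha_K: float) -> torch.Tensor:
    """Regression loss averaged over all K candidates plus a dispersion term.

    x_hat:  (batch, K, sol_dim) predicted initial solutions
    x_star: (batch, sol_dim)    reference solutions
    """
    _, K, _ = x_hat.shape
    reg = ((x_hat - x_star.unsqueeze(1)) ** 2).mean(dim=-1).mean()   # (1/K) sum_k L_reg, averaged over the batch
    pairwise = torch.cdist(x_hat, x_hat)                             # (batch, K, K), zero diagonal
    dispersion = pairwise.sum(dim=(1, 2)) / (K * (K - 1))            # mean over ordered pairs k != k'
    return reg - alpha_K * dispersion.mean()                         # subtract: reward spread (assumed sign)
```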

Mixture Loss

Combine the WTA objective with a per-output dispersion term for finer control over diversity:

$$\mathcal{L}_{\mathrm{MIX}} = \min_{k} \left\{\mathcal{L}_{\mathrm{reg}}(\hat{\boldsymbol{x}}_{k}^{\mathrm{init}}, \boldsymbol{x}^{\star}) + \alpha_K \Phi\left(\mathcal{L}_{\mathrm{PD}, k}\right)\right\}$$

where $\mathcal{L}_{\mathrm{PD},k}$ is the pairwise-distance term associated with output $k$ and $\Phi$ is a transform applied to it.

Selection Function

In the single-optimizer setting, a selection function $\Lambda$ chooses the most promising initial solution. A natural choice is selecting the candidate that minimizes the objective: $\Lambda := \arg\min_{k} J(\hat{\boldsymbol{x}}_{k}^{\mathrm{init}}; \boldsymbol{\psi})$. Alternative criteria include constraint satisfaction, robustness measures, or domain-specific requirements.
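A small sketch of such a selection function. The default follows the argmin-of-objective rule above; the optional `violation` callable (a hypothetical constraint-violation measure, not defined in the text here) illustrates how constraint satisfaction could be folded in:

```python
def select_init(candidates, psi, J, violation=None):
    """Lambda: choose the most promising initial solution among the K candidates.

    candidates: iterable of candidate initial solutions for one problem instance
    J:          callable (candidate, psi) -> scalar objective value
    violation:  optional callable (candidate, psi) -> scalar, 0 when all constraints hold
    """
    if violation is None:
        return min(candidates, key=lambda x: float(J(x, psi)))   # argmin_k J(x_k; psi)
    # Prefer feasible candidates; break ties by objective value.
    return min(candidates, key=lambda x: (float(violation(x, psi)) > 0, float(J(x, psi))))
```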

Experiments

We evaluate MISO across three optimal control benchmark tasks, each using a different local optimization algorithm:

  • Cart-pole, solved with the Box-DDP optimizer
  • Reacher, solved with the MPPI optimizer
  • Driving (nuPlan), solved with the iLQR optimizer

Main Results

Single Optimizer Setting

Table 1: Mean cost (lower is better) for a single optimizer with different initialization methods. Best result per column in bold.

| Method | K | One-off: Reacher | One-off: Cart-pole | One-off: Driving | Sequential: Reacher | Sequential: Cart-pole | Sequential: Driving |
|---|---|---|---|---|---|---|---|
| Warm-start | 1 | 13.48 | 11.69 | 283.86 | 13.48 | 11.69 | 283.86 |
| Regression | 1 | 13.40 | 11.19 | 74.23 | 19.56 | 6.18 | 70.62 |
| Ensemble | 32 | 13.39 | 10.94 | 47.22 | 8.40 | 3.55 | 52.59 |
| MISO WTA | 32 | 13.36 | **10.48** | **30.17** | 2.72 | 0.83 | **30.75** |
| MISO mix | 32 | **12.74** | **10.48** | 33.95 | **2.44** | **0.79** | 33.38 |
Key Results: MISO WTA and MISO mix consistently outperform all baselines. On the challenging driving task, MISO reduces mean cost by 89% compared to warm-start and 42% compared to ensemble methods in sequential optimization.

Scaling with Number of Initial Solutions

Figure 3: Mean cost on the driving task (sequential optimization) for varying $K$: (a) single optimizer, (b) multiple optimizers. MISO scales effectively with the number of predicted initial solutions.

Inference Time Comparison

Figure 4: Mean inference time as $K$ increases. MISO (multi-output) maintains nearly constant inference time, while ensemble inference time grows linearly with $K$.

Mode Frequency Analysis

Figure 5: Heatmap of output selection frequency for MISO WTA on the driving task. All outputs remain active, demonstrating that MISO effectively utilizes all predicted initial solutions.

Qualitative Results

Figure 6: Left: Single optimizer for the driving task. When the high-level planner abruptly modifies the reference path, warm-start and regression fail to adapt, while MISO closely follows the updated reference. Right: MISO's multiple initial solutions for cart-pole capture different modes.

Citation

@article{sharony2024learningmultipleinitialsolutions,
  title={Learning Multiple Initial Solutions to Optimization Problems},
  author={Elad Sharony and Heng Yang and Tong Che and Marco Pavone and Shie Mannor and Peter Karkus},
  journal={arXiv preprint arXiv:2411.02158},
  year={2024},
}

References

  1. Amos, B., et al. (2018). Differentiable MPC for end-to-end planning and control. NeurIPS.
  2. Williams, G., et al. (2015). Model predictive path integral control using covariance variable importance sampling. arXiv:1509.01149.
  3. Li, W., & Todorov, E. (2004). Iterative linear quadratic regulator design for nonlinear biological movement systems. ICINCO.
  4. Caesar, H., et al. (2021). nuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles. CVPR Workshop.
  5. Guzman-Rivera, A., et al. (2012). Multiple choice learning: Learning to produce multiple structured outputs. NeurIPS.