Title: Learning with Importance Weighted Variational Inference
Joint work with François Roueff
Abstract: Several variational bounds involving importance weighting ideas have been proposed to generalize and improve on the Evidence Lower BOund (ELBO) in the context of maximum likelihood optimization, such as the Importance Weighted Auto-Encoder (IWAE), Variational Rényi (VR) and VR-IWAE bounds. Learning the parameters of interest using these bounds typically involves stochastic gradient-based variational inference procedures. Yet, it remains unclear how the joint choice of bound and gradient estimator impacts the behavior of the resulting algorithms.
In this talk, we study reparameterized and doubly-reparameterized gradient estimators tied to the IWAE, VR and VR-IWAE bounds. Our asymptotic analyses provide a unified comparison of these estimators under mild assumptions, allowing us to identify their respective strengths. Additional asymptotic analyses reveal a new perspective on challenging regimes in which the variational approximation deteriorates: even in such settings, importance-weighted gradient estimators can still be used to learn the parameters of interest. Consequently, our work motivates further exploration of importance weighting as a principle for designing and analyzing variational inference algorithms. In addition, our proof techniques establish general theoretical tools that apply to importance weighting more broadly and are of independent interest. We complement our theoretical contributions with experiments illustrating our findings.
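As an illustration of the objects studied in the talk, here is a minimal sketch (not the speaker's implementation) of the IWAE bound with a reparameterized diagonal-Gaussian variational family; the function log_joint, standing in for log p_theta(x, z), and the sample count K are illustrative assumptions:

import math
import torch

def iwae_bound(log_joint, mu, log_sigma, K=16):
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I),
    # so gradients flow back through mu and log_sigma.
    sigma = log_sigma.exp()
    eps = torch.randn(K, *mu.shape)
    z = mu + sigma * eps
    # log q(z | x) under the diagonal-Gaussian proposal.
    log_q = torch.distributions.Normal(mu, sigma).log_prob(z).sum(-1)
    # Importance log-weights: log w_k = log p(x, z_k) - log q(z_k | x).
    log_w = log_joint(z) - log_q
    # IWAE objective: log of the average of the K importance weights;
    # K = 1 recovers the ELBO, and the bound tightens as K grows.
    return torch.logsumexp(log_w, dim=0) - math.log(K)

The VR-IWAE bound can be obtained from the same log-weights by rescaling, $\frac{1}{1-\alpha}\big(\mathrm{logsumexp}((1-\alpha)\log w) - \log K\big)$, which recovers the IWAE bound as $\alpha \to 0$; calling .backward() on the returned value yields the reparameterized gradient estimator.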
10H50-11H40: Jimmy Olsson
11H40-12H30: Julien Stoehr
14H00-14H50: François Portier
Title: Stochastic mirror descent for nonparametric adaptive importance sampling
Joint work with Pascal Bianchi, Bernard Delyon and Victor Priser
Abstract: This paper addresses the problem of approximating an unknown probability distribution with density $f$, which can only be evaluated up to an unknown scaling factor, with the help of a sequential algorithm that produces at each iteration $n \geq 1$ an estimated density $q_n$. The proposed method optimizes the Kullback-Leibler divergence using a mirror descent (MD) algorithm directly on the space of density functions, while a stochastic approximation technique helps to manage the trade-off between algorithm complexity and variability. One of the key innovations of this work is the theoretical guarantee provided for an algorithm with a fixed MD learning rate $\eta \in (0,1)$. The main result is that the sequence $q_n$ converges almost surely to the target density $f$ uniformly on compact sets. Through numerical experiments, we show that fixing the learning rate $\eta \in (0,1)$ significantly improves the algorithm's performance, particularly for multi-modal target distributions, where a small value of $\eta$ increases the chance of finding all modes. Additionally, we propose a particle subsampling method to enhance computational efficiency and compare our method against other approaches.
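To make the update concrete, here is a minimal sketch of one step, under the assumption that the entropic mirror descent update on densities takes the multiplicative form $q_{n+1} \propto q_n^{1-\eta} f^{\eta}$; the kernel density estimator, helper names and particle count are illustrative, not the authors' implementation:

import numpy as np
from scipy.stats import gaussian_kde

def md_step(log_f_unnorm, sample_qn, log_qn, eta=0.5, n_particles=1000):
    # Draw particles from the current approximation q_n.
    x = sample_qn(n_particles)            # assumed shape (n_particles, d)
    # Entropic MD tilt: q_{n+1} proportional to q_n^{1-eta} * f^eta, so the
    # importance weight w = q_{n+1}/q_n is proportional to (f/q_n)^eta.
    log_w = eta * (log_f_unnorm(x) - log_qn(x))
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                          # self-normalization cancels the unknown scale of f
    # Resample according to w and refit a nonparametric (kernel) estimate of q_{n+1}.
    idx = np.random.choice(n_particles, size=n_particles, p=w)
    return gaussian_kde(x[idx].T)         # illustrative KDE stand-in for q_{n+1}

With a fixed $\eta \in (0,1)$, each step only partially tilts $q_n$ toward $f$, which is consistent with the abstract's observation that a small $\eta$ increases the chance of finding all modes.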
14H50-15H40: Yazid Janati
16H10-16H40: François Bertholom
16H40-17H10: Yvann Le Fay