Hao Wu

Machine Learning Research Engineer at Verses.AI

Welcome to my website! I am a researcher in Machine Learning, with expertise in Deep Generative Modeling, Amortized Variational Inference, and Probabilistic Programming. Broadly, I am interested in developing machine learning systems that generalize from sparse data, perform data-efficient inference, and uncover structure in data. Check out my Research Statement for details.

I received my Ph.D. in Computer Science from Northeastern University, where I was advised by Prof. Jan-Willem van de Meent. Before that, I earned an M.Sc. in Computer Science from the University of Virginia and an M.Sc. in Applied Mathematics from the University of Washington.

News

Aug 2023 : I successfully defended my PhD! I will be joining Verses.AI as a Machine Learning Research Engineer soon. Looking forward to the next journey!

May 2022 : I am excited to be joining Google Research for an internship, working on variational inference for deep switching dynamical systems!

Selected Publications

Nested Variational Inference
Conference on Neural Information Processing Systems (NeurIPS), 2021
Heiko Zimmermann, Hao Wu, Babak Esmaeili, Sam Stites, Jan-Willem van de Meent
We develop nested variational inference (NVI), a family of methods that learn proposals for nested importance samplers by minimizing a forward or reverse KL divergence at each level of nesting. NVI is applicable to many commonly-used importance sampling strategies and provides a mechanism for learning intermediate densities, which can serve as heuristics to guide the sampler. In our experiments we observe that optimizing nested objectives leads to improved sample quality in terms of log average weight and effective sample size. [Paper]
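As a rough sketch of the per-level objective (the notation below is illustrative, not the paper's exact formulation): at each level of nesting, a proposal q_k is trained toward an intermediate target \pi_k by minimizing either a forward or a reverse KL divergence.

% Illustrative per-level NVI objective; \pi_k and q_k are assumed names
\[
\mathcal{L}_k^{\text{fwd}} = \mathrm{KL}\big(\pi_k \,\|\, q_k\big)
\qquad \text{or} \qquad
\mathcal{L}_k^{\text{rev}} = \mathrm{KL}\big(q_k \,\|\, \pi_k\big).
\]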
Learning Proposals for Probabilistic Programs with Inference Combinators
Uncertainty in Artificial Intelligence (UAI), 2021
Sam Stites*, Heiko Zimmermann*, Hao Wu, Eli Sennesh, Jan-Willem van de Meent
We develop operators for construction of proposals in probabilistic programs, which we refer to as inference combinators. Inference combinators define a grammar over importance samplers that compose primitive operations such as application of a transition kernel and importance resampling. Proposals in these samplers can be parameterized using neural networks, which in turn can be trained by optimizing variational objectives. The result is a framework for user-programmable variational methods that are correct by construction and can be tailored to specific models. [Paper]
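For intuition, the composed samplers remain properly weighted: each sample (program trace) carries an importance weight equal to the ratio of an unnormalized target density to the proposal density, so expectations under the target can be estimated from weighted samples. A minimal sketch, with illustrative density names \gamma (unnormalized target, normalizer Z) and q (proposal):

% Properly weighted importance sampling over traces \tau (names are illustrative)
\[
w(\tau) = \frac{\gamma(\tau)}{q(\tau)},
\qquad
\mathbb{E}_{q}\big[w(\tau)\, f(\tau)\big] = Z\, \mathbb{E}_{\pi}\big[f(\tau)\big],
\quad \pi = \gamma / Z.
\]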
Conjugate Energy-Based Models
International Conference on Machine Learning (ICML), 2021
Hao Wu*, Babak Esmaeili*, Michael Wick, Jean-Baptiste Tristan, Jan-Willem van de Meent
We propose conjugate energy-based models (CEBMs), a class of deep latent-variable models with a tractable posterior. CEBMs have use cases similar to those of variational autoencoders, in the sense that they learn an unsupervised mapping between data and latent variables. However, these models omit a generator network, which allows them to learn more flexible notions of similarity between data points. Our experiments demonstrate that CEBMs achieve competitive results in terms of image modelling, predictive power of the latent space, and out-of-distribution detection on a variety of datasets. [Paper]
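As a hedged sketch of where the tractable posterior comes from (notation assumed, not the paper's exact parameterization): a neural network maps data to natural parameters that pair with the sufficient statistics of an exponential-family prior, so the posterior over latent variables stays in the same family and is available in closed form.

% Illustrative conjugate energy-based model; \eta_\theta and t are assumed names
\[
p_\theta(x, z) \propto \exp\!\big(\eta_\theta(x)^{\top} t(z)\big)\, p(z),
\qquad
p_\theta(z \mid x) \propto \exp\!\big(\eta_\theta(x)^{\top} t(z)\big)\, p(z).
\]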
Amortized Population Gibbs Samplers with Neural Sufficient Statistics
International Conference on Machine Learning (ICML), 2020
Hao Wu, Heiko Zimmermann, Eli Sennesh, Tuan Anh Le, Jan-Willem van de Meent
We develop amortized population Gibbs (APG) samplers, a class of scalable methods that frame structured variational inference as adaptive importance sampling. APG samplers construct high-dimensional proposals by iterating over updates to lower-dimensional blocks of variables. We train each conditional proposal by minimizing the inclusive KL divergence with respect to the conditional posterior. To appropriately account for the size of the input data, we develop a new parameterization in terms of neural sufficient statistics. Experiments show that APG samplers can be used to train highly-structured deep generative models in an unsupervised manner. [Paper]
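A minimal sketch of the per-block training objective (notation assumed): for a block of latent variables z_b, with the remaining blocks z_{-b} held fixed, the conditional proposal q_\phi is trained by minimizing the inclusive KL divergence to the conditional posterior.

% Illustrative per-block APG objective; z_b, z_{-b}, and q_\phi are assumed names
\[
\mathcal{L}_b(\phi) = \mathbb{E}\Big[\mathrm{KL}\big(p(z_b \mid z_{-b}, x) \,\|\, q_\phi(z_b \mid z_{-b}, x)\big)\Big].
\]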
Structured Disentangled Representations
Artificial Intelligence and Statistics (AISTATS), 2019
Babak Esmaeili, Hao Wu, Sarthak Jain, Alican Bozkurt, N. Siddharth, Brooks Paige, Dana H. Brooks, Jennifer Dy, Jan-Willem van de Meent
Deep latent-variable models learn representations of high-dimensional data in an unsupervised manner. A number of recent efforts have focused on learning representations that disentangle statistically independent axes of variation by introducing modifications to the standard objective function. These approaches generally assume a simple diagonal Gaussian prior and, as a result, are not able to reliably disentangle factors of variation. We propose a two-level hierarchical objective to control the relative degree of statistical independence between blocks of variables and individual variables within blocks. [Paper]
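For intuition, a sketch of the kind of two-level decomposition involved (notation assumed, and assuming a fully factorized prior): grouping the latent variables into blocks z = (z_1, ..., z_G) splits the aggregate KL term into a between-block total correlation, within-block total correlations, and dimension-wise KL terms, which can then be weighted separately.

% Illustrative decomposition of the aggregate KL; block index g and dimension index d are assumed names
\[
\mathrm{KL}\big(q(z)\,\|\,p(z)\big)
= \mathrm{KL}\Big(q(z)\,\Big\|\,\prod_{g} q(z_g)\Big)
+ \sum_{g} \mathrm{KL}\Big(q(z_g)\,\Big\|\,\prod_{d} q(z_{g,d})\Big)
+ \sum_{g,d} \mathrm{KL}\big(q(z_{g,d})\,\|\,p(z_{g,d})\big).
\]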