Hao Wu

PhD Candidate in Khoury
College of Computer Sciences
Northeastern University

I am a Ph.D. candidate in the Khoury College of Computer Sciences at Northeastern University. My supervisor is Prof. Jan-Willem van de Meent. Previously, I received an M.Sc. in Computer Science from the University of Virginia and an M.Sc. in Applied Mathematics from the University of Washington.

My research focuses on deep generative modeling, amortized inference and probabilistic programming. I have mainly worked on developing scalable inference methods for structured generative models, by which I hope to learn representations that characterize complex data modalities. Check out my Research Statement for details about my research work.


May 2021 : Our paper on Conjugate Energy-Based Models will appear at ICML 2021.

Mar 2021 : I am excited to be joining MIT-IBM Watson AI Lab for an internship in summer 2021, advised by Soumya Ghosh.

Dec 2020 : Two extended abstracts will appear at AABI 2021: one on Conjugate Energy-Based Models and one on Nested Variational Inference.

Selected Publications

Nested Variational Inference
Conference on Neural Information Processing Systems (NeurIPS), 2021
Heiko Zimmermann, Hao Wu, Babak Esmaeili, Sam Stites, Jan-Willem van de Meent
We develop nested variational inference (NVI), a family of methods that learn proposals for nested importance samplers by minimizing a forward or reverse KL divergence at each level of nesting. NVI is applicable to many commonly used importance sampling strategies and provides a mechanism for learning intermediate densities, which can serve as heuristics to guide the sampler. In our experiments we observe that optimizing nested objectives leads to improved sample quality in terms of log average weight and effective sample size. [Paper]
Learning Proposals for Probabilistic Programs with Inference Combinators
Uncertainty in Artificial Intelligence (UAI), 2021
Sam Stites*, Heiko Zimmermann*, Hao Wu, Eli Sennesh, Jan-Willem van de Meent
We develop operators for construction of proposals in probabilistic programs, which we refer to as inference combinators. Inference combinators define a grammar over importance samplers that compose primitive operations such as application of a transition kernel and importance resampling. Proposals in these samplers can be parameterized using neural networks, which in turn can be trained by optimizing variational objectives. The result is a framework for user-programmable variational methods that are correct by construction and can be tailored to specific models. [Paper]
Conjugate Energy-Based Models
International Conference on Machine Learning (ICML), 2021
Hao Wu*, Babak Esmaeili*, Michael Wick, Jean-Baptiste Tristan, Jan-Willem van de Meent
We propose conjugate energy-based models (CEBMs), a class of deep latent-variable models with a tractable posterior. CEBMs have similar use cases to variational autoencoders, in the sense that they learn an unsupervised mapping between data and latent variables. However, these models omit a generator, which allows them to learn more flexible notions of similarity between data points. Our experiments demonstrate that CEBMs achieve competitive results in terms of image modelling, predictive power of latent space, and out-of-distribution detection on a variety of datasets. [Paper]
Amortized Population Gibbs Samplers with Neural Sufficient Statistics
International Conference on Machine Learning (ICML), 2020
Hao Wu, Heiko Zimmermann, Eli Sennesh, Tuan Anh Le, Jan-Willem van de Meent
We develop amortized population Gibbs (APG) samplers, a class of scalable methods that frame structured variational inference as adaptive importance sampling. APG samplers construct high-dimensional proposals by iterating over updates to lower-dimensional blocks of variables. We train each conditional proposal by minimizing the inclusive KL divergence with respect to the conditional posterior. To appropriately account for the size of the input data, we develop a new parameterization in terms of neural sufficient statistics. Experiments show that APG samplers can be used to train highly structured deep generative models in an unsupervised manner. [Paper]
Structured Disentangled Representations
Artificial Intelligence and Statistics (AISTATS), 2019
Babak Esmaeili, Hao Wu, Sarthak Jain, Alican Bozkurt, N. Siddharth, Brooks Paige, Dana H. Brooks, Jennifer Dy, Jan-Willem van de Meent
Deep latent-variable models learn representations of high-dimensional data in an unsupervised manner. A number of recent efforts have focused on learning representations that disentangle statistically independent axes of variation by introducing modifications to the standard objective function. These approaches generally assume a simple diagonal Gaussian prior and as a result are not able to reliably disentangle factors of variation. We propose a two-level hierarchical objective to control the relative degree of statistical independence between blocks of variables and individual variables within blocks. [Paper]