Anming Gu
I'm a recent graduate of Boston University and an incoming PhD student. I am currently working with Prof. Edward Chien and Kristjan Greenewald on optimal transport for machine learning.
My graduate coursework includes:
- Mathematics: Functional Analysis, Stochastic Calculus, Mathematics of Deep Learning, PDEs, Stochastic PDEs
- Computer Science: Complexity Theory, Mathematical Methods for Theoretical Computer Science
Teaching experience:
- Algorithmic Data Mining, S25
- Analysis of Algorithms, S22, F24, S25
- Algebraic Algorithms, F24
- Theory of Computation, S24
- Concepts of Programming Languages, F23
CV / Google Scholar / Github
Research
I'm interested in optimal transport, optimization/sampling, robust statistics, and differential privacy. I'm also more broadly interested in problems at the intersection of probability, theoretical computer science, and machine learning.
General research directions that seem interesting to me:
- Applications of sampling: diffusion, functional inequalities, spin glasses, and stochastic localization
- Interplay between differential privacy and robust statistics
- Langevin dynamics has been shown to be connected to sampling from the exponential mechanism in differential privacy. Are there any implications of mean-field Langevin dynamics for differential privacy?
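To make the connection above concrete, here is a minimal sketch with generic symbols (u is a score function with sensitivity Δ and ε is the privacy parameter; these are placeholders for illustration, not tied to any specific paper): the exponential mechanism samples from a Gibbs density, which is exactly the stationary law of an overdamped Langevin diffusion.

```latex
% Exponential mechanism target for a score u(D,\theta) with sensitivity \Delta:
\pi_D(\theta) \;\propto\; \exp\!\Big(\tfrac{\varepsilon}{2\Delta}\, u(D,\theta)\Big),
\qquad
% Langevin dynamics whose stationary distribution is \pi_D:
d\theta_t \;=\; \tfrac{\varepsilon}{2\Delta}\,\nabla_\theta u(D,\theta_t)\, dt \;+\; \sqrt{2}\, dB_t .
```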
(α-β) denotes alphabetical order, * denotes equal contribution, and ‡ denotes student advising
Mirror Mean-Field Langevin Dynamics
In preparation.
link to come
The mean-field Langevin dynamics minimizes an entropy-regularized nonlinear convex functional over Wasserstein space, and it has gained attention recently due to its connection to noisy gradient descent for mean-field two-layer neural networks. We extend the analysis of mean-field Langevin dynamics to the mirror setting, where the optimization is constrained to a convex subset of Euclidean space.
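For context, here is the standard unconstrained formulation written with generic symbols (E is the convex energy, λ the entropic regularization strength); the mirror-map construction for the constrained case is what the paper develops.

```latex
% Entropy-regularized objective over probability measures \mu on \mathbb{R}^d:
F(\mu) \;=\; E(\mu) \;+\; \lambda \int \log\frac{d\mu}{dx}\, d\mu,
\qquad
% Mean-field Langevin dynamics, with \mu_t = \mathrm{Law}(X_t):
dX_t \;=\; -\nabla_x \frac{\delta E}{\delta \mu}(\mu_t)(X_t)\, dt \;+\; \sqrt{2\lambda}\, dB_t .
```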
Differentially Private Wasserstein Barycenters
(α-β) Mark Bun, Edward Chien, Kristjan Greenewald, Anming Gu, Sasidhar Kunapuli‡
In preparation.
link to come / code to come
A Wasserstein barycenter is the mean of a set of probability measures under the optimal transport metric, with numerous applications in machine learning, statistics, and computer graphics. In applications, the input measures are often empirical distributions built from datasets, so the output barycenter should be privatized when those datasets contain sensitive records. We provide the first differentially private algorithms for approximately computing Wasserstein barycenters of empirical distributions.
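As a reminder of the object being privatized, here is the standard definition (the weights w_i are a modeling choice; the privacy mechanism itself is described in the paper):

```latex
% Wasserstein barycenter of \mu_1, \dots, \mu_n with weights w_i \ge 0, \sum_i w_i = 1:
\bar{\mu} \;\in\; \operatorname*{arg\,min}_{\nu \in \mathcal{P}_2(\mathbb{R}^d)}
\;\sum_{i=1}^{n} w_i\, W_2^2(\nu, \mu_i).
```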
Compute-Optimal LLMs Provably Generalize Better with Scale
Marc Anton Finzi, Sanyam Kapoor, Diego Granziol, Anming Gu, Christopher De Sa, J Zico Kolter, Andrew Gordon Wilson
International Conference on Learning Representations, 2025.
openreview
Why do larger language models generalize better? To address this question, we develop generalization bounds on the LLM pretraining objective in the compute-optimal regime. We prove a novel, fully empirical Freedman-type martingale concentration inequality that tightens existing bounds by accounting for the low loss variance. This variance decreases for larger models, so our generalization bounds can actually become tighter as models grow. We pair these findings with an analysis of the theoretically achievable quantization bitrates, based on the Hessian of the loss, which controls the other component of the bound on the generalization gap. With these results, we move towards a more complete understanding of why LLMs generalize.
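For reference, the classical Freedman inequality that the paper's fully empirical variant refines, stated in its textbook form (the constants and conditions here are the standard ones, not the paper's):

```latex
% Martingale difference sequence (X_i) with X_i \le b a.s. and predictable variance
% V_n = \sum_{i=1}^{n} \mathbb{E}[X_i^2 \mid \mathcal{F}_{i-1}]:
\Pr\!\Big( \textstyle\sum_{i=1}^{n} X_i \ge t \;\text{ and }\; V_n \le \sigma^2 \Big)
\;\le\; \exp\!\Big( -\frac{t^2}{2(\sigma^2 + b t / 3)} \Big).
```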
Partially Observed Trajectory Inference using Optimal Transport and a Dynamics Prior
Anming Gu, Edward Chien, Kristjan Greenewald
International Conference on Learning Representations, 2025.
Preliminary version in OPT Workshop on Optimization for Machine Learning, 2024. [link]
arXiv / code / thesis slides / poster
Trajectory inference is the problem of recovering a stochastic process from its temporal marginals. We consider the setting where we cannot observe the process directly but have access to a known velocity field. Using tools from optimal transport, stochastic calculus, and optimization theory, we show that a minimum entropy estimator recovers the latent trajectory of the process, and we provide theoretical guarantees that the estimator converges to the ground truth as the observations become dense in the time domain. We also provide empirical results demonstrating the robustness of our method.
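Schematically, the estimator has the following flavor (the fitting functional Fit, the observed marginals, and the reference measure W^τ induced by the prior dynamics are placeholder symbols; see the paper for the exact objective and observation model):

```latex
% Minimum-entropy estimator over path measures R, given marginal observations
% \hat{\rho}_{t_1}, \dots, \hat{\rho}_{t_N} and a reference measure W^{\tau} built from the known dynamics:
\min_{R} \;\sum_{i=1}^{N} \mathrm{Fit}\big(\hat{\rho}_{t_i},\, (e_{t_i})_{\#} R\big)
\;+\; \lambda\, H\big(R \,\|\, W^{\tau}\big).
```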
k-Mixup Regularization for Deep Learning via Optimal Transport
Kristjan Greenewald, Anming Gu, Mikhail Yurochkin, Justin Solomon, Edward Chien
Transactions on Machine Learning Research, 2023.
arXiv / code
Mixup is a regularization technique for training neural networks that perturbs each training input in the direction of another, randomly chosen training input. We propose a new variant of mixup that uses optimal transport to perturb training data in the direction of similar training data rather than arbitrary ones. We show theoretically and experimentally that our method is more effective than mixup at improving generalization performance.
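A simplified sketch of the idea, not the paper's official implementation (the batch size k, the Beta parameter alpha, and the squared-Euclidean cost are illustrative choices): match two random k-sample batches with an optimal assignment, then interpolate matched pairs as in standard mixup.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def k_mixup_batch(x1, y1, x2, y2, alpha=1.0):
    """Illustrative k-mixup step (simplified sketch).

    x1, x2: arrays of shape (k, d), two randomly drawn k-sample batches.
    y1, y2: arrays of shape (k, c), one-hot labels for each batch.
    Matches the batches under a squared-Euclidean optimal assignment,
    then convexly combines the matched pairs as in standard mixup.
    """
    # Squared-Euclidean cost between the two k-sample batches.
    cost = ((x1[:, None, :] - x2[None, :, :]) ** 2).sum(-1)
    # Both batches are uniform k-point empirical measures, so the optimal
    # transport plan is a permutation and a linear assignment solver suffices.
    rows, cols = linear_sum_assignment(cost)
    lam = np.random.beta(alpha, alpha)
    x_mix = lam * x1[rows] + (1 - lam) * x2[cols]
    y_mix = lam * y1[rows] + (1 - lam) * y2[cols]
    return x_mix, y_mix


# Example usage with random data (k=8 samples, d=32 features, c=10 classes).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1, x2 = rng.normal(size=(8, 32)), rng.normal(size=(8, 32))
    y1 = np.eye(10)[rng.integers(0, 10, 8)]
    y2 = np.eye(10)[rng.integers(0, 10, 8)]
    x_mix, y_mix = k_mixup_batch(x1, y1, x2, y2)
    print(x_mix.shape, y_mix.shape)
```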