overview
Applying machine learning to scientific and engineering systems requires reliable uncertainty estimates. Models must quantify uncertainty, detect distribution shift, and support adaptive decision making, yet many uncertainty quantification (UQ) techniques in deep learning remain ad hoc: classical methods often do not scale, and heuristic approaches behave unpredictably in safety-critical environments such as control, robotics, and adaptive sensing.
My work approaches UQ along two complementary fronts:
- theoretical tools for anytime-valid and sequential decision making, and
- practical diagnostics and corrections for empirical UQ methods in large models.
Many scientific and engineering tasks operate in sequential or adaptive settings where fixed-time uncertainty bounds fail. Using ideas from universal coding and gambling, I develop confidence sequences (CSs), which provide time-uniform uncertainty guarantees. These results strengthen classical concentration bounds and support sequential decision-making problems such as A/B testing and contextual bandits.
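Concretely, for a target mean $\mu$ and confidence level $1-\alpha$, a confidence sequence upgrades the familiar fixed-time guarantee to one that holds at every time step simultaneously:

```latex
% Fixed-time confidence interval: valid only at one pre-specified sample size n
\Pr\left( \mu \in C_n \right) \ge 1 - \alpha
% Time-uniform guarantee of a confidence sequence: valid at every t
% simultaneously, so data-dependent (optional) stopping does not break coverage
\Pr\left( \forall t \ge 1 : \mu \in C_t \right) \ge 1 - \alpha
```

The second guarantee is what lets an analyst stop, continue, or adapt based on the data so far without invalidating the interval.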
This line of work demonstrates the power of information-theoretic constructions of confidence sequences (a minimal code sketch of the betting construction follows the list below), including:
- confidence sequences for bounded variables [1],
- confidence sequences for bounded vectors [2], and
- confidence sequences for semi-unbounded variables [3], which have applications in offline bandit problems.
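To make the betting construction concrete, here is a minimal grid-based sketch for [0, 1]-valued observations. It is not the universal-portfolio algorithm from these papers: the function name `betting_cs`, the grid resolution, and the simple clipped bet toward the running mean are all illustrative choices. Validity rests only on the wealth process being a nonnegative martingale under each null, so any predictable bet yields a correct confidence sequence.

```python
import numpy as np

def betting_cs(xs, alpha=0.05, grid_size=200):
    """Grid-based betting confidence sequence for the mean of [0, 1]-valued data.

    For each candidate mean m, a gambler bets against the null "true mean = m":
    the wealth K_t(m) = prod_{i<=t} (1 + lam_i * (x_i - m)) is a nonnegative
    martingale under that null, so by Ville's inequality the set
    {m : K_t(m) < 1/alpha} is a (1 - alpha) confidence set uniformly over time.
    """
    ms = np.linspace(0.0, 1.0, grid_size)   # grid of candidate means
    log_wealth = np.zeros_like(ms)          # log K_t(m), one entry per candidate
    mu_hat, n = 0.5, 1.0                    # running mean drives the bet size
    intervals = []
    for x in xs:
        # Predictable bet toward the running empirical mean, clipped to half of
        # the range that keeps 1 + lam * (x - m) positive for all x in [0, 1].
        lam = np.clip(2.0 * (mu_hat - ms),
                      -0.5 / (1.0 - ms + 1e-12),
                      0.5 / (ms + 1e-12))
        log_wealth += np.log1p(lam * (x - ms))
        inside = log_wealth < np.log(1.0 / alpha)   # candidates not yet rejected
        if inside.any():
            intervals.append((ms[inside].min(), ms[inside].max()))
        else:                                       # all candidates rejected (rare)
            intervals.append((np.nan, np.nan))
        mu_hat = (n * mu_hat + x) / (n + 1.0)       # update only after betting
        n += 1.0
    return intervals

# Example: every interval in the sequence covers the true mean 2/7 ~ 0.286
# simultaneously with probability at least 0.95, and the intervals shrink.
rng = np.random.default_rng(0)
lo, hi = betting_cs(rng.beta(2.0, 5.0, size=2000))[-1]
print(f"CS after 2000 samples: [{lo:.3f}, {hi:.3f}]")
```

The papers above replace this naive bet with universal gambling strategies such as Cover's universal portfolio and develop computationally efficient relaxations; the grid here is purely for exposition.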
diagnosing empirical UQ
There are many proposals for scalable uncertainty quantification in large neural predictive models, yet their statistical meaning and failure modes often remain unclear. In [4], we analyze evidential deep learning (EDL) and show that its uncertainty targets do not correspond to the underlying data distribution, contrary to the original interpretation. This explains its inconsistent empirical behavior and highlights structural issues in commonly used heuristic UQ pipelines. These findings motivate uncertainty quantification methods that remain meaningful and reliable even in large-scale or misspecified regimes.
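For concreteness, the sketch below computes the standard Dirichlet-based scores reported by an EDL classification head, assuming the common parameterization alpha_k = evidence_k + 1; the function name `edl_uncertainties` and the toy evidence values are illustrative, not code or results from [4].

```python
import numpy as np

def edl_uncertainties(evidence):
    """Dirichlet-based uncertainty scores of a standard EDL classification head.

    EDL maps network outputs to nonnegative per-class "evidence" e_k and sets
    Dirichlet parameters alpha_k = e_k + 1. The predictive distribution is the
    Dirichlet mean, and the commonly reported "epistemic" score is the vacuity
    K / S, where S is the total concentration.
    """
    alpha = np.asarray(evidence, dtype=float) + 1.0
    K, S = alpha.size, alpha.sum()
    p = alpha / S                          # predictive (mean) distribution
    vacuity = K / S                        # the usual EDL "epistemic" score
    entropy = -(p * np.log(p)).sum()       # total predictive uncertainty
    return p, vacuity, entropy

# Multiplying the raw evidence shrinks vacuity while the predictive
# distribution barely moves.
for scale in (1.0, 100.0):
    p, vac, ent = edl_uncertainties(scale * np.array([3.0, 1.0, 1.0]))
    print(f"scale={scale:5.0f}  p={np.round(p, 3)}  vacuity={vac:.4f}  entropy={ent:.3f}")
```

Scaling up the raw evidence drives the vacuity score toward zero regardless of whether that evidence reflects the underlying data distribution, a toy view of the disconnect that [4] makes precise.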
broader perspective
These theoretical and empirical perspectives support reliable decision making in systems such as data-driven control, safe autonomy, adaptive sensing, and scientific inference. Ongoing work includes:
- extending confidence sequence techniques to operator-valued settings such as streaming PCA,
- integrating time-uniform UQ with operator learning and generative modeling, and
- designing UQ frameworks that scale to large models with provable reliability guarantees.
These tools aim to provide the foundations for uncertainty-aware scientific and engineering workflows, where reliable decisions depend on trustworthy uncertainty estimates.
references
2025
- [3] Improved Offline Contextual Bandits with Second-Order Bounds: Betting and Freezing. In COLT, June 2025.
  TL;DR: We introduce a betting-based confidence bound for off-policy selection, and a new off-policy learning technique that provides a favorable bias-variance tradeoff.
  Abstract: We consider off-policy selection and learning in contextual bandits, where the learner aims to select or train a reward-maximizing policy using data collected by a fixed behavior policy. Our contribution is two-fold. First, we propose a novel off-policy selection method that leverages a new betting-based confidence bound applied to an inverse propensity weight sequence. Our theoretical analysis reveals that this method achieves a significantly improved, variance-adaptive guarantee over prior work. Second, we propose a novel and generic condition on the optimization objective for off-policy learning that strikes a different balance between bias and variance. One special case, which we call freezing, tends to induce low variance, which is preferred in small-data regimes. Our analysis shows that it matches the best existing guarantees. In our empirical study, our selection method outperforms existing methods, and freezing exhibits improved performance in small-sample regimes.
2024
- [1] On Confidence Sequences for Bounded Random Processes via Universal Gambling Strategies. IEEE Transactions on Information Theory, June 2024.
  TL;DR: We provide a simple two-horse-race perspective for constructing confidence sequences for bounded random processes, demonstrate new properties of the confidence sequence induced by universal portfolio, and propose its computationally efficient relaxations.
  Abstract: This paper considers the problem of constructing a confidence sequence, which is a sequence of confidence intervals that hold uniformly over time, for estimating the mean of bounded real-valued random processes. This paper revisits the gambling-based approach established in the recent literature from a natural *two-horse race* perspective, and demonstrates new properties of the resulting algorithm induced by Cover (1991)'s universal portfolio. The main result of this paper is a new algorithm based on a mixture of lower bounds, which closely approximates the performance of Cover's universal portfolio with constant per-round time complexity. A higher-order generalization of a lower bound on a logarithmic function in (Fan et al., 2015), which is developed as a key technique for the proposed algorithm, may be of independent interest.
- [2] Gambling-Based Confidence Sequences for Bounded Random Vectors. In ICML, July 2024 (Spotlight).
  TL;DR: We propose a new approach to constructing confidence sequences for means of bounded multivariate stochastic processes using a general gambling framework.
  Abstract: A confidence sequence (CS) is a sequence of confidence sets that contains a target parameter of an underlying stochastic process at any time step with high probability. This paper proposes a new approach to constructing CSs for means of bounded multivariate stochastic processes using a general gambling framework, extending the recently established coin toss framework for bounded random processes. The proposed gambling framework provides a general recipe for constructing CSs for categorical and probability-vector-valued observations, as well as for general bounded multidimensional observations through a simple reduction. This paper specifically explores the use of the mixture portfolio, akin to Cover's universal portfolio, in the proposed framework and investigates the properties of the resulting CSs. Simulations demonstrate the tightness of these confidence sequences compared to existing methods. When applied to the sampling-without-replacement setting for finite categorical data, it is shown that the resulting CS based on a universal gambling strategy is provably tighter than that of the posterior-prior ratio martingale proposed by Waudby-Smith and Ramdas.
- [4] Are Uncertainty Quantification Capabilities of Evidential Deep Learning a Mirage? In NeurIPS, December 2024.
  TL;DR: We theoretically characterize the pitfalls of evidential deep learning (EDL) in quantifying predictive uncertainty by unifying various EDL objective functions, and empirically demonstrate their failure modes.
  Abstract: This paper questions the effectiveness of a modern predictive uncertainty quantification approach, called *evidential deep learning* (EDL), in which a single neural network model is trained to learn a meta distribution over the predictive distribution by minimizing a specific objective function. Despite their perceived strong empirical performance on downstream tasks, a line of recent studies by Bengs et al. identifies limitations of the existing methods and concludes that their learned epistemic uncertainties are unreliable, e.g., in that they are non-vanishing even with infinite data. Building on and sharpening such analysis, we 1) provide a sharper understanding of the asymptotic behavior of a wide class of EDL methods by unifying various objective functions; 2) reveal that the EDL methods can be better interpreted as an out-of-distribution detection algorithm based on energy-based models; and 3) conduct extensive ablation studies to better assess their empirical effectiveness with real-world datasets. Through all these analyses, we conclude that even when EDL methods are empirically effective on downstream tasks, this occurs despite their poor uncertainty quantification capabilities. Our investigation suggests that incorporating model uncertainty can help EDL methods faithfully quantify uncertainties and further improve performance on representative downstream tasks, albeit at the cost of additional computational complexity.