Researcher profile

Navid Azizan

Navid Azizan contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2026arXiv

Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies

Diffusion and flow policies are gaining prominence in online reinforcement learning (RL) due to their expressive power, yet training them efficiently remains a critical challenge. A fundamental difficulty in online RL is the lack of direct samples from the target distribution; instead, the target is an unnormalized Boltzmann distribution defined by the Q-function. To address this, two seemingly distinct families of methods have been proposed for diffusion policies: a noise-expectation family, which utilizes a weighted average of noise as the training target, and a gradient-expectation family, which employs a weighted average of Q-function gradients. Yet, it remains unclear how these objectives relate formally or if they can be synthesized into a more general formulation. In this paper, we propose a unified framework, reverse flow matching (RFM), which rigorously addresses the problem of training diffusion and flow models without direct target samples. By adopting a reverse inferential perspective, we formulate the training target as a posterior mean estimation problem given an intermediate noisy sample. Crucially, we introduce Langevin Stein operators to construct zero-mean control variates, deriving a general class of estimators that effectively reduce importance sampling variance. We show that existing noise-expectation and gradient-expectation methods are two specific instances within this broader class. This unified view yields two key advancements: it extends the capability of targeting Boltzmann distributions from diffusion to flow policies, and enables the principled combination of Q-value and Q-gradient information to derive an optimal, minimum-variance estimator, thereby improving training efficiency and stability. We instantiate RFM to train a flow policy in online RL, and demonstrate improved performance on continuous-control benchmarks compared to diffusion policy baselines.

preprint2022arXiv

A Unified View of SDP-based Neural Network Verification through Completely Positive Programming

Verifying that input-output relationships of a neural network conform to prescribed operational specifications is a key enabler towards deploying these networks in safety-critical applications. Semidefinite programming (SDP)-based approaches to Rectified Linear Unit (ReLU) network verification transcribe this problem into an optimization problem, where the accuracy of any such formulation reflects the level of fidelity in how the neural network computation is represented, as well as the relaxations of intractable constraints. While the literature contains much progress on improving the tightness of SDP formulations while maintaining tractability, comparatively little work has been devoted to the other extreme, i.e., how to most accurately capture the original verification problem before SDP relaxation. In this work, we develop an exact, convex formulation of verification as a completely positive program (CPP), and provide analysis showing that our formulation is minimal -- the removal of any constraint fundamentally misrepresents the neural network computation. We leverage our formulation to provide a unifying view of existing approaches, and give insight into the source of large relaxation gaps observed in some cases.

preprint2022arXiv

Control-oriented meta-learning

Real-time adaptation is imperative to the control of robots operating in complex, dynamic environments. Adaptive control laws can endow even nonlinear systems with good trajectory tracking performance, provided that any uncertain dynamics terms are linearly parameterizable with known nonlinear features. However, it is often difficult to specify such features a priori, such as for aerodynamic disturbances on rotorcraft or interaction forces between a manipulator arm and various objects. In this paper, we turn to data-driven modeling with neural networks to learn, offline from past data, an adaptive controller with an internal parametric model of these nonlinear features. Our key insight is that we can better prepare the controller for deployment with control-oriented meta-learning of features in closed-loop simulation, rather than regression-oriented meta-learning of features to fit input-output data. Specifically, we meta-learn the adaptive controller with closed-loop tracking simulation as the base-learner and the average tracking error as the meta-objective. With both fully-actuated and underactuated nonlinear planar rotorcraft subject to wind, we demonstrate that our adaptive controller outperforms other controllers trained with regression-oriented meta-learning when deployed in closed-loop for trajectory tracking control.

preprint2022arXiv

Explicit Regularization via Regularizer Mirror Descent

Despite perfectly interpolating the training data, deep neural networks (DNNs) can often generalize fairly well, in part due to the "implicit regularization" induced by the learning algorithm. Nonetheless, various forms of regularization, such as "explicit regularization" (via weight decay), are often used to avoid overfitting, especially when the data is corrupted. There are several challenges with explicit regularization, most notably unclear convergence properties. Inspired by convergence properties of stochastic mirror descent (SMD) algorithms, we propose a new method for training DNNs with regularization, called regularizer mirror descent (RMD). In highly overparameterized DNNs, SMD simultaneously interpolates the training data and minimizes a certain potential function of the weights. RMD starts with a standard cost which is the sum of the training loss and a convex regularizer of the weights. Reinterpreting this cost as the potential of an "augmented" overparameterized network and applying SMD yields RMD. As a result, RMD inherits the properties of SMD and provably converges to a point "close" to the minimizer of this cost. RMD is computationally comparable to stochastic gradient descent (SGD) and weight decay, and is parallelizable in the same manner. Our experimental results on training sets with various levels of corruption suggest that the generalization performance of RMD is remarkably robust and significantly better than both SGD and weight decay, which implicitly and explicitly regularize the $\ell_2$ norm of the weights. RMD can also be used to regularize the weights to a desired weight vector, which is particularly relevant for continual learning.

preprint2022arXiv

Online Learning for Traffic Routing under Unknown Preferences

In transportation networks, users typically choose routes in a decentralized and self-interested manner to minimize their individual travel costs, which, in practice, often results in inefficient overall outcomes for society. As a result, there has been a growing interest in designing road tolling schemes to cope with these efficiency losses and steer users toward a system-efficient traffic pattern. However, the efficacy of road tolling schemes often relies on having access to complete information on users' trip attributes, such as their origin-destination (O-D) travel information and their values of time, which may not be available in practice. Motivated by this practical consideration, we propose an online learning approach to set tolls in a traffic network to drive heterogeneous users with different values of time toward a system-efficient traffic pattern. In particular, we develop a simple yet effective algorithm that adjusts tolls at each time period solely based on the observed aggregate flows on the roads of the network without relying on any additional trip attributes of users, thereby preserving user privacy. In the setting where the O-D pairs and values of time of users are drawn i.i.d. at each period, we show that our approach obtains an expected regret and road capacity violation of $O(\sqrt{T})$, where $T$ is the number of periods over which tolls are updated. Our regret guarantee is relative to an offline oracle that has complete information on users' trip attributes. We further establish a $Ω(\sqrt{T})$ lower bound on the regret of any algorithm, which establishes that our algorithm is optimal up to constants. Finally, we demonstrate the superior performance of our approach relative to several benchmarks on a real-world transportation network, thereby highlighting its practical applicability.

preprint2022arXiv

Uncertainty in Contrastive Learning: On the Predictability of Downstream Performance

The superior performance of some of today's state-of-the-art deep learning models is to some extent owed to extensive (self-)supervised contrastive pretraining on large-scale datasets. In contrastive learning, the network is presented with pairs of positive (similar) and negative (dissimilar) datapoints and is trained to find an embedding vector for each datapoint, i.e., a representation, which can be further fine-tuned for various downstream tasks. In order to safely deploy these models in critical decision-making systems, it is crucial to equip them with a measure of their uncertainty or reliability. However, due to the pairwise nature of training a contrastive model, and the lack of absolute labels on the output (an abstract embedding vector), adapting conventional uncertainty estimation techniques to such models is non-trivial. In this work, we study whether the uncertainty of such a representation can be quantified for a single datapoint in a meaningful way. In other words, we explore if the downstream performance on a given datapoint is predictable, directly from its pre-trained embedding. We show that this goal can be achieved by directly estimating the distribution of the training data in the embedding space and accounting for the local consistency of the representations. Our experiments show that this notion of uncertainty for an embedding vector often strongly correlates with its downstream accuracy.

preprint2021arXiv

Sketching Curvature for Efficient Out-of-Distribution Detection for Deep Neural Networks

In order to safely deploy Deep Neural Networks (DNNs) within the perception pipelines of real-time decision making systems, there is a need for safeguards that can detect out-of-training-distribution (OoD) inputs both efficiently and accurately. Building on recent work leveraging the local curvature of DNNs to reason about epistemic uncertainty, we propose Sketching Curvature of OoD Detection (SCOD), an architecture-agnostic framework for equipping any trained DNN with a task-relevant epistemic uncertainty estimate. Offline, given a trained model and its training data, SCOD employs tools from matrix sketching to tractably compute a low-rank approximation of the Fisher information matrix, which characterizes which directions in the weight space are most influential on the predictions over the training data. Online, we estimate uncertainty by measuring how much perturbations orthogonal to these directions can alter predictions at a new test input. We apply SCOD to pre-trained networks of varying architectures on several tasks, ranging from regression to classification. We demonstrate that SCOD achieves comparable or better OoD detection performance with lower computational burden relative to existing baselines.