Source author record

Ioannis Ch. Paschalidis

Ioannis Ch. Paschalidis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.OC; Machine Learning; Systems and Control; eess.SY; Robotics; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Information Theory; math.IT; Multiagent Systems; Networking and Internet Architecture; Applications; Computer Vision; Human-Computer Interaction; Logic in Computer Science; physics.soc-ph; Social and Information Networks

Catalog footprint

What is connected

23 works

17 topics

4 close collaborators

preprint · 2026 · arXiv

Bridging the Gap Between Average and Discounted TD Learning

The analysis of Temporal Difference (TD) learning in the average-reward setting faces notable theoretical difficulties because the Bellman operator is not contractive with respect to any norm. This complicates standard analyses of stochastic updates that are effective in discounted settings. Although a considerable body of literature addresses these challenges, existing theoretical approaches come with limitations. We introduce a novel algorithm designed explicitly for policy evaluation in the average-reward setting, utilizing sampling from two Markovian trajectories. Our proposed method overcomes previous limitations by guaranteeing convergence to the unique solution of a properly defined projected Bellman equation. Notably, and in contrast to earlier work, our convergence analysis is uniformly applicable to both linear function approximation and tabular settings and does not involve explicit dimension-dependent terms in its convergence bounds. These results align with what is known to hold in the discounted setting. Furthermore, our algorithm achieves improved dependence on the problem's condition number, reducing the sample complexity from quartic, as in prior literature, to quadratic scaling, and thus matching the efficiency seen in the discounted setting.

preprint · 2026 · arXiv

Multiple-policy Evaluation via Density Estimation

We study the multiple-policy evaluation problem where we are given a set of $K$ policies and the goal is to evaluate their performance (expected total reward over a fixed horizon) to an accuracy $ε$ with probability at least $1-δ$. We propose an algorithm named $\mathrm{CAESAR}$ for this problem. Our approach is based on computing an approximate optimal offline sampling distribution and using the data sampled from it to perform the simultaneous estimation of the policy values. $\mathrm{CAESAR}$ has two phases. In the first, we produce coarse estimates of the visitation distributions of the target policies at a lower-order sample complexity rate that scales as $\tilde{O}(\frac{1}{ε})$. In the second phase, we approximate the optimal offline sampling distribution and compute the importance weighting ratios for all target policies by minimizing a step-wise quadratic loss function inspired by the DualDICE (Nachum et al., 2019) objective. Up to lower-order and logarithmic terms, $\mathrm{CAESAR}$ achieves a sample complexity $\tilde{O}\left(\frac{H^4}{ε^2}\sum_{h=1}^H\max_{k\in[K]}\sum_{s,a}\frac{(d_h^{π^k}(s,a))^2}{μ^*_h(s,a)}\right)$, where $d^π$ is the visitation distribution of policy $π$, $μ^*$ is the optimal sampling distribution, and $H$ is the horizon.

preprint · 2026 · arXiv

Towards General Preference Alignment: Diffusion Models at Nash Equilibrium

Reinforcement learning from human feedback (RLHF) has been popular for aligning text-to-image (T2I) diffusion models with human preferences. As a mainstream branch of RLHF, Direct Preference Optimization (DPO) offers a computationally efficient alternative that avoids explicit reward modeling and has been widely adopted in diffusion alignment. However, existing preference-based methods for diffusion alignment still rely on reward-induced preference signals and typically assume that human preferences can be adequately modeled by the Bradley-Terry (BT) model, which may fail to capture the full complexity of human preferences. In this paper, we formulate diffusion alignment from a game-theoretic perspective. We propose Diffusion Nash Preference Optimization (Diff.-NPO), an intuitive general preference framework for diffusion alignment. Diff.-NPO encourages the current policy to play against itself to achieve self-improvement, leading to better alignment. Empirically, we demonstrate the effectiveness of Diff.-NPO on the text-to-image generation task via various metrics. Diff.-NPO consistently outperforms existing preference-based diffusion alignment methods.

preprint · 2021 · arXiv

A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent

This paper is concerned with minimizing the average of $n$ cost functions over a network in which agents may communicate and exchange information with each other. We consider the setting where only noisy gradient information is available. To solve the problem, we study the distributed stochastic gradient descent (DSGD) method and perform a non-asymptotic convergence analysis. For strongly convex and smooth objective functions, DSGD asymptotically achieves the optimal network-independent convergence rate compared to centralized stochastic gradient descent (SGD). Our main contribution is to characterize the transient time needed for DSGD to approach the asymptotic convergence rate, which we show behaves as $K_T=\mathcal{O}\left(\frac{n}{(1-ρ_w)^2}\right)$, where $1-ρ_w$ denotes the spectral gap of the mixing matrix. Moreover, we construct a "hard" optimization problem for which we show the transient time needed for DSGD to approach the asymptotic convergence rate is lower bounded by $Ω\left(\frac{n}{(1-ρ_w)^2} \right)$, implying the sharpness of the obtained result. Numerical experiments demonstrate the tightness of the theoretical results.
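As a rough illustration of the DSGD iteration discussed in this abstract, the Python sketch below runs it on a toy problem. Everything in it is assumed for illustration and is not taken from the paper: the scalar quadratic costs f_i(x) = (x - b_i)^2 / 2, the ring network with uniform 1/3 weights, the Gaussian gradient noise, the 1/(k+2) step size, and the function names. Each agent takes a noisy local gradient step and then averages with its neighbors, which is one common form of the method.

```python
import random


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring (assumed topology):
    each agent weights itself and its two neighbors by 1/3."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def dsgd(targets, steps, noise_std=0.1, seed=0):
    """Toy DSGD on f_i(x) = 0.5 * (x - targets[i])**2.

    The minimizer of the average cost is the mean of `targets`.
    """
    rng = random.Random(seed)
    n = len(targets)
    W = ring_mixing_matrix(n)
    x = [0.0] * n  # one scalar iterate per agent
    for k in range(steps):
        alpha = 1.0 / (k + 2)  # diminishing step size (an assumption)
        # Local step with a noisy gradient: grad f_i(x) = x - targets[i].
        y = [
            x[i] - alpha * ((x[i] - targets[i]) + rng.gauss(0.0, noise_std))
            for i in range(n)
        ]
        # Consensus step: weighted average over neighbors.
        x = [sum(W[i][j] * y[j] for j in range(n)) for i in range(n)]
    return x


if __name__ == "__main__":
    iterates = dsgd([1.0, 2.0, 3.0, 4.0, 5.0], steps=5000)
    print(iterates)  # every agent should end up near the mean of the targets
```

In this toy setup the spectral gap of the ring matrix shrinks as the ring grows, which is the quantity $1-ρ_w$ that the transient-time estimate depends on.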

preprint · 2021 · arXiv

Online Baum-Welch algorithm for Hierarchical Imitation Learning

The options framework for hierarchical reinforcement learning has increased its popularity in recent years and has made improvements in tackling the scalability problem in reinforcement learning. Yet, most of these recent successes are linked with a proper options initialization or discovery. When an expert is available, the options discovery problem can be addressed by learning an options-type hierarchical policy directly from expert demonstrations. This problem is referred to as hierarchical imitation learning and can be handled as an inference problem in a Hidden Markov Model, which is done via an Expectation-Maximization type algorithm. In this work, we propose a novel online algorithm to perform hierarchical imitation learning in the options framework. Further, we discuss the benefits of such an algorithm and compare it with its batch version in classical reinforcement learning benchmarks. We show that this approach works well in both discrete and continuous environments and, under certain conditions, it outperforms the batch version.

preprint · 2020 · arXiv

Asymptotic Network Independence in Distributed Stochastic Optimization for Machine Learning

We provide a discussion of several recent results which, in certain scenarios, are able to overcome a barrier in distributed stochastic optimization for machine learning. Our focus is the so-called asymptotic network independence property, which is achieved whenever a distributed method executed over a network of n nodes asymptotically converges to the optimal solution at a comparable rate to a centralized method with the same computational power as the entire network. We explain this property through an example involving the training of ML models and sketch a short mathematical analysis for comparing the performance of distributed stochastic gradient descent (DSGD) with centralized stochastic gradient descent (SGD).

preprint · 2020 · arXiv

Congestion-aware Routing and Rebalancing of Autonomous Mobility-on-Demand Systems in Mixed Traffic

This paper studies congestion-aware route-planning policies for Autonomous Mobility-on-Demand (AMoD) systems, whereby a fleet of autonomous vehicles provides on-demand mobility under mixed traffic conditions. Specifically, we first devise a network flow model to optimize the AMoD routing and rebalancing strategies in a congestion-aware fashion by accounting for the endogenous impact of AMoD flows on travel time. Second, we capture reactive exogenous traffic consisting of private vehicles selfishly adapting to the AMoD flows in a user-centric fashion by leveraging an iterative approach. Finally, we showcase the effectiveness of our framework with two case-studies considering the transportation sub-networks in Eastern Massachusetts and New York City. Our results suggest that for high levels of demand, pure AMoD travel can be detrimental due to the additional traffic stemming from its rebalancing flows, while the combination of AMoD with walking or micromobility options can significantly improve the overall system performance.

preprint · 2020 · arXiv

Explainability of Intelligent Transportation Systems using Knowledge Compilation: a Traffic Light Controller Case

Automated controllers that make decisions about an environment are widespread and are often based on black-box models. We use Knowledge Compilation theory to bring explainability to the controller's decision given the state of the system. For this, we use simulated historical state-action data as input and build a compact and structured representation which relates states with actions. We implement this method in a Traffic Light Control scenario where the controller selects the light cycle by observing the presence (or absence) of vehicles in different regions of the incoming roads.

preprint · 2020 · arXiv

Joint Pricing and Rebalancing of Autonomous Mobility-on-Demand Systems

This paper studies optimal pricing and rebalancing policies for Autonomous Mobility-on-Demand (AMoD) systems. We take a macroscopic planning perspective to tackle a profit maximization problem while ensuring that the system is load-balanced. We begin by describing the system using a dynamic fluid model to show the existence and stability of an equilibrium (i.e., load balance) through pricing policies. We then develop an optimization framework that allows us to find optimal policies in terms of pricing and rebalancing. We first maximize profit by only using pricing policies, then incorporate rebalancing, and finally we consider whether the solution is found sequentially or jointly. We apply each approach on a data-driven case study using real taxi data from New York City. Depending on which benchmarking solution we use, the joint problem (i.e., pricing and rebalancing) increases profits by 7% to 40%.

preprint · 2020 · arXiv

Local SGD With a Communication Overhead Depending Only on the Number of Workers

We consider speeding up stochastic gradient descent (SGD) by parallelizing it across multiple workers. We assume the same data set is shared among $n$ workers, who can take SGD steps and coordinate with a central server. Unfortunately, this could require a lot of communication between the workers and the server, which can dramatically reduce the gains from parallelism. The Local SGD method, proposed and analyzed in the earlier literature, suggests machines should make many local steps between such communications. While the initial analysis of Local SGD showed it needs $Ω( \sqrt{T} )$ communications for $T$ local gradient steps in order for the error to scale proportionately to $1/(nT)$, this has been successively improved in a string of papers, with the state-of-the-art requiring $Ω\left( n \left( \mbox{ polynomial in log } (T) \right) \right)$ communications. In this paper, we give a new analysis of Local SGD. A consequence of our analysis is that Local SGD can achieve an error that scales as $1/(nT)$ with only a fixed number of communications independent of $T$: specifically, only $Ω(n)$ communications are required.
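The Local SGD pattern described in this abstract (many local steps, occasional averaging through a server) can be sketched as follows. This is a minimal toy version under stated assumptions, not the paper's analysis or schedule: the shared quadratic objective f(x) = (x - target)^2 / 2, the Gaussian gradient noise, the 1/(t+1) step size, the evenly spaced communication rounds, and the name `local_sgd` are all hypothetical choices made for illustration.

```python
import random


def local_sgd(n_workers, local_steps, rounds, target=2.0, noise_std=0.5, seed=0):
    """Toy Local SGD on the shared objective f(x) = 0.5 * (x - target)**2.

    Each worker runs `local_steps` noisy SGD steps on its own copy of the
    iterate; the server then averages the copies. Only `rounds`
    communications take place in total.
    """
    rng = random.Random(seed)
    x = [0.0] * n_workers  # one local iterate per worker
    t = 0
    for _ in range(rounds):
        for _ in range(local_steps):
            t += 1
            alpha = 1.0 / (t + 1)  # diminishing step size (an assumption)
            # Independent noisy gradient on every worker: (x - target) + noise.
            x = [
                xi - alpha * ((xi - target) + rng.gauss(0.0, noise_std))
                for xi in x
            ]
        # Communication round: average at the server, broadcast back.
        avg = sum(x) / n_workers
        x = [avg] * n_workers
    return x[0]


if __name__ == "__main__":
    # 4 workers, 5000 local steps each, but only 10 communications.
    print(local_sgd(n_workers=4, local_steps=500, rounds=10))
```

Increasing `local_steps` while holding `rounds` fixed raises the number of gradient steps $T$ without adding communication, which is the regime the abstract is concerned with.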

preprint · 2020 · arXiv

Robust Grouped Variable Selection Using Distributionally Robust Optimization

We propose a Distributionally Robust Optimization (DRO) formulation with a Wasserstein-based uncertainty set for selecting grouped variables under perturbations on the data for both linear regression and classification problems. The resulting model offers robustness explanations for Grouped Least Absolute Shrinkage and Selection Operator (GLASSO) algorithms and highlights the connection between robustness and regularization. We prove probabilistic bounds on the out-of-sample loss and the estimation bias, and establish the grouping effect of our estimator, showing that coefficients in the same group converge to the same value as the sample correlation between covariates approaches 1. Based on this result, we propose to use the spectral clustering algorithm with the Gaussian similarity function to perform grouping on the predictors, which makes our approach applicable without knowing the grouping structure a priori. We compare our approach to an array of alternatives and provide extensive numerical results on both synthetic data and a real large dataset of surgery-related medical records, showing that our formulation produces an interpretable and parsimonious model that encourages sparsity at a group level and is able to achieve better prediction and estimation performance in the presence of outliers.

preprint · 2020 · arXiv

Robustified Multivariate Regression and Classification Using Distributionally Robust Optimization under the Wasserstein Metric

We develop Distributionally Robust Optimization (DRO) formulations for Multivariate Linear Regression (MLR) and Multiclass Logistic Regression (MLG) when both the covariates and responses/labels may be contaminated by outliers. The DRO framework uses a probabilistic ambiguity set defined as a ball of distributions that are close to the empirical distribution of the training set in the sense of the Wasserstein metric. We relax the DRO formulation into a regularized learning problem whose regularizer is a norm of the coefficient matrix. We establish out-of-sample performance guarantees for the solutions to our model, offering insights on the role of the regularizer in controlling the prediction error. Experimental results show that our approach improves the predictive error by 7% to 37% for MLR, and a metric of robustness by 100% for MLG.

preprint · 2019 · arXiv

Joint Estimation of OD Demands and Cost Functions in Transportation Networks from Data

Existing work has tackled the problem of estimating Origin-Destination (OD) demands and recovering travel latency functions in transportation networks under the Wardropian assumption. The ultimate objective is to derive an accurate predictive model of the network to enable optimization and control. However, these two problems are typically treated separately and estimation is based on parametric models. In this paper, we propose a method to jointly recover nonparametric travel latency cost functions and estimate OD demands using traffic flow data. We formulate the problem as a bilevel optimization problem and develop an iterative first-order optimization algorithm to solve it. A numerical example using the Braess Network is presented to demonstrate the effectiveness of our method.

preprint · 2019 · arXiv

Robust Asynchronous Stochastic Gradient-Push: Asymptotically Optimal and Network-Independent Performance for Strongly Convex Functions

We consider the standard model of distributed optimization of a sum of functions $F(\mathbf{z}) = \sum_{i=1}^n f_i(\mathbf{z})$, where node $i$ in a network holds the function $f_i(\mathbf{z})$. We allow for a harsh network model characterized by asynchronous updates, message delays, unpredictable message losses, and directed communication among nodes. In this setting, we analyze a modification of the Gradient-Push method for distributed optimization, assuming that (i) node $i$ is capable of generating gradients of its function $f_i(\mathbf{z})$ corrupted by zero-mean bounded-support additive noise at each step, (ii) $F(\mathbf{z})$ is strongly convex, and (iii) each $f_i(\mathbf{z})$ has Lipschitz gradients. We show that our proposed method asymptotically performs as well as the best bounds on centralized gradient descent that takes steps in the direction of the sum of the noisy gradients of all the functions $f_1(\mathbf{z}), \ldots, f_n(\mathbf{z})$ at each step.

preprint · 2016 · arXiv

An Improved Composite Hypothesis Test for Markov Models with Applications in Network Anomaly Detection

Recent work has proposed the use of a composite hypothesis Hoeffding test for statistical anomaly detection. Setting an appropriate threshold for the test given a desired false alarm probability involves approximating the false alarm probability. To that end, a large deviations asymptotic is typically used which, however, often results in an inaccurate setting of the threshold, especially for relatively small sample sizes. This, in turn, results in an anomaly detection test that does not control well for false alarms. In this paper, we develop a tighter approximation using the Central Limit Theorem (CLT) under Markovian assumptions. We apply our result to a network anomaly detection application and demonstrate its advantages over earlier work.
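For orientation, the basic Hoeffding test referred to in this abstract compares the empirical distribution of the observations with a nominal distribution through the Kullback-Leibler divergence and raises an alarm when the divergence exceeds a threshold. The Python sketch below covers only the simple i.i.d. case with a hand-picked threshold; the Markovian setting and the CLT-based threshold developed in the paper are not reproduced, and the function name and arguments are hypothetical.

```python
import math
from collections import Counter


def hoeffding_test(samples, nominal, threshold):
    """Return (alarm, divergence) for an i.i.d. Hoeffding-style test.

    samples   : sequence of observed symbols
    nominal   : dict mapping each symbol to its probability under normal behavior
    threshold : alarm level for the KL divergence (chosen by hand here)
    """
    n = len(samples)
    counts = Counter(samples)
    divergence = 0.0
    for symbol, count in counts.items():
        p = count / n                   # empirical probability
        q = nominal.get(symbol, 0.0)    # nominal probability
        if q == 0.0:
            # A symbol that is impossible under the nominal law was observed.
            return True, math.inf
        divergence += p * math.log(p / q)
    return divergence > threshold, divergence


if __name__ == "__main__":
    nominal = {"a": 0.5, "b": 0.5}
    print(hoeffding_test(list("aababbab"), nominal, threshold=0.1))
    print(hoeffding_test(list("aaaaaaab"), nominal, threshold=0.1))
```

How the threshold is set for a target false alarm probability is exactly the question the abstract addresses, so the fixed value used here should be read as a placeholder.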

preprint · 2016 · arXiv

Data-driven Estimation of Origin-Destination Demand and User Cost Functions for the Optimization of Transportation Networks

In earlier work (Zhang et al., 2016) we used actual traffic data from the Eastern Massachusetts transportation network in the form of spatial average speeds and road segment flow capacities in order to estimate Origin-Destination (OD) flow demand matrices for the network. Based on a Traffic Assignment Problem (TAP) formulation (termed "forward problem"), in this paper we use a scheme similar to our earlier work to estimate initial OD demand matrices and then propose a new inverse problem formulation in order to estimate user cost functions. This new formulation allows us to efficiently overcome numerical difficulties that limited our prior work to relatively small subnetworks and, assuming the travel latency cost functions are available, to adjust the values of the OD demands accordingly so that the flow observations are as close as possible to the solutions of the forward problem. We also derive sensitivity analysis results for the total user latency cost with respect to important parameters such as road capacities and minimum travel times. Finally, using the same actual traffic data from the Eastern Massachusetts transportation network, we quantify the Price of Anarchy (POA) for a much larger network than that in Zhang et al. (2016).

preprint · 2016 · arXiv

Robust measurement-based buffer overflow probability estimators for QoS provisioning and traffic anomaly prediction applications

Suitable estimators for a class of Large Deviation approximations of rare event probabilities based on sample realizations of random processes have been proposed in our earlier work. These estimators are expressed as non-linear multi-dimensional optimization problems of a special structure. In this paper, we develop an algorithm to solve these optimization problems very efficiently based on their characteristic structure. After discussing the nature of the objective function and constraint set and their peculiarities, we provide a formal proof that the developed algorithm is guaranteed to always converge. The existence of efficient and provably convergent algorithms for solving these problems is a prerequisite for using the proposed estimators in real time problems such as call admission control, adaptive modulation and coding with QoS constraints, and traffic anomaly detection in high data rate communication networks.

preprint · 2015 · arXiv

Botnet Detection using Social Graph Analysis

Signature-based botnet detection methods identify botnets by recognizing Command and Control (C&C) traffic and can be ineffective for botnets that use new and sophisticated mechanisms for such communications. To address these limitations, we propose a novel botnet detection method that analyzes the social relationships among nodes. The method consists of two stages: (i) anomaly detection in an "interaction" graph among nodes using large deviations results on the degree distribution, and (ii) community detection in a social "correlation" graph whose edges connect nodes with highly correlated communications. The latter stage uses a refined modularity measure and formulates the problem as a non-convex optimization problem for which appropriate relaxation strategies are developed. We apply our method to real-world botnet traffic and compare its performance with other community detection methods. The results show that our approach works effectively and the refined modularity measure improves the detection accuracy.

preprint · 2015 · arXiv

Robust Anomaly Detection in Dynamic Networks

We propose two robust methods for anomaly detection in dynamic networks in which the properties of normal traffic are time-varying. We formulate the robust anomaly detection problem as a binary composite hypothesis testing problem and propose two methods: a model-free and a model-based one, leveraging techniques from the theory of large deviations. Both methods require a family of Probability Laws (PLs) that represent normal properties of traffic. We devise a two-step procedure to estimate this family of PLs. We compare the performance of our robust methods and their vanilla counterparts, which assume that normal traffic is stationary, on a network with a diurnal normal pattern and a common anomaly related to data exfiltration. Simulation results show that our robust methods perform better than their vanilla counterparts in dynamic networks.

preprint · 2014 · arXiv

Data-Driven Estimation in Equilibrium Using Inverse Optimization

Equilibrium modeling is common in a variety of fields such as game theory and transportation science. The inputs for these models, however, are often difficult to estimate, while their outputs, i.e., the equilibria they are meant to describe, are often directly observable. By combining ideas from inverse optimization with the theory of variational inequalities, we develop an efficient, data-driven technique for estimating the parameters of these models from observed equilibria. We use this technique to estimate the utility functions of players in a game from their observed actions and to estimate the congestion function on a road network from traffic count data. A distinguishing feature of our approach is that it supports both parametric and nonparametric estimation by leveraging ideas from statistical learning (kernel methods and regularization operators). In computational experiments involving Nash and Wardrop equilibria in a nonparametric setting, we find that a) we effectively estimate the unknown demand or congestion function, respectively, and b) our proposed regularization technique substantially improves the out-of-sample performance of our estimators.

preprint · 2013 · arXiv

Network Anomaly Detection: A Survey and Comparative Analysis of Stochastic and Deterministic Methods

We present five methods for the problem of network anomaly detection. These methods cover most of the common techniques in the anomaly detection field, including Statistical Hypothesis Tests (SHT), Support Vector Machines (SVM) and clustering analysis. We evaluate all methods in a simulated network that consists of nominal data, three flow-level anomalies and one packet-level attack. Through analyzing the results, we point out the advantages and disadvantages of each method and conclude that combining the results of the individual methods can yield improved anomaly detection results.

preprint · 2012 · arXiv

Temporal Logic Motion Control using Actor-Critic Methods

In this paper, we consider the problem of deploying a robot from a specification given as a temporal logic statement about some properties satisfied by the regions of a large, partitioned environment. We assume that the robot has noisy sensors and actuators and model its motion through the regions of the environment as a Markov Decision Process (MDP). The robot control problem becomes finding the control policy maximizing the probability of satisfying the temporal logic task on the MDP. For a large environment, obtaining transition probabilities for each state-action pair, as well as solving the necessary optimization problem for the optimal policy, are usually not computationally feasible. To address these issues, we propose an approximate dynamic programming framework based on a least-squares temporal difference learning method of the actor-critic type. This framework operates on sample paths of the robot and optimizes a randomized control policy with respect to a small set of parameters. The transition probabilities are obtained only when needed. Hardware-in-the-loop simulations confirm that convergence of the parameters translates to an approximately optimal policy.

preprint · 2011 · arXiv

Least Squares Temporal Difference Actor-Critic Methods with Applications to Robot Motion Control

We consider the problem of finding a control policy for a Markov Decision Process (MDP) to maximize the probability of reaching some states while avoiding some other states. This problem is motivated by applications in robotics, where such problems naturally arise when probabilistic models of robot motion are required to satisfy temporal logic task specifications. We transform this problem into a Stochastic Shortest Path (SSP) problem and develop a new approximate dynamic programming algorithm to solve it. This algorithm is of the actor-critic type and uses a least-squares temporal difference learning method. It operates on sample paths of the system and optimizes the policy within a pre-specified class parameterized by a parsimonious set of parameters. We show its convergence to a policy corresponding to a stationary point in the parameters' space. Simulation results confirm the effectiveness of the proposed solution.
