Source author record

Theodoros Tsiligkaridis

Theodoros Tsiligkaridis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Information Theory math.IT Methodology Computer Vision Multiagent Systems Systems and Control Artificial Intelligence Social and Information Networks

Catalog footprint

What is connected

16works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Fourier-Based Augmentations for Improved Robustness and Uncertainty Calibration

Diverse data augmentation strategies are a natural approach to improving robustness in computer vision models against unforeseen shifts in data distribution. However, the ability to tailor such strategies to inoculate a model against specific classes of corruptions or attacks -- without incurring substantial losses in robustness against other classes of corruptions -- remains elusive. In this work, we successfully harden a model against Fourier-based attacks, while producing superior-to-AugMix accuracy and calibration results on both the CIFAR-10-C and CIFAR-100-C datasets; classification error is reduced by over ten percentage points for some high-severity noise and digital-type corruptions. We achieve this by incorporating Fourier-basis perturbations in the AugMix image-augmentation framework. Thus we demonstrate that the AugMix framework can be tailored to effectively target particular distribution shifts, while boosting overall model robustness.

preprint2022arXiv

Graph-Guided Network for Irregularly Sampled Multivariate Time Series

In many domains, including healthcare, biology, and climate science, time series are irregularly sampled with varying time intervals between successive readouts and different subsets of variables (sensors) observed at different time points. Here, we introduce RAINDROP, a graph neural network that embeds irregularly sampled and multivariate time series while also learning the dynamics of sensors purely from observational data. RAINDROP represents every sample as a separate sensor graph and models time-varying dependencies between sensors with a novel message passing operator. It estimates the latent sensor graph structure and leverages the structure together with nearby observations to predict misaligned readouts. This model can be interpreted as a graph neural network that sends messages over graphs that are optimized for capturing time-varying dependencies among sensors. We use RAINDROP to classify time series and interpret temporal dynamics on three healthcare and human activity datasets. RAINDROP outperforms state-of-the-art methods by up to 11.4% (absolute F1-score points), including techniques that deal with irregular sampling using fixed discretization and set functions. RAINDROP shows superiority in diverse setups, including challenging leave-sensor-out settings.

preprint2022arXiv

Pick up the PACE: Fast and Simple Domain Adaptation via Ensemble Pseudo-Labeling

Domain Adaptation (DA) has received widespread attention from deep learning researchers in recent years because of its potential to improve test accuracy with out-of-distribution labeled data. Most state-of-the-art DA algorithms require an extensive amount of hyperparameter tuning and are computationally intensive due to the large batch sizes required. In this work, we propose a fast and simple DA method consisting of three stages: (1) domain alignment by covariance matching, (2) pseudo-labeling, and (3) ensembling. We call this method $\textbf{PACE}$, for $\textbf{P}$seudo-labels, $\textbf{A}$lignment of $\textbf{C}$ovariances, and $\textbf{E}$nsembles. PACE is trained on top of fixed features extracted from an ensemble of modern pretrained backbones. PACE exceeds previous state-of-the-art by $\textbf{5 - 10 \%}$ on most benchmark adaptation tasks without training a neural network. PACE reduces training time and hyperparameter tuning time by $82\%$ and $97\%$, respectively, when compared to state-of-the-art DA methods. Code is released here: https://github.com/Chris210634/PACE-Domain-Adaptation

preprint2022arXiv

Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training

Deep neural networks are easily fooled by small perturbations known as adversarial attacks. Adversarial Training (AT) is a technique that approximately solves a robust optimization problem to minimize the worst-case loss and is widely regarded as the most effective defense. Due to the high computation time for generating strong adversarial examples in the AT process, single-step approaches have been proposed to reduce training time. However, these methods suffer from catastrophic overfitting where adversarial accuracy drops during training, and although improvements have been proposed, they increase training time and robustness is far from that of multi-step AT. We develop a theoretical framework for adversarial training with FW optimization (FW-AT) that reveals a geometric connection between the loss landscape and the $\ell_2$ distortion of $\ell_\infty$ FW attacks. We analytically show that high distortion of FW attacks is equivalent to small gradient variation along the attack path. It is then experimentally demonstrated on various deep neural network architectures that $\ell_\infty$ attacks against robust models achieve near maximal distortion, while standard networks have lower distortion. It is experimentally shown that catastrophic overfitting is strongly correlated with low distortion of FW attacks. This mathematical transparency differentiates FW from Projected Gradient Descent (PGD) optimization. To demonstrate the utility of our theoretical framework we develop FW-AT-Adapt, a novel adversarial training algorithm which uses a simple distortion measure to adapt the number of attack steps during training to increase efficiency without compromising robustness. FW-AT-Adapt provides training time on par with single-step fast AT methods and closes the gap between fast AT methods and multi-step PGD-AT with minimal loss in adversarial accuracy in white-box and black-box settings.

preprint2021arXiv

Information Aware Max-Norm Dirichlet Networks for Predictive Uncertainty Estimation

Precise estimation of uncertainty in predictions for AI systems is a critical factor in ensuring trust and safety. Deep neural networks trained with a conventional method are prone to over-confident predictions. In contrast to Bayesian neural networks that learn approximate distributions on weights to infer prediction confidence, we propose a novel method, Information Aware Dirichlet networks, that learn an explicit Dirichlet prior distribution on predictive distributions by minimizing a bound on the expected max norm of the prediction error and penalizing information associated with incorrect outcomes. Properties of the new cost function are derived to indicate how improved uncertainty estimation is achieved. Experiments using real datasets show that our technique outperforms, by a large margin, state-of-the-art neural networks for estimating within-distribution and out-of-distribution uncertainty, and detecting adversarial examples.

preprint2020arXiv

Active query-driven visual search using probabilistic bisection and convolutional neural networks

We present a novel efficient object detection and localization framework based on the probabilistic bisection algorithm. A Convolutional Neural Network (CNN) is trained and used as a noisy oracle that provides answers to input query images. The responses along with error probability estimates obtained from the CNN are used to update beliefs on the object location along each dimension. We show that querying along each dimension achieves the same lower bound on localization error as the joint query design. Finally, we compare our approach to the traditional sliding window technique on a real world face localization task and show speed improvements by at least an order of magnitude while maintaining accurate localization.

preprint2020arXiv

Second Order Optimization for Adversarial Robustness and Interpretability

Deep neural networks are easily fooled by small perturbations known as adversarial attacks. Adversarial Training (AT) is a technique aimed at learning features robust to such attacks and is widely regarded as a very effective defense. However, the computational cost of such training can be prohibitive as the network size and input dimensions grow. Inspired by the relationship between robustness and curvature, we propose a novel regularizer which incorporates first and second order information via a quadratic approximation to the adversarial loss. The worst case quadratic loss is approximated via an iterative scheme. It is shown that using only a single iteration in our regularizer achieves stronger robustness than prior gradient and curvature regularization schemes, avoids gradient obfuscation, and, with additional iterations, achieves strong robustness with significantly lower training time than AT. Further, it retains the interesting facet of AT that networks learn features which are well-aligned with human perception. We demonstrate experimentally that our method produces higher quality human-interpretable features than other geometric regularization techniques. These robust features are then used to provide human-friendly explanations to model predictions.

preprint2016arXiv

Asynchronous Decentralized 20 Questions for Adaptive Search

This paper considers the problem of adaptively searching for an unknown target using multiple agents connected through a time-varying network topology. Agents are equipped with sensors capable of fast information processing, and we propose a decentralized collaborative algorithm for controlling their search given noisy observations. Specifically, we propose decentralized extensions of the adaptive query-based search strategy that combines elements from the 20 questions approach and social learning. Under standard assumptions on the time-varying network dynamics, we prove convergence to correct consensus on the value of the parameter as the number of iterations go to infinity. The convergence analysis takes a novel approach using martingale-based techniques combined with spectral graph theory. Our results establish that stability and consistency can be maintained even with one-way updating and randomized pairwise averaging, thus providing a scalable low complexity method with performance guarantees. We illustrate the effectiveness of our algorithm for random network topologies.

preprint2016arXiv

Distributed Probabilistic Bisection Search using Social Learning

We present a novel distributed probabilistic bisection algorithm using social learning with application to target localization. Each agent in the network first constructs a query about the target based on its local information and obtains a noisy response. Agents then perform a Bayesian update of their beliefs followed by an averaging of the log beliefs over local neighborhoods. This two stage algorithm consisting of repeated querying and averaging runs until convergence. We derive bounds on the rate of convergence of the beliefs at the correct target location. Numerical simulations show that our method outperforms current state of the art methods.

preprint2015arXiv

Adaptive Low-Complexity Sequential Inference for Dirichlet Process Mixture Models

We develop a sequential low-complexity inference procedure for Dirichlet process mixtures of Gaussians for online clustering and parameter estimation when the number of clusters are unknown a-priori. We present an easily computable, closed form parametric expression for the conditional likelihood, in which hyperparameters are recursively updated as a function of the streaming data assuming conjugate priors. Motivated by large-sample asymptotics, we propose a novel adaptive low-complexity design for the Dirichlet process concentration parameter and show that the number of classes grow at most at a logarithmic rate. We further prove that in the large-sample limit, the conditional likelihood and data predictive distribution become asymptotically Gaussian. We demonstrate through experiments on synthetic and real data sets that our approach is superior to other online state-of-the-art methods.

preprint2015arXiv

On Decentralized Estimation with Active Queries

We consider the problem of decentralized 20 questions with noise for multiple players/agents under the minimum entropy criterion in the setting of stochastic search over a parameter space, with application to target localization. We propose decentralized extensions of the active query-based stochastic search strategy that combines elements from the 20 questions approach and social learning. We prove convergence to correct consensus on the value of the parameter. This framework provides a flexible and tractable mathematical model for decentralized parameter estimation systems based on active querying. We illustrate the effectiveness and robustness of the proposed decentralized collaborative 20 questions algorithm for random network topologies with information sharing.

preprint2014arXiv

Collaborative 20 Questions for Target Localization

We consider the problem of 20 questions with noise for multiple players under the minimum entropy criterion in the setting of stochastic search, with application to target localization. Each player yields a noisy response to a binary query governed by a certain error probability. First, we propose a sequential policy for constructing questions that queries each player in sequence and refines the posterior of the target location. Second, we consider a joint policy that asks all players questions in parallel at each time instant and characterize the structure of the optimal policy for constructing the sequence of questions. This generalizes the single player probabilistic bisection method for stochastic search problems. Third, we prove an equivalence between the two schemes showing that, despite the fact that the sequential scheme has access to a more refined filtration, the joint scheme performs just as well on average. Fourth, we establish convergence rates of the mean-square error (MSE) and derive error exponents. Lastly, we obtain an extension to the case of unknown error probabilities. This framework provides a mathematical model for incorporating a human in the loop for active machine learning systems.

preprint2014arXiv

Secure MIMO Communications under Quantized Channel Feedback in the presence of Jamming

We consider the problem of secure communications in a MIMO setting in the presence of an adversarial jammer equipped with $n_j$ transmit antennas and an eavesdropper equipped with $n_e$ receive antennas. A multiantenna transmitter, equipped with $n_t$ antennas, desires to secretly communicate a message to a multiantenna receiver equipped with $n_r$ antennas. We propose a transmission method based on artificial noise and linear precoding and a two-stage receiver method employing beamforming. Under this strategy, we first characterize the achievable secrecy rates of communication and prove that the achievable secure degrees-of-freedom (SDoF) is given by $d_s = n_r - n_j$ in the perfect channel state information (CSI) case. Second, we consider quantized CSI feedback using Grassmannian quantization of a function of the direct channel matrix and derive sufficient conditions for the quantization bit rate scaling as a function of transmit power for maintaining the achievable SDoF $d_s$ with perfect CSI and for having asymptotically zero secrecy rate loss due to quantization. Numerical simulations are also provided to support the theory.

preprint2013arXiv

Convergence Properties of Kronecker Graphical Lasso Algorithms

This paper studies iteration convergence of Kronecker graphical lasso (KGLasso) algorithms for estimating the covariance of an i.i.d. Gaussian random sample under a sparse Kronecker-product covariance model and MSE convergence rates. The KGlasso model, originally called the transposable regularized covariance model by Allen ["Transposable regularized covariance models with an application to missing data imputation," Ann. Appl. Statist., vol. 4, no. 2, pp. 764-790, 2010], implements a pair of $\ell_1$ penalties on each Kronecker factor to enforce sparsity in the covariance estimator. The KGlasso algorithm generalizes Glasso, introduced by Yuan and Lin ["Model selection and estimation in the Gaussian graphical model," Biometrika, vol. 94, pp. 19-35, 2007] and Banerjee ["Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data," J. Mach. Learn. Res., vol. 9, pp. 485-516, Mar. 2008], to estimate covariances having Kronecker product form. It also generalizes the unpenalized ML flip-flop (FF) algorithm of Dutilleul ["The MLE algorithm for the matrix normal distribution," J. Statist. Comput. Simul., vol. 64, pp. 105-123, 1999] and Werner ["On estimation of covariance matrices with Kronecker product structure," IEEE Trans. Signal Process., vol. 56, no. 2, pp. 478-491, Feb. 2008] to estimation of sparse Kronecker factors. We establish that the KGlasso iterates converge pointwise to a local maximum of the penalized likelihood function. We derive high dimensional rates of convergence to the true covariance as both the number of samples and the number of variables go to infinity. Our results establish that KGlasso has significantly faster asymptotic convergence than Glasso and FF. Simulations are presented that validate the results of our analysis.

preprint2013arXiv

Covariance Estimation in High Dimensions via Kronecker Product Expansions

This paper presents a new method for estimating high dimensional covariance matrices. The method, permuted rank-penalized least-squares (PRLS), is based on a Kronecker product series expansion of the true covariance matrix. Assuming an i.i.d. Gaussian random sample, we establish high dimensional rates of convergence to the true covariance as both the number of samples and the number of variables go to infinity. For covariance matrices of low separation rank, our results establish that PRLS has significantly faster convergence than the standard sample covariance matrix (SCM) estimator. The convergence rate captures a fundamental tradeoff between estimation error and approximation error, thus providing a scalable covariance estimation framework in terms of separation rank, similar to low rank approximation of covariance matrices. The MSE convergence rates generalize the high dimensional rates recently obtained for the ML Flip-flop algorithm for Kronecker product covariance estimation. We show that a class of block Toeplitz covariance matrices is approximatable by low separation rank and give bounds on the minimal separation rank $r$ that ensures a given level of bias. Simulations are presented to validate the theoretical bounds. As a real world application, we illustrate the utility of the proposed Kronecker covariance estimator for spatio-temporal linear least squares prediction of multivariate wind speed measurements.

preprint2013arXiv

Kronecker Sum Decompositions of Space-Time Data

In this paper we consider the use of the space vs. time Kronecker product decomposition in the estimation of covariance matrices for spatio-temporal data. This decomposition imposes lower dimensional structure on the estimated covariance matrix, thus reducing the number of samples required for estimation. To allow a smooth tradeoff between the reduction in the number of parameters (to reduce estimation variance) and the accuracy of the covariance approximation (affecting estimation bias), we introduce a diagonally loaded modification of the sum of kronecker products representation [1]. We derive a Cramer-Rao bound (CRB) on the minimum attainable mean squared predictor coefficient estimation error for unbiased estimators of Kronecker structured covariance matrices. We illustrate the accuracy of the diagonally loaded Kronecker sum decomposition by applying it to video data of human activity.

Theodoros Tsiligkaridis

What is connected

Connect this record

See the researcher in context

Building this map preview

16 published item(s)

Fourier-Based Augmentations for Improved Robustness and Uncertainty Calibration

Graph-Guided Network for Irregularly Sampled Multivariate Time Series

Pick up the PACE: Fast and Simple Domain Adaptation via Ensemble Pseudo-Labeling

Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training

Information Aware Max-Norm Dirichlet Networks for Predictive Uncertainty Estimation

Active query-driven visual search using probabilistic bisection and convolutional neural networks

Second Order Optimization for Adversarial Robustness and Interpretability

Asynchronous Decentralized 20 Questions for Adaptive Search

Distributed Probabilistic Bisection Search using Social Learning

Adaptive Low-Complexity Sequential Inference for Dirichlet Process Mixture Models

On Decentralized Estimation with Active Queries

Collaborative 20 Questions for Target Localization

Secure MIMO Communications under Quantized Channel Feedback in the presence of Jamming

Convergence Properties of Kronecker Graphical Lasso Algorithms

Covariance Estimation in High Dimensions via Kronecker Product Expansions

Kronecker Sum Decompositions of Space-Time Data