Source author record

Andreas Spanias

Andreas Spanias appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Computer Vision; Machine Learning; Information Theory; math.IT; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; eess.IV; Systems and Control; cs.CY; eess.AS; Hardware Architecture; math.NA; Sound

Catalog footprint

What is connected

21 works

13 topics

4 close collaborators

preprint · 2026 · arXiv

Geospatial-Temporal Sensemaking of Remote Sensing Activity Detections with Multimodal Large Language Model

We introduce SMART-HC-VQA, a Sentinel-2-based visual question answering dataset derived from the IARPA SMART Heavy Construction dataset, designed for spatiotemporal analysis of human activity. The dataset transforms construction-site annotations, construction-type labels, temporal-phase labels, geographic metadata, and observation relationships into natural language question-answer triplets. This approach redefines the existing dataset as a temporally extended automatic target recognition and visual question answering (VQA) challenge, treating a fixed geospatial site as a target whose attributes and activity states evolve across sparse satellite observations. Currently, SMART-HC-VQA comprises 21,837 accessible Sentinel-2 image chips, 65,511 single-image VQA examples, and approximately 2.3 million two-image temporal comparison examples generated via our novel Image-Pairwise Combinatorial Augmentation. We detail the workflow for retrieving and processing Sentinel-2 imagery, segmenting large satellite tiles into site-centered images, maintaining traceability to SMART-HC annotations, and analyzing the distributions of site size, observation count, temporal coverage, construction type, and phase labels. Additionally, we describe an implemented multi-image MLLM training framework based on LLaVA-NeXT Mistral-7B, adapted to accept multiple dated image inputs and train on metadata-derived VQA examples. This work offers a reproducible foundation for language-guided understanding of remote sensing activity, aiming not only to detect change but also to reason about ongoing processes, their progression, and potential future developments.
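
The abstract does not spell out how Image-Pairwise Combinatorial Augmentation forms its pairs. The minimal sketch below rests on two assumptions: pairs are drawn only between observations of the same site, and each pair is ordered earlier-to-later. The `pairwise_examples` helper and the record layout are also hypothetical. The sketch shows why two-image examples grow quadratically with the number of observations per site:

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical record layout: (site_id, ISO-8601 observation date, chip path).
Observation = Tuple[str, str, str]


def pairwise_examples(observations: List[Observation]) -> List[Tuple[Observation, Observation]]:
    """Return every (earlier, later) pair of observations that share a site."""
    by_site: Dict[str, List[Observation]] = {}
    for obs in observations:
        by_site.setdefault(obs[0], []).append(obs)

    pairs: List[Tuple[Observation, Observation]] = []
    for site_obs in by_site.values():
        # ISO-8601 date strings sort chronologically as plain strings.
        site_obs.sort(key=lambda o: o[1])
        # combinations() keeps input order, so each pair is (earlier, later).
        pairs.extend(combinations(site_obs, 2))
    return pairs


if __name__ == "__main__":
    demo = [
        ("A", "2019-01-01", "a1.tif"),
        ("A", "2018-06-01", "a0.tif"),
        ("A", "2020-03-01", "a2.tif"),
        ("B", "2019-05-05", "b0.tif"),
        ("B", "2019-07-07", "b1.tif"),
    ]
    for earlier, later in pairwise_examples(demo):
        print(earlier[2], "->", later[2])
```

Under these assumptions, a site with n dated chips yields n(n-1)/2 pairs. A modest number of chips can therefore expand into a much larger pool of temporal examples. The paper's actual pairing rules and question templates may differ.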

preprint · 2026 · arXiv

Towards a Large Language-Vision Question Answering Model for MSTAR Automatic Target Recognition

Large language-vision models (LLVM), such as OpenAI's ChatGPT and GPT-4, have gained prominence as powerful tools for analyzing text and imagery. The merging of these data domains represents a significant paradigm shift with far-reaching implications for automatic target recognition (ATR). Recent transformer-based LLVM research has shown substantial improvements for geospatial perception tasks. Our study examines the application of LLVM to remote sensing image captioning and visual question-answering (VQA), with a specific focus on synthetic aperture radar (SAR) imagery. We examine newly published LLVM methods, including CLIP and LLaVA neural network transformer architectures. We have developed a work-in-progress SAR training and evaluation benchmark derived from the MSTAR Public Dataset. This has been extended to include descriptive text captions and question-answer pairs for VQA tasks. This challenge dataset is designed to push the boundaries of an LLVM in identifying nuanced ATR details in SAR imagery. Utilizing parameter-efficient fine-tuning, we train an LLVM method to identify fine-grained target qualities at 98% accuracy. We detail our data setup and experiments, addressing potential pitfalls that could lead to misleading conclusions. Accurately identifying and differentiating military vehicle types in SAR data poses a critical challenge, especially under complex environmental conditions. Mastering this target recognition skill may require a human analyst months of training and years of practice. This research represents a unique effort to apply LLVM to SAR applications, advancing machine-assisted remote sensing ATR for military and intelligence contexts.

preprint · 2022 · arXiv

Adaptive Subsampling for ROI-based Visual Tracking: Algorithms and FPGA Implementation

There is tremendous scope for improving the energy efficiency of embedded vision systems by incorporating programmable region-of-interest (ROI) readout in the image sensor design. In this work, we study how ROI programmability can be leveraged for tracking applications by anticipating where the ROI will be located in future frames and switching pixels off outside of this region. We refer to this process of ROI prediction and corresponding sensor configuration as adaptive subsampling. Our adaptive subsampling algorithms comprise an object detector and an ROI predictor (Kalman filter), which operate in conjunction to optimize the energy efficiency of the vision pipeline with object tracking as the end task. To further facilitate real-world implementation of our adaptive algorithms, we select a candidate algorithm and map it onto an FPGA. Leveraging Xilinx Vitis AI tools, we designed and accelerated a YOLO object detector-based adaptive subsampling algorithm. To further improve the algorithm post-deployment, we evaluated several competing baselines on the OTB100 and LaSOT datasets. We found that coupling the ECO tracker with the Kalman filter achieves competitive AUC scores of 0.4568 and 0.3471 on the OTB100 and LaSOT datasets, respectively. Further, the power efficiency of this algorithm is on par with, and in a couple of instances superior to, the other baselines. The ECO-based algorithm incurs a power consumption of approximately 4 W averaged across both datasets, while the YOLO-based approach requires approximately 6 W (as per our power consumption model). In terms of the accuracy-latency tradeoff, the ECO-based algorithm provides near-real-time performance (19.23 FPS) while attaining competitive tracking precision.
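
The abstract describes the ROI predictor only as a Kalman filter. The minimal sketch below makes three illustrative assumptions: a constant-velocity motion model on the ROI centre, a square ROI of fixed half-width, and fixed noise variances. The class, its parameter values and the `roi_mask` helper are hypothetical, not the paper's configuration. The predict step gives where to keep pixels on in the next frame, and the update step folds in the detector's measured centre:

```python
import numpy as np


class RoiKalmanPredictor:
    """Constant-velocity Kalman filter on an ROI centre; state is [x, y, vx, vy]."""

    def __init__(self, dt=1.0, process_var=1e-2, meas_var=1.0):
        self.F = np.array([[1.0, 0.0, dt, 0.0],
                           [0.0, 1.0, 0.0, dt],
                           [0.0, 0.0, 1.0, 0.0],
                           [0.0, 0.0, 0.0, 1.0]])
        self.H = np.array([[1.0, 0.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0, 0.0]])   # the detector measures (x, y) only
        self.Q = process_var * np.eye(4)
        self.R = meas_var * np.eye(2)
        self.x = np.zeros(4)
        self.P = 1e3 * np.eye(4)                    # vague prior on the initial state

    def predict(self):
        """Propagate one frame ahead and return the predicted ROI centre."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def update(self, z):
        """Correct the state with a detected centre z = (x, y)."""
        innovation = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P


def roi_mask(center, half_size, shape):
    """True for pixels kept on inside the square ROI, False for pixels switched off."""
    h, w = shape
    cx, cy = center
    x0, x1 = max(int(cx - half_size), 0), min(int(cx + half_size) + 1, w)
    y0, y1 = max(int(cy - half_size), 0), min(int(cy + half_size) + 1, h)
    mask = np.zeros(shape, dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask


if __name__ == "__main__":
    kf = RoiKalmanPredictor()
    for t in range(20):                  # object moving at (+2, +1) pixels per frame
        kf.predict()
        kf.update((10 + 2 * t, 20 + t))
    nxt = kf.predict()
    print(nxt, roi_mask(nxt, 8, (120, 160)).mean())
```

In this toy, the fraction of `True` pixels in the mask stands in for the share of the sensor left powered. How that maps to energy in real hardware follows the paper's power model, which is not reproduced here.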

preprint · 2021 · arXiv

Loss Estimators Improve Model Generalization

With increased interest in adopting AI methods for clinical diagnosis, a vital step towards safe deployment of such tools is to ensure that the models not only produce accurate predictions but also do not generalize to data regimes where the training data provide no meaningful evidence. Existing approaches for ensuring that the distribution of model predictions is similar to the true distribution rely on explicit uncertainty estimators that are inherently hard to calibrate. In this paper, we propose to train a loss estimator alongside the predictive model, using a contrastive training objective, to directly estimate the prediction uncertainties. Interestingly, we find that, in addition to producing well-calibrated uncertainties, this approach improves the generalization behavior of the predictor. Using a dermatology use case, we show the impact of loss estimators on model generalization, in terms of both fidelity on in-distribution data and the ability to detect out-of-distribution samples or new classes unseen during training.

preprint · 2020 · arXiv

A Regularized Attention Mechanism for Graph Attention Networks

Machine learning models that can exploit the inherent structure in data have gained prominence. In particular, there is a surge in deep learning solutions for graph-structured data, due to its widespread applicability in several fields. Graph attention networks (GAT), a recent addition to the broad class of feature learning models in graphs, utilize the attention mechanism to efficiently learn continuous vector representations for semi-supervised learning problems. In this paper, we perform a detailed analysis of GAT models and present interesting insights into their behavior. In particular, we show that the models are vulnerable to heterogeneous rogue nodes and hence propose novel regularization strategies to improve the robustness of GAT models. Using benchmark datasets, we demonstrate performance improvements on semi-supervised learning, using the proposed robust variant of GAT.

preprint · 2020 · arXiv

Invenio: Discovering Hidden Relationships Between Tasks/Domains Using Structured Meta Learning

Exploiting known semantic relationships between fine-grained tasks is critical to the success of recent model-agnostic approaches. These approaches often rely on meta-optimization to make a model robust to systematic task or domain shifts. However, in practice, the performance of these methods can suffer when there are no coherent semantic relationships between the tasks (or domains). We present Invenio, a structured meta-learning algorithm to infer semantic similarities between a given set of tasks and to provide insights into the complexity of transferring knowledge between different tasks. In contrast to existing techniques such as Task2Vec and Taskonomy, which measure similarities between pre-trained models, our approach employs a novel self-supervised learning strategy to discover these relationships in the training loop and at the same time utilizes them to update task-specific models in the meta-update step. Using challenging task and domain databases, under few-shot learning settings, we show that Invenio can discover intricate dependencies between tasks or domains, and can provide significant gains over existing approaches in terms of generalization performance. The learned semantic structure between tasks/domains from Invenio is interpretable and can be used to construct meaningful priors for tasks or domains.

preprint · 2020 · arXiv

Unsupervised Audio Source Separation using Generative Priors

State-of-the-art under-determined audio source separation systems rely on supervised end-to-end training of carefully tailored neural network architectures operating either in the time or the spectral domain. However, these methods are severely challenged in terms of requiring access to expensive source level labeled data and being specific to a given set of sources and the mixing process, which demands complete re-training when those assumptions change. This strongly emphasizes the need for unsupervised methods that can leverage the recent advances in data-driven modeling, and compensate for the lack of labeled data through meaningful priors. To this end, we propose a novel approach for audio source separation based on generative priors trained on individual sources. Through the use of projected gradient descent optimization, our approach simultaneously searches in the source-specific latent spaces to effectively recover the constituent sources. Though the generative priors can be defined in the time domain directly, e.g., WaveGAN, we find that using spectral domain loss functions for our optimization leads to good-quality source estimates. Our empirical studies on standard spoken digit and instrument datasets clearly demonstrate the effectiveness of our approach over classical as well as state-of-the-art unsupervised baselines.

preprint · 2016 · arXiv

A Deep Learning Approach To Multiple Kernel Fusion

Kernel fusion is a popular and effective approach for combining multiple features that characterize different aspects of data. Traditional approaches for Multiple Kernel Learning (MKL) attempt to learn the parameters for combining the kernels through sophisticated optimization procedures. In this paper, we propose an alternative approach that creates dense embeddings for data using the kernel similarities and adopts a deep neural network architecture for fusing the embeddings. In order to improve the effectiveness of this network, we introduce the kernel dropout regularization strategy coupled with the use of an expanded set of composition kernels. Experimental results on a real-world activity recognition dataset show that the proposed architecture is effective in fusing kernels and achieves state-of-the-art performance.

preprint · 2016 · arXiv

Max Consensus in Sensor Networks: Non-linear Bounded Transmission and Additive Noise

A distributed consensus algorithm for estimating the maximum value of the initial measurements in a sensor network with communication noise is proposed. In the absence of communication noise, max estimation can be done by updating the state value with the largest received measurements in every iteration at each sensor. In the presence of communication noise, however, the maximum estimate will incorrectly drift and the estimate at each sensor will diverge. As a result, a soft-max approximation together with a non-linear consensus algorithm is introduced herein. A design parameter controls the trade-off between the soft-max error and convergence speed. An analysis of this trade-off gives a guideline towards how to choose the design parameter for the max estimate. We also show that if some prior knowledge of the initial measurements is available, the consensus process can converge faster by using an optimal step size in the iterative algorithm. A shifted non-linear bounded transmit function is also introduced for faster convergence when sensor nodes have some prior knowledge of the initial measurements. Simulation results corroborating the theory are also provided.
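
The abstract does not give its soft-max formula. A common choice, assumed here, is the log-sum-exp soft-max (1/β)·log Σ_i exp(β x_i). It satisfies max x_i ≤ soft-max ≤ max x_i + (log N)/β, so a larger design parameter β shrinks the approximation error. The price is a much wider dynamic range for the values exp(β x_i). The soft-max also equals (1/β)·log(N · mean of exp(β x_i)), so sensors can obtain it from an average consensus on exp(β x_i). The sketch below uses plain noiseless linear consensus on a hypothetical ring network. It assumes every node knows the network size N. It does not reproduce the paper's non-linear bounded-transmission update or its noise analysis:

```python
import math
from typing import List, Sequence


def soft_max(values: Sequence[float], beta: float) -> float:
    """Log-sum-exp soft-max (1/beta) * log(sum_i exp(beta * x_i)), computed stably."""
    m = max(values)
    return m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta


def consensus_soft_max(values: Sequence[float], beta: float,
                       neighbors: List[List[int]], step: float = 0.2,
                       iters: int = 500) -> List[float]:
    """Each node averages exp(beta * x_i) with its neighbours, then maps back locally."""
    z = [math.exp(beta * v) for v in values]
    n = len(z)
    for _ in range(iters):
        # noiseless linear (Laplacian) average-consensus iteration
        z = [z[i] + step * sum(z[j] - z[i] for j in neighbors[i]) for i in range(n)]
    # assumes every node knows n, so it can rescale its local average into the soft-max
    return [math.log(n * zi) / beta for zi in z]


if __name__ == "__main__":
    x = [1.0, 3.0, 2.5, 0.5, 2.0]
    ring = [[(i - 1) % 5, (i + 1) % 5] for i in range(5)]
    for beta in (1.0, 4.0, 16.0):
        print(beta, soft_max(x, beta), consensus_soft_max(x, beta, ring)[0])
```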

preprint · 2015 · arXiv

Empirically Estimable Classification Bounds Based on a New Divergence Measure

Information divergence functions play a critical role in statistics and information theory. In this paper we show that a non-parametric f-divergence measure can be used to provide improved bounds on the minimum binary classification probability of error for the case when the training and test data are drawn from the same distribution and for the case where there exists some mismatch between training and test distributions. We confirm the theoretical results by designing feature selection algorithms using the criteria from these bounds and by evaluating the algorithms on a series of pathological speech classification tasks.
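
The abstract does not name the estimator behind its non-parametric divergence. Graph-based estimators of this kind are often built on the Friedman-Rafsky statistic. It pools the samples of the two classes, builds a Euclidean minimum spanning tree (MST), and counts the edges that join points from different classes. Few cross edges suggest well-separated class distributions, while many suggest overlap. The minimal sketch below computes that count with Prim's algorithm. Whether the paper uses exactly this statistic is an assumption, and how any such count maps to the divergence and error bounds is not shown:

```python
import math
from typing import Hashable, Sequence


def mst_cross_edges(points: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> int:
    """Count Euclidean-MST edges whose endpoints carry different labels.

    Uses Prim's algorithm on the complete graph, O(n^2) time, so it suits small samples.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest edge weight from each vertex into the tree
    parent = [-1] * n       # tree vertex at the other end of that edge
    best[0] = 0.0
    cross = 0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0 and labels[parent[u]] != labels[u]:
            cross += 1
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return cross


if __name__ == "__main__":
    separated = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
    interleaved = [(0.0,), (1.0,), (2.0,), (3.0,)]
    print(mst_cross_edges(separated, "aaabbb"), mst_cross_edges(interleaved, "abab"))
```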

preprint · 2015 · arXiv

Undergraduate Signal Processing Laboratories for the Android Operating System

We present a DSP simulation environment that will enable students to perform laboratory exercises using Android mobile devices and tablets. Due to the pervasive nature of mobile technology, education applications designed for mobile devices have the potential to stimulate student interest in addition to offering convenient access and interaction capabilities. This paper describes a portable signal processing laboratory for the Android platform. This software is intended to be an educational tool for students and instructors in DSP and signals and systems courses. The development of Android JDSP (A-JDSP) is carried out using the Android SDK, which is a Java-based open source development platform. The proposed application contains basic DSP functions for convolution, sampling, FFT, filtering and frequency domain analysis, with a convenient graphical user interface. A description of the architecture, functions and planned assessments is presented in this paper.

preprint · 2014 · arXiv

Robust Consensus in the Presence of Impulsive Channel Noise

A distributed average consensus algorithm robust to a wide range of impulsive channel noise distributions is proposed. This work is the first of its kind in the literature to propose a consensus algorithm which relaxes the requirement of finite moments on the communication noise. It is shown that the nodes reach consensus asymptotically to a finite random variable whose expectation is the desired sample average of the initial observations with a variance that depends on the step size of the algorithm and the receiver nonlinear function. The asymptotic performance is characterized by deriving the asymptotic covariance matrix using results from stochastic approximation theory. Simulations corroborate our analytical findings and highlight the robustness of the proposed algorithm.

preprint · 2013 · arXiv

Ensemble Sparse Models for Image Analysis

Sparse representations with learned dictionaries have been successful in several image analysis applications. In this paper, we propose and analyze the framework of ensemble sparse models, and demonstrate their utility in image restoration and unsupervised clustering. The proposed ensemble model approximates the data as a linear combination of approximations from multiple weak sparse models. Theoretical analysis of the ensemble model reveals that even in the worst case, the ensemble can perform better than any of its constituent individual models. The dictionaries corresponding to the individual sparse models are obtained using either random example selection or boosted approaches. Boosted approaches learn one dictionary per round such that the dictionary learned in a particular round is optimized for the training examples having high reconstruction error in the previous round. Results with compressed recovery show that the ensemble representations lead to better performance compared to using a single dictionary obtained with the conventional alternating minimization approach. The proposed ensemble models are also used for single image superresolution, and we show that they perform comparably to the recent approaches. In unsupervised clustering, experiments show that the proposed model performs better than baseline approaches in several standard datasets.

preprint · 2013 · arXiv

Kernel Sparse Models for Automated Tumor Segmentation

In this paper, we propose sparse coding-based approaches for segmentation of tumor regions from MR images. Sparse coding with data-adapted dictionaries has been successfully employed in several image recovery and vision problems. The proposed approaches obtain sparse codes for each pixel in brain magnetic resonance images considering their intensity values and location information. Since it is trivial to obtain pixel-wise sparse codes, and combining multiple features in the sparse coding setup is not straightforward, we propose to perform sparse coding in a high-dimensional feature space where non-linear similarities can be effectively modeled. We use the training data from expert-segmented images to obtain kernel dictionaries with the kernel K-lines clustering procedure. For a test image, sparse codes are computed with these kernel dictionaries, and they are used to identify the tumor regions. This approach is completely automated, and does not require user intervention to initialize the tumor regions in a test image. Furthermore, a low complexity segmentation approach based on kernel sparse codes, which allows the user to initialize the tumor region, is also presented. Results obtained with both the proposed approaches are validated against manual segmentation by an expert radiologist, and the proposed methods lead to accurate tumor identification.

preprint · 2013 · arXiv

Learning Stable Multilevel Dictionaries for Sparse Representations

Sparse representations using learned dictionaries are being increasingly used with success in several data processing and machine learning applications. The availability of abundant training data necessitates the development of efficient, robust and provably good dictionary learning algorithms. Algorithmic stability and generalization are desirable characteristics for dictionary learning algorithms that aim to build global dictionaries which can efficiently model any test data similar to the training samples. In this paper, we propose an algorithm to learn dictionaries for sparse representations from large scale data, and prove that the proposed learning algorithm is stable and generalizable asymptotically. The algorithm employs a 1-D subspace clustering procedure, the K-hyperline clustering, in order to learn a hierarchical dictionary with multiple levels. We also propose an information-theoretic scheme to estimate the number of atoms needed in each level of learning and develop an ensemble approach to learn robust dictionaries. Using the proposed dictionaries, the sparse code for novel test data can be computed using a low-complexity pursuit procedure. We demonstrate the stability and generalization characteristics of the proposed algorithm using simulations. We also evaluate the utility of the multilevel dictionaries in compressed recovery and subspace learning applications.
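
K-hyperline clustering alternates between two steps. First, each sample is assigned to the 1-D subspace (a line through the origin) that best represents it. For a unit-norm atom d, the residual of projecting x onto its line is ||x||² − (dᵀx)², so the best line is the one maximising |dᵀx|. Second, each line is re-fitted as the dominant left singular vector of its assigned samples. The minimal NumPy sketch below covers a single level only. It stores samples as columns and uses a deterministic least-coherent initialisation chosen here for illustration. The paper's initialisation, multilevel residual scheme and atom-count rule are not reproduced:

```python
import numpy as np


def k_hyperline(X: np.ndarray, k: int, iters: int = 20):
    """Fit k lines through the origin to the columns of X (d x n).

    Returns unit-norm atoms D (d x k) and the line index assigned to each column.
    """
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)
    # Deterministic initialisation (illustrative choice): start from the largest-norm
    # sample, then repeatedly add the sample least aligned with the atoms chosen so far.
    idx = [int(np.argmax(np.linalg.norm(X, axis=0)))]
    for _ in range(1, k):
        coherence = np.max(np.abs(Xn[:, idx].T @ Xn), axis=0)
        idx.append(int(np.argmin(coherence)))
    D = Xn[:, idx].copy()

    labels = np.zeros(X.shape[1], dtype=int)
    for _ in range(iters):
        # Assignment: residual ||x||^2 - (d^T x)^2 is smallest where |d^T x| is largest.
        labels = np.argmax(np.abs(D.T @ X), axis=0)
        # Update: the best-fitting line for a cluster is its top left singular vector.
        for j in range(k):
            members = X[:, labels == j]
            if members.shape[1] > 0:       # keep the previous atom if a cluster empties
                U, _, _ = np.linalg.svd(members, full_matrices=False)
                D[:, j] = U[:, 0]
    return D, labels


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    u = np.array([1.0, 0.0, 0.0])
    v = np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)
    X = np.hstack([np.outer(u, rng.normal(size=50)), np.outer(v, rng.normal(size=50))])
    X += 0.01 * rng.normal(size=X.shape)
    D, labels = k_hyperline(X, 2)
    print(np.round(np.abs(D.T @ np.stack([u, v], axis=1)), 3))
```

Atoms are recovered only up to sign, since a line through the origin has no preferred direction.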

preprint · 2013 · arXiv

Multiple Kernel Sparse Representations for Supervised and Unsupervised Learning

In complex visual recognition tasks it is typical to adopt multiple descriptors that describe different aspects of the images to obtain improved recognition performance. Descriptors that have diverse forms can be fused into a unified feature space in a principled manner using kernel methods. Sparse models that generalize well to the test data can be learned in the unified kernel space, and appropriate constraints can be incorporated for application in supervised and unsupervised learning. In this paper, we propose to perform sparse coding and dictionary learning in the multiple kernel space, where the weights of the ensemble kernel are tuned based on graph-embedding principles such that class discrimination is maximized. In our proposed algorithm, dictionaries are inferred using multiple levels of 1-D subspace clustering in the kernel space, and the sparse codes are obtained using a simple levelwise pursuit scheme. Empirical results for object recognition and image clustering show that our algorithm outperforms existing sparse coding based approaches, and compares favorably to other state-of-the-art methods.

preprint · 2013 · arXiv

Non-Linear Distributed Average Consensus using Bounded Transmissions

A distributed average consensus algorithm in which every sensor transmits with bounded peak power is proposed. In the presence of communication noise, it is shown that the nodes reach consensus asymptotically to a finite random variable whose expectation is the desired sample average of the initial observations with a variance that depends on the step size of the algorithm and the variance of the communication noise. The asymptotic performance is characterized by deriving the asymptotic covariance matrix using results from stochastic approximation theory. It is shown that using bounded transmissions results in slower convergence compared to the linear consensus algorithm based on the Laplacian heuristic. Simulations corroborate our analytical findings.
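
The minimal sketch below assumes the update x_i ← x_i + α Σ_{j∈N_i} (h(x_j) − h(x_i)) on an undirected graph, with the bounded transmit function h = tanh (peak amplitude 1). This update, the constant step α and the hypothetical ring network are illustrative choices, not the paper's. The communication noise the paper analyses is omitted, so only the noiseless behaviour is shown. Every edge adds equal and opposite terms to its two endpoints, so the network sum, and hence the average, is preserved at each iteration. Since 0 < tanh′ ≤ 1, the bounded update acts like linear consensus with smaller, state-dependent edge weights. That offers one intuition for the slower convergence the abstract reports:

```python
import math
from typing import Callable, List, Sequence


def consensus(x0: Sequence[float], neighbors: List[List[int]],
              h: Callable[[float], float] = math.tanh,
              step: float = 0.2, iters: int = 3000) -> List[float]:
    """Noiseless consensus where node i receives h(x_j) from each neighbour j."""
    x = list(x0)
    for _ in range(iters):
        hx = [h(v) for v in x]   # every node transmits h(x_i); |tanh| <= 1 bounds the peak
        x = [x[i] + step * sum(hx[j] - hx[i] for j in neighbors[i]) for i in range(len(x))]
    return x


if __name__ == "__main__":
    x0 = [-2.0, 0.5, 3.0, 1.0, -1.0, 4.5]         # sample average is 1.0
    ring = [[(i - 1) % 6, (i + 1) % 6] for i in range(6)]
    linear = consensus(x0, ring, h=lambda v: v, iters=20)
    bounded = consensus(x0, ring, iters=20)
    print(max(abs(v - 1.0) for v in linear), max(abs(v - 1.0) for v in bounded))
```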

preprint · 2013 · arXiv

Recovering Non-negative and Combined Sparse Representations

The non-negative solution to an underdetermined linear system can sometimes be uniquely recovered, even without imposing any additional sparsity constraints. In this paper, we derive conditions under which a unique non-negative solution for such a system can exist, based on the theory of polytopes. Furthermore, we develop the paradigm of combined sparse representations, where only a part of the coefficient vector is constrained to be non-negative, and the rest is unconstrained (general). We analyze the recovery of the unique, sparsest solution, for combined representations, under three different cases of coefficient support knowledge: (a) the non-zero supports of non-negative and general coefficients are known, (b) the non-zero support of general coefficients alone is known, and (c) both the non-zero supports are unknown. For case (c), we propose the combined orthogonal matching pursuit algorithm for coefficient recovery and derive the deterministic sparsity threshold under which recovery of the unique, sparsest coefficient vector is possible. We quantify the order complexity of the algorithms, and examine their performance in exact and approximate recovery of coefficients under various conditions of noise. We also obtain their empirical phase transition characteristics. We show that the basis pursuit algorithm, with partial non-negative constraints, and the proposed greedy algorithm perform better in recovering the unique sparse representation when compared to their unconstrained counterparts. Finally, we demonstrate the utility of the proposed methods in recovering images corrupted by saturation noise.

preprint · 2011 · arXiv

Distributed SNR Estimation using Constant Modulus Signaling over Gaussian Multiple-Access Channels

A sensor network is used for distributed joint mean and variance estimation, in a single time snapshot. Sensors observe a signal embedded in noise; the observations are phase modulated using a constant-modulus scheme and transmitted over a Gaussian multiple-access channel to a fusion center, where the mean and variance are estimated jointly, using an asymptotically minimum-variance estimator, which is shown to decouple into simple individual estimators of the mean and the variance. The constant-modulus phase modulation scheme ensures a fixed transmit power, robust estimation across several sensing noise distributions, as well as an SNR estimate that requires a single set of transmissions from the sensors to the fusion center, unlike the amplify-and-forward approach. The performance of the estimators of the mean and variance is evaluated in terms of asymptotic variance, which is used to evaluate the performance of the SNR estimator in the case of Gaussian, Laplace and Cauchy sensing noise distributions. For each sensing noise distribution, the optimal phase transmission parameters are also determined. The asymptotic relative efficiency of the mean and variance estimators is evaluated. It is shown that among the noise distributions considered, the estimators are asymptotically efficient only when the noise distribution is Gaussian. Simulation results corroborate analytical results.
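
Constant-modulus phase modulation turns the fusion centre's received average into an empirical characteristic function. That core idea can be sketched for the Gaussian sensing-noise case. The sketch makes three simplifying assumptions: each sensor transmits exp(jωx_i), the channel adds no noise, and the fusion centre divides the superimposed signal by the number of sensors. The characteristic function of N(μ, σ²) is φ(ω) = exp(jωμ − ω²σ²/2). It therefore gives μ = arg φ(ω)/ω and σ² = −2 ln|φ(ω)|/ω². The function name, the SNR definition μ²/σ² and the choice ω = 1 are illustrative assumptions. This is not claimed to be the paper's asymptotically minimum-variance estimator:

```python
import cmath
import math
import random
from typing import Sequence, Tuple


def cm_mean_var_snr(observations: Sequence[float], omega: float) -> Tuple[float, float, float]:
    """Estimate (mean, variance, SNR) from unit-modulus transmissions exp(1j*omega*x_i).

    Assumes Gaussian sensing noise and a noiseless channel whose superposition the
    fusion centre divides by the number of sensors.
    """
    # every transmission has |exp(1j*omega*x)| = 1, i.e. fixed power for any x
    phi_hat = sum(cmath.exp(1j * omega * x) for x in observations) / len(observations)
    mean = cmath.phase(phi_hat) / omega                 # unambiguous only if |omega*mean| < pi
    var = -2.0 * math.log(abs(phi_hat)) / omega ** 2
    snr = mean ** 2 / var if var > 0 else math.inf      # guard against a zero or negative estimate
    return mean, var, snr


if __name__ == "__main__":
    random.seed(0)
    xs = [random.gauss(1.0, 0.5) for _ in range(20000)]   # true mean 1, variance 0.25
    print(cm_mean_var_snr(xs, omega=1.0))
```

The mean and variance come from the phase and the modulus of the same received average, so one round of transmissions serves both.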

preprint · 2010 · arXiv

Distributed Detection over Fading MACs with Multiple Antennas at the Fusion Center

A distributed detection problem over fading Gaussian multiple-access channels is considered. Sensors observe a phenomenon and transmit their observations to a fusion center using the amplify and forward scheme. The fusion center has multiple antennas with different channel models considered between the sensors and the fusion center, and different cases of channel state information are assumed at the sensors. The performance is evaluated in terms of the error exponent for each of these cases, where the effect of multiple antennas at the fusion center is studied. It is shown that for zero-mean channels between the sensors and the fusion center when there is no channel information at the sensors, arbitrarily large gains in the error exponent can be obtained with sufficient increase in the number of antennas at the fusion center. In stark contrast, when there is channel information at the sensors, the gain in error exponent due to having multiple antennas at the fusion center is shown to be no more than a factor of 8/π for Rayleigh fading channels between the sensors and the fusion center, independent of the number of antennas at the fusion center, or correlation among noise samples across sensors. Scaling laws for such gains are also provided when both sensors and antennas are increased simultaneously. Simple practical schemes and a numerical method using semidefinite relaxation techniques are presented that utilize the limited possible gains available. Simulations are used to establish the accuracy of the results.

preprint · 2010 · arXiv

On Inequalities Relating the Characteristic Function and Fisher Information

A relationship between the Fisher information and the characteristic function is established with the help of two inequalities. A necessary and sufficient condition for equality is found. These results are used to determine the asymptotic efficiency of a distributed estimation algorithm that uses constant modulus transmissions over Gaussian multiple access channels. The loss in efficiency of the distributed estimation scheme relative to the centralized approach is quantified for different sensing noise distributions. It is shown that the distributed estimator does not incur an efficiency loss if and only if the sensing noise distribution is Gaussian.