Source author record

Nikos Deligiannis

Nikos Deligiannis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Information Theory, math.IT, Machine Learning, Artificial Intelligence, eess.IV, math.OC, Multimedia, Networking and Internet Architecture, Computation and Language, eess.SP, Information Retrieval, Multiagent Systems, Systems and Control

Catalog footprint

What is connected

19 works

14 topics

4 close collaborators

preprint 2026 arXiv

GATA2Floor: Graph attention for floor counting in street-view facades

Automated analysis of building facades from street-level imagery has great potential for urban analytics, energy assessment, and emergency planning. However, it requires reasoning over spatially arranged elements rather than solely isolated detections. In this work, we model each facade as a graph over window/door detections with a vertical prior on edges. Additionally, we introduce GATA2Floor, a multi-head Graph Attention v2 (GATv2) based model that predicts the global floor count of a building and, via learnable cross-attention queries, softly assigns elements to latent floor slots, yielding interpretable outputs and robustness to irregular designs. To mitigate the lack of labeled datasets, we demonstrate that the proposed graph-based reasoning can be applied without annotations by leveraging a lightweight label-free proposal mechanism based on self-supervised features and vision-language scoring. Our approach demonstrates the value of graph-attention-based relational reasoning for facade understanding.

preprint 2022 arXiv

Entropy-Based Feature Extraction For Real-Time Semantic Segmentation

This paper introduces an efficient patch-based computational module, coined Entropy-based Patch Encoder (EPE) module, for resource-constrained semantic segmentation. The EPE module consists of three lightweight fully-convolutional encoders, each extracting features from image patches with a different amount of entropy. Patches with high entropy are processed by the encoder with the largest number of parameters, patches with moderate entropy are processed by the encoder with a moderate number of parameters, and patches with low entropy are processed by the smallest encoder. The intuition behind the module is the following: as patches with high entropy contain more information, they need an encoder with more parameters, unlike low-entropy patches, which can be processed using a small encoder. Consequently, processing part of the patches via the smaller encoders can significantly reduce the computational cost of the module. Experiments show that EPE can boost the performance of existing real-time semantic segmentation models with a slight increase in the computational cost. Specifically, EPE increases the mIOU performance of DFANet A by 0.9% with only a 1.2% increase in the number of parameters and the mIOU performance of EDANet by 1% with a 10% increase in the model parameters.
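The routing idea in this abstract can be illustrated with a minimal sketch. It assumes entropy is measured as the Shannon entropy of a patch's intensity histogram; the function names and the thresholds `low` and `high` are hypothetical and not taken from the paper, which may define and route entropy differently.

```python
import math
from collections import Counter

def patch_entropy(patch):
    """Shannon entropy (in bits) of the intensity histogram of a 2-D patch."""
    pixels = [v for row in patch for v in row]
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def route_patch(patch, low=1.0, high=3.0):
    """Choose an encoder size from the patch entropy (thresholds are hypothetical)."""
    h = patch_entropy(patch)
    if h < low:
        return "small"
    if h < high:
        return "medium"
    return "large"

flat = [[7] * 4 for _ in range(4)]                         # one intensity value
two_tone = [[0, 0, 255, 255] for _ in range(4)]            # two equally likely values
busy = [[r * 4 + c for c in range(4)] for r in range(4)]   # 16 distinct values

for name, patch in [("flat", flat), ("two_tone", two_tone), ("busy", busy)]:
    print(name, route_patch(patch))
```

In this toy setup the constant patch goes to the small encoder and the patch with 16 distinct values goes to the large one, which mirrors the stated intuition that low-entropy patches need fewer parameters.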

preprint 2022 arXiv

Gradient Variance Loss for Structure-Enhanced Image Super-Resolution

Recent success in single image super-resolution (SISR) has been achieved by optimizing deep convolutional neural networks (CNNs) in the image space with the L1 or L2 loss. However, when trained with these loss functions, models usually fail to recover the sharp edges present in the high-resolution (HR) images because the model tends to give a statistical average of potential HR solutions. We observe that gradient maps of images generated by models trained with the L1 or L2 loss have significantly lower variance than the gradient maps of the original high-resolution images. In this work, we propose to alleviate this issue by introducing a structure-enhancing loss function, coined Gradient Variance (GV) loss, that generates textures with perceptually pleasant details. Specifically, during training, we extract patches from the gradient maps of the target and the generated output, calculate the variance of each patch, and form variance maps for these two images. We then minimize the distance between the computed variance maps to force the model to produce high-variance gradient maps, which leads to high-resolution images with sharper edges. Experimental results show that the GV loss can significantly improve both the Structural Similarity (SSIM) and peak signal-to-noise ratio (PSNR) performance of existing image super-resolution (SR) deep learning models.
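The steps described in this abstract (gradient maps, per-patch variance, distance between variance maps) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the forward-difference gradient operator, the non-overlapping 2x2 patches, and the mean squared distance are all choices made here, and the function names are hypothetical.

```python
def gradient_map(img):
    """Forward-difference gradient magnitude |dx| + |dy| (an assumed, simple operator)."""
    h, w = len(img), len(img[0])
    return [[abs(img[y][min(x + 1, w - 1)] - img[y][x]) +
             abs(img[min(y + 1, h - 1)][x] - img[y][x])
             for x in range(w)] for y in range(h)]

def variance_map(grad, patch=2):
    """Variance of each non-overlapping patch x patch block of a gradient map."""
    h, w = len(grad), len(grad[0])
    out = []
    for y in range(0, h - patch + 1, patch):
        row = []
        for x in range(0, w - patch + 1, patch):
            vals = [grad[y + dy][x + dx] for dy in range(patch) for dx in range(patch)]
            mean = sum(vals) / len(vals)
            row.append(sum((v - mean) ** 2 for v in vals) / len(vals))
        out.append(row)
    return out

def gv_loss(output, target, patch=2):
    """Mean squared distance between the two variance maps (distance is an assumption)."""
    vo = variance_map(gradient_map(output), patch)
    vt = variance_map(gradient_map(target), patch)
    diffs = [(a - b) ** 2 for ro, rt in zip(vo, vt) for a, b in zip(ro, rt)]
    return sum(diffs) / len(diffs)

sharp = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]     # step edge
blurry = [[0.0, 0.25, 0.75, 1.0] for _ in range(4)]  # same edge, smoothed

print(gv_loss(sharp, sharp), gv_loss(blurry, sharp))
```

The smoothed edge has a lower gradient variance inside each patch than the step edge, so it is penalized, while an output identical to the target incurs zero loss.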

preprint 2022 arXiv

NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks

Natural language explanation (NLE) models aim at explaining the decision-making process of a black box system by generating natural language sentences that are human-friendly, high-level and fine-grained. Current NLE models explain the decision-making process of a vision or vision-language model (a.k.a. task model), e.g., a VQA model, via a language model (a.k.a. explanation model), e.g., GPT. Besides the additional memory resources and inference time required by the task model, the task and explanation models are completely independent, which disassociates the explanation from the reasoning process made to predict the answer. We introduce NLX-GPT, a general, compact and faithful language model that can simultaneously predict an answer and explain it. We first conduct pre-training on large-scale data of image-caption pairs for general understanding of images, and then formulate the answer as a text prediction task along with the explanation. Without region proposals or a task model, our resulting overall framework attains better evaluation scores, contains far fewer parameters and is 15$\times$ faster than the current SoA model. We then address the problem of evaluating the explanations, which can often be generic and data-biased and can come in several forms. We therefore design 2 new evaluation measures: (1) explain-predict and (2) retrieval-based attack, a self-evaluation framework that requires no labels. Code is at: https://github.com/fawazsammani/nlxgpt.

preprint 2022 arXiv

Representation Learning with Information Theory for COVID-19 Detection

Successful data representation is a fundamental factor in machine learning based medical imaging analysis. Deep Learning (DL) has taken an essential role in robust representation learning. However, deep models can quickly overfit intricate patterns and then fail to generalize to unseen data. We can therefore implement strategies to aid deep models in discovering useful priors from data to learn their intrinsic properties. Our model, which we call a dual role network (DRN), uses a dependency maximization approach based on Least Squared Mutual Information (LSMI). The LSMI leverages dependency measures to ensure representation invariance and local smoothness. While prior works have used information theory measures like mutual information, known to be computationally expensive due to a density estimation step, our LSMI formulation alleviates the issues of intractable mutual information estimation and can be used to approximate it. Experiments on CT based COVID-19 Detection and COVID-19 Severity Detection benchmarks demonstrate the effectiveness of our method.

preprint 2020 arXiv

Interpretable Deep Multimodal Image Super-Resolution

Multimodal image super-resolution (SR) is the reconstruction of a high-resolution image given a low-resolution observation with the aid of another image modality. While existing deep multimodal models do not incorporate domain knowledge about image SR, we present a multimodal deep network design that integrates coupled sparse priors and allows the effective fusion of information from another modality into the reconstruction process. Our method is inspired by a novel iterative algorithm for coupled convolutional sparse coding, resulting in an interpretable network by design. We apply our model to the super-resolution of near-infrared images guided by RGB images. Experimental results show that our model outperforms state-of-the-art methods.

preprint 2020 arXiv

Interpretable Deep Recurrent Neural Networks via Unfolding Reweighted $\ell_1$-$\ell_1$ Minimization: Architecture Design and Generalization Analysis

Deep unfolding methods, for example the learned iterative shrinkage thresholding algorithm (LISTA), design deep neural networks as learned variations of optimization methods. These networks have been shown to achieve faster convergence and higher accuracy than the original optimization methods. In this line of research, this paper develops a novel deep recurrent neural network (coined reweighted-RNN) by unfolding a reweighted $\ell_1$-$\ell_1$ minimization algorithm and applies it to the task of sequential signal reconstruction. To the best of our knowledge, this is the first deep unfolding method that explores reweighted minimization. Due to the underlying reweighted minimization model, our RNN has a different soft-thresholding function (that is, a different activation function) for each hidden unit in each layer. Furthermore, it has higher network expressivity than existing deep unfolding RNN models due to the over-parameterizing weights. Importantly, we establish theoretical generalization error bounds for the proposed reweighted-RNN model by means of Rademacher complexity. The bounds reveal that the parameterization of the proposed reweighted-RNN ensures good generalization. We apply the proposed reweighted-RNN to the problem of video frame reconstruction from low-dimensional measurements, that is, sequential frame reconstruction. The experimental results on the moving MNIST dataset demonstrate that the proposed deep reweighted-RNN significantly outperforms existing RNN models.
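To make "a different soft-thresholding function for each hidden unit" concrete, here is a minimal sketch of the standard soft-thresholding operator applied with one threshold per unit. It is a generic illustration only: the function name and the example thresholds are hypothetical, and the proximal operator actually used for the $\ell_1$-$\ell_1$ problem in the paper may differ from this plain form.

```python
def soft_threshold(values, thresholds):
    """Element-wise soft-thresholding with one threshold per unit.

    Each entry is shrunk toward zero by its own threshold and set to zero
    when its magnitude does not exceed that threshold.
    """
    out = []
    for x, t in zip(values, thresholds):
        if x > t:
            out.append(x - t)
        elif x < -t:
            out.append(x + t)
        else:
            out.append(0.0)
    return out

pre_activation = [3.0, -0.5, -2.0]

# Shared threshold, as in a plain ISTA-style layer.
shared = soft_threshold(pre_activation, [1.0, 1.0, 1.0])

# Per-unit thresholds (hypothetical values), as reweighting would induce.
per_unit = soft_threshold(pre_activation, [1.0, 1.0, 0.5])

print(shared, per_unit)
```

With a shared threshold every unit uses the same activation, whereas per-unit thresholds let each hidden unit shrink its input by a different amount.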

preprint 2020 arXiv

On the Energy Self-Sustainability of IoT via Distributed Compressed Sensing

This paper advocates the use of the distributed compressed sensing (DCS) paradigm to deploy energy harvesting (EH) Internet of Things (IoT) devices for energy self-sustainability. We consider networks with signal/energy models that capture the fact that both the collected signals and the harvested energy of different devices can exhibit correlation. We provide theoretical analysis on the performance of both the classical compressive sensing (CS) approach and the proposed distributed CS (DCS)-based approach to data acquisition for EH IoT. Moreover, we perform an in-depth comparison of the proposed DCS-based approach against the distributed source coding (DSC) system. These performance characterizations and comparisons embody the effect of various system phenomena and parameters including signal correlation, EH correlation, network size, and energy availability level. Our results reveal that the proposed approach offers a significant increase in data gathering capability with respect to the CS-based approach, and a substantial reduction of the mean-squared error distortion with respect to the DSC system.

preprint 2019 arXiv

Deep Coupled-Representation Learning for Sparse Linear Inverse Problems with Side Information

In linear inverse problems, the goal is to recover a target signal from undersampled, incomplete or noisy linear measurements. Typically, the recovery relies on complex numerical optimization methods; recent approaches perform an unfolding of a numerical algorithm into a neural network form, resulting in a substantial reduction of the computational complexity. In this paper, we consider the recovery of a target signal with the aid of a correlated signal, the so-called side information (SI), and propose a deep unfolding model that incorporates SI. The proposed model is used to learn coupled representations of correlated signals from different modalities, enabling the recovery of multimodal data at a low computational cost. As such, our work introduces the first deep unfolding method with SI, which actually comes from a different modality. We apply our model to reconstruct near-infrared images from undersampled measurements given RGB images as SI. Experimental results demonstrate the superior performance of the proposed framework against single-modal deep learning methods that do not use SI, multimodal deep learning designs, and optimization algorithms.

preprint 2016 arXiv

Distributed Coding of Multiview Sparse Sources with Joint Recovery

In support of applications involving multiview sources in distributed object recognition using lightweight cameras, we propose a new method for the distributed coding of sparse sources as visual descriptor histograms extracted from multiview images. The problem is challenging due to the computational and energy constraints at each camera as well as the limitations regarding inter-camera communication. Our approach addresses these challenges by exploiting the sparsity of the visual descriptor histograms as well as their intra- and inter-camera correlations. Our method couples distributed source coding of the sparse sources with a new joint recovery algorithm that incorporates multiple side information signals, where prior knowledge (low quality) of all the sparse sources is initially sent to exploit their correlations. Experimental evaluation using the histograms of scale-invariant feature transform (SIFT) descriptors extracted from multiview images shows that our method leads to bit-rate savings of up to 43% compared to the state-of-the-art distributed compressed sensing method with independent encoding of the sources.

preprint 2016 arXiv

Multi-modal dictionary learning for image separation with application in art investigation

In support of art investigation, we propose a new source separation method that unmixes a single X-ray scan acquired from double-sided paintings. In this problem, the X-ray signals to be separated have similar morphological characteristics, which brings previous source separation methods to their limits. Our solution is to use photographs taken from the front and back-side of the panel to drive the separation process. The crux of our approach relies on the coupling of the two imaging modalities (photographs and X-rays) using a novel coupled dictionary learning framework able to capture both common and disparate features across the modalities using parsimonious representations; the common component models features shared by the multi-modal images, whereas the innovation component captures modality-specific information. As such, our model enables the formulation of appropriately regularized convex optimization procedures that lead to the accurate separation of the X-rays. Our dictionary learning framework can be tailored both to a single- and a multi-scale framework, with the latter leading to a significant performance improvement. Moreover, to improve further on the visual quality of the separated images, we propose to train coupled dictionaries that ignore certain parts of the painting corresponding to craquelure. Experimentation on synthetic and real data - taken from digital acquisition of the Ghent Altarpiece (1432) - confirms the superiority of our method against the state-of-the-art morphological component analysis technique that uses either fixed or trained dictionaries to perform image separation.

preprint 2016 arXiv

X-ray image separation via coupled dictionary learning

In support of art investigation, we propose a new source separation method that unmixes a single X-ray scan acquired from double-sided paintings. Unlike prior source separation methods, which are based on statistical or structural incoherence of the sources, we use visual images taken from the front- and back-side of the panel to drive the separation process. The coupling of the two imaging modalities is achieved via a new multi-scale dictionary learning method. Experimental results demonstrate that our method succeeds in the discrimination of the sources, while state-of-the-art methods fail to do so.

preprint 2015 arXiv

Adaptive-Rate Sparse Signal Reconstruction With Application in Compressive Background Subtraction

We propose and analyze an online algorithm for reconstructing a sequence of signals from a limited number of linear measurements. The signals are assumed sparse, with unknown support, and evolve over time according to a generic nonlinear dynamical model. Our algorithm, based on recent theoretical results for $\ell_1$-$\ell_1$ minimization, is recursive and computes the number of measurements to be taken at each time on-the-fly. As an example, we apply the algorithm to compressive video background subtraction, a problem that can be stated as follows: given a set of measurements of a sequence of images with a static background, simultaneously reconstruct each image while separating its foreground from the background. The performance of our method is illustrated on sequences of real images: we observe that it allows a dramatic reduction in the number of measurements with respect to state-of-the-art compressive background subtraction schemes.

preprint 2015 arXiv

Binary Rate Distortion With Side Information: The Asymmetric Correlation Channel Case

Advancing over up-to-date information theoretic results that assume symmetric correlation models, in this work we consider the problem of lossy binary source coding with side information, where the correlation is expressed by a generic binary asymmetric channel. Specifically, we present an in-depth analysis of rate distortion with side information available to both the encoder and decoder (conventional predictive), as well as the Wyner-Ziv problem for this particular setup. Prompted by our recent results for the Z-channel correlation case, we evaluate the rate loss between the Wyner-Ziv and the conventional predictive coding, as a function of the parameters of the binary asymmetric correlation channel.

preprint 2015 arXiv

Fast Desynchronization For Decentralized Multichannel Medium Access Control

Distributed desynchronization algorithms are key to wireless sensor networks as they allow for medium access control in a decentralized manner. In this paper, we view desynchronization primitives as iterative methods that solve optimization problems. In particular, by formalizing a well established desynchronization algorithm as a gradient descent method, we establish novel upper bounds on the number of iterations required to reach convergence. Moreover, by using Nesterov's accelerated gradient method, we propose a novel desynchronization primitive that provides for faster convergence to the steady state. Importantly, we propose a novel algorithm that leads to decentralized time-synchronous multichannel TDMA coordination by formulating this task as an optimization problem. Our simulations and experiments on a densely-connected IEEE 802.15.4-based wireless sensor network demonstrate that our scheme provides for faster convergence to the steady state, robustness to hidden nodes, higher network throughput and comparable power dissipation with respect to the recently standardized IEEE 802.15.4e-2012 time-synchronized channel hopping (TSCH) scheme.

preprint 2015 arXiv

Vectors of Locally Aggregated Centers for Compact Video Representation

We propose a novel vector aggregation technique for compact video representation, with application in accurate similarity detection within large video datasets. The current state-of-the-art in visual search is formed by the vector of locally aggregated descriptors (VLAD) of Jegou et al. VLAD generates compact video representations based on scale-invariant feature transform (SIFT) vectors (extracted per frame) and local feature centers computed over a training set. With the aim to increase robustness to visual distortions, we propose a new approach that operates at a coarser level in the feature representation. We create vectors of locally aggregated centers (VLAC) by first clustering SIFT features to obtain local feature centers (LFCs) and then encoding the latter with respect to given centers of local feature centers (CLFCs), extracted from a training set. The sums of differences between the LFCs and the CLFCs are aggregated to generate an extremely compact video description used for accurate video segment similarity detection. Experimentation using a video dataset, comprising more than 1000 minutes of content from the Open Video Project, shows that VLAC obtains substantial gains in terms of mean Average Precision (mAP) against VLAD and the hyper-pooling method of Douze et al., under the same compaction factor and the same set of distortions.

preprint 2014 arXiv

Compressed Sensing with Prior Information: Optimal Strategies, Geometry, and Bounds

We address the problem of compressed sensing (CS) with prior information: reconstruct a target CS signal with the aid of a similar signal that is known beforehand, our prior information. We integrate the additional knowledge of the similar signal into CS via L1-L1 and L1-L2 minimization. We then establish bounds on the number of measurements required by these problems to successfully reconstruct the original signal. Our bounds and geometrical interpretations reveal that if the prior information has good enough quality, L1-L1 minimization improves the performance of CS dramatically. In contrast, L1-L2 minimization has a performance very similar to classical CS and brings no significant benefits. All our findings are illustrated with experimental results.
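The abstract names the two programs without writing them out. One common way to state them is given below as a sketch; the symbols are assumptions introduced here and do not come from the abstract: $x$ is the optimization variable, $w$ the prior information, $A$ the measurement matrix, $y = Ax^\star$ the measurements of the target signal $x^\star$, and $\beta > 0$ a trade-off parameter.

```latex
% L1-L1 minimization: the prior information enters through an l1 penalty
\min_{x} \; \|x\|_1 + \beta \, \|x - w\|_1
\quad \text{subject to} \quad A x = y

% L1-L2 minimization: the prior information enters through a squared l2 penalty
\min_{x} \; \|x\|_1 + \frac{\beta}{2} \, \|x - w\|_2^2
\quad \text{subject to} \quad A x = y
```

Setting $\beta = 0$ in either program recovers classical CS (basis pursuit), which is the baseline the stated bounds are compared against.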

preprint 2014 arXiv

Compressed Sensing With Side Information: Geometrical Interpretation and Performance Bounds

We address the problem of Compressed Sensing (CS) with side information. Namely, when reconstructing a target CS signal, we assume access to a similar signal. This additional knowledge, the side information, is integrated into CS via L1-L1 and L1-L2 minimization. We then provide lower bounds on the number of measurements that these problems require for successful reconstruction of the target signal. If the side information has good quality, the number of measurements is significantly reduced via L1-L1 minimization, but not so much via L1-L2 minimization. We provide geometrical interpretations and experimental results illustrating our findings.

preprint 2014 arXiv

Convergence of Desynchronization Primitives in Wireless Sensor Networks: A Stochastic Modeling Approach

Desynchronization approaches in wireless sensor networks converge to time-division multiple access (TDMA) of the shared medium without requiring clock synchronization amongst the wireless sensors, or indeed the presence of a central (coordinator) node. All such methods are based on the principle of reactive listening of periodic "fire" or "pulse" broadcasts: each node updates the time of its fire message broadcasts based on received fire messages from some of the remaining nodes sharing the given spectrum. In this paper, we present a novel framework to estimate the required iterations for convergence to fair TDMA scheduling. Our estimates are fundamentally different from previous conjectures or bounds found in the literature as, for the first time, convergence to TDMA is defined in a stochastic sense. Our analytic results apply to the Desync algorithm and to pulse-coupled oscillator algorithms with inhibitory coupling. The experimental evaluation via iMote2 TinyOS nodes (based on the IEEE 802.15.4 standard) as well as via computer simulations demonstrates that, for the vast majority of settings, our stochastic model is within one standard deviation from the experimentally-observed convergence iterations. The proposed estimates are thus shown to characterize the desynchronization convergence iterations significantly better than existing conjectures or bounds. Therefore, they contribute towards the analytic understanding of how a desynchronization-based system is expected to evolve from random initial conditions to the desynchronized steady state.
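The reactive-listening principle described in this abstract can be illustrated with a small simulation. This is a simplified sketch, not the stochastic model of the paper: it assumes a fully connected network, a unit period, and a synchronous round in which every node moves its firing phase toward the midpoint of the firings just before and just after its own. The jump parameter `alpha` and the function names are hypothetical; the actual Desync algorithm updates nodes one at a time as they fire.

```python
def desync_step(phases, alpha=0.5):
    """One synchronous round over sorted firing phases in a unit period.

    Each node moves a fraction alpha of the way toward the midpoint of its
    two phase neighbours; the first and last nodes wrap around the period.
    """
    n = len(phases)
    new = []
    for i, p in enumerate(phases):
        prev_p = phases[i - 1] if i > 0 else phases[-1] - 1.0
        next_p = phases[i + 1] if i < n - 1 else phases[0] + 1.0
        new.append((1 - alpha) * p + alpha * 0.5 * (prev_p + next_p))
    return new

def gaps(phases):
    """Spacing between consecutive firings, including the wrap-around gap."""
    inner = [b - a for a, b in zip(phases, phases[1:])]
    return inner + [1.0 - (phases[-1] - phases[0])]

phases = [0.00, 0.05, 0.10, 0.50, 0.55]  # clustered initial firing phases
for _ in range(200):
    phases = desync_step(phases)

print([round(g, 6) for g in gaps(phases)])
```

In this toy setting the gaps between consecutive firings become equal, which corresponds to the fair TDMA schedule that the abstract calls the desynchronized steady state. How many iterations that takes from random initial conditions is the quantity the paper estimates.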
