Source author record

Andreas Loukas

Andreas Loukas appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Distributed, Parallel, and Cluster Computing Social and Information Networks Artificial Intelligence Computation and Language Computer Vision Data Structures and Algorithms Information Theory math.DS math.IT math.OC Systems and Control

Catalog footprint

What is connected

12works

12topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

On the generalization of learning algorithms that do not converge

Generalization analyses of deep learning typically assume that the training converges to a fixed point. But, recent results indicate that in practice, the weights of deep neural networks optimized with stochastic gradient descent often oscillate indefinitely. To reduce this discrepancy between theory and practice, this paper focuses on the generalization of neural networks whose training dynamics do not necessarily converge to fixed points. Our main contribution is to propose a notion of statistical algorithmic stability (SAS) that extends classical algorithmic stability to non-convergent algorithms and to study its connection to generalization. This ergodic-theoretic approach leads to new insights when compared to the traditional optimization and learning theory perspectives. We prove that the stability of the time-asymptotic behavior of a learning algorithm relates to its generalization and empirically demonstrate how loss dynamics can provide clues to generalization performance. Our findings provide evidence that networks that "train stably generalize better" even when the training continues indefinitely and the weights do not converge.

preprint2022arXiv

SPECTRE: Spectral Conditioning Helps to Overcome the Expressivity Limits of One-shot Graph Generators

We approach the graph generation problem from a spectral perspective by first generating the dominant parts of the graph Laplacian spectrum and then building a graph matching these eigenvalues and eigenvectors. Spectral conditioning allows for direct modeling of the global and local graph structure and helps to overcome the expressivity and mode collapse issues of one-shot graph generators. Our novel GAN, called SPECTRE, enables the one-shot generation of much larger graphs than previously possible with one-shot models. SPECTRE outperforms state-of-the-art deep autoregressive generators in terms of modeling fidelity, while also avoiding expensive sequential generation and dependence on node ordering. A case in point, in sizable synthetic and real-world graphs SPECTRE achieves a 4-to-170 fold improvement over the best competitor that does not overfit and is 23-to-30 times faster than autoregressive generators.

preprint2021arXiv

Erdos Goes Neural: an Unsupervised Learning Framework for Combinatorial Optimization on Graphs

Combinatorial optimization problems are notoriously challenging for neural networks, especially in the absence of labeled instances. This work proposes an unsupervised learning framework for CO problems on graphs that can provide integral solutions of certified quality. Inspired by Erdos' probabilistic method, we use a neural network to parametrize a probability distribution over sets. Crucially, we show that when the network is optimized w.r.t. a suitably chosen loss, the learned distribution contains, with controlled probability, a low-cost integral solution that obeys the constraints of the combinatorial problem. The probabilistic proof of existence is then derandomized to decode the desired solutions. We demonstrate the efficacy of this approach to obtain valid solutions to the maximum clique problem and to perform local graph clustering. Our method achieves competitive results on both real datasets and synthetic hard instances.

preprint2020arXiv

Dynamic Balanced Graph Partitioning

This paper initiates the study of the classic balanced graph partitioning problem from an online perspective: Given an arbitrary sequence of pairwise communication requests between $n$ nodes, with patterns that may change over time, the objective is to service these requests efficiently by partitioning the nodes into $\ell$ clusters, each of size $k$, such that frequently communicating nodes are located in the same cluster. The partitioning can be updated dynamically by migrating nodes between clusters. The goal is to devise online algorithms which jointly minimize the amount of inter-cluster communication and migration cost. The problem features interesting connections to other well-known online problems. For example, scenarios with $\ell=2$ generalize online paging, and scenarios with $k=2$ constitute a novel online variant of maximum matching. We present several lower bounds and algorithms for settings both with and without cluster-size augmentation. In particular, we prove that any deterministic online algorithm has a competitive ratio of at least $k$, even with significant augmentation. Our main algorithmic contributions are an $O(k \log{k})$-competitive deterministic algorithm for the general setting with constant augmentation, and a constant competitive algorithm for the maximum matching variant.

preprint2020arXiv

On the Relationship between Self-Attention and Convolutional Layers

Recent trends of incorporating attention mechanisms in vision have led researchers to reconsider the supremacy of convolutional layers as a primary building block. Beyond helping CNNs to handle long-range dependencies, Ramachandran et al. (2019) showed that attention can completely replace convolution and achieve state-of-the-art performance on vision tasks. This raises the question: do learned attention layers operate similarly to convolutional layers? This work provides evidence that attention layers can perform convolution and, indeed, they often learn to do so in practice. Specifically, we prove that a multi-head self-attention layer with sufficient number of heads is at least as expressive as any convolutional layer. Our numerical experiments then show that self-attention layers attend to pixel-grid patterns similarly to CNN layers, corroborating our analysis. Our code is publicly available.

preprint2020arXiv

The role of invariance in spectral complexity-based generalization bounds

Deep convolutional neural networks (CNNs) have been shown to be able to fit a random labeling over data while still being able to generalize well for normal labels. Describing CNN capacity through a posteriori measures of complexity has been recently proposed to tackle this apparent paradox. These complexity measures are usually validated by showing that they correlate empirically with GE; being empirically larger for networks trained on random vs normal labels. Focusing on the case of spectral complexity we investigate theoretically and empirically the insensitivity of the complexity measure to invariances relevant to CNNs, and show several limitations of spectral complexity that occur as a result. For a specific formulation of spectral complexity we show that it results in the same upper bound complexity estimates for convolutional and locally connected architectures (which don't have the same favorable invariance properties). This is contrary to common intuition and empirical results.

preprint2020arXiv

What graph neural networks cannot learn: depth vs width

This paper studies the expressive power of graph neural networks falling within the message-passing framework (GNNmp). Two results are presented. First, GNNmp are shown to be Turing universal under sufficient conditions on their depth, width, node attributes, and layer expressiveness. Second, it is discovered that GNNmp can lose a significant portion of their power when their depth and width is restricted. The proposed impossibility statements stem from a new technique that enables the repurposing of seminal results from distributed computing and leads to lower bounds for an array of decision, optimization, and estimation problems involving graphs. Strikingly, several of these problems are deemed impossible unless the product of a GNNmp's depth and width exceeds a polynomial of the graph size; this dependence remains significant even for tasks that appear simple or when considering approximation.

preprint2016arXiv

Frequency Analysis of Temporal Graph Signals

This letter extends the concept of graph-frequency to graph signals that evolve with time. Our goal is to generalize and, in fact, unify the familiar concepts from time- and graph-frequency analysis. To this end, we study a joint temporal and graph Fourier transform (JFT) and demonstrate its attractive properties. We build on our results to create filters which act on the joint (temporal and graph) frequency domain, and show how these can be used to perform interference cancellation. The proposed algorithms are distributed, have linear complexity, and can approximate any desired joint filtering objective.

preprint2016arXiv

Predicting the evolution of stationary graph signals

An emerging way of tackling the dimensionality issues arising in the modeling of a multivariate process is to assume that the inherent data structure can be captured by a graph. Nevertheless, though state-of-the-art graph-based methods have been successful for many learning tasks, they do not consider time-evolving signals and thus are not suitable for prediction. Based on the recently introduced joint stationarity framework for time-vertex processes, this letter considers multivariate models that exploit the graph topology so as to facilitate the prediction. The resulting method yields similar accuracy to the joint (time-graph) mean-squared error estimator but at lower complexity, and outperforms purely time-based methods.

preprint2016arXiv

Towards stationary time-vertex signal processing

Graph-based methods for signal processing have shown promise for the analysis of data exhibiting irregular structure, such as those found in social, transportation, and sensor networks. Yet, though these systems are often dynamic, state-of-the-art methods for signal processing on graphs ignore the dimension of time, treating successive graph signals independently or taking a global average. To address this shortcoming, this paper considers the statistical analysis of time-varying graph signals. We introduce a novel definition of joint (time-vertex) stationarity, which generalizes the classical definition of time stationarity and the more recent definition appropriate for graphs. Joint stationarity gives rise to a scalable Wiener optimization framework for joint denoising, semi-supervised learning, or more generally inversing a linear operator, that is provably optimal. Experimental results on real weather data demonstrate that taking into account graph and time dimensions jointly can yield significant accuracy improvements in the reconstruction effort.

preprint2015arXiv

Distributed Autoregressive Moving Average Graph Filters

We introduce the concept of autoregressive moving average (ARMA) filters on a graph and show how they can be implemented in a distributed fashion. Our graph filter design philosophy is independent of the particular graph, meaning that the filter coefficients are derived irrespective of the graph. In contrast to finite-impulse response (FIR) graph filters, ARMA graph filters are robust against changes in the signal and/or graph. In addition, when time-varying signals are considered, we prove that the proposed graph filters behave as ARMA filters in the graph domain and, depending on the implementation, as first or higher ARMA filters in the time domain.

preprint2015arXiv

Spinner: Scalable Graph Partitioning in the Cloud

Several organizations, like social networks, store and routinely analyze large graphs as part of their daily operation. Such graphs are typically distributed across multiple servers, and graph partitioning is critical for efficient graph management. Existing partitioning algorithms focus on finding graph partitions with good locality, but disregard the pragmatic challenges of integrating partitioning into large-scale graph management systems deployed on the cloud, such as dealing with the scale and dynamicity of the graph and the compute environment. In this paper, we propose Spinner, a scalable and adaptive graph partitioning algorithm based on label propagation designed on top of the Pregel model. Spinner scales to massive graphs, produces partitions with locality and balance comparable to the state-of-the-art and efficiently adapts the partitioning upon changes. We describe our algorithm and its implementation in the Pregel programming model that makes it possible to partition billion-vertex graphs. We evaluate Spinner with a variety of synthetic and real graphs and show that it can compute partitions with quality comparable to the state-of-the art. In fact, by using Spinner in conjunction with the Giraph graph processing engine, we speed up different applications by a factor of 2 relative to standard hash partitioning.

Andreas Loukas

What is connected

Connect this record

See the researcher in context

Building this map preview

12 published item(s)

On the generalization of learning algorithms that do not converge

SPECTRE: Spectral Conditioning Helps to Overcome the Expressivity Limits of One-shot Graph Generators

Erdos Goes Neural: an Unsupervised Learning Framework for Combinatorial Optimization on Graphs

Dynamic Balanced Graph Partitioning

On the Relationship between Self-Attention and Convolutional Layers

The role of invariance in spectral complexity-based generalization bounds

What graph neural networks cannot learn: depth vs width

Frequency Analysis of Temporal Graph Signals

Predicting the evolution of stationary graph signals

Towards stationary time-vertex signal processing

Distributed Autoregressive Moving Average Graph Filters

Spinner: Scalable Graph Partitioning in the Cloud