Researcher profile

Michalis Vazirgiannis

Michalis Vazirgiannis contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
29works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

29 published item(s)

preprint2026arXiv

GraViti: Graph-Level Variational Autoencoders with Relaxed Permutation Invariance

We introduce GraViti, a transformer-based graph-level variational autoencoder that maps entire graphs to compact latent vectors. This design produces a true graph-level latent space that supports smooth interpolation, property-guided search, and other downstream tasks beyond the constraints of node-level embeddings. On molecular benchmarks, GraViti learns to decode valid samples that follow the chemical constraints present in the training data, showing that the model recovers domain rules directly from graph-level representations. We also show that, in domains where a reliable canonical node ordering exists such as molecules or bayesian networks, enforcing permutation invariance can prove detrimental for consistent reconstruction. GraViti achieves state-of-the-art reconstruction accuracy on large datasets, and provides solid generative performance. Its single-step decoding offers a lightweight alternative to more complex generation pipelines while maintaining practical sample quality.

preprint2026arXiv

Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR

Large language models (LLMs) trained for step-by-step reasoning often become excessively verbose, raising inference cost. Standard Reinforcement Learning with Verifiable Rewards (RLVR) pipelines filter out ``easy'' problems for training efficiency, leaving the model to train primarily on harder problems that require longer reasoning chains. This skews the output length distribution upward, resulting in a \textbf{model that conflates ``thinking longer'' with ``thinking better''}. In this work, we show that retaining and modestly up-weighting moderately easy problems acts as an implicit length regularizer. Exposing the model to solvable short-chain tasks constrains its output distribution and prevents runaway verbosity. The result is \textbf{\emph{emergent brevity for free}}: the model learns to solve harder problems without inflating the output length, \textbf{ despite the absence of any explicit length penalization}. RLVR experiments using this approach on \textit{Qwen3-4B-Thinking-2507} (with a 16k token limit) achieve baseline pass@1 AIME25 accuracy while generating solutions that are, on average, nearly twice as short. The code is available at \href{https://github.com/MBZUAI-Paris/Frugal-AI}{GitHub}, with datasets and models on \href{https://huggingface.co/collections/MBZUAI-Paris/k2-think-mini-68dcfa8b114686a4bd3dc2bc}{Hugging Face}.

preprint2026arXiv

YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation

Steering Large Language Models (LLMs) through activation interventions has emerged as a lightweight alternative to fine-tuning for alignment and personalization. Recent work on Bi-directional Preference Optimization (BiPO) shows that dense steering vectors can be learned directly from preference data in a Direct Preference Optimization (DPO) fashion, enabling control over truthfulness, hallucinations, and safety behaviors. However, dense steering vectors often entangle multiple latent factors due to neuron multi-semanticity, limiting their effectiveness and stability in fine-grained settings such as cultural alignment, where closely related values and behaviors (e.g., among Middle Eastern cultures) must be distinguished. In this paper, we propose Yet another Policy Optimization (YaPO), a \textit{reference-free} method that learns \textit{sparse steering vectors} in the latent space of a Sparse Autoencoder (SAE). By optimizing sparse codes, YaPO produces disentangled, interpretable, and efficient steering directions. Empirically, we show that YaPO converges faster, achieves stronger performance, and exhibits improved training stability compared to dense steering baselines. Beyond cultural alignment, YaPO generalizes to a range of alignment-related behaviors, including hallucination, wealth-seeking, jailbreak, and power-seeking. Importantly, YaPO preserves general knowledge, with no measurable degradation on MMLU. Overall, our results show that YaPO provides a general recipe for efficient, stable, and fine-grained alignment of LLMs, with broad applications to controllability and domain adaptation. The associated code and data are publicly available\footnote{https://github.com/MBZUAI-Paris/YaPO}.

preprint2022arXiv

A Degeneracy Framework for Scalable Graph Autoencoders

In this paper, we present a general framework to scale graph autoencoders (AE) and graph variational autoencoders (VAE). This framework leverages graph degeneracy concepts to train models only from a dense subset of nodes instead of using the entire graph. Together with a simple yet effective propagation mechanism, our approach significantly improves scalability and training speed while preserving performance. We evaluate and discuss our method on several variants of existing graph AE and VAE, providing the first application of these models to large graphs with up to millions of nodes and edges. We achieve empirically competitive results w.r.t. several popular scalable node embedding methods, which emphasizes the relevance of pursuing further research towards more scalable graph AE and VAE.

preprint2022arXiv

AraBART: a Pretrained Arabic Sequence-to-Sequence Model for Abstractive Summarization

Like most natural language understanding and generation tasks, state-of-the-art models for summarization are transformer-based sequence-to-sequence architectures that are pretrained on large corpora. While most existing models focused on English, Arabic remained understudied. In this paper we propose AraBART, the first Arabic model in which the encoder and the decoder are pretrained end-to-end, based on BART. We show that AraBART achieves the best performance on multiple abstractive summarization datasets, outperforming strong baselines including a pretrained Arabic BERT-based model and multilingual mBART and mT5 models.

preprint2022arXiv

Evaluation Of Word Embeddings From Large-Scale French Web Content

Distributed word representations are popularly used in many tasks in natural language processing. Adding that pretrained word vectors on huge text corpus achieved high performance in many different NLP tasks. This paper introduces multiple high-quality word vectors for the French language where two of them are trained on massive crawled French data during this study and the others are trained on an already existing French corpus. We also evaluate the quality of our proposed word vectors and the existing French word vectors on the French word analogy task. In addition, we do the evaluation on multiple real NLP tasks that shows the important performance enhancement of the pre-trained word vectors compared to the existing and random ones. Finally, we created a demo web application to test and visualize the obtained word embeddings. The produced French word embeddings are available to the public, along with the finetuning code on the NLU tasks and the demo code.

preprint2022arXiv

Exploratory Analysis of Academic Collaborations between French and US

International academic collaborations cultivate diversity in the research landscape and facilitate multiperspective methods, as the scope of each country's science depends on its needs, history, wealth etc. Moreover the quality of science differ significantly amongst nations\cite{king2004scientific}, which renders international collaborations a potential source to understand the dynamics between countries and their advancements. Analyzing these collaborations can reveal sharing expertise between two countries in different fields, the most well-known institutions of a nation, the overall success of collaborative efforts compared to local ones etc. Such analysis were initially performed using statistical metrics \cite{melin1996studying}, but network analysis has later proven much more expressive \cite{wagner2005mapping,gonzalez2008coauthorship}. In this exploratory analysis, we aim to examine the collaboration patterns between French and US institutions. Towards this, we capitalize on the Microsoft Academic Graph MAG \cite{sinha2015overview}, the largest open bibliographic dataset that contains detailed information for authors, publications and institutions. We use the coordinates of the world map to tally affiliations to France or USA. In cases where the coordinates of an affiliation were absent, we used its Wikipedia url and named entity recognition to identify the country of its address in the Wikipedia page. We need to stress that institute names have been volatile (due to University federations created) in the last decade in France, so this is a best effort trial. The results indicate an intensive and increasing scientific production in with , with certain institutions such as Harvard, MIT and CNRS standing out.

preprint2022arXiv

Gravity-Inspired Graph Autoencoders for Directed Link Prediction

Graph autoencoders (AE) and variational autoencoders (VAE) recently emerged as powerful node embedding methods. In particular, graph AE and VAE were successfully leveraged to tackle the challenging link prediction problem, aiming at figuring out whether some pairs of nodes from a graph are connected by unobserved edges. However, these models focus on undirected graphs and therefore ignore the potential direction of the link, which is limiting for numerous real-life applications. In this paper, we extend the graph AE and VAE frameworks to address link prediction in directed graphs. We present a new gravity-inspired decoder scheme that can effectively reconstruct directed graphs from a node embedding. We empirically evaluate our method on three different directed link prediction tasks, for which standard graph AE and VAE perform poorly. We achieve competitive results on three real-world graphs, outperforming several popular baselines.

preprint2022arXiv

Hcore-Init: Neural Network Initialization based on Graph Degeneracy

Neural networks are the pinnacle of Artificial Intelligence, as in recent years we witnessed many novel architectures, learning and optimization techniques for deep learning. Capitalizing on the fact that neural networks inherently constitute multipartite graphs among neuron layers, we aim to analyze directly their structure to extract meaningful information that can improve the learning process. To our knowledge graph mining techniques for enhancing learning in neural networks have not been thoroughly investigated. In this paper we propose an adapted version of the k-core structure for the complete weighted multipartite graph extracted from a deep learning architecture. As a multipartite graph is a combination of bipartite graphs, that are in turn the incidence graphs of hypergraphs, we design k-hypercore decomposition, the hypergraph analogue of k-core degeneracy. We applied k-hypercore to several neural network architectures, more specifically to convolutional neural networks and multilayer perceptrons for image recognition tasks after a very short pretraining. Then we used the information provided by the hypercore numbers of the neurons to re-initialize the weights of the neural network, thus biasing the gradient optimization scheme. Extensive experiments proved that k-hypercore outperforms the state-of-the-art initialization methods.

preprint2022arXiv

Image Keypoint Matching using Graph Neural Networks

Image matching is a key component of many tasks in computer vision and its main objective is to find correspondences between features extracted from different natural images. When images are represented as graphs, image matching boils down to the problem of graph matching which has been studied intensively in the past. In recent years, graph neural networks have shown great potential in the graph matching task, and have also been applied to image matching. In this paper, we propose a graph neural network for the problem of image matching. The proposed method first generates initial soft correspondences between keypoints using localized node embeddings and then iteratively refines the initial correspondences using a series of graph neural network layers. We evaluate our method on natural image datasets with keypoint annotations and show that, in comparison to a state-of-the-art model, our method speeds up inference times without sacrificing prediction accuracy.

preprint2022arXiv

JuriBERT: A Masked-Language Model Adaptation for French Legal Text

Language models have proven to be very useful when adapted to specific domains. Nonetheless, little research has been done on the adaptation of domain-specific BERT models in the French language. In this paper, we focus on creating a language model adapted to French legal text with the goal of helping law professionals. We conclude that some specific tasks do not benefit from generic language models pre-trained on large amounts of data. We explore the use of smaller architectures in domain-specific sub-languages and their benefits for French legal text. We prove that domain-specific pre-trained models can perform better than their equivalent generalised ones in the legal domain. Finally, we release JuriBERT, a new set of BERT models adapted to the French legal domain.

preprint2022arXiv

Learning Parametrised Graph Shift Operators

In many domains data is currently represented as graphs and therefore, the graph representation of this data becomes increasingly important in machine learning. Network data is, implicitly or explicitly, always represented using a graph shift operator (GSO) with the most common choices being the adjacency, Laplacian matrices and their normalisations. In this paper, a novel parametrised GSO (PGSO) is proposed, where specific parameter values result in the most commonly used GSOs and message-passing operators in graph neural network (GNN) frameworks. The PGSO is suggested as a replacement of the standard GSOs that are used in state-of-the-art GNN architectures and the optimisation of the PGSO parameters is seamlessly included in the model training. It is proved that the PGSO has real eigenvalues and a set of real eigenvectors independent of the parameter values and spectral bounds on the PGSO are derived. PGSO parameters are shown to adapt to the sparsity of the graph structure in a study on stochastic blockmodel networks, where they are found to automatically replicate the GSO regularisation found in the literature. On several real-world datasets the accuracy of state-of-the-art GNN architectures is improved by the inclusion of the PGSO in both node- and graph-classification tasks.

preprint2022arXiv

Modularity-Aware Graph Autoencoders for Joint Community Detection and Link Prediction

Graph autoencoders (GAE) and variational graph autoencoders (VGAE) emerged as powerful methods for link prediction. Their performances are less impressive on community detection problems where, according to recent and concurring experimental evaluations, they are often outperformed by simpler alternatives such as the Louvain method. It is currently still unclear to which extent one can improve community detection with GAE and VGAE, especially in the absence of node features. It is moreover uncertain whether one could do so while simultaneously preserving good performances on link prediction. In this paper, we show that jointly addressing these two tasks with high accuracy is possible. For this purpose, we introduce and theoretically study a community-preserving message passing scheme, doping our GAE and VGAE encoders by considering both the initial graph structure and modularity-based prior communities when computing embedding spaces. We also propose novel training and optimization strategies, including the introduction of a modularity-inspired regularizer complementing the existing reconstruction losses for joint link prediction and community detection. We demonstrate the empirical effectiveness of our approach, referred to as Modularity-Aware GAE and VGAE, through in-depth experimental validation on various real-world graphs.

preprint2022arXiv

Permute Me Softly: Learning Soft Permutations for Graph Representations

Graph neural networks (GNNs) have recently emerged as a dominant paradigm for machine learning with graphs. Research on GNNs has mainly focused on the family of message passing neural networks (MPNNs). Similar to the Weisfeiler-Leman (WL) test of isomorphism, these models follow an iterative neighborhood aggregation procedure to update vertex representations, and they next compute graph representations by aggregating the representations of the vertices. Although very successful, MPNNs have been studied intensively in the past few years. Thus, there is a need for novel architectures which will allow research in the field to break away from MPNNs. In this paper, we propose a new graph neural network model, so-called $π$-GNN which learns a "soft" permutation (i.e., doubly stochastic) matrix for each graph, and thus projects all graphs into a common vector space. The learned matrices impose a "soft" ordering on the vertices of the input graphs, and based on this ordering, the adjacency matrices are mapped into vectors. These vectors can be fed into fully-connected or convolutional layers to deal with supervised learning tasks. In case of large graphs, to make the model more efficient in terms of running time and memory, we further relax the doubly stochastic matrices to row stochastic matrices. We empirically evaluate the model on graph classification and graph regression datasets and show that it achieves performance competitive with state-of-the-art models.

preprint2022arXiv

Political Communities on Twitter: Case Study of the 2022 French Presidential Election

With the significant increase in users on social media platforms, a new means of political campaigning has appeared. Twitter and Facebook are now notable campaigning tools during elections. Indeed, the candidates and their parties now take to the internet to interact and spread their ideas. In this paper, we aim to identify political communities formed on Twitter during the 2022 French presidential election and analyze each respective community. We create a large-scale Twitter dataset containing 1.2 million users and 62.6 million tweets that mention keywords relevant to the election. We perform community detection on a retweet graph of users and propose an in-depth analysis of the stance of each community. Finally, we attempt to detect offensive tweets and automatic bots, comparing across communities in order to gain insight into each candidate's supporter demographics and online campaign strategy.

preprint2022arXiv

Sparsifying the Update Step in Graph Neural Networks

Message-Passing Neural Networks (MPNNs), the most prominent Graph Neural Network (GNN) framework, celebrate much success in the analysis of graph-structured data. Concurrently, the sparsification of Neural Network models attracts a great amount of academic and industrial interest. In this paper we conduct a structured, empirical study of the effect of sparsification on the trainable part of MPNNs known as the Update step. To this end, we design a series of models to successively sparsify the linear transform in the Update step. Specifically, we propose the ExpanderGNN model with a tuneable sparsification rate and the Activation-Only GNN, which has no linear transform in the Update step. In agreement with a growing trend in the literature the sparsification paradigm is changed by initialising sparse neural network architectures rather than expensively sparsifying already trained architectures. Our novel benchmark models enable a better understanding of the influence of the Update step on model performance and outperform existing simplified benchmark models such as the Simple Graph Convolution. The ExpanderGNNs, and in some cases the Activation-Only models, achieve performance on par with their vanilla counterparts on several downstream tasks, while containing significantly fewer trainable parameters. Our code is publicly available at: https://github.com/ChangminWu/ExpanderGNN.

preprint2022arXiv

Time Series Forecasting Models Copy the Past: How to Mitigate

Time series forecasting is at the core of important application domains posing significant challenges to machine learning algorithms. Recently neural network architectures have been widely applied to the problem of time series forecasting. Most of these models are trained by minimizing a loss function that measures predictions' deviation from the real values. Typical loss functions include mean squared error (MSE) and mean absolute error (MAE). In the presence of noise and uncertainty, neural network models tend to replicate the last observed value of the time series, thus limiting their applicability to real-world data. In this paper, we provide a formal definition of the above problem and we also give some examples of forecasts where the problem is observed. We also propose a regularization term penalizing the replication of previously seen values. We evaluate the proposed regularization term both on synthetic and real-world datasets. Our results indicate that the regularization term mitigates to some extent the aforementioned problem and gives rise to more robust models.

preprint2021arXiv

BARThez: a Skilled Pretrained French Sequence-to-Sequence Model

Inductive transfer learning has taken the entire NLP field by storm, with models such as BERT and BART setting new state of the art on countless NLU tasks. However, most of the available models and research have been conducted for English. In this work, we introduce BARThez, the first large-scale pretrained seq2seq model for French. Being based on BART, BARThez is particularly well-suited for generative tasks. We evaluate BARThez on five discriminative tasks from the FLUE benchmark and two generative tasks from a novel summarization dataset, OrangeSum, that we created for this research. We show BARThez to be very competitive with state-of-the-art BERT-based French language models such as CamemBERT and FlauBERT. We also continue the pretraining of a multilingual BART on BARThez' corpus, and show our resulting model, mBARThez, to significantly boost BARThez' generative performance. Code, data and models are publicly available.

preprint2021arXiv

Ego-based Entropy Measures for Structural Representations on Graphs

Machine learning on graph-structured data has attracted high research interest due to the emergence of Graph Neural Networks (GNNs). Most of the proposed GNNs are based on the node homophily, i.e neighboring nodes share similar characteristics. However, in many complex networks, nodes that lie to distant parts of the graph share structurally equivalent characteristics and exhibit similar roles (e.g chemical properties of distant atoms in a molecule, type of social network users). A growing literature proposed representations that identify structurally equivalent nodes. However, most of the existing methods require high time and space complexity. In this paper, we propose VNEstruct, a simple approach, based on entropy measures of the neighborhood's topology, for generating low-dimensional structural representations, that is time-efficient and robust to graph perturbations. Empirically, we observe that VNEstruct exhibits robustness on structural role identification tasks. Moreover, VNEstruct can achieve state-of-the-art performance on graph classification, without incorporating the graph structure information in the optimization, in contrast to GNN competitors.

preprint2021arXiv

How COVID-19 Is Changing Our Language : Detecting Semantic Shift in Twitter Word Embeddings

Words are malleable objects, influenced by events that are reflected in written texts. Situated in the global outbreak of COVID-19, our research aims at detecting semantic shifts in social media language triggered by the health crisis. With COVID-19 related big data extracted from Twitter, we train separate word embedding models for different time periods after the outbreak. We employ an alignment-based approach to compare these embeddings with a general-purpose Twitter embedding unrelated to COVID-19. We also compare our trained embeddings among them to observe diachronic evolution. Carrying out case studies on a set of words chosen by topic detection, we verify that our alignment approach is valid. Finally, we quantify the size of global semantic shift by a stability measure based on back-and-forth rotational alignment.

preprint2020arXiv

Ego-based Entropy Measures for Structural Representations

In complex networks, nodes that share similar structural characteristics often exhibit similar roles (e.g type of users in a social network or the hierarchical position of employees in a company). In order to leverage this relationship, a growing literature proposed latent representations that identify structurally equivalent nodes. However, most of the existing methods require high time and space complexity. In this paper, we propose VNEstruct, a simple approach for generating low-dimensional structural node embeddings, that is both time efficient and robust to perturbations of the graph structure. The proposed approach focuses on the local neighborhood of each node and employs the Von Neumann entropy, an information-theoretic tool, to extract features that capture the neighborhood's topology. Moreover, on graph classification tasks, we suggest the utilization of the generated structural embeddings for the transformation of an attributed graph structure into a set of augmented node attributes. Empirically, we observe that the proposed approach exhibits robustness on structural role identification tasks and state-of-the-art performance on graph classification tasks, while maintaining very high computational speed.

preprint2020arXiv

Evaluation of Greek Word Embeddings

Since word embeddings have been the most popular input for many NLP tasks, evaluating their quality is of critical importance. Most research efforts are focusing on English word embeddings. This paper addresses the problem of constructing and evaluating such models for the Greek language. We created a new word analogy corpus considering the original English Word2vec word analogy corpus and some specific linguistic aspects of the Greek language as well. Moreover, we created a Greek version of WordSim353 corpora for a basic evaluation of word similarities. We tested seven word vector models and our evaluation showed that we are able to create meaningful representations. Last, we discovered that the morphological complexity of the Greek language and polysemy can influence the quality of the resulting word embeddings.

preprint2020arXiv

EvoNet: A Neural Network for Predicting the Evolution of Dynamic Graphs

Neural networks for structured data like graphs have been studied extensively in recent years. To date, the bulk of research activity has focused mainly on static graphs. However, most real-world networks are dynamic since their topology tends to change over time. Predicting the evolution of dynamic graphs is a task of high significance in the area of graph mining. Despite its practical importance, the task has not been explored in depth so far, mainly due to its challenging nature. In this paper, we propose a model that predicts the evolution of dynamic graphs. Specifically, we use a graph neural network along with a recurrent architecture to capture the temporal evolution patterns of dynamic graphs. Then, we employ a generative model which predicts the topology of the graph at the next time step and constructs a graph instance that corresponds to that topology. We evaluate the proposed model on several artificial datasets following common network evolving dynamics, as well as on real-world datasets. Results demonstrate the effectiveness of the proposed model.

preprint2020arXiv

GraKeL: A Graph Kernel Library in Python

The problem of accurately measuring the similarity between graphs is at the core of many applications in a variety of disciplines. Graph kernels have recently emerged as a promising approach to this problem. There are now many kernels, each focusing on different structural aspects of graphs. Here, we present GraKeL, a library that unifies several graph kernels into a common framework. The library is written in Python and adheres to the scikit-learn interface. It is simple to use and can be naturally combined with scikit-learn's modules to build a complete machine learning pipeline for tasks such as graph classification and clustering. The code is BSD licensed and is available at: https://github.com/ysig/GraKeL .

preprint2020arXiv

k-hop Graph Neural Networks

Graph neural networks (GNNs) have emerged recently as a powerful architecture for learning node and graph representations. Standard GNNs have the same expressive power as the Weisfeiler-Leman test of graph isomorphism in terms of distinguishing non-isomorphic graphs. However, it was recently shown that this test cannot identify fundamental graph properties such as connectivity and triangle freeness. We show that GNNs also suffer from the same limitation. To address this limitation, we propose a more expressive architecture, k-hop GNNs, which updates a node's representation by aggregating information not only from its direct neighbors, but from its k-hop neighborhood. We show that the proposed architecture can identify fundamental graph properties. We evaluate the proposed architecture on standard node classification and graph classification datasets. Our experimental evaluation confirms our theoretical findings since the proposed model achieves performance better or comparable to standard GNNs and to state-of-the-art algorithms.

preprint2020arXiv

Performance in the Courtroom: Automated Processing and Visualization of Appeal Court Decisions in France

Artificial Intelligence techniques are already popular and important in the legal domain. We extract legal indicators from judicial judgment to decrease the asymmetry of information of the legal system and the access-to-justice gap. We use NLP methods to extract interesting entities/data from judgments to construct networks of lawyers and judgments. We propose metrics to rank lawyers based on their experience, wins/loss ratio and their importance in the network of lawyers. We also perform community detection in the network of judgments and propose metrics to represent the difficulty of cases capitalising on communities features.

preprint2020arXiv

Predicting conversions in display advertising based on URL embeddings

Online display advertising is growing rapidly in recent years thanks to the automation of the ad buying process. Real-time bidding (RTB) allows the automated trading of ad impressions between advertisers and publishers through real-time auctions. In order to increase the effectiveness of their campaigns, advertisers should deliver ads to the users who are highly likely to be converted (i.e., purchase, registration, website visit, etc.) in the near future. In this study, we introduce and examine different models for estimating the probability of a user converting, given their history of visited URLs. Inspired by natural language processing, we introduce three URL embedding models to compute semantically meaningful URL representations. To demonstrate the effectiveness of the different proposed representation and conversion prediction models, we have conducted experiments on real logged events collected from an advertising platform.

preprint2020arXiv

Rep the Set: Neural Networks for Learning Set Representations

In several domains, data objects can be decomposed into sets of simpler objects. It is then natural to represent each object as the set of its components or parts. Many conventional machine learning algorithms are unable to process this kind of representations, since sets may vary in cardinality and elements lack a meaningful ordering. In this paper, we present a new neural network architecture, called RepSet, that can handle examples that are represented as sets of vectors. The proposed model computes the correspondences between an input set and some hidden sets by solving a series of network flow problems. This representation is then fed to a standard neural network architecture to produce the output. The architecture allows end-to-end gradient-based learning. We demonstrate RepSet on classification tasks, including text categorization, and graph classification, and we show that the proposed neural network achieves performance better or comparable to state-of-the-art algorithms.

preprint2020arXiv

Simple and Effective Graph Autoencoders with One-Hop Linear Models

Over the last few years, graph autoencoders (AE) and variational autoencoders (VAE) emerged as powerful node embedding methods, with promising performances on challenging tasks such as link prediction and node clustering. Graph AE, VAE and most of their extensions rely on multi-layer graph convolutional networks (GCN) encoders to learn vector space representations of nodes. In this paper, we show that GCN encoders are actually unnecessarily complex for many applications. We propose to replace them by significantly simpler and more interpretable linear models w.r.t. the direct neighborhood (one-hop) adjacency matrix of the graph, involving fewer operations, fewer parameters and no activation function. For the two aforementioned tasks, we show that this simpler approach consistently reaches competitive performances w.r.t. GCN-based graph AE and VAE for numerous real-world graphs, including all benchmark datasets commonly used to evaluate graph AE and VAE. Based on these results, we also question the relevance of repeatedly using these datasets to compare complex graph AE and VAE.