Researcher profile

Aristides Gionis

Aristides Gionis contributes to research discovery and scholarly infrastructure.

Researcher, Affiliation not imported, Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, Unclaimed author
16 works
0 followers
12 topics
4 close collaborators


Published work

16 published item(s)

Preprint, 2026, arXiv

Streaming Stochastic Submodular Maximization with On-Demand User Requests

We explore a novel problem in streaming submodular maximization, inspired by the dynamics of news-recommendation platforms. We consider a setting where users can visit a news website at any time, and upon each visit, the website must display up to $k$ news items. User interactions are inherently stochastic: each news item presented to the user is consumed with a certain acceptance probability, and each news item covers certain topics. Our goal is to design a streaming algorithm that maximizes the expected total topic coverage. To address this problem, we establish a connection to submodular maximization subject to a matroid constraint. We show that we can effectively adapt previous methods to address our problem when the number of user visits is known in advance or when memory linear in the stream length is available. However, in more realistic scenarios where only an upper bound on the number of visits and sublinear memory are available, these algorithms fail to guarantee any bounded performance. To overcome these limitations, we introduce a new online streaming algorithm that achieves a competitive ratio of $1/(8δ)$, where $δ$ controls the approximation quality. Moreover, it requires only a single pass over the stream, and uses memory independent of the stream length. Empirically, our algorithms consistently outperform the baselines.
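To make the objective in this abstract concrete, here is a minimal Python sketch of expected topic coverage for one visit. It assumes that items are accepted independently, so a topic is covered with probability one minus the product of the rejection probabilities of the displayed items that cover it. That independence assumption, the function names, and the plain offline greedy selection are illustrative choices, not the streaming algorithm from the paper.

```python
from math import prod


def expected_coverage(items):
    """Expected number of covered topics.

    items: list of (acceptance_probability, set_of_topics) pairs.
    Assumes independent acceptance of each displayed item (an assumption).
    """
    topics = set().union(*(covered for _, covered in items))
    total = 0.0
    for topic in topics:
        # Probability that no displayed item covering this topic is accepted.
        miss = prod(1.0 - p for p, covered in items if topic in covered)
        total += 1.0 - miss
    return total


def greedy_display(candidates, k):
    """Pick up to k items, each time adding the largest marginal gain.

    This is the textbook offline greedy heuristic, used here only to
    illustrate the objective; it is not a streaming algorithm.
    """
    chosen = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        base = expected_coverage(chosen)
        best = max(
            remaining,
            key=lambda item: expected_coverage(chosen + [item]) - base,
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

For example, two items with acceptance probability 0.5 covering topics {a} and {a, b} give an expected coverage of 0.75 + 0.5 = 1.25 under the independence assumption.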

Preprint, 2025, arXiv

Fair Committee Selection under Ordinal Preferences and Limited Cardinal Information

We study the problem of fair $k$-committee selection under an egalitarian objective. Given $n$ agents partitioned into $m$ groups (e.g., demographic quotas), the goal is to aggregate their preferences to form a committee of size $k$ that guarantees minimum representation from each group while minimizing the maximum cost incurred by any agent. We model this setting as the ordinal fair $k$-center problem, where agents are embedded in an unknown metric space, and each agent reports a complete preference ranking (i.e., ordinal information) over all agents, consistent with the underlying distance metric (i.e., cardinal information). The cost incurred by an agent with respect to a committee is defined as its distance to the closest committee member. The quality of an algorithm is evaluated using the notion of distortion, which measures the worst-case ratio between the cost of the committee produced by the algorithm and the cost of an optimal committee, when given complete access to the underlying metric space. When cardinal information is not available, no constant distortion is possible for the ordinal $k$-center problem, even without fairness constraints, when $k\geq 3$ [Burkhardt et al., AAAI'24]. To overcome this hardness, we allow limited access to cardinal information by querying the metric space. In this setting, our main contribution is a factor-$5$ distortion algorithm that requires only $O(k \log^2 k)$ queries. Along the way, we present an improved factor-$3$ distortion algorithm using $O(k^2)$ queries.

Preprint, 2023, arXiv

Ranking with submodular functions on the fly

Maximizing submodular functions has been studied extensively for a wide range of subset-selection problems. However, much less attention has been given to the role of submodularity in sequence-selection and ranking problems. A recently introduced framework, named maximum submodular ranking (MSR), tackles a family of ranking problems that arise naturally when resources are shared among multiple demands with different budgets. For example, the MSR framework can be used to rank web pages for multiple user intents. In this paper, we extend the MSR framework to the streaming setting. In particular, we consider two different streaming models and we propose practical approximation algorithms. In the first streaming model, called function arriving, we assume that submodular functions (demands) arrive continuously in a stream, while in the second model, called item arriving, we assume that items (resources) arrive continuously. Furthermore, we study the MSR problem with additional constraints on the output sequence, such as a matroid constraint that can ensure fair exposure among items from different groups. These extensions significantly broaden the range of problems that can be captured by the MSR framework. On the practical side, we develop several novel applications based on the MSR formulation, and empirically evaluate the performance of the proposed methods.

Preprint, 2022, arXiv

Coresets remembered and items forgotten: submodular maximization with deletions

In recent years we have witnessed an increase in the development of methods for submodular optimization, which have been motivated by the wide applicability of submodular functions in real-world data-science problems. In this paper, we contribute to this line of work by considering the problem of robust submodular maximization against unexpected deletions, which may occur due to privacy issues or user preferences. Specifically, we consider the minimum number of items an algorithm has to remember, in order to achieve a non-trivial approximation guarantee against adversarial deletion of up to $d$ items. We refer to the set of items that an algorithm has to keep before adversarial deletions as a deletion-robust coreset. Our theoretical contributions are two-fold. First, we propose a single-pass streaming algorithm that yields a $(1-2ε)/(4p)$-approximation for maximizing a non-decreasing submodular function under a general $p$-matroid constraint and requires a coreset of size $k + d/ε$, where $k$ is the maximum size of a feasible solution. To the best of our knowledge, this is the first work to achieve an (asymptotically) optimal coreset, as no constant-factor approximation is possible with a coreset of size sublinear in $d$. Second, we devise an effective offline algorithm that guarantees stronger approximation ratios with a coreset of size $O(d \log(k)/ε)$. We also demonstrate the superior empirical performance of the proposed algorithms in real-life applications.

Preprint, 2022, arXiv

Generalized Leverage Scores: Geometric Interpretation and Applications

In problems involving matrix computations, the concept of leverage has found a large number of applications. In particular, leverage scores, which relate the columns of a matrix to the subspaces spanned by its leading singular vectors, are helpful in revealing column subsets to approximately factorize a matrix with quality guarantees. As such, they provide a solid foundation for a variety of machine-learning methods. In this paper we extend the definition of leverage scores to relate the columns of a matrix to arbitrary subsets of singular vectors. We establish a precise connection between column and singular-vector subsets, by relating the concepts of leverage scores and principal angles between subspaces. We employ this result to design approximation algorithms with provable guarantees for two well-known problems: generalized column subset selection and sparse canonical correlation analysis. We run numerical experiments to provide further insight on the proposed methods. The novel bounds we derive improve our understanding of fundamental concepts in matrix approximations. In addition, our insights may serve as building blocks for further contributions.

Preprint, 2022, arXiv

Improved analysis of randomized SVD for top-eigenvector approximation

Computing the top eigenvectors of a matrix is a problem of fundamental interest to various fields. While the majority of the literature has focused on analyzing the reconstruction error of low-rank matrices associated with the retrieved eigenvectors, in many applications one is interested in finding one vector with high Rayleigh quotient. In this paper we study the problem of approximating the top eigenvector. Given a symmetric matrix $\mathbf{A}$ with largest eigenvalue $λ_1$, our goal is to find a vector $\hat{\mathbf{u}}$ that approximates the leading eigenvector $\mathbf{u}_1$ with high accuracy, as measured by the ratio $R(\hat{\mathbf{u}})=λ_1^{-1}{\hat{\mathbf{u}}^T\mathbf{A}\hat{\mathbf{u}}}/{\hat{\mathbf{u}}^T\hat{\mathbf{u}}}$. We present a novel analysis of the randomized SVD algorithm of Halko et al. (2011) and derive tight bounds in many cases of interest. Notably, this is the first work that provides non-trivial bounds of $R(\hat{\mathbf{u}})$ for randomized SVD with any number of iterations. Our theoretical analysis is complemented with a thorough experimental study that confirms the efficiency and accuracy of the method.
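The accuracy measure in this abstract, the Rayleigh quotient of the candidate vector divided by the largest eigenvalue, can be computed directly. The pure-Python sketch below evaluates it for a vector produced by plain power iteration from a random Gaussian start. Power iteration is used here only as a simple stand-in and is not the randomized SVD procedure analyzed in the paper. The helper names and the requirement that the caller supplies the largest eigenvalue are assumptions made for illustration.

```python
import random


def matvec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rayleigh_ratio(A, u, lambda_1):
    """R(u) = (u^T A u) / (lambda_1 * u^T u) for a symmetric matrix A."""
    Au = matvec(A, u)
    numerator = sum(a * b for a, b in zip(u, Au))
    denominator = sum(a * a for a in u)
    return numerator / (lambda_1 * denominator)


def power_iteration(A, iterations, seed=0):
    """Repeatedly apply A to a random Gaussian vector and renormalize."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in A]
    for _ in range(iterations):
        u = matvec(A, u)
        norm = sum(a * a for a in u) ** 0.5
        u = [a / norm for a in u]
    return u


# A diagonal test matrix whose largest eigenvalue (3.0) is known in advance.
A = [[3.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
u_hat = power_iteration(A, iterations=50)
ratio = rayleigh_ratio(A, u_hat, lambda_1=3.0)
```

A ratio close to 1 means the candidate vector captures almost all of the top eigenvalue; the exact leading eigenvector gives a ratio of exactly 1.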

Preprint, 2022, arXiv

Ranking with submodular functions on a budget

Submodular maximization has been the backbone of many important machine-learning problems, and has applications to viral marketing, diversification, sensor placement, and more. However, the study of maximizing submodular functions has mainly been restricted to the context of selecting a set of items. On the other hand, many real-world applications require a solution that is a ranking over a set of items. The problem of ranking in the context of submodular function maximization has been considered before, but to a much lesser extent than item-selection formulations. In this paper, we explore a novel formulation for ranking items with submodular valuations and budget constraints. We refer to this problem as max-submodular ranking (MSR). In more detail, given a set of items and a set of non-decreasing submodular functions, where each function is associated with a budget, we aim to find a ranking of the set of items that maximizes the sum of values achieved by all functions under the budget constraints. For the MSR problem with cardinality- and knapsack-type budget constraints we propose practical algorithms with approximation guarantees. In addition, we perform an empirical evaluation, which demonstrates the superior performance of the proposed algorithms against strong baselines.

Preprint, 2021, arXiv

Discovering Dense Correlated Subgraphs in Dynamic Networks

Given a dynamic network, where edges appear and disappear over time, we are interested in finding sets of edges that have similar temporal behavior and form a dense subgraph. Formally, we define the problem as the enumeration of the maximal subgraphs that satisfy specific density and similarity thresholds. To measure the similarity of the temporal behavior, we use the correlation between the binary time series that represent the activity of the edges. For the density, we study two variants based on the average degree. For these problem variants we enumerate the maximal subgraphs and compute a compact subset of subgraphs that have limited overlap. We propose an approximate algorithm that scales well with the size of the network, while achieving high accuracy. We evaluate our framework on both real and synthetic datasets. The results on the synthetic data demonstrate the high accuracy of the approximation and show the scalability of the framework.

Preprint, 2021, arXiv

Query the model: precomputations for efficient inference with Bayesian Networks

Variable Elimination is a fundamental algorithm for probabilistic inference over Bayesian networks. In this paper, we propose a novel materialization method for Variable Elimination, which can lead to significant efficiency gains when answering inference queries. We evaluate our technique using real-world Bayesian networks. Our results show that a modest amount of materialization can lead to significant improvements in the running time of queries. Furthermore, in comparison with junction tree methods that also rely on materialization, our approach achieves comparable efficiency during inference using significantly lighter materialization.

Preprint, 2020, arXiv

Diverse Rule Sets

While machine-learning models are flourishing and transforming many aspects of everyday life, the inability of humans to understand complex models makes it difficult for these models to be fully trusted and embraced. Thus, interpretability of models has been recognized as a quality as important as their predictive power. In particular, rule-based systems are experiencing a renaissance owing to their intuitive if-then representation. However, simply being rule-based does not ensure interpretability. For example, overlapping rules spawn ambiguity and hinder interpretation. Here we propose a novel approach of inferring diverse rule sets, by optimizing small overlap among decision rules with a 2-approximation guarantee under the framework of Max-Sum diversification. We formulate the problem as maximizing a weighted sum of discriminative quality and diversity of a rule set. In order to overcome an exponential-size search space of association rules, we investigate several natural options for a small candidate set of high-quality rules, including frequent and accurate rules, and examine their hardness. Leveraging the special structure in our formulation, we then devise an efficient randomized algorithm, which samples rules that are highly discriminative and have small overlap. The proposed sampling algorithm analytically targets a distribution of rules that is tailored to our objective. We demonstrate the superior predictive power and interpretability of our model with a comprehensive empirical study against strong baselines.

Preprint, 2020, arXiv

Explainable Classification of Brain Networks via Contrast Subgraphs

Mining human-brain networks to discover patterns that can be used to discriminate between healthy individuals and patients affected by some neurological disorder is a fundamental task in neuroscience. Learning simple and interpretable models is as important as mere classification accuracy. In this paper we introduce a novel approach for classifying brain networks based on extracting contrast subgraphs, i.e., a set of vertices whose induced subgraphs are dense in one class of graphs and sparse in the other. We formally define the problem and present an algorithmic solution for extracting contrast subgraphs. We then apply our method to a brain-network dataset consisting of children affected by Autism Spectrum Disorder and typically developed children. Our analysis confirms the interestingness of the discovered patterns, which match background knowledge in the neuroscience literature. Further analysis on other classification tasks confirms the simplicity, soundness, and high explainability of our proposal, which also exhibits superior classification accuracy compared to more complex state-of-the-art methods.

Preprint, 2020, arXiv

Finding large balanced subgraphs in signed networks

Signed networks are graphs whose edges are labelled with either a positive or a negative sign, and can be used to capture nuances in interactions that are missed by their unsigned counterparts. The concept of balance in signed graph theory determines whether a network can be partitioned into two perfectly opposing subsets, and is therefore useful for modelling phenomena such as the existence of polarized communities in social networks. While determining whether a graph is balanced is easy, finding a large balanced subgraph is hard. The few heuristics available in the literature for this purpose are either ineffective or non-scalable. In this paper we propose an efficient algorithm for finding large balanced subgraphs in signed networks. The algorithm relies on signed spectral theory and a novel bound for perturbations of the graph Laplacian. In a wide variety of experiments on real-world data we show that our algorithm can find balanced subgraphs much larger than those detected by existing methods, and in addition, it is faster. We test its scalability on graphs of up to 34 million edges.
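The abstract notes that deciding whether a signed graph is balanced is easy. One standard way to see this is a breadth-first two-coloring: endpoints of a positive edge must land on the same side and endpoints of a negative edge on opposite sides. The Python sketch below illustrates that check only. It does not attempt the hard problem of finding a large balanced subgraph, and the function name and edge encoding are assumptions made for illustration.

```python
from collections import deque


def is_balanced(num_nodes, signed_edges):
    """Return True if the signed graph splits into two opposing sides.

    signed_edges: iterable of (u, v, sign) with sign +1 or -1 and
    node ids in range(num_nodes).
    """
    adjacency = [[] for _ in range(num_nodes)]
    for u, v, sign in signed_edges:
        adjacency[u].append((v, sign))
        adjacency[v].append((u, sign))

    side = [None] * num_nodes
    for start in range(num_nodes):
        if side[start] is not None:
            continue  # already colored as part of an earlier component
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, sign in adjacency[u]:
                # Positive edge: same side. Negative edge: opposite side.
                expected = side[u] if sign > 0 else 1 - side[u]
                if side[v] is None:
                    side[v] = expected
                    queue.append(v)
                elif side[v] != expected:
                    return False  # a cycle with an odd number of negative edges
    return True
```

A triangle with exactly two negative edges is balanced, while a triangle with exactly one negative edge is not. The check runs in time linear in the number of nodes and edges.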

Preprint, 2020, arXiv

Finding path motifs in large temporal graphs using algebraic fingerprints

We study a family of pattern-detection problems in vertex-colored temporal graphs. In particular, given a vertex-colored temporal graph and a multiset of colors as a query, we search for temporal paths in the graph that contain the colors specified in the query. These types of problems have several applications, for example in recommending tours for tourists or detecting abnormal behavior in a network of financial transactions. For the family of pattern-detection problems we consider, we establish complexity results and design an algebraic-algorithmic framework based on constrained multilinear sieving. We demonstrate that our solution scales to massive graphs with up to a billion edges for a multiset query with five colors and up to one hundred million edges for a multiset query with ten colors, despite the problems being NP-hard. Our implementation, which is publicly available, exhibits practical edge-linear scalability and is highly optimized. For instance, in a real-world graph dataset with more than six million edges and a multiset query with ten colors, we can extract an optimum solution in less than eight minutes on a Haswell desktop with four cores.

Preprint, 2020, arXiv

Improved mixing time for k-subgraph sampling

Understanding the local structure of a graph provides valuable insights about the underlying phenomena from which the graph has originated. Sampling and examining k-subgraphs is a widely used approach to understand the local structure of a graph. In this paper, we study the problem of uniformly sampling k-subgraphs from a given graph. We analyze a few different Markov chain Monte Carlo (MCMC) approaches, and obtain analytical results on their mixing times, which significantly improve on the state of the art. In particular, we improve the bound on the mixing times of the standard MCMC approach, and the state-of-the-art MCMC sampling method PSRW, using the canonical-paths argument. In addition, we propose a novel sampling method, which we call recursive subgraph sampling, RSS, and its optimized variant RSS+. The proposed methods, RSS and RSS+, are significantly faster than existing approaches.

Preprint, 2020, arXiv

Mining Dense Subgraphs with Similar Edges

When searching for interesting structures in graphs, it is often important to take into account not only the graph connectivity, but also the metadata available, such as node and edge labels, or temporal information. In this paper we are interested in settings where such metadata is used to define a similarity between edges. We consider the problem of finding subgraphs that are dense and whose edges are similar to each other with respect to a given similarity function. Depending on the application, this function can be, for example, the Jaccard similarity between the edge label sets, or the temporal correlation of the edge occurrences in a temporal graph. We formulate a Lagrangian relaxation-based optimization problem to search for dense subgraphs with high pairwise edge similarity. We design a novel algorithm to solve the problem through parametric MinCut, and provide an efficient search scheme to iterate through the values of the Lagrangian multipliers. Our study is complemented by an evaluation on real-world datasets, which demonstrates the usefulness and efficiency of the proposed approach.

Preprint, 2020, arXiv

Searching for polarization in signed graphs: a local spectral approach

Signed graphs have been used to model interactions in social networks, which can be either positive (friendly) or negative (antagonistic). The model has been used to study polarization and other related phenomena in social networks, which can be harmful to the process of democratic deliberation in our society. An interesting and challenging task in this application domain is to detect polarized communities in signed graphs. A number of different methods have been proposed for this task. However, existing approaches aim at finding globally optimal solutions. Instead, in this paper we are interested in finding polarized communities that are related to a small set of seed nodes provided as input. Seed nodes may consist of two sets, which constitute the two sides of a polarized structure. In this paper we formulate the problem of finding local polarized communities in signed graphs as a locally-biased eigen-problem. By viewing the eigenvector associated with the smallest eigenvalue of the Laplacian matrix as the solution of a constrained optimization problem, we are able to incorporate the local information as an additional constraint. In addition, we show that the locally-biased vector can be used to find communities with an approximation guarantee with respect to a local analogue of the Cheeger constant on signed graphs. By exploiting the sparsity in the input graph, an indicator vector for the polarized communities can be found in time linear in the graph size. Our experiments on real-world networks validate the proposed algorithm and demonstrate its usefulness in finding local structures in this semi-supervised manner.