Source author record

Marco Bressan

Marco Bressan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Social and Information Networks, Computational Complexity, Discrete Mathematics, physics.soc-ph

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators


Preprint · 2022 · arXiv

Active Learning of Classifiers with Label and Seed Queries

We study exact active learning of binary and multiclass classifiers with margin. Given an $n$-point set $X \subset \mathbb{R}^m$, we want to learn any unknown classifier on $X$ whose classes have finite strong convex hull margin, a new notion extending the SVM margin. In the standard active learning setting, where only label queries are allowed, learning a classifier with strong convex hull margin $\gamma$ requires in the worst case $\Omega\big(1+\frac{1}{\gamma}\big)^{(m-1)/2}$ queries. On the other hand, using the more powerful seed queries (a variant of equivalence queries), the target classifier could be learned in $O(m \log n)$ queries via Littlestone's Halving algorithm; however, Halving is computationally inefficient. In this work we show that, by carefully combining the two types of queries, a binary classifier can be learned in time $\operatorname{poly}(n+m)$ using only $O(m^2 \log n)$ label queries and $O\big(m \log \frac{m}{\gamma}\big)$ seed queries; the result extends to $k$-class classifiers at the price of a $k!k^2$ multiplicative overhead. Similar results hold when the input points have bounded bit complexity, or when only one class has strong convex hull margin against the rest. We complement the upper bounds by showing that in the worst case any algorithm needs $\Omega\big(k m \log \frac{1}{\gamma}\big)$ seed and label queries to learn a $k$-class classifier with strong convex hull margin $\gamma$.
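As a rough illustration of the Halving algorithm named in this abstract, the sketch below runs it over a small finite class of 1-D threshold classifiers. It is not the paper's algorithm and does not model seed queries: the equivalence-style oracle, the threshold class, and the names `make_threshold`, `first_mistake_oracle` and `halving` are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Hypothesis = Callable[[float], int]
Oracle = Callable[[List[int]], Optional[int]]


def make_threshold(t: float) -> Hypothesis:
    """Hypothetical 1-D classifier: label 1 iff x >= t."""
    return lambda x: 1 if x >= t else 0


def first_mistake_oracle(truth: Sequence[int]) -> Oracle:
    """Assumed equivalence-style oracle: index of the first wrong label, or None."""
    def oracle(guess: List[int]) -> Optional[int]:
        for i, (g, t) in enumerate(zip(guess, truth)):
            if g != t:
                return i
        return None
    return oracle


def halving(
    points: Sequence[float],
    hypotheses: Sequence[Hypothesis],
    counterexample: Oracle,
) -> Tuple[List[int], int]:
    """Halving over a finite hypothesis class.

    Assumes the target labeling is realized by some hypothesis in the class.
    Returns the learned labeling and the number of oracle queries used.
    """
    version_space = list(hypotheses)
    queries = 0
    while True:
        # Predict each label by majority vote of the surviving hypotheses.
        guess = [
            1 if 2 * sum(h(x) for h in version_space) >= len(version_space) else 0
            for x in points
        ]
        queries += 1
        idx = counterexample(guess)
        if idx is None:
            return guess, queries
        # The majority was wrong at idx, so at least half the hypotheses are discarded.
        true_label = 1 - guess[idx]
        version_space = [h for h in version_space if h(points[idx]) == true_label]


if __name__ == "__main__":
    pts = [float(i) for i in range(16)]
    hyps = [make_threshold(t) for t in range(17)]
    truth = [make_threshold(11)(x) for x in pts]
    print(halving(pts, hyps, first_mistake_oracle(truth)))
```

Each counterexample removes at least half of the surviving hypotheses, so the number of queries is logarithmic in the size of the class. The cost of each round, however, grows with the size of the version space, which is one way to see why the abstract calls Halving computationally inefficient.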

Preprint · 2020 · arXiv

Correlation Clustering with Adaptive Similarity Queries

In correlation clustering, we are given $n$ objects together with a binary similarity score between each pair of them. The goal is to partition the objects into clusters so as to minimise the disagreements with the scores. In this work we investigate correlation clustering as an active learning problem: each similarity score can be learned by making a query, and the goal is to minimise both the disagreements and the total number of queries. On the one hand, we describe simple active learning algorithms, which provably achieve an almost optimal trade-off while giving cluster recovery guarantees, and we test them on different datasets. On the other hand, we prove information-theoretical bounds on the number of queries necessary to guarantee a prescribed disagreement bound. These results give a rich characterization of the trade-off between queries and clustering error.
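To make the query-versus-disagreement setting concrete, here is a minimal sketch of a classic randomized pivot scheme from the correlation clustering literature, instrumented to count similarity queries. It is not claimed to be one of the algorithms from this paper; the `similar` oracle and the function names `pivot_cluster` and `disagreements` are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Similarity = Callable[[int, int], bool]


def pivot_cluster(
    objects: Sequence[int], similar: Similarity, rng: random.Random
) -> Tuple[List[List[int]], int]:
    """Pick a random pivot, query it against every remaining object, repeat.

    Returns the clusters and the number of similarity queries made.
    """
    remaining = list(objects)
    clusters: List[List[int]] = []
    queries = 0
    while remaining:
        pivot = remaining.pop(rng.randrange(len(remaining)))
        cluster, rest = [pivot], []
        for obj in remaining:
            queries += 1  # one adaptive query per (pivot, object) pair
            (cluster if similar(pivot, obj) else rest).append(obj)
        clusters.append(cluster)
        remaining = rest
    return clusters, queries


def disagreements(
    clusters: List[List[int]], objects: Sequence[int], similar: Similarity
) -> int:
    """Count pairs whose cluster co-membership contradicts their similarity score."""
    label: Dict[int, int] = {obj: i for i, c in enumerate(clusters) for obj in c}
    cost = 0
    for i, u in enumerate(objects):
        for v in objects[i + 1:]:
            if (label[u] == label[v]) != similar(u, v):
                cost += 1
    return cost


if __name__ == "__main__":
    objs = list(range(6))
    same_group = lambda u, v: (u < 3) == (v < 3)  # two consistent groups
    found, used = pivot_cluster(objs, same_group, random.Random(0))
    print(found, used, disagreements(found, objs, same_group))
```

Note that `disagreements` inspects every pair and is only meant for evaluation; the clustering routine itself queries far fewer pairs when the clusters are large.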

Preprint · 2020 · arXiv

Faster algorithms for counting subgraphs in sparse graphs

Given a $k$-node pattern graph $H$ and an $n$-node host graph $G$, the subgraph counting problem asks to compute the number of copies of $H$ in $G$. In this work we address the following question: can we count the copies of $H$ faster if $G$ is sparse? We answer in the affirmative by introducing a novel tree-like decomposition for directed acyclic graphs, inspired by the classic tree decomposition for undirected graphs. This decomposition gives a dynamic program for counting the homomorphisms of $H$ in $G$ by exploiting the degeneracy of $G$, which allows us to beat the state-of-the-art subgraph counting algorithms when $G$ is sparse enough. For example, we can count the induced copies of any $k$-node pattern $H$ in time $2^{O(k^2)} O(n^{0.25k + 2} \log n)$ if $G$ has bounded degeneracy, and in time $2^{O(k^2)} O(n^{0.625k + 1} \log n)$ if $G$ has bounded average degree. These bounds are instantiations of a more general result, parameterized by the degeneracy of $G$ and the structure of $H$, which generalizes classic bounds on counting cliques and complete bipartite graphs. We also give lower bounds based on the Exponential Time Hypothesis, showing that our results are actually a characterization of the complexity of subgraph counting in bounded-degeneracy graphs.
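The simplest instance of exploiting degeneracy is the classic triangle-counting routine sketched below: order the vertices by repeatedly removing one of minimum degree, orient every edge along that order, and intersect out-neighbourhoods. This is only the textbook special case for cliques of size three, not the paper's decomposition or dynamic program; the function names and the adjacency-dict input format are assumptions.

```python
from typing import Dict, List, Set


def degeneracy_order(adj: Dict[int, Set[int]]) -> List[int]:
    """Repeatedly remove a minimum-degree vertex and return the removal order.

    Simple quadratic version; a bucket queue would make it linear time.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed: Set[int] = set()
    order: List[int] = []
    while len(order) < len(adj):
        v = min((u for u in adj if u not in removed), key=degree.__getitem__)
        order.append(v)
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                degree[w] -= 1
    return order


def count_triangles(adj: Dict[int, Set[int]]) -> int:
    """Count triangles in an undirected graph given as symmetric adjacency sets."""
    rank = {v: i for i, v in enumerate(degeneracy_order(adj))}
    # Orient each edge from the earlier to the later vertex in the order.
    # Every out-degree is then at most the degeneracy of the graph.
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}
    triangles = 0
    for v in adj:
        for w in out[v]:
            # A common out-neighbour of v and w closes a triangle whose
            # earliest vertex is v, so each triangle is counted exactly once.
            triangles += len(out[v] & out[w])
    return triangles


if __name__ == "__main__":
    k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
    print(count_triangles(k4))
```

Because out-degrees are bounded by the degeneracy, the work per edge is small on sparse graphs, which is the effect the abstract generalizes to arbitrary patterns.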

Preprint · 2016 · arXiv

The Limits of Popularity-Based Recommendations, and the Role of Social Ties

In this paper we introduce a mathematical model that captures some of the salient features of recommender systems that are based on popularity and that try to exploit social ties among the users. We show that, under very general conditions, the market always converges to a steady state, for which we are able to give an explicit form. Thanks to this we can tell rather precisely how much a market is altered by a recommendation system, and determine the power of users to influence others. Our theoretical results are complemented by experiments with real-world social networks showing that social graphs prevent large market distortions in spite of the presence of highly influential users.

Preprint · 2016 · arXiv

The Power of Local Information in PageRank

How large a fraction of a graph must one explore to rank a small set of nodes according to their PageRank scores? We show that the answer is quite nuanced, and depends crucially on the interplay between the correctness guarantees one requires and the way one can access the graph. On the one hand, assuming the graph can be accessed only via "natural" exploration queries that reveal small pieces of its topology, we prove that deterministic and Las Vegas algorithms must in the worst case perform $n - o(n)$ queries and explore essentially the entire graph, independently of the specific types of query employed. On the other hand we show that, depending on the types of query available, Monte Carlo algorithms can perform asymptotically better: if allowed to both explore the local topology around single nodes and access nodes at random in the graph they need $\Omega(n^{2/3})$ queries in the worst case, otherwise they still need $\Omega(n)$ queries similarly to Las Vegas algorithms. All our bounds generalize and tighten those already known, cover the different types of graph exploration queries appearing in the literature, and immediately apply also to the problem of approximating the PageRank score of single nodes.
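For reference, the sketch below computes PageRank scores globally by power iteration, that is, by reading the whole graph, which is the cost that local algorithms try to avoid. It is a standard textbook formulation rather than anything taken from the paper; the damping factor of 0.85, the uniform redistribution of mass from dangling nodes, and the function name `pagerank` are assumptions.

```python
from typing import Dict, Hashable, List


def pagerank(
    out_links: Dict[Hashable, List[Hashable]],
    damping: float = 0.85,   # assumed, commonly used value
    tol: float = 1e-12,
    max_iter: int = 1000,
) -> Dict[Hashable, float]:
    """Power iteration for PageRank on a directed graph.

    `out_links` maps every node to the list of nodes it points to; every
    target must also appear as a key. Scores always sum to 1.
    """
    nodes = list(out_links)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Mass sitting on nodes without out-links is spread uniformly.
        dangling = sum(score[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new[w] += damping * score[v] / len(out_links[v])
        delta = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score


if __name__ == "__main__":
    star = {"hub": [], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
    print(pagerank(star))
```

Every iteration touches every edge, so ranking even a handful of nodes this way requires access to the full graph.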
