Source author record

Alex Thomo

Alex Thomo appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Social and Information Networks; Databases; Computational Geometry; Computer Science and Game Theory; Data Structures and Algorithms; Distributed, Parallel, and Cluster Computing; Formal Languages and Automata Theory; physics.soc-ph; q-fin.GN; q-fin.ST

Catalog footprint

What is connected

8 works

10 topics

4 close collaborators


preprint · 2020 · arXiv

Provenance for Regular Path Queries

Regular path queries (RPQs) are the ubiquitous mechanism for querying data graphs of partially known structure. RPQs are in essence regular expressions over the edge symbols. The answer to an RPQ on a given graph (database) is the set of pairs of objects that are connected by paths spelling words in the language of the regular path query. Often the database edges come with weights associated with them. Such weights can be distances, levels of discomfort, multiplicities, etc. We model weights using semiring frameworks.
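
As a rough illustration of weighted RPQ evaluation, the sketch below runs Dijkstra over the product of a small graph and an automaton for the query, combining weights in the tropical (min, +) semiring so that each answer pair carries the cost of its cheapest matching path. The function name, the edge and automaton encodings, and the choice of semiring are assumptions made for this example; the abstract does not describe the paper's own algorithm.

```python
import heapq
from collections import defaultdict

def rpq_min_cost(edges, nfa, start_state, accept_states):
    """Cheapest accepted path cost for every answer pair (u, v).

    edges: list of (source, label, target, weight), weights non-negative.
    nfa:   dict mapping (state, label) -> set of next states.
    Costs use the tropical semiring: min acts as "plus", + as "times".
    """
    adj = defaultdict(list)
    nodes = set()
    for u, label, v, w in edges:
        adj[u].append((label, v, w))
        nodes.update((u, v))

    answers = {}
    for src in nodes:
        # Dijkstra over (graph node, automaton state) pairs.
        best = {(src, start_state): 0}
        heap = [(0, src, start_state)]
        while heap:
            cost, node, state = heapq.heappop(heap)
            if cost > best.get((node, state), float("inf")):
                continue  # stale heap entry
            if state in accept_states:
                key = (src, node)
                answers[key] = min(answers.get(key, float("inf")), cost)
            for label, nxt, w in adj[node]:
                for nstate in nfa.get((state, label), ()):
                    ncost = cost + w
                    if ncost < best.get((nxt, nstate), float("inf")):
                        best[(nxt, nstate)] = ncost
                        heapq.heappush(heap, (ncost, nxt, nstate))
    return answers

# Hypothetical data: query a.b* over a four-edge graph.
edges = [("x", "a", "y", 2), ("y", "b", "z", 3), ("x", "a", "z", 10), ("z", "b", "z", 1)]
nfa = {(0, "a"): {1}, (1, "b"): {1}}
result = rpq_min_cost(edges, nfa, start_state=0, accept_states={1})
```

Swapping the (min, +) operations for another semiring, such as counting paths, would need a different traversal, since Dijkstra relies on the ordering of costs.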

preprint · 2020 · arXiv

Reverse Prevention Sampling for Misinformation Mitigation in Social Networks

In this work, we consider misinformation propagating through a social network and study the problem of its prevention. In this problem, a "bad" campaign starts propagating from a set of seed nodes in the network and we use the notion of a limiting (or "good") campaign to counteract the effect of misinformation. The goal is to identify a set of $k$ users that need to be convinced to adopt the limiting campaign so as to minimize the number of people that adopt the "bad" campaign at the end of both propagation processes. This work presents RPS (Reverse Prevention Sampling), an algorithm that provides a scalable solution to the misinformation mitigation problem. Our theoretical analysis shows that RPS runs in $O\left((k + l)(n + m)\,\frac{1}{1 - \gamma}\,\log n / \epsilon^2\right)$ expected time and returns a $(1 - 1/e - \epsilon)$-approximate solution with at least $1 - n^{-l}$ probability (where $\gamma$ is a typically small network parameter and $l$ is a confidence parameter). The time complexity of RPS substantially improves upon the previously best-known algorithms that run in time $\Omega(mnk \cdot \mathrm{poly}(\epsilon^{-1}))$. We experimentally evaluate RPS on large datasets and show that it outperforms the state-of-the-art solution by several orders of magnitude in terms of running time. This demonstrates that misinformation mitigation can be made practical while still offering strong theoretical guarantees.
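
Guarantees of the $(1 - 1/e)$ kind commonly come from greedy maximum coverage over randomly sampled node sets. The abstract does not spell out how RPS works internally, so the sketch below shows only that generic greedy selection step and is not the RPS algorithm itself. The sample sets are made up, and `greedy_max_coverage` is a hypothetical name.

```python
def greedy_max_coverage(samples, k):
    """Pick up to k nodes that together hit as many sample sets as possible."""
    uncovered = [set(s) for s in samples]
    chosen = []
    for _ in range(k):
        # Count how many still-uncovered samples each node would hit.
        counts = {}
        for s in uncovered:
            for node in s:
                counts[node] = counts.get(node, 0) + 1
        if not counts:
            break
        # Highest count wins; ties go to the smallest node id.
        best = max(sorted(counts), key=lambda n: counts[n])
        chosen.append(best)
        uncovered = [s for s in uncovered if best not in s]
    return chosen, len(samples) - len(uncovered)

# Hypothetical samples: each set lists nodes that could "save" one sampled user.
samples = [{1, 2}, {2, 3}, {3}, {4}, {2, 4}]
chosen, covered = greedy_max_coverage(samples, k=2)
```

In a sampling-based method, the fraction of covered samples serves as an estimate of the objective, which is why the number and quality of samples drive both the running time and the confidence of the result.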

preprint · 2020 · arXiv

Utility-Based Graph Summarization: New and Improved

A fundamental challenge in graph mining is the ever-increasing size of datasets. Graph summarization aims to find a compact representation resulting in faster algorithms and reduced storage needs. The flip side of graph summarization is the loss of utility, which diminishes its usability. The key questions we address in this paper are: (1) How to summarize a graph without any loss of utility? (2) How to summarize a graph with some loss of utility but above a user-specified threshold? (3) How to query graph summaries without graph reconstruction? We also aim at making graph summarization available for the masses by efficiently handling web-scale graphs using only a consumer-grade machine. Previous works suffer from conceptual limitations and lack of scalability. In this work, we make three key contributions. First, we present a utility-driven graph summarization method, based on a clique and independent set decomposition, that produces significant compression with zero loss of utility. The compression provided is significantly better than state-of-the-art in lossless graph summarization, while the runtime is two orders of magnitude lower. Second, we present a highly scalable algorithm for the lossy case, which forgoes the expensive iterative process that hampers previous work. Our algorithm achieves this by combining a memory reduction technique and a novel binary-search approach. In contrast to the competition, we are able to handle web-scale graphs on a single machine without a performance impediment as the utility threshold (and size of summary) decreases. Third, we show that our graph summaries can be used as-is to answer several important classes of queries, such as triangle enumeration, PageRank, and shortest paths. This is in contrast to other works that incrementally reconstruct the original graph for answering queries, thus incurring additional time costs.
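
To make the idea of querying a summary without reconstruction concrete, here is a minimal sketch under assumed conventions: each supernode groups original nodes, a self-loop marks a supernode whose members form a clique, a supernode without a self-loop is an independent set, and a superedge between two supernodes means every cross pair is connected. The encoding and the `neighbors` helper are hypothetical and are not taken from the paper.

```python
def neighbors(node, members, superedges):
    """Neighbors of `node`, read directly from a lossless summary.

    members:    supernode id -> set of original nodes
    superedges: set of frozensets; frozenset({A}) is a self-loop (A is a clique),
                frozenset({A, B}) connects every node of A to every node of B.
    """
    home = next(sid for sid, nodes in members.items() if node in nodes)
    result = set()
    for edge in superedges:
        if home in edge:
            # A self-loop points back at the node's own supernode.
            others = (edge - {home}) or {home}
            for sid in others:
                result |= members[sid]
    result.discard(node)
    return result

# Hypothetical summary: A = clique {1, 2, 3}, B = independent set {4, 5}, A fully linked to B.
members = {"A": {1, 2, 3}, "B": {4, 5}}
superedges = {frozenset({"A"}), frozenset({"A", "B"})}
```

Here the nine original edges are stored as two superedges, and a neighbor lookup touches only the supernodes adjacent to the queried node.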

preprint · 2014 · arXiv

Buyer to Seller Recommendation under Constraints

The majority of recommender systems are designed to recommend items (such as movies and products) to users. We focus on the problem of recommending buyers to sellers which comes with new challenges: (1) constraints on the number of recommendations buyers are part of before they become overwhelmed, (2) constraints on the number of recommendations sellers receive within their budget, and (3) constraints on the set of buyers that sellers want to receive (e.g., no more than two people from the same household). We propose the following critical problems of recommending buyers to sellers: Constrained Recommendation (C-REC) capturing the first two challenges, and Conflict-Aware Constrained Recommendation (CAC-REC) capturing all three challenges at the same time. We show that C-REC can be modeled using linear programming and can be efficiently solved using modern solvers. On the other hand, we show that CAC-REC is NP-hard. We propose two approximate algorithms to solve CAC-REC and show that they achieve close to optimal solutions via comprehensive experiments using real-world datasets.
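
One way C-REC could be written as a linear program is sketched below. All symbols are assumptions introduced for illustration: $B$ and $S$ are the buyer and seller sets, $w_{bs}$ is a relevance score for recommending buyer $b$ to seller $s$, $c_b$ caps how many recommendations buyer $b$ may appear in, and $d_s$ caps how many recommendations seller $s$ receives. The paper's actual formulation may differ.

```latex
\begin{align*}
\max_{x}\quad & \sum_{b \in B} \sum_{s \in S} w_{bs}\, x_{bs} \\
\text{s.t.}\quad & \sum_{s \in S} x_{bs} \le c_b && \forall b \in B \\
& \sum_{b \in B} x_{bs} \le d_s && \forall s \in S \\
& 0 \le x_{bs} \le 1 && \forall b \in B,\ s \in S
\end{align*}
```

Under this assumed form the constraint matrix is that of a bipartite degree-constrained matching, which is totally unimodular, so with integer caps the linear program has an integral optimal solution. The household-style conflict constraints of CAC-REC do not fit this structure, which is in line with the hardness result stated in the abstract.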

preprint · 2014 · arXiv

Clearing Contamination in Large Networks

In this work, we study the problem of clearing contamination spreading through a large network where we model the problem as a graph searching game. The problem can be summarized as constructing a search strategy that will leave the graph clear of any contamination at the end of the searching process in as few steps as possible. We show that this problem is NP-hard even on directed acyclic graphs and provide an efficient approximation algorithm. We experimentally observe the performance of our approximation algorithm in relation to the lower bound on several large online networks including Slashdot, Epinions and Twitter. The experiments reveal that in most cases our algorithm performs near optimally.

preprint · 2014 · arXiv

Three-Way Joins on MapReduce: An Experimental Study

We study three-way joins on MapReduce. Joins are very useful in a multitude of applications, from data integration and traversing social networks to mining graphs and automata-based constructions. However, joins are expensive, even for moderate data sets; we need efficient algorithms to perform distributed computation of joins using clusters of many machines. MapReduce has become an increasingly popular distributed computing system and programming paradigm. We consider a state-of-the-art MapReduce multi-way join algorithm by Afrati and Ullman and show when it is appropriate for use on very large data sets. By providing a detailed experimental study, we demonstrate that this algorithm scales much better than what is suggested by the original paper. However, if the join result needs to be summarized or aggregated, as opposed to being only enumerated, then the aggregation step can be integrated into a cascade of two-way joins, which makes the cascade more efficient than the multi-way algorithm and thus the preferred solution.
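
The point about aggregation can be illustrated on a single machine. The sketch below counts the tuples of R(a, b) ⋈ S(b, c) ⋈ T(c, d) as a cascade of two two-way joins, aggregating after the first join so that intermediate tuples are never materialized. The relation schemas and function names are assumptions, and this is plain Python rather than a MapReduce job.

```python
from collections import Counter

def count_three_way_join(R, S, T):
    """Count tuples of R(a,b) JOIN S(b,c) JOIN T(c,d) using two two-way joins."""
    # First join with early aggregation: number of (a, b, c) combinations per c.
    r_by_b = Counter(b for _, b in R)
    paths_to_c = Counter()
    for b, c in S:
        if b in r_by_b:
            paths_to_c[c] += r_by_b[b]
    # Second join: each T tuple extends every partial result that ends at its c.
    return sum(paths_to_c[c] for c, _ in T)

def count_by_enumeration(R, S, T):
    """Reference count that enumerates every joined tuple."""
    return sum(1 for a, b in R for b2, c in S for c2, d in T if b == b2 and c == c2)

# Hypothetical relations.
R = [(1, 10), (2, 10), (3, 20)]
S = [(10, 100), (20, 100), (20, 200)]
T = [(100, "x"), (100, "y"), (300, "z")]
total = count_three_way_join(R, S, T)
```

The intermediate state is one counter entry per distinct `c` value instead of one record per joined pair, which is the kind of saving that makes a cascade attractive when only an aggregate is needed.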

preprint · 2012 · arXiv

Computing optimal k-regret minimizing sets with top-k depth contours

Regret minimizing sets are a very recent approach to representing a dataset D with a small subset S of representative tuples. The set S is chosen such that executing any top-1 query on S rather than D is minimally perceptible to any user. To discover an optimal regret minimizing set of a predetermined cardinality is conjectured to be a hard problem. In this paper, we generalize the problem to that of finding an optimal k-regret minimizing set, wherein the difference is computed over top-k queries, rather than top-1 queries. We adapt known geometric ideas of top-k depth contours and the reverse top-k problem. We show that the depth contours themselves offer a means of comparing the optimality of regret minimizing sets using L2 distance. We design an O(cn^2) plane sweep algorithm for two dimensions to compute an optimal regret minimizing set of cardinality c. For higher dimensions, we introduce a greedy algorithm that progresses towards increasingly optimal solutions by exploiting the transitivity of L2 distance.
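
The abstract does not state the regret measure itself. A definition commonly used in the k-regret literature, assumed here, compares the best score available in S with the k-th best score in D for a linear scoring vector w: the ratio is max(0, kth_best(D, w) - best(S, w)) / kth_best(D, w). The sketch below evaluates that ratio for one weight vector; the function name and data are hypothetical, and scores are assumed positive with higher being better.

```python
def k_regret_ratio(D, S, w, k):
    """k-regret ratio of subset S with respect to dataset D for linear weights w."""
    def score(t):
        return sum(wi * ti for wi, ti in zip(w, t))

    kth_best = sorted((score(t) for t in D), reverse=True)[k - 1]
    best_in_s = max(score(t) for t in S)
    # No regret if S already offers something at least as good as the k-th best in D.
    return max(0.0, kth_best - best_in_s) / kth_best

# Hypothetical two-attribute dataset and a one-tuple representative set.
D = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7), (0.5, 0.5)]
S = [(0.7, 0.7)]
top1_regret = k_regret_ratio(D, S, w=(1.0, 0.0), k=1)
top2_regret = k_regret_ratio(D, S, w=(1.0, 0.0), k=2)
```

For this weight vector the single tuple in S falls short of the top-1 answer but matches the top-2 threshold, which shows how relaxing from top-1 to top-k can make a small set acceptable.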

preprint · 2012 · arXiv

Indexing Reverse Top-k Queries

We consider the recently introduced monochromatic reverse top-k queries, which ask for, given a new tuple q and a dataset D, all possible top-k queries on $D \cup \{q\}$ for which q is in the result. Towards this problem, we focus on designing indexes in two dimensions for repeated (or batch) querying, a novel but practical consideration. We present the insight that by representing the dataset as an arrangement of lines, a critical k-polygon can be identified and used exclusively to respond to reverse top-k queries. We construct an index based on this observation which has guaranteed worst-case query cost that is logarithmic in the size of the k-polygon. We implement our work and compare it to related approaches, demonstrating that our index is fast in practice. Furthermore, we demonstrate through our experiments that a k-polygon comprises a small proportion of the original data, so our index structure consumes little disk space.
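
For intuition about what a monochromatic reverse top-k query returns in two dimensions, the brute-force sketch below samples weight vectors of the form (lam, 1 - lam) and reports those for which q ranks within the top k. It is a baseline for illustration only, not the k-polygon index from the paper. It assumes that higher scores are better and that q is in the top k when fewer than k tuples of D score strictly higher; the function name and data are hypothetical.

```python
def reverse_top_k_samples(D, q, k, steps=100):
    """Sampled lam values in [0, 1] for which q is in the top-k under (lam, 1 - lam)."""
    hits = []
    for i in range(steps + 1):
        lam = i / steps

        def score(t, lam=lam):
            return lam * t[0] + (1 - lam) * t[1]

        better = sum(1 for t in D if score(t) > score(q))
        if better < k:
            hits.append(lam)
    return hits

# Hypothetical data: q is a balanced tuple between two specialised ones.
D = [(10, 1), (1, 10)]
q = (6, 6)
top1_hits = reverse_top_k_samples(D, q, k=1, steps=10)
top2_hits = reverse_top_k_samples(D, q, k=2, steps=10)
```

Each tuple's score is linear in lam, so the exact answer is a set of intervals bounded by line intersections, which is the structure an arrangement-of-lines view can exploit instead of sampling.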
