Source author record

Stavros Sintos

Stavros Sintos appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Databases, Computational Geometry

Catalog footprint

What is connected

3 works

2 topics

4 close collaborators

Preprint, 2022, arXiv

JanusAQP: Efficient Partition Tree Maintenance for Dynamic Approximate Query Processing

Approximate query processing over dynamic databases, i.e., under insertions/deletions, has applications ranging from high-frequency trading to internet-of-things analytics. We present JanusAQP, a new dynamic AQP system, which supports SUM, COUNT, AVG, MIN, and MAX queries under insertions and deletions to the dataset. JanusAQP extends static partition tree synopses, which are hierarchical aggregations of datasets, into the dynamic setting. This paper contributes new methods for: (1) efficient initialization of the data synopsis in the presence of incoming data, (2) maintenance of the data synopsis under insertions/deletions, and (3) re-optimization of the partitioning to reduce the approximation error. JanusAQP reduces the error of a state-of-the-art baseline by more than 60% using only 10% storage cost. JanusAQP can process more than 100K updates per second in a single node setting and keep the query latency at a millisecond level.
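The idea of a hierarchical aggregate that is maintained under insertions and deletions can be illustrated with a minimal sketch. This is not the JanusAQP implementation: the class name `PartitionTreeSynopsis`, the fixed equal-width one-dimensional partitioning, and the uniform-spread assumption inside a partially covered leaf are all assumptions made here for illustration. The abstract's initialization and re-partitioning steps are not modeled.

```python
class PartitionTreeSynopsis:
    """Toy 1-D hierarchical synopsis over keys in [lo, hi).

    Hypothetical illustration only: leaves are fixed equal-width buckets,
    and every tree node stores the COUNT and SUM of the values below it.
    """

    def __init__(self, lo, hi, num_leaves):
        # A power-of-two leaf count gives a complete binary tree.
        assert num_leaves > 0 and num_leaves & (num_leaves - 1) == 0
        self.lo, self.hi, self.n = lo, hi, num_leaves
        self.width = (hi - lo) / num_leaves
        # Heap layout: node 1 is the root, leaves are nodes n .. 2n-1.
        self.count = [0] * (2 * num_leaves)
        self.total = [0.0] * (2 * num_leaves)

    def _leaf(self, key):
        idx = int((key - self.lo) / self.width)
        return min(max(idx, 0), self.n - 1)

    def _update(self, key, value, sign):
        # Walk from the leaf to the root, adjusting each aggregate.
        node = self.n + self._leaf(key)
        while node >= 1:
            self.count[node] += sign
            self.total[node] += sign * value
            node //= 2

    def insert(self, key, value):
        self._update(key, value, +1)

    def delete(self, key, value):
        self._update(key, value, -1)

    def _query(self, node, left, right, a, b, agg):
        if b <= left or right <= a:        # node disjoint from [a, b)
            return 0.0
        if a <= left and right <= b:       # node fully inside [a, b)
            return agg[node]
        if node >= self.n:                 # partially covered leaf
            # Assumption: values are spread uniformly inside the leaf.
            overlap = min(b, right) - max(a, left)
            return agg[node] * overlap / (right - left)
        mid = (left + right) / 2
        return (self._query(2 * node, left, mid, a, b, agg)
                + self._query(2 * node + 1, mid, right, a, b, agg))

    def approx_sum(self, a, b):
        return self._query(1, self.lo, self.hi, a, b, self.total)

    def approx_count(self, a, b):
        return self._query(1, self.lo, self.hi, a, b, self.count)


syn = PartitionTreeSynopsis(0, 8, 8)
for key, value in [(0.5, 10), (1.5, 20), (2.5, 30), (6.2, 5)]:
    syn.insert(key, value)
sum_before = syn.approx_sum(0, 3)    # range aligned with bucket boundaries
syn.delete(1.5, 20)
sum_after = syn.approx_sum(0, 3)
partial = syn.approx_sum(2, 2.5)     # half of one bucket: an estimate only
```

Queries aligned with bucket boundaries are answered exactly from stored aggregates, while a query that cuts through a bucket returns an estimate. In this toy, `partial` is an estimate of 15.0 even though no stored key falls in [2, 2.5), which is the kind of error that a better partitioning would aim to reduce.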

Preprint, 2021, arXiv

The Maximum Exposure Problem

Given a set of points $P$ and axis-aligned rectangles $\mathcal{R}$ in the plane, a point $p \in P$ is called exposed if it lies outside all rectangles in $\mathcal{R}$. In the max-exposure problem, given an integer parameter $k$, we want to delete $k$ rectangles from $\mathcal{R}$ so as to maximize the number of exposed points. We show that the problem is NP-hard and, assuming plausible complexity conjectures, is also hard to approximate even when rectangles in $\mathcal{R}$ are translates of two fixed rectangles. However, if $\mathcal{R}$ consists only of translates of a single rectangle, we present a polynomial-time approximation scheme. For range spaces defined by general rectangles, we present a simple $O(k)$ bicriteria approximation algorithm; that is, by deleting $O(k^2)$ rectangles, we can expose at least $\Omega(1/k)$ of the optimal number of points.
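To make the problem statement concrete, the following sketch solves tiny instances exactly by trying every set of $k$ rectangles to delete. It is a brute-force baseline written for illustration, not any of the paper's algorithms, and its running time is exponential in $k$. The function names, the `(x1, y1, x2, y2)` rectangle encoding, and the choice to treat rectangles as closed are assumptions.

```python
from itertools import combinations


def is_exposed(point, rects):
    """True if the point lies outside every rectangle (closed rectangles assumed)."""
    x, y = point
    return not any(x1 <= x <= x2 and y1 <= y <= y2
                   for (x1, y1, x2, y2) in rects)


def max_exposure_bruteforce(points, rects, k):
    """Return (best_exposed_count, indices_of_deleted_rectangles).

    Tries every k-subset of rectangles, so it is only usable on toy inputs.
    """
    k = min(k, len(rects))
    best_count, best_deleted = -1, ()
    for deleted in combinations(range(len(rects)), k):
        removed = set(deleted)
        remaining = [r for i, r in enumerate(rects) if i not in removed]
        exposed = sum(is_exposed(p, remaining) for p in points)
        if exposed > best_count:
            best_count, best_deleted = exposed, deleted
    return best_count, best_deleted


points = [(1, 1), (2, 2), (5, 5), (9, 9)]
rects = [(0, 0, 3, 3), (4, 4, 6, 6), (0, 0, 2.5, 2.5)]
result_k1 = max_exposure_bruteforce(points, rects, 1)
result_k2 = max_exposure_bruteforce(points, rects, 2)
```

In this toy instance, deleting one of the two nested rectangles alone exposes nothing new, because the other still covers the same points. That interaction between overlapping rectangles is part of what makes the choice of deletions nontrivial.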

Preprint, 2019, arXiv

Learning to Sample: Counting with Complex Queries

We study the problem of efficiently estimating counts for queries involving complex filters, such as user-defined functions, or predicates involving self-joins and correlated subqueries. For such queries, traditional sampling techniques may not be applicable because the complexity of the filter prevents sampling over joins, and sampling after the join may not be feasible due to the cost of computing the full join. The other natural approach, training an inexpensive classifier and using it in place of the expensive predicate to estimate the count, suffers from the difficulty of training a good classifier and of giving meaningful confidence intervals. In this paper, we propose a new method of learning to sample, in which we combine the best of both worlds by using sampling in two phases. First, we use samples to learn a probabilistic classifier, and then we use the classifier to design a stratified sampling method to obtain the final estimates. We theoretically analyze algorithms for obtaining an optimal stratification, and compare our approach with a suite of natural alternatives such as quantification learning, weighted and stratified sampling, and other techniques from the literature. We also provide extensive experiments in diverse use cases using multiple real and synthetic datasets to evaluate the quality, efficiency, and robustness of our approach.
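The second phase described above, using classifier scores to build strata and then sampling within each stratum, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's method: the classifier is taken as given through a hypothetical `score_fn`, strata are equal-size groups by score rank rather than an optimal stratification, every stratum gets the same sample budget, and no confidence interval is computed.

```python
import random


def stratified_count_estimate(items, score_fn, predicate,
                              num_strata, per_stratum, seed=0):
    """Estimate how many items satisfy an expensive predicate.

    score_fn stands in for an already-trained probabilistic classifier
    (hypothetical); predicate is the expensive filter, which is evaluated
    only on the sampled items.
    """
    if not items:
        return 0.0
    rng = random.Random(seed)
    ranked = sorted(items, key=score_fn)
    size = -(-len(ranked) // num_strata)          # ceiling division
    estimate = 0.0
    for start in range(0, len(ranked), size):
        stratum = ranked[start:start + size]
        sample = rng.sample(stratum, min(per_stratum, len(stratum)))
        hits = sum(1 for x in sample if predicate(x))
        # Scale the sampled hit rate up to the size of the stratum.
        estimate += hits * len(stratum) / len(sample)
    return estimate


items = list(range(100))
expensive = lambda x: x % 3 == 0                  # stand-in for a costly filter
score = lambda x: 1.0 if x % 3 == 0 else 0.0      # idealized classifier score

# Sampling every item in every stratum recovers the exact count.
exact = stratified_count_estimate(items, score, expensive,
                                  num_strata=4, per_stratum=25)
# A smaller budget gives an estimate; only the mixed stratum adds variance.
rough = stratified_count_estimate(items, score, expensive,
                                  num_strata=4, per_stratum=5)
```

When the scores separate positives from negatives well, most strata are nearly pure, so their sampled hit rates have little variance and the remaining uncertainty is concentrated in the few mixed strata.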