Source author record

Cristian Riveros

Cristian Riveros appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Databases Formal Languages and Automata Theory Logic in Computer Science Data Structures and Algorithms Computational Complexity

Catalog footprint

What is connected

9works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

A formal query language and automata model for aggregation in complex event recognition

Complex Event Recognition (CER) systems are used to identify complex patterns in event streams, such as those found in stock markets, sensor networks, and other similar applications. An important task in such patterns is aggregation, which involves summarizing a set of values into a single value using an algebraic function, such as the maximum, sum, or average, among others. Despite the relevance of this task, query languages in CER typically support aggregation in a restricted syntactic form, and their semantics are generally undefined. In this work, we present a first step toward formalizing a query language with aggregation for CER. We propose to extend Complex Event Logic (CEL), a formal query language for CER, with aggregation operations. This task requires revisiting the semantics of CEL, using a new semantics based on bags of tuples instead of sets of positions. Then, we present an extension of CEL, called Aggregation CEL (ACEL), which introduces an aggregation operator for any commutative monoid operation. The operator can be freely composed with previous CEL operators, allowing users to define complex queries and patterns. We showcase several queries in practice where ACEL proves to be natural for specifying them. From the computational side, we present a novel automata model, called Aggregation Complex Event Automata (ACEA), that extends the previous proposal of Complex Event Automata (CEA) with aggregation and filtering features. Moreover, we demonstrate that every query in ACEL can be expressed in ACEA, illustrating the effectiveness of our computational model. Finally, we study the expressiveness of ACEA through the lens of ACEL, showing that the automata model is more expressive than ACEL.

preprint2026arXiv

Structural Indexing of Relational Databases for the Evaluation of Free-Connex Acyclic Conjunctive Queries

We present an index structure to boost the evaluation of free-connex acyclic conjunctive queries (fc-ACQs) over relational databases. The main ingredient of the index associated with a given database $D$ is an auxiliary database $D_{col}$. Our main result states that for any fc-ACQ $Q$ over $D$, we can count the number of answers of $Q$ or enumerate them with constant delay after a preprocessing phase that takes time linear in the size of $D_{col}$. Unlike previous indexing methods based on values or order (e.g., B+ trees), our index is based on structural symmetries among tuples in a database, and the size of $D_{col}$ is related to the number of colors assigned to $D$ by Scheidt and Schweikardt's "relational color refinement" (2025). In the particular case of graphs, this coincides with the minimal size of an equitable partition of the graph. For example, the size of $D_{col}$ is logarithmic in the case of binary trees and constant for regular graphs. Even in the worst-case that $D$ has no structural symmetries among tuples at all, the size of $D_{col}$ is still linear in the size of $D$. Given that the size of $D_{col}$ is bounded by the size of $D$ and can be much smaller (even constant for some families of databases), our index is the first foundational result on indexing internal structural symmetries of a database to evaluate all fc-ACQs with performance potentially strictly smaller than the database size.

preprint2026arXiv

Using Color Refinement to Boost Enumeration and Counting for Acyclic CQs of Binary Schemas

We present an index structure, called the color-index, to boost the evaluation of acyclic conjunctive queries (ACQs) over binary schemas. The color-index is based on the color refinement algorithm, a widely used subroutine for graph isomorphism testing algorithms. Given a database $D$, we use a suitable version of the color refinement algorithm to produce a stable coloring of $D$, an assignment from the active domain of $D$ to a set of colors $C_D$. The main ingredient of the color-index is a particular database $D_c$ whose active domain is $C_D$ and whose size is at most $|D|$. Using the color-index, we can evaluate any free-connex ACQ $Q$ over $D$ with preprocessing time $O(|Q| \cdot |D_c|)$ and constant delay enumeration. Furthermore, we can also count the number of results of $Q$ over $D$ in time $O(|Q| \cdot |D_c|)$. Given that $|D_c|$ could be much smaller than $|D|$ (even constant-size for some families of databases), the color-index is the first index structure for evaluating free-connex ACQs that allows efficient enumeration and counting with performance that may be strictly smaller than the database size.

preprint2025arXiv

Enumeration and updates for conjunctive linear algebra queries through expressibility

Due to the importance of linear algebra and matrix operations in data analytics, there is significant interest in using relational query optimization and processing techniques for evaluating (sparse) linear algebra programs. In particular, in recent years close connections have been established between linear algebra programs and relational algebra that allow transferring optimization techniques of the latter to the former. In this paper, we ask ourselves which linear algebra programs in MATLANG correspond to the free-connex and q-hierarchical fragments of conjunctive first-order logic. Both fragments have desirable query processing properties: free-connex conjunctive queries support constant-delay enumeration after a linear-time preprocessing phase, and q-hierarchical conjunctive queries further allow constant-time updates. By characterizing the corresponding fragments of MATLANG, we hence identify the fragments of linear algebra programs that one can evaluate with constant-delay enumeration after linear-time preprocessing and with constant-time updates. To derive our results, we improve and generalize previous correspondences between MATLANG and relational algebra evaluated over semiring-annotated relations. In addition, we identify properties on semirings that allow to generalize the complexity bounds for free-connex and q-hierarchical conjunctive queries from Boolean annotations to general semirings.

preprint2022arXiv

CORE: a Complex Event Recognition Engine

Complex Event Recognition (CER) systems are a prominent technology for finding user-defined query patterns over large data streams in real time. CER query evaluation is known to be computationally challenging, since it requires maintaining a set of partial matches, and this set quickly grows super-linearly in the number of processed events. We present CORE, a novel COmplex event Recognition Engine that focuses on the efficient evaluation of a large class of complex event queries, including time windows as well as the partition-by event correlation operator. This engine uses a novel automaton-based evaluation algorithm that circumvents the super-linear partial match problem: under data complexity, it takes constant time per input event to maintain a data structure that compactly represents the set of partial matches and, once a match is found, the query results may be enumerated from the data structure with output-linear delay. We experimentally compare CORE against state-of-the-art CER systems on real-world data. We show that (1) CORE's performance is stable with respect to both query and time window size, and (2) CORE outperforms the other systems by up to five orders of magnitude on different workloads.

preprint2022arXiv

Probabilistic Automata of Bounded Ambiguity

Probabilistic automata are an extension of nondeterministic finite automata in which transitions are annotated with probabilities. Despite its simplicity, this model is very expressive and many of the associated algorithmic questions are undecidable. In this work we focus on the emptiness problem (and its variant the value problem), which asks whether a given probabilistic automaton accepts some word with probability greater than a given threshold. We consider a natural and well-studied structural restriction on automata, namely the degree of ambiguity, which is defined as the maximum number of accepting runs over all words. The known undecidability proofs exploits infinite ambiguity and so we focus on the case of finitely ambiguous probabilistic automata. Our main contributions are to construct efficient algorithms for analysing finitely ambiguous probabilistic automata through a reduction to a multi-objective optimisation problem called the stochastic path problem. We obtain a polynomial time algorithm for approximating the value of probabilistic automata of fixed ambiguity and a quasi-polynomial time algorithm for the emptiness problem for 2-ambiguous probabilistic automata. We complement these positive results by an inapproximability result stating that the value of finitely ambiguous probabilistic automata cannot be approximated unless PTIME = NP.

preprint2022arXiv

Streaming enumeration on nested documents

Some of the most relevant document schemas used online, such as XML and JSON, have a nested format. In the last decade, the task of extracting data from nested documents over streams has become especially relevant. We focus on the streaming evaluation of queries with outputs of varied sizes over nested documents. We model queries of this kind as Visibly Pushdown Transducers (VPT), a computational model that extends visibly pushdown automata with outputs and has the same expressive power as MSO over nested documents. Since processing a document through a VPT can generate a massive number of results, we are interested in reading the input in a streaming fashion and enumerating the outputs one after another as efficiently as possible, namely, with constant-delay. This paper presents an algorithm that enumerates these elements with constant-delay after processing the document stream in a single pass. Furthermore, we show that this algorithm is worst-case optimal in terms of update-time per symbol and memory usage.

preprint2020arXiv

The monitoring problem for timed automata

We study a variant of the classical membership problem in automata theory, which consists of deciding whether a given input word is accepted by a given automaton. We do so under a different perspective, that is, we consider a dynamic version of the problem, called monitoring problem, where the automaton is fixed and the input is revealed as in a stream, one symbol at a time following the natural order on positions. The goal here is to design a dynamic data structure that can be queried about whether the word consisting of symbols revealed so far is accepted by the automaton, and that can be efficiently updated when the next symbol is revealed. We provide complexity bounds for this monitoring problem, by considering timed automata that process symbols interleaved with timestamps. The main contribution is that monitoring of a one-clock timed automaton, with all its components but the clock constants fixed, can be done in amortised constant time per input symbol.

preprint2010arXiv

Composition and Inversion of Schema Mappings

In the recent years, a lot of attention has been paid to the development of solid foundations for the composition and inversion of schema mappings. In this paper, we review the proposals for the semantics of these crucial operators. For each of these proposals, we concentrate on the three following problems: the definition of the semantics of the operator, the language needed to express the operator, and the algorithmic issues associated to the problem of computing the operator. It should be pointed out that we primarily consider the formalization of schema mappings introduced in the work on data exchange. In particular, when studying the problem of computing the composition and inverse of a schema mapping, we will be mostly interested in computing these operators for mappings specified by source-to-target tuple-generating dependencies.

Cristian Riveros

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

A formal query language and automata model for aggregation in complex event recognition

Structural Indexing of Relational Databases for the Evaluation of Free-Connex Acyclic Conjunctive Queries

Using Color Refinement to Boost Enumeration and Counting for Acyclic CQs of Binary Schemas

Enumeration and updates for conjunctive linear algebra queries through expressibility

CORE: a Complex Event Recognition Engine

Probabilistic Automata of Bounded Ambiguity

Streaming enumeration on nested documents

The monitoring problem for timed automata

Composition and Inversion of Schema Mappings