Source author record

Markus L. Schmid

Markus L. Schmid appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Formal Languages and Automata Theory Computational Complexity Data Structures and Algorithms Databases

Catalog footprint

What is connected

4works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Extending Shinohara's Algorithm for Computing Descriptive (Angluin-Style) Patterns to Subsequence Patterns

The introduction of pattern languages in the seminal work [Angluin, ``Finding Patterns Common to a Set of Strings'', JCSS 1980] has revived the classical model of inductive inference (learning in the limit, gold-style learning). In [Shinohara, ``Polynomial Time Inference of Pattern Languages and Its Application'', 7th IBM Symposium on Mathematical Foundations of Computer Science 1982] a simple and elegant algorithm has been introduced that, based on membership queries, computes a pattern that is descriptive for a given sample of input strings (and, consequently, can be employed in strategies for inductive inference). In this paper, we give a brief survey of the recent work [Kleest-Meißner et al., ``Discovering Event Queries from Traces: Laying Foundations for Subsequence-Queries with Wildcards and Gap-Size Constraints'', ICDT 2022], where the classical concepts of Angluin-style (descriptive) patterns and the respective Shinohara's algorithm are extended to a query class with applications in complex event recognition -- a modern topic from databases.

preprint2022arXiv

Subsequences With Gap Constraints: Complexity Bounds for Matching and Analysis Problems

We consider subsequences with gap constraints, i.e., length-k subsequences p that can be embedded into a string w such that the induced gaps (i.e., the factors of w between the positions to which p is mapped to) satisfy given gap constraints $gc = (C_1, C_2, ..., C_{k-1})$; we call p a gc-subsequence of w. In the case where the gap constraints gc are defined by lower and upper length bounds $C_i = (L^-_i, L^+_i) \in \mathbb{N}^2$ and/or regular languages $C_i \in REG$, we prove tight (conditional on the orthogonal vectors (OV) hypothesis) complexity bounds for checking whether a given p is a gc-subsequence of a string w. We also consider the whole set of all gc-subsequences of a string, and investigate the complexity of the universality, equivalence and containment problems for these sets of gc-subsequences.

preprint2021arXiv

Spanner Evaluation over SLP-Compressed Documents

We consider the problem of evaluating regular spanners over compressed documents, i.e., we wish to solve evaluation tasks directly on the compressed data, without decompression. As compressed forms of the documents we use straight-line programs (SLPs) -- a lossless compression scheme for textual data widely used in different areas of theoretical computer science and particularly well-suited for algorithmics on compressed data. In terms of data complexity, our results are as follows. For a regular spanner M and an SLP S that represents a document D, we can solve the tasks of model checking and of checking non-emptiness in time O(size(S)). Computing the set M(D) of all span-tuples extracted from D can be done in time O(size(S) size(M(D))), and enumeration of M(D) can be done with linear preprocessing O(size(S)) and a delay of O(depth(S)), where depth(S) is the depth of S's derivation tree. Note that size(S) can be exponentially smaller than the document's size |D|; and, due to known balancing results for SLPs, we can always assume that depth(S) = O(log(|D|)) independent of D's compressibility. Hence, our enumeration algorithm has a delay logarithmic in the size of the non-compressed data and a preprocessing time that is at best (i.e., in the case of highly compressible documents) also logarithmic, but at worst still linear. Therefore, in a big-data perspective, our enumeration algorithm for SLP-compressed documents may nevertheless beat the known linear preprocessing and constant delay algorithms for non-compressed documents.

preprint2015arXiv

Characterization and Complexity Results on Jumping Finite Automata

In a jumping finite automaton, the input head can jump to an arbitrary position within the remaining input after reading and consuming a symbol. We characterize the corresponding class of languages in terms of special shuffle expressions and survey other equivalent notions from the existing literature. Moreover, we present several results concerning computational hardness and algorithms for parsing and other basic tasks concerning jumping finite automata.