Source author record

Dirk Nowotka

Dirk Nowotka appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Formal Languages and Automata Theory math.CO Discrete Mathematics Computation and Language cs.CY Databases eess.SY Information Retrieval Logic in Computer Science Machine Learning Programming Languages Software Engineering Systems and Control

Catalog footprint

What is connected

13works

13topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A Generic Information Extraction System for String Constraints

String constraint solving, and the underlying theory of word equations, are highly interesting research topics both for practitioners and theoreticians working in the wide area of satisfiability modulo theories. As string constraint solving algorithms, a.k.a. string solvers, gained a more prominent role in the formal analysis of string-heavy programs, especially in connection to symbolic code execution and security protocol verification, we can witness an ever-growing number of benchmarks collecting string solving instances from real-world applications as well as an ever-growing need for more efficient and reliable solvers, especially for the aforementioned real-world instances. Thus, it seems that the string solving area (and the developers, theoreticians, and end-users active in it) could greatly benefit from a better understanding and processing of the existing string solving benchmarks. In this context, we propose SMTQUERY: an SMT-LIB benchmark analysis tool for string constraints. SMTQUERY is implemented in Python 3, and offers a collection of analysis and information extraction tools for a comprehensive data base of string benchmarks (presented in SMT-LIB format), based on an SQL-centred language called QLANG.

preprint2022arXiv

m-Nearly k-Universal Words -- Investigating Simon Congruence

Determining the index of the Simon congruence is a long outstanding open problem. Two words $u$ and $v$ are called Simon congruent if they have the same set of scattered factors, which are parts of the word in the correct order but not necessarily consecutive, e.g., $\mathtt{oath}$ is a scattered factor of $\mathtt{logarithm}$. Following the idea of scattered factor $k$-universality, we investigate $m$-nearly $k$-universality, i.e., words where $m$ scattered factors of length $k$ are absent, w.r.t. Simon congruence. We present a full characterisation as well as the index of the congruence for $m=1$. For $m\neq 1$, we show some results if in addition $w$ is $(k-1)$-universal as well as some further insights for different $m$.

preprint2022arXiv

On the Self Shuffle Language

The shuffle product $u\shuffle v$ of two words $u$ and $v$ is the set of all words which can be obtained by interleaving $u$ and $v$. Motivated by the paper \emph{The Shuffle Product: New Research Directions} by Restivo (2015) we investigate a special case of the shuffle product. In this work we consider the shuffle of a word with itself called the \emph{self shuffle} or \emph{shuffle square}, showing first that the self shuffle language and the shuffle of the language are in general different sets. We prove that the language of all words arising as a self shuffle of some word is context sensitive but not context free. Furthermore, we show that the self shuffle $w \shuffle w$ uniquely determines $w$.

preprint2022arXiv

Responsible and Regulatory Conform Machine Learning for Medicine: A Survey of Challenges and Solutions

Machine learning is expected to fuel significant improvements in medical care. To ensure that fundamental principles such as beneficence, respect for human autonomy, prevention of harm, justice, privacy, and transparency are respected, medical machine learning systems must be developed responsibly. Many high-level declarations of ethical principles have been put forth for this purpose, but there is a severe lack of technical guidelines explicating the practical consequences for medical machine learning. Similarly, there is currently considerable uncertainty regarding the exact regulatory requirements placed upon medical machine learning systems. This survey provides an overview of the technical and procedural challenges involved in creating medical machine learning systems responsibly and in conformity with existing regulations, as well as possible solutions to address these challenges. First, a brief review of existing regulations affecting medical machine learning is provided, showing that properties such as safety, robustness, reliability, privacy, security, transparency, explainability, and nondiscrimination are all demanded already by existing law and regulations - albeit, in many cases, to an uncertain degree. Next, the key technical obstacles to achieving these desirable properties are discussed, as well as important techniques to overcome these obstacles in the medical context. We notice that distribution shift, spurious correlations, model underspecification, uncertainty quantification, and data scarcity represent severe challenges in the medical context. Promising solution approaches include the use of large and representative datasets and federated learning as a means to that end, the careful exploitation of domain knowledge, the use of inherently transparent models, comprehensive out-of-distribution model testing and verification, as well as algorithmic impact assessments.

preprint2020arXiv

Automatic Verification of LLVM Code

In this work we present our work in developing a software verification tool for llvm-code - Lodin - that incorporates both explicit-state model checking, statistical model checking and symbolic state model checking algorithms.

preprint2020arXiv

Blocksequences of k-local Words

The locality of words is a relatively young structural complexity measure, introduced by Day et al. in 2017 in order to define classes of patterns with variables which can be matched in polynomial time. The main tool used to compute the locality of a word is called marking sequence: an ordering of the distinct letters occurring in the respective order. Once a marking sequence is defined, the letters of the word are marked in steps: in the ith marking step, all occurrences of the ith letter of the marking sequence are marked. As such, after each marking step, the word can be seen as a sequence of blocks of marked letters separated by blocks of non-marked letters. By keeping track of the evolution of the marked blocks of the word through the marking defined by a marking sequence, one defines the blocksequence of the respective marking sequence. We first show that the words sharing the same blocksequence are only loosely connected, so we consider the stronger notion of extended blocksequence, which stores additional information on the form of each single marked block. In this context, we present a series of combinatorial results for words sharing the extended blocksequence.

preprint2020arXiv

Estimating End-to-End Latencies in Automotive Cyber-physical Systems

Controller networks in today's automotive systems consist of more than 100 ECUs connected by various bus protocols. Seamless operation of the entire system requires a well-orchestrated interaction of these ECUs. Consequently, to ensure safety and comfort, a performance analysis is an inherent part of the engineering process. Conducting such an analysis manually is expensive, slow, and error prone. Tool support is therefore crucial, and a number of approaches have been presented. However, most work is limited to either network latencies or software latencies which results in an analysis gap at the transition between different layers of the communication stack. The work presented here introduces an approach to close this gap. Furthermore, we discuss the integration of different methods to obtain an end-to-end latency analysis.

preprint2020arXiv

On Collapsing Prefix Normal Words

Prefix normal words are binary words in which each prefix has at least the same number of $\so$s as any factor of the same length. Firstly introduced by Fici and Lipták in 2011, the problem of determining the index of the prefix equivalence relation is still open. In this paper, we investigate two aspects of the problem, namely prefix normal palindromes and so-called collapsing words (extending the notion of critical words). We prove characterizations for both the palindromes and the collapsing words and show their connection. Based on this, we show that still open problems regarding prefix normal words can be split into certain subproblems.

preprint2020arXiv

Reconstructing Words from Right-Bounded-Block Words

A reconstruction problem of words from scattered factors asks for the minimal information, like multisets of scattered factors of a given length or the number of occurrences of scattered factors from a given set, necessary to uniquely determine a word. We show that a word $w \in \{a, b\}^{*}$ can be reconstructed from the number of occurrences of at most $\min(|w|_a, |w|_b)+ 1$ scattered factors of the form $a^{i} b$. Moreover, we generalize the result to alphabets of the form $\{1,\ldots,q\}$ by showing that at most $ \sum^{q-1}_{i=1} |w|_i (q-i+1)$ scattered factors suffices to reconstruct $w$. Both results improve on the upper bounds known so far. Complexity time bounds on reconstruction algorithms are also considered here.

preprint2020arXiv

Scattered Factor-Universality of Words

A word $u=u_1\dots u_n$ is a scattered factor of a word $w$ if $u$ can be obtained from $w$ by deleting some of its letters: there exist the (potentially empty) words $v_0,v_1,..,v_n$ such that $w = v_0u_1v_1...u_nv_n$. The set of all scattered factors up to length $k$ of a word is called its full $k$-spectrum. Firstly, we show an algorithm deciding whether the $k$-spectra for given $k$ of two words are equal or not, running in optimal time. Secondly, we consider a notion of scattered-factors universality: the word $w$, with $\letters(w)=Σ$, is called $k$-universal if its $k$-spectrum includes all words of length $k$ over the alphabet $Σ$; we extend this notion to $k$-circular universality. After a series of preliminary combinatorial results, we present an algorithm computing, for a given $k'$-universal word $w$ the minimal $i$ such that $w^i$ is $k$-universal for some $k>k'$. Several other connected problems~are~also~considered.

preprint2016arXiv

A note on Thue games

In this work we improve on a result from~\cite{GryKosZma15}. In particular, we investigate the situation where a word is constructed jointly by two players who alternately append letters to the end of an existing word. One of the players (Ann) tries to avoid (non-trivial) repetitions, while the other one (Ben) tries to enforce them. We show a construction that is closer to the lower bound showed in~\cite{GryKozMic13} using entropy compression, and building on the probabilistic arguments based on a version of the Lovász Local Lemma from~\cite{Peg11}. We provide an explicit strategy for Ann to avoid (non-trivial) repetitions over a $7$-letter alphabet.

preprint2011arXiv

Pattern Avoidability with Involution

An infinte word w avoids a pattern p with the involution t if there is no substitution for the variables in p and no involution t such that the resulting word is a factor of w. We investigate the avoidance of patterns with respect to the size of the alphabet. For example, it is shown that the pattern a t(a) a can be avoided over three letters but not two letters, whereas it is well known that a a a is avoidable over two letters.

preprint2010arXiv

Maximal Intersection Queries in Randomized Input Models

Consider a family of sets and a single set, called the query set. How can one quickly find a member of the family which has a maximal intersection with the query set? Time constraints on the query and on a possible preprocessing of the set family make this problem challenging. Such maximal intersection queries arise in a wide range of applications, including web search, recommendation systems, and distributing on-line advertisements. In general, maximal intersection queries are computationally expensive. We investigate two well-motivated distributions over all families of sets and propose an algorithm for each of them. We show that with very high probability an almost optimal solution is found in time which is logarithmic in the size of the family. Moreover, we point out a threshold phenomenon on the probabilities of intersecting sets in each of our two input models which leads to the efficient algorithms mentioned above.

Dirk Nowotka

What is connected

Connect this record

See the researcher in context

Building this map preview

13 published item(s)

A Generic Information Extraction System for String Constraints

m-Nearly k-Universal Words -- Investigating Simon Congruence

On the Self Shuffle Language

Responsible and Regulatory Conform Machine Learning for Medicine: A Survey of Challenges and Solutions

Automatic Verification of LLVM Code

Blocksequences of k-local Words

Estimating End-to-End Latencies in Automotive Cyber-physical Systems

On Collapsing Prefix Normal Words

Reconstructing Words from Right-Bounded-Block Words

Scattered Factor-Universality of Words

A note on Thue games

Pattern Avoidability with Involution

Maximal Intersection Queries in Randomized Input Models