Source author record

Stavros Konstantinidis

Stavros Konstantinidis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Formal Languages and Automata Theory Computational Complexity Genomics Information Theory math.IT

Catalog footprint

What is connected

7works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Approximate NFA Universality and Related Problems Motivated by Information Theory

In coding and information theory, it is desirable to construct maximal codes that can be either variable length codes or error control codes of fixed length. However deciding code maximality boils down to deciding whether a given NFA is universal, and this is a hard problem (including the case of whether the NFA accepts all words of a fixed length). On the other hand, it is acceptable to know whether a code is `approximately' maximal, which then boils down to whether a given NFA is `approximately' universal. Here we introduce the notion of a $(1-ε)$-universal automaton and present polynomial randomized approximation algorithms to test NFA universality and related hard automata problems, for certain natural probability distributions on the set of words. We also conclude that the randomization aspect is necessary, as approximate universality remains hard for any fixed polynomially computable $ε$.

preprint2016arXiv

Channels with Synchronization/Substitution Errors and Computation of Error Control Codes

We introduce the concept of an \ff-maximal error-detecting block code, for some parameter \ff{} between 0 and 1, in order to formalize the situation where a block code is close to maximal with respect to being error-detecting. Our motivation for this is that constructing a maximal error-detecting code is a computationally hard problem. We present a randomized algorithm that takes as input two positive integers $N,\ell$, a probability value \ff, and a specification of the errors permitted in some application, and generates an error-detecting, or error-correcting, block code having up to $N$ codewords of length $\ell$. If the algorithm finds less than $N$ codewords, then those codewords constitute a code that is \ff-maximal with high probability. The error specification (also called channel) is modelled as a transducer, which allows one to model any rational combination of substitution and synchronization errors. We also present some elements of our implementation of various error-detecting properties and their associated methods. Then, we show several tests of the implemented randomized algorithm on various channels. A methodological contribution is the presentation of how various desirable error combinations can be expressed formally and processed algorithmically.

preprint2016arXiv

Symbolic Manipulation of Code Properties

The FAdo system is a symbolic manipulator of formal languages objects, implemented in Python. In this work, we extend its capabilities by implementing methods to manipulate transducers and we go one level higher than existing formal language systems and implement methods to manipulate objects representing classes of independent languages (widely known as code properties). Our methods allow users to define their own code properties and combine them between themselves or with fixed properties such as prefix codes, suffix codes, error detecting codes, etc. The satisfaction and maximality decision questions are solvable for any of the definable properties. The new online system LaSer allows to query about code properties and obtain the answer in a batch mode. Our work is founded on independence theory as well as the theory of rational relations and transducers and contributes with improveded algorithms on these objects.

preprint2015arXiv

An investigation into inter- and intragenomic variations of graphic genomic signatures

We provide, on an extensive dataset and using several different distances, confirmation of the hypothesis that CGR patterns are preserved along a genomic DNA sequence, and are different for DNA sequences originating from genomes of different species. This finding lends support to the theory that CGRs of genomic sequences can act as graphic genomic signatures. In particular, we compare the CGR patterns of over five hundred different 150,000 bp genomic sequences originating from the genomes of six organisms, each belonging to one of the kingdoms of life: H. sapiens, S. cerevisiae, A. thaliana, P. falciparum, E. coli, and P. furiosus. We also provide preliminary evidence of this method's applicability to closely related species by comparing H. sapiens (chromosome 21) sequences and over one hundred and fifty genomic sequences, also 150,000 bp long, from P. troglodytes (Animalia; chromosome Y), for a total length of more than 101 million basepairs analyzed. We compute pairwise distances between CGRs of these genomic sequences using six different distances, and construct Molecular Distance Maps that visualize all sequences as points in a two-dimensional or three-dimensional space, to simultaneously display their interrelationships. Our analysis confirms that CGR patterns of DNA sequences from the same genome are in general quantitatively similar, while being different for DNA sequences from genomes of different species. Our analysis of the performance of the assessed distances uses three different quality measures and suggests that several distances outperform the Euclidean distance, which has so far been almost exclusively used for such studies. In particular we show that, for this dataset, DSSIM (Structural Dissimilarity Index) and the descriptor distance (introduced here) are best able to classify genomic sequences.

preprint2015arXiv

Embedding rationally independent languages into maximal ones

We consider the embedding problem in coding theory: given an independence (a code-related property) and an independent language $L$, find a maximal independent language containing $L$. We consider the case where the code-related property is defined via a rational binary relation that is decreasing with respect to any fixed total order on the set of words. Our method works by iterating a max-min operator that has been used before for the embedding problem for properties defined by length-increasing-and-transitive binary relations. By going to order-decreasing rational relations, represented by input-decreasing transducers, we are able to include many known properties from both the noiseless and noisy domains of coding theory, as well as any combination of such properties. Moreover, in many cases the desired maximal embedding is effectively computable.

preprint2015arXiv

Transducer Descriptions of DNA Code Properties and Undecidability of Antimorphic Problems

This work concerns formal descriptions of DNA code properties, and builds on previous work on transducer descriptions of classic code properties and on trajectory descriptions of DNA code properties. This line of research allows us to give a property as input to an algorithm, in addition to any regular language, which can then answer questions about the language and the property. Here we define DNA code properties via transducers and show that this method is strictly more expressive than that of trajectories, without sacrificing the efficiency of deciding the satisfaction question. We also show that the maximality question can be undecidable. Our undecidability results hold not only for the fixed DNA involution but also for any fixed antimorphic permutation. Moreover, we also show the undecidability of the antimorphic version of the Post Corresponding Problem, for any fixed antimorphic permutation.

preprint2014arXiv

An efficient algorithm for computing the edit distance of a regular language via input-altering transducers

We revisit the problem of computing the edit distance of a regular language given via an NFA. This problem relates to the inherent maximal error-detecting capability of the language in question. We present an efficient algorithm for solving this problem which executes in time $O(r^2n^2d)$, where $r$ is the cardinality of the alphabet involved, $n$ is the number of transitions in the given NFA, and $d$ is the computed edit distance. We have implemented the algorithm and present here performance tests. The correctness of the algorithm is based on the result (also presented here) that the particular error-detection property related to our problem can be defined via an input-altering transducer.

Stavros Konstantinidis

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

Approximate NFA Universality and Related Problems Motivated by Information Theory

Channels with Synchronization/Substitution Errors and Computation of Error Control Codes

Symbolic Manipulation of Code Properties

An investigation into inter- and intragenomic variations of graphic genomic signatures

Embedding rationally independent languages into maximal ones

Transducer Descriptions of DNA Code Properties and Undecidability of Antimorphic Problems

An efficient algorithm for computing the edit distance of a regular language via input-altering transducers