Source author record

Roman Kolpakov

Roman Kolpakov appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Data Structures and Algorithms, Formal Languages and Automata Theory, Computational Complexity, Discrete Mathematics, Information Retrieval

Catalog footprint

What is connected

9 works

5 topics

4 close collaborators

preprint, 2022, arXiv

Almost optimal searching of maximal subrepetitions in a word

For $0<δ<1$, a $δ$-subrepetition in a word is a factor whose exponent is less than 2 but not less than $1+δ$ (the exponent of a factor is the ratio of its length to its minimal period). A $δ$-subrepetition is maximal if it cannot be extended to the left or to the right by at least one letter while preserving its minimal period. In the paper we propose an algorithm for finding all maximal $δ$-subrepetitions in a word of length $n$ in $O(\frac{n}{δ}\log\frac{1}{δ})$ time (the lower bound for this time is $Ω(\frac{n}{δ})$).
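
The definitions in this abstract can be illustrated with a small brute-force sketch. It is not the algorithm proposed in the paper, and it makes no attempt at the stated time bound; the function names (`minimal_period`, `exponent`, `is_delta_subrepetition`, `is_maximal`) are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def minimal_period(w: str) -> int:
    """Smallest p >= 1 with w[i] == w[i + p] for every valid i (naive check)."""
    if not w:
        raise ValueError("the factor must be nonempty")
    n = len(w)
    for p in range(1, n + 1):
        if all(w[i] == w[i + p] for i in range(n - p)):
            return p
    return n  # unreachable: p == n always satisfies the condition


def exponent(w: str) -> Fraction:
    """Ratio of the factor length to its minimal period, kept exact."""
    return Fraction(len(w), minimal_period(w))


def is_delta_subrepetition(w: str, delta: float) -> bool:
    """True if 1 + delta <= exponent(w) < 2."""
    return 1 + delta <= exponent(w) < 2


def is_maximal(word: str, i: int, j: int) -> bool:
    """True if the factor word[i:j] cannot be extended by one letter on
    either side while keeping the same minimal period."""
    p = minimal_period(word[i:j])
    left_blocked = i == 0 or minimal_period(word[i - 1:j]) != p
    right_blocked = j == len(word) or minimal_period(word[i:j + 1]) != p
    return left_blocked and right_blocked
```

For example, the factor `abcab` has minimal period 3 and exponent 5/3, so it is a $δ$-subrepetition for $δ = 0.5$; inside `abcabc` it is not maximal, because extending it to the right keeps the period 3.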

preprint, 2016, arXiv

Indexing and querying color sets of images

We aim to study the set of color sets of continuous regions of an image given as a matrix of $m$ rows over $n\geq m$ columns, where each element of the matrix is an integer from $[1,σ]$ called a color. The set of distinct colors in a region is called its fingerprint. We aim to compute, index and query the fingerprints of all rectangular regions, called rectangles. The set of all such fingerprints is denoted by $\mathcal{F}$. A rectangle is maximal if it is not contained in a greater rectangle with the same fingerprint. The set of all locations of maximal rectangles is denoted by $\mathcal{L}$. We first explain how to determine all the $|\mathcal{L}|$ maximal locations with their fingerprints in expected time $O(nm^2σ)$ using a Monte Carlo algorithm (with polynomially small probability of error) or within deterministic $O(nm^2σ\log(\frac{|\mathcal{L}|}{nm^2}+2))$ time. We then show how to build a data structure which occupies $O(nm\log n+|\mathcal{L}|)$ space such that a query which asks for all the maximal locations with a given fingerprint $f$ can be answered in time $O(|f|+\log\log n+k)$, where $k$ is the number of maximal locations with fingerprint $f$. If the query asks only for the presence of the fingerprint, then the space usage becomes $O(nm\log n+|\mathcal{F}|)$ while the query time becomes $O(|f|+\log\log n)$. Finally, we consider the special case of square regions (squares).

preprint, 2015, arXiv

Optimal searching of gapped repeats in a word

Following (Kolpakov et al., 2013; Gawrychowski and Manea, 2015), we continue the study of $α$-gapped repeats in strings, defined as factors $uvu$ with $|uv|\leq α|u|$. Our main result is the $O(αn)$ bound on the number of maximal $α$-gapped repeats in a string of length $n$, previously proved to be $O(α^2 n)$ in (Kolpakov et al., 2013). For the closely related notion of maximal $δ$-subrepetitions (maximal factors of exponent between $1+δ$ and $2$), our result implies the $O(n/δ)$ bound on their number, which improves the bound of (Kolpakov et al., 2010) by a $\log n$ factor. We also prove an algorithmic time bound of $O(αn+S)$ (where $S$ is the size of the output) for computing all maximal $α$-gapped repeats. Our solution, inspired by (Gawrychowski and Manea, 2015), is different from the recently published proof by (Tanimura et al., 2015) of the same bound. Together with our bound on $S$, this implies an $O(αn)$-time algorithm for computing all maximal $α$-gapped repeats.

preprint, 2015, arXiv

Upper bound on the number of steps for solving the subset sum problem by the Branch-and-Bound method

We study the computational complexity of a particular case of the knapsack problem: the subset sum problem. For solving this problem we consider one of the basic variants of the Branch-and-Bound method, in which any sub-problem is decomposed along the free variable with the maximal weight. By the complexity of solving a problem by the Branch-and-Bound method we mean the number of steps required for solving the problem by this method. In the paper we obtain upper bounds on the complexity of solving the subset sum problem by the Branch-and-Bound method. These bounds can be easily computed from the input data of the problem, so they can be used for a preliminary estimation of the computational resources required for solving the subset sum problem by the Branch-and-Bound method.
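
The notion of "number of steps" can be made concrete with a simplified sketch. The abstract does not spell out the exact variant studied in the paper, so the following are assumptions of this illustration: weights are positive integers, the goal is the largest subset sum not exceeding a capacity, a sub-problem is closed when it is infeasible or when all of its free items fit, and every processed sub-problem counts as one step. The name `branch_and_bound_steps` is hypothetical.

```python
def branch_and_bound_steps(weights, capacity):
    """Return (best_sum, steps) for a simplified Branch-and-Bound on subset sum.

    Assumed setting (not taken from the paper): positive integer weights,
    maximise the total chosen weight subject to total <= capacity.
    """
    # Sorting in decreasing order means the free variable with the maximal
    # weight is always the next one in the list.
    order = sorted(weights, reverse=True)
    n = len(order)
    # suffix[k] = total weight of the free items order[k:].
    suffix = [0] * (n + 1)
    for k in range(n - 1, -1, -1):
        suffix[k] = suffix[k + 1] + order[k]

    best = 0
    steps = 0
    # A sub-problem is (weight of the items fixed to 1, index of the first free item).
    stack = [(0, 0)]
    while stack:
        fixed, k = stack.pop()
        steps += 1
        if fixed > capacity:
            continue  # infeasible sub-problem: discard
        if fixed + suffix[k] <= capacity:
            # Every free item fits, so taking all of them is optimal here.
            best = max(best, fixed + suffix[k])
            continue
        # Decompose along the free variable with the maximal weight.
        stack.append((fixed, k + 1))             # variable set to 0
        stack.append((fixed + order[k], k + 1))  # variable set to 1
    return best, steps
```

For weights `[5, 4, 3]` and capacity `7`, this sketch processes 7 sub-problems and finds the optimum 7 (items 4 and 3). A practical solver would also prune against the best value found so far; that is omitted here to keep the step count easy to trace by hand.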

preprint, 2013, arXiv

Searching of gapped repeats and subrepetitions in a word

A gapped repeat is a factor of the form $uvu$ where $u$ and $v$ are nonempty words. The period of the gapped repeat is defined as $|u|+|v|$. The gapped repeat is maximal if it cannot be extended to the left or to the right by at least one letter while preserving its period. The gapped repeat is called $α$-gapped if its period is not greater than $α|v|$. A $δ$-subrepetition is a factor whose exponent is less than 2 but not less than $1+δ$ (the exponent of a factor is the quotient of its length and its minimal period). The $δ$-subrepetition is maximal if it cannot be extended to the left or to the right by at least one letter while preserving its minimal period. We reveal a close relation between maximal gapped repeats and maximal subrepetitions. Moreover, we show that in a word of length $n$ the number of maximal $α$-gapped repeats is bounded by $O(α^2n)$ and the number of maximal $δ$-subrepetitions is bounded by $O(n/δ^2)$. Using the obtained upper bounds, we propose algorithms for finding all maximal $α$-gapped repeats and all maximal $δ$-subrepetitions in a word of length $n$. The algorithm for finding all maximal $α$-gapped repeats has $O(α^2n)$ time complexity for the case of constant alphabet size and $O(n\log n + α^2n)$ time complexity for the general case. For finding all maximal $δ$-subrepetitions we propose two algorithms. The first algorithm has $O(\frac{n\log\log n}{δ^2})$ time complexity for the case of constant alphabet size and $O(n\log n +\frac{n\log\log n}{δ^2})$ time complexity for the general case. The second algorithm has $O(n\log n+\frac{n}{δ^2}\log \frac{1}{δ})$ expected time complexity.

preprint, 2013, arXiv

Various improvements to text fingerprinting

Let $s = s_1 \ldots s_n$ be a text (or sequence) on a finite alphabet $Σ$ of size $σ$. A fingerprint in $s$ is the set of distinct characters appearing in one of its substrings. The problem considered here is to compute the set $\mathcal{F}$ of all fingerprints of all substrings of $s$ in order to answer efficiently certain questions on this set. A substring $s_i \ldots s_j$ is a maximal location for a fingerprint $f$ in $\mathcal{F}$ (denoted by $\langle i,j\rangle$) if the alphabet of $s_i \ldots s_j$ is $f$ and $s_{i-1}$, $s_{j+1}$, if defined, are not in $f$. The set of maximal locations in $s$ is $\mathcal{L}$ (it is easy to see that $|\mathcal{L}| \leq nσ$). Two maximal locations $\langle i,j\rangle$ and $\langle k,l\rangle$ such that $s_i \ldots s_j = s_k \ldots s_l$ are called copies, and the quotient set of $\mathcal{L}$ according to the copy relation is denoted by $\mathcal{L}_C$. We present new exact and approximate efficient algorithms and data structures for the following three problems: (1) to compute $\mathcal{F}$; (2) given $f$ as a set of distinct characters in $Σ$, to answer if $f$ represents a fingerprint in $\mathcal{F}$; (3) given $f$, to find all maximal locations of $f$ in $s$.
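
As a rough illustration of these definitions, the fingerprints and maximal locations of a short text can be enumerated by brute force. This quadratic sketch is only meant to make the definitions concrete and is unrelated to the efficient algorithms and data structures the paper presents; the function names `fingerprints` and `maximal_locations` are hypothetical.

```python
def fingerprints(s: str) -> set:
    """All fingerprints (as frozensets) of the nonempty substrings of s."""
    result = set()
    for i in range(len(s)):
        seen = set()
        for j in range(i, len(s)):
            seen.add(s[j])  # seen is now the alphabet of s[i..j]
            result.add(frozenset(seen))
    return result


def maximal_locations(s: str) -> dict:
    """Map each fingerprint to its maximal locations as 1-based pairs (i, j)."""
    n = len(s)
    locations = {}
    for i in range(n):
        seen = set()
        for j in range(i, n):
            seen.add(s[j])
            # The neighbouring characters, if defined, must lie outside the fingerprint.
            left_ok = i == 0 or s[i - 1] not in seen
            right_ok = j == n - 1 or s[j + 1] not in seen
            if left_ok and right_ok:
                locations.setdefault(frozenset(seen), []).append((i + 1, j + 1))
    return locations
```

For the text `abca`, the sketch finds 7 fingerprints and 8 maximal locations, which is consistent with the bound $|\mathcal{L}| \leq nσ = 12$; the fingerprint `{a}` has the two maximal locations (1, 1) and (4, 4).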

preprint, 2011, arXiv

Linear pattern matching on sparse suffix trees

Packing several characters into one computer word is a simple and natural way to compress the representation of a string and to speed up its processing. Exploiting this idea, we propose an index for a packed string, based on a sparse suffix tree \cite{KU-96} with appropriately defined suffix links. Assuming, under the standard unit-cost RAM model, that a word can store up to $\log_σ n$ characters (where $σ$ is the alphabet size), our index takes $O(n/\log_σ n)$ space, i.e. the same space as the packed string itself. The resulting pattern matching algorithm runs in time $O(m+r^2+r\cdot occ)$, where $m$ is the length of the pattern, $r$ is the actual number of characters stored in a word and $occ$ is the number of pattern occurrences.

preprint, 2011, arXiv

On primary and secondary repetitions in words

Combinatorial properties of maximal repetitions (runs) in formal words are studied. We classify all maximal repetitions in a word as primary or secondary, where the set of all primary repetitions determines all the other repetitions in the word. Essential combinatorial properties of primary repetitions are established.

preprint, 2011, arXiv

On the number of Dejean words over alphabets of 5, 6, 7, 8, 9 and 10 letters

We give lower bounds on the growth rate of Dejean words, i.e. minimally repetitive words, over a k-letter alphabet, for k=5, 6, 7, 8, 9, 10. Combining these with the known upper bounds, we estimate these growth rates to within a precision of 0.005. As a consequence, we establish the exponential growth of the number of Dejean words over a k-letter alphabet, for k=5, 6, 7, 8, 9, 10.
