Researcher profile

Roman Kolpakov

Roman Kolpakov contributes to research discovery and scholarly infrastructure.

Researcher; Affiliation not imported; Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified; Verification L1; Unclaimed author
5 works
0 followers
4 topics
4 close collaborators

Published work

5 published item(s)

Preprint, 2022, arXiv

Almost optimal searching of maximal subrepetitions in a word

For $0<δ<1$, a $δ$-subrepetition in a word is a factor whose exponent is less than 2 but not less than $1+δ$ (the exponent of a factor is the ratio of its length to its minimal period). A $δ$-subrepetition is maximal if it cannot be extended to the left or to the right by at least one letter while preserving its minimal period. In this paper we propose an algorithm for finding all maximal $δ$-subrepetitions in a word of length $n$ in $O(\frac{n}δ\log\frac{1}δ)$ time (the lower bound for this time is $Ω(\frac{n}δ)$).
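To make the definitions concrete, here is a minimal brute-force sketch that enumerates maximal $δ$-subrepetitions directly from the definition. It is not the algorithm of the paper and runs in polynomial rather than near-linear time. The function names `minimal_period` and `maximal_subrepetitions` are hypothetical, chosen only for this illustration, and positions are 0-based with an exclusive end index.

```python
from fractions import Fraction


def minimal_period(w):
    """Smallest p >= 1 such that w[k] == w[k + p] for every valid k."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[k] == w[k + p] for k in range(n - p)):
            return p
    return n


def maximal_subrepetitions(word, delta):
    """Return (start, end, period) for every maximal delta-subrepetition.

    Illustrative brute force only; `end` is exclusive.
    """
    n = len(word)
    found = []
    for i in range(n):
        for j in range(i + 2, n + 1):
            p = minimal_period(word[i:j])
            length = j - i
            # Exponent length/p must satisfy 1 + delta <= exponent < 2.
            if not ((1 + delta) * p <= length < 2 * p):
                continue
            # Extending by one letter keeps the minimal period exactly
            # when the new letter matches the letter p positions away.
            left_ext = i > 0 and word[i - 1] == word[i - 1 + p]
            right_ext = j < n and word[j] == word[j - p]
            if not left_ext and not right_ext:
                found.append((i, j, p))
    return found


if __name__ == "__main__":
    print(maximal_subrepetitions("abcab", Fraction(1, 2)))
```

Passing $δ$ as a `Fraction` keeps the comparison against $1+δ$ exact, which avoids floating-point edge cases when the exponent sits exactly on the boundary.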

Preprint, 2013, arXiv

Searching of gapped repeats and subrepetitions in a word

A gapped repeat is a factor of the form $uvu$ where $u$ and $v$ are nonempty words. The period of the gapped repeat is defined as $|u|+|v|$. The gapped repeat is maximal if it cannot be extended to the left or to the right by at least one letter while preserving its period. The gapped repeat is called $α$-gapped if its period is not greater than $α|v|$. A $δ$-subrepetition is a factor whose exponent is less than 2 but not less than $1+δ$ (the exponent of a factor is the quotient of its length and its minimal period). A $δ$-subrepetition is maximal if it cannot be extended to the left or to the right by at least one letter while preserving its minimal period. We reveal a close relation between maximal gapped repeats and maximal subrepetitions. Moreover, we show that in a word of length $n$ the number of maximal $α$-gapped repeats is bounded by $O(α^2n)$ and the number of maximal $δ$-subrepetitions is bounded by $O(n/δ^2)$. Using these upper bounds, we propose algorithms for finding all maximal $α$-gapped repeats and all maximal $δ$-subrepetitions in a word of length $n$. The algorithm for finding all maximal $α$-gapped repeats has $O(α^2n)$ time complexity for the case of constant alphabet size and $O(n\log n + α^2n)$ time complexity for the general case. For finding all maximal $δ$-subrepetitions we propose two algorithms. The first algorithm has $O(\frac{n\log\log n}{δ^2})$ time complexity for the case of constant alphabet size and $O(n\log n +\frac{n\log\log n}{δ^2})$ time complexity for the general case. The second algorithm has $O(n\log n+\frac{n}{δ^2}\log \frac{1}δ)$ expected time complexity.

Preprint, 2013, arXiv

Various improvements to text fingerprinting

Let s = s_1 .. s_n be a text (or sequence) on a finite alphabet Σ of size σ. A fingerprint in s is the set of distinct characters appearing in one of its substrings. The problem considered here is to compute the set {\cal F} of all fingerprints of all substrings of s in order to answer efficiently certain questions on this set. A substring s_i .. s_j is a maximal location for a fingerprint f in {\cal F} (denoted by <i,j>) if the alphabet of s_i .. s_j is f and s_{i-1}, s_{j+1}, if defined, are not in f. The set of maximal locations in s is {\cal L} (it is easy to see that |{\cal L}| \leq n σ). Two maximal locations <i,j> and <k,l> such that s_i .. s_j = s_k .. s_l are named copies, and the quotient set of {\cal L} according to the copy relation is denoted by {\cal L}_C. We present new exact and approximate efficient algorithms and data structures for the following three problems: (1) to compute {\cal F}; (2) given f as a set of distinct characters in Σ, to answer if f represents a fingerprint in {\cal F}; (3) given f, to find all maximal locations of f in s.
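The three problems can be illustrated with a naive sketch that follows the definitions literally. It is not one of the algorithms or data structures of the paper, only a quadratic-time baseline for small inputs. The function names are hypothetical, and maximal locations are reported as 1-based pairs to match the <i,j> notation of the abstract.

```python
def fingerprints(s):
    """Problem (1): all fingerprints of non-empty substrings of s."""
    result = set()
    for i in range(len(s)):
        seen = set()
        for j in range(i, len(s)):
            seen.add(s[j])
            result.add(frozenset(seen))
    return result


def is_fingerprint(s, f):
    """Problem (2): does the character set f occur as a fingerprint in s?"""
    return frozenset(f) in fingerprints(s)


def maximal_locations(s, f):
    """Problem (3): 1-based pairs (i, j) that are maximal locations of f."""
    f = frozenset(f)
    n = len(s)
    locations = []
    i = 0
    while i < n:
        if s[i] not in f:
            i += 1
            continue
        # Grow the run of characters belonging to f as far as possible,
        # so that the neighbours on both sides (if any) are outside f.
        j = i
        while j + 1 < n and s[j + 1] in f:
            j += 1
        # The run is a maximal location only if it uses every character of f.
        if frozenset(s[i:j + 1]) == f:
            locations.append((i + 1, j + 1))
        i = j + 1
    return locations


if __name__ == "__main__":
    text = "abacab"
    print(len(fingerprints(text)))
    print(is_fingerprint(text, "bc"))
    print(maximal_locations(text, "ab"))
```

In this sketch the empty set is not counted as a fingerprint, since only non-empty substrings are scanned; that is a choice made for the illustration.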

Preprint, 2011, arXiv

On the number of Dejean words over alphabets of 5, 6, 7, 8, 9 and 10 letters

We give lower bounds on the growth rate of Dejean words, i.e. minimally repetitive words, over a k-letter alphabet, for k=5, 6, 7, 8, 9, 10. Put together with the known upper bounds, we estimate these growth rates with a precision of 0.005. As a consequence, we establish the exponential growth of the number of Dejean words over a k-letter alphabet, for k=5, 6, 7, 8, 9, 10.
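For small lengths, the quantity being estimated can be explored by direct enumeration. The sketch below rests on an assumption not spelled out in the abstract: that a Dejean word over k letters (k >= 5) is a word in which no factor has exponent greater than the repetition threshold k/(k-1). The function names are hypothetical, and this exhaustive count says nothing about how the paper derives its bounds.

```python
from fractions import Fraction


def has_bad_suffix(word, threshold):
    """True if some suffix of `word` has exponent greater than `threshold`."""
    n = len(word)
    for p in range(1, n):
        # Find the longest suffix of `word` that has period p.
        length = p
        while length < n and word[n - 1 - length] == word[n - 1 - length + p]:
            length += 1
        if Fraction(length, p) > threshold:
            return True
    return False


def count_dejean_words(k, max_len):
    """Counts of threshold-avoiding words of lengths 1..max_len over k letters.

    Assumes the threshold k/(k-1), which is the case k >= 5.
    """
    threshold = Fraction(k, k - 1)
    counts = []
    level = [()]
    for _ in range(max_len):
        next_level = []
        for w in level:
            for letter in range(k):
                candidate = w + (letter,)
                # Every prefix was checked earlier, so only suffixes
                # ending at the new letter can introduce a violation.
                if not has_bad_suffix(candidate, threshold):
                    next_level.append(candidate)
        level = next_level
        counts.append(len(level))
    return counts


if __name__ == "__main__":
    print(count_dejean_words(5, 8))
```

The ratio of consecutive counts gives only a rough empirical feel for the growth rate; the number of words kept in memory grows quickly, so this is practical only for short lengths.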