Researcher profile

Hideo Bannai

Hideo Bannai contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
15works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

15 published item(s)

preprint2022arXiv

Combinatorics of minimal absent words for a sliding window

A string $w$ is called a minimal absent word (MAW) for another string $T$ if $w$ does not occur in $T$ but the proper substrings of $w$ occur in $T$. For example, let $Σ= \{\mathtt{a, b, c}\}$ be the alphabet. Then, the set of MAWs for string $w = \mathtt{abaab}$ is $\{\mathtt{aaa, aaba, bab, bb, c}\}$. In this paper, we study combinatorial properties of MAWs in the sliding window model, namely, how the set of MAWs changes when a sliding window of fixed length $d$ is shifted over the input string $T$ of length $n$, where $1 \leq d < n$. We present \emph{tight} upper and lower bounds on the maximum number of changes in the set of MAWs for a sliding window over $T$, both in the cases of general alphabets and binary alphabets. Our bounds improve on the previously known best bounds [Crochemore et al., 2020].

preprint2022arXiv

Computing Longest (Common) Lyndon Subsequences

Given a string $T$ with length $n$ whose characters are drawn from an ordered alphabet of size $σ$, its longest Lyndon subsequence is a longest subsequence of $T$ that is a Lyndon word. We propose algorithms for finding such a subsequence in $O(n^3)$ time with $O(n)$ space, or online in $O(n^3 σ)$ space and time. Our first result can be extended to find the longest common Lyndon subsequence of two strings of length $n$ in $O(n^4 σ)$ time using $O(n^3)$ space.

preprint2022arXiv

Computing NP-hard Repetitiveness Measures via MAX-SAT

Repetitiveness measures reveal profound characteristics of datasets, and give rise to compressed data structures and algorithms working in compressed space. Alas, the computation of some of these measures is NP-hard, and straight-forward computation is infeasible for datasets of even small sizes. Three such measures are the smallest size of a string attractor, the smallest size of a bidirectional macro scheme, and the smallest size of a straight-line program. While a vast variety of implementations for heuristically computing approximations exist, exact computation of these measures has received little to no attention. In this paper, we present MAX-SAT formulations that provide the first non-trivial implementations for exact computation of smallest string attractors, smallest bidirectional macro schemes, and smallest straight-line programs. Computational experiments show that our implementations work for texts of length up to a few hundred for straight-line programs and bidirectional macro schemes, and texts even over a million for string attractors.

preprint2021arXiv

Computing longest palindromic substring after single-character or block-wise edits

Palindromes are important objects in strings which have been extensively studied from combinatorial, algorithmic, and bioinformatics points of views. It is known that the length of the longest palindromic substrings (LPSs) of a given string T of length n can be computed in O(n) time by Manacher&#39;s algorithm [J. ACM &#39;75]. In this paper, we consider the problem of finding the LPS after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LPSs in O(\log (\min \{σ, \log n\})) time after a single character substitution, insertion, or deletion, where σdenotes the number of distinct characters appearing in T. We also propose an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LPSs in O(\ell + \log \log n) time, after an existing substring in T is replaced by a string of arbitrary length \ell.

preprint2021arXiv

The Parameterized Suffix Tray

Let $Σ$ and $Π$ be disjoint alphabets, respectively called the static alphabet and the parameterized alphabet. Two strings $x$ and $y$ over $Σ\cup Π$ of equal length are said to parameterized match (p-match) if there exists a renaming bijection $f$ on $Σ$ and $Π$ which is identity on $Σ$ and maps the characters of $x$ to those of $y$ so that the two strings become identical. The indexing version of the problem of finding p-matching occurrences of a given pattern in the text is a well-studied topic in string matching. In this paper, we present a state-of-the-art indexing structure for p-matching called the parameterized suffix tray of an input text $T$, denoted by $\mathsf{PSTray}(T)$. We show that $\mathsf{PSTray}(T)$ occupies $O(n)$ space and supports pattern matching queries in $O(m + \log (σ+π) + \mathit{occ})$ time, where $n$ is the length of $T$, $m$ is the length of a query pattern $P$, $π$ is the number of distinct symbols of $|Π|$ in $T$, $σ$ is the number of distinct symbols of $|Σ|$ in $T$ and $\mathit{occ}$ is the number of p-matching occurrences of $P$ in $T$. We also present how to build $\mathsf{PSTray}(T)$ in $O(n)$ time from the parameterized suffix tree of $T$.

preprint2020arXiv

Detecting $k$-(Sub-)Cadences and Equidistant Subsequence Occurrences

The equidistant subsequence pattern matching problem is considered. Given a pattern string $P$ and a text string $T$, we say that $P$ is an \emph{equidistant subsequence} of $T$ if $P$ is a subsequence of the text such that consecutive symbols of $P$ in the occurrence are equally spaced. We can consider the problem of equidistant subsequences as generalizations of (sub-)cadences. We give bit-parallel algorithms that yield $o(n^2)$ time algorithms for finding $k$-(sub-)cadences and equidistant subsequences. Furthermore, $O(n\log^2 n)$ and $O(n\log n)$ time algorithms, respectively for equidistant and Abelian equidistant matching for the case $|P| = 3$, are shown. The algorithms make use of a technique that was recently introduced which can efficiently compute convolutions with linear constraints.

preprint2020arXiv

Fast Algorithms for the Shortest Unique Palindromic Substring Problem on Run-Length Encoded Strings

For a string $S$, a palindromic substring $S[i..j]$ is said to be a \emph{shortest unique palindromic substring} ($\mathit{SUPS}$) for an interval $[s, t]$ in $S$, if $S[i..j]$ occurs exactly once in $S$, the interval $[i, j]$ contains $[s, t]$, and every palindromic substring containing $[s, t]$ which is shorter than $S[i..j]$ occurs at least twice in $S$. In this paper, we study the problem of answering $\mathit{SUPS}$ queries on run-length encoded strings. We show how to preprocess a given run-length encoded string $\mathit{RLE}_{S}$ of size $m$ in $O(m)$ space and $O(m \log σ_{\mathit{RLE}_{S}} + m \sqrt{\log m / \log\log m})$ time so that all $\mathit{SUPSs}$ for any subsequent query interval can be answered in $O(\sqrt{\log m / \log\log m} + α)$ time, where $α$ is the number of outputs, and $σ_{\mathit{RLE}_{S}}$ is the number of distinct runs of $\mathit{RLE}_{S}$. Additionaly, we consider a variant of the SUPS problem where a query interval is also given in a run-length encoded form. For this variant of the problem, we present two alternative algorithms with faster queries. The first one answers queries in $O(\sqrt{\log\log m /\log\log\log m} + α)$ time and can be built in $O(m \log σ_{\mathit{RLE}_{S}} + m \sqrt{\log m / \log\log m})$ time, and the second one answers queries in $O(\log \log m + α)$ time and can be built in $O(m \log σ_{\mathit{RLE}_{S}})$ time. Both of these data structures require $O(m)$ space.

preprint2020arXiv

Faster STR-EC-LCS Computation

The longest common subsequence (LCS) problem is a central problem in stringology that finds the longest common subsequence of given two strings $A$ and $B$. More recently, a set of four constrained LCS problems (called generalized constrained LCS problem) were proposed by Chen and Chao [J. Comb. Optim, 2011]. In this paper, we consider the substring-excluding constrained LCS (STR-EC-LCS) problem. A string $Z$ is said to be an STR-EC-LCS of two given strings $A$ and $B$ excluding $P$ if, $Z$ is one of the longest common subsequences of $A$ and $B$ that does not contain $P$ as a substring. Wang et al. proposed a dynamic programming solution which computes an STR-EC-LCS in $O(mnr)$ time and space where $m = |A|, n = |B|, r = |P|$ [Inf. Process. Lett., 2013]. In this paper, we show a new solution for the STR-EC-LCS problem. Our algorithm computes an STR-EC-LCS in $O(n|Σ| + (L+1)(m-L+1)r)$ time where $|Σ| \leq \min\{m, n\}$ denotes the set of distinct characters occurring in both $A$ and $B$, and $L$ is the length of the STR-EC-LCS. This algorithm is faster than the $O(mnr)$-time algorithm for short/long STR-EC-LCS (namely, $L \in O(1)$ or $m-L \in O(1)$), and is at least as efficient as the $O(mnr)$-time algorithm for all cases.

preprint2020arXiv

Grammar-compressed Self-index with Lyndon Words

We introduce a new class of straight-line programs (SLPs), named the Lyndon SLP, inspired by the Lyndon trees (Barcelo, 1990). Based on this SLP, we propose a self-index data structure of $O(g)$ words of space that can be built from a string $T$ in $O(n \lg n)$ expected time, retrieving the starting positions of all occurrences of a pattern $P$ of length $m$ in $O(m + \lg m \lg n + occ \lg g)$ time, where $n$ is the length of $T$, $g$ is the size of the Lyndon SLP for $T$, and $occ$ is the number of occurrences of $P$ in $T$.

preprint2020arXiv

Longest Square Subsequence Problem Revisited

The longest square subsequence (LSS) problem consists of computing a longest subsequence of a given string $S$ that is a square, i.e., a longest subsequence of form $XX$ appearing in $S$. It is known that an LSS of a string $S$ of length $n$ can be computed using $O(n^2)$ time [Kosowski 2004], or with (model-dependent) polylogarithmic speed-ups using $O(n^2 (\log \log n)^2 / \log^2 n)$ time [Tiskin 2013]. We present the first algorithm for LSS whose running time depends on other parameters, i.e., we show that an LSS of $S$ can be computed in $O(r \min\{n, M\}\log \frac{n}{r} + n + M \log n)$ time with $O(M)$ space, where $r$ is the length of an LSS of $S$ and $M$ is the number of matching points on $S$.

preprint2020arXiv

Lyndon Words, the Three Squares Lemma, and Primitive Squares

We revisit the so-called &#34;Three Squares Lemma&#34; by Crochemore and Rytter [Algorithmica 1995] and, using arguments based on Lyndon words, derive a more general variant which considers three overlapping squares which do not necessarily share a common prefix. We also give an improved upper bound of $n\log_2 n$ on the maximum number of (occurrences of) primitively rooted squares in a string of length $n$, also using arguments based on Lyndon words. To the best of our knowledge, the only known upper bound was $n \log_ϕn \approx 1.441n\log_2 n$, where $ϕ$ is the golden ratio, reported by Fraenkel and Simpson [TCS 1999] obtained via the Three Squares Lemma.

preprint2020arXiv

On repetitiveness measures of Thue-Morse words

We show that the size $γ(t_n)$ of the smallest string attractor of the $n$th Thue-Morse word $t_n$ is 4 for any $n\geq 4$, disproving the conjecture by Mantaci et al. [ICTCS 2019] that it is $n$. We also show that $δ(t_n) = \frac{10}{3+2^{4-n}}$ for $n \geq 3$, where $δ(w)$ is the maximum over all $k = 1,\ldots,|w|$, the number of distinct substrings of length $k$ in $w$ divided by $k$, which is a measure of repetitiveness recently studied by Kociumaka et al. [LATIN 2020]. Furthermore, we show that the number $z(t_n)$ of factors in the self-referencing Lempel-Ziv factorization of $t_n$ is exactly $2n$.

preprint2020arXiv

Space-Efficient Algorithms for Computing Minimal/Shortest Unique Substrings

Given a string $T$ of length $n$, a substring $u = T[i..j]$ of $T$ is called a shortest unique substring (SUS) for an interval $[s,t]$ if (a) $u$ occurs exactly once in $T$, (b) $u$ contains the interval $[s,t]$ (i.e. $i \leq s \leq t \leq j$), and (c) every substring $v$ of $T$ with $|v| < |u|$ containing $[s,t]$ occurs at least twice in $T$. Given a query interval $[s, t] \subset [1, n]$, the interval SUS problem is to output all the SUSs for the interval $[s,t]$. In this article, we propose a $4n + o(n)$ bits data structure answering an interval SUS query in output-sensitive $O(\mathit{occ})$ time, where $\mathit{occ}$ is the number of returned SUSs. Additionally, we focus on the point SUS problem, which is the interval SUS problem for $s = t$. Here, we propose a $\lceil (\log_2{3} + 1)n \rceil + o(n)$ bits data structure answering a point SUS query in the same output-sensitive time. We also propose space-efficient algorithms for computing the minimal unique substrings of $T$.

preprint2020arXiv

Towards Efficient Interactive Computation of Dynamic Time Warping Distance

The dynamic time warping (DTW) is a widely-used method that allows us to efficiently compare two time series that can vary in speed. Given two strings $A$ and $B$ of respective lengths $m$ and $n$, there is a fundamental dynamic programming algorithm that computes the DTW distance for $A$ and $B$ together with an optimal alignment in $Θ(mn)$ time and space. In this paper, we tackle the problem of interactive computation of the DTW distance for dynamic strings, denoted $\mathrm{D^2TW}$, where character-wise edit operation (insertion, deletion, substitution) can be performed at an arbitrary position of the strings. Let $M$ and $N$ be the sizes of the run-length encoding (RLE) of $A$ and $B$, respectively. We present an algorithm for $\mathrm{D^2TW}$ that occupies $Θ(mN+nM)$ space and uses $O(m+n+\#_{\mathrm{chg}}) \subseteq O(mN + nM)$ time to update a compact differential representation $\mathit{DS}$ of the DP table per edit operation, where $\#_{\mathrm{chg}}$ denotes the number of cells in $\mathit{DS}$ whose values change after the edit operation. Our method is at least as efficient as the algorithm recently proposed by Froese et al. running in $Θ(mN + nM)$ time, and is faster when $\#_{\mathrm{chg}}$ is smaller than $O(mN + nM)$ which, as our preliminary experiments suggest, is likely to be the case in the majority of instances.