Source author record

Esko Ukkonen

Esko Ukkonen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Data Structures and Algorithms, astro-ph.GA, Computational Complexity, Discrete Mathematics

Catalog footprint

4 works, 4 topics, 4 close collaborators

Preprint, 2015, arXiv

Longest common substrings with k mismatches

The longest common substring with $k$-mismatches problem is to find, given two strings $S_1$ and $S_2$, a longest substring $A_1$ of $S_1$ and $A_2$ of $S_2$ such that the Hamming distance between $A_1$ and $A_2$ is $\le k$. We introduce a practical $O(nm)$ time and $O(1)$ space solution for this problem, where $n$ and $m$ are the lengths of $S_1$ and $S_2$, respectively. This algorithm can also be used to compute the matching statistics with $k$-mismatches of $S_1$ and $S_2$ in $O(nm)$ time and $O(m)$ space. Moreover, we also present a theoretical solution for the $k = 1$ case which runs in $O(n \log m)$ time, assuming $m\le n$, and uses $O(m)$ space, improving over the existing $O(nm)$ time and $O(m)$ space bound of Babenko and Starikovskaya.
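The problem statement above can be illustrated with a small sketch. It is not taken from the paper: it is a plain two-pointer scan over each diagonal of the $n \times m$ comparison grid, which matches the stated $O(nm)$ time and $O(1)$ extra space but may differ from the authors' algorithm in its details. The function name `lcs_k_mismatches` and its return convention are assumptions made for this example.

```python
def lcs_k_mismatches(s1, s2, k):
    """Return (length, start1, start2) of a longest pair of equal-length
    substrings of s1 and s2 whose Hamming distance is at most k.

    Illustrative sketch only; assumes k >= 0.
    """
    n, m = len(s1), len(s2)
    best = (0, 0, 0)
    # Diagonal d pairs s1[i] with s2[j] for all i - j == d.
    for d in range(-(m - 1), n):
        i0 = max(d, 0)          # first index in s1 on this diagonal
        j0 = i0 - d             # first index in s2 on this diagonal
        length = min(n - i0, m - j0)
        left = 0                # left end of the current window
        mismatches = 0          # mismatches inside the window
        for right in range(length):
            if s1[i0 + right] != s2[j0 + right]:
                mismatches += 1
            # Shrink from the left until at most k mismatches remain.
            while mismatches > k:
                if s1[i0 + left] != s2[j0 + left]:
                    mismatches -= 1
                left += 1
            window = right - left + 1
            if window > best[0]:
                best = (window, i0 + left, j0 + left)
    return best


if __name__ == "__main__":
    print(lcs_k_mismatches("abcde", "xbcdy", 0))
    print(lcs_k_mismatches("abcde", "xbcdy", 1))
```

Each diagonal is scanned once, and both window ends only move forward, so the work per diagonal is linear in its length.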

Preprint, 2014, arXiv

Motif matching using gapped patterns

We present new algorithms for the problem of multiple string matching of gapped patterns, where a gapped pattern is a sequence of strings such that there is a gap of fixed length between each two consecutive strings. The problem has applications in the discovery of transcription factor binding sites in DNA sequences when using generalized versions of the Position Weight Matrix model to describe transcription factor specificities. In these models a motif can be matched as a set of gapped patterns with unit-length keywords. The existing algorithms for matching a set of gapped patterns are worst-case efficient but not practical, or vice versa, in this particular case. The novel algorithms that we present are based on dynamic programming and bit-parallelism, and lie in a middle-ground among the existing algorithms. In fact, their time complexity is close to the best existing bound and, yet, they are also practical. We also provide experimental results which show that the presented algorithms are fast in practice, and preferable if all the strings in the patterns have unit-length.
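To make the definition of a gapped pattern concrete, here is a naive matcher for a single pattern. It is only a baseline that checks every text position; it is not the dynamic-programming or bit-parallel method the abstract refers to. The function name `find_gapped` and the representation of a pattern as a list of keywords plus a list of gap lengths are assumptions made for this example.

```python
def find_gapped(text, keywords, gaps):
    """Return the start positions in text where the keywords occur in order,
    with exactly gaps[i] arbitrary characters between keyword i and i + 1.

    Naive illustrative sketch: O(len(text) * pattern length) time.
    """
    if len(gaps) != len(keywords) - 1:
        raise ValueError("need exactly one gap between consecutive keywords")
    total = sum(len(w) for w in keywords) + sum(gaps)
    hits = []
    for start in range(len(text) - total + 1):
        pos = start
        matched = True
        for idx, word in enumerate(keywords):
            if not text.startswith(word, pos):
                matched = False
                break
            pos += len(word)
            if idx < len(gaps):
                pos += gaps[idx]    # skip the fixed-length gap
        if matched:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Unit-length keywords "A" and "A" separated by a gap of length 2.
    print(find_gapped("GATTACA", ["A", "A"], [2]))
```

A motif in the sense of the abstract would then be matched by running such a check for every gapped pattern in its set, which is what the more efficient algorithms avoid doing one pattern at a time.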

Preprint, 2013, arXiv

Mining the VVV: star formation and embedded clusters

The aim of this study is to locate previously unknown stellar clusters in the catalogue data of the VISTA Variables in the Vía Láctea (VVV) survey. The method, which fits a mixture model of Gaussian densities and background noise to pre-filtered NIR survey stellar catalogue data using the expectation maximization algorithm, was developed by the authors for the UKIDSS Galactic Plane Survey (GPS). The search located 88 previously unknown, mainly embedded, stellar cluster candidates and 39 previously unknown sites of star formation in the 562 deg$^2$ covered by VVV in the Galactic bulge and the southern disk.

Preprint, 2010, arXiv

On the complexity of finding gapped motifs

This paper has been withdrawn by the corresponding author because the newest version is now published in the Journal of Discrete Algorithms.