Source author record

Tomáš Vinař

Tomáš Vinař appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Genomics Quantitative Methods Machine Learning Populations and Evolution

Catalog footprint

What is connected

8works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2021arXiv

Probabilistic Models of k-mer Frequencies (Extended Abstract)

In this article, we review existing probabilistic models for modeling abundance of fixed-length strings (k-mers) in DNA sequencing data. These models capture dependence of the abundance on various phenomena, such as the size and repeat content of the genome, heterozygosity levels, and sequencing error rate. This in turn allows to estimate these properties from k-mer abundance histograms observed in real data. We also briefly discuss the issue of comparing k-mer abundance between related sequencing samples and meaningfully summarizing the results.

preprint2016arXiv

Using Sequence Ensembles for Seeding Alignments of MinION Sequencing Data

Oxford Nanopore MinION sequencer is currently the smallest sequencing device available. While being able to produce very long reads (reads of up to 100~kbp were reported), it is prone to high sequencing error rates of up to 30%. Since most of these errors are insertions or deletions, it is very difficult to adapt popular seed-based algorithms designed for aligning data sets with much lower error rates. Base calling of MinION reads is typically done using hidden Markov models. In this paper, we propose to represent each sequencing read by an ensemble of sequences sampled from such a probabilistic model. This approach can improve the sensitivity and false positive rate of seeding an alignment compared to using a single representative base call sequence for each read.

preprint2013arXiv

Probabilistic Approaches to Alignment with Tandem Repeats

We propose a simple tractable pair hidden Markov model for pairwise sequence alignment that accounts for the presence of short tandem repeats. Using the framework of gain functions, we design several optimization criteria for decoding this model and describe the resulting decoding algorithms, ranging from the traditional Viterbi and posterior decoding to block-based decoding algorithms specialized for our model. We compare the accuracy of individual decoding algorithms on simulated data and find our approach superior to the classical three-state pair HMM in simulations.

preprint2012arXiv

Sequence Annotation with HMMs: New Problems and Their Complexity

Hidden Markov models (HMMs) and their variants were successfully used for several sequence annotation tasks. Traditionally, inference with HMMs is done using the Viterbi and posterior decoding algorithms. However, recently a variety of different optimization criteria and associated computational problems were proposed. In this paper, we consider three HMM decoding criteria and prove their NP hardness. These criteria consider the set of states used to generate a certain sequence, but abstract from the exact locations of regions emitted by individual states. We also illustrate experimentally that these criteria are useful for HIV recombination detection.

preprint2010arXiv

A New Approach to the Small Phylogeny Problem

In the small phylogeny problem we, are given a phylogenetic tree and gene orders of the extant species and our goal is to reconstruct all of the ancestral genomes so that the number of evolutionary operations is minimized. Algorithms for reconstructing evolutionary history from gene orders are usually based on repeatedly computing medians of genomes at neighbouring vertices of the tree. We propose a new, more general approach, based on an iterative local optimization procedure. In each step, we propose candidates for ancestral genomes and choose the best ones by dynamic programming. We have implemented our method and used it to reconstruct evolutionary history of 16 yeast mtDNAs and 13 Campanulaceae cpDNAs.

preprint2010arXiv

The Highest Expected Reward Decoding for HMMs with Application to Recombination Detection

Hidden Markov models are traditionally decoded by the Viterbi algorithm which finds the highest probability state path in the model. In recent years, several limitations of the Viterbi decoding have been demonstrated, and new algorithms have been developed to address them \citep{Kall2005,Brejova2007,Gross2007,Brown2010}. In this paper, we propose a new efficient highest expected reward decoding algorithm (HERD) that allows for uncertainty in boundaries of individual sequence features. We demonstrate usefulness of our approach on jumping HMMs for recombination detection in viral genomes.

preprint2009arXiv

Bayesian History Reconstruction of Complex Human Gene Clusters on a Phylogeny

Clusters of genes that have evolved by repeated segmental duplication present difficult challenges throughout genomic analysis, from sequence assembly to functional analysis. Improved understanding of these clusters is of utmost importance, since they have been shown to be the source of evolutionary innovation, and have been linked to multiple diseases, including HIV and a variety of cancers. Previously, Zhang et al. (2008) developed an algorithm for reconstructing parsimonious evolutionary histories of such gene clusters, using only human genomic sequence data. In this paper, we propose a probabilistic model for the evolution of gene clusters on a phylogeny, and an MCMC algorithm for reconstruction of duplication histories from genomic sequences in multiple species. Several projects are underway to obtain high quality BAC-based assemblies of duplicated clusters in multiple species, and we anticipate that our method will be useful in analyzing these valuable new data sets.

preprint2007arXiv

On-line Viterbi Algorithm and Its Relationship to Random Walks

In this paper, we introduce the on-line Viterbi algorithm for decoding hidden Markov models (HMMs) in much smaller than linear space. Our analysis on two-state HMMs suggests that the expected maximum memory used to decode sequence of length $n$ with $m$-state HMM can be as low as $Θ(m\log n)$, without a significant slow-down compared to the classical Viterbi algorithm. Classical Viterbi algorithm requires $O(mn)$ space, which is impractical for analysis of long DNA sequences (such as complete human genome chromosomes) and for continuous data streams. We also experimentally demonstrate the performance of the on-line Viterbi algorithm on a simple HMM for gene finding on both simulated and real DNA sequences.