Source author record

Hamidreza Chitsaz

Hamidreza Chitsaz appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Biomolecules; Genomics; Robotics; Machine Learning; Computational Engineering, Finance, and Science; math.OC

Catalog footprint

What is connected

11 works

6 topics

4 close collaborators

preprint, 2020, arXiv

BPPart and BPMax: RNA-RNA Interaction Partition Function and Structure Prediction for the Base Pair Counting Model

RNA-RNA interaction (RRI) is ubiquitous and has complex roles in cellular functions. In human health studies, miRNA-target and lncRNAs are among an elite class of RRIs that have been extensively studied. Bacterial ncRNA-target and RNA interference are other classes of RRIs that have received significant attention. In recent studies, mRNA-mRNA interaction instances have been observed, where both partners appear in the same pathway without any direct link between them, or any prior knowledge about their relationship. Those recently discovered cases suggest that the scope of RRI is much wider than the aforementioned elite classes. We revisit our RNA-RNA interaction partition function algorithm, piRNA, which computes the partition function, base-pairing probabilities, and structure for the comprehensive Turner energy model using 96 different dynamic programming tables. In this study, we strategically retreat from sophisticated thermodynamic models to the much simpler base pair counting model. That might seem counter-intuitive at first glance; our idea is to benefit from the advantages of such simple models in terms of running time and memory footprint and compensate for the associated information loss by adding machine learning components in the future. Here, simple weighted base pair counting is considered to obtain BPPart for Base-pair Partition function and BPMax for Base-pair Maximization, which use 9 and 2 tables respectively. They are empirically 225 and 1350 fold faster than piRNA. A correlation of 0.855 and 0.836 was achieved between piRNA and BPPart and between piRNA and BPMax, respectively, at 37 degrees, and 0.920 and 0.904 at -180 degrees. We also discover two partner RNAs, SNORD3D and TRAF3, and hypothesize their potential roles in genetic diseases. We envision fusion of machine learning methods with the proposed algorithms in the future.
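To make the weighted base pair counting model concrete, here is a minimal sketch for a single strand only. It is not BPPart or BPMax, whose two-strand recursions (9 and 2 tables) are not given in the abstract. The function name `fold_tables`, the pair weights and the `beta` parameter are illustrative assumptions. One table holds the maximum total pair weight, the other the partition function over all non-crossing structures.

```python
import math

# Illustrative pair weights (assumed values, not taken from the paper).
PAIR_WEIGHT = {("A", "U"): 2.0, ("U", "A"): 2.0,
               ("C", "G"): 3.0, ("G", "C"): 3.0,
               ("G", "U"): 1.0, ("U", "G"): 1.0}

def fold_tables(seq, min_loop=0, beta=1.0):
    """Return (max pair weight, partition function) for one strand.

    Tables are indexed by half-open intervals [i, j) of the sequence.
    """
    n = len(seq)
    best = [[0.0] * (n + 1) for _ in range(n + 1)]  # empty interval scores 0
    z = [[1.0] * (n + 1) for _ in range(n + 1)]     # empty interval has one structure
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # Case 1: position i is unpaired.
            b = best[i + 1][j]
            total = z[i + 1][j]
            # Case 2: position i pairs with k, splitting the interval in two.
            for k in range(i + 1 + min_loop, j):
                w = PAIR_WEIGHT.get((seq[i], seq[k]))
                if w is None:
                    continue
                b = max(b, w + best[i + 1][k] + best[k + 1][j])
                total += math.exp(beta * w) * z[i + 1][k] * z[k + 1][j]
            best[i][j] = b
            z[i][j] = total
    return best[0][n], z[0][n]
```

With `beta=0.0` every structure has Boltzmann weight 1, so the second return value simply counts the non-crossing structures, which is a convenient sanity check on the recursion.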

preprint, 2016, arXiv

Proceedings of the 1st International Workshop on Robot Learning and Planning (RLP 2016)

preprint, 2014, arXiv

NUROA: A Numerical Roadmap Algorithm

Motion planning has been studied for nearly four decades now. Complete, combinatorial motion planning approaches are theoretically well-rooted with completeness guarantees but they are hard to implement. Sampling-based and heuristic methods are easy to implement and quite simple to customize but they lack completeness guarantees. Can the best of both worlds ever be achieved, particularly for mission critical applications such as robotic surgery, space exploration, and handling hazardous material? In this paper, we answer that question affirmatively. We present a new methodology, NUROA, to numerically approximate Canny's roadmap, which is a network of one-dimensional algebraic curves. Our algorithm encloses the roadmap with a chain of tiny boxes each of which contains a piece of the roadmap and whose connectivity captures the roadmap connectivity. It starts by enclosing the entire space with a box. In each iteration, remaining boxes are shrunk on all sides and then split into smaller boxes. Those boxes that are empty are detected in the shrink phase and removed. The algorithm terminates when all remaining boxes are smaller than a resolution that can be either given as input or automatically computed using root separation lower bounds. The shrink operation is cast as a polynomial optimization with semialgebraic constraints, which is in turn transformed into a (series of) semidefinite programs (SDP) using Lasserre's approach. NUROA's success is due to fast SDP solvers. NUROA correctly captured the connectivity of multiple curves/skeletons whereas competitors such as IBEX and Realpaver failed in some cases. Since boxes are independent from one another, NUROA can be parallelized, particularly on GPUs. NUROA is available as an open source package at http://nuroa.sourceforge.net/.
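The split-and-discard loop described above can be sketched on a toy curve. This is not NUROA: the SDP-based shrink step is replaced here by a plain interval-arithmetic emptiness test, and the curve (the unit circle), the function names and the resolution are assumptions chosen for illustration.

```python
def square_range(lo, hi):
    """Exact range of t*t for t in [lo, hi]."""
    if lo <= 0.0 <= hi:
        return 0.0, max(lo * lo, hi * hi)
    return min(lo * lo, hi * hi), max(lo * lo, hi * hi)

def may_contain_curve(box):
    """Interval test for x^2 + y^2 - 1 = 0; False means the box is certainly empty."""
    (x0, x1), (y0, y1) = box
    xlo, xhi = square_range(x0, x1)
    ylo, yhi = square_range(y0, y1)
    return xlo + ylo - 1.0 <= 0.0 <= xhi + yhi - 1.0

def enclose(initial_box, resolution):
    """Return boxes no larger than `resolution` that together cover the curve."""
    pending = [initial_box]
    done = []
    while pending:
        box = pending.pop()
        if not may_contain_curve(box):
            continue  # empty boxes are dropped
        (x0, x1), (y0, y1) = box
        if max(x1 - x0, y1 - y0) <= resolution:
            done.append(box)
            continue
        # Split the box into four quadrants and keep refining.
        xm, ym = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        pending.extend([((x0, xm), (y0, ym)), ((xm, x1), (y0, ym)),
                        ((x0, xm), (ym, y1)), ((xm, x1), (ym, y1))])
    return done
```

Calling `enclose(((-2.0, 2.0), (-2.0, 2.0)), 0.25)` returns a ring of small boxes around the circle. Each box is processed independently of the others, which is the property the abstract points to when it mentions parallelization.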

preprint, 2014, arXiv

Proceedings of the 1st Workshop on Robotics Challenges and Vision (RCV2013)

preprint, 2013, arXiv

An Efficient Algorithm for Upper Bound on the Partition Function of Nucleic Acids

It has been shown that minimum free energy structure for RNAs and RNA-RNA interaction is often incorrect due to inaccuracies in the energy parameters and inherent limitations of the energy model. In contrast, ensemble based quantities such as melting temperature and equilibrium concentrations can be more reliably predicted. Even structure prediction by sampling from the ensemble and clustering those structures by Sfold [7] has proven to be more reliable than minimum free energy structure prediction. The main obstacle for ensemble based approaches is the computational complexity of the partition function and base pairing probabilities. For instance, the space complexity of the partition function for RNA-RNA interaction is $O(n^4)$ and the time complexity is $O(n^6)$ which are prohibitively large [4,12]. Our goal in this paper is to give a fast algorithm, based on sparse folding, to calculate an upper bound on the partition function. Our work is based on the recent algorithm of Hazan and Jaakkola [10]. The space complexity of our algorithm is the same as that of sparse folding algorithms, and the time complexity of our algorithm is $O(MFE(n)\ell)$ for single RNA and $O(MFE(m, n)\ell)$ for RNA-RNA interaction in practice, in which $MFE$ is the running time of sparse folding and $\ell \leq n$ ($\ell \leq n + m$) is a sequence dependent parameter.
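For background on the Hazan and Jaakkola approach that the abstract builds on: when every state receives its own independent zero-mean Gumbel perturbation, the expected value of the perturbed maximum equals $\log Z$ exactly. The sketch below checks that identity by Monte Carlo on a tiny, made-up list of scores. It does not reproduce the paper's upper bound, which relies on lower-dimensional perturbations combined with sparse folding; all names and values here are illustrative assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # mean of the standard Gumbel distribution

def zero_mean_gumbel(rng):
    # -log(E) with E ~ Exp(1) is standard Gumbel; subtracting its mean centers it.
    e = max(rng.expovariate(1.0), 1e-300)  # guard against log(0)
    return -math.log(e) - EULER_GAMMA

def log_z_exact(scores):
    """log of sum(exp(score)) computed stably."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

def log_z_by_perturbation(scores, samples=50000, seed=0):
    """Average of max(score + Gumbel noise) over many independent draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += max(s + zero_mean_gumbel(rng) for s in scores)
    return total / samples
```

Each sample costs one maximization, which is why a method of this kind can reuse a fast minimum free energy solver in place of a full partition function recursion.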

preprint, 2013, arXiv

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

Background - The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results - In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions - Many current genome assemblers produced useful assemblies, containing a significant representation of their genes, regulatory sequences, and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.

preprint, 2013, arXiv

Distilled Single Cell Genome Sequencing and De Novo Assembly for Sparse Microbial Communities

Identification of every single genome present in a microbial sample is an important and challenging task with crucial applications. It is challenging because there are typically millions of cells in a microbial sample, the vast majority of which elude cultivation. The most accurate method to date is exhaustive single cell sequencing using multiple displacement amplification, which is simply intractable for a large number of cells. However, there is hope for breaking this barrier as the number of different cell types with distinct genome sequences is usually much smaller than the number of cells. Here, we present a novel divide and conquer method to sequence and de novo assemble all distinct genomes present in a microbial sample with a sequencing cost and computational complexity proportional to the number of genome types, rather than the number of cells. The method is implemented in a tool called Squeezambler. We evaluated Squeezambler on simulated data. The proposed divide and conquer method successfully reduces the cost of sequencing in comparison with the naive exhaustive approach. Availability: Squeezambler and datasets are available under http://compbio.cs.wayne.edu/software/squeezambler/.

preprint, 2013, arXiv

Exact Learning of RNA Energy Parameters From Structure

We consider the problem of exact learning of parameters of a linear RNA energy model from secondary structure data. A necessary and sufficient condition for learnability of parameters is derived, which is based on computing the convex hull of the union of translated Newton polytopes of input sequences. The set of learned energy parameters is characterized as the convex cone generated by the normal vectors to those facets of the resulting polytope that are incident to the origin. In practice, the sufficient condition may not be satisfied by the entire training data set; hence, computing a maximal subset of training data for which the sufficient condition is satisfied is often desired. We show that this problem is NP-hard in general for an arbitrary dimensional feature space. Using a randomized greedy algorithm, we select a subset of the RNA STRAND v2.0 database that satisfies the sufficient condition for the separate A-U, C-G, G-U base pair counting model. The set of learned energy parameters includes experimentally measured energies of A-U, C-G, and G-U pairs; hence, our parameter set is in agreement with the Turner parameters.

preprint, 2013, arXiv

On Time-optimal Trajectories for a Car-like Robot with One Trailer

In addition to the theoretical value of challenging optimal control problems, recent progress in autonomous vehicles mandates further research in optimal motion planning for wheeled vehicles. Since current numerical optimal control techniques suffer from either the curse of dimensionality, e.g. the Hamilton-Jacobi-Bellman equation, or the curse of complexity, e.g. pseudospectral optimal control and max-plus methods, analytical characterization of geodesics for wheeled vehicles becomes important not only from a theoretical point of view but also from a practical one. Such an analytical characterization provides a fast motion planning algorithm that can be used in robust feedback loops. In this work, we use the Pontryagin Maximum Principle to characterize extremal trajectories, i.e. candidate geodesics, for a car-like robot with one trailer. We use time as the distance function. In spite of partial progress, this problem has remained open in the past two decades. Besides straight motion and turn with maximum allowed curvature, we identify planar elastica as the third piece of motion that occurs along our extremals. We give a detailed characterization of such curves, a special case of which, called merging curve, connects maximum curvature turns to straight line segments. The structure of extremals in our case is revealed through analytical integration of the system and adjoint equations.

preprint, 2013, arXiv

The RNA Newton Polytope and Learnability of Energy Parameters

Despite nearly two score years of research on RNA secondary structure and RNA-RNA interaction prediction, the accuracy of the state-of-the-art algorithms is still far from satisfactory. Researchers have proposed increasingly complex energy models and improved parameter estimation methods in anticipation of endowing their methods with enough power to solve the problem. The output has disappointingly been only modest improvements, not matching the expectations. Even recent massively featured machine learning approaches were not able to break the barrier. In this paper, we introduce the notion of learnability of the parameters of an energy model as a measure of its inherent capability. We say that the parameters of an energy model are learnable iff there exists at least one set of such parameters that renders every known RNA structure to date the minimum free energy structure. We derive a necessary condition for the learnability and give a dynamic programming algorithm to assess it. Our algorithm computes the convex hull of the feature vectors of all feasible structures in the ensemble of a given input sequence. Interestingly, that convex hull coincides with the Newton polytope of the partition function as a polynomial in energy parameters. We demonstrated the application of our theory to a simple energy model consisting of a weighted count of A-U and C-G base pairs. Our results show that this simple energy model satisfies the necessary condition for less than one third of the input unpseudoknotted sequence-structure pairs chosen from the RNA STRAND v2.0 database. For another one third, the necessary condition is barely violated, which suggests that augmenting this simple energy model with more features such as the Turner loops may solve the problem. The necessary condition is severely violated for 8%, which provides a small set of hard cases that require further investigation.
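For the two-feature model mentioned above (counts of A-U and C-G pairs), the polytope can be illustrated on short sequences. The sketch below is a simplification, not the paper's algorithm: it collects the full set of attainable count vectors by dynamic programming and only then takes the convex hull with a standard monotone chain. The function names are assumptions.

```python
FEATURE = {("A", "U"): (1, 0), ("U", "A"): (1, 0),
           ("C", "G"): (0, 1), ("G", "C"): (0, 1)}

def feature_vectors(seq):
    """All (num_AU, num_CG) count vectors over non-crossing structures of seq."""
    n = len(seq)
    # f[i][j]: attainable vectors on the half-open interval [i, j).
    f = [[{(0, 0)} for _ in range(n + 1)] for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            vecs = set(f[i + 1][j])  # position i unpaired
            for k in range(i + 1, j):  # position i paired with k
                e = FEATURE.get((seq[i], seq[k]))
                if e is None:
                    continue
                for a in f[i + 1][k]:
                    for b in f[k + 1][j]:
                        vecs.add((a[0] + b[0] + e[0], a[1] + b[1] + e[1]))
            f[i][j] = vecs
    return f[0][n]

def convex_hull(points):
    """Vertices of the 2D convex hull (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq_pts):
        chain = []
        for p in seq_pts:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower, upper = half(pts), half(reversed(pts))
    return lower[:-1] + upper[:-1]
```

A structure whose count vector lies strictly inside the hull cannot be the unique optimum of any linear score over these two features, which gives some intuition for why a condition on the polytope constrains learnability.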

preprint, 2010, arXiv

Prediction of RNA-RNA interaction structure by centroids in the Boltzmann ensemble

New high-throughput sequencing technologies have made it possible to pursue the advent of genome-wide transcriptomics. That progress combined with the recent discovery of regulatory non-coding RNAs (ncRNAs) has necessitated fast and accurate algorithms to predict RNA-RNA interaction probability and structure. Although there are algorithms to predict minimum free energy interaction secondary structure for two nucleic acids, little work has been done to exploit the information invested in the base pair probabilities to improve interaction structure prediction. In this paper, we present an algorithm to predict the Hamming centroid of the Boltzmann ensemble of interaction structures. We also present an efficient algorithm to sample interaction structures from the ensemble. Our sampling algorithm uses a balanced scheme for traversing indices which improves the running time of the Ding-Lawrence sampling algorithm. The Ding-Lawrence sampling algorithm has $O(n^2m^2)$ time complexity whereas our algorithm has $O((n+m)^2\log(n+m))$ time complexity, in which $n$ and $m$ are the lengths of input strands. We implemented our algorithm in a new version of piRNA and compared our structure prediction results with competitors. Our centroid prediction outperforms competitor minimum-free-energy prediction algorithms on average.
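As a small illustration of a Hamming centroid, the sketch below works on an explicitly enumerated toy ensemble rather than on piRNA's dynamic programming tables. It rests on the observation that the set of pairs minimizing the expected number of differing pairs is the set of pairs with probability above 0.5. The ensemble, its weights and the function names are made up for the example, and whether that threshold set is always a valid interaction structure is left to the paper.

```python
from collections import defaultdict

def pair_probabilities(ensemble):
    """ensemble: list of (structure, weight); a structure is a frozenset of (i, j) pairs."""
    z = sum(w for _, w in ensemble)
    probs = defaultdict(float)
    for structure, w in ensemble:
        for pair in structure:
            probs[pair] += w / z
    return dict(probs)

def centroid(probs, threshold=0.5):
    """Keep every pair whose probability exceeds the threshold."""
    return frozenset(p for p, pr in probs.items() if pr > threshold)

def expected_distance(candidate, ensemble):
    """Boltzmann-weighted mean size of the symmetric difference to the candidate."""
    z = sum(w for _, w in ensemble)
    return sum(w * len(candidate ^ s) for s, w in ensemble) / z

# Toy ensemble with unnormalized Boltzmann weights (assumed values).
TOY = [(frozenset({(0, 5), (1, 4)}), 4.0),
       (frozenset({(0, 5)}), 3.0),
       (frozenset({(1, 3)}), 2.0),
       (frozenset(), 1.0)]
```

On `TOY`, the pair `(0, 5)` has probability 0.7 and is the only pair above the threshold, so the centroid here differs from the highest-weight structure, which also contains `(1, 4)`.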
