ThreshKnot: Thresholded ProbKnot for Improved RNA Secondary Structure Prediction
RNA structure prediction is a challenging problem, especially with pseudoknots. Recently, there has been a shift from the classical minimum free energy-based methods (MFE) to partition function-based ones that assemble structures using base-pairing probabilities. Two examples of the latter group are the popular maximum expected accuracy (MEA) method and the ProbKnot method. ProbKnot is a fast heuristic that pairs nucleotides that are reciprocally most probable pairing partners, and unlike MEA, can also predict structures with pseudoknots. However, ProbKnot's full potential has been largely overlooked. In particular, when introduced, it did not have an MEA-like hyperparameter that can balance between positive predictive value (PPV) and sensitivity. We show that a simple thresholded version of ProbKnot, which we call ThreshKnot, leads to more accurate overall predictions by filtering out unlikely pairs whose probabilities fall under a given threshold. We also show that on three widely-used folding engines (RNAstructure, Vienna RNAfold, and CONTRAfold), ThreshKnot always outperforms the much more involved MEA algorithm in (1) its higher structure prediction accuracy, (2) its capability to predict pseudoknots, and (3) its faster runtime and easier implementation. This suggests that ThreshKnot should replace MEA as the default partition function-based structure prediction algorithm. ThreshKnot is already available in the widely used RNAstructure software package version 6.2 (released November 27, 2019): https://rna.urmc.rochester.edu/RNAstructure.html