Researcher profile

André P. L. F. de Carvalho

André P. L. F. de Carvalho contributes to research discovery and scholarly infrastructure.

Published work

1 published item(s)

Preprint, 2014, arXiv

The discriminant power of RNA features for pre-miRNA recognition

Computational discovery of microRNAs (miRNA) is based on pre-determined sets of features from miRNA precursors (pre-miRNA). The feature sets used by current tools for pre-miRNA recognition differ in construction and dimension. Some are composed of sequence-structure patterns commonly found in pre-miRNAs, while others combine more sophisticated RNA features. Current tools achieve similar predictive performance even though the feature sets used, and their computational cost, differ widely. In this work, we analyze the discriminant power of seven feature sets, which are used in six pre-miRNA prediction tools. The analysis is based on the classification performance achieved with these feature sets for the training algorithms used in these tools. We also evaluate feature discrimination through the F-score and feature importance in the induction of random forests. More diverse feature sets produce classifiers with significantly higher classification performance than feature sets composed only of sequence-structure patterns. However, only small or non-significant differences were found among the estimated classification performances of classifiers induced from the diverse feature sets, despite the wide differences in their dimension. Based on these results, we applied a feature selection method to reduce the computational cost of computing the feature set while maintaining discriminant power. We obtained a lower-dimensional feature set, which achieved a sensitivity of 90% and a specificity of 95%. Its sensitivity and specificity are within 0.1% of the maximal values obtained with any feature set. It is 34x faster to compute, even compared to the computationally least expensive feature set from the literature that performs within 0.1% of the maximal values.
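The abstract names the F-score, sensitivity, and specificity without defining them. The sketch below shows one common reading, and it is an illustration, not the authors' code. It assumes the F-score is the Fisher-style per-feature score (between-class separation divided by within-class sample variance), which the abstract does not confirm. The function names `f_score` and `sensitivity_specificity` and the toy values are hypothetical.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """Fisher-style F-score of a single feature (assumed definition).

    pos / neg hold the feature's values for the positive class
    (e.g. pre-miRNAs) and the negative class. Each class needs at
    least two values because sample variances are used.
    """
    overall = mean(list(pos) + list(neg))
    # Squared distance of each class mean from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Sum of the per-class sample variances (n - 1 denominator).
    within = variance(pos) + variance(neg)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels in {0, 1}.

    Assumes both classes occur in y_true; otherwise a ratio is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy values only; they are not taken from the paper.
    print(f_score([1, 2, 3], [7, 8, 9]))  # well-separated feature
    print(f_score([1, 2, 3], [1, 2, 3]))  # uninformative feature
    print(sensitivity_specificity([1, 1, 1, 0, 0, 0, 0],
                                  [1, 1, 0, 0, 0, 0, 1]))
```

Under this definition, a feature whose class means are far apart relative to the spread within each class scores high. Ranking features by the score gives a cheap, classifier-independent view of discriminant power, which could then be compared with random forest feature importances.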