Source author record

Gelio Alves

Gelio Alves appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Quantitative Methods; Statistics Theory (math.ST)

Catalog footprint

What is connected

3 works

3 topics

2 close collaborators

Preprint, 2014, arXiv

Mass spectrometry based protein identification with accurate statistical significance assignment

Motivation: Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of the scientific conclusions drawn. In mass spectrometry based proteomics, accurate statistics for peptide identification can now be achieved, but accurate protein-level statistics remain challenging.

Results: We have constructed a protein ID method that combines the peptide evidence for a candidate protein using a rigorous formula derived earlier; in this formula the database $P$-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein-level $E$-values, eliminating the need for empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Soric formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with those of the other methods tested.

Availability: The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit
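An E-value is the expected number of chance hits scoring at least as well as the one observed. Dividing a cutoff E-value by the number of identifications at or below that cutoff therefore gives a simple estimate of the proportion of false discoveries. The Python sketch below assumes this ratio form; the abstract does not state the exact form of the Soric formula used, and the function name and sample values are purely illustrative.

```python
def estimated_pfd(e_values, e_cutoff):
    """Estimate the proportion of false discoveries at an E-value cutoff.

    Assumption (not taken from the paper): the estimate is the cutoff,
    i.e. the expected number of false hits, divided by the number of
    identifications accepted at that cutoff, capped at 1.
    """
    accepted = sum(1 for e in e_values if e <= e_cutoff)
    if accepted == 0:
        return 0.0  # nothing accepted, so nothing can be a false discovery
    return min(1.0, e_cutoff / accepted)


# Hypothetical protein-level E-values, for illustration only.
example_e_values = [0.001, 0.01, 0.05, 0.5, 2.0]
example_pfd = estimated_pfd(example_e_values, e_cutoff=0.1)
```

The estimate is only as good as the E-values themselves, which is why accurate protein-level E-values matter for this kind of error control.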

Preprint, 2010, arXiv

Combining independent, arbitrarily weighted P-values: a new solution to an old problem using a novel expansion with controllable accuracy

Good's formula and Fisher's method are frequently used for combining independent P-values. Interestingly, the equivalent of Good's formula already emerged in 1910, and mathematical expressions relevant to even more general situations have been repeatedly derived, albeit in different contexts. We provide here a novel derivation and show how the analytic formula obtained reduces to the two aforementioned ones as special cases. The main novelty of this paper, however, is the explicit treatment of nearly degenerate weights, which are known to cause numerical instabilities. We derive a controlled expansion, in powers of differences in inverse weights, that provides both accurate statistics and stable numerics.
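As a rough illustration of the two classical formulas named in this abstract, the Python sketch below implements Fisher's method in its closed form and Good's formula for pairwise distinct weights. The function names are hypothetical, and the code is not the authors' controlled expansion: it is the textbook expression, whose coefficients grow without bound as two weights approach each other.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method (equal weights), closed form.

    With tau = prod(p_i) over n P-values, the combined P-value is
    tau * sum_{k=0}^{n-1} (-ln tau)^k / k!.
    """
    log_tau = sum(math.log(p) for p in p_values)
    term, total = 1.0, 0.0
    for k in range(len(p_values)):
        total += term                 # adds (-ln tau)^k / k!
        term *= -log_tau / (k + 1)    # advance to the next term
    return math.exp(log_tau) * total


def good_combined_p(p_values, weights):
    """Good's formula for pairwise distinct positive weights.

    With Q = prod(p_i ** w_i), the combined P-value is
    sum_i Lambda_i * Q ** (1 / w_i), where
    Lambda_i = w_i ** (n - 1) / prod_{j != i} (w_i - w_j).
    """
    if len(set(weights)) != len(weights):
        raise ValueError("weights must be pairwise distinct")
    n = len(weights)
    log_q = sum(w * math.log(p) for p, w in zip(p_values, weights))
    total = 0.0
    for i, w_i in enumerate(weights):
        denom = math.prod(w_i - w_j for j, w_j in enumerate(weights) if j != i)
        total += w_i ** (n - 1) / denom * math.exp(log_q / w_i)
    return total
```

With nearly equal weights such as (1.0, 1.0001), good_combined_p still lands close to Fisher's result, but its two coefficients are of order 10^4 with opposite signs, so the answer comes out of heavy cancellation. Precision erodes further as the weights converge, which is the regime the paper's expansion targets.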

Preprint, 2010, arXiv

RAId_aPS: MS/MS analysis with multiple scoring functions and spectrum-specific statistics

Statistically meaningful comparison/combination of peptide identification results from various search methods is impeded by the lack of a universal statistical standard. Providing an E-value calibration protocol, we demonstrated earlier the feasibility of translating either the score or heuristic E-value reported by any method into the textbook-defined E-value, which may serve as the universal statistical standard. This protocol, although robust, may lose spectrum-specific statistics and might require a new calibration when changes in experimental setup occur. To mitigate these issues, we developed a new MS/MS search tool, RAId_aPS, that is able to provide spectrum-specific E-values for additive scoring functions. Given a selection of scoring functions out of RAId score, K-score, Hyperscore and XCorr, RAId_aPS generates the corresponding score histograms of all possible peptides using dynamic programming. Using these score histograms to assign E-values enables a calibration-free protocol for accurate significance assignment for each scoring function. RAId_aPS features four different modes: (i) compute the total number of possible peptides for a given molecular mass range, (ii) generate the score histogram given an MS/MS spectrum and a scoring function, (iii) reassign E-values for a list of candidate peptides given an MS/MS spectrum and the scoring functions chosen, and (iv) perform database searches using selected scoring functions. In modes (iii) and (iv), RAId_aPS is also capable of combining results from different scoring functions using spectrum-specific statistics. The web link is http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/raid_aps/index.html. Relevant binaries for Linux, Windows, and Mac OS X are available from the same page.
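Mode (i) can be illustrated with a small dynamic program. The Python sketch below counts amino-acid sequences by total residue mass, assuming nominal integer residue masses and ignoring the terminal water mass and any modifications. The function names and the integer rounding are illustrative assumptions, not RAId_aPS's actual implementation.

```python
# Nominal (integer) residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def count_peptides_by_mass(max_mass, residue_masses=RESIDUE_MASS):
    """Return a list where counts[m] is the number of ordered residue
    sequences whose residue masses sum to exactly m."""
    counts = [0] * (max_mass + 1)
    counts[0] = 1  # the empty sequence
    for m in range(1, max_mass + 1):
        # A sequence of mass m ends in some residue of mass r <= m,
        # preceded by any sequence of mass m - r.
        counts[m] = sum(counts[m - r] for r in residue_masses.values() if r <= m)
    return counts


def count_in_range(low, high, residue_masses=RESIDUE_MASS):
    """Total number of sequences with residue mass in [low, high]."""
    counts = count_peptides_by_mass(high, residue_masses)
    return sum(counts[low:high + 1])
```

For example, count_peptides_by_mass(130)[128] counts the four sequences Q, K, GA and AG. In principle the same recurrence can carry a score bin alongside the mass index, which is how a dynamic program can build a score histogram for an additive scoring function.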
