Researcher profile

Weichuan Yu

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust score 19 (Unverified) · Verification L1 · Unclaimed author
5 works · 0 followers · 5 topics · 4 close collaborators

Published work

5 published items

Preprint · 2012 · arXiv

A Combinatorial Perspective of the Protein Inference Problem

In a shotgun proteomics experiment, proteins are the most biologically meaningful output. The success of proteomics studies depends on the ability to identify proteins accurately and efficiently. Many methods have been proposed to identify proteins from the results of peptide identification. However, the relationship between protein identification and peptide identification has not been thoroughly explained. In this paper, we present a combinatorial perspective on the protein inference problem. We use combinatorial mathematics to calculate conditional protein probabilities (the probability that a protein is correctly identified) under three assumptions, which yield a lower bound, an upper bound and an empirical estimate of protein probabilities, respectively. The combinatorial perspective gives a closed-form formulation for protein inference. Based on our model, we study the impact of unique peptides and degenerate peptides (peptides shared by at least two proteins) on protein probabilities. We also study the relationship between our model and other methods such as ProteinProphet. A confidence interval for each protein probability can be calculated and used together with the probability to filter the protein identification results. In experiments on two datasets of standard protein mixtures and two datasets of real samples, our method achieves results competitive with ProteinProphet while being more efficient. We name our program ProteinInfer. Its Java source code is available at http://bioinformatics.ust.hk/proteininfer
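The distinction between unique and degenerate peptides can be made concrete with a short sketch. It assumes the peptide-to-protein mapping is available as a plain dictionary (`peptide_to_proteins` and `split_peptides` are hypothetical names); it only classifies peptides by the definition above and does not reproduce the ProteinInfer probability model.

```python
from typing import Dict, Set, Tuple


def split_peptides(
    peptide_to_proteins: Dict[str, Set[str]],
) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Split identified peptides into unique and degenerate ones.

    A unique peptide maps to exactly one protein; a degenerate peptide is
    shared by at least two proteins. Peptides with no mapped protein are skipped.
    """
    unique: Dict[str, str] = {}
    degenerate: Dict[str, Set[str]] = {}
    for peptide, proteins in peptide_to_proteins.items():
        if len(proteins) == 1:
            (protein,) = proteins  # unpack the single parent protein
            unique[peptide] = protein
        elif len(proteins) >= 2:
            degenerate[peptide] = set(proteins)
    return unique, degenerate


if __name__ == "__main__":
    mapping = {
        "PEPTIDEA": {"P1"},
        "PEPTIDEB": {"P1", "P2"},  # shared by two proteins, so degenerate
        "PEPTIDEC": {"P2"},
    }
    unique, degenerate = split_peptides(mapping)
    print(unique)  # {'PEPTIDEA': 'P1', 'PEPTIDEC': 'P2'}
    print(degenerate)
```

In this toy mapping, both P1 and P2 are supported by one unique peptide, while PEPTIDEB gives evidence that cannot be attributed to either protein alone.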

Preprint · 2012 · arXiv

Moving Object Detection by Detecting Contiguous Outliers in the Low-Rank Representation

Object detection is a fundamental step in automated video analysis for many vision applications. Object detection in a video is usually performed with object detectors or background subtraction techniques. An object detector often requires manually labeled examples to train a binary classifier, while background subtraction needs a training sequence that contains no objects to build a background model. To automate the analysis, object detection without a separate training phase becomes a critical task. Previous work has tackled this task using motion information, but existing motion-based methods are usually limited when coping with complex scenarios such as nonrigid motion and dynamic backgrounds. In this paper, we show that these challenges can be addressed in a unified framework named DEtecting Contiguous Outliers in the LOw-rank Representation (DECOLOR). This formulation integrates object detection and background learning into a single optimization process, which can be solved efficiently by an alternating algorithm. We explain the relations between DECOLOR and other sparsity-based methods. Experiments on both simulated data and real sequences demonstrate that DECOLOR outperforms state-of-the-art approaches and works effectively across a wide range of complex scenarios.

Preprint · 2012 · arXiv

Running PeptideProphet Separately on Replicates Improves Peptide Identification Results

Limited spectrum coverage is a problem in shotgun proteomics, and replicates are generated to improve it. When integrating peptide identification results obtained from replicates, the state-of-the-art algorithm PeptideProphet combines Peptide-Spectrum Matches (PSMs) before building the statistical model that calculates peptide probabilities. In this paper, we identify a connection between merging the results of replicates and Bagging, a standard technique for improving the power of statistical methods. Following Bagging's philosophy, we propose running PeptideProphet separately on each replicate and combining the outputs to obtain the final peptide probabilities. Our experiments show that the proposed routine consistently improves PeptideProphet on a standard protein dataset, a human dataset and a yeast dataset.
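The difference between the two workflows (merge PSMs and fit once, versus fit per replicate and then combine) can be sketched as follows. This is a minimal illustration, not PeptideProphet: `toy_fit` is a hypothetical stand-in model, and because the abstract does not state how the per-replicate outputs are combined, the sketch assumes a Bagging-style average over the replicates in which a peptide was observed.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical PSM record: (peptide sequence, positive search-engine score).
PSM = Tuple[str, float]
# Any model mapping one set of PSMs to peptide probabilities.
Model = Callable[[Sequence[PSM]], Dict[str, float]]


def toy_fit(psms: Sequence[PSM]) -> Dict[str, float]:
    """Stand-in model: best score per peptide divided by the run's top score."""
    top = max(score for _, score in psms)
    best: Dict[str, float] = {}
    for peptide, score in psms:
        best[peptide] = max(best.get(peptide, 0.0), score)
    return {peptide: score / top for peptide, score in best.items()}


def pooled(replicates: Sequence[Sequence[PSM]], fit: Model) -> Dict[str, float]:
    """Baseline workflow: merge all PSMs first, then fit a single model."""
    merged = [psm for rep in replicates for psm in rep]
    return fit(merged)


def per_replicate(replicates: Sequence[Sequence[PSM]], fit: Model) -> Dict[str, float]:
    """Fit one model per replicate, then average each peptide's probabilities.

    Averaging over the replicates where the peptide appears is an assumed rule.
    """
    collected: Dict[str, List[float]] = defaultdict(list)
    for rep in replicates:
        for peptide, prob in fit(rep).items():
            collected[peptide].append(prob)
    return {peptide: mean(probs) for peptide, probs in collected.items()}


if __name__ == "__main__":
    reps = [[("AAA", 10.0), ("BBB", 5.0)], [("AAA", 2.0), ("CCC", 4.0)]]
    print(pooled(reps, toy_fit))         # {'AAA': 1.0, 'BBB': 0.5, 'CCC': 0.4}
    print(per_replicate(reps, toy_fit))  # {'AAA': 0.75, 'BBB': 0.5, 'CCC': 1.0}
```

Because each replicate gets its own model, replicate-specific score distributions no longer distort one another, which is the intuition the Bagging analogy appeals to.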

Preprint · 2010 · arXiv

BOOST: A fast approach to detecting gene-gene interactions in genome-wide case-control studies

Gene-gene interactions have long been recognized as fundamentally important for understanding the genetic causes of complex disease traits. At present, identifying gene-gene interactions from genome-wide case-control studies is computationally and methodologically challenging. In this paper, we introduce a simple but powerful method named 'BOolean Operation based Screening and Testing' (BOOST). To discover unknown gene-gene interactions that underlie complex diseases, BOOST examines all pairwise interactions in genome-wide case-control studies remarkably fast. We carried out interaction analyses on seven data sets from the Wellcome Trust Case Control Consortium (WTCCC). Each analysis took less than 60 hours on a standard 3.0 GHz desktop with 4 GB of memory running Windows XP. The interaction patterns identified in the type 1 diabetes data set differ significantly from those identified in the rheumatoid arthritis data set, although both data sets share a very similar hit region in the WTCCC report. BOOST also identified many previously undiscovered interactions between genes in the major histocompatibility complex (MHC) region in the type 1 diabetes data set. In the coming era of large-scale interaction mapping in genome-wide case-control studies, our method can serve as a computationally and statistically useful tool.
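The abstract names Boolean operations but does not describe the data encoding. A minimal sketch, assuming each SNP's genotypes (coded 0, 1 or 2) are packed into one bitset per genotype value, shows how the 3x3 genotype contingency table for a SNP pair reduces to a bitwise AND followed by a population count. Building one table for cases and one for controls gives the counts an interaction test would need. All function names are hypothetical, and the screening statistic BOOST computes from these tables is not shown.

```python
from typing import List, Sequence, Tuple

Table = List[List[int]]


def encode_snp(genotypes: Sequence[int]) -> List[int]:
    """Pack one SNP's genotypes (0, 1 or 2 per sample) into three bitsets.

    Bit k of bitsets[g] is set when sample k has genotype g.
    """
    bitsets = [0, 0, 0]
    for k, g in enumerate(genotypes):
        if g not in (0, 1, 2):
            raise ValueError(f"genotype must be 0, 1 or 2, got {g!r}")
        bitsets[g] |= 1 << k
    return bitsets


def popcount(x: int) -> int:
    """Number of set bits (int.bit_count() also works on Python 3.10+)."""
    return bin(x).count("1")


def pair_table(snp_a: List[int], snp_b: List[int]) -> Table:
    """Entry [i][j] counts samples with genotype i at SNP A and j at SNP B."""
    return [[popcount(snp_a[i] & snp_b[j]) for j in range(3)] for i in range(3)]


def case_control_tables(
    geno_a: Sequence[int], geno_b: Sequence[int], is_case: Sequence[bool]
) -> Tuple[Table, Table]:
    """Build separate 3x3 pair tables for cases and for controls."""

    def table_for(indices: List[int]) -> Table:
        a = encode_snp([geno_a[k] for k in indices])
        b = encode_snp([geno_b[k] for k in indices])
        return pair_table(a, b)

    cases = [k for k, c in enumerate(is_case) if c]
    controls = [k for k, c in enumerate(is_case) if not c]
    return table_for(cases), table_for(controls)


if __name__ == "__main__":
    case_tab, control_tab = case_control_tables(
        [0, 1, 2, 1], [1, 1, 0, 2], [True, True, False, False]
    )
    print(case_tab)     # [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
    print(control_tab)  # [[0, 0, 0], [0, 0, 1], [1, 0, 0]]
```

Packing samples into words lets a single AND plus popcount count many samples at once; the arbitrary-precision Python integers here stand in for the fixed-width machine words a compiled implementation would use.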

Preprint · 2010 · arXiv

Stable Feature Selection for Biomarker Discovery

Feature selection techniques have long been the workhorse of biomarker discovery applications. Surprisingly, the stability of feature selection with respect to sampling variations has been under-considered, and only recently has this issue received growing attention. In this article, we review existing stable feature selection methods for biomarker discovery using a generic hierarchical framework. We have two objectives: (1) to provide an overview of this new but fast-growing topic as a convenient reference, and (2) to categorize existing methods under an expandable framework for future research and development.
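Stability with respect to sampling variations can be quantified in several ways. One common measure, used here purely as an illustration and not as the article's framework, is the average pairwise Jaccard similarity between the feature subsets a selector returns on different resamples of the data. The function names below are hypothetical.

```python
from itertools import combinations
from typing import AbstractSet, Sequence


def jaccard(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def selection_stability(selections: Sequence[AbstractSet[int]]) -> float:
    """Average pairwise Jaccard similarity of selected feature subsets.

    Each subset is assumed to come from running the same selector on a
    different resample (e.g. a bootstrap sample) of the data. A value of 1.0
    means every run picked exactly the same features.
    """
    pairs = list(combinations(selections, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    runs = [{1, 2, 3}, {1, 2, 4}, {1, 2, 3}]
    print(round(selection_stability(runs), 3))  # 0.667
```

Here features 1 and 2 are picked in every run while the third feature alternates, so the score falls below 1.0 even though each run selects the same number of features.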