Researcher profile

Yi-Kuo Yu

Yi-Kuo Yu contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 21 - Emerging
14 works
0 followers
12 topics
4 close collaborators

Published work

14 published item(s)

Preprint, 2014, arXiv

Mass spectrometry based protein identification with accurate statistical significance assignment

Motivation: Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of the scientific conclusions drawn. From the perspective of mass spectrometry based proteomics, even though accurate statistics for peptide identification can now be achieved, accurate protein level statistics remain challenging. Results: We have constructed a protein ID method that combines the peptide evidence for a candidate protein based on a rigorous formula derived earlier; in this formula the database $P$-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein level $E$-values, eliminating the need for empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Soric formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with those of the other methods tested. Availability: The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit

Preprint, 2013, arXiv

Transition from one-dimensional antiferromagnetism to three-dimensional antiferromagnetic order in single-crystalline CuSb$_{2}$O$_{6}$

Measurements of magnetic susceptibility, heat capacity and thermal expansion are reported for single crystalline CuSb$_{2}$O$_{6}$ in the temperature range $5<T<350$ K. The magnetic susceptibility exhibits a broad peak centered near 60 K that is typical of one-dimensional antiferromagnetic compounds. Long-range antiferromagnetic order at $T_N$ = 8.7 K is accompanied by an energy gap ($\Delta$ = 17.48(6) K). This transition represents a crossover from one- to three-dimensional antiferromagnetic behavior. Both heat capacity and the thermal expansion coefficients exhibit distinct jumps at $T_N$, which are similar to those observed at the normal-superconducting phase transition in a superconductor. This behavior is quite unusual, and is presumably associated with a spin-Peierls transition occurring as a result of three-dimensional phonons coupling with Jordan-Wigner-transformed fermions.

Preprint, 2012, arXiv

CytoITMprobe: a network information flow plugin for Cytoscape

To provide the Cytoscape users the possibility of integrating ITM Probe into their workflows, we developed CytoITMprobe, a new Cytoscape plugin. CytoITMprobe maintains all the desirable features of ITM Probe and adds additional flexibility not achievable through its web service version. It provides access to ITM Probe either through a web server or locally. The input, consisting of a Cytoscape network, together with the desired origins and/or destinations of information and a dissipation coefficient, is specified through a query form. The results are shown as a subnetwork of significant nodes and several summary tables. Users can control the composition and appearance of the subnetwork and interchange their ITM Probe results with other software tools through tab-delimited files. The main strength of CytoITMprobe is its flexibility. It allows the user to specify as input any Cytoscape network, rather than being restricted to the pre-compiled protein-protein interaction networks available through the ITM Probe web service. Users may supply their own edge weights and directionalities. Consequently, as opposed to ITM Probe web service, CytoITMprobe can be applied to many other domains of network-based research beyond protein-networks. It also enables seamless integration of ITM Probe results with other Cytoscape plugins having complementary functionality for data analysis.

Preprint, 2012, arXiv

Information flow in interaction networks II: channels, path lengths and potentials

In our previous publication, a framework for information flow in interaction networks based on random walks with damping was formulated with two fundamental modes: emitting and absorbing. While many other network analysis methods based on random walks or equivalent notions have been developed before and after our earlier work, one can show that they can all be mapped to one of the two modes. In addition to these two fundamental modes, a major strength of our earlier formalism was its accommodation of context-specific directed information flow that yielded plausible and meaningful biological interpretation of protein functions and pathways. However, the directed flow from origins to destinations was induced via a potential function that was heuristic. Here, with a theoretically sound approach called the channel mode, we extend our earlier work for directed information flow. This is achieved by constructing a potential function facilitating a purely probabilistic interpretation of the channel mode. For each network node, the channel mode combines the solutions of emitting and absorbing modes in the same context, producing what we call a channel tensor. The entries of the channel tensor at each node can be interpreted as the amount of flow passing through that node from an origin to a destination. Similarly to our earlier model, the channel mode encompasses damping as a free parameter that controls the locality of information flow. Through examples involving the yeast pheromone response pathway, we illustrate the versatility and stability of our new framework.

Preprint, 2011, arXiv

CytoSaddleSum: a functional enrichment analysis plugin for Cytoscape based on sum-of-weights scores

Summary: CytoSaddleSum provides Cytoscape users with access to the functionality of SaddleSum, a functional enrichment tool based on sum-of-weight scores. It operates by querying SaddleSum locally (using the standalone version) or remotely (through an HTTP request to a web server). The functional enrichment results are shown as a term relationship network, where nodes represent terms and edges show term relationships. Furthermore, query results are written as Cytoscape attributes allowing easy saving, retrieval and integration into network-based data analysis workflows. Availability: www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads The source code is placed in Public Domain.

Preprint, 2011, arXiv

Information Flow in Interaction Networks

Interaction networks, consisting of agents linked by their interactions, are ubiquitous across many disciplines of modern science. Many methods of analysis of interaction networks have been proposed, mainly concentrating on node degree distribution or aiming to discover clusters of agents that are very strongly connected between themselves. These methods are principally based on graph-theory or machine learning. We present a mathematically simple formalism for modelling context-specific information propagation in interaction networks based on random walks. The context is provided by selection of sources and destinations of information and by use of potential functions that direct the flow towards the destinations. We also use the concept of dissipation to model the aging of information as it diffuses from its source. Using examples from yeast protein-protein interaction networks and some of the histone acetyltransferases involved in control of transcription, we demonstrate the utility of the concepts and the mathematical constructs introduced in this paper.
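The idea of a random walk with dissipation that is absorbed at chosen destinations can be illustrated with a small sketch. This is not the authors' implementation: the function name, the unweighted toy graph, and the choice to model dissipation as a fixed per-step survival probability `alpha` are all assumptions made for illustration. For every node, the sketch computes the probability that a walker starting there reaches a destination before dissipating.

```python
def absorption_probabilities(neighbors, destinations, alpha=0.9,
                             tol=1e-12, max_iter=10000):
    """Probability that a damped random walk from each node reaches a destination.

    neighbors:    dict mapping node -> list of adjacent nodes (unweighted graph).
    destinations: set of absorbing nodes.
    alpha:        assumed per-step survival probability; a walker dissipates
                  with probability 1 - alpha at every step.
    """
    # Destinations absorb with certainty; every other node starts at zero.
    f = {v: (1.0 if v in destinations else 0.0) for v in neighbors}
    for _ in range(max_iter):
        delta = 0.0
        for v in neighbors:
            if v in destinations or not neighbors[v]:
                continue
            # One step: survive with probability alpha, then move to a
            # uniformly chosen neighbor.
            new = alpha * sum(f[u] for u in neighbors[v]) / len(neighbors[v])
            delta = max(delta, abs(new - f[v]))
            f[v] = new
        if delta < tol:
            break
    return f


# Toy path graph 0 - 1 - 2 with node 2 as the only destination.
path = {0: [1], 1: [0, 2], 2: [1]}
flow = absorption_probabilities(path, {2}, alpha=0.5)
print(flow)
```

With `alpha = 1.0` (no dissipation) every node on this connected graph is eventually absorbed, so all values approach 1; lowering `alpha` makes the result increasingly local to the destination.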

Preprint, 2011, arXiv

ppiTrim: Constructing non-redundant and up-to-date interactomes

Robust advances in interactome analysis demand comprehensive, non-redundant and consistently annotated datasets. By non-redundant, we mean that the accounting of evidence for every interaction should be faithful: each independent experimental support is counted exactly once, no more, no less. While many interactions are shared among public repositories, none of them contains the complete known interactome for any model organism. In addition, the annotations of the same experimental result by different repositories often disagree. This brings up the issue of which annotation to keep while consolidating evidences that are the same. The iRefIndex database, including interactions from most popular repositories with a standardized protein nomenclature, represents a significant advance in all aspects, especially in comprehensiveness. However, iRefIndex aims to maintain all information/annotation from original sources and requires users to perform additional processing to fully achieve the aforementioned goals. To address issues with iRefIndex and to achieve our goals, we have developed ppiTrim, a script that processes iRefIndex to produce non-redundant, consistently annotated datasets of physical interactions. Our script proceeds in three stages: mapping all interactants to gene identifiers and removing all undesired raw interactions, deflating potentially expanded complexes, and reconciling for each interaction the annotation labels among different source databases. As an illustration, we have processed the three largest organismal datasets: yeast, human and fruitfly. While ppiTrim can resolve most apparent conflicts between different labelings, we also discovered some unresolvable disagreements mostly resulting from different annotation policies among repositories. URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/ppiTrim.html

Preprint, 2010, arXiv

Combining independent, arbitrarily weighted P-values: a new solution to an old problem using a novel expansion with controllable accuracy

Good's formula and Fisher's method are frequently used for combining independent P-values. Interestingly, the equivalent of Good's formula already emerged in 1910 and mathematical expressions relevant to even more general situations have been repeatedly derived, albeit in different context. We provide here a novel derivation and show how the analytic formula obtained reduces to the two aforementioned ones as special cases. The main novelty of this paper, however, is the explicit treatment of nearly degenerate weights, which are known to cause numerical instabilities. We derive a controlled expansion, in powers of differences in inverse weights, that provides both accurate statistics and stable numerics.
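For readers unfamiliar with the two classical formulas named in this abstract, the following sketch shows their standard textbook forms. It is not the paper's controlled expansion, and the function names are hypothetical. Fisher's method combines $n$ independent P-values through their product; Good's formula handles the weighted product $\prod_i p_i^{w_i}$ when all weights are distinct. Both assume every P-value lies in $(0, 1]$.

```python
import math


def fisher_combined(pvalues):
    """Fisher's method: P(prod of n uniforms <= q), with q = prod(pvalues)."""
    q = math.prod(pvalues)
    t = -math.log(q)
    # q * sum_{k=0}^{n-1} t^k / k!  (chi-square survival with 2n d.o.f.)
    term, total = 1.0, 1.0
    for k in range(1, len(pvalues)):
        term *= t / k
        total += term
    return q * total


def good_combined(pvalues, weights):
    """Good's formula for distinct positive weights.

    The statistic is prod(p_i ** w_i). The formula divides by differences of
    weights, so nearly equal weights make it numerically unstable, which is
    the instability the abstract refers to.
    """
    t = -sum(w * math.log(p) for p, w in zip(pvalues, weights))
    total = 0.0
    for i, wi in enumerate(weights):
        coeff = 1.0
        for j, wj in enumerate(weights):
            if j != i:
                coeff *= wi / (wi - wj)
        total += coeff * math.exp(-t / wi)
    return total


print(fisher_combined([0.5, 0.5]))
print(good_combined([0.5, 0.5], [1.0, 2.0]))
```

Calling `good_combined` with weights such as `[1.0, 1.0 + 1e-9]` shows the problem in practice: the two terms become huge with opposite signs and their difference loses precision, while exactly equal weights cause a division by zero.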

Preprint, 2010, arXiv

Derivation of the Density Functional via Effective Action

A rigorous derivation of the density functional in the Hohenberg-Kohn theory is presented. With no assumption regarding the magnitude of the electric coupling constant $e^2$ (or correlation), this work provides a firm basis for first-principles calculations. Using the auxiliary field method, in which $e^2$ need not be small, we show that the bosonic loop expansion of the exchange-correlation functional can be reorganized so as to be expressed entirely in terms of the Kohn-Sham single-particle orbitals and energies. The excitations of the many-particle system can be obtained within the same formalism. We also explicitly demonstrate at zero-temperature the single-particle limit, the weak-coupling limit of the energy functional, and its application to homogeneous electron gas.

Preprint, 2010, arXiv

RAId_aPS: MS/MS analysis with multiple scoring functions and spectrum-specific statistics

Statistically meaningful comparison/combination of peptide identification results from various search methods is impeded by the lack of a universal statistical standard. Providing an E-value calibration protocol, we demonstrated earlier the feasibility of translating either the score or heuristic E-value reported by any method into the textbook-defined E-value, which may serve as the universal statistical standard. This protocol, although robust, may lose spectrum-specific statistics and might require a new calibration when changes in experimental setup occur. To mitigate these issues, we developed a new MS/MS search tool, RAId_aPS, that is able to provide spectrum-specific E-values for additive scoring functions. Given a selection of scoring functions out of RAId score, K-score, Hyperscore and XCorr, RAId_aPS generates the corresponding score histograms of all possible peptides using dynamic programming. Using these score histograms to assign E-values enables a calibration-free protocol for accurate significance assignment for each scoring function. RAId_aPS features four different modes: (i) compute the total number of possible peptides for a given molecular mass range, (ii) generate the score histogram given a MS/MS spectrum and a scoring function, (iii) reassign E-values for a list of candidate peptides given a MS/MS spectrum and the scoring functions chosen, and (iv) perform database searches using selected scoring functions. In modes (iii) and (iv), RAId_aPS is also capable of combining results from different scoring functions using spectrum-specific statistics. The web link is http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/raid_aps/index.html. Relevant binaries for Linux, Windows, and Mac OS X are available from the same page.
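Mode (i), counting all possible peptides in a mass range, lends itself to a short dynamic programming sketch. The code below is an illustration under simplifying assumptions, not RAId_aPS itself: it uses nominal integer residue masses for only five amino acids, ignores terminal groups and modifications, and the function name is hypothetical. The recurrence is that the number of sequences with total mass `m` equals the sum, over residues, of the number of sequences with mass `m` minus that residue's mass.

```python
# Nominal integer residue masses (Da) for a small illustrative alphabet.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99}


def count_peptides(max_mass, residue_masses=RESIDUE_MASS):
    """counts[m] = number of residue sequences whose masses sum to exactly m."""
    counts = [0] * (max_mass + 1)
    counts[0] = 1  # the empty sequence
    for m in range(1, max_mass + 1):
        # Append each residue in turn to every shorter sequence that fits.
        counts[m] = sum(counts[m - r] for r in residue_masses.values() if r <= m)
    return counts


counts = count_peptides(200)
# Total number of sequences in the mass window [100, 200].
total_in_range = sum(counts[100:201])
print(counts[128], total_in_range)
```

The same table-filling pattern extends from counting to score histograms: instead of one integer per mass, each cell would hold a histogram of additive scores, which is the kind of structure the abstract describes.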

Preprint, 2010, arXiv

Robust and accurate data enrichment statistics via distribution function of sum of weights

Term enrichment analysis facilitates biological interpretation by assigning to experimentally/computationally obtained data annotation associated with terms from controlled vocabularies. This process usually involves obtaining statistical significance for each vocabulary term and using the most significant terms to describe a given set of biological entities, often associated with weights. Many existing enrichment methods require selections of (arbitrary number of) the most significant entities and/or do not account for weights of entities. Others either mandate extensive simulations to obtain statistics or assume normal weight distribution. In addition, most methods have difficulty assigning correct statistical significance to terms with few entities. Implementing the well-known Lugannani-Rice formula, we have developed a novel approach, called SaddleSum, that is free from all the aforementioned constraints and evaluated it against several existing methods. With entity weights properly taken into account, SaddleSum is internally consistent and stable with respect to the choice of number of most significant entities selected. Making few assumptions on the input data, the proposed method is universal and can thus be applied to areas beyond analysis of microarrays. Employing asymptotic approximation, SaddleSum provides a term-size dependent score distribution function that gives rise to accurate statistical significance even for terms with few entities. As a consequence, SaddleSum enables researchers to place confidence in its significance assignments to small terms that are often biologically most specific.

Preprint, 2009, arXiv

A simple electrostatic model applicable to biomolecular recognition

An exact, analytic solution for a simple electrostatic model applicable to biomolecular recognition is presented. In the model, a layer of high dielectric constant material (representative of the solvent, water) whose thickness may vary separates two regions of low dielectric constant material (representative of proteins, DNA, RNA, or similar materials), in each of which is embedded a point charge. For identical charges, the presence of the screening layer always lowers the energy compared to the case of point charges in an infinite medium of low dielectric constant. Somewhat surprisingly, the presence of a sufficiently thick screening layer also lowers the energy compared to the case of point charges in an infinite medium of high dielectric constant. For charges of opposite sign, the screening layer always lowers the energy compared to the case of point charges in an infinite medium of either high or low dielectric constant. The behavior of the energy leads to a substantially increased repulsive force between charges of the same sign. The attractive force between charges of opposite signs is weaker than in an infinite medium of low dielectric constant material but stronger than in an infinite medium of high dielectric constant material. The presence of this behavior, which we name asymmetric screening, in the simple system presented here confirms the generality of the behavior that was established in a more complicated system of an arbitrary number of charged dielectric spheres in an infinite solvent.

Preprint, 2009, arXiv

ITM Probe: analyzing information flow in protein networks

Summary: Founded upon diffusion with damping, ITM Probe is an application for modeling information flow in protein interaction networks without prior restriction to the sub-network of interest. Given a context consisting of desired origins and destinations of information, ITM Probe returns the set of most relevant proteins with weights and a graphical representation of the corresponding sub-network. With a click, the user may send the resulting protein list for enrichment analysis to facilitate hypothesis formation or confirmation. Availability: ITM Probe web service and documentation can be found at www.ncbi.nlm.nih.gov/CBBresearch/qmbp/mn/itm_probe

Preprint, 2001, arXiv

On the Anti-Wishart distribution

We provide the probability distribution function of matrix elements each of which is the inner product of two vectors. The vectors we are considering here are independently distributed but not necessarily Gaussian variables. When the number of components M of each vector is greater than the number of vectors N, one has an $N\times N$ symmetric matrix. When $M\ge N$ and the components of each vector are independent Gaussian variables, the distribution function of the $N(N+1)/2$ matrix elements was obtained by Wishart in 1928. When N > M, which we call the "Anti-Wishart" case, the matrix elements are no longer completely independent because the true number of degrees of freedom becomes smaller than the number of matrix elements. Due to this singular nature, analytical derivation of the probability distribution function is much more involved than in the corresponding Wishart case. For a class of general random vectors, we obtain the analytical distribution function in a closed form, which is a product of various factors and delta function constraints, composed of various determinants. The distribution function of the matrix elements for the $M\ge N$ case with the same class of random vectors is also obtained as a by-product. Our result is closely related to and should be valuable for the study of the random magnet problem and the information redundancy problem.
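The singular nature of the N > M case can be seen numerically. The sketch below is only an illustration with hypothetical helper names: it builds the matrix of inner products for N = 3 random vectors with M = 2 components each and checks that its determinant vanishes up to rounding, which is one of the constraints that ties the matrix elements together.

```python
import random


def gram_matrix(vectors):
    """Matrix of pairwise inner products of the given vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


random.seed(0)
M, N = 2, 3  # fewer components than vectors: the "Anti-Wishart" regime
vectors = [[random.gauss(0.0, 1.0) for _ in range(M)] for _ in range(N)]
G = gram_matrix(vectors)
print(det3(G))  # zero up to floating-point rounding, since rank(G) <= M < N
```

Because three vectors in a two-dimensional space are always linearly dependent, the 3 x 3 matrix has rank at most 2 regardless of how the components are distributed.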