Source author record

Yan Fu

Yan Fu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: Social and Information Networks; physics.soc-ph; Cryptography and Security; Information Retrieval; Machine Learning; Applications; Artificial Intelligence; Cell Behavior; Computation and Language; cond-mat.stat-mech; Databases; Digital Libraries; Distributed, Parallel, and Cluster Computing; math.ST; Methodology; Molecular Networks; Quantitative Methods; Statistics Theory

Catalog footprint

What is connected

17 works

18 topics

4 close collaborators

preprint, 2022, arXiv

Identifying critical nodes in complex networks by graph representation learning

Because of its wide application, critical node identification has become an important research topic at the micro level of network science. Influence maximization is one of the main problems in critical node mining and is usually handled with heuristics. In this paper, a deep graph learning framework, IMGNN, is proposed and the corresponding training sample generation scheme is designed. The framework takes the centralities of nodes in a network as input and outputs the probability that each node belongs to the optimal set of initial spreaders. Trained on a large number of small synthetic networks, IMGNN is more efficient than human-based heuristics in minimizing the size of initial spreaders under a fixed infection scale. The experimental results on one synthetic and five real networks show that, compared with traditional non-iterative node ranking algorithms, IMGNN has the smallest proportion of initial spreaders under different infection probabilities when the final infection scale is fixed. The reordered version of IMGNN outperforms all the latest critical node mining algorithms.

preprint, 2022, arXiv

Local False Discovery Rate Estimation with Competition-Based Procedures for Variable Selection

Multiple hypothesis testing has been widely applied to problems dealing with high-dimensional data, e.g., selecting significant variables and controlling the selection error rate. The most prevalent measure of error rate used in multiple hypothesis testing is the false discovery rate (FDR). In recent years, the local false discovery rate (fdr) has drawn much attention, due to its advantage of assessing the confidence of individual hypotheses. However, most methods estimate fdr through p-values or statistics with known null distributions, which are sometimes not available or reliable. Adopting the innovative methodology of competition-based procedures, e.g., the knockoff filter, this paper proposes a new approach, named TDfdr, to local false discovery rate estimation, which is free of p-values or known null distributions. Simulation results demonstrate that TDfdr can accurately estimate the fdr with two competition-based procedures. In real data analysis, the power of TDfdr on variable selection is verified on two biological datasets.

preprint, 2022, arXiv

Variational Model Inversion Attacks

Given the ubiquity of deep neural networks, it is important that these models do not reveal information about sensitive data that they have been trained on. In model inversion attacks, a malicious user attempts to recover the private dataset used to train a supervised neural network. A successful model inversion attack should generate realistic and diverse samples that accurately describe each of the classes in the private dataset. In this work, we provide a probabilistic interpretation of model inversion attacks, and formulate a variational objective that accounts for both diversity and accuracy. In order to optimize this variational objective, we choose a variational family defined in the code space of a deep generative model, trained on a public auxiliary dataset that shares some structural similarity with the target dataset. Empirically, our method substantially improves performance in terms of target attack accuracy, sample realism, and diversity on datasets of faces and chest X-ray images.

preprint, 2020, arXiv

Enhancing Answer Boundary Detection for Multilingual Machine Reading Comprehension

Multilingual pre-trained models could leverage the training data from a rich source language (such as English) to improve performance on low-resource languages. However, the transfer quality for multilingual Machine Reading Comprehension (MRC) is significantly worse than for sentence classification tasks, mainly because MRC requires detecting the word-level answer boundary. In this paper, we propose two auxiliary tasks in the fine-tuning stage to create additional phrase boundary supervision: (1) a mixed MRC task, which translates the question or passage to other languages and builds cross-lingual question-passage pairs; (2) a language-agnostic knowledge masking task that leverages knowledge phrases mined from the web. Extensive experiments on two cross-lingual MRC datasets show the effectiveness of our proposed approach.

preprint, 2020, arXiv

The MUIR Framework: Cross-Linking MOOC Resources to Enhance Discussion Forums

New learning resources are created and minted in Massive Open Online Courses every week -- new videos, quizzes, assessments and discussion threads are deployed and interacted with -- in the era of on-demand online learning. However, these resources are often artificially siloed between platforms and artificial web application models. Linking such resources facilitates learning and multimodal understanding, improving learners' experience. We create a framework for MOOC Uniform Identifier for Resources (MUIR). MUIR enables applications to refer to and link to such resources in a cross-platform way, allowing the easy minting of identifiers to MOOC resources, akin to #hashtags. We demonstrate the feasibility of this approach to the automatic identification, linking and resolution -- a task known as Wikification -- of learning resources mentioned on MOOC discussion forums, from a harvested collection of 100K+ resources. Our Wikification system achieves a high initial rate of 54.6% successful resolutions on key resource mentions found in discussion forums, demonstrating the utility of the MUIR framework. Our analysis of this new problem shows that context is a key factor in determining the correct resolution of such mentions.

preprint, 2015, arXiv

A theoretical foundation of the target-decoy search strategy for false discovery rate control in proteomics

Motivation: Target-decoy search (TDS) is currently the most popular strategy for estimating and controlling the false discovery rate (FDR) of peptide identifications in mass spectrometry-based shotgun proteomics. While this strategy is very useful in practice and has been intensively studied empirically, its theoretical foundation has not yet been well established. Result: In this work, we systematically analyze the TDS strategy in a rigorous statistical sense. We prove that the commonly used concatenated TDS provides a conservative estimate of the FDR for any given score threshold, but it cannot rigorously control the FDR. We prove that with a slight modification to the commonly used formula for FDR estimation, the peptide-level FDR can be rigorously controlled based on the concatenated TDS. We show that the spectrum-level FDR control is difficult. We verify the theoretical conclusions with real mass spectrometry data.
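The estimate discussed in this abstract can be illustrated with a minimal sketch. The abstract does not state the modified formula, so the "+1" correction below is an assumption (a commonly used conservative form), and the function name, scores and decoy flags are hypothetical illustration data.

```python
def tds_fdr_estimates(scores, is_decoy, threshold):
    """Return (standard, corrected) FDR estimates for a concatenated search.

    standard  = decoys / targets          (commonly used estimate)
    corrected = (decoys + 1) / targets    (assumed form of the modification)
    """
    targets = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and not d)
    decoys = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and d)
    if targets == 0:
        # No target accepted: reported as 0 by convention in this sketch.
        return 0.0, 0.0
    standard = decoys / targets
    corrected = min(1.0, (decoys + 1) / targets)
    return standard, corrected


# Hypothetical identifications: score and whether the match is a decoy.
scores = [9.1, 8.7, 8.2, 7.9, 7.5, 6.8, 6.1, 5.5]
is_decoy = [False, False, False, True, False, False, True, True]
standard, corrected = tds_fdr_estimates(scores, is_decoy, threshold=6.0)
```

At threshold 6.0 the sketch accepts five targets and two decoys, so the two estimates are 2/5 and 3/5.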

preprint, 2014, arXiv

Efficient allocation of heterogeneous response times in information spreading process

Recently, the impacts of spatiotemporal heterogeneities of human activities on spreading dynamics have attracted extensive attention. In this paper, to study the effect of heterogeneous response times on information spreading, we focus on the susceptible-infected spreading dynamics with an adjustable power-law response time distribution based on uncorrelated scale-free networks. We find that the stronger the heterogeneity of response times is, the faster the information spreading is in the early and middle stages. For a given heterogeneity, reducing the correlation between the response times and degrees of individuals can also accelerate the spreading dynamics in the early and middle stages. However, the dynamics in the late stage is slightly more complicated, and there is an optimal value of the full prevalence time changing with the heterogeneity of response times and the response time-degree correlation, respectively. This optimal phenomenon results from the efficient allocation of heterogeneous response times.

preprint, 2014, arXiv

Information Filtering on Coupled Social Networks

In this paper, based on the coupled social networks (CSN), we propose a hybrid algorithm to nonlinearly integrate both social and behavior information of online users. The filtering algorithm is based on the coupled social networks and considers the effects of both social influence and personalized preference. Experimental results on two real datasets, Epinions and Friendfeed, show that the hybrid pattern not only provides more accurate recommendations, but also enlarges the recommendation coverage when a global metric is adopted. Further empirical analyses demonstrate that the mutual reinforcement and rich-club phenomenon can also be found in coupled social networks where the identical individuals occupy the core position of the online system. This work may shed some light on the in-depth understanding of the structure and function of coupled social networks.

preprint, 2014, arXiv

Information Filtering via Balanced Diffusion on Bipartite Networks

The recent decade has witnessed the increasing popularity of recommender systems, which help users acquire relevant commodities and services from overwhelming resources on the Internet. Some simple physical diffusion processes have been used to design effective recommendation algorithms for user-object bipartite networks, typically the mass diffusion (MD) and heat conduction (HC) algorithms, which have respective advantages in accuracy and diversity. In this paper, we investigate the effect of weight assignment in the hybrid of MD and HC, and find that a new hybrid algorithm of MD and HC with balanced weights achieves the optimal recommendation results; we name it the balanced diffusion (BD) algorithm. Numerical experiments on three benchmark data sets, MovieLens, Netflix and RateYourMusic (RYM), show that the BD algorithm outperforms the existing diffusion-based methods on three important recommendation metrics: accuracy, diversity and novelty. Specifically, it can not only provide accurate recommendation results, but also yield higher diversity and novelty in recommendations by accurately recommending unpopular objects.
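As a rough illustration of the two diffusion processes named in this abstract, the sketch below scores uncollected objects on a user-object bipartite network with the widely used tunable MD/HC hybrid, where `lam=1` reduces to mass diffusion and `lam=0` to heat conduction. This is not the BD weighting itself, which the abstract does not specify; the function name, the `lam` parameter and the toy data are assumptions.

```python
from collections import defaultdict


def hybrid_scores(user_items, target, lam=0.5):
    """Score objects for `target`; lam=1 is mass diffusion, lam=0 heat conduction.

    Uses W[alpha][beta] = (1 / (k_alpha**(1-lam) * k_beta**lam))
                          * sum_i a[i][alpha] * a[i][beta] / k_i
    with unit initial resource on every object the target has collected.
    """
    # Object degree: number of users who collected each object.
    item_deg = defaultdict(int)
    for items in user_items.values():
        for item in items:
            item_deg[item] += 1

    scores = defaultdict(float)
    for beta in user_items[target]:
        for items in user_items.values():
            if beta not in items:
                continue
            share = 1.0 / len(items)  # a[i][beta] / k_i for this user
            for alpha in items:
                scores[alpha] += share / (
                    item_deg[alpha] ** (1 - lam) * item_deg[beta] ** lam
                )
    # Recommend only objects the target has not collected yet.
    return {a: s for a, s in scores.items() if a not in user_items[target]}


# Toy data (hypothetical): object "d" is the least popular one.
user_items = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"b", "c", "d"}}
md = hybrid_scores(user_items, "u1", lam=1.0)
hc = hybrid_scores(user_items, "u1", lam=0.0)
```

On this toy network heat conduction assigns the unpopular object "d" a higher score than mass diffusion does, which matches the diversity-versus-accuracy trade-off the abstract describes.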

preprint, 2014, arXiv

Local degree blocking model for link prediction in complex networks

Recovering and reconstructing networks by accurately identifying missing and unreliable links is a vital task in the domain of network analysis and mining. In this article, by studying a specific local structure, namely a degree block consisting of a node and all its immediate neighbors, we find it contains important statistical features of link formation for complex networks. We therefore propose a parameter-free local blocking (LB) predictor to quantitatively detect link formation in given networks via local link density calculations. The promising experimental results on six real-world networks suggest that the new index can outperform other traditional local similarity-based methods on most of the tested networks. After further analyzing the scores' correlations between LB and two other methods, we find that the features of the LB index are analogous to those of both the PA index and the short-path-based index, which empirically verifies that the large degree principle and the short path principle, simultaneously captured by the LB index, are jointly driving link formation in complex networks.

preprint, 2013, arXiv

Discovery of Proteomics based on Machine learning

The ultimate goal of proteomic identification is to identify and quantify the proteins in an organism. Mass spectrometry (MS)-based label-free protein quantitation has mainly focused on the analysis of peptide spectral counts and ion peak heights. Several observed (proteotypic) peptides can be used to identify the protein of origin. However, each peptide's probability of being detected is strongly influenced by its physicochemical properties, which confounds the results of MS accounting. Using about a million peptide identifications generated by four different kinds of proteomic platforms, we successfully identified >16,000 proteotypic peptides. We used machine learning classification to derive peptide detection probabilities that are used to predict the number of tryptic peptides to be observed, which can serve to estimate the absolute abundance of proteins with high accuracy. We used peptide data (provided by CAS lab) to derive the best model among different kinds of methods. We first employed SVM and Random Forest classifiers to identify the proteotypic and unobserved peptides, and then searched for the best parameters for better prediction results. Given the excellent performance of our model, we can calculate an absolute estimate of protein abundance.

preprint, 2013, arXiv

Privacy Preserving Social Network Publication Against Mutual Friend Attacks

Publishing social network data for research purposes has raised serious concerns for individual privacy. There exist many privacy-preserving works that can deal with different attack models. In this paper, we introduce a novel privacy attack model and refer to it as a mutual friend attack. In this model, the adversary can re-identify a pair of friends by using their number of mutual friends. To address this issue, we propose a new anonymity concept, called k-NMF anonymity, i.e., k-anonymity on the number of mutual friends, which ensures that there exist at least k-1 other friend pairs in the graph that share the same number of mutual friends. We devise algorithms to achieve k-NMF anonymity while preserving the original vertex set in the sense that we allow the occasional addition but no deletion of vertices. Further, we give an algorithm to ensure k-degree anonymity in addition to k-NMF anonymity. The experimental results on real-world datasets demonstrate that our approach can preserve the privacy and utility of social networks effectively against mutual friend attacks.
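The k-NMF condition as worded in this abstract can be checked directly: count the mutual friends of every friend pair and require each count value to be shared by at least k pairs. The sketch below only verifies the property; it is not the paper's anonymization algorithm, and the function names and toy graphs are hypothetical.

```python
from collections import Counter


def nmf_counts(adj):
    """Map each undirected edge (u, v) to its number of mutual friends."""
    counts = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                counts[(u, v)] = len(adj[u] & adj[v])
    return counts


def is_k_nmf_anonymous(adj, k):
    """True if every mutual-friend count is shared by at least k friend pairs."""
    histogram = Counter(nmf_counts(adj).values())
    return all(size >= k for size in histogram.values())


# Triangle 1-2-3 with a pendant vertex 4 attached to 3.
triangle_with_tail = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
# A 4-cycle: no edge has any mutual friend.
square = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
```

In the first graph the edge (3, 4) is the only pair with zero mutual friends, so an adversary who knows that count can single it out and the graph is not 2-NMF anonymous; in the 4-cycle all four edges share the same count.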

preprint, 2013, arXiv

Scaling behavior of online human activity

The rapid development of Internet technology enables humans to explore the web and record the traces of online activities. From the analysis of these large-scale data sets (i.e. traces), we can gain insights into the dynamic behavior of human activity. In this letter, the scaling behavior and complexity of human activity in e-commerce, such as music, book, and movie rating, are comprehensively investigated by using the detrended fluctuation analysis technique and the multiscale entropy method. Firstly, the interevent time series of rating behaviors for these three types of media show a similar scaling property with exponents ranging from 0.53 to 0.58, which implies that the collective behaviors of rating media follow a process embodying self-similarity and long-range correlation. Meanwhile, by dividing the users into three groups based on their activities (i.e., ratings per unit time), we find that the scaling exponents of interevent time series in the three groups are different. Hence, these results suggest that stronger long-range correlations exist in these collective behaviors. Furthermore, their information complexities vary across the three groups. To explain the differences of the collective behaviors restricted to the three groups, we study the dynamic behavior of human activity at the individual level, and find that the dynamic behaviors of a few users have extremely small scaling exponents associated with long-range anticorrelations. By comparing the interevent time distributions of four representative users, we find that the bimodal distributions may bring about the extraordinary scaling behaviors. These results of analyzing online human activity in e-commerce may not only provide insights to understand its dynamic behaviors but also be applied to acquire potential economic interest.

preprint, 2012, arXiv

Emergence of scale-free close-knit friendship structure in online social networks

Although the structural properties of online social networks have attracted much attention, the properties of the close-knit friendship structures remain an important question. Here, we mainly focus on how these mesoscale structures are affected by the local and global structural properties. Analyzing the data of four large-scale online social networks reveals several common structural properties. It is found that not only do the local structures given by the indegree, outdegree, and reciprocal degree distributions follow a similar scaling behavior, but the mesoscale structures represented by the distributions of close-knit friendship structures also exhibit a similar scaling law. The degree correlation is very weak over a wide range of the degrees. We propose a simple directed network model that captures the observed properties. The model incorporates two mechanisms: reciprocation and preferential attachment. Through rate equation analysis of our model, the local-scale and mesoscale structural properties are derived. At the local scale, the same scaling behavior of indegree and outdegree distributions stems from the indegree and outdegree of nodes both growing as the same function of the introduction time, and the reciprocal degree distribution also shows the same power-law due to the linear relationship between the reciprocal degree and in/outdegree of nodes. At the mesoscale, the distributions of four closed triples representing close-knit friendship structures are found to exhibit identical power-laws, a behavior attributed to the negligible degree correlations. Intriguingly, all the power-law exponents of the distributions at the local scale and mesoscale depend only on one global parameter -- the mean in/outdegree, while both the mean in/outdegree and the reciprocity together determine the ratio of the reciprocal degree of a node to its in/outdegree.

preprint, 2012, arXiv

Slow dynamics of Zero Range Process in the Framework of Traps Model

The relaxation dynamics of the zero range process (ZRP) has always been an interesting problem. In this study, we set up the relationship between ZRP and the traps model, and investigate the slow dynamics of ZRP in the framework of the traps model. Through statistical quantities such as the average rest time, the particle distribution, the two-time correlation function and the average escape time, we find that the particle interaction, especially the resulting condensation, can significantly influence the dynamics. In the stationary state, both the average rest time and the average escape time caused by the attraction among particles are obtained analytically. In the transient state, a hierarchical nature of the aging dynamics is revealed by both simulations and scaling analysis. Moreover, by comparing the particle diffusion in both the transient state and the stationary state, we find that the closer ZRP systems approach the stationary state, the more slowly particles diffuse.

preprint, 2011, arXiv

Hamiltonian Connectivity of Twisted Hypercube-Like Networks under the Large Fault Model

Twisted hypercube-like networks (THLNs) are an important class of interconnection networks for parallel computing systems, which include most popular variants of the hypercubes, such as crossed cubes, Möbius cubes, twisted cubes and locally twisted cubes. This paper deals with the fault-tolerant hamiltonian connectivity of THLNs under the large fault model. Let $G$ be an $n$-dimensional THLN and $F \subseteq V(G)\bigcup E(G)$, where $n \geq 7$ and $|F| \leq 2n - 10$. We prove that for any two nodes $u,v \in V(G - F)$ satisfying a simple necessary condition on neighbors of $u$ and $v$, there exists a hamiltonian or near-hamiltonian path between $u$ and $v$ in $G-F$. The result extends further the fault-tolerant graph embedding capability of THLNs.

preprint, 2010, arXiv

Resonant activation: a strategy against bacterial persistence

A bacterial colony may develop a small number of cells genetically identical to, but phenotypically different from, other normally growing bacteria. These so-called persister cells keep themselves in a dormant state and thus are insensitive to antibiotic treatment, resulting in serious problems of drug resistance. In this paper, we proposed a novel strategy to "kill" persister cells by triggering them to switch, in a fast and synchronized way, into normally growing cells that are susceptible to antibiotics. The strategy is based on resonant activation (RA), a well-studied phenomenon in physics where the internal noise of a system can constructively facilitate fast and synchronized barrier crossings. Through stochastic Gillespie simulation with a generic toggle switch model, we demonstrated that RA exists in the phenotypic switching of a single bacterium. Further, by coupling single cell level and population level simulations, we showed that with RA, one can greatly reduce the time and total amount of antibiotics needed to sterilize a bacterial population. We suggest that resonant activation is a general phenomenon in phenotypic transition, and can find other applications such as cancer therapy.
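The simulation method named in this abstract can be sketched with a minimal Gillespie loop for a two-gene toggle switch in which each protein represses the production of the other. The rate law, the parameter values (`alpha`, `hill`) and the function name are assumptions for illustration; the sketch shows only the stochastic stepping, not the resonant activation protocol itself.

```python
import random


def gillespie_toggle(t_end, alpha=10.0, hill=2, seed=0):
    """Simulate protein counts (a, b) of a mutual-repression toggle switch."""
    rng = random.Random(seed)
    t, a, b = 0.0, 0, 0
    trajectory = [(t, a, b)]
    while True:
        rates = [
            alpha / (1 + b ** hill),  # production of A, repressed by B
            alpha / (1 + a ** hill),  # production of B, repressed by A
            float(a),                 # degradation of A (unit rate per molecule)
            float(b),                 # degradation of B (unit rate per molecule)
        ]
        total = sum(rates)
        t += rng.expovariate(total)   # exponential waiting time to next reaction
        if t > t_end:
            break
        pick = rng.random() * total   # choose a reaction in proportion to its rate
        if pick < rates[0]:
            a += 1
        elif pick < rates[0] + rates[1]:
            b += 1
        elif pick < rates[0] + rates[1] + rates[2]:
            a -= 1
        elif b > 0:                   # guard against a rounding edge case
            b -= 1
        trajectory.append((t, a, b))
    return trajectory


trajectory = gillespie_toggle(t_end=50.0)
```

Each entry of `trajectory` is a `(time, a, b)` tuple; a run that settles with one protein dominant and the other near zero corresponds to one of the two phenotypic states of the switch.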
