Researcher profile

Yan Fu

Yan Fu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2022arXiv

Identifying critical nodes in complex networks by graph representation learning

Because of its wide application, critical nodes identification has become an important research topic at the micro level of network science. Influence maximization is one of the main problems in critical nodes mining and is usually handled with heuristics. In this paper, a deep graph learning framework IMGNN is proposed and the corresponding training sample generation scheme is designed. The framework takes centralities of nodes in a network as input and the probability that nodes in the optimal initial spreaders as output. By training on a large number of small synthetic networks, IMGNN is more efficient than human-based heuristics in minimizing the size of initial spreaders under the fixed infection scale. The experimental results on one synthetic and five real networks show that, compared with traditional non-iterative node ranking algorithms, IMGNN has the smallest proportion of initial spreaders under different infection probabilities when the final infection scale is fixed. And the reordered version of IMGNN outperforms all the latest critical nodes mining algorithms.

preprint2022arXiv

Local False Discovery Rate Estimation with Competition-Based Procedures for Variable Selection

Multiple hypothesis testing has been widely applied to problems dealing with high-dimensional data, e.g., selecting significant variables and controlling the selection error rate. The most prevailing measure of error rate used in the multiple hypothesis testing is the false discovery rate (FDR). In recent years, local false discovery rate (fdr) has drawn much attention, due to its advantage of accessing the confidence of individual hypothesis. However, most methods estimate fdr through p-values or statistics with known null distributions, which are sometimes not available or reliable. Adopting the innovative methodology of competition-based procedures, e.g., knockoff filter, this paper proposes a new approach, named TDfdr, to local false discovery rate estimation, which is free of the p-values or known null distributions. Simulation results demonstrate that TDfdr can accurately estimate the fdr with two competition-based procedures. In real data analysis, the power of TDfdr on variable selection is verified on two biological datasets.

preprint2022arXiv

Variational Model Inversion Attacks

Given the ubiquity of deep neural networks, it is important that these models do not reveal information about sensitive data that they have been trained on. In model inversion attacks, a malicious user attempts to recover the private dataset used to train a supervised neural network. A successful model inversion attack should generate realistic and diverse samples that accurately describe each of the classes in the private dataset. In this work, we provide a probabilistic interpretation of model inversion attacks, and formulate a variational objective that accounts for both diversity and accuracy. In order to optimize this variational objective, we choose a variational family defined in the code space of a deep generative model, trained on a public auxiliary dataset that shares some structural similarity with the target dataset. Empirically, our method substantially improves performance in terms of target attack accuracy, sample realism, and diversity on datasets of faces and chest X-ray images.

preprint2020arXiv

Enhancing Answer Boundary Detection for Multilingual Machine Reading Comprehension

Multilingual pre-trained models could leverage the training data from a rich source language (such as English) to improve performance on low resource languages. However, the transfer quality for multilingual Machine Reading Comprehension (MRC) is significantly worse than sentence classification tasks mainly due to the requirement of MRC to detect the word level answer boundary. In this paper, we propose two auxiliary tasks in the fine-tuning stage to create additional phrase boundary supervision: (1) A mixed MRC task, which translates the question or passage to other languages and builds cross-lingual question-passage pairs; (2) A language-agnostic knowledge masking task by leveraging knowledge phrases mined from web. Besides, extensive experiments on two cross-lingual MRC datasets show the effectiveness of our proposed approach.

preprint2020arXiv

The MUIR Framework: Cross-Linking MOOC Resources to Enhance Discussion Forums

New learning resources are created and minted in Massive Open Online Courses every week -- new videos, quizzes, assessments and discussion threads are deployed and interacted with -- in the era of on-demand online learning. However, these resources are often artificially siloed between platforms and artificial web application models. Facilitating the linking between such resources facilitates learning and multimodal understanding, bettering learners' experience. We create a framework for MOOC Uniform Identifier for Resources (MUIR). MUIR enables applications to refer and link to such resources in a cross-platform way, allowing the easy minting of identifiers to MOOC resources, akin to #hashtags. We demonstrate the feasibility of this approach to the automatic identification, linking and resolution -- a task known as Wikification -- of learning resources mentioned on MOOC discussion forums, from a harvested collection of 100K+ resources. Our Wikification system achieves a high initial rate of 54.6% successful resolutions on key resource mentions found in discussion forums, demonstrating the utility of the MUIR framework. Our analysis on this new problem shows that context is a key factor in determining the correct resolution of such mentions.