Researcher profile

Ying Fan

Ying Fan contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
13 works
0 followers
12 topics
4 close collaborators

Published work

13 published items

Preprint · 2022 · arXiv

Academic mentees succeed in big groups, but thrive in small groups

Mentoring is a key component of scientific achievement, contributing to overall measures of career success for both mentees and mentors. A common success metric in the scientific enterprise is acquiring a large research group, which is believed to indicate excellent mentorship and high-quality research. However, large, competitive groups might also amplify dropout rates, which are especially high among early-career researchers. Here, we collect longitudinal genealogical data on mentor-mentee relations and their publications, and study the effects of a mentor's group on the future academic survival and performance of their mentees. We find that mentees trained in large groups generally have better academic performance than mentees from small groups, if they continue working in academia after graduation. However, we also find two surprising results: academic survival rate is significantly lower for (1) mentees from larger groups and (2) mentees with more productive mentors. These findings reveal that the success of mentors has a negative effect on the academic survival rate of mentees, raising important questions about the definition of successful mentorship and providing actionable suggestions concerning career development.

Preprint · 2022 · arXiv

Impactful scientists have higher tendency to involve collaborators in new topics

In scientific research, collaboration is one of the most effective ways to take advantage of new ideas, skills and resources, and to perform interdisciplinary research. Although collaboration networks have been intensively studied, the question of how individual scientists choose collaborators to study a new research topic remains almost unexplored. Here, we investigate the statistics and mechanisms of the collaborations of individual scientists over their careers, revealing that, in general, collaborators are involved in significantly fewer topics than expected from a controlled surrogate. In particular, we find that highly productive scientists tend to have a higher fraction of single-topic collaborators, while highly cited, i.e., impactful, scientists have a higher fraction of multi-topic collaborators. We also suggest a plausible mechanism for this distinction. Moreover, we investigate the cases where scientists involve existing collaborators in a new topic. We find that, compared to productive scientists, impactful scientists show a strong preference for collaborating with high-impact scientists on a new topic. Finally, we validate our findings by investigating active scientists in different years and across different disciplines.
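
The single-topic versus multi-topic distinction above can be made concrete with a small sketch. The input format (a list of papers, each with one topic label and an author list) and the function names are hypothetical, and the paper's own topic assignment and surrogate comparison are not reproduced here.

```python
from collections import defaultdict


def collaborator_topics(papers, scientist):
    """Map each collaborator of `scientist` to the set of topics they share.

    `papers` is a hypothetical list of (topic, authors) tuples.
    """
    topics = defaultdict(set)
    for topic, authors in papers:
        if scientist in authors:
            for other in authors:
                if other != scientist:
                    topics[other].add(topic)
    return topics


def single_topic_fraction(papers, scientist):
    """Fraction of collaborators who co-author with `scientist` in exactly one topic."""
    topics = collaborator_topics(papers, scientist)
    if not topics:
        return 0.0
    single = sum(1 for shared in topics.values() if len(shared) == 1)
    return single / len(topics)


papers = [
    ("networks", ["A", "B"]),
    ("networks", ["A", "C"]),
    ("mentorship", ["A", "C"]),
    ("ood", ["A", "D"]),
]
print(single_topic_fraction(papers, "A"))  # B and D are single-topic, C is multi-topic
```

Comparing this fraction between highly productive and highly cited scientists is the kind of contrast the abstract reports.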

Preprint · 2022 · arXiv

POEM: Out-of-Distribution Detection with Posterior Sampling

Out-of-distribution (OOD) detection is indispensable for machine learning models deployed in the open world. Recently, the use of an auxiliary outlier dataset during training (also known as outlier exposure) has shown promising performance. As the sample space for potential OOD data can be prohibitively large, sampling informative outliers is essential. In this work, we propose a novel posterior sampling-based outlier mining framework, POEM, which facilitates efficient use of outlier data and promotes learning a compact decision boundary between in-distribution (ID) and OOD data for improved detection. We show that POEM establishes state-of-the-art performance on common benchmarks. Compared to the current best method that uses a greedy sampling strategy, POEM improves the relative performance by 42.0% and 24.2% (FPR95) on CIFAR-10 and CIFAR-100, respectively. We further provide theoretical insights into the effectiveness of POEM for OOD detection.
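
FPR95, the metric quoted above, is the false-positive rate on OOD samples when the detection threshold is set so that 95% of in-distribution samples are accepted. A minimal sketch, assuming the convention that a higher score means "more in-distribution" (conventions differ between papers); it shows the metric only, not POEM's posterior-sampling procedure, and the function name is illustrative.

```python
import numpy as np


def fpr_at_95_tpr(id_scores, ood_scores):
    """False-positive rate on OOD samples at 95% true-positive rate on ID samples.

    Assumes higher scores mean "more in-distribution": a sample is accepted
    as ID when its score is >= the threshold.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Threshold at the 5th percentile of ID scores, so about 95% of ID samples pass.
    threshold = np.percentile(id_scores, 5)
    # Fraction of OOD samples wrongly accepted as ID.
    return float(np.mean(ood_scores >= threshold))


rng = np.random.default_rng(0)
id_s = rng.normal(loc=2.0, size=1000)   # hypothetical detector scores on ID data
ood_s = rng.normal(loc=0.0, size=1000)  # hypothetical detector scores on OOD data
print(fpr_at_95_tpr(id_s, ood_s))
```

Lower is better; a relative improvement in FPR95 means this fraction shrinks.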

Preprint · 2020 · arXiv

A Hyperbolic Embedding Model for Directed Networks

Network embedding is an active topic in current network science. It builds on the observation that many real complex systems can be embedded in a hidden metric space and exhibit geometric properties, where the geometric distance between nodes determines the likelihood that they are linked. Among such spaces, hyperbolic space has received extensive attention because it is associated with the structural organization of many real complex systems. However, most recently developed methods and measurements give little consideration to the asymmetry of links. Here, we discuss how to multiplex node information as an embedding foundation by identifying the bipartite structure of directed networks, and we propose a general mapping framework that combines the topological structure of complex networks, directed links and the hidden metric space. By splitting the different properties of a node, the probabilities of links between different types of nodes can be modeled. We also apply this model to several real systems, including international trade networks and the C. elegans neural network. The results confirm that directed networks can also be mapped into a metric space, and that network embedding information can broaden the scope of application of existing models.
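
One way to read the node-splitting idea is to give each node separate "out" and "in" coordinates in the hyperbolic plane and let the probability of a link from u to v depend only on u's out-coordinate and v's in-coordinate. The sketch below assumes native polar coordinates in the hyperbolic plane of curvature -1 and the Fermi-Dirac connection probability common in hyperbolic network models; the function names, `radius` and `temperature` are illustrative, and this is not necessarily the paper's exact model.

```python
import math


def hyperbolic_distance(r1, theta1, r2, theta2):
    """Distance in the hyperbolic plane (curvature -1), native polar coordinates."""
    dtheta = math.pi - abs(math.pi - abs(theta1 - theta2))  # angular gap in [0, pi]
    arg = math.cosh(r1) * math.cosh(r2) - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta)
    return math.acosh(max(arg, 1.0))  # clamp rounding error below 1


def link_probability(distance, radius, temperature):
    """Fermi-Dirac connection probability used in many hyperbolic network models."""
    return 1.0 / (1.0 + math.exp((distance - radius) / (2.0 * temperature)))


def directed_link_probability(out_coords, in_coords, source, target, radius, temperature):
    """P(source -> target) from the source's out-copy and the target's in-copy.

    Splitting each node into an out-copy and an in-copy (a bipartite view of the
    directed network) makes the probability asymmetric. Hypothetical layout:
    coords[node] = (r, theta).
    """
    r1, t1 = out_coords[source]
    r2, t2 = in_coords[target]
    return link_probability(hyperbolic_distance(r1, t1, r2, t2), radius, temperature)


out_coords = {"a": (1.0, 0.0), "b": (3.0, 2.5)}
in_coords = {"a": (3.0, 0.5), "b": (1.0, 0.2)}
print(directed_link_probability(out_coords, in_coords, "a", "b", radius=4.0, temperature=0.5))
print(directed_link_probability(out_coords, in_coords, "b", "a", radius=4.0, temperature=0.5))
```

Because the two copies of a node sit at different positions, a link from a to b can be far more likely than a link from b to a.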

Preprint · 2020 · arXiv

Prediction Model Based on Integrated Political Economy System: The Case of US Presidential Election

This paper studies the integration of the political and economic systems from a systems perspective to explore the complex interaction between them, and specifically analyzes the case of forecasting the US presidential election. Based on signed association networks of industrial structure constructed from economic data, our framework simulates the diffusion and evolution of opinions during the election through a kinetic model, the Potts model. We propose a simple and efficient prediction model for the US presidential election, which also suggests a new way to model economic structure. The findings also highlight the close relationship between economic structure and political attitude. Furthermore, the case analysis in terms of network and economy demonstrates the specific features of, and the interaction between, political tendency and industrial structure in a particular period, which is consistent with theories in politics and economics.
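
The abstract does not spell out the update rule, so the following is only a generic sketch of kinetic Potts dynamics on a signed network: each node holds one of `q` opinion states and is updated by heat-bath sampling, where positive couplings favour agreement and negative couplings favour disagreement. The random coupling matrix, `beta` (inverse temperature) and the function name are assumptions for illustration, not the paper's calibrated economic network.

```python
import numpy as np


def heat_bath_sweep(states, coupling, q, beta, rng):
    """One asynchronous heat-bath sweep of a q-state Potts model on a signed network.

    coupling[i, j] > 0 favours equal states of i and j, coupling[i, j] < 0 favours
    different states; the diagonal is assumed to be zero. Updates `states` in place.
    """
    for i in rng.permutation(len(states)):
        # Local field: summed coupling to neighbours currently holding each state.
        field = np.zeros(q)
        for j in np.nonzero(coupling[i])[0]:
            field[states[j]] += coupling[i, j]
        # Boltzmann weights exp(beta * field), shifted for numerical stability.
        logits = beta * field
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        states[i] = rng.choice(q, p=probs)
    return states


rng = np.random.default_rng(42)
n, q = 20, 3
# Hypothetical signed network: random symmetric couplings in {-1, 0, +1}.
upper = np.triu(rng.choice([-1.0, 0.0, 1.0], size=(n, n)), k=1)
coupling = upper + upper.T
states = rng.integers(0, q, size=n)
for _ in range(50):
    heat_bath_sweep(states, coupling, q, beta=1.0, rng=rng)
print(np.bincount(states, minlength=q))  # final opinion counts per state
```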

Preprint · 2020 · arXiv

Search-based User Interest Modeling with Lifelong Sequential Behavior Data for Click-Through Rate Prediction

Rich user behavior data has been proven to be of great value for click-through rate (CTR) prediction tasks, especially in industrial applications such as recommender systems and online advertising. Both industry and academia have paid much attention to this topic and have proposed different approaches to modeling long sequential user behavior data. Among them, the memory-network-based model MIMN, proposed by Alibaba, achieves state-of-the-art (SOTA) results by co-designing the learning algorithm and the serving system. MIMN is the first industrial solution that can model sequential user behavior data with lengths scaling up to 1000. However, MIMN fails to precisely capture user interests given a specific candidate item when the length of the user behavior sequence increases further, say, by 10 times or more. This challenge exists widely in previously proposed approaches. In this paper, we tackle this problem by designing a new modeling paradigm, which we name the Search-based Interest Model (SIM). SIM extracts user interests with two cascaded search units: (i) the General Search Unit (GSU) performs a general search over the raw and arbitrarily long sequential behavior data, using query information from the candidate item, and obtains a Sub user Behavior Sequence (SBS) that is relevant to the candidate item; (ii) the Exact Search Unit (ESU) models the precise relationship between the candidate item and the SBS. This cascaded search paradigm gives SIM a better ability to model lifelong sequential behavior data in terms of both scalability and accuracy. Apart from the learning algorithm, we also share our hands-on experience of implementing SIM in large-scale industrial systems. Since 2019, SIM has been deployed in the display advertising system at Alibaba, bringing a 7.1% CTR lift and a 4.4% RPM lift, which is significant to the business. Now serving the main traffic in our real system, SIM models user behavior data with a maximum length of up to 54000, 54 times the length handled by the previous SOTA.
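
The two cascaded units can be sketched as follows. This is a simplified illustration, not Alibaba's implementation: the GSU here is a category-matching "hard search" that keeps the most recent matching behaviors, and the ESU is plain single-head softmax attention of the candidate embedding over the SBS. Category ids, embedding sizes and function names are hypothetical.

```python
import numpy as np


def general_search_unit(behavior_categories, candidate_category, top_k):
    """Hard-search GSU sketch: indices of the most recent behaviors sharing the
    candidate's category. Behaviors are assumed ordered oldest to newest."""
    matches = [i for i, c in enumerate(behavior_categories) if c == candidate_category]
    return matches[-top_k:] if top_k > 0 else []


def exact_search_unit(sbs_embeddings, candidate_embedding):
    """ESU sketch: single-head softmax attention of the candidate over the SBS."""
    if len(sbs_embeddings) == 0:
        return np.zeros_like(candidate_embedding)
    scores = sbs_embeddings @ candidate_embedding    # relevance of each behavior
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ sbs_embeddings                  # weighted user-interest vector


rng = np.random.default_rng(0)
seq_len, dim = 10_000, 8                             # a long (hypothetical) behavior sequence
categories = rng.integers(0, 50, size=seq_len)
item_embeddings = rng.normal(size=(seq_len, dim))
candidate_cat, candidate_emb = 7, rng.normal(size=dim)

sbs_idx = general_search_unit(categories, candidate_cat, top_k=100)
interest = exact_search_unit(item_embeddings[sbs_idx], candidate_emb)
print(len(sbs_idx), interest.shape)
```

The cheap GSU filter shrinks a very long sequence to a short SBS, so the more expensive attention in the ESU only runs over a few relevant behaviors.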

Preprint · 2020 · arXiv

The critical role of fresh teams in creating original and multi-disciplinary research

Teamwork is one of the most prominent features of modern science. It is now well understood that team size is an important factor affecting team creativity. However, the crucial question of how the character of research studies is influenced by the freshness of the team remains unclear. In this paper, we quantify team freshness according to the absence of prior collaboration among team members. Our results suggest that fresher teams tend to produce works of higher originality and more multi-disciplinary impact. These effects are even magnified in larger teams. Furthermore, we find that freshness defined by new team members in a paper is a more effective indicator of research originality and multi-disciplinarity than freshness defined by new collaboration relations among team members. Finally, we show that the career freshness of members also plays an important role in increasing the originality and multi-disciplinarity of the papers produced.
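
The two freshness definitions contrasted above can be illustrated with a short sketch. The exact formulas are assumptions: relation freshness is taken as the share of member pairs with no prior co-authorship, and member freshness as the share of members who have never co-authored with any other member of the team. The function names and data are hypothetical.

```python
from itertools import combinations


def relation_freshness(team, prior_pairs):
    """Share of member pairs that had never co-authored before (assumed definition)."""
    pairs = list(combinations(team, 2))
    if not pairs:
        return 0.0
    new_pairs = sum(1 for a, b in pairs if frozenset((a, b)) not in prior_pairs)
    return new_pairs / len(pairs)


def member_freshness(team, prior_pairs):
    """Share of members with no prior co-authorship with any teammate (assumed definition).

    A one-person team counts as fully fresh under this definition.
    """
    if not team:
        return 0.0
    fresh = sum(
        1
        for a in team
        if all(frozenset((a, b)) not in prior_pairs for b in team if b != a)
    )
    return fresh / len(team)


# Hypothetical history: A and B have co-authored before; C is new to both.
prior = {frozenset(("A", "B"))}
team = ["A", "B", "C"]
print(relation_freshness(team, prior), member_freshness(team, prior))
```

The same team can look fairly fresh by relations (two of three pairs are new) but less fresh by members (only C is new), which is why the two definitions can behave differently as indicators.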

Preprint · 2010 · arXiv

Comment on "Dynamics and Directionality in Complex Networks"

The authors of Phys. Rev. Lett. 103, 228702 (2009) claim that "The residual degree gradient (RDG) method can enhance the synchronizability of networks by simply changing the direction of the links". In this paper, we argue that in some cases the RDG method leads to the failure of synchronization ($R=λ^{r}_{2}/λ^{r}_{N}=0$). We also propose a residual betweenness gradient (RBG) method to solve this problem.
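
For directed networks, the ratio R is computed from the real parts of the Laplacian eigenvalues ordered as 0 = λ^r_1 ≤ λ^r_2 ≤ ... ≤ λ^r_N. A minimal sketch, assuming the in-degree Laplacian L = D_in - A with A[i, j] = 1 for a link from j to i (this convention and the function name are our assumptions). It shows one simple way R collapses to 0: when two or more nodes have no incoming links, their rows of L are zero, the zero eigenvalue is at least doubly degenerate, and λ^r_2 = 0.

```python
import numpy as np


def synchronizability_ratio(adj):
    """R = Re(lambda_2) / Re(lambda_N) for the in-degree Laplacian L = D_in - A.

    Convention assumed here: adj[i, j] = 1 means a directed link j -> i,
    i.e. node i is driven by node j.
    """
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    # Eigenvalues of a directed Laplacian can be complex; order them by real part.
    re = np.sort(np.linalg.eigvals(lap).real)
    return re[1] / re[-1]


# Chain 0 -> 1 -> 2: a single root node, so lambda_2 > 0.
chain = [[0, 0, 0],
         [1, 0, 0],
         [0, 1, 0]]
# 0 -> 1 <- 2: two nodes without incoming links, so lambda_2 = 0 and R = 0.
two_roots = [[0, 0, 0],
             [1, 0, 1],
             [0, 0, 0]]
print(synchronizability_ratio(chain), synchronizability_ratio(two_roots))
```

Reversing a single link in `two_roots` (for example making it 0 -> 1 -> 2) restores a unique root and a positive R, which illustrates why the choice of link directions matters.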

Preprint · 2010 · arXiv

Emergence of Global Preferential Attachment From Local Interaction

Global degree- or strength-based preferential attachment is widely used as an evolution mechanism of networks. But it is hard to believe that any individual can obtain global information and shape the network architecture based on it. In this paper, we find that global preferential attachment emerges from local interaction models, including the distance-dependent preferential attachment (DDPA) evolving model of weighted networks (M. Li et al., New Journal of Physics 8 (2006) 72), the acquaintance network model (J. Davidsen et al., Phys. Rev. Lett. 88 (2002) 128701) and the connecting nearest-neighbor (CNN) model (A. Vazquez, Phys. Rev. E 67 (2003) 056104). For the DDPA and CNN models, the attachment rate depends linearly on the degree or strength, while for the acquaintance network model the dependence follows a sublinear power law. This implies that, for the evolution of social networks, local contact could be more fundamental than the presumed global preferential attachment. This is consistent with the result observed in the evolution of empirical email networks.
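
A common empirical way to check whether attachment is preferential is to compare two snapshots of a network and measure how many new links nodes of each degree k receive; a roughly linear Π(k) indicates linear preferential attachment. The sketch below is a generic estimator for unweighted, undirected edge lists, not necessarily the paper's procedure; the function name and input format are hypothetical.

```python
from collections import Counter


def attachment_rate(edges_before, edges_added):
    """Estimate Pi(k): new links gained per node, grouped by degree k at time t.

    edges_before: undirected edges present at time t.
    edges_added: undirected edges created between t and t + 1.
    Nodes absent at time t (newcomers) are ignored.
    """
    degree = Counter()
    for u, v in edges_before:
        degree[u] += 1
        degree[v] += 1
    nodes_per_degree = Counter(degree.values())
    gained = Counter()
    for edge in edges_added:
        for node in edge:
            if node in degree:
                gained[degree[node]] += 1
    return {k: gained[k] / n for k, n in sorted(nodes_per_degree.items())}


before = [(0, 1), (1, 2), (1, 3)]
added = [(4, 1), (5, 1), (6, 2)]
print(attachment_rate(before, added))
```

Fitting Π(k) against k on log-log axes then distinguishes a linear dependence from a sublinear power law.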

Preprint · 2010 · arXiv

How to Measure Significance of Community Structure in Complex Networks

Community structure analysis is a powerful tool for complex networks, which can simplify their functional analysis considerably. Recently, many approaches have been proposed for community structure detection, but few works have focused on the significance of community structure. Since real networks obtained from complex systems always contain erroneous links, and most community detection algorithms have random factors, evaluating the significance of community structure is important and urgent. In this paper, we use the stability of eigenvectors to characterize the significance of community structures. By employing the eigenvalues of the Laplacian matrix of a given network, we can evaluate the significance of its community structure and obtain the optimal number of communities, which is always hard for community detection algorithms to determine. We apply our method to many real networks. We find that significant community structures exist in many social networks and in the C. elegans neural network, and that less significant community structures appear in protein-interaction networks and metabolic networks. Our method can be applied to broad clustering problems in data mining due to its solid mathematical basis and efficiency.
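
Using Laplacian eigenvalues to find the number of communities is closely related to the widely used eigengap heuristic, sketched below. This is not necessarily the paper's eigenvector-stability procedure; the function names and the toy network are illustrative.

```python
import numpy as np


def eigengap_num_communities(adj, k_max=None):
    """Guess the number of communities from the largest gap in the Laplacian spectrum.

    Uses the unnormalised Laplacian L = D - A of an undirected network; if the
    largest gap lies between the k-th and (k+1)-th smallest eigenvalues, k is returned.
    """
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    eig = np.linalg.eigvalsh(lap)                  # ascending order
    if k_max is None:
        k_max = len(eig) - 1
    gaps = np.diff(eig[: k_max + 1])               # gaps[i] = eig[i+1] - eig[i]
    return int(np.argmax(gaps)) + 1


def two_cliques_with_bridge(size):
    """Two complete graphs of `size` nodes joined by a single edge."""
    n = 2 * size
    adj = np.zeros((n, n))
    adj[:size, :size] = 1
    adj[size:, size:] = 1
    np.fill_diagonal(adj, 0)
    adj[0, size] = adj[size, 0] = 1
    return adj


print(eigengap_num_communities(two_cliques_with_bridge(4)))
```

A large gap signals a well-separated (significant) partition; a flat spectrum with no clear gap suggests weak community structure.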

Preprint · 2010 · arXiv

NRQCD Predictions of D-Wave Quarkonia $^3D_{J}(J=1,2,3)$ Decay into Light Hadrons at Order $α_{s}^{3}$

In this paper, in the framework of NRQCD, we study the light hadron (LH) decays of the spin-triplet (S=1) D-wave heavy quarkonia. The short-distance coefficients of all Fock states in the $^3D_J(J=1,2,3)$ quarkonia, including D-wave color-singlet, P-wave color-octet, and S-wave color-singlet and color-octet states, are calculated perturbatively at order $α_{s}^3$. The operator evolution equations of the four-fermion operators are also derived and used to estimate the numerical values of the long-distance matrix elements. We find that for the $c\bar{c}$ system, the LH decay widths of $ψ(1^3D_J)$ predicted by NRQCD are about $2\sim3$ times larger than the phenomenological potential model results, while for the $b\bar{b}$ system the two theoretical estimates of $Γ(Υ(1^3D_J)\to LH)$ agree with each other. Our predictions for the $ψ(1^3D_J)$ LH decay widths are $Γ(ψ(1^3D_J)\to LH)=(0.43,0.05,0.17)$ MeV for J=1,2,3, and for $Υ(1^3D_J)$, $Γ(Υ(1^3D_J)\to LH)=(6.91,0.75,2.75)$ keV for J=1,2,3.

Preprint · 2010 · arXiv

Relativistic correction to $e^{+}e^{-}\to J/ψ+gg$ at $B$ factories and constraint on color-octet matrix elements

We calculate the relativistic correction to $J/ψ$ production in the color-singlet process $e^{+}e^{-}\to J/ψ+gg$ at B factories. We employ the non-relativistic QCD (NRQCD) factorization approach, where the short-distance coefficients are calculated perturbatively and the long-distance matrix elements are extracted from the decays of $J/ψ$ into $e^{+}e^{-}$ and light hadrons. We find that the $O(v^2)$ relativistic correction can enhance the cross section by 20-30%, comparable to the enhancement due to the $O(α_s)$ radiative correction obtained earlier. Combining the relativistic correction with the QCD radiative correction, we find that the color-singlet contribution to $e^{+}e^{-}\to J/ψ+gg$ can saturate the latest cross section measured by Belle, $σ(e^{+}e^{-}\to J/ψ+X_{\mathrm{non-c\bar{c}}})=0.43 \pm0.09\pm0.09$ pb, thus leaving little room for color-octet contributions. This gives a very stringent constraint on the color-octet contribution, and may imply that the values of the color-octet matrix elements are much smaller than earlier expectations based on naive velocity scaling rules or on fits to experimental data with leading-order calculations.

Preprint · 2010 · arXiv

The attack tolerance of community structure in complex networks

Robustness is an important property of complex networks. Much research to date has focused on network robustness, including the error and attack tolerance of a network's connectivity and shortest paths. In this paper, the error and attack tolerance of a network's community structure is studied by randomly and purposely perturbing the interactions in networks. Two targeted perturbation methods are designed: one is based on the clustering coefficient, and the other attacks triangles. A dissimilarity function D is used to quantify changes in community structure, and modularity Q is used to quantify the significance of community structure. The numerical results show that after perturbation, a network's community structure becomes less distinct. We also find that targeted attacks damage community structure more than random attacks.
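
Modularity Q, used above to quantify the significance of community structure, can be computed directly from the adjacency matrix. The sketch below uses the standard Newman-Girvan definition and shows, on a toy network, that deleting one edge inside a community lowers Q for that fixed partition. The dissimilarity function D and the paper's clustering-coefficient and triangle attacks are not reproduced; the toy network is illustrative.

```python
import numpy as np


def modularity(adj, labels):
    """Newman-Girvan modularity Q of a partition of an undirected network.

    Q = (1 / 2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j)
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    k = adj.sum(axis=1)                               # node degrees
    two_m = k.sum()                                   # 2m = sum of degrees
    same = labels[:, None] == labels[None, :]         # delta(c_i, c_j)
    return float(((adj - np.outer(k, k) / two_m) * same).sum() / two_m)


# Two 4-cliques joined by one edge, partitioned into the two cliques.
size = 4
adj = np.zeros((2 * size, 2 * size))
adj[:size, :size] = 1
adj[size:, size:] = 1
np.fill_diagonal(adj, 0)
adj[0, size] = adj[size, 0] = 1
labels = [0] * size + [1] * size
print(modularity(adj, labels))

# Removing an edge inside the first community lowers Q here.
damaged = adj.copy()
damaged[1, 2] = damaged[2, 1] = 0
print(modularity(damaged, labels))
```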