Source author record

Erjia Yan

Erjia Yan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Digital Libraries, Information Retrieval, Social and Information Networks, Computation and Language

Catalog footprint

What is connected

11 works

4 topics

4 close collaborators

preprint, 2020, arXiv

Citation Cascade and the Evolution of Topic Relevance

Citation analysis, as a tool for quantitative studies of science, has long emphasized direct citation relations, leaving indirect or high order citations overlooked. However, a series of early and recent studies demonstrate the existence of indirect and continuous citation impact across generations. Adding to the literature on high order citations, we introduce the concept of a citation cascade: the constitution of a series of subsequent citing events initiated by a certain publication. We investigate this citation structure by analyzing more than 450,000 articles and over 6 million citation relations. We show that citation impact exists not only within the three generations documented in prior research, but also in much further generations. Still, our experimental results indicate that two to four generations are generally adequate to trace a work's scientific impact. We also explore specific structural properties such as depth, width, structural virality, and size, which account for differences among individual citation cascades. Finally, we find evidence that it is more important for a scientific work to inspire trans domain (or indirectly related domain) works than to receive only intra domain recognition in order to achieve high impact. Our methods and findings can serve as a new tool for scientific evaluation and the modeling of scientific history.
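
The structural properties named in this abstract can be illustrated with a minimal sketch. It assumes that a paper's generation is its shortest citing distance from the seed publication, that depth is the furthest generation reached, that width is the size of the largest generation, and that size counts every paper in the cascade except the seed. These definitions, the function name `cascade_stats`, and the toy data are illustrative assumptions, not the authors' implementation; structural virality is left out.

```python
from collections import Counter, deque


def cascade_stats(cited_by, seed):
    """Return size, depth and width of the citation cascade started by `seed`.

    `cited_by` maps a paper to the papers that cite it (hypothetical toy input).
    """
    generation = {seed: 0}
    queue = deque([seed])
    # Breadth-first search assigns each paper its shortest distance from the seed.
    while queue:
        paper = queue.popleft()
        for citer in cited_by.get(paper, ()):
            if citer not in generation:
                generation[citer] = generation[paper] + 1
                queue.append(citer)
    # Count papers per generation, leaving out the seed itself (generation 0).
    per_generation = Counter(g for g in generation.values() if g > 0)
    return {
        "size": len(generation) - 1,
        "depth": max(per_generation, default=0),
        "width": max(per_generation.values(), default=0),
    }


# Toy cascade: P0 is cited by A and B, A by C, D and E, B by E, and C by F.
cited_by = {
    "P0": ["A", "B"],
    "A": ["C", "D", "E"],
    "B": ["E"],
    "C": ["F"],
}
stats = cascade_stats(cited_by, "P0")
print(stats)
```

Cutting the search off after a fixed number of generations would be one way to test the abstract's remark that two to four generations are generally adequate.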

preprint, 2016, arXiv

A natural language interface to a graph-based bibliographic information retrieval system

With the ever-increasing scientific literature, there is a need for a natural language interface to bibliographic information retrieval systems to retrieve related information effectively. In this paper, we propose a natural language interface, NLI-GIBIR, to a graph-based bibliographic information retrieval system. In designing NLI-GIBIR, we developed a novel framework that is applicable to graph-based bibliographic information retrieval systems. Our framework integrates algorithms/heuristics for interpreting and analyzing natural language bibliographic queries. NLI-GIBIR allows users to search for a variety of bibliographic data through natural language. A series of text- and linguistic-based techniques are used to analyze and answer natural language queries, including tokenization, named entity recognition, and syntactic analysis. We find that our framework can effectively represent and address complex bibliographic information needs. Thus, the contributions of this paper are as follows: First, to our knowledge, it is the first attempt to propose a natural language interface to graph-based bibliographic information retrieval. Second, we propose a novel customized natural language processing framework that integrates a few original algorithms/heuristics for interpreting and analyzing natural language bibliographic queries. Third, we show that the proposed framework and natural language interface provide a practical solution in building real-world natural language interface-based bibliographic information retrieval systems. Our experimental results show that the presented system can correctly answer 39 out of 40 example natural language queries with varying lengths and complexities.

preprint, 2013, arXiv

Entitymetrics: Measuring the Impact of Entities

This paper proposes entitymetrics to measure the impact of knowledge units. Entitymetrics highlight the importance of entities embedded in scientific literature for further knowledge discovery. In this paper, we use Metformin, a drug for diabetes, as an example to form an entity-entity citation network based on literature related to Metformin. We then calculate the network features and compare the centrality ranks of biological entities with results from the Comparative Toxicogenomics Database (CTD). The comparison demonstrates the usefulness of entitymetrics to detect most of the outstanding interactions manually curated in CTD.

preprint, 2013, arXiv

Finding knowledge paths among scientific disciplines

This paper discovers patterns of knowledge dissemination among scientific disciplines. While the transfer of knowledge is largely unobservable, citations from one discipline to another have been proven to be an effective proxy to study disciplinary knowledge flow. This study constructs a knowledge flow network in which a node represents a Journal Citation Report subject category and a link denotes the citations from one subject category to another. Using the concept of shortest path, several quantitative measurements are proposed and applied to a knowledge flow network. Based on an examination of subject categories in Journal Citation Report, this paper finds that social science domains tend to be more self-contained and thus it is more difficult for knowledge from other domains to flow into them; at the same time, knowledge from science domains, such as biomedicine-, chemistry-, and physics-related domains, can access and be accessed by other domains more easily. This paper also finds that social science domains are more disunified than science domains, as three fifths of the knowledge paths from one social science domain to another need at least one science domain to serve as an intermediate. This paper contributes to discussions on disciplinarity and interdisciplinarity by providing empirical analysis.
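
The shortest-path idea in this abstract can be sketched on a toy network. The sketch assumes an unweighted directed graph whose edges already point in the direction of knowledge flow; the paper's actual measurements may use citation weights. The category names, the `flows` data, and the `science` set are hypothetical examples, not results from the study.

```python
from collections import deque


def shortest_path(flows, source, target):
    """Return one shortest directed path from source to target, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in flows.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical knowledge flows between four subject categories.
flows = {
    "Sociology": ["Statistics"],
    "Statistics": ["Economics", "Sociology"],
    "Economics": ["Statistics"],
    "Law": [],
}
science = {"Statistics"}  # assumed labelling, for illustration only

path = shortest_path(flows, "Sociology", "Economics")
# True when the path between two social science categories passes through a science category.
needs_science = any(node in science for node in path[1:-1])
print(path, needs_science)
```

Repeating the check over every ordered pair of social science categories would give the share of paths that need a science intermediate, which is the kind of proportion the abstract reports.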

preprint, 2012, arXiv

A bird's-eye view of scientific trading: Dependency relations among fields of science

We use a trading metaphor to study knowledge transfer in the sciences as well as the social sciences. The metaphor comprises four dimensions: (a) Discipline Self-dependence, (b) Knowledge Exports/Imports, (c) Scientific Trading Dynamics, and (d) Scientific Trading Impact. This framework is applied to a dataset of 221 Web of Science subject categories. We find that: (i) the Scientific Trading Impact and Dynamics of Materials Science and Transportation Science have increased; (ii) Biomedical Disciplines, Physics, and Mathematics are significant knowledge exporters, as is Statistics & Probability; (iii) in the social sciences, Economics, Business, Psychology, Management, and Sociology are important knowledge exporters; (iv) Discipline Self-dependence is associated with specialized domains which have ties to professional practice (e.g., Law, Ophthalmology, Dentistry, Oral Surgery & Medicine, Psychology, Psychoanalysis, Veterinary Sciences, and Nursing).

preprint, 2011, arXiv

A recursive field-normalized bibliometric performance indicator: An application to the field of library and information science

Two commonly used ideas in the development of citation-based research performance indicators are the idea of normalizing citation counts based on a field classification scheme and the idea of recursive citation weighting (as in PageRank-inspired indicators). We combine these two ideas in a single indicator, referred to as the recursive mean normalized citation score indicator, and we study the validity of this indicator. Our empirical analysis shows that the proposed indicator is highly sensitive to the field classification scheme that is used. The indicator also has a strong tendency to reinforce biases caused by the classification scheme. Based on these observations, we advise against the use of indicators in which the idea of normalization based on a field classification scheme and the idea of recursive citation weighting are combined.

preprint, 2010, arXiv

Applying centrality measures to impact analysis: A coauthorship network analysis

Many studies on coauthorship networks focus on network topology and network statistical mechanics. This article takes a different approach by studying micro-level network properties, with the aim of applying centrality measures to impact analysis. Using coauthorship data from 16 journals in the field of library and information science (LIS) with a time span of twenty years (1988-2007), we construct an evolving coauthorship network and calculate four centrality measures (closeness, betweenness, degree and PageRank) for authors in this network. We find that the four centrality measures are significantly correlated with citation counts. We also discuss the usability of centrality measures in author ranking, and suggest that centrality measures can be useful indicators for impact analysis.
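
Two of the four centrality measures named in this abstract, degree and closeness, can be sketched on a toy coauthorship network. The sketch assumes an unweighted, undirected, connected graph and uses the common textbook normalizations; the author names and the `coauthors` data are hypothetical, and the study's own network construction may differ.

```python
from collections import deque


def degree_centrality(graph):
    """Share of the other authors that each author has coauthored with."""
    others = len(graph) - 1
    return {author: len(neighbors) / others for author, neighbors in graph.items()}


def closeness_centrality(graph):
    """Reachable authors divided by the sum of shortest-path distances to them."""
    result = {}
    for source in graph:
        distance = {source: 0}
        queue = deque([source])
        # Breadth-first search gives shortest distances in an unweighted graph.
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total = sum(distance.values())
        result[source] = (len(distance) - 1) / total if total else 0.0
    return result


# Hypothetical coauthorship ties (each tie is listed in both directions).
coauthors = {
    "Ann": {"Bo", "Cy", "Di"},
    "Bo": {"Ann", "Cy"},
    "Cy": {"Ann", "Bo"},
    "Di": {"Ann"},
}
degree = degree_centrality(coauthors)
closeness = closeness_centrality(coauthors)
print(degree)
print(closeness)
```

In this toy network Ann has coauthored with everyone, so she scores highest on both measures, while Di, who is connected only through Ann, scores lowest.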

preprint, 2010, arXiv

Discovering author impact: A PageRank perspective

This article provides an alternative perspective for measuring author impact by applying the PageRank algorithm to a coauthorship network. A weighted PageRank algorithm considering citation and coauthorship network topology is proposed. We test this algorithm under different damping factors by evaluating author impact in the informetrics research community. We also compare this weighted PageRank with the h-index, citation, and program committee (PC) membership of the International Society for Scientometrics and Informetrics (ISSI) conferences. Findings show that this weighted PageRank algorithm provides reliable results in measuring author impact.

preprint, 2010, arXiv

PageRank for ranking authors in co-citation networks

Google's PageRank has created a new synergy to information retrieval for a better ranking of Web pages. It ranks documents depending on the topology of the graphs and the weights of the nodes. PageRank has significantly advanced the field of information retrieval and keeps Google ahead of competitors in the search engine market. It has been deployed in bibliometrics to evaluate research impact, yet few of these studies focus on the important impact of the damping factor (d) for ranking purposes. This paper studies how varied damping factors in the PageRank algorithm can provide additional insight into the ranking of authors in an author co-citation network. Furthermore, we propose weighted PageRank algorithms. We select the 108 most highly cited authors in the information retrieval (IR) area from the 1970s to 2008 to form the author co-citation network. We calculate the ranks of these 108 authors based on PageRank with damping factors ranging from 0.05 to 0.95. In order to test the relationship between these different measures, we compare PageRank and weighted PageRank results with the citation ranking, h-index, and centrality measures. We find that in our author co-citation network, citation rank is highly correlated with PageRank under different damping factors and also with different PageRank algorithms; citation rank and PageRank are not significantly correlated with centrality measures; and h-index is not significantly correlated with centrality measures.
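
The role of the damping factor can be illustrated with a minimal power-iteration sketch. It assumes the standard PageRank update, with co-citation counts used as edge weights; this is not necessarily the weighted variant the paper proposes. The three-author network, the function name `pagerank`, and the fixed iteration count are illustrative assumptions.

```python
def pagerank(weights, damping, iterations=100):
    """Weighted PageRank by power iteration.

    `weights[a][b]` is the edge weight from a to b; every node must be a key.
    """
    nodes = sorted(weights)
    count = len(nodes)
    out_total = {n: sum(weights[n].values()) for n in nodes}
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Every node first receives the teleportation share (1 - d) / N.
        new_rank = {n: (1.0 - damping) / count for n in nodes}
        for source in nodes:
            if out_total[source] == 0:
                # A node without outgoing weight spreads its rank evenly.
                for n in nodes:
                    new_rank[n] += damping * rank[source] / count
                continue
            for target, weight in weights[source].items():
                share = weight / out_total[source]
                new_rank[target] += damping * rank[source] * share
        rank = new_rank
    return rank


# Hypothetical symmetric co-citation counts among three authors.
co_citation = {
    "A": {"B": 3, "C": 1},
    "B": {"A": 3, "C": 2},
    "C": {"A": 1, "B": 2},
}
ranks_by_damping = {d: pagerank(co_citation, d) for d in (0.05, 0.5, 0.85, 0.95)}
for d, ranks in ranks_by_damping.items():
    print(d, sorted(ranks, key=ranks.get, reverse=True))
```

With a small damping factor the scores stay close to uniform, because most of the rank comes from teleportation; with a large one they follow the weighted link structure more closely.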

preprint, 2010, arXiv

Upper Tag Ontology (UTO) For Integrating Social Tagging Data

Data integration and mediation have become central concerns of information technology over the past few decades. With the advent of the Web and the rapid increases in the amount of data and the number of Web documents and users, researchers have focused on enhancing the interoperability of data through the development of metadata schemes. Other researchers have looked to the wealth of metadata generated by bookmarking sites on the Social Web. While several existing ontologies capitalize on the semantics of metadata created by tagging activities, the Upper Tag Ontology (UTO) emphasizes the structure of tagging activities to facilitate modeling of tagging data and the integration of data from different bookmarking sites as well as the alignment of tagging ontologies. UTO is described and its utility in harvesting, modeling, integrating, searching and analyzing data is demonstrated with metadata harvested from three major social tagging systems (Delicious, Flickr and YouTube).

preprint, 2010, arXiv

Weighted citation: An indicator of an article's prestige

We propose using the technique of weighted citation to measure an article's prestige. The technique allocates a different weight to each reference by taking into account the impact of citing journals and citation time intervals. Weighted citation captures prestige, whereas citation counts capture popularity. We compare the value variances for popularity and prestige for articles published in the Journal of the American Society for Information Science and Technology from 1998 to 2007, and find that the majority have comparable status.
