Source author record

Tandy Warnow

Tandy Warnow appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Digital Libraries; physics.soc-ph; Social and Information Networks; Computational Engineering, Finance, and Science; Genomics; Machine Learning; Populations and Evolution

Catalog footprint

What is connected

7 works

7 topics

4 close collaborators


preprint, 2021, arXiv

Center-Periphery Structure in Communities: Extracellular Vesicles

Clustering and community detection in networks are of broad interest and have been the subject of extensive research that spans several fields. We are interested in the relatively narrow question of detecting communities of scientific publications that are linked by citations. These publication communities can be used to identify scientists with shared interests who form communities of researchers. Building on the well-known k-core algorithm, we have developed a modular pipeline to find publication communities. We compare our approach to communities discovered by the widely used Leiden algorithm for community finding. Using a quantitative and qualitative approach, we evaluate community finding results on a citation network consisting of over 14 million publications relevant to the field of extracellular vesicles.
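As a rough illustration of the k-core idea that the abstract above builds on, the sketch below peels away nodes of degree less than k until none remain. It is not the authors' pipeline: the function name `k_core`, the toy edge list, and the choice to treat citation links as undirected are all assumptions made for this example.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the set of nodes in the k-core of an undirected graph.

    Assumption for this sketch: citation links are treated as undirected
    and self-loops are ignored.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    alive = set(adj)
    # Start with every node whose degree is already below k.
    queue = [n for n in alive if len(adj[n]) < k]
    while queue:
        n = queue.pop()
        if n not in alive:
            continue  # already removed via an earlier queue entry
        alive.discard(n)
        # Removing n lowers the degree of each surviving neighbor.
        for m in adj[n]:
            if m in alive:
                adj[m].discard(n)
                if len(adj[m]) < k:
                    queue.append(m)
    return alive


# Hypothetical toy network: a triangle (a, b, c) with one pendant node d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
core2 = k_core(toy_edges, 2)
```

In the toy network, the pendant node is dropped at k = 2 and the triangle survives, which is the center-versus-periphery intuition in miniature.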

preprint, 2020, arXiv

Finding Scientific Communities In Citation Graphs: Convergent Clustering

Understanding the nature and organization of scientific communities is of broad interest. The 'Invisible College' is a historical metaphor for one such type of community, and the search for such 'colleges' can be framed as the detection and analysis of small groups of scientists working on problems of common interest. Case studies have previously been conducted on individual communities with respect to their scientific and social behavior. In this study, we introduce a new and scalable community finding approach. Supplemented by expert assessment, we use the convergence of two different clustering methods to select article clusters generated from over two million articles from the field of immunology spanning an eleven-year period, with relevant cluster quality indicators for evaluation. Finally, we identify author communities defined by these clusters. A sample of the article clusters produced by this pipeline was reviewed by experts and shows strong thematic relatedness, suggesting that the inferred author communities may represent valid communities of practice. These findings suggest that such convergent approaches may be useful in the future.

preprint, 2020, arXiv

Frequently Co-cited Publications: Features and Kinetics

Co-citation measurements can reveal the extent to which a concept representing a novel combination of existing ideas evolves towards a specialty. The strength of co-citation is represented by its frequency, which accumulates over time. Of interest is whether underlying features associated with the strength of co-citation can be identified. We use the proximal citation network for a given pair of articles (x, y) to compute theta, an a priori estimate of the probability of co-citation between x and y, prior to their first co-citation. Thus, low values for theta reflect pairs of articles for which co-citation is presumed less likely. We observe that co-citation frequencies are a composite of power-law and lognormal distributions, and that very high co-citation frequencies are more likely to be composed of pairs with low values of theta, reflecting the impact of a novel combination of ideas. Furthermore, we note that the occurrence of a direct citation between two members of a co-cited pair increases with co-citation frequency. Finally, we identify cases of frequently co-cited publications that accumulate co-citations after an extended period of dormancy.

preprint, 2020, arXiv

NJst and ASTRID are not statistically consistent under a random model of missing data

Species tree estimation from multi-locus datasets is statistically challenging for multiple reasons, including gene tree heterogeneity across the genome due to incomplete lineage sorting (ILS). Species tree estimation methods have been developed that operate by estimating gene trees and then using those gene trees to estimate the species tree. Several of these methods (e.g., ASTRAL, ASTRID, and NJst) are provably statistically consistent under the multi-species coalescent (MSC) model, provided that the gene trees are estimated correctly, and there is no missing data. Recently, Nute et al. (BMC Genomics 2018) addressed the question of whether these methods remain statistically consistent under random models of taxon deletion, and asserted that they do so. Here we provide a counterexample to one of these theorems, and establish that ASTRID and NJst are not statistically consistent under an i.i.d. model of taxon deletion.

preprint, 2019, arXiv

Co-citations in context: disciplinary heterogeneity is relevant

Citation analysis of the scientific literature has been used to study and define disciplinary boundaries, to trace the dissemination of knowledge, and to estimate impact. Co-citation, the frequency with which pairs of publications are cited together, provides insight into how documents relate to each other and across fields. Co-citation analysis has been used to characterize combinations of prior work as conventional or innovative and to derive features of highly cited publications. Given the organization of science into disciplines, a key question is the sensitivity of such analyses to frame of reference. Our study examines this question using semantically-themed citation networks. We observe that trends reported to be true across the scientific literature do not hold for focused citation networks, and we conclude that inferring novelty using co-citation analysis and random graph models benefits from disciplinary context.
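To make the co-citation measure concrete, here is a minimal sketch that counts, for each pair of publications, how many citing papers list both in their references. The function name `cocitation_counts`, the input layout (a mapping from citing paper to its reference list), and the sample data are assumptions for illustration only; the sketch does not reproduce the random graph models or the analysis described in the abstract.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(references):
    """Count co-citations for every pair of cited publications.

    references: dict mapping a citing paper id to an iterable of the
    paper ids it cites (hypothetical input layout for this sketch).
    Returns a Counter keyed by sorted (x, y) pairs.
    """
    counts = Counter()
    for cited in references.values():
        # Deduplicate and sort so each pair has one canonical key.
        for x, y in combinations(sorted(set(cited)), 2):
            counts[(x, y)] += 1
    return counts


# Hypothetical toy data: three citing papers and their reference lists.
toy_refs = {
    "p1": ["a", "b", "c"],
    "p2": ["a", "b"],
    "p3": ["b", "c"],
}
toy_counts = cocitation_counts(toy_refs)
```

In the toy data, the pairs (a, b) and (b, c) are each co-cited twice and (a, c) once. Restricting `references` to papers from a single discipline would be one simple way to compare a focused network against the full literature.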

preprint, 2019, arXiv

Viewing Computer Science through Citation Analysis; Salton and Bergmark Redux

Computer science has experienced dramatic growth and diversification over the last twenty years. Towards a current understanding of the structure of this discipline, we analyze a cohort of the computer science literature using the DBLP database. For insight into the features of this cohort and the relationships among its components, we construct article-level clusters based on either direct citations or co-citations, and reconcile them to major and minor subject categories in the Scopus All Science Journal Classification (ASJC). We describe complementary insights from clustering by direct citation and by co-citation, both of which point to the increase in computer science publications and their scope. Our analysis shows cross-category clusters, some of which interact with external fields such as the biological sciences, while others remain inward looking.

preprint, 2015, arXiv

Ultra-large alignments using Phylogeny-aware Profiles

Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments (MSAs) and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, an MSA method that uses a new machine learning technique - the Ensemble of Hidden Markov Models - that we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp.
