Source author record

Ying Ding

Ying Ding appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Catalog footprint

What is connected

50 works

28 topics

4 close collaborators


preprint, 2026, arXiv

Rethinking the Value of Multi-Agent Workflow: A Strong Single Agent Baseline

Recent advances in LLM-based multi-agent systems (MAS) show that workflows composed of multiple LLM agents with distinct roles, tools, and communication patterns can outperform single-LLM baselines on complex tasks. However, most frameworks are homogeneous, where all agents share the same base LLM and differ only in prompts, tools, and positions in the workflow. This raises the question of whether such workflows can be simulated by a single agent through multi-turn conversations. We investigate this across seven benchmarks spanning coding, mathematics, general question answering, domain-specific reasoning, and real-world planning and tool use. Our results show that a single agent can reach the performance of homogeneous workflows with an efficiency advantage from KV cache reuse, and can even match the performance of an automatically optimized heterogeneous workflow. Building on this finding, we propose OneFlow, an algorithm that automatically tailors workflows for single-agent execution, reducing inference costs compared to existing automatic multi-agent design frameworks without trading off accuracy. These results position the single-LLM implementation of multi-agent workflows as a strong baseline for MAS research. We also note that single-LLM methods cannot capture heterogeneous workflows due to the lack of KV cache sharing across different LLMs, highlighting future opportunities in developing truly heterogeneous multi-agent systems.

preprint, 2022, arXiv

Knowledge-Augmented Contrastive Learning for Abnormality Classification and Localization in Chest X-rays with Radiomics using a Feedback Loop

Building a highly accurate predictive model for classification and localization of abnormalities in chest X-rays usually requires a large number of manually annotated labels and pixel regions (bounding boxes) of abnormalities. However, it is expensive to acquire such annotations, especially the bounding boxes. Recently, contrastive learning has shown strong promise in leveraging unlabeled natural images to produce highly generalizable and discriminative features. However, extending its power to the medical image domain is under-explored and highly non-trivial, since medical images are much less amenable to data augmentations. In contrast, their prior knowledge, as well as radiomic features, is often crucial. To bridge this gap, we propose an end-to-end semi-supervised knowledge-augmented contrastive learning framework that simultaneously performs disease classification and localization tasks. The key knob of our framework is a unique positive sampling approach tailored for medical images, which seamlessly integrates radiomic features as a knowledge augmentation. Specifically, we first apply an image encoder to classify the chest X-rays and to generate the image features. We next leverage Grad-CAM to highlight the crucial (abnormal) regions for chest X-rays (even when unannotated), from which we extract radiomic features. The radiomic features are then passed through another dedicated encoder to act as the positive sample for the image features generated from the same chest X-ray. In this way, our framework constitutes a feedback loop for image and radiomic modality features to mutually reinforce each other. Their contrasting yields knowledge-augmented representations that are both robust and interpretable. Extensive experiments on the NIH Chest X-ray dataset demonstrate that our approach outperforms existing baselines in both classification and localization tasks.

preprint, 2022, arXiv

Pneumonia Detection on Chest X-ray using Radiomic Features and Contrastive Learning

Chest X-ray has become one of the most common medical diagnostic tools due to its noninvasiveness. The number of chest X-ray images has skyrocketed, but chest X-rays are still read manually by radiologists, which causes burnout and delays. Traditionally, radiomics, a subfield of radiology that can extract a large number of quantitative features from medical images, demonstrated its potential to facilitate medical imaging diagnosis before the deep learning era. With the rise of deep learning, the explainability of deep neural networks on chest X-ray diagnosis remains opaque. In this study, we propose a novel framework that leverages radiomics features and contrastive learning to detect pneumonia in chest X-rays. Experiments on the RSNA Pneumonia Detection Challenge dataset show that our model achieves superior results to several state-of-the-art models (> 10% in F1-score) and increases the model's interpretability.

preprint, 2022, arXiv

Prior Knowledge Enhances Radiology Report Generation

Radiology report generation aims to produce computer-aided diagnoses to alleviate the workload of radiologists and has drawn increasing attention recently. However, previous deep learning methods tend to neglect the mutual influences between medical findings, which can be the bottleneck that limits the quality of generated reports. In this work, we propose to mine and represent the associations among medical findings in an informative knowledge graph and incorporate this prior knowledge with radiology report generation to help improve the quality of generated reports. Experiment results demonstrate the superior performance of our proposed method on the IU X-ray dataset with a ROUGE-L of 0.384 ± 0.007 and CIDEr of 0.340 ± 0.011. Compared with previous works, our model achieves an average of 1.6% improvement (2.0% and 1.5% improvements in CIDEr and ROUGE-L, respectively). The experiments suggest that prior knowledge can bring performance gains to accurate radiology report generation. We will make the code publicly available at https://github.com/bionlplab/report_generation_amia2022.

preprint, 2022, arXiv

Radiology Text Analysis System (RadText): Architecture and Evaluation

Analyzing radiology reports is a time-consuming and error-prone task, which raises the need for an efficient automated radiology report analysis system to alleviate the workloads of radiologists and encourage precise diagnosis. In this work, we present RadText, an open-source radiology text analysis system developed in Python. RadText offers an easy-to-use text analysis pipeline, including de-identification, section segmentation, sentence splitting and word tokenization, named entity recognition, parsing, and negation detection. RadText features a flexible modular design, provides a hybrid text processing schema, and supports raw text processing and local processing, which enables better usability and improved data privacy. RadText adopts BioC as the unified interface, and also standardizes the input/output into a structured representation compatible with the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). This allows for a more systematic approach to observational research across multiple, disparate data sources. We evaluated RadText on the MIMIC-CXR dataset, with five new disease labels we annotated for this work. RadText demonstrates highly accurate classification performance, with an average precision of, a recall of 0.94, and an F-1 score of 0.92. We have made our code, documentation, examples, and the test set available at https://github.com/bionlplab/radtext .

preprint, 2022, arXiv

Switching modulation of spin transport in ferromagnetic tetragonal silicene

We study the band structure and transport properties of ferromagnetic tetragonal silicene nanoribbons by using the non-equilibrium Green's function method. The band structure and spin-dependent conductance are discussed under the combined effect of the external electric field, potential energy, exchange field and the spin-orbit coupling. One can easily realize a phase transition from a semimetallic to a semiconducting state by changing the transverse width of the nanoribbon. Separation of spin-dependent conductances arises from the effect of exchange field and the spin-orbit coupling, while zero-conductance behaviors exhibit spin-dependent band gaps induced by the electric field. We propose a device configuration of four-terminal tetragonal silicene nanoribbon with two central channels. It is found that spin current can be controlled by utilizing two switches. The switch with a high potential barrier can block electrons flowing from the central scattering region into other terminals. Interestingly, applying only one switch can realize spin-dependent zero conductance and large spin polarization. Two switches can provide multiple operations for controlling spin-dependent transport properties. The two-channel ferromagnetic tetragonal silicene nanoribbon can realize an effective separation of spin current, which may be a potential candidate for spintronic devices.

preprint, 2022, arXiv

Team formation and team performance: The balance between team freshness and repeat collaboration

Incorporating fresh members in teams is considered a pathway to team creativity. However, whether freshness improves team performance remains unclear, as does the optimal involvement of fresh members for team performance. This study uses a group of authors on the byline of a publication as a proxy for a scientific team. We extend an indicator, i.e., team freshness, to measure the extent to which a scientific team incorporates new members, by calculating the fraction of new collaboration relations established within the team. Based on more than 43 million scientific publications covering more than a half-century of research from Microsoft Academic Graph, this study provides a holistic picture of the current development of team freshness by outlining the temporal evolution of freshness and its disciplinary distribution. Subsequently, using a multivariable regression approach, we examine the association between team freshness and papers' short-term and long-term citations. The major findings are as follows: (1) team freshness in scientific teams has been increasing in the past half-century; (2) there exists an inverted-U-shaped association between team freshness and papers' citations in all the disciplines and in different periods; (3) the inverted-U-shaped relationship between team freshness and papers' citations is only found in small teams, while, in large teams, team freshness is significantly positively related to papers' citations.
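As a rough illustration of the indicator described in this abstract, the sketch below computes team freshness as the share of co-author pairs on a byline that have no earlier joint publication. The function name `team_freshness`, the pair-based reading of "new collaboration relations", and the toy author names are assumptions made for illustration; the paper's exact definition may differ.

```python
from itertools import combinations


def team_freshness(team, prior_pairs):
    """Fraction of co-author pairs in `team` with no earlier joint paper.

    `team` is an iterable of author identifiers on one byline.
    `prior_pairs` is an iterable of (author, author) pairs that
    collaborated before this publication (hypothetical input format).
    """
    members = sorted(set(team))
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0  # a single-author paper has no collaboration relations
    prior = {frozenset(p) for p in prior_pairs}
    new = sum(1 for p in pairs if frozenset(p) not in prior)
    return new / len(pairs)


# Toy example: four authors, two pairs of whom have published together before.
history = [("ann", "bo"), ("bo", "cy")]
example = team_freshness(["ann", "bo", "cy", "di"], history)
print(round(example, 3))
```

Under this reading, a team of long-standing collaborators scores 0 and a team of complete strangers scores 1.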

preprint, 2022, arXiv

Team Power Dynamics and Team Impact: New Perspectives on Scientific Collaboration using Career Age as a Proxy for Team Power

Power dynamics influence every aspect of scientific collaboration. Team power dynamics can be measured by team power level and team power hierarchy. Team power level is conceptualized as the average level of the possession of resources, expertise, or decision-making authorities of a team. Team power hierarchy represents the vertical differences of the possessions of resources in a team. In Science of Science, few studies have looked at scientific collaboration from the perspective of team power dynamics. This research examines how team power dynamics affect team impact to fill the research gap. In this research, all co-authors of one publication are treated as one team. Team power level and team power hierarchy of one team are measured by the mean and Gini index of career age of co-authors in this team. Team impact is quantified by citations of a paper authored by this team. By analyzing over 7.7 million teams from Science (e.g., Computer Science, Physics), Social Sciences (e.g., Sociology, Library & Information Science), and Arts & Humanities (e.g., Art), we find that flat team structure is associated with higher team impact, especially when teams have high team power level. These findings have been repeated in all five disciplines except Art, and are consistent in various types of teams from Computer Science including teams from industry or academia, teams with different gender groups, teams with geographical contrast, and teams with distinct size.
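The two measures named in this abstract, the mean and the Gini index of co-authors' career ages, can be sketched as follows. This is a minimal example using the standard mean-absolute-difference form of the Gini index; the function name `mean_and_gini` and the sample career ages are hypothetical, and the paper may use a different estimator or normalization.

```python
def mean_and_gini(career_ages):
    """Return (mean, Gini index) of a team's career ages.

    Uses G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean), which is 0 for a
    perfectly flat team and grows as seniority concentrates in few members.
    """
    n = len(career_ages)
    if n == 0:
        raise ValueError("team must have at least one member")
    mean = sum(career_ages) / n
    if mean == 0:
        return mean, 0.0  # all members at career age 0: no inequality
    total_diff = sum(abs(a - b) for a in career_ages for b in career_ages)
    return mean, total_diff / (2 * n * n * mean)


# Toy teams: one flat, one with a single senior member.
flat_level, flat_hierarchy = mean_and_gini([5, 5, 5])
steep_level, steep_hierarchy = mean_and_gini([0, 0, 10])
print(flat_level, flat_hierarchy)
print(round(steep_level, 3), round(steep_hierarchy, 3))
```

In the abstract's terms, the first value plays the role of team power level and the second the role of team power hierarchy.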

preprint, 2022, arXiv

The Gene of Scientific Success

This paper elaborates on how to identify and evaluate causal factors to improve scientific impact. Currently, analyzing scientific impact can be beneficial to various academic activities, including funding applications, mentor recommendation, and discovering potential collaborators. It is widely acknowledged that high-impact scholars often have more opportunities to receive awards as an encouragement for their hard work. Therefore, scholars spend great effort on making scientific achievements and improving scientific impact during their academic life. However, what are the determining factors that control scholars' academic success? The answer to this question can help scholars conduct their research more efficiently. With this in mind, our paper presents and analyzes the causal factors that are crucial for scholars' academic success. We first propose five major factors: article-centered factors, author-centered factors, venue-centered factors, institution-centered factors, and temporal factors. Then, we apply recent advanced machine learning algorithms and the jackknife method to assess the importance of each causal factor. Our empirical results show that author-centered and article-centered factors have the highest relevance to scholars' future success in the computer science area. Additionally, we discover an interesting phenomenon: the h-indices of scholars within the same institution or university are actually very close to each other.

preprint, 2022, arXiv

Training Your Sparse Neural Network Better with Any Mask

Pruning large neural networks to create high-quality, independently trainable sparse masks, which can maintain similar performance to their dense counterparts, is very desirable due to the reduced space and time complexity. As research effort is focused on increasingly sophisticated pruning methods that lead to sparse subnetworks trainable from scratch, we argue for an orthogonal, under-explored theme: improving training techniques for pruned sub-networks, i.e. sparse training. Apart from the popular belief that only the quality of sparse masks matters for sparse training, in this paper we demonstrate an alternative opportunity: one can carefully customize the sparse training techniques to deviate from the default dense network training protocols, consisting of introducing "ghost" neurons and skip connections at the early stage of training, and strategically modifying the initialization as well as labels. Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks. By adopting our newly curated techniques, we demonstrate significant performance gains across various popular datasets (CIFAR-10, CIFAR-100, TinyImageNet), architectures (ResNet-18/32/104, Vgg16, MobileNet), and sparse mask options (lottery ticket, SNIP/GRASP, SynFlow, or even random pruning), compared to the default training protocols, especially at high sparsity levels. Code is at https://github.com/VITA-Group/ToST

preprint, 2021, arXiv

Contrastive Learning Improves Critical Event Prediction in COVID-19 Patients

Machine Learning (ML) models typically require large-scale, balanced training data to be robust, generalizable, and effective in the context of healthcare. This has been a major issue for developing ML models for the coronavirus disease 2019 (COVID-19) pandemic, where data is highly imbalanced, particularly within electronic health records (EHR) research. Conventional approaches in ML use cross-entropy loss (CEL), which often suffers from poor margin classification. For the first time, we show that contrastive loss (CL) improves the performance of CEL, especially for imbalanced EHR data and the related COVID-19 analyses. This study has been approved by the Institutional Review Board at the Icahn School of Medicine at Mount Sinai. We use EHR data from five hospitals within the Mount Sinai Health System (MSHS) to predict mortality, intubation, and intensive care unit (ICU) transfer in hospitalized COVID-19 patients over 24 and 48 hour time windows. We train two sequential architectures (RNN and RETAIN) using two loss functions (CEL and CL). Models are tested on a full-sample data set, which contains all available data, and on a restricted data set that emulates higher class imbalance. CL models consistently outperform CEL models with the restricted data set on these tasks, with differences ranging from 0.04 to 0.15 for AUPRC and 0.05 to 0.1 for AUROC. For the restricted sample, only the CL model maintains proper clustering and is able to identify important features, such as pulse oximetry. CL outperforms CEL in instances of severe class imbalance, on three EHR outcomes with respect to three performance metrics: predictive power, clustering, and feature importance. We believe that the developed CL framework can be expanded and used for EHR ML work in general.

preprint, 2021, arXiv

Evolution of Charge and Pair Density Modulations in Overdoped Bi2Sr2CuO6+δ

One of the central issues concerning the mechanism of high temperature superconductivity in cuprates is the nature of the ubiquitous charge order and its implications for superconductivity. Here we use scanning tunneling microscopy to investigate the evolution of charge order from the optimally doped to strongly overdoped Bi2Sr2CuO6+δ cuprates. We find that with increasing hole concentration, the long-range checkerboard order gradually evolves into short-range glassy patterns consisting of diluted charge puddles. Each charge puddle has a unidirectional nematic internal structure, and exhibits clear pair density modulations as revealed by the spatial variations of superconducting coherence peak and gap depth. Both the charge puddles and the nematicity vanish completely in the strongly overdoped non-superconducting regime, when another type of short-range order with √2 × √2 periodicity emerges. These results shed important new light on the intricate interplay between the intertwined orders and the superconducting phase of cuprates.

preprint, 2021, arXiv

Fluid structure interaction: Insights into biomechanical implications of endograft after thoracic endovascular aortic repair

Thoracic endovascular aortic repair (TEVAR) has developed into the most effective treatment for aortic diseases. This study aims to evaluate the biomechanical implications of the implanted endograft after TEVAR. We present a novel image-based, patient-specific, fluid-structure computational framework. The geometries of blood, endograft, and aortic wall were reconstructed based on clinical images. Patient-specific measurement data was collected to determine the parameters of the three-element Windkessel. We designed three postoperative scenarios with rigid wall assumption, blood-wall interaction, and blood-endograft-wall interplay, respectively, where a two-way fluid-structure interaction (FSI) method was applied to predict the deformation of the composite stent-wall. Computational results were validated with Doppler ultrasound data. Results show that the rigid wall assumption fails to predict the waveforms of blood outflow and energy loss (EL). The complete storage and release process of blood flow energy, which consists of four phases, is captured by the FSI method. The endograft implantation would weaken the buffer function of the aorta and reduce mean EL by 19.1%. The closed curve area of wall pressure and aortic volume could indicate the EL caused by the interaction between blood flow and wall deformation, which accounts for 68.8% of the total EL. Both the FSI and endograft have a slight effect on wall shear stress-related indices. The deformability of the composite stent-wall region is remarkably limited by the endograft. Our results highlight the importance of considering the interaction between blood flow, the implanted endograft, and the aortic wall to acquire physiologically accurate hemodynamics in post-TEVAR computational studies, and show that the deformation of the aortic wall is responsible for the major EL of the blood flow.

preprint, 2021, arXiv

Giant anisotropic photocurrent modulated by strain in type-II Weyl semimetal Td-MoTe2

We build a Cu-MoTe2-Cu device model and use first-principles density functional theory to study the transport properties of single-layer Td-MoTe2. We obtained the effect of strain on the energy band structure, transport properties, and photocurrent. The strain-induced photocurrent shows an anisotropy that reflects the modulation of the energy bands, including the Weyl point, by strain. The photocurrent can be suppressed to almost zero when the strain is applied along the vacuum direction. In contrast, the photocurrent can be significantly increased when the strain is applied along the transport direction. The transport properties and magnitude of the photocurrent in the MoTe2-based device can be effectively modulated by adjusting the strength and direction of the strain.

preprint, 2021, arXiv

Innovation adoption: Broadcasting vs. Virality

Diffusion channels are critical to determining the adoption scale which leads to the ultimate impact of an innovation. The aim of this study is to develop an integrative understanding of the impact of two diffusion channels (i.e., broadcasting vs virality) on innovation adoption. Using citations of a series of classic algorithms and the time series of co-authorship as the footprints of their diffusion trajectories, we propose a novel method to analyze the intertwining relationships between broadcasting and virality in the innovation diffusion process. Our findings show that broadcasting and virality have similar diffusion power, but play different roles across diffusion stages. Broadcasting is more powerful in the early stages but may be gradually caught up or even surpassed by virality in the later period. Meanwhile, diffusion speed in virality is significantly faster than broadcasting and members from virality channels tend to adopt the same innovation repetitively.

preprint, 2020, arXiv

A Simultaneous Inference Procedure to Identify Subgroups from RCTs with Survival Outcomes: Application to Analysis of AMD Progression Studies

With the uptake of targeted therapies, instead of the "one-fits-all" approach, modern randomized clinical trials (RCTs) often aim to develop treatments that target a subgroup of patients. Motivated by analyzing the Age-Related Eye Disease Study (AREDS) data, a large RCT to study the efficacy of nutritional supplements in delaying the progression of an eye disease, age-related macular degeneration (AMD), we develop a simultaneous inference procedure to identify and infer subgroups with differential treatment efficacy in RCTs with survival outcomes. Specifically, we formulate the multiple testing problem through contrasts and construct their simultaneous confidence intervals, which control both within- and across-marker multiplicity appropriately. Realistic simulations are conducted using real genotype data to evaluate the method's performance under various scenarios. The method is then applied to AREDS to assess the efficacy of the antioxidants and zinc combination in delaying AMD progression. Multiple gene regions, including ESRRB-VASH1 on chromosome 14, have been identified with subgroups showing differential efficacy. We further validate our findings in an independent subsequent RCT, AREDS2, by discovering consistent differential treatment responses in the targeted and non-targeted subgroups identified from AREDS. This simultaneous inference approach provides a step forward to confidently identify and infer subgroups in modern drug development.

preprint, 2020, arXiv

Analysis of misinformation during the COVID-19 outbreak in China: cultural, social and political entanglements

COVID-19 resulted in an infodemic, which could erode public trust, impede virus containment, and outlive the pandemic itself. The evolving and fragmented media landscape is a key driver of the spread of misinformation. Using misinformation identified by Tencent's fact-checking platform and posts on Weibo, our results showed that the evolution of misinformation follows an issue-attention cycle, pertaining to topics such as city lockdown, cures and preventions, and school reopening. Sources of authority weigh in on these topics, but their influence is complicated by people's pre-existing beliefs and cultural practices. Finally, social media has a complicated relationship with established or legacy media systems. Sometimes they reinforce each other, but in general, social media may have a topic cycle of its own making. Our findings shed light on the distinct characteristics of misinformation during the COVID-19 outbreak and offer insights into combating misinformation in China and across the world at large.

preprint, 2020, arXiv

Attribute2vec: Deep Network Embedding Through Multi-Filtering GCN

We present a multi-filtering Graph Convolution Neural Network (GCN) framework for the network embedding task. It uses multiple local GCN filters to do feature extraction in every propagation layer. We show that this approach can capture different important aspects of node features compared with the existing attribute-embedding-based method. We also show that with the multi-filtering GCN approach, we can achieve significant improvement over baseline methods when training data is limited. We also perform many empirical experiments and demonstrate the benefit of using multiple filters over a single filter, as well as over most existing network embedding methods, for both the link prediction and node classification tasks.

preprint, 2020, arXiv

Building a PubMed knowledge graph

PubMed is an essential resource for the medical domain, but useful concepts are either difficult to extract or are ambiguous, which has significantly hindered knowledge discovery. To address this issue, we constructed a PubMed knowledge graph (PKG) by extracting bio-entities from 29 million PubMed abstracts, disambiguating author names, integrating funding data through the National Institutes of Health (NIH) ExPORTER, collecting affiliation history and educational background of authors from ORCID, and identifying fine-grained affiliation data from MapAffil. Through the integration of the credible multi-source data, we could create connections among the bio-entities, authors, articles, affiliations, and funding. Data validation revealed that the BioBERT deep learning method of bio-entity extraction significantly outperformed the state-of-the-art models based on the F1 score (by 0.51%), with the author name disambiguation (AND) achieving an F1 score of 98.09%. PKG can trigger broader innovations, not only enabling us to measure scholarly impact, knowledge usage, and knowledge transfer, but also assisting us in profiling authors and organizations based on their connections with bio-entities. The PKG is freely available on Figshare (https://figshare.com/s/6327a55355fc2c99f3a2, simplified version that excludes PubMed raw data) and the TACC website (http://er.tacc.utexas.edu/datasets/ped, full version).

preprint, 2020, arXiv

Coronavirus Knowledge Graph: A Case Study

The emergence of the novel COVID-19 pandemic has had a significant impact on global healthcare and the economy over the past few months. The virus's rapid spread has led to a proliferation of biomedical research addressing the pandemic and its related topics. One of the essential Knowledge Discovery tools that could help the biomedical research community understand and eventually find a cure for COVID-19 is the Knowledge Graph. The CORD-19 dataset is a collection of publicly available full-text research articles that have been recently published on COVID-19 and coronavirus topics. Here, we use several Machine Learning, Deep Learning, and Knowledge Graph construction and mining techniques to formalize and extract insights from the PubMed dataset and the CORD-19 dataset to identify COVID-19 related experts and bio-entities. In addition, we suggest possible techniques to predict related diseases, drug candidates, genes, gene mutations, and related compounds as part of a systematic effort to apply Knowledge Discovery methods to help biomedical researchers tackle the pandemic.

preprint, 2020, arXiv

The Pace of Artificial Intelligence Innovations: Speed, Talent, and Trial-and-Error

Innovations in artificial intelligence (AI) are occurring at speeds faster than ever witnessed before. However, few studies have managed to measure or depict this increasing velocity of innovations in the field of AI. In this paper, we combine data on AI from arXiv and Semantic Scholar to explore the pace of AI innovations from three perspectives: AI publications, AI players, and AI updates (trial and error). A research framework and three novel indicators, Average Time Interval (ATI), Innovation Speed (IS) and Update Speed (US), are proposed to measure the pace of innovations in the field of AI. The results show that: (1) in 2019, more than 3 AI preprints were submitted to arXiv per hour, over 148 times faster than in 1994. Furthermore, there was one deep learning-related preprint submitted to arXiv every 0.87 hours in 2019, over 1,064 times faster than in 1994. (2) For AI players, 5.26 new researchers entered the field of AI each hour in 2019, more than 175 times faster than in the 1990s. (3) As for AI updates (trial and error), one updated AI preprint was submitted to arXiv every 41 days, with around 33% of AI preprints having been updated at least twice in 2019. In addition, as reported in 2019, it took, on average, only around 0.2 years for AI preprints to receive their first citations, which is 5 times faster than in 2000-2007. This swift pace in AI illustrates the increase in popularity of AI innovation. The systematic and fine-grained analysis of the AI field enabled us to portray the pace of AI innovation and demonstrated that the proposed approach can be adopted to understand other fast-growing fields such as cancer research and nano science.

preprint, 2016, arXiv

Electronic Evidence for Type II Weyl Semimetal State in MoTe2

Topological quantum materials, including topological insulators and superconductors, Dirac semimetals and Weyl semimetals, have attracted much attention recently for their unique electronic structure, spin texture and physical properties. Very recently, a new type of Weyl semimetal has been proposed where the Weyl Fermions emerge at the boundary between electron and hole pockets in a new phase of matter, which is distinct from the standard type I Weyl semimetals with a point-like Fermi surface. The Weyl cone in these type II semimetals is strongly tilted and the related Fermi surface undergoes a Lifshitz transition, giving rise to a new kind of chiral anomaly and other new physics. MoTe2 is proposed to be a candidate type II Weyl semimetal; the sensitivity of its topological state to lattice constants and correlation also makes it an ideal platform to explore possible topological phase transitions. By performing laser-based angle-resolved photoemission (ARPES) measurements with unprecedentedly high resolution, we have uncovered electronic evidence of the type II semimetal state in MoTe2. We have established a full picture of the bulk electronic states and surface state for MoTe2 that is consistent with the band structure calculations. A single branch of surface state is identified that connects bulk hole pockets and bulk electron pockets. Detailed temperature-dependent ARPES measurements show high-intensity spot-like features that are ~40 meV above the Fermi level and close to the momentum space consistent with the theoretical expectation of the type II Weyl points. Our results constitute electronic evidence on the nature of the Weyl semimetal state that favors the presence of two sets of type II Weyl points in MoTe2.

preprint2016arXiv

Electronic structure of the ingredient planes of cuprate superconductor Bi2Sr2CuO6+δ: a comparison study with Bi2Sr2CaCu2O8+δ

By means of low-temperature scanning tunneling microscopy, we report on the electronic structures of BiO and SrO planes of Bi2Sr2CuO6+δ (Bi-2201) superconductor prepared by argon-ion bombardment and annealing. Depending on post annealing conditions, the BiO planes exhibit either pseudogap (PG) with sharp coherence peaks and an anomalously large gap of 49 meV or van Hove singularity (VHS) near the Fermi level, while the SrO is always characteristic of a PG-like feature. This contrasts with Bi2Sr2CaCu2O8+δ (Bi-2212) superconductor where VHS occurs solely on the SrO plane. We disclose the interstitial oxygen dopants (δ in the formulas) as a primary cause for the occurrence of VHS, which are located dominantly around the BiO and SrO planes, respectively, in Bi-2201 and Bi-2212. This is supported by the contrasting structural buckling amplitude of BiO and SrO planes in the two superconductors. Our findings provide solid evidence for the irrelevance of PG to the superconductivity in the two superconductors, as well as insights into why Bi-2212 can achieve a higher superconducting transition temperature than Bi-2201, and by implication, the mechanism of cuprate superconductivity.

preprint2015arXiv

Identification of Topological Surface State in PdTe2 Superconductor by Angle-Resolved Photoemission Spectroscopy

High-resolution angle-resolved photoemission measurements have been carried out on the transition metal dichalcogenide PdTe2, a superconductor with a Tc of 1.7 K. Combined with theoretical calculations, we have discovered for the first time the existence of a topologically nontrivial surface state with a Dirac cone in the PdTe2 superconductor. It is located at the Brillouin zone center and possesses helical spin texture. Distinct from the usual three-dimensional topological insulators where the Dirac cone of the surface state lies at the Fermi level, the Dirac point of the surface state in PdTe2 lies deep below the Fermi level at ~1.75 eV binding energy and is well separated from the bulk states. The identification of a topological surface state in the PdTe2 superconductor deep below the Fermi level provides a unique system to explore for new phenomena and properties and opens a door for finding new topological materials in transition metal chalcogenides.

preprint2015arXiv

Imputation of truncated p-values for meta-analysis methods and its genomic application

Microarray analysis to monitor expression activities in thousands of genes simultaneously has become routine in biomedical research during the past decade. A tremendous number of expression profiles have been generated and stored in the public domain, and information integration by meta-analysis to detect differentially expressed (DE) genes has become popular to obtain increased statistical power and validated findings. Methods that aggregate transformed $p$-value evidence have been widely used in genomic settings, among which Fisher's and Stouffer's methods are the most popular ones. In practice, raw data and $p$-values of DE evidence are often not available in genomic studies that are to be combined. Instead, only the detected DE gene lists under a certain $p$-value threshold (e.g., DE genes with $p$-value $< 0.001$) are reported in journal publications. The truncated $p$-value information makes the aforementioned meta-analysis methods inapplicable and researchers are forced to apply a less efficient vote counting method or naïvely drop the studies with incomplete information. The purpose of this paper is to develop effective meta-analysis methods for such situations with partially censored $p$-values. We developed and compared three imputation methods (mean imputation, single random imputation and multiple imputation) for a general class of evidence aggregation methods of which Fisher's and Stouffer's methods are special examples. The null distribution of each method was analytically derived and subsequent inference and genomic analysis frameworks were established. Simulations were performed to investigate the type I error, power and the control of false discovery rate (FDR) for (correlated) gene expression data. The proposed methods were applied to several genomic applications in colorectal cancer, pain and liquid association analysis of major depressive disorder (MDD). The results showed that imputation methods outperformed existing naïve approaches. Mean imputation and multiple imputation methods performed the best and are recommended for future applications.
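
For readers unfamiliar with the two aggregation methods named in this abstract, the following is a minimal sketch of the textbook Fisher and Stouffer combinations for fully observed, independent p-values. It does not implement the paper's imputation methods for truncated p-values; the function names are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def fisher_combined(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) is chi-square with 2k df under the null.

    Assumes independent p-values strictly between 0 and 1.
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # For even degrees of freedom 2k, the chi-square survival function has the
    # closed form exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


def stouffer_combined(pvalues):
    """Stouffer's method: Z = sum(z_i) / sqrt(k), with z_i = Phi^{-1}(1 - p_i).

    Assumes independent one-sided p-values strictly between 0 and 1.
    """
    standard_normal = NormalDist()
    z = sum(standard_normal.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - standard_normal.cdf(z)


if __name__ == "__main__":
    example = [0.01, 0.20, 0.04]  # made-up p-values, for illustration only
    print(fisher_combined(example), stouffer_combined(example))
```

Both functions need every p-value to be known exactly, which is why a study reporting only "p < 0.001" cannot be fed into them directly; that gap is what the abstract's imputation methods are meant to address.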

preprint2014arXiv

Subgroup Mixable Inference in Personalized Medicine, with an Application to Time-to-Event Outcomes

Measuring treatment efficacy in a mixture of subgroups from a randomized clinical trial is a fundamental problem in personalized medicine development, in deciding whether to treat the entire patient population or to target a subgroup. We show that some commonly used efficacy measures are not suitable for a mixture population. We also show that, while it is important to adjust for imbalance in the data using least squares means (LSmeans) (not marginal means) estimation, the current practice of applying LSmeans to directly estimate the efficacy in a mixture population for any type of outcome is inappropriate. Proposing a new principle called subgroup mixable estimation, we establish the logical relationship among parameters that represent efficacy and develop a general inference procedure to confidently infer efficacy in subgroups and their mixtures. Using oncology studies with time-to-event outcomes as an example, we show that Hazard Ratio is not suitable for measuring efficacy in a mixture population, and provide alternative efficacy measures with a valid inference procedure.

preprint2014arXiv

The role of handbooks in knowledge creation and diffusion: A case of science and technology studies

Genre is considered to be an important element in scholarly communication and in the practice of scientific disciplines. However, scientometric studies have typically focused on a single genre, the journal article. The goal of this study is to understand the role that handbooks play in knowledge creation and diffusion and their relationship with the genre of journal articles, particularly in highly interdisciplinary and emergent social science and humanities disciplines. To shed light on these questions we focused on handbooks and journal articles published over the last four decades belonging to the research area of Science and Technology Studies (STS), broadly defined. To get a detailed picture we used the full-text of five handbooks (500,000 words) and a well-defined set of 11,700 STS articles. We confirmed the methodological split of STS into qualitative and quantitative (scientometric) approaches. Even when the two traditions explore similar topics (e.g., science and gender) they approach them from different starting points. The change in cognitive foci in both handbooks and articles partially reflects the changing trends in STS research, often driven by technology. Using text similarity measures we found that, in the case of STS, handbooks play no special role in either focusing the research efforts or marking their decline. In general, they do not represent the summaries of research directions that have emerged since the previous edition of the handbook.

preprint2013arXiv

Entitymetrics: Measuring the Impact of Entities

This paper proposes entitymetrics to measure the impact of knowledge units. Entitymetrics highlight the importance of entities embedded in scientific literature for further knowledge discovery. In this paper, we use Metformin, a drug for diabetes, as an example to form an entity-entity citation network based on literature related to Metformin. We then calculate the network features and compare the centrality ranks of biological entities with results from Comparative Toxicogenomics Database (CTD). The comparison demonstrates the usefulness of entitymetrics to detect most of the outstanding interactions manually curated in CTD.

preprint2013arXiv

Estimating mean survival time: when is it possible?

For right censored survival data, it is well known that the mean survival time can be consistently estimated when the support of the censoring time contains the support of the survival time. In practice, however, this condition can be easily violated because the follow-up of a study is usually within a finite window. In this article we show that the mean survival time is still estimable from a linear model when the support of some covariate(s) with nonzero coefficient(s) is unbounded regardless of the length of follow-up. This implies that the mean survival time can be well estimated when the covariate range is wide in practice. The theoretical finding is further verified for finite samples by simulation studies. Simulations also show that, when both models are correctly specified, the linear model yields reasonable mean square prediction errors and outperforms the Cox model, particularly with heavy censoring and short follow-up time.

preprint2013arXiv

Meta Path-Based Collective Classification in Heterogeneous Information Networks

Collective classification has been intensively studied due to its impact in many important applications, such as web mining, bioinformatics and citation analysis. Collective classification approaches exploit the dependencies of a group of linked objects whose class labels are correlated and need to be predicted simultaneously. In this paper, we focus on studying the collective classification problem in heterogeneous networks, which involve multiple types of data objects interconnected by multiple types of links. Intuitively, two objects are correlated if they are linked by many paths in the network. However, most existing approaches measure the dependencies among objects through direct links or indirect links without considering the different semantic meanings behind different paths. In this paper, we study the collective classification problem that is defined among the same type of objects in heterogeneous networks. Moreover, by considering different linkage paths in the network, one can capture the subtlety of different types of dependencies among objects. We introduce the concept of meta-path based dependencies among objects, where a meta path is a path consisting of a certain sequence of link types. We show that the quality of collective classification results strongly depends upon the meta paths used. To accommodate the large network size, a novel solution, called HCC (meta-path based Heterogenous Collective Classification), is developed to effectively assign labels to a group of instances that are interconnected through different meta-paths. The proposed HCC model can capture different types of dependencies among objects with respect to different meta paths. Empirical studies on real-world networks demonstrate the effectiveness of the proposed meta path-based collective classification approach.

preprint2012arXiv

A bird's-eye view of scientific trading: Dependency relations among fields of science

We use a trading metaphor to study knowledge transfer in the sciences as well as the social sciences. The metaphor comprises four dimensions: (a) Discipline Self-dependence, (b) Knowledge Exports/Imports, (c) Scientific Trading Dynamics, and (d) Scientific Trading Impact. This framework is applied to a dataset of 221 Web of Science subject categories. We find that: (i) the Scientific Trading Impact and Dynamics of Materials Science and Transportation Science have increased; (ii) Biomedical Disciplines, Physics, and Mathematics are significant knowledge exporters, as is Statistics & Probability; (iii) in the social sciences, Economics, Business, Psychology, Management, and Sociology are important knowledge exporters; (iv) Discipline Self-dependence is associated with specialized domains which have ties to professional practice (e.g., Law, Ophthalmology, Dentistry, Oral Surgery & Medicine, Psychology, Psychoanalysis, Veterinary Sciences, and Nursing).

preprint2012arXiv

A sieve M-theorem for bundled parameters in semiparametric models, with application to the efficient estimation in a linear model for censored data

In many semiparametric models that are parameterized by two types of parameters (a Euclidean parameter of interest and an infinite-dimensional nuisance parameter), the two parameters are bundled together, that is, the nuisance parameter is an unknown function that contains the parameter of interest as part of its argument. For example, in a linear regression model for censored survival data, the unspecified error distribution function involves the regression coefficients. Motivated by developing an efficient estimating method for the regression parameters, we propose a general sieve M-theorem for bundled parameters and apply the theorem to deriving the asymptotic theory for the sieve maximum likelihood estimation in the linear regression model for censored survival data. The numerical implementation of the proposed estimating method can be achieved through the conventional gradient-based search algorithms such as the Newton-Raphson algorithm. We show that the proposed estimator is consistent and asymptotically normal and achieves the semiparametric efficiency bound. Simulation studies demonstrate that the proposed method performs well in practical settings and yields more efficient estimates than existing estimating equation based methods. Illustration with a real data example is also provided.

preprint2012arXiv

Citation content analysis (CCA): A framework for syntactic and semantic analysis of citation content

This paper proposes a new framework for Citation Content Analysis (CCA), for syntactic and semantic analysis of citation content that can be used to better analyze the rich sociocultural context of research behavior. The framework could be considered the next generation of citation analysis. This paper briefly reviews the history and features of content analysis in traditional social sciences, and its previous application in Library and Information Science. Based on critical discussion of the theoretical necessity of a new method as well as the limits of citation analysis, the nature and purposes of CCA are discussed, and potential procedures to conduct CCA, including principles to identify the reference scope, a two-dimensional (citing and cited) and two-modular (syntactic and semantic modules) codebook, are provided and described. Future works and implications are also suggested.

preprint2012arXiv

Multiple spreaders affect the indirect influence on Twitter

Most studies on social influence have focused on direct influence, while another interesting question is whether indirect influence exists between two users who are not directly connected in the network, and what affects such influence. In addition, the theory of complex contagion tells us that more spreaders will enhance the indirect influence between two users. Our observation of the intensity of indirect influence, propagated by $n$ parallel spreaders and quantified by retweeting probability on Twitter, shows that complex contagion is validated globally but is violated locally. In other words, the retweeting probability increases non-monotonically with some local drops.

preprint2012arXiv

Topic-Level Opinion Influence Model (TOIM): An Investigation Using Tencent Micro-Blogging

Mining user opinion from Micro-Blogging has been extensively studied on the most popular social networking sites such as Twitter and Facebook in the U.S., but few studies have been done on Micro-Blogging websites in other countries (e.g. China). In this paper, we analyze the social opinion influence on Tencent, one of the largest Micro-Blogging websites in China, endeavoring to unveil the behavior patterns of Chinese Micro-Blogging users. This paper proposes a Topic-Level Opinion Influence Model (TOIM) that simultaneously incorporates topic factor and social direct influence in a unified probabilistic framework. Based on TOIM, two topic level opinion influence propagation and aggregation algorithms are developed to consider the indirect influence: CP (Conservative Propagation) and NCP (None Conservative Propagation). Users' historical social interaction records are leveraged by TOIM to construct their progressive opinions and neighbors' opinion influence through a statistical learning process, which can be further utilized to predict users' future opinions on some specific topics. To evaluate and test this proposed model, an experiment was designed and a sub-dataset from Tencent Micro-Blogging was used. The experimental results show that TOIM outperforms baseline methods on predicting users' opinion. The applications of CP and NCP have no significant differences and could significantly improve recall and F1-measure of TOIM.

preprint2012arXiv

What is the Nature of Chinese MicroBlogging: Unveiling the Unique Features of Tencent Weibo

China has the largest number of online users in the world, and about 20% of internet users are from China. This is a huge, as well as a mysterious, market for the IT industry for various reasons such as cultural differences. Twitter is the largest microblogging service in the world and Tencent Weibo is one of the largest microblogging services in China. Employing the two data sets as sources in our study, we try to unveil the unique behaviors of Chinese users. We collected the entire Tencent Weibo from October 10, 2011 to January 5, 2012 and obtained 320 million user profiles and 5.15 billion user actions. We study Tencent Weibo at both the macro and micro levels. At the macro level, Tencent users are more active in forwarding messages but have fewer reciprocal relationships than Twitter users; their topic preferences are very different from those of Twitter users in both content and time consumption; in addition, information can be diffused more efficiently in Tencent Weibo. At the micro level, we mainly evaluate users' social influence using two indexes, "Forward" and "Follower"; we study how users' actions contribute to their social influence, and further identify unique features of Tencent users. According to our studies, Tencent users' actions are more personalized and diverse, and influential users play a more important part in the whole network. Based on the above analysis, we design a graphical model for predicting users' forwarding behaviors. Our experimental results on the large Tencent Weibo data validate the correctness of the discoveries and the effectiveness of the proposed model. To the best of our knowledge, this work is the first quantitative study of the entire Tencentsphere and information diffusion on it.

preprint2011arXiv

Applying weighted PageRank to author citation networks

This paper aims to identify whether different weighted PageRank algorithms can be applied to author citation networks to measure the popularity and prestige of a scholar from a citation perspective. Information Retrieval (IR) was selected as a test field and data from 1956-2008 were collected from Web of Science (WOS). Weighted PageRank with citation and publication as weighted vectors were calculated on author citation networks. The results indicate that both popularity rank and prestige rank were highly correlated with the weighted PageRank. Principal Component Analysis (PCA) was conducted to detect relationships among these different measures. For capturing prize winners within the IR field, prestige rank outperformed all the other measures.

preprint2011arXiv

Does Quantum Interference exist in Twitter?

It has become more difficult to explain social information transfer phenomena using classic models based merely on Shannon Information Theory (SIT) and Classic Probability Theory (CPT), because the transfer process in the social world is semantically rich and highly contextualized. This paper aims to use Twitter data to explore whether the traditional models can interpret information transfer in social networks, and whether quantum-like phenomena can be spotted in social networks. Our main contributions are: (1) SIT and CPT fail to interpret the information transfer occurring in Twitter; (2) quantum interference exists in Twitter; and (3) a mathematical model is proposed to elucidate the spotted quantum phenomena.

preprint2011arXiv

Finding Complex Biological Relationships in Recent PubMed Articles Using Bio-LDA

The overwhelming amount of available scholarly literature in the life sciences poses significant challenges to scientists wishing to keep up with important developments related to their research, but also provides a useful resource for the discovery of recent information concerning genes, diseases, compounds and the interactions between them. In this paper, we describe an algorithm called Bio-LDA that uses extracted biological terminology to automatically identify latent topics, and provides a variety of measures to uncover putative relations among topics and bio-terms. Relationships identified using those approaches are combined with existing data in life science datasets to provide additional insight. Three case studies demonstrate the utility of the Bio-LDA model, including association prediction, association search and connectivity map generation. This combined approach offers new opportunities for knowledge discovery in many areas of biology including target identification, lead hopping and drug repurposing.

preprint2011arXiv

Semantic Inference using Chemogenomics Data for Drug Discovery

Background: Semantic Web Technology (SWT) makes it possible to integrate and search the large volume of life science datasets in the public domain, as demonstrated by well-known linked data projects such as LODD, Bio2RDF, and Chem2Bio2RDF. Integration of these sets creates large networks of information. We have previously described a tool called WENDI for aggregating information pertaining to new chemical compounds, effectively creating evidence paths relating the compounds to genes, diseases and so on. In this paper we examine the utility of automatically inferring new compound-disease associations (and thus new links in the network) based on semantically marked-up versions of these evidence paths, rule-sets and inference engines. Results: Through the implementation of a semantic inference algorithm, rule set, Semantic Web methods (RDF, OWL and SPARQL) and new interfaces, we have created a new tool called Chemogenomic Explorer that uses networks of ontologically annotated RDF statements along with deductive reasoning tools to infer new associations between the query structure and genes and diseases from WENDI results. The tool then permits interactive clustering and filtering of these evidence paths. Conclusions: We present a new aggregate approach to inferring links between chemical compounds and diseases using semantic inference. This approach allows multiple evidence paths between compounds and diseases to be identified using a rule-set and semantically annotated data, and for these evidence paths to be clustered to show overall evidence linking the compound to a disease. We believe this is a powerful approach, because it allows compound-disease relationships to be ranked by the amount of evidence supporting them.

preprint2010arXiv

Applying centrality measures to impact analysis: A coauthorship network analysis

Many studies on coauthorship networks focus on network topology and network statistical mechanics. This article takes a different approach by studying micro-level network properties, with the aim of applying centrality measures to impact analysis. Using coauthorship data from 16 journals in the field of library and information science (LIS) with a time span of twenty years (1988-2007), we construct an evolving coauthorship network and calculate four centrality measures (closeness, betweenness, degree and PageRank) for authors in this network. We find that the four centrality measures are significantly correlated with citation counts. We also discuss the usability of centrality measures in author ranking, and suggest that centrality measures can be useful indicators for impact analysis.

preprint2010arXiv

Chem2Bio2RDF: A Linked Open Data Portal for Chemical Biology

The Chem2Bio2RDF portal is a Linked Open Data (LOD) portal for systems chemical biology that aims to facilitate drug discovery. It converts around 25 different datasets on genes, compounds, drugs, pathways, side effects, diseases, and MEDLINE/PubMed documents into RDF triples and links them to other LOD bubbles, such as Bio2RDF, LODD and DBPedia. The portal is based on D2R server and provides a SPARQL endpoint, but adds a few unique features such as an RDF faceted browser, a user-friendly SPARQL query generator, a MEDLINE/PubMed cross validation service, and a Cytoscape visualization plugin. Three use cases demonstrate the functionality and usability of this portal.

preprint2010arXiv

Discovering author impact: A PageRank perspective

This article provides an alternative perspective for measuring author impact by applying the PageRank algorithm to a coauthorship network. A weighted PageRank algorithm considering citation and coauthorship network topology is proposed. We test this algorithm under different damping factors by evaluating author impact in the informetrics research community. We also compare this weighted PageRank with the h-index, citation, and program committee (PC) membership of the International Society for Scientometrics and Informetrics (ISSI) conferences. Findings show that this weighted PageRank algorithm provides reliable results in measuring author impact.

preprint2010arXiv

Efficient equilibrium sampling of all-atom peptides using library-based Monte Carlo

We applied our previously developed library-based Monte Carlo (LBMC) to equilibrium sampling of several implicitly solvated all-atom peptides. LBMC can perform equilibrium sampling of molecules using the pre-calculated statistical libraries of molecular-fragment configurations and energies. For this study, we employed residue-based fragments distributed according to the Boltzmann factor of the OPLS-AA forcefield describing the individual fragments. Two solvent models were employed: a simple uniform dielectric and the Generalized Born/Surface Area (GBSA) model. The efficiency of LBMC was compared to standard Langevin dynamics (LD) using three different statistical tools. The statistical analyses indicate that LBMC is more than 100 times faster than LD not only for the simple solvent model but also for GBSA.

preprint2010arXiv

General Scaled Support Vector Machines

Support Vector Machines (SVMs) are popular tools for data mining tasks such as classification, regression, and density estimation. However, the original SVM (C-SVM) only considers local information of data points on or over the margin. Therefore, C-SVM loses robustness. To solve this problem, one approach is to translate (i.e., to move without rotation or change of shape) the hyperplane according to the distribution of the entire data. But existing work can only be applied to the 1-D case. In this paper, we propose a simple and efficient method called General Scaled SVM (GS-SVM) to extend the existing approach to the multi-dimensional case. Our method translates the hyperplane according to the distribution of data projected on the normal vector of the hyperplane. Compared with C-SVM, GS-SVM has better performance on several data sets.

preprint2010arXiv

PageRank for ranking authors in co-citation networks

Google's PageRank has brought a new synergy to information retrieval for a better ranking of Web pages. It ranks documents depending on the topology of the graphs and the weights of the nodes. PageRank has significantly advanced the field of information retrieval and keeps Google ahead of competitors in the search engine market. It has been deployed in bibliometrics to evaluate research impact, yet few of these studies focus on the important impact of the damping factor (d) for ranking purposes. This paper studies how varied damping factors in the PageRank algorithm can provide additional insight into the ranking of authors in an author co-citation network. Furthermore, we propose weighted PageRank algorithms. We select the 108 most highly cited authors in the information retrieval (IR) area from the 1970s to 2008 to form the author co-citation network. We calculate the ranks of these 108 authors based on PageRank with the damping factor ranging from 0.05 to 0.95. In order to test the relationship between these different measures, we compare PageRank and weighted PageRank results with the citation ranking, h-index, and centrality measures. We found that in our author co-citation network, citation rank is highly correlated with PageRank under different damping factors and also with the different PageRank algorithms; citation rank and PageRank are not significantly correlated with centrality measures; and h-index is not significantly correlated with centrality measures.
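
To make the role of the damping factor d concrete, here is a minimal sketch of standard (unweighted) PageRank by power iteration on a toy directed graph. It is not the weighted variant proposed in the paper, and the graph, node names and function name are hypothetical.

```python
def pagerank(links, d=0.85, tol=1e-12, max_iter=1000):
    """Standard PageRank by power iteration.

    links maps each node to the list of nodes it points to.
    d is the damping factor; rank held by nodes without outgoing
    links is redistributed uniformly over all nodes.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        # Every node receives the teleportation share plus the dangling share.
        new_rank = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for v, targets in links.items():
            if targets:
                share = d * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    toy = {"A": ["C"], "B": ["C"], "C": ["A"]}  # hypothetical toy graph
    for d in (0.05, 0.5, 0.85, 0.95):  # values within the range studied in the abstract
        ranks = pagerank(toy, d=d)
        print(d, {v: round(r, 4) for v, r in ranks.items()})
```

With a small d the scores stay close to uniform, because most of the rank comes from the teleportation term; with a large d the link structure dominates. This is the sensitivity that the abstract examines across damping factors from 0.05 to 0.95.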

preprint2010arXiv

Popular and/or Prestigious? Measures of Scholarly Esteem

Citation analysis does not generally take the quality of citations into account: all citations are weighted equally irrespective of source. However, a scholar may be highly cited but not highly regarded: popularity and prestige are not identical measures of esteem. In this study we define popularity as the number of times an author is cited and prestige as the number of times an author is cited by highly cited papers. Information Retrieval (IR) is the test field. We compare the 40 leading researchers in terms of their popularity and prestige over time. Some authors are ranked high on prestige but not on popularity, while others are ranked high on popularity but not on prestige. We also relate measures of popularity and prestige to date of Ph.D. award, number of key publications, organizational affiliation, receipt of prizes/honors, and gender.
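
The two definitions in this abstract can be sketched as simple counts. The snippet below is only an illustration under stated assumptions: the abstract does not say how "highly cited" is determined, so the threshold is a hypothetical parameter, and the data layout and function name are invented for the example.

```python
from collections import Counter


def popularity_and_prestige(citations, paper_citation_counts, highly_cited_threshold):
    """Count popularity and prestige per cited author.

    citations: iterable of (citing_paper_id, cited_author) pairs.
    paper_citation_counts: maps a paper id to the citations it has received.
    highly_cited_threshold: hypothetical cutoff for calling a citing paper
        "highly cited"; the abstract does not specify one.
    """
    popularity = Counter()  # times an author is cited
    prestige = Counter()    # times an author is cited by highly cited papers
    for citing_paper, cited_author in citations:
        popularity[cited_author] += 1
        if paper_citation_counts.get(citing_paper, 0) >= highly_cited_threshold:
            prestige[cited_author] += 1
    return popularity, prestige


if __name__ == "__main__":
    # Made-up data: author X is cited often, but only by rarely cited papers.
    citations = [("p1", "X"), ("p2", "X"), ("p3", "X"), ("p4", "Y"), ("p5", "Y")]
    counts = {"p1": 2, "p2": 0, "p3": 1, "p4": 150, "p5": 90}
    popularity, prestige = popularity_and_prestige(citations, counts, 50)
    print(popularity.most_common(), prestige.most_common())
```

In this made-up data, X ranks first on popularity while Y ranks first on prestige, which is the kind of divergence between the two measures that the abstract reports.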

preprint2010arXiv

Semantic Web: Who is who in the field - A bibliometric analysis

The Semantic Web is one of the main efforts aiming to enhance human and machine interaction by representing data in an understandable way for machines to mediate data and services. It is a fast-moving and multidisciplinary field. This study conducts a thorough bibliometric analysis of the field by collecting data from Web of Science (WOS) and Scopus for the period of 1960-2009. It utilizes a total of 44,157 papers with 651,673 citations from Scopus, and 22,951 papers with 571,911 citations from WOS. Based on these papers and citations, it evaluates the research performance of the Semantic Web (SW) by identifying the most productive players, major scholarly communication media, highly cited authors, influential papers and emerging stars.

preprint2010arXiv

Upper Tag Ontology (UTO) For Integrating Social Tagging Data

Data integration and mediation have become central concerns of information technology over the past few decades. With the advent of the Web and the rapid increases in the amount of data and the number of Web documents and users, researchers have focused on enhancing the interoperability of data through the development of metadata schemes. Other researchers have looked to the wealth of metadata generated by bookmarking sites on the Social Web. While several existing ontologies capitalize on the semantics of metadata created by tagging activities, the Upper Tag Ontology (UTO) emphasizes the structure of tagging activities to facilitate modeling of tagging data and the integration of data from different bookmarking sites as well as the alignment of tagging ontologies. UTO is described and its utility in harvesting, modeling, integrating, searching and analyzing data is demonstrated with metadata harvested from three major social tagging systems (Delicious, Flickr and YouTube).

preprint, 2010, arXiv

Weighted citation: An indicator of an article's prestige

We propose using the technique of weighted citation to measure an article's prestige. The technique allocates a different weight to each reference by taking into account the impact of citing journals and citation time intervals. Weighted citation captures prestige, whereas citation counts capture popularity. We compare the value variances for popularity and prestige for articles published in the Journal of the American Society for Information Science and Technology from 1998 to 2007, and find that the majority have comparable status.
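A rough sketch of the idea follows. The abstract says only that each citation's weight depends on the impact of the citing journal and on the citation time interval; it does not give the formula. The multiplicative form, the `Citation` fields, and the `illustrative_time_factor` function below are therefore assumptions, and the time factor is passed in as a parameter because the abstract does not say in which direction the interval should affect the weight.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Citation:
    journal_impact: float  # impact score of the citing journal (assumed scale)
    years_after: int       # years between the article's publication and this citation


def illustrative_time_factor(years_after: int) -> float:
    """Assumed placeholder: the weight shrinks as the interval grows.
    The paper's actual time weighting may differ."""
    return 1.0 / (1.0 + years_after)


def weighted_citation(
    citations: Iterable[Citation],
    time_factor: Callable[[int], float] = illustrative_time_factor,
) -> float:
    """Sum of per-citation weights (assumed form: journal impact times time factor)."""
    return sum(c.journal_impact * time_factor(c.years_after) for c in citations)


citations = [Citation(4.0, 1), Citation(1.0, 0), Citation(2.0, 3)]

popularity = len(citations)             # plain citation count
prestige = weighted_citation(citations)  # weighted citation score
print(popularity, prestige)
```

The plain count treats the three citations identically, whereas the weighted score lets a citation from a higher-impact journal contribute more, which is the distinction between popularity and prestige drawn in the abstract.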

50 published items

Rethinking the Value of Multi-Agent Workflow: A Strong Single Agent Baseline

Knowledge-Augmented Contrastive Learning for Abnormality Classification and Localization in Chest X-rays with Radiomics using a Feedback Loop

Pneumonia Detection on Chest X-ray using Radiomic Features and Contrastive Learning

Prior Knowledge Enhances Radiology Report Generation

Radiology Text Analysis System (RadText): Architecture and Evaluation

Switching modulation of spin transport in ferromagnetic tetragonal silicene

Team formation and team performance: The balance between team freshness and repeat collaboration

Team Power Dynamics and Team Impact: New Perspectives on Scientific Collaboration using Career Age as a Proxy for Team Power

The Gene of Scientific Success

Training Your Sparse Neural Network Better with Any Mask

Contrastive Learning Improves Critical Event Prediction in COVID-19 Patients

Evolution of Charge and Pair Density Modulations in Overdoped Bi2Sr2CuO6+δ

Fluid structure interaction: Insights into biomechanical implications of endograft after thoracic endovascular aortic repair

Giant anisotropic photocurrent modulated by strain in type-II Weyl semimetal Td-MoTe2

Innovation adoption: Broadcasting vs. Virality

A Simultaneous Inference Procedure to Identify Subgroups from RCTs with Survival Outcomes: Application to Analysis of AMD Progression Studies

Analysis of misinformation during the COVID-19 outbreak in China: cultural, social and political entanglements

Attribute2vec: Deep Network Embedding Through Multi-Filtering GCN

Building a PubMed knowledge graph

Coronavirus Knowledge Graph: A Case Study

The Pace of Artificial Intelligence Innovations: Speed, Talent, and Trial-and-Error

Electronic Evidence for Type II Weyl Semimetal State in MoTe2

Electronic structure of the ingredient planes of cuprate superconductor Bi2Sr2CuO6+δ: a comparison study with Bi2Sr2CaCu2O8+δ

Identification of Topological Surface State in PdTe2 Superconductor by Angle-Resolved Photoemission Spectroscopy

Imputation of truncated p-values for meta-analysis methods and its genomic application

Subgroup Mixable Inference in Personalized Medicine, with an Application to Time-to-Event Outcomes

The role of handbooks in knowledge creation and diffusion: A case of science and technology studies

Entitymetrics: Measuring the Impact of Entities

Estimating mean survival time: when is it possible?

Meta Path-Based Collective Classification in Heterogeneous Information Networks

A bird's-eye view of scientific trading: Dependency relations among fields of science

A sieve M-theorem for bundled parameters in semiparametric models, with application to the efficient estimation in a linear model for censored data

Citation content analysis (CCA): A framework for syntactic and semantic analysis of citation content

Multiple spreaders affect the indirect influence on Twitter

Topic-Level Opinion Influence Model (TOIM): An Investigation Using Tencent Micro-Blogging

What is the Nature of Chinese MicroBlogging: Unveiling the Unique Features of Tencent Weibo

Applying weighted PageRank to author citation networks

Does Quantum Interference exist in Twitter?

Finding Complex Biological Relationships in Recent PubMed Articles Using Bio-LDA

Semantic Inference using Chemogenomics Data for Drug Discovery

Applying centrality measures to impact analysis: A coauthorship network analysis

Chem2Bio2RDF: A Linked Open Data Portal for Chemical Biology

Discovering author impact: A PageRank perspective

Efficient equilibrium sampling of all-atom peptides using library-based Monte Carlo

General Scaled Support Vector Machines

PageRank for ranking authors in co-citation networks

Popular and/or Prestigious? Measures of Scholarly Esteem

Semantic Web: Who is who in the field - A bibliometric analysis

Upper Tag Ontology (UTO) For Integrating Social Tagging Data

Weighted citation: An indicator of an article's prestige