Researcher profile

Yikuan Li

Yikuan Li contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) | Verification L1 | Unclaimed author
8 works
0 followers
5 topics
4 close collaborators

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Published work

8 published items

Preprint | 2022 | arXiv

Clinical outcome prediction under hypothetical interventions -- a representation learning framework for counterfactual reasoning

Most machine learning (ML) models are developed for prediction only, offering no option for causal interpretation of their predictions or parameters/properties. This can hamper health systems' ability to employ ML models in clinical decision-making processes, where the need and desire for predicting outcomes under hypothetical interventions (i.e., counterfactual reasoning/explanation) is high. In this research, we introduce a new representation learning framework (i.e., partial concept bottleneck), which treats the provision of counterfactual explanations as an embedded property of the risk model. Despite the architectural changes necessary for jointly optimising prediction accuracy and counterfactual reasoning, the accuracy of our approach is comparable to that of prediction-only models. Our results suggest that the proposed framework has the potential to help researchers and clinicians improve personalised care (e.g., by investigating the hypothetical differential effects of interventions).
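
The abstract frames counterfactual reasoning as editing a concept and re-reading the predicted risk. The following is a toy illustration of that idea under stated assumptions, not the paper's architecture. The concept names (`on_statin`, `systolic_bp_high`), all weights and the single logistic risk head are hypothetical. "Partial" is modelled here as a mix of explicit concepts and free latent features. An intervention overwrites one explicit concept and holds the latent features fixed.

```python
import math

# Hypothetical weights for illustration only; not taken from the paper.
CONCEPT_WEIGHTS = {"on_statin": -0.8, "systolic_bp_high": 0.6}  # explicit, human-readable concepts
LATENT_WEIGHTS = [0.3, -0.2]  # weights on unconstrained latent features
BIAS = -1.0


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def risk(concepts: dict, latent: list) -> float:
    """Toy risk head that reads both explicit concepts and latent features."""
    z = BIAS
    z += sum(CONCEPT_WEIGHTS[name] * value for name, value in concepts.items())
    z += sum(w * h for w, h in zip(LATENT_WEIGHTS, latent))
    return sigmoid(z)


def intervene(concepts: dict, latent: list, name: str, value: float) -> float:
    """Counterfactual risk: overwrite one explicit concept, keep latent features fixed."""
    altered = dict(concepts)  # copy so the factual record is not mutated
    altered[name] = value
    return risk(altered, latent)


patient_concepts = {"on_statin": 0.0, "systolic_bp_high": 1.0}
patient_latent = [0.5, 1.0]
factual = risk(patient_concepts, patient_latent)
counterfactual = intervene(patient_concepts, patient_latent, "on_statin", 1.0)
print(f"factual risk {factual:.3f}, risk if on statin {counterfactual:.3f}")
```

The latent features are held fixed, so in this toy model the gap between the two risks comes only from the edited concept.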

Preprint | 2022 | arXiv

Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences

Transformer-based models, such as BERT, have dramatically improved performance on various natural language processing tasks. The clinical-knowledge-enriched model ClinicalBERT also achieved state-of-the-art results on clinical named entity recognition and natural language inference tasks. One of the core limitations of these transformers is their substantial memory consumption, caused by the full self-attention mechanism. To overcome this, long-sequence transformer models such as Longformer and BigBird were proposed. They use sparse attention mechanisms to reduce memory usage from quadratic to linear in the sequence length. These models extended the maximum input sequence length from 512 to 4096 tokens, which enhanced their ability to model long-term dependencies and consequently yielded strong results on a variety of tasks. Inspired by the success of these long-sequence transformer models, we introduce two domain-enriched language models, Clinical-Longformer and Clinical-BigBird, which are pre-trained on large-scale clinical corpora. We evaluate both pre-trained models on 10 baseline tasks, including named entity recognition, question answering, and document classification. The results demonstrate that Clinical-Longformer and Clinical-BigBird consistently and significantly outperform ClinicalBERT as well as other short-sequence transformers on all downstream tasks. Our source code is available at https://github.com/luoyuanlab/Clinical-Longformer, and the pre-trained models are available for public download at https://huggingface.co/yikuan8/Clinical-Longformer.
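
Counting attention scores makes the memory argument concrete. Full self-attention computes one score per query-key pair, so n tokens give n * n scores. In a sliding-window scheme each token attends to at most w positions, which gives at most n * w scores, linear in n. The window size of 512 is an assumption for illustration. The count also ignores the global tokens (Longformer) and the random/global blocks (BigBird) that these models add on top of the local window.

```python
def full_attention_scores(n: int) -> int:
    """Query-key scores in full self-attention: quadratic in sequence length n."""
    return n * n


def sliding_window_scores(n: int, window: int) -> int:
    """Upper bound when each token attends to at most `window` positions: linear in n.
    Simplification: ignores the global/random attention added by Longformer and BigBird."""
    return n * min(window, n)


WINDOW = 512  # assumed window size, for illustration only
for n in (512, 4096):
    full = full_attention_scores(n)
    sparse = sliding_window_scores(n, WINDOW)
    print(f"n={n}: full={full:,} sparse<={sparse:,} ratio={full / sparse:.0f}x")
```

At 512 tokens the two counts coincide. At 4096 tokens, full attention needs 16,777,216 scores versus at most 2,097,152 for the window, an eightfold gap that keeps widening as n grows.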

Preprint | 2022 | arXiv

Deep Learning Reveals Patterns of Diverse and Changing Sentiments Towards COVID-19 Vaccines Based on 11 Million Tweets

Over 12 billion doses of COVID-19 vaccines had been administered at the time of writing. However, public perceptions of vaccines have been complex. We analyzed COVID-19 vaccine-related tweets to understand the evolving perceptions of COVID-19 vaccines. We fine-tuned a deep learning classifier based on XLNet, a state-of-the-art model, to detect each tweet's sentiment automatically. We employed validated methods to extract users' race or ethnicity, gender, age, and geographical location from their profiles. Incorporating multiple data sources, we assessed sentiment patterns among subpopulations and juxtaposed them against vaccine uptake data to unravel their interactive patterns. In total, 11,211,672 COVID-19 vaccine-related tweets from 2,203,681 users over two years were analyzed. The fine-tuned sentiment classification model achieved an accuracy of 0.92 on the test set. Users from various demographic groups demonstrated distinct patterns in sentiment towards COVID-19 vaccines. User sentiment became more positive over time, followed by an upswing in population-level vaccine uptake. Around the dates on which positive sentiment peaked, we detected encouraging news or events regarding vaccine development and distribution. Positive sentiment in pregnancy-related tweets showed a delayed pattern compared with trends in the general population, accompanied by delayed vaccine uptake. Distinctive patterns across subpopulations suggest the need for tailored strategies. Global news and events were profoundly involved in shaping users' thoughts on social media. Populations with additional concerns, such as pregnancy, demonstrated more substantial hesitancy owing to a lack of timely recommendations. Feature analysis revealed that the hesitancy of various subpopulations stemmed from clinical trial logic, risks and complications, and the urgency of scientific evidence.

Preprint | 2022 | arXiv

Multimodal Machine Learning in Precision Health

As machine learning and artificial intelligence are increasingly leveraged to tackle problems in the health sector, there has been growing interest in using them for clinical decision support. Historically, this work has focused on single-modal data such as electronic health records. In machine learning, attempts to improve prediction and to mirror the multimodal nature of expert clinical decision-making have taken the form of fusing disparate data. This review was conducted to summarize this field and identify topics ripe for future research. We conducted it in accordance with the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) extension for Scoping Reviews to characterize multimodal data fusion in health. We used a combination of content analysis and literature searches to establish search strings, and searched the PubMed, Google Scholar, and IEEEXplore databases for work published from 2011 to 2021. A final set of 125 articles was included in the analysis. The most common health areas using multimodal methods were neurology and oncology, although current applications span a wide breadth of areas. The most common form of information fusion was early fusion. Notably, heterogeneous data fusion improved predictive performance. The papers lacked clear clinical deployment strategies and did not pursue FDA-approved tools. These findings provide a map of the current literature on multimodal data fusion as applied to health diagnosis and prognosis problems. Multimodal machine learning, while more robust in its estimations than unimodal methods, has drawbacks in scalability and in the time-consuming nature of information concatenation.
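
The review reports early fusion as the most common strategy. The following minimal, hedged illustration shows how it differs from late fusion; the feature values and the two-modality setup are hypothetical. Early fusion concatenates per-modality feature vectors into one input before a single model sees them. Late fusion combines the outputs of separate per-modality models, here by simple averaging.

```python
def early_fusion(modalities: list) -> list:
    """Concatenate per-modality feature vectors into a single model input."""
    fused = []
    for features in modalities:
        fused.extend(features)
    return fused


def late_fusion(probabilities: list) -> float:
    """For contrast: average the outputs of separate per-modality models."""
    return sum(probabilities) / len(probabilities)


ehr_features = [0.2, 1.0, 0.0]  # hypothetical structured EHR features
image_features = [0.7, 0.1]     # hypothetical imaging embedding
print(early_fusion([ehr_features, image_features]))  # one 5-dimensional input
print(late_fusion([0.5, 0.75]))  # average of two per-modality risk scores
```

The concatenated input grows with every modality added. This is one simple way to picture the scalability cost the review attributes to information concatenation.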

Preprint | 2021 | arXiv

An explainable Transformer-based deep learning model for the prediction of incident heart failure

Predicting the incidence of complex chronic conditions such as heart failure is challenging. Deep learning models applied to rich electronic health records may improve prediction but remain unexplainable, hampering their wider use in medical practice. We developed a novel Transformer deep learning model for more accurate yet explainable prediction of incident heart failure, involving 100,071 patients from longitudinal linked electronic health records across the UK. On internal 5-fold cross-validation and held-out external validation, our model achieved an area under the receiver operating characteristic curve of 0.93 in both settings and an area under the precision-recall curve of 0.69 and 0.70, respectively, outperforming existing deep learning models. Predictor groups included all community and hospital diagnoses and medications, contextualised within the age and calendar year of each patient's clinical encounter. The importance of contextualised medical information was revealed in a number of sensitivity analyses, and our perturbation method provided a way of identifying factors contributing to risk. Many of the identified risk factors were consistent with existing knowledge from clinical and epidemiological research, but several new associations were revealed that had not been considered in expert-driven risk prediction models.
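
The abstract does not spell out its perturbation method. The following is only a generic leave-one-out sketch of the idea. Each recorded code is removed from a patient's history and the history is re-scored; the drop in predicted risk is treated as that code's contribution. The logistic "model", its code names and its weights are hypothetical stand-ins for a trained Transformer.

```python
import math

# Hypothetical stand-in for a trained model: logistic score over recorded codes.
WEIGHTS = {"hypertension": 0.9, "atrial_fibrillation": 1.2, "loop_diuretic": 0.7, "asthma": 0.1}
BIAS = -3.0


def predict(codes: frozenset) -> float:
    z = BIAS + sum(WEIGHTS.get(code, 0.0) for code in codes)
    return 1.0 / (1.0 + math.exp(-z))


def perturbation_importance(codes: frozenset) -> dict:
    """Leave-one-out: drop in predicted risk when each code is removed."""
    baseline = predict(codes)
    return {code: baseline - predict(codes - {code}) for code in codes}


patient = frozenset({"hypertension", "atrial_fibrillation", "asthma"})
scores = perturbation_importance(patient)
for code, delta in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{code}: {delta:+.3f}")
```

In this linear stand-in the ranking simply mirrors the weights. With a contextual model such as a Transformer, removing one code can also shift how the remaining codes are interpreted, which is what makes perturbation informative there.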

Preprint | 2021 | arXiv

Risk factor identification for incident heart failure using neural network distillation and variable selection

Recent evidence shows that deep learning models trained on electronic health records from millions of patients can deliver substantially more accurate risk predictions than their statistical counterparts. While this provides an important opportunity for improving clinical decision-making, the lack of interpretability is a major barrier to incorporating these black-box models into routine care, limiting their trustworthiness and preventing further hypothesis-testing investigations. In this study, we propose two methods, model distillation and variable selection, to untangle the hidden patterns learned by an established deep learning model (BEHRT) for risk association identification. Because of the clinical importance and diversity of heart failure as a phenotype, it was used to showcase the merits of the proposed methods. A cohort of 788,880 patients (8.3% with incident heart failure) was considered for the study. At the population level, model distillation identified 598 diseases associated with heart failure and 379 diseases dissociated from it. While the associations were broadly consistent with prior knowledge, our method also highlighted several less appreciated links that are worth further investigation. In addition to these population-level insights, we developed an approach to individual-level interpretation that takes account of the varying manifestations of heart failure in clinical practice. This was achieved through variable selection, by detecting a minimal set of encounters that maximally preserves the accuracy of prediction for individuals. Our proposed work provides a discovery-enabling tool to identify risk factors at both the population and individual levels from a data-driven perspective. This helps to generate new hypotheses and to guide further investigation of causal links.
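
The abstract describes individual-level variable selection as finding a minimal set of encounters that preserves the prediction. One simple way to approximate that objective is greedy forward selection, swapped in here for illustration and not necessarily the authors' procedure. It repeatedly adds the encounter that brings the subset's prediction closest to the full-history prediction, and stops once the gap is within a tolerance. The encounter effects, the additive model and the 0.05 tolerance are all hypothetical.

```python
import math

# Hypothetical per-encounter contributions from some trained risk model.
ENCOUNTER_EFFECT = {"enc1": 1.5, "enc2": 0.05, "enc3": 0.9, "enc4": -0.1}
BIAS = -2.0


def predict(encounters: list) -> float:
    z = BIAS + sum(ENCOUNTER_EFFECT[e] for e in encounters)
    return 1.0 / (1.0 + math.exp(-z))


def minimal_encounter_set(encounters: list, tolerance: float = 0.05) -> list:
    """Greedily add the encounter that brings the subset prediction closest to the
    full-history prediction; stop once the gap is within `tolerance` (hypothetical)."""
    target = predict(encounters)
    selected = []
    remaining = list(encounters)
    while remaining and abs(predict(selected) - target) > tolerance:
        best = min(remaining, key=lambda e: abs(predict(selected + [e]) - target))
        selected.append(best)
        remaining.remove(best)
    return selected


print(minimal_encounter_set(list(ENCOUNTER_EFFECT)))
```

Greedy selection is cheap but is not guaranteed to find the smallest possible set. An exhaustive search would find it, at a cost exponential in the number of encounters.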

Preprint | 2020 | arXiv

A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports

Joint image-text embeddings extracted from medical images and their associated reports are the bedrock of most biomedical vision-and-language (V+L) tasks, including medical visual question answering, clinical image-text retrieval, and clinical report auto-generation. In this study, we adopt four pre-trained V+L models (LXMERT, VisualBERT, UNITER and PixelBERT) to learn multimodal representations from MIMIC-CXR radiographs and associated reports. Extrinsic evaluation on the OpenI dataset shows that, compared with the pioneering CNN-RNN model, the joint embeddings learned by pre-trained V+L models improve performance on the thoracic findings classification task. We conduct an ablation study to analyze the contribution of certain model components and validate the advantage of joint embeddings over text-only embeddings. We also visualize attention maps to illustrate the attention mechanism of V+L models.

Preprint | 2020 | arXiv

Deep Bayesian Gaussian Processes for Uncertainty Estimation in Electronic Health Records

One major impediment to the wider use of deep learning for clinical decision-making is the difficulty of assigning a level of confidence to model predictions. Currently, deep Bayesian neural networks and sparse Gaussian processes are the two main scalable uncertainty estimation methods. However, deep Bayesian neural networks lack expressiveness. More expressive models such as deep kernel learning, an extension of sparse Gaussian processes, capture only the uncertainty from the higher-level latent space. As a result, the underlying deep learning model lacks interpretability and ignores uncertainty from the raw data. In this paper, we merge features of the deep Bayesian learning framework with deep kernel learning to leverage the strengths of both methods for more comprehensive uncertainty estimation. We ran a series of experiments on predicting the first incidence of heart failure, diabetes and depression from large-scale electronic medical records. They show that our method captures uncertainty better than both Gaussian processes and deep Bayesian neural networks, with comparable generalisation performance. It is better at indicating data insufficiency and at distinguishing true positive from false positive predictions. Furthermore, by assessing accuracy and area under the receiver operating characteristic curve across levels of predictive probability, we show that our method is less susceptible to making overconfident predictions, especially for the minority class in imbalanced datasets. Finally, we demonstrate how uncertainty information derived from the model can inform risk factor analysis towards model interpretability.
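
The paper's combined deep Bayesian and deep kernel learning model is not reproduced here. What follows is a generic, hedged illustration of how sampled predictions become a confidence signal. It assumes a model that can return several stochastic predictive probabilities for one patient, for example from repeated posterior samples. The spread of those samples can then flag predictions that should not be trusted. The 0.1 threshold and the sample values are hypothetical.

```python
import statistics


def summarise_samples(sampled_probs: list) -> tuple:
    """Predictive mean and sample standard deviation across stochastic predictions."""
    return statistics.fmean(sampled_probs), statistics.stdev(sampled_probs)


def flag_uncertain(sampled_probs: list, max_std: float = 0.1) -> bool:
    """Flag a prediction whose samples disagree more than the (hypothetical) threshold."""
    _, std = summarise_samples(sampled_probs)
    return std > max_std


confident = [0.91, 0.93, 0.92, 0.90]  # samples agree: low spread
unsure = [0.2, 0.8, 0.5, 0.9]         # samples disagree: high spread
for name, samples in (("confident", confident), ("unsure", unsure)):
    mean, std = summarise_samples(samples)
    print(f"{name}: mean={mean:.2f} std={std:.3f} flagged={flag_uncertain(samples)}")
```

A mean probability alone would score the second patient at 0.6 without signalling that the samples range from 0.2 to 0.9. Reporting the spread alongside the mean is the kind of information that can expose overconfident predictions.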