Researcher profile

Yixin Li

Yixin Li contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
11topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

SynQP: A Framework and Metrics for Evaluating the Quality and Privacy Risk of Synthetic Data

The use of synthetic data in health applications raises privacy concerns, yet the lack of open frameworks for privacy evaluations has slowed its adoption. A major challenge is the absence of accessible benchmark datasets for evaluating privacy risks, due to difficulties in acquiring sensitive data. To address this, we introduce SynQP, an open framework for benchmarking privacy in synthetic data generation (SDG) using simulated sensitive data, ensuring that original data remains confidential. We also highlight the need for privacy metrics that fairly account for the probabilistic nature of machine learning models. As a demonstration, we use SynQP to benchmark CTGAN and propose a new identity disclosure risk metric that offers a more accurate estimation of privacy risks compared to existing approaches. Our work provides a critical tool for improving the transparency and reliability of privacy evaluations, enabling safer use of synthetic data in health-related applications. % In our quality evaluations, non-private models achieved near-perfect machine-learning efficacy \(\ge0.97\). Our privacy assessments (Table II) reveal that DP consistently lowers both identity disclosure risk (SD-IDR) and membership-inference attack risk (SD-MIA), with all DP-augmented models staying below the 0.09 regulatory threshold. Code available at https://github.com/CAN-SYNH/SynQP

preprint2022arXiv

Application of Graph Based Features in Computer Aided Diagnosis for Histopathological Image Classification of Gastric Cancer

The gold standard for gastric cancer detection is gastric histopathological image analysis, but there are certain drawbacks in the existing histopathological detection and diagnosis. In this paper, based on the study of computer aided diagnosis system, graph based features are applied to gastric cancer histopathology microscopic image analysis, and a classifier is used to classify gastric cancer cells from benign cells. Firstly, image segmentation is performed, and after finding the region, cell nuclei are extracted using the k-means method, the minimum spanning tree (MST) is drawn, and graph based features of the MST are extracted. The graph based features are then put into the classifier for classification. In this study, different segmentation methods are compared in the tissue segmentation stage, among which are Level-Set, Otsu thresholding, watershed, SegNet, U-Net and Trans-U-Net segmentation; Graph based features, Red, Green, Blue features, Grey-Level Co-occurrence Matrix features, Histograms of Oriented Gradient features and Local Binary Patterns features are compared in the feature extraction stage; Radial Basis Function (RBF) Support Vector Machine (SVM), Linear SVM, Artificial Neural Network, Random Forests, k-NearestNeighbor, VGG16, and Inception-V3 are compared in the classifier stage. It is found that using U-Net to segment tissue areas, then extracting graph based features, and finally using RBF SVM classifier gives the optimal results with 94.29%.

preprint2022arXiv

Application of Transfer Learning and Ensemble Learning in Image-level Classification for Breast Histopathology

Background: Breast cancer has the highest prevalence in women globally. The classification and diagnosis of breast cancer and its histopathological images have always been a hot spot of clinical concern. In Computer-Aided Diagnosis (CAD), traditional classification models mostly use a single network to extract features, which has significant limitations. On the other hand, many networks are trained and optimized on patient-level datasets, ignoring the application of lower-level data labels. Method: This paper proposes a deep ensemble model based on image-level labels for the binary classification of benign and malignant lesions of breast histopathological images. First, the BreaKHis dataset is randomly divided into a training, validation and test set. Then, data augmentation techniques are used to balance the number of benign and malignant samples. Thirdly, considering the performance of transfer learning and the complementarity between each network, VGG16, Xception, ResNet50, DenseNet201 are selected as the base classifiers. Result: In the ensemble network model with accuracy as the weight, the image-level binary classification achieves an accuracy of $98.90\%$. In order to verify the capabilities of our method, the latest Transformer and Multilayer Perception (MLP) models have been experimentally compared on the same dataset. Our model wins with a $5\%-20\%$ advantage, emphasizing the ensemble model's far-reaching significance in classification tasks. Conclusion: This research focuses on improving the model's classification performance with an ensemble algorithm. Transfer learning plays an essential role in small datasets, improving training speed and accuracy. Our model has outperformed many existing approaches in accuracy, providing a method for the field of auxiliary medical diagnosis.

preprint2022arXiv

Characterizing Spectral Properties of Bridge

The Bridge graph is a special type of graph which are constructed by connecting identical connected graphs with path graphs. We discuss different types of bridge graphs $B_{n\times l}^{m\times k}$ in this paper. In particular, we discuss the following: complete-type bridge graphs, star-type bridge graphs, and full binary tree bridge graphs. We also bound the second eigenvalues of the graph Laplacian of these graphs using methods from Spectral Graph Theory. In general, we prove that for general bridge graphs, $B_{n\times l}^2$, the second eigenvalue of the graph Laplacian should be between $0$ and $2$, inclusive. In the end, we talk about future work on infinite bridge graphs. We created definitions and found the related theorems to support our future work about infinite bridge graphs.

preprint2022arXiv

GasHis-Transformer: A Multi-scale Visual Transformer Approach for Gastric Histopathological Image Detection

In this paper, a multi-scale visual transformer model, referred as GasHis-Transformer, is proposed for Gastric Histopathological Image Detection (GHID), which enables the automatic global detection of gastric cancer images. GasHis-Transformer model consists of two key modules designed to extract global and local information using a position-encoded transformer model and a convolutional neural network with local convolution, respectively. A publicly available hematoxylin and eosin (H&E) stained gastric histopathological image dataset is used in the experiment. Furthermore, a Dropconnect based lightweight network is proposed to reduce the model size and training time of GasHis-Transformer for clinical applications with improved confidence. Moreover, a series of contrast and extended experiments verify the robustness, extensibility and stability of GasHis-Transformer. In conclusion, GasHis-Transformer demonstrates high global detection performance and shows its significant potential in GHID task.

preprint2022arXiv

IL-MCAM: An interactive learning and multi-channel attention mechanism-based weakly supervised colorectal histopathology image classification approach

In recent years, colorectal cancer has become one of the most significant diseases that endanger human health. Deep learning methods are increasingly important for the classification of colorectal histopathology images. However, existing approaches focus more on end-to-end automatic classification using computers rather than human-computer interaction. In this paper, we propose an IL-MCAM framework. It is based on attention mechanisms and interactive learning. The proposed IL-MCAM framework includes two stages: automatic learning (AL) and interactivity learning (IL). In the AL stage, a multi-channel attention mechanism model containing three different attention mechanism channels and convolutional neural networks is used to extract multi-channel features for classification. In the IL stage, the proposed IL-MCAM framework continuously adds misclassified images to the training set in an interactive approach, which improves the classification ability of the MCAM model. We carried out a comparison experiment on our dataset and an extended experiment on the HE-NCT-CRC-100K dataset to verify the performance of the proposed IL-MCAM framework, achieving classification accuracies of 98.98% and 99.77%, respectively. In addition, we conducted an ablation experiment and an interchangeability experiment to verify the ability and interchangeability of the three channels. The experimental results show that the proposed IL-MCAM framework has excellent performance in the colorectal histopathological image classification tasks.

preprint2022arXiv

Predict the water level of the Lake Mead for the next 30 years based on ARIMA

In this study, a mathematical model is developed for the drought problem of Lake Mead. First, a polynomial fitting of the elevation of Lake Mead to the area of the lake is done by the least-squares method, and the volume of Lake Mead is approximated by the numerical integration of the product of the height and the area solved by the trapezoidal rule. The accuracy of the fitting reached more than 96%at all four different locations. Second, the minimum and maximum water levels were transformed into volume numbers by the above method, and the historical data of Lake Mead were classified into three classes of water resources by sequential clustering. According to these data, the optimal cut point of the most recent drought period was 2008 and has continued until now. Finally, two prediction models were constructed using ARIMA(2,2,2) and ARIMA(3,2,2) to study the water level data from 2008 to 2020 and 2005 to 2020, respectively, to predict the water level data of Lake Mead from 2022 to 2050, and to compare and analyze them.

preprint2020arXiv

An artificial neural network approximation for Cauchy inverse problems

A novel artificial neural network method is proposed for solving Cauchy inverse problems. It allows multiple hidden layers with arbitrary width and depth, which theoretically yields better approximations to the inverse problems. In this research, the existence and convergence are shown to establish the well-posedness of neural network method for Cauchy inverse problems, and various numerical examples are presented to illustrate its accuracy and stability. The numerical examples are from different points of view, including time-dependent and time-independent cases, high spatial dimension cases up to 8D, and cases with noisy boundary data and singular computational domain. Moreover, numerical results also show that neural networks with wider and deeper hidden layers could lead to better approximation for Cauchy inverse problems.

preprint2020arXiv

Direct Acyclic Graph based Ledger for Internet of Things: Performance and Security Analysis

Direct Acyclic Graph (DAG)-based ledger and the corresponding consensus algorithm has been identified as a promising technology for Internet of Things (IoT). Compared with Proof-of-Work (PoW) and Proof-of-Stake (PoS) that have been widely used in blockchain, the consensus mechanism designed on DAG structure (simply called as DAG consensus) can overcome some shortcomings such as high resource consumption, high transaction fee, low transaction throughput and long confirmation delay. However, the theoretic analysis on the DAG consensus is an untapped venue to be explored. To this end, based on one of the most typical DAG consensuses, Tangle, we investigate the impact of network load on the performance and security of the DAG-based ledger. Considering unsteady network load, we first propose a Markov chain model to capture the behavior of DAG consensus process under dynamic load conditions. The key performance metrics, i.e., cumulative weight and confirmation delay are analysed based on the proposed model. Then, we leverage a stochastic model to analyse the probability of a successful double-spending attack in different network load regimes. The results can provide an insightful understanding of DAG consensus process, e.g., how the network load affects the confirmation delay and the probability of a successful attack. Meanwhile, we also demonstrate the trade-off between security level and confirmation delay, which can act as a guidance for practical deployment of DAG-based ledgers.

preprint2020arXiv

What are We Depressed about When We Talk about COVID19: Mental Health Analysis on Tweets Using Natural Language Processing

The outbreak of coronavirus disease 2019 (COVID-19) recently has affected human life to a great extent. Besides direct physical and economic threats, the pandemic also indirectly impact people's mental health conditions, which can be overwhelming but difficult to measure. The problem may come from various reasons such as unemployment status, stay-at-home policy, fear for the virus, and so forth. In this work, we focus on applying natural language processing (NLP) techniques to analyze tweets in terms of mental health. We trained deep models that classify each tweet into the following emotions: anger, anticipation, disgust, fear, joy, sadness, surprise and trust. We build the EmoCT (Emotion-Covid19-Tweet) dataset for the training purpose by manually labeling 1,000 English tweets. Furthermore, we propose and compare two methods to find out the reasons that are causing sadness and fear.

preprint2017arXiv

AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding

Significant progress has been achieved in Computer Vision by leveraging large-scale image datasets. However, large-scale datasets for complex Computer Vision tasks beyond classification are still limited. This paper proposed a large-scale dataset named AIC (AI Challenger) with three sub-datasets, human keypoint detection (HKD), large-scale attribute dataset (LAD) and image Chinese captioning (ICC). In this dataset, we annotate class labels (LAD), keypoint coordinate (HKD), bounding box (HKD and LAD), attribute (LAD) and caption (ICC). These rich annotations bridge the semantic gap between low-level images and high-level concepts. The proposed dataset is an effective benchmark to evaluate and improve different computational methods. In addition, for related tasks, others can also use our dataset as a new resource to pre-train their models.