Source author record

Xiaomeng Li

Xiaomeng Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, eess.IV, Artificial Intelligence, Computation and Language, Machine Learning, Applications, math.AP, physics.soc-ph, q-fin.GN

Catalog footprint

What is connected

26 works

9 topics

4 close collaborators

preprint · 2026 · arXiv

Incentivizing Cardiologist-Like Reasoning in MLLMs for Interpretable Echocardiographic Diagnosis

Echocardiographic diagnosis is vital for cardiac screening yet remains challenging. Existing echocardiography foundation models do not effectively capture the relationships between quantitative measurements and clinical manifestations, whereas medical reasoning multimodal large language models (MLLMs) require costly construction of detailed reasoning paths and remain ineffective at directly incorporating such echocardiographic priors into their reasoning. To address these limitations, we propose a novel approach comprising the Cardiac Reasoning Template (CRT) and CardiacMind, which enhances MLLMs' echocardiographic reasoning by introducing a cardiologist-like mindset. Specifically, CRT provides stepwise canonical diagnostic procedures for complex cardiac diseases to streamline reasoning path construction without the need for costly case-by-case verification. To incentivize MLLM reasoning under CRT, we develop CardiacMind, a new reinforcement learning scheme with three novel rewards: Procedural Quantity Reward (PQtR), Procedural Quality Reward (PQlR), and Echocardiographic Semantic Reward (ESR). PQtR promotes detailed reasoning, PQlR promotes integration of evidence across views and modalities, and ESR grounds stepwise descriptions in visual content. Our method shows a 48% improvement in multiview echocardiographic diagnosis for 15 complex cardiac diseases and a 5% improvement on CardiacNet-PAH over prior methods. A user study of our method's reasoning outputs shows 93.33% clinician agreement with cardiologist-like reasoning logic. Our code will be available.

preprint · 2026 · arXiv

Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression

Multimodal large language models (MLLMs) struggle with numerical regression under long-tailed target distributions. Token-level supervised fine-tuning (SFT) and point-wise regression rewards bias learning toward high-density regions, leading to regression-to-the-mean behavior and poor tail performance. We identify the lack of cross-sample relational supervision as a key limitation of existing MLLM training paradigms. To address it, we propose a distribution-aware reinforcement learning framework based on Group Relative Policy Optimization, which introduces batch-level comparison-based supervision via the Concordance Correlation Coefficient-based reward to align predicted and ground-truth distributions in terms of correlation, scale, and mean. The framework is plug-and-play, requiring no architectural modification. Experiments on a unified suite of long-tailed regression benchmarks show consistent improvements over SFT and existing MLLM regression methods, with particularly strong gains in medium- and few-shot regimes.
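The Concordance Correlation Coefficient named in this abstract has a standard closed form (Lin's CCC), which penalizes mismatches in correlation, scale, and mean at once. The following is a minimal sketch of that coefficient over one batch of predictions. How the authors convert it into a reinforcement learning reward is not stated in the abstract, so the function name and the zero-variance convention are assumptions made only for illustration.

```python
from statistics import fmean


def concordance_correlation(pred, target):
    """Lin's concordance correlation coefficient for two equal-length sequences."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p, mean_t = fmean(pred), fmean(target)
    # Population (biased) variances and covariance.
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_t = fmean([(t - mean_t) ** 2 for t in target])
    cov = fmean([(p - mean_p) * (t - mean_t) for p, t in zip(pred, target)])
    denom = var_p + var_t + (mean_p - mean_t) ** 2
    # Assumption: two identical constant sequences count as perfect agreement.
    return 1.0 if denom == 0 else 2 * cov / denom


# A constant "regression-to-the-mean" prediction has zero covariance with the targets.
print(concordance_correlation([2, 2, 2], [1, 2, 3]))  # prints 0.0
```

Because the score depends on the whole batch rather than on each sample separately, a model that collapses toward the mean is scored low even when its per-sample errors are small.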

preprint · 2026 · arXiv

MedHorizon: Towards Long-context Medical Video Understanding in the Wild

Medical multimodal large language models (MLLMs) have advanced image understanding and short-video analysis, but real clinical review often requires full-procedure video understanding. Unlike general long videos, medical procedures contain highly redundant anatomical views, while decisive evidence is temporally sparse, spatially subtle, and context dependent. Existing benchmarks often assume this evidence has already been localized through images, short clips, or pre-segmented videos, leaving the retrieval-before-reasoning problem under-tested. We introduce MedHorizon, an in-the-wild benchmark for long-context medical video understanding. MedHorizon preserves 759 hours of full-length clinical procedures and provides 1,253 evidence-grounded multiple-choice questions that jointly evaluate sparse evidence understanding and multi-hop clinical reasoning. Its evidence is extremely sparse, with only 0.166% evidence frames on average, requiring models to search noisy procedural streams before interpreting and aggregating findings. We evaluate representative general-domain, medical-domain, and long-video MLLMs. The best model reaches only 41.1% accuracy, showing that current systems remain far from robust full-procedure understanding. Further analysis yields four key findings: performance does not scale reliably with more frames; evidence retrieval and clinical interpretation remain primary bottlenecks; these bottlenecks are rooted in weak procedural reasoning and attention drift under redundancy; and generic sampling methods only partially balance local detail with global coverage. MedHorizon provides a rigorous testbed for MLLMs that retrieve sparse evidence and reason over complete clinical workflows.

preprint · 2026 · arXiv

Real-Time Reconstruction of 3D Bone Models via Very-Low-Dose Protocols

Patient-specific bone models are essential for designing surgical guides and preoperative planning, as they enable the visualization of intricate anatomical structures. However, traditional CT-based approaches for creating bone models are limited to preoperative use due to the low flexibility and high radiation exposure of CT and time-consuming manual delineation. Here, we introduce Semi-Supervised Reconstruction with Knowledge Distillation (SSR-KD), a fast and accurate AI framework to reconstruct high-quality bone models from biplanar X-rays in 30 seconds, with an average error under 1.0 mm, eliminating the dependence on CT and manual work. Additionally, high tibial osteotomy simulation was performed by experts on reconstructed bone models, demonstrating that bone models reconstructed from biplanar X-rays have comparable clinical applicability to those annotated from CT. Overall, our approach accelerates the process, reduces radiation exposure, enables intraoperative guidance, and significantly improves the practicality of bone models, offering transformative applications in orthopedics.

preprint · 2026 · arXiv

TriALS: Triphasic-Aided Liver Lesion Segmentation Benchmark in Non-Contrast CT

Automated segmentation of liver lesions on non-contrast computed tomography (NCCT) is clinically important but fundamentally challenging, particularly in low-resource settings across Africa and Asia where contrast agents are frequently unavailable. Progress has been limited by the absence of annotated NCCT benchmarks. Here we describe the TriALS challenge for automated liver lesion segmentation under contrast-limited conditions, supported by a multi-centre dataset of 150 cases with four-phase CT acquisitions (600 volumes) from Egyptian and Chinese institutions. Algorithms were evaluated on 70 cases from three institutions, including an independent external cohort. The top-performing method achieved a mean venous-phase Dice of 0.754, consistent with human-level performance, yet dropped to 0.57 on NCCT. On external validation, the leading method outperformed off-the-shelf models by up to 28% in Dice on NCCT. Algorithm performance was most strongly predicted by training data scale and pre-training strategy. A cross-year comparison exposed a persistent perceptual barrier on NCCT that scaling pre-training alone cannot overcome. Data, annotations, and code are available at https://github.com/xmed-lab/TriALS.
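The Dice scores quoted in this abstract measure overlap between a predicted and a reference mask. As a small illustration, the standard Dice coefficient can be sketched as below for flat binary masks. The function name and the empty-mask convention are assumptions for this example; the challenge's own evaluation code may differ in detail.

```python
def dice_score(pred_mask, true_mask):
    """Dice = 2 * |P & T| / (|P| + |T|) for flat binary masks of equal length."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2 * intersection / total


# One of two predicted voxels overlaps the single reference voxel: 2 * 1 / (2 + 1).
print(round(dice_score([1, 1, 0, 0], [1, 0, 0, 0]), 3))  # prints 0.667
```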

preprint · 2025 · arXiv

OFL-SAM2: Prompt SAM2 with Online Few-shot Learner for Efficient Medical Image Segmentation

The Segment Anything Model 2 (SAM2) has demonstrated remarkable promptable visual segmentation capabilities in video data, showing potential for extension to medical image segmentation (MIS) tasks involving 3D volumes and temporally correlated 2D image sequences. However, adapting SAM2 to MIS presents several challenges, including the need for extensive annotated medical data for fine-tuning and high-quality manual prompts, which are both labor-intensive and require intervention from medical experts. To address these challenges, we introduce OFL-SAM2, a prompt-free SAM2 framework for label-efficient MIS. Our core idea is to leverage limited annotated samples to train a lightweight mapping network that captures medical knowledge and transforms generic image features into target features, thereby providing additional discriminative target representations for each frame and eliminating the need for manual prompts. Crucially, the mapping network supports online parameter update during inference, enhancing the model's generalization across test sequences. Technically, we introduce two key components: (1) an online few-shot learner that trains the mapping network to generate target features using limited data, and (2) an adaptive fusion module that dynamically integrates the target features with the memory-attention features generated by frozen SAM2, leading to accurate and robust target representation. Extensive experiments on three diverse MIS datasets demonstrate that OFL-SAM2 achieves state-of-the-art performance with limited training data.

preprint · 2024 · arXiv

Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models

The rise of multimodal large language models (MLLMs) has spurred interest in language-based driving tasks. However, existing research typically focuses on limited tasks and often omits key multi-view and temporal information which is crucial for robust autonomous driving. To bridge these gaps, we introduce NuInstruct, a novel dataset with 91K multi-view video-QA pairs across 17 subtasks, where each task demands holistic information (e.g., temporal, multi-view, and spatial), significantly elevating the challenge level. To obtain NuInstruct, we propose a novel SQL-based method to generate instruction-response pairs automatically, which is inspired by the driving logical progression of humans. We further present BEV-InMLLM, an end-to-end method for efficiently deriving instruction-aware Bird's-Eye-View (BEV) features, language-aligned for large language models. BEV-InMLLM integrates multi-view, spatial awareness, and temporal semantics to enhance MLLMs' capabilities on NuInstruct tasks. Moreover, our proposed BEV injection module is a plug-and-play method for existing MLLMs. Our experiments on NuInstruct demonstrate that BEV-InMLLM significantly outperforms existing MLLMs, e.g. around 9% improvement on various tasks. We plan to release our NuInstruct for future research development.

preprint · 2022 · arXiv

Calibrating Label Distribution for Class-Imbalanced Barely-Supervised Knee Segmentation

Segmentation of 3D knee MR images is important for the assessment of osteoarthritis. Like other medical data, the volume-wise labeling of knee MR images is expertise-demanding and time-consuming; hence semi-supervised learning (SSL), particularly barely-supervised learning, is highly desirable for training with insufficient labeled data. We observed that the class imbalance problem is severe in knee MR images, as the cartilages only occupy 6% of foreground volumes, and the situation becomes worse without sufficient labeled data. To address this problem, we present a novel framework for barely-supervised knee segmentation with noisy and imbalanced labels. Our framework leverages label distribution to encourage the network to put more effort into learning cartilage parts. Specifically, we utilize (1) label quantity distribution for modifying the objective loss function to a class-aware weighted form and (2) label position distribution for constructing a cropping probability mask to crop more sub-volumes in cartilage areas from both labeled and unlabeled inputs. In addition, we design dual uncertainty-aware sampling supervision to enhance the supervision of low-confidence categories for efficient unsupervised learning. Experiments show that our proposed framework brings significant improvements by incorporating the unlabeled data and alleviating the problem of class imbalance. More importantly, our method outperforms the state-of-the-art SSL methods, demonstrating the potential of our framework for the more challenging SSL setting.

preprint · 2022 · arXiv

Enhancing Pseudo Label Quality for Semi-Supervised Domain-Generalized Medical Image Segmentation

Generalizing medical image segmentation algorithms to unseen domains is an important research topic for computer-aided diagnosis and surgery. Most existing methods require a fully labeled dataset in each source domain. Although some researchers have developed a semi-supervised domain-generalized method, it still requires the domain labels. This paper presents a novel confidence-aware cross pseudo supervision algorithm for semi-supervised domain-generalized medical image segmentation. The main goal is to enhance the pseudo label quality for unlabeled images from unknown distributions. To achieve it, we perform the Fourier transformation to learn low-level statistical information across domains and augment the images to incorporate cross-domain information. With these augmentations as perturbations, we feed the input to a confidence-aware cross pseudo supervision network to measure the variance of pseudo labels and regularize the network to learn with more confident pseudo labels. Our method sets new records on public datasets, i.e., M&Ms and SCGM. Notably, without using domain labels, our method surpasses the prior art that even uses domain labels by 11.67% in Dice on the M&Ms dataset with 2% labeled data. Code is available at https://github.com/XMed-Lab/EPL_SemiDG.

preprint · 2022 · arXiv

Exploring Segment-level Semantics for Online Phase Recognition from Surgical Videos

Automatic surgical phase recognition plays a vital role in robot-assisted surgeries. Existing methods ignore a pivotal problem: surgical phases should be classified by learning segment-level semantics instead of solely relying on frame-wise information. This paper presents a segment-attentive hierarchical consistency network (SAHC) for surgical phase recognition from videos. The key idea is to extract hierarchical high-level semantic-consistent segments and use them to refine the erroneous predictions caused by ambiguous frames. To achieve it, we design a temporal hierarchical network to generate hierarchical high-level segments. Then, we introduce a hierarchical segment-frame attention module to capture relations between the low-level frames and high-level segments. By regularizing the predictions of frames and their corresponding segments via a consistency loss, the network can generate semantic-consistent segments and then rectify the misclassified predictions caused by ambiguous low-level frames. We validate SAHC on two public surgical video datasets, i.e., the M2CAI16 challenge dataset and the Cholec80 dataset. Experimental results show that our method outperforms previous state-of-the-art methods, and ablation studies confirm the effectiveness of our proposed modules. Our code has been released at: https://github.com/xmed-lab/SAHC.

preprint · 2022 · arXiv

Learning Shadow Correspondence for Video Shadow Detection

Video shadow detection aims to generate consistent shadow predictions among video frames. However, the current approaches suffer from inconsistent shadow predictions across frames, especially when the illumination and background textures change in a video. We observe that the inconsistent predictions are caused by shadow feature inconsistency, i.e., the features of the same shadow regions show dissimilar properties among nearby frames. In this paper, we present a novel Shadow-Consistent Correspondence method (SC-Cor) to enhance pixel-wise similarity of the specific shadow regions across frames for video shadow detection. Our proposed SC-Cor has three main advantages. Firstly, without requiring dense pixel-to-pixel correspondence labels, SC-Cor can learn the pixel-wise correspondence across frames in a weakly-supervised manner. Secondly, SC-Cor considers intra-shadow separability, which is robust to the variant textures and illuminations in videos. Finally, SC-Cor is a plug-and-play module that can be easily integrated into existing shadow detectors with no extra computational cost. We further design a new evaluation metric to evaluate the temporal stability of the video shadow detection results. Experimental results show that SC-Cor outperforms the prior state-of-the-art method by 6.51% on IoU and 3.35% on the newly introduced temporal stability metric.

preprint · 2022 · arXiv

Online Easy Example Mining for Weakly-supervised Gland Segmentation from Histology Images

Developing an AI-assisted gland segmentation method from histology images is critical for automatic cancer diagnosis and prognosis; however, the high cost of pixel-level annotations hinders its applications to broader diseases. Existing weakly-supervised semantic segmentation methods in computer vision achieve degraded results for gland segmentation, since the characteristics and problems of glandular datasets are different from general object datasets. We observe that, unlike natural images, the key problem with histology images is the confusion of classes owing to morphological homogeneity and low color contrast among different tissues. To this end, we propose a novel method, Online Easy Example Mining (OEEM), that encourages the network to focus on credible supervision signals rather than noisy signals, thereby mitigating the influence of inevitable false predictions in pseudo-masks. According to the characteristics of glandular datasets, we design a strong framework for gland segmentation. Our results exceed many fully-supervised and weakly-supervised methods for gland segmentation, by over 4.4% and 6.04% in mIoU, respectively. Code is available at https://github.com/xmed-lab/OEEM.

preprint · 2022 · arXiv

RSCFed: Random Sampling Consensus Federated Semi-supervised Learning

Federated semi-supervised learning (FSSL) aims to derive a global model by training fully-labeled and fully-unlabeled clients or training partially labeled clients. The existing approaches work well when local clients have independent and identically distributed (IID) data but fail to generalize to a more practical FSSL setting, i.e., the Non-IID setting. In this paper, we present a Random Sampling Consensus Federated learning method, namely RSCFed, by considering the uneven reliability among models from fully-labeled clients, fully-unlabeled clients or partially labeled clients. Our key motivation is that given models with large deviations from either labeled clients or unlabeled clients, the consensus could be reached by performing random sub-sampling over clients. To achieve it, instead of directly aggregating local models, we first distill several sub-consensus models by random sub-sampling over clients and then aggregate the sub-consensus models into the global model. To enhance the robustness of sub-consensus models, we also develop a novel distance-reweighted model aggregation method. Experimental results show that our method outperforms state-of-the-art methods on three benchmarked datasets, including both natural and medical images. The code is available at https://github.com/XMed-Lab/RSCFed.
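The two-stage aggregation described in this abstract (random sub-sampling over clients to form sub-consensus models, then aggregation into a global model) can be sketched roughly as follows. This is a simplified illustration, not the released RSCFed code: models are flat lists of floats, both stages use plain averaging, and the distance-reweighted aggregation mentioned in the abstract is deliberately left out because its details are not given here. The names `sub_consensus_aggregate`, `num_subsets`, and `subset_size` are hypothetical.

```python
import random


def average(models):
    """Element-wise mean of equally sized parameter vectors."""
    return [sum(values) / len(models) for values in zip(*models)]


def sub_consensus_aggregate(client_models, num_subsets, subset_size, seed=0):
    """Average random client subsets, then average the resulting sub-consensus models."""
    rng = random.Random(seed)
    sub_consensus = []
    for _ in range(num_subsets):
        # Draw a random subset of clients without replacement.
        sampled = rng.sample(client_models, subset_size)
        # Simplification: plain mean instead of distance-reweighted aggregation.
        sub_consensus.append(average(sampled))
    return average(sub_consensus)


clients = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
global_model = sub_consensus_aggregate(clients, num_subsets=3, subset_size=2)
```

With plain averaging, the global parameters always stay within the range spanned by the client parameters; a reweighting step would change how strongly outlying clients pull on each sub-consensus model.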

preprint · 2022 · arXiv

Separated Contrastive Learning for Organ-at-Risk and Gross-Tumor-Volume Segmentation with Limited Annotation

Automatic delineation of organ-at-risk (OAR) and gross-tumor-volume (GTV) is of great significance for radiotherapy planning. However, it is a challenging task to learn powerful representations for accurate delineation under limited pixel (voxel)-wise annotations. Contrastive learning at the pixel level can alleviate the dependency on annotations by learning dense representations from unlabeled data. Recent studies in this direction design various contrastive losses on the feature maps to yield discriminative features for each pixel in the map. However, pixels in the same map inevitably share semantics and so appear closer than they actually are, which may affect the discrimination of pixels in the same map and lead to unfair comparison with pixels in other maps. To address these issues, we propose a separated region-level contrastive learning scheme, namely SepaReg, the core of which is to separate each image into regions and encode each region separately. Specifically, SepaReg comprises two components: a structure-aware image separation (SIS) module and an intra- and inter-organ distillation (IID) module. The SIS is proposed to operate on the image set to rebuild a region set under the guidance of structural information. The inter-organ representation will be learned from this set via typical contrastive losses across regions. On the other hand, the IID is proposed to tackle the quantity imbalance in the region set, as tiny organs may produce fewer regions, by exploiting intra-organ representations. We conducted extensive experiments to evaluate the proposed model on a public dataset and two private datasets. The experimental results demonstrate the effectiveness of the proposed model, consistently achieving better performance than state-of-the-art approaches. Code is available at https://github.com/jcwang123/Separate_CL.

preprint · 2022 · arXiv

WSSS4LUAD: Grand Challenge on Weakly-supervised Tissue Semantic Segmentation for Lung Adenocarcinoma

Lung cancer is the leading cause of cancer death worldwide, and adenocarcinoma (LUAD) is the most common subtype. Exploiting the potential value of histopathology images can promote precision medicine in oncology. Tissue segmentation is the basic upstream task of histopathology image analysis. Existing deep learning models have achieved superior segmentation performance but require sufficient pixel-level annotations, which are time-consuming and expensive to obtain. To enrich the label resources of LUAD and to alleviate the annotation efforts, we organize this challenge, WSSS4LUAD, to call for outstanding weakly-supervised semantic segmentation (WSSS) techniques for histopathology images of LUAD. Participants have to design the algorithm to segment tumor epithelial, tumor-associated stroma and normal tissue with only patch-level labels. This challenge includes 10,091 patch-level annotations (the training set) and over 130 million labeled pixels (the validation and test sets), from 87 WSIs (67 from GDPH, 20 from TCGA). All the labels were generated by a pathologist-in-the-loop pipeline with the help of AI models and checked by the label review board. Among 532 registrations, 28 teams submitted results in the test phase, with over 1,000 submissions. Finally, the first-place team achieved an mIoU of 0.8413 (tumor: 0.8389, stroma: 0.7931, normal: 0.8919). According to the technical reports of the top-tier teams, CAM is still the most popular approach in WSSS. Cutmix data augmentation has been widely adopted to generate more reliable samples. With the success of this challenge, we believe that WSSS approaches with patch-level annotations can be a complement to the traditional pixel annotations while reducing the annotation efforts. The entire dataset has been released to encourage more research on computational pathology in LUAD and more novel WSSS techniques.
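As a quick arithmetic check on the winning score quoted above, the reported mIoU equals the unweighted mean of the three per-class IoU values. That the challenge defines mIoU this way is inferred from the numbers in the abstract rather than stated in it.

```python
# Per-class IoU values for the first-place team, as quoted in the abstract.
class_iou = {"tumor": 0.8389, "stroma": 0.7931, "normal": 0.8919}

# Unweighted mean over the three tissue classes.
mean_iou = sum(class_iou.values()) / len(class_iou)
print(f"{mean_iou:.4f}")  # prints 0.8413
```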

preprint · 2020 · arXiv

AGE Challenge: Angle Closure Glaucoma Evaluation in Anterior Segment Optical Coherence Tomography

Angle closure glaucoma (ACG) is a more aggressive disease than open-angle glaucoma, where the abnormal anatomical structures of the anterior chamber angle (ACA) may cause an elevated intraocular pressure and gradually lead to glaucomatous optic neuropathy and eventually to visual impairment and blindness. Anterior Segment Optical Coherence Tomography (AS-OCT) imaging provides a fast and contactless way to discriminate angle closure from open angle. Although many medical image analysis algorithms have been developed for glaucoma diagnosis, only a few studies have focused on AS-OCT imaging. In particular, there is no public AS-OCT dataset available for evaluating the existing methods in a uniform way, which limits progress in the development of automated techniques for angle closure detection and assessment. To address this, we organized the Angle closure Glaucoma Evaluation challenge (AGE), held in conjunction with MICCAI 2019. The AGE challenge consisted of two tasks: scleral spur localization and angle closure classification. For this challenge, we released a large dataset of 4800 annotated AS-OCT images from 199 patients, and also proposed an evaluation framework to benchmark and compare different models. During the AGE challenge, over 200 teams registered online, and more than 1100 results were submitted for online evaluation. Finally, eight teams participated in the onsite challenge. In this paper, we summarize these eight onsite challenge methods and analyze their corresponding results for the two tasks. We further discuss limitations and future directions. In the AGE challenge, the top-performing approach had an average Euclidean distance of 10 pixels (10 μm) in scleral spur localization, while in the task of angle closure classification, all the algorithms achieved satisfactory performance, with the two best obtaining an accuracy rate of 100%.

preprint · 2020 · arXiv

Deep Sinogram Completion with Image Prior for Metal Artifact Reduction in CT Images

Computed tomography (CT) has been widely used for medical diagnosis, assessment, and therapy planning and guidance. In reality, CT images may be affected adversely in the presence of metallic objects, which could lead to severe metal artifacts and influence clinical diagnosis or dose calculation in radiation therapy. In this paper, we propose a generalizable framework for metal artifact reduction (MAR) by simultaneously leveraging the advantages of image domain and sinogram domain-based MAR techniques. We formulate our framework as a sinogram completion problem and train a neural network (SinoNet) to restore the metal-affected projections. To improve the continuity of the completed projections at the boundary of metal trace and thus alleviate new artifacts in the reconstructed CT images, we train another neural network (PriorNet) to generate a good prior image to guide sinogram learning, and further design a novel residual sinogram learning strategy to effectively utilize the prior image information for better sinogram completion. The two networks are jointly trained in an end-to-end fashion with a differentiable forward projection (FP) operation so that the prior image generation and deep sinogram completion procedures can benefit from each other. Finally, the artifact-reduced CT images are reconstructed using the filtered backward projection (FBP) from the completed sinogram. Extensive experiments on simulated and real artifacts data demonstrate that our method produces superior artifact-reduced results while preserving the anatomical structures and outperforms other MAR methods.

preprint · 2020 · arXiv

Difficulty-aware Meta-learning for Rare Disease Diagnosis

Rare diseases have extremely low-data regimes, unlike common diseases with large amounts of available labeled data. Hence, training a neural network to classify rare diseases with only a few samples per class is very challenging and has so far received very little attention. In this paper, we present a difficulty-aware meta-learning method to address rare disease classification and demonstrate its capability to classify dermoscopy images. Our key approach is to first train and construct a meta-learning model from data of common diseases, then adapt the model to perform rare disease classification. To achieve this, we develop the difficulty-aware meta-learning method that dynamically monitors the importance of learning tasks during the meta-optimization stage. To evaluate our method, we use the recent ISIC 2018 skin lesion classification dataset, and show that with only five samples per class, our model can quickly adapt to classify unseen classes with a high AUC of 83.3%. Also, we evaluated several rare disease classification results in the public Dermofit Image Library to demonstrate the potential of our method for real clinical practice.

preprint · 2020 · arXiv

Effects of Regional Trade Agreement to Local and Global Trade Purity Relationships

In contrast to the rapid integration of the world economy, many regional trade agreements (RTAs) have also emerged since the early 1990s. This seeming contradiction has encouraged scholars and policy makers to explore the true effects of RTAs, including both regional and global trade relationships. This paper defines synthesized trade resistance and decomposes it into natural and artificial factors. Here, we separate the influence of geographical distance, economic volume, overall increases in transportation and labor costs and use the expectation maximization algorithm to optimize the parameters and quantify the trade purity indicator, which describes the true global trade environment and relationships among countries. This indicates that although global and most regional trade relations gradually deteriorated during the period 2007-2017, RTAs generate trade relations among members, especially contributing to the relative prosperity of EU and NAFTA countries. In addition, we apply the network to reflect the purity of the trade relations among countries. The effects of RTAs can be analyzed by comparing typical trade unions and trade communities, which are presented using an empirical network structure. This analysis shows that the community structure is quite consistent with some trade unions, and the representative RTAs constitute the core structure of international trade network. However, the role of trade unions has weakened, and multilateral trade liberalization has accelerated in the past decade. This means that more countries have recently tended to expand their trading partners outside of these unions rather than limit their trading activities to RTAs.

preprint · 2020 · arXiv

Flexible Modeling of Hurdle Conway-Maxwell-Poisson Distributions with Application to Mining Injuries

While the hurdle Poisson regression is a popular class of models for count data with excessive zeros, the link function in the binary component may be unsuitable for highly imbalanced cases. Ordinary Poisson regression is unable to handle the presence of dispersion. In this paper, we introduce the Conway-Maxwell-Poisson (CMP) distribution and integrate flexible skewed Weibull link functions as a better alternative. We take a fully Bayesian approach to draw inference from the underlying models to better explain skewness and quantify dispersion, with the Deviance Information Criterion (DIC) used for model selection. For empirical investigation, we analyze mining injury data for the period 2013-2016 from the U.S. Mine Safety and Health Administration (MSHA). The risk factors describing proportions of employee hours spent in each type of mining work are compositional data; probabilistic principal components analysis (PPCA) is deployed to deal with such covariates. The hurdle CMP regression is additionally adjusted for exposure, measured by the total employee working hours, to make inference on the rate of mining injuries; we tested its competitiveness against other models. This can be used as a predictive model in the mining workplace to identify features that increase the risk of injuries so that prevention can be implemented.
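To make the hurdle CMP construction concrete, the sketch below evaluates a hurdle probability mass function in which zeros have their own probability and positive counts follow a zero-truncated Conway-Maxwell-Poisson distribution. It is an illustration under stated assumptions, not the paper's Bayesian model: the zero probability `p_zero` is passed in directly instead of being produced by a skewed Weibull link, covariates and exposure are omitted, and the CMP normalizing constant is approximated by truncating its series at a hypothetical `max_count` (adequate only when the series converges quickly).

```python
import math


def cmp_weights(lam, nu, max_count=200):
    """Unnormalized CMP weights lam**y / (y!)**nu for y = 0..max_count, via log space."""
    return [math.exp(y * math.log(lam) - nu * math.lgamma(y + 1))
            for y in range(max_count + 1)]


def cmp_pmf(y, lam, nu, max_count=200):
    """CMP probability of count y, with the normalizer truncated at max_count."""
    weights = cmp_weights(lam, nu, max_count)
    return weights[y] / sum(weights)


def hurdle_cmp_pmf(y, p_zero, lam, nu, max_count=200):
    """Hurdle pmf: P(0) = p_zero; positive counts follow a zero-truncated CMP."""
    if y == 0:
        return p_zero
    f0 = cmp_pmf(0, lam, nu, max_count)
    return (1 - p_zero) * cmp_pmf(y, lam, nu, max_count) / (1 - f0)
```

With `nu = 1` the CMP distribution reduces to the ordinary Poisson; values of `nu` below 1 give over-dispersion and values above 1 give under-dispersion, which is the flexibility the abstract refers to.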

preprint · 2020 · arXiv

Revisiting Metric Learning for Few-Shot Image Classification

The goal of few-shot learning is to recognize new visual concepts with just a few labeled samples in each class. Recent effective metric-based few-shot approaches employ neural networks to learn a feature similarity comparison between query and support examples. However, the importance of feature embedding, i.e., exploring the relationship among training samples, is neglected. In this work, we present a simple yet powerful baseline for few-shot classification by emphasizing the importance of feature embedding. Specifically, we revisit the classical triplet network from deep metric learning, and extend it into a deep K-tuplet network for few-shot learning, utilizing the relationship among the input samples to learn a general representation via episode-training. Once trained, our network is able to extract discriminative features for unseen novel categories and can be seamlessly incorporated with a non-linear distance metric function to facilitate the few-shot classification. Our result on the miniImageNet benchmark outperforms other metric-based few-shot classification methods. More importantly, when evaluated on completely different datasets (Caltech-101, CUB-200, Stanford Dogs and Cars) using the model trained with miniImageNet, our method significantly outperforms prior methods, demonstrating its superior capability to generalize to unseen classes.
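For readers unfamiliar with the classical triplet objective that this abstract takes as its starting point, a minimal sketch is given below. It shows only the standard margin-based triplet loss on fixed embedding vectors. The K-tuplet extension proposed in the paper is not specified in the abstract and is not reproduced here; the unsquared Euclidean distance and the default margin are choices made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally sized embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that pulls the positive closer to the anchor than the negative."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Positive at distance 1, negative at distance 5: the margin is already satisfied.
print(triplet_loss([0, 0], [0, 1], [3, 4]))  # prints 0.0
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so only violating triplets contribute to training.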

preprint2020arXiv

Self-supervised Feature Learning via Exploiting Multi-modal Data for Retinal Disease Diagnosis

The automatic diagnosis of various retinal diseases from fundus images is important to support clinical decision-making. However, developing such automatic solutions is challenging due to the requirement of a large amount of human-annotated data. Recently, unsupervised/self-supervised feature learning techniques have received a lot of attention, as they do not need massive annotations. Most current self-supervised methods are analyzed with a single imaging modality, and no method currently utilizes multi-modal images for better results. Considering that the diagnosis of various vitreoretinal diseases can greatly benefit from another imaging modality, e.g., FFA, this paper presents a novel self-supervised feature learning method that effectively exploits multi-modal data for retinal disease diagnosis. To achieve this, we first synthesize the corresponding FFA modality and then formulate a patient feature-based softmax embedding objective. Our objective learns both modality-invariant features and patient-similarity features. Through this mechanism, the neural network captures the semantically shared information across different modalities and the apparent visual similarity between patients. We evaluate our method on two public benchmark datasets for retinal disease diagnosis. The experimental results demonstrate that our method clearly outperforms other self-supervised feature learning methods and is comparable to the supervised baseline.

preprint2020arXiv

Structure and Dynamic of Global Population Migration Network

Cross-border migration brings economic and cultural impacts to the origin and destination, and is also key to reflecting the international relations of the countries involved. In fact, the migration relationships of countries are complex and multilateral, but most traditional migration models are bilateral. Network theories could provide a better description of global migration to show the structure and statistical characteristics more clearly. Based on the estimated migration data and the disparity filter algorithm, the networks describing the global multilateral migration relationships have been extracted among 200 countries over fifty years. The results show that the global migration networks during 1960-2015 exhibit clustering and disassortative features, implying globalized and multipolarized changes of migration during these years. The networks were embedded into a Poincaré disk, yielding a typical and hierarchical "core-periphery" structure which, associated with the angular density distribution, has been used to describe the "multi-centering" trend since the 1990s. Analysis of the correlation and evolution of communities indicates that most communities are stable, yet some structural changes have occurred since the 1990s, which reflects that important historical events contribute to regional and even global migration patterns.
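The disparity filter mentioned in this abstract can be sketched as follows. This is an illustrative version for an undirected weighted graph, not the authors' pipeline: the function name, the tuple-keyed edge dictionary, the toy weights, and the 0.05 significance level are assumptions, and a directed migration network would need separate treatment of incoming and outgoing flows. For a node with degree k and strength s, an edge of weight w receives the score (1 - w/s)**(k - 1), and the edge is kept when the score falls below the chosen level for at least one of its endpoints.

```python
def disparity_filter(weights, alpha=0.05):
    """Return the set of edges kept by a disparity filter.

    weights: dict mapping (u, v) tuples to positive edge weights,
             with each undirected edge listed once.
    alpha:   significance level (assumed value for this sketch).
    """
    strength, degree = {}, {}
    for (u, v), w in weights.items():
        for node in (u, v):
            strength[node] = strength.get(node, 0.0) + w
            degree[node] = degree.get(node, 0) + 1

    kept = set()
    for (u, v), w in weights.items():
        for node in (u, v):
            k = degree[node]
            if k < 2:
                continue  # a degree-1 node cannot make its edge significant
            score = (1.0 - w / strength[node]) ** (k - 1)
            if score < alpha:
                kept.add((u, v))
                break
    return kept


# Toy example: one dominant flow from hub "A" and three weak ones.
toy_weights = {("A", "B"): 100.0, ("A", "C"): 1.0,
               ("A", "D"): 1.0, ("A", "E"): 1.0}
backbone = disparity_filter(toy_weights)
```

In the toy example only the dominant edge between "A" and "B" survives, since it carries almost all of the hub's strength.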

preprint2020arXiv

The 'Letter' Distribution in the Chinese Language

Corpus-based statistical analysis plays a significant role in linguistic research, and ample evidence has shown that different languages exhibit some common laws. Studies have found that letters in some alphabetic writing languages have strikingly similar statistical usage frequency distributions. Does this hold for Chinese, which employs ideogram writing? We obtained letter frequency data for some alphabetic writing languages and found a common law in the letter distributions. In addition, we collected Chinese literature corpora for different historical periods from the Tang Dynasty to the present, and we decomposed the Chinese written language into three kinds of basic particles: characters, strokes and constructive parts. The results of the statistical analysis showed that, in different historical periods, the intensity of the use of basic particles in Chinese writing varied, but the form of the distribution was consistent. In particular, the distributions of the Chinese constructive parts are consistent with those of alphabetic writing languages. This study provides new evidence of the consistency of human languages.

preprint2020arXiv

Transformation Consistent Self-ensembling Model for Semi-supervised Medical Image Segmentation

Deep convolutional neural networks have achieved remarkable progress on a variety of medical image computing tasks. A common problem when applying supervised deep learning methods to medical images is the lack of labeled data, which is very expensive and time-consuming to collect. In this paper, we present a novel semi-supervised method for medical image segmentation, where the network is optimized by the weighted combination of a common supervised loss for labeled inputs only and a regularization loss for both labeled and unlabeled data. To utilize the unlabeled data, our method encourages consistent predictions of the network-in-training for the same input under different regularizations. Targeting semi-supervised segmentation tasks, we introduce a transformation-consistent strategy, including rotation and flipping, in our self-ensembling model to enhance the regularization effect for pixel-level predictions. We have extensively validated the proposed semi-supervised method on three typical yet challenging medical image segmentation tasks: (i) skin lesion segmentation from dermoscopy images on the International Skin Imaging Collaboration (ISIC) 2017 dataset, (ii) optic disc segmentation from fundus images on the Retinal Fundus Glaucoma Challenge (REFUGE) dataset, and (iii) liver segmentation from volumetric CT scans on the Liver Tumor Segmentation Challenge (LiTS) dataset. Compared to state-of-the-art methods, our proposed method shows superior segmentation performance on challenging 2D/3D medical images, demonstrating the effectiveness of our semi-supervised method for medical image segmentation.
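A minimal sketch of the transformation-consistency idea in this abstract: the prediction for a rotated and flipped input is compared with the rotated and flipped prediction for the original input. This is not the authors' implementation. The two toy predictors, the function names, and the choice of mean squared error as the consistency measure are assumptions made for illustration, and NumPy stands in for a real segmentation network.

```python
import numpy as np


def rotate_flip(image, k, flip):
    """Rotate a 2D array by k * 90 degrees, then optionally flip left-right."""
    out = np.rot90(image, k)
    return np.fliplr(out) if flip else out


def transformation_consistency_loss(predict, image, k=1, flip=True):
    """Mean squared difference between predict(T(x)) and T(predict(x))."""
    pred_of_transformed = predict(rotate_flip(image, k, flip))
    transformed_pred = rotate_flip(predict(image), k, flip)
    return float(np.mean((pred_of_transformed - transformed_pred) ** 2))


def toy_segmenter(image):
    """Pixel-wise sigmoid: consistent under rotation and flipping by design."""
    return 1.0 / (1.0 + np.exp(-image))


def shifted_segmenter(image):
    """Adds a row-dependent bias, so predictions depend on pixel position."""
    rows = np.arange(image.shape[0], dtype=float)[:, None]
    return 1.0 / (1.0 + np.exp(-(image + rows)))


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
loss_consistent = transformation_consistency_loss(toy_segmenter, x)
loss_inconsistent = transformation_consistency_loss(shifted_segmenter, x)
```

The pixel-wise predictor gives a loss of zero, while the position-dependent one is penalized. In a semi-supervised setting a term of this kind needs no labels, so it can be computed on unlabeled images as well.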

preprint2016arXiv

Extremal functions for singular Trudinger-Moser inequalities in the entire Euclidean space

In a previous work (Int. Math. Res. Notices 13 (2010) 2394-2426), Adimurthi-Yang proved a singular Trudinger-Moser inequality in the entire Euclidean space $\mathbb{R}^N$ $(N\geq 2)$. Precisely, if $0\leq β<1$ and $0<γ\leq1-β$, then there holds for any $τ>0$, $$\sup_{u\in W^{1,N}(\mathbb{R}^N),\,\int_{\mathbb{R}^N}(|\nabla u|^N+τ|u|^N)dx\leq 1}\int_{\mathbb{R}^N}\frac{1}{|x|^{Nβ}}\left(e^{α_Nγ|u|^{\frac{N}{N-1}}}-\sum_{k=0}^{N-2}\frac{α_N^kγ^k|u|^{\frac{kN}{N-1}}} {k!}\right)dx<\infty,$$ where $α_N=Nω_{N-1}^{1/(N-1)}$ and $ω_{N-1}$ is the area of the unit sphere in $\mathbb{R}^N$. The above inequality is sharp in the sense that if $γ>1-β$, all integrals are still finite but the supremum is infinite. In this paper, we are concerned with extremal functions for these singular inequalities. The regular case $β=0$ has been considered by Li-Ruf (Indiana Univ. Math. J. 57 (2008) 451-480) and Ishiwata (Math. Ann. 351 (2011) 781-804). We investigate the singular case $0<β<1$ and prove that for all $τ>0$, $0<β<1$ and $0<γ\leq 1-β$, extremal functions for the above inequalities exist. The proof is based on blow-up analysis.

Source provenance

Where this author record came from

arxiv, confidence 95%

external id: arxiv:2512.24861:author:3:xiaomeng-li

Imported May 21, 2026. Synced May 21, 2026.

arxiv, confidence 95%

external id: arxiv:2605.01402:author:3:xiaomeng-li

Imported May 20, 2026. Synced May 21, 2026.

arxiv, confidence 95%

external id: arxiv:2605.06537:author:11:xiaomeng-li

Imported May 20, 2026. Synced May 20, 2026.

arxiv, confidence 95%

external id: arxiv:2605.16572:author:65:xiaomeng-li

Imported May 20, 2026. Synced May 20, 2026.

5 works

Lequan Yu

4 works

Lei Xing

4 works

Xinpeng Ding

3 works

Chi-Wing Fu

26 published item(s)

Incentivizing Cardiologist-Like Reasoning in MLLMs for Interpretable Echocardiographic Diagnosis

Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression

MedHorizon: Towards Long-context Medical Video Understanding in the Wild

Real-Time Reconstruction of 3D Bone Models via Very-Low-Dose Protocols

TriALS: Triphasic-Aided Liver Lesion Segmentation Benchmark in Non-Contrast CT

OFL-SAM2: Prompt SAM2 with Online Few-shot Learner for Efficient Medical Image Segmentation

Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models

Calibrating Label Distribution for Class-Imbalanced Barely-Supervised Knee Segmentation

Enhancing Pseudo Label Quality for Semi-Supervised Domain-Generalized Medical Image Segmentation

Exploring Segment-level Semantics for Online Phase Recognition from Surgical Videos

Learning Shadow Correspondence for Video Shadow Detection

Online Easy Example Mining for Weakly-supervised Gland Segmentation from Histology Images

RSCFed: Random Sampling Consensus Federated Semi-supervised Learning

Separated Contrastive Learning for Organ-at-Risk and Gross-Tumor-Volume Segmentation with Limited Annotation

WSSS4LUAD: Grand Challenge on Weakly-supervised Tissue Semantic Segmentation for Lung Adenocarcinoma

AGE Challenge: Angle Closure Glaucoma Evaluation in Anterior Segment Optical Coherence Tomography

Deep Sinogram Completion with Image Prior for Metal Artifact Reduction in CT Images

Difficulty-aware Meta-learning for Rare Disease Diagnosis

Effects of Regional Trade Agreement to Local and Global Trade Purity Relationships

Flexible Modeling of Hurdle Conway-Maxwell-Poisson Distributions with Application to Mining Injuries

Revisiting Metric Learning for Few-Shot Image Classification

Self-supervised Feature Learning via Exploiting Multi-modal Data for Retinal Disease Diagnosis

Structure and Dynamic of Global Population Migration Network

The 'Letter' Distribution in the Chinese Language

Transformation Consistent Self-ensembling Model for Semi-supervised Medical Image Segmentation

Extremal functions for singular Trudinger-Moser inequalities in the entire Euclidean space