Source author record

Yuan Cheng

Yuan Cheng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision Artificial Intelligence cond-mat.mtrl-sci Machine Learning Information Retrieval Computation Computation and Language Computer Science and Game Theory cond-mat.mes-hall econ.GN eess.IV Human-Computer Interaction math.DS physics.soc-ph q-fin.EC q-fin.TR

Catalog footprint

What is connected

21 works

16 topics

4 close collaborators


preprint · 2026 · arXiv

Beyond Instance-Level Self-Supervision in 3D Multi-Modal Medical Imaging

Self-supervised pre-training methods in medical imaging typically treat each individual as an isolated instance, learning representations through augmentation-based objectives or masked reconstruction. They often fail to exploit a key characteristic of physiological features: anatomical structures maintain consistent spatial relationships across individuals (instances), such as the thalamus being medial to the basal ganglia, regardless of variations in brain size, shape, or pathology. We propose leveraging this cross-instance topological consistency as a supervisory signal. The challenge is the inherent variability of medical images, which can differ significantly across instances and modalities. To tackle this, we focus on two alignment regimes. (i) Intra-instance: with pixel-level correspondences available, a cross-modal triplet objective explicitly preserves local neighborhood topology. (ii) Inter-instance: without such supervision, we derive pseudo-correspondences to control partial neighborhood alignment and prevent topology collapse across modalities. We validate our approach on 7 downstream multi-modal tasks, achieving average improvements of 1.1% and 5.94% on segmentation and classification tasks, respectively, and demonstrating significantly better robustness when modalities are missing at test time.
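As a rough illustration of what a cross-modal triplet objective can look like when voxel-level correspondences are available, here is a minimal sketch of a generic hinge triplet loss. It assumes that feats_a[i] and feats_b[i] are feature vectors for the same voxel in two modalities (the positive pair) and treats every other voxel in the second modality as a negative. The function names and the margin value are hypothetical; this is not the authors' exact objective.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cross_modal_triplet_loss(feats_a, feats_b, margin=1.0):
    """Hypothetical hinge triplet loss over voxel-aligned features from two modalities.

    Anchor: feats_a[i]; positive: feats_b[i] (same voxel, other modality);
    negatives: feats_b[j] for every j != i.
    """
    n = len(feats_a)
    total, count = 0.0, 0
    for i in range(n):
        d_pos = euclidean(feats_a[i], feats_b[i])
        for j in range(n):
            if j == i:
                continue
            d_neg = euclidean(feats_a[i], feats_b[j])
            # Penalize when the positive is not closer than the negative by `margin`.
            total += max(0.0, d_pos - d_neg + margin)
            count += 1
    return total / count if count else 0.0

# Well-aligned, well-separated features give zero loss.
aligned = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]]
print(cross_modal_triplet_loss(aligned, aligned))
```

Pulling matched voxels together while pushing non-matching ones apart is one common way to keep local neighborhoods consistent across modalities. The paper's inter-instance regime with pseudo-correspondences is not modeled here.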

preprint · 2026 · arXiv

FLAG: Foundation model representation with Latent diffusion Alignment via Graph for spatial gene expression prediction

Predicting spatial gene expression from routine H&E images enables large-scale molecular profiling, yet current models treat it as an isolated pointwise task, overlooking essential biological structure such as gene coordination and spatial distribution. To preserve these relationships, we introduce FLAG, a diffusion-based framework that recasts the task as structured distribution modeling. We also identify the critical Gene Dimension Curse, in which jointly modeling gene expression and its spatial interactions fails in high-dimensional spaces. FLAG addresses this challenge by integrating a spatial graph encoder for topological consistency and by aligning with a Gene Foundation Model (GFM) for gene-gene fidelity during generation. To rigorously assess model performance, we propose a set of novel structural evaluation metrics, including Gene Structural Correlation (GSC) and Spatial Structural Correlation (SSC). Our experiments demonstrate that FLAG is highly competitive in traditional accuracy (PCC/MSE) while achieving significantly better structural fidelity in capturing both gene-gene and gene-spatial relationships. The code is available at https://github.com/darkflash03/FLAG.
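The traditional accuracy metrics named above (PCC and MSE) can be sketched as follows. This version assumes that predictions and ground truth are spots × genes matrices, that PCC is computed per gene across spots and then averaged, and that MSE is taken over all entries. That averaging scheme is an assumption, not necessarily the paper's protocol. GSC and SSC are not defined in the abstract, so they are not sketched.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation undefined for a constant vector; report 0 by convention here
    return cov / math.sqrt(vx * vy)

def expression_scores(pred, true):
    """Mean per-gene PCC and overall MSE for spots x genes matrices (nested lists)."""
    n_spots, n_genes = len(true), len(true[0])
    pccs = []
    for g in range(n_genes):
        p = [pred[s][g] for s in range(n_spots)]
        t = [true[s][g] for s in range(n_spots)]
        pccs.append(pearson(p, t))
    sq_err = sum((pred[s][g] - true[s][g]) ** 2
                 for s in range(n_spots) for g in range(n_genes))
    return sum(pccs) / n_genes, sq_err / (n_spots * n_genes)

# Perfectly correlated but mis-scaled predictions: PCC is 1, MSE is not 0.
pcc, mse = expression_scores([[1, 2], [2, 4], [3, 6]], [[2, 1], [4, 2], [6, 3]])
print(pcc, mse)
```

The example shows why pointwise metrics alone can be misleading: a prediction can correlate perfectly per gene while still being off in scale.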

preprint · 2026 · arXiv

UST-Hand: An Uncertainty-aware Spatiotemporal Point Cloud Interaction Network for 3D Self-supervised Hand Pose Estimation

Manually annotating accurate 3D hand poses is extremely time-consuming and labor-intensive. Existing self-supervised hand pose estimation methods use the discrepancy between input images and rendered outputs, or multi-view consistency constraints, as the driving force to optimize networks and progressively refine pose accuracy. However, these methods are highly susceptible to noisy pseudo-labels and do not fully exploit fine-grained spatial correlations, which undermines the stability of model training. To address these issues, we propose UST-Hand, a self-supervised learning framework that estimates the uncertainty distribution of hand pose and constructs a probabilistic point cloud feature space in which complex spatiotemporal relationships can be modeled. UST-Hand employs a conditional normalizing flow model to capture hand pose distributions and sample diverse hypotheses, enabling robust and stable learning under noisy pseudo-label supervision. These hypotheses are mapped to a unified probabilistic 3D point cloud space for multi-view and temporal feature interaction, comprehensively exploring hand motion patterns and fine-grained spatial correlations. Extensive experiments on three challenging datasets demonstrate that UST-Hand achieves state-of-the-art performance, outperforming existing self-supervised methods by up to 37.8% in Mean Per Vertex Position Error (MPVPE).
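MPVPE, the metric quoted above, averages the Euclidean distance between corresponding predicted and ground-truth mesh vertices. The minimal sketch below assumes both meshes share the same vertex ordering and units. Many benchmarks apply an alignment step (for example, root or Procrustes alignment) before measuring, and that step is omitted here.

```python
import math

def mpvpe(pred_vertices, gt_vertices):
    """Mean Per Vertex Position Error: mean Euclidean distance between
    corresponding predicted and ground-truth vertices (same units as input)."""
    if len(pred_vertices) != len(gt_vertices):
        raise ValueError("vertex counts differ")
    dists = [math.dist(p, g) for p, g in zip(pred_vertices, gt_vertices)]
    return sum(dists) / len(dists)

# One vertex is 5 units off and one is exact, so the mean error is 2.5.
print(mpvpe([(0, 0, 0), (1, 1, 1)], [(3, 4, 0), (1, 1, 1)]))
```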

preprint · 2025 · arXiv

Tracing the Heart's Pathways: ECG Representation Learning from a Cardiac Conduction Perspective

The multi-lead electrocardiogram (ECG) is a cornerstone of cardiac diagnosis. Recent advances in electrocardiogram self-supervised learning (eSSL) have improved the prospects for representation learning without relying on high-quality annotations. Yet earlier eSSL methods suffer from a key limitation: they focus on patterns that are consistent across leads and beats, overlooking the inherent differences between heartbeats that arise from cardiac conduction, even though these subtle but significant variations carry unique physiological signatures. Moreover, representation learning for ECG analysis should align with ECG diagnostic guidelines, which progress from individual heartbeats to single leads and ultimately to lead combinations. This sequential logic, however, is often neglected when pre-trained models are applied to downstream tasks. To address these gaps, we propose CLEAR-HUG, a two-stage framework designed to capture subtle variations in cardiac conduction across leads while adhering to ECG diagnostic guidelines. In the first stage, we introduce an eSSL model termed Conduction-LEAd Reconstructor (CLEAR), which captures both specific variations and general commonalities across heartbeats. Treating each heartbeat as a distinct entity, CLEAR employs a simple yet effective sparse attention mechanism to reconstruct signals without interference from other heartbeats. In the second stage, we implement a Hierarchical lead-Unified Group head (HUG) for disease diagnosis that mirrors the clinical workflow. Experimental results across six tasks show a 6.84% improvement, validating the effectiveness of CLEAR-HUG and highlighting its ability to enhance representations of cardiac conduction and align patterns with expert diagnostic guidelines.
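The idea of reconstructing each heartbeat "without interference from other heartbeats" can be illustrated with an attention mask that only lets a token attend to tokens from the same beat. The abstract does not specify CLEAR's exact sparsity pattern, so this sketch assumes a simple block-diagonal mask keyed by a per-token beat ID. The function name is hypothetical.

```python
import math

def beat_masked_attention(scores, beat_ids):
    """Row-wise softmax over raw attention scores in which token i may only
    attend to tokens j from the same heartbeat (beat_ids[i] == beat_ids[j])."""
    n = len(scores)
    weights = []
    for i in range(n):
        allowed = [j for j in range(n) if beat_ids[j] == beat_ids[i]]  # always contains i
        m = max(scores[i][j] for j in allowed)  # subtract max for numerical stability
        exps = {j: math.exp(scores[i][j] - m) for j in allowed}
        z = sum(exps.values())
        # Tokens from other beats get exactly zero weight.
        weights.append([exps.get(j, 0.0) / z for j in range(n)])
    return weights

# Four tokens from two beats, with uniform raw scores.
w = beat_masked_attention([[0.0] * 4 for _ in range(4)], [0, 0, 1, 1])
print(w[0], w[2])
```

In a real model the scores would come from query-key products and the mask would be applied inside a batched attention kernel. The plain loops here only make the masking rule explicit.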

preprint · 2022 · arXiv

3D Pose Based Feedback for Physical Exercises

Unsupervised self-rehabilitation exercises and physical training can cause serious injuries if performed incorrectly. We introduce a learning-based framework that identifies the mistakes a user makes and proposes corrective measures for easier and safer individual training. Our framework does not rely on hard-coded, heuristic rules. Instead, it learns them from data, which makes it easier to adapt to specific user needs. To this end, we use a Graph Convolutional Network (GCN) architecture acting on the user's pose sequence to model the relationships between body-joint trajectories. To evaluate our approach, we introduce a dataset covering 3 different physical exercises. Our approach yields 90.9% mistake identification accuracy and successfully corrects 94.2% of the mistakes.
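To make "a GCN acting on the user's pose sequence" concrete, here is a minimal sketch of one standard graph-convolution layer (the symmetric-normalized form popularized by Kipf and Welling). Joints are nodes, bones are edges, and each joint's feature vector could be its flattened trajectory. The skeleton, features and weights below are hypothetical, and the authors' actual architecture is not described in the abstract.

```python
import math

def normalized_adjacency(edges, n):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 for an undirected graph of n joints."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A + I (self-loops)
    for i, j in edges:  # each edge is a bone between two joints
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(x, y):
    """Plain-Python matrix product of nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def gcn_layer(a_hat, h, w):
    """One graph-convolution layer: ReLU(A_hat @ H @ W).

    h: joints x features (e.g. a joint's flattened trajectory); w: features x out_features.
    """
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]

# Hypothetical 3-joint chain (e.g. hip-knee-ankle) with one feature per joint.
a_hat = normalized_adjacency([(0, 1), (1, 2)], 3)
out = gcn_layer(a_hat, [[1.0], [2.0], [3.0]], [[1.0]])
print(out)
```

Each layer mixes a joint's features with those of its skeletal neighbors. Stacking layers lets information about one joint's trajectory reach joints further along the kinematic chain.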

preprint · 2022 · arXiv

A Large-scale Comprehensive Dataset and Copy-overlap Aware Evaluation Protocol for Segment-level Video Copy Detection

In this paper, we introduce VCSL (Video Copy Segment Localization), a new comprehensive segment-level annotated video copy dataset. Compared with existing copy detection datasets, which are limited by either video-level annotation or small scale, VCSL not only has two orders of magnitude more segment-level labelled data, with 160k realistic video copy pairs containing more than 280k localized copied segment pairs, but also covers a variety of video categories and a wide range of video durations. All copied segments inside each collected video pair are manually extracted and accompanied by precisely annotated start and end timestamps. Alongside the dataset, we also propose a novel evaluation protocol that better measures the prediction accuracy of copy-overlapping segments between a video pair and adapts better to different scenarios. By benchmarking several baseline and state-of-the-art segment-level video copy detection methods on the proposed dataset and evaluation metric, we provide a comprehensive analysis that uncovers the strengths and weaknesses of current approaches, which we hope will open up promising directions for future work. The VCSL dataset, metric and benchmark code are all publicly available at https://github.com/alipay/VCSL.
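A copied segment pair links a time span in a query video to a time span in a reference video. As a baseline for thinking about "copy overlap", the sketch below scores a predicted pair against a ground-truth pair with plain temporal IoU on each timeline separately. This is a generic measure chosen for illustration, not the VCSL evaluation protocol; see the linked repository for the actual metric. Function names are hypothetical.

```python
def segment_iou(a, b):
    """Temporal IoU of two (start, end) segments given in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def copy_pair_iou(pred, gt):
    """IoU of a predicted copied-segment pair against a ground-truth pair,
    reported separately on the query and reference timelines.

    pred, gt: ((q_start, q_end), (r_start, r_end))
    """
    return segment_iou(pred[0], gt[0]), segment_iou(pred[1], gt[1])

# Query spans match exactly; reference spans overlap by 5 s out of a 15 s union.
print(copy_pair_iou(((0, 10), (20, 30)), ((0, 10), (25, 35))))
```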

preprint · 2022 · arXiv

Hyperspectral Imaging for cherry tomato

Cherry tomato (Solanum lycopersicum) is popular with consumers around the world because of its distinctive flavor. Soluble solids content (SSC) and firmness are two key metrics for evaluating product quality. In this work, we develop non-destructive testing techniques for SSC and fruit firmness based on hyperspectral images and a corresponding deep learning regression model. Hyperspectral reflectance images of over 200 tomato fruits are acquired over a spectral range of 400 to 1000 nm. The acquired hyperspectral images are corrected and the spectral information is extracted. A novel one-dimensional (1D) convolutional ResNet (Con1dResNet) based regression model is proposed and compared with state-of-the-art techniques. Experimental results show that, with a relatively large number of samples, our technique is 26.4% better than the state-of-the-art technique for SSC and 33.7% better for firmness. The results of this study indicate the potential of hyperspectral imaging for SSC and firmness detection, providing a new option for non-destructive testing of cherry tomato fruit quality in the future.

preprint · 2022 · arXiv

Multi-modal Emotion Estimation for in-the-wild Videos

In this paper, we briefly introduce our submission to the Valence-Arousal Estimation Challenge of the 3rd Affective Behavior Analysis in-the-wild (ABAW) competition. Our method uses multi-modal information, i.e., visual and audio information, and employs a temporal encoder to model the temporal context in the videos. In addition, a smooth processor is applied to obtain more reasonable predictions, and a model ensemble strategy is used to improve the performance of our method. Experimental results show that our method achieves a concordance correlation coefficient (CCC) of 65.55% for valence and 70.88% for arousal on the validation set of the Aff-Wild2 dataset, which demonstrates the effectiveness of our method.
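The CCC scores above use Lin's concordance correlation coefficient, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). Unlike Pearson correlation, CCC also penalizes shifts in mean and scale. This minimal sketch uses population moments. Reporting "65.55%" corresponds to a CCC of 0.6555.

```python
def ccc(pred, gold):
    """Lin's concordance correlation coefficient with population (1/n) moments."""
    n = len(pred)
    mx, my = sum(pred) / n, sum(gold) / n
    vx = sum((x - mx) ** 2 for x in pred) / n
    vy = sum((y - my) ** 2 for y in gold) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(pred, gold)) / n
    denom = vx + vy + (mx - my) ** 2
    return 2 * cov / denom if denom > 0 else 0.0

# A constant offset of +1 keeps Pearson at 1 but lowers CCC to 4/7.
print(ccc([1, 2, 3], [1, 2, 3]), ccc([1, 2, 3], [2, 3, 4]))
```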

preprint · 2022 · arXiv

Solutions for Fine-grained and Long-tailed Snake Species Recognition in SnakeCLEF 2022

Automatic snake species recognition is important because it has vast potential to help reduce deaths and disabilities caused by snakebites. We introduce our solution in SnakeCLEF 2022 for fine-grained snake species recognition on a heavily long-tailed class distribution. First, a network architecture is designed to extract and fuse features from multiple modalities, i.e., photographs from the visual modality and geographic locality information from the language modality. Then, logit adjustment based methods are studied to relieve the impact of the severe class imbalance. Next, a combination of supervised and self-supervised learning is proposed to make full use of the dataset, including both labeled training data and unlabeled test data. Finally, post-processing strategies, such as multi-scale and multi-crop test-time augmentation, location filtering and model ensembling, are employed for better performance. With an ensemble of several different models, we achieve a private score of 82.65% on the final leaderboard, ranking 3rd.
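The abstract does not say which logit adjustment variant was used. For illustration, here is the post-hoc form described by Menon et al. (2021): subtract τ·log(prior) from each class logit before taking the argmax, so that frequent classes no longer dominate rare ones. The priors, τ and function names below are hypothetical.

```python
import math

def adjust_logits(logits, class_priors, tau=1.0):
    """Post-hoc logit adjustment: f_y(x) - tau * log(pi_y) for each class y."""
    return [z - tau * math.log(p) for z, p in zip(logits, class_priors)]

def predict(logits):
    """Index of the largest logit."""
    return max(range(len(logits)), key=lambda k: logits[k])

# Class 0 is a common species (prior 0.9), class 1 a rare one (prior 0.1).
raw = [2.0, 1.5]
adjusted = adjust_logits(raw, [0.9, 0.1])
print(predict(raw), predict(adjusted))  # the rare class wins after adjustment
```

The training-time alternative adds τ·log(prior) inside the softmax cross-entropy instead. Both shift decisions toward tail classes without changing the network itself.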

preprint · 2022 · arXiv

Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation

Scene Graph Generation (SGG), which generally follows a regular encoder-decoder pipeline, aims to first encode the visual contents of a given image and then parse them into a compact summary graph. Existing SGG approaches generally suffer from insufficient modality fusion between vision and language and also fail to provide informative predicates because of biased relationship predictions, which keeps SGG far from practical use. To this end, in this paper, we first present a novel Stacked Hybrid-Attention network, which facilitates intra-modal refinement as well as inter-modal interaction, to serve as the encoder. We then devise an innovative Group Collaborative Learning strategy to optimize the decoder. In particular, based on the observation that the recognition capability of a single classifier is limited on an extremely unbalanced dataset, we first deploy a group of classifiers that are expert in distinguishing different subsets of classes, and then cooperatively optimize them from two aspects to promote unbiased SGG. Experiments on the VG and GQA datasets demonstrate that we not only establish a new state of the art in the unbiased metric, but also nearly double the performance of two baselines.

preprint · 2021 · arXiv

CRSLab: An Open-Source Toolkit for Building Conversational Recommender System

In recent years, conversational recommender systems (CRSs) have received much attention in the research community. However, existing studies on CRS vary in scenarios, goals and techniques, and lack a unified, standardized implementation or comparison. To tackle this challenge, we propose CRSLab, an open-source CRS toolkit that provides a unified and extensible framework with highly decoupled modules for developing CRSs. Based on this framework, we collect 6 commonly used human-annotated CRS datasets and implement 18 models that include recent techniques such as graph neural networks and pre-trained models. In addition, our toolkit provides a series of automatic evaluation protocols and a human-machine interaction interface to test and compare different CRS methods. The project and documentation are released at https://github.com/RUCAIBox/CRSLab.

preprint · 2020 · arXiv

HOTCAKE: Higher Order Tucker Articulated Kernels for Deeper CNN Compression

Emerging edge computing has spurred immense interest in compacting neural networks without sacrificing much accuracy. In this regard, low-rank tensor decomposition is a powerful tool for compressing convolutional neural networks (CNNs) by decomposing the 4-way kernel tensor into multi-stage smaller ones. Building on top of Tucker-2 decomposition, we propose a generalized Higher Order Tucker Articulated Kernels (HOTCAKE) scheme comprising four steps: input channel decomposition, guided Tucker rank selection, higher order Tucker decomposition and fine-tuning. By subjecting each CONV layer to HOTCAKE, a highly compressed CNN model with a graceful accuracy trade-off is obtained. Experiments show that HOTCAKE can compress even pre-compressed models and produce state-of-the-art lightweight networks.
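The savings from the Tucker-2 baseline that HOTCAKE builds on can be estimated with simple parameter counting. A C_out × C_in × k × k kernel is replaced by a 1×1 convolution (C_in → R_in), a k×k core convolution (R_in → R_out), and a 1×1 convolution (R_out → C_out). The layer sizes and ranks below are illustrative assumptions. HOTCAKE's additional input-channel decomposition and rank-selection steps are not modeled.

```python
def conv_params(c_in, c_out, k):
    """Weights in a dense k x k convolution kernel (bias ignored)."""
    return c_out * c_in * k * k

def tucker2_params(c_in, c_out, k, r_in, r_out):
    """Weights after Tucker-2: 1x1 (c_in->r_in) + k x k core (r_in->r_out) + 1x1 (r_out->c_out)."""
    return c_in * r_in + r_in * r_out * k * k + r_out * c_out

# Hypothetical 256->256 channel 3x3 layer with Tucker ranks of 64.
dense = conv_params(256, 256, 3)
compact = tucker2_params(256, 256, 3, 64, 64)
print(dense, compact, round(dense / compact, 2))  # 589824 69632 8.47
```

Most of the remaining cost sits in the k×k core, which is why further decomposing the channel dimensions, as HOTCAKE does, can shrink the model again.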

preprint · 2020 · arXiv

Magnon-magnon interaction and magnon relaxation time in ferromagnetic Cr2Ge2Te6 monolayer

Despite the intense attention and huge potential of two-dimensional (2D) magnets for applications in novel magnetic, magneto-optical, magneto-thermal and magneto-electronic devices, a robust strategy for systematically understanding magnon-magnon interactions (MMI) at finite temperature has yet to be developed. In this paper, we present a first-principles theoretical method that introduces the finite-temperature magnon-magnon interaction into the Heisenberg Hamiltonian through a nonlinear correction energy. Wick's theorem is used to decouple the four-magnon operators to two-magnon order. We demonstrate the capabilities of this method by studying the strength of MMI in the Cr2Ge2Te6 (CGT) monolayer. The spin wave spectrum at finite temperature and the time-dependent spin autocorrelation function are explored. We find that the magnon relaxation time due to magnon-magnon scattering increases with temperature because of the reduction in magnon energy, while it decreases with wavevector and external magnetic field. Our results provide new insight into magnon damping and energy dissipation in two-dimensional ferromagnetic materials.

preprint · 2020 · arXiv

Material Platforms for Defect Qubits and Single Photon Emitters

Quantum technology has grown out of quantum information theory and now provides a valuable tool that researchers from numerous fields can add to their toolbox of research methods. To date, various systems have been exploited to promote the application of quantum information processing. The systems that can be used for quantum technology include superconducting circuits, ultra-cold atoms, trapped ions, semiconductor quantum dots, and solid-state spins and emitters. In this review, we will discuss the state of the art on material platforms for spin-based quantum technology, with a focus on the progress in solid-state spins and emitters in several leading host materials, including diamond, silicon carbide, boron nitride, silicon, two-dimensional semiconductors, and other materials. We will highlight how first-principles calculations can serve as an exceptionally robust tool for finding the novel defect qubits and single photon emitters in solids, through detailed predictions of the electronic, magnetic and optical properties.

preprint · 2019 · arXiv

Career Choice as an Extended Spatial Evolutionary Public Goods Game

We propose an extended spatial evolutionary public goods game (SEPGG) model to study the dynamics of individual career choice and the corresponding social output. Based on social value orientation theory, we define two classes of work: public work, which serves public interests, and private work, which serves personal interests. In the context of SEPGG, choosing public work is to cooperate and choosing private work is to defect. We then investigate the effects of employee productivity, human capital and external subsidies on individual choices between the two work types, as well as on overall social welfare. Simulation results show that when employee productivity of public work is low, people are more willing to enter the private sector. Although this makes both the effort level and human capital of individuals doing private work higher than those of individuals engaged in public work, the total output of the private sector is still lower than that of the public sector, given a low level of public subsidies. When employee productivity is higher for public work, a certain amount of subsidy can greatly improve system output. By contrast, when employee productivity of public work is low, providing subsidies to the public sector can lead to a decline in social output.
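The mapping "public work = cooperate, private work = defect" rests on the standard public goods game payoff. In a group of size G with n_C cooperators, each cooperator pays a cost c, the pooled contributions are multiplied by a synergy factor r, and the result is shared equally among all G members. The sketch below shows only this textbook payoff. The extended model's productivity, human-capital and subsidy terms are not described in enough detail here to reproduce, and the parameter values are illustrative.

```python
def pgg_payoffs(strategies, r, cost=1.0):
    """Standard public goods game payoffs for one group.

    strategies: list of 'C' (cooperate, i.e. public work) or 'D' (defect, i.e. private work).
    Each cooperator contributes `cost`; the pot is multiplied by r and split equally.
    """
    g = len(strategies)
    n_c = strategies.count('C')
    share = r * cost * n_c / g  # everyone receives the same share of the multiplied pot
    return [share - cost if s == 'C' else share for s in strategies]

# Five players, three cooperators, r = 3: defectors free-ride on the public good.
print(pgg_payoffs(['C', 'C', 'C', 'D', 'D'], r=3.0))
```

In spatial variants, each agent typically sits on a lattice, takes part in the groups centered on itself and its neighbors, sums the resulting payoffs, and updates its strategy by imitating better-performing neighbors.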

preprint · 2017 · arXiv

Event-Radar: Real-time Local Event Detection System for Geo-Tagged Tweet Streams

Local event detection uses geo-tagged posts on social networks to reveal ongoing events and their locations. Recent studies have demonstrated that the geo-tagged tweet stream is an unprecedentedly valuable source for local event detection. Nevertheless, effectively extracting local events from large geo-tagged tweet streams in real time remains challenging. A robust and efficient cloud-based real-time local event detection system would benefit many aspects of society, from shopping recommendations by customer service providers to disaster alerts for emergency departments. We take the earlier GeoBurst work, which proposed a novel method for detecting local events, as a starting point. GeoBurst+ leverages a novel cross-modal authority measure to identify several pivots in the query window. Such pivots reveal different geo-topical activities and naturally attract related tweets to form candidate events. It further summarises the continuous stream and compares the candidates against the historical summaries to pinpoint truly interesting local events. We implement Event-Radar, a website demonstration system with an improved algorithm that shows real-time local events online for public interest. Better still, as the query window shifts, our method can update the event list at little time cost, thus achieving continuous monitoring of the stream.

preprint · 2016 · arXiv

Diamond nanothread as a new reinforcement for nanocomposites

This work explores the application of a new one-dimensional carbon nanomaterial, the diamond nanothread (DNT), as a reinforcement for nanocomposites. Owing to the existence of Stone-Wales transformation defects, the DNT intrinsically possesses irregular surfaces, which are expected to enhance non-covalent interfacial load transfer. Through a series of in silico pull-out studies of the DNT in a polyethylene (PE) matrix, we found that load transfer between the DNT and the PE matrix is dominated by non-covalent interactions, in particular van der Waals interactions. Although the hydrogenated surface of the DNT reduces the strength of the van der Waals interactions at the interface, the irregular surface of the DNT can compensate for the weak bonds. These factors give the DNT/PE interface an interfacial shear strength comparable to that of the carbon nanotube (CNT)/PE interface. Our results show that the DNT/PE interfacial shear strength remains high even as the number of Stone-Wales transformation defects decreases. It can be further enhanced by increasing the PE density or by introducing functional groups to the DNT, both of which greatly increase the non-covalent interactions.

preprint · 2016 · arXiv

Equity Market Impact Modeling: an Empirical Analysis for Chinese Market

Market impact has become a subject of increasing concern among academics and industry experts. We put forward a price impact model that considers the heteroscedasticity of price in the time dimension and the dependency between permanent and temporary impact. We discuss and derive the extremum of the expectation of permanent impact and realized impact by constructing several special trading trajectories. Using 17,213,238,343 trade and quote tick records compiled from the Chinese stock market, our assessment suggests that our model outperforms Almgren's model. Interestingly, random effect analysis indicates that the parameter α, the exponent of the impact function, is a constant of around 0.7 across all stocks. Our model and empirical results could give academia some insight into the mechanism of the Chinese market and can be applied to algorithmic trading.

preprint · 2016 · arXiv

From Brittle to Ductile: A Structure Dependent Ductility of Diamond Nanothread

As a potential building block for the next generation of devices and multifunctional materials that span almost every technology sector, one-dimensional (1D) carbon nanomaterials have received intensive research interest. Recently, a new ultra-thin diamond nanothread (DNT) has joined this palette; it is a 1D structure with poly-benzene sections connected by Stone-Wales (SW) transformation defects. Using large-scale molecular dynamics simulations, we found that this sp3-bonded DNT can transition from brittle to ductile behavior as the length of the poly-benzene sections varies, suggesting that DNT has entirely different mechanical responses from other 1D carbon allotropes. By analogy, the SW defects behave like grain boundaries that interrupt the continuity of the poly-benzene sections. For a DNT of fixed length, the yield strength fluctuates around a certain value and is independent of the "grain size". On the other hand, both yield strength and yield strain show a clear dependence on the total length of the DNT, because failure of the DNT is dominated by the SW defects. Its highly tunable ductility, together with its ultra-light density and high Young's modulus, makes diamond nanothread ideal for creating extremely strong three-dimensional nano-architectures.

preprint · 2014 · arXiv

A McKean optimal transportation perspective on Feynman-Kac formulae with application to data assimilation

Data assimilation is the task of combining mathematical models with observational data. From a mathematical perspective, data assimilation leads to Bayesian inference problems that can be formulated in terms of Feynman-Kac formulae. In this paper we focus on the sequential nature of many data assimilation problems and their numerical implementation in the form of Monte Carlo methods. We demonstrate how sequential data assimilation can be interpreted as time-dependent Markov processes, which is often referred to as the McKean approach to Feynman-Kac formulae. We show that the McKean approach has very natural links to coupling of random variables and optimal transportation. This link allows us to propose novel sequential Monte Carlo methods/particle filters. In combination with localization, these novel algorithms have the potential to beat the curse of dimensionality, which has prevented particle filters from being applied to spatially extended systems.
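One way to see the link between particle filters and optimal transportation is to replace random resampling with a deterministic transport step. In one dimension, the optimal coupling (for a convex cost) between a weighted particle cloud and an equally weighted one is the monotone coupling. That coupling can be built with the northwest-corner rule after sorting. The sketch below uses it to produce N equally weighted particles, each being the conditional mean of the mass coupled to it, in the spirit of the ensemble transform particle filter. It is an illustration of the transport idea under these 1D assumptions, not the paper's algorithm, and the function name is hypothetical.

```python
def ot_resample_1d(particles, weights):
    """Deterministic 1D resampling via the monotone (optimal) coupling.

    Returns N equally weighted particles, in ascending order, whose mean equals
    the weighted mean of the input.
    """
    n = len(particles)
    total = sum(weights)
    order = sorted(range(n), key=lambda k: particles[k])
    xs = [particles[k] for k in order]
    ws = [weights[k] / total for k in order]
    target = 1.0 / n   # each new particle carries mass 1/N
    eps = 1e-15        # tolerance for float round-off in the remaining mass
    new = [0.0] * n
    i, j = 0, 0
    supply, demand = ws[0], target
    # Northwest-corner rule on sorted supports = optimal coupling for convex costs in 1D.
    while i < n and j < n:
        mass = min(supply, demand)
        new[j] += n * mass * xs[i]  # new particle j = mean of the mass coupled to it
        supply -= mass
        demand -= mass
        if supply <= eps:
            i += 1
            if i < n:
                supply = ws[i]
        if demand <= eps:
            j += 1
            demand = target
    return new

# Weighted cloud with weighted mean 0.25 becomes [-1.0, -0.6, 0.6, 2.0] (mean 0.25).
print(ot_resample_1d([0.5, -1.0, 2.0, 0.0], [0.1, 0.4, 0.3, 0.2]))
```

Unlike multinomial resampling, this step introduces no extra sampling noise and preserves the weighted mean exactly (up to round-off). Higher-dimensional versions require solving a linear transport problem, which is where localization becomes important.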

preprint · 2012 · arXiv

A Multivariate Graphical Stochastic Volatility Model

The Gaussian Graphical Model (GGM) is a popular tool for incorporating sparsity into joint multivariate distributions. The G-Wishart distribution, a conjugate prior for precision matrices satisfying general GGM constraints, has now been in existence for over a decade. However, due to the lack of a direct sampler, its use has been limited in hierarchical Bayesian contexts, relegating mixing over the class of GGMs mostly to situations involving standard Gaussian likelihoods. Recent work, however, has developed methods that couple model and parameter moves, first through reversible jump methods and later by direct evaluation of conditional Bayes factors and subsequent resampling. Further, methods for avoiding prior normalizing constant calculations--a serious bottleneck and source of numerical instability--have been proposed. We review and clarify these developments and then propose a new methodology for GGM comparison that blends many recent themes. Theoretical developments and computational timing experiments reveal an algorithm that has limited computational demands and dramatically improves on computing times of existing methods. We conclude by developing a parsimonious multivariate stochastic volatility model that embeds GGM uncertainty in a larger hierarchical framework. The method is shown to be capable of adapting to the extreme swings in market volatility experienced in 2008 after the collapse of Lehman Brothers, offering considerable improvement in posterior predictive distribution calibration.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint

Fields this researcher appears in

Source provenance

Where this author record came from

arXiv · confidence 95%

external id: arxiv:2512.24002:author:10:yuan-cheng

Imported May 21, 2026 · Synced May 21, 2026

arXiv · confidence 95%

external id: arxiv:2605.18055:author:8:yuan-cheng

Imported May 20, 2026 · Synced May 21, 2026

arXiv · confidence 95%

external id: arxiv:2605.14654:author:10:yuan-cheng

Imported May 20, 2026 · Synced May 20, 2026

arXiv · confidence 95%

external id: arxiv:2605.17742:author:6:yuan-cheng

Imported May 20, 2026 · Synced May 20, 2026

3 works

Chen Jiang

Researcher

Chen Jiang contributes to research discovery and scholarly infrastructure.

Open to collaborate

3 works

Gang Zhang

Researcher

Gang Zhang contributes to research discovery and scholarly infrastructure.

Open to collaborate

3 works

Tan Pan

Researcher

Tan Pan contributes to research discovery and scholarly infrastructure.

Open to collaborate

2 works

Furong Xu

Researcher

Furong Xu contributes to research discovery and scholarly infrastructure.

Open to collaborate
