Researcher profile

Jun Cheng

Jun Cheng contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust score 21 (Emerging) · Verification L1 · Unclaimed author
21 works
0 followers
17 topics
4 close collaborators

Published work

21 published items

Preprint · 2026 · arXiv

Circular Phase Representation and Geometry-Aware Optimization for Ptychographic Image Reconstruction

Traditional iterative reconstruction methods are accurate but computationally expensive, limiting their use in high-throughput and real-time ptychography. Recent deep learning approaches improve speed, but often predict phase as a Euclidean scalar despite its $2\pi$ periodicity, which can introduce wrapping artifacts, discontinuities at $\pm\pi$, and a mismatch between the loss and the underlying signal geometry. We present a deep learning framework for ptychographic reconstruction that models phase on the unit circle using cosine and sine components. Phase error is optimized with a differentiable geodesic loss, which avoids branch-cut discontinuities and provides bounded gradients. The network further incorporates saturation-aware dual-gain input scaling, parallel encoder branches, and three decoders for amplitude, cosine, and sine prediction, together with a composite loss that promotes circular consistency and structural fidelity. Experiments on synthetic and experimental datasets show consistent improvements in both amplitude and phase reconstruction over existing deep learning methods. Frequency-domain analysis further shows better preservation of mid- and high-frequency phase content. The proposed method also provides substantial speedup over iterative solvers while maintaining physically consistent reconstructions.
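
The abstract does not give the exact form of the geodesic loss. The sketch below is a minimal illustration under stated assumptions: the network emits an unnormalized (cos, sin) pair per pixel, and the per-pixel error is the wrapped angular distance between predicted and true phase. The function names are hypothetical. It shows why a circular error avoids the wrapping artifact that a Euclidean phase difference produces near $\pm\pi$.

```python
import math


def to_unit_circle(c: float, s: float) -> tuple[float, float]:
    """Project a raw (cos, sin) prediction onto the unit circle."""
    norm = math.hypot(c, s)
    if norm == 0.0:
        return 1.0, 0.0  # degenerate prediction: fall back to phase 0
    return c / norm, s / norm


def geodesic_phase_error(c_pred: float, s_pred: float, phi_true: float) -> float:
    """Shortest arc length in [0, pi] between the predicted and true phase."""
    c, s = to_unit_circle(c_pred, s_pred)
    ct, st = math.cos(phi_true), math.sin(phi_true)
    # sin and cos of (phi_pred - phi_true) from the angle-difference identities
    sin_d = s * ct - c * st
    cos_d = c * ct + s * st
    return abs(math.atan2(sin_d, cos_d))


def mean_geodesic_loss(preds: list[tuple[float, float]], targets: list[float]) -> float:
    """Average per-pixel geodesic phase error over a flattened image."""
    errors = [geodesic_phase_error(c, s, t) for (c, s), t in zip(preds, targets)]
    return sum(errors) / len(errors)


# Two phases just either side of the +/-pi branch cut
phi_pred, phi_true = math.pi - 0.01, -math.pi + 0.01
naive = abs(phi_pred - phi_true)  # about 6.263: the Euclidean wrapping artifact
geo = geodesic_phase_error(math.cos(phi_pred), math.sin(phi_pred), phi_true)  # about 0.02
print(f"naive={naive:.3f} geodesic={geo:.3f}")
```

Normalizing the (cos, sin) pair first makes the error depend only on the predicted angle, not on the magnitude of the raw network output.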

Preprint · 2026 · arXiv

Degradation-Aware Adaptive Context Gating for Unified Image Restoration

Unified image restoration using a single model often faces task interference due to diverse degradations. To address this, we propose DACG-IR (Degradation-Aware Adaptive Context Gating), which enables explicit perception of degradation characteristics to dynamically modulate feature representations. Our method constructs degradation-aware contextual representations from the input to modulate attention distribution, frequency-domain features, and feature aggregation. Specifically, a lightweight multi-scale degradation-aware module extracts coarse degradation information and generates layer-wise prompts. These prompts guide attention temperature and output gating in encoder and decoder blocks for adaptive feature extraction. Additionally, a spatial-channel dual-gated adaptive fusion mechanism refines encoder features, suppressing noise propagation from shallow to deep layers. This design effectively suppresses degradation-induced noise while preserving informative structures. Experiments show DACG-IR outperforms state-of-the-art methods in single-task, all-in-one, adverse weather removal, and composite degradation settings. Code: https://github.com/HlHomes/DACG-IR-code

Preprint · 2026 · arXiv

Plasticine: A Traceable Diffusion Model for Medical Image Translation

Domain gaps arising from variations in imaging devices and population distributions pose significant challenges for machine learning in medical image analysis. Existing image-to-image translation methods primarily aim to learn mappings between domains, often generating diverse synthetic data with variations in anatomical scale and shape, but they usually overlook spatial correspondence during the translation process. For clinical applications, traceability, defined as the ability to provide pixel-level correspondences between original and translated images, is equally important. This property enhances clinical interpretability but has been largely overlooked in previous approaches. To address this gap, we propose Plasticine, which is, to the best of our knowledge, the first end-to-end image-to-image translation framework explicitly designed with traceability as a core objective. Our method combines intensity translation and spatial transformation within a denoising diffusion framework. This design enables the generation of synthetic images with interpretable intensity transitions and spatially coherent deformations, supporting pixel-wise traceability throughout the translation process.

Preprint · 2022 · arXiv

A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications

To understand human behaviors, video-based action recognition is a common approach. Compared with images, videos provide much more information, which helps reduce the ambiguity of actions, and over the last decade many works on datasets, novel models and learning approaches have brought video action recognition to a higher level. However, challenges and unsolved problems remain, particularly in sports analytics, where data collection and labeling are more demanding and require sports professionals to annotate data. In addition, actions can be extremely fast, which makes them difficult to recognize. Moreover, in team sports such as football and basketball, one action can involve multiple players, and recognizing it correctly requires analysing all of them, which is relatively complicated. In this paper, we present a survey on video action recognition for sports analytics. We cover more than ten types of sports, including team sports, such as football, basketball, volleyball and hockey, and individual sports, such as figure skating, gymnastics, table tennis, tennis, diving and badminton. We then compare numerous existing frameworks for sports analysis to present the status quo of video action recognition in both team sports and individual sports. Finally, we discuss the challenges and unsolved problems in this area, and, to facilitate sports analytics, we develop a toolbox using PaddlePaddle that supports football, basketball, table tennis and figure skating action recognition.

Preprint · 2022 · arXiv

Detecting the stochastic gravitational wave background with the TianQin detector

The detection of the stochastic gravitational wave background (SGWB) is among the leading scientific goals of space-borne gravitational wave observatories and would have a significant impact on astrophysics and fundamental physics. In this work, we develop a data analysis software package, TQSGWB, which can extract an isotropic SGWB using Bayesian analysis for the TianQin detector. We find that the noise cross spectrum has imaginary components, and these play an important role in breaking the degeneracy of the position noise in the common laser link. When the imaginary corrections are considered, the credible regions of the position noise parameters shrink by two orders of magnitude. We demonstrate that, in the absence of a Galactic confusion foreground, the parameters of various signals and instrumental noise can be estimated directly through Markov chain Monte Carlo sampling. With only three months of observation, we find that TianQin could confidently detect SGWBs with energy densities as low as $\Omega_{\rm PL} = 1.3 \times 10^{-12}$, $\Omega_{\rm Flat} = 6.0 \times 10^{-12}$, and $\Omega_{\rm SP} = 9.0 \times 10^{-12}$ for the power-law, flat, and single-peak models, respectively.

Preprint · 2022 · arXiv

HASA: Hybrid Architecture Search with Aggregation Strategy for Echinococcosis Classification and Ovary Segmentation in Ultrasound Images

Unlike handcrafted features, deep neural networks can automatically learn task-specific features from data. Owing to this data-driven nature, they have achieved remarkable success in various areas. However, manually designing and selecting suitable network architectures is time-consuming and requires substantial effort from human experts. To address this problem, researchers have proposed neural architecture search (NAS) algorithms, which can automatically generate network architectures but suffer from heavy computational cost and instability when searching from scratch. In this paper, we propose a hybrid NAS framework for ultrasound (US) image classification and segmentation. The hybrid framework consists of a pre-trained backbone and several searched cells (i.e., network building blocks), and thus takes advantage of both NAS and the expert knowledge embedded in existing convolutional neural networks. Specifically, two effective and lightweight operations, a mixed depth-wise convolution operator and a squeeze-and-excitation block, are introduced into the candidate operations to enhance the variety and capacity of the searched cells. These two operations not only decrease model parameters but also boost network performance. Moreover, we propose a re-aggregation strategy for the searched cells, aiming to further improve performance on different vision tasks. We tested our method on two large US image datasets: a 9-class echinococcosis dataset containing 9566 images for classification and an ovary dataset containing 3204 images for segmentation. Ablation experiments and comparisons with other handcrafted or automatically searched architectures demonstrate that our method can generate more powerful and lightweight models for these US image classification and segmentation tasks.
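
The squeeze-and-excitation block named above is a standard channel-attention unit. Below is a minimal, framework-free sketch of its forward pass, assuming a feature map stored as one flat list of activations per channel and bias-free fully connected layers. The helper names and weight layout are illustrative, not taken from the HASA code.

```python
import math


def matvec(weights: list[list[float]], vec: list[float]) -> list[float]:
    """Dense layer without bias: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def squeeze_excite(x: list[list[float]], w1: list[list[float]], w2: list[list[float]]) -> list[list[float]]:
    """Forward pass of a squeeze-and-excitation block.

    x  : C channels, each a flat list of H*W activations
    w1 : (C // r) x C reduction weights
    w2 : C x (C // r) expansion weights
    """
    z = [sum(ch) / len(ch) for ch in x]                          # squeeze: global average pool per channel
    h = [max(0.0, v) for v in matvec(w1, z)]                     # excitation: reduce + ReLU
    gates = [1.0 / (1.0 + math.exp(-v)) for v in matvec(w2, h)]  # expand + sigmoid: one gate per channel
    return [[g * a for a in ch] for g, ch in zip(gates, x)]      # rescale each channel by its gate


# Two channels, reduction ratio r = 2 (hidden width 1)
x = [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]]
out = squeeze_excite(x, w1=[[1.0, 1.0]], w2=[[1.0], [-1.0]])
```

Each gate lies in (0, 1), so the block reweights channels at the cost of two small dense layers on a length-C vector, which is why it is cheap compared with a convolution over the full feature map.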

Preprint · 2022 · arXiv

Personalized Diagnostic Tool for Thyroid Cancer Classification using Multi-view Ultrasound

Over the past decades, the incidence of thyroid cancer has been increasing globally. Accurate and early diagnosis allows timely treatment and helps to avoid over-diagnosis. Clinically, a nodule is commonly evaluated from both transverse and longitudinal views using thyroid ultrasound. However, the appearance of the thyroid gland and lesions can vary dramatically across individuals. Identifying key diagnostic information from both views requires specialized expertise. Furthermore, finding an optimal way to integrate multi-view information also relies on clinicians' experience and adds further difficulty to accurate diagnosis. To address these challenges, we propose a personalized diagnostic tool that can customize its decision-making process for different patients. It consists of a multi-view classification module for feature extraction and a personalized weighting allocation network that generates optimal weights for different views. It is also equipped with a self-supervised view-aware contrastive loss to further improve model robustness across different patient groups. Experimental results show that the proposed framework makes better use of multi-view information and outperforms competing methods.

Preprint · 2022 · arXiv

Planar Hall effect induced spin rectification effect and its strong impact on spin pumping measurements

Spin pumping is a technique widely used to generate pure spin currents and to characterize spin-charge conversion in various systems. The sign reversal of the symmetric Lorentzian component of the charge current under opposite magnetic fields is generally accepted as the key criterion for identifying its pure-spin-current origin. However, we find that the rectified voltage due to the planar Hall effect can exhibit a similar spurious signal, complicating and even misleading the analysis. The distribution of the microwave magnetic field and the induction current strongly influences the magnetic-field symmetry and lineshape of the measured signal. We further demonstrate a geometry in which the spin-charge conversion and the rectified voltage can be readily distinguished with a straightforward symmetry analysis.

Preprint · 2022 · arXiv

Real-time Semantic Segmentation via Spatial-detail Guided Context Propagation

Nowadays, vision-based computing tasks play an important role in various real-world applications. However, many vision computing tasks, e.g. semantic segmentation, are computationally expensive, which poses a challenge for computing systems that are resource-constrained but require fast responses. It is therefore valuable to develop accurate, real-time vision processing models that require only limited computational resources. To this end, we propose the Spatial-detail Guided Context Propagation Network (SGCPNet) for real-time semantic segmentation. In SGCPNet, we propose a spatial-detail guided context propagation strategy. It uses the spatial details of shallow layers to guide the propagation of low-resolution global contexts, so that the lost spatial information can be effectively reconstructed. As a result, the network no longer needs to maintain high-resolution features throughout, which greatly improves model efficiency. At the same time, because spatial details are effectively reconstructed, segmentation accuracy is still preserved. In experiments, we validate the effectiveness and efficiency of the proposed SGCPNet model. On the Cityscapes dataset, for example, SGCPNet achieves 69.5% mIoU while running at 178.5 FPS on 768×1536 images on a GeForce GTX 1080 Ti GPU. In addition, SGCPNet is very lightweight, with only 0.61M parameters.

Preprint · 2022 · arXiv

Science with the TianQin Observatory: Preliminary Results on Stochastic Gravitational-Wave Background

In this work, we study the prospects of detecting the stochastic gravitational-wave background with the TianQin Observatory. We consider sources of both astrophysical and cosmological origin, including stellar-mass binary black holes, binary neutron stars, Galactic white dwarfs, inflation, first-order phase transitions, and cosmic defects. For the detector configurations, we consider TianQin, TianQin I+II, and TianQin + LISA. We study the detectability of stochastic gravitational-wave backgrounds with both the cross-correlation and null-channel methods, and present the corresponding power-law integrated sensitivity curves. We introduce the definition of a "joint foreground" for a network of detectors. With the joint foreground, the number of resolved double white dwarfs in the Galaxy increases by 5% to 22% compared with a simple combination of individual detectors. The astrophysical background is expected to be detectable with a signal-to-noise ratio of 100 after 5 years of operation and to be dominated by extragalactic double white dwarfs. On the other hand, owing to the uncertain nature of the underlying models, we can only estimate the detection capability for the cosmological background in specific cases.
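
Power-law integrated (PI) sensitivity curves are a common way to summarize SGWB detectability. The sketch below is a minimal illustration, assuming an effective noise curve $\Omega_{\rm eff}(f)$ sampled on a frequency grid and the two-detector cross-correlation SNR $\rho = \sqrt{2T\int df\,[\Omega_{\rm gw}(f)/\Omega_{\rm eff}(f)]^2}$. The prefactor and $\Omega_{\rm eff}$ differ for null-channel or single-detector analyses, and the function names, reference frequency, index range and toy noise curve are illustrative assumptions. For each spectral index, it finds the amplitude that gives the threshold SNR, then takes the envelope of the resulting power laws.

```python
import math


def trapezoid(y: list[float], x: list[float]) -> float:
    """Trapezoidal-rule integral of samples y over grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def pi_curve(freqs, omega_eff, t_obs, snr_threshold=1.0, f_ref=1e-2, betas=None):
    """Power-law integrated sensitivity curve on the grid `freqs` (Hz).

    omega_eff : effective noise energy density at each frequency (assumed given)
    t_obs     : observation time in seconds
    """
    if betas is None:
        betas = [b / 2 for b in range(-16, 17)]  # spectral indices -8 to 8 in steps of 0.5
    envelope = [0.0] * len(freqs)
    for beta in betas:
        shape = [(f / f_ref) ** beta for f in freqs]
        integrand = [(s / oe) ** 2 for s, oe in zip(shape, omega_eff)]
        # amplitude at f_ref that yields exactly the threshold SNR for this index
        amp = snr_threshold / math.sqrt(2.0 * t_obs * trapezoid(integrand, freqs))
        envelope = [max(e, amp * s) for e, s in zip(envelope, shape)]
    return envelope


# Toy, flat noise curve between 0.1 mHz and 1 Hz; three months of observation
freqs = [10 ** (-4 + 4 * k / 200) for k in range(201)]
omega_eff = [1e-8] * len(freqs)
three_months = 90 * 86400.0
curve = pi_curve(freqs, omega_eff, three_months)
```

Because every amplitude scales as $1/\sqrt{T}$, quadrupling the observation time halves the whole curve. A real TianQin analysis would replace the flat toy $\Omega_{\rm eff}$ with the detector's noise model.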

Preprint · 2022 · arXiv

Sketch guided and progressive growing GAN for realistic and editable ultrasound image synthesis

Ultrasound (US) imaging is widely used for anatomical structure inspection in clinical diagnosis. Training new sonographers and deep-learning-based algorithms for US image analysis usually requires a large amount of data. However, obtaining and labeling large-scale US imaging data is not easy, especially for diseases with low incidence. Realistic US image synthesis can alleviate this problem to a great extent. In this paper, we propose a generative adversarial network (GAN) based image synthesis framework. Our main contributions include: 1) we present the first work that can synthesize realistic B-mode US images with high resolution and customized texture editing features; 2) to enhance the structural details of generated images, we introduce auxiliary sketch guidance into a conditional GAN, superposing the edge sketch onto the object mask and using the composite mask as the network input; 3) to generate high-resolution US images, we adopt a progressive training strategy that gradually generates high-resolution images from low-resolution ones. In addition, a feature loss is proposed to minimize the difference in high-level features between generated and real images, which further improves the quality of the generated images; 4) the proposed US image synthesis method is quite universal and can be generalized to US images of anatomical structures other than the three tested in our study (lung, hip joint, and ovary); 5) extensive experiments on three large US image datasets are conducted to validate our method. Ablation studies, customized texture editing, user studies, and segmentation tests demonstrate promising results of our method in synthesizing realistic US images.

Preprint · 2021 · arXiv

Mechanical Properties of Atomically Thin Tungsten Dichalcogenides: WS$_2$, WSe$_2$ and WTe$_2$

Two-dimensional (2D) tungsten disulfide (WS$_2$), tungsten diselenide (WSe$_2$), and tungsten ditelluride (WTe$_2$) draw increasing attention due to their attractive properties deriving from the heavy tungsten and chalcogenide atoms, but their mechanical properties are still mostly unknown. Here, we determine the intrinsic and air-aged mechanical properties of mono-, bi-, and trilayer (1-3L) WS$_2$, WSe$_2$ and WTe$_2$ using a complementary suite of experiments and theoretical calculations. High-quality 1L WS$_2$ has the highest Young's modulus (302.4 ± 24.1 GPa) and strength (47.0 ± 8.6 GPa) of the entire family, surpassing those of 1L WSe$_2$ (258.6 ± 38.3 and 38.0 ± 6.0 GPa, respectively) and WTe$_2$ (149.1 ± 9.4 and 6.4 ± 3.3 GPa, respectively). However, among the three materials, the elasticity and strength of WS$_2$ decrease most dramatically with increasing thickness. We attribute this phenomenon to the different tendencies for interlayer sliding in the equilibrium state and under in-plane strain and out-of-plane compression during indentation, as revealed by finite element method (FEM) and density functional theory (DFT) calculations including van der Waals (vdW) interactions. We also demonstrate that the mechanical properties of high-quality 1-3L WS$_2$ and WSe$_2$ are largely stable in air for up to 20 weeks. Intriguingly, 1-3L WSe$_2$ shows increased modulus and strength with aging in air, which is ascribed to oxygen doping that reinforces the structure. The present study will facilitate the design and use of 2D tungsten dichalcogenides in applications such as strain engineering and flexible field-effect transistors (FETs).

Preprint · 2021 · arXiv

Near-Optimal Detection for Both Data and Sneak-Path Interference in Resistive Memories with Random Cell Selector Failures

Resistive random-access memory is one of the most promising candidates for the next generation of non-volatile memory technology. However, its crossbar structure causes severe "sneak-path" interference, which also leads to strong inter-cell correlation. Recent works have mainly focused on sub-optimal data detection schemes by ignoring inter-cell correlation and treating sneak-path interference as independent noise. We propose a near-optimal data detection scheme that can approach the performance bound of the optimal detection scheme. Our detection scheme leverages a joint data and sneak-path interference recovery and can use all inter-cell correlations. The scheme is appropriate for data detection of large memory arrays with only linear operation complexity.

Preprint · 2020 · arXiv

Digital resolution enhancement in low transverse sampling optical coherence tomography angiography using deep learning

Optical coherence tomography angiography (OCTA) requires high transverse sampling density to visualize retinal and choroidal capillaries. Low transverse sampling degrades resolution, as in wide-field OCTA angiograms. In this paper, we propose to address this problem using deep learning. We conducted extensive experiments on converting the centrally cropped 3 × 3 mm² field of view (FOV) of 8 × 8 mm² foveal OCTA images (a sampling density of 22.9 μm) to native 3 × 3 mm² en face OCTA images (a sampling density of 12.2 μm). We employed a cycle-consistent adversarial network architecture for this conversion. Quantitative analysis using perceptual similarity measures shows that the generated OCTA images are closer to the native 3 × 3 mm² scans. The results also show that the proposed method could enhance the signal-to-noise ratio. We further applied our method to enhance diseased cases and calculate vascular biomarkers, which demonstrates its generalization performance and clinical potential.

Preprint · 2020 · arXiv

Encoding Structure-Texture Relation with P-Net for Anomaly Detection in Retinal Images

Anomaly detection in retinal images refers to identifying abnormalities caused by various retinal diseases or lesions while leveraging only normal images in the training phase. Normal images from healthy subjects often have regular structures (e.g., the structured blood vessels in fundus images, or the structured anatomy in optical coherence tomography images). In contrast, diseases and lesions often destroy these structures. Motivated by this, we propose to leverage the relation between image texture and structure to design a deep neural network for anomaly detection. Specifically, we first extract the structure of the retinal images, then combine the structure features with the last-layer features extracted from the original healthy image to reconstruct the input healthy image. The image features provide the texture information and guarantee the uniqueness of the image recovered from the structure. Finally, we extract the structure from the reconstructed image and measure the difference between the structures extracted from the original and reconstructed images. On the one hand, minimizing this reconstruction difference acts as a regularizer that guarantees the image is correctly reconstructed. On the other hand, the structure difference can also be used as a normality metric. The whole network is termed P-Net because it has a "P" shape. Extensive experiments on the RESC and iSee datasets validate the effectiveness of our approach for anomaly detection in retinal images. Furthermore, our method generalizes well to novel class discovery in retinal images and anomaly detection in real-world images.

Preprint · 2020 · arXiv

Self-Supervised Gait Encoding with Locality-Aware Attention for Person Re-Identification

Gait-based person re-identification (Re-ID) is valuable for safety-critical applications, and using only 3D skeleton data to extract discriminative gait features for person Re-ID is an emerging open topic. Existing methods either adopt hand-crafted features or learn gait features with traditional supervised learning paradigms. Unlike previous methods, we propose, for the first time, a generic gait encoding approach that can use unlabeled skeleton data to learn gait representations in a self-supervised manner. Specifically, we first introduce self-supervision by learning to reconstruct input skeleton sequences in reverse order, which facilitates learning richer high-level semantics and better gait representations. Second, inspired by the fact that the continuity of motion gives temporally adjacent skeletons higher correlations ("locality"), we propose a locality-aware attention mechanism that encourages larger attention weights for temporally adjacent skeletons when reconstructing the current skeleton, so as to learn locality when encoding gait. Finally, we propose Attention-based Gait Encodings (AGEs), built from context vectors learned by locality-aware attention, as the final gait representations. AGEs are used directly to realize effective person Re-ID. Our approach typically improves on existing skeleton-based methods by 10-20% in Rank-1 accuracy, and it achieves comparable or even superior performance to multi-modal methods that use extra RGB or depth information. Our code is available at https://github.com/Kali-Hac/SGE-LA.

Preprint · 2020 · arXiv

Solutions to the mean king's problem: higher-dimensional quantum error-correcting codes

The mean king's problem is a quantum state discrimination problem. In the problem, we try to discriminate eigenstates of noncommutative observables with the help of delayed classical information. The problem has been investigated from the viewpoint of error detection and correction. We construct higher-dimensional quantum error-correcting codes against errors corresponding to the noncommutative observables. Any code state of these codes provides a way to discriminate the eigenstates correctly with the delayed classical information.

Preprint · 2020 · arXiv

Sparse-GAN: Sparsity-constrained Generative Adversarial Network for Anomaly Detection in Retinal OCT Image

With the development of convolutional neural networks, deep learning has shown success in retinal disease detection from optical coherence tomography (OCT) images. However, deep learning often relies on large-scale labelled data for training, which is often challenging to obtain, especially for diseases with low occurrence. Moreover, a deep learning system trained on a dataset with one or a few diseases cannot detect other unseen diseases, which limits its practical use in disease screening. To address this limitation, we propose a novel anomaly detection framework, termed Sparsity-constrained Generative Adversarial Network (Sparse-GAN), for disease screening where only healthy data are available in the training set. The contributions of Sparse-GAN are twofold: 1) Sparse-GAN predicts anomalies in the latent space rather than at the image level; 2) Sparse-GAN is constrained by a novel Sparsity Regularization Net. Furthermore, in light of the role of lesions in disease screening, we propose to leverage an anomaly activation map to show a heatmap of lesions. We evaluate Sparse-GAN on a publicly available dataset, and the results show that the proposed method outperforms state-of-the-art methods.

Preprint · 2020 · arXiv

Theoretical study of kinetics of proton coupled electron transfer in photocatalysis

Photocatalysis induced by sunlight is one of the most promising approaches to environmental protection, solar energy conversion and sustainable fuel production. The computational modeling of photocatalysis is a rapidly expanding field that requires adapting and further developing the available theoretical tools. The coupled transfer of a proton and an electron is an important reaction during photocatalysis. In this work, we present the first step of our methodology development, in which we apply existing kinetic theory of such coupled transfer to a model system, namely methanol photo-dissociation on the rutile TiO$_2$(110) surface, with the help of high-level first-principles calculations. Moreover, we adapt the Stuchebrukhov-Hammes-Schiffer kinetic theory, using the Georgievskii-Stuchebrukhov vibronic coupling, to calculate the rate constant of the proton-coupled electron transfer reaction for a particular pathway. In particular, we propose a modified expression for the rate constant that enforces the near-resonance condition for the vibrational wavefunction during proton tunneling.

Preprint · 2020 · arXiv

Universal digital filtering for denoising volumetric retinal OCT and OCT angiography in 3D shearlet domain

Retinal optical coherence tomography (OCT) and OCT angiography (OCTA) suffer from image-quality degradation due to speckle noise and bulk-motion noise, respectively. Because the cross-sectional retina has distinct features in OCT and OCTA B-scans, existing digital filters that denoise OCT efficiently cannot handle the bulk-motion noise in OCTA. In this Letter, we propose a universal digital filtering approach capable of minimizing both types of noise. Since retinal capillaries in OCTA are hard to differentiate in B-scans but have distinct curvilinear structures in 3D volumes, we decompose the volumetric OCT and OCTA data with 3D shearlets, thereby efficiently separating the retinal tissue and vessels from the noise in this transform domain. Compared with wavelets and curvelets, shearlets provide a better representation of the layer edges in OCT and the vasculature in OCTA. Qualitative and quantitative results show that the proposed method outperforms state-of-the-art OCT and OCTA denoising methods. In addition, the superiority of 3D denoising is demonstrated by comparing 3D shearlet filtering with its 2D counterpart.

Preprint · 2020 · arXiv

Unsupervised Deformable Medical Image Registration via Pyramidal Residual Deformation Fields Estimation

Deformation field estimation is an important and challenging issue in many medical image registration applications. In recent years, deep learning has become a promising approach for simplifying registration problems and has been gradually applied to medical image registration. However, most existing deep learning registration methods do not consider that when the receptive field cannot cover the corresponding features in the moving and fixed images, the network cannot output accurate displacement values. Indeed, because of its limited receptive field, a 3 × 3 kernel has difficulty covering the corresponding features at high (original) resolution. Multi-resolution and multi-convolution techniques can mitigate but not avoid this problem. In this study, we constructed pyramidal feature sets on the moving and fixed images and used the warped moving features and the fixed features to estimate their "residual" deformation field at each scale, a module we call the Pyramidal Residual Deformation Field Estimation module (PRDFE-Module). The "total" deformation field at each scale was computed by upsampling and weighted summation of the "residual" deformation fields at all previous scales; this effectively and accurately transfers deformation fields from low to high resolution and is used to warp the moving features at each scale. Results on simulated and real brain data show that our method improves registration accuracy and the plausibility of the deformation field.
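
The accumulation of "residual" fields into a "total" field can be sketched in one dimension. This is a minimal illustration, not the PRDFE-Module itself. It assumes that each pyramid level doubles the resolution, that displacements are stored in pixel units (so they double when upsampled), linear interpolation for upsampling, that the sum includes the current scale's residual, and equal weights unless others are supplied. All function names and the weight layout are hypothetical, and the warping step is omitted.

```python
import math


def upsample2(field: list[float]) -> list[float]:
    """Double the resolution of a 1D displacement field given in pixels.

    Uses linear interpolation with half-pixel-centred sampling and doubles
    the values, because one coarse pixel spans two fine pixels.
    """
    n = len(field)
    out = []
    for i in range(2 * n):
        pos = (i + 0.5) / 2 - 0.5                    # fine-pixel centre in coarse coordinates
        lo = min(max(math.floor(pos), 0), n - 1)
        hi = min(lo + 1, n - 1)
        t = min(max(pos - lo, 0.0), 1.0)
        out.append(2.0 * ((1.0 - t) * field[lo] + t * field[hi]))
    return out


def total_fields(residuals: list[list[float]], weights=None) -> list[list[float]]:
    """Total field at scale k = weighted sum of upsampled residuals at scales 0..k.

    residuals[k] is the residual field at scale k (coarsest first);
    weights[k][j] (hypothetical) weights residual j when forming scale k.
    """
    totals = []
    for k, res_k in enumerate(residuals):
        acc = [0.0] * len(res_k)
        for j in range(k + 1):
            w = 1.0 if weights is None else weights[k][j]
            up = residuals[j]
            for _ in range(k - j):                   # bring scale j up to scale k
                up = upsample2(up)
            acc = [a + w * u for a, u in zip(acc, up)]
        totals.append(acc)
    return totals


# Coarse shift of 1 px, a finer correction of 0.5 px, no correction at the finest scale
fields = total_fields([[1.0, 1.0], [0.5] * 4, [0.0] * 8])
```

In this toy case, the 1 px coarse shift becomes 4 px two levels up, the 0.5 px correction becomes 1 px, and the finest total field is a uniform 5 px displacement.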