Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
21 works · 0 followers · 16 topics · 4 close collaborators

Published work

21 published items

preprint · 2026 · arXiv

CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training

As training scales grow, collective communication libraries (CCL) increasingly face anomalies arising from complex interactions among hardware, software, and environmental factors. These anomalies typically manifest as slow/hang communication, the most frequent and time-consuming category to diagnose. However, traditional diagnostic methods remain inaccurate and inefficient, frequently requiring hours or even days for root-cause analysis. To address this, we propose CCL-D, a high-precision diagnostic system designed to detect and locate slow/hang anomalies in large-scale distributed training. CCL-D integrates a rank-level real-time probe with an intelligent decision analyzer. The probe measures cross-layer anomaly metrics using a lightweight distributed tracing framework to monitor communication traffic. The analyzer performs automated anomaly detection and root-cause localization, precisely identifying the faulty GPU rank. Deployed on a 4,000-GPU cluster over one year, CCL-D achieved near-complete coverage of known slow/hang anomalies and pinpointed affected ranks within 6 minutes, substantially outperforming existing solutions.
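The abstract does not describe the analyzer's internals, so the following is only a minimal Python sketch of the general idea of rank-level localization, not CCL-D's method. It assumes a hypothetical per-rank record of the time each GPU rank entered one collective call: a rank that has not entered once a timeout has elapsed is flagged as a hang suspect, and a rank that entered far later than the median rank is flagged as slow. The function name, inputs and thresholds are all illustrative.

```python
from typing import Dict, List, Optional


def locate_suspect_ranks(
    entry_times: Dict[int, Optional[float]],
    now: float,
    hang_timeout: float,
    slow_threshold: float,
) -> Dict[str, List[int]]:
    """Classify ranks for a single collective call (hypothetical helper).

    entry_times maps rank -> time (s) at which the rank entered the collective,
    or None if it has not entered yet.
    """
    entered = {r: t for r, t in entry_times.items() if t is not None}
    missing = sorted(r for r, t in entry_times.items() if t is None)

    # Hang: some ranks never arrived although the first arrival has waited too long.
    if missing and entered and now - min(entered.values()) > hang_timeout:
        return {"hang": missing, "slow": []}
    # Without any arrival there is no reference time to compare against.
    if not entered:
        return {"hang": [], "slow": []}

    # Slow: ranks that arrived much later than the (upper-)median arrival.
    times = sorted(entered.values())
    median = times[len(times) // 2]
    slow = sorted(r for r, t in entered.items() if t - median > slow_threshold)
    return {"hang": [], "slow": slow}


# Rank 2 enters about 3.4 s after the median rank, so it is reported as slow.
print(locate_suspect_ranks({0: 10.0, 1: 10.1, 2: 13.5, 3: 10.05},
                           now=14.0, hang_timeout=60.0, slow_threshold=1.0))
```

In the collectives such a probe would watch, every rank blocks until the last one arrives, which is why the latest or missing arrival is a natural first suspect.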

preprint · 2026 · arXiv

STEP3-VL-10B Technical Report

We present STEP3-VL-10B, a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. STEP3-VL-10B is realized through two strategic shifts: first, a unified, fully unfrozen pre-training strategy on 1.2T multimodal tokens that integrates a language-aligned Perception Encoder with a Qwen3-8B decoder to establish intrinsic vision-language synergy; and second, a scaled post-training pipeline featuring over 1k iterations of reinforcement learning. Crucially, we implement Parallel Coordinated Reasoning (PaCoRe) to scale test-time compute, allocating resources to scalable perceptual reasoning that explores and synthesizes diverse visual hypotheses. Consequently, despite its compact 10B footprint, STEP3-VL-10B rivals or surpasses models 10$\times$-20$\times$ larger (e.g., GLM-4.6V-106B, Qwen3-VL-235B) and top-tier proprietary flagships like Gemini 2.5 Pro and Seed-1.5-VL. Delivering best-in-class performance, it records 92.2% on MMBench and 80.11% on MMMU, while excelling in complex reasoning with 94.43% on AIME2025 and 75.95% on MathVision. We release the full model suite to provide the community with a powerful, efficient, and reproducible baseline.

preprint · 2023 · arXiv

The Asymptotic Distribution of the MLE in High-dimensional Logistic Models: Arbitrary Covariance

We study the distribution of the maximum likelihood estimate (MLE) in high-dimensional logistic models, extending the recent results from Sur (2019) to the case where the Gaussian covariates may have an arbitrary covariance structure. We prove that in the limit of large problems holding the ratio between the number $p$ of covariates and the sample size $n$ constant, every finite list of MLE coordinates follows a multivariate normal distribution. Concretely, the $j$th coordinate $\hat\beta_j$ of the MLE is asymptotically normally distributed with mean $\alpha_\star \beta_j$ and standard deviation $\sigma_\star/\tau_j$; here, $\beta_j$ is the value of the true regression coefficient, and $\tau_j$ the standard deviation of the $j$th predictor conditional on all the others. The numerical parameters $\alpha_\star > 1$ and $\sigma_\star$ depend only on the problem dimensionality $p/n$ and the overall signal strength, and can be accurately estimated. Our results imply that the MLE's magnitude is biased upwards and that the MLE's standard deviation is greater than that predicted by classical theory. We present a series of experiments on simulated and real data showing excellent agreement with the theory.
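The stated limit can be inverted into a bias-corrected confidence interval for $\beta_j$. A minimal Python sketch, assuming $\alpha_\star$, $\sigma_\star$ and $\tau_j$ have already been estimated (the estimation procedure is not shown, and the function name is hypothetical): since $(\hat\beta_j - \alpha_\star\beta_j)\,\tau_j/\sigma_\star$ is approximately standard normal, the interval is $[(\hat\beta_j - z\,\sigma_\star/\tau_j)/\alpha_\star,\ (\hat\beta_j + z\,\sigma_\star/\tau_j)/\alpha_\star]$.

```python
from statistics import NormalDist


def corrected_interval(beta_hat_j: float, alpha_star: float, sigma_star: float,
                       tau_j: float, level: float = 0.95) -> tuple:
    """Confidence interval for beta_j from the asymptotic law
    beta_hat_j ~ N(alpha_star * beta_j, (sigma_star / tau_j) ** 2)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sigma_star / tau_j
    # Dividing by alpha_star (> 1) undoes the upward bias of the MLE.
    lower = (beta_hat_j - half_width) / alpha_star
    upper = (beta_hat_j + half_width) / alpha_star
    return lower, upper


print(corrected_interval(beta_hat_j=2.0, alpha_star=1.5, sigma_star=1.0, tau_j=2.0))
```

Compared with the classical interval centred at $\hat\beta_j$, this one is shifted toward zero and uses $\sigma_\star/\tau_j$ rather than the classical standard error.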

preprint · 2022 · arXiv

An Adaptively Resized Parametric Bootstrap for Inference in High-dimensional Generalized Linear Models

Accurate statistical inference in logistic regression models remains a critical challenge when the ratio between the number of parameters and the sample size is not negligible, because approximations based on either classical asymptotic theory or bootstrap calculations are grossly off the mark. This paper introduces a resized bootstrap method to infer model parameters in arbitrary dimensions. As in the parametric bootstrap, we resample observations from a distribution that depends on an estimated regression coefficient sequence. The novelty is that this estimate is actually far from the maximum likelihood estimate (MLE): it is informed by recent theory on the properties of the MLE in high dimensions and is obtained by appropriately shrinking the MLE towards the origin. We demonstrate that the resized bootstrap method yields valid confidence intervals in both simulated and real data examples. Our methods extend to other high-dimensional generalized linear models.
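A minimal Python sketch of the resampling step only, under stated assumptions: the MLE is shrunk toward the origin by a single fixed factor (a hypothetical stand-in for the paper's adaptive choice, which the abstract does not specify), and new binary responses are drawn from the resulting logistic model. Refitting the MLE on each bootstrap sample is omitted, and the function names are illustrative.

```python
import math
import random
from typing import List, Sequence


def _sigmoid(eta: float) -> float:
    # Numerically stable logistic function.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def resized_bootstrap_sample(X: Sequence[Sequence[float]], beta_mle: Sequence[float],
                             shrink: float, rng: random.Random) -> List[int]:
    """Draw one parametric-bootstrap response vector y* for logistic regression.

    `shrink` (0 <= shrink <= 1) is a hypothetical fixed resizing factor.
    """
    beta_resized = [shrink * b for b in beta_mle]  # shrink the MLE toward 0
    y_star = []
    for row in X:
        eta = sum(x * b for x, b in zip(row, beta_resized))  # linear predictor
        y_star.append(1 if rng.random() < _sigmoid(eta) else 0)
    return y_star


rng = random.Random(0)
X = [[1.0, 0.5], [0.2, -1.0], [-0.7, 0.3]]
print(resized_bootstrap_sample(X, beta_mle=[2.0, -1.0], shrink=0.5, rng=rng))
```

Each bootstrap replicate would refit the model on `(X, y_star)`; the spread of those refits around the resized coefficients is what the confidence intervals are built from.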

preprint · 2022 · arXiv

An Effective Graph Learning based Approach for Temporal Link Prediction: The First Place of WSDM Cup 2022

Temporal link prediction, one of the most important tasks on temporal graphs, has attracted considerable research attention. The WSDM Cup 2022 sought solutions that predict the existence probabilities of edges within given time spans over a temporal graph. This paper introduces the solution of AntGraph, which won 1st place in the competition. We first analyze the theoretical upper bound of performance when temporal information is removed, which implies that structure and attribute information on the graph alone could achieve strong performance. Based on this hypothesis, we then introduce several well-designed features. Finally, experiments on the competition datasets show the superiority of our proposal, which achieved AUC scores of 0.666 on dataset A and 0.902 on dataset B; ablation studies also confirm the effectiveness of each feature. Code is publicly available at https://github.com/im0qianqian/WSDM2022TGP-AntGraph.

preprint · 2022 · arXiv

Blind Image Super-resolution with Elaborate Degradation Modeling on Noise and Kernel

While research on model-based blind single image super-resolution (SISR) has achieved tremendous success recently, most methods do not sufficiently consider image degradation. First, they typically assume that image noise follows an independent and identically distributed (i.i.d.) Gaussian or Laplacian distribution, which greatly underestimates the complexity of real noise. Second, commonly used kernel priors (e.g., normalization, sparsity) are not effective enough to guarantee a rational kernel solution, which degrades the performance of the subsequent SISR task. To address these issues, this paper proposes a model-based blind SISR method under a probabilistic framework that carefully models image degradation from the perspectives of both noise and blur kernel. Specifically, instead of the traditional i.i.d. noise assumption, a patch-based non-i.i.d. noise model is proposed to handle complicated real noise, with the aim of increasing the model's degrees of freedom for noise representation. For the blur kernel, we construct a novel, concise yet effective kernel generator and plug it into the proposed blind SISR method as an explicit kernel prior (EKP). To solve the proposed model, a theoretically grounded Monte Carlo EM algorithm is designed. Comprehensive experiments demonstrate the superiority of our method over current state-of-the-art methods on synthetic and real datasets. The source code is available at https://github.com/zsyOAOA/BSRDM.
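To make the contrast with the i.i.d. assumption concrete, here is a minimal Python sketch, assuming zero-mean Gaussian noise within each patch: given a residual map (noisy image minus a current clean estimate), it returns one maximum-likelihood variance per non-overlapping patch instead of a single global variance. This illustrates the non-i.i.d. idea only; it is not the paper's probabilistic model or its Monte Carlo EM solver, and the function name is hypothetical.

```python
from typing import List


def patch_noise_variances(residual: List[List[float]], patch: int) -> List[List[float]]:
    """One variance per non-overlapping patch, assuming zero-mean noise.

    An i.i.d. model would instead fit a single variance to the whole map.
    """
    h, w = len(residual), len(residual[0])
    variances = []
    for i in range(0, h, patch):
        row = []
        for j in range(0, w, patch):
            vals = [residual[a][b]
                    for a in range(i, min(i + patch, h))
                    for b in range(j, min(j + patch, w))]
            # ML estimate of a zero-mean Gaussian variance: mean of squares.
            row.append(sum(v * v for v in vals) / len(vals))
        variances.append(row)
    return variances


# Left half of the map is much cleaner than the right half.
residual = [[0.1, -0.1, 1.0, -1.0]] * 4
print(patch_noise_variances(residual, patch=2))
```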

preprint · 2022 · arXiv

Diagnosing Batch Normalization in Class Incremental Learning

Many studies have applied deep neural networks (DNNs) to class incremental learning (Class-IL). As a building block of DNNs, batch normalization (BN) standardizes intermediate feature maps and has been widely shown to improve training stability and convergence. However, we claim that directly using standard BN in Class-IL models harms both representation learning and classifier training, thus exacerbating catastrophic forgetting. In this paper, we investigate the influence of BN on Class-IL models by illustrating this BN dilemma. We further propose BN Tricks to address the issue by training a better feature extractor while eliminating classification bias. Without introducing extra hyperparameters, we apply BN Tricks to three baseline rehearsal-based methods: ER, DER++ and iCaRL. Through comprehensive experiments on the benchmark datasets Seq-CIFAR-10, Seq-CIFAR-100 and Seq-Tiny-ImageNet, we show that BN Tricks bring significant performance gains to all adopted baselines, revealing their potential generality along this line of research.
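For reference, a minimal Python sketch of the standard training-mode batch normalization that the paper diagnoses (BN Tricks themselves are not shown). Each feature is standardized with the mean and variance of the current mini-batch, so the class mix inside a batch directly shapes the statistics; the function name is illustrative.

```python
import math
from typing import List


def batch_norm(batch: List[List[float]], gamma: List[float], beta: List[float],
               eps: float = 1e-5) -> List[List[float]]:
    """Training-mode BN over a batch of feature vectors (statistics per feature)."""
    n, d = len(batch), len(batch[0])
    mean = [sum(x[c] for x in batch) / n for c in range(d)]
    var = [sum((x[c] - mean[c]) ** 2 for x in batch) / n for c in range(d)]
    # Standardize with batch statistics, then apply the learnable affine map.
    return [[gamma[c] * (x[c] - mean[c]) / math.sqrt(var[c] + eps) + beta[c]
             for c in range(d)] for x in batch]


print(batch_norm([[1.0, 10.0], [3.0, 14.0]], gamma=[1.0, 1.0], beta=[0.0, 0.0]))
```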

preprint · 2022 · arXiv

Generation of arbitrarily polarized muon pairs via polarized $e^-e^+$ collision

The generation of arbitrarily spin-polarized muon pairs is investigated via polarized $e^-e^+$ collisions. We calculate the fully spin-resolved cross section ${\rm d}\sigma_{e^-e^+\rightarrow \mu^-\mu^+}$ and use the Monte Carlo method of binary collision to describe the production and polarization processes of muon pairs. We find that, owing to the dependence of the mixed helicities on the scattering angle, arbitrarily polarized muon pairs with both longitudinal and transverse spin components can be produced. The collision of tightly collimated electron and positron beams with high longitudinal polarization and nC charge can generate about $40\%$ muon pairs with longitudinal polarization and about $60\%$ muon pairs with transverse polarization. Such a compact, high-flux $e^-e^+\rightarrow\mu^-\mu^+$ muon source could be implemented with the next-generation laser-plasma linear collider, and would be essential for facilitating investigations of fundamental physics and measurement technology across broad areas.

preprint · 2022 · arXiv

Low-light Image Enhancement by Retinex Based Algorithm Unrolling and Adjustment

Motivated by their recent advances, deep learning techniques have been widely applied to the low-light image enhancement (LIE) problem. Among them, Retinex-theory-based methods, mostly following a decomposition-adjustment pipeline, have taken an important place due to their physical interpretability and promising performance. However, research on Retinex-based deep learning remains insufficient and ignores much useful experience from traditional methods. Moreover, the adjustment step is performed either with simple image processing techniques or with complicated networks, both of which are unsatisfactory in practice. To address these issues, we propose a new deep learning framework for the LIE problem. The proposed framework contains a decomposition network inspired by algorithm unrolling, and adjustment networks that consider both global brightness and local brightness sensitivity. By virtue of algorithm unrolling, both implicit priors learned from data and explicit priors borrowed from traditional methods can be embedded in the network, facilitating better decomposition. Meanwhile, considering global and local brightness guides the design of simple yet effective network modules for adjustment. In addition, to avoid manual parameter tuning, we propose a self-supervised fine-tuning strategy that consistently guarantees promising performance. Experiments on a series of typical LIE datasets demonstrate the effectiveness of the proposed method, both quantitatively and visually, compared with existing methods.
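The decomposition-adjustment pipeline can be illustrated with a classical, non-learned stand-in. A minimal Python sketch, assuming the Retinex model $I = R \odot L$, an illumination estimate taken as the per-pixel maximum over RGB (a common heuristic, not the paper's unrolled decomposition network) and a single global gamma in place of the learned adjustment networks; the function name and defaults are illustrative.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # RGB values in [0, 1]


def retinex_enhance(image: List[List[Pixel]], gamma: float = 0.5,
                    eps: float = 1e-4) -> List[List[Pixel]]:
    """Decompose I = R * L, then recombine as R * L**gamma (gamma < 1 brightens)."""
    out = []
    for row in image:
        new_row = []
        for (r, g, b) in row:
            illum = max(r, g, b, eps)                  # decomposition: illumination L
            refl = (r / illum, g / illum, b / illum)   # reflectance R in [0, 1]
            adj = illum ** gamma                       # adjustment: global brightness
            new_row.append(tuple(min(1.0, c * adj) for c in refl))
        out.append(new_row)
    return out


# A dark pixel keeps its colour ratios but becomes brighter.
print(retinex_enhance([[(0.04, 0.02, 0.01)]], gamma=0.5))
```

Because the reflectance is left untouched, colour ratios are preserved and only brightness changes; the paper's adjustment networks additionally account for local brightness sensitivity, which a single global gamma cannot do.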

preprint · 2022 · arXiv

Production of polarized particle beams via ultraintense laser pulses

High-energy spin-polarized electron, positron, and $\gamma$-photon beams have many significant applications in the study of material properties, nuclear structure, particle physics, and high-energy astrophysics. Thus, the efficient production of such polarized beams attracts a broad spectrum of research interest, driven mainly by rapid advances in ultrashort and ultraintense laser technology. Currently available laser pulses can achieve peak intensities of $10^{22}$–$10^{23}$ W cm$^{-2}$, with pulse durations of tens of femtoseconds. The dynamics of particles in laser fields of these intensities is dominated by quantum electrodynamics (QED), and the interaction mechanisms have reached regimes governed by nonlinear multiphoton absorption (strong-field QED processes). In strong-field QED processes, the scattering cross sections clearly depend on the spin and polarization of the particles, and spin-dependent photon emission and radiation-reaction effects can be utilized to produce polarized particles. An ultraintense laser-driven polarized particle source offers high brilliance and compactness, which could open the way to unexplored aspects of a range of research fields. In this work, we briefly review the seminal conclusions from studies of polarization effects in strong-field QED processes, as well as the progress made by recent proposals for producing polarized particles via laser-beam or laser-plasma interactions.

preprint · 2022 · arXiv

Temperature-dependent structure of an intermetallic ErPd$_2$Si$_2$ single crystal: A combined synchrotron and in-house X-ray diffraction study

We have grown intermetallic ErPd$_2$Si$_2$ single crystals using the laser-diode floating-zone method. The temperature-dependent crystal structure was determined by synchrotron and in-house X-ray powder diffraction measurements from 20 to 500 K. The diffraction patterns fit well with the tetragonal $I$4/$mmm$ space group (No. 139) with two formula units per unit cell. Our synchrotron X-ray powder diffraction study shows that the refined lattice constants are $a$ = 4.10320(2) Å, $c$ = 9.88393(5) Å at 298 K and $a$ = 4.11737(2) Å, $c$ = 9.88143(5) Å at 500 K, resulting in unit-cell volumes $V$ = 166.408(1) Å$^3$ (298 K) and 167.517(2) Å$^3$ (500 K). We did not find any structural phase transition over the whole studied temperature range. Upon cooling, the lattice constant $a$ shortens while $c$ elongates.
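The quoted unit-cell volumes follow directly from the lattice constants, since a tetragonal cell ($a = b$, all angles 90°) has $V = a^2 c$. A quick Python check:

```python
def tetragonal_cell_volume(a: float, c: float) -> float:
    """Unit-cell volume of a tetragonal lattice: V = a^2 * c."""
    return a * a * c


# Refined lattice constants in angstroms, as quoted above.
print(round(tetragonal_cell_volume(4.10320, 9.88393), 3))  # prints 166.408 (298 K)
print(round(tetragonal_cell_volume(4.11737, 9.88143), 3))  # prints 167.517 (500 K)
```

The roughly 0.7% volume expansion from 298 K to 500 K comes entirely from the growth of $a$, which outweighs the slight contraction of $c$.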

preprint · 2021 · arXiv

Brilliant circularly polarized $γ$-ray sources via single-shot laser plasma interaction

Circularly polarized (CP) $\gamma$-ray sources are versatile for broad applications in nuclear physics, high-energy physics and astrophysics. Laser-plasma-based particle accelerators provide access to much higher-flux $\gamma$-ray sources than conventional approaches; however, the circular polarization properties of the emitted $\gamma$-photons have usually been neglected in these sources. In this letter, we show that brilliant CP $\gamma$-ray beams can be generated by combining laser plasma wakefield acceleration with plasma mirror techniques. In a weakly nonlinear Compton scattering scheme with moderate laser intensities, the helicity of the driving laser can be transferred to the emitted $\gamma$-photons, whose average polarization degree can reach $\sim 37\%$ ($21\%$) with a peak brilliance of $\gtrsim 10^{21}~$photons/(s $\cdot$ mm$^2 \cdot$ mrad$^2 \cdot$ 0.1% BW) around 1~MeV (100~MeV). Moreover, the proposed method is readily feasible and robust with respect to the laser and plasma parameters.

preprint · 2020 · arXiv

A Model-driven Deep Neural Network for Single Image Rain Removal

Deep learning (DL) methods have achieved state-of-the-art performance in the task of single image rain removal. Most current DL architectures, however, still lack sufficient interpretability and are not fully integrated with the physical structures of general rain streaks. To address this issue, in this paper we propose a model-driven deep neural network for the task, with fully interpretable network structures. Specifically, based on the convolutional dictionary learning mechanism for representing rain, we propose a novel single image deraining model and use the proximal gradient descent technique to design an iterative algorithm containing only simple operators for solving the model. This simple implementation scheme allows us to unfold the algorithm into a new deep network architecture, called the rain convolutional dictionary network (RCDNet), in which almost every network module corresponds one-to-one to an operation involved in the algorithm. By training the proposed RCDNet end to end, all the rain kernels and proximal operators can be automatically extracted, faithfully characterizing the features of both the rain and clean background layers, which naturally leads to better deraining performance, especially in real scenarios. Comprehensive experiments substantiate the superiority of the proposed network, especially its good generality to diverse testing scenarios and the good interpretability of all its modules, compared with state-of-the-art methods both visually and quantitatively. The source code is available at https://github.com/hongwang01/RCDNet.
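A minimal Python sketch of the proximal-gradient idea on a 1-D toy problem, assuming a single known rain kernel $C$: it recovers a sparse rain map $M$ from a rain layer $R \approx C \otimes M$ by minimizing $\tfrac12\|C \otimes M - R\|^2 + \lambda\|M\|_1$ with ISTA, i.e. a gradient step followed by soft-thresholding. RCDNet unfolds iterations of this general shape into network stages with learned kernels and proximal operators; the kernel, $\lambda$ and all function names here are illustrative assumptions.

```python
from typing import List


def conv(x: List[float], k: List[float]) -> List[float]:
    """A x: 'same'-size 1-D convolution with zero padding (odd-length kernel)."""
    n, r = len(x), len(k) // 2
    return [sum(k[t] * x[i - t + r] for t in range(len(k)) if 0 <= i - t + r < n)
            for i in range(n)]


def conv_adjoint(y: List[float], k: List[float]) -> List[float]:
    """A^T y: correlation with the kernel, the adjoint of conv()."""
    n, r = len(y), len(k) // 2
    return [sum(k[t] * y[j + t - r] for t in range(len(k)) if 0 <= j + t - r < n)
            for j in range(n)]


def sparse_rain_map(rain: List[float], k: List[float], lam: float,
                    iters: int = 200) -> List[float]:
    """ISTA for min_M 0.5 * ||conv(M, k) - rain||^2 + lam * ||M||_1."""
    # ||A|| <= sum(|k|) (Young's inequality), so this step size is safe.
    step = 1.0 / sum(abs(v) for v in k) ** 2
    m = [0.0] * len(rain)
    for _ in range(iters):
        resid = [a - b for a, b in zip(conv(m, k), rain)]
        z = [mi - step * gi for mi, gi in zip(m, conv_adjoint(resid, k))]  # gradient step
        # Proximal step for the L1 term: soft-thresholding.
        m = [max(abs(v) - step * lam, 0.0) * (1.0 if v > 0 else -1.0) for v in z]
    return m


k = [0.25, 0.5, 0.25]
true_map = [0.0] * 12
true_map[5] = 2.0
m = sparse_rain_map(conv(true_map, k), k, lam=0.01)
print([round(v, 2) for v in m])
```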

preprint · 2020 · arXiv

Crystalline and magnetic structures, magnetization, heat capacity and anisotropic magnetostriction effect in a yttrium-chromium oxide

We have studied a nearly stoichiometric insulating Y$_{0.97(2)}$Cr$_{0.98(2)}$O$_{3.00(2)}$ single crystal by performing measurements of magnetization, heat capacity, and neutron diffraction. Although the YCrO$_3$ compound behaves like a soft ferromagnet with a coercive force of $\sim$0.05 T, strong antiferromagnetic (AFM) interactions exist between Cr$^{3+}$ spins, as indicated by a strongly negative paramagnetic Curie-Weiss temperature of -433.2(6) K. The coexistence of ferromagnetism and antiferromagnetism may indicate a canted AFM structure. The AFM phase transition occurs at $T_\textrm{N} =$ 141.5(1) K, which increases to $T_\textrm{N}$(5 T) = 144.5(1) K at 5 T. Within the accuracy of the present neutron-diffraction studies, we determine below $T_\textrm{N}$ a G-type AFM structure with a propagation vector $\mathbf{k}$ = (1 1 0) and Cr$^{3+}$ spins aligned along the crystallographic $c$ axis of the orthorhombic structure with space group $Pnma$. At 12 K, the refined moment size is 2.45(6) $\mu_\textrm{B}$, $\sim$82% of the theoretical saturation value of 3 $\mu_\textrm{B}$. The Cr$^{3+}$ spin interactions are probably two-dimensional and Ising-like within the reciprocal (1 1 0) scattering plane. Below $T_\textrm{N}$, the lattice configuration ($a$, $b$, $c$, and $V$) deviates strongly downward from the Grüneisen law, displaying an anisotropic magnetostriction effect and a magnetoelastic effect. In particular, the sample contraction upon cooling is enhanced below the AFM transition temperature. There is evidence to suggest that the actual crystalline symmetry of the YCrO$_3$ compound is probably lower than the currently assumed one. Additionally, we compared the $t_{2\textrm{g}}$ YCrO$_3$ and the $e_\textrm{g}$ La$_{7/8}$Sr$_{1/8}$MnO$_3$ single crystals to better understand the reason for the possible symmetry lowering.

preprint · 2020 · arXiv

Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation

Real-world image noise removal is a long-standing yet very challenging task in computer vision. The success of deep neural networks in denoising has stimulated research on noise generation, which aims to synthesize more clean-noisy image pairs to facilitate the training of deep denoisers. In this work, we propose a novel unified framework that simultaneously handles the noise removal and noise generation tasks. Instead of only inferring the posterior distribution of the latent clean image conditioned on the observed noisy image, as in the traditional MAP framework, our proposed method learns the joint distribution of the clean-noisy image pairs. Specifically, we approximate the joint distribution with two different factorized forms, which can be formulated as a denoiser mapping the noisy image to the clean one and a generator mapping the clean image to the noisy one. The learned joint distribution implicitly contains all the information between the noisy and clean images, avoiding the need to manually design image priors and noise assumptions as in traditional methods. Moreover, the performance of our denoiser can be further improved by augmenting the original training dataset with the learned generator. We also propose two metrics to assess the quality of the generated noisy images which, to the best of our knowledge, are the first such metrics proposed in this line of research. Extensive experiments demonstrate the superiority of our method over state-of-the-art methods on both real noise removal and noise generation tasks. The training and testing code is available at https://github.com/zsyOAOA/DANet.

preprint · 2020 · arXiv

ECNU-SenseMaker at SemEval-2020 Task 4: Leveraging Heterogeneous Knowledge Resources for Commonsense Validation and Explanation

This paper describes our system for SemEval-2020 Task 4: Commonsense Validation and Explanation (Wang et al., 2020). We propose a novel Knowledge-enhanced Graph Attention Network (KEGAT) architecture for this task, leveraging heterogeneous knowledge from both a structured knowledge base (i.e. ConceptNet) and unstructured text to improve a machine's commonsense understanding. The model achieves powerful commonsense inference through suitable commonsense incorporation methods and upgraded data augmentation techniques. In addition, an internal sharing mechanism is incorporated to prevent the model from insufficient or excessive commonsense reasoning. As a result, the model performs well on both validation and explanation; for instance, it achieves state-of-the-art accuracy in the subtask Commonsense Explanation (Multi-Choice). We name the system ECNU-SenseMaker. Code is publicly available at https://github.com/ECNU-ICA/ECNU-SenseMaker.

preprint · 2020 · arXiv

Generation of highly-polarized high-energy brilliant $γ$-rays via laser-plasma interaction

The generation of highly polarized, high-energy, brilliant $\gamma$-rays via laser-plasma interaction has been investigated in the quantum radiation-reaction regime. We employ a quantum-electrodynamics particle-in-cell code to describe spin-resolved electron dynamics semiclassically, and photon emission and polarization quantum mechanically, in the local constant field approximation. When an ultrastrong linearly polarized (LP) laser pulse irradiates a near-critical-density (NCD) plasma followed by an ultrathin planar aluminum target, the electrons in the NCD plasma are first accelerated by the driving laser to ultrarelativistic energies and then collide head-on with the laser pulse reflected by the aluminum target, emitting brilliant LP $\gamma$-rays through nonlinear Compton scattering with an average polarization of about 70% and energies up to hundreds of MeV. By comparison, when a conical gold target filled with NCD plasma is employed, the linear polarization degree, collimation and brilliance of the emitted $\gamma$-ray beam are all significantly improved owing to the enhanced laser-driven quasi-static magnetic field in the plasma. Such $\gamma$-rays can be produced with currently achievable laser facilities and find various applications in high-energy physics and astrophysics.

preprint · 2020 · arXiv

Learning Adaptive Loss for Robust Learning with Noisy Labels

Robust loss minimization is an important strategy for robust learning with noisy labels. Current robust loss functions, however, inevitably involve hyperparameters that must be tuned, manually or heuristically through cross-validation, which makes them hard to apply generally in practice. Moreover, the non-convexity introduced by the loss and the complicated network architecture make training easily trapped in an unexpected solution with poor generalization capability. To address these issues, we propose a meta-learning method capable of adaptively learning the hyperparameters of robust loss functions. Specifically, through mutual amelioration between the robust loss hyperparameters and the network parameters, both can be simultaneously finely learned and coordinated to attain solutions with good generalization capability. Four kinds of state-of-the-art robust loss functions are integrated into our algorithm, and comprehensive experiments substantiate the general applicability and effectiveness of the proposed method in both accuracy and generalization performance, compared with the conventional hyperparameter tuning strategy, even with carefully tuned hyperparameters.
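As an example of a robust loss whose hyperparameter would otherwise need tuning, here is a minimal Python sketch of the generalized cross entropy (GCE) loss $(1 - p_y^q)/q$, where $p_y$ is the predicted probability of the labelled class. Whether GCE is among the four losses used in the paper is an assumption here, and the meta-learning update of $q$ is not shown; the function name is illustrative.

```python
import math


def gce_loss(p_true: float, q: float) -> float:
    """Generalized cross entropy (1 - p^q) / q for the true-class probability p.

    q -> 0 recovers cross entropy -log p; q = 1 gives the MAE-like loss 1 - p.
    """
    if q <= 0:
        return -math.log(p_true)  # limiting case: standard cross entropy
    return (1.0 - p_true ** q) / q


# The same prediction is penalized very differently depending on q.
for q in (0.0, 0.5, 1.0):
    print(q, round(gce_loss(0.3, q), 4))
```

Larger $q$ bounds the loss and down-weights confidently misfit (possibly mislabelled) samples, at the cost of slower fitting of clean ones; this trade-off is why $q$ is worth learning rather than fixing by hand.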

preprint · 2020 · arXiv

Meta Feature Modulator for Long-tailed Recognition

Deep neural networks often degrade significantly when training data suffer from class imbalance. Existing approaches, e.g., re-sampling and re-weighting, commonly address this issue by rearranging the label distribution of the training data so that the trained networks fit the implicit balanced label distribution. However, most of them hinder the representational ability of the learned features due to insufficient use of the intra- and inter-sample information in the training data. To address this issue, we propose the meta feature modulator (MFM), a meta-learning framework that models the difference between the long-tailed training data and the balanced meta data from the perspective of representation learning. Concretely, we employ learnable hyper-parameters (dubbed modulation parameters) to adaptively scale and shift the intermediate features of classification networks, and the modulation parameters are optimized together with the classification network parameters, guided by a small amount of balanced meta data. We further design a modulator network to guide the generation of the modulation parameters, and this meta-learner can be readily adapted to train classification networks on other long-tailed datasets. Extensive experiments on benchmark vision datasets substantiate the superiority of our approach on long-tailed recognition tasks over other state-of-the-art methods.
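The modulation step itself is a per-channel affine transform. A minimal Python sketch, assuming one scale and one shift per feature channel (the granularity is an assumption); in MFM these parameters would come from the modulator network and be optimized on balanced meta data, which is not shown. The function name is illustrative.

```python
from typing import List


def modulate(features: List[List[float]], gamma: List[float],
             beta: List[float]) -> List[List[float]]:
    """Scale and shift every channel: h'[n][c] = gamma[c] * h[n][c] + beta[c].

    `features` is a batch of per-sample channel vectors; gamma and beta are the
    modulation parameters, one pair per channel.
    """
    return [[g * h + b for h, g, b in zip(sample, gamma, beta)] for sample in features]


print(modulate([[1.0, 2.0], [3.0, 4.0]], gamma=[2.0, 0.5], beta=[0.0, 1.0]))
# prints [[2.0, 2.0], [6.0, 3.0]]
```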

preprint · 2020 · arXiv

Meta Transition Adaptation for Robust Deep Learning with Noisy Labels

To discover the intrinsic inter-class transition probabilities underlying data, learning with noise transition has become an important approach for robust deep learning on corrupted labels. Prior methods obtain such transition knowledge either by presuming anchor points that belong to a specific class with probability 1, which is generally infeasible in practice, or by jointly estimating the transition matrix and learning the classifier directly from the noisy samples, which often leads to inaccurate estimates misguided by wrong annotations, especially under heavy noise. To alleviate these issues, this study proposes a new meta-transition-learning strategy for the task. Specifically, under the guidance of a small set of meta data with clean labels, the noise transition matrix and the classifier parameters can be mutually ameliorated to avoid being trapped by noisy training samples, without any anchor-point assumption. In addition, we prove that our method has a statistical consistency guarantee for correctly estimating the desired transition matrix. Extensive synthetic and real experiments validate that our method extracts the transition matrix more accurately, which naturally yields more robust performance than prior art. Its essential relationship with label distribution learning is also discussed, which explains its good performance even in noise-free scenarios.
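How a transition matrix connects clean and noisy labels can be sketched in a few lines of Python. Assuming the convention $T_{ij} = P(\tilde y = j \mid y = i)$ with rows summing to one, the noisy-label probabilities are $T^\top$ applied to the classifier's clean-class probabilities; training against noisy labels with these probabilities is the standard forward-correction use of $T$. The meta-learning of $T$ itself is not shown, and the function name is hypothetical.

```python
from typing import List


def noisy_label_probs(clean_probs: List[float], T: List[List[float]]) -> List[float]:
    """P(noisy = j | x) = sum_i P(clean = i | x) * T[i][j],
    where T[i][j] = P(noisy = j | clean = i) and each row of T sums to 1."""
    k = len(clean_probs)
    return [sum(clean_probs[i] * T[i][j] for i in range(k)) for j in range(k)]


T = [[0.8, 0.2],   # class 0 is flipped to class 1 with probability 0.2
     [0.3, 0.7]]   # class 1 is flipped to class 0 with probability 0.3
print(noisy_label_probs([0.5, 0.5], T))  # approximately [0.55, 0.45]
```

A loss such as cross entropy is then computed between these noisy-label probabilities and the observed (noisy) labels, so the classifier itself can still learn the clean posterior.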

preprint · 2020 · arXiv

Structural Residual Learning for Single Image Rain Removal

To alleviate the adverse effect of rain streaks in image processing tasks, CNN-based single image rain removal methods have recently been proposed. However, the performance of these deep learning methods largely relies on the range of rain shapes covered by the pre-collected training pairs of rainy and clean images. This makes them prone to overfitting the training samples and unable to generalize well to practical rainy images with complex and diverse rain streaks. To address this generalization issue, this study proposes a new network architecture that enforces the output residual of the network to possess intrinsic rain structures. This structural residual setting guarantees that the rain layer extracted by the network complies with prior knowledge of general rain streaks, so that plausible rain shapes can be well extracted from rainy images in both the training and prediction stages. This general regularization naturally leads to both better training accuracy and better generalization at test time, even for unseen rain configurations. This superiority is comprehensively substantiated by experiments on synthetic and real datasets, both visually and quantitatively, compared with current state-of-the-art methods.