Source author record

Wei Deng

Wei Deng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

19 works

22 topics

4 close collaborators

preprint 2026 arXiv

Weak-to-Strong Elicitation via Mismatched Wrong Drafts

We consider whether off-policy experience from a smaller, weaker model can elicit capability in a stronger learner that on-policy RL fine-tuning (e.g., GRPO) does not reach. We find that injecting mathematically wrong drafts from a smaller but more domain-trained model -- mismatched to the current problem -- into a stronger learner's GRPO context consistently outperforms standard on-policy GRPO on held-out MATH-500 and out-of-distribution AIME 2025/2026. Concretely, we use Mathstral-7B as the learner, Qwen2.5-Math-1.5B as the draft model, 8.8K Level 3--5 MATH problems (with MATH-500 held out), and train with Dr. GRPO. Mismatch is an active ingredient: shuffling drafts to mismatched problems while holding everything else constant yields $+1.62$pp on MATH-500 (greedy pass@1) over the matched-wrong variant ($n=10$ seeds, $p=0.0015$, Welch's $t$). In fact, the mismatched-wrong variant leads all other variants we tested on MATH-500 across both greedy pass@1 and sampling pass@$k$. On out-of-distribution AIME 2025 and 2026, the mismatched-wrong variant uniquely lifts pass@$k$ above both Mathstral-7B (in its native [INST] format) and the Qwen2.5-Math-1.5B draft model at every sample budget from $k=1$ to $k=1024$ across 2 seeds ($+14.2$pp on 2025 and $+9.0$pp on 2026 at pass@1024 over Mathstral-7B), and at pass@1024 also leads no-draft, matched-wrong, and mismatched-correct variants on both years. All variants use the same prompt with no draft injection at test time. The recipe -- trained on a single GPU with no SFT, no reward models, no synthesized data, and no produce-critique-revise inner loop -- reaches 71.98% MATH-500 on Mathstral-7B-v0.1, the highest published result on this model to our knowledge, surpassing the heavier WizardMath pipeline at 70.9% on full MATH (SFT + PPO with process/instruction reward models).
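The abstract reports sampling pass@$k$ up to $k=1024$ but does not say how it is estimated. A minimal sketch, assuming the commonly used unbiased estimator $1 - \binom{n-c}{k}/\binom{n}{k}$ computed from $n$ samples per problem of which $c$ are correct (the function name `pass_at_k` and this choice of estimator are assumptions, not taken from the paper):

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples with c correct ones.

    Assumption: this is the standard combinatorial estimator; the
    abstract does not confirm which estimator the authors used.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k samples are wrong, every size-k subset contains
    # at least one correct sample.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no correct sample,
    # subtracted from one.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts for a single problem, for illustration only.
    print(pass_at_k(n=4, c=1, k=2))
```

A benchmark-level score would then be the mean of this quantity over problems.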

preprint 2025 arXiv

Index-ASR Technical Report

Automatic speech recognition (ASR) has witnessed remarkable progress in recent years, largely driven by the emergence of the LLM-based ASR paradigm. Despite their strong performance on a variety of open-source benchmarks, existing LLM-based ASR systems still suffer from two critical limitations. First, they are prone to hallucination errors, often generating excessively long and repetitive outputs that are not well grounded in the acoustic input. Second, they provide limited support for flexible and fine-grained contextual customization. To address these challenges, we propose Index-ASR, a large-scale LLM-based ASR system designed to simultaneously enhance robustness and support customizable hotword recognition. The core idea of Index-ASR lies in the integration of an LLM with large-scale training data enriched with background noise and contextual information. Experimental results show that Index-ASR achieves strong performance on both open-source benchmarks and in-house test sets, highlighting its robustness and practicality for real-world ASR applications.

preprint 2024 arXiv

Reflected Schrödinger Bridge for Constrained Generative Modeling

Diffusion models have become the go-to method for large-scale generative models in real-world applications. These applications often involve data distributions confined within bounded domains, typically requiring ad-hoc thresholding techniques for boundary enforcement. Reflected diffusion models (Lou23) aim to enhance generalizability by generating the data distribution through a backward process governed by reflected Brownian motion. However, reflected diffusion models may not easily adapt to diverse domains without the derivation of proper diffeomorphic mappings and do not guarantee optimal transport properties. To overcome these limitations, we introduce the Reflected Schrödinger Bridge algorithm: an entropy-regularized optimal transport approach tailored for generating data within diverse bounded domains. We derive elegant reflected forward-backward stochastic differential equations with Neumann and Robin boundary conditions, extend divergence-based likelihood training to bounded domains, and explore natural connections to entropic optimal transport for the study of approximate linear convergence, a valuable insight for practical training. Our algorithm yields robust generative modeling in diverse domains, and its scalability is demonstrated in real-world constrained generative modeling through standard image benchmarks.

preprint 2022 arXiv

A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

We propose an adaptively weighted stochastic gradient Langevin dynamics (SGLD) algorithm, called contour stochastic gradient Langevin dynamics (CSGLD), for Bayesian learning in big data statistics. The proposed algorithm is essentially a scalable dynamic importance sampler, which automatically flattens the target distribution so that the simulation of a multi-modal distribution is greatly facilitated. Theoretically, we prove a stability condition and establish the asymptotic convergence of the self-adapting parameter to a unique fixed point, regardless of the non-convexity of the original energy function; we also present an error analysis for the weighted averaging estimators. Empirically, the CSGLD algorithm is tested on multiple benchmark datasets including CIFAR10 and CIFAR100. The numerical results indicate its superiority in avoiding the local trap problem in training deep neural networks.

preprint 2021 arXiv

DeepLight: Deep Lightweight Feature Interactions for Accelerating CTR Predictions in Ad Serving

Click-through rate (CTR) prediction is a crucial task in online display advertising. Embedding-based neural networks have been proposed to learn both explicit feature interactions through a shallow component and deep feature interactions using a deep neural network (DNN) component. These sophisticated models, however, slow down the prediction inference by at least hundreds of times. To address the issue of significantly increased serving delay and high memory usage for ad serving in production, this paper presents DeepLight: a framework to accelerate the CTR predictions in three aspects: 1) accelerate the model inference via explicitly searching informative feature interactions in the shallow component; 2) prune redundant layers and parameters at intra-layer and inter-layer level in the DNN component; 3) promote the sparsity of the embedding layer to preserve the most discriminant signals. By combining the above efforts, the proposed approach accelerates the model inference by 46X on the Criteo dataset and 27X on the Avazu dataset without any loss in prediction accuracy. This paves the way for successfully deploying complicated embedding-based neural networks in production for ad serving.

preprint 2020 arXiv

An Adaptive Empirical Bayesian Method for Sparse Deep Learning

We propose a novel adaptive empirical Bayesian method for sparse deep learning, where the sparsity is ensured via a class of self-adaptive spike-and-slab priors. The proposed method works by alternately sampling from an adaptive hierarchical posterior distribution using stochastic gradient Markov Chain Monte Carlo (MCMC) and smoothly optimizing the hyperparameters using stochastic approximation (SA). We further prove the convergence of the proposed method to the asymptotically correct distribution under mild conditions. Empirical applications of the proposed method lead to state-of-the-art performance on MNIST and Fashion MNIST with shallow convolutional neural networks and state-of-the-art compression performance on CIFAR10 with Residual Networks. The proposed method also improves resistance to adversarial attacks.

preprint 2020 arXiv

Non-uniform dependence for higher dimensional Camassa-Holm equations in Besov spaces

In this paper, we investigate the dependence on initial data of solutions to higher dimensional Camassa-Holm equations. We show that the data-to-solution map is not uniformly continuous in Besov spaces $B^s_{p,r}(\mathbb{R}^d)$, $s>\max\{1+\frac d2,\frac32\}$.

preprint 2020 arXiv

NTIRE 2020 Challenge on Real-World Image Super-Resolution: Methods and Results

This paper reviews the NTIRE 2020 challenge on real world super-resolution. It focuses on the participating methods and final results. The challenge addresses the real world setting, where paired true high and low-resolution images are unavailable. For training, only one set of source input images is therefore provided along with a set of unpaired high-quality target images. In Track 1: Image Processing artifacts, the aim is to super-resolve images with synthetically generated image processing artifacts. This allows for quantitative benchmarking of the approaches with respect to a ground-truth image. In Track 2: Smartphone Images, real low-quality smart phone images have to be super-resolved. In both tracks, the ultimate goal is to achieve the best perceptual quality, evaluated using a human study. This is the second challenge on the subject, following AIM 2019, targeting to advance the state-of-the-art in super-resolution. To measure the performance we use the benchmark protocol from AIM 2019. In total 22 teams competed in the final testing phase, demonstrating new and innovative solutions to the problem.

preprint 2016 arXiv

Collision-induced magnetic reconnection and a unified interpretation of polarization properties of GRBs and blazars

The jet composition and energy dissipation mechanism of Gamma-ray bursts (GRBs) and blazars are fundamental questions that are not yet fully understood. One plausible model is to interpret the $γ$-ray emission of GRBs and optical emission of blazars as synchrotron radiation of electrons accelerated from the collision-induced magnetic dissipation regions in Poynting-flux-dominated jets. Polarization observations provide important and independent information for testing this model. Based on our recent 3D relativistic MHD simulations of collision-induced magnetic dissipation of magnetically dominated blobs, here we perform calculations of the polarization properties of the emission in the dissipation region and apply the results to model the polarization observational data of GRB prompt emission and blazar optical emission. We show that the same numerical model with different input parameters can reproduce well the observational data of both GRBs and blazars, especially the $90^{\circ}$ polarization angle (PA) change in GRB 100826A and the $180^{\circ}$ PA swing in Blazar 3C279. This supports a unified model for GRB and blazar jets, suggesting that collision-induced magnetic reconnection is a common physical mechanism to power the relativistic jet emission from events with very different black hole masses.

preprint 2016 arXiv

Efficient Production of High-energy Nonthermal Particles during Magnetic Reconnection in a Magnetically-dominated Ion-Electron Plasma

Magnetic reconnection is a leading mechanism for dissipating magnetic energy and accelerating nonthermal particles in Poynting-flux dominated flows. In this letter, we investigate nonthermal particle acceleration during magnetic reconnection in a magnetically-dominated ion-electron plasma using fully kinetic simulations. For an ion-electron plasma with the total magnetization $σ_0=B^2/(4πn(m_i+m_e)c^2)$, the magnetization for each species is $σ_i \sim σ_0$ and $σ_e \sim (m_i/m_e) σ_0$, respectively. We have studied the magnetically dominated regime by varying $σ_{e} = 10^3 - 10^5$ with initial ion and electron temperatures $T_i = T_e = 5 - 20 m_ec^2$ and mass ratio $m_i/m_e = 1 - 1836$. The results demonstrate that reconnection quickly establishes power-law energy distributions for both electrons and ions within several ($2-3$) light-crossing times. For the cases with periodic boundary conditions, the power-law index is $1<s<2$ for both electrons and ions. The hard spectra limit the power-law energies for electrons and ions to be $γ_{be} \sim σ_e$ and $γ_{bi} \sim σ_i$, respectively. The main acceleration mechanism is a Fermi-like acceleration through the drift motions of charged particles. When comparing the spectra for electrons and ions in momentum space, the spectral indices $s_p$ are identical, as predicted in Fermi acceleration. We also find that the bulk flow can carry a significant amount of energy during the simulations. We discuss the implications of this study in the context of Poynting-flux dominated jets and pulsar winds, especially the applications to explaining nonthermal high-energy emission.
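The per-species scalings quoted above follow directly from the total magnetization once a per-species definition is fixed. A short worked sketch, assuming the species magnetization is defined as $\sigma_s = B^2/(4\pi n m_s c^2)$ with the same number density $n$ for both species (this definition is an assumption for illustration; the abstract states only the scalings):

```latex
\begin{align}
\sigma_0 &= \frac{B^2}{4\pi n (m_i + m_e) c^2}, \\
\sigma_e &= \frac{B^2}{4\pi n m_e c^2}
          = \left(1 + \frac{m_i}{m_e}\right)\sigma_0
          \approx \frac{m_i}{m_e}\,\sigma_0
          \quad (m_i \gg m_e), \\
\sigma_i &= \frac{B^2}{4\pi n m_i c^2}
          = \left(1 + \frac{m_e}{m_i}\right)\sigma_0
          \approx \sigma_0
          \quad (m_i \gg m_e).
\end{align}
```

Under this assumed definition, the equal-mass case $m_i/m_e = 1$ gives $\sigma_e = \sigma_i = 2\sigma_0$, which agrees with the quoted scalings up to a factor of order unity.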

preprint 2015 arXiv

Polarization Signatures of Relativistic Magnetohydrodynamic Shocks in the Blazar Emission Region - I. Force-free Helical Magnetic Fields

The optical radiation and polarization signatures in blazars are known to be highly variable during flaring activities. It is frequently argued that shocks are the main driver of the flaring events. However, spectral variability models generally lack a detailed, self-consistent treatment of the magnetic field evolution, so the associated optical polarization signatures remain poorly understood. We present the first simultaneous modeling of the optical radiation and polarization signatures based on 3D magnetohydrodynamic simulations of relativistic shocks in the blazar emission environment, with the simplest physical assumptions. By comparing the results with observations, we find that shocks in a weakly magnetized environment will largely lead to significant changes in the optical polarization signatures, which are seldom seen in observations. Hence an emission region with relatively strong magnetization is preferred. In such an environment, slow shocks may produce minor flares with either erratic polarization fluctuations or considerable polarization variations, depending on the parameters; fast shocks can produce major flares with smooth PA rotations. In addition, the magnetic fields in both cases are observed to actively revert to the original topology after the shocks. All these features are consistent with observations. Future observations of the radiation and polarization signatures will further constrain the flaring mechanism and the blazar emission environment.

preprint 2015 arXiv

Relativistic MHD simulations of collision-induced magnetic dissipation in Poynting-flux-dominated jets/outflows

We perform 3D relativistic ideal MHD simulations to study the collisions between high-$σ$ (Poynting-flux-dominated) blobs which contain both poloidal and toroidal magnetic field components. This is meant to mimic the interactions inside a highly variable Poynting-flux-dominated jet. We discover significant electromagnetic field (EMF) energy dissipation at an Alfvénic rate, with an efficiency of around 35%. Detailed analyses show that this dissipation is mostly facilitated by the collision-induced magnetic reconnection. Additional resolution and parameter studies show a robust result that the relative EMF energy dissipation efficiency is nearly independent of the numerical resolution or most physical parameters in the relevant parameter range. The reconnection outflows in our simulation can potentially form the multi-orientation relativistic mini-jets as needed for several analytical models. We also find a linear relationship between the $σ$ values before and after the major EMF energy dissipation process. Our results give support to the proposed astrophysical models that invoke significant magnetic energy dissipation in Poynting-flux-dominated jets, such as the internal collision-induced magnetic reconnection and turbulence (ICMART) model for GRBs, and the reconnection-triggered mini-jets model for AGNs.

preprint 2014 arXiv

Cosmological implications of Fast Radio Burst / Gamma-Ray Burst Associations

If a small fraction of Fast Radio Bursts (FRBs) are associated with Gamma-Ray Bursts (GRBs), as recently suggested by Zhang, the combination of redshift measurements of GRBs and dispersion measure (DM) measurements of FRBs opens a new window to study cosmology. At $z<2$ where the universe is essentially fully ionized, detections of FRB/GRB pairs can give an independent measurement of the intergalactic medium portion of the baryon mass fraction, $Ω_b f_{\rm IGM}$, of the universe. If a good sample of FRB/GRB associations is discovered at higher redshifts, the free electron column density history can be mapped, which can be used to probe the reionization history of both hydrogen and helium in the universe. We apply our formulation to GRBs 101011A and 100704A, each of which might have an associated FRB, and constrain $Ω_b f_{\rm IGM}$ to be consistent with the value derived from other methods. The methodology developed here is also applicable if the redshifts of FRBs not associated with GRBs can be measured by other means.

preprint 2014 arXiv

Low Energy Spectral Index and $E_{p}$ evolution of Quasi-thermal Photosphere Emission of Gamma-Ray Bursts

Recent observations by the Fermi satellite suggest that a photosphere emission component is contributing to the observed spectrum of many GRBs. One important question is whether the photosphere component can interpret the typical "Band" function of GRBs with a typical low energy photon spectral index $α\sim -1$. We perform a detailed study of the photosphere emission spectrum by progressively introducing several physical ingredients previously not fully incorporated, including the probability distribution of the location of a dynamically evolving photosphere, superposition of emission from an equal-arrival-time "volume" in a continuous wind, the evolution of optical depth of a wind with finite but evolving outer boundary, as well as the effect of different top-hat wind luminosity ($L_w$) profiles. By assuming a co-moving blackbody spectrum emerging from the photosphere, we find that for an outflow with a constant or increasing $L_w$, the low-energy spectrum below the peak energy ($E_{p}$) can be modified to $F_ν\sim ν^{1.5}$ ($α\sim +0.5$). A softer ($-1<α<+0.5$) or flat ($α=-1$) spectrum can be obtained during the $L_w$ decreasing phase or high-latitude-emission-dominated phase. We also study the evolution of $E_{p}$ as a function of wind and photosphere luminosity in this photosphere model. An $E_p-L$ tracking pattern can be reproduced if a certain positive dependence between the dimensionless entropy $η$ and $L_w$ is introduced. However, the hard-to-soft evolution pattern cannot be reproduced unless a contrived condition is invoked. In order to interpret the Band spectrum, a more complicated photosphere model or a different energy dissipation and radiation mechanism is needed.
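The pairing of $F_ν\sim ν^{1.5}$ with $α\sim +0.5$, and of a flat spectrum with $α=-1$, follows from the usual relation between the photon number index and the energy flux density. A brief sketch, assuming the low-energy photon number spectrum is written as $N(E) \propto E^{\alpha}$ (the standard convention for the Band function, stated here as an assumption):

```latex
\begin{align}
F_\nu \;\propto\; E\,N(E) \;\propto\; \nu^{\alpha+1}
\quad\Longrightarrow\quad
\begin{cases}
\alpha = +1   & F_\nu \propto \nu^{2} \ \text{(Rayleigh-Jeans blackbody)},\\
\alpha = +0.5 & F_\nu \propto \nu^{1.5},\\
\alpha = -1   & F_\nu \propto \nu^{0} \ \text{(flat)}.
\end{cases}
\end{align}
```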

preprint 2014 arXiv

Parallel Multi-Block ADMM with o(1/k) Convergence

This paper introduces a parallel and distributed extension to the alternating direction method of multipliers (ADMM) for solving convex problems of the form: minimize $\sum_{i=1}^N f_i(x_i)$ subject to $\sum_{i=1}^N A_i x_i=c, x_i\in \mathcal{X}_i$. The algorithm decomposes the original problem into N smaller subproblems and solves them in parallel at each iteration. This Jacobian-type algorithm is well suited for distributed computing and is particularly attractive for solving certain large-scale problems. This paper introduces a few novel results. Firstly, it shows that extending ADMM straightforwardly from the classic Gauss-Seidel setting to the Jacobian setting, from 2 blocks to N blocks, will preserve convergence if matrices $A_i$ are mutually near-orthogonal and have full column-rank. Secondly, for general matrices $A_i$, this paper proposes to add proximal terms of different kinds to the N subproblems so that the subproblems can be solved in flexible and efficient ways and the algorithm converges globally at a rate of o(1/k). Thirdly, a simple technique is introduced to improve some existing convergence rates from O(1/k) to o(1/k). In practice, some conditions in our convergence theorems are conservative. Therefore, we introduce a strategy for dynamically tuning the parameters in the algorithm, leading to substantial acceleration of the convergence in practice. Numerical results are presented to demonstrate the efficiency of the proposed method in comparison with several existing parallel algorithms. We implemented our algorithm on Amazon EC2, an on-demand public computing cloud, and report its performance on very large-scale basis pursuit problems with distributed data.
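The Jacobian-type update with proximal terms described above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes quadratic objectives $f_i(x_i) = \frac12\|x_i - b_i\|^2$ with no set constraints, proximal terms of the form $\frac{\tau_i}{2}\|x_i - x_i^k\|^2$, and a conservative weight $\tau_i > \rho\,(N/(2-\gamma) - 1)\,\|A_i\|_2^2$. The function name, the parameter names `rho` and `gamma`, and the weight rule are all assumptions made for illustration and should be checked against the paper before reuse.

```python
import numpy as np


def prox_jacobi_admm(A_blocks, b_blocks, c, rho=1.0, gamma=1.0, iters=2000):
    """Toy proximal Jacobi ADMM for
        minimize   sum_i 0.5 * ||x_i - b_i||^2
        subject to sum_i A_i x_i = c.
    All block updates in one iteration use only the previous iterate,
    so they could be computed in parallel.
    """
    N = len(A_blocks)
    xs = [np.zeros(A.shape[1]) for A in A_blocks]
    lam = np.zeros(c.shape[0])
    # Assumed conservative proximal weights (hypothetical rule, see lead-in).
    taus = [1.01 * rho * (N / (2.0 - gamma) - 1.0) * np.linalg.norm(A, 2) ** 2
            for A in A_blocks]
    for _ in range(iters):
        Ax = [A @ x for A, x in zip(A_blocks, xs)]
        total = sum(Ax)
        new_xs = []
        for A, b, x, Ax_i, tau in zip(A_blocks, b_blocks, xs, Ax, taus):
            # Contribution of the other blocks, frozen at the old iterate.
            r = total - Ax_i - c - lam / rho
            # Closed-form minimizer of
            # 0.5||x-b||^2 + rho/2 ||A x + r||^2 + tau/2 ||x - x_old||^2.
            H = (1.0 + tau) * np.eye(A.shape[1]) + rho * A.T @ A
            g = b - rho * A.T @ r + tau * x
            new_xs.append(np.linalg.solve(H, g))
        xs = new_xs
        # Damped multiplier update on the constraint residual.
        residual = sum(A @ x for A, x in zip(A_blocks, xs)) - c
        lam = lam - gamma * rho * residual
    return xs, lam


if __name__ == "__main__":
    # Hypothetical 3-block instance, for illustration only.
    A_blocks = [np.array([[1.0, 0.0], [0.0, 1.0]]),
                np.array([[1.0, 1.0], [0.0, 1.0]]),
                np.array([[1.0, 0.0], [1.0, 1.0]])]
    b_blocks = [np.array([1.0, 2.0]), np.array([0.0, -1.0]), np.array([3.0, 1.0])]
    c = np.array([1.0, 1.0])
    xs, lam = prox_jacobi_admm(A_blocks, b_blocks, c)
    print(np.linalg.norm(sum(A @ x for A, x in zip(A_blocks, xs)) - c))
```

For this quadratic toy problem the exact solution is the Euclidean projection of the stacked $b_i$ onto the constraint set, which gives a simple reference for checking the iterates.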

preprint 2013 arXiv

Adaptive low rank and sparse decomposition of video using compressive sensing

We address the problem of reconstructing and analyzing surveillance videos using compressive sensing. We develop a new method that performs video reconstruction by low rank and sparse decomposition adaptively. Background subtraction becomes part of the reconstruction. In our method, a background model is used in which the background is learned adaptively as the compressive measurements are processed. The adaptive method has low latency, and is more robust than previous methods. We will present experimental results to demonstrate the advantages of the proposed method.

preprint 2013 arXiv

Surveillance Video Processing Using Compressive Sensing

A compressive sensing method combined with decomposition of a matrix formed with image frames of a surveillance video into low rank and sparse matrices is proposed to segment the background and extract moving objects in a surveillance video. The video is acquired by compressive measurements, and the measurements are used to reconstruct the video by a low rank and sparse decomposition of the matrix. The low rank component represents the background, and the sparse component is used to identify moving objects in the surveillance video. The decomposition is performed by an augmented Lagrangian alternating direction method. Experiments are carried out to demonstrate that moving objects can be reliably extracted with a small number of measurements.

preprint 2010 arXiv

Barstar:barnase - A versatile strategy for nanodiamond bioconjugation

This is a new nanoparticle bioconjugation reaction.

preprint 2010 arXiv

Measuring the beaming angle of GRB 030329 by fitting the rebrightenings in its multiband afterglow

Multiple rebrightenings have been observed in the multiband afterglow of GRB 030329. In particular, a marked and quick rebrightening occurred at about $t \sim 1.2 \times 10^5$ s. Energy injection from late and slow shells seems to be the best interpretation for these rebrightenings. Usually it is assumed that the energy is injected into the whole external shock. However, in the case of GRB 030329, the rebrightenings are so quick that the usual consideration fails to give a satisfactory fit to the observed light curves. Actually, since these late/slow shells coast freely in the wake of the external shock, they should be cold and may not expand laterally. The energy injection then should only occur at the central region of the external shock. Considering this effect, we numerically re-fit the quick rebrightenings observed in GRB 030329. By doing this, we were able to derive the beaming angle of the energy injection process. Our result, with a relative residual of only 5% - 10% during the major rebrightening, is better than any previous modeling. The derived energy injection angle is about 0.035. We assume that these late shells are ejected by the central engine via the same mechanism as those early shells that produce the prompt gamma-ray burst. The main difference is that their velocities are much slower, so that they catch up with the external shock very late and manifest as the observed quick rebrightenings. If this were true, then the derived energy injection angle can give a good measure of the beaming angle of the prompt gamma-ray emission. Our study may provide a novel method to measure the beaming angle of gamma-ray bursts.
