Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
22 works
0 followers
15 topics
4 close collaborators

Published work

22 published items

preprint · 2026 · arXiv

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

As language models become increasingly capable, users expect them to provide not only accurate responses but also behaviors aligned with diverse human preferences across a variety of scenarios. To achieve this, reinforcement learning (RL) pipelines have begun incorporating multiple rewards, each capturing a distinct preference, to guide models toward these desired behaviors. However, recent work has defaulted to applying Group Relative Policy Optimization (GRPO) in the multi-reward setting without examining its suitability. In this paper, we demonstrate that directly applying GRPO to normalize distinct rollout reward combinations causes them to collapse into identical advantage values, reducing the resolution of the training signal and resulting in suboptimal convergence and, in some cases, early training failure. We then introduce Group reward-Decoupled Normalization Policy Optimization (GDPO), a new policy optimization method that resolves these issues by decoupling the normalization of individual rewards, more faithfully preserving their relative differences and enabling more accurate multi-reward optimization, along with substantially improved training stability. We compare GDPO with GRPO across three tasks: tool calling, math reasoning, and coding reasoning, evaluating both correctness metrics (accuracy, bug ratio) and constraint adherence metrics (format, length). Across all settings, GDPO consistently outperforms GRPO, demonstrating its effectiveness and generalizability for multi-reward reinforcement learning optimization.
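The collapse described in this abstract can be illustrated with a toy sketch. It assumes two binary rewards, groups of two rollouts, and that the decoupled variant simply sums per-reward z-scores; the function names and the `eps` constant are hypothetical, and the paper's exact formulation may differ.

```python
from statistics import mean, pstdev


def zscore(values, eps=1e-8):
    """Group-normalize a list of scalars (eps avoids division by zero)."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / (s + eps) for v in values]


def coupled_advantages(rewards):
    """GRPO-style toy: sum the rewards of each rollout, then normalize."""
    totals = [sum(r) for r in rewards]
    return zscore(totals)


def decoupled_advantages(rewards):
    """Toy decoupled variant (assumption): normalize each reward
    separately within the group, then sum the normalized values."""
    n_rewards = len(rewards[0])
    per_reward = [zscore([r[k] for r in rewards]) for k in range(n_rewards)]
    return [sum(col[i] for col in per_reward) for i in range(len(rewards))]


# Two groups with different reward combinations for the second rollout.
group_a = [(0, 0), (1, 0)]  # second rollout satisfies one reward
group_b = [(0, 0), (1, 1)]  # second rollout satisfies both rewards

print(coupled_advantages(group_a), coupled_advantages(group_b))
print(decoupled_advantages(group_a), decoupled_advantages(group_b))
```

In this toy case the coupled normalization assigns both groups practically the same advantages, while the decoupled one gives the rollout that satisfies both rewards roughly twice the advantage.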

preprint · 2022 · arXiv

Backward-Angle ($u$-channel) Production at an Electron-Ion Collider

In backward photoproduction of mesons, $γp\rightarrow M p$, the target proton takes most of the photon momentum, while the produced meson recoils in the direction from which the photon came. Thus the Mandelstam $u$ is small, while the squared momentum transfer $t$ is typically large, near the kinematic limit. In a collider geometry, backward production transfers the struck baryon by many units of rapidity, in a striking similarity to baryon stopping. We explore this similarity, and point out the similarities between the Regge theories used to model baryon stopping with those that are used for backward production. We then explore how backward production can be explored at higher energies than are available at fixed target experiments, by studying production at an electron-ion collider. We calculate the expected $ep$ cross sections and rates, finding that the rate for backward $ω$ production is about 1/300 that of forward $ω$s. We discuss the kinematics of backward production and consider the detector requirements for experimental study.

preprint · 2022 · arXiv

Converting Artificial Neural Networks to Spiking Neural Networks via Parameter Calibration

Spiking Neural Network (SNN), originating from the neural behavior in biology, has been recognized as one of the next-generation neural networks. Conventionally, SNNs can be obtained by converting from pre-trained Artificial Neural Networks (ANNs) by replacing the non-linear activation with spiking neurons without changing the parameters. In this work, we argue that simply copying and pasting the weights of ANN to SNN inevitably results in activation mismatch, especially for ANNs that are trained with batch normalization (BN) layers. To tackle the activation mismatch issue, we first provide a theoretical analysis by decomposing local conversion error to clipping error and flooring error, and then quantitatively measure how this error propagates throughout the layers using the second-order analysis. Motivated by the theoretical results, we propose a set of layer-wise parameter calibration algorithms, which adjusts the parameters to minimize the activation mismatch. Extensive experiments for the proposed algorithms are performed on modern architectures and large-scale tasks including ImageNet classification and MS COCO detection. We demonstrate that our method can handle the SNN conversion with batch normalization layers and effectively preserve the high accuracy even in 32 time steps. For example, our calibration algorithms can increase up to 65% accuracy when converting VGG-16 with BN layers.

preprint · 2022 · arXiv

DRESS: Dynamic REal-time Sparse Subnets

The limited and dynamically varied resources on edge devices motivate us to deploy an optimized deep neural network that can adapt its sub-networks to fit in different resource constraints. However, existing works often build sub-networks through searching different network architectures in a hand-crafted sampling space, which not only can result in a subpar performance but also may cause on-device re-configuration overhead. In this paper, we propose a novel training algorithm, Dynamic REal-time Sparse Subnets (DRESS). DRESS samples multiple sub-networks from the same backbone network through row-based unstructured sparsity, and jointly trains these sub-networks in parallel with weighted loss. DRESS also exploits strategies including parameter reusing and row-based fine-grained sampling for efficient storage consumption and efficient on-device adaptation. Extensive experiments on public vision datasets show that DRESS yields significantly higher accuracy than state-of-the-art sub-networks.

preprint · 2022 · arXiv

Dressing in the Wild by Watching Dance Videos

While significant progress has been made in garment transfer, one of the most applicable directions of human-centric image generation, existing works overlook the in-the-wild imagery, presenting severe garment-person misalignment as well as noticeable degradation in fine texture details. This paper, therefore, attends to virtual try-on in real-world scenes and brings essential improvements in authenticity and naturalness especially for loose garment (e.g., skirts, formal dresses), challenging poses (e.g., cross arms, bent legs), and cluttered backgrounds. Specifically, we find that the pixel flow excels at handling loose garments whereas the vertex flow is preferred for hard poses, and by combining their advantages we propose a novel generative network called wFlow that can effectively push up garment transfer to in-the-wild context. Moreover, former approaches require paired images for training. Instead, we cut down the laboriousness by working on a newly constructed large-scale video dataset named Dance50k with self-supervised cross-frame training and an online cycle optimization. The proposed Dance50k can boost real-world virtual dressing by covering a wide variety of garments under dancing poses. Extensive experiments demonstrate the superiority of our wFlow in generating realistic garment transfer results for in-the-wild images without resorting to expensive paired datasets.

preprint · 2022 · arXiv

Global uniform in $N$ estimates for solutions of a system of Hartree-Fock-Bogoliubov type in the case $β<1$

We extend the results of the 2019 paper by the third and fourth author globally in time. More precisely, we prove uniform in $N$ estimates for the solutions $ϕ$, $Λ$ and $Γ$ of a coupled system of Hartree-Fock-Bogoliubov type with interaction potential $V_N(x-y)=N^{3 β}v(N^β(x-y))$ with $β<1$. The potential satisfies some technical conditions, but is not small. The initial conditions have finite energy and the "pair correlation" part satisfies a smallness condition, but are otherwise general functions in suitable Sobolev spaces, and the expected correlations in $Λ$ develop dynamically in time. The estimates are expected to improve the Fock space bounds from the 2021 paper of the first and fifth author. This will be addressed in a different paper.

preprint · 2022 · arXiv

MixMix: All You Need for Data-Free Compression Are Feature and Data Mixing

User data confidentiality protection is becoming a rising challenge in the present deep learning research. Without access to data, conventional data-driven model compression faces a higher risk of performance degradation. Recently, some works propose to generate images from a specific pretrained model to serve as training data. However, the inversion process only utilizes biased feature statistics stored in one model and is from low-dimension to high-dimension. As a consequence, it inevitably encounters the difficulties of generalizability and inexact inversion, which leads to unsatisfactory performance. To address these problems, we propose MixMix based on two simple yet effective techniques: (1) Feature Mixing: utilizes various models to construct a universal feature space for generalized inversion; (2) Data Mixing: mixes the synthesized images and labels to generate exact label information. We prove the effectiveness of MixMix from both theoretical and empirical perspectives. Extensive experiments show that MixMix outperforms existing methods on the mainstream compression tasks, including quantization, knowledge distillation, and pruning. Specifically, MixMix achieves up to 4% and 20% accuracy uplift on quantization and pruning, respectively, compared to existing data-free compression work.

preprint · 2022 · arXiv

Neural Mean Discrepancy for Efficient Out-of-Distribution Detection

Various approaches have been proposed for out-of-distribution (OOD) detection by augmenting models, input examples, training sets, and optimization objectives. Deviating from existing work, we have a simple hypothesis that standard off-the-shelf models may already contain sufficient information about the training set distribution which can be leveraged for reliable OOD detection. Our empirical study on validating this hypothesis, which measures the model activation's mean for OOD and in-distribution (ID) mini-batches, surprisingly finds that activation means of OOD mini-batches consistently deviate more from those of the training data. In addition, training data's activation means can be computed offline efficiently or retrieved from batch normalization layers as a 'free lunch'. Based upon this observation, we propose a novel metric called Neural Mean Discrepancy (NMD), which compares neural means of the input examples and training data. Leveraging the simplicity of NMD, we propose an efficient OOD detector that computes neural means by a standard forward pass followed by a lightweight classifier. Extensive experiments show that NMD outperforms state-of-the-art OOD approaches across multiple datasets and model architectures in terms of both detection accuracy and computational cost.
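The comparison of neural means described in this abstract might look roughly like the sketch below. It assumes activations are already available as one value per channel for each example; the helper names and the toy numbers are hypothetical, and the lightweight classifier that the abstract mentions is not shown.

```python
def channel_means(activations):
    """Average each channel over a mini-batch.

    activations: list of per-example lists, one value per channel.
    """
    n = len(activations)
    n_channels = len(activations[0])
    return [sum(ex[c] for ex in activations) / n for c in range(n_channels)]


def neural_mean_discrepancy(batch_activations, train_means):
    """Per-channel difference between batch means and training means."""
    batch_means = channel_means(batch_activations)
    return [b - t for b, t in zip(batch_means, train_means)]


# Hypothetical training-set means for two channels.
train_means = [0.5, 1.0]

id_batch = [[0.4, 1.1], [0.6, 0.9]]   # close to the training statistics
ood_batch = [[2.0, 3.0], [4.0, 5.0]]  # far from the training statistics

print(neural_mean_discrepancy(id_batch, train_means))
print(neural_mean_discrepancy(ood_batch, train_means))
```

The resulting vector is what a downstream detector would consume; in a real model the training means could come from batch normalization running statistics, as the abstract notes.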

preprint · 2022 · arXiv

PASTA-GAN++: A Versatile Framework for High-Resolution Unpaired Virtual Try-on

Image-based virtual try-on is one of the most promising applications of human-centric image generation due to its tremendous real-world potential. In this work, we take a step forwards to explore versatile virtual try-on solutions, which we argue should possess three main properties, namely, they should support unsupervised training, arbitrary garment categories, and controllable garment editing. To this end, we propose a characteristic-preserving end-to-end network, the PAtch-routed SpaTially-Adaptive GAN++ (PASTA-GAN++), to achieve a versatile system for high-resolution unpaired virtual try-on. Specifically, our PASTA-GAN++ consists of an innovative patch-routed disentanglement module to decouple the intact garment into normalized patches, which is capable of retaining garment style information while eliminating the garment spatial information, thus alleviating the overfitting issue during unsupervised training. Furthermore, PASTA-GAN++ introduces a patch-based garment representation and a patch-guided parsing synthesis block, allowing it to handle arbitrary garment categories and support local garment editing. Finally, to obtain try-on results with realistic texture details, PASTA-GAN++ incorporates a novel spatially-adaptive residual module to inject the coarse warped garment feature into the generator. Extensive experiments on our newly collected UnPaired virtual Try-on (UPT) dataset demonstrate the superiority of PASTA-GAN++ over existing SOTAs and its ability for controllable garment editing.

preprint · 2022 · arXiv

SphereFed: Hyperspherical Federated Learning

Federated Learning aims at training a global model from multiple decentralized devices (i.e. clients) without exchanging their private local data. A key challenge is the handling of non-i.i.d. (independent identically distributed) data across multiple clients that may induce disparities of their local features. We introduce the Hyperspherical Federated Learning (SphereFed) framework to address the non-i.i.d. issue by constraining learned representations of data points to be on a unit hypersphere shared by clients. Specifically, all clients learn their local representations by minimizing the loss with respect to a fixed classifier whose weights span the unit hypersphere. After federated training in improving the global model, this classifier is further calibrated with a closed-form solution by minimizing a mean squared loss. We show that the calibration solution can be computed efficiently and distributedly without direct access of local data. Extensive experiments indicate that our SphereFed approach is able to improve the accuracy of multiple existing federated learning algorithms by a considerable margin (up to 6% on challenging datasets) with enhanced computation and communication efficiency across datasets and model architectures.

preprint · 2022 · arXiv

SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems

We design deep neural networks (DNNs) and corresponding networks' splittings to distribute DNNs' workload to camera sensors and a centralized aggregator on head mounted devices to meet system performance targets in inference accuracy and latency under the given hardware resource constraints. To achieve an optimal balance among computation, communication, and performance, a split-aware neural architecture search framework, SplitNets, is introduced to conduct model designing, splitting, and communication reduction simultaneously. We further extend the framework to multi-view systems for learning to fuse inputs from multiple camera sensors with optimal performance and systemic efficiency. We validate SplitNets for single-view system on ImageNet as well as multi-view system on 3D classification, and show that the SplitNets framework achieves state-of-the-art (SOTA) performance and system latency compared with existing approaches.

preprint · 2021 · arXiv

EIC Physics from An All-Silicon Tracking Detector

The proposed electron-ion collider has a rich physics program to study the internal structure of protons and heavy nuclei. This program will impose strict requirements on detector design. This paper explores how these requirements can be satisfied using an all-silicon tracking detector, by consideration of three representative probes: heavy flavor hadrons, jets, and exclusive vector mesons.

preprint · 2021 · arXiv

Investigation of Experimental Observables in Search of the Chiral Magnetic Effect in Heavy-ion Collisions in the STAR experiment

The chiral magnetic effect (CME) is a novel transport phenomenon, arising from the interplay between quantum anomalies and strong magnetic fields in chiral systems. In high-energy nuclear collisions, the CME may survive the expansion of the quark-gluon plasma fireball and be detected in experiments. Over the past decade, the experimental searches for the CME have aroused extensive interest at the Relativistic Heavy Ion Collider (RHIC) and the Large Hadron Collider (LHC). The main goal of this article is to investigate three pertinent experimental approaches: the $γ$ correlator, the $R$ correlator and the signed balance functions. We will exploit both simple Monte Carlo simulations and a realistic event generator (EBE-AVFD) to verify the equivalence in the kernel-component observables among these methods and to ascertain their sensitivities to the CME signal for the isobaric collisions at RHIC.

preprint · 2021 · arXiv

Probing gluon helicity with heavy flavor at the EIC

We propose a new measurement of the heavy flavor hadron double spin asymmetry in deep-inelastic scattering at a future Electron-Ion Collider (EIC) to constrain the polarized gluon distribution function inside the proton. Statistical projection on $D^0$ meson double spin asymmetry is calculated with an EIC central detector using an all-silicon tracker and vertexing subsystem. A first impact study was done by interpreting pseudo-data at next-to-leading order in QCD. The sensitivity of the experimental observable in constraining gluon helicity distribution in a wide range of parton momentum fraction $x$ has been investigated considering different beam energy configurations. This measurement complements the inclusive spin-dependent structure function measurement and provides an opportunity to constrain the gluon helicity distribution in the moderate $x$ region.

preprint · 2020 · arXiv

ABSent: Cross-Lingual Sentence Representation Mapping with Bidirectional GANs

A number of cross-lingual transfer learning approaches based on neural networks have been proposed for the case when large amounts of parallel text are at our disposal. However, in many real-world settings, the size of parallel annotated training data is restricted. Additionally, prior cross-lingual mapping research has mainly focused on the word level. This raises the question of whether such techniques can also be applied to effortlessly obtain cross-lingually aligned sentence representations. To this end, we propose an Adversarial Bi-directional Sentence Embedding Mapping (ABSent) framework, which learns mappings of cross-lingual sentence representations from limited quantities of parallel data.

preprint · 2020 · arXiv

Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks

We propose Additive Powers-of-Two (APoT) quantization, an efficient non-uniform quantization scheme for the bell-shaped and long-tailed distribution of weights and activations in neural networks. By constraining all quantization levels as the sum of Powers-of-Two terms, APoT quantization enjoys high computational efficiency and a good match with the distribution of weights. A simple reparameterization of the clipping function is applied to generate a better-defined gradient for learning the clipping threshold. Moreover, weight normalization is presented to refine the distribution of weights to make the training more stable and consistent. Experimental results show that our proposed method outperforms state-of-the-art methods, and is even competitive with the full-precision models, demonstrating the effectiveness of our proposed APoT quantization. For example, our 4-bit quantized ResNet-50 on ImageNet achieves 76.6% top-1 accuracy without bells and whistles; meanwhile, our model reduces 22% computational cost compared with the uniformly quantized counterpart. The code is available at https://github.com/yhhhli/APoT_Quantization.
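As a rough illustration of quantization levels built as sums of powers-of-two terms, the sketch below enumerates such a level set and rounds a value to its nearest level. The two term sets, the normalization to a maximum of 1, and the function names are illustrative assumptions, not necessarily the configuration used in the paper or its repository.

```python
from itertools import product


def apot_levels(term_sets):
    """Build levels by picking one term from each set and summing them,
    then rescale so that the largest level equals 1."""
    raw = sorted({sum(choice) for choice in product(*term_sets)})
    top = raw[-1]
    return [v / top for v in raw]


def quantize(x, levels):
    """Round x to the nearest available level."""
    return min(levels, key=lambda q: abs(q - x))


# Assumed term sets: each entry is zero or a power of two.
TERM_SETS = [
    [0.0, 2**0, 2**-2, 2**-4],
    [0.0, 2**-1, 2**-3, 2**-5],
]

LEVELS = apot_levels(TERM_SETS)

print(len(LEVELS))            # number of distinct levels
print(quantize(0.3, LEVELS))  # nearest level to 0.3
```

Because the two sets use disjoint exponents, every pair of choices yields a distinct sum, so two 4-element sets give 16 levels, matching a 4-bit unsigned code. The levels are denser near zero than near the maximum, which is the non-uniform behavior the abstract motivates with bell-shaped weight distributions.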

preprint · 2020 · arXiv

Charm and beauty isolation from heavy flavor decay electrons in Au+Au collisions at $\sqrt{s_{\rm NN}}$ = 200 GeV at RHIC

We present a study of charm and beauty isolation based on a data-driven method with recent measurements on heavy flavor hadrons and their decay electrons in Au+Au collisions at $\sqrt{s_{\rm NN}}$ = 200 GeV at RHIC. The individual electron $p_{\rm T}$ spectra, $R_{\rm AA}$ and $v_2$ distributions from charmed and beauty hadron decays are obtained. We find that the electron $R_{\rm AA}$ from beauty hadron decays ($R_{\rm AA}^{\rm b\rightarrow e}$) is suppressed in minimum bias Au+Au collisions but less suppressed compared with that from charmed hadron decays at $p_{\rm T}$ $>$ 3.5 GeV/$c$, which indicates that beauty quark interacts with the hot-dense medium with depositing its energy and is consistent with the mass-dependent energy loss scenario. For the first time, the non-zero electron $v_2$ from beauty hadron decays ($v_2^{\rm b\rightarrow e}$) at $p_{\rm T}$ $>$ 3.0 GeV/$c$ is observed and shows smaller elliptic flow compared with that from charmed hadron decays at $p_{\rm T}$ $<$ 4.0 GeV/$c$. At 2.5 GeV/$c$ $<$ $p_{\rm T}$ $<$ 4.5 GeV/$c$, $v_2^{\rm b\rightarrow e}$ is smaller than a number-of-constituent-quark (NCQ) scaling hypothesis. This suggests that beauty quark is unlikely thermalized and too heavy to be moved in a partonic collectivity in heavy-ion collisions at the RHIC energy.

preprint · 2020 · arXiv

Efficient Bitwidth Search for Practical Mixed Precision Neural Network

Network quantization has rapidly become one of the most widely used methods to compress and accelerate deep neural networks. Recent efforts propose to quantize weights and activations from different layers with different precision to improve the overall performance. However, it is challenging to find the optimal bitwidth (i.e., precision) for weights and activations of each layer efficiently. Meanwhile, it is yet unclear how to perform convolution for weights and activations of different precision efficiently on generic hardware platforms. To resolve these two issues, in this paper, we first propose an Efficient Bitwidth Search (EBS) algorithm, which reuses the meta weights for different quantization bitwidth and thus the strength for each candidate precision can be optimized directly w.r.t the objective without superfluous copies, reducing both the memory and computational cost significantly. Second, we propose a binary decomposition algorithm that converts weights and activations of different precision into binary matrices to make the mixed precision convolution efficient and practical. Experiment results on CIFAR10 and ImageNet datasets demonstrate our mixed precision QNN outperforms the handcrafted uniform bitwidth counterparts and other mixed precision techniques.
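The idea of reducing a mixed-precision product to binary operations can be sketched with the standard bit-plane identity: if w = Σ 2^i w_i and a = Σ 2^j a_j with binary planes w_i and a_j, then w·a = Σ_i Σ_j 2^(i+j) (w_i·a_j). The code below applies it to unsigned integers; the function names are hypothetical, and the paper's actual decomposition algorithm may be organized differently.

```python
def bit_planes(values, bits):
    """Split unsigned integers into `bits` binary vectors (LSB first)."""
    return [[(v >> i) & 1 for v in values] for i in range(bits)]


def binary_dot(weights, activations, w_bits, a_bits):
    """Dot product computed only from binary-vector dot products."""
    w_planes = bit_planes(weights, w_bits)
    a_planes = bit_planes(activations, a_bits)
    total = 0
    for i, w_plane in enumerate(w_planes):
        for j, a_plane in enumerate(a_planes):
            # AND + popcount on binary vectors, scaled by 2^(i+j).
            overlap = sum(x & y for x, y in zip(w_plane, a_plane))
            total += overlap << (i + j)
    return total


weights = [3, 1, 2]      # 2-bit values
activations = [5, 7, 0]  # 3-bit values

reference = sum(w * a for w, a in zip(weights, activations))
print(binary_dot(weights, activations, w_bits=2, a_bits=3), reference)
```

Each inner term needs only bitwise AND and a population count, which is why this kind of decomposition maps onto generic hardware regardless of the bitwidths involved.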

preprint · 2020 · arXiv

Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification

In cross-lingual text classification, one seeks to exploit labeled data from one language to train a text classification model that can then be applied to a completely different language. Recent multilingual representation models have made it much easier to achieve this. Still, there may be subtle differences between languages that are neglected when doing so. To address this, we present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations. The resulting model then serves as a teacher to induce labels for unlabeled target language samples that can be used during further adversarial training, allowing us to gradually adapt our model to the target language. Compared with a number of strong baselines, we observe significant gains in effectiveness on document and intent classification for a diverse set of languages.

preprint · 2020 · arXiv

The Hartree equation with a constant magnetic field: Well-posedness theory

We consider the Hartree equation for infinitely many electrons with a constant external magnetic field. For the system, we show a local well-posedness result when the initial data is a perturbation of a Fermi sea, which is a non-trace class stationary solution to the system. In this case, the one particle Hamiltonian is the Pauli operator, which possesses distinct properties from the Laplace operator, for example, it has a discrete spectrum and infinite-dimensional eigenspaces. The new ingredient is that we use the Fourier-Wigner transform and the asymptotic properties of associated Laguerre polynomials to derive a collapsing estimate, by which we establish the local well-posedness result.

preprint · 2019 · arXiv

Asymmetrical Hierarchical Networks with Attentive Interactions for Interpretable Review-Based Recommendation

Recently, recommender systems have been able to emit substantially improved recommendations by leveraging user-provided reviews. Existing methods typically merge all reviews of a given user or item into a long document, and then process user and item documents in the same manner. In practice, however, these two sets of reviews are notably different: users' reviews reflect a variety of items that they have bought and are hence very heterogeneous in their topics, while an item's reviews pertain only to that single item and are thus topically homogeneous. In this work, we develop a novel neural network model that properly accounts for this important difference by means of asymmetric attentive modules. The user module learns to attend to only those signals that are relevant with respect to the target item, whereas the item module learns to extract the most salient contents with regard to properties of the item. Our multi-hierarchical paradigm accounts for the fact that neither are all reviews equally useful, nor are all sentences within each review equally pertinent. Extensive experimental results on a variety of real datasets demonstrate the effectiveness of our method.

preprint · 2019 · arXiv

Open Heavy-Flavor Production in Heavy-Ion Collisions

The ultra-relativistic heavy-ion programs at the Relativistic Heavy Ion Collider and the Large Hadron Collider have evolved into a phase of quantitative studies of Quantum Chromodynamics at very high temperatures. The charm and bottom hadron production offer unique insights into the remarkable transport properties and the microscopic structure of the Quark-Gluon Plasma (QGP) created in these collisions. Heavy quarks, due to their large masses, undergo Brownian motion at low momentum, provide a window on hadronization mechanisms at intermediate momenta, and are expected to merge into a radiative-energy loss regime at high momentum. We review recent experimental and theoretical achievements on measuring a variety of heavy-flavor observables, characterizing the different regimes in momentum, extracting pertinent transport coefficients and deducing implications for the "inner workings" of the QGP medium.