Researcher profile

Dawei Yang

Dawei Yang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
10 works
0 followers
5 topics
4 close collaborators

Published work

10 published items

Preprint · 2026 · arXiv

BWLA: Breaking the Barrier of W1AX Post-Training Quantization for LLMs

Large language models (LLMs) have driven major progress in NLP, yet their substantial memory and compute demands still hinder practical deployment. Binarization can compress weights to 1 bit, fundamentally lowering compute and bandwidth costs. However, existing methods cannot handle heavy-tailed activations and must therefore keep activations in high precision, which prevents true end-to-end acceleration. To overcome this limitation, we propose BWLA (Binarized Weights and Low-bit Activations), the first post-training quantization framework that preserves high accuracy while quantizing weights to 1 bit and activations to low bit-widths (e.g., 6 bits). Its Orthogonal-Kronecker Transformation (OKT) learns an orthogonal mapping via EM minimization, converting unimodal weights into symmetric bimodal forms while suppressing activation tails and incoherence. Its Proximal SVD Projection (PSP) then performs a lightweight low-rank refinement, further improving quantizability with minimal overhead. On Qwen3-32B, BWLA reaches a WikiText2 perplexity of 11.92 with 6-bit activations (vs. 38 for the previous state of the art), improves results on five zero-shot tasks by more than 70%, and delivers a 3.26× inference speedup, demonstrating strong potential for real-world LLM compression and acceleration.
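
To make the weight/activation trade-off concrete, here is a minimal sketch of generic 1-bit weight binarization and symmetric low-bit activation quantization. It uses the common mean-absolute-value scaling for binarized weights, not BWLA's OKT or PSP, and all function names are hypothetical. The demo illustrates why heavy activation tails matter: one outlier sets the per-tensor scale and coarsens every other value.

```python
def binarize_row(w):
    """1-bit weights: alpha * sign(w), with alpha = mean(|w|).

    For b = sign(w), this alpha minimizes ||w - alpha * b||^2.
    Generic closed form only; it is not BWLA's OKT/PSP pipeline.
    """
    alpha = sum(abs(v) for v in w) / len(w)
    return [alpha if v >= 0 else -alpha for v in w]


def quantize_activations(x, bits):
    """Symmetric per-tensor round-to-nearest fake quantization to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 31 for 6 bits
    amax = max(abs(v) for v in x)
    if amax == 0:
        return list(x)
    scale = amax / qmax  # a single outlier sets the step size for every entry
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in x]


def squared_error(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


if __name__ == "__main__":
    print(binarize_row([0.5, -1.5, 1.0]))  # [1.0, -1.0, 1.0]

    light = [0.5, -0.3, 0.2, -0.1, 0.4, -0.6, 0.1, 0.3]
    heavy = light[:-1] + [20.0]  # same values, but the last entry is a heavy-tail outlier
    # Compare errors on the seven small entries the two vectors share.
    print(squared_error(light[:7], quantize_activations(light, bits=6)[:7]))
    print(squared_error(heavy[:7], quantize_activations(heavy, bits=6)[:7]))
```

With the outlier present, the 6-bit step grows from about 0.02 to about 0.65, so most small activations round to zero or to a single step; this is the failure mode that forces existing binarization methods to keep activations in high precision.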

Preprint · 2026 · arXiv

CAR-SAM: Cross-Attention Reconstruction for Post-Training Quantization of the Segment Anything Model

Segment Anything Models (SAMs) are widely used in computer vision for universal image segmentation, but their high computational and memory demands make them difficult to deploy on resource-constrained devices. Post-Training Quantization (PTQ) is a widely used technique for model compression and acceleration. However, existing PTQ methods ignore the cross-attention architecture of the SAM decoder and degrade significantly at low bit-widths. This degradation stems primarily from two challenges unique to SAMs: (1) attention dissipation, where the decoder's attention information, which is crucial for representing segmentation masks, collapses into a diffuse, non-semantic form under low-bit quantization; and (2) reconstruction oscillation, where bidirectional coupling within the two-way transformer introduces cross-branch error interference and destabilizes convergence. To tackle these issues, we propose CAR-SAM, a unified quantization framework tailored for SAMs. First, to mitigate attention dissipation, we introduce a MatMul-Aware Compensation (MAC) mechanism that transfers activation-induced quantization errors from MatMul operations to the preceding linear weights. Second, to mitigate oscillation in decoder optimization, we develop a Joint Cross-Attention Reconstruction (JCAR) strategy that jointly reconstructs coupled attention branches, suppressing oscillatory behavior and promoting stable convergence. Extensive experiments show that CAR-SAM robustly quantizes SAM models down to 4-bit precision, surpassing existing methods by 14.6% and 6.6% mAP on SAM-B and SAM-L, respectively.
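
The benefit of reconstructing coupled branches jointly rather than one at a time can be shown with a toy model. This sketch assumes two chained linear layers stand in for coupled attention branches and a grid search over per-tensor weight scales stands in for reconstruction; it is not CAR-SAM's JCAR algorithm, MAC is not modeled, and every name here is hypothetical.

```python
import itertools


def fake_quant_matrix(W, scale, bits):
    """Per-tensor symmetric round-to-nearest weight quantization."""
    qmax = 2 ** (bits - 1) - 1
    return [[max(-qmax, min(qmax, round(w / scale))) * scale for w in row] for row in W]


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sq_err(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def branch_error(W, Wq, inputs):
    """Reconstruction error of one layer on its own full-precision inputs."""
    return sum(sq_err(matvec(W, x), matvec(Wq, x)) for x in inputs)


def coupled_error(Wa, Wb, Wa_q, Wb_q, calib):
    """Error of the chained output Wb (Wa x): errors from layer A propagate into B."""
    return sum(sq_err(matvec(Wb, matvec(Wa, x)), matvec(Wb_q, matvec(Wa_q, x))) for x in calib)


def independent_scales(Wa, Wb, calib, grid, bits):
    """Each layer picks its own scale, ignoring how the other layer's error interacts."""
    hidden = [matvec(Wa, x) for x in calib]
    sa = min(grid, key=lambda s: branch_error(Wa, fake_quant_matrix(Wa, s, bits), calib))
    sb = min(grid, key=lambda s: branch_error(Wb, fake_quant_matrix(Wb, s, bits), hidden))
    return sa, sb


def joint_scales(Wa, Wb, calib, grid, bits):
    """Search both scales together against the coupled output."""
    return min(
        itertools.product(grid, grid),
        key=lambda p: coupled_error(Wa, Wb, fake_quant_matrix(Wa, p[0], bits),
                                    fake_quant_matrix(Wb, p[1], bits), calib),
    )


def evaluate(Wa, Wb, scales, calib, bits):
    sa, sb = scales
    return coupled_error(Wa, Wb, fake_quant_matrix(Wa, sa, bits),
                         fake_quant_matrix(Wb, sb, bits), calib)


if __name__ == "__main__":
    Wa = [[0.9, -0.4], [0.3, 0.7]]
    Wb = [[0.5, 1.1], [-0.8, 0.2]]
    calib = [[1.0, 0.5], [-0.3, 1.2], [0.8, -0.9]]
    grid = [0.05 * k for k in range(1, 11)]
    ind = evaluate(Wa, Wb, independent_scales(Wa, Wb, calib, grid, 3), calib, 3)
    jnt = evaluate(Wa, Wb, joint_scales(Wa, Wb, calib, grid, 3), calib, 3)
    print(f"independent: {ind:.5f}  joint: {jnt:.5f}")
```

Because the joint search covers every scale pair, including the pair the independent search selects, its coupled error can never be worse; the price is a search space that grows with the product of the per-branch grids.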

Preprint · 2026 · arXiv

TORQ: Two-Level Orthogonal Rotation for MXFP4 Quantization

As Large Language Models (LLMs) advance toward practical deployment, the Microscaling FP4 (MXFP4) format has emerged as a cornerstone for next-generation low-bit inference, owing to its ability to balance high dynamic range with hardware efficiency. However, directly applying MXFP4 to LLM activation quantization inevitably leads to significant accuracy degradation. In this paper, we theoretically analyze the error structure of MXFP4 activation quantization and show that the root cause of this performance drop lies in two structural imbalances between activation distributions and the MXFP4 block floating-point format: (1) extreme inter-block variance imbalance and (2) intra-block codebook utilization imbalance. To address these challenges, we propose TORQ (Two-level Orthogonal Rotation for MXFP4 Quantization), a training-free Post-Training Quantization (PTQ) framework designed to reshape the geometric properties of the activation space through optimal coordinate transformations. At the macroscopic level, TORQ leverages the Schur-Horn theorem to redistribute activation energy via inter-block orthogonal rotation, preventing high-variance blocks from driving up shared scaling factors and thereby preserving the precision of small-magnitude elements. At the microscopic level, TORQ employs maximum-entropy-guided intra-block rotation to alleviate codebook collapse and maximize the information capacity of the MXFP4 codebook. Experiments on mainstream LLMs such as LLaMA3 and Qwen3 show that TORQ significantly improves the accuracy of MXFP4 activation quantization compared to existing methods: on Qwen3-32B, the WikiText perplexity is reduced to 8.43 (vs. 7.61 for BF16), and the average accuracy increases from 38.40% with direct round-to-nearest (RTN) quantization to 73.63% (vs. 74.82% for BF16), substantially narrowing the gap between 4-bit floating-point quantization and full-precision inference.
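
The shared-scale problem described above can be reproduced with a small MXFP4 quantize-dequantize sketch. It assumes the commonly described Microscaling conventions (32-element blocks, a power-of-two shared scale derived from the block maximum, FP4 E2M1 elements with magnitudes up to 6) and simplifies tie-breaking and the scale's exponent range. The plain orthonormal Hadamard rotation is a generic stand-in, not TORQ's Schur-Horn inter-block or maximum-entropy intra-block rotations; all function names are hypothetical.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (the sign is stored separately).
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2  # exponent of the largest value: 6 = 1.5 * 2**2


def mxfp4_block(block):
    """Quantize-dequantize one block using a shared power-of-two scale."""
    amax = max(abs(v) for v in block)
    if amax == 0:
        return [0.0] * len(block)
    # The shared scale comes from the block maximum, so one large entry raises it for all.
    scale = 2.0 ** (math.floor(math.log2(amax)) - E2M1_EMAX)
    out = []
    for v in block:
        mag = min(abs(v) / scale, FP4_E2M1[-1])           # saturate at 6
        code = min(FP4_E2M1, key=lambda c: abs(c - mag))   # nearest code; ties simplified
        out.append(math.copysign(code * scale, v))
    return out


def mxfp4(x, block_size=32):
    """Quantize-dequantize a vector block by block."""
    out = []
    for start in range(0, len(x), block_size):
        out.extend(mxfp4_block(x[start:start + block_size]))
    return out


def hadamard_rotate(x):
    """Orthonormal Sylvester-Hadamard rotation (its own inverse); len(x) must be a power of two."""
    n = len(x)
    return [
        sum((-1) ** bin(i & j).count("1") * x[j] for j in range(n)) / math.sqrt(n)
        for i in range(n)
    ]


def sq_err(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


if __name__ == "__main__":
    x = [24.0] + [0.3] * 31  # one 32-element block containing a single large entry
    plain = mxfp4(x)
    rotated = hadamard_rotate(mxfp4(hadamard_rotate(x)))  # rotate, quantize, rotate back
    print(plain[1])  # 0.0: the shared scale wipes out every 0.3 entry
    print(sq_err(x, plain), sq_err(x, rotated))
```

In this toy block the shared scale is 4, so each 0.3 falls below half of the smallest nonzero step and rounds to zero; after rotation the energy is spread evenly, the scale drops to 1, and the total error falls from about 2.79 to about 1.13. This illustrates the motivation the abstract gives for redistributing activation energy, not TORQ's optimized rotations.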

Preprint · 2022 · arXiv

A Systematic Review on Affective Computing: Emotion Models, Databases, and Recent Advances

Affective computing plays a key role in human-computer interaction, entertainment, teaching, safe driving, and multimedia integration. Major breakthroughs have recently been made in affective computing (i.e., emotion recognition and sentiment analysis). Affective computing can be based on unimodal or multimodal data, primarily consisting of physical information (e.g., textual, audio, and visual data) and physiological signals (e.g., EEG and ECG signals). Physical-based affect recognition attracts more researchers because multiple public databases are available. However, it is hard to reveal emotions that a person deliberately hides behind facial expressions, tone of voice, body gestures, etc. Physiological signals can yield more precise and reliable emotion recognition results, yet the difficulty of acquiring them hinders their practical application. Fusing physical information with physiological signals can therefore provide useful features of emotional states and lead to higher accuracy. Instead of focusing on one specific field of affective analysis, we systematically review recent advances in affective computing and taxonomize both unimodal affect recognition and multimodal affective analysis. First, we introduce two typical emotion models, followed by commonly used databases for affective computing. Next, we survey and taxonomize state-of-the-art unimodal affect recognition and multimodal affective analysis in terms of their detailed architectures and performance. Finally, we discuss some important aspects of affective computing and its applications, and conclude this review by indicating the most promising future directions, such as the establishment of baseline datasets, fusion strategies for multimodal affective analysis, and unsupervised learning models.

Preprint · 2020 · arXiv

Ergodic optimization for some dynamical systems beyond uniform hyperbolicity

In this paper, we show that for several interesting systems beyond uniform hyperbolicity, a generic continuous function has a unique maximizing measure, and this measure has zero entropy. In some cases, we also know that the maximizing measure has full support. These systems include singular hyperbolic attractors, $C^\infty$ surface diffeomorphisms, and diffeomorphisms away from homoclinic tangencies.
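
For readers outside ergodic optimization, the standard objects in this abstract are recalled here (textbook definitions, not taken from the paper) for a continuous map $T$ on a compact metric space $X$ and a continuous function $f$; flows are handled analogously, with measures invariant under every time-$t$ map. "Generic" means that the property holds on a residual (dense $G_\delta$) subset of $C^0(X)$.

```latex
\begin{aligned}
&\mathcal{M}_T = \{\mu \text{ Borel probability on } X : \mu \circ T^{-1} = \mu\}, \\
&\beta(f) = \max_{\mu \in \mathcal{M}_T} \int_X f \, d\mu
  && \text{(attained, since } \mathcal{M}_T \text{ is weak}^*\text{-compact)}, \\
&\mu \text{ is } f\text{-maximizing} \iff \mu \in \mathcal{M}_T \ \text{and} \ \int_X f \, d\mu = \beta(f), \\
&\text{zero entropy: } h_\mu(T) = 0, \qquad \text{full support: } \operatorname{supp}\mu = X.
\end{aligned}
```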

Preprint · 2020 · arXiv

Learning to Generate Synthetic 3D Training Data through Hybrid Gradient

Synthetic images rendered by graphics engines are a promising source for training deep networks. However, it is challenging to ensure that they help train a network that performs well on real images, because a graphics-based generation pipeline requires numerous design decisions, such as the selection of 3D shapes and the placement of the camera. In this work, we propose a new method that optimizes the generation of 3D training data based on what we call a "hybrid gradient". We parametrize the design decisions as a real vector and combine the approximate gradient and the analytical gradient to obtain the hybrid gradient of the network's performance with respect to this vector. We evaluate our approach on the tasks of estimating surface normals, depth, or intrinsic decomposition from a single image. Experiments on standard benchmarks show that our approach can outperform the prior state of the art in optimizing the generation of 3D training data, particularly in terms of computational efficiency.
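
One plausible reading of "combining the approximate and the analytical gradient" is the chain rule, with a finite-difference estimate for the non-differentiable generation step and an analytic gradient for the rest. This sketch assumes exactly that decomposition; `render` and `perf_grad` are hypothetical toy functions, not the paper's rendering pipeline or performance measure.

```python
def finite_difference_jacobian(fn, theta, eps=1e-4):
    """Central-difference Jacobian of a black-box fn: R^n -> R^m (the 'approximate' part).

    Returns columns: cols[j][i] approximates d fn_i / d theta_j.
    """
    cols = []
    for j in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[j] += eps
        minus[j] -= eps
        fp, fm = fn(plus), fn(minus)
        cols.append([(a - b) / (2 * eps) for a, b in zip(fp, fm)])
    return cols


def hybrid_gradient(render, perf_grad, theta, eps=1e-4):
    """dP/dtheta_j = sum_i (dP/dimg_i) * (dimg_i/dtheta_j).

    render: generator treated as a black box and differentiated numerically.
    perf_grad: analytic gradient of the performance w.r.t. the generated data.
    """
    img = render(theta)
    g_img = perf_grad(img)
    cols = finite_difference_jacobian(render, theta, eps)
    return [sum(g * d for g, d in zip(g_img, col)) for col in cols]


def toy_render(t):
    # Hypothetical "renderer": two output values depending on two design parameters.
    return [t[0] ** 2, t[0] * t[1]]


def toy_perf_grad(img):
    # Gradient of the hypothetical performance P(img) = img[0] + 2 * img[1].
    return [1.0, 2.0]


if __name__ == "__main__":
    print(hybrid_gradient(toy_render, toy_perf_grad, [1.0, 3.0]))  # approximately [8.0, 2.0]
```

The finite-difference step costs two generator calls per design parameter, which is why such a split keeps the numerical part to the low-dimensional design vector and leaves the high-dimensional part to analytic differentiation.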

Preprint · 2020 · arXiv

On the notions of singular domination and (multi-)singular hyperbolicity

The properties of uniform hyperbolicity and dominated splitting were introduced to study the stability of the dynamics of diffeomorphisms. Difficulties arise when one tries to extend these definitions to vector fields, and Shantao Liao showed that, to study the properties of the derivative, it is more relevant to consider the linear Poincaré flow than the tangent flow. In this paper, we define the notion of singular domination, an analog of the dominated splitting for the linear Poincaré flow that is robust under perturbations. Based on this, we give a new definition of multi-singular hyperbolicity, which is equivalent to the one recently introduced by Bonatti and da Luz. The novelty of our definition is that it involves neither the blowup of the singular set nor the renormalization cocycle of the linear flows.
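
Liao's linear Poincaré flow, which this abstract builds on, has the following standard definition for a $C^1$ vector field $X$ with flow $\phi_t$ and tangent flow $\Phi_t = D\phi_t$; the last line recalls the usual dominated-splitting condition for it on regular points. The paper's singular domination is a new notion and is not reproduced here.

```latex
\begin{aligned}
&N_x = \{ v \in T_x M : \langle v, X(x) \rangle = 0 \}, \qquad X(x) \neq 0, \\
&\psi_t(v) = \Phi_t(v) - \frac{\langle \Phi_t(v), X(\phi_t(x)) \rangle}{\|X(\phi_t(x))\|^2}\, X(\phi_t(x)),
  \qquad v \in N_x, \\
&N = E \oplus F \ \text{dominated} \iff \exists\, T > 0 :\
  \|\psi_T|_{E(x)}\| \, \|\psi_{-T}|_{F(\phi_T(x))}\| \le \tfrac{1}{2}
  \ \text{for all regular } x.
\end{aligned}
```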

Preprint · 2020 · arXiv

On the partial hyperbolicity of robustly transitive sets with singularities

Homoclinic tangencies and singular hyperbolicity are involved in the Palis conjecture for vector fields. Typical three-dimensional vector fields are now well understood thanks to recent work. We study the dynamics of higher-dimensional vector fields that are away from homoclinic tangencies. More precisely, we prove that for a vector field of any dimension that is away from homoclinic tangencies, all singularities contained in its robustly transitive singular set are hyperbolic and have the same index. Moreover, the robustly transitive set is $C^1$-generically partially hyperbolic if the vector field cannot be accumulated by ones with a homoclinic tangency.
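
Two standard notions used in this abstract are recalled here with common conventions, which may differ in detail from the paper's: a hyperbolic singularity and its index (taken as the stable dimension), and a partially hyperbolic splitting of the tangent flow $\Phi_t$ over an invariant set $\Lambda$ with a uniformly contracting bundle, for constants $C \ge 1$, $\eta > 0$ and all $t \ge 0$.

```latex
\begin{aligned}
&X(\sigma) = 0 \ \text{and} \ \operatorname{Re}\lambda \neq 0 \ \text{for all eigenvalues } \lambda \text{ of } DX(\sigma)
  && \text{(hyperbolic singularity)}, \\
&\operatorname{Ind}(\sigma) = \dim E^s(\sigma) = \#\{\lambda : \operatorname{Re}\lambda < 0\}
  && \text{(counted with multiplicity)}, \\
&T_\Lambda M = E^{ss} \oplus E^{c}, \quad
  \|\Phi_t|_{E^{ss}(x)}\| \le C e^{-\eta t}, \quad
  \|\Phi_t|_{E^{ss}(x)}\| \, \|\Phi_{-t}|_{E^{c}(\phi_t(x))}\| \le C e^{-\eta t}.
\end{aligned}
```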

Preprint · 2020 · arXiv

Robust transitivity of singular hyperbolic attractors

Singular hyperbolicity is a weakened form of hyperbolicity that was introduced for vector fields in order to allow non-isolated singularities inside the non-wandering set. A typical example of a singular hyperbolic set is the Lorenz attractor. However, in contrast to uniform hyperbolicity, singular hyperbolicity does not immediately imply robust topological properties such as transitivity. In this paper, we prove that, for an open and dense set of $C^1$ vector fields on a compact manifold, every singular hyperbolic attractor is robustly transitive.
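
For reference, a standard formulation of singular hyperbolicity (in the form introduced by Morales, Pacifico, and Pujals for Lorenz-like attractors) is recalled here; the paper may use an equivalent variant. A compact invariant set $\Lambda$ of a $C^1$ vector field with tangent flow $\Phi_t$ is singular hyperbolic if all its singularities are hyperbolic and there exist a continuous $\Phi_t$-invariant splitting and constants $C \ge 1$, $\lambda > 0$ such that, for all $x \in \Lambda$ and $t \ge 0$:

```latex
\begin{aligned}
&T_\Lambda M = E^{ss} \oplus E^{cu}, \\
&\|\Phi_t|_{E^{ss}(x)}\| \, \|\Phi_{-t}|_{E^{cu}(\phi_t(x))}\| \le C e^{-\lambda t}
  && \text{(domination)}, \\
&\|\Phi_t|_{E^{ss}(x)}\| \le C e^{-\lambda t}
  && \text{(contraction of } E^{ss}\text{)}, \\
&\lvert \det (\Phi_t|_{L_x}) \rvert \ge C^{-1} e^{\lambda t}
  \ \text{ for every 2-plane } L_x \subset E^{cu}(x)
  && \text{(sectional expansion)}.
\end{aligned}
```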