Source author record

Guanghua Yu

Guanghua Yu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning · Artificial Intelligence · cond-mat.mtrl-sci · physics.app-ph

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators

preprint · 2026 · arXiv

Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification

The deployment of Large Language Models (LLMs) on resource-constrained edge devices is increasingly hindered by prohibitive memory and computational requirements. While ternary quantization offers a compelling solution by reducing weights to {-1, 0, +1}, current implementations suffer from a fundamental misalignment with commodity hardware. Most existing methods must choose between 2-bit aligned packing, which incurs significant bit wastage, or 1.67-bit irregular packing, which degrades inference speed. To resolve this tension, we propose Sherry, a hardware-efficient ternary quantization framework. Sherry introduces a 3:4 fine-grained sparsity that achieves a regularized 1.25-bit width by packing blocks of four weights into five bits, restoring power-of-two alignment. Furthermore, we identify a weight trapping issue in sparse ternary training, which leads to representational collapse. To address this, Sherry introduces Arenas, an annealing residual synapse mechanism that maintains representational diversity during training. Empirical evaluations on LLaMA-3.2 across five benchmarks demonstrate that Sherry matches state-of-the-art ternary performance while significantly reducing model size. Notably, on an Intel i7-14700HX CPU, our 1B model achieves zero accuracy loss compared to SOTA baselines while providing 25% bit savings and a 10% speedup. The code is available at https://github.com/Tencent/AngelSlim.
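
The packing arithmetic in this abstract can be checked with a short sketch. It assumes that "3:4 sparsity" means exactly one zero in every block of four ternary weights; under that assumption a block has 4 zero positions times 2^3 sign patterns, which is 32 states, so it fits in exactly five bits (5 / 4 = 1.25 bits per weight). The bit layout and the function names `pack_block`, `unpack_block` and `all_valid_blocks` are hypothetical and chosen for illustration only; they are not taken from the Sherry code.

```python
from itertools import product


def pack_block(block):
    """Pack four ternary weights containing exactly one zero into a 5-bit code.

    Assumed layout (illustrative only): the top 2 bits store the position of
    the zero, the low 3 bits store the signs of the remaining weights
    (bit set means +1, bit clear means -1).
    """
    if len(block) != 4 or any(w not in (-1, 0, 1) for w in block):
        raise ValueError("expected four ternary weights")
    zeros = [i for i, w in enumerate(block) if w == 0]
    if len(zeros) != 1:
        raise ValueError("expected exactly one zero per block")
    zero_pos = zeros[0]
    sign_bits = 0
    bit = 0
    for i, w in enumerate(block):
        if i == zero_pos:
            continue
        if w == 1:
            sign_bits |= 1 << bit
        bit += 1
    return (zero_pos << 3) | sign_bits


def unpack_block(code):
    """Invert pack_block: recover the four ternary weights from a 5-bit code."""
    if not 0 <= code < 32:
        raise ValueError("code must fit in five bits")
    zero_pos = code >> 3
    sign_bits = code & 0b111
    block = []
    bit = 0
    for i in range(4):
        if i == zero_pos:
            block.append(0)
        else:
            block.append(1 if (sign_bits >> bit) & 1 else -1)
            bit += 1
    return block


def all_valid_blocks():
    """Enumerate every block of four ternary weights with exactly one zero."""
    return [list(b) for b in product((-1, 0, 1), repeat=4) if b.count(0) == 1]
```

Enumerating `all_valid_blocks()` and packing each block uses every 5-bit code exactly once, which is why no bits are wasted under this assumption, in contrast to 2-bit aligned packing, which spends 8 bits on the same four weights.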

preprint · 2026 · arXiv

SOAR: Scale Optimization for Accurate Reconstruction in NVFP4 Quantization

NVFP4 has recently emerged as an efficient 4-bit microscaling format for large language models (LLMs), offering superior numerical fidelity with native hardware support. However, existing methods often yield suboptimal performance due to inflexible scale selection and the coupled treatment of quantization and dequantization scales. To address these issues, we propose Scale Optimization for Accurate Reconstruction (SOAR), a novel post-training quantization framework that improves the accuracy of NVFP4 quantization. At its core, SOAR features Closed-form Joint Scale Optimization (CJSO), which jointly optimizes global and block-wise scales via analytical solutions derived from reconstruction error minimization. Furthermore, it incorporates Decoupled Scale Search (DSS). DSS decouples the high-precision quantization scale from its constrained dequantization counterpart, and performs discrete search to mitigate precision loss from scale quantization. Extensive experiments across multiple LLMs show that our method consistently outperforms existing NVFP4 quantization baselines, achieving superior accuracy under the same memory footprint with no additional hardware overhead. The code and models will be available at https://github.com/steven-bao1/SOAR.
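
The idea of deriving a scale analytically from reconstruction error can be illustrated with a much simpler case than the one the abstract describes. For a fixed set of quantized codes q, the scale s that minimizes the sum of (w_i - s * q_i)^2 is s = sum(w_i * q_i) / sum(q_i^2), the standard least-squares solution. The sketch below applies that to one block, assuming a 4-bit E2M1 value grid with magnitudes 0, 0.5, 1, 1.5, 2, 3, 4 and 6. It is not SOAR's CJSO or DSS: it does not optimize a global scale jointly, does not model quantization of the scale itself, and all function names are hypothetical.

```python
# Assumed magnitudes of a 4-bit E2M1 floating-point grid (illustrative).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_to_grid(x):
    """Round x to the nearest signed value on the assumed E2M1 grid."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return mag if x >= 0 else -mag


def absmax_scale(block):
    """Baseline scale: map the largest magnitude in the block to 6.0."""
    peak = max(abs(w) for w in block)
    return peak / 6.0 if peak > 0 else 1.0


def quantize_block(block, scale):
    """Quantize every weight in the block with a shared scale."""
    return [quantize_to_grid(w / scale) for w in block]


def reconstruction_error(block, codes, scale):
    """Squared error between the weights and their dequantized values."""
    return sum((w - scale * q) ** 2 for w, q in zip(block, codes))


def closed_form_scale(block, codes):
    """Least-squares scale for fixed codes: sum(w * q) / sum(q * q)."""
    denom = sum(q * q for q in codes)
    if denom == 0:
        return 0.0
    return sum(w * q for w, q in zip(block, codes)) / denom
```

For any block, quantizing with `absmax_scale` and then replacing the scale with `closed_form_scale` for the same codes cannot increase `reconstruction_error`, since the closed-form value is the minimizer for those fixed codes.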

preprint · 2019 · arXiv

Electrically Tunable Wafer-Sized Three-Dimensional Topological Insulator Thin Films Grown by Magnetron Sputtering

Three-dimensional (3D) topological insulators (TIs) are candidate materials for various electronic and spintronic devices due to their strong spin-orbit coupling and unique surface electronic structure. Rapid, low-cost preparation of large-area TI thin films compatible with conventional semiconductor technology is key to the practical applications of TIs. Here, we show that wafer-sized Bi2Te3 family TI and magnetic TI films with decent quality and well-controlled composition and properties can be prepared on amorphous SiO2/Si substrates by magnetron cosputtering. The SiO2/Si substrates enable us to electrically tune (Bi1-xSbx)2Te3 and Cr-doped (Bi1-xSbx)2Te3 TI films between p-type and n-type behavior and thus study the phenomena associated with topological surface states, such as the quantum anomalous Hall effect (QAHE). This work significantly facilitates the fabrication of TI-based devices for electronic and spintronic applications.