Source author record

Jialiang Cheng

Jialiang Cheng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Machine Learning

Catalog footprint

What is connected

2works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

Mixture-of-Experts (MoE) architectures enhance the efficiency of large language models by activating only a subset of experts per token. However, standard MoE employs a fixed Top-K routing strategy, leading to redundant computation and suboptimal inference latency. Existing acceleration methods either require costly retraining with architectural changes or suffer from severe performance drop at high sparsity due to train-inference mismatch. To address these limitations, we propose BEAM (Binary Expert Activation Masking), a novel method that learns token-adaptive expert selection via trainable binary masks. With a straight-through estimator and an auxiliary regularization loss, BEAM induces dynamic expert sparsity through end-to-end training while maintaining model capability. We further implement an efficient custom CUDA kernel for BEAM, ensuring seamless integration with the vLLM inference framework. Experiments show that BEAM retains over 98\% of the original model's performance while reducing MoE layer FLOPs by up to 85\%, achieving up to 2.5$\times$ faster decoding and 1.4$\times$ higher throughput, demonstrating its effectiveness as a practical, plug-and-play solution for efficient MoE inference.

preprint2021arXiv

Machine Learning Regression based Single Event Transient Modeling Method for Circuit-Level Simulation

In this paper, a novel machine learning regression based single event transient (SET) modeling method is proposed. The proposed method can obtain a reasonable and accurate model without considering the complex physical mechanism. We got plenty of SET current data of SMIC 130nm bulk CMOS by TCAD simulation under different conditions (e.g. different LET and different drain bias voltage). A multilayer feedfordward neural network is used to build the SET pulse current model by learning the data from TCAD simulation. The proposed model is validated with the simulation results from TCAD simulation. The trained SET pulse current model is implemented as a Verilog-A current source in the Cadence Spectre circuit simulator and an inverter with five fan-outs is used to show the practicability and reasonableness of the proposed SET pulse current model for circuit-level single-event effect (SEE) simulation.