Researcher profile

Jialiang Cheng

Jialiang Cheng contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
2topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2026arXiv

BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

Mixture-of-Experts (MoE) architectures enhance the efficiency of large language models by activating only a subset of experts per token. However, standard MoE employs a fixed Top-K routing strategy, leading to redundant computation and suboptimal inference latency. Existing acceleration methods either require costly retraining with architectural changes or suffer from severe performance drop at high sparsity due to train-inference mismatch. To address these limitations, we propose BEAM (Binary Expert Activation Masking), a novel method that learns token-adaptive expert selection via trainable binary masks. With a straight-through estimator and an auxiliary regularization loss, BEAM induces dynamic expert sparsity through end-to-end training while maintaining model capability. We further implement an efficient custom CUDA kernel for BEAM, ensuring seamless integration with the vLLM inference framework. Experiments show that BEAM retains over 98\% of the original model's performance while reducing MoE layer FLOPs by up to 85\%, achieving up to 2.5$\times$ faster decoding and 1.4$\times$ higher throughput, demonstrating its effectiveness as a practical, plug-and-play solution for efficient MoE inference.

preprint2021arXiv

Machine Learning Regression based Single Event Transient Modeling Method for Circuit-Level Simulation

In this paper, a novel machine learning regression based single event transient (SET) modeling method is proposed. The proposed method can obtain a reasonable and accurate model without considering the complex physical mechanism. We got plenty of SET current data of SMIC 130nm bulk CMOS by TCAD simulation under different conditions (e.g. different LET and different drain bias voltage). A multilayer feedfordward neural network is used to build the SET pulse current model by learning the data from TCAD simulation. The proposed model is validated with the simulation results from TCAD simulation. The trained SET pulse current model is implemented as a Verilog-A current source in the Cadence Spectre circuit simulator and an inverter with five fan-outs is used to show the practicability and reasonableness of the proposed SET pulse current model for circuit-level single-event effect (SEE) simulation.