Source author record

Hui Shen

Hui Shen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Information Theory; math.IT; Computation and Language; Machine Learning; Computer Vision; Applications; Distributed, Parallel, and Cluster Computing; eess.IV; Hardware Architecture; Methodology; Networking and Internet Architecture; physics.class-ph; physics.med-ph; Quantitative Methods

Catalog footprint

What is connected

20 works

14 topics

4 close collaborators


preprint, 2026, arXiv

DoPE: Denoising Rotary Position Embedding

Positional encoding is essential for large language models (LLMs) to represent sequence order, yet recent studies show that Rotary Position Embedding (RoPE) can induce massive activation. We investigate the source of these instabilities via a spectral analysis of RoPE, and show that its low-frequency components concentrate structured energy, producing low-rank, over-aligned attention patterns. We theoretically reveal that this low-frequency alignment manifests as activation noise, degrading stability during long-context extrapolation. To mitigate this effect, we introduce Denoising Rotary Position Embedding (DoPE), a training-free method that identifies and suppresses noisy attention heads using truncated matrix entropy, then reparameterizes their attention maps with an isotropic Gaussian distribution. Across a range of settings, DoPE improves length extrapolation performance without fine-tuning, increases robustness to perturbations, and boosts both needle-in-a-haystack and many-shot in-context learning tasks. These results suggest that selective positional encoding is key to robust extrapolation. Our project page is https://The-physical-picture-of-LLMs.github.io
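For readers unfamiliar with the rotary embedding that this abstract analyzes, the following is a minimal sketch of the standard RoPE formulation only. It does not implement DoPE, truncated matrix entropy, or the Gaussian reparameterization. The function names, the base of 10000, and the toy vectors are assumptions chosen for illustration. The sketch shows the per-pair frequencies (the later pairs are the low-frequency components) and the property that the attention score depends only on the relative offset between positions.

```python
import math

def rope_frequencies(dim, base=10000.0):
    """Per-pair rotation frequencies theta_i = base^(-2i/dim), i = 0 .. dim/2 - 1.

    Larger i gives a smaller theta_i, i.e. a lower-frequency component.
    """
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]

def apply_rope(vec, position, base=10000.0):
    """Rotate each consecutive pair (vec[2i], vec[2i+1]) by position * theta_i."""
    out = []
    for i, theta in enumerate(rope_frequencies(len(vec), base)):
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

# Toy query/key vectors (hypothetical values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]

# Same relative offset (4) at two different absolute positions.
score_near = dot(apply_rope(q, 3), apply_rope(k, 7))
score_far = dot(apply_rope(q, 103), apply_rope(k, 107))
```

Both scores agree up to floating-point error, because rotating the query by position m and the key by position n leaves an inner product that depends only on n - m.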

preprint, 2026, arXiv

LongEmotion: Measuring Emotional Intelligence of Large Language Models in Long-Context Interaction

Large language models (LLMs) have made significant progress in Emotional Intelligence (EI) and long-context modeling. However, existing benchmarks often overlook the fact that emotional information processing unfolds as a continuous long-context process. To address the absence of multidimensional EI evaluation in long-context inference and explore model performance under more challenging conditions, we present LongEmotion, a benchmark that encompasses a diverse suite of tasks targeting the assessment of models' capabilities in Emotion Recognition, Knowledge Application, and Empathetic Generation, with an average context length of 15,341 tokens. To enhance performance under realistic constraints, we introduce the Collaborative Emotional Modeling (CoEM) framework, which integrates Retrieval-Augmented Generation (RAG) and multi-agent collaboration to improve models' EI in long-context scenarios. We conduct a detailed analysis of various models in long-context settings, investigating how reasoning mode activation, RAG-based retrieval strategies, and context-length adaptability influence their EI performance. Our project page is: https://longemotion.github.io/

preprint, 2026, arXiv

MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training

The training of large-scale Mixture of Experts (MoE) models faces a critical memory bottleneck due to severe load imbalance caused by dynamic token routing. This imbalance leads to memory overflow on GPUs with limited capacity, constraining model scalability. Existing load balancing methods, which cap expert capacity, compromise model accuracy and fail on memory-constrained hardware. To address this, we propose MemFine, a memory-aware fine-grained scheduling framework for MoE training. MemFine decomposes the token distribution and expert computation into manageable chunks and employs a chunked recomputation strategy, dynamically optimized through a theoretical memory model to balance memory efficiency and throughput. Experiments demonstrate that MemFine reduces activation memory by 48.03% and improves throughput by 4.42% compared to full recomputation-based baselines, enabling stable large-scale MoE training on memory-limited GPUs.

preprint, 2026, arXiv

MMFormalizer: Multimodal Autoformalization in the Wild

Autoformalization, which translates natural language mathematics into formal statements to enable machine reasoning, faces fundamental challenges in the wild due to the multimodal nature of the physical world, where physics requires inferring hidden constraints (e.g., mass or energy) from visual elements. To address this, we propose MMFormalizer, which extends autoformalization beyond text by integrating adaptive grounding with entities from real-world mathematical and physical domains. MMFormalizer recursively constructs formal propositions from perceptually grounded primitives through recursive grounding and axiom composition, with adaptive recursive termination ensuring that every abstraction is supported by visual evidence and anchored in dimensional or axiomatic grounding. We evaluate MMFormalizer on a new benchmark, PhyX-AF, comprising 115 curated samples from MathVerse, PhyX, Synthetic Geometry, and Analytic Geometry, covering diverse multimodal autoformalization tasks. Results show that frontier models such as GPT-5 and Gemini-3-Pro achieve the highest compile and semantic accuracy, with GPT-5 excelling in physical reasoning, while geometry remains the most challenging domain. Overall, MMFormalizer provides a scalable framework for unified multimodal autoformalization, bridging perception and formal reasoning. To the best of our knowledge, this is the first multimodal autoformalization method capable of handling classical mechanics (derived from the Hamiltonian), as well as relativity, quantum mechanics, and thermodynamics. More details are available on our project page: MMFormalizer.github.io

preprint, 2022, arXiv

A robust kernel machine regression towards biomarker selection in multi-omics datasets of osteoporosis for drug discovery

Many statistical machine approaches could ultimately highlight novel features of the etiology of complex diseases by analyzing multi-omics data. However, they are sensitive to some deviations in distribution when the observed samples are potentially contaminated with adversarial corrupted outliers (e.g., a fictional data distribution). Likewise, statistical advances lag in supporting comprehensive data-driven analyses of complex multi-omics data integration. We propose a novel non-linear M-estimator-based approach, "robust kernel machine regression (RobKMR)," to improve the robustness of statistical machine regression and the diversity of fictional data to examine the higher-order composite effect of multi-omics datasets. We address a robust kernel-centered Gram matrix to estimate the model parameters accurately. We also propose a robust score test to assess the marginal and joint Hadamard product of features from multi-omics data. We apply our proposed approach to a multi-omics dataset of osteoporosis (OP) from Caucasian females. Experiments demonstrate that the proposed approach effectively identifies the inter-related risk factors of OP. With solid evidence (p-value = 0.00001), biological validations, network-based analysis, causal inference, and drug repurposing, the selected three triplets ((DKK1, SMTN, DRGX), (MTND5, FASTKD2, CSMD3), (MTND5, COG3, CSMD3)) are significant biomarkers and directly relate to BMD. Overall, the top three selected genes (DKK1, MTND5, FASTKD2) and one gene (SIDT1 at p-value = 0.001) significantly bond with four drugs (Tacrolimus, Ibandronate, Alendronate, and Bazedoxifene) out of 30 candidates for drug repurposing in OP. Further, the proposed approach can be applied to any disease model where multi-omics datasets are available.

preprint, 2020, arXiv

A Deep Learning-Based Method for Automatic Segmentation of Proximal Femur from Quantitative Computed Tomography Images

Purpose: Proximal femur image analyses based on quantitative computed tomography (QCT) provide a method to quantify bone density and evaluate osteoporosis and risk of fracture. We aim to develop a deep-learning-based method for automatic proximal femur segmentation. Methods and Materials: We developed a 3D image segmentation method based on V-Net, an end-to-end fully convolutional neural network (CNN), to extract the proximal femur from QCT images automatically. The proposed V-Net methodology adopts a compound loss function, which includes a Dice loss and an L2 regularizer. We performed experiments to evaluate the effectiveness of the proposed segmentation method. In the experiments, a QCT dataset of 397 subjects was used. For the QCT image of each subject, the ground truth for the proximal femur was delineated by a well-trained scientist. During the experiments for the entire cohort and then for male and female subjects separately, 90% of the subjects were used in 10-fold cross-validation for training and internal validation, and to select the optimal parameters of the proposed models; the rest of the subjects were used to evaluate the performance of the models. Results: Visual comparison demonstrated high agreement between the model prediction and ground truth contours of the proximal femur portion of the QCT images. In the entire cohort, the proposed model achieved a Dice score of 0.9815, a sensitivity of 0.9852 and a specificity of 0.9992. In addition, an R² score of 0.9956 (p<0.001) was obtained when comparing the volumes measured by our model prediction with the ground truth. Conclusion: This method shows great promise for clinical application to QCT and QCT-based finite element analysis of the proximal femur for evaluating osteoporosis and hip fracture risk.
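The three overlap metrics reported above have standard definitions on binary masks. The sketch below computes them on flattened masks using plain Python lists. It is an illustration of the textbook formulas, not the authors' evaluation code, and the function names and toy masks are assumptions.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives for two flat binary masks."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def dice_score(pred, truth):
    """Dice = 2*TP / (2*TP + FP + FN); defined as 1.0 when both masks are empty."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom

def sensitivity(pred, truth):
    """Sensitivity = TP / (TP + FN); 0.0 if the ground truth has no foreground."""
    tp, _, fn, _ = confusion_counts(pred, truth)
    return tp / (tp + fn) if (tp + fn) else 0.0

def specificity(pred, truth):
    """Specificity = TN / (TN + FP); 0.0 if the ground truth has no background."""
    _, fp, _, tn = confusion_counts(pred, truth)
    return tn / (tn + fp) if (tn + fp) else 0.0

# Toy flattened masks (hypothetical): 1 = femur voxel, 0 = background.
pred = [1, 1, 1, 0, 0, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0, 0, 0]

dice = dice_score(pred, truth)
sens = sensitivity(pred, truth)
spec = specificity(pred, truth)
```

For these toy masks there are 2 true positives, 1 false positive, 1 false negative and 4 true negatives, so the Dice score is 4/6, the sensitivity is 2/3 and the specificity is 4/5.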

preprint, 2020, arXiv

A generalized kernel machine approach to identify higher-order composite effects in multi-view datasets

In recent years, the comprehensive study of multi-view datasets (e.g., multi-omics and imaging scans) has been at the forefront of biomedical research. State-of-the-art biomedical technologies are enabling us to collect multi-view biomedical datasets for the study of complex diseases. While all the views of data tend to explore complementary information of a disease, multi-view data analysis with complex interactions is challenging for a deeper and holistic understanding of biological systems. In this paper, we propose a novel generalized kernel machine approach to identify higher-order composite effects in multi-view biomedical datasets. This generalized semi-parametric (a mixed-effect linear model) approach includes the marginal and joint Hadamard product of features from different views of data. The proposed kernel machine approach considers multi-view data as predictor variables to allow more thorough and comprehensive modeling of a complex trait. The proposed method can be applied to the study of any disease model where multi-view datasets are available. We applied our approach to both synthesized datasets and real multi-view datasets from adolescent brain development and osteoporosis studies, including an imaging scan dataset and five omics datasets. Our experiments demonstrate that the proposed method can effectively identify higher-order composite effects and suggest that corresponding features (genes, regions of interest, and chemical taxonomies) function in a concerted effort. We show that the proposed method is more generalizable than existing ones.

preprint, 2020, arXiv

PP-YOLO: An Effective and Efficient Implementation of Object Detector

Object detection is one of the most important areas in computer vision and plays a key role in various practical scenarios. Due to hardware limitations, it is often necessary to sacrifice accuracy to ensure the inference speed of the detector in practice. Therefore, the balance between the effectiveness and efficiency of an object detector must be considered. The goal of this paper is to implement an object detector with relatively balanced effectiveness and efficiency that can be directly applied in actual application scenarios, rather than to propose a novel detection model. Considering that YOLOv3 has been widely used in practice, we develop a new object detector based on YOLOv3. We mainly try to combine various existing tricks that barely increase the number of model parameters and FLOPs, with the goal of improving the accuracy of the detector as much as possible while keeping the speed almost unchanged. Since all experiments in this paper are conducted based on PaddlePaddle, we call it PP-YOLO. By combining multiple tricks, PP-YOLO can achieve a better balance between effectiveness (45.2% mAP) and efficiency (72.9 FPS), surpassing existing state-of-the-art detectors such as EfficientDet and YOLOv4. Source code is at https://github.com/PaddlePaddle/PaddleDetection.

preprint, 2016, arXiv

Training Auto-encoders Effectively via Eliminating Task-irrelevant Input Variables

Auto-encoders are often used as building blocks of deep network classifiers to learn feature extractors, but task-irrelevant information in the input data may lead to bad extractors and result in poor generalization performance of the network. In this paper, we show that dropping the task-irrelevant input variables can clearly improve the performance of auto-encoders. Specifically, an importance-based variable selection method is proposed to find the task-irrelevant input variables and drop them. It first estimates the importance of each variable, and then drops the variables whose importance value is lower than a threshold. To obtain better performance, the method can be employed for each layer of stacked auto-encoders. Experimental results show that, when combined with our method, stacked denoising auto-encoders achieve significantly improved performance on three challenging datasets.

preprint, 2015, arXiv

A Decision-Aided Parallel SC-List Decoder for Polar Codes

In this paper, we propose a decision-aided scheme for parallel SC-List decoding of polar codes. At the parallel SC-List decoder, each survival path is extended based on multiple information bits, therefore the number of split paths becomes very large and the sorting to find the top L paths becomes very complex. We propose a decision-aided scheme to reduce the number of split paths and thus reduce the sorting complexity.

preprint, 2015, arXiv

Capacity-Achieving Rateless Polar Codes

A rateless coding scheme transmits incrementally more and more coded bits over an unknown channel until all the information bits are decoded reliably by the receiver. We propose a new rateless coding scheme based on polar codes, and we show that this scheme is capacity-achieving, i.e. its information rate is as good as the best code specifically designed for the unknown channel. Previous rateless coding schemes are designed for specific classes of channels such as AWGN channels, binary erasure channels, etc. but the proposed rateless coding scheme is capacity-achieving for broad classes of channels as long as they are ordered via degradation. Moreover, it inherits the conceptual and computational simplicity of polar codes.

preprint, 2015, arXiv

Low-latency List Decoding Of Polar Codes With Double Thresholding

For polar codes with short-to-medium code length, list successive cancellation decoding is used to achieve a good error-correcting performance. However, list pruning in the current list decoding is based on the sorting strategy and its timing complexity is high. This results in a long decoding latency for large list size. In this work, aiming at a low-latency list decoding implementation, a double thresholding algorithm is proposed for a fast list pruning. As a result, with a negligible performance degradation, the list pruning delay is greatly reduced. Based on the double thresholding, a low-latency list decoding architecture is proposed and implemented using a UMC 90nm CMOS technology. Synthesis results show that, even for a large list size of 16, the proposed low-latency architecture achieves a decoding throughput of 220 Mbps at a frequency of 641 MHz.

preprint, 2015, arXiv

Reduce the Complexity of List Decoding of Polar Codes by Tree-Pruning

Polar codes under cyclic redundancy check aided successive cancellation list (CA-SCL) decoding can outperform turbo codes and LDPC codes when code lengths are configured to be several kilobits. In order to reduce the decoding complexity, a novel tree-pruning scheme for the SCL/CA-SCL decoding algorithms is proposed in this paper. In each step of the decoding procedure, the candidate paths with metrics less than a threshold are dropped directly to avoid unnecessary computations for the path search on their descendant branches. Given a candidate path, an upper bound on the path metric of its descendants is proposed to determine whether pruning this candidate path would affect frame error rate (FER) performance. By utilizing this upper bounding technique and introducing a dynamic threshold, the proposed scheme deletes as many redundant candidate paths as possible while keeping the performance deterioration within a tolerable region, and is thus much more efficient than the existing pruning scheme. With only a negligible loss of FER performance, the computational complexity of the proposed pruned decoding scheme is only about 40% of that of the standard algorithm in the low signal-to-noise ratio (SNR) region (where the FER under CA-SCL decoding is roughly 0.1 to 0.001), and it can be very close to that of the successive cancellation (SC) decoder in the moderate and high SNR regions.
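The pruning step described above, dropping candidate paths whose metric falls below a threshold before keeping the best L, can be sketched generically as follows. The abstract does not give the paper's dynamic threshold rule or its upper bound on descendant metrics, so this sketch substitutes a simple assumed rule: the threshold sits a fixed `margin` below the best current metric. The function name, the `margin` parameter and the toy metrics are all hypothetical.

```python
def prune_paths(paths, margin, list_size):
    """Threshold-prune candidate paths, then keep at most `list_size` of them.

    paths: list of (metric, bits) tuples, where a larger metric means a more
           likely path (e.g. a log-probability).
    margin: assumed gap below the best metric that a path may have and still
            survive. This stands in for the paper's dynamic threshold.
    """
    if not paths:
        return []
    best_metric = max(metric for metric, _ in paths)
    threshold = best_metric - margin
    # Paths below the threshold are dropped, so their descendants are never expanded.
    survivors = [(metric, bits) for metric, bits in paths if metric >= threshold]
    # Standard list-decoding step: keep only the best `list_size` survivors.
    survivors.sort(key=lambda path: path[0], reverse=True)
    return survivors[:list_size]

# Toy candidate paths (hypothetical log-probability metrics).
candidates = [
    (-0.2, [0, 0]),
    (-1.0, [0, 1]),
    (-5.5, [1, 0]),
    (-9.0, [1, 1]),
]
kept = prune_paths(candidates, margin=3.0, list_size=4)
```

With a margin of 3.0 the threshold is -3.2, so only the first two paths survive even though the list could hold four. Fewer surviving paths means fewer branches to extend and sort at the next decoding step.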

preprint, 2014, arXiv

A RM-Polar Codes

In this letter we propose a new hybrid code called the "RM-Polar" code. These new codes are constructed by combining the constructions of the Reed-Muller (RM) code and the polar code. They have a much larger minimum Hamming distance than polar codes and therefore much better error performance.

preprint, 2013, arXiv

Multi-cancer molecular signatures and their interrelationships

Although cancer is known to be characterized by several unifying biological hallmarks, systems biology has had limited success in identifying molecular signatures present in all types of cancer. The current availability of rich data sets from many different cancer types provides an opportunity for thorough computational data mining in search of such common patterns. Here we report the identification of 18 "pan-cancer" molecular signatures resulting from analysis of data sets containing values from mRNA expression, microRNA expression, DNA methylation, and protein activity, from twelve different cancer types. The membership of many of these signatures points to particular biological mechanisms related to cancer progression, suggesting that they represent important attributes of cancer in need of being elucidated for potential applications in diagnostic, prognostic and therapeutic products applicable to multiple cancer types.

preprint, 2013, arXiv

Parallel Decoders of Polar Codes

In this letter, we propose a parallel SC (Successive Cancellation) decoder and a parallel SC-List decoder for polar codes. The parallel decoder is composed of $M = 2^m$ ($m \geq 1$) component decoders working in parallel, and each component decoder decodes a polar code whose block size is $1/M$ that of the original polar code. The parallel decoder therefore decodes $M$ times faster. Our simulation results show that the parallel decoder has almost the same error-rate performance as the conventional non-parallel decoder.

preprint, 2012, arXiv

An Adaptive Successive Cancellation List Decoder for Polar Codes with Cyclic Redundancy Check

In this letter, we propose an adaptive SC (Successive Cancellation)-List decoder for polar codes with CRC. This adaptive SC-List decoder iteratively increases the list size until the decoder outputs contain at least one survival path which can pass CRC. Simulation shows that the adaptive SC-List decoder provides significant complexity reduction. We also demonstrate that polar code (2048, 1024) with 24-bit CRC decoded by our proposed adaptive SC-List decoder with very large list size can achieve a frame error rate FER=0.001 at Eb/No=1.1dB, which is about 0.2dB from the information theoretic limit at this block length.
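The control loop described above, retrying with a larger list until some surviving path passes the CRC, can be sketched independently of the decoder itself. In the sketch below, `decode` and `passes_crc` are hypothetical callbacks standing in for a real SC-List decoder and a real CRC check. Doubling the list size on each retry and the explicit `max_list_size` cap are assumptions, since the abstract only says the list size is increased iteratively.

```python
def adaptive_list_decode(decode, passes_crc, max_list_size, initial_list_size=1):
    """Retry list decoding with a growing list until a candidate passes the CRC.

    decode(list_size): hypothetical callback returning the surviving candidate
        paths for that list size, best first.
    passes_crc(candidate): hypothetical callback returning True if the
        candidate satisfies the CRC.
    Returns (candidate, list_size_used), or (None, list_size_used) on failure.
    """
    list_size = initial_list_size
    while True:
        for candidate in decode(list_size):
            if passes_crc(candidate):
                return candidate, list_size
        if list_size >= max_list_size:
            return None, list_size
        # Assumed growth rule: double the list size, capped at the maximum.
        list_size = min(2 * list_size, max_list_size)

# Toy stand-ins: a fixed ranking of candidates and a "CRC" that accepts one of them.
ranked_candidates = [[0, 1], [1, 1], [1, 0], [0, 0], [1, 0, 1]]

def toy_decode(list_size):
    """Pretend decoder: returns the top `list_size` ranked candidates."""
    return ranked_candidates[:list_size]

def toy_crc(candidate):
    """Pretend CRC check: only [0, 0] is accepted."""
    return candidate == [0, 0]

result, used_list_size = adaptive_list_decode(toy_decode, toy_crc, max_list_size=32)
```

In this toy run the list sizes tried are 1, 2 and 4, and the accepted candidate appears once the list size reaches 4. Most of the complexity saving comes from the common case in which a small list already contains a path that passes the CRC.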

preprint, 2012, arXiv

Efficient, adaptive cross-validation for tuning and comparing models, with application to drug discovery

Cross-validation (CV) is widely used for tuning a model with respect to user-selected parameters and for selecting a "best" model. For example, the method of $k$-nearest neighbors requires the user to choose $k$, the number of neighbors, and a neural network has several tuning parameters controlling the network complexity. Once such parameters are optimized for a particular data set, the next step is often to compare the various optimized models and choose the method with the best predictive performance. Both tuning and model selection boil down to comparing models, either across different values of the tuning parameters or across different classes of statistical models and/or sets of explanatory variables. For multiple large sets of data, like the PubChem drug discovery cheminformatics data which motivated this work, reliable CV comparisons are computationally demanding, or even infeasible. In this paper we develop an efficient sequential methodology for model comparison based on CV. It also takes into account the randomness in CV. The number of models is reduced via an adaptive, multiplicity-adjusted sequential algorithm, where poor performers are quickly eliminated. By exploiting matching of individual observations, it is sometimes even possible to establish the statistically significant inferiority of some models with just one execution of CV.

preprint, 2010, arXiv

A low-power circuit for piezoelectric vibration control by synchronized switching on voltage sources

In the paper, a vibration damping system powered by harvested energy with implementation of the so-called SSDV (synchronized switch damping on voltage source) technique is designed and investigated. In the semi-passive approach, the piezoelectric element is intermittently switched from open-circuit to specific impedance synchronously with the structural vibration. Due to this switching procedure, a phase difference appears between the strain induced by vibration and the resulting voltage, thus creating energy dissipation. By supplying the energy collected from the piezoelectric materials to the switching circuit, a new low-power device using the SSDV technique is proposed. Compared with the original self-powered SSDI (synchronized switch damping on inductor), such a device can significantly improve its performance of vibration control. Its effectiveness in the single-mode resonant damping of a composite beam is validated by the experimental results.

preprint, 2010, arXiv

Optimization Framework and Graph-Based Approach for Relay-Assisted Bidirectional OFDMA Cellular Networks

This paper considers a relay-assisted bidirectional cellular network where the base station (BS) communicates with each mobile station (MS) using OFDMA for both uplink and downlink. The goal is to improve the overall system performance by exploring the full potential of the network in various dimensions including user, subcarrier, relay, and bidirectional traffic. In this work, we first introduce a novel three-time-slot time-division duplexing (TDD) transmission protocol. This protocol unifies direct transmission, one-way relaying and network-coded two-way relaying between the BS and each MS. Using the proposed three-time-slot TDD protocol, we then propose an optimization framework for resource allocation to achieve the following gains: cooperative diversity (via relay selection), network coding gain (via bidirectional transmission mode selection), and multiuser diversity (via subcarrier assignment). We formulate the problem as a combinatorial optimization problem, which is NP-complete. To make it more tractable, we adopt a graph-based approach. We first establish the equivalence between the original problem and a maximum weighted clique problem in graph theory. A metaheuristic algorithm based on ant colony optimization (ACO) is then employed to find the solution in polynomial time. Simulation results demonstrate that the proposed protocol together with the ACO algorithm significantly enhances the total system throughput.
