Source author record

Chao Jin

Chao Jin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Artificial Intelligence; cond-mat.mtrl-sci; Cryptography and Security; Machine Learning; physics.med-ph; physics.optics; Biological Physics; Distributed, Parallel, and Cluster Computing; eess.AS; Information Theory; math.IT; Sound; Tissues and Organs

Catalog footprint

What is connected

10 works

13 topics

4 close collaborators


Preprint, 2026, arXiv

ReLibra: Routing-Replay-Guided Load Balancing for MoE Training in Reinforcement Learning

Load imbalance is a long-standing challenge in Mixture-of-Experts (MoE) training and is exacerbated in reinforcement learning (RL) for LLMs, where hot experts can shift frequently across micro-batches. Existing MoE training systems rely on historical loads to predict future expert demand, making them less effective under sharp fluctuations. We propose ReLibra, an MoE RL training system that exploits a unique opportunity in RL's rollout-training workflow, routing replay, to enable fine-grained load balancing at micro-batch granularity. Because rollout and training process the same tokens with the same MoE parameters, the token-to-expert routing decisions are known before training starts. Leveraging this information, ReLibra places two MoE load-balancing mechanisms at inter- and intra-batch timescales, matching their communication patterns to hierarchical network bandwidths. At the inter-batch timescale, ReLibra performs expert reordering to redistribute experts for batch-level cross-node balancing; at the intra-batch timescale, it dynamically performs expert replication within a node to absorb micro-batch-level load fluctuations. Experiments on diverse MoE LLMs and RL workloads show that ReLibra improves training throughput by up to 1.6× over Megatron-LM and by up to 1.2× over EPLB, even when EPLB is given oracle loads. Moreover, ReLibra remains within 6-10% of the throughput of an idealized balanced baseline.
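The abstract notes that routing replay makes token-to-expert routing known before training starts. The minimal sketch below, assuming a single node, an even split of an expert's tokens across its replicas, and a fixed replica budget, shows how such known per-expert token counts could drive greedy intra-node replication of hot experts. The function names, the greedy rule, and the example counts are hypothetical; this is not ReLibra's actual algorithm.

```python
def gpu_loads(token_counts, replicas, num_gpus):
    """Per-GPU token load, assuming each expert's tokens split evenly over its replicas."""
    load = [0.0] * num_gpus
    for expert, gpus in replicas.items():
        share = token_counts.get(expert, 0) / len(gpus)
        for g in gpus:
            load[g] += share
    return load


def replicate_hot_experts(token_counts, placement, num_gpus, budget):
    """Hypothetical greedy heuristic (not ReLibra's method).

    token_counts: expert_id -> tokens routed to it in this micro-batch,
                  assumed known in advance (e.g. from routing replay).
    placement:    expert_id -> GPU index holding the expert's primary copy.
    budget:       maximum number of extra replicas to create within the node.
    """
    replicas = {e: [g] for e, g in placement.items()}
    for _ in range(budget):
        load = gpu_loads(token_counts, replicas, num_gpus)
        hot = max(range(num_gpus), key=load.__getitem__)
        cold = min(range(num_gpus), key=load.__getitem__)
        # Only experts on the hottest GPU that are not yet on the coldest GPU qualify.
        candidates = [e for e, gpus in replicas.items() if hot in gpus and cold not in gpus]
        if not candidates:
            break
        # Replicate the expert with the largest per-replica load onto the coldest GPU.
        expert = max(candidates, key=lambda e: token_counts.get(e, 0) / len(replicas[e]))
        trial = dict(replicas)
        trial[expert] = replicas[expert] + [cold]
        if max(gpu_loads(token_counts, trial, num_gpus)) >= max(load):
            break  # the extra replica would not lower the peak load, so stop
        replicas = trial
    return replicas, gpu_loads(token_counts, replicas, num_gpus)


# Hypothetical micro-batch: expert 0 is "hot" and shares GPU 0 with expert 1.
replicas, loads = replicate_hot_experts(
    {0: 300, 1: 100, 2: 50, 3: 50}, {0: 0, 1: 0, 2: 1, 3: 1}, num_gpus=2, budget=2
)
print(replicas, loads)
```

In this toy case, one replica of expert 0 on GPU 1 evens the per-GPU load at 250 tokens each, and the loop then stops because no further replica can lower the peak.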

Preprint, 2022, arXiv

Cross-Modal ASR Post-Processing System for Error Correction and Utterance Rejection

Although modern automatic speech recognition (ASR) systems can achieve high performance, they may still produce errors that degrade the reading experience and harm downstream tasks. To improve the accuracy and reliability of ASR hypotheses, we propose a cross-modal post-processing system for speech recognizers, which 1) fuses acoustic and textual features from different modalities, 2) jointly trains a confidence estimator and an error corrector in a multi-task learning fashion, and 3) unifies the error correction and utterance rejection modules. Compared with single-modal or single-task models, our proposed system is shown to be more effective and efficient. Experimental results show that our post-processing system yields a relative reduction of more than 10% in character error rate (CER) for both single-speaker and multi-speaker speech on our industrial ASR system, with about 1.7 ms of latency per token, which keeps the extra latency introduced by post-processing acceptable for streaming speech recognition.
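The result above is stated as a relative reduction in character error rate. As an illustration only (not the paper's evaluation code), the sketch below computes corpus-level CER as character-level edit distance divided by the total number of reference characters, and the relative reduction between two CER values. The example strings and numbers are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # reference character deleted
                curr[j - 1] + 1,         # hypothesis character inserted
                prev[j - 1] + (r != h),  # substitution, or match at zero cost
            ))
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Corpus-level CER: total character edits over total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def relative_reduction(baseline, improved):
    """Relative reduction, e.g. 0.10 means a 10% relative drop."""
    return (baseline - improved) / baseline


print(cer(["abcd"], ["abed"]))          # one substitution over four characters
print(relative_reduction(0.10, 0.09))   # a 10% relative reduction
```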

Preprint, 2022, arXiv

FFConv: Fast Factorized Convolutional Neural Network Inference on Encrypted Data

Homomorphic Encryption (HE), which allows computation on encrypted data (ciphertext) without first decrypting it, enables secure but prohibitively slow Convolutional Neural Network (CNN) inference for privacy-preserving applications in the cloud. To reduce inference latency, one approach is to pack multiple messages into a single ciphertext, which reduces the number of ciphertexts and supports massive parallelism of Homomorphic Multiply-Accumulate (HMA) operations between ciphertexts. Although packing speeds up HE-based CNN (HECNN) inference, the mainstream packing schemes, Dense Packing (DensePack) and Convolution Packing (ConvPack), introduce expensive rotation overhead, which prolongs HECNN inference latency for deeper and wider CNN architectures. In this paper, we propose a low-rank factorization method named FFConv, dedicated to efficient ciphertext packing, that reduces both the rotation overhead and the number of HMA operations. FFConv approximates a d×d convolution layer with low-rank factorized convolutions, in which a d×d low-rank convolution with fewer channels is followed by a 1×1 convolution that restores the channels. The d×d low-rank convolution with DensePack requires significantly fewer rotations, while the rotation overhead of the 1×1 convolution with ConvPack is close to zero. To our knowledge, FFConv is the first work capable of reducing the rotation overhead incurred by DensePack and ConvPack simultaneously, without introducing additional special blocks into the HECNN inference pipeline. Compared with the prior art LoLa and Falcon, our method reduces inference latency by up to 88% and 21%, respectively, with comparable accuracy on MNIST and CIFAR-10.
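To make the factorization concrete, the sketch below counts plaintext multiply-accumulates per output pixel for a full d×d convolution versus a d×d convolution down to a smaller number of channels followed by a 1×1 convolution back up. It assumes stride 1 and "same" padding so both produce the same spatial size, and the rank of 16 is a hypothetical choice. It counts ordinary MACs only; it does not model HE ciphertext rotations, which are what FFConv primarily targets.

```python
def conv_macs_per_pixel(d, c_in, c_out):
    """Multiply-accumulates per output pixel for a d x d convolution (no bias)."""
    return d * d * c_in * c_out


def factorized_macs_per_pixel(d, c_in, c_out, rank):
    """d x d convolution c_in -> rank channels, then a 1 x 1 convolution rank -> c_out."""
    return conv_macs_per_pixel(d, c_in, rank) + conv_macs_per_pixel(1, rank, c_out)


# Hypothetical layer: 3 x 3 kernel, 64 -> 64 channels, factorized through rank 16.
full = conv_macs_per_pixel(3, 64, 64)
fact = factorized_macs_per_pixel(3, 64, 64, rank=16)
print(full, fact, fact / full)
```

Here the factorized form needs 9·64·16 + 16·64 = 10,240 MACs per pixel instead of 9·64·64 = 36,864, roughly 28% of the original.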

Preprint, 2022, arXiv

LSTMSPLIT: Effective SPLIT Learning based LSTM on Sequential Time-Series Data

Federated learning (FL) and split learning (SL) are two popular distributed machine learning (ML) approaches that provide some data privacy protection. For time-series classification, researchers typically use 1D convolutional neural networks (1DCNNs) with the SL approach and a single client, reducing client-side computational overhead while still preserving data privacy. Another approach applies a recurrent neural network (RNN) to sequentially partitioned data, where the segments of multiple-segment sequential data are distributed across different clients. However, to the best of our knowledge, little work has been done on SL with long short-term memory (LSTM) networks, even though LSTM networks are effective in practice for processing time-series data. In this work, we propose a new approach, LSTMSPLIT, that uses an SL architecture with an LSTM network to classify time-series data with multiple clients. Differential privacy (DP) is applied to address data privacy leakage. LSTMSPLIT achieves better or reasonable accuracy compared with the Split-1DCNN method on an electrocardiogram dataset and a human activity recognition dataset. Furthermore, LSTMSPLIT still achieves good accuracy after differential privacy is applied to protect user privacy at its cut layer.
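The abstract does not state which DP mechanism is applied at the cut layer. The sketch below assumes a common Gaussian-mechanism-style perturbation: each client clips every per-example activation vector at the cut layer to an L2 norm bound and adds Gaussian noise before sending it to the server. The function name and the `clip_norm` and `noise_multiplier` parameters are hypothetical, and the sketch does no privacy accounting.

```python
import math
import random


def privatize_cut_layer(activations, clip_norm, noise_multiplier, rng):
    """Assumed DP step at the split-learning cut layer (not LSTMSPLIT's exact method).

    activations: per-example vectors (lists of floats) output by the client-side
                 layers, e.g. an LSTM's hidden state at the cut layer.
    Each vector is scaled down to L2 norm <= clip_norm, then Gaussian noise with
    standard deviation noise_multiplier * clip_norm is added to every coordinate.
    """
    sigma = noise_multiplier * clip_norm
    noisy = []
    for vec in activations:
        norm = math.sqrt(sum(x * x for x in vec))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        noisy.append([x * scale + rng.gauss(0.0, sigma) for x in vec])
    return noisy


rng = random.Random(0)
# With zero noise, only the clipping is visible: [3, 4] (norm 5) shrinks to norm 1.
print(privatize_cut_layer([[3.0, 4.0]], clip_norm=1.0, noise_multiplier=0.0, rng=rng))
```

Clipping bounds each example's contribution, which is what lets the added noise scale be tied to `clip_norm`.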

Preprint, 2015, arXiv

Bayesian-based aberration correction and numerical diffraction for improved lensfree on-chip microscopy of biological specimens

Lensfree on-chip microscopy is an emerging imaging technique that can be used to visualize and study biological specimens without the need for imaging lens systems. Important issues that can limit the performance of lensfree on-chip microscopy include interferometric aberrations, acquisition noise, and image reconstruction artifacts. In this study, we introduce a Bayesian-based method for performing aberration correction and numerical diffraction that accounts for all three of these issues, improving the effective numerical aperture (NA) and signal-to-noise ratio (SNR) of the reconstructed microscopic image. The proposed method was experimentally validated using the USAF resolution target as well as real waterborne Anabaena flos-aquae samples, demonstrating an NA improvement of ~25% over the standard method and SNR improvements of 2.3 dB and 3.8 dB in the reconstructed image compared with images reconstructed using the standard method and a maximum likelihood estimation method, respectively.

Preprint, 2015, arXiv

Lensfree Spectral Light-field Fusion Microscopy for Contrast- and Resolution-enhanced Imaging of Biological Specimens

A lensfree spectral light-field fusion microscopy (LSLFM) system is presented for contrast- and resolution-enhanced imaging of biological specimens. LSLFM consists of a pulsed multispectral lensfree microscope that captures interferometric light-field encodings at various wavelengths, and a Bayesian-based fusion method that reconstructs a fused object light-field from the encodings. By fusing the unique object detail captured at different wavelengths, LSLFM can achieve improved resolution, contrast, and signal-to-noise ratio (SNR) over a single-channel lensfree microscopy system. A five-channel LSLFM system was developed and quantitatively evaluated to validate the design. Experimental results demonstrated that the LSLFM system provided SNR improvements of 6-12 dB, as well as a six-fold improvement in the dispersion index (DI), over a single-channel, resolution-enhancing lensfree deconvolution microscopy system or its multi-wavelength counterpart. Furthermore, the LSLFM system achieved an increase in numerical aperture (NA) of ~16% over a single-channel resolution-enhancing lensfree deconvolution microscopy system at the highest-resolution wavelength used in the study. Samples of Staurastrum paradoxum, a waterborne alga, and human corneal epithelial cells were imaged using the system to illustrate its potential for enhanced imaging of biological specimens.

Preprint, 2014, arXiv

Liquid Metal as Connecting or Functional Recovery Channel for the Transected Sciatic Nerve

In this article, the liquid metal alloy GaInSn (67% Ga, 20.5% In, and 12.5% Sn by volume) is proposed, for the first time, as a connecting or functional recovery channel for repairing peripheral neurotmesis. The material has several distinctive merits, including favorable fluidity, high compliance, and high electrical conductivity, which are beneficial for conducting excited nerve signals during regeneration in vivo. The electroneurographic signal measured after electrical stimulation of a transected bullfrog sciatic nerve reconnected by the liquid metal was found to be close to that of the intact sciatic nerve. Control experiments in which GaInSn was replaced with the conventionally used Ringer's solution showed that Ringer's solution could not match the liquid metal's performance as a functional recovery channel. In addition, evaluation of basic electrical properties showed that GaInSn is better suited to conducting weak electroneurographic signals, as its impedance is several orders of magnitude lower than that of Ringer's solution. Further, the material's visibility on plain radiographs makes secondary surgery considerably more convenient. This new-generation nerve-connecting material is expected to be important for functional recovery during regeneration of injured peripheral nerves and for the optimization of neurosurgery in the near future.

Preprint, 2013, arXiv

Liquid-Solid Phase Transition Alloy as Reversible and Rapid Molding Bone Cement

Bone cement has been demonstrated to be an essential restorative material in orthopedic surgery. However, current materials often carry unavoidable drawbacks, such as thermal injuries induced by the tissue-cement reaction and troublesome revision procedures. Here we propose an injectable alloy cement that addresses these problems through its liquid-solid phase transition mechanism. The cement is made of the alloy BiInSnZn, with a specifically designed low melting point of 57.5°C. This property enables rapid molding into various shapes with high plasticity. Fundamental characteristics, including mechanical strength and phase-transition-induced thermal behavior, were measured to demonstrate the alloy's competence as an unconventional cement with favorable merits. Further biocompatibility tests showed that this material could be safely employed in vivo. Experiments also showed that the alloy cement can serve as an excellent contrast agent for radiation imaging. In particular, the reversible phase transition of the proposed alloy cement significantly simplifies revision of the cement and prosthesis. This study opens the way to using alloy materials as bone cement to meet diverse clinical needs.

Preprint, 2013, arXiv

Parity Declustering for Fault-Tolerant Storage Systems via $t$-designs

Parity declustering allows faster reconstruction of a disk array when a disk fails. Moreover, it guarantees a uniform reconstruction workload on all surviving disks. It has been shown that parity declustering for one-failure-tolerant array codes can be obtained via Balanced Incomplete Block Designs. We extend this technique, via $t$-designs, to array codes that can tolerate an arbitrary number of disk failures.
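The single-failure case that the abstract attributes to Balanced Incomplete Block Designs can be checked on a small example. The sketch below treats disks as points and parity stripes as blocks of the Fano plane, a 2-(7, 3, 1) design, verifies the design property, and counts how many affected stripes each surviving disk participates in when one disk fails. It assumes each surviving disk in an affected stripe contributes one stripe unit to the rebuild; it does not implement the paper's $t$-design construction for multiple failures.

```python
from collections import Counter
from itertools import combinations

# Fano plane: 7 points (disks), 7 blocks (stripes) of size 3, every pair in exactly one block.
FANO = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]


def design_lambda(blocks, num_points, t):
    """Return lambda if every t-subset of points lies in exactly lambda blocks, else None."""
    counts = Counter()
    for block in blocks:
        for subset in combinations(sorted(block), t):
            counts[subset] += 1
    values = {counts[s] for s in combinations(range(num_points), t)}
    return values.pop() if len(values) == 1 else None


def reconstruction_reads(blocks, failed):
    """Stripe units each surviving disk reads, assuming one unit per affected stripe."""
    failed = set(failed)
    reads = Counter()
    for block in blocks:
        if failed & set(block):
            for disk in block:
                if disk not in failed:
                    reads[disk] += 1
    return dict(reads)


print(design_lambda(FANO, 7, 2))      # every pair of disks shares exactly one stripe
print(reconstruction_reads(FANO, [0]))  # each surviving disk reads exactly one unit
```

Because every pair of disks shares exactly λ = 1 stripe, the failed disk shares exactly one stripe with each survivor, so the rebuild reads are spread evenly across all six surviving disks.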

Preprint, 2009, arXiv

Jeeva: Enterprise Grid-enabled Web Portal for Protein Secondary Structure Prediction

This paper presents a Grid portal for protein secondary structure prediction, built on the services of Aneka, a .NET-based enterprise Grid technology. Research scientists use the portal to discover new predicted structures in parallel. A Support Vector Machine (SVM)-based prediction algorithm, applied to 64 sample protein sequences, serves as a case study to demonstrate the potential of enterprise Grids.
