Source author record

Jia Wei

Jia Wei appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Computer Vision, Machine Learning, Artificial Intelligence, cond-mat.str-el, cond-mat.supr-con, eess.IV, Hardware Architecture, math.NA, Performance, Quantitative Methods

Catalog footprint

11 works
10 topics
4 close collaborators


Preprint, 2026, arXiv

HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models

Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning of large language models, yet most variants target dense architectures. Mixture-of-Experts (MoE) models scale parameters at near-constant per-token compute, and their sparse activation patterns create untapped opportunities for more efficient adaptation. We propose Hot-Experts Layer-level Low-Rank Adaptation (HELLoRA), which attaches LoRA modules only to the most frequently activated experts at each layer. This simple mechanism reduces trainable parameters and adapter-induced FLOPs while improving downstream performance, an effect we attribute to a form of structured regularization that preserves pretrained expert specialization. To stress-test HELLoRA under extreme parameter budgets, we further compose it with LoRI to form HELLoRI, which freezes the up-projection and sparsifies the down-projection. Across three MoE backbones, namely OlMoE-1B-7B, Mixtral-8x7B, and DeepSeekMoE, and three task families covering mathematical reasoning, code generation, and safety alignment, HELLoRA consistently outperforms strong PEFT baselines. Relative to vanilla LoRA on OlMoE, HELLoRA uses 15.7% of the trainable parameters, reduces adapter FLOPs by 38.7%, achieves 1.9x the training throughput, and improves accuracy by 9.2%. On DeepSeekMoE, HELLoRA outperforms LoRA while using only 23.2% of its trainable parameters. These results demonstrate that activation-aware adapter placement is an effective and practical route to scaling PEFT for MoE language models.
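The placement rule described in this abstract (adapters only on the most frequently activated experts at each layer) can be illustrated with a minimal sketch. This is not the authors' implementation: the routing trace, the `top_k` budget and the function names `select_hot_experts` and `lora_param_count` are all hypothetical, and the abstract does not say how activation frequency is measured.

```python
from collections import Counter


def select_hot_experts(routing_log, top_k):
    """Return, per layer, the top_k most frequently activated expert ids.

    routing_log maps a layer index to a list of expert ids, one entry per
    (token, routed expert) pair seen on some calibration data (assumed input).
    """
    hot = {}
    for layer, expert_ids in routing_log.items():
        counts = Counter(expert_ids)
        # Sort by descending count, then by expert id so ties are deterministic.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        hot[layer] = [expert for expert, _ in ranked[:top_k]]
    return hot


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of one LoRA pair: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


# Hypothetical routing trace for a 2-layer model with 4 experts per layer.
routing_log = {
    0: [0, 2, 2, 1, 2, 0, 2, 3],
    1: [3, 3, 1, 3, 1, 0, 1, 3],
}
hot_experts = select_hot_experts(routing_log, top_k=2)
print(hot_experts)
```

Only the experts listed in `hot_experts` would receive LoRA modules in this sketch; the remaining experts stay frozen, which is where the reduction in trainable parameters comes from.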

Preprint, 2026, arXiv

Robust and Generalizable Atrial Fibrillation Detection from ECG Using Time-Frequency Fusion and Supervised Contrastive Learning

Atrial fibrillation (AF) is a common cardiac arrhythmia that significantly increases the risk of stroke and heart failure, necessitating reliable and generalizable detection methods from electrocardiogram (ECG) recordings. Although deep learning has advanced automated AF diagnosis, existing approaches often struggle to exploit complementary time-frequency information effectively, limiting both intra-dataset robustness and generalization across diverse clinical datasets. To address these challenges, we propose a cross-modal deep learning framework comprising two key components: a Bidirectional Gating Module (BGM) and a Cross-modal Supervised Contrastive Learning (CSCL) strategy. The BGM facilitates dynamic, reciprocal refinement between time and frequency domain features, enhancing model robustness to signal variations within a dataset. Meanwhile, CSCL explicitly structures the joint embedding space by pulling together label-consistent samples and pushing apart samples with different labels, thereby improving inter-class separability and enabling strong cross-dataset generalization. We evaluate our method through five-fold cross-validation on the AFDB and CPSC2021 datasets, as well as bidirectional cross-dataset experiments (training on one and testing on the other). Results show consistent improvements over state-of-the-art methods across multiple metrics, demonstrating that our approach achieves both high intra-dataset robustness and excellent cross-dataset generalization. We further demonstrate that our method achieves high computational efficiency and anti-interference capability, making it suitable for edge deployment.
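The pull-together, push-apart behaviour attributed to CSCL can be illustrated with the generic supervised contrastive loss. The sketch below is that generic formulation, not the paper's cross-modal variant, whose exact form the abstract does not give; the function name, the temperature of 0.1 and the toy embeddings are assumptions, and embeddings are assumed to be non-zero vectors.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors with at least one positive."""
    # L2-normalise so that dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(v * v for v in vec))
        normed.append([v / norm for v in vec])

    total, anchors = 0.0, 0
    n = len(normed)
    for i in range(n):
        # Temperature-scaled similarity of anchor i to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue
        # Log of the softmax denominator, computed stably.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy embeddings: two tight clusters along different axes.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matched = supervised_contrastive_loss(points, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(points, [0, 1, 0, 1])
print(loss_matched, loss_mismatched)
```

When the labels agree with the cluster structure the loss is close to zero; when they cut across the clusters it is large, which is the pressure that reshapes the embedding space during training.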

Preprint, 2026, arXiv

SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training

The efficiency of attention is important due to its quadratic time complexity. We enhance the efficiency of attention through two key contributions. First, we leverage the new FP4 Tensor Cores in Blackwell GPUs to accelerate attention computation. Our implementation achieves 1038 TOPS on RTX5090, which is a 5x speedup over the fastest FlashAttention on RTX5090. Experiments show that our FP4 attention can accelerate inference of various models in a plug-and-play way. Second, we pioneer the application of low-bit attention to training tasks. Existing low-bit attention works such as FlashAttention3 and SageAttention focus only on inference. However, the efficiency of training large models is also important. To explore whether low-bit attention can be effectively applied to training tasks, we design an accurate and efficient 8-bit attention for both forward and backward propagation. Experiments indicate that 8-bit attention achieves lossless performance in fine-tuning tasks but exhibits slower convergence in pretraining tasks. The code is available at https://github.com/thu-ml/SageAttention.
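The idea behind microscaling can be sketched in a few lines: values are grouped into small blocks, each block shares one scale factor, and each element is rounded to the nearest 4-bit float. The sketch below is a simplified illustration and not the SageAttention3 kernel. The block size of 16 and the function names are assumptions, and the scale is kept as an ordinary Python float, whereas hardware formats also store the scale in a low-precision type.

```python
# Magnitudes representable by a 4-bit E2M1 float
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
FP4_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_block(block):
    """Quantise one block to (scale, list of FP4 values) with a shared scale."""
    peak = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest FP4 value.
    scale = peak / FP4_MAGNITUDES[-1] if peak > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        magnitude = min(FP4_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(magnitude if x >= 0 else -magnitude)
    return scale, codes


def microscale_roundtrip(values, block_size=16):
    """Quantise then dequantise a flat list, one shared scale per block."""
    restored = []
    for start in range(0, len(values), block_size):
        scale, codes = quantize_block(values[start:start + block_size])
        restored.extend(scale * c for c in codes)
    return restored


values = [0.1 * i for i in range(-16, 16)]
restored = microscale_roundtrip(values)
worst_error = max(abs(a - b) for a, b in zip(values, restored))
print(worst_error)
```

Because the widest gap between adjacent FP4 magnitudes is 2 (between 4 and 6), the rounding error of any element is at most one scale unit, that is, one sixth of the largest magnitude in its block. Keeping blocks small limits how far a single outlier can inflate that bound for its neighbours.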

Preprint, 2022, arXiv

Learning Multi-Modal Brain Tumor Segmentation from Privileged Semi-Paired MRI Images with Curriculum Disentanglement Learning

Due to the difficulties of obtaining multimodal paired images in clinical practice, recent studies propose to train brain tumor segmentation models with unpaired images and capture complementary information through modality translation. However, these models cannot fully exploit the complementary information from different modalities. In this work, we thus present a novel two-step (intra-modality and inter-modality) curriculum disentanglement learning framework to effectively utilize privileged semi-paired images, i.e. limited paired images that are only available in training, for brain tumor segmentation. Specifically, in the first step, we propose to conduct reconstruction and segmentation with augmented intra-modality style-consistent images. In the second step, the model jointly performs reconstruction, unsupervised/supervised translation, and segmentation for both unpaired and paired inter-modality images. A content consistency loss and a supervised translation loss are proposed to leverage complementary information from different modalities in this step. Through these two steps, our method effectively extracts modality-specific style codes describing the attenuation of tissue features and image contrast, and modality-invariant content codes containing anatomical and functional information from the input images. Experiments on three brain tumor segmentation tasks show that our model outperforms competing segmentation models based on unpaired images.

Preprint, 2022, arXiv

Slice Imputation: Intermediate Slice Interpolation for Anisotropic 3D Medical Image Segmentation

We introduce a novel frame-interpolation-based method for slice imputation that improves segmentation accuracy for anisotropic 3D medical images by generating additional slices, and their corresponding segmentation labels, between consecutive slices of an anisotropic 3D volume. Unlike previous inter-slice imputation methods, which only focus on the smoothness in the axial direction, this study aims to improve the smoothness of the interpolated 3D medical volumes in all three directions: axial, sagittal, and coronal. The proposed multitask inter-slice imputation method, in particular, incorporates a smoothness loss function to evaluate the smoothness of the interpolated 3D medical volumes in the through-plane direction (sagittal and coronal). It not only improves the resolution of the interpolated 3D medical volumes in the through-plane direction but also transforms them into isotropic representations, which leads to better segmentation performance. Experiments on whole tumor segmentation in the brain, liver tumor segmentation, and prostate segmentation indicate that our method outperforms the competing slice imputation methods on both computed tomography and magnetic resonance image volumes in most cases.

Preprint, 2022, arXiv

Unsupervised Multi-Modal Medical Image Registration via Discriminator-Free Image-to-Image Translation

In clinical practice, well-aligned multi-modal images, such as Magnetic Resonance (MR) and Computed Tomography (CT), together can provide complementary information for image-guided therapies. Multi-modal image registration is essential for the accurate alignment of these multi-modal images. However, it remains a very challenging task due to the complicated and unknown spatial correspondence between different modalities. In this paper, we propose a novel translation-based unsupervised deformable image registration approach to convert the multi-modal registration problem to a mono-modal one. Specifically, our approach incorporates a discriminator-free translation network to facilitate the training of the registration network and a patchwise contrastive loss to encourage the translation network to preserve object shapes. Furthermore, we propose to replace the adversarial loss, which is widely used in previous multi-modal image registration methods, with a pixel loss in order to integrate the output of translation into the target modality. This leads to an unsupervised method requiring no ground-truth deformation or pairs of aligned images for training. We evaluate four variants of our approach on the public Learn2Reg 2021 datasets (Hering et al., 2021). The experimental results demonstrate that the proposed architecture achieves state-of-the-art performance. Our code is available at https://github.com/heyblackC/DFMIR.

Preprint, 2020, arXiv

Inter-slice image augmentation based on frame interpolation for boosting medical image segmentation accuracy

We introduce the idea of inter-slice image augmentation, whereby the number of medical images and corresponding segmentation labels is increased between two consecutive images in order to boost medical image segmentation accuracy. Unlike conventional data augmentation methods in medical imaging, which only increase the number of training samples directly by adding new virtual samples using simple parameterized transformations such as rotation, flipping, scaling, etc., we aim to augment data based on the relationship between two consecutive images, which increases not only the number but also the information content of training samples. For this purpose, we propose a frame-interpolation-based data augmentation method to generate intermediate medical images and the corresponding segmentation labels between two consecutive images. We train and test a supervised U-Net liver segmentation network on SLIVER07 and CHAOS2019, respectively, with the augmented training samples, and obtain segmentation scores exhibiting significant improvement compared to the conventional augmentation methods.
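The shape of the augmentation loop described in this abstract (one synthetic image and one synthetic label inserted between each consecutive pair) can be sketched as follows. The abstract's method relies on frame interpolation; the sketch swaps in plain linear blending as a stand-in so that it stays self-contained, and the function names, the blend weight and the 0.5 label threshold are all assumptions.

```python
def interpolate_slice(slice_a, slice_b, alpha=0.5):
    """Linearly blend two same-sized 2D slices given as lists of rows."""
    return [
        [(1 - alpha) * a + alpha * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(slice_a, slice_b)
    ]


def interpolate_label(mask_a, mask_b, alpha=0.5, threshold=0.5):
    """Blend two binary masks and re-binarise the result."""
    blended = interpolate_slice(mask_a, mask_b, alpha)
    return [[1 if v >= threshold else 0 for v in row] for row in blended]


def augment_volume(slices, masks):
    """Insert one synthetic slice and mask between each consecutive pair."""
    out_slices, out_masks = [slices[0]], [masks[0]]
    for i in range(1, len(slices)):
        out_slices.append(interpolate_slice(slices[i - 1], slices[i]))
        out_masks.append(interpolate_label(masks[i - 1], masks[i]))
        out_slices.append(slices[i])
        out_masks.append(masks[i])
    return out_slices, out_masks


# Toy volume: two 2x2 slices with their binary segmentation masks.
slices = [[[0, 0], [0, 0]], [[2, 4], [6, 8]]]
masks = [[[0, 0], [1, 1]], [[0, 1], [1, 1]]]
aug_slices, aug_masks = augment_volume(slices, masks)
print(len(aug_slices), aug_slices[1], aug_masks[1])
```

A volume with n slices grows to 2n - 1 slices. Linear blending only averages intensities; it does not model the motion between slices that a frame-interpolation approach is designed to capture.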

Preprint, 2020, arXiv

Tips and Tricks for Webly-Supervised Fine-Grained Recognition: Learning from the WebFG 2020 Challenge

WebFG 2020 is an international challenge hosted by Nanjing University of Science and Technology, University of Edinburgh, Nanjing University, The University of Adelaide, Waseda University, etc. This challenge focuses on the webly-supervised fine-grained recognition problem. In the literature, existing deep learning methods rely heavily on large-scale, high-quality labeled training data, which limits their practicability and scalability in real-world applications. In particular, for fine-grained recognition, a visual task that requires professional knowledge for labeling, the cost of acquiring labeled training data is quite high, making it extremely difficult to obtain a large amount of high-quality training data. Therefore, utilizing free web data to train fine-grained recognition models has attracted increasing attention from researchers in the fine-grained community. This challenge expects participants to develop webly-supervised fine-grained recognition methods, which leverage web images in training fine-grained recognition models to ease the extreme dependence of deep learning methods on large-scale manually labeled datasets and to enhance their practicability and scalability. In this technical report, we have pulled together the top WebFG 2020 solutions from a total of 54 competing teams, and discuss what methods worked best across the set of winning teams, and what surprisingly did not help.

Preprint, 2013, arXiv

A hybrid HDMR for mixed multiscale finite element method with application for flows in random porous media

Stochastic modeling has become a popular approach to quantify uncertainty in flows through heterogeneous porous media. The uncertainty in heterogeneous structure properties is often parameterized by a high-dimensional random variable. This leads to a deterministic problem in a high-dimensional parameter space, and the numerical computation becomes very challenging as the dimension of the parameter space increases. To efficiently tackle the high dimensionality, we propose a hybrid high dimensional model representation (HDMR) technique, through which the high-dimensional stochastic model is decomposed into a moderate-dimensional stochastic model in a most active random space and a few one-dimensional stochastic models. The derived low-dimensional stochastic models are solved by incorporating a sparse grid stochastic collocation method into the proposed hybrid HDMR. The porous media properties such as permeability are often heterogeneous. To treat the heterogeneity, we use a mixed multiscale finite element method (MMsFEM) to simulate each of the derived stochastic models. To capture the non-local spatial features of the porous media and the important effects of random variables, we can hierarchically incorporate the global information individually from each of the random parameters. This significantly enhances the accuracy of the multiscale simulation. The synergy of the hybrid HDMR and the MMsFEM reduces the stochastic model of flows in both stochastic space and physical space, and significantly decreases the computational complexity. We carefully analyze the proposed HDMR technique and the derived stochastic MMsFEM. A few numerical experiments are carried out for two-phase flows in random porous media and support the efficiency and accuracy of the MMsFEM based on the hybrid HDMR.
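The decomposition idea can be illustrated with the standard first-order cut-HDMR, which approximates a function of many variables by a constant plus a sum of one-dimensional corrections evaluated around an anchor point. This is a sketch of that textbook form only. The hybrid variant described in the abstract additionally keeps a joint, moderate-dimensional term for the most active variables, which is not reproduced here; the function name and the test functions are assumptions.

```python
def cut_hdmr_first_order(f, anchor):
    """Build a first-order cut-HDMR surrogate of f around the anchor point.

    surrogate(x) = f(anchor) + sum_i [ f(anchor with x_i in slot i) - f(anchor) ]
    """
    f0 = f(anchor)

    def surrogate(x):
        total = f0
        for i, xi in enumerate(x):
            point = list(anchor)
            point[i] = xi  # vary one coordinate, keep the others at the anchor
            total += f(point) - f0
        return total

    return surrogate


def additive(x):
    """No interaction between variables: first-order HDMR is exact."""
    return 1.0 + 2.0 * x[0] + x[1] ** 2 - 3.0 * x[2]


def interacting(x):
    """Pure interaction term: first-order HDMR cannot capture it fully."""
    return x[0] * x[1]


approx_additive = cut_hdmr_first_order(additive, [0.5, 0.5, 0.5])
approx_interacting = cut_hdmr_first_order(interacting, [1.0, 1.0])

print(additive([1.0, 2.0, 3.0]), approx_additive([1.0, 2.0, 3.0]))
print(interacting([2.0, 3.0]), approx_interacting([2.0, 3.0]))
```

The surrogate reproduces the additive function exactly but misses the product term in the second function. That gap is the motivation for treating strongly interacting, most active variables jointly rather than one at a time.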

Preprint, 2010, arXiv

Electronic structure of Fe1.04(Te0.66Se0.34)

We report the electronic structure of the iron-chalcogenide superconductor Fe1.04(Te0.66Se0.34), obtained with high-resolution angle-resolved photoemission spectroscopy and density functional calculations. In photoemission measurements, various photon energies and polarizations are exploited to study the Fermi surface topology and symmetry properties of the bands. The measured band structure and its symmetry characters qualitatively agree with our density functional theory calculations of Fe(Te0.66Se0.34), although the band structure is renormalized by about a factor of three. We find that the electronic structures of this iron-chalcogenide and the iron-pnictides have many aspects in common; however, significant differences exist near the Gamma-point. For Fe1.04(Te0.66Se0.34), there are three clearly separated bands with distinct even or odd symmetry that cross the Fermi energy (EF) near the zone center, which contribute to three hole-like Fermi surfaces. In particular, both experiments and calculations show a hole-like elliptical Fermi surface at the zone center. Moreover, no sign of a spin density wave was observed in the electronic structure and susceptibility measurements of this compound.

Preprint, 2010, arXiv

High-resolution angle-resolved photoemission spectroscopy study of the electronic structure of EuFe2As2

We report a high-resolution angle-resolved photoemission spectroscopy study of the electronic structure of EuFe2As2. The paramagnetic state data are found to be consistent with density-functional calculations. In the antiferromagnetic ordering state of Fe, our results show that the band splitting, folding, and hybridization evolve with temperature, which cannot be explained by a simple folding picture. Detailed measurements reveal that a tiny electron Fermi pocket and a tiny hole pocket are formed near (pi,pi) in the (0,0)-(pi,pi) direction, which qualitatively agrees with the results of quantum oscillations, considering the kz variation of the Fermi surface. Furthermore, no noticeable change within the energy resolution is observed across the antiferromagnetic transition of Eu2+ ordering, suggesting weak coupling between the Eu sublattice and the FeAs sublattice.
