Researcher profile

Lei Qi

Lei Qi contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
17works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

17 published item(s)

preprint2026arXiv

IMPACT-HOI: Supervisory Control for Onset-Anchored Partial HOI Event Construction

We present IMPACT-HOI, a mixed-initiative framework for annotating egocentric procedural video by constructing structured event graphs for Human-Object Interactions (HOI), motivated by the need for high-quality structured supervision for learning robot manipulation from human demonstration. IMPACT-HOI frames this task as the incremental resolution of a partially specified, onset-anchored event state. A trust-calibrated controller selects among direct queries, human-confirmed suggestions, and conservative completions based on empirical annotator behavior and evidence quality. A risk-bounded execution protocol, utilizing atomic rollback, ensures that human-confirmed decisions are preserved against conflicting automated updates. A user study with 9 participants shows a 13.5% reduction in manual annotation actions, a 46.67% event match rate, and zero confirmed-field violations under the studied protocol. The code will be made publicly available at https://github.com/541741106/IMPACT_HOI.

preprint2022arXiv

A Novel Mix-normalization Method for Generalizable Multi-source Person Re-identification

Person re-identification (Re-ID) has achieved great success in the supervised scenario. However, it is difficult to directly transfer the supervised model to arbitrary unseen domains due to the model overfitting to the seen source domains. In this paper, we aim to tackle the generalizable multi-source person Re-ID task (i.e., there are multiple available source domains, and the testing domain is unseen during training) from the data augmentation perspective, thus we put forward a novel method, termed MixNorm, which consists of domain-aware mix-normalization (DMN) and domain-ware center regularization (DCR). Different from the conventional data augmentation, the proposed domain-aware mix-normalization to enhance the diversity of features during training from the normalization view of the neural network, which can effectively alleviate the model overfitting to the source domains, so as to boost the generalization capability of the model in the unseen domain. To better learn the domain-invariant model, we further develop the domain-aware center regularization to better map the produced diverse features into the same space. Extensive experiments on multiple benchmark datasets validate the effectiveness of the proposed method and show that the proposed method can outperform the state-of-the-art methods. Besides, further analysis also reveals the superiority of the proposed method.

preprint2022arXiv

Better Pseudo-label: Joint Domain-aware Label and Dual-classifier for Semi-supervised Domain Generalization

With the goal of directly generalizing trained model to unseen target domains, domain generalization (DG), a newly proposed learning paradigm, has attracted considerable attention. Previous DG models usually require a sufficient quantity of annotated samples from observed source domains during training. In this paper, we relax this requirement about full annotation and investigate semi-supervised domain generalization (SSDG) where only one source domain is fully annotated along with the other domains totally unlabeled in the training process. With the challenges of tackling the domain gap between observed source domains and predicting unseen target domains, we propose a novel deep framework via joint domain-aware labels and dual-classifier to produce high-quality pseudo-labels. Concretely, to predict accurate pseudo-labels under domain shift, a domain-aware pseudo-labeling module is developed. Also, considering inconsistent goals between generalization and pseudo-labeling: former prevents overfitting on all source domains while latter might overfit the unlabeled source domains for high accuracy, we employ a dual-classifier to independently perform pseudo-labeling and domain generalization in the training process. When accurate pseudo-labels are generated for unlabeled source domains, the domain mixup operation is applied to augment new domains between labeled and unlabeled domains, which is beneficial for boosting the generalization capability of the model. Extensive results on publicly available DG benchmark datasets show the efficacy of our proposed SSDG method.

preprint2022arXiv

Feature-based Style Randomization for Domain Generalization

As a recent noticeable topic, domain generalization (DG) aims to first learn a generic model on multiple source domains and then directly generalize to an arbitrary unseen target domain without any additional adaption. In previous DG models, by generating virtual data to supplement observed source domains, the data augmentation based methods have shown its effectiveness. To simulate the possible unseen domains, most of them enrich the diversity of original data via image-level style transformation. However, we argue that the potential styles are hard to be exhaustively illustrated and fully augmented due to the limited referred styles, leading the diversity could not be always guaranteed. Unlike image-level augmentation, we in this paper develop a simple yet effective feature-based style randomization module to achieve feature-level augmentation, which can produce random styles via integrating random noise into the original style. Compared with existing image-level augmentation, our feature-level augmentation favors a more goal-oriented and sample-diverse way. Furthermore, to sufficiently explore the efficacy of the proposed module, we design a novel progressive training strategy to enable all parameters of the network to be fully trained. Extensive experiments on three standard benchmark datasets, i.e., PACS, VLCS and Office-Home, highlight the superiority of our method compared to the state-of-the-art methods.

preprint2022arXiv

Generalizable Cross-modality Medical Image Segmentation via Style Augmentation and Dual Normalization

For medical image segmentation, imagine if a model was only trained using MR images in source domain, how about its performance to directly segment CT images in target domain? This setting, namely generalizable cross-modality segmentation, owning its clinical potential, is much more challenging than other related settings, e.g., domain adaptation. To achieve this goal, we in this paper propose a novel dual-normalization model by leveraging the augmented source-similar and source-dissimilar images during our generalizable segmentation. To be specific, given a single source domain, aiming to simulate the possible appearance change in unseen target domains, we first utilize a nonlinear transformation to augment source-similar and source-dissimilar images. Then, to sufficiently exploit these two types of augmentations, our proposed dual-normalization based model employs a shared backbone yet independent batch normalization layer for separate normalization. Afterward, we put forward a style-based selection scheme to automatically choose the appropriate path in the test stage. Extensive experiments on three publicly available datasets, i.e., BraTS, Cross-Modality Cardiac, and Abdominal Multi-Organ datasets, have demonstrated that our method outperforms other state-of-the-art domain generalization methods. Code is available at https://github.com/zzzqzhou/Dual-Normalization.

preprint2022arXiv

Generalizable Medical Image Segmentation via Random Amplitude Mixup and Domain-Specific Image Restoration

For medical image analysis, segmentation models trained on one or several domains lack generalization ability to unseen domains due to discrepancies between different data acquisition policies. We argue that the degeneration in segmentation performance is mainly attributed to overfitting to source domains and domain shift. To this end, we present a novel generalizable medical image segmentation method. To be specific, we design our approach as a multi-task paradigm by combining the segmentation model with a self-supervision domain-specific image restoration (DSIR) module for model regularization. We also design a random amplitude mixup (RAM) module, which incorporates low-level frequency information of different domain images to synthesize new images. To guide our model be resistant to domain shift, we introduce a semantic consistency loss. We demonstrate the performance of our method on two public generalizable segmentation benchmarks in medical images, which validates our method could achieve the state-of-the-art performance.

preprint2022arXiv

Label Distribution Learning for Generalizable Multi-source Person Re-identification

Person re-identification (Re-ID) is a critical technique in the video surveillance system, which has achieved significant success in the supervised setting. However, it is difficult to directly apply the supervised model to arbitrary unseen domains due to the domain gap between the available source domains and unseen target domains. In this paper, we propose a novel label distribution learning (LDL) method to address the generalizable multi-source person Re-ID task (i.e., there are multiple available source domains, and the testing domain is unseen during training), which aims to explore the relation of different classes and mitigate the domain-shift across different domains so as to improve the discrimination of the model and learn the domain-invariant feature, simultaneously. Specifically, during the training process, we produce the label distribution via the online manner to mine the relation information of different classes, thus it is beneficial for extracting the discriminative feature. Besides, for the label distribution of each class, we further revise it to give more and equal attention to the other domains that the class does not belong to, which can effectively reduce the domain gap across different domains and obtain the domain-invariant feature. Furthermore, we also give the theoretical analysis to demonstrate that the proposed method can effectively deal with the domain-shift issue. Extensive experiments on multiple benchmark datasets validate the effectiveness of the proposed method and show that the proposed method can outperform the state-of-the-art methods. Besides, further analysis also reveals the superiority of the proposed method.

preprint2022arXiv

Learngene: From Open-World to Your Learning Task

Although deep learning has made significant progress on fixed large-scale datasets, it typically encounters challenges regarding improperly detecting unknown/unseen classes in the open-world scenario, over-parametrized, and overfitting small samples. Since biological systems can overcome the above difficulties very well, individuals inherit an innate gene from collective creatures that have evolved over hundreds of millions of years and then learn new skills through few examples. Inspired by this, we propose a practical collective-individual paradigm where an evolution (expandable) network is trained on sequential tasks and then recognize unknown classes in real-world. Moreover, the learngene, i.e., the gene for learning initialization rules of the target model, is proposed to inherit the meta-knowledge from the collective model and reconstruct a lightweight individual model on the target task. Particularly, a novel criterion is proposed to discover learngene in the collective model, according to the gradient information. Finally, the individual model is trained only with few samples on the target learning tasks. We demonstrate the effectiveness of our approach in an extensive empirical study and theoretical analysis.

preprint2022arXiv

Limit Cycle Oscillations, response time and the time-dependent solution to the Lotka-Volterra Predator-Prey model

In this work, the time-dependent solution for the Lotka-Volterra Predator-Prey model is derived with the help of the Lambert W function. This allows an exact analytical expression for the period of the associated limit-cycle oscillations (LCO), and also for the response time between predator and prey population. These results are applied to the predator-prey interaction of zonal density corrugations and turbulent particle flux in gyrokinetic simulations of collisionless trapped-electron model (CTEM) turbulence. In the turbulence simulations, the response time is shown to increase when approaching the linear threshold, and the same trend is observed in the Lotka-Volterra model.

preprint2022arXiv

Linear analysis and crossphase dynamics in the $\nabla T_e$-driven CTEM fluid model

Collisionless trapped-electron mode (CTEM) turbulence is an important contributor to heat and particle transport in fusion devices. The ITG/TEM fluid models are rarely treated analytically, due to the large number of transport channels involved, e.g. particle and ion/electron heat transport. The $\nabla T_e$-driven CTEM fluid model [Anderson et al, Plasma Phys. Control. Fusion 48, 651 (2006)] provides a simplified model, in the regime where the density gradient drive is negligeable compared to the electron temperature gradient drive ($\nabla T_e$). This provides an interesting model to study mechanisms associated to linear waves, such as crossphase dynamics, and its possible role in the formation of $E\times B$ staircase. Here, the $\nabla T_e$-driven CTEM fluid model is rigourously derived from the more general ITG/TEM model, and its linear dynamics is first analyzed and compared with CTEM gyrokinetic simulations with bounce-averaged kinetic electrons, while nonlinear analysis is left for future work. Comparisons of linear ITG spectrum are also made with other analytical models.

preprint2022arXiv

MVDG: A Unified Multi-view Framework for Domain Generalization

To generalize the model trained in source domains to unseen target domains, domain generalization (DG) has recently attracted lots of attention. Since target domains can not be involved in training, overfitting source domains is inevitable. As a popular regularization technique, the meta-learning training scheme has shown its ability to resist overfitting. However, in the training stage, current meta-learning-based methods utilize only one task along a single optimization trajectory, which might produce a biased and noisy optimization direction. Beyond the training stage, overfitting could also cause unstable prediction in the test stage. In this paper, we propose a novel multi-view DG framework to effectively reduce the overfitting in both the training and test stage. Specifically, in the training stage, we develop a multi-view regularized meta-learning algorithm that employs multiple optimization trajectories to produce a suitable optimization direction for model updating. We also theoretically show that the generalization bound could be reduced by increasing the number of tasks in each trajectory. In the test stage, we utilize multiple augmented images to yield a multi-view prediction to alleviate unstable prediction, which significantly promotes model reliability. Extensive experiments on three benchmark datasets validate that our method can find a flat minimum to enhance generalization and outperform several state-of-the-art approaches.

preprint2022arXiv

ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation

Self-training via pseudo labeling is a conventional, simple, and popular pipeline to leverage unlabeled data. In this work, we first construct a strong baseline of self-training (namely ST) for semi-supervised semantic segmentation via injecting strong data augmentations (SDA) on unlabeled images to alleviate overfitting noisy labels as well as decouple similar predictions between the teacher and student. With this simple mechanism, our ST outperforms all existing methods without any bells and whistles, e.g., iterative re-training. Inspired by the impressive results, we thoroughly investigate the SDA and provide some empirical analysis. Nevertheless, incorrect pseudo labels are still prone to accumulate and degrade the performance. To this end, we further propose an advanced self-training framework (namely ST++), that performs selective re-training via prioritizing reliable unlabeled images based on holistic prediction-level stability. Concretely, several model checkpoints are saved in the first stage supervised training, and the discrepancy of their predictions on the unlabeled image serves as a measurement for reliability. Our image-level selection offers holistic contextual information for learning. We demonstrate that it is more suitable for segmentation than common pixel-wise selection. As a result, ST++ further boosts the performance of our ST. Code is available at https://github.com/LiheYoung/ST-PlusPlus.

preprint2021arXiv

A novel multiple instance learning framework for COVID-19 severity assessment via data augmentation and self-supervised learning

How to fast and accurately assess the severity level of COVID-19 is an essential problem, when millions of people are suffering from the pandemic around the world. Currently, the chest CT is regarded as a popular and informative imaging tool for COVID-19 diagnosis. However, we observe that there are two issues -- weak annotation and insufficient data that may obstruct automatic COVID-19 severity assessment with CT images. To address these challenges, we propose a novel three-component method, i.e., 1) a deep multiple instance learning component with instance-level attention to jointly classify the bag and also weigh the instances, 2) a bag-level data augmentation component to generate virtual bags by reorganizing high confidential instances, and 3) a self-supervised pretext component to aid the learning process. We have systematically evaluated our method on the CT images of 229 COVID-19 cases, including 50 severe and 179 non-severe cases. Our method could obtain an average accuracy of 95.8%, with 93.6% sensitivity and 96.4% specificity, which outperformed previous works.

preprint2021arXiv

Crosslink-Net: Double-branch Encoder Segmentation Network via Fusing Vertical and Horizontal Convolutions

Accurate image segmentation plays a crucial role in medical image analysis, yet it faces great challenges of various shapes, diverse sizes, and blurry boundaries. To address these difficulties, square kernel-based encoder-decoder architecture has been proposed and widely used, but its performance remains still unsatisfactory. To further cope with these challenges, we present a novel double-branch encoder architecture. Our architecture is inspired by two observations: 1) Since the discrimination of features learned via square convolutional kernels needs to be further improved, we propose to utilize non-square vertical and horizontal convolutional kernels in the double-branch encoder, so features learned by the two branches can be expected to complement each other. 2) Considering that spatial attention can help models to better focus on the target region in a large-sized image, we develop an attention loss to further emphasize the segmentation on small-sized targets. Together, the above two schemes give rise to a novel double-branch encoder segmentation framework for medical image segmentation, namely Crosslink-Net. The experiments validate the effectiveness of our model on four datasets. The code is released at https://github.com/Qianyu1226/Crosslink-Net.

preprint2021arXiv

Deep Symmetric Adaptation Network for Cross-modality Medical Image Segmentation

Unsupervised domain adaptation (UDA) methods have shown their promising performance in the cross-modality medical image segmentation tasks. These typical methods usually utilize a translation network to transform images from the source domain to target domain or train the pixel-level classifier merely using translated source images and original target images. However, when there exists a large domain shift between source and target domains, we argue that this asymmetric structure could not fully eliminate the domain gap. In this paper, we present a novel deep symmetric architecture of UDA for medical image segmentation, which consists of a segmentation sub-network, and two symmetric source and target domain translation sub-networks. To be specific, based on two translation sub-networks, we introduce a bidirectional alignment scheme via a shared encoder and private decoders to simultaneously align features 1) from source to target domain and 2) from target to source domain, which helps effectively mitigate the discrepancy between domains. Furthermore, for the segmentation sub-network, we train a pixel-level classifier using not only original target images and translated source images, but also original source images and translated target images, which helps sufficiently leverage the semantic information from the images with different styles. Extensive experiments demonstrate that our method has remarkable advantages compared to the state-of-the-art methods in both cross-modality Cardiac and BraTS segmentation tasks.

preprint2020arXiv

GreyReID: A Two-stream Deep Framework with RGB-grey Information for Person Re-identification

In this paper, we observe that most false positive images (i.e., different identities with query images) in the top ranking list usually have the similar color information with the query image in person re-identification (Re-ID). Meanwhile, when we use the greyscale images generated from RGB images to conduct the person Re-ID task, some hard query images can obtain better performance compared with using RGB images. Therefore, RGB and greyscale images seem to be complementary to each other for person Re-ID. In this paper, we aim to utilize both RGB and greyscale images to improve the person Re-ID performance. To this end, we propose a novel two-stream deep neural network with RGB-grey information, which can effectively fuse RGB and greyscale feature representations to enhance the generalization ability of Re-ID. Firstly, we convert RGB images to greyscale images in each training batch. Based on these RGB and greyscale images, we train the RGB and greyscale branches, respectively. Secondly, to build up connections between RGB and greyscale branches, we merge the RGB and greyscale branches into a new joint branch. Finally, we concatenate the features of all three branches as the final feature representation for Re-ID. Moreover, in the training process, we adopt the joint learning scheme to simultaneously train each branch by the independent loss function, which can enhance the generalization ability of each branch. Besides, a global loss function is utilized to further fine-tune the final concatenated feature. The extensive experiments on multiple benchmark datasets fully show that the proposed method can outperform the state-of-the-art person Re-ID methods. Furthermore, using greyscale images can indeed improve the person Re-ID performance.

preprint2020arXiv

Progressive Cross-camera Soft-label Learning for Semi-supervised Person Re-identification

In this paper, we focus on the semi-supervised person re-identification (Re-ID) case, which only has the intra-camera (within-camera) labels but not inter-camera (cross-camera) labels. In real-world applications, these intra-camera labels can be readily captured by tracking algorithms or few manual annotations, when compared with cross-camera labels. In this case, it is very difficult to explore the relationships between cross-camera persons in the training stage due to the lack of cross-camera label information. To deal with this issue, we propose a novel Progressive Cross-camera Soft-label Learning (PCSL) framework for the semi-supervised person Re-ID task, which can generate cross-camera soft-labels and utilize them to optimize the network. Concretely, we calculate an affinity matrix based on person-level features and adapt them to produce the similarities between cross-camera persons (i.e., cross-camera soft-labels). To exploit these soft-labels to train the network, we investigate the weighted cross-entropy loss and the weighted triplet loss from the classification and discrimination perspectives, respectively. Particularly, the proposed framework alternately generates progressive cross-camera soft-labels and gradually improves feature representations in the whole learning course. Extensive experiments on five large-scale benchmark datasets show that PCSL significantly outperforms the state-of-the-art unsupervised methods that employ labeled source domains or the images generated by the GAN-based models. Furthermore, the proposed method even has a competitive performance with respect to deep supervised Re-ID methods.