Researcher profile

Jing Huo

Jing Huo contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning

Reinforcement learning with verifiable rewards (RLVR) has demonstrated superior performance in enhancing the reasoning capability of large language models (LLMs). However, this accuracy-oriented learning paradigm often suffers from entropy collapse, which reduces policy exploration and limits reasoning capabilities. To address this challenge, we propose an efficient reinforcement learning framework that leverages entropy signals at both the semantic and token levels to improve reasoning. From the data perspective, we introduce semantic entropy-guided curriculum learning, organizing training data from low to high semantic entropy to guide progressive optimization from easier to more challenging tasks. For the algorithmic design, we adopt non-uniform token treatment by imposing KL regularization on low-entropy tokens that critically impact policy exploration and applying stronger constraints on high-covariance portions within these tokens. By jointly optimizing data organization and algorithmic design, our method effectively mitigates entropy collapse and enhances LLM reasoning. Experimental results across 6 benchmarks with 3 different parameter-scale base models demonstrate that our method outperforms other entropy-based approaches in improving reasoning.

preprint2026arXiv

Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning

Large Reasoning Models (LRMs) often suffer from overthinking, generating verbose reasoning traces that compromise both computational efficiency and interpretability. Unlike prior efforts that rely on global length-based rewards, we propose a semantic-aware decomposition of redundancy into two distinct forms: internal redundancy (informational stagnation within the reasoning process) and external redundancy (superfluous continuation after the final answer). We introduce a dual-penalty reinforcement learning framework that surgically targets these inefficiencies: a sliding-window semantic analysis is employed to penalize low-gain steps within the reasoning trajectory, while a normalized metric suppresses the post-answer tail. Extensive experiments demonstrate that our method significantly compresses Chain-of-Thought traces with minimal accuracy degradation, while maintaining strong generalization to out-of-domain tasks. Crucially, we reveal an asymmetry in redundancy: external redundancy can be safely eliminated without performance loss, whereas internal redundancy removal requires a calibrated trade-off to maintain reasoning fidelity. Our framework enables fine-grained, implicit control over reasoning length, paving the way for more concise and interpretable LRMs.

preprint2026arXiv

Thinking-Based Non-Thinking: Solving the Reward Hacking Problem in Training Hybrid Reasoning Models via Reinforcement Learning

Large reasoning models (LRMs) have attracted much attention due to their exceptional performance. However, their performance mainly stems from thinking, a long Chain of Thought (CoT), which significantly increase computational overhead. To address this overthinking problem, existing work focuses on using reinforcement learning (RL) to train hybrid reasoning models that automatically decide whether to engage in thinking or not based on the complexity of the query. Unfortunately, using RL will suffer the the reward hacking problem, e.g., the model engages in thinking but is judged as not doing so, resulting in incorrect rewards. To mitigate this problem, existing works either employ supervised fine-tuning (SFT), which incurs high computational costs, or enforce uniform token limits on non-thinking responses, which yields limited mitigation of the problem. In this paper, we propose Thinking-Based Non-Thinking (TNT). It does not employ SFT, and sets different maximum token usage for responses not using thinking across various queries by leveraging information from the solution component of the responses using thinking. Experiments on five mathematical benchmarks demonstrate that TNT reduces token usage by around 50% compared to DeepSeek-R1-Distill-Qwen-1.5B/7B and DeepScaleR-1.5B, while significantly improving accuracy. In fact, TNT achieves the optimal trade-off between accuracy and efficiency among all tested methods. Additionally, the probability of reward hacking problem in TNT's responses, which are classified as not using thinking, remains below 10% across all tested datasets.

preprint2026arXiv

Training-Free Video Editing via Optical Flow-Enhanced Score Distillation

The rapid advancement in visual generation, particularly the emergence of pre-trained text-to-image and text-to-video models, has catalyzed growing interest in training-free video editing research. Mirroring training-free image editing techniques, current approaches preserve original video information through video input inversion and manipulating intermediate features and attention during the inference process to achieve content editing. Although they have demonstrated promising results, the lossy nature of the inversion process poses significant challenges in maintaining unedited regions of the video. Furthermore, feature and attention manipulation during inference can lead to unintended over-editing and face challenges in both local temporal continuity and global content consistency. To address these challenges, this study proposes a score distillation paradigm based on pre-trained text-to-video models, where the original video is iteratively optimized through multiple steps guided by editing gradients provided by score distillation to ultimately obtain the target video. The iterative optimization starting from the original video, combined with content preservation loss, ensures the maintenance of unedited regions in the original video and suppresses over-editing. To further guarantee video content consistency and temporal continuity, we additionally introduce a global consistency auxiliary loss and optical flow prediction-based local editing gradient smoothing. Experiments demonstrate that these strategies effectively address the aforementioned challenges, achieving comparable or superior performance across multiple dimensions including preservation of unedited regions, local temporal continuity, and global content consistency of editing results, compared to state-of-the-art methods.

preprint2022arXiv

LibFewShot: A Comprehensive Library for Few-shot Learning

Few-shot learning, especially few-shot image classification, has received increasing attention and witnessed significant advances in recent years. Some recent studies implicitly show that many generic techniques or ``tricks'', such as data augmentation, pre-training, knowledge distillation, and self-supervision, may greatly boost the performance of a few-shot learning method. Moreover, different works may employ different software platforms, backbone architectures and input image sizes, making fair comparisons difficult and practitioners struggle with reproducibility. To address these situations, we propose a comprehensive library for few-shot learning (LibFewShot) by re-implementing eighteen state-of-the-art few-shot learning methods in a unified framework with the same single codebase in PyTorch. Furthermore, based on LibFewShot, we provide comprehensive evaluations on multiple benchmarks with various backbone architectures to evaluate common pitfalls and effects of different training tricks. In addition, with respect to the recent doubts on the necessity of meta- or episodic-training mechanism, our evaluation results confirm that such a mechanism is still necessary especially when combined with pre-training. We hope our work can not only lower the barriers for beginners to enter the area of few-shot learning but also elucidate the effects of nontrivial tricks to facilitate intrinsic research on few-shot learning. The source code is available from https://github.com/RL-VIG/LibFewShot.

preprint2022arXiv

Playing Lottery Tickets in Style Transfer Models

Style transfer has achieved great success and attracted a wide range of attention from both academic and industrial communities due to its flexible application scenarios. However, the dependence on a pretty large VGG-based autoencoder leads to existing style transfer models having high parameter complexities, which limits their applications on resource-constrained devices. Compared with many other tasks, the compression of style transfer models has been less explored. Recently, the lottery ticket hypothesis (LTH) has shown great potential in finding extremely sparse matching subnetworks which can achieve on par or even better performance than the original full networks when trained in isolation. In this work, we for the first time perform an empirical study to verify whether such trainable matching subnetworks also exist in style transfer models. Specifically, we take two most popular style transfer models, i.e., AdaIN and SANet, as the main testbeds, which represent global and local transformation based style transfer methods respectively. We carry out extensive experiments and comprehensive analysis, and draw the following conclusions. (1) Compared with fixing the VGG encoder, style transfer models can benefit more from training the whole network together. (2) Using iterative magnitude pruning, we find the matching subnetworks at 89.2% sparsity in AdaIN and 73.7% sparsity in SANet, which demonstrates that style transfer models can play lottery tickets too. (3) The feature transformation module should also be pruned to obtain a much sparser model without affecting the existence and quality of the matching subnetworks. (4) Besides AdaIN and SANet, other models such as LST, MANet, AdaAttN and MCCNet can also play lottery tickets, which shows that LTH can be generalized to various style transfer models.

preprint2021arXiv

MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT Prostate Segmentation via Online Sampling

Fully convolutional networks (FCNs), including UNet and VNet, are widely-used network architectures for semantic segmentation in recent studies. However, conventional FCN is typically trained by the cross-entropy or Dice loss, which only calculates the error between predictions and ground-truth labels for pixels individually. This often results in non-smooth neighborhoods in the predicted segmentation. To address this problem, we propose a two-stage framework, with the first stage to quickly localize the prostate region and the second stage to precisely segment the prostate by a multi-task UNet architecture. We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network. Therefore, the proposed network has a dual-branch architecture that tackles two tasks: 1) a segmentation sub-network aiming to generate the prostate segmentation, and 2) a voxel-metric learning sub-network aiming to improve the quality of the learned feature space supervised by a metric loss. Specifically, the voxel-metric learning sub-network samples tuples (including triplets and pairs) in voxel-level through the intermediate feature maps. Unlike conventional deep metric learning methods that generate triplets or pairs in image-level before the training phase, our proposed voxel-wise tuples are sampled in an online manner and operated in an end-to-end fashion via multi-task learning. To evaluate the proposed method, we implement extensive experiments on a real CT image dataset consisting of 339 patients. The ablation studies show that our method can effectively learn more representative voxel-level features compared with the conventional learning methods with cross-entropy or Dice loss. And the comparisons show that the proposed method outperforms the state-of-the-art methods by a reasonable margin.

preprint2020arXiv

Asymmetric Distribution Measure for Few-shot Learning

The core idea of metric-based few-shot image classification is to directly measure the relations between query images and support classes to learn transferable feature embeddings. Previous work mainly focuses on image-level feature representations, which actually cannot effectively estimate a class's distribution due to the scarcity of samples. Some recent work shows that local descriptor based representations can achieve richer representations than image-level based representations. However, such works are still based on a less effective instance-level metric, especially a symmetric metric, to measure the relations between query images and support classes. Given the natural asymmetric relation between a query image and a support class, we argue that an asymmetric measure is more suitable for metric-based few-shot learning. To that end, we propose a novel Asymmetric Distribution Measure (ADM) network for few-shot learning by calculating a joint local and global asymmetric measure between two multivariate local distributions of queries and classes. Moreover, a task-aware Contrastive Measure Strategy (CMS) is proposed to further enhance the measure function. On popular miniImageNet and tieredImageNet, we achieve $3.02\%$ and $1.56\%$ gains over the state-of-the-art method on the $5$-way $1$-shot task, respectively, validating our innovative design of asymmetric distribution measures for few-shot learning.

preprint2020arXiv

GreyReID: A Two-stream Deep Framework with RGB-grey Information for Person Re-identification

In this paper, we observe that most false positive images (i.e., different identities with query images) in the top ranking list usually have the similar color information with the query image in person re-identification (Re-ID). Meanwhile, when we use the greyscale images generated from RGB images to conduct the person Re-ID task, some hard query images can obtain better performance compared with using RGB images. Therefore, RGB and greyscale images seem to be complementary to each other for person Re-ID. In this paper, we aim to utilize both RGB and greyscale images to improve the person Re-ID performance. To this end, we propose a novel two-stream deep neural network with RGB-grey information, which can effectively fuse RGB and greyscale feature representations to enhance the generalization ability of Re-ID. Firstly, we convert RGB images to greyscale images in each training batch. Based on these RGB and greyscale images, we train the RGB and greyscale branches, respectively. Secondly, to build up connections between RGB and greyscale branches, we merge the RGB and greyscale branches into a new joint branch. Finally, we concatenate the features of all three branches as the final feature representation for Re-ID. Moreover, in the training process, we adopt the joint learning scheme to simultaneously train each branch by the independent loss function, which can enhance the generalization ability of each branch. Besides, a global loss function is utilized to further fine-tune the final concatenated feature. The extensive experiments on multiple benchmark datasets fully show that the proposed method can outperform the state-of-the-art person Re-ID methods. Furthermore, using greyscale images can indeed improve the person Re-ID performance.

preprint2020arXiv

Progressive Cross-camera Soft-label Learning for Semi-supervised Person Re-identification

In this paper, we focus on the semi-supervised person re-identification (Re-ID) case, which only has the intra-camera (within-camera) labels but not inter-camera (cross-camera) labels. In real-world applications, these intra-camera labels can be readily captured by tracking algorithms or few manual annotations, when compared with cross-camera labels. In this case, it is very difficult to explore the relationships between cross-camera persons in the training stage due to the lack of cross-camera label information. To deal with this issue, we propose a novel Progressive Cross-camera Soft-label Learning (PCSL) framework for the semi-supervised person Re-ID task, which can generate cross-camera soft-labels and utilize them to optimize the network. Concretely, we calculate an affinity matrix based on person-level features and adapt them to produce the similarities between cross-camera persons (i.e., cross-camera soft-labels). To exploit these soft-labels to train the network, we investigate the weighted cross-entropy loss and the weighted triplet loss from the classification and discrimination perspectives, respectively. Particularly, the proposed framework alternately generates progressive cross-camera soft-labels and gradually improves feature representations in the whole learning course. Extensive experiments on five large-scale benchmark datasets show that PCSL significantly outperforms the state-of-the-art unsupervised methods that employ labeled source domains or the images generated by the GAN-based models. Furthermore, the proposed method even has a competitive performance with respect to deep supervised Re-ID methods.

preprint2020arXiv

Unsupervised Domain Attention Adaptation Network for Caricature Attribute Recognition

Caricature attributes provide distinctive facial features to help research in Psychology and Neuroscience. However, unlike the facial photo attribute datasets that have a quantity of annotated images, the annotations of caricature attributes are rare. To facility the research in attribute learning of caricatures, we propose a caricature attribute dataset, namely WebCariA. Moreover, to utilize models that trained by face attributes, we propose a novel unsupervised domain adaptation framework for cross-modality (i.e., photos to caricatures) attribute recognition, with an integrated inter- and intra-domain consistency learning scheme. Specifically, the inter-domain consistency learning scheme consisting an image-to-image translator to first fill the domain gap between photos and caricatures by generating intermediate image samples, and a label consistency learning module to align their semantic information. The intra-domain consistency learning scheme integrates the common feature consistency learning module with a novel attribute-aware attention-consistency learning module for a more efficient alignment. We did an extensive ablation study to show the effectiveness of the proposed method. And the proposed method also outperforms the state-of-the-art methods by a margin. The implementation of the proposed method is available at https://github.com/KeleiHe/DAAN.