Source author record

Chen Huang

Chen Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning Computation and Language eess.SP eess.AS astro-ph.HE cond-mat.stat-mech econ.EM Information Retrieval math.AP Methodology Sound

Catalog footprint

What is connected

17works

12topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Awakening LLMs' Reasoning Potential: A Fine-Grained Pipeline to Evaluate and Mitigate Vague Perception

Large language models (LLMs) are increasingly trained to abstain on difficult questions by answering unknown. However, we observe that LLMs often misuse this option: they output unknown even when LLMs can actually solve the questions, or they fail to understand why questions are truly unsolvable. We formalize this mismatch between potential ability and the inclination of abstention as the Vague Perception phenomenon. We introduce the WakenLLM pipeline that (1) extracts Vague Perception samples and (2) measures how many of them can be converted to correct answers under stimulation. Based on stage-wise metrics (TCR, OCR, etc.) and the upper-bound accuracy Acc(WakenLLM), we quantify LLMs' reasoning potential beyond one-shot accuracy. Experiments on six LLMs suggest that, without further training or parameter revisions, LLMs can achieve up to a 68.53% increase in accuracy on Vague Perception samples through our designed pipeline. We further analyze how Vague Perception, Conformity and Degradation vary from model families and parameter sizes, and offer model selection strategies in multi-stage reasoning workflows. Finally, by comparing WakenLLM against mainstream reasoning baselines, both training and non-training ones, we show that existing baselines only activate a small portion of LLMs' reasoning potential, pointing to perception-aware reasoning as a promising direction for future LLM designing. Code and datasets are available at https://github.com/WakenLLMTeam/WakenLLM-toolkit.

preprint2026arXiv

Quantifying LLM Biases Across Instruction Boundary in Mixed Question Forms

Large Language Models (LLMs) annotated datasets are widely used nowadays, however, large-scale annotations often show biases in low-quality datasets. For example, Multiple-Choice Questions (MCQs) datasets with one single correct option is common, however, there may be questions attributed to none or multiple correct options; whereas true-or-false questions are supposed to be labeled with either True or False, but similarly the text can include unsolvable elements, which should be further labeled as Unknown. There are problems when low-quality datasets with mixed question forms can not be identified. We refer to these exceptional label forms as Sparse Labels, and LLMs' ability to distinguish datasets with Sparse Labels mixture is important. Since users may not know situations of datasets, their instructions can be biased. To study how different instruction settings affect LLMs' identifications of Sparse Labels mixture, we introduce the concept of Instruction Boundary, which systematically evaluates different instruction settings that lead to biases. We propose BiasDetector, a diagnostic benchmark to systematically evaluate LLMs on datasets with mixed question forms under Instruction Boundary settings. Experiments show that users' instructions induce large biases on our benchmark, highlighting the need not only for LLM developers to recognize risks of LLM biased annotation resulting in Sparse Labels mixture, but also problems arising from users' instructions to identify them. Code, datasets and detailed implementations are available at https://github.com/ZpLing/Instruction-Boundary.

preprint2026arXiv

Text-Conditional JEPA for Learning Semantically Rich Visual Representations

Image-based Joint-Embedding Predictive Architecture (I-JEPA) offers a promising approach to visual self-supervised learning through masked feature prediction. However with the inherent visual uncertainty at masked positions, feature prediction remains challenging and may fail to learn semantic representations. In this work, we propose Text-Conditional JEPA (TC-JEPA) that uses image captions to reduce the prediction uncertainty. Specifically, we modulate the predicted patch features using a fine-grained text conditioner that computes sparse cross-attention over input text tokens. With such conditioning, patch features become predictable as a function of text, thus are more semantically meaningful. We show TC-JEPA improves downstream performance and training stability, with promising scaling properties. TC-JEPA also offers a new vision-language pretraining paradigm based on feature prediction only, outperforming contrastive methods on diverse tasks, especially those requiring fine-grained visual understanding and reasoning.

preprint2026arXiv

Unlocking Biological Workflows for Robust Protein-Text Question Answering: A Dual-Dimensional RAG Framework

Protein-Text Question Answering (QA) is crucial for interpreting biological sequences through natural language. The integration of Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) that efficiently leverages biological databases and facilitates reasoning offers a potent approach for it. However, constrained by the standard RAG pipeline, these models often rely on curated, static datasets instead of expert-proven biological workflows, lacking the fine-grained information processing and struggling to generalize to novel (OOD) proteins. To bridge this gap, we propose 2D-ProteinRAG, a novel framework that empowers LLMs to operate within the gold-standard biological research workflow (BLAST). To further extract high-quality information from noisy retrieval contexts, we introduce a dual-dimensional (2D) filtering strategy following the expert analytical paradigms. Horizontal Fine-grained Attribute Alignment utilizes a lightweight, intent-aware discriminative filter to prune irrelevant metadata and align database entries with specific user queries. Vertical Homology-based Semantic Denoising resolves functional contradictions and redundancy across multiple homologs via hierarchical clustering. Extensive evaluations on both In-Distribution and diverse biological OOD benchmarks demonstrate that 2D-ProteinRAG consistently achieves state-of-the-art performance, outperforming fine-tuned baselines and other RAG methods. Our results validate the framework's robustness and scalability, providing a practical solution for interpreting protein functions in real-world scientific scenarios.

preprint2025arXiv

Ultrahigh-Energy Gamma-ray Emission Associated with Black Hole-Jet Systems

Black holes (BH), one of the most intriguing objects in the universe, can manifest themselves through electromagnetic radiation initiated by the accretion flow. Some stellar-mass BHs drive relativistic jets when accreting matter from their companion stars, forming microquasars. Non-thermal emission from the radio to tera-electronvolt (TeV) gamma-ray band has been observed from microquasars, indicating the acceleration of relativistic particles. Here we report detection of four microquasars (SS 433, V4641 Sgr, GRS 1915+105, MAXI J1820+070) of spectrum extending to the ultrahigh-energy (UHE; photon energy $E>100$ TeV) band and one microquasar (Cygnus X-1) of spectrum approaching 100 TeV, using the Large High Altitude Air Shower Observatory (LHAASO). Notably, the total emission associated with SS 433 cannot be interpreted with a single leptonic component. In the UHE band, its emission is in spatial coincidence with a giant atomic cloud, which is consistent with a hadronic origin. An elongated source is discovered from V4641 Sgr with the spectrum continuing up to 800 TeV. The detection of UHE gamma rays demonstrates that accreting BHs and their environments can operate as extremely efficient accelerators of particles out of 1 peta-electronvolt (PeV), suggesting microquasars to be important contributors to Galactic cosmic rays especially around the `knee' region.

preprint2022arXiv

A Weighted Random Forest Based PositioningAlgorithm for 6G Indoor Communications

Due to the indoor none-line-of-sight (NLoS) propagation and multi-access interference (MAI), it is a great challenge to achieve centimeter-level positioning accuracy in indoor scenarios. However, the sixth generation (6G) wireless communications provide a good opportunity for the centimeter-level positioning. In 6G, the millimeter wave (mmWave) and terahertz (THz) communications have ultra-broad bandwidth so that the channel state information (CSI) will have a high resolution. In this paper, a weighted random forest (WRF) based indoor positioning algorithm using CSI based channel fingerprint feature is proposed to achieve high-precision positioning for 6G indoor communications. In addition, ray-tracing (RT) is used to improve the efficiency of establishing channel fingerprint database. The simulation results demonstrate the accuracy and robustness of the proposed algorithm. It is shown that the positioning accuracy of the algorithm is stable within 6 cm in different indoor scenarios with the channel fingerprint database established at 0.2 m intervals.

preprint2022arXiv

An Empirical Study of Language Model Integration for Transducer based Speech Recognition

Utilizing text-only data with an external language model (ELM) in end-to-end RNN-Transducer (RNN-T) for speech recognition is challenging. Recently, a class of methods such as density ratio (DR) and internal language model estimation (ILME) have been developed, outperforming the classic shallow fusion (SF) method. The basic idea behind these methods is that RNN-T posterior should first subtract the implicitly learned internal language model (ILM) prior, in order to integrate the ELM. While recent studies suggest that RNN-T only learns some low-order language model information, the DR method uses a well-trained neural language model with full context, which may be inappropriate for the estimation of ILM and deteriorate the integration performance. Based on the DR method, we propose a low-order density ratio method (LODR) by replacing the estimation with a low-order weak language model. Extensive empirical experiments are conducted on both in-domain and cross-domain scenarios on English LibriSpeech & Tedlium-2 and Chinese WenetSpeech & AISHELL-1 datasets. It is shown that LODR consistently outperforms SF in all tasks, while performing generally close to ILME and better than DR in most tests.

preprint2022arXiv

Efficient Representation Learning via Adaptive Context Pooling

Self-attention mechanisms model long-range context by using pairwise attention between all input tokens. In doing so, they assume a fixed attention granularity defined by the individual tokens (e.g., text characters or image pixels), which may not be optimal for modeling complex dependencies at higher levels. In this paper, we propose ContextPool to address this problem by adapting the attention granularity for each token. Inspired by the success of ConvNets that are combined with pooling to capture long-range dependencies, we learn to pool neighboring features for each token before computing attention in a given attention layer. The pooling weights and support size are adaptively determined, allowing the pooled features to encode meaningful context with varying scale. We show that ContextPool makes attention models more expressive, achieving strong performance often with fewer layers and thus significantly reduced cost. Experiments validate that our ContextPool module, when plugged into transformer models, matches or surpasses state-of-the-art performance using less compute on several language and image benchmarks, outperforms recent works with learned context sizes or sparse attention patterns, and is also applicable to ConvNets for efficient feature learning.

preprint2022arXiv

Evading Thermodynamic Uncertainty Relations via Asymmetric Dynamic Protocols

Many versions of Thermodynamic Uncertainty Relations (TUR) have recently been discovered, which impose lower bounds on relative fluctuations of integrated currents in irreversible dissipative processes, and suggest that there may be fundamental limitations on the precision of small scale machines and heat engines. In this work we rigorously demonstrate that TUR can be evaded by using dynamic protocols that are asymmetric under time-reversal. We illustrate our results using a model heat engine using two-level systems, and also discuss heuristically the fundamental connections between TUR and time-reversal symmetry.

preprint2022arXiv

Position Prediction as an Effective Pretraining Strategy

Transformers have gained increasing popularity in a wide range of applications, including Natural Language Processing (NLP), Computer Vision and Speech Recognition, because of their powerful representational capacity. However, harnessing this representational capacity effectively requires a large amount of data, strong regularization, or both, to mitigate overfitting. Recently, the power of the Transformer has been unlocked by self-supervised pretraining strategies based on masked autoencoders which rely on reconstructing masked inputs, directly, or contrastively from unmasked content. This pretraining strategy which has been used in BERT models in NLP, Wav2Vec models in Speech and, recently, in MAE models in Vision, forces the model to learn about relationships between the content in different parts of the input using autoencoding related objectives. In this paper, we propose a novel, but surprisingly simple alternative to content reconstruction~-- that of predicting locations from content, without providing positional information for it. Doing so requires the Transformer to understand the positional relationships between different parts of the input, from their content alone. This amounts to an efficient implementation where the pretext task is a classification problem among all possible positions for each input token. We experiment on both Vision and Speech benchmarks, where our approach brings improvements over strong supervised training baselines and is comparable to modern unsupervised/self-supervised pretraining methods. Our method also enables Transformers trained without position embeddings to outperform ones trained with full position information.

preprint2022arXiv

The existence and multiplicity of solutions for general quasi-linear elliptic equations with sub-cubic nonlinearity

We consider the existence and multiplicity of solutions for a class of quasi-linear Schrödinger equations which include the modified nonlinear Schrödinger equations. A new perturbation approach is used to treat the sub-cubic nonlinearity.

preprint2021arXiv

Artificial intelligence enabled radio propagation for communications-Part I: Channel characterization and antenna-channel optimization

To provide higher data rates, as well as better coverage, cost efficiency, security, adaptability, and scalability, the 5G and beyond 5G networks are developed with various artificial intelligence techniques. In this two-part paper, we investigate the application of artificial intelligence (AI) and in particular machine learning (ML) to the study of wireless propagation channels. It firstly provides a comprehensive overview of ML for channel characterization and ML-based antenna-channel optimization in this first part, and then it gives a state-of-the-art literature review of channel scenario identification and channel modeling in Part II. Fundamental results and key concepts of ML for communication networks are presented, and widely used ML methods for channel data processing, propagation channel estimation, and characterization are analyzed and compared. A discussion of challenges and future research directions for ML-enabled next generation networks of the topics covered in this part rounds off the paper.

preprint2021arXiv

Artificial intelligence enabled radio propagation for communications-Part II: Scenario identification and channel modeling

This two-part paper investigates the application of artificial intelligence (AI) and in particular machine learning (ML) to the study of wireless propagation channels. In Part I, we introduced AI and ML as well as provided a comprehensive survey on ML enabled channel characterization and antenna-channel optimization, and in this part (Part II) we review state-of-the-art literature on scenario identification and channel modeling here. In particular, the key ideas of ML for scenario identification and channel modeling/prediction are presented, and the widely used ML methods for propagation scenario identification and channel modeling and prediction are analyzed and compared. Based on the state-of-art, the future challenges of AI/ML-based channel data processing techniques are given as well.

preprint2020arXiv

LASSO-Driven Inference in Time and Space

We consider the estimation and inference in a system of high-dimensional regression equations allowing for temporal and cross-sectional dependency in covariates and error processes, covering rather general forms of weak temporal dependence. A sequence of regressions with many regressors using LASSO (Least Absolute Shrinkage and Selection Operator) is applied for variable selection purpose, and an overall penalty level is carefully chosen by a block multiplier bootstrap procedure to account for multiplicity of the equations and dependencies in the data. Correspondingly, oracle properties with a jointly selected tuning parameter are derived. We further provide high-quality de-biased simultaneous inference on the many target parameters of the system. We provide bootstrap consistency results of the test procedure, which are based on a general Bahadur representation for the $Z$-estimators with dependent data. Simulations demonstrate good performance of the proposed inference procedure. Finally, we apply the method to quantify spillover effects of textual sentiment indices in a financial market and to test the connectedness among sectors.

preprint2016arXiv

Discriminative Sparse Neighbor Approximation for Imbalanced Learning

Data imbalance is common in many vision tasks where one or more classes are rare. Without addressing this issue conventional methods tend to be biased toward the majority class with poor predictive accuracy for the minority class. These methods further deteriorate on small, imbalanced data that has a large degree of class overlap. In this study, we propose a novel discriminative sparse neighbor approximation (DSNA) method to ameliorate the effect of class-imbalance during prediction. Specifically, given a test sample, we first traverse it through a cost-sensitive decision forest to collect a good subset of training examples in its local neighborhood. Then we generate from this subset several class-discriminating but overlapping clusters and model each as an affine subspace. From these subspaces, the proposed DSNA iteratively seeks an optimal approximation of the test sample and outputs an unbiased prediction. We show that our method not only effectively mitigates the imbalance issue, but also allows the prediction to extrapolate to unseen data. The latter capability is crucial for achieving accurate prediction on small dataset with limited samples. The proposed imbalanced learning method can be applied to both classification and regression tasks at a wide range of imbalance levels. It significantly outperforms the state-of-the-art methods that do not possess an imbalance handling mechanism, and is found to perform comparably or even better than recent deep learning methods by using hand-crafted features only.

preprint2016arXiv

Local Similarity-Aware Deep Feature Embedding

Existing deep embedding methods in vision tasks are capable of learning a compact Euclidean space from images, where Euclidean distances correspond to a similarity metric. To make learning more effective and efficient, hard sample mining is usually employed, with samples identified through computing the Euclidean feature distance. However, the global Euclidean distance cannot faithfully characterize the true feature similarity in a complex visual feature space, where the intraclass distance in a high-density region may be larger than the interclass distance in low-density regions. In this paper, we introduce a Position-Dependent Deep Metric (PDDM) unit, which is capable of learning a similarity metric adaptive to local feature structure. The metric can be used to select genuinely hard samples in a local neighborhood to guide the deep embedding learning in an online and robust manner. The new layer is appealing in that it is pluggable to any convolutional networks and is trained end-to-end. Our local similarity-aware feature embedding not only demonstrates faster convergence and boosted performance on two complex image retrieval datasets, its large margin nature also leads to superior generalization results under the large and open set scenarios of transfer learning and zero-shot learning on ImageNet 2010 and ImageNet-10K datasets.

preprint2016arXiv

Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking

In this paper, we develop a new approach of spatially supervised recurrent convolutional neural networks for visual object tracking. Our recurrent convolutional network exploits the history of locations as well as the distinctive visual features learned by the deep neural networks. Inspired by recent bounding box regression methods for object detection, we study the regression capability of Long Short-Term Memory (LSTM) in the temporal domain, and propose to concatenate high-level visual features produced by convolutional networks with region information. In contrast to existing deep learning based trackers that use binary classification for region candidates, we use regression for direct prediction of the tracking locations both at the convolutional layer and at the recurrent unit. Our extensive experimental results and performance comparison with state-of-the-art tracking methods on challenging benchmark video tracking datasets shows that our tracker is more accurate and robust while maintaining low computational cost. For most test video sequences, our method achieves the best tracking performance, often outperforms the second best by a large margin.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint

Fields this researcher appears in

Computer Vision Machine Learning Computation and Language eess.SP eess.AS astro-ph.HE cond-mat.stat-mech econ.EM Information Retrieval math.AP Methodology Sound

Source provenance

Where this author record came from

arxivconfidence 95%

external id: arxiv:2410.08988:author:90:chen-huang

Imported May 21, 2026Synced May 21, 2026

arxivconfidence 95%

external id: arxiv:2605.03245:author:1:chen-huang

Imported May 20, 2026Synced May 21, 2026

arxivconfidence 95%

external id: arxiv:2605.17261:author:3:chen-huang

Imported May 20, 2026Synced May 20, 2026

3 works

Cheng-Xiang Wang

Researcher

Cheng-Xiang Wang contributes to research discovery and scholarly infrastructure.

Open to collaborate

2 works

Andreas F. Molisch

Researcher

Andreas F. Molisch contributes to research discovery and scholarly infrastructure.

Open to collaborate

2 works

Bo Ai

Researcher

Bo Ai contributes to research discovery and scholarly infrastructure.

Open to collaborate

2 works

Bo Liu

Researcher

Bo Liu contributes to research discovery and scholarly infrastructure.

Open to collaborate

Chen Huang

What is connected

Connect this record

See the researcher in context

Building this map preview

17 published item(s)

Awakening LLMs' Reasoning Potential: A Fine-Grained Pipeline to Evaluate and Mitigate Vague Perception

Quantifying LLM Biases Across Instruction Boundary in Mixed Question Forms

Text-Conditional JEPA for Learning Semantically Rich Visual Representations

Unlocking Biological Workflows for Robust Protein-Text Question Answering: A Dual-Dimensional RAG Framework

Ultrahigh-Energy Gamma-ray Emission Associated with Black Hole-Jet Systems

A Weighted Random Forest Based PositioningAlgorithm for 6G Indoor Communications

An Empirical Study of Language Model Integration for Transducer based Speech Recognition

Efficient Representation Learning via Adaptive Context Pooling

Evading Thermodynamic Uncertainty Relations via Asymmetric Dynamic Protocols

Position Prediction as an Effective Pretraining Strategy

The existence and multiplicity of solutions for general quasi-linear elliptic equations with sub-cubic nonlinearity

Artificial intelligence enabled radio propagation for communications-Part I: Channel characterization and antenna-channel optimization

Artificial intelligence enabled radio propagation for communications-Part II: Scenario identification and channel modeling

LASSO-Driven Inference in Time and Space

Discriminative Sparse Neighbor Approximation for Imbalanced Learning

Local Similarity-Aware Deep Feature Embedding

Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking