Source author record

Ming Jiang

Ming Jiang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Information Theory, math.IT, Artificial Intelligence, Computation and Language, Multimedia, eess.IV, eess.SP, Machine Learning, Applications, Cryptography and Security, Digital Libraries, Human-Computer Interaction, Neurons and Cognition, Robotics

Catalog footprint

What is connected

16 works

15 topics

4 close collaborators

preprint, 2026, arXiv

MLB: A Scenario-Driven Benchmark for Evaluating Large Language Models in Clinical Applications

The proliferation of Large Language Models (LLMs) presents transformative potential for healthcare, yet practical deployment is hindered by the absence of frameworks that assess real-world clinical utility. Existing benchmarks test static knowledge, failing to capture the dynamic, application-oriented capabilities required in clinical practice. To bridge this gap, we introduce the Medical LLM Benchmark (MLB), a comprehensive benchmark evaluating LLMs on both foundational knowledge and scenario-based reasoning. MLB is structured around five core dimensions: Medical Knowledge (MedKQA), Safety and Ethics (MedSE), Medical Record Understanding (MedRU), Smart Services (SmartServ), and Smart Healthcare (SmartCare). The benchmark integrates 22 datasets (17 newly curated) from diverse Chinese clinical sources, covering 64 clinical specialties. Its design features a rigorous curation pipeline involving 300 licensed physicians. In addition, we provide a scalable evaluation methodology centered on a specialized judge model trained via Supervised Fine-Tuning (SFT) on expert annotations. Our comprehensive evaluation of 10 leading models reveals a critical translational gap: while the top-ranked model, Kimi-K2-Instruct (77.3% accuracy overall), excels in structured tasks like information extraction (87.8% accuracy in MedRU), performance plummets in patient-facing scenarios (61.3% in SmartServ). Moreover, the exceptional safety score (90.6% in MedSE) of the much smaller Baichuan-M2-32B highlights that targeted training is equally critical. Our specialized judge model, trained via SFT on a 19k expert-annotated medical dataset, achieves 92.1% accuracy, an F1-score of 94.37%, and a Cohen's Kappa of 81.3% for human-AI consistency, validating a reproducible and expert-aligned evaluation protocol. MLB thus provides a rigorous framework to guide the development of clinically viable LLMs.
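The abstract reports Cohen's Kappa as its human-AI consistency measure. As a reminder of what that statistic computes, here is a minimal sketch using the standard definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohens_kappa` and the toy verdict lists are hypothetical illustrations, not data or code from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which the two raters agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy verdicts (1 = acceptable answer, 0 = not acceptable).
human = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
judge = [1, 1, 0, 1, 0, 1, 1, 1, 0, 0]
kappa = cohens_kappa(human, judge)
print(f"kappa = {kappa:.3f}")
```

In this toy case the raters agree on 8 of 10 items, but because chance agreement is already 0.52, kappa is noticeably lower than the raw agreement rate.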

preprint, 2022, arXiv

Artificial Intelligence Enables Real-Time and Intuitive Control of Prostheses via Nerve Interface

Objective: The next-generation prosthetic hand that moves and feels like a real hand requires a robust neural interconnection between human minds and machines. Methods: Here we present a neuroprosthetic system to demonstrate that principle by employing an artificial intelligence (AI) agent to translate the amputee's movement intent through a peripheral nerve interface. The AI agent is designed based on the recurrent neural network (RNN) and can simultaneously decode six degrees of freedom (DOF) from multichannel nerve data in real time. The decoder's performance is characterized in motor decoding experiments with three human amputees. Results: First, we show the AI agent enables amputees to intuitively control a prosthetic hand with individual finger and wrist movements up to 97-98% accuracy. Second, we demonstrate the AI agent's real-time performance by measuring the reaction time and information throughput in a hand gesture matching task. Third, we investigate the AI agent's long-term uses and show the decoder's robust predictive performance over a 16-month implant duration. Conclusion & significance: Our study demonstrates the potential of AI-enabled nerve technology, underlying the next generation of dexterous and intuitive prosthetic hands.

preprint, 2022, arXiv

Attention in Reasoning: Dataset, Analysis, and Modeling

While attention has been an increasingly popular component in deep neural networks to both interpret and boost the performance of models, little work has examined how attention progresses to accomplish a task and whether it is reasonable. In this work, we propose an Attention with Reasoning capability (AiR) framework that uses attention to understand and improve the process leading to task outcomes. We first define an evaluation metric based on a sequence of atomic reasoning operations, enabling a quantitative measurement of attention that considers the reasoning process. We then collect human eye-tracking and answer correctness data, and analyze various machine and human attention mechanisms on their reasoning capability and how they impact task performance. To improve the attention and reasoning ability of visual question answering models, we propose to supervise the learning of attention progressively along the reasoning process and to differentiate the correct and incorrect attention patterns. We demonstrate the effectiveness of the proposed framework in analyzing and modeling attention with better reasoning capability and task performance. The code and data are available at https://github.com/szzexpoi/AiR

preprint, 2022, arXiv

Cross-Modality Gated Attention Fusion for Multimodal Sentiment Analysis

Multimodal sentiment analysis (MSA) is an important research task that predicts a sentiment score from the different modalities of an opinion video. Many previous studies have shown the significance of utilizing the shared and unique information across different modalities. However, the high-order combined signals from multimodal data can also help extract satisfactory representations. In this paper, we propose CMGA, a Cross-Modality Gated Attention fusion model for MSA that aims to enable adequate interaction across different modality pairs. CMGA also adds a forget gate to filter the noisy and redundant signals introduced in the interaction procedure. We experiment on two MSA benchmark datasets, MOSI and MOSEI, illustrating the performance of CMGA over several baseline models. We also conduct an ablation study to demonstrate the function of different components inside CMGA.

preprint, 2020, arXiv

A CRC-aided Hybrid Decoding for Turbo Codes

Turbo codes and CRC codes are usually decoded separately, as the serially concatenated inner and outer codes respectively. In this letter, we propose a hybrid decoding algorithm for turbo-CRC codes, in which the outer CRC codes are used not for error detection but to assist error correction. Two independent decoding procedures, iterative decoding and reliability-based decoding, are carried out in a hybrid schedule, which can efficiently decode the two different codes as a single codeword. By introducing an efficient error-detection method based on normalized Euclidean distance, without a CRC check, a significant gain can be obtained with the hybrid decoding method without loss of error-detection ability.

preprint, 2020, arXiv

AiR: Attention with Reasoning Capability

While attention has been an increasingly popular component in deep neural networks to both interpret and boost the performance of models, little work has examined how attention progresses to accomplish a task and whether it is reasonable. In this work, we propose an Attention with Reasoning capability (AiR) framework that uses attention to understand and improve the process leading to task outcomes. We first define an evaluation metric based on a sequence of atomic reasoning operations, enabling quantitative measurement of attention that considers the reasoning process. We then collect human eye-tracking and answer correctness data, and analyze various machine and human attentions on their reasoning capability and how they impact task performance. Furthermore, we propose a supervision method to jointly and progressively optimize attention, reasoning, and task performance so that models learn to look at regions of interest by following a reasoning process. We demonstrate the effectiveness of the proposed framework in analyzing and modeling attention with better reasoning capability and task performance. The code and data are available at https://github.com/szzexpoi/AiR

preprint, 2020, arXiv

Improving Scholarly Knowledge Representation: Evaluating BERT-based Models for Scientific Relation Classification

With the rapid growth of research publications, there is a vast amount of scholarly knowledge that needs to be organized in digital libraries. To deal with this challenge, techniques relying on knowledge-graph structures are being advocated. Within such graph-based pipelines, inferring relation types between related scientific concepts is a crucial step. Recently, advanced techniques relying on language models pre-trained on large corpora have been widely explored for automatic relation classification. Despite the remarkable contributions that have been made, many of these methods were evaluated under different scenarios, which limits their comparability. To this end, we present a thorough empirical evaluation of eight BERT-based classification models, focusing on two key factors: 1) BERT model variants, and 2) classification strategies. Experiments on three corpora show that a domain-specific pre-training corpus helps the BERT-based classification model identify the type of scientific relations. Although the strategy of predicting a single relation each time generally achieves a higher classification accuracy than the strategy of identifying multiple relation types simultaneously, the latter strategy demonstrates more consistent performance across corpora with either a large or a small number of annotations. Our study aims to offer recommendations to the stakeholders of digital libraries for selecting the appropriate technique to build knowledge-graph-based systems for enhanced scholarly information organization.

preprint, 2020, arXiv

Joint Shortening and Puncturing Optimization for Structured LDPC Codes

The demand for flexible broadband wireless services makes the pruning technique, including both shortening and puncturing, an indispensable component of error correcting codes. The analysis of the pruning process for structured low-density parity-check (LDPC) codes can be considerably simplified with their equivalent representations through base-matrices or protographs. In this letter, we evaluate the thresholds of the pruned base-matrices by using protograph-based extrinsic information transfer (PEXIT). We also provide an efficient method to optimize the pruning patterns, which can significantly improve the thresholds of both the full-length patterns and the sub-patterns. Numerical results show that the structured LDPC codes pruned by the improved patterns outperform those with the existing patterns.

preprint, 2020, arXiv

Lossless Attention in Convolutional Networks for Facial Expression Recognition in the Wild

Unlike faces captured under constrained frontal conditions, faces in the wild are subject to various unconstrained interference factors, such as complex illumination, changing perspective and various occlusions. Facial expression recognition (FER) in the wild is a challenging task on which existing methods do not perform well. However, for occluded faces (containing occlusion caused by other objects and self-occlusion caused by head posture changes), the attention mechanism has the ability to focus on the non-occluded regions automatically. In this paper, we propose a Lossless Attention Model (LLAM) for convolutional neural networks (CNN) to extract attention-aware features from faces. Our module avoids information decay in the process of generating attention maps by using the information of the previous layer and not reducing the dimensionality. Subsequently, we adaptively refine the feature responses by fusing the attention map with the feature map. We participate in the seven basic expression classification sub-challenges of the FG-2020 Affective Behavior Analysis in-the-wild Challenge and validate our method on the Aff-Wild2 dataset released by the Challenge. The total accuracy (Accuracy) and the unweighted mean (F1) of our method on the validation set are 0.49 and 0.38 respectively, and the final result is 0.42 (0.67 F1-Score + 0.33 Accuracy).
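The weighting quoted at the end of the abstract can be checked with a few lines of arithmetic. The sketch below assumes the final result is the weighted sum of the two validation figures given in the abstract; the function name `challenge_score` is a hypothetical label, not an identifier from the challenge or the paper.

```python
def challenge_score(f1, accuracy, w_f1=0.67, w_acc=0.33):
    """Weighted sum of F1 and accuracy, using the weights quoted in the abstract."""
    return w_f1 * f1 + w_acc * accuracy


# Validation-set figures quoted in the abstract.
score = challenge_score(f1=0.38, accuracy=0.49)
print(round(score, 2))
```

Under that assumption the sum is 0.67 x 0.38 + 0.33 x 0.49 = 0.2546 + 0.1617 = 0.4163, which rounds to the reported 0.42.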

preprint, 2020, arXiv

Norm-in-Norm Loss with Faster Convergence and Better Performance for Image Quality Assessment

Currently, most image quality assessment (IQA) models are supervised by the MAE or MSE loss with empirically slow convergence. It is well-known that normalization can facilitate fast convergence. Therefore, we explore normalization in the design of loss functions for IQA. Specifically, we first normalize the predicted quality scores and the corresponding subjective quality scores. Then, the loss is defined based on the norm of the differences between these normalized values. The resulting "Norm-in-Norm" loss encourages the IQA model to make linear predictions with respect to subjective quality scores. After training, the least squares regression is applied to determine the linear mapping from the predicted quality to the subjective quality. It is shown that the new loss is closely connected with two common IQA performance criteria (PLCC and RMSE). Through theoretical analysis, it is proved that the embedded normalization makes the gradients of the loss function more stable and more predictable, which is conducive to the faster convergence of the IQA model. Furthermore, to experimentally verify the effectiveness of the proposed loss, it is applied to solve a challenging problem: quality assessment of in-the-wild images. Experiments on two relevant datasets (KonIQ-10k and CLIVE) show that, compared to MAE or MSE loss, the new loss enables the IQA model to converge about 10 times faster and the final model achieves better performance. The proposed model also achieves state-of-the-art prediction performance on this challenging problem. For reproducible scientific research, our code is publicly available at https://github.com/lidq92/LinearityIQA.
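The two steps described in the abstract (normalize both score vectors, then take a norm of their difference) can be illustrated with a small sketch. This is not the authors' implementation, which is in the linked repository. The specific choices here are assumptions made for illustration: mean-centering followed by scaling to unit L2 norm for the normalization, and an L1 norm for the difference. The names `normalize` and `norm_in_norm_loss` and the toy scores are hypothetical.

```python
import math


def normalize(scores):
    """Center scores at zero mean, then scale them to unit L2 norm (assumed choice)."""
    mean = sum(scores) / len(scores)
    centered = [s - mean for s in scores]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("scores must not all be equal")
    return [c / norm for c in centered]


def norm_in_norm_loss(predicted, subjective):
    """L1 norm of the difference between the two normalized score vectors (assumed choice)."""
    p_hat = normalize(predicted)
    s_hat = normalize(subjective)
    return sum(abs(p - s) for p, s in zip(p_hat, s_hat))


# Hypothetical subjective scores and two hypothetical sets of predictions.
mos = [1.0, 2.5, 3.0, 4.5, 5.0]
linear_pred = [0.2 * m + 0.1 for m in mos]   # increasing linear function of mos
noisy_pred = [0.3, 0.9, 0.4, 1.2, 0.8]       # not linearly related to mos

print(norm_in_norm_loss(linear_pred, mos))
print(norm_in_norm_loss(noisy_pred, mos))
```

Because both vectors are centered and rescaled before comparison, any prediction that is an increasing linear function of the subjective scores gives a loss of zero up to floating-point error. This matches the abstract's remark that the loss encourages linear predictions, with the actual linear mapping recovered afterwards by least squares regression.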

preprint, 2020, arXiv

Saliency Prediction with External Knowledge

The last decades have seen great progress in saliency prediction, with the success of deep neural networks that are able to encode high-level semantics. Yet, while humans have the innate capability to leverage their knowledge to decide where to look (e.g. people pay more attention to familiar faces such as celebrities), saliency prediction models have only been trained with large eye-tracking datasets. This work proposes to bridge this gap by explicitly incorporating external knowledge for saliency models as humans do. We develop networks that learn to highlight regions by incorporating prior knowledge of semantic relationships, be it general or domain-specific, depending on the task of interest. At the core of the method is a new Graph Semantic Saliency Network (GraSSNet) that constructs a graph that encodes semantic relationships learned from external knowledge. A Spatial Graph Attention Network is then developed to update saliency features based on the learned graph. Experiments show that the proposed model learns to predict saliency from the external knowledge and outperforms the state-of-the-art on four saliency benchmarks.

preprint, 2020, arXiv

Unified Quality Assessment of In-the-Wild Videos with Mixed Datasets Training

Video quality assessment (VQA) is an important problem in computer vision. The videos in computer vision applications are usually captured in the wild. We focus on automatically assessing the quality of in-the-wild videos, which is a challenging problem due to the absence of reference videos, the complexity of distortions, and the diversity of video contents. Moreover, the video contents and distortions among existing datasets are quite different, which leads to poor performance of data-driven methods in the cross-dataset evaluation setting. To improve the performance of quality assessment models, we borrow intuitions from human perception, specifically, the content dependency and temporal-memory effects of the human visual system. To face the cross-dataset evaluation challenge, we explore a mixed datasets training strategy for training a single VQA model with multiple datasets. The proposed unified framework explicitly includes three stages: relative quality assessor, nonlinear mapping, and dataset-specific perceptual scale alignment, to jointly predict relative quality, perceptual quality, and subjective quality. Experiments are conducted on four publicly available datasets for VQA in the wild, i.e., LIVE-VQC, LIVE-Qualcomm, KoNViD-1k, and CVD2014. The experimental results verify the effectiveness of the mixed datasets training strategy and prove the superior performance of the unified model in comparison with the state-of-the-art models. For reproducible research, we make the PyTorch implementation of our method available at https://github.com/lidq92/MDTVSFA.

preprint, 2018, arXiv

Quality Assessment for Tone-Mapped HDR Images Using Multi-Scale and Multi-Layer Information

Tone mapping operators and multi-exposure fusion methods allow us to enjoy the informative contents of high dynamic range (HDR) images with standard dynamic range devices, but also introduce distortions into HDR contents. Therefore methods are needed to evaluate tone-mapped image quality. Due to the complexity of possible distortions in a tone-mapped image, information from different scales and different levels should be considered when predicting tone-mapped image quality. So we propose a new no-reference method of tone-mapped image quality assessment based on multi-scale and multi-layer features that are extracted from a pre-trained deep convolutional neural network model. After being aggregated, the extracted features are mapped to quality predictions by regression. The proposed method is tested on the largest public database for TMIQA and compared to existing no-reference methods. The experimental results show that the proposed method achieves better performance.

preprint, 2012, arXiv

A Hierarchical Bayesian Approach for Aerosol Retrieval Using MISR Data

Atmospheric aerosols can cause serious damage to human health and life expectancy. Using the radiances observed by NASA's Multi-angle Imaging SpectroRadiometer (MISR), the current MISR operational algorithm retrieves Aerosol Optical Depth (AOD) at a spatial resolution of 17.6 km x 17.6 km. A systematic study of aerosols and their impact on public health, especially in highly-populated urban areas, requires a finer-resolution estimate of the spatial distribution of AOD values. We embed MISR's operational weighted least squares criterion and its forward simulations for AOD retrieval in a likelihood framework and further expand it into a Bayesian hierarchical model to adapt to a finer spatial scale of 4.4 km x 4.4 km. To take advantage of AOD's spatial smoothness, our method borrows strength from data at neighboring pixels by postulating a Gaussian Markov Random Field prior for AOD. Our model considers both AOD and aerosol mixing vectors as continuous variables. The inference of AOD and mixing vectors is carried out using Metropolis-within-Gibbs sampling methods. Retrieval uncertainties are quantified by posterior variabilities. We also implement a parallel MCMC algorithm to reduce computational cost. We assess our retrieval performance using ground-based measurements from the AErosol RObotic NETwork (AERONET), a hand-held sunphotometer and satellite images from Google Earth. Based on case studies in the greater Beijing area, China, we show that a 4.4 km resolution can improve the accuracy and coverage of remotely-sensed aerosol retrievals, as well as our understanding of the spatial and seasonal behaviors of aerosols. This improvement is particularly important during high-AOD events, which often indicate severe air pollution.

preprint, 2010, arXiv

Closed-Form Expressions for Relay Selection with Secrecy Constraints

An opportunistic relay selection scheme based on instantaneous knowledge of channels is considered to increase security against eavesdroppers. Closed-form expressions are derived for the average secrecy rates and the outage probability when the cooperative networks use the Decode-and-Forward (DF) or Amplify-and-Forward (AF) strategy. These techniques are demonstrated analytically and with simulation results.

preprint, 2010, arXiv

Closed-Form Expressions for Secrecy Capacity over Correlated Rayleigh Fading Channels

We investigate secure communication over correlated Rayleigh fading wiretap channels, assuming that full channel state information (CSI) is available. Based on the information theoretic formulation, we derive closed-form expressions for the average secrecy capacity and the outage probability. Simulation results confirm our analytical expressions.
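To make the two quantities in this abstract concrete, the following is a minimal Monte Carlo sketch, not the paper's closed-form result. It uses the standard instantaneous secrecy capacity of a Gaussian wiretap channel, max(0, log2(1 + SNR_main) - log2(1 + SNR_eve)). The correlation model (the eavesdropper's complex channel coefficient built as `rho` times the main coefficient plus independent noise), the function names, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def secrecy_capacity(snr_main, snr_eve):
    """Instantaneous secrecy capacity in bits/s/Hz (linear SNRs), clipped at zero."""
    return max(0.0, math.log2(1.0 + snr_main) - math.log2(1.0 + snr_eve))


def simulate(avg_snr_main, avg_snr_eve, rho, target_rate, trials=20000, seed=0):
    """Estimate average secrecy capacity and outage probability by simulation.

    rho in [0, 1] is the assumed correlation between the complex channel
    coefficients of the main and eavesdropper links.
    """
    rng = random.Random(seed)
    s = math.sqrt(0.5)  # per-component std so that E[|h|^2] = 1 (Rayleigh fading)
    total, outages = 0.0, 0
    for _ in range(trials):
        h_main = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
        w = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
        # Correlated eavesdropper channel, still with unit average power.
        h_eve = rho * h_main + math.sqrt(1.0 - rho * rho) * w
        c = secrecy_capacity(avg_snr_main * abs(h_main) ** 2,
                             avg_snr_eve * abs(h_eve) ** 2)
        total += c
        outages += c < target_rate  # outage: capacity below the target rate
    return total / trials, outages / trials


# Hypothetical operating point: 10 dB main link, 0 dB eavesdropper link.
avg_cs, p_out = simulate(avg_snr_main=10.0, avg_snr_eve=1.0, rho=0.5, target_rate=1.0)
print(f"average secrecy capacity ~ {avg_cs:.3f} bits/s/Hz, outage ~ {p_out:.3f}")
```

A simulation of this kind is what closed-form expressions are typically checked against: it is slow and noisy, whereas a closed form gives the same averages directly.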
