Source author record

Guangming Shi

Guangming Shi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Networking and Internet Architecture; Computer Vision; Information Theory; Artificial Intelligence; math.IT; Machine Learning; Multimedia; Distributed, Parallel, and Cluster Computing; eess.IV; eess.SP; Human-Computer Interaction; Multiagent Systems

Catalog footprint

What is connected

15 works

12 topics

4 close collaborators


preprint · 2026 · arXiv

FUN: A Focal U-Net Combining Reconstruction and Object Detection for Snapshot Spectral Imaging

Conventional push-broom hyperspectral imaging suffers from slow acquisition speeds, precluding real-time object detection; in contrast, snapshot spectral imaging enables instantaneous capture of hyperspectral images (HSIs), making real-time object detection feasible, yet its potential is often compromised by time-consuming post-capture reconstruction. To address this issue, we propose the Focal U-shaped Network (FUN), a novel end-to-end framework that jointly performs HSI reconstruction and object detection via multi-task learning. FUN employs a shared U-shaped backbone, where reconstruction provides underlying spectral information while detection guides the learning of semantic-aware priors, facilitating mutually beneficial task interaction. Crucially, we introduce focal modulation, an efficient alternative to self-attention that modulates spatial and spectral features while reducing quadratic computational complexity, enabling a self-attention-free architecture for joint reconstruction and detection. Furthermore, we contribute a new HSI object detection dataset with 8712 annotated objects across 363 HSIs to facilitate evaluation of the proposed method. Experiments demonstrate that FUN achieves state-of-the-art performance on both tasks, using 40% fewer parameters and 30% less computation than recent alternatives, making it promising for future real-time edge deployment. The code and datasets are available at https://github.com/ShawnDong98/FUN.

preprint · 2026 · arXiv

Generalization Bounds of Emergent Communications for Agentic AI Networking

The evolution of 6G networking toward agentic AI networking (AgentNet) systems requires a shift from traditional data pipelines to task-aware, agentic AI-native communication solutions. Emergent communication, a novel communication paradigm in which autonomous agents learn their own signaling protocols through interaction, is increasingly viewed as a promising solution to address the challenges posed by existing rigid, predefined protocol-based networking architecture. However, most existing emergent communication frameworks fail to account for physical networking constraints, such as bandwidth and computational complexity, and often lack a rigorous information-theoretical foundation. To address these challenges, this paper introduces a novel emergent communication framework that facilitates collaborative task-solving among heterogeneous agents through an information-theoretic lens. We propose a novel joint loss function that unifies the optimization of decision-making functions and the learning of communication signaling. Our proposed solution is grounded on the multi-agent and multi-task distributed information bottleneck (DIB) theory, which allows the quantification of the fundamental trade-off between task-relevant information representation and computational complexity. We further provide theoretical generalization bounds of the emergent communication protocol during decentralized inference across unseen environmental states. Experimental validation on a real-world hardware prototype confirms that our proposed framework significantly improves generalization performance, compared to the state-of-the-art solutions.

preprint · 2025 · arXiv

Distributed Information Bottleneck Theory for Multi-Modal Task-Aware Semantic Communication

Semantic communication shifts the focus from bit-level accuracy to task-relevant semantic delivery, enabling efficient and intelligent communication for next-generation networks. However, existing multi-modal solutions often process all available data modalities indiscriminately, ignoring that their contributions to downstream tasks are often unequal. This not only leads to severe resource inefficiency but also degrades task inference performance due to irrelevant or redundant information. To tackle this issue, we propose a novel task-aware distributed information bottleneck (TADIB) framework, which quantifies the contribution of any set of modalities to given tasks. Based on this theoretical framework, we design a practical coding scheme that intelligently selects and compresses only the most task-relevant modalities at the transmitter. To find the optimal selection and the codecs in the network, we adopt the probabilistic relaxation of discrete selection, enabling distributed encoders to make coordinated decisions with score function estimation and common randomness. Extensive experiments on public datasets demonstrate that our solution matches or surpasses the inference quality of full-modal baselines while significantly reducing communication and computational costs.
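For orientation, the classical single-encoder information bottleneck, on which distributed variants such as the one named in this abstract build, can be written as below. This is the textbook form and not the TADIB objective of the paper; the symbols X (source observation), Z (compressed representation), Y (task variable) and β (trade-off weight) are notation assumed here for illustration.

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y), \qquad \beta > 0
```

The first term penalizes how much of the source the representation retains, while the second rewards information that is relevant to the task; a larger β favors task relevance over compression.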

preprint · 2023 · arXiv

Imitation Learning-based Implicit Semantic-aware Communication Networks: Multi-layer Representation and Collaborative Reasoning

Semantic communication has recently attracted significant interest from both industry and academia due to its potential to transform the existing data-focused communication architecture towards a more generally intelligent and goal-oriented semantic-aware networking system. Despite its promising potential, semantic communications and semantic-aware networking are still in their infancy. Most existing works focus on transporting and delivering the explicit semantic information, e.g., labels or features of objects, that can be directly identified from the source signal. The original definition of semantics as well as recent results in cognitive neuroscience suggest that it is the implicit semantic information, in particular the hidden relations connecting different concepts and feature items, that plays the fundamental role in recognizing, communicating, and delivering the real semantic meanings of messages. Motivated by this observation, we propose a novel reasoning-based implicit semantic-aware communication network architecture that allows multiple tiers of CDC and edge servers to collaborate and support efficient semantic encoding, decoding, and interpretation for end-users. We introduce a new multi-layer representation of semantic information taking into consideration both the hierarchical structure of implicit semantics and the personalized inference preference of individual users. We model the semantic reasoning process as a reinforcement learning process and then propose an imitation-based semantic reasoning mechanism learning (iRML) solution for the edge servers to learn a reasoning policy that imitates the inference behavior of the source user. A federated GCN-based collaborative reasoning solution is proposed to allow multiple edge servers to jointly construct a shared semantic interpretation model based on decentralized knowledge datasets.

preprint · 2023 · arXiv

Towards Net-Zero Carbon Emissions in Network AI for 6G and Beyond

A global effort has been initiated to reduce worldwide greenhouse gas (GHG) emissions, primarily carbon emissions, by half by 2030 and reach net-zero by 2050. The development of 6G must also be compliant with this goal. Unfortunately, developing sustainable, net-zero emission systems to meet users' fast-growing demands on mobile services, especially smart services and applications, may be much more challenging than expected. In particular, despite energy efficiency improvements in both hardware and software designs, the overall energy consumption and carbon emissions of mobile networks are still increasing at a tremendous speed. The growing penetration of resource-demanding AI algorithms and solutions further exacerbates this challenge. In this article, we identify the major emission sources and introduce an evaluation framework for analyzing the lifecycle of network AI implementations. A novel joint dynamic energy trading and task allocation optimization framework, called DETA, has been introduced to reduce the overall carbon emissions. We consider a federated edge intelligence-based network AI system as a case study to verify the effectiveness of our proposed solution. Experimental results based on a hardware prototype suggest that our proposed solution can reduce carbon emissions of network AI systems by up to 74.9%. Finally, open problems and future directions are discussed.

preprint · 2022 · arXiv

Life-long Learning for Reasoning-based Semantic Communication

Semantic communication is an emerging paradigm that focuses on understanding and delivering semantics, or meaning of messages. Most existing semantic communication solutions define semantic meaning as the meaning of object labels recognized from a source signal, while ignoring intrinsic information that cannot be directly observed. Moreover, existing solutions often assume the recognizable semantic meanings are limited by a pre-defined label database. In this paper, we propose a novel reasoning-based semantic communication architecture in which the semantic meaning is represented by a graph-based knowledge structure in terms of object-entity, relationships, and reasoning rules. An embedding-based semantic interpretation framework is proposed to convert the high-dimensional graph-based representation of semantic meaning into a low-dimensional representation, which is efficient for channel transmission. We develop a novel inference function-based approach that can automatically infer hidden information such as missing entities and relations that cannot be directly observed from the message. Finally, we introduce a life-long model updating approach in which the receiver can learn from previously received messages and automatically update the reasoning rules of users when new unknown semantic entities and relations have been discovered. Extensive experiments are conducted based on a real-world knowledge database and numerical results show that our proposed solution achieves 76% interpretation accuracy of semantic meaning at the receiver, notably when some entities are missing in the transmitted message.

preprint · 2022 · arXiv

ModulE: Module Embedding for Knowledge Graphs

Knowledge graph embedding (KGE) has been shown to be a powerful tool for predicting missing links of a knowledge graph. However, existing methods mainly focus on modeling relation patterns, while simply embedding entities into vector spaces, such as the real field, the complex field, and quaternion space. To model the embedding space from a more rigorous and theoretical perspective, we propose a novel general group theory-based embedding framework for rotation-based models, in which both entities and relations are embedded as group elements. Furthermore, in order to explore more available KGE models, we utilize a more generic group structure, the module, a generalization of the notion of vector space. Specifically, under our framework, we introduce a more generic embedding method, ModulE, which projects entities to a module. Following the method of ModulE, we build three instantiating models: ModulE$_{\mathbb{R},\mathbb{C}}$, ModulE$_{\mathbb{R},\mathbb{H}}$ and ModulE$_{\mathbb{H},\mathbb{H}}$, by adopting different module structures. Experimental results show that ModulE$_{\mathbb{H},\mathbb{H}}$, which embeds entities into a module over a non-commutative ring, achieves state-of-the-art performance on multiple benchmark datasets.
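To make "rotation-based models" concrete, here is a minimal sketch of the simplest case the abstract alludes to: entities embedded in the complex field and each relation acting as a per-coordinate rotation, in the style of RotatE. It is not the ModulE method itself; the function name `rotation_score` and the choice of a negated L1 distance are assumptions made for illustration.

```python
import cmath
from typing import Sequence


def rotation_score(head: Sequence[complex],
                   phases: Sequence[float],
                   tail: Sequence[complex]) -> float:
    """Score a (head, relation, tail) triple; higher means more plausible.

    Each relation coordinate is a unit-modulus complex number exp(i * phase),
    so applying the relation rotates the head embedding coordinate-wise.
    The score is the negated distance between the rotated head and the tail.
    """
    if not (len(head) == len(phases) == len(tail)):
        raise ValueError("head, phases and tail must have the same length")
    distance = 0.0
    for h, theta, t in zip(head, phases, tail):
        rotated = h * cmath.exp(1j * theta)  # rotate h by angle theta
        distance += abs(rotated - t)         # modulus of the residual
    return -distance
```

A triple whose tail equals the rotated head scores close to zero, and any mismatch lowers the score, which is the property link prediction relies on when ranking candidate tails.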

preprint · 2022 · arXiv

Rate-Distortion Theory for Strategic Semantic Communication

This paper analyzes the fundamental limit of the strategic semantic communication problem in which a transmitter obtains a limited number of indirect observations of an intrinsic semantic information source and can then influence the receiver's decoding by sending a limited number of messages over an imperfect channel. The transmitter and the receiver can have different distortion measures and can make rational decisions about their encoding and decoding strategies, respectively. The decoder can also have some side information (e.g., background knowledge and/or information obtained from previous communications) about the semantic source to assist its interpretation of the semantic information. We focus particularly on the case in which the transmitter can commit to an encoding strategy and study the impact of strategic decision making on the rate distortion of semantic communication. Three equilibrium solutions, namely the strong Stackelberg equilibrium, the weak Stackelberg equilibrium, and the Nash equilibrium, have been studied and compared. The optimal encoding and decoding strategy profiles under various equilibrium solutions have been derived. We prove that committing to an encoding strategy cannot always bring benefit to the encoder. We therefore propose a feasible condition under which committing to an encoding strategy can always reduce the distortion performance of semantic communication.

preprint · 2022 · arXiv

Reasoning on the Air: An Implicit Semantic Communication Architecture

Semantic communication is a novel communication paradigm that draws inspiration from human communication, focusing on the delivery of the meaning of a message to the intended users. It has attracted significant interest recently due to its potential to improve efficiency and reliability of communication, enhance users' quality-of-experience (QoE), and achieve smoother cross-protocol/domain communication. Most existing works in semantic communication focus on identifying and transmitting explicit semantic meaning, e.g., labels of objects, that can be directly identified from the source signal. This paper investigates implicit semantic communication in which the hidden information, e.g., implicit causality and reasoning mechanisms of users, that cannot be directly observed from the source signal needs to be transported and delivered to the intended users. We propose a novel implicit semantic communication (iSC) architecture for representing, communicating, and interpreting the implicit semantic meaning. In particular, we first propose a graph-inspired structure to represent the implicit meaning of a message based on three key components: entity, relation, and reasoning mechanism. We then propose a generative adversarial imitation learning-based reasoning mechanism learning (GAML) solution for the destination user to learn and imitate the reasoning process of the source user. We prove that, by applying GAML, the destination user can accurately imitate the reasoning process of the users to generate reasoning paths that follow the same probability distribution as the expert paths. Numerical results suggest that our proposed architecture can achieve accurate implicit meaning interpretation at the destination user.

preprint · 2021 · arXiv

Optical Flow Estimation via Motion Feature Recovery

Optical flow estimation with occlusion or large displacement is a challenging problem due to the loss of corresponding pixels between consecutive frames. In this paper, we discover that the lost information is related to the fact that a large quantity of motion features (more than 40%) computed from the popular discriminative cost-volume feature completely vanish due to invalid sampling, leading to low efficiency of optical flow learning. We call this phenomenon the Vanishing Cost Volume Problem. Inspired by the fact that local motion tends to be highly consistent within a short temporal window, we propose a novel iterative Motion Feature Recovery (MFR) method to address the vanishing cost volume by modeling motion consistency across multiple frames. In each MFR iteration, invalid entries from the original motion features are first determined based on the current flow. Then, an efficient network is designed to adaptively learn the motion correlation to recover invalid features for lost-information restoration. The final optical flow is then decoded from the recovered motion features. Experimental results on Sintel and KITTI show that our method achieves state-of-the-art performance. In fact, MFR currently ranks second on the Sintel public website.

preprint · 2020 · arXiv

A Novel Transferability Attention Neural Network Model for EEG Emotion Recognition

Existing methods for electroencephalograph (EEG) emotion recognition always train their models on all the EEG samples indiscriminately. However, some of the source (training) samples may have a negative influence because they are significantly dissimilar to the target (test) samples. It is therefore necessary to give more attention to the EEG samples with strong transferability rather than forcefully training a classification model on all the samples. Furthermore, from the perspective of neuroscience, not all the brain regions of an EEG sample contain emotional information that can be transferred to the test data effectively. Data from some brain regions can even have a strong negative effect on learning the emotional classification model. Considering these two issues, in this paper, we propose a transferable attention neural network (TANN) for EEG emotion recognition, which learns the emotional discriminative information by adaptively highlighting the transferable EEG brain-region data and samples through local and global attention mechanisms. This can be implemented by measuring the outputs of multiple brain-region-level discriminators and one single sample-level discriminator. We conduct extensive experiments on three public EEG emotional datasets. The results validate that the proposed model achieves state-of-the-art performance.

preprint · 2020 · arXiv

MetaIQA: Deep Meta-learning for No-Reference Image Quality Assessment

Recently, increasing interest has been drawn to exploiting deep convolutional neural networks (DCNNs) for no-reference image quality assessment (NR-IQA). Despite the notable success achieved, there is a broad consensus that training DCNNs relies heavily on massive annotated data. Unfortunately, IQA is a typical small-sample problem. Therefore, most of the existing DCNN-based IQA metrics operate based on pre-trained networks. However, these pre-trained networks are not designed for the IQA task, leading to generalization problems when evaluating different types of distortions. With this motivation, this paper presents a no-reference IQA metric based on deep meta-learning. The underlying idea is to learn the meta-knowledge shared by humans when evaluating the quality of images with various distortions, which can then be adapted to unknown distortions easily. Specifically, we first collect a number of NR-IQA tasks for different distortions. Then meta-learning is adopted to learn the prior knowledge shared by diversified distortions. Finally, the quality prior model is fine-tuned on a target NR-IQA task to quickly obtain the quality model. Extensive experiments demonstrate that the proposed metric outperforms the state of the art by a large margin. Furthermore, the meta-model learned from synthetic distortions can also be easily generalized to authentic distortions, which is highly desired in real-world applications of IQA metrics.

preprint · 2020 · arXiv

Towards Ubiquitous AI in 6G with Federated Learning

With 5G cellular systems being actively deployed worldwide, the research community has started to explore novel technological advances for the subsequent generation, i.e., 6G. It is commonly believed that 6G will be built on a new vision of ubiquitous AI, a hyper-flexible architecture that brings human-like intelligence into every aspect of networking systems. Despite its great promise, there are several novel challenges expected to arise in ubiquitous AI-based 6G. Although numerous attempts have been made to apply AI to wireless networks, these attempts have not yet seen any large-scale implementation in practical systems. One of the key challenges is the difficulty of implementing distributed AI across a massive number of heterogeneous devices. Federated learning (FL) is an emerging distributed AI solution that enables data-driven AI solutions in heterogeneous and potentially massive-scale networks. Although it is still in an early stage of development, FL-inspired architecture has been recognized as one of the most promising solutions to fulfill ubiquitous AI in 6G. In this article, we identify the requirements that will drive convergence between 6G and AI. We propose an FL-based network architecture and discuss its potential for addressing some of the novel challenges expected in 6G. Future trends and key research problems for FL-enabled 6G are also discussed.
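As a rough illustration of the aggregation step at the heart of federated learning, the sketch below averages client parameter vectors weighted by local sample counts, in the spirit of the widely used FedAvg rule. It is not the architecture proposed in the article; the function name `federated_average` and the flat list-of-floats parameter format are assumptions made for illustration.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Combine client models into one global model.

    Each client contributes its parameter vector, weighted by the number
    of local training samples, so raw data never leaves the device.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one non-empty entry per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total  # sample-weighted contribution
    return averaged
```

For example, `federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])` gives the second client three times the influence of the first, one simple way to account for devices that hold unequal amounts of data.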

preprint · 2010 · arXiv

Image Deblurring and Super-resolution by Adaptive Sparse Domain Selection and Adaptive Regularization

As a powerful statistical image modeling technique, sparse representation has been successfully used in various image restoration applications. The success of sparse representation owes to the development of l1-norm optimization techniques, and the fact that natural images are intrinsically sparse in some domain. The image restoration quality largely depends on whether the employed sparse domain can represent the underlying image well. Considering that the contents can vary significantly across different images or different patches in a single image, we propose to learn various sets of bases from a pre-collected dataset of example image patches, and then, for a given patch to be processed, one set of bases is adaptively selected to characterize the local sparse domain. We further introduce two adaptive regularization terms into the sparse representation framework. First, a set of autoregressive (AR) models is learned from the dataset of example image patches. The best fitted AR models to a given patch are adaptively selected to regularize the image local structures. Second, the image non-local self-similarity is introduced as another regularization term. In addition, the sparsity regularization parameter is adaptively estimated for better image restoration performance. Extensive experiments on image deblurring and super-resolution validate that by using adaptive sparse domain selection and adaptive regularization, the proposed method achieves much better results than many state-of-the-art algorithms in terms of both PSNR and visual perception.
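The l1-norm optimization mentioned in this abstract commonly rests on one elementary operation, soft-thresholding, which is the proximal operator of the l1 norm. The sketch below shows that operator only; it is not the paper's restoration algorithm, and the function name `soft_threshold` is an assumption made for illustration.

```python
import math
from typing import List, Sequence


def soft_threshold(values: Sequence[float], threshold: float) -> List[float]:
    """Shrink each coefficient toward zero by `threshold`.

    Coefficients whose magnitude is at most `threshold` become zero,
    which is how an l1 penalty produces sparse solutions.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    return [math.copysign(max(abs(v) - threshold, 0.0), v) for v in values]
```

Iterative shrinkage schemes alternate a data-fidelity step with this shrinkage; the threshold plays the role of the sparsity regularization parameter.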

preprint · 2010 · arXiv

Morphological dilation image coding with context weights prediction

This paper proposes an adaptive morphological dilation image coding method with context weights prediction. The new dilation method does not use fixed models; instead, it decides whether a coefficient needs to be dilated according to the coefficient's predicted significance degree. It includes two key dilation technologies: 1) controlling the dilation process with context weights to reduce the output of insignificant coefficients, and 2) using variable-length group test coding with context weights to adjust the coding order and spend as few bits as possible to represent events with large probability. Moreover, we also propose a novel context weight strategy to predict a coefficient's significance degree more accurately, which serves both dilation technologies. Experimental results show that our proposed method outperforms the state-of-the-art image coding algorithms available today.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxiv · confidence 95%

external id: arxiv:2510.04000:author:6:guangming-shi

Imported May 21, 2026 · Synced May 21, 2026

arxiv · confidence 95%

external id: arxiv:2604.27653:author:7:guangming-shi

Imported May 20, 2026 · Synced May 20, 2026

arxiv · confidence 95%

external id: arxiv:2605.08613:author:3:guangming-shi

Imported May 20, 2026 · Synced May 20, 2026

8 works

Yong Xiao

Researcher

Yong Xiao contributes to research discovery and scholarly infrastructure.

5 works

Yingyu Li

Researcher

Yingyu Li contributes to research discovery and scholarly infrastructure.

2 works

Jingxuan Chai

Researcher

Jingxuan Chai contributes to research discovery and scholarly infrastructure.

2 works

Ping Zhang

Researcher

Ping Zhang contributes to research discovery and scholarly infrastructure.
