Researcher profile

Ming Zhao

Ming Zhao contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
19 works
0 followers
14 topics
4 close collaborators

Published work

19 published items

preprint | 2026 | arXiv

inRAN: Interpretable Online Bayesian Learning for Network Automation in Open Radio Access Networks

Emerging AI/ML techniques have been showing great potential in automating network control in open radio access networks (Open RAN). However, existing approaches heavily rely on black-box policies parameterized by deep neural networks, which inherently lack interpretability, explainability, and transparency, and create substantial obstacles in practical network deployment. In this paper, we propose inRAN, a novel interpretable online Bayesian learning framework for network automation in Open RAN. The core idea is to integrate interpretable surrogate models and safe optimization solvers to continually optimize control actions, while adapting to non-stationary dynamics in real-world networks. We achieve the inRAN framework with three key components: 1) an interpretable surrogate model via ensembling Kolmogorov-Arnold Networks (KANs); 2) safe optimization solvers via integrating genetic search and a trust-region descent method; 3) an online dynamics tracker via continual model learning and adaptive threshold offset. We implement inRAN in an end-to-end O-RAN-compliant network testbed, and conduct extensive over-the-air experiments with the focused use case of network slicing. The results show that inRAN substantially outperforms state-of-the-art works by guaranteeing the chance-based constraint with a 92.67% assurance ratio and comparable resource usage throughout the online network control, under unforeseeable time-evolving network dynamics.

preprint | 2026 | arXiv

oneTwin: Online Digital Network Twin via Neural Radio Radiance Field

The digital network twin is a promising technology that replicates real-world networks in real time and assists with the design, operation, and management of next-generation networks. However, existing approaches (e.g., simulator-based and neural-based) cannot effectively realize the digital network twin, in terms of fidelity, synchronicity, and tractability. In this paper, we propose oneTwin, the first online digital twin system, for the prediction of physical layer metrics. We architect the oneTwin system with two primary components: an enhanced simulator and a neural radio radiance field (NRRF). On the one hand, we achieve the enhanced simulator by designing a material tuning algorithm that incrementally optimizes the building materials to minimize the twin-to-real gap. On the other hand, we achieve the NRRF by designing a neural learning algorithm that continually updates its DNNs based on both online and simulated data from the enhanced simulator. We implement the oneTwin system using Sionna RT as the simulator and developing new DNNs as the NRRF, under a public cellular network. Extensive experimental results show that, compared to state-of-the-art solutions, oneTwin achieves real-time updating (0.98s), with 36.39% and 57.50% reductions in the twin-to-real gap under in-distribution and out-of-distribution test datasets, respectively.

preprint | 2022 | arXiv

Automatic Registration of Images with Inconsistent Content Through Line-Support Region Segmentation and Geometrical Outlier Removal

The implementation of automatic image registration is still difficult in various applications. In this paper, an automatic image registration approach through line-support region segmentation and geometrical outlier removal (ALRS-GOR) is proposed. This new approach is designed to address the problems associated with the registration of images with affine deformations and inconsistent content, such as remote sensing images with different spectral content or noise interference, or map images with inconsistent annotations. To begin with, line-support regions, namely straight regions whose points share roughly the same image gradient angle, are extracted to address the issues of inconsistent content existing in images. To alleviate the incompleteness of line segments, an iterative strategy with multi-resolution is employed to preserve global structures that are masked at full resolution by image details or noise. Then, Geometrical Outlier Removal (GOR) is developed to provide reliable feature point matching, which is based on affine-invariant geometrical classifications for corresponding matches initialized by SIFT. The candidate outliers are selected by comparing the disparity of accumulated classifications among all matches, instead of conventional methods which only rely on local geometrical relations. Various image sets have been considered in this paper for the evaluation of the proposed approach, including aerial images with simulated affine deformations, remote sensing optical and synthetic aperture radar images taken in different situations (multispectral, multisensor, and multitemporal), and map images with inconsistent annotations. Experimental results demonstrate the superior performance of the proposed method over the existing approaches for the whole data set.

preprint | 2022 | arXiv

EmoCaps: Emotion Capsule based Model for Conversational Emotion Recognition

Emotion recognition in conversation (ERC) aims to analyze the speaker's state and identify their emotion in the conversation. Recent works in ERC focus on context modeling but ignore the representation of contextual emotional tendency. In order to extract multi-modal information and the emotional tendency of the utterance effectively, we propose a new structure named Emoformer to extract multi-modal emotion vectors from different modalities and fuse them with the sentence vector to form an emotion capsule. Furthermore, we design an end-to-end ERC model called EmoCaps, which extracts emotion vectors through the Emoformer structure and obtains the emotion classification results from a context analysis model. In experiments on two benchmark datasets, our model shows better performance than the existing state-of-the-art models.

preprint | 2022 | arXiv

Enabling Deep Learning on Edge Devices through Filter Pruning and Knowledge Transfer

Deep learning models have introduced various intelligent applications to edge devices, such as image classification, speech recognition, and augmented reality. There is an increasing need of training such models on the devices in order to deliver personalized, responsive, and private learning. To address this need, this paper presents a new solution for deploying and training state-of-the-art models on the resource-constrained devices. First, the paper proposes a novel filter-pruning-based model compression method to create lightweight trainable models from large models trained in the cloud, without much loss of accuracy. Second, it proposes a novel knowledge transfer method to enable the on-device model to update incrementally in real time or near real time using incremental learning on new data and enable the on-device model to learn the unseen categories with the help of the in-cloud model in an unsupervised fashion. The results show that 1) our model compression method can remove up to 99.36% parameters of WRN-28-10, while preserving a Top-1 accuracy of over 90% on CIFAR-10; 2) our knowledge transfer method enables the compressed models to achieve more than 90% accuracy on CIFAR-10 and retain good accuracy on old categories; 3) it allows the compressed models to converge within real time (three to six minutes) on the edge for incremental learning tasks; 4) it enables the model to classify unseen categories of data (78.92% Top-1 accuracy) that it is never trained with.

preprint | 2022 | arXiv

High fill factor confocal compound eyes fabricated by direct laser writing for better imaging quality

We fabricate two kinds of 100% fill factor compound eye structures using direct laser writing, including conventional compound eyes (CVCEs) with the same focal length of each microlens unit, and specially designed confocal compound eyes (CFCEs). For CFCEs, the focal length of each microlens unit is determined by its position and is equal to the distance between the microlens unit and the image sensor. In this letter, the optical properties of CVCEs and CFCEs are tested and compared. It is found that compared with CVCEs, CFCEs can improve the focusing efficiency by about 7%, enlarge the imaging area by about 25%, and have better imaging quality at the edge of the field of view.

preprint | 2022 | arXiv

Iterative Activation-based Structured Pruning

Deploying complex deep learning models on edge devices is challenging because they have substantial compute and memory resource requirements, whereas edge devices' resource budget is limited. To solve this problem, extensive pruning techniques have been proposed for compressing networks. Recent advances based on the Lottery Ticket Hypothesis (LTH) show that iterative model pruning tends to produce smaller and more accurate models. However, LTH research focuses on unstructured pruning, which is hardware-inefficient and difficult to accelerate on hardware platforms. In this paper, we investigate iterative pruning in the context of structured pruning because structurally pruned models map well on commodity hardware. We find that directly applying a structured weight-based pruning technique iteratively, called iterative L1-norm based pruning (ILP), does not produce accurate pruned models. To solve this problem, we propose two activation-based pruning methods, Iterative Activation-based Pruning (IAP) and Adaptive Iterative Activation-based Pruning (AIAP). We observe that, with only 1% accuracy loss, IAP and AIAP achieve 7.75X and 15.88X compression on LeNet-5, and 1.25X and 1.71X compression on ResNet-50, whereas ILP achieves 4.77X and 1.13X, respectively.
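The weight-based baseline named in this abstract, iterative L1-norm based pruning, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the per-round pruning fraction, and the toy weight tensor are all assumptions made for illustration, and the fine-tuning step that a real pipeline would run between rounds is only marked by a comment.

```python
import numpy as np

def l1_filter_scores(conv_weight):
    """L1 norm of each output filter; conv_weight has shape (out, in, kh, kw)."""
    return np.abs(conv_weight).reshape(conv_weight.shape[0], -1).sum(axis=1)

def prune_filters(conv_weight, prune_fraction):
    """Drop the prune_fraction of filters with the smallest L1 norm."""
    scores = l1_filter_scores(conv_weight)
    n_prune = int(np.floor(prune_fraction * len(scores)))
    keep = np.sort(np.argsort(scores)[n_prune:])  # indices of surviving filters
    return conv_weight[keep], keep

def iterative_l1_prune(conv_weight, prune_fraction, rounds):
    """Repeat structured pruning for several rounds (hypothetical schedule)."""
    kept = np.arange(conv_weight.shape[0])
    w = conv_weight
    for _ in range(rounds):
        w, keep = prune_filters(w, prune_fraction)
        kept = kept[keep]
        # A real pipeline would fine-tune the remaining weights here.
    return w, kept

# Toy layer: filter i has all weights equal to i + 1, so its L1 norm grows with i.
toy = np.ones((8, 3, 3, 3)) * np.arange(1, 9)[:, None, None, None]
pruned, kept = iterative_l1_prune(toy, prune_fraction=0.25, rounds=2)
compression = toy.shape[0] / pruned.shape[0]
```

Because whole filters are removed, the pruned tensor stays dense, which is the property the abstract points to when it says structurally pruned models map well onto commodity hardware.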

preprint | 2022 | arXiv

LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network

3D room layout estimation from a single panorama using deep neural networks has made great progress. However, previous approaches cannot obtain efficient geometry awareness of room layout with only the latitude of boundaries or horizon-depth. We show that using horizon-depth along with room height can obtain omnidirectional-geometry awareness of room layout in both horizontal and vertical directions. In addition, we propose a planar-geometry aware loss function with normals and gradients of normals to supervise the planeness of walls and turning of corners. We propose an efficient network, LGT-Net, for room layout estimation, which contains a novel Transformer architecture called SWG-Transformer to model geometry relations. SWG-Transformer consists of (Shifted) Window Blocks and Global Blocks to combine the local and global geometry relations. Moreover, we design a novel relative position embedding of Transformer to enhance the spatial identification ability for the panorama. Experiments show that the proposed LGT-Net achieves better performance than current state-of-the-art (SOTA) methods on benchmark datasets.

preprint | 2022 | arXiv

MIC: Model-agnostic Integrated Cross-channel Recommenders

Semantically connecting users and items is a fundamental problem for the matching stage of an industrial recommender system. Recent advances in this topic are based on multi-channel retrieval to efficiently measure users' interest in items from the massive candidate pool. However, existing works are primarily built upon pre-defined retrieval channels, including User-CF (U2U), Item-CF (I2I), and Embedding-based Retrieval (U2I), and thus have access only to the limited correlation between users and items that follows from partial information of latent interactions. In this paper, we propose a model-agnostic integrated cross-channel (MIC) approach for large-scale recommendation, which maximally leverages the inherent multi-channel mutual information to enhance the matching performance. Specifically, MIC robustly models correlation within user-item, user-user, and item-item from latent interactions in a universal schema. For each channel, MIC naturally aligns pairs with semantic similarity and distinguishes them otherwise with more uniform anisotropic representation space. While state-of-the-art methods require specific architectural design, MIC intuitively considers them as a whole by enabling the complete information flow among users and items. Thus MIC can be easily plugged into other retrieval recommender systems. Extensive experiments show that our MIC helps several state-of-the-art models boost their performance on two real-world benchmarks. The satisfactory deployment of the proposed MIC on industrial online services empirically proves its scalability and flexibility.

preprint | 2022 | arXiv

OpenCalib: A Multi-sensor Calibration Toolbox for Autonomous Driving

Accurate sensor calibration is a prerequisite for multi-sensor perception and localization systems for autonomous vehicles. The intrinsic parameter calibration of the sensor is to obtain the mapping relationship inside the sensor, and the extrinsic parameter calibration is to transform two or more sensors into a unified spatial coordinate system. Most sensors need to be calibrated after installation to ensure the accuracy of sensor measurements. To this end, we present OpenCalib, a calibration toolbox that contains a rich set of various sensor calibration methods. OpenCalib covers manual calibration tools, automatic calibration tools, factory calibration tools, and online calibration tools for different application scenarios. At the same time, to evaluate the calibration accuracy and subsequently improve the accuracy of the calibration algorithm, we released a corresponding benchmark dataset. This paper introduces various features and calibration methods of this toolbox. To our knowledge, this is the first open-sourced calibration codebase containing the full set of autonomous-driving-related calibration approaches in this area. We wish that the toolbox could be helpful to autonomous driving researchers. We have open-sourced our code on GitHub to benefit the community. Code is available at https://github.com/PJLab-ADG/SensorsCalibration.

preprint | 2022 | arXiv

Phase-Only Holographic Assisted Planar Printing for Massively Multiplexed Optical Display and Encryption

Multiplexed planar printings, made of single or few layer micro and nano optical platforms, are essential for high capacity display, information storage and encryption. Although they have been developed rapidly, the demonstrated channels are still limited and also lack instantaneity. Here, holograms and printings, always regarded as two independent information coding domains with totally different principles, are combined through our proposed angle multiplexing framework, leading to multiplexed printings with hundreds of channels. Based on this approach, we experimentally encode 25 gray scale printings into 25 angles and even 8 gray scale videos into 8 angles with a phase-only spatial light modulator. As a bridge between printings and holograms, our method allows generating printings that combine various holographic methods. Benefiting from this, we demonstrate a gradient metasurface based 324 channel printing which multiplexes angles, polarizations and wavelengths simultaneously. Our work paves the way to flexible angle-dependent printing display and massively multiplexed encryption systems.

preprint | 2022 | arXiv

RFVTM: A Recovery and Filtering Vertex Trichotomy Matching for Remote Sensing Image Registration

Reliable feature point matching is a vital yet challenging process in feature-based image registration. In this paper, a robust feature point matching algorithm called Recovery and Filtering Vertex Trichotomy Matching (RFVTM) is proposed to remove outliers and retain sufficient inliers for remote sensing images. A novel affine invariant descriptor called vertex trichotomy descriptor is proposed on the basis that geometrical relations between any of the vertices and lines are preserved after affine transformations, and it is constructed by mapping each vertex into trichotomy sets. The outlier removals in Vertex Trichotomy Matching (VTM) are implemented by iteratively comparing the disparity of corresponding vertex trichotomy descriptors. Some inliers mistakenly validated by a large amount of outliers are removed in VTM iterations, and several residual outliers close to correct locations cannot be excluded with the same graph structures. Therefore, a recovery and filtering strategy is designed to recover some inliers based on identical vertex trichotomy descriptors and restricted transformation errors. Assisted with the additional recovered inliers, residual outliers can also be filtered out during the process of reaching identical graph for the expanded vertex sets. Experimental results demonstrate the superior performance on precision and stability of this algorithm under various conditions, such as remote sensing images with large transformations, duplicated patterns, or inconsistent spectral content.

preprint | 2022 | arXiv

RGB-X Classification for Electronics Sorting

Effectively disassembling and recovering materials from waste electrical and electronic equipment (WEEE) is a critical step in moving global supply chains from carbon-intensive, mined materials to recycled and renewable ones. Conventional recycling processes rely on shredding and sorting waste streams, but for WEEE, which is comprised of numerous dissimilar materials, we explore targeted disassembly of numerous objects for improved material recovery. Many WEEE objects share many key features and therefore can look quite similar, but their material composition and internal component layout can vary, and thus it is critical to have an accurate classifier for subsequent disassembly steps for accurate material separation and recovery. This work introduces RGB-X, a multi-modal image classification approach that combines key features from external RGB images with those generated from X-ray images to accurately classify electronic objects. More specifically, this work develops Iterative Class Activation Mapping (iCAM), a novel network architecture that explicitly focuses on the finer details in the multi-modal feature maps that are needed for accurate electronic object classification. Large and well-annotated X-ray datasets of electronic objects, which are needed to train a classifier, are lacking due to the expense and the need for expert guidance. To overcome this issue, we present a novel way of creating a synthetic dataset using domain randomization applied to the X-ray domain. The combined RGB-X approach gives us an accuracy of 98.6% on 10 generations of modern smartphones, which is greater than their individual accuracies of 89.1% (RGB) and 97.9% (X-ray) independently. We provide experimental results to corroborate our results.

preprint | 2021 | arXiv

On the Evaluation of Vision-and-Language Navigation Instructions

Vision-and-Language Navigation wayfinding agents can be enhanced by exploiting automatically generated navigation instructions. However, existing instruction generators have not been comprehensively evaluated, and the automatic evaluation metrics used to develop them have not been validated. Using human wayfinders, we show that these generators perform on par with or only slightly better than a template-based generator and far worse than human instructors. Furthermore, we discover that BLEU, ROUGE, METEOR and CIDEr are ineffective for evaluating grounded navigation instructions. To improve instruction evaluation, we propose an instruction-trajectory compatibility model that operates without reference instructions. Our model shows the highest correlation with human wayfinding outcomes when scoring individual instructions. For ranking instruction generation systems, if reference instructions are available we recommend using SPICE.

preprint | 2020 | arXiv

A comprehensive review on convolutional neural network in machine fault diagnosis

With the rapid development of the manufacturing industry, machine fault diagnosis has become increasingly significant to ensure safe equipment operation and production. Consequently, multifarious approaches have been explored and developed in the past years, of which intelligent algorithms have developed particularly rapidly. The convolutional neural network, as a typical representative of intelligent diagnostic models, has been extensively studied and applied in the recent five years, and a large amount of literature has been published in academic journals and conference proceedings. However, there has not been a systematic review to cover these studies and offer an outlook for further research. To fill this gap, this work attempts to review and summarize the development of the Convolutional Network based Fault Diagnosis (CNFD) approaches comprehensively. Generally, a typical CNFD framework is composed of the following steps, namely, data collection, model construction, and feature learning and decision making, and this paper is organized by following this stream. Firstly, the data collection process is described, in which several popular datasets are introduced. Then, the fundamental theory from the basic convolutional neural network to its variants is elaborated. After that, the applications of CNFD are reviewed in terms of three mainstream directions, i.e. classification, prediction and transfer diagnosis. Finally, conclusions and prospects are presented to point out the characteristics of current development, the challenges faced, and future trends. Last but not least, it is expected that this work will provide convenience and inspire further exploration for researchers in this field.

preprint | 2020 | arXiv

Compact optical polarization-insensitive zoom metalens-doublet

Metasurface-based lenses (metalenses) offer specific conceptual advantages compared to ordinary refractive lenses. For example, it is possible to tune the focal length of a metalens doublet by varying the relative angle between the two metalenses while fixing their distance, leading to an extremely compact zoom lens. An improved polarization-insensitive design based on silicon-nanocylinders on silica substrates is presented. This design is realized and characterized experimentally at 1550 nm wavelength. By varying the relative angle between the metalenses in steps of 10 degrees, tuning of the doublet focal length is demonstrated from -54 mm to -+3 mm to +54 mm. This results in a zoom factor of an imaging system varying between 1 and 18. For positive focal lengths, the doublet focusing efficiency has a minimum of 34% and a maximum of 83%. Experiment and theory are in very good agreement.

preprint | 2020 | arXiv

Floquet engineering the Hofstadter butterfly in the square lattice and its effective Hamiltonian

In this paper, we use Floquet theory to theoretically study the effect of monochromatic circularly and linearly polarized light on the Hofstadter butterfly in the square lattice, which is induced by a uniform perpendicular magnetic field. In the absence of laser light, the butterfly has a fractal, self-similar structure with particle-hole symmetry and reflection symmetry about magnetic flux $ϕ = 1/2$. These symmetries are preserved by the sub-lattice and the time-reversal symmetry, respectively. When the system is exposed to circularly polarized light, the original Hofstadter butterfly in equilibrium is deformed by breaking both the particle-hole symmetry and the mirror symmetry, while the inversion symmetry about energy $E=0$ and magnetic flux $ϕ=1/2$ is preserved. Our study shows that circularly polarized light breaks both the sub-lattice symmetry and the time-reversal symmetry. The inversion symmetry is preserved because the Hamiltonians at magnetic flux $ϕ$ and $1-ϕ$ are connected through the sub-lattice transformation. Focusing on the small flux region, we study the Landau level and the influence of circularly polarized light on the Landau level. In contrast, linearly polarized light deforms the original Hofstadter butterfly by breaking the rotational symmetry while preserving the sub-lattice and time-reversal symmetries. Further, we study the influence of the periodic drive on the Chern number of the lowest band in the middle Floquet copy within the off-resonance regime. We find that strong circularly polarized light will change the Chern number. For linearly polarized light, the Chern number will not change and the values stay independent of laser polarization direction. Our work highlights the generic features expected for the periodically driven Hofstadter problem on the square lattice and provides a strategy for engineering the Hofstadter butterfly with a laser.
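The two equilibrium symmetries mentioned in this abstract (particle-hole symmetry and reflection about $ϕ = 1/2$) can be checked numerically for the undriven case. The sketch below is not taken from the paper: it assumes the standard nearest-neighbor tight-binding (Harper) model in the Landau gauge with hopping amplitude 1 and rational flux $ϕ = p/q$, and the function names are hypothetical. It does not model the laser drive or the Floquet analysis.

```python
import numpy as np

def harper_hamiltonian(p, q, kx, ky):
    """q x q Bloch Hamiltonian of the square lattice at flux p/q (hopping = 1)."""
    phi = p / q
    H = np.zeros((q, q), dtype=complex)
    for m in range(q):
        # Hopping along y picks up a site-dependent Peierls phase.
        H[m, m] += -2.0 * np.cos(ky + 2.0 * np.pi * phi * m)
        # Hopping along x connects neighboring sites of the magnetic unit cell.
        H[m, (m + 1) % q] += -np.exp(1j * kx)
        H[(m + 1) % q, m] += -np.exp(-1j * kx)
    return H

def spectrum(p, q, n_k=8):
    """Sorted eigenvalues collected over a uniform n_k x n_k momentum grid."""
    ks = np.linspace(0.0, 2.0 * np.pi, n_k, endpoint=False)
    bands = [np.linalg.eigvalsh(harper_hamiltonian(p, q, kx, ky))
             for kx in ks for ky in ks]
    return np.sort(np.concatenate(bands))

e_one_third = spectrum(1, 3)   # flux 1/3
e_two_thirds = spectrum(2, 3)  # flux 1 - 1/3

# Particle-hole symmetry: the spectrum is symmetric about E = 0.
particle_hole_ok = np.allclose(e_one_third, -e_one_third[::-1])
# Reflection symmetry: flux phi and 1 - phi give the same spectrum.
reflection_ok = np.allclose(e_one_third, e_two_thirds)
```

An even `n_k` is used so that the momentum grid maps onto itself under the shift by $(π, π)$ that accompanies the sub-lattice transformation. Sweeping `p/q` over many rational values and plotting the energies against the flux would trace out the butterfly itself.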

preprint | 2020 | arXiv

Semi-Federated Learning

Federated learning (FL) enables massive distributed Information and Communication Technology (ICT) devices to learn a global consensus model without any participants revealing their own data to the central server. However, the practicality, communication expense and non-independent and identically distributed (Non-IID) data challenges in FL still need to be addressed. In this work, we propose Semi-Federated Learning (Semi-FL), which differs from FL in two aspects: local client clustering and in-cluster training. A sequential training scheme is designed for our in-cluster training, which enables neighboring clients to share their learning models. The proposed Semi-FL can be easily applied to future mobile communication networks and requires less up-link transmission bandwidth. Numerical experiments validate the feasibility, learning performance and robustness to Non-IID data of the proposed Semi-FL. Semi-FL extends the existing potential of FL.
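The idea of clustering clients and training sequentially inside each cluster can be sketched on a toy problem. This is only an illustration under stated assumptions, not the Semi-FL algorithm as published: the linear regression task, the fixed cluster assignment, the learning rate, the plain averaging of cluster models at the server, and all function names are hypothetical choices made here.

```python
import numpy as np

def local_update(w, X, y, lr=0.05, steps=5):
    """A few full-batch gradient steps on one client's mean squared error."""
    for _ in range(steps):
        grad = 2.0 / len(y) * X.T @ (X @ w - y)
        w = w - lr * grad
    return w

def semi_fl_round(w_global, clusters):
    """One round: sequential training inside each cluster, then averaging."""
    cluster_models = []
    for clients in clusters:
        w = w_global.copy()
        for X, y in clients:
            # The model is handed from one neighboring client to the next.
            w = local_update(w, X, y)
        # Only one model per cluster is uploaded (assumed here).
        cluster_models.append(w)
    return np.mean(cluster_models, axis=0)

def make_toy_clusters(w_true, n_clusters=2, clients_per_cluster=3, n=20, seed=0):
    """Synthetic clients that all share the same underlying linear model."""
    rng = np.random.default_rng(seed)
    clusters = []
    for _ in range(n_clusters):
        clients = []
        for _ in range(clients_per_cluster):
            X = rng.normal(size=(n, len(w_true)))
            y = X @ w_true + 0.01 * rng.normal(size=n)
            clients.append((X, y))
        clusters.append(clients)
    return clusters

w_true = np.array([2.0, -1.0])
clusters = make_toy_clusters(w_true)
w = np.zeros(2)
for _ in range(20):
    w = semi_fl_round(w, clusters)
```

In this sketch the server receives one upload per cluster instead of one per client, which is one way to read the abstract's claim of lower up-link bandwidth.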

preprint | 2020 | arXiv

Wireless Communication Based on Microwave Photon-Level Detection With Superconducting Devices: Achievable Rate Prediction

Future wireless communication systems will embrace physical-layer signal detection with high sensitivity, especially at the microwave photon level. Currently, receivers primarily adopt signal detection based on semiconductor devices, while this paper introduces high-sensitivity photon-level microwave detection based on superconducting structures. We first overview existing works on photon-level communication in the optical spectrum as well as microwave photon-level sensing based on superconducting structures from both theoretical and experimental perspectives, including a microwave detection circuit model based on the Josephson junction, a microwave photon counter based on the Josephson junction, and two reconstruction approaches under background noise. In addition, we characterize channel modeling based on two different microwave photon detection approaches, including the absorption barrier and the dual-path Hanbury Brown-Twiss (HBT) experiments, and predict the corresponding achievable rates. According to the performance prediction, microwave photon-level signal detection can increase the receiver sensitivity compared with the state-of-the-art standardized communication system with waveform signal reception, with a gain of over 10 dB.