Researcher profile

Hongyu Li

Hongyu Li contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
20 works
0 followers
17 topics
4 close collaborators

Published work

20 published items

Preprint · 2026 · arXiv

4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding

Dynamic spatial reasoning from monocular video is essential for bridging visual intelligence and the physical world, yet remains challenging for vision-language models (VLMs). Prior approaches either verbalize spatial-temporal reasoning entirely as text, which is inherently verbose and imprecise for complex dynamics, or rely on external geometric modules that increase inference complexity without fostering intrinsic model capability. In this paper, we present 4DThinker, the first framework that enables VLMs to "think with 4D" through dynamic latent mental imagery, i.e., internally simulating how scenes evolve within the continuous hidden space. Specifically, we first introduce a scalable, annotation-free data generation pipeline that synthesizes 4D reasoning data from raw videos. We then propose Dynamic-Imagery Fine-Tuning (DIFT), which jointly supervises textual tokens and 4D latents to ground the model in dynamic visual semantics. Building on this, 4D Reinforcement Learning (4DRL) further tackles complex reasoning tasks via outcome-based rewards, restricting policy gradients to text tokens to ensure stable optimization. Extensive experiments across multiple dynamic spatial reasoning benchmarks demonstrate that 4DThinker consistently outperforms strong baselines and offers a new perspective toward 4D reasoning in VLMs. Our code is available at https://github.com/zhangquanchen/4DThinker.

Preprint · 2026 · arXiv

Antenna Coding Optimization for Pixel Antenna Empowered MIMO Wireless Power Transfer

We investigate antenna coding utilizing pixel antennas as a new degree of freedom for enhancing multiple-input multiple-output (MIMO) wireless power transfer (WPT) systems. The objective is to enhance the output direct current (DC) power under RF combining and DC combining schemes by jointly exploiting gains from antenna coding, beamforming, and rectenna nonlinearity. We first propose the MIMO WPT system model with binary and continuous antenna coding using the beamspace channel model and formulate the joint antenna coding and beamforming optimization using a nonlinear rectenna model. We propose two closed-form successive convex approximation (SCA) algorithms to efficiently optimize the beamforming. To further reduce the computational complexity, we propose codebook-based antenna coding designs for output DC power maximization based on K-means clustering. Results show that the proposed pixel antenna empowered MIMO WPT system with binary antenna coding increases output DC power by more than 15 dB compared with conventional systems with a fixed antenna configuration. With continuous antenna coding, the performance improves by a further 6 dB. Moreover, the proposed codebook design outperforms previous designs by up to 40% and performs well with reduced computational complexity. Overall, the significant improvement in output DC power demonstrates the potential of leveraging antenna coding utilizing pixel antennas to enhance WPT systems.

Preprint · 2024 · arXiv

Synergizing Beyond Diagonal Reconfigurable Intelligent Surface and Rate-Splitting Multiple Access

This work focuses on the synergy of rate-splitting multiple access (RSMA) and beyond diagonal reconfigurable intelligent surface (BD-RIS) to enlarge coverage, improve performance, and save on antennas. Specifically, we employ a multi-sector BD-RIS modeled as a prism, which can achieve highly directional full-space coverage, in a multiuser multiple-input single-output (MISO) communication system. With the multi-sector BD-RIS aided RSMA model, we jointly design the transmit precoder and BD-RIS matrix under imperfect channel state information (CSI) conditions. The robust design is performed by solving a stochastic average sum-rate maximization problem. Using sample average approximation and the weighted minimum mean square error (WMMSE)-rate relationship, the stochastic problem is transformed into a deterministic one with multiple blocks, each of which is designed iteratively. Simulation results show that multi-sector BD-RIS aided RSMA outperforms space division multiple access (SDMA) schemes. More importantly, synergizing multi-sector BD-RIS with RSMA is an efficient strategy to reduce the number of active antennas at the transmitter and the number of passive antennas in the BD-RIS.
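For readers unfamiliar with the weighted minimum mean square error (WMMSE)-rate relationship mentioned in this abstract, its standard single-stream form is sketched below. It assumes one data stream per user, Gaussian signaling, a scalar receive equalizer $g_k$ with mean square error $e_k(g_k)$, and an auxiliary weight $w_k$; this notation is assumed here, and the paper's stochastic RSMA formulation is more involved.

```latex
\begin{aligned}
R_k &= \log_2\left(1 + \mathrm{SINR}_k\right) = -\log_2 e_k^{\mathrm{MMSE}},
\qquad e_k^{\mathrm{MMSE}} = \frac{1}{1 + \mathrm{SINR}_k},\\
R_k &= \frac{1}{\ln 2}\,\max_{w_k > 0,\; g_k}\Big(\ln w_k - w_k\, e_k(g_k) + 1\Big).
\end{aligned}
```

For fixed $w_k$, maximizing over $g_k$ selects the MMSE equalizer; for fixed $g_k$, the objective is concave in $w_k$ and peaks at $w_k = 1/e_k(g_k)$, where it equals $-\ln e_k(g_k)$. Each block therefore has a closed-form update, which is typically what makes alternating, block-wise designs of this kind tractable.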

Preprint · 2022 · arXiv

Beyond Diagonal Reconfigurable Intelligent Surfaces: From Transmitting and Reflecting Modes to Single-, Group-, and Fully-Connected Architectures

Reconfigurable intelligent surfaces (RISs) are envisioned as a promising technology for future wireless communications. With various hardware realizations, RISs can work under different modes (reflective/transmissive/hybrid) or have different architectures (single/group/fully-connected). However, most existing research has focused on single-connected reflective RISs, mathematically characterized by diagonal phase shift matrices, and a comprehensive study of RISs that unifies different modes/architectures is lacking. In this paper, we address this gap by analyzing and proposing a general RIS-aided communication model. Specifically, we establish an RIS model not limited to diagonal phase shift matrices, a novel branch referred to as beyond diagonal RIS (BD-RIS), unifying modes and architectures. With the proposed model, we develop efficient algorithms to jointly design the transmit precoder and BD-RIS matrix to maximize the sum-rate of RIS-aided systems. We also provide simulation results comparing the performance of BD-RISs with different modes/architectures. Simulation results show that, under the same mode, fully- and group-connected RISs can effectively increase the sum-rate compared with single-connected RISs, and that hybrid RISs outperform reflective/transmissive RISs with the same architecture.
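As a rough guide to the three architectures named in this abstract, lossless reflective BD-RIS designs are commonly written with the constraints below, for an $M$-element surface with matrix $\Theta \in \mathbb{C}^{M \times M}$ split into $G$ groups of $\bar{M} = M/G$ elements. This notation is assumed here rather than taken from the abstract.

```latex
\begin{aligned}
&\text{Single-connected:} && \Theta = \operatorname{diag}(\theta_1, \dots, \theta_M), \quad |\theta_m| = 1,\\
&\text{Group-connected:}  && \Theta = \operatorname{blkdiag}(\Theta_1, \dots, \Theta_G), \quad \Theta_g^{H}\Theta_g = \mathbf{I}_{\bar{M}},\\
&\text{Fully-connected:}  && \Theta^{H}\Theta = \mathbf{I}_M.
\end{aligned}
```

Single-connected is the $G = M$ special case (a $1 \times 1$ unitary block is a unit-modulus scalar) and fully-connected is the $G = 1$ case, so group-connected designs interpolate between the diagonal model and a full unitary matrix.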

Preprint · 2022 · arXiv

Build Smart Grids on Artificial Intelligence -- A Real-world Example

Power grid data are growing rapidly with the deployment of various sensors. Big data in power grids creates huge opportunities for applying artificial intelligence technologies to improve resilience and reliability. This paper introduces multiple real-world applications based on artificial intelligence to improve power grid situational awareness and resilience. These applications include event identification, inertia estimation, event location and magnitude estimation, data authentication, control, and stability assessment. These applications are operating on a real-world system called FNET-GridEye, which is a wide-area measurement network and arguably the world's largest cyber-physical system that collects power grid big data. These applications showed much better performance than conventional approaches and accomplished new tasks that are impossible to realize using conventional technologies. These encouraging results demonstrate that combining power grid big data and artificial intelligence can uncover and capture the nonlinear correlation between power grid data and its stability indices, and will potentially enable many advanced applications that can significantly improve power grid resilience.

Preprint · 2022 · arXiv

Complementary Feature Enhanced Network with Vision Transformer for Image Dehazing

Conventional CNN-based dehazing models suffer from two essential issues: the dehazing framework (limited in interpretability) and the convolution layers (content-independent and ineffective at learning long-range dependency information). In this paper, we first propose a new complementary feature enhanced framework, in which complementary features are learned by several complementary subtasks and then together serve to boost the performance of the primary task. One of the prominent advantages of the new framework is that the purposively chosen complementary tasks can focus on learning weakly dependent complementary features, avoiding repetitive and ineffective learning by the networks. We design a new dehazing network based on this framework. Specifically, we select intrinsic image decomposition as the complementary task, where the reflectance and shading prediction subtasks are used to extract the color-wise and texture-wise complementary features. To effectively aggregate these complementary features, we propose a complementary features selection module (CFSM) to select the more useful features for image dehazing. Furthermore, we introduce a new version of the vision transformer block, named Hybrid Local-Global Vision Transformer (HyLoG-ViT), and incorporate it within our dehazing networks. The HyLoG-ViT block consists of local and global vision transformer paths used to capture local and global dependencies. As a result, HyLoG-ViT introduces locality into the networks and captures global and long-range dependencies. Extensive experiments on homogeneous, non-homogeneous, and nighttime dehazing tasks reveal that the proposed dehazing network can achieve comparable or even better performance than CNN-based dehazing models.

Preprint · 2022 · arXiv

Decision Making in Monopoly using a Hybrid Deep Reinforcement Learning Approach

Learning to adapt and make real-time informed decisions in a dynamic and complex environment is a challenging problem. Monopoly is a popular strategic board game that requires players to make multiple decisions during the game. Decision-making in Monopoly involves many real-world elements such as strategizing, luck, and modeling of opponents' policies. In this paper, we present novel representations for the state and action space for the full version of Monopoly and define an improved reward function. Using these, we show that our deep reinforcement learning agent can learn winning strategies for Monopoly against different fixed-policy agents. In Monopoly, players can take multiple actions even if it is not their turn to roll the dice. Some of these actions occur more frequently than others, resulting in a skewed distribution that adversely affects the performance of the learning agent. To tackle the non-uniform distribution of actions, we propose a hybrid approach that combines deep reinforcement learning (for frequent but complex decisions) with a fixed-policy approach (for infrequent but straightforward decisions). Experimental results show that our hybrid agent outperforms a standard deep reinforcement learning agent by 30% in the number of games won against fixed-policy agents.

Preprint · 2022 · arXiv

Deep Learning based Intelligent Coin-tap Test for Defect Recognition

The coin-tap test is a convenient and primary method for non-destructive testing, but its manual on-site operation is laborious and costly. With the help of a recent intelligent signal processing method, convolutional neural networks (CNNs), we achieve an intelligent coin-tap test that exhibits superior performance in recognizing defects. However, this success of CNNs relies on plenty of well-labeled data from the identical scenario, which can be difficult to obtain in many real industrial practices. This paper further develops transfer learning strategies for this issue, that is, transferring the model trained on data from one scenario to another. In experiments, the results show a notable improvement from using domain adaptation and pseudo-label learning strategies. Hence, with the transfer learning strategies proposed herein, it becomes possible to apply the model to scenarios with little (less than 10%) or no labeled data. In addition, we constructed a benchmark dataset ourselves and used it throughout this study. This benchmark dataset for the coin-tap test, containing around 100,000 sound signals, is published at https://github.com/PPhub-hy/torch-tapnet.

Preprint · 2022 · arXiv

Deep Learning Eliminates Massive Dust Storms from Images of Tianwen-1

Dust storms may significantly degrade the imaging quality of Martian orbiters and delay progress in mapping the global topography and geomorphology. To address this issue, this paper presents an approach that reuses the image dehazing knowledge obtained on Earth to resolve the dust-removal problem on Mars. In this approach, we collect remote-sensing images captured by Tianwen-1 and manually select hundreds of clean and dusty images. Inspired by the haze formation process on Earth, we formulate a similar visual degradation process on clean images and synthesize dusty images sharing a similar feature distribution with realistic dusty images. These realistic clean and synthetic dusty image pairs are used to train a deep model that inherently encodes dust-irrelevant features and decodes them into dust-free images. Qualitative and quantitative results show that dust storms can be effectively eliminated by the proposed approach, leading to clearly improved topographical and geomorphological details of Mars.
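The abstract does not give the degradation formula used to synthesize dusty images. A minimal sketch, assuming the standard Earth haze model I = J·t + A·(1 − t) with a single transmission t = exp(−β·d) per image and a hypothetical reddish airlight A, shows how clean pixels could be turned into synthetic dusty ones. The function name, the default airlight color, and the uniform-depth assumption are illustrative only, not the authors' formulation.

```python
import math

# Hypothetical dust-colored airlight (RGB in [0, 1]); the real value is not given in the abstract.
DEFAULT_AIRLIGHT = (0.80, 0.60, 0.45)


def synthesize_dust(pixels, beta, depth, airlight=DEFAULT_AIRLIGHT):
    """Apply I = J * t + A * (1 - t) to every RGB pixel, with t = exp(-beta * depth).

    pixels: list of (r, g, b) tuples in [0, 1], i.e. the clean image J.
    beta: scattering coefficient; larger values give a thicker dust layer.
    depth: optical path length, assumed uniform across the orbital image.
    """
    if beta < 0 or depth < 0:
        raise ValueError("beta and depth must be non-negative")
    t = math.exp(-beta * depth)  # one transmission value shared by all pixels
    return [
        tuple(j * t + a * (1.0 - t) for j, a in zip(pixel, airlight))
        for pixel in pixels
    ]


if __name__ == "__main__":
    clean = [(0.2, 0.3, 0.4), (0.9, 0.9, 0.9)]
    print(synthesize_dust(clean, beta=0.0, depth=1.0))  # beta = 0: image unchanged
    print(synthesize_dust(clean, beta=1.0, depth=1.0))  # pixels pulled toward the airlight
```

A spatially varying depth or transmission map would be the natural extension; a single t is used here only to keep the sketch short.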

Preprint · 2022 · arXiv

Deep Reinforcement Learning based Robot Navigation in Dynamic Environments using Occupancy Values of Motion Primitives

This paper presents a Deep Reinforcement Learning based navigation approach in which we define the occupancy observations as heuristic evaluations of motion primitives, rather than using raw sensor data. Our method enables fast mapping of the occupancy data, generated by multi-sensor fusion, into trajectory values in 3D workspace. The computationally efficient trajectory evaluation allows dense sampling of the action space. We utilize our occupancy observations in different data structures to analyze their effects on both the training process and navigation performance. We train and test our methodology on two different robots within challenging physics-based simulation environments including static and dynamic obstacles. We benchmark our occupancy representations against other conventional data structures from state-of-the-art methods. The trained navigation policies are also validated successfully with physical robots in dynamic environments. The results show that our method not only decreases the required training time but also improves navigation performance compared to other occupancy representations. The open-source implementation of our work and all related information are available at https://github.com/RIVeR-Lab/tentabot.

Preprint · 2022 · arXiv

Rate-Splitting Multiple Access for 6G -- Part III: Interplay with Reconfigurable Intelligent Surfaces

This letter is the third part of a three-part tutorial that focuses on rate-splitting multiple access (RSMA) for 6G. As Part III of the tutorial, this letter provides an overview of integrating RSMA and reconfigurable intelligent surface (RIS). We first introduce two potential PHY layer techniques, namely, RSMA and RIS, including the need for integrating RSMA with RIS and how they could help each other. Next, we provide a general model of an RIS-aided RSMA system and summarize some key performance metrics. Then, we discuss the major advantages of RIS-aided RSMA networks, and illustrate the rate region of RIS-aided RSMA for both perfect and imperfect channel conditions. Finally, we summarize the research challenges and open problems for RIS-aided RSMA systems. In conclusion, RSMA is a promising technology for next generation multiple access (NGMA) and future networks such as 6G and beyond.

Preprint · 2022 · arXiv

Statistical Inference of Cell-type Proportions Estimated from Bulk Expression Data

There is a growing interest in cell-type-specific analysis from bulk samples with a mixture of different cell types. A critical first step in such analyses is the accurate estimation of cell-type proportions in a bulk sample. Although many methods have been proposed recently, quantifying the uncertainties associated with the estimated cell-type proportions has not been well studied. Lack of consideration of these uncertainties can lead to missed or false findings in downstream analyses. In this article, we introduce a flexible statistical deconvolution framework that allows a general and subject-specific covariance of bulk gene expressions. Under this framework, we propose a decorrelated constrained least squares method called DECALS that estimates cell-type proportions as well as the sampling distribution of the estimates. Simulation studies demonstrate that DECALS can accurately quantify the uncertainties in the estimated proportions whereas other methods fail. Applying DECALS to analyze bulk gene expression data of post mortem brain samples from the ROSMAP and GTEx projects, we show that taking into account the uncertainties in the estimated cell-type proportions can lead to more accurate identifications of cell-type-specific differentially expressed genes and transcripts between different subject groups, such as between Alzheimer's disease patients and controls and between males and females.
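DECALS itself (the decorrelation step and the sampling distribution of the estimates) is not specified in the abstract. The sketch below shows only the plain constrained least-squares problem that such deconvolution methods start from: find proportions p with p ≥ 0 and Σp = 1 minimizing ‖y − Xp‖², where X is a hypothetical genes × cell-types signature matrix and y a bulk expression vector. Projected gradient descent with a Euclidean simplex projection is a simple, swapped-in solver chosen for illustration; it is not the authors' algorithm.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p_i >= 0, sum(p) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate  # threshold from the largest index that stays positive
    return [max(x - theta, 0.0) for x in v]


def constrained_least_squares(X, y, iterations=2000):
    """Minimize ||y - X p||^2 over proportions p >= 0 with sum(p) = 1.

    X: signature matrix as a list of rows (one row per gene, one column per cell type).
    y: bulk expression vector (one value per gene).
    Solved by projected gradient descent; this is not the DECALS estimator.
    """
    n_types = len(X[0])
    # The gradient is Lipschitz with L = 2 * lambda_max(X^T X) <= 2 * trace(X^T X),
    # so this step size is at most 1 / L and the iteration is stable.
    trace = sum(x * x for row in X for x in row)
    if trace == 0:
        raise ValueError("signature matrix must not be all zeros")
    step = 1.0 / (2.0 * trace)
    p = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(iterations):
        residual = [sum(a * b for a, b in zip(row, p)) - yi for row, yi in zip(X, y)]
        grad = [2.0 * sum(row[k] * r for row, r in zip(X, residual)) for k in range(n_types)]
        p = project_to_simplex([pk - step * gk for pk, gk in zip(p, grad)])
    return p


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]  # 4 genes x 2 cell types
    y = [0.3, 0.7, 1.0, 1.3]  # noiseless bulk sample built from proportions (0.3, 0.7)
    print(constrained_least_squares(X, y))
```

The simplex constraint is what separates proportion estimation from ordinary regression: when the unconstrained fit would give negative or non-normalized weights, the projection keeps the estimate interpretable as cell-type fractions.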

Preprint · 2021 · arXiv

Channel Estimation for Practical IRS-Assisted OFDM Systems

Intelligent reflecting surface (IRS), composed of a large number of hardware-efficient passive elements, is regarded as a promising technique for future wireless communications since it can adaptively enhance the propagation environment. To effectively utilize IRS to achieve significant beamforming gains, the problem of channel state information (CSI) acquisition needs to be carefully considered. However, most recent works assume an ideal IRS, i.e., each reflecting element has constant amplitude, variable phase shifts, and the same response for signals at different frequencies; this assumption causes severe estimation errors due to the mismatch between the ideal IRS and the practical one. In this paper, we study channel estimation in practical IRS-aided orthogonal frequency division multiplexing (OFDM) systems with discrete phase shifts. Unlike prior works, which assume that the IRS has an ideal reflection model, we perform channel estimation by considering the amplitude-phase shift-frequency relationship for the response of a practical IRS. Aiming at minimizing the normalized mean square error (NMSE) of the estimated channel, a novel IRS time-varying reflection pattern is designed by leveraging the alternating optimization (AO) algorithm for the case of low-resolution phase shifters. Moreover, for high-resolution IRS cases, we provide another practical reflection pattern scheme to further reduce complexity. Simulation results demonstrate the necessity of considering a practical IRS model for channel estimation and the effectiveness of our proposed channel estimation methods.

Preprint · 2021 · arXiv

Intelligent reflecting surface enhanced wideband MIMO-OFDM communications: From practical model to reflection optimization

Intelligent reflecting surface (IRS) is envisioned as a revolutionary technology for future wireless communication systems since it can intelligently change the radio environment and integrate it into wireless communication optimization. However, most existing works adopted an ideal IRS reflection model, which is impractical and can cause significant performance degradation in realistic wideband systems. To address this issue, we first study the dual phase- and amplitude-squint effect of reflected signals and present a simplified practical IRS reflection model for wideband signals. Then, an IRS enhanced wideband multiuser multi-input single-output orthogonal frequency division multiplexing (MU-MISO-OFDM) system is investigated. We aim to jointly design the transmit beamformer and IRS reflection, for both continuous and discrete phase shifters, to maximize the average sum-rate over all subcarriers. By exploiting the relationship between sum-rate maximization and mean square error (MSE) minimization, the original problem is equivalently transformed into a multi-block/variable problem, which can be efficiently solved by the block coordinate descent (BCD) method. The complexity and convergence of both cases are analyzed or illustrated. Simulation results demonstrate that the proposed algorithm offers significant average sum-rate enhancement compared to that achieved using the ideal IRS reflection model, which confirms the importance of using the practical model when designing wideband systems.

Preprint · 2020 · arXiv

A Study on Evaluation Standard for Automatic Crack Detection Regard the Random Fractal

A reasonable evaluation standard underlies the construction of effective deep learning models. However, we find in experiments that automatic crack detectors based on deep learning are markedly underestimated by the widely used mean Average Precision (mAP) standard. This paper presents a study on the evaluation standard. We show that the random fractal nature of cracks invalidates the mAP standard, because the strict box matching in mAP calculation is unreasonable for the fractal feature. As a solution, a fractal-available evaluation standard named CovEval is proposed to correct the underestimation in crack detection. CovEval adopts a different matching process based on the idea of covering box matching. In detail, the Cover Area rate (CAr) is designed as a covering overlap, and a multi-match strategy is employed to relax the one-to-one matching restriction in mAP. Extended Recall (XR), Extended Precision (XP) and Extended F-score (Fext) are defined for scoring the crack detectors. In experiments using several common object detection frameworks, models get much higher scores in crack detection according to CovEval, which better matches their visual performance. Moreover, based on the Faster R-CNN framework, we present a case study that optimizes a crack detector under the CovEval standard. The Extended Recall (XR) of our best model reaches an industrial level of 95.8, which implies that, with a reasonable evaluation standard, object detection methods have great potential for automatic industrial inspection.

Preprint · 2020 · arXiv

Arbitrary-sized Image Training and Residual Kernel Learning: Towards Image Fraud Identification

Preserving the original noise residuals in images is critical to image fraud identification. Because resizing during deep learning damages the microstructures of image noise residuals, we propose a framework for training directly on images at their original input scales, without resizing. Our arbitrary-sized image training method mainly depends on pseudo-batch gradient descent (PBGD), which bridges the gap between the input batch and the update batch to ensure that model updates can run normally for arbitrary-sized images. In addition, a 3-phase alternate training strategy is designed to learn optimal residual kernels for image fraud identification. With the learnt residual kernels and PBGD, the proposed framework achieved state-of-the-art results in image fraud identification, especially for images with small tampered regions or unseen images with different tampering distributions.
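One plausible reading of pseudo-batch gradient descent (PBGD), assumed here rather than confirmed by the abstract, is gradient accumulation: each arbitrary-sized image is forwarded on its own (an input batch of one), and the gradients from a group of images are averaged before a single parameter update (the update batch). The toy model below, a single weight applied to the mean of a variable-length list of pixel values, and the helper names are hypothetical; the sketch only shows the bookkeeping.

```python
def per_image_gradient(w, pixels, target):
    """d/dw of the squared error (w * mean(pixels) - target) ** 2 for one image."""
    feature = sum(pixels) / len(pixels)  # defined for any image size, so no resizing
    return 2.0 * (w * feature - target) * feature


def pseudo_batch_step(w, samples, lr):
    """Apply one update from a pseudo-batch of (pixels, target) pairs.

    Images may all differ in size: each one is processed on its own (input
    batch of 1), its gradient is accumulated, and the averaged gradient is
    applied once (the update batch).
    """
    if not samples:
        return w
    accumulated = 0.0
    for pixels, target in samples:
        accumulated += per_image_gradient(w, pixels, target)
    return w - lr * accumulated / len(samples)


def train(w, dataset, pseudo_batch_size, lr, epochs):
    """Walk the dataset in consecutive pseudo-batches, updating once per pseudo-batch."""
    for _ in range(epochs):
        for start in range(0, len(dataset), pseudo_batch_size):
            w = pseudo_batch_step(w, dataset[start:start + pseudo_batch_size], lr)
    return w


if __name__ == "__main__":
    # Three "images" of different sizes whose targets are all 2 * mean(pixels).
    data = [([1.0, 1.0, 1.0], 2.0), ([2.0, 2.0], 4.0), ([0.5] * 5, 1.0)]
    print(train(0.0, data, pseudo_batch_size=3, lr=0.1, epochs=200))  # approaches 2.0
```

Because the averaged gradient equals the gradient a true batch of the same images would produce, the update batch size can be chosen independently of how many differently sized images fit through the model at once.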

Preprint · 2020 · arXiv

Constraining the inner density slope of massive galaxy clusters

We determine the inner density profiles of massive galaxy clusters ($M_{200} > 5 \times 10^{14}\,\mathrm{M}_{\odot}$) in the Cluster-EAGLE (C-EAGLE) hydrodynamic simulations, and investigate whether the dark matter density profiles can be correctly estimated from a combination of mock stellar kinematical and gravitational lensing data. From fitting mock stellar kinematics and lensing data generated from the simulations, we find that the inner density slopes of both the total and the dark matter mass distributions can be inferred reasonably well. We compare the density slopes of C-EAGLE clusters with those derived by Newman et al. for 7 massive galaxy clusters in the local Universe. We find that the asymptotic best-fit inner slopes of "generalized" NFW (gNFW) profiles, $\gamma_{\rm gNFW}$, of the dark matter haloes of the C-EAGLE clusters are significantly steeper than those inferred by Newman et al. However, the mean mass-weighted dark matter density slopes of the simulated clusters are in good agreement with the Newman et al. estimates. We also find that the estimate of $\gamma_{\rm gNFW}$ is very sensitive to the constraints from weak lensing measurements in the outer parts of the cluster and a bias can lead to an underestimate of $\gamma_{\rm gNFW}$.
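For reference, the generalized NFW profile whose inner slope $\gamma_{\rm gNFW}$ is discussed in this abstract is usually written as below, with scale radius $r_s$ and characteristic density $\rho_s$ (standard notation, assumed here); $\gamma = 1$ recovers the original NFW profile. Differentiating the logarithm shows that $\gamma_{\rm gNFW}$ is an asymptotic quantity: the local slope only approaches $-\gamma_{\rm gNFW}$ as $r \to 0$, which is one reason it need not agree with a slope averaged over a finite radius.

```latex
\begin{aligned}
\rho(r) &= \frac{\rho_s}{(r/r_s)^{\gamma}\,\left(1 + r/r_s\right)^{3-\gamma}},\\
\frac{d\ln\rho}{d\ln r} &= -\left[\gamma + (3-\gamma)\,\frac{r/r_s}{1 + r/r_s}\right]
\;\longrightarrow\;
\begin{cases} -\gamma, & r \ll r_s,\\ -3, & r \gg r_s. \end{cases}
\end{aligned}
```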

Preprint · 2020 · arXiv

Knowledge Federation: A Unified and Hierarchical Privacy-Preserving AI Framework

With strict protections and regulations of data privacy and security, conventional machine learning based on centralized datasets faces significant challenges, making artificial intelligence (AI) impractical in many mission-critical and data-sensitive scenarios, such as finance, government, and health. Meanwhile, tremendous datasets are scattered in isolated silos across industries, organizations, different units of an organization, or different branches of an international organization. These valuable data resources are largely underused. To advance AI theories and applications, we propose a comprehensive framework (called Knowledge Federation, KF) to address these challenges by enabling AI while preserving data privacy and ownership. Beyond the concepts of federated learning and secure multi-party computation, KF consists of four levels of federation: (1) information level, low-level statistics and computation of data, meeting the requirements of simple queries, searching and simplistic operators; (2) model level, supporting training, learning, and inference; (3) cognition level, enabling abstract feature representation at various levels of abstraction and contexts; (4) knowledge level, fusing knowledge discovery, representation, and reasoning. We further clarify the relationship and differences between knowledge federation and other related research areas. We have developed a reference implementation of KF, called the iBond Platform, to offer a production-quality KF platform that enables industrial applications in finance, insurance, and other sectors. The iBond platform will also help establish the KF community and a comprehensive ecosystem and usher in a novel paradigm shift towards secure, privacy-preserving and responsible AI. To the best of our knowledge, knowledge federation is the first hierarchical and unified framework for secure multi-party computing and learning.

Preprint · 2020 · arXiv

Practical Modeling and Beamforming for Intelligent Reflecting Surface Aided Wideband Systems

Intelligent reflecting surface (IRS) has emerged as a revolutionary solution to enhance wireless communications by intelligently changing the propagation environment. Prior studies on IRS are based on an ideal reflection model with a constant amplitude and a variable phase shift. However, it is difficult and unrealistic to implement an IRS satisfying such an ideal reflection model in practical applications. In this letter, we investigate the phase-amplitude-frequency relationship of the reflected signals and propose a practical model of the reflection coefficient for an IRS-aided wideband system. Then, based on this practical model, joint transmit power allocation for each subcarrier and IRS beamforming optimization are investigated for an IRS-aided wideband orthogonal frequency-division multiplexing (OFDM) system. Simulation results illustrate the importance of the practical model for IRS designs and validate the effectiveness of our proposed model.

Preprint · 2018 · arXiv

Research on the Security of Blockchain Data: A Survey

With the increasingly extensive application of blockchain, blockchain security has attracted wide public concern and in-depth study by scholars. Moreover, the security of blockchain data directly affects the security of various blockchain applications. In this survey, we perform a comprehensive classification and summary of the security of blockchain data. First, we present a classification of blockchain data attacks. Subsequently, we present the attacks on and defenses of blockchain data in terms of privacy, availability, integrity and controllability. Data privacy attacks involve data leakage or data obtained by attackers through analysis. Data availability attacks involve abnormal or incorrect access to blockchain data. Data integrity attacks involve tampering with blockchain data. Data controllability attacks involve blockchain data being accidentally manipulated through smart contract vulnerabilities. Finally, we present several important open research directions to guide follow-up studies in this area.