Researcher profile

Peng Hu

Peng Hu contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
16 works
0 followers
15 topics
4 close collaborators

Published work

16 published items

preprint · 2026 · arXiv

Attention Transfer Is Not Universally Effective for Vision Transformers

A recent work shows that Attention Transfer, which transfers only the attention patterns from a pre-trained teacher Vision Transformer (ViT) to a randomly initialized standard student ViT, is sufficient to recover the full benefit of the teacher's pre-trained weights. We revisit this finding on a comprehensive benchmark of 20 teachers from 11 well-known ViT families and reveal that Attention Transfer is not universally effective. While 7 families transfer successfully, 4 consistently fail, falling up to 5.1% below the from-scratch no-transfer baseline. Further results demonstrate that this failure is family-consistent across model sizes, and persists under extended training durations, different transfer datasets, and out-of-distribution evaluations. Controlled analyses then consistently localize the problem to the attention-routing channel, indicating that the key issue is not whether the student can match the teacher's attention patterns, but whether the matched patterns remain functional for the student. Crucially, we identify architectural mismatch between the pre-trained teacher and the standard student as the primary mechanism. By adding only the teacher's native architectural components to the student in a randomly initialized state, we completely reverse the failure for all 4 families. Notably, these components alone do not improve from-scratch training, confirming that they specifically unlock the usability of the teacher's attention. We further systematically show that this failure is not explained by the inadequate choice of transfer loss or by differences in pre-training recipes. Our findings refine the prevailing understanding of attention in ViT representations: attention is sufficient only when the student architecture matches the teacher.

preprint · 2024 · arXiv

Cross-modal Active Complementary Learning with Self-refining Correspondence

Recently, image-text matching has attracted increasing attention from academia and industry, as it is fundamental to understanding the latent correspondence across visual and textual modalities. However, most existing methods implicitly assume the training pairs are well-aligned while ignoring the ubiquitous annotation noise, a.k.a. noisy correspondence (NC), thereby inevitably leading to a performance drop. Although some methods attempt to address such noise, they still face two challenging problems: excessive memorizing/overfitting and unreliable correction for NC, especially under high noise. To address the two problems, we propose a generalized Cross-modal Robust Complementary Learning framework (CRCL), which benefits from a novel Active Complementary Loss (ACL) and an efficient Self-refining Correspondence Correction (SCC) to improve the robustness of existing methods. Specifically, ACL exploits active and complementary learning losses to reduce the risk of providing erroneous supervision, leading to theoretically and experimentally demonstrated robustness against NC. SCC utilizes multiple self-refining processes with momentum correction to enlarge the receptive field for correcting correspondences, thereby alleviating error accumulation and achieving accurate and stable corrections. We carry out extensive experiments on three image-text benchmarks, i.e., Flickr30K, MS-COCO, and CC152K, to verify the superior robustness of our CRCL against synthetic and real-world noisy correspondences.

preprint · 2023 · arXiv

Laser Inter-Satellite Link Setup Delay: Quantification, Impact, and Tolerable Value

Dynamic laser inter-satellite links (LISLs) provide the flexibility of connecting a pair of satellites as required (dynamically) while static LISLs need to be active continuously between the energy-constrained satellites. However, due to the LISL establishment time (termed herein as LISL setup delay) being in the order of seconds, realizing dynamic LISLs is currently unfeasible. Towards the realization of dynamic LISLs, we first study the quantification of LISL setup delay; then we calculate the end-to-end latency of a free-space optical satellite network (FSOSN) with the LISL setup delay; subsequently, we analyze the impact of LISL setup delay on the end-to-end latency of the FSOSN. We also provide design guidelines for the laser communication terminal manufacturers in the form of maximum tolerable value of LISL setup delay for which the FSOSN based on Starlink's Phase I satellite constellation will be meaningful to use for low-latency long-distance inter-continental data communications.

preprint · 2022 · arXiv

5G Enabled Fault Detection and Diagnostics: How Do We Achieve Efficiency?

Fifth-generation (5G) wireless network technologies and mobile edge computing (MEC) hold great promise for enabling new capabilities for the industrial Internet of Things. However, the solutions enabled by the 5G ultra-reliable low-latency communication (URLLC) paradigm come with challenges, as URLLC alone does not necessarily guarantee the efficient execution of time-critical fault detection and diagnostics (FDD) applications. Based on the Tennessee Eastman Process model, we propose the concept of the communication-edge-computing (CEC) loop and a system model for evaluating the efficiency of FDD applications. We then formulate an optimization problem for achieving the defined CEC efficiency, discuss some typical solutions to the generic CEC-based FDD services, and propose a new uplink-based communication protocol called "ReFlexUp". In the performance analysis and numerical results, the proposed ReFlexUp protocol shows its effectiveness compared to typical protocols such as Selective Repeat ARQ, HARQ, and "Occupy CoW" in terms of key metrics such as latency, reliability, and efficiency. These results are further corroborated by mmWave-based simulations in a typical 5G MEC-based implementation.

preprint · 2022 · arXiv

Closing the Management Gap for Satellite-Integrated Community Networks: A Hierarchical Approach to Self-Maintenance

Community networks (CNs) have become an important paradigm for providing essential Internet connectivity in unserved and underserved areas across the world. However, an indispensable part of CNs is network management, where responsive and autonomous maintenance is much needed. With the technological advancement in telecommunications networks, a classical satellite-dependent CN is envisioned to be transformed into a satellite-integrated CN (SICN), which will embrace significant autonomy, intelligence, and scalability in network management. This article discusses a machine-learning (ML) based hierarchical approach to enabling autonomous self-maintenance for SICNs. The approach is split into the anomaly identification and anomaly mitigation phases, where the related ML methods, data collection means, deployment options, and mitigation schemes are presented. With the case study, we discuss a typical scenario using satellite and fixed connections as backhaul options and show the effectiveness and performance improvements of the proposed approach with recurrent neural network and ensemble methods.

preprint · 2022 · arXiv

Fluid-particle interaction regimes during the evolution of turbidity currents from a coupled LES/DEM model

In this paper, fluid-particle interactions in lock-exchange turbidity currents (TCs) over a flat bed are investigated using a model combining LES and DEM (Large-Eddy Simulation and Discrete Element Method). The reliability of this model is demonstrated via comparing the numerical solutions with measurements of the front positions, fluid velocity profile, and particle concentration profile of lock-exchange TCs. The following physical understandings are obtained. The vorticity field plays an important role for the current evolution by affecting the fluid lift force (i.e., in the direction normal to the fluid-particle slip velocity) acting on the particles. At the very beginning, a longitudinal positive lift force due to strong positive vorticity promotes longitudinal particle transport. Afterwards, the longitudinal lift force decreases and eventually becomes negative, with a magnitude that even exceeds that of the positive longitudinal drag force, because more and more of the settling particles are affected by the negative vorticity near the bottom wall caused by surface friction. Interestingly, in spite of the complex behavior of the fluid-particle interaction forces and their role in TC evolution, only a very small fraction of the initial particle gravitational potential energy is actually transformed into TC kinetic energy (both particle and fluid).

preprint · 2022 · arXiv

KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke

An automatic pitch correction system typically includes several stages, such as pitch extraction, deviation estimation, pitch shift processing, and cross-fade smoothing. However, designing these components with strategies often requires domain expertise and they are likely to fail on corner cases. In this paper, we present KaraTuner, an end-to-end neural architecture that predicts the pitch curve and resynthesizes the singing voice directly from the tuned pitch and vocal spectrum extracted from the original recordings. Several vital technical points have been introduced in KaraTuner to ensure pitch accuracy, pitch naturalness, timbre consistency, and sound quality. A feed-forward Transformer is employed in the pitch predictor to capture long-term dependencies in the vocal spectrum and musical note. We also develop a pitch-controllable vocoder based on a novel source-filter block and the Fre-GAN architecture. KaraTuner obtains a higher preference than the rule-based pitch correction approach through A/B tests, and perceptual experiments show that the proposed vocoder achieves significant advantages in timbre consistency and sound quality compared with the parametric WORLD vocoder, phase vocoder and CLPC vocoder.

preprint · 2022 · arXiv

OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization

As Deep Neural Networks (DNNs) are usually overparameterized and have millions of weight parameters, it is challenging to deploy these large DNN models on resource-constrained hardware platforms, e.g., smartphones. Numerous network compression methods such as pruning and quantization have been proposed to reduce the model size significantly, of which the key is to find a suitable compression allocation (e.g., pruning sparsity and quantization codebook) for each layer. Existing solutions obtain the compression allocation in an iterative/manual fashion while finetuning the compressed model, thus suffering from the efficiency issue. Different from the prior art, we propose a novel One-shot Pruning-Quantization (OPQ) in this paper, which analytically solves the compression allocation with pre-trained weight parameters only. During finetuning, the compression module is fixed and only weight parameters are updated. To our knowledge, OPQ is the first work that reveals that a pre-trained model is sufficient for solving pruning and quantization simultaneously, without any complex iterative/manual optimization at the finetuning stage. Furthermore, we propose a unified channel-wise quantization method that enforces all channels of each layer to share a common codebook, which leads to low bit-rate allocation without introducing the extra overhead brought by traditional channel-wise quantization. Comprehensive experiments on ImageNet with AlexNet/MobileNet-V1/ResNet-50 show that our method improves accuracy and training efficiency while obtaining significantly higher compression rates compared to the state-of-the-art.

preprint · 2022 · arXiv

Routing heterogeneous traffic in delay tolerant satellite networks

Delay Tolerant Networking (DTN) has been proposed as a new architecture to provide efficient store-carry-and-forward data transport in satellite networks. Since these networks rely on scheduled contact plans, the Contact Graph Routing (CGR) algorithm can be used to optimize routing and data delivery performance. However, in spite of the various improvements that have been made to CGR, there have been no significant proposals to prioritize traffic with different quality of service requirements. In this work we propose adaptations to CGR that allow performance improvements when sending traffic with different latency constraints, and develop a linear programming optimization model that works as a performance upper bound. The simulation results of the proposed schemes are promising and open the debate on other ways to improve performance while meeting the particular needs of heterogeneous traffic.

preprint · 2022 · arXiv

Transcranial photoacoustic computed tomography of human brain function

Herein we report the first in-human transcranial imaging of brain function using photoacoustic computed tomography. Functional responses to benchmark motor tasks were imaged on both the skull-less and the skull-intact hemispheres of a hemicraniectomy patient. The observed brain responses in these preliminary results demonstrate the potential of photoacoustic computed tomography for achieving transcranial functional imaging.

preprint · 2021 · arXiv

IoT-based Contact Tracing Systems for Infectious Diseases: Architecture and Analysis

The recent COVID-19 pandemic has become a major threat to human health and well-being. Non-pharmaceutical interventions such as contact tracing solutions are important to contain the spread of COVID-19-like infectious diseases. However, current contact tracing solutions are fragmented, make limited use of sensing technologies, and center on monitoring the interactions between individuals without an analytical framework for evaluating effectiveness. Therefore, we need to first explore a generic architecture for contact tracing in the context of today's Internet of Things (IoT) technologies based on a broad range of applicable sensors. A new architecture for IoT-based solutions to contact tracing is proposed and its overall effectiveness for disease containment is analyzed based on the traditional epidemiological models with the simulation results. The proposed work aims to provide a framework for assisting future designs and evaluation of IoT-based contact tracing solutions and to enable data-driven collective efforts on combating current and future infectious diseases.

preprint · 2021 · arXiv

Unsupervised Neural Rendering for Image Hazing

Image hazing aims to render a hazy image from a given clean one, which could be applied to a variety of practical applications such as gaming, filming, photographic filtering, and image dehazing. To generate plausible haze, we study two less-touched but challenging problems in hazy image rendering, namely, i) how to estimate the transmission map from a single image without auxiliary information, and ii) how to adaptively learn the airlight from exemplars, i.e., unpaired real hazy images. To this end, we propose a neural rendering method for image hazing, dubbed HazeGEN. To be specific, HazeGEN is a knowledge-driven neural network which estimates the transmission map by leveraging a new prior, i.e., there exists structural similarity (e.g., contour and luminance) between the transmission map and the input clean image. To adaptively learn the airlight, we build a neural module based on another new prior, i.e., the rendered hazy image and the exemplar are similar in the airlight distribution. To the best of our knowledge, this could be the first attempt to deeply render hazy images in an unsupervised fashion. Compared with existing haze generation methods, HazeGEN renders the hazy images in an unsupervised, learnable, and controllable manner, thus avoiding the labor-intensive efforts in paired data collection and the domain-shift issue in haze generation. Extensive experiments show the promising performance of our method compared with some baselines in both qualitative and quantitative comparisons. The code will be released on GitHub after acceptance.

preprint · 2020 · arXiv

A 5G NR based System Architecture for Real-Time Control with Batteryless RFID Sensors

The fifth-generation wireless networking (5G) technologies have been developed to meet various time-critical use cases with ultra-reliable, low-latency and massive machine-type communications, which are indispensable for tactile Internet applications. Recent advancements in very low-cost and batteryless radio-frequency identification (RFID) sensors have made it promising to deploy a massive number of such sensors for real-time sensing and control applications on a 5G New Radio (NR) network. However, the system design and performance of such applications have not been well studied. This paper proposes a novel system architecture for the representative batteryless RFID touch sensors in generic real-time control applications in a 5G NR mmWave environment. We discuss an implementation of the proposed system architecture that uses edge computing nodes on the 5G NR base station. A real-time performance evaluation, with a comparison against Long-Term Evolution (LTE) networks, shows the effectiveness of the proposed system architecture.

preprint · 2020 · arXiv

A new method for constructing EAQEC MDS codes

Entanglement-assisted quantum error-correcting (EAQEC) codes make use of preexisting entanglement between the sender and receiver to boost the rate of transmission. It is possible to construct an EAQEC code from any classical linear code, unlike standard quantum error-correcting codes, which can only be constructed from dual-containing codes. However, the number $c$ of pre-shared maximally entangled states is usually calculated by computer search. In this paper, we first give a new formula for calculating the number $c$ of pre-shared maximally entangled states. Then, using this formula, we construct three classes of new entanglement-assisted quantum error-correcting maximum-distance-separable (EAQEC MDS) codes.

preprint · 2020 · arXiv

Contrastive Clustering

In this paper, we propose a one-stage online clustering method called Contrastive Clustering (CC) which explicitly performs the instance- and cluster-level contrastive learning. To be specific, for a given dataset, the positive and negative instance pairs are constructed through data augmentations and then projected into a feature space. Therein, the instance- and cluster-level contrastive learning are respectively conducted in the row and column space by maximizing the similarities of positive pairs while minimizing those of negative ones. Our key observation is that the rows of the feature matrix could be regarded as soft labels of instances, and accordingly the columns could be further regarded as cluster representations. By simultaneously optimizing the instance- and cluster-level contrastive loss, the model jointly learns representations and cluster assignments in an end-to-end manner. Extensive experimental results show that CC remarkably outperforms 17 competitive clustering methods on six challenging image benchmarks. In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, which is an up to 19% (39%) performance improvement compared with the best baseline.
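
The row/column idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the cosine similarity, the temperature value, and the use of a single feature matrix for both levels are assumptions made for illustration, and any regularization the paper may use is left out. Each view is an N x K matrix; rows are contrasted as instances and columns as clusters.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v + 1e-12)


def pairwise_contrastive_loss(view_a, view_b, temperature=0.5):
    """InfoNCE-style loss: view_a[i] and view_b[i] form the positive pair,
    every other vector in the two views acts as a negative."""
    vectors = view_a + view_b
    n = len(view_a)
    total = 0.0
    for i, anchor in enumerate(vectors):
        positive = (i + n) % (2 * n)
        logits = [cosine(anchor, other) / temperature
                  for j, other in enumerate(vectors) if j != i]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - cosine(anchor, vectors[positive]) / temperature
    return total / (2 * n)


def transpose(matrix):
    """Turn rows (instances) into columns (clusters) and vice versa."""
    return [list(column) for column in zip(*matrix)]


def contrastive_clustering_loss(feat_a, feat_b, temperature=0.5):
    """Sum of the row-space (instance) and column-space (cluster) losses."""
    instance_loss = pairwise_contrastive_loss(feat_a, feat_b, temperature)
    cluster_loss = pairwise_contrastive_loss(
        transpose(feat_a), transpose(feat_b), temperature)
    return instance_loss + cluster_loss


# Two augmented views of two instances, as soft assignments over two clusters.
view_a = [[0.9, 0.1], [0.1, 0.9]]
consistent_b = [[0.8, 0.2], [0.2, 0.8]]   # agrees with view_a
swapped_b = [[0.2, 0.8], [0.8, 0.2]]      # disagrees with view_a

aligned = contrastive_clustering_loss(view_a, consistent_b)
misaligned = contrastive_clustering_loss(view_a, swapped_b)
```

In this toy setup the loss is lower when the two views agree on both instance similarity and cluster assignment, which is the behavior the abstract describes for positive pairs.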

preprint · 2019 · arXiv

Champion Team Paper: Dynamic Passing-Shooting Algorithm Based on CUDA of The RoboCup SSL 2019 Champion

ZJUNlict became the Small Size League Champion of RoboCup 2019 with 6 victories and 1 tie in its 7 games. Its overwhelming ball-handling and passing ability allowed ZJUNlict to greatly threaten its opponents while keeping its own goal almost free from threat. This paper presents the core technology of its ball-handling and robot movement, which consists of hardware optimization, dynamic passing and shooting strategy, and multi-agent cooperation and formation. We first describe the mechanical optimization on the placement of the capacitors, the redesign of the damping system of the dribbler and the electrical optimization on the replacement of the core chip. We then describe our passing point algorithm. The passing and shooting strategy can be separated into two different parts, where we search the passing point on SBIP-DPPS and evaluate the point based on the ball model. The statements and the conclusion should be supported by the performances and log of games on Small Size League RoboCup 2019.