Source author record

Yunhao Liu

Yunhao Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Cryptography and Security; Computer Vision; Networking and Internet Architecture; Artificial Intelligence; Machine Learning; Robotics; Distributed, Parallel, and Cluster Computing; Human-Computer Interaction

Catalog footprint

What is connected

13 works

8 topics

4 close collaborators

preprint · 2026 · arXiv

AdaptInfer: Adaptive Token Pruning for Vision-Language Model Inference with Dynamical Text Guidance

Vision-language models (VLMs) have achieved impressive performance on multimodal reasoning tasks such as visual question answering and image captioning, but their inference cost remains a significant challenge due to the large number of vision tokens processed during the prefill stage. Existing pruning methods often rely directly on attention patterns or on static text-prompt guidance, failing to exploit the dynamic internal signals generated during inference. To address these issues, we propose AdaptInfer, a plug-and-play framework for adaptive vision token pruning in VLMs. First, we introduce a fine-grained, dynamic text-guided pruning mechanism that reuses layer-wise text-to-text attention maps to construct soft priors over text-token importance, allowing more informed scoring of vision tokens at each stage. Second, we perform an offline analysis of cross-modal attention shifts and identify consistent inflection locations in inference, which inspire a more principled and efficient pruning schedule. Our method is lightweight, plug-and-play, and generalizable across multimodal tasks. Experimental results verify the effectiveness of the proposed method. For example, it reduces CUDA latency by 61.3% while maintaining an average accuracy of 93.1% on vanilla LLaVA-1.5-7B. Under the same token budget, AdaptInfer surpasses SOTA in accuracy.
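The general idea of text-guided vision token pruning can be illustrated with a minimal sketch. This is not AdaptInfer's implementation: the function names, the weighted-sum scoring rule, and the toy attention values are all assumptions made for illustration. It shows only how a prior over text-token importance could weight text-to-vision attention before keeping the top-scoring vision tokens.

```python
from typing import List


def score_vision_tokens(text_to_vision: List[List[float]],
                        text_prior: List[float]) -> List[float]:
    """Score each vision token as a prior-weighted sum of text-to-vision attention.

    text_to_vision[i][j] is the attention from text token i to vision token j.
    text_prior[i] is a (hypothetical) importance weight for text token i.
    """
    if len(text_to_vision) != len(text_prior):
        raise ValueError("one prior weight is required per text token")
    n_vision = len(text_to_vision[0])
    return [
        sum(text_prior[i] * text_to_vision[i][j] for i in range(len(text_prior)))
        for j in range(n_vision)
    ]


def prune_vision_tokens(scores: List[float], keep: int) -> List[int]:
    """Return the indices of the `keep` highest-scoring tokens, in original order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


# Toy example: 2 text tokens attending to 3 vision tokens.
attention = [[0.1, 0.6, 0.3],
             [0.5, 0.2, 0.3]]
prior = [0.8, 0.2]  # assumed: the first text token matters more
scores = score_vision_tokens(attention, prior)
kept = prune_vision_tokens(scores, keep=2)
```

In this toy case the second and third vision tokens are kept, because the heavily weighted first text token attends mostly to them.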

preprint · 2026 · arXiv

EgoKit: Towards Unified Low-Cost Egocentric Data Collection with Heterogeneous Devices

Egocentric video is increasingly used as a data source for robot learning, activity understanding, and embodied AI research, but collecting it at scale remains fragmented in practice: each candidate host device, such as an Android phone, iPhone, iPad, smart glasses, or extended reality (XR) headset, exposes a different SDK, a different policy on raw camera access, and different limitations on external USB cameras and on-device tracking. Synchronized ego-view and wrist-view capture is therefore typically obtained by either committing to a single proprietary platform or building one-off rigs that do not transfer across devices. To address this gap, we present EgoKit, a toolkit that exposes the same egocentric recording workflow across six heterogeneous host devices. Across all supported devices, EgoKit presents the same recording interaction and produces locally stored video with a uniform log format; on XR headsets, it additionally logs head pose and OpenXR-standard 26-joint hand tracking aligned to the video streams. The companion accessories, including two wrist cameras with mounts, a head strap, and a USB-C hub, add wrist-view capture to any supported host without custom hardware fabrication. EgoKit is available at https://egokit.chuange.org/.

preprint · 2026 · arXiv

HCInfer: An Efficient Inference System via Error Compensation for Resource-Constrained Devices

LLMs often struggle with memory-constrained deployment on consumer-grade hardware due to their massive parameter sizes. While existing solutions such as model compression and offloading improve deployment feasibility, they often suffer from substantial accuracy degradation or severe throughput bottlenecks. Recent error compensation methods recover accuracy through auxiliary LoRA-style branches, and we observe that these branches are inherently amenable to offloading: they require substantial parameter storage but access only a small subset of compensation parameters during each inference step. Motivated by this opportunity, we propose HCInfer, a heterogeneous inference system that offloads residual compensation to the CPU while executing the compressed backbone on the GPU, and further introduces an asynchronous compensation pipeline and sensitivity-aware dynamic rank allocation to hide compensation overhead and maximize accuracy recovery. Experimental results show that HCInfer achieves a maximum accuracy improvement of 5.2% on downstream tasks compared to the compressed model while sustaining a maximum speedup of 10.4x compared to the full-precision model.

preprint · 2026 · arXiv

Response-G1: Explicit Scene Graph Modeling for Proactive Streaming Video Understanding

Proactive streaming video understanding requires Video-LLMs to decide when to respond as a video unfolds, a task where existing methods often fall short due to their implicit, query-agnostic modeling of visual evidence. We introduce Response-G1, a novel framework that establishes explicit, structured alignment between the accumulated video evidence and the query's expected response conditions via scene graphs. The framework operates in three fine-tuning-free stages: (1) online query-guided scene graph generation from streaming clips; (2) memory-based retrieval of the most semantically relevant historical scene graphs; and (3) retrieval-augmented trigger prompting for per-frame "silence/response" decisions. By grounding both evidence and conditions in a shared graph representation, Response-G1 achieves more interpretable and accurate response timing decisions. Experimental results on established benchmarks demonstrate the superiority of our method in both proactive and reactive tasks, validating the advantage of explicit scene graph modeling and retrieval in streaming video understanding.

preprint · 2022 · arXiv

DiffSRL: Learning Dynamical State Representation for Deformable Object Manipulation with Differentiable Simulator

Dynamic state representation learning is an important task in robot learning. A latent space that can capture dynamics-related information has wide application in areas such as accelerating model-free reinforcement learning, closing the simulation-to-reality gap, and reducing motion planning complexity. However, current dynamic state representation learning methods scale poorly on complex dynamic systems such as deformable objects, and cannot directly embed a well-defined simulation function into the training pipeline. We propose DiffSRL, a dynamic state representation learning pipeline utilizing differentiable simulation that can embed complex dynamics models as part of the end-to-end training. We also integrate differentiable dynamic constraints as part of the pipeline, which provide incentives for the latent state to be aware of dynamical constraints. We further establish a state representation learning benchmark on a soft-body simulation system, PlasticineLab, and our model demonstrates superior performance in terms of capturing long-term dynamics as well as reward prediction.

preprint · 2021 · arXiv

Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges and Opportunities

The vast proliferation of sensor devices and Internet of Things enables the applications of sensor-based activity recognition. However, there exist substantial challenges that could influence the performance of the recognition system in practical scenarios. Recently, as deep learning has demonstrated its effectiveness in many areas, plenty of deep methods have been investigated to address the challenges in activity recognition. In this study, we present a survey of the state-of-the-art deep learning methods for sensor-based human activity recognition. We first introduce the multi-modality of the sensory data and provide information for public datasets that can be used for evaluation in different challenge tasks. We then propose a new taxonomy to structure the deep methods by challenges. Challenges and challenge-related deep methods are summarized and analyzed to form an overview of the current research progress. At the end of this work, we discuss the open issues and provide some insights for future directions.

preprint · 2014 · arXiv

Cloud-based Privacy Preserving Image Storage, Sharing and Search

High-resolution cameras produce a huge volume of high-quality images every day. It is extremely challenging to store, share and especially search those huge image collections, and an increasing number of cloud services are offered to support such functionalities. However, images tend to contain rich sensitive information (e.g., people, location and event), and people's privacy concerns hinder their ready participation in the services provided by untrusted third parties. In this work, we introduce PIC: a Privacy-preserving large-scale Image search system on Cloud. Our system enables efficient yet secure content-based image search with fine-grained access control, and it also provides privacy-preserving image storage and sharing among users. Users can specify who can/cannot search on their images when using the system, and they can search on others' images if they satisfy the condition specified by the image owners. The majority of the computationally intensive jobs are outsourced to the cloud side, and users only need to submit the query and receive the result throughout the entire image search. Specifically, to deal with massive images, we design our system to be suitable for distributed and parallel computation and introduce several optimizations to further expedite the search process. We implement a prototype of PIC including both cloud side and client side. The cloud side is a cluster of computers with a distributed file system (Hadoop HDFS) and MapReduce architecture (Hadoop MapReduce). The client side is built for both Windows OS laptops and Android phones. We evaluate the prototype system with large sets of real-life photos. Our security analysis and evaluation results show that PIC successfully protects image privacy at a low cost of computation and communication.

preprint · 2014 · arXiv

Enable Portrait Privacy Protection in Photo Capturing and Sharing

The wide adoption of wearable smart devices with onboard cameras greatly increases people's concern about privacy infringement. Here we explore the possibility of easing persons from photos captured by smart devices according to their privacy protection requirements. To make this work, we need to address two challenges: 1) how to let users explicitly express their privacy protection intention, and 2) how to associate the privacy requirements with persons in captured photos accurately and efficiently. Furthermore, the association process itself should not cause portrait information leakage and should be accomplished in a privacy-preserving way. In this work, we design, develop, and evaluate a protocol that enables a user to flexibly express her privacy requirement and empowers the photo service provider (or image taker) to exert the privacy protection policy. Leveraging the visual distinguishability of people in the field-of-view and the dimension-order-independent property of vector similarity measurement, we achieve high accuracy and low overhead. We implement a prototype system, and our evaluation results on both the trace-driven and real-life experiments confirm the feasibility and efficiency of our system.

preprint · 2014 · arXiv

Outsource Photo Sharing and Searching for Mobile Devices With Privacy Protection

With the proliferation of mobile devices, cloud-based photo sharing and searching services are becoming common due to the mobile devices' resource constraints. Meanwhile, there is also increasing concern about privacy in photos. In this work, we present a framework \ourprotocolNSP, which enables cloud servers to provide privacy-preserving photo sharing and search as a service to mobile device users. Privacy-seeking users can share their photos via our framework to allow only their authorized friends to browse and search their photos using resource-bounded mobile devices. This is achieved by our carefully designed architecture and novel outsourced privacy-preserving computation protocols, through which no information about the outsourced photos or even the search contents (including the results) would be revealed to the cloud servers. Our framework is compatible with most of the existing image search technologies, and it requires few changes to the existing cloud systems. The evaluation of our prototype system with 31,772 real-life images shows the communication and computation efficiency of our system.

preprint · 2013 · arXiv

Accurate Indoor Localization Using Acoustic Direction Finding via Smart Phones

We propose and implement a novel indoor localization scheme, Swadloon, built upon accurate acoustic direction finding. Swadloon leverages the sensors of the smartphone without requiring any specialized devices. Swadloon does not rely on any fingerprints and is very easy to use: a user only needs to shake the phone for a short duration before walking and localization. Our Swadloon design exploits a key observation: the relative shift and velocity of the phone-shaking movement correspond to the subtle phase and frequency shift of the Doppler effect experienced in the acoustic signal received by the phone. A novel method is designed to derive the direction from the phone to the acoustic source by combining the velocity calculated from the subtle Doppler shift with the one from the inertial sensors of the phone. Real-time precise localization and tracking is then enabled by using a few anchor speakers with known locations. The major challenges in implementing Swadloon are to measure the frequency shift precisely and to estimate the shaking velocity accurately when the speed of phone-shaking is low and changes arbitrarily. We propose rigorous methods to address these challenges, then design and deploy Swadloon on several floors of a building, each with an area of about 2000 m². Our extensive experiments show that the mean error of direction finding is around 2.1 degrees when the acoustic source is within a range of 32 m. For indoor localization, the 90th-percentile errors are under 0.92 m, while the maximum error is 1.73 m and the mean is about 0.5 m. For real-time tracking, the errors are within 0.4 m for walks of 51 m.
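The physical relation behind Doppler-based direction finding can be sketched in a few lines. This is only the textbook Doppler formula for a moving receiver and a stationary source, not Swadloon's actual algorithm: the function names, the 343 m/s speed of sound, and the 19 kHz tone in the example are assumptions chosen for illustration. The Doppler shift gives the velocity component along the line to the source, and comparing it with the phone's overall speed (for example, from inertial sensors) gives the angle between the motion and the source direction.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees Celsius


def doppler_shift_hz(emitted_hz: float, radial_velocity: float,
                     speed_of_sound: float = SPEED_OF_SOUND_M_S) -> float:
    """Frequency shift seen by a receiver moving toward a stationary source."""
    return emitted_hz * radial_velocity / speed_of_sound


def radial_velocity_from_doppler(emitted_hz: float, received_hz: float,
                                 speed_of_sound: float = SPEED_OF_SOUND_M_S) -> float:
    """Receiver velocity toward the source (m/s); positive when approaching."""
    return speed_of_sound * (received_hz - emitted_hz) / emitted_hz


def angle_to_source_deg(radial_velocity: float, speed: float) -> float:
    """Angle between the motion direction and the direction to the source."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    # Clamp to [-1, 1] so measurement noise cannot push acos out of its domain.
    ratio = max(-1.0, min(1.0, radial_velocity / speed))
    return math.degrees(math.acos(ratio))


# Hypothetical numbers: a 19 kHz tone and a phone moving at 1.0 m/s,
# of which 0.5 m/s is directed toward the speaker.
shift = doppler_shift_hz(19_000.0, 0.5)
recovered = radial_velocity_from_doppler(19_000.0, 19_000.0 + shift)
angle = angle_to_source_deg(recovered, speed=1.0)
```

With these numbers the shift is under 30 Hz on a 19 kHz carrier, which suggests why measuring the frequency shift precisely is described as a major challenge.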

preprint · 2013 · arXiv

DorFin: WiFi Fingerprint-based Localization Revisited

Although WiFi fingerprint-based indoor localization is attractive, its accuracy remains a primary challenge, especially in mobile environments. Existing approaches either appeal to physical layer information or rely on extra wireless signals for high accuracy. In this paper, we revisit the RSS fingerprint-based localization scheme and reveal crucial observations that act as the root causes of localization errors, yet are surprisingly overlooked or even unseen in previous works. Specifically, we recognize APs' diverse discrimination for fingerprinting a specific location, observe the RSS inconsistency caused by signal fluctuations and human body blockages, and uncover the RSS outdated problem on commodity smartphones. Inspired by these insights, we devise a discrimination factor to quantify different APs' discrimination, incorporate robust regression to tolerate outlier measurements, and reassemble different fingerprints to cope with outdated RSSs. Combining these techniques in a unified solution, we propose DorFin, a novel scheme of fingerprint generation, representation, and matching, which yields remarkable accuracy without incurring extra cost. Extensive experiments demonstrate that DorFin achieves a mean error of 2 meters and, more importantly, bounds the 95th percentile error under 5.5 meters; these are about 56% and 69% lower, respectively, compared with state-of-the-art schemes such as Horus and RADAR.

preprint · 2012 · arXiv

Efficient and Secure Key Extraction using CSI without Chasing down Errors

Generating keys and keeping them secret is critical in secure communications. Due to the "open-air" nature of wireless communications, key distribution is more susceptible to attacks there. An ingenious solution is for the two communicating parties to generate common secret keys separately, without the need for key exchange or distribution, and to regenerate them as needed. A recent, promising approach is to extract keys by measuring the random variation in wireless channels, e.g., RSS. In this paper, we propose an efficient Secret Key Extraction protocol without Chasing down Errors, SKECE. It establishes common cryptographic keys for two communicating parties in wireless networks via the real-time measurement of Channel State Information (CSI). It outperforms RSS-based approaches for key generation in terms of multiple subcarriers measurement, perfect symmetry in channel, rapid decorrelation with distance, and high sensitivity towards environments. In the SKECE design, we also propose effective mechanisms such as the adaptive key stream generation, leakage resilient consistence validation, and weighted key recombination, to fully exploit the excellent properties of CSI. We implement SKECE on off-the-shelf 802.11n devices and evaluate its performance via extensive experiments. The results demonstrate that SKECE achieves a more than 3x throughput gain in the key generation from one subcarrier in static scenarios, and due to its high efficiency, a 50% reduction in the communication overhead compared to the state-of-the-art RSS-based approaches.

preprint · 2012 · arXiv

Triggercast: Enabling Wireless Collisions Constructive

It is generally considered that concurrent transmissions should be avoided in order to reduce collisions in wireless sensor networks. Constructive interference (CI) envisions concurrent transmissions that interfere positively at the receiver. CI potentially allows orders-of-magnitude reductions in energy consumption and improvements in link quality. In this paper, we theoretically introduce a sufficient condition to construct CI with IEEE 802.15.4 radio for the first time. Moreover, we propose Triggercast, a distributed middleware, and show it is feasible to generate CI in TMote Sky sensor nodes. To synchronize transmissions of multiple senders at the chip level, Triggercast effectively compensates for propagation and radio processing delays, and has 95th percentile synchronization errors of at most 250 ns. Triggercast also intelligently decides which co-senders participate in simultaneous transmissions, and aligns their transmission time to maximize the overall link PRR, under the condition of maximal system robustness. Extensive experiments in real testbeds reveal that Triggercast significantly improves PRR from 5% to 70% with 7 concurrent senders. We also demonstrate that Triggercast provides on average 1.3× PRR performance gains when integrated with existing data forwarding protocols.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint