Source author record

Yuntao Wang

Yuntao Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Computer Vision, Cryptography and Security, Human-Computer Interaction, cond-mat.stat-mech, eess.SP, Computer Science and Game Theory, Machine Learning, math.NT

Catalog footprint

What is connected

11 works

8 topics

4 close collaborators

preprint, 2026, arXiv

Depth-Synergized Mamba Meets Memory Experts for All-Day Image Reflection Separation

Image reflection separation aims to disentangle the transmission layer and the reflection layer from a blended image. Existing methods rely on limited information from a single image, tending to confuse the two layers when their contrasts are similar, a challenge more severe at night. To address this issue, we propose the Depth-Memory Decoupling Network (DMDNet). It employs the Depth-Aware Scanning (DAScan) to guide Mamba toward salient structures, promoting information flow along semantic coherence to construct stable states. Working in synergy with DAScan, the Depth-Synergized State-Space Model (DS-SSM) modulates the sensitivity of state activations by depth, suppressing the spread of ambiguous features that interfere with layer disentanglement. Furthermore, we introduce the Memory Expert Compensation Module (MECM), leveraging cross-image historical knowledge to guide experts in providing layer-specific compensation. To address the lack of datasets for nighttime reflection separation, we construct the Nighttime Image Reflection Separation (NightIRS) dataset. Extensive experiments demonstrate that DMDNet outperforms state-of-the-art methods in both daytime and nighttime.

preprint, 2026, arXiv

EgoIntrospect: An Egocentric Dataset and Benchmark for User-Centric Internal State Reasoning

Despite extensive efforts on egocentric video datasets and benchmarks, understanding users' internal states, which is crucial for enabling seamless AI assistant experiences, remains largely overlooked. In this work, we introduce EgoIntrospect, the first egocentric dataset captured in user-driven scenarios with self-annotations that explicitly reveal users' interactive intentions with AI assistants. EgoIntrospect was collected using a cross-device setup, providing synchronized video, audio, gaze, motion, and physiological signals. It consists of 180 hours of recordings from 60 subjects, with an average recording duration of 3 hours per subject. Leveraging EgoIntrospect, we formalize a suite of tasks centered on user internal states, including affective experience, interactive intent, and cognitive memory. We further process the annotations to construct benchmarks that evaluate the ability of modern multimodal large language models to reason about users' internal states from egocentric observations. Experiments on our benchmark suggest that existing multimodal large language models struggle to effectively leverage multimodal signals to infer users' subjective internal states. The dataset and annotations will be made publicly available to advance research in egocentric vision and wearable AI assistants. Project page: https://ego-introspect.github.io/

preprint, 2026, arXiv

Vol-Mark: A Watermark for 3D Medical Volume Data Via Cubic Difference Expansion and Contrastive Learning

Today, advances in medical technology extensively utilize 3D volume data for accurate and efficient diagnostics. However, sharing these data across networks in telemedicine poses significant security risks of data tampering and unauthorized copying. To address these challenges, this paper proposes a novel reversible-zero watermarking approach, termed Vol-Mark, for medical volume data to protect their ownership and authenticity in telemedicine. The proposed Vol-Mark method offers two key benefits: 1) it designs a volume data feature extractor that leverages contrastive learning to efficiently extract discriminative and stable volumetric features, ensuring robustness against 3D attacks; 2) it introduces the cubic difference expansion (c-DE) technique, which leverages the 3D integer wavelet transform to embed watermark bits into neighboring voxels within cubes at low-frequency coefficients. The voxel differences within each cube are expanded to create embedding space, and a majority voting mechanism is employed during extraction to enhance reliability. The embedding process incurs low distortion and supports lossless removal, thereby preserving the integrity and diagnostic accuracy of medical volume data. Through these two benefits, Vol-Mark enables both integrity verification and ownership verification. Integrity verification is performed first, and ownership verification through hypothesis testing is then conducted to enhance reliability, particularly under data tampering or watermark removal attacks. Comprehensive experimental results show the effectiveness of the proposed method and its superior robustness against conventional, geometric, and hybrid attacks on medical volume data. In particular, across multi-task evaluations, Vol-Mark consistently achieves an ACC above 0.90 in most attack scenarios, outperforming existing methods by a clear margin.
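As a rough illustration of the difference-expansion idea mentioned in this abstract, the sketch below applies the classic pairwise integer difference expansion to two sample values and adds a majority vote over redundant copies of a bit. It is not the paper's c-DE: the abstract does not specify how cubes, wavelet coefficients, or overflow are handled, so the function names (`de_embed`, `de_extract`, `majority_vote`) and the pairwise simplification are assumptions made purely for illustration.

```python
def de_embed(x, y, bit):
    """Embed one bit into an integer pair by expanding their difference.

    Classic pairwise difference expansion (illustrative assumption, not
    the paper's c-DE). Overflow/underflow checks are deliberately omitted.
    """
    avg = (x + y) // 2           # integer average, preserved by embedding
    diff = x - y                 # difference to be expanded
    expanded = 2 * diff + bit    # shift difference left, append the bit
    return avg + (expanded + 1) // 2, avg - expanded // 2


def de_extract(x_marked, y_marked):
    """Recover the original pair and the embedded bit (lossless removal)."""
    avg = (x_marked + y_marked) // 2
    expanded = x_marked - y_marked
    bit = expanded & 1           # least significant bit is the payload
    diff = expanded // 2         # floor division undoes the expansion
    return avg + (diff + 1) // 2, avg - diff // 2, bit


def majority_vote(bits):
    """Return 1 if strictly more than half of the extracted copies are 1."""
    return int(2 * sum(bits) > len(bits))


if __name__ == "__main__":
    marked = de_embed(206, 201, 1)
    print(marked)                # marked pair
    print(de_extract(*marked))   # original pair plus the embedded bit
    print(majority_vote([1, 0, 1]))
```

Because the integer average of the pair is unchanged by embedding, the original values can be restored exactly, which is the sense in which such schemes are called reversible.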

preprint, 2022, arXiv

A Survey on Metaverse: Fundamentals, Security, and Privacy

Metaverse, as an evolving paradigm of the next-generation Internet, aims to build a fully immersive, hyper spatiotemporal, and self-sustaining virtual shared space for humans to play, work, and socialize. Driven by recent advances in emerging technologies such as extended reality, artificial intelligence, and blockchain, metaverse is stepping from science fiction to an upcoming reality. However, severe privacy invasions and security breaches (inherited from underlying technologies or emerging in the new digital ecology) of metaverse can impede its wide deployment. At the same time, a series of fundamental challenges (e.g., scalability and interoperability) can arise in metaverse security provisioning owing to the intrinsic characteristics of metaverse, such as immersive realism, hyper spatiotemporality, sustainability, and heterogeneity. In this paper, we present a comprehensive survey of the fundamentals, security, and privacy of metaverse. Specifically, we first investigate a novel distributed metaverse architecture and its key characteristics with ternary-world interactions. Then, we discuss the security and privacy threats, present the critical challenges of metaverse systems, and review the state-of-the-art countermeasures. Finally, we draw open research directions for building future metaverse systems.

preprint, 2022, arXiv

Energy-Efficient and Physical Layer Secure Computation Offloading in Blockchain-Empowered Internet of Things

This paper investigates computation offloading in blockchain-empowered Internet of Things (IoT), where the task data uploading link from sensors to a base station (BS) is protected by intelligent reflecting surface (IRS)-assisted physical layer security (PLS). After receiving task data, the BS allocates computational resources provided by mobile edge computing (MEC) servers to help sensors perform tasks. Existing blockchain-based computation offloading schemes usually focus on network performance improvements, such as energy consumption minimization or latency minimization, and neglect the Gas fee for computation offloading, resulting in the dissatisfaction of high Gas providers. Also, the secrecy rate during the data uploading process cannot be measured by a steady value because of the time-varying characteristics of IRS-based wireless channels, so allocating computational resources based on a secrecy rate measured before data uploading is inappropriate. In this paper, we design a Gas-oriented computation offloading scheme that guarantees a low degree of dissatisfaction of sensors, while reducing energy consumption. Also, we deduce the ergodic secrecy rate of IRS-assisted PLS transmission, which can represent the global secrecy performance, to allocate computational resources. The simulations show that the proposed scheme has lower energy consumption compared to existing schemes, and ensures that the node paying higher Gas gets stronger computational resources.

preprint, 2022, arXiv

FaceOri: Tracking Head Position and Orientation Using Ultrasonic Ranging on Earphones

Face orientation can often indicate users' intended interaction target. In this paper, we propose FaceOri, a novel face tracking technique based on acoustic ranging using earphones. FaceOri can leverage the speaker on a commodity device to emit an ultrasonic chirp, which is picked up by the set of microphones on the user's earphone, and then processed to calculate the distance from each microphone to the device. These measurements are used to derive the user's face orientation and distance with respect to the device. We conduct a ground truth comparison and user study to evaluate FaceOri's performance. The results show that the system can determine whether the user orients to the device with 93.5% accuracy within a 1.5-meter range. Furthermore, FaceOri can continuously track the user's head orientation with a median absolute error of 10.9 mm in distance, 3.7 degrees in yaw, and 5.8 degrees in pitch. FaceOri can allow for convenient hands-free control of devices and produce more intelligent context-aware interaction.

preprint, 2022, arXiv

Mobile Wireless Rechargeable UAV Networks: Challenges and Solutions

Unmanned aerial vehicles (UAVs) can help facilitate cost-effective and flexible service provisioning in future smart cities. Nevertheless, UAV applications generally suffer severe flight time limitations due to constrained onboard battery capacity, necessitating frequent battery recharging or replacement when performing persistent missions. Utilizing wireless mobile chargers, such as vehicles with wireless charging equipment, for on-demand self-recharging has been envisioned as a promising solution to address this issue. In this article, we present a comprehensive study of vehicle-assisted wireless rechargeable UAV networks (VWUNs) to promote on-demand, secure, and efficient UAV recharging services. Specifically, we first discuss the opportunities and challenges of deploying VWUNs and review state-of-the-art solutions in this field. We then propose a secure and privacy-preserving VWUN framework for UAVs and ground vehicles based on differential privacy (DP). Within this framework, an online double auction mechanism is developed for optimal charging scheduling, and a two-phase DP algorithm is devised to preserve the sensitive bidding and energy trading information of participants. Experimental results demonstrate that the proposed framework can effectively enhance charging efficiency and security. Finally, we outline promising directions for future research in this emerging field.

preprint, 2022, arXiv

MobilePhys: Personalized Mobile Camera-Based Contactless Physiological Sensing

Camera-based contactless photoplethysmography refers to a set of popular techniques for contactless physiological measurement. The current state-of-the-art neural models are typically trained in a supervised manner using videos accompanied by gold standard physiological measurements. However, they often generalize poorly to out-of-domain examples (i.e., videos that are unlike those in the training set). Personalizing models can help improve model generalizability, but many personalization techniques still require some gold standard data. To help alleviate this dependency, in this paper, we present a novel mobile sensing system called MobilePhys, the first mobile personalized remote physiological sensing system, that leverages both front and rear cameras on a smartphone to generate high-quality self-supervised labels for training personalized contactless camera-based PPG models. To evaluate the robustness of MobilePhys, we conducted a user study with 39 participants who completed a set of tasks under different mobile devices, lighting conditions/intensities, motion tasks, and skin types. Our results show that MobilePhys significantly outperforms the state-of-the-art on-device supervised training and few-shot adaptation methods. Through extensive user studies, we further examine how MobilePhys performs in complex real-world settings. We envision that calibrated or personalized camera-based contactless PPG models generated from our proposed dual-camera mobile sensing system will open the door for numerous future applications such as smart mirrors, fitness and mobile health applications.

preprint, 2021, arXiv

SplitSR: An End-to-End Approach to Super-Resolution on Mobile Devices

Super-resolution (SR) is a coveted image processing technique for mobile apps ranging from the basic camera apps to mobile health. Existing SR algorithms rely on deep learning models with significant memory requirements, so they have yet to be deployed on mobile devices and instead operate in the cloud to achieve feasible inference time. This shortcoming prevents existing SR methods from being used in applications that require near real-time latency. In this work, we demonstrate state-of-the-art latency and accuracy for on-device super-resolution using a novel hybrid architecture called SplitSR and a novel lightweight residual block called SplitSRBlock. The SplitSRBlock supports channel-splitting, allowing the residual blocks to retain spatial information while reducing the computation in the channel dimension. SplitSR has a hybrid design consisting of standard convolutional blocks and lightweight residual blocks, allowing people to tune SplitSR for their computational budget. We evaluate our system on a low-end ARM CPU, demonstrating both higher accuracy and up to 5 times faster inference than previous approaches. We then deploy our model onto a smartphone in an app called ZoomSR to demonstrate the first-ever instance of on-device, deep learning-based SR. We conducted a user study with 15 participants to have them assess the perceived quality of images that were post-processed by SplitSR. Relative to bilinear interpolation -- the existing standard for on-device SR -- participants showed a statistically significant preference when looking at both images (Z=-9.270, p<0.01) and text (Z=-6.486, p<0.01).

preprint, 2020, arXiv

LLL and stochastic sandpile models

The aim of the present paper is to suggest that statistical physics provides the correct language to understand the practical behavior of the LLL algorithm, most of which is left unexplained to this day. To this end, we propose sandpile models that imitate LLL with compelling accuracy, and prove for these models some of the most desired statements regarding LLL. We also formulate a few conjectures that formally capture our heuristics and would serve as milestones for further development of the theory.

preprint, 2019, arXiv

A stochastic variant of the abelian sandpile model

We introduce a natural stochastic extension, called SSP, of the abelian sandpile model (ASM), which shares many mathematical properties with ASM, yet radically differs in its physical behavior, for example in terms of the shape of the steady state and of the avalanche size distribution. We establish a basic theory of SSP analogous to that of ASM, and present a brief numerical study of its behavior. Our original motivation for studying SSP stems from its connection to the LLL algorithm established in another work by the authors [5]. The importance of understanding how LLL works cannot be overstated, especially from the point of view of lattice-based cryptography. We believe SSP serves as a tractable toy model of LLL that would help further our understanding of it.
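For readers unfamiliar with the abelian sandpile model referenced in this abstract, the following is a minimal sketch of the standard deterministic ASM on a rectangular grid, where any site holding four or more grains topples and sends one grain to each of its four neighbors, with grains lost at the boundary. It does not implement the stochastic SSP variant, whose toppling rule is not given in the abstract. The function name `stabilize` and the grid representation are assumptions chosen for illustration.

```python
def stabilize(grid):
    """Topple a sandpile until every site holds fewer than 4 grains.

    Standard abelian sandpile on a rectangular grid with open boundaries
    (illustrative sketch; not the SSP model from the paper).
    Returns the stable grid and the number of topplings performed.
    """
    rows, cols = len(grid), len(grid[0])
    grid = [row[:] for row in grid]  # work on a copy
    pending = [(r, c) for r in range(rows) for c in range(cols)
               if grid[r][c] >= 4]
    topplings = 0
    while pending:
        r, c = pending.pop()
        if grid[r][c] < 4:
            continue                 # stale entry, already stable
        grid[r][c] -= 4              # the site topples once
        topplings += 1
        if grid[r][c] >= 4:
            pending.append((r, c))   # still unstable, revisit later
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                grid[nr][nc] += 1    # neighbor receives one grain
                if grid[nr][nc] >= 4:
                    pending.append((nr, nc))
            # grains sent past the edge are simply lost
    return grid, topplings


if __name__ == "__main__":
    final, count = stabilize([[0, 0, 0], [0, 4, 0], [0, 0, 0]])
    for row in final:
        print(row)
    print("topplings:", count)
```

The "abelian" property means the final stable configuration does not depend on the order in which unstable sites are toppled, so the stack-based order used here is an arbitrary choice.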

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint