Source author record

Zeyu Xiong

Zeyu Xiong appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Cryptography and Security Human-Computer Interaction

Catalog footprint

What is connected

4works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2023arXiv

Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding

Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with query sentence for reasoning. However, we argue that these methods have overlooked two indispensable issues: 1) Boundary-bias: The annotated target segment generally refers to two specific frames as corresponding start and end timestamps. The video downsampling process may lose these two frames and take the adjacent irrelevant frames as new boundaries. 2) Reasoning-bias: Such incorrect new boundary frames also lead to the reasoning bias during frame-query interaction, reducing the generalization ability of model. To alleviate above limitations, in this paper, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames to enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationship among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. Such mechanism is also able to supplement the absent consecutive visual semantics to the sampled sparse frames for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.

preprint2022arXiv

BROOK Dataset: A Playground for Exploiting Data-Driven Techniques in Human-Vehicle Interactive Designs

Emerging Autonomous Vehicles (AV) breed great potentials to exploit data-driven techniques for adaptive and personalized Human-Vehicle Interactions. However, the lack of high-quality and rich data supports limits the opportunities to explore the design space of data-driven techniques, and validate the effectiveness of concrete mechanisms. Our goal is to initialize the efforts to deliver the building block for exploring data-driven Human-Vehicle Interaction designs. To this end, we present BROOK dataset, a multi-modal dataset with facial video records. We first brief our rationales to build BROOK dataset. Then, we elaborate how to build the current version of BROOK dataset via a year-long study, and give an overview of the dataset. Next, we present three example studies using BROOK to justify the applicability of BROOK dataset. We also identify key learning lessons from building BROOK dataset, and discuss about how BROOK dataset can foster an extensive amount of follow-up studies.

preprint2022arXiv

Gaussian Kernel-based Cross Modal Network for Spatio-Temporal Video Grounding

Spatial-Temporal Video Grounding (STVG) is a challenging task which aims to localize the spatio-temporal tube of the interested object semantically according to a natural language query. Most previous works not only severely rely on the anchor boxes extracted by Faster R-CNN, but also simply regard the video as a series of individual frames, thus lacking their temporal modeling. Instead, in this paper, we are the first to propose an anchor-free framework for STVG, called Gaussian Kernel-based Cross Modal Network (GKCMN). Specifically, we utilize the learned Gaussian Kernel-based heatmaps of each video frame to locate the query-related object. A mixed serial and parallel connection network is further developed to leverage both spatial and temporal relations among frames for better grounding. Experimental results on VidSTG dataset demonstrate the effectiveness of our proposed GKCMN.

preprint2022arXiv

HUT: Enabling High-UTility, Batched Queries under Differential Privacy Protection for Internet-of-Vehicles

The emerging trends of Internet-of-Vehicles (IoV) demand centralized servers to collect/process sensitive data with limited computational resources on a single vehicle. Such centralizations of sensitive data demand practical privacy protections. One widely-applied paradigm, Differential Privacy, can provide strong guarantees over sensitive data by adding noises. However, directly applying DP for IoV incurs significant challenges for data utility and effective protection. We observe that the key issue about DP-enabled protection in IoV lies in how to synergistically combine DP with special characteristics of IoV, whose query sequences are usually formed as unbalanced batches due to frequent interactions between centralized servers and edge vehicles. To this end, we propose HUT, a new algorithm to enable High UTility for DP-enabled protection in IoV. Our key insight is to leverage the inherent characteristics in IoV: the unbalanced batches. Our key idea is to aggregate local batches and apply Order Constraints, so that information loss from DP protection can be mitigated. We evaluate the effectiveness of HUT against the state-of-the-art DP protection mechanisms. The results show that HUT can provide much lower information loss by 95.69\% and simultaneously enable strong mathematically-guaranteed protection over sensitive data.

Zeyu Xiong

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding

BROOK Dataset: A Playground for Exploiting Data-Driven Techniques in Human-Vehicle Interactive Designs

Gaussian Kernel-based Cross Modal Network for Spatio-Temporal Video Grounding

HUT: Enabling High-UTility, Batched Queries under Differential Privacy Protection for Internet-of-Vehicles