Researcher profile

Sibo Zhang

Sibo Zhang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2022arXiv

Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary

With the advance of deep learning technology, automatic video generation from audio or text has become an emerging and promising research topic. In this paper, we present a novel approach to synthesize video from the text. The method builds a phoneme-pose dictionary and trains a generative adversarial network (GAN) to generate video from interpolated phoneme poses. Compared to audio-driven video generation algorithms, our approach has a number of advantages: 1) It only needs a fraction of the training data used by an audio-driven approach; 2) It is more flexible and not subject to vulnerability due to speaker variation; 3) It significantly reduces the preprocessing, training and inference time. We perform extensive experiments to compare the proposed method with state-of-the-art talking face generation methods on a benchmark dataset and datasets of our own. The results demonstrate the effectiveness and superiority of our approach.

preprint2020arXiv

DVI: Depth Guided Video Inpainting for Autonomous Driving

To get clear street-view and photo-realistic simulation in autonomous driving, we present an automatic video inpainting algorithm that can remove traffic agents from videos and synthesize missing regions with the guidance of depth/point cloud. By building a dense 3D map from stitched point clouds, frames within a video are geometrically correlated via this common 3D map. In order to fill a target inpainting area in a frame, it is straightforward to transform pixels from other frames into the current one with correct occlusion. Furthermore, we are able to fuse multiple videos through 3D point cloud registration, making it possible to inpaint a target video with multiple source videos. The motivation is to solve the long-time occlusion problem where an occluded area has never been visible in the entire video. To our knowledge, we are the first to fuse multiple videos for video inpainting. To verify the effectiveness of our approach, we build a large inpainting dataset in the real urban road environment with synchronized images and Lidar data including many challenge scenes, e.g., long time occlusion. The experimental results show that the proposed approach outperforms the state-of-the-art approaches for all the criteria, especially the RMSE (Root Mean Squared Error) has been reduced by about 13%.

preprint2017arXiv

Event-Radar: Real-time Local Event Detection System for Geo-Tagged Tweet Streams

The local event detection is to use posting messages with geotags on social networks to reveal the related ongoing events and their locations. Recent studies have demonstrated that the geo-tagged tweet stream serves as an unprecedentedly valuable source for local event detection. Nevertheless, how to effectively extract local events from large geo-tagged tweet streams in real time remains challenging. A robust and efficient cloud-based real-time local event detection software system would benefit various aspects in the real-life society, from shopping recommendation for customer service providers to disaster alarming for emergency departments. We use the preliminary research GeoBurst as a starting point, which proposed a novel method to detect local events. GeoBurst+ leverages a novel cross-modal authority measure to identify several pivots in the query window. Such pivots reveal different geo-topical activities and naturally attract related tweets to form candidate events. It further summarises the continuous stream and compares the candidates against the historical summaries to pinpoint truly interesting local events. We mainly implement a website demonstration system Event-Radar with an improved algorithm to show the real-time local events online for public interests. Better still, as the query window shifts, our method can update the event list with little time cost, thus achieving continuous monitoring of the stream.