Researcher profile

Bin Zhao

Bin Zhao contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

VeloGauss: Learning Physically Consistent Gaussian Velocity Fields from Videos

In this paper, we aim to jointly model the geometry, appearance, and physical information of 3D scenes solely from dynamic multi-view videos, without relying on any physical priors. Existing works typically employ physical losses merely as soft constraints or integrate physical simulations into neural networks; however, these approaches often fail to effectively learn complex motion physics. Although modeling velocity fields holds the potential to capture authentic physical information, due to the lack of appropriate physical constraints, current methods are unable to correctly learn the interaction mechanisms between rigid and non-rigid particles. To address this, we propose VeloGauss, designed to learn the physical properties of complex dynamic 3D scenes without physical priors. Our method learns the velocity field for each Gaussian particle by introducing a Physics Code and a Particle Dynamics System, and ultimately incorporates Global Physical Constraints to ensure the physical consistency of the scene. Extensive experiments on four public datasets demonstrate that our method outperforms achieves state-of-the-art performance in both Novel View Interpolation and Future Frame Extrapolation tasks.

preprint2022arXiv

Audio-Visual Collaborative Representation Learning for Dynamic Saliency Prediction

The Dynamic Saliency Prediction (DSP) task simulates the human selective attention mechanism to perceive the dynamic scene, which is significant and imperative in many vision tasks. Most of existing methods only consider visual cues, while neglect the accompanied audio information, which can provide complementary information for the scene understanding. In fact, there exists a strong relation between auditory and visual cues, and humans generally perceive the surrounding scene by collaboratively sensing these cues. Motivated by this, an audio-visual collaborative representation learning method is proposed for the DSP task, which explores the audio modality to better predict the dynamic saliency map by assisting vision modality. The proposed method consists of three parts: 1) audio-visual encoding, 2) audio-visual location, and 3) collaborative integration parts. Firstly, a refined SoundNet architecture is adopted to encode audio modality for obtaining corresponding features, and a modified 3D ResNet-50 architecture is employed to learn visual features, containing both spatial location and temporal motion information. Secondly, an audio-visual location part is devised to locate the sound source in the visual scene by learning the correspondence between audio-visual information. Thirdly, a collaborative integration part is devised to adaptively aggregate audio-visual information and center-bias prior to generate the final saliency map. Extensive experiments are conducted on six challenging audiovisual eye-tracking datasets, including DIEM, AVAD, Coutrot1, Coutrot2, SumMe, and ETMD, which shows significant superiority over state-of-the-art DSP models.

preprint2022arXiv

Eddy Covariance: A Scientometric Review (1981-2018)

The history of eddy covariance (EC) measuring system could be dated back to 100 years ago, but it was not until the recent decades that EC gains popularity and being widely used in global change ecological studies, with explosion of related work published in papers from various journals. Investigating 8297 literature related with EC from 1981 to 2018, we make a comprehensive and critical review of scientific development of EC from a scientometric perspective. First, the paper outlines general bibliometric statistics, including publication number, country contribution, productive institutions, active authors, journal distribution, highly cited articles and fund support, to provide an informative picture of EC studies. Second, research trends are revealed by network visualization and modeling based on keyword analysis, from where we could discover the knowledge structure of EC and detect the research focus and hotspots transitions at different periods. Third, collaboration in EC research community has been explored. FLUXNET is the largest global network uniting EC researchers, here we have quantified and evaluated its performance by using bibliometric indicators of cooperation and citation. Specific discussions have been given to the historical development of EC, including technical maturation and application promotion. Considering the current barrier for collaboration, the review closes by analyzing the reasons hindering data sharing and makes a prospect of new models for data-intensive collaboration in the future.

preprint2022arXiv

RCLane: Relay Chain Prediction for Lane Detection

Lane detection is an important component of many real-world autonomous systems. Despite a wide variety of lane detection approaches have been proposed, reporting steady benchmark improvements over time, lane detection remains a largely unsolved problem. This is because most of the existing lane detection methods either treat the lane detection as a dense prediction or a detection task, few of them consider the unique topologies (Y-shape, Fork-shape, nearly horizontal lane) of the lane markers, which leads to sub-optimal solution. In this paper, we present a new method for lane detection based on relay chain prediction. Specifically, our model predicts a segmentation map to classify the foreground and background region. For each pixel point in the foreground region, we go through the forward branch and backward branch to recover the whole lane. Each branch decodes a transfer map and a distance map to produce the direction moving to the next point, and how many steps to progressively predict a relay station (next point). As such, our model is able to capture the keypoints along the lanes. Despite its simplicity, our strategy allows us to establish new state-of-the-art on four major benchmarks including TuSimple, CULane, CurveLanes and LLAMAS.

preprint2021arXiv

Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos

Segmenting objects in videos is a fundamental computer vision task. The current deep learning based paradigm offers a powerful, but data-hungry solution. However, current datasets are limited by the cost and human effort of annotating object masks in videos. This effectively limits the performance and generalization capabilities of existing video segmentation methods. To address this issue, we explore weaker form of bounding box annotations. We introduce a method for generating segmentation masks from per-frame bounding box annotations in videos. To this end, we propose a spatio-temporal aggregation module that effectively mines consistencies in the object and background appearance across multiple frames. We use our resulting accurate masks for weakly supervised training of video object segmentation (VOS) networks. We generate segmentation masks for large scale tracking datasets, using only their bounding box annotations. The additional data provides substantially better generalization performance leading to state-of-the-art results in both the VOS and more challenging tracking domain.

preprint2021arXiv

Semantics-Consistent Representation Learning for Remote Sensing Image-Voice Retrieval

With the development of earth observation technology, massive amounts of remote sensing (RS) images are acquired. To find useful information from these images, cross-modal RS image-voice retrieval provides a new insight. This paper aims to study the task of RS image-voice retrieval so as to search effective information from massive amounts of RS data. Existing methods for RS image-voice retrieval rely primarily on the pairwise relationship to narrow the heterogeneous semantic gap between images and voices. However, apart from the pairwise relationship included in the datasets, the intra-modality and non-paired inter-modality relationships should also be taken into account simultaneously, since the semantic consistency among non-paired representations plays an important role in the RS image-voice retrieval task. Inspired by this, a semantics-consistent representation learning (SCRL) method is proposed for RS image-voice retrieval. The main novelty is that the proposed method takes the pairwise, intra-modality, and non-paired inter-modality relationships into account simultaneously, thereby improving the semantic consistency of the learned representations for the RS image-voice retrieval. The proposed SCRL method consists of two main steps: 1) semantics encoding and 2) semantics-consistent representation learning. Firstly, an image encoding network is adopted to extract high-level image features with a transfer learning strategy, and a voice encoding network with dilated convolution is devised to obtain high-level voice features. Secondly, a consistent representation space is conducted by modeling the three kinds of relationships to narrow the heterogeneous semantic gap and learn semantics-consistent representations across two modalities. Extensive experimental results on three challenging RS image-voice datasets show the effectiveness of the proposed method.

preprint2021arXiv

Weather GAN: Multi-Domain Weather Translation Using Generative Adversarial Networks

In this paper, a new task is proposed, namely, weather translation, which refers to transferring weather conditions of the image from one category to another. It is important for photographic style transfer. Although lots of approaches have been proposed in traditional image translation tasks, few of them can handle the multi-category weather translation task, since weather conditions have rich categories and highly complex semantic structures. To address this problem, we develop a multi-domain weather translation approach based on generative adversarial networks (GAN), denoted as Weather GAN, which can achieve the transferring of weather conditions among sunny, cloudy, foggy, rainy and snowy. Specifically, the weather conditions in the image are determined by various weather-cues, such as cloud, blue sky, wet ground, etc. Therefore, it is essential for weather translation to focus the main attention on weather-cues. To this end, the generator of Weather GAN is composed of an initial translation module, an attention module and a weather-cue segmentation module. The initial translation module performs global translation during generation procedure. The weather-cue segmentation module identifies the structure and exact distribution of weather-cues. The attention module learns to focus on the interesting areas of the image while keeping other areas unaltered. The final generated result is synthesized by these three parts. This approach suppresses the distortion and deformation caused by weather translation. our approach outperforms the state-of-the-arts has been shown by a large number of experiments and evaluations.

preprint2020arXiv

Automated Segmentation of Brain Gray Matter Nuclei on Quantitative Susceptibility Mapping Using Deep Convolutional Neural Network

Abnormal iron accumulation in the brain subcortical nuclei has been reported to be correlated to various neurodegenerative diseases, which can be measured through the magnetic susceptibility from the quantitative susceptibility mapping (QSM). To quantitively measure the magnetic susceptibility, the nuclei should be accurately segmented, which is a tedious task for clinicians. In this paper, we proposed a double-branch residual-structured U-Net (DB-ResUNet) based on 3D convolutional neural network (CNN) to automatically segment such brain gray matter nuclei. To better tradeoff between segmentation accuracy and the memory efficiency, the proposed DB-ResUNet fed image patches with high resolution and the patches with low resolution but larger field of view into the local and global branches, respectively. Experimental results revealed that by jointly using QSM and T$_\text{1}$ weighted imaging (T$_\text{1}$WI) as inputs, the proposed method was able to achieve better segmentation accuracy over its single-branch counterpart, as well as the conventional atlas-based method and the classical 3D-UNet structure. The susceptibility values and the volumes were also measured, which indicated that the measurements from the proposed DB-ResUNet are able to present high correlation with values from the manually annotated regions of interest.

preprint2020arXiv

Automatic acute ischemic stroke lesion segmentation using semi-supervised learning

Ischemic stroke is a common disease in the elderly population, which can cause long-term disability and even death. However, the time window for treatment of ischemic stroke in its acute stage is very short. To fast localize and quantitively evaluate the acute ischemic stroke (AIS) lesions, many deep-learning-based lesion segmentation methods have been proposed in the literature, where a deep convolutional neural network (CNN) was trained on hundreds of fully labeled subjects with accurate annotations of AIS lesions. Despite that high segmentation accuracy can be achieved, the accurate labels should be annotated by experienced clinicians, and it is therefore very time-consuming to obtain a large number of fully labeled subjects. In this paper, we propose a semi-supervised method to automatically segment AIS lesions in diffusion weighted images and apparent diffusion coefficient maps. By using a large number of weakly labeled subjects and a small number of fully labeled subjects, our proposed method is able to accurately detect and segment the AIS lesions. In particular, our proposed method consists of three parts: 1) a double-path classification net (DPC-Net) trained in a weakly-supervised way is used to detect the suspicious regions of AIS lesions; 2) a pixel-level K-Means clustering algorithm is used to identify the hyperintensive regions on the DWIs; and 3) a region-growing algorithm combines the outputs of the DPC-Net and the K-Means to obtain the final precise lesion segmentation. In our experiment, we use 460 weakly labeled subjects and 15 fully labeled subjects to train and fine-tune the proposed method. By evaluating on a clinical dataset with 150 fully labeled subjects, our proposed method achieves a mean dice coefficient of 0.642, and a lesion-wise F1 score of 0.822.

preprint2020arXiv

Automatically growing global reactive neural network potential energy surfaces: a trajectory free active learning strategy

An efficient and trajectory-free active learning method is proposed to automatically sample data points for constructing globally accurate reactive potential energy surfaces (PESs) using neural networks (NNs). Although NNs do not provide the predictive variance as the Gaussian process regression does, we can alternatively minimize the negative of the squared difference surface (NSDS) given by two different NN models to actively locate the point where the PES is least confident. A batch of points in the minima of this NSDS can be iteratively added into the training set to improve the PES. The configuration space is gradually and globally covered with no need to run classical trajectory (or equivalently molecular dynamics) simulations. Through refitting the available analytical PESs of H3 and OH3 reactive systems, we demonstrate the efficiency and robustness of this new strategy, which enables fast convergence of the reactive PESs with respect to the number of points in terms of quantum scattering probabilities.

preprint2020arXiv

Spectral self-adaptive absorber/emitter for harvesting energy from the sun and outer space

The sun (~6000 K) and outer space (~3 K) are the original heat source and sink for human beings on Earth. The energy applications of absorbing solar irradiation and harvesting the coldness of outer space for energy utilization have attracted considerable interest from researchers. However, combining these two functions in a static device for continuous energy harvesting is unachievable due to the intrinsic infrared spectral conflict. In this study, we developed spectral self-adaptive absorber/emitter (SSA/E) for daytime photothermal and nighttime radiative sky cooling modes depending on the phase transition of the vanadium dioxide coated layer. A 24-hour day-night test showed that the fabricated SSA/E has continuous energy harvesting ability and improved overall energy utilization performance, thus showing remarkable potential in future energy applications.