Source author record

Jialiang Wang

Jialiang Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Computer Vision, Artificial Intelligence, eess.SP, Machine Learning, Multimedia, physics.optics

Catalog footprint

7 works · 6 topics · 4 close collaborators


Preprint · 2023 · arXiv

FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis

Diffusion models have transformed image-to-image (I2I) synthesis and are now permeating into video. However, progress in video-to-video (V2V) synthesis has been hampered by the challenge of maintaining temporal consistency across video frames. This paper proposes a consistent V2V synthesis framework that jointly leverages spatial conditions and temporal optical flow clues within the source video. Unlike prior methods that strictly adhere to optical flow, our approach harnesses its benefits while handling imperfections in flow estimation. We encode the optical flow by warping from the first frame and use the result as a supplementary reference in the diffusion model. This lets our model synthesize video by editing the first frame with any prevalent I2I model and then propagating the edits to successive frames. Our V2V model, FlowVid, demonstrates remarkable properties: (1) Flexibility: FlowVid works seamlessly with existing I2I models, facilitating various modifications, including stylization, object swaps, and local edits. (2) Efficiency: generating a 4-second, 30 FPS video at 512x512 resolution takes only 1.5 minutes, which is 3.1x, 7.2x, and 10.5x faster than CoDeF, Rerender, and TokenFlow, respectively. (3) High quality: in user studies, FlowVid is preferred 45.7% of the time, outperforming CoDeF (3.5%), Rerender (10.2%), and TokenFlow (40.4%).
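The abstract mentions encoding optical flow by warping the first frame. As a generic illustration of flow-based backward warping (not FlowVid's implementation), here is a minimal sketch assuming a dense per-pixel flow field that maps each target pixel back to its source location in the first frame, nearest-neighbor sampling, and a validity mask for pixels whose source falls outside the frame. The function name `backward_warp` is hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]                # grayscale image, frame[y][x]
Flow = List[List[Tuple[float, float]]]   # flow[y][x] = (dx, dy), target -> source


def backward_warp(first: Frame, flow: Flow) -> Tuple[Frame, List[List[bool]]]:
    """Warp `first` toward a later frame using backward optical flow.

    Each target pixel (x, y) samples the first frame at (x + dx, y + dy),
    rounded to the nearest pixel (Python rounds ties to even). Pixels whose
    source lies outside the image get value 0.0 and mask False, which is
    one simple way to flag where the flow gives no usable reference.
    """
    h, w = len(first), len(first[0])
    warped = [[0.0] * w for _ in range(h)]
    valid = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = round(x + dx), round(y + dy)
            if 0 <= sx < w and 0 <= sy < h:
                warped[y][x] = first[sy][sx]
                valid[y][x] = True
    return warped, valid


if __name__ == "__main__":
    first = [[0.0, 1.0, 2.0],
             [3.0, 4.0, 5.0]]
    # Every target pixel samples one column to its left (dx = -1).
    flow = [[(-1.0, 0.0)] * 3 for _ in range(2)]
    warped, valid = backward_warp(first, flow)
    print(warped)  # [[0.0, 0.0, 1.0], [0.0, 3.0, 4.0]]
    print(valid)   # [[False, True, True], [False, True, True]]
```

Backward warping is used here because every target pixel receives exactly one value; forward warping can leave holes or collisions.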

Preprint · 2022 · arXiv

A Practical Second-order Latent Factor Model via Distributed Particle Swarm Optimization

Latent Factor (LF) models are effective in representing high-dimensional and sparse (HiDS) data via low-rank matrix approximation. Hessian-free (HF) optimization is an efficient way of utilizing second-order information of an LF model's objective function, and it has been used to optimize the second-order LF (SLF) model. However, the low-rank representation ability of an SLF model relies heavily on its multiple hyperparameters. Determining these hyperparameters is time-consuming and greatly reduces the practicality of an SLF model. To address this issue, this work proposes a practical SLF (PSLF) model. It achieves hyperparameter self-adaptation with a distributed particle swarm optimizer (DPSO), which is gradient-free and parallelized. Experiments on real HiDS datasets indicate that the PSLF model has a competitive advantage over state-of-the-art models in data representation ability.
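PSLF adapts its hyperparameters with a gradient-free particle swarm optimizer. The sketch below is a plain, sequential global-best PSO, not the paper's distributed DPSO. It assumes the objective is some scalar score of a hyperparameter vector (in PSLF this might be a validation error of the SLF model, which is an assumption here). The inertia of 0.7 and acceleration coefficients of 1.5 are common textbook defaults, not values from the paper, and `pso_minimize` is a hypothetical name.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    n_iters: int = 100,
    inertia: float = 0.7,
    c_personal: float = 1.5,
    c_global: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best particle swarm optimization (no gradients needed).

    Each particle keeps a position, a velocity, and its personal best.
    Velocities are pulled toward the personal best and the swarm's global
    best; positions are clipped to `bounds`.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (pbest[i][d] - pos[i][d])
                             + c_global * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    # Hypothetical stand-in for an error surface over two hyperparameters
    # (for example a learning rate and a regularization weight).
    def toy_error(p: Sequence[float]) -> float:
        return (p[0] - 0.01) ** 2 + (p[1] - 0.1) ** 2

    best, err = pso_minimize(toy_error, bounds=[(0.0, 1.0), (0.0, 1.0)])
    print(best, err)
```

This sketch updates the global best as soon as any particle improves it. In a synchronous variant, all particle evaluations within one iteration are independent of each other, which is what makes spreading them across workers a natural way to parallelize PSO.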

Preprint · 2022 · arXiv

Toward Practical Monocular Indoor Depth Estimation

Most prior monocular depth estimation methods that work without ground-truth depth guidance focus on driving scenarios. We show that such methods generalize poorly to unseen complex indoor scenes, where objects are cluttered and arbitrarily arranged in the near field. To obtain more robustness, we propose a structure distillation approach that distills structural cues from an off-the-shelf relative depth estimator, which produces structured but metric-agnostic depth. By combining structure distillation with a branch that learns metric scale from left-right consistency, we obtain structured, metric depth for generic indoor scenes and perform inference in real time. To facilitate learning and evaluation, we collect SimSIN, a simulated dataset with thousands of environments, and UniSIN, a dataset of about 500 real scan sequences of generic indoor environments. We experiment in both sim-to-real and real-to-real settings and show improvements in both, as well as in downstream applications that use our depth maps. This work provides a full study covering methods, data, and applications.
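The abstract says metric scale comes from a left-right consistency branch but does not detail it. A common way such training works, assumed here and not taken from the paper, is to reconstruct the left stereo view by sampling the right view at `x - d` using the predicted disparity `d` and to penalize the photometric error; with a known focal length `f` (pixels) and baseline `B` (meters), disparity then converts to metric depth via Z = f·B/d. All function names and the camera values in the example are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # image or disparity map, grid[y][x]


def reconstruct_left(right: Grid, disp: Grid) -> Grid:
    """Rebuild the left view by sampling the right view at x - d.

    Uses nearest-pixel sampling and clamps source columns at the borders.
    """
    h, w = len(right), len(right[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = round(x - disp[y][x])
            out[y][x] = right[y][min(max(sx, 0), w - 1)]
    return out


def photometric_l1(left: Grid, right: Grid, disp: Grid) -> float:
    """Mean absolute difference between the left view and its reconstruction."""
    rec = reconstruct_left(right, disp)
    h, w = len(left), len(left[0])
    return sum(abs(left[y][x] - rec[y][x]) for y in range(h) for x in range(w)) / (h * w)


def disparity_to_depth(d: float, focal_px: float, baseline_m: float) -> float:
    """Metric depth from disparity, Z = f * B / d (d in pixels, must be > 0)."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / d


if __name__ == "__main__":
    right = [[0.0, 1.0, 2.0, 3.0, 4.0]]
    left = [[0.0, 0.0, 1.0, 2.0, 3.0]]   # scene shifted one pixel
    print(photometric_l1(left, right, [[1.0] * 5]))  # 0.0, correct disparity
    print(photometric_l1(left, right, [[0.0] * 5]))  # 0.8, wrong disparity
```

Because the reconstruction error is lowest at the true disparity, and disparity maps to depth through the known baseline, this kind of loss can anchor absolute scale that a relative depth estimator alone cannot provide.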

Preprint · 2022 · arXiv

UMSNet: An Universal Multi-sensor Network for Human Activity Recognition

Human activity recognition (HAR) based on multimodal sensors has become a rapidly growing branch of biometric recognition and artificial intelligence. However, how to fully mine multimodal time series data and effectively learn accurate behavioral features has long been a hot topic in this field. Practical applications also require a well-generalized framework that can quickly process a variety of raw sensor data and learn better feature representations. This paper proposes a universal multi-sensor network (UMSNet) for human activity recognition. In particular, we propose a new lightweight sensor residual block (LSR block), which improves performance by reducing the number of activation functions and normalization layers and by adding an inverted bottleneck structure and grouped convolution. A Transformer is then used to extract relationships among the series features to classify and recognize human activities. Our framework has a clear structure and, after simple specialization, can be applied directly to various types of multimodal time series classification (TSC) tasks. Extensive experiments show that the proposed UMSNet outperforms other state-of-the-art methods on two popular multi-sensor human activity recognition datasets (HHAR and MHEALTH).
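Grouped convolution is one reason a block like LSR can stay lightweight. The worked count below uses standard convolution arithmetic (the same weight layout as, for example, PyTorch's `Conv1d`: out_channels × in_channels/groups × kernel_size), not figures from the paper; the channel sizes are hypothetical and `conv1d_params` is an illustrative name.

```python
def conv1d_params(c_in: int, c_out: int, kernel: int,
                  groups: int = 1, bias: bool = True) -> int:
    """Learnable parameters of a 1D convolution.

    With groups > 1, each output channel only sees c_in / groups input
    channels, so the weight count drops by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must be divisible by groups")
    weights = c_out * (c_in // groups) * kernel
    return weights + (c_out if bias else 0)


if __name__ == "__main__":
    print(conv1d_params(64, 64, 3))            # 12352 (dense)
    print(conv1d_params(64, 64, 3, groups=8))  # 1600 (8 groups)
```

With 8 groups the weights shrink from 12288 to 1536, an 8x reduction; only the 64 bias terms are unaffected.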

Preprint · 2020 · arXiv

Improving Deep Stereo Network Generalization with Geometric Priors

End-to-end deep learning methods have advanced stereo vision in recent years and obtain excellent results when the training and test data are similar. However, large datasets of diverse real-world scenes with dense ground truth are difficult to obtain and are currently not publicly available to the research community. As a result, many algorithms rely on small real-world datasets of similar scenes or on synthetic datasets, but end-to-end algorithms trained on such datasets often generalize poorly to the different images that arise in real-world applications. As a step toward addressing this problem, we propose incorporating prior knowledge of scene geometry into an end-to-end stereo network to help it generalize better. For a given network, we explicitly add a gradient-domain smoothness prior and occlusion reasoning to network training, while the architecture remains unchanged at inference time. Experimentally, we show consistent improvements when training on synthetic datasets and testing on the Middlebury (real images) dataset. Notably, we improve PSM-Net accuracy on Middlebury from 5.37 MAE to 3.21 MAE without sacrificing speed.
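The abstract does not give the exact form of its gradient-domain smoothness prior. For illustration only, the sketch below uses a widely used edge-aware first-order formulation, which is an assumption and not necessarily the paper's loss: disparity gradients are penalized with L1, down-weighted by exp(-alpha·|image gradient|) so that disparity jumps are cheaper at image edges. Because such a term is only added to the training loss, it leaves the network architecture unchanged at inference, consistent with the abstract. The name `edge_aware_smoothness` and `alpha = 10.0` are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]  # grid[y][x]


def edge_aware_smoothness(disp: Grid, image: Grid, alpha: float = 10.0) -> float:
    """Mean first-order smoothness penalty on a disparity map.

    Horizontal and vertical disparity differences are penalized with L1,
    each weighted by exp(-alpha * |matching image difference|).
    """
    h, w = len(disp), len(disp[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal neighbor
                dd = abs(disp[y][x + 1] - disp[y][x])
                di = abs(image[y][x + 1] - image[y][x])
                total += dd * math.exp(-alpha * di)
                count += 1
            if y + 1 < h:  # vertical neighbor
                dd = abs(disp[y + 1][x] - disp[y][x])
                di = abs(image[y + 1][x] - image[y][x])
                total += dd * math.exp(-alpha * di)
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    step = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]      # disparity jump
    flat_img = [[0.0] * 4 for _ in range(3)]              # no image edge
    edge_img = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]   # edge at the jump
    print(edge_aware_smoothness(step, flat_img))  # 3/17, about 0.176
    print(edge_aware_smoothness(step, edge_img))  # much smaller
```

A disparity discontinuity that coincides with an image edge (a likely object boundary) costs far less than one in a textureless region, which is the geometric prior such a term encodes.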

Preprint · 2019 · arXiv

Fiber-optic joint time and frequency transfer with the same wavelength

Optical fiber links have demonstrated their ability to transfer ultra-stable clock signals. In this paper, we propose and demonstrate a new scheme, based on a coherent demodulation technique, that transfers both time and radio frequency on the same wavelength. The time signal is encoded onto the optical carrier as binary phase-shift keying (BPSK) through phase modulation with an electro-optic modulator (EOM), which keeps the frequency signal free from interference by the single pulse. Phase changes caused by fluctuations of the transfer link are actively cancelled at the local site by optical delay lines. A 1 GHz radio-frequency signal and a one-pulse-per-second (1PPS) time signal are transmitted over 110 km of fiber spools. The experimental results show frequency instabilities of 1.7E-14 at 1 s and 5.9E-17 at 10^4 s. Moreover, time interval transfer of the 1PPS signal reaches sub-ps stability after 1000 s. This scheme reduces the number of channels needed in a fiber network and keeps the time and frequency signals independent of each other.
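Frequency-transfer stability figures such as 1.7E-14 at 1 s are conventionally reported as Allan deviation at an averaging time τ. The abstract does not state which estimator was used, so the sketch below shows only the standard non-overlapping Allan deviation, assuming evenly spaced fractional-frequency samples y_i taken every τ0 seconds, so that τ = m·τ0. The function name `allan_deviation` is illustrative.

```python
import math
from typing import Sequence


def allan_deviation(y: Sequence[float], m: int) -> float:
    """Non-overlapping Allan deviation at averaging time tau = m * tau0.

    `y` holds fractional-frequency samples taken every tau0 seconds.
    Samples are grouped into consecutive blocks of m and averaged, then
    sigma^2 = mean((ybar[k+1] - ybar[k]) ** 2) / 2.
    """
    if m < 1:
        raise ValueError("m must be >= 1")
    n_blocks = len(y) // m
    if n_blocks < 2:
        raise ValueError("need at least two averaging blocks")
    ybar = [sum(y[k * m:(k + 1) * m]) / m for k in range(n_blocks)]
    diffs = [(ybar[k + 1] - ybar[k]) ** 2 for k in range(n_blocks - 1)]
    return math.sqrt(sum(diffs) / (2 * len(diffs)))


if __name__ == "__main__":
    # Synthetic alternating fluctuation of +/-1e-14 (not measured data).
    y = [1e-14, -1e-14] * 50
    print(allan_deviation(y, 1))  # about 1.414e-14
    print(allan_deviation(y, 2))  # 0.0, the alternation averages out
```

Computing the deviation for increasing m gives the stability-versus-averaging-time curve from which values like "at 1 s" and "at 10^4 s" are read off.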

Preprint · 2016 · arXiv

Time and Frequency Injection into a Stabilized Fiber Link for Multi-clock Dissemination Network

Owing to their ultra-low loss and immunity to electromagnetic interference, optical fibers have become a preferred medium for delivering time and frequency signals in high-precision clock dissemination and comparison. Previous work has made it possible to reproduce ultra-stable signals from one local station at multiple users. In this paper, we take a step further and present a concept for disseminating multiple clocks (at different locations) to multiple terminals. By injecting frequency signals into a single stabilized ring-like fiber network, we achieve relative stabilities of 3.4E-14 at 1 s for master clock dissemination and 5.1E-14 at 1 s for slave clock dissemination. The proposed scheme can greatly simplify future N-to-N time and frequency dissemination networks, especially in multi-clock comparison scenarios.
