Researcher profile

Yiling Xu

Yiling Xu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
2topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2023arXiv

Reduced Reference Quality Assessment for Point Cloud Compression

In this paper, we propose a reduced reference (RR) point cloud quality assessment (PCQA) model named R-PCQA to quantify the distortions introduced by the lossy compression. Specifically, we use the attribute and geometry quantization steps of different compression methods (i.e., V-PCC, G-PCC and AVS) to infer the point cloud quality, assuming that the point clouds have no other distortions before compression. First, we analyze the compression distortion of point clouds under separate attribute compression and geometry compression to avoid their mutual masking, for which we consider 5 point clouds as references to generate a compression dataset (PCCQA) containing independent attribute compression and geometry compression samples. Then, we develop the proposed R-PCQA via fitting the relationship between the quantization steps and the perceptual quality. We evaluate the performance of R-PCQA on both the established dataset and another independent dataset. The results demonstrate that the proposed R-PCQA can exhibit reliable performance and high generalization ability.

preprint2022arXiv

3DAC: Learning Attribute Compression for Point Clouds

We study the problem of attribute compression for large-scale unstructured 3D point clouds. Through an in-depth exploration of the relationships between different encoding steps and different attribute channels, we introduce a deep compression network, termed 3DAC, to explicitly compress the attributes of 3D point clouds and reduce storage usage in this paper. Specifically, the point cloud attributes such as color and reflectance are firstly converted to transform coefficients. We then propose a deep entropy model to model the probabilities of these coefficients by considering information hidden in attribute transforms and previous encoded attributes. Finally, the estimated probabilities are used to further compress these transform coefficients to a final attributes bitstream. Extensive experiments conducted on both indoor and outdoor large-scale open point cloud datasets, including ScanNet and SemanticKITTI, demonstrated the superior compression rates and reconstruction quality of the proposed 3DAC.

preprint2022arXiv

4DAC: Learning Attribute Compression for Dynamic Point Clouds

With the development of the 3D data acquisition facilities, the increasing scale of acquired 3D point clouds poses a challenge to the existing data compression techniques. Although promising performance has been achieved in static point cloud compression, it remains under-explored and challenging to leverage temporal correlations within a point cloud sequence for effective dynamic point cloud compression. In this paper, we study the attribute (e.g., color) compression of dynamic point clouds and present a learning-based framework, termed 4DAC. To reduce temporal redundancy within data, we first build the 3D motion estimation and motion compensation modules with deep neural networks. Then, the attribute residuals produced by the motion compensation component are encoded by the region adaptive hierarchical transform into residual coefficients. In addition, we also propose a deep conditional entropy model to estimate the probability distribution of the transformed coefficients, by incorporating temporal context from consecutive point clouds and the motion estimation/compensation modules. Finally, the data stream is losslessly entropy coded with the predicted distribution. Extensive experiments on several public datasets demonstrate the superior compression performance of the proposed approach.

preprint2022arXiv

H2-Stereo: High-Speed, High-Resolution Stereoscopic Video System

High-speed, high-resolution stereoscopic (H2-Stereo) video allows us to perceive dynamic 3D content at fine granularity. The acquisition of H2-Stereo video, however, remains challenging with commodity cameras. Existing spatial super-resolution or temporal frame interpolation methods provide compromised solutions that lack temporal or spatial details, respectively. To alleviate this problem, we propose a dual camera system, in which one camera captures high-spatial-resolution low-frame-rate (HSR-LFR) videos with rich spatial details, and the other captures low-spatial-resolution high-frame-rate (LSR-HFR) videos with smooth temporal details. We then devise a Learned Information Fusion network (LIFnet) that exploits the cross-camera redundancies to enhance both camera views to high spatiotemporal resolution (HSTR) for reconstructing the H2-Stereo video effectively. We utilize a disparity network to transfer spatiotemporal information across views even in large disparity scenes, based on which, we propose disparity-guided flow-based warping for LSR-HFR view and complementary warping for HSR-LFR view. A multi-scale fusion method in feature domain is proposed to minimize occlusion-induced warping ghosts and holes in HSR-LFR view. The LIFnet is trained in an end-to-end manner using our collected high-quality Stereo Video dataset from YouTube. Extensive experiments demonstrate that our model outperforms existing state-of-the-art methods for both views on synthetic data and camera-captured real data with large disparity. Ablation studies explore various aspects, including spatiotemporal resolution, camera baseline, camera desynchronization, long/short exposures and applications, of our system to fully understand its capability for potential applications.

preprint2022arXiv

No-Reference Point Cloud Quality Assessment via Domain Adaptation

We present a novel no-reference quality assessment metric, the image transferred point cloud quality assessment (IT-PCQA), for 3D point clouds. For quality assessment, deep neural network (DNN) has shown compelling performance on no-reference metric design. However, the most challenging issue for no-reference PCQA is that we lack large-scale subjective databases to drive robust networks. Our motivation is that the human visual system (HVS) is the decision-maker regardless of the type of media for quality assessment. Leveraging the rich subjective scores of the natural images, we can quest the evaluation criteria of human perception via DNN and transfer the capability of prediction to 3D point clouds. In particular, we treat natural images as the source domain and point clouds as the target domain, and infer point cloud quality via unsupervised adversarial domain adaptation. To extract effective latent features and minimize the domain discrepancy, we propose a hierarchical feature encoder and a conditional-discriminative network. Considering that the ultimate purpose is regressing objective score, we introduce a novel conditional cross entropy loss in the conditional-discriminative network to penalize the negative samples which hinder the convergence of the quality regression network. Experimental results show that the proposed method can achieve higher performance than traditional no-reference metrics, even comparable results with full-reference metrics. The proposed method also suggests the feasibility of assessing the quality of specific media content without the expensive and cumbersome subjective evaluations. Code is available at https://github.com/Qi-Yangsjtu/IT-PCQA.

preprint2022arXiv

Point Cloud Quality Assessment: Dataset Construction and Learning-based No-Reference Metric

Full-reference (FR) point cloud quality assessment (PCQA) has achieved impressive progress in recent years. However, in many cases, obtaining the reference point clouds is difficult, so no-reference (NR) metrics have become a research hotspot. Few researches about NR-PCQA are carried out due to the lack of a large-scale PCQA dataset. In this paper, we first build a large-scale PCQA dataset named LS-PCQA, which includes 104 reference point clouds and more than 22,000 distorted samples. In the dataset, each reference point cloud is augmented with 31 types of impairments (e.g., Gaussian noise, contrast distortion, local missing, and compression loss) at 7 distortion levels. Besides, each distorted point cloud is assigned with a pseudo quality score as its substitute of Mean Opinion Score (MOS). Inspired by the hierarchical perception system and considering the intrinsic attributes of point clouds, we propose a NR metric ResSCNN based on sparse convolutional neural network (CNN) to accurately estimate the subjective quality of point clouds. We conduct several experiments to evaluate the performance of the proposed NR metric. The results demonstrate that ResSCNN exhibits the state-of-the-art (SOTA) performance among all the existing NR-PCQA metrics and even outperforms some FR metrics. The dataset presented in this work will be made publicly accessible at http://smt.sjtu.edu.cn. The source code for the proposed ResSCNN can be found at https://github.com/lyp22/ResSCNN.

preprint2020arXiv

A Dual Camera System for High Spatiotemporal Resolution Video Acquisition

This paper presents a dual camera system for high spatiotemporal resolution (HSTR) video acquisition, where one camera shoots a video with high spatial resolution and low frame rate (HSR-LFR) and another one captures a low spatial resolution and high frame rate (LSR-HFR) video. Our main goal is to combine videos from LSR-HFR and HSR-LFR cameras to create an HSTR video. We propose an end-to-end learning framework, AWnet, mainly consisting of a FlowNet and a FusionNet that learn an adaptive weighting function in pixel domain to combine inputs in a frame recurrent fashion. To improve the reconstruction quality for cameras used in reality, we also introduce noise regularization under the same framework. Our method has demonstrated noticeable performance gains in terms of both objective PSNR measurement in simulation with different publicly available video and light-field datasets and subjective evaluation with real data captured by dual iPhone 7 and Grasshopper3 cameras. Ablation studies are further conducted to investigate and explore various aspects (such as reference structure, camera parallax, exposure time, etc) of our system to fully understand its capability for potential applications.