Source author record

Pan Gao

Pan Gao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Computer Vision, eess.IV, Multimedia, cond-mat.str-el, Information Theory, math.IT, quant-ph

Catalog footprint

What is connected

8 works

7 topics

4 close collaborators

preprint · 2023 · arXiv

Dynamic Local Feature Aggregation for Learning on Point Clouds

Existing point cloud learning methods aggregate features from neighbouring points by constructing a graph in the spatial domain, so each point's features are updated from spatially fixed neighbours throughout the layers. In this paper, we propose a dynamic feature aggregation (DFA) method that can transfer information by constructing local graphs in the feature domain without spatial constraints. By finding k-nearest neighbors in the feature domain, we perform relative position encoding and semantic feature encoding to explore latent position and feature similarity information, respectively, so that rich local features can be learned. At the same time, we also learn low-dimensional global features from the original point cloud to enhance the feature representation. Between DFA layers, we dynamically update the constructed local graph structure, so that we can learn richer information, which greatly improves adaptability and efficiency. We demonstrate the superiority of our method by conducting extensive experiments on point cloud classification and segmentation tasks. Implementation code is available: https://github.com/jiamang/DFA.
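The neighbour search described in this abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function name `knn_feature_graph`, the Euclidean metric, and the choice to exclude each point from its own neighbour list are all assumptions made for illustration. The point is only that neighbours are ranked by distance between feature vectors rather than between 3D coordinates.

```python
from math import dist


def knn_feature_graph(features, k):
    """Return, for each point, the indices of its k nearest neighbours
    measured in feature space (hypothetical helper, Euclidean metric assumed)."""
    neighbours = []
    for i, f_i in enumerate(features):
        # Rank every other point by the distance between feature vectors.
        ranked = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: dist(f_i, features[j]),
        )
        neighbours.append(ranked[:k])
    return neighbours


# Four toy feature vectors forming two tight pairs.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
graph = knn_feature_graph(features, k=1)
print(graph)  # [[1], [0], [3], [2]]
```

Calling the function again on the features produced by the next layer would yield a different graph, which is the sense in which the local graph is updated dynamically between layers.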

preprint · 2022 · arXiv

Dilated convolutional neural network-based deep reference picture generation for video compression

Motion estimation and motion compensation are indispensable parts of inter prediction in video coding. Since the motion vectors of objects are mostly in fractional pixel units, original reference pictures may not provide a suitable reference for motion compensation. In this paper, we propose a deep reference picture generator which can create a picture that is more relevant to the current encoding frame, thereby further reducing temporal redundancy and improving video compression efficiency. Inspired by the recent progress of convolutional neural networks (CNNs), this paper proposes to use a dilated CNN to build the generator. Moreover, we insert the generated deep picture into Versatile Video Coding (VVC) as a reference picture and perform a comprehensive set of experiments to evaluate the effectiveness of our network on the latest VVC Test Model (VTM). The experimental results demonstrate that our proposed method achieves a 9.7% bit saving on average compared with VVC under the low-delay P configuration.
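For readers unfamiliar with dilation, the following one-dimensional sketch shows what the operation does. It is a toy illustration, not the paper's network: the function name `dilated_conv1d`, the kernel, and the unpadded ("valid") output are assumptions. As in most deep learning frameworks, the operation is a cross-correlation, and the dilation factor spaces the kernel taps apart so the same three weights cover a wider span of the input.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Unpadded 1D cross-correlation with kernel taps spaced `dilation` apart
    (hypothetical helper for illustration)."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * signal[i + t * dilation] for t, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # difference between the first and last tap

dense = dilated_conv1d(signal, kernel, dilation=1)  # taps at i, i+1, i+2
wide = dilated_conv1d(signal, kernel, dilation=2)   # taps at i, i+2, i+4
print(dense)  # [-2, -2, -2, -2, -2]
print(wide)   # [-4, -4, -4]
```

With dilation 2 the three-tap kernel spans five input samples instead of three, without adding any weights.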

preprint · 2022 · arXiv

IPDAE: Improved Patch-Based Deep Autoencoder for Lossy Point Cloud Geometry Compression

Point clouds are a crucial representation of 3D content and are widely used in areas such as virtual reality, mixed reality, and autonomous driving. As the number of points in the data grows, compressing point clouds efficiently becomes a challenging problem. In this paper, we propose a set of significant improvements to patch-based point cloud compression, i.e., a learnable context model for entropy coding, octree coding for sampling centroid points, and an integrated compression and training process. In addition, we propose an adversarial network to improve the uniformity of points during reconstruction. Our experiments show that the improved patch-based autoencoder outperforms the state-of-the-art in terms of rate-distortion performance, on both sparse and large-scale point clouds. More importantly, our method can maintain a short compression time while ensuring the reconstruction quality.

preprint · 2022 · arXiv

SSformer: A Lightweight Transformer for Semantic Segmentation

It is widely believed that Transformers perform better in semantic segmentation than convolutional neural networks. Nevertheless, the original Vision Transformer may lack the inductive biases of local neighborhoods and has a high time complexity. Recently, Swin Transformer set a new record in various vision tasks by using a hierarchical architecture and shifted windows while being more efficient. However, as Swin Transformer is specifically designed for image classification, it may achieve suboptimal performance on dense prediction-based segmentation tasks. Further, simply combining Swin Transformer with existing methods would increase the model size and parameter count of the final segmentation model. In this paper, we rethink the Swin Transformer for semantic segmentation, and design a lightweight yet effective transformer model, called SSformer. In this model, considering the inherent hierarchical design of Swin Transformer, we propose a decoder to aggregate information from different layers, thus obtaining both local and global attention. Experimental results show the proposed SSformer yields comparable mIoU performance with state-of-the-art models, while maintaining a smaller model size and lower compute.

preprint · 2022 · arXiv

Video Frame Interpolation Based on Deformable Kernel Region

Video frame interpolation has recently become increasingly prevalent in the computer vision field. At present, a number of deep-learning-based studies have achieved great success. Most of them are based on optical flow information, on interpolation kernels, or on a combination of the two. However, these methods ignore the grid restrictions on the position of the kernel region when synthesizing each target pixel. Because of these restrictions, they cannot adapt well to the irregularity of object shapes and the uncertainty of motion, which may lead to irrelevant reference pixels being used for interpolation. To solve this problem, we revisit deformable convolution for video interpolation, which can break the fixed grid restrictions on the kernel region, making the distribution of reference points more suitable for the shape of the object, and thus warp a more accurate interpolation frame. Experiments are conducted on four datasets to demonstrate the superior performance of the proposed model in comparison to the state-of-the-art alternatives.

preprint · 2022 · arXiv

Video-based Smoky Vehicle Detection with A Coarse-to-Fine Framework

Automatic smoky vehicle detection in videos is a superior solution for environmental protection agencies compared with the traditional, expensive remote sensing approach based on ultraviolet-infrared light devices. However, it is challenging to distinguish vehicle smoke from shadows and wet regions coming from rear vehicles or cluttered roads, and the problem is made worse by limited annotated data. In this paper, we first introduce a real-world large-scale smoky vehicle dataset with 75,000 annotated smoky vehicle images, facilitating the effective training of advanced deep learning models. To enable fair algorithm comparison, we also build a smoky vehicle video dataset including 163 long videos with segment-level annotations. Moreover, we present a new Coarse-to-fine Deep Smoky vehicle detection (CoDeS) framework for efficient smoky vehicle detection. CoDeS first leverages a light-weight YOLO detector for fast smoke detection with a high recall rate, then applies a smoke-vehicle matching strategy to eliminate non-vehicle smoke, and finally uses an elaborately designed 3D model to further refine the results in spatial-temporal space. Extensive experiments on four metrics demonstrate that our framework is significantly superior to hand-crafted feature based methods and recent advanced methods. The code and dataset will be released at https://github.com/pengxj/smokyvehicle.

preprint · 2021 · arXiv

Optimizing a Polynomial Function on a Quantum Simulator

The gradient descent method, as one of the major methods in numerical optimization, is the key ingredient in many machine learning algorithms. As one of the most fundamental ways to solve optimization problems, it moves the iterate along the direction of steepest descent of the function value. Because of the vast resource consumption when dealing with high-dimensional problems, a quantum version of this iterative optimization algorithm has been proposed recently [arXiv:1612.01789]. Here, we develop this protocol and implement it on a quantum simulator with limited resources. Moreover, a prototypical experiment was performed with a 4-qubit Nuclear Magnetic Resonance quantum processor, demonstrating an iterative optimization process for a polynomial function. In each iteration, we achieved an average fidelity of 94% compared with theoretical calculation via full-state tomography. In particular, the iterative point gradually converged to the local minimum. We apply our method to the multidimensional scaling problem, further showing the potential capability to yield an exponential improvement compared with classical counterparts. With the rapid progress of quantum information, our work could provide a subroutine for the application of future practical quantum computers.
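As a point of reference, the classical iteration that the quantum protocol is meant to accelerate can be sketched as follows. This is plain classical gradient descent on a single-variable polynomial, not the quantum algorithm from the paper. The polynomial x^4 - 2x^2, the step size, the starting point, and all function names are assumptions chosen for illustration.

```python
def poly_derivative(coeffs):
    """coeffs[i] is the coefficient of x**i; return the derivative's coefficients."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def poly_eval(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def gradient_descent(coeffs, x0, step=0.05, iterations=200):
    """Repeatedly step against the derivative, starting from x0."""
    grad = poly_derivative(coeffs)
    x = x0
    for _ in range(iterations):
        x -= step * poly_eval(grad, x)
    return x


coeffs = [0.0, 0.0, -2.0, 0.0, 1.0]  # f(x) = x^4 - 2x^2, local minima at x = -1 and x = 1
x_min = gradient_descent(coeffs, x0=0.5)
print(round(x_min, 6))  # 1.0
```

Starting from 0.5 the iterate converges to the nearby local minimum at x = 1, mirroring the abstract's remark that the iterative point converges to a local minimum rather than necessarily a global one.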

preprint · 2013 · arXiv

Numerical study of magnetic and pairing correlation in bilayer triangular lattice

By using the determinant Quantum Monte Carlo method, the magnetic and pairing correlations of the Na$_{x}$CoO$_{2}\cdot$yH$_{2}$O system are studied within the Hubbard model on a bilayer triangular lattice. The temperature dependence of the spin correlation function and the pairing susceptibility with several kinds of symmetries at different electron fillings and interlayer coupling terms are investigated. It is found that the system shows an antiferromagnetic correlation around half filling, and the $fn$-wave pairing correlation dominates over other kinds of pairing symmetry in the low doping region. As the electron filling decreases away from half filling, both the ferromagnetic correlation and the $f$-wave pairing susceptibility are enhanced and tend to dominate. It is also shown that both the magnetic susceptibility and the pairing susceptibility decrease as the interlayer coupling increases.
