Researcher profile

Hao Shen

Hao Shen contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
12 works
0 followers
11 topics
4 close collaborators

Published work

12 published items

preprint · 2024 · arXiv

HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction

We present HOI4D, a large-scale 4D egocentric dataset with rich annotations, to catalyze the research of category-level human-object interaction. HOI4D consists of 2.4M RGB-D egocentric video frames over 4000 sequences collected by 4 participants interacting with 800 different object instances from 16 categories over 610 different indoor rooms. Frame-wise annotations for panoptic segmentation, motion segmentation, 3D hand pose, category-level object pose and hand action have also been provided, together with reconstructed object meshes and scene point clouds. With HOI4D, we establish three benchmarking tasks to promote category-level HOI from 4D visual signals including semantic segmentation of 4D dynamic point cloud sequences, category-level object pose tracking, and egocentric action segmentation with diverse interaction targets. In-depth analysis shows HOI4D poses great challenges to existing methods and produces great research opportunities.

preprint · 2022 · arXiv

A stochastic analysis approach to lattice Yang--Mills at strong coupling

We develop a new stochastic analysis approach to the lattice Yang--Mills model at strong coupling in any dimension $d>1$, with 't Hooft scaling $βN$ for the inverse coupling strength. We study their Langevin dynamics, ergodicity, functional inequalities, large $N$ limits, and mass gap. Assuming $|β| < \frac{N-2}{32(d-1)N}$ for the structure group $SO(N)$, or $|β| < \frac{1}{16(d-1)}$ for $SU(N)$, we prove the following results. The invariant measure for the corresponding Langevin dynamic is unique on the entire lattice, and the dynamic is exponentially ergodic under a Wasserstein distance. The finite volume Yang--Mills measures converge to this unique invariant measure in the infinite volume limit, for which Log-Sobolev and Poincaré inequalities hold. These functional inequalities imply that the suitably rescaled Wilson loops for the infinite volume measure have factorized correlations and converge in probability to deterministic limits in the large $N$ limit, and correlations of a large class of observables decay exponentially, namely the infinite volume measure has a strictly positive mass gap. Our method improves earlier results or simplifies the proofs, and provides some new perspectives to the study of the lattice Yang--Mills model.

preprint · 2022 · arXiv

A stochastic PDE approach to large N problems in quantum field theory: a survey

In this survey we review some recent rigorous results on large N problems in quantum field theory, stochastic quantization and singular stochastic PDEs, and their mean field limit problems. In particular we discuss the O(N) linear sigma model on two and three dimensional torus. The stochastic quantization procedure leads to a coupled system of N interacting $Φ^4$ equations. In d = 2, we show uniform in N bounds for the dynamics and convergence to a mean-field singular SPDE. For large enough mass or small enough coupling, the invariant measures (i.e. the O(N) linear sigma model) converge to the massive Gaussian free field, the unique invariant measure of the mean-field dynamics, in a Wasserstein distance. We also obtain tightness for certain O(N) invariant observables as random fields in suitable Besov spaces as $N\to \infty$, along with exact descriptions of the limiting correlations. In d = 3, the estimates become more involved since the equation is more singular. We discuss in this case how to prove convergence to the massive Gaussian free field. The proofs of these results build on the recent progress of singular SPDE theory and combine many new techniques such as uniform in N estimates and dynamical mean field theory. These are based on joint papers with Scott Smith, Rongchan Zhu and Xiangchan Zhu.

preprint · 2022 · arXiv

Analysis and Optimisation of Bellman Residual Errors with Neural Function Approximation

Recent development of Deep Reinforcement Learning (DRL) has demonstrated superior performance of neural networks in solving challenging problems with large or even continuous state spaces. One specific approach is to deploy neural networks to approximate value functions by minimising the Mean Squared Bellman Error (MSBE) function. Despite great successes of DRL, development of reliable and efficient numerical algorithms to minimise the MSBE is still of great scientific interest and practical demand. Such a challenge is partially due to the underlying optimisation problem being highly non-convex or using incomplete gradient information as done in Semi-Gradient algorithms. In this work, we analyse the MSBE from a smooth optimisation perspective and develop an efficient Approximate Newton's algorithm. First, we conduct a critical point analysis of the error function and provide technical insights on optimisation and design choices for neural networks. When the existence of global minima is assumed and the objective fulfils certain conditions, suboptimal local minima can be avoided when using over-parametrised neural networks. We construct a Gauss Newton Residual Gradient algorithm based on the analysis in two variations. The first variation applies to discrete state spaces and exact learning. We confirm theoretical properties of this algorithm such as being locally quadratically convergent to a global minimum numerically. The second employs sampling and can be used in the continuous setting. We demonstrate feasibility and generalisation capabilities of the proposed algorithm empirically using continuous control problems and provide a numerical verification of our critical point analysis. We outline the difficulties of combining Semi-Gradient approaches with Hessian information. To benefit from second-order information complete derivatives of the MSBE must be considered during training.

preprint · 2022 · arXiv

Large $N$ limit of the $O(N)$ linear sigma model in 3D

In this paper we study the large N limit of the $O(N)$-invariant linear sigma model, which is a vector-valued generalization of the $Φ^4$ quantum field theory, on the three dimensional torus. We study the problem via its stochastic quantization, which yields a coupled system of N interacting SPDEs. We prove tightness of the invariant measures in the large N limit. For large enough mass or small enough coupling constant, they converge to the (massive) Gaussian free field at a rate of order $1/\sqrt N$ with respect to the Wasserstein distance. We also obtain tightness results for certain $O(N)$ invariant observables. These generalize some of the results in \cite{SSZZ20} from two dimensions to three dimensions. The proof leverages the method recently developed by \cite{GH18} and combines many new techniques such as uniform in $N$ estimates on perturbative objects as well as the solutions.

preprint · 2022 · arXiv

Learning Category-Level Generalizable Object Manipulation Policy via Generative Adversarial Self-Imitation Learning from Demonstrations

Generalizable object manipulation skills are critical for intelligent and multi-functional robots to work in real-world complex scenes. Despite the recent progress in reinforcement learning, it is still very challenging to learn a generalizable manipulation policy that can handle a category of geometrically diverse articulated objects. In this work, we tackle this category-level object manipulation policy learning problem via imitation learning in a task-agnostic manner, where we assume no handcrafted dense rewards but only a terminal reward. Given this novel and challenging generalizable policy learning problem, we identify several key issues that can cause previous imitation learning algorithms to fail and hinder the generalization to unseen instances. We then propose several general but critical techniques, including generative adversarial self-imitation learning from demonstrations, progressive growing of discriminator, and instance-balancing for expert buffer, that accurately pinpoint and tackle these issues and can benefit category-level manipulation policy learning regardless of the tasks. Our experiments on ManiSkill benchmarks demonstrate a remarkable improvement on all tasks and our ablation studies further validate the contribution of each proposed technique.

preprint · 2022 · arXiv

Learning from Attacks: Attacking Variational Autoencoder for Improving Image Classification

Adversarial attacks are often considered as threats to the robustness of Deep Neural Networks (DNNs). Various defending techniques have been developed to mitigate the potential negative impact of adversarial attacks against task predictions. This work analyzes adversarial attacks from a different perspective. Namely, we observe that adversarial examples contain implicit information that is useful for predictions, e.g., image classification, and we treat the adversarial attacks against DNNs for data self-expression as extracted abstract representations that are capable of facilitating specific learning tasks. We propose an algorithmic framework that leverages the advantages of the DNNs for data self-expression and task-specific predictions, to improve image classification. The framework jointly learns a DNN for attacking Variational Autoencoder (VAE) networks and a DNN for classification, coined as Attacking VAE for Improve Classification (AVIC). The experiment results show that AVIC can achieve higher accuracy on standard datasets compared to the training with clean examples and the traditional adversarial training.

preprint · 2021 · arXiv

Large $N$ Limit of the $O(N)$ Linear Sigma Model via Stochastic Quantization

This article studies large $N$ limits of a coupled system of $N$ interacting $Φ^4$ equations posed over $\mathbb{T}^{d}$ for $d=2$, known as the $O(N)$ linear sigma model. Uniform in $N$ bounds on the dynamics are established, allowing us to show convergence to a mean-field singular SPDE, also proved to be globally well-posed. Moreover, we show tightness of the invariant measures in the large $N$ limit. For large enough mass, they converge to the (massive) Gaussian free field, the unique invariant measure of the mean-field dynamics, at a rate of order $1/\sqrt{N}$ with respect to the Wasserstein distance. We also consider fluctuations and obtain tightness results for certain $O(N)$ invariant observables, along with an exact description of the limiting correlations.

preprint · 2021 · arXiv

Stochastic Ricci Flow on Compact Surfaces

In this paper we introduce the stochastic Ricci flow (SRF) in two spatial dimensions. The flow is symmetric with respect to a measure induced by Liouville Conformal Field Theory. Using the theory of Dirichlet forms, we construct a weak solution to the associated equation of the area measure on a flat torus, in the full "$L^1$ regime" $σ < σ_{L^1} = 2\sqrt{π}$ where $σ$ is the noise strength. We also describe the main necessary modifications needed for the SRF on general compact surfaces, and list some open questions.

preprint · 2020 · arXiv

3D Scene Geometry-Aware Constraint for Camera Localization with Deep Learning

Camera localization is a fundamental and key component of autonomous driving vehicles and mobile robots to localize themselves globally for further environment perception, path planning and motion control. Recently, end-to-end approaches based on convolutional neural networks have been much studied to match or even exceed traditional 3D-geometry based methods. In this work, we propose a compact network for absolute camera pose regression. Inspired by those traditional methods, a 3D scene geometry-aware constraint is also introduced by exploiting all available information including motion, depth and image contents. We add this constraint as a regularization term to our proposed network by defining a pixel-level photometric loss and an image-level structural similarity loss. To benchmark our method, different challenging scenes including indoor and outdoor environments are tested with our proposed approach and state-of-the-art methods. The experimental results demonstrate significant performance improvement of our method on both prediction accuracy and convergence efficiency.

preprint · 2020 · arXiv

CenterMask: single shot instance segmentation with point representation

In this paper, we propose a single-shot instance segmentation method, which is simple, fast and accurate. There are two main challenges for one-stage instance segmentation: object instances differentiation and pixel-wise feature alignment. Accordingly, we decompose the instance segmentation into two parallel subtasks: Local Shape prediction that separates instances even in overlapping conditions, and Global Saliency generation that segments the whole image in a pixel-to-pixel manner. The outputs of the two branches are assembled to form the final instance masks. To realize that, the local shape information is adopted from the representation of object center points. Totally trained from scratch and without any bells and whistles, the proposed CenterMask achieves 34.5 mask AP with a speed of 12.3 fps, using a single-model with single-scale training/testing on the challenging COCO dataset. The accuracy is higher than all other one-stage instance segmentation methods except the 5 times slower TensorMask, which shows the effectiveness of CenterMask. Besides, our method can be easily embedded to other one-stage object detectors such as FCOS and performs well, showing the generalization of CenterMask.

preprint · 2020 · arXiv

Dynamic Variational Autoencoders for Visual Process Modeling

This work studies the problem of modeling visual processes by leveraging deep generative architectures for learning linear, Gaussian representations from observed sequences. We propose a joint learning framework, combining a vector autoregressive model and Variational Autoencoders. This results in an architecture that allows Variational Autoencoders to simultaneously learn a non-linear observation as well as a linear state model from sequences of frames. We validate our approach on artificial sequences and dynamic textures.
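
As a rough illustration of the "linear state model" mentioned in the abstract above, the sketch below fits a first-order vector autoregressive model z[t+1] ≈ A z[t] to a sequence of latent codes by ordinary least squares. It is not the authors' method: the paper learns the encoder and the state model jointly, whereas here the latent codes are assumed to be given, and the names `fit_var1`, `rollout` and `A_true` are hypothetical and chosen only for this example.

```python
import numpy as np


def fit_var1(latents: np.ndarray) -> np.ndarray:
    """Least-squares estimate of A in z[t+1] ~ A @ z[t].

    `latents` has shape (T, d): one d-dimensional latent code per frame.
    """
    past, future = latents[:-1], latents[1:]
    # lstsq solves past @ B ~ future with codes stored as rows,
    # so B is the transpose of the transition matrix A.
    B, *_ = np.linalg.lstsq(past, future, rcond=None)
    return B.T


def rollout(A: np.ndarray, z0: np.ndarray, steps: int) -> np.ndarray:
    """Generate steps + 1 latent states by repeatedly applying A to z0."""
    states = [z0]
    for _ in range(steps):
        states.append(A @ states[-1])
    return np.stack(states)


# Toy, noise-free latent sequence standing in for encoder outputs (assumption).
A_true = np.array([[0.9, -0.2],
                   [0.2, 0.9]])
latents = rollout(A_true, np.array([1.0, 0.0]), steps=50)

A_hat = fit_var1(latents)
print(np.round(A_hat, 3))
```

In a full model the rows of `latents` would come from a trained encoder, and predicted states from `rollout` would be passed through a decoder to synthesise frames.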