Source author record

Guangming Wang

Guangming Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision cond-mat.mtrl-sci Robotics physics.chem-ph physics.comp-ph cond-mat.str-el Human-Computer Interaction Information Retrieval

Catalog footprint

What is connected

20 works

8 topics

4 close collaborators


preprint · 2026 · arXiv

From Trajectories to Phenotypes: Disease Progression as Structural Priors for Multi-organ Imaging Representation Learning

Imaging-derived phenotypes (IDPs) summarize multi-organ physiology but provide only static snapshots of diseases that evolve over time. In contrast, longitudinal electronic health records encode disease trajectories through temporal dependencies among past diagnosis events and comorbidity structure. We hypothesize that IDPs and disease trajectories contain partially shared disease-relevant structure. We propose a trajectory-aware distillation framework that transfers structural knowledge from a generative disease trajectory Transformer into an organ-wise IDP encoder. A population-scale trajectory model trained on longitudinal diagnosis sequences produces subject-level embeddings that supervise IDP representation learning via geometry-preserving alignment. During downstream prediction, trajectory and imaging representations can also be fused via cross-attention. Across 159 diseases in the UK Biobank cohort, trajectory-aware pretraining consistently improves both discrimination (AUC) and time-to-onset prediction (MAE), with the largest gains for low-prevalence diseases. Similarity relationships in IDP embedding space also align with those in trajectory space, providing supportive evidence for partially aligned representation geometry. These results suggest that population-scale generative disease models can serve as structural priors for data-limited imaging modalities, improving robustness under realistic cohort constraints.
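One plausible reading of "geometry-preserving alignment" is that the pairwise similarity structure of the IDP embeddings should match that of the trajectory embeddings for the same subjects. The sketch below assumes cosine similarity and a mean-squared penalty between the two similarity matrices; the function names are hypothetical and this is not the authors' implementation.

```python
import math
from typing import List, Sequence

Embedding = Sequence[float]


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two non-zero vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        raise ValueError("zero-length embedding")
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def similarity_matrix(embs: Sequence[Embedding]) -> List[List[float]]:
    """Pairwise cosine similarities within one batch of subjects."""
    return [[cosine(a, b) for b in embs] for a in embs]


def geometry_alignment_loss(student: Sequence[Embedding],
                            teacher: Sequence[Embedding]) -> float:
    """Hypothetical loss: mean squared difference between the student
    (IDP encoder) and teacher (trajectory model) similarity matrices."""
    if len(student) != len(teacher) or not student:
        raise ValueError("need the same non-empty set of subjects")
    s, t = similarity_matrix(student), similarity_matrix(teacher)
    n = len(student)
    return sum((s[i][j] - t[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)
```

Because only within-space similarities are compared, the two embedding spaces may have different dimensions; the loss is zero whenever the relative geometry of the batch agrees.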

preprint · 2026 · arXiv

Person Parametric Physics-informed Representation for mmWave-based Human Pose Estimation

Millimeter-wave (mmWave) radar enables privacy-preserving, illumination-invariant Human Pose Estimation (HPE). However, current mmWave-based HPE systems face a signal-noise dilemma: heatmaps retain human reflections but embed environmental clutter, while Point Clouds (PC) suppress noise through aggressive thresholding but discard informative human reflections, limiting robustness across environments and radar configurations. To address this intrinsic bottleneck, we introduce Person Parametric Physics-informed Representation (PPPR), a physics-informed parametric intermediate representation that replaces purely signal-level encodings with a human-centric parameterization. PPPR models each human joint as a Gaussian primitive that encodes both kinematic properties (position, velocity, and orientation) and electromagnetic properties (scattering intensity and Doppler signature). These parameters are optimized through a dual-constraint process: kinematic objectives enforce biomechanical consistency to suppress spatial artifacts, while electromagnetic objectives ensure adherence to mmWave propagation physics, decoupling input representations from non-human noise. Experiments on three mmWave-based HPE datasets with four HPE models demonstrate that replacing conventional inputs with PPPR consistently yields substantial accuracy gains. Furthermore, cross-scene and cross-dataset experiments confirm PPPR's noise-decoupling capability: models trained with PPPR maintain stable performance across diverse furniture arrangements and different radar chipsets, demonstrating promising generalization in challenging cross-dataset settings. Code will be released upon publication.

preprint · 2026 · arXiv

Robot Learning from Human Videos: A Survey

A critical bottleneck hindering further advancement in embodied AI and robotics is the challenge of scaling robot data. To address this, the field of learning robot manipulation skills from human video data has attracted rapidly growing attention in recent years, driven by the abundance of human activity videos and advances in computer vision. This line of research promises to enable robots to acquire skills passively from the vast and readily available resource of human demonstrations, substantially favoring scalable learning for generalist robotic systems. Therefore, we present this survey to provide a comprehensive and up-to-date review of human-video-based learning techniques in robotics, focusing on both human-robot skill transfer and data foundations. We first review the policy learning foundations in robotics, and then describe the fundamental interfaces to incorporate human videos. Subsequently, we introduce a hierarchical taxonomy of transferring human videos to robot skills, covering task-, observation-, and action-oriented pathways, along with a cross-family analysis of their couplings with different data configurations and learning paradigms. In addition, we investigate the data foundations including widely-used human video datasets and video generation schemes, and provide large-scale statistical trends in dataset development and utilization. Ultimately, we emphasize the challenges and limitations intrinsic to this field, and delineate potential avenues for future research. The paper list of our survey is available at https://github.com/IRMVLab/awesome-robot-learning-from-human-videos.

preprint · 2022 · arXiv

3D Hierarchical Refinement and Augmentation for Unsupervised Learning of Depth and Pose from Monocular Video

Depth and ego-motion estimation are essential for the localization and navigation of autonomous robots and for autonomous driving. Recent studies make it possible to learn per-pixel depth and ego-motion from unlabeled monocular video. This paper proposes a novel unsupervised training framework with 3D hierarchical refinement and augmentation using explicit 3D geometry. In this framework, depth and pose estimates are hierarchically and mutually coupled to refine the estimated pose layer by layer. An intermediate view image is synthesized by warping the pixels of an image with the estimated depth and the coarse pose. The residual pose transformation is then estimated from this new view image and the image of the adjacent frame to refine the coarse pose. The iterative refinement is implemented in a differentiable manner, so the whole framework is optimized jointly. In addition, a new image augmentation method is proposed for pose estimation: synthesizing a new view image augments the pose in 3D space while yielding a new augmented 2D image. Experiments on KITTI demonstrate that our depth estimation achieves state-of-the-art performance and even surpasses recent approaches that use other auxiliary tasks. Our visual odometry outperforms all recent unsupervised monocular learning-based methods and achieves performance competitive with the geometry-based method ORB-SLAM2 with back-end optimization.
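The intermediate-view synthesis rests on standard pinhole reprojection: each pixel is lifted to 3D with its estimated depth, moved by the (coarse) relative pose, and projected again. A minimal per-pixel sketch, assuming a pinhole camera with intrinsics fx, fy, cx, cy and a pose given as a rotation matrix R and translation t; the function name is hypothetical, and a real pipeline would warp whole images with differentiable bilinear sampling.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Vector3 = Sequence[float]


def reproject_pixel(u: float, v: float, depth: float,
                    fx: float, fy: float, cx: float, cy: float,
                    R: Matrix3, t: Vector3) -> Tuple[float, float]:
    """Hypothetical helper: back-project (u, v) with its depth, apply the
    rigid motion (R, t), and project into the new view (pinhole model)."""
    # Back-project: pixel -> 3D point in the source camera frame.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    p = (x, y, depth)
    # Rigid transform into the target (intermediate) camera frame.
    q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    if q[2] <= 0:
        raise ValueError("point falls behind the target camera")
    # Perspective projection back onto the image plane.
    return fx * q[0] / q[2] + cx, fy * q[1] / q[2] + cy
```

For example, with fx = fy = 100, cx = cy = 50, a pixel at the principal point with depth 10 and a 1-unit sideways translation lands 10 pixels to the right.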

preprint · 2022 · arXiv

A new generation of effective core potentials from correlated and spin-orbit calculations: selected heavy elements

We introduce new correlation consistent effective core potentials (ccECPs) for the elements I, Te, Bi, Ag, Au, Pd, Ir, Mo, and W with $4d$, $5d$, $6s$, and $6p$ valence spaces. These ccECPs are given as the sum of a spin-orbit averaged relativistic effective potential (AREP) and effective spin-orbit (SO) terms. The construction proceeds in several steps of increasing refinement, from simpler to fully correlated methods. The optimizations use objective functions that include weighted many-body atomic spectra, norm-conservation criteria, and spin-orbit splittings. Transferability tests involve molecular binding curves of the corresponding hydride and oxide dimers. On all tested criteria, the constructed ccECPs are systematically better than, or in a few cases on par with, previous effective core potential (ECP) tables, and they provide a significant increase in accuracy for valence-only calculations with these elements. Our study confirms the importance of the AREP part in determining the overall quality of the ECP, even in the presence of sizable spin-orbit effects. Subsequent quantum Monte Carlo (QMC) calculations highlight the importance of accurate trial wave functions, which in some cases (mid-series transition elements) require treatment well beyond a single reference.

preprint · 2022 · arXiv

DetFlowTrack: 3D Multi-object Tracking based on Simultaneous Optimization of Object Detection and Scene Flow Estimation

3D Multi-Object Tracking (MOT) is an important part of the perception module of unmanned vehicles. Most methods optimize object detection and data association independently, which complicates the network structure and limits the improvement of MOT accuracy. We propose a 3D MOT framework based on the simultaneous optimization of object detection and scene flow estimation. Within this framework, a detection-guidance scene flow module is proposed to alleviate incorrect inter-frame association. To obtain more accurate scene flow labels, especially for motion that includes rotation, a box-transformation-based method for computing scene flow ground truth is proposed. Experimental results on the KITTI MOT dataset are competitive with state-of-the-art methods and show robustness under extreme motion with rotation.
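The motivation for box-based labels is that, under rotation, points on a rigid object do not all move by the displacement of the box center. A minimal sketch of one way to compute such labels, assuming each box is given by its center and a yaw angle about the vertical (z) axis; the names are hypothetical and the paper's exact procedure may differ.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def rot_z(yaw: float, p: Vec3) -> Vec3:
    """Rotate a point about the z axis by yaw radians."""
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])


def box_flow(point: Vec3, center_t: Vec3, yaw_t: float,
             center_t1: Vec3, yaw_t1: float) -> Vec3:
    """Hypothetical scene flow label for a point inside a tracked box.

    Assumption: boxes are (center, yaw about z). The point is expressed in
    the box frame at time t and re-attached to the box pose at time t+1.
    """
    rel = (point[0] - center_t[0], point[1] - center_t[1], point[2] - center_t[2])
    local = rot_z(-yaw_t, rel)           # world -> box-local coordinates at t
    moved = rot_z(yaw_t1, local)         # box-local -> world orientation at t+1
    new_point = tuple(moved[i] + center_t1[i] for i in range(3))
    return tuple(new_point[i] - point[i] for i in range(3))
```

For a box that rotates 90° in place, the center displacement is zero, yet a point 1 m from the center receives a flow of roughly (-1, 1, 0), which a center-displacement label would miss.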

preprint · 2022 · arXiv

Efficient 3D Deep LiDAR Odometry

An efficient 3D point cloud learning architecture for LiDAR odometry, named EfficientLO-Net, is proposed for the first time in this paper. In this architecture, a projection-aware representation organizes the raw 3D point cloud into an ordered data form to achieve efficiency. A Pyramid, Warping, and Cost volume (PWC) structure is built for the LiDAR odometry task to estimate and refine the pose in a coarse-to-fine manner. A projection-aware attentive cost volume directly associates two discrete point clouds and obtains embedding motion patterns. A trainable embedding mask then weighs the local motion patterns to regress the overall pose and to filter outlier points. The trainable pose warp-refinement module is applied iteratively, with the embedding mask optimized hierarchically, to make pose estimation more robust to outliers. The entire architecture is optimized end-to-end to achieve adaptive learning of the cost volume and mask, and all operations involving point cloud sampling and grouping are accelerated by projection-aware 3D feature learning methods. The performance and effectiveness of our LiDAR odometry architecture are demonstrated on the KITTI, M2DGR, and Argoverse datasets. Our method outperforms all recent learning-based methods, and even the geometry-based approach LOAM with mapping optimization, on most sequences of the KITTI odometry dataset. Our code is open-sourced at https://github.com/IRMVLab/EfficientLO-Net.
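A common way to arrange an unordered LiDAR scan into an ordered grid is spherical (range-image) projection. The sketch below assumes that kind of projection, with a hypothetical vertical field of view and grid size; it illustrates the idea of a projection-aware layout rather than EfficientLO-Net's actual code. When several points fall into one cell, the closest point is kept.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def project_to_grid(points: Sequence[Point], height: int, width: int,
                    fov_up_deg: float, fov_down_deg: float
                    ) -> List[List[Optional[int]]]:
    """Spherical projection of a LiDAR scan onto a height x width grid.

    Each cell stores the index of the closest point that lands in it, or None.
    Assumption: fov_up_deg / fov_down_deg describe the sensor's vertical
    field of view (fov_down_deg is negative); values are illustrative.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    grid: List[List[Optional[int]]] = [[None] * width for _ in range(height)]
    best = [[math.inf] * width for _ in range(height)]
    for idx, (x, y, z) in enumerate(points):
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # skip degenerate points at the sensor origin
        yaw = math.atan2(y, x)    # horizontal angle in [-pi, pi]
        pitch = math.asin(z / r)  # vertical angle
        # Map angles to integer column/row indices and clamp to the grid.
        col = int(0.5 * (1.0 - yaw / math.pi) * width)
        row = int((1.0 - (pitch - fov_down) / fov) * height)
        col = min(max(col, 0), width - 1)
        row = min(max(row, 0), height - 1)
        # Keep the nearest point when several fall into the same cell.
        if r < best[row][col]:
            best[row][col] = r
            grid[row][col] = idx
    return grid
```

Once points sit in a regular grid, neighbourhood lookups for sampling and grouping become index arithmetic instead of nearest-neighbour searches, which is the kind of efficiency gain the abstract attributes to the projection-aware representation.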

preprint · 2022 · arXiv

Electronic structure of $\boldsymbol{\alpha}$-RuCl$_3$ by fixed-node and fixed-phase diffusion Monte Carlo methods

The layered material $\alpha$-RuCl$_3$ has attracted wide attention due to its possible realization of Kitaev's spin liquid and its electronic structure, which involves the interplay of electron-electron correlations and spin-orbit effects. Several DFT$+U$ studies have suggested that both electron-electron correlations and spin-orbit effects are crucial for accurately describing the band gap. This work studies the importance of these two effects using fixed-node and fixed-phase diffusion Monte Carlo calculations in both spin-averaged and explicit spin-orbit formalisms. In the latter, the Slater-Jastrow trial function is constructed from two-component spin-orbitals using our recent quantum Monte Carlo (QMC) developments and thoroughly tested effective core potentials. Our results show that the gap in the ideal crystal is already accurately described in the spin-averaged case, with the dominant role played by the magnetic ground state with significant exchange and electron correlation effects. We find qualitative agreement among hybrid DFT, DFT$+U$, and QMC. In addition, QMC results agree very well with available experiments, and we identify the values of exact Fock exchange mixing that provide comparable gaps. Explicit spin-orbit QMC calculations reveal that the effect of spin-orbit coupling on the gap is minor, of the order of 0.2 eV, which corresponds to the strength of the spin-orbit interaction of the Ru atom.

preprint · 2022 · arXiv

Interactive Multi-scale Fusion of 2D and 3D Features for Multi-object Tracking

Multiple object tracking (MOT) is a key task for autonomous driving. Traditional works approach this task using either point clouds (PC) collected by LiDAR or images captured by cameras. However, relying on a single sensor is not robust enough, because that sensor may fail during tracking. Fusing features from multiple modalities, on the other hand, improves accuracy, so new techniques that integrate features from different sensors are being developed. Texture information from RGB cameras and 3D structure information from LiDAR each have advantages under different circumstances. However, effective feature fusion is difficult because the two modalities carry entirely different information. Previous fusion methods usually fuse top-level features after separate backbones have extracted features from each modality. In this paper, we first introduce PointNet++ to obtain multi-scale deep representations of the point cloud, adapting it to our proposed Interactive Feature Fusion between multi-scale image and point cloud features. Specifically, through multi-scale interactive querying and fusion between pixel-level and point-level features, our method obtains more discriminative features that improve multiple object tracking performance. In addition, we explore the effectiveness of pre-training on each single modality and fine-tuning the fusion-based model. Experimental results show that our method performs well on the KITTI benchmark and outperforms other approaches that do not use multi-scale feature fusion. Moreover, ablation studies indicate the effectiveness of multi-scale feature fusion and single-modality pre-training.

preprint · 2022 · arXiv

Magnetic measures of purity for MnBi$_2$Te$_4$

The intrinsically anti-ferromagnetic topological insulator, MnBi$_2$Te$_4$ (MBT), has garnered significant attention recently. The excitement for this layered van der Waals bonded compound stems from its potential to host numerous exotic topological quantum states. For instance, quantum anomalous Hall states are predicted for odd-layer compounds and axion insulator states for the even-layer compounds. Unfortunately, the realization of these phenomena has been hindered by experimental challenges such as the existence of negative charge carriers, i.e., electron doping, which have been linked to anti-site defects among the Mn and Bi sub-lattices. Based on high level diffusion Monte Carlo (DMC) and DMC-tuned DFT+U calculations, we provide benchmark quality results for the bulk Mn magnetization as well as for Mn$_{Bi}$ and Bi$_{Mn}$ defects. We use this information to refine and extend models that estimate the anti-site defect concentration in actual MBT samples when combined with data from magnetic susceptibility and intermediate field magnetization measurements. Our models are validated through favorable comparison with prior experimental studies that obtained both magnetic and site occupancy data. We then extend our estimates to a larger set of prior samples to identify a probable zone of low defect density that has yet to be reached in synthesis. We anticipate our theoretically based magnetic purity measures may be used as minimization targets in the cycle of refinement needed to synthesize MBT samples with low anti-site defect concentrations and more reproducible topological properties.

preprint · 2022 · arXiv

Motion Projection Consistency Based 3D Human Pose Estimation with Virtual Bones from Monocular Videos

Real-time 3D human pose estimation is crucial for human-computer interaction. Estimating 3D human pose from monocular video alone is cheap and practical. However, recent bone-splicing-based 3D human pose estimation methods suffer from cumulative error. In this paper, the concept of virtual bones is proposed to address this challenge. Virtual bones are imaginary bones between non-adjacent joints. They do not exist in reality, but they introduce new loop constraints for the estimation of 3D human joints. The proposed network predicts real bones and virtual bones simultaneously. The final lengths of the real bones are constrained and learned through the loops formed by the predicted real and virtual bones. In addition, the motion constraints of joints in consecutive frames are considered: the consistency between the 2D projected position displacement predicted by the network and the real 2D displacement captured by the camera is proposed as a new projection consistency loss for learning 3D human pose. Experiments on the Human3.6M dataset demonstrate the good performance of the proposed method, and ablation studies demonstrate the effectiveness of the proposed inter-frame projection consistency constraints and intra-frame loop constraints.
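The loop constraint can be written as a closure condition: for joints a, b, c where a-b and b-c are real bones and a-c is a virtual bone, the bone vectors should satisfy b_ab + b_bc - v_ac = 0. A minimal sketch of a loss that penalizes violations of that closure, assuming bones are predicted as 3D vectors; the abstract does not give the exact loss, so this form and the names are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def loop_residual(bone_ab: Vec3, bone_bc: Vec3, virtual_ac: Vec3) -> Vec3:
    """Closure error of the loop a -> b -> c -> a; zero when consistent."""
    return tuple(bone_ab[i] + bone_bc[i] - virtual_ac[i] for i in range(3))


def loop_loss(loops: Sequence[Tuple[Vec3, Vec3, Vec3]]) -> float:
    """Hypothetical loss: mean squared closure error over
    (real bone, real bone, virtual bone) triples."""
    if not loops:
        raise ValueError("need at least one loop")
    total = 0.0
    for ab, bc, ac in loops:
        r = loop_residual(ab, bc, ac)
        total += sum(x * x for x in r)
    return total / len(loops)
```

Because each real bone appears in several loops, an error in one predicted bone shows up as a closure violation that the loss can correct, rather than silently propagating down the kinematic chain.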

preprint · 2022 · arXiv

Origin of Metal-Insulator Transitions in Correlated Perovskite Metals

The mechanisms that drive metal-to-insulator transitions (MIT) in correlated solids are not fully understood. For example, the perovskite (PV) SrCoO3 is a ferromagnetic (FM) metal, while the oxygen-deficient (n-doped) brownmillerite (BM) SrCoO2.5 is an antiferromagnetic (AFM) insulator. Given the magnetic and structural transitions that accompany the MIT, the driver of this transition is unclear. We also observe that the perovskite metals LaNiO3, SrFeO3, and SrCoO3 undergo an MIT when n-doped via high-to-low valence compositional changes, and that pressurizing the insulating BM SrCoO2.5 phase drives gap closing. Using DFT and correlated diffusion Monte Carlo approaches, we demonstrate that the ABO3 perovskites most prone to MIT are self-hole-doped materials, reminiscent of a negative charge-transfer system. Upon n-doping away from the negative charge-transfer metallic phase, an underlying charge-lattice (electron-phonon) coupling drives the system to a bond-disproportionated gapped state, achieving ligand-hole passivation only at certain sites and leading to charge-disproportionated states. The size of the opened gap is correlated with the amount of hole filling at these ligand sites. This suggests that the interaction driving gap opening and the MIT, even in correlated metals, is the charge-transfer energy, which couples with the underlying phonons to enable the transition to the insulating phase. Other orderings (magnetic, charge, etc.) driven by weaker interactions are secondary and may assist gap opening at small doping levels, but it is the charge-transfer energy that predominantly determines the band gap, with a negative charge-transfer energy favoring the metallic phase. This n-doping can be achieved by modulating stoichiometry, composition, or pressure. Hence, controlling the amount of ligand holes is key to controlling the MIT. We compare our predictions with experiments where possible.

preprint · 2022 · arXiv

Residual 3D Scene Flow Learning with Context-Aware Feature Extraction

Scene flow estimation is the task of predicting the point-wise or pixel-wise 3D displacement vector between two consecutive frames of point clouds or images, with important applications in fields such as service robots and autonomous driving. Although scene flow estimation from point clouds has been explored extensively in previous work, two problems have not been noticed or well solved: 1) points of adjacent frames in repetitive patterns may be wrongly associated because of similar spatial structure in their neighbourhoods; 2) scene flow between adjacent point cloud frames with long-distance movement may be inaccurately estimated. To solve the first problem, this paper proposes a novel context-aware set convolution layer that exploits contextual structure information of Euclidean space and learns soft aggregation weights for local point features. This design is inspired by human perception of contextual structure information during scene understanding with repetitive patterns. The context-aware set convolution layer is incorporated into a context-aware point feature pyramid module of 3D point clouds for scene flow estimation. For the second problem, an explicit residual flow learning structure is proposed in the residual flow refinement layer to cope with long-distance movement. Experiments and ablation studies on the FlyingThings3D and KITTI scene flow datasets demonstrate the effectiveness of each proposed component. Qualitative results show that ambiguous inter-frame association and long-distance movement estimation are well handled. Quantitative results on both datasets show that the proposed method achieves state-of-the-art performance, surpassing all previous works, to the best of our knowledge, by at least 25%.

preprint · 2022 · arXiv

Unsupervised Learning of 3D Scene Flow from Monocular Camera

Scene flow represents the motion of points in 3D space; it is the counterpart of optical flow, which represents the motion of pixels in the 2D image. However, scene flow ground truth is difficult to obtain in real scenes, so recent studies rely on synthetic data for training. Training a scene flow network with unsupervised methods on real-world data is therefore of crucial significance. This paper proposes a novel unsupervised learning method for scene flow that trains on images of two consecutive frames taken by a monocular camera, without scene flow ground truth. Our method trains the scene flow network on real-world data, which bridges the gap between training and test data and broadens the scope of data available for training. Unsupervised learning of scene flow in this paper consists of two main parts: (i) depth estimation and camera pose estimation, and (ii) scene flow estimation based on four different loss functions. Depth and camera pose estimation provide the depth maps and the camera pose between two consecutive frames, which supply further information for the subsequent scene flow estimation. We then use a depth consistency loss, a dynamic-static consistency loss, a Chamfer loss, and a Laplacian regularization loss to carry out unsupervised training of the scene flow network. To our knowledge, this is the first paper to realize unsupervised learning of 3D scene flow from a monocular camera. Experimental results on KITTI show that our method for unsupervised learning of scene flow performs well compared with the traditional methods Iterative Closest Point (ICP) and Fast Global Registration (FGR). The source code is available at: https://github.com/IRMVLab/3DUnMonoFlow.
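Of the four losses, the Chamfer loss is the easiest to state precisely: source points moved by the predicted flow should land on the target point cloud, with no correspondences required. Several Chamfer variants exist (squared or unsquared distances, sums or means); the sketch below assumes the mean of squared nearest-neighbour distances in both directions, uses brute-force search, and its helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def warp(points: Sequence[Point], flow: Sequence[Point]) -> List[Point]:
    """Move each source point by its predicted 3D flow vector."""
    return [tuple(p[i] + f[i] for i in range(3)) for p, f in zip(points, flow)]


def chamfer_distance(src: Sequence[Point], tgt: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (assumed variant): mean squared distance to
    the nearest neighbour from src to tgt plus from tgt to src, O(N*M)."""
    if not src or not tgt:
        raise ValueError("both point sets must be non-empty")
    fwd = sum(min(sq_dist(p, q) for q in tgt) for p in src) / len(src)
    bwd = sum(min(sq_dist(q, p) for p in src) for q in tgt) / len(tgt)
    return fwd + bwd
```

Training would minimise `chamfer_distance(warp(src, predicted_flow), tgt)`; the Laplacian regularisation listed above is typically added because Chamfer alone does not enforce locally smooth flow.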

preprint · 2022 · arXiv

Unsupervised Learning of 3D Scene Flow with 3D Odometry Assistance

Scene flow represents the 3D motion of each point in the scene, explicitly describing the distance and direction of each point's movement. Scene flow estimation is used in various applications such as autonomous driving, activity recognition, and virtual reality. Because annotating real-world data with scene flow ground truth is challenging, no real-world dataset provides a large amount of data with ground truth for scene flow estimation. Therefore, many works pre-train their networks on synthesized data and fine-tune on real-world LiDAR data. Unlike previous unsupervised learning of scene flow in point clouds, we propose to use odometry information to assist the unsupervised learning of scene flow, and we train our network on real-world LiDAR data. Supervised odometry provides a more accurate shared cost volume for scene flow. In addition, the proposed network has mask-weighted warp layers to obtain a more accurate predicted point cloud. The warp operation applies an estimated pose transformation or scene flow to a source point cloud to obtain a predicted point cloud, and it is the key to refining scene flow from coarse to fine. When performing warp operations, points in different states use different weights for the pose transformation and the scene flow transformation. We classify points as static, dynamic, or occluded: static masks separate static from dynamic points, and occlusion masks identify occluded points. In the mask-weighted warp layer, static masks and occlusion masks are used as weights when performing warp operations. The effectiveness of our designs is confirmed in ablation experiments, and the results show the promise of odometry-assisted unsupervised learning of 3D scene flow on real-world data.
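A minimal sketch of the mask-weighted warp idea, assuming a soft static mask m in [0, 1] per point that blends the rigid (ego-motion) displacement with the predicted scene flow: p' = m (R p + t) + (1 - m)(p + f). The blending rule and the names are assumptions rather than the paper's layer, and the occlusion-mask weighting is omitted for brevity.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]


def apply_pose(p: Vec3, R: Matrix3, t: Vec3) -> Vec3:
    """Rigid transform R p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def mask_weighted_warp(points: Sequence[Vec3], flows: Sequence[Vec3],
                       static_mask: Sequence[float],
                       R: Matrix3, t: Vec3) -> List[Vec3]:
    """Hypothetical warp: static points follow the ego-motion (R, t),
    dynamic points follow their scene flow; soft masks interpolate."""
    out = []
    for p, f, m in zip(points, flows, static_mask):
        rigid = apply_pose(p, R, t)                       # target under odometry only
        flowed = tuple(p[i] + f[i] for i in range(3))    # target under scene flow
        out.append(tuple(m * rigid[i] + (1.0 - m) * flowed[i] for i in range(3)))
    return out
```

Using the pose for static points lets the more reliable odometry estimate dominate on the background, while the per-point flow is only trusted where the mask says the point is moving.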

preprint · 2022 · arXiv

What Matters for 3D Scene Flow Network

3D scene flow estimation from point clouds is a low-level 3D motion perception task in computer vision. Flow embedding is a commonly used technique in scene flow estimation that encodes the point motion between two consecutive frames. It is therefore critical for flow embeddings to capture the correct overall direction of motion. However, previous works search only locally to determine a soft correspondence, ignoring distant points that turn out to be the actual matches. In addition, the estimated correspondence is usually computed in the forward direction between adjacent point clouds and may not be consistent with the correspondence estimated in the backward direction. To tackle these problems, we propose a novel all-to-all flow embedding layer with backward reliability validation during the initial scene flow estimation. We also investigate and compare several design choices for key components of the 3D scene flow network, including the point similarity calculation, the input elements of the predictor, and the predictor and refinement level design. After carefully choosing the most effective designs, we present a model that achieves state-of-the-art performance on the FlyingThings3D and KITTI Scene Flow datasets. For the EPE3D metric, our proposed model surpasses all existing methods by at least 38.2% on FlyingThings3D and 24.7% on KITTI Scene Flow. We release our code at https://github.com/IRMVLab/3DFlow.
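The backward reliability idea can be illustrated with a simplified, hard-matching form of forward-backward consistency: each point searches all points of the other frame (not just a local neighbourhood), and a match is kept as reliable only if searching back from the matched point returns to the original point. This sketch assumes plain Euclidean distance between feature vectors and hard nearest neighbours; the paper's layer uses learned soft correspondences, so the names and details here are hypothetical.

```python
from typing import List, Sequence

Feature = Sequence[float]


def sq_dist(a: Feature, b: Feature) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_match(query: Feature, candidates: Sequence[Feature]) -> int:
    """All-to-all search: index of the globally closest candidate."""
    return min(range(len(candidates)), key=lambda j: sq_dist(query, candidates[j]))


def forward_backward_check(feats1: Sequence[Feature],
                           feats2: Sequence[Feature]) -> List[bool]:
    """Hypothetical validation: True where the forward match of a frame-1
    point maps back to that same point."""
    reliable = []
    for i, f in enumerate(feats1):
        j = best_match(f, feats2)               # forward: frame 1 -> frame 2
        i_back = best_match(feats2[j], feats1)  # backward: frame 2 -> frame 1
        reliable.append(i_back == i)
    return reliable
```

With 1D features [0.0, 1.0, 1.3] in frame 1 and [0.1, 1.1] in frame 2, the third point's forward match claims the same target as the second point, fails the backward check, and is flagged as unreliable.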

preprint · 2020 · arXiv

Accurate atomic correlation and total energies for correlation consistent effective core potentials

Very recently, we introduced a set of correlation consistent effective core potentials (ccECPs) constructed within full many-body approaches. By employing significantly more accurate correlated approaches, we were able to reach a new level of accuracy for the resulting effective core Hamiltonians. We also strived for simplicity of use and easy transferability into a variety of electronic structure methods in quantum chemistry and condensed matter physics. Here, as a reference for future use, we present exact or nearly exact total energy calculations for these ccECPs. The calculations cover the elements H-Kr and are based on state-of-the-art configuration interaction (CI), coupled-cluster (CC), and quantum Monte Carlo (QMC) calculations with systematically eliminated or reduced errors. In particular, we carry out full CI/CCSD(T)/CCSDT(Q) calculations with cc-pVnZ basis sets up to n=6 and estimate the complete basis set limits. Using combinations of these approaches, we achieve an accuracy of $\approx$ 1-10 mHa for the K-Zn atoms and $\approx$ 0.1-0.3 mHa for all other elements, within about 1% or better of the ccECP total correlation energies. We also estimate the corresponding kinetic energies within the feasible limit of full CI calculations. To provide data for QMC calculations, we include fixed-node diffusion Monte Carlo energies for each element that give quantitative insight into the fixed-node biases of single-reference trial wave functions. The results offer a clear benchmark for future high-accuracy calculations in a broad variety of correlated wave function methods, such as CI and CC, as well as in stochastic approaches such as real-space sampling QMC.
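The abstract does not say which scheme was used to estimate the complete basis set (CBS) limit. One widely used choice for correlation energies assumes E(n) = E_CBS + A n^-3 for cardinal number n and eliminates A using two basis sets. The sketch below implements only that assumed two-point formula (energies in Hartree), not the authors' procedure; the function name is hypothetical.

```python
def cbs_two_point(n: int, e_n: float, m: int, e_m: float) -> float:
    """Assumed inverse-cubic two-point extrapolation of correlation energy.

    From E(k) = E_cbs + A * k**-3 at k = n and k = m:
        n^3 E_n - m^3 E_m = (n^3 - m^3) E_cbs
    so E_cbs = (n^3 E_n - m^3 E_m) / (n^3 - m^3).
    """
    if n == m:
        raise ValueError("need two different cardinal numbers")
    return (n ** 3 * e_n - m ** 3 * e_m) / (n ** 3 - m ** 3)
```

For instance, with cc-pV5Z and cc-pV6Z correlation energies one would call `cbs_two_point(6, e6, 5, e5)`; if the energies follow the assumed n^-3 form exactly, the result recovers E_CBS.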

preprint · 2020 · arXiv

QMCPACK: Advances in the development, efficiency, and application of auxiliary field and real-space variational and diffusion Quantum Monte Carlo

We review recent advances in the capabilities of the open source ab initio Quantum Monte Carlo (QMC) package QMCPACK and the workflow tool Nexus used for greater efficiency and reproducibility. The auxiliary field QMC (AFQMC) implementation has been greatly expanded to include k-point symmetries, tensor-hypercontraction, and accelerated graphical processing unit (GPU) support. These scaling and memory reductions greatly increase the number of orbitals that can practically be included in AFQMC calculations, increasing accuracy. Advances in real space methods include techniques for accurate computation of band gaps and for systematically improving the nodal surface of ground state wavefunctions. Results of these calculations can be used to validate application of more approximate electronic structure methods including GW and density functional based techniques. To provide an improved foundation for these calculations we utilize a new set of correlation-consistent effective core potentials (pseudopotentials) that are more accurate than previous sets; these can also be applied in quantum-chemical and other many-body applications, not only QMC. These advances increase the efficiency, accuracy, and range of properties that can be studied in both molecules and materials with QMC and QMCPACK.

preprint · 2020 · arXiv

Spectral Pyramid Graph Attention Network for Hyperspectral Image Classification

Convolutional neural networks (CNN) have made significant advances in hyperspectral image (HSI) classification. However, standard convolutional kernels neglect the intrinsic connections between data points, resulting in poor region delineation and small spurious predictions. Furthermore, HSIs have a unique continuous data distribution along the high-dimensional spectral domain; much remains to be done in characterizing spectral contexts given the prohibitively high dimensionality, and in improving reasoning capability given the limited amount of labelled data. This paper presents a novel architecture that explicitly addresses these two issues. Specifically, we design an architecture that encodes multiple spectral contexts in the form of a spectral pyramid of multiple embedding spaces. In each spectral embedding space, we propose a graph attention mechanism that explicitly performs interpretable reasoning in the spatial domain based on connections in the spectral feature space. Experiments on three HSI datasets demonstrate that the proposed architecture significantly improves classification accuracy compared with existing methods.

preprint · 2020 · arXiv

Unsupervised Learning of Depth, Optical Flow and Pose with Occlusion from 3D Geometry

In autonomous driving, monocular sequences contain a great deal of information. Monocular depth estimation, camera ego-motion estimation, and optical flow estimation between consecutive frames have recently attracted considerable attention. By analyzing these tasks, pixels in the middle frame are modeled into three parts: the rigid region, the non-rigid region, and the occluded region. In joint unsupervised training of depth and pose, the occluded region can be segmented explicitly. The occlusion information is used in unsupervised learning of depth, pose, and optical flow, because the images reconstructed by depth-pose and by optical flow are invalid in occluded regions. A less-than-mean mask is designed to further exclude mismatched pixels affected by motion or illumination changes in the training of the depth and pose networks. The same method is also used to exclude some trivially mismatched pixels in the training of the optical flow network. Maximum normalization is proposed for the depth smoothness term to restrain depth degradation in textureless regions. In the occluded region, depth and camera motion provide more reliable motion estimates, so they can be used to guide unsupervised learning of optical flow. Our experiments on the KITTI dataset demonstrate that the model based on three regions, with full and explicit segmentation of the occluded, rigid, and non-rigid regions and corresponding unsupervised losses, significantly improves performance on all three tasks. The source code is available at: https://github.com/guangmingw/DOPlearning.
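Read literally, a less-than-mean mask keeps only pixels whose reconstruction error is below the mean error and drops the rest as likely mismatches. A minimal sketch over a flat list of per-pixel photometric errors; whether the mean is taken per image or per batch, and how an empty mask is handled, are assumptions, and the names are hypothetical.

```python
from typing import List, Sequence


def less_than_mean_mask(errors: Sequence[float]) -> List[bool]:
    """Keep pixels whose error is strictly below the mean error
    (assumption: the mean is taken over this one image)."""
    mean_err = sum(errors) / len(errors)
    return [e < mean_err for e in errors]


def masked_photometric_loss(errors: Sequence[float]) -> float:
    """Hypothetical loss: average error over the kept pixels only;
    returns 0.0 if the mask keeps nothing (a design choice)."""
    mask = less_than_mean_mask(errors)
    kept = [e for e, m in zip(errors, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0
```

A single badly mismatched pixel (for example an error of 2.0 among errors around 0.2) raises the mean and is itself excluded, so it no longer dominates the gradient.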
