Researcher profile

Hao Su

Hao Su contributes to research discovery and scholarly infrastructure.

Published work

38 published item(s)

preprint, 2026, arXiv

Coupled Distributional Random Expert Distillation for World Model Online Imitation Learning

Imitation Learning (IL) has achieved remarkable success across various domains, including robotics, autonomous driving, and healthcare, by enabling agents to learn complex behaviors from expert demonstrations. However, existing IL methods often face instability challenges, particularly when relying on adversarial reward or value formulations in world model frameworks. In this work, we propose a novel approach to online imitation learning that addresses these limitations through a reward model based on random network distillation (RND) for density estimation. Our reward model is built on the joint estimation of expert and behavioral distributions within the latent space of the world model. We evaluate our method across diverse benchmarks, including DMControl, Meta-World, and ManiSkill2, showcasing its ability to deliver stable performance and achieve expert-level results in both locomotion and manipulation tasks. Our approach demonstrates improved stability over adversarial methods while maintaining expert-level performance.

preprint, 2026, arXiv

Driver-WM: A Driver-Centric Traffic-Conditioned Latent World Model for In-Cabin Dynamics Rollout

Safe L2/L3 driving automation requires anticipating human-in-the-loop reactions during shared-control transitions. While most driving world models forecast the external environment, in-cabin intelligence remains strictly recognition-oriented and lacks multi-step rollout capabilities for driver dynamics. We introduce Driver-WM, a driver-centric latent world model that rolls out in-cabin dynamics causally conditioned on out-cabin traffic context. This formulation unifies physical kinematics forecasting with auxiliary behavioral and emotional semantic recognition. Operating in a compact latent space constructed from frozen vision-language features, Driver-WM adopts a dual-stream architecture to separately encode external traffic and internal driver states. These streams are directionally coupled via a gated causal injection mechanism, which uses a learned vector gate to modulate external contextual perturbations while strictly enforcing temporal causality. Evaluations on a multi-task assistive driving benchmark demonstrate that Driver-WM yields robust long-horizon geometric forecasting for reactive high-motion maneuvers and improves semantic alignment for both driver and traffic states. Finally, the explicit external-to-internal conditioning allows for controlled test-time interventions to systematically analyze mechanism responses.

preprint, 2025, arXiv

MorphoCopter: Design, Modeling, and Control of a New Transformable Quad-Bi Copter

This paper presents a novel morphing quadrotor, named MorphoCopter, covering its design, modeling, control, and experimental tests. It features a unique single rotary joint that enables rapid transformation into an ultra-narrow profile. Although quadrotors have seen widespread adoption in applications such as cinematography, agriculture, and disaster management with increasingly sophisticated control systems, their hardware configurations have remained largely unchanged, limiting their capabilities in certain environments. Our design addresses this by enabling the hardware configuration to change on the fly when required. In standard flight mode, the MorphoCopter adopts an X configuration, functioning as a traditional quadcopter, but can quickly fold into a stacked bicopter arrangement or any configuration in between. Existing morphing designs often sacrifice controllability in compact configurations or rely on complex multi-joint systems. Moreover, our design achieves a greater width reduction than any existing solution. We develop a new inertia and control-action aware adaptive control system that maintains robust performance across all rotary-joint configurations. The prototype can reduce its width from 447 mm to 138 mm (nearly 70% reduction) in just a few seconds. We validated the MorphoCopter through rigorous simulations and a comprehensive series of flight experiments, including robustness tests, trajectory tracking, and narrow-gap passing tests.

preprint, 2025, arXiv

Passage-traversing optimal path planning with sampling-based algorithms

This paper introduces a new paradigm of optimal path planning, i.e., passage-traversing optimal path planning (PTOPP), that optimizes paths' traversed passages for specified optimization objectives. In particular, PTOPP is utilized to find the path with optimal accessible free space along its entire length, which represents a basic requirement for paths in robotics. As passages are places where free space shrinks and becomes constrained, the core idea is to leverage the path's passage traversal status to characterize its accessible free space comprehensively. To this end, a novel passage detection and free space decomposition method using proximity graphs is proposed, enabling fast detection of sparse but informative passages and environment decompositions. Based on this preprocessing, optimal path planning with accessible free space objectives or constraints is formulated as PTOPP problems compatible with sampling-based optimal planners. Then, sampling-based algorithms for PTOPP, including their dependent primitive procedures, are developed leveraging partitioned environments for fast passage traversal check. All these methods are implemented and thoroughly tested for effectiveness and efficiency validation. Compared to existing approaches, such as clearance-based methods, PTOPP demonstrates significant advantages in configurability, solution optimality, and efficiency, addressing prior limitations and incapabilities. It is believed to provide an efficient and versatile solution to accessible free space optimization over conventional avenues and more generally, to a broad class of path planning problems that can be formulated as PTOPP.

preprint, 2024, arXiv

Constrained Online Two-stage Stochastic Optimization: Algorithm with (and without) Predictions

We consider an online two-stage stochastic optimization with long-term constraints over a finite horizon of $T$ periods. At each period, we take the first-stage action, observe a model parameter realization and then take the second-stage action from a feasible set that depends both on the first-stage decision and the model parameter. We aim to minimize the cumulative objective value while guaranteeing that the long-term average second-stage decision belongs to a set. We develop online algorithms for the online two-stage problem from adversarial learning algorithms. Also, the regret bound of our algorithm can be reduced to the regret bound of embedded adversarial learning algorithms. Based on this framework, we obtain new results under various settings. When the model parameters are drawn from unknown non-stationary distributions and we are given machine-learned predictions of the distributions, we develop a new algorithm from our framework with a regret $O(W_T+\sqrt{T})$, where $W_T$ measures the total inaccuracy of the machine-learned predictions. We then develop another algorithm that works when no machine-learned predictions are given and show the performances.

preprint, 2023, arXiv

Close the Optical Sensing Domain Gap by Physics-Grounded Active Stereo Sensor Simulation

In this paper, we focus on the simulation of active stereovision depth sensors, which are popular in both academic and industry communities. Inspired by the underlying mechanism of the sensors, we designed a fully physics-grounded simulation pipeline that includes material acquisition, ray-tracing-based infrared (IR) image rendering, IR noise simulation, and depth estimation. The pipeline is able to generate depth maps with material-dependent error patterns similar to a real depth sensor in real time. We conduct real experiments to show that perception algorithms and reinforcement learning policies trained in our simulation platform could transfer well to the real-world test cases without any fine-tuning. Furthermore, due to the high degree of realism of this simulation, our depth sensor simulator can be used as a convenient testbed to evaluate the algorithm performance in the real world, which will largely reduce the human effort in developing robotic algorithms. The entire pipeline has been integrated into the SAPIEN simulator and is open-sourced to promote the research of vision and robotics communities.

preprint, 2023, arXiv

From One Hand to Multiple Hands: Imitation Learning for Dexterous Manipulation from Single-Camera Teleoperation

We propose to perform imitation learning for dexterous manipulation with a multi-finger robot hand from human demonstrations, and transfer the policy to the real robot hand. We introduce a novel single-camera teleoperation system to collect the 3D demonstrations efficiently with only an iPad and a computer. One key contribution of our system is that we construct a customized robot hand for each user in the physical simulator, which is a manipulator resembling the kinematic structure and shape of the operator's hand. This provides an intuitive interface and avoids unstable human-robot hand retargeting for data collection, leading to large-scale and high-quality data. Once the data is collected, the customized robot hand trajectories can be converted to different specified robot hands (models that are manufactured) to generate training demonstrations. With imitation learning using our data, we show large improvement over baselines with multiple complex manipulation tasks. Importantly, we show our learned policy is significantly more robust when transferring to the real robot. More videos can be found at https://yzqin.github.io/dex-teleop-imitation .

preprint, 2022, arXiv

Approximate Convex Decomposition for 3D Meshes with Collision-Aware Concavity and Tree Search

Approximate convex decomposition aims to decompose a 3D shape into a set of almost convex components, whose convex hulls can then be used to represent the input shape. It thus enables efficient geometry processing algorithms specifically designed for convex shapes and has been widely used in game engines, physics simulations, and animation. While prior works can capture the global structure of input shapes, they may fail to preserve fine-grained details (e.g., filling a toaster's slots), which are critical for retaining the functionality of objects in interactive environments. In this paper, we propose a novel method that addresses the limitations of existing approaches from three perspectives: (a) We introduce a novel collision-aware concavity metric that examines the distance between a shape and its convex hull from both the boundary and the interior. The proposed concavity preserves collision conditions and is more robust to detect various approximation errors. (b) We decompose shapes by directly cutting meshes with 3D planes. It ensures generated convex hulls are intersection-free and avoids voxelization errors. (c) Instead of using a one-step greedy strategy, we propose employing a multi-step tree search to determine the cutting planes, which leads to a globally better solution and avoids unnecessary cuttings. Through extensive evaluation on a large-scale articulated object dataset, we show that our method generates decompositions closer to the original shape with fewer components. It thus supports delicate and efficient object interaction in downstream applications. We will release our implementation to facilitate future research.

preprint, 2022, arXiv

Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics

Differentiable physics has recently been shown as a powerful tool for solving soft-body manipulation tasks. However, the differentiable physics solver often gets stuck when the initial contact points of the end effectors are sub-optimal or when performing multi-stage tasks that require contact point switching, which often leads to local minima. To address this challenge, we propose a contact point discovery approach (CPDeform) that guides the stand-alone differentiable physics solver to deform various soft-body plasticines. The key idea of our approach is to integrate optimal transport-based contact points discovery into the differentiable physics solver to overcome the local minima from initial contact points or contact switching. On single-stage tasks, our method can automatically find suitable initial contact points based on transport priorities. On complex multi-stage tasks, we can iteratively switch the contact points of end-effectors based on transport priorities. To evaluate the effectiveness of our method, we introduce PlasticineLab-M that extends the existing differentiable physics benchmark PlasticineLab to seven new challenging multi-stage soft-body manipulation tasks. Extensive experimental results suggest that: 1) on multi-stage tasks that are infeasible for the vanilla differentiable physics solver, our approach discovers contact points that efficiently guide the solver to completion; 2) on tasks where the vanilla solver performs sub-optimally or near-optimally, our contact point discovery method performs better than or on par with the manipulation performance obtained with handcrafted contact points.

preprint, 2022, arXiv

Improving Policy Optimization with Generalist-Specialist Learning

Generalization in deep reinforcement learning over unseen environment variations usually requires policy learning over a large set of diverse training variations. We empirically observe that an agent trained on many variations (a generalist) tends to learn faster at the beginning, yet its performance plateaus at a less optimal level for a long time. In contrast, an agent trained only on a few variations (a specialist) can often achieve high returns under a limited computational budget. To have the best of both worlds, we propose a novel generalist-specialist training framework. Specifically, we first train a generalist on all environment variations; when it fails to improve, we launch a large population of specialists with weights cloned from the generalist, each trained to master a selected small subset of variations. We finally resume the training of the generalist with auxiliary rewards induced by demonstrations of all specialists. In particular, we investigate the timing to start specialist training and compare strategies to learn generalists with assistance from specialists. We show that this framework pushes the envelope of policy learning on several challenging and popular benchmarks including Procgen, Meta-World and ManiSkill.

preprint, 2022, arXiv

Multi-skill Mobile Manipulation for Object Rearrangement

We study a modular approach to tackle long-horizon mobile manipulation tasks for object rearrangement, which decomposes a full task into a sequence of subtasks. To tackle the entire task, prior work chains multiple stationary manipulation skills with a point-goal navigation skill, which are learned individually on subtasks. Although more effective than monolithic end-to-end RL policies, this framework suffers from compounding errors in skill chaining, e.g., navigating to a bad location where a stationary manipulation skill cannot reach its target to manipulate. To this end, we propose that the manipulation skills should include mobility to have flexibility in interacting with the target object from multiple locations and at the same time the navigation skill could have multiple end points which lead to successful manipulation. We operationalize these ideas by implementing mobile manipulation skills rather than stationary ones and training a navigation skill with region goals instead of point goals. We evaluate our multi-skill mobile manipulation method M3 on 3 challenging long-horizon mobile manipulation tasks in the Home Assistant Benchmark (HAB), and show superior performance as compared to the baselines.

preprint, 2022, arXiv

NeRFusion: Fusing Radiance Fields for Large-Scale Scene Reconstruction

While NeRF has shown great success for neural reconstruction and rendering, its limited MLP capacity and long per-scene optimization times make it challenging to model large-scale indoor scenes. In contrast, classical 3D reconstruction methods can handle large-scale scenes but do not produce realistic renderings. We propose NeRFusion, a method that combines the advantages of NeRF and TSDF-based fusion techniques to achieve efficient large-scale reconstruction and photo-realistic rendering. We process the input image sequence to predict per-frame local radiance fields via direct network inference. These are then fused using a novel recurrent neural network that incrementally reconstructs a global, sparse scene representation in real-time at 22 fps. This global volume can be further fine-tuned to boost rendering quality. We demonstrate that NeRFusion achieves state-of-the-art quality on both large-scale indoor and small-scale object scenes, with substantially faster reconstruction than NeRF and other recent methods.

preprint, 2022, arXiv

Particle Cloud Generation with Message Passing Generative Adversarial Networks

In high energy physics (HEP), jets are collections of correlated particles produced ubiquitously in particle collisions such as those at the CERN Large Hadron Collider (LHC). Machine learning (ML)-based generative models, such as generative adversarial networks (GANs), have the potential to significantly accelerate LHC jet simulations. However, despite jets having a natural representation as a set of particles in momentum-space, a.k.a. a particle cloud, there exist no generative models applied to such a dataset. In this work, we introduce a new particle cloud dataset (JetNet), and apply to it existing point cloud GANs. Results are evaluated using (1) 1-Wasserstein distances between high- and low-level feature distributions, (2) a newly developed Fréchet ParticleNet Distance, and (3) the coverage and (4) minimum matching distance metrics. Existing GANs are found to be inadequate for physics applications, hence we develop a new message passing GAN (MPGAN), which outperforms existing point cloud GANs on virtually every metric and shows promise for use in HEP. We propose JetNet as a novel point-cloud-style dataset for the ML community to experiment with, and set MPGAN as a benchmark to improve upon for future generative models. Additionally, to facilitate research and improve accessibility and reproducibility in this area, we release the open-source JetNet Python package with interfaces for particle cloud datasets, implementations for evaluation and loss metrics, and more tools for ML in HEP development.
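As a rough illustration of the first evaluation metric named above, the 1-Wasserstein distance between two one-dimensional feature distributions can be estimated from equal-sized samples by sorting both and averaging the absolute differences. This is a minimal sketch and not the JetNet package's implementation; the function name `wasserstein_1d` and the sample values are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-sized 1D samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # For equal-sized 1D samples, the optimal coupling pairs the sorted values.
    paired = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in paired) / len(xs)


# Hypothetical feature values (for example, a per-jet quantity) for illustration.
real_feature = [0.10, 0.25, 0.40, 0.55]
generated_feature = [0.15, 0.20, 0.50, 0.60]
distance = wasserstein_1d(real_feature, generated_feature)
```

A smaller value means the generated feature distribution sits closer to the real one; identical samples give a distance of zero regardless of their order.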

preprint, 2022, arXiv

Provably Efficient Kernelized Q-Learning

We propose and analyze a kernelized version of Q-learning. Although a kernel space is typically infinite-dimensional, extensive study has shown that generalization is only affected by the effective dimension of the data. We incorporate such ideas into the Q-learning framework and derive regret bounds for arbitrary kernels. In particular, we provide concrete bounds for linear kernels and Gaussian RBF kernels; notably, the latter bound looks almost identical to the former, only that the actual dimension is replaced by a different notion of dimensionality. Finally, we test our algorithm on a suite of classic control tasks; remarkably, under the Gaussian RBF kernel, it achieves reasonably good performance after only 1000 environmental steps, while its neural network counterpart, deep Q-learning, still struggles.
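For readers unfamiliar with the second kernel mentioned above, the Gaussian RBF kernel in its common textbook form can be sketched as follows. This shows only the kernel function, not the paper's Q-learning algorithm or its regret analysis; the name `gaussian_rbf` and the `bandwidth` parameter are hypothetical choices for this example.

```python
import math


def gaussian_rbf(x, y, bandwidth=1.0):
    """Gaussian RBF kernel: exp(-||x - y||^2 / (2 * bandwidth^2))."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same dimension")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


# Similarity of a state to itself and to a state one unit away.
same_state = gaussian_rbf([0.0, 0.0], [0.0, 0.0])
nearby_state = gaussian_rbf([0.0, 0.0], [1.0, 0.0])
```

The kernel equals 1 for identical inputs and decays smoothly with distance, so a larger `bandwidth` makes distant states look more similar.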

preprint, 2022, arXiv

Temporal Difference Learning for Model Predictive Control

Data-driven model predictive control has two key advantages over model-free methods: a potential for improved sample efficiency through model learning, and better performance as computational budget for planning increases. However, it is both costly to plan over long horizons and challenging to obtain an accurate model of the environment. In this work, we combine the strengths of model-free and model-based methods. We use a learned task-oriented latent dynamics model for local trajectory optimization over a short horizon, and use a learned terminal value function to estimate long-term return, both of which are learned jointly by temporal difference learning. Our method, TD-MPC, achieves superior sample efficiency and asymptotic performance over prior work on both state and image-based continuous control tasks from DMControl and Meta-World. Code and video results are available at https://nicklashansen.github.io/td-mpc.

preprint, 2021, arXiv

Design and Actuator Optimization of Lightweight and Compliant Knee Exoskeleton for Mobility Assistance of Children with Crouch Gait

Pediatric exoskeletons offer great promise to increase mobility for children with crouch gait caused by cerebral palsy. A lightweight, compliant and user-specific actuator is critical for maximizing the benefits of an exoskeleton to users. To date, pediatric exoskeletons generally use the same actuators as adult exoskeletons, which are heavy and resistive to natural movement. There is as yet no easy way for robotic exoskeletons to accommodate the changes in design requirements that occur as a child ages. We developed a lightweight (1.65 kg unilateral mass) and compliant pediatric knee exoskeleton with a bandwidth of 22.6 Hz that can provide torque assistance to children with crouch gait using a high-torque-density motor. Experimental results demonstrated that the robot exhibited low mechanical impedance (1.79 Nm average backdrive torque) under the unpowered condition and 0.32 Nm with zero-torque tracking control. The root mean square (RMS) error of torque tracking is less than 0.73 Nm (5.7% with respect to 12 Nm torque). To achieve optimal age-specific performance, we proposed the first optimization framework that considers both the motor and the transmission of the actuator system and can produce optimal settings for children between 3 and 18 years old. The optimization generated an optimal motor air gap radius that monotonically increases with age from 0.011 to 0.033 meters, and an optimal gear ratio that varies from 2.6 to 11.6 (3-13 years old) and from 11.6 to 10.2 (13-18 years old), leading to actuators of minimal mass.

preprint, 2021, arXiv

FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization

Generating images from natural language instructions is an intriguing yet highly challenging task. We approach text-to-image generation by combining the power of the retrained CLIP representation with an off-the-shelf image generator (GANs), optimizing in the latent space of GAN to find images that achieve maximum CLIP score with the given input text. Compared to traditional methods that train generative models from text to image starting from scratch, the CLIP+GAN approach is training-free, zero-shot and can be easily customized with different generators. However, optimizing CLIP score in the GAN space poses a highly challenging optimization problem, and off-the-shelf optimizers such as Adam fail to yield satisfying results. In this work, we propose a FuseDream pipeline, which improves the CLIP+GAN approach with three key techniques: 1) an AugCLIP score which robustifies the CLIP objective by introducing random augmentation on the image; 2) a novel initialization and over-parameterization strategy for optimization which allows us to efficiently navigate the non-convex landscape in GAN space; 3) a composed generation technique which, by leveraging a novel bi-level optimization formulation, can compose multiple images to extend the GAN space and overcome the data bias. When prompted by different input text, FuseDream can generate high-quality images with varying objects, backgrounds, artistic styles, even novel counterfactual concepts that do not appear in the training data of the GAN we use. Quantitatively, the images generated by FuseDream yield top-level Inception score and FID score on the MS COCO dataset, without additional architecture design or training. Our code is publicly available at https://github.com/gnobitab/FuseDream.

preprint, 2021, arXiv

NeuTex: Neural Texture Mapping for Volumetric Neural Rendering

Recent work has demonstrated that volumetric scene representations combined with differentiable volume rendering can enable photo-realistic rendering for challenging scenes that mesh reconstruction fails on. However, these methods entangle geometry and appearance in a "black-box" volume that cannot be edited. Instead, we present an approach that explicitly disentangles geometry (represented as a continuous 3D volume) from appearance (represented as a continuous 2D texture map). We achieve this by introducing a 3D-to-2D texture mapping (or surface parameterization) network into volumetric representations. We constrain this texture mapping network using an additional 2D-to-3D inverse mapping network and a novel cycle consistency loss to make 3D surface points map to 2D texture points that map back to the original 3D points. We demonstrate that this representation can be reconstructed using only multi-view image supervision and generates high-quality rendering results. More importantly, by separating geometry and texture, we allow users to edit appearance by simply editing 2D texture maps.

preprint, 2020, arXiv

Deep Keypoint-Based Camera Pose Estimation with Geometric Constraints

Estimating relative camera poses from consecutive frames is a fundamental problem in visual odometry (VO) and simultaneous localization and mapping (SLAM), where classic methods consisting of hand-crafted features and sampling-based outlier rejection have been a dominant choice for over a decade. Although multiple works propose to replace these modules with learning-based counterparts, most have not yet been as accurate, robust and generalizable as conventional methods. In this paper, we design an end-to-end trainable framework consisting of learnable modules for detection, feature extraction, matching and outlier rejection, while directly optimizing for the geometric pose objective. We show both quantitatively and qualitatively that pose estimation performance on par with the classic pipeline may be achieved. Moreover, we show that, with end-to-end training, the key components of the pipeline can be significantly improved, which leads to better generalizability to unseen datasets compared to existing learning-based methods.

preprint, 2020, arXiv

Deep Photon Mapping

Recently, deep learning-based denoising approaches have led to dramatic improvements in low sample-count Monte Carlo rendering. These approaches are aimed at path tracing, which is not ideal for simulating challenging light transport effects like caustics, where photon mapping is the method of choice. However, photon mapping requires very large numbers of traced photons to achieve high-quality reconstructions. In this paper, we develop the first deep learning-based method for particle-based rendering, and specifically focus on photon density estimation, the core of all particle-based methods. We train a novel deep neural network to predict a kernel function to aggregate photon contributions at shading points. Our network encodes individual photons into per-photon features, aggregates them in the neighborhood of a shading point to construct a photon local context vector, and infers a kernel function from the per-photon and photon local context features. This network is easy to incorporate in many previous photon mapping methods (by simply swapping the kernel density estimator) and can produce high-quality reconstructions of complex global illumination effects like caustics with an order of magnitude fewer photons compared to previous photon mapping methods.

preprint, 2020, arXiv

Deep Stereo using Adaptive Thin Volume Representation with Uncertainty Awareness

We present Uncertainty-aware Cascaded Stereo Network (UCS-Net) for 3D reconstruction from multiple RGB images. Multi-view stereo (MVS) aims to reconstruct fine-grained scene geometry from multi-view images. Previous learning-based MVS methods estimate per-view depth using plane sweep volumes with a fixed depth hypothesis at each plane; this generally requires densely sampled planes for desired accuracy, and it is very hard to achieve high-resolution depth. In contrast, we propose adaptive thin volumes (ATVs); in an ATV, the depth hypothesis of each plane is spatially varying, which adapts to the uncertainties of previous per-pixel depth predictions. Our UCS-Net has three stages: the first stage processes a small standard plane sweep volume to predict low-resolution depth; two ATVs are then used in the following stages to refine the depth with higher resolution and higher accuracy. Our ATV consists of only a small number of planes; yet, it efficiently partitions local depth ranges within learned small intervals. In particular, we propose to use variance-based uncertainty estimates to adaptively construct ATVs; this differentiable process introduces reasonable and fine-grained spatial partitioning. Our multi-stage framework progressively subdivides the vast scene space with increasing depth resolution and precision, which enables scene reconstruction with high completeness and accuracy in a coarse-to-fine fashion. We demonstrate that our method achieves superior performance compared with state-of-the-art benchmarks on various challenging datasets.

preprint, 2020, arXiv

Electronic structure of a Si-containing topological Dirac semimetal CaAl2Si2

There has been an upsurge in the discovery of topological quantum materials, where various topological insulators and semimetals have been theoretically predicted and experimentally observed. However, only very few of them contain silicon, the most widely used element in the electronics industry. Recently, the ternary compound CaAl2Si2 has been predicted to be a topological Dirac semimetal, hosting Lorentz-symmetry-violating quasiparticles with a strongly tilted conical band dispersion. In this work, by using high-resolution angle-resolved photoemission spectroscopy (ARPES), we investigated the comprehensive electronic structure of CaAl2Si2. A pair of topological Dirac crossings is observed along the kz direction, in good agreement with the ab initio calculations, confirming the topological Dirac semimetal nature of the compound. Our study expands the topological material family to Si-containing compounds, which have great application potential in realizing low-cost, nontoxic electronic devices with topological quantum states.

preprint, 2020, arXiv

Information-Theoretic Local Minima Characterization and Regularization

Recent advances in deep learning theory have evoked the study of generalizability across different local minima of deep neural networks (DNNs). While current work has focused on either discovering properties of good local minima or developing regularization techniques to induce good local minima, no approach exists that can tackle both problems. We achieve these two goals successfully in a unified manner. Specifically, based on the observed Fisher information, we propose a metric that is both strongly indicative of the generalizability of local minima and effective as a practical regularizer. We provide theoretical analysis including a generalization bound and empirically demonstrate the success of our approach in both capturing and improving the generalizability of DNNs. Experiments are performed on CIFAR-10, CIFAR-100 and ImageNet for various network architectures.

preprint2020arXiv

Leveraging Elastic instabilities for Amplified Performance: spine-inspired high-speed and high-force soft robots

Soft machines typically exhibit slow locomotion speed and low manipulation strength because of intrinsic limitations of soft materials. Here, we present a generic design principle that harnesses mechanical instability for a variety of spine-inspired fast and strong soft machines. Unlike most current soft robots, which are designed as inherently and unimodally stable, our design leverages tunable snap-through bistability to fully exploit the ability of soft robots to rapidly store and release energy within tens of milliseconds. We demonstrate this generic design principle with three high-performance soft machines: high-speed cheetah-like galloping crawlers with locomotion speeds of 2.68 body length/s, high-speed underwater swimmers (0.78 body length/s), and tunable low-to-high-force soft grippers with over 1 to $10^3$ stiffness modulation (maximum load capacity is 11.4 kg). Our study establishes a new generic design paradigm of next-generation high-performance soft robots that are applicable for multifunctionality, different actuation methods, and materials at multiscales.

preprint2020arXiv

Magnetic critical behavior of the van der Waals Fe5GeTe2 crystal with near room temperature ferromagnetism

The van der Waals ferromagnet Fe5GeTe2 has a Curie temperature TC of about 270 K, which can be raised above room temperature by tuning the Fe deficiency content. To gain insight into its ferromagnetic exchange, we have studied the critical behavior by measuring the magnetization of a bulk Fe5GeTe2 crystal around the ferromagnetic to paramagnetic phase transition. Analysis of the magnetization using various techniques, including the modified Arrott plot, the Kouvel-Fisher plot and critical isotherm analysis, yielded a set of reliable critical exponents with TC = 273.7 K, beta = 0.3457, gamma = 1.40617, and delta = 5.021, suggesting a three-dimensional magnetic exchange that decays with distance as $J(r) \sim r^{-4.916}$, which is close to that of a three-dimensional Heisenberg model with long-range magnetic coupling.
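The exponents quoted in this abstract can be cross-checked with a short calculation. The sketch below is not taken from the paper: it assumes the standard Widom scaling relation delta = 1 + gamma/beta and the usual long-range exchange form J(r) ~ r^-(d + sigma) with d = 3. The function names are hypothetical and chosen only for illustration.

```python
def widom_delta(beta: float, gamma: float) -> float:
    """Widom scaling relation (assumed here): delta = 1 + gamma / beta."""
    return 1.0 + gamma / beta


def range_parameter(decay_exponent: float, dimension: int = 3) -> float:
    """Solve J(r) ~ r^-(d + sigma) for sigma, given the total decay exponent."""
    return decay_exponent - dimension


# Values quoted in the abstract.
beta, gamma, delta_reported = 0.3457, 1.40617, 5.021

delta_widom = widom_delta(beta, gamma)
relative_gap = abs(delta_widom - delta_reported) / delta_reported
sigma = range_parameter(4.916)

print(f"delta from Widom relation: {delta_widom:.4f}")
print(f"relative gap to reported delta: {relative_gap:.2%}")
print(f"range parameter sigma: {sigma:.3f}")
```

Under these assumptions the delta obtained from beta and gamma lies within about one percent of the reported value, which is the kind of self-consistency such analyses usually look for.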

preprint2020arXiv

Magnetism-induced topological transition in EuAs3

The nature of the interaction between magnetism and topology in magnetic topological semimetals remains mysterious, but may be expected to lead to a variety of novel physics. We present $ab$ $initio$ band calculations, electrical transport and angle-resolved photoemission spectroscopy (ARPES) measurements on the magnetic semimetal EuAs$_3$, demonstrating a magnetism-induced topological transition from a topological nodal-line semimetal in the paramagnetic or the spin-polarized state to a topological massive Dirac metal in the antiferromagnetic (AFM) ground state at low temperature, featuring a pair of massive Dirac points, inverted bands and topological surface states on the (010) surface. Shubnikov-de Haas (SdH) oscillations in the AFM state identify nonzero Berry phase and a negative longitudinal magnetoresistance ($n$-LMR) induced by the chiral anomaly, confirming the topological nature predicted by band calculations. When magnetic moments are fully polarized by an external magnetic field, an unsaturated and extremely large magnetoresistance (XMR) of $\sim$ 2$\times10^5$ % at 1.8 K and 28.3 T is observed, likely arising from topological protection. Consistent with band calculations for the spin-polarized state, four new bands in quantum oscillations different from those in the AFM state are discerned, of which two are topologically protected. Nodal-line structures at the $Y$ point in the Brillouin zone (BZ) are proposed in both the spin-polarized and paramagnetic states, and the latter is proven by ARPES. Moreover, a temperature-induced Lifshitz transition accompanied by the emergence of a new band below 3 K is revealed. These results indicate that magnetic EuAs$_3$ provides a rich platform to explore exotic physics arising from the interaction of magnetism with topology.

preprint2020arXiv

Model Imitation for Model-Based Reinforcement Learning

Model-based reinforcement learning (MBRL) aims to learn a dynamic model to reduce the number of interactions with real-world environments. However, due to estimation error, rollouts in the learned model, especially those of long horizons, fail to match the ones in real-world environments. This mismatch seriously impacts the sample complexity of MBRL. The phenomenon can be attributed to the fact that previous works employ supervised learning to learn one-step transition models, which has inherent difficulty ensuring that the distributions of multi-step rollouts match. Based on this claim, we propose to learn the transition model by matching the distributions of multi-step rollouts sampled from the transition model and the real ones via WGAN. We theoretically show that matching the two can minimize the difference in cumulative rewards between the real transition and the learned one. Our experiments also show that the proposed Model Imitation method can compete with or outperform the state-of-the-art in terms of sample complexity and average return.

preprint2020arXiv

Normal Assisted Stereo Depth Estimation

Accurate stereo depth estimation plays a critical role in various 3D tasks in both indoor and outdoor environments. Recently, learning-based multi-view stereo methods have demonstrated competitive performance with a limited number of views. However, in challenging scenarios, especially when building cross-view correspondences is hard, these methods still cannot produce satisfying results. In this paper, we study how to leverage a normal estimation model and the predicted normal maps to improve the depth quality. We couple the learning of a multi-view normal estimation module and a multi-view depth estimation module. In addition, we propose a novel consistency loss to train an independent consistency module that refines the depths from depth/normal pairs. We find that the joint learning can improve both the prediction of normal and depth, and the accuracy & smoothness can be further improved by enforcing the consistency. Experiments on MVS, SUN3D, RGBD, and Scenes11 demonstrate the effectiveness of our method and state-of-the-art performance.

preprint2020arXiv

Quasi-Direct Drive Actuation for a Lightweight Hip Exoskeleton with High Backdrivability and High Bandwidth

High-performance actuators are crucial to enable mechanical versatility of lower-limb wearable robots, which are required to be lightweight, highly backdrivable, and with high bandwidth. State-of-the-art actuators, e.g., series elastic actuators (SEAs), have to compromise bandwidth to improve compliance (i.e., backdrivability). In this paper, we describe the design and human-robot interaction modeling of a portable hip exoskeleton based on our custom quasi-direct drive (QDD) actuation (i.e., a high torque density motor with a low-ratio gear). We also present a model-based performance benchmark comparison of representative actuators in terms of torque capability, control bandwidth, backdrivability, and force tracking accuracy. This paper aims to corroborate the underlying philosophy of "design for control", namely that meticulous robot design can simplify control algorithms while ensuring high performance. Following this idea, we create a lightweight bilateral hip exoskeleton (overall mass is 3.4 kg) to reduce joint loadings during normal activities, including walking and squatting. Experimental results indicate that the exoskeleton is able to produce high nominal torque (17.5 Nm), high backdrivability (0.4 Nm backdrive torque), high bandwidth (62.4 Hz), and high control accuracy (1.09 Nm root mean square tracking error, i.e., 5.4% of the desired peak torque). Its controller is versatile enough to assist walking at different speeds (0.8-1.4 m/s) and squatting at a 2 s cadence. This work demonstrates significant improvement in backdrivability and control bandwidth compared with state-of-the-art exoskeletons powered by conventional actuation or SEAs.
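The tracking-accuracy figures in this abstract imply a desired peak torque that is not stated outright. The following sketch simply inverts the quoted percentage; the function name is hypothetical, and the rounding window of 5.35% to 5.45% is an assumption about how the 5.4% figure was rounded.

```python
def implied_peak_torque(rms_error_nm: float, error_fraction: float) -> float:
    """Peak torque (Nm) for which rms_error_nm equals error_fraction of the peak."""
    return rms_error_nm / error_fraction


rms_error = 1.09  # Nm, RMS tracking error quoted in the abstract

# Point estimate from the quoted 5.4%.
nominal = implied_peak_torque(rms_error, 0.054)

# Bounds assuming the percentage was rounded to one decimal place.
low = implied_peak_torque(rms_error, 0.0545)
high = implied_peak_torque(rms_error, 0.0535)

print(f"implied desired peak torque: {nominal:.2f} Nm "
      f"(between {low:.2f} and {high:.2f} Nm)")
```

The result sits slightly above the 17.5 Nm nominal torque quoted in the same abstract, so the two numbers describe different quantities rather than contradicting each other.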

preprint2020arXiv

Resonant X-ray scattering study of diffuse magnetic scattering from the topological semimetals EuCd$_2$As$_2$ and EuCd$_2$Sb$_2$

We have investigated the magnetic correlations in the candidate Weyl semimetals EuCd$_2Pn_2$ ($Pn$ = As, Sb) by resonant elastic X-ray scattering (REXS) at the Eu$^{2+}$ $M_5$ edge. The temperature and field dependence of the diffuse scattering of EuCd$_2$As$_2$ provide direct evidence that the Eu moments exhibit slow ferromagnetic correlations well above the Néel temperature. By contrast, the diffuse scattering in the paramagnetic phase of isostructural EuCd$_2$Sb$_2$ is at least an order of magnitude weaker. The FM correlations present in the paramagnetic phase of EuCd$_2$As$_2$ could create short-lived Weyl nodes.

preprint2020arXiv

Rethinking Sampling in 3D Point Cloud Generative Adversarial Networks

In this paper, we examine the long-neglected yet important effects of point sampling patterns in point cloud GANs. Through extensive experiments, we show that sampling-insensitive discriminators (e.g., PointNet-Max) produce shape point clouds with point clustering artifacts, while sampling-oversensitive discriminators (e.g., PointNet++, DGCNN) fail to guide valid shape generation. We propose the concept of a sampling spectrum to depict the different sampling sensitivities of discriminators. We further study how different evaluation metrics weigh the sampling pattern against the geometry and propose several perceptual metrics forming a sampling spectrum of metrics. Guided by the proposed sampling spectrum, we discover a middle-point sampling-aware baseline discriminator, PointNet-Mix, which improves all existing point cloud generators by a large margin on sampling-related metrics. We point out that, although recent research has focused on generator design, the main bottleneck of point cloud GANs actually lies in discriminator design. Our work provides both suggestions and tools for building future discriminators. We will release the code to facilitate future research.

preprint2020arXiv

SAPIEN: A SimulAted Part-based Interactive ENvironment

Building home assistant robots has long been a pursuit for vision and robotics researchers. To achieve this task, a simulated environment with physically realistic simulation, sufficient articulated objects, and transferability to the real robot is indispensable. Existing environments achieve these requirements for robotics simulation with different levels of simplification and focus. We take one step further in constructing an environment that supports household tasks for training robot learning algorithms. Our work, SAPIEN, is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects. SAPIEN enables various robotic vision and interaction tasks that require detailed part-level understanding. We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition, and demonstrate robotic interaction tasks using heuristic approaches and reinforcement learning algorithms. We hope that SAPIEN can open up many research directions yet to be explored, including learning cognition through interaction, part motion discovery, and construction of robotics-ready simulated game environments.

preprint2020arXiv

The de Hass-van Alphen quantum oscillations in a three-dimensional Dirac semimetal TiSb2

We have used the de Haas-van Alphen (dHvA) effect to investigate the Fermi surface of high-quality crystalline TiSb2; analysis of the dHvA quantum oscillations unveiled a nontrivial topological nature. Moreover, our analysis of the quantum oscillation frequencies associated with a nonzero Berry phase, with the magnetic field parallel to the ab-plane and to the c-axis of TiSb2, finds that the Fermi surface topology has a three-dimensional (3D) character. The results are supported by first-principles calculations, which revealed a symmetry-protected Dirac point along the Γ-Z high-symmetry line near the Fermi level. On the (001) surface, the bulk Dirac points are found to project onto the $\bar{\Gamma}$ point with nontrivial surface states. Our finding will substantially enrich the family of 3D Dirac semimetals, which are useful for topological applications.

preprint2020arXiv

The de Hass-van Alphen quantum oscillations in BaSn3 superconductor with multiple Dirac fermions

By measuring the de Haas-van Alphen (dHvA) effect and calculating the electronic band structure, we have investigated the bulk Fermi surface of the BaSn3 superconductor, which has a transition temperature of ~ 4.4 K. Striking dHvA quantum oscillations are observed when the magnetic field B is perpendicular to both the (100) and (001) planes. Our analysis unveiled a nontrivial Berry phase in the quantum oscillations when B is perpendicular to (100), with two fundamental frequencies at 31.5 T and 306.7 T, which likely arise from two corresponding hole pockets of the bands forming a type-II Dirac point. The results are supported by ab initio calculations indicating a type-II Dirac point located and tilted along the high-symmetry K-H line of the Brillouin zone, about 0.13 eV above the Fermi level. Moreover, the calculations also revealed two other type-I Dirac points along the high-symmetry Γ-A direction, though somewhat far below the Fermi level. The results demonstrate BaSn3 as an excellent platform for the study of not only the exotic properties of different types of Dirac fermions in a single material, but also the interplay between nontrivial topological states and superconductivity.
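Quantum oscillation frequencies such as the 31.5 T and 306.7 T quoted in this abstract are conventionally converted to extremal Fermi-surface cross sections through the Onsager relation F = (ħ / 2πe) A. The sketch below applies that textbook relation; it is not the authors' analysis, the function names are hypothetical, and the Fermi wavevector estimate additionally assumes a circular cross section.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J s
E_CHARGE = 1.602176634e-19  # elementary charge, C


def extremal_area(frequency_tesla: float) -> float:
    """Onsager relation solved for the extremal cross section A, in m^-2."""
    return 2.0 * math.pi * E_CHARGE * frequency_tesla / HBAR


def to_inverse_angstrom_sq(area_m2: float) -> float:
    """Convert an area in reciprocal space from m^-2 to angstrom^-2."""
    return area_m2 * 1e-20


def fermi_wavevector(area: float) -> float:
    """k_F assuming a circular cross section (an assumption): A = pi * k_F^2."""
    return math.sqrt(area / math.pi)


for frequency in (31.5, 306.7):  # tesla, from the abstract
    area = to_inverse_angstrom_sq(extremal_area(frequency))
    k_f = fermi_wavevector(area)
    print(f"F = {frequency:6.1f} T -> A = {area:.3e} A^-2, k_F = {k_f:.4f} A^-1")
```

Because the relation is linear, the two pockets differ in cross-sectional area by the same factor as their frequencies, roughly ten.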

preprint2020arXiv

Weakly-supervised 3D Shape Completion in the Wild

3D shape completion for real data is important but challenging, since partial point clouds acquired by real-world sensors are usually sparse, noisy and unaligned. Different from previous methods, we address the problem of learning 3D complete shape from unaligned and real-world partial point clouds. To this end, we propose a weakly-supervised method to estimate both 3D canonical shape and 6-DoF pose for alignment, given multiple partial observations associated with the same instance. The network jointly optimizes canonical shapes and poses with multi-view geometry constraints during training, and can infer the complete shape given a single partial point cloud. Moreover, learned pose estimation can facilitate partial point cloud registration. Experiments on both synthetic and real data show that it is feasible and promising to learn 3D shape completion through large-scale data without shape and pose supervision.

preprint2019arXiv

Bulk Fermi surface of the layered superconductor TaSe3 with three-dimensional strong topological insulator state

High magnetic field transport measurements and ab initio calculations on the layered superconductor TaSe3 have provided compelling evidence for the existence of a three-dimensional strong topological insulator state. Longitudinal magnetotransport measurements up to ~ 33 T unveiled striking Shubnikov-de Haas oscillations with two fundamental frequencies at 100 T and 175 T, corresponding to a nontrivial electron Fermi pocket at the B point and a nontrivial hole Fermi pocket at the Γ point, respectively, in the Brillouin zone. However, calculations revealed one more electron pocket at the B point, which was not detected by the magnetotransport measurements, presumably due to the limited carrier momentum relaxation time. Angle-dependent quantum oscillations, measured by rotating the sample with respect to the magnetic field, revealed clear changes in the two fundamental frequencies, indicating anisotropic electronic Fermi pockets. The ab initio calculations gave the topological Z2 invariants of (1; 100) and revealed a single Dirac cone on the (1 0 -1) surface at the X point with helical spin texture at a constant-energy contour, suggesting a strong topological insulator state. The results demonstrate TaSe3 as an excellent platform to study the interplay between topological phase and superconductivity and a promising system for the exploration of topological superconductivity.

preprint2019arXiv

Magnetotransport properties of the layered CaAl2Si2 semimetal hosting multiple nontrivial topological states

Combining different nontrivial topological states in a single material can realize multiple functionalities and exotic physics, but such materials are still very scarce. We report herein the results of magnetotransport measurements and ab initio calculations on single-crystalline CaAl2Si2 semimetal. The transport properties can be well understood within the two-band model, agreeing well with the theoretical calculations, which indicate four main Fermi surface sheets consisting of three hole pockets centered at the Γ point and one electron pocket centered at the M point in the Brillouin zone. The single fundamental frequency present in the quantum oscillations of the magnetoresistance corresponds to the electron Fermi pocket. Without spin-orbit coupling (SOC), the ab initio calculations suggest CaAl2Si2 as a system hosting a topological nodal line located around the Γ point in the Brillouin zone, close to the Fermi level. Once SOC is included, the fragile nodal line is gapped and a pair of Dirac points emerges along the high-symmetry Γ-A direction, about 1.22 eV below the Fermi level. The SOC can also induce a topological insulator state along the Γ-A direction with a gap of about 3 meV. The results demonstrate CaAl2Si2 as an excellent platform for the study of novel topological physics with multiple topological states.