Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
25 works
0 followers
14 topics
4 close collaborators

Published work

25 published items

preprint, 2026, arXiv

Improving LLM Reasoning with Homophily-aware Structural and Semantic Text-Attributed Graph Compression

Large language models (LLMs) have demonstrated promising capabilities in Text-Attributed Graph (TAG) understanding. Recent studies typically focus on verbalizing the graph structures via handcrafted prompts, feeding the target node and its neighborhood context into LLMs. However, constrained by the context window, existing methods mainly resort to random sampling, often implemented by randomly dropping nodes or edges, which inevitably introduces noise and causes reasoning instability. We argue that graphs inherently contain rich structural and semantic information, and that their effective exploitation can unlock potential gains in LLM reasoning performance. To this end, we propose Homophily-aware Structural and Semantic Compression for LLMs (HS2C), a framework centered on exploiting graph homophily. Structurally, guided by the principle of Structural Entropy minimization, we perform a global hierarchical partition that decodes the graph's essential topology. This partition identifies naturally cohesive, homophilic communities, while discarding stochastic connectivity noise. Semantically, we deliver the detected structural homophily to the LLM, empowering it to perform differentiated semantic aggregation based on predefined community types. This process compresses redundant background contexts into concise community-level consensus, selectively preserving semantically homophilic information aligned with the target nodes. Extensive experiments on 10 node-level benchmarks across LLMs of varying sizes and families demonstrate that, by feeding LLMs with structurally and semantically compressed inputs, HS2C simultaneously enhances the compression rate and downstream inference accuracy, validating its superiority and scalability. Extensions to 7 diverse graph-level benchmarks further consolidate HS2C's task generalizability.
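To make the idea of structural entropy minimization more concrete, here is a minimal sketch of the two-level structural entropy of a graph under a flat partition, in the form commonly used in the structural information literature. Whether HS2C uses exactly this quantity is an assumption; the function name, the toy graph, and both partitions are hypothetical and chosen only for illustration.

```python
import math


def two_level_structural_entropy(edges, partition):
    """Structural entropy of an undirected, unweighted graph under a flat partition.

    Illustrative only: this is the textbook two-level form, not necessarily
    the objective optimized in the paper.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    total_volume = sum(degree.values())  # equals 2 * number of edges

    module_of = {node: idx for idx, module in enumerate(partition) for node in module}

    entropy = 0.0
    for idx, module in enumerate(partition):
        volume = sum(degree[node] for node in module)
        # Edges with exactly one endpoint inside this module.
        cut = sum(1 for u, v in edges if (module_of[u] == idx) != (module_of[v] == idx))
        # Cost of locating a node inside its module.
        for node in module:
            entropy -= degree[node] / total_volume * math.log2(degree[node] / volume)
        # Cost of entering the module from the rest of the graph.
        entropy -= cut / total_volume * math.log2(volume / total_volume)
    return entropy


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
cohesive = [{0, 1, 2}, {3, 4, 5}]      # follows the two triangles
scattered = [{0, 3}, {1, 4}, {2, 5}]   # cuts every edge

h_cohesive = two_level_structural_entropy(edges, cohesive)
h_scattered = two_level_structural_entropy(edges, scattered)
print(f"cohesive: {h_cohesive:.4f}  scattered: {h_scattered:.4f}")
```

On this toy graph the partition that follows the two triangles has the lower entropy, which is the sense in which minimizing the quantity favors cohesive communities.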

preprint, 2026, arXiv

The RoboSense Challenge: Sense Anything, Navigate Anywhere, Adapt Across Platforms

Autonomous systems are increasingly deployed in open and dynamic environments -- from city streets to aerial and indoor spaces -- where perception models must remain reliable under sensor noise, environmental variation, and platform shifts. However, even state-of-the-art methods often degrade under unseen conditions, highlighting the need for robust and generalizable robot sensing. The RoboSense 2025 Challenge is designed to advance robustness and adaptability in robot perception across diverse sensing scenarios. It unifies five complementary research tracks spanning language-grounded decision making, socially compliant navigation, sensor configuration generalization, cross-view and cross-modal correspondence, and cross-platform 3D perception. Together, these tasks form a comprehensive benchmark for evaluating real-world sensing reliability under domain shifts, sensor failures, and platform discrepancies. RoboSense 2025 provides standardized datasets, baseline models, and unified evaluation protocols, enabling large-scale and reproducible comparison of robust perception methods. The challenge attracted 143 teams from 85 institutions across 16 countries, reflecting broad community engagement. By consolidating insights from 23 winning solutions, this report highlights emerging methodological trends, shared design principles, and open challenges across all tracks, marking a step toward building robots that can sense reliably, act robustly, and adapt across platforms in real-world environments.

preprint, 2023, arXiv

Sequential Structure and Control Co-design of Lightweight Precision Stages with Active control of flexible modes

Precision motion stages are playing a prominent role in various manufacturing equipment. The drastically increasing demand for higher throughput in integrated circuit (IC) manufacturing and inspection calls for the next-generation precision stages that have light weight and high control bandwidth simultaneously. In today's design techniques, the stage's first flexible mode is limiting its achievable control bandwidth, which enforces a trade-off between the stage's acceleration and closed-loop stiffness and thus limits the system's overall performance. To overcome this challenge, this paper proposes a new hardware design and control framework for lightweight precision motion stages with the stage's low-frequency flexible modes actively controlled. Our method proposes to minimize the resonance frequency of the controlled mode to reduce the stage's weight, and to maximize that of the uncontrolled mode to enable high control bandwidth. In addition, the proposed framework determines the placement of the actuators and sensors to maximize the controllability/observability of the stage's controlled flexible mode while minimizing that of the uncontrolled mode, which effectively simplifies the controller designs. Two case studies are used to evaluate the effectiveness of the proposed framework. Simulation results show that the stage designed using the proposed method has a weight reduction of more than 55% compared to a baseline stage design. Improvement in control bandwidth was also achieved. These results demonstrate the effectiveness of the proposed method in achieving lightweight precision positioning stages with high acceleration, bandwidth, and precision.

preprint, 2022, arXiv

ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer

Generating robust and reliable correspondences across images is a fundamental task for a diversity of applications. To capture context at both global and local granularity, we propose ASpanFormer, a Transformer-based detector-free matcher that is built on hierarchical attention structure, adopting a novel attention operation which is capable of adjusting attention span in a self-adaptive manner. To achieve this goal, first, flow maps are regressed in each cross attention phase to locate the center of search region. Next, a sampling grid is generated around the center, whose size, instead of being empirically configured as fixed, is adaptively computed from a pixel uncertainty estimated along with the flow map. Finally, attention is computed across two images within derived regions, referred to as attention span. By these means, we are able to not only maintain long-range dependencies, but also enable fine-grained attention among pixels of high relevance that compensates essential locality and piece-wise smoothness in matching tasks. State-of-the-art accuracy on a wide range of evaluation benchmarks validates the strong matching capability of our method.

preprint, 2022, arXiv

Automatic detection of multilevel communities: scalable and resolution-limit-free

Community structure is one of the most important features of complex networks. Modularity-based methods for community detection typically rely on heuristic algorithms to optimize a specific community quality function. Such methods are limited by two major defects: (1) the resolution limit problem, which prohibits communities of heterogeneous sizes being simultaneously detected, and (2) divergent outputs of the heuristic algorithm, which make it difficult to differentiate relevant and irrelevant results. In this paper, we propose an improved method for community detection based on a scalable community "fitness function." We introduced a new parameter to enhance its scalability, and a strict strategy to filter the outputs. Due to the scalability, on the one hand our method is free of the resolution limit problem and performs excellently on large heterogeneous networks, while on the other hand it is capable of detecting more levels of communities than previous methods in deep hierarchical networks. Moreover, our strict strategy automatically removes redundant and irrelevant results, without any artificial selection. As a result, our method neatly outputs only the stable and unique communities, which are largely interpretable by the a priori knowledge about the network, including the implanted structures within synthetic networks, or metadata for real-world networks.

preprint, 2022, arXiv

Control Co-design of Actively Controlled Lightweight Structures for High-acceleration Precision Motion Systems

Precision motion stages are an essential part of a wide range of manufacturing equipment, and their motion performance is critical to the quality and throughput of the systems. The drastically increasing demand for higher manufacturing throughput in various processes necessitates the development of next-generation motion systems with reduced moving weight and high control bandwidth. However, reducing the moving stage's weight can lower the stage's structural resonance frequencies, making the hardware dynamics and controller design problems strongly coupled. To address this challenge, this paper proposes a new formulation of a nested hardware and control co-design (CCD) framework for precision motion stages. The proposed framework explicitly optimizes the closed-loop control bandwidth with guaranteed robustness and explicitly considers the limits of the physical system. Two case studies, a motivating example using a lumped-parameter mechanical system and a finite-element-simulated lightweight motion stage, are used to evaluate the effectiveness of the proposed nested CCD framework. Simulation results show that the proposed nested CCD framework achieves a 42% weight reduction and a 28% bandwidth improvement compared with a sequential design baseline, which demonstrates the efficacy of the proposed approach.

preprint, 2022, arXiv

Economical Precise Manipulation and Auto Eye-Hand Coordination with Binocular Visual Reinforcement Learning

Precision robotic manipulation tasks (insertion, screwing, precise picking, and precise placing) are required in many scenarios. Previous methods achieved good performance on such manipulation tasks. However, such methods typically require tedious calibration or expensive sensors. 3D/RGB-D cameras and torque/force sensors add to the cost of the robotic application and may not always be economical. In this work, we aim to solve these tasks using only weakly calibrated, low-cost webcams. We propose Binocular Alignment Learning (BAL), which automatically learns the eye-hand coordination and point alignment capabilities needed to solve the four tasks. Our work focuses on working with unknown eye-hand coordination and proposes different ways of performing eye-in-hand camera calibration automatically. The algorithm was trained in simulation, transferred with a practical sim2real pipeline, and tested on a real robot. Our method achieves competitive results at minimal cost on the four tasks.

preprint, 2022, arXiv

Half a Dozen Real-World Applications of Evolutionary Multitasking, and More

Until recently, the potential to transfer evolved skills across distinct optimization problem instances (or tasks) was seldom explored in evolutionary computation. The concept of evolutionary multitasking (EMT) fills this gap. It unlocks a population's implicit parallelism to jointly solve a set of tasks, hence creating avenues for skills transfer between them. Despite it being early days, the idea of EMT has begun to show promise in a range of real-world applications. In the backdrop of recent advances, the contribution of this paper is twofold. First, a review of several application-oriented explorations of EMT in the literature is presented; the works are assimilated into half a dozen broad categories according to their respective application domains. Each of these six categories elaborates fundamental motivations to multitask, and contains a representative experimental study (drawn from the literature). Second, a set of recipes is provided showing how problem formulations of general interest, those that cut across different disciplines, could be transformed in the new light of EMT. Our discussions emphasize the many practical use-cases of EMT, and are intended to spark future research towards crafting novel algorithms for real-world deployment.

preprint, 2022, arXiv

Learning Prototype via Placeholder for Zero-shot Recognition

Zero-shot learning (ZSL) aims to recognize unseen classes by exploiting semantic descriptions shared between seen classes and unseen classes. Current methods show that it is effective to learn visual-semantic alignment by projecting semantic embeddings into the visual space as class prototypes. However, such a projection function is only concerned with seen classes. When applied to unseen classes, the prototypes often perform suboptimally due to domain shift. In this paper, we propose to learn prototypes via placeholders, termed LPL, to eliminate the domain shift between seen and unseen classes. Specifically, we combine seen classes to hallucinate new classes which act as placeholders of the unseen classes in the visual and semantic space. Placed between seen classes, the placeholders encourage prototypes of seen classes to be highly dispersed, leaving more space for the insertion of well-separated unseen ones. Empirically, well-separated prototypes help counteract visual-semantic misalignment caused by domain shift. Furthermore, we exploit a novel semantic-oriented fine-tuning to guarantee the semantic reliability of placeholders. Extensive experiments on five benchmark datasets demonstrate the significant performance gain of LPL over the state-of-the-art methods. Code is available at https://github.com/zaiquanyang/LPL.

preprint, 2022, arXiv

Lung Swapping Autoencoder: Learning a Disentangled Structure-texture Representation of Chest Radiographs

Well-labeled datasets of chest radiographs (CXRs) are difficult to acquire due to the high cost of annotation. Thus, it is desirable to learn a robust and transferable representation in an unsupervised manner to benefit tasks that lack labeled data. Unlike natural images, medical images have their own domain prior; e.g., we observe that many pulmonary diseases, such as COVID-19, manifest as changes in the lung tissue texture rather than the anatomical structure. Therefore, we hypothesize that studying only the texture without the influence of structure variations would be advantageous for downstream prognostic and predictive modeling tasks. In this paper, we propose a generative framework, the Lung Swapping Autoencoder (LSAE), that learns factorized representations of a CXR to disentangle the texture factor from the structure factor. Specifically, by adversarial training, the LSAE is optimized to generate a hybrid image that preserves the lung shape in one image but inherits the lung texture of another. To demonstrate the effectiveness of the disentangled texture representation, we evaluate the texture encoder $Enc^t$ in LSAE on ChestX-ray14 (N=112,120), and our own multi-institutional COVID-19 outcome prediction dataset, COVOC (N=340 (Subset-1) + 53 (Subset-2)). On both datasets, we reach or surpass the state-of-the-art by finetuning $Enc^t$ in LSAE that is 77% smaller than a baseline Inception v3. Additionally, in semi- and self-supervised settings with a similar model budget, $Enc^t$ in LSAE is also competitive with the state-of-the-art MoCo. By "re-mixing" the texture and shape factors, we generate meaningful hybrid images that can augment the training set. This data augmentation method can further improve COVOC prediction performance. The improvement is consistent even when we directly evaluate the Subset-1 trained model on Subset-2 without any fine-tuning.

preprint, 2022, arXiv

Self-Sensing Hysteresis-Type Bearingless Motor

Bearingless motors use a single stator assembly to apply torque and magnetic suspension forces on the rotor, making these machines compact with frictionless operation and thus well suited to high-speed applications. One major challenge that prevents wide usage of bearingless motors is the need for air-gap position sensors, which are typically expensive. Here we present a method to estimate the radial position of a hysteresis-type bearingless motor using the inductance variation of the stator coils amplified by an injected high-frequency signal. We have carried out finite element (FE) simulations to demonstrate its feasibility, and have constructed a prototype self-sensing bearingless motor for experimental validations.

preprint, 2021, arXiv

Goal-Oriented Gaze Estimation for Zero-Shot Learning

Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen classes. Since semantic knowledge is built on attributes shared between different classes, which are highly local, a strong prior for the localization of object attributes is beneficial for visual-semantic embedding. Interestingly, when recognizing unseen images, humans also automatically gaze at regions with certain semantic clues. Therefore, we introduce a novel goal-oriented gaze estimation module (GEM) to improve the discriminative attribute localization based on the class-level attributes for ZSL. We aim to predict the actual human gaze location to get the visual attention regions for recognizing a novel object guided by attribute description. Specifically, the task-dependent attention is learned with the goal-oriented GEM, and the global image features are simultaneously optimized with the regression of local attribute features. Experiments on three ZSL benchmarks, i.e., CUB, SUN and AWA2, show the superiority or competitiveness of our proposed method against the state-of-the-art ZSL methods. The ablation analysis on real gaze data CUB-VWSW also validates the benefits and accuracy of our gaze estimation module. This work implies the promising benefits of collecting human gaze datasets and of automatic gaze estimation algorithms for high-level computer vision tasks. The code is available at https://github.com/osierboy/GEM-ZSL.

preprint, 2021, arXiv

Information Bottleneck Constrained Latent Bidirectional Embedding for Zero-Shot Learning

Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen classes. Though many ZSL methods rely on a direct mapping between the visual and the semantic space, the calibration deviation and hubness problem limit the generalization capability to unseen classes. Recently emerged generative ZSL methods generate unseen image features to transform ZSL into a supervised classification problem. However, most generative models still suffer from the seen-unseen bias problem as only seen data is used for training. To address these issues, we propose a novel bidirectional embedding based generative model with a tight visual-semantic coupling constraint. We learn a unified latent space that calibrates the embedded parametric distributions of both visual and semantic spaces. Since the embedding from high-dimensional visual features comprises much non-semantic information, the alignment of the visual and semantic spaces in the latent space would inevitably deviate. Therefore, we introduce an information bottleneck (IB) constraint to ZSL for the first time to preserve essential attribute information during the mapping. Specifically, we utilize the uncertainty estimation and the wake-sleep procedure to alleviate the feature noise and improve model abstraction capability. In addition, our method can be easily extended to the transductive ZSL setting by generating labels for unseen images. We then introduce a robust loss to solve this label noise problem. Extensive experimental results show that our method outperforms the state-of-the-art methods in different ZSL settings on most benchmark datasets. The code will be available at https://github.com/osierboy/IBZSL.

preprint, 2021, arXiv

PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency

Removing outlier correspondences is one of the critical steps for successful feature-based point cloud registration. Despite the increasing popularity of introducing deep learning methods in this field, spatial consistency, which is essentially established by a Euclidean transformation between point clouds, has received almost no individual attention in existing learning frameworks. In this paper, we present PointDSC, a novel deep neural network that explicitly incorporates spatial consistency for pruning outlier correspondences. First, we propose a nonlocal feature aggregation module, weighted by both feature and spatial coherence, for feature embedding of the input correspondences. Second, we formulate a differentiable spectral matching module, supervised by pairwise spatial compatibility, to estimate the inlier confidence of each correspondence from the embedded features. With modest computation cost, our method outperforms the state-of-the-art hand-crafted and learning-based outlier rejection approaches on several real-world datasets by a significant margin. We also show its wide applicability by combining PointDSC with different 3D local descriptors.
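The two ingredients named in this abstract, pairwise spatial compatibility and spectral matching, can be sketched without any learned components. The snippet below is a hand-crafted illustration, not the PointDSC network: it scores each pair of correspondences by how well the pair preserves distances (as a rigid transformation would) and reads inlier confidence from the leading eigenvector of that compatibility matrix. The truncated quadratic kernel, the `sigma` value, the function names, and the toy points are all assumptions made for this example.

```python
import math


def pairwise_compatibility(src, dst, sigma=0.5):
    """Length-preservation score for each pair of correspondences (src[i] -> dst[i])."""
    n = len(src)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # A rigid transformation preserves distances, so two correct
            # correspondences should show almost no length gap.
            gap = abs(math.dist(src[i], src[j]) - math.dist(dst[i], dst[j]))
            matrix[i][j] = max(0.0, 1.0 - (gap / sigma) ** 2)
    return matrix


def leading_eigenvector(matrix, iterations=50):
    """Power iteration; entries of the result act as inlier confidences."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return vec


# Hypothetical data: four correspondences related by a pure translation,
# plus one wrong correspondence at index 4.
src = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
dst = [(5, 5, 5), (6, 5, 5), (5, 6, 5), (5, 5, 6), (9, 2, 7)]

scores = leading_eigenvector(pairwise_compatibility(src, dst))
print([round(s, 3) for s in scores])
```

The mismatched correspondence is incompatible with every other one, so its entry in the eigenvector collapses toward zero while the mutually consistent correspondences share the remaining weight.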

preprint, 2020, arXiv

ASLFeat: Learning Local Features of Accurate Shape and Localization

This work focuses on mitigating two limitations in the joint learning of local feature detectors and descriptors. First, the ability to estimate the local shape (scale, orientation, etc.) of feature points is often neglected during dense feature extraction, while shape-awareness is crucial for acquiring stronger geometric invariance. Second, the localization accuracy of detected keypoints is not sufficient to reliably recover camera geometry, which has become the bottleneck in tasks such as 3D reconstruction. In this paper, we present ASLFeat, with three light-weight yet effective modifications to mitigate the above issues. First, we resort to deformable convolutional networks to densely estimate and apply local transformation. Second, we take advantage of the inherent feature hierarchy to restore spatial resolution and low-level details for accurate keypoint localization. Finally, we use a peakiness measurement to relate feature responses and derive more indicative detection scores. The effect of each modification is thoroughly studied, and the evaluation is extensively conducted across a variety of practical scenarios. State-of-the-art results are reported that demonstrate the superiority of our methods.

preprint, 2020, arXiv

BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Networks

While deep learning has recently achieved great success on multi-view stereo (MVS), limited training data makes the trained model hard to generalize to unseen scenarios. Compared with other computer vision tasks, it is rather difficult to collect a large-scale MVS dataset as it requires expensive active scanners and a labor-intensive process to obtain ground truth 3D structures. In this paper, we introduce BlendedMVS, a novel large-scale dataset, to provide sufficient training ground truth for learning-based MVS. To create the dataset, we apply a 3D reconstruction pipeline to recover high-quality textured meshes from images of well-selected scenes. Then, we render these mesh models to color images and depth maps. To introduce the ambient lighting information during training, the rendered color images are further blended with the input images to generate the training input. Our dataset contains over 17k high-resolution images covering a variety of scenes, including cities, architectures, sculptures and small objects. Extensive experiments demonstrate that BlendedMVS endows the trained model with significantly better generalization ability compared with other MVS datasets. The dataset and pretrained models are available at https://github.com/YoYo000/BlendedMVS.

preprint, 2020, arXiv

D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features

A successful point cloud registration often relies on robust establishment of sparse matches through discriminative 3D local features. Despite the fast evolution of learning-based 3D feature descriptors, little attention has been drawn to the learning of 3D feature detectors, even less for a joint learning of the two tasks. In this paper, we leverage a 3D fully convolutional network for 3D point clouds, and propose a novel and practical learning mechanism that densely predicts both a detection score and a description feature for each 3D point. In particular, we propose a keypoint selection strategy that overcomes the inherent density variations of 3D point clouds, and further propose a self-supervised detector loss guided by the on-the-fly feature matching results during training. Finally, our method achieves state-of-the-art results in both indoor and outdoor scenarios, evaluated on 3DMatch and KITTI datasets, and shows its strong generalization ability on the ETH dataset. Towards practical use, we show that by adopting a reliable feature detector, sampling a smaller number of features is sufficient to achieve accurate and fast point cloud alignment. Code is available at https://github.com/XuyangBai/D3Feat.

preprint, 2020, arXiv

Dynamics of charged dust in the orbit of Venus

We study the dynamics of co-orbital dust in the inner solar system, i.e. the role of the solar radiation pressure, Poynting-Robertson effect, solar wind, and the interplanetary magnetic field on the location, width and stability of resonant motion of charged, micron-sized dust grains situated in the 1:1 mean motion resonance with Venus. We find deviations and asymmetry between $L_4$ and $L_5$ in locations of libration centers and libration width under the influence of non-gravitational effects via both analytical and numerical methods. The triangular Lagrangian points become unstable once we take into consideration solar radiation pressure, the Poynting-Robertson effect and solar wind drag. The Lorentz force could further destabilize the orbits, especially for small dust particles. We also make a comparison between the circular, elliptic restricted three-body model and a more complete model including all planets.
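One way to see why radiation pressure alone displaces the co-orbital resonance is a back-of-the-envelope estimate. This is a simplified sketch, not the paper's analytical model. Radiation pressure weakens the Sun's pull on a grain by a factor (1 - beta), where beta is the ratio of radiation pressure force to solar gravity. Kepler's third law then reads n^2 a^3 = GM(1 - beta), so a grain sharing the planet's mean motion must orbit at a = a_planet * (1 - beta)^(1/3). The function name, the beta values, and the rounded 0.723 AU figure for Venus are assumptions for illustration; the planet's mass, Poynting-Robertson drag, solar wind, and the Lorentz force are all ignored.

```python
def coorbital_semi_major_axis(planet_axis_au, beta):
    """Semi-major axis at which a grain with radiation-pressure ratio beta
    has the same mean motion as the planet (planet mass neglected)."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1) for a bound orbit")
    return planet_axis_au * (1.0 - beta) ** (1.0 / 3.0)


VENUS_AXIS_AU = 0.723  # rounded value, used here only for illustration

for beta in (0.0, 0.01, 0.1):
    axis = coorbital_semi_major_axis(VENUS_AXIS_AU, beta)
    print(f"beta = {beta:<4}  co-orbital a = {axis:.4f} AU")
```

Under these assumptions, larger beta (typically smaller grains) moves the resonant orbit inward of the planet's own orbit.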

preprint, 2020, arXiv

End-to-end Optimized Video Compression with MV-Residual Prediction

We present an end-to-end trainable framework for P-frame compression in this paper. A joint motion vector (MV) and residual prediction network MV-Residual is designed to extract the ensembled features of motion representations and residual information by treating the two successive frames as inputs. The prior probability of the latent representations is modeled by a hyperprior autoencoder and trained jointly with the MV-Residual network. Specifically, the spatially-displaced convolution is applied for video frame prediction, in which a motion kernel for each pixel is learned to generate the predicted pixel by applying the kernel at a displaced location in the source image. Finally, novel rate allocation and post-processing strategies are used to produce the final compressed bits, considering the bits constraint of the challenge. The experimental results on the validation set show that the proposed optimized framework can generate the highest MS-SSIM for the P-frame compression competition.

preprint, 2020, arXiv

Joint Semantic Segmentation and Boundary Detection using Iterative Pyramid Contexts

In this paper, we present a joint multi-task learning framework for semantic segmentation and boundary detection. The critical component in the framework is the iterative pyramid context module (PCM), which couples two tasks and stores the shared latent semantics to interact between the two tasks. For semantic boundary detection, we propose the novel spatial gradient fusion to suppress nonsemantic edges. As semantic boundary detection is the dual task of semantic segmentation, we introduce a loss function with boundary consistency constraint to improve the boundary pixel accuracy for semantic segmentation. Our extensive experiments demonstrate superior performance over state-of-the-art works, not only in semantic segmentation but also in semantic boundary detection. In particular, a mean IoU score of 81.8% on the Cityscapes test set is achieved without using coarse data or any external data for semantic segmentation. For semantic boundary detection, we improve over previous state-of-the-art works by 9.9% in terms of AP and 6.8% in terms of MF(ODS).
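For readers unfamiliar with the mean IoU figure quoted above, the following minimal sketch shows how the metric is computed from flattened label maps. It is a generic illustration with hypothetical names and toy labels; benchmark toolkits such as the Cityscapes evaluation accumulate intersections and unions over the whole dataset and handle ignored labels, which this sketch does not.

```python
def mean_iou(predicted, target, num_classes):
    """Mean intersection-over-union across classes that appear in either map."""
    ious = []
    for cls in range(num_classes):
        intersection = sum(1 for p, t in zip(predicted, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(predicted, target) if p == cls or t == cls)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(intersection / union)
    if not ious:
        raise ValueError("no class is present in either label map")
    return sum(ious) / len(ious)


# Hypothetical four-pixel example with two classes.
predicted = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(predicted, target, num_classes=2)
print(f"mean IoU = {score:.4f}")
```

Here class 0 scores 1/2 and class 1 scores 2/3, so the mean is 7/12.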

preprint, 2020, arXiv

KFNet: Learning Temporal Camera Relocalization using Kalman Filtering

Temporal camera relocalization estimates the pose with respect to each video frame in sequence, as opposed to one-shot relocalization which focuses on a still image. Even though the time dependency has been taken into account, current temporal relocalization methods still generally underperform the state-of-the-art one-shot approaches in terms of accuracy. In this work, we improve the temporal relocalization method by using a network architecture that incorporates Kalman filtering (KFNet) for online camera relocalization. In particular, KFNet extends the scene coordinate regression problem to the time domain in order to recursively establish 2D and 3D correspondences for the pose determination. The network architecture design and the loss formulation are based on Kalman filtering in the context of Bayesian learning. Extensive experiments on multiple relocalization benchmarks demonstrate the high accuracy of KFNet at the top of both one-shot and temporal relocalization approaches. Our codes are released at https://github.com/zlthinker/KFNet.
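The recursive estimation that Kalman filtering provides can be illustrated with the textbook scalar filter. This is a generic sketch rather than KFNet's architecture: KFNet applies the idea to scene coordinate regression with learned components, whereas the example below tracks a single number with a static state model. All names and values are hypothetical.

```python
def kalman_predict(mean, variance, process_variance):
    """Propagate the belief one step under a static (identity) state model."""
    return mean, variance + process_variance


def kalman_update(mean, variance, measurement, measurement_variance):
    """Fuse the predicted belief with a new noisy measurement."""
    gain = variance / (variance + measurement_variance)
    new_mean = mean + gain * (measurement - mean)
    new_variance = (1.0 - gain) * variance
    return new_mean, new_variance


# Hypothetical numbers: prior belief, then one predict/update cycle.
prior_mean, prior_variance = 0.0, 1.0
pred_mean, pred_variance = kalman_predict(prior_mean, prior_variance, process_variance=1.0)
post_mean, post_variance = kalman_update(pred_mean, pred_variance,
                                         measurement=3.0, measurement_variance=2.0)
print(post_mean, post_variance)
```

With equal predicted and measurement variances the gain is 0.5, so the fused estimate sits halfway between prediction and measurement and its variance is halved. Repeating the cycle for each frame is what makes the estimate temporal rather than one-shot.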

preprint, 2020, arXiv

Learning Discriminative Feature with CRF for Unsupervised Video Object Segmentation

In this paper, we introduce a novel network, called discriminative feature network (DFNet), to address the unsupervised video object segmentation task. To capture the inherent correlation among video frames, we learn discriminative features (D-features) from the input images that reveal feature distribution from a global perspective. The D-features are then used to establish correspondence with all features of the test image under a conditional random field (CRF) formulation, which is leveraged to enforce consistency between pixels. The experiments verify that DFNet outperforms state-of-the-art methods by a large margin with a mean IoU score of 83.4% and ranks first on the DAVIS-2016 leaderboard while using far fewer parameters and running much more efficiently in the inference phase. We further evaluate DFNet on the FBMS dataset and the video saliency dataset ViSal, reaching a new state-of-the-art. To further demonstrate the generalizability of our framework, DFNet is also applied to the image object co-segmentation task. We perform experiments on a challenging dataset PASCAL-VOC and observe the superiority of DFNet. The thorough experiments verify that DFNet is able to capture and mine the underlying relations of images and discover the common foreground objects.

preprint, 2020, arXiv

Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency

Recent learning-based approaches, in which models are trained on single-view images, have shown promising results for monocular 3D face reconstruction, but they suffer from the ill-posed face pose and depth ambiguity issue. In contrast to previous works that only enforce 2D feature constraints, we propose a self-supervised training architecture by leveraging the multi-view geometry consistency, which provides reliable constraints on face pose and depth estimation. We first propose an occlusion-aware view synthesis method to apply multi-view geometry consistency to self-supervised learning. Then we design three novel loss functions for multi-view consistency, including the pixel consistency loss, the depth consistency loss, and the facial landmark-based epipolar loss. Our method is accurate and robust, especially under large variations of expressions, poses, and illumination conditions. Comprehensive experiments on the face alignment and 3D face reconstruction benchmarks have demonstrated superiority over state-of-the-art methods. Our code and model are released at https://github.com/jiaxiangshang/MGCNet.

preprint, 2020, arXiv

Tunable Graphene Split-Ring Resonators

A split-ring resonator is a prototype of meta-atom in metamaterials. Though noble metal-based split-ring resonators have been extensively studied, to date there is no experimental demonstration of split-ring resonators made from graphene, an emerging intriguing plasmonic material. Here, we experimentally demonstrate graphene split-ring resonators with deep subwavelength (about one hundredth of the excitation wavelength) magnetic dipole response in the terahertz regime. Meanwhile, the quadrupole and electric dipole are observed, depending on the incident light polarization. All modes can be tuned via chemical doping or stacking multiple graphene layers. The strong interaction with surface polar phonons of the SiO2 substrate also significantly modifies the response. Finite-element frequency domain simulations nicely reproduce experimental results. Our study moves one stride forward toward the multi-functional graphene metamaterials, beyond simple graphene ribbon or disk arrays with electrical dipole resonances only.

preprint, 2019, arXiv

A systematic survey of the dynamics of Uranus Trojans

We aim to locate the stability region for Uranus Trojans (UT hereafter) and find out the dynamical mechanisms responsible for the structures in the phase space. Using the spectral number as the stability indicator, we construct the dynamical maps on the (a0, i0) plane. The proper frequencies of UTs are determined precisely so that we can depict the resonance web via a semi-analytical method. Two main stability regions are found, one each for the low-inclination (0-14 deg) and high-inclination (32-59 deg) regimes. There is also an instability strip in each of them, at 9 deg and 51 deg respectively. All stability regions are in the tadpole regime and no stable horseshoe orbits exist for UTs. The lack of moderately inclined UTs is caused by the nu5 and nu7 secular resonances. The fine structures in the dynamical maps are shaped by high-degree secular resonances and secondary resonances. During the planetary migration, about 36.3% and 0.4% of the pre-formed orbits survive the fast and slow migrations (with migrating time scales of 1 and 10 Myr) respectively, most of which are at high inclination. Since the low-inclined UTs are more likely to survive the age of the solar system, they make up 77% of all such long-life orbits by the end of the migration, making a total fraction up to 4.06E-3 and 9.07E-5 of the original population for the fast and slow migrations, respectively. About 3.81% of UTs are able to survive the age of the solar system, among which 95.5% are on low-inclined orbits with i0 < 7.5 deg. However, the depletion during the planetary migration seems to prevent a large fraction of such orbits from surviving, especially for the slow migration model.