Researcher profile

Xiaodong Yang

Xiaodong Yang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
27works
0followers
13topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

27 published item(s)

preprint2026arXiv

Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail

End-to-end architectures trained via imitation learning have advanced autonomous driving by scaling model size and data, yet performance remains brittle in safety-critical long-tail scenarios where supervision is sparse and causal understanding is limited. We introduce Alpamayo-R1 (AR1), a vision-language-action model (VLA) that integrates Chain of Causation reasoning with trajectory planning for complex driving scenarios. Our approach features three key innovations: (1) the Chain of Causation (CoC) dataset, built through a hybrid auto-labeling and human-in-the-loop pipeline producing decision-grounded, causally linked reasoning traces aligned with driving behaviors; (2) a modular VLA architecture combining Cosmos-Reason, a vision-language model pre-trained for Physical AI, with a diffusion-based trajectory decoder that generates dynamically feasible trajectories in real time; (3) a multi-stage training strategy using supervised fine-tuning to elicit reasoning and reinforcement learning (RL) to enforce reasoning-action consistency and optimize reasoning quality. AR1 achieves up to a 12% improvement in planning accuracy on challenging cases compared to a trajectory-only baseline, with a 35% reduction in close encounter rate in closed-loop simulation. RL post-training improves reasoning quality by 45% and reasoning-action consistency by 37%. Model scaling from 0.5B to 7B parameters shows consistent improvements. On-vehicle road tests confirm real-time performance (99 ms latency) and successful urban deployment. By bridging interpretable reasoning with precise control, AR1 demonstrates a practical path towards Level 4 autonomous driving. Model weights are available at https://huggingface.co/nvidia/Alpamayo-R1-10B with inference code at https://github.com/NVlabs/alpamayo.

preprint2026arXiv

Continuous Unitary Designs for Universally Robust Quantum Control

Unitary designs are unitary ensembles that emulate Haar-random unitary statistics. They provide a vital tool for studying quantum randomness and have found broad applications in quantum technologies. However, existing research has focused on discrete ensembles, despite that many physical processes, such as in quantum chaos, thermalization, and control, naturally involve continuous ensembles generated from continuous time-evolution. Here we initial the study of continuous unitary designs, addressing fundamental questions about their construction and practical utility. For single-qubit system, we construct explicit unitary 1-design paths from spherical 2-design curves and Hopf fibration theory. For arbitrary dimensions, we develop two systematic construction frameworks, one based on topological bundle theory of the unitary group and the other based on the Heisenberg-Weyl group. On the practical front, our unitary design paths provide analytical solutions to universally robust quantum control. Simulations show they outperform conventional pulse techniques in mitigating arbitrary unknown static noises, demonstrating immediate utility for quantum engineering. Extending unitary designs to the continuous domain not only introduces powerful geometric and topological tools that complement conventional combinatorial and group-theoretic methods, but also enhances experimental feasibility over discrete counterparts which usually involve instantaneous pulses. As an outlook, we anticipate that this work will pave the way for using continuous unitary designs to explore complex quantum dynamics and devise quantum information protocols.

preprint2026arXiv

Experimental realization of quantum Zeno dynamics for robust quantum metrology

Quantum Zeno dynamics (QZD), which restricts the system's evolution to a protected subspace, provides a promising approach for protecting quantum information from noise. Here, we explore a practical approach to harnessing QZD for robust quantum metrology. By introducing strong inter-particle interactions during the parameter encoding stage, we overcome the typical limitations of previous QZD studies, which have largely focused on single-particle systems and faced challenges where QZD could interfere with the encoding process. We experimentally validate the proposed scheme on a nuclear magnetic resonance platform, achieving near-optimal precision scaling under amplitude damping in both parallel and sequential settings. Numerical simulations further demonstrate the scalability of the approach and its compatibility with other control techniques for suppressing more general types of noise. These findings highlight QZD as a powerful strategy for noise-resilient quantum metrology.

preprint2026arXiv

Parallel Quantum Gates via Scalable Subsystem-Optimized Robust Control

Accurate and efficient implementation of parallel quantum gates is crucial for scalable quantum information processing. However, the unavoidable crosstalk between qubits in current noisy processors impedes the achievement of high gate fidelities and renders full Hilbert-space control optimization prohibitively difficult. Here, we overcome this challenge by reducing the full-system optimization to crosstalk-robust control over constant-sized subsystems, which dramatically reduces the computational cost. Our method effectively eliminates the leading-order gate operation deviations induced by crosstalk, thereby suppressing error rates. Within this framework, we construct analytical pulse solutions for parallel single-qubit gates and numerical pulses for parallel multi-qubit operations. We validate the proposed approach numerically across multiple platforms, including coupled nitrogen-vacancy centers, a nuclear-spin processor, and superconducting-qubit arrays with up to 200 qubits. As a result, the noise scaling is reduced from exponential to linear for parallel single-qubit gates, and an order-of-magnitude reduction is achieved for parallel multi-qubit gates. Moreover, our method does not require precise knowledge of crosstalk strengths and makes no assumption about the underlying qubit connectivity or lattice geometry, thereby establishing a scalable framework for parallel quantum control in large-scale quantum architectures.

preprint2022arXiv

Experimental Realization of a Quantum Refrigerator Driven by Indefinite Causal Orders

Indefinite causal order (ICO) is playing a key role in recent quantum technologies. Here, we experimentally study quantum thermodynamics driven by ICO on nuclear spins using the nuclear magnetic resonance system. We realize the ICO of two thermalizing channels to exhibit how the mechanism works, and show that the working substance can be cooled or heated albeit it undergoes thermal contacts with reservoirs of the same temperature. Moreover, we construct a single cycle of the ICO refrigerator based on the Maxwell's demon mechanism, and evaluate its performance by measuring the work consumption and the heat energy extracted from the low-temperature reservoir. Unlike classical refrigerators in which the coefficient of performance (COP) is perversely higher the closer the temperature of the high-temperature and low-temperature reservoirs are to each other, the ICO refrigerator's COP is always bounded to small values due to the non-unit success probability in projecting the ancillary qubit to the preferable subspace. To enhance the COP, we propose and experimentally demonstrate a general framework based on the density matrix exponentiation (DME) approach, as an extension to the ICO refrigeration. The COP is observed to be enhanced by more than three times with the DME approach. Our work demonstrates a new way for non-classical heat exchange, and paves the way towards construction of quantum refrigerators on a quantum system.

preprint2022arXiv

Hierarchical Contrastive Motion Learning for Video Action Recognition

One central question for video action recognition is how to model motion. In this paper, we present hierarchical contrastive motion learning, a new self-supervised learning framework to extract effective motion representations from raw video frames. Our approach progressively learns a hierarchy of motion features that correspond to different abstraction levels in a network. This hierarchical design bridges the semantic gap between low-level motion cues and high-level recognition tasks, and promotes the fusion of appearance and motion information at multiple levels. At each level, an explicit motion self-supervision is provided via contrastive learning to enforce the motion features at the current level to predict the future ones at the previous level. Thus, the motion features at higher levels are trained to gradually capture semantic dynamics and evolve more discriminative for action recognition. Our motion learning module is lightweight and flexible to be embedded into various backbone networks. Extensive experiments on four benchmarks show that the proposed approach consistently achieves superior results.

preprint2022arXiv

Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization

By ensuring differential privacy in the learning algorithms, one can rigorously mitigate the risk of large models memorizing sensitive training data. In this paper, we study two algorithms for this purpose, i.e., DP-SGD and DP-NSGD, which first clip or normalize \textit{per-sample} gradients to bound the sensitivity and then add noise to obfuscate the exact information. We analyze the convergence behavior of these two algorithms in the non-convex optimization setting with two common assumptions and achieve a rate $\mathcal{O}\left(\sqrt[4]{\frac{d\log(1/δ)}{N^2ε^2}}\right)$ of the gradient norm for a $d$-dimensional model, $N$ samples and $(ε,δ)$-DP, which improves over previous bounds under much weaker assumptions. Specifically, we introduce a regularizing factor in DP-NSGD and show that it is crucial in the convergence proof and subtly controls the bias and noise trade-off. Our proof deliberately handles the per-sample gradient clipping and normalization that are specified for the private setting. Empirically, we demonstrate that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD and hence may help further save the privacy budget when accounting the tuning effort.

preprint2022arXiv

QML for Argoverse 2 Motion Forecasting Challenge

To safely navigate in various complex traffic scenarios, autonomous driving systems are generally equipped with a motion forecasting module to provide vital information for the downstream planning module. For the real-world onboard applications, both accuracy and latency of a motion forecasting model are essential. In this report, we present an effective and efficient solution, which ranks the 3rd place in the Argoverse 2 Motion Forecasting Challenge 2022.

preprint2022arXiv

Quantum Control for Time-dependent Noise by Inverse Geometric Optimization

Quantum systems are exceedingly difficult to engineer because they are sensitive to various types of noises. In particular, time-dependent noises are frequently encountered in experiments but how to overcome them remains a challenging problem. In this work, we extend and apply the recently proposed robust control technique of inverse geometric optimization to time-dependent noises by working it in the filter-function formalism. The basic idea is to parameterize the control filter function geometrically and minimize its overlap with the noise spectral density. This then effectively reduces the noise susceptibility of the controlled system evolution. We show that the proposed method can produce high-quality robust pulses for realizing desired quantum evolutions under realistic noise models, and thus will find practical applications for current physical platforms.

preprint2022arXiv

Robust quantum control for the manipulation of solid-state spins

Robust and high-fidelity control of electron spins in solids is the cornerstone for facilitating applications of solid-state spins in quantum information processing and quantum sensing. However, precise control of spin systems is always challenging due to the presence of a variety of noises originating from the environment and control fields. Herein, noise-resilient quantum gates, designed with robust optimal control (ROC) algorithms, are demonstrated experimentally with nitrogen-vacancy (NV) centers in diamond to realize tailored robustness against detunings and Rabi errors simultaneously. In the presence of both 10% off-resonant detuning and deviation of a Rabi frequency, we achieve an average single-qubit gate fidelity of up to 99.97%. Our experiments also show that, ROCbased multipulse quantum sensing sequences can suppress spurious responses resulting from finite widths and imperfections of microwave pulses, which provides an efficient strategy for enhancing the performance of existing multipulse quantum sensing sequences.

preprint2022arXiv

TL-GAN: Improving Traffic Light Recognition via Data Synthesis for Autonomous Driving

Traffic light recognition, as a critical component of the perception module of self-driving vehicles, plays a vital role in the intelligent transportation systems. The prevalent deep learning based traffic light recognition methods heavily hinge on the large quantity and rich diversity of training data. However, it is quite challenging to collect data in various rare scenarios such as flashing, blackout or extreme weather, thus resulting in the imbalanced distribution of training data and consequently the degraded performance in recognizing rare classes. In this paper, we seek to improve traffic light recognition by leveraging data synthesis. Inspired by the generative adversarial networks (GANs), we propose a novel traffic light generation approach TL-GAN to synthesize the data of rare classes to improve traffic light recognition for autonomous driving. TL-GAN disentangles traffic light sequence generation into image synthesis and sequence assembling. In the image synthesis stage, our approach enables conditional generation to allow full control of the color of the generated traffic light images. In the sequence assembling stage, we design the style mixing and adaptive template to synthesize realistic and diverse traffic light sequences. Extensive experiments show that the proposed TL-GAN renders remarkable improvement over the baseline without using the generated data, leading to the state-of-the-art performance in comparison with the competing algorithms that are used for general image synthesis and data imbalance tackling.

preprint2021arXiv

Experimental Adiabatic Quantum Metrology with the Heisenberg scaling

The critical quantum metrology, which exploits the quantum phase transition for high precision measurement, has gained increasing attention recently. The critical quantum metrology with the continuous quantum phase transition, however, is experimentally very challenging since the continuous quantum phase transition only exists at the thermal dynamical limit. Here, we propose an adiabatic scheme on a perturbed Ising spin model with the first order quantum phase transition. By employing the Landau-Zener anticrossing, we can not only encode the unknown parameter in the ground state but also tune the energy gap to control the evolution time of the adiabatic passage. We experimentally implement the adiabatic scheme on the nuclear magnetic resonance and show that the achieved precision attains the Heisenberg scaling. The advantages of the scheme-easy implementation, robust against the decay, tunable energy gap-are critical for practical applications of quantum metrology.

preprint2021arXiv

Hybrid quantum-classical approach to enhanced quantum metrology

Quantum metrology plays a fundamental role in many scientific areas. However, the complexity of engineering entangled probes and the external noise raise technological barriers for realizing the expected precision of the to-be-estimated parameter with given resources. Here, we address this problem by introducing adjustable controls into the encoding process and then utilizing a hybrid quantum-classical approach to automatically optimize the controls online. Our scheme does not require any complex or intractable off-line design, and it can inherently correct certain unitary errors during the learning procedure. We also report the first experimental demonstration of this promising scheme for the task of finding optimal probes for frequency estimation on a nuclear magnetic resonance (NMR) processor. The proposed scheme paves the way to experimentally auto-search optimal protocol for improving the metrology precision.

preprint2021arXiv

Robust Dynamical Decoupling for the Manipulation of a Spin Network via a Single Spin

High-fidelity control of quantum systems is crucial for quantum information processing, but is often limited by perturbations from the environment and imperfections in the applied control fields. Here, we investigate the combination of dynamical decoupling (DD) and robust optimal control (ROC) to address this problem. In this combination, ROC is employed to find robust shaped pulses, wherein the directional derivatives of the controlled dynamics with respect to control errors are reduced to a desired order. Then, we incorporate ROC pulses into DD sequences, achieving a remarkable improvement of robustness against multiple error channels. We demonstrate this method in the example of manipulating nuclear spin bath via an electron spin in the NV center system. Simulation results indicate that ROC based DD sequences outperform the state-of-the-art robust DD sequences. Our work has implications for robust quantum control on near-term noisy quantum devices.

preprint2020arXiv

Combining the synergistic control capabilities of modelling and experiments: illustration of finding a minimum time quantum objective

A common way to manipulate a quantum system, for example spins or artificial atoms, is to use properly tailored control pulses. In order to accomplish quantum information tasks before coherence is lost, it is crucial to implement the control in the shortest possible time. Here we report the near time-optimal preparation of a Bell state with fidelity higher than $99\%$ in an NMR experiment, which is feasible by combining the synergistic capabilities of modelling and experiments operating in tandem. The pulses preparing the Bell state are found by experiments that are recursively assisted with a gradient-based optimization algorithm working with a model. Thus, we explore the interplay between model-based numerical optimal design and experimental-based learning control. Utilizing the balanced synergism between the dual approaches should have broad applications for accelerating the search for optimal quantum controls.

preprint2020arXiv

Contrastive Learning for Weakly Supervised Phrase Grounding

Phrase grounding, the problem of associating image regions to caption words, is a crucial component of vision-language tasks. We show that phrase grounding can be learned by optimizing word-region attention to maximize a lower bound on mutual information between images and caption words. Given pairs of images and captions, we maximize compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions. A key idea is to construct effective negative captions for learning through language model guided word substitutions. Training with our negatives yields a $\sim10\%$ absolute gain in accuracy over randomly-sampled negatives from the training data. Our weakly supervised phrase grounding model trained on COCO-Captions shows a healthy gain of $5.7\%$ to achieve $76.7\%$ accuracy on Flickr30K Entities benchmark.

preprint2020arXiv

FOCUS: Dealing with Label Quality Disparity in Federated Learning

Ubiquitous systems with End-Edge-Cloud architecture are increasingly being used in healthcare applications. Federated Learning (FL) is highly useful for such applications, due to silo effect and privacy preserving. Existing FL approaches generally do not account for disparities in the quality of local data labels. However, the clients in ubiquitous systems tend to suffer from label noise due to varying skill-levels, biases or malicious tampering of the annotators. In this paper, we propose Federated Opportunistic Computing for Ubiquitous Systems (FOCUS) to address this challenge. It maintains a small set of benchmark samples on the FL server and quantifies the credibility of the client local data without directly observing them by computing the mutual cross-entropy between performance of the FL model on the local datasets and that of the client local FL model on the benchmark dataset. Then, a credit weighted orchestration is performed to adjust the weight assigned to clients in the FL model based on their credibility values. FOCUS has been experimentally evaluated on both synthetic data and real-world data. The results show that it effectively identifies clients with noisy labels and reduces their impact on the model performance, thereby significantly outperforming existing FL approaches.

preprint2020arXiv

Joint Disentangling and Adaptation for Cross-Domain Person Re-Identification

Although a significant progress has been witnessed in supervised person re-identification (re-id), it remains challenging to generalize re-id models to new domains due to the huge domain gaps. Recently, there has been a growing interest in using unsupervised domain adaptation to address this scalability issue. Existing methods typically conduct adaptation on the representation space that contains both id-related and id-unrelated factors, thus inevitably undermining the adaptation efficacy of id-related features. In this paper, we seek to improve adaptation by purifying the representation space to be adapted. To this end, we propose a joint learning framework that disentangles id-related/unrelated features and enforces adaptation to work on the id-related feature space exclusively. Our model involves a disentangling module that encodes cross-domain images into a shared appearance space and two separate structure spaces, and an adaptation module that performs adversarial alignment and self-training on the shared appearance space. The two modules are co-designed to be mutually beneficial. Extensive experiments demonstrate that the proposed joint learning framework outperforms the state-of-the-art methods by clear margins.

preprint2020arXiv

NNV: The Neural Network Verification Tool for Deep Neural Networks and Learning-Enabled Cyber-Physical Systems

This paper presents the Neural Network Verification (NNV) software tool, a set-based verification framework for deep neural networks (DNNs) and learning-enabled cyber-physical systems (CPS). The crux of NNV is a collection of reachability algorithms that make use of a variety of set representations, such as polyhedra, star sets, zonotopes, and abstract-domain representations. NNV supports both exact (sound and complete) and over-approximate (sound) reachability algorithms for verifying safety and robustness properties of feed-forward neural networks (FFNNs) with various activation functions. For learning-enabled CPS, such as closed-loop control systems incorporating neural networks, NNV provides exact and over-approximate reachability analysis schemes for linear plant models and FFNN controllers with piecewise-linear activation functions, such as ReLUs. For similar neural network control systems (NNCS) that instead have nonlinear plant models, NNV supports over-approximate analysis by combining the star set analysis used for FFNN controllers with zonotope-based analysis for nonlinear plant dynamics building on CORA. We evaluate NNV using two real-world case studies: the first is safety verification of ACAS Xu networks and the second deals with the safety verification of a deep learning-based adaptive cruise control system.

preprint2020arXiv

Optimizing adiabatic quantum pathways via a learning algorithm

Designing proper time-dependent control fields for slowly varying the system to the ground state that encodes the problem solution is crucial for adiabatic quantum computation. However, inevitable perturbations in real applications demand us to accelerate the evolution so that the adiabatic errors can be prevented from accumulation. Here, by treating this trade-off task as a multiobjective optimization problem, we propose a gradient-free learning algorithm with pulse smoothing technique to search optimal adiabatic quantum pathways and apply it to the Landau-Zener Hamiltonian and Grover search Hamiltonian. Numerical comparisons with a linear schedule, local adiabatic theorem induced schedule, and gradient-based algorithm searched schedule reveal that the proposed method can achieve significant performance improvements in terms of the adiabatic time and the instantaneous ground-state population maintenance. The proposed method can be used to solve more complex and real adiabatic quantum computation problems.

preprint2020arXiv

PAMTRI: Pose-Aware Multi-Task Learning for Vehicle Re-Identification Using Highly Randomized Synthetic Data

In comparison with person re-identification (ReID), which has been widely studied in the research community, vehicle ReID has received less attention. Vehicle ReID is challenging due to 1) high intra-class variability (caused by the dependency of shape and appearance on viewpoint), and 2) small inter-class variability (caused by the similarity in shape and appearance between vehicles produced by different manufacturers). To address these challenges, we propose a Pose-Aware Multi-Task Re-Identification (PAMTRI) framework. This approach includes two innovations compared with previous methods. First, it overcomes viewpoint-dependency by explicitly reasoning about vehicle pose and shape via keypoints, heatmaps and segments from pose estimation. Second, it jointly classifies semantic vehicle attributes (colors and types) while performing ReID, through multi-task learning with the embedded pose representations. Since manually labeling images with detailed pose and attribute information is prohibitive, we create a large-scale highly randomized synthetic dataset with automatically annotated vehicle attributes for training. Extensive experiments validate the effectiveness of each proposed component, showing that PAMTRI achieves significant improvement over state-of-the-art on two mainstream vehicle ReID benchmarks: VeRi and CityFlow-ReID. Code and models are available at https://github.com/NVlabs/PAMTRI.

preprint2020arXiv

Probe optimization for quantum metrology via closed-loop learning control

Experimentally achieving the precision that standard quantum metrology schemes promise is always challenging. Recently, additional controls were applied to design feasible quantum metrology schemes. However, these approaches generally does not consider ease of implementation, raising technological barriers impeding its realization. In this paper, we circumvent this problem by applying closed-loop learning control to propose a practical controlled sequential scheme for quantum metrology. Purity loss of the probe state, which relates to quantum Fisher information, is measured efficiently as the fitness to guide the learning loop. We confirm its feasibility and certain superiorities over standard quantum metrology schemes by numerical analysis and proof-of-principle experiments in a nuclear magnetic resonance (NMR) system.

preprint2020arXiv

Reachability Analysis for Feed-Forward Neural Networks using Face Lattices

Deep neural networks have been widely applied as an effective approach to handle complex and practical problems. However, one of the most fundamental open problems is the lack of formal methods to analyze the safety of their behaviors. To address this challenge, we propose a parallelizable technique to compute exact reachable sets of a neural network to an input set. Our method currently focuses on feed-forward neural networks with ReLU activation functions. One of the primary challenges for polytope-based approaches is identifying the intersection between intermediate polytopes and hyperplanes from neurons. In this regard, we present a new approach to construct the polytopes with the face lattice, a complete combinatorial structure. The correctness and performance of our methodology are evaluated by verifying the safety of ACAS Xu networks and other benchmarks. Compared to state-of-the-art methods such as Reluplex, Marabou, and NNV, our approach exhibits a significantly higher efficiency. Additionally, our approach is capable of constructing the complete input set given an output set, so that any input that leads to safety violation can be tracked.

preprint2020arXiv

Reachable Set Estimation for Neural Network Control Systems: A Simulation-Guided Approach

The vulnerability of artificial intelligence (AI) and machine learning (ML) against adversarial disturbances and attacks significantly restricts their applicability in safety-critical systems including cyber-physical systems (CPS) equipped with neural network components at various stages of sensing and control. This paper addresses the reachable set estimation and safety verification problems for dynamical systems embedded with neural network components serving as feedback controllers. The closed-loop system can be abstracted in the form of a continuous-time sampled-data system under the control of a neural network controller. First, a novel reachable set computation method in adaptation to simulations generated out of neural networks is developed. The reachability analysis of a class of feedforward neural networks called multilayer perceptrons (MLP) with general activation functions is performed in the framework of interval arithmetic. Then, in combination with reachability methods developed for various dynamical system classes modeled by ordinary differential equations, a recursive algorithm is developed for over-approximating the reachable set of the closed-loop system. The safety verification for neural network control systems can be performed by examining the emptiness of the intersection between the over-approximation of reachable sets and unsafe sets. The effectiveness of the proposed approach has been validated with evaluations on a robotic arm model and an adaptive cruise control system.

preprint2020arXiv

Robust Multimodal Image Registration Using Deep Recurrent Reinforcement Learning

The crucial components of a conventional image registration method are the choice of the right feature representations and similarity measures. These two components, although elaborately designed, are somewhat handcrafted using human knowledge. To this end, these two components are tackled in an end-to-end manner via reinforcement learning in this work. Specifically, an artificial agent, which is composed of a combined policy and value network, is trained to adjust the moving image toward the right direction. We train this network using an asynchronous reinforcement learning algorithm, where a customized reward function is also leveraged to encourage robust image registration. This trained network is further incorporated with a lookahead inference to improve the registration capability. The advantage of this algorithm is fully demonstrated by our superior performance on clinical MR and CT image pairs to other state-of-the-art medical image registration methods.

preprint2020arXiv

Simulating Content Consistent Vehicle Datasets with Attribute Descent

This paper uses a graphic engine to simulate a large amount of training data with free annotations. Between synthetic and real data, there is a two-level domain gap, i.e., content level and appearance level. While the latter has been widely studied, we focus on reducing the content gap in attributes like illumination and viewpoint. To reduce the problem complexity, we choose a smaller and more controllable application, vehicle re-identification (re-ID). We introduce a large-scale synthetic dataset VehicleX. Created in Unity, it contains 1,362 vehicles of various 3D models with fully editable attributes. We propose an attribute descent approach to let VehicleX approximate the attributes in real-world datasets. Specifically, we manipulate each attribute in VehicleX, aiming to minimize the discrepancy between VehicleX and real data in terms of the Fréchet Inception Distance (FID). This attribute descent algorithm allows content domain adaptation (DA) orthogonal to existing appearance DA methods. We mix the optimized VehicleX data with real-world vehicle re-ID datasets, and observe consistent improvement. With the augmented datasets, we report competitive accuracy. We make the dataset, engine and our codes available at https://github.com/yorkeyao/VehicleX.

preprint2020arXiv

The 4th AI City Challenge

The AI City Challenge was created to accelerate intelligent video analysis that helps make cities smarter and safer. Transportation is one of the largest segments that can benefit from actionable insights derived from data captured by sensors, where computer vision and deep learning have shown promise in achieving large-scale practical deployment. The 4th annual edition of the AI City Challenge has attracted 315 participating teams across 37 countries, who leveraged city-scale real traffic data and high-quality synthetic data to compete in four challenge tracks. Track 1 addressed video-based automatic vehicle counting, where the evaluation is conducted on both algorithmic effectiveness and computational efficiency. Track 2 addressed city-scale vehicle re-identification with augmented synthetic data to substantially increase the training set for the task. Track 3 addressed city-scale multi-target multi-camera vehicle tracking. Track 4 addressed traffic anomaly detection. The evaluation system shows two leader boards, in which a general leader board shows all submitted results, and a public leader board shows results limited to our contest participation rules, that teams are not allowed to use external data in their work. The public leader board shows results more close to real-world situations where annotated data are limited. Our results show promise that AI technology can enable smarter and safer transportation systems.