Source author record

Yuan Gao

Yuan Gao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

70 works

48 topics

4 close collaborators

preprint, 2026, arXiv

RoboAlign-R1: Distilled Multimodal Reward Alignment for Robot Video World Models

Existing robot video world models are typically trained with low-level objectives such as reconstruction and perceptual similarity, which are poorly aligned with the capabilities that matter most for robot decision making, including instruction following, manipulation success, and physical plausibility. They also suffer from error accumulation in long-horizon autoregressive prediction. We present RoboAlign-R1, a framework that combines reward-aligned post-training with stabilized long-horizon inference for robot video world models. We construct RobotWorldBench, a benchmark of 10,000 annotated video-instruction pairs collected from four robot data sources, and train a multimodal teacher judge, RoboAlign-Judge, to provide fine-grained six-dimensional evaluation of generated videos. We then distill the teacher into a lightweight student reward model for efficient reinforcement-learning-based post-training. To reduce long-horizon rollout drift, we further introduce Sliding Window Re-encoding (SWR), a training-free inference strategy that periodically refreshes the generation context. Under our in-domain evaluation protocol, RoboAlign-R1 improves the aggregate six-dimension score by 10.1% over the strongest baseline, including gains of 7.5% on Manipulation Accuracy and 4.6% on Instruction Following; these ranking improvements are further supported by an external VLM-based cross-check and a blinded human study. Meanwhile, SWR improves long-horizon prediction quality with only about 1% additional latency, yielding a 2.8% gain in SSIM and a 9.8% reduction in LPIPS. Together, these results show that reward-aligned post-training and stabilized long-horizon decoding improve task consistency, physical realism, and long-horizon prediction quality in robot video world models.

preprint, 2026, arXiv

STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation

Deep generative models have advanced rapidly across text and vision, motivating unified multimodal systems that can understand, reason over, and generate interleaved text-image sequences. Most existing approaches combine autoregressive language modeling with diffusion-based image generators, inheriting a structural mismatch between causal text generation and iterative visual denoising. We observe that autoregressive normalizing flows are autoregressive Transformers--sharing the same causal mask, KV-cache mechanism, and left-to-right structure as LLMs--making them the most natural paradigm for true unified multimodal generation. We present STARFlow2, built on the Pretzel architecture that vertically interleaves a pretrained VLM stream with a TarFlow stream via residual skip connections, both operating under the same causal mask. Combined with a deep-shallow flow design and a unified FAE latent space, STARFlow2 enables cache-friendly interleaved generation where both text and visual outputs directly enter the KV-cache without re-encoding. Experiments demonstrate strong performance across image generation and multimodal understanding benchmarks, validating autoregressive flows as a viable foundation for unified multimodal modeling.

preprint, 2024, arXiv

Magnon Damping Minimum and Logarithmic Scaling in a Kondo-Heisenberg Model

Recently, an anomalous temperature evolution of spin wave excitations has been observed in a van der Waals metallic ferromagnet Fe$_3$GeTe$_2$ (FGT) [S. Bao, et al., Phys. Rev. X 12, 011022 (2022)], whose theoretical understanding remains elusive. Here we study the spin dynamics of a ferromagnetic Kondo-Heisenberg lattice model at finite temperature, and propose a mechanism of magnon damping that explains the intriguing experimental results. In particular, we find the magnon damping rate $\gamma(T)$ first decreases as the temperature is lowered, due to reduced magnon-magnon scattering. It then reaches a minimum at $T_{\rm d}^*$, and rises again following a logarithmic scaling $\gamma(T) \sim \ln{(T_0/T)}$ (with $T_0$ a constant) for $T < T_{\rm d}^*$, which can be attributed to electron-magnon scattering of spin-flip type. Moreover, we obtain the phase diagram containing the ferromagnetic and Kondo insulator phases by varying the Kondo coupling, which may be relevant for experiments on FGT under pressure. The presence of a magnon damping minimum and logarithmic scaling at low temperature indicates the emergence of the Kondo effect reflected in the collective excitations of local moments in a Kondo lattice system.

preprint, 2024, arXiv

MvKSR: Multi-view Knowledge-guided Scene Recovery for Hazy and Rainy Degradation

High-quality imaging is crucial for ensuring safety supervision and intelligent deployment in fields like transportation and industry. It enables precise and detailed monitoring of operations, facilitating timely detection of potential hazards and efficient management. However, adverse weather conditions, such as atmospheric haziness and precipitation, can have a significant impact on image quality. When the atmosphere contains dense haze or water droplets, the incident light scatters, leading to degraded captured images. This degradation is evident in the form of image blur and reduced contrast, increasing the likelihood of incorrect assessments and interpretations by intelligent imaging systems (IIS). To address the challenge of restoring degraded images in hazy and rainy conditions, this paper proposes a novel multi-view knowledge-guided scene recovery network (termed MvKSR). Specifically, guided filtering is performed on the degraded image to separate high/low-frequency components. Subsequently, an encoder-decoder-based multi-view feature coarse extraction module (MCE) is used to coarsely extract features from different views of the degraded image. The multi-view feature fine fusion module (MFF) learns and infers the restoration of degraded images through mixed supervision under different views. Additionally, we suggest an atrous residual block to handle global restoration and local repair in hazy/rainy/mixed scenes. Extensive experimental results demonstrate that MvKSR outperforms other state-of-the-art methods in terms of efficiency and stability for restoring degraded scenarios in IIS.

preprint, 2023, arXiv

Convergence of Extragradient SVRG for Variational Inequalities: Error Bounds and Increasing Iterate Averaging

We study the last-iterate convergence of variance reduction methods for extragradient (EG) algorithms for a class of variational inequalities satisfying error-bound conditions. Previously, last-iterate linear convergence was only known under strong monotonicity. We show that EG algorithms with SVRG-style variance reduction, denoted SVRG-EG, attain last-iterate linear convergence under a general error-bound condition much weaker than strong monotonicity. This condition captures a broad class of non-strongly monotone problems, such as bilinear saddle-point problems commonly encountered in two-player zero-sum Nash equilibrium computation. Next, we establish linear last-iterate convergence of SVRG-EG with an improved guarantee under the weak sharpness assumption. Furthermore, motivated by the empirical efficiency of increasing iterate averaging techniques in solving saddle-point problems, we also establish new convergence results for SVRG-EG with such techniques.
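As a rough illustration of why extragradient steps matter on the bilinear saddle-point problems mentioned above, the sketch below runs plain, deterministic extragradient on the scalar problem min_x max_y x*y and compares it with simultaneous gradient descent-ascent. This is not the SVRG-EG method from the paper: it omits variance reduction and iterate averaging entirely, and the function names, step size and iteration count are assumptions chosen for illustration.

```python
import math


def operator(x, y):
    """Monotone operator F(x, y) = (d/dx, -d/dy) of f(x, y) = x * y."""
    return y, -x


def extragradient(x, y, step=0.5, iters=200):
    """Plain extragradient: extrapolate, then update with the extrapolated operator value."""
    for _ in range(iters):
        gx, gy = operator(x, y)
        hx, hy = x - step * gx, y - step * gy      # extrapolation (half) step
        gx, gy = operator(hx, hy)
        x, y = x - step * gx, y - step * gy        # update from the original point
    return x, y


def gradient_descent_ascent(x, y, step=0.5, iters=200):
    """Baseline without extrapolation; it spirals outward on this problem."""
    for _ in range(iters):
        gx, gy = operator(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


eg_norm = math.hypot(*extragradient(1.0, 1.0))
gda_norm = math.hypot(*gradient_descent_ascent(1.0, 1.0))
print(f"extragradient distance to saddle point: {eg_norm:.3e}")
print(f"descent-ascent distance to saddle point: {gda_norm:.3e}")
```

For this scalar problem each extragradient step multiplies the distance to the saddle point (0, 0) by sqrt(1 - step^2 + step^4), which is below 1 for 0 < step < 1, whereas each descent-ascent step multiplies it by sqrt(1 + step^2).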

preprint, 2022, arXiv

1st Place Solution for YouTubeVOS Challenge 2022: Referring Video Object Segmentation

The task of referring video object segmentation aims to segment the object in the frames of a given video to which the referring expressions refer. Previous methods adopt multi-stage approaches and design complex pipelines to obtain promising results. Recently, end-to-end methods based on Transformers have proved their superiority. In this work, we draw on the advantages of the above methods to provide a simple and effective pipeline for RVOS. Firstly, we improve the state-of-the-art one-stage method ReferFormer to obtain mask sequences that are strongly correlated with language descriptions. Secondly, based on a reliable and high-quality keyframe, we leverage the superior performance of a video object segmentation model to further enhance the quality and temporal consistency of the mask results. Our single model reaches 70.3 J&F on the Referring Youtube-VOS validation set and 63.0 on the test set. After ensemble, we achieve 64.1 on the final leaderboard, ranking 1st place in the CVPR 2022 Referring Youtube-VOS challenge. Code will be available at https://github.com/Zhiweihhh/cvpr2022-rvos-challenge.git.

preprint, 2022, arXiv

A New Combinatorial Property of Geometric Unique Sink Orientations

A unique sink orientation (USO) is an orientation of the hypercube graph with the property that every face has a unique sink. A number of well-studied problems reduce in strongly polynomial time to finding the global sink of a USO; most notably, linear programming (LP) and the P-matrix linear complementarity problem (P-LCP). The former is not known to have a strongly polynomial-time algorithm, while the latter is not known to even have a polynomial-time algorithm, motivating the problem of finding the global sink of a USO. However, every known class of geometric USOs, arising from a concrete problem such as LP, is exponentially small relative to the class of all USOs. Accordingly, geometric USOs exhibit additional properties that set them apart from general USOs, and it may be advantageous, if not necessary, to leverage these properties to find the global sink of a USO faster. Only a few such properties are known. In this paper, we establish a new combinatorial property of the USOs that arise from symmetric P-LCP, which includes the USOs that arise from linear and simple convex quadratic programming.
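The definition of a USO given above can be checked by brute force on small hypercubes. The sketch below is only an illustration of that definition, not anything from the paper: the encoding (vertices as integers, `outgoing[v]` as the set of dimensions whose edge leaves `v`) and the function name `is_uso` are assumptions, and the running time is exponential in the dimension.

```python
from itertools import combinations


def is_uso(n, outgoing):
    """Check that an orientation of the n-cube has a unique sink in every face.

    Vertices are the integers 0 .. 2**n - 1. outgoing[v] is the set of
    dimensions i whose edge {v, v ^ (1 << i)} is oriented away from v.
    """
    vertices = range(1 << n)
    # Each edge must be outgoing at exactly one of its two endpoints.
    for v in vertices:
        for i in range(n):
            if (i in outgoing[v]) == (i in outgoing[v ^ (1 << i)]):
                return False
    # A face is a set of free dimensions plus fixed values on the other bits.
    for k in range(1, n + 1):
        for dims in combinations(range(n), k):
            mask = sum(1 << i for i in dims)
            sinks_per_face = {}
            for v in vertices:
                if not any(i in outgoing[v] for i in dims):
                    face = v & ~mask          # fixed bits identify the face
                    sinks_per_face[face] = sinks_per_face.get(face, 0) + 1
            if len(sinks_per_face) != 1 << (n - k):
                return False                   # some face has no sink
            if any(count != 1 for count in sinks_per_face.values()):
                return False                   # some face has several sinks
    return True


# Every edge points toward the endpoint whose bit is 0, so 0 is the global sink.
uniform = {v: {i for i in range(3) if v >> i & 1} for v in range(8)}
# A directed 4-cycle 0 -> 1 -> 3 -> 2 -> 0 on the square has no sink at all.
cycle = {0: {0}, 1: {1}, 3: {0}, 2: {1}}

print(is_uso(3, uniform))
print(is_uso(2, cycle))
```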

preprint, 2022, arXiv

Asymptotic stability for diffusion with dynamic boundary reaction from Ginzburg-Landau energy

The nonequilibrium process in dislocation dynamics and its relaxation to the metastable transition profile is crucial for understanding the plastic deformation caused by line defects in materials. In this paper, we consider the full dynamics of a scalar dislocation model in two dimensions described by the bulk diffusion equation coupled with a dynamic boundary condition on the interface, where a nonconvex misfit potential, due to the presence of dislocation, yields an interfacial reaction term on the interface. We prove that the dynamic solution to this bulk-interface coupled system converges uniformly to the metastable transition profile, which has bi-states with a fat-tail decay rate at the far fields. This global stability for the metastable pattern is the first result for a bulk-interface coupled dynamics driven only by an interfacial reaction on the slip plane.

preprint, 2022, arXiv

Bidding Agent Design in the LinkedIn Ad Marketplace

We establish a general optimization framework for the design of automated bidding agents in dynamic online marketplaces. It optimizes solely for the buyer's interest and is agnostic to the auction mechanism imposed by the seller. As a result, the framework allows, for instance, the joint optimization of a group of ads across multiple platforms, each running its own auction format. A bidding strategy derived from this framework automatically guarantees the optimality of budget allocation across ad units and platforms. Common constraints such as budget delivery schedule, return on investment and guaranteed results translate directly to additional parameters in the bidding formula. We share practical learnings from the deployed bidding system in the LinkedIn ad marketplace based on this framework.

preprint, 2022, arXiv

Exploring High-quality Target Domain Information for Unsupervised Domain Adaptive Semantic Segmentation

In unsupervised domain adaptive (UDA) semantic segmentation, distillation-based methods are currently dominant in performance. However, the distillation technique requires a complicated multi-stage process and many training tricks. In this paper, we propose a simple yet effective method that achieves performance competitive with the advanced distillation methods. Our core idea is to fully explore the target-domain information from the views of boundaries and features. First, we propose a novel mix-up strategy to generate high-quality target-domain boundaries with ground-truth labels. Different from the source-domain boundaries in previous works, we select the high-confidence target-domain areas and then paste them onto the source-domain images. Such a strategy can generate the object boundaries in the target domain (edges of target-domain object areas) with the correct labels. Consequently, the boundary information of the target domain can be effectively captured by learning on the mixed-up samples. Second, we design a multi-level contrastive loss to improve the representation of target-domain data, including pixel-level and prototype-level contrastive learning. By combining the two proposed methods, more discriminative features can be extracted and hard object boundaries can be better addressed for the target domain. The experimental results on two commonly adopted benchmarks (i.e., GTA5 $\rightarrow$ Cityscapes and SYNTHIA $\rightarrow$ Cityscapes) show that our method achieves performance competitive with complicated distillation methods. Notably, for the SYNTHIA $\rightarrow$ Cityscapes scenario, our method achieves state-of-the-art performance with $57.8\%$ mIoU and $64.6\%$ mIoU on 16 classes and 13 classes, respectively. Code is available at https://github.com/ljjcoder/EHTDI.

preprint, 2022, arXiv

Factor-augmented model for functional data

We propose modeling raw functional data as a mixture of a smooth function and a high-dimensional factor component. The conventional approach to retrieving the smooth function from the raw data is through various smoothing techniques. However, the smoothing model is inadequate to recover the smooth curve or capture the data variation in some situations. These include cases where there is a large amount of measurement error, the smoothing basis functions are incorrectly identified, or the step jumps in the functional mean levels are neglected. A factor-augmented smoothing model is proposed to address these challenges, and an iterative numerical estimation approach is implemented in practice. Including the factor model component in the proposed method solves the aforementioned problems since a few common factors often drive the variation that cannot be captured by the smoothing model. Asymptotic theorems are also established to demonstrate the effects of including factor structures on the smoothing results. Specifically, we show that the smoothing coefficients projected on the complement space of the factor loading matrix are asymptotically normal. As a byproduct of independent interest, an estimator for the population covariance matrix of the raw data is presented based on the proposed model. Extensive simulation studies illustrate that these factor adjustments are essential in improving estimation accuracy and avoiding the curse of dimensionality. The superiority of our model is also shown in modeling Australian temperature data.

preprint, 2022, arXiv

Finding Dynamics Preserving Adversarial Winning Tickets

Modern deep neural networks (DNNs) are vulnerable to adversarial attacks, and adversarial training has been shown to be a promising method for improving the adversarial robustness of DNNs. Pruning methods have been considered in the adversarial context to reduce model capacity and improve adversarial robustness simultaneously in training. Existing adversarial pruning methods generally mimic the classical pruning methods for natural training, which follow the three-stage 'training-pruning-fine-tuning' pipelines. We observe that such pruning methods do not necessarily preserve the dynamics of dense networks, making it potentially hard to fine-tune the pruned network to compensate for the accuracy degradation caused by pruning. Based on recent works on the Neural Tangent Kernel (NTK), we systematically study the dynamics of adversarial training and prove the existence of a trainable sparse sub-network at initialization which can be trained to be adversarially robust from scratch. This theoretically verifies the lottery ticket hypothesis in the adversarial context, and we refer to such a sub-network structure as an Adversarial Winning Ticket (AWT). We also show empirical evidence that AWT preserves the dynamics of adversarial training and achieves performance equal to that of dense adversarial training.

preprint, 2022, arXiv

Hermite-Gaussian-mode coherently composed states and deep learning based free-space optical communication link

In laser-based free-space optical communication, besides OAM beams, Hermite-Gaussian (HG) modes or HG-mode coherently composed states (HG-MCCS) can also be adopted as the information carrier to extend the channel capacity with the spatial pattern based encoding and decoding link. The light field of HG-MCCS is mainly determined by three independent parameters, including indexes of HG modes, relative initial phases between two eigenmodes, and scale coefficients of the eigenmodes, which can obtain a large number of effective coding modes at a low mode order. The beam intensity distributions of the HG-MCCSs have obvious distinguishable spatial characteristics and can keep propagation invariance, which are convenient to be decoded by the convolutional neural network (CNN) based image recognition method. We experimentally utilize HG-MCCS to realize a communication link including encoding, transmission under atmospheric turbulence (AT), and decoding based on CNN. With the index order of eigenmodes within six, 125 HG-MCCS are generated and used for information encoding, and the average recognition accuracy reached 99.5% for non-AT conditions. For the 125-level color images transmission, the error rate of the system is less than 1.8% even under the weak AT condition. Our work provides a useful basis for the future combination of dense data communication and artificial intelligence technology.

preprint, 2022, arXiv

Invariant Filtering for Legged Humanoid Locomotion on Dynamic Rigid Surfaces

State estimation for legged locomotion over a dynamic rigid surface (DRS), which is a rigid surface moving in the world frame (e.g., ships, aircraft, and trains), remains an under-explored problem. This paper introduces an invariant extended Kalman filter that estimates the robot's pose and velocity during DRS locomotion by using common sensors of legged robots (e.g., inertial measurement units (IMU), joint encoders, and RGB-D camera). A key feature of the filter is that it explicitly addresses the nonstationary surface-foot contact point and the hybrid robot behaviors. Another key feature is that, in the absence of IMU biases, the filter satisfies the attractive group affine and invariant observation conditions, and is thus provably convergent for the deterministic continuous phases. The observability analysis is performed to reveal the effects of DRS movement on the state observability, and the convergence property of the hybrid, deterministic filter system is examined for the observable state variables. Experiments of a Digit humanoid robot walking on a pitching treadmill validate the effectiveness of the proposed filter under large estimation errors and moderate DRS movement. The video of the experiments can be found at: https://youtu.be/ScQIBFUSKzo.

preprint, 2022, arXiv

Latent-Variable Advantage-Weighted Policy Optimization for Offline RL

Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions. This setting is particularly well-suited for continuous control robotic applications for which online data collection based on trial-and-error is costly and potentially unsafe. In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios, such as data from several human demonstrators or from policies that act with different purposes. Unfortunately, such datasets can exacerbate the distribution shift between the behavior policy underlying the data and the optimal policy to be learned, leading to poor performance. To address this challenge, we propose to leverage latent-variable policies that can represent a broader class of policy distributions, leading to better adherence to the training data distribution while maximizing reward via a policy over the latent variable. As we empirically show on a range of simulated locomotion, navigation, and manipulation tasks, our method, referred to as latent-variable advantage-weighted policy optimization (LAPO), improves on the average performance of the next best-performing offline reinforcement learning methods by 49% on heterogeneous datasets, and by 8% on datasets with narrow and biased distributions.

preprint, 2022, arXiv

Manipulating propagation and evolution of polarization singularities in composite Bessel-like fields

Structured optical fields embedded with polarization singularities (PSs) have attracted extensive attention due to their capability to retain topological invariance during propagation. Many advances in PSs research have been made over the past 20 years in the areas of mathematical description, generation and detection technologies, propagation dynamics, and applications. However, one of the most crucial and difficult tasks continues to be manipulating PSs with multiple degrees of freedom, especially in three-dimensional (3D) tailored optical fields. We propose and demonstrate the longitudinal PS lines obtained by superimposing Bessel-like modes with orthogonal polarization states on composite vector optical fields (VOFs). The embedded PSs in the fields can be manipulated to propagate robustly along arbitrary trajectories, or to annihilate, revive, and transform each other at on-demand positions in 3D space, allowing complex PSs topological morphology and intensity pattern to be flexibly customized. Our findings could spur further research into singular optics and help with applications such as micromanipulation, microstructure fabrication, and optical encryption.

preprint, 2022, arXiv

Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering

Video question answering (VideoQA) is challenging given its multimodal combination of visual understanding and natural language processing. While most existing approaches ignore the visual appearance-motion information at different temporal scales, it is unknown how to incorporate the multilevel processing capacity of a deep learning model with such multiscale information. Targeting these issues, this paper proposes a novel Multilevel Hierarchical Network (MHN) with multiscale sampling for VideoQA. MHN comprises two modules, namely Recurrent Multimodal Interaction (RMI) and Parallel Visual Reasoning (PVR). With multiscale sampling, RMI iterates the interaction of appearance-motion information at each scale and the question embeddings to build the multilevel question-guided visual representations. Thereon, with a shared transformer encoder, PVR infers the visual cues at each level in parallel to fit with answering different question types that may rely on the visual information at relevant levels. Through extensive experiments on three VideoQA datasets, we demonstrate improved performance over previous state-of-the-art methods and justify the effectiveness of each part of our method.

preprint, 2022, arXiv

Physarum Inspired Dynamics to Solve Semi-Definite Programs

Physarum Polycephalum is a slime mold that can solve shortest path problems. A mathematical model based on Physarum's behavior, known as the Physarum Directed Dynamics, can solve positive linear programs. In this paper, we present a family of Physarum-based dynamics extending the previous work and introduce a new algorithm to solve positive Semi-Definite Programs (SDP). The Physarum dynamics are governed by orthogonal projections (w.r.t. time-dependent scalar products) on the affine subspace defined by the linear constraints. We present a natural generalization of the scalar products used in the LP case to the matrix space for SDPs, which boils down to the linear case when all matrices in the SDP are diagonal, thus, representing an LP. We investigate the behavior of the induced dynamics theoretically and experimentally, highlight challenges arising from the non-commutative nature of matrix products, and prove soundness and convergence under mild conditions. Moreover, we consider a more abstract view on the dynamics that suggests a slight variation to guarantee unconditional soundness and convergence-to-optimality. By simulating these dynamics using suitable discretizations, one obtains numerical algorithms for solving positive SDPs, which have applications in discrete optimization, e.g., for computing the Goemans-Williamson approximation for MaxCut or the Lovasz theta number for determining the clique/chromatic number in perfect graphs.

preprint, 2022, arXiv

Rabinowitz Fukaya categories and the categorical formal punctured neighborhood of infinity

This paper constructs and studies the Rabinowitz (wrapped) Fukaya category, a categorical invariant of exact cylindrical Lagrangians in a Liouville manifold whose cohomological morphisms, "Rabinowitz wrapped Floer homology groups", measure the failure of wrapped Floer cohomology to satisfy Poincare duality (and in particular vanish for any pair with at least one compact Lagrangian). Our main result, answering a conjecture of Abouzaid, relates the Rabinowitz and usual wrapped Fukaya category by way of a general construction introduced by Efimov, the categorical formal punctured neighborhood of infinity. As an application, we show how Rabinowitz Fukaya categories can be fit into - and in particular often computed in terms of - mirror symmetry.

preprint, 2022, arXiv

Rumor Detection with Self-supervised Learning on Texts and Social Graph

Rumor detection has become an emerging and active research field in recent years. At the core is to model the rumor characteristics inherent in rich information, such as propagation patterns in social network and semantic patterns in post content, and differentiate them from the truth. However, existing works on rumor detection fall short in modeling heterogeneous information, either using one single information source only (e.g. social network, or post content) or ignoring the relations among multiple sources (e.g. fusing social and content features via simple concatenation). Therefore, they possibly have drawbacks in comprehensively understanding the rumors, and detecting them accurately. In this work, we explore contrastive self-supervised learning on heterogeneous information sources, so as to reveal their relations and characterize rumors better. Technically, we supplement the main supervised task of detection with an auxiliary self-supervised task, which enriches post representations via post self-discrimination. Specifically, given two heterogeneous views of a post (i.e. representations encoding social patterns and semantic patterns), the discrimination is done by maximizing the mutual information between different views of the same post compared to that of other posts. We devise cluster-wise and instance-wise approaches to generate the views and conduct the discrimination, considering different relations of information sources. We term this framework as Self-supervised Rumor Detection (SRD). Extensive experiments on three real-world datasets validate the effectiveness of SRD for automatic rumor detection on social media.

preprint, 2022, arXiv

Some results on locally repairable codes with minimum distance $7$ and locality $2$

Locally repairable codes (LRCs) play important roles in distributed storage systems (DSS). LRCs with small locality have their own advantages since fewer available symbols are needed in the recovery of erased symbols. In this paper, we prove an upper bound on the dimension of LRCs with minimum distance $d\geq 7$. We also prove an upper bound of $q^2+q+3$ on the length of almost optimal LRCs with $d=7$, $r=2$. Then, based on the $t$-spread structure, we give an algorithm to construct almost optimal LRCs with $d=7$, $r=2$ and length $n\geq 3\lceil\frac{\sqrt{2}q}{3}\rceil$ when $q\geq 4$, whose dimension attains the aforementioned upper bound.

preprint, 2022, arXiv

Spin Supersolidity in Nearly Ideal Easy-axis Triangular Quantum Antiferromagnet Na$_2$BaCo(PO$_4$)$_2$

Prototypical models and their material incarnations are cornerstones to the understanding of quantum magnetism. Here we show theoretically that the recently synthesized magnetic compound Na$_2$BaCo(PO$_4$)$_2$ (NBCP) is a rare, nearly ideal material realization of the $S=1/2$ triangular-lattice antiferromagnet with significant easy-axis spin exchange anisotropy. By combining automatic parameter searching and tensor-network simulations, we establish a microscopic model description of this material with realistic model parameters, which not only fits the experimental thermodynamic data well but also reproduces the measured magnetization curves without further adjustment of parameters. According to the established model, NBCP hosts a spin supersolid state that breaks both the lattice translation symmetry and the spin rotational symmetry. Such a state is a spin analogue of the long-sought supersolid state, thought to exist in solid Helium and optical lattice systems, and shares similar traits. NBCP therefore represents an ideal material-based platform to explore the physics of supersolidity as well as its quantum and thermal melting.

preprint, 2022, arXiv

Towards Autonomous Atlas-based Ultrasound Acquisitions in Presence of Articulated Motion

Robotic ultrasound (US) imaging aims at overcoming some of the limitations of free-hand US examinations, e.g. difficulty in guaranteeing intra- and inter-operator repeatability. However, due to anatomical and physiological variations between patients and relative movement of anatomical substructures, it is challenging to robustly generate optimal trajectories to examine the anatomies of interest, in particular, when they comprise articulated joints. To address this challenge, this paper proposes a vision-based approach allowing autonomous robotic US limb scanning. To this end, an atlas MRI template of a human arm with annotated vascular structures is used to generate trajectories and register and project them onto patients' skin surfaces for robotic US acquisition. To effectively segment and accurately reconstruct the targeted 3D vessel, we make use of spatial continuity in consecutive US frames by incorporating channel attention modules into a U-Net-type neural network. The automatic trajectory generation method is evaluated on six volunteers with various articulated joint angles. In all cases, the system can successfully acquire the planned vascular structure on volunteers' limbs. For one volunteer the MRI scan was also available, which allows the evaluation of the average radius of the scanned artery from US images, resulting in a radius estimation ($1.2\pm0.05~mm$) comparable to the MRI ground truth ($1.2\pm0.04~mm$).

preprint, 2022, arXiv

Versatile Non-diffracting Perfect Vortex Beams

The rapid scale broadening and increasing divergence of vortex beams (VBs) with orbital angular momentum (OAM), e.g., Laguerre-Gaussian beams, severely impede the wide applications of VBs ranging from optical manipulation to high-dimensional quantum information communications, which call for VBs to have the same transverse scale and divergence for distinct OAM or even a small vortex ring for large OAM. Non-diffracting beams, on the other hand, which are capable of overcoming diffraction without divergence, are appealing in numerous applications including atom optics and medical imaging. Here, we propose theoretically and demonstrate experimentally a brand-new type of VB that has OAM-independent radii while remaining propagation-invariant without divergence and possessing self-healing properties, named the non-diffracting perfect vortex beam (NDPVB). We work out a versatile toolkit based on Fourier-space analysis to multidimensionally customize NDPVBs at will, so that their propagating intensity and phase are controllable, with intriguing customizable behaviors of self-accelerating, self-similar, and self-rotating. This goes beyond tailoring the transverse plane to the higher-dimensional propagating characteristics in structured light beams. A deeper insight into the internal flow revealed and confirmed that the multidimensional customization of NDPVBs is dominated by inducing corresponding multidimensional internal flow, facilitating our understanding of how our design scheme of propagating properties manipulates the internal flows, unveiling the nature of structure formation and behavior transformation of structured light beams.

preprint2022arXiv

VesNet-RL: Simulation-based Reinforcement Learning for Real-World US Probe Navigation

Ultrasound (US) is one of the most common medical imaging modalities since it is radiation-free, low-cost, and real-time. In freehand US examinations, sonographers often navigate a US probe to visualize standard examination planes with rich diagnostic information. However, reproducibility and stability of the resulting images often suffer from intra- and inter-operator variation. Reinforcement learning (RL), as an interaction-based learning method, has demonstrated its effectiveness in visual navigation tasks; however, RL is limited in terms of generalization. To address this challenge, we propose a simulation-based RL framework for real-world navigation of US probes towards the standard longitudinal views of vessels. A UNet is used to provide binary masks from US images; thereby, the RL agent trained on simulated binary vessel images can be applied in real scenarios without further training. To accurately characterize actual states, a multi-modality state representation structure is introduced to facilitate the understanding of environments. Moreover, considering the characteristics of vessels, a novel standard view recognition approach based on the minimum bounding rectangle is proposed to terminate the searching process. To evaluate the effectiveness of the proposed method, the trained policy is validated virtually on 3D volumes of a volunteer's in-vivo carotid artery, and physically on custom-designed gel phantoms using robotic US. The results demonstrate that the proposed approach can effectively and accurately navigate the probe towards the longitudinal view of vessels.

preprint2022arXiv

Wormhole MAML: Meta-Learning in Glued Parameter Space

In this paper, we introduce a novel variation of model-agnostic meta-learning, where an extra multiplicative parameter is introduced in the inner-loop adaptation. Our variation creates a shortcut in the parameter space for the inner-loop adaptation and increases model expressivity in a highly controllable manner. We show both theoretically and numerically that our variation alleviates the problem of conflicting gradients and improves training dynamics. We conduct experiments on 3 distinctive problems, including a toy classification problem for threshold comparison, a regression problem for wavelet transform, and a classification problem on MNIST. We also discuss ways to generalize our method to a broader class of problems.

preprint2021arXiv

A Multiobjective State Transition Algorithm Based on Decomposition

Aggregation functions largely determine the convergence and diversity performance of multi-objective evolutionary algorithms in decomposition methods. Nevertheless, the traditional Tchebycheff function does not consider the matching relationship between the weight vectors and candidate solutions. In this paper, the concept of matching degree is proposed, which employs vectorial angles between weight vectors and candidate solutions. Based on the matching degree, a new modified Tchebycheff aggregation function is proposed, which integrates the matching degree into the Tchebycheff aggregation function. Moreover, the proposed decomposition method has the same functionality as the Tchebycheff aggregation function. Based on the proposed decomposition approach, a new multiobjective optimization algorithm named decomposition-based multi-objective state transition algorithm is proposed. Relevant experimental results show that the proposed algorithm is highly competitive in comparison with other state-of-the-art multiobjective optimization algorithms.
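As a rough illustration of the two ingredients named in this abstract, the sketch below computes the standard Tchebycheff aggregation value and the angle between a weight vector and a candidate's objective vector. The function names, the use of the translated vector f - z*, and the toy numbers are assumptions made for illustration. The abstract does not say how the matching degree is combined with the Tchebycheff function, so that step is deliberately left out.

```python
import math

def tchebycheff(f, w, z_star):
    """Standard Tchebycheff aggregation: max_i w_i * |f_i - z*_i|."""
    return max(wi * abs(fi - zi) for fi, wi, zi in zip(f, w, z_star))

def angle_between(f, w, z_star):
    """Angle in radians between weight vector w and the translated
    objective vector f - z* (an assumed form of the 'vectorial angle')."""
    d = [fi - zi for fi, zi in zip(f, z_star)]
    dot = sum(di * wi for di, wi in zip(d, w))
    norm_d = math.sqrt(sum(di * di for di in d))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_d == 0.0 or norm_w == 0.0:
        return 0.0
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (norm_d * norm_w)))
    return math.acos(cos)

# Hypothetical two-objective example.
z_star = [0.0, 0.0]      # ideal point
w = [0.5, 0.5]           # weight vector of one subproblem
a = [1.0, 1.0]           # candidate aligned with w
b = [0.2, 1.8]           # candidate far from the direction of w

g_a, g_b = tchebycheff(a, w, z_star), tchebycheff(b, w, z_star)
angle_a, angle_b = angle_between(a, w, z_star), angle_between(b, w, z_star)
```

A smaller angle indicates a candidate that is better matched to the subproblem's weight vector; here candidate `a` has both the smaller aggregation value and the smaller angle.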

preprint2021arXiv

Factor-augmented Smoothing Model for Functional Data

We propose modeling raw functional data as a mixture of a smooth function and a high-dimensional factor component. The conventional approach to retrieving the smooth function from the raw data is through various smoothing techniques. However, the smoothing model is not adequate to recover the smooth curve or capture the data variation in some situations. These include cases where there is a large amount of measurement error, the smoothing basis functions are incorrectly identified, or the step jumps in the functional mean levels are neglected. To address these challenges, a factor-augmented smoothing model is proposed, and an iterative numerical estimation approach is implemented in practice. Including the factor model component in the proposed method solves the aforementioned problems since a few common factors often drive the variation that cannot be captured by the smoothing model. Asymptotic theorems are also established to demonstrate the effects of including factor structures on the smoothing results. Specifically, we show that the smoothing coefficients projected on the complement space of the factor loading matrix are asymptotically normal. As a byproduct of independent interest, an estimator for the population covariance matrix of the raw data is presented based on the proposed model. Extensive simulation studies illustrate that these factor adjustments are essential in improving estimation accuracy and avoiding the curse of dimensionality. The superiority of our model is also shown in modeling Canadian weather data and Australian temperature data.

preprint2021arXiv

High-performance green and blue quantum-dot light-emitting diodes with eliminated charge leakage

Quantum-dot light-emitting diodes (QD-LEDs) promise a new generation of efficient, low-cost, large-area, and flexible electroluminescent devices. However, the inferior performance of green and blue QD-LEDs is hindering the commercialization of QD-LEDs in display and solid-state lighting. Here, we demonstrate best-performing green and blue QD-LEDs with ~100% conversion of the injected charge carriers into emissive excitons. Key to this success is eliminating electron leakage at the organic/inorganic interface by using hole-transport polymers with low electron affinity and reduced energetic disorder. Our devices exhibit record-high peak external quantum efficiencies (28.7% for green, 21.9% for blue), exceptionally high efficiencies in wide ranges of luminance, and unprecedented stability (T95 lifetime: 580,000 h for green, 4,400 h for blue). The overall performance surpasses previously reported solution-processed green and blue LEDs.

preprint2021arXiv

Leveraging Activity Recognition to Enable Protective Behavior Detection in Continuous Data

Protective behavior exhibited by people with chronic pain (CP) during physical activities is the key to understanding their physical and emotional states. Existing automatic protective behavior detection (PBD) methods rely on pre-segmentation of activities predefined by users. However, in real life, people perform activities casually. Therefore, where those activities present difficulties for people with chronic pain, technology-enabled support should be delivered continuously and automatically adapted to activity type and occurrence of protective behavior. Hence, to facilitate ubiquitous CP management, it becomes critical to enable accurate PBD over continuous data. In this paper, we propose to integrate human activity recognition (HAR) with PBD via a novel hierarchical HAR-PBD architecture comprising graph-convolution and long short-term memory (GC-LSTM) networks, and alleviate class imbalances using a class-balanced focal categorical-cross-entropy (CFCC) loss. Through in-depth evaluation of the approach using a CP patients' dataset, we show that the leveraging of HAR, GC-LSTM networks, and CFCC loss leads to a clear increase in PBD performance against the baseline (macro F1 score of 0.81 vs. 0.66 and precision-recall area-under-the-curve (PR-AUC) of 0.60 vs. 0.44). We conclude by discussing possible use cases of the hierarchical architecture in CP management and beyond. We also discuss current limitations and ways forward.

preprint2021arXiv

Partial FC: Training 10 Million Identities on a Single Machine

Face recognition has long been an active and vital topic in the computer vision community. Previous research mainly focuses on loss functions used for the facial feature extraction network, among which improvements to softmax-based loss functions have greatly promoted the performance of face recognition. However, the contradiction between the drastically increasing number of face identities and the shortage of GPU memory is gradually becoming irreconcilable. In this paper, we thoroughly analyze the optimization goal of softmax-based loss functions and the difficulty of training massive identities. We find that the importance of negative classes in the softmax function in face representation learning is not as high as we previously thought. The experiment demonstrates no loss of accuracy when training with only 10% randomly sampled classes for the softmax-based loss functions, compared with training with full classes using state-of-the-art models on mainstream benchmarks. We also implement a very efficient distributed sampling algorithm, taking into account model accuracy and training efficiency, which uses only eight NVIDIA RTX 2080 Ti GPUs to complete classification tasks with tens of millions of identities. The code of this paper is available at https://github.com/deepinsight/insightface/tree/master/recognition/partial_fc.
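The class-sampling idea in this abstract can be illustrated with a small, single-machine sketch. Everything below is an assumption made for illustration and is not taken from the linked repository: the helper names are hypothetical, the classes present in the batch are always kept before negatives are sampled, and the full logit row is materialized only to keep the example short (a real implementation would compute logits for the sampled classes only).

```python
import math
import random

def sample_classes(num_classes, labels, rate, rng):
    """Keep every class that appears in the batch, then fill up to
    rate * num_classes with randomly sampled negative classes."""
    positives = sorted(set(labels))
    num_keep = max(len(positives), int(num_classes * rate))
    pos_set = set(positives)
    negatives = [c for c in range(num_classes) if c not in pos_set]
    return positives + rng.sample(negatives, num_keep - len(positives))

def partial_softmax_loss(logits_row, label, sampled):
    """Cross-entropy where the softmax normalizer runs over the sampled
    classes only; `label` must be one of the sampled classes."""
    sub = [logits_row[c] for c in sampled]
    m = max(sub)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in sub))
    return log_z - logits_row[label]

# Toy data: 1000 identities, a batch of four samples, 10% sampling rate.
rng = random.Random(0)
num_classes = 1000
labels = [3, 17, 3, 512]
sampled = sample_classes(num_classes, labels, rate=0.1, rng=rng)
logits_row = [rng.uniform(-1.0, 1.0) for _ in range(num_classes)]
loss = partial_softmax_loss(logits_row, labels[0], sampled)
```

With these toy settings the normalizer covers 100 classes instead of 1000, which is where the memory saving described in the abstract would come from.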

preprint2021arXiv

Thermal Conductivities and Interfacial Thermal Conductance of 1- to 3-Layer WSe$_2$

Atomically thin materials such as graphene and semiconducting transition metal dichalcogenides (TMDCs) have attracted extensive interest in recent years, motivating investigation into multiple properties. In this work, we used the optothermal Raman technique to measure the thermal transport properties of WSe$_2$, a popular TMDC material, in single atomic layer, bilayer, and trilayer forms.

preprint2020arXiv

An Improved Analysis of Stochastic Gradient Descent with Momentum

SGD with momentum (SGDM) has been widely applied in many machine learning tasks, and it is often applied with dynamic stepsizes and momentum weights tuned in a stagewise manner. Despite its empirical advantage over SGD, the role of momentum is still unclear in general since previous analyses on SGDM either provide worse convergence bounds than those of SGD, or assume Lipschitz or quadratic objectives, which fail to hold in practice. Furthermore, the role of dynamic parameters has not been addressed. In this work, we show that SGDM converges as fast as SGD for smooth objectives under both strongly convex and nonconvex settings. We also establish the first convergence guarantee for the multistage setting, and show that the multistage strategy is beneficial for SGDM compared to using fixed parameters. Finally, we verify these theoretical claims by numerical experiments.
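A minimal sketch of multistage SGDM on a toy problem may help make the setting concrete. The averaged-gradient form of the momentum update, the stage schedule, and the noisy quadratic objective are all assumptions chosen for illustration; the abstract does not specify the exact update rule or parameters analyzed in the paper.

```python
import random

def sgdm_multistage(grad, x0, stages, seed=0):
    """Run SGD with momentum over several stages.

    stages: list of (num_steps, stepsize, momentum_weight) tuples, so the
    stepsize and momentum weight are constant within a stage and change
    between stages.
    """
    rng = random.Random(seed)
    x, m = x0, 0.0
    for num_steps, alpha, beta in stages:
        for _ in range(num_steps):
            g = grad(x, rng)
            m = beta * m + (1.0 - beta) * g   # moving average of gradients
            x = x - alpha * m                  # step along the momentum
    return x

def noisy_grad(x, rng):
    """Stochastic gradient of f(x) = 0.5 * x**2 with Gaussian noise."""
    return x + rng.gauss(0.0, 0.1)

# Two stages: a larger stepsize first, then a smaller one.
x_final = sgdm_multistage(noisy_grad, x0=5.0,
                          stages=[(200, 0.1, 0.9), (200, 0.01, 0.9)])
```

In this toy run the first stage moves the iterate quickly toward the minimizer at 0, and the second stage with the smaller stepsize reduces the noise-induced fluctuation around it.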

preprint2020arXiv

Application of Deep Q-Network in Portfolio Management

Machine Learning algorithms and Neural Networks are widely applied to many different areas such as stock market prediction, face recognition and population analysis. This paper introduces a strategy based on the classic Deep Reinforcement Learning algorithm, Deep Q-Network (DQN), for portfolio management in the stock market. It is a type of deep neural network which is optimized by Q-learning. To make the DQN adapt to the financial market, we first discretize the action space, which is defined as the weight of the portfolio in different assets, so that portfolio management becomes a problem that Deep Q-Network can solve. Next, we combine the Convolutional Neural Network and dueling Q-net to enhance the recognition ability of the algorithm. Experimentally, we chose five low-relevant American stocks to test the model. The result demonstrates that the DQN-based strategy outperforms the ten other traditional strategies. The profit of the DQN algorithm is 30% more than the profit of other strategies. Moreover, the Sharpe ratio associated with Max Drawdown demonstrates that the risk of the policy made with DQN is the lowest.

preprint2020arXiv

Automatic Differentiation for Second Renormalization of Tensor Networks

Tensor renormalization group (TRG) constitutes an important methodology for accurate simulations of strongly correlated lattice models. Facilitated by the automatic differentiation technique widely used in deep learning, we propose a uniform framework of differentiable TRG ($\partial$TRG) that can be applied to improve various TRG methods, in an automatic fashion. Essentially, $\partial$TRG systematically extends the concept of second renormalization [PRL 103, 160601 (2009)] where the tensor environment is computed recursively in the backward iteration, in the sense that given the forward process of TRG, $\partial$TRG automatically finds the gradient through backpropagation, with which one can deeply "train" the tensor networks. We benchmark $\partial$TRG in solving the square-lattice Ising model, and demonstrate its power by simulating one- and two-dimensional quantum systems at finite temperature. The deep optimization as well as GPU acceleration renders $\partial$TRG many-body simulations with high efficiency and accuracy.

preprint2020arXiv

CAD-PU: A Curvature-Adaptive Deep Learning Solution for Point Set Upsampling

Point set is arguably the most direct approximation of an object or scene surface, yet its practical acquisition often suffers from the shortcoming of being noisy, sparse, and possibly incomplete, which restricts its use for a high-quality surface recovery. Point set upsampling aims to increase its density and regularity such that a better surface recovery could be achieved. The problem is severely ill-posed and challenging, considering that the upsampling target itself is only an approximation of the underlying surface. Motivated to improve the surface approximation via point set upsampling, we identify the factors that are critical to the objective, by pairing the surface approximation error bounds of the input and output point sets. It suggests that given a fixed budget of points in the upsampling result, more points should be distributed onto the surface regions where local curvatures are relatively high. To implement the motivation, we propose a novel design of Curvature-ADaptive Point set Upsampling network (CAD-PU), the core of which is a module of curvature-adaptive feature expansion. To train CAD-PU, we follow the same motivation and propose geometrically intuitive surrogates that approximate discrete notions of surface curvature for the upsampled point set. We further integrate the proposed surrogates into an adversarial learning based curvature minimization objective, which gives a practically effective learning of CAD-PU. We conduct thorough experiments that show the efficacy of our contributions and the advantages of our method over existing ones. Our implementation codes are publicly available at https://github.com/JiehongLin/CAD-PU.

preprint2020arXiv

DRST: Deep Residual Shearlet Transform for Densely Sampled Light Field Reconstruction

The Image-Based Rendering (IBR) approach using Shearlet Transform (ST) is one of the most effective methods for Densely-Sampled Light Field (DSLF) reconstruction. The ST-based DSLF reconstruction typically relies on an iterative thresholding algorithm for Epipolar-Plane Image (EPI) sparse regularization in shearlet domain, involving dozens of transformations between image domain and shearlet domain, which are in general time-consuming. To overcome this limitation, a novel learning-based ST approach, referred to as Deep Residual Shearlet Transform (DRST), is proposed in this paper. Specifically, for an input sparsely-sampled EPI, DRST employs a deep fully Convolutional Neural Network (CNN) to predict the residuals of the shearlet coefficients in shearlet domain in order to reconstruct a densely-sampled EPI in image domain. The DRST network is trained on synthetic Sparsely-Sampled Light Field (SSLF) data only by leveraging elaborately-designed masks. Experimental results on three challenging real-world light field evaluation datasets with varying moderate disparity ranges (8 - 16 pixels) demonstrate the superiority of the proposed learning-based DRST approach over the non-learning-based ST method for DSLF reconstruction. Moreover, DRST provides a 2.4x speedup over ST, at least.

preprint2020arXiv

Generation of Chinese-characters-like modes with transverse mode locked lasers

Spontaneous transverse mode locking to generate optical vortex arrays has already been widely studied for frequency-degenerate transverse modes. However, mode locking between transverse modes of different orders (non-frequency-degenerate families) to achieve spatially stationary beam patterns is rarely reported. The theory of transverse mode locking is discussed. The concepts of general transverse mode locking (G-TML) and restricted transverse mode locking (R-TML) are introduced. It is experimentally shown that R-TML is possible in microchip cavities with high nonlinearity. More interestingly, some Chinese-character-like laser modes can be generated by microchip lasers with this special transverse mode locking effect. Various experimental results and corresponding simulations are presented.

preprint2020arXiv

Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement

Recently, most of the state-of-the-art human pose estimation methods are based on heatmap regression. The final coordinates of keypoints are obtained by decoding the heatmap directly. In this paper, we aim to find a better approach to get more accurate localization results. We mainly put forward two suggestions for improvement: 1) different features and methods should be applied for rough and accurate localization, 2) the relationship between keypoints should be considered. Specifically, we propose a two-stage graph-based and model-agnostic framework, called Graph-PCNN, with a localization subnet and a graph pose refinement module added onto the original heatmap regression network. In the first stage, the heatmap regression network is applied to obtain a rough localization result, and a set of proposal keypoints, called guided points, are sampled. In the second stage, for each guided point, a different visual feature is extracted by the localization subnet. The relationship between guided points is explored by the graph pose refinement module to get more accurate localization results. Experiments show that Graph-PCNN can be used in various backbones to boost the performance by a large margin. Without bells and whistles, our best model can achieve a new state-of-the-art 76.8% AP on COCO test-dev split.

preprint2020arXiv

Grid-Forming Converters control based on DC voltage feedback

Renewable energy is connected to the power grid through power electronic converters, which lack the inertia of a synchronous generator/machine (SM). The increasing penetration of renewable energy in power systems weakens frequency and voltage stability. Grid-Forming Converters (GFCs) emulate the function of a synchronous machine through their control method in order to improve the stability of the power grid by providing inertia and a stability regulation mechanism. Such converter control methods include the virtual synchronous machine, dispatchable virtual oscillator control, and so on. These control methods mainly use AC-side state feedback and do not monitor the DC-side state. This paper analyzes GFC control strategies with respect to power grid stability, including Frequency Droop Control, Virtual Synchronous Machine Control and dispatchable Virtual Oscillator Control. A DC-side voltage collapse problem is found when a large load disturbance occurs. GFC control methods that take DC-side voltage feedback into account are proposed, which can ensure the synchronization characteristics of the grid connection and solve the problem of DC-side voltage collapse. The proposed method is verified on the IEEE 9-bus system, which shows its effectiveness.

preprint2020arXiv

Increasing Iterate Averaging for Solving Saddle-Point Problems

Many problems in machine learning and game theory can be formulated as saddle-point problems, for which various first-order methods have been developed and proven efficient in practice. Under the general convex-concave assumption, most first-order methods only guarantee an ergodic convergence rate, that is, the uniform averages of the iterates converge at a $O(1/T)$ rate in terms of the saddle-point residual. However, numerically, the iterates themselves can often converge much faster than the uniform averages. This observation motivates increasing averaging schemes that put more weight on later iterates, in contrast to the usual uniform averaging. We show that such increasing averaging schemes, applied to various first-order methods, are able to preserve the $O(1/T)$ convergence rate with no additional assumptions or computational overhead. Extensive numerical experiments on zero-sum game solving, market equilibrium computation and image denoising demonstrate the effectiveness of the proposed schemes. In particular, the increasing averages consistently outperform the uniform averages in all test problems by orders of magnitude. When solving matrix and extensive-form games, increasing averages consistently outperform the last iterates as well. For matrix games, a first-order method equipped with increasing averaging outperforms the highly competitive CFR$^+$ algorithm.
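To make the contrast between uniform and increasing averaging concrete, here is a small sketch that averages a sequence of iterates with weights proportional to t**q. The polynomial weight family, the function name, and the toy scalar sequence are assumptions chosen for illustration; the abstract only states that later iterates receive more weight.

```python
def weighted_average(iterates, q):
    """Average iterates x_1..x_T with weights w_t = t**q.

    q = 0 recovers the uniform average; q = 1 and q = 2 give linearly and
    quadratically increasing weights. The running update avoids storing
    the iterates and needs only one extra scalar of state.
    """
    total_w = 0.0
    avg = 0.0
    for t, x in enumerate(iterates, start=1):
        w = float(t) ** q
        total_w += w
        avg += (w / total_w) * (x - avg)
    return avg

# Toy sequence of scalar iterates converging to 0.
iterates = [1.0 / t for t in range(1, 101)]
uniform = weighted_average(iterates, 0)
linear = weighted_average(iterates, 1)
quadratic = weighted_average(iterates, 2)
```

On this toy sequence the increasing averages sit closer to the limit than the uniform average, because the early, inaccurate iterates carry less weight.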

preprint2020arXiv

Intra-Ensemble in Neural Networks

Improving model performance is always the key problem in machine learning, including deep learning. However, stand-alone neural networks always suffer from marginal effect when stacking more layers. At the same time, ensemble is a useful technique to further enhance model performance. Nevertheless, training several independent deep neural networks for ensemble costs multiple resources. If so, is it possible to utilize ensemble in only one neural network? In this work, we propose Intra-Ensemble, an end-to-end ensemble strategy with stochastic channel recombination operations to train several sub-networks simultaneously within one neural network. Additional parameter size is marginal since the majority of parameters are mutually shared. Meanwhile, stochastic channel recombination significantly increases the diversity of sub-networks, which finally enhances ensemble performance. Extensive experiments and ablation studies prove the applicability of intra-ensemble on various kinds of datasets and network architectures.

preprint2020arXiv

Learning Implicit Generative Models with Theoretical Guarantees

We propose a unified framework for implicit generative modeling (UnifiGem) with theoretical guarantees by integrating approaches from optimal transport, numerical ODE, density-ratio (density-difference) estimation and deep neural networks. First, the problem of implicit generative learning is formulated as that of finding the optimal transport map between the reference distribution and the target distribution, which is characterized by a totally nonlinear Monge-Ampère equation. Interpreting the infinitesimal linearization of the Monge-Ampère equation from the perspective of gradient flows in measure spaces leads to the continuity equation or the McKean-Vlasov equation. We then solve the McKean-Vlasov equation numerically using the forward Euler iteration, where the forward Euler map depends on the density ratio (density difference) between the distribution at current iteration and the underlying target distribution. We further estimate the density ratio (density difference) via deep density-ratio (density-difference) fitting and derive explicit upper bounds on the estimation error. Experimental results on both synthetic datasets and real benchmark datasets support our theoretical findings and demonstrate the effectiveness of UnifiGem.

preprint2020arXiv

Limiting Behaviors of High Dimensional Stochastic Spin Ensembles

Lattice spin models in statistical physics are used to understand magnetism. Their Hamiltonians are a discrete form of a version of a Dirichlet energy, signifying a relationship to the Harmonic map heat flow equation. The Gibbs distribution, defined with this Hamiltonian, is used in the Metropolis-Hastings (M-H) algorithm to generate dynamics tending towards an equilibrium state. In the limiting situation when the inverse temperature is large, we establish the relationship between the discrete M-H dynamics and the continuous Harmonic map heat flow associated with the Hamiltonian. We show the convergence of the M-H dynamics to the Harmonic map heat flow equation in two steps: First, with fixed lattice size and proper choice of proposal size in one M-H step, the M-H dynamics acts as gradient descent and will be shown to converge to a system of Langevin stochastic differential equations (SDE). Second, with proper scaling of the inverse temperature in the Gibbs distribution and taking the lattice size to infinity, it will be shown that this SDE system converges to the deterministic Harmonic map heat flow equation. Our results are not unexpected, but show remarkable connections between the M-H steps and the SDE Stratonovich formulation, as well as reveal trajectory-wise out of equilibrium dynamics to be related to a canonical PDE system with geometric constraints.

preprint2020arXiv

MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning

We propose to incorporate neural architecture search (NAS) into general-purpose multi-task learning (GP-MTL). Existing NAS methods typically define different search spaces according to different tasks. In order to adapt to different task combinations (i.e., task sets), we disentangle the GP-MTL networks into single-task backbones (optionally encode the task priors), and a hierarchical and layerwise features sharing/fusing scheme across them. This enables us to design a novel and general task-agnostic search space, which inserts cross-task edges (i.e., feature fusion connections) into fixed single-task network backbones. Moreover, we also propose a novel single-shot gradient-based search algorithm that closes the performance gap between the searched architectures and the final evaluation architecture. This is realized with a minimum entropy regularization on the architecture weights during the search phase, which makes the architecture weights converge to near-discrete values and therefore achieves a single model. As a result, our searched model can be directly used for evaluation without (re-)training from scratch. We perform extensive experiments using different single-task backbones on various task sets, demonstrating the promising performance obtained by exploiting the hierarchical and layerwise features, as well as the desirable generalizability to different i) task sets and ii) single-task backbones. The code of our paper is available at https://github.com/bhpfelix/MTLNAS.

preprint2020arXiv

PP-YOLO: An Effective and Efficient Implementation of Object Detector

Object detection is one of the most important areas in computer vision, which plays a key role in various practical scenarios. Due to hardware limitations, it is often necessary to sacrifice accuracy to ensure the inference speed of the detector in practice. Therefore, the balance between effectiveness and efficiency of an object detector must be considered. The goal of this paper is to implement an object detector with relatively balanced effectiveness and efficiency that can be directly applied in actual application scenarios, rather than propose a novel detection model. Considering that YOLOv3 has been widely used in practice, we develop a new object detector based on YOLOv3. We mainly try to combine various existing tricks that barely increase the number of model parameters and FLOPs, to achieve the goal of improving the accuracy of the detector as much as possible while ensuring that the speed is almost unchanged. Since all experiments in this paper are conducted based on PaddlePaddle, we call it PP-YOLO. By combining multiple tricks, PP-YOLO can achieve a better balance between effectiveness (45.2% mAP) and efficiency (72.9 FPS), surpassing the existing state-of-the-art detectors such as EfficientDet and YOLOv4. Source code is at https://github.com/PaddlePaddle/PaddleDetection.

preprint2020arXiv

Stacking Domain Wall Magnons in Twisted van der Waals Magnets

Using bilayer CrI$_3$ as an example, we demonstrate that stacking domain walls in van der Waals magnets can host one-dimensional (1D) magnon channels, which have lower energies than bulk magnons. Interestingly, some magnon channels are hidden in a magnetically homogeneous background and can only be inferred with the knowledge of stacking domain walls. Compared to 1D magnons confined in magnetic domain walls, 1D magnons in stacking domain walls are more stable against external perturbations. We show that the relaxed moiré superlattices of small-angle twisted bilayer CrI$_3$ are a natural realization of stacking domain walls and host an interconnected moiré magnon network. Our work reveals the importance of stacking domain walls in understanding magnetic properties of van der Waals magnets, and extends the scope of stacking engineering to magnetic dynamics.

preprint2020arXiv

Stochastic Flows and Geometric Optimization on the Orthogonal Group

We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$. We theoretically and experimentally demonstrate that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinforcement learning, normalizing flows and metric learning. We show an intriguing connection between efficient stochastic optimization on the orthogonal group and graph theory (e.g. matching problem, partition functions over graphs, graph-coloring). We leverage the theory of Lie groups and provide theoretical results for the designed class of algorithms. We demonstrate broad applicability of our methods by showing strong performance on the seemingly unrelated tasks of learning world models to obtain stable policies for the most difficult Humanoid agent from OpenAI Gym and improving convolutional neural networks.

preprint2020arXiv

Unity: A General Platform for Intelligent Agents

Recent advances in artificial intelligence have been driven by the presence of increasingly realistic and complex simulated environments. However, many of the existing environments provide either unrealistic visuals, inaccurate physics, low task complexity, restricted agent perspective, or a limited capacity for interaction among artificial agents. Furthermore, many platforms lack the ability to flexibly configure the simulation, making the simulated environment a black-box from the perspective of the learning system. In this work, we propose a novel taxonomy of existing simulation platforms and discuss the highest level class of general platforms which enable the development of learning environments that are rich in visual, physical, task, and social complexity. We argue that modern game engines are uniquely suited to act as general platforms and as a case study examine the Unity engine and open source Unity ML-Agents Toolkit. We then survey the research enabled by Unity and the Unity ML-Agents Toolkit, discussing the kinds of research a flexible, interactive and easily configurable general platform can facilitate.

preprint2020arXiv

Weakly Supervised Deep Learning for COVID-19 Infection Detection and Classification from CT Images

An outbreak of a novel coronavirus disease (i.e., COVID-19) has been recorded in Wuhan, China since late December 2019, which subsequently became a pandemic around the world. Although COVID-19 is an acutely treated disease, it can also be fatal with a risk of fatality of 4.03% in China and the highest of 13.04% in Algeria and 12.67% in Italy (as of 8th April 2020). The onset of serious illness may result in death as a consequence of substantial alveolar damage and progressive respiratory failure. Although laboratory testing, e.g., using reverse transcription polymerase chain reaction (RT-PCR), is the gold standard for clinical diagnosis, the tests may produce false negatives. Moreover, under the pandemic situation, shortage of RT-PCR testing resources may also delay the following clinical decision and treatment. Under such circumstances, chest CT imaging has become a valuable tool for both diagnosis and prognosis of COVID-19 patients. In this study, we propose a weakly supervised deep learning strategy for detecting and classifying COVID-19 infection from CT images. The proposed method can minimise the requirements of manual labelling of CT images but still be able to obtain accurate infection detection and distinguish COVID-19 from non-COVID-19 cases. Based on the promising results obtained qualitatively and quantitatively, we can envisage a wide deployment of our developed technique in large-scale clinical studies.

preprint2019arXiv

Automated Testing for Deep Learning Systems with Differential Behavior Criteria

In this work, we conducted a study on building an automated testing system for deep learning systems based on differential behavior criteria. The automated testing goals were achieved by jointly optimizing two objective functions: maximizing differential behaviors from the models under test and maximizing neuron coverage. By observing differential behaviors from three pre-trained models during each testing iteration, the input image that triggered erroneous feedback was registered as a corner case. The generated corner cases can be used to examine the robustness of DNNs and consequently improve model accuracy. A project called DeepXplore was also used as a baseline model. After we fully implemented and optimized the baseline system, we explored its application in augmenting the training dataset with newly generated corner cases. With the GTSRB dataset, by retraining the model on automatically generated corner cases, the accuracy of three generic models increased by 259.2%, 53.6%, and 58.3%, respectively. Further, to extend the capability of automated testing, we explored other approaches based on differential behavior criteria to generate photo-realistic images for deep learning systems. One approach was to apply various transformations to the seed images for the deep learning framework. The other approach was to utilize the Generative Adversarial Networks (GAN) technique, which was implemented on the MNIST and Driving datasets. The style-transfer capability was observed to be very effective in adding visual effects, replacing image elements, and style-shifting (virtual images to real images). The GAN-based testing sample generation system was shown to be the next frontier for automated testing of deep learning systems.

preprint2019arXiv

Large time behavior, bi-Hamiltonian structure and kinetic formulation for complex Burgers equation

We prove the existence and uniqueness of positive analytical solutions with positive initial data to the mean field equation (the Dyson equation) of the Dyson Brownian motion through the complex Burgers equation with a force term on the upper half complex plane. These solutions converge to a steady state given by Wigner's semicircle law. A unique global weak solution with nonnegative initial data to the Dyson equation is obtained and some explicit solutions are given by Wigner's semicircle laws. We also construct a bi-Hamiltonian structure for the system of the real and imaginary components of the complex Burgers equation (coupled Burgers system). We establish a kinetic formulation for the coupled Burgers system and prove the existence and uniqueness of entropy solutions. The coupled Burgers system in Lagrangian variable naturally leads to two interacting particle systems: Fermi-Pasta-Ulam-Tsingou model with nearest-neighbor interactions, and Calogero-Moser model. These two particle systems yield the same Lagrangian dynamics in the continuum limit.

preprint2016arXiv

Deep Gate Recurrent Neural Network

This paper introduces two recurrent neural network structures called Simple Gated Unit (SGU) and Deep Simple Gated Unit (DSGU), which are general structures for learning long-term dependencies. Compared to traditional Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), both structures require fewer parameters and less computation time in sequence classification tasks. Unlike GRU and LSTM, which require more than one gate to control information flow in the network, SGU and DSGU use only one multiplicative gate to control the flow of information. We show that this difference can accelerate learning in tasks that require long-range dependency information. We also show that DSGU is more numerically stable than SGU. In addition, we propose a standard way of representing the inner structure of an RNN called the RNN Conventional Graph (RCG), which helps analyze the relationship between the input units and hidden units of an RNN.

preprint2016arXiv

Interfacial spin-orbit torque without bulk spin-orbit coupling

An electric current in the presence of spin-orbit coupling can generate a spin accumulation that exerts torques on a nearby magnetization. We demonstrate that, even in the absence of materials with strong bulk spin-orbit coupling, a torque can arise solely due to interfacial spin-orbit coupling, namely Rashba-Edelstein effects at metal/insulator interfaces. In magnetically soft NiFe sandwiched between a weak spin-orbit metal (Ti) and insulator (Al$_2$O$_3$), this torque appears as an effective field, which is significantly larger than the Oersted field and sensitive to insertion of an additional layer between NiFe and Al$_2$O$_3$. Our findings point to new routes for tuning spin-orbit torques by engineering interfacial electric dipoles.

preprint2016arXiv

Symmetric Non-Rigid Structure from Motion for Category-Specific Object Structure Estimation

Many objects, especially those made by humans, are symmetric, e.g. cars and aeroplanes. This paper addresses the estimation of 3D structures of symmetric objects from multiple images of the same object category, e.g. different cars, seen from various viewpoints. We assume that the deformation between different instances from the same object category is non-rigid and symmetric. In this paper, we extend two leading non-rigid structure from motion (SfM) algorithms to exploit symmetry constraints. We model both methods as energy minimization, in which we also recover the missing observations caused by occlusions. In particular, we show that by rotating the coordinate system, the energy can be decoupled into two independent terms, which still exploit symmetry, so that matrix factorization can be applied separately to each of them for initialization. The results on the Pascal3D+ dataset show that our methods significantly improve performance over baseline methods.

preprint2015arXiv

A Survey on Operational State Complexity

Descriptional complexity is the study of the conciseness of the various models representing formal languages. The state complexity of a regular language is the size, measured by the number of states of the smallest, either deterministic or nondeterministic, finite automaton that recognises it. Operational state complexity is the study of the state complexity of operations over languages. In this survey, we review the state complexities of individual regularity preserving language operations on regular and some subregular languages. Then we revisit the state complexities of the combination of individual operations. We also review methods of estimation and approximation of state complexity of more complex combined operations.

preprint2015arXiv

Control of magnetic relaxation by electric-field-induced ferroelectric phase transition and inhomogeneous domain switching

Electric-field modulation of magnetism in strain-mediated multiferroic heterostructures is considered a promising scheme for enabling memory and magnetic microwave devices with ultralow power consumption. However, it is not well understood how electric-field-induced strain influences magnetic relaxation, an important physical process for device applications. Here we investigate resonant magnetization dynamics in ferromagnet/ferroelectric multiferroic heterostructures, FeGaB/PMN-PT and NiFe/PMN-PT, in two distinct strain states provided by electric-field-induced ferroelectric phase transition. The strain modifies not only magnetic anisotropy but also magnetic relaxation. In FeGaB/PMN-PT, we observe a nearly two-fold change in intrinsic Gilbert damping by electric field, which is attributed to strain-induced tuning of spin-orbit coupling. By contrast, a small but measurable change in extrinsic linewidth broadening is attributed to inhomogeneous ferroelastic domain switching during the phase transition of the PMN-PT substrate.

preprint2014arXiv

Naming Game on Networks: Let Everyone be Both Speaker and Hearer

To investigate how consensus is reached on a large self-organized peer-to-peer network, we extended the naming game model commonly used in language and communication to Naming Game in Groups (NGG). Unlike existing naming game models, in NGG everyone in the population (network) can be both speaker and hearer simultaneously, which more closely resembles real-life scenarios. Moreover, NGG allows the transmission (communication) of multiple words (opinions) for multiple intra-group consensuses. Communication among indirectly connected nodes is also enabled in NGG. We simulated and analyzed the consensus process in some typical network topologies, including random-graph networks, small-world networks and scale-free networks, to better understand how global convergence (consensus) could be reached on one common word. The results are interpreted in terms of group negotiation on a peer-to-peer network, and show that global consensus in the population can be reached more rapidly when more opinions are permitted within each group or when the negotiating groups in the population are larger in size. The novel features and properties introduced by our model have demonstrated its applicability in better investigating general consensus problems on peer-to-peer networks.

preprint2013arXiv

The influence of the symmetry energy on the cone-azimuthal emission

In the framework of the isospin-dependent Boltzmann-Uehling-Uhlenbeck transport model, effects of the symmetry energy on the evolutions of free n/p ratio and charged pion ratio in the semi-central collision of $^{197}$Au+$^{197}$Au at an incident beam energy of 400 MeV/nucleon are studied. At the beginning of the reaction (before 11 fm/c) they are both affected by the low-density behavior of the symmetry energy but soon after are affected by the high-density behavior of the symmetry energy after nuclei are compressed (after 11 fm/c) and the effects of the symmetry energy are generally smaller compared with the central collision case. Interestingly, their dependences on the symmetry energy are shown to arise with increase of cone-azimuthal angle of the emitted particles. In the direction perpendicular to the reaction plane, the $π^{-}/π^{+}$ ratio or free n/p ratio especially at high kinetic energies exhibits significant sensitivity to the symmetry energy.

preprint2012arXiv

Effect of the momentum dependence of nuclear symmetry potential on the transverse and elliptic flows

In the framework of the isospin-dependent Boltzmann-Uehling-Uhlenbeck transport model, effect of the momentum dependence of nuclear symmetry potential on nuclear transverse and elliptic flows in the neutron-rich reaction $^{132}$Sn+$^{124}$Sn at a beam energy of 400 MeV/nucleon is studied. We find that the momentum dependence of nuclear symmetry potential affects the rapidity distribution of the free neutron to proton ratio, the neutron and the proton transverse flows as a function of rapidity. The momentum dependence of nuclear symmetry potential affects the neutron-proton differential transverse flow more evidently than the difference of neutron and proton transverse flows as well as the difference of proton and neutron elliptic flows. It is thus better to probe the symmetry energy by using the difference of neutron and proton flows since the momentum dependence of nuclear symmetry potential is still an open question. And it is better to probe the momentum dependence of nuclear symmetry potential by using the neutron-proton differential transverse flow and the rapidity distribution of the free neutron to proton ratio.

preprint2011arXiv

Effect of the momentum dependence of nuclear symmetry potential on pion-/pion+ ratio in heavy-ion collisions

In the framework of the isospin-dependent Boltzmann-Uehling-Uhlenbeck transport model, effect of the momentum dependence of nuclear symmetry potential on pion-/pion+ ratio in the neutron-rich reaction $^{132}$Sn+$^{124}$Sn at a beam energy of 400 MeV/nucleon is studied. We find that the momentum dependence of nuclear symmetry potential affects the compressed density of colliding nuclei, numbers of produced pion- and pion+, as well as the value of pion-/pion+ ratio. The momentum dependent nuclear symmetry potential increases the compressed density of colliding nuclei, numbers of produced resonances delta(1232), N*(1440), pion- and pion+, as well as the value of pion-/pion+ ratio.

preprint2011arXiv

Initialization effect in heavy-ion collisions at intermediate energies

Based on the isospin-dependent Boltzmann-Uehling-Uhlenbeck transport model plus the Skyrme force parameters, the initialization effect is studied in heavy-ion collisions at intermediate energies. We find that there are moderate initialization effects in the observables of free neutron to proton ratio (n/p), pion-/pion+ ratio, as well as neutron to proton differential flow (F^x_n-p). Effects of initialization are larger for charged pion-/pion+ ratios than for n/p ratios, and they are more evident in nuclear reactions at lower incident beam energies. We do not see large effects of initialization for light reaction systems or large asymmetric (neutron-richer) reaction systems. We also see relatively large effects of initialization on the neutron to proton differential flow at relatively lower incident beam energies or with large impact parameters. These results may be useful for delicate studies of the Equation of State (EoS) of asymmetric nuclear matter.

preprint2010arXiv

Metropolitan all-pass and inter-city quantum communication network

We have demonstrated a metropolitan all-pass quantum communication network in field fiber for four nodes. Any two of the nodes can be connected in the network to perform quantum key distribution (QKD). An optical switching module is presented that enables arbitrary 2-connectivity among output ports. Integrated QKD terminals have been developed, which can operate as a transmitter, a receiver, or both at the same time. Furthermore, an additional link in another city, over 60 km of fiber (up to 130 km), is seamlessly integrated into this network based on a trusted relay architecture. On all the links, we have implemented the decoy-state protocol. All of the necessary electrical hardware, synchronization, feedback control, network software, and execution of QKD protocols are custom designed, which allows fully automatic and stable operation. Our system was put into operation in Hefei in August 2009 and publicly demonstrated during an evaluation conference on quantum networks organized by the Chinese Academy of Sciences on August 29, 2009. Real-time voice telephony with one-time pad encoding between any two of the five nodes (four all-pass nodes plus one additional node through relay) was successfully established in the network within 60 km.

preprint2010arXiv

State Complexity of Catenation Combined with Star and Reversal

This paper is a continuation of our research work on state complexity of combined operations. Motivated by applications, we study the state complexities of two particular combined operations: catenation combined with star and catenation combined with reversal. We show that the state complexities of both of these combined operations are considerably less than the compositions of the state complexities of their individual participating operations.

preprint2010arXiv

State Complexity of Two Combined Operations: Reversal-Catenation and Star-Catenation

In this paper, we show that, due to the structural properties of the resulting automaton obtained from a prior operation, the state complexity of a combined operation may not be equal but close to the mathematical composition of the state complexities of its component operations. In particular, we provide two witness combined operations: reversal combined with catenation and star combined with catenation.

preprint2010arXiv

State complexity of union and intersection combined with star and reversal

In this paper, we study the state complexities of union and intersection combined with star and reversal, respectively. We obtain the state complexities of these combined operations on regular languages and show that they are less than the mathematical composition of the state complexities of their individual participating operations.

preprint2010arXiv

Transition Complexity of Incomplete DFAs

In this paper, we consider the transition complexity of regular languages based on the incomplete deterministic finite automata. A number of results on Boolean operations have been obtained. It is shown that the transition complexity results for union and complementation are very different from the state complexity results for the same operations. However, for intersection, the transition complexity result is similar to that of state complexity.

preprint2009arXiv

Approximate homotopy symmetry method and homotopy series solutions to the six-order boussinesq equation

An approximate homotopy symmetry method for nonlinear problems is proposed and applied to the sixth-order Boussinesq equation. We summarize the general formulas for similarity reduction solutions and similarity reduction equations of different orders, educing the related homotopy series solutions. The convergence region of the homotopy series solutions can be adjusted by the auxiliary parameter. Series solutions and similarity reduction equations from the approximate symmetry method can be retrieved from the approximate homotopy symmetry method.

preprint2009arXiv

Nonsensitive nonlinear homotopy approach

Generally, natural scientific problems are so complicated that one has to establish some effective perturbation or nonperturbation theories with respect to some associated ideal models. In this Letter, a new theory that combines perturbation and nonperturbation is constructed. An artificial nonlinear homotopy parameter plays the role of a perturbation parameter, while other artificial nonlinear parameters introduced in the nonlinear homotopy models, of which the original problems are independent, are nonperturbatively determined by means of the principle of minimal sensitivity. The method is demonstrated through several quantum anharmonic oscillators and a non-Hermitian parity-time symmetric Hamiltonian system. In fact, the framework of the theory is general enough to be applied to a broad range of natural phenomena. Possible applications to condensed matter physics, matter wave systems, and nonlinear optics are briefly discussed.

preprint2006arXiv

Topological structure of the vortex solution in Jackiw-Pi model

By using the $ϕ$-mapping method, we discuss the topological structure of the self-duality solution in the Jackiw-Pi model in terms of gauge potential decomposition. We set up the relationship between the Chern-Simons vortex solution and the topological number, which is determined by the Hopf index and Brouwer degree. We also give the quantization of flux in this case. Then, we study the angular momentum of the vortex, which can be expressed in terms of the flux.

70 published item(s)

RoboAlign-R1: Distilled Multimodal Reward Alignment for Robot Video World Models

STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation

Magnon Damping Minimum and Logarithmic Scaling in a Kondo-Heisenberg Model

MvKSR: Multi-view Knowledge-guided Scene Recovery for Hazy and Rainy Degradation

Convergence of Extragradient SVRG for Variational Inequalities: Error Bounds and Increasing Iterate Averaging

1st Place Solution for YouTubeVOS Challenge 2022: Referring Video Object Segmentation

A New Combinatorial Property of Geometric Unique Sink Orientations

Asymptotic stability for diffusion with dynamic boundary reaction from Ginzburg-Landau energy

Bidding Agent Design in the LinkedIn Ad Marketplace

Exploring High-quality Target Domain Information for Unsupervised Domain Adaptive Semantic Segmentation

Factor-augmented model for functional data

Finding Dynamics Preserving Adversarial Winning Tickets

Hermite-Gaussian-mode coherently composed states and deep learning based free-space optical communication link

Invariant Filtering for Legged Humanoid Locomotion on Dynamic Rigid Surfaces

Latent-Variable Advantage-Weighted Policy Optimization for Offline RL

Manipulating propagation and evolution of polarization singularities in composite Bessel-like fields

Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering

Physarum Inspired Dynamics to Solve Semi-Definite Programs

Rabinowitz Fukaya categories and the categorical formal punctured neighborhood of infinity

Rumor Detection with Self-supervised Learning on Texts and Social Graph

Some results on locally repairable codes with minimum distance $7$ and locality $2$

Spin Supersolidity in Nearly Ideal Easy-axis Triangular Quantum Antiferromagnet Na$_2$BaCo(PO$_4$)$_2$

Towards Autonomous Atlas-based Ultrasound Acquisitions in Presence of Articulated Motion

Versatile Non-diffracting Perfect Vortex Beams

VesNet-RL: Simulation-based Reinforcement Learning for Real-World US Probe Navigation

Wormhole MAML: Meta-Learning in Glued Parameter Space

A Multiobjective State Transition Algorithm Based on Decomposition

Factor-augmented Smoothing Model for Functional Data

High-performance green and blue quantum-dot light-emitting diodes with eliminated charge leakage

Leveraging Activity Recognition to Enable Protective Behavior Detection in Continuous Data

Partial FC: Training 10 Million Identities on a Single Machine

Thermal Conductivities and Interfacial Thermal Conductance of 1- to 3-Layer WSe$_2$

An Improved Analysis of Stochastic Gradient Descent with Momentum

Application of Deep Q-Network in Portfolio Management

Automatic Differentiation for Second Renormalization of Tensor Networks

CAD-PU: A Curvature-Adaptive Deep Learning Solution for Point Set Upsampling

DRST: Deep Residual Shearlet Transform for Densely Sampled Light Field Reconstruction

Generation of Chinese-characters-like modes with transverse mode locked lasers

Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement

Grid-Forming Converters control based on DC voltage feedback

Increasing Iterate Averaging for Solving Saddle-Point Problems

Intra-Ensemble in Neural Networks

Learning Implicit Generative Models with Theoretical Guarantees

Limiting Behaviors of High Dimensional Stochastic Spin Ensembles

MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning

PP-YOLO: An Effective and Efficient Implementation of Object Detector

Stacking Domain Wall Magnons in Twisted van der Waals Magnets

Stochastic Flows and Geometric Optimization on the Orthogonal Group

Unity: A General Platform for Intelligent Agents

Weakly Supervised Deep Learning for COVID-19 Infection Detection and Classification from CT Images

Automated Testing for Deep Learning Systems with Differential Behavior Criteria

Large time behavior, bi-Hamiltonian structure and kinetic formulation for complex Burgers equation

Deep Gate Recurrent Neural Network

Interfacial spin-orbit torque without bulk spin-orbit coupling

Symmetric Non-Rigid Structure from Motion for Category-Specific Object Structure Estimation

A Survey on Operational State Complexity

Control of magnetic relaxation by electric-field-induced ferroelectric phase transition and inhomogeneous domain switching

Naming Game on Networks: Let Everyone be Both Speaker and Hearer

The influence of the symmetry energy on the cone-azimuthal emission

Effect of the momentum dependence of nuclear symmetry potential on the transverse and elliptic flows

Effect of the momentum dependence of nuclear symmetry potential on pion-/pion+ ratio in heavy-ion collisions

Initialization effect in heavy-ion collisions at intermediate energies

Metropolitan all-pass and inter-city quantum communication network

State Complexity of Catenation Combined with Star and Reversal

State Complexity of Two Combined Operations: Reversal-Catenation and Star-Catenation

State complexity of union and intersection combined with star and reversal

Transition Complexity of Incomplete DFAs

Approximate homotopy symmetry method and homotopy series solutions to the six-order boussinesq equation

Nonsensitive nonlinear homotopy approach

Topological structure of the vortex solution in Jackiw-Pi model