Source author record

Kuo Gai

Kuo Gai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, physics.flu-dyn, Artificial Intelligence, Biological Physics, Computer Vision, math.OC, physics.comp-ph

Catalog footprint

What is connected

6 works

7 topics

4 close collaborators


Preprint, 2026, arXiv

Deciphering Shortcut Learning from an Evolutionary Game Theory Perspective

Shortcut learning causes deep learning models to rely on non-essential features within the data. However, its formation in deep neural network training still lacks theoretical understanding. In this paper, we provide a formal definition of core and shortcut features and employ evolutionary game theory to analyze the origins of shortcut bias by modeling data samples as players and their corresponding neural tangent features as strategies, assuming the existence of core and shortcut subnetworks. We find that gradient descent (GD) and stochastic gradient descent (SGD) lead to two distinct stochastically stable states, each corresponding to a different strategy. The former primarily optimizes the shortcut subnetwork, while the latter primarily optimizes the core subnetwork. We investigate the influence of these strategies on shortcut bias through a continuous stochastic differential equation, and reveal the impact of data noise and optimization noise on the formation of shortcut bias. In brief, our work employs evolutionary game theory to characterize the dynamics of shortcut bias formation and provides a theoretical view on its mitigation.

Preprint, 2021, arXiv

A Mathematical Principle of Deep Learning: Learn the Geodesic Curve in the Wasserstein Space

Recent studies have revealed a mathematical connection between deep neural networks (DNNs) and dynamical systems. However, the fundamental principle of DNNs has not been fully characterized through dynamical systems in terms of optimization and generalization. To this end, we connect DNNs with the continuity equation, in which the measure is conserved, to model the forward propagation of a DNN, which has not been addressed before. A DNN learns a transformation from the input distribution to the output distribution. However, in the measure space there are infinitely many curves connecting two distributions. Which one leads to good optimization and generalization for a DNN? Drawing on optimal transport theory, we find that a DNN with weight decay attempts to learn the geodesic curve in the Wasserstein space, which is induced by the optimal transport map. Compared with a plain network, ResNet is a better approximation to the geodesic curve, which explains why ResNet can be optimized more easily and generalizes better. Numerical experiments show that the data tracks of both plain networks and ResNet tend to be line-shaped in terms of the line-shape score (LSS), and that the map learned by ResNet is closer to the optimal transport map in terms of the optimal transport score (OTS). In short, we conclude that a mathematical principle of deep learning is to learn the geodesic curve in the Wasserstein space, and that deep learning is a great engineering realization of continuous transformation in high-dimensional space.
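As a rough illustration of what a geodesic in the Wasserstein space looks like, the sketch below handles only the one-dimensional case with equal sample counts, where the optimal transport map simply pairs sorted samples and each sample moves along a straight line. It is not the paper's method and does not compute LSS or OTS; the function names `w2_distance_1d` and `geodesic_point_1d` are hypothetical.

```python
import math


def w2_distance_1d(xs, ys):
    """2-Wasserstein distance between two equal-size 1D empirical measures."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts")
    # In 1D the optimal transport map pairs the sorted samples in order.
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))


def geodesic_point_1d(xs, ys, t):
    """Samples of the displacement interpolation at time t in [0, 1]."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts")
    # Each source sample travels in a straight line to its matched target.
    return [(1.0 - t) * x + t * y for x, y in zip(sorted(xs), sorted(ys))]


source = [0.0, 1.0, 2.0, 3.0]
target = [10.0, 4.0, 8.0, 6.0]
midpoint = geodesic_point_1d(source, target, 0.5)

# Constant-speed property of a geodesic: the midpoint is half-way in W2.
full = w2_distance_1d(source, target)
half = w2_distance_1d(source, midpoint)
print(midpoint, full, half)
```

Under these assumptions the interpolated samples trace straight lines, which is the one-dimensional analogue of the line-shaped data tracks the abstract mentions.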

Preprint, 2021, arXiv

Matrix Normal PCA for Interpretable Dimension Reduction and Graphical Noise Modeling

Principal component analysis (PCA) is one of the most widely used dimension reduction and multivariate statistical techniques. From a probabilistic perspective, PCA seeks a low-dimensional representation of data in the presence of independent and identically distributed (IID) Gaussian noise. Probabilistic PCA (PPCA) and its variants have been extensively studied for decades. Most of them assume that the underlying noise is IID. However, noise in the real world is usually complicated and structured. To address this challenge, some variants of PCA for data with non-IID noise have been proposed. However, most existing methods only assume that the noise is correlated in the feature space, while two-way structured noise may exist. To this end, we propose a powerful and intuitive PCA method (MN-PCA) that models the graphical noise with the matrix normal distribution, which enables us to explore the structure of noise in both the feature space and the sample space. MN-PCA obtains a low-rank representation of the data and the structure of the noise simultaneously. It can be interpreted as approximating data under the generalized Mahalanobis distance. We develop two algorithms to solve this model: one maximizes the regularized likelihood, and the other exploits the Wasserstein distance, which is more robust. Extensive experiments on various data demonstrate their effectiveness.
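For readers unfamiliar with the matrix normal distribution, the following is a sketch of the kind of noise model the abstract describes, not the paper's exact formulation. The symbols $X$, $L$, $E$, $U$ and $V$ are assumed notation, with rows of the $n \times p$ data matrix indexing samples and columns indexing features.

```latex
% Assumed notation: X is the n x p data matrix, L a low-rank signal, E the noise.
X = L + E, \qquad E \sim \mathcal{MN}_{n \times p}(0,\, U,\, V)
% Equivalent statement on the column-stacked vector, with U (n x n) the
% among-sample covariance and V (p x p) the among-feature covariance:
\operatorname{vec}(E) \sim \mathcal{N}\bigl(0,\; V \otimes U\bigr)
% Setting U = I_n and V = \sigma^2 I_p recovers the IID Gaussian noise of PPCA.
```

The Kronecker structure is what lets a single model express correlation across samples and across features at the same time.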

Preprint, 2021, arXiv

Tessellated Wasserstein Auto-Encoders

Non-adversarial generative models such as the variational auto-encoder (VAE), the Wasserstein auto-encoder with maximum mean discrepancy (WAE-MMD), and the sliced-Wasserstein auto-encoder (SWAE) are relatively easy to train and suffer less from mode collapse than the Wasserstein auto-encoder with generative adversarial network (WAE-GAN). However, they are not very accurate in approximating the target distribution in the latent space because they lack a discriminator to detect minor differences between real and fake samples. To this end, we develop a novel non-adversarial framework called Tessellated Wasserstein Auto-encoders (TWAE), which tessellates the support of the target distribution into a given number of regions by the centroidal Voronoi tessellation (CVT) technique and designs batches of data according to the tessellation, instead of random shuffling, for accurate computation of the discrepancy. Theoretically, we demonstrate that the error in estimating the discrepancy decreases as the number of samples $n$ and the number of regions $m$ of the tessellation grow, with rates of $\mathcal{O}(\frac{1}{\sqrt{n}})$ and $\mathcal{O}(\frac{1}{\sqrt{m}})$, respectively. Given fixed $n$ and $m$, a necessary condition for the upper bound of the measurement error to be minimized is that the tessellation is the one determined by CVT. TWAE is flexible with respect to different non-adversarial metrics and can substantially enhance their generative performance in terms of Fréchet inception distance (FID) compared to VAE, WAE-MMD, and SWAE. Moreover, numerical results demonstrate that TWAE is competitive with the adversarial model WAE-GAN, demonstrating its powerful generative ability.
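A centroidal Voronoi tessellation is commonly approximated with Lloyd's iteration: assign each sample to its nearest generator, then move each generator to the centroid of its region. The sketch below does this for one-dimensional samples and then groups the samples by region, loosely mirroring the idea of building batches from the tessellation. It is an illustration under those assumptions, not the TWAE procedure; `assign_regions` and `lloyd_1d` are hypothetical names.

```python
def assign_regions(points, generators):
    """Group points by their nearest generator (a 1D Voronoi partition)."""
    regions = [[] for _ in generators]
    for x in points:
        nearest = min(range(len(generators)), key=lambda j: abs(x - generators[j]))
        regions[nearest].append(x)
    return regions


def lloyd_1d(points, generators, iterations=20):
    """Lloyd's iteration: move each generator to the centroid of its region."""
    gens = list(generators)
    for _ in range(iterations):
        regions = assign_regions(points, gens)
        # An empty region keeps its previous generator.
        gens = [sum(r) / len(r) if r else g for r, g in zip(regions, gens)]
    return gens


samples = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]
generators = lloyd_1d(samples, [0.0, 0.5])
# One batch per region instead of a random shuffle of all samples.
batches = assign_regions(samples, generators)
print(generators, batches)
```

At convergence each generator coincides with the centroid of its own region, which is the defining property of a CVT.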

Preprint, 2011, arXiv

Cicada: a Heavy but Agile Flyer

"Cicada: a Heavy but Agile Flyer" is a fluid dynamics video submitted to the Gallery of Fluid Motion at APS-DFD 2011. Compared with other insects, cicadas can generate much higher lift to overcome their large body weight. The underlying mechanism may help in designing a Micro Air Vehicle (MAV) that can carry large payloads. However, little literature discusses how cicadas use their wings to accomplish various flights. In this work, a high-speed photogrammetry system and 3D surface reconstruction technology are used to reveal cicada wing kinematics and deformation during free forward flight. The aerodynamic performance is studied using an in-house Computational Fluid Dynamics (CFD) solver based on the immersed boundary method.

Preprint, 2011, arXiv

Deterioration of Damselfly Flight Performance due to Wing Damage

In this video, the effect of chordwise damage on the wings of a damselfly (American Rubyspot) is investigated. High-speed photogrammetry was used to collect flight data from damselflies with intact wings and with wings damaged along the wing chord. Different levels of deterioration in flight performance can be observed. Further investigation will address the dynamic and aerodynamic roles of each wing with and without damage.
