Researcher profile

Xinhua Zhang

Xinhua Zhang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
10works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2026arXiv

Adaptive Data Harvesting for Efficient Neural Network Learning with Universal Constraints

Training neural networks to satisfy universal constraints over continuous domains poses unique challenges. Common examples include Lyapunov Neural Networks (Lyapunov NNs) and Physics-Informed Neural Networks (PINNs), where analytical solutions are generally either unavailable or overly restrictive. Sample-based methods are therefore commonly used to enforce these constraints, and the choice of samples has a substantial impact on convergence speed, stability, and solution quality. Most existing methods rely on fixed heuristics or handcrafted rules, and are suboptimal in practice. In this paper, we aim to improve upon them by learning, from data and experience, how to dynamically and iteratively adjust the samples in response to the model's evolving learning performance. Trained by reinforcement learning, the learned policy improves empirical constraint satisfaction on test problems while significantly improving efficiency. We validate the approach on both Lyapunov NNs and PINNs, and demonstrate its broader applicability to domains where adaptive input selection is essential for effective training.

preprint2026arXiv

Language Model Distillation: A Temporal Difference Imitation Learning Perspective

Large language models have led to significant progress across many NLP tasks, although their massive sizes often incur substantial computational costs. Distillation has become a common practice to compress these large and highly capable models into smaller, more efficient ones. Many existing language model distillation methods can be viewed as behavior cloning from the perspective of imitation learning or inverse reinforcement learning. This viewpoint has inspired subsequent studies that leverage (inverse) reinforcement learning techniques, including variations of behavior cloning and temporal difference learning methods. Rather than proposing yet another specific temporal difference method, we introduce a general framework for temporal difference-based distillation by exploiting the distributional sparsity of the teacher model. Specifically, it is often observed that language models assign most probability mass to a small subset of tokens. Motivated by this observation, we design a temporal difference learning framework that operates on a reduced action space (a subset of vocabulary), and demonstrate how practical algorithms can be derived and the resulting performance improvements.

preprint2022arXiv

Euclidean Invariant Recognition of 2D Shapes Using Histograms of Magnitudes of Local Fourier-Mellin Descriptors

Because the magnitude of inner products with its basis functions are invariant to rotation and scale change, the Fourier-Mellin transform has long been used as a component in Euclidean invariant 2D shape recognition systems. Yet Fourier-Mellin transform magnitudes are only invariant to rotation and scale changes about a known center point, and full Euclidean invariant shape recognition is not possible except when this center point can be consistently and accurately identified. In this paper, we describe a system where a Fourier-Mellin transform is computed at every point in the image. The spatial support of the Fourier-Mellin basis functions is made local by multiplying them with a polynomial envelope. Significantly, the magnitudes of convolutions with these complex filters at isolated points are not (by themselves) used as features for Euclidean invariant shape recognition because reliable discrimination would require filters with spatial support large enough to fully encompass the shapes. Instead, we rely on the fact that normalized histograms of magnitudes are fully Euclidean invariant. We demonstrate a system based on the VLAD machine learning method that performs Euclidean invariant recognition of 2D shapes and requires an order of magnitude less training data than comparable methods based on convolutional neural networks.

preprint2022arXiv

Orthogonal Gromov-Wasserstein Discrepancy with Efficient Lower Bound

Comparing structured data from possibly different metric-measure spaces is a fundamental task in machine learning, with applications in, e.g., graph classification. The Gromov-Wasserstein (GW) discrepancy formulates a coupling between the structured data based on optimal transportation, tackling the incomparability between different structures by aligning the intra-relational geometries. Although efficient \emph{local} solvers such as conditional gradient and Sinkhorn are available, the inherent non-convexity still prevents a tractable evaluation, and the existing lower bounds are not tight enough for practical use. To address this issue, we take inspiration from the connection with the quadratic assignment problem, and propose the orthogonal Gromov-Wasserstein (OGW) discrepancy as a surrogate of GW. It admits an efficient and \emph{closed-form} lower bound with $\mathcal{O}(n^3)$ complexity, and directly extends to the fused Gromov-Wasserstein (FGW) distance, incorporating node features into the coupling. Extensive experiments on both the synthetic and real-world datasets show the tightness of our lower bounds, and both OGW and its lower bounds efficiently deliver accurate predictions and satisfactory barycenters for graph sets.

preprint2022arXiv

Plasma Image Classification Using Cosine Similarity Constrained CNN

Plasma jets are widely investigated both in the laboratory and in nature. Astrophysical objects such as black holes, active galactic nuclei, and young stellar objects commonly emit plasma jets in various forms. With the availability of data from plasma jet experiments resembling astrophysical plasma jets, classification of such data would potentially aid in investigating not only the underlying physics of the experiments but the study of astrophysical jets. In this work we use deep learning to process all of the laboratory plasma images from the Caltech Spheromak Experiment spanning two decades. We found that cosine similarity can aid in feature selection, classify images through comparison of feature vector direction, and be used as a loss function for the training of AlexNet for plasma image classification. We also develop a simple vector direction comparison algorithm for binary and multi-class classification. Using our algorithm we demonstrate 93% accurate binary classification to distinguish unstable columns from stable columns and 92% accurate five-way classification of a small, labeled data set which includes three classes corresponding to varying levels of kink instability.

preprint2022arXiv

Similarity Equivariant Linear Transformation of Joint Orientation-Scale Space Representations

Convolution is conventionally defined as a linear operation on functions of one or more variables which commutes with shifts. Group convolution generalizes the concept to linear operations on functions of group elements representing more general geometric transformations and which commute with those transformations. Since similarity transformation is the most general geometric transformation on images that preserves shape, the group convolution that is equivariant to similarity transformation is the most general shape preserving linear operator. Because similarity transformations have four free parameters, group convolutions are defined on four-dimensional, joint orientation-scale spaces. Although prior work on equivariant linear operators has been limited to discrete groups, the similarity group is continuous. In this paper, we describe linear operators on discrete representations that are equivariant to continuous similarity transformation. This is achieved by using a basis of functions that is it joint shiftable-twistable-scalable. These pinwheel functions use Fourier series in the orientation dimension and Laplace transform in the log-scale dimension to form a basis of spatially localized functions that can be continuously interpolated in position, orientation and scale. Although this result is potentially significant with respect to visual computation generally, we present an initial demonstration of its utility by using it to compute a shape equivariant distribution of closed contours traced by particles undergoing Brownian motion in velocity. The contours are constrained by sets of points and line endings representing well known bistable illusory contour inducing patterns.

preprint2020arXiv

Convex Representation Learning for Generalized Invariance in Semi-Inner-Product Space

Invariance (defined in a general sense) has been one of the most effective priors for representation learning. Direct factorization of parametric models is feasible only for a small range of invariances, while regularization approaches, despite improved generality, lead to nonconvex optimization. In this work, we develop a convex representation learning algorithm for a variety of generalized invariances that can be modeled as semi-norms. Novel Euclidean embeddings are introduced for kernel representers in a semi-inner-product space, and approximation bounds are established. This allows invariant representations to be learned efficiently and effectively as confirmed in our experiments, along with accurate predictions.

preprint2020arXiv

Generalised Lipschitz Regularisation Equals Distributional Robustness

The problem of adversarial examples has highlighted the need for a theory of regularisation that is general enough to apply to exotic function classes, such as universal approximators. In response, we give a very general equality result regarding the relationship between distributional robustness and regularisation, as defined with a transportation cost uncertainty set. The theory allows us to (tightly) certify the robustness properties of a Lipschitz-regularised model with very mild assumptions. As a theoretical application we show a new result explicating the connection between adversarial learning and distributional robustness. We then give new results for how to achieve Lipschitz regularisation of kernel classifiers, which are demonstrated experimentally.

preprint2020arXiv

Proximal Mapping for Deep Regularization

Underpinning the success of deep learning is effective regularizations that allow a variety of priors in data to be modeled. For example, robustness to adversarial perturbations, and correlations between multiple modalities. However, most regularizers are specified in terms of hidden layer outputs, which are not themselves optimization variables. In contrast to prevalent methods that optimize them indirectly through model weights, we propose inserting proximal mapping as a new layer to the deep network, which directly and explicitly produces well regularized hidden layer outputs. The resulting technique is shown well connected to kernel warping and dropout, and novel algorithms were developed for robust temporal learning and multiview modeling, both outperforming state-of-the-art methods.

preprint2010arXiv

Faster Rates for training Max-Margin Markov Networks

Structured output prediction is an important machine learning problem both in theory and practice, and the max-margin Markov network (\mcn) is an effective approach. All state-of-the-art algorithms for optimizing \mcn\ objectives take at least $O(1/ε)$ number of iterations to find an $ε$ accurate solution. Recent results in structured optimization suggest that faster rates are possible by exploiting the structure of the objective function. Towards this end \citet{Nesterov05} proposed an excessive gap reduction technique based on Euclidean projections which converges in $O(1/\sqrtε)$ iterations on strongly convex functions. Unfortunately when applied to \mcn s, this approach does not admit graphical model factorization which, as in many existing algorithms, is crucial for keeping the cost per iteration tractable. In this paper, we present a new excessive gap reduction technique based on Bregman projections which admits graphical model factorization naturally, and converges in $O(1/\sqrtε)$ iterations. Compared with existing algorithms, the convergence rate of our method has better dependence on $ε$ and other parameters of the problem, and can be easily kernelized.