Researcher profile

Jack Xin

Jack Xin contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
17 works · 0 followers · 16 topics · 4 close collaborators

Published work

17 published items

Preprint · 2026 · arXiv

A Stochastic Genetic Interacting Particle Method for Reaction-Diffusion-Advection Equations

We develop and analyze a stochastic genetic interacting particle method (SGIP) for reaction-diffusion-advection (RDA) equations. The SGIP method employs operator splitting to approximate the advection-diffusion and reaction processes, treating the former using particle drift-diffusion and the latter via exact or implicit integration of reaction dynamics over bins, where particle density is estimated using a histogram. A key innovation is the incorporation of adaptive resampling to close the loop of particle and density field description of solutions, mimicking the selection mechanism in genetics. Resampling is also crucial for maintaining long-term stability by redistributing particles in accordance with the evolving density field. We provide a comprehensive error analysis and establish convergence bounds under appropriate regularity assumptions. Numerical experiments in one to three space dimensions demonstrate the method's effectiveness across various reaction types (Fisher-Kolmogorov-Petrovsky-Piskunov (FKPP), cubic, Arrhenius) and flow configurations (shear, cellular, cat's eye, Arnold-Beltrami-Childress (ABC) flows), showing excellent agreement with the finite difference method (FDM) while offering computational advantages for complex flow geometries and higher-dimensional problems.
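
The split step described above (particle drift-diffusion, histogram density estimate, reaction in each bin, then resampling) can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions, not the authors' SGIP code: the periodic domain [0, L), the constant drift v, the FKPP reaction u' = r u (1 - u) integrated exactly, and the names fkpp_exact and sgip_step are all hypothetical choices for the sketch.

```python
import numpy as np

def fkpp_exact(u, r, h):
    """Exact solution of the logistic (FKPP) reaction u' = r*u*(1 - u) after time h."""
    g = np.exp(r * h)
    return u * g / (1.0 - u + u * g)

def sgip_step(x, mass, rng, *, v, D, r, h, L, nbins):
    """One operator-splitting step of a 1D SGIP-style sketch on the periodic domain [0, L).

    x    : particle positions, shape (N,)
    mass : total mass carried by the particle ensemble
    Returns new positions, new total mass, and the bin densities after the reaction.
    """
    n = x.size
    # 1) advection-diffusion: Euler-Maruyama drift-diffusion of every particle
    x = (x + v * h + np.sqrt(2.0 * D * h) * rng.standard_normal(n)) % L
    # 2) histogram density estimate on uniform bins (each particle carries mass/n)
    counts, edges = np.histogram(x, bins=nbins, range=(0.0, L))
    dx = L / nbins
    u = counts * (mass / n) / dx
    # 3) reaction: integrate the logistic ODE exactly in every bin
    u = fkpp_exact(u, r, h)
    mass = u.sum() * dx
    # 4) resampling: draw n equally weighted particles from the updated
    #    piecewise-constant density, uniformly within each chosen bin
    bins = rng.choice(nbins, size=n, p=u / u.sum())
    x = edges[bins] + dx * rng.random(n)
    return x, mass, u
```

Resampling keeps the particle count fixed while the total mass follows the reaction, which is the "selection" role the abstract attributes to it.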

Preprint · 2026 · arXiv

USEMA: a Scalable Efficient Mamba Like Attention for Medical Image Segmentation

Accurate medical image segmentation is an integral part of the medical image analysis pipeline and requires the ability to merge local and global information. While vision transformers can capture global interactions using vanilla self-attention, their quadratic computational complexity in the input size remains a challenge for medical image segmentation tasks. Motivated by the dispersion property of vanilla self-attention and the recent development of Mamba-style attention, Scalable and Efficient Mamba-like Attention (SEMA) uses token localization via local window attention to avoid dispersion and maintain focus, complemented by theoretically consistent arithmetic averaging to capture the global aspect of attention. In this work, we present USEMA, a hybrid UNet architecture that merges the local feature extraction ability of convolutional neural networks (CNNs) with SEMA attention. We conduct experiments with USEMA across a variety of modalities and image sizes, demonstrating improved computational efficiency compared to transformer-based models using full self-attention, and superior segmentation performance relative to purely convolutional and Mamba-based models.

Preprint · 2022 · arXiv

An integrated recurrent neural network and regression model with spatial and climatic couplings for vector-borne disease dynamics

We developed an integrated recurrent neural network and nonlinear regression spatio-temporal model for vector-borne disease evolution. We take into account climate data and seasonality as external factors that correlate with disease-transmitting insects (e.g. flies), as well as spill-over infections from neighboring regions surrounding a region of interest. The climate data is encoded into the model through a quadratic embedding scheme motivated by recommendation systems. The neighboring regions' influence is modeled by a long short-term memory neural network. The integrated model is trained by stochastic gradient descent and tested on leishmaniasis data in Sri Lanka from 2013 to 2018, when infection outbreaks occurred. Our model outperformed ARIMA models across a number of regions with high infections, and an associated ablation study lends support to our modeling hypothesis and ideas.
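
The abstract does not spell out the quadratic embedding. One well-known quadratic model from recommendation systems is the factorization machine, whose pairwise term sum_{i<j} <v_i, v_j> x_i x_j can be evaluated in O(nk) time. The sketch below shows that term purely as an assumed analogue; the function name fm_quadratic and the latent matrix V are hypothetical, and this is not the paper's encoding of climate data.

```python
import numpy as np

def fm_quadratic(x, V):
    """Pairwise term sum_{i<j} <V[i], V[j]> * x_i * x_j of a factorization machine.

    x : feature vector of shape (n,), e.g. climate covariates (assumed)
    V : latent factor matrix of shape (n, k), one k-dim embedding per feature
    Uses the identity 0.5 * sum_f [(sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2],
    which costs O(n*k) instead of O(n^2 * k).
    """
    xv = x @ V                      # (k,): sum_i V_if x_i for each factor f
    sq = (x ** 2) @ (V ** 2)        # (k,): sum_i V_if^2 x_i^2 for each factor f
    return 0.5 * np.sum(xv ** 2 - sq)
```

The identity follows from expanding the square: each cross term x_i x_j V_if V_jf with i < j appears twice in (sum_i V_if x_i)^2, and the diagonal terms are subtracted off.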

Preprint · 2022 · arXiv

Channel Pruning In Quantization-aware Training: An Adaptive Projection-gradient Descent-shrinkage-splitting Method

We propose an adaptive projection-gradient descent-shrinkage-splitting method (APGDSSM) to integrate penalty-based channel pruning into quantization-aware training (QAT). APGDSSM concurrently searches weights in both the quantized subspace and the sparse subspace. APGDSSM uses a shrinkage operator and a splitting technique to create sparse weights, together with the Group Lasso penalty to turn weight sparsity into channel sparsity. In addition, we propose a novel complementary transformed $\ell_1$ penalty to stabilize training under extreme compression.
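
The two operators named above, a shrinkage step that can zero out whole channels under the Group Lasso penalty and a projection onto a quantized set, can be sketched with NumPy as follows. The uniform symmetric quantizer (scale alpha, bit width bits), the one-row-per-channel weight layout, and the function names are assumptions for illustration; the paper's adaptive scheme and its complementary transformed $\ell_1$ penalty are not reproduced here.

```python
import numpy as np

def group_shrink(w, lam):
    """Proximal operator of lam * sum_c ||w[c]||_2 (Group Lasso over output channels).

    w has shape (channels, fan_in); each row is one channel (one group).
    Rows whose norm is at most lam become exactly zero, i.e. the channel is pruned.
    """
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    # block soft-thresholding factor max(0, 1 - lam / ||w_c||)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return w * scale

def quantize(w, alpha, bits):
    """Project onto the uniform symmetric grid {alpha * k : |k| <= 2**(bits-1) - 1}."""
    m = 2 ** (bits - 1) - 1
    return alpha * np.clip(np.round(w / alpha), -m, m)
```

Applying quantize after group_shrink keeps pruned channels at zero, since zero is always a grid point of the symmetric quantizer.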

Preprint · 2022 · arXiv

DeepParticle: learning invariant measure by a deep neural network minimizing Wasserstein distance on data generated from an interacting particle method

We introduce the so-called DeepParticle method to learn and generate invariant measures of stochastic dynamical systems with physical parameters, based on data computed from an interacting particle method (IPM). We utilize the expressiveness of deep neural networks (DNNs) to represent the transform of samples from a given input (source) distribution to an arbitrary target distribution, assuming neither closed-form distribution functions nor a finite state space for the samples. In training, we update the network weights to minimize a discrete Wasserstein distance between the input and target samples. To reduce computational cost, we propose an iterative divide-and-conquer (mini-batch interior point) algorithm to find the optimal transition matrix in the Wasserstein distance. We present numerical results to demonstrate the performance of our method for accelerating IPM computation of invariant measures of stochastic dynamical systems arising in computing reaction-diffusion front speeds through chaotic flows. The physical parameter is a large Péclet number reflecting the advection-dominated regime of our interest.
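
The discrete Wasserstein distance and its optimal transition matrix are easiest to see in the simplest case: one-dimensional samples of equal size with uniform weights, where sorting both samples gives an optimal coupling for any convex cost |x - y|^p with p >= 1. This sketch covers only that special case and stands in for, rather than reproduces, the paper's mini-batch interior point algorithm for the general problem; the function names are hypothetical.

```python
import numpy as np

def wasserstein_1d(a, b, p=2):
    """W_p distance between two equal-size 1D empirical measures with uniform weights.

    In 1D the monotone (sorted) matching is an optimal coupling for convex costs,
    so W_p^p is the mean of |a_(i) - b_(i)|^p over the order statistics.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("samples must have the same size")
    return np.mean(np.abs(a - b) ** p) ** (1.0 / p)

def transition_matrix_1d(a, b):
    """Optimal transition matrix T with T[i, j] = 1/n when a[i] is matched to b[j]."""
    n = len(a)
    T = np.zeros((n, n))
    # pair the k-th smallest a with the k-th smallest b
    T[np.argsort(a), np.argsort(b)] = 1.0 / n
    return T
```

In higher dimensions no sorting shortcut exists, which is why a general optimal transport solver (such as the divide-and-conquer interior point method in the abstract) is needed.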

Preprint · 2022 · arXiv

Enhancing Zero-Shot Many to Many Voice Conversion with Self-Attention VAE

A variational auto-encoder (VAE) is an effective neural network architecture for disentangling a speech utterance into speaker identity and linguistic content latent embeddings, and then generating an utterance for a target speaker from that of a source speaker. This is done by concatenating the identity embedding of the target speaker with the content embedding of the source speaker uttering a desired sentence. In this work, we propose to improve VAE models with self-attention and structural regularization (RGSM). Specifically, we found a suitable location in the VAE's decoder to add a self-attention layer that incorporates non-local information in generating a converted utterance and hides the source speaker's identity. We applied the relaxed group-wise splitting method (RGSM) to regularize network weights and remarkably enhance generalization performance. In zero-shot many-to-many voice conversion experiments on the VCTK data set, with the self-attention layer and the relaxed group-wise splitting method, our model improves speaker classification accuracy on unseen speakers by 28.3% while slightly improving converted voice quality in terms of MOSNet scores. Our encouraging findings point to future research on integrating a wider variety of attention structures into the VAE framework, while controlling model size and overfitting, to advance zero-shot many-to-many voice conversion.

Preprint · 2022 · arXiv

GLassoformer: A Query-Sparse Transformer for Post-Fault Power Grid Voltage Prediction

We propose GLassoformer, a novel and efficient transformer architecture leveraging group Lasso regularization to reduce the number of queries of the standard self-attention mechanism. Due to the sparsified queries, GLassoformer is more computationally efficient than standard transformers. On the power grid post-fault voltage prediction task, GLassoformer shows remarkably better prediction than many existing benchmark algorithms in terms of accuracy and stability.

Preprint · 2022 · arXiv

Proximal Implicit ODE Solvers for Accelerating Learning Neural ODEs

Learning neural ODEs often requires solving very stiff ODE systems, primarily using explicit adaptive step size ODE solvers. These solvers are computationally expensive, requiring the use of tiny step sizes for numerical stability and accuracy guarantees. This paper considers learning neural ODEs using implicit ODE solvers of different orders leveraging proximal operators. The proximal implicit solver consists of inner-outer iterations: the inner iterations approximate each implicit update step using a fast optimization algorithm, and the outer iterations solve the ODE system over time. The proximal implicit ODE solver guarantees superiority over explicit solvers in numerical stability and computational efficiency. We validate the advantages of proximal implicit solvers over existing popular neural ODE solvers on various challenging benchmark tasks, including learning continuous-depth graph neural networks and continuous normalizing flows.
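
For a gradient-field ODE $x' = -\nabla F(x)$, one backward Euler step is exactly a proximal step, $x_{n+1} = \arg\min_y F(y) + \|y - x_n\|^2/(2h)$, which makes the inner-outer structure described above concrete. The sketch below assumes that special case, a first-order scheme, and plain gradient descent as the inner solver; the paper treats general neural ODEs, several orders, and its own fast inner optimizer, and the function names here are hypothetical.

```python
import numpy as np

def prox_implicit_euler_step(x, grad_F, lip_F, h, tol=1e-9, max_iter=5000):
    """One backward-Euler step for the gradient flow x' = -grad_F(x).

    x_new = x - h * grad_F(x_new) is the optimality condition of
    min_y F(y) + ||y - x||^2 / (2h); the inner loop solves it by gradient descent.
    lip_F is a Lipschitz constant of grad_F.
    """
    y = x.copy()
    step = 1.0 / (lip_F + 1.0 / h)   # 1 / (Lipschitz constant of the proximal objective's gradient)
    for _ in range(max_iter):        # inner iterations
        g = grad_F(y) + (y - x) / h
        if np.linalg.norm(g) < tol:
            break
        y = y - step * g
    return y

def integrate(x0, grad_F, lip_F, h, n_steps):
    """Outer iterations: march the ODE in time with the implicit step above."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = prox_implicit_euler_step(x, grad_F, lip_F, h)
    return x
```

With a stiff linear field such as grad_F(y) = diag(1, 1000) y and h = 0.1, explicit Euler multiplies the stiff component by 1 - 100 = -99 per step and diverges, while the implicit step damps it by 1/101.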

Preprint · 2022 · arXiv

RARTS: An Efficient First-Order Relaxed Architecture Search Method

Differentiable architecture search (DARTS) is an effective method for data-driven neural network design based on solving a bilevel optimization problem. Despite its success in many architecture search tasks, there are still concerns about the accuracy of first-order DARTS and the efficiency of second-order DARTS. In this paper, we formulate a single-level alternative and a relaxed architecture search (RARTS) method that utilizes the whole dataset in architecture learning via both data and network splitting, without involving the mixed second derivatives of the corresponding loss functions as DARTS does. In our formulation of network splitting, two networks with different but related weights cooperate in searching for a shared architecture. The advantage of RARTS over DARTS is justified by a convergence theorem and an analytically solvable model. Moreover, RARTS outperforms DARTS and its variants in accuracy and search efficiency, as shown by experimental results. For the task of searching topological architecture, i.e., the edges and the operations, RARTS obtains higher accuracy and a 60% reduction in computational cost compared with second-order DARTS on CIFAR-10. RARTS continues to outperform DARTS upon transfer to ImageNet and is on par with recent variants of DARTS, even though our innovation is purely in the training algorithm, without modifying the search space. For the task of searching width, i.e., the number of channels in convolutional layers, RARTS also outperforms traditional network pruning benchmarks. Further experiments on public architecture search benchmarks such as NATS-Bench also support the preeminence of RARTS.

Preprint · 2022 · arXiv

Searching Intrinsic Dimensions of Vision Transformers

It has been shown by many researchers that transformers perform as well as convolutional neural networks in many computer vision tasks. Meanwhile, the large computational costs of their attention modules hinder further studies and applications on edge devices. Some pruning methods have been developed to construct efficient vision transformers, but most of them have considered image classification tasks only. Inspired by these results, we propose SiDT, a method for pruning vision transformer backbones on more complicated vision tasks like object detection, based on a search of transformer dimensions. Experiments on the CIFAR-100 and COCO datasets show that backbones with 20% or 40% of their dimensions/parameters pruned can have similar or even better performance than the unpruned models. Moreover, we also provide a complexity analysis and comparisons with previous pruning methods.

Preprint · 2021 · arXiv

Structure Assisted NMF Methods for Separation of Degenerate Mixture Data with Application to NMR Spectroscopy

In this paper, we develop structure-assisted nonnegative matrix factorization (NMF) methods for blind source separation of degenerate data. The motivation originates from nuclear magnetic resonance (NMR) spectroscopy, where multiple mixtures of NMR spectra are recorded to identify chemical compounds with similar structures. Considering the linear mixing model (LMM), we aim to identify the chemical compounds involved when the mixing process is known to be nearly singular. We first consider a class of data with dominant interval(s) (DI), where each source signal has dominant peaks over the others. In addition, a nearly singular mixing process produces degenerate mixtures. The DI condition implies clustering structures in the data points. Hence, the mixing matrix can be estimated by data clustering. Due to the presence of noise and the degeneracy of the data, a small deviation in the estimation may introduce errors in the output. To resolve this problem and improve the robustness of the separation, methods are developed in two directions. One is to find a better estimate of the mixing matrix by allowing a constrained perturbation of the clustering output, which can be achieved by quadratic programming. The other is to seek sparse source signals by exploiting the DI condition, which leads to an $\ell_1$ optimization problem. If no source information is available, we propose to adopt the nonnegative matrix factorization approach by incorporating the matrix structure (parallel columns of the mixing matrix) into the cost function, and we develop multiplicative iteration rules for the numerical solution. We present experimental results on NMR data to show the performance and reliability of the method in applications arising in NMR spectroscopy.
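
The multiplicative iteration rules mentioned above build on the standard Lee-Seung updates for nonnegative matrix factorization. The sketch below shows only that unconstrained baseline for $\min \|V - WH\|_F^2$ with $W, H \ge 0$; the structural term for parallel columns of the mixing matrix, which is the paper's contribution, is not included, and the random initialization, the eps safeguard, and the function name are assumptions.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=500, eps=1e-12, seed=0):
    """Standard Lee-Seung multiplicative updates for min ||V - W H||_F^2 with W, H >= 0.

    V must be nonnegative. Entries of W and H stay nonnegative because each update
    multiplies by a ratio of nonnegative quantities; eps guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + 0.1   # strictly positive start
    H = rng.random((rank, n)) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

In this Frobenius-norm form the objective is nonincreasing under each update, which is what makes multiplicative rules a convenient base for adding structure-promoting terms to the cost.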

Preprint · 2020 · arXiv

A Recurrent Neural Network and Differential Equation Based Spatiotemporal Infectious Disease Model with Application to COVID-19

The outbreaks of Coronavirus Disease 2019 (COVID-19) have impacted the world significantly. Modeling the trend of infection and real-time forecasting of cases can help decision making and control of the disease spread. However, data-driven methods such as recurrent neural networks (RNN) can perform poorly due to limited daily samples in time. In this work, we develop an integrated spatiotemporal model based on the epidemic differential equations (SIR) and RNN. The former, after simplification and discretization, is a compact model of the temporal infection trend of a region, while the latter models the effect of nearest neighboring regions and captures latent spatial information. We trained and tested our model on COVID-19 data in Italy, and show that it outperforms existing temporal models (fully connected NN, SIR, ARIMA) in 1-day, 3-day, and 1-week ahead forecasting, especially in the regime of limited training data.
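
A simplified, discretized SIR model of the kind described above can be sketched with a forward Euler step of size dt. Constant transmission and recovery rates beta and gamma, the daily step, and the function name are assumptions of this sketch; the RNN component that couples neighboring regions is not shown.

```python
import numpy as np

def sir_forward_euler(S0, I0, R0, beta, gamma, dt, n_steps):
    """Forward-Euler discretization of the SIR equations
        S' = -beta*S*I/N,  I' = beta*S*I/N - gamma*I,  R' = gamma*I.
    Returns arrays of S, I, R of length n_steps + 1.
    """
    N = S0 + I0 + R0
    S, I, R = [S0], [I0], [R0]
    for _ in range(n_steps):
        s, i, r = S[-1], I[-1], R[-1]
        new_inf = dt * beta * s * i / N   # new infections in this step
        new_rec = dt * gamma * i          # new recoveries in this step
        S.append(s - new_inf)
        I.append(i + new_inf - new_rec)
        R.append(r + new_rec)
    return np.array(S), np.array(I), np.array(R)
```

Because the increments cancel across compartments, S + I + R stays equal to N at every step, and with beta/gamma > 1 the infected count initially grows.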

Preprint · 2020 · arXiv

Convergence of a Relaxed Variable Splitting Method for Learning Sparse Neural Networks via $\ell_1, \ell_0$, and transformed-$\ell_1$ Penalties

Sparsification of neural networks is an effective complexity reduction method for improving efficiency and generalizability. We consider the problem of learning a one-hidden-layer convolutional neural network with ReLU activation function via gradient descent under sparsity-promoting penalties. It is known that when the input data is Gaussian distributed, no-overlap networks (without penalties) in regression problems with ground truth can be learned in polynomial time with high probability. We propose a relaxed variable splitting method integrating thresholding and gradient descent to overcome the non-smoothness of the loss function. Sparsity in the network weights is realized during the optimization (training) process. We prove that under $\ell_1$, $\ell_0$, and transformed-$\ell_1$ penalties, no-overlap networks can be learned with high probability, and the iterative weights converge to a global limit which is a transformation of the true weight under a novel thresholding operation. Numerical experiments confirm the theoretical findings and compare the accuracy and sparsity trade-off among the penalties.
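
The relaxed variable splitting idea, alternating an exact thresholding step on an auxiliary variable u with a gradient step on the weights w for the objective f(w) + lam*P(u) + beta/2*||w - u||^2, can be sketched on a least-squares loss. The least-squares stand-in for the one-hidden-layer CNN loss, the parameter names, and the restriction to the $\ell_1$ and $\ell_0$ thresholds (the transformed-$\ell_1$ threshold has a more involved closed form and is omitted) are assumptions of this sketch.

```python
import numpy as np

def soft_threshold(w, t):
    """Prox of t*||u||_1: shrink every entry toward zero by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def hard_threshold(w, t):
    """Prox of t*||u||_0 (with quadratic weight 1/2): keep entries with w_i^2 > 2t."""
    return np.where(w ** 2 > 2.0 * t, w, 0.0)

def rvsm(X, y, lam, beta, eta, n_iter, penalty="l1"):
    """Relaxed variable splitting on f(w) + lam*P(u) + beta/2*||w - u||^2.

    f(w) = ||X w - y||^2 / (2n) is a least-squares stand-in for the network loss.
    Each iteration: exact minimization in u (thresholding), then one gradient step in w.
    Returns the final (w, u); u carries the sparsity pattern.
    """
    if penalty not in ("l1", "l0"):
        raise ValueError("penalty must be 'l1' or 'l0'")
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        if penalty == "l1":
            u = soft_threshold(w, lam / beta)
        else:
            u = hard_threshold(w, lam / beta)
        grad = X.T @ (X @ w - y) / n + beta * (w - u)
        w = w - eta * grad
    return w, u
```

Sparsity appears in u during training rather than through a post-hoc pruning step, mirroring the abstract's remark that sparsity is realized during optimization.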

Preprint · 2020 · arXiv

Convergence of stochastic structure-preserving schemes for computing effective diffusivity in random flows

In this paper, we propose stochastic structure-preserving schemes to compute the effective diffusivity of particles moving in random flows. We first describe the motion of particles using the Lagrangian formulation, which is modeled by stochastic differential equations (SDEs). We also discuss the definition of the corrector problem and effective diffusivity. Then we propose stochastic structure-preserving schemes to solve the SDEs and provide a sharp convergence analysis for the numerical schemes in computing effective diffusivity. The convergence analysis follows a probabilistic approach, which interprets the solution process generated by our numerical schemes as a Markov process. By using the central limit theorem for the solution process, we obtain a convergence analysis of our method in computing long-time solutions. Most importantly, our convergence analysis reveals the connection between discrete-type and continuous-type corrector problems, which is fundamental and interesting. We present numerical results to demonstrate the accuracy and efficiency of the proposed method and investigate the convection-enhanced diffusion phenomenon in two- and three-dimensional incompressible random flows.
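
The Lagrangian viewpoint above can be illustrated with a Monte Carlo estimate of effective diffusivity, $D_{\text{eff}} \approx \mathbb{E}[(X_T - X_0)^2]/(2T)$ for large $T$. This sketch is not the paper's method: it uses plain Euler-Maruyama instead of a structure-preserving scheme, and the deterministic incompressible cellular flow $v = A(\sin x \cos y, -\cos x \sin y)$ is an assumed stand-in for a random flow. The function name and parameters are hypothetical.

```python
import numpy as np

def effective_diffusivity(amp, sigma, T, dt, n_particles, seed=0):
    """Monte Carlo estimate of the x-direction effective diffusivity for
    dX = v(X) dt + sigma dW in the 2D cellular flow
    v = amp * (sin x cos y, -cos x sin y), which is divergence free.

    Uses plain Euler-Maruyama and the long-time estimate E[(X_T - X_0)^2] / (2T).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 2.0 * np.pi, n_particles)
    y = rng.uniform(0.0, 2.0 * np.pi, n_particles)
    x0 = x.copy()
    for _ in range(int(round(T / dt))):
        vx = amp * np.sin(x) * np.cos(y)
        vy = -amp * np.cos(x) * np.sin(y)
        dw = rng.standard_normal((2, n_particles)) * np.sqrt(dt)
        x, y = x + vx * dt + sigma * dw[0], y + vy * dt + sigma * dw[1]
    return np.mean((x - x0) ** 2) / (2.0 * T)
```

With amp = 0 the estimate should be close to the molecular value sigma**2 / 2; a nonzero flow is where convection-enhanced diffusion can be studied, though accurate long-time values are exactly where the paper's structure-preserving schemes matter.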

Preprint · 2020 · arXiv

Enhanced Diffusivity in Perturbed Senile Reinforced Random Walk Models

We consider the diffusivity of random walks with transition probabilities depending on the number of consecutive traversals of the last traversed edge, the so-called senile reinforced random walk (SeRW). In one dimension, the walk is known to be sub-diffusive with the identity reinforcement function. We perturb the model by introducing a small probability $\delta$ of escaping the last traversed edge at each step. The perturbed SeRW model is diffusive for any $\delta > 0$, with enhanced diffusivity ($\gg O(\delta^2)$) in the small $\delta$ regime. We further study stochastically perturbed SeRW models in which the last-edge escape probability has the form $\delta\, \xi_n$, with the $\xi_n$ being independent random variables. Enhanced diffusivity in such models is logarithmically close to the so-called residual diffusivity (positive in the zero $\delta$ limit), with diffusivity between $O\left(\frac{1}{|\log\delta|}\right)$ and $O\left(\frac{1}{\log|\log\delta|}\right)$. Finally, we generalize our results to higher dimensions, where the unperturbed model is already diffusive. The enhanced diffusivity can be as large as $O(\log^{-2}\delta)$.

Preprint · 2020 · arXiv

Lorentzian Peak Sharpening and Sparse Blind Source Separation for NMR Spectroscopy

In this paper, we introduce a preprocessing technique for blind source separation (BSS) of nonnegative and overlapped data. For nuclear magnetic resonance (NMR) spectroscopy, the classical method of Naanaa and Nuzillard (NN) requires source signals to be non-overlapping at certain locations, while they are allowed to overlap with each other elsewhere. NN's method works well with data signals that possess stand-alone peaks (SAP). However, the SAP condition does not fully hold for realistic NMR spectra, and its violation often introduces errors or artifacts in NN's separation results. To address this issue, a preprocessing technique is developed here based on Lorentzian peak shapes and weighted peak sharpening. The idea is to superimpose the original peak signal with its weighted negative second-order derivative. The resulting sharpened (narrower and taller) peaks enable NN's method to work under a more relaxed SAP condition, the so-called dominant peaks condition (DPS), and deliver improved results. To achieve optimal sharpening while preserving data nonnegativity, we prove the existence of an upper bound on the weight parameter and propose a selection criterion. Numerical experiments on NMR spectroscopy data show satisfactory performance of our proposed method.
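
Peak sharpening, superimposing a peak with its weighted negative second derivative, can be checked on a single unit-height Lorentzian $L(x) = \gamma^2/(x^2+\gamma^2)$, whose second derivative is $L''(x) = \gamma^2(6x^2 - 2\gamma^2)/(x^2+\gamma^2)^3$. The sketch uses these analytic formulas; restricting to one isolated peak is an assumption, since real NMR spectra contain many overlapping peaks, and the function names are hypothetical.

```python
import numpy as np

def lorentzian(x, gamma):
    """Unit-height Lorentzian peak centered at 0 with half-width at half-maximum gamma."""
    return gamma ** 2 / (x ** 2 + gamma ** 2)

def lorentzian_dd(x, gamma):
    """Analytic second derivative of lorentzian(x, gamma)."""
    return gamma ** 2 * (6 * x ** 2 - 2 * gamma ** 2) / (x ** 2 + gamma ** 2) ** 3

def sharpen(x, gamma, w):
    """Weighted peak sharpening: signal minus w times its second derivative."""
    return lorentzian(x, gamma) - w * lorentzian_dd(x, gamma)
```

For this single peak the sharpened height at $x = 0$ is $1 + 2w/\gamma^2$. Writing $t = x^2$, the sharpened signal equals $\gamma^2[(t+\gamma^2)^2 - w(6t - 2\gamma^2)]/(t+\gamma^2)^3$, and the bracket has minimum $w(8\gamma^2 - 9w)$ over $t \ge 0$ when $3w > \gamma^2$, so the signal stays nonnegative exactly when $w \le 8\gamma^2/9$. This single-peak bound is worked out here for illustration and is not the bound or selection criterion proved in the paper.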

Preprint · 2020 · arXiv

Two-Grid based Adaptive Proper Orthogonal Decomposition Algorithm for Time Dependent Partial Differential Equations

In this article, we propose a two-grid based adaptive proper orthogonal decomposition (POD) method to solve time-dependent partial differential equations. Based on the error obtained on the coarse grid, we propose an error indicator for the numerical solution obtained on the fine grid. Our new algorithm is cheap and easy to implement. We apply our new method to the solution of time-dependent advection-diffusion equations with the Kolmogorov flow and the ABC flow. The numerical results show that our method is more efficient than existing POD methods.
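
The POD step itself can be sketched as a truncated SVD of a snapshot matrix whose columns are solution snapshots in time. The energy threshold, the snapshot layout, and the function name are assumptions; the two-grid error indicator that drives the paper's adaptive basis updates is not specified in the abstract and is not shown.

```python
import numpy as np

def pod_basis(snapshots, energy=0.999):
    """POD basis from a snapshot matrix (rows = spatial grid points, columns = time snapshots).

    Keeps the smallest number of left singular vectors whose squared singular
    values capture at least the requested fraction of the total snapshot energy.
    """
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(frac, energy)) + 1, s.size)
    return U[:, :k]
```

A reduced model then evolves only the k coefficients of the solution in this basis; an adaptive method decides when the basis must be rebuilt, which is the role of the coarse-grid error indicator in the abstract.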