Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
30 works
0 followers
21 topics
4 close collaborators

Published work

30 published items

Preprint · 2026 · arXiv

A POD-DeepONet Framework for Forward and Inverse Design of 2D Photonic Crystals

We develop a reduced-order operator-learning framework for forward and inverse band-structure design of two-dimensional photonic crystals with binary, pixel-based $p4m$-symmetric unit cells. We construct a POD-DeepONet surrogate for the discrete band map along the standard high-symmetry path by coupling a POD trunk extracted from high-fidelity finite-element band snapshots with a neural branch network that predicts reduced coefficients. This architecture yields a compact and differentiable forward model that is tailored to the underlying Bloch eigenvalue discretization. We establish continuity of the discrete band map on the relaxed design space and prove a uniform approximation property of the POD-DeepONet surrogate, leading to a natural decomposition of the total surrogate error into POD truncation and network approximation contributions. Building on this forward surrogate, we formulate two end-to-end neural inverse design procedures, namely dispersion-to-structure and band-gap inverse design, with training objectives that combine data misfit, binarity promotion, and supervised regularization to address the intrinsic non-uniqueness of the inverse mapping and to enable stable gradient-based optimization in the relaxed space. Our numerical results show that the proposed framework achieves accurate forward predictions and produces effective inverse designs on practical high-contrast, pixel-based photonic layouts.
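As a rough illustration of what a POD trunk is, the sketch below extracts a POD basis from a snapshot matrix by thin SVD and truncates it with an energy criterion; the output is then "mean + modes @ coefficients". The synthetic snapshots, the `energy` threshold, mean-centring, and the function names are assumptions for illustration. In a POD-DeepONet-style surrogate the coefficients would be predicted by a trained branch network rather than obtained by projection as done here.

```python
import numpy as np

def pod_basis(snapshots: np.ndarray, energy: float = 0.999):
    """Return (mean, modes) for snapshots stored column-wise (n_dof x n_snap)."""
    mean = snapshots.mean(axis=1, keepdims=True)
    # Thin SVD of the mean-centred snapshot matrix; columns of U are POD modes.
    U, s, _ = np.linalg.svd(snapshots - mean, full_matrices=False)
    # Keep the smallest r whose modes capture the requested fraction of energy.
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    r = int(np.searchsorted(cumulative, energy) + 1)
    return mean, U[:, :r]

def reconstruct(mean, modes, coeffs):
    """POD-style output: mean + sum_k coeffs_k * mode_k (coeffs: n_modes x n_samples)."""
    return mean + modes @ coeffs

rng = np.random.default_rng(0)
# Hypothetical snapshots built from exactly two spatial patterns with random amplitudes.
x = np.linspace(0.0, 1.0, 50)[:, None]
a, b = rng.normal(size=(2, 40))
snaps = np.sin(np.pi * x) * a + np.cos(3 * np.pi * x) * b

mean, modes = pod_basis(snaps)
coeffs = modes.T @ (snaps - mean)  # projection; a branch network would predict these instead
print(modes.shape, np.max(np.abs(reconstruct(mean, modes, coeffs) - snaps)))
```

Because the synthetic data have rank two after centring, two modes reproduce them to round-off; on real band snapshots the retained rank, and hence the POD truncation error, depends on the chosen energy threshold.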

Preprint · 2026 · arXiv

AdamFLIP: Adaptive Momentum Feedback Linearization Optimization for Hard Constrained PINN Training

Physics-informed neural networks (PINNs) provide a flexible framework for solving forward and inverse problems governed by partial differential equations (PDEs), but standard PINN training typically relies on soft penalty formulations that combine PDE residuals, data mismatch, and initial/boundary conditions using manually chosen weights. This often leads to ill-conditioning, sensitivity to loss weights, and poor constraint satisfaction. In this work, we reformulate PINN training as an equality-constrained optimization problem and propose Adaptive Momentum Feedback Linearization Optimization for Hard-Constrained PINN training (AdamFLIP). The key idea is to view the constraint residuals as the output of a controlled dynamical system and to compute the Lagrange multiplier as a feedback input that locally drives these residuals toward stable linear contraction dynamics. AdamFLIP then applies Adam-style first- and second-moment adaptation to the resulting feedback-linearized Lagrangian gradient, combining principled constraint handling with the scalability and robustness of adaptive neural-network optimization. We test AdamFLIP on a range of benchmark forward and inverse PDE problems, and it consistently outperforms both the standard soft-constrained PINN and state-of-the-art constrained optimizers. Specifically, on the Navier–Stokes benchmark, AdamFLIP reduces the relative $L_2$ error of the predicted solution by more than two thirds compared with the next best method. AdamFLIP thus provides an effective and computationally scalable hard-constrained optimization method for PINN training.
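To make the feedback-linearization idea concrete, here is a minimal sketch on a toy equality-constrained problem rather than a PINN. Assuming gradient-flow dynamics $\dot\theta = -(\nabla f + J^\top \lambda)$ with $J$ the constraint Jacobian, the residual obeys $\dot c = -J\nabla f - JJ^\top\lambda$; requiring $\dot c = -\alpha c$ gives $\lambda = (JJ^\top)^{-1}(\alpha c - J\nabla f)$. The resulting Lagrangian gradient is then passed through a hand-written Adam update. The toy objective and constraint, `alpha`, and the decaying learning-rate schedule are illustrative assumptions, not the settings used for AdamFLIP.

```python
import numpy as np

def toy_problem(theta):
    """Assumed toy problem: minimise |theta|^2 subject to theta_0 + theta_1 - 1 = 0.
    Returns (grad_f, constraint residual c, constraint Jacobian J)."""
    return 2.0 * theta, np.array([theta.sum() - 1.0]), np.ones((1, theta.size))

def feedback_multiplier(grad_f, jac_c, c, alpha):
    """Solve (J J^T) lam = alpha*c - J grad_f so that dc/dt = -alpha*c under gradient flow."""
    return np.linalg.solve(jac_c @ jac_c.T, alpha * c - jac_c @ grad_f)

def adam_feedback_step(theta, state, lr, alpha=1.0, b1=0.9, b2=0.999, eps=1e-8):
    grad_f, c, jac_c = toy_problem(theta)
    lam = feedback_multiplier(grad_f, jac_c, c, alpha)
    g = grad_f + jac_c.T @ lam  # feedback-linearised Lagrangian gradient
    # Standard Adam first/second moment estimates with bias correction.
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * g
    state["v"] = b2 * state["v"] + (1 - b2) * g**2
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)

theta = np.array([1.0, 0.0])
state = {"t": 0, "m": np.zeros(2), "v": np.zeros(2)}
for k in range(3000):
    theta = adam_feedback_step(theta, state, lr=0.05 / (1.0 + 0.01 * k))
print(theta)  # approaches the constrained minimiser (0.5, 0.5)
```

By construction $J g = \alpha c$, so the multiplier only steers the constraint residual while the remaining part of $g$ is the tangential gradient of the objective.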

Preprint · 2026 · arXiv

Conformalized Quantum DeepONet Ensembles for Scalable Operator Learning with Distribution-Free Uncertainty

Operator learning enables fast surrogate modeling of high-dimensional dynamical systems, but existing approaches face two fundamental limitations: quadratic inference complexity and unreliable uncertainty quantification in safety-critical settings. We propose Conformalized Quantum DeepONet Ensembles, a framework that addresses both challenges simultaneously. By leveraging Quantum Orthogonal Neural Networks (QOrthoNNs), we reduce operator inference complexity from O(n^2) to O(n), enabling scalable evaluation over fine discretizations. To provide rigorous uncertainty quantification, we combine ensemble-based epistemic modeling with adaptive conformal prediction, yielding distribution-free coverage guarantees. A key challenge in ensembling is that naive parallelism scales hardware resources linearly with the number of models. We resolve this by using Superposed Parameterized Quantum Circuits (SPQCs), which compress multiple ensemble members into a single circuit and enable simultaneous multi-model execution. Experiments on synthetic partial differential equations and real-world power system dynamics demonstrate that our approach achieves accurate predictions while maintaining calibrated uncertainty under realistic quantum noise. These results establish a practical pathway toward scalable, uncertainty-aware operator learning in quantum machine learning.
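The uncertainty-quantification part pairs an ensemble with conformal calibration. As a purely classical illustration, the sketch below applies standard split conformal prediction with ensemble-normalised scores $|y - \mu|/\sigma$ to a toy regression problem. It is not the adaptive conformal procedure or the quantum ensemble from the paper; the synthetic data and the stand-in "ensemble" (perturbed copies of the true function) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(-3.0, 3.0, n)
    y = np.sin(x) + 0.1 * (1.0 + np.abs(x)) * rng.normal(size=n)  # heteroscedastic noise
    return x, y

# Hypothetical stand-in for a trained ensemble: five slightly perturbed models.
offsets = rng.normal(scale=0.05, size=5)

def ensemble_predict(x):
    preds = np.stack([np.sin(x) + o * (1.0 + np.abs(x)) for o in offsets])
    return preds.mean(axis=0), preds.std(axis=0) + 1e-6  # ensemble mean and spread

def conformal_quantile(scores, alpha):
    n = scores.size
    # Finite-sample rank ceil((n+1)(1-alpha)); the exact method returns an infinite
    # interval when this exceeds n, which is simplified here by clipping to n.
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return np.sort(scores)[k - 1]

alpha = 0.1
x_cal, y_cal = make_data(500)
mu, sigma = ensemble_predict(x_cal)
q = conformal_quantile(np.abs(y_cal - mu) / sigma, alpha)

# Prediction interval: mu +/- q * sigma; check empirical coverage on fresh data.
x_test, y_test = make_data(5000)
mu_t, sigma_t = ensemble_predict(x_test)
covered = np.abs(y_test - mu_t) <= q * sigma_t
print(f"empirical coverage: {covered.mean():.3f} (target {1 - alpha})")
```

The coverage guarantee relies only on exchangeability of calibration and test points, not on the ensemble being accurate; a poor ensemble simply produces wider intervals.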

Preprint · 2026 · arXiv

fPINN-DeepONet: A Physics-Informed Operator Learning Framework for Multi-term Time-fractional Mixed Diffusion-wave Equations

In this paper, we develop a physics-informed deep operator learning framework for solving multi-term time-fractional mixed diffusion-wave equations (TFMDWEs). We begin by deriving an $L_2$ approximation, which achieves first-order accuracy for the Caputo fractional derivative of order $\beta \in (1,2)$. Building on this foundation, we propose the fPINN-DeepONet framework, a novel approach that integrates operator learning with the $L_2$ approximation to efficiently solve fractional partial differential equations (FPDEs). We apply the framework to both fixed and variable fractional-order PDEs, demonstrating its versatility and broad applicability. To evaluate the performance of the proposed model, we conduct a series of numerical experiments that involve dynamically varying fractional orders in both space and time, as well as scenarios with noisy data. The results highlight the accuracy, robustness, and efficiency of the fPINN-DeepONet framework.

Preprint · 2026 · arXiv

Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning

A central challenge in continual learning for large language models (LLMs) is catastrophic forgetting, where adapting to new tasks can substantially degrade performance on previously learned ones. Existing projection-based methods mitigate such interference by restricting parameter updates to subspaces that are orthogonal to directions associated with past tasks. However, these methods are typically formulated under Euclidean parameter geometry, with update magnitudes and projections governed by the Frobenius norm. The recent empirical success of the Muon optimizer, which applies orthogonalized matrix updates and admits a spectral-norm interpretation, suggests that Frobenius geometry may not be the most effective choice for matrix-valued LLM parameters. Motivated by this observation, we propose Muon-OGD, a spectral-norm-aware continual learning framework that integrates Muon-style operator-norm geometry with orthogonal projection constraints. Our method formulates each update as a spectral-norm-constrained optimization problem with linear non-interference constraints and solves it efficiently through dual iterations and Newton–Schulz matrix-sign approximations. By applying orthogonalized momentum updates that avoid protected directions associated with prior tasks, Muon-OGD aims to improve the stability–plasticity trade-off in sequential LLM adaptation. We evaluate the proposed method on standard continual learning benchmarks, TRACE, and domain-specific Coding-Math-Medical curricula, using both encoder-decoder and decoder-only architectures. Empirically, Muon-OGD consistently improves over sequential fine-tuning and competitive orthogonal-gradient baselines while remaining computationally scalable. These results suggest that spectral-norm-aware update geometry provides a practical and effective alternative to Frobenius-norm projection for continual learning in LLMs.
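The Newton–Schulz step can be illustrated with the classical cubic iteration $X_{k+1} = \tfrac{3}{2}X_k - \tfrac{1}{2}X_kX_k^\top X_k$, which drives the nonzero singular values of $X_0 = G/\lVert G\rVert_F$ towards 1, giving an approximate orthogonal polar factor. In the sketch below the update is first projected away from a hypothetical protected input subspace `U` and then orthogonalised. This project-then-orthogonalise order is an assumption made for illustration; the paper instead solves a spectral-norm-constrained problem through dual iterations. Because every iterate has the form $X(aI + bX^\top X)$, its row space never leaves that of the projected update, so the protected directions stay untouched.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=60):
    """Approximate the orthogonal polar factor of G (nonzero singular values -> 1)."""
    X = G / np.linalg.norm(G)  # Frobenius scaling puts all singular values in [0, 1]
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X  # maps each singular value s to 1.5 s - 0.5 s^3
    return X

def protected_orthogonal_update(G, U):
    """Remove components of G along the orthonormal columns of U, then orthogonalise."""
    G_perp = G - (G @ U) @ U.T  # right-multiply by (I - U U^T): rows become orthogonal to U
    return newton_schulz_orthogonalize(G_perp)

rng = np.random.default_rng(0)
G = rng.normal(size=(6, 8))                   # hypothetical momentum/gradient matrix
U, _ = np.linalg.qr(rng.normal(size=(8, 2)))  # hypothetical protected directions (orthonormal)
X = protected_orthogonal_update(G, U)
print(np.abs(X @ U).max(), np.linalg.svd(X, compute_uv=False))
```

The cubic iteration converges slowly for very small singular values (each step multiplies them by roughly 1.5), which is one reason practical optimizers use tuned higher-order polynomials instead.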

Preprint · 2026 · arXiv

Noise estimation of SDE from a single data trajectory

In this paper, we propose a data-driven framework for model discovery of stochastic differential equations (SDEs) from a single trajectory, without requiring ergodicity or stationarity of the underlying continuous process. By combining (stochastic) Taylor expansions with Girsanov transformations, and using the initial value of the drift function as input, we construct drift estimators while simultaneously recovering the model noise. This allows us to recover the increments of the underlying $\mathbb{P}$-Brownian motion. Building on these estimators, we introduce the first stochastic Sparse Identification of Stochastic Differential Equation (SSISDE) algorithm, capable of identifying the governing SDE dynamics from a single observed trajectory without requiring ergodicity or stationarity. To validate the proposed approach, we conduct numerical experiments with both linear and quadratic drift-diffusion functions. Among these, the Black–Scholes SDE is included as a representative system that satisfies neither ergodicity nor stationarity.
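As a much simpler illustration of recovering noise from one path (not the SSISDE algorithm), the sketch below simulates the Black–Scholes SDE $dX_t = \mu X_t\,dt + \sigma X_t\,dW_t$ with the Euler–Maruyama scheme, estimates $\sigma$ from the realised quadratic variation of the returns, and then inverts the discretised dynamics to recover approximate Brownian increments. It assumes $\mu$ is known and the sampling is fine; the parameter values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, T, n = 0.05, 0.3, 1.0, 100_000   # assumed (illustrative) parameters
dt = T / n

# One Euler-Maruyama path of dX = mu X dt + sigma X dW: X_{k+1} = X_k (1 + mu dt + sigma dW_k).
dW = rng.normal(scale=np.sqrt(dt), size=n)
X = np.concatenate(([1.0], np.cumprod(1.0 + mu * dt + sigma * dW)))

# Realised quadratic variation of the returns: sum (dX/X)^2 approaches sigma^2 T as dt -> 0.
returns = np.diff(X) / X[:-1]
sigma_hat = np.sqrt(np.sum(returns**2) / T)

# Invert the discretised dynamics to recover the driving Brownian increments (mu assumed known).
dW_hat = (returns - mu * dt) / sigma_hat
print(sigma_hat, np.corrcoef(dW, dW_hat)[0, 1])
```

Nothing here requires ergodicity: the estimate comes from the small-scale roughness of a single path over a fixed horizon, whereas the drift is much harder to pin down from one finite trajectory.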

Preprint · 2026 · arXiv

Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training

Preference learning methods such as Direct Preference Optimization (DPO) are known to induce reliance on spurious correlations, leading to sycophancy and length bias in today's language models and potentially severe goal misgeneralization in future systems. In this work, we provide a unified theoretical analysis of this phenomenon, characterizing the mechanisms of spurious learning, its consequences for deployment, and a provable mitigation strategy. Focusing on log-linear policies, we show that standard preference-learning objectives induce reliance on spurious features at the population level through two channels: mean spurious bias and causal–spurious correlation leakage. We then show that this reliance creates an irreducible vulnerability to distribution shift: more data from the same training distribution fails to reduce the model's dependence on spurious features. To address this, we propose tie training, a data augmentation strategy that uses ties (equal-utility preference pairs) to introduce data-driven regularization. We demonstrate that this approach selectively reduces spurious learning without degrading causal learning. Finally, we validate our theory on log-linear models and provide empirical evidence that both the spurious learning mechanisms and the benefits of tie training persist for neural networks and large language models.

Preprint · 2026 · arXiv

Task-tailored Pre-processing: Fair Downstream Supervised Learning

Fairness-aware machine learning, which aims to mitigate discrimination against certain societal groups in data-driven tasks, has recently attracted attention from various communities. For fair supervised learning, particularly in pre-processing, there are two main categories: data fairness and task-tailored fairness. The former directly finds an intermediate distribution among the groups, independent of the type of the downstream model, so that a downstream classification or regression model learned on it returns similar predictive scores for individuals with the same covariates, irrespective of their sensitive attributes. The latter explicitly takes the supervised learning task into account when constructing the pre-processing map. In this work, we study algorithmic fairness for supervised learning and argue that data fairness approaches impose overly strong regularization from the perspective of the Hirschfeld–Gebelein–Rényi (HGR) maximal correlation. This motivates us to devise a novel pre-processing approach tailored to supervised learning. We account for the trade-off between fairness and utility in obtaining the pre-processing map. We then study the behavior of arbitrary downstream supervised models learned on the transformed data and find sufficient conditions that guarantee their fairness improvement and utility preservation. To our knowledge, no prior work on task-tailored methods has theoretically investigated downstream guarantees when using pre-processed data. We further evaluate our framework through comparison studies on tabular and image datasets, showing that, compared with recent competing models, our framework preserves more consistent trade-offs across multiple downstream models. For computer vision data in particular, our method alters only the necessary semantic features related to the central machine learning task in order to achieve fairness.

Preprint · 2023 · arXiv

Fast Replica Exchange Stochastic Gradient Langevin Dynamics

Applying the replica exchange (i.e., parallel tempering) technique to Langevin Monte Carlo algorithms, especially stochastic gradient Langevin dynamics (SGLD), has achieved great success in non-convex learning problems, but one potential limitation is the computational cost of running multiple chains. Observing that a large variance of the gradient estimator in SGLD essentially increases the temperature of the stationary distribution, we propose expediting tempering schemes for SGLD by directly estimating the bias caused by the stochastic gradient estimator. This simple idea enables us to simulate high-temperature chains at a negligible computational cost (compared with that of the low-temperature chain) while preserving convergence to the target distribution. Our method is fundamentally different from the recently proposed m-reSGLD (multi-variance replica exchange SGLD) method in that the latter suffers from the low accuracy of the gradient estimator (e.g., the chain can fail to converge to the target), while our method benefits from it. Further, we derive a swapping rate that can be easily evaluated, providing another significant improvement over m-reSGLD. To theoretically demonstrate the advantage of our method, we develop convergence bounds in Wasserstein distances. Numerical examples for Gaussian mixture and inverse PDE models are also provided, showing that our method can converge faster than the vanilla multi-variance replica exchange method.
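For orientation, the sketch below shows the vanilla replica exchange scheme this abstract starts from, on a one-dimensional double well $U(x) = (x^2-1)^2$: a low-temperature and a high-temperature chain each take Langevin steps, and their states are swapped with probability $\min\{1, \exp[(1/\tau_1 - 1/\tau_2)(U(x_1) - U(x_2))]\}$. Gaussian noise added to the gradient stands in for a minibatch estimate. This is not the bias-corrected fast variant proposed in the paper: swaps here use exact energies, and the potential, temperatures, step size, and noise level are assumptions.

```python
import math
import random

def U(x):
    return (x * x - 1.0) ** 2          # double well with minima at x = -1 and x = +1

def grad_U(x):
    return 4.0 * x * (x * x - 1.0)

def replica_exchange_sgld(n_steps=100_000, lr=0.02, taus=(0.1, 1.0), grad_noise=0.5, seed=0):
    rng = random.Random(seed)
    x = [1.0, 1.0]                      # both chains start in the right-hand well
    samples = []
    for _ in range(n_steps):
        for i, tau in enumerate(taus):
            # Noisy gradient stands in for a stochastic (minibatch) estimate.
            g = grad_U(x[i]) + grad_noise * rng.gauss(0.0, 1.0)
            x[i] += -lr * g + math.sqrt(2.0 * lr * tau) * rng.gauss(0.0, 1.0)
        # Metropolis swap between the low- and high-temperature chains.
        log_acc = (1.0 / taus[0] - 1.0 / taus[1]) * (U(x[0]) - U(x[1]))
        if rng.random() < math.exp(min(0.0, log_acc)):
            x[0], x[1] = x[1], x[0]
        samples.append(x[0])            # keep only the low-temperature chain
    return samples

samples = replica_exchange_sgld()
frac_right = sum(s > 0 for s in samples) / len(samples)
print(f"fraction of low-temperature samples in the right well: {frac_right:.2f}")
```

Without swaps, the low-temperature chain would rarely cross the barrier within this run; the swaps let it inherit the high-temperature chain's well changes, so both symmetric wells get visited. The extra gradient noise slightly raises each chain's effective temperature, which is the effect the abstract proposes to exploit.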

Preprint · 2022 · arXiv

2-d signature of images and texture classification

We introduce a proper notion of 2-dimensional signature for images. This object is inspired by the so-called rough paths theory, and it captures many essential features of a 2-dimensional object such as an image. It thus serves as a low-dimensional feature for pattern classification. Here we implement a simple procedure for texture classification. In this context, we show that a low-dimensional set of features based on signatures achieves excellent accuracy.

Preprint · 2022 · arXiv

A consistent and conservative Phase-Field method for multiphase incompressible flows

A consistent and conservative Phase-Field method, including both the model and the scheme, is developed for multiphase flows with an arbitrary number of immiscible and incompressible fluid phases. The consistency of mass conservation and the consistency of mass and momentum transport are implemented to address the issue of physically coupling the Phase-Field equation, which locates the different phases, to the hydrodynamics. These two consistency conditions provide the "optimal" coupling because (i) the new momentum equation resulting from them is Galilean invariant and implies kinetic energy conservation, regardless of the details of the Phase-Field equation, and (ii) any failure to satisfy the second law of thermodynamics or the consistency of reduction of the multiphase flow model results only from the same failure of the Phase-Field equation and is not due to the new momentum equation. We first provide a physical interpretation of the consistency conditions and their formulations, and then summarize general formulations that follow from the consistency conditions and are independent of the interpretation of the velocity. Several novel techniques are developed to inherit the physical properties of multiphase flows after discretization, including the gradient-based phase selection procedure, the momentum-conservative method for the surface force, and general theorems that preserve the consistency conditions at the discrete level. Equipped with these techniques, a consistent and conservative scheme for the present multiphase flow model is developed and analyzed. Numerical applications demonstrate that the present model and scheme are robust and effective in studying complicated multiphase dynamics, especially for flows with large density ratios.

Preprint · 2022 · arXiv

A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

We propose an adaptively weighted stochastic gradient Langevin dynamics (SGLD) algorithm, called contour stochastic gradient Langevin dynamics (CSGLD), for Bayesian learning in big data statistics. The proposed algorithm is essentially a scalable dynamic importance sampler, which automatically flattens the target distribution so that the simulation of a multi-modal distribution is greatly facilitated. Theoretically, we prove a stability condition and establish the asymptotic convergence of the self-adapting parameter to a unique fixed point, regardless of the non-convexity of the original energy function; we also present an error analysis for the weighted averaging estimators. Empirically, the CSGLD algorithm is tested on multiple benchmark datasets, including CIFAR10 and CIFAR100. The numerical results indicate its superiority in avoiding the local trap problem when training deep neural networks.

Preprint · 2022 · arXiv

DeepONet-Grid-UQ: A Trustworthy Deep Operator Framework for Predicting the Power Grid's Post-Fault Trajectories

This paper proposes a new data-driven method for the reliable prediction of power system post-fault trajectories. The proposed method is based on the fundamentally new concept of Deep Operator Networks (DeepONets). Compared with traditional neural networks that learn to approximate functions, DeepONets are designed to approximate nonlinear operators. Under this operator framework, we design a DeepONet to (1) take as inputs the fault-on trajectories collected, for example, via simulation or phasor measurement units, and (2) provide as outputs the predicted post-fault trajectories. In addition, we endow our method with a much-needed ability to balance efficiency with reliable, trustworthy predictions via uncertainty quantification. To this end, we propose and compare two methods for quantifying the predictive uncertainty. First, we propose a Bayesian DeepONet (B-DeepONet) that uses stochastic gradient Hamiltonian Monte Carlo to sample from the posterior distribution of the DeepONet parameters. Then, we propose a Probabilistic DeepONet (Prob-DeepONet) that uses a probabilistic training strategy to equip DeepONets with a form of automated uncertainty quantification at virtually no extra computational cost. Finally, we validate the predictive power and uncertainty quantification capability of the proposed B-DeepONet and Prob-DeepONet using the IEEE 16-machine 68-bus system.
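A DeepONet represents the operator output at a query point $y$ as a dot product between a branch embedding of the input function (sampled at fixed sensors) and a trunk embedding of $y$. The sketch below shows that forward pass with a second branch/trunk pair producing a log standard deviation, scored by a Gaussian negative log-likelihood, which is one common way to make a DeepONet probabilistic. The layer sizes, the untrained random weights, and the two-head design are illustrative assumptions, not the Prob-DeepONet architecture from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random tanh-MLP weights (untrained; for shape illustration only)."""
    return [(rng.normal(scale=1 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:         # linear output layer, tanh elsewhere
            x = np.tanh(x)
    return x

n_sensors, p = 20, 16                   # assumed sensor count and latent width
branch_mu, trunk_mu = mlp([n_sensors, 32, p]), mlp([1, 32, p])
branch_ls, trunk_ls = mlp([n_sensors, 32, p]), mlp([1, 32, p])

def deeponet(u_sensors, y):
    """u_sensors: (batch, n_sensors); y: (n_points, 1) -> mean, std of shape (batch, n_points)."""
    mean = forward(branch_mu, u_sensors) @ forward(trunk_mu, y).T
    log_std = forward(branch_ls, u_sensors) @ forward(trunk_ls, y).T
    return mean, np.exp(log_std)

def gaussian_nll(target, mean, std):
    return np.mean(np.log(std) + 0.5 * ((target - mean) / std) ** 2 + 0.5 * np.log(2 * np.pi))

u = rng.normal(size=(4, n_sensors))     # e.g. fault-on trajectories sampled at sensor times
y = np.linspace(0.0, 1.0, 50)[:, None]  # query times for the post-fault trajectory
mean, std = deeponet(u, y)
print(mean.shape, gaussian_nll(mean, mean, np.ones_like(mean)))
```

Training would minimise `gaussian_nll` on observed post-fault trajectories; predicting the log standard deviation keeps the standard deviation positive without extra constraints.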

Preprint · 2022 · arXiv

Efficient Chemical Space Exploration Using Active Learning Based on Marginalized Graph Kernel: an Application for Predicting the Thermodynamic Properties of Alkanes with Molecular Simulation

We introduce an explorative active learning (AL) algorithm based on Gaussian process regression and a marginalized graph kernel (GPR-MGK) to explore chemical space at minimum cost. Using high-throughput molecular dynamics simulations to generate data and a graph neural network (GNN) for prediction, we construct an active learning molecular simulation framework for thermodynamic property prediction. Specifically, targeting 251,728 alkane molecules consisting of 4 to 19 carbon atoms and their liquid physical properties (densities, heat capacities, and vaporization enthalpies), we use the AL algorithm to select the most informative molecules to represent the chemical space. Validation on computational and experimental test sets shows that only 313 molecules (0.124% of the total) were sufficient to train an accurate GNN model with $R^2 > 0.99$ for the computational test sets and $R^2 > 0.94$ for the experimental test sets. We highlight two advantages of the presented AL algorithm: compatibility with high-throughput data generation and reliable uncertainty quantification.
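A common selection rule in uncertainty-driven active learning with Gaussian process regression is to query the candidate with the largest posterior variance. The sketch below shows that rule with a squared-exponential kernel on plain feature vectors. The marginalized graph kernel used in the paper operates on molecular graphs and is replaced here by this simpler kernel as an assumption; the length scale, noise level, and toy candidate pool are hypothetical.

```python
import numpy as np

def rbf(A, B, length=0.3):
    """Squared-exponential kernel between rows of A and rows of B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def posterior_variance(X_train, X_pool, noise=1e-6):
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf(X_train, X_pool)
    # Var[f(x*)] = k(x*, x*) - k_*^T K^{-1} k_*, with k(x, x) = 1 for this kernel.
    return 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)

def select_next(X_train, X_pool):
    """Index of the pool candidate with the largest posterior variance."""
    return int(np.argmax(posterior_variance(X_train, X_pool)))

X_train = np.array([[0.0], [0.1], [0.2]])
X_pool = np.array([[0.15], [1.0], [0.05]])
print(select_next(X_train, X_pool))  # index 1 (x = 1.0), the candidate farthest from the data
```

In a full loop, the selected candidate would be labelled (here, by a molecular simulation), appended to the training set, and the variances recomputed before the next pick.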

Preprint · 2022 · arXiv

Federated Online Sparse Decision Making

This paper presents a novel federated linear contextual bandits model, in which individual clients face different K-armed stochastic bandits with high-dimensional decision contexts and are coupled through common global parameters. By leveraging the sparsity structure of the linear reward, a collaborative algorithm named Fedego Lasso is proposed to cope with the heterogeneity across clients without exchanging local decision context vectors or raw reward data. Fedego Lasso relies on a novel multi-client teamwork-selfish bandit policy design and achieves near-optimal regret for shared-parameter cases with logarithmic communication costs. In addition, a new conceptual tool called federated-egocentric policies is introduced to delineate the exploration-exploitation trade-off. Experiments demonstrate the effectiveness of the proposed algorithms on both synthetic and real-world datasets.
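Exploiting sparsity in a linear reward model usually means estimating the reward parameter with an $\ell_1$ penalty. The sketch below is a generic single-client Lasso solved by iterative soft-thresholding (ISTA); it is not the Fedego Lasso procedure, and the synthetic contexts, `lam`, and iteration count are assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    """Minimise (1/2n)||y - X b||^2 + lam ||b||_1 by proximal gradient descent (ISTA)."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the smooth part
    b = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - step * grad, step * lam)
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))             # hypothetical decision contexts
beta = np.zeros(10)
beta[[0, 3]] = [1.5, -2.0]                 # sparse true reward parameter
y = X @ beta                               # noiseless rewards for clarity
b_hat = lasso_ista(X, y, lam=0.01)
print(np.flatnonzero(b_hat), np.round(b_hat[[0, 3]], 2))
```

The soft-thresholding step sets small coordinates exactly to zero, which is what lets the estimate recover the support of the reward parameter; the price is a small shrinkage bias of order `lam` on the active coordinates.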

Preprint · 2022 · arXiv

Flow-driven spectral chaos (FSC) method for long-time integration of second-order stochastic dynamical systems

For decades, uncertainty quantification techniques based on the spectral approach have been demonstrated to be computationally more efficient than the Monte Carlo method for a wide variety of problems, particularly when the dimensionality of the probability space is relatively low. The time-dependent generalized polynomial chaos (TD-gPC) is one such technique that uses an evolving orthogonal basis to better represent the stochastic part of the solution space in time. In this paper, we present a new numerical method that uses the concept of 'enriched stochastic flow maps' to track the evolution of the stochastic part of the solution space in time. The computational cost of this proposed flow-driven spectral chaos (FSC) method is an order of magnitude lower than that of TD-gPC for comparable solution accuracy. This gain is realized because, unlike in most existing methods, the number of basis vectors required to track the stochastic part of the solution space, and consequently the computational cost of solving the resulting system of equations, does not depend on the dimensionality of the probability space. Four representative numerical examples are presented to demonstrate the performance of the FSC method for long-time integration of second-order stochastic dynamical systems in the context of the stochastic dynamics of structures.

Preprint · 2022 · arXiv

Flow-driven spectral chaos (FSC) method for simulating long-time dynamics of arbitrary-order non-linear stochastic dynamical systems

Uncertainty quantification techniques such as the time-dependent generalized polynomial chaos (TD-gPC) use an adaptive orthogonal basis to better represent the stochastic part of the solution space (also known as the random function space) in time. However, because the random function space is constructed using tensor products, TD-gPC-based methods are known to suffer from the curse of dimensionality. In this paper, we introduce a new numerical method called 'flow-driven spectral chaos' (FSC), which overcomes this curse of dimensionality at the random-function-space level. The proposed method is not only computationally more efficient than existing TD-gPC-based methods but also far more accurate. The FSC method uses the concept of 'enriched stochastic flow maps' to efficiently track the evolution of a finite-dimensional random function space in time. To transfer the probability information from one random function space to another, two approaches are developed and studied herein. In the first approach, the probability information is transferred in the mean-square sense, whereas in the second approach the transfer is exact, using a new theorem developed for this purpose. The FSC method can quantify uncertainties with high fidelity, especially for the long-time response of stochastic dynamical systems governed by ODEs of arbitrary order. Six representative numerical examples, including a nonlinear problem (the Van der Pol oscillator), are presented to demonstrate the performance of the FSC method and corroborate the claims of its superior numerical properties. Finally, a parametric, high-dimensional stochastic problem is used to demonstrate that when the FSC method is used in conjunction with Monte Carlo integration, the curse of dimensionality can be overcome altogether.

Preprint · 2022 · arXiv

glassoformer: a query-sparse transformer for post-fault power grid voltage prediction

We propose GLassoformer, a novel and efficient transformer architecture that leverages group Lasso regularization to reduce the number of queries in the standard self-attention mechanism. Because of the sparsified queries, GLassoformer is more computationally efficient than standard transformers. On the power grid post-fault voltage prediction task, GLassoformer achieves markedly better accuracy and stability than many existing benchmark algorithms.
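Group Lasso drives whole groups of entries to exactly zero at once, which is how query sparsity can arise. The sketch below applies the group soft-thresholding proximal operator of $\lambda \sum_i \lVert q_i \rVert_2$ to the rows of a hypothetical query matrix, treating each row (one query) as a group. The grouping, `lam`, and matrix values are assumptions rather than GLassoformer's exact parameterisation.

```python
import numpy as np

def group_soft_threshold(Q, lam):
    """Proximal operator of lam * sum_i ||Q[i]||_2 with each row of Q as one group."""
    norms = np.linalg.norm(Q, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return Q * scale

Q = np.array([[3.0, 4.0],     # norm 5.0 -> shrunk to norm 4.0
              [0.3, 0.4],     # norm 0.5 -> zeroed: this query can be skipped
              [0.0, 2.0]])    # norm 2.0 -> shrunk to norm 1.0
Q_sparse = group_soft_threshold(Q, lam=1.0)
active = np.flatnonzero(np.linalg.norm(Q_sparse, axis=1))
print(Q_sparse, active)
```

Only the surviving rows (`active`) need to take part in the attention computation, which is where the savings over dense self-attention would come from.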

Preprint · 2022 · arXiv

Multi-element flow-driven spectral chaos (ME-FSC) method for uncertainty quantification of dynamical systems

The flow-driven spectral chaos (FSC) is a recently developed method for tracking and quantifying uncertainties in the long-time response of stochastic dynamical systems using the spectral approach. The method uses a novel concept called 'enriched stochastic flow maps' to construct an evolving finite-dimensional random function space that is both accurate and computationally efficient in time. In this paper, we present a multi-element version of the FSC method (the ME-FSC method for short), aimed primarily at dynamical systems that are inherently discontinuous over the probability space. In ME-FSC, the random domain is partitioned into several elements, and the problem is then solved separately on each random element using the FSC method. The results are subsequently aggregated to compute the probability moments of interest using the law of total probability. To demonstrate the effectiveness of the ME-FSC method in dealing with discontinuities and long-time integration of stochastic dynamical systems, four representative numerical examples are presented, including the Van der Pol oscillator problem and the Kraichnan–Orszag three-mode problem. The results show that the ME-FSC method can solve problems with strong nonlinear dependencies over the probability space both reliably and at low computational cost.
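The aggregation step rests on the law of total expectation and the law of total variance: if element $B_i$ has probability $p_i$ and conditional mean and variance $m_i$ and $v_i$, then $\mathbb{E}[X] = \sum_i p_i m_i$ and $\operatorname{Var}[X] = \sum_i p_i\,(v_i + (m_i - \mathbb{E}[X])^2)$. A small sketch with made-up element statistics (how ME-FSC computes the per-element moments is not shown):

```python
def aggregate_moments(probs, means, variances):
    """Combine per-element conditional moments into a global mean and variance."""
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("element probabilities must sum to 1")
    mean = sum(p * m for p, m in zip(probs, means))
    # Law of total variance: within-element variance plus between-element spread.
    var = sum(p * (v + (m - mean) ** 2) for p, m, v in zip(probs, means, variances))
    return mean, var

# Discontinuous response: X = 0 on one half of the random domain and X = 1 on the other.
print(aggregate_moments([0.5, 0.5], [0.0, 1.0], [0.0, 0.0]))  # (0.5, 0.25)
```

The example shows why partitioning helps with discontinuities: each element sees a smooth (here constant) response with zero conditional variance, and all of the global variance comes from the between-element term.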

Preprint · 2022 · arXiv

MultiAuto-DeepONet: A Multi-resolution Autoencoder DeepONet for Nonlinear Dimension Reduction, Uncertainty Quantification and Operator Learning of Forward and Inverse Stochastic Problems

A new data-driven method for operator learning of stochastic differential equations (SDEs) is proposed in this paper. The central goal is to solve forward and inverse stochastic problems more effectively using limited data. The deep operator network (DeepONet) has recently been proposed for operator learning; unlike other neural networks, which learn functions, it targets the problem of learning nonlinear operators. However, learning nonlinear operators for high-dimensional stochastic problems with the original model can be challenging. We propose a new multi-resolution autoencoder DeepONet model, referred to as MultiAuto-DeepONet, to deal with this difficulty with the aid of a convolutional autoencoder. The encoder part of the network is designed to reduce the dimensionality and discover the hidden features of high-dimensional stochastic inputs. The decoder has a special structure, namely the form of a DeepONet. The first DeepONet in the decoder reconstructs the input function involving randomness, while the second approximates the solution of the desired equations. These two DeepONets share a common branch net and have two independent trunk nets. This architecture enables us to handle multi-resolution inputs naturally. By adding $L_1$ regularization to our network, we find that the outputs of the branch net and the two trunk nets all have sparse structures. This reduces the number of trainable parameters in the neural network, making the model more efficient. Finally, we conduct several numerical experiments to illustrate the effectiveness of the proposed MultiAuto-DeepONet model with uncertainty quantification.

Preprint · 2022 · arXiv

PAGP: A physics-assisted Gaussian process framework with active learning for forward and inverse problems of partial differential equations

In this work, a Gaussian process regression (GPR) model incorporating given physical information from partial differential equations (PDEs) is developed: physics-assisted Gaussian processes (PAGP). The model targets two types of problems: finding solutions of given PDEs with initial and boundary conditions, or discovering their unknown coefficients. We introduce three different models: continuous time, discrete time, and hybrid models. The given physical information is integrated into the Gaussian process model through our designed GP loss functions. Three types of loss functions are provided in this paper, based on two different approaches to training the standard GP model. The first part of the paper introduces the continuous time model, which treats the temporal domain in the same way as the spatial domain. The unknown coefficients in the given PDEs can be jointly learned with the GP hyperparameters by minimizing the designed loss function. In the discrete time models, we first choose a time discretization scheme to discretize the temporal domain. The PAGP model is then applied at each time step together with the scheme to approximate PDE solutions at given test points at the final time. To discover unknown coefficients in this setting, observations at two specific times are needed, and a mixed mean square error function is constructed to obtain the optimal coefficients. In the last part, a novel hybrid model combining the continuous and discrete time models is presented. It merges the flexibility of the continuous time model and the accuracy of the discrete time model. We also discuss the performance of the different models combined with different GP loss functions. The effectiveness of the proposed PAGP methods is illustrated in our numerical section.

Preprint · 2022 · arXiv

RMFGP: Rotated Multi-fidelity Gaussian process with Dimension Reduction for High-dimensional Uncertainty Quantification

Multi-fidelity modelling arises in many situations in computational science and engineering. It enables accurate inference even when only a small set of accurate data is available. Such data often come from a high-fidelity model, which is computationally expensive. By combining the realizations of the high-fidelity model with one or more low-fidelity models, the multi-fidelity method can make accurate predictions of quantities of interest. This paper proposes a new dimension reduction framework based on rotated multi-fidelity Gaussian process regression and a Bayesian active learning scheme for when the available precise observations are insufficient. By drawing samples from the trained rotated multi-fidelity model, the so-called supervised dimension reduction problems can be solved following the idea of the sliced average variance estimation (SAVE) method combined with a Gaussian process regression dimension reduction technique. The general framework we develop can effectively solve high-dimensional problems when the data are insufficient for applying traditional dimension reduction methods. Moreover, a more accurate surrogate Gaussian process model of the original problem can be obtained based on our trained model. The effectiveness of the proposed rotated multi-fidelity Gaussian process (RMFGP) model is demonstrated in four numerical examples. The results show that our method performs better in all cases, and uncertainty propagation analysis is performed for the last two cases, which involve stochastic partial differential equations.

preprint · 2021 · arXiv

DeepLight: Deep Lightweight Feature Interactions for Accelerating CTR Predictions in Ad Serving

Click-through rate (CTR) prediction is a crucial task in online display advertising. Embedding-based neural networks have been proposed to learn both explicit feature interactions through a shallow component and deep feature interactions through a deep neural network (DNN) component. These sophisticated models, however, make prediction inference at least hundreds of times slower. To address the significantly increased serving delay and high memory usage for ad serving in production, this paper presents DeepLight: a framework that accelerates CTR predictions in three ways: 1) accelerating model inference by explicitly searching for informative feature interactions in the shallow component; 2) pruning redundant layers and parameters at the intra-layer and inter-layer levels in the DNN component; and 3) promoting sparsity in the embedding layer to preserve the most discriminant signals. By combining these efforts, the proposed approach accelerates model inference by 46× on the Criteo dataset and 27× on the Avazu dataset without any loss of prediction accuracy. This paves the way for successfully deploying complex embedding-based neural networks in production for ad serving.
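
As a rough illustration of parameter pruning in the DNN component, the sketch below applies plain global magnitude pruning to a single weight matrix. Magnitude pruning is a common stand-in chosen here for simplicity; DeepLight's own criteria for searching feature interactions, pruning layers, and sparsifying embeddings are not reproduced, and `magnitude_prune` is a hypothetical helper.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero the fraction `sparsity` of entries with the smallest absolute value.

    Returns the pruned weights and a boolean mask of the surviving entries.
    """
    magnitudes = np.abs(weights).ravel()
    k = int(round(sparsity * magnitudes.size))  # number of entries to remove
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    threshold = np.partition(magnitudes, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
W = rng.standard_normal((100, 50))
W_pruned, mask = magnitude_prune(W, sparsity=0.9)
```

Stored in a sparse format such as CSR, a matrix pruned this way needs memory and multiply-adds roughly proportional to its number of nonzeros, which is the general mechanism by which pruning reduces serving cost.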

preprint · 2021 · arXiv

HEI: hybrid explicit-implicit learning for multiscale problems

Splitting is a method for handling application problems by splitting physics, scales, domains, and so on. Many splitting algorithms have been designed for efficient temporal discretization. In this paper, our goal is to use temporal splitting concepts in designing machine learning algorithms and, at the same time, to help splitting algorithms by incorporating data and speeding them up. Since the splitting solution usually has an explicit and an implicit part, we call our method hybrid explicit-implicit (HEI) learning. We consider a recently introduced multiscale splitting algorithm. To approximate the dynamics, only a few degrees of freedom are solved implicitly, while the others are solved explicitly. In this paper, we use this splitting concept in machine learning and propose several strategies. First, the implicit part of the solution can be learned, as it is more difficult to solve, while the explicit part can be computed. This provides a speed-up and data incorporation for splitting approaches. Second, one can design a hybrid neural network architecture, because handling explicit parts requires far fewer communications among neurons and can be done efficiently. Third, one can solve the coarse grid component via PDEs or other approximation methods and construct simpler neural networks for the explicit part of the solution. We discuss these options and implement one of them by interpreting it as a machine translation task. This interpretation enables us to use the Transformer, since it can perform model reduction for multiple time series and learn the connection. We also find that the splitting scheme is a great platform for predicting the coarse solution with insufficient information about the target model: the target problem is only partially given, and we need to solve it through a known problem. We present four numerical examples, and the results show that our method is stable and accurate.
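
The explicit/implicit splitting that HEI builds on can be illustrated on a toy linear system $du/dt = -Au$. The sketch below treats a chosen subset of degrees of freedom (here the stiff one) implicitly and the rest explicitly within each time step. The matrix, the index choice, and the helper name are illustrative assumptions; this is not the multiscale splitting algorithm from the paper, and the learned (Transformer) part of HEI is not shown.

```python
import numpy as np

def partially_explicit_step(A, u, dt, implicit_idx):
    """One step of du/dt = -A u: implicit Euler on `implicit_idx`, explicit Euler on the rest."""
    n = len(u)
    imp = np.asarray(implicit_idx)
    exp_ = np.setdiff1d(np.arange(n), imp)
    u_new = u.copy()
    # Implicit part: (I + dt A11) u1_new = u1 - dt A12 u2 (coupling taken at the old level)
    A11, A12 = A[np.ix_(imp, imp)], A[np.ix_(imp, exp_)]
    rhs = u[imp] - dt * A12 @ u[exp_]
    u_new[imp] = np.linalg.solve(np.eye(len(imp)) + dt * A11, rhs)
    # Explicit part: u2_new = u2 - dt (A21 u1_new + A22 u2)
    A21, A22 = A[np.ix_(exp_, imp)], A[np.ix_(exp_, exp_)]
    u_new[exp_] = u[exp_] - dt * (A21 @ u_new[imp] + A22 @ u[exp_])
    return u_new

# Toy stiff system: one fast mode (rate 1000) and one slow mode (rate 1).
A = np.diag([1000.0, 1.0])
dt, steps = 0.01, 100
u_split = np.array([1.0, 1.0])
u_explicit = np.array([1.0, 1.0])
for _ in range(steps):
    u_split = partially_explicit_step(A, u_split, dt, implicit_idx=[0])
    u_explicit = u_explicit - dt * A @ u_explicit  # fully explicit Euler for comparison
```

With `dt = 0.01` the fast mode makes plain explicit Euler unstable (amplification factor $1 - 1000\,dt = -9$), while the split scheme stays stable and needs a linear solve only on the implicit block.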

preprint · 2020 · arXiv

An Adaptive Empirical Bayesian Method for Sparse Deep Learning

We propose a novel adaptive empirical Bayesian method for sparse deep learning, where sparsity is ensured via a class of self-adaptive spike-and-slab priors. The proposed method works by alternately sampling from an adaptive hierarchical posterior distribution using stochastic gradient Markov chain Monte Carlo (MCMC) and smoothly optimizing the hyperparameters using stochastic approximation (SA). We further prove the convergence of the proposed method to the asymptotically correct distribution under mild conditions. Empirical applications of the proposed method achieve state-of-the-art performance on MNIST and Fashion MNIST with shallow convolutional neural networks and state-of-the-art compression performance on CIFAR10 with Residual Networks. The proposed method also improves resistance to adversarial attacks.
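
The alternation between stochastic gradient MCMC and stochastic approximation can be sketched on a toy linear regression with a two-component Gaussian spike-and-slab prior. This is a simplified illustration, not the paper's algorithm: the spike and slab variances `v0`/`v1`, the known noise variance `sigma2`, the step sizes, and the SA update for the inclusion estimates `rho` are all assumptions, and no convergence guarantee is claimed for this sketch.

```python
import numpy as np

def normal_pdf(x, var):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def sgld_spike_slab(X, y, sigma2=0.25, v0=0.01, v1=10.0, eps=1e-4,
                    n_iter=5000, batch=50, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    rho = np.full(p, 0.5)  # SA estimates of the slab-inclusion probabilities E[gamma_j]
    a = 0.5                # prior inclusion probability, also adapted
    samples = []
    for k in range(n_iter):
        # SGLD step: minibatch likelihood gradient rescaled by n / batch, plus Gaussian noise
        idx = rng.choice(n, size=batch, replace=False)
        grad_lik = (n / batch) * X[idx].T @ (y[idx] - X[idx] @ beta) / sigma2
        grad_prior = -(rho / v1 + (1 - rho) / v0) * beta
        beta = beta + 0.5 * eps * (grad_lik + grad_prior) + np.sqrt(eps) * rng.standard_normal(p)
        # SA step: move rho toward the conditional inclusion probability given beta
        slab = a * normal_pdf(beta, v1)
        spike = (1 - a) * normal_pdf(beta, v0)
        omega = 1.0 / (k + 10) ** 0.7  # decreasing SA step size
        rho = rho + omega * (slab / (slab + spike) - rho)
        a = rho.mean()
        samples.append(beta.copy())
    return np.array(samples), rho

# Hypothetical data: only the first feature matters
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
y = 2.0 * X[:, 0] + 0.5 * rng.standard_normal(200)
samples, rho = sgld_spike_slab(X, y)
```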

preprint · 2020 · arXiv

Improving Simulation Efficiency of MCMC for Inverse Modeling of Hydrologic Systems with a Kalman-Inspired Proposal Distribution

Bayesian analysis is widely used in science and engineering for real-time forecasting, for decision making, and to help unravel the processes that explain the observed data. These data are deterministic and/or stochastic transformations of the underlying parameters. A key task is then to summarize the posterior distribution of these parameters. When models become too difficult to analyze analytically, Monte Carlo methods can be used to approximate the target distribution. Of these, Markov chain Monte Carlo (MCMC) methods are particularly powerful. Such methods generate a random walk through the parameter space and, under strict conditions of reversibility and ergodicity, successively visit solutions with frequency proportional to the underlying target density. This requires a proposal distribution that generates candidate solutions starting from an arbitrary initial state. The speed at which the sampled chains converge to the target distribution, however, deteriorates rapidly with increasing parameter dimensionality. In this paper, we introduce a new proposal distribution that significantly enhances the efficiency of MCMC simulation for highly parameterized models. This proposal distribution exploits the cross-covariance of model parameters, measurements and model outputs, and generates candidate states much like the analysis step of the Kalman filter. We embed the Kalman-inspired proposal distribution in the DREAM algorithm during burn-in and present several numerical experiments with complex, high-dimensional or multi-modal target distributions. The results demonstrate that this new proposal distribution can greatly improve the simulation efficiency of MCMC. Specifically, we observe a speed-up on the order of 10 to 30 times for groundwater models with more than one hundred parameters.
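
A Kalman-style proposal of the kind described above can be sketched as $\theta^* = \theta + K(\tilde d - f(\theta))$ with gain $K = C_{\theta y}(C_{yy} + R)^{-1}$ estimated from a set of past parameter/output pairs. This is a minimal sketch under stated assumptions, not the authors' implementation: the measurement-error covariance `R`, the perturbed observations `d_pert`, the linear toy model, and the helper name are hypothetical, and the Metropolis acceptance step and the DREAM machinery are omitted.

```python
import numpy as np

def kalman_proposal(theta, thetas, outputs, model, d_obs, R, rng):
    """Propose theta* = theta + K (d_obs + noise - model(theta)) with K = C_ty (C_yy + R)^{-1}."""
    n = thetas.shape[0]
    d_theta = thetas - thetas.mean(axis=0)
    d_out = outputs - outputs.mean(axis=0)
    C_ty = d_theta.T @ d_out / (n - 1)  # parameter-output cross-covariance
    C_yy = d_out.T @ d_out / (n - 1)    # output covariance
    # Kalman gain via a linear solve ((C_yy + R) is symmetric)
    K = np.linalg.solve(C_yy + R, C_ty.T).T
    d_pert = d_obs + rng.multivariate_normal(np.zeros(len(d_obs)), R)
    return theta + K @ (d_pert - model(theta))

G = np.array([[1.0, 0.5], [0.0, 2.0]])

def model(theta):
    return G @ theta  # hypothetical linear forward model

rng = np.random.default_rng(0)
d_obs = np.array([1.0, 2.0])
R = 0.01 * np.eye(2)
thetas = rng.standard_normal((500, 2))  # e.g. states collected from the chains
outputs = thetas @ G.T
theta0 = np.zeros(2)
theta_prop = kalman_proposal(theta0, thetas, outputs, model, d_obs, R, rng)
```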

preprint · 2020 · arXiv

Multi-Fidelity Gaussian Process based Empirical Potential Development for Si:H Nanowires

In material modeling, calculations using empirical potentials are fast compared with first-principles calculations, but their results are not as accurate. First-principles calculations are accurate but slow and computationally expensive. In this work, the H-H binding energy and the H$_2$-H$_2$ interaction energy are first calculated using first-principles methods, so that they can be applied to the Tersoff empirical potential. Second, the H-H parameters are estimated. After fitting the H-H parameters, the mechanical properties are obtained. Finally, to integrate both the low-fidelity empirical potential data and the data from the high-fidelity first-principles calculations, multi-fidelity Gaussian process regression is employed to predict the H-H binding energy and the H$_2$-H$_2$ interaction energy. Numerical results demonstrate the accuracy of the developed empirical potentials.
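
The multi-fidelity regression step can be illustrated with a simplified two-level autoregressive scheme, $f_{hi}(x) \approx \rho\, f_{lo}(x) + \delta(x)$, where a GP is fitted to the low-fidelity data and a second GP to the discrepancy at the high-fidelity points. This is a generic sketch, not the authors' implementation: the length scales are fixed by hand, $\rho$ is fitted by least squares rather than by maximum likelihood, and the toy functions are hypothetical stand-ins for empirical-potential (low-fidelity) and first-principles (high-fidelity) energies.

```python
import numpy as np

def rbf(a, b, ell):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

def gp_mean(x_train, y_train, ell, jitter=1e-8):
    """Noise-free GP posterior mean with a fixed RBF length scale."""
    K = rbf(x_train, x_train, ell) + jitter * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)
    return lambda x: rbf(x, x_train, ell) @ alpha

def multi_fidelity_mean(x_lo, y_lo, x_hi, y_hi, ell_lo=0.1, ell_d=0.3):
    """Two-level scheme: f_hi(x) ~ rho * f_lo(x) + delta(x)."""
    mu_lo = gp_mean(x_lo, y_lo, ell_lo)
    m = mu_lo(x_hi)
    rho = (m @ y_hi) / (m @ m)  # least-squares scale factor
    mu_delta = gp_mean(x_hi, y_hi - rho * m, ell_d)
    return (lambda x: rho * mu_lo(x) + mu_delta(x)), rho

def f_lo(x):
    return np.sin(2 * np.pi * x)       # hypothetical cheap model

def f_hi(x):
    return 2.0 * np.sin(2 * np.pi * x) + x  # hypothetical expensive model

x_lo = np.linspace(0.0, 1.0, 21)           # many low-fidelity samples
x_hi = np.array([0.1, 0.35, 0.6, 0.85])    # a few high-fidelity samples
predict, rho = multi_fidelity_mean(x_lo, f_lo(x_lo), x_hi, f_hi(x_hi))
```

Because both GPs are noise-free, the combined mean passes through the high-fidelity data, while the low-fidelity GP supplies the shape of the function between those points.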

preprint · 2020 · arXiv

Peri-Net-Pro: The neural processes with quantified uncertainty for crack patterns

This paper uses peridynamic theory, which is well suited to crack studies, to predict the crack patterns in a moving disk, classify them according to their modes, and finally perform regression analysis. To that end, the crack patterns for each mode are obtained by molecular dynamics (MD) simulation using peridynamics. Image classification and regression studies are conducted with convolutional neural networks (CNNs) and neural processes. First, we increased the amount and quality of the data using peridynamics, which can in theory compensate for the problems of the finite element method (FEM) in generating crack pattern images. Second, we performed case studies for the PMB, LPS, and VES models obtained from peridynamic theory. The case studies classify the images using CNNs and determine the suitability of the PMB, LPS, and VES models. Finally, we performed regression analysis on the crack pattern images with neural processes to predict the crack patterns. In the regression problem, tracking the variance over the training epochs confirms that the variance of the neural process predictions decreases as the number of epochs increases. The most important finding of this study is that the neural processes make accurate predictions even when training data are missing or insufficient.

preprint · 2020 · arXiv

RotEqNet: Rotation-Equivariant Network for Fluid Systems with Symmetric High-Order Tensors

In recent scientific modeling applications, machine learning models are widely used to facilitate computational simulations of fluid systems. Rotational symmetry is a general property of most symmetric fluid systems. However, current machine learning methods generally offer no theoretical guarantee of rotational symmetry. By observing an important property of contraction and rotation operations on high-order symmetric tensors, we prove that the rotation operation is preserved under tensor contraction. Based on this theoretical justification, we introduce the Rotation-Equivariant Network (RotEqNet) to guarantee rotation equivariance for high-order tensors in fluid systems. We implement RotEqNet and evaluate our claims through four case studies on various fluid systems. Both error reduction and rotation equivariance are verified in these case studies. Results from the comparative study show that our method outperforms conventional methods, which rely on data augmentation.
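
The contraction property can be checked numerically. Rotating every index and then contracting the last pair gives $R_{ia}R_{jb}R_{kc}R_{kd}T_{abcd} = R_{ia}R_{jb}T_{abcc}$, because $R_{kc}R_{kd} = \delta_{cd}$ for orthogonal $R$, which is the same as contracting first and rotating the lower-order tensor. The helpers below are illustrative and are not part of RotEqNet; they only verify the identity for a random symmetric fourth-order tensor in 3D.

```python
import numpy as np
from itertools import permutations

def random_symmetric_tensor(order, dim, rng):
    """Random tensor symmetrized by averaging over all index permutations."""
    T = rng.standard_normal((dim,) * order)
    perms = list(permutations(range(order)))
    return sum(np.transpose(T, p) for p in perms) / len(perms)

def rotate(T, R):
    """Apply R to every index: T'_{i1..in} = R_{i1 a1} ... R_{in an} T_{a1..an}."""
    for axis in range(T.ndim):
        T = np.moveaxis(np.tensordot(R, T, axes=([1], [axis])), 0, axis)
    return T

def contract(T):
    """Contract the last two indices: C(T)_{i..} = sum_k T_{i..kk}."""
    return np.trace(T, axis1=T.ndim - 2, axis2=T.ndim - 1)

rng = np.random.default_rng(0)
T = random_symmetric_tensor(order=4, dim=3, rng=rng)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1  # make Q a proper rotation (det = +1)

lhs = contract(rotate(T, Q))  # rotate, then contract
rhs = rotate(contract(T), Q)  # contract, then rotate
```

The identity only uses the orthogonality of $R$, so stacking contractions of this kind preserves rotation equivariance by construction rather than through data augmentation.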

preprint · 2019 · arXiv

Efficient Deep Learning Techniques for Multiphase Flow Simulation in Heterogeneous Porous Media

We present efficient deep learning techniques for approximating flow and transport equations for both single-phase and two-phase flow problems. The proposed methods take advantage of the sparsity structures in the underlying discrete systems and can serve as efficient alternatives to full-order system solvers. In particular, for the flow problem, we design a network with convolutional and locally connected layers to perform model reduction. Moreover, we employ a custom loss function to impose local mass conservation constraints. This helps to preserve the physical properties of the velocity solution we are interested in learning. For the saturation problem, we propose a residual-type network to approximate the dynamics. Our main contribution here is the design of custom sparsely connected layers that take into account the inherent sparse interaction between the input and output. After training, the approximated feed-forward map can be applied iteratively to predict solutions over long time horizons. Our trained networks, especially for two-phase flow where the maps are nonlinear, show great potential for accurately approximating the underlying physical system while improving computational efficiency. Numerical experiments are presented and discussed to demonstrate the performance of the proposed techniques.
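
A loss that imposes local mass conservation can be sketched as a data misfit plus a penalty on the cell-wise residual $\nabla\cdot u - q$. This is a minimal NumPy sketch under stated assumptions, not the authors' loss: the staggered (MAC) grid layout, the source term `q`, the weight `lam`, and the helper names are hypothetical, and in training the same expression would be written in an autodiff framework.

```python
import numpy as np

def divergence(ux, uy, h):
    """Cell-wise discrete divergence on a staggered grid.

    ux: x-face velocities, shape (ny, nx + 1); uy: y-face velocities, shape (ny + 1, nx).
    """
    return (ux[:, 1:] - ux[:, :-1]) / h + (uy[1:, :] - uy[:-1, :]) / h

def mass_conservative_loss(ux_pred, uy_pred, ux_true, uy_true, source, h, lam=1.0):
    """Data misfit plus a penalty on the local mass-balance residual div(u) - q."""
    data = np.mean((ux_pred - ux_true) ** 2) + np.mean((uy_pred - uy_true) ** 2)
    residual = divergence(ux_pred, uy_pred, h) - source
    return data + lam * np.mean(residual ** 2)

# Divergence-free reference field built from a stream function psi on grid nodes
ny, nx, h = 8, 10, 0.1
rng = np.random.default_rng(0)
psi = rng.standard_normal((ny + 1, nx + 1))
ux = (psi[1:, :] - psi[:-1, :]) / h   # shape (ny, nx + 1)
uy = -(psi[:, 1:] - psi[:, :-1]) / h  # shape (ny + 1, nx)
q = np.zeros((ny, nx))                # no sources or sinks in this toy case

loss_exact = mass_conservative_loss(ux, uy, ux, uy, q, h)
ux_noisy = ux + 0.1 * rng.standard_normal(ux.shape)
loss_perturbed = mass_conservative_loss(ux_noisy, uy, ux, uy, q, h)
```

The exact field gives a loss at round-off level, while a perturbed prediction is penalized both for its misfit and for violating the discrete mass balance in each cell.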