Source author record

Zhi-Ming Ma

Zhi-Ming Ma appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning; math.PR; Distributed, Parallel, and Cluster Computing; math.AP; math.CV; math.OC; math.ST; physics.comp-ph; physics.flu-dyn; Statistics Theory

Catalog footprint

What is connected

12 works

10 topics

4 close collaborators


preprint · 2022 · arXiv

Deep Random Vortex Method for Simulation and Inference of Navier-Stokes Equations

Navier-Stokes equations are significant partial differential equations that describe the motion of fluids such as liquids and air. Due to the importance of Navier-Stokes equations, the development of efficient numerical schemes is important for both science and engineering. Recently, with the development of AI techniques, several approaches have been designed to integrate deep neural networks in simulating and inferring the fluid dynamics governed by incompressible Navier-Stokes equations, which can accelerate the simulation or inference process in a mesh-free and differentiable way. In this paper, we point out that existing deep Navier-Stokes informed methods are limited in handling non-smooth or fractional equations, which are two critical situations in reality. To this end, we propose the Deep Random Vortex Method (DRVM), which combines the neural network with a random vortex dynamics system equivalent to the Navier-Stokes equation. Specifically, the random vortex dynamics motivates a Monte Carlo based loss function for training the neural network, which avoids the calculation of derivatives through auto-differentiation. Therefore, DRVM not only can efficiently solve Navier-Stokes equations involving rough paths, non-differentiable initial conditions and fractional operators, but also inherits the mesh-free and differentiable benefits of deep-learning-based solvers. We conduct experiments on the Cauchy problem, parametric solver learning, and the inverse problem of both 2-d and 3-d incompressible Navier-Stokes equations. The proposed method achieves accurate results for simulation and inference of Navier-Stokes equations. Especially for cases that include singular initial conditions, DRVM significantly outperforms the existing PINN method.

preprint · 2022 · arXiv

Does Momentum Change the Implicit Regularization on Separable Data?

The momentum acceleration technique is widely adopted in many optimization algorithms. However, there is no theoretical answer to how momentum affects the generalization performance of these algorithms. This paper studies the problem by analyzing the implicit regularization of momentum-based optimization. We prove that on the linear classification problem with separable data and exponential-tailed loss, gradient descent with momentum (GDM) converges to the L2 max-margin solution, which is the same as vanilla gradient descent. That means gradient descent with momentum acceleration still converges to a low-complexity model, which guarantees its generalization. We then analyze the stochastic and adaptive variants of GDM (i.e., SGDM and deterministic Adam) and show they also converge to the L2 max-margin solution. Technically, to overcome the difficulty of error accumulation in analyzing the momentum, we construct new potential functions to analyze the gap between the model parameter and the max-margin solution. Numerical experiments are conducted and support our theoretical results.
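As a toy illustration of the claim in this abstract, the following sketch runs gradient descent with heavy-ball momentum on a tiny separable dataset with the exponential loss and measures how well the weight direction aligns with the L2 max-margin direction. The dataset, step size, momentum coefficient, iteration count and the function name `gdm_direction_cosine` are all assumptions made for this example; none of them is taken from the paper.

```python
import math

# Hypothetical toy data: each row is y_i * x_i for a linearly separable problem
# without a bias term. The L2 max-margin direction for these rows is (1, 0):
# the first two rows together force w1 >= 1, and w = (1, 0) also satisfies
# the margin constraint of the third row.
SIGNED_POINTS = [(1.0, 2.0), (1.0, -2.0), (3.0, 1.0)]
MAX_MARGIN_DIRECTION = (1.0, 0.0)


def gdm_direction_cosine(steps=2000, lr=0.1, beta=0.9):
    """Run heavy-ball GD on the exponential loss; return cos(w, max-margin)."""
    w = [0.0, 0.0]
    v = [0.0, 0.0]  # momentum buffer
    for _ in range(steps):
        # Gradient of sum_i exp(-w . z_i) with respect to w.
        grad = [0.0, 0.0]
        for z in SIGNED_POINTS:
            weight = math.exp(-(w[0] * z[0] + w[1] * z[1]))
            grad[0] -= weight * z[0]
            grad[1] -= weight * z[1]
        # Heavy-ball update: accumulate the gradient, then take a step.
        v = [beta * v[j] + grad[j] for j in range(2)]
        w = [w[j] - lr * v[j] for j in range(2)]
    norm = math.hypot(w[0], w[1])
    dot = w[0] * MAX_MARGIN_DIRECTION[0] + w[1] * MAX_MARGIN_DIRECTION[1]
    return dot / norm


if __name__ == "__main__":
    print("with momentum:   ", gdm_direction_cosine(beta=0.9))
    print("without momentum:", gdm_direction_cosine(beta=0.0))
```

Setting `beta=0.0` recovers vanilla gradient descent, so the two printed cosines give a quick side-by-side check that both variants drift toward the same direction on this toy problem.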

preprint · 2022 · arXiv

Neural Operator with Regularity Structure for Modeling Dynamics Driven by SPDEs

Stochastic partial differential equations (SPDEs) are significant tools for modeling dynamics in many areas including atmospheric sciences and physics. Neural Operators, generalizations of neural networks with the capability of learning maps between infinite-dimensional spaces, are strong tools for solving parametric PDEs. However, they lack the ability to model SPDEs, which usually have poor regularity due to the driving noise. As the theory of regularity structures has achieved great success in analyzing SPDEs and provides the concept of model feature vectors that approximate SPDEs' solutions well, we propose the Neural Operator with Regularity Structure (NORS), which incorporates the feature vectors for modeling dynamics driven by SPDEs. We conduct experiments on various SPDEs including the dynamic $\Phi^4_1$ model and the 2d stochastic Navier-Stokes equation, and the results demonstrate that NORS is resolution-invariant, efficient, and achieves one order of magnitude lower error with a modest amount of data.

preprint · 2021 · arXiv

BN-invariant sharpness regularizes the training model to better generalization

It is often argued that flatter minima generalize better. However, it has been pointed out that the usual definitions of sharpness, which consider either the maximum or the integral of the loss over a $\delta$-ball of parameters around a minimum, cannot give a consistent measurement for scale-invariant neural networks, e.g., networks with batch normalization (BN) layers. In this paper, we first propose a measure of sharpness, BN-Sharpness, which gives a consistent value for equivalent networks under BN. It achieves scale invariance by connecting the integral diameter with the scale of the parameters. Then we present a computationally efficient way to approximate BN-Sharpness, i.e., a one-dimensional integral along the "sharpest" direction. Furthermore, we use BN-Sharpness to regularize training and design an algorithm to minimize the new regularized objective. Our algorithm achieves considerably better performance than vanilla SGD over various experimental settings.

preprint · 2020 · arXiv

Asynchronous Stochastic Gradient Descent with Delay Compensation

With the fast development of deep learning, it has become common to learn big neural networks using massive training data. Asynchronous Stochastic Gradient Descent (ASGD) is widely adopted for this task because of its efficiency, but it is known to suffer from the problem of delayed gradients. That is, when a local worker adds its gradient to the global model, the global model may have been updated by other workers and this gradient becomes "delayed". We propose a novel technique to compensate for this delay, so as to make the optimization behavior of ASGD closer to that of sequential SGD. This is achieved by leveraging a Taylor expansion of the gradient function and an efficient approximation to the Hessian matrix of the loss function. We call the new algorithm Delay Compensated ASGD (DC-ASGD). We evaluated the proposed algorithm on the CIFAR-10 and ImageNet datasets, and the experimental results demonstrate that DC-ASGD outperforms both synchronous SGD and asynchronous SGD, and nearly approaches the performance of sequential SGD.
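The idea of correcting a stale gradient with a first-order Taylor term can be sketched as follows. The abstract does not say which Hessian approximation is used, so this example assumes a diagonal approximation built from the elementwise squared gradient and scaled by a hypothetical coefficient `lam`. The function names and the numbers in the usage section are likewise illustrative only.

```python
def compensate_delayed_gradient(stale_grad, stale_weights, current_weights, lam=0.5):
    """First-order Taylor correction of a gradient computed at stale weights.

    g(w_now) ~ g(w_old) + H * (w_now - w_old), where the Hessian H is replaced
    by the ASSUMED diagonal approximation lam * g(w_old) * g(w_old).
    """
    return [
        g + lam * g * g * (w_now - w_old)
        for g, w_old, w_now in zip(stale_grad, stale_weights, current_weights)
    ]


def apply_update(current_weights, grad, lr=0.1):
    """Plain SGD step on the global model using the compensated gradient."""
    return [w - lr * g for w, g in zip(current_weights, grad)]


if __name__ == "__main__":
    stale_w = [1.0, -2.0]    # weights the worker read before computing
    current_w = [1.5, -2.0]  # global weights after other workers' updates
    stale_g = [2.0, 1.0]     # gradient the worker computed at stale_w
    compensated = compensate_delayed_gradient(stale_g, stale_w, current_w)
    print(compensated)
    print(apply_update(current_w, compensated))
```

When the global model has not moved since the worker read it, the correction term vanishes and the update reduces to an ordinary SGD step, which is the behavior the compensation is meant to approximate.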

preprint · 2016 · arXiv

A Communication-Efficient Parallel Algorithm for Decision Tree

Decision tree (and its extensions such as Gradient Boosting Decision Trees and Random Forest) is a widely used machine learning algorithm, due to its practical effectiveness and model interpretability. With the emergence of big data, there is an increasing need to parallelize the training process of decision trees. However, most existing attempts along this line suffer from high communication costs. In this paper, we propose a new algorithm, called Parallel Voting Decision Tree (PV-Tree), to tackle this challenge. After partitioning the training data onto a number of (e.g., $M$) machines, this algorithm performs both local voting and global voting in each iteration. For local voting, the top-$k$ attributes are selected from each machine according to its local data. Then, the globally top-$2k$ attributes are determined by a majority voting among these local candidates. Finally, the full-grained histograms of the globally top-$2k$ attributes are collected from local machines in order to identify the best (most informative) attribute and its split point. PV-Tree can achieve a very low communication cost (independent of the total number of attributes) and thus can scale out very well. Furthermore, theoretical analysis shows that this algorithm can learn a near optimal decision tree, since it can find the best attribute with a large probability. Our experiments on real-world datasets show that PV-Tree significantly outperforms the existing parallel decision tree algorithms in the trade-off between accuracy and efficiency.
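The two voting stages described in this abstract can be sketched in a few lines. This is a minimal single-process sketch, assuming each machine has already computed a per-attribute informativeness score (here called a gain) on its local data. The tie-breaking rule (lower attribute index wins), the function names and the example scores are assumptions, and the final histogram collection and split search are omitted.

```python
from collections import Counter


def local_top_k(gains, k):
    """Local voting: indices of the k attributes with the largest local gain."""
    return sorted(range(len(gains)), key=lambda a: (-gains[a], a))[:k]


def global_top_2k(per_machine_gains, k):
    """Global voting: the 2k attributes that receive the most local votes."""
    votes = Counter()
    for gains in per_machine_gains:
        votes.update(local_top_k(gains, k))
    # Rank by vote count; ties are broken by attribute index (an assumption).
    ranked = sorted(votes, key=lambda a: (-votes[a], a))
    return ranked[: 2 * k]


if __name__ == "__main__":
    # Hypothetical gains for 5 attributes on 3 machines.
    per_machine_gains = [
        [0.9, 0.1, 0.8, 0.2, 0.0],
        [0.7, 0.6, 0.1, 0.0, 0.2],
        [0.1, 0.2, 0.9, 0.8, 0.0],
    ]
    print(global_top_2k(per_machine_gains, k=2))  # prints [0, 2, 1, 3]
```

Only the attribute indices chosen here would need their full histograms communicated, which is why the communication cost in this scheme depends on $k$ rather than on the total number of attributes.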

preprint · 2016 · arXiv

Asynchronous Stochastic Proximal Optimization Algorithms with Variance Reduction

Regularized empirical risk minimization (R-ERM) is an important branch of machine learning, since it constrains the capacity of the hypothesis space and guarantees the generalization ability of the learning algorithm. Two classic proximal optimization algorithms, i.e., proximal stochastic gradient descent (ProxSGD) and proximal stochastic coordinate descent (ProxSCD), have been widely used to solve the R-ERM problem. Recently, the variance reduction technique was proposed to improve ProxSGD and ProxSCD, and the corresponding ProxSVRG and ProxSVRCD have better convergence rates. These proximal algorithms with variance reduction have also achieved great success in applications at small and moderate scales. However, in order to solve large-scale R-ERM problems and make more practical impact, parallel versions of these algorithms are sorely needed. In this paper, we propose asynchronous ProxSVRG (Async-ProxSVRG) and asynchronous ProxSVRCD (Async-ProxSVRCD) algorithms, and prove that Async-ProxSVRG can achieve near linear speedup when the training data is sparse, while Async-ProxSVRCD can achieve near linear speedup regardless of the sparsity condition, as long as the number of block partitions is appropriately set. We have conducted experiments on a regularized logistic regression task. The results verified our theoretical findings and demonstrated the practical efficiency of the asynchronous stochastic proximal algorithms with variance reduction.
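For readers unfamiliar with proximal methods, a single proximal stochastic gradient step can be sketched as below. This shows only the basic, sequential ProxSGD update, not the asynchronous or variance-reduced algorithms proposed in the paper. It also assumes an L1 regularizer, whose proximal operator is soft-thresholding; the abstract does not fix the regularizer, and all names and numbers are illustrative.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def prox_sgd_step(weights, stochastic_grad, lr, lam):
    """One ProxSGD step for loss + lam * ||w||_1 (L1 penalty is an assumption).

    First take a gradient step on the smooth loss, then apply the proximal
    operator of the scaled regularizer, lr * lam * ||.||_1, coordinate-wise.
    """
    return [
        soft_threshold(w - lr * g, lr * lam)
        for w, g in zip(weights, stochastic_grad)
    ]


if __name__ == "__main__":
    w = [1.0, -1.0, 0.05]
    g = [0.0, 0.0, 0.0]  # hypothetical stochastic gradient of the smooth loss
    print(prox_sgd_step(w, g, lr=0.5, lam=0.2))
```

The third coordinate is smaller in magnitude than the threshold `lr * lam`, so the step sets it exactly to zero; this is how the L1 proximal step produces sparse solutions.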

preprint · 2016 · arXiv

Generalization Error Bounds for Optimization Algorithms via Stability

Many machine learning tasks can be formulated as Regularized Empirical Risk Minimization (R-ERM), and solved by optimization algorithms such as gradient descent (GD), stochastic gradient descent (SGD), and stochastic variance reduced gradient (SVRG). Conventional analysis of these optimization algorithms focuses on their convergence rates during the training process; however, people in the machine learning community may care more about the generalization performance of the learned model on unseen test data. In this paper, we investigate this issue, using stability as a tool. In particular, we decompose the generalization error for R-ERM, and derive its upper bound for both convex and non-convex cases. In convex cases, we prove that the generalization error can be bounded by the convergence rate of the optimization algorithm and the stability of the R-ERM process, both in expectation (in the order of $\mathcal{O}((1/n)+\mathbb{E}\rho(T))$, where $\rho(T)$ is the convergence error and $T$ is the number of iterations) and in high probability (in the order of $\mathcal{O}\left(\frac{\log(1/\delta)}{\sqrt{n}}+\rho(T)\right)$ with probability $1-\delta$). For non-convex cases, we can also obtain a similar expected generalization error bound. Our theorems indicate that 1) as training proceeds, the generalization error will decrease for all the optimization algorithms under our investigation; 2) comparatively speaking, SVRG has better generalization ability than GD and SGD. We have conducted experiments on both convex and non-convex problems, and the experimental results verify our theoretical findings.

preprint · 2014 · arXiv

Fukushima type decomposition for semi-Dirichlet forms

We present a Fukushima type decomposition in the setting of general quasi-regular semi-Dirichlet forms. The decomposition is then employed to give a transformation formula for martingale additive functionals. Applications of the results to some concrete examples of semi-Dirichlet forms are given at the end of the paper. We also discuss the uniqueness question for the Doob-Meyer decomposition on optional sets of interval type.

preprint · 2014 · arXiv

Markov jump processes in modeling coalescent with recombination

Genetic recombination is one of the most important mechanisms that can generate and maintain diversity, and recombination information plays an important role in population genetic studies. However, the phenomenon of recombination is extremely complex, and hence simulation methods are indispensable in the statistical inference of recombination. So far there are mainly two classes of simulation models practically in wide use: back-in-time models and spatially moving models. However, the statistical properties shared by the two classes of simulation models have not yet been theoretically studied. Based on our joint research with CAS-MPG Partner Institute for Computational Biology and with Beijing Jiaotong University, in this paper we provide for the first time a rigorous argument that the statistical properties of the two classes of simulation models are identical. That is, they share the same probability distribution on the space of ancestral recombination graphs (ARGs). As a consequence, our study provides a unified interpretation for the algorithms of simulating coalescent with recombination, and will facilitate the study of statistical inference on recombination.

preprint · 2011 · arXiv

Fukushima's decomposition for diffusions associated with semi-Dirichlet forms

Diffusion processes associated with semi-Dirichlet forms are studied in the paper. The main results are Fukushima's decomposition for the diffusions and a transformation formula for the corresponding martingale part of the decomposition. The results are applied to some concrete examples.

preprint · 1992 · arXiv

A general correspondence between Dirichlet forms and right processes

The theory of Dirichlet forms as originated by Beurling-Deny and developed particularly by Fukushima and Silverstein, is a natural functional analytic extension of classical (and axiomatic) potential theory. Although some parts of it have abstract measure theoretic versions, the basic general construction of a Hunt process properly associated with the form, obtained by Fukushima and Silverstein, requires the form to be defined on a locally compact separable space with a Radon measure $m$ and the form to be regular (in the sense of the continuous functions of compact support being dense in the domain of the form, both in the supremum norm and in the natural norm given by the form and the $L^2(m)$-space). This setting excludes infinite dimensional situations. In this letter we announce that there exists an extension of Fukushima-Silverstein's construction of the associated process to the case where the space is only supposed to be metrizable and the form is not required to be regular.

