Source author record

Guodong Zhang

Guodong Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, math.PR, econ.TH, math.OC, astro-ph.EP, Computer Vision, gr-qc, math.NA

Catalog footprint

What is connected

11 works

8 topics

4 close collaborators


Preprint · 2026 · arXiv

A new solar radiation pressure model for some orbit types in the cislunar space

For satellites in cislunar space, solar radiation pressure (SRP) is the third-largest perturbation, exceeded only by the lunisolar gravitational perturbations, and it is the primary factor limiting the accuracy of orbit determination for such satellites. Numerous SRP models have been proposed for artificial satellites close to the Earth, but they have shortcomings when applied to satellites in cislunar space. In this study, we concentrate on various scenarios of cislunar satellites in periodic or quasi-periodic orbits. We first employ the box-wing model to simulate the SRP effects and then, based on these simulations, propose a general SRP model termed the Empirical NJU Cislunar Model (ENCM). In addition, we develop several scenario-specific sub-models suited to different mission profiles. The proposed model is further verified in the orbit determination process. Comparisons with the conventional cannonball and ECOM models demonstrate that ENCM yields a significant improvement in orbit determination accuracy, showing promising potential for future cislunar missions.
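The abstract benchmarks ENCM against the conventional cannonball model. For reference, here is a minimal sketch of that baseline in its common textbook form, a = ν P⊙ C_R (A/m) (AU/r)² ŝ. Here ν is a shadow factor and ŝ is the unit vector from the Sun to the satellite. The constants, function name and parameters are our assumptions. ENCM itself is not reproduced here.

```python
import math

# Assumed constants (not given in the abstract): solar radiation pressure
# at 1 AU and the astronomical unit, as commonly used in orbit software.
P_SUN_1AU = 4.56e-6        # N/m^2
AU = 1.495978707e11        # m


def cannonball_srp(r_sat, r_sun, area_to_mass, c_r, shadow=1.0):
    """Hypothetical helper: cannonball SRP acceleration in m/s^2.

    r_sat, r_sun: position vectors in metres, same frame.
    area_to_mass: A/m in m^2/kg; c_r: reflectivity coefficient;
    shadow: illumination factor in [0, 1] (1 = full sunlight).
    """
    d = [s - p for s, p in zip(r_sat, r_sun)]  # Sun -> satellite vector
    dist = math.sqrt(sum(x * x for x in d))
    scale = shadow * P_SUN_1AU * c_r * area_to_mass * (AU / dist) ** 2
    # The acceleration points along Sun -> satellite, i.e. away from the Sun.
    return [scale * x / dist for x in d]
```

With A/m = 0.01 m²/kg and C_R = 1.3 at 1 AU, the magnitude is 4.56e-6 × 1.3 × 0.01 ≈ 5.9e-8 m/s². It falls off with the inverse square of the Sun distance. A single scalar C_R is exactly the simplification that box-wing-derived models such as ENCM aim to improve on.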

Preprint · 2023 · arXiv

Approximate optimality and the risk/reward tradeoff in a class of bandit problems

This paper studies a sequential decision problem where payoff distributions are known and where the riskiness of payoffs matters. Equivalently, it studies sequential choice from a repeated set of independent lotteries. The decision-maker is assumed to pursue strategies that are approximately optimal for large horizons. By exploiting the tractability afforded by asymptotics, conditions are derived characterizing when specialization in one action or lottery throughout is asymptotically optimal and when optimality requires intertemporal diversification. The key is the constancy or variability of risk attitude. The main technical tool is a new central limit theorem.

Preprint · 2022 · arXiv

A Central Limit Theorem, Loss Aversion and Multi-Armed Bandits

This paper studies a multi-armed bandit problem where the decision-maker is loss averse; in particular, she is risk averse in the domain of gains and risk loving in the domain of losses. The focus is on large horizons. Consequences of loss aversion for asymptotic (large-horizon) properties are derived in a number of analytical results. The analysis is based on a new central limit theorem for a set of measures under which conditional variances can vary in a largely unstructured, history-dependent way, subject only to the restriction that they lie in a fixed interval.

Preprint · 2022 · arXiv

Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers

Training very deep neural networks is still an extremely challenging task. The common solution is to use shortcut connections and normalization layers, which are both crucial ingredients in the popular ResNet architecture. However, there is strong evidence to suggest that ResNets behave more like ensembles of shallower networks than truly deep ones. Recently, it was shown that deep vanilla networks (i.e. networks without normalization layers or shortcut connections) can be trained as fast as ResNets by applying certain transformations to their activation functions. However, this method (called Deep Kernel Shaping) isn't fully compatible with ReLUs, and produces networks that overfit significantly more than ResNets on ImageNet. In this work, we rectify this situation by developing a new type of transformation that is fully compatible with a variant of ReLUs -- Leaky ReLUs. We show in experiments that our method, which introduces negligible extra computational cost, achieves validation accuracies with deep vanilla networks that are competitive with ResNets (of the same width/depth), and significantly higher than those obtained with the Edge of Chaos (EOC) method. And unlike with EOC, the validation accuracies we obtain do not get worse with depth.
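The sketch below illustrates how a Leaky ReLU slope can be tailored to depth. Rescaling Leaky ReLU (negative slope α) by √(2/(1+α²)) gives unit second moment under N(0, 1) inputs. Its local C-map then has the closed form c + (1-α)²/(π(1+α²)) (√(1-c²) - c·arccos c), which follows from the degree-1 arc-cosine kernel. One can then bisect on α so that a chain of L layers maps c = 0 to a target η. Treating the network as a plain chain of identical layers, the target η = 0.9, and all function names are illustrative assumptions. The paper's actual transformation may differ in its details.

```python
import math


def leaky_relu_cmap(c, alpha):
    """Local C-map of Leaky ReLU (negative slope alpha) rescaled by
    sqrt(2 / (1 + alpha**2)), so that E[phi(z)**2] = 1 for z ~ N(0, 1)."""
    c = max(-1.0, min(1.0, c))  # guard against round-off outside [-1, 1]
    k = (1.0 - alpha) ** 2 / (math.pi * (1.0 + alpha ** 2))
    return c + k * (math.sqrt(1.0 - c * c) - c * math.acos(c))


def depth_cmap_at_zero(alpha, depth):
    """Compose the per-layer C-map `depth` times, starting from c = 0."""
    c = 0.0
    for _ in range(depth):
        c = leaky_relu_cmap(c, alpha)
    return c


def pick_alpha(depth, eta=0.9, iters=100):
    """Bisection for alpha in [0, 1] such that depth_cmap_at_zero == eta.

    The composed map decreases in alpha, from ReLU (alpha = 0) down to the
    identity (alpha = 1), which leaves c = 0 unchanged."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if depth_cmap_at_zero(mid, depth) > eta:
            lo = mid  # network still too nonlinear: move toward identity
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a 50-layer chain, plain ReLU (α = 0) pushes c = 0 to well above 0.9, because inputs become nearly indistinguishable. `pick_alpha(50)` instead returns a slope strictly between 0 and 1 that lands on the target.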

Preprint · 2022 · arXiv

Near-optimal Local Convergence of Alternating Gradient Descent-Ascent for Minimax Optimization

Smooth minimax games are often solved with simultaneous or alternating gradient updates. Although algorithms with alternating updates are commonly used in practice, most existing theoretical analyses focus on simultaneous algorithms for convenience of analysis. In this paper, we study alternating gradient descent-ascent (Alt-GDA) in minimax games and show that Alt-GDA is superior to its simultaneous counterpart (Sim-GDA) in many settings. We prove that Alt-GDA achieves a near-optimal local convergence rate for strongly convex-strongly concave (SCSC) problems, while Sim-GDA converges at a much slower rate. To our knowledge, this is the first result in any setting showing that Alt-GDA converges faster than Sim-GDA by more than a constant. We further adapt the theory of integral quadratic constraints (IQC) and show that Alt-GDA attains the same rate globally for a subclass of SCSC minimax problems. Empirically, we demonstrate that alternating updates speed up GAN training significantly and that the use of optimism only helps for simultaneous algorithms.
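The following sketch contrasts the two update orders on a hypothetical SCSC quadratic game, f(x, y) = (μ/2)x² + bxy - (μ/2)y², with saddle point (0, 0). The game, the values μ = 1 and b = 10, and the step sizes are our choices and do not come from the paper. On this game, Sim-GDA contracts by at most √(1 - μ²/(μ²+b²)) ≈ 0.995 per step, even at its best constant step μ/(μ²+b²). Alt-GDA with step 1/b contracts by 1 - μ/b = 0.9 per step.

```python
import math

# Hypothetical toy game: f(x, y) = (mu/2) x^2 + b x y - (mu/2) y^2,
# minimised over x and maximised over y; saddle point at (0, 0).
MU, B = 1.0, 10.0


def grad_x(x, y):
    return MU * x + B * y


def grad_y(x, y):
    return B * x - MU * y


def sim_gda(x, y, eta, steps):
    """Simultaneous GDA: both players read the same (old) iterate."""
    for _ in range(steps):
        x, y = x - eta * grad_x(x, y), y + eta * grad_y(x, y)
    return math.hypot(x, y)  # distance to the saddle point


def alt_gda(x, y, eta, steps):
    """Alternating GDA: the max player sees the freshly updated x."""
    for _ in range(steps):
        x = x - eta * grad_x(x, y)
        y = y + eta * grad_y(x, y)
    return math.hypot(x, y)
```

From (1, 1), `sim_gda(1, 1, MU / (MU**2 + B**2), 200)` is still about 0.52 away from the saddle. `alt_gda(1, 1, 1 / B, 200)` is within 1e-6. At the larger step 1/b, Sim-GDA diverges. This is only one hand-picked instance, not the paper's analysis.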

Preprint · 2022 · arXiv

Strategy-Driven Limit Theorems Associated Bandit Problems

Motivated by the study of the asymptotic behaviour of bandit problems, we obtain several strategy-driven limit theorems, including the law of large numbers, the large deviation principle, and the central limit theorem. In contrast to the classical limit theorems, these are sampling-strategy-driven limit theorems that generate the maximum or minimum average reward. The law of large numbers identifies all possible limits that are achievable under various strategies. The large deviation principle provides the maximum decay probabilities for deviations from the limiting domain. To describe the fluctuations around averages, we obtain strategy-driven central limit theorems under optimal strategies. The limits in these theorems are identified explicitly and depend heavily on the structure of the events or the integrating functions and strategies, which demonstrates the key signature of the learning structure. Our results can be used to estimate the maximal (minimal) rewards and to identify the conditions for avoiding Parrondo's paradox in the two-armed bandit problem. They also lay the theoretical foundation for statistical inference in determining the arm that offers the higher mean reward.

Preprint · 2021 · arXiv

On the Suboptimality of Negative Momentum for Minimax Optimization

Smooth game optimization has recently attracted great interest in machine learning because it generalizes the single-objective optimization paradigm. However, game dynamics are more complex due to the interaction between different players and are therefore fundamentally different from minimization, posing new challenges for algorithm design. Notably, it has been shown that negative momentum is preferred due to its ability to reduce oscillation in game dynamics. Nevertheless, the convergence rate of negative momentum had only been established in simple bilinear games. In this paper, we extend the analysis to smooth and strongly-convex strongly-concave minimax games by adopting the variational inequality formulation. By connecting the momentum method with Chebyshev polynomials, we show that negative momentum accelerates the convergence of game dynamics locally, though with a suboptimal rate. To the best of our knowledge, this is the first work that provides an explicit convergence rate for negative momentum in this setting.
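To make "accelerates locally" concrete, the sketch below computes the spectral radius of simultaneous GDA with heavy-ball momentum, z_{k+1} = z_k - ηF(z_k) + β(z_k - z_{k-1}), on a hypothetical quadratic SCSC game. The game's vector field has Jacobian eigenvalues μ ± ib. Each eigenvalue λ gives the characteristic equation r² - (1 + β - ηλ)r + β = 0. The game (μ = 1, b = 10), the step η = μ/(μ²+b²) and the momentum β = -0.1 are our illustrative choices. The paper's Chebyshev-polynomial analysis is not reproduced here.

```python
import cmath

# Hypothetical game f(x, y) = (mu/2) x^2 + b x y - (mu/2) y^2. The
# simultaneous-GDA field F(x, y) = (df/dx, -df/dy) has Jacobian
# eigenvalues mu +/- i*b.
MU, B = 1.0, 10.0


def momentum_spectral_radius(eta, beta):
    """Largest |root| over both Jacobian eigenvalues for
    z_{k+1} = z_k - eta * F(z_k) + beta * (z_k - z_{k-1})."""
    rho = 0.0
    for lam in (complex(MU, B), complex(MU, -B)):
        # r^2 - (1 + beta - eta*lam) r + beta = 0
        c = 1.0 + beta - eta * lam
        disc = cmath.sqrt(c * c - 4.0 * beta)
        for r in ((c + disc) / 2.0, (c - disc) / 2.0):
            rho = max(rho, abs(r))
    return rho
```

At this step size, plain Sim-GDA (β = 0) has spectral radius √(100/101) ≈ 0.995. Negative momentum β = -0.1 lowers it to about 0.994. This is a modest local speed-up, consistent with the abstract's point that the rate improves but remains suboptimal.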

Preprint · 2020 · arXiv

An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise

The choice of batch size in a stochastic optimization algorithm plays a substantial role in both optimization and generalization. Increasing the batch size typically improves optimization but degrades generalization. To improve generalization while maintaining optimal convergence in large-batch training, we propose adding covariance noise to the gradients. We demonstrate that the learning performance of our method is more accurately captured by the structure of the covariance matrix of the noise than by the variance of the gradients. Moreover, for convex quadratic objectives, we prove that it can be characterized by the Frobenius norm of the noise matrix. Our empirical studies with standard deep learning architectures and datasets show that our method not only improves generalization performance in large-batch training but does so while keeping optimization performance desirable and without lengthening training.
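The abstract does not say how the noise covariance is chosen. The sketch below therefore shows only the mechanics: a gradient is perturbed with Gaussian noise of a prescribed covariance Σ by drawing ξ ~ N(0, I) and adding Dξ, where D is a Cholesky factor of Σ. A Frobenius-norm helper is included as well. The covariance, function names and seed are hypothetical. Applying the norm to D rather than to Σ is our assumption about what "noise matrix" means.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular D with D @ D.T == cov (cov symmetric positive definite)."""
    n = len(cov)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(D[i][k] * D[j][k] for k in range(j))
            if i == j:
                D[i][j] = math.sqrt(cov[i][i] - s)
            else:
                D[i][j] = (cov[i][j] - s) / D[j][j]
    return D


def add_covariance_noise(grad, noise_factor, rng):
    """Hypothetical helper: grad + noise_factor @ xi with xi ~ N(0, I),
    so the added noise has covariance noise_factor @ noise_factor.T."""
    n = len(grad)
    xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [g + sum(noise_factor[i][k] * xi[k] for k in range(n))
            for i, g in enumerate(grad)]


def frobenius_norm(m):
    """sqrt of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in m for v in row))
```

In a real training loop, `grad` would be the mini-batch gradient and the factor would be built from whatever covariance structure the method prescribes. Sampling many perturbations of a zero gradient recovers Σ empirically.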

Preprint · 2020 · arXiv

Circular orbit of a particle and weak gravitational lensing

The purpose of this paper is twofold. First, we introduce a geometric approach, based on the Jacobi metric, to study the circular orbit of a particle in a static and spherically symmetric spacetime. Second, we apply the circular orbit to study the weak gravitational deflection of null and timelike particles based on the Gauss-Bonnet theorem. In this way, we obtain an expression for the deflection angle and extend the study of deflection angles to asymptotically non-flat black hole spacetimes. Several black holes are considered as lenses, such as a static and spherically symmetric black hole in conformal Weyl gravity and a Schwarzschild-like black hole in bumblebee gravity. Our results are consistent with the previous literature. In particular, we find that the connection between the Gaussian curvature and the radius of a circular orbit greatly simplifies the calculation.

Preprint · 2020 · arXiv

Picking Winning Tickets Before Training by Preserving Gradient Flow

Overparameterization has been shown to benefit both the optimization and generalization of neural networks, but large networks are resource hungry at both training and test time. Network pruning can reduce test-time resource requirements, but is typically applied to trained networks and therefore cannot avoid the expensive training process. We aim to prune networks at initialization, thereby saving resources at training time as well. Specifically, we argue that efficient training requires preserving the gradient flow through the network. This leads to a simple but effective pruning criterion we term Gradient Signal Preservation (GraSP). We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet, using VGGNet and ResNet architectures. Our method can prune 80% of the weights of a VGG-16 network on ImageNet at initialization, with only a 1.6% drop in top-1 accuracy. Moreover, our method achieves significantly better performance than the baseline at extreme sparsity levels.
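As we read GraSP, each weight is scored by -θ ⊙ Hg, where H is the Hessian and g the gradient of the loss at initialization. The top-scoring fraction of weights is then removed. The sketch below applies that rule to tiny explicit arrays; the sign convention should be checked against the paper. A real network would obtain Hg as a Hessian-gradient product via automatic differentiation instead of forming H. The toy values, function name and pruning fraction are hypothetical.

```python
def grasp_mask(theta, hessian, grad, prune_fraction):
    """Hypothetical GraSP-style mask: score = -theta * (H @ g); the weights
    with the highest scores are pruned. Returns 1 = keep, 0 = prune."""
    n = len(theta)
    hg = [sum(hessian[i][j] * grad[j] for j in range(n)) for i in range(n)]
    scores = [-t * h for t, h in zip(theta, hg)]
    num_prune = int(prune_fraction * n)
    # Indices sorted from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    pruned = set(order[:num_prune])
    return [0 if i in pruned else 1 for i in range(n)]
```

For example, take θ = [1, -2, 0.5, 3], H = diag(2, 1, 3, 1) and g = [1, 1, -1, 0.5]. Then Hg = [2, 1, -3, 0.5] and the scores are [-2, 2, 1.5, -1.5]. Pruning half the weights therefore removes the second and third weights, giving the mask [1, 0, 0, 1].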

Preprint · 2016 · arXiv

Error estimates for structure-preserving discretization of the incompressible MHD system

In this paper, we carry out the error analysis for the structure-preserving discretization of the incompressible MHD system. This system, as a coupled system of Navier-Stokes equations and Maxwell's equations, is nonlinear. We use its energy estimate and the underlying physical structure to facilitate the error analysis. Under certain CFL conditions, we prove the optimal order of convergence. To support the theoretical results, we also present numerical tests.
