Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
12 works
0 followers
15 topics
4 close collaborators

Published work

12 published items

preprint, 2026, arXiv

Investigating the Anisotropy of Dispersion Measure Contribution from the Galactic Halo by Using Fast Radio Bursts

We propose a data-driven approach to reconstruct the all-sky distribution of the dispersion measure contribution from the Galactic halo ($\mathrm{DM_{halo}}$) through a spherical harmonic expansion, enabling an investigation of its possible anisotropies. Based on the NE2001 model and using 92 localized and 574 unlocalized non-repeating fast radio bursts (FRBs) at Galactic latitudes $|b|>15^\circ$, we find a significant dipole anisotropy in $\mathrm{DM_{halo}}$, pointing toward $(l=130^\circ,\, b=+5^\circ)$ with a $1\sigma$ uncertainty of approximately $28^\circ$. The $\mathrm{DM_{halo}}$ value in this direction is $63\pm9~\mathrm{pc~cm^{-3}}$, exceeding the all-sky mean by about $2.6\sigma$. This result is not significantly affected by the choice of Galactic ISM models. Furthermore, even when using a refined sample of 62 localized FRBs (excluding CHIME detections, repeaters, and unlocalized events), the dipole anisotropic structure persists, with a direction of $(l=141^\circ,\, b=+51^\circ)$ and a larger $1\sigma$ uncertainty of $\sim 44^\circ$. Model comparisons using the Akaike Information Criterion and Bayesian evidence yield consistent preferences, and together they suggest that current FRB data slightly favor the existence of a dipole structure in $\mathrm{DM_{halo}}$. If this feature is not a statistical fluctuation or systematic error, its physical origin requires further investigation. Future FRB samples with larger sizes and more complete sky coverage will be essential to confirm or refute this possible anisotropic structure.
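
To make the idea of a dipole in a sky map concrete, the following is a minimal sketch of a monopole-plus-dipole least-squares fit, which is what a spherical harmonic expansion truncated at degree 1 amounts to. It is not the authors' pipeline: the function names, the synthetic sample, the mean of 40, the amplitude of 20, and the noise level are all illustrative assumptions. Only the injected direction $(l=130^\circ,\, b=+5^\circ)$ and the $|b|>15^\circ$ cut are borrowed from the abstract.

```python
import numpy as np

def unit_vectors(l_deg, b_deg):
    """Convert Galactic (l, b) in degrees to Cartesian unit vectors, shape (N, 3)."""
    l = np.radians(l_deg)
    b = np.radians(b_deg)
    return np.column_stack([np.cos(b) * np.cos(l), np.cos(b) * np.sin(l), np.sin(b)])

def fit_monopole_dipole(l_deg, b_deg, values):
    """Least-squares fit of values ~ mean + dipole . n(l, b)."""
    n = unit_vectors(l_deg, b_deg)
    design = np.column_stack([np.ones(len(values)), n])
    coeffs, *_ = np.linalg.lstsq(design, values, rcond=None)
    mean, dipole = coeffs[0], coeffs[1:]
    amplitude = np.linalg.norm(dipole)
    l_dir = np.degrees(np.arctan2(dipole[1], dipole[0])) % 360.0
    b_dir = np.degrees(np.arcsin(dipole[2] / amplitude))
    return mean, amplitude, l_dir, b_dir

# Synthetic, purely illustrative sample: directions uniform on the sphere,
# then the |b| > 15 deg latitude cut.
rng = np.random.default_rng(0)
l_obs = rng.uniform(0.0, 360.0, 2000)
b_obs = np.degrees(np.arcsin(rng.uniform(-1.0, 1.0, 2000)))
keep = np.abs(b_obs) > 15.0
l_obs, b_obs = l_obs[keep], b_obs[keep]

# Inject a dipole toward (l, b) = (130, +5) deg; mean, amplitude and noise are made up.
true_dir = unit_vectors(np.array([130.0]), np.array([5.0]))[0]
dm = 40.0 + 20.0 * (unit_vectors(l_obs, b_obs) @ true_dir) + rng.normal(0.0, 1.0, l_obs.size)

mean, amplitude, l_dir, b_dir = fit_monopole_dipole(l_obs, b_obs, dm)
print(f"mean={mean:.2f} amplitude={amplitude:.2f} direction=({l_dir:.1f}, {b_dir:.1f}) deg")
```

With real data the per-burst values would carry uncertainties and the fit would be weighted; the unweighted form is used here only to keep the example short.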

preprint, 2024, arXiv

Observations favor the redshift-evolutionary $L_X$-$L_{UV}$ relation of quasars from copula

We compare, with data from the quasars, the Hubble parameter measurements, and the Pantheon+ type Ia supernovae, three different relations between X-ray luminosity ($L_X$) and ultraviolet luminosity ($L_{UV}$) of quasars. These three relations consist of the standard and two redshift-evolutionary $L_X$-$L_{UV}$ relations which are constructed respectively by considering a redshift dependent correction to the luminosities of quasars and using the statistical tool called copula. By employing the PAge approximation for a cosmological-model-independent description of the cosmic background evolution and dividing the quasar data into the low-redshift and high-redshift parts, we find that the constraints on the PAge parameters from the low-redshift and high-redshift data, which are obtained with the redshift-evolutionary relations, are consistent with each other, while they are not when the standard relation is considered. If the data are used to constrain the coefficients of the relations and the PAge parameters simultaneously, then the observations support the redshift-evolutionary relations at more than $3\sigma$. The Akaike and Bayes information criteria indicate that there is strong evidence against the standard relation and mild evidence against the redshift-evolutionary relation constructed by considering a redshift dependent correction to the luminosities of quasars. This suggests that the redshift-evolutionary $L_X$-$L_{UV}$ relation of quasars constructed from copula is favored by the observations.
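
For readers unfamiliar with the information criteria mentioned above, here is a small sketch of how AIC and BIC differences are usually computed from a maximum log-likelihood, a parameter count, and a sample size. The model names, log-likelihood values, parameter counts, and sample size below are hypothetical placeholders, not numbers from the paper.

```python
import math

def aic(max_log_like, n_params):
    """Akaike information criterion: 2k - 2 ln L_max."""
    return 2 * n_params - 2 * max_log_like

def bic(max_log_like, n_params, n_data):
    """Bayesian information criterion: k ln n - 2 ln L_max."""
    return n_params * math.log(n_data) - 2 * max_log_like

def deltas(scores):
    """Differences relative to the best (lowest) score."""
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}

# Hypothetical fits: (maximum log-likelihood, number of free parameters).
n_data = 1000
fits = {"standard": (-520.0, 5), "evolving": (-505.0, 6)}

delta_aic = deltas({m: aic(ll, k) for m, (ll, k) in fits.items()})
delta_bic = deltas({m: bic(ll, k, n_data) for m, (ll, k) in fits.items()})
print(delta_aic)
print(delta_bic)
```

A larger difference counts more strongly against the model that has it; the thresholds used to label evidence as "mild" or "strong" vary between authors.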

preprint, 2022, arXiv

GLassoformer: A Query-Sparse Transformer for Post-Fault Power Grid Voltage Prediction

We propose GLassoformer, a novel and efficient transformer architecture leveraging group Lasso regularization to reduce the number of queries of the standard self-attention mechanism. Due to the sparsified queries, GLassoformer is more computationally efficient than the standard transformers. On the power grid post-fault voltage prediction task, GLassoformer shows remarkably better prediction than many existing benchmark algorithms in terms of accuracy and stability.
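
The abstract does not say how the group Lasso penalty is applied, so the following is only a sketch of the general mechanism. It assumes each query vector (one row of a query matrix `Q`) forms a group and applies the standard group soft-thresholding operator, which zeroes whole rows whose norm falls below a threshold. The function name, the matrix, and the threshold are illustrative.

```python
import numpy as np

def group_soft_threshold(Q, lam):
    """Proximal operator of lam * sum_i ||Q[i]||_2, applied row by row."""
    norms = np.linalg.norm(Q, axis=1, keepdims=True)
    # Rows with norm <= lam are scaled to exactly zero; others shrink toward zero.
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return Q * scale

# Three toy query vectors; the middle one has a small norm and is removed.
Q = np.array([[3.0, 4.0], [0.3, 0.4], [0.0, 2.0]])
Q_sparse = group_soft_threshold(Q, lam=1.0)
active = np.flatnonzero(np.linalg.norm(Q_sparse, axis=1) > 0)
print(Q_sparse)
print("active query rows:", active)
```

Attention then only needs to be evaluated for the surviving rows, which is where a reduction in cost would come from under this assumption.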

preprint, 2022, arXiv

Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization

Transformers have achieved remarkable success in sequence modeling and beyond but suffer from quadratic computational and memory complexities with respect to the length of the input sequence. Efficient transformers, which leverage techniques such as sparse and linear attention and hashing tricks, have been proposed to reduce this quadratic complexity, but they significantly degrade the accuracy. In response, we first interpret the linear attention and residual connections in computing the attention map as gradient descent steps. We then introduce momentum into these components and propose the \emph{momentum transformer}, which utilizes momentum to improve the accuracy of linear transformers while maintaining linear memory and computational complexities. Furthermore, we develop an adaptive strategy to compute the momentum value for our model based on the optimal momentum for quadratic optimization. This adaptive momentum eliminates the need to search for the optimal momentum value and further enhances the performance of the momentum transformer. A range of experiments on both autoregressive and non-autoregressive tasks, including image generation and machine translation, demonstrate that the momentum transformer outperforms popular linear transformers in training efficiency and accuracy.
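
As background for the linear attention mentioned above, the sketch below shows the kernel-style reordering that avoids forming the N-by-N attention matrix, and checks it against the quadratic computation. It covers only plain non-causal linear attention; the momentum terms proposed in the paper are not included. The feature map `elu(x) + 1` is a common choice and is an assumption here, as are all names and sizes.

```python
import numpy as np

def feature_map(x):
    """elu(x) + 1, which keeps all features strictly positive."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(Q, K, V):
    """Cost linear in sequence length: summarize keys and values first."""
    q, k = feature_map(Q), feature_map(K)
    kv = k.T @ V              # (d, d_v) summary of all key-value pairs
    z = k.sum(axis=0)         # (d,) normalizer summary
    return (q @ kv) / (q @ z)[:, None]

def quadratic_reference(Q, K, V):
    """Same result computed through the explicit N-by-N weight matrix."""
    q, k = feature_map(Q), feature_map(K)
    scores = q @ k.T
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(16, 4)) for _ in range(3))
out_linear = linear_attention(Q, K, V)
out_quadratic = quadratic_reference(Q, K, V)
print(np.max(np.abs(out_linear - out_quadratic)))
```

The two functions agree because matrix multiplication is associative; the linear form simply evaluates `k.T @ V` before multiplying by the queries.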

preprint, 2022, arXiv

Proximal Implicit ODE Solvers for Accelerating Learning Neural ODEs

Learning neural ODEs often requires solving very stiff ODE systems, primarily using explicit adaptive step size ODE solvers. These solvers are computationally expensive, requiring the use of tiny step sizes for numerical stability and accuracy guarantees. This paper considers learning neural ODEs using implicit ODE solvers of different orders leveraging proximal operators. The proximal implicit solver consists of inner-outer iterations: the inner iterations approximate each implicit update step using a fast optimization algorithm, and the outer iterations solve the ODE system over time. The proximal implicit ODE solver guarantees superiority over explicit solvers in numerical stability and computational efficiency. We validate the advantages of proximal implicit solvers over existing popular neural ODE solvers on various challenging benchmark tasks, including learning continuous-depth graph neural networks and continuous normalizing flows.
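
The inner-outer structure described above can be illustrated on a toy problem. The sketch below assumes the vector field is a negative gradient, x' = -grad F(x), in which case one backward Euler step equals a proximal step, argmin over z of F(z) + (z - x)^2 / (2h). Plain gradient descent is used for the inner solve; the abstract only says "a fast optimization algorithm", so that choice, the scalar stiff test problem, and all names are assumptions.

```python
def explicit_euler_step(x, h, grad_F):
    """One forward Euler step for x' = -grad_F(x)."""
    return x - h * grad_F(x)

def proximal_implicit_step(x, h, grad_F, lipschitz, inner_iters=50):
    """One backward Euler step, found by minimizing F(z) + (z - x)^2 / (2h)."""
    z = x
    step = 1.0 / (lipschitz + 1.0 / h)  # safe step size for the inner objective
    for _ in range(inner_iters):        # inner iterations: approximate the implicit update
        z -= step * (grad_F(z) + (z - x) / h)
    return z

# Stiff scalar test problem: F(x) = lam * x^2 / 2, so x' = -lam * x.
lam = 1000.0
grad_F = lambda x: lam * x
h = 0.01  # far above the explicit stability limit 2 / lam

x_explicit = x_implicit = 1.0
for _ in range(20):                     # outer iterations: march the ODE in time
    x_explicit = explicit_euler_step(x_explicit, h, grad_F)
    x_implicit = proximal_implicit_step(x_implicit, h, grad_F, lipschitz=lam)

print("explicit:", x_explicit)
print("implicit:", x_implicit)
```

At this step size the explicit iterate grows without bound while the implicit one decays, which is the stability gap that motivates implicit solvers for stiff systems.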

preprint, 2021, arXiv

A formula for symmetry recursion operators from non-variational symmetries of partial differential equations

An explicit formula to find symmetry recursion operators for partial differential equations (PDEs) is obtained from new results connecting variational integrating factors and non-variational symmetries. The formula is a special case of a general formula that produces a pre-symplectic operator from a non-gradient adjoint-symmetry. These formulas are illustrated by several examples of linear PDEs and integrable nonlinear PDEs. Additionally, a classification of quasilinear second-order PDEs admitting a multiplicative symmetry recursion operator through the first formula is presented.

preprint, 2021, arXiv

Efficient and Reliable Overlay Networks for Decentralized Federated Learning

We propose near-optimal overlay networks based on $d$-regular expander graphs to accelerate decentralized federated learning (DFL) and improve its generalization. In DFL a massive number of clients are connected by an overlay network, and they solve machine learning problems collaboratively without sharing raw data. Our overlay network design integrates spectral graph theory and the theoretical convergence and generalization bounds for DFL. As such, our proposed overlay networks accelerate convergence, improve generalization, and enhance robustness to client failures in DFL with theoretical guarantees. Also, we present an efficient algorithm to convert a given graph to a practical overlay network and maintain the network topology after potential client failures. We numerically verify the advantages of DFL with our proposed networks on various benchmark tasks, ranging from image classification to language modeling using hundreds of clients.
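
The spectral quantity that usually drives such designs is the spectral gap of the mixing matrix W = A / d of a d-regular graph: a larger gap means information spreads across clients in fewer rounds. The sketch below only computes that gap for two simple circulant graphs. Circulant graphs are not expanders and this is not the paper's construction; the graph sizes, offsets, and function names are illustrative assumptions.

```python
import numpy as np

def circulant_mixing_matrix(n, offsets):
    """Mixing matrix W = A / d for the graph linking node i to i +/- s for each offset s."""
    A = np.zeros((n, n))
    for i in range(n):
        for s in offsets:
            A[i, (i + s) % n] = 1.0
            A[i, (i - s) % n] = 1.0
    degree = A[0].sum()       # every node has the same degree (regular graph)
    return A / degree

def spectral_gap(W):
    """1 minus the second-largest eigenvalue of a symmetric mixing matrix."""
    eigenvalues = np.sort(np.linalg.eigvalsh(W))
    return 1.0 - eigenvalues[-2]

n = 64
gap_ring = spectral_gap(circulant_mixing_matrix(n, offsets=(1,)))       # 2-regular ring
gap_chords = spectral_gap(circulant_mixing_matrix(n, offsets=(1, 8)))   # 4-regular, with chords
print(f"ring gap: {gap_ring:.4f}, ring-with-chords gap: {gap_chords:.4f}")
```

Adding a few long-range links raises the gap substantially at a small cost in degree, which is the trade-off that expander-based overlays aim to optimize.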

preprint, 2021, arXiv

Stability and Generalization of the Decentralized Stochastic Gradient Descent

The stability and generalization of stochastic gradient-based methods provide valuable insights into understanding the algorithmic performance of machine learning models. As the main workhorse for deep learning, stochastic gradient descent has been studied extensively. Nevertheless, the community has paid little attention to its decentralized variants. In this paper, we provide a novel formulation of decentralized stochastic gradient descent. Leveraging this formulation together with (non)convex optimization theory, we establish the first stability and generalization guarantees for decentralized stochastic gradient descent. Our theoretical results are built on a few common and mild assumptions and reveal, for the first time, that decentralization deteriorates the stability of SGD. We verify our theoretical findings by using a variety of decentralized settings and benchmark machine learning models.
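
For orientation, the sketch below implements the commonly used gossip form of decentralized SGD, in which each node averages its parameters with its neighbors and then takes a local noisy gradient step. The abstract does not spell out the paper's own formulation, so this form, the ring topology, the quadratic local objectives, and every name and constant are assumptions made for illustration.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic matrix: each node averages itself and its two ring neighbors."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W

def decentralized_sgd(targets, W, lr=0.05, steps=500, noise=0.1, seed=0):
    """Node i minimizes 0.5 * (x - targets[i])^2 using noisy gradients."""
    rng = np.random.default_rng(seed)
    x = np.zeros_like(targets)
    for _ in range(steps):
        grads = (x - targets) + noise * rng.normal(size=x.shape)  # local stochastic gradients
        x = W @ x - lr * grads                                    # gossip average, then step
    return x

targets = np.arange(8, dtype=float)   # the global minimizer is the mean, 3.5
x_final = decentralized_sgd(targets, ring_mixing_matrix(8))
print("node values:", np.round(x_final, 3))
print("network average:", round(float(x_final.mean()), 3))
```

The network average tracks the global minimizer, while the individual nodes keep a residual disagreement that depends on the step size and the topology.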

preprint, 2020, arXiv

Adversarial Defense via Data Dependent Activation Function and Total Variation Minimization

We improve the robustness of deep neural nets (DNNs) to adversarial attacks by using an interpolating function as the output activation. This data-dependent activation remarkably improves both the generalization and robustness of DNNs. In the CIFAR10 benchmark, we raise the robust accuracy of the adversarially trained ResNet20 from $\sim 46\%$ to $\sim 69\%$ under the state-of-the-art Iterative Fast Gradient Sign Method (IFGSM) based adversarial attack. When we combine this data-dependent activation with total variation minimization on adversarial images and training data augmentation, we achieve an improvement in robust accuracy by 38.9$\%$ for ResNet56 under the strongest IFGSM attack. Furthermore, we provide an intuitive explanation of our defense by analyzing the geometry of the feature space.
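
The IFGSM attack referred to above repeatedly nudges the input along the sign of the loss gradient and projects back into an L-infinity ball around the original input. The sketch below shows that loop on a toy linear classifier with a logistic loss, standing in for a deep network. The classifier weights, the input, and the attack budget are made up, and the defense proposed in the paper is not implemented here.

```python
import numpy as np

def loss_grad_x(x, y, w):
    """Gradient with respect to x of the logistic loss log(1 + exp(-y * w.x)), y in {-1, +1}."""
    margin = y * (w @ x)
    return -y * w / (1.0 + np.exp(margin))

def ifgsm(x, y, w, eps=0.3, alpha=0.05, steps=10):
    """Iterative FGSM: signed gradient ascent on the loss within an eps-ball around x."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(loss_grad_x(x_adv, y, w))
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project back into the L-infinity ball
    return x_adv

w = np.array([1.0, -2.0, 0.5])    # toy linear classifier: predict sign(w.x)
x = np.array([0.2, -0.1, 0.4])    # clean input, correctly classified as +1
y = 1
x_adv = ifgsm(x, y, w)
print("clean score:", w @ x, "adversarial score:", w @ x_adv)
```

For image inputs an additional clip to the valid pixel range is normally applied after each step; it is omitted here because the toy input has no such range.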

preprint, 2020, arXiv

Reflections in the Sky: Joint Trajectory and Passive Beamforming Design for Secure UAV Networks with Reconfigurable Intelligent Surface

This paper investigates the problem of secure energy efficiency maximization for a reconfigurable intelligent surface (RIS) assisted uplink wireless communication system, where an unmanned aerial vehicle (UAV) equipped with an RIS works as a mobile relay between the base station (BS) and a group of users. We focus on maximizing the secure energy efficiency of the system by jointly optimizing the UAV's trajectory, the RIS's phase shift, user association, and transmit power. To tackle this problem, we divide the original problem into three sub-problems and propose an efficient iterative algorithm. In particular, the successive convex approximation (SCA) method is applied to solve the nonconvex UAV trajectory, RIS phase shift, and transmit power optimization sub-problems. We further provide two schemes to simplify the solution of the phase and trajectory sub-problems. Simulation results demonstrate that the proposed algorithm converges fast and that the proposed design can enhance the secure energy efficiency by up to 38\% compared to traditional schemes without any RIS.

preprint, 2020, arXiv

Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent

Stochastic gradient descent (SGD) with constant momentum and its variants such as Adam are the optimization algorithms of choice for training deep neural networks (DNNs). Since DNN training is incredibly computationally expensive, there is great interest in speeding up the convergence. Nesterov accelerated gradient (NAG) improves the convergence rate of gradient descent (GD) for convex optimization using a specially designed momentum; however, it accumulates error when an inexact gradient is used (such as in SGD), slowing convergence at best and diverging at worst. In this paper, we propose Scheduled Restart SGD (SRSGD), a new NAG-style scheme for training DNNs. SRSGD replaces the constant momentum in SGD by the increasing momentum in NAG but stabilizes the iterations by resetting the momentum to zero according to a schedule. Using a variety of models and benchmarks for image classification, we demonstrate that, in training DNNs, SRSGD significantly improves convergence and generalization; for instance in training ResNet200 for ImageNet classification, SRSGD achieves an error rate of 20.93% vs. the benchmark of 22.13%. These improvements become more significant as the network grows deeper. Furthermore, on both CIFAR and ImageNet, SRSGD reaches similar or even better error rates with significantly fewer training epochs compared to the SGD baseline.
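
The scheduled-restart idea can be sketched in a few lines: use a NAG-style momentum that grows with the iteration count, and reset the count to zero every fixed number of steps. The momentum schedule j / (j + 3) below is the standard NAG form and is assumed here rather than taken from the abstract; the function name, the toy quadratic objective, and the hyperparameters are also illustrative, and a deterministic gradient stands in for a minibatch gradient.

```python
def srsgd(grad, w0, lr, restart_every, steps):
    """NAG-style iteration whose momentum is reset to zero on a fixed schedule."""
    w = w0
    v_prev = w0
    for k in range(steps):
        v = w - lr * grad(w)          # gradient step (a minibatch gradient in practice)
        j = k % restart_every         # iterations since the last scheduled restart
        momentum = j / (j + 3)        # increasing momentum; equals 0 right at a restart
        w = v + momentum * (v - v_prev)
        v_prev = v
    return w

# Toy objective f(w) = 0.5 * w^2, whose gradient is w and whose minimizer is 0.
w_final = srsgd(grad=lambda w: w, w0=5.0, lr=0.1, restart_every=20, steps=200)
print(w_final)
```

Resetting the counter discards accumulated momentum, which is the mechanism the abstract credits with stabilizing the iterations when gradients are inexact.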

preprint, 2020, arXiv

Sparsity Meets Robustness: Channel Pruning for the Feynman-Kac Formalism Principled Robust Deep Neural Nets

Deep neural net (DNN) compression is crucial for adaptation to mobile devices. Though many successful algorithms exist to compress naturally trained DNNs, developing efficient and stable compression algorithms for robustly trained DNNs remains widely open. In this paper, we focus on a co-design of efficient DNN compression algorithms and sparse neural architectures for robust and accurate deep learning. Such a co-design enables us to advance the goal of accommodating both sparsity and robustness. With this objective in mind, we leverage relaxed augmented Lagrangian based algorithms to prune the weights of adversarially trained DNNs, at both structured and unstructured levels. Using Feynman-Kac formalism principled robust and sparse DNNs, we can at least double the channel sparsity of the adversarially trained ResNet20 for CIFAR10 classification while improving the natural accuracy by $8.69$\% and the robust accuracy under the benchmark $20$ iterations of IFGSM attack by $5.42$\%. The code is available at \url{https://github.com/BaoWangMath/rvsm-rgsm-admm}.