Researcher profile

Shi Pu

Shi Pu contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
18 works
0 followers
13 topics
4 close collaborators

Published work

18 published item(s)

preprint · 2026 · arXiv

Accelerating Decentralized Optimization via Overlapping Local Steps

Decentralized optimization has emerged as a critical paradigm for distributed learning, enabling scalable training while preserving data privacy through peer-to-peer collaboration. However, existing methods often suffer from communication bottlenecks due to frequent synchronization between nodes. We present Overlapping Local Decentralized SGD (OLDSGD), a novel approach to accelerate decentralized training by computation-communication overlapping, significantly reducing network idle time. With a deliberately designed update, OLDSGD preserves the same average update as Local SGD while avoiding communication-induced stalls. Theoretically, we establish non-asymptotic convergence rates for smooth non-convex objectives, showing that OLDSGD retains the same iteration complexity as standard Local Decentralized SGD while improving per-iteration runtime. Empirical results demonstrate OLDSGD's consistent improvements in wall-clock time convergence under different levels of communication delays. With minimal modifications to existing frameworks, OLDSGD offers a practical solution for faster decentralized learning without sacrificing theoretical guarantees.

preprint · 2022 · arXiv

A Compressed Gradient Tracking Method for Decentralized Optimization with Linear Convergence

Communication compression techniques are of growing interest for solving the decentralized optimization problem under limited communication, where the global objective is to minimize the average of local cost functions over a multi-agent network using only local computation and peer-to-peer communication. In this paper, we propose a novel compressed gradient tracking algorithm (C-GT) that combines the gradient tracking technique with communication compression. In particular, C-GT is compatible with a general class of compression operators that unifies both unbiased and biased compressors. We show that C-GT inherits the advantages of gradient tracking-based algorithms and achieves a linear convergence rate for strongly convex and smooth objective functions. Numerical examples complement the theoretical findings and demonstrate the efficiency and flexibility of the proposed algorithm.

preprint · 2022 · arXiv

Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification

Most methods tackle zero-shot video classification by aligning visual-semantic representations within seen classes, which limits generalization to unseen classes. To enhance model generalizability, this paper presents an end-to-end framework that preserves alignment and uniformity properties for representations on both seen and unseen classes. Specifically, we formulate a supervised contrastive loss to simultaneously align visual-semantic features (i.e., alignment) and encourage the learned features to distribute uniformly (i.e., uniformity). Unlike existing methods that only consider the alignment, we propose uniformity to preserve maximal-info of existing features, which improves the probability that unobserved features fall around observed data. Further, we synthesize features of unseen classes by proposing a class generator that interpolates and extrapolates the features of seen classes. Besides, we introduce two metrics, closeness and dispersion, to quantify the two properties and serve as new measurements of model generalizability. Experiments show that our method significantly outperforms SoTA by relative improvements of 28.1% on UCF101 and 27.0% on HMDB51. Code is available.

preprint · 2022 · arXiv

Berry phase in the phase space worldline representation: the axial anomaly and classical kinetic theory

The Berry phase is analyzed for Weyl and Dirac fermions in a phase space representation of the worldline formalism. Kinetic theories are constructed for both at a classical level. Whereas the Weyl fermion case reduces in dimension, resembling a theory in quantum mechanics, the Dirac fermion case takes on a manifestly Lorentz covariant form. To achieve a classical kinetic theory for the non-Abelian Dirac fermion Berry phase a spinor construction of Barut and Zanghi is utilized. The axial anomaly is also studied at a quantum level. It is found that under an adiabatic approximation, which is necessary for facilitating a classical kinetic theory, the index of the Dirac operator for massless fermions vanishes. Even so, similarities of an axial rotation to an exact non-covariant Berry phase transform are drawn by application of the Fujikawa method to the Barut and Zanghi spinors on the worldline.

preprint · 2022 · arXiv

Hydrodynamic helicity polarization in relativistic heavy ion collisions

We study helicity polarization through the (3+1)-dimensional relativistic viscous hydrodynamic models at $\sqrt{s_{NN}}=200$ GeV Au+Au collisions. Similar to the local spin polarization, we consider the helicity polarization beyond global equilibrium and investigate the contributions induced by thermal vorticity, shear viscous tensor, and the fluid acceleration. We find that the local helicity polarization induced by thermal vorticity dominates over other contributions. It also implies that in low-energy collisions, the fluid vorticity as part of thermal vorticity may play a crucial role in the total helicity polarization. Such a finding could be useful for probing the local strength of vorticity in rotational quark gluon plasmas by measuring helicity polarization. Our simulation confirms the strict space reversal symmetry, whereas we also compare our numerical results with approximated relations derived from ideal Bjorken flow. Our studies also provide a baseline for the future investigation on local parity violation through the correlations of helicity polarization.

preprint · 2022 · arXiv

Lepton pair photoproduction in peripheral relativistic heavy-ion collisions

We study the lepton pair photoproduction in peripheral heavy-ion collisions based on the formalism in our previous work [Phys. Rev. D 104, 056011 (2021)]. We present the numerical results for the distributions of the transverse momentum, azimuthal angle and invariant mass for $e^{+}e^{-}$ and $μ^{+}μ^{-}$ pairs as functions of the impact parameter and other kinematic variables in Au+Au collisions. Our calculation incorporates the information on the transverse momentum and polarization of photons which is essential to describe the experimental data. We observe a broadening effect in the transverse momentum for lepton pairs with and without smear effects. We also observe a significant enhancement in the distribution of $\cos(2φ)$ for $μ^{+}μ^{-}$ pairs. Our results provide a baseline for future studies of other higher order corrections beyond Born approximation and medium effects in the lepton pair production.

preprint · 2022 · arXiv

Local and global polarization of $Λ$ hyperons across RHIC-BES energies: the roles of spin hall effect, initial condition and baryon diffusion

We perform a systematic study on the local and global spin polarization of $Λ$ and $\overline{Λ}$ hyperons in relativistic heavy-ion collisions at beam energy scan energies via the (3+1)-dimensional CLVisc hydrodynamics model with AMPT and SMASH initial conditions. Following the quantum kinetic theory, we decompose the polarization vector into the parts induced by thermal vorticity, shear tensor and the spin Hall effect (SHE). We find that the polarization induced by SHE and the total polarization strongly depend on the initial conditions. At $7.7$ GeV, SHE gives a sizeable contribution and even flips the sign of the local polarization along the beam direction for AMPT initial condition, which is not observed for SMASH initial condition. Meanwhile, the local polarization along the out-of-plane direction induced by SHE with AMPT initial condition does not always increase with decreasing collision energies. Next, we find that the polarization along the beam direction is sensitive to the baryon diffusion coefficient, but the local polarization along the out-of-plane direction is not. Our results for the global polarization of $Λ$ and $\overline{Λ}$ agree well with the STAR data. Interestingly, the global polarization of $\overline{Λ}$ is not always larger than that of $Λ$ due to various competing effects. Our findings are helpful for understanding the polarization phenomenon and the detailed structure of quark-gluon plasma in relativistic heavy-ion collisions.

preprint · 2022 · arXiv

Quantum kinetic theory for dynamical spin polarization from QED-type interaction

We investigate the dynamical spin polarization of a massless electron probing an electron plasma in local thermal equilibrium via Møller scattering from the quantum kinetic theory. We derive an axial kinetic equation delineating the dynamical spin evolution in the presence of the collision term with quantum corrections up to $\mathcal{O}(\hbar)$ and the leading-logarithmic order in coupling by using the hard-thermal-loop (HTL) approximation, from which we extract the spin-polarization rate induced by the spacetime gradients of the medium. When the electron probe approaches local equilibrium, we further simplify the collision term into a relaxation-time expression. Our kinetic equation may be implemented in future numerical simulations for dynamical spin polarization.

preprint · 2021 · arXiv

A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent

This paper is concerned with minimizing the average of $n$ cost functions over a network in which agents may communicate and exchange information with each other. We consider the setting where only noisy gradient information is available. To solve the problem, we study the distributed stochastic gradient descent (DSGD) method and perform a non-asymptotic convergence analysis. For strongly convex and smooth objective functions, DSGD asymptotically achieves the optimal network independent convergence rate compared to centralized stochastic gradient descent (SGD). Our main contribution is to characterize the transient time needed for DSGD to approach the asymptotic convergence rate, which we show behaves as $K_T=\mathcal{O}\left(\frac{n}{(1-ρ_w)^2}\right)$, where $1-ρ_w$ denotes the spectral gap of the mixing matrix. Moreover, we construct a "hard" optimization problem for which we show the transient time needed for DSGD to approach the asymptotic convergence rate is lower bounded by $Ω\left(\frac{n}{(1-ρ_w)^2} \right)$, implying the sharpness of the obtained result. Numerical experiments demonstrate the tightness of the theoretical results.
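The DSGD recursion discussed in this abstract can be sketched as follows. This is a minimal illustration rather than the paper's experimental setup: it uses one common form of the update, in which each agent takes a local (possibly noisy) gradient step and then averages with its neighbours. The scalar quadratic costs f_i(x) = (x - b_i)^2 / 2, the four-node ring, the mixing weights, the step size 1/(k + offset), and the names `dsgd` and `RING_W` are all assumptions made for the example.

```python
import random

# Hypothetical doubly stochastic mixing matrix for a 4-node ring:
# each agent keeps weight 1/2 and gives 1/4 to each of its two neighbours.
RING_W = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]


def dsgd(b, W, steps=2000, offset=10, noise_std=0.0, seed=0):
    """Run a DSGD-style iteration on local costs f_i(x) = 0.5 * (x - b[i])**2.

    The minimiser of the average cost is mean(b). Gradient noise is
    zero-mean Gaussian with standard deviation noise_std (assumed model).
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for k in range(steps):
        alpha = 1.0 / (k + offset)  # diminishing step size (assumed schedule)
        # Local step: each agent moves along its own noisy gradient.
        z = [
            x[i] - alpha * ((x[i] - b[i]) + rng.gauss(0.0, noise_std))
            for i in range(n)
        ]
        # Communication step: weighted average over neighbours.
        x = [sum(W[i][j] * z[j] for j in range(n)) for i in range(n)]
    return x


if __name__ == "__main__":
    # Noise-free run: every agent should end up near mean(b) = 2.5.
    print(dsgd([1.0, 2.0, 3.0, 4.0], RING_W))
    # Noisy run: iterates fluctuate around the same point.
    print(dsgd([1.0, 2.0, 3.0, 4.0], RING_W, noise_std=0.5))
```

For this particular `RING_W` the eigenvalues are 1, 0.5, 0 and 0.5, so the spectral gap $1-ρ_w$ in the abstract's notation would be 0.5. Denser or sparser topologies change that gap and, by the stated bound, the transient time.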

preprint · 2021 · arXiv

Analytic solutions of relativistic dissipative spin hydrodynamics with Bjorken expansion

We have studied analytically the longitudinally boost-invariant motion of a relativistic dissipative fluid with spin. We have derived the analytic solutions of spin density and spin chemical potential as a function of proper time $τ$ in the presence of viscous tensor and the second order relaxation time corrections for spin. Interestingly, analogous to the ordinary particle number density and chemical potential, we find that the spin density and spin chemical potential decay as $\sim τ^{-1}$ and $\sim τ^{-1/3}$, respectively. It implies that the initial spin density may not survive at the freezeout hyper-surface. These solutions can serve both to gain insight on the dynamics of spin polarization in relativistic heavy-ion collisions and as testbeds for further numerical codes.

preprint · 2020 · arXiv

A general framework for decentralized optimization with first-order methods

Decentralized optimization to minimize a finite sum of functions over a network of nodes has been a significant focus within control and signal processing research due to its natural relevance to optimal control and signal estimation problems. More recently, the emergence of sophisticated computing and large-scale data science needs has led to a resurgence of activity in this area. In this article, we discuss decentralized first-order gradient methods, which have found tremendous success in control, signal processing, and machine learning problems, where such methods, due to their simplicity, serve as the first method of choice for many complex inference and training tasks. In particular, we provide a general framework of decentralized first-order methods that is applicable to undirected and directed communication networks alike, and show that much of the existing work on optimization and consensus can be related explicitly to this framework. We further extend the discussion to decentralized stochastic first-order methods that rely on stochastic gradients at each node and describe how local variance reduction schemes, previously shown to have promise in the centralized settings, are able to improve the performance of decentralized methods when combined with what is known as gradient tracking. We motivate and demonstrate the effectiveness of the corresponding methods in the context of machine learning and signal processing problems that arise in decentralized environments.

preprint · 2020 · arXiv

A Robust Gradient Tracking Method for Distributed Optimization over Directed Networks

In this paper, we consider the problem of distributed consensus optimization over multi-agent networks with directed network topology. Assuming each agent has a local cost function that is smooth and strongly convex, the global objective is to minimize the average of all the local cost functions. To solve the problem, we introduce a robust gradient tracking method (R-Push-Pull) adapted from the recently proposed Push-Pull/AB algorithm. R-Push-Pull inherits the advantages of Push-Pull and enjoys linear convergence to the optimal solution with exact communication. Under noisy information exchange, R-Push-Pull is more robust than the existing gradient tracking based algorithms; the solutions obtained by each agent reach a neighborhood of the optimum in expectation exponentially fast under a constant stepsize policy. We provide a numerical example that demonstrates the effectiveness of R-Push-Pull.

preprint · 2020 · arXiv

Anomalous magnetohydrodynamics with constant anisotropic electric conductivities

We study anomalous magnetohydrodynamics in a longitudinal boost invariant Bjorken flow with constant anisotropic electric conductivities as outlined in Ref. [1]. For simplicity, we consider a neutral fluid and a force-free magnetic field in the transverse direction. We derived analytic solutions of the electromagnetic fields in the laboratory frame, the chiral density, and the energy density as functions of proper time.

preprint · 2020 · arXiv

Asymptotic Network Independence in Distributed Stochastic Optimization for Machine Learning

We provide a discussion of several recent results which, in certain scenarios, are able to overcome a barrier in distributed stochastic optimization for machine learning. Our focus is the so-called asymptotic network independence property, which is achieved whenever a distributed method executed over a network of n nodes asymptotically converges to the optimal solution at a comparable rate to a centralized method with the same computational power as the entire network. We explain this property through an example involving the training of ML models and sketch a short mathematical analysis for comparing the performance of distributed stochastic gradient descent (DSGD) with centralized stochastic gradient descent (SGD).

preprint · 2020 · arXiv

Distributed Stochastic Gradient Tracking Methods

In this paper, we study the problem of distributed multi-agent optimization over a network, where each agent possesses a local cost function that is smooth and strongly convex. The global objective is to find a common solution that minimizes the average of all cost functions. Assuming agents only have access to unbiased estimates of the gradients of their local cost functions, we consider a distributed stochastic gradient tracking method (DSGT) and a gossip-like stochastic gradient tracking method (GSGT). We show that, in expectation, the iterates generated by each agent are attracted to a neighborhood of the optimal solution, where they accumulate exponentially fast (under a constant stepsize choice). Under DSGT, the limiting (expected) error bounds on the distance of the iterates from the optimal solution decrease with the network size $n$, which is a comparable performance to a centralized stochastic gradient algorithm. Moreover, we show that when the network is well-connected, GSGT incurs lower communication cost than DSGT while maintaining a similar computational cost. A numerical example further demonstrates the effectiveness of the proposed methods.
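The gradient tracking idea behind DSGT can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: it uses one common form of the recursion, in which each agent keeps an iterate x_i and an auxiliary variable y_i that tracks the network-average gradient. The scalar quadratic costs f_i(x) = (x - b_i)^2 / 2, the four-node ring, the constant step size, and the names `gradient_tracking`, `mix` and `RING_W` are hypothetical choices for the example.

```python
import random

# Hypothetical doubly stochastic mixing matrix for a 4-node ring.
RING_W = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]


def mix(W, v):
    """One round of neighbour averaging: returns the matrix-vector product W v."""
    n = len(v)
    return [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]


def gradient_tracking(b, W, alpha=0.1, steps=300, noise_std=0.0, seed=0):
    """Gradient tracking on local costs f_i(x) = 0.5 * (x - b[i])**2.

    x[i] is agent i's iterate; y[i] tracks the average gradient.
    Noise is zero-mean Gaussian with std noise_std (assumed model).
    """
    rng = random.Random(seed)
    n = len(b)

    def sample_grads(x):
        # Unbiased gradient estimates of the local quadratic costs.
        return [(x[i] - b[i]) + rng.gauss(0.0, noise_std) for i in range(n)]

    x = [0.0] * n
    g = sample_grads(x)
    y = list(g)  # tracker initialised with the first gradient samples
    for _ in range(steps):
        # Step along the tracked direction, then average with neighbours.
        x = mix(W, [x[i] - alpha * y[i] for i in range(n)])
        g_new = sample_grads(x)
        # Average the trackers and add the change in local gradients.
        wy = mix(W, y)
        y = [wy[i] + g_new[i] - g[i] for i in range(n)]
        g = g_new
    return x


if __name__ == "__main__":
    # Noise-free run: all agents approach mean(b) = 2.5 with a constant step.
    print(gradient_tracking([1.0, 2.0, 3.0, 4.0], RING_W))
    # Noisy run: iterates settle in a neighbourhood of 2.5 instead.
    print(gradient_tracking([1.0, 2.0, 3.0, 4.0], RING_W, noise_std=0.5))
```

With exact gradients the constant step size still drives every agent to the common minimiser, whereas with noisy gradients the iterates only reach a neighbourhood of it, which matches the qualitative behaviour the abstract describes.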

preprint · 2020 · arXiv

Push-Pull Gradient Methods for Distributed Optimization in Networks

In this paper, we focus on solving a distributed convex optimization problem in a network, where each agent has its own convex cost function and the goal is to minimize the sum of the agents' cost functions while obeying the network connectivity structure. In order to minimize the sum of the cost functions, we consider new distributed gradient-based methods where each node maintains two estimates, namely, an estimate of the optimal decision variable and an estimate of the gradient for the average of the agents' objective functions. From the viewpoint of an agent, the information about the gradients is pushed to the neighbors, while the information about the decision variable is pulled from the neighbors hence giving the name "push-pull gradient methods". The methods utilize two different graphs for the information exchange among agents, and as such, unify the algorithms with different types of distributed architecture, including decentralized (peer-to-peer), centralized (master-slave), and semi-centralized (leader-follower) architecture. We show that the proposed algorithms and their many variants converge linearly for strongly convex and smooth objective functions over a network (possibly with unidirectional data links) in both synchronous and asynchronous random-gossip settings. In particular, under the random-gossip setting, "push-pull" is the first class of algorithms for distributed optimization over directed graphs. Moreover, we numerically evaluate our proposed algorithms in both scenarios, and show that they outperform other existing linearly convergent schemes, especially for ill-conditioned problems and networks that are not well balanced.

preprint · 2020 · arXiv

Recent developments in chiral and spin polarization effects in heavy-ion collisions

We give a brief overview of recent theoretical and experimental results on the chiral magnetic effect and spin polarization effect in heavy-ion collisions. We present updated experimental results for the chiral magnetic effect and related phenomena. The time evolution of the magnetic fields in different models is discussed. The newly developed quantum kinetic theory for massive fermions is reviewed. We present theoretical and experimental results for the polarization of $Λ$ hyperons and the $ρ_{00}$ value of vector mesons.

preprint · 2020 · arXiv

Relativistic Kelvin circulation theorem for ideal Magnetohydrodynamics

We have studied the relativistic Kelvin circulation theorem for ideal magnetohydrodynamics. The relativistic Kelvin circulation theorem is a conservation equation for the so-called $T$-vorticity. We have briefly reviewed ideal magnetohydrodynamics in relativistic heavy ion collisions. The highlight of this work is that we have obtained the general expression of the relativistic Kelvin circulation theorem for ideal magnetohydrodynamics. We have also applied the analytic solutions of ideal magnetohydrodynamics in Bjorken flow to check our results. Our main results can also be applied to relativistic magnetohydrodynamics in relativistic heavy ion collisions.