Source author record

Edoardo di Napoli

Edoardo di Napoli appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

physics.comp-ph; Mathematical Software; Distributed, Parallel, and Cluster Computing; cond-mat.mtrl-sci; Performance; Computational Engineering, Finance, and Science; Numerical Analysis; Data Structures and Algorithms; math.NA; cond-mat.dis-nn; cond-mat.str-el; Discrete Mathematics; hep-th; math-ph; math.MP

Catalog footprint

What is connected

18 works

15 topics

4 close collaborators


preprint, 2022, arXiv

ChASE -- A Distributed Hybrid CPU-GPU Eigensolver for Large-scale Hermitian Eigenvalue Problems

As modern massively parallel clusters grow larger and gain more powerful compute nodes, traditional parallel eigensolvers, such as direct solvers, struggle to keep pace with hardware evolution and to scale efficiently because of additional layers of communication and synchronization. This difficulty is especially pronounced when porting traditional libraries to heterogeneous computing architectures equipped with accelerators, such as Graphics Processing Units (GPUs). Recently, there have been significant scientific contributions to the development of filter-based subspace eigensolvers that compute a partial eigenspectrum. The simpler structure of this type of algorithm makes it easier to avoid the communication and synchronization bottlenecks typical of direct solvers. The Chebyshev Accelerated Subspace Eigensolver (ChASE) is a modern subspace eigensolver that computes partial extremal eigenpairs of large-scale Hermitian eigenproblems, accelerated by a filter based on Chebyshev polynomials. In this work, we extend our previous work on ChASE by adding support for distributed hybrid CPU-multi-GPU computing architectures. Our tests show that ChASE achieves very good scaling performance up to 144 nodes with 526 NVIDIA A100 GPUs in total on dense eigenproblems of size up to $360$k.
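The combination of a Chebyshev polynomial filter with subspace iteration can be illustrated with a minimal single-node NumPy sketch. This is not the ChASE implementation: the function names, the `extra` search-space padding, the fixed filter degree, and the way the damped interval is estimated from the largest Ritz value are all assumptions made for illustration, and nothing here is distributed or GPU-accelerated.

```python
import numpy as np


def chebyshev_filter(A, V, degree, lower, upper):
    """Apply the degree-`degree` Chebyshev polynomial of the shifted, scaled A to V.

    The interval [lower, upper] is mapped to [-1, 1], where the polynomial
    stays bounded by 1, so components belonging to eigenvalues below
    `lower` are amplified relative to the rest.
    """
    center = (upper + lower) / 2.0
    half_width = (upper - lower) / 2.0
    prev = V
    curr = (A @ V - center * V) / half_width
    for _ in range(2, degree + 1):
        # Three-term recurrence T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x).
        prev, curr = curr, 2.0 * (A @ curr - center * curr) / half_width - prev
    return curr


def lowest_eigenpairs(A, nev, extra=5, degree=12, tol=1e-8, max_iter=100, seed=0):
    """Approximate the `nev` smallest eigenpairs of a Hermitian matrix A."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    V, _ = np.linalg.qr(rng.standard_normal((n, nev + extra)))
    # The spectral norm is a safe (if expensive) upper bound on the spectrum.
    upper = np.linalg.norm(A, 2)
    for _ in range(max_iter):
        # Rayleigh-Ritz: project A onto the search space and diagonalize.
        theta, Y = np.linalg.eigh(V.conj().T @ A @ V)
        V = V @ Y
        residuals = np.linalg.norm(A @ V[:, :nev] - V[:, :nev] * theta[:nev], axis=0)
        if residuals.max() < tol:
            return theta[:nev], V[:, :nev]
        # Damp everything above the largest Ritz value, then re-orthonormalize.
        V = chebyshev_filter(A, V, degree, lower=theta[-1], upper=upper)
        V, _ = np.linalg.qr(V)
    raise RuntimeError("subspace iteration did not converge")
```

Passing eigenvectors from a previous, related problem as the starting block instead of random vectors is the natural extension for sequences of eigenproblems; the sketch always starts from a random block to stay short.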

preprint, 2020, arXiv

A Long-Range Ising Model of a Barabási-Albert Network

Networks that have power-law connectivity, commonly referred to as scale-free networks, are an important class of complex networks. A heterogeneous mean-field approximation has been previously proposed for the Ising model of the Barabási-Albert model of scale-free networks with classical spins on the nodes, wherein it was shown that the critical temperature for such a system scales logarithmically with network size. For finite sizes, there is no criticality for such a system and hence no true phase transition in terms of singular behavior. Further, in the thermodynamic limit, the mean-field prediction of an infinite critical temperature for the system may exclude any true phase transition even then. Nevertheless, with an eye on potential applications of the model to biological systems that are generally finite, one may still try to find approximations that describe the relevant observables quantitatively. Here we present an alternative, approximate formulation for the description of the Ising model of a Barabási-Albert network. Using the classical definition of magnetization, we show that Ising models on a network can be well approximated by a long-range interacting homogeneous Ising model wherein each node of the network couples to all other spins with a strength determined by the mean degree of the Barabási-Albert network. In such an effective long-range Ising model of a Barabási-Albert network, the critical temperature is directly proportional to the number of preferentially attached links added to grow the network. The proposed model describes the magnetization of the majority of the sites with average or smaller-than-average degree better than the heterogeneous mean-field approximation. The long-range Ising model is the only homogeneous description of Barabási-Albert networks that we know of.

preprint, 2020, arXiv

Solution to the modified Helmholtz equation for arbitrary periodic charge densities

We present a general method for solving the modified Helmholtz equation without shape approximation for an arbitrary periodic charge distribution, whose solution is known as the Yukawa potential or the screened Coulomb potential. The method is an extension of Weinert's pseudo-charge method [M. Weinert, J. Math. Phys. 22, 2433 (1981)] for solving the Poisson equation for the same class of charge density distributions. The inherent differences between the Poisson and the modified Helmholtz equation are in their respective radial solutions. These are polynomial functions, for the Poisson equation, and modified spherical Bessel functions, for the modified Helmholtz equation. This leads to a definition of a modified pseudo-charge density and modified multipole moments. We have shown that Weinert's convergence analysis of an absolutely and uniformly convergent Fourier series of the pseudo-charge density is transferred to the modified pseudo-charge density. We conclude by illustrating the algorithmic changes necessary to turn an available implementation of the Poisson solver into a solver for the modified Helmholtz equation.

preprint, 2020, arXiv

The LAPW method with eigendecomposition based on the Hari--Zimmermann generalized hyperbolic SVD

In this paper we propose an accurate, highly parallel algorithm for the generalized eigendecomposition of a matrix pair $(H, S)$, given in a factored form $(F^{\ast} J F, G^{\ast} G)$. Matrices $H$ and $S$ are generally complex and Hermitian, and $S$ is positive definite. Matrices of this type emerge from the representation of the Hamiltonian of a quantum mechanical system in terms of an overcomplete set of basis functions. This expansion is part of a class of models within the broad field of Density Functional Theory, which is considered the gold standard in condensed matter physics. The overall algorithm consists of four phases, the second and the fourth being optional, where the last two phases compute the generalized hyperbolic SVD of a complex matrix pair $(F,G)$, according to a given matrix $J$ defining the hyperbolic scalar product. If $J = I$, then these two phases compute the GSVD in parallel very accurately and efficiently.

preprint, 2017, arXiv

Accelerating the computation of FLAPW methods on heterogeneous architectures

Legacy codes in computational science and engineering have been very successful in providing essential functionality to researchers. However, they are not capable of exploiting the massive parallelism provided by emerging heterogeneous architectures. The lack of portable performance and scalability puts them at high risk: either they evolve or they are doomed to disappear. One example of legacy code which would heavily benefit from a modern design is FLEUR, a software package for electronic structure calculations. In previous work, the computational bottleneck of FLEUR was partially re-engineered to have a modular design that relies on standard building blocks, namely BLAS and LAPACK. In this paper, we demonstrate how the initial redesign enables the portability to heterogeneous architectures. More specifically, we study different approaches to port the code to architectures consisting of multi-core CPUs equipped with one or more coprocessors such as Nvidia GPUs and Intel Xeon Phis. Our final code attains over 70% of the architectures' peak performance, and outperforms Nvidia's and Intel's libraries. Finally, on JURECA, the supercomputer where FLEUR is often executed, the code takes advantage of the full power of the computing nodes, attaining $5\times$ speedup over the sole use of the CPUs.

preprint, 2016, arXiv

Hybrid CPU-GPU generation of the Hamiltonian and Overlap matrices in FLAPW methods

In this paper we focus on the integration of high-performance numerical libraries in ab initio codes and the portability of performance and scalability. The target of our work is FLEUR, a software package for electronic structure calculations developed at the Forschungszentrum Jülich over the course of two decades. The presented work follows up on a previous effort to modernize legacy code by re-engineering and rewriting it in terms of highly optimized libraries. We illustrate how this initial effort to get efficient and portable shared-memory code enables fast porting of the code to emerging heterogeneous architectures. More specifically, we port the code to nodes equipped with multiple GPUs. We divide our study into two parts. First, we show considerable speedups attained by minor and relatively straightforward code changes to off-load parts of the computation to the GPUs. Then, we identify further possible improvements to achieve even higher performance and scalability. On a system consisting of 16 cores and 2 GPUs, we observe speedups of up to 5x with respect to our optimized shared-memory code, which in turn means between 7.5x and 12.5x speedup with respect to the original FLEUR code.

preprint, 2016, arXiv

Methodology for determining the electronic thermal conductivity of metals via direct non-equilibrium ab initio molecular dynamics

Many physical properties of metals can be understood in terms of the free electron model, as proven by the Wiedemann-Franz law. According to this model, electronic thermal conductivity ($κ_{el}$) can be inferred from the Boltzmann transport equation (BTE). However, the BTE does not perform well for some complex metals, such as Cu. Moreover, the BTE cannot clearly describe the origin of the thermal energy carried by electrons or how this energy is transported in metals. The charge distribution of conduction electrons in metals is known to reflect the electrostatic potential (EP) of the ion cores. Based on this premise, we develop a new methodology for evaluating $κ_{el}$ by combining the free electron model and non-equilibrium ab initio molecular dynamics (NEAIMD) simulations. We demonstrate that the kinetic energy of thermally excited electrons originates from the energy of the spatial electrostatic potential oscillation (EPO), which is induced by the thermal motion of ion cores. This method directly predicts the $κ_{el}$ of pure metals with a high degree of accuracy.

preprint, 2016, arXiv

Parallel adaptive integration in high-performance functional Renormalization Group computations

The conceptual framework provided by the functional Renormalization Group (fRG) has become a formidable tool to study correlated electron systems on lattices which, in turn, has provided great insights into complex many-body phenomena, such as high-temperature superconductivity or topological states of matter. In this work we present one of the latest realizations of fRG which makes use of an adaptive numerical quadrature scheme specifically tailored to the described fRG scheme. The final result is an increase in performance thanks to improved parallelism and scalability.

preprint, 2014, arXiv

An Optimized and Scalable Eigensolver for Sequences of Eigenvalue Problems

In many scientific applications the solution of non-linear differential equations is obtained through the set-up and solution of a number of successive eigenproblems. These eigenproblems can be regarded as a sequence whenever the solution of one problem fosters the initialization of the next. In addition, in some eigenproblem sequences there is a connection between the solutions of adjacent eigenproblems. Whenever it is possible to unravel the existence of such a connection, the eigenproblem sequence is said to be correlated. When faced with a sequence of correlated eigenproblems, the current strategy amounts to solving each eigenproblem in isolation. We propose an alternative approach which exploits such correlation through the use of an eigensolver based on subspace iteration and accelerated with Chebyshev polynomials (ChFSI). The resulting eigensolver is optimized by minimizing the number of matrix-vector multiplications and parallelized using the Elemental library framework. Numerical results show that ChFSI achieves excellent scalability and is competitive with current dense linear algebra parallel eigensolvers.

preprint, 2014, arXiv

Efficient estimation of eigenvalue counts in an interval

Estimating the number of eigenvalues located in a given interval of a large sparse Hermitian matrix is an important problem in certain applications and it is a prerequisite of eigensolvers based on a divide-and-conquer paradigm. Often an exact count is not necessary and methods based on stochastic estimates can be utilized to yield rough approximations. This paper examines a number of techniques tailored to this specific task. It reviews standard approaches and explores new ones based on polynomial and rational approximation filtering combined with a stochastic procedure.
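One way to picture "polynomial filtering combined with a stochastic procedure" is the following NumPy sketch: the indicator function of the interval is expanded in Chebyshev polynomials, and the trace of the resulting matrix polynomial is estimated with random sign vectors (a Hutchinson-type estimator). It is an illustration under stated assumptions, not the paper's code: the function names, the Jackson damping, the default degree and probe count, and the requirement that the caller supplies spectral bounds `lam_min` and `lam_max` are all choices made here.

```python
import numpy as np


def jackson_coefficients(degree):
    """Jackson damping factors g_0..g_degree that reduce Gibbs oscillations."""
    n_moments = degree + 1
    k = np.arange(n_moments)
    alpha = np.pi / (n_moments + 1)
    return ((n_moments - k + 1) * np.cos(alpha * k)
            + np.sin(alpha * k) / np.tan(alpha)) / (n_moments + 1)


def estimate_eigenvalue_count(A, a, b, lam_min, lam_max,
                              degree=200, n_probes=30, seed=0):
    """Estimate how many eigenvalues of the symmetric matrix A lie in [a, b].

    lam_min and lam_max must enclose the whole spectrum of A.
    """
    n = A.shape[0]
    center = (lam_max + lam_min) / 2.0
    half_width = (lam_max - lam_min) / 2.0
    # Interval endpoints after mapping the spectrum to [-1, 1].
    ta = max((a - center) / half_width, -1.0)
    tb = min((b - center) / half_width, 1.0)

    # Chebyshev coefficients of the indicator function of [ta, tb].
    k = np.arange(1, degree + 1)
    coeffs = np.empty(degree + 1)
    coeffs[0] = (np.arccos(ta) - np.arccos(tb)) / np.pi
    coeffs[1:] = 2.0 / np.pi * (np.sin(k * np.arccos(ta))
                                - np.sin(k * np.arccos(tb))) / k
    coeffs *= jackson_coefficients(degree)

    rng = np.random.default_rng(seed)
    V = rng.choice([-1.0, 1.0], size=(n, n_probes))  # random sign probes

    def scaled(X):
        return (A @ X - center * X) / half_width

    # Accumulate sum_v v^T p(A) v using the three-term recurrence.
    t_prev, t_curr = V, scaled(V)
    total = coeffs[0] * np.sum(V * t_prev) + coeffs[1] * np.sum(V * t_curr)
    for j in range(2, degree + 1):
        t_prev, t_curr = t_curr, 2.0 * scaled(t_curr) - t_prev
        total += coeffs[j] * np.sum(V * t_curr)
    return total / n_probes
```

The result is a real number rather than an integer; its accuracy depends on the polynomial degree, the number of probe vectors, and how close eigenvalues sit to the interval endpoints.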

preprint, 2013, arXiv

A Parallel and Scalable Iterative Solver for Sequences of Dense Eigenproblems Arising in FLAPW

In one of the most important methods in Density Functional Theory, the Full-Potential Linearized Augmented Plane Wave (FLAPW) method, dense generalized eigenproblems are organized in long sequences. Moreover, each eigenproblem is strongly correlated to the next one in the sequence. We propose a novel approach which exploits such correlation through the use of an eigensolver based on subspace iteration and accelerated with Chebyshev polynomials. The resulting solver, parallelized using the Elemental library framework, achieves excellent scalability and is competitive with current dense parallel eigensolvers.

preprint, 2013, arXiv

Block Iterative Eigensolvers for Sequences of Correlated Eigenvalue Problems

In Density Functional Theory simulations based on the LAPW method, each self-consistent field cycle comprises dozens of large dense generalized eigenproblems. In contrast to real-space methods, eigenpairs of problems at distinct cycles have been believed to be either independent or at most very loosely connected. In a recent study [7], it was demonstrated that, contrary to this belief, successive eigenproblems in a sequence are strongly correlated with one another. In particular, by monitoring the subspace angles between eigenvectors of successive eigenproblems, it was shown that these angles decrease noticeably after the first few iterations and that the eigenvectors become close to collinear. This last result suggests that we can use the eigenvectors that solve a specific eigenproblem in a sequence as an approximate solution for the following eigenproblem. In this work we present results that are in line with this intuition. We provide numerical examples where suitably selected block iterative eigensolvers benefit from the reuse of eigenvectors by achieving a substantial speed-up. The results presented will eventually open the way to a widespread use of block iterative eigensolvers in ab initio electronic structure codes based on the LAPW approach.
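The angle monitoring mentioned above can be sketched in a few lines of NumPy. The example below is a simplified illustration, not the procedure of the cited study: it uses standard rather than generalized eigenproblems, the function name `eigenvector_angles` is hypothetical, and eigenvectors are matched by their position in the sorted spectrum, which assumes no eigenvalue crossings between the two problems.

```python
import numpy as np


def eigenvector_angles(A_prev, A_next, nev):
    """Angles (in radians) between the lowest `nev` eigenvectors of two
    Hermitian matrices, matched by their position in the sorted spectrum."""
    _, X_prev = np.linalg.eigh(A_prev)
    _, X_next = np.linalg.eigh(A_next)
    # |x^H y| removes the arbitrary sign or phase of each eigenvector.
    overlaps = np.abs(np.sum(X_prev[:, :nev].conj() * X_next[:, :nev], axis=0))
    return np.arccos(np.clip(overlaps, 0.0, 1.0))
```

Angles close to zero mean the eigenvectors of the earlier problem are nearly collinear with those of the later one, which is the situation in which reusing them as a starting guess for an iterative solver is attractive.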

preprint, 2013, arXiv

Towards an Efficient Use of the BLAS Library for Multilinear Tensor Contractions

Mathematical operators whose transformation rules constitute the building blocks of a multi-linear algebra are widely used in physics and engineering applications where they are very often represented as tensors. In the last century, thanks to the advances in tensor calculus, it was possible to uncover new research fields and make remarkable progress in the existing ones, from electromagnetism to the dynamics of fluids and from the mechanics of rigid bodies to quantum mechanics of many atoms. By now, the formal mathematical and geometrical properties of tensors are well defined and understood; conversely, in the context of scientific and high-performance computing, many tensor-related problems are still open. In this paper, we address the problem of efficiently computing contractions between two tensors of arbitrary dimension by using kernels from the highly optimized BLAS library. In particular, we establish precise conditions to determine if and when GEMM, the kernel for matrix products, can be used. Such conditions take into consideration both the nature of the operation and the storage scheme of the tensors, and induce a classification of the contractions into three groups. For each group, we provide a recipe to guide the users towards the most effective use of BLAS.
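As a small illustration of how the storage scheme decides whether a contraction maps onto one matrix product, consider $C_{ijk} = \sum_l A_{ijl} B_{lk}$ with row-major arrays. Because the contracted index is the last index of $A$ and the first of $B$, both tensors can be viewed as matrices without copying data. The sketch below is only one favorable case chosen for illustration, with a hypothetical function name; it does not reproduce the paper's classification or recipes.

```python
import numpy as np


def contract_last_with_first(A, B):
    """Compute C[i, j, k] = sum_l A[i, j, l] * B[l, k] with one matrix product."""
    I, J, L = A.shape
    L_b, K = B.shape
    if L != L_b:
        raise ValueError("contracted dimensions do not match")
    # For a C-contiguous A this reshape is a view: no data is moved.
    A_mat = A.reshape(I * J, L)
    # For floating-point arrays NumPy hands this product to the BLAS GEMM kernel.
    C_mat = A_mat @ B
    return C_mat.reshape(I, J, K)
```

If the contracted index sat in the middle of `A` instead, the reshape would no longer be a plain view, and one would need either an explicit transposition or a loop over smaller products.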

preprint, 2012, arXiv

Correlations in sequences of generalized eigenproblems arising in Density Functional Theory

Density Functional Theory (DFT) is one of the most used ab initio theoretical frameworks in materials science. It derives the ground state properties of a multi-atomic ensemble directly from the computation of its one-particle density $n(\mathbf{r})$. In DFT-based simulations the solution is calculated through a chain of successive self-consistent cycles; in each cycle a series of coupled equations (Kohn-Sham) translates to a large number of generalized eigenvalue problems whose eigenpairs are the principal means for expressing $n(\mathbf{r})$. A simulation ends when $n(\mathbf{r})$ has converged to the solution within the required numerical accuracy. This usually happens after several cycles, resulting in a process calling for the solution of many sequences of eigenproblems. In this paper, the authors report evidence showing unexpected correlations between adjacent eigenproblems within each sequence. By investigating the numerical properties of the sequences of generalized eigenproblems it is shown that the eigenvectors undergo an "evolution" process. At the same time it is shown that the Hamiltonian matrices exhibit a similar evolution and manifest a specific pattern in the information they carry. Correlation between eigenproblems within a sequence is of capital importance: information extracted from the simulation at one step of the sequence could be used to compute the solution at the next step. Although they are not explored in this work, the implications could be manifold: from increasing the performance of material simulations, to the development of an improved iterative solver, to modifying the mathematical foundations of the DFT computational paradigm in use, thus opening the way to the investigation of new materials.

preprint, 2012, arXiv

Dissecting the FEAST algorithm for generalized eigenproblems

We analyze the FEAST method for computing selected eigenvalues and eigenvectors of large sparse matrix pencils. After establishing the close connection between FEAST and the well-known Rayleigh-Ritz method, we identify several critical issues that influence convergence and accuracy of the solver: the choice of the starting vector space, the stopping criterion, how the inner linear systems impact the quality of the solution, and the use of FEAST for computing eigenpairs from multiple intervals. We complement the study with numerical examples, and hint at possible improvements to overcome the existing problems.

preprint, 2012, arXiv

Solving Dense Generalized Eigenproblems on Multi-threaded Architectures

We compare two approaches to compute a portion of the spectrum of dense symmetric definite generalized eigenproblems: one is based on the reduction to tridiagonal form, and the other on the Krylov-subspace iteration. Two large-scale applications, arising in molecular dynamics and material science, are employed to investigate the contributions of the application, architecture, and parallelism of the method to the performance of the solvers. The experimental results on a state-of-the-art 8-core platform, equipped with a graphics processing unit (GPU), reveal that in real applications, iterative Krylov-subspace methods can be a competitive approach also for the solution of dense problems.

preprint, 2010, arXiv

Matrix Structure Exploitation in Generalized Eigenproblems Arising in Density Functional Theory

In this short paper, the authors report a new computational approach in the context of Density Functional Theory (DFT). It is shown how it is possible to speed up the self-consistent cycle (iteration) characterizing one of the most well-known DFT implementations: FLAPW. Generating the Hamiltonian and overlap matrices and solving the associated generalized eigenproblems $Ax = λBx$ constitute the two most time-consuming fractions of each iteration. Two promising directions, implementing the new methodology, are presented that will ultimately improve the performance of the generalized eigensolver and save computational time.
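For orientation, a generalized eigenproblem $Ax = λBx$ with Hermitian $A$ and positive definite $B$ is commonly solved by reducing it to a standard problem through a Cholesky factorization of $B$. The NumPy sketch below shows that textbook reduction only; it is not the structure-exploiting approach the abstract refers to, and the function name `generalized_eigh` is a hypothetical one chosen here.

```python
import numpy as np


def generalized_eigh(A, B):
    """Solve A x = lambda B x for Hermitian A and Hermitian positive definite B.

    Returns eigenvalues in ascending order and eigenvectors X that are
    B-orthonormal, i.e. X^H B X = I.
    """
    L = np.linalg.cholesky(B)                 # B = L L^H
    Y = np.linalg.solve(L, A)                 # Y = L^{-1} A
    C = np.linalg.solve(L, Y.conj().T)        # C = L^{-1} A L^{-H}
    C = (C + C.conj().T) / 2.0                # remove rounding asymmetry
    lam, Z = np.linalg.eigh(C)                # standard problem C z = lambda z
    X = np.linalg.solve(L.conj().T, Z)        # back-transform: x = L^{-H} z
    return lam, X
```

The general-purpose `np.linalg.solve` is used to keep the sketch dependent on NumPy alone; a production code would use triangular solves, since `L` is lower triangular.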

preprint, 2006, arXiv

Quantum Deconstruction of 5D SQCD

We deconstruct the fifth dimension of 5D SQCD with general numbers of colors and flavors and general 5D Chern-Simons level; the latter is adjusted by adding extra quarks to the 4D quiver. We use deconstruction as a non-stringy UV completion of the quantum 5D theory; to prove its usefulness, we compute quantum corrections to the SQCD_5 prepotential. We also explore the moduli/parameter space of the deconstructed SQCD_5 and show that for |K_CS| < N_F/2 it continues to negative values of 1/(g_5)^2. In many cases there are flop transitions connecting SQCD_5 to exotic 5D theories such as E0, and we present several examples of such transitions. We compare deconstruction to brane-web engineering of the same SQCD_5 and show that the phase diagram is the same in both cases; indeed, the two UV completions are in the same universality class, although they are not dual to each other. Hence, the phase structure of an SQCD_5 (and presumably any other 5D gauge theory) is inherently five-dimensional and does not depend on a UV completion.
