Source author record

Volker Weinberg

Volker Weinberg appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Performance; Distributed, Parallel, and Cluster Computing; physics.comp-ph; hep-lat; physics.plasm-ph; Programming Languages; Hardware Architecture

Catalog footprint

What is connected

9 works

7 topics

4 close collaborators

preprint, 2015, arXiv

Optimising PICCANTE - an Open Source Particle-in-Cell Code for Advanced Simulations on Tier-0 Systems

We present a detailed strong and weak scaling analysis of PICCANTE, an open source, massively parallel, fully-relativistic Particle-In-Cell (PIC) code. PIC codes are widely used in plasma physics and astrophysics to study cases where kinetic effects are relevant. PICCANTE is primarily developed to study laser-plasma interaction. Within a PRACE Preparatory Access Project, various revisions of different routines of the code have been analysed on the HPC systems JUQUEEN at Juelich Supercomputing Centre (JSC), Germany, and FERMI at CINECA, Italy, to improve scalability and I/O performance of the application. The diagnostic tool Scalasca is used to identify suboptimal routines. Different output strategies are discussed. The detailed strong and weak scaling behaviour of the improved code is presented in comparison with the original version of the code.
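
For readers unfamiliar with the terms, the following is a minimal sketch of how strong and weak scaling efficiency are commonly computed from wall-clock timings. It is not taken from the PICCANTE study: the function names and all timing values are hypothetical and serve only to illustrate the arithmetic.

```python
def strong_scaling_efficiency(base_cores, base_time, cores, time):
    """Efficiency for a fixed total problem size.

    Speedup relative to the baseline run, divided by the factor by which
    the core count was increased. A value of 1.0 means ideal scaling.
    """
    speedup = base_time / time
    return speedup / (cores / base_cores)


def weak_scaling_efficiency(base_time, time):
    """Efficiency when the problem size grows in proportion to the core count.

    Ideally the run time stays constant, so the ratio is 1.0.
    """
    return base_time / time


# Hypothetical timings in seconds (assumed values, not measured data).
strong_runs = [(1024, 100.0), (2048, 52.0), (4096, 28.0)]
base_cores, base_time = strong_runs[0]
for cores, seconds in strong_runs:
    eff = strong_scaling_efficiency(base_cores, base_time, cores, seconds)
    print(f"{cores} cores: strong-scaling efficiency {eff:.2f}")
```

In a strong scaling test the total problem size is held fixed while cores are added; in a weak scaling test the work per core is held fixed, so a constant run time is the ideal outcome.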

preprint, 2015, arXiv

The Mont-Blanc Project: First Phase Successfully Finished

The European project Mont-Blanc, which ran from October 2011 to June 2015, aimed to develop an approach to Exascale computing based on embedded power-efficient technology. The main goals of the project were to i) build an HPC prototype using currently available energy-efficient embedded technology, ii) design a Next Generation system to overcome the limitations of the built prototype and iii) port a set of representative Exascale applications to the system. This article summarises the contributions from the Leibniz Supercomputing Centre (LRZ) and the Juelich Supercomputing Centre (JSC), Germany, to the Mont-Blanc project.

preprint, 2014, arXiv

Scalability of the plasma physics code GEM

We discuss a detailed weak scaling analysis of GEM, a 3D MPI-parallelised gyrofluid code used in theoretical plasma physics at the Max Planck Institute of Plasma Physics (IPP) in Garching near Munich, Germany. Within a PRACE Preparatory Access Project, various versions of the code have been analysed on the HPC systems SuperMUC at LRZ and JUQUEEN at Jülich Supercomputing Centre (JSC) to improve the parallel scalability of the application. The diagnostic tool Scalasca has been used to identify suboptimal routines. The code uses the electromagnetic gyrofluid model, which is a superset of magnetohydrodynamic and drift-Alfvén microturbulence and also includes several relevant kinetic processes. GEM can be used with different geometries depending on the targeted use case, and has been shown to scale well when the computational domain is distributed across two dimensions. Such a distribution allows grids of sufficient size to describe small-scale tokamak devices. In order to enable simulation of very large tokamaks (such as the next generation nuclear fusion device ITER in Cadarache, France), the third dimension has been parallelised and weak scaling has been achieved for significantly larger grids.

preprint, 2013, arXiv

Extreme Scaling of Lattice Quantum Chromodynamics

As the complexity and size of challenges in science and engineering are continually increasing, it is highly important that applications are able to scale strongly to very large numbers of cores (>100,000 cores) to enable HPC systems to be utilised efficiently. This paper presents results of strong scaling tests performed with an MPI-only and a hybrid MPI + OpenMP version of the Lattice QCD application BQCD on the European Tier-0 system SuperMUC at LRZ.

preprint, 2013, arXiv

First experiences with the Intel MIC architecture at LRZ

With the rapidly growing demand for computing power, new accelerator-based architectures have entered the world of high performance computing over roughly the past five years. In particular, GPGPUs have recently become very popular; however, programming GPGPUs using programming languages like CUDA or OpenCL is cumbersome and error-prone. Trying to overcome these difficulties, Intel developed their own Many Integrated Core (MIC) architecture, which can be programmed using standard parallel programming techniques like OpenMP and MPI. At the beginning of 2013, the first production-level cards, named Intel Xeon Phi, came on the market. LRZ has been considered by Intel as a leading research centre for evaluating coprocessors based on the MIC architecture since 2010 under strict NDA. Since the Intel Xeon Phi is now generally available, we can share our experience on programming Intel's new MIC architecture.

preprint, 2012, arXiv

Data-parallel programming with Intel Array Building Blocks (ArBB)

Intel Array Building Blocks is a high-level data-parallel programming environment designed to produce scalable and portable results on existing and upcoming multi- and many-core platforms. We have chosen several mathematical kernels - a dense matrix-matrix multiplication, a sparse matrix-vector multiplication, a 1-D complex FFT and a conjugate gradients solver - as synthetic benchmarks and representatives of scientific codes and ported them to ArBB. This whitepaper describes the ArBB ports and presents performance and scaling measurements on the Westmere-EX based system SuperMIG at LRZ in comparison with OpenMP and MKL.
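
As an illustration of one of the kernels named above, here is a minimal serial sketch of a sparse matrix-vector multiplication. It assumes the compressed sparse row (CSR) layout, which the abstract does not specify, and it is plain Python rather than ArBB, OpenMP or MKL code. The function name and the example matrix are hypothetical.

```python
def csr_matvec(values, col_indices, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values      : non-zero entries, stored row by row
    col_indices : column index of each entry in `values`
    row_ptr     : row_ptr[i]:row_ptr[i + 1] is the slice of `values`
                  that belongs to row i
    """
    y = []
    for row in range(len(row_ptr) - 1):
        total = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            total += values[k] * x[col_indices[k]]
        y.append(total)
    return y


# Hypothetical 3x3 example matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_indices = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]

print(csr_matvec(values, col_indices, row_ptr, [1.0, 2.0, 3.0]))  # [7.0, 6.0, 17.0]
```

Each output row depends only on its own slice of the input arrays, so the outer loop is the natural place to apply data parallelism. The indirect access `x[col_indices[k]]` makes the kernel's memory access pattern irregular.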

preprint, 2010, arXiv

Momentum dependence of the topological susceptibility with overlap fermions

Knowledge of the derivative of the topological susceptibility at zero momentum is important for assessing the validity of the Witten-Veneziano formula for the eta' mass, and likewise for the resolution of the EMC proton spin problem. We investigate the momentum dependence of the topological susceptibility and its derivative at zero momentum using overlap fermions in quenched lattice QCD simulations. We expose the role of the low-lying Dirac eigenmodes for the topological charge density, and find a negative value for the derivative. While the sign of the derivative is consistent with the QCD sum rule for pure Yang-Mills theory, the absolute value is overestimated if the contribution from higher eigenmodes is ignored.

preprint, 2010, arXiv

OMI4papps: Optimisation, Modelling and Implementation for Highly Parallel Applications

This article reports on first results of the KONWIHR-II project OMI4papps at the Leibniz Supercomputing Centre (LRZ). The first part describes Apex-MAP, a tunable synthetic benchmark designed to simulate the performance of typical scientific applications. Apex-MAP mimics common memory access patterns and different computational intensity of scientific codes. An approach for modelling LRZ's application mix is given which makes use of performance counter measurements of real applications running on "HLRB II", an SGI Altix system based on 9728 Intel Montecito dual-cores. The second part will show how the Apex-MAP benchmark could be used to simulate the performance of two mathematical kernels frequently used in scientific applications: a dense matrix-matrix multiplication and a sparse matrix-vector multiplication. The performance of both kernels has been intensively studied on x86 cores and hardware accelerators. We will compare the predicted performance with measured data to validate our Apex-MAP approach.

preprint, 2010, arXiv

RapidMind: Portability across Architectures and its Limitations

Recently, hybrid architectures using accelerators like GPGPUs or the Cell processor have gained much interest in the HPC community. The RapidMind Multi-Core Development Platform is a programming environment that allows generating code which is able to seamlessly run on hardware accelerators like GPUs or the Cell processor and on multicore CPUs from both AMD and Intel. This paper describes the ports of three mathematical kernels to RapidMind which are chosen as synthetic benchmarks and representatives of scientific codes. Performance of these kernels has been measured on various RapidMind backends (cuda, cell and x86) and compared to other hardware-specific implementations (using CUDA, Cell SDK and Intel MKL). The results give an insight into the degree of portability of RapidMind code and code performance across different architectures.
