Researcher profile

Volker Weinberg

Volker Weinberg contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified
Verification L1
Unclaimed author
5 works
0 followers
6 topics
4 close collaborators

Published work

5 published items

preprint, 2013, arXiv

Extreme Scaling of Lattice Quantum Chromodynamics

As the complexity and size of challenges in science and engineering are continually increasing, it is highly important that applications are able to scale strongly to very large numbers of cores (>100,000 cores) to enable HPC systems to be utilised efficiently. This paper presents results of strong scaling tests performed with an MPI-only and a hybrid MPI + OpenMP version of the Lattice QCD application BQCD on the European Tier-0 system SuperMUC at LRZ.

preprint, 2013, arXiv

First experiences with the Intel MIC architecture at LRZ

With the rapidly growing demand for computing power, new accelerator-based architectures have entered the world of high-performance computing over roughly the past five years. In particular, GPGPUs have recently become very popular; however, programming GPGPUs using programming languages like CUDA or OpenCL is cumbersome and error-prone. Trying to overcome these difficulties, Intel developed their own Many Integrated Core (MIC) architecture, which can be programmed using standard parallel programming techniques like OpenMP and MPI. At the beginning of 2013, the first production-level cards, named Intel Xeon Phi, came on the market. LRZ has been considered by Intel as a leading research centre for evaluating coprocessors based on the MIC architecture since 2010 under strict NDA. Since the Intel Xeon Phi is now generally available, we can share our experience on programming Intel's new MIC architecture.

preprint, 2012, arXiv

Data-parallel programming with Intel Array Building Blocks (ArBB)

Intel Array Building Blocks is a high-level data-parallel programming environment designed to produce scalable and portable results on existing and upcoming multi- and many-core platforms. We have chosen several mathematical kernels - a dense matrix-matrix multiplication, a sparse matrix-vector multiplication, a 1-D complex FFT and a conjugate gradients solver - as synthetic benchmarks and representatives of scientific codes and ported them to ArBB. This whitepaper describes the ArBB ports and presents performance and scaling measurements on the Westmere-EX based system SuperMIG at LRZ in comparison with OpenMP and MKL.

preprint, 2010, arXiv

OMI4papps: Optimisation, Modelling and Implementation for Highly Parallel Applications

This article reports on first results of the KONWIHR-II project OMI4papps at the Leibniz Supercomputing Centre (LRZ). The first part describes Apex-MAP, a tunable synthetic benchmark designed to simulate the performance of typical scientific applications. Apex-MAP mimics common memory access patterns and different computational intensity of scientific codes. An approach for modelling LRZ's application mix is given which makes use of performance counter measurements of real applications running on "HLRB II", an SGI Altix system based on 9728 Intel Montecito dual-cores. The second part will show how the Apex-MAP benchmark could be used to simulate the performance of two mathematical kernels frequently used in scientific applications: a dense matrix-matrix multiplication and a sparse matrix-vector multiplication. The performance of both kernels has been intensively studied on x86 cores and hardware accelerators. We will compare the predicted performance with measured data to validate our Apex-MAP approach.

preprint, 2010, arXiv

RapidMind: Portability across Architectures and its Limitations

Recently, hybrid architectures using accelerators like GPGPUs or the Cell processor have gained much interest in the HPC community. The RapidMind Multi-Core Development Platform is a programming environment that allows generating code which is able to run seamlessly on hardware accelerators like GPUs or the Cell processor and on multicore CPUs from both AMD and Intel. This paper describes the ports of three mathematical kernels to RapidMind, which are chosen as synthetic benchmarks and representatives of scientific codes. Performance of these kernels has been measured on various RapidMind backends (cuda, cell and x86) and compared to other hardware-specific implementations (using CUDA, Cell SDK and Intel MKL). The results give insight into the degree of portability of RapidMind code and code performance across different architectures.