Researcher profile

Ulrich Ruede

Ulrich Ruede contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2024arXiv

Development of a central-moment phase-field lattice Boltzmann model for thermocapillary flows: Droplet capture and computational performance

This study develops a computationally efficient phase-field lattice Boltzmann model with the capability to simulate thermocapillary flows. The model was implemented into the open-source simulation framework, waLBerla, and extended to conduct the collision stage using central moments. The multiphase model was coupled with both a passive-scalar thermal LB, and a RK solution to the energy equation in order to resolve temperature-dependent surface tension phenomena. Various lattice stencils (D3Q7, D3Q15, D3Q19, D3Q27) were tested for the passive-scalar LB and both the second- and fourth-order RK methods were investigated. There was no significant difference observed in the accuracy of the LB or RK schemes. The passive scalar D3Q7 LB discretisation tended to provide computational benefits, while the second order RK scheme is superior in memory usage. This paper makes contributions relating to the modelling of thermocapillary flows and to understanding the behaviour of droplet capture with thermal sources analogous to thermal tweezers. Four primary contributions to the literature are identified. First, a new 3D thermocapillary, central-moment phase-field LB model is presented and implemented in the open-source software, waLBerla. Second, the accuracy and computational performance of various techniques to resolve the energy equation for multiphase, incompressible fluids is investigated. Third, the dynamic droplet transport behaviour in the presence of thermal sources is studied and insight is provided on the potential ability to manipulate droplets based on local domain heating. Finally, a concise analysis of the computational performance together with near-perfect scaling results on NVIDIA and AMD GPU-clusters is shown. This research enables the detailed study of droplet manipulation and control in thermocapillary devices.

preprint2020arXiv

Parallel solution of saddle point systems with nested iterative solvers based on the Golub-Kahan Bidiagonalization

We present a scalability study of Golub-Kahan bidiagonalization for the parallel iterative solution of symmetric indefinite linear systems with a 2x2 block structure. The algorithms have been implemented within the parallel numerical library PETSc. Since a nested inner-outer iteration strategy may be necessary, we investigate different choices for the inner solvers, including parallel sparse direct and multigrid accelerated iterative methods. We show the strong and weak scalability of the Golub-Kahan bidiagonalization based iterative method when applied to a two-dimensional Poiseuille flow and to two- and three-dimensional Stokes test problems.

preprint2014arXiv

A Scala Prototype to Generate Multigrid Solver Implementations for Different Problems and Target Multi-Core Platforms

Many problems in computational science and engineering involve partial differential equations and thus require the numerical solution of large, sparse (non)linear systems of equations. Multigrid is known to be one of the most efficient methods for this purpose. However, the concrete multigrid algorithm and its implementation highly depend on the underlying problem and hardware. Therefore, changes in the code or many different variants are necessary to cover all relevant cases. In this article we provide a prototype implementation in Scala for a framework that allows abstract descriptions of PDEs, their discretization, and their numerical solution via multigrid algorithms. From these, one is able to generate data structures and implementations of multigrid components required to solve elliptic PDEs on structured grids. Two different test problems showcase our proposed automatic generation of multigrid solvers for both CPU and GPU target platforms.

preprint2013arXiv

Model-guided Performance Analysis of the Sparse Matrix-Matrix Multiplication

Achieving high efficiency with numerical kernels for sparse matrices is of utmost importance, since they are part of many simulation codes and tend to use most of the available compute time and resources. In addition, especially in large scale simulation frameworks the readability and ease of use of mathematical expressions are essential components for the continuous maintenance, modification, and extension of software. In this context, the sparse matrix-matrix multiplication is of special interest. In this paper we thoroughly analyze the single-core performance of sparse matrix-matrix multiplication kernels in the Blaze Smart Expression Template (SET) framework. We develop simple models for estimating the achievable maximum performance, and use them to assess the efficiency of our implementations. Additionally, we compare these kernels with several commonly used SET-based C++ libraries, which, just as Blaze, aim at combining the requirements of high performance with an elegant user interface. For the different sparse matrix structures considered here, we show that our implementations are competitive or faster than those of the other SET libraries for most problem sizes on a current Intel multicore processor.

preprint2011arXiv

Expression Templates Revisited: A Performance Analysis of the Current ET Methodology

In the last decade, Expression Templates (ET) have gained a reputation as an efficient performance optimization tool for C++ codes. This reputation builds on several ET-based linear algebra frameworks focused on combining both elegant and high-performance C++ code. However, on closer examination the assumption that ETs are a performance optimization technique cannot be maintained. In this paper we demonstrate and explain the inability of current ET-based frameworks to deliver high performance for dense and sparse linear algebra operations, and introduce a new "smart" ET implementation that truly allows the combination of high performance code with the elegance and maintainability of a domain-specific language.

preprint2010arXiv

A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters

Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. In this article, this topic is addressed in the context of a lattice Boltzmann flow solver that is integrated in the WaLBerla software framework. We propose a multi-GPU implementation using a block-structured MPI parallelization, suitable for load balancing and heterogeneous computations on CPUs and GPUs. The overhead required for multi-GPU simulations is discussed in detail and it is demonstrated that the kernel performance can be sustained to a large extent. With our GPU implementation, we achieve nearly perfect weak scalability on InfiniBand clusters. However, in strong scaling scenarios multi-GPUs make less efficient use of the hardware than IBM BG/P and x86 clusters. Hence, a cost analysis must determine the best course of action for a particular simulation task. Additionally, weak scaling results of heterogeneous simulations conducted on CPUs and GPUs simultaneously are presented using clusters equipped with varying node configurations.