Topic overview

Mathematical Software

206 works671 researchers0 institutions

Topic snapshot

What this area looks like now

206works
671authors
0experts visible
0communities

Next steps

Move from topic reading into action

The graph preview below keeps the nearby papers, people and communities visible in the same reading flow.

Topic graph

See the topic as a live network

Open full explorer

Inspect nearby papers, researchers, institutions and communities without opening a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Papers in this area

24 featured work(s)

preprint2020arXiv

Scalable parallel algorithm for solving non-stationary systems of linear inequalities

In this paper, a scalable iterative projection-type algorithm for solving non-stationary systems of linear inequalities is considered. A non-stationary system is understood as a large-scale system of inequalities in which coefficients and constant terms can change during the calculation process. The proposed parallel algorithm uses the concept of pseudo-projection which generalizes the notion of orthogonal projection. The parallel pseudo-projection algorithm is implemented using the parallel BSF-skeleton. An analytical estimation of the algorithm scalability boundary is obtained on the base of the BSF cost metric. The large-scale computational experiments were performed on a cluster computing system. The obtained results confirm the efficiency of the proposed approach.

preprint2020arXiv

GPU-accelerating ImageJ Macro image processing workflows using CLIJ

This chapter introduces GPU-accelerated image processing in ImageJ/FIJI. The reader is expected to have some pre-existing knowledge of ImageJ Macro programming. Core concepts such as variables, for-loops, and functions are essential. The chapter provides basic guidelines for improved performance in typical image processing workflows. We present in a step-by-step tutorial how to translate a pre-existing ImageJ macro into a GPU-accelerated macro.

preprint2020arXiv

Using performance analysis tools for parallel-in-time integrators -- Does my time-parallel code do what I think it does?

While many ideas and proofs of concept for parallel-in-time integration methods exists, the number of large-scale, accessible time-parallel codes is rather small. This is often due to the apparent or subtle complexity of the algorithms and the many pitfalls awaiting developers of parallel numerical software. One example of such a time-parallel code is pySDC, which implements, among others, the parallel full approximation scheme in space and time (PFASST). Inspired by nonlinear multigrid ideas, PFASST allows to integrate multiple time-steps simultaneously using a space-time hierarchy of spectral deferred corrections. In this paper we demonstrate the application of performance analysis tools to the PFASST implementation pySDC. Tracing the path we took for this work, we highlight the obstacles encountered, describe remedies and explain the sometimes surprising findings made possible by the tools. Although focusing only on a single implementation of a particular parallel-in-time integrator, we hope that our results and in particular the way we obtained them are a blueprint for other time-parallel codes.

preprint2020arXiv

hIPPYlib: An Extensible Software Framework for Large-Scale Inverse Problems Governed by PDEs; Part I: Deterministic Inversion and Linearized Bayesian Inference

We present an extensible software framework, hIPPYlib, for solution of large-scale deterministic and Bayesian inverse problems governed by partial differential equations (PDEs) with infinite-dimensional parameter fields (which are high-dimensional after discretization). hIPPYlib overcomes the prohibitive nature of Bayesian inversion for this class of problems by implementing state-of-the-art scalable algorithms for PDE-based inverse problems that exploit the structure of the underlying operators, notably the Hessian of the log-posterior. The key property of the algorithms implemented in hIPPYlib is that the solution of the deterministic and linearized Bayesian inverse problem is computed at a cost, measured in linearized forward PDE solves, that is independent of the parameter dimension. The mean of the posterior is approximated by the MAP point, which is found by minimizing the negative log-posterior. This deterministic nonlinear least-squares optimization problem is solved with an inexact matrix-free Newton-CG method. The posterior covariance is approximated by the inverse of the Hessian of the negative log posterior evaluated at the MAP point. This Gaussian approximation is exac

preprint2020arXiv

A Survey of Singular Value Decomposition Methods for Distributed Tall/Skinny Data

The Singular Value Decomposition (SVD) is one of the most important matrix factorizations, enjoying a wide variety of applications across numerous application domains. In statistics and data analysis, the common applications of SVD such as Principal Components Analysis (PCA) and linear regression. Usually these applications arise on data that has far more rows than columns, so-called "tall/skinny" matrices. In the big data analytics context, this may take the form of hundreds of millions to billions of rows with only a few hundred columns. There is a need, therefore, for fast, accurate, and scalable tall/skinny SVD implementations which can fully utilize modern computing resources. To that end, we present a survey of three different algorithms for computing the SVD for these kinds of tall/skinny data layouts using MPI for communication. We contextualize these with common big data analytics techniques, principally PCA. Finally, we present both CPU and GPU timing results from the Summit supercomputer, and discuss possible alternative approaches.

preprint2020arXiv

Introduction to Medical Image Registration with DeepReg, Between Old and New

This document outlines a tutorial to get started with medical image registration using the open-source package DeepReg. The basic concepts of medical image registration are discussed, linking classical methods to newer methods using deep learning. Two iterative, classical algorithms using optimisation and one learning-based algorithm using deep learning are coded step-by-step using DeepReg utilities, all with real, open-accessible, medical data.

preprint2020arXiv

On Computing the Kronecker Structure of Polynomial and Rational Matrices using Julia

In this paper we discuss the mathematical background and the computational aspects which underly the implementation of a collection of Julia functions in the MatrixPencils package for the determination of structural properties of polynomial and rational matrices. We primarily focus on the computation of the finite and infinite spectral structures (e.g., eigenvalues, zeros, poles) as well as the left and right singular structures (e.g., Kronecker indices), which play a fundamental role in the structure of the solution of many problems involving polynomial and rational matrices. The basic analysis tool is the determination of the Kronecker structure of linear matrix pencils using numerically reliable algorithms, which is used in conjunction with several linearization techniques of polynomial and rational matrices. Examples of polynomial and rational matrices, which exhibit all relevant structural features, are considered to illustrate the main mathematical concepts and the capabilities of implemented tools.

preprint2020arXiv

Performance Analysis of FEM Solvers on Practical Electromagnetic Problems

The paper presents a comparative analysis of different commercial and academic software. The comparison aims to examine how the integrated adaptive grid refinement methodologies can deal with challenging, electromagnetic-field related problems. For this comparison, two benchmark problems were examined in the paper. The first example is a solution of an L-shape domain like test problem, which has a singularity at a certain point in the geometry. The second problem is an induction heated aluminum rod, which accurate solution needs to solve a non-linear, coupled physical fields. The accurate solution of this problem requires applying adaptive mesh generation strategies or applying a very fine mesh in the electromagnetic domain, which can significantly increase the computational complexity. The results show that the fully-hp adaptive meshing strategies, which are integrated into Agros-suite, can significantly reduce the task's computational complexity compared to the automatic h-adaptivity, which is part of the examined, popular commercial solvers.

preprint2020arXiv

COCO: A Platform for Comparing Continuous Optimizers in a Black-Box Setting

We introduce COCO, an open source platform for Comparing Continuous Optimizers in a black-box setting. COCO aims at automatizing the tedious and repetitive task of benchmarking numerical optimization algorithms to the greatest possible extent. The platform and the underlying methodology allow to benchmark in the same framework deterministic and stochastic solvers for both single and multiobjective optimization. We present the rationales behind the (decade-long) development of the platform as a general proposition for guidelines towards better benchmarking. We detail underlying fundamental concepts of COCO such as the definition of a problem as a function instance, the underlying idea of instances, the use of target values, and runtime defined by the number of function calls as the central performance measure. Finally, we give a quick overview of the basic code structure and the currently available test suites.

preprint2015arXiv

FIESTA 4: optimized Feynman integral calculations with GPU support

This paper presents a new major release of the program FIESTA (Feynman Integral Evaluation by a Sector decomposiTion Approach). The new release is mainly aimed at optimal performance at large scales when one is increasing the number of sampling points in order to reduce the uncertainty estimates. The release now supports graphical processor units (GPU) for the numerical integration, methods to optimize cluster-usage, as well as other speed, memory, and stability improvements.

preprint2020arXiv

The deal.II finite element library: design, features, and insights

deal.II is a state-of-the-art finite element library focused on generality, dimension-independent programming, parallelism, and extensibility. Herein, we outline its primary design considerations and its sophisticated features such as distributed meshes, $hp$-adaptivity, support for complex geometries, and matrix-free algorithms. But deal.II is more than just a software library: It is also a diverse and worldwide community of developers and users, as well as an educational platform. We therefore also discuss some of the technical and social challenges and lessons learned in running a large community software project over the course of two decades.

preprint2020arXiv

Automatic Differentiation in ROOT

In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to evaluate the derivative of a function specified by a computer program. AD exploits the fact that every computer program, no matter how complicated, executes a sequence of elementary arithmetic operations (addition, subtraction, multiplication, division, etc.), elementary functions (exp, log, sin, cos, etc.) and control flow statements. AD takes source code of a function as input and produces source code of the derived function. By applying the chain rule repeatedly to these operations, derivatives of arbitrary order can be computed automatically, accurately to working precision, and using at most a small constant factor more arithmetic operations than the original program. This paper presents AD techniques available in ROOT, supported by Cling, to produce derivatives of arbitrary C/C++ functions through implementing source code transformation and employing the chain rule of differential calculus in both forward mode and reverse mode. We explain its current integration for gradient computation in TFormula. We demonstrate the correctness and performance improvements in ROOT's fitting algorithms.

preprint2020arXiv

COMPLEX-IT: A Case-Based Modeling and Scenario Simulation Platform for Social Inquiry

COMPLEX-IT is a case-based, mixed-methods platform for social inquiry into complex data/systems, designed to increase non-expert access to the tools of computational social science (i.e., cluster analysis, artificial intelligence, data visualization, data forecasting, and scenario simulation). In particular, COMPLEX-IT aids social inquiry though a heavy emphasis on learning about the complex data/system under study, which it does by (a) identifying and forecasting major and minor clusters/trends; (b) visualizing their complex causality; and (c) simulating scenarios for potential interventions. COMPLEX-IT is accessible through the web or can be run locally and is powered by R and the Shiny web framework.

preprint2020arXiv

Batched computation of the singular value decompositions of order two by the AVX-512 vectorization

In this paper a vectorized algorithm for simultaneously computing up to eight singular value decompositions (SVDs, each of the form $A=UΣV^{\ast}$) of real or complex matrices of order two is proposed. The algorithm extends to a batch of matrices of an arbitrary length $n$, that arises, for example, in the annihilation part of the parallel Kogbetliantz algorithm for the SVD of a square matrix of order $2n$. The SVD algorithm for a single matrix of order two is derived first. It scales, in most instances error-free, the input matrix $A$ such that its singular values $Σ_{ii}$ cannot overflow whenever its elements are finite, and then computes the URV factorization of the scaled matrix, followed by the SVD of a non-negative upper-triangular middle factor. A vector-friendly data layout for the batch is then introduced, where the same-indexed elements of each of the input and the output matrices form vectors, and the algorithm's steps over such vectors are described. The vectorized approach is then shown to be about three times faster than processing each matrix in isolation, while slightly improving accuracy over the straightforward method for the $2\times 2$ SVD.

preprint2020arXiv

Awkward Arrays in Python, C++, and Numba

The Awkward Array library has been an important tool for physics analysis in Python since September 2018. However, some interface and implementation issues have been raised in Awkward Array's first year that argue for a reimplementation in C++ and Numba. We describe those issues, the new architecture, and present some examples of how the new interface will look to users. Of particular importance is the separation of kernel functions from data structure management, which allows a C++ implementation and a Numba implementation to share kernel functions, and the algorithm that transforms record-oriented data into columnar Awkward Arrays.

preprint2020arXiv

An Adaptive Solver for Systems of Linear Equations

Computational implementations for solving systems of linear equations often rely on a one-size-fits-all approach based on LU decomposition of dense matrices stored in column-major format. Such solvers are typically implemented with the aid of the xGESV set of functions available in the low-level LAPACK software, with the aim of reducing development time by taking advantage of well-tested routines. However, this straightforward approach does not take into account various matrix properties which can be exploited to reduce the computational effort and/or to increase numerical stability. Furthermore, direct use of LAPACK functions can be error-prone for non-expert users and results in source code that has little resemblance to originating mathematical expressions. We describe an adaptive solver that we have implemented inside recent versions of the high-level Armadillo C++ library for linear algebra. The solver automatically detects several common properties of a given system (banded, triangular, symmetric positive definite), followed by solving the system via mapping to a set of suitable LAPACK functions best matched to each property. The solver also detects poorly conditioned systems and automatically seeks a solution via singular value decomposition as a fallback. We show that the adaptive solver leads to notable speedups, while also freeing the user from using direct calls to cumbersome LAPACK functions.

preprint2020arXiv

Implicit Hari--Zimmermann algorithm for the generalized SVD on the GPUs

A parallel, blocked, one-sided Hari--Zimmermann algorithm for the generalized singular value decomposition (GSVD) of a real or a complex matrix pair $(F,G)$ is here proposed, where $F$ and $G$ have the same number of columns, and are both of the full column rank. The algorithm targets either a single graphics processing unit (GPU), or a cluster of those, performs all non-trivial computation exclusively on the GPUs, requires the minimal amount of memory to be reasonably expected, scales acceptably with the increase of the number of GPUs available, and guarantees the reproducible, bitwise identical output of the runs repeated over the same input and with the same number of GPUs.

preprint2020arXiv

Accelerating linear solvers for Stokes problems with C++ metaprogramming

The efficient solution of large sparse saddle point systems is very important in computational fluid mechanics. The discontinuous Galerkin finite element methods have become increasingly popular for incompressible flow problems but their application is limited due to high computational cost. We describe the C++ programming techniques that may help to accelerate linear solvers for such problems. The approach is based on the policy-based design pattern and partial template specialization, and is implemented in the open source AMGCL library. The efficiency is demonstrated with the example of accelerating an iterative solver of a discontinuous Galerkin finite element method for the Stokes problem. The implementation allows selecting algorithmic components of the solver by adjusting template parameters without any changes to the codebase. It is possible to switch the system matrix to use small statically sized blocks to store the nonzero values, or use a mixed precision solution, which results in up to 4 times speedup, and reduces the memory footprint of the algorithm by about 40\%. We evaluate both monolithic and composite preconditioning strategies for the 3 benchmark problems. The performance of the proposed solution is compared with a multithreaded direct Pardiso solver and a parallel iterative PETSc solver.

preprint2020arXiv

Elmer FEM-Dakota: A unified open-source computational framework for electromagnetics and data analytics

Open-source electromagnetic design software, Elmer FEM, was interfaced with data analytics toolkit, Dakota. Furthermore, the coupled software was validated against a benchmark test. The interface developed provides a unified open-source computational framework for electromagnetics and data analytics. Its key features include uncertainty quantification, surrogate modelling and parameter studies. This framework enables a richer understanding of model predictions to better design electric machines in a time sensitive manner.

preprint2021arXiv

A Julia implementation of Algorithm NCL for constrained optimization

Algorithm NCL is designed for general smooth optimization problems where first and second derivatives are available, including problems whose constraints may not be linearly independent at a solution (i.e., do not satisfy the LICQ). It is equivalent to the LANCELOT augmented Lagrangian method, reformulated as a short sequence of nonlinearly constrained subproblems that can be solved efficiently by IPOPT and KNITRO, with warm starts on each subproblem. We give numerical results from a Julia implementation of Algorithm NCL on tax policy models that do not satisfy the LICQ, and on nonlinear least-squares problems and general problems from the CUTEst test set.

preprint2021arXiv

UFL Dual Spaces, a proposal

This white paper highlights current limitations in the algebraic closure Unified Form Language (UFL). UFL currently represents forms over finite element spaces, however finite element problems naturally result in objects in the dual to a finite element space, and operators mapping between primal and dual finite element spaces. This document sketches the relevant mathematical areas and proposes changes to the UFL language to support dual spaces as first class types in UFL.

preprint2021arXiv

Robust level-3 BLAS Inverse Iteration from the Hessenberg Matrix

Inverse iteration is known to be an effective method for computing eigenvectors corresponding to simple and well-separated eigenvalues. In the non-symmetric case, the solution of shifted Hessenberg systems is a central step. Existing inverse iteration solvers approach the solution of the shifted Hessenberg systems with either RQ or LU factorizations and, once factored, solve the corresponding systems. This approach has limited level-3 BLAS potential since distinct shifts have distinct factorizations. This paper rearranges the RQ approach such that data shared between distinct shifts is exposed. Thereby the backward substitution with the triangular R factor can be expressed mostly with matrix-matrix multiplications (level-3 BLAS). The resulting algorithm computes eigenvectors in a tiled, overflow-free, and task-parallel fashion. The numerical experiments show that the new algorithm outperforms existing inverse iteration solvers for the computation of both real and complex eigenvectors.

preprint2021arXiv

A bi-directional extensible interface between Lean and Mathematica

We implement a user-extensible ad hoc connection between the Lean proof assistant and the computer algebra system Mathematica. By reflecting the syntax of each system in the other and providing a flexible interface for extending translation, our connection allows for the exchange of arbitrary information between the two systems. We show how to make use of the Lean metaprogramming framework to verify certain Mathematica computations, so that the rigor of the proof assistant is not compromised. We also use Mathematica as an untrusted oracle to guide proof search in the proof assistant and interact with a Mathematica notebook from within a Lean session. In the other direction, we import and process Lean declarations from within Mathematica. The proof assistant library serves as a database of mathematical knowledge that the CAS can display and explore.

People in this topic

12 visible researcher(s)