Source author record

Eric Darve

Eric Darve appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.NA; Numerical Analysis; Machine Learning; physics.comp-ph; Computation and Language; Computational Engineering, Finance, and Science; Distributed, Parallel, and Cluster Computing; Mathematical Software; cond-mat.mtrl-sci; Information Retrieval; math.DS; physics.flu-dyn; physics.geo-ph; Quantitative Methods

Catalog footprint

What is connected

33 works

14 topics

4 close collaborators

preprint · 2026 · arXiv

Necessary and Sufficient Conditions for the Existence of an LU Factorization for General Rank Deficient Matrices

We establish necessary and sufficient conditions for the existence of an LU factorization $A=LU$ for an arbitrary square matrix $A$, including singular and rank-deficient cases, without the use of row or column permutations. We prove that such a factorization exists if and only if the nullity of every leading principal submatrix is bounded by the sum of the nullities of the corresponding leading column and row blocks. While building upon the work of Okunev and Johnson, we present simpler, constructive proofs. Furthermore, we extend these results to characterize rank-revealing factorizations, providing explicit sparsity bounds for the factors $L$ and $U$. Finally, we derive analogous necessary and sufficient conditions for the existence of factorizations constrained to have unit lower or unit upper triangular factors.
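
The nullity condition stated in this abstract can be checked directly on small matrices. The sketch below is illustrative only: it assumes that the "nullity" of the leading k-by-n row block and of the leading n-by-k column block means k minus the block's rank, which makes the test equivalent to the Okunev-Johnson rank inequality rank(A[:k,:k]) + k >= rank(A[:k,:]) + rank(A[:,:k]). The helper names `rank` and `has_lu` are hypothetical and do not come from the paper.

```python
from fractions import Fraction


def rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def has_lu(a):
    """Check the nullity condition for every leading block size k.

    Assumption: nullity of each block is taken as k - rank(block).
    """
    n = len(a)
    for k in range(1, n + 1):
        lead = [row[:k] for row in a[:k]]      # k x k leading principal submatrix
        row_block = a[:k]                      # k x n leading row block
        col_block = [row[:k] for row in a]     # n x k leading column block
        null_lead = k - rank(lead)
        null_row = k - rank(row_block)
        null_col = k - rank(col_block)
        if null_lead > null_row + null_col:
            return False
    return True


print(has_lu([[1, 2], [3, 4]]))   # nonsingular leading minors
print(has_lu([[0, 1], [1, 0]]))   # classic case that needs a permutation
print(has_lu([[0, 1], [0, 0]]))   # singular, but already upper triangular
```

Exact rational arithmetic is used so that rank decisions do not depend on a floating-point tolerance.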

preprint · 2026 · arXiv

SpectraQuery: A Hybrid Retrieval-Augmented Conversational Assistant for Battery Science

Scientific reasoning increasingly requires linking structured experimental data with the unstructured literature that explains it, yet most large language model (LLM) assistants cannot reason jointly across these modalities. We introduce SpectraQuery, a hybrid natural-language query framework that integrates a relational Raman spectroscopy database with a vector-indexed scientific literature corpus using a Structured and Unstructured Query Language (SUQL)-inspired design. By combining semantic parsing with retrieval-augmented generation, SpectraQuery translates open-ended questions into coordinated SQL and literature retrieval operations, producing cited answers that unify numerical evidence with mechanistic explanation. Across SQL correctness, answer groundedness, retrieval effectiveness, and expert evaluation, SpectraQuery demonstrates strong performance: approximately 80 percent of generated SQL queries are fully correct, synthesized answers reach 93-97 percent groundedness with 10-15 retrieved passages, and battery scientists rate responses highly across accuracy, relevance, grounding, and clarity (4.1-4.6/5). These results show that hybrid retrieval architectures can meaningfully support scientific workflows by bridging data and discourse for high-volume experimental datasets.

preprint · 2021 · arXiv

Hierarchical Orthogonal Factorization: Sparse Least Squares Problems

In this work, we develop a fast hierarchical solver for solving large, sparse least squares problems. We build upon the algorithm, spaQR (sparsified QR), that was developed by the authors to solve large sparse linear systems. Our algorithm is built on top of a Nested Dissection based multifrontal QR approach. We use low-rank approximations on the frontal matrices to sparsify the vertex separators at every level in the elimination tree. Using a two-step sparsification scheme, we reduce the number of columns and maintain the ratio of rows to columns in each front without introducing any additional fill-in. With this improvised scheme, we show that the runtime of the algorithm scales as $\mathcal{O}(M \log N)$ and uses $\mathcal{O}(M)$ memory to store the factorization. This is achieved at the expense of a small and controllable approximation error. The end result is an approximate factorization of the matrix stored as a sequence of sparse orthogonal and upper-triangular factors and hence easy to apply/solve with a vector. Finally, we compare the performance of the spaQR algorithm in solving sparse least squares problems with direct multifrontal QR and CGLS iterative method with a standard diagonal preconditioner.

preprint · 2021 · arXiv

Towards a Scalable Hierarchical High-order CFD Solver

Development of highly scalable and robust algorithms for large-scale CFD simulations has been identified as one of the key ingredients to achieve NASA's CFD Vision 2030 goals. In order to improve simulation capability and to effectively leverage new high-performance computing hardware, the most computationally intensive parts of CFD solution algorithms -- namely, linear solvers and preconditioners -- need to achieve asymptotic behavior on massively parallel and heterogeneous architectures and preserve convergence rates as the meshes are refined further. In this work, we present a scalable high-order implicit Discontinuous Galerkin solver from the SU2 framework using a promising preconditioning technique based on algebraic sparsified nested dissection algorithm with low-rank approximations, and communication-avoiding Krylov subspace methods to enable scalability with very large processor counts. The overall approach is tested on a canonical 2D NACA0012 test case of increasing size to demonstrate its scalability on multiple processing cores. Both the preconditioner and the linear solver are shown to exhibit near-linear weak scaling up to 2,048 cores with no significant degradation of the convergence rate.

preprint · 2020 · arXiv

An Algebraic Sparsified Nested Dissection Algorithm Using Low-Rank Approximations

We propose a new algorithm for the fast solution of large, sparse, symmetric positive-definite linear systems, spaND -- sparsified Nested Dissection. It is based on nested dissection, sparsification and low-rank compression. After eliminating all interiors at a given level of the elimination tree, the algorithm sparsifies all separators corresponding to the interiors. This operation reduces the size of the separators by eliminating some degrees of freedom but without introducing any fill-in. This is done at the expense of a small and controllable approximation error. The result is an approximate factorization that can be used as an efficient preconditioner. We then perform several numerical experiments to evaluate this algorithm. We demonstrate that a version using orthogonal factorization and block-diagonal scaling takes fewer CG iterations to converge than previous similar algorithms on various kinds of problems. Furthermore, this algorithm is provably guaranteed to never break down and the matrix stays symmetric positive-definite throughout the process. We evaluate the algorithm on some large problems and show it exhibits near-linear scaling. The factorization time is roughly O(N) and the number of iterations grows slowly with N.

preprint · 2020 · arXiv

Anomaly Detection with Domain Adaptation

We study the problem of semi-supervised anomaly detection with domain adaptation. Given a set of normal data from a source domain and a limited amount of normal examples from a target domain, the goal is to have a well-performing anomaly detector in the target domain. We propose the Invariant Representation Anomaly Detection (IRAD) to solve this problem where we first learn to extract a domain-invariant representation. The extraction is achieved by an across-domain encoder trained together with source-specific encoders and generators by adversarial learning. An anomaly detector is then trained using the learnt representations. We evaluate IRAD extensively on digit image datasets (MNIST, USPS and SVHN) and object recognition datasets (Office-Home). Experimental results show that IRAD outperforms baseline models by a wide margin across different datasets. We derive a theoretical lower bound for the joint error that explains the performance decay from overtraining and also an upper bound for the generalization error.

preprint · 2020 · arXiv

Coupled Time-lapse Full Waveform Inversion for Subsurface Flow Problems using Intrusive Automatic Differentiation

We describe a novel framework for estimating subsurface properties, such as rock permeability and porosity, from time-lapse observed seismic data by coupling full-waveform inversion, subsurface flow processes, and rock physics models. For the inverse modeling, we handle the back-propagation of gradients by an intrusive automatic differentiation strategy that offers three levels of user control: (1) at the wave physics level, we adopted the discrete adjoint method in order to use our existing high-performance FWI code; (2) at the rock physics level, we used built-in operators from the $\texttt{TensorFlow}$ backend; (3) at the flow physics level, we implemented customized PDE operators for the potential and nonlinear saturation equations. These three levels of gradient computation strike a good balance between computational efficiency and programming efficiency, and when chained together, constitute a coupled inverse system. We use numerical experiments to demonstrate that (1) the three-level coupled inverse problem is superior in terms of accuracy to a traditional decoupled inversion strategy; (2) it is able to simultaneously invert for parameters in empirical relationships such as the rock physics models; and (3) the inverted model can be used for reservoir performance prediction and reservoir management/optimization purposes.

preprint · 2020 · arXiv

Inverse Modeling of Viscoelasticity Materials using Physics Constrained Learning

We propose a novel approach to model viscoelasticity materials using neural networks, which capture rate-dependent and nonlinear constitutive relations. However, inputs and outputs of the neural networks are not directly observable, and therefore common training techniques with input-output pairs for the neural networks are inapplicable. To that end, we develop a novel computational approach to both calibrate parametric and learn neural-network-based constitutive relations of viscoelasticity materials from indirect displacement data in the context of multi-physics interactions. We show that limited displacement data hold sufficient information to quantify the viscoelasticity behavior. We formulate the inverse computation---modeling viscoelasticity properties from observed displacement data---as a PDE-constrained optimization problem and minimize the error functional using a gradient-based optimization method. The gradients are computed by a combination of automatic differentiation and physics constrained learning. The effectiveness of our method is demonstrated through numerous benchmark problems in geomechanics and porous media transport.

preprint · 2020 · arXiv

Isogeometric Collocation Method for the Fractional Laplacian in the 2D Bounded Domain

We consider the isogeometric analysis for fractional PDEs involving the fractional Laplacian in two dimensions. An isogeometric collocation method is developed to discretize the fractional Laplacian and applied to the fractional Poisson problem and the time-dependent fractional porous media equation. Numerical studies exhibit monotonic convergence with a rate of $\mathcal{O}(N^{-1})$, where $N$ is the number of degrees of freedom. A comparison with finite element analysis shows that the method enjoys higher accuracy per degree of freedom and has a better convergence rate. We demonstrate that isogeometric analysis offers a novel and promising computational tool for nonlocal problems.

preprint · 2020 · arXiv

Learning Constitutive Relations from Indirect Observations Using Deep Neural Networks

We present a new approach for predictive modeling and its uncertainty quantification for mechanical systems, where coarse-grained models such as constitutive relations are derived directly from observation data. We explore the use of a neural network to represent the unknown constitutive relations, compare the neural networks with piecewise linear functions, radial basis functions, and radial basis function networks, and show that the neural network outperforms the others in certain cases. We analyze the approximation error of the neural networks using a scaling argument. The training and predicting processes in our framework combine the finite element method, automatic differentiation, and neural networks (or other function approximators). Our framework also allows uncertainty quantification in the form of confidence intervals. Numerical examples on a multiscale fiber-reinforced plate problem and a nonlinear rubbery membrane problem from solid mechanics demonstrate the effectiveness of our framework.

preprint · 2020 · arXiv

Learning Constitutive Relations using Symmetric Positive Definite Neural Networks

We present the Cholesky-factored symmetric positive definite neural network (SPD-NN) for modeling constitutive relations in dynamical equations. Instead of directly predicting the stress, the SPD-NN trains a neural network to predict the Cholesky factor of a tangent stiffness matrix, based on which the stress is calculated in the incremental form. As a result of the special structure, SPD-NN weakly imposes convexity on the strain energy function, satisfies time consistency for path-dependent materials, and therefore improves numerical stability, especially when the SPD-NN is used in finite element simulations. Depending on the types of available data, we propose two training methods, namely direct training for strain and stress pairs and indirect training for loads and displacement pairs. We demonstrate the effectiveness of SPD-NN on hyperelastic, elasto-plastic, and multiscale fiber-reinforced plate problems from solid mechanics. The generality and robustness of the SPD-NN make it a promising tool for a wide range of constitutive modeling applications.
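
The incremental stress update described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the flat list `l_entries` stands in for a network output (here it is hard-coded), and the function names `cholesky_to_spd` and `stress_update` are hypothetical. The only point is that a tangent stiffness built as H = L L^T is symmetric positive semidefinite by construction, and the stress advances by H times the strain increment.

```python
def cholesky_to_spd(l_entries, n):
    """Fill a lower-triangular L row by row from a flat list, return H = L L^T."""
    L = [[0.0] * n for _ in range(n)]
    it = iter(l_entries)
    for i in range(n):
        for j in range(i + 1):
            L[i][j] = next(it)
    return [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def stress_update(stress, strain_old, strain_new, l_entries):
    """Incremental form: stress_new = stress_old + H (strain_new - strain_old)."""
    n = len(stress)
    H = cholesky_to_spd(l_entries, n)
    d = [b - a for a, b in zip(strain_old, strain_new)]
    return [s + sum(H[i][j] * d[j] for j in range(n))
            for i, s in enumerate(stress)]


# Hard-coded stand-in for a predicted Cholesky factor (assumption):
# L = [[2, 0, 0], [0.5, 1.5, 0], [-0.3, 0.2, 1.0]]
l_entries = [2.0, 0.5, 1.5, -0.3, 0.2, 1.0]
H = cholesky_to_spd(l_entries, 3)
new_stress = stress_update([0.0] * 3, [0.0] * 3, [0.01, 0.0, 0.0], l_entries)
print(H)
print(new_stress)
```

In a trained model the entries of L would depend on the current strain and stress state, so H changes from step to step while staying symmetric positive semidefinite.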

preprint · 2020 · arXiv

Memory Augmented Generative Adversarial Networks for Anomaly Detection

In this paper, we present a memory-augmented algorithm for anomaly detection. Classical anomaly detection algorithms focus on learning to model and generate normal data, but their guarantees for detecting anomalous data are typically weak. The proposed Memory Augmented Generative Adversarial Networks (MEMGAN) interacts with a memory module for both the encoding and generation processes. Our algorithm is such that most of the encoded normal data are inside the convex hull of the memory units, while the abnormal data are isolated outside. Such a remarkable property leads to good (resp. poor) reconstruction for normal (resp. abnormal) data and therefore provides a strong guarantee for anomaly detection. Decoded memory units in MEMGAN are more interpretable and disentangled than previous methods, which further demonstrates the effectiveness of the memory mechanism. Experimental results on twenty anomaly detection datasets of CIFAR-10 and MNIST show that MEMGAN demonstrates significant improvements over previous anomaly detection methods.

preprint · 2020 · arXiv

Out-of-Vocabulary Embedding Imputation with Grounded Language Information by Graph Convolutional Networks

Due to the ubiquitous use of embeddings as input representations for a wide range of natural language tasks, imputation of embeddings for rare and unseen words is a critical problem in language processing. Embedding imputation involves learning representations for rare or unseen words during the training of an embedding model, often in a post-hoc manner. In this paper, we propose an approach for embedding imputation which uses grounded information in the form of a knowledge graph. This is in contrast to existing approaches which typically make use of vector space properties or subword information. We propose an online method to construct a graph from grounded information and design an algorithm to map from the resulting graphical structure to the space of the pre-trained embeddings. Finally, we evaluate our approach on a range of rare and unseen word tasks across various domains and show that our model can learn better representations. For example, on the Card-660 task our method improves Pearson's and Spearman's correlation coefficients upon the state-of-the-art by 11% and 17.8% respectively using GloVe embeddings.

preprint · 2020 · arXiv

Physics Constrained Learning for Data-driven Inverse Modeling from Sparse Observations

Deep neural networks (DNN) have been used to model nonlinear relations between physical quantities. Those DNNs are embedded in physical systems described by partial differential equations (PDE) and trained by minimizing a loss function that measures the discrepancy between predictions and observations in some chosen norm. This loss function often includes the PDE constraints as a penalty term when only sparse observations are available. As a result, the PDE is only satisfied approximately by the solution. However, the penalty term typically slows down the convergence of the optimizer for stiff problems. We present a new approach that trains the embedded DNNs while numerically satisfying the PDE constraints. We develop an algorithm that enables differentiating both explicit and implicit numerical solvers in reverse-mode automatic differentiation. This allows the gradients of the DNNs and the PDE solvers to be computed in a unified framework. We demonstrate that our approach enjoys faster convergence and better stability in relatively stiff problems compared to the penalty method. Our approach allows for the potential to solve and accelerate a wide range of data-driven inverse modeling, where the physical constraints are described by PDEs and need to be satisfied accurately.
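
Differentiating through an implicit solver, as mentioned in this abstract, rests on the implicit function theorem. The scalar sketch below is a toy example under stated assumptions: the constraint F(theta, u) = u^3 + u - theta = 0 and the quadratic loss are invented for illustration, and the names `solve_state` and `loss_and_grad` are hypothetical. The state equation is solved exactly (to solver tolerance) by Newton's method, and the gradient uses du/dtheta = -(dF/du)^{-1} dF/dtheta instead of differentiating through the Newton iterations.

```python
def solve_state(theta, u0=0.0, tol=1e-14, max_iter=50):
    """Newton's method for the implicit constraint u**3 + u - theta = 0."""
    u = u0
    for _ in range(max_iter):
        f = u**3 + u - theta
        if abs(f) < tol:
            break
        u -= f / (3 * u**2 + 1)  # dF/du = 3u^2 + 1 is always positive
    return u


def loss_and_grad(theta, u_obs):
    """Loss (u - u_obs)^2 and its gradient with respect to theta."""
    u = solve_state(theta)
    loss = (u - u_obs) ** 2
    dloss_du = 2 * (u - u_obs)
    dF_du = 3 * u**2 + 1
    dF_dtheta = -1.0
    # Implicit function theorem: du/dtheta = -(dF/du)^{-1} * dF/dtheta
    grad = dloss_du * (-dF_dtheta / dF_du)
    return loss, grad


loss, grad = loss_and_grad(theta=2.0, u_obs=0.5)

# Central finite difference as an independent check of the gradient.
h = 1e-6
fd = (loss_and_grad(2.0 + h, 0.5)[0] - loss_and_grad(2.0 - h, 0.5)[0]) / (2 * h)
print(loss, grad, fd)
```

Because the constraint is satisfied at every evaluation, no penalty weight has to be tuned, which is the contrast with the penalty method drawn in the abstract.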

preprint · 2020 · arXiv

Regularized Cycle Consistent Generative Adversarial Network for Anomaly Detection

In this paper, we investigate algorithms for anomaly detection. Previous anomaly detection methods focus on modeling the distribution of non-anomalous data provided during training. However, this does not necessarily ensure the correct detection of anomalous data. We propose a new Regularized Cycle Consistent Generative Adversarial Network (RCGAN) in which deep neural networks are adversarially trained to better recognize anomalous samples. This approach is based on leveraging a penalty distribution with a new definition of the loss function and novel use of discriminator networks. It is based on a solid mathematical foundation, and proofs show that our approach has stronger guarantees for detecting anomalous examples compared to the current state-of-the-art. Experimental results on both real-world and synthetic data show that our model leads to significant and consistent improvements on previous anomaly detection benchmarks. Notably, RCGAN improves on the state-of-the-art on the KDDCUP, Arrhythmia, Thyroid, Musk and CIFAR10 datasets.

preprint · 2020 · arXiv

Second Order Accurate Hierarchical Approximate Factorization of Sparse SPD Matrices

We describe a second-order accurate approach to sparsifying the off-diagonal blocks in the hierarchical approximate factorizations of sparse symmetric positive definite matrices. The norm of the error made by the new approach depends quadratically, not linearly, on the error in the low-rank approximation of the given block. The analysis of the resulting two-level preconditioner shows that the preconditioner is second-order accurate as well. We incorporate the new approach into the recent Sparsified Nested Dissection algorithm [SIAM J. Matrix Anal. Appl., 41 (2020), pp. 715-746], and test it on a wide range of problems. The new approach halves the number of Conjugate Gradient iterations needed for convergence, with almost the same factorization complexity, improving the total runtimes of the algorithm. Our approach can be incorporated into other rank-structured methods for solving sparse linear systems.

preprint · 2020 · arXiv

Sparse Hierarchical Preconditioners Using Piecewise Smooth Approximations of Eigenvectors

When solving linear systems arising from PDE discretizations, iterative methods (such as Conjugate Gradient, GMRES, or MINRES) are often the only practical choice. To converge in a small number of iterations, however, they have to be coupled with an efficient preconditioner. The efficiency of the preconditioner depends largely on its accuracy on the eigenvectors corresponding to small eigenvalues, and unfortunately, black-box methods typically cannot guarantee sufficient accuracy on these eigenvectors. Thus, constructing the preconditioner becomes a problem-dependent task. However, for a large class of problems, including many elliptic equations, the eigenvectors corresponding to small eigenvalues are smooth functions of the PDE grid. In this paper, we describe a hierarchical approximate factorization approach which focuses on improving accuracy on the smooth eigenvectors. The improved accuracy is achieved by preserving the action of the factorized matrix on piecewise polynomial functions of the grid. Based on the factorization, we propose a family of sparse preconditioners with $O(n)$ or $O(n \log{n})$ construction complexities. Our methods exhibit the optimal $O(n)$ solution times in benchmarks run on large elliptic problems of different types, arising for example in flow or mechanical simulations. In the case of the linear elasticity equation the preconditioners are exact on the near-kernel rigid body modes.

preprint · 2020 · arXiv

TaskTorrent: a Lightweight Distributed Task-Based Runtime System in C++

We present TaskTorrent, a lightweight distributed task-based runtime in C++. TaskTorrent uses a parametrized task graph to express the task DAG, and one-sided active messages to trigger remote tasks asynchronously. As a result, the task DAG is completely distributed and discovered in parallel. It is a C++14 library and only depends on MPI. We explain the API and the implementation. We perform a series of benchmarks against StarPU and ScaLAPACK. Micro benchmarks show it has a minimal overhead compared to other solutions. We then apply it to two large linear algebra problems. TaskTorrent scales very well to thousands of cores, exhibiting good weak and strong scaling.
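
The idea of a parametrized task graph can be illustrated with a small sequential sketch. This is not the TaskTorrent API (which is C++ and distributed over MPI); it is a single-process Python toy with hypothetical names (`run_ptg`, `indegree`, `out_deps`, `task`). The point it shows is that the DAG is never stored explicitly: it is described by functions of the task index and discovered as tasks complete.

```python
from collections import deque


def run_ptg(seeds, indegree, out_deps, task):
    """Run a parametrized task graph given only functions of the task index."""
    pending = {}           # task index -> dependencies still unmet
    ready = deque(seeds)   # tasks with no unmet dependencies
    order = []
    while ready:
        k = ready.popleft()
        task(k)
        order.append(k)
        for j in out_deps(k):
            # First time j is seen, start from its declared in-degree.
            pending[j] = pending.get(j, indegree(j)) - 1
            if pending[j] == 0:
                ready.append(j)
    return order


# Example: wavefront on an n x n grid, task (i, j) needs (i-1, j) and (i, j-1).
n = 4
val = {}


def indegree(k):
    i, j = k
    return (i > 0) + (j > 0)


def out_deps(k):
    i, j = k
    return [(a, b) for a, b in ((i + 1, j), (i, j + 1)) if a < n and b < n]


def task(k):
    i, j = k
    # Counts lattice paths from (0, 0); each task reads only its two inputs.
    val[k] = 1 if k == (0, 0) else val.get((i - 1, j), 0) + val.get((i, j - 1), 0)


order = run_ptg([(0, 0)], indegree, out_deps, task)
print(len(order), val[(n - 1, n - 1)])
```

In a distributed setting, the decrement of a remote task's counter would be carried by a message rather than a local dictionary update.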

preprint · 2019 · arXiv

Fast Low-Rank Kernel Matrix Factorization through Skeletonized Interpolation

Integral equations are commonly encountered when solving complex physical problems. Their discretization leads to a dense kernel matrix that is block or hierarchically low-rank. This paper proposes a new way to build a low-rank factorization of those low-rank blocks at a nearly optimal cost of $\mathcal{O}(nr)$ for an $n \times n$ block submatrix of rank $r$. This is done by first sampling the kernel function at new interpolation points, then selecting a subset of those using a CUR decomposition, and finally using this reduced set of points as pivots for an RRLU-type factorization. We also explain how this implicitly builds an optimal interpolation basis for the kernel under consideration. We show the asymptotic convergence of the algorithm, explain its stability, and demonstrate on numerical examples that it performs very well in practice, yielding a rank nearly equal to the optimal rank at a fraction of the cost of the naive algorithm.

preprint · 2019 · arXiv

Parallelization of the inverse fast multipole method with an application to boundary element method

We present an algorithm to parallelize the inverse fast multipole method (IFMM), which is an approximate direct solver for dense linear systems. The parallel scheme is based on a greedy coloring algorithm, where two nodes in the hierarchy with the same color are separated by at least $\sigma$ nodes. We proved that when $\sigma \ge 6$, the workload associated with one color is embarrassingly parallel. However, the number of nodes in a group (color) may be small when $\sigma = 6$. Therefore, we also explored $\sigma = 3$, where a small fraction of the algorithm needs to be serialized, and the overall parallel efficiency was improved. We implemented the parallel IFMM using OpenMP for shared-memory machines. Subsequently, we applied it to a fast-multipole accelerated boundary element method (FMBEM) as a preconditioner, and compared its efficiency with (a) the original IFMM parallelized by linking a multi-threaded linear algebra library and (b) the commonly used parallel block-diagonal preconditioner. Our results showed that our parallel IFMM achieved speedups of up to $4\times$ and $11\times$ over the reference methods (a) and (b), respectively, in realistic examples involving more than one million variables.
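
A greedy coloring with a separation constraint can be sketched on a single level of a uniform 2D grid. The setup is an assumption made for illustration: nodes are grid cells, distance is the Chebyshev distance between cell indices, and "separated by at least sigma nodes" is read as a distance strictly greater than sigma. The function name `greedy_color` is hypothetical and the paper's actual traversal order and hierarchy handling are not reproduced here.

```python
from itertools import product


def greedy_color(nodes, sigma):
    """Give each node the smallest color unused within Chebyshev distance sigma."""
    color = {}
    for p in nodes:
        taken = {c for q, c in color.items()
                 if max(abs(a - b) for a, b in zip(p, q)) <= sigma}
        c = 0
        while c in taken:
            c += 1
        color[p] = c
    return color


nodes = list(product(range(8), repeat=2))  # 8 x 8 cells on one level (assumption)
colors = greedy_color(nodes, sigma=3)

# Group cells by color: each group could be processed concurrently.
groups = {}
for p, c in colors.items():
    groups.setdefault(c, []).append(p)
print(len(groups), [len(g) for g in groups.values()])
```

A larger sigma gives stronger independence within a color but fewer nodes per color, which is the trade-off between $\sigma = 6$ and $\sigma = 3$ that the abstract describes.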

preprint · 2016 · arXiv

An efficient preconditioner for the fast simulation of a 2D Stokes flow in porous media

We consider an efficient preconditioner for boundary integral equation (BIE) formulations of the two-dimensional Stokes equations in porous media. While BIEs are well-suited for resolving the complex porous geometry, they lead to a dense linear system of equations that is computationally expensive to solve for large problems. This expense is further amplified when a significant number of iterations is required in an iterative Krylov solver such as GMRES. In this paper, we apply a fast inexact direct solver, the inverse fast multipole method (IFMM), as an efficient preconditioner for GMRES. This solver is based on the framework of $\mathcal{H}^{2}$-matrices and uses low-rank compressions to approximate certain matrix blocks. It has a tunable accuracy $\varepsilon$ and a computational cost that scales as $\mathcal{O} (N \log^2 1/\varepsilon)$. We discuss various numerical benchmarks that validate the accuracy and confirm the efficiency of the proposed method. We demonstrate with several types of boundary conditions that the preconditioner is capable of significantly accelerating the convergence of GMRES when compared to a simple block-diagonal preconditioner, especially for pipe flow problems involving many pores.

preprint · 2016 · arXiv

The inverse fast multipole method: using a fast approximate direct solver as a preconditioner for dense linear systems

Although some preconditioners are available for solving dense linear systems, there are still many matrices for which preconditioners are lacking, in particular in cases where the size of the matrix $N$ becomes very large. There remains hence a great need to develop general purpose preconditioners whose cost scales well with the matrix size $N$. In this paper, we propose a preconditioner with broad applicability and with cost $\mathcal{O}(N)$ for dense matrices, when the matrix is given by a smooth kernel. Extending the method using the same framework to general $\mathcal{H}^2$-matrices is relatively straightforward. These preconditioners have a controlled accuracy (machine accuracy can be achieved if needed) and scale linearly with $N$. They are based on an approximate direct solve of the system. The linear scaling of the algorithm is achieved by means of two key ideas. First, the $\mathcal{H}^2$-structure of the dense matrix is exploited to obtain an extended sparse system of equations. Second, fill-ins arising when performing the elimination are compressed as low-rank matrices if they correspond to well-separated interactions. This ensures that the sparsity pattern of the extended sparse matrix is preserved throughout the elimination, hence resulting in a very efficient algorithm with $\mathcal{O}(N \log(1/\varepsilon)^2 )$ computational cost and $\mathcal{O}(N \log 1/\varepsilon )$ memory requirement, for an error tolerance $0 < \varepsilon < 1$. The solver is inexact, although the error can be controlled and made as small as needed. These solvers are related to ILU in the sense that the fill-in is controlled. However, in ILU, most of the fill-in is simply discarded whereas here it is approximated using low-rank blocks, with a prescribed tolerance. Numerical examples are discussed to demonstrate the linear scaling of the method and to illustrate its effectiveness as a preconditioner.
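
The first key idea, trading a dense system for a larger sparse one, can be shown on a toy case. The example is a deliberate simplification and an assumption of this sketch: the matrix is diagonal plus rank one rather than a true $\mathcal{H}^2$ matrix, there is no compression of fill-in, and `solve` is a small hypothetical helper using exact rational arithmetic. Introducing the auxiliary unknown y = v^T x turns the dense system (D + u v^T) x = b into a sparse system in (x, y).

```python
from fractions import Fraction


def solve(a, b):
    """Gauss-Jordan elimination with row swaps, exact over the rationals."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        p = next(i for i in range(c, n) if m[i][c] != 0)
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [x / piv for x in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [row[n] for row in m]


n = 3
d = [2, 3, 4]   # diagonal part D
u = [1, 1, 1]   # low-rank factors: A = D + u v^T
v = [1, 2, 3]
b = [1, 2, 3]

# Dense matrix: every entry is nonzero.
dense = [[(d[i] if i == j else 0) + u[i] * v[j] for j in range(n)]
         for i in range(n)]

# Extended sparse system in (x, y):  D x + u y = b,  v^T x - y = 0.
extended = [[d[i] if i == j else 0 for j in range(n)] + [u[i]] for i in range(n)]
extended.append(v + [-1])

x_dense = solve(dense, b)
x_ext = solve(extended, b + [0])
print(x_dense, x_ext[:n])
```

The extended matrix is larger but has an arrow-shaped sparsity pattern, so elimination on it touches far fewer entries than elimination on the dense form.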

preprint · 2015 · arXiv

A Fast and Memory Efficient Sparse Solver with Applications to Finite-Element Matrices

In this article, we introduce a fast and memory efficient solver for sparse matrices arising from the finite element discretization of elliptic partial differential equations (PDEs). We use a fast direct (but approximate) multifrontal solver as a preconditioner, and use an iterative solver to achieve a desired accuracy. This approach combines the advantages of direct and iterative schemes to arrive at a fast, robust and accurate solver. We will show that this solver is faster ($\sim$ 2x) and more memory efficient ($\sim$ 2--3x) than a conventional direct multifrontal solver. Furthermore, we will demonstrate that the solver is both a faster and more effective preconditioner than other preconditioners such as the incomplete LU preconditioner. Specific speed-ups depend on the matrix size and improve as the size of the matrix increases. The solver can be applied to both structured and unstructured meshes in a similar manner. We build on our previous work and utilize the fact that dense frontal and update matrices, in the multifrontal algorithm, can be represented as hierarchically off-diagonal low-rank (HODLR) matrices. Using this idea, we replace all large dense matrix operations in the multifrontal elimination process with $O(N)$ HODLR operations to arrive at a faster and more memory efficient solver.

preprint · 2015 · arXiv

A Fast Block Low-Rank Dense Solver with Applications to Finite-Element Matrices

This article presents a fast solver for the dense "frontal" matrices that arise from the multifrontal sparse elimination process of 3D elliptic PDEs. The solver relies on the fact that these matrices can be efficiently represented as a hierarchically off-diagonal low-rank (HODLR) matrix. To construct the low-rank approximation of the off-diagonal blocks, we propose a new pseudo-skeleton scheme, the boundary distance low-rank approximation, that picks rows and columns based on the location of their corresponding vertices in the sparse matrix graph. We compare this new low-rank approximation method to the adaptive cross approximation (ACA) algorithm and show that it achieves better speedup, especially for unstructured meshes. Using the HODLR direct solver as a preconditioner (with a low tolerance) to the GMRES iterative scheme, we can reach machine accuracy much faster than a conventional LU solver. Numerical benchmarks are provided for frontal matrices arising from 3D finite element problems corresponding to a wide range of applications.

preprint · 2015 · arXiv

A New Sparse Matrix Vector Multiplication GPU Algorithm Designed for Finite Element Problems

Recently, graphics processors (GPUs) have been increasingly leveraged in a variety of scientific computing applications. However, architectural differences between CPUs and GPUs necessitate the development of algorithms that take advantage of GPU hardware. As sparse matrix vector multiplication (SPMV) operations are commonly used in finite element analysis, a new SPMV algorithm and several variations are developed for unstructured finite element meshes on GPUs. The effective bandwidth of current GPU algorithms and the newly proposed algorithms are measured and analyzed for 15 sparse matrices of varying sizes and varying sparsity structures. The effects of optimization and differences between the new GPU algorithm and its variants are then subsequently studied. Lastly, both new and current SPMV GPU algorithms are utilized in the GPU CG Solver in GPU finite element simulations of the heart. These results are then compared against parallel PETSc finite element implementation results. The effective bandwidth tests indicate that the new algorithms compare very favorably with current algorithms for a wide variety of sparse matrices and can yield very notable benefits. GPU finite element simulation results demonstrate the benefit of using GPUs for finite element analysis, and also show that the proposed algorithms can yield speedup factors up to 12-fold for real finite element applications.
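
For readers unfamiliar with the operation being optimized, the sketch below shows a plain sparse matrix vector product in compressed sparse row (CSR) format. It is the textbook sequential kernel, not the GPU algorithm proposed in the paper; the function name `csr_spmv` and the small test matrix are illustrative assumptions.

```python
def csr_spmv(indptr, indices, data, x):
    """Compute y = A x for A stored in CSR form.

    indptr[row] .. indptr[row + 1] delimits the nonzeros of each row,
    indices holds their column numbers, data holds their values.
    """
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [4.0, 1.0, 3.0, 2.0, 5.0]

y = csr_spmv(indptr, indices, data, [1.0, 2.0, 3.0])
print(y)
```

Each nonzero is read once and used in a single multiply-add, so the kernel is limited by memory traffic rather than arithmetic, which is why the abstract reports results in terms of effective bandwidth.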

preprint2015arXiv

Optimizing the adaptive fast multipole method for fractal sets

We have performed a detailed analysis of the fast multipole method (FMM) in the adaptive case, in which the depth of the FMM tree is non-uniform. Previous works in this area have focused mostly on special types of adaptive distributions, for example when points accumulate on a 2D manifold or accumulate around a few points in space. Instead, we considered a more general situation in which fractal sets, e.g., Cantor sets and generalizations, are used to create adaptive sets of points. Such sets are characterized by their dimension, a number between 0 and 3. We introduced a mathematical framework to define a converging sequence of octrees, and based on that, demonstrated how to increase $N \to \infty$. A new complexity analysis for the adaptive FMM is introduced. It is shown that the ${\cal{O}}(N)$ complexity is achievable for any distribution of particles, when a modified adaptive FMM is exploited. We analyzed how the FMM performs for fractal point distributions, and how optimal parameters can be picked, e.g., the criterion used to stop the subdivision of an FMM cell. A new subdividing double-threshold method is introduced, and better performance demonstrated. Parameters in the FMM are modeled as a function of particle distribution dimension, and the optimal values are obtained. A three dimensional kernel independent black box adaptive FMM is implemented and used for all calculations.
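As a rough illustration of why fractal point sets produce trees of non-uniform depth, the sketch below builds a one-dimensional middle-thirds Cantor set and bisects cells until each holds at most `max_pts` points. This is a simplified 1D analogue under stated assumptions, not the paper's 3D octree or its double-threshold criterion; the names `cantor_points`, `leaf_depths`, `max_pts` and `max_depth` are hypothetical.

```python
from typing import List


def cantor_points(level: int) -> List[float]:
    """Left endpoints of the 2**level intervals of the level-`level` Cantor set."""
    pts = [0.0]
    width = 1.0
    for _ in range(level):
        width /= 3.0
        # Each interval keeps its left third (same start) and its right third.
        pts = [p for q in pts for p in (q, q + 2.0 * width)]
    return pts


def leaf_depths(
    pts: List[float],
    lo: float,
    hi: float,
    max_pts: int,
    depth: int = 0,
    max_depth: int = 32,
) -> List[int]:
    """Bisect [lo, hi) until a cell holds at most `max_pts` points.

    Returns the depth of every non-empty leaf cell; empty cells are dropped,
    which is what makes the tree adaptive.
    """
    if not pts:
        return []
    if len(pts) <= max_pts or depth >= max_depth:
        return [depth]
    mid = 0.5 * (lo + hi)
    left = [p for p in pts if p < mid]
    right = [p for p in pts if p >= mid]
    return (
        leaf_depths(left, lo, mid, max_pts, depth + 1, max_depth)
        + leaf_depths(right, mid, hi, max_pts, depth + 1, max_depth)
    )


points = cantor_points(6)
depths = leaf_depths(points, 0.0, 1.0, max_pts=4)
print("leaves:", len(depths), "min depth:", min(depths), "max depth:", max(depths))
```

The stopping threshold `max_pts` plays the role of the subdivision criterion mentioned in the abstract: a smaller value gives deeper trees with more leaves, and a larger value gives shallower trees with more points per leaf.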

preprint2014arXiv

The Inverse Fast Multipole Method

This article introduces a new fast direct solver for linear systems arising from a wide range of applications, such as integral equations, multivariate statistics, and radial basis interpolation. The highlight of this new fast direct solver is that it scales linearly in the number of unknowns in all dimensions. The solver, termed the Inverse Fast Multipole Method (IFMM), works on the same data structure as the Fast Multipole Method (FMM). More generally, the solver can be immediately extended to the class of hierarchical matrices, denoted as $\mathcal{H}^2$ matrices, with strong admissibility criteria (weak low-rank structure), i.e., the interaction between neighboring clusters of particles is full-rank, whereas the interaction between particles in well-separated clusters can be efficiently represented as a low-rank matrix. The algorithm departs from existing approaches in that the interactions corresponding to neighboring clusters are always treated as full-rank. Our approach relies on two major ideas: (i) The $N \times N$ matrix arising out of the FMM (from now on termed the FMM matrix) can be represented as an extended sparser matrix of size $M \times M$, where $M \approx 3N$. (ii) While solving the larger extended sparser matrix, the fill-ins that arise in the matrix blocks corresponding to well-separated clusters are hierarchically compressed. The ordering of the equations and the unknowns in the extended sparser matrix is strongly related to the local and multipole coefficients in the FMM [Greengard and Rokhlin, 1987], and the order of elimination is different from the usual nested dissection approach. Numerical benchmarks on 2D manifolds confirm the linear scaling of the algorithm.

preprint2013arXiv

Computing reaction rates in bio-molecular systems using discrete macro-states

Computing reaction rates in biomolecular systems is a common goal of molecular dynamics simulations. The reactions considered often involve conformational changes in the molecule, either changes in the structure of a protein or the relative position of two molecules, for example when modeling the binding of a protein and ligand. Here we will consider the general problem of computing the rate of transfer from a subset A of the conformational space Omega to a subset B of Omega. It is assumed that A and B are associated with minimum energy basins and are long-lived states. Rates can be obtained using many different methods. We review some of the most popular approaches. We organize the different approaches roughly in chronological order and under four main categories. The first three are reactive flux, transition path sampling, and conformation dynamics. The fourth class of methods, to which we do not give any specific name, in some sense attempts to combine features from transition path sampling and conformation dynamics. It includes non-equilibrium umbrella sampling (Warmflash et al. [2007], Dickson et al. [2009b]) and weighted ensemble dynamics (Huber and Kim [1996]).

preprint2013arXiv

Method and Advantages of Genetic Algorithms in Parameterization of Interatomic Potentials: Metal-Oxides

The method and the advantages of an evolutionary computing based approach using a steady state genetic algorithm (GA) for the parameterization of interatomic potentials for metal oxides within the shell model framework are developed and described. We show that the GA based methodology for the parameterization of interatomic force field functions is capable of (a) simultaneously optimizing multiple phases or properties of a material in a single run, (b) incrementally re-optimizing the whole system as more data becomes available for either additional phases or material properties not included in previous runs, and (c) successfully performing global optimization in the presence of multiple local minima in the parameter space. As an example, we apply the method towards simultaneous optimization of four distinct crystalline phases of Barium Titanate (BaTiO3 or BTO) using an ab initio density functional theory (DFT) based reference dataset. We find that the optimized force field function is capable of predicting the two phases not used in the optimization procedure, and that many derived physical properties such as the equilibrium lattice constants, unit cell volume, elastic properties, coefficient of thermal expansion, and average electronic polarization are in good agreement with the experimental results available from the literature.
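To make the term "steady state" concrete, here is a minimal sketch of a steady-state GA on a toy two-parameter fitting problem. It is not the authors' implementation: the selection, crossover and mutation operators, the function `steady_state_ga`, and the toy loss are all assumptions chosen for brevity. The defining feature shown is that each step creates one child and replaces at most one individual, instead of regenerating the whole population.

```python
import random
from typing import Callable, List, Tuple


def steady_state_ga(
    loss: Callable[[List[float]], float],
    n_params: int,
    pop_size: int = 30,
    n_steps: int = 2000,
    sigma: float = 0.1,
    seed: int = 0,
) -> Tuple[List[float], float, float]:
    """Minimize `loss`; returns (best_params, best_loss, initial_best_loss)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_params)] for _ in range(pop_size)]
    scores = [loss(ind) for ind in pop]
    initial_best = min(scores)

    def pick() -> List[float]:
        # Binary tournament: draw two individuals, keep the one with lower loss.
        a, b = rng.randrange(pop_size), rng.randrange(pop_size)
        return pop[a] if scores[a] <= scores[b] else pop[b]

    for _ in range(n_steps):
        p1, p2 = pick(), pick()
        # Uniform crossover followed by Gaussian mutation of every gene.
        child = [
            (g1 if rng.random() < 0.5 else g2) + rng.gauss(0.0, sigma)
            for g1, g2 in zip(p1, p2)
        ]
        child_score = loss(child)
        # Steady-state replacement: the child displaces the current worst
        # individual only if it is better, so the best is never lost.
        worst = max(range(pop_size), key=scores.__getitem__)
        if child_score < scores[worst]:
            pop[worst], scores[worst] = child, child_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best], initial_best


def toy_loss(p: List[float]) -> float:
    # Hypothetical stand-in for a weighted misfit summed over several phases.
    return (p[0] - 0.3) ** 2 + (p[1] + 0.5) ** 2


best, best_loss, initial_best = steady_state_ga(toy_loss, n_params=2)
```

Because the population persists between steps, new reference data could in principle be folded in by changing `loss` and continuing from the existing population, which is one way to read the incremental re-optimization described in point (b).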

preprint2012arXiv

Optimized M2L Kernels for the Chebyshev Interpolation based Fast Multipole Method

A fast multipole method (FMM) for asymptotically smooth kernel functions (1/r, 1/r^4, Gauss and Stokes kernels, radial basis functions, etc.) based on a Chebyshev interpolation scheme has been introduced in [Fong et al., 2009]. The method has been extended to oscillatory kernels (e.g., Helmholtz kernel) in [Messner et al., 2012]. Besides its generality, this FMM turns out to be favorable due to its easy implementation and its high performance based on intensive use of highly optimized BLAS libraries. However, one of its bottlenecks is the precomputation of the multipole-to-local (M2L) operator, and its higher number of floating point operations (flops) compared to other FMM formulations. Here, we present several optimizations for that operator, which is known to be the costliest FMM operator. The most efficient ones not only reduce the precomputation time by a factor of up to 340 but also speed up the matrix-vector product. We conclude with comparisons and numerical validations of all presented optimizations.
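The idea behind a Chebyshev-interpolation FMM can be sketched in one dimension: for well-separated intervals, the kernel is approximated as $K(x, y) \approx \sum_l \sum_m S_l(x)\, K(\bar{x}_l, \bar{y}_m)\, S_m(y)$, where $\bar{x}_l$ and $\bar{y}_m$ are Chebyshev nodes and $S_l$, $S_m$ are interpolation basis functions. The small matrix of kernel values at the nodes plays the role of the M2L operator. The code below is a hedged illustration only: it uses a plain Lagrange basis and the helper names `cheb_nodes`, `lagrange_basis` and `interpolated_kernel`, none of which come from the paper.

```python
import math
from typing import List


def cheb_nodes(n: int, a: float, b: float) -> List[float]:
    """n Chebyshev nodes of the first kind, mapped from [-1, 1] to [a, b]."""
    return [
        0.5 * (a + b) + 0.5 * (b - a) * math.cos((2 * k + 1) * math.pi / (2 * n))
        for k in range(n)
    ]


def lagrange_basis(nodes: List[float], x: float) -> List[float]:
    """Values at x of the Lagrange basis polynomials defined on `nodes`."""
    weights = []
    for i, xi in enumerate(nodes):
        w = 1.0
        for j, xj in enumerate(nodes):
            if j != i:
                w *= (x - xj) / (xi - xj)
        weights.append(w)
    return weights


def interpolated_kernel(
    m2l: List[List[float]],
    x_nodes: List[float],
    y_nodes: List[float],
    x: float,
    y: float,
) -> float:
    """Evaluate sum_l sum_m S_l(x) * m2l[l][m] * S_m(y)."""
    sx = lagrange_basis(x_nodes, x)
    sy = lagrange_basis(y_nodes, y)
    return sum(
        sx[l] * m2l[l][m] * sy[m]
        for l in range(len(x_nodes))
        for m in range(len(y_nodes))
    )


def inv_r(x: float, y: float) -> float:
    return 1.0 / abs(x - y)


# Target interval [0, 1] and source interval [3, 4] are well separated.
x_nodes = cheb_nodes(8, 0.0, 1.0)
y_nodes = cheb_nodes(8, 3.0, 4.0)
# Precomputed node-to-node kernel values: the M2L operator in this sketch.
m2l = [[inv_r(xl, ym) for ym in y_nodes] for xl in x_nodes]

approx = interpolated_kernel(m2l, x_nodes, y_nodes, 0.3, 3.7)
exact = inv_r(0.3, 3.7)
print("absolute error:", abs(approx - exact))
```

Only kernel evaluations at the nodes are needed, which is why the scheme applies to many kernels without kernel-specific expansions. In three dimensions the node-to-node matrix grows with the cube of the interpolation order per box pair, which is consistent with the abstract naming its precomputation and application as the bottleneck.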

preprint2012arXiv

Pipelining the Fast Multipole Method over a Runtime System

Fast Multipole Methods (FMM) are a fundamental operation for the simulation of many physical problems. The high performance design of such methods usually requires careful tuning of the algorithm for both the targeted physics and the hardware. In this paper, we propose a new approach that achieves high performance across architectures. Our method consists of expressing the FMM algorithm as a task flow and employing a state-of-the-art runtime system, StarPU, in order to process the tasks on the different processing units. We carefully design the task flow, the mathematical operators, their Central Processing Unit (CPU) and Graphics Processing Unit (GPU) implementations, as well as scheduling schemes. We compute potentials and forces of 200 million particles in 48.7 seconds on a homogeneous 160-core SGI Altix UV 100 and of 38 million particles in 13.34 seconds on a heterogeneous 12-core Intel Nehalem processor enhanced with 3 Nvidia M2090 Fermi GPUs.

preprint2011arXiv

Extension and optimization of the FIND algorithm: computing Green's and less-than Green's functions (with technical appendix)

The FIND algorithm is a fast algorithm designed to calculate certain entries of the inverse of a sparse matrix. Such calculation is critical in many applications, e.g., quantum transport in nano-devices. We extended the algorithm to other matrix inverse related calculations. Those are required for example to calculate the less-than Green's function and the current density through the device. For a 2D device discretized as an $N_x \times N_y$ mesh, the best known algorithms have a running time of $O(N_x^3 N_y)$, whereas FIND only requires $O(N_x^2 N_y)$. Even though this complexity has been reduced by an order of magnitude, the matrix inverse calculation is still the most time consuming part in the simulation of transport problems. We could not reduce the order of complexity, but we were able to significantly reduce the constant factor involved in the computation cost. By exploiting the sparsity and symmetry, the size of the problem beyond which FIND is faster than other methods typically decreases from a $130 \times 130$ 2D mesh down to a $40 \times 40$ mesh. These improvements make the optimized FIND algorithm even more competitive for real-life applications.

preprint2011arXiv

Fourier Based Fast Multipole Method for the Helmholtz Equation

The fast multipole method (FMM) has had great success in reducing the computational complexity of solving the boundary integral form of the Helmholtz equation. We present a formulation of the Helmholtz FMM that uses Fourier basis functions rather than spherical harmonics. By modifying the transfer function in the precomputation stage of the FMM, time-critical stages of the algorithm are accelerated by causing the interpolation operators to become straightforward applications of fast Fourier transforms, retaining the diagonality of the transfer function, and providing a simplified error analysis. Using Fourier analysis, constructive algorithms are derived to a priori determine an integration quadrature for a given error tolerance. Sharp error bounds are derived and verified numerically. Various optimizations are considered to reduce the number of quadrature points and reduce the cost of computing the transfer function.

33 published item(s)

Necessary and Sufficient Conditions for the Existence of an LU Factorization for General Rank Deficient Matrices

SpectraQuery: A Hybrid Retrieval-Augmented Conversational Assistant for Battery Science

Hierarchical Orthogonal Factorization: Sparse Least Squares Problems

Towards a Scalable Hierarchical High-order CFD Solver

An Algebraic Sparsified Nested Dissection Algorithm Using Low-Rank Approximations

Anomaly Detection with Domain Adaptation

Coupled Time-lapse Full Waveform Inversion for Subsurface Flow Problems using Intrusive Automatic Differentiation

Inverse Modeling of Viscoelasticity Materials using Physics Constrained Learning

Isogeometric Collocation Method for the Fractional Laplacian in the 2D Bounded Domain

Learning Constitutive Relations from Indirect Observations Using Deep Neural Networks

Learning Constitutive Relations using Symmetric Positive Definite Neural Networks

Memory Augmented Generative Adversarial Networks for Anomaly Detection

Out-of-Vocabulary Embedding Imputation with Grounded Language Information by Graph Convolutional Networks

Physics Constrained Learning for Data-driven Inverse Modeling from Sparse Observations

Regularized Cycle Consistent Generative Adversarial Network for Anomaly Detection

Second Order Accurate Hierarchical Approximate Factorization of Sparse SPD Matrices

Sparse Hierarchical Preconditioners Using Piecewise Smooth Approximations of Eigenvectors

TaskTorrent: a Lightweight Distributed Task-Based Runtime System in C++

Fast Low-Rank Kernel Matrix Factorization through Skeletonized Interpolation

Parallelization of the inverse fast multipole method with an application to boundary element method

An efficient preconditioner for the fast simulation of a 2D Stokes flow in porous media

The inverse fast multipole method: using a fast approximate direct solver as a preconditioner for dense linear systems

A Fast and Memory Efficient Sparse Solver with Applications to Finite-Element Matrices

A Fast Block Low-Rank Dense Solver with Applications to Finite-Element Matrices

A New Sparse Matrix Vector Multiplication GPU Algorithm Designed for Finite Element Problems

Optimizing the adaptive fast multipole method for fractal sets

The Inverse Fast Multipole Method

Computing reaction rates in bio-molecular systems using discrete macro-states

Method and Advantages of Genetic Algorithms in Parameterization of Interatomic Potentials: Metal-Oxides

Optimized M2L Kernels for the Chebyshev Interpolation based Fast Multipole Method

Pipelining the Fast Multipole Method over a Runtime System

Extension and optimization of the FIND algorithm: computing Green's and less-than Green's functions (with technical appendix)

Fourier Based Fast Multipole Method for the Helmholtz Equation