Source author record

Mauricio Araya-Polo

Mauricio Araya-Polo appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Distributed, Parallel, and Cluster Computing; Machine Learning; physics.geo-ph; Computer Vision; eess.SP; math.NA; Mathematical Software; Performance; physics.flu-dyn

Catalog footprint

What is connected

11 works

9 topics

4 close collaborators


preprint, 2026, arXiv

Accelerated Full Waveform Inversion by Deep Compressed Learning

We propose and test a method to reduce the dimensionality of Full Waveform Inversion (FWI) inputs as a computational cost mitigation approach. Given modern seismic acquisition systems, the data (as input for FWI) required for an industrial-strength case is at the terabyte level of storage; therefore, solving complex subsurface cases or exploring multiple scenarios with FWI becomes prohibitive. The proposed method utilizes a deep neural network with a binarized sensing layer that learns, by compressed learning, a succinct but consequential seismic acquisition layout from a large corpus of subsurface models. Thus, given a large seismic data set to invert, the trained network selects a smaller subset of the data; then, using representation learning, an autoencoder computes latent representations of the data, followed by K-means clustering of the latent representations to further select the most relevant data for FWI. Effectively, this approach can be seen as a hierarchical selection. The proposed approach consistently outperforms random data sampling, even when utilizing only 10% of the data for 2D FWI. These results pave the way to accelerating FWI in large-scale 3D inversion.
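The last stage of the hierarchical selection described in this abstract (K-means clustering of latent representations, then keeping the most relevant data) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes the latent vectors have already been produced by some autoencoder, and the names `kmeans` and `select_representatives`, the seeded random initialization, and the rule "keep the sample nearest each centroid" are all assumptions made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Hypothetical initialization: k distinct samples drawn at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # assignments are stable, so the centroids stopped moving
        centroids = new_centroids
    return centroids, labels


def select_representatives(latents, k, seed=0):
    """Return sorted indices of the sample closest to each centroid."""
    centroids, _ = kmeans(latents, k, seed=seed)
    chosen = set()
    for c in centroids:
        chosen.add(min(range(len(latents)), key=lambda i: sq_dist(latents[i], c)))
    return sorted(chosen)


if __name__ == "__main__":
    # Toy latent vectors forming two well-separated groups.
    latents = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
               [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
    print(select_representatives(latents, k=2))
```

Picking the sample nearest each centroid is one simple way to turn clusters into a data subset; other rules, such as sampling several members per cluster, would fit the same description.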

preprint, 2022, arXiv

Automatic Interpretative Image-Focusing Analysis

The focusing of a seismic image is directly linked to the accuracy of the velocity model. Therefore, a critical step in a seismic imaging workflow is to perform a focusing analysis on a seismic image to determine velocity errors. While the offset/aperture-angle axis is frequently used for focusing analysis, the physical (i.e., midpoint) axes of seismic images tend to be ignored as focusing analysis of geological structures is highly interpretative and difficult to automate. We have developed an automatic data-driven approach using convolutional neural networks to automate image-focusing analysis. Using focused and unfocused geological faults, we show that our method can make use of both spatial and offset/angle focusing information to robustly estimate velocity errors within seismic images. We demonstrate that our method correctly estimates velocity errors from a 2D Gulf of Mexico limited-aperture image where a traditional semblance-based approach fails. We also show that our method has the added benefit of improving the interpretation of faults within the image.

preprint, 2022, arXiv

Encoder-Decoder Architecture for 3D Seismic Inversion

Inverting seismic data to build 3D geological structures is a challenging task due to the overwhelming amount of acquired seismic data and the very high computational load of the iterative numerical solutions of the wave equation required by industry-standard tools such as Full Waveform Inversion (FWI). For example, in an area with surface dimensions of 4.5 km $\times$ 4.5 km, hundreds of seismic shot-gather cubes are required for 3D model reconstruction, leading to terabytes of recorded data. This paper presents a deep learning solution for the reconstruction of realistic 3D models in the presence of field noise recorded in seismic surveys. We implement and analyze a convolutional encoder-decoder architecture that efficiently processes the entire collection of hundreds of seismic shot-gather cubes. The proposed solution demonstrates that realistic 3D models can be reconstructed with a structural similarity index measure (SSIM) of 0.8554 (out of 1.0) in the presence of field noise at a 10 dB signal-to-noise ratio.

preprint, 2022, arXiv

LSTM-driven Forecast of CO2 Injection in Porous Media

The ability to simulate the partial differential equations (PDEs) that govern multi-phase flow in porous media is essential for applications such as geologic sequestration of CO2, groundwater flow monitoring and hydrocarbon recovery from geologic formations [1]. These multi-phase flow problems can be simulated by solving the governing PDEs numerically, using various discretization schemes such as finite elements, finite volumes and spectral methods. More recently, the application of Machine Learning (ML) to approximate the solutions to PDEs has been a very active research area. However, most researchers have focused on the performance of their models within the time-space domain in which the models were trained. In this work, we apply ML techniques to approximate PDE solutions and focus on the forecasting problem outside of the training domain. To this end, we use two different ML architectures, the feed-forward neural network (FFN) and the long short-term memory (LSTM)-based neural network, to predict the PDE solutions at future times based on knowledge of the solutions in the past. The results of our methodology are presented on two example PDEs: a form of PDE that models the underground injection of CO2, and its hyperbolic limit, which is a common benchmark case. In both cases, the LSTM architecture shows a huge potential to predict the solution behavior at future times based on prior data.

preprint, 2022, arXiv

Massively scalable stencil algorithm

Stencil computations lie at the heart of many scientific and industrial applications. Unfortunately, stencil algorithms perform poorly on machines with a cache-based memory hierarchy, due to low re-use of memory accesses. This work shows that, for stencil computation, a novel algorithm that leverages a localized communication strategy effectively exploits the Cerebras WSE-2, which has no cache hierarchy. This study focuses on a 25-point stencil finite-difference method for the 3D wave equation, a kernel frequently used in numerical simulation for earth modeling. In essence, the algorithm trades memory accesses for data communication and takes advantage of the fast communication fabric provided by the architecture. The algorithm, historically memory bound, becomes compute bound. This allows the implementation to achieve near perfect weak scaling, reaching up to 503 TFLOPs on WSE-2, a figure that only full clusters can eventually yield.
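To make the kernel shape concrete, here is a minimal serial sketch of a 25-point finite-difference stencil for the 3D wave equation. It assumes the common star-shaped layout (the centre point plus four neighbours on each side along each of the three axes, which gives 8th-order accuracy in space) and a second-order leapfrog update in time; the abstract does not state these details, so they are assumptions, as are the function names. It says nothing about the WSE-2 communication strategy, which is the paper's actual contribution.

```python
# 8th-order central-difference weights for a second derivative (unit spacing).
COEFFS = [-205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0]
RADIUS = 4  # four neighbours per side, per axis: 1 + 3 * 2 * 4 = 25 points


def laplacian_25pt(u, i, j, k, h=1.0):
    """25-point star-stencil Laplacian of u at interior point (i, j, k)."""
    acc = 3.0 * COEFFS[0] * u[i][j][k]  # centre weight, once per axis
    for r in range(1, RADIUS + 1):
        acc += COEFFS[r] * (
            u[i - r][j][k] + u[i + r][j][k]
            + u[i][j - r][k] + u[i][j + r][k]
            + u[i][j][k - r] + u[i][j][k + r]
        )
    return acc / (h * h)


def wave_step(u_prev, u_curr, velocity, dt, h=1.0):
    """One leapfrog time step on a cubic grid.

    Cells closer than RADIUS to a face are left at zero, a crude
    stand-in for real boundary conditions.
    """
    n = len(u_curr)
    u_next = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    factor = (velocity * dt) ** 2
    for i in range(RADIUS, n - RADIUS):
        for j in range(RADIUS, n - RADIUS):
            for k in range(RADIUS, n - RADIUS):
                lap = laplacian_25pt(u_curr, i, j, k, h)
                u_next[i][j][k] = (
                    2.0 * u_curr[i][j][k] - u_prev[i][j][k] + factor * lap
                )
    return u_next


if __name__ == "__main__":
    n = 9
    # u = x^2 + y^2 + z^2 has an exact Laplacian of 6 everywhere.
    u = [[[float(i * i + j * j + k * k) for k in range(n)]
          for j in range(n)] for i in range(n)]
    print(laplacian_25pt(u, 4, 4, 4))
```

Each output point reads 25 input values and performs only a few dozen floating-point operations, which is why the abstract describes this kind of kernel as historically memory bound.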

preprint, 2020, arXiv

Accelerating High-Order Stencils on GPUs

Stencil computations are widely used in HPC applications. Today, many HPC platforms use GPUs as accelerators. As a result, understanding how to perform stencil computations fast on GPUs is important. While implementation strategies for low-order stencils on GPUs have been well studied in the literature, not all of the proposed enhancements work well for high-order stencils, such as those used for seismic modeling. Furthermore, coping with boundary conditions often requires different computational logic, which complicates efficient exploitation of the thread-level parallelism on GPUs. In this paper, we study high-order stencils and their unique characteristics on GPUs. We manually crafted a collection of implementations of a 25-point seismic modeling stencil in CUDA and related boundary conditions. We evaluate their code shapes, memory hierarchy usage, data-fetching patterns, and other performance attributes. We conducted an empirical evaluation of these stencils using several mature and emerging tools and discuss our quantitative findings. Among our implementations, we achieve twice the performance of a proprietary code developed in C and mapped to GPUs using OpenACC. Additionally, several of our implementations have excellent performance portability.

preprint, 2020, arXiv

Deep Neural Network Learning with Second-Order Optimizers -- a Practical Study with a Stochastic Quasi-Gauss-Newton Method

Training in supervised deep learning is computationally demanding, and the convergence behavior is usually not fully understood. We introduce and study a second-order stochastic quasi-Gauss-Newton (SQGN) optimization method that combines ideas from stochastic quasi-Newton methods, Gauss-Newton methods, and variance reduction to address this problem. SQGN provides excellent accuracy without the need for experimenting with many hyper-parameter configurations, which is often computationally prohibitive given the number of combinations and the cost of each training process. We discuss the implementation of SQGN with TensorFlow, and we compare its convergence and computational performance to selected first-order methods using the MNIST benchmark and a large-scale seismic tomography application from Earth science.

preprint, 2020, arXiv

Minimod: A Finite Difference solver for Seismic Modeling

This article introduces MiniMod, a mini application that serves as a benchmark for seismic modeling using the finite difference method. The purpose is, on one hand, to provide a benchmark suite that is easy to build and adapt to the state of the art in programming models and the changing high-performance hardware landscape. On the other hand, the intention is to have a proxy application for actual production geophysical exploration workloads for Oil & Gas exploration and other geoscience applications based on the wave equation. From top to bottom, we describe the design concepts, algorithms and code structure of the application, and present benchmark results on different current computer architectures.

preprint, 2016, arXiv

A survey of sparse matrix-vector multiplication performance on large matrices

We contribute a third-party survey of sparse matrix-vector (SpMV) product performance on industrial-strength, large matrices using: (1) The SpMV implementations in Intel MKL, the Trilinos project (Tpetra subpackage), the CUSPARSE library, and the CUSP library, each running on modern architectures. (2) NVIDIA GPUs and Intel multi-core CPUs (supported by each software package). (3) The CSR, BSR, COO, HYB, and ELL matrix formats (supported by each software package).
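For readers unfamiliar with the formats named above, the following sketch shows what an SpMV product looks like for two of them, CSR and COO. It is a plain serial illustration written for this page, not code from MKL, Trilinos, CUSPARSE or CUSP; the function names and the 3x3 example matrix are made up for the example.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """y = A x with A in CSR form.

    row_ptr[r]..row_ptr[r + 1] delimits the nonzeros of row r inside
    col_idx (column indices) and values (nonzero entries).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        start, end = row_ptr[row], row_ptr[row + 1]
        y[row] = sum(values[p] * x[col_idx[p]] for p in range(start, end))
    return y


def spmv_coo(rows, cols, values, x, n_rows):
    """y = A x with A in COO form: one (row, col, value) triple per nonzero."""
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, values):
        y[r] += v * x[c]
    return y


if __name__ == "__main__":
    # A = [[4, 0, 1],
    #      [0, 3, 0],
    #      [2, 0, 5]]
    x = [1.0, 2.0, 3.0]
    print(spmv_csr([0, 2, 3, 5], [0, 2, 1, 0, 2], [4.0, 1.0, 3.0, 2.0, 5.0], x))
    print(spmv_coo([0, 0, 1, 2, 2], [0, 2, 1, 0, 2],
                   [4.0, 1.0, 3.0, 2.0, 5.0], x, n_rows=3))
```

Both calls compute the same product; the formats differ in how the nonzeros are indexed, which affects memory traffic and how easily the loop parallelizes.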

preprint, 2016, arXiv

Performance Analysis and Optimization of a Hybrid Distributed Reverse Time Migration Application

Applications to process seismic data employ scalable parallel systems to produce timely results. To fully exploit emerging processor architectures, applications will need to employ threaded parallelism within a node and message passing across nodes. Today, MPI+OpenMP is the preferred programming model for this task. However, tuning hybrid programs for clusters is difficult. Performance tools can help users identify bottlenecks and uncover opportunities for improvement. This poster describes our experiences of applying Rice University's HPCToolkit and hardware performance counters to gain insight into an MPI+OpenMP code that performs Reverse Time Migration (RTM) on a cluster of multicore processors. The tools provided us with insights into the effectiveness of the domain decomposition strategy, the use of threaded parallelism, and functional unit utilization in individual cores. By applying insights obtained from the tools, we were able to improve the performance of the RTM code by roughly 30 percent.

preprint, 2015, arXiv

Learning with a Wasserstein Loss

Learning to predict multi-label outputs is challenging, but in many problems there is a natural metric on the outputs that can be used to improve predictions. In this paper we develop a loss function for multi-label learning, based on the Wasserstein distance. The Wasserstein distance provides a natural notion of dissimilarity for probability measures. Although optimizing with respect to the exact Wasserstein distance is costly, recent work has described a regularized approximation that is efficiently computed. We describe an efficient learning algorithm based on this regularization, as well as a novel extension of the Wasserstein distance from probability measures to unnormalized measures. We also describe a statistical learning bound for the loss. The Wasserstein loss can encourage smoothness of the predictions with respect to a chosen metric on the output space. We demonstrate this property on a real-data tag prediction problem, using the Yahoo Flickr Creative Commons dataset, outperforming a baseline that doesn't use the metric.
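A regularized approximation of the Wasserstein distance is commonly computed with entropic regularization and Sinkhorn matrix scaling, and the sketch below assumes that is the kind of approximation the abstract refers to. It only evaluates the regularized transport cost between two normalized histograms; it does not compute gradients for learning and does not cover the extension to unnormalized measures. The function name, the default `reg` and `iters` values, and the toy histograms are assumptions chosen for illustration.

```python
import math


def sinkhorn_cost(a, b, cost, reg=0.5, iters=500):
    """Entropy-regularized transport cost between histograms a and b.

    a, b : lists of positive weights, each summing to 1
    cost : cost[i][j] is the ground-metric distance between bin i and bin j
    reg  : regularization strength (smaller is closer to exact Wasserstein,
           but the scaling iterations converge more slowly)
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the ground metric.
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) * kernel * diag(v); return the sum of P * cost.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


if __name__ == "__main__":
    # Three bins on a line, ground metric |i - j|.
    cost = [[abs(i - j) for j in range(3)] for i in range(3)]
    pred = [0.8, 0.1, 0.1]
    near = [0.1, 0.8, 0.1]  # mass shifted by one bin
    far = [0.1, 0.1, 0.8]   # mass shifted by two bins
    print(sinkhorn_cost(pred, near, cost), sinkhorn_cost(pred, far, cost))
```

The toy example illustrates the property highlighted in the abstract: a prediction that misses by one bin is penalized less than one that misses by two, because the loss takes the metric on the output space into account.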
