Source author record

Hanqi Guo

Hanqi Guo appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Distributed, Parallel, and Cluster Computing Graphics Artificial Intelligence Machine Learning astro-ph.IM

Catalog footprint

What is connected

6works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

FFCz: Fast Fourier Correction for Spectrum-Preserving Lossy Compression of Scientific Data

This paper introduces a novel technique to preserve spectral features in lossy compression based on a novel fast Fourier correction algorithm\added{ for regular-grid data}. Preserving both spatial and frequency representations of data is crucial for applications such as cosmology, turbulent combustion, and X-ray diffraction, where spatial and frequency views provide complementary scientific insights. In particular, many analysis tasks rely on frequency-domain representations to capture key features, including the power spectrum of cosmology simulations, the turbulent energy spectrum in combustion, and diffraction patterns in reciprocal space for ptychography. However, existing compression methods guarantee accuracy only in the spatial domain while disregarding the frequency domain. To address this limitation, we propose an algorithm that corrects the errors produced by off-the-shelf ``base'' compressors such as SZ3, ZFP, and SPERR, thereby preserving both spatial and frequency representations by bounding errors in both domains. By expressing frequency-domain errors as linear combinations of spatial-domain errors, we derive a region that jointly bounds errors in both domains. Given as input the spatial errors from a base compressor and user-defined error bounds in the spatial and frequency domains, we iteratively project the spatial error vector onto the regions defined by the spatial and frequency constraints until it lies within their intersection. We further accelerate the algorithm using GPU parallelism to achieve practical performance. We validate our approach with datasets from cosmology simulations, X-ray diffraction, combustion simulation, and electroencephalography demonstrating its effectiveness in preserving critical scientific information in both spatial and frequency domains.

preprint2026arXiv

pMSz: A Distributed Parallel Algorithm for Correcting Extrema and Morse Smale Segmentations in Lossy Compression

Lossy compression, widely used by scientists to reduce data from simulations, experiments, and observations, can distort features of interest even under bounded error. Such distortions may compromise downstream analyses and lead to incorrect scientific conclusions in applications such as combustion and cosmology. This paper presents a distributed and parallel algorithm for correcting topological features, specifically, piecewise linear Morse Smale segmentations (PLMSS), which decompose the domain into monotone regions labeled by their corresponding local minima and maxima. While a single GPU algorithm (MSz) exists for PLMSS correction after compression, no methodology has been developed that scales beyond a single GPU for extreme scale data. We identify the key bottleneck in scaling PLMSS correction as the parallel computation of integral paths, a communication-intensive computation that is notoriously difficult to scale. Instead of explicitly computing and correcting integral paths, our algorithm simplifies MSz by preserving steepest ascending and descending directions across all locations, thereby minimizing interprocess communication while introducing negligible additional storage overhead. With this simplified algorithm and relaxed synchronization, our method achieves over 90% parallel efficiency on 128 GPUs on the Perlmutter supercomputer for real world datasets.

preprint2022arXiv

Reinforcement Learning for Load-balanced Parallel Particle Tracing

We explore an online reinforcement learning (RL) paradigm to dynamically optimize parallel particle tracing performance in distributed-memory systems. Our method combines three novel components: (1) a work donation algorithm, (2) a high-order workload estimation model, and (3) a communication cost model. First, we design an RL-based work donation algorithm. Our algorithm monitors workloads of processes and creates RL agents to donate data blocks and particles from high-workload processes to low-workload processes to minimize program execution time. The agents learn the donation strategy on the fly based on reward and cost functions designed to consider processes' workload changes and data transfer costs of donation actions. Second, we propose a workload estimation model, helping RL agents estimate the workload distribution of processes in future computations. Third, we design a communication cost model that considers both block and particle data exchange costs, helping RL agents make effective decisions with minimized communication costs. We demonstrate that our algorithm adapts to different flow behaviors in large-scale fluid dynamics, ocean, and weather simulation data. Our algorithm improves parallel particle tracing performance in terms of parallel efficiency, load balance, and costs of I/O and communication for evaluations with up to 16,384 processors.

preprint2022arXiv

VDL-Surrogate: A View-Dependent Latent-based Model for Parameter Space Exploration of Ensemble Simulations

We propose VDL-Surrogate, a view-dependent neural-network-latent-based surrogate model for parameter space exploration of ensemble simulations that allows high-resolution visualizations and user-specified visual mappings. Surrogate-enabled parameter space exploration allows domain scientists to preview simulation results without having to run a large number of computationally costly simulations. Limited by computational resources, however, existing surrogate models may not produce previews with sufficient resolution for visualization and analysis. To improve the efficient use of computational resources and support high-resolution exploration, we perform ray casting from different viewpoints to collect samples and produce compact latent representations. This latent encoding process reduces the cost of surrogate model training while maintaining the output quality. In the model training stage, we select viewpoints to cover the whole viewing sphere and train corresponding VDL-Surrogate models for the selected viewpoints. In the model inference stage, we predict the latent representations at previously selected viewpoints and decode the latent representations to data space. For any given viewpoint, we make interpolations over decoded data at selected viewpoints and generate visualizations with user-specified visual mappings. We show the effectiveness and efficiency of VDL-Surrogate in cosmological and ocean simulations with quantitative and qualitative evaluations. Source code is publicly available at https://github.com/trainsn/VDL-Surrogate.

preprint2021arXiv

Asynchronous and Load-Balanced Union-Find for Distributed and Parallel Scientific Data Visualization and Analysis

We present a novel distributed union-find algorithm that features asynchronous parallelism and k-d tree based load balancing for scalable visualization and analysis of scientific data. Applications of union-find include level set extraction and critical point tracking, but distributed union-find can suffer from high synchronization costs and imbalanced workloads across parallel processes. In this study, we prove that global synchronizations in existing distributed union-find can be eliminated without changing final results, allowing overlapped communications and computations for scalable processing. We also use a k-d tree decomposition to redistribute inputs, in order to improve workload balancing. We benchmark the scalability of our algorithm with up to 1,024 processes using both synthetic and application data. We demonstrate the use of our algorithm in critical point tracking and super-level set extraction with high-speed imaging experiments and fusion plasma simulations, respectively.

preprint2021arXiv

Exact Analytical Parallel Vectors

This paper demonstrates that parallel vector curves are piecewise cubic rational curves in 3D piecewise linear vector fields. Parallel vector curves -- loci of points where two vector fields are parallel -- have been widely used to extract features including ridges, valleys, and vortex core lines in scientific data. We define the term \emph{generalized and underdetermined eigensystem} in the form of $\mathbf{A}\mathbf{x}+\mathbf{a}=λ(\mathbf{B}\mathbf{x}+\mathbf{b})$ in order to derive the piecewise rational representation of 3D parallel vector curves. We discuss how singularities of the rationals lead to different types of intersections with tetrahedral cells.

Hanqi Guo

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

FFCz: Fast Fourier Correction for Spectrum-Preserving Lossy Compression of Scientific Data

pMSz: A Distributed Parallel Algorithm for Correcting Extrema and Morse Smale Segmentations in Lossy Compression

Reinforcement Learning for Load-balanced Parallel Particle Tracing

VDL-Surrogate: A View-Dependent Latent-based Model for Parameter Space Exploration of Ensemble Simulations

Asynchronous and Load-Balanced Union-Find for Distributed and Parallel Scientific Data Visualization and Analysis

Exact Analytical Parallel Vectors