Domain decomposition of the modified Born series approach for large-scale wave propagation simulations
The modified Born series (MBS) is a fast and accurate method for simulating wave propagation in complex structures. In the current implementation of the MBS, the simulation size is limited by the working memory of a single computer or graphics processing unit (GPU). Here, we present a domain decomposition method that enhances the scalability of the MBS by distributing the computations over multiple GPUs, while maintaining its accuracy, memory efficiency, and guaranteed monotonic convergence. With this new method, the computations can be performed in parallel, and a larger simulation size is possible as it is no longer limited to the memory size of a single computer or GPU. We show how to decompose large problems over subdomains and demonstrate our approach by solving the Helmholtz problem for a complex structure of $3.28\cdot 10^7$ cubic wavelengths ($320 \times 320 \times 320$ wavelengths) in just $45$ minutes with a dual-GPU simulation.