Source author record

Huan Zhou

Huan Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

22 works

20 topics

4 close collaborators


Preprint · 2026 · arXiv

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Recent advances in coding agents suggest rapid progress toward autonomous software development, yet existing benchmarks fail to rigorously evaluate the long-horizon capabilities required to build complete software systems. Most prior evaluations focus on localized code generation, scaffolded completion, or short-term repair tasks, leaving open the question of whether agents can sustain coherent reasoning, planning, and execution over the extended horizons demanded by real-world repository construction. To address this gap, we present NL2Repo-Bench, a benchmark explicitly designed to evaluate the long-horizon repository generation ability of coding agents. Given only a single natural-language requirements document and an empty workspace, agents must autonomously design the architecture, manage dependencies, implement multi-module logic, and produce a fully installable Python library. Our experiments across state-of-the-art open- and closed-source models reveal that long-horizon repository generation remains largely unsolved: even the strongest agents achieve average test pass rates below 40% and rarely complete an entire repository correctly. Detailed analysis uncovers fundamental long-horizon failure modes, including premature termination, loss of global coherence, fragile cross-file dependencies, and inadequate planning over hundreds of interaction steps. NL2Repo-Bench establishes a rigorous, verifiable testbed for measuring sustained agentic competence and highlights long-horizon reasoning as a central bottleneck for the next generation of autonomous coding agents.

Preprint · 2026 · arXiv

Testing supermassive primordial black holes with lensing signals of binary black hole merges

Next-generation ground-based gravitational wave (GW) detectors are expected to observe millions of binary black hole mergers. A fraction of these will be strongly lensed by intervening galaxies or clusters, producing multiple images with a characteristic distribution of time delays. Importantly, the predicted rate and properties of such events are sensitive to the abundance and distribution of strong lensing objects, which depend directly on the cosmological model. One such scenario posits the existence of supermassive primordial black holes (SMPBHs) in the early universe, which would enhance the formation of dark matter halos. This mechanism has been proposed to explain the abundance of high-redshift galaxies observed by the James Webb Space Telescope. Crucially, the same cosmological model with SMPBHs would also leave a distinct imprint on the population of strongly lensed GWs. It predicts both an increased event rate and a modified distribution of time delays between the multiple images. We therefore propose statistical measurements of the rate and time-delay distribution of strongly lensed GW events as a powerful probe to directly constrain the abundance of SMPBHs. Considering $Λ$CDM cosmology with (non-)clustered SMPBHs, we find that the abundance $f_{\rm PBH}$ of SMPBHs with masses above $10^8~M_{\odot}$ would be constrained to $\sim10^{-4}$ at $95\%$ confidence level. This constraint would be comparable and complementary to the currently available constraint from large-scale structure observations.

Preprint · 2023 · arXiv

Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings

As a practical alternative to speech separation, target speaker extraction (TSE) aims to extract the speech of a desired speaker using an additional speaker cue derived from that speaker. Its main challenge lies in how to properly extract and leverage the speaker cue so that it benefits the quality of the extracted speech. Most existing TSE studies extract the cue by directly using a discriminative speaker embedding from a model pre-trained for speaker verification. Although high speaker discriminability is the most desirable property for speaker verification, we argue that it may be too sophisticated for TSE. In this study, we propose that a simplified speaker cue with clear class separability might be preferable for TSE. To verify this proposal, we introduce several forms of speaker cues, including naive speaker embeddings (such as x-vectors and xi-vectors) and new speaker embeddings produced by a sparse LDA transform. Corresponding TSE models are built by integrating these speaker cues with SepFormer, a state-of-the-art (SOTA) speech separation model. The performance of these TSE models is examined on the benchmark WSJ0-2mix dataset. Experimental results validate the effectiveness and generalizability of our proposal, showing up to 9.9% relative improvement in SI-SDRi. Moreover, with an SI-SDRi of 19.4 dB and a PESQ of 3.78, our best TSE system significantly outperforms current SOTA systems and offers the best TSE results reported to date on WSJ0-2mix.
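Results above are reported as SI-SDRi, the gain in scale-invariant signal-to-distortion ratio of the extracted speech over the unprocessed mixture. The following is a minimal sketch of the commonly used SI-SDR definition. The estimate is projected onto the reference, and the residual is treated as distortion. It assumes zero-mean normalisation as a convention. The function names are illustrative and are not taken from the paper's code.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR (dB) of an estimated signal against a reference."""
    # Remove the mean of both signals (assumed convention, commonly recommended).
    mean_e = sum(estimate) / len(estimate)
    mean_r = sum(reference) / len(reference)
    est = [x - mean_e for x in estimate]
    ref = [x - mean_r for x in reference]
    # Optimal scaling of the reference: projection of the estimate onto it.
    alpha = sum(e * r for e, r in zip(est, ref)) / sum(r * r for r in ref)
    target = [alpha * r for r in ref]
    distortion = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    return 10 * math.log10(target_energy / distortion_energy)


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: SI-SDR of the extracted signal minus SI-SDR of the input mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)
```

Because of the projection step, multiplying the estimate by any non-zero gain leaves the score unchanged. This scale invariance is the reason the metric is preferred over plain SDR for separation and extraction tasks.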

Preprint · 2022 · arXiv

A Search for Millilensing Gamma-Ray Bursts in the Observations of Fermi GBM

Millilensing of gamma-ray bursts (GRBs) is expected to manifest as multiple emission episodes within a single triggered GRB, with similar light-curve patterns and similar spectral properties. Identifying such lensed GRBs could help improve constraints on the abundance of compact dark matter. Here we present a systematic search for millilensing among 3000 GRBs observed by the Fermi GBM up to April 2021. We applied an auto-correlation test, a hardness test, and time-integrated and time-resolved spectral tests to the whole sample, and found 4 interesting candidates. GRB 081126A and GRB 090717A are ranked as first-class candidates based on their excellent performance in both temporal and spectral analyses. GRB 081122A and GRB 110517B are ranked as second-class (suspected) candidates. The main reason is that their two emission episodes show clear deviations in part of the time-resolved spectrum or in the time-integrated spectrum. Considering a point-mass model for the gravitational lens, our results suggest that the density parameter of lens objects with mass $M_{\rm L}\sim10^{6} M_{\odot}$ is larger than $1.5\times10^{-3}$.
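The idea behind the auto-correlation test is that a millilensed GRB contains a delayed, rescaled copy of its own light curve. Such a copy produces a secondary peak in the auto-correlation function (ACF) at the lag equal to the lensing delay. Below is a simplified sketch, assuming an evenly binned count light curve. `min_lag` is a hypothetical parameter that excludes the central ACF peak. The sketch omits the noise modelling and significance thresholds a real search would need.

```python
import math


def autocorrelation(counts, max_lag):
    """Normalised auto-correlation of a binned light curve for lags 0..max_lag."""
    n = len(counts)
    mean = sum(counts) / n
    x = [c - mean for c in counts]
    var = sum(v * v for v in x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / var
            for k in range(max_lag + 1)]


def best_echo_lag(counts, min_lag, max_lag):
    """Lag (in bins) of the strongest ACF peak beyond min_lag: a candidate delay."""
    acf = autocorrelation(counts, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda k: acf[k])
```

For a light curve with a pulse at bin 100 and a fainter copy at bin 250, the strongest off-centre ACF peak falls at a lag of 150 bins.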

Preprint · 2022 · arXiv

Breast Cancer Molecular Subtypes Prediction on Pathological Images with Discriminative Patch Selecting and Multi-Instance Learning

Molecular subtypes of breast cancer are important references for personalized clinical treatment. To save cost and labor, usually only one of a patient's paraffin blocks is selected for subsequent immunohistochemistry (IHC) to obtain molecular subtypes. Because of tumor heterogeneity, the resulting sampling error is unavoidable and risky, and could delay treatment. Predicting molecular subtypes from conventional H&E pathological whole slide images (WSIs) with AI methods is therefore useful for assisting pathologists in pre-screening the proper paraffin block for IHC. The task is challenging because only WSI-level molecular subtype labels can be obtained from IHC. Gigapixel WSIs are divided into a huge number of patches to make deep learning computationally feasible. With only coarse slide-level labels, however, patch-based methods may suffer from abundant noisy patches, such as folds, overstained regions, or non-tumor tissue. A weakly supervised learning framework based on discriminative patch selection and multi-instance learning (MIL) was proposed for predicting breast cancer molecular subtypes from H&E WSIs. First, a co-teaching strategy was adopted to learn molecular subtype representations and filter out noisy patches. Then, a balanced sampling strategy was used to handle subtype imbalance in the dataset. In addition, a noise patch filtering algorithm using a local outlier factor based on cluster centers was proposed to further select discriminative patches. Finally, a loss function integrating patch-level and slide-level constraints was used to fine-tune the MIL framework on the selected discriminative patches and further improve molecular subtyping performance. The experimental results confirmed the effectiveness of the proposed method. Our models outperformed even senior pathologists, with the potential to assist pathologists in pre-screening paraffin blocks for IHC in the clinic.

Preprint · 2022 · arXiv

Constraints on the abundance of primordial black holes with different mass distributions from lensing of fast radio bursts

Primordial black holes (PBHs) have long been considered a possible component of dark matter. However, this possibility remains poorly constrained over a wide mass range, including the stellar mass range ($1-100~M_{\odot}$). The discovery of black hole binary mergers by the LIGO-Virgo gravitational wave observatories has revived interest in PBHs in the stellar-mass window. Fast radio bursts (FRBs) are bright radio transients with millisecond durations and a very high all-sky occurrence rate. Lensing of these bursts has been proposed as one of the optimal probes for constraining the abundance of PBHs in the stellar mass range. In this paper, we first investigate constraints on the abundance of PBHs from the latest $593$ FRB observations. We consider both the monochromatic mass distribution and three other popular extended mass distributions related to different PBH formation mechanisms. We find that constraints from currently public FRB observations are weaker than those from existing gravitational wave detections. Furthermore, we forecast the constraining power of future FRB observations on the abundance of PBHs, taking into account different PBH mass distributions and different FRB redshift distributions. Finally, we find that constraints on the parameter space of extended mass distributions from $\sim10^5$ FRBs with $\overline{Δt}\leq1 ~\rm ms$ would be comparable to those from gravitational wave events. Upcoming complementary multi-messenger observations are expected to yield considerable constraints on the possibility of PBHs in this intriguing mass window.

Preprint · 2022 · arXiv

Constraints on the abundance of supermassive primordial black holes from lensing of compact radio sources

The possibility that primordial black holes (PBHs) form a part of dark matter has been considered over a wide mass range. This range spans from the Planck mass ($10^{-5}~\rm g$) up to the scale of the supermassive black hole at the center of the galaxy. A primordial origin might be one of the most important formation channels of supermassive black holes. We use the non-detection of lensing in very long baseline interferometry (VLBI) observations of compact radio sources as a promising probe. These observations have extremely high angular resolution, and we use them to constrain the abundance of intergalactic PBHs in the mass range $\sim10^4$-$10^9~M_{\odot}$. In a sample of 543 well-measured flat-spectrum compact radio sources, no milli-lensed images are found with angular separations between $1.5$ and $50$ milli-arcseconds. From this null search result, we derive that the fraction of dark matter made up of supermassive PBHs in the mass range $\sim10^6$-$10^8~M_{\odot}$ is $\lesssim1.48\%$ at $95\%$ confidence level. This constraint would be significantly improved by the rapidly growing number of measured compact radio sources. For instance, no milli-lensing candidate is confirmed among the latest $\sim14000$ sources. On that basis, we derive that the abundance of supermassive PBHs is $\lesssim0.06\%$ at $95\%$ confidence level.
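The logic of turning a null search into an upper limit can be sketched with Poisson statistics. If zero lensed sources are seen, the expected number at 95% confidence must be below $-\ln 0.05 \approx 3$. The sketch below makes an additional simplifying assumption: each source's lensing probability scales linearly with $f_{\rm PBH}$ (the small-optical-depth limit). Under that assumption the bound on $f_{\rm PBH}$ follows directly. The per-source probability is a hypothetical input, and this is not necessarily the likelihood used in the paper.

```python
import math


def zero_event_upper_limit(confidence=0.95):
    """Poisson upper limit on the expected event count when none are observed."""
    return -math.log(1.0 - confidence)


def max_pbh_fraction(lensing_prob_per_source_at_f1, n_sources, confidence=0.95):
    """Largest f_PBH compatible with a null search of n_sources.

    Hypothetical model: expected lensed count = f * n_sources * p1, where p1 is
    the per-source lensing probability if all dark matter were PBHs (f = 1).
    """
    expected_at_f1 = lensing_prob_per_source_at_f1 * n_sources
    return zero_event_upper_limit(confidence) / expected_at_f1
```

In this toy model, the bound tightens in proportion to $1/N$ as the sample grows from 543 to $\sim14000$ sources. A real analysis would also fold in each source's redshift and the angular-separation window.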

Preprint · 2022 · arXiv

Multiple DP-coloring of planar graphs without 3-cycles and normally adjacent 4-cycles

DP-coloring, introduced by Dvořák and Postle in 2015, is a generalization of list coloring. Multiple DP-coloring of graphs, as a generalization of multiple list coloring, was first studied by Bernshteyn, Kostochka and Zhu in 2019. This paper proves that planar graphs without 3-cycles and normally adjacent 4-cycles are $(7m, 2m)$-DP-colorable for every integer $m$. As a consequence, the strong fractional choice number of any planar graph without 3-cycles and normally adjacent 4-cycles is at most $7/2$.

Preprint · 2022 · arXiv

Performance Analysis of Fog-Aided D2D Networks with Multicast-Based Opportunistic Content Delivery

In this paper, we develop a comprehensive and tractable analytical framework based on stochastic geometry to evaluate the performance of large-scale fog-aided device-to-device (F-D2D) networks with opportunistic content multicasting. Cache-enabled fog user equipments (F-UEs) must resolve contention among file requests from cache-incapable conventional user equipments (C-UEs). As part of the analysis, two simple yet typical candidate file selection schemes for F-UEs are considered for this purpose: the random file selection (RFS) scheme and the most requested file selection (MRFS) scheme. Further, to suppress harmful interference among concurrent F-UE transmissions, a multicast-based opportunistic content delivery strategy is proposed by exploiting the idea of opportunistic spectrum access (OSA). Assuming decentralized probabilistic caching, we first derive the activation probability of the F-UEs. Then, by adopting an appropriate approximation, we evaluate the cache-hit probability, the coverage probability, and thereby the successful content delivery probability (SCDP) of the F-D2D network. We also develop an iterative algorithm based on the gradient projection method to obtain a suboptimal caching policy that maximizes the SCDP. Extensive simulation and numerical results are presented to verify our analysis and demonstrate the superior performance of the proposed multicast-based opportunistic content delivery strategy.
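The abstract names the two candidate file selection schemes but does not spell out their mechanics. Below is a toy interpretation, given as an assumption rather than the paper's analytical model. Each nearby C-UE requests one file index, and an F-UE holds a set of cached files. RFS picks uniformly among requested files that are cached, while MRFS picks the cached file with the most requests. All function names are hypothetical.

```python
import random
from collections import Counter

# Toy model (assumed): `requests` lists the file index requested by each nearby
# C-UE; `cache` is the set of file indices held by one F-UE.


def random_file_selection(requests, cache, rng):
    """RFS: choose uniformly among cached files that at least one C-UE requested."""
    candidates = sorted(set(requests) & set(cache))
    return rng.choice(candidates) if candidates else None


def most_requested_file_selection(requests, cache):
    """MRFS: choose the cached file with the most pending requests."""
    counts = Counter(f for f in requests if f in cache)
    if not counts:
        return None
    # Ties are broken by the smaller file index to keep the result deterministic.
    return min(counts, key=lambda f: (-counts[f], f))


def requests_served(requests, chosen):
    """One multicast transmission of `chosen` serves every request for that file."""
    return sum(1 for f in requests if f == chosen)
```

With multicast, MRFS maximises the number of requests served by a single transmission in this toy setting. RFS spreads transmissions across files at the cost of serving fewer requests per slot.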

Preprint · 2022 · arXiv

Search for lensing signatures from the latest fast radio burst observations and constraints on the abundance of primordial black holes

The possibility that primordial black holes (PBHs) form a part of dark matter has long been considered, but it remains poorly constrained over a wide mass range. Fast radio bursts (FRBs) are bright radio transients with millisecond durations. Their lensing has been proposed as one of the cleanest probes for constraining the presence of PBHs in the stellar mass window. In this paper, we first apply the normalised cross-correlation algorithm to search for and identify lensed FRB candidates in the latest public FRB observations. These are $593$ FRBs, mainly from the first Canadian Hydrogen Intensity Mapping Experiment (CHIME) FRB catalog. We then derive constraints on the abundance of PBHs from the null search result. For a monochromatic mass distribution, the fraction of dark matter made up of PBHs with masses $\geq500~M_{\odot}$ could be constrained to $\leq87\%$ at 95% confidence level. This limit assumes a signal-to-noise-ratio-dependent flux ratio threshold for each FRB, and that apparently one-off events are intrinsically single bursts. This result would improve by a factor of three if a conventional constant flux ratio threshold were adopted. Moreover, we derive constraints on PBHs with a log-normal mass function. This mass function is naturally predicted by some popular inflation models and often investigated with gravitational wave detections. We find that, in this mass distribution scenario, the constraint from currently public FRB observations is weaker than that from gravitational wave detections. Upcoming complementary multi-messenger observations are expected to yield considerable constraints on the possibility of PBHs in this intriguing mass window.
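Normalised cross-correlation (NCC) compares two windows of a time series after removing their means and dividing by their norms. As a result, a lensed echo that is a rescaled copy of the first burst scores close to 1. Below is a brute-force sketch, assuming a single de-dispersed, evenly sampled intensity series. The window width and minimum separation are hypothetical parameters, and a real search would add significance thresholds and candidate vetting.

```python
import math


def normalized_cross_correlation(a, b):
    """Zero-lag NCC of two equal-length windows, in [-1, 1]."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if den == 0.0:
        return 0.0  # a flat window has no shape to match
    return sum(x * y for x, y in zip(da, db)) / den


def most_similar_windows(series, width, min_separation):
    """Return (score, i, j) for the best-matching pair of windows with j - i >= min_separation."""
    best_score, best_i, best_j = -1.0, None, None
    last = len(series) - width
    for i in range(last + 1):
        win_i = series[i:i + width]
        for j in range(i + min_separation, last + 1):
            score = normalized_cross_correlation(win_i, series[j:j + width])
            if score > best_score:
                best_score, best_i, best_j = score, i, j
    return best_score, best_i, best_j
```

Because NCC ignores amplitude, the echo's flux ratio does not affect the match score. Thresholds on the flux ratio, such as the S/N-dependent ones described above, are applied as a separate step.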

Preprint · 2022 · arXiv

The $S_8$ Tension in Light of Updated Redshift-Space Distortion Data and PAge Approximation

One of the most prominent challenges to the standard Lambda cold dark matter ($Λ$CDM) cosmology is the tension over the structure growth parameter $S_8$. The value constrained by cosmic microwave background (CMB) data is larger than the value suggested by cosmic shear data. Recent studies show that, for $Λ$CDM cosmology, redshift-space distortion (RSD) data also prefer a smaller $S_8$, $\sim 2$-$3σ$ lower than the CMB value, but this result is sensitive to the cosmological model. In the present work we update the RSD constraint on $S_8$ using the most up-to-date RSD data set, with correlations between data points properly taken into account. To reduce the model dependence and constrain the background geometry, we add the most up-to-date Type Ia supernova (SN) and baryon acoustic oscillation (BAO) data sets to our Markov Chain Monte Carlo (MCMC) calculation. Their correlations with RSD are also taken into account. For $Λ$CDM cosmology we find $S_8= 0.812 \pm 0.026$, which is $\sim 2σ$ larger than previous studies and hence consistent with the CMB constraint. Replacing $Λ$CDM with the Parameterization based on cosmic Age (PAge), an almost model-independent description of the late universe, we find that the RSD + SN + BAO constraint on $S_8$ is insensitive to the cosmological model.
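For reference, $S_8$ is conventionally defined as $S_8 = \sigma_8\sqrt{\Omega_m/0.3}$, which combines the amplitude of matter fluctuations with the matter density. The sketch below implements that definition. It also includes a naive Gaussian tension estimate for two independent measurements, an assumed simplification that ignores correlations and non-Gaussian posteriors, so it is not the MCMC analysis described above.

```python
import math


def s8(sigma8, omega_m):
    """Structure growth parameter S_8 = sigma_8 * sqrt(Omega_m / 0.3)."""
    return sigma8 * math.sqrt(omega_m / 0.3)


def tension_sigma(value_a, err_a, value_b, err_b):
    """Naive tension (in sigma) between two independent Gaussian measurements."""
    return abs(value_a - value_b) / math.hypot(err_a, err_b)
```

The $\sqrt{\Omega_m/0.3}$ factor is chosen so that $S_8$ tracks the combination best constrained by weak lensing. For that reason, CMB, cosmic shear, and RSD results are usually compared in $S_8$ rather than in $\sigma_8$ alone.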

Preprint · 2021 · arXiv

Container Orchestration on HPC Systems

Containerisation has proven efficient for application deployment in cloud computing. Containers can encapsulate complex programs with their dependencies in isolated environments, and are therefore being adopted in HPC clusters. Unlike container orchestrators (e.g. Kubernetes), HPC workload managers lack micro-service support and deeply integrated container management. We introduce Torque-Operator, a plugin that serves as a bridge between HPC workload managers and container orchestrators.

Preprint · 2020 · arXiv

Collectives in hybrid MPI+MPI code: design, practice and performance

Hybrid schemes are widely used: they combine message-passing programming models for inter-node parallelism with shared-memory programming models for node-level parallelism. Extensive existing practice with hybrid Message Passing Interface (MPI) plus Open Multi-Processing (OpenMP) programming accounts for its popularity. Nevertheless, substantial programming effort is required to gain performance benefits from MPI+OpenMP code. An emerging hybrid method that combines MPI with the MPI shared-memory model (MPI+MPI) is promising. However, writing an efficient hybrid MPI+MPI program is not straightforward, especially when collective communication operations are involved. In this paper, we propose a new design method for implementing context-based collective communication operations in hybrid MPI+MPI. Pure MPI semantics require on-node memory replication, with its associated on-node communication overhead, and our method avoids it. We also offer wrapper primitives that hide all design details from users, together with practical guidance on how to structure hybrid MPI+MPI code with these primitives. Micro-benchmarks show that our collectives are comparable or superior to those in the pure MPI context. We have further validated the effectiveness of the hybrid MPI+MPI model (using our wrapper primitives) on three computational kernels, in comparison with the pure MPI and hybrid MPI+OpenMP models.

Preprint · 2020 · arXiv

MPI Collectives for Multi-core Clusters: Optimized Performance of the Hybrid MPI+MPI Parallel Codes

The advent of multi-/many-core processors in clusters favors hybrid parallel programming. This style combines the Message Passing Interface (MPI) for inter-node parallelism with a shared-memory model for on-node parallelism. The traditional hybrid approach is MPI plus OpenMP. A new but promising alternative, MPI plus the MPI-3 shared-memory extensions (MPI+MPI), is gaining traction. We describe an algorithmic approach for collective operations in the context of hybrid MPI+MPI, with allgather and broadcast as concrete examples. The approach minimizes memory consumption and memory copies. With this approach, only one copy of the data is maintained and shared by the processes on a node. When collectives are invoked in the context of pure MPI, MPI processes must keep replicated on-node copies of the data. Our approach removes these unnecessary copies. We compare our hybrid MPI+MPI collectives with the traditional pure-MPI ones, and also discuss the synchronization required to guarantee data integrity. The performance of our approach has been validated on a Cray XC40 system (Cray MPI) and a NEC cluster (OpenMPI). It achieves comparable or better performance for allgather operations. We have further validated our approach with a standard computational kernel, namely distributed matrix multiplication, and with a Bayesian Probabilistic Matrix Factorization code.
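The memory argument can be made concrete with a quick calculation for an allgather in which every process contributes one block. In pure MPI, each process holds its own full copy of the gathered result. With a node-wide shared window, for example one obtained via `MPI_Comm_split_type` with `MPI_COMM_TYPE_SHARED` and `MPI_Win_allocate_shared`, a node holds a single copy. The helper below is hypothetical and counts receive buffers only, ignoring send buffers and library-internal storage.

```python
def allgather_recv_bytes_per_node(nodes, procs_per_node, block_bytes, shared_per_node):
    """Receive-buffer bytes one node needs for an allgather of one block per process.

    shared_per_node=False models pure MPI (every process keeps a full copy);
    shared_per_node=True models one copy in a node-wide shared window (MPI+MPI).
    """
    gathered_bytes = nodes * procs_per_node * block_bytes
    copies = 1 if shared_per_node else procs_per_node
    return copies * gathered_bytes
```

For example, take 4 nodes with 24 processes each and 1 MiB blocks. Pure MPI needs 2304 MiB of receive buffers per node, against 96 MiB with one shared copy, a saving equal to the number of processes per node.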

Preprint · 2020 · arXiv

Taking the pulse of COVID-19: A spatiotemporal perspective

The sudden outbreak of coronavirus disease (COVID-19) swept across the world in early 2020, triggering lockdowns of several billion people across many countries, including China, Spain, India, the U.K., Italy, France, Germany, and most states of the U.S. Transmission of the virus accelerated rapidly, with the U.S. reporting the most confirmed cases, and New York City became an epicenter of the pandemic by the end of March. In response to this national and global emergency, the NSF Spatiotemporal Innovation Center brought together a task force of international researchers. The task force assembled and implemented strategies to respond rapidly to this crisis, supporting research, saving lives, and protecting the health of global citizens. This perspective paper presents our collective view on the global health emergency and our efforts in four areas. First, collecting, analyzing, and sharing relevant data on global policy and government responses, geospatial indicators of the outbreak, and evolving forecasts. Second, developing research capabilities and mitigation measures with scientists worldwide. Third, promoting collaborative research on outbreak dynamics. Finally, reflecting on the dynamic responses of human societies.

Preprint · 2019 · arXiv

Model-independent Estimations for the Cosmic Curvature from the Latest Strong Gravitational Lensing Systems

Model-independent measurements of the cosmic spatial curvature, which is related to the nature of cosmic space-time geometry, play an important role in cosmology. The Distance Sum Rule in the Friedmann-Lemaître-Robertson-Walker metric offers a direct route to the spatial curvature. On that basis, distance-ratio measurements of strong gravitational lensing (SGL) systems, combined with distances from Type Ia supernova observations, have been proposed to estimate the curvature directly. This requires no assumptions about the theory of gravity or the contents of the universe. However, previous studies indicated that a spatially closed universe was strongly preferred. In this paper, we re-estimate the cosmic curvature with the latest SGL data, which include 163 well-measured systems. In addition, we consider possible factors that might affect estimates of the spatial curvature, e.g. the combination of SGL data from different surveys and the stellar mass of the lens galaxy. The only exception is the case where only SGL systems from the Sloan Lens ACS Survey are considered. Otherwise, the latest observations consistently favor a spatially flat universe at a very high confidence level. This suggests that the increasing number of well-measured strong lensing events might significantly reduce the bias in estimates of the cosmic curvature.
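The Distance Sum Rule can be written in terms of dimensionless transverse comoving distances $d = (H_0/c)D_M$:

$$d_{ls} = d_s\sqrt{1+\Omega_k d_l^2} - d_l\sqrt{1+\Omega_k d_s^2}$$

An SGL system constrains the ratio $d_{ls}/d_s$, for instance through the Einstein radius under a singular isothermal sphere lens model. Supernovae supply $d_l$ and $d_s$, which leaves $\Omega_k$ as the only unknown. Below is a minimal sketch of the rule; the function names are hypothetical. It assumes $1+\Omega_k d^2 > 0$.

```python
import math


def lens_source_distance(d_l, d_s, omega_k):
    """Distance sum rule: dimensionless comoving lens-source distance d_ls."""
    return d_s * math.sqrt(1 + omega_k * d_l**2) - d_l * math.sqrt(1 + omega_k * d_s**2)


def sgl_distance_ratio(d_l, d_s, omega_k):
    """The ratio d_ls / d_s probed by a strong-lensing system."""
    return lens_source_distance(d_l, d_s, omega_k) / d_s
```

In a flat universe the ratio reduces to $1 - d_l/d_s$. Any systematic offset between the SGL-measured ratio and that flat-space value, given supernova distances, is what signals non-zero curvature.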

Preprint · 2016 · arXiv

A Bandwidth-saving Optimization for MPI Broadcast Collective Operation

The efficiency and scalability of MPI collective operations, in particular the broadcast operation, play an integral part in high-performance computing applications. MPICH, one of the widely used contemporary MPI software stacks, implements the broadcast operation on top of point-to-point operations. Depending on parameters such as message size and process count, the library chooses among different algorithms. Examples include binomial dissemination, recursive-doubling exchange, and ring all-to-all broadcast (allgather). However, the existing broadcast design in the latest release of MPICH performs poorly for large messages (lmsg) and for medium messages with non-power-of-two process counts (mmsg-npof2). The cause is its suboptimal internal ring allgather algorithm. In this paper, based on the native broadcast design in MPICH, we propose a tuned, bandwidth-saving broadcast approach for the lmsg and mmsg-npof2 cases. The native and tuned broadcast designs are compared for different data sizes and program sizes on a Cray XC40 cluster. The results show that the tuned broadcast design improves performance by 2% to 54% for lmsg and mmsg-npof2 in user-level testing.
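For long messages, MPICH's broadcast is commonly described as a scatter of the message into per-process chunks, followed by a ring allgather. The simulation below shows that data movement without any MPI. It treats the scatter as a direct, idealised step and the ring as $p-1$ synchronous rounds. In each round, every rank forwards the chunk it most recently received to its right neighbour. This is a sketch of the baseline scheme, not MPICH's code and not the paper's tuned design.

```python
def scatter_ring_allgather_broadcast(message, num_procs, root=0):
    """Simulate broadcast = scatter + ring allgather; return (buffers, ring_steps)."""
    p = num_procs
    chunk = -(-len(message) // p)  # ceiling division; trailing chunks may be short
    pieces = [message[k * chunk:(k + 1) * chunk] for k in range(p)]
    # Idealised scatter: the rank at relative position k from the root owns piece k.
    held = [{(rank - root) % p: pieces[(rank - root) % p]} for rank in range(p)]
    for step in range(p - 1):
        # Each rank sends the piece it received in the previous round (its own at step 0).
        outgoing = []
        for rank in range(p):
            idx = ((rank - root) - step) % p
            outgoing.append(((rank + 1) % p, idx, held[rank][idx]))
        for dest, idx, data in outgoing:  # deliver after all sends: synchronous round
            held[dest][idx] = data
    buffers = [b"".join(held[rank][k] for k in range(p)) for rank in range(p)]
    return buffers, p - 1
```

Each rank sends and receives about $n(p-1)/p$ bytes over the $p-1$ rounds. This bandwidth term is what the ring allgather phase contributes for large messages.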

Preprint · 2016 · arXiv

Asynchronous progress design for a MPI-based PGAS one-sided communication system

Remote-memory-access models, also known as one-sided communication models, are becoming an interesting alternative to traditional two-sided communication models in the field of High Performance Computing. In this paper we extend previous work on an MPI-based, locality-aware remote-memory-access model with an asynchronous progress engine for non-blocking communication operations. Most previous related work suggests driving communication progress through an additional thread within the application process. In contrast, our scheme uses an arbitrary number of dedicated processes to drive asynchronous progress. Further, we describe a prototype library implementation of our concepts, namely DART, which is used to quantitatively evaluate our design against an MPI-3 baseline reference. The evaluation consists of a micro-benchmark that measures the overlap of communication and computation, and a scientific application kernel that assesses the total performance impact in realistic use cases. Our benchmarks show that our asynchronous progress scheme can overlap computation and communication efficiently and leads to substantially lower communication cost in real applications.

Preprint · 2016 · arXiv

Leveraging MPI-3 Shared-Memory Extensions for Efficient PGAS Runtime Systems

The relaxed semantics and rich functionality of the one-sided communication primitives of MPI-3 make MPI an attractive candidate for implementing PGAS models. However, such implementations perform poorly because current MPI RMA implementations typically have a large overhead when the source and target of a communication request share a common, local physical memory. In this paper, we present an optimized PGAS-like runtime system. It serves intra-node communication requests with the new MPI-3 shared-memory extensions, and inter-node requests with MPI-3 one-sided communication primitives. The performance of our runtime system is evaluated on a Cray XC40 system through low-level communication benchmarks, a random-access benchmark and a stencil kernel. The experimental results demonstrate that the performance of our hybrid runtime system matches that of low-level RMA libraries for intra-node transfers, and that of MPI-3 for inter-node transfers.
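The core design decision is that the runtime inspects where the target of a transfer lives before choosing a path. A plain load/store into a shared-memory window is used when origin and target share a node, and an MPI RMA operation otherwise. The toy model below simulates only that routing decision in plain Python. `GlobalPtr` and `ToyLocalityRuntime` are hypothetical names, not DART's API. The comments point to the MPI-3 mechanisms each path would use in a real runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalPtr:
    """Hypothetical global pointer: owning unit (rank) plus byte offset in its segment."""
    unit: int
    offset: int


class ToyLocalityRuntime:
    """Hypothetical runtime that picks a transfer path from the target's locality."""

    def __init__(self, node_of_unit, segment_bytes):
        self.node_of_unit = dict(node_of_unit)  # unit id -> node id
        self.segments = {u: bytearray(segment_bytes) for u in self.node_of_unit}

    def put(self, origin_unit, dest, data):
        seg = self.segments[dest.unit]
        end = dest.offset + len(data)
        if dest.offset < 0 or end > len(seg):
            raise IndexError("put outside the target segment")
        if self.node_of_unit[origin_unit] == self.node_of_unit[dest.unit]:
            path = "shared-memory copy"  # stands in for a store into an MPI_Win_allocate_shared window
        else:
            path = "RMA put"             # stands in for MPI_Put on an RMA window
        seg[dest.offset:end] = data      # the simulated effect is identical on both paths
        return path
```

Keeping one `put` entry point and hiding the path choice inside it is what lets application code stay locality-agnostic while intra-node transfers skip the RMA overhead.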

Preprint · 2016 · arXiv

Modulated Pulses Based High Spatial Resolution Distributed Fiber System for Multi-Parameter Sensing

We demonstrate a hybrid distributed fiber sensing system for multi-parameter detection. Integrating phase-sensitive optical time domain reflectometry (Φ-OTDR) and Brillouin optical time domain reflectometry (B-OTDR) enables the measurement of vibration, temperature and strain. The system exploits the fact that vibration changes quickly while temperature and strain are essentially static. The laser pulse width and intensity are modulated, and the pulses are then injected proportionally into the single-mode sensing fiber. As a result, all three parameters can be extracted simultaneously with only one photodetector and one data acquisition channel. Combined with advanced data processing methods, modulating the laser pulses brings additional advantages. It balances the backscattered light power against nonlinear-effect noise, which enhances the signal-to-noise ratio and enables sub-meter spatial resolution together with a long sensing distance. The proposed method achieves vibration sensing up to 4.8 kHz with 3 m spatial resolution over 10 km of standard single-mode fiber. Distributed temperature and stress profiles along the same fiber are also measured concurrently with 80 cm spatial resolution.
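Two standard OTDR relations help put these numbers in context. First, the two-point spatial resolution is half the pulse length in the fiber, $\Delta z = c\,\tau / (2n)$. Second, if only one probe pulse may travel in the fiber at a time, the pulse repetition rate is at most $c/(2nL)$. The highest resolvable vibration frequency is then half that rate (the Nyquist limit). The sketch below assumes a group index of 1.468, a typical value for standard single-mode fiber that is not taken from the paper.

```python
C_VACUUM = 2.998e8    # speed of light in vacuum, m/s (rounded)
GROUP_INDEX = 1.468   # assumed typical group index of standard single-mode fiber


def spatial_resolution_m(pulse_width_s, n=GROUP_INDEX):
    """Two-point resolution of pulsed OTDR: half the pulse length inside the fiber."""
    return C_VACUUM * pulse_width_s / (2 * n)


def pulse_width_for_resolution_s(resolution_m, n=GROUP_INDEX):
    """Pulse width needed for a target spatial resolution (inverse of the above)."""
    return 2 * n * resolution_m / C_VACUUM


def max_vibration_frequency_hz(fiber_length_m, n=GROUP_INDEX):
    """Nyquist limit when only one probe pulse is in the fiber at a time."""
    max_repetition_rate_hz = C_VACUUM / (2 * n * fiber_length_m)
    return max_repetition_rate_hz / 2
```

Under these assumptions, 3 m resolution corresponds to a pulse of about 29 ns and 80 cm to about 8 ns. A 10 km fiber limits vibration sensing to roughly 5.1 kHz, which is consistent with the reported 4.8 kHz.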

Preprint · 2016 · arXiv

Towards performance portability through locality-awareness for applications using one-sided communication primitives

MPI is the most widely used data transfer and communication model in High Performance Computing. The latest version of the standard, MPI-3, allows skilled programmers to exploit all the hardware capabilities of the latest and future supercomputing systems. This applies in particular to the revised asynchronous remote-memory-access model combined with the shared-memory window extension. Together they allow writing code that hides communication latencies and optimizes communication paths according to the locality of data origin and destination. The latter is particularly important for today's multi- and many-core systems. However, writing such efficient code is highly complex and error-prone. In this paper we evaluate a recent remote-memory-access model, namely DART-MPI. This model claims to hide the aforementioned complexities from the programmer. It also claims to deliver locality-aware remote-memory-access semantics that outperform MPI-3 one-sided communication primitives on multi-core systems. Conceptually, the DART-MPI interface is simple, yet it takes care of the complexities of the underlying MPI-3 and system topology. This makes DART-MPI an interesting candidate for porting legacy applications. We evaluate these claims on a large-scale Cray XC40 installation, using a realistic scientific application: a finite-difference stencil code that solves the heat diffusion equation.
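The evaluation kernel is a finite-difference stencil for the heat diffusion equation. A serial sketch of one explicit time step with the standard 5-point stencil is shown below. It assumes a uniform grid, fixed (Dirichlet) boundary values, and a combined coefficient `alpha` $= D\,\Delta t/\Delta x^2$, which must not exceed 0.25 for stability. In a distributed version, each process owns a block of the grid and exchanges halo rows and columns with its neighbours every step. That exchange is where one-sided puts would come in, and it is omitted here.

```python
def heat_step(grid, alpha):
    """One explicit 5-point-stencil step of the 2D heat equation.

    alpha = D * dt / dx**2 (assumed <= 0.25 for stability); boundary cells are held fixed.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy so every update reads the old time level
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            laplacian = (grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1] - 4 * grid[i][j])
            new[i][j] = grid[i][j] + alpha * laplacian
    return new
```

Writing into a separate `new` grid (a Jacobi-style update) keeps each step's reads consistent. It is also what makes the halo-exchange pattern simple to parallelise, since all remote data needed for a step comes from the previous time level.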

Preprint · 2015 · arXiv

DART-MPI: An MPI-based Implementation of a PGAS Runtime System

A Partitioned Global Address Space (PGAS) approach treats a distributed system as if its memory were shared at a global level. Given such a global view of memory, the user may program applications much as on shared-memory systems. This greatly simplifies the task of developing parallel applications, because no explicit communication has to be specified in the program for data exchange between different computing nodes. In this paper we present DART, a runtime environment that implements the PGAS paradigm on large-scale high-performance computing clusters. A specific feature of our implementation is that it uses the one-sided communication of the Message Passing Interface (MPI) version 3 (i.e. MPI-3) as the underlying communication substrate. We evaluated the performance of the implementation with several low-level kernels to determine its overheads and limitations in comparison with the underlying MPI-3.
