Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
16 works
0 followers
18 topics
4 close collaborators

Published work

16 published item(s)

preprint · 2026 · arXiv

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Recent advances in coding agents suggest rapid progress toward autonomous software development, yet existing benchmarks fail to rigorously evaluate the long-horizon capabilities required to build complete software systems. Most prior evaluations focus on localized code generation, scaffolded completion, or short-term repair tasks, leaving open the question of whether agents can sustain coherent reasoning, planning, and execution over the extended horizons demanded by real-world repository construction. To address this gap, we present NL2Repo Bench, a benchmark explicitly designed to evaluate the long-horizon repository generation ability of coding agents. Given only a single natural-language requirements document and an empty workspace, agents must autonomously design the architecture, manage dependencies, implement multi-module logic, and produce a fully installable Python library. Our experiments across state-of-the-art open- and closed-source models reveal that long-horizon repository generation remains largely unsolved: even the strongest agents achieve below 40% average test pass rates and rarely complete an entire repository correctly. Detailed analysis uncovers fundamental long-horizon failure modes, including premature termination, loss of global coherence, fragile cross-file dependencies, and inadequate planning over hundreds of interaction steps. NL2Repo Bench establishes a rigorous, verifiable testbed for measuring sustained agentic competence and highlights long-horizon reasoning as a central bottleneck for the next generation of autonomous coding agents.

preprint · 2026 · arXiv

Testing supermassive primordial black holes with lensing signals of binary black hole merges

Next-generation ground-based gravitational wave (GW) detectors are expected to observe millions of binary black hole mergers, a fraction of which will be strongly lensed by intervening galaxies or clusters, producing multiple images with a characteristic distribution of time delays. Importantly, the predicted rate and properties of such events are sensitive to the abundance and distribution of strong lensing objects, which depend directly on the cosmological model. One such scenario posits the existence of supermassive primordial black holes (SMPBHs) in the early universe, which would enhance the formation of dark matter halos. This mechanism has been proposed to explain the abundance of high-redshift galaxies observed by the James Webb Space Telescope. Crucially, the same cosmological model with SMPBHs would also leave a distinct imprint on the population of strongly lensed GWs: it predicts both an increased event rate and a modified distribution of time delays between the multiple images. Therefore, we propose statistical measurements of the rate and time delay distribution of strongly lensed GW events as a powerful probe to directly constrain the abundance of SMPBHs. Considering $\Lambda$CDM cosmology with (non-)clustered SMPBHs, we find that the abundance of SMPBHs $f_{\rm PBH}$ with masses above $10^8~M_{\odot}$ is constrained to be $\sim10^{-4}$ at $95\%$ confidence level. This constraint will be comparable and complementary to the currently available constraint from large scale structure observations.

preprint · 2023 · arXiv

Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings

As a practical alternative to speech separation, target speaker extraction (TSE) aims to extract the speech of the desired speaker using an additional speaker cue extracted from that speaker. Its main challenge lies in how to properly extract and leverage the speaker cue to benefit the quality of the extracted speech. The cue extraction method adopted in the majority of existing TSE studies is to directly use a discriminative speaker embedding extracted from models pre-trained for speaker verification. Although high speaker discriminability is a highly desirable property for the speaker verification task, we argue that it may be too sophisticated for TSE. In this study, we propose that a simplified speaker cue with clear class separability might be preferable for TSE. To verify our proposal, we introduce several forms of speaker cues, including naive speaker embeddings (such as x-vector and xi-vector) and new speaker embeddings produced by a sparse LDA transform. Corresponding TSE models are built by integrating these speaker cues with SepFormer (a SOTA speech separation model). The performance of these TSE models is examined on the benchmark WSJ0-2mix dataset. Experimental results validate the effectiveness and generalizability of our proposal, showing up to 9.9% relative improvement in SI-SDRi. Moreover, with an SI-SDRi of 19.4 dB and a PESQ of 3.78, our best TSE system significantly outperforms the current SOTA systems and offers the best TSE results reported to date on WSJ0-2mix.

preprint · 2022 · arXiv

A Search for Millilensing Gamma-Ray Bursts in the Observations of Fermi GBM

Millilensing of Gamma-Ray Bursts (GRBs) is expected to manifest as multiple emission episodes in a single triggered GRB with similar light-curve patterns and similar spectral properties. Identifying such lensed GRBs could help improve constraints on the abundance of compact dark matter. Here we present a systematic search for millilensing among 3000 GRBs observed by the Fermi GBM up to 2021 April. We find 4 interesting candidates by applying an auto-correlation test, a hardness test, and time-integrated/resolved spectrum tests to the whole sample. GRB 081126A and GRB 090717A are ranked as first-class candidates based on their excellent performance in both temporal and spectral analysis. GRB 081122A and GRB 110517B are ranked as second-class candidates (suspected candidates), mainly because their two emission episodes show clear deviations in part of the time-resolved spectrum or in the time-integrated spectrum. Considering a point mass model for the gravitational lens, our results suggest that the density parameter of lens objects with mass $M_{\rm L}\sim10^{6} M_{\odot}$ is larger than $1.5\times10^{-3}$.

preprint · 2022 · arXiv

Breast Cancer Molecular Subtypes Prediction on Pathological Images with Discriminative Patch Selecting and Multi-Instance Learning

Molecular subtypes of breast cancer are important references for personalized clinical treatment. To save cost and labor, only one of the patient's paraffin blocks is usually selected for subsequent immunohistochemistry (IHC) to obtain molecular subtypes. The resulting sampling error is risky because of tumor heterogeneity and could delay treatment. Predicting molecular subtypes from conventional H&E pathological whole slide images (WSIs) with AI methods is therefore useful for helping pathologists pre-screen the proper paraffin block for IHC. It is a challenging task, since only WSI-level labels of molecular subtypes can be obtained from IHC. Gigapixel WSIs are divided into a huge number of patches to be computationally feasible for deep learning. With only coarse slide-level labels, however, patch-based methods may suffer from abundant noise patches, such as folds, overstained regions, or non-tumor tissues. A weakly supervised learning framework based on discriminative patch selection and multi-instance learning (MIL) was proposed for breast cancer molecular subtype prediction from H&E WSIs. Firstly, a co-teaching strategy was adopted to learn molecular subtype representations and filter out noise patches. Then, a balanced sampling strategy was used to handle the subtype imbalance in the dataset. In addition, a noise patch filtering algorithm that uses the local outlier factor based on cluster centers was proposed to further select discriminative patches. Finally, a loss function integrating patch and slide constraint information was used to fine-tune the MIL framework on the obtained discriminative patches and further improve the performance of molecular subtyping. The experimental results confirmed the effectiveness of the proposed method, and our models outperformed even senior pathologists, with the potential to assist pathologists in pre-screening paraffin blocks for IHC in the clinic.

preprint · 2022 · arXiv

Constraints on the abundance of primordial black holes with different mass distributions from lensing of fast radio bursts

Primordial black holes (PBHs) have long been considered as a possible component of dark matter, but the possibility has been poorly constrained over a wide mass range, including the stellar mass range ($1-100~M_{\odot}$). However, the discovery of binary black hole mergers by the LIGO-Virgo gravitational wave observatories has renewed interest in PBHs in the stellar mass window. Fast radio bursts (FRBs) are bright radio transients with millisecond duration and a very high all-sky occurrence rate. The lensing effect of these bursts has been proposed as one of the optimal probes for constraining the abundance of PBHs in the stellar mass range. In this paper, we first investigate constraints on the abundance of PBHs from the latest $593$ FRB observations for both the monochromatic mass distribution and three other popular extended mass distributions related to different formation mechanisms of PBHs. It is found that constraints from currently public FRB observations are relatively weaker than those from existing gravitational wave detections. Furthermore, we forecast the constraining power of future FRB observations on the abundance of PBHs, taking into account different mass distributions of PBHs and different redshift distributions of FRBs. Finally, we find that constraints on the parameter space of extended mass distributions from $\sim10^5$ FRBs with $\overline{\Delta t}\leq1 ~\rm ms$ would be comparable with what can be constrained from gravitational wave events. It is foreseen that upcoming complementary multi-messenger observations will yield considerable constraints on the possibilities of PBHs in this intriguing mass window.

preprint · 2022 · arXiv

Constraints on the abundance of supermassive primordial black holes from lensing of compact radio sources

The possibility that primordial black holes (PBHs) form a part of dark matter has been considered over a wide mass range, from the Planck mass ($10^{-5}~\rm g$) to the level of the supermassive black hole in the center of the galaxy. A primordial origin might be one of the most important formation channels of supermassive black holes. We use the non-detection of lensing effects in very long baseline interferometer observations of compact radio sources with extremely high angular resolution as a promising probe to constrain the abundance of intergalactic PBHs in the mass range $\sim10^4$-$10^9~M_{\odot}$. For a sample of 543 well-measured flat-spectrum compact radio sources, no milli-lensed images are found with angular separations between $1.5$ milli-arcseconds and $50$ milli-arcseconds. From this null search result, we derive that the fraction of dark matter made up of supermassive PBHs in the mass range $\sim10^6$-$10^8~M_{\odot}$ is $\lesssim1.48\%$ at $95\%$ confidence level. This constraint would be significantly improved by the rapid increase in the number of measured compact radio sources. For instance, on the basis of no confirmed milli-lensing candidate in the latest $\sim14000$ sources, we derive the abundance of supermassive PBHs and obtain that it is $\lesssim0.06\%$ at $95\%$ confidence level.
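The step from a null search to an upper limit can be illustrated with a minimal sketch. It assumes a simple Poisson argument in which the expected number of lensed sources scales linearly with the PBH fraction (valid only while each source's lensing probability is small); this is a generic textbook approach and not necessarily the paper's exact procedure. The function names and the per-source optical depths are hypothetical.

```python
import math


def poisson_zero_detection_limit(confidence=0.95):
    """Largest expected count still compatible with zero detections.

    P(0 events | mean N) = exp(-N), so requiring exp(-N) >= 1 - confidence
    gives N <= -ln(1 - confidence), about 3.0 at 95% confidence.
    """
    return -math.log(1.0 - confidence)


def fraction_upper_limit(optical_depths_at_unit_fraction, confidence=0.95):
    """Upper limit on the lens fraction f after a null search.

    Assumption (not from the source): the expected lensed count is
    f * sum(tau_i), where tau_i is the lensing optical depth of source i
    evaluated at f = 1.
    """
    total = sum(optical_depths_at_unit_fraction)
    if total <= 0:
        raise ValueError("total optical depth must be positive")
    return min(1.0, poisson_zero_detection_limit(confidence) / total)


# Hypothetical optical depths at f = 1, chosen only for illustration.
taus = [0.4] * 543
limit = fraction_upper_limit(taus)
print(f"f < {limit:.4f} at 95% confidence")
```

Under this linear scaling the limit shrinks roughly in inverse proportion to the number of sources, which is in line with the abstract's figures of about 1.48% for 543 sources and about 0.06% for roughly 14000 sources.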

preprint · 2022 · arXiv

Multiple DP-coloring of planar graphs without 3-cycles and normally adjacent 4-cycles

The concept of DP-coloring of a graph is a generalization of list coloring introduced by Dvořák and Postle in 2015. Multiple DP-coloring of graphs, as a generalization of multiple list coloring, was first studied by Bernshteyn, Kostochka and Zhu in 2019. This paper proves that planar graphs without 3-cycles and normally adjacent 4-cycles are $(7m, 2m)$-DP-colorable for every integer $m$. As a consequence, the strong fractional choice number of any planar graph without 3-cycles and normally adjacent 4-cycles is at most $7/2$.

preprint · 2022 · arXiv

Performance Analysis of Fog-Aided D2D Networks with Multicast-Based Opportunistic Content Delivery

In this paper, we develop a comprehensive and tractable analytical framework based on stochastic geometry to evaluate the performance of large-scale fog-aided device-to-device (F-D2D) networks with opportunistic content multicasting. As a part of the analysis, to resolve the contentions of file requests from the cache-incapable conventional user equipments (C-UEs), two simple yet typical candidate file selection schemes for cache-enabled fog user equipments (F-UEs), namely the random file selection (RFS) scheme and the most requested file selection (MRFS) scheme, are considered. Further, to suppress the harmful interference among the concurrent transmissions of F-UEs, a multicast-based opportunistic content delivery strategy is proposed by exploring the idea of opportunistic spectrum access (OSA). Assuming decentralized probabilistic caching, we first derive the activation probability of the F-UEs. Then, by adopting an appropriate approximation, the cache-hit probability, the coverage probability, and thereby the successful content delivery probability (SCDP) of the F-D2D network are evaluated. We also develop an iterative algorithm based on the gradient projection method to obtain a suboptimal caching policy for the maximization of SCDP. Extensive simulation and numerical results are presented to verify our analysis and demonstrate the superior performance of the proposed multicast-based opportunistic content delivery strategy.

preprint · 2022 · arXiv

Search for lensing signatures from the latest fast radio burst observations and constraints on the abundance of primordial black holes

The possibility that primordial black holes (PBHs) form a part of dark matter has been considered for a long time but remains poorly constrained over a wide mass range. Fast radio bursts (FRBs) are bright radio transients with millisecond duration. Their lensing effect has been proposed as one of the cleanest probes for constraining the presence of PBHs in the stellar mass window. In this paper, we first apply the normalised cross-correlation algorithm to search for and identify candidate lensed FRBs in the latest public FRB observations, i.e. $593$ FRBs which mainly consist of the first Canadian Hydrogen Intensity Mapping Experiment FRB catalog, and then derive constraints on the abundance of PBHs from the null search result for lensing signatures. For a monochromatic mass distribution, the fraction of dark matter made up of PBHs could be constrained to $\leq87\%$ for $\geq500~M_{\odot}$ at 95% confidence level by assuming flux ratio thresholds that depend on the signal-to-noise ratio of each FRB and that apparently one-off events are intrinsic single bursts. This result would be improved by a factor of three when a conventional constant flux ratio threshold is considered. Moreover, we derive constraints on PBHs with a log-normal mass function naturally predicted by some popular inflation models and often investigated with gravitational wave detections. We find that, in this mass distribution scenario, the constraint from currently public FRB observations is relatively weaker than the one from gravitational wave detections. It is foreseen that upcoming complementary multi-messenger observations will yield considerable constraints on the possibilities of PBHs in this intriguing mass window.
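As a rough illustration of what a normalised cross-correlation search looks for, the sketch below scans a toy light curve for a delayed, rescaled copy of its own pulse. It is a simplified stand-in and not the paper's pipeline: the function names, the toy signal, and the choice to correlate a light curve against a shifted copy of itself are all assumptions, and a real search would add noise handling and flux ratio thresholds.

```python
import math


def ncc(x, y):
    """Zero-lag normalised cross-correlation (Pearson coefficient)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat segment carries no shape information
    return sxy / math.sqrt(sxx * syy)


def best_echo_lag(signal):
    """Return (lag, score) maximising the NCC of the signal with itself shifted by lag."""
    best_lag, best_score = 0, -1.0
    for lag in range(1, len(signal) - 1):
        head = signal[: len(signal) - lag]
        tail = signal[lag:]
        score = ncc(head, tail)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


# Toy light curve: a pulse at samples 2-4 and a half-amplitude echo 8 samples later.
signal = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0, 0.5, 1.5, 0.5, 0, 0, 0]
lag, score = best_echo_lag(signal)
print(lag, round(score, 6))
```

Because the coefficient is normalised, the fainter echo still scores close to 1 at the correct delay, which is why this statistic suits images that differ in flux but share a pulse shape.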

preprint · 2022 · arXiv

The $S_8$ Tension in Light of Updated Redshift-Space Distortion Data and PAge Approximation

One of the most prominent challenges to the standard Lambda cold dark matter ($\Lambda$CDM) cosmology is the tension between the structure growth parameter $S_8$ constrained by the cosmic microwave background (CMB) data and the smaller one suggested by the cosmic shear data. Recent studies show that, for $\Lambda$CDM cosmology, redshift-space distortion (RSD) data also prefer a smaller $S_8$ that is $\sim 2$-$3\sigma$ lower than the CMB value, but the result is sensitive to the cosmological model. In the present work we update the RSD constraint on $S_8$ with the most up-to-date RSD data set, in which the correlations between data points are properly taken into account. To reduce the model dependence, we add to our Markov chain Monte Carlo calculation the most up-to-date data sets of Type Ia supernovae (SN) and baryon acoustic oscillations (BAO), whose correlation with RSD is also taken into account, to constrain the background geometry. For $\Lambda$CDM cosmology we find $S_8= 0.812 \pm 0.026$, which is $\sim 2\sigma$ larger than previous studies, and hence is consistent with the CMB constraint. By replacing $\Lambda$CDM with the Parameterization based on cosmic Age (PAge), an almost model-independent description of the late universe, we find that the RSD + SN + BAO constraint on $S_8$ is insensitive to the cosmological model.
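To make the notion of a tension "in units of sigma" concrete, here is a minimal sketch. It uses the conventional definition $S_8 = \sigma_8\sqrt{\Omega_m/0.3}$ and the simplest estimate of the difference between two independent Gaussian measurements. The RSD value comes from the abstract; the CMB value is an illustrative placeholder that does not appear in the source, and the function names are hypothetical.

```python
import math


def s8(sigma8, omega_m):
    """Conventional definition: S_8 = sigma_8 * sqrt(Omega_m / 0.3)."""
    return sigma8 * math.sqrt(omega_m / 0.3)


def gaussian_tension(mean_a, err_a, mean_b, err_b):
    """Difference of two independent Gaussian measurements in units of sigma."""
    return abs(mean_a - mean_b) / math.hypot(err_a, err_b)


rsd = (0.812, 0.026)  # RSD + SN + BAO result quoted in the abstract
cmb = (0.832, 0.013)  # placeholder CMB-like value, an assumption for illustration

tension = gaussian_tension(*rsd, *cmb)
print(f"tension = {tension:.2f} sigma")
```

With these inputs the difference is well below one sigma, which is the sense in which two constraints are usually called consistent.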

preprint · 2021 · arXiv

Container Orchestration on HPC Systems

Containerisation has demonstrated its efficiency for application deployment in cloud computing. Containers can encapsulate complex programs with their dependencies in isolated environments, and hence are being adopted in HPC clusters. HPC workload managers lack micro-services support and deeply integrated container management, as opposed to container orchestrators (e.g. Kubernetes). We introduce Torque-Operator (a plugin), which serves as a bridge between HPC workload managers and container orchestrators.

preprint · 2020 · arXiv

Collectives in hybrid MPI+MPI code: design, practice and performance

The use of a hybrid scheme that combines message passing programming models for inter-node parallelism with shared memory programming models for node-level parallelism is widespread. Extensive existing practice with hybrid Message Passing Interface (MPI) plus Open Multi-Processing (OpenMP) programming accounts for its popularity. Nevertheless, substantial programming effort is required to gain performance benefits from MPI+OpenMP code. An emerging hybrid method that combines MPI and the MPI shared memory model (MPI+MPI) is promising. However, writing an efficient hybrid MPI+MPI program, especially when collective communication operations are involved, is not to be taken for granted. In this paper, we propose a new design method to implement hybrid MPI+MPI context-based collective communication operations. Our method avoids the on-node memory replications (on-node communication overheads) that are required by semantics in pure MPI. We also offer wrapper primitives that hide all the design details from users, along with guidance on how to structure hybrid MPI+MPI code with these primitives. The micro-benchmarks show that our collectives are comparable or superior to those in the pure MPI context. We have further validated the effectiveness of the hybrid MPI+MPI model (which uses our wrapper primitives) in three computational kernels, by comparison to the pure MPI and hybrid MPI+OpenMP models.

preprint · 2020 · arXiv

MPI Collectives for Multi-core Clusters: Optimized Performance of the Hybrid MPI+MPI Parallel Codes

The advent of multi-/many-core processors in clusters advocates hybrid parallel programming, which combines Message Passing Interface (MPI) for inter-node parallelism with a shared memory model for on-node parallelism. Compared to the traditional hybrid approach of MPI plus OpenMP, a new but promising hybrid approach of MPI plus MPI-3 shared-memory extensions (MPI+MPI) is gaining traction. We describe an algorithmic approach for collective operations (with allgather and broadcast as concrete examples) in the context of hybrid MPI+MPI, so as to minimize memory consumption and memory copies. With this approach, only one memory copy is maintained and shared by on-node processes. This allows the removal of unnecessary on-node copies of replicated data that are required between MPI processes when the collectives are invoked in the context of pure MPI. We compare our approach to collectives for hybrid MPI+MPI with the traditional one for pure MPI, and also discuss the synchronization that is required to guarantee data integrity. The performance of our approach has been validated on a Cray XC40 system (Cray MPI) and an NEC cluster (OpenMPI), showing that it achieves comparable or better performance for allgather operations. We have further validated our approach with a standard computational kernel, namely distributed matrix multiplication, and a Bayesian Probabilistic Matrix Factorization code.

preprint · 2020 · arXiv

Taking the pulse of COVID-19: A spatiotemporal perspective

The sudden outbreak of the Coronavirus disease (COVID-19) swept across the world in early 2020, triggering the lockdowns of several billion people across many countries, including China, Spain, India, the U.K., Italy, France, Germany, and most states of the U.S. The transmission of the virus accelerated rapidly, with the most confirmed cases in the U.S., and New York City became an epicenter of the pandemic by the end of March. In response to this national and global emergency, the NSF Spatiotemporal Innovation Center brought together a taskforce of international researchers and assembled and implemented strategies to rapidly respond to this crisis, supporting research, saving lives, and protecting the health of global citizens. This perspective paper presents our collective view on the global health emergency and our effort in collecting, analyzing, and sharing relevant data on global policy and government responses, geospatial indicators of the outbreak, and evolving forecasts; in developing research capabilities and mitigation measures with global scientists; in promoting collaborative research on outbreak dynamics; and in reflecting on the dynamic responses from human societies.

preprint · 2019 · arXiv

Model-independent Estimations for the Cosmic Curvature from the Latest Strong Gravitational Lensing Systems

Model-independent measurements of the cosmic spatial curvature, which is related to the nature of cosmic space-time geometry, play an important role in cosmology. On the basis of the Distance Sum Rule in the Friedmann-Lemaître-Robertson-Walker metric, distance ratio measurements of strong gravitational lensing (SGL) systems, together with distances from type Ia supernovae observations, have been proposed to directly estimate the spatial curvature without any assumptions about the theory of gravity or the contents of the universe. However, previous studies indicated that a spatially closed universe was strongly preferred. In this paper, we re-estimate the cosmic curvature with the latest SGL data, which include 163 well-measured systems. In addition, possible factors that might affect estimates of the spatial curvature, e.g. the combination of SGL data from different surveys and the stellar mass of the lens galaxy, are considered in our analysis. We find that, except in the case where only SGL systems from the Sloan Lens ACS Survey are considered, a spatially flat universe is consistently favored at very high confidence level by the latest observations. It is suggested that the increasing number of well-measured strong lensing events might significantly reduce the bias in estimates of the cosmic curvature.
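The Distance Sum Rule mentioned above can be written compactly in code. The sketch below uses the standard form of the rule for dimensionless comoving distances $d = H_0 D_M / c$, namely $d_{ls}/d_s = \sqrt{1+\Omega_k d_l^2} - (d_l/d_s)\sqrt{1+\Omega_k d_s^2}$. The function name and the sample distances are hypothetical, and this is not the paper's fitting code.

```python
import math


def distance_ratio(d_l, d_s, omega_k):
    """d_ls / d_s from the Distance Sum Rule in an FLRW metric.

    d_l and d_s are dimensionless comoving distances (H0 * D_M / c) to the
    lens and the source; omega_k is the curvature density parameter.
    """
    term_l = math.sqrt(1.0 + omega_k * d_l ** 2)
    term_s = math.sqrt(1.0 + omega_k * d_s ** 2)
    return term_l - (d_l / d_s) * term_s


# Hypothetical lens and source distances, chosen only for illustration.
flat = distance_ratio(0.3, 0.9, 0.0)     # flat universe: reduces to 1 - d_l / d_s
closed = distance_ratio(0.3, 0.9, -0.1)  # spatially closed universe
print(round(flat, 4), round(closed, 4))
```

Since the lensing observable fixes the ratio and supernovae supply $d_l$ and $d_s$, the only remaining unknown in the relation is $\Omega_k$, which is what makes the estimate independent of a specific cosmological model.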