Researcher profile

Jian Xiao

Jian Xiao contributes to research discovery and scholarly infrastructure.

Published work

7 published items

Preprint, 2023, arXiv

HLC2: a highly efficient cross-matching framework for large astronomical catalogues on heterogeneous computing environments

Cross-matching, which finds corresponding data for the same celestial object or region across multiple catalogues, is indispensable to astronomical data analysis and research. Due to the large number of astronomical catalogues generated by ongoing and next-generation large-scale sky surveys, the time cost of cross-matching is increasing dramatically. Heterogeneous computing environments provide a theoretical possibility to accelerate cross-matching, but the performance advantages of heterogeneous computing resources have not been fully utilized. To meet the challenge of cross-matching a substantially increasing amount of astronomical observation data, this paper proposes the Heterogeneous-computing-enabled Large Catalogue Cross-matcher (HLC2), a high-performance cross-matching framework based on spherical position deviation on CPU-GPU heterogeneous computing platforms. It supports scalable and flexible cross-matching and can be directly applied to the fusion of large astronomical catalogues from survey missions and astronomical data centres. A performance estimation model is proposed to locate the performance bottlenecks and guide the optimizations. A two-level partitioning strategy is designed to generate an optimized data placement according to the positions of celestial objects to increase throughput. To make HLC2 a more adaptive solution, architecture-aware task splitting, thread parallelization, and concurrent scheduling strategies are designed and integrated. Moreover, a novel quad-direction strategy is proposed for the boundary problem to effectively balance performance and completeness. We have experimentally evaluated HLC2 using publicly released catalogue data. Experiments demonstrate that HLC2 scales well on different sizes of catalogues and that the cross-matching speed is significantly improved compared to state-of-the-art cross-matchers.
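As a rough illustration of what positional cross-matching means, the sketch below pairs each source in one catalogue with its nearest neighbour in another, provided the two lie within a chosen angular radius. It is not HLC2's implementation: the function names, the (RA, Dec) tuple layout, and the brute-force O(N x M) loop are all assumptions made for readability, whereas the abstract describes partitioning and CPU-GPU scheduling to avoid exactly this cost.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a, cat_b, radius_deg):
    """Hypothetical helper: for each (ra, dec) in cat_a, return the index of the
    nearest cat_b source within radius_deg, or None if there is none."""
    matches = []
    for ra_a, dec_a in cat_a:
        best_idx, best_sep = None, radius_deg
        for j, (ra_b, dec_b) in enumerate(cat_b):
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_idx, best_sep = j, sep
        matches.append(best_idx)
    return matches


if __name__ == "__main__":
    catalogue_a = [(10.0, 20.0), (180.0, -45.0)]
    catalogue_b = [(10.0002, 20.0001), (50.0, 50.0)]
    one_arcsec = 1.0 / 3600.0
    print(cross_match(catalogue_a, catalogue_b, one_arcsec))
```

A production cross-matcher would first bin sources by sky position so that each source is compared only with nearby candidates; the brute-force loop here is kept solely to make the matching criterion explicit.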

Preprint, 2022, arXiv

HEGrid: A High Efficient Multi-Channel Radio Astronomical Data Gridding Framework in Heterogeneous Computing Environments

The challenge to fully exploit the potential of existing and upcoming scientific instruments like large single-dish radio telescopes is to process the collected massive data effectively and efficiently. As a "quasi 2D stencil computation" with the "Moore neighborhood pattern," gridding is the most computationally intensive step in the data reduction pipeline for radio astronomy studies, enabling astronomers to create correct sky images for further analysis. However, the existing gridding frameworks can either only run on multi-core CPU architecture or do not support high-concurrency, multi-channel data gridding. Their performance is then limited, and there are emerging needs for innovative gridding frameworks to process data from large single-dish radio telescopes like the Five-hundred-meter Aperture Spherical radio Telescope (FAST). To address those challenges, we developed a High Efficient Gridding framework, HEGrid, by overcoming the above limitations. Specifically, we propose and construct the gridding pipeline in heterogeneous computing environments and achieve multi-pipeline concurrency for high-performance multi-channel processing. Furthermore, we propose pipeline-based co-optimization to alleviate the potential negative performance impact of possible intra- and inter-pipeline low computation and I/O utilization, including component share-based redundancy elimination, thread-level data reuse, and overlapping I/O and computation. Our experiments are based on both simulated datasets and actual FAST observational datasets. The results show that HEGrid outperforms other state-of-the-art gridding frameworks by up to 5.5x and has robust hardware portability, including AMD Radeon Instinct GPU and NVIDIA GPU.

Preprint, 2020, arXiv

A Redistribution Tool for Long-Term Archive of Astronomical Observation Data

Astronomical observation data require long-term preservation, and the rapid accumulation of observation data makes it necessary to consider the cost of long-term archive storage. In addition to low-speed disk-based online storage, optical disk or tape-based offline storage can be used to save costs. However, for astronomical research that requires historical data (particularly time-domain astronomy), the performance and energy consumption of data-accessing techniques cause problems because the requested data (which are organized according to observation time) may be located across multiple storage devices. In this study, we design and develop a tool referred to as AstroLayout to redistribute the observation data using spatial aggregation. The core algorithm uses graph partitioning to generate an optimized data placement according to the original observation data statistics and the target storage system. For the given observation data, AstroLayout can copy the long-term archive in the target storage system in accordance with this placement. An efficiency evaluation shows that AstroLayout can reduce the number of devices activated when responding to data-access requests in time-domain astronomy research. In addition to improving the performance of data-accessing techniques, AstroLayout can also reduce the storage system's power consumption. For enhanced adaptability, it supports storage systems of any media, including optical disks, tapes, and hard disks.

Preprint, 2020, arXiv

Deep residual detection of radio frequency interference for FAST

Radio frequency interference (RFI) detection and excision are key steps in the data-processing pipeline of the Five-hundred-meter Aperture Spherical radio Telescope (FAST). Because of its high sensitivity and large data rate, FAST requires more accurate and efficient RFI flagging methods than its counterparts. In recent decades, approaches based upon artificial intelligence (AI), such as codes using convolutional neural networks (CNNs), have been proposed to identify RFI more reliably and efficiently. However, RFI flagging of FAST data with such methods has often proved to be erroneous, with further manual inspections required. In addition, network construction as well as preparation of training data sets for effective RFI flagging has imposed significant additional workloads. Therefore, rapid deployment and adjustment of AI approaches for different observations is impractical to implement with existing algorithms. To overcome such problems, we propose a model called RFI-Net. With the input of raw data without any processing, RFI-Net can detect RFI automatically, producing corresponding masks without any alteration of the original data. Experiments with RFI-Net using simulated astronomical data show that our model has outperformed existing methods in terms of both precision and recall. Besides, compared with other models, our method can obtain the same relative accuracy with fewer training data, thus reducing the effort and time required to prepare the training data set. Further, the training process of RFI-Net can be accelerated, with overfitting being minimized, compared with other CNN codes. The performance of RFI-Net has also been evaluated with observational data obtained by FAST and the Bleien Observatory. Our results demonstrate the ability of RFI-Net to accurately identify RFI with fine-grained, high-precision masks that require no further modification.

Preprint, 2020, arXiv

HCGrid: A Convolution-based Gridding Framework for Radio Astronomy in Hybrid Computing Environments

Gridding operation, which is to map non-uniform data samples onto a uniformly distributed grid, is one of the key steps in the radio astronomical data reduction process. One of the main bottlenecks of gridding is the poor computing performance, and a typical solution for such a performance issue is the implementation of multi-core CPU platforms. Although such a method could usually achieve good results, in many cases, the performance of gridding is still restricted to an extent due to the limitations of CPU, since the main workload of gridding is a combination of a large number of single instruction, multi-data-stream operations, which is more suitable for GPU, rather than CPU implementations. To meet the challenge of massive data gridding for the modern large single-dish radio telescopes, e.g., the Five-hundred-meter Aperture Spherical radio Telescope (FAST), inspired by existing multi-core CPU gridding algorithms such as Cygrid, here we present an easy-to-install, high-performance, and open-source convolutional gridding framework, HCGrid, in CPU-GPU heterogeneous platforms. It optimises data search by employing multi-threading on CPU, and accelerates the convolution process by utilising massive parallelisation of GPU. In order to make HCGrid a more adaptive solution, we also propose the strategies of thread organisation and coarsening, as well as optimal parameter settings under various GPU architectures. A thorough analysis of computing time and performance gain with several GPU parallel optimisation strategies shows that it can lead to excellent performance in hybrid computing environments.
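To make the idea of convolution-based gridding concrete, here is a minimal sketch that maps irregularly placed samples onto a regular grid by taking a Gaussian-weighted mean of the samples near each cell. It is an illustration only, not HCGrid's or Cygrid's code: the function name `grid_samples`, the flat-sky (x, y) coordinates, the Gaussian kernel, and the `sigma` and `support` parameters are assumptions chosen for this example.

```python
import math


def grid_samples(samples, grid_x, grid_y, sigma, support):
    """Hypothetical helper: convolve (x, y, value) samples onto a regular grid.

    Each cell receives the Gaussian-weighted mean of all samples lying within
    `support` of the cell centre; cells with no nearby sample are set to None.
    """
    grid = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            weight_sum = 0.0
            value_sum = 0.0
            for x, y, value in samples:
                dist_sq = (x - gx) ** 2 + (y - gy) ** 2
                if dist_sq <= support ** 2:
                    weight = math.exp(-dist_sq / (2.0 * sigma ** 2))
                    weight_sum += weight
                    value_sum += weight * value
            row.append(value_sum / weight_sum if weight_sum > 0.0 else None)
        grid.append(row)
    return grid


if __name__ == "__main__":
    samples = [(0.1, 0.1, 2.0), (0.9, 0.1, 4.0)]
    image = grid_samples(samples, [0.0, 0.5, 1.0], [0.0], sigma=0.3, support=0.6)
    print(image)
```

The per-cell accumulation is independent across cells, which is the property that makes the workload a natural fit for the GPU parallelisation described in the abstract; this sketch simply runs the cells one after another.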

Preprint, 2020, arXiv

On the positivity of high-degree Schur classes of an ample vector bundle

Let $X$ be a smooth projective variety of dimension $n$, and let $E$ be an ample vector bundle over $X$. We show that any non-zero Schur class of $E$, lying in the cohomology group of bidegree $(n-1, n-1)$, has a representative which is strictly positive in the sense of smooth forms. This confirms the prediction of the Griffiths conjecture on the positive polynomials of Chern classes/forms of an ample vector bundle on the form level, and thus strengthens the celebrated positivity results of Fulton-Lazarsfeld for certain degrees.

Preprint, 2020, arXiv

Towards an Astronomical Science Platform: Experiences and Lessons Learned from Chinese Virtual Observatory

In the era of big data astronomy, next generation telescopes and large sky surveys produce data sets at the TB or even PB level. Due to their large data volumes, these astronomical data sets are extremely difficult to transfer and analyze using personal computers or small clusters. In order to offer better access to data, data centers now generally provide online science platforms that enable analysis close to the data. The Chinese Virtual Observatory (China-VO) is one of the member projects in the International Virtual Observatory Alliance and it is dedicated to providing a research and education environment where globally distributed astronomy archives are simple to find, access, and interoperate. In this study, we summarize highlights of the work conducted at the China-VO, as well as the experiences and lessons learned during the full life-cycle management of astronomical data. Finally, we discuss the challenges and future trends for astronomical science platforms.