Source author record

Tony Pan

Tony Pan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Catalog footprint

What is connected

10 works
7 topics
4 close collaborators

Published work

10 published items

preprint, 2016, arXiv

Parallel Pairwise Correlation Computation On Intel Xeon Phi Clusters

Co-expression network analysis is a critical technique for identifying inter-gene interactions, and it usually relies on all-pairs correlation (or a similar measure) computed between gene expression profiles across multiple samples. Pearson's correlation coefficient (PCC) is one widely used measure for gene co-expression network construction. However, all-pairs PCC computation is computationally demanding for large numbers of gene expression profiles, which motivates accelerating it with high-performance computing. In this paper, we present LightPCC, the first parallel and distributed all-pairs PCC computation on Intel Xeon Phi (Phi) clusters. It achieves high speed by exploiting SIMD-instruction-level and thread-level parallelism within Phis as well as accelerator-level parallelism among multiple Phis. To facilitate balanced workload distribution, we propose, for the first time, a general framework for symmetric all-pairs computation that builds bijective functions between job identifier and coordinate space. We have evaluated LightPCC and compared it to two CPU-based counterparts: a sequential C++ implementation in ALGLIB and an implementation based on a parallel general matrix-matrix multiplication routine in Intel Math Kernel Library (MKL) (all use double precision), using a set of gene expression datasets. Performance evaluation revealed that with one 5110P Phi and 16 Phis, LightPCC runs up to 20.6x and 218.2x faster than ALGLIB, and up to 6.8x and 71.4x faster than single-threaded MKL, respectively. In addition, LightPCC demonstrated good parallel scalability in terms of number of Phis. Source code of LightPCC is publicly available at http://lightpcc.sourceforge.net.
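
The idea of a bijection between job identifiers and pair coordinates can be illustrated with a small Python sketch. It is not LightPCC's code: the row-major enumeration of the strict upper triangle, the function names (`pair_to_job`, `job_to_pair`, `jobs_for_worker`, `all_pairs_pcc`) and the contiguous split of job ids across workers are all assumptions made for illustration.

```python
import math

def pair_to_job(i, j, n):
    """Map a pair (i, j) with 0 <= i < j < n to a job id in [0, n*(n-1)/2)."""
    return i * n - i * (i + 1) // 2 + (j - i - 1)

def job_to_pair(k, n):
    """Inverse of pair_to_job, using integer arithmetic only."""
    total = n * (n - 1) // 2
    r = total - 1 - k                      # index counted from the last pair
    m = (math.isqrt(8 * r + 1) - 1) // 2   # row counted from the bottom
    i = n - 2 - m
    j = k - (i * n - i * (i + 1) // 2) + i + 1
    return i, j

def jobs_for_worker(rank, workers, n):
    """Contiguous, near-equal slice of job ids for one worker (assumed scheme)."""
    total = n * (n - 1) // 2
    return range(rank * total // workers, (rank + 1) * total // workers)

def pearson(x, y):
    """Pearson's correlation coefficient; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def all_pairs_pcc(profiles, rank, workers):
    """Correlations for the pairs owned by one worker."""
    n = len(profiles)
    result = {}
    for k in jobs_for_worker(rank, workers, n):
        i, j = job_to_pair(k, n)
        result[(i, j)] = pearson(profiles[i], profiles[j])
    return result
```

Because every job id decodes to exactly one pair, each worker can find its own pairs from its rank alone, and the job counts per worker differ by at most one.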

preprint, 2014, arXiv

Region Templates: Data Representation and Management for Large-Scale Image Analysis

Distributed memory machines equipped with CPUs and GPUs (hybrid computing nodes) are hard to program because of the multiple layers of memory and heterogeneous computing configurations. In this paper, we introduce a region template abstraction for the efficient management of common data types used in analysis of large datasets of high resolution images on clusters of hybrid computing nodes. The region template provides a generic container template for common data structures, such as points, arrays, regions, and object sets, within a spatial and temporal bounding box. The region template abstraction enables different data management strategies and data I/O implementations, while providing a homogeneous, unified interface to the application for data storage and retrieval. The execution of region templates applications is coordinated by a runtime system that supports efficient execution in hybrid machines. Region templates applications are represented as hierarchical dataflow in which each computing stage may be represented as another dataflow of finer-grain tasks. A number of optimizations for hybrid machines are available in our runtime system, including performance-aware scheduling for maximizing utilization of computing devices and techniques to reduce impact of data transfers between CPUs and GPUs. An experimental evaluation on a state-of-the-art hybrid cluster using a microscopy imaging study shows that this abstraction adds negligible overhead (about 3%) and achieves good scalability.

preprint, 2013, arXiv

Finding Core Collapse Supernova from the Epoch of Reionization Behind Cluster Lenses

Current surveys are underway to utilize gravitational lensing by galaxy clusters with Einstein radii >35" in the search for the highest redshift galaxies. Associated supernovae from the epoch of reionization would have their fluxes boosted above the detection threshold, extending their duration of visibility. We predict that the James Webb Space Telescope (JWST) will be able to discover lensed core-collapse supernovae at redshifts exceeding z=7-8.

preprint, 2013, arXiv

Super-luminous X-ray Emission from the Interaction of Supernova Ejecta with Dense Circumstellar Shells

For supernovae powered by the conversion of kinetic energy into radiation due to the interaction of the ejecta with a dense circumstellar shell, we show that there could be X-ray analogues of optically super-luminous SNe with comparable luminosities and energetics. We consider X-ray emission from the forward shock of SN ejecta colliding with an optically-thin CSM shell, derive simple expressions for the X-ray luminosity as a function of the circumstellar shell characteristics, and discuss the different regimes in which the shock will be radiative or adiabatic, and whether the emission will be dominated by free-free radiation or line-cooling. We find that even with normal supernova explosion energies of 10^51 erg, there exist CSM shell configurations that can liberate a large fraction of the explosion energy in X-rays, producing events with unabsorbed X-ray luminosities approaching 10^44 erg/s that last a few months, or even 10^45 erg/s flashes lasting days. Although the large column density of the circumstellar shell can absorb most of the flux from the initial shock, the most luminous events produce hard X-rays that are less susceptible to photoelectric absorption, and can counteract such losses by completely ionizing the intervening material. Regardless, once the shock traverses the entire circumstellar shell, the full luminosity could be available to observers.

preprint, 2012, arXiv

Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines

In this paper, we address the problem of efficient execution of a computation pattern, referred to here as the irregular wavefront propagation pattern (IWPP), on hybrid systems with multiple CPUs and GPUs. The IWPP is common in several image processing operations. In the IWPP, data elements in the wavefront propagate waves to their neighboring elements on a grid if a propagation condition is satisfied. Elements receiving the propagated waves become part of the wavefront. This pattern results in irregular data accesses and computations. We develop and evaluate strategies for efficient computation and propagation of wavefronts using a multi-level queue structure. This queue structure improves the utilization of fast memories in a GPU and reduces synchronization overheads. We also develop a tile-based parallelization strategy to support execution on multiple CPUs and GPUs. We evaluate our approaches on a state-of-the-art GPU-accelerated machine (equipped with 3 GPUs and 2 multicore CPUs) using the IWPP implementations of two widely used image processing operations: morphological reconstruction and Euclidean distance transform. Our results show significant performance improvements on GPUs. The use of multiple CPUs and GPUs cooperatively attains speedups of 50x and 85x with respect to single-core CPU executions for morphological reconstruction and Euclidean distance transform, respectively.
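
The propagation pattern described above can be sketched sequentially for grayscale morphological reconstruction. This is a minimal single-queue CPU illustration, not the multi-level GPU queue or the tile-based scheme from the paper; the function name `morph_reconstruct`, the 4-connected neighborhood and the choice to seed the queue with every pixel are assumptions of this sketch.

```python
from collections import deque

def morph_reconstruct(marker, mask):
    """Grayscale reconstruction by dilation of `marker` under `mask`.

    Both arguments are lists of equal-length rows of numbers.
    """
    rows, cols = len(mask), len(mask[0])
    # Clamp the marker so it never exceeds the mask.
    out = [[min(marker[r][c], mask[r][c]) for c in range(cols)]
           for r in range(rows)]
    # Assumed seeding: every pixel starts in the wavefront.
    queue = deque((r, c) for r in range(rows) for c in range(cols))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Propagation condition: the neighbor can be raised,
                # but never above its own mask value.
                candidate = min(out[r][c], mask[nr][nc])
                if candidate > out[nr][nc]:
                    out[nr][nc] = candidate
                    queue.append((nr, nc))  # neighbor joins the wavefront
    return out
```

Which pixels re-enter the queue depends on the data, which is the source of the irregular memory accesses the abstract mentions.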

preprint, 2012, arXiv

High-throughput Execution of Hierarchical Analysis Pipelines on Hybrid Cluster Platforms

We propose, implement, and experimentally evaluate a runtime middleware to support high-throughput execution of large-scale analysis applications on hybrid cluster machines. A hybrid cluster machine consists of computation nodes which have multiple CPUs and general purpose graphics processing units (GPUs). Our work targets scientific analysis applications in which datasets are processed in application-specific data chunks, and the processing of a data chunk is expressed as a hierarchical pipeline of operations. The proposed middleware system combines a bag-of-tasks style execution with coarse-grain dataflow execution. Data chunks and associated data processing pipelines are scheduled across cluster nodes using a demand-driven approach, while within a node operations in a given pipeline instance are scheduled across CPUs and GPUs. The runtime system implements several optimizations, including performance-aware task scheduling, architecture-aware process placement, data-locality-conscious task assignment, and data prefetching and asynchronous data copy, to maximize utilization of the aggregate computing power of CPUs and GPUs and minimize data copy overheads. The application and performance benefits of the runtime middleware are demonstrated using an image analysis application, which is employed in a brain cancer study, on a state-of-the-art hybrid cluster in which each node has two 6-core CPUs and three GPUs. Our results show that implementing and scheduling application data processing as a set of fine-grain operations provides more opportunities for runtime optimizations and attains better performance than a coarser-grain, monolithic implementation. The proposed runtime system can achieve high-throughput processing of large datasets: we were able to process an image dataset consisting of 36,848 4Kx4K-pixel image tiles at a rate of about 150 tiles/second on 100 nodes.
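
The demand-driven assignment of data chunks can be sketched as a toy on a single machine. Here threads stand in for cluster nodes and a pipeline is a plain list of functions; the name `run_demand_driven` and this whole structure are assumptions for illustration, with no GPU scheduling, locality awareness or prefetching as in the actual middleware.

```python
import queue
import threading

def run_demand_driven(chunks, pipeline, num_workers=4):
    """Process each chunk through `pipeline`; idle workers pull the next chunk."""
    work = queue.Queue()
    for idx, chunk in enumerate(chunks):
        work.put((idx, chunk))
    results = [None] * len(chunks)

    def worker():
        while True:
            try:
                idx, chunk = work.get_nowait()  # ask for work only when idle
            except queue.Empty:
                return                          # bag of tasks is empty
            value = chunk
            for stage in pipeline:              # stages run in order
                value = stage(value)
            results[idx] = value                # each index is written once

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `run_demand_driven([1, 2, 3], [lambda x: x + 1, lambda x: x * 2], num_workers=2)` returns `[4, 6, 8]`. Because workers pull chunks instead of receiving a fixed share, a slow chunk delays only the worker that holds it.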

preprint, 2012, arXiv

Identifying Stars of Mass >150 Msun from Their Eclipse by a Binary Companion

We examine the possibility that very massive stars greatly exceeding the commonly adopted stellar mass limit of 150 Msun may be present in young star clusters in the local universe. We identify ten candidate clusters, some of which may host stars with masses up to 600 Msun formed via runaway collisions. We estimate the probabilities of these very massive stars being in eclipsing binaries to be >30%. Although most of these systems cannot be resolved at present, their transits can be detected at distances of 3 Mpc even under the contamination of the background cluster light, due to the large associated luminosities of ~10^7 Lsun and mean transit depths of ~10^6 Lsun. Discovery of very massive eclipsing binaries would flag possible progenitors of pair-instability supernovae and intermediate-mass black holes.

preprint, 2012, arXiv

Measuring the History of Cosmic Reionization using the 21-cm Difference PDF

During cosmic reionization, the 21-cm brightness fluctuations were highly non-Gaussian, and complementary statistics can be extracted from the distribution of pixel brightness temperatures that are not derivable from the 21-cm power spectrum. One such statistic is the 21-cm difference PDF, the probability distribution function of the difference in the 21-cm brightness temperatures between two points, as a function of the distance between the points. Guided by 21-cm difference PDFs extracted from simulations, we perform a maximum likelihood analysis on mock observational data, and analyze the ability of present and future low-frequency radio array experiments to estimate the shape of the 21-cm difference PDF, and measure the history of cosmic reionization. We find that one-year data with an experiment such as the Murchison Wide-field Array should suffice for probing large scales during the mid-to-late stages of reionization, while a second-generation experiment should yield detailed measurements over a wide range of scales during most of the reionization era.
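
As a rough illustration of the statistic defined above, the sketch below estimates a difference PDF from a one-dimensional periodic toy field. It is not the paper's analysis pipeline: the function name `difference_pdf`, the use of the signed difference, the periodic wrap-around and the simple histogram estimator are all assumptions.

```python
def difference_pdf(field, separation, bin_edges):
    """Histogram estimate of the PDF of field[i] - field[i + separation].

    `field` is a list of brightness temperatures on a periodic 1-D grid,
    `separation` is a pixel offset, and `bin_edges` is an increasing list
    of bin boundaries. Differences outside the bins are ignored.
    """
    n = len(field)
    counts = [0] * (len(bin_edges) - 1)
    for i in range(n):
        diff = field[i] - field[(i + separation) % n]
        for b in range(len(counts)):
            if bin_edges[b] <= diff < bin_edges[b + 1]:
                counts[b] += 1
                break
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    # Normalize so the estimate integrates to one over the binned range.
    return [c / (total * (bin_edges[b + 1] - bin_edges[b]))
            for b, c in enumerate(counts)]
```

Evaluating the function for several values of `separation` gives the PDF as a function of distance, which is the dependence the abstract describes.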

preprint, 2012, arXiv

Pair-Instability Supernovae at the Epoch of Reionization

Pristine stars with masses between ~140 and 260 M_sun are theoretically predicted to die as pair-instability supernovae. These very massive progenitors could come from Pop III stars in the early universe. We model the light curves and spectra of pair-instability supernovae over a range of masses and envelope structures. At redshifts of reionization z >= 6, we calculate the rates and detectability of pair-instability and core-collapse supernovae, and show that with the James Webb Space Telescope, it is possible to determine the contribution of Pop III and Pop II stars toward reionization by constraining the stellar initial mass function at that epoch using these supernovae. We also find the rates of Type Ia supernovae, and show that they are not rare during reionization, and can be used to probe the mass function at 4-8 M_sun. If the budget of ionizing photons was dominated by contributions from top-heavy Pop III stars, we predict that the bright end of the galaxy luminosity function will be contaminated by pair-instability supernovae.

preprint, 2012, arXiv

Pair-Instability Supernovae via Collision Runaway in Young Dense Star Clusters

Stars with helium cores between ~64 and 133 M_sun are theoretically predicted to die as pair-instability supernovae. This requires very massive progenitors, which are theoretically prohibited for Pop II/I stars within the Galactic stellar mass limit due to mass loss via line-driven winds. However, the runaway collision of stars in a dense, young star cluster could create a merged star with sufficient mass to end its life as a pair-instability supernova, even with enhanced mass loss at non-zero metallicity. We show that the predicted rate from this mechanism is consistent with the inferred volumetric rate of ~2x10^-9 Mpc^-3 yr^-1 of the two observed pair-instability supernovae, SN 2007bi and PTF 10nmn, neither of which has a metal-free host galaxy. Contrary to prior literature, only pair-instability supernovae at low redshifts z<2 will be observable with the Large Synoptic Survey Telescope (LSST). We estimate the telescope will observe ~10^2 such events per year that originate from the collisional runaway mergers in clusters.