Researcher profile

Toshiyuki Fukushige

Toshiyuki Fukushige contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
15 works
0 followers
5 topics
4 close collaborators

Published work

15 published items

Preprint · 2016 · arXiv

Hierarchical Tree Algorithm for Collisional N-body Simulations on GRAPE

We present an implementation of the hierarchical tree algorithm combined with the individual timestep algorithm (the Hermite scheme) for collisional $N$-body simulations, running on the GRAPE-9 system, a special-purpose hardware accelerator for gravitational many-body simulations. Such a combination of the tree algorithm and the individual timestep algorithm was difficult on previous GRAPE systems, mainly because their memory addressing scheme was limited to sequential access to a full set of particle data. The GRAPE-9 system has an indirect memory addressing unit and a particle memory large enough to store the data of all particles as well as the tree nodes. The indirect memory addressing unit stores the interaction lists for the tree algorithm, which are constructed on the host computer, and the force pipelines calculate only the interactions given in these lists. In our implementation, the number of interaction calculations is significantly reduced compared to the direct $N^2$ summation of the original Hermite scheme. For example, for a simulation with $N=10^6$ we achieve a speedup of about a factor of 30 over the Hermite scheme (equivalent to about 17 teraflops), using hardware with a peak speed of 0.6 teraflops for the Hermite scheme.
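The division of labour described above (interaction lists built on the host, forces evaluated only for the listed interactions) can be illustrated with a minimal Barnes-Hut sketch in pure Python. It is not the authors' GRAPE-9 code; it assumes a monopole-only octree over the unit cube, the opening test "cell size < θ × distance to the cell's centre of mass", Plummer softening ε, G = 1, and acceleration only (the Hermite scheme would also need the jerk). The names `build_tree`, `interaction_list` and `accel` are hypothetical.

```python
import math
import random

class Node:
    """Cubic octree cell holding one particle (leaf) or eight children."""
    def __init__(self, center, half):
        self.center, self.half = center, half   # cell centre and half side length
        self.children = None                    # eight child Nodes once split
        self.index = None                       # particle index for a one-particle leaf
        self.mass, self.com = 0.0, list(center) # filled in by compute_moments

def octant(node, p):
    """Child index 0..7: bit k is set when p[k] >= centre[k]."""
    return sum(1 << k for k in range(3) if p[k] >= node.center[k])

def insert(node, i, pos):
    if node.children is None and node.index is None:
        node.index = i                          # empty leaf: store the particle
        return
    if node.children is None:                   # occupied leaf: split into 8 cells
        h = node.half / 2
        node.children = [Node([node.center[k] + (h if (o >> k) & 1 else -h)
                               for k in range(3)], h) for o in range(8)]
        old, node.index = node.index, None
        insert(node.children[octant(node, pos[old])], old, pos)
    insert(node.children[octant(node, pos[i])], i, pos)

def compute_moments(node, pos, mass):
    """Total mass and centre of mass, bottom-up (monopole only)."""
    if node.children is None:
        if node.index is not None:
            node.mass, node.com = mass[node.index], list(pos[node.index])
        return
    for c in node.children:
        compute_moments(c, pos, mass)
    node.mass = sum(c.mass for c in node.children)
    node.com = [sum(c.mass * c.com[k] for c in node.children) / node.mass
                for k in range(3)]

def build_tree(pos, mass):
    root = Node([0.5, 0.5, 0.5], 0.5)           # assumes positions in [0, 1)^3
    for i in range(len(pos)):
        insert(root, i, pos)
    compute_moments(root, pos, mass)
    return root

def interaction_list(node, i, pos, theta, out):
    """Host-side step: append (mass, position) sources acting on particle i."""
    if node.mass == 0.0:
        return                                  # empty cell
    if node.children is None:
        if node.index != i:
            out.append((node.mass, node.com))   # a single particle
        return
    d = math.dist(pos[i], node.com)
    inside = all(abs(pos[i][k] - node.center[k]) <= node.half for k in range(3))
    if not inside and 2 * node.half < theta * d:
        out.append((node.mass, node.com))       # distant cell as one pseudo-particle
    else:
        for c in node.children:
            interaction_list(c, i, pos, theta, out)

def accel(i, sources, pos, eps=1e-3):
    """Accelerator-side step: softened gravity (G = 1) summed over the list."""
    a = [0.0, 0.0, 0.0]
    for m, c in sources:
        dx = [c[k] - pos[i][k] for k in range(3)]
        r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps * eps
        f = m / (r2 * math.sqrt(r2))
        a = [a[k] + f * dx[k] for k in range(3)]
    return a

random.seed(1)
n = 200
pos = [[random.random() for _ in range(3)] for _ in range(n)]
mass = [1.0 / n] * n
root = build_tree(pos, mass)
exact_list, tree_list = [], []
interaction_list(root, 0, pos, 0.0, exact_list)  # theta = 0 opens every cell
interaction_list(root, 0, pos, 0.5, tree_list)
direct = accel(0, [(mass[j], pos[j]) for j in range(1, n)], pos)
approx = accel(0, tree_list, pos)
rel_err = math.dist(approx, direct) / math.hypot(*direct)
print(len(exact_list), len(tree_list), rel_err)
```

With `theta = 0` the list degenerates to all other particles and reproduces the direct $N^2$ sum; a larger `theta` shortens the list at some cost in accuracy, which is the trade-off that indirect addressing lets the hardware exploit.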

Preprint · 2014 · arXiv

A development of an accelerator board dedicated for multi-precision arithmetic operations and its application to Feynman loop integrals

Higher-order corrections in perturbative quantum field theory are required for the precise theoretical analyses used to investigate new physics beyond the Standard Model. This means that we need to evaluate Feynman loop diagrams with multi-loop integrals, which may require multi-precision calculation. We developed a dedicated accelerator system for multi-precision calculation (GRAPE9-MPX). We present performance results of our system for Feynman two-loop box and three-loop self-energy diagrams computed with multi-precision arithmetic.
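The abstract does not spell out why double precision can be insufficient. One typical reason in numerical integration is cancellation: when large terms nearly cancel, small contributions are lost to rounding. The sketch below illustrates only that effect, using Python's standard `decimal` module as a stand-in for multi-precision arithmetic; it says nothing about the number formats or algorithms of GRAPE9-MPX, and the function names are hypothetical.

```python
from decimal import Decimal, localcontext

def residual_double(big: float, small: float) -> float:
    """(big + small) - big in IEEE double precision (53-bit significand)."""
    return (big + small) - big

def residual_decimal(big: str, small: str, digits: int = 50) -> Decimal:
    """The same expression evaluated with `digits` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        b, s = Decimal(big), Decimal(small)
        return (b + s) - b

print(residual_double(1e16, 1.0))      # 0.0: the 1 is lost to rounding
print(residual_decimal("1e16", "1"))   # 1
```

In double precision the spacing between representable numbers near $10^{16}$ is 2, so adding 1 has no effect; with 50 decimal digits the small term survives the cancellation.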

Preprint · 2010 · arXiv

Effects of Hardness of Primordial Binaries on Evolution of Star Clusters

We investigate the effects of the hardness of primordial binaries on the whole evolution of star clusters by means of N-body simulations. Using a newly developed code, GORILLA, we simulated eleven N=16384 clusters with primordial binaries whose binding energies are equal within each cluster, in the range 1-300 kT_0, where 1.5 kT_0 is the average stellar kinetic energy at the initial time. We found that in both the soft (< 3 kT_0) and hard (> 300 kT_0) limits, clusters experience deep core collapse. At intermediate hardness (10-100 kT_0), core collapse halts halfway because of energy release by the primordial binaries. The core radii at the halt can be explained by their energy
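To make the kT_0 scale concrete, the sketch below converts a binding energy of x kT_0 into a binary semi-major axis. It rests on assumptions that are not stated in the abstract: standard N-body units (G = M = 1, total energy -1/4, so kinetic energy T = 1/4 in virial equilibrium), equal-mass stars of mass 1/N, and a binding energy G m^2 / (2a). The helper names are hypothetical.

```python
def kT0(N, T=0.25):
    """kT_0 defined by: mean stellar kinetic energy T/N = 1.5 kT_0.
    T = 1/4 assumes standard N-body units in virial equilibrium."""
    return (2.0 / 3.0) * T / N

def semi_major_axis(x, N):
    """Semi-major axis a of a binary with G m^2 / (2a) = x kT_0,
    assuming two components of mass m = 1/N each (hypothetical setup)."""
    m = 1.0 / N
    return m * m / (2.0 * x * kT0(N))

N = 16384
for x in (1, 3, 10, 100, 300):
    print(f"{x:4d} kT_0 -> a = {semi_major_axis(x, N):.2e}")
```

Under these assumptions a = 3/(xN), so the 1-300 kT_0 range spans roughly 1.8e-4 down to 6.1e-7 length units: harder binaries are tighter in inverse proportion to x.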

Preprint · 2010 · arXiv

Mass-Loss Timescale of Star Clusters in an External Tidal Field. II. Effect of Mass Profile of Parent Galaxy

We investigate the long-term dynamical evolution of star clusters in a steady tidal field produced by their parent galaxy. In this paper, we focus on the influence of the mass profile of the parent galaxy. Previous studies used the simplification that the parent galaxy is a point mass. We represent different mass profiles of the parent galaxy by tidal fields with different ratios of the epicyclic frequency to the angular velocity. We compare the mass-loss timescales of star clusters whose tidal radii are identical but whose parent galaxies have different mass profiles, by means of orbit calculations in a fixed cluster potential and N-body simulations. In this situation, the shallower the mass profile of the parent galaxy, the more rapidly a cluster orbits it. We found that, in a moderately strong tidal field, the mass-loss timescale increases by 20% and 50% when the mass density profile of the parent galaxy is proportional to $R^{-2}$ and $R^{-1.5}$, respectively, where $R$ is the distance from the galaxy center, compared to the point-mass case. Counterintuitively, a cluster that orbits the parent galaxy more rapidly has a longer lifetime. The increase in lifetime is due to the fact that the fraction of regular-like orbits increases in shallower profiles. Finally, we derive a formula for evaluating the mass-loss timescale of clusters. Our formula can explain a property of the observed population of Galactic globular clusters: their half-mass radii become smaller as their distances from the Galactic center become smaller.
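The link between the density slope and the ratio of epicyclic frequency $\kappa$ to angular velocity $\Omega$ follows from standard epicyclic theory. Assuming a circular cluster orbit at galactocentric radius $R$ in a spherical galaxy with $\rho \propto R^{-\alpha}$, cluster mass $m$ and Jacobi (tidal) radius $r_{\rm J}$ (symbols introduced here, not taken from the paper), a short derivation is:

```latex
\begin{aligned}
\rho \propto R^{-\alpha} &\;\Rightarrow\; \Omega^2 = \frac{G\,M(<R)}{R^3} \propto R^{-\alpha},\\
\kappa^2 &= R\,\frac{d\Omega^2}{dR} + 4\Omega^2 = (4-\alpha)\,\Omega^2,\\
r_{\rm J}^3 &= \frac{G m}{4\Omega^2 - \kappa^2} = \frac{G m}{\alpha\,\Omega^2}.
\end{aligned}
```

A point-mass galaxy has $\Omega^2 \propto R^{-3}$, i.e. $\alpha = 3$ in these expressions, giving $\kappa/\Omega = 1$; the $R^{-2}$ and $R^{-1.5}$ profiles give $\kappa/\Omega = \sqrt{2} \approx 1.41$ and $\sqrt{2.5} \approx 1.58$. At fixed $m$ and $r_{\rm J}$ the last line gives $\Omega \propto \alpha^{-1/2}$, so the orbital frequency exceeds the point-mass value by factors of about 1.22 and 1.41, consistent with clusters orbiting faster in shallower profiles.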

Preprint · 2009 · arXiv

GreeM : Massively Parallel TreePM Code for Large Cosmological N-body Simulations

In this paper, we describe the implementation and performance of GreeM, a massively parallel TreePM code for large-scale cosmological N-body simulations. GreeM uses a recursive multi-section algorithm for domain decomposition. The sizes of the domains are adjusted so that the total force-calculation time is the same for all processes. The loss of performance due to non-optimal load balancing is around 4%, even for more than $10^3$ CPU cores. GreeM runs efficiently on PC clusters and on massively parallel computers such as the Cray XT4. The measured calculation speed on the Cray XT4 is $5 \times 10^4$ particles per second per CPU core for an opening angle of $\theta = 0.5$, if the number of particles per CPU core is larger than $10^6$.
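Recursive multi-section cuts the volume into slabs along x, each slab into columns along y, and each column into boxes along z, placing every cut so that the pieces carry equal work. The serial sketch below assumes a per-particle cost weight as a stand-in for measured force-calculation time and sorts all particles on one process; a real parallel code would not do that, and how GreeM gathers timings and communicates boundaries is not described here. `multisection` and `_split` are hypothetical names.

```python
import random

def multisection(points, weights, divisions):
    """Split points into prod(divisions) domains of roughly equal total weight.
    divisions = (nx, ny, nz); returns one list of point indices per domain."""
    return _split(list(range(len(points))), points, weights, divisions, 0)

def _split(idx, points, weights, divisions, axis):
    if axis == len(divisions):
        return [idx]
    n = divisions[axis]
    order = sorted(idx, key=lambda i: points[i][axis])
    total = sum(weights[i] for i in order)
    slabs = [[] for _ in range(n)]
    cum = 0.0
    for i in order:
        # Assign by cumulative weight at the point's midpoint: slabs stay contiguous
        # along this axis and each receives about total / n of the weight.
        k = min(n - 1, int(n * (cum + 0.5 * weights[i]) / total)) if total > 0 else 0
        slabs[k].append(i)
        cum += weights[i]
    domains = []
    for slab in slabs:
        domains.extend(_split(slab, points, weights, divisions, axis + 1))
    return domains

random.seed(0)
pts = [tuple(random.random() for _ in range(3)) for _ in range(4000)]
# Hypothetical cost model: particles with x < 0.25 cost 10x more force work.
cost = [10.0 if p[0] < 0.25 else 1.0 for p in pts]
doms = multisection(pts, cost, (2, 2, 2))
loads = [sum(cost[i] for i in d) for d in doms]
print(len(doms), min(loads), max(loads))
```

With unit weights the same routine splits the 4000 particles into eight domains of exactly 500 each; with the uneven cost model, the domains covering the expensive region simply contain fewer particles, which is the sense in which domain sizes are "adjusted".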

Preprint · 2009 · arXiv

Variation of the subhalo abundance in dark matter halos

We analyzed the statistics of the subhalo abundance of galaxy-sized and giant-galaxy-sized halos formed in a high-resolution cosmological simulation of a 46.5 Mpc cube with a uniform mass resolution of $10^6 M_{\odot}$. We analyzed all 125 halos with masses above $1.5 \times 10^{12}M_{\odot}$ formed in this simulation box. We found that the subhalo abundance, measured by the number of subhalos with a maximum rotation velocity larger than 10% of that of the parent halo, shows large halo-to-halo variations. The results of recent ultra-high-resolution runs fall within the variation of our sample. We found that the concentration parameter and the radius at the moment of maximum expansion show a fairly tight correlation with the subhalo abundance. This correlation suggests that the variation of the subhalo abundance is at least partly due to differences in formation history: halos that formed earlier have fewer subhalos at present.

Preprint · 2008 · arXiv

Environmental effect on the subhalo abundance -- a solution to the missing dwarf problem

Recent high-resolution simulations of the formation of dark-matter halos have shown that the distribution of subhalos is scale-free, in the sense that, when scaled by the velocity dispersion of the parent halo, the velocity distribution functions of galaxy-sized and cluster-sized halos are identical. For cluster-sized halos, simulation results agree well with observations. Simulations, however, predict far too many subhalos for galaxy-sized halos: our galaxy has several tens of known dwarf galaxies, whereas simulated dark-matter halos contain thousands of subhalos. We have performed a simulation of a single large volume and measured the abundance of subhalos in all massive halos. We found that the variation of the subhalo abundance is very large, and that the halos with the largest numbers of subhalos correspond to the simulated halos of previous studies. The subhalo abundance depends strongly on the local density of the background: halos in high-density regions contain large numbers of subhalos. Our galaxy is in a low-density region. For our simulated halos in low-density regions, the number of subhalos is within a factor of three of that of our galaxy. We argue that the "missing dwarf problem" is not a real problem but is caused by the biased selection of initial conditions in previous studies, which were not appropriate for field galaxies.

Preprint · 2007 · arXiv

PGPG: An Automatic Generator of Pipeline Design for Programmable GRAPE Systems

We have developed PGPG (Pipeline Generator for Programmable GRAPE), software that generates the low-level design of the pipeline processor and communication software for FPGA-based computing engines (FBCEs). An FBCE typically consists of one or more FPGA (Field-Programmable Gate Array) chips and local memory. Here, the term "Field-Programmable" means that the logic implemented on the chip can be rewritten after the hardware is completed, so a single FBCE can be used to calculate various functions, for example as a pipeline processor for gravity, SPH interactions, or image processing. The main problem with FBCEs is that the user needs to develop the detailed hardware design of the processor to be implemented on the FPGA chips. In addition, the user has to write the control logic for the processor, the communication and data-conversion library on the host processor, and the application program that uses the developed processor. These tasks require detailed knowledge of hardware design, of a hardware description language such as VHDL, of the operating system, and of the application, and the amount of human work is huge: even a relatively simple design would require one person-year or more. The PGPG software generates all necessary design descriptions, except for the application software itself, from a high-level description of the pipeline processor written in the PGPG language. The PGPG language is a simple language specialized for describing pipeline processors, so designing a pipeline processor in PGPG is much easier than with the traditional approach. For real applications such as the pipeline for gravitational interaction, the pipeline processor generated by PGPG achieved performance similar to that of hand-written code. In this paper we present a detailed description of PGPG version 1.0.

Preprint · 2005 · arXiv

GRAPE-6A: A single-card GRAPE-6 for parallel PC-GRAPE cluster system

In this paper, we describe the design and performance of GRAPE-6A, a special-purpose computer for gravitational many-body simulations. It was designed to be used with a PC cluster in which each node has one GRAPE-6A. Such a configuration is particularly effective for running the parallel tree algorithm. Although the parallel tree algorithm could be used with the original GRAPE-6 hardware, this was not very cost-effective, since a single GRAPE-6 board was still too fast and too expensive. We therefore designed GRAPE-6A as a single PCI card to minimize the reproduction cost and optimize the computing speed. The peak performance is 130 Gflops for one GRAPE-6A board and 3.1 Tflops for our 24-node cluster. We describe the implementation of the tree, TreePM and individual timestep algorithms on both a single GRAPE-6A system and a GRAPE-6A cluster. Using the tree algorithm on our 16-node GRAPE-6A system, we can complete a collisionless simulation with 100 million particles (8000 steps) within 10 days.

Preprint · 2005 · arXiv

PPPM and TreePM Methods on GRAPE Systems for Cosmological N-body Simulations

We present Particle-Particle-Particle-Mesh (PPPM) and Tree Particle-Mesh (TreePM) implementations on GRAPE-5 and GRAPE-6A systems, special-purpose hardware accelerators for gravitational many-body simulations. In our PPPM and TreePM implementations on GRAPE, the computational time is significantly reduced compared with conventional implementations without GRAPE, especially under strong particle clustering, and is almost constant irrespective of the degree of particle clustering. We survey two simulation parameters, the PM grid spacing and the opening parameter, to find the optimal combination of force accuracy and computational speed. We also describe the parallelization of these implementations on a PC-GRAPE cluster, in which each node has one GRAPE board, and present the optimal configuration of simulation parameters for good parallel scalability.
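In a TreePM scheme each pair force is split into a short-range part, evaluated with the tree (here, on GRAPE), and a long-range remainder supplied by the PM mesh. The abstract does not state which splitting function is used; the sketch below assumes the Gaussian split common in TreePM codes, whose short-range factor is erfc(r/2r_s) + (r/(r_s√π)) exp(-r²/4r_s²) for a splitting scale r_s. The function names are hypothetical, and PPPM variants may use other split shapes.

```python
import math

def short_range_factor(r, r_s):
    """Fraction of the Newtonian pair force kept in the short-range (tree) part
    under a Gaussian force split with splitting scale r_s."""
    u = r / (2.0 * r_s)
    return math.erfc(u) + (r / (r_s * math.sqrt(math.pi))) * math.exp(-u * u)

def split_pair_force(r, r_s, G=1.0, m1=1.0, m2=1.0):
    """Return (short-range, long-range) magnitudes of the attractive pair force."""
    newton = G * m1 * m2 / (r * r)
    short = newton * short_range_factor(r, r_s)
    return short, newton - short   # the long-range part must come from the mesh

for r in (0.1, 1.0, 4.0, 10.0):
    s, l = split_pair_force(r, r_s=1.0)
    print(f"r = {r:5.1f}  short = {s:.3e}  long = {l:.3e}")
```

The short-range factor falls from 1 at small separations to a negligible value beyond a few r_s, which is why the tree walk can be truncated there; the choice of r_s relative to the PM grid spacing is one of the accuracy/speed trade-offs the abstract surveys.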

Preprint · 2004 · arXiv

Mass Loss Timescale of Star Clusters in External Tidal Field

We investigate the evolution of star clusters in an external tidal field by means of $N$-body simulations. We followed seven sets of cluster models that differ in central concentration and in the strength of the tidal field. We found that the mass-loss timescale due to the escape of stars, $t_{mloss}$, and its dependence on the two-body relaxation timescale, $t_{rh,i}$, are determined by the strength of the tidal field. The logarithmic slope, $d\ln t_{mloss}/d\ln t_{rh,i}$, becomes close to unity for cluster models in weaker tidal fields. The timescale and its dependence are almost independent of the central concentration for clusters in tidal fields of the same strength. In our results, the scaling found by Baumgardt (2001) is seen only in the cluster models with a moderately strong tidal field.
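The logarithmic slope above is the exponent s of a local power law $t_{mloss} \propto t_{rh,i}^{s}$. Given paired timescales from a set of runs, one simple estimator is an ordinary least-squares fit in log-log space. The sketch assumes that choice (the paper's fitting procedure is not stated here), and its data are synthetic, not results from the paper.

```python
import math

def log_slope(t_rh, t_mloss):
    """Least-squares estimate of d ln t_mloss / d ln t_rh,i from paired samples."""
    xs = [math.log(t) for t in t_rh]
    ys = [math.log(t) for t in t_mloss]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Synthetic, hypothetical data following an exact power law t_mloss = 5 * t_rh**0.75.
t_rh = [10.0, 30.0, 100.0, 300.0, 1000.0]
t_mloss = [5.0 * t ** 0.75 for t in t_rh]
print(round(log_slope(t_rh, t_mloss), 6))   # 0.75
```

A slope of 1 would mean that the mass-loss timescale scales in direct proportion to the initial half-mass relaxation time; values below 1 mean it grows more slowly than $t_{rh,i}$.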

Preprint · 2003 · arXiv

GRAPE-6: The massively-parallel special-purpose computer for astrophysical particle simulation

In this paper, we describe the architecture and performance of the GRAPE-6 system, a massively parallel special-purpose computer for astrophysical $N$-body simulations. GRAPE-6 is the successor of GRAPE-4, which was completed in 1995 and achieved a theoretical peak speed of 1.08 Tflops. As with GRAPE-4, the primary application of GRAPE-6 is the simulation of collisional systems, though it can also be used for collisionless systems. The main differences between GRAPE-4 and GRAPE-6 are: (a) the processor chip of GRAPE-6 integrates 6 force-calculation pipelines, compared to one pipeline in GRAPE-4 (which needed 3 clock cycles to calculate one interaction); (b) the clock speed is increased from 32 to 90 MHz; and (c) the total number of processor chips is increased from 1728 to 2048. These improvements resulted in a peak speed of 64 Tflops. We also discuss the design of the successor of GRAPE-6.
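The three improvements listed above can be combined into a single speed ratio. The sketch assumes, beyond what this page states, that each GRAPE-6 pipeline completes one interaction per clock cycle and that both machines count the same number of floating-point operations per interaction; `system_rate` is a hypothetical helper.

```python
from fractions import Fraction

def system_rate(interactions_per_cycle, clock_mhz, chips):
    """Peak interactions per microsecond for the whole machine."""
    return Fraction(interactions_per_cycle) * clock_mhz * chips

grape4 = system_rate(Fraction(1, 3), 32, 1728)   # one pipeline, 3 cycles per interaction
grape6 = system_rate(6, 90, 2048)                # assumed: 6 pipelines, 1 interaction/cycle each
ratio = grape6 / grape4
print(ratio)                  # 60
print(float(ratio) * 1.08)    # GRAPE-4 peak (Tflops) scaled by the ratio, about 64.8
```

The factors are 18 per chip, 90/32 in clock speed and 2048/1728 in chip count, whose product is exactly 60. Scaling 1.08 Tflops by 60 gives about 64.8 Tflops, slightly above the quoted 64 Tflops; the small gap is not explained by the information on this page.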

Preprint · 1999 · arXiv

GRAPE-5: A Special-Purpose Computer for N-body Simulation

We have developed GRAPE-5, a special-purpose computer for gravitational many-body simulations. GRAPE-5 is the successor of GRAPE-3. Both consist of eight custom pipeline chips (the G5 chip and the GRAPE chip, respectively). The differences between GRAPE-5 and GRAPE-3 are: (1) The G5 chip contains two pipelines operating at 80 MHz, while the GRAPE chip had one at 20 MHz. Thus, the G5 chip and the GRAPE-5 board are 8 times faster than the GRAPE chip and the GRAPE-3 board. (2) The GRAPE-5 board adopted the PCI bus as the interface to the host computer instead of the VME bus of GRAPE-3, making communication one order of magnitude faster. (3) In addition to the pure 1/r potential, the G5 chip can calculate forces with arbitrary cutoff functions, so that it can be applied to Ewald or P$^3$M methods. (4) The pairwise force calculated on GRAPE-5 is about 10 times more accurate than that on GRAPE-3. On one GRAPE-5 board, one timestep of a 128k-body simulation with the direct summation algorithm takes 14 seconds. With the Barnes-Hut tree algorithm ($\theta = 0.75$), one timestep of a $10^6$-body simulation can be done in 16 seconds.

Preprint · 1999 · arXiv

PROGRAPE-1: A Programmable, Multi-Purpose Computer for Many-Body Simulations

We have developed PROGRAPE-1 (PROgrammable GRAPE-1), a programmable multi-purpose computer for many-body simulations. The main difference between PROGRAPE-1 and "traditional" GRAPE systems is that the former uses FPGA (Field Programmable Gate Array) chips as processing elements, while the latter rely on hardwired pipeline processors specialized for gravitational interactions. Since the logic implemented in FPGA chips can be reconfigured, PROGRAPE-1 can calculate not only gravitational interactions but also other forms of interaction, such as the van der Waals force and the hydrodynamical interactions in SPH calculations. PROGRAPE-1 comprises two Altera EPF10K100 FPGA chips, each of which nominally contains 100,000 gates. To evaluate the programmability and performance of PROGRAPE-1, we implemented a pipeline for gravitational interaction similar to that of GRAPE-3. One pipeline fitted into a single FPGA chip and operated at a 16 MHz clock. Thus, for gravitational interaction, PROGRAPE-1 provided a speed of 0.96 Gflops-equivalent. PROGRAPE will prove useful for a wide range of particle-based simulations in which the calculation cost of interactions other than gravity is high, such as the evaluation of SPH interactions.

Preprint · 1994 · arXiv

Pre-Collapse Evolution of Galactic Globular Clusters

This paper is concerned with collisionless aspects of the early evolution of model star clusters. The effects of mass loss through stellar evolution and of a steady tidal field are modelled using $N$-body simulations. Our results (which depend on the assumed initial structure and the mass spectrum) agree qualitatively with those of Chernoff & Weinberg (1990), who used a Fokker-Planck model with a spherically symmetric tidal cutoff. For those systems which are disrupted, the lifetime to disruption generally exceeds that found by Chernoff & Weinberg, sometimes by as much as an order of magnitude. Because we do not model collisional effects correctly, we cannot establish the fate of the survivors. In terms of theoretical interpretation, we find that tidal disruption must be understood as a loss of equilibrium, and not a loss of stability, as is sometimes stated.