Source author record

J. Ramanujam

J. Ramanujam appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

cond-mat.dis-nn Computational Complexity cond-mat.stat-mech cond-mat.str-el Data Structures and Algorithms Distributed, Parallel, and Cluster Computing Other Computer Science

Catalog footprint

What is connected

6works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2014arXiv

On Characterizing the Data Access Complexity of Programs

Technology trends will cause data movement to account for the majority of energy expenditure and execution time on emerging computers. Therefore, computational complexity will no longer be a sufficient metric for comparing algorithms, and a fundamental characterization of data access complexity will be increasingly important. The problem of developing lower bounds for data access complexity has been modeled using the formalism of Hong & Kung's red/blue pebble game for computational directed acyclic graphs (CDAGs). However, previously developed approaches to lower bounds analysis for the red/blue pebble game are very limited in effectiveness when applied to CDAGs of real programs, with computations comprised of multiple sub-computations with differing DAG structure. We address this problem by developing an approach for effectively composing lower bounds based on graph decomposition. We also develop a static analysis algorithm to derive the asymptotic data-access lower bounds of programs, as a function of the problem size and cache size.

preprint2014arXiv

On Characterizing the Data Movement Complexity of Computational DAGs for Parallel Execution

Technology trends are making the cost of data movement increasingly dominant, both in terms of energy and time, over the cost of performing arithmetic operations in computer systems. The fundamental ratio of aggregate data movement bandwidth to the total computational power (also referred to the machine balance parameter) in parallel computer systems is decreasing. It is there- fore of considerable importance to characterize the inherent data movement requirements of parallel algorithms, so that the minimal architectural balance parameters required to support it on future systems can be well understood. In this paper, we develop an extension of the well-known red-blue pebble game to develop lower bounds on the data movement complexity for the parallel execution of computational directed acyclic graphs (CDAGs) on parallel systems. We model multi-node multi-core parallel systems, with the total physical memory distributed across the nodes (that are connected through some interconnection network) and in a multi-level shared cache hierarchy for processors within a node. We also develop new techniques for lower bound characterization of non-homogeneous CDAGs. We demonstrate the use of the methodology by analyzing the CDAGs of several numerical algorithms, to develop lower bounds on data movement for their parallel execution.

preprint2014arXiv

Three Dimensional Edwards-Anderson Spin Glass Model in an External Field

We study the Edwards-Anderson model on a simple cubic lattice with a finite constant external field. We employ an indicator composed of a ratio of susceptibilities at finite wavenumbers, which was recently proposed to avoid the difficulties of a zero momentum quantity, for capturing the spin glass phase transition. Unfortunately, this new indicator is fairly noisy, so a large pool of samples at low temperature and small external field are needed to generate results with sufficiently small statistical error for analysis. We thus implement the Monte Carlo method using graphics processing units to drastically speedup the simulation. We confirm previous findings that conventional indicators for the spin glass transition, including the Binder ratio and the correlation length do not show any indication of a transition for rather low temperatures. However, the ratio of spin glass susceptibilities do show crossing behavior, albeit a systematic analysis is beyond the reach of the present data. This calls for a more thorough study of the three-dimension Edwards-Anderson model in an external field.

preprint2013arXiv

Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential

Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak processing rate to memory bandwidth) as highlighted by recent studies on Exascale architectural trends. Further, flops are getting cheaper while the energy cost of data movement is increasingly dominant. The understanding and characterization of data locality properties of computations is critical in order to guide efforts to enhance data locality. Reuse distance analysis of memory address traces is a valuable tool to perform data locality characterization of programs. A single reuse distance analysis can be used to estimate the number of cache misses in a fully associative LRU cache of any size, thereby providing estimates on the minimum bandwidth requirements at different levels of the memory hierarchy to avoid being bandwidth bound. However, such an analysis only holds for the particular execution order that produced the trace. It cannot estimate potential improvement in data locality through dependence preserving transformations that change the execution schedule of the operations in the computation. In this article, we develop a novel dynamic analysis approach to characterize the inherent locality properties of a computation and thereby assess the potential for data locality enhancement via dependence preserving transformations. The execution trace of a code is analyzed to extract a computational directed acyclic graph (CDAG) of the data dependences. The CDAG is then partitioned into convex subsets, and the convex partitioning is used to reorder the operations in the execution trace to enhance data locality. The approach enables us to go beyond reuse distance analysis of a single specific order of execution of the operations of a computation in characterization of its data locality properties. It can serve a valuable role in identifying promising code regions for manual transformation, as well as assessing the effectiveness of compiler transformations for data locality enhancement. We demonstrate the effectiveness of the approach using a number of benchmarks, including case studies where the potential shown by the analysis is exploited to achieve lower data movement costs and better performance.

preprint2013arXiv

Parallel Tempering Simulation of the three-dimensional Edwards-Anderson Model with Compact Asynchronous Multispin Coding on GPU

Monte Carlo simulations of the Ising model play an important role in the field of computational statistical physics, and they have revealed many properties of the model over the past few decades. However, the effect of frustration due to random disorder, in particular the possible spin glass phase, remains a crucial but poorly understood problem. One of the obstacles in the Monte Carlo simulation of random frustrated systems is their long relaxation time making an efficient parallel implementation on state-of-the-art computation platforms highly desirable. The Graphics Processing Unit (GPU) is such a platform that provides an opportunity to significantly enhance the computational performance and thus gain new insight into this problem. In this paper, we present optimization and tuning approaches for the CUDA implementation of the spin glass simulation on GPUs. We discuss the integration of various design alternatives, such as GPU kernel construction with minimal communication, memory tiling, and look-up tables. We present a binary data format, Compact Asynchronous Multispin Coding (CAMSC), which provides an additional $28.4\%$ speedup compared with the traditionally used Asynchronous Multispin Coding (AMSC). Our overall design sustains a performance of 33.5 picoseconds per spin flip attempt for simulating the three-dimensional Edwards-Anderson model with parallel tempering, which significantly improves the performance over existing GPU implementations.

preprint2011arXiv

Solving the Parquet Equations for the Hubbard Model beyond Weak Coupling

We find that imposing the crossing symmetry in the iteration process considerably extends the range of convergence for solutions of the parquet equations for the Hubbard model. When the crossing symmetry is not imposed, the convergence of both simple iteration and more complicated continuous loading (homotopy) methods are limited to high temperatures and weak interactions. We modify the algorithm to impose the crossing symmetry without increasing the computational complexity. We also imposed time reversal and a subset of the point group symmetries, but they did not further improve the convergence. We elaborate the details of the latency hiding scheme which can significantly improve the performance in the computational implementation. With these modifications, stable solutions for the parquet equations can be obtained by iteration more quickly even for values of the interaction that are a significant fraction of the bandwidth and for temperatures that are much smaller than the bandwidth. This may represent a crucial step towards the solution of two-particle field theories for correlated electron models.

J. Ramanujam

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

On Characterizing the Data Access Complexity of Programs

On Characterizing the Data Movement Complexity of Computational DAGs for Parallel Execution

Three Dimensional Edwards-Anderson Spin Glass Model in an External Field

Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential

Parallel Tempering Simulation of the three-dimensional Edwards-Anderson Model with Compact Asynchronous Multispin Coding on GPU

Solving the Parquet Equations for the Hubbard Model beyond Weak Coupling