Source author record

Kamran Karimi

Kamran Karimi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Distributed, Parallel, and Cluster Computing; Performance; physics.comp-ph; cond-mat.soft; cond-mat.mtrl-sci; cond-mat.other; Machine Learning; Artificial Intelligence; Computation; Computational Engineering, Finance, and Science; cond-mat.dis-nn; cond-mat.stat-mech; Data Structures and Algorithms; Other Computer Science; quant-ph; Software Engineering

Catalog footprint: 15 works, 16 topics, 4 close collaborators


arXiv preprint (2022)

Yielding in multi-component metallic glasses: Universal signatures of elastic modulus heterogeneities

Sheared multi-component bulk metallic glasses are characterized by both chemical and structural disorder, which define their properties. We investigate the behavior of the local, microstructural elastic modulus across the plastic yielding transition in six Ni-based multi-component glasses, which are characterized by compositional features commonly associated with solid-solution formability. We find that elastic modulus fluctuations display consistent percolation characteristics, pointing towards universal behavior across chemical compositions and degrees of yielding sharpness. Elastic heterogeneity grows upon shearing via the percolation of elastically soft clusters within an otherwise rigid amorphous matrix, confirming prior investigations in granular media and colloidal glasses. We find clear signatures of a percolation transition, with spanning clusters that universally display scale-free statistics and critical scaling exponents. The spatial correlation length and mean cluster size tend to diverge prior to yielding, with associated critical exponents that depend only weakly on compositional variations and on details of the macroscopic stress-strain curve.
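The soft-cluster percolation analysis described above can be illustrated with a minimal sketch on a 2D grid, assuming each grid cell holds a local modulus value: cells below a threshold count as "soft", 4-connected soft cells are grouped by breadth-first search, and a cluster is called spanning if it touches both the left and right edges. The grid, the threshold, and the names `soft_clusters` and `mean_cluster_size` are hypothetical; the paper works with atomistic 3D data, and its exact cluster definition may differ. The mean cluster size uses the standard percolation-theory form, excluding spanning clusters.

```python
from collections import deque

def soft_clusters(modulus, threshold):
    """Label 4-connected clusters of soft sites (modulus < threshold) on a 2D grid.

    Returns (clusters, spanning): clusters is a list of lists of (row, col) sites,
    spanning[i] is True if cluster i touches both the left and the right edge.
    """
    rows, cols = len(modulus), len(modulus[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters, spanning = [], []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or modulus[r0][c0] >= threshold:
                continue
            # Breadth-first search over soft nearest neighbours.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            members = []
            while queue:
                r, c = queue.popleft()
                members.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and modulus[nr][nc] < threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            clusters.append(members)
            cols_hit = {c for _, c in members}
            spanning.append(0 in cols_hit and (cols - 1) in cols_hit)
    return clusters, spanning

def mean_cluster_size(clusters, spanning):
    """Mean cluster size S = sum(s^2) / sum(s) over non-spanning clusters."""
    sizes = [len(c) for c, span in zip(clusters, spanning) if not span]
    total = sum(sizes)
    return sum(s * s for s in sizes) / total if total else 0.0
```

Sweeping the threshold (or, in a simulation, the applied strain) and tracking when a spanning cluster first appears, and how `mean_cluster_size` grows before it does, is the kind of signal the abstract refers to.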

arXiv preprint (2021)

Shear banding instability in multi-component metallic glasses: Interplay of composition and short-range order

The shear-banding instability in quasi-statically driven bulk metallic glasses emerges from collective dynamics, mediated by shear transformation zones and the associated non-local elastic interactions. It is also phenomenologically known that sharp structural features of shear bands are typically correlated with the sharpness of the plastic yielding transition; such features predominate in commonly studied alloys composed of multiple elements with very different atomic radii. However, in the opposite limit, where the elements' radii are relatively similar, plastic yielding of bulk metallic glasses is highly dependent on compositional and ordering features. In particular, one known mechanism at play involves the formation of short-range order dominated by icosahedra-based clusters. Here, we report on atomistic simulations of multi-component metallic glasses with different chemical compositions, showing that the degree of strain localization is largely controlled by the interplay between composition-driven icosahedral ordering and collectively driven shear transformation zones. Depending on composition, strain localization ranges from diffuse, homogenized patterns to singular, crack-like features. We quantify the dynamical yielding transition by measuring the atoms' susceptibility to plastic rearrangements, which is strongly correlated with the local atomic structure. We find that an abundance of icosahedral short-range order within rearranging zones increases the glass's capacity to delocalize strain. The type of plastic yielding can often be qualitatively inferred from the commonly used compositional descriptor that characterizes element associations, the misfit parameter $\delta_a$, and also from less common ones, such as the shear-band width and the correlation parameters of shear-band dynamics.
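The misfit parameter mentioned above is, in its widely used form, the concentration-weighted spread of atomic radii around their mean, $\delta = \sqrt{\sum_i c_i (1 - r_i/\bar{r})^2}$ with $\bar{r} = \sum_i c_i r_i$. The sketch below assumes that definition; the paper's exact convention for $\delta_a$ (for instance, reporting it as a percentage) may differ, and the function name `misfit_parameter` is hypothetical.

```python
import math

def misfit_parameter(fractions, radii):
    """Atomic-size misfit: sqrt(sum_i c_i * (1 - r_i / r_bar)^2), r_bar = sum_i c_i * r_i.

    fractions: mole fractions c_i (normalized here so they sum to 1)
    radii: atomic radii r_i, in any consistent unit
    """
    if len(fractions) != len(radii):
        raise ValueError("fractions and radii must have the same length")
    total = sum(fractions)
    c = [f / total for f in fractions]
    r_bar = sum(ci * ri for ci, ri in zip(c, radii))
    return math.sqrt(sum(ci * (1.0 - ri / r_bar) ** 2 for ci, ri in zip(c, radii)))
```

Equal radii give zero misfit; a larger value means more dissimilar atomic sizes, the regime the abstract associates with sharper shear bands.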

arXiv preprint (2019)

Self-diffusion in plastic flow of amorphous solids

We report on a particle-based numerical study of sheared amorphous solids in the dense, slow flow regime. In this framework, deformation and flow are accompanied by critical fluctuation patterns associated with both the macroscopic plastic response and single-particle kinematics. The former is commonly attributed to collective slip patterns that relax internal stresses within the bulk material and give rise to an effective mechanical noise, which in turn governs the latter, particle-level process. In this work, the avalanche-type dynamics of plastic events is shown to strongly affect the self-diffusion of tracer particles in the Fickian regime. As a consequence, strong size effects emerge in the effective diffusion coefficient, which are rationalized in terms of avalanche size distributions and their temporal occurrence.
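In the Fickian regime the mean squared displacement grows linearly, $\langle \Delta r^2 \rangle \approx 2dDt$ in $d$ dimensions, so an effective diffusion coefficient can be read off the slope. A minimal sketch, assuming plain time lags and positions with any affine shear flow already subtracted (sheared-system studies often use strain as the clock and transverse displacements only); the function names are hypothetical.

```python
def mean_squared_displacement(trajectories, lag):
    """Average squared displacement over all particles and time origins at a given lag.

    trajectories: list of per-particle position lists; each position is a tuple of coordinates
    """
    total, count = 0.0, 0
    for traj in trajectories:
        for t0 in range(len(traj) - lag):
            a, b = traj[t0], traj[t0 + lag]
            total += sum((bi - ai) ** 2 for ai, bi in zip(a, b))
            count += 1
    if count == 0:
        raise ValueError("lag is too long for the given trajectories")
    return total / count

def fickian_diffusion_coefficient(lags, msd, dim):
    """Least-squares fit of msd = 2 * dim * D * lag through the origin; returns D."""
    slope = sum(l * m for l, m in zip(lags, msd)) / sum(l * l for l in lags)
    return slope / (2 * dim)
```

Only lags well inside the linear (Fickian) part of the curve should be passed to the fit; the size effects reported in the abstract would show up as a dependence of the fitted $D$ on system size.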

arXiv preprint (2015)

Rheology, diffusion, and velocity correlations in the bubble model

We present results on spatio-temporal correlations in the so-called mean-drag version of the Durian bubble model in the limit of small, but finite, shear rates $\dot{\gamma}$. We study the rheology, diffusion, and spatial correlations of the instantaneous velocity field. The quasi-static (QS) effective diffusion coefficient, $D_e$, shows an anomalous system-size dependence, indicative of the organization of plastic slip into lines along the directions of maximum shearing. At higher rates, $D_e$ decays like $\dot{\gamma}^{-1/3}$. The instantaneous velocity fields have a spatial structure consistent with a set of spatially uncorrelated Eshelby transformations. The correlations are cut off beyond a length $\xi \sim \dot{\gamma}^{-1/3}$, which explains the $D_e \sim \dot{\gamma}^{-1/3}$ behavior. The shear stress, $\sigma$, follows a similar rate dependence, with $\delta\sigma = \sigma - \sigma_y \sim \dot{\gamma}^{1/3}$, where $\sigma_y$ is the yield stress observed in the QS regime. These results indicate that the form of the viscous dissipation can have a profound impact on the rheology, diffusion, and spatial correlations in sheared soft glassy systems.
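The rate dependences quoted above ($D_e$ and $\xi$ scaling as $\dot{\gamma}^{-1/3}$, $\delta\sigma$ as $\dot{\gamma}^{1/3}$) are power laws, and a common way to estimate such exponents is a straight-line fit on log-log axes. A minimal sketch; the function name is hypothetical, and the fitting procedure used in the paper is not stated in the abstract. For $\delta\sigma$, the yield stress $\sigma_y$ has to be subtracted before fitting.

```python
import math

def power_law_exponent(x, y):
    """Least-squares slope of log(y) versus log(x), i.e. the exponent a in y ~ C * x**a."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx
```

On noise-free synthetic data $D_e = \dot{\gamma}^{-1/3}$ this returns $-1/3$; on simulation data the fit should be restricted to the rate window where the power law actually holds.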

arXiv preprint (2015)

The Feasibility of Using OpenCL Instead of OpenMP for Parallel CPU Programming

OpenCL, along with CUDA, is one of the main tools used to program GPGPUs. However, it allows running the same code on multi-core CPUs too, making it a rival for the long-established OpenMP. In this paper we compare OpenCL and OpenMP when developing and running compute-heavy code on a CPU. Both ease of programming and performance aspects are considered. Since, unlike with a GPU, no memory copy operation is involved, our comparisons measure the code generation quality, as well as the thread-management efficiency, of OpenCL and OpenMP. We evaluate the performance of these development tools under two conditions: a large number of short-running compute-heavy parallel code executions, when more thread management is performed, and a small number of long-running parallel code executions, when less thread management is required. The results show that OpenCL and OpenMP each win in one of the two conditions. We argue that while using OpenMP requires less setup, OpenCL can be a viable substitute for OpenMP from a performance point of view, especially when a high number of thread invocations is required. We also provide a number of potential pitfalls to watch for when moving from OpenMP to OpenCL.

arXiv preprint (2015)

When In-Memory Computing is Slower than Heavy Disk Usage

Disk access latency and transfer times are often considered to have a major and detrimental impact on the running time of software. Developers are often advised to favour in-memory operations and minimise disk access. Furthermore, diskless computer architectures are being studied and designed to remove this bottleneck altogether, to improve application performance in areas such as High Performance Computing, Big Data, and Business Intelligence. In this paper we use code inspired by real production software to show that in-memory operations are not always a guarantee of high performance, and may actually cause a considerable slow-down. We also show how small code changes can have dramatic effects on running times. We argue that a combination of system-level improvements and better developer awareness and coding practices is necessary to ensure in-memory computing can achieve its full potential.

arXiv preprint (2013)

Accelerating a Cloud-Based Software GNSS Receiver

In this paper we discuss ways to reduce the execution time of a software Global Navigation Satellite System (GNSS) receiver that is meant for offline operation in a cloud environment. Client devices record satellite signals they receive, and send them to the cloud, to be processed by this software. The goal of this project is for each client request to be processed as fast as possible, but also to increase total system throughput by making sure as many requests as possible are processed within a unit of time. The characteristics of our application provided both opportunities and challenges for increasing performance. We describe the speedups we obtained by enabling the software to exploit multi-core CPUs and GPGPUs. We mention which techniques worked for us and which did not. To increase throughput, we describe how we control the resources allocated to each invocation of the software to process a client request, such that multiple copies of the application can run at the same time. We use the notion of effective running time to measure the system's throughput when running multiple instances at the same time, and show how we can determine when the system's computing resources have been saturated.

arXiv preprint (2012)

Challenges of Upgrading a Virtual Appliance

A virtual appliance contains a target application, and the running environment necessary for running that application. Users run an appliance using a virtualization engine, freeing them from the need to make sure that the target application has access to all its dependencies. However, creating and managing a virtual appliance, versus a stand-alone application, requires special considerations. Upgrading a software system is a common requirement, and is more complicated when dealing with an appliance. This is because both the target application and the running environment must be upgraded, and there are often dependencies between these two components. In this paper we briefly discuss some important points to consider when upgrading an appliance. We then present a list of items that can help developers prevent problems during an upgrade effort.

arXiv preprint (2011)

A Performance Comparison of CUDA and OpenCL

CUDA and OpenCL are two different frameworks for GPU programming. OpenCL is an open standard that can be used to program CPUs, GPUs, and other devices from different vendors, while CUDA is specific to NVIDIA GPUs. Although OpenCL promises a portable language for GPU programming, its generality may entail a performance penalty. In this paper, we use complex, near-identical kernels from a Quantum Monte Carlo application to compare the performance of CUDA and OpenCL. We show that when using NVIDIA compiler tools, converting a CUDA kernel to an OpenCL kernel involves minimal modifications. Making such a kernel compile with ATI's build tools involves more modifications. Our performance tests measure and compare data transfer times to and from the GPU, kernel execution times, and end-to-end application execution times for both CUDA and OpenCL.

arXiv preprint (2011)

Investigating the Performance of an Adiabatic Quantum Optimization Processor

Adiabatic quantum optimization offers a new method for solving hard optimization problems. In this paper we calculate median adiabatic times (in seconds) determined by the minimum gap during the adiabatic quantum optimization for an NP-hard Ising spin glass instance class with up to 128 binary variables. Using parameters obtained from a realistic superconducting adiabatic quantum processor, we extract the minimum gap and matrix elements using high performance Quantum Monte Carlo simulations on a large-scale Internet-based computing platform. We compare the median adiabatic times with the median running times of two classical solvers and find that, for the considered problem sizes, the adiabatic times for the simulated processor architecture are about 4 and 6 orders of magnitude shorter than the two classical solvers' times. This shows that if the adiabatic time scale were to determine the computation time, adiabatic quantum optimization would be significantly superior to those classical solvers for median spin glass problems of at least up to 128 qubits. We also discuss important additional constraints that affect the performance of a realistic system.
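The abstract ties the adiabatic time to the minimum gap and to matrix elements extracted from the simulations. For reference, the standard textbook form of the adiabatic condition is given below, assuming a linear interpolation between a driver Hamiltonian $H_B$ and the problem Hamiltonian $H_P$, with $\hbar = 1$; the symbols $H_B$, $H_P$ and the linear schedule are assumptions, and the realistic processor studied in the paper may use different annealing schedules and a different prefactor.

$$H(s) = (1 - s)\,H_B + s\,H_P, \qquad s = t/T \in [0, 1]$$

$$g_{\min} = \min_{0 \le s \le 1}\bigl[E_1(s) - E_0(s)\bigr], \qquad T \gg \frac{\max_{0 \le s \le 1}\bigl|\langle 1(s)|\,\partial_s H(s)\,|0(s)\rangle\bigr|}{g_{\min}^{2}}$$

Because $g_{\min}$ enters squared, instances with small gaps dominate the required annealing time.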

arXiv preprint (2010)

A Brief Introduction to Temporality and Causality

Causality is a non-obvious concept that is often considered to be related to temporality. In this paper we present a number of past and present approaches to the definition of temporality and causality from philosophical, physical, and computational points of view. We note that time is an important ingredient in many relationships and phenomena. The topic is then divided into the two main areas of temporal discovery, which is concerned with finding relations that are stretched over time, and causal discovery, where a claim is made as to the causal influence of certain events on others. We present a number of computational tools used for attempting to automatically discover temporal and causal relations in data.

arXiv preprint (2010)

Generation and Interpretation of Temporal Decision Rules

We present a solution to the problem of understanding a system that produces a sequence of temporally ordered observations. Our solution is based on generating and interpreting a set of temporal decision rules. A temporal decision rule is a decision rule that can be used to predict or retrodict the value of a decision attribute, using condition attributes that are observed at times other than the decision attribute's time of observation. A rule set, consisting of a set of temporal decision rules with the same decision attribute, can be interpreted by our Temporal Investigation Method for Enregistered Record Sequences (TIMERS) to signify an instantaneous, an acausal or a possibly causal relationship between the condition attributes and the decision attribute. We show the effectiveness of our method, by describing a number of experiments with both synthetic and real temporal data.
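A temporal decision rule draws its condition attributes from times other than the decision attribute's time. Before any rule learner can run, a time-ordered sequence has to be flattened into such records. The sketch below shows only that flattening step, assuming dictionary-valued observations and a fixed window; it is not TIMERS itself (the rule inducer and the instantaneous/acausal/causal judgement are omitted), and the function name and the `name@offset` attribute naming are hypothetical.

```python
def flatten_window(records, decision, window, direction="forward"):
    """Turn a time-ordered list of dict records into (conditions, decision_value) rows.

    forward:  conditions come from the `window` steps before time t (prediction).
    backward: conditions come from the `window` steps after time t (retrodiction).
    Every attribute observed at another time, including past or future values of the
    decision attribute, becomes a condition named like "x@-1" or "x@+2".
    """
    if direction not in ("forward", "backward"):
        raise ValueError("direction must be 'forward' or 'backward'")
    offsets = range(-window, 0) if direction == "forward" else range(1, window + 1)
    rows = []
    for t in range(len(records)):
        if any(not 0 <= t + k < len(records) for k in offsets):
            continue  # skip times whose window runs off either end of the sequence
        conditions = {}
        for k in offsets:
            for name, value in records[t + k].items():
                conditions[f"{name}@{k:+d}"] = value
        rows.append((conditions, records[t][decision]))
    return rows
```

Training the same rule learner on the forward and the backward tables and comparing how well each predicts the decision attribute is the kind of contrast the abstract describes for distinguishing predictive from retrodictive relationships.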

arXiv preprint (2010)

High-Performance Physics Simulations Using Multi-Core CPUs and GPGPUs in a Volunteer Computing Context

This paper presents two conceptually simple methods for parallelizing a Parallel Tempering Monte Carlo simulation in a distributed volunteer computing context, where computers belonging to the general public are used. The first method uses conventional multi-threading. The second method uses CUDA, a graphics card computing system. Parallel Tempering is described, and challenges such as parallel random number generation and mapping of Monte Carlo chains to different threads are explained. While conventional multi-threading on CPUs is well-established, GPGPU programming techniques and technologies are still developing and present several challenges, such as the effective use of a relatively large number of threads. Having multiple chains in Parallel Tempering allows parallelization in a manner that is similar to the serial algorithm. Volunteer computing introduces important constraints to high performance computing, and we show that both versions of the application are able to adapt themselves to the varying and unpredictable computing resources of volunteers' computers, while leaving the machines responsive enough to use. We present experiments to show the scalable performance of these two approaches, and indicate that the efficiency of the methods increases with bigger problem sizes.
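In Parallel Tempering, each chain samples at its own inverse temperature $\beta_i$, and neighbouring chains periodically try to exchange configurations, accepting with probability $\min\{1, \exp[(\beta_i - \beta_j)(E_i - E_j)]\}$. The sketch below shows only this exchange step; the per-chain Monte Carlo updates, which are the part that maps naturally onto separate threads, are omitted. The function name is hypothetical, and a single `random.Random` instance is passed in for simplicity, whereas a parallel implementation would give each thread its own independent random stream.

```python
import math
import random

def attempt_swaps(betas, energies, states, rng, offset=0):
    """One pass of neighbour replica-exchange moves; returns the number of accepted swaps.

    betas: inverse temperatures, one per chain (fixed, ordered)
    energies, states: current energy and configuration at each beta (modified in place)
    offset: 0 or 1, so alternating passes try pairs (0,1),(2,3),... and (1,2),(3,4),...
    rng: any object with a random() method returning a float in [0, 1)
    """
    accepted = 0
    for i in range(offset, len(betas) - 1, 2):
        j = i + 1
        # Metropolis criterion for exchanging the configurations held at beta_i and beta_j.
        delta = (betas[i] - betas[j]) * (energies[i] - energies[j])
        if delta >= 0 or rng.random() < math.exp(delta):
            energies[i], energies[j] = energies[j], energies[i]
            states[i], states[j] = states[j], states[i]
            accepted += 1
    return accepted
```

Usage: after every block of per-chain sweeps, call `attempt_swaps(betas, energies, states, random.Random(seed), offset=sweep % 2)`. Because disjoint pairs are independent, the exchange step itself stays as simple as in the serial algorithm.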

arXiv preprint (2010)

Importance of Explicit Vectorization for CPU and GPU Software Performance

Much of the current focus in high-performance computing is on multi-threading, multi-computing, and graphics processing unit (GPU) computing. However, vectorization and non-parallel optimization techniques, which can often be employed additionally, are less frequently discussed. In this paper, we present an analysis of several optimizations done on both central processing unit (CPU) and GPU implementations of a particular computationally intensive Metropolis Monte Carlo algorithm. Explicit vectorization on the CPU and the equivalent, explicit memory coalescing, on the GPU are found to be critical to achieving good performance of this algorithm in both environments. The fully-optimized CPU version achieves a 9x to 12x speedup over the original CPU version, in addition to speedup from multi-threading. This is 2x faster than the fully-optimized GPU version.

arXiv preprint (2010)

Robust Parameter Selection for Parallel Tempering

This paper describes an algorithm for selecting parameter values (e.g. temperature values) at which to measure equilibrium properties with Parallel Tempering Monte Carlo simulation. Simple approaches to choosing parameter values can lead to poor equilibration of the simulation, especially for Ising spin systems that undergo first-order phase transitions. However, starting from an initial set of parameter values, careful iterative respacing of these values, based on results obtained with the previous set, greatly improves equilibration. The example spin systems presented here arise in the context of Quantum Monte Carlo.
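One generic way to realize an iterative respacing loop, shown here only as a sketch and not as the paper's algorithm, is to measure the swap acceptance rate between each pair of neighbouring temperatures, assign each gap a cost of $-\ln(\text{acceptance})$, and move the interior temperatures so that every gap carries the same cost, keeping the endpoints fixed. The cost function, the clamping constant `eps`, and the name `respace_temperatures` are hypothetical choices.

```python
import math

def respace_temperatures(temps, acceptance, eps=1e-6):
    """Redistribute interior temperatures so every neighbouring pair carries equal cost.

    temps: sorted temperatures T_0 < ... < T_n (the endpoints stay fixed)
    acceptance: measured swap acceptance rate for each of the n neighbouring pairs
    The cost of a gap is -ln(acceptance), clamped so that every cost is finite and positive.
    """
    n = len(temps) - 1
    if n < 1 or len(acceptance) != n:
        raise ValueError("need at least two temperatures and one rate per neighbouring pair")
    costs = [-math.log(min(max(a, eps), 1.0 - eps)) for a in acceptance]
    cumulative = [0.0]
    for c in costs:
        cumulative.append(cumulative[-1] + c)
    total = cumulative[-1]
    new_temps = [temps[0]]
    k = 0
    for m in range(1, n):
        target = total * m / n
        # Advance to the gap [C_k, C_{k+1}] that contains the target cumulative cost.
        while cumulative[k + 1] < target:
            k += 1
        frac = (target - cumulative[k]) / (cumulative[k + 1] - cumulative[k])
        new_temps.append(temps[k] + frac * (temps[k + 1] - temps[k]))
    new_temps.append(temps[-1])
    return new_temps
```

Each call uses results from the previous set of temperatures, so the procedure is repeated (simulate, measure acceptance, respace) until the rates stop changing. A gap with low acceptance, as can occur near a first-order transition, is narrowed at the expense of easier gaps.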
