Researcher profile

Roshan Ragel

Roshan Ragel contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 19 (Unverified) · Verification L1 · Unclaimed author
5 works
0 followers
5 topics
4 close collaborators

Published work

5 published items

Preprint · 2014 · arXiv

Accelerating Correlation Power Analysis Using Graphics Processing Units

Correlation Power Analysis (CPA) is a type of power-analysis-based side-channel attack that can be used to derive the secret key of encryption algorithms, including DES (Data Encryption Standard) and AES (Advanced Encryption Standard). A typical CPA attack on unprotected AES analyses a few thousand power traces, which requires about an hour of computation on a general-purpose CPU. Because of the severity of this threat, many researchers work on countermeasures to such attacks. Verifying that a proposed countermeasure works requires performing the CPA attack on about 1.5 million power traces. Such processing, even for a single verification attempt, would run for several days on commodity hardware, making the verification process infeasible. Modern Graphics Processing Units (GPUs) support thousands of lightweight threads, making them ideal for parallelizable algorithms like CPA. Although a GPU costs less than a high-performance multicore server, its performance on this algorithm is many times better. We present an algorithm and its GPU implementation for CPA on 128-bit AES that executes 1300x faster than a single-threaded CPU implementation and more than 60x faster than a 32-threaded multicore server. We show that an attack that would take hours on the multicore server would take less than a minute on a much more cost-effective GPU.
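The core CPA computation is simple to state: for each key-byte guess, predict the power consumption for every trace and correlate that prediction with the measured samples; the guess with the strongest correlation wins. The sketch below is a minimal single-threaded version, assuming a Hamming-weight leakage model on the first-round AES S-box output and synthetic traces. Names such as `cpa_recover_key_byte` and `simulate_traces` are hypothetical, and this is not the paper's GPU algorithm.

```python
import math
import random

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def build_aes_sbox():
    """Build the AES S-box: multiplicative inverse in GF(2^8), then the AES affine transform."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):  # XOR in rotate-left by 1..4 bits
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_aes_sbox()
HW = [bin(v).count("1") for v in range(256)]  # Hamming weight of each byte value

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)

def cpa_recover_key_byte(plaintexts, traces):
    """Return (best_key_guess, best_abs_correlation) over all guesses and sample points."""
    columns = [[trace[t] for trace in traces] for t in range(len(traces[0]))]
    best_key, best_corr = 0, -1.0
    for guess in range(256):
        # Hypothetical leakage: Hamming weight of SBOX[plaintext XOR key guess].
        hypothesis = [HW[SBOX[p ^ guess]] for p in plaintexts]
        for column in columns:  # every (guess, sample) pair is independent work
            corr = abs(pearson(hypothesis, column))
            if corr > best_corr:
                best_key, best_corr = guess, corr
    return best_key, best_corr

def simulate_traces(secret, n_traces=300, n_samples=4, leak_at=2, noise=0.5, seed=1):
    """Hypothetical trace generator: Gaussian noise plus HW leakage at one sample index."""
    rng = random.Random(seed)
    plaintexts = [rng.randrange(256) for _ in range(n_traces)]
    traces = []
    for p in plaintexts:
        trace = [rng.gauss(0.0, noise) for _ in range(n_samples)]
        trace[leak_at] += HW[SBOX[p ^ secret]]
        traces.append(trace)
    return plaintexts, traces

if __name__ == "__main__":
    pts, trs = simulate_traces(secret=0x2B)
    print(cpa_recover_key_byte(pts, trs))
```

The nested loop over 256 guesses and all sample points consists entirely of independent correlations, which is the kind of work that maps naturally onto thousands of GPU threads.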

Preprint · 2014 · arXiv

Countermeasures against Bernstein's remote cache timing attack

A cache timing attack is a type of side-channel attack in which an attacker uses timing information leaked through the cache behaviour of a cryptosystem to break it. The Advanced Encryption Standard (AES) was considered secure until 2005, when Daniel Bernstein claimed that software implementations of AES are vulnerable to cache timing attacks. Bernstein demonstrated a remote cache timing attack on a software implementation of AES. The original AES implementation can be methodically altered to prevent the cache timing attack by hiding the natural cache-timing pattern during encryption while preserving its semantics. While preventing the attack, these alterations should not make the implementation much slower. In this paper, we report the outcomes of our experiments in designing and implementing a number of possible countermeasures.
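One commonly discussed countermeasure for table-based AES is to make every lookup touch the whole table, so that the cache footprint no longer depends on the secret index. The sketch below illustrates that idea only as an assumption; it is not necessarily one of the countermeasures evaluated in the paper, and `constant_access_lookup` is a hypothetical name. Python offers no timing guarantees, so the code shows the access pattern alone; a real implementation would be written in C or assembly.

```python
def constant_access_lookup(table, secret_index):
    """Read every entry of a byte table and keep only the one at secret_index.

    Assumes len(table) <= 256, 0 <= secret_index < 256, and byte-valued entries.
    The sequence of table reads is the same for every secret_index.
    """
    result = 0
    for i, value in enumerate(table):
        diff = i ^ secret_index  # zero only at the wanted position
        # (diff - 1) is negative only when diff == 0, so the shift gives an all-ones mask there.
        mask = ((diff - 1) >> 8) & 0xFF
        result |= value & mask
    return result
```

The sketch also makes the performance trade-off in the abstract concrete: each lookup now costs 256 reads instead of one, which is why such alterations must be designed carefully to avoid slowing the cipher down too much.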

Preprint · 2014 · arXiv

Loop Unrolling in Multi-pipeline ASIP Design

An Application Specific Instruction-set Processor (ASIP) is a popular processor design technique for embedded systems that allows customizability in processor design without overly hindering design flexibility. Multi-pipeline ASIPs were proposed to improve the performance of such systems by striking a compromise between speed and processor area. One problem in multi-pipeline design is the limited instruction-level parallelism (ILP) inherent in applications. The ILP of application programs can be increased through a compiler optimization known as loop unrolling. In this paper, we present how loop unrolling affects the performance of multi-pipeline ASIPs. Performance improves by around 15% on average across a number of benchmark applications, with a maximum improvement of around 30%. In addition, we analyzed the variation of performance against the loop unrolling factor, which is the amount of unrolling performed.
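Loop unrolling replicates the loop body so that each iteration contains several independent operations, which a compiler can then schedule across parallel pipelines. The source-level sketch below shows the transformation with an unrolling factor of 4 on a hypothetical dot-product kernel. The separate accumulators are an assumption added to break the dependency chain between iterations; they are exact for integers but would change rounding order for floating-point data. Python itself gains nothing from this; the point is the shape of the code a compiler would generate.

```python
def dot_rolled(a, b):
    """Reference loop: each multiply-accumulate depends on the previous one."""
    acc = 0
    for i in range(len(a)):
        acc += a[i] * b[i]
    return acc

def dot_unrolled_by_4(a, b):
    """Same computation unrolled by a factor of 4 with independent accumulators."""
    n = len(a)
    acc0 = acc1 = acc2 = acc3 = 0
    i = 0
    while i + 4 <= n:
        # Four independent operations per iteration that separate pipelines could issue together.
        acc0 += a[i] * b[i]
        acc1 += a[i + 1] * b[i + 1]
        acc2 += a[i + 2] * b[i + 2]
        acc3 += a[i + 3] * b[i + 3]
        i += 4
    acc = acc0 + acc1 + acc2 + acc3
    while i < n:  # remainder loop when n is not a multiple of the unrolling factor
        acc += a[i] * b[i]
        i += 1
    return acc
```

A larger unrolling factor exposes more independent operations per iteration but also enlarges the code and needs more registers, which is one reason performance varies with the factor rather than improving indefinitely.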

Preprint · 2014 · arXiv

To Use or Not to Use: Graphics Processing Units for Pattern Matching Algorithms

String matching is an important part of today's computer applications, and the Aho-Corasick algorithm is one of the main string matching algorithms used for it. Using the Aho-Corasick algorithm as a benchmark, this paper discusses when GPUs should be used for string matching applications. The best unit on which to run a string matching algorithm must be chosen according to the performance of the available devices and the nature of the application. Sometimes the CPU performs better than the GPU, and sometimes the GPU performs better than the CPU. Identifying this critical point is therefore a significant task for researchers who use GPUs to improve the performance of their string matching applications.
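For reference, the CPU side of such a comparison can be sketched as a compact Aho-Corasick matcher: a trie of the patterns, failure links computed breadth-first, and a single pass over the text. This is a generic textbook formulation written for illustration, not the benchmark code used in the paper; the function names are hypothetical and patterns are assumed to be non-empty.

```python
from collections import deque

def build_automaton(patterns):
    """Build trie transitions, failure links and output lists for Aho-Corasick."""
    goto, fail, out = [{}], [0], [[]]
    for pattern in patterns:
        node = 0
        for ch in pattern:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pattern)
    # Breadth-first pass: depth-1 nodes keep failure link 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target (a proper suffix of this node).
            out[child].extend(out[fail[child]])
    return goto, fail, out

def aho_corasick_search(text, patterns):
    """Return (start_index, pattern) for every occurrence, ordered by end position."""
    goto, fail, out = build_automaton(patterns)
    node, matches = 0, []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pattern in out[node]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches
```

For example, `aho_corasick_search("ushers", ["he", "she", "his", "hers"])` reports `she` at index 1 and both `he` and `hers` at index 2. The scan is inherently sequential; one common way to parallelize it on a GPU is to give each thread a chunk of the text, with chunks overlapping by the longest pattern length minus one so that matches spanning a boundary are not missed.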

Preprint · 2013 · arXiv

High Throughput Virtual Screening with Data Level Parallelism in Multi-core Processors

Improving the throughput of molecular docking, a computationally intensive phase of the virtual screening process, is a highly sought-after area of research because docking carries significant weight in the drug design process. With such improvements, cures for currently incurable diseases such as HIV and cancer might be found sooner. Our approach in this paper uses a multi-core environment to introduce Data Level Parallelism (DLP) into Autodock Vina, a widely used molecular docking software package. Autodock Vina already exploits Instruction Level Parallelism (ILP) in multi-core environments and is therefore optimized for them. However, our results clearly show that our approach increased the throughput of this already optimized software by more than six times. This will dramatically reduce the time consumed by the lead identification phase of drug design, together with the current shift in processor technology from multi-core to many-core. We therefore believe that this work will make it possible to expand the number of small molecules docked against a drug target, improving the chances of designing drugs for incurable diseases.
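The abstract does not spell out how the data-level parallelism is introduced. One straightforward reading, assumed here purely for illustration, is to apply the same docking step to many independent ligands at once across cores. In the sketch below, `screen_ligands` and the `dock` callback are hypothetical; `dock` stands in for whatever actually scores a ligand (for example, a wrapper that launches an external docking program with `subprocess.run`).

```python
from concurrent.futures import ThreadPoolExecutor

def screen_ligands(ligands, dock, workers=4):
    """Apply the same docking step to many ligands concurrently and rank the results.

    `dock` is a caller-supplied (hypothetical) function mapping a ligand identifier
    to a score where lower is better, as with predicted binding energies.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(dock, ligands))  # map preserves input order
    return sorted(zip(ligands, scores), key=lambda pair: pair[1])

if __name__ == "__main__":
    # Stand-in scores for illustration only; real docking would call out to a docking program.
    fake_scores = {"lig_a": -7.1, "lig_b": -9.3, "lig_c": -6.4}
    print(screen_ligands(list(fake_scores), fake_scores.get))
```

Threads are sufficient when `dock` spends its time waiting on external processes, because the heavy computation then runs in separate OS processes. If the scoring were written in Python itself, `concurrent.futures.ProcessPoolExecutor` would be the usual substitute, since threads in one interpreter share a global lock.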