Researcher profile

Prakash Murali

Prakash Murali contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2022arXiv

Adaptive job and resource management for the growing quantum cloud

As the popularity of quantum computing continues to grow, efficient quantum machine access over the cloud is critical to both academic and industry researchers across the globe. And as cloud quantum computing demands increase exponentially, the analysis of resource consumption and execution characteristics are key to efficient management of jobs and resources at both the vendor-end as well as the client-end. While the analysis and optimization of job / resource consumption and management are popular in the classical HPC domain, it is severely lacking for more nascent technology like quantum computing. This paper proposes optimized adaptive job scheduling to the quantum cloud taking note of primary characteristics such as queuing times and fidelity trends across machines, as well as other characteristics such as quality of service guarantees and machine calibration constraints. Key components of the proposal include a) a prediction model which predicts fidelity trends across machine based on compiled circuit features such as circuit depth and different forms of errors, as well as b) queuing time prediction for each machine based on execution time estimations. Overall, this proposal is evaluated on simulated IBM machines across a diverse set of quantum applications and system loading scenarios, and is able to reduce wait times by over 3x and improve fidelity by over 40\% on specific usecases, when compared to traditional job schedulers.

preprint2020arXiv

Architecting Noisy Intermediate-Scale Trapped Ion Quantum Computers

Trapped ions (TI) are a leading candidate for building Noisy Intermediate-Scale Quantum (NISQ) hardware. TI qubits have fundamental advantages over other technologies such as superconducting qubits, including high qubit quality, coherence and connectivity. However, current TI systems are small in size, with 5-20 qubits and typically use a single trap architecture which has fundamental scalability limitations. To progress towards the next major milestone of 50-100 qubits, a modular architecture termed the Quantum Charge Coupled Device (QCCD) has been proposed. In a QCCD-based TI device, small traps are connected through ion shuttling. While the basic hardware components for such devices have been demonstrated, building a 50-100 qubit system is challenging because of a wide range of design possibilities for trap sizing, communication topology and gate implementations and the need to match diverse application resource requirements. Towards realizing QCCD systems with 50-100 qubits, we perform an extensive architectural study evaluating the key design choices of trap sizing, communication topology and operation implementation methods. We built a design toolflow which takes a QCCD architecture's parameters as input, along with a set of applications and realistic hardware performance models. Our toolflow maps the applications onto the target device and simulates their execution to compute metrics such as application run time, reliability and device noise rates. Using six applications and several hardware design points, we show that trap sizing and communication topology choices can impact application reliability by up to three orders of magnitude. Microarchitectural gate implementation choices influence reliability by another order of magnitude. From these studies, we provide concrete recommendations to tune these choices to achieve highly reliable and performant application executions.

preprint2020arXiv

End-to-End Predictions-Based Resource Management Framework for Supercomputer Jobs

Job submissions of parallel applications to production supercomputer systems will have to be carefully tuned in terms of the job submission parameters to obtain minimum response times. In this work, we have developed an end-to-end resource management framework that uses predictions of queue waiting and execution times to minimize response times of user jobs submitted to supercomputer systems. Our method for predicting queue waiting times adaptively chooses a prediction method based on the cluster structure of similar jobs. Our strategy for execution time predictions dynamically learns the impact of load on execution times and uses this to predict a set of execution time ranges for the target job. We have developed two resource management techniques that employ these predictions, one that selects the number of processors for execution and the other that also dynamically changes the job submission time. Using workload simulations of large supercomputer traces, we show large-scale improvements in predictions and reductions in response times over existing techniques and baseline strategies.

preprint2020arXiv

On Optimizing Distributed Tucker Decomposition for Sparse Tensors

The Tucker decomposition generalizes the notion of Singular Value Decomposition (SVD) to tensors, the higher dimensional analogues of matrices. We study the problem of constructing the Tucker decomposition of sparse tensors on distributed memory systems via the HOOI procedure, a popular iterative method. The scheme used for distributing the input tensor among the processors (MPI ranks) critically influences the HOOI execution time. Prior work has proposed different distribution schemes: an offline scheme based on sophisticated hypergraph partitioning method and simple, lightweight alternatives that can be used real-time. While the hypergraph based scheme typically results in faster HOOI execution time, being complex, the time taken for determining the distribution is an order of magnitude higher than the execution time of a single HOOI iteration. Our main contribution is a lightweight distribution scheme, which achieves the best of both worlds. We show that the scheme is near-optimal on certain fundamental metrics associated with the HOOI procedure and as a result, near-optimal on the computational load (FLOPs). Though the scheme may incur higher communication volume, the computation time is the dominant factor and as the result, the scheme achieves better performance on the overall HOOI execution time. Our experimental evaluation on large real-life tensors (having up to 4 billion elements) shows that the scheme outperforms the prior schemes on the HOOI execution time by a factor of up to 3x. On the other hand, its distribution time is comparable to the prior lightweight schemes and is typically lesser than the execution time of a single HOOI iteration.

preprint2020arXiv

Software Mitigation of Crosstalk on Noisy Intermediate-Scale Quantum Computers

Crosstalk is a major source of noise in Noisy Intermediate-Scale Quantum (NISQ) systems and is a fundamental challenge for hardware design. When multiple instructions are executed in parallel, crosstalk between the instructions can corrupt the quantum state and lead to incorrect program execution. Our goal is to mitigate the application impact of crosstalk noise through software techniques. This requires (i) accurate characterization of hardware crosstalk, and (ii) intelligent instruction scheduling to serialize the affected operations. Since crosstalk characterization is computationally expensive, we develop optimizations which reduce the characterization overhead. On three 20-qubit IBMQ systems, we demonstrate two orders of magnitude reduction in characterization time (compute time on the QC device) compared to all-pairs crosstalk measurements. Informed by these characterization, we develop a scheduler that judiciously serializes high crosstalk instructions balancing the need to mitigate crosstalk and exponential decoherence errors from serialization. On real-system runs on three IBMQ systems, our scheduler improves the error rate of application circuits by up to 5.6x, compared to the IBM instruction scheduler and offers near-optimal crosstalk mitigation in practice. In a broader picture, the difficulty of mitigating crosstalk has recently driven QC vendors to move towards sparser qubit connectivity or disabling nearby operations entirely in hardware, which can be detrimental to performance. Our work makes the case for software mitigation of crosstalk errors.