Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
14topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2026arXiv

MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

The fast pace of artificial intelligence~(AI) innovation demands an agile methodology for observation, reproduction and optimization of distributed machine learning~(ML) workload behavior in production AI systems and enables efficient software-hardware~(SW-HW) co-design for future systems. We present Chakra, an open and portable ecosystem for performance benchmarking and co-design. The core component of Chakra is an open and interoperable graph-based representation of distributed AI/ML workloads, called Chakra execution trace~(ET). These ETs represent key operations, such as compute, memory, and communication, data and control dependencies, timing, and resource constraints. Additionally, Chakra includes a complementary set of tools and capabilities to enable the collection, analysis, generation, and adoption of Chakra ETs by a broad range of simulators, emulators, and replay tools. We present analysis of Chakra ETs collected on production AI clusters and demonstrate value via real-world case studies. Chakra has been adopted by MLCommons and has active contributions and engagement across the industry, including but not limited to NVIDIA, AMD, Meta, Keysight, HPE, and Scala, to name a few.

preprint2022arXiv

Insights from a pseudospectral study of a potentially singular solution of the three-dimensional axisymmetric incompressible Euler equation

We develop a Fourier-Chebyshev pseudospectral direct numerical simulation (DNS) to examine a potentially singular solution of the radially bounded, three-dimensional (3D), axisymmetric Euler equations [G. Luo and T.Y. Hou, Proc. Natl. Acad. Sci. USA, 111.36 (2014)]. We demonstrate that: (a) the time of singularity is preceded, in any spectrally truncated DNS, by the formation of oscillatory structures called tygers, first investigated in the one-dimensional (1D) Burgers and two-dimensional (2D) Euler equations; (b) the analyticity-strip method can be generalized to obtain an estimate for the (potential) singularity time.

preprint2020arXiv

Assertion Detection in Multi-Label Clinical Text using Scope Localization

Multi-label sentences (text) in the clinical domain result from the rich description of scenarios during patient care. The state-of-theart methods for assertion detection mostly address this task in the setting of a single assertion label per sentence (text). In addition, few rules based and deep learning methods perform negation/assertion scope detection on single-label text. It is a significant challenge extending these methods to address multi-label sentences without diminishing performance. Therefore, we developed a convolutional neural network (CNN) architecture to localize multiple labels and their scopes in a single stage end-to-end fashion, and demonstrate that our model performs atleast 12% better than the state-of-the-art on multi-label clinical text.

preprint2020arXiv

Multi-boundary entanglement in Chern-Simons theory with finite gauge groups

We study the multi-boundary entanglement structure of the states prepared in (1+1) and (2+1) dimensional Chern-Simons theory with finite discrete gauge group $G$. The states in (1+1)-$d$ are associated with Riemann surfaces of genus $g$ with multiple $S^1$ boundaries and we use replica trick to compute the entanglement entropy for such states. In (2+1)-$d$, we focus on the states associated with torus link complements which live in the tensor product of Hilbert spaces associated with multiple $T^2$. We present a quantitative analysis of the entanglement structure for both abelian and non-abelian groups. For all the states considered in this work, we find that the entanglement entropy for direct product of groups is the sum of entropy for individual groups, i.e. $\text{EE}(G_1 \times G_2) = \text{EE}(G_1)+\text{EE}(G_2)$. Moreover, the reduced density matrix obtained by tracing out a subset of the total Hilbert space has a positive semidefinite partial transpose on any bi-partition of the remaining Hilbert space.

preprint2020arXiv

PACT: Privacy Sensitive Protocols and Mechanisms for Mobile Contact Tracing

The global health threat from COVID-19 has been controlled in a number of instances by large-scale testing and contact tracing efforts. We created this document to suggest three functionalities on how we might best harness computing technologies to supporting the goals of public health organizations in minimizing morbidity and mortality associated with the spread of COVID-19, while protecting the civil liberties of individuals. In particular, this work advocates for a third-party free approach to assisted mobile contact tracing, because such an approach mitigates the security and privacy risks of requiring a trusted third party. We also explicitly consider the inferential risks involved in any contract tracing system, where any alert to a user could itself give rise to de-anonymizing information. More generally, we hope to participate in bringing together colleagues in industry, academia, and civil society to discuss and converge on ideas around a critical issue rising with attempts to mitigate the COVID-19 pandemic.

preprint2020arXiv

Spatial Sharing of GPU for Autotuning DNN models

GPUs are used for training, inference, and tuning the machine learning models. However, Deep Neural Network (DNN) vary widely in their ability to exploit the full power of high-performance GPUs. Spatial sharing of GPU enables multiplexing several DNNs on the GPU and can improve GPU utilization, thus improving throughput and lowering latency. DNN models given just the right amount of GPU resources can still provide low inference latency, just as much as dedicating all of the GPU for their inference task. An approach to improve DNN inference is tuning of the DNN model. Autotuning frameworks find the optimal low-level implementation for a certain target device based on the trained machine learning model, thus reducing the DNN's inference latency and increasing inference throughput. We observe an interdependency between the tuned model and its inference latency. A DNN model tuned with specific GPU resources provides the best inference latency when inferred with close to the same amount of GPU resources. While a model tuned with the maximum amount of the GPU's resources has poorer inference latency once the GPU resources are limited for inference. On the other hand, a model tuned with an appropriate amount of GPU resources still achieves good inference latency across a wide range of GPU resource availability. We explore the causes that impact the tuning of a model at different amounts of GPU resources. We present many techniques to maximize resource utilization and improve tuning performance. We enable controlled spatial sharing of GPU to multiplex several tuning applications on the GPU. We scale the tuning server instances and shard the tuning model across multiple client instances for concurrent tuning of different operators of a model, achieving better GPU multiplexing. With our improvements, we decrease DNN autotuning time by up to 75 percent and increase throughput by a factor of 5.