Source author record

Adam Paszke

Adam Paszke appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computer Vision Discrete Mathematics Distributed, Parallel, and Cluster Computing Formal Languages and Automata Theory Logic in Computer Science Programming Languages

Catalog footprint

What is connected

4works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2021arXiv

Tensors Fitting Perfectly

Multidimensional arrays (NDArrays) are a central abstraction in modern scientific computing environments. Unfortunately, they can make reasoning about programs harder as the number of different array shapes used in an execution of a program is usually very large, and they rarely appear explicitly in program text. To make things worse, many operators make implicit assumptions about the shapes of their inputs: array addition is commonly enriched with broadcasting semantics, while matrix multiplication assumes that the lengths of contracted dimensions are equal. Because precise reasoning about shapes is crucial to write correct programs using NDArrays, and because shapes are often hard to infer from a quick glance at the program, we developed Tensors Fitting Perfectly, a static analysis tool that reasons about NDArray shapes in Swift for TensorFlow programs by synthesizing a set of shape constraints from an abstract interpretation of the program. It can both (1) check for possible inconsistencies, and (2) provide direct insights about the shapes of intermediate values appearing in the program, including via a mechanism called shape holes. The static analysis works in concert with optional runtime assertions to improve the productivity of program authors.

preprint2020arXiv

PyTorch Distributed: Experiences on Accelerating Data Parallel Training

This paper presents the design, implementation, and evaluation of the PyTorch distributed data parallel module. PyTorch is a widely-adopted scientific computing package used in deep learning research and applications. Recent advances in deep learning argue for the value of large datasets and large models, which necessitates the ability to scale out model training to more computational resources. Data parallelism has emerged as a popular solution for distributed training thanks to its straightforward principle and broad applicability. In general, the technique of distributed data parallelism replicates the model on every computational resource to generate gradients independently and then communicates those gradients at each iteration to keep model replicas consistent. Despite the conceptual simplicity of the technique, the subtle dependencies between computation and communication make it non-trivial to optimize the distributed training efficiency. As of v1.5, PyTorch natively provides several techniques to accelerate distributed data parallel, including bucketing gradients, overlapping computation with communication, and skipping gradient synchronization. Evaluations show that, when configured appropriately, the PyTorch distributed data parallel module attains near-linear scalability using 256 GPUs.

preprint2020arXiv

VC density of set systems defnable in tree-like graphs

We study set systems definable in graphs using variants of logic with different expressive power. Our focus is on the notion of Vapnik-Chervonenkis density: the smallest possible degree of a polynomial bounding the cardinalities of restrictions of such set systems. On one hand, we prove that if $φ(\bar x,\bar y)$ is a fixed CMSO$_1$ formula and $\cal C$ is a class of graphs with uniformly bounded cliquewidth, then the set systems defined by $φ$ in graphs from $\cal C$ have VC density at most $|\bar y|$, which is the smallest bound that one could expect. We also show an analogous statement for the case when $φ(\bar x,\bar y)$ is a CMSO$_2$ formula and $\cal C$ is a class of graphs with uniformly bounded treewidth. We complement these results by showing that if $\cal C$ has unbounded cliquewidth (respectively, treewidth), then, under some mild technical assumptions on $\cal C$, the set systems definable by CMSO$_1$ (respectively, CMSO$_2$) formulas in graphs from $\cal C$ may have unbounded VC dimension, hence also unbounded VC density.

preprint2016arXiv

ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

The ability to perform pixel-wise semantic segmentation in real-time is of paramount importance in mobile applications. Recent deep neural networks aimed at this task have the disadvantage of requiring a large number of floating point operations and have long run-times that hinder their usability. In this paper, we propose a novel deep neural network architecture named ENet (efficient neural network), created specifically for tasks requiring low latency operation. ENet is up to 18$\times$ faster, requires 75$\times$ less FLOPs, has 79$\times$ less parameters, and provides similar or better accuracy to existing models. We have tested it on CamVid, Cityscapes and SUN datasets and report on comparisons with existing state-of-the-art methods, and the trade-offs between accuracy and processing time of a network. We present performance measurements of the proposed architecture on embedded systems and suggest possible software improvements that could make ENet even faster.