Source author record

Naveen Verma

Naveen Verma appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

eess.SY Systems and Control Artificial Intelligence Computer Vision cs.CY Emerging Technologies Hardware Architecture Machine Learning Networking and Internet Architecture Robotics

Catalog footprint

What is connected

5works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Neural Network Training on In-memory-computing Hardware with Radix-4 Gradients

Deep learning training involves a large number of operations, which are dominated by high dimensionality Matrix-Vector Multiplies (MVMs). This has motivated hardware accelerators to enhance compute efficiency, but where data movement and accessing are proving to be key bottlenecks. In-Memory Computing (IMC) is an approach with the potential to overcome this, whereby computations are performed in-place within dense 2-D memory. However, IMC fundamentally trades efficiency and throughput gains for dynamic-range limitations, raising distinct challenges for training, where compute precision requirements are seen to be substantially higher than for inferencing. This paper explores training on IMC hardware by leveraging two recent developments: (1) a training algorithm enabling aggressive quantization through a radix-4 number representation; (2) IMC leveraging compute based on precision capacitors, whereby analog noise effects can be made well below quantization effects. Energy modeling calibrated to a measured silicon prototype implemented in 16nm CMOS shows that energy savings of over 400x can be achieved with full quantizer adaptability, where all training MVMs can be mapped to IMC, and 3x can be achieved for two-level quantizer adaptability, where two of the three training MVMs can be mapped to IMC.

preprint2022arXiv

Scalable Simulation and Demonstration of Jumping Piezoelectric 2-D Soft Robots

Soft robots have drawn great interest due to their ability to take on a rich range of shapes and motions, compared to traditional rigid robots. However, the motions, and underlying statics and dynamics, pose significant challenges to forming well-generalized and robust models necessary for robot design and control. In this work, we demonstrate a five-actuator soft robot capable of complex motions and develop a scalable simulation framework that reliably predicts robot motions. The simulation framework is validated by comparing its predictions to experimental results, based on a robot constructed from piezoelectric layers bonded to a steel-foil substrate. The simulation framework exploits the physics engine PyBullet, and employs discrete rigid-link elements connected by motors to model the actuators. We perform static and AC analyses to validate a single-unit actuator cantilever setup and observe close agreement between simulation and experiments for both the cases. The analyses are extended to the five-actuator robot, where simulations accurately predict the static and AC robot motions, including shapes for applied DC voltage inputs, nearly-static "inchworm" motion, and jumping (in vertical as well as vertical and horizontal directions). These motions exhibit complex non-linear behavior, with forward robot motion reaching ~1 cm/s. Our open-source code can be found at: https://github.com/zhiwuz/sfers.

preprint2021arXiv

REITS: Reflective Surface for Intelligent Transportation Systems

Autonomous vehicles are predicted to dominate the transportation industry in the foreseeable future. Safety is one of the major challenges to the early deployment of self-driving systems. To ensure safety, self-driving vehicles must sense and detect humans, other vehicles, and road infrastructure accurately, robustly, and timely. However, existing sensing techniques used by self-driving vehicles may not be absolutely reliable. In this paper, we design REITS, a system to improve the reliability of RF-based sensing modules for autonomous vehicles. We conduct theoretical analysis on possible failures of existing RF-based sensing systems. Based on the analysis, REITS adopts a multi-antenna design, which enables constructive blind beamforming to return an enhanced radar signal in the incident direction. REITS can also let the existing radar system sense identification information by switching between constructive beamforming state and destructive beamforming state. Preliminary results show that REITS improves the detection distance of a self-driving car radar by a factor of 3.63.

preprint2020arXiv

Nanotechnology-inspired Information Processing Systems of the Future

Nanoscale semiconductor technology has been a key enabler of the computing revolution. It has done so via advances in new materials and manufacturing processes that resulted in the size of the basic building block of computing systems - the logic switch and memory devices - being reduced into the nanoscale regime. Nanotechnology has provided increased computing functionality per unit volume, energy, and cost. In order for computing systems to continue to deliver substantial benefits for the foreseeable future to society at large, it is critical that the very notion of computing be examined in the light of nanoscale realities. In particular, one needs to ask what it means to compute when the very building block - the logic switch - no longer exhibits the level of determinism required by the von Neumann architecture. There needs to be a sustained and heavy investment in a nation-wide Vertically Integrated Semiconductor Ecosystem (VISE). VISE is a program in which research and development is conducted seamlessly across the entire compute stack - from applications, systems and algorithms, architectures, circuits and nanodevices, and materials. A nation-wide VISE provides clear strategic advantages in ensuring the US's global superiority in semiconductors. First, a VISE provides the highest quality seed-corn for nurturing transformative ideas that are critically needed today in order for nanotechnology-inspired computing to flourish. It does so by dramatically opening up new areas of semiconductor research that are inspired and driven by new application needs. Second, a VISE creates a very high barrier to entry from foreign competitors because it is extremely hard to establish, and even harder to duplicate.

preprint2018arXiv

A Microprocessor implemented in 65nm CMOS with Configurable and Bit-scalable Accelerator for Programmable In-memory Computing

This paper presents a programmable in-memory-computing processor, demonstrated in a 65nm CMOS technology. For data-centric workloads, such as deep neural networks, data movement often dominates when implemented with today's computing architectures. This has motivated spatial architectures, where the arrangement of data-storage and compute hardware is distributed and explicitly aligned to the computation dataflow, most notably for matrix-vector multiplication. In-memory computing is a spatial architecture where processing elements correspond to dense bit cells, providing local storage and compute, typically employing analog operation. Though this raises the potential for high energy efficiency and throughput, analog operation has significantly limited robustness, scale, and programmability. This paper describes a 590kb in-memory-computing accelerator integrated in a programmable processor architecture, by exploiting recent approaches to charge-domain in-memory computing. The architecture takes the approach of tight coupling with an embedded CPU, through accelerator interfaces enabling integration in the standard processor memory space. Additionally, a near-memory-computing datapath both enables diverse computations locally, to address operations required across applications, and enables bit-precision scalability for matrix/input-vector elements, through a bit-parallel/bit-serial (BP/BS) scheme. Chip measurements show an energy efficiency of 152/297 1b-TOPS/W and throughput of 4.7/1.9 1b-TOPS (scaling linearly with the matrix/input-vector element precisions) at VDD of 1.2/0.85V. Neural network demonstrations with 1-b/4-b weights and activations for CIFAR-10 classification consume 5.3/105.2 $μ$J/image at 176/23 fps, with accuracy at the level of digital/software implementation (89.3/92.4 $\%$ accuracy).