Source author record

Shimeng Yu

Shimeng Yu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence cond-mat.mtrl-sci cond-mat.mes-hall eess.SP eess.SY Emerging Technologies Hardware Architecture Human-Computer Interaction Machine Learning Neural and Evolutionary Computing physics.app-ph Robotics Systems and Control

Catalog footprint

What is connected

7works

13topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A wearable sensor vest for social humanoid robots with GPGPU, IoT, and modular software architecture

Currently, most social robots interact with their surroundings and humans through sensors that are integral parts of the robots, which limits the usability of the sensors, human-robot interaction, and interchangeability. A wearable sensor garment that fits many robots is needed in many applications. This article presents an affordable wearable sensor vest, and an open-source software architecture with the Internet of Things (IoT) for social humanoid robots. The vest consists of touch, temperature, gesture, distance, vision sensors, and a wireless communication module. The IoT feature allows the robot to interact with humans locally and over the Internet. The designed architecture works for any social robot that has a general-purpose graphics processing unit (GPGPU), I2C/SPI buses, Internet connection, and the Robotics Operating System (ROS). The modular design of this architecture enables developers to easily add/remove/update complex behaviors. The proposed software architecture provides IoT technology, GPGPU nodes, I2C and SPI bus mangers, audio-visual interaction nodes (speech to text, text to speech, and image understanding), and isolation between behavior nodes and other nodes. The proposed IoT solution consists of related nodes in the robot, a RESTful web service, and user interfaces. We used the HTTP protocol as a means of two-way communication with the social robot over the Internet. Developers can easily edit or add nodes in C, C++, and Python programming languages. Our architecture can be used for designing more sophisticated behaviors for social humanoid robots.

preprint2021arXiv

Antiferroelectric negative capacitance from a structural phase transition in zirconia

Crystalline materials with broken inversion symmetry can exhibit a spontaneous electric polarization, which originates from a microscopic electric dipole moment. Long-range polar or anti-polar order of such permanent dipoles gives rise to ferroelectricity or antiferroelectricity, respectively. However, the recently discovered antiferroelectrics of fluorite structure (HfO$_2$ and ZrO$_2$) are different: A non-polar phase transforms into a polar phase by spontaneous inversion symmetry breaking upon the application of an electric field. Here, we show that this structural transition in antiferroelectric ZrO$_2$ gives rise to a negative capacitance, which is promising for overcoming the fundamental limits of energy efficiency in electronics. Our findings provide insight into the thermodynamically 'forbidden' region of the antiferroelectric transition in ZrO$_2$ and extend the concept of negative capacitance beyond ferroelectricity. This shows that negative capacitance is a more general phenomenon than previously thought and can be expected in a much broader range of materials exhibiting structural phase transitions.

preprint2021arXiv

Logic Compatible High-Performance Ferroelectric Transistor Memory

Silicon ferroelectric field-effect transistors (FeFETs) with low-k interfacial layer (IL) between ferroelectric gate stack and silicon channel suffers from high write voltage, limited write endurance and large read-after-write latency due to early IL breakdown and charge trapping and detrapping at the interface. We demonstrate low voltage, high speed memory operation with high write endurance using an IL-free back-end-of-line (BEOL) compatible FeFET. We fabricate IL-free FeFETs with 28nm channel length and 126nm width under a thermal budget <400C by integrating 5nm thick Hf0.5Zr0.5O2 gate stack with amorphous Indium Tungsten Oxide (IWO) semiconductor channel. We report 1.2V memory window and read current window of 10^5 for program and erase, write latency of 20ns with +/-2V write pulses, read-after-write latency <200ns, write endurance cycles exceeding 5x10^10 and 2-bit/cell programming capability. Array-level analysis establishes IL-free BEOL FeFET as a promising candidate for logic-compatible high-performance on-chip buffer memory and multi-bit weight cell for compute-in-memory accelerators.

preprint2020arXiv

DNN+NeuroSim V2.0: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training

DNN+NeuroSim is an integrated framework to benchmark compute-in-memory (CIM) accelerators for deep neural networks, with hierarchical design options from device-level, to circuit-level and up to algorithm-level. A python wrapper is developed to interface NeuroSim with a popular machine learning platform: Pytorch, to support flexible network structures. The framework provides automatic algorithm-to-hardware mapping, and evaluates chip-level area, energy efficiency and throughput for training or inference, as well as training/inference accuracy with hardware constraints. Our prior work (DNN+NeuroSim V1.1) was developed to estimate the impact of reliability in synaptic devices, and analog-to-digital converter (ADC) quantization loss on the accuracy and hardware performance of inference engines. In this work, we further investigated the impact of the analog emerging non-volatile memory non-ideal device properties for on-chip training. By introducing the nonlinearity, asymmetry, device-to-device and cycle-to-cycle variation of weight update into the python wrapper, and peripheral circuits for error/weight gradient computation in NeuroSim core, we benchmarked CIM accelerators based on state-of-the-art SRAM and eNVM devices for VGG-8 on CIFAR-10 dataset, revealing the crucial specs of synaptic devices for on-chip training. The proposed DNN+NeuroSim V2.0 framework is available on GitHub.

preprint2020arXiv

New Security Challenges on Machine Learning Inference Engine: Chip Cloning and Model Reverse Engineering

Machine learning inference engine is of great interest to smart edge computing. Compute-in-memory (CIM) architecture has shown significant improvements in throughput and energy efficiency for hardware acceleration. Emerging non-volatile memory technologies offer great potential for instant on and off by dynamic power gating. Inference engine is typically pre-trained by the cloud and then being deployed to the filed. There are new attack models on chip cloning and neural network model reverse engineering. In this paper, we propose countermeasures to the weight cloning and input-output pair attacks. The first strategy is the weight fine-tune to compensate the analog-to-digital converter (ADC) offset for a specific chip instance while inducing significant accuracy drop for cloned chip instances. The second strategy is the weight shuffle and fake rows insertion to allow accurate propagation of the activations of the neural network only with a key.

preprint2020arXiv

SMART Paths for Latency Reduction in ReRAM Processing-In-Memory Architecture for CNN Inference

This research work proposes a design of an analog ReRAM-based PIM (processing-in-memory) architecture for fast and efficient CNN (convolutional neural network) inference. For the overall architecture, we use the basic hardware hierarchy such as node, tile, core, and subarray. On the top of that, we design intra-layer pipelining, inter-layer pipelining, and batch pipelining to exploit parallelism in the architecture and increase overall throughput for the inference of an input image stream. We also optimize the performance of the NoC (network-on-chip) routers by decreasing hop counts using SMART (single-cycle multi-hop asynchronous repeated traversal) flow control. Finally, we experiment with weight replications for different CNN layers in VGG (A-E) for large-scale data set ImageNet. In our simulation, we achieve 40.4027 TOPS (tera-operations per second) for the best-case performance, which corresponds to over 1029 FPS (frames per second). We also achieve 3.5914 TOPS/W (tera-operaions per second per watt) for the best-case energy efficiency. In addition, the architecture with aggressive pipelining and weight replications can achieve 14X speedup compared to the baseline architecture with basic pipelining, and SMART flow control achieves 1.08X speedup in this architecture compared to the baseline. Last but not least, we also evaluate the performance of SMART flow control using synthetic traffic.

preprint2016arXiv

Device and System Level Design Considerations for Analog-Non-Volatile-Memory Based Neuromorphic Architectures

This paper gives an overview of recent progress in the brain inspired computing field with a focus on implementation using emerging memories as electronic synapses. Design considerations and challenges such as requirements and design targets on multilevel states, device variability, programming energy, array-level connectivity, fan-in/fanout, wire energy, and IR drop are presented. Wires are increasingly important in design decisions, especially for large systems, and cycle-to-cycle variations have large impact on learning performance.

Shimeng Yu

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

A wearable sensor vest for social humanoid robots with GPGPU, IoT, and modular software architecture

Antiferroelectric negative capacitance from a structural phase transition in zirconia

Logic Compatible High-Performance Ferroelectric Transistor Memory

DNN+NeuroSim V2.0: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training

New Security Challenges on Machine Learning Inference Engine: Chip Cloning and Model Reverse Engineering

SMART Paths for Latency Reduction in ReRAM Processing-In-Memory Architecture for CNN Inference

Device and System Level Design Considerations for Analog-Non-Volatile-Memory Based Neuromorphic Architectures