Source author record

Peng Chang

Peng Chang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

eess.AS, Robotics, Sound, Computation and Language, Machine Learning

Catalog footprint

What is connected

4 works

5 topics

4 close collaborators


Preprint, 2022, arXiv

Transformer-Based Multi-Aspect Multi-Granularity Non-Native English Speaker Pronunciation Assessment

Automatic pronunciation assessment is an important technology to help self-directed language learners. While pronunciation quality has multiple aspects including accuracy, fluency, completeness, and prosody, previous efforts typically only model one aspect (e.g., accuracy) at one granularity (e.g., at the phoneme-level). In this work, we explore modeling multi-aspect pronunciation assessment at multiple granularities. Specifically, we train a Goodness Of Pronunciation feature-based Transformer (GOPT) with multi-task learning. Experiments show that GOPT achieves the best results on speechocean762 with a public automatic speech recognition (ASR) acoustic model trained on Librispeech.

Preprint, 2021, arXiv

CASS-NAT: CTC Alignment-based Single Step Non-autoregressive Transformer for Speech Recognition

We propose a CTC alignment-based single step non-autoregressive transformer (CASS-NAT) for speech recognition. Specifically, the CTC alignment contains the information of (a) the number of tokens for decoder input, and (b) the time span of acoustics for each token. This information is used to extract an acoustic representation for each token in parallel, referred to as a token-level acoustic embedding, which substitutes for the word embedding in the autoregressive transformer (AT) to achieve parallel generation in the decoder. During inference, an error-based alignment sampling method is applied to the CTC output space, reducing the WER while retaining the parallelism. Experimental results show that the proposed method achieves WERs of 3.8%/9.1% on the Librispeech test clean/other datasets without an external LM, and a CER of 5.8% on the Aishell1 Mandarin corpus. Compared to the AT baseline, CASS-NAT has a performance reduction on WER, but is 51.2x faster in terms of RTF. When decoding with an oracle CTC alignment, the lower bound of WER without LM reaches 2.3% on the test-clean set, indicating the potential of the proposed method.
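As a rough illustration of items (a) and (b) in the abstract above, the sketch below reads the token count and per-token frame spans off a frame-level CTC alignment using the standard CTC collapsing rule (merge consecutive repeats, drop blanks). It is not the authors' implementation. The function name `ctc_alignment_spans`, the blank index `0`, and the choice to let a span cover only a token's non-blank frames are assumptions made for this example.

```python
from typing import List, Tuple


def ctc_alignment_spans(alignment: List[int], blank: int = 0) -> List[Tuple[int, int, int]]:
    """Return (token, start_frame, end_frame) triples, end exclusive.

    Assumption: a token's span covers only its own non-blank frames.
    How blank frames are assigned to neighbouring tokens is left open.
    """
    spans: List[Tuple[int, int, int]] = []
    prev = blank
    for t, label in enumerate(alignment):
        if label != blank:
            if label == prev:
                # Same label as the previous frame: extend the current span.
                token, start, _ = spans[-1]
                spans[-1] = (token, start, t + 1)
            else:
                # New token: either a different label or a repeat after a blank.
                spans.append((label, t, t + 1))
        prev = label
    return spans


# Hypothetical alignment over 10 frames with blank = 0.
spans = ctc_alignment_spans([0, 3, 3, 0, 0, 5, 5, 5, 0, 3])
num_tokens = len(spans)  # (a) number of tokens for the decoder input
```

Here `num_tokens` corresponds to (a) and each `(start_frame, end_frame)` pair to (b). Per-token acoustic embeddings could then be computed for all spans independently, which is what makes parallel decoding possible.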

Preprint, 2020, arXiv

Model-Based Manipulation of Linear Flexible Objects with Visual Curvature Feedback

Manipulation of deformable objects is a desired skill in making robots ubiquitous in manufacturing, service, healthcare, and security. Deformable objects are common in our daily lives, e.g., wires, clothes, bed sheets, etc., and are significantly more difficult to model than rigid objects. In this study, we investigate vision-based manipulation of linear flexible objects such as cables. We propose a geometric modeling method that is based on visual feedback to develop a general representation of the linear flexible object that is subject to gravity. The model characterizes the shape of the object by combining the curvatures on two projection planes. In this approach, we achieve tracking of the position and orientation (pose) of a cable-like object, the pose of its tip, and the pose of the selected grasp point on the object, which enables closed-loop manipulation of the object. We demonstrate the feasibility of our approach by completing the Plug Task used in the 2015 DARPA Robotics Challenge Finals, which involves unplugging a power cable from one socket and plugging it into another. Experiments show that we can successfully complete the task autonomously within 30 seconds.
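The abstract does not say how the curvatures are computed. As a minimal sketch of the general idea of describing a 3D curve by curvatures on two projection planes, the example below projects sampled 3D points onto the XY and XZ planes and estimates a discrete (Menger) curvature from each triple of consecutive points. The choice of planes, the Menger estimator, and the names `menger_curvature` and `projected_curvatures` are all assumptions for illustration, not the paper's method.

```python
import math
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]


def menger_curvature(p1: Point2, p2: Point2, p3: Point2) -> float:
    """Curvature of the circle through three 2D points (0.0 if degenerate)."""
    # Twice the unsigned triangle area, via the 2D cross product.
    twice_area = abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                     - (p3[0] - p1[0]) * (p2[1] - p1[1]))
    sides = math.dist(p1, p2) * math.dist(p2, p3) * math.dist(p1, p3)
    if sides == 0.0:
        return 0.0  # coincident points: curvature is undefined, report 0
    # Menger curvature = 4 * area / (product of side lengths).
    return 2.0 * twice_area / sides


def projected_curvatures(points: Sequence[Tuple[float, float, float]]) -> List[Tuple[float, float]]:
    """For each interior sample, return (curvature on XY, curvature on XZ)."""
    result = []
    for a, b, c in zip(points, points[1:], points[2:]):
        k_xy = menger_curvature((a[0], a[1]), (b[0], b[1]), (c[0], c[1]))
        k_xz = menger_curvature((a[0], a[2]), (b[0], b[2]), (c[0], c[2]))
        result.append((k_xy, k_xz))
    return result


# Three samples on a unit circle lying in the XY plane.
curvs = projected_curvatures([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0)])
```

For these samples the XY projection lies on a unit circle and the XZ projection is a straight line, so the two curvature values differ, which is the kind of shape information a two-plane representation can capture.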

Preprint, 2020, arXiv

Sim2Real2Sim: Bridging the Gap Between Simulation and Real-World in Flexible Object Manipulation

This paper addresses a new strategy called Simulation-to-Real-to-Simulation (Sim2Real2Sim) to bridge the gap between simulation and real-world, and automate a flexible object manipulation task. This strategy consists of three steps: (1) using the rough environment with the estimated models to develop the methods to complete the manipulation task in the simulation; (2) applying the methods from simulation to real-world and comparing their performance; (3) updating the models and methods in simulation based on the differences between the real world and the simulation. The Plug Task from the 2015 DARPA Robotics Challenge Finals is chosen to evaluate our Sim2Real2Sim strategy. A new identification approach for building the model of the linear flexible objects is derived from real-world to simulation. The automation of the DRC plug task in both simulation and real-world proves the success of the Sim2Real2Sim strategy. Numerical experiments are implemented to validate the simulated model.
