Source author record

W. Ronny Huang

W. Ronny Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning; Computation and Language; Computer Vision; eess.AS; physics.optics; Sound; Cryptography and Security; Neural and Evolutionary Computing; physics.acc-ph; Artificial Intelligence

Catalog footprint

What is connected

12 works

10 topics

4 close collaborators

Preprint, 2022, arXiv

Capitalization Normalization for Language Modeling with an Accurate and Efficient Hierarchical RNN Model

Capitalization normalization (truecasing) is the task of restoring the correct case (uppercase or lowercase) of noisy text. We propose a fast, accurate and compact two-level hierarchical word-and-character-based recurrent neural network model. We use the truecaser to normalize user-generated text in a Federated Learning framework for language modeling. A case-aware language model trained on this normalized text achieves the same perplexity as a model trained on text with gold capitalization. In a real user A/B experiment, we demonstrate that the improvement translates to reduced prediction error rates in a virtual keyboard application. Similarly, in an ASR language model fusion experiment, we show reduction in uppercase character error rate and word error rate.

Preprint, 2022, arXiv

Detecting Unintended Memorization in Language-Model-Fused ASR

End-to-end (E2E) models are often accompanied by language models (LMs) via shallow fusion for boosting their overall quality as well as recognition of rare words. At the same time, several prior works show that LMs are susceptible to unintentionally memorizing rare or unique sequences in the training data. In this work, we design a framework for detecting memorization of random textual sequences (which we call canaries) in the LM training data when one has only black-box (query) access to an LM-fused speech recognizer, as opposed to direct access to the LM. On a production-grade Conformer RNN-T E2E model fused with a Transformer LM, we show that detecting memorization of singly-occurring canaries from the LM training data of 300M examples is possible. Motivated to protect privacy, we also show that such memorization is significantly reduced by per-example gradient-clipped LM training without compromising overall quality.
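The abstract names per-example gradient clipping but gives no details. The following is a minimal sketch of the general idea, assuming L2-norm clipping applied to each example's gradient before averaging; the function names, the `clip_norm` value, and the toy gradients are hypothetical and are not taken from the paper.

```python
import math

def clip_per_example(grads, clip_norm):
    """Rescale each per-example gradient so its L2 norm is at most clip_norm."""
    clipped = []
    for g in grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Gradients already inside the norm ball are left unchanged.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])
    return clipped

def average_gradient(grads):
    """Average a list of equally sized gradient vectors coordinate-wise."""
    n = len(grads)
    return [sum(col) / n for col in zip(*grads)]

# Hypothetical batch: two ordinary examples and one outlier whose gradient
# is far larger (standing in for a rare sequence such as a canary).
batch = [[0.3, -0.4], [0.1, 0.2], [30.0, 40.0]]
clipped = clip_per_example(batch, clip_norm=1.0)
avg_plain = average_gradient(batch)
avg_clipped = average_gradient(clipped)
```

In this toy batch the outlier dominates the plain average, while after clipping no single example can contribute more than `clip_norm` to the update, which is the intuition behind using clipping to limit memorization of rare sequences.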

Preprint, 2022, arXiv

E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR

Improving the performance of end-to-end ASR models on long utterances ranging from minutes to hours in length is an ongoing challenge in speech recognition. A common solution is to segment the audio in advance using a separate voice activity detector (VAD) that decides segment boundary locations based purely on acoustic speech/non-speech information. VAD segmenters, however, may be sub-optimal for real-world speech where, e.g., a complete sentence that should be taken as a whole may contain hesitations in the middle ("set an alarm for... 5 o'clock"). We propose to replace the VAD with an end-to-end ASR model capable of predicting segment boundaries in a streaming fashion, allowing the segmentation decision to be conditioned not only on better acoustic features but also on semantic features from the decoded text with negligible extra computation. In experiments on real world long-form audio (YouTube) with lengths of up to 30 minutes, we demonstrate 8.5% relative WER improvement and 250 ms reduction in median end-of-segment latency compared to the VAD segmenter baseline on a state-of-the-art Conformer RNN-T model.

Preprint, 2022, arXiv

Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition

Language model fusion helps smart assistants recognize words which are rare in acoustic data but abundant in text-only corpora (typed search logs). However, such corpora have properties that hinder downstream performance, including being (1) too large, (2) beset with domain-mismatched content, and (3) heavy-headed rather than heavy-tailed (excessively many duplicate search queries such as "weather"). We show that three simple strategies for selecting language modeling data can dramatically improve rare-word recognition without harming overall performance. First, to address the heavy-headedness, we downsample the data according to a soft log function, which tunably reduces high frequency (head) sentences. Second, to encourage rare-word exposure, we explicitly filter for words rare in the acoustic data. Finally, we tackle domain-mismatch via perplexity-based contrastive selection, filtering for examples matched to the target domain. We down-select a large corpus of web search queries by a factor of 53x and achieve better LM perplexities than without down-selection. When shallow-fused with a state-of-the-art, production speech engine, our LM achieves WER reductions of up to 24% relative on rare-word sentences (without changing overall WER) compared to a baseline LM trained on the raw corpus. These gains are further validated through favorable side-by-side evaluations on live voice search traffic.
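The first strategy, downsampling with a soft log function, can be illustrated with a small sketch. The abstract does not state the exact function, so the form used here, `threshold * log(1 + count / threshold)`, is an assumption chosen because it leaves rare sentences nearly untouched and compresses very frequent ones logarithmically. The `threshold` parameter and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def soft_log_count(count, threshold):
    """Assumed soft log: roughly linear for count << threshold,
    logarithmic for count >> threshold."""
    return threshold * math.log1p(count / threshold)

def downsample(sentences, threshold):
    """Map each distinct sentence to its reduced count (at least one copy kept)."""
    counts = Counter(sentences)
    return {s: max(1, round(soft_log_count(c, threshold))) for s, c in counts.items()}

# Hypothetical heavy-headed corpus: one very common query, one moderately
# common query, and one query seen a single time.
corpus = ["weather"] * 10000 + ["play jazz"] * 20 + ["sphygmomanometer price"]
new_counts = downsample(corpus, threshold=10.0)
```

With this assumed form, the head query shrinks by orders of magnitude while the singleton survives, so the tail makes up a much larger share of the selected data. The `threshold` plays the role of the tunable knob mentioned in the abstract.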

Preprint, 2021, arXiv

MetaPoison: Practical General-purpose Clean-label Data Poisoning

Data poisoning -- the process by which an attacker takes control of a model by making imperceptible changes to a subset of the training data -- is an emerging threat in the context of neural networks. Existing attacks for data poisoning neural networks have relied on hand-crafted heuristics, because solving the poisoning problem directly via bilevel optimization is generally thought of as intractable for deep models. We propose MetaPoison, a first-order method that approximates the bilevel problem via meta-learning and crafts poisons that fool neural networks. MetaPoison is effective: it outperforms previous clean-label poisoning methods by a large margin. MetaPoison is robust: poisoned data made for one model transfer to a variety of victim models with unknown training settings and architectures. MetaPoison is general-purpose: it works not only in fine-tuning scenarios, but also for end-to-end training from scratch, which until now has not been feasible for clean-label attacks with deep nets. MetaPoison can achieve arbitrary adversary goals -- like using poisons of one class to make a target image don the label of another arbitrarily chosen class. Finally, MetaPoison works in the real world. We demonstrate for the first time successful data poisoning of models trained on the black-box Google Cloud AutoML API. Code and premade poisons are provided at https://github.com/wronnyhuang/metapoison

Preprint, 2020, arXiv

Are adversarial examples inevitable?

A wide range of defenses have been proposed to harden neural networks against adversarial attacks. However, a pattern has emerged in which the majority of adversarial defenses are quickly broken by new attacks. Given the lack of success at generating robust defenses, we are led to ask a fundamental question: Are adversarial attacks inevitable? This paper analyzes adversarial examples from a theoretical perspective, and identifies fundamental bounds on the susceptibility of a classifier to adversarial attacks. We show that, for certain classes of problems, adversarial examples are inescapable. Using experiments, we explore the implications of theoretical guarantees for real-world problems and discuss how factors such as dimensionality and image complexity limit a classifier's robustness against adversarial examples.

Preprint, 2020, arXiv

Deep k-NN Defense against Clean-label Data Poisoning Attacks

Targeted clean-label data poisoning is a type of adversarial attack on machine learning systems in which an adversary injects a few correctly-labeled, minimally-perturbed samples into the training data, causing a model to misclassify a particular test sample during inference. Although defenses have been proposed for general poisoning attacks, no reliable defense for clean-label attacks has been demonstrated, despite the attacks' effectiveness and realistic applications. In this work, we propose a simple, yet highly-effective Deep k-NN defense against both feature collision and convex polytope clean-label attacks on the CIFAR-10 dataset. We demonstrate that our proposed strategy is able to detect over 99% of poisoned examples in both attacks and remove them without compromising model performance. Additionally, through ablation studies, we discover simple guidelines for selecting the value of k as well as for implementing the Deep k-NN defense on real-world datasets with class imbalance. Our proposed defense shows that current clean-label poisoning attack strategies can be annulled, and serves as a strong yet simple-to-implement baseline defense to test future clean-label poisoning attacks. Our code is available at https://github.com/neeharperi/DeepKNNDefense
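The abstract does not spell out the filtering rule, so the sketch below assumes one plausible reading of a Deep k-NN defense: a training sample is dropped when its label disagrees with the plurality label of its k nearest neighbors in feature space. The toy 2-D points stand in for deep features, and the function name `knn_filter` is hypothetical.

```python
import math
from collections import Counter

def knn_filter(features, labels, k):
    """Return indices of samples whose label matches the plurality label
    of their k nearest neighbors (Euclidean distance, self excluded)."""
    keep = []
    for i, (fi, yi) in enumerate(zip(features, labels)):
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        neighbor_labels = [labels[j] for _, j in dists[:k]]
        plurality = Counter(neighbor_labels).most_common(1)[0][0]
        if plurality == yi:
            keep.append(i)
    return keep

# Toy feature space: class 0 clusters near (0, 0), class 1 near (5, 5).
features = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),   # class 0
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),   # class 1
    (0.05, 0.05),  # poison: labeled 1 but sitting inside the class 0 cluster
]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
kept = knn_filter(features, labels, k=3)
```

Here the poison's neighbors all belong to class 0, so it is the only sample filtered out. On real data the features would come from a trained network's penultimate layer, and the abstract notes that the choice of k and class imbalance both need care.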

Preprint, 2020, arXiv

DeepErase: Weakly Supervised Ink Artifact Removal in Document Text Images

Paper-intensive industries like insurance, law, and government have long leveraged optical character recognition (OCR) to automatically transcribe hordes of scanned documents into text strings for downstream processing. Even in 2019, there are still many scanned documents and mail that come into businesses in non-digital format. Text to be extracted from real world documents is often nestled inside rich formatting, such as tabular structures or forms with fill-in-the-blank boxes or underlines whose ink often touches or even strikes through the ink of the text itself. Further, the text region could have random ink smudges or spurious strokes. Such ink artifacts can severely interfere with the performance of recognition algorithms or other downstream processing tasks. In this work, we propose DeepErase, a neural-based preprocessor to erase ink artifacts from text images. We devise a method to programmatically assemble real text images and real artifacts into realistic-looking "dirty" text images, and use them to train an artifact segmentation network in a weakly supervised manner, since pixel-level annotations are automatically obtained during the assembly process. In addition to high segmentation accuracy, we show that our cleansed images achieve a significant boost in recognition accuracy by popular OCR software such as Tesseract 4.0. Finally, we test DeepErase on out-of-distribution datasets (NIST SDB) of scanned IRS tax return forms and achieve double-digit improvements in accuracy. All experiments are performed on both printed and handwritten text. Code for all experiments is available at https://github.com/yikeqicn/DeepErase

Preprint, 2020, arXiv

The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent

This paper studies how neural network architecture affects the speed of training. We introduce a simple concept called gradient confusion to help formally analyze this. When gradient confusion is high, stochastic gradients produced by different data samples may be negatively correlated, slowing down convergence. But when gradient confusion is low, data samples interact harmoniously, and training proceeds quickly. Through theoretical and experimental results, we demonstrate how the neural network architecture affects gradient confusion, and thus the efficiency of training. Our results show that, for popular initialization techniques, increasing the width of neural networks leads to lower gradient confusion, and thus faster model training. On the other hand, increasing the depth of neural networks has the opposite effect. Our results indicate that alternate initialization techniques or networks using both batch normalization and skip connections help reduce the training burden of very deep networks.
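To make the idea of negatively correlated stochastic gradients concrete, the sketch below measures how negative the pairwise inner products of per-sample gradients get. It assumes gradient confusion is summarized as the smallest value eta >= 0 such that every pair of per-sample gradients has inner product at least -eta, and it uses a least-squares linear model as a stand-in. The model, data, and function names are illustrative assumptions, not the paper's experimental setup.

```python
def per_sample_gradient(w, x, y):
    """Gradient of 0.5 * (w . x - y)^2 with respect to w."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [residual * xi for xi in x]

def gradient_confusion(w, data):
    """Smallest eta >= 0 such that <g_i, g_j> >= -eta for all pairs i != j."""
    grads = [per_sample_gradient(w, x, y) for x, y in data]
    min_inner = min(
        sum(a * b for a, b in zip(grads[i], grads[j]))
        for i in range(len(grads))
        for j in range(i + 1, len(grads))
    )
    return max(0.0, -min_inner)

w = [0.0, 0.0]
# Two samples with the same input but opposite targets pull w in opposite directions.
conflicting = [([1.0, 0.0], 1.0), ([1.0, 0.0], -1.0)]
# Two samples touching different coordinates do not interfere with each other.
harmonious = [([1.0, 0.0], 1.0), ([0.0, 1.0], 1.0)]

eta_conflicting = gradient_confusion(w, conflicting)
eta_harmonious = gradient_confusion(w, harmonious)
```

The conflicting pair yields a positive value, since a step that helps one sample hurts the other, while the orthogonal pair yields zero.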

Preprint, 2016, arXiv

Terahertz-driven, all-optical electron gun

Ultrashort electron beams with narrow energy spread, high charge, and low jitter are essential for resolving phase transitions in metals, semiconductors, and molecular crystals. These semirelativistic beams, produced by phototriggered electron guns, are also injected into accelerators for x-ray light sources. The achievable resolution of these time-resolved electron diffraction or x-ray experiments has been hindered by surface field and timing jitter limitations in conventional RF guns, which thus far are <200 MV/m and >96 fs, respectively. A gun driven by optically-generated single-cycle THz pulses provides a practical solution to enable not only GV/m surface fields but also absolute timing stability, since the pulses are generated by the same laser as the phototrigger. Here, we demonstrate an all-optical THz gun yielding peak electron energies approaching 1 keV, accelerated by 300 MV/m THz fields in a novel micron-scale waveguide structure. We also achieve quasimonoenergetic, sub-keV bunches with 32 fC of charge, which can already be used for time-resolved low-energy electron diffraction. Such ultracompact, easy-to-implement guns driven by intrinsically synchronized THz pulses that are pumped by an amplified arm of the already present photoinjector laser provide a new tool with potential to transform accelerator-based science.

Preprint, 2015, arXiv

THz generation using a reflective stair-step echelon

We present a novel method for THz generation in lithium niobate using a reflective stair-step echelon structure. The echelon produces a discretely tilted pulse front with less angular dispersion compared to a high groove-density grating. The THz output was characterized using both a 1-lens and 3-lens imaging system to set the tilt angle at room and cryogenic temperatures. Using broadband 800 nm pulses with a pulse energy of 0.95 mJ and a pulse duration of 70 fs (24 nm FWHM bandwidth, 39 fs transform limited width), we produced THz pulses with field strengths as high as 500 kV/cm and pulse energies as high as 3.1 μJ. The highest conversion efficiency we obtained was 0.33%. In addition, we find that the echelon is easily implemented into an experimental setup for quick alignment and optimization.
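The quoted figures can be cross-checked with a short calculation. The sketch below assumes the conversion efficiency is the ratio of THz pulse energy to pump pulse energy, and that the transform-limited width refers to a Gaussian pulse (time-bandwidth product 2 ln 2 / pi, about 0.441). Neither assumption is stated in the abstract, and the bandwidth conversion uses the narrow-band approximation delta_nu = c * delta_lambda / lambda^2.

```python
import math

# Energy conversion efficiency: THz output energy over pump energy.
pump_energy_j = 0.95e-3   # 0.95 mJ pump pulse
thz_energy_j = 3.1e-6     # 3.1 uJ THz pulse
efficiency_percent = 100 * thz_energy_j / pump_energy_j

# Transform-limited duration of a Gaussian pulse with 24 nm FWHM at 800 nm.
c = 299_792_458.0         # speed of light in m/s
wavelength_m = 800e-9
bandwidth_m = 24e-9
delta_nu_hz = c * bandwidth_m / wavelength_m**2   # narrow-band approximation
tbp_gaussian = 2 * math.log(2) / math.pi          # FWHM time-bandwidth product
transform_limit_fs = 1e15 * tbp_gaussian / delta_nu_hz

print(f"efficiency: {efficiency_percent:.2f} %")
print(f"transform limit: {transform_limit_fs:.1f} fs")
```

Under these assumptions both values land close to the abstract's 0.33% and 39 fs, which suggests the reported efficiency is an energy ratio at the highest quoted THz pulse energy.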

Preprint, 2015, arXiv

Toward a terahertz-driven electron gun

Femtosecond electron bunches with keV energies and eV energy spread are needed by condensed matter physicists to resolve state transitions in carbon nanotubes, molecular structures, organic salts, and charge density wave materials. These semirelativistic electron sources are not only of interest for ultrafast electron diffraction, but also for electron energy-loss spectroscopy and as a seed for x-ray FELs. Thus far, the output energy spread (hence pulse duration) of ultrafast electron guns has been limited by the achievable electric field at the surface of the emitter, which is 10 MV/m for DC guns and 200 MV/m for RF guns. A single-cycle THz electron gun provides a unique opportunity not only to achieve GV/m surface electric fields but also to do so with relatively low THz pulse energies, since a single-cycle transform-limited waveform is the most efficient way to achieve intense electric fields. Here, electron bunches of 50 fC from a flat copper photocathode are accelerated from rest to tens of eV by a microjoule THz pulse with peak electric field of 72 MV/m at 1 kHz repetition rate. We show that scaling to the readily-available GV/m THz field regime would translate to monoenergetic electron beams of ~100 keV.
