Source author record

Antonio González

Antonio González appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.CO, Hardware Architecture, eess.AS, Machine Learning, Neural and Evolutionary Computing, Sound, Computation and Language

Catalog footprint

What is connected

10 works

7 topics

4 close collaborators

preprint, 2022, arXiv

ASRPU: A Programmable Accelerator for Low-Power Automatic Speech Recognition

The outstanding accuracy achieved by modern Automatic Speech Recognition (ASR) systems is enabling them to quickly become a mainstream technology. ASR is essential for many applications, such as speech-based assistants, dictation systems and real-time language translation. However, highly accurate ASR systems are computationally expensive, requiring on the order of billions of arithmetic operations to decode each second of audio, which conflicts with a growing interest in deploying ASR on edge devices. On these devices, hardware acceleration is key for achieving acceptable performance. However, ASR is a rich and fast-changing field, and thus, any overly specialized hardware accelerator may quickly become obsolete. In this paper, we tackle those challenges by proposing ASRPU, a programmable accelerator for on-edge ASR. ASRPU contains a pool of general-purpose cores that execute small pieces of parallel code. Each of these programs computes one part of the overall decoder (e.g. a layer in a neural network). The accelerator automates some carefully chosen parts of the decoder to simplify the programming without sacrificing generality. We provide an analysis of a modern ASR system implemented on ASRPU and show that this architecture can achieve real-time decoding with a very low power budget.

preprint, 2022, arXiv

Mixture-of-Rookies: Saving DNN Computations by Predicting ReLU Outputs

Deep Neural Networks (DNNs) are widely used in many application domains. However, they require a vast number of computations and memory accesses to deliver outstanding accuracy. In this paper, we propose a scheme to predict whether the output of each ReLU-activated neuron will be zero or a positive number, in order to skip the computation of those neurons that will likely output zero. Our predictor, named Mixture-of-Rookies, combines two inexpensive components. The first one exploits the high linear correlation between binarized (1-bit) and full-precision (8-bit) dot products, whereas the second component clusters together neurons that tend to output zero at the same time. We propose a novel clustering scheme based on the analysis of angles, as the sign of the dot product of two vectors depends on the cosine of the angle between them. We implement our hybrid zero-output predictor on top of a state-of-the-art DNN accelerator. Experimental results show that our scheme introduces a small area overhead of 5.3% while achieving a speedup of 1.2x and reducing energy consumption by 16.5% on average for a set of diverse DNNs.
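
The first predictor component described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the sign-based binarization, the function names (`binarize`, `predict_zero`, `relu_with_skipping`) and the `threshold` parameter are assumptions made for illustration, and the clustering component is omitted.

```python
def binarize(vec):
    """Reduce each value to its sign: +1 for non-negative, -1 for negative (1 bit)."""
    return [1 if x >= 0 else -1 for x in vec]


def binarized_dot(weights, inputs):
    """Cheap 1-bit dot product used only as a predictor."""
    return sum(w * x for w, x in zip(binarize(weights), binarize(inputs)))


def full_dot(weights, inputs):
    """Full-precision dot product (the expensive computation)."""
    return sum(w * x for w, x in zip(weights, inputs))


def predict_zero(weights, inputs, threshold=0):
    """Predict a zero ReLU output when the 1-bit dot product falls below threshold.

    The threshold is a hypothetical tuning knob: raising it skips more
    neurons at the cost of more mispredictions.
    """
    return binarized_dot(weights, inputs) < threshold


def relu_with_skipping(weights, inputs, threshold=0):
    """Return (output, skipped). The full dot product runs only if not skipped."""
    if predict_zero(weights, inputs, threshold):
        return 0, True
    return max(0, full_dot(weights, inputs)), False
```

With weights `[3, -2, 5, -1]`, the input `[4, -6, 2, -3]` is not skipped and the full dot product is evaluated, while the input `[-4, 6, -2, 3]` is predicted to give zero and the full computation is avoided. A misprediction in this sketch would silently replace a positive output with zero, which is why the accuracy of the predictor matters.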

preprint, 2022, arXiv

Saving RNN Computations with a Neuron-Level Fuzzy Memoization Scheme

Recurrent Neural Networks (RNNs) are a key technology for applications such as automatic speech recognition or machine translation. Unlike conventional feed-forward DNNs, RNNs remember past information to improve the accuracy of future predictions and, therefore, they are very effective for sequence processing problems. For each application run, recurrent layers are executed many times to process a potentially large sequence of inputs (words, images, audio frames, etc.). In this paper, we observe that the output of a neuron exhibits small changes in consecutive invocations. We exploit this property to build a neuron-level fuzzy memoization scheme, which dynamically caches each neuron's output and reuses it whenever it is predicted that the current output will be similar to a previously computed result, thereby avoiding the output computation. The main challenge in this scheme is determining whether the neuron's new output for the current input in the sequence will be similar to a recently computed result. To this end, we extend the recurrent layer with a much simpler Bitwise Neural Network (BNN), and show that the BNN and RNN outputs are highly correlated: if two BNN outputs are very similar, the corresponding outputs in the original RNN layer are likely to exhibit negligible changes. The BNN provides a low-cost and effective mechanism for deciding when fuzzy memoization can be applied with a small impact on accuracy. We evaluate our memoization scheme on top of a state-of-the-art accelerator for RNNs, for a variety of neural networks from multiple application domains. We show that our technique avoids more than 26.7% of computations, resulting in 21% energy savings and a 1.4x speedup on average.
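
The caching idea in this abstract can be sketched for a single neuron as follows. This is a simplified illustration under stated assumptions, not the paper's design: the sign-based binarization, the `tanh` activation, the class name `MemoizedNeuron` and the `threshold` parameter are all hypothetical choices.

```python
import math


def sign_bits(vec):
    """1-bit view of a vector: +1 for non-negative values, -1 otherwise."""
    return [1 if x >= 0 else -1 for x in vec]


class MemoizedNeuron:
    """Neuron that reuses its cached output when a cheap bitwise proxy barely moves."""

    def __init__(self, weights, threshold):
        self.weights = weights
        self.w_bits = sign_bits(weights)
        self.threshold = threshold      # hypothetical similarity bound on the proxy
        self.cached_output = None       # last fully computed output
        self.cached_bnn = None          # proxy value at the time of caching
        self.reused = 0
        self.computed = 0

    def _bnn(self, inputs):
        # Cheap proxy: dot product of sign bits only.
        return sum(w * x for w, x in zip(self.w_bits, sign_bits(inputs)))

    def _full(self, inputs):
        # Expensive path: full-precision dot product plus activation.
        return math.tanh(sum(w * x for w, x in zip(self.weights, inputs)))

    def forward(self, inputs):
        bnn_now = self._bnn(inputs)
        if (self.cached_bnn is not None
                and abs(bnn_now - self.cached_bnn) <= self.threshold):
            self.reused += 1
            return self.cached_output   # fuzzy hit: skip the full computation
        self.cached_output = self._full(inputs)
        self.cached_bnn = bnn_now
        self.computed += 1
        return self.cached_output
```

Feeding a sequence of slowly varying inputs through `forward` should raise the `reused` counter, while a sharp change in the input forces a recomputation and refreshes the cache. The returned value on a hit is only approximately right, which is the accuracy trade-off the abstract refers to.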

preprint, 2021, arXiv

Exploiting Beam Search Confidence for Energy-Efficient Speech Recognition

With computers getting more and more powerful and integrated in our daily lives, the focus is increasingly shifting towards more human-friendly interfaces, making Automatic Speech Recognition (ASR) a central player as the ideal means of interaction with machines. Consequently, interest in speech technology has grown in the last few years, with more systems being proposed and higher accuracy levels being achieved, even surpassing human accuracy. As ASR systems become increasingly powerful, their computational complexity also increases, and hardware support has to keep pace. In this paper, we propose a technique to improve the energy efficiency and performance of ASR systems, focusing on low-power hardware for edge devices. We focus on optimizing the DNN-based Acoustic Model evaluation, as we have observed it to be the main bottleneck in state-of-the-art ASR systems, by leveraging run-time information from the Beam Search. By doing so, we reduce the energy and execution time of the acoustic model evaluation by 25.6% and 25.9%, respectively, with negligible accuracy loss.

preprint, 2016, arXiv

New results on metric-locating-dominating sets of graphs

A dominating set $S$ of a graph is a metric-locating-dominating set if each vertex of the graph is uniquely distinguished by its distances from the elements of $S$, and the minimum cardinality of such a set is called the metric-location-domination number. In this paper, we undertake a study that, in general graphs and specific families, relates metric-locating-dominating sets to other special sets: resolving sets, dominating sets, locating-dominating sets and doubly resolving sets. We first characterize classes of trees according to certain relationships between their metric-location-domination number and their metric dimension and domination number. Then, we show different methods to transform metric-locating-dominating sets into locating-dominating sets and doubly resolving sets. Our methods produce new bounds on the minimum cardinalities of all those sets, some of them involving parameters that have not been related so far.
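
The definition in the first sentence of this abstract can be checked directly on small graphs. The following Python sketch assumes a connected graph given as an adjacency dictionary; the function names are chosen here for illustration and do not come from the paper.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_dominating(adj, S):
    """Every vertex is in S or has at least one neighbour in S."""
    members = set(S)
    return all(v in members or any(n in members for n in adj[v]) for v in adj)


def is_resolving(adj, S):
    """All vertices have pairwise distinct vectors of distances to S."""
    landmarks = list(S)
    dist = {s: bfs_distances(adj, s) for s in landmarks}
    seen = set()
    for v in adj:
        vector = tuple(dist[s][v] for s in landmarks)
        if vector in seen:
            return False
        seen.add(vector)
    return True


def is_metric_locating_dominating(adj, S):
    """S is both a dominating set and a resolving set."""
    return is_dominating(adj, S) and is_resolving(adj, S)
```

On the path with vertices 0, 1, 2, 3, the single endpoint `[0]` is resolving but not dominating, so it is not metric-locating-dominating, whereas the two middle vertices `[1, 2]` satisfy both conditions.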

preprint, 2016, arXiv

Shortcut sets for the locus of plane Euclidean networks

We study the problem of augmenting the locus $\mathcal{N}_{\ell}$ of a plane Euclidean network $\mathcal{N}$ by iteratively inserting a finite set of segments, called a shortcut set, while reducing the diameter of the locus of the resulting network. There are two main differences from the classical augmentation problems: the endpoints of the segments are allowed to be points of $\mathcal{N}_{\ell}$ as well as points of the previously inserted segments (instead of only vertices of $\mathcal{N}$), and the notion of diameter is adapted to the fact that we deal with $\mathcal{N}_{\ell}$ instead of $\mathcal{N}$. This greatly increases the hardness of the problem, but also its possible practical applications to network design. Among other results, we characterize the existence of shortcut sets, compute them in polynomial time, and analyze the role of the convex hull of $\mathcal{N}_{\ell}$ when inserting a shortcut set. Our main results prove that, while the problem of minimizing the size of a shortcut set is NP-hard, one can always determine in polynomial time whether inserting only one segment suffices to reduce the diameter.

preprint, 2014, arXiv

Resolving sets for breaking symmetries of graphs

This paper deals with the maximum value of the difference between the determining number and the metric dimension of a graph as a function of its order. Our technique requires the use of locating-dominating sets and an independent study of other functions related to these sets. Thus, we obtain lower and upper bounds on all these functions by means of very diverse tools. Among them are some adequate constructions of graphs, a variant of a classical result in graph domination and a polynomial-time algorithm that produces both distinguishing sets and determining sets. Further, we consider specific families of graphs where the restrictions of these functions can be computed. To this end, we utilize two well-known objects in graph theory: $k$-dominating sets and matchings.

preprint, 2013, arXiv

The resolving number of a graph

We study a graph parameter related to resolving sets and metric dimension, namely the resolving number, introduced by Chartrand, Poisson and Zhang. First, we establish an important difference between the two parameters: while computing the metric dimension of an arbitrary graph is known to be NP-hard, we show that the resolving number can be computed in polynomial time. We then relate the resolving number to classical graph parameters: diameter, girth, clique number, order and maximum degree. With these relations in hand, we characterize the graphs with resolving number 3, extending other studies that provide characterizations for smaller resolving numbers.
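
One way to see why a polynomial-time computation is plausible is sketched below in Python. It assumes the usual definition of the resolving number as the smallest $k$ such that every set of $k$ vertices is resolving, and a connected graph with at least two vertices. Under that definition, a set fails to resolve exactly when all of its vertices are equidistant from some pair $u, v$, so the answer is one more than the largest such "blind" set. This is an illustrative argument and not necessarily the algorithm used in the paper; the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def resolving_number(adj):
    """Smallest k such that every k-subset of vertices is a resolving set.

    For each pair (u, v), count the vertices x with d(u, x) == d(v, x);
    none of them can tell u and v apart. Any set larger than the biggest
    such count must contain a vertex that separates every pair.
    """
    vertices = list(adj)
    dist = {v: bfs_distances(adj, v) for v in vertices}
    worst = 0
    for i, u in enumerate(vertices):
        for v in vertices[i + 1:]:
            blind = sum(1 for x in vertices if dist[u][x] == dist[v][x])
            worst = max(worst, blind)
    return worst + 1
```

The sketch runs one breadth-first search per vertex and then scans all vertex triples, so its cost is cubic in the number of vertices. On the 4-vertex path it returns 2, and on the 4-cycle it returns 3, because the two vertices opposite each other cannot be separated by the other two.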

preprint, 2012, arXiv

On the metric dimension, the upper dimension and the resolving number of graphs

This paper deals with three resolving parameters: the metric dimension, the upper dimension and the resolving number. We first answer a question raised by Chartrand and Zhang asking for a characterization of the graphs with equal metric dimension and resolving number. We also solve in the affirmative a conjecture posed by Chartrand, Poisson and Zhang about the realization of the metric dimension and the upper dimension. Finally, we prove that no integer $a\geq 4$ is realizable as the resolving number of an infinite family of graphs.

preprint, 2012, arXiv

Resolving sets for Johnson and Kneser graphs

A set of vertices $S$ in a graph $G$ is a resolving set for $G$ if, for any two distinct vertices $u,v$, there exists $x\in S$ such that $d(u,x) \neq d(v,x)$. In this paper, we consider the Johnson graphs $J(n,k)$ and Kneser graphs $K(n,k)$, and obtain various constructions of resolving sets for these graphs. As well as general constructions, we show that various interesting combinatorial objects can be used to obtain resolving sets in these graphs, including (for Johnson graphs) projective planes and symmetric designs, as well as (for Kneser graphs) partial geometries, Hadamard matrices, Steiner systems and toroidal grids.
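
For small parameters, the graphs and the resolving-set condition above can be explored by brute force. The Python sketch below uses the standard definitions (vertices are the $k$-subsets of $\{1,\dots,n\}$; Kneser adjacency means disjoint, Johnson adjacency means sharing $k-1$ elements). The exhaustive search is only an illustration for tiny connected graphs, not one of the constructions from the paper, and the function names are assumptions.

```python
from collections import deque
from itertools import combinations


def k_subsets(n, k):
    """All k-element subsets of {1, ..., n} as frozensets."""
    return [frozenset(c) for c in combinations(range(1, n + 1), k)]


def kneser_graph(n, k):
    """K(n, k): two k-subsets are adjacent when they are disjoint."""
    vertices = k_subsets(n, k)
    return {v: [w for w in vertices if not (v & w)] for v in vertices}


def johnson_graph(n, k):
    """J(n, k): two k-subsets are adjacent when they share k - 1 elements."""
    vertices = k_subsets(n, k)
    return {v: [w for w in vertices if len(v & w) == k - 1] for v in vertices}


def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_resolving(adj, S):
    """All vertices have pairwise distinct vectors of distances to S."""
    landmarks = list(S)
    dist = {s: bfs_distances(adj, s) for s in landmarks}
    vectors = {tuple(dist[s][v] for s in landmarks) for v in adj}
    return len(vectors) == len(adj)


def metric_dimension(adj):
    """Exhaustive search for a smallest resolving set (exponential time)."""
    vertices = list(adj)
    for size in range(1, len(vertices) + 1):
        for S in combinations(vertices, size):
            if is_resolving(adj, S):
                return size, S
    return len(vertices), tuple(vertices)
```

For the Kneser graph $K(5,2)$, which is the Petersen graph, the three subsets $\{1,2\}$, $\{1,3\}$, $\{1,4\}$ form a resolving set, and the exhaustive search confirms that no smaller one exists. The search grows exponentially with the number of vertices, which is one reason explicit constructions such as those in the paper are of interest.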
