Source author record

David Cox

David Cox appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Artificial Intelligence, Computer Vision, Machine Learning, Computation and Language, math.AC, math.AG, Neurons and Cognition, cs.CY, Databases, eess.AS, Information Theory, math.IT, Neural and Evolutionary Computing, Sound

Catalog footprint

What is connected

12 works

14 topics

4 close collaborators


preprint, 2022, arXiv

ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers

Self-supervised learning (SSL) in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks. Since the majority of the downstream tasks of SSL in speech focus on the content information in speech, the most desirable speech representations should be able to disentangle unwanted variations, such as speaker variations, from the content. However, disentangling speakers is very challenging, because removing the speaker information could easily result in a loss of content as well, and the damage of the latter usually far outweighs the benefit of the former. In this paper, we propose a new SSL method that can achieve speaker disentanglement without severe loss of content. Our approach is adapted from the HuBERT framework, and incorporates disentangling mechanisms to regularize both the teacher labels and the learned representations. We evaluate the benefit of speaker disentanglement on a set of content-related downstream tasks, and observe a consistent and notable performance advantage of our speaker-disentangled representations.

preprint, 2022, arXiv

VALHALLA: Visual Hallucination for Machine Translation

Designing better machine translation systems by considering auxiliary inputs such as images has attracted much attention in recent years. While existing methods show promising performance over the conventional text-only translation systems, they typically require paired text and image as input during inference, which limits their applicability to real-world scenarios. In this paper, we introduce a visual hallucination framework, called VALHALLA, which requires only source sentences at inference time and instead uses hallucinated visual representations for multimodal machine translation. In particular, given a source sentence, an autoregressive hallucination transformer is used to predict a discrete visual representation from the input text, and the combined text and hallucinated representations are utilized to obtain the target translation. We train the hallucination transformer jointly with the translation transformer using standard backpropagation with cross-entropy losses while being guided by an additional loss that encourages consistency between predictions using either ground-truth or hallucinated visual representations. Extensive experiments on three standard translation datasets with a diverse set of language pairs demonstrate the effectiveness of our approach over both text-only baselines and state-of-the-art methods. Project page: http://www.svcl.ucsd.edu/projects/valhalla.

preprint, 2020, arXiv

Lifelong Object Detection

Recent advances in object detection have benefited significantly from rapid developments in deep neural networks. However, neural networks suffer from the well-known issue of catastrophic forgetting, which makes continual or lifelong learning problematic. In this paper, we leverage the fact that new training classes arrive in a sequential manner and incrementally refine the model so that it additionally detects new object classes in the absence of previous training data. Specifically, we consider the representative object detector, Faster R-CNN, for both accurate and efficient prediction. To prevent abrupt performance degradation due to catastrophic forgetting, we propose to apply knowledge distillation on both the region proposal network and the region classification network, to retain the detection of previously trained classes. A pseudo-positive-aware sampling strategy is also introduced for distillation sample selection. We evaluate the proposed method on PASCAL VOC 2007 and MS COCO benchmarks and show competitive mAP and 6x inference speed improvement, which makes the approach more suitable for real-time applications. Our implementation will be publicly available.
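The distillation idea in this abstract can be illustrated with a minimal sketch. The abstract does not state the exact loss, so the temperature-softened KL divergence below is an assumed stand-in, and the function names and logits are hypothetical. It penalizes the updated model when its predictions on previously learned classes drift away from those of the frozen old model.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(old_logits, new_logits, temperature=2.0):
    """KL(old || new) restricted to the classes the old model knows.

    Assumption: the new model lists the old classes first, followed by
    any newly added classes, so slicing aligns the two outputs.
    """
    kept = new_logits[:len(old_logits)]
    p_old = softmax(old_logits, temperature)
    p_new = softmax(kept, temperature)
    return sum(p * math.log(p / q) for p, q in zip(p_old, p_new) if p > 0)

# Hypothetical logits: three old classes, one new class appended.
old = [2.0, 0.5, -1.0]
unchanged = [2.0, 0.5, -1.0, 3.0]   # old-class scores preserved
drifted = [0.1, 1.8, 0.3, 3.0]      # old-class scores forgotten

print(distillation_loss(old, unchanged))
print(distillation_loss(old, drifted))
```

In a detector such as Faster R-CNN, a term of this kind would be added to the usual detection loss for both the proposal and classification heads; how the two are weighted is not given in the abstract.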

preprint, 2016, arXiv

Clique Topology Reveals Intrinsic Geometric Structure in Neural Correlations: An Overview

This publication serves as an overview of clique topology -- a novel matrix analysis technique used to extract structural features from neural activity data that contains hidden nonlinearities. We highlight work done by Giusti et al. which introduces clique topology and verifies its applicability to neural feature extraction by showing that neural correlations in the rat hippocampus are determined by geometric structure of hippocampal circuits, rather than being a consequence of positional coding.

preprint, 2016, arXiv

Delta Epsilon Alpha Star: A PAC-Admissible Search Algorithm

Delta Epsilon Alpha Star is a minimal coverage, real-time robotic search algorithm that yields a moderately aggressive search path with minimal backtracking. Search performance is bounded by placing a combinatorial bound, epsilon and delta, on the maximum deviation from the theoretical shortest path and the probability at which further deviations can occur. Additionally, we formally define the notion of PAC-admissibility -- a relaxed admissibility criterion for algorithms, and show that PAC-admissible algorithms are better suited to robotic search situations than epsilon-admissible or strict algorithms.
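For context on the epsilon-admissible baseline that the abstract compares against, here is a minimal sketch of weighted A*, the classic epsilon-admissible search. It is not the Delta Epsilon Alpha Star algorithm, whose details the abstract does not give. The grid, start and goal are hypothetical. With an admissible heuristic, inflating it by (1 + epsilon) keeps the returned cost within a factor (1 + epsilon) of the optimum.

```python
import heapq

def weighted_a_star(grid, start, goal, epsilon=0.0):
    """Return the path cost found with priority f = g + (1 + epsilon) * h.

    grid: list of rows, 0 = free cell, 1 = obstacle; moves are 4-connected
    with unit cost. Returns None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_g = {start: 0}
    frontier = [((1 + epsilon) * h(start), 0, start)]
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale queue entry; a cheaper route was found later
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(frontier, (ng + (1 + epsilon) * h(nxt), ng, nxt))
    return None

# Hypothetical 4x4 map for illustration.
grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
optimal = weighted_a_star(grid, (0, 0), (3, 3), epsilon=0.0)
relaxed = weighted_a_star(grid, (0, 0), (3, 3), epsilon=0.5)
print(optimal, relaxed)
```

A PAC-admissible algorithm, as the abstract describes it, relaxes this guarantee further by allowing the bound to fail with probability controlled by delta.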

preprint, 2016, arXiv

Microdatabases for the Industrial Internet

The Industrial Internet market is targeted to grow by trillions of US dollars by the year 2030, driven by adoption, deployment and integration of billions of intelligent devices and their associated data. This digital expansion faces a number of significant challenges, including reliable data management, security and privacy. Realizing the benefits from this evolution is made more difficult because a typical industrial plant includes multiple vendors and legacy technology stacks. Aggregating all the raw data to a single data center before performing analysis increases response times, raising performance concerns in traditional markets and requiring a compromise between data duplication and data access performance. Similar to the way microservices can integrate disparate information technologies without imposing monolithic cross-cutting architecture impacts, we propose microdatabases to manage the data heterogeneity of the Industrial Internet while allowing records to be captured and secured close to the industrial processes, but also be made available near the applications that can benefit from the data. A microdatabase is an abstraction of a data store that standardizes and protects the interactions between distributed data sources, providers and consumers. It integrates an information model with discoverable object types that can be browsed interactively and programmatically, and supports repository instances that evolve with their own lifecycles. The microdatabase abstraction is independent of technology choice and was designed based on solicitation and review of industry stakeholder concerns.

preprint, 2016, arXiv

Syntactically Informed Text Compression with Recurrent Neural Networks

We present a self-contained system for constructing natural language models for use in text compression. Our system improves upon previous neural network based models by utilizing recent advances in syntactic parsing -- Google's SyntaxNet -- to augment character-level recurrent neural networks. RNNs have proven exceptional in modeling sequence data such as text, as their architecture allows for modeling of long-term contextual information.
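The link between a language model and compression can be shown with a small sketch: an ideal arithmetic coder spends about -log2(p) bits on a symbol the model predicted with probability p, so better predictions mean fewer bits. The adaptive order-0 character model below is an assumed stand-in chosen for brevity, not the syntactically informed RNN described in the abstract, and the function name is hypothetical.

```python
import math
from collections import Counter

def ideal_code_length_bits(text, alphabet_size=256):
    """Bits an ideal arithmetic coder would need under an adaptive model.

    The model is order-0 with add-one (Laplace) smoothing: each character's
    probability depends only on how often it has appeared so far.
    """
    counts = Counter()
    seen = 0
    bits = 0.0
    for ch in text:
        p = (counts[ch] + 1) / (seen + alphabet_size)
        bits += -math.log2(p)
        counts[ch] += 1   # update after coding, as a decoder could mirror
        seen += 1
    return bits

sample = "abracadabra " * 50
raw_bits = 8 * len(sample)
print(raw_bits, round(ideal_code_length_bits(sample)))
```

Swapping in a stronger predictor, such as a character-level RNN, changes only how p is computed; the coding cost formula stays the same.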

preprint, 2016, arXiv

Tensor Switching Networks

We present a novel neural network algorithm, the Tensor Switching (TS) network, which generalizes the Rectified Linear Unit (ReLU) nonlinearity to tensor-valued hidden units. The TS network copies its entire input vector to different locations in an expanded representation, with the location determined by its hidden unit activity. In this way, even a simple linear readout from the TS representation can implement a highly expressive deep-network-like function. The TS network hence avoids the vanishing gradient problem by construction, at the cost of larger representation size. We develop several methods to train the TS network, including equivalent kernels for infinitely wide and deep TS networks, a one-pass linear learning algorithm, and two backpropagation-inspired representation learning algorithms. Our experimental results demonstrate that the TS network is indeed more expressive and consistently learns faster than standard ReLU networks.
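One way to read the description of the expanded representation is sketched below for a single layer. The construction used here (each hidden unit's slot holds a copy of the input if the unit is active and zeros otherwise) is an assumption drawn from the abstract's wording, and all names and values are hypothetical. The sketch also checks that a standard ReLU layer is one particular linear readout of this representation.

```python
def ts_representation(W, x):
    """Expanded representation: one slot per hidden unit.

    A slot contains a copy of x when the unit's pre-activation is
    positive and a zero vector otherwise.
    """
    Z = []
    for w_row in W:
        pre = sum(w * xi for w, xi in zip(w_row, x))
        gate = 1.0 if pre > 0 else 0.0
        Z.append([gate * xi for xi in x])
    return Z

def relu_layer(W, x):
    """Ordinary ReLU hidden layer, for comparison."""
    return [max(0.0, sum(w * xi for w, xi in zip(w_row, x))) for w_row in W]

def linear_readout(V, Z):
    """Per-slot linear readout; with V equal to W it reproduces ReLU."""
    return [sum(v * z for v, z in zip(v_row, z_row)) for v_row, z_row in zip(V, Z)]

# Hypothetical weights (3 hidden units, 2 inputs) and input vector.
W = [[1.0, -2.0], [0.5, 0.5], [-1.0, 0.0]]
x = [2.0, 0.5]

Z = ts_representation(W, x)
print(Z)
print(linear_readout(W, Z), relu_layer(W, x))
```

Because the readout weights need not equal W, a linear map on Z can express functions that the scalar ReLU outputs cannot, at the cost of a representation that is larger by a factor of the input dimension.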

preprint, 2016, arXiv

Unsupervised Learning of Visual Structure using Predictive Generative Networks

The ability to predict future states of the environment is a central pillar of intelligence. At its core, effective prediction requires an internal model of the world and an understanding of the rules by which the world changes. Here, we explore the internal models developed by deep neural networks trained using a loss based on predicting future frames in synthetic video sequences, using a CNN-LSTM-deCNN framework. We first show that this architecture can achieve excellent performance in visual sequence prediction tasks, including state-of-the-art performance in a standard 'bouncing balls' dataset (Sutskever et al., 2009). Using a weighted mean-squared error and adversarial loss (Goodfellow et al., 2014), the same architecture successfully extrapolates out-of-the-plane rotations of computer-generated faces. Furthermore, despite being trained end-to-end to predict only pixel-level information, our Predictive Generative Networks learn a representation of the latent structure of the underlying three-dimensional objects themselves. Importantly, we find that this representation is naturally tolerant to object transformations, and generalizes well to new tasks, such as classification of static images. Similar models trained solely with a reconstruction loss fail to generalize as effectively. We argue that prediction can serve as a powerful unsupervised loss for learning rich internal representations of high-level object features.

preprint, 2012, arXiv

A study of singularities on rational curves via syzygies

Consider a rational projective curve C of degree d over an algebraically closed field k. There are n homogeneous forms g_1,...,g_n of degree d in B=k[x,y] which parameterize C in a birational, base point free, manner. We study the singularities of C by studying a Hilbert-Burch matrix phi for the row vector [g_1,...,g_n]. In the "General Lemma" we use the generalized row ideals of phi to identify the singular points on C, their multiplicities, the number of branches at each singular point, and the multiplicity of each branch. Let p be a singular point on the parameterized planar curve C which corresponds to a generalized zero of phi. In the "Triple Lemma" we give a matrix phi' whose maximal minors parameterize the closure, in projective 2-space, of the blow-up at p of C in a neighborhood of p. We apply the General Lemma to phi' in order to learn about the singularities of C in the first neighborhood of p. If C has even degree d=2c and the multiplicity of C at p is equal to c, then we apply the Triple Lemma again to learn about the singularities of C in the second neighborhood of p. Consider rational plane curves C of even degree d=2c. We classify curves according to the configuration of multiplicity c singularities on or infinitely near C. There are 7 possible configurations of such singularities. We classify the Hilbert-Burch matrix which corresponds to each configuration. The study of multiplicity c singularities on, or infinitely near, a fixed rational plane curve C of degree 2c is equivalent to the study of the scheme of generalized zeros of the fixed balanced Hilbert-Burch matrix phi for a parameterization of C.

preprint, 2006, arXiv

Secant varieties of toric varieties

Let $X_P$ be a smooth projective toric variety of dimension $n$ embedded in $\mathbb{P}^r$ using all of the lattice points of the polytope $P$. We compute the dimension and degree of the secant variety $\operatorname{Sec} X_P$. We also give explicit formulas in dimensions 2 and 3 and obtain partial results for the projective varieties $X_A$ embedded using a set of lattice points $A \subset P\cap\mathbb{Z}^n$ containing the vertices of $P$ and their nearest neighbors.

preprint, 2005, arXiv

A case study in bigraded commutative algebra

We study the commutative algebra of three bihomogeneous polynomials p_0,p_1,p_2 of degree (2,1) in variables x,y;z,w, assuming that they never vanish simultaneously on P^1 x P^1. Unlike the situation for P^2, the Koszul complex of the p_i is never exact. The purpose of this article is to illustrate how bigraded commutative algebra differs from the classical graded case and to indicate some of the theoretical tools needed to understand the free resolution of the ideal generated by p_0,p_1,p_2.

