Source author record

Qingqing Huang

Qingqing Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computation and Language eess.AS Sound Artificial Intelligence Systems and Control

Catalog footprint

What is connected

8works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2023arXiv

MAQA: A Multimodal QA Benchmark for Negation

Multimodal learning can benefit from the representation power of pretrained Large Language Models (LLMs). However, state-of-the-art transformer based LLMs often ignore negations in natural language and there is no existing benchmark to quantitatively evaluate whether multimodal transformers inherit this weakness. In this study, we present a new multimodal question answering (QA) benchmark adapted from labeled music videos in AudioSet (Gemmeke et al., 2017) with the goal of systematically evaluating if multimodal transformers can perform complex reasoning to recognize new concepts as negation of previously learned concepts. We show that with standard fine-tuning approach multimodal transformers are still incapable of correctly interpreting negation irrespective of model size. However, our experiments demonstrate that augmenting the original training task distributions with negated QA examples allow the model to reliably reason with negation. To do this, we describe a novel data generation procedure that prompts the 540B-parameter PaLM model to automatically generate negated QA examples as compositions of easily accessible video tags. The generated examples contain more natural linguistic patterns and the gains compared to template-based task augmentation approach are significant.

preprint2022arXiv

MuLan: A Joint Embedding of Music Audio and Natural Language

Music tagging and content-based retrieval systems have traditionally been constructed using pre-defined ontologies covering a rigid set of music attributes or text queries. This paper presents MuLan: a first attempt at a new generation of acoustic models that link music audio directly to unconstrained natural language music descriptions. MuLan takes the form of a two-tower, joint audio-text embedding model trained using 44 million music recordings (370K hours) and weakly-associated, free-form text annotations. Through its compatibility with a wide range of music genres and text styles (including conventional music tags), the resulting audio-text representation subsumes existing ontologies while graduating to true zero-shot functionalities. We demonstrate the versatility of the MuLan embeddings with a range of experiments including transfer learning, zero-shot music tagging, language understanding in the music domain, and cross-modal retrieval applications.

preprint2022arXiv

Text-Driven Separation of Arbitrary Sounds

We propose a method of separating a desired sound source from a single-channel mixture, based on either a textual description or a short audio sample of the target source. This is achieved by combining two distinct models. The first model, SoundWords, is trained to jointly embed both an audio clip and its textual description to the same embedding in a shared representation. The second model, SoundFilter, takes a mixed source audio clip as an input and separates it based on a conditioning vector from the shared text-audio representation defined by SoundWords, making the model agnostic to the conditioning modality. Evaluating on multiple datasets, we show that our approach can achieve an SI-SDR of 9.1 dB for mixtures of two arbitrary sounds when conditioned on text and 10.1 dB when conditioned on audio. We also show that SoundWords is effective at learning co-embeddings and that our multi-modal training approach improves the performance of SoundFilter.

preprint2020arXiv

Superbloom: Bloom filter meets Transformer

We extend the idea of word pieces in natural language models to machine learning tasks on opaque ids. This is achieved by applying hash functions to map each id to multiple hash tokens in a much smaller space, similarly to a Bloom filter. We show that by applying a multi-layer Transformer to these Bloom filter digests, we are able to obtain models with high accuracy. They outperform models of a similar size without hashing and, to a large degree, models of a much larger size trained using sampled softmax with the same computational budget. Our key observation is that it is important to use a multi-layer Transformer for Bloom filter digests to remove ambiguity in the hashed input. We believe this provides an alternative method to solving problems with large vocabulary size.

preprint2015arXiv

Learning Mixtures of Gaussians in High Dimensions

Efficiently learning mixture of Gaussians is a fundamental problem in statistics and learning theory. Given samples coming from a random one out of k Gaussian distributions in Rn, the learning problem asks to estimate the means and the covariance matrices of these Gaussians. This learning problem arises in many areas ranging from the natural sciences to the social sciences, and has also found many machine learning applications. Unfortunately, learning mixture of Gaussians is an information theoretically hard problem: in order to learn the parameters up to a reasonable accuracy, the number of samples required is exponential in the number of Gaussian components in the worst case. In this work, we show that provided we are in high enough dimensions, the class of Gaussian mixtures is learnable in its most general form under a smoothed analysis framework, where the parameters are randomly perturbed from an adversarial starting point. In particular, given samples from a mixture of Gaussians with randomly perturbed parameters, when n > Ω(k^2), we give an algorithm that learns the parameters with polynomial running time and using polynomial number of samples. The central algorithmic ideas consist of new ways to decompose the moment tensor of the Gaussian mixture by exploiting its structural properties. The symmetries of this tensor are derived from the combinatorial structure of higher order moments of Gaussian distributions (sometimes referred to as Isserlis' theorem or Wick's theorem). We also develop new tools for bounding smallest singular values of structured random matrices, which could be useful in other smoothed analysis settings.

preprint2015arXiv

Minimal Realization Problems for Hidden Markov Models

Consider a stationary discrete random process with alphabet size d, which is assumed to be the output process of an unknown stationary Hidden Markov Model (HMM). Given the joint probabilities of finite length strings of the process, we are interested in finding a finite state generative model to describe the entire process. In particular, we focus on two classes of models: HMMs and quasi-HMMs, which is a strictly larger class of models containing HMMs. In the main theorem, we show that if the random process is generated by an HMM of order less or equal than k, and whose transition and observation probability matrix are in general position, namely almost everywhere on the parameter space, both the minimal quasi-HMM realization and the minimal HMM realization can be efficiently computed based on the joint probabilities of all the length N strings, for N > 4 lceil log_d(k) rceil +1. In this paper, we also aim to compare and connect the two lines of literature: realization theory of HMMs, and the recent development in learning latent variable models with tensor decomposition techniques.

preprint2015arXiv

Super-Resolution Off the Grid

Super-resolution is the problem of recovering a superposition of point sources using bandlimited measurements, which may be corrupted with noise. This signal processing problem arises in numerous imaging problems, ranging from astronomy to biology to spectroscopy, where it is common to take (coarse) Fourier measurements of an object. Of particular interest is in obtaining estimation procedures which are robust to noise, with the following desirable statistical and computational properties: we seek to use coarse Fourier measurements (bounded by some cutoff frequency); we hope to take a (quantifiably) small number of measurements; we desire our algorithm to run quickly. Suppose we have k point sources in d dimensions, where the points are separated by at least Δfrom each other (in Euclidean distance). This work provides an algorithm with the following favorable guarantees: - The algorithm uses Fourier measurements, whose frequencies are bounded by O(1/Δ) (up to log factors). Previous algorithms require a cutoff frequency which may be as large as Ω( d/Δ). - The number of measurements taken by and the computational complexity of our algorithm are bounded by a polynomial in both the number of points k and the dimension d, with no dependence on the separation Δ. In contrast, previous algorithms depended inverse polynomially on the minimal separation and exponentially on the dimension for both of these quantities. Our estimation procedure itself is simple: we take random bandlimited measurements (as opposed to taking an exponential number of measurements on the hyper-grid). Furthermore, our analysis and algorithm are elementary (based on concentration bounds for sampling and the singular value decomposition).

preprint2013arXiv

Efficiency-Risk Tradeoffs in Dynamic Oligopoly Markets - with application to electricity markets

In this paper, we examine in an abstract framework, how a tradeoff between efficiency and robustness arises in different dynamic oligopolistic market architectures. We consider a market in which there is a monopolistic resource provider and agents that enter and exit the market following a random process. Self-interested and fully rational agents dynamically update their resource consumption decisions over a finite time horizon, under the constraint that the total resource consumption requirements are met before each individual's deadline. We then compare the statistics of the stationary aggregate demand processes induced by the non-cooperative and cooperative load scheduling schemes. We show that although the non-cooperative load scheduling scheme leads to an efficiency loss - widely known as the "price of anarchy" - the stationary distribution of the corresponding aggregate demand process has a smaller tail. This tail, which corresponds to rare and undesirable demand spikes, is important in many applications of interest. On the other hand, when the agents can cooperate with each other in optimizing their total cost, a higher market efficiency is achieved at the cost of a higher probability of demand spikes. We thus posit that the origins of endogenous risk in such systems may lie in the market architecture, which is an inherent characteristic of the system.

Qingqing Huang

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

MAQA: A Multimodal QA Benchmark for Negation

MuLan: A Joint Embedding of Music Audio and Natural Language

Text-Driven Separation of Arbitrary Sounds

Superbloom: Bloom filter meets Transformer

Learning Mixtures of Gaussians in High Dimensions

Minimal Realization Problems for Hidden Markov Models

Super-Resolution Off the Grid

Efficiency-Risk Tradeoffs in Dynamic Oligopoly Markets - with application to electricity markets