Source author record

Jacob Goldberger

Jacob Goldberger appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Computation and Language, Machine Learning, Sound, eess.AS, Artificial Intelligence, Information Theory, math.IT, Neural and Evolutionary Computing

Catalog footprint

What is connected

9 works

8 topics

4 close collaborators


Preprint, 2022, arXiv

Long Context Question Answering via Supervised Contrastive Learning

Long-context question answering (QA) tasks require reasoning over a long document or multiple documents. Addressing these tasks often benefits from identifying a set of evidence spans (e.g., sentences), which provide supporting evidence for answering the question. In this work, we propose a novel method for equipping long-context QA models with an additional sequence-level objective for better identification of the supporting evidence. We achieve this via an additional contrastive supervision signal in finetuning, where the model is encouraged to explicitly discriminate supporting evidence sentences from negative ones by maximizing question-evidence similarity. The proposed additional loss exhibits consistent improvements on three different strong long-context transformer models, across two challenging question answering benchmarks -- HotpotQA and QAsper.
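The contrastive signal described above can be illustrated with a minimal sketch. The abstract does not give the exact loss, so this assumes an InfoNCE-style formulation in which dot-product similarity between a question vector and each sentence vector is turned into a softmax, and the evidence sentences are the positives. The function name `contrastive_evidence_loss`, the `temperature` parameter, and the use of plain lists as embeddings are all illustrative assumptions, not the authors' implementation.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_evidence_loss(question, sentences, evidence_ids, temperature=1.0):
    """InfoNCE-style loss (an assumed form, not taken from the paper).

    question:     question embedding (list of floats)
    sentences:    one embedding per candidate sentence
    evidence_ids: indices of the supporting-evidence sentences (positives)
    """
    # Question-sentence similarity, scaled by the temperature.
    scores = [dot(question, s) / temperature for s in sentences]
    # Numerically stable log-sum-exp over all candidate sentences.
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    # Average negative log-probability assigned to the evidence sentences.
    return -sum(scores[i] - log_z for i in evidence_ids) / len(evidence_ids)


question = [1.0, 0.0]
sentences = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
loss_aligned = contrastive_evidence_loss(question, sentences, [0])
loss_misaligned = contrastive_evidence_loss(question, sentences, [2])
```

In this toy case the loss is smaller when the labeled evidence sentence is the one most similar to the question, which is the behavior a question-evidence similarity objective is meant to encourage.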

Preprint, 2022, arXiv

Proposition-Level Clustering for Multi-Document Summarization

Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means of coping with considerable information repetition. In particular, clusters were leveraged to indicate information saliency as well as to avoid redundancy. Such prior methods focused on clustering sentences, even though closely related sentences usually also contain non-aligned parts. In this work, we revisit the clustering approach, grouping together sub-sentential propositions, aiming at more precise information alignment. Specifically, our method detects salient propositions, clusters them into paraphrastic clusters, and generates a representative sentence for each cluster via text fusion. Our summarization method improves over the previous state-of-the-art MDS method on the DUC 2004 and TAC 2011 datasets, both in automatic ROUGE scores and human preference.

Preprint, 2021, arXiv

perm2vec: Graph Permutation Selection for Decoding of Error Correction Codes using Self-Attention

Error correction codes are an integral part of communication applications, boosting the reliability of transmission. The optimal decoding of transmitted codewords is the maximum likelihood rule, which is NP-hard due to the curse of dimensionality. For practical realizations, sub-optimal decoding algorithms are employed; yet limited theoretical insights prevent one from exploiting the full potential of these algorithms. One such insight is the choice of permutation in permutation decoding. We present a data-driven framework for permutation selection, combining domain knowledge with machine learning concepts such as node embedding and self-attention. Significant and consistent improvements in the bit error rate are introduced for all simulated codes, over the baseline decoders. To the best of the authors' knowledge, this work is the first to leverage the benefits of the neural Transformer networks in physical layer communication systems.

Preprint, 2021, arXiv

Speech enhancement with mixture-of-deep-experts with clean clustering pre-training

In this study we present a mixture of deep experts (MoDE) neural-network architecture for single-microphone speech enhancement. Our architecture comprises a set of deep neural networks (DNNs), each of which is an 'expert' in a different speech spectral pattern such as a phoneme. A gating DNN is responsible for the latent variables, which are the weights assigned to each expert's output given a speech segment. The experts estimate a mask from the noisy input, and the final mask is then obtained as a weighted average of the experts' estimates, with the weights determined by the gating DNN. A soft spectral attenuation, based on the estimated mask, is then applied to enhance the noisy speech signal. As a byproduct, we gain a reduction in complexity at test time. We show that expert specialization allows better robustness to unfamiliar noise types.
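The mask combination step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the expert masks and gate weights are given as plain lists (in the paper they come from DNNs), the gate weights are normalized to sum to one, and the "soft spectral attenuation" is approximated by multiplying each noisy magnitude by its mask value. The names `combine_expert_masks` and `apply_mask` are hypothetical.

```python
def combine_expert_masks(expert_masks, gate_weights):
    """Weighted average of per-expert masks.

    expert_masks: one mask per expert, each a list with one value per frequency bin
    gate_weights: one non-negative weight per expert (assumed gating output)
    """
    total = sum(gate_weights)
    weights = [w / total for w in gate_weights]  # normalize so weights sum to 1
    n_bins = len(expert_masks[0])
    return [
        sum(w * mask[k] for w, mask in zip(weights, expert_masks))
        for k in range(n_bins)
    ]


def apply_mask(noisy_magnitudes, mask):
    """Attenuate each noisy spectral magnitude by its mask value (assumed form)."""
    return [m * x for m, x in zip(mask, noisy_magnitudes)]


expert_masks = [[1.0, 0.0], [0.0, 1.0]]  # two experts, two frequency bins
gate_weights = [0.75, 0.25]              # gating output for this segment
final_mask = combine_expert_masks(expert_masks, gate_weights)
enhanced = apply_mask([2.0, 4.0], final_mask)
```

With these toy inputs the final mask leans toward the first expert, since the gate assigns it three times the weight of the second.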

Preprint, 2020, arXiv

FCN Approach for Dynamically Locating Multiple Speakers

In this paper, we present a deep neural network-based online multi-speaker localisation algorithm. Following the W-disjoint orthogonality principle in the spectral domain, each time-frequency (TF) bin is dominated by a single speaker, and hence by a single direction of arrival (DOA). A fully convolutional network is trained with instantaneous spatial features to estimate the DOA for each TF bin. The high-resolution classification enables the network to accurately and simultaneously localize and track multiple speakers, both static and dynamic. An elaborate experimental study using both simulated and real-life recordings in static and dynamic scenarios confirms that the proposed algorithm outperforms both classic and recent deep-learning-based algorithms.

Preprint, 2016, arXiv

A Semisupervised Approach for Language Identification based on Ladder Networks

In this study we address the problem of training a neural network for language identification using both labeled and unlabeled speech samples in the form of i-vectors. We propose a neural network architecture that can also handle out-of-set languages. We utilize a modified version of the recently proposed Ladder Network semisupervised training procedure that optimizes the reconstruction costs of a stack of denoising autoencoders. We show that this approach can be successfully applied to the case where the training dataset is composed of both labeled and unlabeled acoustic data. The results show enhanced language identification on the NIST 2015 language identification dataset.

Preprint, 2016, arXiv

Domain Adaptation For Formant Estimation Using Deep Learning

In this paper we present a domain adaptation technique for formant estimation using a deep network. We first train a deep learning network on a small read speech dataset. We then freeze the parameters of the trained network and use several different datasets to train an adaptation layer that makes the obtained network universal in the sense that it works well for a variety of speakers and speech domains with very different characteristics. We evaluated our adapted network on three datasets, each of which has different speaker characteristics and speech styles. The performance of our method compares favorably with alternative methods for formant estimation.

Preprint, 2016, arXiv

PMI Matrix Approximations with Applications to Neural Language Modeling

The negative sampling (NEG) objective function, used in word2vec, is a simplification of the Noise Contrastive Estimation (NCE) method. NEG was found to be highly effective in learning continuous word representations. However, unlike NCE, it was considered inapplicable for the purpose of learning the parameters of a language model. In this study, we refute this assertion by providing a principled derivation for NEG-based language modeling, founded on a novel analysis of a low-dimensional approximation of the matrix of pointwise mutual information between the contexts and the predicted words. The obtained language modeling is closely related to NCE language models but is based on a simplified objective function. We thus provide a unified formulation for two main language processing tasks, namely word embedding and language modeling, based on the NEG objective function. Experimental results on two popular language modeling benchmarks show comparable perplexity results, with a small advantage to NEG over NCE.
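For readers unfamiliar with the matrix the abstract refers to, the following is a minimal sketch of computing pointwise mutual information, PMI(w, c) = log(p(w, c) / (p(w) p(c))), from a table of context-word co-occurrence counts. It illustrates the PMI matrix only, not the paper's low-dimensional approximation or its NEG-based language model. The function name `pmi_matrix` and the convention of returning negative infinity for unseen pairs are assumptions made for this example.

```python
import math


def pmi_matrix(counts):
    """Compute PMI for every (row, column) pair of a co-occurrence count table.

    counts[i][j] is how often context i co-occurred with word j.
    Pairs never observed get -inf (an assumed convention for this sketch).
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    pmi = []
    for i, row in enumerate(counts):
        pmi.append([
            # p(i, j) / (p(i) p(j)) simplifies to n * total / (row_sum * col_sum).
            math.log(n * total / (row_sums[i] * col_sums[j])) if n > 0 else float("-inf")
            for j, n in enumerate(row)
        ])
    return pmi


independent = pmi_matrix([[1, 1], [1, 1]])  # no association between rows and columns
associated = pmi_matrix([[2, 0], [0, 2]])   # each context predicts exactly one word
```

When contexts and words are independent, every PMI entry is zero. When each context always predicts the same word, the observed pairs receive positive PMI.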

Preprint, 2015, arXiv

A Hybrid Approach for Speech Enhancement Using MoG Model and Neural Network Phoneme Classifier

In this paper we present a single-microphone speech enhancement algorithm. A hybrid approach is proposed, merging the generative mixture of Gaussians (MoG) model and the discriminative neural network (NN). The proposed algorithm is executed in two phases: the training phase, which does not recur, and the test phase. First, the noise-free speech power spectral density (PSD) is modeled as a MoG, representing the phoneme-based diversity in the speech signal. An NN is then trained on a phoneme-labeled database for phoneme classification, with mel-frequency cepstral coefficients (MFCC) as the input features. Given the phoneme classification results, a speech presence probability (SPP) is obtained using both the generative and discriminative models. Soft spectral subtraction is then executed while the noise estimate is simultaneously updated. The discriminative NN maintains the continuity of the speech, and the generative phoneme-based MoG preserves the speech spectral structure. An extensive experimental study using real speech and noise signals is provided. We also compare the proposed algorithm with alternative speech enhancement algorithms. We show that we obtain a significant improvement over previous methods in terms of both speech quality measures and speech recognition results.
