Source author record

Nitish Shirish Keskar

Nitish Shirish Keskar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language Machine Learning math.OC Artificial Intelligence Biomolecules cs.CY Information Retrieval Information Theory math.IT

Catalog footprint

What is connected

8works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Generating Negative Samples for Sequential Recommendation

To make Sequential Recommendation (SR) successful, recent works focus on designing effective sequential encoders, fusing side information, and mining extra positive self-supervision signals. The strategy of sampling negative items at each time step is less explored. Due to the dynamics of users' interests and model updates during training, considering randomly sampled items from a user's non-interacted item set as negatives can be uninformative. As a result, the model will inaccurately learn user preferences toward items. Identifying informative negatives is challenging because informative negative items are tied with both dynamically changed interests and model parameters (and sampling process should also be efficient). To this end, we propose to Generate Negative Samples (items) for SR (GenNi). A negative item is sampled at each time step based on the current SR model's learned user preferences toward items. An efficient implementation is proposed to further accelerate the generation process, making it scalable to large-scale recommendation tasks. Extensive experiments on four public datasets verify the importance of providing high-quality negative samples for SR and demonstrate the effectiveness and efficiency of GenNi.

preprint2022arXiv

Modeling Multi-hop Question Answering as Single Sequence Prediction

Fusion-in-decoder (Fid) (Izacard and Grave, 2020) is a generative question answering (QA) model that leverages passage retrieval with a pre-trained transformer and pushed the state of the art on single-hop QA. However, the complexity of multi-hop QA hinders the effectiveness of the generative QA approach. In this work, we propose a simple generative approach (PathFid) that extends the task beyond just answer generation by explicitly modeling the reasoning process to resolve the answer for multi-hop questions. By linearizing the hierarchical reasoning path of supporting passages, their key sentences, and finally the factoid answer, we cast the problem as a single sequence prediction task. To facilitate complex reasoning with multiple clues, we further extend the unified flat representation of multiple input documents by encoding cross-passage interactions. Our extensive experiments demonstrate that PathFid leads to strong performance gains on two multi-hop QA datasets: HotpotQA and IIRC. Besides the performance gains, PathFid is more interpretable, which in turn yields answers that are more faithfully grounded to the supporting passages and facts compared to the baseline Fid model.

preprint2021arXiv

Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity

Neural text decoding is important for generating high-quality texts using language models. To generate high-quality text, popular decoding algorithms like top-k, top-p (nucleus), and temperature-based sampling truncate or distort the unreliable low probability tail of the language model. Though these methods generate high-quality text after parameter tuning, they are ad hoc. Not much is known about the control they provide over the statistics of the output, which is important since recent reports show text quality is highest for a specific range of likelihoods. Here, first we provide a theoretical analysis of perplexity in top-k, top-p, and temperature sampling, finding that cross-entropy behaves approximately linearly as a function of p in top-p sampling whereas it is a nonlinear function of k in top-k sampling, under Zipfian statistics. We use this analysis to design a feedback-based adaptive top-k text decoding algorithm called mirostat that generates text (of any length) with a predetermined value of perplexity, and thereby high-quality text without any tuning. Experiments show that for low values of k and p in top-k and top-p sampling, perplexity drops significantly with generated text length, which is also correlated with excessive repetitions in the text (the boredom trap). On the other hand, for large values of k and p, we find that perplexity increases with generated text length, which is correlated with incoherence in the text (confusion trap). Mirostat avoids both traps: experiments show that cross-entropy has a near-linear relation with repetition in generated text. This relation is almost independent of the sampling method but slightly dependent on the model used. Hence, for a given language model, control over perplexity also gives control over repetitions. Experiments with human raters for fluency, coherence, and quality further verify our findings.

preprint2020arXiv

Limits of Detecting Text Generated by Large-Scale Language Models

Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns. Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated. We show that error exponents for particular language models are bounded in terms of their perplexity, a standard measure of language generation performance. Under the assumption that human language is stationary and ergodic, the formulation is extended from considering specific language models to considering maximum likelihood language models, among the class of k-order Markov approximations; error probabilities are characterized. Some discussion of incorporating semantic side information is also given.

preprint2020arXiv

ProGen: Language Modeling for Protein Generation

Generative modeling for protein engineering is key to solving fundamental problems in synthetic biology, medicine, and material science. We pose protein engineering as an unsupervised sequence generation problem in order to leverage the exponentially growing set of proteins that lack costly, structural annotations. We train a 1.2B-parameter language model, ProGen, on ~280M protein sequences conditioned on taxonomic and keyword tags such as molecular function and cellular component. This provides ProGen with an unprecedented range of evolutionary sequence diversity and allows it to generate with fine-grained control as demonstrated by metrics based on primary sequence similarity, secondary structure accuracy, and conformational energy.

preprint2016arXiv

A Limited-Memory Quasi-Newton Algorithm for Bound-Constrained Nonsmooth Optimization

We consider the problem of minimizing a continuous function that may be nonsmooth and nonconvex, subject to bound constraints. We propose an algorithm that uses the L-BFGS quasi-Newton approximation of the problem's curvature together with a variant of the weak Wolfe line search. The key ingredient of the method is an active-set selection strategy that defines the subspace in which search directions are computed. To overcome the inherent shortsightedness of the gradient for a nonsmooth function, we propose two strategies. The first relies on an approximation of the $ε$-minimum norm subgradient, and the second uses an iterative corrective loop that augments the active set based on the resulting search directions. We describe a Python implementation of the proposed algorithm and present numerical results on a set of standard test problems to illustrate the efficacy of our approach.

preprint2016arXiv

adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs

Recurrent Neural Networks (RNNs) are powerful models that achieve exceptional performance on several pattern recognition problems. However, the training of RNNs is a computationally difficult task owing to the well-known "vanishing/exploding" gradient problem. Algorithms proposed for training RNNs either exploit no (or limited) curvature information and have cheap per-iteration complexity, or attempt to gain significant curvature information at the cost of increased per-iteration cost. The former set includes diagonally-scaled first-order methods such as ADAGRAD and ADAM, while the latter consists of second-order algorithms like Hessian-Free Newton and K-FAC. In this paper, we present adaQN, a stochastic quasi-Newton algorithm for training RNNs. Our approach retains a low per-iteration cost while allowing for non-diagonal scaling through a stochastic L-BFGS updating scheme. The method uses a novel L-BFGS scaling initialization scheme and is judicious in storing and retaining L-BFGS curvature pairs. We present numerical experiments on two language modeling tasks and show that adaQN is competitive with popular RNN training algorithms.

preprint2015arXiv

A Second-Order Method for Convex $\ell_1$-Regularized Optimization with Active Set Prediction

We describe an active-set method for the minimization of an objective function $ϕ$ that is the sum of a smooth convex function and an $\ell_1$-regularization term. A distinctive feature of the method is the way in which active-set identification and {second-order} subspace minimization steps are integrated to combine the predictive power of the two approaches. At every iteration, the algorithm selects a candidate set of free and fixed variables, performs an (inexact) subspace phase, and then assesses the quality of the new active set. If it is not judged to be acceptable, then the set of free variables is restricted and a new active-set prediction is made. We establish global convergence for our approach, and compare the new method against the state-of-the-art code LIBLINEAR.

Nitish Shirish Keskar

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

Generating Negative Samples for Sequential Recommendation

Modeling Multi-hop Question Answering as Single Sequence Prediction

Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity

Limits of Detecting Text Generated by Large-Scale Language Models

ProGen: Language Modeling for Protein Generation

A Limited-Memory Quasi-Newton Algorithm for Bound-Constrained Nonsmooth Optimization

adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs

A Second-Order Method for Convex $\ell_1$-Regularized Optimization with Active Set Prediction