Source author record

Nikhil Mehta

Nikhil Mehta appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Machine Learning, astro-ph.IM, Artificial Intelligence, astro-ph.GA, astro-ph.HE, Computation and Language, Computer Vision, Information Retrieval

Catalog footprint: 7 works, 8 topics, 4 close collaborators


arXiv preprint, 2026

ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging

Despite the rapid advancements in large language model (LLM) development, fine-tuning them for specific tasks often results in the catastrophic forgetting of their general, language-based reasoning abilities. This work investigates and addresses this challenge in the context of the Generative Retrieval (GenRetrieval) task. During GenRetrieval fine-tuning, we find this forgetting occurs rapidly and correlates with the distance between the fine-tuned and original model parameters. Given these observations, we propose ORBIT, a novel approach that actively tracks the distance between fine-tuned and initial model weights, and uses a weight averaging strategy to constrain model drift during GenRetrieval fine-tuning when this inter-model distance exceeds a maximum threshold. Our results show that ORBIT retains substantial text and retrieval performance by outperforming both common continual learning baselines and related regularization methods that also employ weight averaging.
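The abstract describes the drift constraint only at a high level. The following is a minimal, hypothetical Python sketch of distance-triggered weight averaging, not the authors' ORBIT implementation. It assumes parameters flattened into a plain list, Euclidean (L2) distance as the drift measure, and a fixed interpolation coefficient `alpha` toward the original weights; the names `constrain_drift`, `max_dist` and `alpha` are illustrative.

```python
import math
from typing import List

# Hypothetical sketch: parameters are flat lists of floats for illustration only.


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two flattened parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def constrain_drift(current: List[float], origin: List[float],
                    max_dist: float, alpha: float = 0.5) -> List[float]:
    """Return weights unchanged if they are within `max_dist` of `origin`.

    Otherwise average them with the origin weights:
        alpha * current + (1 - alpha) * origin
    which scales the drift vector (current - origin) by `alpha`.
    """
    if l2_distance(current, origin) <= max_dist:
        return list(current)
    return [alpha * c + (1.0 - alpha) * o for c, o in zip(current, origin)]
```

In this sketch `constrain_drift` would be called once after each optimizer step. A single averaging step shrinks the distance to the origin by the factor `alpha` but does not by itself guarantee that the result falls back under `max_dist`.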

arXiv preprint, 2021

Counterfactual Representation Learning with Balancing Weights

A key to causal inference with observational data is achieving balance in predictive features associated with each treatment type. Recent literature has explored representation learning to achieve this goal. In this work, we discuss the pitfalls of these strategies - such as a steep trade-off between achieving balance and predictive power - and present a remedy via the integration of balancing weights in causal learning. Specifically, we theoretically link balance to the quality of propensity estimation, emphasize the importance of identifying a proper target population, and elaborate on the complementary roles of feature balancing and weight adjustments. Using these concepts, we then develop an algorithm for flexible, scalable and accurate estimation of causal effects. Finally, we show how the learned weighted representations may serve to facilitate alternative causal learning procedures with appealing statistical features. We conduct an extensive set of experiments on both synthetic examples and standard benchmarks, and report encouraging results relative to state-of-the-art baselines.

arXiv preprint, 2021

Meta-Learned Attribute Self-Gating for Continual Generalized Zero-Shot Learning

Zero-shot learning (ZSL) has been shown to be a promising approach to generalizing a model to categories unseen during training by leveraging class attributes, but challenges still remain. Recently, methods using generative models to combat bias towards classes seen during training have pushed the state of the art of ZSL, but these generative models can be slow or computationally expensive to train. Additionally, while many previous ZSL methods assume a one-time adaptation to unseen classes, in reality, the world is always changing, necessitating a constant adjustment for deployed models. Models unprepared to handle a sequential stream of data are likely to experience catastrophic forgetting. We propose a meta-continual zero-shot learning (MCZSL) approach to address both these issues. In particular, by pairing self-gating of attributes and scaled class normalization with meta-learning based training, we are able to outperform state-of-the-art results while being able to train our models substantially faster ($>100\times$) than expensive generative-based approaches. We demonstrate this by performing experiments on five standard ZSL datasets (CUB, aPY, AWA1, AWA2 and SUN) in both generalized zero-shot learning and generalized continual zero-shot learning settings.

arXiv preprint, 2020

Survival Cluster Analysis

Conventional survival analysis approaches estimate risk scores or individualized time-to-event distributions conditioned on covariates. In practice, there is often great population-level phenotypic heterogeneity, resulting from (unknown) subpopulations with diverse risk profiles or survival distributions. As a result, there is an unmet need in survival analysis for identifying subpopulations with distinct risk profiles, while jointly accounting for accurate individualized time-to-event predictions. An approach that addresses this need is likely to improve characterization of individual outcomes by leveraging regularities in subpopulations, thus accounting for population-level heterogeneity. In this paper, we propose a Bayesian nonparametrics approach that represents observations (subjects) in a clustered latent space, and encourages accurate time-to-event predictions and clusters (subpopulations) with distinct risk profiles. Experiments on real-world datasets show consistent improvements in predictive performance and interpretability relative to existing state-of-the-art survival analysis models.

arXiv preprint, 2020

WAFFLe: Weight Anonymized Factorization for Federated Learning

In domains where data are sensitive or private, there is great value in methods that can learn in a distributed manner without the data ever leaving the local devices. In light of this need, federated learning has emerged as a popular training paradigm. However, many federated learning approaches trade transmitting data for communicating updated weight parameters for each local device. Therefore, a successful breach that would have otherwise directly compromised the data instead grants whitebox access to the local model, which opens the door to a number of attacks, including exposing the very data federated learning seeks to protect. Additionally, in distributed scenarios, individual client devices commonly exhibit high statistical heterogeneity. Many common federated approaches learn a single global model; while this may do well on average, performance degrades when the i.i.d. assumption is violated, underfitting individuals further from the mean, and raising questions of fairness. To address these issues, we propose Weight Anonymized Factorization for Federated Learning (WAFFLe), an approach that combines the Indian Buffet Process with a shared dictionary of weight factors for neural networks. Experiments on MNIST, FashionMNIST, and CIFAR-10 demonstrate WAFFLe's significant improvement to local test performance and fairness while simultaneously providing an extra layer of security.

arXiv preprint, 2012

Multi-Band Feeds: A Design Study

Broadband antenna feeds are of particular interest to existing and future radio telescopes for multi-frequency studies of astronomical sources. Although a 1:15 range in frequency is difficult to achieve, the well-known Eleven feed design offers a relatively uniform response over such a range, and reasonably well-matched responses in E & H planes. However, given the severe Radio Frequency Interference in several bands over such wide spectral range, one desires to selectively reject the corresponding bands. With this view, we have explored the possibilities of having a multi-band feed antenna spanning a wide frequency range, but which would have good response only in a number of pre-selected (relatively) RFI-free windows (for a particular telescope-site). The designs we have investigated use the basic configuration of pairs of dipoles as in the Eleven feed, but use simple wire dipoles instead of folded dipoles used in the latter. From our study of the two designs we have investigated, we find that the design with feed-lines constructed using co-axial lines shows good rejection in the unwanted parts of the spectrum and control over the locations of resonant bands.

arXiv preprint, 2012

RRI-GBT Multi-Band Receiver: Motivation, Design & Development

We report the design and development of a self-contained multi-band receiver (MBR) system, intended for use with a single large aperture to facilitate sensitive & high time-resolution observations simultaneously in 10 discrete frequency bands sampling a wide spectral span (100-1500 MHz) in a nearly log-periodic fashion. The development of this system was primarily motivated by need for tomographic studies of pulsar polar emission regions. Although the system design is optimized for the primary goal, it is also suited for several other interesting astronomical investigations. The system consists of a dual-polarization multi-band feed (with discrete responses corresponding to the 10 bands pre-selected as relatively RFI-free), a common wide-band RF front-end, and independent back-end receiver chains for the 10 individual sub-bands. The raw voltage time-sequences corresponding to 16 MHz bandwidth each for the two linear polarization channels and the 10 bands, are recorded at the Nyquist rate simultaneously. We present the preliminary results from the tests and pulsar observations carried out with the Green Bank Telescope using this receiver. The system performance implied by these results, and possible improvements are also briefly discussed.
