Researcher profile

Sachin Goyal

Sachin Goyal contributes to research discovery and scholarly infrastructure.

Published work

4 published items

Preprint, 2026, arXiv

Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

Pretraining optimizers are tuned to produce the strongest possible base model, on the assumption that a stronger starting point yields a stronger model after subsequent changes like post-training and quantization. This overlooks the geometry of the base model, which controls how much of its capabilities survive subsequent parameter updates. We study three pretraining optimization approaches that bias optimization toward flatter minima: Sharpness-Aware Minimization (SAM), large learning rates, and shortened learning rate annealing periods. Across model sizes ranging from 20M to 150M parameters, we find that these interventions consistently improve downstream performance after post-training on five common datasets, with up to 80% less forgetting. These principles hold at scale: a short SAM mid-training phase applied to an existing OLMo-2-1B checkpoint reduces forgetting by 31% after MetaMath post-training and by 40% after 4-bit quantization.
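
The abstract names Sharpness-Aware Minimization but does not define it. The sketch below shows the standard SAM update (Foret et al., 2021) on a toy two-parameter loss, under stated assumptions. First it takes an ascent step of radius rho along the normalized gradient. Then it applies the gradient measured at that perturbed point to the original weights. The toy loss, `lr`, and `rho` are hypothetical illustration choices. This is not the pretraining setup used in the paper.

```python
import numpy as np

def loss(w):
    # Hypothetical toy loss: an anisotropic quadratic, sharp along w[0] and flat along w[1].
    return 0.5 * (10.0 * w[0] ** 2 + 0.1 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return np.array([10.0 * w[0], 0.1 * w[1]])

def sam_step(w, lr=0.05, rho=0.05, eps=1e-12):
    """One Sharpness-Aware Minimization update (lr and rho are illustrative values)."""
    g = grad(w)
    # Ascent step: approximate worst-case perturbation inside an L2 ball of radius rho.
    e = rho * g / (np.linalg.norm(g) + eps)
    # Descent step: gradient at the perturbed weights, applied to the original weights.
    return w - lr * grad(w + e)

w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w)
print(f"final loss: {loss(w):.4f}")
```

Plain gradient descent uses only `grad(w)`. SAM instead penalizes directions where the loss rises quickly within the rho-ball, and this is the sense in which it biases optimization toward flatter minima.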

Preprint, 2022, arXiv

MET: Masked Encoding for Tabular Data

We consider the task of self-supervised representation learning (SSL) for tabular data: tabular-SSL. Typical contrastive-learning-based SSL methods require instance-wise data augmentations, which are difficult to design for unstructured tabular data. Existing tabular-SSL methods design such augmentations in a relatively ad hoc fashion and can fail to capture the underlying data manifold. Instead of augmentation-based approaches for tabular-SSL, we propose a new reconstruction-based method, called Masked Encoding for Tabular Data (MET), that does not require augmentations. MET is based on the popular MAE approach for vision-SSL [He et al., 2021] and uses two key ideas: (i) since each coordinate in a tabular dataset has a distinct meaning, we need to use separate representations for all coordinates, and (ii) we use an adversarial reconstruction loss in addition to the standard one. Empirical results on five diverse tabular datasets show that MET achieves a new state of the art (SOTA) on all of these datasets and improves up to 9% over current SOTA methods. We shed more light on how MET works via experiments on carefully designed simple datasets.
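
Idea (i) and the MAE-style masked reconstruction can be sketched in NumPy, under loudly stated assumptions. Each coordinate of a row becomes its own token by concatenating the scalar value with a per-coordinate embedding. That concatenation scheme is an assumption made here for illustration. A random subset of coordinates is hidden, and the standard loss is measured only on the hidden coordinates. The helper names, the embedding size `k`, and `mask_ratio` are hypothetical. The encoder, the decoder, and the adversarial loss term (ii) are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def tokenize(x, pos_emb):
    """Turn one row of d features into d tokens, one per coordinate.

    Assumption: token i = [x_i, pos_emb[i]], so coordinate identity lives in pos_emb[i].
    """
    return np.concatenate([x[:, None], pos_emb], axis=1)  # shape (d, 1 + k)

def random_mask(d, mask_ratio, rng):
    """Split coordinate indices into (visible, masked) sets, MAE-style."""
    perm = rng.permutation(d)
    n_mask = int(round(mask_ratio * d))
    return np.sort(perm[n_mask:]), np.sort(perm[:n_mask])

def masked_mse(x, x_hat, masked_idx):
    """Standard reconstruction loss, computed only on the masked coordinates."""
    return float(np.mean((x[masked_idx] - x_hat[masked_idx]) ** 2))

d, k = 8, 4
pos_emb = rng.normal(size=(d, k))  # hypothetical: would be learned during training
x = rng.normal(size=d)
tokens = tokenize(x, pos_emb)
visible, masked = random_mask(d, mask_ratio=0.5, rng=rng)
encoder_input = tokens[visible]  # only visible tokens would be fed to the encoder
x_hat = np.zeros(d)  # placeholder standing in for a decoder's reconstruction
print(tokens.shape, encoder_input.shape, masked_mse(x, x_hat, masked))
```

Giving every coordinate its own token and embedding lets the model treat, say, "age" and "income" as distinct inputs, which is the motivation stated in idea (i).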

Preprint, 2020, arXiv

DROCC: Deep Robust One-Class Classification

Classical approaches for one-class problems, such as one-class SVM and isolation forest, require careful feature engineering when applied to structured domains like images. State-of-the-art methods aim to leverage deep learning to learn appropriate features via two main approaches. The first approach, based on predicting transformations (Golan & El-Yaniv, 2018; Hendrycks et al., 2019a), is successful in some domains but crucially depends on an appropriate domain-specific set of transformations that are hard to obtain in general. The second approach, minimizing a classical one-class loss on the learned final-layer representations, e.g., DeepSVDD (Ruff et al., 2018), suffers from the fundamental drawback of representation collapse. In this work, we propose Deep Robust One-Class Classification (DROCC), which is both applicable to most standard domains without requiring any side information and robust to representation collapse. DROCC is based on the assumption that the points from the class of interest lie on a well-sampled, locally linear, low-dimensional manifold. Empirical evaluation demonstrates that DROCC is highly effective in two different one-class problem settings and on a range of real-world datasets across different domains: tabular data, images (CIFAR and ImageNet), audio, and time series, offering up to a 20% increase in accuracy over the state of the art in anomaly detection. Code is available at https://github.com/microsoft/EdgeML.

Preprint, 2020, arXiv

Flapping, swirling and flipping: Non-linear dynamics of pre-stressed active filaments

Initially straight, slender elastic rods with geometrically constrained ends buckle and form stable two-dimensional shapes when compressed by bringing the ends together. It is also known that beyond a critical value of the pre-stress, clamped rods transition to bent, twisted three-dimensional equilibrium shapes. Recently, we showed that pre-stressed planar shapes, when immersed in a dissipative fluid and animated by nonconservative follower forces, exhibit stable large-amplitude flapping oscillations. Here, we use time-stepper methods to analyze the three-dimensional instabilities and dynamics of pre-stressed planar and non-planar filament configurations subject to active follower forces and dissipative fluid drag. First, we find that the type of boundary constraint determines the nature of the non-linear patterns following instability. When the filament is clamped at one end and pinned at the other, with follower forces directed towards the clamped end, we observe only stable planar oscillations, which we term flapping. When both ends are clamped, however, we observe a secondary instability wherein planar oscillations are destabilized by off-planar perturbations, resulting in fully three-dimensional swirling patterns characterized by two distinct time scales. The first time scale characterizes continuous, unidirectional swirling rotation around the end-to-end axis. The second time scale captures the rate at which the direction of swirling reverses, or flips. The time over which the swirling direction flips is very short compared to the long periods over which the filament swirls in the same direction. Computations indicate that the reversal of swirling oscillations resembles relaxation oscillations: each cycle begins with a sudden jump in torsional deformation, followed by a gradual decrease in net torsion until the next cycle begins.