Source author record

Janardhan Kulkarni

Janardhan Kulkarni appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Machine Learning Cryptography and Security Computation and Language Distributed, Parallel, and Cluster Computing Artificial Intelligence Computer Science and Game Theory Databases Information Theory math.IT Operating Systems Performance Systems and Control

Catalog footprint

What is connected

20works

13topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

OptiMind: Teaching LLMs to Think Like Optimization Experts

Mathematical programming -- the task of expressing operations and decision-making problems in precise mathematical language -- is fundamental across domains, yet remains a skill-intensive process requiring operations research expertise. Recent advances in large language models for complex reasoning have spurred interest in automating this task, translating natural language into executable optimization models. Current approaches, however, achieve limited accuracy, hindered by scarce and noisy training data without leveraging domain knowledge. In this work, we systematically integrate optimization expertise to improve formulation accuracy for mixed-integer linear programming, a key family of mathematical programs. Our OptiMind framework leverages semi-automated, class-based error analysis to guide both training and inference, explicitly preventing common mistakes within each optimization class. Our resulting fine-tuned LLM significantly improves formulation accuracy by 20.7% across multiple optimization benchmarks, with consistent gains under test-time scaling methods such as self-consistency and multi-turn feedback, enabling further progress toward robust LLM-assisted optimization formulation.

preprint2025arXiv

Contextual Integrity in LLMs via Reasoning and Reinforcement Learning

As the era of autonomous agents making decisions on behalf of users unfolds, ensuring contextual integrity (CI) -- what is the appropriate information to share while carrying out a certain task -- becomes a central question to the field. We posit that CI demands a form of reasoning where the agent needs to reason about the context in which it is operating. To test this, we first prompt LLMs to reason explicitly about CI when deciding what information to disclose. We then extend this approach by developing a reinforcement learning (RL) framework that further instills in models the reasoning necessary to achieve CI. Using a synthetic, automatically created, dataset of only $\sim700$ examples but with diverse contexts and information disclosure norms, we show that our method substantially reduces inappropriate information disclosure while maintaining task performance across multiple model sizes and families. Importantly, improvements transfer from this synthetic dataset to established CI benchmarks such as PrivacyLens that has human annotations and evaluates privacy leakage of AI assistants in actions and tool calls. Our code is available at: https://github.com/EricGLan/CI-RL

preprint2022arXiv

Differentially Private Fine-tuning of Language Models

We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utility, privacy, and the computational and memory cost of private training. On many commonly studied datasets, the utility of private models approaches that of non-private models. For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $ε= 6.7$. In comparison, absent privacy constraints, RoBERTa-Large achieves an accuracy of $90.2\%$. Our findings are similar for natural language generation tasks. Privately fine-tuning with DART, GPT-2-Small, GPT-2-Medium, GPT-2-Large, and GPT-2-XL achieve BLEU scores of 38.5, 42.0, 43.1, and 43.8 respectively (privacy budget of $ε= 6.8,δ=$ 1e-5) whereas the non-private baseline is $48.1$. All our experiments suggest that larger models are better suited for private fine-tuning: while they are well known to achieve superior accuracy non-privately, we find that they also better maintain their accuracy when privacy is introduced.

preprint2022arXiv

Differentially Private Model Compression

Recent papers have shown that large pre-trained language models (LLMs) such as BERT, GPT-2 can be fine-tuned on private data to achieve performance comparable to non-private models for many downstream Natural Language Processing (NLP) tasks while simultaneously guaranteeing differential privacy. The inference cost of these models -- which consist of hundreds of millions of parameters -- however, can be prohibitively large. Hence, often in practice, LLMs are compressed before they are deployed in specific applications. In this paper, we initiate the study of differentially private model compression and propose frameworks for achieving 50% sparsity levels while maintaining nearly full performance. We demonstrate these ideas on standard GLUE benchmarks using BERT models, setting benchmarks for future research on this topic.

preprint2022arXiv

Differentially Private Set Union

We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. Each user $i$ contributes a subset $W_i \subseteq U$ of items. We want an ($ε$,$δ$)-differentially private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the size of $S$ is as large as possible. The problem arises in countless real world applications; it is particularly ubiquitous in natural language processing (NLP) applications as vocabulary extraction. For example, discovering words, sentences, $n$-grams etc., from private text data belonging to users is an instance of the set union problem. Known algorithms for this problem proceed by collecting a subset of items from each user, taking the union of such subsets, and disclosing the items whose noisy counts fall above a certain threshold. Crucially, in the above process, the contribution of each individual user is always independent of the items held by other users, resulting in a wasteful aggregation process, where some item counts happen to be way above the threshold. We deviate from the above paradigm by allowing users to contribute their items in a $\textit{dependent fashion}$, guided by a $\textit{policy}$. In this new setting ensuring privacy is significantly delicate. We prove that any policy which has certain $\textit{contractive}$ properties would result in a differentially private algorithm. We design two new algorithms, one using Laplace noise and other Gaussian noise, as specific instances of policies satisfying the contractive properties. Our experiments show that the new algorithms significantly outperform previously known mechanisms for the problem.

preprint2022arXiv

Synergy: Resource Sensitive DNN Scheduling in Multi-Tenant Clusters

Training Deep Neural Networks (DNNs) is a widely popular workload in both enterprises and cloud data centers. Existing schedulers for DNN training consider GPU as the dominant resource, and allocate other resources such as CPU and memory proportional to the number of GPUs requested by the job. Unfortunately, these schedulers do not consider the impact of a job's sensitivity to allocation of CPU, memory, and storage resources. In this work, we propose Synergy, a resource-sensitive scheduler for shared GPU clusters. Synergy infers the sensitivity of DNNs to different resources using optimistic profiling; some jobs might benefit from more than the GPU-proportional allocation and some jobs might not be affected by less than GPU-proportional allocation. Synergy performs such multi-resource workload-aware assignments across a set of jobs scheduled on shared multi-tenant clusters using a new near-optimal online algorithm. Our experiments show that workload-aware CPU and memory allocations can improve average JCT up to 3.4x when compared to traditional GPU-proportional scheduling.

preprint2021arXiv

Differentially Private Correlation Clustering

Correlation clustering is a widely used technique in unsupervised machine learning. Motivated by applications where individual privacy is a concern, we initiate the study of differentially private correlation clustering. We propose an algorithm that achieves subquadratic additive error compared to the optimal cost. In contrast, straightforward adaptations of existing non-private algorithms all lead to a trivial quadratic error. Finally, we give a lower bound showing that any pure differentially private algorithm for correlation clustering requires additive error of $Ω(n)$.

preprint2021arXiv

Fast and Memory Efficient Differentially Private-SGD via JL Projections

Differentially Private-SGD (DP-SGD) of Abadi et al. (2016) and its variations are the only known algorithms for private training of large scale neural networks. This algorithm requires computation of per-sample gradients norms which is extremely slow and memory intensive in practice. In this paper, we present a new framework to design differentially private optimizers called DP-SGD-JL and DP-Adam-JL. Our approach uses Johnson-Lindenstrauss (JL) projections to quickly approximate the per-sample gradient norms without exactly computing them, thus making the training time and memory requirements of our optimizers closer to that of their non-DP versions. Unlike previous attempts to make DP-SGD faster which work only on a subset of network architectures or use compiler techniques, we propose an algorithmic solution which works for any network in a black-box manner which is the main contribution of this paper. To illustrate this, on IMDb dataset, we train a Recurrent Neural Network (RNN) to achieve good privacy-vs-accuracy tradeoff, while being significantly faster than DP-SGD and with a similar memory footprint as non-private SGD. The privacy analysis of our algorithms is more involved than DP-SGD, we use the recently proposed f-DP framework of Dong et al. (2019) to prove privacy.

preprint2020arXiv

Consistent $k$-Median: Simpler, Better and Robust

In this paper we introduce and study the online consistent $k$-clustering with outliers problem, generalizing the non-outlier version of the problem studied in [Lattanzi-Vassilvitskii, ICML17]. We show that a simple local-search based online algorithm can give a bicriteria constant approximation for the problem with $O(k^2 \log^2 (nD))$ swaps of medians (recourse) in total, where $D$ is the diameter of the metric. When restricted to the problem without outliers, our algorithm is simpler, deterministic and gives better approximation ratio and recourse, compared to that of [Lattanzi-Vassilvitskii, ICML17].

preprint2020arXiv

Hierarchy-Based Algorithms for Minimizing Makespan under Precedence and Communication Constraints

We consider the classic problem of scheduling jobs with precedence constraints on a set of identical machines to minimize the makespan objective function. Understanding the exact approximability of the problem when the number of machines is a constant is a well-known question in scheduling theory. Indeed, an outstanding open problem from the classic book of Garey and Johnson asks whether this problem is NP-hard even in the case of 3 machines and unit-length jobs. In a recent breakthrough, Levey and Rothvoss gave a $(1+ε)$-approximation algorithm, which runs in nearly quasi-polynomial time, for the case when job have unit lengths. However, a substantially more difficult case where jobs have arbitrary processing lengths has remained open. We make progress on this more general problem. We show that there exists a $(1+ε)$-approximation algorithm (with similar running time as that of Levey and Rothvoss) for the non-migratory setting: when every job has to be scheduled entirely on a single machine, but within a machine the job need not be scheduled during consecutive time steps. Further, we also show that our algorithmic framework generalizes to another classic scenario where, along with the precedence constraints, the jobs also have communication delay constraints. Both of these fundamental problems are highly relevant to the practice of datacenter scheduling.

preprint2020arXiv

Locally Private Hypothesis Selection

We initiate the study of hypothesis selection under local differential privacy. Given samples from an unknown probability distribution $p$ and a set of $k$ probability distributions $\mathcal{Q}$, we aim to output, under the constraints of $\varepsilon$-local differential privacy, a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution. This is a generalization of the classic problem of $k$-wise simple hypothesis testing, which corresponds to when $p \in \mathcal{Q}$, and we wish to identify $p$. Absent privacy constraints, this problem requires $O(\log k)$ samples from $p$, and it was recently shown that the same complexity is achievable under (central) differential privacy. However, the naive approach to this problem under local differential privacy would require $\tilde O(k^2)$ samples. We first show that the constraint of local differential privacy incurs an exponential increase in cost: any algorithm for this problem requires at least $Ω(k)$ samples. Second, for the special case of $k$-wise simple hypothesis testing, we provide a non-interactive algorithm which nearly matches this bound, requiring $\tilde O(k)$ samples. Finally, we provide sequentially interactive algorithms for the general case, requiring $\tilde O(k)$ samples and only $O(\log \log k)$ rounds of interactivity. Our algorithms are achieved through a reduction to maximum selection with adversarial comparators, a problem of independent interest for which we initiate study in the parallel setting. For this problem, we provide a family of algorithms for each number of allowed rounds of interaction $t$, as well as lower bounds showing that they are near-optimal for every $t$. Notably, our algorithms result in exponential improvements on the round complexity of previous methods.

preprint2020arXiv

Privately Learning Markov Random Fields

We consider the problem of learning Markov Random Fields (including the prototypical example, the Ising model) under the constraint of differential privacy. Our learning goals include both structure learning, where we try to estimate the underlying graph structure of the model, as well as the harder goal of parameter learning, in which we additionally estimate the parameter on each edge. We provide algorithms and lower bounds for both problems under a variety of privacy constraints -- namely pure, concentrated, and approximate differential privacy. While non-privately, both learning goals enjoy roughly the same complexity, we show that this is not the case under differential privacy. In particular, only structure learning under approximate differential privacy maintains the non-private logarithmic dependence on the dimensionality of the data, while a change in either the learning goal or the privacy notion would necessitate a polynomial dependence. As a result, we show that the privacy constraint imposes a strong separation between these two learning problems in the high-dimensional data regime.

preprint2020arXiv

Scheduling with Communication Delays via LP Hierarchies and Clustering

We consider the classic problem of scheduling jobs with precedence constraints on identical machines to minimize makespan, in the presence of communication delays. In this setting, denoted by $\mathsf{P} \mid \mathsf{prec}, c \mid C_{\mathsf{max}}$, if two dependent jobs are scheduled on different machines, then at least $c$ units of time must pass between their executions. Despite its relevance to many applications, this model remains one of the most poorly understood in scheduling theory. Even for a special case where an unlimited number of machines is available, the best known approximation ratio is $2/3 \cdot (c+1)$, whereas Graham's greedy list scheduling algorithm already gives a $(c+1)$-approximation in that setting. An outstanding open problem in the top-10 list by Schuurman and Woeginger and its recent update by Bansal asks whether there exists a constant-factor approximation algorithm. In this work we give a polynomial-time $O(\log c \cdot \log m)$-approximation algorithm for this problem, where $m$ is the number of machines and $c$ is the communication delay. Our approach is based on a Sherali-Adams lift of a linear programming relaxation and a randomized clustering of the semimetric space induced by this lift.

preprint2020arXiv

The Power of Recourse: Better Algorithms for Facility Location in Online and Dynamic Models

In this paper we study the facility location problem in the online with recourse and dynamic algorithm models. In the online with recourse model, clients arrive one by one and our algorithm needs to maintain good solutions at all time steps with only a few changes to the previously made decisions (called recourse). We show that the classic local search technique can lead to a $(1+\sqrt{2}+ε)$-competitive online algorithm for facility location with only $O\left(\frac{\log n}ε\log\frac1ε\right)$ amortized facility and client recourse. We then turn to the dynamic algorithm model for the problem, where the main goal is to design fast algorithms that maintain good solutions at all time steps. We show that the result for online facility location, combined with the randomized local search technique of Charikar and Guha [10], leads to an $O(1+\sqrt{2}+ε)$ approximation dynamic algorithm with amortized update time of $\tilde O(n)$ in the incremental setting. Notice that the running time is almost optimal, since in general metric space it takes $Ω(n)$ time to specify a new client's position. The approximation factor of our algorithm also matches the best offline analysis of the classic local search algorithm. Finally, we study the fully dynamic model for facility location, where clients can both arrive and depart. Our main result is an $O(1)$-approximation algorithm in this model with $O(|F|)$ preprocessing time and $O(\log^3 D)$ amortized update time for the HST metric spaces. Using the seminal results of Bartal [4] and Fakcharoenphol, Rao and Talwar [17], which show that any arbitrary $N$-point metric space can be embedded into a distribution over HSTs such that the expected distortion is at most $O(\log N)$, we obtain a $O(\log |F|)$ approximation with preprocessing time of $O(|F|^2\log |F|)$ and $O(\log^3 D)$ amortized update time.

preprint2016arXiv

Do the Hard Stuff First: Scheduling Dependent Computations in Data-Analytics Clusters

We present a scheduler that improves cluster utilization and job completion times by packing tasks having multi-resource requirements and inter-dependencies. While the problem is algorithmically very hard, we achieve near-optimality on the job DAGs that appear in production clusters at a large enterprise and in benchmarks such as TPC-DS. A key insight is that carefully handling the long-running tasks and those with tough-to-pack resource needs will produce good-enough schedules. However, which subset of tasks to treat carefully is not clear (and intractable to discover). Hence, we offer a search procedure that evaluates various possibilities and outputs a preferred schedule order over tasks. An online component enforces the schedule orders desired by the various jobs running on the cluster. In addition, it packs tasks, overbooks the fungible resources and guarantees bounded unfairness for a variety of desirable fairness schemes. Relative to the state-of-the art schedulers, we speed up 50% of the jobs by over 30% each.

preprint2015arXiv

Minimizing Flow-Time on Unrelated Machines

We consider some flow-time minimization problems in the unrelated machines setting. In this setting, there is a set of $m$ machines and a set of $n$ jobs, and each job $j$ has a machine dependent processing time of $p_{ij}$ on machine $i$. The flow-time of a job is the total time the job spends in the system (completion time minus its arrival time), and is one of the most natural quality of service measure. We show the following two results: an $O(\min(\log^2 n,\log n \log P))$ approximation algorithm for minimizing the total-flow time, and an $O(\log n)$ approximation for minimizing the maximum flow-time. Here $P$ is the ratio of maximum to minimum job size. These are the first known poly-logarithmic guarantees for both the problems.

preprint2015arXiv

Tight Bounds for Online Vector Scheduling

Modern data centers face a key challenge of effectively serving user requests that arrive online. Such requests are inherently multi-dimensional and characterized by demand vectors over multiple resources such as processor cycles, storage space, and network bandwidth. Typically, different resources require different objectives to be optimized, and $L_r$ norms of loads are among the most popular objectives considered. To address these problems, we consider the online vector scheduling problem in this paper. Introduced by Chekuri and Khanna (SIAM J of Comp. 2006), vector scheduling is a generalization of classical load balancing, where every job has a vector load instead of a scalar load. In this paper, we resolve the online complexity of the vector scheduling problem and its important generalizations. Our main results are: -For identical machines, we show that the optimal competitive ratio is $Θ(\log d / \log \log d)$ by giving an online lower bound and an algorithm with an asymptotically matching competitive ratio. The lower bound is technically challenging, and is obtained via an online lower bound for the minimum mono-chromatic clique problem using a novel online coloring game and randomized coding scheme. -For unrelated machines, we show that the optimal competitive ratio is $Θ(\log m + \log d)$ by giving an online lower bound that matches a previously known upper bound. Unlike identical machines, however, extending these results, particularly the upper bound, to general $L_r$ norms requires new ideas. In particular, we use a carefully constructed potential function that balances the individual $L_r$ objectives with the overall (convexified) min-max objective to guide the online algorithm and track the changes in potential to bound the competitive ratio.

preprint2014arXiv

Competitive Algorithms from Competitive Equilibria: Non-Clairvoyant Scheduling under Polyhedral Constraints

We introduce and study a general scheduling problem that we term the Packing Scheduling problem. In this problem, jobs can have different arrival times and sizes; a scheduler can process job $j$ at rate $x_j$, subject to arbitrary packing constraints over the set of rates ($\vec{x}$) of the outstanding jobs. The PSP framework captures a variety of scheduling problems, including the classical problems of unrelated machines scheduling, broadcast scheduling, and scheduling jobs of different parallelizability. It also captures scheduling constraints arising in diverse modern environments ranging from individual computer architectures to data centers. More concretely, PSP models multidimensional resource requirements and parallelizability, as well as network bandwidth requirements found in data center scheduling. In this paper, we design non-clairvoyant online algorithms for PSP and its special cases -- in this setting, the scheduler is unaware of the sizes of jobs. Our two main results are, 1) a constant competitive algorithm for minimizing total weighted completion time for PSP and 2)a scalable algorithm for minimizing the total flow-time on unrelated machines, which is a special case of PSP.

preprint2014arXiv

SELFISHMIGRATE: A Scalable Algorithm for Non-clairvoyantly Scheduling Heterogeneous Processors

We consider the classical problem of minimizing the total weighted flow-time for unrelated machines in the online \emph{non-clairvoyant} setting. In this problem, a set of jobs $J$ arrive over time to be scheduled on a set of $M$ machines. Each job $j$ has processing length $p_j$, weight $w_j$, and is processed at a rate of $\ell_{ij}$ when scheduled on machine $i$. The online scheduler knows the values of $w_j$ and $\ell_{ij}$ upon arrival of the job, but is not aware of the quantity $p_j$. We present the {\em first} online algorithm that is {\em scalable} ($(1+\eps)$-speed $O(\frac{1}{ε^2})$-competitive for any constant $\eps > 0$) for the total weighted flow-time objective. No non-trivial results were known for this setting, except for the most basic case of identical machines. Our result resolves a major open problem in online scheduling theory. Moreover, we also show that no job needs more than a logarithmic number of migrations. We further extend our result and give a scalable algorithm for the objective of minimizing total weighted flow-time plus energy cost for the case of unrelated machines and obtain a scalable algorithm. The key algorithmic idea is to let jobs migrate selfishly until they converge to an equilibrium. Towards this end, we define a game where each job's utility which is closely tied to the instantaneous increase in the objective the job is responsible for, and each machine declares a policy that assigns priorities to jobs based on when they migrate to it, and the execution speeds. This has a spirit similar to coordination mechanisms that attempt to achieve near optimum welfare in the presence of selfish agents (jobs). To the best our knowledge, this is the first work that demonstrates the usefulness of ideas from coordination mechanisms and Nash equilibria for designing and analyzing online algorithms.

preprint2013arXiv

Constant-Competitive Prior-Free Auction with Ordered Bidders

A central problem in Microeconomics is to design auctions with good revenue properties. In this setting, the bidders' valuations for the items are private knowledge, but they are drawn from publicly known prior distributions. The goal is to find a truthful auction (no bidder can gain in utility by misreporting her valuation) that maximizes the expected revenue. Naturally, the optimal-auction is sensitive to the prior distributions. An intriguing question is to design a truthful auction that is oblivious to these priors, and yet manages to get a constant factor of the optimal revenue. Such auctions are called prior-free. Goldberg et al. presented a constant-approximate prior-free auction when there are identical copies of an item available in unlimited supply, bidders are unit-demand, and their valuations are drawn from i.i.d. distributions. The recent work of Leonardi et al. [STOC 2012] generalized this problem to non i.i.d. bidders, assuming that the auctioneer knows the ordering of their reserve prices. Leonardi et al. proposed a prior-free auction that achieves a $O(\log^* n)$ approximation. We improve upon this result, by giving the first prior-free auction with constant approximation guarantee.

Janardhan Kulkarni

What is connected

Connect this record

See the researcher in context

Building this map preview

20 published item(s)

OptiMind: Teaching LLMs to Think Like Optimization Experts

Contextual Integrity in LLMs via Reasoning and Reinforcement Learning

Differentially Private Fine-tuning of Language Models

Differentially Private Model Compression

Differentially Private Set Union

Synergy: Resource Sensitive DNN Scheduling in Multi-Tenant Clusters

Differentially Private Correlation Clustering

Fast and Memory Efficient Differentially Private-SGD via JL Projections

Consistent $k$-Median: Simpler, Better and Robust

Hierarchy-Based Algorithms for Minimizing Makespan under Precedence and Communication Constraints

Locally Private Hypothesis Selection

Privately Learning Markov Random Fields

Scheduling with Communication Delays via LP Hierarchies and Clustering

The Power of Recourse: Better Algorithms for Facility Location in Online and Dynamic Models

Do the Hard Stuff First: Scheduling Dependent Computations in Data-Analytics Clusters

Minimizing Flow-Time on Unrelated Machines

Tight Bounds for Online Vector Scheduling

Competitive Algorithms from Competitive Equilibria: Non-Clairvoyant Scheduling under Polyhedral Constraints

SELFISHMIGRATE: A Scalable Algorithm for Non-clairvoyantly Scheduling Heterogeneous Processors

Constant-Competitive Prior-Free Auction with Ordered Bidders