Source author record

Mahesh Viswanathan

Mahesh Viswanathan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Logic in Computer Science Formal Languages and Automata Theory Programming Languages Software Engineering Artificial Intelligence Computation and Language Computational Complexity Cryptography and Security cs.CY eess.SY Machine Learning math.OC Systems and Control

Catalog footprint

What is connected

12works

13topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Correcting Selection Bias in Sparse User Feedback for Large Language Model Quality Estimation: A Multi-Agent Hierarchical Bayesian Approach

[Abridged] Production LLM deployments receive feedback from a non-random fraction of users: thumbs sit mostly in the tails of the satisfaction distribution, and a naive average over them can land 40-50 percentage points away from true system quality. We treat this as a topic- and sentiment- stratified selection-bias problem and propose a three-agent hierarchical Bayesian pipeline that does not require ground-truth labels on individual interactions. A Topic Clustering Agent partitions the stream via UMAP + HDBSCAN over text embeddings; a Bias Modeling Agent fits a two-stage hierarchical Beta-Binomial under NUTS, inferring per-topic selection rates $s_c$ and quality $q_c$ with partial pooling; a Synthesis Agent reweights $q_c$ by true topic prevalence $\hatπ_c = n_c/N$ to report a bias-corrected aggregate posterior $\bar Q = \sum_c \hatπ_c q_c$ with credible interval, plus drift signals for online recalibration. Validation uses UltraFeedback (N=10,232 retained interactions, $C=18$ clusters, $Q^\star=0.6249$) with simulated topic- and sentiment-dependent selection biases. We compare five Bayesian variants against Naive and IPW baselines. A mild prior on the feedback channel (typical positive-feedback rate and negative-to-positive ratio, both readable from any production dashboard without labels) keeps Hierarchical-Informed within 4-13 pp of $Q^\star$ as the bias ratio sweeps from 1:1 to 30:1, with 95% credible intervals covering $Q^\star$ in 50/50 random-seed replicates at $κ_{\max}=10$. Without channel-side priors, every weak-prior variant misses $Q^\star$ by 22-33 pp: the per-cluster sufficient statistics admit a one-parameter family of equally good fits, and the prior on the bias channel (not on latent quality) is what breaks the degeneracy.

preprint2026arXiv

From Intent to Execution: Composing Agentic Workflows with Agent Recommendation

Multi-Agent Systems (MAS) built using AI agents fulfill a variety of user intents that may be used to design and build a family of related applications. However, the creation of such MAS currently involves manual composition of the plan, manual selection of appropriate agents, and manual creation of execution graphs. This paper introduces a framework for the automated creation of multi-agent systems which replaces multiple manual steps with an automated framework. The proposed framework consists of software modules and a workflow to orchestrate the requisite task- specific application. The modules include: an LLM-derived planner, a set of tasks described in natural language, a dynamic call graph, an orchestrator for map agents to tasks, and an agent recommender that finds the most suitable agent(s) from local and global agent registries. The agent recommender uses a two-stage information retrieval (IR) system comprising a fast retriever and an LLM-based re-ranker. We implemented a series of experiments exploring the choice of embedders, re- rankers, agent description enrichment, and supervising critique agent. We benchmarked this system end-to-end, evaluating the combination of planning, agent selection, and task completion, with our proposed approach. Our experimental results show that our approach outperforms the state-of-the- art in terms of the recall rate and is more robust and scalable compared to previous approaches. The critique agent holistically reevaluates both agent and tool recommendations against the overall plan. We show that the inclusion of the critique agent further enhances the recall score, proving that the comprehensive review and revision of task-based agent selection is an essential step in building end-to-end multi-agent systems.

preprint2022arXiv

Proof Blocks: Autogradable Scaffolding Activities for Learning to Write Proofs

Proof Blocks is a software tool which enables students to write proofs by dragging and dropping prewritten proof lines into the correct order. These proofs can be graded completely automatically, enabling students to receive rapid feedback on how they are doing with their proofs. When constructing a problem, the instructor specifies the dependency graph of the lines of the proof, so that any correct arrangement of the lines can receive full credit. This innovation can improve assessment tools by increasing the types of questions we can ask students about proofs, and can give greater access to proof knowledge by increasing the amount that students can learn on their own with the help of a computer.

preprint2020arXiv

Decidable Synthesis of Programs with Uninterpreted Functions

We identify a decidable synthesis problem for a class of programs of unbounded size with conditionals and iteration that work over infinite data domains. The programs in our class use uninterpreted functions and relations, and abide by a restriction called coherence that was recently identified to yield decidable verification. We formulate a powerful grammar-restricted (syntax-guided) synthesis problem for coherent uninterpreted programs, and we show the problem to be decidable, identify its precise complexity, and also study several variants of the problem.

preprint2020arXiv

Decidable Verification of Uninterpreted Programs

We study the problem of completely automatically verifying uninterpreted programs---programs that work over arbitrary data models that provide an interpretation for the constants, functions and relations the program uses. The verification problem asks whether a given program satisfies a postcondition written using quantifier-free formulas with equality on the final state, with no loop invariants, contracts, etc. being provided. We show that this problem is undecidable in general. The main contribution of this paper is a subclass of programs, called coherent programs that admits decidable verification, and can be decided in PSPACE. We then extend this class of programs to classes of programs that are $k$-coherent, where $k \in \mathbb{N}$, obtained by (automatically) adding $k$ ghost variables and assignments that make them coherent. We also extend the decidability result to programs with recursive function calls and prove several undecidability results that show why our restrictions to obtain decidability seem necessary.

preprint2020arXiv

Deciding Differential Privacy for Programs with Finite Inputs and Outputs

Differential privacy is a de facto standard for statistical computations over databases that contain private data. The strength of differential privacy lies in a rigorous mathematical definition that guarantees individual privacy and yet allows for accurate statistical results. Thanks to its mathematical definition, differential privacy is also a natural target for formal analysis. A broad line of work uses logical methods for proving privacy. However, these methods are not complete, and only partially automated. A recent and complementary line of work uses statistical methods for finding privacy violations. However, the methods only provide statistical guarantees (but no proofs). We propose the first decision procedure for checking the differential privacy of a non-trivial class of probabilistic computations. Our procedure takes as input a program P parametrized by a privacy budget $ε$, and either proves differential privacy for all possible values of $ε$ or generates a counterexample. In addition, our procedure applies both to $ε$-differential privacy and $(ε,δ)$-differential privacy. Technically, the decision procedure is based on a novel and judicious encoding of the semantics of programs in our class into a decidable fragment of the first-order theory of the reals with exponentiation. We implement our procedure and use it for (dis)proving privacy bounds for many well-known examples, including randomized response, histogram, report noisy max and sparse vector.

preprint2020arXiv

Statistically Model Checking PCTL Specifications on Markov Decision Processes via Reinforcement Learning

Probabilistic Computation Tree Logic (PCTL) is frequently used to formally specify control objectives such as probabilistic reachability and safety. In this work, we focus on model checking PCTL specifications statistically on Markov Decision Processes (MDPs) by sampling, e.g., checking whether there exists a feasible policy such that the probability of reaching certain goal states is greater than a threshold. We use reinforcement learning to search for such a feasible policy for PCTL specifications, and then develop a statistical model checking (SMC) method with provable guarantees on its error. Specifically, we first use upper-confidence-bound (UCB) based Q-learning to design an SMC algorithm for bounded-time PCTL specifications, and then extend this algorithm to unbounded-time specifications by identifying a proper truncation time by checking the PCTL specification and its negation at the same time. Finally, we evaluate the proposed method on case studies.

preprint2020arXiv

The Complexity of Dynamic Data Race Prediction

Writing concurrent programs is notoriously hard due to scheduling non-determinism. The most common concurrency bugs are data races, which are accesses to a shared resource that can be executed concurrently. Dynamic data-race prediction is the most standard technique for detecting data races: given an observed, data-race-free trace $t$, the task is to determine whether $t$ can be reordered to a trace $t^*$ that exposes a data-race. Although the problem has received significant practical attention for over three decades, its complexity has remained elusive. In this work, we address this lacuna, identifying sources of intractability and conditions under which the problem is efficiently solvable. Given a trace $t$ of size $n$ over $k$ threads, our main results are as follows. First, we establish a general $O(k\cdot n^{2\cdot (k-1)})$ upper-bound, as well as an $O(n^k)$ upper-bound when certain parameters of $t$ are constant. In addition, we show that the problem is NP-hard and even W[1]-hard parameterized by $k$, and thus unlikely to be fixed-parameter tractable. Second, we study the problem over acyclic communication topologies, such as server-clients hierarchies. We establish an $O(k^2\cdot d\cdot n^2\cdot \log n)$ upper-bound, where $d$ is the number of shared variables accessed in $t$. In addition, we show that even for traces with $k=2$ threads, the problem has no $O(n^{2-ε})$ algorithm under Orthogonal Vectors. Since any trace with 2 threads defines an acyclic topology, our upper-bound for this case is optimal wrt polynomial improvements for up to moderate values of $k$ and $d$. Finally, we study a distance-bounded version of the problem, where the task is to expose a data race by a witness trace that is similar to $t$. We develop an algorithm that works in $O(n)$ time when certain parameters of $t$ are constant.

preprint2020arXiv

Verifying Stochastic Hybrid Systems with Temporal Logic Specifications via Model Reduction

We present a scalable methodology to verify stochastic hybrid systems. Using the Mori-Zwanzig reduction method, we construct a finite state Markov chain reduction of a given stochastic hybrid system and prove that this reduced Markov chain is approximately equivalent to the original system in a distributional sense. Approximate equivalence of the stochastic hybrid system and its Markov chain reduction means that analyzing the Markov chain with respect to a suitably strengthened property, allows us to conclude whether the original stochastic hybrid system meets its temporal logic specifications. We present the first statistical model checking algorithms to verify stochastic hybrid systems against correctness properties, expressed in the linear inequality linear temporal logic (iLTL) or the metric interval temporal logic (MITL).

preprint2019arXiv

Deciding Memory Safety for Single-Pass Heap-Manipulating Programs

We investigate the decidability of automatic program verification for programs that manipulate heaps, and in particular, decision procedures for proving memory safety for them. We extend recent work that identified a decidable subclass of uninterpreted programs to a class of alias-aware programs that can update maps. We apply this theory to develop verification algorithms for memory safety--- determining if a heap-manipulating program that allocates and frees memory locations and manipulates heap pointers does not dereference an unallocated memory location. We show that this problem is decidable when the initial allocated heap forms a forest data-structure and when programs are streaming-coherent, which intuitively restricts programs to make a single pass over a data-structure. Our experimental evaluation on a set of library routines that manipulate forest data-structures shows that common single-pass algorithms on data-structures often fall in the decidable class, and that our decision procedure is efficient in verifying them.

preprint2013arXiv

Reachability under Contextual Locking

The pairwise reachability problem for a multi-threaded program asks, given control locations in two threads, whether they can be simultaneously reached in an execution of the program. The problem is important for static analysis and is used to detect statements that are concurrently enabled. This problem is in general undecidable even when data is abstracted and when the threads (with recursion) synchronize only using a finite set of locks. Popular programming paradigms that limit the lock usage patterns have been identified under which the pairwise reachability problem becomes decidable. In this paper, we consider a new natural programming paradigm, called contextual locking, which ties the lock usage to calling patterns in each thread: we assume that locks are released in the same context that they were acquired and that every lock acquired by a thread in a procedure call is released before the procedure returns. Our main result is that the pairwise reachability problem is polynomial-time decidable for this new programming paradigm as well. The problem becomes undecidable if the locks are reentrant; reentrant locking is a \emph{recursive locking} mechanism which allows a thread in a multi-threaded program to acquire the reentrant lock multiple times.

preprint2011arXiv

Power of Randomization in Automata on Infinite Strings

Probabilistic Büchi Automata (PBA) are randomized, finite state automata that process input strings of infinite length. Based on the threshold chosen for the acceptance probability, different classes of languages can be defined. In this paper, we present a number of results that clarify the power of such machines and properties of the languages they define. The broad themes we focus on are as follows. We present results on the decidability and precise complexity of the emptiness, universality and language containment problems for such machines, thus answering questions central to the use of these models in formal verification. Next, we characterize the languages recognized by PBAs topologically, demonstrating that though general PBAs can recognize languages that are not regular, topologically the languages are as simple as ω-regular languages. Finally, we introduce Hierarchical PBAs, which are syntactically restricted forms of PBAs that are tractable and capture exactly the class of ω-regular languages.

Mahesh Viswanathan

What is connected

Connect this record

See the researcher in context

Building this map preview

12 published item(s)

Correcting Selection Bias in Sparse User Feedback for Large Language Model Quality Estimation: A Multi-Agent Hierarchical Bayesian Approach

From Intent to Execution: Composing Agentic Workflows with Agent Recommendation

Proof Blocks: Autogradable Scaffolding Activities for Learning to Write Proofs

Decidable Synthesis of Programs with Uninterpreted Functions

Decidable Verification of Uninterpreted Programs

Deciding Differential Privacy for Programs with Finite Inputs and Outputs

Statistically Model Checking PCTL Specifications on Markov Decision Processes via Reinforcement Learning

The Complexity of Dynamic Data Race Prediction

Verifying Stochastic Hybrid Systems with Temporal Logic Specifications via Model Reduction

Deciding Memory Safety for Single-Pass Heap-Manipulating Programs

Reachability under Contextual Locking

Power of Randomization in Automata on Infinite Strings