Source author record

Harsh Trivedi

Harsh Trivedi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence, Computation and Language, math.FA, math.OA, Machine Learning, math.DS, Software Engineering

Catalog footprint

What is connected

6 works

7 topics

4 close collaborators

preprint · 2026 · arXiv

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

AI agents may soon become capable of autonomously completing valuable, long-horizon tasks in diverse domains. Current benchmarks either do not measure real-world tasks, or are not sufficiently difficult to meaningfully measure frontier models. To this end, we present Terminal-Bench 2.0: a carefully curated hard benchmark composed of 89 tasks in computer terminal environments inspired by problems from real workflows. Each task features a unique environment, human-written solution, and comprehensive tests for verification. We show that frontier models and agents score less than 65% on the benchmark and conduct an error analysis to identify areas for model and agent improvement. We publish the dataset and evaluation harness to assist developers and researchers in future work at https://www.tbench.ai/.

preprint · 2022 · arXiv

MuSiQue: Multihop Questions via Single-hop Question Composition

Multihop reasoning remains an elusive goal as existing multihop benchmarks are known to be largely solvable via shortcuts. Can we create a question answering (QA) dataset that, by construction, requires proper multihop reasoning? To this end, we introduce a bottom-up approach that systematically selects composable pairs of single-hop questions that are connected, i.e., where one reasoning step critically relies on information from another. This bottom-up methodology lets us explore a vast space of questions and add stringent filters as well as other mechanisms targeting connected reasoning. It provides fine-grained control over the construction process and the properties of the resulting $k$-hop questions. We use this methodology to create MuSiQue-Ans, a new multihop QA dataset with 25K 2-4 hop questions. Relative to existing datasets, MuSiQue-Ans is more difficult overall (3x increase in human-machine gap), and harder to cheat via disconnected reasoning (e.g., a single-hop model has a 30 point drop in F1). We further add unanswerable contrast questions to produce a more stringent dataset, MuSiQue-Full. We hope our datasets will help the NLP community develop models that perform genuine multihop reasoning.

preprint · 2022 · arXiv

Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions

Current QA systems can generate reasonable-sounding yet false answers without explanation or evidence for the generated answer, which is especially problematic when humans cannot readily check the model's answers. This presents a challenge for building trust in machine learning systems. We take inspiration from real-world situations where difficult questions are answered by considering opposing sides (see Irving et al., 2018). For multiple-choice QA examples, we build a dataset of single arguments for both a correct and incorrect answer option in a debate-style set-up as an initial step in training models to produce explanations for two candidate answers. We use long contexts -- humans familiar with the context write convincing explanations for pre-selected correct and incorrect answers, and we test if those explanations allow humans who have not read the full context to more accurately determine the correct answer. We do not find that explanations in our set-up improve human accuracy, but a baseline condition shows that providing human-selected text snippets does improve accuracy. We use these findings to suggest ways of improving the debate set-up for future data collection efforts.

preprint · 2020 · arXiv

DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering

Transformer-based QA models use input-wide self-attention -- i.e. across both the question and the input passage -- at all layers, causing them to be slow and memory-intensive. It turns out that we can get by without input-wide self-attention at all layers, especially in the lower layers. We introduce DeFormer, a decomposed transformer, which substitutes the full self-attention with question-wide and passage-wide self-attentions in the lower layers. This allows for question-independent processing of the input text representations, which in turn enables pre-computing passage representations, reducing runtime compute drastically. Furthermore, because DeFormer is largely similar to the original model, we can initialize DeFormer with the pre-training weights of a standard transformer, and directly fine-tune on the target QA dataset. We show DeFormer versions of BERT and XLNet can be used to speed up QA by over 4.3x, and with simple distillation-based losses they incur only a 1% drop in accuracy. We open source the code at https://github.com/StonyBrookNLP/deformer.
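
The decomposition described in this abstract can be illustrated with a toy sketch. This is not the DeFormer code: the attention here is a single head with identity projections, and the names run_layers, decomposed_encode, N_LOWER and N_UPPER are hypothetical, chosen only for this illustration. The sketch shows why separate question-wide and passage-wide attention in the lower layers makes the passage representation independent of the question, so it can be computed once and reused.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    # Toy single-head self-attention with identity projections (an assumption
    # made for brevity): every token attends to every token in `tokens`.
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        outputs.append([sum(w * value[j] for w, value in zip(weights, tokens))
                        for j in range(dim)])
    return outputs

def run_layers(tokens, n_layers):
    # Apply the toy attention layer n_layers times.
    for _ in range(n_layers):
        tokens = self_attention(tokens)
    return tokens

def decomposed_encode(question, cached_passage, n_lower, n_upper):
    # Lower layers: the question attends only to itself; the passage side
    # was already processed on its own and is passed in as a cached value.
    question_lower = run_layers(question, n_lower)
    # Upper layers: full attention over question and passage together.
    return run_layers(question_lower + cached_passage, n_upper)

N_LOWER, N_UPPER = 1, 1  # hypothetical layer split for the toy example
passage = [[1.0, 1.0], [0.5, -0.5]]
question_a = [[1.0, 0.0]]
question_b = [[0.0, 1.0]]

# Passage-wide lower layers run once, then are reused for both questions.
cached_passage = run_layers(passage, N_LOWER)
out_a = decomposed_encode(question_a, cached_passage, N_LOWER, N_UPPER)
out_b = decomposed_encode(question_b, cached_passage, N_LOWER, N_UPPER)

# For contrast: with input-wide attention in the lower layers, the passage
# token representations depend on the question and cannot be cached.
full_a = run_layers(question_a + passage, N_LOWER)
full_b = run_layers(question_b + passage, N_LOWER)
```

In the contrast case, the passage rows of full_a and full_b differ because each passage token also attends to the question tokens. In the decomposed case, the same cached_passage feeds both questions.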

preprint · 2020 · arXiv

Doubly commuting invariant subspaces for representations of product systems of $C^*$-correspondences

We obtain a Shimorin-Wold-type decomposition for a doubly commuting covariant representation of a product system of $C^*$-correspondences. This extends a recent Wold-type decomposition by Jeu and Pinto for $q$-doubly commuting isometries. An application to the wandering subspaces of doubly commuting induced representations is explored, and a version of Mandrekar's Beurling-type theorem is obtained to study doubly commuting invariant subspaces using the Fock space approach due to Popescu.

preprint · 2016 · arXiv

Rokhlin Property for Group Actions on Hilbert $C^*$-modules

We introduce Rokhlin properties for certain discrete group actions on $C^*$-correspondences as well as on Hilbert bimodules and analyze them. It turns out that group actions on any $C^*$-correspondence $E$ with the Rokhlin property induce group actions on the associated $C^*$-algebra $\mathcal O_E$ with the Rokhlin property, and group actions on any Hilbert bimodule with the Rokhlin property induce group actions on the linking algebra with the Rokhlin property. Permanence properties of several notions, such as nuclear dimension and the $\mathcal D$-absorbing property, with respect to crossed products of Hilbert $C^*$-modules with groups, where the group actions have the Rokhlin property, are studied. We also investigate a notion of outerness for Hilbert bimodules.