Source author record

Kaustubh Dhole

Kaustubh Dhole appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence, Computation and Language, Machine Learning, cs.CY, Human-Computer Interaction

Catalog footprint

What is connected

3 works

5 topics

4 close collaborators

preprint, 2025, arXiv

Adversarial Lens: Exploiting Attention Layers to Generate Adversarial Examples for Evaluation

Recent advances in mechanistic interpretability suggest that intermediate attention layers encode token-level hypotheses that are iteratively refined toward the final output. In this work, we exploit this property to generate adversarial examples directly from attention-layer token distributions. Unlike prompt-based or gradient-based attacks, our approach leverages model-internal token predictions, producing perturbations that are both plausible and internally consistent with the model's own generation process. We evaluate whether tokens extracted from intermediate layers can serve as effective adversarial perturbations for downstream evaluation tasks. We conduct experiments on argument quality assessment using the ArgQuality dataset, with LLaMA-3.1-Instruct-8B serving as both the generator and evaluator. Our results show that attention-based adversarial examples lead to measurable drops in evaluation performance while remaining semantically similar to the original inputs. However, we also observe that substitutions drawn from certain layers and token positions can introduce grammatical degradation, limiting their practical effectiveness. Overall, our findings highlight both the promise and current limitations of using intermediate-layer representations as a principled source of adversarial examples for stress-testing LLM-based evaluation pipelines.

preprint, 2023, arXiv

Is AI Art Another Industrial Revolution in the Making?

A major shift from skilled to unskilled workers was one of the many changes caused by the Industrial Revolution, when the switch to machines contributed to a decline in the social and economic status of artisans, whose skills were dismembered into discrete actions performed by factory-line workers. We consider what may be an analogous computing technology: the recent introduction of AI-generated art software. AI art generators such as Dall-E and Midjourney can create fully rendered images based solely on a user's prompt, at the click of a button. Some artists fear that if the lower price and conveyor-belt speed of AI-produced images are seen as an improvement on the current system, it may permanently change the way society values and views art and artists. In this article, we consider the implications that AI art generation introduces through a post-Industrial Revolution historical lens. We then reflect on the analogous issues that appear to arise as a result of the AI art revolution, and we conclude that the problems raised mirror those of industrialization, giving a vital glimpse into what may lie ahead.

preprint, 2022, arXiv

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables comparison on an equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation, which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics Benchmark introduces a modular infrastructure for dataset, model, and metric developers to benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online, and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark.