Researcher profile

Vamshi Krishna Bonagiri

Vamshi Krishna Bonagiri contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2026arXiv

Intrinsic Guardrails: How Semantic Geometry of Personality Interacts with Emergent Misalignment in LLMs

Fine-tuning Large Language Models (LLMs) on benign narrow data can sometimes induce broad harmful behaviors, a vulnerability termed emergent misalignment (EM). While prior work links these failures to specific directions in the activation space, their relationship to the model's broader persona remains unexplored. We map the latent personality space of LLMs through established psychometric profiles like the Big Five, Dark Triad, and LLM-specific behaviors (e.g. evil, sycophancy), and show that the semantic geometry is highly stable across aligned models and their corrupted fine-tunes. Through causal interventions, we find that directions isolating social valence, such as the 'Evil' persona vector, and a Semantic Valence Vector (SVV) that we introduce, function as intrinsic guardrails: ablating them drives the misalignment rates above $40$%, while amplifying them suppresses the failure mode to less than $3$%. Leveraging the structural stability of the personality space, we show that vectors extracted $\textit{a priori}$ from an instruct-tuned model transfer zero-shot to successfully regulate EM in corrupted fine-tunes. Overall, our findings suggest that harmful fine-tuning does not overwrite a model's internal representation of personality, allowing conserved representations to serve as robust, cross-distribution guardrails.

preprint2022arXiv

Are Deepfakes Concerning? Analyzing Conversations of Deepfakes on Reddit and Exploring Societal Implications

Deepfakes are synthetic content generated using advanced deep learning and AI technologies. The advancement of technology has created opportunities for anyone to create and share deepfakes much easier. This may lead to societal concerns based on how communities engage with it. However, there is limited research available to understand how communities perceive deepfakes. We examined deepfake conversations on Reddit from 2018 to 2021 -- including major topics and their temporal changes as well as implications of these conversations. Using a mixed-method approach -- topic modeling and qualitative coding, we found 6,638 posts and 86,425 comments discussing concerns of the believable nature of deepfakes and how platforms moderate them. We also found Reddit conversations to be pro-deepfake and building a community that supports creating and sharing deepfake artifacts and building a marketplace regardless of the consequences. Possible implications derived from qualitative codes indicate that deepfake conversations raise societal concerns. We propose that there are implications for Human Computer Interaction (HCI) to mitigate the harm created from deepfakes.