Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
11 works
0 followers
18 topics
4 close collaborators

Published work

11 published items

preprint, 2026, arXiv

Persona-Model Collapse in Emergent Misalignment

Fine-tuning large language models on narrow data with harmful content produces broadly misaligned behavior on unrelated prompts, a phenomenon known as emergent misalignment. We propose that emergent misalignment involves persona-model collapse: deterioration of the model's internal capacity to simulate, differentiate, and maintain consistent characters. We test this hypothesis behaviorally using two metrics: moral susceptibility (S) and moral robustness (R), computed from the across- and within-persona variability of models' Moral Foundations Questionnaire responses under persona role-play. These metrics formalize the model's ability to differentiate characters (S) and its consistency when simulating a given one (R). We evaluate four frontier models (DeepSeek-V3.1, GPT-4.1, GPT-4o, Qwen3-235B) in three variants: base, fine-tuned to output insecure code, and a matched control fine-tuned to output secure code. Across the four models, insecure fine-tuning produces an average $55\%$ increase in S, pushing all four insecure variants beyond the band observed across 13 frontier models benchmarked in prior work -- with GPT-4o reaching more than twice the band's upper end -- signaling dysregulated differentiation. It also causes an average $65\%$ decrease in R, equivalent to a $304\%$ increase in 1/R. By contrast, the matched secure control preserves S near the base and induces only a partial R loss, showing that these effects are largely misalignment-specific. Complementing these metric shifts, insecure variants' unconditioned responses converge toward saturation near the scale ceiling, departing markedly from both base models' structured responses and those elicited when base models role-play toxic personas. Taken together, these metrics provide a sensitive diagnostic for emergent misalignment and serve as behavioral evidence that it involves persona-model collapse.
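
The abstract describes S and R only as functions of across-persona and within-persona variability, so the following is a minimal sketch under assumed definitions, not the paper's formulas. It assumes S is the average (over questions) of the standard deviation of persona-mean answers, and R is the reciprocal of the average standard deviation across repeated answers for a fixed persona and question. The function name and the data layout are hypothetical.

```python
from statistics import mean, pstdev


def susceptibility_and_robustness(ratings):
    """ratings[persona][question] is a list of repeated numeric answers.

    Assumed definitions (illustrative only, not taken from the paper):
      S = average over questions of the std, across personas, of mean answers
      R = 1 / (average over personas and questions of the std across repeats)
    """
    personas = list(ratings)
    questions = list(ratings[personas[0]])

    # Across-persona variability: how much the persona means differ per question.
    across = [
        pstdev([mean(ratings[p][q]) for p in personas]) for q in questions
    ]
    # Within-persona variability: spread of repeated answers for one persona.
    within = [pstdev(ratings[p][q]) for p in personas for q in questions]

    s = mean(across)
    mean_within = mean(within)
    r = float("inf") if mean_within == 0 else 1.0 / mean_within
    return s, r


if __name__ == "__main__":
    toy = {
        "persona_a": {"q1": [1, 2], "q2": [4, 4]},
        "persona_b": {"q1": [5, 4], "q2": [4, 5]},
    }
    print(susceptibility_and_robustness(toy))
```

Under these assumed definitions, a model that answers identically for every persona has S = 0, and a model whose repeated answers never vary has an unbounded R.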

preprint, 2022, arXiv

Effective Sample Size, Dimensionality, and Generalization in Covariate Shift Adaptation

In supervised learning, training and test datasets are often sampled from distinct distributions. Domain adaptation techniques are thus required. Covariate shift adaptation yields good generalization performance when domains differ only by the marginal distribution of features. Covariate shift adaptation is usually implemented using importance weighting, which may fail, according to common wisdom, due to small effective sample sizes (ESS). Previous research argues this scenario is more common in high-dimensional settings. However, how effective sample size, dimensionality, and model performance/generalization are formally related in supervised learning, considering the context of covariate shift adaptation, is still somewhat obscure in the literature. Thus, a main challenge is presenting a unified theory connecting those points. Hence, in this paper, we focus on building a unified view connecting the ESS, data dimensionality, and generalization in the context of covariate shift adaptation. Moreover, we also demonstrate how dimensionality reduction or feature selection can increase the ESS, and argue that our results support dimensionality reduction before covariate shift adaptation as a good practice.
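
The link between importance weighting, ESS, and dimensionality can be illustrated with a small sketch. It assumes the common Kish definition ESS = (sum of weights)^2 / (sum of squared weights), which the abstract does not spell out, and a hypothetical toy setup in which training features are drawn from N(0, I) and the test distribution is N(shift, I). The function names and parameter values are illustrative.

```python
import math
import random


def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


def gaussian_shift_weights(n, dim, shift, seed=0):
    """Importance weights p_test(x) / p_train(x) for training data x ~ N(0, I)
    and a test distribution N(shift * 1, I) (hypothetical toy setup)."""
    rng = random.Random(seed)
    weights = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Log density ratio of N(shift, 1) to N(0, 1), summed over coordinates.
        log_w = sum(shift * xi - 0.5 * shift * shift for xi in x)
        weights.append(math.exp(log_w))
    return weights


if __name__ == "__main__":
    for dim in (1, 5, 20):
        w = gaussian_shift_weights(n=2000, dim=dim, shift=0.5)
        print(dim, round(effective_sample_size(w), 1))
```

In this toy setup the same per-coordinate shift is applied in every dimension, so the weights become more uneven as the dimension grows and the ESS shrinks, which is the scenario the abstract refers to as more common in high-dimensional settings.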

preprint, 2022, arXiv

Restricted Boltzmann Machine Flows and The Critical Temperature of Ising models

We explore alternative experimental setups for the iterative sampling (flow) from Restricted Boltzmann Machines (RBM) mapped on the temperature space of square lattice Ising models by a neural network thermometer. This framework has been introduced to explore connections between RBM-based deep neural networks and the Renormalization Group (RG). It has been found that, under certain conditions, the flow of an RBM trained with Ising spin configurations approaches in the temperature space a value around the critical one: $ k_B T_c / J \approx 2.269$. In this paper we consider datasets with no information about model topology to argue that a neural network thermometer is not an accurate way to detect whether the RBM has learned scale invariance or not.
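
The quoted critical value matches Onsager's exact result for the square-lattice Ising model, $k_B T_c / J = 2 / \ln(1 + \sqrt{2})$. The short check below evaluates that expression; the function name is hypothetical and nothing here reproduces the paper's RBM flow.

```python
import math


def ising_square_critical_temperature():
    """Onsager's exact critical temperature of the 2D square-lattice
    Ising model, expressed in units of J / k_B."""
    return 2.0 / math.log(1.0 + math.sqrt(2.0))


if __name__ == "__main__":
    print(round(ising_square_critical_temperature(), 3))
```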

preprint, 2013, arXiv

Moral foundations in an interacting neural networks society

The moral foundations theory holds that people, across cultures, tend to consider a small number of dimensions when classifying issues on a moral basis. The data also show that the statistics of weights attributed to each moral dimension are related to self-declared political affiliation, which in turn has been connected to cognitive learning styles by recent literature in neuroscience and psychology. Inspired by these data, we propose a simple statistical mechanics model with interacting neural networks classifying vectors and learning from members of their social neighborhood about their average opinion on a large set of issues. The purpose of learning is to reduce dissension among agents even when disagreeing. We consider a family of learning algorithms parametrized by δ, which represents the importance given to corroborating (same sign) opinions. We define an order parameter that quantifies the diversity of opinions in a group with homogeneous learning style. Using Monte Carlo simulations and a mean field approximation we find the relation between the order parameter and the learning parameter δ at a temperature we associate with the importance of social influence in a given group. In concordance with data, groups that rely more strongly on corroborating evidence sustain less opinion diversity. We discuss predictions of the model and propose possible experimental tests.

preprint, 2013, arXiv

Statistical mechanics of reputation systems in autonomous networks

Reputation systems seek to infer which members of a community can be trusted based on ratings they issue about each other. We construct a Bayesian inference model and simulate approximate estimates using belief propagation (BP). The model is then mapped onto computing equilibrium properties of a spin glass in a random field and analyzed by employing the replica symmetric cavity approach. Having the fraction of trustful nodes and environment noise level as control parameters, we evaluate the theoretical performance in terms of estimation error and the robustness of the BP approximation in different scenarios. Regions of degraded performance are then explained by the convergence properties of the BP algorithm and by the emergence of a glassy phase.

preprint, 2012, arXiv

Altruism can proliferate through group/kin selection despite high random gene flow

The ways in which natural selection can allow the proliferation of cooperative behavior have long been seen as a central problem in evolutionary biology. Most of the literature has focused on interactions between pairs of individuals and on linear public goods games. This emphasis led to the conclusion that even modest levels of migration would pose a serious problem to the spread of altruism in group structured populations. Here we challenge this conclusion, by analyzing evolution in a framework which allows for complex group interactions and random migration among groups. We conclude that contingent forms of strong altruism can spread when rare under realistic group sizes and levels of migration. Our analysis combines group-centric and gene-centric perspectives, allows for arbitrary strength of selection, and leads to extensions of Hamilton's rule for the spread of altruistic alleles, applicable under broad conditions.

preprint, 2012, arXiv

Invasion, polymorphic equilibria and fixation of a mutant social allele in group structured populations

Stable mixtures of cooperators and defectors are often seen in nature. This fact is at odds with predictions based on linear public goods games under weak selection. That model implies fixation either of cooperators or of defectors, and the former scenario requires a level of group relatedness larger than the cost/benefit ratio, being therefore expected only if there is either kin recognition or a very low cost/benefit ratio, or else under stringent conditions with low gene flow. This motivated us to study here social evolution in a large class of group structured populations, with arbitrary multi-individual interactions among group members and random migration among groups. Under the assumption of weak selection, we analyze the equilibria and their stability. For some significant models of social evolution with non-linear fitness functions, including contingent behavior in iterated public goods games and threshold models, we show that three regimes occur, depending on the migration rate among groups. For sufficiently high migration rates, a rare cooperative allele A cannot invade a monomorphic population of asocial alleles N. For sufficiently low values of the migration rate, allele A can invade the population, when rare, and then fixate, eliminating N. For intermediate values of the migration rate, allele A can invade the population, when rare, producing a polymorphic equilibrium, in which it coexists with N. The equilibria and their stability do not depend on the details of the population structure. The levels of migration (gene flow) and group relatedness that allow for invasion of the cooperative allele leading to polymorphic equilibria with the non-cooperative allele are common in nature.

preprint, 2012, arXiv

The Taylor-Frank method cannot be applied to some biologically important, continuous fitness functions

The Taylor-Frank method for making kin selection models when fitness is a nonlinear function of a continuous phenotype requires this function to be differentiable. This assumption sometimes fails for biologically important fitness functions, for instance in microbial data and the theory of repeated n-person games, even when fitness functions are smooth and continuous. In these cases, the Taylor-Frank methodology cannot be used, and a more general form of direct fitness must replace the standard one to account for kin selection, even under weak selection.

preprint, 2011, arXiv

Agent-based Social Psychology: from Neurocognitive Processes to Social Data

Moral Foundation Theory states that groups of different observers may rely on partially dissimilar sets of moral foundations, thereby reaching different moral valuations. The use of functional imaging techniques has revealed a spectrum of cognitive styles with respect to the differential handling of novel or corroborating information that is correlated to political affiliation. Here we characterize the collective behavior of an agent-based model whose inter-individual interactions due to information exchange in the form of opinions are in qualitative agreement with experimental neuroscience data. The main conclusion derived connects the existence of diversity in the cognitive strategies and statistics of the sets of moral foundations and suggests that this connection arises from interactions between agents. Thus a simple interacting agent model, whose interactions are in accord with empirical data on conformity and learning processes, presents statistical signatures consistent with moral judgment patterns of conservatives and liberals as obtained by survey studies of social psychology.

preprint, 2011, arXiv

Two-level Fisher-Wright framework with selection and migration: An approach to studying evolution in group structured populations

A framework for the mathematical modeling of evolution in group structured populations is introduced. The population is divided into a fixed large number of groups of fixed size. From generation to generation, new groups are formed that descend from previous groups, through a two-level Fisher-Wright process, with selection between groups and within groups and with migration between groups at rate $m$. When $m=1$, the framework reduces to the often used trait-group framework, so that our setting can be seen as an extension of that approach. Our framework allows the analysis of previously introduced models in which altruists and non-altruists compete, and provides new insights into these models. We focus on the situation in which initially there is a single altruistic allele in the population, and no further mutations occur. The main questions are conditions for the viability of that altruistic allele to spread, and the fashion in which it spreads when it does. Because our results and methods are rigorous, we see them as shedding light on various controversial issues in this field, including the role of Hamilton's rule, and of the Price equation, the relevance of linearity in fitness functions and the need to only consider pairwise interactions, or weak selection. In this paper we analyze the early stages of the evolution, during which the number of altruists is small compared to the size of the population. We show that during this stage the evolution is well described by a multitype branching process. The driving matrix for this process can be obtained, reducing the problem of determining when the altruistic gene is viable to a comparison between the leading eigenvalue of that matrix, and the fitness of the non-altruists before the altruistic gene appeared. This leads to a generalization of Hamilton's condition for the viability of a mutant gene.
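
The viability criterion described above, comparing the leading eigenvalue of the driving matrix with the fitness of the non-altruists, can be sketched as follows. The matrix entries and the baseline fitness are hypothetical placeholders, and power iteration is simply one convenient way to obtain the leading eigenvalue of a non-negative matrix; the abstract does not say how the authors compute it.

```python
def leading_eigenvalue(matrix, iterations=500):
    """Power iteration for the dominant eigenvalue of a non-negative
    square matrix given as a list of rows."""
    n = len(matrix)
    vec = [1.0] * n
    eig = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        eig = max(abs(x) for x in nxt)
        if eig == 0.0:
            return 0.0
        # Rescale so the largest entry is 1; eig then tracks the growth factor.
        vec = [x / eig for x in nxt]
    return eig


def altruist_is_viable(driving_matrix, baseline_fitness):
    """Assumed reading of the criterion: the rare allele can spread when the
    leading eigenvalue exceeds the fitness of the resident non-altruists."""
    return leading_eigenvalue(driving_matrix) > baseline_fitness


if __name__ == "__main__":
    # Hypothetical 2-type driving matrix (e.g. groups with 1 or 2 altruists).
    example = [[1.0, 0.5], [0.2, 0.9]]
    print(leading_eigenvalue(example), altruist_is_viable(example, 1.0))
```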

preprint, 2009, arXiv

An information theoretic approach to statistical dependence: copula information

We discuss the connection between information and copula theories by showing that a copula can be employed to decompose the information content of a multivariate distribution into marginal and dependence components, with the latter quantified by the mutual information. We define the information excess as a measure of deviation from a maximum entropy distribution. The idea of marginal invariant dependence measures is also discussed and used to show that empirical linear correlation underestimates the amplitude of the actual correlation in the case of non-Gaussian marginals. The mutual information is shown to provide an upper bound for the asymptotic empirical log-likelihood of a copula. An analytical expression for the information excess of T-copulas is provided, allowing for simple model identification within this family. We illustrate the framework in a financial data set.
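
The marginal/dependence decomposition can be made concrete in the simplest case. The sketch below swaps in a bivariate Gaussian copula (not the T-copula treated in the paper), for which the mutual information has the closed form $I = -\tfrac{1}{2}\ln(1 - \rho^2)$, and uses the bivariate identity $h(X, Y) = h(X) + h(Y) - I(X; Y)$. Function names are hypothetical and entropies are in nats.

```python
import math


def gaussian_copula_mutual_information(rho):
    """Mutual information (nats) of a bivariate Gaussian copula
    with correlation parameter rho, where -1 < rho < 1."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -0.5 * math.log(1.0 - rho * rho)


def joint_entropy(marginal_entropies, mutual_information):
    """Bivariate decomposition h(X, Y) = h(X) + h(Y) - I(X; Y)."""
    return sum(marginal_entropies) - mutual_information


if __name__ == "__main__":
    rho = 0.5
    # Differential entropy of a standard normal marginal.
    h_std_normal = 0.5 * math.log(2.0 * math.pi * math.e)
    mi = gaussian_copula_mutual_information(rho)
    print(mi, joint_entropy([h_std_normal, h_std_normal], mi))
```

With standard normal marginals this reproduces the entropy of a bivariate Gaussian, $\ln(2\pi e) + \tfrac{1}{2}\ln(1 - \rho^2)$, which is a quick consistency check on the decomposition.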