Source author record

Aleksandra M. Walczak

Aleksandra M. Walczak appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Populations and Evolution, Molecular Networks, Quantitative Methods, Genomics, Biological Physics, cond-mat.stat-mech, cond-mat.dis-nn, Cell Behavior, Biomolecules, Machine Learning, physics.data-an, Robotics, Subcellular Processes, Systems and Control

Catalog footprint

What is connected

41 works

14 topics

4 close collaborators

preprint, 2022, arXiv

MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories

Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice. One class of methods uses data simulated with different parameters to infer models of the likelihood-to-evidence ratio, or equivalently the posterior function. Here we frame the inference task as an estimation of an energy function parametrized with an artificial neural network. We present an intuitive approach where the optimal model of the likelihood-to-evidence ratio is found by maximizing the likelihood of simulated data. Within this framework, the connection between the task of simulation-based inference and mutual information maximization is clear, and we show how several known methods of posterior estimation relate to alternative lower bounds to mutual information. These distinct objective functions aim at the same optimal energy form and therefore can be directly benchmarked. We compare their accuracy in the inference of model parameters, focusing on four dynamical systems that encompass common challenges in time series analysis: dynamics driven by multiplicative noise, nonlinear interactions, chaotic behavior, and high-dimensional parameter space.

preprint, 2022, arXiv

NoisET: Noise learning and Expansion detection of T-cell receptors

High-throughput sequencing of T- and B-cell receptors makes it possible to track immune repertoires across time, in different tissues, in acute and chronic diseases and in healthy individuals. However, quantitative comparison between repertoires is confounded by variability in the read count of each receptor clonotype due to sampling, library preparation, and expression noise. We review methods for accounting for both biological and experimental noise and present an easy-to-use Python package, NoisET, that implements and generalizes a previously developed Bayesian method. It can be used to learn experimental noise models for repertoire sequencing from replicates, and to detect responding clones following a stimulus. We test the package on different repertoire sequencing technologies and datasets. We review how such approaches have been used to identify responding clonotypes in vaccination and disease data. Availability: NoisET is freely available to use with source code at github.com/statbiophys/NoisET.

preprint, 2021, arXiv

Affinity maturation for an optimal balance between long-term immune coverage and short-term resource constraints

In order to target threatening pathogens, the adaptive immune system performs a continuous reorganization of its lymphocyte repertoire. Following an immune challenge, the B cell repertoire can evolve cells of increased specificity for the encountered strain. This process of affinity maturation generates a memory pool whose diversity and size remain difficult to predict. We assume that the immune system follows a strategy that maximizes the long-term immune coverage and minimizes the short-term metabolic costs associated with affinity maturation. This strategy is defined as an optimal decision process on a finite dimensional phenotypic space, where a pre-existing population of naive cells is sequentially challenged with a neutrally evolving strain. We unveil a trade-off between immune protection against future strains and the necessary reorganization of the repertoire. This plasticity of the repertoire drives the emergence of distinct regimes for the size and diversity of the memory pool, depending on the density of naive cells and on the mutation rate of the strain. The model predicts power-law distributions of clonotype sizes observed in data, and rationalizes antigenic imprinting as a strategy to minimize metabolic costs while keeping good immune protection against future strains.

preprint, 2021, arXiv

Antigenic waves of virus-immune co-evolution

The evolution of many microbes and pathogens, including circulating viruses such as seasonal influenza, is driven by immune pressure from the host population. In turn, the immune systems of infected populations get updated, chasing viruses even further away. Quantitatively understanding how these dynamics result in observed patterns of rapid pathogen and immune adaptation is instrumental to epidemiological and evolutionary forecasting. Here we present a mathematical theory of co-evolution between immune systems and viruses in a finite-dimensional antigenic space, which describes the cross-reactivity of viral strains and immune systems primed by previous infections. We show the emergence of an antigenic wave that is pushed forward and canalized by cross-reactivity. We obtain analytical results for shape, speed, and angular diffusion of the wave. In particular, we show that viral-immune co-evolution generates a new emergent timescale, the persistence time of the wave's direction in antigenic space, which can be much longer than the coalescence time of the viral population. We compare these dynamics to the observed antigenic turnover of influenza strains, and we discuss how the dimensionality of antigenic space impacts on the predictability of the evolutionary dynamics. Our results provide a concrete and tractable framework to describe pathogen-host co-evolution.

preprint, 2020, arXiv

Building general Langevin models from discrete data sets

Many living and complex systems exhibit second order emergent dynamics. Limited experimental access to the configurational degrees of freedom results in data that appears to be generated by a non-Markovian process. This poses a challenge in the quantitative reconstruction of the model from experimental data, even in the simple case of equilibrium Langevin dynamics of Hamiltonian systems. We develop a novel Bayesian inference approach to learn the parameters of such stochastic effective models from discrete finite length trajectories. We first discuss the failure of naive inference approaches based on the estimation of derivatives through finite differences, regardless of the time resolution and the length of the sampled trajectories. We then derive, adopting higher order discretization schemes, maximum likelihood estimators for the model parameters that provide excellent results even with moderately long trajectories. We apply our method to second order models of collective motion and show that our results also hold in the presence of interactions.

preprint, 2020, arXiv

Immune Fingerprinting through Repertoire Similarity

Immune repertoires provide a unique fingerprint reflecting the immune history of individuals, with potential applications in precision medicine. However, the question of how personal that information is and how it can be used to identify individuals has not been explored. Here, we show that individuals can be uniquely identified from repertoires of just a few thousand lymphocytes. We present "Immprint," a classifier using an information-theoretic measure of repertoire similarity to distinguish pairs of repertoire samples coming from the same versus different individuals. Using published T-cell receptor repertoires and statistical modeling, we tested its ability to identify individuals with great accuracy, including identical twins, by computing false positive and false negative rates $< 10^{-6}$ from samples composed of 10,000 T-cells. We verified through longitudinal datasets and simulations that the method is robust to acute infections and the passage of time. These results emphasize the private and personal nature of repertoire data.

preprint, 2020, arXiv

Learning the heterogeneous hypermutation landscape of immunoglobulins from high-throughput repertoire data

Somatic hypermutations of immunoglobulin (Ig) genes occurring during affinity maturation drive B-cell receptors' ability to evolve strong binding to their antigenic targets. The landscape of these mutations is highly heterogeneous, with certain regions of the Ig gene being preferentially targeted. However, a rigorous quantification of this bias has been difficult because of phylogenetic correlations between sequences and the interference of selective forces. Here, we present an approach that corrects for these issues, and use it to learn a model of hypermutation preferences from a recently published large IgH repertoire dataset. The obtained model predicts mutation profiles accurately and in a reproducible way, including in the previously uncharacterized Complementarity Determining Region 3, revealing that both the sequence context of the mutation and its absolute position along the gene are important. In addition, we show that hypermutations occurring concomitantly along B-cell lineages tend to co-localize, suggesting a possible mechanism for accelerating affinity maturation.

preprint, 2020, arXiv

Longitudinal high-throughput TCR repertoire profiling reveals the dynamics of T cell memory formation after mild COVID-19 infection

COVID-19 is a global pandemic caused by the SARS-CoV-2 coronavirus. T cells play a key role in the adaptive antiviral immune response by killing infected cells and facilitating the selection of virus-specific antibodies. However, neither the dynamics and cross-reactivity of the SARS-CoV-2-specific T cell response nor the diversity of resulting immune memory are well understood. In this study we use longitudinal high-throughput T cell receptor (TCR) sequencing to track changes in the T cell repertoire following two mild cases of COVID-19. In both donors we identified CD4+ and CD8+ T cell clones with transient clonal expansion after infection. The antigen specificity of CD8+ TCR sequences to SARS-CoV-2 epitopes was confirmed by both MHC tetramer binding and presence in a large database of SARS-CoV-2 epitope-specific TCRs. We describe characteristic motifs in TCR sequences of COVID-19-reactive clones and show preferential occurrence of these motifs in a publicly available large dataset of repertoires from COVID-19 patients. We show that in both donors the majority of infection-reactive clonotypes acquire memory phenotypes. Certain T cell clones were detected in the memory fraction at the pre-infection timepoint, suggesting participation of pre-existing cross-reactive memory T cells in the immune response to SARS-CoV-2.

preprint, 2020, arXiv

On generative models of T-cell receptor sequences

T-cell receptors (TCR) are key proteins of the adaptive immune system, generated randomly in each individual, whose diversity underlies our ability to recognize infections and malignancies. Modeling the distribution of TCR sequences is of key importance for immunology and medical applications. Here, we compare two inference methods trained on high-throughput sequencing data: a knowledge-guided approach, which accounts for the details of sequence generation, supplemented by a physics-inspired model of selection; and a knowledge-free Variational Auto-Encoder based on deep artificial neural networks. We show that the knowledge-guided model outperforms the deep network approach at predicting TCR probabilities, while being more interpretable, at a lower computational cost.

preprint, 2020, arXiv

Population variability in the generation and thymic selection of T-cell repertoires

The diversity of T-cell receptor (TCR) repertoires is achieved by a combination of two intrinsically stochastic steps: random receptor generation by VDJ recombination, and selection based on the recognition of random self-peptides presented on the major histocompatibility complex. These processes lead to a large receptor variability within and between individuals. However, the characterization of the variability is hampered by the limited size of the sampled repertoires. We introduce a new software tool, SONIA, to facilitate inference of individual-specific computational models for the generation and selection of the TCR beta chain (TRB) from sequenced repertoires of 651 individuals, separating and quantifying the variability of the two processes of generation and selection in the population. We find not only that most of the variability is driven by the VDJ generation process, but also that there is a large degree of consistency between individuals, with the inter-individual variance of repertoires being about 2% of the intra-individual variance. Known viral-specific TCRs follow the same generation and selection statistics as all TCRs.

preprint, 2016, arXiv

Local equilibrium in bird flocks

The correlated motion of flocks is an instance of global order emerging from local interactions. An essential difference with analogous ferromagnetic systems is that flocks are active: animals move relative to each other, dynamically rearranging their interaction network. The effect of this off-equilibrium element is well studied theoretically, but its impact on actual biological groups deserves more experimental attention. Here, we introduce a novel dynamical inference technique, based on the principle of maximum entropy, which accommodates network rearrangements and overcomes the problem of slow experimental sampling rates. We use this method to infer the strength and range of alignment forces from data of starling flocks. We find that local bird alignment happens on a much faster timescale than neighbour rearrangement. Accordingly, equilibrium inference, which assumes a fixed interaction network, gives results consistent with dynamical inference. We conclude that bird orientations are in a state of local quasi-equilibrium over the interaction length scale, providing firm ground for the applicability of statistical physics in certain active systems.

preprint, 2016, arXiv

Rényi entropy, abundance distribution and the equivalence of ensembles

Distributions of abundances or frequencies play an important role in many fields of science, from biology to sociology, as does the Rényi entropy, which measures the diversity of a statistical ensemble. We derive a mathematical relation between the abundance distribution and the Rényi entropy, by analogy with the equivalence of ensembles in thermodynamics. The abundance distribution is mapped onto the density of states, and the Rényi entropy to the free energy. The two quantities are related in the thermodynamic limit by a Legendre transform, by virtue of the equivalence between the micro-canonical and canonical ensembles. In this limit, we show how the Rényi entropy can be constructed geometrically from rank-frequency plots. This mapping predicts that non-concave regions of the rank-frequency curve should result in kinks in the Rényi entropy as a function of its order. We illustrate our results on simple examples, and emphasize the limitations of the equivalence of ensembles when a thermodynamic limit is not well defined. Our results help choose reliable diversity measures based on the experimental accuracy of the abundance distributions in particular frequency ranges.
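
As a rough illustration of the diversity measure discussed in this abstract, the sketch below computes the Rényi entropy of order $\alpha$ from a list of abundances, using the standard definition $H_\alpha = \frac{1}{1-\alpha}\ln\sum_i p_i^\alpha$ with the Shannon entropy as the $\alpha \to 1$ limit. The function name `renyi_entropy` and the example counts are hypothetical and are not taken from the paper.

```python
import math

def renyi_entropy(counts, alpha):
    """Renyi entropy (in nats) of order alpha for a list of abundances."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Convert abundances to frequencies, dropping empty classes.
    freqs = [c / total for c in counts if c > 0]
    if math.isclose(alpha, 1.0):
        # The alpha -> 1 limit is the Shannon entropy.
        return -sum(p * math.log(p) for p in freqs)
    return math.log(sum(p ** alpha for p in freqs)) / (1.0 - alpha)

# Illustrative abundances only (assumed, not data from the paper).
abundances = [50, 30, 15, 5]
for order in (0, 1, 2):
    print(order, renyi_entropy(abundances, order))
```

Low orders weight rare classes more heavily (order 0 counts the classes, giving the log of the richness), while higher orders are dominated by the most abundant classes. This is why the reliability of a given order depends on which frequency range of the abundance distribution is well measured.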

preprint, 2015, arXiv

Diversity of immune strategies explained by adaptation to pathogen statistics

Biological organisms have evolved a wide range of immune mechanisms to defend themselves against pathogens. Beyond molecular details, these mechanisms differ in how protection is acquired, processed and passed on to subsequent generations -- differences that may be essential to long-term survival. Here, we introduce a mathematical framework to compare the long-term adaptation of populations as a function of the pathogen dynamics that they experience and of the immune strategy that they adopt. We find that the two key determinants of an optimal immune strategy are the frequency and the characteristic timescale of the pathogens. Depending on these two parameters, our framework identifies distinct modes of immunity, including adaptive, innate, bet-hedging and CRISPR-like immunities, which recapitulate the diversity of natural immune systems.

preprint, 2015, arXiv

Extending the dynamic range of transcription factor action by translational regulation

A crucial step in the regulation of gene expression is binding of transcription factor (TF) proteins to regulatory sites along the DNA. But transcription factors act at nanomolar concentrations, and noise due to random arrival of these molecules at their binding sites can severely limit the precision of regulation. Recent work on the optimization of information flow through regulatory networks indicates that the lower end of the dynamic range of concentrations is simply inaccessible, overwhelmed by the impact of this noise. Motivated by the behavior of homeodomain proteins, such as the maternal morphogen Bicoid in the fruit fly embryo, we suggest a scheme in which transcription factors also act as indirect translational regulators, binding to the mRNA of other transcription factors. Intuitively, each mRNA molecule acts as an independent sensor of the TF concentration, and averaging over these multiple sensors reduces the noise. We analyze information flow through this new scheme and identify conditions under which it outperforms direct transcriptional regulation. Our results suggest that the dual role of homeodomain proteins is not just a historical accident, but a solution to a crucial physics problem in the regulation of gene expression.
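
The averaging argument in this abstract can be made concrete with a textbook estimate. Assume, purely for illustration, that each of $N$ mRNA molecules gives an independent, unbiased reading $c_i$ of the TF concentration $c$ with the same variance $\sigma^2$; these symbols are assumptions and are not taken from the paper. Then

$$
\hat{c} = \frac{1}{N}\sum_{i=1}^{N} c_i, \qquad \mathrm{Var}(\hat{c}) = \frac{\sigma^2}{N}, \qquad \frac{\delta\hat{c}}{c} = \frac{1}{\sqrt{N}}\,\frac{\sigma}{c},
$$

so the relative error of the averaged readout falls as $1/\sqrt{N}$. Correlations between sensors would weaken this gain.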

preprint, 2015, arXiv

Flocking and turning: a new model for self-organized collective motion

Birds in a flock move in a correlated way, resulting in large polarization of velocities. A good understanding of this collective behavior exists for linear motion of the flock. Yet in actual flocks, the center of mass of the group often turns, giving rise to more complicated dynamics while keeping strong polarization of the flock. Here we propose novel dynamical equations for the collective motion of polarized animal groups that account for correlated turning including solely social forces. We exploit rotational symmetries and conservation laws of the problem to formulate a theory in terms of generalized coordinates of motion for the velocity directions akin to a Hamiltonian formulation for rotations. We explicitly derive the correspondence between this formulation and the dynamics of the individual velocities, thus obtaining a new model of collective motion. In the appropriate overdamped limit we recover the well-known Vicsek model, which dissipates rotational information and does not allow for polarized turns. Although the new model has its most vivid success in describing turning groups, its dynamics is intrinsically different from previous ones in a wide dynamical regime, while reducing to the hydrodynamic description of Toner and Tu at very large length-scales. The derived framework is therefore general and it may describe the collective motion of any strongly polarized active matter system.

preprint, 2015, arXiv

Fluctuating fitness shapes the clone size distribution of immune repertoires

The adaptive immune system relies on the diversity of receptors expressed on the surface of B and T-cells to protect the organism from a vast amount of pathogenic threats. The proliferation and degradation dynamics of different cell types (B cells, T cells, naive, memory) is governed by a variety of antigenic and environmental signals, yet the observed clone sizes follow a universal power law distribution. Guided by this reproducibility, we propose effective models of somatic evolution where cell fate depends on an effective fitness. This fitness is determined by growth factors acting either on clones of cells with the same receptor responding to specific antigens, or directly on single cells with no regard for clones. We identify fluctuations in the fitness acting specifically on clones as the essential ingredient leading to the observed distributions. Combining our models with experiments, we characterize the scale of fluctuations in antigenic environments and we provide tools to identify the relevant growth signals in different tissues and organisms. Our results generalize to any evolving population in a fluctuating environment.

preprint, 2015, arXiv

Habitat Fluctuations Drive Species Covariation in the Human Microbiota

Two species with similar resource requirements respond in a characteristic way to variations in their habitat -- their abundances rise and fall in concert. We use this idea to learn how bacterial populations in the microbiota respond to habitat conditions that vary from person-to-person across the human population. Our mathematical framework shows that habitat fluctuations are sufficient for explaining intra-bodysite correlations in relative species abundances from the Human Microbiome Project. We explicitly show that the relative abundances of phylogenetically related species are positively correlated and can be predicted from taxonomic relationships. We identify a small set of functional pathways related to metabolism and maintenance of the cell wall that form the basis of a common resource sharing niche space of the human microbiota.

preprint, 2015, arXiv

Inferring processes underlying B-cell repertoire diversity

We quantify the VDJ recombination and somatic hypermutation processes in human B-cells using probabilistic inference methods on high-throughput DNA sequence repertoires of human B-cell receptor heavy chains. Our analysis captures the statistical properties of the naive repertoire, first after its initial generation via VDJ recombination and then after selection for functionality. We also infer statistical properties of the somatic hypermutation machinery (exclusive of subsequent effects of selection). Our main results are the following: the B-cell repertoire is substantially more diverse than T-cell repertoires, due to longer junctional insertions; sequences that pass initial selection are distinguished by having a higher probability of being generated in a VDJ recombination event; somatic hypermutations have a non-uniform distribution along the V gene that is well explained by an independent site model for the sequence context around the hypermutation site.

preprint, 2015, arXiv

Noise expands the response range of the Bacillus subtilis competence circuit

Gene regulatory circuits must contend with intrinsic noise that arises due to finite numbers of proteins. While some circuits act to reduce this noise, others appear to exploit it. A striking example is the competence circuit in Bacillus subtilis, which exhibits much larger noise in the duration of its competence events than a synthetically constructed analog that performs the same function. Here, using stochastic modeling and fluorescence microscopy, we show that this larger noise allows cells to exit terminal phenotypic states, which expands the range of stress levels to which cells are responsive and leads to phenotypic heterogeneity at the population level. This is an important example of how noise confers a functional benefit in a genetic decision-making circuit.

preprint, 2015, arXiv

repgenHMM: a dynamic programming tool to infer the rules of immune receptor generation from sequence data

The diversity of the immune repertoire is initially generated by random rearrangements of the receptor gene during early T and B cell development. Rearrangement scenarios are composed of random events -- choices of gene templates, base pair deletions and insertions -- described by probability distributions. Not all scenarios are equally likely, and the same receptor sequence may be obtained in several different ways. Quantifying the distribution of these rearrangements is an essential baseline for studying the immune system diversity. Inferring the properties of the distributions from receptor sequences is a computationally hard problem, requiring enumerating every possible scenario for every sampled receptor sequence. We present a Hidden Markov model, which accounts for all plausible scenarios that can generate the receptor sequences. We developed and implemented a method based on the Baum-Welch algorithm that can efficiently infer the parameters for the different events of the rearrangement process. We tested our software tool on sequence data for both the alpha and beta chains of the T cell receptor. To test the validity of our algorithm, we also generated synthetic sequences produced by a known model, and confirmed that its parameters could be accurately inferred back from the sequences. The inferred model can be used to generate synthetic sequences, to calculate the probability of generation of any receptor sequence, as well as the theoretical diversity of the repertoire. We estimate this diversity to be $\approx 10^{23}$ for human T cells. The model gives a baseline to investigate the selection and dynamics of immune repertoires.
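
To illustrate why a Hidden Markov model avoids enumerating every scenario, the sketch below computes the likelihood of an observed sequence with the forward recursion (the quantity the Baum-Welch algorithm builds on) and checks it against explicit enumeration of all hidden paths. This is a generic toy HMM, not the repgenHMM model. The function names, the two-state structure and all probabilities are assumptions chosen only for illustration.

```python
from itertools import product

def forward_likelihood(obs, init, trans, emit):
    """Probability of an observed sequence, summed over all hidden-state paths."""
    n_states = len(init)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)

def brute_force_likelihood(obs, init, trans, emit):
    """Same quantity by explicit enumeration of every path (exponential cost)."""
    total = 0.0
    for path in product(range(len(init)), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total

# Hypothetical two-state toy model; the numbers are illustrative only.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.2, 0.8]]   # trans[previous_state][next_state]
emit = [[0.9, 0.1], [0.3, 0.7]]    # emit[state][symbol]
obs = [0, 1, 1, 0]

print(forward_likelihood(obs, init, trans, emit))
print(brute_force_likelihood(obs, init, trans, emit))
```

The forward recursion costs a number of operations linear in the sequence length, whereas the enumeration grows exponentially with it. The two printed values should agree up to floating-point rounding.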

preprint, 2015, arXiv

Tiling solutions for optimal biological sensing

Biological systems, from cells to organisms, must respond to the ever-changing environment in order to survive and function. This is not a simple task given the often random nature of the signals they receive, as well as the intrinsically stochastic, many-body and often self-organized nature of the processes that control their sensing and response, and their limited resources. Despite a wide range of scales and functions that can be observed in the living world, some common principles that govern the behavior of biological systems emerge. Here I review two examples of very different biological problems: information transmission in gene regulatory networks and diversity of adaptive immune receptor repertoires that protect us from pathogens. I discuss the trade-offs that physical laws impose on these systems and show that the optimal designs of both immune repertoires and gene regulatory networks display similar discrete tiling structures. These solutions rely on locally non-overlapping placements of the responding elements (genes and receptors) that, overall, cover space nearly uniformly.

preprint, 2015, arXiv

Trade-offs in delayed information transmission in biochemical networks

In order to transmit biochemical signals, biological regulatory systems dissipate energy with concomitant entropy production. Additionally, signaling often takes place in challenging environmental conditions. In a simple model regulatory circuit given by an input and a delayed output, we explore the trade-offs between information transmission and the system's energetic efficiency. We determine the maximally informative network, given a fixed amount of entropy production and delayed response, exploring both the case with and without feedback. We find that feedback allows the circuit to overcome energy constraints and transmit close to the maximum available information even in the dissipationless limit. Negative feedback loops, characteristic of shock responses, are optimal at high dissipation. Close to equilibrium, positive feedback loops, known for their stability, become more informative. Asking how the signaling network should be constructed to best function in the worst possible environment, rather than an optimally tuned one or in steady state, we discover that at large dissipation the same universal motif is optimal in all of these conditions.

preprint, 2014, arXiv

Capturing coevolutionary signals in repeat proteins

The analysis of correlations of amino acid occurrences in globular proteins has led to the development of statistical tools that can identify native contacts -- portions of the chains that come to close distance in folded structural ensembles. Here we introduce a statistical coupling analysis for repeat proteins -- natural systems for which the identification of domains remains challenging. We show that the inherent translational symmetry of repeat protein sequences introduces a strong bias in the pair correlations at precisely the length scale of the repeat-unit. Equalizing for this bias reveals true co-evolutionary signals from which local native-contacts can be identified. Importantly, parameter values obtained for all other interactions are not significantly affected by the equalization. We quantify the robustness of the procedure and assign confidence levels to the interactions, identifying the minimum number of sequences needed to extract evolutionary information in several repeat protein families. The overall procedure can be used to reconstruct the interactions at long distances, identifying the characteristics of the strongest couplings in each family, and can be applied to any system that appears translationally symmetric.

preprint, 2014, arXiv

How a well-adapted immune system is organized

The repertoire of lymphocyte receptors in the adaptive immune system protects organisms from diverse pathogens. A well-adapted repertoire should be tuned to the pathogenic environment to reduce the cost of infections. We develop a general framework for predicting the optimal repertoire that minimizes the cost of infections contracted from a given distribution of pathogens. The theory predicts that the immune system will have more receptors for rare antigens than expected from the frequency of encounters; individuals exposed to the same infections will have sparse repertoires that are largely different, but nevertheless exploit cross-reactivity to provide the same coverage of antigens; and the optimal repertoires can be reached via the dynamics of competitive binding of antigens by receptors, and selective amplification of stimulated receptors. Our results follow from a tension between the statistics of pathogen detection, which favor a broader receptor distribution, and the effects of cross-reactivity, which tend to concentrate the optimal repertoire onto a few highly abundant clones. Our predictions can be tested in high throughput surveys of receptor and pathogen diversity.

preprint, 2014, arXiv

Quantifying selection in immune receptor repertoires

The efficient recognition of pathogens by the adaptive immune system relies on the diversity of receptors displayed at the surface of immune cells. T-cell receptor diversity results from an initial random DNA editing process, called VDJ recombination, followed by functional selection of cells according to the interaction of their surface receptors with self and foreign antigenic peptides. To quantify the effect of selection on the highly variable elements of the receptor, we apply a probabilistic maximum likelihood approach to the analysis of high-throughput sequence data from the $\beta$-chain of human T-cell receptors. We quantify selection factors for V and J gene choice, and for the length and amino-acid composition of the variable region. Our approach is necessary to disentangle the effects of selection from biases inherent in the recombination process. Inferred selection factors differ little between donors, or between naive and memory repertoires. The number of sequences shared between donors is well-predicted by the model, indicating a purely stochastic origin of such "public" sequences. We find a significant correlation between biases induced by VDJ recombination and our inferred selection factors, together with a reduction of diversity during selection. Both effects suggest that natural selection acting on the recombination process has anticipated the selection pressures experienced during somatic evolution.

preprint, 2013, arXiv

Dynamical maximum entropy approach to flocking

We derive a new method to infer from data the out-of-equilibrium alignment dynamics of collectively moving animal groups, by considering the maximum entropy distribution consistent with temporal and spatial correlations of flight direction. When bird neighborhoods evolve rapidly, this dynamical inference correctly learns the parameters of the model, while a static one relying only on the spatial correlations fails. When neighbors change slowly and detailed balance is satisfied, we recover the static procedure. We demonstrate the validity of the method on simulated data. The approach is applicable to other systems of active matter.

preprint2013arXiv

Interference limits resolution of selection pressures from linked neutral diversity

Pervasive natural selection can strongly influence observed patterns of genetic variation, but these effects remain poorly understood when multiple selected variants segregate in nearby regions of the genome. Classical population genetics fails to account for interference between linked mutations, which grows increasingly severe as the density of selected polymorphisms increases. Here, we describe a simple limit that emerges when interference is common, in which the fitness effects of individual mutations play a relatively minor role. Instead, molecular evolution is determined by the variance in fitness within the population, defined over an effectively asexual segment of the genome (a "linkage block"). We exploit this insensitivity in a new "coarse-grained" coalescent framework, which approximates the effects of many weakly selected mutations with a smaller number of strongly selected mutations with the same variance in fitness. This approximation generates accurate and efficient predictions for the genetic diversity that cannot be summarized by a simple reduction in effective population size. However, these results suggest a fundamental limit on our ability to resolve individual selection pressures from contemporary sequence data alone, since a wide range of parameters yield nearly identical patterns of sequence variability.

preprint2013arXiv

The effect of phenotypic selection on stochastic gene expression

Genetically identical cells in the same population can take on phenotypically variable states, leading to differentiated responses to external signals, such as nutrients and drug-induced stress. Many models and experiments have focused on a description based on discrete phenotypic states. Here we consider the effects of selection acting on a single trait, which we explicitly link to the variable number of proteins expressed by a gene. Considering different regulatory models for the gene under selection, we calculate the steady-state distribution of expression levels and show how the population adapts its expression to enhance its fitness. We quantitatively relate the overall fitness of the population to the heritability of expression levels, and their diversity within the population. We show how selection can increase or decrease the variability in the population, alter the stability of bimodal states, and impact the switching rates between metastable attractors.

preprint2013arXiv

Time-dependent information transmission in a model regulatory circuit

Many biological regulatory systems process signals out of steady state and respond with a physiological delay. A simple model of regulation which respects these features shows how the ability of a delayed output to transmit information is limited: at short times by the timescale of the dynamic input, at long times by that of the dynamic output. We find that topologies of maximally informative networks correspond to commonly occurring biological circuits linked to stress response and that circuits functioning out of steady state may exploit absorbing states to transmit information optimally.

preprint2012arXiv

Genetic Diversity and the Structure of Genealogies in Rapidly Adapting Populations

Positive selection distorts the structure of genealogies and hence alters patterns of genetic variation within a population. Most analyses of these distortions focus on the signatures of hitchhiking due to hard or soft selective sweeps at a single genetic locus. However, in linked regions of rapidly adapting genomes, multiple beneficial mutations at different loci can segregate simultaneously within the population, an effect known as clonal interference. The resulting subtle interplay between hitchhiking and interference effects leaves a unique signature of rapid adaptation on genetic variation both at the selected sites and at linked neutral loci. Here, we introduce an effective coalescent theory (a "fitness-class coalescent") that describes how positive selection at many perfectly linked sites alters the structure of genealogies. We use this theory to calculate several simple statistics describing genetic variation within a rapidly adapting population, and to implement efficient backwards-time coalescent simulations which can be used to predict how clonal interference alters the expected patterns of molecular evolution.

preprint2012arXiv

Statistical inference of the generation probability of T-cell receptors from sequence repertoires

Stochastic rearrangement of germline DNA by VDJ recombination is at the origin of immune system diversity. This process is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Since any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on non-productive CDR3 sequences in T-cell DNA. We infer the joint distribution of the various generative events that occur when a new T-cell receptor gene is created. We find a rich picture of correlation (and absence thereof), providing insight into the molecular mechanisms involved. The generative event statistics are consistent between individuals, suggesting a universal biochemical process. Our distribution predicts the generation probability of any specific CDR3 sequence by the primitive recombination process, allowing us to quantify the potential diversity of the T-cell repertoire and to understand why some sequences are shared between individuals. We argue that the use of formal statistical inference methods, of the kind presented in this paper, will be essential for quantitative understanding of the generation and evolution of diversity in the adaptive immune system.

preprint2012arXiv

The Influence of Decoys on the Noise and Dynamics of Gene Expression

Many transcription factors bind to DNA with a remarkable lack of specificity, so that regulatory binding sites compete with an enormous number of non-regulatory 'decoy' sites. For an auto-regulated gene, we show decoy sites decrease noise in the number of unbound proteins to a Poisson limit that results from binding and unbinding. This noise buffering is optimized for a given protein concentration when decoys have a 1/2 probability of being occupied. Decoys linearly increase the time to approach steady state and exponentially increase the time to switch epigenetically between bistable states.

preprint2012arXiv

Transition path sampling algorithm for discrete many-body systems

We propose a new Monte Carlo method for efficiently sampling trajectories with fixed initial and final conditions in a system with discrete degrees of freedom. The method can be applied to any stochastic process with local interactions, including systems that are out of equilibrium. We combine the proposed path-sampling algorithm with thermodynamic integration to calculate transition rates. We demonstrate our method on the well studied 2D Ising model with periodic boundary conditions, and show agreement with other results both for large and small system sizes. The method scales well with the system size, allowing one to simulate systems with many degrees of freedom, and providing complementary information with respect to other algorithms.

preprint2011arXiv

The structure of allelic diversity in the presence of purifying selection

In the absence of selection, the structure of allelic diversity is described by the elegant sampling formula of Ewens. This formula has helped shape our expectations of empirical patterns of molecular variation. Along with coalescent theory, it provides statistical techniques for rejecting the null model of neutrality. However, we still do not fully understand the statistics of the allelic diversity we expect to see in the presence of natural selection. Earlier work has described the effects of strongly deleterious mutations linked to many neutral sites, and allelic variation in models where offspring fitness is unrelated to parental fitness, but it has proven difficult to understand allelic diversity in the presence of purifying selection at many linked sites. Here, we study the population genetics of infinitely many perfectly linked sites, some neutral and some deleterious. Our approach is based on studying the lineage structure within each class of individuals of similar fitness in the deleterious mutation-selection balance. Analogous to the Ewens sampling formula, we derive expressions for the likelihoods of any configuration of allelic types in a sample. We find that for moderate and weak selection pressures the patterns of allelic diversity cannot be described by a neutral model for any choice of the effective population size, indicating that there is power to detect selection from patterns of sampled allelic diversity.
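For readers unfamiliar with the neutral baseline mentioned in this abstract, the following is a minimal sketch of the standard Ewens sampling formula, which is not taken from the paper itself. The function name `ewens_probability` and the input convention (a list of counts `a_1, ..., a_n`, where `a_j` is the number of allelic types seen exactly `j` times) are assumptions made for illustration. The formula evaluated is P(a_1, ..., a_n) = n! / [θ(θ+1)...(θ+n-1)] × ∏_j θ^{a_j} / (j^{a_j} a_j!).

```python
from math import factorial, prod


def ewens_probability(counts, theta):
    """Neutral-model probability of an allelic configuration (Ewens sampling formula).

    counts[j - 1] is the number of allelic types that appear exactly j times
    in the sample (hypothetical input convention chosen for this sketch).
    theta is the scaled mutation rate.
    """
    # Sample size implied by the configuration: n = sum_j j * a_j
    n = sum(j * a for j, a in enumerate(counts, start=1))
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1)
    rising = prod(theta + i for i in range(n))
    # Product over multiplicity classes of (theta / j)^a_j / a_j!
    weight = prod((theta / j) ** a / factorial(a)
                  for j, a in enumerate(counts, start=1))
    return factorial(n) / rising * weight


# All configurations of a sample of size 3:
# three singletons, one singleton plus one doubleton, one tripleton.
configs = [[3, 0, 0], [1, 1, 0], [0, 0, 1]]
probabilities = [ewens_probability(c, theta=1.0) for c in configs]
```

Summing `probabilities` over every configuration of a fixed sample size should give 1, which is a quick sanity check on any implementation of the neutral formula before comparing it with data or with models that include selection.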

preprint2011arXiv

The Structure of Genealogies in the Presence of Purifying Selection: A "Fitness-Class Coalescent"

Compared to a neutral model, purifying selection distorts the structure of genealogies and hence alters the patterns of sampled genetic variation. Although these distortions may be common in nature, our understanding of how we expect purifying selection to affect patterns of molecular variation remains incomplete. Genealogical approaches such as coalescent theory have proven difficult to generalize to situations involving selection at many linked sites, unless selection pressures are extremely strong. Here, we introduce an effective coalescent theory (a "fitness-class coalescent") to describe the structure of genealogies in the presence of purifying selection at many linked sites. We use this effective theory to calculate several simple statistics describing the expected patterns of variation in sequence data, both at the sites under selection and at linked neutral sites. Our analysis combines our earlier description of the allele frequency spectrum in the presence of purifying selection (Desai et al. 2010) with the structured coalescent approach of Nordborg (1997), to trace the ancestry of individuals through the distribution of fitnesses within the population. Alternatively, we can derive our results using an extension of the coalescent approach of Hudson and Kaplan (1994). We find that purifying selection leads to patterns of genetic variation that are related but not identical to a neutrally evolving population in which population size has varied in a specific way in the past.

preprint2010arXiv

Analytic methods for modeling stochastic regulatory networks

The past decade has seen a revived interest in the unavoidable or intrinsic noise in biochemical and genetic networks arising from the finite copy number of the participating species. That is, rather than modeling regulatory networks in terms of the deterministic dynamics of concentrations, we model the dynamics of the probability of a given copy number of the reactants in single cells. Most of the modeling activity of the last decade has centered on stochastic simulation of individual realizations, i.e., Monte-Carlo methods for generating stochastic time series. Here we review the mathematical description in terms of probability distributions, introducing the relevant derivations and illustrating several cases for which analytic progress can be made either instead of or before turning to numerical computation.
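As a small illustration of the probability-distribution viewpoint described in this abstract, the sketch below treats the simplest case where analytic progress is possible: a single species produced at a constant rate and degraded at a rate proportional to its copy number. This toy model, the function names, and the parameter names `k`, `gamma` and `n_max` are assumptions made here and are not taken from the review. At steady state the master equation reduces to the balance condition k P(n) = γ (n + 1) P(n + 1), whose solution is a Poisson distribution with mean k / γ.

```python
from math import exp, factorial


def birth_death_steady_state(k, gamma, n_max):
    """Steady state of the master equation for constant production (rate k)
    and linear degradation (rate gamma * n), truncated at n_max copies."""
    p = [1.0]  # unnormalized P(0)
    for n in range(n_max):
        # Balance of probability flux between states n and n + 1:
        # k * P(n) = gamma * (n + 1) * P(n + 1)
        p.append(p[-1] * k / (gamma * (n + 1)))
    total = sum(p)
    return [x / total for x in p]


def poisson(mean, n):
    """Poisson probability of observing n, used as the analytic reference."""
    return exp(-mean) * mean ** n / factorial(n)


k, gamma, n_max = 10.0, 1.0, 60
numeric = birth_death_steady_state(k, gamma, n_max)
analytic = [poisson(k / gamma, n) for n in range(n_max + 1)]
max_error = max(abs(a - b) for a, b in zip(numeric, analytic))
mean_copy_number = sum(n * p for n, p in enumerate(numeric))
```

Here `max_error` measures the disagreement between the recursion and the Poisson form; it should be negligible as long as `n_max` lies far above the mean copy number.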

preprint2010arXiv

Telling time with an intrinsically noisy clock

Intracellular transmission of information via chemical and transcriptional networks is thwarted by a physical limitation: the finite copy number of the constituent chemical species introduces unavoidable intrinsic noise. Here we provide a method for solving for the complete probabilistic description of intrinsically noisy oscillatory driving. We derive and numerically verify a number of simple scaling laws. Unlike in the case of measuring a static quantity, response to an oscillatory driving can exhibit a resonant frequency which maximizes information transmission. Further, we show that the optimal regulatory design is dependent on the biophysical constraints (i.e., the allowed copy number and response time). The resulting phase diagram illustrates under what conditions threshold regulation outperforms linear regulation.

preprint2009arXiv

Optimizing information flow in small genetic networks. I

In order to survive, reproduce and (in multicellular organisms) differentiate, cells must control the concentrations of the myriad different proteins that are encoded in the genome. The precision of this control is limited by the inevitable randomness of individual molecular events. Here we explore how cells can maximize their control power in the presence of these physical limits; formally, we solve the theoretical problem of maximizing the information transferred from inputs to outputs when the number of available molecules is held fixed. We start with the simplest version of the problem, in which a single transcription factor protein controls the readout of one or more genes by binding to DNA. We further simplify by assuming that this regulatory network operates in steady state, that the noise is small relative to the available dynamic range, and that the target genes do not interact. Even in this simple limit, we find a surprisingly rich set of optimal solutions. Importantly, for each locally optimal regulatory network, all parameters are determined once the physical constraints on the number of available molecules are specified. Although we are solving an oversimplified version of the problem facing real cells, we see parallels between the structure of these optimal solutions and the behavior of actual genetic regulatory networks. Subsequent papers will discuss more complete versions of the problem.
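The quantity being maximized in this abstract is the mutual information between input and output. The sketch below computes it for a discrete toy channel, which is an assumption made purely for illustration: the paper works with continuous concentrations in a small-noise limit, and the function name and the two-state example are hypothetical. It evaluates I = Σ_{i,j} P(i) P(j|i) log2[P(j|i) / P(j)], in bits.

```python
from math import log2


def mutual_information(p_input, p_output_given_input):
    """Mutual information in bits between a discrete input and output.

    p_input[i] is P(input = i); p_output_given_input[i][j] is
    P(output = j | input = i), i.e. the noise model of the channel.
    """
    n_out = len(p_output_given_input[0])
    # Marginal output distribution P(j) = sum_i P(i) P(j | i)
    p_out = [sum(p_input[i] * p_output_given_input[i][j]
                 for i in range(len(p_input)))
             for j in range(n_out)]
    info = 0.0
    for i, p_i in enumerate(p_input):
        for j, p_j_given_i in enumerate(p_output_given_input[i]):
            if p_i > 0 and p_j_given_i > 0:  # skip terms that contribute zero
                info += p_i * p_j_given_i * log2(p_j_given_i / p_out[j])
    return info


# Toy "gene on/off" readout with a 10% chance of misreporting the input state.
noisy_channel = [[0.9, 0.1], [0.1, 0.9]]
info_noisy = mutual_information([0.5, 0.5], noisy_channel)
info_perfect = mutual_information([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
```

In this discrete setting, optimizing information flow would mean varying `p_input` (and, in a regulatory model, the parameters that set `p_output_given_input`) to make the returned value as large as the constraints allow.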

preprint2009arXiv

Optimizing information flow in small genetic networks. II: Feed forward interactions

Central to the functioning of a living cell is its ability to control the readout or expression of information encoded in the genome. In many cases, a single transcription factor protein activates or represses the expression of many genes. As the concentration of the transcription factor varies, the target genes thus undergo correlated changes, and this redundancy limits the ability of the cell to transmit information about input signals. We explore how interactions among the target genes can reduce this redundancy and optimize information transmission. Our discussion builds on recent work [Tkacik et al, Phys Rev E 80, 031920 (2009)], and there are connections to much earlier work on the role of lateral inhibition in enhancing the efficiency of information transmission in neural circuits; for simplicity we consider here the case where the interactions have a feed forward structure, with no loops. Even with this limitation, the networks that optimize information transmission have a structure reminiscent of the networks found in real biological systems.

preprint2008arXiv

Gene-gene cooperativity in small networks

We show how to construct a reduced description of interacting genes in noisy, small regulatory networks using coupled binary "spin" variables. Treating both the protein number and gene expression state variables stochastically and on an equal footing, we propose a mapping that connects the molecular-level description of networks to the binary representation. We construct a phase diagram indicating when genes can be considered independent and when the coupling between them cannot be neglected, leading to synchrony or correlations. We find that an appropriately mapped Boolean description reproduces the probabilities of gene expression states of the full stochastic system very well and can be transferred to examples of self-regulatory systems with a larger number of gene copies.

preprint2005arXiv

Absolute Rate Theories of Epigenetic Stability

Spontaneous switching events in most characterized genetic switches are rare, resulting in extremely stable epigenetic properties. We show how simple arguments lead to theories of the rate of such events much like the absolute rate theory of chemical reactions corrected by a transmission factor. Both the probability of the rare cellular states that allow epigenetic escape, and the transmission factor, depend on the rates of DNA binding and unbinding events and on the rates of protein synthesis and degradation. Different mechanisms of escape from the stable attractors occur in the nonadiabatic, weakly adiabatic and strictly adiabatic regimes, characterized by the relative values of those input rates.

41 published item(s)

MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories

NoisET: Noise learning and Expansion detection of T-cell receptors

Affinity maturation for an optimal balance between long-term immune coverage and short-term resource constraints

Antigenic waves of virus-immune co-evolution

Building general Langevin models from discrete data sets

Immune Fingerprinting through Repertoire Similarity

Learning the heterogeneous hypermutation landscape of immunoglobulins from high-throughput repertoire data

Longitudinal high-throughput TCR repertoire profiling reveals the dynamics of T cell memory formation after mild COVID-19 infection

On generative models of T-cell receptor sequences

Population variability in the generation and thymic selection of T-cell repertoires

Local equilibrium in bird flocks

Rényi entropy, abundance distribution and the equivalence of ensembles

Diversity of immune strategies explained by adaptation to pathogen statistics

Extending the dynamic range of transcription factor action by translational regulation

Flocking and turning: a new model for self-organized collective motion

Fluctuating fitness shapes the clone size distribution of immune repertoires

Habitat Fluctuations Drive Species Covariation in the Human Microbiota

Inferring processes underlying B-cell repertoire diversity

Noise expands the response range of the Bacillus subtilis competence circuit

repgenHMM: a dynamic programming tool to infer the rules of immune receptor generation from sequence data

Tiling solutions for optimal biological sensing

Trade-offs in delayed information transmission in biochemical networks

Capturing coevolutionary signals in repeat proteins

How a well-adapted immune system is organized

Quantifying selection in immune receptor repertoires

Dynamical maximum entropy approach to flocking

Interference limits resolution of selection pressures from linked neutral diversity

The effect of phenotypic selection on stochastic gene expression

Time-dependent information transmission in a model regulatory circuit

Genetic Diversity and the Structure of Genealogies in Rapidly Adapting Populations

Statistical inference of the generation probability of T-cell receptors from sequence repertoires

The Influence of Decoys on the Noise and Dynamics of Gene Expression

Transition path sampling algorithm for discrete many-body systems

The structure of allelic diversity in the presence of purifying selection

The Structure of Genealogies in the Presence of Purifying Selection: A "Fitness-Class Coalescent"

Analytic methods for modeling stochastic regulatory networks

Telling time with an intrinsically noisy clock

Optimizing information flow in small genetic networks. I

Optimizing information flow in small genetic networks. II: Feed forward interactions

Gene-gene cooperativity in small networks

Absolute Rate Theories of Epigenetic Stability