Researcher profile

Yoram Louzoun

Yoram Louzoun contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
11topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2022arXiv

BFS based distributed algorithm for parallel local directed sub-graph enumeration

Estimating the frequency of sub-graphs is of importance for many tasks, including sub-graph isomorphism, kernel-based anomaly detection, and network structure analysis. While multiple algorithms were proposed for full enumeration or sampling-based estimates, these methods fail in very large graphs. Recent advances in parallelization allow for estimates of total sub-graphs counts in very large graphs. The task of counting the frequency of each sub-graph associated with each vertex also received excellent solutions for undirected graphs. However, there is currently no good solution for very large directed graphs. We here propose VDMC (Vertex specific Distributed Motif Counting) -- a fully distributed algorithm to optimally count all the 3 and 4 vertices connected directed graphs (sub-graph motifs) associated with each vertex of a graph. VDMC counts each motif only once and its efficacy is linear in the number of counted motifs. It is fully parallelized to be efficient in GPU-based computation. VDMC is based on three main elements: 1) Ordering the vertices and only counting motifs containing increasing order vertices, 2) sub-ordering motifs based on the average length of the BFS composing the motif, and 3) removing isomorphisms only once for the entire graph. We here compare VDMC to analytical estimates of the expected number of motifs and show its accuracy. VDMC is available as a highly efficient CPU and GPU code with a novel data structure for efficient graph manipulation. We show the efficacy of VDMC and real-world graphs. VDMC allows for the precise analysis of sub-graph frequency around each vertex in large graphs and opens the way for the extension of methods until now limited to graphs of thousands of edges to graphs with millions of edges and above. GIT: https://github.com/louzounlab/graph-measures

preprint2022arXiv

Family based HLA imputation and optimization of haplo-identical transplants

Recently, haplo-identical transplantation with multiple HLA mismatches has become a viable option for system cell transplants. Haplotype sharing detection requires imputation of donor and recipient. We show that even in high-resolution typing when all alleles are known, there is a 15% error rate in haplotype phasing, and even more in low resolution typings. Similarly, in related donors, parents haplotypes should be imputed to determine what haplotype each child inherited. We propose GRAMM (GRaph bAsed FaMilly iMputation) to phase alleles in family pedigree HLA typing data, and in mother-cord blood unit pairs. We show that GRAMM has practically no phasing errors when pedigree data are available. We apply GRAMM to simulations with different typing resolutions as well as paired cord-mother typings, and show very high phasing accuracy, and improved alleles imputation accuracy. We use GRAMM to detect recombination events and show that the rate of falsely detected recombination events (False Positive Rate) in simulations is very low. We then apply recombination detection to typed families to estimate the recombination rate in Israeli and Australian population datasets. The estimated recombination rate has an upper bound of 10-20% per family (1-4% per individual). GRAMM is available at: https://gramm.math.biu.ac.il/.

preprint2022arXiv

High fraction of silent recombination in a finite population two-locus neutral birth-death-mutation model

A precise estimate of allele and haplotype polymorphism is of great interest in theoretical population genetics, but also has practical applications, such as bone marrow registries management. Allele polymorphism is driven mainly by point mutations, while haplotype polymorphism is also affected by recombination. Current estimates treat recombination as mutations in an infinite site model. We here show that even in the simple case of two loci in a haploid individual, for a finite population, most recombination events produce existing haplotypes, and as such are silent. Silent recombination considerably reduces the total number of haplotypes expected from the infinite site model for populations that are not much larger than one over the mutation rate. Moreover, in contrast with mutations, the number of haplotypes does not grow linearly with the population size. We hence propose a more accurate estimate of the total number of haplotypes that takes into account silent recombination. We study large-scale Human Leukocyte Antigen (HLA) haplotype frequencies from human populations to show that the current estimated recombination rate in the HLA region is underestimated.

preprint2022arXiv

IRMA: Iterative Repair for graph MAtching

The alignment of two similar graphs from different domains is a well-studied problem. In many practical usages, there is no reliable information or labels over the vertices or edges, leaving structural similarity as the only information available to match such a graph. In such cases, one often assumes a small amount of already aligned vertices -- called a seed. Current state-of-the-art scalable seeded alignment algorithms are based on percolation. Namely, aligned vertices are used to align their neighbors and gradually percolate in parallel in both graphs. However, percolation-based graph alignment algorithms are still limited in scale-free degree distributions. We here propose `IRMA' -- Iterative Repair for graph MAtching to show that the accuracy of percolation-based algorithms can be improved in real-world graphs with a limited additional computational cost, and with lower run time when used in a parallel version. IRMA starts by creating a primary alignment using an existing percolation algorithm, then it iteratively repairs the mistakes in the previous alignment steps. We prove that IRMA improves on single-iteration algorithms. We then numerically show that it is significantly better than all state-of-the-art seeded graph alignment algorithms on the graphs that they tested. In scale-free networks, many vertices have a very low degree. Such vertices have a high probability of erroneous alignments. We show that combining iterations with high recall but low precision in the alignment leads in the long run to higher recall and precision for the entire alignment.

preprint2020arXiv

Estimating differential entropy using recursive copula splitting

A method for estimating the Shannon differential entropy of multidimensional random variables using independent samples is described. The method is based on decomposing the distribution into a product of the marginal distributions and the joint dependency, also known as the copula. The entropy of marginals is estimated using one-dimensional methods. The entropy of the copula, which always has a compact support, is estimated recursively by splitting the data along statistically dependent dimensions. Numerical examples demonstrate that the method is accurate for distributions with compact and non-compact supports, which is imperative when the support is not known or of mixed type (in different dimensions). At high dimensions (larger than 20), our method is not only more accurate, but also significantly more efficient than existing approaches.

preprint2020arXiv

Explicit Gradient Learning

Black-Box Optimization (BBO) methods can find optimal policies for systems that interact with complex environments with no analytical representation. As such, they are of interest in many Artificial Intelligence (AI) domains. Yet classical BBO methods fall short in high-dimensional non-convex problems. They are thus often overlooked in real-world AI tasks. Here we present a BBO method, termed Explicit Gradient Learning (EGL), that is designed to optimize high-dimensional ill-behaved functions. We derive EGL by finding weak-spots in methods that fit the objective function with a parametric Neural Network (NN) model and obtain the gradient signal by calculating the parametric gradient. Instead of fitting the function, EGL trains a NN to estimate the objective gradient directly. We prove the convergence of EGL in convex optimization and its robustness in the optimization of integrable functions. We evaluate EGL and achieve state-of-the-art results in two challenging problems: (1) the COCO test suite against an assortment of standard BBO methods; and (2) in a high-dimensional non-convex image generation task.

preprint2020arXiv

Invasion rate versus diversity in population dynamics with catastrophes

A key question in the current diversity crisis is how diversity has been maintained throughout evolution and how to preserve it. Modern coexistence theories suggest that a high invasion rate of rare new types is directly related to diversity. We show that adding almost any mechanism of catastrophes to a stochastic birth, death, and mutation process with limited carrying capacity induces a novel phase transition characterized by a positive invasion rate but a low diversity. In this phase, new types emerge and grow rapidly, but the resulting growth of very large types decreases diversity. This model also resolves two major drawbacks of neutral evolution models: their failure to explain balancing selection without resorting to fitness differences and the unrealistic time required for the creation of the observed large types. We test this model on a classical case of genetic polymorphism: the HLA locus.