Source author record

Serafim Batzoglou

Serafim Batzoglou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence · Computational Engineering, Finance, and Science · Genomics · Logic in Computer Science · Machine Learning · math.HO · math.LO

Catalog footprint

What is connected

3 works

7 topics

4 close collaborators

preprint · 2026 · arXiv

ReplaySCM: A Benchmark for Executable Causal Mechanism Induction from Interventions

Most causal benchmarks for language models score local answers or graph structure. We introduce ReplaySCM, a 1,300-item benchmark for executable causal mechanism induction from finite interventional evidence. Each item contains binary worlds generated by a latent, fully observed, acyclic Boolean structural causal model (SCM). A system must output a mechanism map in a restricted Boolean DSL; the submission is parsed, checked for legality and acyclicity, and replayed on training and held-out intervention worlds. Scoring uses replay behavior rather than formula strings, so syntactically different mechanisms receive credit when they behave correctly. ReplaySCM varies the structural information disclosed to the model through Ordered, Block-order, Hidden-order, and Hidden-roots settings, and includes Alternative-SCM tasks that supply a valid reference SCM and ask for a semantically distinct alternative that fits the training worlds, together with a separating intervention and witness. Frontier LLMs infer parts of the functional-parent structure, but held-out replay drops sharply when order or root structure is hidden. We also evaluate a matched support-audit ladder (Original, Extra Worlds, and Counterexample Audit, abbreviated CEx) that raises mean local predecessor-pattern coverage from 0.8949 to 0.9815 to 1.0; under the audited searches, no discovered semantic alternative remains consistent with the training worlds. The Ordered/Hidden-order gap persists under this stronger evidence. ReplaySCM complements answer-level causal reasoning and graph-discovery benchmarks by evaluating executable replay generalization from finite interventional evidence, without claiming unique identification of the latent SCM.
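The replay-based scoring described in this abstract can be illustrated with a minimal sketch. This is not the benchmark's actual DSL, parser, or scoring code: the mechanism-map layout (variable mapped to a parent tuple and a Python callable), the function names `replay` and `replay_accuracy`, and the toy variables A, B, C, D are all assumptions made for illustration. It shows only the general idea that a submission is checked for acyclicity, replayed under interventions, and scored on behavior rather than on formula text.

```python
from graphlib import TopologicalSorter, CycleError
from itertools import product

# Hypothetical mechanism map: variable -> (parents, boolean function).
# Python callables stand in for the restricted Boolean DSL of the abstract.

def replay(mechanisms, roots, intervention):
    """Compute one binary world from root values under a do()-intervention."""
    graph = {v: set(parents) for v, (parents, _) in mechanisms.items()}
    # list() forces the sort, so a cyclic submission raises CycleError here.
    order = list(TopologicalSorter(graph).static_order())
    world = dict(roots)
    world.update(intervention)  # intervened variables are clamped
    for v in order:
        if v in intervention or v not in mechanisms:
            continue  # clamped variables and roots keep their given values
        parents, fn = mechanisms[v]
        world[v] = int(fn(*(world[p] for p in parents)))
    return world

def replay_accuracy(candidate, worlds):
    """Fraction of worlds that the candidate reproduces exactly."""
    hits = sum(replay(candidate, r, do) == obs for r, do, obs in worlds)
    return hits / len(worlds)

# Toy latent SCM (assumed): C = A and B, D = not C.
reference = {
    "C": (("A", "B"), lambda a, b: a and b),
    "D": (("C",), lambda c: not c),
}
# Same behavior, different formulas (De Morgan form and arithmetic negation).
equivalent = {
    "C": (("A", "B"), lambda a, b: not (not a or not b)),
    "D": (("C",), lambda c: 1 - c),
}
# Different behavior: C uses OR instead of AND.
wrong = {
    "C": (("A", "B"), lambda a, b: a or b),
    "D": (("C",), lambda c: not c),
}

# Worlds: every root assignment, observed with no intervention and under do(C=1).
worlds = []
for a, b in product((0, 1), repeat=2):
    for do in ({}, {"C": 1}):
        roots = {"A": a, "B": b}
        worlds.append((roots, do, replay(reference, roots, do)))

acc_equivalent = replay_accuracy(equivalent, worlds)
acc_wrong = replay_accuracy(wrong, worlds)
```

In this toy setup the syntactically different but behaviorally identical map reproduces every world, while the OR variant fails on the unintervened worlds where exactly one of A and B is 1. A map whose variables depend on each other in a loop is rejected by the topological sort before any replay happens.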

preprint · 2022 · arXiv

Independence of the Continuum Hypothesis: an Intuitive Introduction

The independence of the continuum hypothesis is a result of broad impact: it settles a basic question regarding the nature of ℕ and ℝ, two of the most familiar mathematical structures; it introduces the method of forcing that has become the main workhorse of set theory; and it has broad implications for mathematical foundations and for the role of syntax versus semantics. Despite its broad impact, it is not broadly taught. A main reason is the lack of accessible expositions for nonspecialists, because the mathematical structures and techniques employed in the proof are unfamiliar outside of set theory. This manuscript aims to take a step in addressing this gap by providing an exposition at a level accessible to advanced undergraduate mathematicians and theoretical computer scientists, while covering all the technically challenging parts of the proof.

preprint · 2014 · arXiv

Fast and Scalable Inference of Multi-Sample Cancer Lineages

Somatic variants can be used as lineage markers for the phylogenetic reconstruction of cancer evolution. Since somatic phylogenetics is complicated by sample heterogeneity, novel specialized tree-building methods are required for cancer phylogeny reconstruction. We present LICHeE (Lineage Inference for Cancer Heterogeneity and Evolution), a novel method that automates the phylogenetic inference of cancer progression from multiple somatic samples. LICHeE uses variant allele frequencies of SSNVs obtained by deep sequencing to reconstruct multi-sample cell lineage trees and infer the subclonal composition of the samples. LICHeE is open-sourced and available at http://viq854.github.io/lichee.