Judge Circuits

LLM-as-a-judge has become the dominant paradigm for grading model outputs at scale, yet the same model assigns systematically different scores when its output format changes (e.g., a 1-5 rating vs. a True/False label). Existing diagnoses of these format-induced inconsistencies stop at the input-output level. Using Position-aware Edge Attribution Patching (PEAP), we causally investigate the internal mechanism in Gemma-3, Qwen2.5, and Llama-3. We find that judgments across structured understanding and open-ended preference tasks share a sparse, generalized Latent Evaluator sub-graph in the mid-to-late multi-layer perceptrons (MLPs); zero-ablating it collapses judgment while preserving world knowledge in architecturally modular models. By structurally decoupling abstract judging from output formatting, we provide a mechanistic account of format-induced inconsistency on the open-weight models we study: a continuous judgment signal computed in the shared trunk is mapped through fragile, format-specific terminal branches, enabling format-independent preference to be isolated downstream of the requested output format. Our findings imply that benchmark-level reliability comparisons across formats are partially measuring formatter geometry rather than evaluation quality.
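The abstract does not say exactly how PEAP scores edges, so what follows is only a minimal sketch of the general idea behind edge attribution patching with per-position scores. It runs on a toy stand-in, not on Gemma-3, Qwen2.5, or Llama-3. All of the following are hypothetical illustrations rather than the authors' code: the names `metric`, `position_aware_eap`, `exact_patch_effect`, and `zero_ablate`; the assumption that several upstream nodes write additively into the input of one downstream node; and the assumption that the judgment metric is a fixed tanh readout `w`. Each edge score is the usual first-order estimate, (corrupt activation minus clean activation) dotted with the metric gradient on the clean run. The score is kept separately for every token position instead of being summed. `zero_ablate` mimics the zero-ablation test by zeroing selected nodes' outputs.

```python
import numpy as np

# Hypothetical toy setup: U upstream nodes each emit a (T, d) activation;
# their sum is the input to a single downstream node v, whose tanh readout
# with weights w stands in for a scalar "judgment" metric.

def metric(x: np.ndarray, w: np.ndarray) -> float:
    """Toy scalar judgment metric over a (T, d) input to node v."""
    return float(np.sum(np.tanh(x) @ w))

def metric_grad(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Analytic gradient of `metric` with respect to x, shape (T, d)."""
    return (1.0 - np.tanh(x) ** 2) * w

def position_aware_eap(z_clean: np.ndarray, z_corrupt: np.ndarray,
                       w: np.ndarray) -> np.ndarray:
    """First-order effect of patching edge u->v at position t from clean to
    corrupt. Inputs are (U, T, d); returns (U, T) scores (not summed over T)."""
    g = metric_grad(z_clean.sum(axis=0), w)      # gradient on the clean run
    delta = z_corrupt - z_clean                  # per-edge activation change
    return np.einsum("utd,td->ut", delta, g)     # one score per (edge, position)

def exact_patch_effect(z_clean, z_corrupt, w, u: int, t: int) -> float:
    """Exact metric change when only edge u->v at position t is patched."""
    x = z_clean.sum(axis=0)
    x_patched = x.copy()
    x_patched[t] += z_corrupt[u, t] - z_clean[u, t]
    return metric(x_patched, w) - metric(x, w)

def zero_ablate(z_clean: np.ndarray, w: np.ndarray, nodes) -> float:
    """Metric after zeroing the outputs of the given upstream nodes."""
    z = z_clean.copy()
    z[np.asarray(list(nodes), dtype=int)] = 0.0
    return metric(z.sum(axis=0), w)

# Usage: small clean/corrupt differences keep the linear estimate accurate.
rng = np.random.default_rng(0)
U, T, d = 3, 4, 5
z_clean = rng.normal(size=(U, T, d))
z_corrupt = z_clean + 1e-3 * rng.normal(size=(U, T, d))
w = rng.normal(size=d)
scores = position_aware_eap(z_clean, z_corrupt, w)
print(scores.shape)  # (3, 4)
```

Keeping scores per position means an edge that matters only at, say, the final answer token is not averaged together with the same edge at prompt tokens. The gradient-based estimate is only reliable when clean and corrupt activations are close, so in practice one would check top-scoring edges with exact patching, as `exact_patch_effect` does here.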

Authors: Nils Feldhus, Tanja Baeumel, Elena Golimblevskaia, Qianli Wang, Van Bach Nguyen, Aaron Louis Eidt, Christopher Ebert, Wojciech Samek, Jing Yang, Sebastian Möller, Simon Ostermann, Vera Schmitt

Topics: Machine Learning; Computation and Language

Preprint, 2026