Researcher profile

Emery D. Berger

Emery D. Berger contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2020arXiv

Scalene: Scripting-Language Aware Profiling for Python

Existing profilers for scripting languages (a.k.a. "glue" languages) like Python suffer from numerous problems that drastically limit their usefulness. They impose order-of-magnitude overheads, report information at too coarse a granularity, or fail in the face of threads. Worse, past profilers---essentially variants of their counterparts for C---are oblivious to the fact that optimizing code in scripting languages requires information about code spanning the divide between the scripting language and libraries written in compiled languages. This paper introduces scripting-language aware profiling, and presents Scalene, an implementation of scripting-language aware profiling for Python. Scalene employs a combination of sampling, inference, and disassembly of byte-codes to efficiently and precisely attribute execution time and memory usage to either Python, which developers can optimize, or library code, which they cannot. It includes a novel sampling memory allocator that reports line-level memory consumption and trends with low overhead, helping developers reduce footprints and identify leaks. Finally, it introduces a new metric, copy volume, to help developers root out insidious copying costs across the Python/library boundary, which can drastically degrade performance. Scalene works for single or multi-threaded Python code, is precise, reporting detailed information at the line granularity, while imposing modest overheads (26%--53%).

preprint2019arXiv

ExceLint: Automatically Finding Spreadsheet Formula Errors

Spreadsheets are one of the most widely used programming environments, and are widely deployed in domains like finance where errors can have catastrophic consequences. We present a static analysis specifically designed to find spreadsheet formula errors. Our analysis directly leverages the rectangular character of spreadsheets. It uses an information-theoretic approach to identify formulas that are especially surprising disruptions to nearby rectangular regions. We present ExceLint, an implementation of our static analysis for Microsoft Excel. We demonstrate that ExceLint is fast and effective: across a corpus of 70 spreadsheets, ExceLint takes a median of 5 seconds per spreadsheet, and it significantly outperforms the state of the art analysis.

preprint2019arXiv

Mesh: Compacting Memory Management for C/C++ Applications

Programs written in C/C++ can suffer from serious memory fragmentation, leading to low utilization of memory, degraded performance, and application failure due to memory exhaustion. This paper introduces Mesh, a plug-in replacement for malloc that, for the first time, eliminates fragmentation in unmodified C/C++ applications. Mesh combines novel randomized algorithms with widely-supported virtual memory operations to provably reduce fragmentation, breaking the classical Robson bounds with high probability. Mesh generally matches the runtime performance of state-of-the-art memory allocators while reducing memory consumption; in particular, it reduces the memory of consumption of Firefox by 16% and Redis by 39%.

preprint2019arXiv

On the Impact of Programming Languages on Code Quality

This paper is a reproduction of work by Ray et al. which claimed to have uncovered a statistically significant association between eleven programming languages and software defects in projects hosted on GitHub. First we conduct an experimental repetition, repetition is only partially successful, but it does validate one of the key claims of the original work about the association of ten programming languages with defects. Next, we conduct a complete, independent reanalysis of the data and statistical modeling steps of the original study. We uncover a number of flaws that undermine the conclusions of the original study as only four languages are found to have a statistically significant association with defects, and even for those the effect size is exceedingly small. We conclude with some additional sources of bias that should be investigated in follow up work and a few best practice recommendations for similar efforts.

preprint2019arXiv

Tea: A High-level Language and Runtime System for Automating Statistical Analysis

Though statistical analyses are centered on research questions and hypotheses, current statistical analysis tools are not. Users must first translate their hypotheses into specific statistical tests and then perform API calls with functions and parameters. To do so accurately requires that users have statistical expertise. To lower this barrier to valid, replicable statistical analysis, we introduce Tea, a high-level declarative language and runtime system. In Tea, users express their study design, any parametric assumptions, and their hypotheses. Tea compiles these high-level specifications into a constraint satisfaction problem that determines the set of valid statistical tests, and then executes them to test the hypothesis. We evaluate Tea using a suite of statistical analyses drawn from popular tutorials. We show that Tea generally matches the choices of experts while automatically switching to non-parametric tests when parametric assumptions are not met. We simulate the effect of mistakes made by non-expert users and show that Tea automatically avoids both false negatives and false positives that could be produced by the application of incorrect statistical tests.