Source author record

Dominik Moritz

Dominik Moritz appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Human-Computer Interaction, Machine Learning, Artificial Intelligence, Databases, cs.CY, Social and Information Networks, Software Engineering

Catalog footprint: 9 works, 7 topics, 4 close collaborators

Preprint, 2026, arXiv

Understanding Annotator Safety Policy with Interpretability

Safety policies define what constitutes safe and unsafe AI outputs, guiding data annotation and model development. However, annotation disagreement is pervasive and can stem from multiple sources such as operational failures (annotators misunderstand or misexecute the task), policy ambiguity (policy wording leaves room for interpretation), or value pluralism (different annotators hold different perspectives on safety). Distinguishing these sources matters. For example, operational failures call for quality control, ambiguity calls for policy clarification, and pluralism calls for deliberation about incorporating diverse perspectives. Yet understanding why annotators disagree is difficult. Directly asking annotators for their reasoning is costly, substantially increasing annotation burden, and can be unreliable for both human and LLM annotators as self-reported reasoning often fails to reflect actual decision processes. We introduce Annotator Policy Models (APMs), interpretable models that learn annotators' internal safety policies from labeling behavior alone, making annotator reasoning visible and comparable without additional annotation effort. We validate that APMs accurately model annotator safety policy (>80% accuracy), faithfully predict responses to counterfactual edits, and recover known policy differences in controlled settings. Applying APMs to LLM and human annotations, we demonstrate two core applications: (1) surfacing policy ambiguity by revealing how annotators interpret safety instructions differently, and (2) surfacing value pluralism by uncovering systematic differences in safety priorities across demographic groups. Together, these capabilities support more targeted, transparent, and inclusive safety policy design.

Preprint, 2024, arXiv

Optimizing Dataflow Systems for Scalable Interactive Visualization

Supporting the interactive exploration of large datasets is a popular and challenging use case for data management systems. Traditionally, the interface and the back-end system are built and optimized separately, and interface design and system optimization require different skill sets that are difficult for one person to master. To enable analysts to focus on visualization design, we contribute VegaPlus, a system that automatically optimizes interactive dashboards to support large datasets. To achieve this, VegaPlus leverages two core ideas. First, we introduce an optimizer that can reason about execution plans in Vega, a back-end DBMS, or a mix of both environments. Second, the optimizer considers how user interactions may alter execution plan performance, and can partially or fully rewrite the plans when needed. Through a series of benchmark experiments on seven different dashboard designs, our results show that VegaPlus provides superior performance and versatility compared to standard dashboard optimization techniques.

Preprint, 2022, arXiv

ComputableViz: Mathematical Operators as a Formalism for Visualization Processing and Analysis

Data visualizations are created and shared on the web at an unprecedented speed, raising new needs and questions for processing and analyzing visualizations after they have been generated and digitized. However, existing formalisms focus on operating on a single visualization instead of multiple visualizations, making it challenging to perform analysis tasks such as sorting and clustering visualizations. Through a systematic analysis of previous work, we abstract visualization-related tasks into mathematical operators such as union and propose a design space of visualization operations. We realize the design by developing ComputableViz, a library that supports operations on multiple visualization specifications. To demonstrate its usefulness and extensibility, we present multiple usage scenarios concerning processing and analyzing visualizations, such as generating visualization embeddings and automatically making visualizations accessible. We conclude by discussing research opportunities and challenges for managing and exploiting the massive number of visualizations on the web.

Preprint, 2022, arXiv

Demonstration of VegaPlus: Optimizing Declarative Visualization Languages

While many visualization specification languages are user-friendly, they tend to have one critical drawback: they are designed for small data on the client side and, as a result, perform poorly at scale. We propose a system that takes declarative visualization specifications as input and automatically optimizes the resulting visualization execution plans by offloading computationally intensive operations to a separate database management system (DBMS). Our demo emphasizes live programming of visualizations over big data, enabling users to write or import Vega specifications, view the optimized plans from our system, and even modify these plans and compare their performance via a dedicated performance dashboard.
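
The offloading idea in this abstract can be illustrated with a minimal sketch. It is not VegaPlus's optimizer or plan rewriter: it assumes an in-memory SQLite database standing in for the DBMS, and the table name `points`, its columns, and both helper functions are hypothetical. It only contrasts aggregating on the client after fetching every row with letting the database aggregate and return one row per group.

```python
import sqlite3


def client_side_mean(rows):
    """Baseline: every row is shipped to the client, which aggregates locally."""
    sums, counts = {}, {}
    for category, value in rows:
        sums[category] = sums.get(category, 0.0) + value
        counts[category] = counts.get(category, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


def dbms_side_mean(conn):
    """Offloaded: the DBMS aggregates, so only one row per category comes back."""
    cur = conn.execute(
        "SELECT category, AVG(value) FROM points GROUP BY category"
    )
    return dict(cur.fetchall())


# Hypothetical toy data; a real dashboard would hold far more rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (category TEXT, value REAL)")
data = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0), ("b", 30.0)]
conn.executemany("INSERT INTO points VALUES (?, ?)", data)

all_rows = conn.execute("SELECT category, value FROM points").fetchall()
client_result = client_side_mean(all_rows)
dbms_result = dbms_side_mean(conn)
```

Both paths produce the same per-category means, but the offloaded query transfers two rows instead of five. That gap grows with dataset size, which is the motivation the abstract gives for moving expensive operations into the DBMS.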

Preprint, 2022, arXiv

Neo: Generalizing Confusion Matrix Visualization to Hierarchical and Multi-Output Labels

The confusion matrix, a ubiquitous visualization for helping people evaluate machine learning models, is a tabular layout that compares predicted class labels against actual class labels over all data instances. We conduct formative research with machine learning practitioners at Apple and find that conventional confusion matrices do not support more complex data structures found in modern-day applications, such as hierarchical and multi-output labels. To express such variations of confusion matrices, we design an algebra that models confusion matrices as probability distributions. Based on this algebra, we develop Neo, a visual analytics system that enables practitioners to flexibly author and interact with hierarchical and multi-output confusion matrices, visualize derived metrics, renormalize confusions, and share matrix specifications. Finally, we demonstrate Neo's utility with three model evaluation scenarios that help people better understand model performance and reveal hidden confusions.
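
As a rough illustration of reading a confusion matrix as a probability distribution, the sketch below counts (actual, predicted) pairs, normalizes them into a joint distribution, and renormalizes each actual-class row. It is not Neo's algebra or API: the function names and the toy labels are assumptions made for this example, and it covers only flat, single-output labels.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count (actual, predicted) label pairs over all instances."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def joint_distribution(counts):
    """Normalize so all cells sum to 1, giving P(actual, predicted)."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def condition_on_actual(counts):
    """Renormalize each actual-class row to sum to 1, giving P(predicted | actual)."""
    row_totals = Counter()
    for (a, _), n in counts.items():
        row_totals[a] += n
    return {(a, p): n / row_totals[a] for (a, p), n in counts.items()}


# Hypothetical toy labels for three classes.
actual = ["cat", "cat", "dog", "dog", "dog", "bird"]
predicted = ["cat", "dog", "dog", "dog", "cat", "bird"]

counts = confusion_counts(actual, predicted)
joint = joint_distribution(counts)
by_actual = condition_on_actual(counts)
```

Here `by_actual[("cat", "dog")]` is the fraction of true cats predicted as dogs. Row renormalization of this kind is one plausible reading of "renormalize confusions" in the abstract, stated here as an assumption rather than a description of Neo.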

Preprint, 2022, arXiv

Network Report: A Structured Description for Network Datasets

The rapid development of network science and technologies depends on shareable datasets. Currently, there is no standard practice for reporting and sharing network datasets. Some network dataset providers only share links, while others provide some context or basic statistics. As a result, critical information may be unintentionally dropped, and network dataset consumers may misunderstand or overlook critical aspects. Inappropriately using a network dataset can lead to severe consequences (e.g., discrimination), especially when machine learning models on networks are deployed in high-stakes domains. Challenges arise as networks are often used across different domains (e.g., network science, physics, etc.) and have complex structures. To facilitate communication between network dataset providers and consumers, we propose the network report. A network report is a structured description that summarizes and contextualizes a network dataset. The network report extends the idea of dataset reports (e.g., Datasheets for Datasets) from prior work with network-specific descriptions of the non-i.i.d. nature, demographic information, network characteristics, etc. We hope network reports encourage transparency and accountability in network research and development across different fields.

Preprint, 2022, arXiv

Symphony: Composing Interactive Interfaces for Machine Learning

Interfaces for machine learning (ML), information and visualizations about models or data, can help practitioners build robust and responsible ML systems. Despite their benefits, recent studies of ML teams and our interviews with practitioners (n=9) showed that ML interfaces have limited adoption in practice. While existing ML interfaces are effective for specific tasks, they are not designed to be reused, explored, and shared by multiple stakeholders in cross-functional teams. To enable analysis and communication between different ML practitioners, we designed and implemented Symphony, a framework for composing interactive ML interfaces with task-specific, data-driven components that can be used across platforms such as computational notebooks and web dashboards. We developed Symphony through participatory design sessions with 10 teams (n=31), and discuss our findings from deploying Symphony to 3 production ML projects at Apple. Symphony helped ML practitioners discover previously unknown issues like data duplicates and blind spots in models while enabling them to share insights with other stakeholders.

Preprint, 2022, arXiv

VegaFusion: Automatic Server-Side Scaling for Interactive Vega Visualizations

The Vega grammar has been broadly adopted by a growing ecosystem of browser-based visualization tools. However, the reference Vega renderer does not scale well to large datasets (e.g., millions of rows or hundreds of megabytes) because it requires the entire dataset to be loaded into browser memory. We introduce VegaFusion, which brings automatic server-side scaling to the Vega ecosystem. VegaFusion accepts generic Vega specifications and partitions the required computation between the client and an out-of-browser, natively-compiled server-side process. Large datasets can be processed server-side to avoid loading them into the browser and to take advantage of multi-threading, more powerful server hardware and caching. We demonstrate how VegaFusion can be integrated into the existing Vega ecosystem, and show that VegaFusion greatly outperforms the reference implementation. We demonstrate these benefits with VegaFusion running on the same machine as the client as well as on a remote machine.

Preprint, 2020, arXiv

mage: Fluid Moves Between Code and Graphical Work in Computational Notebooks

We aim to increase the flexibility with which a data worker can choose the right tool for the job, regardless of whether the tool is a code library or an interactive graphical user interface (GUI). To achieve this flexibility, we extend computational notebooks with a new API, mage, which supports tools that can represent themselves as both code and GUI as needed. We discuss the design of mage as well as design opportunities in the space of flexible code/GUI tools for data work. To understand tooling needs, we conduct a study with nine professional practitioners and elicit their feedback on mage and potential areas for flexible code/GUI tooling. We then implement six client tools for mage that illustrate the main themes of our study findings. Finally, we discuss open challenges in providing flexible code/GUI interactions for data workers.
