Source author record

Leilani Battle

Leilani Battle appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Human-Computer Interaction Databases Data Structures and Algorithms eess.SY Machine Learning Systems and Control

Catalog footprint

What is connected

9works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2024arXiv

Optimizing Dataflow Systems for Scalable Interactive Visualization

Supporting the interactive exploration of large datasets is a popular and challenging use case for data management systems. Traditionally, the interface and the back-end system are built and optimized separately, and interface design and system optimization require different skill sets that are difficult for one person to master. To enable analysts to focus on visualization design, we contribute VegaPlus, a system that automatically optimizes interactive dashboards to support large datasets. To achieve this, VegaPlus leverages two core ideas. First, we introduce an optimizer that can reason about execution plans in Vega, a back-end DBMS, or a mix of both environments. The optimizer also considers how user interactions may alter execution plan performance, and can partially or fully rewrite the plans when needed. Through a series of benchmark experiments on seven different dashboard designs, our results show that VegaPlus provides superior performance and versatility compared to standard dashboard optimization techniques.

preprint2022arXiv

A Grammar-Based Approach for Applying Visualization Taxonomies to Interaction Logs

Researchers collect large amounts of user interaction data with the goal of mapping user's workflows and behaviors to their higher-level motivations, intuitions, and goals. Although the visual analytics community has proposed numerous taxonomies to facilitate this mapping process, no formal methods exist for systematically applying these existing theories to user interaction logs. This paper seeks to bridge the gap between visualization task taxonomies and interaction log data by making the taxonomies more actionable for interaction log analysis. To achieve this, we leverage structural parallels between how people express themselves through interactions and language by reformulating existing theories as regular grammars. We represent interactions as terminals within a regular grammar, similar to the role of individual words in a language, and patterns of interactions or non-terminals as regular expressions over these terminals to capture common language patterns. To demonstrate our approach, we generate regular grammars for seven visualization taxonomies and develop code to apply them to three interaction log datasets. In analyzing our results, we find that existing taxonomies at the low-level (i.e., terminals) show mixed results in expressing multiple interaction log datasets, and taxonomies at the high-level (i.e., regular expressions) have limited expressiveness, due to primarily two challenges: inconsistencies in interaction log dataset granularity and structure, and under-expressiveness of certain terminals. Based on our findings, we suggest new research directions for the visualization community for augmenting existing taxonomies, developing new ones, and building better interaction log recording processes to facilitate the data-driven development of user behavior taxonomies.

preprint2022arXiv

Code Code Evolution: Understanding How People Change Data Science Notebooks Over Time

Sensemaking is the iterative process of identifying, extracting, and explaining insights from data, where each iteration is referred to as the "sensemaking loop." Although recent work observes snapshots of the sensemaking loop within computational notebooks, none measure shifts in sensemaking behaviors over time -- between exploration and explanation. This gap limits our ability to understand the full scope of the sensemaking process and thus our ability to design tools to fully support sensemaking. We contribute the first quantitative method to characterize how sensemaking evolves within data science computational notebooks. To this end, we conducted a quantitative study of 2,574 Jupyter notebooks mined from GitHub. First, we identify data science-focused notebooks that have undergone significant iterations. Second, we present regression models that automatically characterize sensemaking activity within individual notebooks by assigning them a score representing their position within the sensemaking spectrum. Finally, we use our regression models to calculate and analyze shifts in notebook scores across GitHub versions. Our results show that notebook authors participate in a diverse range of sensemaking tasks over time, such as annotation, branching analysis, and documentation. Finally, we propose design recommendations for extending notebook environments to support the sensemaking behaviors we observed.

preprint2022arXiv

Demonstration of VegaPlus: Optimizing Declarative Visualization Languages

While many visualization specification languages are user-friendly, they tend to have one critical drawback: they are designed for small data on the client-side and, as a result, perform poorly at scale. We propose a system that takes declarative visualization specifications as input and automatically optimizes the resulting visualization execution plans by offloading computational-intensive operations to a separate database management system (DBMS). Our demo emphasizes live programming of visualizations over big data, enabling users to write or import Vega specifications, view the optimized plans from our system, and even modify these plans and compare their performance via a dedicated performance dashboard.

preprint2022arXiv

Exploring D3 Implementation Challenges on Stack Overflow

Visualization languages help to standardize the process of designing effective visualizations, one of the most prominent being D3. However, few researchers have analyzed at scale how users incorporate these languages into existing visualization programming processes, i.e., implementation workflows. In this paper, we present an analysis of the experiences of D3 users as observed through Stack Overflow, summarizing common D3 implementation workflows and challenges discussed online. Our results show how the visualization community may be limiting its understanding of users' visualization implementation challenges by ignoring the larger context in which languages such as D3 are used. Based on our findings, we suggest new research directions to enhance the user experience with visualization languages. All our data and code are available at: https://osf.io/fup48/.

preprint2022arXiv

Lodestar: Supporting Independent Learning and Rapid Experimentation Through Data-Driven Analysis Recommendations

Keeping abreast of current trends, technologies, and best practices in visualization and data analysis is becoming increasingly difficult, especially for fledgling data scientists. In this paper, we propose Lodestar, an interactive computational notebook that allows users to quickly explore and construct new data science workflows by selecting from a list of automated analysis recommendations. We derive our recommendations from directed graphs of known analysis states, with two input sources: one manually curated from online data science tutorials, and another extracted through semi-automatic analysis of a corpus of over 6,000 Jupyter notebooks. We evaluate Lodestar in a formative study guiding our next set of improvements to the tool. Our results suggest that users find Lodestar useful for rapidly creating data science workflows.

preprint2022arXiv

Recommendations for Visualization Recommendations: Exploring Preferences and Priorities in Public Health

The promise of visualization recommendation systems is that analysts will be automatically provided with relevant and high-quality visualizations that will reduce the work of manual exploration or chart creation. However, little research to date has focused on what analysts value in the design of visualization recommendations. We interviewed 18 analysts in the public health sector and explored how they made sense of a popular in-domain dataset. in service of generating visualizations to recommend to others. We also explored how they interacted with a corpus of both automatically- and manually-generated visualization recommendations, with the goal of uncovering how the design values of these analysts are reflected in current visualization recommendation systems. We find that analysts champion simple charts with clear takeaways that are nonetheless connected with existing semantic information or domain hypotheses. We conclude by recommending that visualization recommendation designers explore ways of integrating context and expectation into their systems.

preprint2020arXiv

Kyrix-S: Authoring Scalable Scatterplot Visualizations of Big Data

Static scatterplots often suffer from the overdraw problem on big datasets where object overlap causes undesirable visual clutter. The use of zooming in scatterplots can help alleviate this problem. With multiple zoom levels, more screen real estate is available, allowing objects to be placed in a less crowded way. We call this type of visualization scalable scatterplot visualizations, or SSV for short. Despite the potential of SSVs, existing systems and toolkits fall short in supporting the authoring of SSVs due to three limitations. First, many systems have limited scalability, assuming that data fits in the memory of one computer. Second, too much developer work, e.g., using custom code to generate mark layouts or render objects, is required. Third, many systems focus on only a small subset of the SSV design space (e.g. supporting a specific type of visual marks). To address these limitations, we have developed Kyrix-S, a system for easy authoring of SSVs at scale. Kyrix-S derives a declarative grammar that enables specification of a variety of SSVs in a few tens of lines of code, based on an existing survey of scatterplot tasks and designs. The declarative grammar is supported by a distributed layout algorithm which automatically places visual marks onto zoom levels. We store data in a multi-node database and use multi-node spatial indexes to achieve interactive browsing of large SSVs. Extensive experiments show that 1) Kyrix-S enables interactive browsing of SSVs of billions of objects, with response times under 500ms and 2) Kyrix-S achieves 4X-9X reduction in specification compared to a state-of-the-art authoring system.

preprint2014arXiv

Indexing Cost Sensitive Prediction

Predictive models are often used for real-time decision making. However, typical machine learning techniques ignore feature evaluation cost, and focus solely on the accuracy of the machine learning models obtained utilizing all the features available. We develop algorithms and indexes to support cost-sensitive prediction, i.e., making decisions using machine learning models taking feature evaluation cost into account. Given an item and a online computation cost (i.e., time) budget, we present two approaches to return an appropriately chosen machine learning model that will run within the specified time on the given item. The first approach returns the optimal machine learning model, i.e., one with the highest accuracy, that runs within the specified time, but requires significant up-front precomputation time. The second approach returns a possibly sub- optimal machine learning model, but requires little up-front precomputation time. We study these two algorithms in detail and characterize the scenarios (using real and synthetic data) in which each performs well. Unlike prior work that focuses on a narrow domain or a specific algorithm, our techniques are very general: they apply to any cost-sensitive prediction scenario on any machine learning algorithm.

Leilani Battle

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

Optimizing Dataflow Systems for Scalable Interactive Visualization

A Grammar-Based Approach for Applying Visualization Taxonomies to Interaction Logs

Code Code Evolution: Understanding How People Change Data Science Notebooks Over Time

Demonstration of VegaPlus: Optimizing Declarative Visualization Languages

Exploring D3 Implementation Challenges on Stack Overflow

Lodestar: Supporting Independent Learning and Rapid Experimentation Through Data-Driven Analysis Recommendations

Recommendations for Visualization Recommendations: Exploring Preferences and Priorities in Public Health

Kyrix-S: Authoring Scalable Scatterplot Visualizations of Big Data

Indexing Cost Sensitive Prediction