Source author record

Stephen Todd

Stephen Todd appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Databases Machine Learning Methodology

Catalog footprint

What is connected

2works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

Predicting feature imputability in the absence of ground truth

Data imputation is the most popular method of dealing with missing values, but in most real life applications, large missing data can occur and it is difficult or impossible to evaluate whether data has been imputed accurately (lack of ground truth). This paper addresses these issues by proposing an effective and simple principal component based method for determining whether individual data features can be accurately imputed - feature imputability. In particular, we establish a strong linear relationship between principal component loadings and feature imputability, even in the presence of extreme missingness and lack of ground truth. This work will have important implications in practical data imputation strategies.

preprint2013arXiv

Hyper-Graph Based Database Partitioning for Transactional Workloads

A common approach to scaling transactional databases in practice is horizontal partitioning, which increases system scalability, high availability and self-manageability. Usu- ally it is very challenging to choose or design an optimal partitioning scheme for a given workload and database. In this technical report, we propose a fine-grained hyper-graph based database partitioning system for transactional work- loads. The partitioning system takes a database, a workload, a node cluster and partitioning constraints as input and out- puts a lookup-table encoding the final database partitioning decision. The database partitioning problem is modeled as a multi-constraints hyper-graph partitioning problem. By deriving a min-cut of the hyper-graph, our system can min- imize the total number of distributed transactions in the workload, balance the sizes and workload accesses of the partitions and satisfy all the partition constraints imposed. Our system is highly interactive as it allows users to im- pose partition constraints, watch visualized partitioning ef- fects, and provide feedback based on human expertise and indirect domain knowledge for generating better partition- ing schemes.