Source author record

Kristin P. Bennett

Kristin P. Bennett appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Machine Learning, Artificial Intelligence, Computation and Language, cs.CY, math.OC, Quantitative Methods

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators

preprint, 2022, arXiv

Downstream Fairness Caveats with Synthetic Healthcare Data

This paper evaluates synthetically generated healthcare data for biases and investigates the effect of fairness mitigation techniques on the trade-off between utility and fairness. Privacy laws limit access to health data such as Electronic Medical Records (EMRs) to preserve patient privacy. Although essential, these laws hinder research reproducibility. Synthetic data is a viable solution that can enable access to data similar to real healthcare data without privacy risks. Healthcare datasets may have biases in which certain protected groups might experience worse outcomes than others. With the real data having biases, the fairness of synthetically generated health data comes into question. In this paper, we evaluate the fairness of models trained on two healthcare datasets for gender and race biases. We generate synthetic versions of the datasets using a Generative Adversarial Network called HealthGAN, and compare the balanced accuracy and fairness scores of models trained on real and on synthetic data. We find that synthetic data has different fairness properties compared to real data and that fairness mitigation techniques perform differently on it, highlighting that synthetic data is not bias-free.
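The kind of comparison this abstract describes can be illustrated with a minimal sketch. It computes balanced accuracy and one possible group fairness score for a set of binary predictions. The abstract does not say which fairness scores the paper uses, so the equal-opportunity gap below (the spread in true-positive rate across groups) is an assumption chosen for illustration, and the labels, predictions and group names are made-up toy values, not results from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of the true-positive rate and true-negative rate (binary labels)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def tpr_by_group(y_true, y_pred, group):
    """True-positive rate computed separately for each protected group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, group):
        if t == 1:
            totals[g] += 1
            hits[g] += int(p == 1)
    return {g: hits[g] / totals[g] for g in totals}


def equal_opportunity_gap(y_true, y_pred, group):
    """Largest difference in true-positive rate between any two groups.

    Assumption: this is one common fairness score, not necessarily the
    one used in the paper.
    """
    rates = tpr_by_group(y_true, y_pred, group)
    return max(rates.values()) - min(rates.values())


# Toy data (hypothetical): eight patients, two groups "a" and "b".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(balanced_accuracy(y_true, y_pred))             # 0.75
print(equal_opportunity_gap(y_true, y_pred, group))  # 0.5
```

Running the same two functions on predictions from a model trained on real data and from one trained on synthetic data gives the side-by-side utility and fairness comparison that the abstract refers to.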

preprint, 2022, arXiv

Should we tweet this? Generative response modeling for predicting reception of public health messaging on Twitter

The way people respond to messaging from public health organizations on social media can provide insight into public perceptions of critical health issues, especially during a global crisis such as COVID-19. It could be valuable for high-impact organizations such as the US Centers for Disease Control and Prevention (CDC) or the World Health Organization (WHO) to understand how these perceptions impact reception of messaging on health policy recommendations. We collect two datasets of public health messages and their responses from Twitter relating to COVID-19 and vaccines, and introduce a predictive method that can be used to explore the potential reception of such messages. Specifically, we harness a generative model (GPT-2) to directly predict probable future responses and demonstrate how it can be used to optimize expected reception of important health guidance. Finally, we introduce a novel evaluation scheme with extensive statistical testing, which allows us to conclude that our models capture the semantics and sentiment found in actual public health responses.

preprint, 2012, arXiv

Crossing Minimization within Graph Embeddings

We propose a novel optimization-based approach to embedding heterogeneous high-dimensional data characterized by a graph. The goal is to create a two-dimensional visualization of the graph structure such that edge crossings are minimized while preserving proximity relations between nodes. This paper provides a fundamentally new approach for addressing the crossing minimization criterion that exploits Farkas' Lemma to re-express the condition for no edge crossings as a system of nonlinear inequality constraints. The approach has an intuitive geometric interpretation closely related to support vector machine classification. While the crossing minimization formulation can be utilized in conjunction with any optimization-based embedding objective, here we demonstrate the approach on multidimensional scaling by modifying the stress majorization algorithm to include penalties for edge crossings. The proposed method is used to (1) solve a visualization problem in tuberculosis molecular epidemiology and (2) generate embeddings for a suite of randomly generated graphs designed to challenge the algorithm. Experimental results demonstrate the efficacy of the approach. The proposed edge-crossing constraints and penalty algorithm can be readily adapted to other supervised and unsupervised optimization-based embedding or dimensionality reduction methods. The constraints can be generalized to remove overlaps between any graph components represented as convex polyhedra, including node-edge and node-node intersections.
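To make the quantity being minimized concrete, here is a minimal sketch that counts edge crossings in a two-dimensional layout. It uses the standard orientation test for segment intersection, which is an assumption chosen for illustration and is not the Farkas' Lemma constraint system the abstract describes. The function names, node positions and edge list are hypothetical.

```python
def orient(p, q, r):
    """Signed area test: > 0 if r is left of the line p->q, < 0 if right."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a single interior point."""
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)


def count_crossings(pos, edges):
    """Count pairs of edges that cross in the layout given by pos."""
    count = 0
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            (u, v), (w, x) = edges[i], edges[j]
            # Edges that share a node meet at that node and are not counted.
            if len({u, v, w, x}) < 4:
                continue
            if segments_cross(pos[u], pos[v], pos[w], pos[x]):
                count += 1
    return count


edges = [(0, 2), (1, 3), (0, 1), (2, 3)]

# Layout 1: edges (0, 2) and (1, 3) are the diagonals of a square.
tangled = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
# Layout 2: nodes 2 and 3 swapped, which untangles the diagonals.
untangled = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}

print(count_crossings(tangled, edges))    # 1
print(count_crossings(untangled, edges))  # 0
```

A count like this is discrete and therefore awkward to optimize directly, which is one reason a formulation based on continuous inequality constraints and penalties, as the abstract proposes, is attractive.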

preprint, 2011, arXiv

Prediction of peptide bonding affinity: kernel methods for nonlinear modeling

This paper presents regression models obtained from a process of blind prediction of peptide binding affinity from provided descriptors for several distinct datasets as part of the 2006 Comparative Evaluation of Prediction Algorithms (COEPRA) contest. This paper finds that kernel partial least squares, a nonlinear partial least squares (PLS) algorithm, outperforms PLS, and that the incorporation of transferable atom equivalent features improves predictive capability.