Source author record

Jamie Haddock

Jamie Haddock appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning math.NA Numerical Analysis cs.CY Information Retrieval math.CO math.PR physics.data-an physics.soc-ph Social and Information Networks

Catalog footprint

What is connected

7works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A Generalized Hierarchical Nonnegative Tensor Decomposition

Nonnegative matrix factorization (NMF) has found many applications including topic modeling and document analysis. Hierarchical NMF (HNMF) variants are able to learn topics at various levels of granularity and illustrate their hierarchical relationship. Recently, nonnegative tensor factorization (NTF) methods have been applied in a similar fashion in order to handle data sets with complex, multi-modal structure. Hierarchical NTF (HNTF) methods have been proposed, however these methods do not naturally generalize their matrix-based counterparts. Here, we propose a new HNTF model which directly generalizes a HNMF model special case, and provide a supervised extension. We also provide a multiplicative updates training method for this model. Our experimental results show that this model more naturally illuminates the topic hierarchy than previous HNMF and HNTF methods.

preprint2022arXiv

Nonbacktracking spectral clustering of nonuniform hypergraphs

Spectral methods offer a tractable, global framework for clustering in graphs via eigenvector computations on graph matrices. Hypergraph data, in which entities interact on edges of arbitrary size, poses challenges for matrix representations and therefore for spectral clustering. We study spectral clustering for nonuniform hypergraphs based on the hypergraph nonbacktracking operator. After reviewing the definition of this operator and its basic properties, we prove a theorem of Ihara-Bass type which allows eigenpair computations to take place on a smaller matrix, often enabling faster computation. We then propose an alternating algorithm for inference in a hypergraph stochastic blockmodel via linearized belief-propagation which involves a spectral clustering step again using nonbacktracking operators. We provide proofs related to this algorithm that both formalize and extend several previous results. We pose several conjectures about the limits of spectral methods and detectability in hypergraph stochastic blockmodels in general, supporting these with in-expectation analysis of the eigeinpairs of our studied operators. We perform experiments in real and synthetic data that demonstrate the benefits of hypergraph methods over graph-based ones when interactions of different sizes carry different information about cluster structure.

preprint2022arXiv

Semi-supervised Nonnegative Matrix Factorization for Document Classification

We propose new semi-supervised nonnegative matrix factorization (SSNMF) models for document classification and provide motivation for these models as maximum likelihood estimators. The proposed SSNMF models simultaneously provide both a topic model and a model for classification, thereby offering highly interpretable classification results. We derive training methods using multiplicative updates for each new model, and demonstrate the application of these models to single-label and multi-label document classification, although the models are flexible to other supervised learning tasks such as regression. We illustrate the promise of these models and training methods on document classification datasets (e.g., 20 Newsgroups, Reuters).

preprint2022arXiv

Zero Forcing with Random Sets

Given a graph $G$ and a real number $0\le p\le 1$, we define the random set $B_p(G)\subset V(G)$ by including each vertex independently and with probability $p$. We investigate the probability that the random set $B_p(G)$ is a zero forcing set of $G$. In particular, we prove that for large $n$, this probability for trees is upper bounded by the corresponding probability for a path graph. Given a minimum degree condition, we also prove a conjecture of Boyer et.\ al.\ regarding the number of zero forcing sets of a given size that a graph can have.

preprint2021arXiv

On a Guided Nonnegative Matrix Factorization

Fully unsupervised topic models have found fantastic success in document clustering and classification. However, these models often suffer from the tendency to learn less-than-meaningful or even redundant topics when the data is biased towards a set of features. For this reason, we propose an approach based upon the nonnegative matrix factorization (NMF) model, deemed \textit{Guided NMF}, that incorporates user-designed seed word supervision. Our experimental results demonstrate the promise of this model and illustrate that it is competitive with other methods of this ilk with only very little supervision information.

preprint2020arXiv

Feature Selection on Lyme Disease Patient Survey Data

Lyme disease is a rapidly growing illness that remains poorly understood within the medical community. Critical questions about when and why patients respond to treatment or stay ill, what kinds of treatments are effective, and even how to properly diagnose the disease remain largely unanswered. We investigate these questions by applying machine learning techniques to a large scale Lyme disease patient registry, MyLymeData, developed by the nonprofit LymeDisease.org. We apply various machine learning methods in order to measure the effect of individual features in predicting participants' answers to the Global Rating of Change (GROC) survey questions that assess the self-reported degree to which their condition improved, worsened, or remained unchanged following antibiotic treatment. We use basic linear regression, support vector machines, neural networks, entropy-based decision tree models, and $k$-nearest neighbors approaches. We first analyze the general performance of the model and then identify the most important features for predicting participant answers to GROC. After we identify the "key" features, we separate them from the dataset and demonstrate the effectiveness of these features at identifying GROC. In doing so, we highlight possible directions for future study both mathematically and clinically.

preprint2020arXiv

Greed Works: An Improved Analysis of Sampling Kaczmarz-Motzkin

Stochastic iterative algorithms have gained recent interest in machine learning and signal processing for solving large-scale systems of equations, $Ax=b$. One such example is the Randomized Kaczmarz (RK) algorithm, which acts only on single rows of the matrix $A$ at a time. While RK randomly selects a row of $A$ to work with, Motzkin's Method (MM) employs a greedy row selection. Connections between the two algorithms resulted in the Sampling Kaczmarz-Motzkin (SKM) algorithm which samples a random subset of $β$ rows of $A$ and then greedily selects the best row of the subset. Despite their variable computational costs, all three algorithms have been proven to have the same theoretical upper bound on the convergence rate. In this work, an improved analysis of the range of random (RK) to greedy (MM) methods is presented. This analysis improves upon previous known convergence bounds for SKM, capturing the benefit of partially greedy selection schemes. This work also further generalizes previous known results, removing the theoretical assumptions that $β$ must be fixed at every iteration and that $A$ must have normalized rows.

Jamie Haddock

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

A Generalized Hierarchical Nonnegative Tensor Decomposition

Nonbacktracking spectral clustering of nonuniform hypergraphs

Semi-supervised Nonnegative Matrix Factorization for Document Classification

Zero Forcing with Random Sets

On a Guided Nonnegative Matrix Factorization

Feature Selection on Lyme Disease Patient Survey Data

Greed Works: An Improved Analysis of Sampling Kaczmarz-Motzkin