Source author record

Benjamin Jarman

Benjamin Jarman appears in the imported research catalog. Authorship, coauthor, and topic links are available, although ownership of the profile has not yet been claimed.

Researcher · Unclaimed source record

Topics: math.NA, Numerical Analysis, Information Retrieval, Machine Learning

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators


Preprint · 2022 · arXiv

Guided Semi-Supervised Non-negative Matrix Factorization on Legal Documents

Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we propose a method, namely Guided Semi-Supervised Non-negative Matrix Factorization (GSSNMF), that performs both classification and topic modeling by incorporating supervision from both pre-assigned document class labels and user-designed seed words. We test the performance of this method through its application to legal documents provided by the California Innocence Project, a nonprofit that works to free innocent convicted persons and reform the justice system. The results show that our proposed method improves both classification accuracy and topic coherence in comparison to past methods like Semi-Supervised Non-negative Matrix Factorization (SSNMF) and Guided Non-negative Matrix Factorization (Guided NMF).
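The GSSNMF objective itself is not given on this page. As a rough illustration of the semi-supervised baseline it is compared against (SSNMF), the NumPy sketch below jointly factors a nonnegative term-document matrix X ≈ A S and a label matrix Y ≈ B S with standard multiplicative updates. The seed-word guidance term that distinguishes GSSNMF is deliberately left out, and the function names, the weight `lam`, the toy data and the iteration count are illustrative assumptions, not the authors' code.

```python
import numpy as np


def ssnmf(X, Y, k, lam=1.0, iters=300, seed=0, eps=1e-10):
    """Semi-supervised NMF (SSNMF) via multiplicative updates.

    Approximately minimizes ||X - A S||_F^2 + lam * ||Y - B S||_F^2
    subject to A, B, S >= 0, where X is a nonnegative feature-by-document
    matrix and Y is a class-by-document label matrix (one-hot columns).
    This is the SSNMF baseline only; GSSNMF's seed-word term is omitted.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    c = Y.shape[0]
    A = rng.random((m, k))  # topic matrix (features x topics)
    B = rng.random((c, k))  # classifier (classes x topics)
    S = rng.random((k, n))  # document representations (topics x docs)
    for _ in range(iters):
        # Lee-Seung style updates: multiplying by a ratio of nonnegative
        # terms keeps every factor nonnegative; eps avoids division by zero.
        A *= (X @ S.T) / (A @ S @ S.T + eps)
        B *= (Y @ S.T) / (B @ S @ S.T + eps)
        S *= (A.T @ X + lam * B.T @ Y) / (A.T @ A @ S + lam * B.T @ B @ S + eps)
    return A, B, S


def objective(X, Y, A, B, S, lam=1.0):
    """Value of the SSNMF objective for the given factors."""
    return np.linalg.norm(X - A @ S) ** 2 + lam * np.linalg.norm(Y - B @ S) ** 2


if __name__ == "__main__":
    # Toy data: 5 terms x 4 documents, two classes of two documents each.
    X = np.array([[5, 4, 0, 0],
                  [4, 5, 0, 0],
                  [3, 3, 0, 1],
                  [0, 0, 5, 4],
                  [0, 1, 4, 5]], dtype=float)
    Y = np.array([[1, 1, 0, 0],
                  [0, 0, 1, 1]], dtype=float)
    A, B, S = ssnmf(X, Y, k=2)
    print("objective:", objective(X, Y, A, B, S))
    print("predicted classes:", np.argmax(B @ S, axis=0))
```

The S update is ordinary NMF applied to the stacked system [X; sqrt(lam) Y] ≈ [A; sqrt(lam) B] S, which is why a single shared S serves both the topic model and the classifier; class predictions are read off as the column-wise argmax of B @ S.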

Preprint · 2022 · arXiv

On Block Accelerations of Quantile Randomized Kaczmarz for Corrupted Systems of Linear Equations

With the growth of large datasets and large-scale learning tasks, the need for efficient and robust linear system solvers is greater than ever. The randomized Kaczmarz method (RK) and similar stochastic iterative methods have received considerable recent attention due to their efficient implementation and small memory footprint. These methods can tolerate streaming data, accessing only part of the data at a time, and can also approximate the least squares solution even if the system is affected by noise. However, when data is instead affected by large (possibly adversarial) corruptions, these methods fail to converge, as corrupted data points draw iterates far from the true solution. A recently proposed solution to this is the QuantileRK method, which avoids harmful corrupted data by exploring the space carefully as the method iterates. The exploration component requires the computation of quantiles of large samples from the system and is computationally much heavier than the subsequent iteration update. In this paper, we propose an approach that better uses the information obtained during exploration by incorporating an averaged version of the block Kaczmarz method. This significantly speeds up convergence, while still allowing for a constant fraction of the equations to be arbitrarily corrupted. We provide theoretical convergence guarantees as well as experimental supporting evidence. We also demonstrate that the classical projection-based block Kaczmarz method cannot be robust to sparse adversarial corruptions, but rather the blocking has to be carried out by averaging one-dimensional projections.
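A minimal sketch of the idea, assuming a simplified variant in which the quantile is estimated from the same sampled block that is then used for the update (the paper estimates quantiles from larger samples, and its exact step rule may differ). Rows whose absolute residual exceeds the q-quantile are treated as likely corrupted and skipped; the accepted rows contribute an averaged one-dimensional Kaczmarz projection, in line with the abstract's point that robust blocking should average one-dimensional projections. The function `quantile_block_rk` and its parameters `q`, `block_size` and `iters` are hypothetical.

```python
import numpy as np


def quantile_block_rk(A, b, q=0.7, block_size=50, iters=2000, seed=0):
    """Quantile-filtered, averaged block Kaczmarz (simplified sketch).

    Each step samples `block_size` rows, computes the q-quantile of their
    absolute residuals, discards rows above it as likely corrupted, and
    moves x by the average of the accepted rows' one-dimensional
    Kaczmarz projections.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.sum(A * A, axis=1)
    x = np.zeros(n)
    for _ in range(iters):
        idx = rng.choice(m, size=block_size, replace=False)
        res = A[idx] @ x - b[idx]
        thresh = np.quantile(np.abs(res), q)
        keep = np.abs(res) <= thresh  # never empty: the smallest residual passes
        sel = idx[keep]
        # Projecting x onto the hyperplane a_i . x = b_i subtracts
        # (r_i / ||a_i||^2) a_i; average those steps over accepted rows.
        steps = (res[keep] / row_norms_sq[sel])[:, None] * A[sel]
        x = x - steps.mean(axis=0)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 10))
    x_true = rng.standard_normal(10)
    b = A @ x_true
    corrupted = rng.choice(200, size=10, replace=False)
    b[corrupted] += 50.0  # large corruptions in 5% of the equations
    x = quantile_block_rk(A, b)
    print("error:", np.linalg.norm(x - x_true))
```

In the demo, 10 of the 200 right-hand-side entries are shifted by 50. Those rows keep large residuals near the true solution, so they sit above the quantile threshold and never enter the average, and the iterate can still approach the uncorrupted solution.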

Preprint · 2022 · arXiv

Randomized Extended Kaczmarz is a Limit Point of Sketch-and-Project

The sketch-and-project (SAP) framework for solving systems of linear equations has unified the theory behind popular projective iterative methods such as randomized Kaczmarz, randomized coordinate descent, and variants thereof. The randomized extended Kaczmarz (REK) method is a popular extension of randomized Kaczmarz for solving inconsistent systems, which has not yet been shown to lie within the SAP framework. In this work we show that, in a certain sense, REK may be expressed as the limit point of a family of SAP methods, but we argue that it is unlikely that REK can be translated into a SAP method itself. We provide an extensive theoretical analysis of the family of methods comprising said limit, including convergence guarantees and further connections to REK. We follow this with an array of experiments demonstrating these methods and their connections in practice.
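For reference, the classical REK iteration (in the form introduced by Zouzias and Freris) is sketched below; it is the method the paper relates to sketch-and-project, not the SAP family analysed there. A column step drives an auxiliary vector z toward the part of b outside the range of A, and a row step applies randomized Kaczmarz to the corrected system A x = b - z, so x approaches the least-squares solution even when A x = b is inconsistent. Rows and columns are sampled with probability proportional to their squared norms; the function name `rek` and the iteration count are assumptions.

```python
import numpy as np


def rek(A, b, iters=5000, seed=0):
    """Randomized extended Kaczmarz (REK) for possibly inconsistent A x = b.

    z is driven toward the component of b orthogonal to range(A) by
    column projections; x takes Kaczmarz row steps on A x = b - z and
    converges to the least-squares solution.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_sq = np.sum(A * A, axis=1)
    col_sq = np.sum(A * A, axis=0)
    row_p = row_sq / row_sq.sum()  # row sampling probabilities
    col_p = col_sq / col_sq.sum()  # column sampling probabilities
    x = np.zeros(n)
    z = np.array(b, dtype=float)  # working copy of b
    for _ in range(iters):
        # Column step: remove from z its component along column j.
        j = rng.choice(n, p=col_p)
        z -= (A[:, j] @ z) / col_sq[j] * A[:, j]
        # Row step: project x onto the hyperplane a_i . x = b_i - z_i.
        i = rng.choice(m, p=row_p)
        x += (b[i] - z[i] - A[i] @ x) / row_sq[i] * A[i]
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A = rng.standard_normal((100, 5))
    b = rng.standard_normal(100)  # generic b, so A x = b has no exact solution
    x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
    print("distance to least-squares solution:", np.linalg.norm(rek(A, b) - x_ls))
```

Plain randomized Kaczmarz on the same inconsistent system would stall at a distance from x_ls governed by the residual; subtracting the estimate z removes that residual from the right-hand side, which is what lets REK converge.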
