Researcher profile

Benjamin Jarman

Benjamin Jarman contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 15 - Unverified
Verification L1
Unclaimed author
3 works
0 followers
4 topics
4 close collaborators

Published work

3 published items

Preprint, 2022, arXiv

Guided Semi-Supervised Non-negative Matrix Factorization on Legal Documents

Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we propose a method, namely Guided Semi-Supervised Non-negative Matrix Factorization (GSSNMF), that performs both classification and topic modeling by incorporating supervision from both pre-assigned document class labels and user-designed seed words. We test the performance of this method through its application to legal documents provided by the California Innocence Project, a nonprofit that works to free innocent convicted persons and reform the justice system. The results show that our proposed method improves both classification accuracy and topic coherence in comparison to past methods like Semi-Supervised Non-negative Matrix Factorization (SSNMF) and Guided Non-negative Matrix Factorization (Guided NMF).
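
For orientation, the SSNMF baseline named in the abstract can be sketched in a few lines. This is not the paper's GSSNMF method: it omits the seed-word guidance and assumes every document is labeled. It uses a commonly cited Frobenius-norm formulation, minimizing ||X - AS||^2 + lam * ||Y - BS||^2 with multiplicative updates. The function name `ssnmf` and the parameters `lam`, `rank`, `iters`, and `eps` are illustrative assumptions, not names from the paper.

```python
import numpy as np

def ssnmf(X, Y, rank, lam=1.0, iters=200, seed=0, eps=1e-10):
    """Hypothetical sketch: X ~ A @ S (words x docs), Y ~ B @ S (classes x docs)."""
    rng = np.random.default_rng(seed)
    A = rng.random((X.shape[0], rank))   # word-topic dictionary
    B = rng.random((Y.shape[0], rank))   # topic-to-class map
    S = rng.random((rank, X.shape[1]))   # shared topic weights per document
    losses = []
    for _ in range(iters):
        # Multiplicative updates keep every factor entrywise nonnegative.
        A *= (X @ S.T) / (A @ S @ S.T + eps)
        B *= (Y @ S.T) / (B @ S @ S.T + eps)
        S *= (A.T @ X + lam * B.T @ Y) / (A.T @ A @ S + lam * B.T @ B @ S + eps)
        # Record the joint reconstruction plus classification objective.
        losses.append(np.linalg.norm(X - A @ S) ** 2
                      + lam * np.linalg.norm(Y - B @ S) ** 2)
    return A, B, S, losses
```

In this sketch the columns of `A` play the role of topics, and `B @ S` gives class scores per document, which is how a single factorization can serve both topic modeling and classification.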

Preprint, 2022, arXiv

On Block Accelerations of Quantile Randomized Kaczmarz for Corrupted Systems of Linear Equations

With the growth of large data as well as large-scale learning tasks, the need for efficient and robust linear system solvers is greater than ever. The randomized Kaczmarz method (RK) and similar stochastic iterative methods have received considerable recent attention due to their efficient implementation and memory footprint. These methods can tolerate streaming data, accessing only part of the data at a time, and can also approximate the least squares solution even if the system is affected by noise. However, when data is instead affected by large (possibly adversarial) corruptions, these methods fail to converge, as corrupted data points draw iterates far from the true solution. A recently proposed solution to this is the QuantileRK method, which avoids harmful corrupted data by exploring the space carefully as the method iterates. The exploration component requires the computation of quantiles of large samples from the system and is computationally much heavier than the subsequent iteration update. In this paper, we propose an approach that better uses the information obtained during exploration by incorporating an averaged version of the block Kaczmarz method. This significantly speeds up convergence, while still allowing for a constant fraction of the equations to be arbitrarily corrupted. We provide theoretical convergence guarantees as well as experimental supporting evidence. We also demonstrate that the classical projection-based block Kaczmarz method cannot be robust to sparse adversarial corruptions, but rather the blocking has to be carried out by averaging one-dimensional projections.
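
The quantile idea described in the abstract can be illustrated with a minimal single-row sketch. It is not the block-averaged method the paper proposes. It assumes unit-normalized rows and accepts a Kaczmarz step only when the sampled row's residual is no larger than the q-quantile of residuals over a random sample of rows. The function name `quantile_rk` and the parameters `q`, `sample`, and `iters` are illustrative assumptions.

```python
import numpy as np

def quantile_rk(A, b, q=0.7, iters=3000, sample=100, seed=0):
    """Hypothetical sketch of a quantile-thresholded randomized Kaczmarz loop."""
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(A, axis=1)
    A = A / norms[:, None]               # normalize rows (creates copies,
    b = b / norms                        # so the caller's arrays are untouched)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.integers(m)              # candidate row for the update
        idx = rng.choice(m, size=min(sample, m), replace=False)
        # Exploration step: quantile of absolute residuals on a row sample.
        threshold = np.quantile(np.abs(A[idx] @ x - b[idx]), q)
        r = A[i] @ x - b[i]
        if abs(r) <= threshold:          # skip rows that look corrupted
            x -= r * A[i]                # project onto the row's hyperplane
    return x
```

The quantile computation touches many rows while the update touches only one, which matches the abstract's remark that exploration is much heavier than the iteration update.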

Preprint, 2022, arXiv

Randomized Extended Kaczmarz is a Limit Point of Sketch-and-Project

The sketch-and-project (SAP) framework for solving systems of linear equations has unified the theory behind popular projective iterative methods such as randomized Kaczmarz, randomized coordinate descent, and variants thereof. The randomized extended Kaczmarz (REK) method is a popular extension of randomized Kaczmarz for solving inconsistent systems, which has not yet been shown to lie within the SAP framework. In this work we show that, in a certain sense, REK may be expressed as the limit point of a family of SAP methods, but we argue that it is unlikely that REK can be translated into a SAP method itself. We provide an extensive theoretical analysis of the family of methods comprising said limit, including convergence guarantees and further connections to REK. We follow this with an array of experiments demonstrating these methods and their connections in practice.
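
As background for the abstract, the standard REK iteration can be sketched as follows. This is the textbook method, not the family of SAP methods the paper analyzes. It keeps an auxiliary vector `z` that is driven toward the part of `b` orthogonal to the column space of `A`, and runs Kaczmarz steps against the corrected right-hand side `b - z`. Rows and columns are sampled with probability proportional to their squared norms. The function name `rek` and its parameters are illustrative assumptions.

```python
import numpy as np

def rek(A, b, iters=5000, seed=0):
    """Hypothetical sketch of randomized extended Kaczmarz for least squares."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_sq = np.sum(A ** 2, axis=1)
    col_sq = np.sum(A ** 2, axis=0)
    # Pre-draw indices with probability proportional to squared norms.
    rows = rng.choice(m, size=iters, p=row_sq / row_sq.sum())
    cols = rng.choice(n, size=iters, p=col_sq / col_sq.sum())
    x = np.zeros(n)
    z = b.astype(float)                  # copy of b; tracks the inconsistent part
    for i, j in zip(rows, cols):
        # Column step: remove from z its component along column j.
        z -= (A[:, j] @ z) / col_sq[j] * A[:, j]
        # Row step: Kaczmarz projection using the corrected entry b[i] - z[i].
        x += (b[i] - z[i] - A[i] @ x) / row_sq[i] * A[i]
    return x
```

On a small overdetermined system with full column rank, the output can be compared against `np.linalg.lstsq(A, b, rcond=None)[0]` as a sanity check.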