Researcher profile

Shane T. Jensen

Shane T. Jensen contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2022arXiv

Clustering Areal Units at Multiple Levels of Resolution to Model Crime in Philadelphia

Estimation of the spatial heterogeneity in crime incidence across an entire city is an important step towards reducing crime and increasing our understanding of the physical and social functioning of urban environments. This is a difficult modeling endeavor since crime incidence can vary smoothly across space and time but there also exist physical and social barriers that result in discontinuities in crime rates between different regions within a city. A further difficulty is that there are different levels of resolution that can be used for defining regions of a city in order to analyze crime. To address these challenges, we develop a Bayesian non-parametric approach for the clustering of urban areal units at different levels of resolution simultaneously. Our approach is evaluated with an extensive synthetic data study and then applied to the estimation of crime incidence at various levels of resolution in the city of Philadelphia.

preprint2022arXiv

Crime in Philadelphia: Bayesian Clustering with Particle Optimization

Accurate estimation of the change in crime over time is a critical first step towards better understanding of public safety in large urban environments. Bayesian hierarchical modeling is a natural way to study spatial variation in urban crime dynamics at the neighborhood level, since it facilitates principled ``sharing of information'' between spatially adjacent neighborhoods. Typically, however, cities contain many physical and social boundaries that may manifest as spatial discontinuities in crime patterns. In this situation, standard prior choices often yield overly-smooth parameter estimates, which can ultimately produce mis-calibrated forecasts. To prevent potential over-smoothing, we introduce a prior that partitions the set of neighborhoods into several clusters and encourages spatial smoothness within each cluster. In terms of model implementation, conventional stochastic search techniques are computationally prohibitive, as they must traverse a combinatorially vast space of partitions. We introduce an ensemble optimization procedure that simultaneously identifies several high probability partitions by solving one optimization problem using a new local search strategy. We then use the identified partitions to estimate crime trends in Philadelphia between 2006 and 2017. On simulated and real data, our proposed method demonstrates good estimation and partition selection performance.

preprint2013arXiv

Estimating Player Contribution in Hockey with Regularized Logistic Regression

We present a regularized logistic regression model for evaluating player contributions in hockey. The traditional metric for this purpose is the plus-minus statistic, which allocates a single unit of credit (for or against) to each player on the ice for a goal. However, plus-minus scores measure only the marginal effect of players, do not account for sample size, and provide a very noisy estimate of performance. We investigate a related regression problem: what does each player on the ice contribute, beyond aggregate team performance and other factors, to the odds that a given goal was scored by their team? Due to the large-p (number of players) and imbalanced design setting of hockey analysis, a major part of our contribution is a careful treatment of prior shrinkage in model estimation. We showcase two recently developed techniques -- for posterior maximization or simulation -- that make such analysis feasible. Each approach is accompanied with publicly available software and we include the simple commands used in our analysis. Our results show that most players do not stand out as measurably strong (positive or negative) contributors. This allows the stars to really shine, reveals diamonds in the rough overlooked by earlier analyses, and argues that some of the highest paid players in the league are not making contributions worth their expense.

preprint2012arXiv

A Level-Set Hit-and-Run Sampler for Quasi-Concave Distributions

We develop a new sampling strategy that uses the hit-and-run algorithm within level sets of the target density. Our method can be applied to any quasi-concave density, which covers a broad class of models. Our sampler performs well in high-dimensional settings, which we illustrate with a comparison to Gibbs sampling on a spike-and-slab mixture model. We also extend our method to exponentially-tilted quasi-concave densities, which arise often in Bayesian models consisting of a log-concave likelihood and quasi-concave prior density. Within this class of models, our method is effective at sampling from posterior distributions with high dependence between parameters, which we illustrate with a simple multivariate normal example. We also implement our level-set sampler on a Cauchy-normal model where we demonstrate the ability of our level set sampler to handle multi-modal posterior distributions.

preprint2012arXiv

Refining the Protein-Protein Interactome using Gene Expression Data

Proteins interact with other proteins within biological pathways, forming connected subgraphs in the protein-protein interactome (PPI). Proteins are often involved in multiple biological pathways which complicates interpretation of interactions between proteins. Gene expression data can assist our inference since genes within a particular pathway tend to have more correlated expression patterns than genes from distinct pathways. We provide an algorithm that uses gene expression information to remove inter-pathway protein-protein interactions, thereby simplifying the structure of the protein-protein interactome. This refined topology permits easier interpretation and greater biological coherence of multiple biological pathways simultaneously.

preprint2010arXiv

An Alternative Prior Process for Nonparametric Bayesian Clustering

Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly-used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit "rich-get-richer" characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering -- the uniform process -- for applications where the "rich-get-richer" property is undesirable. We also explore the cost of this process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. We compare performance on a real document clustering task, demonstrating the practical advantage of the uniform process despite its lack of exchangeability over orderings.

preprint2007arXiv

Bayesian variable selection and data integration for biological regulatory networks

A substantial focus of research in molecular biology are gene regulatory networks: the set of transcription factors and target genes which control the involvement of different biological processes in living cells. Previous statistical approaches for identifying gene regulatory networks have used gene expression data, ChIP binding data or promoter sequence data, but each of these resources provides only partial information. We present a Bayesian hierarchical model that integrates all three data types in a principled variable selection framework. The gene expression data are modeled as a function of the unknown gene regulatory network which has an informed prior distribution based upon both ChIP binding and promoter sequence data. We also present a variable weighting methodology for the principled balancing of multiple sources of prior information. We apply our procedure to the discovery of gene regulatory relationships in Saccharomyces cerevisiae (Yeast) for which we can use several external sources of information to validate our results. Our inferred relationships show greater biological relevance on the external validation measures than previous data integration methods. Our model also estimates synergistic and antagonistic interactions between transcription factors, many of which are validated by previous studies. We also evaluate the results from our procedure for the weighting for multiple sources of prior information. Finally, we discuss our methodology in the context of previous approaches to data integration and Bayesian variable selection.