Researcher profile

Philip E. Bourne

Philip E. Bourne contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2022arXiv

The Least Difference in Means: A Statistic for Effect Size Strength and Practical Significance

With limited resources, scientific inquiries must be prioritized for further study, funding, and translation based on their practical significance: whether the effect size is large enough to be meaningful in the real world. Doing so must evaluate a result's effect strength, defined as a conservative assessment of practical significance. We propose the least difference in means ($δ_L$) as a two-sample statistic that can quantify effect strength and perform a hypothesis test to determine if a result has a meaningful effect size. To facilitate consensus, $δ_L$ allows scientists to compare effect strength between related results and choose different thresholds for hypothesis testing without recalculation. Both $δ_L$ and the relative $δ_L$ outperform other candidate statistics in identifying results with higher effect strength. We use real data to demonstrate how the relative $δ_L$ compares effect strength across broadly related experiments. The relative $δ_L$ can prioritize research based on the strength of their results.

preprint2022arXiv

The Most Difference in Means: A Statistic for the Strength of Null and Near-Zero Results

Statistical insignificance does not suggest the absence of effect, yet scientists must often use null results as evidence of negligible (near-zero) effect size to falsify scientific hypotheses. Doing so must assess a result's null strength, defined as the evidence for a negligible effect size. Such an assessment would differentiate strong null results that suggest a negligible effect size from weak null results that suggest a broad range of potential effect sizes. We propose the most difference in means ($δ_M$) as a two-sample statistic that can both quantify null strength and perform a hypothesis test for negligible effect size. To facilitate consensus when interpreting results, our statistic allows scientists to conclude that a result has negligible effect size using different thresholds with no recalculation required. To assist with selecting a threshold, $δ_M$ can also compare null strength between related results. Both $δ_M$ and the relative form of $δ_M$ outperform other candidate statistics in comparing null strength. We compile broadly related results and use the relative $δ_M$ to compare null strength across different treatments, measurement methods, and experiment models. Reporting the relative $δ_M$ may provide a technical solution to the file drawer problem by encouraging the publication of null and near-zero results.

preprint2022arXiv

Why it takes a village to manage and share data

Implementation plans for the National Institutes of Health policy for data management and sharing, which takes effect in 2023, provide an opportunity to reflect on the stakeholders, infrastructures, practice, economics, and sustainability of data sharing. Responsibility for fulfilling data sharing requirements tends to fall on principal investigators, whereas it takes a village of stakeholders to construct, manage, and sustain the necessary knowledge infrastructure for disseminating data products. Individual scientists have mixed incentives, and many disincentives to share data, all of which vary by research domain, methods, resources, and other factors. Motivations and investments for data sharing also vary widely among academic institutional stakeholders such as university leadership, research computing, libraries, and individual schools and departments. Stakeholder concerns are interdependent along many dimensions, seven of which are explored: what data to share; context and credit; discovery; methods and training; intellectual property; data science programs; and international tensions. Data sharing is not a simple matter of individual practice, but one of infrastructure, institutions, and economics. Governments, funding agencies, and international science organizations all will need to invest in commons approaches for data sharing to develop into a sustainable international ecosystem.

preprint2020arXiv

Deep Learning of Protein Structural Classes: Any Evidence for an 'Urfold'?

Recent computational advances in the accurate prediction of protein three-dimensional (3D) structures from amino acid sequences now present a unique opportunity to decipher the interrelationships between proteins. This task entails--but is not equivalent to--a problem of 3D structure comparison and classification. Historically, protein domain classification has been a largely manual and subjective activity, relying upon various heuristics. Databases such as CATH represent significant steps towards a more systematic (and automatable) approach, yet there still remains much room for the development of more scalable and quantitative classification methods, grounded in machine learning. We suspect that re-examining these relationships via a Deep Learning (DL) approach may entail a large-scale restructuring of classification schemes, improved with respect to the interpretability of distant relationships between proteins. Here, we describe our training of DL models on protein domain structures (and their associated physicochemical properties) in order to evaluate classification properties at CATH's "homologous superfamily" (SF) level. To achieve this, we have devised and applied an extension of image-classification methods and image segmentation techniques, utilizing a convolutional autoencoder model architecture. Our DL architecture allows models to learn structural features that, in a sense, 'define' different homologous SFs. We evaluate and quantify pairwise 'distances' between SFs by building one model per SF and comparing the loss functions of the models. Hierarchical clustering on these distance matrices provides a new view of protein interrelationships--a view that extends beyond simple structural/geometric similarity, and towards the realm of structure/function properties.