Source author record

Philip E. Bourne

Philip E. Bourne appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Methodology, Quantitative Methods, Biomolecules, Digital Libraries

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators

Preprint, 2022, arXiv

The Least Difference in Means: A Statistic for Effect Size Strength and Practical Significance

With limited resources, scientific inquiries must be prioritized for further study, funding, and translation based on their practical significance: whether the effect size is large enough to be meaningful in the real world. Doing so must evaluate a result's effect strength, defined as a conservative assessment of practical significance. We propose the least difference in means ($δ_L$) as a two-sample statistic that can quantify effect strength and perform a hypothesis test to determine if a result has a meaningful effect size. To facilitate consensus, $δ_L$ allows scientists to compare effect strength between related results and choose different thresholds for hypothesis testing without recalculation. Both $δ_L$ and the relative $δ_L$ outperform other candidate statistics in identifying results with higher effect strength. We use real data to demonstrate how the relative $δ_L$ compares effect strength across broadly related experiments. The relative $δ_L$ can prioritize research based on the strength of their results.

Preprint, 2022, arXiv

The Most Difference in Means: A Statistic for the Strength of Null and Near-Zero Results

Statistical insignificance does not suggest the absence of effect, yet scientists must often use null results as evidence of negligible (near-zero) effect size to falsify scientific hypotheses. Doing so must assess a result's null strength, defined as the evidence for a negligible effect size. Such an assessment would differentiate strong null results that suggest a negligible effect size from weak null results that suggest a broad range of potential effect sizes. We propose the most difference in means ($δ_M$) as a two-sample statistic that can both quantify null strength and perform a hypothesis test for negligible effect size. To facilitate consensus when interpreting results, our statistic allows scientists to conclude that a result has negligible effect size using different thresholds with no recalculation required. To assist with selecting a threshold, $δ_M$ can also compare null strength between related results. Both $δ_M$ and the relative form of $δ_M$ outperform other candidate statistics in comparing null strength. We compile broadly related results and use the relative $δ_M$ to compare null strength across different treatments, measurement methods, and experiment models. Reporting the relative $δ_M$ may provide a technical solution to the file drawer problem by encouraging the publication of null and near-zero results.

Preprint, 2022, arXiv

Why it takes a village to manage and share data

Implementation plans for the National Institutes of Health policy for data management and sharing, which takes effect in 2023, provide an opportunity to reflect on the stakeholders, infrastructures, practice, economics, and sustainability of data sharing. Responsibility for fulfilling data sharing requirements tends to fall on principal investigators, whereas it takes a village of stakeholders to construct, manage, and sustain the necessary knowledge infrastructure for disseminating data products. Individual scientists have mixed incentives, and many disincentives to share data, all of which vary by research domain, methods, resources, and other factors. Motivations and investments for data sharing also vary widely among academic institutional stakeholders such as university leadership, research computing, libraries, and individual schools and departments. Stakeholder concerns are interdependent along many dimensions, seven of which are explored: what data to share; context and credit; discovery; methods and training; intellectual property; data science programs; and international tensions. Data sharing is not a simple matter of individual practice, but one of infrastructure, institutions, and economics. Governments, funding agencies, and international science organizations all will need to invest in commons approaches for data sharing to develop into a sustainable international ecosystem.

Preprint, 2020, arXiv

Deep Learning of Protein Structural Classes: Any Evidence for an 'Urfold'?

Recent computational advances in the accurate prediction of protein three-dimensional (3D) structures from amino acid sequences now present a unique opportunity to decipher the interrelationships between proteins. This task entails--but is not equivalent to--a problem of 3D structure comparison and classification. Historically, protein domain classification has been a largely manual and subjective activity, relying upon various heuristics. Databases such as CATH represent significant steps towards a more systematic (and automatable) approach, yet there still remains much room for the development of more scalable and quantitative classification methods, grounded in machine learning. We suspect that re-examining these relationships via a Deep Learning (DL) approach may entail a large-scale restructuring of classification schemes, improved with respect to the interpretability of distant relationships between proteins. Here, we describe our training of DL models on protein domain structures (and their associated physicochemical properties) in order to evaluate classification properties at CATH's "homologous superfamily" (SF) level. To achieve this, we have devised and applied an extension of image-classification methods and image segmentation techniques, utilizing a convolutional autoencoder model architecture. Our DL architecture allows models to learn structural features that, in a sense, 'define' different homologous SFs. We evaluate and quantify pairwise 'distances' between SFs by building one model per SF and comparing the loss functions of the models. Hierarchical clustering on these distance matrices provides a new view of protein interrelationships--a view that extends beyond simple structural/geometric similarity, and towards the realm of structure/function properties.
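The last two steps of the abstract above (pairwise "distances" derived from per-superfamily model losses, then hierarchical clustering) can be illustrated with a minimal sketch. The abstract does not say how losses are converted into distances or which linkage is used, so everything here is an assumption: the symmetrized excess-loss formula, the choice of average linkage, and the names `symmetrize` and `average_linkage` are all hypothetical, and the numbers are placeholder values, not results from the paper.

```python
from itertools import combinations


def symmetrize(loss):
    """Build a symmetric distance matrix from a cross-loss table.

    loss[i][j] is an assumed reconstruction loss of the model trained on
    class i when evaluated on data from class j (hypothetical setup).
    """
    n = len(loss)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Excess loss over each model's loss on its own class, averaged both ways.
        d = 0.5 * ((loss[i][j] - loss[i][i]) + (loss[j][i] - loss[j][j]))
        dist[i][j] = dist[j][i] = max(d, 0.0)
    return dist


def average_linkage(dist, labels):
    """Agglomerative clustering with average linkage.

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    index = {label: i for i, label in enumerate(labels)}
    clusters = {i: (label,) for i, label in enumerate(labels)}

    def avg(a, b):
        # Mean pairwise distance between the members of two clusters.
        pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
        return sum(dist[index[x]][index[y]] for x, y in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: avg(*p))
        merges.append((clusters[a], clusters[b], avg(a, b)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


# Placeholder losses for three hypothetical superfamilies (illustrative only).
labels = ["SF_A", "SF_B", "SF_C"]
loss = [
    [0.10, 0.30, 0.90],
    [0.35, 0.15, 0.80],
    [0.95, 0.85, 0.20],
]
for left, right, d in average_linkage(symmetrize(loss), labels):
    print(left, right, round(d, 4))
```

With these placeholder values, the two classes whose models reconstruct each other's data most cheaply merge first, and the third joins last. A real analysis would more likely use an established library for the clustering step; the pure-Python version is only meant to make the logic visible.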
