Researcher profile

Peter J. Waddell

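Peter J. Waddell is listed as an author of five arXiv preprints (2010 to 2012) on least squares phylogenetic trees, measures of fit for distance data, and gene flow between modern and archaic humans.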

Published work

5 published items

Preprint, 2012, arXiv

New g%AIC, g%AICc, g%BIC, and Power Divergence Fit Statistics Expose Mating between Modern Humans, Neanderthals and other Archaics

The purpose of this article is to look at how information criteria, such as AIC and BIC, relate to the g%SD fit criterion derived in Waddell et al. (2007, 2010a). The g%SD criterion measures the fit of data to model based on a normalized weighted root mean square percentage deviation between the observed data and model estimates of the data, with g%SD = 0 being a perfectly fitting model. However, this criterion may not adjust comprehensively for the number of parameters in the model. Thus, its relationship to more traditional measures for maximizing useful information in a model, including AIC and BIC, is examined. This results in an extended set of fit criteria including g%AIC and g%BIC. Further, a broader range of asymptotically most powerful fit criteria of the power divergence family, which includes maximum likelihood (or minimum G^2) and minimum X^2 modeling as special cases, is used to replace the sum of squares fit criterion within the g%SD criterion. Results are illustrated with a set of genetic distances looking particularly at a range of Jewish populations, plus a genomic data set that looks at how Neanderthals and Denisovans are related to each other and to modern humans. Evidence that Homo erectus may have left a significant fraction of its genome within the Denisovan is shown to persist with the new modeling criteria.
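As a rough illustration of the traditional criteria this abstract refers to, the sketch below computes AIC, AICc and BIC for a least squares fit under the textbook assumption of independent Gaussian errors. It is not the g%AIC or g%BIC of the article, whose definitions are not given here. The function name `information_criteria`, the way parameters are counted, and the example numbers are assumptions made for illustration only.

```python
import math


def information_criteria(rss, n, k):
    """Return AIC, AICc and BIC for a least squares fit with Gaussian errors.

    rss: residual sum of squares of the fitted model
    n:   number of data points (for example, pairwise distances)
    k:   number of free parameters (assumed here to include the error variance)

    Additive constants shared by all models fitted to the same data are dropped,
    so only differences between models are meaningful.
    """
    if rss <= 0:
        raise ValueError("rss must be positive")
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    neg2_loglik = n * math.log(rss / n)   # -2 ln L, up to a constant
    aic = neg2_loglik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction
    bic = neg2_loglik + k * math.log(n)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Hypothetical comparison: a simpler and a richer model fitted to 28 distances.
simple = information_criteria(rss=2.0, n=28, k=4)
richer = information_criteria(rss=1.5, n=28, k=9)
for name in ("AIC", "AICc", "BIC"):
    print(name, round(simple[name], 2), round(richer[name], 2))
```

Lower values are preferred. Because the BIC penalty per parameter is ln(n) rather than 2, BIC favors simpler models more strongly than AIC once n exceeds about 7.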

Preprint, 2011, arXiv

Homo denisova, Correspondence Spectral Analysis, Finite Sites Reticulate Hierarchical Coalescent Models and the Ron Jeremy Hypothesis

This article shows how to fit reticulate finite and infinite sites sequence spectra to aligned data from five modern human genomes (San, Yoruba, French, Han and Papuan) plus two archaic humans (Denisovan and Neanderthal), to better infer demographic parameters. These include interbreeding between distinct lineages. Major improvements in the fit of the sequence spectrum are made with successively more complicated models. Findings include some evidence of a male biased gene flow from the Denisova lineage to Papuan ancestors and possibly even more archaic gene flow. It is unclear if there is evidence for more than one Neanderthal interbreeding, as the evidence suggesting this largely disappears when a finite sites model is fitted.

Preprint, 2010, arXiv

A Unified Framework for Trees, Multi-Dimensional Scaling and Planar Graphs

Least squares trees, multi-dimensional scaling and Neighbor Nets are all different and popular ways of visualizing multi-dimensional data. The method of flexi-Weighted Least Squares (fWLS) is a powerful method of fitting phylogenetic trees when the exact form of errors is unknown. Here, both polynomial and exponential weights are used to model errors. The exact same models are implemented for multi-dimensional scaling to yield flexi-Weighted MDS, including as special cases methods such as the Sammon Stress function. Here we apply all these methods to population genetic data looking at the relationships of "Abraham's Children", encompassing Arabs and now widely dispersed populations of Jews, in relation to an African outgroup and a variety of European populations. Trees, MDS and Neighbor Nets of these data are compared within a common likelihood framework and the strengths and weaknesses of each method are explored. Because the errors in this type of data can be complex, for example, due to unexpected genetic transfer, we use a residual resampling method to assess the robustness of trees and the Neighbor Net. Despite the Neighbor Net fitting best by all criteria except BIC, its structure is ill defined following residual resampling. In contrast, fWLS trees are favored by BIC and retain strong internal structure following residual resampling. This structure clearly separates various European and Middle Eastern populations, yet it is clear all of the models have errors much larger than expected by sampling variance alone.

Preprint, 2010, arXiv

Resampling Residuals on Phylogenetic Trees: Extended Results

In this article the results of Waddell and Azad (2009) are extended. In particular, the geometric percentage mean standard deviation measure of the fit of distances to a phylogenetic tree is adjusted for the number of parameters fitted to the model. The formulae are also presented in their general form for any weight that is a function of the distance. The cell line gene expression data set of Ross et al. (2000) is reanalyzed. It is shown that ordinary least squares (OLS) is a much better fit to the data than a Neighbor Joining or BME tree. Residual resampling shows that cancer cell lines do indeed fit a tree fairly well and that the tree does have strong internal structure. Simulations show that least squares tree building methods, including OLS, are strong competitors with BME type methods for fitting model data, while real world examples often suggest the same conclusion.
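The general idea of residual resampling can be sketched on a toy case: fit edge lengths by ordinary least squares, resample the residuals with replacement, add them back to the fitted distances, and refit. The version below is a simplification under stated assumptions. It keeps the topology fixed to a single quartet ((A,B),(C,D)) and only tracks the internal edge, whereas the published procedure may differ, for instance in how residuals are scaled or in whether the tree is re-estimated for each replicate. All names (`DESIGN`, `solve`, `ols_edges`, `resample_internal_edge`) and the example distances are hypothetical.

```python
import random

# Rows are the pairs AB, AC, AD, BC, BD, CD. Columns are the edges a, b, c, d
# (leading to each tip) and e (internal) of the fixed quartet ((A,B),(C,D)).
DESIGN = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_edges(dist):
    """Ordinary least squares edge lengths via the normal equations."""
    k = len(DESIGN[0])
    ata = [[sum(row[r] * row[c] for row in DESIGN) for c in range(k)]
           for r in range(k)]
    atd = [sum(row[r] * d for row, d in zip(DESIGN, dist)) for r in range(k)]
    return solve(ata, atd)


def predict(edges):
    """Tree (path length) distances implied by a set of edge lengths."""
    return [sum(x * e for x, e in zip(row, edges)) for row in DESIGN]


def resample_internal_edge(dist, n_reps=200, seed=0):
    """Return (fraction of replicates with a positive internal edge, lengths)."""
    rng = random.Random(seed)
    fitted = predict(ols_edges(dist))
    residuals = [d - f for d, f in zip(dist, fitted)]
    lengths = []
    for _ in range(n_reps):
        noise = rng.choices(residuals, k=len(residuals))  # with replacement
        pseudo = [f + e for f, e in zip(fitted, noise)]
        lengths.append(ols_edges(pseudo)[4])              # index 4 is edge e
    support = sum(1 for x in lengths if x > 0) / n_reps
    return support, lengths


# Hypothetical, slightly non-additive distances for AB, AC, AD, BC, BD, CD.
support, _ = resample_internal_edge([0.31, 0.44, 0.57, 0.55, 0.63, 0.71])
print("fraction of replicates with a positive internal edge:", support)
```

With only six distances and five edges the residuals carry a single degree of freedom, so this toy case shows the mechanics rather than a realistic assessment of support.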

Preprint, 2010, arXiv

What use are Exponential Weights for flexi-Weighted Least Squares Phylogenetic Trees?

The method of flexi-Weighted Least Squares on evolutionary trees uses simple polynomial or exponential functions of the evolutionary distance in place of model-based variances. This has the advantage that unexpected deviations from additivity can be modeled in a more flexible way. At present, only polynomial weights have been used. However, a general family of exponential weights is desirable to compare with polynomial weights and to potentially exploit recent insights into fast least squares edge length estimation on trees. Here we describe families of weights that are multiplicative on trees, along with measures of fit of data to tree. It is shown that polynomial, and also multiplicative, weights can approximate the model-based variance of evolutionary distances well. Both models are fitted to evolutionary data from yeast genomes and, while the polynomial weights model fits better, the exponential weights model can fit much better than ordinary least squares. Iterated least squares is evaluated and is seen to converge quickly and with minimal change in the fit statistics when the data are in the range expected for useful evolutionary distances and simple Markov models of character change. In summary, both polynomial and exponential weighted least squares work well and justify further investment into developing the fastest possible algorithms for evaluating evolutionary trees.
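The weighting idea in this abstract can be sketched as weighted least squares in which the variance of each distance is taken to be a simple function of the distance itself, either polynomial (d^P) or exponential (exp(P d)). The example below is a minimal sketch on a fixed quartet ((A,B),(C,D)), assuming positive observed distances and weights computed once from the observed values. The exponent name `p`, the function names, and the example data are hypothetical, and the exact parameterization used in the article may differ.

```python
import math

# Rows are the pairs AB, AC, AD, BC, BD, CD. Columns are the edges a, b, c, d
# (leading to each tip) and e (internal) of the fixed quartet ((A,B),(C,D)).
DESIGN = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def variance_model(d, p, kind):
    """Assumed variance of a distance d: polynomial d**p or exponential exp(p*d)."""
    if kind == "polynomial":
        return d ** p
    if kind == "exponential":
        return math.exp(p * d)
    raise ValueError("kind must be 'polynomial' or 'exponential'")


def fit_edges(dist, p=0.0, kind="polynomial"):
    """Weighted least squares edge lengths and the weighted sum of squares.

    With p = 0 every weight is 1, which reduces to ordinary least squares.
    """
    w = [1.0 / variance_model(d, p, kind) for d in dist]
    n, k = len(dist), len(DESIGN[0])
    # Normal equations: (A^T W A) b = A^T W d
    ata = [[sum(w[i] * DESIGN[i][r] * DESIGN[i][c] for i in range(n))
            for c in range(k)] for r in range(k)]
    atd = [sum(w[i] * DESIGN[i][r] * dist[i] for i in range(n))
           for r in range(k)]
    edges = solve(ata, atd)
    fitted = [sum(x * e for x, e in zip(row, edges)) for row in DESIGN]
    wss = sum(wi * (d - f) ** 2 for wi, d, f in zip(w, dist, fitted))
    return edges, wss


# Hypothetical, slightly non-additive distances for AB, AC, AD, BC, BD, CD.
observed = [0.31, 0.44, 0.57, 0.55, 0.63, 0.71]
for kind in ("polynomial", "exponential"):
    edges, wss = fit_edges(observed, p=2.0, kind=kind)
    print(kind, [round(e, 4) for e in edges], wss)
```

Recomputing the weights from the fitted distances and refitting until the edge lengths stabilize would give an iterated variant. The weighted sums of squares from different weight families are on different scales, so comparing them needs a normalized measure of fit rather than the raw values printed here.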