Source author record

Laurie Davies

Laurie Davies appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Methodology, Applications, math.ST, q-fin.ST, stat.OT, Statistics Theory

Catalog footprint

What is connected

6 works

6 topics

4 close collaborators


preprint, 2022, arXiv

Covariate Selection Based on a Model-free Approach to Linear Regression with Exact Probabilities

In this paper we give a completely new approach to the problem of covariate selection in linear regression. A covariate or a set of covariates is included only if it is better in the sense of least squares than the same number of Gaussian covariates consisting of i.i.d. $N(0,1)$ random variables. The Gaussian P-value is defined as the probability that the Gaussian covariates are better. It is given in terms of the Beta distribution, is exact, and holds for all data, making it model-free. The covariate selection procedures require only a cut-off value $α$ for the Gaussian P-value: the default value in this paper is $α=0.01$. The resulting procedures are very simple, very fast, do not overfit and require only least squares. In particular, there is no regularization parameter, no data splitting, no use of simulations and no shrinkage, and no post-selection inference is required. The paper includes the results of simulations, applications to real data sets and theorems on the asymptotic behaviour under the standard linear model. Here the step-wise procedure performs overwhelmingly better than any other procedure we are aware of. An R package, gausscov, is available.
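The definition of the Gaussian P-value in the abstract above can be illustrated with a short Python sketch. It is not the paper's procedure: the paper states that the P-value is exact and given by the Beta distribution with no simulation needed, whereas this sketch estimates the same probability by Monte Carlo, purely to make the definition concrete. It also assumes the simplest possible setting (one candidate covariate, no intercept, no covariates already selected). The function names `rss_after_fit`, `gaussian_p_value_mc` and `make_data` are hypothetical and are not part of gausscov.

```python
import random


def rss_after_fit(y, x):
    """Residual sum of squares of y regressed on the single covariate x.

    Assumption for this sketch: no intercept and no other covariates.
    """
    xy = sum(a * b for a, b in zip(x, y))
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    return yy - xy * xy / xx


def gaussian_p_value_mc(y, x, n_sim=2000, seed=0):
    """Monte Carlo estimate of the probability that an i.i.d. N(0,1)
    covariate gives a smaller residual sum of squares than x.

    The paper gives this probability exactly via the Beta distribution;
    simulation is used here only as an illustration.
    """
    rng = random.Random(seed)
    rss_x = rss_after_fit(y, x)
    better = 0
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in y]
        if rss_after_fit(y, z) < rss_x:
            better += 1
    return better / n_sim


def make_data(n=50, seed=1):
    """Toy data (hypothetical): y depends on `signal` but not on `noise_cov`."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise_cov = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [2.0 * s + rng.gauss(0.0, 1.0) for s in signal]
    return y, signal, noise_cov


if __name__ == "__main__":
    alpha = 0.01  # default cut-off value quoted in the abstract
    y, signal, noise_cov = make_data()
    for name, x in [("signal", signal), ("noise_cov", noise_cov)]:
        p = gaussian_p_value_mc(y, x)
        print(name, p, "include" if p < alpha else "exclude")
```

A covariate is kept only when its estimated P-value falls below the cut-off, mirroring the rule described in the abstract; for a covariate unrelated to `y` the estimate can land anywhere between 0 and 1.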

preprint, 2022, arXiv

Linear Regression, Covariate Selection and the Failure of Modelling

It is argued that all model-based approaches to the selection of covariates in linear regression have failed. This applies to frequentist approaches based on P-values and to Bayesian approaches, although for different reasons. In the first part of the paper, 13 model-based procedures are compared to the model-free Gaussian covariate procedure in terms of the covariates selected and the time required. The comparison is based on seven data sets and three simulations. There is nothing special about these data sets, which are often used as examples in the literature. All the model-based procedures failed. In the second part of the paper it is argued that the cause of this failure is the very use of a model. If the model involves all the available covariates, standard P-values can be used. The use of P-values in this situation is quite straightforward. As soon as the model specifies only some unknown subset of the covariates, the problem being to identify this subset, the situation changes radically. There are many P-values, they are dependent and most of them are invalid. The P-value based approach collapses. The Bayesian paradigm also assumes a correct model, but although there are no conceptual problems with a large number of covariates, there is a considerable overhead causing computational and allocation problems even for moderately sized data sets. The Gaussian covariate procedure is based on P-values which are defined as the probability that a random Gaussian covariate is better than the covariate being considered. These P-values are exact and valid whatever the situation. The allocation requirements and the algorithmic complexity are both linear in the size of the data, making the procedure capable of handling large data sets. It outperforms all the other procedures in every respect.

preprint, 2016, arXiv

Functional Choice and Non-significance Regions in Regression

Given data $y$ and $k$ covariates $x$ the problem is to decide which covariates to include when approximating $y$ by a linear function of the covariates. The decision is based on replacing subsets of the covariates by i.i.d. normal random variables and comparing the error with that obtained by retaining the subsets. If the two errors are not significantly different for a particular subset it is concluded that the covariates in this subset are no better than random noise and they are not included in the linear approximation to $y$.

preprint, 2016, arXiv

On $p$-values

Models are consistently treated as approximations and all procedures are consistent with this. They do not treat the model as being true. In this context $p$-values are one measure of approximation, a small $p$-value indicating a poor approximation. Approximation regions are defined and distinguished from confidence regions.

preprint, 2016, arXiv

Stylized Facts and Simulating Long Range Financial Data

We propose a new method (implemented in an R-program) to simulate long-range daily stock-price data. The program reproduces various stylized facts much better than various parametric models from the extended GARCH-family. In particular, the empirically observed changes in unconditional variance are truthfully mirrored in the simulated data.

preprint, 2010, arXiv

Locally adaptive image denoising by a statistical multiresolution criterion

We demonstrate how one can choose the smoothing parameter in image denoising by a statistical multiresolution criterion, both globally and locally. Using inhomogeneous diffusion and total variation regularization as examples for localized regularization schemes, we present an efficient method for locally adaptive image denoising. As expected, the smoothing parameter serves as an edge detector in this framework. Numerical examples illustrate the usefulness of our approach. We also present an application in confocal microscopy.
