Researcher profile

David Degras

David Degras's listed work covers feature matching across large data collections, survey sampling for functional data, and nonparametric regression.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified
Verification L1
Unclaimed author
5 works
0 followers
4 topics
3 close collaborators

Published work

5 published items

Preprint, 2021, arXiv

Scalable Feature Matching Across Large Data Collections

This paper is concerned with matching feature vectors in a one-to-one fashion across large collections of datasets. Formulating this task as a multidimensional assignment problem with decomposable costs (MDADC), we develop extremely fast algorithms with time complexity linear in the number $n$ of datasets and space complexity a small fraction of the data size. These remarkable properties hinge on using the squared Euclidean distance as dissimilarity function, which can reduce ${n \choose 2}$ matching problems between pairs of datasets to $n$ problems and enable calculating assignment costs on the fly. To our knowledge, no other method applicable to the MDADC possesses these linear scaling and low-storage properties necessary for large-scale applications. In numerical experiments, the novel algorithms outperform competing methods and show excellent computational and optimization performance. An application of feature matching to a large neuroimaging database is presented. The algorithms of this paper are implemented in the R package matchFeat available at https://github.com/ddegras/matchFeat.
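
One way to see why the squared Euclidean distance helps is a standard identity: for $n$ vectors, the sum of squared distances over all pairs equals $n$ times the sum of squared distances to their mean. The cost of a group of matched vectors can therefore be evaluated against one centroid instead of across all ${n \choose 2}$ pairs. The sketch below only checks this identity numerically. The function names are hypothetical, and whether matchFeat exploits the identity in exactly this form is an assumption, not something stated in the abstract.

```python
import random


def pairwise_cost(vectors):
    """Sum of squared Euclidean distances over all unordered pairs."""
    total = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            total += sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
    return total


def centroid_cost(vectors):
    """n times the sum of squared distances to the mean vector."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(dim)]
    spread = sum(sum((v[k] - mean[k]) ** 2 for k in range(dim)) for v in vectors)
    return n * spread


# Hypothetical example: one feature vector taken from each of 6 datasets.
random.seed(0)
cluster = [[random.gauss(0, 1) for _ in range(3)] for _ in range(6)]

# The two costs agree up to floating-point rounding.
print(abs(pairwise_cost(cluster) - centroid_cost(cluster)) < 1e-9)
```

The pairwise version needs a double loop over vectors, while the centroid version makes a single pass, which is consistent with the linear scaling in $n$ that the abstract describes.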

Preprint, 2013, arXiv

Confidence bands for Horvitz-Thompson estimators using sampled noisy functional data

When collections of functional data are too large to be exhaustively observed, survey sampling techniques provide an effective way to estimate global quantities such as the population mean function. Assuming functional data are collected from a finite population according to a probabilistic sampling scheme, with the measurements being discrete in time and noisy, we propose to first smooth the sampled trajectories with local polynomials and then estimate the mean function with a Horvitz-Thompson estimator. Under mild conditions on the population size, observation times, regularity of the trajectories, sampling scheme, and smoothing bandwidth, we prove a Central Limit theorem in the space of continuous functions. We also establish the uniform consistency of a covariance function estimator and apply the former results to build confidence bands for the mean function. The bands attain nominal coverage and are obtained through Gaussian process simulations conditional on the estimated covariance function. To select the bandwidth, we propose a cross-validation method that accounts for the sampling weights. A simulation study assesses the performance of our approach and highlights the influence of the sampling scheme and bandwidth choice.
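
As a rough illustration of the estimation step, the Horvitz-Thompson estimator of a mean function weights each sampled curve by the inverse of its inclusion probability and divides by the population size. The sketch below assumes the curves are already smoothed and evaluated on a common time grid. The function and argument names are hypothetical, and the smoothing, covariance estimation, and confidence-band steps from the abstract are not shown.

```python
def horvitz_thompson_mean(curves, inclusion_probs, population_size):
    """Horvitz-Thompson estimate of the population mean function.

    curves: sampled curves, each a list of values on a common time grid.
    inclusion_probs: inclusion probability of each sampled unit.
    population_size: number of units in the finite population.
    """
    n_times = len(curves[0])
    estimate = []
    for t in range(n_times):
        # Each sampled unit stands in for 1 / pi units of the population.
        weighted_sum = sum(c[t] / pi for c, pi in zip(curves, inclusion_probs))
        estimate.append(weighted_sum / population_size)
    return estimate


# Hypothetical example: 2 curves sampled from a population of 4,
# each with inclusion probability 0.5.
sampled = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
print(horvitz_thompson_mean(sampled, [0.5, 0.5], 4))  # [2.0, 3.0, 4.0]
```

With equal inclusion probabilities the estimate reduces to the ordinary sample mean curve; unequal probabilities shift weight toward units that were less likely to be sampled.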

Preprint, 2013, arXiv

Rotation Sampling for Functional Data

This paper addresses the survey estimation of a population mean in continuous time. For this purpose we extend the rotation sampling method to functional data. In contrast to conventional rotation designs that select the sample before the survey, our approach randomizes each sample replacement and thus allows for adaptive sampling. Using Markov chain theory, we evaluate the covariance structure and the integrated squared error (ISE) of the related Horvitz-Thompson estimator. Our sampling designs decrease the mean ISE by suitably reallocating the sample across population strata during replacements. They also reduce the variance of the ISE by increasing the frequency or the intensity of replacements. To investigate the benefits of using both current and past measurements in the estimation, we develop a new composite estimator. In an application to electricity usage data, our rotation method outperforms fixed panels and conventional rotation samples. Because of the weak temporal dependence of the data, the composite estimator only slightly improves upon the Horvitz-Thompson estimator.

Preprint, 2011, arXiv

Local Polynomial Regression Based on Functional Data

Suppose that $n$ statistical units are observed, each following the model $Y(x_j) = m(x_j) + \varepsilon(x_j)$, $j = 1, \ldots, N$, where $m$ is a regression function, $0 \leq x_1 < \cdots < x_N \leq 1$ are observation times spaced according to a sampling density $f$, and $\varepsilon$ is a continuous-time error process having mean zero and regular covariance function. Considering the local polynomial estimation of $m$ and its derivatives, we derive asymptotic expressions for the bias and variance as $n, N \to \infty$. Such results are particularly relevant in the context of functional data where essential information is contained in the derivatives. Based on these results, we deduce optimal sampling densities, optimal bandwidths and asymptotic normality of the estimator. Simulations are conducted in order to compare the performances of local polynomial estimators based on exact optimal bandwidths, asymptotic optimal bandwidths, and cross-validated bandwidths.
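
To make the estimator concrete, here is a minimal sketch of local polynomial regression of degree 1 (local linear) at a single point. It assumes the inputs are the observation times $x_j$ and the pointwise average of the $n$ observed curves, and it uses an Epanechnikov kernel. The kernel choice, the degree, and all names are assumptions made for illustration; the abstract does not specify them.

```python
def local_linear(xs, ys, x0, bandwidth):
    """Local linear estimates of m(x0) and m'(x0) by kernel-weighted least squares."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        u = (x - x0) / bandwidth
        if abs(u) > 1.0:
            continue  # the Epanechnikov kernel vanishes outside [-1, 1]
        w = 0.75 * (1.0 - u * u)
        d = x - x0
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    det = s0 * s2 - s1 * s1
    if det == 0.0:
        raise ValueError("not enough distinct points inside the kernel window")
    value = (s2 * t0 - s1 * t1) / det  # fitted intercept: estimate of m(x0)
    slope = (s0 * t1 - s1 * t0) / det  # fitted slope: estimate of m'(x0)
    return value, slope


# Hypothetical example: noiseless data from m(x) = 2x + 1 on a grid of 21 points.
grid = [i / 20 for i in range(21)]
mean_curve = [2 * x + 1 for x in grid]
value, slope = local_linear(grid, mean_curve, 0.5, 0.2)
```

A local linear fit reproduces a straight line up to rounding, so on this example `value` is close to 2 and `slope` is close to 2. The bandwidth is fixed by hand here, whereas the abstract compares optimal and cross-validated choices.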

Preprint, 2011, arXiv

Nonparametric estimation of a trend based upon sampled continuous processes

Let $X$ be a second-order random process indexed by a compact interval $[0,T]$. Assume that $n$ independent realizations of $X$ are observed on a fixed grid of $p$ time points. Under mild regularity assumptions on the sample paths of $X$, we show the asymptotic normality of suitable nonparametric estimators of the trend function $\mu = EX$ in the space $C([0,T])$ as $n, p$ go to infinity and, using Gaussian process theory, we derive approximate simultaneous confidence bands for $\mu$.