Source author record

Lutz Duembgen

Lutz Duembgen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.ST, Statistics Theory, Methodology, Computation, econ.EM, math.PR

Catalog footprint

What is connected

24 works

6 topics

4 close collaborators


Preprint, 2022, arXiv

Accelerating the pool-adjacent-violators algorithm for isotonic distributional regression

In the context of estimating stochastically ordered distribution functions, the pool-adjacent-violators algorithm (PAVA) can be modified such that the computation times are reduced substantially. This is achieved by studying the dependence of antitonic weighted least squares fits on the response vector to be approximated.
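For readers unfamiliar with PAVA, here is a minimal sketch of the textbook weighted pool-adjacent-violators algorithm for an isotonic (non-decreasing) least-squares fit. An antitonic fit is obtained by negating the responses. This is the standard algorithm, not the accelerated variant described above, and the function names `pava` and `antitonic_fit` are hypothetical.

```python
def pava(y, w=None):
    """Weighted isotonic (non-decreasing) least-squares fit via pool-adjacent-violators."""
    if w is None:
        w = [1.0] * len(y)
    # Each block stores its weighted mean, total weight and number of pooled points.
    means, weights, counts = [], [], []
    for yi, wi in zip(y, w):
        means.append(float(yi))
        weights.append(float(wi))
        counts.append(1)
        # Pool the last two blocks as long as they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            total = weights[-2] + weights[-1]
            merged = (weights[-2] * means[-2] + weights[-1] * means[-1]) / total
            cnt = counts[-2] + counts[-1]
            for stack in (means, weights, counts):
                stack.pop()
            means[-1], weights[-1], counts[-1] = merged, total, cnt
    # Expand the block means back to one fitted value per observation.
    fit = []
    for m, c in zip(means, counts):
        fit.extend([m] * c)
    return fit


def antitonic_fit(y, w=None):
    """Non-increasing weighted least-squares fit, obtained by negating the data."""
    return [-v for v in pava([-v for v in y], w)]
```

For example, `pava([1, 3, 2, 4])` pools the violating pair (3, 2) into their mean and returns `[1.0, 2.5, 2.5, 4.0]`.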

Preprint, 2022, arXiv

Bounding distributional errors via density ratios

We present some new and explicit error bounds for the approximation of distributions. The approximation error is quantified by the maximal density ratio of the distribution $Q$ to be approximated and its proxy $P$. This non-symmetric measure is more informative than the total variation distance and implies bounds for it. Explicit approximation problems include, among others, hypergeometric by binomial distributions, binomial by Poisson distributions, and beta by gamma distributions. In many cases we provide both upper and (matching) lower bounds.
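As a numerical illustration of these quantities (not of the paper's explicit bounds), the sketch below computes the maximal density ratio $\rho = \max_x q(x)/p(x)$ for $Q = \mathrm{Bin}(n, p)$ and its Poisson proxy $P$ with mean $np$, together with the total variation distance. It also checks the elementary inequality $d_{TV}(Q, P) \le 1 - 1/\rho$, which holds because $p(x)/q(x) \ge 1/\rho$ wherever $q(x) > 0$. The function names are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    # Evaluated on the log scale to avoid overflow in k!; requires lam > 0.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def ratio_and_tv(n, p):
    """Return (max_x q(x)/p(x), total variation distance) for Bin(n, p) vs Poisson(n p)."""
    lam = n * p
    q = [binom_pmf(k, n, p) for k in range(n + 1)]
    r = [poisson_pmf(k, lam) for k in range(n + 1)]
    rho = max(qk / rk for qk, rk in zip(q, r))
    # TV = sum of the positive parts of q - r; for k > n, q(k) = 0 contributes nothing.
    tv = sum(max(qk - rk, 0.0) for qk, rk in zip(q, r))
    return rho, tv
```

Since $Q$ lives on $\{0, \ldots, n\}$ while the Poisson proxy puts positive mass beyond $n$, the ratio $\rho$ always exceeds 1, so the bound $1 - 1/\rho$ lies strictly between 0 and 1.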

Preprint, 2022, arXiv

Honest calibration assessment for binary outcome predictions

Probability predictions from binary regressions or machine learning methods ought to be calibrated: If an event is predicted to occur with probability $x$, it should materialize with approximately that frequency, which means that the so-called calibration curve $p(\cdot)$ should equal the identity, $p(x) = x$ for all $x$ in the unit interval. We propose honest calibration assessment based on novel confidence bands for the calibration curve, which are valid only subject to the natural assumption of isotonicity. Besides testing the classical goodness-of-fit null hypothesis of perfect calibration, our bands facilitate inverted goodness-of-fit tests whose rejection allows for the sought-after conclusion of a sufficiently well specified model. We show that our bands have a finite sample coverage guarantee, are narrower than existing approaches, and adapt to the local smoothness of the calibration curve $p$ and the local variance of the binary observations. In an application to model predictions of an infant having a low birth weight, the bounds give informative insights into model calibration.

Preprint, 2022, arXiv

Honest Confidence Bands for Isotonic Quantile Curves

We provide confidence bands for isotonic quantile curves in nonparametric univariate regression with guaranteed given coverage probability. The method is an adaptation of the confidence bands of Duembgen and Johns (2004) for isotonic median curves.

Preprint, 2022, arXiv

Refining Invariant Coordinate Selection via Local Projection Pursuit

Invariant coordinate selection (ICS), introduced by Tyler et al. (2009, JRSS B), is a powerful tool for finding potentially interesting projections of multivariate data. In some cases, some of the projections proposed by ICS come close to really interesting ones, but small deviations can result in a blurred view that does not reveal the feature (e.g. a clustering) which would otherwise be clearly visible. To remedy this problem, we propose an automated and localized version of projection pursuit (PP), cf. Huber (1985, Ann. Statist.). Specifically, our local search is based on gradient descent applied to the estimated differential entropy as a function of the projection matrix.

Preprint, 2020, arXiv

Local Estimation of a Multivariate Density and its Derivatives

We analyze four different approaches to estimating a multivariate probability density (or the log-density) and its first and second order derivatives. Two methods, local log-likelihood and local Hyvärinen score estimation, are formulated in terms of weighted scoring rules with local quadratic models. The other two approaches are matching of local moments and kernel density estimation. All estimators depend on a general kernel, and we use the Gaussian kernel to provide explicit examples. Asymptotic properties of the estimators are derived and compared. In terms of rates of convergence, a refined local moment matching estimator performs best.
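A minimal sketch of the simplest of the four approaches, plain Gaussian kernel density estimation, assuming a single isotropic bandwidth `h` (a hypothetical simplification; the abstract allows a general kernel, and the refined estimators are not reproduced here). It returns $\hat f(x) = n^{-1} \sum_i \phi_h(x - X_i)$ together with its gradient, using $\nabla \phi_h(u) = -\phi_h(u)\, u / h^2$.

```python
import math


def gaussian_kde(x, data, h):
    """Gaussian KDE value and gradient at point x (a list of d coordinates)."""
    d = len(x)
    norm = (2.0 * math.pi * h * h) ** (-d / 2.0)  # normalizing constant of phi_h
    value = 0.0
    grad = [0.0] * d
    for xi in data:
        diff = [a - b for a, b in zip(x, xi)]
        k = norm * math.exp(-sum(t * t for t in diff) / (2.0 * h * h))
        value += k
        # Gradient of the Gaussian kernel: -phi_h(u) * u / h^2.
        for j in range(d):
            grad[j] -= k * diff[j] / (h * h)
    n = len(data)
    return value / n, [g / n for g in grad]
```

A central finite difference of the returned value should agree with the analytic gradient, which is a convenient sanity check when experimenting with other kernels.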

Preprint, 2020, arXiv

The density ratio of Poisson binomial versus Poisson distributions

Let $b(x)$ be the probability that a sum of independent Bernoulli random variables with parameters $p_1, p_2, p_3, \ldots \in [0,1)$ equals $x$, where $\lambda := p_1 + p_2 + p_3 + \cdots$ is finite. We prove two inequalities for the maximal ratio $b(x)/\pi_\lambda(x)$, where $\pi_\lambda$ is the weight function of the Poisson distribution with parameter $\lambda$.
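The quantities in this abstract are easy to evaluate numerically. A minimal sketch, assuming only finitely many $p_i$ are nonzero and $\lambda > 0$ (the abstract allows infinitely many): the weights $b(x)$ follow from the standard convolution recursion, and the maximal ratio is taken over $x = 0, \ldots, n$, since $b$ vanishes beyond $n$. The function names are hypothetical, and the paper's inequalities are not reproduced.

```python
import math


def poisson_binomial_pmf(ps):
    """Weights b(0), ..., b(n) of a sum of independent Bernoulli(p_i) variables."""
    b = [1.0]
    for p in ps:
        # Adding one Bernoulli(p) summand: b_new(x) = (1 - p) b(x) + p b(x - 1).
        nxt = [0.0] * (len(b) + 1)
        for x, mass in enumerate(b):
            nxt[x] += (1.0 - p) * mass
            nxt[x + 1] += p * mass
        b = nxt
    return b


def max_ratio_to_poisson(ps):
    """max_x b(x) / pi_lambda(x) with lambda = sum(ps); assumes lambda > 0."""
    lam = sum(ps)
    b = poisson_binomial_pmf(ps)
    poisson = [math.exp(x * math.log(lam) - lam - math.lgamma(x + 1)) for x in range(len(b))]
    return max(bx / px for bx, px in zip(b, poisson))
```

For a single summand with parameter $p$, the ratios are $(1-p)e^p$ at $x = 0$ and $e^p$ at $x = 1$, so `max_ratio_to_poisson([p])` returns $e^p$.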

Preprint, 2019, arXiv

Monotone Least Squares and Isotonic Quantiles

We consider bivariate observations $(X_1,Y_1), \ldots, (X_n,Y_n)$ such that, conditional on the $X_i$, the $Y_i$ are independent random variables with distribution functions $F_{X_i}$, where $(F_x)_x$ is an unknown family of distribution functions. Under the sole assumption that $x \mapsto F_x$ is isotonic with respect to stochastic order, one can estimate $(F_x)_x$ in two ways: (i) For any fixed $y$ one estimates the antitonic function $x \mapsto F_x(y)$ via nonparametric monotone least squares, replacing the responses $Y_i$ with the indicators $1_{[Y_i \le y]}$. (ii) For any fixed $\beta \in (0,1)$ one estimates the isotonic quantile function $x \mapsto F_x^{-1}(\beta)$ via a nonparametric version of regression quantiles. We show that these two approaches are closely related, with (i) being more flexible than (ii). Then, under mild regularity conditions, we establish rates of convergence for the resulting estimators $\hat{F}_x(y)$ and $\hat{F}_x^{-1}(\beta)$, uniformly over $(x,y)$ and $(x,\beta)$ in certain rectangles as well as uniformly in $y$ or $\beta$ for a fixed $x$.
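Approach (i) can be sketched in a few lines. This minimal version assumes distinct design points and unweighted least squares (hypothetical simplifications): for a fixed threshold $y$, the indicators $1_{[Y_i \le y]}$, ordered by $X_i$, are fitted by a non-increasing sequence via pool-adjacent-violators. The helper names are hypothetical.

```python
def antitonic_ls(z):
    """Non-increasing least-squares fit of the sequence z via pool-adjacent-violators."""
    blocks = []  # each block: [sum of values, number of points]
    for v in z:
        blocks.append([float(v), 1])
        # Merge while the block means increase, which violates antitonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit


def estimate_cdf(xs, ys, y):
    """Estimates of F_x(y) at the design points, returned in increasing order of x.

    Assumes the x-values are distinct (ties would require pooling tied points).
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    indicators = [1.0 if ys[i] <= y else 0.0 for i in order]
    return antitonic_ls(indicators)
```

Repeating `estimate_cdf` over a grid of thresholds $y$ yields an estimate of the whole family $(F_x)_x$ at the observed design points.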

Preprint, 2016, arXiv

(Ab)Using Regression for Data Adjustment

In various economic applications, people want to compare $n$ units with respect to certain quantities $Y_1, Y_2, \ldots, Y_n$ measuring their performance. The latter, however, is often influenced by factors beyond the units' control, and one would like to extract an adjusted performance from the data. Specifically, let $X_i \in \mathcal{X}$ summarize the factors of the $i$-th unit. Then one could think of a model equation $Y_i = f_o(X_i) + \varepsilon_i$ with a regression function $f_o : \mathcal{X} \to \mathbb{R}$ describing the unavoidable influence of the factors $X_i$, and $\varepsilon_i$ being the adjusted performance of the $i$-th unit. A common proposal is to estimate $f_o$ via regression methods by a function $\hat{f}$ depending on the current data $(X_i,Y_i)$, possibly augmented by additional past data, and to use the residuals $\hat{\varepsilon}_i := Y_i - \hat{f}(X_i)$ as surrogates for the adjusted performances $\varepsilon_i$. In the present report we discuss this approach, its potential pitfalls and (mis)interpretation. In particular, an unavoidable property of the residuals $\hat{\varepsilon}_i$ is that they measure only parts of the adjusted performance, while the remaining parts get hidden in the estimated function $\hat{f}$. Possible alternatives are mentioned briefly.
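A toy illustration of how parts of the adjusted performance get hidden in $\hat{f}$, assuming ordinary least squares with an intercept as the regression method (an assumption; the report discusses regression estimators in general). If every unit's adjusted performance $\varepsilon_i$ is shifted by the same constant, or by a term linear in $X_i$, the residuals do not change at all: the shift is absorbed by the fitted intercept and slope. The data are made up for illustration only.

```python
def ols_residuals(xs, ys):
    """Residuals Y_i - fhat(X_i) of a simple linear least-squares fit with intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data: factor values X_i and observed performances Y_i.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
base = ols_residuals(xs, ys)
# Every unit performs better by the same constant 5: residuals are unchanged.
shifted = ols_residuals(xs, [y + 5.0 for y in ys])
# A performance component linear in X is absorbed by the slope instead.
tilted = ols_residuals(xs, [y + 0.3 * x for x, y in zip(xs, ys)])
```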

Preprint, 2016, arXiv

Bi-log-concave distribution functions

Nonparametric statistics for distribution functions F or densities f = F' under qualitative shape constraints provides an interesting alternative to classical parametric or entirely nonparametric approaches. We contribute to this area by considering a new shape constraint: F is said to be bi-log-concave if both log(F) and log(1 - F) are concave. Many commonly considered distributions are compatible with this constraint. For instance, any c.d.f. F with log-concave density f = F' is bi-log-concave. But in contrast to the latter constraint, bi-log-concavity allows for multimodal densities. We provide various characterizations. It is shown that combining any nonparametric confidence band for F with the new shape constraint leads to substantial improvements, particularly in the tails. To pinpoint this, we show that these confidence bands imply non-trivial confidence bounds for arbitrary moments and the moment generating function of F.
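The definition can be probed numerically. This minimal sketch, with a hypothetical helper `looks_bi_log_concave`, checks on a grid that log(F) and log(1 - F) have non-positive second differences. The survival function 1 - F is passed separately to avoid cancellation in the tails. A grid check of this kind can refute bi-log-concavity but cannot prove it. The standard normal (log-concave density, hence bi-log-concave) passes. The Cauchy distribution fails, because log F behaves like -log(-x) in the left tail, which is convex.

```python
import math


def looks_bi_log_concave(cdf, sf, lo=-5.0, hi=5.0, h=0.01, tol=1e-12):
    """Grid check: second differences of log F and log(1 - F) must be <= tol."""
    n = int(round((hi - lo) / h))
    xs = [lo + i * h for i in range(n + 1)]
    for func in (cdf, sf):
        vals = [math.log(func(x)) for x in xs]
        for i in range(1, n):
            if vals[i - 1] - 2.0 * vals[i] + vals[i + 1] > tol:
                return False  # a convex kink was found, so not bi-log-concave
    return True


def normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def normal_sf(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def cauchy_cdf(x):
    return 0.5 + math.atan(x) / math.pi


def cauchy_sf(x):
    return 0.5 - math.atan(x) / math.pi
```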

Preprint, 2016, arXiv

Geodesic Convexity and Regularized Scatter Estimators

As observed by Auderset et al. (2005) and Wiesel (2012), viewing covariance matrices as elements of a Riemannian manifold and using the concept of geodesic convexity provide useful tools for studying M-estimators of multivariate scatter. In this paper, we begin with a mathematically rigorous self-contained overview of Riemannian geometry on the space of symmetric positive definite matrices and of the notion of geodesic convexity. The overview contains both a review and new results. In particular, we introduce and utilize first and second order Taylor expansions with respect to geodesic parametrizations. This enables us to give sufficient conditions for a function to be geodesically convex. In addition, we introduce the concept of geodesic coercivity, which is important in establishing the existence of a minimizer of a geodesically convex function. We also develop a general partial Newton algorithm for minimizing smooth and strictly geodesically convex functions. We then use these results to obtain a fairly complete picture of the existence, uniqueness and computation of regularized M-estimators of scatter defined using additive geodesically convex penalty terms. We demonstrate various such penalties that shrink an estimator towards the identity matrix or multiples of the identity matrix. Finally, we propose a cross-validation method for choosing the scaling parameter for the penalty function, and illustrate our results using a numerical example.

Preprint, 2016, arXiv

On an Auxiliary Function for Log-Density Estimation

In this note we provide explicit expressions and expansions for a special function which appears in nonparametric estimation of log-densities. This function returns the integral of a log-linear function on a simplex of arbitrary dimension. In particular, it is used in the R package "LogConcDEAD" by Cule et al. (2007).

Preprint, 2015, arXiv

M-Functionals of Multivariate Scatter

This survey provides a self-contained account of $M$-estimation of multivariate scatter. In particular, we present new proofs for existence of the underlying $M$-functionals and discuss their weak continuity and differentiability. This is done in a rather general framework with matrix-valued random variables. By doing so we reveal a connection between Tyler's (1987) $M$-functional of scatter and the estimation of proportional covariance matrices. Moreover, this general framework allows us to treat a new class of scatter estimators, based on symmetrizations of arbitrary order. Finally, these results are applied to $M$-estimation of multivariate location and scatter via multivariate $t$-distributions.

Preprint, 2015, arXiv

New Algorithms for $M$-Estimation of Multivariate Scatter and Location

We present new algorithms for $M$-estimators of multivariate scatter and location and for symmetrized $M$-estimators of multivariate scatter. The new algorithms are considerably faster than currently used fixed-point and related algorithms. The main idea is to utilize a second order Taylor expansion of the target functional and to devise a partial Newton-Raphson procedure. In connection with symmetrized $M$-estimators we work with incomplete $U$-statistics to accelerate our procedures initially.

Preprint, 2014, arXiv

Maximum-Likelihood Estimation of a Log-Concave Density based on Censored Data

We consider nonparametric maximum-likelihood estimation of a log-concave density in case of interval-censored, right-censored and binned data. We allow for the possibility of a subprobability density with an additional mass at $+\infty$, which is estimated simultaneously. The existence of the estimator is proved under mild conditions and various theoretical aspects are given, such as certain shape and consistency properties. An EM algorithm is proposed for the approximate computation of the estimator and its performance is illustrated in two examples.

Preprint, 2013, arXiv

Optimal Confidence Bands for Shape-Restricted Curves

Let $Y$ be a stochastic process on $[0,1]$ satisfying $dY(t) = n^{1/2} f(t) dt + dW(t)$, where $n \ge 1$ is a given scale parameter ("sample size"), $W$ is standard Brownian motion and $f$ is an unknown function. Utilizing suitable multiscale tests we construct confidence bands for $f$ with guaranteed given coverage probability, assuming that $f$ is isotonic or convex. These confidence bands are computationally feasible and shown to be asymptotically sharp optimal in an appropriate sense.

Preprint, 2012, arXiv

Multiscale Methods for Shape Constraints in Deconvolution: Confidence Statements for Qualitative Features

We derive multiscale statistics for deconvolution in order to detect qualitative features of the unknown density. An important example covered within this framework is to test for local monotonicity on all scales simultaneously. We investigate the moderately ill-posed setting, where the Fourier transform of the error density in the deconvolution model is of polynomial decay. For multiscale testing, we consider a calibration motivated by the modulus of continuity of Brownian motion. We investigate the performance of our results from both the theoretical and simulation based point of view. A major consequence of our work is that detecting qualitative features of a density in a deconvolution problem is feasible, even though the minimax rates for pointwise estimation are very slow.

Preprint, 2011, arXiv

Active Set and EM Algorithms for Log-Concave Densities Based on Complete and Censored Data

We develop an active set algorithm for the maximum likelihood estimation of a log-concave density based on complete data. Building on this fast algorithm, we outline an EM algorithm to treat arbitrarily censored or binned data.

Preprint, 2011, arXiv

Approximation by log-concave distributions, with applications to regression

We study the approximation of arbitrary distributions $P$ on $d$-dimensional space by distributions with log-concave density. Approximation means minimizing a Kullback-Leibler-type functional. We show that such an approximation exists if and only if $P$ has finite first moments and is not supported by some hyperplane. Furthermore we show that this approximation depends continuously on $P$ with respect to Mallows distance $D_1(\cdot,\cdot)$. This result implies consistency of the maximum likelihood estimator of a log-concave density under fairly general conditions. It also allows us to prove existence and consistency of estimators in regression models with a response $Y = \mu(X) + \varepsilon$, where $X$ and $\varepsilon$ are independent, $\mu(\cdot)$ belongs to a certain class of regression functions while $\varepsilon$ is a random error with log-concave density and mean zero.

Preprint, 2011, arXiv

On Low-Dimensional Projections of High-Dimensional Distributions

Let $P$ be a probability distribution on $q$-dimensional space. The so-called Diaconis-Freedman effect means that for a fixed dimension $d \ll q$, most $d$-dimensional projections of $P$ look like a scale mixture of spherically symmetric Gaussian distributions. The present paper provides necessary and sufficient conditions for this phenomenon in a suitable asymptotic framework with increasing dimension $q$. It turns out that the conditions formulated by Diaconis and Freedman (1984) are not only sufficient but necessary as well. Moreover, letting $\hat{P}$ be the empirical distribution of $n$ independent random vectors with distribution $P$, we investigate the behavior of the empirical process $\sqrt{n}(\hat{P} - P)$ under random projections, conditional on $\hat{P}$.

Preprint, 2011, arXiv

Stochastic Search for Semiparametric Linear Regression Models

This paper introduces and analyzes a stochastic search method for parameter estimation in linear regression models in the spirit of Beran and Millar (1987). The idea is to generate a random finite subset of a parameter space which will automatically contain points which are very close to an unknown true parameter. The motivation for this procedure comes from recent work of Duembgen, Samworth and Schuhmacher (2011) on regression models with log-concave error distributions.

Preprint, 2010, arXiv

Bounding Standard Gaussian Tail Probabilities

We review various inequalities for Mills' ratio $(1 - \Phi)/\phi$, where $\phi$ and $\Phi$ denote the standard Gaussian density and distribution function, respectively. Elementary considerations involving finite continued fractions lead to a general approximation scheme which implies and refines several known bounds.
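To make the role of finite continued fractions concrete, the sketch below uses the classical Laplace continued fraction $R(x) = 1/(x + 1/(x + 2/(x + 3/(x + \cdots))))$ for Mills' ratio with $x > 0$. This is a well-known expansion, not necessarily the paper's approximation scheme, and the function names are hypothetical. Because all terms are positive, successive convergents bracket $R(x)$: the first two give Gordon's classical bounds $x/(1+x^2) < R(x) < 1/x$.

```python
import math


def mills_ratio(x):
    """R(x) = (1 - Phi(x)) / phi(x) for the standard Gaussian, via math.erfc for tail accuracy."""
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return tail / density


def laplace_convergent(x, k):
    """k-th convergent (k >= 1) of 1/(x + 1/(x + 2/(x + 3/(x + ...)))), for x > 0."""
    # Evaluate the truncated fraction from the innermost level outwards.
    denom = x
    for j in range(k - 1, 0, -1):
        denom = x + j / denom
    return 1.0 / denom
```

Odd-numbered convergents are upper bounds and even-numbered ones are lower bounds. For instance, the third convergent equals $(x^2 + 2)/(x^3 + 3x)$.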

Preprint, 2009, arXiv

Least Squares and Shrinkage Estimation under Bimonotonicity Constraints

In this paper we describe active set type algorithms for minimization of a smooth function under general order constraints, an important case being functions on the set of bimonotone $r \times s$ matrices. These algorithms can be used, for instance, to estimate a bimonotone regression function via least squares or (a smooth approximation of) least absolute deviations. Another application is shrinkage estimation in image denoising or, more generally, regression problems with two ordinal factors after representing the data in a suitable basis which is indexed by pairs $(i,j) \in \{1,\ldots,r\} \times \{1,\ldots,s\}$. Various numerical examples illustrate our methods.

Preprint, 2009, arXiv

Nemirovski's Inequalities Revisited

Moment inequalities for sums of independent random vectors are an important tool in statistical research. Nemirovski and coworkers (1983, 2000) derived one particular type of such inequalities: For certain Banach spaces $(\mathbb{B},\|\cdot\|)$ there exists a constant $K = K(\mathbb{B},\|\cdot\|)$ such that for arbitrary independent and centered random vectors $X_1, X_2, \ldots, X_n \in \mathbb{B}$, their sum $S_n$ satisfies the inequality $E \|S_n\|^2 \le K \sum_{i=1}^n E \|X_i\|^2$. We present and compare three different approaches to obtaining such inequalities: Nemirovski's results are based on deterministic inequalities for norms. Another possible vehicle is type and cotype inequalities, a tool from probability theory on Banach spaces. Finally, we use a truncation argument plus Bernstein's inequality to obtain another version of the moment inequality above. Interestingly, all three approaches have their own merits.
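A small exact computation shows why the constant $K$ depends on the norm. For Rademacher sums $X_i = \epsilon_i v_i$, the expectation can be computed exactly by enumerating all $2^n$ sign patterns. In Euclidean space the ratio $E\|S_n\|^2 / \sum_i E\|X_i\|^2$ equals 1, because the cross terms vanish. For the sup-norm on $\mathbb{R}^{2^n}$, vectors whose coordinates run through all sign patterns give a ratio of exactly $n = \log_2 d$, so no dimension-free constant exists there. The vectors and function names below are illustrative assumptions, not taken from the paper.

```python
import itertools
import math


def moment_ratio(vectors, norm):
    """E||S_n||^2 / sum_i E||X_i||^2 for X_i = eps_i * v_i with Rademacher signs eps_i."""
    n = len(vectors)
    d = len(vectors[0])
    total = 0.0
    # Exact expectation: average over all 2^n equally likely sign patterns.
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        s = [sum(e * v[j] for e, v in zip(signs, vectors)) for j in range(d)]
        total += norm(s) ** 2
    lhs = total / 2**n
    rhs = sum(norm(v) ** 2 for v in vectors)  # ||eps_i v_i|| = ||v_i||
    return lhs / rhs


def euclidean(v):
    return math.sqrt(sum(t * t for t in v))


def sup_norm(v):
    return max(abs(t) for t in v)


# Four vectors in R^16 whose coordinates enumerate all sign patterns in {-1, 1}^4.
n_summands = 4
patterns = list(itertools.product((-1.0, 1.0), repeat=n_summands))
sign_vectors = [[s[i] for s in patterns] for i in range(n_summands)]
```

Here `moment_ratio(sign_vectors, sup_norm)` equals 4.0, while the Euclidean ratio for the same vectors is 1 up to rounding.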
