Source author record

Piotr Fryzlewicz

Piotr Fryzlewicz appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

10 works
5 topics
4 close collaborators

Published work

10 published items

Preprint · 2023 · arXiv

Detecting linear trend changes in data sequences

We propose TrendSegment, a methodology for detecting multiple change-points corresponding to linear trend changes in one-dimensional data. A core ingredient of TrendSegment is a new Tail-Greedy Unbalanced Wavelet transform: a conditionally orthonormal, bottom-up transformation of the data through an adaptively constructed unbalanced wavelet basis, which results in a sparse representation of the data. Due to its bottom-up nature, this multiscale decomposition focuses on local features in its early stages and on global features next, which enables the detection of both long and short linear trend segments at once. To reduce the computational complexity, the proposed method merges multiple regions in a single pass over the data. We show the consistency of the estimated number and locations of change-points. The practicality of our approach is demonstrated through simulations and two real data examples, involving Iceland temperature data and the sea ice extent of the Arctic and the Antarctic. Our methodology is implemented in the R package trendsegmentR, available from CRAN.

Preprint · 2020 · arXiv

Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection

Many existing procedures for detecting multiple change-points in data sequences fail in frequent-change-point scenarios. This article proposes a new change-point detection methodology designed to work well in both infrequent and frequent change-point settings. It is made up of two ingredients: one is "Wild Binary Segmentation 2" (WBS2), a recursive algorithm for producing what we call a 'complete' solution path to the change-point detection problem, i.e. a sequence of estimated nested models containing $0, \ldots, T-1$ change-points, where $T$ is the data length. The other ingredient is a new model selection procedure, referred to as "Steepest Drop to Low Levels" (SDLL). The SDLL criterion acts on the WBS2 solution path, and, unlike many existing model selection procedures for change-point problems, it is not penalty-based, and only uses thresholding as a certain discrete secondary check. The resulting WBS2.SDLL procedure, combining both ingredients, is shown to be consistent, and to significantly outperform the competition in the frequent change-point scenarios tested. WBS2.SDLL is fast, easy to code and does not require the choice of a window or span parameter.
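The abstract does not spell out the SDLL criterion, so the following Python sketch only illustrates the general "steepest drop" idea under our own simplifying assumptions: it takes the CUSUM magnitudes attached to candidate change-points on a solution path, sorts them, and keeps everything before the largest drop in log-magnitude, using a hypothetical `low_level` threshold only as a secondary check. The function name and parameters are illustrative, and this is not the published WBS2.SDLL procedure.

```python
import numpy as np

def steepest_drop(magnitudes, low_level):
    """Toy 'steepest drop' model-size choice (illustrative; not the exact SDLL rule).

    magnitudes: |CUSUM| values attached to candidate change-points on a solution path.
    low_level:  hypothetical threshold, used only as a secondary check.
    Returns the number of candidates to keep.
    """
    m = np.sort(np.abs(np.asarray(magnitudes, dtype=float)))[::-1]  # largest first
    m = np.maximum(m, 1e-12)                 # guard against log(0)
    if m.size == 0 or m[0] <= low_level:
        return 0                             # nothing stands out: no change-points
    if m[-1] > low_level:
        return int(m.size)                   # every candidate is above the low level
    # Drop in log-magnitude between the k-th and (k+1)-th largest values.
    drops = np.log(m[:-1]) - np.log(m[1:])
    # Ignore drops that start from values already at a low level.
    drops[m[:-1] <= low_level] = -np.inf
    return int(np.argmax(drops)) + 1         # keep everything before the steepest drop


# Usage: two large values followed by a tail of small ones.
print(steepest_drop([20.0, 18.0, 3.0, 2.9, 2.5, 0.5], low_level=2.0))  # prints 2
```

Working with log-magnitudes makes the size of each drop invariant to rescaling all magnitudes by a common factor, which is one way a rule of this kind can avoid an explicit penalty term.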

Preprint · 2020 · arXiv

Detection of gamma-ray transients with wild binary segmentation

In the context of time-domain astronomy, we present an offline search for gamma-ray transients using a wild binary segmentation analysis called FWBSB, targeting both short and long gamma-ray bursts (GRBs) and covering the soft and hard gamma-ray bands. We use NASA Fermi/GBM archival data as training and testing data sets. This paper describes the analysis applied to the 12 NaI detectors of the Fermi/GBM instrument. This includes background removal, change-point detection that brackets the peaks of gamma-ray flares, the evaluation of significance for each individual GBM detector and the combination of the results across the detectors. We also explain the calibration of the 10 parameters of the method using one week of archival data. Finally, we present our detection performance for 60 days of a blind search with FWBSB, comparing it with both the on-board and offline GBM searches as well as with external events found by other surveys such as Swift-BAT. We detect 42/44 on-board GBM events, as well as other gamma-ray flares at a rate of 1 per hour in the 4-50 keV band. Our results show that FWBSB is capable of recovering gamma-ray flares, including soft X-ray long transients. FWBSB offers an independent identification of GRBs that can be combined with methods for determining the spectral and temporal properties of the transient as well as its localization. This is particularly useful for increasing the GRB detection rate, which will help joint detections with gravitational-wave events.

Preprint · 2020 · arXiv

Exploiting disagreement between high-dimensional variable selectors for uncertainty visualization

We propose Combined Selection and Uncertainty Visualizer (CSUV), which estimates the set of true covariates in high-dimensional linear regression and visualizes selection uncertainties by exploiting the (dis)agreement among different base selectors. Our proposed method selects the covariates that are chosen most frequently by the different variable selection methods on subsampled data. The method is generic and can be used with different existing variable selection methods. We demonstrate its variable selection performance using real and simulated data. The variable selection method and its uncertainty illustration tool are publicly available as the R package CSUV (https://github.com/christineyuen/CSUV). The graphical tool is also available online via https://csuv.shinyapps.io/csuv
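The aggregation step described above, counting how often each covariate is chosen by several base selectors across subsamples, can be sketched in a few lines of Python with NumPy. This is not the CSUV package: the two base selectors (a plain coordinate-descent lasso and a top-k marginal-correlation screen), the half-sized subsamples and the 0.5 frequency cut-off are all our own illustrative assumptions.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=50):
    """Indices selected by a lasso fitted with cyclic coordinate descent.

    Minimises (1 / 2n) * ||y - X b||^2 + lam * ||b||_1, with no intercept.
    """
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float)                        # residual y - X @ beta (a copy)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]             # take variable j out of the fit
            rho = X[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * beta[j]             # put it back with its updated coefficient
    return np.flatnonzero(beta)


def top_k_marginal(X, y, k):
    """Indices of the k covariates with the largest |X_j^T y|."""
    return np.argsort(-np.abs(X.T @ y))[:k]


def selection_frequency(X, y, selectors, n_sub=20, seed=None):
    """Fraction of (subsample, selector) runs in which each covariate is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_sub):
        rows = rng.choice(n, size=n // 2, replace=False)   # half-sized subsample
        for select in selectors:
            counts[select(X[rows], y[rows])] += 1
    return counts / (n_sub * len(selectors))


# Usage: 3 relevant covariates out of 30.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = 3 * X[:, 0] - 3 * X[:, 1] + 3 * X[:, 2] + rng.normal(size=200)
selectors = [lambda A, b: lasso_cd(A, b, lam=0.3),
             lambda A, b: top_k_marginal(A, b, k=3)]
freq = selection_frequency(X, y, selectors, seed=1)
print("selected:", np.flatnonzero(freq >= 0.5))   # hypothetical 0.5 cut-off
```

The frequency vector itself is the raw material for an uncertainty display: covariates with intermediate frequencies are those on which the base selectors disagree.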

Preprint · 2016 · arXiv

High-dimensional variable selection via tilting

The paper considers variable selection in linear regression models where the number of covariates is possibly much larger than the number of observations. High dimensionality of the data brings in many complications, such as (possibly spurious) high correlations between the variables, which result in marginal correlation being unreliable as a measure of association between the variables and the response. We propose a new way of measuring the contribution of each variable to the response which takes into account high correlations between the variables in a data-driven way. The proposed tilting procedure provides an adaptive choice between the use of marginal correlation and tilted correlation for each variable, where the choice is made depending on the values of the hard thresholded sample correlation of the design matrix. We study the conditions under which this measure can successfully discriminate between the relevant and the irrelevant variables and thus be used as a tool for variable selection. Finally, an iterative variable screening algorithm is constructed to exploit the theoretical properties of tilted correlation, and its good practical performance is demonstrated in a comparative simulation study.
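To make the adaptive choice concrete, here is a rough NumPy sketch of the tilting idea for a single screening pass: for each variable, the hard-thresholded sample correlation of the design decides whether to use its marginal correlation with the response or a "tilted" version in which the highly correlated variables are first projected out of both the variable and the response. The threshold `pi` and the partial-correlation-style normalisation are our own assumptions; the paper's rescaling and its iterative screening algorithm are not reproduced here.

```python
import numpy as np

def tilted_correlations(X, y, pi):
    """Per-variable screening statistic: marginal or 'tilted' correlation (sketch).

    pi: hypothetical threshold applied to |sample correlation| between covariates.
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardise columns
    y = y - y.mean()
    n, p = X.shape
    R = X.T @ X / n                              # sample correlation matrix of the design
    out = np.empty(p)
    for j in range(p):
        # Variables whose hard-thresholded correlation with X_j is non-zero.
        close = np.flatnonzero((np.abs(R[j]) > pi) & (np.arange(p) != j))
        xj, yj = X[:, j], y
        if close.size:
            Z = X[:, close]
            # Project the highly correlated variables out of both X_j and y.
            xj = xj - Z @ np.linalg.lstsq(Z, xj, rcond=None)[0]
            yj = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
        # With no close variables this is just the marginal correlation.
        out[j] = xj @ yj / np.sqrt((xj @ xj) * (yj @ yj))
    return out


# Usage: x1 is strongly correlated with x0, but only x0 drives y.
rng = np.random.default_rng(0)
n = 1000
x0 = rng.normal(size=n)
x1 = x0 + 0.5 * rng.normal(size=n)
X = np.column_stack([x0, x1, rng.normal(size=(n, 3))])
y = x0 + rng.normal(size=n)
print("marginal:", np.round([np.corrcoef(X[:, j], y)[0, 1] for j in range(5)], 2))
print("tilted:  ", np.round(tilted_correlations(X, y, pi=0.7), 2))
```

In this toy example the marginal correlation of the irrelevant x1 with y is large only because x1 tracks x0; once x0 is projected out, that spurious association largely disappears.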

Preprint · 2016 · arXiv

Multiple-change-point detection for high dimensional time series via sparsified binary segmentation

Time series segmentation, a.k.a. multiple change-point detection, is a well-established problem. However, few solutions are designed specifically for high-dimensional situations. In this paper, our interest is in segmenting the second-order structure of a high-dimensional time series. In a generic step of a binary segmentation algorithm for multivariate time series, one natural solution is to combine CUSUM statistics obtained from local periodograms and cross-periodograms of the components of the input time series. However, the standard "maximum" and "average" methods for doing so often fail in high dimensions when, for example, the change-points are sparse across the panel or the CUSUM statistics are spuriously large. In this paper, we propose the Sparsified Binary Segmentation (SBS) algorithm which aggregates the CUSUM statistics by adding only those that pass a certain threshold. This "sparsifying" step reduces the impact of irrelevant, noisy contributions, which is particularly beneficial in high dimensions. In order to show the consistency of SBS, we introduce the multivariate Locally Stationary Wavelet model for time series, which is a separate contribution of this work.
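The sparsifying aggregation step can be illustrated with a simplified NumPy sketch. For simplicity it detects changes in the mean of each component of a panel using ordinary CUSUM statistics, rather than working with local periodograms and cross-periodograms of a time series as the paper does, and it assumes unit noise variance and a user-chosen threshold; `sbs`, `cusum_matrix` and `threshold` are hypothetical names.

```python
import numpy as np

def cusum_matrix(X, s, e):
    """|CUSUM| of every row of X over columns s..e, for each split after column s, ..., e - 1."""
    seg = X[:, s:e + 1]
    n = seg.shape[1]
    csum = np.cumsum(seg, axis=1)
    nl = np.arange(1, n)
    nr = n - nl
    mean_l = csum[:, :-1] / nl
    mean_r = (csum[:, -1:] - csum[:, :-1]) / nr
    return np.abs(np.sqrt(nl * nr / n) * (mean_l - mean_r))   # shape (d, n - 1)


def sbs(X, threshold):
    """Sparsified binary segmentation (sketch) for mean changes in a d x T panel."""
    X = np.asarray(X, dtype=float)
    change_points = []

    def recurse(s, e):
        if e - s < 1:
            return
        stats = cusum_matrix(X, s, e)
        # Sparsify: only components whose CUSUM exceeds the threshold contribute.
        agg = np.where(stats > threshold, stats, 0.0).sum(axis=0)
        k = int(np.argmax(agg))
        if agg[k] > 0:                      # at least one component passed the threshold
            b = s + k
            change_points.append(b)
            recurse(s, b)
            recurse(b + 1, e)

    recurse(0, X.shape[1] - 1)
    return sorted(change_points)


# Usage: 50 components, only the first 3 shift by 2 after index 99.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))
X[:3, 100:] += 2.0
print(sbs(X, threshold=6.0))
```

With a plain average, the three informative CUSUMs in this example would be divided by 50 and mixed with 47 noise terms; the thresholded sum drops those noise terms entirely, which is the intuition the abstract gives for high dimensions.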

Preprint · 2016 · arXiv

Multiscale and multilevel technique for consistent segmentation of nonstationary time series

In this paper, we propose a fast, well-performing, and consistent method for segmenting a piecewise-stationary, linear time series with an unknown number of breakpoints. The time series model we use is the nonparametric Locally Stationary Wavelet model, in which a complete description of the piecewise-stationary second-order structure is provided by wavelet periodograms computed at multiple scales and locations. The initial stage of our method is a new binary segmentation procedure, with a theoretically justified and rapidly computable test criterion that detects breakpoints in wavelet periodograms separately at each scale. This is followed by within-scale and across-scales post-processing steps, leading to consistent estimation of the number and locations of breakpoints in the second-order structure of the original process. An extensive simulation study demonstrates good performance of our method.

Preprint · 2016 · arXiv

Multiscale interpretation of taut string estimation and its connection to Unbalanced Haar wavelets

We compare two state-of-the-art non-linear techniques for nonparametric function estimation via piecewise constant approximation: the taut string and the Unbalanced Haar methods. While it is well-known that the latter is multiscale, it is not obvious that the former can also be interpreted as multiscale. We provide a unified multiscale representation for both methods, which offers an insight into the relationship between them as well as suggesting lessons both methods can learn from each other.

Preprint · 2014 · arXiv

Wild binary segmentation for multiple change-point detection

We propose a new technique, called wild binary segmentation (WBS), for consistent estimation of the number and locations of multiple change-points in data. We assume that the number of change-points can increase to infinity with the sample size. Due to a certain random localisation mechanism, WBS works even for very short spacings between the change-points and/or very small jump magnitudes, unlike standard binary segmentation. On the other hand, despite its use of localisation, WBS does not require the choice of a window or span parameter, and does not lead to a significant increase in computational complexity. WBS is also easy to code. We propose two stopping criteria for WBS: one based on thresholding and the other based on what we term the 'strengthened Schwarz information criterion'. We provide default recommended values of the parameters of the procedure and show that it offers very good practical performance in comparison with the state of the art. The WBS methodology is implemented in the R package wbs, available on CRAN. In addition, we provide a new proof of consistency of binary segmentation with improved rates of convergence, as well as a corresponding result for WBS.
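A compact NumPy sketch of WBS with the thresholding stopping rule may help to show how the random localisation works: M random intervals are drawn once, and at each recursive step the CUSUM statistic is maximised over every drawn interval that fits inside the current segment, plus the segment itself. It assumes a piecewise-constant mean with unit-variance noise, and the threshold is the caller's choice; the paper's recommended defaults and the strengthened Schwarz criterion are not reproduced, and this is not the code of the wbs package.

```python
import numpy as np

def cusum_all(x, s, e):
    """|CUSUM| contrast of x[s..e] (inclusive) for every split after index s, ..., e - 1."""
    seg = x[s:e + 1]
    n = len(seg)
    csum = np.cumsum(seg)
    nl = np.arange(1, n)                     # sizes of the left parts
    nr = n - nl                              # sizes of the right parts
    mean_l = csum[:-1] / nl
    mean_r = (csum[-1] - csum[:-1]) / nr
    return np.abs(np.sqrt(nl * nr / n) * (mean_l - mean_r))


def wbs(x, threshold, M=500, seed=None):
    """Wild binary segmentation (sketch) with a thresholding stopping rule.

    Returns sorted indices b such that a change is declared between x[b] and x[b + 1].
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    rng = np.random.default_rng(seed)
    # Draw M random intervals [start, end] once; they are reused at every step.
    pts = rng.integers(0, T, size=(M, 2))
    starts, ends = pts.min(axis=1), pts.max(axis=1)
    keep = ends > starts                     # need at least two points to split
    intervals = list(zip(starts[keep].tolist(), ends[keep].tolist()))
    change_points = []

    def recurse(s, e):
        if e - s < 1:
            return
        # Random intervals lying inside [s, e], plus [s, e] itself.
        candidates = [(a, b) for a, b in intervals if a >= s and b <= e] + [(s, e)]
        best_val, best_b = -np.inf, None
        for a, b in candidates:
            stats = cusum_all(x, a, b)
            k = int(np.argmax(stats))
            if stats[k] > best_val:
                best_val, best_b = stats[k], a + k
        if best_val > threshold:
            change_points.append(best_b)
            recurse(s, best_b)
            recurse(best_b + 1, e)

    recurse(0, T - 1)
    return sorted(change_points)


# Usage: a mean shift of size 3 on indices 100..199, unit-variance noise.
rng = np.random.default_rng(0)
x = np.r_[np.zeros(100), 3 * np.ones(100), np.zeros(100)] + rng.normal(size=300)
print(wbs(x, threshold=6.0, seed=1))
```

Because each CUSUM evaluation uses cumulative sums, its cost is linear in the interval length, and the random intervals are only drawn once.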

Preprint · 2011 · arXiv

Mixing properties of ARCH and time-varying ARCH processes

There exist very few results on mixing for non-stationary processes. However, mixing is often required in statistical inference for non-stationary processes such as time-varying ARCH (tvARCH) models. In this paper, bounds for the mixing rates of a stochastic process are derived in terms of the conditional densities of the process. These bounds are used to obtain the $\alpha$-, 2-mixing and $\beta$-mixing rates of the non-stationary time-varying $\operatorname{ARCH}(p)$ process and the $\operatorname{ARCH}(\infty)$ process. It is shown that the mixing rate of the time-varying $\operatorname{ARCH}(p)$ process is geometric, whereas the bound on the mixing rate of the $\operatorname{ARCH}(\infty)$ process depends on the rate of decay of the $\operatorname{ARCH}(\infty)$ parameters. We note that the methodology given in this paper is applicable to other processes.
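For reference, the standard objects behind these statements are recalled below. These are textbook definitions rather than anything specific to the paper; the rescaled-time notation $t/N$ for the time-varying coefficients and the symbols $K$, $\rho$ are our own notational assumptions.

```latex
% Time-varying ARCH(p): i.i.d. innovations Z_t with E Z_t = 0, E Z_t^2 = 1
X_{t,N} = \sigma_{t,N} Z_t, \qquad
\sigma_{t,N}^2 = a_0\!\left(\tfrac{t}{N}\right) + \sum_{j=1}^{p} a_j\!\left(\tfrac{t}{N}\right) X_{t-j,N}^2

% Mixing coefficients for a possibly non-stationary process,
% with \mathcal{F}_s^u = \sigma(X_s, \ldots, X_u)
\alpha(k) = \sup_{t} \sup_{A \in \mathcal{F}_{-\infty}^{t},\; B \in \mathcal{F}_{t+k}^{\infty}}
  \bigl| P(A \cap B) - P(A)\,P(B) \bigr|

\beta(k) = \sup_{t} \, E\Bigl[ \sup_{B \in \mathcal{F}_{t+k}^{\infty}}
  \bigl| P(B \mid \mathcal{F}_{-\infty}^{t}) - P(B) \bigr| \Bigr]

% Geometric mixing: for some K < \infty and 0 < \rho < 1
\alpha(k) \le K \rho^{k} \quad \text{for all } k \ge 1
```

Taking the supremum over $t$ is what allows these coefficients to be used for non-stationary processes such as the tvARCH model, where the dependence structure changes over time.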