Source author record

Sarit Agami

Sarit Agami appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: Methodology, Applications, Machine Learning, physics.ao-ph, physics.soc-ph, stat.OT

Catalog footprint

What is connected

5 works

6 topics

2 close collaborators

Preprint, 2022, arXiv

Correcting for Measurement Error in Segmented Cox Model

Measurement error in the covariate of main interest (e.g., the exposure variable or the risk factor) is common in epidemiologic and health studies. It can affect the relative risk estimator or other types of coefficients derived from the fitted regression model. To perform a measurement error analysis, one needs information about the error structure. Two sources of validation data are an internal subset of the main data and an external or independent study. For both sources, either the true covariate is measured (that is, without error) or its surrogate, an error-prone covariate, is measured several times (repeated measures). This paper compares the precision of estimation across the different validation sources in the Cox model with a changepoint in the main covariate, using the bias correction methods RC and RR. The theoretical properties under each validation source are presented. A simulation study finds that the best validation source, in terms of smaller mean squared error and narrower confidence intervals, is internal validation with measurement of the true covariate in the common disease case, and external validation with repeated measures of the surrogate in the rare disease case. In addition, it is found that the correlation between the true covariate and its surrogate, and the value of the changepoint, need to be addressed, especially in the rare disease case.
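As a rough illustration of the calibration idea, the sketch below assumes that RC stands for regression calibration and that an internal validation subset records both the true covariate and its surrogate. The simulated data, the error variance, and the helper names `fit_calibration` and `calibrate` are hypothetical, and the Cox model fit itself is left out; this is not the paper's estimator.

```python
import random
import statistics


def fit_calibration(w_val, x_val):
    """Least-squares fit of x = a + b * w on the validation subset."""
    mean_w = statistics.fmean(w_val)
    mean_x = statistics.fmean(x_val)
    s_ww = sum((w - mean_w) ** 2 for w in w_val)
    s_wx = sum((w - mean_w) * (x - mean_x) for w, x in zip(w_val, x_val))
    b = s_wx / s_ww
    a = mean_x - b * mean_w
    return a, b


def calibrate(w, a, b):
    """Replace each surrogate value by its estimated E[X | W = w]."""
    return [a + b * wi for wi in w]


# Hypothetical simulated data: true covariate X, surrogate W = X + U.
rng = random.Random(0)
n, n_val = 5000, 1000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
w = [xi + rng.gauss(0.0, 0.5) for xi in x]

# Internal validation: X is observed only for the first n_val subjects.
a, b = fit_calibration(w[:n_val], x[:n_val])

# Calibrated covariate that would replace W in the outcome model.
x_hat = calibrate(w, a, b)
```

In this setup the fitted slope `b` should land near var(X) / (var(X) + var(U)) = 0.8, the usual attenuation factor. With external validation or repeated measures, the calibration coefficients would have to be estimated differently.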

Preprint, 2022, arXiv

Improved Modeling of Persistence Diagram

Dimension reduction methods are powerful tools for describing the main patterns in big data. One such method is topological data analysis (TDA), which models the shape of the data in terms of topological properties. Specifically, this method translates the original data into a two-dimensional system, which is represented graphically via the 'persistence diagram'. The outlying points on this diagram represent the data pattern, whereas the other points behave as random noise. To determine which points are significant outliers, replications of the original data set are needed. When only one original data set is available, replications can be created by fitting a model to the points on the persistence diagram and then using MCMC methods. One such model is RST (Replicating Statistical Topology). In this paper we suggest a modification of the RST model. Using a simulation study, we show that the modified RST improves on the performance of the RST in terms of goodness of fit. We use the MCMC Metropolis-Hastings algorithm to sample from the fitted model.
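For readers unfamiliar with the sampling step, here is a minimal random-walk Metropolis-Hastings sketch. The target is a standard normal density chosen purely for illustration, not the RST model, and the function name `metropolis_hastings` and its parameters are assumptions made for this example.

```python
import math
import random


def metropolis_hastings(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a one-dimensional target.

    log_density is the log of the target density up to an additive constant.
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal centred at the current state.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples


# Illustrative target: standard normal, log-density -x^2 / 2 up to a constant.
samples = metropolis_hastings(
    lambda x: -0.5 * x * x, x0=0.0, n_samples=20000, step=2.0, seed=1
)
kept = samples[2000:]  # discard an initial burn-in stretch
```

Because the proposal is symmetric, the acceptance ratio reduces to a ratio of target densities. To replicate diagram points, the log-density of a fitted model would take the place of the toy target.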

Preprint, 2020, arXiv

Comparison of Persistence Diagrams

Topological Data Analysis (TDA) is an approach to handling big data by studying its shape. A main tool of TDA is the persistence diagram, which one can use to compare data sets. One approach to assessing the similarity between two persistence diagrams is to use the Bottleneck and Wasserstein distances. Another approach is to fit a parametric model to each diagram and then compare the model coefficients. We study the behaviour of both distance measures and of the RST parametric model. The theoretical behaviour of the distance measures is difficult to derive, and therefore we study their behaviour numerically. We conclude that the RST model has an advantage over the Bottleneck and Wasserstein distances in the sense that it can give a definite conclusion regarding the similarity between two persistence diagrams. Moreover, a great advantage of the RST is its ability to distinguish between two data sets that are geometrically different but topologically the same, which is not possible with the two distance measures.
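To make the two distance measures concrete, the sketch below computes both for tiny diagrams by brute force over all matchings. It assumes the common convention of an L-infinity ground metric with unmatched points sent to the diagonal. The function names and the example diagrams are hypothetical, and the factorial cost makes this usable only for a handful of points.

```python
from itertools import permutations


def _cost_matrix(d1, d2):
    """Pairwise costs after padding each diagram with diagonal slots."""
    n, m = len(d1), len(d2)
    size = n + m
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                # L-infinity distance between two off-diagonal points.
                cost[i][j] = max(abs(d1[i][0] - d2[j][0]),
                                 abs(d1[i][1] - d2[j][1]))
            elif i < n:
                # Point of d1 matched to the diagonal.
                cost[i][j] = (d1[i][1] - d1[i][0]) / 2.0
            elif j < m:
                # Point of d2 matched to the diagonal.
                cost[i][j] = (d2[j][1] - d2[j][0]) / 2.0
            # Diagonal-to-diagonal matches cost nothing.
    return cost


def diagram_distances(d1, d2, p=1):
    """Return (bottleneck, p-Wasserstein) distances by exhaustive search."""
    cost = _cost_matrix(d1, d2)
    size = len(cost)
    best_max = float("inf")
    best_sum = float("inf")
    for perm in permutations(range(size)):
        costs = [cost[i][perm[i]] for i in range(size)]
        best_max = min(best_max, max(costs, default=0.0))
        best_sum = min(best_sum, sum(c ** p for c in costs))
    return best_max, best_sum ** (1.0 / p)


# Hypothetical diagrams given as (birth, death) pairs.
diagram_a = [(0.0, 1.0), (0.2, 0.5)]
diagram_b = [(0.0, 1.1)]
bottleneck, wasserstein = diagram_distances(diagram_a, diagram_b, p=1)
```

Both quantities are single numbers with no built-in reference scale. This is one way to read the abstract's remark that the distances alone do not give a definite conclusion about similarity.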

Preprint, 2020, arXiv

Impact of COVID-19 on Air Quality in Israel

The COVID-19 pandemic has caused, in general, a sharp reduction in traffic and industrial activities. This in turn led to a reduction in air pollution around the world. It is important to quantify the amount of that reduction in order to estimate the weight of traffic and industrial activities in the total variation of air quality. The aim of this paper is to evaluate the impact of the COVID-19 outbreak on air pollution in Israel, which is considered to have higher air pollution than other Western countries. The results reveal two main findings: 1. During the COVID-19 outbreak, relative to the closest preceding period, pollution from transport, measured by nitrogen oxides, decreased by 40% on average, whereas pollution from industry, measured by ground-level ozone, increased by 34% on average. Relative to 2019, the COVID-19 outbreak caused a reduction in air pollution from both transport and industry. 2. The COVID-19 time period explains at most 22% of the total variation in the amount of each pollutant.

Preprint, 2017, arXiv

Modeling and replicating statistical topology, and evidence for CMB non-homogeneity

Under the banner of 'Big Data', the detection and classification of structure in extremely large, high-dimensional data sets is one of the central statistical challenges of our times. Among the most intriguing approaches to this challenge is 'TDA', or 'Topological Data Analysis', one of the primary aims of which is providing non-metric, but topologically informative, pre-analyses of data sets which make later, more quantitative analyses feasible. While TDA rests on strong mathematical foundations from topology, in applications it has faced challenges due to an inability to handle issues of statistical reliability and robustness and, most importantly, an inability to make scientific claims with verifiable levels of statistical confidence. We propose a methodology for the parametric representation, estimation, and replication of persistence diagrams, the main diagnostic tool of TDA. The power of the methodology lies in the fact that even if only one persistence diagram is available for analysis (the typical case for big data applications), replications can be generated to allow for conventional statistical hypothesis testing. The methodology is conceptually simple and computationally practical, and provides a broadly effective statistical procedure for persistence diagram TDA analysis. We demonstrate the basic ideas on a toy example, and the power of the approach in a novel and revealing analysis of CMB non-homogeneity.
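One conventional way to use replications for hypothesis testing is a Monte Carlo p-value, sketched below. It assumes a scalar summary statistic has been computed on the observed diagram and on each replicated diagram. The function name and the numbers are hypothetical, and the paper's own test procedure may differ.

```python
def monte_carlo_p_value(observed, replicated):
    """One-sided Monte Carlo p-value with the usual +1 correction."""
    exceed = sum(1 for r in replicated if r >= observed)
    return (1 + exceed) / (1 + len(replicated))


# Hypothetical statistic values, for illustration only.
observed_stat = 3.0
replicated_stats = [1.0, 2.0, 4.0, 0.5]
p_value = monte_carlo_p_value(observed_stat, replicated_stats)
```

The +1 in numerator and denominator counts the observed value as one of the draws, which keeps the estimate strictly positive for any finite number of replications.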