Source author record

Mark Tygert

Mark Tygert appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: Methodology, Computation, Machine Learning, math.ST, Statistics Theory, Neural and Evolutionary Computing, Numerical Analysis, math.OC, Artificial Intelligence, cs.CY, eess.SY, Mathematical Software, Systems and Control

Catalog footprint

What is connected

19 works

13 topics

4 close collaborators

preprint, 2022, arXiv

Ties in ranking scores can be treated as weighted samples

Prior proposals for cumulative statistics suggest making tiny random perturbations to the scores (independent variables in a regression) in order to ensure the scores' uniqueness. Uniqueness means that no score for any member of the population or subpopulation being analyzed is exactly equal to any other member's score. It turns out to be possible to construct from the original data a weighted data set that modifies the scores, weights, and responses (dependent variables in the regression) such that the new scores are unique and (together with the new weights and responses) yield the desired cumulative statistics for the original data. This reduces the problem of analyzing data with scores that may not be unique to the problem of analyzing a weighted data set with scores that are unique by construction. Recent proposals for cumulative statistics have already detailed how to process weighted samples whose scores are unique.

preprint, 2020, arXiv

An optimizable scalar objective value cannot be objective and should not be the sole objective

This paper concerns the ethics and morality of algorithms and computational systems, and has been circulating internally at Facebook for the past couple of years. The paper reviews many Nobel laureates' work, as well as the work of other prominent scientists such as Richard Dawkins, Andrei Kolmogorov, Vilfredo Pareto, and John von Neumann. The paper draws conclusions based on such works, as summarized in the title. The paper argues that the standard approach to modern machine learning and artificial intelligence is bound to be biased and unfair, and that longstanding traditions in the professions of law, justice, politics, and medicine should help.

preprint, 2020, arXiv

Plots of the cumulative differences between observed and expected values of ordered Bernoulli variates

Many predictions are probabilistic in nature; for example, a prediction could be for precipitation tomorrow, but with only a 30 percent chance. Given both the predictions and the actual outcomes, "reliability diagrams" (also known as "calibration plots") help detect and diagnose statistically significant discrepancies between the predictions and the outcomes. The canonical reliability diagrams are based on histogramming the observed and expected values of the predictions; several variants of the standard reliability diagrams propose to replace the hard histogram binning with soft kernel density estimation using smooth convolutional kernels of widths similar to the widths of the bins. In all cases, an important question naturally arises: which widths are best (or are multiple plots with different widths better)? Rather than answering this question, plots of the cumulative differences between the observed and expected values largely avoid the question, by displaying miscalibration directly as the slopes of secant lines for the graphs. Slope is easy to perceive with quantitative precision even when the constant offsets of the secant lines are irrelevant. There is no need to bin or perform kernel density estimation with a somewhat arbitrary kernel.
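The cumulative-difference construction described above can be sketched in a few lines. This is a minimal illustration, assuming unweighted observations and a normalization by the sample size; the function names `cumulative_differences` and `secant_slope` are hypothetical and are not taken from the paper or its software.

```python
import numpy as np


def cumulative_differences(probs, outcomes):
    """Cumulative differences between observed outcomes and predicted
    probabilities, with observations ordered by predicted probability.

    Returns (x, cum), where x[k] = k / n and cum[k] is the sum of
    (outcome - probability) / n over the k lowest-probability observations.
    """
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    n = len(probs)
    order = np.argsort(probs, kind="stable")  # order by predicted probability
    diffs = (outcomes[order] - probs[order]) / n
    cum = np.concatenate(([0.0], np.cumsum(diffs)))
    x = np.arange(n + 1) / n
    return x, cum


def secant_slope(x, cum, j, k):
    """Slope of the secant line from index j to index k (j < k). It equals
    the average of (outcome - probability) over observations j..k-1, that is,
    the average miscalibration on that range of predictions."""
    return (cum[k] - cum[j]) / (x[k] - x[j])
```

Plotting `cum` against `x` gives the graph; a steep secant over some range of predictions signals miscalibration there, with no bin width or kernel width to choose.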

preprint, 2016, arXiv

Convolutional networks and learning invariant to homogeneous multiplicative scalings

The conventional classification schemes -- notably multinomial logistic regression -- used in conjunction with convolutional networks (convnets) are classical in statistics, designed without consideration for the usual coupling with convnets, stochastic gradient descent, and backpropagation. In the specific application to supervised learning for convnets, a simple scale-invariant classification stage turns out to be more robust than multinomial logistic regression, appears to result in slightly lower errors on several standard test sets, has similar computational costs, and features precise control over the actual rate of learning. "Scale-invariant" means that multiplying the input values by any nonzero scalar leaves the output unchanged.

preprint, 2016, arXiv

Poor starting points in machine learning

Poor (even random) starting points for learning/training/optimization are common in machine learning. In many settings, the method of Robbins and Monro (online stochastic gradient descent) is known to be optimal for good starting points, but may not be optimal for poor starting points -- indeed, for poor starting points Nesterov acceleration can help during the initial iterations, even though Nesterov methods not designed for stochastic approximation could hurt during later iterations. The common practice of training with nontrivial minibatches enhances the advantage of Nesterov acceleration.

preprint, 2015, arXiv

A mathematical motivation for complex-valued convolutional networks

A complex-valued convolutional network (convnet) implements the repeated application of the following composition of three operations, recursively applying the composition to an input vector of nonnegative real numbers: (1) convolution with complex-valued vectors followed by (2) taking the absolute value of every entry of the resulting vectors followed by (3) local averaging. For processing real-valued random vectors, complex-valued convnets can be viewed as "data-driven multiscale windowed power spectra," "data-driven multiscale windowed absolute spectra," "data-driven multiwavelet absolute values," or (in their most general configuration) "data-driven nonlinear multiwavelet packets." Indeed, complex-valued convnets can calculate multiscale windowed spectra when the convnet filters are windowed complex-valued exponentials. Standard real-valued convnets, using rectified linear units (ReLUs), sigmoidal (for example, logistic or tanh) nonlinearities, max pooling, etc., do not obviously exhibit the same exact correspondence with data-driven wavelets (whereas for complex-valued convnets, the correspondence is much more than just a vague analogy). Courtesy of the exact correspondence, the remarkably rich and rigorous body of mathematical analysis for wavelets applies directly to (complex-valued) convnets.
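One stage of the three-operation composition described above might look as follows for a one-dimensional input. This is only a sketch: the function names, the Hann window, and the uniform averaging kernel are assumptions chosen for illustration, not details taken from the paper.

```python
import numpy as np


def complex_convnet_stage(x, filters, pool):
    """Apply one stage to a 1-D real input x:
    (1) convolve with each complex-valued filter,
    (2) take the absolute value of every entry,
    (3) average locally over `pool` neighboring entries.
    Returns an array with one row per filter."""
    x = np.asarray(x, dtype=float)
    rows = []
    for h in filters:
        conv = np.convolve(x, h, mode="valid")             # (1) complex convolution
        mag = np.abs(conv)                                 # (2) absolute value
        box = np.ones(pool) / pool
        rows.append(np.convolve(mag, box, mode="valid"))   # (3) local averaging
    return np.stack(rows)


def windowed_exponentials(length, freqs):
    """Windowed complex exponentials (hypothetical helper). With these filters,
    the stage computes a locally averaged windowed absolute spectrum."""
    n = np.arange(length)
    window = np.hanning(length)
    return [window * np.exp(2j * np.pi * f * n / length) for f in freqs]
```

Because the output of a stage is real and nonnegative, stages can be chained by feeding each output row into the next stage, which matches the recursive application the abstract describes.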

preprint, 2014, arXiv

An implementation of a randomized algorithm for principal component analysis

Recent years have witnessed intense development of randomized methods for low-rank approximation. These methods target principal component analysis (PCA) and the calculation of truncated singular value decompositions (SVD). The present paper presents an essentially black-box, foolproof implementation for MathWorks' MATLAB, a popular software platform for numerical computation. As illustrated via several tests, the randomized algorithms for low-rank approximation outperform or at least match the classical techniques (such as Lanczos iterations) in basically all respects: accuracy, computational efficiency (both speed and memory usage), ease-of-use, parallelizability, and reliability. However, the classical procedures remain the methods of choice for estimating spectral norms, and are far superior for calculating the least singular values and corresponding singular vectors (or singular subspaces).
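For orientation, a generic randomized range finder with a few power iterations can be sketched in NumPy. This is not the paper's MATLAB implementation; the function name `randomized_svd` and the defaults for `oversample` and `n_iter` are illustrative assumptions, and the sketch assumes a real-valued matrix.

```python
import numpy as np


def randomized_svd(A, k, oversample=10, n_iter=2, seed=0):
    """Approximate rank-k truncated SVD of a real matrix A.

    Sketch of the standard randomized scheme: sample the range of A with a
    Gaussian test matrix, refine it with re-orthonormalized power iterations,
    then take the SVD of the small projected matrix.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = min(k + oversample, n)              # sketch width, slightly above k
    omega = rng.standard_normal((n, l))     # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ omega)          # orthonormal basis for the sample
    for _ in range(n_iter):                 # power iterations sharpen the basis
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A                             # small matrix, at most l rows
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k]
```

For PCA, one would typically subtract the column means from the data matrix before calling such a routine, so that the right singular vectors are the principal directions.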

preprint, 2013, arXiv

Significance testing without truth

A popular approach to significance testing proposes to decide whether the given hypothesized statistical model is likely to be true (or false). Statistical decision theory provides a basis for this approach by requiring every significance test to make a decision about the truth of the hypothesis/model under consideration. Unfortunately, many interesting and useful models are obviously false (that is, not exactly true) even before considering any data. Fortunately, in practice a significance test need only gauge the consistency (or inconsistency) of the observed data with the assumed hypothesis/model -- without enquiring as to whether the assumption is likely to be true (or false), or whether some alternative is likely to be true (or false). In this practical formulation, a significance test rejects a hypothesis/model only if the observed data is highly improbable when calculating the probability while assuming the hypothesis being tested; the significance test only gauges whether the observed data likely invalidates the assumed hypothesis, and cannot decide that the assumption -- however unmistakably false -- is likely to be false a priori, without any data.

preprint, 2013, arXiv

Testing goodness-of-fit for logistic regression

Explicitly accounting for all applicable independent variables, even when the model being tested does not, is critical in testing goodness-of-fit for logistic regression. This can increase statistical power by orders of magnitude.

preprint, 2012, arXiv

A comparison of the discrete Kolmogorov-Smirnov statistic and the Euclidean distance

Goodness-of-fit tests gauge whether a given set of observations is consistent (up to expected random fluctuations) with arising as independent and identically distributed (i.i.d.) draws from a user-specified probability distribution known as the "model." The standard gauges involve the discrepancy between the model and the empirical distribution of the observed draws. Some measures of discrepancy are cumulative; others are not. The most popular cumulative measure is the Kolmogorov-Smirnov statistic; when all probability distributions under consideration are discrete, a natural noncumulative measure is the Euclidean distance between the model and the empirical distributions. In the present paper, both mathematical analysis and its illustration via various data sets indicate that the Kolmogorov-Smirnov statistic tends to be more powerful than the Euclidean distance when there is a natural ordering for the values that the draws can take -- that is, when the data is ordinal -- whereas the Euclidean distance is more reliable and more easily understood than the Kolmogorov-Smirnov statistic when there is no natural ordering (or partial order) -- that is, when the data is nominal.
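The two discrepancy measures compared above are easy to state in code. The sketch below assumes the model is given as a vector of bin probabilities and the data as a vector of bin counts; the function names are hypothetical, and the statistics are left unnormalized (any scaling by the number of draws used in the paper is omitted).

```python
import numpy as np


def discrete_ks(model, counts):
    """Kolmogorov-Smirnov statistic for a discrete model: the largest absolute
    gap between the empirical and model cumulative distributions. The value
    depends on the order in which the bins are listed."""
    model = np.asarray(model, dtype=float)
    empirical = np.asarray(counts, dtype=float) / np.sum(counts)
    return np.max(np.abs(np.cumsum(empirical) - np.cumsum(model)))


def euclidean_distance(model, counts):
    """Euclidean distance between the empirical and model distributions.
    The value is unchanged by any reordering of the bins."""
    model = np.asarray(model, dtype=float)
    empirical = np.asarray(counts, dtype=float) / np.sum(counts)
    return np.sqrt(np.sum((empirical - model) ** 2))
```

With a uniform model on four bins, the counts `[2, 0, 0, 2]` and the reordered counts `[2, 2, 0, 0]` give the same Euclidean distance but different Kolmogorov-Smirnov values, which is one way to see why the cumulative statistic only makes sense when the bins have a natural ordering.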

preprint, 2012, arXiv

Computing the asymptotic power of a Euclidean-distance test for goodness-of-fit

A natural (yet unconventional) test for goodness-of-fit measures the discrepancy between the model and empirical distributions via their Euclidean distance (or, equivalently, via its square). The present paper characterizes the statistical power of such a test against a family of alternative distributions, in the limit that the number of observations is large, with every alternative departing from the model in the same direction. Specifically, the paper provides an efficient numerical method for evaluating the cumulative distribution function (cdf) of the square of the Euclidean distance between the model and empirical distributions under the alternatives, in the limit that the number of observations is large. The paper illustrates the scheme by plotting the asymptotic power (as a function of the significance level) for several examples.

preprint, 2012, arXiv

Testing the significance of assuming homogeneity in contingency-tables/cross-tabulations

The model for homogeneity of proportions in a two-way contingency-table/cross-tabulation is the same as the model of independence, except that the probabilistic process generating the data is viewed as fixing the column totals (but not the row totals). When gauging the consistency of observed data with the assumption of independence, recent work has illustrated that the Euclidean/Frobenius/Hilbert-Schmidt distance is often far more statistically powerful than the classical statistics such as chi-square, the log-likelihood-ratio (G), the Freeman-Tukey/Hellinger distance, and other members of the Cressie-Read power-divergence family. The present paper indicates that the Euclidean/Frobenius/Hilbert-Schmidt distance can be more powerful for gauging the consistency of observed data with the assumption of homogeneity, too.

preprint, 2011, arXiv

An algorithm for the principal component analysis of large data sets

Recently popularized randomized methods for principal component analysis (PCA) efficiently and reliably produce nearly optimal accuracy -- even on parallel processors -- unlike the classical (deterministic) alternatives. We adapt one of these randomized methods for use with data sets that are too large to be stored in random-access memory (RAM). (The traditional terminology is that our procedure works efficiently "out-of-core.") We illustrate the performance of the algorithm via several numerical examples. For example, we report on the PCA of a data set stored on disk that is so large that less than a hundredth of it can fit in our computer's RAM.

preprint, 2011, arXiv

Chi-square and classical exact tests often wildly misreport significance; the remedy lies in computers

If a discrete probability distribution in a model being tested for goodness-of-fit is not close to uniform, then forming the Pearson chi-square statistic can involve division by nearly zero. This often leads to serious trouble in practice -- even in the absence of round-off errors -- as the present article illustrates via numerous examples. Fortunately, with the now widespread availability of computers, avoiding all the trouble is simple and easy: without the problematic division by nearly zero, the actual values taken by goodness-of-fit statistics are not humanly interpretable, but black-box computer programs can rapidly calculate their precise significance.

preprint, 2011, arXiv

Computing the confidence levels for a root-mean-square test of goodness-of-fit

The classic chi-squared statistic for testing goodness-of-fit has long been a cornerstone of modern statistical practice. The statistic consists of a sum in which each summand involves division by the probability associated with the corresponding bin in the distribution being tested for goodness-of-fit. Typically this division should precipitate rebinning to uniformize the probabilities associated with the bins, in order to make the test reasonably powerful. With the now widespread availability of computers, there is no longer any need for this. The present paper provides efficient black-box algorithms for calculating the asymptotic confidence levels of a variant on the classic chi-squared test which omits the problematic division. In many circumstances, it is also feasible to compute the exact confidence levels via Monte Carlo simulation.
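The Monte Carlo route mentioned above can be sketched as follows for a fully specified model. This is an illustration under stated assumptions, not the paper's black-box algorithm for asymptotic confidence levels: the function names, the scaling of the statistics by the number of draws, and the plain-fraction estimate of the P-value are all choices made here for the example.

```python
import numpy as np


def rms_statistic(model, counts):
    """Sum of squared differences between empirical and model probabilities,
    scaled by the number of draws; there is no division by the model
    probabilities."""
    model = np.asarray(model, dtype=float)
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    return n * np.sum((counts / n - model) ** 2)


def chi_square_statistic(model, counts):
    """Classic Pearson statistic, which divides each summand by the model
    probability of its bin (shown for comparison)."""
    model = np.asarray(model, dtype=float)
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    return n * np.sum((counts / n - model) ** 2 / model)


def monte_carlo_p_value(model, counts, statistic, n_sims=2000, seed=0):
    """Estimate the P-value as the fraction of data sets simulated from the
    model whose statistic is at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    n = int(np.sum(counts))
    observed = statistic(model, counts)
    sims = rng.multinomial(n, model, size=n_sims)
    simulated = np.array([statistic(model, s) for s in sims])
    return float(np.mean(simulated >= observed))
```

Passing either `rms_statistic` or `chi_square_statistic` as the `statistic` argument makes it straightforward to compare the two tests on the same data; the confidence level is one minus the returned P-value.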

preprint, 2011, arXiv

Computing the confidence levels for a root-mean-square test of goodness-of-fit, II

This paper extends our earlier article, "Computing the confidence levels for a root-mean-square test of goodness-of-fit"; unlike in the earlier article, the models in the present paper involve parameter estimation -- both the null and alternative hypotheses in the associated tests are composite. We provide efficient black-box algorithms for calculating the asymptotic confidence levels of a variant on the classic chi-squared test. In some circumstances, it is also feasible to compute the exact confidence levels via Monte Carlo simulation.

preprint, 2010, arXiv

Fast algorithms for spherical harmonic expansions, III

We accelerate the computation of spherical harmonic transforms, using what is known as the butterfly scheme. This provides a convenient alternative to the approach taken in the second paper from this series on "Fast algorithms for spherical harmonic expansions." The requisite precomputations become manageable when organized as a "depth-first traversal" of the program's control-flow graph, rather than as the perhaps more natural "breadth-first traversal" that processes one-by-one each level of the multilevel procedure. We illustrate the results via several numerical examples.

preprint, 2010, arXiv

Statistical tests for whether a given set of independent, identically distributed draws does not come from a specified probability density

We discuss several tests for whether a given set of independent and identically distributed (i.i.d.) draws does not come from a specified probability density function. The most commonly used are Kolmogorov-Smirnov tests, particularly Kuiper's variant, which focus on discrepancies between the cumulative distribution function for the specified probability density and the empirical cumulative distribution function for the given set of i.i.d. draws. Unfortunately, variations in the probability density function often get smoothed over in the cumulative distribution function, making it difficult to detect discrepancies in regions where the probability density is small in comparison with its values in surrounding regions. We discuss tests without this deficiency, complementing the classical methods. The tests of the present paper are based on the plain fact that it is unlikely to draw a random number whose probability is small, provided that the draw is taken from the same distribution used in calculating the probability (thus, if we draw a random number whose probability is small, then we can be confident that we did not draw the number from the same distribution used in calculating the probability).

preprint, 2009, arXiv

A fast randomized algorithm for orthogonal projection

We describe an algorithm that, given any full-rank matrix A having fewer rows than columns, can rapidly compute the orthogonal projection of any vector onto the null space of A, as well as the orthogonal projection onto the row space of A, provided that both A and its adjoint can be applied rapidly to arbitrary vectors. As an intermediate step, the algorithm solves the overdetermined linear least-squares regression involving the adjoint of A (and so can be used for this, too). The basis of the algorithm is an obvious but numerically unstable scheme; suitable use of a preconditioner yields numerical stability. We generate the preconditioner rapidly via a randomized procedure that succeeds with extremely high probability. In many circumstances, the method can accelerate interior-point methods for convex optimization, such as linear programming (Ming Gu, personal communication).
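The relationship between the two projections and the least-squares problem involving the adjoint can be checked with a small dense reference computation. The sketch below is not the fast randomized algorithm of the paper: it uses a standard dense least-squares solver, assumes a real matrix (so the adjoint is the transpose), and the function name `project_null_and_row` is hypothetical.

```python
import numpy as np


def project_null_and_row(A, x):
    """Dense reference: orthogonal projections of x onto the null space and
    the row space of a full-rank real matrix A with fewer rows than columns.

    Solves the overdetermined least-squares problem min_y ||A^T y - x||.
    The fitted vector A^T y is the projection onto the row space of A, and
    the residual x - A^T y is the projection onto the null space of A.
    """
    y, *_ = np.linalg.lstsq(A.T, x, rcond=None)
    row_part = A.T @ y
    null_part = x - row_part
    return null_part, row_part
```

A fast method would replace the dense solve with an iterative one that only applies A and its adjoint to vectors; the abstract attributes the numerical stability of its scheme to a randomized preconditioner, which this reference computation does not attempt to reproduce.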
