Researcher profile

Heinrich Matzinger

Heinrich Matzinger contributes to research discovery and scholarly infrastructure.

Researcher; Affiliation not imported; Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging; Verification L1; Unclaimed author
11 works
0 followers
8 topics
4 close collaborators

Published work

11 published item(s)

preprint, 2022, arXiv

Invariance principle of random projection for the norm

The Johnson-Lindenstrauss lemma guarantees that certain topological structure is preserved under random projections when high-dimensional deterministic vectors are projected to low-dimensional vectors. In this work, we try to understand how a random matrix affects the norms of random vectors. In particular, we prove that the distribution of the norm of a random vector $X \in \mathbb{R}^n$, whose entries are i.i.d. random variables, is preserved by a random projection $S:\mathbb{R}^n \to \mathbb{R}^m$. More precisely, \[ \frac{X^TS^TSX - mn}{\sqrt{\sigma^2 m^2n+2mn^2}} \xrightarrow[\quad m/n\to 0 \quad ]{ m,n\to \infty } \mathcal{N}(0,1) \] We also prove a concentration result for the random norm transformed by either a random projection or a random embedding. Overall, our results show that a random matrix induces low distortion in the norm of random vectors with i.i.d. entries.
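
The limit statement above can be probed numerically. The following is a minimal sketch, not the authors' experiment. It assumes that both $X$ and $S$ have i.i.d. standard normal entries and that $\sigma^2$ denotes the variance of a squared entry of $X$ (equal to 2 for a standard normal); the abstract does not define $\sigma$, so this reading is an assumption. The function name and parameters are hypothetical.

```python
import numpy as np

def standardized_projected_norms(n, m, trials, seed=0):
    """Simulate (X^T S^T S X - mn) / sqrt(sigma^2 m^2 n + 2 m n^2).

    Assumptions (not stated in the abstract): X and S have i.i.d. N(0, 1)
    entries, and sigma^2 = Var(X_i^2), which equals 2 in that case.
    """
    rng = np.random.default_rng(seed)
    sigma2 = 2.0  # Var(X_i^2) for a standard normal entry (assumed meaning of sigma^2)
    scale = np.sqrt(sigma2 * m**2 * n + 2 * m * n**2)
    values = np.empty(trials)
    for t in range(trials):
        x = rng.standard_normal(n)        # random vector X in R^n
        s = rng.standard_normal((m, n))   # random projection S: R^n -> R^m
        values[t] = (np.sum((s @ x) ** 2) - m * n) / scale
    return values

# m is small relative to n, mimicking the regime m/n -> 0.
z = standardized_projected_norms(n=400, m=20, trials=2000)
print(f"mean = {z.mean():.3f}, std = {z.std():.3f}")
```

If the assumed reading of $\sigma^2$ is right, the printed mean should be near 0 and the standard deviation near 1 for large $n$ and small $m/n$.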

preprint, 2020, arXiv

An Analytical Formula for Spectrum Reconstruction

We study the spectrum reconstruction technique. It is well known that eigenvalues play an important role in many research fields and are the foundation of many practical techniques such as Principal Component Analysis (PCA). We believe that related algorithms should perform better with more accurate spectrum estimation. An approximation formula was proposed previously, but no proof was given. In our research, we show why the formula works. When both the number of features and the dimension of the space go to infinity, we find the order of the error of the approximation formula, which is related to a constant $c$, the ratio between the dimension of the space and the number of features.

preprint, 2012, arXiv

A Monte Carlo Approach to the Fluctuation Problem in Optimal Alignments of Random Strings

The problem of determining the correct order of fluctuation of the optimal alignment score of two random strings of length $n$ has been open for several decades. It is known that the biased expected effect of a random letter-change on the optimal score implies an order of fluctuation linear in $\sqrt{n}$. However, in many situations where such a biased effect is observed empirically, it has been impossible to prove analytically. The main result of this paper shows that when the rescaled limit of the optimal alignment score increases in a certain direction, then the biased effect exists. On the basis of this result one can quantify a confidence level for the existence of such a biased effect, and hence of an order $\sqrt{n}$ fluctuation, based on simulation of optimal alignment scores. This is an important step forward, as the correct order of fluctuation was previously known only for certain special distributions. To illustrate the usefulness of our new methodology, we apply it to optimal alignments of strings written in the DNA alphabet. As scoring function, we use the BLASTZ default substitution matrix together with a realistic gap penalty. BLASTZ is one of the most widely used sequence alignment methodologies in bioinformatics. For this DNA setting, we show that with a high level of confidence, the fluctuation of the optimal alignment score is of order $\Theta(\sqrt{n})$. An important special case of the optimal alignment score is the Longest Common Subsequence (LCS) of random strings. For binary sequences with equiprobable symbols, the question of the fluctuation of the LCS remains open. The symmetry in that case does not allow for our method. On the other hand, in real-life DNA sequences, it is not the case that all letters occur with the same frequency. Thus, for many real-life situations, our method allows one to determine the order of the fluctuation up to a high confidence level.
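
To make "simulation of optimal alignment scores" concrete, here is a minimal sketch that estimates the empirical standard deviation of the optimal global alignment score for random DNA strings. It does not reproduce the paper's confidence-level procedure. The scoring scheme (match +1, mismatch -1, linear gap penalty -1) and the letter frequencies are placeholder assumptions, not the BLASTZ defaults, and all function names are hypothetical.

```python
import random
import statistics

def alignment_score(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            sub = match if ca == cb else mismatch
            curr.append(max(prev[j - 1] + sub,   # align ca with cb
                            prev[j] + gap,       # ca aligned with a gap
                            curr[j - 1] + gap))  # cb aligned with a gap
        prev = curr
    return prev[-1]

def score_std(n, trials, weights, rng):
    """Empirical standard deviation of the score for two random strings of length n."""
    scores = []
    for _ in range(trials):
        a = rng.choices("ACGT", weights=weights, k=n)
        b = rng.choices("ACGT", weights=weights, k=n)
        scores.append(alignment_score(a, b))
    return statistics.stdev(scores)

rng = random.Random(0)
weights = [0.4, 0.1, 0.1, 0.4]  # assumed non-uniform letter frequencies
for n in (40, 80, 160):
    sd = score_std(n, trials=40, weights=weights, rng=rng)
    print(f"n={n:4d}  std={sd:6.2f}  std/sqrt(n)={sd / n ** 0.5:.3f}")
```

A roughly constant last column would be consistent with fluctuations of order $\sqrt{n}$, although a run this small is only an illustration and proves nothing.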

preprint, 2012, arXiv

Distribution of Aligned Letter Pairs in Optimal Alignments of Random Sequences

Considering the optimal alignment of two i.i.d. random sequences of length $n$, we show that when the scoring function is chosen randomly, almost surely the empirical distribution of aligned letter pairs in all optimal alignments converges to a unique limiting distribution as $n$ tends to infinity. This result is interesting because it helps to understand the microscopic path structure of a special type of last passage percolation problem with correlated weights, an area of long-standing open problems. Characterizing the microscopic path structure furthermore yields a robust alternative to optimal alignment scores for testing the relatedness of genetic sequences.
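
The empirical distribution of aligned letter pairs can be computed by tracing back through the alignment table. The sketch below is an illustration only: it extracts a single optimal alignment rather than all of them, draws a symmetric scoring function uniformly from $[-1, 1]$, and uses a gap penalty of $-0.5$. These choices, the two-letter alphabet and the function names are assumptions that do not come from the abstract.

```python
import random
from collections import Counter

def aligned_pairs(a, b, score, gap):
    """Aligned letter pairs of one optimal global alignment (linear gap penalty)."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score[a[i - 1], b[j - 1]],
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    # Trace back from the bottom-right corner, preferring the diagonal move.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score[a[i - 1], b[j - 1]]:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

def pair_distribution(n, alphabet, score, gap, rng):
    """Empirical distribution of aligned pairs for two i.i.d. uniform sequences."""
    a = rng.choices(alphabet, k=n)
    b = rng.choices(alphabet, k=n)
    counts = Counter(aligned_pairs(a, b, score, gap))
    total = sum(counts.values()) or 1  # guard against an alignment with no pairs
    return {pair: c / total for pair, c in sorted(counts.items())}

rng = random.Random(1)
alphabet = "AB"
# Hypothetical random scoring function: symmetric, uniform on [-1, 1].
score = {}
for x in alphabet:
    for y in alphabet:
        score[x, y] = score[y, x] if (y, x) in score else rng.uniform(-1.0, 1.0)
gap = -0.5  # hypothetical gap penalty
for n in (100, 400):
    print(n, pair_distribution(n, alphabet, score, gap, rng))
```

Comparing the printed frequencies across values of $n$ gives a rough feel for the stabilization the abstract describes, within the limits of a single alignment per run.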

preprint, 2012, arXiv

General approach to the fluctuations problem in random sequence comparison

We present a general approach to the problem of determining the asymptotic order of the variance of the optimal score between two independent random sequences defined over an arbitrary finite alphabet. Our general approach is based on identifying random variables driving the fluctuations of the optimal score and conveniently choosing functions of them which exhibit certain monotonicity properties. We show how our general approach establishes a common theoretical background for the techniques used by Matzinger et al. in a series of previous articles [6, 8, 20, 24, 26, 37] studying the same problem in special cases. Additionally, we explicitly apply our general approach to study the fluctuations of the optimal score between two random sequences over a finite alphabet (closing the study as initiated in [26]) and of the length of the longest common subsequences between two random sequences with a certain block structure (generalizing part of [37]).

preprint, 2011, arXiv

Information recovery from observations by a random walk having jump distribution with exponential tails

A scenery is a coloring $\xi$ of the integers. Let $\{S_t\}_{t\geq 0}$ be a recurrent random walk on the integers. Observing the scenery $\xi$ along the path of this random walk, one sees the color $\chi_t:=\xi(S_t)$ at time $t$. The scenery reconstruction problem is concerned with recovering the scenery $\xi$, given only the sequence of observations $\chi:=(\chi_t)_{t\geq 0}$. The scenery reconstruction methods presented to date require the random walk to have bounded increments. Here, we present a new approach for random walks with unbounded increments, which works when the tail of the increment distribution decays exponentially at a sufficiently fast rate and the scenery has five colors.
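
The observation process $\chi_t = \xi(S_t)$ is easy to simulate, which helps to fix the setup. The sketch below only generates data; it does not attempt any reconstruction. It assumes a uniformly random five-color scenery and symmetric increments whose magnitude is geometric with parameter `p`, one simple unbounded distribution with an exponentially decaying tail. The function and its parameters are hypothetical.

```python
import random

def observe_scenery(steps, colors=5, p=0.5, seed=0):
    """Return (scenery, path, observations) with observations[t] = scenery[path[t]]."""
    rng = random.Random(seed)
    scenery = {}  # coloring xi of the integers, sampled lazily where the walk visits

    def xi(z):
        if z not in scenery:
            scenery[z] = rng.randrange(colors)
        return scenery[z]

    def increment():
        # Symmetric step with P(|step| = k) = (1 - p)**(k - 1) * p for k >= 1:
        # unbounded, mean zero, exponentially decaying tail (assumed model).
        k = 1
        while rng.random() > p:
            k += 1
        return k if rng.random() < 0.5 else -k

    position = 0
    path = [position]
    observations = [xi(position)]
    for _ in range(steps):
        position += increment()
        path.append(position)
        observations.append(xi(position))
    return scenery, path, observations

scenery, path, chi = observe_scenery(steps=20)
print("path:        ", path)
print("observations:", chi)
```

The reconstruction problem is to recover `scenery` from `chi` alone, without access to `path`.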

preprint, 2010, arXiv

Fluctuations of the Longest Common Subsequence for Sequences of Independent Blocks

The problem of the fluctuation of the Longest Common Subsequence (LCS) of two i.i.d. sequences of length $n>0$ has been open for decades. There exist contradicting conjectures on the topic. Chvátal and Sankoff conjectured in 1975 that asymptotically the order should be $n^{2/3}$, while Waterman conjectured in 1994 that asymptotically the order should be $n$. A contiguous substring consisting only of one type of symbol is called a block. In the present work, we determine the order of the fluctuation of the LCS for a special model of sequences consisting of i.i.d. blocks whose lengths are uniformly distributed on the set $\{l-1,l,l+1\}$, with $l$ a given positive integer. We show that the fluctuation in this model is asymptotically of order $n$, which confirms Waterman's conjecture. To achieve this, we developed a new method which allows us to reformulate the problem of the order of the variance as a (relatively) low-dimensional optimization problem.
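
The block model described above can be simulated directly. The following sketch generates sequences from blocks with lengths uniform on $\{l-1, l, l+1\}$ and estimates the empirical variance of the LCS length. It assumes a binary alphabet with alternating block symbols and truncates the last block so that the sequence has length exactly $n$; both choices, and the function names, are assumptions for illustration rather than the paper's construction.

```python
import random
import statistics

def block_sequence(n, l, rng):
    """Binary sequence of length n made of blocks with lengths uniform on {l-1, l, l+1}.

    Assumes l >= 2 so that every block is non-empty; the last block may be truncated.
    """
    seq = []
    symbol = rng.randrange(2)
    while len(seq) < n:
        seq.extend([symbol] * rng.choice((l - 1, l, l + 1)))
        symbol = 1 - symbol  # consecutive blocks carry different symbols
    return seq[:n]

def lcs_length(a, b):
    """Length of the longest common subsequence via the standard dynamic program."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

rng = random.Random(0)
l = 3
for n in (40, 80, 160):
    samples = [lcs_length(block_sequence(n, l, rng), block_sequence(n, l, rng))
               for _ in range(40)]
    var = statistics.variance(samples)
    print(f"n={n:4d}  empirical variance={var:7.2f}  variance/n={var / n:.3f}")
```

The last column lets one eyeball how the empirical variance scales with $n$; with so few samples it is only suggestive.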

preprint, 2010, arXiv

Random modification effect in the size of the fluctuation of the LCS of two sequences of i.i.d. blocks

The problem of the order of the fluctuation of the Longest Common Subsequence (LCS) of two independent sequences has been open for decades. There exist contradicting conjectures on the topic, due to Chvátal and Sankoff in 1975 and Waterman in 1994. In the present article, we consider a special model of i.i.d. sequences made out of blocks. A block is a contiguous substring consisting only of one type of symbol. Our model allows only three possible block lengths, each chosen with equal probability. In this context, we introduce a random operation (random modification) on the blocks of one of the sequences. We develop the techniques to prove the following: if we suppose that the random modification increases the length of the LCS with high probability, then the order of the fluctuation of the LCS is as conjectured by Waterman. This result is a key technical part in the study of the size of the fluctuation of the LCS for sequences of i.i.d. blocks, developed by Matzinger and Torres.

preprint, 2010, arXiv

The rate of the convergence of the mean score in random sequence comparison

We consider a general class of super-additive scores measuring the similarity of two independent sequences of $n$ i.i.d. letters from a finite alphabet. Our object of interest is the mean score per letter $l_n$. By subadditivity, $l_n$ is nondecreasing and converges to a limit $l$. We give a simple method of bounding the difference $l-l_n$ and obtaining the rate of convergence. Our result generalizes a previous result of Alexander, where only the special case of the longest common subsequence is considered.
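
The convergence claim can be unpacked with the standard superadditivity argument. The following is a sketch, assuming $L_n$ denotes the score of the two length-$n$ sequences, so that $l_n = \mathbb{E}[L_n]/n$ (the symbol $L_n$ does not appear in the abstract). Splitting both sequences after $n$ letters and scoring the two parts separately gives \[ \mathbb{E}[L_{n+m}] \ge \mathbb{E}[L_n] + \mathbb{E}[L_m], \] and Fekete's lemma then yields \[ l = \lim_{n\to\infty} l_n = \sup_{n\ge 1} l_n, \qquad \text{so} \qquad l - l_n \ge 0 \ \text{ for every } n. \] The rate of convergence is therefore a question about how quickly this nonnegative gap shrinks.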