Source author record

Shurong Zheng

Shurong Zheng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.ST, Methodology, Statistics Theory, Artificial Intelligence, Computer Vision

Catalog footprint

What is connected

9 works

5 topics

4 close collaborators

preprint 2026 arXiv

GeM-VG: Towards Generalized Multi-image Visual Grounding with Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) have demonstrated impressive progress in single-image grounding and general multi-image understanding. Recently, some methods have begun to address multi-image grounding. However, they are constrained by single-target localization and limited types of practical tasks, due to the lack of unified modeling for generalized grounding tasks. Therefore, we propose GeM-VG, an MLLM capable of Generalized Multi-image Visual Grounding. To support this, we systematically categorize and organize existing multi-image grounding tasks according to their reliance on cross-image cues and reasoning, and introduce the MG-Data-240K dataset, addressing the limitations of existing datasets regarding target quantity and image relation. To tackle the challenges of robustly handling diverse multi-image grounding tasks, we further propose a hybrid reinforcement finetuning strategy that integrates chain-of-thought (CoT) reasoning and direct answering, considering their complementary strengths. This strategy adopts an R1-like algorithm guided by a carefully designed rule-based reward, effectively enhancing the model's overall perception and reasoning capabilities. Extensive experiments demonstrate the superior generalized grounding capabilities of our model. For multi-image grounding, it outperforms the previous leading MLLMs by 2.0% and 9.7% on MIG-Bench and MC-Bench, respectively. In single-image grounding, it achieves a 9.1% improvement over the base model on ODINW. Furthermore, our model retains strong capabilities in general multi-image understanding.

preprint 2022 arXiv

Adaptive Tests for Bandedness of High-dimensional Covariance Matrices

Estimation of the high-dimensional banded covariance matrix is widely used in multivariate statistical analysis. To ensure the validity of estimation, we aim to test the hypothesis that the covariance matrix is banded with a certain bandwidth under the high-dimensional framework. Though several testing methods have been proposed in the literature, the existing tests are only powerful for some alternatives with certain sparsity levels, whereas they may not be powerful for alternatives with other sparsity structures. The goal of this paper is to propose a new test for the bandedness of a high-dimensional covariance matrix, which is powerful for alternatives with various sparsity levels. The proposed new test can also be used for testing the banded structure of covariance matrices of error vectors in high-dimensional factor models. Based on these statistics, a consistent bandwidth estimator is also introduced for a banded high-dimensional covariance matrix. Extensive simulation studies and an application to a prostate cancer dataset from protein mass spectroscopy are conducted to evaluate the effectiveness of the proposed adaptive tests and bandwidth estimator for the banded covariance matrix.
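To make the null hypothesis concrete, the sketch below checks whether a given matrix is banded with bandwidth k. It is only an illustration of the structure being tested, not the paper's test statistic or bandwidth estimator. The convention that entries with |i - j| > k must vanish, and the helper names `max_off_band`, `is_banded` and `smallest_bandwidth`, are assumptions introduced here.

```python
def max_off_band(sigma, k):
    """Largest |sigma[i][j]| over entries with |i - j| > k (0.0 if none exist)."""
    p = len(sigma)
    return max(
        (abs(sigma[i][j]) for i in range(p) for j in range(p) if abs(i - j) > k),
        default=0.0,
    )


def is_banded(sigma, k, tol=0.0):
    """True when every entry more than k steps off the diagonal is within tol of 0.

    Assumed convention: bandwidth k means sigma[i][j] == 0 whenever |i - j| > k.
    """
    return max_off_band(sigma, k) <= tol


def smallest_bandwidth(sigma, tol=0.0):
    """Smallest k for which the matrix is banded (at most p - 1)."""
    p = len(sigma)
    for k in range(p):
        if is_banded(sigma, k, tol):
            return k
    return max(p - 1, 0)


# Hypothetical 4 x 4 tridiagonal covariance matrix: bandwidth 1 under this convention.
tridiagonal = [
    [2.0, 0.5, 0.0, 0.0],
    [0.5, 2.0, 0.5, 0.0],
    [0.0, 0.5, 2.0, 0.5],
    [0.0, 0.0, 0.5, 2.0],
]
```

Applied to a population matrix this check is exact. With a sample covariance matrix the off-band entries are never exactly zero, which is why a formal test of the kind the abstract describes is needed.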

preprint 2022 arXiv

On block-wise and reference panel-based estimators for genetic data prediction in high dimensions

Genetic prediction of complex traits and diseases has attracted enormous attention in precision medicine, mainly because it has the potential to translate discoveries from genome-wide association studies (GWAS) into medical advances. As the high dimensional covariance matrix (or the linkage disequilibrium (LD) pattern) of genetic variants has a block-diagonal structure, many existing methods attempt to account for the dependence among variants in predetermined local LD blocks/regions. Moreover, due to privacy restrictions and data protection concerns, genetic variant dependence in each LD block is typically estimated from external reference panels rather than the original training dataset. This paper presents a unified analysis of block-wise and reference panel-based estimators in a high-dimensional prediction framework without sparsity restrictions. We find that, surprisingly, even when the covariance matrix has a block-diagonal structure with well-defined boundaries, block-wise estimation methods adjusting for local dependence can be substantially less accurate than methods controlling for the whole covariance matrix. Further, estimation methods built on the original training dataset and external reference panels are likely to have varying performance in high dimensions, which may reflect the cost of having only access to summary level data from the training dataset. This analysis is based on our novel results in random matrix theory for block-diagonal covariance matrix. We numerically evaluate our results using extensive simulations and the large-scale UK Biobank real data analysis of 36 complex traits.

preprint 2014 arXiv

CLT for large dimensional general Fisher matrices and its applications in high-dimensional data analysis

Random Fisher matrices arise naturally in multivariate statistical analysis, and understanding the properties of their eigenvalues is of primary importance for many hypothesis testing problems, such as testing the equality between two multivariate population covariance matrices or testing the independence between sub-groups of a multivariate random vector. This paper is concerned with the properties of a large-dimensional Fisher matrix when the dimension of the population is proportionally large compared to the sample size. Most existing works on Fisher matrices deal with a particular Fisher matrix where populations have i.i.d. components so that the population covariance matrices are all identity. In this paper, we consider general Fisher matrices with arbitrary population covariance matrices. The first main result of the paper establishes the limiting distribution of the eigenvalues of a Fisher matrix, while in a second main result we provide a central limit theorem for a wide class of functionals of its eigenvalues. Some applications of these results are also proposed for testing hypotheses on high-dimensional covariance matrices.

preprint 2014 arXiv

Substitution principle for CLT of linear spectral statistics of high-dimensional sample covariance matrices with applications to hypothesis testing

Sample covariance matrices are widely used in multivariate statistical analysis. The central limit theorems (CLT's) for linear spectral statistics of high-dimensional non-centered sample covariance matrices have received considerable attention in random matrix theory and have been applied to many high-dimensional statistical problems. However, known population mean vectors are assumed for non-centered sample covariance matrices, and some of these results even assume Gaussian-like moment conditions. In fact, there are two other frequently used sample covariance matrices that do not depend on unknown population mean vectors: the MLE (obtained by subtracting the sample mean vector from each sample vector) and the unbiased sample covariance matrix (obtained by changing the denominator $n$ to $N=n-1$ in the MLE). In this paper, we not only establish new CLT's for non-centered sample covariance matrices without Gaussian-like moment conditions but also characterize the non-negligible differences among the CLT's for the three classes of high-dimensional sample covariance matrices by establishing a substitution principle: substitute the adjusted sample size $N=n-1$ for the actual sample size $n$ in the major centering term of the new CLT's so as to obtain the CLT of the unbiased sample covariance matrices. Moreover, it is found that the difference between the CLT's for the MLE and the unbiased sample covariance matrix is non-negligible in the major centering term, although the two sample covariance matrices differ only in the denominator ($n$ versus $n-1$). The new results are applied to two testing problems for high-dimensional data.
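The three classes of sample covariance matrix named in this abstract can be written out as a minimal pure-Python sketch. It covers only the definitions, not the CLT's. "Non-centered" is read here as centering at a known population mean, which the abstract says those results assume. All function names are hypothetical.

```python
def mean_vector(samples):
    """Coordinate-wise sample mean of a list of p-dimensional vectors."""
    n = len(samples)
    p = len(samples[0])
    return [sum(x[j] for x in samples) / n for j in range(p)]


def scatter(samples, center):
    """Sum over samples of (x - center)(x - center)^T, as a p x p nested list."""
    p = len(center)
    s = [[0.0] * p for _ in range(p)]
    for x in samples:
        d = [x[j] - center[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                s[a][b] += d[a] * d[b]
    return s


def scale(matrix, c):
    """Multiply every entry of a nested-list matrix by the scalar c."""
    return [[c * v for v in row] for row in matrix]


def non_centered_cov(samples, known_mean):
    """Centered at a known population mean, denominator n."""
    return scale(scatter(samples, known_mean), 1.0 / len(samples))


def mle_cov(samples):
    """Centered at the sample mean, denominator n."""
    return scale(scatter(samples, mean_vector(samples)), 1.0 / len(samples))


def unbiased_cov(samples):
    """Centered at the sample mean, denominator N = n - 1."""
    n = len(samples)
    return scale(scatter(samples, mean_vector(samples)), 1.0 / (n - 1))
```

The MLE and the unbiased version differ only by the scalar factor n / (n - 1). The abstract's point is that this small change still shifts the major centering term of the CLT in high dimensions.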

preprint 2013 arXiv

A Note on Central Limit Theorems for Linear Spectral Statistics of Large Dimensional F-matrix

The sample covariance matrix and the multivariate $F$-matrix play important roles in multivariate statistical analysis. The central limit theorems (CLT's) of linear spectral statistics associated with these matrices were established in Bai and Silverstein (2004) and Zheng (2012), which have received considerable attention and have been applied to solve many large dimensional statistical problems. However, the sample covariance matrices used in these papers are not centralized, and there exist some questions about CLT's defined by the centralized sample covariance matrices. In this note, we shall provide some short complements on the CLT's in Bai and Silverstein (2004) and Zheng (2012), and show that the results in these two papers remain valid for the centralized sample covariance matrices, provided that the ratios of dimension $p$ to sample sizes $(n,n_1,n_2)$ are redefined as $p/(n-1)$ and $p/(n_i-1)$, $i=1,2$, respectively.
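The redefinition of the ratios is simple arithmetic and can be sketched as follows. The function name `dimension_ratios` and the `centered` flag are hypothetical, introduced only for illustration.

```python
def dimension_ratios(p, sample_sizes, centered=True):
    """Ratios of dimension p to each sample size.

    With centered=True each sample size n is replaced by n - 1, matching the
    redefinition p / (n - 1) described in the abstract for centralized sample
    covariance matrices. With centered=False the plain ratio p / n is returned.
    """
    shift = 1 if centered else 0
    ratios = []
    for n in sample_sizes:
        if n - shift <= 0:
            raise ValueError("sample size too small for the requested ratio")
        ratios.append(p / (n - shift))
    return ratios
```

For example, with p = 50 and two samples of sizes 101 and 201, the centralized ratios are 50/100 and 50/200, compared with 50/101 and 50/201 for the non-centralized case.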

preprint 2013 arXiv

CLT for linear spectral statistics of random matrix $S^{-1}T$

This paper proposes a CLT for linear spectral statistics of random matrix $S^{-1}T$ for a general non-negative definite and non-random Hermitian matrix $T$.

preprint 2009 arXiv

Corrections to LRT on Large Dimensional Covariance Matrix by RMT

In this paper, we give an explanation for the failure of two likelihood ratio procedures for testing hypotheses about covariance matrices from Gaussian populations when the dimension is large compared to the sample size. Next, using recent central limit theorems for linear spectral statistics of sample covariance matrices and of random F-matrices, we propose necessary corrections for these LR tests to cope with high-dimensional effects. The asymptotic distributions of these corrected tests under the null are given. Simulations demonstrate that the corrected LR tests yield a realized size close to the nominal level for both moderate p (around 20) and high dimension, while the traditional LR tests with chi-square approximation fail. Another contribution of the paper is that, for testing the equality between two covariance matrices, the proposed correction applies equally to non-Gaussian populations, yielding a valid pseudo-likelihood ratio test.

preprint 2007 arXiv

Variable Selection Incorporating Prior Constraint Information into Lasso

We propose a variable selection procedure that incorporates prior constraint information into the lasso. The proposed procedure combines the sample and prior information, and selects significant variables for responses in a narrower region where the true parameters lie. It increases the efficiency of choosing the true model correctly. The proposed procedure can be executed by many constrained quadratic programming methods, and the initial estimator can be found by least squares or a Monte Carlo method. The proposed procedure also enjoys good theoretical properties. Moreover, the proposed procedure can be used not only for linear models but also for generalized linear models (GLM), Cox models, quantile regression models and many others with the help of Wang and Leng (2007)'s LSA, which approximates these models by linear models. The idea of combining sample and prior constraint information can also be used for other modified lasso procedures. Some examples are used to illustrate the idea of incorporating prior constraint information in variable selection procedures.