Researcher profile

Cong Ma

Cong Ma contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

When Are Trade-Off Functions Testable from Finite Samples?

We study finite-sample inference for the trade-off function of two unknown probability distributions, the function that traces the optimal type I/type II error frontier in binary testing. Given samples from distributions $P$ and $Q$, we consider the problem of testing whether their trade-off function lies above a benchmark curve $f_0$ or falls below a weaker benchmark $f_1$. Without structural restrictions, this problem is impossible uniformly over nonparametric classes. We identify a sharp condition under which it becomes possible. The key structural assumption is that the Neyman--Pearson rejection regions for $(P,Q)$ are attainable, up to null sets, by a prescribed class $S$ of measurable sets. Within this exact attainability framework, finite Vapnik--Chervonenkis dimension of $S$ is both sufficient and necessary for nontrivial finite-sample testing. We construct a test with nonasymptotic error guarantees: type I error control is valid without assuming attainability, while power holds uniformly over attainable alternatives satisfying an explicit separation condition. By inverting the test, we also obtain simultaneous confidence bands for the whole trade-off curve. Finally, we study the sharpness and robustness of the procedure. In the monotone likelihood-ratio model, we derive local separation rates and prove matching lower bounds up to logarithmic factors. We also allow approximate, rather than exact, attainability; this extension yields finite-sample guarantees for univariate log-concave distributions by approximating their rejection regions with unions of intervals.

preprint2025arXiv

The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing

We propose $\textsf{ScaledGD($λ$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparametrized factor representations, $\textsf{ScaledGD($λ$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning to combat bad curvatures induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by preconditioners, $\textsf{ScaledGD($λ$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$) even with overprameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($λ$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with respect to the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$ which suffers from a polynomial dependency on the condition number. Our work provides evidence on the power of preconditioning in accelerating the convergence without hurting generalization in overparameterized learning.

preprint2022arXiv

A new similarity measure for covariate shift with applications to nonparametric regression

We study covariate shift in the context of nonparametric regression. We introduce a new measure of distribution mismatch between the source and target distributions that is based on the integrated ratio of probabilities of balls at a given radius. We use the scaling of this measure with respect to the radius to characterize the minimax rate of estimation over a family of Hölder continuous functions under covariate shift. In comparison to the recently proposed notion of transfer exponent, this measure leads to a sharper rate of convergence and is more fine-grained. We accompany our theory with concrete instances of covariate shift that illustrate this sharp difference.

preprint2022arXiv

An FPGA Based energy correction method for one-to-one coupled PET detector: model and evaluation

A PET scanner based on silicon photomultipliers (SiPMs) has been widely used as an advanced nuclear medicine imaging technique that yields quantitative images of regional in vivo biology and biochemistry. The compact size of the SiPM allows direct one to one coupling between the scintillation crystal and the photosensor, yielding better timing and energy resolutions than the light sharing methods that have to be used in photomultiplier tube (PMT) PET systems. To decrease the volume of readout electronics, a front end multiplexer with position decoder is a common choice for the one to one system without a highly integrated application specific integrated circuit (ASIC). However, in this case we cannot measure each crystal's deposited energy inspired by an annihilation photon, so the inter-crystal scatter (ICS) events will lead to the crystal mispositioning and then deteriorate the detector intrinsic resolution. Besides, considering the events rejection within the energy window resulting from the gain dispersion and nonlinear outputs of the SiPMs, an energy correction mechanism is needed. Yet, lack of the information of each crystal's energy will introduce large energy correction error for the ICS events. For this issue, an online energy correction mechanism implemented on a Kintext-7 Field Programmable Gate Array (FPGA) device is presented in this paper. Experiments in the laboratory were performed using an 8 x 8 segmented LYSO crystals coupled with an 8 x 8 SiPM (J-series, from ON Semiconductor) array which is under 22Na point source excitation. Test results indicate that both the energy of the non-ICS and ICS events can be precisely corrected and the energy resolution is better than 12 %. We also applied this method to an actual clinical PET scanner under a 68Ge line source to verify its multi-channel reliability.

preprint2022arXiv

BBA-net: A bi-branch attention network for crowd counting

In the field of crowd counting, the current mainstream CNN-based regression methods simply extract the density information of pedestrians without finding the position of each person. This makes the output of the network often found to contain incorrect responses, which may erroneously estimate the total number and not conducive to the interpretation of the algorithm. To this end, we propose a Bi-Branch Attention Network (BBA-NET) for crowd counting, which has three innovation points. i) A two-branch architecture is used to estimate the density information and location information separately. ii) Attention mechanism is used to facilitate feature extraction, which can reduce false responses. iii) A new density map generation method combining geometric adaptation and Voronoi split is introduced. Our method can integrate the pedestrian's head and body information to enhance the feature expression ability of the density map. Extensive experiments performed on two public datasets show that our method achieves a lower crowd counting error compared to other state-of-the-art methods.

preprint2022arXiv

Scaling and Scalability: Provable Nonconvex Low-Rank Tensor Estimation from Incomplete Measurements

Tensors, which provide a powerful and flexible model for representing multi-attribute data and multi-way interactions, play an indispensable role in modern data science across various fields in science and engineering. A fundamental task is to faithfully recover the tensor from highly incomplete measurements in a statistically and computationally efficient manner. Harnessing the low-rank structure of tensors in the Tucker decomposition, this paper develops a scaled gradient descent (ScaledGD) algorithm to directly recover the tensor factors with tailored spectral initializations, and shows that it provably converges at a linear rate independent of the condition number of the ground truth tensor for two canonical problems -- tensor completion and tensor regression -- as soon as the sample size is above the order of $n^{3/2}$ ignoring other parameter dependencies, where $n$ is the dimension of the tensor. This leads to an extremely scalable approach to low-rank tensor estimation compared with prior art, which suffers from at least one of the following drawbacks: extreme sensitivity to ill-conditioning, high per-iteration costs in terms of memory and computation, or poor sample complexity guarantees. To the best of our knowledge, ScaledGD is the first algorithm that achieves near-optimal statistical and computational complexities simultaneously for low-rank tensor completion with the Tucker decomposition. Our algorithm highlights the power of appropriate preconditioning in accelerating nonconvex statistical estimation, where the iteration-varying preconditioners promote desirable invariance properties of the trajectory with respect to the underlying symmetry in low-rank tensor factorization.

preprint2022arXiv

The design of a time-interleaved analog-digital conversion modulator based on FPGA-TDC for PET application

Fully Field Programmable Gate Array (FPGA)based digitizer for high-resolution time and energy measurement is an attractive low cost solution for the readout electronics in positron emission computed tomography (PET)detector. In recent years, the FPGA based time-digital converter (FPGA-TDC) has been widely used for time measurement in the commercial PET scanners. Yet, for the energy measurement, few studies have been reported on a fully FPGA based, large dynamic range and high resolution alternative to the commercial analog-digital converter (ADC). Our previous research presents a 25 Ms/s FPGA-TDC based free-running ADC (FPGA-ADC), and successfully employed it in the readout electronics for PET detector. In this work-in-progress study, by means of the time-interleaved strategy, a 50 Ms/s FPGA-ADC is presented. With only two off-chip resistors, both the A/D conversion and energy measurement are achieved on a Xilinx Kintex-7 FPGA. Therefore, this method has great advantages inimproving system integration. Initial performance tests are also presented, and we hope it can give us a possibility to develop a new FPGA-only front-end digitizer for PET in future.

preprint2021arXiv

Bridging Convex and Nonconvex Optimization in Robust PCA: Noise, Outliers, and Missing Data

This paper delivers improved theoretical guarantees for the convex programming approach in low-rank matrix estimation, in the presence of (1) random noise, (2) gross sparse outliers, and (3) missing data. This problem, often dubbed as robust principal component analysis (robust PCA), finds applications in various domains. Despite the wide applicability of convex relaxation, the available statistical support (particularly the stability analysis vis-à-vis random noise) remains highly suboptimal, which we strengthen in this paper. When the unknown matrix is well-conditioned, incoherent, and of constant rank, we demonstrate that a principled convex program achieves near-optimal statistical accuracy, in terms of both the Euclidean loss and the $\ell_{\infty}$ loss. All of this happens even when nearly a constant fraction of observations are corrupted by outliers with arbitrary magnitudes. The key analysis idea lies in bridging the convex program in use and an auxiliary nonconvex optimization algorithm, and hence the title of this paper.

preprint2021arXiv

Learning Mixtures of Low-Rank Models

We study the problem of learning mixtures of low-rank models, i.e. reconstructing multiple low-rank matrices from unlabelled linear measurements of each. This problem enriches two widely studied settings -- low-rank matrix sensing and mixed linear regression -- by bringing latent variables (i.e. unknown labels) and structural priors (i.e. low-rank structures) into consideration. To cope with the non-convexity issues arising from unlabelled heterogeneous data and low-complexity structure, we develop a three-stage meta-algorithm that is guaranteed to recover the unknown matrices with near-optimal sample and computational complexities under Gaussian designs. In addition, the proposed algorithm is provably stable against random noise. We complement the theoretical studies with empirical evidence that confirms the efficacy of our algorithm.

preprint2021arXiv

Minimax Off-Policy Evaluation for Multi-Armed Bandits

We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards, and develop minimax rate-optimal procedures under three settings. First, when the behavior policy is known, we show that the Switch estimator, a method that alternates between the plug-in and importance sampling estimators, is minimax rate-optimal for all sample sizes. Second, when the behavior policy is unknown, we analyze performance in terms of the competitive ratio, thereby revealing a fundamental gap between the settings of known and unknown behavior policies. When the behavior policy is unknown, any estimator must have mean-squared error larger -- relative to the oracle estimator equipped with the knowledge of the behavior policy -- by a multiplicative factor proportional to the support size of the target policy. Moreover, we demonstrate that the plug-in approach achieves this worst-case competitive ratio up to a logarithmic factor. Third, we initiate the study of the partial knowledge setting in which it is assumed that the minimum probability taken by the behavior policy is known. We show that the plug-in estimator is optimal for relatively large values of the minimum probability, but is sub-optimal when the minimum probability is low. In order to remedy this gap, we propose a new estimator based on approximation by Chebyshev polynomials that provably achieves the optimal estimation error. Numerical experiments on both simulated and real data corroborate our theoretical findings.

preprint2019arXiv

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

Recent years have seen a flurry of activities in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often require proper regularization (e.g. trimming, regularized cost, projection) in order to guarantee fast convergence. For vanilla procedures such as gradient descent, however, prior theory either recommends highly conservative learning rates to avoid overshooting, or completely lacks performance guarantees. This paper uncovers a striking phenomenon in nonconvex optimization: even in the absence of explicit regularization, gradient descent enforces proper regularization implicitly under various statistical models. In fact, gradient descent follows a trajectory staying within a basin that enjoys nice geometry, consisting of points incoherent with the sampling mechanism. This "implicit regularization" feature allows gradient descent to proceed in a far more aggressive fashion without overshooting, which in turn results in substantial computational savings. Focusing on three fundamental statistical estimation problems, i.e. phase retrieval, low-rank matrix completion, and blind deconvolution, we establish that gradient descent achieves near-optimal statistical and computational guarantees without explicit regularization. In particular, by marrying statistical modeling with generic optimization theory, we develop a general recipe for analyzing the trajectories of iterative algorithms via a leave-one-out perturbation argument. As a byproduct, for noisy matrix completion, we demonstrate that gradient descent achieves near-optimal error control --- measured entrywise and by the spectral norm --- which might be of independent interest.