Source author record

Cong Ma

Cong Ma appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Machine Learning, astro-ph.CO, math.ST, Statistics Theory, eess.SP, Information Theory, math.IT, math.OC, physics.ins-det, Artificial Intelligence, astro-ph.IM, Computer Vision, gr-qc, nucl-ex, Social and Information Networks

Catalog footprint

20 works, 15 topics, 4 close collaborators


preprint, 2026, arXiv

When Are Trade-Off Functions Testable from Finite Samples?

We study finite-sample inference for the trade-off function of two unknown probability distributions, the function that traces the optimal type I/type II error frontier in binary testing. Given samples from distributions $P$ and $Q$, we consider the problem of testing whether their trade-off function lies above a benchmark curve $f_0$ or falls below a weaker benchmark $f_1$. Without structural restrictions, this problem is impossible uniformly over nonparametric classes. We identify a sharp condition under which it becomes possible. The key structural assumption is that the Neyman--Pearson rejection regions for $(P,Q)$ are attainable, up to null sets, by a prescribed class $S$ of measurable sets. Within this exact attainability framework, finite Vapnik--Chervonenkis dimension of $S$ is both sufficient and necessary for nontrivial finite-sample testing. We construct a test with nonasymptotic error guarantees: type I error control is valid without assuming attainability, while power holds uniformly over attainable alternatives satisfying an explicit separation condition. By inverting the test, we also obtain simultaneous confidence bands for the whole trade-off curve. Finally, we study the sharpness and robustness of the procedure. In the monotone likelihood-ratio model, we derive local separation rates and prove matching lower bounds up to logarithmic factors. We also allow approximate, rather than exact, attainability; this extension yields finite-sample guarantees for univariate log-concave distributions by approximating their rejection regions with unions of intervals.
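As a concrete instance of a trade-off function, the sketch below evaluates the optimal type II error for testing $N(0,1)$ against $N(\mu,1)$ with $\mu \ge 0$. This Gaussian shift pair is an illustrative choice, not an example taken from the abstract, and the function name `gaussian_tradeoff` is hypothetical. Here the Neyman--Pearson rejection regions are half-lines, a class of sets with finite Vapnik--Chervonenkis dimension.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def gaussian_tradeoff(alpha: float, mu: float) -> float:
    """Smallest type II error of a level-alpha test of N(0,1) vs N(mu,1), mu >= 0.

    Illustrative assumption: both distributions are known Gaussians, so the
    trade-off function has a closed form.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if mu < 0.0:
        raise ValueError("this sketch assumes mu >= 0")
    # The Neyman-Pearson test rejects when x exceeds the (1 - alpha) quantile
    # of N(0,1); its type II error is the N(mu,1) mass below that threshold.
    threshold = _STD_NORMAL.inv_cdf(1.0 - alpha)
    return _STD_NORMAL.cdf(threshold - mu)


if __name__ == "__main__":
    for alpha in (0.01, 0.05, 0.10):
        print(alpha, gaussian_tradeoff(alpha, mu=1.0))
```

With `mu = 0` the two distributions coincide and the function returns `1 - alpha`, the trivial frontier; larger `mu` pushes the curve down.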

preprint, 2025, arXiv

The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing

We propose $\textsf{ScaledGD($λ$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparameterized factor representations, $\textsf{ScaledGD($λ$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning to combat bad curvatures induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by preconditioners, $\textsf{ScaledGD($λ$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$) even with overparameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($λ$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with respect to the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$ which suffers from a polynomial dependency on the condition number. Our work provides evidence on the power of preconditioning in accelerating the convergence without hurting generalization in overparameterized learning.
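One form of damped preconditioned update consistent with this description can be written as follows. The notation is an assumption rather than something stated in the abstract: $X_t$ is the overparameterized factor, $f$ is the least-squares sensing loss on $X X^\top$, $\eta > 0$ is the step size, and $\lambda > 0$ is the damping parameter.

$$
X_{t+1} = X_t - \eta \, \nabla f(X_t) \, \bigl( X_t^\top X_t + \lambda I \bigr)^{-1}
$$

Setting the preconditioner to the identity recovers vanilla gradient descent. The damping term $\lambda I$ keeps the inverse well defined when $X_t^\top X_t$ is nearly singular, which is the typical situation when the factor has more columns than the true rank.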

preprint, 2022, arXiv

A new similarity measure for covariate shift with applications to nonparametric regression

We study covariate shift in the context of nonparametric regression. We introduce a new measure of distribution mismatch between the source and target distributions that is based on the integrated ratio of probabilities of balls at a given radius. We use the scaling of this measure with respect to the radius to characterize the minimax rate of estimation over a family of Hölder continuous functions under covariate shift. In comparison to the recently proposed notion of transfer exponent, this measure leads to a sharper rate of convergence and is more fine-grained. We accompany our theory with concrete instances of covariate shift that illustrate this sharp difference.

preprint, 2022, arXiv

An FPGA Based energy correction method for one-to-one coupled PET detector: model and evaluation

A PET scanner based on silicon photomultipliers (SiPMs) has been widely used as an advanced nuclear medicine imaging technique that yields quantitative images of regional in vivo biology and biochemistry. The compact size of the SiPM allows direct one-to-one coupling between the scintillation crystal and the photosensor, yielding better timing and energy resolutions than the light sharing methods that have to be used in photomultiplier tube (PMT) PET systems. To decrease the volume of readout electronics, a front end multiplexer with position decoder is a common choice for the one-to-one system without a highly integrated application specific integrated circuit (ASIC). However, in this case we cannot measure the energy deposited in each crystal by an annihilation photon, so the inter-crystal scatter (ICS) events will lead to crystal mispositioning and then deteriorate the detector intrinsic resolution. Besides, considering the event rejection within the energy window resulting from the gain dispersion and nonlinear outputs of the SiPMs, an energy correction mechanism is needed. Yet, the lack of information on each crystal's energy will introduce a large energy correction error for the ICS events. To address this issue, an online energy correction mechanism implemented on a Kintex-7 Field Programmable Gate Array (FPGA) device is presented in this paper. Experiments in the laboratory were performed using an 8 x 8 array of segmented LYSO crystals coupled with an 8 x 8 SiPM (J-series, from ON Semiconductor) array under 22Na point source excitation. Test results indicate that the energy of both the non-ICS and ICS events can be precisely corrected and the energy resolution is better than 12%. We also applied this method to an actual clinical PET scanner under a 68Ge line source to verify its multi-channel reliability.

preprint, 2022, arXiv

BBA-net: A bi-branch attention network for crowd counting

In the field of crowd counting, the current mainstream CNN-based regression methods simply extract the density information of pedestrians without finding the position of each person. As a result, the network output often contains incorrect responses, which may lead to an erroneous estimate of the total count and hinder interpretation of the algorithm. To this end, we propose a Bi-Branch Attention Network (BBA-NET) for crowd counting, which has three innovation points. i) A two-branch architecture is used to estimate the density information and location information separately. ii) An attention mechanism is used to facilitate feature extraction, which can reduce false responses. iii) A new density map generation method combining geometric adaptation and Voronoi split is introduced. Our method can integrate the pedestrian's head and body information to enhance the feature expression ability of the density map. Extensive experiments performed on two public datasets show that our method achieves a lower crowd counting error compared to other state-of-the-art methods.

preprint, 2022, arXiv

Scaling and Scalability: Provable Nonconvex Low-Rank Tensor Estimation from Incomplete Measurements

Tensors, which provide a powerful and flexible model for representing multi-attribute data and multi-way interactions, play an indispensable role in modern data science across various fields in science and engineering. A fundamental task is to faithfully recover the tensor from highly incomplete measurements in a statistically and computationally efficient manner. Harnessing the low-rank structure of tensors in the Tucker decomposition, this paper develops a scaled gradient descent (ScaledGD) algorithm to directly recover the tensor factors with tailored spectral initializations, and shows that it provably converges at a linear rate independent of the condition number of the ground truth tensor for two canonical problems -- tensor completion and tensor regression -- as soon as the sample size is above the order of $n^{3/2}$ ignoring other parameter dependencies, where $n$ is the dimension of the tensor. This leads to an extremely scalable approach to low-rank tensor estimation compared with prior art, which suffers from at least one of the following drawbacks: extreme sensitivity to ill-conditioning, high per-iteration costs in terms of memory and computation, or poor sample complexity guarantees. To the best of our knowledge, ScaledGD is the first algorithm that achieves near-optimal statistical and computational complexities simultaneously for low-rank tensor completion with the Tucker decomposition. Our algorithm highlights the power of appropriate preconditioning in accelerating nonconvex statistical estimation, where the iteration-varying preconditioners promote desirable invariance properties of the trajectory with respect to the underlying symmetry in low-rank tensor factorization.

preprint, 2022, arXiv

The design of a time-interleaved analog-digital conversion modulator based on FPGA-TDC for PET application

A fully Field Programmable Gate Array (FPGA) based digitizer for high-resolution time and energy measurement is an attractive low-cost solution for the readout electronics in positron emission computed tomography (PET) detectors. In recent years, the FPGA based time-digital converter (FPGA-TDC) has been widely used for time measurement in commercial PET scanners. Yet, for the energy measurement, few studies have been reported on a fully FPGA based, large dynamic range and high resolution alternative to the commercial analog-digital converter (ADC). Our previous research presented a 25 Ms/s FPGA-TDC based free-running ADC (FPGA-ADC), and successfully employed it in the readout electronics for a PET detector. In this work-in-progress study, by means of the time-interleaved strategy, a 50 Ms/s FPGA-ADC is presented. With only two off-chip resistors, both the A/D conversion and energy measurement are achieved on a Xilinx Kintex-7 FPGA. Therefore, this method has great advantages in improving system integration. Initial performance tests are also presented, and we hope it opens the possibility of developing a new FPGA-only front-end digitizer for PET in the future.

preprint, 2021, arXiv

Bridging Convex and Nonconvex Optimization in Robust PCA: Noise, Outliers, and Missing Data

This paper delivers improved theoretical guarantees for the convex programming approach in low-rank matrix estimation, in the presence of (1) random noise, (2) gross sparse outliers, and (3) missing data. This problem, often dubbed as robust principal component analysis (robust PCA), finds applications in various domains. Despite the wide applicability of convex relaxation, the available statistical support (particularly the stability analysis vis-à-vis random noise) remains highly suboptimal, which we strengthen in this paper. When the unknown matrix is well-conditioned, incoherent, and of constant rank, we demonstrate that a principled convex program achieves near-optimal statistical accuracy, in terms of both the Euclidean loss and the $\ell_{\infty}$ loss. All of this happens even when nearly a constant fraction of observations are corrupted by outliers with arbitrary magnitudes. The key analysis idea lies in bridging the convex program in use and an auxiliary nonconvex optimization algorithm, and hence the title of this paper.

preprint, 2021, arXiv

Learning Mixtures of Low-Rank Models

We study the problem of learning mixtures of low-rank models, i.e. reconstructing multiple low-rank matrices from unlabelled linear measurements of each. This problem enriches two widely studied settings -- low-rank matrix sensing and mixed linear regression -- by bringing latent variables (i.e. unknown labels) and structural priors (i.e. low-rank structures) into consideration. To cope with the non-convexity issues arising from unlabelled heterogeneous data and low-complexity structure, we develop a three-stage meta-algorithm that is guaranteed to recover the unknown matrices with near-optimal sample and computational complexities under Gaussian designs. In addition, the proposed algorithm is provably stable against random noise. We complement the theoretical studies with empirical evidence that confirms the efficacy of our algorithm.

preprint, 2021, arXiv

Minimax Off-Policy Evaluation for Multi-Armed Bandits

We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards, and develop minimax rate-optimal procedures under three settings. First, when the behavior policy is known, we show that the Switch estimator, a method that alternates between the plug-in and importance sampling estimators, is minimax rate-optimal for all sample sizes. Second, when the behavior policy is unknown, we analyze performance in terms of the competitive ratio, thereby revealing a fundamental gap between the settings of known and unknown behavior policies. When the behavior policy is unknown, any estimator must have mean-squared error larger -- relative to the oracle estimator equipped with the knowledge of the behavior policy -- by a multiplicative factor proportional to the support size of the target policy. Moreover, we demonstrate that the plug-in approach achieves this worst-case competitive ratio up to a logarithmic factor. Third, we initiate the study of the partial knowledge setting in which it is assumed that the minimum probability taken by the behavior policy is known. We show that the plug-in estimator is optimal for relatively large values of the minimum probability, but is sub-optimal when the minimum probability is low. In order to remedy this gap, we propose a new estimator based on approximation by Chebyshev polynomials that provably achieves the optimal estimation error. Numerical experiments on both simulated and real data corroborate our theoretical findings.
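The two building blocks named above, the plug-in and importance sampling estimators, can be sketched as follows for logged bandit data. The function names and the dictionary representation of policies are illustrative assumptions. The switching rule of the Switch estimator is not given in the abstract, so it is deliberately left out.

```python
from collections import defaultdict


def plug_in_estimate(actions, rewards, target_policy):
    """Weight each arm's empirical mean reward by its target-policy probability.

    Simplifying assumption of this sketch: arms that were never pulled
    contribute zero to the estimate.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for arm, reward in zip(actions, rewards):
        totals[arm] += reward
        counts[arm] += 1
    return sum(
        prob * totals[arm] / counts[arm]
        for arm, prob in target_policy.items()
        if counts[arm] > 0
    )


def importance_sampling_estimate(actions, rewards, target_policy, behavior_policy):
    """Reweight each logged reward by target probability / behavior probability.

    Requires a known behavior policy with positive probability on logged arms.
    """
    n = len(actions)
    weighted = sum(
        target_policy.get(arm, 0.0) / behavior_policy[arm] * reward
        for arm, reward in zip(actions, rewards)
    )
    return weighted / n


if __name__ == "__main__":
    logged_actions = [0, 0, 1, 1]
    logged_rewards = [1.0, 0.0, 1.0, 1.0]
    behavior = {0: 0.5, 1: 0.5}
    target = {0: 0.25, 1: 0.75}
    print(plug_in_estimate(logged_actions, logged_rewards, target))
    print(importance_sampling_estimate(logged_actions, logged_rewards, target, behavior))
```

On this toy log the empirical arm frequencies match the behavior policy exactly, so the two estimates coincide; in general they differ, which is what makes the choice between them matter.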

preprint, 2019, arXiv

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

Recent years have seen a flurry of activities in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often require proper regularization (e.g. trimming, regularized cost, projection) in order to guarantee fast convergence. For vanilla procedures such as gradient descent, however, prior theory either recommends highly conservative learning rates to avoid overshooting, or completely lacks performance guarantees. This paper uncovers a striking phenomenon in nonconvex optimization: even in the absence of explicit regularization, gradient descent enforces proper regularization implicitly under various statistical models. In fact, gradient descent follows a trajectory staying within a basin that enjoys nice geometry, consisting of points incoherent with the sampling mechanism. This "implicit regularization" feature allows gradient descent to proceed in a far more aggressive fashion without overshooting, which in turn results in substantial computational savings. Focusing on three fundamental statistical estimation problems, i.e. phase retrieval, low-rank matrix completion, and blind deconvolution, we establish that gradient descent achieves near-optimal statistical and computational guarantees without explicit regularization. In particular, by marrying statistical modeling with generic optimization theory, we develop a general recipe for analyzing the trajectories of iterative algorithms via a leave-one-out perturbation argument. As a byproduct, for noisy matrix completion, we demonstrate that gradient descent achieves near-optimal error control --- measured entrywise and by the spectral norm --- which might be of independent interest.

preprint, 2016, arXiv

Application of Bayesian graphs to SN Ia data analysis and compression

Bayesian graphical models are an efficient tool for modelling complex data and deriving self-consistent expressions of the posterior distribution of model parameters. We apply Bayesian graphs to perform statistical analyses of Type Ia supernova (SN Ia) luminosity distance measurements from the joint light-curve analysis (JLA) data set. In contrast to the $χ^2$ approach used in previous studies, the Bayesian inference allows us to fully account for the standard-candle parameter dependence of the data covariance matrix. Comparing with $χ^2$ analysis results, we find a systematic offset of the marginal model parameter bounds. We demonstrate that the bias is statistically significant in the case of the SN Ia standardization parameters with a maximal 6 $σ$ shift of the SN light-curve colour correction. In addition, we find that the evidence for a host galaxy correction is now only 2.4 $σ$. Systematic offsets on the cosmological parameters remain small, but may increase by combining constraints from complementary cosmological probes. The bias of the $χ^2$ analysis is due to neglecting the parameter-dependent log-determinant of the data covariance, which gives more statistical weight to larger values of the standardization parameters. We find a similar effect on compressed distance modulus data. To this end, we implement a fully consistent compression method of the JLA data set that uses a Gaussian approximation of the posterior distribution for fast generation of compressed data. Overall, the results of our analysis emphasize the need for a fully consistent Bayesian statistical approach in the analysis of future large SN Ia data sets.
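The role of the log-determinant can be seen from the generic Gaussian likelihood with a parameter-dependent covariance. The symbols here are assumptions made for illustration: $\theta$ collects the model parameters, $r(\theta)$ is the residual vector between data and model, and $C(\theta)$ is the data covariance.

$$
-2 \ln \mathcal{L}(\theta) = r(\theta)^\top C(\theta)^{-1} r(\theta) + \ln \det C(\theta) + \text{const}
$$

The first term is the usual $\chi^2$. If $C(\theta)$ grows with some parameters, minimizing $\chi^2$ alone rewards larger values of those parameters, because an inflated covariance shrinks the quadratic form. The $\ln \det C(\theta)$ term penalizes that inflation, so dropping it shifts the inferred parameters.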

preprint, 2015, arXiv

Panther: Fast Top-k Similarity Search in Large Networks

Estimating similarity between vertices is a fundamental issue in network analysis across various domains, such as social networks and biological networks. Methods based on common neighbors and structural contexts have received much attention. However, both categories of methods are difficult to scale up to handle large networks (with billions of nodes). In this paper, we propose a sampling method that provably and accurately estimates the similarity between vertices. The algorithm is based on a novel idea of random path, and an extended method is also presented, to enhance the structural similarity when two vertices are completely disconnected. We provide theoretical proofs for the error-bound and confidence of the proposed algorithm. We perform an extensive empirical study and show that our algorithm can obtain top-k similar vertices for any vertex in a network approximately 300x faster than state-of-the-art methods. We also use identity resolution and structural hole spanner finding, two important applications in social networks, to evaluate the accuracy of the estimated similarities. Our experimental results demonstrate that the proposed algorithm achieves clearly better performance than several alternative methods.
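A minimal sketch of similarity estimation by random path sampling is given below. It is an illustration of the general idea, not the paper's algorithm: it assumes an unweighted graph stored as an adjacency dictionary with integer vertex labels, uniform start vertices, and uniform neighbor choices, and it scores a pair of vertices by the fraction of sampled paths that contain both. The function names are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def random_path_similarity(adj, num_paths, path_len, seed=0):
    """Estimate pairwise similarity as the fraction of sampled paths containing both vertices."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    co_occurrence = defaultdict(int)
    for _ in range(num_paths):
        current = rng.choice(nodes)
        visited = {current}
        for _ in range(path_len):
            neighbors = adj[current]
            if not neighbors:
                break  # dead end: stop this path early
            current = rng.choice(neighbors)
            visited.add(current)
        # Count every unordered pair of distinct vertices on this path once.
        for u, v in combinations(sorted(visited), 2):
            co_occurrence[(u, v)] += 1
    return {pair: count / num_paths for pair, count in co_occurrence.items()}


def top_k_similar(similarity, vertex, k):
    """Return the k vertices with the highest estimated similarity to `vertex`."""
    scores = []
    for (u, v), score in similarity.items():
        if u == vertex:
            scores.append((v, score))
        elif v == vertex:
            scores.append((u, score))
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:k]


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    sim = random_path_similarity(graph, num_paths=2000, path_len=3)
    print(top_k_similar(sim, 0, 2))
```

Under this scoring, vertices in different connected components never share a path and so get similarity zero, which is the situation the abstract's extended method is meant to address.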

preprint, 2015, arXiv

Testing the Copernican Principle with Hubble Parameter

Using the longitudinal expression of Hubble expansion rate for the general Lemaître-Tolman-Bondi (LTB) metric as a function of cosmic time, we examine the scale on which the Copernican Principle holds in the context of a void model. By way of performing parameter estimation on the CGBH void model, we show that the Hubble parameter data favors a void with characteristic radius of 2 ~ 3 Gpc. This brings the void model closer, but not yet enough, to harmony with observational indications given by the background kinetic Sunyaev-Zel'dovich effect and the normalization of near-infrared galaxy luminosity function. However, the test of such void models may ultimately lie in the future detection of the discrepancy between longitudinal and transverse expansion rates, a touchstone of inhomogeneous models. With the proliferation of observational Hubble parameter data and future large-scale structure observation, a definitive test could be performed on the question of cosmic homogeneity. Particularly, the spherical LTB void models have been ruled out, but more general non-spherical inhomogeneities still need to be tested by observation. In this paper, we utilise a spherical void model to provide guidelines into how observational tests may be done with more general models in the future.

preprint, 2015, arXiv

The Analog Front-end Prototype Electronics Designed for LHAASO WCDA

In the readout electronics of the Water Cerenkov Detector Array (WCDA) in the Large High Altitude Air Shower Observatory (LHAASO) experiment, both high-resolution charge and time measurement are required over a dynamic range from 1 photoelectron (P.E.) to 4000 P.E. The Analog Front-end (AFE) circuit is one of the crucial parts in the whole readout electronics. We designed and optimized a prototype of the AFE through parameter calculation and circuit simulation, and conducted initial electronics tests on this prototype to evaluate its performance. Test results indicate that the charge resolution is better than 1% @ 4000 P.E. and remains better than 10% @ 1 P.E., and the time resolution is better than 0.5 ns RMS, which exceeds the application requirement.

preprint, 2013, arXiv

Cosmological constraints on holographic dark energy models under the energy conditions

We study the holographic and agegraphic dark energy models without interaction using the latest observational Hubble parameter data (OHD), the Union2.1 compilation of type Ia supernovae (SNIa), and the energy conditions. Scenarios of dark energy are distinguished by the cut-off of cosmic age, conformal time, and event horizon. Under the joint constraint, the best-fit value of the matter density for the three scenarios stays close to $Ω_{m0}=0.26$. The agegraphic models reduce to the standard cosmological model when the constant $c$, which represents the fraction of dark energy, approaches infinity. The absence of an upper limit on $c$ under the joint constraint shows that this recovery is possible. Using the fitted result, we also reconstruct the current equation of state of dark energy for each scenario. Employing the model criterion $χ^2_{\textrm{min}}/dof$, we find that the conformal time model is the worst, but the models cannot be clearly distinguished. Comparing with the observational constraints, we find that SEC is fulfilled at redshift $0.2 \lesssim z \lesssim 0.3$ with $1σ$ confidence level. We also find that NEC gives a meaningful constraint for the event horizon cut-off model, especially compared with OHD only. We note that the energy conditions could play an important role in the interacting models because of the different degeneracy between $Ω_m$ and the constant $c$.

preprint, 2012, arXiv

Reconstructing the History of Energy Condition Violation from Observational Data

We study the likelihood of energy condition violations in the history of the Universe. Our method is based on a set of functions that characterize energy condition violation. FLRW cosmological models are built around these "indication functions". By computing the Fisher matrix of model parameters using type Ia supernova and Hubble parameter data, we extract the principal modes of these functions' redshift evolution. These modes allow us to obtain general reconstructions of energy condition violation history independent of the dark energy model. We find that the data suggest a history of strong energy condition violation, but the null and dominant energy conditions are likely to be fulfilled. Implications for dark energy models are discussed.

preprint, 2011, arXiv

Constraints on the Dark Side of the Universe and Observational Hubble Parameter Data

This paper is a review on the observational Hubble parameter data that have gained increasing attention in recent years for their illuminating power on the dark side of the universe --- the dark matter, dark energy, and the dark age. Currently, there are two major methods of independent observational H(z) measurement, which we summarize as the "differential age method" and the "radial BAO size method". Starting with fundamental cosmological notions such as the spacetime coordinates in an expanding universe, we present the basic principles behind the two methods. We further review the two methods in greater detail, including the source of errors. We show how the observational H(z) data presents itself as a useful tool in the study of cosmological models and parameter constraint, and we also discuss several issues associated with their applications. Finally, we point the reader to a future prospect of upcoming observation programs that will lead to some major improvements in the quality of observational H(z) data.
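For orientation, the differential age method rests on a standard identity that follows from the scale factor relation $a = 1/(1+z)$ and the definition $H = \dot{a}/a$:

$$
H(z) = -\frac{1}{1+z} \, \frac{\mathrm{d}z}{\mathrm{d}t} \approx -\frac{1}{1+z} \, \frac{\Delta z}{\Delta t}
$$

In practice the derivative is approximated by the finite difference on the right, where $\Delta t$ is the age difference between two populations of passively evolving objects separated by a small redshift interval $\Delta z$. The choice of tracer population is outside what the abstract specifies.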

preprint, 2011, arXiv

Power of Observational Hubble Parameter Data: a Figure of Merit Exploration

We use simulated Hubble parameter data in the redshift range $0 \leq z \leq 2$ to explore the role and power of observational H(z) data in constraining cosmological parameters of the ΛCDM model. The error model of the simulated data is empirically constructed from available measurements and scales linearly as z increases. By comparing the median figures of merit calculated from simulated datasets with that of current type Ia supernova data, we find that as many as 64 further independent measurements of H(z) are needed to match the parameter constraining power of SNIa. If the error of H(z) could be lowered to 3%, the same number of future measurements would be needed, but then the redshift coverage would only be required to reach z = 1. We also show that accurate measurements of the Hubble constant $H_0$ can be used as priors to increase the H(z) data's figure of merit.

preprint, 2010, arXiv

Numerical Strategies of Computing the Luminosity Distance

We propose two efficient numerical methods of evaluating the luminosity distance in the spatially flat ΛCDM universe. The first method is based on the Carlson symmetric form of elliptic integrals, which is highly accurate and can replace numerical quadratures. The second method, using a modified version of Hermite interpolation, is less accurate but involves only basic numerical operations and can be easily implemented. We compare our methods with other numerical approximation schemes and explore their respective features and limitations. Possible extensions of these methods to other cosmological models are also discussed.
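For reference, the quantity in question can be computed by plain numerical quadrature, the baseline that the proposed methods aim to replace. The sketch below uses composite Simpson integration of $1/E(z)$ with $E(z) = \sqrt{\Omega_m (1+z)^3 + 1 - \Omega_m}$ for a spatially flat ΛCDM universe, neglecting radiation. The function name, default parameter values, and number of subintervals are illustrative assumptions; this is neither of the paper's two methods.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def luminosity_distance(z, h0=70.0, omega_m=0.3, n=200):
    """Luminosity distance in Mpc for flat LCDM via composite Simpson quadrature.

    h0 is the Hubble constant in km/s/Mpc and n is the number of subintervals.
    """
    if z < 0.0:
        raise ValueError("redshift must be non-negative")
    if n < 2:
        raise ValueError("need at least two subintervals")
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals

    def inv_e(x):
        # 1 / E(x), the inverse dimensionless expansion rate
        return 1.0 / math.sqrt(omega_m * (1.0 + x) ** 3 + (1.0 - omega_m))

    step = z / n
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * inv_e(i * step)
    comoving = (C_KM_S / h0) * total * step / 3.0
    return (1.0 + z) * comoving


if __name__ == "__main__":
    for redshift in (0.1, 0.5, 1.0, 2.0):
        print(redshift, luminosity_distance(redshift))
```

A convenient sanity check is the matter-only limit $\Omega_m = 1$, where the integral has the closed form $d_L = (1+z)\,(2c/H_0)\,\bigl(1 - 1/\sqrt{1+z}\bigr)$.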
