Researcher profile

Chong Peng

Chong Peng contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

Bad Seeing or Bad Thinking? Rewarding Perception for Vision-Language Reasoning

Achieving robust perception-reasoning synergy is a central goal for advanced Vision-Language Models (VLMs). Recent advancements have pursued this goal via architectural designs or agentic workflows. However, these approaches are often limited by static textual reasoning or complicated by the significant compute and engineering burden of external agentic complexity. Worse, this heavy investment does not yield proportional gains, often witnessing a "seesaw effect" on perception and reasoning. This motivates a fundamental rethinking of the true bottleneck. In this paper, we argue that the root cause of this trade-off is an ambiguity in modality credit assignment: when a VLM fails, is it due to flawed perception ("bad seeing") or flawed logic ("bad thinking")? To resolve this, we introduce a reinforcement learning framework that improves perception-reasoning synergy by reliably rewarding the perception fidelity. We explicitly decompose the generation process into interleaved perception and reasoning steps. This decoupling enables targeted supervision on perception. Crucially, we introduce Perception Verification (PV), leveraging a "blindfolded reasoning" proxy to reward perceptual fidelity independently of reasoning outcomes. Furthermore, to scale training across free-form VL tasks, we propose Structured Verbal Verification, which replaces high-variance LLM judging with structured algorithmic execution. These techniques are integrated into a Modality-Aware Credit Assignment (MoCA) mechanism, which routes rewards to the specific source of error -- either bad seeing or bad thinking -- enabling a single VLM to achieve simultaneous performance gains across a wide task spectrum.

preprint2022arXiv

A Novel Long-term Iterative Mining Scheme for Video Salient Object Detection

The existing state-of-the-art (SOTA) video salient object detection (VSOD) models have widely followed short-term methodology, which dynamically determines the balance between spatial and temporal saliency fusion by solely considering the current consecutive limited frames. However, the short-term methodology has one critical limitation, which conflicts with the real mechanism of our visual system -- a typical long-term methodology. As a result, failure cases keep showing up in the results of the current SOTA models, and the short-term methodology becomes the major technical bottleneck. To solve this problem, this paper proposes a novel VSOD approach, which performs VSOD in a complete long-term way. Our approach converts the sequential VSOD, a sequential task, to a data mining problem, i.e., decomposing the input video sequence to object proposals in advance and then mining salient object proposals as much as possible in an easy-to-hard way. Since all object proposals are simultaneously available, the proposed approach is a complete long-term approach, which can alleviate some difficulties rooted in conventional short-term approaches. In addition, we devised an online updating scheme that can grasp the most representative and trustworthy pattern profile of the salient objects, outputting framewise saliency maps with rich details and smoothing both spatially and temporally. The proposed approach outperforms almost all SOTA models on five widely used benchmark datasets.

preprint2022arXiv

Alcohol Intake Differentiates AD and LATE: A Telltale Lifestyle from Two Large-Scale Datasets

Alzheimer's disease (AD), as a progressive brain disease, affects cognition, memory, and behavior. Similarly, limbic-predominant age-related TDP-43 encephalopathy (LATE) is a recently defined common neurodegenerative disease that mimics the clinical symptoms of AD. At present, the risk factors implicated in LATE and those distinguishing LATE from AD are largely unknown. We leveraged an integrated feature selection-based algorithmic approach, to identify important factors differentiating subjects with LATE and/or AD from Control on significantly imbalanced data. We analyzed two datasets ROSMAP and NACC and discovered that alcohol consumption was a top lifestyle and environmental factor linked with LATE and AD and their associations were differential. In particular, we identified a specific subpopulation consisting of APOE e4 carriers. We found that, for this subpopulation, light-to-moderate alcohol intake was a protective factor against both AD and LATE, but its protective role against AD appeared stronger than LATE. The codes for our algorithms are available at https://github.com/xinxingwu-uk/PFV.

preprint2022arXiv

Log-based Sparse Nonnegative Matrix Factorization for Data Representation

Nonnegative matrix factorization (NMF) has been widely studied in recent years due to its effectiveness in representing nonnegative data with parts-based representations. For NMF, a sparser solution implies better parts-based representation.However, current NMF methods do not always generate sparse solutions.In this paper, we propose a new NMF method with log-norm imposed on the factor matrices to enhance the sparseness.Moreover, we propose a novel column-wisely sparse norm, named $\ell_{2,\log}$-(pseudo) norm to enhance the robustness of the proposed method.The $\ell_{2,\log}$-(pseudo) norm is invariant, continuous, and differentiable.For the $\ell_{2,\log}$ regularized shrinkage problem, we derive a closed-form solution, which can be used for other general problems.Efficient multiplicative updating rules are developed for the optimization, which theoretically guarantees the convergence of the objective value sequence.Extensive experimental results confirm the effectiveness of the proposed method, as well as the enhanced sparseness and robustness.

preprint2020arXiv

A Novel Video Salient Object Detection Method via Semi-supervised Motion Quality Perception

Previous video salient object detection (VSOD) approaches have mainly focused on designing fancy networks to achieve their performance improvements. However, with the slow-down in development of deep learning techniques recently, it may become more and more difficult to anticipate another breakthrough via fancy networks solely. To this end, this paper proposes a universal learning scheme to get a further 3\% performance improvement for all state-of-the-art (SOTA) methods. The major highlight of our method is that we resort the "motion quality"---a brand new concept, to select a sub-group of video frames from the original testing set to construct a new training set. The selected frames in this new training set should all contain high-quality motions, in which the salient objects will have large probability to be successfully detected by the "target SOTA method"---the one we want to improve. Consequently, we can achieve a significant performance improvement by using this new training set to start a new round of network training. During this new round training, the VSOD results of the target SOTA method will be applied as the pseudo training objectives. Our novel learning scheme is simple yet effective, and its semi-supervised methodology may have large potential to inspire the VSOD community in the future.

preprint2020arXiv

Depth Quality Aware Salient Object Detection

The existing fusion based RGB-D salient object detection methods usually adopt the bi-stream structure to strike the fusion trade-off between RGB and depth (D). The D quality usually varies from scene to scene, while the SOTA bi-stream approaches are depth quality unaware, which easily result in substantial difficulties in achieving complementary fusion status between RGB and D, leading to poor fusion results in facing of low-quality D. Thus, this paper attempts to integrate a novel depth quality aware subnet into the classic bi-stream structure, aiming to assess the depth quality before conducting the selective RGB-D fusion. Compared with the SOTA bi-stream methods, the major highlight of our method is its ability to lessen the importance of those low-quality, no-contribution, or even negative-contribution D regions during the RGB-D fusion, achieving a much improved complementary status between RGB and D.

preprint2020arXiv

Full Reference Screen Content Image Quality Assessment by Fusing Multi-level Structure Similarity

The screen content images (SCIs) usually comprise various content types with sharp edges, in which the artifacts or distortions can be well sensed by the vanilla structure similarity measurement in a full reference manner. Nonetheless, almost all of the current SOTA structure similarity metrics are "locally" formulated in a single-level manner, while the true human visual system (HVS) follows the multi-level manner, and such mismatch could eventually prevent these metrics from achieving trustworthy quality assessment. To ameliorate, this paper advocates a novel solution to measure structure similarity "globally" from the perspective of sparse representation. To perform multi-level quality assessment in accordance with the real HVS, the above-mentioned global metric will be integrated with the conventional local ones by resorting to the newly devised selective deep fusion network. To validate its efficacy and effectiveness, we have compared our method with 12 SOTA methods over two widely-used large-scale public SCI datasets, and the quantitative results indicate that our method yields significantly higher consistency with subjective quality score than the currently leading works. Both the source code and data are also publicly available to gain widespread acceptance and facilitate new advancement and its validation.

preprint2020arXiv

Structured Graph Learning for Clustering and Semi-supervised Classification

Graphs have become increasingly popular in modeling structures and interactions in a wide variety of problems during the last decade. Graph-based clustering and semi-supervised classification techniques have shown impressive performance. This paper proposes a graph learning framework to preserve both the local and global structure of data. Specifically, our method uses the self-expressiveness of samples to capture the global structure and adaptive neighbor approach to respect the local structure. Furthermore, most existing graph-based methods conduct clustering and semi-supervised classification on the graph learned from the original data matrix, which doesn't have explicit cluster structure, thus they might not achieve the optimal performance. By considering rank constraint, the achieved graph will have exactly $c$ connected components if there are $c$ clusters or classes. As a byproduct of this, graph learning and label inference are jointly and iteratively implemented in a principled way. Theoretically, we show that our model is equivalent to a combination of kernel k-means and k-means methods under certain condition. Extensive experiments on clustering and semi-supervised classification demonstrate that the proposed method outperforms other state-of-the-art methods.

preprint2020arXiv

Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering

In this paper, we propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF. It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step. In particular, projection matrices are sought under the guidance of building new data representations, such that the spatial information is retained and projections are enhanced by the goal of clustering, which helps construct optimal projection directions. Moreover, to exploit nonlinear structures of the data, manifold is constructed in the projected subspace, which is adaptively updated according to the projections and less afflicted with noise and outliers of the data and thus more representative in the projected space. Hence, seeking projections, building new data representations, and learning manifold are seamlessly integrated in a single model, which mutually enhance other and lead to a powerful data representation. Comprehensive experimental results verify the effectiveness of TS-NMF in comparison with several state-of-the-art algorithms, which suggests high potential of the proposed method for real world applications.

preprint2019arXiv

Discriminative Ridge Machine: A Classifier for High-Dimensional Data or Imbalanced Data

We introduce a discriminative regression approach to supervised classification in this paper. It estimates a representation model while accounting for discriminativeness between classes, thereby enabling accurate derivation of categorical information. This new type of regression models extends existing models such as ridge, lasso, and group lasso through explicitly incorporating discriminative information. As a special case we focus on a quadratic model that admits a closed-form analytical solution. The corresponding classifier is called discriminative regression machine (DRM). Three iterative algorithms are further established for the DRM to enhance the efficiency and scalability for real applications. Our approach and the algorithms are applicable to general types of data including images, high-dimensional data, and imbalanced data. We compare the DRM with currently state-of-the-art classifiers. Our extensive experimental results show superior performance of the DRM and confirm the effectiveness of the proposed approach.

preprint2018arXiv

Exploration of Reduced Scaling Formulation of Equation of Motion Coupled-Cluster Singles and Doubles Based on State-Averaged Pair Natural Orbitals

A reduced-complexity variant of equation-of-motion coupled-cluster singles and doubles (EOM-CCSD) method is formulated in terms of state-averaged excited state pair natural orbitals (PNO) designed to describe manifolds of excited states. State-averaged excited state PNOs for the {\em target} manifold are determined by averaging CIS(D) pair densities over the computational manifold. To assess the performance of PNO-EOM-CCSD approach on extended systems the new massively parallel canonical EOM-CCSD program has been developed in the Massively Parallel Quantum Chemistry program that allows treatment of systems with 50+ atoms using realistic basis sets with 1000+ functions. The use of state-averaged PNOs offers several potential advantages relative to the recently proposed state-specific PNOs: our approach is robust with respect to root flipping and state degeneracies, it is more economical when computing large manifolds of states, and it simplifies evaluation of transition-specific observables such as dipole moments. With the PNO truncation threshold of $10^{-7}$, the errors in excitation energies are on average below 0.02 eV for the first six singlet states of 28 organic molecules included in the standard test set of Thiel and co-workers (J. Chem. Phys. 2008, 128, 134110) with 50-70 state-averaged PNOs per pair.