Source author record

Haopeng Zhang

Haopeng Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Computation and Language, Machine Learning, Neural and Evolutionary Computing, astro-ph.GA, math.DS, math.OC

Catalog footprint

What is connected

9 works

7 topics

4 close collaborators


preprint · 2026 · arXiv

HAVEN: Hierarchically Aligned Multimodal Benchmark for Unified Video Understanding

While Multimodal Large Language Models (MLLMs) exhibit strong performance on standard video tasks, their ability to faithfully summarize and reason over complex narratives remains poorly evaluated. Existing summarization benchmarks fragment supervision across isolated granularities, such as keyframes, key shots, or disjointed text summaries, failing to capture the inherently hierarchical structure of cross-modal alignment. To address this critical gap, we introduce HAVEN, a hierarchically aligned multimodal benchmark for unified video understanding. HAVEN pioneers a fully granular (frame, shot, and video levels) and fully multimodal (video and text) dataset architecture, complete with explicit, continuous alignment between modalities. Built upon this unified annotation paradigm, we propose a comprehensive evaluation suite spanning summarization, temporal reasoning, multimodal grounding, and saliency ranking. Extensive benchmarking of state-of-the-art MLLMs exposes a persistent gap between surface-level textual fluency and grounded multimodal understanding. Ultimately, HAVEN advances the evaluation of multimodal systems beyond traditional QA formats, offering a rigorous, standardized testbed to drive future research in interpretable, hierarchical video understanding. We publicly release the dataset, benchmark suite, and evaluation protocols.

preprint · 2026 · arXiv

MMViR: A Multi-Modal and Multi-Granularity Representation for Long-range Video Understanding

Long videos, ranging from minutes to hours, present significant challenges for current Multi-modal Large Language Models (MLLMs) due to their complex events, diverse scenes, and long-range dependencies. Direct encoding of such videos is computationally too expensive, while simple video-to-text conversion often results in redundant or fragmented content. To address these limitations, we introduce MMViR, a novel multi-modal, multi-grained structured representation for long video understanding. MMViR identifies key turning points to segment the video and constructs a three-level description that couples global narratives with fine-grained visual details. This design supports efficient query-based retrieval and generalizes well across various scenarios. Extensive evaluations across three tasks, including QA, summarization, and retrieval, show that MMViR outperforms the prior strongest method, achieving a 19.67% improvement in hour-long video understanding while reducing processing latency to 45.4% of the original.

preprint · 2026 · arXiv

S2Aligner: Pair-Efficient and Transferable Pre-Training for Sparse Text-Attributed Graphs

Pre-training on text-attributed graphs (TAGs) is central to building transferable graph foundation models, where LLM-as-Aligner methods align graph and text representations through the semantic knowledge of large language models. However, these methods usually assume that node texts provide sufficient and reliable supervision, an assumption often violated in real-world sparse TAGs. When textual anchors are missing, noisy, or uneven across domains, graph structures must be aligned with weak semantic evidence, leading to unreliable structure-semantics correspondence and sparsity-induced transfer bias. This paper presents S2Aligner, a sparsity-aware and structure-enhanced LLM-as-Aligner framework for graph-text pre-training on sparse TAGs. The key idea is to decouple semantic alignment from structural modeling, allowing topology-aware signals to enhance alignment without contaminating the shared semantic space. Specifically, S2Aligner decomposes graph-text representations into semantic and structural components, uses structure-oriented reconstruction with consistency control to inject reliable topology cues into text representations, and suppresses inconsistent structural signals under textual sparsity. Moreover, S2Aligner introduces sparsity-aware cross-domain risk balancing, which calibrates domain risks through a global-domain density ratio and downweights unreliable sparse samples via graph reliability estimation. Theoretical analysis shows that this objective reduces cross-domain generalization gaps by controlling domain risk discrepancy. Extensive experiments across diverse graph domains, sparsity levels, and downstream tasks demonstrate that S2Aligner consistently outperforms existing baselines.

preprint · 2022 · arXiv

Evidence for Co-rotation Origin of Super Metal Rich Stars in LAMOST-Gaia: Multiple Ridges with a Similar Slope in phi versus Lz Plane

Super metal-rich (SMR) stars in the solar neighborhood are thought to have been born in the inner disk and to have reached their present location by radial migration, which is most intense at the co-rotation resonance (CR) of the Galactic bar. In this work, we show evidence for the CR origin of SMR stars in LAMOST-Gaia by detecting six ridges and undulations in the phi versus Lz space coded by median VR, following a similar slope of -8 km/s kpc/deg. The slope is predicted by the model of Monari et al. for the CR of a large and slow Galactic bar. For the first time, we show the variation of angular momentum with azimuth from -10 deg to 20 deg for two outer and broad undulations with negative VR around -18 km/s following this slope. The wave-like pattern with large amplitude outside CR and a wide peak of the second undulations indicate that a minor merger of the Sagittarius dwarf galaxy with the disk might play a role besides the significant impact of the CR of the Galactic bar.

preprint · 2022 · arXiv

Improving the Faithfulness of Abstractive Summarization via Entity Coverage Control

Abstractive summarization systems leveraging pre-trained language models have achieved superior results on benchmark datasets. However, such models have been shown to be more prone to hallucinating facts that are unfaithful to the input context. In this paper, we propose a method to remedy entity-level extrinsic hallucinations with Entity Coverage Control (ECC). We first compute entity coverage precision and prepend the corresponding control code for each training example, which implicitly guides the model to recognize faithful content in the training phase. We further extend our method via intermediate fine-tuning on large but noisy data extracted from Wikipedia to unlock zero-shot summarization. Experimental results on three benchmark datasets of very different domains and styles (XSum, Pubmed, and SAMSum) show that the proposed method leads to more faithful and salient abstractive summarization in both supervised fine-tuning and zero-shot settings.
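The control-code step described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that entity coverage precision means the fraction of summary entities that also appear in the source text, that entities have already been extracted by some external tool, and that matching is a plain case-insensitive substring test. The token format `<cov_k>`, the equal-width bins, and the function names are all hypothetical.

```python
from typing import Iterable


def entity_coverage_precision(summary_entities: Iterable[str], source_text: str) -> float:
    """Fraction of summary entities that also occur in the source text."""
    entities = {e.strip().lower() for e in summary_entities if e.strip()}
    if not entities:
        # Assumption: a summary without entities cannot contain a hallucinated one.
        return 1.0
    source = source_text.lower()
    covered = sum(1 for e in entities if e in source)  # substring match stands in for NER alignment
    return covered / len(entities)


def control_code(precision: float, num_bins: int = 3) -> str:
    """Map a precision in [0, 1] to a hypothetical control token such as '<cov_2>'."""
    bin_index = min(int(precision * num_bins), num_bins - 1)
    return f"<cov_{bin_index}>"


def tag_example(source_text: str, summary_entities: Iterable[str], num_bins: int = 3) -> str:
    """Prepend the control code to the model input (assumed to be the source text)."""
    precision = entity_coverage_precision(summary_entities, source_text)
    return f"{control_code(precision, num_bins)} {source_text}"
```

Under these assumptions, a training pipeline would call `tag_example` once per example, and at inference time the highest bin would be prepended to request a summary whose entities are covered by the source.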

preprint · 2020 · arXiv

Graph-Bert: Only Attention is Needed for Learning Graph Representations

The dominant graph neural networks (GNNs) over-rely on graph links, which has already led to several serious performance problems, e.g., the suspended animation problem and the over-smoothing problem. What's more, the inherently inter-connected nature of graphs precludes parallelization within the graph, which becomes critical for large graphs, as memory constraints limit batching across the nodes. In this paper, we introduce a new graph neural network, namely GRAPH-BERT (Graph based BERT), solely based on the attention mechanism without any graph convolution or aggregation operators. Instead of feeding GRAPH-BERT with the complete large input graph, we propose to train GRAPH-BERT with sampled linkless subgraphs within their local contexts. GRAPH-BERT can be learned effectively in a standalone mode. Meanwhile, a pre-trained GRAPH-BERT can also be transferred to other application tasks directly or with necessary fine-tuning if any supervised label information or certain application-oriented objective is available. We have tested the effectiveness of GRAPH-BERT on several graph benchmark datasets. Based on the GRAPH-BERT pre-trained with the node attribute reconstruction and structure recovery tasks, we further fine-tune GRAPH-BERT on node classification and graph clustering tasks specifically. The experimental results demonstrate that GRAPH-BERT can outperform the existing GNNs in both learning effectiveness and efficiency.

preprint · 2015 · arXiv

Factorization of View-Object Manifolds for Joint Object Recognition and Pose Estimation

Due to large variations in shape, appearance, and viewing conditions, object recognition is a key precursory challenge in the fields of object manipulation and robotic/AI visual reasoning in general. Recognizing object categories, particular instances of objects, and viewpoints/poses of objects are three critical subproblems robots must solve in order to accurately grasp/manipulate objects and reason about their environments. Multi-view images of the same object lie on intrinsic low-dimensional manifolds in descriptor spaces (e.g. visual/depth descriptor spaces). These object manifolds share the same topology despite being geometrically different. Each object manifold can be represented as a deformed version of a unified manifold. Each object manifold can thus be parameterized by its homeomorphic mapping/reconstruction from the unified manifold. In this work, we develop a novel framework to jointly solve the three challenging recognition sub-problems, by explicitly modeling the deformations of object manifolds and factorizing them in a view-invariant space for recognition. We perform extensive experiments on several challenging datasets and achieve state-of-the-art results.

preprint · 2014 · arXiv

Convergence Analysis and Parallel Computing Implementation for the Multiagent Coordination Optimization Algorithm

In this report, a novel variation of the Particle Swarm Optimization (PSO) algorithm, called Multiagent Coordination Optimization (MCO), is implemented in parallel for practical use by introducing the MATLAB built-in function "parfor" into MCO. We then rigorously analyze the global convergence of MCO by means of semistability theory. Besides sharing global optimal solutions as in the PSO algorithm, the MCO algorithm integrates the cooperative swarm behavior of multiple agents into the update formula by sharing velocity and position information between neighbors to improve its performance. Numerical evaluation of the parallel MCO algorithm is provided in the report by running the proposed algorithm on supercomputers in the High Performance Computing Center at Texas Tech University. In particular, the optimal value and computation time are compared with those of PSO and serial MCO on several benchmark functions from the literature. Based on the simulation results, the parallel MCO not only outperforms PSO on many nonlinear, nonconvex optimization problems, but is also highly efficient, saving computational time.
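The idea of adding neighbor-shared position and velocity terms to a PSO update can be sketched as follows. This is not the MCO update formula from the report, which the abstract does not state. The sketch assumes a ring neighborhood, a simple averaging coupling term, and arbitrary coefficient values. It runs serially, and all names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def sphere(x: Sequence[float]) -> float:
    """Simple benchmark objective with its minimum at the origin."""
    return sum(v * v for v in x)


def swarm_minimize(objective: Callable[[Sequence[float]], float], dim: int,
                   num_particles: int = 12, iterations: int = 100, seed: int = 0,
                   inertia: float = 0.7, c_personal: float = 1.4,
                   c_global: float = 1.4, c_neighbor: float = 0.2
                   ) -> Tuple[List[float], float, List[float]]:
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(num_particles)]
    vel = [[0.0] * dim for _ in range(num_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(num_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    history = [g_val]  # global best value after each iteration

    for _ in range(iterations):
        old_pos = [p[:] for p in pos]  # snapshot so all agents see the same neighbor state
        old_vel = [v[:] for v in vel]
        for i in range(num_particles):
            left, right = (i - 1) % num_particles, (i + 1) % num_particles
            for d in range(dim):
                # Assumed coupling: pull toward the mean position and velocity of ring neighbors.
                nbr_pos = (old_pos[left][d] + old_pos[right][d]) / 2.0
                nbr_vel = (old_vel[left][d] + old_vel[right][d]) / 2.0
                vel[i][d] = (inertia * old_vel[i][d]
                             + c_personal * rng.random() * (best_pos[i][d] - old_pos[i][d])
                             + c_global * rng.random() * (g_pos[d] - old_pos[i][d])
                             + c_neighbor * ((nbr_pos - old_pos[i][d]) + (nbr_vel - old_vel[i][d])))
                pos[i][d] = old_pos[i][d] + vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
            if val < g_val:
                g_val, g_pos = val, pos[i][:]
        history.append(g_val)
    return g_pos, g_val, history
```

A call such as `swarm_minimize(sphere, dim=3)` returns the best position found, its objective value, and the per-iteration history. Because each agent's update reads only the snapshot taken at the start of the iteration, the position and velocity updates are independent across agents. Independence of that kind is what a parallel loop construct needs, although this sketch does not attempt to parallelize anything.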

preprint · 2014 · arXiv

Shape Primitive Histogram: A Novel Low-Level Face Representation for Face Recognition

We further exploit the representational power of the Haar wavelet and present a novel low-level face representation named Shape Primitives Histogram (SPH) for face recognition. Since human faces contain abundant shape features, we address the face representation issue from the perspective of shape feature extraction. In our approach, we divide faces into a number of tiny shape fragments and reduce these shape fragments to several uniform atomic shape patterns called Shape Primitives. A convolution with Haar wavelet templates is applied to each shape fragment to identify the shape primitive it belongs to. After that, we compute a histogram of shape primitives in each local spatial image patch to incorporate spatial information. Finally, each face is represented as a feature vector by concatenating all the local histograms of shape primitives. Four popular face databases, namely ORL, AR, Yale-B and LFW-a, are employed to evaluate SPH and experimentally study the choices of the parameters. Extensive experimental results demonstrate that the proposed approach outperforms the state of the art.
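The pipeline in this abstract (fragments, template responses, per-patch histograms, concatenation) can be sketched in plain Python. This is only an illustration under stated assumptions. The 2x2 fragment size, the three Haar-like templates, the extra "flat" primitive, the non-overlapping layout, and the function names are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical 2x2 Haar-like templates; the paper's actual templates are not given here.
TEMPLATES = {
    "horizontal_edge": ((1, 1), (-1, -1)),
    "vertical_edge": ((1, -1), (1, -1)),
    "diagonal": ((1, -1), (-1, 1)),
}
PRIMITIVES = ["flat"] + list(TEMPLATES)


def classify_fragment(frag: Sequence[Sequence[float]], threshold: float = 1e-6) -> str:
    """Assign a 2x2 fragment to the template with the strongest absolute response."""
    responses = {
        name: abs(sum(t[r][c] * frag[r][c] for r in range(2) for c in range(2)))
        for name, t in TEMPLATES.items()
    }
    name, strength = max(responses.items(), key=lambda kv: kv[1])
    return name if strength > threshold else "flat"


def sph_descriptor(image: Sequence[Sequence[float]], patch: int = 4) -> List[int]:
    """Concatenate per-patch histograms of shape primitives into one feature vector."""
    height, width = len(image), len(image[0])
    feature: List[int] = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            counts = Counter()
            # Non-overlapping 2x2 fragments inside the current patch.
            for r in range(top, top + patch - 1, 2):
                for c in range(left, left + patch - 1, 2):
                    frag = ((image[r][c], image[r][c + 1]),
                            (image[r + 1][c], image[r + 1][c + 1]))
                    counts[classify_fragment(frag)] += 1
            feature.extend(counts[p] for p in PRIMITIVES)
    return feature
```

For a 4x8 image whose left half alternates dark and bright columns and whose right half is uniform, the descriptor holds two histograms: one dominated by the vertical-edge primitive and one by the flat primitive. Keeping one histogram per patch, rather than one per image, is what preserves the spatial layout.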

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint

Fields this researcher appears in

Computer Vision, Computation and Language, Machine Learning, Neural and Evolutionary Computing, astro-ph.GA, math.DS, math.OC

Source provenance

Where this author record came from

arxiv · confidence 95%

external id: arxiv:2605.18579:author:2:haopeng-zhang

Imported May 20, 2026 · Synced May 21, 2026

arxiv · confidence 95%

external id: arxiv:2605.19223:author:2:haopeng-zhang

Imported May 20, 2026 · Synced May 20, 2026

2 works

Jiawei Zhang

Researcher

Jiawei Zhang contributes to research discovery and scholarly infrastructure.

Open to collaborate

1 work

Ahmed Elgammal

Researcher

Ahmed Elgammal contributes to research discovery and scholarly infrastructure.

Open to collaborate

1 work

Congying Xia

Researcher

Congying Xia contributes to research discovery and scholarly infrastructure.

Open to collaborate

1 work

Dan Yang

Researcher

Dan Yang contributes to research discovery and scholarly infrastructure.

Open to collaborate
