Source author record

Kewei Wang

Kewei Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Computation and Language, Machine Learning, physics.med-ph, Quantitative Methods

Catalog footprint

What is connected

4 works

5 topics

4 close collaborators

preprint, 2026, arXiv

DocScope: Benchmarking Verifiable Reasoning for Trustworthy Long-Document Understanding

Evaluating whether Multimodal Large Language Models can produce trustworthy, verifiable reasoning over long, visually rich documents requires evaluation beyond end-to-end answer accuracy. We introduce DocScope, a benchmark that formulates long-document QA as a structured reasoning trajectory prediction problem: given a complete PDF document and a question, the model outputs evidence pages, supporting evidence regions, relevant factual statements, and a final answer. We design a four-stage evaluation protocol (Page Localization, Region Grounding, Fact Extraction, and Answer Verification) that audits each level of the trajectory independently through inter-stage decoupling, with all judges selected and calibrated via human alignment studies. DocScope comprises 1,124 questions derived from 273 documents, with all hierarchical evidence annotations completed by human annotators. We benchmark 6 proprietary models, 12 open-weight models, and several domain-specific systems. Our experiments reveal that answer accuracy cannot substitute for trajectory-level evaluation: even among correct answers, the highest observed rate of complete evidence chains is only 29%. Across all models, region grounding remains the weakest trajectory stage. Furthermore, the primary difficulty stems from aggregating evidence dispersed across long distances and multiple document clusters, while an oracle study identifies faithful perception and fact extraction as the dominant capability bottleneck. Cross-architecture comparisons further suggest that activated parameter count matters more than total scale. The benchmark and code will be publicly released at https://github.com/MiliLab/DocScope.
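The idea of scoring each trajectory stage on its own, rather than only the final answer, can be illustrated with a minimal sketch. This is not the DocScope protocol: the abstract describes calibrated judges, whereas the function names, the dictionary layout, the IoU threshold of 0.5, and the exact-match checks below are all assumptions made for illustration.

```python
def page_f1(pred_pages, gold_pages):
    """F1 between predicted and annotated evidence pages."""
    pred, gold = set(pred_pages), set(gold_pages)
    hits = len(pred & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def box_iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def region_recall(pred_regions, gold_regions, iou_threshold=0.5):
    """Share of annotated regions matched by a predicted region on the same page."""
    if not gold_regions:
        return 1.0
    matched = 0
    for gold_page, gold_box in gold_regions:
        if any(page == gold_page and box_iou(box, gold_box) >= iou_threshold
               for page, box in pred_regions):
            matched += 1
    return matched / len(gold_regions)


def score_trajectory(pred, gold):
    """Score every stage against the annotation, independently of the other stages."""
    gold_facts = set(gold["facts"])
    fact_hits = set(pred["facts"]) & gold_facts
    scores = {
        "page_f1": page_f1(pred["pages"], gold["pages"]),
        "region_recall": region_recall(pred["regions"], gold["regions"]),
        # Exact string match is a hypothetical stand-in for a fact-level judge.
        "fact_recall": len(fact_hits) / len(gold_facts) if gold_facts else 1.0,
        "answer_correct": pred["answer"].strip().lower() == gold["answer"].strip().lower(),
    }
    # A chain counts as complete only if every stage is fully correct.
    scores["complete_chain"] = (
        scores["page_f1"] == 1.0
        and scores["region_recall"] == 1.0
        and scores["fact_recall"] == 1.0
        and scores["answer_correct"]
    )
    return scores


# Invented example: the answer is right, but an extra evidence page is cited.
pred = {"pages": [3, 4], "regions": [(3, (0, 0, 10, 10))],
        "facts": ["revenue was 5m"], "answer": "5M"}
gold = {"pages": [3], "regions": [(3, (0, 0, 10, 20))],
        "facts": ["revenue was 5m"], "answer": "5m"}
scores = score_trajectory(pred, gold)
```

In this invented example the answer is correct while the evidence chain is not complete, which is the kind of gap that answer accuracy alone would hide.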

preprint, 2022, arXiv

FAHP-based Mathematical Model for Exercise Rehabilitation Management of Diabetes Mellitus

Exercise rehabilitation is an important part of the comprehensive management of patients with diabetes, and it requires a comprehensive evaluation of several factors, such as physical fitness, cardiovascular risk and diabetic disease factors. However, the special disease features of diabetes and its wide heterogeneity make it difficult to apply individualized approaches. In this study, a novel framework was established based on the Fuzzy Analytic Hierarchy Process (FAHP) approach to calculate the weights of various physiological factors when developing a diabetic exercise prescription. The proposed factors were investigated in three groups, which contain 12 different aspects. The relative weights were assessed using a database that was established through a questionnaire survey. It is concluded that physical fitness factors and cardiovascular risk factors need more attention than disease factors in the formulation of exercise rehabilitation programs. The cardiopulmonary function component of the physical fitness factors accounts for the highest importance. Furthermore, it was found that blood lipids have the lowest importance among the studied factors. A mathematical model of the exercise rehabilitation program for diabetes patients was established, which provides a theoretical basis for individualized guidance of exercise rehabilitation programs.
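To make the weighting step concrete, here is a minimal sketch of one common FAHP variant: the fuzzy geometric mean method with triangular fuzzy numbers and centroid defuzzification. The abstract does not say which FAHP variant the authors used, so the method choice is an assumption, and the pairwise judgments below are invented for illustration rather than taken from the paper's questionnaire data.

```python
import math


def reciprocal(tfn):
    """Reciprocal of a triangular fuzzy number (l, m, u)."""
    l, m, u = tfn
    return (1.0 / u, 1.0 / m, 1.0 / l)


def fuzzy_ahp_weights(matrix):
    """Crisp weights from a pairwise matrix of triangular fuzzy numbers.

    matrix[i][j] is the fuzzy judgment (l, m, u) of criterion i over criterion j.
    """
    n = len(matrix)
    # Step 1: component-wise fuzzy geometric mean of each row.
    row_means = [
        tuple(math.prod(tfn[k] for tfn in row) ** (1.0 / n) for k in range(3))
        for row in matrix
    ]
    # Step 2: divide each row mean by the fuzzy sum; fuzzy division swaps l and u.
    total_l = sum(r[0] for r in row_means)
    total_m = sum(r[1] for r in row_means)
    total_u = sum(r[2] for r in row_means)
    fuzzy_weights = [(l / total_u, m / total_m, u / total_l) for l, m, u in row_means]
    # Step 3: defuzzify with the centroid, then normalize so the weights sum to 1.
    crisp = [(l + m + u) / 3.0 for l, m, u in fuzzy_weights]
    total = sum(crisp)
    return [c / total for c in crisp]


# Hypothetical judgments for the three factor groups named in the abstract.
ONE = (1.0, 1.0, 1.0)
fitness_vs_cardio = (1.0, 1.0, 2.0)
fitness_vs_disease = (2.0, 3.0, 4.0)
cardio_vs_disease = (1.0, 2.0, 3.0)

matrix = [
    [ONE, fitness_vs_cardio, fitness_vs_disease],                          # physical fitness
    [reciprocal(fitness_vs_cardio), ONE, cardio_vs_disease],               # cardiovascular risk
    [reciprocal(fitness_vs_disease), reciprocal(cardio_vs_disease), ONE],  # disease factors
]
weights = fuzzy_ahp_weights(matrix)
```

With these invented judgments, the physical fitness group receives the largest weight and the disease group the smallest. The ordering follows from the made-up inputs and is not a reproduction of the paper's result.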

preprint, 2022, arXiv

Robust Object Detection With Inaccurate Bounding Boxes

Learning accurate object detectors often requires large-scale training data with precise object bounding boxes. However, labeling such data is expensive and time-consuming. Because the crowd-sourced labeling process and the ambiguities of the objects can introduce noisy bounding box annotations, object detectors suffer from the degraded training data. In this work, we aim to address the challenge of learning robust object detectors with inaccurate bounding boxes. Inspired by the fact that localization precision suffers significantly from inaccurate bounding boxes while classification accuracy is less affected, we propose leveraging classification as a guidance signal for refining localization results. Specifically, by treating an object as a bag of instances, we introduce an Object-Aware Multiple Instance Learning approach (OA-MIL), featuring object-aware instance selection and object-aware instance extension. The former aims to select accurate instances for training, instead of directly using inaccurate box annotations. The latter focuses on generating high-quality instances for selection. Extensive experiments on synthetic noisy datasets (i.e., noisy PASCAL VOC and MS-COCO) and a real noisy wheat head dataset demonstrate the effectiveness of our OA-MIL. Code is available at https://github.com/cxliu0/OA-MIL.
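The bag-of-instances idea can be sketched in a few lines. This is a toy illustration and not the OA-MIL implementation: `build_bag`, `select_instance`, the jitter scheme, and the scoring function are hypothetical. In particular, the score here is overlap with a known reference box, used only as a stand-in for the classifier confidence that the abstract describes as the guidance signal.

```python
import random


def box_iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def build_bag(noisy_box, num_instances=8, jitter=0.2, seed=0):
    """Instance extension (toy version): jitter the noisy box to form a bag."""
    rng = random.Random(seed)
    x0, y0, x1, y1 = noisy_box
    w, h = x1 - x0, y1 - y0
    bag = [noisy_box]  # the annotated box itself stays in the bag
    for _ in range(num_instances):
        dx0, dy0, dx1, dy1 = (rng.uniform(-jitter, jitter) for _ in range(4))
        bag.append((x0 + dx0 * w, y0 + dy0 * h, x1 + dx1 * w, y1 + dy1 * h))
    return bag


def select_instance(bag, score_fn):
    """Instance selection (toy version): keep the highest-scoring instance."""
    return max(bag, key=score_fn)


# Invented example: a shifted annotation and a stand-in score function.
true_box = (10.0, 10.0, 50.0, 50.0)
noisy_box = (16.0, 14.0, 58.0, 55.0)


def score_fn(box):
    # Assumption: overlap with the reference box stands in for classifier confidence.
    return box_iou(box, true_box)


bag = build_bag(noisy_box)
best = select_instance(bag, score_fn)
```

Because the noisy annotation is itself a member of the bag, the selected instance can never score lower than the original annotation under the chosen score function.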

preprint, 2022, arXiv

SSR-HEF: Crowd Counting with Multi-Scale Semantic Refining and Hard Example Focusing

Crowd counting based on density maps is generally regarded as a regression task. Deep learning is used to learn the mapping between image content and crowd density distribution. Although great success has been achieved, some pedestrians far away from the camera are difficult to detect, and the number of hard examples is often larger. Existing methods with a simple Euclidean distance algorithm optimize the hard and easy examples indiscriminately, so that the densities of hard examples are usually incorrectly predicted to be lower or even zero, which results in large counting errors. To address this problem, we are the first to propose the Hard Example Focusing (HEF) algorithm for the regression task of crowd counting. The HEF algorithm makes our model rapidly focus on hard examples by attenuating the contribution of easy examples. Higher importance is then given to the hard examples with wrong estimations. Moreover, the scale variations in crowd scenes are large, and the scale annotations are labor-intensive and expensive. By proposing a multi-Scale Semantic Refining (SSR) strategy, lower layers of our model can break through the limitation of deep learning to capture semantic features of different scales, sufficiently handling the scale variation. We perform extensive experiments on six benchmark datasets to verify the proposed method. Results indicate the superiority of our proposed method over the state-of-the-art methods. Moreover, our designed model is smaller and faster.
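One way to picture "attenuating the contribution of easy examples" in a regression loss is a focal-style weighting of the squared error. The abstract does not give the HEF formula, so the weight `(1 - exp(-error)) ** gamma`, the `gamma` parameter, and the function names below are assumptions for illustration only.

```python
import math


def plain_l2_loss(pred, target):
    """Mean squared error: every example contributes in proportion to its error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def focused_l2_loss(pred, target, gamma=2.0):
    """Squared error with a weight that shrinks the contribution of easy examples.

    The weight is close to 0 for small errors and approaches 1 for large ones.
    """
    total = 0.0
    for p, t in zip(pred, target):
        error = abs(p - t)
        weight = (1.0 - math.exp(-error)) ** gamma
        total += weight * error ** 2
    return total / len(pred)


def easy_share(pred, target, loss_terms):
    """Fraction of the summed loss that comes from the first (easy) example."""
    terms = loss_terms(pred, target)
    return terms[0] / sum(terms)


def plain_terms(pred, target):
    return [(p - t) ** 2 for p, t in zip(pred, target)]


def focused_terms(pred, target, gamma=2.0):
    return [((1.0 - math.exp(-abs(p - t))) ** gamma) * (p - t) ** 2
            for p, t in zip(pred, target)]


# Invented density values: the first prediction is nearly right, the second is far off.
pred = [1.0, 0.0]
target = [1.1, 3.0]
plain = plain_l2_loss(pred, target)
focused = focused_l2_loss(pred, target)
```

Setting `gamma` to 0 makes every weight equal to 1 and recovers the plain squared error, which gives a quick sanity check for the weighting term.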
