Source author record

Jingyuan Zhao

Jingyuan Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Artificial Intelligence, astro-ph.SR, Computer Vision, Robotics

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators

Preprint, 2026, arXiv

A Multistage Extraction Pipeline for Long Scanned Financial Documents: An Empirical Study in Industrial KYC Workflows

Structured information extraction from long, multilingual scanned financial documents is a core requirement in industrial KYC and compliance workflows. These documents are typically non-machine-readable, noisy, and visually heterogeneous. They usually span dozens of pages while containing only sparse task-relevant information. Although recent vision-language models achieve strong benchmark performance, applying them directly end-to-end to full financial reports often leads to unreliable extraction under real-world conditions. We present a multistage extraction framework that integrates image preprocessing, multilingual OCR, hybrid page-level retrieval, and compact VLM-based structured extraction. The design separates page localization from multimodal reasoning, enabling more accurate extraction from complex multipage documents. We evaluated the framework on 120 production KYC documents comprising about 3,000 multilingual scanned pages. Across multiple OCR-VLM combinations, the proposed pipeline consistently outperforms direct PDF-to-VLM baselines, improving field-level accuracy by up to 31.9 percentage points. The best configuration, PaddleOCR with MiniCPM2.6, achieves 87.27 percent accuracy. Ablation studies show that page-level retrieval is the dominant factor in performance improvements, particularly for complex financial statements and non-English documents.
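The separation of page localization from extraction described in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the `alpha` weight, the sample pages, and the scoring itself, which mixes keyword overlap with a bag-of-words cosine as a stand-in for whatever lexical and dense retrievers the paper actually combines. It is not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; real OCR output would need language-aware handling."""
    return re.findall(r"\w+", text.lower())


def keyword_score(query_tokens, page_tokens):
    """Fraction of distinct query terms that appear on the page."""
    terms = set(query_tokens)
    if not terms:
        return 0.0
    return len(terms & set(page_tokens)) / len(terms)


def cosine_score(query_tokens, page_tokens):
    """Bag-of-words cosine similarity (a stand-in for a dense embedding model)."""
    q, p = Counter(query_tokens), Counter(page_tokens)
    dot = sum(q[t] * p[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in p.values())
    )
    return dot / norm if norm else 0.0


def rank_pages(ocr_pages, query, alpha=0.5, top_k=2):
    """Return the indices of the top_k pages by a weighted mix of both scores."""
    q_tokens = tokenize(query)
    scored = []
    for index, text in enumerate(ocr_pages):
        p_tokens = tokenize(text)
        score = alpha * keyword_score(q_tokens, p_tokens) + (1 - alpha) * cosine_score(
            q_tokens, p_tokens
        )
        scored.append((score, index))
    # Highest score first; ties are broken by page order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:top_k]]


# Hypothetical OCR text, one string per scanned page.
ocr_pages = [
    "Chairman's letter to shareholders",
    "Consolidated balance sheet total assets total liabilities",
    "Notes on accounting policies",
    "Income statement revenue net profit",
]
selected = rank_pages(ocr_pages, "balance sheet total assets", top_k=1)
print(selected)
```

Only the selected pages would then be handed to the extraction stage, so the model reasons over a handful of pages instead of the full report.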

Preprint, 2026, arXiv

PsychEval: A Multi-Session and Multi-Therapy Benchmark for High-Realism AI Psychological Counselor

To develop a reliable AI for psychological assessment, we introduce PsychEval, a multi-session, multi-therapy, and highly realistic benchmark designed to address three key challenges. 1) Can we train a highly realistic AI counselor? Realistic counseling is a longitudinal task requiring sustained memory and dynamic goal tracking. We propose a multi-session benchmark (spanning 6-10 sessions across three distinct stages) that demands critical capabilities such as memory continuity, adaptive reasoning, and longitudinal planning. The dataset is annotated with extensive professional skills, comprising over 677 meta-skills and 4577 atomic skills. 2) How to train a multi-therapy AI counselor? While existing models often focus on a single therapy, complex cases frequently require flexible strategies among various therapies. We construct a diverse dataset covering five therapeutic modalities (Psychodynamic, Behaviorism, CBT, Humanistic Existentialist, and Postmodernist) alongside an integrative therapy with a unified three-stage clinical framework across six core psychological topics. 3) How to systematically evaluate an AI counselor? We establish a holistic evaluation framework with 18 therapy-specific and therapy-shared metrics across Client-Level and Counselor-Level dimensions. To support this, we also construct over 2,000 diverse client profiles. Extensive experimental analysis fully validates the superior quality and clinical fidelity of our dataset. Crucially, PsychEval transcends static benchmarking to serve as a high-fidelity reinforcement learning environment that enables the self-evolutionary training of clinically responsible and adaptive AI counselors.

Preprint, 2022, arXiv

Ultrasound-Guided Assistive Robots for Scoliosis Assessment with Optimization-based Control and Variable Impedance

Assistive robots for healthcare have seen growing demand because of their potential to relieve medical practitioners of routine jobs. In this paper, we investigate the development of an optimization-based control framework for an ultrasound-guided assistive robot to perform scoliosis assessment. A conventional procedure for scoliosis assessment with ultrasound imaging typically requires a medical practitioner to slide an ultrasound probe along a patient's back. To automate this type of procedure, we need to consider multiple objectives, such as contact force, position, orientation, energy, and posture. To address these objectives, we propose to formulate the control framework design as a quadratic programming problem in which each objective is weighted by its task priority, subject to a set of equality and inequality constraints. In addition, as the robot needs to maintain constant contact with the patient during spine scanning, we incorporate variable impedance regulation of the end-effector position and orientation in the control architecture to enhance safety and stability during the physical human-robot interaction. The variable impedance gains are learned from the medical expert's demonstrations. The proposed methodology is evaluated in real-world experiments of autonomous scoliosis assessment with an xArm robot manipulator. The effectiveness is verified by the obtained coronal spinal images of both a phantom and a human subject.
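A priority-weighted quadratic program of the kind this abstract describes can be written generically as below. The symbols are assumptions introduced for illustration and do not come from the paper: $x$ is the decision vector (for example joint velocities or torques), $A_i x - b_i$ is the residual of objective $i$ (contact force, position, orientation, and so on), $w_i$ is its priority weight, and $(C, d)$ and $(G, h)$ collect the equality and inequality constraints. The paper's actual formulation may differ.

```latex
\begin{aligned}
\min_{x}\quad & \sum_{i=1}^{N} w_i \,\lVert A_i x - b_i \rVert_2^2 \\
\text{s.t.}\quad & C x = d, \\
& G x \le h .
\end{aligned}
```

Because every term is quadratic in $x$ and the constraints are linear, the problem stays convex when all $w_i \ge 0$, which is what makes it practical to solve inside a real-time control loop.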

Preprint, 2020, arXiv

The Unusual Eruption of the Extragalactic Classical Nova M31N 2017-09a

M31N 2017-09a is a classical nova that was observed for some 160 days following its initial eruption, during which time it underwent a number of bright secondary outbursts. The light curve is characterized by continual variation, with excursions of at least 0.5 magnitudes on a daily time-scale. The lower envelope of the eruption suggests that a single power law can describe the decline rate. The eruption is relatively long, with $t_2 = 111$ and $t_3 = 153$ days.
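For readers unfamiliar with the notation, $t_2$ and $t_3$ are conventionally the times a nova takes to fade by 2 and 3 magnitudes from peak brightness. The sketch below shows one simple way to estimate such a decline time from a sampled light curve. The function name, the first-crossing rule, the linear interpolation, and the synthetic data are all assumptions made for illustration; the paper works from the lower envelope of a strongly varying light curve, and its procedure is not reproduced here.

```python
def decline_time(times, mags, n_mag):
    """Days after peak at which the curve first fades n_mag magnitudes below peak.

    Magnitudes increase as the object gets fainter, so the peak is the minimum
    magnitude. Returns None if the curve never fades that far.
    """
    peak_index = min(range(len(mags)), key=lambda i: mags[i])
    threshold = mags[peak_index] + n_mag
    for i in range(peak_index + 1, len(mags)):
        if mags[i] >= threshold:
            # Linear interpolation between the two samples around the crossing.
            t0, t1 = times[i - 1], times[i]
            m0, m1 = mags[i - 1], mags[i]
            frac = (threshold - m0) / (m1 - m0)
            return t0 + frac * (t1 - t0) - times[peak_index]
    return None


# Synthetic, smoothly declining light curve (not data from the paper).
times = [0.0, 10.0, 20.0, 30.0, 40.0]
mags = [17.0, 18.0, 19.0, 20.0, 21.0]

t2 = decline_time(times, mags, 2)
t3 = decline_time(times, mags, 3)
print(t2, t3)
```

For a light curve with secondary outbursts like the one described above, the first crossing and the last crossing of the threshold can differ substantially, so the chosen definition should be stated alongside any quoted value.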