Source author record

Yichao Jin

Yichao Jin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, Multimedia, Networking and Internet Architecture

Catalog footprint

What is connected

3 works

3 topics

4 close collaborators

Preprint, 2026, arXiv

A Multistage Extraction Pipeline for Long Scanned Financial Documents: An Empirical Study in Industrial KYC Workflows

Structured information extraction from long, multilingual scanned financial documents is a core requirement in industrial KYC and compliance workflows. These documents are typically not machine-readable, noisy, and visually heterogeneous. They usually span dozens of pages while containing only sparse task-relevant information. Although recent vision-language models achieve strong benchmark performance, applying them directly and end-to-end to full financial reports often leads to unreliable extraction under real-world conditions. We present a multistage extraction framework that integrates image preprocessing, multilingual OCR, hybrid page-level retrieval, and compact VLM-based structured extraction. The design separates page localization from multimodal reasoning, enabling more accurate extraction from complex multipage documents. We evaluated the framework on 120 production KYC documents comprising about 3000 multilingual scanned pages. Across multiple OCR-VLM combinations, the proposed pipeline consistently outperforms direct PDF-to-VLM baselines, improving field-level accuracy by up to 31.9 percentage points. The best configuration, PaddleOCR with MiniCPM2.6, achieves 87.27 percent accuracy. Ablation studies show that page-level retrieval is the dominant factor in performance improvements, particularly for complex financial statements and non-English documents.
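The page-localization idea in this abstract can be illustrated with a small sketch. The abstract does not say how the hybrid retrieval is built, so everything below is an assumption made for illustration: the function names (`trigrams`, `keyword_score`, `fuzzy_score`, `rank_pages`), the mix of exact keyword matching with character-trigram overlap (chosen here because it tolerates OCR noise), and the weight `alpha` are all hypothetical and are not the authors' method. The sketch only shows how per-page OCR text might be ranked so that a few pages, rather than the whole document, reach the extraction model.

```python
from typing import List, Set, Tuple


def trigrams(text: str) -> Set[str]:
    """Character trigrams of lowercased text with whitespace collapsed."""
    s = " ".join(text.lower().split())
    return {s[i:i + 3] for i in range(len(s) - 2)}


def keyword_score(page: str, keywords: List[str]) -> float:
    """Fraction of query keywords that appear verbatim in the page text."""
    if not keywords:
        return 0.0
    low = page.lower()
    return sum(k.lower() in low for k in keywords) / len(keywords)


def fuzzy_score(page: str, keywords: List[str]) -> float:
    """Fraction of the query's character trigrams that occur in the page.

    Assumption: trigram overlap stands in for the non-lexical half of a
    hybrid retriever; a real system might use dense embeddings instead.
    """
    query: Set[str] = set()
    for k in keywords:
        query |= trigrams(k)
    if not query:
        return 0.0
    return len(query & trigrams(page)) / len(query)


def rank_pages(pages: List[str], keywords: List[str],
               k: int = 2, alpha: float = 0.5) -> List[Tuple[int, float]]:
    """Return the top-k (page_index, score) pairs, best first.

    alpha weights exact keyword hits against fuzzy trigram overlap
    (hypothetical default, not taken from the paper).
    """
    scored = [
        (i, alpha * keyword_score(p, keywords)
            + (1 - alpha) * fuzzy_score(p, keywords))
        for i, p in enumerate(pages)
    ]
    # Sort by descending score; break ties by page order.
    scored.sort(key=lambda t: (-t[1], t[0]))
    return scored[:k]


# Toy OCR output, one string per page (invented for this example).
pages = [
    "Cover page annual report",
    "Balance sheet: total assets 1,200 total liabilities 800",
    "Notes to the accounts",
    "Ba1ance sheet tota1 assets (OCR noise)",
]
keywords = ["balance sheet", "total assets"]
top = rank_pages(pages, keywords, k=2)
```

In this toy input the clean balance-sheet page ranks first and the OCR-damaged copy second, even though the damaged page contains neither keyword verbatim. Only those selected pages would then be passed on to a structured-extraction step.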

Preprint, 2022, arXiv

Reliable IoT Firmware Updates: A Large-scale Mesh Network Performance Investigation

Internet of Things (IoT) networks require regular firmware updates to ensure enhanced security and stability. As we move towards methodologies of codifying security and policy decisions and exchanging them across large-scale IoT deployments (security-as-a-code), these demands should be considered a routine operation. However, rolling out firmware updates to large-scale networks presents a crucial challenge for constrained wireless environments with large numbers of IoT devices. This paper first investigates how the current state-of-the-art protocols operate in such adverse conditions by measuring various Quality-of-Service (QoS) Key Performance Indicators (KPIs) of the shared wireless medium. We then discuss how Concurrent Transmissions (CT) can extend the scalability of IoT protocols and ensure reliable firmware roll-outs over large geographical areas. By measuring KPIs such as the mesh join time, the throughput, and the number of nodes forming a network, we provide insight into how an IoT environment will behave under a large-scale firmware roll-out. Finally, we conducted our performance investigation on the UMBRELLA platform, a real-world IoT testbed deployed in Bristol, UK. This ensures our findings represent a realistic IoT scenario and meet the strict QoS requirements of today's IoT applications.

Preprint, 2014, arXiv

Enhancing User Experience for Multi-Screen Social TV Streaming over Wireless Networks

Recently, multi-screen cloud social TV has been invented to transform TV into a social experience. People watching the same content on social TV may come from different locations while freely interacting with each other through text, image, audio and video. This crucial virtual living-room experience adds social aspects to existing performance metrics. In this paper, we parse social TV user experience into three elements (i.e., inter-user delay, video quality of experience (QoE), and resource efficiency), and provide a joint analytical framework to enhance user experience. Specifically, we propose a cloud-based optimal playback rate allocation scheme to maximize the overall QoE while upper-bounding inter-user delay. Experimental results show that our algorithm achieves a near-optimal tradeoff between inter-user delay and video quality, and demonstrates resilient performance even under very fast wireless channel fading.