Source author record

Chang Zhao

Chang Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Artificial Intelligence

Catalog footprint

What is connected

3works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

AIVD: Adaptive Edge-Cloud Collaboration for Accurate and Efficient Industrial Visual Detection

Multimodal large language models (MLLMs) demonstrate exceptional capabilities in semantic understanding and visual reasoning, yet they still face challenges in precise object localization and resource-constrained edge-cloud deployment. To address this, this paper proposes the AIVD framework, which achieves unified precise localization and high-quality semantic generation through the collaboration between lightweight edge detectors and cloud-based MLLMs. To enhance the cloud MLLM's robustness against edge cropped-box noise and scenario variations, we design an efficient fine-tuning strategy with visual-semantic collaborative augmentation, significantly improving classification accuracy and semantic consistency. Furthermore, to maintain high throughput and low latency across heterogeneous edge devices and dynamic network conditions, we propose a heterogeneous resource-aware dynamic scheduling algorithm. Experimental results demonstrate that AIVD substantially reduces resource consumption while improving MLLM classification performance and semantic generation quality. The proposed scheduling strategy also achieves higher throughput and lower latency across diverse scenarios.

preprint2026arXiv

Designing streetscapes from street-view imagery using diffusion models

Street-view imagery (SVI) is widely used to quantify key indicators of urban environment, such as green- ery, sky, or road view indices. However, existing studies largely focus on measuring current streetscapes and rarely support the generation of alternative and non-existing urban scenarios, which is a core task in geospatial disciplines such as urban planning and design. To address this gap, we propose a gener- ative multimodal AI framework that synthesizes alternative streetscapes conditioned on targeted visual metrics, enabling direct visual exploration of urban scenarios. We first construct a multimodal dataset that aligns SVIs with textual descriptions, segmentation maps, road masks, and quantitative metrics of visual elements in Chicago and Orlando. Using this dataset, we demonstrate that diffusion models can produce realistic and semantically consistent streetscape imagery while responding to both textual and imagery controls. Our quantitative evaluations show that incorporating visual controls can improve semantic consistency, reducing the LPIPS index by approximately 6% while maintaining global visual realism. In addition, overall semantic consistency increases by 23.7% in Orlando and 46.4% in Chicago, as measured by the mIoU index, with class-wise gains exceeding even 100% improvement for building view indices. Streetscape generation can be controlled in a fine-grained manner by both visual and textual prompts, and when textual and visual controls conflict, imagery controls consistently dominate, indicating a clear control hierarchy and the importance of further developing visual controls for urban scene generation. Overall, this work establishes an important benchmark for streetscape generation us- ing SVIs and diffusion models, and illustrates how generative AI can serve as a practical, scalable, and controllable approach for urban scenario exploration.

preprint2026arXiv

ThinkDrive: Chain-of-Thought Guided Progressive Reinforcement Learning Fine-Tuning for Autonomous Driving

With the rapid advancement of large language models (LLMs) technologies, their application in the domain of autonomous driving has become increasingly widespread. However, existing methods suffer from unstructured reasoning, poor generalization, and misalignment with human driving intent. While Chain-of-Thought (CoT) reasoning enhances decision transparency, conventional supervised fine-tuning (SFT) fails to fully exploit its potential, and reinforcement learning (RL) approaches face instability and suboptimal reasoning depth. We propose ThinkDrive, a CoT guided progressive RL fine-tuning framework for autonomous driving that synergizes explicit reasoning with difficulty-aware adaptive policy optimization. Our method employs a two-stage training strategy. First, we perform SFT using CoT explanations. Then, we apply progressive RL with a difficulty-aware adaptive policy optimizer that dynamically adjusts learning intensity based on sample complexity. We evaluate our approach on a public dataset. The results show that ThinkDrive outperforms strong RL baselines by 1.45%, 1.95%, and 1.01% on exam, easy-exam, and accuracy, respectively. Moreover, a 2B-parameter model trained with our method surpasses the much larger GPT-4o by 3.28% on the exam metric.

Chang Zhao

What is connected

Connect this record

See the researcher in context

Building this map preview

3 published item(s)

AIVD: Adaptive Edge-Cloud Collaboration for Accurate and Efficient Industrial Visual Detection

Designing streetscapes from street-view imagery using diffusion models

ThinkDrive: Chain-of-Thought Guided Progressive Reinforcement Learning Fine-Tuning for Autonomous Driving