Source author record

Tao Feng

Tao Feng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Artificial Intelligence, econ.GN, q-fin.EC

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators


Preprint, 2026, arXiv

Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models

Vision-Language Continual Learning (VLCL) has attracted significant research attention for its robust capabilities, and Parameter-Efficient Fine-Tuning (PEFT) strategies now let these models achieve competitive performance with substantially reduced resource consumption. However, the dominant First-Order (FO) optimization is prone to trapping models in suboptimal local minima, especially within the limited exploration subspace of PEFT. To overcome this challenge, this paper pioneers a systematic exploration of Zeroth-Order (ZO) optimization for PEFT-based VLCL. We first identify that naive full-ZO adoption is incompatible with VLCL because it destabilizes the optimization process. We then investigate applying ZO optimization at granularities ranging from modality branch-wise to fine-grained layer-wise across various training units to identify an optimal strategy. In addition, a key theoretical insight reveals that the vision modality exhibits higher variance than its language counterpart in VLCL during ZO optimization, so we propose a modality-aware ZO strategy that adopts gradient sign normalization in ZO and constrains the vision modality perturbation to further improve performance. With ZO optimization, PEFT-based VLCL is better able to escape local minima during optimization, and extensive experiments on four benchmarks demonstrate that our method achieves state-of-the-art results.
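For readers unfamiliar with zeroth-order optimization, the general idea can be sketched as follows. This is a minimal illustration of a generic two-point ZO gradient estimate combined with a sign-based update, run on a toy quadratic loss. It is not the paper's method: the function names, the toy loss, and the step sizes are all assumptions made for illustration, and the modality-aware perturbation constraint described in the abstract is not reproduced here.

```python
import random


def zo_gradient(loss, theta, rng, eps=1e-3):
    """Two-point zeroth-order estimate along one random Gaussian direction.

    Only loss evaluations are used; no analytic gradient is required.
    """
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = [t + eps * zi for t, zi in zip(theta, z)]
    minus = [t - eps * zi for t, zi in zip(theta, z)]
    # Finite-difference slope of the loss along direction z.
    slope = (loss(plus) - loss(minus)) / (2.0 * eps)
    return [slope * zi for zi in z]


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def zo_sign_step(loss, theta, rng, lr=0.01, eps=1e-3):
    """One update that keeps only the sign of each estimated gradient entry."""
    g = zo_gradient(loss, theta, rng, eps)
    return [t - lr * sign(gi) for t, gi in zip(theta, g)]


def quadratic(theta):
    """Toy loss (hypothetical stand-in for a real training objective)."""
    return sum(t * t for t in theta)


rng = random.Random(0)
theta = [1.0, -2.0, 0.5]
start = quadratic(theta)
for _ in range(200):
    theta = zo_sign_step(quadratic, theta, rng)
end = quadratic(theta)
print(f"loss before: {start:.3f}, loss after: {end:.3f}")
```

Using the sign of the estimate bounds every coordinate's step by the learning rate, which is one common way to tame the high variance of single-direction ZO estimates.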

Preprint, 2026, arXiv

Destination Drone: A Comprehensive Analysis of Japanese Consumer Choice Behavior and Intentions for Drone Delivery Services

The potential for drone delivery services to transform logistics systems and consumer behavior has gained increasing attention. However, comprehensive empirical evidence on consumer delivery choice behavior within the context of transportation and urban air logistics remains limited, particularly in Japan. This study addresses this gap by examining Japanese consumers' preferences and behavioral intentions toward drone delivery services. Using a stated preference (SP) survey and discrete choice modeling approaches, including multinomial logit (MNL) and mixed logit (MMNL) models, the analysis evaluates how delivery cost, delivery time, drop-off location, product type, and social influence affect delivery mode choices across different demographic groups. The results indicate that although consumers express interest in drone delivery, perceived cost and concerns related to reliability continue to constrain adoption. Younger and male consumers exhibit higher preferences for drone delivery, while product type, particularly daily consumer goods and medical or healthcare items, plays a significant role in shaping preferences. Post-estimation willingness-to-pay and elasticity analyses further highlight consumers' sensitivity to delivery pricing and speed attributes. Overall, the findings provide actionable insights for logistics service providers and policymakers regarding pricing strategies, service targeting, and deployment approaches for integrating drone delivery into Japan's evolving logistics system.
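The multinomial logit model and the willingness-to-pay calculation mentioned in the abstract can be illustrated with a small sketch. The coefficients, attribute levels, and alternative names below are hypothetical values chosen for illustration only; they are not estimates or data from the study, and the study's actual utility specification may include further attributes.

```python
import math

# Hypothetical coefficients for illustration only; NOT estimates from the study.
BETA_COST = -0.004  # utility change per yen of delivery cost
BETA_TIME = -0.02   # utility change per minute of delivery time


def utility(cost_yen, time_min, asc=0.0):
    """Linear-in-parameters systematic utility of one delivery alternative."""
    return asc + BETA_COST * cost_yen + BETA_TIME * time_min


def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities (softmax over utilities)."""
    top = max(utilities.values())  # subtract the max for numerical stability
    exps = {name: math.exp(v - top) for name, v in utilities.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Hypothetical choice set: a faster, pricier drone versus a slower, cheaper truck.
alternatives = {
    "drone": utility(cost_yen=500, time_min=30),
    "truck": utility(cost_yen=300, time_min=120),
}
probs = mnl_probabilities(alternatives)

# Willingness to pay for one minute saved: ratio of time to cost coefficients.
wtp_yen_per_minute = BETA_TIME / BETA_COST

print(probs)
print(wtp_yen_per_minute)
```

In a mixed logit model the coefficients vary across respondents, so the same ratio would be computed over a distribution of coefficients instead of a single pair.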

Preprint, 2026, arXiv

MMVIAD: Multi-view Multi-task Video Understanding for Industrial Anomaly Detection

Industrial anomaly detection is critical for manufacturing quality control, yet existing datasets mainly focus on static images or sparse views, which do not fully reflect continuous inspection processes in real industrial scenarios. We introduce MMVIAD (Multi-view Multi-task Video Industrial Anomaly Detection), to the best of our knowledge the first continuous multi-view video dataset for industrial anomaly detection and understanding, together with a benchmark for multi-task evaluation. MMVIAD contains object-centric 2-second inspection clips with approximately 120 degrees of camera motion, covering 48 object categories, 14 environments, and 6 structural anomaly types. It supports anomaly detection, defect classification, object classification, and anomaly visible-time localization. Systematic evaluations on MMVIAD show that current commercial and open-source video MLLMs remain far below human performance, especially for fine-grained defect recognition and temporal grounding. To improve transferable anomaly understanding, we further develop a two-stage post-training pipeline where PS-SFT (Perception-Structured Supervised Fine-Tuning) initializes perception-structured reasoning and VISTA-GRPO (Visibility-grounded Industrial Structured Temporal Anomaly Group Relative Policy Optimization) refines the model with semantic-gated defect reward and visibility-aware temporal reward, producing the final model VISTA. On MMVIAD-Unseen, VISTA improves the base model's average score across the four tasks from 45.0 to 57.5, surpassing GPT-5.4. Source code is available at https://github.com/Georgekeepmoving/MMVIAD.
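The "group relative" part of Group Relative Policy Optimization refers, in its standard formulation, to scoring each sampled response against the other responses drawn for the same prompt. A minimal sketch of that normalization step follows. The reward values are hypothetical, and the semantic-gated defect reward and visibility-aware temporal reward described in the abstract are not modeled here.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is the reward's distance from the group mean, scaled by
    the group's standard deviation; eps guards against division by zero.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled responses to a single prompt.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above the group mean receive positive advantages and are reinforced, while those below the mean are discouraged, without needing a separately trained value model.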

