Source author record

Dongyu Zhang

Dongyu Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Artificial Intelligence, Computer Vision, Machine Learning, Applications, Computation and Language, cs.CY, Robotics, Social and Information Networks

Catalog footprint

What is connected

7 works

8 topics

4 close collaborators

preprint, 2026, arXiv

RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models

Vision-Language-Action (VLA) models remain brittle in long-horizon, contact-rich manipulation because success-only imitation provides little supervision for execution drift, while failed rollouts are often discarded. We introduce RePO-VLA, a recovery-driven policy optimization framework that assigns distinct roles to success, recovery, and failure trajectories. RePO-VLA first applies Recovery-Aware Initialization (RAI), slicing recovery segments and resetting history so corrective actions depend on the current adverse state rather than the preceding failure. It then learns a Progress-Aware Semantic Value Function (PAS-VF), aligning spatiotemporal trajectory features with instructions and successful references. The resulting labels salvage useful failure prefixes via reliability decay, while low-value labels mark drift and terminal breakdowns, teaching differences among nominal, failed, and corrective actions. The data engine turns adverse states into planner-generated or human-collected corrective rollouts, teaching recovery to the success manifold. Value-Conditioned Refinement (VCR) trains the policy to prefer high-progress actions. At deployment, a fixed high value ($v=1.0$) biases actions toward the learned success manifold without online failure detectors or heuristic retries. We introduce FRBench, with standardized error injection and recovery-focused evaluation. Across simulated and real-world bimanual tasks, RePO-VLA improves robustness, raising adversarial success from 20% to 75% on average and up to 80% in scaled real-world trials.

preprint, 2026, arXiv

VisualQuest: A Benchmark for Abstract Visual Reasoning in MLLMs

We introduce VisualQuest, a novel dataset designed to rigorously evaluate multimodal large language models (MLLMs) on abstract visual reasoning tasks that require the integration of symbolic, cultural, and linguistic knowledge. Unlike existing benchmarks that focus on direct image captioning or classification of realistic images, VisualQuest comprises 3,551 non-photographic, stylized images spanning four categories: Public Figures, Popular Culture, Linguistic Expressions, and Literary Works. Each image is paired with targeted questions to probe complex reasoning. We benchmark ten state-of-the-art MLLMs and find that only Gemini-2.5-flash and GPT-4o achieve strong overall performance, while 3.7 percent of the images remain unrecognized by any model, underscoring persistent challenges in multimodal understanding. Fine-grained analysis shows that Gemini excels at recognizing stylized public figures, whereas GPT-4o leads in linguistic reasoning tasks such as visual puns and emoji combinations. VisualQuest provides a comprehensive and challenging resource for advancing research in abstract visual reasoning and highlights key areas for future model improvement. The dataset is available at https://github.com/xkt88/VISUALQUEST.

preprint, 2022, arXiv

Instant Response Few-shot Object Detection with Meta Strategy and Explicit Localization Inference

Aiming at recognizing and localizing the object of novel categories by a few reference samples, few-shot object detection (FSOD) is a quite challenging task. Previous works often depend on the fine-tuning process to transfer their model to the novel category and rarely consider the defect of fine-tuning, resulting in many application drawbacks. For example, these methods are far from satisfying in the episode-changeable scenarios due to excessive fine-tuning times, and their performance on low-quality (e.g., low-shot and class-incomplete) support sets degrades severely. To this end, this paper proposes an instant response few-shot object detector (IR-FSOD) that can accurately and directly detect the objects of novel categories without the fine-tuning process. To accomplish the objective, we carefully analyze the defects of individual modules in the Faster R-CNN framework under the FSOD setting and then extend it to IR-FSOD by improving these defects. Specifically, we first propose two simple but effective meta-strategies for the box classifier and RPN module to enable the object detection of novel categories with instant response. Then, we introduce two explicit inferences into the localization module to alleviate its over-fitting to the base categories, including explicit localization score and semi-explicit box regression. Extensive experiments show that the IR-FSOD framework not only achieves few-shot object detection with the instant response but also reaches state-of-the-art performance in precision and recall under various FSOD settings.

preprint, 2022, arXiv

Surf or sleep? Understanding the influence of bedtime patterns on campus

Poor sleep habits may cause serious problems of mind and body, and it is a commonly observed issue for college students due to study workload as well as peer and social influence. Understanding its impact and identifying students with poor sleep habits matters a lot in educational management. Most of the current research is either based on self-reports and questionnaires, suffering from a small sample size and social desirability bias, or the methods used are not suitable for the education system. In this paper, we develop a general data-driven method for identifying students' sleep patterns according to their Internet access pattern stored in the education management system and explore its influence from various aspects. First, we design a Poisson-based probabilistic mixture model to cluster students according to the distribution of bedtime and identify students who are used to staying up late. Second, we profile students from five aspects (including eight dimensions) based on campus-behavior data and build Bayesian networks to explore the relationship between behavioral characteristics and sleeping habits. Finally, we test the predictability of sleeping habits. This paper not only contributes to the understanding of student sleep from a cognitive and behavioral perspective but also presents a new approach that provides an effective framework for various educational institutions to detect the sleeping patterns of students.

preprint, 2022, arXiv

TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks

Foodborne illness is a serious but preventable public health problem -- with delays in detecting the associated outbreaks resulting in productivity loss, expensive recalls, public safety hazards, and even loss of life. While social media is a promising source for identifying unreported foodborne illnesses, there is a dearth of labeled datasets for developing effective outbreak detection models. To accelerate the development of machine learning-based models for foodborne outbreak detection, we thus present TWEET-FID (TWEET-Foodborne Illness Detection), the first publicly available annotated dataset for multiple foodborne illness incident detection tasks. TWEET-FID collected from Twitter is annotated with three facets: tweet class, entity type, and slot type, with labels produced by experts as well as by crowdsource workers. We introduce several domain tasks leveraging these three facets: text relevance classification (TRC), entity mention detection (EMD), and slot filling (SF). We describe the end-to-end methodology for dataset design, creation, and labeling for supporting model development for these tasks. A comprehensive set of results for these tasks leveraging state-of-the-art single- and multi-task deep learning methods on the TWEET-FID dataset is provided. This dataset opens opportunities for future research in foodborne outbreak detection.

preprint, 2020, arXiv

Judging a Book by Its Cover: The Effect of Facial Perception on Centrality in Social Networks

Facial appearance matters in social networks. Individuals frequently make trait judgments from facial clues. Although these face-based impressions lack the evidence to determine validity, they are of vital importance, because they may relate to human network-based social behavior, such as seeking certain individuals for help, advice, dating, and cooperation, and thus they may relate to centrality in social networks. However, little to no work has investigated the apparent facial traits that influence network centrality, despite the large amount of research on attributions of the central position including personality and behavior. In this paper, we examine whether perceived traits based on facial appearance affect network centrality by exploring the initial stage of social network formation in a first-year college residential area. We took face photos of participants who are freshmen living in the same residential area, and we asked them to nominate community members linking to different networks. We then collected facial perception data by requiring other participants to rate facial images for three main attributions: dominance, trustworthiness, and attractiveness. Meanwhile, we proposed a framework to discover how facial appearance affects social networks. Our results revealed that perceived facial traits were correlated with the network centrality and that they were indicative to predict the centrality of people in different networks. Our findings provide psychological evidence regarding the interaction between faces and network centrality. Our findings also offer insights into a combination of psychological and social network techniques, and they highlight the function of facial bias in cuing and signaling social traits. To the best of our knowledge, we are the first to explore the influence of facial perception on centrality in social networks.

preprint, 2020, arXiv

Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking

The success of DNNs has driven the extensive applications of person re-identification (ReID) into a new era. However, whether ReID inherits the vulnerability of DNNs remains unexplored. To examine the robustness of ReID systems is rather important because the insecurity of ReID systems may cause severe losses, e.g., the criminals may use the adversarial perturbations to cheat the CCTV systems. In this work, we examine the insecurity of current best-performing ReID models by proposing a learning-to-mis-rank formulation to perturb the ranking of the system output. As the cross-dataset transferability is crucial in the ReID domain, we also perform a black-box attack by developing a novel multi-stage network architecture that pyramids the features of different levels to extract general and transferable features for the adversarial perturbations. Our method can control the number of malicious pixels by using differentiable multi-shot sampling. To guarantee the inconspicuousness of the attack, we also propose a new perception loss to achieve better visual quality. Extensive experiments on four of the largest ReID benchmarks (i.e., Market1501 [45], CUHK03 [18], DukeMTMC [33], and MSMT17 [40]) not only show the effectiveness of our method, but also provide directions of the future improvement in the robustness of ReID systems. For example, the accuracy of one of the best-performing ReID systems drops sharply from 91.8% to 1.4% after being attacked by our method. Some attack results are shown in Fig. 1. The code is available at https://github.com/whj363636/Adversarial-attack-on-Person-ReID-With-Deep-Mis-Ranking.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint