Source author record

Wansen Wu

Wansen Wu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision cs.CY Multimedia

Catalog footprint

What is connected

3works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2024arXiv

Autonomous Crowdsensing: Operating and Organizing Crowdsensing for Sensing Automation

The precise characterization and modeling of Cyber-Physical-Social Systems (CPSS) requires more comprehensive and accurate data, which imposes heightened demands on intelligent sensing capabilities. To address this issue, Crowdsensing Intelligence (CSI) has been proposed to collect data from CPSS by harnessing the collective intelligence of a diverse workforce. Our first and second Distributed/Decentralized Hybrid Workshop on Crowdsensing Intelligence (DHW-CSI) have focused on principles and high-level processes of organizing and operating CSI, as well as the participants, methods, and stages involved in CSI. This letter reports the outcomes of the latest DHW-CSI, focusing on Autonomous Crowdsensing (ACS) enabled by a range of technologies such as decentralized autonomous organizations and operations, large language models, and human-oriented operating systems. Specifically, we explain what ACS is and explore its distinctive features in comparison to traditional crowdsensing. Moreover, we present the ``6A-goal" of ACS and propose potential avenues for future research.

preprint2023arXiv

DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation

Following language instructions to navigate in unseen environments is a challenging task for autonomous embodied agents. With strong representation capabilities, pretrained vision-and-language models are widely used in VLN. However, most of them are trained on web-crawled general-purpose datasets, which incurs a considerable domain gap when used for VLN tasks. To address the problem, we propose a novel and model-agnostic domain-aware prompt learning (DAP) framework. For equipping the pretrained models with specific object-level and scene-level cross-modal alignment in VLN tasks, DAP applies a low-cost prompt tuning paradigm to learn soft visual prompts for extracting in-domain image semantics. Specifically, we first generate a set of in-domain image-text pairs with the help of the CLIP model. Then we introduce soft visual prompts in the input space of the visual encoder in a pretrained model. DAP injects in-domain visual knowledge into the visual encoder of the pretrained model in an efficient way. Experimental results on both R2R and REVERIE show the superiority of DAP compared to existing state-of-the-art methods.

preprint2022arXiv

Vision-Language Navigation: A Survey and Taxonomy

Vision-Language Navigation (VLN) tasks require an agent to follow human language instructions to navigate in previously unseen environments. This challenging field involving problems in natural language processing, computer vision, robotics, etc., has spawn many excellent works focusing on various VLN tasks. This paper provides a comprehensive survey and an insightful taxonomy of these tasks based on the different characteristics of language instructions in these tasks. Depending on whether the navigation instructions are given for once or multiple times, this paper divides the tasks into two categories, i.e., single-turn and multi-turn tasks. For single-turn tasks, we further subdivide them into goal-oriented and route-oriented based on whether the instructions designate a single goal location or specify a sequence of multiple locations. For multi-turn tasks, we subdivide them into passive and interactive tasks based on whether the agent is allowed to question the instruction or not. These tasks require different capabilities of the agent and entail various model designs. We identify progress made on the tasks and look into the limitations of existing VLN models and task settings. Finally, we discuss several open issues of VLN and point out some opportunities in the future, i.e., incorporating knowledge with VLN models and implementing them in the real physical world.