Source author record

Dongxing Mao

Dongxing Mao appears in the imported research catalog. Authorship, coauthor, and topic links are available, although the profile has not yet been claimed by its owner.

Researcher · Unclaimed source record

Topics: Computer Vision, physics.app-ph

Catalog footprint

What is connected

3 works

2 topics

4 close collaborators


Preprint · 2024 · arXiv

ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation

Graphical User Interface (GUI) automation holds significant promise for assisting users with complex tasks, thereby boosting human productivity. Existing works leveraging Large Language Models (LLMs) or LLM-based AI agents have shown the ability to automate tasks on Android and Web platforms. However, these tasks are primarily aimed at simple device usage and entertainment operations. This paper presents a novel benchmark, AssistGUI, to evaluate whether models are capable of manipulating the mouse and keyboard on the Windows platform in response to user-requested tasks. We carefully collected a set of 100 tasks from nine widely used software applications, such as After Effects and MS Word, each accompanied by the necessary project files for better evaluation. Moreover, we propose an advanced Actor-Critic Embodied Agent framework, which incorporates a sophisticated GUI parser driven by an LLM agent and an enhanced reasoning mechanism adept at handling lengthy procedural tasks. Our experimental results reveal that our GUI parser and reasoning mechanism outperform existing methods. Nevertheless, substantial room for improvement remains: the best model attains only a 46% success rate on our benchmark. We conclude with a thorough analysis of the current methods' limitations, setting the stage for future breakthroughs in this domain.

Preprint · 2022 · arXiv

AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant

A long-standing goal of intelligent assistants such as AR glasses and robots has been to assist users in affordance-centric real-world scenarios, such as "how can I run the microwave for 1 minute?". However, there is still neither a clear task definition nor a suitable benchmark. In this paper, we define a new task called Affordance-centric Question-driven Task Completion (AQTC), where the AI assistant should learn from instructional videos to provide step-by-step help in the user's view. To support the task, we constructed AssistQ, a new dataset comprising 531 question-answer samples from 100 newly filmed instructional videos. We also developed a novel Question-to-Actions (Q2A) model to address the AQTC task and validated it on the AssistQ dataset. The results show that our model significantly outperforms several VQA-related baselines, while still leaving large room for improvement. We expect our task and dataset to advance the development of egocentric AI assistants. Our project page is available at: https://showlab.github.io/assistq/.

Preprint · 2020 · arXiv

Ultra-broadband acoustic ventilation barriers via hybrid-functional metasurfaces

Ventilation barriers that simultaneously block sound and allow free airflow are highly challenging to build, yet necessary in scenarios that call for sound-proofing ventilation. Previous works based on local resonance or Fano-like interference operate only within a narrow range around the resonant or destructive-interference frequency. Efforts toward broadband designs show a limited bandwidth, typically smaller than half an octave. Here, we theoretically design an ultra-broadband ventilation barrier by hybridizing dissipation and interference. As confirmed by experiments, the synergistic effect of our hybrid-functional metasurface significantly expands its range of working frequencies, effectively blocking more than 90% of incident energy over 650-2000 Hz, while its structural thickness is only 53 mm ($\sim\lambda/10$). Our design offers great flexibility in customizing the operating band and can handle sound incident from various directions, giving it potential in air-permeable yet sound-proofing applications.
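The headline numbers in this abstract can be cross-checked with a little arithmetic. The sketch below assumes a speed of sound of 343 m/s (air at roughly 20 °C) and assumes that λ in "$\sim\lambda/10$" refers to the wavelength at the lower band edge, 650 Hz; neither assumption is stated in the abstract. The helper names are hypothetical, chosen only for illustration.

```python
import math

# Assumed value, not from the abstract: speed of sound in air at about 20 °C.
SPEED_OF_SOUND = 343.0  # m/s

# Values quoted in the abstract.
F_LOW, F_HIGH = 650.0, 2000.0  # Hz, band edges of the blocking range
THICKNESS_MM = 53.0            # structural thickness of the barrier
BLOCKED_FRACTION = 0.90        # fraction of incident energy blocked


def wavelength_m(freq_hz: float) -> float:
    """Acoustic wavelength in metres: lambda = c / f."""
    return SPEED_OF_SOUND / freq_hz


def octaves(f_low: float, f_high: float) -> float:
    """Bandwidth in octaves: log2 of the frequency ratio."""
    return math.log2(f_high / f_low)


def attenuation_db(blocked_fraction: float) -> float:
    """Reduction in transmitted energy, in dB, if only (1 - blocked) passes."""
    return -10.0 * math.log10(1.0 - blocked_fraction)


lam_low_mm = wavelength_m(F_LOW) * 1000.0
print(f"wavelength at {F_LOW:.0f} Hz: {lam_low_mm:.0f} mm")
print(f"thickness / wavelength: {THICKNESS_MM / lam_low_mm:.3f}")
print(f"bandwidth: {octaves(F_LOW, F_HIGH):.2f} octaves")
print(f"energy reduction: {attenuation_db(BLOCKED_FRACTION):.1f} dB")
```

Under these assumptions the 650 Hz wavelength is about 528 mm, so a 53 mm structure is roughly λ/10; the 650-2000 Hz band spans about 1.62 octaves, compared with the sub-half-octave bandwidth cited for earlier broadband designs; and blocking 90% of incident energy means at most 10% is transmitted, which is about a 10 dB reduction in transmitted energy.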