Researcher profile

Dongxing Mao

Dongxing Mao contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 15 - Unverified
Verification L1
Unclaimed author
3 works
0 followers
2 topics
4 close collaborators

Published work

3 published items

Preprint · 2024 · arXiv

ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation

Graphical User Interface (GUI) automation holds significant promise for assisting users with complex tasks, thereby boosting human productivity. Existing works leveraging Large Language Models (LLMs) or LLM-based AI agents have shown capabilities in automating tasks on Android and Web platforms. However, these tasks are primarily aimed at simple device usage and entertainment operations. This paper presents a novel benchmark, AssistGUI, to evaluate whether models are capable of manipulating the mouse and keyboard on the Windows platform in response to user-requested tasks. We carefully collected a set of 100 tasks from nine widely used software applications, such as After Effects and MS Word, each accompanied by the necessary project files for better evaluation. Moreover, we propose an advanced Actor-Critic Embodied Agent framework, which incorporates a sophisticated GUI parser driven by an LLM agent and an enhanced reasoning mechanism adept at handling lengthy procedural tasks. Our experimental results reveal that our GUI Parser and Reasoning mechanism outperform existing methods. Nevertheless, substantial room for improvement remains, with the best model attaining only a 46% success rate on our benchmark. We conclude with a thorough analysis of the current methods' limitations, setting the stage for future breakthroughs in this domain.

Preprint · 2022 · arXiv

AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant

A long-standing goal of intelligent assistants such as AR glasses and robots has been to assist users in affordance-centric real-world scenarios, such as "how can I run the microwave for 1 minute?". However, there is still no clear task definition or suitable benchmark. In this paper, we define a new task called Affordance-centric Question-driven Task Completion (AQTC), where the AI assistant should learn from instructional videos to provide step-by-step help in the user's view. To support the task, we constructed AssistQ, a new dataset comprising 531 question-answer samples from 100 newly filmed instructional videos. We also developed a novel Question-to-Actions (Q2A) model to address the AQTC task and validated it on the AssistQ dataset. The results show that our model significantly outperforms several VQA-related baselines while still having large room for improvement. We expect our task and dataset to advance the development of egocentric AI assistants. Our project page is available at: https://showlab.github.io/assistq/.

Preprint · 2020 · arXiv

Ultra-broadband acoustic ventilation barriers via hybrid-functional metasurfaces

Ventilation barriers that block sound while allowing free airflow are challenging to realize but necessary for scenarios calling for sound-proofing ventilation. Previous works based on local resonance or Fano-like interference serve a narrow working range around the resonant or destructive-interference frequency. Efforts made on broadband designs show a limited bandwidth, typically smaller than half an octave. Here, we theoretically design an ultra-broadband ventilation barrier via hybridizing dissipation and interference. Confirmed by experiments, the synergistic effect from our hybrid-functional metasurface significantly expands the scope of its working frequencies, leading to an effective blocking of more than 90% of incident energy in the range of 650-2000 Hz, while its structural thickness is only 53 mm $(\sim \lambda/10)$. Our design shows great flexibility in customizing the working band and is capable of handling sound coming from various directions, which has potential in air-permeable yet sound-proofing applications.