Source author record

Ge Gao

Ge Gao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Human-Computer Interaction, Machine Learning, physics.ins-det, Applications, Artificial Intelligence, Computation and Language, Cryptography and Security, physics.acc-ph, Robotics, Software Engineering

Catalog footprint

What is connected

11 works

11 topics

4 close collaborators


preprint, 2026, arXiv

WebTrap: Stealthy Mid-Task Hijacking of Browser Agents During Navigation

Browser agents are increasingly deployed in long-horizon tasks, which require executing extended action chains to accomplish user goals. However, this prolonged execution gives attackers more opportunities to inject malicious instructions. Existing prompt injection attacks against browser agents have two key gaps: (1) low effectiveness, as attacks optimized for toy baselines fail to achieve end-to-end goals in real-world scenarios with complex environments and more steps; and (2) weak stealthiness, since most attacks pit the attack goal against the user goal, causing a significant drop in system usability under attack. To address these gaps, we propose WebTrap, a mid-task hijacking injection attack. It employs multi-step instruction fusion steering to seamlessly combine both goals, enabling the agent to resume the original user task after executing the attack goal. Furthermore, we design a context-grounded generation method that aligns the injected content with the task environment and system instructions to maximize the hijacking success rate. Extensive experiments on two browser agent tasks, based on extended WASP and InjecAgent environments, show that our method achieves a high attack success rate while preserving the usability of the original system. We find that WebTrap exploits the agent's navigation vulnerabilities, binding the two goals so tightly that standard defense mechanisms cannot restore the system to normal operation. These findings reveal a critical vulnerability: agent systems performing long-horizon tasks can be stealthily hijacked.

preprint, 2024, arXiv

GridFormer: Point-Grid Transformer for Surface Reconstruction

Implicit neural networks have emerged as a crucial technology in 3D surface reconstruction. To reconstruct continuous surfaces from discrete point clouds, existing approaches commonly encode the input points into regular grid features (plane or volume). However, these methods typically use the grid as an index for uniformly scattering point features. Compared with irregular point features, regular grid features may sacrifice some reconstruction detail but improve efficiency. To take full advantage of both types of features, we introduce a novel, efficient attention mechanism between grid and point features named Point-Grid Transformer (GridFormer). This mechanism treats the grid as a transfer point connecting the space and the point cloud. Our method maximizes the spatial expressiveness of grid features while maintaining computational efficiency. Furthermore, optimizing predictions over the entire space can result in blurred boundaries. To address this issue, we propose a boundary optimization strategy that combines a margin binary cross-entropy loss with boundary sampling, yielding a more precise representation of the object structure. Experiments show that our method is effective and outperforms state-of-the-art approaches on widely used benchmarks, producing more precise geometry reconstructions. The code is available at https://github.com/list17/GridFormer.

preprint, 2022, arXiv

Calico: Relocatable On-cloth Wearables with Fast, Reliable, and Precise Locomotion

We explore Calico, a miniature relocatable wearable system with fast and precise locomotion for on-body interaction, actuation, and sensing. Calico consists of a two-wheel robot and an on-cloth track mechanism, or "railway," on which the robot travels. The robot is self-contained and small, and it offers additional sensor expansion options. The track system allows the robot to move along the user's body and reach any predetermined location. It also includes rotational switches that enable complex routing where tracks diverge. We report the design and implementation of Calico along with a series of technical evaluations of system performance. We then present several application scenarios and user studies that examine Calico's potential as a dance trainer and explore how users qualitatively perceive our scenarios, to inform future research in this space.

preprint, 2022, arXiv

High-Uniformity Calculation Method of Four-Coil Configuration in Large-Caliber Magnetic Field Immunity Testing System

Power electronic equipment regulated by the International Thermonuclear Experimental Reactor (ITER) organization must pass the relevant steady-state magnetic field immunity test. The core of a magnetic field immunity test is the magnetic field generator coil. In this paper, we derive formulas for the magnetic field of a four-coil configuration under both ideal and actual models. Under the ideal model, the traditional method of calculating magnetic field performance is compared with the general formula method. A global parameter optimization method based on Lagrange multipliers and the Karush-Kuhn-Tucker (KKT) conditions is proposed to obtain coil parameters that yield a highly uniform magnetic field. The magnetic field distribution in the uniform zone is revealed using the finite element method. Experimental results confirm that the model analysis is correct and effective. This work provides a practical scheme for designing coils that produce a strong magnetic field with high uniformity.
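To make the field calculation concrete, here is a minimal Python sketch of an ideal thin-loop model restricted to the coil axis: each coil is treated as a single circular current loop, its on-axis flux density follows from the Biot-Savart law, and the coil fields are superposed. The function names, the coil positions, radii, and turn counts used with it, and the uniformity metric are hypothetical illustrations, not parameters or definitions from the paper; the paper's actual model, off-axis calculation, and Lagrange-multiplier/KKT optimization are not reproduced here.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability in T*m/A (classical SI value)


def loop_field_on_axis(z, z0, radius, current, turns=1):
    """Axial flux density (T) at axial position z (m) from a thin circular
    loop of the given radius (m) centred on the axis at z0 (Biot-Savart law)."""
    return MU0 * turns * current * radius**2 / (2.0 * (radius**2 + (z - z0) ** 2) ** 1.5)


def four_coil_field(z, coils, current):
    """Superpose the on-axis fields of coaxial coils given as (z0, radius, turns)
    tuples, all carrying the same current (A)."""
    return sum(loop_field_on_axis(z, z0, r, current, n) for z0, r, n in coils)


def uniformity(coils, current, half_length, samples=201):
    """Hypothetical uniformity metric: (max - min) / centre field over the
    axial zone [-half_length, half_length]; smaller is more uniform."""
    zs = [-half_length + 2.0 * half_length * k / (samples - 1) for k in range(samples)]
    values = [four_coil_field(z, coils, current) for z in zs]
    centre = four_coil_field(0.0, coils, current)
    return (max(values) - min(values)) / centre
```

For instance, with a hypothetical symmetric set `coils = [(-0.5, 1.0, 50), (-0.15, 1.0, 20), (0.15, 1.0, 20), (0.5, 1.0, 50)]`, `uniformity(coils, current=100.0, half_length=0.1)` gives the relative field variation over a ±0.1 m zone; an optimizer would tune the tuple entries to minimize that value.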

preprint, 2022, arXiv

Reconstructing Missing EHRs Using Time-Aware Within- and Cross-Visit Information for Septic Shock Early Prediction

Real-world Electronic Health Records (EHRs) often suffer from a high rate of missing data. In our EHRs, for example, missing rates reach 90% for some features, with an average missing rate of around 70% across all features. We propose a Time-Aware Dual-Cross-Visit missing value imputation method, named TA-DualCV, which simultaneously leverages multivariate dependencies across features and longitudinal dependencies both within and across visits to maximize the information extracted from the limited observable records in EHRs. Specifically, TA-DualCV captures the latent structure of missing patterns across measurements of different features, and it also considers time continuity, capturing latent temporal missing patterns based on both time steps and irregular time intervals. TA-DualCV is evaluated on three large real-world EHRs across two types of tasks: an unsupervised imputation task with mask rates varied up to 90%, and supervised 24-hour early prediction of septic shock using Long Short-Term Memory (LSTM) networks. Our results show that TA-DualCV performs significantly better than all of the existing state-of-the-art imputation baselines, such as DETROIT and TAME, on both types of tasks.

preprint, 2022, arXiv

Simulating Bandit Learning from User Feedback for Extractive Question Answering

We study learning from user feedback for extractive question answering by simulating feedback using supervised data. We cast the problem as contextual bandit learning and analyze the characteristics of several learning scenarios, with a focus on reducing data annotation. We show that systems initially trained on a small number of examples can improve dramatically given user feedback on model-predicted answers, and that existing datasets can be used to deploy systems in new domains without any annotation, improving the system on the fly via user feedback instead.
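The bandit framing can be sketched in a few lines of Python. The sketch assumes the model scores a fixed set of candidate answer spans with a softmax policy, a simulated user rates the one sampled span against the gold span taken from supervised data, and a REINFORCE-style policy-gradient step updates the scores. The graded reward scheme in `simulated_feedback` (1.0 exact, 0.5 overlap, 0.0 otherwise) and all function names are hypothetical, chosen only for illustration; they are not the paper's feedback simulation or learning algorithm.

```python
import math


def simulated_feedback(pred, gold):
    """Hypothetical graded reward derived from a gold span.
    Spans are (start, end) token indices with an inclusive end."""
    if pred == gold:
        return 1.0
    overlaps = pred[0] <= gold[1] and gold[0] <= pred[1]
    return 0.5 if overlaps else 0.0


def softmax(logits):
    """Numerically stable softmax over candidate-span scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(logits, action, reward, lr=0.1, baseline=0.0):
    """One policy-gradient step for a softmax policy over candidate spans.
    The gradient of log pi(action) w.r.t. logit k is 1[k == action] - pi_k;
    only the reward of the sampled action is used (bandit feedback)."""
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        x + lr * advantage * ((1.0 if k == action else 0.0) - p)
        for k, (x, p) in enumerate(zip(logits, probs))
    ]
```

In a simulated training loop, one would sample `action = random.choices(range(len(logits)), weights=softmax(logits))[0]`, compute `simulated_feedback(candidates[action], gold)`, and pass that reward to `reinforce_update`. Only the sampled span's reward is observed, which is what distinguishes this setting from supervised learning on the gold span.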

preprint, 2022, arXiv

The Preliminary design of DC Magnet Power Supply System for ITER Static Magnetic Field Test facility

The ITER (International Thermonuclear Experimental Reactor) static magnetic field (SMF) test facility requires a DC power supply with low voltage, high current, and high stability. Because of switching losses, there is a trade-off between output current capability and output ripple: a large output current usually forces a low switching frequency, which in turn generates substantial harmonics. To resolve this, a topology based on the interleaved parallel buck converter is used and tested in this paper. Moreover, the topology is realized with only a small number of switching metal-oxide-semiconductor field-effect transistors (MOSFETs). This article describes the system design scheme and control method in detail, and harmonic analysis and simulations are carried out. Experiments confirm the validity of the proposed scheme and control strategy: the power supply system can deliver a large current of 15 kA with low ripple.
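The ripple-cancellation effect behind interleaving can be checked numerically. This Python sketch assumes ideal buck phases in continuous conduction with identical inductors, a shared duty cycle, and phase shifts of one switching period divided by the number of phases; the default input voltage, inductance, and switching period are arbitrary placeholders, not values from the ITER design, and the function names are hypothetical.

```python
def phase_current_ripple(t, duty, period, v_in, inductance):
    """Inductor current (A) above its valley for one ideal buck phase in
    continuous conduction, with output voltage duty * v_in."""
    v_out = duty * v_in
    slope_on = (v_in - v_out) / inductance   # current rises while the switch is on
    slope_off = -v_out / inductance          # current falls while the switch is off
    t_on = duty * period
    t = t % period
    if t < t_on:
        return slope_on * t
    return slope_on * t_on + slope_off * (t - t_on)


def total_ripple_pp(phases, duty, period=1e-5, v_in=12.0, inductance=1e-6, samples=1000):
    """Peak-to-peak ripple (A) of the summed current of `phases` interleaved
    phases, each shifted by period / phases, sampled over one period."""
    totals = []
    for m in range(samples):
        t = m * period / samples
        totals.append(sum(
            phase_current_ripple(t + k * period / phases, duty, period, v_in, inductance)
            for k in range(phases)
        ))
    return max(totals) - min(totals)
```

With the placeholder values, one phase at 0.5 duty ripples by v_in * D * (1 - D) * period / inductance = 30 A, two interleaved phases at the same duty cancel each other's triangular ripple almost exactly, and four phases at 0.3 duty still ripple noticeably less than a single phase.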

preprint, 2021, arXiv

CloudAAE: Learning 6D Object Pose Regression with On-line Data Synthesis on Point Clouds

It is often desirable to train 6D pose estimation systems on synthetic data because manual annotation is expensive. However, due to the large domain gap between synthetic and real images, synthesizing color images is expensive. In contrast, this domain gap is considerably smaller and easier to bridge for depth information. In this work, we present a system that regresses 6D object pose from depth information represented by point clouds, and a lightweight data synthesis pipeline that creates synthetic point cloud segments for training. We use an augmented autoencoder (AAE) to learn a latent code that encodes 6D object pose information for pose regression. The data synthesis pipeline requires only texture-less 3D object models and desired viewpoints, and it is cheap in terms of both time and hardware storage. Our data synthesis process is up to three orders of magnitude faster than commonly applied approaches that render RGB image data. We show the effectiveness of our system on the LineMOD, LineMOD Occlusion, and YCB Video datasets. The implementation of our system is available at: https://github.com/GeeeG/CloudAAE.

preprint, 2021, arXiv

Early Performance Prediction using Interpretable Patterns in Programming Process Data

Instructors have limited time and resources to help struggling students, and these resources should be directed to the students who most need them. To address this, researchers have constructed models that can predict students' final course performance early in a semester. However, many predictive models are limited to static and generic student features (e.g., demographics, GPA), rather than computing-specific evidence that assesses a student's progress in class. Many programming environments now capture complete time-stamped records of students' actions during programming. In this work, we leverage this rich, fine-grained log data to build a model that predicts student course outcomes. From the log data, we extract patterns of behavior that are predictive of students' success using an approach called differential sequence mining. We evaluate our approach on a dataset from 106 students in a block-based, introductory programming course. The patterns extracted by our approach can predict final programming performance with 79% accuracy using only the first programming assignment, outperforming two baseline methods. In addition, we show that the patterns are interpretable and correspond to concrete, effective (and ineffective) novice programming behaviors. We also discuss these patterns and their implications for classroom instruction.
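As a rough illustration of the idea of mining patterns that separate two groups of students, the Python sketch below computes, for each contiguous action n-gram, the fraction of students in each group whose log contains it, and keeps the patterns whose support differs most between the groups. This is a deliberate simplification under stated assumptions: the action names and function names are hypothetical, and the differential sequence mining used in the paper is not reproduced here (it is not restricted to raw contiguous n-gram support compared against a fixed threshold).

```python
from collections import Counter


def ngrams(actions, n):
    """Set of contiguous length-n action patterns in one student's log."""
    return {tuple(actions[i:i + n]) for i in range(len(actions) - n + 1)}


def sequence_support(sequences, n):
    """Fraction of sequences (students) containing each n-gram at least once."""
    counts = Counter()
    for seq in sequences:
        counts.update(ngrams(seq, n))
    return {pattern: c / len(sequences) for pattern, c in counts.items()}


def differential_patterns(group_a, group_b, n=2, min_diff=0.3):
    """Patterns whose support differs by at least min_diff between the groups,
    as (pattern, support_a, support_b), largest difference first."""
    sup_a = sequence_support(group_a, n)
    sup_b = sequence_support(group_b, n)
    rows = []
    for pattern in set(sup_a) | set(sup_b):
        a, b = sup_a.get(pattern, 0.0), sup_b.get(pattern, 0.0)
        if abs(a - b) >= min_diff:
            rows.append((pattern, a, b))
    # Sort by descending difference; break ties by pattern for a stable order.
    rows.sort(key=lambda r: (-abs(r[1] - r[2]), r[0]))
    return rows
```

For example, if every high-performing student's log contains the hypothetical pattern `("edit", "run")` and no low-performing student's log does, that pattern is ranked first with supports 1.0 and 0.0.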

preprint, 2020, arXiv

6D Object Pose Regression via Supervised Learning on Point Clouds

This paper addresses the task of estimating the 6-degree-of-freedom pose of a known 3D object from depth information represented by a point cloud. Deep features learned by convolutional neural networks from color information have been the dominant features for inferring object poses, while depth information has received much less attention. However, depth information contains rich geometric information about the object shape, which is important for inferring the object pose. We use depth information represented by point clouds as the input to both deep networks and geometry-based pose refinement, and we use separate networks for rotation and translation regression. We argue that the axis-angle representation is a suitable rotation representation for deep learning, and we use a geodesic loss function for rotation regression. Ablation studies show that these design choices outperform alternatives such as the quaternion representation and L2 loss, or regressing translation and rotation with the same network. Our simple yet effective approach clearly outperforms state-of-the-art methods on the YCB-video dataset. The implementation and trained model are available at: https://github.com/GeeeG/CloudPose.
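Because the rotation branch regresses an axis-angle vector and is trained with a geodesic loss, it may help to see how such a distance is computed. This Python sketch converts axis-angle vectors to rotation matrices with Rodrigues' formula and measures the angle of the relative rotation, arccos((trace(R1^T R2) - 1) / 2). It is a plain reference computation under the assumption that the loss is this angular distance; it is not the CloudPose implementation, which would compute the loss on batched tensors inside a deep learning framework, and the function names are hypothetical.

```python
import math


def axis_angle_to_matrix(v):
    """Rodrigues' formula: axis-angle vector v (unit axis scaled by the
    rotation angle in radians) -> 3x3 rotation matrix as nested lists."""
    theta = math.sqrt(sum(comp * comp for comp in v))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (comp / theta for comp in v)
    c, s = math.cos(theta), math.sin(theta)
    t = 1.0 - c
    # R = cos(theta) I + sin(theta) [k]_x + (1 - cos(theta)) k k^T
    return [
        [c + kx * kx * t, kx * ky * t - kz * s, kx * kz * t + ky * s],
        [ky * kx * t + kz * s, c + ky * ky * t, ky * kz * t - kx * s],
        [kz * kx * t - ky * s, kz * ky * t + kx * s, c + kz * kz * t],
    ]


def geodesic_distance(r1, r2):
    """Angle (radians) of the relative rotation r1^T r2."""
    # trace(r1^T r2) equals the elementwise (Frobenius) inner product of r1 and r2.
    trace = sum(r1[k][i] * r2[k][i] for i in range(3) for k in range(3))
    # Clamp so rounding cannot push the cosine outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)
```

For example, the distance between rotations of 0.4 rad and 1.0 rad about the same axis is 0.6 rad. The arccos form is numerically sensitive near zero, so distances between nearly identical rotations are only accurate to roughly 1e-8.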

preprint, 2020, arXiv

CheXplain: Enabling Physicians to Explore and Understand Data-Driven, AI-Enabled Medical Imaging Analysis

The recent development of data-driven AI promises to automate medical diagnosis; however, most AI systems function as 'black boxes' to physicians with limited computational knowledge. Using medical imaging as a point of departure, we conducted three iterations of design activities to formulate CheXplain, a system that enables physicians to explore and understand AI-enabled chest X-ray analysis: (1) a paired survey between referring physicians and radiologists reveals whether, when, and what kinds of explanations are needed; (2) a low-fidelity prototype co-designed with three physicians formulates eight key features; and (3) a high-fidelity prototype evaluated by another six physicians provides detailed summative insights on how each feature enables the exploration and understanding of AI. We conclude by discussing recommendations for future work on designing and implementing explainable medical AI systems, which encompass four recurring themes: motivation, constraint, explanation, and justification.
