Researcher profile

Hongkai Chen

Hongkai Chen contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2026arXiv

Pro$^2$Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks

Procedural tasks with multiple ordered steps are ubiquitous in daily life. Recent advances in multimodal large language models (MLLMs) have enabled personal assistants that support daily activities. However, existing systems primarily provide reactive guidance triggered by user queries, or limited proactive assistance for isolated short-term events rather than long-horizon procedural tasks. In this work, we introduce Pro$^2$Assist, a step-aware proactive assistant that continuously tracks fine-grained task progress and reasons over the user's evolving state to provide timely assistance throughout tasks. Pro$^2$Assist leverages multimodal data from augmented reality (AR) glasses to achieve motion-based perception. It then extracts step-oriented procedural context from multi-scale temporal dynamics and task-specific expert knowledge. Based on both sensory input and procedural context, Pro$^2$Assist performs continuous reasoning to infer user needs and display timely assistance on AR glasses. We evaluate Pro$^2$Assist using a dataset curated from public sources and a real-world dataset collected on our testbed with AR glasses. Extensive evaluations show that Pro$^2$Assist outperforms the best-performing baselines by over 21% in procedural action understanding accuracy, and it achieves up to 2.29x the proactive timing accuracy of baselines. A user study with 20 participants further shows that 90% find Pro$^2$Assist useful, indicating its effectiveness for real-world procedural assistance.

preprint2022arXiv

An STL-based Approach to Resilient Control for Cyber-Physical Systems

We present ResilienC, a framework for resilient control of Cyber-Physical Systems subject to STL-based requirements. ResilienC utilizes a recently developed formalism for specifying CPS resiliency in terms of sets of $(\mathit{rec},\mathit{dur})$ real-valued pairs, where $\mathit{rec}$ represents the system's capability to rapidly recover from a property violation (recoverability), and $\mathit{dur}$ is reflective of its ability to avoid violations post-recovery (durability). We define the resilient STL control problem as one of multi-objective optimization, where the recoverability and durability of the desired STL specification are maximized. When neither objective is prioritized over the other, the solution to the problem is a set of Pareto-optimal system trajectories. We present a precise solution method to the resilient STL control problem using a mixed-integer linear programming encoding and an a posteriori $ε$-constraint approach for efficiently retrieving the complete set of optimally resilient solutions. In ResilienC, at each time-step, the optimal control action selected from the set of Pareto-optimal solutions by a Decision Maker strategy realizes a form of Model Predictive Control. We demonstrate the practical utility of the ResilienC framework on two significant case studies: autonomous vehicle lane keeping and deadline-driven, multi-region package delivery.

preprint2022arXiv

An STL-based Formulation of Resilience in Cyber-Physical Systems

Resiliency is the ability to quickly recover from a violation and avoid future violations for as long as possible. Such a property is of fundamental importance for Cyber-Physical Systems (CPS), and yet, to date, there is no widely agreed-upon formal treatment of CPS resiliency. We present an STL-based framework for reasoning about resiliency in CPS in which resiliency has a syntactic characterization in the form of an STL-based Resiliency Specification (SRS). Given an arbitrary STL formula $φ$, time bounds $α$ and $β$, the SRS of $φ$, $R_{α,β}(φ)$, is the STL formula $\negφ\mathbf{U}_{[0,α]}\mathbf{G}_{[0,β)}φ$, specifying that recovery from a violation of $φ$ occur within time $α$ (recoverability), and subsequently that $φ$ be maintained for duration $β$ (durability). These $R$-expressions, which are atoms in our SRS logic, can be combined using STL operators, allowing one to express composite resiliency specifications, e.g., multiple SRSs must hold simultaneously, or the system must eventually be resilient. We define a quantitative semantics for SRSs in the form of a Resilience Satisfaction Value (ReSV) function $r$ and prove its soundness and completeness w.r.t. STL's Boolean semantics. The $r$-value for $R_{α,β}(φ)$ atoms is a singleton set containing a pair quantifying recoverability and durability. The $r$-value for a composite SRS formula results in a set of non-dominated recoverability-durability pairs, given that the ReSVs of subformulas might not be directly comparable (e.g., one subformula has superior durability but worse recoverability than another). To the best of our knowledge, this is the first multi-dimensional quantitative semantics for an STL-based logic. Two case studies demonstrate the practical utility of our approach.

preprint2022arXiv

ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer

Generating robust and reliable correspondences across images is a fundamental task for a diversity of applications. To capture context at both global and local granularity, we propose ASpanFormer, a Transformer-based detector-free matcher that is built on hierarchical attention structure, adopting a novel attention operation which is capable of adjusting attention span in a self-adaptive manner. To achieve this goal, first, flow maps are regressed in each cross attention phase to locate the center of search region. Next, a sampling grid is generated around the center, whose size, instead of being empirically configured as fixed, is adaptively computed from a pixel uncertainty estimated along with the flow map. Finally, attention is computed across two images within derived regions, referred to as attention span. By these means, we are able to not only maintain long-range dependencies, but also enable fine-grained attention among pixels of high relevance that compensates essential locality and piece-wise smoothness in matching tasks. State-of-the-art accuracy on a wide range of evaluation benchmarks validates the strong matching capability of our method.

preprint2022arXiv

CarFi: Rider Localization Using Wi-Fi CSI

With the rise of hailing services, people are increasingly relying on shared mobility (e.g., Uber, Lyft) drivers to pick up for transportation. However, such drivers and riders have difficulties finding each other in urban areas as GPS signals get blocked by skyscrapers, in crowded environments (e.g., in stadiums, airports, and bars), at night, and in bad weather. It wastes their time, creates a bad user experience, and causes more CO2 emissions due to idle driving. In this work, we explore the potential of Wi-Fi to help drivers to determine the street side of the riders. Our proposed system is called CarFi that uses Wi-Fi CSI from two antennas placed inside a moving vehicle and a data-driven technique to determine the street side of the rider. By collecting real-world data in realistic and challenging settings by blocking the signal with other people and other parked cars, we see that CarFi is 95.44% accurate in rider-side determination in both line of sight (LoS) and non-line of sight (nLoS) conditions, and can be run on an embedded GPU in real-time.

preprint2021arXiv

PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency

Removing outlier correspondences is one of the critical steps for successful feature-based point cloud registration. Despite the increasing popularity of introducing deep learning methods in this field, spatial consistency, which is essentially established by a Euclidean transformation between point clouds, has received almost no individual attention in existing learning frameworks. In this paper, we present PointDSC, a novel deep neural network that explicitly incorporates spatial consistency for pruning outlier correspondences. First, we propose a nonlocal feature aggregation module, weighted by both feature and spatial coherence, for feature embedding of the input correspondences. Second, we formulate a differentiable spectral matching module, supervised by pairwise spatial compatibility, to estimate the inlier confidence of each correspondence from the embedded features. With modest computation cost, our method outperforms the state-of-the-art hand-crafted and learning-based outlier rejection approaches on several real-world datasets by a significant margin. We also show its wide applicability by combining PointDSC with different 3D local descriptors.

preprint2020arXiv

ASLFeat: Learning Local Features of Accurate Shape and Localization

This work focuses on mitigating two limitations in the joint learning of local feature detectors and descriptors. First, the ability to estimate the local shape (scale, orientation, etc.) of feature points is often neglected during dense feature extraction, while the shape-awareness is crucial to acquire stronger geometric invariance. Second, the localization accuracy of detected keypoints is not sufficient to reliably recover camera geometry, which has become the bottleneck in tasks such as 3D reconstruction. In this paper, we present ASLFeat, with three light-weight yet effective modifications to mitigate above issues. First, we resort to deformable convolutional networks to densely estimate and apply local transformation. Second, we take advantage of the inherent feature hierarchy to restore spatial resolution and low-level details for accurate keypoint localization. Finally, we use a peakiness measurement to relate feature responses and derive more indicative detection scores. The effect of each modification is thoroughly studied, and the evaluation is extensively conducted across a variety of practical scenarios. State-of-the-art results are reported that demonstrate the superiority of our methods.

preprint2020arXiv

MPC-guided Imitation Learning of Neural Network Policies for the Artificial Pancreas

Even though model predictive control (MPC) is currently the main algorithm for insulin control in the artificial pancreas (AP), it usually requires complex online optimizations, which are infeasible for resource-constrained medical devices. MPC also typically relies on state estimation, an error-prone process. In this paper, we introduce a novel approach to AP control that uses Imitation Learning to synthesize neural-network insulin policies from MPC-computed demonstrations. Such policies are computationally efficient and, by instrumenting MPC at training time with full state information, they can directly map measurements into optimal therapy decisions, thus bypassing state estimation. We apply Bayesian inference via Monte Carlo Dropout to learn policies, which allows us to quantify prediction uncertainty and thereby derive safer therapy decisions. We show that our control policies trained under a specific patient model readily generalize (in terms of model parameters and disturbance distributions) to patient cohorts, consistently outperforming traditional MPC with state estimation.