Researcher profile

Xin Lai

Xin Lai contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2026arXiv

Safe, or Simply Incapable? Rethinking Safety Evaluation for Phone-Use Agents

When a phone-use agent avoids harm, does that show safety, or simply inability to act? Existing evaluations often cannot tell. A harmful outcome may be avoided because the agent recognized the risk and chose the safe action, or because it failed to understand the screen or execute any relevant action at all. These cases have different causes and call for different fixes, yet current benchmarks often merge them under task success, refusal, or final harmful outcome. We address this problem with PhoneSafety, a benchmark of 700 safety-critical moments drawn from real phone interactions across more than 130 apps. Each instance isolates the next decision at a risky moment and asks a simple question: does the model take the safe action, take the unsafe action, or fail to do anything useful? We evaluate eight representative phone-use agents under this framework. Our results reveal two main patterns. First, stronger general phone-use ability does not reliably imply safer choices at risky moments. Models that perform better on ordinary app tasks are not always the ones that behave more safely when the next action matters. Second, failures to do anything useful behave like a capability signal rather than a safety signal: they are concentrated in more visually and operationally demanding settings and remain stable when the evaluation protocol changes. Across models, failures split into two recurring patterns: unsafe choices in settings where the model can act but chooses wrongly, and inability to act in more visually and operationally demanding screens. Overall, a harmless outcome is not enough to count as evidence of safety. Evaluating phone-use agents requires separating unsafe judgment from inability to act.

preprint2022arXiv

DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation

Unsupervised domain adaptation in semantic segmentation has been raised to alleviate the reliance on expensive pixel-wise annotations. It leverages a labeled source domain dataset as well as unlabeled target domain images to learn a segmentation network. In this paper, we observe two main issues of the existing domain-invariant learning framework. (1) Being distracted by the feature distribution alignment, the network cannot focus on the segmentation task. (2) Fitting source domain data well would compromise the target domain performance. To address these issues, we propose DecoupleNet that alleviates source domain overfitting and enables the final model to focus more on the segmentation task. Furthermore, we put forward Self-Discrimination (SD) and introduce an auxiliary classifier to learn more discriminative target domain features with pseudo labels. Finally, we propose Online Enhanced Self-Training (OEST) to contextually enhance the quality of pseudo labels in an online manner. Experiments show our method outperforms existing state-of-the-art methods, and extensive ablation studies verify the effectiveness of each component. Code is available at https://github.com/dvlab-research/DecoupleNet.

preprint2022arXiv

Generalized Few-shot Semantic Segmentation

Training semantic segmentation models requires a large amount of finely annotated data, making it hard to quickly adapt to novel classes not satisfying this condition. Few-Shot Segmentation (FS-Seg) tackles this problem with many constraints. In this paper, we introduce a new benchmark, called Generalized Few-Shot Semantic Segmentation (GFS-Seg), to analyze the generalization ability of simultaneously segmenting the novel categories with very few examples and the base categories with sufficient examples. It is the first study showing that previous representative state-of-the-art FS-Seg methods fall short in GFS-Seg and the performance discrepancy mainly comes from the constrained setting of FS-Seg. To make GFS-Seg tractable, we set up a GFS-Seg baseline that achieves decent performance without structural change on the original model. Then, since context is essential for semantic segmentation, we propose the Context-Aware Prototype Learning (CAPL) that significantly improves performance by 1) leveraging the co-occurrence prior knowledge from support samples, and 2) dynamically enriching contextual information to the classifier, conditioned on the content of each query image. Both two contributions are experimentally shown to have substantial practical merit. Extensive experiments on Pascal-VOC and COCO manifest the effectiveness of CAPL, and CAPL generalizes well to FS-Seg by achieving competitive performance. Code is available at https://github.com/dvlab-research/GFS-Seg.

preprint2022arXiv

Peridynamic stress is the static first Piola-Kirchhoff Virial stress

The peridynamic stress formula proposed by Lehoucq and Silling [1, 2] is cumbersome to be implemented in numerical computations. Here, we show that the peridynamic stress tensor has the exact mathematical expression as that of the first Piola-Kirchhoff static Virial stress originated from Irving-Kirkwood-Noll formalism [3, 4] through the Hardy-Murdoch procedure [5, 6], which offers a simple and clear expression for numerical calculations of peridynamic stress. Several numerical verifications have been carried out to validate the accuracy of proposed peridynamic stress formula in predicting the stress states in the vicinity of the crack tip and other sources of stress concentration. The peridynamic stress is evaluated within the bond-based peridynamics with prototype microelastic brittle (PMB) material model. It is found that the PMB material model may exhibit nonlinear constitutive behaviors at large deformations. The stress fields calculated through the proposed peridynamic stress formula show good agreements with finite element analysis results, analytical solutions, and experimental data, demonstrating the promising potential of derived peridynamic stress formula in simulating the stress states of problems with discontinuities, especially in the bond-based peridynamics.

preprint2022arXiv

Stratified Transformer for 3D Point Cloud Segmentation

3D point cloud segmentation has made tremendous progress in recent years. Most current methods focus on aggregating local features, but fail to directly model long-range dependencies. In this paper, we propose Stratified Transformer that is able to capture long-range contexts and demonstrates strong generalization ability and high performance. Specifically, we first put forward a novel key sampling strategy. For each query point, we sample nearby points densely and distant points sparsely as its keys in a stratified way, which enables the model to enlarge the effective receptive field and enjoy long-range contexts at a low computational cost. Also, to combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information, which facilitates convergence and boosts performance. Besides, we adopt contextual relative position encoding to adaptively capture position information. Finally, a memory-efficient implementation is introduced to overcome the issue of varying point numbers in each window. Extensive experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets. Code is available at https://github.com/dvlab-research/Stratified-Transformer.

preprint2020arXiv

CoinMagic: A Differential Privacy Framework for Ring Signature Schemes

By allowing users to obscure their transactions via including "mixins" (chaff coins), ring signature schemes have been widely used to protect a sender's identity of a transaction in privacy-preserving blockchain systems, like Monero and Bytecoin. However, recent works point out that the existing ring signature scheme is vulnerable to the "chain-reaction" analysis (i.e., the spent coin in a given ring signature can be deduced through elimination). Especially, when the diversity of mixins is low, the spent coin will have a high risk to be detected. To overcome the weakness, the ring signature should be consisted of a set of mixins with high diversity and produce observations having "similar" distributions for any two coins. In this paper, we propose a notion, namely $ε$-coin-indistinguishability ($ε$-CI), to formally define the "similar" distribution guaranteed through a differential privacy scheme. Then, we formally define the CI-aware mixins selection problem with disjoint-superset constraint (CIA-MS-DS), which aims to find a mixin set that has maximal diversity and satisfies the constraints of $ε$-CI and the budget. In CIA-MS-DS, each ring signature is either disjoint with or the superset of its preceding ring signatures. We prove that CIA-MS-DS is NP-hard and thus intractable. To solve the CIA-MS-DS problem, we propose two approximation algorithms, namely the Progressive Algorithm and the Game Theoretic Algorithm, with theoretic guarantees. Through extensive experiments on both real data sets and synthetic data sets, we demonstrate the efficiency and the effectiveness of our approaches.