Source author record

Shaowei Zhu

Shaowei Zhu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision; Cryptography and Security; Distributed, Parallel, and Cluster Computing; Machine Learning; math.NA; Numerical Analysis

Catalog footprint

What is connected

3 works

6 topics

4 close collaborators

preprint · 2026 · arXiv

TTrace: Lightweight Error Checking and Diagnosis for Distributed Training

Distributed training is essential for scaling the training of large neural network models, such as large language models (LLMs), across thousands of GPUs. However, the complexity of distributed training programs makes them particularly prone to silent bugs, which do not produce explicit error signals but lead to incorrect training outcomes. Effectively detecting and localizing such silent bugs in distributed training is challenging. Common debugging practices based on monitoring training loss or gradient norm curves are indirect, inefficient, and provide no way to localize bugs. To address those challenges, we design and implement TTrace, the first systematic differential testing system for detecting and localizing silent bugs in distributed training. TTrace aligns intermediate tensors from distributed training with those from a trusted reference implementation. To properly compare the floating-point values in the corresponding tensors, we propose a novel mathematical analysis that provides a guideline for setting tolerances, enabling TTrace to distinguish bug-induced errors from numerical errors. Experimental results demonstrate that TTrace effectively detects 11 existing bugs and 3 new bugs in the widely used Megatron-LM framework, while requiring fewer than 10 lines of code changes. TTrace is effective in various training recipes, including low-precision recipes involving BF16 and FP8. Notably, a popular open-source training framework has already adopted the method proposed by TTrace in its development workflow.

preprint · 2023 · arXiv

Reversible Attack based on Local Visual Adversarial Perturbation

Adding perturbations to images can mislead classification models to produce incorrect results. Recently, researchers exploited adversarial perturbations to protect image privacy from retrieval by intelligent models. However, adding adversarial perturbations to images destroys the original data, making images useless in digital forensics and other fields. To prevent illegal or unauthorized access to sensitive image data such as human faces without impeding legitimate users, the use of reversible adversarial attack techniques is increasing. The original image can be recovered from its reversible adversarial examples. However, existing reversible adversarial attack methods are designed for traditional imperceptible adversarial perturbations and ignore the local visible adversarial perturbation. In this paper, we propose a new method for generating reversible adversarial examples based on local visible adversarial perturbation. The information needed for image recovery is embedded into the area beyond the adversarial patch by the reversible data hiding technique. To reduce image distortion, lossless compression and the B-R-G (blue-red-green) embedding principle are adopted. Experiments on CIFAR-10 and ImageNet datasets show that the proposed method can restore the original images error-free while ensuring good attack performance.

preprint · 2022 · arXiv

Verifiable Access Control for Augmented Reality Localization and Mapping

Localization and mapping is a key technology for bridging the virtual and physical worlds in augmented reality (AR). Localization and mapping works by creating and querying maps made of anchor points that enable the overlay of these two worlds. As a result, information about the physical world is captured in the map and naturally gives rise to concerns around who can map physical spaces as well as who can access or modify the virtual ones. This paper discusses how we can provide access controls over virtual maps as a basic building block to enhance security and privacy of AR systems. In particular, we propose VACMaps: an access control system for localization and mapping using formal methods. VACMaps defines a domain-specific language that enables users to specify access control policies for virtual spaces. Access requests to virtual spaces are then evaluated against relevant policies in a way that preserves confidentiality and integrity of virtual spaces owned by the users. The precise semantics of the policies are defined by SMT formulas, which allow VACMaps to reason about properties of access policies automatically. An evaluation of VACMaps is provided using an AR testbed of a single-family home. We show that VACMaps is scalable in that it can run at practical speeds and that it can also reason about access control policies automatically to detect potential policy misconfigurations.