Researcher profile

Gang Tan

Gang Tan contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2026arXiv

On the Robustness of Fairness Practices: A Causal Framework for Systematic Evaluation

Machine learning (ML) algorithms are increasingly deployed to make critical decisions in socioeconomic applications such as finance, criminal justice, and autonomous driving. However, due to their data-driven and pattern-seeking nature, ML algorithms may develop decision logic that disproportionately distributes opportunities, benefits, resources, or information among different population groups, potentially harming marginalized communities. In response to such fairness concerns, the software engineering and ML communities have made significant efforts to establish the best practices for creating fair ML software. These include fairness interventions for training ML models, such as including sensitive features, selecting non-sensitive attributes, and applying bias mitigators. But how reliably can software professionals tasked with developing data-driven systems depend on these recommendations? And how well do these practices generalize in the presence of faulty labels, missing data, or distribution shifts? These questions form the core theme of this paper.

preprint2026arXiv

SandCell: Sandboxing Rust Beyond Unsafe Code

Rust is a modern systems programming language that ensures memory safety by enforcing ownership and borrowing rules at compile time. While the unsafe keyword allows programmers to bypass these restrictions, it introduces significant risks. Various approaches for isolating unsafe code to protect safe Rust from vulnerabilities have been proposed, yet these methods provide only fixed isolation boundaries and do not accommodate expressive policies that require sandboxing both safe and unsafe code. This paper presents SandCell for flexible and lightweight isolation in Rust by leveraging existing syntactic boundaries. SandCell allows programmers to specify which components to sandbox with minimal annotation effort, enabling fine-grained control over isolation. The system also introduces novel techniques to minimize overhead when transferring data between sandboxes. Our evaluation demonstrates SandCell's effectiveness in preventing vulnerabilities across various Rust applications while maintaining reasonable performance overheads.

preprint2022arXiv

$μ$Dep: Mutation-based Dependency Generation for Precise Taint Analysis on Android Native Code

The existence of native code in Android apps plays an important role in triggering inconspicuous propagation of secrets and circumventing malware detection. However, the state-of-the-art information-flow analysis tools for Android apps all have limited capabilities of analyzing native code. Due to the complexity of binary-level static analysis, most static analyzers choose to build conservative models for a selected portion of native code. Though the recent inter-language analysis improves the capability of tracking information flow in native code, it is still far from attaining similar effectiveness of the state-of-the-art information-flow analyzers that focus on non-native Java methods. To overcome the above constraints, we propose a new analysis framework, $μ$Dep, to detect sensitive information flows of the Android apps containing native code. In this framework, we combine a control-flow based static binary analysis with a mutation-based dynamic analysis to model the tainting behaviors of native code in the apps. Based on the result of the analyses, $μ$Dep conducts a stub generation for the related native functions to facilitate the state-of-the-art analyzer DroidSafe with fine-grained tainting behavior summaries of native code. The experimental results show that our framework is competitive on the accuracy, and effective in analyzing the information flows in real-world apps and malware compared with the state-of-the-art inter-language static analysis.

preprint2022arXiv

Building a Privacy-Preserving Smart Camera System

Millions of consumers depend on smart camera systems to remotely monitor their homes and businesses. However, the architecture and design of popular commercial systems require users to relinquish control of their data to untrusted third parties, such as service providers (e.g., the cloud). Third parties therefore can (and in some instances have) access the video footage without the users' knowledge or consent -- violating the core tenet of user privacy. In this paper, we present CaCTUs, a privacy-preserving smart Camera system Controlled Totally by Users. CaCTUs returns control to the user; the root of trust begins with the user and is maintained through a series of cryptographic protocols, designed to support popular features, such as sharing, deleting, and viewing videos live. We show that the system can support live streaming with a latency of 2s at a frame rate of 10fps and a resolution of 480p. In so doing, we demonstrate that it is feasible to implement a performant smart-camera system that leverages the convenience of a cloud-based model while retaining the ability to control access to (private) data.

preprint2022arXiv

Fairness-aware Configuration of Machine Learning Libraries

This paper investigates the parameter space of machine learning (ML) algorithms in aggravating or mitigating fairness bugs. Data-driven software is increasingly applied in social-critical applications where ensuring fairness is of paramount importance. The existing approaches focus on addressing fairness bugs by either modifying the input dataset or modifying the learning algorithms. On the other hand, the selection of hyperparameters, which provide finer controls of ML algorithms, may enable a less intrusive approach to influence the fairness. Can hyperparameters amplify or suppress discrimination present in the input dataset? How can we help programmers in detecting, understanding, and exploiting the role of hyperparameters to improve the fairness? We design three search-based software testing algorithms to uncover the precision-fairness frontier of the hyperparameter space. We complement these algorithms with statistical debugging to explain the role of these parameters in improving fairness. We implement the proposed approaches in the tool Parfait-ML (PARameter FAIrness Testing for ML Libraries) and show its effectiveness and utility over five mature ML algorithms as used in six social-critical applications. In these applications, our approach successfully identified hyperparameters that significantly improve (vis-a-vis the state-of-the-art techniques) the fairness without sacrificing precision. Surprisingly, for some algorithms (e.g., random forest), our approach showed that certain configuration of hyperparameters (e.g., restricting the search space of attributes) can amplify biases across applications. Upon further investigation, we found intuitive explanations of these phenomena, and the results corroborate similar observations from the literature.

preprint2022arXiv

Privacy-Preserving Protocols for Smart Cameras and Other IoT Devices

Millions of consumers depend on smart camera systems to remotely monitor their homes and businesses. However, the architecture and design of popular commercial systems require users to relinquish control of their data to untrusted third parties, such as service providers (e.g., the cloud). Third parties therefore can (and in some instances have) access the video footage without the users' knowledge or consent -- violating the core tenet of user privacy. In this paper, we introduce CaCTUs, a privacy-preserving smart camera system that returns control to the user; the root of trust begins with the user and is maintained through a series of cryptographic protocols designed to support popular features, such as sharing, deleting, and viewing videos live. In so doing, we demonstrate that it is feasible to implement a performant smart-camera system that leverages the convenience of a cloud-based model while retaining the ability to control access to (private) data. We then discuss how our techniques and protocols can also be extended to privacy-preserving designs of other IoT devices recording time series data.

preprint2020arXiv

IoTRepair: Systematically Addressing Device Faults in Commodity IoT (Extended Paper)

IoT devices are decentralized and deployed in un-stable environments, which causes them to be prone to various kinds of faults, such as device failure and network disruption. Yet, current IoT platforms require programmers to handle faults manually, a complex and error-prone task. In this paper, we present IoTRepair, a fault-handling system for IoT that (1)integrates a fault identification module to track faulty devices,(2) provides a library of fault-handling functions for effectively handling different fault types, (3) provides a fault handler on top of the library for autonomous IoT fault handling, with user and developer configuration as input. Through an evaluation in a simulated lab environment and with various fault injectio nmethods,IoTRepair is compared with current fault-handling solutions. The fault handler reduces the incorrect states on average 50.01%, which corresponds to less unsafe and insecure device states. Overall, through a systematic design of an IoT fault handler, we provide users flexibility and convenience in handling complex IoT fault handling, allowing safer IoT environments.

preprint2020arXiv

Methodologies for Quantifying (Re-)randomization Security and Timing under JIT-ROP

Just-in-time return-oriented programming (JIT-ROP) allows one to dynamically discover instruction pages and launch code reuse attacks, effectively bypassing most fine-grained address space layout randomization (ASLR) protection. However, in-depth questions regarding the impact of code (re-)randomization on code reuse attacks have not been studied. For example, how would one compute the re-randomization interval effectively by considering the speed of gadget convergence to defeat JIT-ROP attacks?; how do starting pointers in JIT-ROP impact gadget availability and gadget convergence time?; what impact do fine-grained code randomizations have on the Turing-complete expressive power of JIT-ROP payloads? We conduct a comprehensive measurement study on the effectiveness of fine-grained code randomization schemes, with 5 tools, 20 applications including 6 browsers, 1 browser engine, and 25 dynamic libraries. We provide methodologies to measure JIT-ROP gadget availability, quality, and their Turing-complete expressiveness, as well as to empirically determine the upper bound of re-randomization intervals in re-randomization schemes using the Turing-complete (TC), priority, MOV TC, and payload gadget sets. Experiments show that the upper bound ranges from 1.5 to 3.5 seconds in our tested applications. Besides, our results show that locations of leaked pointers used in JIT-ROP attacks have no impacts on gadget availability, but have an impact on how fast attackers find gadgets. Our results also show that instruction-level single-round randomization thwarts current gadget finding techniques under the JIT-ROP threat model.