Source author record

Zubair Shafiq

Zubair Shafiq appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Cryptography and Security Machine Learning Computation and Language cs.CY Information Retrieval Networking and Internet Architecture Social and Information Networks

Catalog footprint

What is connected

8works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Nowhere to Hide: Detecting Obfuscated Fingerprinting Scripts

As the web moves away from stateful tracking, browser fingerprinting is becoming more prevalent. Unfortunately, existing approaches to detect browser fingerprinting do not take into account potential evasion tactics such as code obfuscation. To address this gap, we investigate the robustness of a state-of-the-art fingerprinting detection approach against various off-the-shelf obfuscation tools. Overall, we find that the combination of static and dynamic analysis is robust against different types of obfuscation. While some obfuscators are able to induce false negatives in static analysis, dynamic analysis is still able detect these cases. Since obfuscation does not induce significant false positives, the combination of static and dynamic analysis is still able to accurately detect obfuscated fingerprinting scripts.

preprint2022arXiv

On The Robustness of Offensive Language Classifiers

Social media platforms are deploying machine learning based offensive language classification systems to combat hateful, racist, and other forms of offensive speech at scale. However, despite their real-world deployment, we do not yet comprehensively understand the extent to which offensive language classifiers are robust against adversarial attacks. Prior work in this space is limited to studying robustness of offensive language classifiers against primitive attacks such as misspellings and extraneous spaces. To address this gap, we systematically analyze the robustness of state-of-the-art offensive language classifiers against more crafty adversarial attacks that leverage greedy- and attention-based word selection and context-aware embeddings for word replacement. Our results on multiple datasets show that these crafty adversarial attacks can degrade the accuracy of offensive language classifiers by more than 50% while also being able to preserve the readability and meaning of the modified text.

preprint2022arXiv

YouTube, The Great Radicalizer? Auditing and Mitigating Ideological Biases in YouTube Recommendations

Recommendations algorithms of social media platforms are often criticized for placing users in "rabbit holes" of (increasingly) ideologically biased content. Despite these concerns, prior evidence on this algorithmic radicalization is inconsistent. Furthermore, prior work lacks systematic interventions that reduce the potential ideological bias in recommendation algorithms. We conduct a systematic audit of YouTube's recommendation system using a hundred thousand sock puppets to determine the presence of ideological bias (i.e., are recommendations aligned with users' ideology), its magnitude (i.e., are users recommended an increasing number of videos aligned with their ideology), and radicalization (i.e., are the recommendations progressively more extreme). Furthermore, we design and evaluate a bottom-up intervention to minimize ideological bias in recommendations without relying on cooperation from YouTube. We find that YouTube's recommendations do direct users -- especially right-leaning users -- to ideologically biased and increasingly radical content on both homepages and in up-next recommendations. Our intervention effectively mitigates the observed bias, leading to more recommendations to ideologically neutral, diverse, and dissimilar content, yet debiasing is especially challenging for right-leaning users. Our systematic assessment shows that while YouTube recommendations lead to ideological bias, such bias can be mitigated through our intervention.

preprint2020arXiv

A Girl Has A Name: Detecting Authorship Obfuscation

Authorship attribution aims to identify the author of a text based on the stylometric analysis. Authorship obfuscation, on the other hand, aims to protect against authorship attribution by modifying a text's style. In this paper, we evaluate the stealthiness of state-of-the-art authorship obfuscation methods under an adversarial threat model. An obfuscator is stealthy to the extent an adversary finds it challenging to detect whether or not a text modified by the obfuscator is obfuscated - a decision that is key to the adversary interested in authorship attribution. We show that the existing authorship obfuscation methods are not stealthy as their obfuscated texts can be identified with an average F1 score of 0.87. The reason for the lack of stealthiness is that these obfuscators degrade text smoothness, as ascertained by neural language models, in a detectable manner. Our results highlight the need to develop stealthy authorship obfuscation methods that can better protect the identity of an author seeking anonymity.

preprint2020arXiv

A4 : Evading Learning-based Adblockers

Efforts by online ad publishers to circumvent traditional ad blockers towards regaining fiduciary benefits, have been demonstrably successful. As a result, there have recently emerged a set of adblockers that apply machine learning instead of manually curated rules and have been shown to be more robust in blocking ads on websites including social media sites such as Facebook. Among these, AdGraph is arguably the state-of-the-art learning-based adblocker. In this paper, we develop A4, a tool that intelligently crafts adversarial samples of ads to evade AdGraph. Unlike the popular research on adversarial samples against images or videos that are considered less- to un-restricted, the samples that A4 generates preserve application semantics of the web page, or are actionable. Through several experiments we show that A4 can bypass AdGraph about 60% of the time, which surpasses the state-of-the-art attack by a significant margin of 84.3%; in addition, changes to the visual layout of the web page due to these perturbations are imperceptible. We envision the algorithmic framework proposed in A4 is also promising in improving adversarial attacks against other learning-based web applications with similar requirements.

preprint2020arXiv

CanaryTrap: Detecting Data Misuse by Third-Party Apps on Online Social Networks

Online social networks support a vibrant ecosystem of third-party apps that get access to personal information of a large number of users. Despite several recent high-profile incidents, methods to systematically detect data misuse by third-party apps on online social networks are lacking. We propose CanaryTrap to detect misuse of data shared with third-party apps. CanaryTrap associates a honeytoken to a user account and then monitors its unrecognized use via different channels after sharing it with the third-party app. We design and implement CanaryTrap to investigate misuse of data shared with third-party apps on Facebook. Specifically, we share the email address associated with a Facebook account as a honeytoken by installing a third-party app. We then monitor the received emails and use Facebook's ad transparency tool to detect any unrecognized use of the shared honeytoken. Our deployment of CanaryTrap to monitor 1,024 Facebook apps has uncovered multiple cases of misuse of data shared with third-party apps on Facebook including ransomware, spam, and targeted advertising.

preprint2020arXiv

Characterizing Smart Home IoT Traffic in the Wild

As the smart home IoT ecosystem flourishes, it is imperative to gain a better understanding of the unique challenges it poses in terms of management, security, and privacy. Prior studies are limited because they examine smart home IoT devices in testbed environments or at a small scale. To address this gap, we present a measurement study of smart home IoT devices in the wild by instrumenting home gateways and passively collecting real-world network traffic logs from more than 200 homes across a large metropolitan area in the United States. We characterize smart home IoT traffic in terms of its volume, temporal patterns, and external endpoints along with focusing on certain security and privacy concerns. We first show that traffic characteristics reflect the functionality of smart home IoT devices such as smart TVs generating high volume traffic to content streaming services following diurnal patterns associated with human activity. While the smart home IoT ecosystem seems fragmented, our analysis reveals that it is mostly centralized due to its reliance on a few popular cloud and DNS services. Our findings also highlight several interesting security and privacy concerns in smart home IoT ecosystem such as the need to improve policy-based access control for IoT traffic, lack of use of application layer encryption, and prevalence of third-party advertising and tracking services. Our findings have important implications for future research on improving management, security, and privacy of the smart home IoT ecosystem.

preprint2020arXiv

Fingerprinting the Fingerprinters: Learning to Detect Browser Fingerprinting Behaviors

Browser fingerprinting is an invasive and opaque stateless tracking technique. Browser vendors, academics, and standards bodies have long struggled to provide meaningful protections against browser fingerprinting that are both accurate and do not degrade user experience. We propose FP-Inspector, a machine learning based syntactic-semantic approach to accurately detect browser fingerprinting. We show that FP-Inspector performs well, allowing us to detect 26% more fingerprinting scripts than the state-of-the-art. We show that an API-level fingerprinting countermeasure, built upon FP-Inspector, helps reduce website breakage by a factor of 2. We use FP-Inspector to perform a measurement study of browser fingerprinting on top-100K websites. We find that browser fingerprinting is now present on more than 10% of the top-100K websites and over a quarter of the top-10K websites. We also discover previously unreported uses of JavaScript APIs by fingerprinting scripts suggesting that they are looking to exploit APIs in new and unexpected ways.

Zubair Shafiq

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

Nowhere to Hide: Detecting Obfuscated Fingerprinting Scripts

On The Robustness of Offensive Language Classifiers

YouTube, The Great Radicalizer? Auditing and Mitigating Ideological Biases in YouTube Recommendations

A Girl Has A Name: Detecting Authorship Obfuscation

A4 : Evading Learning-based Adblockers

CanaryTrap: Detecting Data Misuse by Third-Party Apps on Online Social Networks

Characterizing Smart Home IoT Traffic in the Wild

Fingerprinting the Fingerprinters: Learning to Detect Browser Fingerprinting Behaviors