Researcher profile

Sarah M. Erfani

Sarah M. Erfani contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2026arXiv

Dimensionality-Aware Anomaly Detection in Learned Representations of Self-Supervised Speech Models

Self-supervised speech models (S3Ms) achieve strong downstream performance, yet their learned representations remain poorly understood under natural and adversarial perturbations. Prior studies rely on representation similarity or global dimensionality, offering limited visibility into local geometric changes. We ask: how do perturbations deform local geometry, and do these shifts track downstream automatic speech recognition (ASR) degradation? To address this, we present GRIDS, a framework using Local Intrinsic Dimensionality (LID) across layer-wise representations in WavLM and wav2vec 2.0. We find that LID increases for all low signal-to noise ratio (SNR) perturbations and diverges at high SNR: benign noise converges toward the clean profile, while adversarial inputs retain early-layer LID elevation. We show LID elevation co-occurs with increased WER, and that layer-wise LID features enable anomaly detection (AUROC 0.78-1.00), opening the door to transcript-free monitoring in S3Ms.

preprint2022arXiv

Performance analysis of coreset selection for quantum implementation of K-Means clustering algorithm

Quantum computing is anticipated to offer immense computational capabilities which could provide efficient solutions to many data science problems. However, the current generation of quantum devices are small and noisy, which makes it difficult to process large data sets relevant for practical problems. Coreset selection aims to circumvent this problem by reducing the size of input data without compromising the accuracy. Recent work has shown that coreset selection can help to implement quantum K-Means clustering problem. However, the impact of coreset selection on the performance of quantum K-Means clustering has not been explored. In this work, we compare the relative performance of two coreset techniques (BFL16 and ONESHOT), and the size of coreset construction in each case, with respect to a variety of data sets and layout the advantages and limitations of coreset selection in implementing quantum algorithms. We also investigated the effect of depolarisation quantum noise and bit-flip error, and implemented the Quantum AutoEncoder technique for surpassing the noise effect. Our work provides useful insights for future implementation of data science algorithms on near-term quantum devices where problem size has been reduced by coreset selection.

preprint2021arXiv

A Deep Adversarial Model for Suffix and Remaining Time Prediction of Event Sequences

Event suffix and remaining time prediction are sequence to sequence learning tasks. They have wide applications in different areas such as economics, digital health, business process management and IT infrastructure monitoring. Timestamped event sequences contain ordered events which carry at least two attributes: the event's label and its timestamp. Suffix and remaining time prediction are about obtaining the most likely continuation of event labels and the remaining time until the sequence finishes, respectively. Recent deep learning-based works for such predictions are prone to potentially large prediction errors because of closed-loop training (i.e., the next event is conditioned on the ground truth of previous events) and open-loop inference (i.e., the next event is conditioned on previously predicted events). In this work, we propose an encoder-decoder architecture for open-loop training to advance the suffix and remaining time prediction of event sequences. To capture the joint temporal dynamics of events, we harness the power of adversarial learning techniques to boost prediction performance. We consider four real-life datasets and three baselines in our experiments. The results show improvements up to four times compared to the state of the art in suffix and remaining time prediction of event sequences, specifically in the realm of business process executions. We also show that the obtained improvements of adversarial training are superior compared to standard training under the same experimental setup.

preprint2020arXiv

Defending Distributed Classifiers Against Data Poisoning Attacks

Support Vector Machines (SVMs) are vulnerable to targeted training data manipulations such as poisoning attacks and label flips. By carefully manipulating a subset of training samples, the attacker forces the learner to compute an incorrect decision boundary, thereby cause misclassifications. Considering the increased importance of SVMs in engineering and life-critical applications, we develop a novel defense algorithm that improves resistance against such attacks. Local Intrinsic Dimensionality (LID) is a promising metric that characterizes the outlierness of data samples. In this work, we introduce a new approximation of LID called K-LID that uses kernel distance in the LID calculation, which allows LID to be calculated in high dimensional transformed spaces. We introduce a weighted SVM against such attacks using K-LID as a distinguishing characteristic that de-emphasizes the effect of suspicious data samples on the SVM decision boundary. Each sample is weighted on how likely its K-LID value is from the benign K-LID distribution rather than the attacked K-LID distribution. We then demonstrate how the proposed defense can be applied to a distributed SVM framework through a case study on an SDR-based surveillance system. Experiments with benchmark data sets show that the proposed defense reduces classification error rates substantially (10% on average).

preprint2020arXiv

Defending Regression Learners Against Poisoning Attacks

Regression models, which are widely used from engineering applications to financial forecasting, are vulnerable to targeted malicious attacks such as training data poisoning, through which adversaries can manipulate their predictions. Previous works that attempt to address this problem rely on assumptions about the nature of the attack/attacker or overestimate the knowledge of the learner, making them impractical. We introduce a novel Local Intrinsic Dimensionality (LID) based measure called N-LID that measures the local deviation of a given data point's LID with respect to its neighbors. We then show that N-LID can distinguish poisoned samples from normal samples and propose an N-LID based defense approach that makes no assumptions of the attacker. Through extensive numerical experiments with benchmark datasets, we show that the proposed defense mechanism outperforms the state of the art defenses in terms of prediction accuracy (up to 76% lower MSE compared to an undefended ridge model) and running time.