Source author record

Sudipta Chattopadhyay

Sudipta Chattopadhyay appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Cryptography and Security Machine Learning Artificial Intelligence Computation and Language Computer Vision Data Structures and Algorithms Robotics Software Engineering

Catalog footprint

What is connected

13works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2025arXiv

ChatIoT: Large Language Model-based Security Assistant for Internet of Things with Retrieval-Augmented Generation

Internet of Things (IoT) has gained widespread popularity, revolutionizing industries and daily life. However, it has also emerged as a prime target for attacks. Numerous efforts have been made to improve IoT security, and substantial IoT security and threat information, such as datasets and reports, have been developed. However, existing research often falls short in leveraging these insights to assist or guide users in harnessing IoT security practices in a clear and actionable way. In this paper, we propose ChatIoT, a large language model (LLM)-based IoT security assistant designed to disseminate IoT security and threat intelligence. By leveraging the versatile property of retrieval-augmented generation (RAG), ChatIoT successfully integrates the advanced language understanding and reasoning capabilities of LLM with fast-evolving IoT security information. Moreover, we develop an end-to-end data processing toolkit to handle heterogeneous datasets. This toolkit converts datasets of various formats into retrievable documents and optimizes chunking strategies for efficient retrieval. Additionally, we define a set of common use case specifications to guide the LLM in generating answers aligned with users' specific needs and expertise levels. Finally, we implement a prototype of ChatIoT and conduct extensive experiments with different LLMs, such as LLaMA3, LLaMA3.1, and GPT-4o. Experimental evaluations demonstrate that ChatIoT can generate more reliable, relevant, and technical in-depth answers for most use cases. When evaluating the answers with LLaMA3:70B, ChatIoT improves the above metrics by over 10% on average, particularly in relevance and technicality, compared to using LLMs alone.

preprint2023arXiv

Towards Backdoor Attacks and Defense in Robust Machine Learning Models

The introduction of robust optimisation has pushed the state-of-the-art in defending against adversarial attacks. Notably, the state-of-the-art projected gradient descent (PGD)-based training method has been shown to be universally and reliably effective in defending against adversarial inputs. This robustness approach uses PGD as a reliable and universal "first-order adversary". However, the behaviour of such optimisation has not been studied in the light of a fundamentally different class of attacks called backdoors. In this paper, we study how to inject and defend against backdoor attacks for robust models trained using PGD-based robust optimisation. We demonstrate that these models are susceptible to backdoor attacks. Subsequently, we observe that backdoors are reflected in the feature representation of such models. Then, this observation is leveraged to detect such backdoor-infected models via a detection technique called AEGIS. Specifically, given a robust Deep Neural Network (DNN) that is trained using PGD-based first-order adversarial training approach, AEGIS uses feature clustering to effectively detect whether such DNNs are backdoor-infected or clean. In our evaluation of several visible and hidden backdoor triggers on major classification tasks using CIFAR-10, MNIST and FMNIST datasets, AEGIS effectively detects PGD-trained robust DNNs infected with backdoors. AEGIS detects such backdoor-infected models with 91.6% accuracy (11 out of 12 tested models), without any false positives. Furthermore, AEGIS detects the targeted class in the backdoor-infected model with a reasonably low (11.1%) false positive rate. Our investigation reveals that salient features of adversarially robust DNNs could be promising to break the stealthy nature of backdoor attacks.

preprint2022arXiv

A Systematic Survey of Attack Detection and Prevention in Connected and Autonomous Vehicles

The number of Connected and Autonomous Vehicles (CAVs) is increasing rapidly in various smart transportation services and applications, considering many benefits to society, people, and the environment. Several research surveys for CAVs were conducted by primarily focusing on various security threats and vulnerabilities in the domain of CAVs to classify different types of attacks, impacts of attacks, attack features, cyber-risk, defense methodologies against attacks, and safety standards. However, the importance of attack detection and prevention approaches for CAVs has not been discussed extensively in the state-of-the-art surveys, and there is a clear gap in the existing literature on such methodologies to detect new and conventional threats and protect the CAV systems from unexpected hazards on the road. Some surveys have a limited discussion on Attacks Detection and Prevention Systems (ADPS), but such surveys provide only partial coverage of different types of ADPS for CAVs. Furthermore, there is a scope for discussing security, privacy, and efficiency challenges in ADPS that can give an overview of important security and performance attributes. This survey paper, therefore, presents the significance of CAVs in the market, potential challenges in CAVs, key requirements of essential security and privacy properties, various capabilities of adversaries, possible attacks in CAVs, and performance evaluation parameters for ADPS. An extensive analysis is discussed of different ADPS categories for CAVs and state-of-the-art research works based on each ADPS category that gives the latest findings in this research domain. This survey also discusses crucial and open security research problems that are required to be focused on the secure deployment of CAVs in the market.

preprint2022arXiv

AequeVox: Automated Fairness Testing of Speech Recognition Systems

Automatic Speech Recognition (ASR) systems have become ubiquitous. They can be found in a variety of form factors and are increasingly important in our daily lives. As such, ensuring that these systems are equitable to different subgroups of the population is crucial. In this paper, we introduce, AequeVox, an automated testing framework for evaluating the fairness of ASR systems. AequeVox simulates different environments to assess the effectiveness of ASR systems for different populations. In addition, we investigate whether the chosen simulations are comprehensible to humans. We further propose a fault localization technique capable of identifying words that are not robust to these varying environments. Both components of AequeVox are able to operate in the absence of ground truth data. We evaluated AequeVox on speech from four different datasets using three different commercial ASRs. Our experiments reveal that non-native English, female and Nigerian English speakers generate 109%, 528.5% and 156.9% more errors, on average than native English, male and UK Midlands speakers, respectively. Our user study also reveals that 82.9% of the simulations (employed through speech transformations) had a comprehensibility rating above seven (out of ten), with the lowest rating being 6.78. This further validates the fairness violations discovered by AequeVox. Finally, we show that the non-robust words, as predicted by the fault localization technique embodied in AequeVox, show 223.8% more errors than the predicted robust words across all ASRs.

preprint2022arXiv

Astraea: Grammar-based Fairness Testing

Software often produces biased outputs. In particular, machine learning (ML) based software are known to produce erroneous predictions when processing discriminatory inputs. Such unfair program behavior can be caused by societal bias. In the last few years, Amazon, Microsoft and Google have provided software services that produce unfair outputs, mostly due to societal bias (e.g. gender or race). In such events, developers are saddled with the task of conducting fairness testing. Fairness testing is challenging; developers are tasked with generating discriminatory inputs that reveal and explain biases. We propose a grammar-based fairness testing approach (called ASTRAEA) which leverages context-free grammars to generate discriminatory inputs that reveal fairness violations in software systems. Using probabilistic grammars, ASTRAEA also provides fault diagnosis by isolating the cause of observed software bias. ASTRAEA's diagnoses facilitate the improvement of ML fairness. ASTRAEA was evaluated on 18 software systems that provide three major natural language processing (NLP) services. In our evaluation, ASTRAEA generated fairness violations with a rate of ~18%. ASTRAEA generated over 573K discriminatory test cases and found over 102K fairness violations. Furthermore, ASTRAEA improves software fairness by ~76%, via model-retraining.

preprint2022arXiv

Model Agnostic Defence against Backdoor Attacks in Machine Learning

Machine Learning (ML) has automated a multitude of our day-to-day decision making domains such as education, employment and driving automation. The continued success of ML largely depends on our ability to trust the model we are using. Recently, a new class of attacks called Backdoor Attacks have been developed. These attacks undermine the user's trust in ML models. In this work, we present NEO, a model agnostic framework to detect and mitigate such backdoor attacks in image classification ML models. For a given image classification model, our approach analyses the inputs it receives and determines if the model is backdoored. In addition to this feature, we also mitigate these attacks by determining the correct predictions of the poisoned images. An appealing feature of NEO is that it can, for the first time, isolate and reconstruct the backdoor trigger. NEO is also the first defence methodology, to the best of our knowledge that is completely blackbox. We have implemented NEO and evaluated it against three state of the art poisoned models. These models include highly critical applications such as traffic sign detection (USTS) and facial detection. In our evaluation, we show that NEO can detect $\approx$88% of the poisoned inputs on average and it is as fast as 4.4 ms per input image. We also reconstruct the poisoned input for the user to effectively test their systems.

preprint2021arXiv

Repairing Adversarial Texts through Perturbation

It is known that neural networks are subject to attacks through adversarial perturbations, i.e., inputs which are maliciously crafted through perturbations to induce wrong predictions. Furthermore, such attacks are impossible to eliminate, i.e., the adversarial perturbation is still possible after applying mitigation methods such as adversarial training. Multiple approaches have been developed to detect and reject such adversarial inputs, mostly in the image domain. Rejecting suspicious inputs however may not be always feasible or ideal. First, normal inputs may be rejected due to false alarms generated by the detection algorithm. Second, denial-of-service attacks may be conducted by feeding such systems with adversarial inputs. To address the gap, in this work, we propose an approach to automatically repair adversarial texts at runtime. Given a text which is suspected to be adversarial, we novelly apply multiple adversarial perturbation methods in a positive way to identify a repair, i.e., a slightly mutated but semantically equivalent text that the neural network correctly classifies. Our approach has been experimented with multiple models trained for natural language processing tasks and the results show that our approach is effective, i.e., it successfully repairs about 80\% of the adversarial texts. Furthermore, depending on the applied perturbation method, an adversarial text could be repaired in as short as one second on average.

preprint2020arXiv

Circ-Tree: A B+-Tree Variant with Circular Design for Persistent Memory

Several B+-tree variants have been developed to exploit the performance potential of byte-addressable non-volatile memory (NVM). In this paper, we attentively investigate the properties of B+-tree and find that, a conventional B+-tree node is a linear structure in which key-value (KV) pairs are maintained from the zero offset of the node. These pairs are shifted in a unidirectional fashion for insertions and deletions. Inserting and deleting one KV pair may inflict a large amount of write amplifications due to shifting KV pairs. This badly impairs the performance of in-NVM B+-tree. In this paper, we propose a novel circular design for B+-tree. With regard to NVM's byte-addressability, our Circ-tree design embraces tree nodes in a circular structure without a fixed base address, and bidirectionally shifts KV pairs in a node for insertions and deletions to minimize write amplifications. We have implemented a prototype for Circ-Tree and conducted extensive experiments. Experimental results show that Circ-Tree significantly outperforms two state-of-the-art in-NVM B+-tree variants, i.e., NV-tree and FAST+FAIR, by up to 1.6x and 8.6x, respectively, in terms of write performance. The end-to-end comparison by running YCSB to KV store systems built on NV-tree, FAST+FAIR, and Circ-Tree reveals that Circ-Tree yields up to 29.3% and 47.4% higher write performance, respectively, than NV-tree and FAST+FAIR.

preprint2020arXiv

Securing Autonomous Service Robots through Fuzzing, Detection, and Mitigation

Autonomous service robots share social spaces with humans, usually working together for domestic or professional tasks. Cyber security breaches in such robots undermine the trust between humans and robots. In this paper, we investigate how to apprehend and inflict security threats at the design and implementation stage of a movable autonomous service robot. To this end, we leverage the idea of directed fuzzing and design RoboFuzz that systematically tests an autonomous service robot in line with the robot's states and the surrounding environment. The methodology of RoboFuzz is to study critical environmental parameters affecting the robot's state transitions and subject the robot control program with rational but harmful sensor values so as to compromise the robot. Furthermore, we develop detection and mitigation algorithms to counteract the impact of RoboFuzz. The difficulties mainly lie in the trade-off among limited computation resources, timely detection and the retention of work efficiency in mitigation. In particular, we propose detection and mitigation methods that take advantage of historical records of obstacles to detect inconsistent obstacle appearances regarding untrustworthy sensor values and navigate the movable robot to continue moving so as to carry on a planned task. By doing so, we manage to maintain a low cost for detection and mitigation but also retain the robot's work efficacy. We have prototyped the bundle of RoboFuzz, detection and mitigation algorithms in a real-world movable robot. Experimental results confirm that RoboFuzz makes a success rate of up to 93.3% in imposing concrete threats to the robot while the overall loss of work efficacy is merely 4.1% at the mitigation mode.

preprint2020arXiv

STITCHER: Correlating Digital Forensic Evidence on Internet-of-Things Devices

The increasing adoption of Internet-of-Things (IoT) devices present new challenges to digital forensic investigators and law enforcement agencies when investigation into cybercrime on these new platforms are required. However, there has been no formal study to document actual challenges faced by investigators and whether existing tools help them in their work. Prior issues such as the correlation and consistency problem in digital forensic evidence have also become a pressing concern in light of numerous evidence sources from IoT devices. Motivated by these observations, we conduct a user study with 39 digital forensic investigators from both public and private sectors to document the challenges they faced in traditional and IoT digital forensics. We also created a tool, STITCHER, that addresses the technical challenges faced by investigators when handling IoT digital forensics investigation. We simulated an IoT crime that mimics sophisticated cybercriminals and invited our user study participants to utilize STITCHER to investigate the crime. The efficacy of STITCHER is confirmed by our study results where 96.2% of users indicated that STITCHER assisted them in handling the crime, and 61.5% of users who used STITCHER with its full features solved the crime completely.

preprint2019arXiv

KLEESPECTRE: Detecting Information Leakage through Speculative Cache Attacks via Symbolic Execution

Spectre attacks disclosed in early 2018 expose data leakage scenarios via cache side channels. Specifically, speculatively executed paths due to branch mis-prediction may bring secret data into the cache which are then exposed via cache side channels even after the speculative execution is squashed. Symbolic execution is a well-known test generation method to cover program paths at the level of the application software. In this paper, we extend symbolic execution with modelingof cache and speculative execution. Our tool KLEESPECTRE, built on top of the KLEE symbolic execution engine, can thus provide a testing engine to check for the data leakage through cache side-channel as shown via Spectre attacks. Our symbolic cache model can verify whether the sensitive data leakage due to speculative execution can be observed by an attacker at a given program point. Our experiments show that KLEESPECTREcan effectively detect data leakage along speculatively executed paths and our cache model can further make the leakage detection much more precise.

preprint2019arXiv

Road Context-aware Intrusion Detection System for Autonomous Cars

Security is of primary importance to vehicles. The viability of performing remote intrusions onto the in-vehicle network has been manifested. In regard to unmanned autonomous cars, limited work has been done to detect intrusions for them while existing intrusion detection systems (IDSs) embrace limitations against strong adversaries. In this paper, we consider the very nature of autonomous car and leverage the road context to build a novel IDS, named Road context-aware IDS (RAIDS). When a computer-controlled car is driving through continuous roads, road contexts and genuine frames transmitted on the car's in-vehicle network should resemble a regular and intelligible pattern. RAIDS hence employs a lightweight machine learning model to extract road contexts from sensory information (e.g., camera images and distance sensor values) that are used to generate control signals for maneuvering the car. With such ongoing road context, RAIDS validates corresponding frames observed on the in-vehicle network. Anomalous frames that substantially deviate from road context will be discerned as intrusions. We have implemented a prototype of RAIDS with neural networks, and conducted experiments on a Raspberry Pi with extensive datasets and meaningful intrusion cases. Evaluations show that RAIDS significantly outperforms state-of-the-art IDS without using road context by up to 99.9% accuracy and short response time.

preprint2016arXiv

Quantifying the Information Leak in Cache Attacks through Symbolic Execution

Cache timing attacks allow attackers to infer the properties of a secret execution by observing cache hits and misses. But how much information can actually leak through such attacks? For a given program, a cache model, and an input, our CHALICE framework leverages symbolic execution to compute the amount of information that can possibly leak through cache attacks. At the core of CHALICE is a novel approach to quantify information leak that can highlight critical cache side-channel leaks on arbitrary binary code. In our evaluation on real-world programs from OpenSSL and Linux GDK libraries, CHALICE effectively quantifies information leaks: For an AES-128 implementation on Linux, for instance, CHALICE finds that a cache attack can leak as much as 127 out of 128 bits of the encryption key.

Sudipta Chattopadhyay

What is connected

Connect this record

See the researcher in context

Building this map preview

13 published item(s)

ChatIoT: Large Language Model-based Security Assistant for Internet of Things with Retrieval-Augmented Generation

Towards Backdoor Attacks and Defense in Robust Machine Learning Models

A Systematic Survey of Attack Detection and Prevention in Connected and Autonomous Vehicles

AequeVox: Automated Fairness Testing of Speech Recognition Systems

Astraea: Grammar-based Fairness Testing

Model Agnostic Defence against Backdoor Attacks in Machine Learning

Repairing Adversarial Texts through Perturbation

Circ-Tree: A B+-Tree Variant with Circular Design for Persistent Memory

Securing Autonomous Service Robots through Fuzzing, Detection, and Mitigation

STITCHER: Correlating Digital Forensic Evidence on Internet-of-Things Devices

KLEESPECTRE: Detecting Information Leakage through Speculative Cache Attacks via Symbolic Execution

Road Context-aware Intrusion Detection System for Autonomous Cars

Quantifying the Information Leak in Cache Attacks through Symbolic Execution