Researcher profile

Salman Toor

Salman Toor contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2022arXiv

FedQAS: Privacy-aware machine reading comprehension with federated learning

Machine reading comprehension (MRC) of text data is one important task in Natural Language Understanding. It is a complex NLP problem with a lot of ongoing research fueled by the release of the Stanford Question Answering Dataset (SQuAD) and Conversational Question Answering (CoQA). It is considered to be an effort to teach computers how to "understand" a text, and then to be able to answer questions about it using deep learning. However, until now large-scale training on private text data and knowledge sharing has been missing for this NLP task. Hence, we present FedQAS, a privacy-preserving machine reading system capable of leveraging large-scale private data without the need to pool those datasets in a central location. The proposed approach combines transformer models and federated learning technologies. The system is developed using the FEDn framework and deployed as a proof-of-concept alliance initiative. FedQAS is flexible, language-agnostic, and allows intuitive participation and execution of local model training. In addition, we present the architecture and implementation of the system, as well as provide a reference evaluation based on the SQUAD dataset, to showcase how it overcomes data privacy issues and enables knowledge sharing between alliance members in a Federated learning setting.

preprint2022arXiv

Scalable federated machine learning with FEDn

Federated machine learning has great promise to overcome the input privacy challenge in machine learning. The appearance of several projects capable of simulating federated learning has led to a corresponding rapid progress on algorithmic aspects of the problem. However, there is still a lack of federated machine learning frameworks that focus on fundamental aspects such as scalability, robustness, security, and performance in a geographically distributed setting. To bridge this gap we have designed and developed the FEDn framework. A main feature of FEDn is to support both cross-device and cross-silo training settings. This makes FEDn a powerful tool for researching a wide range of machine learning applications in a realistic setting.

preprint2022arXiv

To test, or not to test: A proactive approach for deciding complete performance test initiation

Software performance testing requires a set of inputs that exercise different sections of the code to identify performance issues. However, running tests on a large set of inputs can be a very time-consuming process. It is even more problematic when test inputs are constantly growing, which is the case with a large-scale scientific organization such as CERN where the process of performing scientific experiment generates plethora of data that is analyzed by physicists leading to new scientific discoveries. Therefore, in this article, we present a test input minimization approach based on a clustering technique to handle the issue of testing on growing data. Furthermore, we use clustering information to propose an approach that recommends the tester to decide when to run the complete test suite for performance testing. To demonstrate the efficacy of our approach, we applied it to two different code updates of a web service which is used at CERN and we found that the recommendation for performance test initiation made by our approach for an update with bottleneck is valid.

preprint2021arXiv

Understanding the Quality of Container Security Vulnerability Detection Tools

Virtualization enables information and communications technology industry to better manage computing resources. In this regard, improvements in virtualization approaches together with the need for consistent runtime environment, lower overhead and smaller package size has led to the growing adoption of containers. This is a technology, which packages an application, its dependencies and Operating System (OS) to run as an isolated unit. However, the pressing concern with the use of containers is its susceptibility to security attacks. Consequently, a number of container scanning tools are available for detecting container security vulnerabilities. Therefore, in this study, we investigate the quality of existing container scanning tools by proposing two metrics that reflects coverage and accuracy. We analyze 59 popular public container images for Java applications hosted on DockerHub using different container scanning tools (such as Clair, Anchore, and Microscanner). Our findings show that existing container scanning approach does not detect application package vulnerabilities. Furthermore, existing tools do not have high accuracy, since 34% vulnerabilities are being missed by the best performing tool. Finally, we also demonstrate quality of Docker images for Java applications hosted on DockerHub by assessing complete vulnerability landscape i.e., number of vulnerabilities detected in images.

preprint2020arXiv

Smart Resource Management for Data Streaming using an Online Bin-packing Strategy

Data stream processing frameworks provide reliable and efficient mechanisms for executing complex workflows over large datasets. A common challenge for the majority of currently available streaming frameworks is efficient utilization of resources. Most frameworks use static or semi-static settings for resource utilization that work well for established use cases but lead to marginal improvements for unseen scenarios. Another pressing issue is the efficient processing of large individual objects such as images and matrices typical for scientific datasets. HarmonicIO has proven to be a good solution for streams of relatively large individual objects, as demonstrated in a benchmark comparison with the Spark and Kafka streaming frameworks. We here present an extension of the HarmonicIO framework based on the online bin-packing algorithm, to allow for efficient utilization of resources. Based on a real world use case from large-scale microscopy pipelines, we compare results of the new system to Spark's auto-scaling mechanism.