Source author record

Salman Toor

Salman Toor appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher; unclaimed source record

Distributed, Parallel, and Cluster Computing; Machine Learning; Software Engineering; Artificial Intelligence; Computation and Language; Computational Engineering, Finance, and Science; Cryptography and Security

Catalog footprint

What is connected

7 works

7 topics

4 close collaborators

preprint, 2022, arXiv

FedQAS: Privacy-aware machine reading comprehension with federated learning

Machine reading comprehension (MRC) of text data is an important task in Natural Language Understanding. It is a complex NLP problem with a lot of ongoing research fueled by the release of the Stanford Question Answering Dataset (SQuAD) and Conversational Question Answering (CoQA). It is considered to be an effort to teach computers how to "understand" a text, and then to be able to answer questions about it using deep learning. However, until now large-scale training on private text data and knowledge sharing has been missing for this NLP task. Hence, we present FedQAS, a privacy-preserving machine reading system capable of leveraging large-scale private data without the need to pool those datasets in a central location. The proposed approach combines transformer models and federated learning technologies. The system is developed using the FEDn framework and deployed as a proof-of-concept alliance initiative. FedQAS is flexible, language-agnostic, and allows intuitive participation and execution of local model training. In addition, we present the architecture and implementation of the system, as well as provide a reference evaluation based on the SQuAD dataset, to showcase how it overcomes data privacy issues and enables knowledge sharing between alliance members in a federated learning setting.

preprint, 2022, arXiv

Scalable federated machine learning with FEDn

Federated machine learning has great promise to overcome the input privacy challenge in machine learning. The appearance of several projects capable of simulating federated learning has led to correspondingly rapid progress on algorithmic aspects of the problem. However, there is still a lack of federated machine learning frameworks that focus on fundamental aspects such as scalability, robustness, security, and performance in a geographically distributed setting. To bridge this gap we have designed and developed the FEDn framework. A main feature of FEDn is its support for both cross-device and cross-silo training settings. This makes FEDn a powerful tool for researching a wide range of machine learning applications in a realistic setting.
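As a rough illustration of the federated setting this abstract refers to, the sketch below shows weighted federated averaging, a common aggregation step in which a server combines locally trained model weights without seeing the clients' data. It is a generic sketch and not FEDn's API or aggregation scheme; the function name `federated_average` and the flat-list weight representation are assumptions made for this example.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client weight vectors, weighted by local dataset size.

    Hypothetical helper for illustration only; not part of FEDn.
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one positive dataset size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share one model shape")
        # Each client contributes in proportion to its share of the data.
        for i, value in enumerate(weights):
            averaged[i] += value * size / total
    return averaged


# Two clients; the second holds three times as much local data.
global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the weight vectors and sample counts leave each client here, which is the property that lets training proceed without pooling the raw data centrally.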

preprint, 2022, arXiv

To test, or not to test: A proactive approach for deciding complete performance test initiation

Software performance testing requires a set of inputs that exercise different sections of the code to identify performance issues. However, running tests on a large set of inputs can be a very time-consuming process. It is even more problematic when test inputs are constantly growing, as is the case at a large-scale scientific organization such as CERN, where scientific experiments generate a plethora of data that physicists analyze, leading to new scientific discoveries. Therefore, in this article, we present a test input minimization approach based on a clustering technique to handle the issue of testing on growing data. Furthermore, we use clustering information to propose an approach that recommends to the tester when to run the complete test suite for performance testing. To demonstrate the efficacy of our approach, we applied it to two different code updates of a web service used at CERN, and found that the recommendation for performance test initiation made by our approach for an update with a bottleneck is valid.

preprint, 2021, arXiv

Understanding the Quality of Container Security Vulnerability Detection Tools

Virtualization enables the information and communications technology industry to better manage computing resources. In this regard, improvements in virtualization approaches, together with the need for a consistent runtime environment, lower overhead and smaller package size, have led to the growing adoption of containers. This is a technology that packages an application, its dependencies and Operating System (OS) to run as an isolated unit. However, the pressing concern with the use of containers is their susceptibility to security attacks. Consequently, a number of container scanning tools are available for detecting container security vulnerabilities. Therefore, in this study, we investigate the quality of existing container scanning tools by proposing two metrics that reflect coverage and accuracy. We analyze 59 popular public container images for Java applications hosted on DockerHub using different container scanning tools (such as Clair, Anchore, and Microscanner). Our findings show that existing container scanning approaches do not detect application package vulnerabilities. Furthermore, existing tools do not have high accuracy, since 34% of vulnerabilities are missed by the best performing tool. Finally, we also demonstrate the quality of Docker images for Java applications hosted on DockerHub by assessing the complete vulnerability landscape, i.e., the number of vulnerabilities detected in images.

preprint, 2020, arXiv

Smart Resource Management for Data Streaming using an Online Bin-packing Strategy

Data stream processing frameworks provide reliable and efficient mechanisms for executing complex workflows over large datasets. A common challenge for the majority of currently available streaming frameworks is efficient utilization of resources. Most frameworks use static or semi-static settings for resource utilization that work well for established use cases but lead to marginal improvements for unseen scenarios. Another pressing issue is the efficient processing of large individual objects such as images and matrices typical for scientific datasets. HarmonicIO has proven to be a good solution for streams of relatively large individual objects, as demonstrated in a benchmark comparison with the Spark and Kafka streaming frameworks. We here present an extension of the HarmonicIO framework based on the online bin-packing algorithm, to allow for efficient utilization of resources. Based on a real world use case from large-scale microscopy pipelines, we compare results of the new system to Spark's auto-scaling mechanism.
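To make the idea of online bin-packing concrete, the sketch below implements First-Fit, one classic online heuristic: each item is placed as it arrives into the first bin with enough remaining capacity, and a new bin is opened only when none fits. Which bin-packing variant the HarmonicIO extension uses is not stated in the abstract, so First-Fit is an assumption here, as are the name `first_fit` and the reading of items as workload resource demands and bins as worker nodes.

```python
from typing import List, Sequence


def first_fit(item_sizes: Sequence[float], capacity: float = 1.0) -> List[List[float]]:
    """Pack items online with the First-Fit heuristic.

    Hypothetical helper for illustration only; not HarmonicIO code.
    """
    bins: List[List[float]] = []   # items assigned to each bin
    loads: List[float] = []        # current load of each bin

    for size in item_sizes:
        if size <= 0 or size > capacity:
            raise ValueError(f"item size {size} must be in (0, {capacity}]")
        for i, load in enumerate(loads):
            # Place the item in the first bin that still has room.
            if load + size <= capacity:
                bins[i].append(size)
                loads[i] += size
                break
        else:
            # No existing bin fits, so open a new one.
            bins.append([size])
            loads.append(size)
    return bins


# Items arrive one at a time; a new bin is opened only when needed.
packing = first_fit([5, 7, 5, 2, 4, 2], capacity=10)
```

Because each placement depends only on items seen so far, the heuristic suits streaming workloads where future demands are unknown.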

preprint, 2015, arXiv

MOLNs: A cloud platform for interactive, reproducible and scalable spatial stochastic computational experiments in systems biology using PyURDME

Computational experiments using spatial stochastic simulations have led to important new biological insights, but they require specialized tools, a complex software stack, as well as large and scalable compute and data analysis resources due to the large computational cost associated with Monte Carlo computational workflows. The complexity of setting up and managing a large-scale distributed computation environment to support productive and reproducible modeling can be prohibitive for practitioners in systems biology. This results in a barrier to the adoption of spatial stochastic simulation tools, effectively limiting the type of biological questions addressed by quantitative modeling. In this paper, we present PyURDME, a new, user-friendly spatial modeling and simulation package, and MOLNs, a cloud computing appliance for distributed simulation of stochastic reaction-diffusion models. MOLNs is based on IPython and provides an interactive programming platform for development of sharable and reproducible distributed parallel computational experiments.

preprint, 2010, arXiv

Performance and Stability of the Chelonia Storage Cloud

In this paper we present the Chelonia storage cloud middleware. It was designed to fill the requirements gap between those of large, sophisticated scientific collaborations which have adopted the grid paradigm for their distributed storage needs, and of corporate business communities which are gravitating towards the cloud paradigm. The similarities to and differences between Chelonia and several well-known grid- and cloud-based storage solutions are discussed. The design of Chelonia has been chosen to optimize high reliability and scalability of an integrated system of heterogeneous, geographically dispersed storage sites and the ability to easily expand the system dynamically. The architecture and implementation in terms of web services running inside the Advanced Resource Connector Hosting Environment Daemon (ARC HED) are described. We present results of tests in both local-area and wide-area networks that demonstrate the fault-tolerance, stability and scalability of Chelonia.
