Source author record

Zaid Al-Ars

Zaid Al-Ars appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision; Emerging Technologies; quant-ph; Distributed, Parallel, and Cluster Computing; Machine Learning; Computation and Language; Computational Engineering, Finance, and Science; Cryptography and Security; eess.IV; Genomics; Hardware Architecture; Information Theory; math.IT; Multiagent Systems

Catalog footprint

What is connected

12 works

14 topics

4 close collaborators


preprint, 2022, arXiv

Fidel: Reconstructing Private Training Samples from Weight Updates in Federated Learning

With the increasing number of data collectors such as smartphones, immense amounts of data are available. Federated learning was developed to allow for distributed learning on a massive scale whilst still protecting each user's privacy. This privacy is claimed by the notion that the centralized server does not have any access to a client's data, solely the client's model update. In this paper, we evaluate a novel attack method within regular federated learning which we name the First Dense Layer Attack (Fidel). The methodology of using this attack is discussed, and as a proof of viability we show how this attack method can be used to great effect for densely connected networks and convolutional neural networks. We evaluate some key design decisions and show that the use of ReLU and Dropout is detrimental to the privacy of a client's local dataset. We show how to recover on average twenty out of thirty private data samples from a client's model update employing a fully connected neural network with very little computational resources required. Similarly, we show that over thirteen out of twenty samples can be recovered from a convolutional neural network update. An open source implementation of this attack can be found at https://github.com/Davidenthoven/Fidel-Reconstruction-Demo
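The abstract does not spell out the mechanics of the attack, so the following is only a minimal sketch of a well-known property of dense layers that attacks of this kind can exploit, not the authors' implementation. For a layer y = Wx + b and a single training sample, the chain rule gives dLoss/dW[i][j] = delta[i] * x[j] and dLoss/db[i] = delta[i], so dividing a row of the weight gradient by the matching bias gradient returns the input x. The function names, the sample, and the delta values are hypothetical and chosen for illustration.

```python
import random


def dense_layer_gradients(x, delta):
    """Gradients of a dense layer y = W x + b for one sample.

    delta[i] is dLoss/dy[i]. By the chain rule,
    dLoss/dW[i][j] = delta[i] * x[j] and dLoss/db[i] = delta[i].
    """
    grad_w = [[d * xj for xj in x] for d in delta]
    grad_b = list(delta)
    return grad_w, grad_b


def reconstruct_input(grad_w, grad_b, eps=1e-12):
    """Recover x from the first neuron whose bias gradient is non-zero."""
    for row, gb in zip(grad_w, grad_b):
        if abs(gb) > eps:
            return [gw / gb for gw in row]
    return None  # no neuron carried any gradient


random.seed(0)
x = [random.random() for _ in range(8)]   # hypothetical private sample
delta = [0.0, -0.37, 0.0, 0.92]           # zeros mimic neurons silenced by ReLU
grad_w, grad_b = dense_layer_gradients(x, delta)
recovered = reconstruct_input(grad_w, grad_b)
```

In this single-sample setting one active neuron is enough to recover the input exactly. With a batch of samples the gradients are summed, which is why the recovery rates quoted in the abstract are below one hundred percent.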

preprint, 2022, arXiv

Memory-Disaggregated In-Memory Object Store Framework for Big Data Applications

The concept of memory disaggregation has recently been gaining traction in research. With memory disaggregation, data center compute nodes can directly access memory on adjacent nodes and are therefore able to overcome local memory restrictions, introducing a new data management paradigm for distributed computing. This paper proposes and demonstrates a memory disaggregated in-memory object store framework for big data applications by leveraging the newly introduced ThymesisFlow memory disaggregation system. The framework extends the functionality of the pre-existing Apache Arrow Plasma object store framework to distributed systems by enabling clients to easily and efficiently produce and consume data objects across multiple compute nodes. This allows big data applications to increasingly leverage parallel processing at reduced development costs. In addition, the paper includes latency and throughput measurements that indicate only a modest performance penalty is incurred for remote disaggregated memory access as opposed to local (~6.5 vs ~5.75 GiB/s). The results can be used to guide the design of future systems that leverage memory disaggregation as well as the newly presented framework. This work is open-source and publicly accessible at https://doi.org/10.5281/zenodo.6368998.

preprint, 2022, arXiv

QPack Scores: Quantitative performance metrics for application-oriented quantum computer benchmarking

This paper presents the benchmark score definitions of QPack, an application-oriented cross-platform benchmarking suite for quantum computers and simulators, which makes use of scalable Quantum Approximate Optimization Algorithm and Variational Quantum Eigensolver applications. Using a varied set of benchmark applications, quantitative insight can be gained into how well a quantum computer or its simulator performs on a general NISQ-era application. This paper presents what quantum execution data can be collected and transformed into benchmark scores for application-oriented quantum benchmarking. Definitions are given for an overall benchmark score, as well as sub-scores based on runtime, accuracy, scalability and capacity performance. Using these scores, a comparison is made between various quantum computer simulators, running both locally and on vendors' remote cloud services. We also use the QPack benchmark to collect a small set of quantum execution data of the IBMQ Nairobi quantum processor. The goal of the QPack benchmark scores is to give a holistic insight into quantum performance and the ability to make easy and quick comparisons between different quantum computers.

preprint, 2022, arXiv

QPack: Quantum Approximate Optimization Algorithms as universal benchmark for quantum computers

In this paper, we present QPack, a universal benchmark for Noisy Intermediate-Scale Quantum (NISQ) computers based on Quantum Approximate Optimization Algorithms (QAOA). Unlike other evaluation metrics in the field, this benchmark evaluates not only one, but multiple important aspects of quantum computing hardware: the maximum problem size a quantum computer can solve, the required runtime, as well as the achieved accuracy. The applications MaxCut, dominating set and traveling salesman are included to provide variation in resource requirements. This will allow for a diverse benchmark that promotes optimal design considerations, avoiding hardware implementations for specific applications. We also discuss the design aspects that are taken into consideration for the QPack benchmark, with critical quantum benchmark requirements in mind. An implementation is presented, providing practical metrics. QPack is presented as a hardware-agnostic benchmark by making use of the XACC library. We demonstrate the application of the benchmark on various IBM machines, as well as a range of simulators.
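To make the MaxCut application and the notion of achieved accuracy more concrete, here is a small classical sketch. It is not QPack code and does not use XACC. It scores a candidate bitstring by the number of edges it cuts and compares that against the brute-force optimum. The graph, the sampled bitstring, and the ratio used as an accuracy figure are assumptions made for illustration.

```python
from itertools import product


def cut_value(edges, bits):
    """Number of edges whose endpoints fall on opposite sides of the partition."""
    return sum(1 for u, v in edges if bits[u] != bits[v])


def brute_force_maxcut(n_nodes, edges):
    """Exact optimum by enumerating all 2**n_nodes partitions (small graphs only)."""
    best_bits = max(product((0, 1), repeat=n_nodes),
                    key=lambda bits: cut_value(edges, bits))
    return cut_value(edges, best_bits), best_bits


# Hypothetical 4-node ring graph
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
best_value, best_bits = brute_force_maxcut(4, ring)

# A bitstring such as a quantum device might return, scored against the optimum
sampled = (0, 0, 1, 1)
ratio = cut_value(ring, sampled) / best_value
```

The exhaustive search grows as 2**n, so it only serves as a reference for small instances. That exponential growth is also why problem size is a natural axis for a benchmark of this kind.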

preprint, 2022, arXiv

Quantum circuit design for universal distribution using a superposition of classical automata

In this research, we present a quantum circuit design and implementation for a parallel universal linear bounded automaton. This circuit is able to accelerate the inference of algorithmic structures in data for discovering causal generative models. The computation model is practically restricted in time and space resources. A classical exhaustive enumeration of all possible programs on the automaton is shown for a couple of example cases. The precise quantum circuit design that allows executing a superposition of programs, along with a superposition of inputs as in the standard quantum Turing machine formulation, is presented. This is the first time a superposition of classical automata is implemented on the circuit model of quantum computation, having the corresponding mechanistic parts of a classical Turing machine. The superposition of programs allows our model to be used for experimenting with the space of program-output behaviors in algorithmic information theory. Our implementations in the OpenQL and Qiskit quantum programming languages are copy-left and publicly available on GitHub.

preprint, 2021, arXiv

BioDynaMo: a general platform for scalable agent-based simulation

Motivation: Agent-based modeling is an indispensable tool for studying complex biological systems. However, existing simulators do not always take full advantage of modern hardware and often have a field-specific software design. Results: We present a novel simulation platform called BioDynaMo that alleviates both of these problems. BioDynaMo features a general-purpose and high-performance simulation engine. We demonstrate that BioDynaMo can be used to simulate use cases in: neuroscience, oncology, and epidemiology. For each use case we validate our findings with experimental data or an analytical solution. Our performance results show that BioDynaMo performs up to three orders of magnitude faster than the state-of-the-art baseline. This improvement makes it feasible to simulate each use case with one billion agents on a single server, showcasing the potential BioDynaMo has for computational biology research. Availability: BioDynaMo is an open-source project under the Apache 2.0 license and is available at www.biodynamo.org. Instructions to reproduce the results are available in supplementary information. Contact: lukas.breitwieser@inf.ethz.ch, a.s.hesam@tudelft.nl, omutlu@ethz.ch, r.bauer@surrey.ac.uk Supplementary information: Available at https://doi.org/10.5281/zenodo.4501515

preprint, 2020, arXiv

An Overview of Federated Deep Learning Privacy Attacks and Defensive Strategies

With the increased attention and legislation for data privacy, collaborative machine learning (ML) algorithms are being developed to ensure the protection of private data used for processing. Federated learning (FL) is the most popular of these methods, which provides privacy preservation by facilitating collaborative training of a shared model without the need to exchange any private data with a centralized server. Rather, an abstraction of the data in the form of a machine learning model update is sent. Recent studies showed that such model updates may still very well leak private information and thus more structured risk assessment is needed. In this paper, we analyze existing vulnerabilities of FL and subsequently perform a literature review of the possible attack methods targeting FL privacy protection capabilities. These attack methods are then categorized by a basic taxonomy. Additionally, we provide a literature study of the most recent defensive strategies and algorithms for FL aimed at overcoming these attacks. These defensive strategies are categorized by their respective underlying defence principle. The paper concludes that the application of a single defensive strategy is not enough to provide adequate protection against all available attack methods.

preprint, 2020, arXiv

NASB: Neural Architecture Search for Binary Convolutional Neural Networks

Binary Convolutional Neural Networks (CNNs) have significantly reduced the number of arithmetic operations and the size of memory storage needed for CNNs, which makes their deployment on mobile and embedded systems more feasible. However, the CNN architecture after binarizing needs to be redesigned and refined significantly for two reasons: 1. the large accumulation error of binarization in the forward propagation, and 2. the severe gradient mismatch problem of binarization in the backward propagation. Even though substantial effort has been invested in designing architectures for single and multiple binary CNNs, it is still difficult to find an optimal architecture for binary CNNs. In this paper, we propose a strategy, named NASB, which adopts Neural Architecture Search (NAS) to find an optimal architecture for the binarization of CNNs. Due to the flexibility of this automated strategy, the obtained architecture is not only suitable for binarization but also has low overhead, achieving a better trade-off between the accuracy and computational complexity of hand-optimized binary CNNs. The implementation of the NASB strategy is evaluated on the ImageNet dataset and demonstrated as a better solution compared to existing quantized CNNs. With an insignificant overhead increase, NASB outperforms existing single and multiple binary CNNs by up to 4.0% and 1.0% Top-1 accuracy respectively, bringing them closer to the precision of their full precision counterpart. The code and pretrained models will be publicly available.

preprint, 2020, arXiv

Real-Time Face and Landmark Localization for Eyeblink Detection

Pavlovian eyeblink conditioning is a powerful experiment used in the field of neuroscience to measure multiple aspects of how we learn in our daily life. To track the movement of the eyelid during an experiment, researchers have traditionally made use of potentiometers or electromyography. More recently, the use of computer vision and image processing alleviated the need for these techniques, but currently employed methods require human intervention and are not fast enough to enable real-time processing. In this work, a face-detection and a landmark-detection algorithm have been carefully combined in order to provide fully automated eyelid tracking, and have further been accelerated to make the first crucial step towards online, closed-loop experiments. Such experiments have not been achieved so far and are expected to offer significant insights into the workings of neurological and psychiatric disorders. Based on an extensive literature search, various algorithms for face detection and landmark detection have been analyzed and evaluated. Two algorithms were identified as most suitable for eyelid detection: the Histogram-of-Oriented-Gradients (HOG) algorithm for face detection and the Ensemble-of-Regression-Trees (ERT) algorithm for landmark detection. These two algorithms have been accelerated on GPU and CPU, achieving speedups of 1,753× and 11×, respectively. To demonstrate the usefulness of our eyelid-detection algorithm, a research hypothesis was formed and a well-established neuroscientific experiment was employed: eyeblink detection. Our experimental evaluation reveals an overall application runtime of 0.533 ms per frame, which is 1,101× faster than the sequential implementation and well within the real-time requirements of eyeblink conditioning in humans, i.e. faster than 500 frames per second.
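The real-time claim can be checked with simple arithmetic on the figures quoted in the abstract. The sketch below converts the per-frame runtime into a frame rate, compares it with the 500 frames per second requirement, and derives the sequential runtime implied by the reported speedup. The variable names are illustrative, and the implied sequential runtime is a derived value rather than a figure stated in the abstract.

```python
def frames_per_second(ms_per_frame):
    """Convert a per-frame runtime in milliseconds into a frame rate."""
    return 1000.0 / ms_per_frame


runtime_ms = 0.533        # per-frame runtime quoted in the abstract
required_fps = 500        # real-time requirement quoted in the abstract
overall_speedup = 1101    # overall speedup quoted in the abstract

achieved_fps = frames_per_second(runtime_ms)    # roughly 1876 frames per second
frame_budget_ms = 1000.0 / required_fps         # 2.0 ms available per frame
sequential_ms = runtime_ms * overall_speedup    # implied sequential runtime, about 587 ms

meets_requirement = achieved_fps > required_fps
```

At roughly 1876 frames per second the accelerated pipeline uses about a quarter of the 2 ms frame budget, while the implied sequential runtime of about 587 ms per frame would fall far short of it.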

preprint, 2020, arXiv

SoFAr: Shortcut-based Fractal Architectures for Binary Convolutional Neural Networks

Binary Convolutional Neural Networks (BCNNs) can significantly improve the efficiency of Deep Convolutional Neural Networks (DCNNs) for their deployment on resource-constrained platforms, such as mobile and embedded systems. However, the accuracy degradation of BCNNs is still considerable compared with their full precision counterpart, impeding their practical deployment. Because of the inevitable binarization error in the forward propagation and gradient mismatch problem in the backward propagation, it is nontrivial to train BCNNs to achieve satisfactory accuracy. To ease the difficulty of training, the shortcut-based BCNNs, such as residual connection-based Bi-real ResNet and dense connection-based BinaryDenseNet, introduce additional shortcuts in addition to the shortcuts already present in their full precision counterparts. Furthermore, fractal architectures have also been used to improve the training process of full-precision DCNNs since the fractal structure triggers effects akin to deep supervision and lateral student-teacher information flow. Inspired by the shortcuts and fractal architectures, we propose two Shortcut-based Fractal Architectures (SoFAr) specifically designed for BCNNs: 1. residual connection-based fractal architectures for binary ResNet, and 2. dense connection-based fractal architectures for binary DenseNet. Our proposed SoFAr combines the adoption of shortcuts and the fractal architectures in one unified model, which is helpful in the training of BCNNs. Results show that our proposed SoFAr achieves better accuracy compared with shortcut-based BCNNs. Specifically, the Top-1 accuracy of our proposed RF-c4d8 ResNet37(41) and DRF-c2d2 DenseNet51(53) on ImageNet outperforms Bi-real ResNet18(64) and BinaryDenseNet51(32) by 3.29% and 1.41%, respectively, with the same computational complexity overhead.

preprint, 2020, arXiv

Towards Lossless Binary Convolutional Neural Networks Using Piecewise Approximation

Binary Convolutional Neural Networks (CNNs) can significantly reduce the number of arithmetic operations and the size of memory storage, which makes the deployment of CNNs on mobile or embedded systems more promising. However, the accuracy degradation of single and multiple binary CNNs is unacceptable for modern architectures and large-scale datasets like ImageNet. In this paper, we propose a Piecewise Approximation (PA) scheme for multiple binary CNNs which lessens accuracy loss by approximating full precision weights and activations efficiently and maintains parallelism of bitwise operations to guarantee efficiency. Unlike previous approaches, the proposed PA scheme segments the full precision weights and activations piecewise, and approximates each piece with a scaling coefficient. Our implementation on ResNet with different depths on ImageNet can reduce both Top-1 and Top-5 classification accuracy gap compared with full precision to approximately 1.0%. Benefiting from the binarization of the downsampling layer, our proposed PA-ResNet50 requires less memory usage and two times Flops than single binary CNNs with 4 weights and 5 activations bases. The PA scheme can also generalize to other architectures like DenseNet and MobileNet with similar approximation power as ResNet, which is promising for other tasks using binary convolutions. The code and pretrained models will be publicly available.

preprint, 2019, arXiv

An algorithm for DNA read alignment on quantum accelerators

With small-scale quantum processors transitioning from experimental physics labs to industrial products, these processors allow us to efficiently compute important algorithms in various fields. In this paper, we propose a quantum algorithm to address the challenging field of big data processing for genome sequence reconstruction. This research describes an architecture-aware implementation of a quantum algorithm for sub-sequence alignment. A new algorithm named QiBAM (quantum indexed bidirectional associative memory) is proposed that uses approximate pattern-matching based on Hamming distances. QiBAM extends Grover's search algorithm in two ways to allow for: (1) approximate matches needed for read errors in genomics, and (2) a distributed search for multiple solutions over the quantum encoding of DNA sequences. This approach gives a quadratic speedup over the classical algorithm. A full implementation of the algorithm is provided and verified using the OpenQL compiler and QX simulator framework. This represents a first exploration towards a full-stack quantum accelerated genome sequencing pipeline design. The open-source implementation can be found at https://github.com/prince-ph0en1x/QAGS.
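For orientation, approximate pattern-matching based on Hamming distances can be sketched classically as a linear scan: slide the read along the reference and keep every position whose window differs from the read in at most a given number of bases. This is only an illustration of the classical baseline idea, not QiBAM itself. The sequences, the function names, and the mismatch threshold are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def approximate_matches(reference, read, max_mismatches):
    """Return (position, distance) for every window within the mismatch budget."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        distance = hamming(reference[start:start + len(read)], read)
        if distance <= max_mismatches:
            hits.append((start, distance))
    return hits


reference = "ACGTTGCAACGTAGCT"   # hypothetical reference fragment
read = "ACGA"                    # hypothetical read containing one error
hits = approximate_matches(reference, read, max_mismatches=1)
```

The scan inspects every one of the N candidate positions. A Grover-style search needs on the order of the square root of N oracle queries, which is the quadratic speedup the abstract refers to.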
