Source author record

Mostafa El-Khamy

Mostafa El-Khamy appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory · math.IT · Machine Learning · Computer Vision · eess.AS · eess.IV · Neural and Evolutionary Computing · Sound · Distributed, Parallel, and Cluster Computing

Catalog footprint

What is connected

18 works

9 topics

4 close collaborators


preprint · 2022 · arXiv

Learning with Succinct Common Representation Based on Wyner's Common Information

A new bimodal generative model is proposed for generating conditional and joint samples, accompanied by a training method for learning a succinct bottleneck representation. The proposed model, dubbed the variational Wyner model, is designed based on two classical problems in network information theory, distributed simulation and channel synthesis, in which Wyner's common information arises as the fundamental limit on the succinctness of the common representation. The model is trained by minimizing the symmetric Kullback-Leibler divergence between the variational and model distributions, with regularization terms for common information, reconstruction consistency, and latent space matching; this training is carried out via an adversarial density ratio estimation technique. The utility of the proposed approach is demonstrated through experiments on joint and conditional generation with synthetic and real-world datasets, as well as on a challenging zero-shot image retrieval task.
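
As a minimal illustration of the symmetric Kullback-Leibler divergence named above, the sketch below computes it for small discrete distributions. It assumes q is strictly positive wherever p is, and it uses the sum KL(p||q) + KL(q||p) (some works halve this). The paper's adversarial density ratio estimation between continuous distributions is not reproduced here.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as lists of probabilities."""
    # Terms with p_i == 0 contribute nothing by the convention 0 * log 0 = 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetric KL taken as KL(p||q) + KL(q||p); some definitions use half of this sum."""
    return kl_divergence(p, q) + kl_divergence(q, p)


print(round(symmetric_kl([0.5, 0.5], [0.9, 0.1]), 4))  # 0.8789
```

Unlike plain KL, the symmetric form gives the same value whichever distribution is listed first.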

preprint · 2022 · arXiv

MLPerf Mobile Inference Benchmark

This paper presents the first industry-standard open-source machine learning (ML) benchmark to allow performance and accuracy evaluation of mobile devices with different AI chips and software stacks. The benchmark draws on the expertise of leading mobile-SoC vendors, ML-framework providers, and model producers. It comprises a suite of models that operate with standard data sets, quality metrics, and run rules. We describe the design and implementation of this domain-specific ML benchmark. The current benchmark version comes as a mobile app for different computer vision and natural language processing tasks. The benchmark also supports non-smartphone devices, such as laptops and mobile PCs. Benchmark results from the first two rounds reveal the overwhelming complexity of the underlying mobile ML system stack, emphasizing the need for transparency in mobile ML performance analysis. The results also show that the strides being made throughout the ML stack improve performance. Within six months, offline throughput improved by 3x, while latency was reduced by as much as 12x. ML is an evolving field with changing use cases, models, data sets, and quality targets. MLPerf Mobile will evolve and serve as an open-source community framework to guide research and innovation for mobile AI.

preprint · 2020 · arXiv

Data-Free Network Quantization With Adversarial Knowledge Distillation

Network quantization is an essential procedure in deep learning for developing efficient fixed-point inference models on mobile or edge platforms. However, as datasets grow larger and privacy regulations become stricter, data sharing for model compression becomes more difficult and restricted. In this paper, we consider data-free network quantization with synthetic data. The synthetic data are produced by a generator, and no data are used either in training the generator or in quantization. To this end, we propose data-free adversarial knowledge distillation, which minimizes the maximum distance between the outputs of the teacher and the (quantized) student for any adversarial samples from a generator. To generate adversarial samples similar to the original data, we additionally propose matching the statistics from the teacher's batch normalization layers between the generated data and the original data. Furthermore, we show the gain of producing diverse adversarial samples by using multiple generators and multiple students. Our experiments show state-of-the-art data-free model compression and quantization results for (wide) residual networks and MobileNet on the SVHN, CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets. The accuracy losses compared to using the original datasets are shown to be minimal.
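
The batch-normalization statistics matching described above can be sketched as a loss that compares the per-channel mean and variance of the teacher's activations on generated data with the running statistics stored in its batch-normalization layer. This is a hypothetical formulation: the squared-difference distance, the one-value-per-channel-per-sample layout, and the name `bn_stat_loss` are assumptions for illustration, since the abstract does not give the paper's exact distance.

```python
from statistics import fmean, pvariance


def bn_stat_loss(features, running_mean, running_var):
    """Hypothetical BN-statistics matching loss (sketch, not the paper's exact form).

    features:     list of samples, each a list with one activation per channel
                  (e.g. spatially pooled), computed by the teacher on generated data.
    running_mean: per-channel means stored in the teacher's BN layer (original data).
    running_var:  per-channel variances stored in the teacher's BN layer.
    Returns the sum over channels of squared mean and variance differences.
    """
    loss = 0.0
    for c in range(len(running_mean)):
        channel = [sample[c] for sample in features]
        loss += (fmean(channel) - running_mean[c]) ** 2
        # Population (biased) variance of the generated batch for this channel.
        loss += (pvariance(channel) - running_var[c]) ** 2
    return loss


print(bn_stat_loss([[1.0, 0.0], [-1.0, 2.0]], [0.0, 0.0], [1.0, 1.0]))  # 1.0
```

In a training loop, such a term would be added to the generator's objective so that generated batches reproduce the stored statistics.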

preprint · 2020 · arXiv

End-to-End Multi-Task Denoising for the Joint Optimization of Perceptual Speech Metrics

Although supervised learning based on deep neural networks has recently achieved substantial improvements in speech enhancement, existing schemes suffer from one of two critical issues: spectrum mismatch or metric mismatch. The spectrum mismatch is the well-known issue that any spectrum modification after the short-time Fourier transform (STFT) cannot, in general, be fully recovered after the inverse short-time Fourier transform (ISTFT). The metric mismatch is that a conventional mean square error (MSE) loss function is typically suboptimal for maximizing perceptual speech measures such as signal-to-distortion ratio (SDR), perceptual evaluation of speech quality (PESQ), and short-time objective intelligibility (STOI). This paper presents a new end-to-end denoising framework. First, the network optimization is performed on the time-domain signals after ISTFT to avoid the spectrum mismatch. Second, three loss functions based on SDR, PESQ, and STOI are proposed to minimize the metric mismatch. Experimental results showed that the proposed denoising scheme significantly improved SDR, PESQ, and STOI performance over existing methods. Moreover, compared with generative denoising models, the proposed scheme also generalized well to perceptual speech metrics that were not used as loss functions during training.
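
An SDR-based loss, one of the three metric losses above, can be sketched with the plain signal-to-distortion ratio. The abstract does not state which SDR variant the paper uses (for example scale-invariant or BSS-eval SDR), so this sketch assumes the simple definition and negates it, so that minimizing the loss maximizes SDR.

```python
import math


def sdr_db(reference, estimate, eps=1e-12):
    """Plain SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    This is the simple, non-scale-invariant definition (an assumption here).
    eps guards against division by zero when the estimate is perfect.
    """
    signal_power = sum(s * s for s in reference)
    error_power = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_power + eps) / (error_power + eps))


def negative_sdr_loss(reference, estimate):
    # Minimizing -SDR maximizes SDR, which is how an SDR loss is usually framed.
    return -sdr_db(reference, estimate)


print(round(sdr_db([1.0, 0.0], [1.0, 0.1]), 3))  # 20.0
```

Because the loss is computed on time-domain samples, it fits the abstract's choice of optimizing after the ISTFT.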

preprint · 2020 · arXiv

GSANet: Semantic Segmentation with Global and Selective Attention

This paper proposes a novel deep learning architecture for semantic segmentation. The proposed Global and Selective Attention Network (GSANet) features Atrous Spatial Pyramid Pooling (ASPP) with a novel sparsemax global attention and a novel selective attention that deploys a condensation and diffusion mechanism to aggregate multi-scale contextual information from the extracted deep features. A selective attention decoder is also proposed to process the GSA-ASPP outputs for optimizing the softmax volume. We are the first to benchmark the performance of semantic segmentation networks with the low-complexity feature extraction network (FXN) MobileNetEdge, which is optimized for low latency on edge devices. We show that GSANet can produce more accurate segmentation with MobileNetEdge, as well as with strong FXNs such as Xception. GSANet improves the state-of-the-art semantic segmentation accuracy on both the ADE20k and Cityscapes datasets.
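
Sparsemax, which GSANet uses for its global attention, differs from softmax in that it can assign exactly zero weight to low-scoring entries. The sketch below is the standard sparsemax projection of Martins and Astudillo (2016) applied to a single score vector; it is not GSANet's attention module.

```python
def sparsemax(z):
    """Sparsemax: Euclidean projection of the score vector z onto the probability simplex."""
    z_sorted = sorted(z, reverse=True)
    cumulative = 0.0
    k_max, cum_at_k = 0, 0.0
    for k, value in enumerate(z_sorted, start=1):
        cumulative += value
        # The support size is the largest k with 1 + k * z_(k) > sum_{j <= k} z_(j).
        if 1 + k * value > cumulative:
            k_max, cum_at_k = k, cumulative
    # Threshold tau makes the surviving entries sum to one.
    tau = (cum_at_k - 1) / k_max
    return [max(value - tau, 0.0) for value in z]


print(sparsemax([1.0, 2.0, 3.0]))  # [0.0, 0.0, 1.0]
```

Where softmax would give every entry some weight, sparsemax here puts all mass on the top score.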

preprint · 2020 · arXiv

Learning Sparse Low-Precision Neural Networks With Learnable Regularization

We consider learning deep neural networks (DNNs) that consist of low-precision weights and activations for efficient fixed-point inference. In training low-precision networks, gradient descent in the backward pass is performed with high-precision weights, while quantized low-precision weights and activations are used in the forward pass to calculate the training loss. Thus, the gradient descent becomes suboptimal, and accuracy loss follows. To reduce the mismatch between the forward and backward passes, we utilize mean squared quantization error (MSQE) regularization. In particular, we propose using a learnable regularization coefficient with the MSQE regularizer to reinforce the convergence of high-precision weights to their quantized values. We also investigate how partial L2 regularization can be employed for weight pruning in a similar manner. Finally, combining weight pruning, quantization, and entropy coding, we establish a low-precision DNN compression pipeline. In our experiments, the proposed method yields low-precision MobileNet and ShuffleNet models for ImageNet classification with state-of-the-art compression ratios of 7.13 and 6.79, respectively. Moreover, we apply our method to image super-resolution networks to produce 8-bit low-precision models with negligible performance loss.
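
MSQE regularization can be sketched with a symmetric uniform quantizer. The quantizer, its step size, and the fixed coefficient `alpha` are assumptions for illustration. The abstract says the paper learns the coefficient but not how, so this sketch leaves it as a plain input.

```python
def quantize(w, step, bits):
    """Symmetric uniform quantizer: nearest multiple of `step`, clipped to the signed `bits`-bit range.

    Python's round() rounds exact halves to the nearest even integer.
    """
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    level = min(max(round(w / step), q_min), q_max)
    return level * step


def msqe(weights, step, bits):
    """Mean squared quantization error between high-precision weights and their quantized values."""
    return sum((w - quantize(w, step, bits)) ** 2 for w in weights) / len(weights)


def regularized_loss(task_loss, weights, step, bits, alpha):
    # Hypothetical combination: task loss + alpha * MSQE. In the paper alpha is
    # learnable; its parameterization is not given in the abstract, so it is fixed here.
    return task_loss + alpha * msqe(weights, step, bits)


print(round(msqe([0.1, 0.25, -0.5], 0.25, 4), 6))  # 0.003333
```

Driving the MSQE term toward zero pulls each high-precision weight onto its quantization grid point, which is the convergence the regularizer is meant to reinforce.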

preprint · 2020 · arXiv

NTIRE 2020 Challenge on Real-World Image Super-Resolution: Methods and Results

This paper reviews the NTIRE 2020 challenge on real-world super-resolution, focusing on the participating methods and final results. The challenge addresses the real-world setting, where paired true high- and low-resolution images are unavailable. For training, therefore, only one set of source input images is provided, along with a set of unpaired high-quality target images. In Track 1 (Image Processing Artifacts), the aim is to super-resolve images with synthetically generated image processing artifacts. This allows quantitative benchmarking of the approaches with respect to a ground-truth image. In Track 2 (Smartphone Images), real low-quality smartphone images have to be super-resolved. In both tracks, the ultimate goal is to achieve the best perceptual quality, evaluated using a human study. This is the second challenge on the subject, following AIM 2019, and it aims to advance the state of the art in super-resolution. To measure performance, we use the benchmark protocol from AIM 2019. In total, 22 teams competed in the final testing phase, demonstrating new and innovative solutions to the problem.

preprint · 2020 · arXiv

T-GSA: Transformer with Gaussian-weighted self-attention for speech enhancement

Transformer neural networks (TNNs) have demonstrated state-of-the-art performance on many natural language processing (NLP) tasks, replacing recurrent neural networks (RNNs) such as LSTMs or GRUs. However, TNNs have not performed well in speech enhancement, whose contextual nature differs from that of NLP tasks such as machine translation. Self-attention is a core building block of the Transformer: it not only enables parallel computation over the sequence but also provides a constant path length between symbols, which is essential for learning long-range dependencies. In this paper, we propose a Transformer with Gaussian-weighted self-attention (T-GSA), whose attention weights are attenuated according to the distance between target and context symbols. Experimental results show that the proposed T-GSA significantly improves speech-enhancement performance compared to the Transformer and RNNs.
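
The distance-based attenuation can be sketched for a single attention head in plain Python. How T-GSA actually combines the Gaussian with the attention scores is an assumption here. The sketch adds the log of a Gaussian weight exp(-(i - j)^2 / (2σ^2)) to the scaled dot-product logits, which is equivalent to multiplying the softmax weights by the Gaussian and renormalizing. σ (`sigma`) is a free parameter.

```python
import math


def gaussian_weighted_attention(queries, keys, values, sigma):
    """Single-head self-attention with weights attenuated by a Gaussian in |i - j| (sketch).

    Adding -(i - j)^2 / (2 sigma^2) to each logit before the softmax multiplies
    the attention weights by exp(-(i - j)^2 / (2 sigma^2)) and renormalizes them.
    Returns (outputs, weights), where weights[i][j] is the weight of symbol j for target i.
    """
    d = len(queries[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        logits = [
            sum(qa * ka for qa, ka in zip(q, k)) / math.sqrt(d) - (i - j) ** 2 / (2 * sigma ** 2)
            for j, k in enumerate(keys)
        ]
        peak = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        all_weights.append(weights)
        outputs.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))])
    return outputs, all_weights


zeros = [[0.0, 0.0]] * 4
_, w = gaussian_weighted_attention(zeros, zeros, [[1.0], [2.0], [3.0], [4.0]], sigma=1.0)
print([round(x, 3) for x in w[0]])
```

With identical (all-zero) queries and keys, the content scores tie, so each row's weights are governed purely by distance and decay away from the target position.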

preprint · 2020 · arXiv

WAFFLe: Weight Anonymized Factorization for Federated Learning

In domains where data are sensitive or private, there is great value in methods that can learn in a distributed manner without the data ever leaving the local devices. In light of this need, federated learning has emerged as a popular training paradigm. However, many federated learning approaches trade transmitting data for communicating updated weight parameters for each local device. Therefore, a successful breach that would otherwise have directly compromised the data instead grants white-box access to the local model, which opens the door to a number of attacks, including exposing the very data federated learning seeks to protect. Additionally, in distributed scenarios, individual client devices commonly exhibit high statistical heterogeneity. Many common federated approaches learn a single global model; while this may do well on average, performance degrades when the i.i.d. assumption is violated, underfitting individuals further from the mean and raising questions of fairness. To address these issues, we propose Weight Anonymized Factorization for Federated Learning (WAFFLe), an approach that combines the Indian Buffet Process with a shared dictionary of weight factors for neural networks. Experiments on MNIST, FashionMNIST, and CIFAR-10 demonstrate WAFFLe's significant improvement in local test performance and fairness while simultaneously providing an extra layer of security.
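
The abstract does not spell out how the shared dictionary factors are combined. Purely as an assumption for illustration, the sketch below treats a client's layer weights as the sum of the shared factors that its binary selection vector turns on. In WAFFLe, such binary selections are governed by an Indian Buffet Process prior, which is not sampled here. The function name `compose_weights` is hypothetical.

```python
def compose_weights(dictionary, selection):
    """Hypothetical composition: sum of the shared factor matrices a client selects.

    dictionary: list of K factor matrices (lists of rows), shared by all clients.
    selection:  length-K binary vector for one client (in WAFFLe, drawn under an
                Indian Buffet Process prior; here it is simply given).
    """
    rows, cols = len(dictionary[0]), len(dictionary[0][0])
    weights = [[0.0] * cols for _ in range(rows)]
    for z, factor in zip(selection, dictionary):
        if z:  # only factors switched on for this client contribute
            for r in range(rows):
                for c in range(cols):
                    weights[r][c] += factor[r][c]
    return weights


shared = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [3.0, 0.0]], [[5.0, 5.0], [5.0, 5.0]]]
print(compose_weights(shared, [1, 1, 0]))  # [[1.0, 2.0], [3.0, 1.0]]
```

In this picture a client only needs to communicate updates to the factors it uses, and different clients end up with different local models built from the same dictionary.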

preprint · 2016 · arXiv

Binary Polar Codes are Optimized Codes for Bitwise Multistage Decoding

Polar codes are considered the latest major breakthrough in coding theory. They were introduced by Arıkan in 2008. In this letter, we show that binary polar codes are the same as the optimized codes for bitwise multistage decoding (OCBM), which had been discovered earlier by Stolte in 2002. The equivalence between the techniques used for the construction and decoding of both codes is established.
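
The core of Arıkan's encoder is the polar transform. The sketch below computes x = u·G over GF(2), where G is the n-fold Kronecker power of F = [[1, 0], [1, 1]], using butterfly stages and omitting the bit-reversal permutation. The construction step, deciding which entries of u carry information and which are frozen to zero (the step the abstract compares with OCBM), is not shown.

```python
def polar_encode(u):
    """Compute x = u * F^{(Kronecker) n} over GF(2), F = [[1, 0], [1, 1]], without bit reversal.

    len(u) must be a power of two. Each stage XORs the second half of every
    block of size 2h into its first half (Arikan's butterfly).
    """
    n = len(u)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    x = list(u)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                x[j] ^= x[j + h]
        h *= 2
    return x


print(polar_encode([0, 0, 0, 1]))  # [1, 1, 1, 1]
```

The transform takes log2(N) stages of N/2 XORs each, which is where the O(N log N) encoding complexity of polar codes comes from.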

preprint · 2015 · arXiv

Rate-Compatible Polar Codes for Wireless Channels

A design of rate-compatible polar codes suitable for HARQ communications is proposed in this paper. An important feature of the proposed design is that the puncturing order is chosen with low complexity on a base code of short length, which is then further polarized to the desired length. A practical rate-matching system that has the flexibility to choose any desired rate through puncturing or repetition while preserving the polarization is suggested. The proposed rate-matching system is combined with channel interleaving and a bit-mapping procedure that preserves the polarization of the rate-compatible polar code family over bit-interleaved coded modulation systems. Simulation results on AWGN and fast fading channels with different modulation orders show the robustness of the proposed rate-compatible polar code in both Chase combining and incremental redundancy HARQ communications.
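
Rate matching by puncturing or repetition, as described above, can be sketched generically. The puncturing order passed in is a hypothetical input. The paper's low-complexity method of choosing that order on a short base code and extending it is not reproduced here, nor are the channel interleaver and bit mapping.

```python
def rate_match(codeword, target_length, puncture_order):
    """Puncture or repeat a mother codeword to reach `target_length` bits (generic sketch).

    puncture_order: hypothetical priority list of codeword positions to drop first.
    If target_length exceeds the codeword length, bits are repeated cyclically.
    """
    n = len(codeword)
    if target_length <= n:
        dropped = set(puncture_order[: n - target_length])
        return [bit for i, bit in enumerate(codeword) if i not in dropped]
    extra = target_length - n
    return codeword + [codeword[i % n] for i in range(extra)]


print(rate_match([1, 0, 1, 1], 3, [0, 2, 1, 3]))  # [0, 1, 1]
print(rate_match([1, 0, 1, 1], 6, [0, 2, 1, 3]))  # [1, 0, 1, 1, 1, 0]
```

Because each shorter code drops a prefix of the same order, every higher-rate codeword is a subset of a lower-rate one, which is what makes the family rate-compatible for incremental redundancy.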

preprint · 2014 · arXiv

Achieving the Uniform Rate Region of General Multiple Access Channels by Polar Coding

We consider the problem of polar coding for transmission over $m$-user multiple access channels. In the proposed scheme, all users encode their messages using a polar encoder, while a joint successive cancellation decoder is deployed at the receiver. The encoding is done separately across the users and is independent of the target achievable rate, in the sense that the encoder core is Arıkan's regular polarization matrix. For the code construction, the positions of information bits and frozen bits for each of the users are decided jointly. This is done by treating the whole polar transformation across all $m$ users as a single polar transformation with a certain base code. We prove that the covering radius of the dominant face of the uniform rate region is upper bounded by $r = \frac{(m-1)\sqrt{m}}{L}$, where $L$ is the length of the base code. We then prove that, by changing the decoding order in the joint successive cancellation decoder, the proposed polar coding scheme achieves the whole uniform rate region up to a resolution characterized by $r$. The encoding and decoding complexities are $O(N \log N)$, where $N$ is the code block length, and an asymptotic block error probability of $O(2^{-N^{0.5-\epsilon}})$ is guaranteed. Examples of achievable rates for the case of a $3$-user multiple access channel are provided.
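
To see how the resolution scales with the base code, the bound $r = \frac{(m-1)\sqrt{m}}{L}$ quoted above can be evaluated numerically. The base-code lengths below are arbitrary illustration values.

```python
import math


def covering_radius_bound(m, base_length):
    """Upper bound r = (m - 1) * sqrt(m) / L on the covering radius, as stated in the abstract."""
    return (m - 1) * math.sqrt(m) / base_length


for L in (4, 16, 64):
    print(L, round(covering_radius_bound(3, L), 4))
```

For m = 3, the bound is roughly 0.866, 0.2165, and 0.0541 for L = 4, 16, and 64. Each fourfold increase in base-code length shrinks it fourfold.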

preprint · 2013 · arXiv

BICM Performance Improvement via Online LLR Optimization

We consider bit-interleaved coded modulation (BICM) receiver performance improvement based on the concept of generalized mutual information (GMI). Increasing the achievable rates of the BICM receiver through GMI maximization by proper scaling of the log-likelihood ratio (LLR) is investigated. While it has been shown in the literature that look-up-table-based LLR scaling functions matched to each specific transmission scenario may provide close-to-optimal solutions, this method is difficult to adapt to time-varying channel conditions. To solve this problem, an online adaptive scaling-factor search algorithm is developed. A uniform scaling factor is applied to the LLRs from the different bit channels of each data frame by maximizing an approximate GMI that characterizes the transmission conditions of the current data frame. Numerical analysis of effective achievable rates, as well as link-level simulation of realistic mobile transmission scenarios, indicates that the proposed method is simple yet effective.
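
The idea of choosing one scaling factor per frame by maximizing an approximate GMI can be sketched with a common per-bit GMI estimator and a grid search. The estimator, the BPSK/AWGN toy frame, and the candidate grid are assumptions for illustration, not the paper's approximation or search algorithm.

```python
import math
import random


def approx_gmi(llrs, bits, scale):
    """Per-bit GMI estimate: 1 - mean(log2(1 + exp(-(1 - 2b) * scale * LLR))).

    Assumes LLR = log P(b = 0) / P(b = 1). For BICM, such per-bit terms are
    usually summed over the bit levels of the constellation.
    """
    total = 0.0
    for llr, b in zip(llrs, bits):
        t = -(1 - 2 * b) * scale * llr
        # Numerically stable log(1 + e^t), then converted from nats to bits.
        softplus = t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))
        total += softplus / math.log(2)
    return 1.0 - total / len(llrs)


def best_scale(llrs, bits, candidates):
    """Grid search for the single scaling factor that maximizes the approximate GMI."""
    return max(candidates, key=lambda s: approx_gmi(llrs, bits, s))


# Toy frame: BPSK (b = 0 -> +1, b = 1 -> -1) over AWGN with unit noise variance,
# where the matched LLR is 2y. The receiver below uses the mismatched LLR y.
rng = random.Random(0)
bits = [rng.random() < 0.5 for _ in range(10000)]
llrs = [(1 - 2 * b) + rng.gauss(0.0, 1.0) for b in bits]
scale = best_scale(llrs, bits, [0.1 * k for k in range(1, 41)])
print(round(scale, 1))
```

Since the receiver's LLR here is missing the factor 2/σ², the search should pick a scale near 2, which restores the matched LLR for this toy channel.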

preprint · 2013 · arXiv

Compound Polar Codes

A capacity-achieving scheme based on polar codes is proposed for reliable communication over multi-channels, which can be directly applied to bit-interleaved coded modulation (BICM) schemes. We start by reviewing the ground-breaking work on polar codes and then discuss our proposed scheme. Instead of encoding separately across the individual underlying channels, which requires multiple encoders and decoders, we take advantage of the recursive structure of polar codes to construct a unified scheme with a single encoder and decoder that can be used over the multi-channels. We prove that the scheme achieves the capacity of this multi-channel. Numerical analysis and simulation results for BICM channels at finite block lengths show a considerable improvement in error probability compared to a conventional separated scheme.

preprint · 2013 · arXiv

On the Construction and Decoding of Concatenated Polar Codes

A scheme for concatenating the recently invented polar codes with interleaved block codes is considered. By concatenating binary polar codes with interleaved Reed-Solomon codes, we prove that the proposed concatenation scheme captures the capacity-achieving property of polar codes while having a significantly better error-decay rate. We show that for any $\epsilon > 0$ and total frame length $N$, the parameters of the scheme can be set such that the frame error probability is less than $2^{-N^{1-\epsilon}}$, while the scheme is still capacity achieving. This improves upon $2^{-N^{0.5-\epsilon}}$, the frame error probability of Arikan's polar codes. We also propose decoding algorithms for concatenated polar codes, which significantly improve the error-rate performance at finite block lengths while preserving the low decoding complexity.
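
The gap between the two error-decay rates quoted above is easiest to see on a log scale. The sketch evaluates only the exponents of the two bounds at an illustrative frame length N = 1024 with ε = 0.1. Since the bounds are asymptotic, these numbers illustrate the scaling rather than actual error rates at this length.

```python
def log2_error_bounds(n, eps):
    """Return log2 of the two frame-error bounds for frame length n and a given eps."""
    concatenated = -(n ** (1 - eps))  # log2 of 2^{-N^{1-eps}}
    polar_only = -(n ** (0.5 - eps))  # log2 of 2^{-N^{0.5-eps}}
    return concatenated, polar_only


print(log2_error_bounds(1024, 0.1))  # roughly (-512.0, -16.0)
```

At this length the concatenated bound is about 2^-512 versus about 2^-16 for plain polar codes, and the ratio of the exponents, N^0.5, keeps growing with N.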

preprint · 2013 · arXiv

Performance Limits and Practical Decoding of Interleaved Reed-Solomon Polar Concatenated Codes

A scheme for concatenating the recently invented polar codes with non-binary MDS codes, such as Reed-Solomon codes, is considered. By concatenating binary polar codes with interleaved Reed-Solomon codes, we prove that the proposed concatenation scheme captures the capacity-achieving property of polar codes while having a significantly better error-decay rate. We show that for any $\epsilon > 0$ and total frame length $N$, the parameters of the scheme can be set such that the frame error probability is less than $2^{-N^{1-\epsilon}}$, while the scheme is still capacity achieving. This improves upon $2^{-N^{0.5-\epsilon}}$, the frame error probability of Arikan's polar codes. The proposed concatenated polar codes and Arikan's polar codes are also compared for transmission over channels with erasure bursts. We provide a sufficient condition on the length of an erasure burst that guarantees failure of the polar decoder. On the other hand, it is shown that the parameters of the concatenated polar code can be set in such a way that the capacity-achieving properties of polar codes are preserved. We also propose decoding algorithms for concatenated polar codes, which significantly improve the error-rate performance at finite block lengths while preserving the low decoding complexity.
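
One reason interleaving helps against erasure bursts is that consecutive transmitted symbols belong to different codewords. The sketch below shows this for a generic row-column interleaver of Reed-Solomon codewords. The interleaver is an assumption for illustration, since the abstract does not describe the paper's interleaver, its burst condition, or where bursts enter the concatenated chain.

```python
def transmission_order(num_codewords, length):
    """Row-column block interleaver: codeword r is row r, and symbols are sent column by column."""
    return [(row, col) for col in range(length) for row in range(num_codewords)]


def erasures_per_codeword(num_codewords, length, burst_start, burst_len):
    """Count how many symbols of each codeword a contiguous erasure burst hits after interleaving."""
    order = transmission_order(num_codewords, length)
    hits = [0] * num_codewords
    for row, _col in order[burst_start:burst_start + burst_len]:
        hits[row] += 1
    return hits


print(erasures_per_codeword(4, 10, 3, 10))  # [3, 2, 2, 3]
```

A burst of 10 consecutive erasures touches each of the 4 codewords at most ceil(10/4) = 3 times, so a Reed-Solomon code that can fill 3 erasures per codeword recovers the whole burst.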

preprint · 2013 · arXiv

Performance of Spatially-Coupled LDPC Codes and Threshold Saturation over BICM Channels

We study the performance of binary spatially-coupled low-density parity-check (SC-LDPC) codes when used with bit-interleaved coded-modulation (BICM) schemes. This paper considers the cases when transmission takes place over additive white Gaussian noise (AWGN) channels and Rayleigh fast-fading channels. The technique of upper bounding the maximum a posteriori (MAP) decoding performance of LDPC codes using an area theorem is extended to BICM schemes. The upper bound is computed for both the optimal MAP demapper and the suboptimal max-log-MAP (MLM) demapper. It is observed that this bound approaches the noise threshold of BICM channels for regular LDPC codes with large degrees. The rest of the paper extends these techniques to SC-LDPC codes, and the phenomenon of threshold saturation is demonstrated numerically. Based on numerical evidence, we conjecture that the belief-propagation (BP) decoding threshold of SC-LDPC codes approaches the MAP decoding threshold of the underlying LDPC ensemble on BICM channels. Numerical results also show that SC-LDPC codes approach the BICM capacity over different channels and modulation schemes.
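
The notion of a BP decoding threshold can be made concrete with density evolution, but only on a simpler channel than the paper's. The sketch below substitutes the binary erasure channel (BEC), where density evolution for a (dv, dc)-regular LDPC ensemble reduces to a one-dimensional recursion. It models neither BICM channels nor spatial coupling. For the (3, 6) ensemble it should land near the known BP threshold of about 0.4294; threshold saturation means coupled versions approach the ensemble's MAP threshold of about 0.4881 instead.

```python
def bp_converges(erasure_prob, dv, dc, max_iters=10000, tol=1e-10):
    """Density evolution for a (dv, dc)-regular LDPC ensemble on the BEC.

    x is the erasure probability of variable-to-check messages; BP succeeds
    if x is driven (numerically) to zero within max_iters iterations.
    """
    x = erasure_prob
    for _ in range(max_iters):
        x = erasure_prob * (1 - (1 - x) ** (dc - 1)) ** (dv - 1)
        if x < tol:
            return True
    return False


def bp_threshold(dv, dc, iters=30):
    """Bisection for the largest channel erasure probability at which BP decoding succeeds."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bp_converges(mid, dv, dc):
            lo = mid
        else:
            hi = mid
    return lo


print(round(bp_threshold(3, 6), 3))  # about 0.429
```

The finite iteration cap makes the estimate slightly conservative near the threshold, where convergence becomes very slow.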

preprint · 2005 · arXiv

Iterative Algebraic Soft-Decision List Decoding of Reed-Solomon Codes

In this paper, we present an iterative soft-decision decoding algorithm for Reed-Solomon codes that offers both complexity and performance advantages over previously known decoding algorithms. Our algorithm is a list decoding algorithm that combines two powerful soft-decision decoding techniques previously regarded in the literature as competing alternatives, namely the Koetter-Vardy algebraic soft-decision decoding algorithm and belief propagation based on adaptive parity-check matrices, recently proposed by Jiang and Narayanan. Building on the Jiang-Narayanan algorithm, we present a belief-propagation-based algorithm with a significant reduction in computational complexity. We introduce the concept of using a belief-propagation-based decoder to enhance the soft-input information prior to decoding with an algebraic soft-decision decoder. Our algorithm can also be viewed as an interpolation multiplicity assignment scheme for algebraic soft-decision decoding of Reed-Solomon codes.
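
In algebraic soft-decision decoding, the soft information is turned into interpolation multiplicities before decoding. The sketch below uses a simplified proportional rule, M[i][j] = floor(λ·Π[i][j]), together with the usual interpolation cost. It illustrates the multiplicity-assignment idea only: Koetter and Vardy's own assignment is a greedy iterative procedure, and the paper's BP-enhanced scheme is not reproduced.

```python
import math


def proportional_multiplicities(reliability, lam):
    """Multiplicity matrix M[i][j] = floor(lam * Pi[i][j]) (simplified proportional rule).

    reliability: q x n matrix Pi; column j holds the a posteriori probabilities
    of each field symbol at code position j.
    """
    return [[math.floor(lam * p) for p in row] for row in reliability]


def interpolation_cost(multiplicities):
    """Cost C(M) = sum of m(m + 1)/2 over all entries: the number of linear interpolation constraints."""
    return sum(m * (m + 1) // 2 for row in multiplicities for m in row)


M = proportional_multiplicities([[0.9, 0.2], [0.1, 0.8]], 4)
print(M, interpolation_cost(M))  # [[3, 0], [0, 3]] 12
```

Larger λ gives finer multiplicities and better decoding at the price of a quadratically growing interpolation cost. Sharpening the reliabilities with BP first, as the abstract describes, changes which symbols receive high multiplicity.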
