Source author record

Vahid Tarokh

Vahid Tarokh appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

38 works

25 topics

4 close collaborators

preprint · 2022 · arXiv

Fisher Task Distance and Its Application in Neural Architecture Search

We formulate an asymmetric (or non-commutative) distance between tasks based on Fisher Information Matrices, called Fisher task distance. This distance represents the complexity of transferring the knowledge from one task to another. We provide a proof of consistency for our distance through theorems and experiments on various classification tasks from MNIST, CIFAR-10, CIFAR-100, ImageNet, and Taskonomy datasets. Next, we construct an online neural architecture search framework using the Fisher task distance, in which we have access to the past learned tasks. By using the Fisher task distance, we can identify the closest learned tasks to the target task, and utilize the knowledge learned from these related tasks for the target task. Here, we show how the proposed distance between a target task and a set of learned tasks can be used to reduce the neural architecture search space for the target task. The complexity reduction in search space for task-specific architectures is achieved by building on the optimized architectures for similar tasks, instead of doing a full search that ignores this side information. Experimental results for tasks in the MNIST, CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate the efficacy of the proposed approach and its improvements, in terms of the performance and the number of parameters, over other gradient-based search methods, such as ENAS, DARTS, and PC-DARTS.

preprint · 2022 · arXiv

Improved Automated Machine Learning from Transfer Learning

In this paper, we propose a neural architecture search framework based on a similarity measure between some baseline tasks and a target task. We first define the notion of the task similarity based on the log-determinant of the Fisher Information matrix. Next, we compute the task similarity from each of the baseline tasks to the target task. By utilizing the relation between a target and a set of learned baseline tasks, the search space of architectures for the target task can be significantly reduced, making the discovery of the best candidates in the set of possible architectures tractable and efficient, in terms of GPU days. This method eliminates the need to train the networks from scratch for a given target task, and it avoids introducing human-domain bias into the initialization of the search space.

preprint · 2022 · arXiv

Modeling Extremes with d-max-decreasing Neural Networks

We propose a novel neural network architecture that enables non-parametric calibration and generation of multivariate extreme value distributions (MEVs). MEVs arise from Extreme Value Theory (EVT) as the necessary class of models when extrapolating a distributional fit over large spatial and temporal scales based on data observed in intermediate scales. In turn, EVT dictates that $d$-max-decreasing, a stronger form of convexity, is an essential shape constraint in the characterization of MEVs. As far as we know, our proposed architecture provides the first class of non-parametric estimators for MEVs that preserve these essential shape constraints. We show that our architecture approximates the dependence structure encoded by MEVs at parametric rate. Moreover, we present a new method for sampling high-dimensional MEVs using a generative model. We demonstrate our methodology on a wide range of experimental settings, ranging from environmental sciences to financial mathematics and verify that the structural properties of MEVs are retained compared to existing methods.

preprint · 2022 · arXiv

Multi-Agent Adversarial Attacks for Multi-Channel Communications

Recently, Reinforcement Learning (RL) has been applied as an anti-adversarial remedy in wireless communication networks. However, studying the RL-based approaches from the adversary's perspective has received little attention. Additionally, RL-based approaches in an anti-adversary or adversarial paradigm mostly consider single-channel communication (either channel selection or single-channel power control), while multi-channel communication is more common in practice. In this paper, we propose a multi-agent adversary system (MAAS) for modeling and analyzing adversaries in a wireless communication scenario by careful design of the reward function under realistic communication scenarios. In particular, by modeling the adversaries as learning agents, we show that the proposed MAAS is able to successfully choose the transmitted channel(s) and their respective allocated power(s) without any prior knowledge of the sender strategy. Compared to the single-agent adversary (SAA), the multiple agents in MAAS can achieve a significant reduction in signal-to-interference-plus-noise ratio (SINR) under the same power constraints and partial observability, while providing improved stability and a more efficient learning process. Moreover, through empirical studies we show that the simulation results are close to those obtained in real communication, a conclusion that is pivotal to the validity of the performance of agents evaluated in simulations.

preprint · 2022 · arXiv

On The Energy Statistics of Feature Maps in Pruning of Neural Networks with Skip-Connections

We propose a new structured pruning framework for compressing Deep Neural Networks (DNNs) with skip connections, based on measuring the statistical dependency of hidden layers and predicted outputs. The dependence measure defined by the energy statistics of hidden layers serves as a model-free measure of information between the feature maps and the output of the network. The estimated dependence measure is subsequently used to prune a collection of redundant and uninformative layers. Model-freeness of our measure guarantees that no parametric assumptions on the feature map distribution are required, making it computationally appealing for very high dimensional feature space in DNNs. Extensive numerical experiments on various architectures show the efficacy of the proposed pruning approach with competitive performance to state-of-the-art methods.

preprint · 2022 · arXiv

Semi-Empirical Objective Functions for MCMC Proposal Optimization

Current objective functions used for training neural MCMC proposal distributions implicitly rely on architectural restrictions to yield sensible optimization results, which hampers the development of highly expressive neural MCMC proposal architectures. In this work, we introduce and demonstrate a semi-empirical procedure for determining approximate objective functions suitable for optimizing arbitrarily parameterized proposal distributions in MCMC methods. Our proposed Ab Initio objective functions consist of the weighted combination of functions following constraints on their global optima and transformation invariances that we argue should be upheld by general measures of MCMC efficiency for use in proposal optimization. Our experimental results demonstrate that Ab Initio objective functions maintain favorable performance and preferable optimization behavior compared to existing objective functions for neural MCMC optimization. We find that Ab Initio objective functions are sufficiently robust to enable the confident optimization of neural proposal distributions parameterized by deep generative networks extending beyond the regimes of traditional MCMC schemes.

preprint · 2022 · arXiv

Task Affinity with Maximum Bipartite Matching in Few-Shot Learning

We propose an asymmetric affinity score for representing the complexity of utilizing the knowledge of one task for learning another one. Our method is based on the maximum bipartite matching algorithm and utilizes the Fisher Information matrix. We provide theoretical analyses demonstrating that the proposed score is mathematically well-defined, and subsequently use the affinity score to propose a novel algorithm for the few-shot learning problem. In particular, using this score, we find relevant training data labels to the test data and leverage the discovered relevant data for episodically fine-tuning a few-shot model. Results on various few-shot benchmark datasets demonstrate the efficacy of the proposed approach by improving the classification accuracy over the state-of-the-art methods even when using smaller models.

preprint · 2021 · arXiv

Dimension Reduced Turbulent Flow Data From Deep Vector Quantizers

Analyzing large-scale data from simulations of turbulent flows is memory intensive, requiring significant resources. This major challenge highlights the need for data compression techniques. In this study, we apply a physics-informed Deep Learning technique based on vector quantization to generate a discrete, low-dimensional representation of data from simulations of three-dimensional turbulent flows. The deep learning framework is composed of convolutional layers and incorporates physical constraints on the flow, such as preserving incompressibility and global statistical characteristics of the velocity gradients. The accuracy of the model is assessed using statistical, comparison-based similarity and physics-based metrics. The training data set is produced from Direct Numerical Simulation of an incompressible, statistically stationary, isotropic turbulent flow. The performance of this lossy data compression scheme is evaluated not only with unseen data from the stationary, isotropic turbulent flow, but also with data from decaying isotropic turbulence, a Taylor-Green vortex flow, and a turbulent channel flow. Defining the compression ratio (CR) as the ratio of the original data size to the compressed one, the results show that our model based on vector quantization can offer CR$=85$ with a mean square error (MSE) of $O(10^{-3})$, and predictions that faithfully reproduce the statistics of the flow, except at the very smallest scales where there is some loss. Compared to the recent study of Glaws et al. (Physical Review Fluids, 5(11):114602, 2020), which was based on a conventional autoencoder (where compression is performed in a continuous space), our model improves the CR by more than $30$ percent...

preprint · 2021 · arXiv

Generative Archimedean Copulas

We propose a new generative modeling technique for learning multidimensional cumulative distribution functions (CDFs) in the form of copulas. Specifically, we consider certain classes of copulas known as Archimedean and hierarchical Archimedean copulas, popular for their parsimonious representation and ability to model different tail dependencies. We consider their representation as mixture models with Laplace transforms of latent random variables from generative neural networks. This alternative representation allows for computational efficiencies and easy sampling, especially in high dimensions. We describe multiple methods for optimizing the network parameters. Finally, we present empirical results that demonstrate the efficacy of our proposed method in learning multidimensional CDFs and its computational efficiency compared to existing methods.
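The mixture representation mentioned in this abstract can be illustrated with a classical special case. The sketch below is not the paper's neural method: it assumes a fixed Clayton copula, whose latent variable is Gamma-distributed and whose generator is the Laplace transform of that variable. The function name `sample_clayton` and the parameter values are hypothetical.

```python
import random


def sample_clayton(theta, dim, rng):
    """Draw one point from a dim-dimensional Clayton copula (theta > 0)."""
    # Latent variable M ~ Gamma(shape=1/theta, scale=1). Its Laplace transform
    # phi(t) = (1 + t) ** (-1 / theta) is the Clayton generator.
    m = rng.gammavariate(1.0 / theta, 1.0)
    # Mixture step: U_i = phi(E_i / M) with E_i i.i.d. standard exponentials.
    return [(1.0 + rng.expovariate(1.0) / m) ** (-1.0 / theta) for _ in range(dim)]


rng = random.Random(0)
samples = [sample_clayton(theta=2.0, dim=3, rng=rng) for _ in range(2000)]
# Each margin is uniform on (0, 1), so the sample mean should be near 0.5.
mean_u1 = sum(s[0] for s in samples) / len(samples)
```

Sampling costs one latent draw plus one exponential per coordinate, which is why this representation scales easily with dimension.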

preprint · 2021 · arXiv

GeoStat Representations of Time Series for Fast Classification

Recent advances in time series classification have largely focused on methods that either employ deep learning or utilize other machine learning models for feature extraction. Though successful, their power often comes at the cost of computational complexity. In this paper, we introduce GeoStat representations for time series. GeoStat representations are based on a generalization of recent methods for trajectory classification, and summarize the information of a time series in terms of comprehensive statistics of (possibly windowed) distributions of easy-to-compute differential geometric quantities, requiring no dynamic time warping. The features used are intuitive and require minimal parameter tuning. We perform an exhaustive evaluation of GeoStat on a number of real datasets, showing that simple KNN and SVM classifiers trained on these representations exhibit surprising performance relative to modern single-model methods requiring significant computational power, achieving state-of-the-art results in many cases. In particular, we show that this methodology achieves good performance on a challenging dataset involving the classification of fishing vessels, where our methods perform well relative to the state of the art despite only having access to approximately two percent of the dataset used in training and evaluating this state of the art.

preprint · 2021 · arXiv

Projected Latent Markov Chain Monte Carlo: Conditional Sampling of Normalizing Flows

We introduce Projected Latent Markov Chain Monte Carlo (PL-MCMC), a technique for sampling from the high-dimensional conditional distributions learned by a normalizing flow. We prove that a Metropolis-Hastings implementation of PL-MCMC asymptotically samples from the exact conditional distributions associated with a normalizing flow. As a conditional sampling method, PL-MCMC enables Monte Carlo Expectation Maximization (MC-EM) training of normalizing flows from incomplete data. Through experimental tests applying normalizing flows to missing data tasks for a variety of data sets, we demonstrate the efficacy of PL-MCMC for conditional sampling from normalizing flows.

preprint · 2020 · arXiv

Convergence Rate of Empirical Spectral Distribution of Random Matrices from Linear Codes

It is known that the empirical spectral distribution of random matrices obtained from linear codes of increasing length converges to the well-known Marchenko-Pastur law, if the Hamming distance of the dual codes is at least 5. In this paper, we prove that the convergence in probability is at least of the order $n^{-1/4}$ where $n$ is the length of the code.
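For reference, the Marchenko-Pastur law named in this abstract has a closed-form density. The statement below uses the common unit-variance normalization with aspect ratio $y \in (0, 1]$. The symbols $y$, $a$, $b$ and this normalization are assumptions for illustration and may differ from the paper's conventions.

```latex
f_y(x) = \frac{\sqrt{(b - x)(x - a)}}{2 \pi y x} \, \mathbf{1}_{[a, b]}(x),
\qquad a = (1 - \sqrt{y})^2, \quad b = (1 + \sqrt{y})^2 .
```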

preprint · 2020 · arXiv

Cross-subject Decoding of Eye Movement Goals from Local Field Potentials

Objective. We consider the cross-subject decoding problem from local field potential (LFP) signals, where training data collected from the prefrontal cortex (PFC) of a source subject is used to decode intended motor actions in a destination subject. Approach. We propose a novel supervised transfer learning technique, referred to as data centering, which is used to adapt the feature space of the source to the feature space of the destination. The key ingredients of data centering are the transfer functions used to model the deterministic component of the relationship between the source and destination feature spaces. We propose an efficient data-driven estimation approach for linear transfer functions that uses the first and second order moments of the class-conditional distributions. Main result. We apply our data centering technique with linear transfer functions for cross-subject decoding of eye movement intentions in an experiment where two macaque monkeys perform memory-guided visual saccades to one of eight target locations. The results show peak cross-subject decoding performance of $80\%$, which marks a substantial improvement over a random-choice decoder. In addition, data centering also outperforms standard sampling-based methods in setups with imbalanced training data. Significance. The analyses presented herein demonstrate that the proposed data centering is a viable novel technique for reliable LFP-based cross-subject brain-computer interfacing and neural prostheses.

preprint · 2020 · arXiv

Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart for Nonconvex Optimization

Various types of parameter restart schemes have been proposed for accelerated gradient algorithms to facilitate their practical convergence in convex optimization. However, the convergence properties of accelerated gradient algorithms under parameter restart remain obscure in nonconvex optimization. In this paper, we propose a novel accelerated proximal gradient algorithm with parameter restart (named APG-restart) for solving nonconvex and nonsmooth problems. Our APG-restart is designed to 1) allow for adopting flexible parameter restart schemes that cover many existing ones; 2) have a global sub-linear convergence rate in nonconvex and nonsmooth optimization; and 3) have guaranteed convergence to a critical point and have various types of asymptotic convergence rates depending on the parameterization of local geometry in nonconvex and nonsmooth optimization. Numerical experiments demonstrate the effectiveness of our proposed algorithm.
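To make the idea of momentum with parameter restart concrete, here is a minimal sketch of a proximal gradient method with Nesterov-style momentum and a simple function-value restart. It is not the paper's APG-restart: the test problem is a convex, separable lasso-type objective chosen because its minimizer is known in closed form, and the restart rule, the function names and the data are illustrative assumptions.

```python
import math


def soft_threshold(v, t):
    """Proximal map of t * |v|."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def apg_with_restart(d, b, lam, iters=500):
    """Minimize sum_i 0.5 * d[i] * (x[i] - b[i])**2 + lam * |x[i]|."""
    step = 1.0 / max(d)  # 1 / L, with L the Lipschitz constant of the smooth gradient

    def objective(x):
        return sum(0.5 * di * (xi - bi) ** 2 + lam * abs(xi)
                   for di, xi, bi in zip(d, x, b))

    def prox_grad_step(z):
        return [soft_threshold(zi - step * di * (zi - bi), step * lam)
                for di, zi, bi in zip(d, z, b)]

    x = [0.0] * len(d)
    x_prev = list(x)
    t = 1.0
    for _ in range(iters):
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_next
        # Extrapolate with momentum, then take a proximal gradient step.
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        x_new = prox_grad_step(y)
        if objective(x_new) > objective(x):
            # Restart: drop the momentum and take a plain proximal gradient step.
            t_next = 1.0
            x_new = prox_grad_step(x)
        x_prev, x, t = x, x_new, t_next
    return x


solution = apg_with_restart(d=[1.0, 4.0, 10.0], b=[3.0, -2.0, 0.05], lam=1.0)
# Closed-form minimizer for comparison: soft_threshold(b[i], lam / d[i]).
expected = [soft_threshold(bi, 1.0 / di) for di, bi in zip([1.0, 4.0, 10.0], [3.0, -2.0, 0.05])]
```

The restart keeps the objective from increasing between iterations, which is one common motivation for restart schemes in practice.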

preprint · 2020 · arXiv

Robust Marine Buoy Placement for Ship Detection Using Dropout K-Means

Marine buoys aid in the battle against Illegal, Unreported and Unregulated (IUU) fishing by detecting fishing vessels in their vicinity. Marine buoys, however, may be disrupted by natural causes and buoy vandalism. In this paper, we formulate marine buoy placement as a clustering problem, and propose dropout k-means and dropout k-median to improve placement robustness to buoy disruption. We simulated the passage of ships in the Gabonese waters near West Africa using historical Automatic Identification System (AIS) data, then compared the ship detection probability of dropout k-means to classic k-means and dropout k-median to classic k-median. With 5 buoys, the buoy arrangements computed by classic k-means, dropout k-means, classic k-median and dropout k-median have ship detection probabilities of 38%, 45%, 48% and 52%, respectively.

preprint · 2020 · arXiv

Speech Emotion Recognition with Dual-Sequence LSTM Architecture

Speech Emotion Recognition (SER) has emerged as a critical component of the next generation human-machine interfacing technologies. In this work, we propose a new dual-level model that predicts emotions based on both MFCC features and mel-spectrograms produced from raw audio signals. Each utterance is preprocessed into MFCC features and two mel-spectrograms at different time-frequency resolutions. A standard LSTM processes the MFCC features, while a novel LSTM architecture, denoted as Dual-Sequence LSTM (DS-LSTM), processes the two mel-spectrograms simultaneously. The outputs are later averaged to produce a final classification of the utterance. Our proposed model achieves, on average, a weighted accuracy of 72.7% and an unweighted accuracy of 73.3%---a 6% improvement over current state-of-the-art unimodal models---and is comparable with multimodal models that leverage textual information as well as audio signals.

preprint · 2020 · arXiv

SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms

SARAH and SPIDER are two recently developed stochastic variance-reduced algorithms, and SPIDER has been shown to achieve a near-optimal first-order oracle complexity in smooth nonconvex optimization. However, SPIDER uses an accuracy-dependent stepsize that slows down the convergence in practice, and cannot handle objective functions that involve nonsmooth regularizers. In this paper, we propose SpiderBoost as an improved scheme, which allows the use of a much larger constant-level stepsize while maintaining the same near-optimal oracle complexity, and can be extended with proximal mapping to handle composite optimization (which is nonsmooth and nonconvex) with a provable convergence guarantee. In particular, we show that proximal SpiderBoost achieves an oracle complexity of $\mathcal{O}(\min\{n^{1/2}ε^{-2},ε^{-3}\})$ in composite nonconvex optimization, improving the state-of-the-art result by a factor of $\mathcal{O}(\min\{n^{1/6},ε^{-1/3}\})$. We further develop a novel momentum scheme to accelerate SpiderBoost for composite optimization, which achieves the near-optimal oracle complexity in theory and substantial improvement in experiments.
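The recursive gradient estimator behind SPIDER-type methods can be sketched on a toy problem. The code below is a minimal illustration, not the authors' implementation: it uses a scalar, noise-free least-squares objective so the minimizer is known, and it assumes a constant stepsize of 1/(2L) with epoch length and batch size near the square root of n. Those settings and all names are assumptions for illustration.

```python
import math
import random


def spiderboost(a, y, iters, rng):
    """Minimize (1/n) * sum_i 0.5 * (a[i] * x - y[i])**2 over a scalar x."""
    n = len(a)
    smoothness = max(ai * ai for ai in a)  # Lipschitz constant of each component gradient
    step = 1.0 / (2.0 * smoothness)        # constant stepsize, independent of target accuracy
    epoch_len = batch = max(1, math.isqrt(n))

    def grad_i(i, x):
        return a[i] * (a[i] * x - y[i])

    x = x_prev = 0.0
    v = 0.0
    for k in range(iters):
        if k % epoch_len == 0:
            # Periodic refresh with the full gradient.
            v = sum(grad_i(i, x) for i in range(n)) / n
        else:
            # Recursive estimator: correct v with minibatch gradient differences.
            idx = [rng.randrange(n) for _ in range(batch)]
            v += sum(grad_i(i, x) - grad_i(i, x_prev) for i in idx) / batch
        x_prev, x = x, x - step * v
    return x


rng = random.Random(1)
a = [rng.uniform(0.5, 1.5) for _ in range(100)]
y = [2.0 * ai for ai in a]  # noise-free data, so the minimizer is x = 2
x_hat = spiderboost(a, y, iters=300, rng=rng)
```

The full gradient is computed only once per epoch; in between, each step touches just a minibatch, which is where the oracle-complexity savings come from.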

preprint · 2019 · arXiv

Deep Clustering of Compressed Variational Embeddings

Motivated by the ever-increasing demands for limited communication bandwidth and low-power consumption, we propose a new methodology, named joint Variational Autoencoders with Bernoulli mixture models (VAB), for performing clustering in the compressed data domain. The idea is to reduce the data dimension by Variational Autoencoders (VAEs) and group data representations by Bernoulli mixture models (BMMs). Once jointly trained for compression and clustering, the model can be decomposed into two parts: a data vendor that encodes the raw data into compressed data, and a data consumer that classifies the received (compressed) data. In this way, the data vendor benefits from data security and communication bandwidth, while the data consumer benefits from low computational complexity. To enable training using the gradient descent algorithm, we propose to use the Gumbel-Softmax distribution to resolve the infeasibility of the back-propagation algorithm when assessing categorical samples.
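The Gumbel-Softmax relaxation mentioned at the end of this abstract replaces a non-differentiable categorical draw with a temperature-controlled softmax over noise-perturbed logits. The sketch below shows only the sampling step in plain Python, with hypothetical logits and temperature. It does not reproduce the VAB model or its training loop.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Return a relaxed one-hot sample over len(logits) categories."""
    perturbed = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        gumbel = -math.log(-math.log(u))                # standard Gumbel noise
        perturbed.append((logit + gumbel) / temperature)
    top = max(perturbed)                                # subtract the max for stability
    weights = [math.exp(p - top) for p in perturbed]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
draws = [gumbel_softmax(logits, temperature=0.5, rng=rng) for _ in range(5000)]
# The argmax of each relaxed sample follows the categorical distribution (Gumbel-max trick).
freq_first = sum(1 for d in draws if d.index(max(d)) == 0) / len(draws)
```

Lower temperatures push the samples toward one-hot vectors, while higher temperatures give smoother vectors; in a differentiable framework the same expression lets gradients flow to the logits.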

preprint · 2019 · arXiv

DRASIC: Distributed Recurrent Autoencoder for Scalable Image Compression

We propose a new architecture for distributed image compression from a group of distributed data sources. The work is motivated by practical needs of data-driven codec design, low power consumption, robustness, and data privacy. The proposed architecture, which we refer to as Distributed Recurrent Autoencoder for Scalable Image Compression (DRASIC), is able to train distributed encoders and one joint decoder on correlated data sources. Its compression capability is much better than the method of training codecs separately. Meanwhile, the performance of our distributed system with 10 distributed sources is only within 2 dB peak signal-to-noise ratio (PSNR) of the performance of a single codec trained with all data sources. We experiment with distributed sources with different correlations and show how closely our data-driven methodology matches the Slepian-Wolf Theorem in Distributed Source Coding (DSC). To the best of our knowledge, this is the first data-driven DSC framework for general distributed code design with deep learning.

preprint · 2019 · arXiv

Restricted Recurrent Neural Networks

Recurrent Neural Networks (RNNs) and their variations, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), have become standard building blocks for learning online data of sequential nature in many research areas, including natural language processing and speech data analysis. In this paper, we present a new methodology to significantly reduce the number of parameters in RNNs while maintaining performance that is comparable to or even better than classical RNNs. The new proposal, referred to as Restricted Recurrent Neural Network (RRNN), restricts the weight matrices corresponding to the input data and hidden states at each time step to share a large proportion of parameters. The new architecture can be regarded as a compression of its classical counterpart, but it does not require pre-training or sophisticated parameter fine-tuning, both of which are major issues in most existing compression techniques. Experiments on natural language modeling show that, compared with its classical counterpart, the restricted recurrent architecture generally produces comparable results at about a 50% compression rate. In particular, the Restricted LSTM can outperform the classical RNN with even fewer parameters.

preprint · 2019 · arXiv

Supervised Encoding for Discrete Representation Learning

Classical supervised classification tasks search for a nonlinear mapping that maps each encoded feature directly to a probability mass over the labels. Such a learning framework typically lacks the intuition that encoded features from the same class tend to be similar and thus has little interpretability for the learned features. In this paper, we propose a novel supervised learning model named Supervised-Encoding Quantizer (SEQ). The SEQ applies a quantizer to cluster and classify the encoded features. We found that the quantizer provides an interpretable graph where each cluster in the graph represents a class of data samples that have a particular style. We also trained a decoder that can decode convex combinations of the encoded features from similar and different clusters and provide guidance on style transfer between sub-classes.

preprint · 2016 · arXiv

Bridging AIC and BIC: a new criterion for autoregression

We introduce a new criterion to determine the order of an autoregressive model fitted to time series data. It has the benefits of the two well-known model selection techniques, the Akaike information criterion and the Bayesian information criterion. When the data is generated from a finite order autoregression, the Bayesian information criterion is known to be consistent, and so is the new criterion. When the true order is infinity or suitably high with respect to the sample size, the Akaike information criterion is known to be efficient in the sense that its prediction performance is asymptotically equivalent to the best offered by the candidate models; in this case, the new criterion behaves in a similar manner. Different from the two classical criteria, the proposed criterion adaptively achieves either consistency or efficiency depending on the underlying true model. In practice where the observed time series is given without any prior information about the model specification, the proposed order selection criterion is more flexible and robust compared with classical approaches. Numerical results are presented demonstrating the adaptivity of the proposed technique when applied to various datasets.
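As background for the two classical criteria this abstract builds on, the sketch below selects an autoregressive order with AIC and BIC. It does not implement the new criterion. It assumes the common forms n log(sigma_k^2) + 2k for AIC and n log(sigma_k^2) + k log(n) for BIC, where sigma_k^2 is the one-step prediction error variance from a Yule-Walker fit of order k. The simulated AR(2) series and all function names are illustrative assumptions.

```python
import math
import random


def autocovariances(x, max_lag):
    """Biased sample autocovariances r[0..max_lag]."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    return [sum(c[t] * c[t + k] for t in range(n - k)) / n for k in range(max_lag + 1)]


def prediction_error_variances(r, max_order):
    """Levinson-Durbin recursion: prediction error variance for orders 0..max_order."""
    sigma2 = [r[0]]
    phi = []
    for k in range(1, max_order + 1):
        # Reflection coefficient (partial autocorrelation at lag k).
        kappa = (r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))) / sigma2[-1]
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        sigma2.append(sigma2[-1] * (1.0 - kappa * kappa))
    return sigma2


def select_orders(x, max_order):
    """Return the orders minimizing AIC and BIC, in that order."""
    n = len(x)
    sigma2 = prediction_error_variances(autocovariances(x, max_order), max_order)
    aic = [n * math.log(s) + 2 * k for k, s in enumerate(sigma2)]
    bic = [n * math.log(s) + k * math.log(n) for k, s in enumerate(sigma2)]
    return aic.index(min(aic)), bic.index(min(bic))


rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(2000):
    x.append(0.5 * x[-1] - 0.4 * x[-2] + rng.gauss(0.0, 1.0))  # true order is 2
aic_order, bic_order = select_orders(x[2:], max_order=8)
```

Because the BIC penalty per parameter, log(n), exceeds the AIC penalty of 2 once n is larger than about 8, BIC never selects a larger order than AIC on the same fit, which is the tension between consistency and efficiency that the abstract describes.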

preprint · 2016 · arXiv

Distributed Resource Allocation Using One-Way Communication with Applications to Power Networks

Typical coordination schemes for future power grids require two-way communications. Since the number of end power-consuming devices is large, the bandwidth requirements for such two-way communication schemes may be prohibitive. Motivated by this observation, we study distributed coordination schemes that require only one-way limited communications. In particular, we investigate how a dual descent distributed optimization algorithm can be employed in power networks using one-way communication. In this iterative algorithm, system coordinators broadcast coordinating (or pricing) signals to the users/devices, who update their power consumption based on the received signal. Then system coordinators update the coordinating signals based on the physical measurement of the aggregate power usage. We provide conditions to guarantee the feasibility of the aggregated power usage at each iteration so as to avoid blackout. Furthermore, we prove the convergence of the algorithm under these conditions, and establish its rate of convergence. We illustrate the performance of our algorithms using numerical simulations. These results show that one-way limited communication may be viable for coordinating/operating the future smart grids.
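The broadcast-and-measure loop described in this abstract can be sketched with a toy model. The code below is a minimal illustration under stated assumptions, not the paper's algorithm or its feasibility conditions: each user has a hypothetical quadratic utility, so its best response to a price is a simple clipped linear rule, and the coordinator sees only the aggregate usage. The names, stepsize and initial price are assumptions.

```python
def one_way_dual_descent(a, capacity, step, price, iters):
    """Toy price coordination: user i maximizes a[i] * p - 0.5 * p**2 - price * p."""
    aggregates = []
    for _ in range(iters):
        # Broadcast (one-way): each user responds to the price locally.
        usage = [max(ai - price, 0.0) for ai in a]
        # The coordinator observes only the measured aggregate, not individual usage.
        total = sum(usage)
        aggregates.append(total)
        # Price update driven by the capacity violation, kept nonnegative.
        price = max(price + step * (total - capacity), 0.0)
    return price, aggregates


price, aggregates = one_way_dual_descent(
    a=[4.0, 3.0, 5.0], capacity=6.0, step=0.2, price=5.0, iters=100)
```

In this toy setup, starting from a high price makes the aggregate usage approach the capacity from below, so the capacity is never exceeded along the way. That is only a property of this example and these parameter choices, shown to illustrate why per-iteration feasibility is a meaningful requirement.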

preprint · 2016 · arXiv

Robustness Analysis for an Online Decentralized Descent Power allocation algorithm

As independent service providers shift from conventional energy to renewable energy sources, the power distribution system will likely experience increasingly significant fluctuation in supply, given the uncertain and intermittent nature of renewable sources like wind and solar energy. These fluctuations in power generation, coupled with time-varying consumer demands of electricity and the massive scale of power distribution networks present the need to not only design real-time decentralized power allocation algorithms, but also characterize how effective they are given fast-changing consumer demands and power generation capacities. In this paper, we present an Online Decentralized Dual Descent (OD3) power allocation algorithm and determine (in the worst case) how much of observed social welfare and price volatility can be explained by fluctuations in generation capacity and consumer demand. Convergence properties and performance guarantees of the OD3 algorithm are analyzed by characterizing the difference between the online decision and the optimal decision. The theoretical results in the paper are validated and illustrated by numerical experiments using real data.

preprint · 2015 · arXiv

Comment on "Asymptotic Achievability of the Cramér-Rao Bound for Noisy Compressive Sampling"

In [1], we proved the asymptotic achievability of the Cramér-Rao bound in the compressive sensing setting in the linear sparsity regime. In the proof, we used an erroneous closed-form expression of $ασ^2$ for the genie-aided Cramér-Rao bound $σ^2 \textrm{Tr} (\mathbf{A}^*_\mathcal{I} \mathbf{A}_\mathcal{I})^{-1}$ from Lemma 3.5, which appears in Eqs. (20) and (29). The proof, however, holds if one avoids replacing $σ^2 \textrm{Tr} (\mathbf{A}^*_\mathcal{I} \mathbf{A}_\mathcal{I})^{-1}$ by the expression of Lemma 3.5, and hence the claim of the Main Theorem stands true. In Chapter 2 of the Ph. D. dissertation by Behtash Babadi [2], this error was fixed and a more detailed proof in the non-asymptotic regime was presented. A draft of Chapter 2 of [2] is included in this note, verbatim. We would like to refer the interested reader to the full dissertation, which is electronically archived in the ProQuest database [2], and a draft of which can be accessed through the author's homepage under: http://ece.umd.edu/~behtash/babadi_thesis_2011.pdf.

preprint · 2015 · arXiv

Complementary Lattice Arrays for Coded Aperture Imaging

In this work, we consider complementary lattice arrays in order to enable a broader range of designs for coded aperture imaging systems. We provide a general framework and methods that generate richer and more flexible designs than existing ones. Besides this, we review and interpret the state-of-the-art uniformly redundant arrays (URA) designs, broaden the related concepts, and further propose some new design methods.

preprint · 2015 · arXiv

Data-Driven Learning of the Number of States in Multi-State Autoregressive Models

In this work, we consider the class of multi-state autoregressive processes that can be used to model non-stationary time-series of interest. In order to capture different autoregressive (AR) states underlying an observed time series, it is crucial to select the appropriate number of states. We propose a new model selection technique based on the Gap statistics, which uses a null reference distribution on the stable AR filters to check whether adding a new AR state significantly improves the performance of the model. To that end, we define a new distance measure between AR filters based on mean squared prediction error (MSPE), and propose an efficient method to generate random stable filters that are uniformly distributed in the coefficient space. Numerical results are provided to evaluate the performance of the proposed approach.

preprint · 2015 · arXiv

Key Pre-Distributions From Graph-Based Block Designs

With the development of wireless communication technologies, which considerably contributed to the development of wireless sensor networks (WSN), we have witnessed an ever-increasing number of WSN-based applications, which have induced a host of research activities in both academia and industry. Since most of the target WSN applications are very sensitive, security is one of the major challenges in the deployment of WSN. One of the important building blocks in securing WSN is key management. Traditional key management solutions developed for other networks are not suitable for WSN since these networks are resource (e.g. memory, computation, energy) limited. Key pre-distribution algorithms have recently evolved as efficient alternatives for key management in these networks. In key pre-distribution systems, secure communication is achieved between a pair of nodes either by the existence of a key allowing for direct communication or by a chain of keys forming a key-path between the pair. In this paper, we propose methods which bring prior knowledge of network characteristics and application constraints into the design of key pre-distribution schemes, in order to provide better security and connectivity while requiring fewer resources. Our methods are based on casting the prior information as a graph. Motivated by this idea, we also propose a class of quasi-symmetric designs referred to here as g-designs. These produce key pre-distribution schemes that significantly improve upon the existing constructions based on unital designs. We give some examples, and point out open problems for future research.

preprint2015arXiv

Learning the Number of Autoregressive Mixtures in Time Series Using the Gap Statistics

Using a proper model to characterize a time series is crucial in making accurate predictions. In this work we use a time-varying autoregressive (TVAR) process to describe non-stationary time series and model it as a mixture of multiple stable autoregressive (AR) processes. We introduce a new model selection technique based on Gap statistics to learn the appropriate number of AR filters needed to model a time series. We define a new distance measure between stable AR filters and draw a reference curve that is used to measure how much adding a new AR filter improves the performance of the model, and then choose the number of AR filters that has the maximum gap with the reference curve. To that end, we propose a new method to generate uniform random stable AR filters in the root domain. Numerical results are provided demonstrating the performance of the proposed approach.
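As a rough illustration of what generating a stable AR filter in the root domain can look like, the sketch below draws roots strictly inside the unit circle (complex roots in conjugate pairs) and expands them into real filter coefficients. The sampling distribution is an assumption chosen for simplicity; it is not claimed to reproduce the paper's uniformity property, and `random_stable_ar` is a hypothetical name.

```python
import cmath
import math
import random

def random_stable_ar(order, rng):
    """Return (coeffs, roots) for a random stable AR filter of the given order.

    coeffs = [1, a_1, ..., a_order] are the coefficients of
    prod_k (1 - z_k q^{-1}); every root z_k has modulus < 1, so the filter
    is stable. The root distribution here is an illustrative assumption.
    """
    roots = []
    remaining = order
    while remaining > 0:
        if remaining >= 2 and rng.random() < 0.5:
            # Complex conjugate pair, drawn uniformly over the upper half disc.
            z = cmath.rect(math.sqrt(rng.random()), rng.uniform(0.0, math.pi))
            roots += [z, z.conjugate()]
            remaining -= 2
        else:
            # Real root in (-1, 1); rng.random() is always < 1.
            roots.append(complex(rng.choice((-1, 1)) * rng.random(), 0.0))
            remaining -= 1
    coeffs = [1 + 0j]
    for z in roots:
        # Multiply the current polynomial by (1 - z q^{-1}).
        expanded = coeffs + [0j]
        for i, c in enumerate(coeffs):
            expanded[i + 1] -= z * c
        coeffs = expanded
    # Conjugate pairing makes the imaginary parts vanish up to rounding.
    return [c.real for c in coeffs], roots

rng = random.Random(0)
coeffs, roots = random_stable_ar(4, rng)
```

Sampling roots rather than coefficients makes stability easy to guarantee, since the stable region is simply the open unit disc in the root domain.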

preprint2014arXiv

New Conditions for Sparse Phase Retrieval

We consider the problem of sparse phase retrieval, where a $k$-sparse signal ${\bf x} \in {\mathbb R}^n \textrm{ (or } {\mathbb C}^n\textrm{)}$ is measured as ${\bf y} = |{\bf Ax}|,$ where ${\bf A} \in {\mathbb R}^{m \times n} \textrm{ (or } {\mathbb C}^{m \times n}\textrm{ respectively)}$ is a measurement matrix and $|\cdot|$ is the element-wise absolute value. For a real signal and a real measurement matrix ${\bf A}$, we show that $m = 2k$ measurements are necessary and sufficient to recover ${\bf x}$ uniquely. For complex signal ${\bf x} \in {\mathbb C}^n$ and ${\bf A} \in {\mathbb C}^{m \times n}$, we show that $m = 4k-2$ phaseless measurements are sufficient to recover ${\bf x}$. It is known that the multiplying constant $4$ in $m = 4k-2$ cannot be improved.
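The measurement model ${\bf y} = |{\bf Ax}|$ discards sign (or phase), so recovery can only be meaningful up to a global sign in the real case or a global phase in the complex case. That reading of "uniquely" is an assumption about the abstract, not a quotation from it. The small sketch below, with a made-up matrix `A` and a 2-sparse signal `x`, shows the ambiguity directly.

```python
import cmath

def measure(A, x):
    """Phaseless measurements y = |A x| (element-wise modulus)."""
    return [abs(sum(a * xi for a, xi in zip(row, x))) for row in A]

# Illustrative measurement matrix and 2-sparse signal (made-up numbers).
A = [
    [1.0, 2.0, -1.0, 0.5],
    [0.0, 1.0, 3.0, -2.0],
    [2.0, -1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
x = [0.0, 1.5, 0.0, -2.0]

y = measure(A, x)
y_negated = measure(A, [-v for v in x])                  # global sign flip
y_rotated = measure(A, [cmath.exp(0.7j) * v for v in x])  # global phase shift
```

All three measurement vectors agree, so no number of measurements can distinguish `x` from `-x` or from a phase-rotated copy.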

preprint2011arXiv

On the Order Optimality of Large-scale Underwater Networks

Capacity scaling laws are analyzed in an underwater acoustic network with $n$ regularly located nodes on a square, in which both bandwidth and received signal power can be limited significantly. A narrow-band model is assumed where the carrier frequency is allowed to scale as a function of $n$. In the network, we characterize an attenuation parameter that depends on the frequency scaling as well as the transmission distance. Cut-set upper bounds on the throughput scaling are then derived in both extended and dense networks having unit node density and unit area, respectively. We first show that in extended networks, the upper bound is inversely proportional to the attenuation parameter, thus resulting in a highly power-limited network. Interestingly, it is seen that the upper bound for extended networks is intrinsically related to the attenuation parameter but not the spreading factor. On the other hand, in dense networks, we show that there exists either a bandwidth or power limitation, or both, according to the path-loss attenuation regimes, thus yielding an upper bound that has three fundamentally different operating regimes. Furthermore, we describe an achievable scheme based on the simple nearest-neighbor multi-hop (MH) transmission. We show that in extended networks, the MH scheme is order-optimal for all the operating regimes. An achievability result is also presented in dense networks, where the operating regimes that guarantee the order optimality are identified. It thus turns out that frequency scaling is instrumental in achieving order optimality in these regimes. Finally, these scaling results are extended to a random network realization. As a result, vital information for fundamental limits of a variety of underwater network scenarios is provided by showing capacity scaling laws.

preprint2010arXiv

Bi-directional half-duplex protocols with multiple relays

In a bi-directional relay channel, two nodes wish to exchange independent messages over a shared wireless half-duplex channel with the help of relays. Recent work has considered information theoretic limits of the bi-directional relay channel with a single relay. In this work we consider bi-directional relaying with multiple relays. We derive achievable rate regions and outer bounds for half-duplex protocols with multiple decode and forward relays and compare these to the same protocols with amplify and forward relays in an additive white Gaussian noise channel. We consider three novel classes of half-duplex protocols: the (m,2) 2 phase protocol with m relays, the (m,3) 3 phase protocol with m relays, and general (m,t) Multiple Hops and Multiple Relays (MHMR) protocols, where m is the total number of relays and 3 < t < m+3 is the number of temporal phases in the protocol. The (m,2) and (m,3) protocols extend previous bi-directional relaying protocols for a single m=1 relay, while the new (m,t) protocol efficiently combines multi-hop routing with message-level network coding. Finally, we provide a comprehensive treatment of the MHMR protocols with decode and forward relaying and amplify and forward relaying in Gaussian noise, obtaining their respective achievable rate regions, outer bounds and relative performance under different SNRs and relay geometries, including an analytical comparison of the protocols at low and high SNR.

preprint2010arXiv

Improved Capacity Scaling in Wireless Networks With Infrastructure

This paper analyzes the impact and benefits of infrastructure support in improving the throughput scaling in networks of $n$ randomly located wireless nodes. The infrastructure uses multi-antenna base stations (BSs), in which the number of BSs and the number of antennas at each BS can scale at arbitrary rates relative to $n$. Under the model, capacity scaling laws are analyzed for both dense and extended networks. Two BS-based routing schemes are first introduced in this study: an infrastructure-supported single-hop (ISH) routing protocol with multiple-access uplink and broadcast downlink and an infrastructure-supported multi-hop (IMH) routing protocol. Then, their achievable throughput scalings are analyzed. These schemes are compared against two conventional schemes without BSs: the multi-hop (MH) transmission and hierarchical cooperation (HC) schemes. It is shown that a linear throughput scaling is achieved in dense networks, as in the case without help of BSs. In contrast, the proposed BS-based routing schemes can, under realistic network conditions, improve the throughput scaling significantly in extended networks. The gain comes from the following advantages of these BS-based protocols. First, more nodes can transmit simultaneously in the proposed scheme than in the MH scheme if the number of BSs and the number of antennas are large enough. Second, by improving the long-distance signal-to-noise ratio (SNR), the received signal power can be larger than that of the HC, enabling a better throughput scaling under extended networks. Furthermore, by deriving the corresponding information-theoretic cut-set upper bounds, it is shown under extended networks that a combination of four schemes IMH, ISH, MH, and HC is order-optimal in all operating regimes.

preprint2010arXiv

On Capacity Scaling of Underwater Networks: An Information-Theoretic Perspective

Capacity scaling laws are analyzed in an underwater acoustic network with $n$ regularly located nodes on a square. A narrow-band model is assumed where the carrier frequency is allowed to scale as a function of $n$. In the network, we characterize an attenuation parameter that depends on the frequency scaling as well as the transmission distance. A cut-set upper bound on the throughput scaling is then derived in extended networks. Our result indicates that the upper bound is inversely proportional to the attenuation parameter, thus resulting in a highly power-limited network. Interestingly, it is seen that unlike the case of wireless radio networks, our upper bound is intrinsically related to the attenuation parameter but not the spreading factor. Furthermore, we describe an achievable scheme based on the simple nearest neighbor multi-hop (MH) transmission. It is shown under extended networks that the MH scheme is order-optimal as the attenuation parameter scales exponentially with $\sqrt{n}$ (or faster). Finally, these scaling results are extended to a random network realization.

preprint2009arXiv

Cognitive Networks Achieve Throughput Scaling of a Homogeneous Network

We study two distinct, but overlapping, networks that operate at the same time, space, and frequency. The first network consists of $n$ randomly distributed \emph{primary users}, which form either an ad hoc network, or an infrastructure-supported ad hoc network with $l$ additional base stations. The second network consists of $m$ randomly distributed, ad hoc secondary users or cognitive users. The primary users have priority access to the spectrum and do not need to change their communication protocol in the presence of secondary users. The secondary users, however, need to adjust their protocol based on knowledge about the locations of the primary nodes to bring little loss to the primary network's throughput. By introducing preservation regions around primary receivers and avoidance regions around primary base stations, we propose two modified multihop routing protocols for the cognitive users. Based on percolation theory, we show that when the secondary network is denser than the primary network, both networks can simultaneously achieve the same throughput scaling law as a stand-alone network. Furthermore, the primary network throughput is subject to only a vanishingly fractional loss. Specifically, for the ad hoc and the infrastructure-supported primary models, the primary network achieves sum throughputs of order $n^{1/2}$ and $\max\{n^{1/2},l\}$, respectively. For both primary network models, for any $\delta>0$, the secondary network can achieve sum throughput of order $m^{1/2-\delta}$ with an arbitrarily small fraction of outage. Thus, almost all secondary source-destination pairs can communicate at a rate of order $m^{-1/2-\delta}$.
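The step from the sum throughput to the per-pair rate in the last two sentences can be read as a simple division, assuming the secondary network carries on the order of $m$ source-destination pairs that share the sum throughput evenly. The symbols $T_{\mathrm{sum}}$ and $R_{\mathrm{pair}}$ are introduced here for illustration only.

```latex
T_{\mathrm{sum}}(m) = \Theta\!\left(m^{1/2-\delta}\right)
\quad\Longrightarrow\quad
R_{\mathrm{pair}}(m) = \frac{T_{\mathrm{sum}}(m)}{m}
 = \Theta\!\left(\frac{m^{1/2-\delta}}{m}\right)
 = \Theta\!\left(m^{-1/2-\delta}\right).
```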

preprint2007arXiv

Rate of Channel Hardening of Antenna Selection Diversity Schemes and Its Implication on Scheduling

For a multiple antenna system, we compute the asymptotic distribution of antenna selection gain when the transmitter selects the transmit antenna with the strongest channel. We use this to asymptotically estimate the underlying channel capacity distributions, and demonstrate that unlike multiple-input/multiple-output (MIMO) systems, the channel for antenna selection systems hardens at a slower rate, and thus a significant multiuser scheduling gain can exist: $O(1/\log m)$ for channel selection as opposed to $O(1/\sqrt{m})$ for MIMO, where $m$ is the number of transmit antennas. Additionally, even without this scheduling gain, it is demonstrated that transmit antenna selection systems outperform open loop MIMO systems in low signal-to-interference-plus-noise ratio (SINR) regimes, particularly for a small number of receive antennas. This may have some implications on wireless system design, because most of the users in modern wireless systems have low SINRs.
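A toy calculation can make the two hardening rates plausible. It assumes Rayleigh fading, so that per-antenna channel power gains are i.i.d. unit-mean exponential random variables. It uses the coefficient of variation (standard deviation over mean) as a stand-in for how much the channel still fluctuates, and it models the MIMO reference as the sum of the $m$ gains. None of these modelling choices is taken from the paper. The maximum of $m$ such gains has mean $\sum_{k=1}^{m} 1/k$ and variance $\sum_{k=1}^{m} 1/k^2$, while the sum has mean $m$ and variance $m$.

```python
import math

def selection_cv(m):
    """Coefficient of variation of the best of m i.i.d. Exp(1) channel gains.

    The mean is the harmonic number H_m (about log m) and the variance is
    bounded by pi^2 / 6, so the ratio decays only like 1 / log m.
    """
    mean = sum(1.0 / k for k in range(1, m + 1))
    variance = sum(1.0 / k ** 2 for k in range(1, m + 1))
    return math.sqrt(variance) / mean

def combining_cv(m):
    """Coefficient of variation of the sum of m i.i.d. Exp(1) channel gains.

    Mean m and variance m give exactly 1 / sqrt(m).
    """
    return 1.0 / math.sqrt(m)

# Compare the residual fluctuation for a growing number of antennas.
table = [(m, selection_cv(m), combining_cv(m)) for m in (2, 4, 16, 64, 256)]
```

Under these assumptions the selection channel keeps a larger relative fluctuation than the combining channel for every $m \ge 2$, which is the kind of slower hardening that leaves room for scheduling gain.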

preprint2007arXiv

Scaling Laws of Cognitive Networks

We consider a cognitive network consisting of n random pairs of cognitive transmitters and receivers communicating simultaneously in the presence of multiple primary users. Of interest is how the maximum throughput achieved by the cognitive users scales with n, and how far these users must be from a primary user to guarantee a given primary outage. Two scenarios are considered for the network scaling law: (i) when each cognitive transmitter uses constant power to communicate with a cognitive receiver at a bounded distance away, and (ii) when each cognitive transmitter scales its power according to the distance to a considered primary user, allowing the cognitive transmitter-receiver distances to grow. Using single-hop transmission, suitable for cognitive devices of opportunistic nature, we show that, in both scenarios, with path loss larger than 2, the cognitive network throughput scales linearly with the number of cognitive users. We then explore the radius of a primary exclusive region void of cognitive transmitters. We obtain bounds on this radius for a given primary outage constraint. These bounds can help in the design of a primary network with exclusive regions, outside of which cognitive users may transmit freely. Our results show that opportunistic secondary spectrum access using single-hop transmission is promising.

preprint2005arXiv

Collaborative Beamforming for Distributed Wireless Ad Hoc Sensor Networks

The performance of collaborative beamforming is analyzed using the theory of random arrays. The statistical average and distribution of the beampattern of randomly generated phased arrays are derived in the framework of wireless ad hoc sensor networks. Each sensor node is assumed to have a single isotropic antenna and nodes in the cluster collaboratively transmit the signal such that the signal in the target direction is coherently added in the far-field region. It is shown that with N sensor nodes uniformly distributed over a disk, the directivity can approach N, provided that the nodes are located sparsely enough. The distribution of the maximum sidelobe peak is also studied. With the application to ad hoc networks in mind, two scenarios, closed-loop and open-loop, are considered. Associated with these scenarios, the effects of phase jitter and location estimation errors on the average beampattern are also analyzed.
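The coherent-addition idea can be checked with a small Monte Carlo sketch. It assumes a planar geometry, nodes drawn uniformly over a disk whose radius is given in wavelengths, ideal phase compensation toward the target direction, and no phase jitter or location error. Function names and parameter values are illustrative assumptions, not the paper's notation.

```python
import cmath
import math
import random

def random_cluster(n, radius, rng):
    """Draw n node positions (r, psi) uniformly over a disk (radius in wavelengths)."""
    return [(radius * math.sqrt(rng.random()), rng.uniform(0.0, 2.0 * math.pi))
            for _ in range(n)]

def beam_power(nodes, phi, phi0=0.0):
    """Normalized far-field power |F(phi)|^2 when nodes co-phase toward phi0."""
    total = 0j
    for r, psi in nodes:
        # Residual phase of this node's signal in direction phi after it has
        # compensated its path difference toward the target direction phi0.
        phase = 2.0 * math.pi * r * (math.cos(phi - psi) - math.cos(phi0 - psi))
        total += cmath.exp(1j * phase)
    return abs(total / len(nodes)) ** 2

def average_power(n, radius, phi, trials, rng):
    """Average beam power in direction phi over independently drawn clusters."""
    return sum(beam_power(random_cluster(n, radius, rng), phi)
               for _ in range(trials)) / trials

rng = random.Random(1)
nodes = random_cluster(16, 10.0, rng)
mainlobe = beam_power(nodes, 0.0)                         # target direction
sidelobe = average_power(16, 10.0, math.pi / 2, 400, rng)  # far off target
```

In the target direction all contributions align and the normalized power is 1. Far from it, the phases are effectively random for a sparse cluster, so the average power settles near $1/N$, consistent with the directivity approaching $N$.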

38 published item(s)

Fisher Task Distance and Its Application in Neural Architecture Search

Improved Automated Machine Learning from Transfer Learning

Modeling Extremes with d-max-decreasing Neural Networks

Multi-Agent Adversarial Attacks for Multi-Channel Communications

On The Energy Statistics of Feature Maps in Pruning of Neural Networks with Skip-Connections

Semi-Empirical Objective Functions for MCMC Proposal Optimization

Task Affinity with Maximum Bipartite Matching in Few-Shot Learning

Dimension Reduced Turbulent Flow Data From Deep Vector Quantizers

Generative Archimedean Copulas

GeoStat Representations of Time Series for Fast Classification

Projected Latent Markov Chain Monte Carlo: Conditional Sampling of Normalizing Flows

Convergence Rate of Empirical Spectral Distribution of Random Matrices from Linear Codes

Cross-subject Decoding of Eye Movement Goals from Local Field Potentials

Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart for Nonconvex Optimization

Robust Marine Buoy Placement for Ship Detection Using Dropout K-Means

Speech Emotion Recognition with Dual-Sequence LSTM Architecture

SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms

Deep Clustering of Compressed Variational Embeddings

DRASIC: Distributed Recurrent Autoencoder for Scalable Image Compression

Restricted Recurrent Neural Networks

Supervised Encoding for Discrete Representation Learning

Bridging AIC and BIC: a new criterion for autoregression

Distributed Resource Allocation Using One-Way Communication with Applications to Power Networks

Robustness Analysis for an Online Decentralized Descent Power allocation algorithm

Comment on "Asymptotic Achievability of the Cramér-Rao Bound for Noisy Compressive Sampling"

Complementary Lattice Arrays for Coded Aperture Imaging

Data-Driven Learning of the Number of States in Multi-State Autoregressive Models

Key Pre-Distributions From Graph-Based Block Designs

Learning the Number of Autoregressive Mixtures in Time Series Using the Gap Statistics

New Conditions for Sparse Phase Retrieval

On the Order Optimality of Large-scale Underwater Networks

Bi-directional half-duplex protocols with multiple relays

Improved Capacity Scaling in Wireless Networks With Infrastructure

On Capacity Scaling of Underwater Networks: An Information-Theoretic Perspective

Cognitive Networks Achieve Throughput Scaling of a Homogeneous Network

Rate of Channel Hardening of Antenna Selection Diversity Schemes and Its Implication on Scheduling

Scaling Laws of Cognitive Networks

Collaborative Beamforming for Distributed Wireless Ad Hoc Sensor Networks