Source author record

Arash Behboodi

Arash Behboodi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Information Theory, math.IT, Machine Learning, eess.SP, Artificial Intelligence, Computation and Language

Catalog footprint

17 works · 6 topics · 4 close collaborators


Preprint · 2026 · arXiv

Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models

Recurrent LLM architectures have emerged as a promising approach for improving reasoning, as they enable multi-step computation in the embedding space without generating intermediate tokens. Models such as Ouro perform reasoning by iteratively updating internal representations while retaining a standard Key-Value (KV) cache across iterations, causing memory consumption to grow linearly with reasoning depth. Consequently, increasing the number of reasoning iterations can lead to prohibitive memory usage, limiting the practical scalability of such architectures. In this work, we propose the Memory-Efficient Looped Transformer (MELT), a novel architecture that decouples reasoning depth from memory consumption. Instead of using a standard KV cache per layer and loop, MELT maintains a single KV cache per layer that is shared across reasoning loops and updated over time via a learnable gating mechanism. To enable stable and efficient training under this architecture, we train MELT with chunk-wise training in a two-phase procedure: an interpolated transition followed by attention-aligned distillation, both from the starting LoopLM model to MELT. Empirically, we show that MELT models fine-tuned from pretrained Ouro parameters outperform standard LLMs of comparable size while keeping a memory footprint comparable to those models and dramatically smaller than Ouro's. Overall, MELT achieves constant-memory iterative reasoning without sacrificing LoopLM performance, using only a lightweight post-training procedure.
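A minimal sketch of why a shared, gated cache keeps memory constant in the number of loops. The abstract only says the per-layer cache is "updated over time via a learnable gating mechanism", so the gate form below (a sigmoid over the concatenated old and new entries) and the names `gated_cache_update`, `W_g`, `b_g` and `run_loops` are hypothetical; a single array stands in for both keys and values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_cache_update(cache, new_kv, W_g, b_g):
    """Blend the shared per-layer cache with this loop's key/value entries.

    Hypothetical gate: g = sigmoid([cache, new_kv] @ W_g + b_g), applied per position.
    cache, new_kv: (seq_len, d); W_g: (2 * d, d); b_g: (d,).
    """
    g = sigmoid(np.concatenate([cache, new_kv], axis=-1) @ W_g + b_g)
    return g * new_kv + (1.0 - g) * cache

def run_loops(num_loops, seq_len=8, d=4, seed=0):
    rng = np.random.default_rng(seed)
    W_g = rng.normal(scale=0.1, size=(2 * d, d))
    b_g = np.zeros(d)
    shared_cache = np.zeros((seq_len, d))  # one cache per layer, reused by every loop
    standard_cache = []                    # baseline: a new cache per (layer, loop)
    for _ in range(num_loops):
        new_kv = rng.normal(size=(seq_len, d))  # stand-in for this loop's K/V projections
        shared_cache = gated_cache_update(shared_cache, new_kv, W_g, b_g)
        standard_cache.append(new_kv)
    return shared_cache, standard_cache

shared, standard = run_loops(num_loops=6)
print(shared.shape, len(standard) * standard[0].shape[0])  # (8, 4) 48
```

The shared cache stays at `(seq_len, d)` however many loops run, while the per-loop baseline grows by `seq_len` rows per loop.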

Preprint · 2022 · arXiv

Deep Learning-based Channel Estimation for Wideband Hybrid MmWave Massive MIMO

Hybrid analog-digital (HAD) architecture is widely adopted in practical millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems to reduce hardware cost and energy consumption. However, channel estimation with HAD is challenging because of the limited number of radio frequency (RF) chains at the transceivers. Although various compressive sensing (CS) algorithms have been developed to solve this problem by exploiting inherent channel sparsity and sparsity structures, practical effects such as power leakage and beam squint can still make the real channel features deviate from the assumed models and degrade performance. In addition, the high complexity of CS algorithms, caused by their large number of iterations, hinders their use in practice. To tackle these issues, we develop a deep learning (DL)-based channel estimation approach in which the sparse Bayesian learning (SBL) algorithm is unfolded into a deep neural network (DNN). In each SBL layer, the Gaussian variance parameters of the sparse angular-domain channel are updated by a tailored DNN, which effectively captures complicated channel sparsity structures in various domains. The measurement matrix is also jointly optimized to improve performance. The proposed approach is then extended to the multi-block case, where temporal channel correlation is further exploited to adaptively predict the measurement matrix and facilitate the update of the Gaussian variance parameters. Simulation results show that the proposed approaches significantly outperform existing approaches while having reduced complexity.
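Each unfolded layer is built around a classical SBL iteration. The sketch below shows that iteration on a real-valued toy problem $y = A x + n$, using the standard EM update for the variance parameters $\gamma$; in the paper this update is produced by a tailored DNN, which is not reproduced here. The function name, problem sizes and iteration count are illustrative assumptions.

```python
import numpy as np

def sbl_layer(y, A, gamma, noise_var):
    """One SBL (EM) iteration: Gaussian posterior, then a variance-parameter update.

    In an unfolded network, the gamma update below is the step a DNN would replace.
    """
    Sigma = np.linalg.inv(A.T @ A / noise_var + np.diag(1.0 / gamma))
    mu = Sigma @ A.T @ y / noise_var
    new_gamma = mu**2 + np.diag(Sigma)
    return mu, np.maximum(new_gamma, 1e-12)  # floor keeps 1/gamma finite

rng = np.random.default_rng(0)
m, n = 30, 50
A = rng.normal(size=(m, n)) / np.sqrt(m)
x = np.zeros(n)
support = [5, 17, 42]
x[support] = [3.0, -2.0, 2.5]
noise_var = 1e-3
y = A @ x + np.sqrt(noise_var) * rng.normal(size=m)

gamma = np.ones(n)
for _ in range(300):  # unfolding fixes this count: one network layer per iteration
    mu, gamma = sbl_layer(y, A, gamma, noise_var)
```

Unfolding truncates this loop to a small, fixed number of layers and learns the update, which is where the complexity reduction over many-iteration CS solvers comes from.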

Preprint · 2022 · arXiv

Equivariant Priors for Compressed Sensing with Unknown Orientation

In compressed sensing, the goal is to reconstruct a signal from an underdetermined system of linear measurements, so prior knowledge about the signal of interest and its structure is required. Additionally, in many scenarios the signal has an unknown orientation prior to measurement. To address such recovery problems, we propose using equivariant generative models as a prior, which encapsulate orientation information in their latent space. We show that signals with unknown orientations can be recovered with iterative gradient descent on the latent space of these models and provide additional theoretical recovery guarantees. We construct an equivariant variational autoencoder and use its decoder as a generative prior for compressed sensing. We also discuss potential gains of the proposed approach in terms of convergence and latency.
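To illustrate recovery by gradient descent on a generative model's latent space, the sketch below replaces the equivariant VAE decoder with a fixed linear decoder $G(z) = W z$. That substitution is an assumption made for brevity: it carries no orientation structure, but the recovery loop, minimizing $\tfrac12\|A G(z) - y\|_2^2$ over $z$, has the same shape.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 64, 16, 4           # signal dim, measurements (m < n), latent dim
W = rng.normal(size=(n, k))   # hypothetical linear "decoder" G(z) = W z
A = rng.normal(size=(m, n)) / np.sqrt(m)
z_true = rng.normal(size=k)
y = A @ W @ z_true            # noiseless, underdetermined measurements of x = G(z_true)

B = A @ W                                 # composite forward map z -> y
step = 1.0 / np.linalg.norm(B, 2) ** 2    # 1/L, L = Lipschitz constant of the gradient
z = np.zeros(k)
for _ in range(2000):
    grad = B.T @ (B @ z - y)              # gradient of 0.5 * ||B z - y||^2
    z = z - step * grad
x_hat = W @ z                             # recovered signal in the ambient space
```

Because the search runs over $k$ latent coordinates rather than $n$ signal entries, $m \ge k$ measurements suffice here even though $m < n$.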

Preprint · 2022 · arXiv

Generalization Error Bounds for Iterative Recovery Algorithms Unfolded as Neural Networks

Motivated by the learned iterative soft thresholding algorithm (LISTA), we introduce a general class of neural networks suitable for sparse reconstruction from few linear measurements. By allowing a wide range of degrees of weight sharing between the layers, we enable a unified analysis of very different neural network types, ranging from recurrent networks to networks more similar to standard feedforward neural networks. Using empirical risk minimization on training samples, we aim to learn the optimal network parameters and thereby the optimal network for reconstructing signals from their low-dimensional linear measurements. We derive generalization bounds by analyzing the Rademacher complexity of hypothesis classes consisting of such deep networks, which also take the thresholding parameters into account. We obtain estimates of the sample complexity that depend essentially only linearly on the number of parameters and on the depth. We apply our main result to obtain specific generalization bounds for several practical examples, including different algorithms for (implicit) dictionary learning and convolutional neural networks.
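For concreteness, the layer shared by ISTA and LISTA is $x^{(t+1)} = \eta_\theta(W_1 y + W_2 x^{(t)})$ with soft thresholding $\eta_\theta$. The sketch initializes $W_1 = A^\top/L$, $W_2 = I - A^\top A/L$, $\theta = \lambda/L$, which reproduces ISTA exactly, and lets the caller choose how weights are shared across layers (one tuple repeated is the fully recurrent case; distinct tuples give an untied, feedforward-like network). Training by empirical risk minimization is omitted, and the function names are illustrative.

```python
import numpy as np

def soft_threshold(v, theta):
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def lista_forward(y, layers):
    """layers: list of (W1, W2, theta); repeating one tuple means full weight sharing."""
    x = np.zeros(layers[0][1].shape[0])
    for W1, W2, theta in layers:
        x = soft_threshold(W1 @ y + W2 @ x, theta)
    return x

def ista_init(A, lam):
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the least-squares gradient
    n = A.shape[1]
    return A.T / L, np.eye(n) - A.T @ A / L, lam / L

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 40)) / np.sqrt(20)
x_true = np.zeros(40)
x_true[[3, 11, 30]] = [1.0, -1.5, 2.0]
y = A @ x_true
tied = ista_init(A, lam=0.01)
x_hat = lista_forward(y, [tied] * 200)  # 200 layers with tied weights
```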

Preprint · 2022 · arXiv

Learning Perturbations for Soft-Output Linear MIMO Demappers

Tree-based demappers for multiple-input multiple-output (MIMO) detection, such as the sphere decoder, can achieve near-optimal performance but incur high computational cost due to their sequential nature. In this paper, we propose the perturbed linear demapper (PLM), a novel data-driven model for computing soft outputs in parallel. To achieve this, the PLM learns a distribution centered on an initial linear estimate, together with a log-likelihood ratio clipping parameter, using end-to-end Bayesian optimization. Furthermore, we show that lattice reduction can be naturally incorporated into the PLM pipeline, which allows trading off computational cost against coded block error rate reduction. We find that the optimized PLM can achieve near maximum-likelihood (ML) performance in Rayleigh channels, making it an efficient alternative to tree-based demappers.
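A rough illustration of the ingredients named above (a linear estimate, perturbations around it, soft outputs over the resulting candidates, and LLR clipping). Everything specific here is an assumption: BPSK symbols, an LMMSE initial estimate, a fixed Gaussian perturbation scale and max-log LLRs over the candidate list. In the PLM the perturbation distribution and clipping parameter are learned with Bayesian optimization and lattice reduction can be added; neither is modeled. `plm_like_llrs`, `sigma_p` and `clip` are hypothetical names.

```python
import numpy as np

def plm_like_llrs(y, H, noise_var, num_samples=32, sigma_p=0.5, clip=10.0, seed=0):
    """Max-log LLRs over a candidate list built by perturbing a linear estimate.

    Hypothetical simplification: BPSK symbols in {-1, +1}, real-valued AWGN with
    variance noise_var, and a fixed (not learned) perturbation scale sigma_p.
    """
    rng = np.random.default_rng(seed)
    n = H.shape[1]
    # LMMSE estimate for unit-energy BPSK symbols
    x_lin = np.linalg.solve(H.T @ H + noise_var * np.eye(n), H.T @ y)
    # Candidates: hard decisions on the linear estimate and on perturbed copies of it
    points = np.vstack([x_lin, x_lin + sigma_p * rng.normal(size=(num_samples, n))])
    cands = np.unique(np.where(points >= 0, 1.0, -1.0), axis=0)
    # Scaled squared distances: exp(-dist) is proportional to the likelihood
    dist = np.sum((y - cands @ H.T) ** 2, axis=1) / (2.0 * noise_var)
    llrs = np.empty(n)
    for i in range(n):
        d_plus, d_minus = dist[cands[:, i] > 0], dist[cands[:, i] < 0]
        if d_minus.size == 0:
            llrs[i] = clip    # only +1 hypotheses in the list
        elif d_plus.size == 0:
            llrs[i] = -clip   # only -1 hypotheses in the list
        else:
            llrs[i] = d_minus.min() - d_plus.min()  # max-log of log P(+1)/P(-1)
    return np.clip(llrs, -clip, clip)

rng = np.random.default_rng(3)
H = np.eye(4) + 0.1 * rng.normal(size=(4, 4))
x = np.array([1.0, -1.0, -1.0, 1.0])
llrs = plm_like_llrs(H @ x, H, noise_var=1e-2)
```

All candidates are scored independently, so the list evaluation parallelizes naturally, unlike the sequential tree search of a sphere decoder.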

Preprint · 2022 · arXiv

MIMO-GAN: Generative MIMO Channel Modeling

We propose generative channel modeling to learn statistical channel models from channel input-output measurements. Generative channel models can learn more complicated distributions and represent field data more faithfully. They are tractable and easy to sample from, which can potentially speed up simulation rounds. To achieve this, we leverage advances in generative adversarial networks (GANs), which let us learn an implicit distribution over stochastic MIMO channels from observed measurements. In particular, our approach, MIMO-GAN, implicitly models the wireless channel as a distribution of time-domain band-limited impulse responses. We evaluate MIMO-GAN on 3GPP TDL MIMO channels and observe high consistency in capturing the power, delay and spatial correlation statistics of the underlying channel. In particular, we observe that MIMO-GAN achieves errors under 3.57 ns in average delay and -18.7 dB in power.
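The delay and power statistics used in such evaluations can be computed directly from a tapped-delay-line impulse response. A minimal sketch using the standard power-weighted definitions of mean delay and RMS delay spread; the abstract does not state exactly how the reported errors were computed, so this shows the quantities only, not the paper's evaluation code.

```python
import numpy as np

def delay_stats(tap_delays_ns, tap_gains):
    """Mean delay and RMS delay spread (ns) of a tapped-delay-line channel.

    tap_gains may be complex; each tap is weighted by its power |h|^2.
    """
    p = np.abs(np.asarray(tap_gains)) ** 2
    tau = np.asarray(tap_delays_ns, dtype=float)
    mean_delay = np.sum(p * tau) / np.sum(p)
    rms_spread = np.sqrt(np.sum(p * (tau - mean_delay) ** 2) / np.sum(p))
    return mean_delay, rms_spread

def power_db(tap_gains):
    """Total channel power in dB."""
    return 10.0 * np.log10(np.sum(np.abs(np.asarray(tap_gains)) ** 2))

mean_d, rms_d = delay_stats([0.0, 10.0], [1.0, 1.0j])  # two equal-power taps
```

Comparing these statistics between real and generated impulse responses is one way to quantify how faithfully a generative model reproduces the channel.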

Preprint · 2022 · arXiv

Neural RF SLAM for unsupervised positioning and mapping with channel state information

We present a neural network architecture for jointly learning user locations and environment mapping up to isometry, in an unsupervised way, from channel state information (CSI) values with no location information. The model is based on an encoder-decoder architecture. The encoder network maps CSI values to the user location. The decoder network models the physics of propagation by parametrizing the environment using virtual anchors. From the encoder output and the virtual anchor locations, it aims to reconstruct the set of times of flight (ToFs) extracted from the CSI using super-resolution methods. The learning task is therefore set prediction, and the network is trained end-to-end accordingly. The proposed model learns an interpretable latent variable, namely the user location, simply by enforcing a physics-based decoder. It is shown that the proposed model achieves sub-meter accuracy on synthetic ray-tracing datasets with a single-anchor SISO setup, while recovering the environment map with a median error of up to 4 cm in a 2D environment and 15 cm in a 3D environment.
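The physics-based decoder can be illustrated with the usual geometric view of virtual anchors: each ToF is the distance from the user to an anchor divided by the speed of light. Because the ToFs form an unordered set, the reconstruction loss must be permutation invariant. The sketch uses sorted matching, which is an optimal pairing for equal-size sets of scalars under squared error; the function names, anchor positions and this particular loss are assumptions, not the paper's.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light in m/s

def decode_tofs(user_pos, virtual_anchors):
    """ToFs in seconds from a user position (d,) to each virtual anchor (K, d)."""
    return np.linalg.norm(virtual_anchors - user_pos, axis=1) / C_LIGHT

def set_loss(pred_tofs, measured_tofs):
    """Permutation-invariant squared error between equal-size ToF sets."""
    return float(np.mean((np.sort(pred_tofs) - np.sort(measured_tofs)) ** 2))

anchors = np.array([[3.0, 4.0], [-6.0, 8.0]])  # hypothetical 2-D virtual anchors
tofs = decode_tofs(np.array([0.0, 0.0]), anchors)
```

In the encoder-decoder setting, `user_pos` would come from the encoder and the anchor positions would be learned parameters, so minimizing `set_loss` ties the latent to physical geometry.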

Preprint · 2022 · arXiv

Position Aided Beam Prediction in the Real World: How Useful GPS Locations Actually Are?

Millimeter-wave (mmWave) communication systems rely on narrow beams for achieving sufficient receive signal power. Adjusting these beams is typically associated with large training overhead, which becomes particularly critical for highly-mobile applications. Intuitively, since optimal beam selection can benefit from the knowledge of the positions of communication terminals, there has been increasing interest in leveraging position data to reduce the overhead in mmWave beam prediction. Prior work, however, studied this problem using only synthetic data that generally does not accurately represent real-world measurements. In this paper, we investigate position-aided beam prediction using a real-world large-scale dataset to derive insights into precisely how much overhead can be saved in practice. Furthermore, we analyze which machine learning algorithms perform best, what factors degrade inference performance in real data, and which machine learning metrics are more meaningful in capturing the actual communication system performance.
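One simple position-to-beam predictor is k-nearest-neighbour voting over positions with known best beams, scored by top-k accuracy. The abstract does not name the algorithms that were compared or which performed best, so this is only an illustration; the data, function names and choice of kNN are hypothetical.

```python
import numpy as np

def knn_beam_scores(train_pos, train_beams, query_pos, num_beams, k=5):
    """Score each beam by its votes among the k training positions nearest the query."""
    d = np.linalg.norm(train_pos - query_pos, axis=1)
    nearest = np.argsort(d)[:k]
    return np.bincount(train_beams[nearest], minlength=num_beams)

def top_k_accuracy(scores, true_beams, k):
    """Fraction of samples whose true best beam is among the k highest-scored beams."""
    top = np.argsort(-scores, axis=1)[:, :k]
    return float(np.mean([t in row for t, row in zip(true_beams, top)]))

train_pos = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
train_beams = np.array([2, 2, 7, 7])
queries = np.array([[0.05, 0.0], [5.05, 5.0]])
scores = np.stack(
    [knn_beam_scores(train_pos, train_beams, q, num_beams=8, k=2) for q in queries]
)
```

Sweeping only the top-k predicted beams instead of the full codebook is what reduces beam-training overhead, which is why top-k accuracy is a natural metric here.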

Preprint · 2022 · arXiv

The Restricted Isometry Property of Block Diagonal Matrices for Group-Sparse Signal Recovery

Group sparsity is a common low-complexity signal model with widespread applications across various domains of science and engineering. The recovery of such signal ensembles from compressive measurements has been extensively studied in the literature under the assumption that measurement operators are modeled as densely populated random matrices. In this paper, we turn our attention to an acquisition model intended to reduce the energy consumption of sensing devices by splitting the measurements into distinct signal blocks. More precisely, we present uniform guarantees for group-sparse signal recovery in the scenario where a number of sensors obtain independent partial signal observations modeled by block diagonal measurement matrices. We establish a group-sparse variant of the classical restricted isometry property for block diagonal sensing matrices acting on group-sparse vectors and provide conditions under which subgaussian block diagonal random matrices satisfy this group-RIP with high probability. Two different scenarios are considered. In the first, we assume that each sensor is equipped with an independently drawn measurement matrix. We later lift this requirement by considering measurement matrices with constant block diagonal entries; in other words, every sensor is equipped with a copy of the same prototype matrix. The problem of establishing the group-RIP is cast into a form in which one needs to establish the concentration behavior of the suprema of chaos processes, which involves estimating Talagrand's $\gamma_2$ functional. As a side effect of the proof, we present an extension of Maurey's empirical method that provides new bounds on the covering number of sets consisting of finite convex combinations of possibly infinite sets.
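The two acquisition models can be written down directly. The sketch builds a block diagonal sensing matrix from either independent Gaussian blocks (first scenario) or copies of one prototype block (second scenario) and prints $\|\Phi x\|_2^2/\|x\|_2^2$ for one group-sparse $x$. This is an empirical check on a single vector, not a verification of the group-RIP; the block sizes, the $1/\sqrt{m}$ scaling and the assumption that groups coincide with sensor blocks are illustrative choices.

```python
import numpy as np

def block_diagonal_sensing(num_blocks, m, d, identical=False, seed=0):
    """(num_blocks*m) x (num_blocks*d) block diagonal matrix with N(0, 1/m) entries.

    identical=False: an independent block per sensor; True: one shared prototype.
    """
    rng = np.random.default_rng(seed)
    proto = rng.normal(size=(m, d)) / np.sqrt(m)
    Phi = np.zeros((num_blocks * m, num_blocks * d))
    for b in range(num_blocks):
        block = proto if identical else rng.normal(size=(m, d)) / np.sqrt(m)
        Phi[b * m:(b + 1) * m, b * d:(b + 1) * d] = block
    return Phi

num_blocks, m, d = 10, 8, 16
x = np.zeros(num_blocks * d)
x[2 * d:3 * d] = 1.0    # group-sparse: only groups 2 and 7 are active
x[7 * d:8 * d] = -0.5
for identical in (False, True):
    Phi = block_diagonal_sensing(num_blocks, m, d, identical)
    print(identical, np.sum((Phi @ x) ** 2) / np.sum(x ** 2))
```

With $N(0, 1/m)$ entries, each block preserves the squared norm of its group in expectation, which is the property the group-RIP makes uniform over all group-sparse vectors.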

Preprint · 2021 · arXiv

Neural Augmentation of Kalman Filter with Hypernetwork for Channel Tracking

We propose the Hypernetwork Kalman Filter (HKF) for tracking applications with multiple different dynamics. The HKF combines the generalization power of Kalman filters with the expressive power of neural networks. Instead of keeping a bank of Kalman filters and choosing one based on an approximation of the actual dynamics, the HKF adapts itself to each dynamics based on the observed sequence. Through extensive experiments on the CDL-B channel model, we show that the HKF can track the channel over a wide range of Doppler values, matching the performance of a Kalman filter with genie Doppler information. At high Doppler values, it achieves around 2 dB gain over the genie Kalman filter. Unlike an LSTM, which suffers from severe performance degradation, the HKF generalizes well to unseen Doppler values, SNR values and pilot patterns.
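For reference, the building block the HKF adapts is a standard Kalman filter. The sketch tracks a scalar channel coefficient under an assumed AR(1) model $h_t = a h_{t-1} + w_t$, $y_t = h_t + v_t$. In the HKF a hypernetwork adapts the filter to the dynamics from the observed sequence; here the parameters `a`, `q` and `r` are simply fixed by hand, and the function name is illustrative.

```python
def kalman_track(observations, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for h_t = a*h_{t-1} + w_t (var q), y_t = h_t + v_t (var r)."""
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Predict: propagate mean and variance through the AR(1) dynamics
        x, p = a * x, a * a * p + q
        # Update: blend prediction and observation with the Kalman gain
        k = p / (p + r)
        x = x + k * (y - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates

est = kalman_track([2.0] * 100, a=1.0, q=0.0, r=1.0)
```

With `q = 0` and a constant observation of 2.0, the estimate after $t$ steps is $2t/(t+1)$, so it approaches the observed value; a mismatched `a` or `q` (e.g. from a wrong Doppler assumption) is what degrades a fixed filter on fast-varying channels.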

Preprint · 2020 · arXiv

Gradient $\ell_1$ Regularization for Quantization Robustness

We analyze the effect of quantizing weights and activations of neural networks on their loss and derive a simple regularization scheme that improves robustness against post-training quantization. By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on demand to different bit-widths as the energy and memory requirements of the application change. Unlike quantization-aware training using the straight-through estimator, which only targets a specific bit-width and requires access to the training data and pipeline, our regularization-based method paves the way for on-the-fly post-training quantization to various bit-widths. We show that by modeling quantization as an $\ell_\infty$-bounded perturbation, the first-order term in the loss expansion can be regularized using the $\ell_1$-norm of gradients. We experimentally validate the effectiveness of our regularization scheme on different architectures on the CIFAR-10 and ImageNet datasets.
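The bound behind the regularizer is Hölder's inequality: if quantization perturbs the weights by $\delta$ with $\|\delta\|_\infty \le \epsilon$, the first-order change in loss satisfies $|\nabla L^\top \delta| \le \epsilon \|\nabla L\|_1$, with equality at $\delta = \epsilon\,\mathrm{sign}(\nabla L)$. The sketch checks this on a toy quadratic loss and forms the regularized objective $L + \lambda\|\nabla L\|_1$. The loss, $\lambda$ and $\epsilon$ are illustrative; in practice the gradient norm would itself be differentiated with an autodiff framework during training.

```python
import numpy as np

rng = np.random.default_rng(0)
Q = np.diag([1.0, 4.0, 9.0])
w = rng.normal(size=3)

def loss(w):
    return 0.5 * w @ Q @ w   # toy quadratic loss

def grad(w):
    return Q @ w

eps, lam = 0.01, 0.1
g = grad(w)
worst_delta = eps * np.sign(g)         # worst case inside the l_inf ball of radius eps
first_order = g @ worst_delta          # equals eps * ||g||_1
regularized = loss(w) + lam * np.sum(np.abs(g))
```

Penalizing $\|\nabla L\|_1$ therefore shrinks the worst-case first-order loss increase for every quantization step size at once, which is why one set of weights can serve several bit-widths.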

Preprint · 2020 · arXiv

Sensing Matrix Design and Sparse Recovery on the Sphere and the Rotation Group

In this paper, the goal is to design deterministic sampling patterns on the sphere and the rotation group and, thereby, to construct sensing matrices for sparse recovery of band-limited functions. It is first shown that random sensing matrices, which consist of random samples of Wigner D-functions, satisfy the Restricted Isometry Property (RIP) with proper preconditioning and can be used for sparse recovery on the rotation group. For deterministic and regular sensing matrices, however, the mutual coherence is used to assess performance. We show that many widely used regular sampling patterns yield sensing matrices with the worst possible mutual coherence and are therefore undesirable for sparse recovery. Using tools from angular momentum analysis in quantum mechanics, we provide a new expression for the mutual coherence, which encourages the use of regular elevation samples. We construct low-coherence deterministic matrices by fixing regular samples on the elevation and minimizing the mutual coherence over the azimuth-polarization choice. It is shown that once the elevation sampling is fixed, the mutual coherence has a lower bound that depends only on the elevation samples. This lower bound can, however, be achieved for spherical harmonics, which leads to new sensing matrices with better coherence than other representative regular sampling patterns. This is also reflected in our numerical experiments, where our proposed sampling patterns perfectly match the phase transition of random sampling patterns.
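Mutual coherence is the quantity minimized over the azimuth-polarization samples. A small helper computing it for any (possibly complex) sensing matrix, shown together with the Welch lower bound $\sqrt{(n-m)/(m(n-1))}$, which holds for every $m \times n$ matrix with $n > m$. The Welch bound is a general bound, not the elevation-dependent bound derived in the paper, and the random test matrix is only an example.

```python
import numpy as np

def mutual_coherence(A):
    """Largest normalized inner product magnitude between distinct columns of A."""
    cols = A / np.linalg.norm(A, axis=0)
    G = np.abs(cols.conj().T @ cols)
    np.fill_diagonal(G, 0.0)  # ignore each column's inner product with itself
    return float(G.max())

def welch_bound(m, n):
    """Lower bound on the coherence of any m x n matrix with n > m."""
    return float(np.sqrt((n - m) / (m * (n - 1))))

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 16)) + 1j * rng.normal(size=(8, 16))
print(mutual_coherence(A), welch_bound(8, 16))
```

A regular sampling pattern that repeats a column (up to scaling) reaches the worst possible value of 1.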

Preprint · 2014 · arXiv

Mixed Noisy Network Coding and Cooperative Unicasting in Wireless Networks

The problem of communicating a single message to a destination in the presence of multiple relay nodes, referred to as the cooperative unicast network, is considered. First, we introduce the "Mixed Noisy Network Coding" (MNNC) scheme, which generalizes "Noisy Network Coding" (NNC) by allowing relays to decode-and-forward (DF) messages while all of them, without exception, transmit noisy descriptions of their observations. These descriptions are exploited at the destination, and the DF relays aim to decode the transmitted messages while creating full cooperation among the nodes. Moreover, the destination and the DF relays can independently select the set of descriptions to be decoded or treated as interference. This concept is further extended to multi-hop scenarios, referred to as "Layered MNNC" (LMNNC), where DF relays are organized into disjoint groups, each representing one hop in the network. For cooperative unicast additive white Gaussian noise (AWGN) networks, we show that, provided the DF relays are properly chosen, MNNC improves over all previously established constant gaps to the cut-set bound. Second, we consider the composite cooperative unicast network, where the channel parameters are randomly drawn before communication starts and remain fixed during the transmission. Each draw is assumed to be unknown at the source, fully known at the destination and only partly known at the relays. Through the MNNC scheme, we introduce the concept of a "Selective Coding Strategy" (SCS), which enables relays to decide dynamically whether, in addition to communicating noisy descriptions, they can decode and forward messages. It is demonstrated on slow-fading AWGN relay networks that SCS clearly outperforms conventional coding schemes.

Preprint · 2012 · arXiv

Cooperative Strategies for Simultaneous and Broadcast Relay Channels

Consider the simultaneous relay channel (SRC), which consists of a set of relay channels where the source wishes to transmit common and private information to each of the destinations. This problem is recognized as equivalent to that of sending common and private information to several destinations in the presence of helper relays, where each channel outcome becomes a branch of the broadcast relay channel (BRC). Cooperative schemes and the capacity region for a set of two memoryless relay channels are investigated. The proposed coding schemes, based on Decode-and-Forward (DF) and Compress-and-Forward (CF), must be capable of transmitting information simultaneously to all destinations in such a set. Depending on the quality of the source-to-relay and relay-to-destination channels, inner bounds on the capacity of the general BRC are derived. Three cases of particular interest are considered: cooperation based on the DF strategy for both users (the DF-DF region), cooperation based on the CF strategy for both users (the CF-CF region), and cooperation based on the DF strategy for one destination and CF for the other (the DF-CF region). These results can be seen as a generalization, and hence a unification, of previous work. An outer bound on the capacity of the general BRC is also derived. Capacity results are obtained for the specific cases of semi-degraded and degraded Gaussian simultaneous relay channels. Rates are evaluated for Gaussian models where the source must guarantee a minimum amount of information to both users while additional information is sent to each of them.
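For intuition about the Gaussian rate evaluations, the classical decode-and-forward lower bound for a single full-duplex Gaussian relay channel (Cover and El Gamal), with unit noise power and $C(x) = \tfrac12\log_2(1+x)$, is $R_{\mathrm{DF}} = \max_{0\le\rho\le1} \min\{C(g_{sr}^2(1-\rho^2)P),\ C(g_{sd}^2 P + g_{rd}^2 P_r + 2\rho\, g_{sd} g_{rd}\sqrt{P P_r})\}$. This is a textbook single-relay bound, not the broadcast relay regions derived in the paper, and the grid search over $\rho$ is a simplification.

```python
import numpy as np

def gaussian_capacity(snr):
    return 0.5 * np.log2(1.0 + snr)

def df_rate(g_sr, g_sd, g_rd, P, P_r, num_grid=1001):
    """Decode-and-forward lower bound for a Gaussian relay channel (unit noise power)."""
    rho = np.linspace(0.0, 1.0, num_grid)  # source/relay signal correlation
    relay_decodes = gaussian_capacity(g_sr**2 * (1.0 - rho**2) * P)
    dest_decodes = gaussian_capacity(
        g_sd**2 * P + g_rd**2 * P_r + 2.0 * rho * g_sd * g_rd * np.sqrt(P * P_r)
    )  # coherent combining of source and relay at the destination
    return float(np.max(np.minimum(relay_decodes, dest_decodes)))

r = df_rate(g_sr=10.0, g_sd=1.0, g_rd=1.0, P=1.0, P_r=1.0)  # ≈ 1.155 bits per use
```

With a strong source-to-relay link the optimum sits at $\rho = 0.98$, where both terms equal $C(3.96)$, well above the direct-link rate $C(1) = 0.5$.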

Preprint · 2012 · arXiv

Selective Coding Strategy for Unicast Composite Networks

Consider a composite unicast relay network where the channel statistic is randomly drawn from a set of conditional distributions indexed by a random variable, which is assumed to be unknown at the source, fully known at the destination and only partly known at the relays. Commonly, the coding strategy at each relay is fixed regardless of its channel measurement. A novel coding scheme for unicast composite networks with multiple relays is introduced. It enables each relay to select dynamically, based on its channel measurement, the best coding scheme between compress-and-forward (CF) and decode-and-forward (DF). As part of the main result, a generalization of Noisy Network Coding is shown for the case of unicast general networks where the relays are divided between those using DF and those using CF coding. Furthermore, the relays using the DF scheme can exploit the help of those using the CF scheme via offset coding. Numerical results demonstrate that this novel coding, referred to as the Selective Coding Strategy (SCS), outperforms conventional coding schemes.

Preprint · 2010 · arXiv

Broadcasting over the Relay Channel with Oblivious Cooperative Strategy

This paper investigates the problem of information transmission over the simultaneous relay channel with two users (or two possible channel outcomes), where for one of them the more suitable strategy is Decode-and-Forward (DF) while for the other it is Compress-and-Forward (CF). In this setting, it is assumed that the source wishes to send common and private information to each of the users (or channel outcomes). This problem is relevant to (i) the transmission of information over the broadcast relay channel (BRC) with different relaying strategies and (ii) the transmission of information over the conventional relay channel where the source is oblivious to the coding strategy of the relay. A novel coding scheme that integrates DF and CF simultaneously is proposed, and an inner bound on the capacity region is derived for the case of general memoryless BRCs. As a special case, the Gaussian BRC is studied, where it is shown that the suggested broadcast coding can improve the common rate compared to existing strategies. Applications of these results arise in broadcast scenarios with relays or in wireless scenarios where the source does not know whether the relay is collocated with the source or with the destination.

Preprint · 2010 · arXiv

Capacity of a Class of Broadcast Relay Channels

Consider the broadcast relay channel (BRC), which consists of a source sending information over a two-user broadcast channel in the presence of two relay nodes that help the transmission to the destinations. This five-node network involves all the problems encountered in relay and broadcast channels. New inner bounds on the capacity region of this class of channels are derived. These results can be seen as a generalization, and hence a unification, of previous work on this topic. Our bounds are based on the idea of recombining message bits and on various effective coding strategies for relay and broadcast channels. A capacity result is obtained for the semi-degraded BRC-CR, where one relay channel is degraded while the other is reversely degraded. Inner and upper bounds are also presented for the degraded BRC with common relay (BRC-CR), where both the relay and broadcast channels are degraded; these bounds give the capacity in the Gaussian case. Applications of these results arise in the context of opportunistic cooperation in cellular networks.
