Researcher profile

Arash Behboodi

Arash Behboodi contributes to research discovery and scholarly infrastructure.

Trust snapshot

Trust 21 - Emerging; Verification L1; Unclaimed author
12 works
0 followers
6 topics
4 close collaborators

Published work

12 published items

preprint · 2026 · arXiv

Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models

Recurrent LLM architectures have emerged as a promising approach for improving reasoning, as they enable multi-step computation in the embedding space without generating intermediate tokens. Models such as Ouro perform reasoning by iteratively updating internal representations while retaining a standard Key-Value (KV) cache across iterations, causing memory consumption to grow linearly with reasoning depth. Consequently, increasing the number of reasoning iterations can lead to prohibitive memory usage, limiting the practical scalability of such architectures. In this work, we propose Memory-Efficient Looped Transformer (MELT), a novel architecture that decouples reasoning depth from memory consumption. Instead of using a standard KV cache per layer and loop, MELT maintains a single KV cache per layer that is shared across reasoning loops. This cache is updated over time via a learnable gating mechanism. To enable stable and efficient training under this architecture, we propose to train MELT using chunk-wise training in a two-phase procedure: interpolated transition, followed by attention-aligned distillation, both from the LoopLM starting model to MELT. Empirically, we show that MELT models fine-tuned from pretrained Ouro parameters outperform standard LLMs of comparable size, while maintaining a memory footprint comparable to those models and dramatically smaller than Ouro's. Overall, MELT achieves constant-memory iterative reasoning without sacrificing LoopLM performance, using only a lightweight post-training procedure.
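
The idea of a single cache shared across loops and refreshed through a gate can be illustrated with a toy sketch. The abstract does not specify the gate's form, so the sigmoid-weighted convex combination below is an assumption, as are the names `gated_cache_update`, `run_loops`, and `gate_logit`. A real model would apply this to key and value tensors per layer, with a learned gate rather than a fixed scalar.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_cache_update(cache, new_entry, gate_logit):
    """Blend the existing cache with a new entry (assumed gate form).

    g close to 1 keeps the old cache; g close to 0 overwrites it.
    """
    g = sigmoid(gate_logit)
    return [g * c + (1.0 - g) * n for c, n in zip(cache, new_entry)]


def run_loops(initial_cache, per_loop_entries, gate_logit):
    """Apply one gated update per reasoning loop.

    The cache keeps its original size however many loops are run,
    in contrast to appending one cache per loop.
    """
    cache = list(initial_cache)
    for entry in per_loop_entries:
        cache = gated_cache_update(cache, entry, gate_logit)
    return cache
```

Running `run_loops` with any number of loop entries returns a list of the same length as `initial_cache`, which is the constant-memory property the abstract describes.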

preprint · 2022 · arXiv

Deep Learning-based Channel Estimation for Wideband Hybrid MmWave Massive MIMO

Hybrid analog-digital (HAD) architecture is widely adopted in practical millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems to reduce hardware cost and energy consumption. However, channel estimation in the context of HAD is challenging due to only limited radio frequency (RF) chains at transceivers. Although various compressive sensing (CS) algorithms have been developed to solve this problem by exploiting inherent channel sparsity and sparsity structures, practical effects, such as power leakage and beam squint, can still make the real channel features deviate from the assumed models and result in performance degradation. Also, the high complexity of CS algorithms caused by a large number of iterations hinders their applications in practice. To tackle these issues, we develop a deep learning (DL)-based channel estimation approach where the sparse Bayesian learning (SBL) algorithm is unfolded into a deep neural network (DNN). In each SBL layer, Gaussian variance parameters of the sparse angular domain channel are updated by a tailored DNN, which is able to effectively capture complicated channel sparsity structures in various domains. Besides, the measurement matrix is jointly optimized for performance improvement. Then, the proposed approach is extended to the multi-block case where channel correlation in time is further exploited to adaptively predict the measurement matrix and facilitate the update of Gaussian variance parameters. Based on simulation results, the proposed approaches significantly outperform existing approaches but with reduced complexity.

preprint · 2022 · arXiv

Equivariant Priors for Compressed Sensing with Unknown Orientation

In compressed sensing, the goal is to reconstruct the signal from an underdetermined system of linear measurements. Thus, prior knowledge about the signal of interest and its structure is required. Additionally, in many scenarios, the signal has an unknown orientation prior to measurements. To address such recovery problems, we propose using equivariant generative models as a prior, which encapsulate orientation information in their latent space. Thereby, we show that signals with unknown orientations can be recovered with iterative gradient descent on the latent space of these models and provide additional theoretical recovery guarantees. We construct an equivariant variational autoencoder and use the decoder as generative prior for compressed sensing. We discuss additional potential gains of the proposed approach in terms of convergence and latency.

preprint · 2022 · arXiv

Generalization Error Bounds for Iterative Recovery Algorithms Unfolded as Neural Networks

Motivated by the learned iterative soft thresholding algorithm (LISTA), we introduce a general class of neural networks suitable for sparse reconstruction from few linear measurements. By allowing a wide range of degrees of weight-sharing between the layers, we enable a unified analysis for very different neural network types, ranging from recurrent ones to networks more similar to standard feedforward neural networks. Based on training samples, via empirical risk minimization we aim at learning the optimal network parameters and thereby the optimal network that reconstructs signals from their low-dimensional linear measurements. We derive generalization bounds by analyzing the Rademacher complexity of hypothesis classes consisting of such deep networks, that also take into account the thresholding parameters. We obtain estimates of the sample complexity that essentially depend only linearly on the number of parameters and on the depth. We apply our main result to obtain specific generalization bounds for several practical examples, including different algorithms for (implicit) dictionary learning, and convolutional neural networks.
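
To make the notion of an unfolded recovery algorithm concrete, here is a minimal sketch of plain ISTA written as a stack of identical layers. It is not the network class analyzed in the paper: LISTA-style networks learn the matrices and thresholds from data, whereas here `A`, `step`, and `tau` are fixed inputs, and the function names are illustrative.

```python
def soft_threshold(v, tau):
    """Elementwise soft thresholding: shrink each entry toward zero by tau."""
    return [max(abs(x) - tau, 0.0) * (1.0 if x > 0 else -1.0) for x in v]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def unfolded_ista(A, y, num_layers, step, tau):
    """Run num_layers ISTA iterations; each iteration acts as one layer.

    Every layer takes a gradient step on ||y - A x||^2 / 2 and then
    applies soft thresholding with threshold step * tau.
    """
    At = transpose(A)
    x = [0.0] * len(A[0])
    for _ in range(num_layers):
        residual = [yi - ri for yi, ri in zip(y, matvec(A, x))]
        grad_step = [xi + step * gi for xi, gi in zip(x, matvec(At, residual))]
        x = soft_threshold(grad_step, step * tau)
    return x
```

Sharing `A`, `step`, and `tau` across all layers corresponds to the recurrent end of the weight-sharing range mentioned in the abstract; giving each layer its own parameters moves toward a standard feedforward network.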

preprint · 2022 · arXiv

Learning Perturbations for Soft-Output Linear MIMO Demappers

Tree-based demappers for multiple-input multiple-output (MIMO) detection such as the sphere decoder can achieve near-optimal performance but incur high computational cost due to their sequential nature. In this paper, we propose the perturbed linear demapper (PLM), which is a novel data-driven model for computing soft outputs in parallel. To achieve this, the PLM learns a distribution centered on an initial linear estimate and a log-likelihood ratio clipping parameter using end-to-end Bayesian optimization. Furthermore, we show that lattice-reduction can be naturally incorporated into the PLM pipeline, which allows trading off computational cost against coded block error rate reduction. We find that the optimized PLM can achieve near maximum-likelihood (ML) performance in Rayleigh channels, making it an efficient alternative to tree-based demappers.

preprint · 2022 · arXiv

MIMO-GAN: Generative MIMO Channel Modeling

We propose generative channel modeling to learn statistical channel models from channel input-output measurements. Generative channel models can learn more complicated distributions and represent the field data more faithfully. They are tractable and easy to sample from, which can potentially speed up the simulation rounds. To achieve this, we leverage advances in GANs, which help us learn an implicit distribution over stochastic MIMO channels from observed measurements. In particular, our approach MIMO-GAN implicitly models the wireless channel as a distribution of time-domain band-limited impulse responses. We evaluate MIMO-GAN on 3GPP TDL MIMO channels and observe high consistency in capturing power, delay and spatial correlation statistics of the underlying channel. In particular, we observe that MIMO-GAN achieves errors of under 3.57 ns average delay and -18.7 dB power.

preprint · 2022 · arXiv

Neural RF SLAM for unsupervised positioning and mapping with channel state information

We present a neural network architecture for jointly learning user locations and environment mapping up to isometry, in an unsupervised way, from channel state information (CSI) values with no location information. The model is based on an encoder-decoder architecture. The encoder network maps CSI values to the user location. The decoder network models the physics of propagation by parametrizing the environment using virtual anchors. It aims at reconstructing, from the encoder output and virtual anchor location, the set of times of flight (ToFs) that are extracted from CSI using super-resolution methods. The neural network task is set prediction and is accordingly trained end-to-end. The proposed model learns an interpretable latent, i.e., user location, by just enforcing a physics-based decoder. It is shown that the proposed model achieves sub-meter accuracy on synthetic ray-tracing-based datasets with a single-anchor SISO setup while recovering the environment map up to 4 cm median error in a 2D environment and 15 cm in a 3D environment.

preprint · 2022 · arXiv

Position Aided Beam Prediction in the Real World: How Useful GPS Locations Actually Are?

Millimeter-wave (mmWave) communication systems rely on narrow beams for achieving sufficient receive signal power. Adjusting these beams is typically associated with large training overhead, which becomes particularly critical for highly-mobile applications. Intuitively, since optimal beam selection can benefit from the knowledge of the positions of communication terminals, there has been increasing interest in leveraging position data to reduce the overhead in mmWave beam prediction. Prior work, however, studied this problem using only synthetic data that generally does not accurately represent real-world measurements. In this paper, we investigate position-aided beam prediction using a real-world large-scale dataset to derive insights into precisely how much overhead can be saved in practice. Furthermore, we analyze which machine learning algorithms perform best, what factors degrade inference performance in real data, and which machine learning metrics are more meaningful in capturing the actual communication system performance.

preprint · 2022 · arXiv

The Restricted Isometry Property of Block Diagonal Matrices for Group-Sparse Signal Recovery

Group-sparsity is a common low-complexity signal model with widespread application across various domains of science and engineering. The recovery of such signal ensembles from compressive measurements has been extensively studied in the literature under the assumption that measurement operators are modeled as densely populated random matrices. In this paper, we turn our attention to an acquisition model intended to ease the energy consumption of sensing devices by splitting the measurements up into distinct signal blocks. More precisely, we present uniform guarantees for group-sparse signal recovery in the scenario where a number of sensors obtain independent partial signal observations modeled by block diagonal measurement matrices. We establish a group-sparse variant of the classical restricted isometry property for block diagonal sensing matrices acting on group-sparse vectors, and provide conditions under which subgaussian block diagonal random matrices satisfy this group-RIP with high probability. Two different scenarios are considered in particular. In the first scenario, we assume that each sensor is equipped with an independently drawn measurement matrix. We later lift this requirement by considering measurement matrices with constant block diagonal entries. In other words, every sensor is equipped with a copy of the same prototype matrix. The problem of establishing the group-RIP is cast into a form in which one needs to establish the concentration behavior of the suprema of chaos processes which involves estimating Talagrand's $\gamma_2$ functional. As a side effect of the proof, we present an extension to Maurey's empirical method to provide new bounds on the covering number of sets consisting of finite convex combinations of possibly infinite sets.

preprint · 2021 · arXiv

Neural Augmentation of Kalman Filter with Hypernetwork for Channel Tracking

We propose the Hypernetwork Kalman Filter (HKF) for tracking applications with multiple different dynamics. The HKF combines the generalization power of Kalman filters with the expressive power of neural networks. Instead of keeping a bank of Kalman filters and choosing one based on approximating the actual dynamics, the HKF adapts itself to each dynamics based on the observed sequence. Through extensive experiments on the CDL-B channel model, we show that the HKF can be used for tracking the channel over a wide range of Doppler values, matching Kalman filter performance with genie Doppler information. At high Doppler values, it achieves around 2 dB gain over the genie Kalman filter. The HKF generalizes well to unseen Doppler, SNR values and pilot patterns unlike an LSTM, which suffers from severe performance degradation.
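
For context, the baseline that the HKF augments can be sketched as a scalar Kalman filter with known dynamics. This is a textbook predict-and-update recursion, not the HKF itself: the state model `x_t = a * x_{t-1} + w_t`, the observation model `y_t = x_t + v_t`, and the parameter names `a`, `q`, and `r` are assumptions chosen for illustration. In the paper's setting a hypernetwork would adapt the filter to the observed sequence instead of relying on a fixed, known `a`.

```python
def kalman_track(observations, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = a * x_{t-1} + w_t, y_t = x_t + v_t.

    a: state transition coefficient (assumed known here)
    q: process noise variance, r: observation noise variance
    x0, p0: initial state estimate and its variance
    Returns the filtered state estimate after each observation.
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Predict: propagate the estimate and its variance through the dynamics.
        x_pred = a * x
        p_pred = a * a * p + q
        # Update: correct the prediction using the new observation.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        estimates.append(x)
    return estimates
```

With `r = 0.0` the gain is 1 and the filter simply follows the observations; larger `r` makes it lean more on the assumed dynamics.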

preprint · 2020 · arXiv

Gradient $\ell_1$ Regularization for Quantization Robustness

We analyze the effect of quantizing weights and activations of neural networks on their loss and derive a simple regularization scheme that improves robustness against post-training quantization. By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths as energy and memory requirements of the application change. Unlike quantization-aware training using the straight-through estimator that only targets a specific bit-width and requires access to training data and pipeline, our regularization-based method paves the way for "on the fly" post-training quantization to various bit-widths. We show that by modeling quantization as an $\ell_\infty$-bounded perturbation, the first-order term in the loss expansion can be regularized using the $\ell_1$-norm of gradients. We experimentally validate the effectiveness of our regularization scheme on different architectures on CIFAR-10 and ImageNet datasets.
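
The link between an $\ell_\infty$-bounded perturbation and the $\ell_1$-norm of gradients follows from a first-order expansion and Hölder's inequality. The notation below is an assumption made for illustration and is not taken from the paper: $w$ denotes the weights, $\delta$ the quantization perturbation, and $\epsilon$ a bound on its largest entry.

```latex
% w: weights, \delta: quantization perturbation, \epsilon: assumed bound on its entries
\begin{align*}
L(w + \delta) &\approx L(w) + \nabla L(w)^{\top} \delta,
  \qquad \|\delta\|_\infty \le \epsilon, \\
\left| \nabla L(w)^{\top} \delta \right|
  &\le \|\nabla L(w)\|_1 \, \|\delta\|_\infty
  \le \epsilon \, \|\nabla L(w)\|_1 .
\end{align*}
```

Penalizing $\|\nabla L(w)\|_1$ during training therefore limits the worst-case first-order change in the loss for any perturbation within the $\ell_\infty$ ball, whatever bit-width produced it.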

preprint · 2020 · arXiv

Sensing Matrix Design and Sparse Recovery on the Sphere and the Rotation Group

In this paper, the goal is to design deterministic sampling patterns on the sphere and the rotation group and, thereby, construct sensing matrices for sparse recovery of band-limited functions. It is first shown that random sensing matrices, which consist of random samples of Wigner D-functions, satisfy the Restricted Isometry Property (RIP) with proper preconditioning and can be used for sparse recovery on the rotation group. The mutual coherence, however, is used to assess the performance of deterministic and regular sensing matrices. We show that many widely used regular sampling patterns yield sensing matrices with the worst possible mutual coherence, and therefore are undesirable for sparse recovery. Using tools from angular momentum analysis in quantum mechanics, we provide a new expression for the mutual coherence, which encourages the use of regular elevation samples. We construct low-coherence deterministic matrices by fixing the regular samples on the elevation and minimizing the mutual coherence over the azimuth-polarization choice. It is shown that once the elevation sampling is fixed, the mutual coherence has a lower bound that depends only on the elevation samples. This lower bound, however, can be achieved for spherical harmonics, which leads to new sensing matrices with better coherence than other representative regular sampling patterns. This is reflected as well in our numerical experiments where our proposed sampling patterns perfectly match the phase transition of random sampling patterns.
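
Mutual coherence, the quantity used above to assess deterministic sensing matrices, is the largest absolute normalized inner product between two distinct columns. The sketch below computes it for a small real-valued matrix given as a list of rows. The function name is illustrative, and the real-valued restriction is a simplification: matrices built from Wigner D-functions or spherical harmonics are complex, which would require conjugating one factor in the inner product.

```python
import math


def mutual_coherence(A):
    """Largest absolute normalized inner product between distinct columns.

    A is a real matrix given as a list of rows with nonzero columns.
    A value of 1.0 is the worst case: two columns are parallel.
    """
    cols = [list(c) for c in zip(*A)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in cols]
    best = 0.0
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            inner = sum(a * b for a, b in zip(cols[i], cols[j]))
            best = max(best, abs(inner) / (norms[i] * norms[j]))
    return best
```

An identity matrix has coherence 0, while any matrix with two parallel columns reaches the worst possible value of 1, which is the situation the abstract reports for many regular sampling patterns.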