Source author record

Mohammad Mohammadi Amiri

Mohammad Mohammadi Amiri appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory; math.IT; Machine Learning; Distributed, Parallel, and Cluster Computing; Artificial Intelligence; eess.SP; Computation and Language; Computational Engineering, Finance, and Science; Cryptography and Security; cs.CY

Catalog footprint

What is connected

15 works

10 topics

4 close collaborators

preprint · 2026 · arXiv

Curated Synthetic Data Doesn't Have to Collapse: A Theoretical Study of Generative Retraining with Pluralistic Preferences

Recursive retraining of generative models poses a critical representation challenge: when synthetic outputs are curated based on a fixed reward signal, the model tends to collapse onto a narrow set of outputs that over-optimize that objective. Prior work suggests that such collapse is unavoidable without adding real data into the mix. We revisit this conclusion from an alignment perspective and show that collapse can be mitigated through curation based on multiple reward functions. We formalize the dynamics of recursive training under heterogeneous preferences and prove that, under certain conditions, the model converges to a stable distribution that allocates probability mass across competing high-reward regions. The limiting distribution preserves diversity and provably satisfies a weighted Nash bargaining solution, offering a formal interpretation of value aggregation in synthetic retraining loops.
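For orientation, the weighted Nash bargaining solution mentioned above can be written in its standard textbook form. The symbols here are assumptions for illustration, not notation taken from the paper: $\mathcal{P}$ is the feasible set of model distributions, $u_i(p)$ is the utility that preference group $i$ assigns to distribution $p$, $d_i$ is that group's disagreement (fallback) utility, and $w_i > 0$ is its bargaining weight.

```latex
p^{\star} \in \arg\max_{p \in \mathcal{P}} \; \prod_{i=1}^{m} \bigl(u_i(p) - d_i\bigr)^{w_i}
\quad\Longleftrightarrow\quad
p^{\star} \in \arg\max_{p \in \mathcal{P}} \; \sum_{i=1}^{m} w_i \log\bigl(u_i(p) - d_i\bigr),
\qquad u_i(p) > d_i .
```

Because the objective is a weighted sum of logarithms, driving any single group's utility gain toward zero is heavily penalized, which is one way to read the abstract's claim that probability mass stays spread across competing high-reward regions.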

preprint · 2026 · arXiv

MM-OptBench: A Solver-Grounded Benchmark for Multimodal Optimization Modeling

Optimization modeling translates real decision-making problems into mathematical optimization models and solver-executable implementations. Although language models are increasingly used to generate optimization formulations and solver code, existing benchmarks are almost entirely text-only. This omits many optimization-modeling tasks that arise in operational practice, where requirements are described in text but instance information is conveyed through visual artifacts such as tables, graphs, maps, schedules, and dashboards. We introduce multimodal optimization modeling, a benchmark setting in which models must construct both a mathematical formulation and executable solver code from a text-and-visual problem specification. To evaluate this setting, we develop a solver-grounded framework that generates structured optimization instances, verifies each with an exact solver, and builds both the model-facing inputs and hidden reference files from the same verified source. We instantiate the framework as MM-OptBench, a benchmark of 780 solver-verified instances spanning 6 optimization families, 26 subcategories, and 3 structural difficulty levels. We evaluate 9 multimodal large language models (MLLMs), including 6 frontier general-purpose models and 3 math-specialized models, with aggregate, family-level, difficulty-level, and failure-mode analyses. The results show that the task remains far from solved: the best two models reach 52.1% and 51.3% pass@1, while on average across the six general-purpose MLLMs, pass@1 is 43.4% on easy instances and 15.9% on hard instances. All three math-specialized MLLMs solve 0/780 instances. Failure attribution shows that errors arise both when extracting instance data from text and visuals and when turning extracted data into solver-correct formulations and code. MM-OptBench provides a testbed for solver-grounded, decision-oriented multimodal intelligence.

preprint · 2026 · arXiv

RefusalGuard: Geometry-Preserving Fine-Tuning for Safety in LLMs

Fine-tuning safety-aligned language models for downstream tasks often leads to substantial degradation of refusal behavior, making models vulnerable to adversarial misuse. While prior work has shown that safety-relevant features are encoded in structured representations within the model's activation space, how these representations change during fine-tuning and why alignment degrades remains poorly understood. In this work, we investigate the representation-level mechanisms underlying alignment degradation. Our analysis shows that standard fine-tuning induces systematic drift in safety-relevant representations, distorts their geometric structure, and introduces interference between task optimization and safety features. These effects collectively lead to increased harmful compliance. Motivated by these findings, we introduce REFUSALGUARD, a representation-level fine-tuning framework that preserves safety-relevant structure during model adaptation. Our approach constrains updates in hidden representation space, ensuring that safety-mediating components remain stable while allowing task-specific learning in complementary directions. We evaluate REFUSALGUARD across multiple model families, including LLaMA, Gemma, and Qwen, on adversarial safety benchmarks such as AdvBench, DirectHarm4, and JailbreakBench, as well as downstream utility tasks. Our approach achieves attack success rates comparable to base safety-aligned models while maintaining competitive task performance, significantly outperforming baselines.

preprint · 2022 · arXiv

Fundamentals of Task-Agnostic Data Valuation

We study valuing the data of a data owner/seller for a data seeker/buyer. Data valuation is often carried out for a specific task assuming a particular utility metric, such as test accuracy on a validation set, that may not exist in practice. In this work, we focus on task-agnostic data valuation without any validation requirements. The data buyer has access to a limited amount of data (which could be publicly available) and seeks more data samples from a data seller. We formulate the problem as estimating the differences in the statistical properties of the data at the seller with respect to the baseline data available at the buyer. We capture these statistical differences through second moments by measuring diversity and relevance of the seller's data for the buyer; we estimate these measures through queries to the seller without requesting raw data. We design the queries with the proposed approach so that the seller is blind to the buyer's raw data and lacks the knowledge needed to fabricate query responses that would yield a desired diversity and relevance trade-off. We show through extensive experiments on real tabular and image datasets that the proposed estimates capture the diversity and relevance of the seller's data for the buyer.

preprint · 2022 · arXiv

Wireless for Machine Learning

As data generation increasingly takes place on devices without a wired connection, machine learning (ML) related traffic will be ubiquitous in wireless networks. Many studies have shown that traditional wireless protocols are highly inefficient or unsustainable for supporting ML, which creates the need for new wireless communication methods. In this survey, we give an exhaustive review of the state-of-the-art wireless methods that are specifically designed to support ML services over distributed datasets. Currently, there are two clear themes within the literature: analog over-the-air computation and digital radio resource management optimized for ML. This survey gives a comprehensive introduction to these methods, reviews the most important works, highlights open problems, and discusses application scenarios.

preprint · 2020 · arXiv

A Compressive Sensing Approach for Federated Learning over Massive MIMO Communication Systems

Federated learning is a privacy-preserving approach to train a global model at a central server by collaborating with wireless devices, each with its own local training data set. In this paper, we present a compressive sensing approach for federated learning over massive multiple-input multiple-output communication systems in which the central server equipped with a massive antenna array communicates with the wireless devices. One major challenge in system design is to accurately reconstruct, at the central server, the local gradient vectors that are computed and sent by the wireless devices. To overcome this challenge, we first establish a transmission strategy to construct sparse transmitted signals from the local gradient vectors at the devices. We then propose a compressive sensing algorithm enabling the server to iteratively find the linear minimum-mean-square-error (LMMSE) estimate of the transmitted signal by exploiting its sparsity. We also derive an analytical threshold for the residual error at each iteration, to design the stopping criterion of the proposed algorithm. We show that for a sparse transmitted signal, the proposed algorithm has lower computational complexity than LMMSE. Simulation results demonstrate that the presented approach outperforms conventional linear beamforming approaches and reduces the performance gap between federated learning and centralized learning with perfect reconstruction.

preprint · 2020 · arXiv

Convergence of Federated Learning over a Noisy Downlink

We study federated learning (FL), where power-limited wireless devices utilize their local datasets to collaboratively train a global model with the help of a remote parameter server (PS). The PS has access to the global model and shares it with the devices for local training, and the devices return the result of their local updates to the PS to update the global model. This framework requires downlink transmission from the PS to the devices and uplink transmission from the devices to the PS. The goal of this study is to investigate the impact of the bandwidth-limited shared wireless medium in both the downlink and uplink on the performance of FL with a focus on the downlink. To this end, the downlink and uplink channels are modeled as fading broadcast and multiple access channels, respectively, both with limited bandwidth. For downlink transmission, we first introduce a digital approach, where a quantization technique is employed at the PS to broadcast the global model update at a common rate such that all the devices can decode it. Next, we propose analog downlink transmission, where the global model is broadcast by the PS in an uncoded manner. We consider analog transmission over the uplink in both cases. We further analyze the convergence behavior of the proposed analog approach assuming that the uplink transmission is error-free. Numerical experiments show that the analog downlink approach provides significant improvement over the digital one, despite a significantly lower transmit power at the PS. The experimental results corroborate the convergence results, and show that a smaller number of local iterations should be used when the data distribution is more biased, and also when the devices have a better estimate of the global model in the analog downlink approach.

preprint · 2020 · arXiv

Convergence of Update Aware Device Scheduling for Federated Learning at the Wireless Edge

We study federated learning (FL) at the wireless edge, where power-limited devices with local datasets collaboratively train a joint model with the help of a remote parameter server (PS). We assume that the devices are connected to the PS through a bandwidth-limited shared wireless channel. At each iteration of FL, a subset of the devices is scheduled to transmit their local model updates to the PS over orthogonal channel resources, while each participating device must compress its model update to fit its link capacity. We design novel scheduling and resource allocation policies that decide on the subset of the devices to transmit at each round, and how the resources should be allocated among the participating devices, not only based on their channel conditions, but also on the significance of their local model updates. We then establish convergence of a wireless FL algorithm with device scheduling, where devices have limited capacity to convey their messages. The results of numerical experiments show that the proposed scheduling policy, based on both the channel conditions and the significance of the local model updates, provides a better long-term performance than scheduling policies based only on either of the two metrics individually. Furthermore, we observe that when the data is independent and identically distributed (i.i.d.) across devices, selecting a single device at each round provides the best performance, while when the data distribution is non-i.i.d., scheduling multiple devices at each round improves the performance. This observation is verified by the convergence result, which shows that the number of scheduled devices should increase for a less diverse and more biased data distribution.

preprint · 2020 · arXiv

Federated Learning over Wireless Fading Channels

We study federated machine learning at the wireless network edge, where limited power wireless devices, each with its own dataset, build a joint model with the help of a remote parameter server (PS). We consider a bandwidth-limited fading multiple access channel (MAC) from the wireless devices to the PS, and propose various techniques to implement distributed stochastic gradient descent (DSGD). We first propose a digital DSGD (D-DSGD) scheme, in which one device is selected opportunistically for transmission at each iteration based on the channel conditions; the scheduled device quantizes its gradient estimate to a finite number of bits imposed by the channel condition, and transmits these bits to the PS in a reliable manner. Next, motivated by the additive nature of the wireless MAC, we propose a novel analog communication scheme, referred to as the compressed analog DSGD (CA-DSGD), where the devices first sparsify their gradient estimates while accumulating error, and project the resultant sparse vector into a low-dimensional vector for bandwidth reduction. Numerical results show that D-DSGD outperforms other digital approaches in the literature; however, in general the proposed CA-DSGD algorithm converges faster than the D-DSGD scheme and other schemes in the literature, and reaches a higher level of accuracy. We have observed that the gap between the analog and digital schemes increases when the datasets of devices are not independent and identically distributed (i.i.d.). Furthermore, the performance of the CA-DSGD scheme is shown to be robust against imperfect channel state information (CSI) at the devices. Overall these results show clear advantages for the proposed analog over-the-air DSGD scheme, which suggests that learning and communication algorithms should be designed jointly to achieve the best end-to-end performance in machine learning applications at the wireless edge.
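The "sparsify while accumulating error" step described above can be sketched as follows. This is a minimal illustration assuming top-k magnitude sparsification with a per-device residual; the function name, the choice of top-k, and the toy gradients are assumptions for illustration, not details confirmed by the abstract.

```python
def sparsify_with_error_accumulation(gradient, residual, k):
    """Keep the k largest-magnitude entries of (gradient + residual).

    The entries that are dropped are not discarded: they are returned as the
    new residual and added back to the next gradient estimate.
    """
    corrected = [g + r for g, r in zip(gradient, residual)]
    # Indices of the k entries with the largest absolute value.
    ranked = sorted(range(len(corrected)), key=lambda i: abs(corrected[i]), reverse=True)
    keep = set(ranked[:k])
    sparse = [v if i in keep else 0.0 for i, v in enumerate(corrected)]
    new_residual = [v - s for v, s in zip(corrected, sparse)]
    return sparse, new_residual


# Two consecutive iterations on one hypothetical device (toy numbers).
residual = [0.0, 0.0, 0.0, 0.0]
sparse1, residual = sparsify_with_error_accumulation([0.5, -2.0, 0.25, 1.0], residual, k=2)
sparse2, residual = sparsify_with_error_accumulation([0.75, 0.125, 0.125, 0.125], residual, k=2)
print(sparse1, sparse2, residual)
```

In the second iteration the first coordinate is transmitted even though its fresh gradient entry is modest, because the value dropped in the first iteration was carried forward in the residual.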

preprint · 2020 · arXiv

Machine Learning at the Wireless Edge: Distributed Stochastic Gradient Descent Over-the-Air

We study federated machine learning (ML) at the wireless edge, where power- and bandwidth-limited wireless devices with local datasets carry out distributed stochastic gradient descent (DSGD) with the help of a remote parameter server (PS). Standard approaches assume separate computation and communication, where local gradient estimates are compressed and transmitted to the PS over orthogonal links. Following this digital approach, we introduce D-DSGD, in which the wireless devices employ gradient quantization and error accumulation, and transmit their gradient estimates to the PS over a multiple access channel (MAC). We then introduce a novel analog scheme, called A-DSGD, which exploits the additive nature of the wireless MAC for over-the-air gradient computation, and provide convergence analysis for this approach. In A-DSGD, the devices first sparsify their gradient estimates, and then project them to a lower dimensional space imposed by the available channel bandwidth. These projections are sent directly over the MAC without employing any digital code. Numerical results show that A-DSGD converges faster than D-DSGD thanks to its more efficient use of the limited bandwidth and the natural alignment of the gradient estimates over the channel. The improvement is particularly compelling at low power and low bandwidth regimes. We also illustrate for a classification problem that A-DSGD is more robust to bias in data distribution across devices, while D-DSGD significantly outperforms other digital schemes in the literature. We also observe that both D-DSGD and A-DSGD perform better as the number of devices increases (while keeping the total dataset size constant), showing their ability to harness the computation power of edge devices.
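The reason an additive MAC suits this scheme is linearity: if every device applies the same linear projection, the superposed channel output equals the projection of the summed gradients. The sketch below checks this on toy data under strong simplifying assumptions (a shared Gaussian projection matrix, no fading, no noise, no power constraint). All names are hypothetical, and recovering the sparse sum at the PS is not shown.

```python
import random


def make_projection(rows, cols, seed=0):
    """Shared pseudo-random projection matrix (assumed known to all devices and the PS)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def project(matrix, vector):
    """Project a length-cols vector down to length rows."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def ideal_mac(signals):
    """Idealized noiseless MAC: the receiver observes the elementwise sum."""
    return [sum(values) for values in zip(*signals)]


A = make_projection(rows=3, cols=6)  # 6-dimensional gradients, 3 channel uses
sparse_gradients = [
    [0.0, 1.0, 0.0, 0.0, -2.0, 0.0],
    [0.5, 0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 3.0, 0.0, 0.0],
]

# What the PS receives when all devices transmit their projections at once.
received = ideal_mac([project(A, g) for g in sparse_gradients])

# Projection of the true gradient sum, computed centrally for comparison.
gradient_sum = [sum(values) for values in zip(*sparse_gradients)]
reference = project(A, gradient_sum)

max_gap = max(abs(r - e) for r, e in zip(received, reference))
print(max_gap)
```

The two quantities agree up to floating-point rounding, so the PS obtains a compressed view of the aggregate gradient in a single transmission slot per channel use, regardless of how many devices transmit.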

preprint · 2020 · arXiv

Multi-Antenna Coded Content Delivery with Caching: A Low-Complexity Solution

We study downlink beamforming in a single-cell network with a multi-antenna base station serving cache-enabled users. Assuming a library of files with a common rate, we formulate the minimum transmit power with proactive caching and coded delivery as a non-convex optimization problem. While this multiple multicast problem can be efficiently solved by successive convex approximation (SCA), the complexity of the problem grows exponentially with the number of subfiles delivered to each user in each time slot, which itself grows exponentially with the number of users. We introduce a low-complexity alternative through time-sharing that limits the number of subfiles received by a user in each time slot. We then consider the joint design of beamforming and content delivery with sparsity constraints to limit the number of subfiles received by a user in each time slot. Numerical simulations show that the low-complexity scheme has only a small performance gap to that obtained by solving the joint problem with sparsity constraints, and outperforms state-of-the-art results at all signal-to-noise ratio (SNR) and rate values with a sufficient number of transmit antennas. A lower bound on the achievable degrees-of-freedom (DoF) of the low-complexity scheme is derived to characterize its performance in the high SNR regime.

preprint · 2019 · arXiv

Computation Scheduling for Distributed Machine Learning with Straggling Workers

We study scheduling of computation tasks across n workers in a large-scale distributed learning problem with the help of a master. Computation and communication delays are assumed to be random, and redundant computations are assigned to workers in order to tolerate stragglers. We consider sequential computation of tasks assigned to a worker, while the result of each computation is sent to the master right after its completion. Each computation round, which can model an iteration of the stochastic gradient descent (SGD) algorithm, is completed once the master receives k distinct computations, referred to as the computation target. Our goal is to characterize the average completion time as a function of the computation load, which denotes the portion of the dataset available at each worker, and the computation target. We propose two computation scheduling schemes that specify the tasks assigned to each worker, as well as their computation schedule, i.e., the order of execution. Assuming a general statistical model for computation and communication delays, we derive the average completion time of the proposed schemes. We also establish a lower bound on the minimum average completion time by assuming prior knowledge of the random delays. Experimental results carried out on an Amazon EC2 cluster show a significant reduction in the average completion time over existing coded and uncoded computing schemes. It is also shown numerically that the gap between the proposed scheme and the lower bound is relatively small, confirming the efficiency of the proposed scheduling design.
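The completion rule described above (a round ends once the master holds k distinct results) can be made concrete with a small deterministic sketch. The cyclic schedule and the delay values below are illustrative assumptions, not the schemes or delay model proposed in the paper.

```python
def round_completion_time(schedules, comp_delays, comm_delays, k):
    """Earliest time at which the master holds k distinct task results.

    schedules[w]      : ordered task ids that worker w computes sequentially
    comp_delays[w][j] : time worker w spends on its j-th scheduled task
    comm_delays[w][j] : time to deliver that result to the master
    Each result is sent as soon as it is computed; sending does not block
    the worker's next computation (an assumption of this sketch).
    """
    arrivals = []
    for order, comp, comm in zip(schedules, comp_delays, comm_delays):
        clock = 0.0
        for task, t_comp, t_comm in zip(order, comp, comm):
            clock += t_comp
            arrivals.append((clock + t_comm, task))
    received = set()
    for time, task in sorted(arrivals):
        received.add(task)
        if len(received) == k:
            return time
    return None  # fewer than k distinct tasks were ever scheduled


# Three workers, three tasks; worker 1 is a straggler on its first task.
comp = [[1.0, 1.0], [4.0, 1.0], [1.0, 1.0]]
comm = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

with_redundancy = round_completion_time([[0, 1], [1, 2], [2, 0]], comp, comm, k=3)
without_redundancy = round_completion_time([[0], [1], [2]], comp, comm, k=3)
print(with_redundancy, without_redundancy)
```

With one redundant task per worker, the fast workers cover the straggler's task and the round finishes at 2.5 time units instead of 4.5. The order in which each worker executes its tasks determines which results arrive first, which is why the abstract treats the schedule itself as a design variable.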

preprint · 2016 · arXiv

Coded Caching for a Large Number Of Users

Information theoretic analysis of a coded caching system is considered, in which a server with a database of N equal-size files, each F bits long, serves K users. Each user is assumed to have a local cache that can store M files, i.e., capacity of MF bits. Proactive caching to user terminals is considered, in which the caches are filled by the server in advance during the placement phase, without knowing the user requests. Each user requests a single file, and all the requests are satisfied simultaneously through a shared error-free link during the delivery phase. First, centralized coded caching is studied assuming both the number and the identity of the active users in the delivery phase are known by the server during the placement phase. A novel group-based centralized coded caching (GBC) scheme is proposed for a cache capacity of M = N/K. It is shown that this scheme achieves a smaller delivery rate than all the known schemes in the literature. The improvement is then extended to a wider range of cache capacities through memory-sharing between the proposed scheme and other known schemes in the literature. Next, the proposed centralized coded caching idea is exploited in the decentralized setting, in which the identities of the users that participate in the delivery phase are assumed to be unknown during the placement phase. It is shown that the proposed decentralized caching scheme also achieves a delivery rate smaller than the state-of-the-art. Numerical simulations are also presented to corroborate our theoretical results.
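To give a sense of the quantities involved, the sketch below evaluates the delivery rate of the well-known centralized coded caching baseline of Maddah-Ali and Niesen, next to uncoded delivery. This is the baseline from the literature, not the GBC scheme proposed in this paper (the abstract does not state the GBC rate), and the function names are hypothetical. Rates are in units of files.

```python
from fractions import Fraction


def uncoded_delivery_rate(N, K, M):
    """Worst-case rate when each user caches the same M/N fraction of every file
    and the server sends the missing parts uncoded."""
    return min(N, K) * (1 - Fraction(M) / N)


def baseline_coded_delivery_rate(N, K, M):
    """Maddah-Ali and Niesen centralized scheme, defined for integer t = K*M/N."""
    t = Fraction(K) * Fraction(M) / N
    if t.denominator != 1:
        raise ValueError("K*M/N must be an integer for this formula")
    local_gain = 1 - Fraction(M) / N
    global_gain = min(Fraction(1) / (1 + t), Fraction(N, K))
    return K * local_gain * global_gain


N, K = 8, 4
M = Fraction(N, K)  # the cache capacity M = N/K considered in the abstract
uncoded = uncoded_delivery_rate(N, K, M)
coded = baseline_coded_delivery_rate(N, K, M)
print(uncoded, coded)
```

At M = N/K the baseline's coded multicasting halves the uncoded rate in this toy setting (3 files versus 3/2). Schemes such as the one in the abstract aim to reduce the rate at this memory point further.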

preprint · 2016 · arXiv

Decentralized Coded Caching with Distinct Cache Capacities

Decentralized coded caching is studied for a content server with $N$ files, each of size $F$ bits, serving $K$ active users, each equipped with a cache of distinct capacity. It is assumed that the users' caches are filled in advance during the off-peak traffic period without the knowledge of the number of active users, their identities, or the particular demands. User demands are revealed during the peak traffic period, and are served simultaneously through an error-free shared link. A new decentralized coded caching scheme is proposed for this scenario, and it is shown to improve upon the state-of-the-art in terms of the required delivery rate over the shared link, when there are more users in the system than the number of files. Numerical results indicate that the improvement becomes more significant as the cache capacities of the users become more skewed.

preprint · 2016 · arXiv

Fundamental Limits of Coded Caching: Improved Delivery Rate-Cache Capacity Trade-off

A centralized coded caching system, consisting of a server delivering N popular files, each of size F bits, to K users through an error-free shared link, is considered. It is assumed that each user is equipped with a local cache memory with capacity MF bits, and contents can be proactively cached into these caches over a low traffic period, but without knowledge of the user demands. During the peak traffic period each user requests a single file from the server. The goal is to minimize the number of bits delivered by the server over the shared link, known as the delivery rate, over all user demand combinations. A novel coded caching scheme for the cache capacity of M = (N-1)/K is proposed. It is shown that the proposed scheme achieves a smaller delivery rate than the existing coded caching schemes in the literature when K > N >= 3. Furthermore, we argue that the delivery rate of the proposed scheme is within a constant multiplicative factor of 2 of the optimal delivery rate for cache capacities 1/K <= M <= (N-1)/K, when K > N >= 3.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxiv · confidence 95%

external id: arxiv:2605.01913:author:2:mohammad-mohammadi-amiri

Imported May 20, 2026 · Synced May 21, 2026

arxiv · confidence 95%

external id: arxiv:2605.07724:author:2:mohammad-mohammadi-amiri

Imported May 20, 2026 · Synced May 21, 2026

arxiv · confidence 95%

external id: arxiv:2605.12154:author:4:mohammad-mohammadi-amiri

Imported May 20, 2026 · Synced May 20, 2026

7 works

Deniz Gunduz

Researcher

Deniz Gunduz contributes to research discovery and scholarly infrastructure.

4 works

H. Vincent Poor

Researcher

H. Vincent Poor contributes to research discovery and scholarly infrastructure.

2 works

Deniz Gündüz

Researcher

Deniz Gündüz contributes to research discovery and scholarly infrastructure.

2 works

Qianqian Yang

Researcher

Qianqian Yang contributes to research discovery and scholarly infrastructure.
