Source author record

Meisam Razaviyayn

Meisam Razaviyayn appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.OC; Machine Learning; Information Theory; math.IT; Artificial Intelligence; Computation and Language; Computer Science and Game Theory; Data Structures and Algorithms; Distributed, Parallel, and Cluster Computing; Hardware Architecture; math.AG; Methodology; Numerical Analysis; Systems and Control

Catalog footprint

What is connected

26 works

14 topics

4 close collaborators

preprint, 2026, arXiv

Sampling More, Getting Less: Calibration is the Diversity Bottleneck in LLMs

Diversity is essential for language-model applications ranging from creative generation to scientific discovery, yet modern LLMs often collapse into a narrow subset of plausible outputs. While prior work has developed benchmarks for measuring this lack of diversity, less is known about how the step-by-step probability distributions at inference time cause the problem. We introduce a validity--diversity framework that attributes diversity collapse to how an LLM allocates probability mass across valid and invalid continuations during decoding. This framework decomposes the bottleneck into two complementary forms of miscalibration. First, order calibration: valid tokens are not reliably ranked above invalid tokens, so rank-based cutoff rules must trade off between recovering valid continuations and admitting invalid ones. Second, shape calibration: probability mass is overly concentrated on only a few valid continuations, with a heavy tail of mixed valid and invalid tokens, so maintaining high validity limits diversity. We formalize both mechanisms and show that local failures compound across decoding steps, producing strong sequence-level losses in diversity. Empirically, we develop controlled diagnostics for probing these bottlenecks, including tasks with exactly known valid sets and oracle cutoff baselines. Across 14 language models spanning multiple families and scales, we find that diversity collapse is not merely a limitation of particular sampling heuristics, but a consequence of order and shape miscalibration in the LLM distribution.

preprint, 2025, arXiv

Nested Learning: The Illusion of Deep Learning Architectures

Despite recent progress, particularly in developing Language Models, there are fundamental challenges and unanswered questions about how such models can continually learn/memorize, self-improve, and find effective solutions. In this paper, we present a new learning paradigm, called Nested Learning (NL), that coherently represents a machine learning model with a set of nested, multi-level, and/or parallel optimization problems, each with its own context flow. Through the lens of NL, existing deep learning methods learn from data through compressing their own context flow, and in-context learning naturally emerges in large models. NL suggests a philosophy to design more expressive learning algorithms with more levels, resulting in higher-order in-context learning and potentially unlocking effective continual learning capabilities. We advocate for NL by presenting three core contributions: (1) Expressive Optimizers: We show that known gradient-based optimizers, such as Adam, SGD with Momentum, etc., are in fact associative memory modules that aim to compress the gradients' information (by gradient descent). Building on this insight, we present other more expressive optimizers with deep memory and/or more powerful learning rules; (2) Self-Modifying Learning Module: Taking advantage of NL's insights on learning algorithms, we present a sequence model that learns how to modify itself by learning its own update algorithm; and (3) Continuum Memory System: We present a new formulation for memory systems that generalizes the traditional viewpoint of long/short-term memory. Combining our self-modifying sequence model with the continuum memory system, we present a continual learning module, called Hope, showing promising results in language modeling, knowledge incorporation, few-shot generalization, continual learning, and long-context reasoning tasks.

preprint, 2023, arXiv

A Stochastic Optimization Framework for Fair Risk Minimization

Despite the success of large-scale empirical risk minimization (ERM) at achieving high accuracy across a variety of machine learning tasks, fair ERM is hindered by the incompatibility of fairness constraints with stochastic optimization. We consider the problem of fair classification with discrete sensitive attributes and potentially large models and data sets, requiring stochastic solvers. Existing in-processing fairness algorithms are either impractical in the large-scale setting because they require large batches of data at each iteration or they are not guaranteed to converge. In this paper, we develop the first stochastic in-processing fairness algorithm with guaranteed convergence. For demographic parity, equalized odds, and equal opportunity notions of fairness, we provide slight variations of our algorithm--called FERMI--and prove that each of these variations converges in stochastic optimization with any batch size. Empirically, we show that FERMI is amenable to stochastic solvers with multiple (non-binary) sensitive attributes and non-binary targets, performing well even with minibatch size as small as one. Extensive experiments show that FERMI achieves the most favorable tradeoffs between fairness violation and test accuracy across all tested setups compared with state-of-the-art baselines for demographic parity, equalized odds, and equal opportunity. These benefits are especially significant with small batch sizes and for non-binary classification with a large number of sensitive attributes, making FERMI a practical, scalable fairness algorithm. The code for all of the experiments in this paper is available at: https://github.com/optimization-for-data-driven-science/FERMI.

preprint, 2022, arXiv

A Rigorous Study of Integrated Gradients Method and Extensions to Internal Neuron Attributions

As deep learning (DL) efficacy grows, concerns for poor model explainability grow also. Attribution methods address the issue of explainability by quantifying the importance of an input feature for a model prediction. Among various methods, Integrated Gradients (IG) sets itself apart by claiming other methods failed to satisfy desirable axioms, while IG and methods like it uniquely satisfy said axioms. This paper comments on fundamental aspects of IG and its applications/extensions: 1) We identify key differences between IG function spaces and the supporting literature's function spaces which problematize previous claims of IG uniqueness. We show that with the introduction of an additional axiom, \textit{non-decreasing positivity}, the uniqueness claims can be established. 2) We address the question of input sensitivity by identifying function classes where IG is/is not Lipschitz in the attributed input. 3) We show that axioms for single-baseline methods have analogous properties for methods with probability distribution baselines. 4) We introduce a computationally efficient method of identifying internal neurons that contribute to specified regions of an IG attribution map. Finally, we present experimental results validating this method.

preprint, 2022, arXiv

Congestion Reduction via Personalized Incentives

With rapid population growth and urban development, traffic congestion has become an inescapable issue, especially in large cities. Many congestion reduction strategies have been proposed in the past, ranging from roadway extension to transportation demand management. In particular, congestion pricing schemes have been used as negative reinforcements for traffic control. In this project, we study an alternative approach of offering positive incentives to drivers to take different routes. More specifically, we propose an algorithm to reduce traffic congestion and improve routing efficiency via offering personalized incentives to drivers. We exploit the wide accessibility of smart devices to communicate with drivers and develop an incentive offering mechanism using individuals' preferences and aggregate traffic information. The incentives are offered after solving a large-scale optimization problem in order to minimize the total travel time (or minimize any cost function of the network such as total carbon emission). Since this massive optimization problem needs to be solved continually in the network, we developed a distributed computational approach. The proposed distributed algorithm is guaranteed to converge under a mild set of assumptions that are verified with real data. We evaluated the performance of our algorithm using traffic data from the Los Angeles area. Our experiments show congestion reduction of up to 11% in arterial roads and highways.

preprint, 2022, arXiv

Zeroth-Order Algorithms for Nonconvex Minimax Problems with Improved Complexities

In this paper, we study zeroth-order algorithms for minimax optimization problems that are nonconvex in one variable and strongly-concave in the other variable. Such minimax optimization problems have attracted significant attention lately due to their applications in modern machine learning tasks. We first consider a deterministic version of the problem. We design and analyze the Zeroth-Order Gradient Descent Ascent (\texttt{ZO-GDA}) algorithm, and provide improved results compared to existing works, in terms of oracle complexity. We also propose the Zeroth-Order Gradient Descent Multi-Step Ascent (\texttt{ZO-GDMSA}) algorithm that significantly improves the oracle complexity of \texttt{ZO-GDA}. We then consider stochastic versions of \texttt{ZO-GDA} and \texttt{ZO-GDMSA}, to handle stochastic nonconvex minimax problems. For this case, we provide oracle complexity results under two assumptions on the stochastic gradient: (i) the uniformly bounded variance assumption, which is common in traditional stochastic optimization, and (ii) the Strong Growth Condition (SGC), which has been known to be satisfied by modern over-parametrized machine learning models. We establish that under the SGC assumption, the complexities of the stochastic algorithms match that of deterministic algorithms. Numerical experiments are presented to support our theoretical results.
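The idea of running gradient descent ascent from function values alone can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's algorithm: the test function `f(x, y) = cos(x) + x*y - y**2` (nonconvex in `x`, strongly concave in `y`), the helper names `zo_partial` and `zo_gda`, the step sizes, and the use of deterministic central finite differences in place of the randomized smoothing estimators typically used in the zeroth-order literature.

```python
import math


def f(x, y):
    # Hypothetical test objective: nonconvex in x, strongly concave in y.
    return math.cos(x) + x * y - y * y


def zo_partial(func, point, index, mu=1e-4):
    """Central finite-difference estimate of one partial derivative.

    Only function evaluations are used; no analytic gradient is required.
    """
    plus = list(point)
    minus = list(point)
    plus[index] += mu
    minus[index] -= mu
    return (func(*plus) - func(*minus)) / (2.0 * mu)


def zo_gda(func, x0, y0, step_x=0.02, step_y=0.2, n_iters=3000, mu=1e-4):
    """Simultaneous descent in x and ascent in y using estimated gradients.

    The ascent step is larger than the descent step so that y tracks the
    inner maximizer while x moves slowly.
    """
    x, y = x0, y0
    for _ in range(n_iters):
        grad_x = zo_partial(func, (x, y), 0, mu)
        grad_y = zo_partial(func, (x, y), 1, mu)
        x -= step_x * grad_x  # descent on the min variable
        y += step_y * grad_y  # ascent on the max variable
    return x, y


x_final, y_final = zo_gda(f, x0=1.0, y0=0.0)
```

For this particular `f`, the inner maximizer is `y = x / 2`, so a stationary point of the outer problem satisfies `-sin(x) + x / 2 = 0`; both conditions can be checked on the returned pair.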

preprint, 2021, arXiv

Alternating Direction Method of Multipliers for Quantization

Quantization of the parameters of machine learning models, such as deep neural networks, requires solving constrained optimization problems, where the constraint set is formed by the Cartesian product of many simple discrete sets. For such optimization problems, we study the performance of the Alternating Direction Method of Multipliers for Quantization ($\texttt{ADMM-Q}$) algorithm, which is a variant of the widely-used ADMM method applied to our discrete optimization problem. We establish the convergence of the iterates of $\texttt{ADMM-Q}$ to certain $\textit{stationary points}$. To the best of our knowledge, this is the first analysis of an ADMM-type method for problems with discrete variables/constraints. Based on our theoretical insights, we develop a few variants of $\texttt{ADMM-Q}$ that can handle inexact update rules, and have improved performance via the use of "soft projection" and "injecting randomness to the algorithm". We empirically evaluate the efficacy of our proposed approaches.

preprint, 2020, arXiv

Convergence to Second-Order Stationarity for Constrained Non-Convex Optimization

We consider the problem of finding an approximate second-order stationary point of a constrained non-convex optimization problem. We first show that, unlike the gradient descent method for unconstrained optimization, the vanilla projected gradient descent algorithm may converge to a strict saddle point even when there is only a single linear constraint. We then provide a hardness result by showing that checking $(ε_g,ε_H)$-second order stationarity is NP-hard even in the presence of linear constraints. Despite our hardness result, we identify instances of the problem for which checking second order stationarity can be done efficiently. For such instances, we propose a dynamic second order Frank--Wolfe algorithm which converges to ($ε_g, ε_H$)-second order stationary points in ${\mathcal{O}}(\max\{ε_g^{-2}, ε_H^{-3}\})$ iterations. The proposed algorithm can be used in general constrained non-convex optimization as long as the constrained quadratic sub-problem can be solved efficiently.

preprint, 2020, arXiv

Read Mapping Near Non-Volatile Memory

DNA sequencing is the physical/biochemical process of identifying the location of the four bases (Adenine, Guanine, Cytosine, Thymine) in a DNA strand. As semiconductor technology revolutionized computing, modern DNA sequencing technology (termed Next Generation Sequencing, NGS) revolutionized genomic research. As a result, modern NGS platforms can sequence hundreds of millions of short DNA fragments in parallel. The sequenced DNA fragments, representing the output of NGS platforms, are termed reads. Besides genomic variations, NGS imperfections induce noise in reads. Mapping each read to (the most similar portion of) a reference genome of the same species, i.e., read mapping, is a common critical first step in a diverse set of emerging bioinformatics applications. Mapping represents a search-heavy, memory-intensive similarity matching problem and therefore can greatly benefit from near-memory processing. Intuition suggests using fast associative search enabled by Ternary Content Addressable Memory (TCAM) by construction. However, the excessive energy consumption and lack of support for similarity matching (under NGS and genomic variation induced noise) render direct application of TCAM infeasible, irrespective of volatility, where only non-volatile TCAM can accommodate the large memory footprint in an area-efficient way. This paper introduces GeNVoM, a scalable, energy-efficient and high-throughput solution. Instead of optimizing an algorithm developed for general-purpose computers or GPUs, GeNVoM rethinks the algorithm and non-volatile TCAM-based accelerator design together from the ground up. Thereby GeNVoM can improve the throughput by up to 113.5 times (3.6 times); the energy consumption, by up to 210.9 times (1.36 times), when compared to a GPU (accelerator) baseline, which represents one of the highest-throughput implementations known.

preprint, 2020, arXiv

Robustness of accelerated first-order algorithms for strongly convex optimization problems

We study the robustness of accelerated first-order algorithms to stochastic uncertainties in gradient evaluation. Specifically, for unconstrained, smooth, strongly convex optimization problems, we examine the mean-squared error in the optimization variable when the iterates are perturbed by additive white noise. This type of uncertainty may arise in situations where an approximation of the gradient is sought through measurements of a real system or in a distributed computation over a network. Even though the underlying dynamics of first-order algorithms for this class of problems are nonlinear, we establish upper bounds on the mean-squared deviation from the optimal solution that are tight up to constant factors. Our analysis quantifies fundamental trade-offs between noise amplification and convergence rates obtained via any acceleration scheme similar to Nesterov's or heavy-ball methods. To gain additional analytical insight, for strongly convex quadratic problems, we explicitly evaluate the steady-state variance of the optimization variable in terms of the eigenvalues of the Hessian of the objective function. We demonstrate that the entire spectrum of the Hessian, rather than just the extreme eigenvalues, influences the robustness of noisy algorithms. We specialize this result to the problem of distributed averaging over undirected networks and examine the role of network size and topology on the robustness of noisy accelerated algorithms.

preprint, 2020, arXiv

Solving Non-Convex Non-Differentiable Min-Max Games using Proximal Gradient Method

Min-max saddle point games appear in a wide range of applications in machine learning and signal processing. Despite their wide applicability, theoretical studies are mostly limited to the special convex-concave structure. While some recent works generalized these results to special smooth non-convex cases, our understanding of non-smooth scenarios is still limited. In this work, we study a special form of non-smooth min-max games when the objective function is (strongly) convex with respect to one of the players' decision variables. We show that a simple multi-step proximal gradient descent-ascent algorithm converges to an $ε$-first-order Nash equilibrium of the min-max game with the number of gradient evaluations being polynomial in $1/ε$. We will also show that our notion of stationarity is stronger than existing ones in the literature. Finally, we evaluate the performance of the proposed algorithm through an adversarial attack on a LASSO estimator.

preprint, 2015, arXiv

A Unified Algorithmic Framework for Block-Structured Optimization Involving Big Data

This article presents a powerful algorithmic framework for big data optimization, called the Block Successive Upper bound Minimization (BSUM). The BSUM includes as special cases many well-known methods for analyzing massive data sets, such as the Block Coordinate Descent (BCD), the Convex-Concave Procedure (CCCP), the Block Coordinate Proximal Gradient (BCPG) method, the Nonnegative Matrix Factorization (NMF), the Expectation Maximization (EM) method and so on. In this article, various features and properties of the BSUM are discussed from the viewpoint of design flexibility, computational efficiency, parallel/distributed implementation and the required communication overhead. Illustrative examples from networking, signal processing and machine learning are presented to demonstrate the practical performance of the BSUM framework.

preprint, 2015, arXiv

Computational Intractability of Dictionary Learning for Sparse Representation

In this paper we consider the dictionary learning problem for sparse representation. We first show that this problem is NP-hard by polynomial time reduction of the densest cut problem. Then, using successive convex approximation strategies, we propose efficient dictionary learning schemes to solve several practical formulations of this problem to stationary points. Unlike many existing algorithms in the literature, such as K-SVD, our proposed dictionary learning scheme is theoretically guaranteed to converge to the set of stationary points under certain mild assumptions. For the image denoising application, the performance and the efficiency of the proposed dictionary learning scheme are comparable to that of K-SVD algorithm in simulation.

preprint, 2015, arXiv

Computing B-Stationary Points of Nonsmooth DC Programs

Motivated by a class of applied problems arising from physical layer based security in a digital communication system, in particular, by a secrecy sum-rate maximization problem, this paper studies a nonsmooth, difference-of-convex (dc) minimization problem. The contributions of this paper are: (i) clarify several kinds of stationary solutions and their relations; (ii) develop and establish the convergence of a novel algorithm for computing a d-stationary solution of a problem with a convex feasible set that is arguably the sharpest kind among the various stationary solutions; (iii) extend the algorithm in several directions including: a randomized choice of the subproblems that could help the practical convergence of the algorithm, a distributed penalty approach for problems whose objective functions are sums of dc functions, and problems with a specially structured (nonconvex) dc constraint. For the latter class of problems, a pointwise Slater constraint qualification is introduced that facilitates the verification and computation of a B(ouligand)-stationary point.

preprint, 2015, arXiv

Convergence Analysis of Alternating Direction Method of Multipliers for a Family of Nonconvex Problems

The alternating direction method of multipliers (ADMM) is widely used to solve large-scale linearly constrained optimization problems, convex or nonconvex, in many engineering fields. However there is a general lack of theoretical understanding of the algorithm when the objective function is nonconvex. In this paper we analyze the convergence of the ADMM for solving certain nonconvex consensus and sharing problems, and show that the classical ADMM converges to the set of stationary solutions, provided that the penalty parameter in the augmented Lagrangian is chosen to be sufficiently large. For the sharing problems, we show that the ADMM is convergent regardless of the number of variable blocks. Our analysis does not impose any assumptions on the iterates generated by the algorithm, and is broadly applicable to many ADMM variants involving proximal update rules and various flexible block selection rules.

preprint, 2015, arXiv

Discrete Rényi Classifiers

Consider the binary classification problem of predicting a target variable $Y$ from a discrete feature vector $X = (X_1,...,X_d)$. When the probability distribution $\mathbb{P}(X,Y)$ is known, the optimal classifier, leading to the minimum misclassification rate, is given by the Maximum A-posteriori Probability decision rule. However, estimating the complete joint distribution $\mathbb{P}(X,Y)$ is computationally and statistically impossible for large values of $d$. An alternative approach is to first estimate some low order marginals of $\mathbb{P}(X,Y)$ and then design the classifier based on the estimated low order marginals. This approach is also helpful when the complete training data instances are not available due to privacy concerns. In this work, we consider the problem of finding the optimum classifier based on some estimated low order marginals of $(X,Y)$. We prove that for a given set of marginals, the minimum Hirschfeld-Gebelein-Renyi (HGR) correlation principle introduced in [1] leads to a randomized classification rule which is shown to have a misclassification rate no larger than twice the misclassification rate of the optimal classifier. Then, under a separability condition, we show that the proposed algorithm is equivalent to a randomized linear regression approach. In addition, this method naturally results in a robust feature selection method selecting a subset of features having the maximum worst case HGR correlation with the target variable. Our theoretical upper-bound is similar to the recent Discrete Chebyshev Classifier (DCC) approach [2], while the proposed algorithm has significant computational advantages since it only requires solving a least square optimization problem. Finally, we numerically compare our proposed algorithm with the DCC classifier and show that the proposed algorithm results in better misclassification rate over various datasets.

preprint, 2015, arXiv

Iteration Complexity Analysis of Block Coordinate Descent Methods

In this paper, we provide a unified iteration complexity analysis for a family of general block coordinate descent (BCD) methods, covering popular methods such as the block coordinate gradient descent (BCGD) and the block coordinate proximal gradient (BCPG), under various different coordinate update rules. We unify these algorithms under the so-called Block Successive Upper-bound Minimization (BSUM) framework, and show that for a broad class of multi-block nonsmooth convex problems, all algorithms covered by the BSUM framework achieve a global sublinear iteration complexity of $O(1/r)$, where r is the iteration index. Moreover, for the case of block coordinate minimization (BCM) where each block is minimized exactly, we establish the sublinear convergence rate of $O(1/r)$ without per block strong convexity assumption. Further, we show that when there are only two blocks of variables, a special BSUM algorithm with Gauss-Seidel rule can be accelerated to achieve an improved rate of $O(1/r^2)$.

preprint, 2015, arXiv

Minimum HGR Correlation Principle: From Marginals to Joint Distribution

Given low order moment information over the random variables $\mathbf{X} = (X_1,X_2,\ldots,X_p)$ and $Y$, what distribution minimizes the Hirschfeld-Gebelein-Rényi (HGR) maximal correlation coefficient between $\mathbf{X}$ and $Y$, while remaining faithful to the given moments? The answer to this question is important especially in order to fit models over $(\mathbf{X},Y)$ with minimum dependence among the random variables $\mathbf{X}$ and $Y$. In this paper, we investigate this question first in the continuous setting by showing that the jointly Gaussian distribution achieves the minimum HGR correlation coefficient among distributions with the given first and second order moments. Then, we pose a similar question in the discrete scenario by fixing the pairwise marginals of the random variables $\mathbf{X}$ and $Y$. To answer this question in the discrete setting, we first derive a lower bound for the HGR correlation coefficient over the class of distributions with fixed pairwise marginals. Then we show that this lower bound is tight if there exists a distribution with certain {\it additive} structure satisfying the given pairwise marginals. Moreover, the distribution with the additive structure achieves the minimum HGR correlation coefficient. Finally, we conclude by showing that the event of obtaining pairwise marginals containing an additive structured distribution has a positive Lebesgue measure over the probability simplex.

preprint, 2014, arXiv

A Block Successive Upper Bound Minimization Method of Multipliers for Linearly Constrained Convex Optimization

Consider the problem of minimizing the sum of a smooth convex function and a separable nonsmooth convex function subject to linear coupling constraints. Problems of this form arise in many contemporary applications including signal processing, wireless networking and smart grid provisioning. Motivated by the huge size of these applications, we propose a new class of first order primal-dual algorithms called the block successive upper-bound minimization method of multipliers (BSUM-M) to solve this family of problems. The BSUM-M updates the primal variable blocks successively by minimizing locally tight upper-bounds of the augmented Lagrangian of the original problem, followed by a gradient type update for the dual variable in closed form. We show that under certain regularity conditions, and when the primal block variables are updated in either a deterministic or a random fashion, the BSUM-M converges to the set of optimal solutions. Moreover, in the absence of linear constraints, we show that the BSUM-M, which reduces to the block successive upper-bound minimization (BSUM) method, is capable of linear convergence without strong convexity.

preprint, 2014, arXiv

Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization

Consider the problem of minimizing the sum of a smooth (possibly non-convex) and a convex (possibly nonsmooth) function involving a large number of variables. A popular approach to solve this problem is the block coordinate descent (BCD) method whereby at each iteration only one variable block is updated while the remaining variables are held fixed. With the recent advances in the developments of the multi-core parallel processing technology, it is desirable to parallelize the BCD method by allowing multiple blocks to be updated simultaneously at each iteration of the algorithm. In this work, we propose an inexact parallel BCD approach where at each iteration, a subset of the variables is updated in parallel by minimizing convex approximations of the original objective function. We investigate the convergence of this parallel BCD method for both randomized and cyclic variable selection rules. We analyze the asymptotic and non-asymptotic convergence behavior of the algorithm for both convex and non-convex objective functions. The numerical experiments suggest that for a special case of Lasso minimization problem, the cyclic block selection rule can outperform the randomized rule.

preprint, 2013, arXiv

A Stochastic Successive Minimization Method for Nonsmooth Nonconvex Optimization with Applications to Transceiver Design in Wireless Communication Networks

Consider the problem of minimizing the expected value of a cost function parameterized by a random variable. The classical sample average approximation (SAA) method for solving this problem requires minimization of an ensemble average of the objective at each step, which can be expensive. In this paper, we propose a stochastic successive upper-bound minimization method (SSUM) which minimizes an approximate ensemble average at each iteration. To ensure convergence and to facilitate computation, we require the approximate ensemble average to be a locally tight upper-bound of the expected cost function and be easily optimized. The main contributions of this work include the development and analysis of the SSUM method as well as its applications in linear transceiver design for wireless communication networks and online dictionary learning. Moreover, using the SSUM framework, we extend the classical stochastic (sub-)gradient (SG) method to the case of minimizing a nonsmooth nonconvex objective function and establish its convergence.

preprint, 2013, arXiv

Joint User Grouping and Linear Virtual Beamforming: Complexity, Algorithms and Approximation Bounds

In a wireless system with a large number of distributed nodes, the quality of communication can be greatly improved by pooling the nodes to perform joint transmission/reception. In this paper, we consider the problem of optimally selecting a subset of nodes from potentially a large number of candidates to form a virtual multi-antenna system, while at the same time designing their joint linear transmission strategies. We focus on two specific application scenarios: 1) multiple single antenna transmitters cooperatively transmit to a receiver; 2) a single transmitter transmits to a receiver with the help of a number of cooperative relays. We formulate the joint node selection and beamforming problems as cardinality constrained optimization problems with both discrete variables (used for selecting cooperative nodes) and continuous variables (used for designing beamformers). For each application scenario, we first characterize the computational complexity of the joint optimization problem, and then propose novel semi-definite relaxation (SDR) techniques to obtain approximate solutions. We show that the new SDR algorithms have a guaranteed approximation performance in terms of the gap to global optimality, regardless of channel realizations. The effectiveness of the proposed algorithms is demonstrated via numerical experiments.

preprint, 2012, arXiv

A Unified Convergence Analysis of Block Successive Minimization Methods for Nonsmooth Optimization

The block coordinate descent (BCD) method is widely used for minimizing a continuous function f of several block variables. At each iteration of this method, a single block of variables is optimized, while the remaining variables are held fixed. To ensure the convergence of the BCD method, the subproblem to be optimized in each iteration needs to be solved exactly to its unique optimal solution. Unfortunately, these requirements are often too restrictive for many practical scenarios. In this paper, we study an alternative inexact BCD approach which updates the variable blocks by successively minimizing a sequence of approximations of f which are either locally tight upper bounds of f or strictly convex local approximations of f. We focus on characterizing the convergence properties for a fairly wide class of such methods, especially for the cases where the objective functions are either non-differentiable or nonconvex. Our results unify and extend the existing convergence results for many classical algorithms such as the BCD method, the difference of convex functions (DC) method, the expectation maximization (EM) algorithm, as well as the alternating proximal minimization algorithm.
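The classical BCD iteration described above, with each block subproblem solved exactly, can be sketched on a small Lasso problem. This is only an illustration of the baseline method under stated assumptions, not the inexact approach the paper studies: the Lasso objective, single-coordinate blocks, and the function names `soft_threshold`, `lasso_objective` and `cyclic_bcd_lasso` are all chosen here for the example.

```python
def soft_threshold(z, t):
    """Closed-form minimizer of 0.5 * (x - z)**2 + t * |x|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_objective(A, b, x, lam):
    """0.5 * ||A x - b||^2 + lam * ||x||_1 for list-of-rows A."""
    residual = [sum(a * v for a, v in zip(row, x)) - b_i
                for row, b_i in zip(A, b)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(v) for v in x)


def cyclic_bcd_lasso(A, b, lam, n_sweeps=50):
    """Cyclic BCD with one coordinate per block, each minimized exactly."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    residual = list(b)  # equals b - A x, with x = 0 initially
    history = [lasso_objective(A, b, x, lam)]
    for _ in range(n_sweeps):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot reduce the objective
            # Correlation of column j with the residual that excludes x[j].
            rho = sum(A[i][j] * (residual[i] + A[i][j] * x[j])
                      for i in range(m))
            new_xj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_xj - x[j]
            for i in range(m):
                residual[i] -= A[i][j] * delta
            x[j] = new_xj
        history.append(lasso_objective(A, b, x, lam))
    return x, history


A_demo = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_demo = [1.0, 2.0, 3.0]
x_demo, history_demo = cyclic_bcd_lasso(A_demo, b_demo, lam=0.1)
```

Because every coordinate subproblem here is strictly convex and solved exactly, the recorded objective values in `history_demo` should never increase from one sweep to the next, which is a convenient sanity check when experimenting with other block update rules.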

preprint2012arXiv

Linear Transceiver Design for a MIMO Interfering Broadcast Channel Achieving Max-Min Fairness

We consider the problem of linear transceiver design to achieve max-min fairness in a downlink MIMO multicell network. This problem can be formulated as maximizing the minimum rate among all the users in an interfering broadcast channel (IBC). In this paper we show that when the number of antennas is at least two at each of the transmitters and the receivers, the min rate maximization problem is NP-hard in the number of users. Moreover, we develop a low-complexity algorithm for this problem by iteratively solving a sequence of convex subproblems, and establish its global convergence to a stationary point of the original minimum rate maximization problem. Numerical simulations show that this algorithm is efficient in achieving fairness among all the users.

preprint2011arXiv

On the Degrees of Freedom Achievable Through Interference Alignment in a MIMO Interference Channel

Consider a K-user flat fading MIMO interference channel where the k-th transmitter (or receiver) is equipped with M_k (respectively N_k) antennas. If a large number of statistically independent channel extensions are allowed either across time or frequency, the recent work [1] suggests that the total achievable degrees of freedom (DoF) can be maximized via interference alignment, resulting in a total DoF that grows linearly with K even if M_k and N_k are bounded. In this work we consider the case where no channel extension is allowed, and establish a general condition that must be satisfied by any degrees of freedom tuple (d_1, d_2, ..., d_K) achievable through linear interference alignment. For a symmetric system with M_k = M, N_k = N, d_k = d for all k, this condition implies that the total achievable DoF cannot grow linearly with K, and is in fact no more than K(M + N)/(K + 1). We also show that this bound is tight when the number of antennas at each transceiver is divisible by the number of data streams.
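As a small worked illustration of the symmetric bound quoted above, the sketch below evaluates K(M + N)/(K + 1) exactly and the largest per-user stream count d consistent with it. The function names are hypothetical, and the extra cap d ≤ min(M, N) is a standard assumption added here (a user cannot send more streams than it has antennas at either end). The code only evaluates the stated upper bound and says nothing about whether a given d is actually achievable.

```python
from fractions import Fraction


def symmetric_total_dof_bound(K, M, N):
    """Upper bound K(M + N)/(K + 1) on the total DoF of a symmetric
    K-user system without channel extensions, as an exact fraction."""
    return Fraction(K * (M + N), K + 1)


def max_streams_per_user(K, M, N):
    """Largest integer d with K*d <= K(M + N)/(K + 1), i.e.
    d <= (M + N)/(K + 1), additionally capped by min(M, N)."""
    return min((M + N) // (K + 1), M, N)


if __name__ == "__main__":
    for K in (3, 10, 100):
        bound = symmetric_total_dof_bound(K, 2, 2)
        d = max_streams_per_user(K, 2, 2)
        print(f"K={K}: total DoF bound = {bound}, max d per user = {d}")
```

Because K/(K + 1) < 1, the bound stays below M + N for every K, which is why the total DoF cannot grow linearly with the number of users in this setting.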

preprint2010arXiv

Linear Transceiver Design for Interference Alignment: Complexity and Computation

Consider a MIMO interference channel whereby each transmitter and receiver are equipped with multiple antennas. The basic problem is to design optimal linear transceivers (or beamformers) that can maximize system throughput. The recent work [1] suggests that optimal beamformers should maximize the total degrees of freedom and achieve interference alignment in high SNR. In this paper we first consider the interference alignment problem in spatial domain and prove that the problem of maximizing the total degrees of freedom for a given MIMO interference channel is NP-hard. Furthermore, we show that even checking the achievability of a given tuple of degrees of freedom for all receivers is NP-hard when each receiver is equipped with at least three antennas. Interestingly, the same problem becomes polynomial time solvable when each transmit/receive node is equipped with no more than two antennas. Finally, we propose a distributed algorithm for transmit covariance matrix design, while assuming each receiver uses a linear MMSE beamformer. The simulation results show that the proposed algorithm outperforms the existing interference alignment algorithms in terms of system throughput.

26 published item(s)

Sampling More, Getting Less: Calibration is the Diversity Bottleneck in LLMs

Nested Learning: The Illusion of Deep Learning Architectures

A Stochastic Optimization Framework for Fair Risk Minimization

A Rigorous Study of Integrated Gradients Method and Extensions to Internal Neuron Attributions

Congestion Reduction via Personalized Incentives

Zeroth-Order Algorithms for Nonconvex Minimax Problems with Improved Complexities

Alternating Direction Method of Multipliers for Quantization

Convergence to Second-Order Stationarity for Constrained Non-Convex Optimization

Read Mapping Near Non-Volatile Memory

Robustness of accelerated first-order algorithms for strongly convex optimization problems

Solving Non-Convex Non-Differentiable Min-Max Games using Proximal Gradient Method

A Unified Algorithmic Framework for Block-Structured Optimization Involving Big Data

Computational Intractability of Dictionary Learning for Sparse Representation

Computing B-Stationary Points of Nonsmooth DC Programs

Convergence Analysis of Alternating Direction Method of Multipliers for a Family of Nonconvex Problems

Discrete Rényi Classifiers

Iteration Complexity Analysis of Block Coordinate Descent Methods

Minimum HGR Correlation Principle: From Marginals to Joint Distribution

A Block Successive Upper Bound Minimization Method of Multipliers for Linearly Constrained Convex Optimization

Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization

A Stochastic Successive Minimization Method for Nonsmooth Nonconvex Optimization with Applications to Transceiver Design in Wireless Communication Networks

Joint User Grouping and Linear Virtual Beamforming: Complexity, Algorithms and Approximation Bounds

A Unified Convergence Analysis of Block Successive Minimization Methods for Nonsmooth Optimization

Linear Transceiver Design for a MIMO Interfering Broadcast Channel Achieving Max-Min Fairness

On the Degrees of Freedom Achievable Through Interference Alignment in a MIMO Interference Channel

Linear Transceiver Design for Interference Alignment: Complexity and Computation