Source author record

Katya Scheinberg

Katya Scheinberg appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.OC Machine Learning Computer Vision Information Theory math.IT math.NA math.PR

Catalog footprint

What is connected

14works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Finding Optimal Policy for Queueing Models: New Parameterization

Queueing systems appear in many important real-life applications including communication networks, transportation and manufacturing systems. Reinforcement learning (RL) framework is a suitable model for the queueing control problem where the underlying dynamics are usually unknown and the agent receives little information from the environment to navigate. In this work, we investigate the optimization aspects of the queueing model as a RL environment and provide insight to learn the optimal policy efficiently. We propose a new parameterization of the policy by using the intrinsic properties of queueing network systems. Experiments show good performance of our methods with various load conditions from light to heavy traffic.

preprint2022arXiv

Nesterov Accelerated Shuffling Gradient Method for Convex Optimization

In this paper, we propose Nesterov Accelerated Shuffling Gradient (NASG), a new algorithm for the convex finite-sum minimization problems. Our method integrates the traditional Nesterov's acceleration momentum with different shuffling sampling schemes. We show that our algorithm has an improved rate of $\mathcal{O}(1/T)$ using unified shuffling schemes, where $T$ is the number of epochs. This rate is better than that of any other shuffling gradient methods in convex regime. Our convergence analysis does not require an assumption on bounded domain or a bounded gradient condition. For randomized shuffling schemes, we improve the convergence bound further. When employing some initial condition, we show that our method converges faster near the small neighborhood of the solution. Numerical simulations demonstrate the efficiency of our algorithm.

preprint2021arXiv

Global Convergence Rate Analysis of a Generic Line Search Algorithm with Noise

In this paper, we develop convergence analysis of a modified line search method for objective functions whose value is computed with noise and whose gradient estimates are inexact and possibly random. The noise is assumed to be bounded in absolute value without any additional assumptions. We extend the framework based on stochastic methods from [Cartis and Scheinberg, 2018] which was developed to provide analysis of a standard line search method with exact function values and random gradients to the case of noisy functions. We introduce two alternative conditions on the gradient which when satisfied with some sufficiently large probability at each iteration, guarantees convergence properties of the line search method. We derive expected complexity bounds to reach a near optimal neighborhood for convex, strongly convex and nonconvex functions. The exact dependence of the convergence neighborhood on the noise is specified.

preprint2020arXiv

Adaptive Stochastic Optimization

Optimization lies at the heart of machine learning and signal processing. Contemporary approaches based on the stochastic gradient method are non-adaptive in the sense that their implementation employs prescribed parameter values that need to be tuned for each application. This article summarizes recent research and motivates future work on adaptive stochastic optimization methods, which have the potential to offer significant computational savings when training large-scale systems.

preprint2020arXiv

Inexact SARAH Algorithm for Stochastic Optimization

We develop and analyze a variant of the SARAH algorithm, which does not require computation of the exact gradient. Thus this new method can be applied to general expectation minimization problems rather than only finite sum problems. While the original SARAH algorithm, as well as its predecessor, SVRG, require an exact gradient computation on each outer iteration, the inexact variant of SARAH (iSARAH), which we develop here, requires only stochastic gradient computed on a mini-batch of sufficient size. The proposed method combines variance reduction via sample size selection and iterative stochastic gradient updates. We analyze the convergence rate of the algorithms for strongly convex and non-strongly convex cases, under smooth assumption with appropriate mini-batch size selected for each case. We show that with an additional, reasonable, assumption iSARAH achieves the best known complexity among stochastic methods in the case of non-strongly convex stochastic functions.

preprint2019arXiv

Feature Engineering and Forecasting via Derivative-free Optimization and Ensemble of Sequence-to-sequence Networks with Applications in Renewable Energy

This study introduces a framework for the forecasting, reconstruction and feature engineering of multivariate processes along with its renewable energy applications. We integrate derivative-free optimization with an ensemble of sequence-to-sequence networks and design a new resampling technique called additive resampling, which, along with Bootstrap aggregating (bagging) resampling, are applied to initialize the ensemble structure. Moreover, we explore the proposed framework performance on three renewable energy sources---wind, solar and ocean wave---and conduct several short- to long-term forecasts showing the superiority of the proposed method compared to numerous machine learning techniques. The findings indicate that the introduced method performs more accurately when the forecasting horizon becomes longer. In addition, we modify the framework for automated feature selection. The model represents a clear interpretation of the selected features. Furthermore, we investigate the effects of different environmental and marine factors on the wind speed and ocean output power, respectively, and report the selected features. Finally, we explore the online forecasting setting and illustrate that the model outperforms alternatives through different measurement errors.

preprint2016arXiv

Stochastic Optimization Using a Trust-Region Method and Random Models

In this paper, we propose and analyze a trust-region model-based algorithm for solving unconstrained stochastic optimization problems. Our framework utilizes random models of an objective function $f(x)$, obtained from stochastic observations of the function or its gradient. Our method also utilizes estimates of function values to gauge progress that is being made. The convergence analysis relies on requirements that these models and these estimates are sufficiently accurate with sufficiently high, but fixed, probability. Beyond these conditions, no assumptions are made on how these models and estimates are generated. Under these general conditions we show an almost sure global convergence of the method to a first order stationary point. In the second part of the paper, we present examples of generating sufficiently accurate random models under biased or unbiased noise assumptions. Lastly, we present some computational results showing the benefits of the proposed method compared to existing approaches that are based on sample averaging or stochastic gradients.

preprint2015arXiv

Practical Inexact Proximal Quasi-Newton Method with Global Complexity Analysis

Recently several methods were proposed for sparse optimization which make careful use of second-order information [10, 28, 16, 3] to improve local convergence rates. These methods construct a composite quadratic approximation using Hessian information, optimize this approximation using a first-order method, such as coordinate descent and employ a line search to ensure sufficient descent. Here we propose a general framework, which includes slightly modified versions of existing algorithms and also a new algorithm, which uses limited memory BFGS Hessian approximations, and provide a novel global convergence rate analysis, which covers methods that solve subproblems via coordinate descent.

preprint2013arXiv

Computation of sparse low degree interpolating polynomials and their application to derivative-free optimization

Interpolation-based trust-region methods are an important class of algorithms for Derivative-Free Optimization which rely on locally approximating an objective function by quadratic polynomial interpolation models, frequently built from less points than there are basis components. Often, in practical applications, the contribution of the problem variables to the objective function is such that many pairwise correlations between variables are negligible, implying, in the smooth case, a sparse structure in the Hessian matrix. To be able to exploit Hessian sparsity, existing optimization approaches require the knowledge of the sparsity structure. The goal of this paper is to develop and analyze a method where the sparse models are constructed automatically. The sparse recovery theory developed recently in the field of compressed sensing characterizes conditions under which a sparse vector can be accurately recovered from few random measurements. Such a recovery is achieved by minimizing the l1-norm of a vector subject to the measurements constraints. We suggest an approach for building sparse quadratic polynomial interpolation models by minimizing the l1-norm of the entries of the model Hessian subject to the interpolation conditions. We show that this procedure recovers accurate models when the function Hessian is sparse, using relatively few randomly selected sample points. Motivated by this result, we developed a practical interpolation-based trust-region method using deterministic sample sets and minimum l1-norm quadratic models. Our computational results show that the new approach exhibits a promising numerical performance both in the general case and in the sparse one.

preprint2013arXiv

Convergence of trust-region methods based on probabilistic models

In this paper we consider the use of probabilistic or random models within a classical trust-region framework for optimization of deterministic smooth general nonlinear functions. Our method and setting differs from many stochastic optimization approaches in two principal ways. Firstly, we assume that the value of the function itself can be computed without noise, in other words, that the function is deterministic. Secondly, we use random models of higher quality than those produced by usual stochastic gradient methods. In particular, a first order model based on random approximation of the gradient is required to provide sufficient quality of approximation with probability greater than or equal to 1/2. This is in contrast with stochastic gradient approaches, where the model is assumed to be "correct" only in expectation. As a result of this particular setting, we are able to prove convergence, with probability one, of a trust-region method which is almost identical to the classical method. Hence we show that a standard optimization framework can be used in cases when models are random and may or may not provide good approximations, as long as "good" models are more likely than "bad" models. Our results are based on the use of properties of martingales. Our motivation comes from using random sample sets and interpolation models in derivative-free optimization. However, our framework is general and can be applied with any source of uncertainty in the model. We discuss various applications for our methods in the paper.

preprint2013arXiv

Efficiently Using Second Order Information in Large l1 Regularization Problems

We propose a novel general algorithm LHAC that efficiently uses second-order information to train a class of large-scale l1-regularized problems. Our method executes cheap iterations while achieving fast local convergence rate by exploiting the special structure of a low-rank matrix, constructed via quasi-Newton approximation of the Hessian of the smooth loss function. A greedy active-set strategy, based on the largest violations in the dual constraints, is employed to maintain a working set that iteratively estimates the complement of the optimal active set. This allows for smaller size of subproblems and eventually identifies the optimal active set. Empirical comparisons confirm that LHAC is highly competitive with several recently proposed state-of-the-art specialized solvers for sparse logistic regression and sparse inverse covariance matrix selection.

preprint2013arXiv

On partial sparse recovery

We consider the problem of recovering a partially sparse solution of an underdetermined system of linear equations by minimizing the $\ell_1$-norm of the part of the solution vector which is known to be sparse. Such a problem is closely related to a classical problem in Compressed Sensing where the $\ell_1$-norm of the whole solution vector is minimized. We introduce analogues of restricted isometry and null space properties for the recovery of partially sparse vectors and show that these new properties are implied by their original counterparts. We show also how to extend recovery under noisy measurements to the partially sparse case.

preprint2010arXiv

Fast Alternating Linearization Methods for Minimizing the Sum of Two Convex Functions

We present in this paper first-order alternating linearization algorithms based on an alternating direction augmented Lagrangian approach for minimizing the sum of two convex functions. Our basic methods require at most $O(1/ε)$ iterations to obtain an $ε$-optimal solution, while our accelerated (i.e., fast) versions of them require at most $O(1/\sqrtε)$ iterations, with little change in the computational effort required at each iteration. For both types of methods, we present one algorithm that requires both functions to be smooth with Lipschitz continuous gradients and one algorithm that needs only one of the functions to be so. Algorithms in this paper are Gauss-Seidel type methods, in contrast to the ones proposed by Goldfarb and Ma in [21] where the algorithms are Jacobi type methods. Numerical results are reported to support our theoretical conclusions and demonstrate the practical potential of our algorithms.

preprint2010arXiv

Sparse Inverse Covariance Selection via Alternating Linearization Methods

Gaussian graphical models are of great interest in statistical learning. Because the conditional independencies between different nodes correspond to zero entries in the inverse covariance matrix of the Gaussian distribution, one can learn the structure of the graph by estimating a sparse inverse covariance matrix from sample data, by solving a convex maximum likelihood problem with an $\ell_1$-regularization term. In this paper, we propose a first-order method based on an alternating linearization technique that exploits the problem's special structure; in particular, the subproblems solved in each iteration have closed-form solutions. Moreover, our algorithm obtains an $ε$-optimal solution in $O(1/ε)$ iterations. Numerical experiments on both synthetic and real data from gene association networks show that a practical version of this algorithm outperforms other competitive algorithms.

Katya Scheinberg

What is connected

Connect this record

See the researcher in context

Building this map preview

14 published item(s)

Finding Optimal Policy for Queueing Models: New Parameterization

Nesterov Accelerated Shuffling Gradient Method for Convex Optimization

Global Convergence Rate Analysis of a Generic Line Search Algorithm with Noise

Adaptive Stochastic Optimization

Inexact SARAH Algorithm for Stochastic Optimization

Feature Engineering and Forecasting via Derivative-free Optimization and Ensemble of Sequence-to-sequence Networks with Applications in Renewable Energy

Stochastic Optimization Using a Trust-Region Method and Random Models

Practical Inexact Proximal Quasi-Newton Method with Global Complexity Analysis

Computation of sparse low degree interpolating polynomials and their application to derivative-free optimization

Convergence of trust-region methods based on probabilistic models

Efficiently Using Second Order Information in Large l1 Regularization Problems

On partial sparse recovery

Fast Alternating Linearization Methods for Minimizing the Sum of Two Convex Functions

Sparse Inverse Covariance Selection via Alternating Linearization Methods