Researcher profile

Pablo Parrilo

Pablo Parrilo contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
1topics
2close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2022arXiv

Convergence Rate of Incremental Gradient and Newton Methods

The incremental gradient method is a prominent algorithm for minimizing a finite sum of smooth convex functions, used in many contexts including large-scale data processing applications and distributed optimization over networks. It is a first-order method that processes the functions one at a time based on their gradient information. The incremental Newton method, on the other hand, is a second-order variant which exploits additionally the curvature information of the underlying functions and can therefore be faster. In this paper, we focus on the case when the objective function is strongly convex and present fast convergence results for the incremental gradient and incremental Newton methods under the constant and diminishing stepsizes. For a decaying stepsize rule $α_k = Θ(1/k^s)$ with $s \in (0,1]$, we show that the distance of the IG iterates to the optimal solution converges at rate ${\cal O}(1/k^{s})$ (which translates into ${\cal O}(1/k^{2s})$ rate in the suboptimality of the objective value). For $s>1/2$, this improves the previous ${\cal O}(1/\sqrt{k})$ results in distances obtained for the case when functions are non-smooth. We show that to achieve the fastest ${\cal O}(1/k)$ rate, incremental gradient needs a stepsize that requires tuning to the strong convexity parameter whereas the incremental Newton method does not. The results are based on viewing the incremental gradient method as a gradient descent method with gradient errors, devising efficient upper bounds for the gradient error to derive inequalities that relate distances of the consecutive iterates to the optimal solution and finally applying Chung's lemmas from the stochastic approximation literature to these inequalities to determine their asymptotic behavior. In addition, we construct examples to show tightness of our rate results.

preprint2022arXiv

Why Random Reshuffling Beats Stochastic Gradient Descent

We analyze the convergence rate of the random reshuffling (RR) method, which is a randomized first-order incremental algorithm for minimizing a finite sum of convex component functions. RR proceeds in cycles, picking a uniformly random order (permutation) and processing the component functions one at a time according to this order, i.e., at each cycle, each component function is sampled without replacement from the collection. Though RR has been numerically observed to outperform its with-replacement counterpart stochastic gradient descent (SGD), characterization of its convergence rate has been a long standing open question. In this paper, we answer this question by showing that when the component functions are quadratics or smooth and the sum function is strongly convex, RR with iterate averaging and a diminishing stepsize $α_k=Θ(1/k^s)$ for $s\in (1/2,1)$ converges at rate $Θ(1/k^{2s})$ with probability one in the suboptimality of the objective value, thus improving upon the $Ω(1/k)$ rate of SGD. Our analysis draws on the theory of Polyak-Ruppert averaging and relies on decoupling the dependent cycle gradient error into an independent term over cycles and another term dominated by $α_k^2$. This allows us to apply law of large numbers to an appropriately weighted version of the cycle gradient errors, where the weights depend on the stepsize. We also provide high probability convergence rate estimates that shows decay rate of different terms and allows us to propose a modification of RR with convergence rate ${\cal O}(\frac{1}{k^2})$.