Source author record

Phalguni Nanda

Phalguni Nanda appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Applications Computation Machine Learning math.OC Methodology

Catalog footprint

What is connected

2works

5topics

3close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Natural Policy Gradient as Doubly Smoothed Policy Iteration: A Bellman-Operator Framework

In this work, we show that natural policy gradient, a core algorithm in reinforcement learning, admits an exact formulation as a smoothed and averaged form of policy iteration. Specifically, we introduce doubly smoothed policy iteration (DSPI), a Bellman-operator framework in which each policy is obtained by applying a regularized greedy step to a weighted average of past $Q$-functions. DSPI includes policy iteration, dual-averaged policy iteration, natural policy gradient, and more general policy dual averaging methods as special cases. Using only monotonicity and contraction of smoothed Bellman operators, we prove distribution-free global geometric convergence of DSPI. Consequently, standard natural policy gradient and policy dual averaging achieve an iteration complexity of $\mathcal{O}((1-γ)^{-1}\log((1-γ)^{-1}ε^{-1}))$ for computing an $ε$-optimal policy, without modifying the MDP, adding regularization beyond the mirror map inherent in the update, or using adaptive, trajectory-dependent stepsizes. For the unregularized greedy case, corresponding to dual-averaged policy iteration, we also prove finite termination. The same Bellman-operator framework further extends to discounted MDPs with linear function approximation and stochastic shortest path problems.

preprint2020arXiv

Optimal Replacement Policy under Cumulative Damage Model and Strength Degradation with Applications

In many real-life scenarios, system failure depends on dynamic stress-strength interference, where strength degrades and stress accumulates concurrently over time. In this paper, we consider the problem of finding an optimal replacement strategy that balances the cost of replacement with the cost of failure and results in a minimum expected cost per unit time under cumulative damage model with strength degradation. The existing recommendations are applicable only under restricted distributional assumptions and/or with fixed strength. As theoretical evaluation of the expected cost per unit time turns out to be very complicated, a simulation-based algorithm is proposed to evaluate the expected cost rate and find the optimal replacement strategy. The proposed method is easy to implement having wider domain of application. For illustration, the proposed method is applied to real case studies on mailbox and cell-phone battery experiments.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint