Researcher profile

Phalguni Nanda

Phalguni Nanda contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
5topics
3close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2026arXiv

Natural Policy Gradient as Doubly Smoothed Policy Iteration: A Bellman-Operator Framework

In this work, we show that natural policy gradient, a core algorithm in reinforcement learning, admits an exact formulation as a smoothed and averaged form of policy iteration. Specifically, we introduce doubly smoothed policy iteration (DSPI), a Bellman-operator framework in which each policy is obtained by applying a regularized greedy step to a weighted average of past $Q$-functions. DSPI includes policy iteration, dual-averaged policy iteration, natural policy gradient, and more general policy dual averaging methods as special cases. Using only monotonicity and contraction of smoothed Bellman operators, we prove distribution-free global geometric convergence of DSPI. Consequently, standard natural policy gradient and policy dual averaging achieve an iteration complexity of $\mathcal{O}((1-γ)^{-1}\log((1-γ)^{-1}ε^{-1}))$ for computing an $ε$-optimal policy, without modifying the MDP, adding regularization beyond the mirror map inherent in the update, or using adaptive, trajectory-dependent stepsizes. For the unregularized greedy case, corresponding to dual-averaged policy iteration, we also prove finite termination. The same Bellman-operator framework further extends to discounted MDPs with linear function approximation and stochastic shortest path problems.

preprint2020arXiv

Optimal Replacement Policy under Cumulative Damage Model and Strength Degradation with Applications

In many real-life scenarios, system failure depends on dynamic stress-strength interference, where strength degrades and stress accumulates concurrently over time. In this paper, we consider the problem of finding an optimal replacement strategy that balances the cost of replacement with the cost of failure and results in a minimum expected cost per unit time under cumulative damage model with strength degradation. The existing recommendations are applicable only under restricted distributional assumptions and/or with fixed strength. As theoretical evaluation of the expected cost per unit time turns out to be very complicated, a simulation-based algorithm is proposed to evaluate the expected cost rate and find the optimal replacement strategy. The proposed method is easy to implement having wider domain of application. For illustration, the proposed method is applied to real case studies on mailbox and cell-phone battery experiments.