
Programming by Rewards

We formalize and study "programming by rewards" (PBR), a new approach for specifying and synthesizing subroutines that optimize a quantitative metric such as performance, resource utilization, or correctness over a benchmark. A PBR specification consists of (1) input features $x$, and (2) a reward function $r$, modeled as a black-box component (which we can only run), that assigns a reward to each execution. The goal of the synthesizer is to produce a "decision function" $f$ that transforms the features into a decision value for the black-box component so as to maximize the expected reward $E[r \circ f(x)]$ for executing decisions $f(x)$ over various values of $x$. We consider a space of decision functions in a DSL of loop-free if-then-else programs, which can branch on linear functions of the input features in a tree structure and compute a linear function of the inputs at the leaves of the tree. We find that this DSL captures decision functions that programmers write manually in practice. Our technical contribution is the use of continuous-optimization techniques to synthesize such decision functions as if-then-else programs. We also sh
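
To make the DSL described in the abstract concrete, the following is a minimal sketch of a loop-free if-then-else program that branches on linear functions of the features and returns a linear function at each leaf, together with a sample-based estimate of the expected reward. All names here (Leaf, Branch, decide, expected_reward, toy_reward) are hypothetical and chosen for illustration. The toy reward is an assumed stand-in for the black-box component, and it is given access to the features only so the example is self-contained. The sketch only evaluates a given program; it does not implement the paper's synthesis procedure.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence, Union


@dataclass
class Leaf:
    """Leaf of the tree: returns the linear function w . x + b as the decision."""
    weights: Sequence[float]
    bias: float


@dataclass
class Branch:
    """Internal node: tests w . x + b > 0 and picks a subtree accordingly."""
    weights: Sequence[float]
    bias: float
    then_branch: "Node"
    else_branch: "Node"


Node = Union[Leaf, Branch]


def linear(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Evaluate the linear function w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def decide(node: Node, x: Sequence[float]) -> float:
    """Run the if-then-else program on features x and return the decision."""
    while isinstance(node, Branch):
        taken = linear(node.weights, node.bias, x) > 0
        node = node.then_branch if taken else node.else_branch
    return linear(node.weights, node.bias, x)


def expected_reward(
    program: Node,
    reward: Callable[[Sequence[float], float], float],
    samples: Sequence[Sequence[float]],
) -> float:
    """Estimate the expected reward by averaging over sampled feature vectors."""
    return sum(reward(x, decide(program, x)) for x in samples) / len(samples)


def toy_reward(x: Sequence[float], decision: float) -> float:
    """Assumed stand-in for the black box: it can be run, not inspected.

    The hidden best decision is 2 * x[0] when x[0] > 1 and 0 otherwise.
    """
    best = 2.0 * x[0] if x[0] > 1.0 else 0.0
    return -((decision - best) ** 2)


# Program: if x0 - 1 > 0 then 2 * x0 else 0
good = Branch([1.0], -1.0, Leaf([2.0], 0.0), Leaf([0.0], 0.0))
# Baseline: always decide 0
baseline = Leaf([0.0], 0.0)

rng = random.Random(0)
samples = [[rng.uniform(0.0, 2.0)] for _ in range(200)]
good_score = expected_reward(good, toy_reward, samples)
baseline_score = expected_reward(baseline, toy_reward, samples)
```

In this toy setting, a synthesizer would search over the weights and biases of such a tree to raise the estimated reward, using only calls to the reward function.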

Authors: Nagarajan Natarajan, Ajaykrishna Karthikeyan, Prateek Jain, Ivan Radicek, Sriram Rajamani, Sumit Gulwani, Johannes Gehrke
Topics: Machine Learning, Artificial Intelligence, Software Engineering, Programming Languages


preprint / 2020
