Researcher profile

Daniel Russo

Daniel Russo contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
13works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

13 published item(s)

preprint2026arXiv

Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients

We study reinforcement learning in hybrid discrete-continuous action spaces, such as settings where the discrete component selects a regime (or index) and the continuous component optimizes within it -- a structure common in robotics, control, and operations problems. Standard model-free policy gradient methods rely on score-function (SF) estimators and suffer from severe credit-assignment issues in high-dimensional settings, leading to poor gradient quality. On the other hand, differentiable simulation largely sidesteps these issues by backpropagating through a simulator, but the presence of discrete actions or non-smooth dynamics yields biased or uninformative gradients. To address this, we propose Hybrid Policy Optimization (HPO), which backpropagates through the simulator wherever smoothness permits, using a mixed gradient estimator that combines pathwise and SF gradients while maintaining unbiasedness. We also show how problems with action discontinuities can be reformulated in hybrid form, further broadening its applicability. Empirically, HPO substantially outperforms PPO on inventory control and switched linear-quadratic regulator problems, with performance gaps increasing as the continuous action dimension grows. Finally, we characterize the structure of the mixed gradient, showing that its cross term -- which captures how continuous actions influence future discrete decisions -- becomes negligible near a discrete best response, thereby enabling approximate decentralized updates of the continuous and discrete components and reducing variance near optimality. All resources are available at github.com/MatiasAlvo/hybrid-rl.

preprint2024arXiv

Navigating the Complexity of Generative AI Adoption in Software Engineering

In this paper, the adoption patterns of Generative Artificial Intelligence (AI) tools within software engineering are investigated. Influencing factors at the individual, technological, and societal levels are analyzed using a mixed-methods approach for an extensive comprehension of AI adoption. An initial structured interview was conducted with 100 software engineers, employing the Technology Acceptance Model (TAM), the Diffusion of Innovations theory (DOI), and the Social Cognitive Theory (SCT) as guiding theories. A theoretical model named the Human-AI Collaboration and Adaptation Framework (HACAF) was deduced using the Gioia Methodology, characterizing AI adoption in software engineering. This model's validity was subsequently tested through Partial Least Squares - Structural Equation Modeling (PLS-SEM), using data collected from 183 software professionals. The results indicate that the adoption of AI tools in these early integration stages is primarily driven by their compatibility with existing development workflows. This finding counters the traditional theories of technology acceptance. Contrary to expectations, the influence of perceived usefulness, social aspects, and personal innovativeness on adoption appeared to be less significant. This paper yields significant insights for the design of future AI tools and supplies a structure for devising effective strategies for organizational implementation.

preprint2023arXiv

Satisfaction and Performance of Software Developers during Enforced Work from Home in the COVID-19 Pandemic

Following the onset of the COVID-19 pandemic and subsequent lockdowns, the daily lives of software engineers were heavily disrupted as they were abruptly forced to work remotely from home. To better understand and contrast typical working days in this new reality with work in pre-pandemic times, we conducted one exploratory (N = 192) and one confirmatory study (N = 290) with software engineers recruited remotely. Specifically, we build on self-determination theory to evaluate whether and how specific activities are associated with software engineers' satisfaction and productivity. To explore the subject domain, we first ran a two-wave longitudinal study. We found that the time software engineers spent on specific activities (e.g., coding, bugfixing, helping others) while working from home was similar to pre-pandemic times. Also, the amount of time developers spent on each activity was unrelated to their general well-being, perceived productivity, and other variables such as basic needs. Our confirmatory study found that activity-specific variables (e.g., how much autonomy software engineers had during coding) do predict activity satisfaction and productivity but not by activity-independent variables such as general resilience or a good work-life balance. Interestingly, we found that satisfaction and autonomy were significantly higher when software engineers were helping others and lower when they were bugfixing. Finally, we discuss implications for software engineers, management, and researchers. In particular, active company policies to support developers' need for autonomy, relatedness, and competence appear particularly effective in a WFH context.

preprint2022arXiv

A Theory of Scrum Team Effectiveness

Scrum teams are at the heart of the Scrum framework. Nevertheless, an integrated and systemic theory that can explain what makes some Scrum teams more effective than others is still missing. To address this gap, we performed a seven-year-long mixed-method investigation composed of two main phases. First, we induced a theoretical model from thirteen exploratory field studies. Our model proposes that the effectiveness of Scrum teams depends on five high-level factors - responsiveness, stakeholder concern, continuous improvement, team autonomy, and management support - and thirteen lower-level factors. In the second phase of our study, we validated our model with a Covariance-Based Structural Equation Modeling (SEM) analysis using data from about 5,000 developers and 2,000 Scrum teams that we gathered with a custom-built survey. Results suggest a very good fit of the empirical data in our theoretical model (CFI = 0.959, RMSEA = 0.038, SRMR = 0.035). Accordingly, this research allowed us to (1) propose and validate a generalizable theory for effective Scrum teams and (2) formulate clear recommendations for how organizations can better support Scrum teams.

preprint2022arXiv

Approximation Benefits of Policy Gradient Methods with Aggregated States

Folklore suggests that policy gradient can be more robust to misspecification than its relative, approximate policy iteration. This paper studies the case of state-aggregated representations, where the state space is partitioned and either the policy or value function approximation is held constant over partitions. This paper shows a policy gradient method converges to a policy whose regret per-period is bounded by $ε$, the largest difference between two elements of the state-action value function belonging to a common partition. With the same representation, both approximate policy iteration and approximate value iteration can produce policies whose per-period regret scales as $ε/(1-γ)$, where $γ$ is a discount factor. Faced with inherent approximation error, methods that locally optimize the true decision-objective can be far more robust.

preprint2022arXiv

Global Optimality Guarantees For Policy Gradient Methods

Policy gradients methods apply to complex, poorly understood, control problems by performing stochastic gradient descent over a parameterized class of polices. Unfortunately, even for simple control problems solvable by standard dynamic programming techniques, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to a stationary point. This work identifies structural properties -- shared by several classic control problems -- that ensure the policy gradient objective function has no suboptimal stationary points despite being non-convex. When these conditions are strengthened, this objective satisfies a Polyak-lojasiewicz (gradient dominance) condition that yields convergence rates. We also provide bounds on the optimality gap of any stationary point when some of these conditions are relaxed.

preprint2022arXiv

Recruiting Software Engineers on Prolific

Recruiting participants for software engineering research has been a primary concern of the human factors community. This is particularly true for quantitative investigations that require a minimum sample size not to be statistically underpowered. Traditional data collection techniques, such as mailing lists, are highly doubtful due to self-selection biases. The introduction of crowdsourcing platforms allows researchers to select informants with the exact requirements foreseen by the study design, gather data in a concise time frame, compensate their work with fair hourly pay, and most importantly, have a high degree of control over the entire data collection process. This experience report discusses our experience conducting sample studies using Prolific, an academic crowdsourcing platform. Topics discussed are the type of studies, selection processes, and power computation.

preprint2021arXiv

Empirical Standards for Software Engineering Research

Empirical Standards are natural-language models of a scientific community's expectations for a specific kind of study (e.g. a questionnaire survey). The ACM SIGSOFT Paper and Peer Review Quality Initiative generated empirical standards for research methods commonly used in software engineering. These living documents, which should be continuously revised to reflect evolving consensus around research best practices, will improve research quality and make peer review more effective, reliable, transparent and fair.

preprint2021arXiv

Learning to Stop with Surprisingly Few Samples

We consider a discounted infinite horizon optimal stopping problem. If the underlying distribution is known a priori, the solution of this problem is obtained via dynamic programming (DP) and is given by a well known threshold rule. When information on this distribution is lacking, a natural (though naive) approach is "explore-then-exploit," whereby the unknown distribution or its parameters are estimated over an initial exploration phase, and this estimate is then used in the DP to determine actions over the residual exploitation phase. We show: (i) with proper tuning, this approach leads to performance comparable to the full information DP solution; and (ii) despite common wisdom on the sensitivity of such "plug in" approaches in DP due to propagation of estimation errors, a surprisingly "short" (logarithmic in the horizon) exploration horizon suffices to obtain said performance. In cases where the underlying distribution is heavy-tailed, these observations are even more pronounced: a ${\it single \, sample}$ exploration phase suffices.

preprint2021arXiv

The Daily Life of Software Engineers during the COVID-19 Pandemic

Following the onset of the COVID-19 pandemic and subsequent lockdowns, software engineers' daily life was disrupted and abruptly forced into remote working from home. This change deeply impacted typical working routines, affecting both well-being and productivity. Moreover, this pandemic will have long-lasting effects in the software industry, with several tech companies allowing their employees to work from home indefinitely if they wish to do so. Therefore, it is crucial to analyze and understand how a typical working day looks like when working from home and how individual activities affect software developers' well-being and productivity. We performed a two-wave longitudinal study involving almost 200 globally carefully selected software professionals, inferring daily activities with perceived well-being, productivity, and other relevant psychological and social variables. Results suggest that the time software engineers spent doing specific activities from home was similar when working in the office. However, we also found some significant mean differences. The amount of time developers spent on each activity was unrelated to their well-being, perceived productivity, and other variables. We conclude that working remotely is not per se a challenge for organizations or developers.

preprint2020arXiv

A Tutorial on Thompson Sampling

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance. The algorithm addresses a broad range of problems in a computationally efficient manner and is therefore enjoying wide use. This tutorial covers the algorithm and its application, illustrating concepts through a range of examples, including Bernoulli bandit problems, shortest path problems, product recommendation, assortment, active learning with neural networks, and reinforcement learning in Markov decision processes. Most of these problems involve complex information structures, where information revealed by taking an action informs beliefs about other actions. We will also discuss when and why Thompson sampling is or is not effective and relations to alternative algorithms.

preprint2020arXiv

Satisficing in Time-Sensitive Bandit Learning

Much of the recent literature on bandit learning focuses on algorithms that aim to converge on an optimal action. One shortcoming is that this orientation does not account for time sensitivity, which can play a crucial role when learning an optimal action requires much more information than near-optimal ones. Indeed, popular approaches such as upper-confidence-bound methods and Thompson sampling can fare poorly in such situations. We consider instead learning a satisficing action, which is near-optimal while requiring less information, and propose satisficing Thompson sampling, an algorithm that serves this purpose. We establish a general bound on expected discounted regret and study the application of satisficing Thompson sampling to linear and infinite-armed bandits, demonstrating arbitrarily large benefits over Thompson sampling. We also discuss the relation between the notion of satisficing and the theory of rate distortion, which offers guidance on the selection of satisficing actions.

preprint2019arXiv

SQuAP-Ont: an Ontology of Software Quality Relational Factors from Financial Systems

Quality, architecture, and process are considered the keystones of software engineering. ISO defines them in three separate standards. However, their interaction has been scarcely studied, so far. The SQuAP model (Software Quality, Architecture, Process) describes twenty-eight main factors that impact on software quality in banking systems, and each factor is described as a relation among some characteristics from the three ISO standards. Hence, SQuAP makes such relations emerge rigorously, although informally. In this paper, we present SQuAP-Ont, an OWL ontology designed by following a well-established methodology based on the re-use of Ontology Design Patterns (i.e. ODPs). SQuAP-Ont formalises the relations emerging from SQuAP to represent and reason via Linked Data about software engineering in a three-dimensional model consisting of quality, architecture, and process ISO characteristics.