Researcher profile

Mykel Kochenderfer

Mykel Kochenderfer contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
12 works
0 followers
11 topics
4 close collaborators

Published work

12 published items

preprint | 2026 | arXiv

SecureForge: Finding and Preventing Vulnerabilities in LLM-Generated Code via Prompt Optimization

LLM coding agents now generate code at an unprecedented scale, yet LLM-generated code introduces cybersecurity vulnerabilities into codebases without human involvement. Even when frontier models are explicitly asked to write secure production code with relevant weaknesses to avoid in context, we find that they still produce verifiable vulnerabilities on average 23% of the time across a corpus of 250 benign coding prompts. We introduce SecureForge, an automated pipeline that both audits security risks of frontier models and produces auditing-informed secure system prompts that reduce output security vulnerabilities while maintaining unit test performance. SecureForge first identifies benign prompts that produce statically detectable vulnerabilities, and then amplifies them into a large synthetic prompt corpus of diverse scenarios using a Markovian sampling technique to jointly maintain error rates and prompt diversity. This corpus is then used to iteratively optimize the system prompts to reduce output security vulnerabilities. On frontier models, SecureForge yields a statistically significant Pareto improvement in both unit test success and output security, with output vulnerabilities reduced by up to 48%. The resulting system prompts transfer zero-shot to in-the-wild coding agent prompts, without any exposure to real user prompt distributions during optimization.

preprint | 2022 | arXiv

A Light-Weight Multi-Objective Asynchronous Hyper-Parameter Optimizer

We describe a light-weight yet performant system for hyper-parameter optimization that approximately minimizes an overall scalar cost function that is obtained by combining multiple performance objectives using a target-priority-limit scalarizer. It also supports a trade-off mode, where the goal is to find an appropriate trade-off among objectives by interacting with the user. We focus on the common scenario where there are on the order of tens of hyper-parameters, each with various attributes such as a range of continuous values, or a finite list of values, and whether it should be treated on a linear or logarithmic scale. The system supports multiple asynchronous simulations and is robust to simulation stragglers and failures.

preprint | 2022 | arXiv

Coordinated Multi-Agent Pathfinding for Drones and Trucks over Road Networks

We address the problem of routing a team of drones and trucks over large-scale urban road networks. To conserve their limited flight energy, drones can use trucks as temporary modes of transit en route to their own destinations. Such coordination can yield significant savings in total vehicle distance traveled, i.e., truck travel distance and drone flight distance, compared to operating drones and trucks independently. But it comes at the potentially prohibitive computational cost of deciding which trucks and drones should coordinate and when and where it is most beneficial to do so. We tackle this fundamental trade-off by decoupling our overall intractable problem into tractable sub-problems that we solve stage-wise. The first stage solves only for trucks, by computing paths that make them more likely to be useful transit options for drones. The second stage solves only for drones, by routing them over a composite of the road network and the transit network defined by truck paths from the first stage. We design a comprehensive algorithmic framework that frames each stage as a multi-agent path-finding problem and implement two distinct methods for solving them. We evaluate our approach on extensive simulations with up to $100$ agents on the real-world Manhattan road network containing nearly $4500$ vertices and $10000$ edges. Our framework saves more than $50\%$ of vehicle distance traveled compared to independently solving for trucks and drones, and computes solutions for all settings within $5$ minutes on commodity hardware.

preprint | 2022 | arXiv

Disentangling Epistemic and Aleatoric Uncertainty in Reinforcement Learning

Characterizing aleatoric and epistemic uncertainty on the predicted rewards can help in building reliable reinforcement learning (RL) systems. Aleatoric uncertainty results from the irreducible environment stochasticity leading to inherently risky states and actions. Epistemic uncertainty results from the limited information accumulated during learning to make informed decisions. Characterizing aleatoric and epistemic uncertainty can be used to speed up learning in a training environment, improve generalization to similar testing environments, and flag unfamiliar behavior in anomalous testing environments. In this work, we introduce a framework for disentangling aleatoric and epistemic uncertainty in RL. (1) We first define four desiderata that capture the desired behavior for aleatoric and epistemic uncertainty estimation in RL at both training and testing time. (2) We then present four RL models inspired by supervised learning (i.e. Monte Carlo dropout, ensemble, deep kernel learning models, and evidential networks) to instantiate aleatoric and epistemic uncertainty. Finally, (3) we propose a practical evaluation method to evaluate uncertainty estimation in model-free RL based on detection of out-of-distribution environments and generalization to perturbed environments. We present theoretical and experimental evidence to validate that carefully equipping model-free RL agents with supervised learning uncertainty methods can fulfill our desiderata.
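As a rough illustration of the ensemble option mentioned in this abstract, the sketch below splits predictive variance with the law of total variance: the average of the members' predicted variances is read as aleatoric uncertainty, and the spread of the members' predicted means as epistemic uncertainty. This is a common decomposition for ensembles, not necessarily the one the paper uses; the function name `decompose_uncertainty` and its inputs are assumptions made for this example.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(member_means, member_variances):
    """Split ensemble predictive variance into (aleatoric, epistemic).

    Hypothetical helper: each ensemble member is assumed to predict a mean
    and a variance for the reward of one state-action pair.
    """
    if not member_means or len(member_means) != len(member_variances):
        raise ValueError("need one mean and one variance per ensemble member")
    # Aleatoric: noise each member attributes to the environment, averaged.
    aleatoric = fmean(member_variances)
    # Epistemic: disagreement between members about the mean prediction.
    epistemic = pvariance(member_means)
    return aleatoric, epistemic


if __name__ == "__main__":
    aleatoric, epistemic = decompose_uncertainty([1.0, 3.0], [0.5, 1.5])
    print(f"aleatoric={aleatoric:.2f} epistemic={epistemic:.2f}")
```

Under this reading, members that agree on the mean give zero epistemic uncertainty even when the environment itself is noisy.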

preprint | 2022 | arXiv

FIG-OP: Exploring Large-Scale Unknown Environments on a Fixed Time Budget

We present a method for autonomous exploration of large-scale unknown environments under mission time constraints. We start by proposing the Frontloaded Information Gain Orienteering Problem (FIG-OP) -- a generalization of the traditional orienteering problem where the assumption of a reliable environmental model no longer holds. The FIG-OP addresses model uncertainty by frontloading expected information gain through the addition of a greedy incentive, effectively expediting the moment in which new area is uncovered. In order to reason across multi-kilometre environments, we solve FIG-OP over an information-efficient world representation, constructed through the aggregation of information from a topological and metric map. Our method was extensively tested and field-hardened across various complex environments, ranging from subway systems to mines. In comparative simulations, we observe that the FIG-OP solution exhibits improved coverage efficiency over solutions generated by greedy and traditional orienteering-based approaches (i.e. severe and minimal model uncertainty assumptions, respectively).

preprint | 2022 | arXiv

Improving Automated Driving through POMDP Planning with Human Internal States

This work examines the hypothesis that partially observable Markov decision process (POMDP) planning with human driver internal states can significantly improve both safety and efficiency in autonomous freeway driving. We evaluate this hypothesis in a simulated scenario where an autonomous car must safely perform three lane changes in rapid succession. Approximate POMDP solutions are obtained through the partially observable Monte Carlo planning with observation widening (POMCPOW) algorithm. This approach outperforms over-confident and conservative MDP baselines and matches or outperforms QMDP. Relative to the MDP baselines, POMCPOW typically cuts the rate of unsafe situations in half or increases the success rate by 50%.

preprint | 2022 | arXiv

Portfolio Construction as Linearly Constrained Separable Optimization

Mean-variance portfolio optimization problems often involve separable nonconvex terms, including penalties on capital gains, integer share constraints, and minimum position and trade sizes. We propose a heuristic algorithm for such problems based on the alternating direction method of multipliers (ADMM). This method allows for solve times in tens to hundreds of milliseconds with around 1000 securities and 100 risk factors. We also obtain a bound on the achievable performance. Our heuristic and bound are both derived from similar results for other optimization problems with a separable objective and affine equality constraints. We discuss a concrete implementation in the case where the separable terms in the objective are piecewise quadratic, and we empirically demonstrate its effectiveness for tax-aware portfolio construction.

preprint | 2022 | arXiv

Strategic Asset Allocation with Illiquid Alternatives

We address the problem of strategic asset allocation (SAA) with portfolios that include illiquid alternative asset classes. The main challenge in portfolio construction with illiquid asset classes is that we do not have direct control over our positions, as we do in liquid asset classes. Instead, we can only make commitments; the position builds up over time as capital calls come in, and reduces over time as distributions occur, neither of which the investor has direct control over. The effect on positions of our commitments is subject to a delay, typically of a few years, and is also unknown or stochastic. A further challenge is the requirement that we can meet the capital calls, with very high probability, with our liquid assets. We formulate the illiquid dynamics as a random linear system, and propose a convex optimization based model predictive control (MPC) policy for allocating liquid assets and making new illiquid commitments in each period. Despite the challenges of time delay and uncertainty, we show that this policy attains performance surprisingly close to a fictional setting where we pretend the illiquid asset classes are completely liquid, and we can arbitrarily and immediately adjust our positions. In this paper we focus on the growth problem, with no external liabilities or income, but the method is readily extended to handle external liabilities and income.

preprint | 2021 | arXiv

Hierarchical Planning for Resource Allocation in Emergency Response Systems

A classical problem in city-scale cyber-physical systems (CPS) is resource allocation under uncertainty. Typically, such problems are modeled as Markov (or semi-Markov) decision processes. While online, offline, and decentralized approaches have been applied to such problems, they have difficulty scaling to large decision problems. We present a general approach to hierarchical planning that leverages structure in city-level CPS problems for resource allocation under uncertainty. We use emergency response as a case study and show how a large resource allocation problem can be split into smaller problems. We then create a principled framework for solving the smaller problems and tackling the interaction between them. Finally, we use real-world data from Nashville, Tennessee, a major metropolitan area in the United States, to validate our approach. Our experiments show that the proposed approach outperforms state-of-the-art approaches used in the field of emergency response.

preprint | 2020 | arXiv

Learning Near Optimal Policies with Low Inherent Bellman Error

We study the exploration problem with approximate linear action-value functions in episodic reinforcement learning under the notion of low inherent Bellman error, a condition normally employed to show convergence of approximate value iteration. First, we relate this condition to other common frameworks and show that it is strictly more general than the low rank (or linear) MDP assumption of prior work. Second, we provide an algorithm with a high probability regret bound $\widetilde O(\sum_{t=1}^H d_t \sqrt{K} + \sum_{t=1}^H \sqrt{d_t} \IBE K)$ where $H$ is the horizon, $K$ is the number of episodes, $\IBE$ is the value of the inherent Bellman error and $d_t$ is the feature dimension at timestep $t$. In addition, we show that the result is unimprovable beyond constants and logs by showing a matching lower bound. This has two important consequences: 1) it shows that exploration is possible using only batch assumptions with an algorithm that achieves the optimal statistical rate for the setting we consider, which is more general than prior work on low-rank MDPs; 2) the lack of closedness (measured by the inherent Bellman error) is only amplified by $\sqrt{d_t}$ despite working in the online setting. Finally, the algorithm reduces to the celebrated LinUCB when $H=1$ but with a different choice of the exploration parameter that allows handling misspecified contextual linear bandits. While computational tractability questions remain open for the MDP setting, this enriches the class of MDPs with a linear representation for the action-value function where statistically efficient reinforcement learning is possible.

preprint | 2020 | arXiv

On Algorithmic Decision Procedures in Emergency Response Systems in Smart and Connected Communities

Emergency Response Management (ERM) is a critical problem faced by communities across the globe. Despite this, it is common for ERM systems to follow myopic decision policies in the real world. Principled approaches to aid ERM decision-making under uncertainty have been explored but have failed to be accepted into real systems. We identify a key issue impeding their adoption: algorithmic approaches to emergency response focus on reactive, post-incident dispatching actions, i.e. optimally dispatching a responder after incidents occur. However, the critical nature of emergency response dictates that when an incident occurs, first responders always dispatch the closest available responder to the incident. We argue that the crucial period of planning for ERM systems is not post-incident, but between incidents. This is not a trivial planning problem: a major challenge with dynamically balancing the spatial distribution of responders is the complexity of the problem. An orthogonal problem in ERM systems is planning under limited communication, which is particularly important in disaster scenarios that affect communication networks. We address both problems by proposing two partially decentralized multi-agent planning algorithms that utilize heuristics and exploit the structure of the dispatch problem. We evaluate our proposed approach using real-world data, and find that in several contexts, dynamically re-balancing the spatial distribution of emergency responders reduces both the average response time and its variance.

preprint | 2020 | arXiv

Online Parameter Estimation for Human Driver Behavior Prediction

Driver models are invaluable for planning in autonomous vehicles as well as validating their safety in simulation. Highly parameterized black-box driver models are very expressive, and can capture nuanced behavior. However, they usually lack interpretability and sometimes exhibit unrealistic, even dangerous, behavior. Rule-based models are interpretable, and can be designed to guarantee "safe" behavior, but are less expressive due to their low number of parameters. In this article, we show that online parameter estimation applied to the Intelligent Driver Model captures nuanced individual driving behavior while providing collision-free trajectories. We solve the online parameter estimation problem using particle filtering, and benchmark performance against rule-based and black-box driver models on two real-world driving data sets. We evaluate the closeness of our driver model to ground-truth demonstration data and also assess the safety of the resulting emergent driving behavior.
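For readers unfamiliar with the Intelligent Driver Model named in this abstract, here is a minimal sketch of its standard car-following acceleration rule. The parameter defaults (`v0`, `T`, `a_max`, `b`, `s0`, `delta`) are illustrative assumptions, not values from the paper, and the particle filter that the authors use to estimate such parameters online is not shown.

```python
import math


def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, a_max=1.0, b=2.0,
                     s0=2.0, delta=4.0):
    """Standard IDM acceleration for a following vehicle (SI units).

    v     -- own speed
    gap   -- bumper-to-bumper distance to the lead vehicle (must be > 0)
    dv    -- approach rate, own speed minus lead speed
    v0    -- desired speed, T -- desired time headway,
    a_max -- maximum acceleration, b -- comfortable deceleration,
    s0    -- minimum standstill gap, delta -- acceleration exponent
    All defaults are illustrative assumptions.
    """
    if gap <= 0:
        raise ValueError("gap must be positive")
    # Desired dynamic gap; the interaction term is clipped at zero.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    # Free-road term minus interaction term.
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


if __name__ == "__main__":
    # Closing quickly on a nearby lead vehicle produces strong braking.
    print(idm_acceleration(v=20.0, gap=10.0, dv=10.0))
```

Because the model has only a handful of interpretable parameters, an online estimator can adapt them per driver, which is the setting the abstract describes.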