Researcher profile

Miroslav Pajic

Miroslav Pajic contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
14 works
0 followers
11 topics
4 close collaborators

Published work

14 published items

preprint | 2026 | arXiv

Adaptive Policy Selection and Fine-Tuning under Interaction Budgets for Offline-to-Online Reinforcement Learning

In offline-to-online reinforcement learning (O2O-RL), policies are first safely trained offline using previously collected datasets and then further fine-tuned for tasks via limited online interactions. In a typical O2O-RL pipeline, candidate policies trained with offline RL are evaluated via either off-policy evaluation (OPE) or online evaluation (OE). The policy with the highest estimated value is then deployed and continually fine-tuned. However, this setup has two main issues. First, OPE can be unreliable, making it risky to deploy a policy based solely on those estimates, whereas OE may identify a viable policy only with substantial online interaction that could otherwise have been used for fine-tuning. Second, and more importantly, it is often not possible to determine a priori whether a pretrained policy will improve with post-deployment fine-tuning, especially in non-stationary environments. As a result, procedures committing to a single deployed policy are impractical in many real-world settings. Moreover, a naive remedy that exhaustively fine-tunes all candidates would violate interaction budget constraints and is likewise infeasible. In this paper, we propose a novel adaptive approach for policy selection and fine-tuning under online interaction budgets in O2O-RL. Following the standard pipeline, we first train a set of candidate policies with different offline RL algorithms and hyperparameters; we then perform OPE to obtain initial performance estimates. We next adaptively select and fine-tune the policies based on their predicted performance via an upper-confidence-bound approach, thereby making efficient use of online interactions. We demonstrate that our approach improves upon O2O-RL baselines on various benchmarks.
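The selection step described in this abstract can be illustrated with a generic upper-confidence-bound loop. This is only a minimal sketch under stated assumptions, not the paper's algorithm: the function name, the `rollout` callable (assumed to run one online episode and fine-tune the chosen candidate), the exploration constant `c`, and the choice to treat each OPE estimate as a single pseudo-observation are all hypothetical.

```python
import math


def ucb_select_and_finetune(ope_estimates, rollout, budget, c=1.0):
    """Spread a fixed online interaction budget over candidate policies.

    ope_estimates: initial value estimates, one per candidate (from OPE).
    rollout: hypothetical callable, index -> observed return of one online
             episode; the candidate is assumed to be fine-tuned on it.
    budget: total number of online episodes allowed.
    Returns the running value estimates and per-candidate counts.
    """
    n = len(ope_estimates)
    counts = [1] * n               # assumption: OPE counts as one observation
    means = list(ope_estimates)
    for t in range(1, budget + 1):
        # Score = current estimate plus an exploration bonus that shrinks
        # as a candidate accumulates online episodes.
        scores = [
            means[i] + c * math.sqrt(math.log(t + 1) / counts[i])
            for i in range(n)
        ]
        i = max(range(n), key=scores.__getitem__)
        ret = rollout(i)
        counts[i] += 1
        means[i] += (ret - means[i]) / counts[i]   # incremental mean update
    return means, counts
```

In this sketch a candidate that OPE overrates loses its lead after a few online episodes, so most of the budget goes to the candidate that actually performs best online.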

preprint | 2022 | arXiv

Gradient Importance Learning for Incomplete Observations

Though recent works have developed methods that can generate estimates (or imputations) of the missing entries in a dataset to facilitate downstream analysis, most depend on assumptions that may not align with real-world applications and could suffer from poor performance in subsequent tasks such as classification. This is particularly true if the data have large missingness rates or a small sample size. More importantly, the imputation error could be propagated into the prediction step that follows, which may constrain the capabilities of the prediction model. In this work, we introduce the gradient importance learning (GIL) method to train multilayer perceptrons (MLPs) and long short-term memories (LSTMs) to directly perform inference from inputs containing missing values without imputation. Specifically, we employ reinforcement learning (RL) to adjust the gradients used to train these models via back-propagation. This allows the model to exploit the underlying information behind missingness patterns. We test the approach on real-world time-series (i.e., MIMIC-III), tabular data obtained from an eye clinic, and a standard dataset (i.e., MNIST), where our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.

preprint | 2022 | arXiv

Learning Monotone Dynamics by Neural Networks

Feed-forward neural networks (FNNs) work as standard building blocks in applying artificial intelligence (AI) to the physical world. They allow learning the dynamics of unknown physical systems (e.g., biological and chemical) to predict their future behavior. However, they are likely to violate the physical constraints of those systems without proper treatment. This work focuses on imposing two important physical constraints: monotonicity (i.e., a partial order of system states is preserved over time) and stability (i.e., the system states converge over time) when using FNNs to learn physical dynamics. For monotonicity constraints, we propose to use nonnegative neural networks and batch normalization. For both monotonicity and stability constraints, we propose to learn the system dynamics and corresponding Lyapunov function simultaneously. As demonstrated by case studies, our methods can preserve the stability and monotonicity of FNNs and significantly reduce their prediction errors.
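The monotonicity idea can be sketched as follows: a network with nonnegative weights and a nondecreasing activation preserves the elementwise partial order of its inputs. The sketch below is an illustration under assumptions, not the paper's architecture: it enforces nonnegativity by taking absolute values of the weights, uses tanh as the activation, and omits batch normalization and the Lyapunov function entirely.

```python
import numpy as np


def monotone_mlp(x, weights, biases):
    """Forward pass of an MLP whose effective weights are nonnegative.

    With nonnegative weights and a nondecreasing activation (tanh here),
    x <= y elementwise implies f(x) <= f(y) elementwise.
    """
    h = np.asarray(x, dtype=float)
    for k, (W, b) in enumerate(zip(weights, biases)):
        h = np.abs(W) @ h + b          # abs() is one simple way to get W >= 0
        if k < len(weights) - 1:
            h = np.tanh(h)             # hidden layers only; output is linear
    return h


rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 3)), rng.normal(size=(3, 8))]
biases = [rng.normal(size=8), rng.normal(size=3)]
x = rng.normal(size=3)
y = x + rng.uniform(0.0, 1.0, size=3)  # y dominates x in the partial order
fx, fy = monotone_mlp(x, weights, biases), monotone_mlp(y, weights, biases)
print(bool(np.all(fx <= fy)))
```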

preprint | 2022 | arXiv

Learning-Based Vulnerability Analysis of Cyber-Physical Systems

This work focuses on the use of deep learning for vulnerability analysis of cyber-physical systems (CPS). Specifically, we consider a control architecture widely used in CPS (e.g., robotics), where the low-level control is based on, e.g., the extended Kalman filter (EKF) and an anomaly detector. To facilitate analyzing the impact that potential sensing attacks could have, our objective is to develop learning-enabled attack generators capable of designing stealthy attacks that maximally degrade system operation. We show how such a problem can be cast within a learning-based grey-box framework where parts of the runtime information are known to the attacker, and introduce two models based on feed-forward neural networks (FNNs); both models are trained offline, using a cost function that combines the attack effects on the estimation error and the residual signal used for anomaly detection, so that the trained models are capable of recursively generating such effective sensor attacks in real time. The effectiveness of the proposed methods is illustrated on several case studies.

preprint | 2022 | arXiv

Optimal Myopic Attacks on Nonlinear Estimation

Recent high-profile incidents have exposed security risks in control systems. Particularly important and safety-critical modules for security analysis are estimation and control (E&C). Prior works have analyzed the security of E&C for linear, time-invariant systems; however, there are few analyses of nonlinear systems despite their broad use. In an effort to facilitate identifying vulnerabilities in control systems, in this work we establish a class of optimal attacks on nonlinear E&C. Specifically, we define two attack objectives and illustrate that realizing the optimal attacks against the widely-adopted extended Kalman filter with industry-standard $\chi^2$ anomaly detection is equivalent to solving convex quadratically-constrained quadratic programs. Given an appropriate information model for the attacker (i.e., a specified amount of attacker knowledge), we provide practical relaxations on the optimal attacks to allow for their computation at runtime. We also show that the difference between the optimal and relaxed attacks is bounded. Finally, we illustrate the use of the introduced attack designs on a case study.
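For readers unfamiliar with the detector mentioned in this abstract, a $\chi^2$ anomaly test on a filter residual is commonly written as a quadratic form compared against a threshold. The sketch below shows that generic test only; the function name and arguments are hypothetical, and it is not taken from the paper. An attacker who wants to stay stealthy must keep this statistic below the threshold, which is the kind of quadratic constraint the abstract refers to.

```python
import numpy as np


def chi2_alarm(residual, S, threshold):
    """Return (statistic, alarm) for a chi-square residual test.

    residual: innovation, i.e. measurement minus predicted measurement.
    S: innovation covariance reported by the filter.
    threshold: detection threshold chosen by the designer (an assumption).
    """
    r = np.asarray(residual, dtype=float)
    g = float(r @ np.linalg.solve(S, r))   # r^T S^{-1} r without inverting S
    return g, g > threshold


# 5.99 is approximately the 95% quantile of chi-square with 2 degrees of freedom.
stat, alarm = chi2_alarm([3.0, 4.0], np.eye(2), threshold=5.99)
print(stat, alarm)
```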

preprint | 2022 | arXiv

Resiliency of Nonlinear Control Systems to Stealthy Sensor Attacks

In this work, we focus on analyzing the vulnerability of nonlinear dynamical control systems to stealthy sensor attacks. We start by defining the notion of stealthy attacks in the most general form by leveraging the Neyman-Pearson lemma; specifically, an attack is considered to be stealthy if it is stealthy from (i.e., undetected by) any intrusion detector -- i.e., the probability of detection is no better than a random guess. We then provide a sufficient condition under which a nonlinear control system is vulnerable to stealthy attacks, in terms of moving the system to an unsafe region due to the attacks. In particular, we show that if the closed-loop system is incrementally exponentially stable while the open-loop plant is incrementally unstable, then the system is vulnerable to stealthy yet impactful attacks on sensors. Finally, we illustrate our results on a case study.

preprint | 2021 | arXiv

Formal Verification of Stochastic Systems with ReLU Neural Network Controllers

In this work, we address the problem of formal safety verification for stochastic cyber-physical systems (CPS) equipped with ReLU neural network (NN) controllers. Our goal is to find the set of initial states from where, with a predetermined confidence, the system will not reach an unsafe configuration within a specified time horizon. Specifically, we consider discrete-time LTI systems with Gaussian noise, which we abstract by a suitable graph. Then, we formulate a Satisfiability Modulo Convex (SMC) problem to estimate upper bounds on the transition probabilities between nodes in the graph. Using this abstraction, we propose a method to compute tight bounds on the safety probabilities of nodes in this graph, despite possible over-approximations of the transition probabilities between these nodes. Additionally, using the proposed SMC formula, we devise a heuristic method to refine the abstraction of the system in order to further improve the estimated safety bounds. Finally, we corroborate the efficacy of the proposed method with simulation results considering a robot navigation example and comparison against a state-of-the-art verification scheme.

preprint | 2021 | arXiv

Probabilistic Conformance for Cyber-Physical Systems

In system analysis, conformance indicates that two systems simultaneously satisfy the same set of specifications of interest; thus, the results from analyzing one system automatically transfer to the other, or one system can safely replace the other in practice. In this work, we study the probabilistic conformance of cyber-physical systems (CPS). We propose a notion of (approximate) probabilistic conformance for sets of complex specifications expressed by the Signal Temporal Logic (STL). Based on a novel statistical test, we develop the first statistical verification methods for the probabilistic conformance of a wide class of CPS. Using this method, we verify the conformance of the startup time of the widely-used full and simplified model of Toyota powertrain systems, the settling time of model-predictive-control-based and neural-network-based automotive lane-keeping controllers, as well as the maximal voltage deviation of full and simplified power grid systems.

preprint | 2020 | arXiv

Attack-Resilient State Estimation with Intermittent Data Authentication

Network-based attacks on control systems may alter sensor data delivered to the controller, effectively causing degradation in control performance. As a result, having access to accurate state estimates, even in the presence of attacks on sensor measurements, is of critical importance. In this paper, we analyze the performance of resilient state estimators (RSEs) when any subset of sensors may be compromised by a stealthy attacker. Specifically, we consider systems with the well-known $\ell_0$-based RSE and two commonly used sound intrusion detectors (IDs). For linear time-invariant plants with bounded noise, we define the notion of perfect attackability (PA) when attacks may result in unbounded estimation errors while remaining undetected by the employed ID (i.e., stealthy). We derive necessary and sufficient PA conditions, showing that a system can be perfectly attackable even if the plant is stable. While PA can be prevented with the use of standard cryptographic mechanisms (e.g., message authentication) that ensure data integrity under network-based attacks, their continuous use imposes significant communication and computational overhead. Consequently, we also study the impact that even intermittent use of data authentication has on RSE performance guarantees in the presence of stealthy attacks. We show that if messages from some of the sensors are even intermittently authenticated, stealthy attacks could not result in unbounded state estimation errors.

preprint | 2020 | arXiv

Context-Aware Temporal Logic for Probabilistic Systems

In this paper, we introduce the context-aware probabilistic temporal logic (CAPTL) that provides an intuitive way to formalize system requirements by a set of PCTL objectives with a context-based priority structure. We formally present the syntax and semantics of CAPTL and propose a synthesis algorithm for CAPTL requirements. We also implement the algorithm based on the PRISM-games model checker. Finally, we demonstrate the usage of CAPTL on two case studies: a robotic task planning problem, and synthesizing an error-resilient scheduler for micro-electrode-dot-array digital microfluidic biochips.

preprint | 2020 | arXiv

Hyperproperties for Robotics: Planning via HyperLTL

There is a growing interest in formal methods-based robotic planning for temporal logic objectives. In this work, we extend the scope of existing synthesis methods to hyper-temporal logics. We are motivated by the fact that important planning objectives, such as optimality, robustness, and privacy, (perhaps implicitly) involve the interrelation between multiple paths. Such objectives are thus hyperproperties, and cannot be expressed with usual temporal logics like the linear temporal logic (LTL). We show that such hyperproperties can be expressed by HyperLTL, an extension of LTL to multiple paths. To handle the complexity of planning with HyperLTL specifications, we introduce a symbolic approach for synthesizing planning strategies on discrete transition systems. Our planning method is evaluated on several case studies.

preprint | 2020 | arXiv

Learning Expected Reward for Switched Linear Control Systems: A Non-Asymptotic View

In this work, we show the existence of an invariant ergodic measure for switched linear dynamical systems (SLDSs) under a norm-stability assumption of system dynamics in some unbounded subset of $\mathbb{R}^{n}$. Consequently, given a stationary Markov control policy, we derive non-asymptotic bounds for learning the expected reward (w.r.t. the invariant ergodic measure our closed-loop system mixes to) from time-averages using Birkhoff's Ergodic Theorem. The presented results provide a foundation for deriving non-asymptotic analysis for average reward-based optimal control of SLDSs. Finally, we illustrate the presented theoretical results in two case studies.
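The quantity being learned here, a time-average of rewards along one closed-loop trajectory, can be sketched in a few lines. Everything in this block is a hypothetical illustration rather than the paper's setup: the additive Gaussian noise model, the function name, and the `policy` and `reward` callables are assumptions, and the sketch says nothing about the non-asymptotic bounds themselves.

```python
import numpy as np


def time_average_reward(A_modes, policy, reward, x0, steps, noise_std=0.1, seed=0):
    """Estimate the expected reward by averaging along a single trajectory.

    A_modes: list of system matrices, one per switching mode.
    policy: stationary Markov policy, state -> mode index (hypothetical).
    reward: state -> float.
    noise_std: standard deviation of the assumed additive Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    total = 0.0
    for _ in range(steps):
        total += reward(x)
        mode = policy(x)                               # mode depends on state only
        x = A_modes[mode] @ x + noise_std * rng.normal(size=x.shape)
    return total / steps


# Two contracting modes, switched on the sign of the first coordinate.
modes = [0.5 * np.eye(2), np.array([[0.3, 0.2], [0.0, 0.4]])]
estimate = time_average_reward(
    modes, lambda x: int(x[0] > 0), lambda x: float(x @ x), [1.0, 0.0], steps=5000
)
print(estimate)
```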

preprint | 2020 | arXiv

Security Analysis for Distributed IoT-Based Industrial Automation

With ever-expanding computation and communication capabilities of modern embedded platforms, Internet of Things (IoT) technologies enable the development of Reconfigurable Manufacturing Systems, a new generation of highly modularized industrial equipment suitable for highly-customized manufacturing. Sequential control in these systems is largely based on discrete events, while their formal execution semantics is specified as Control Interpreted Petri Nets (CIPN). Despite industry-wide use of programming languages based on the CIPN formalism, formal verification of such control applications in the presence of adversarial activity is not supported. Consequently, in this paper we focus on security-aware modeling and verification challenges for CIPN-based sequential control applications. Specifically, we show how CIPN models of networked industrial IoT controllers can be transformed into Time Petri Net (TPN)-based models, and composed with plant and security-aware channel models in order to enable system-level verification of safety properties in the presence of network-based attacks. Additionally, we introduce realistic channel-specific attack models that capture adversarial behavior using nondeterminism. Moreover, we show how verification results can be utilized to introduce security patches and motivate design of attack detectors that improve overall system resiliency, and allow satisfaction of critical safety properties. Finally, we evaluate our framework on an industrial case study.

preprint | 2020 | arXiv

Statistical Model Checking for Hyperproperties

Hyperproperties have been shown to be a powerful tool for expressing and reasoning about information-flow security policies. In this paper, we investigate the problem of statistical model checking (SMC) for hyperproperties. Unlike exhaustive model checking, SMC works by drawing samples from the system at hand and evaluating the specification with statistical confidence. The main benefit of applying SMC over exhaustive techniques is its efficiency and scalability. To reason about probabilistic hyperproperties, we first propose the temporal logic HyperPCTL* that extends PCTL* and HyperPCTL. We show that HyperPCTL* can express important probabilistic information-flow security policies that cannot be expressed with HyperPCTL. Then, we introduce SMC algorithms for verifying HyperPCTL* formulas on discrete-time Markov chains, based on sequential probability ratio tests (SPRT) with a new notion of multi-dimensional indifference region. Our SMC algorithms can handle both non-nested and nested probability operators for any desired significance level. To show the effectiveness of our technique, we evaluate our SMC algorithms on four case studies focused on information security: timing side-channel vulnerability in encryption, probabilistic anonymity in dining cryptographers, probabilistic noninterference of parallel programs, and the performance of a randomized cache replacement policy that acts as a countermeasure against cache flush attacks.
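As background for the sequential probability ratio tests mentioned above, the classical one-dimensional version (Wald's SPRT with an indifference region of half-width delta around a threshold theta) can be sketched as follows. This is the textbook test for a single probability, not the multi-dimensional variant the paper introduces; the function name, the `sample` callable, and the default error levels are assumptions.

```python
import math


def sprt(sample, theta, delta, alpha=0.01, beta=0.01, max_samples=100_000):
    """Wald's SPRT for H0: p >= theta + delta against H1: p <= theta - delta.

    sample: callable returning True when one sampled trace satisfies the
            property under test (hypothetical interface).
    alpha: bound on wrongly accepting H1; beta: bound on wrongly accepting H0.
    Returns (decision, samples_used); decision is "H0", "H1", or None.
    Requires 0 < theta - delta and theta + delta < 1.
    """
    p0, p1 = theta + delta, theta - delta
    accept_h1 = math.log((1 - beta) / alpha)    # upper stopping bound
    accept_h0 = math.log(beta / (1 - alpha))    # lower stopping bound
    llr = 0.0                                   # log P(data | H1) / P(data | H0)
    for n in range(1, max_samples + 1):
        if sample():
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= accept_h1:
            return "H1", n
        if llr <= accept_h0:
            return "H0", n
    return None, max_samples                    # budget exhausted, undecided
```

The test stops as soon as the accumulated evidence crosses either bound, so the number of samples adapts to how far the true probability lies from the indifference region.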