Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
63 works
0 followers
29 topics
4 close collaborators

Published work

63 published item(s)

preprint, 2026, arXiv

Learning from Disagreement: Clinician Overrides as Implicit Preference Signals for Clinical AI in Value-Based Care

We reframe clinician overrides of clinical AI recommendations as implicit preference data - the same signal structure exploited by reinforcement learning from human feedback (RLHF), but richer: the annotator is a domain expert, the alternatives carry real consequences, and downstream outcomes are observable. We present a formal framework extending standard preference learning with three contributions: a five-category override taxonomy mapping override types to distinct model update targets; a preference formulation conditioned on patient state s, organizational context c, and clinician capability kappa, where kappa decomposes into execution capability kappa-exec and alignment capability kappa-align; and a dual learning architecture that jointly trains a reward model and a capability model via alternating optimization, preventing a failure mode we term suppression bias: the systematic suppression of correct-but-difficult recommendations when clinician capability falls below the execution threshold. We argue that chronic disease management under outcome-based payment contracts produces override data with uniquely favorable properties (longitudinal density, concentrated decision space, outcome labels, and natural capability variation) and that training environments combining longitudinal outcome measurement with aligned financial incentives are a necessary condition for learning a reward model aligned with patient trajectory rather than with encounter economics. This framework emerged from operational work to improve clinician capability in a live value-based care deployment.

preprint, 2026, arXiv

Learning to Compress Time-to-Control: A Reinforcement Learning Framework for Chronic Disease Management

Reinforcement learning (RL) in healthcare has had mixed results, with reward sparsity, unreliable off-policy evaluation, and the deployment-simulation gap as recurring failure modes. We argue that chronic disease management is structurally a more tractable RL setting than the acute-care problems the field has primarily studied, but only if the problem is formalized to exploit chronic care's properties. We propose such a formalization. The agent's objective is to compress time-to-control (TTC) under a tiered reward calibrated to the CMS ACCESS Model. Two quantities from our companion preference-learning paper [Singh et al. 2026] enter as load-bearing structural elements: the execution intensity ε bounds action availability under a constrained Markov Decision Process, and the clinician capability κ weights offline-data transitions during RL training. Together they couple preference learning and RL into a two-loop architecture. We present simulation results on synthetic state machines for hypertension and type 2 diabetes. Capability-weighted offline RL outperforms uniform-weighted offline RL and the behavior policy by 15 percentage points on T2D TTC; the uniform-weighted formulation (the standard in existing healthcare RL) underperforms even the heterogeneous behavior policy. ε-aware policies generalize across deployment regimes while ε-naive policies do not.

preprint, 2026, arXiv

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

Generative control policies (GCPs), such as diffusion- and flow-based control policies, have emerged as effective parameterizations for robot learning. This work introduces Off-policy Generative Policy Optimization (OGPO), a sample-efficient algorithm for finetuning GCPs that maintains off-policy critic networks to maximize data reuse and propagate policy gradients through the full generative process of the policy via a modified PPO objective, using critics as the terminal reward. OGPO achieves state-of-the-art performance on manipulation tasks spanning multi-task settings, high-precision insertion, and dexterous control. To our knowledge, it is also the only method that can fine-tune poorly-initialized behavior cloning policies to near full task-success with no expert data in the online replay buffer, and does so with little task-specific hyperparameter tuning. Through extensive empirical investigations, we demonstrate that OGPO drastically outperforms alternative methods on policy steering and learning residual corrections, and identify the key mechanisms behind its performance. We further introduce practical stabilizers, including success-buffer regularization, conservative advantages, $χ^2$ regularization, and Q-variance reduction, to mitigate critic over-exploitation across state- and pixel-based settings. Beyond proposing OGPO, we conduct a systematic empirical study of GCP finetuning, identifying the stabilizing mechanisms and failure modes that govern successful off-policy full-policy improvement.

preprint, 2026, arXiv

TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning

Fine-tuning pre-trained robot policies with reinforcement learning (RL) often inherits the bottlenecks introduced by pre-training with behavioral cloning (BC), which produces narrow action distributions that lack the coverage necessary for downstream exploration. We present a unified framework that provides the exploration necessary for efficient robot policy fine-tuning by bridging BC pre-training and RL fine-tuning. Our pre-training method, Context-Smoothed Pre-training (CSP), injects forward-diffusion noise into policy inputs, creating a continuum between precise imitation and broad action coverage. We then fine-tune pre-trained policies via Timestep-Modulated Reinforcement Learning (TMRL), which trains the agent to dynamically adjust this conditioning during fine-tuning by modulating the diffusion timestep, granting explicit control over exploration. Integrating seamlessly with arbitrary policy inputs, e.g., states, 3D point clouds, or image-based VLA policies, we show that TMRL improves RL fine-tuning sample efficiency. Notably, TMRL enables successful real-world fine-tuning on complex manipulation tasks in under one hour. Videos and code are available at https://weirdlabuw.github.io/tmrl/.
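
The abstract does not say which noise schedule CSP uses, so the following is only a minimal sketch of the standard forward-diffusion step, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, applied to a policy input vector. The function names and the linear schedule are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def forward_diffuse(x0, alpha_bar, rng):
    """Noise an input vector with the standard forward-diffusion step.

    alpha_bar = 1 returns x0 unchanged (precise imitation end of the continuum);
    alpha_bar = 0 returns pure Gaussian noise (broadest coverage).
    """
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must lie in [0, 1]")
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


def linear_alpha_bar(t, num_steps=10):
    """Hypothetical schedule: alpha_bar falls linearly from 1 at t=0 to 0 at t=num_steps."""
    return 1.0 - t / num_steps


rng = random.Random(0)
observation = [1.0, -2.0]
for t in (0, 5, 10):
    # Larger timesteps inject more noise into the same observation.
    print(t, forward_diffuse(observation, linear_alpha_bar(t), rng))
```

In this sketch the timestep t acts as a single knob: a policy conditioned on lightly noised inputs stays close to the demonstrations, while a larger t blurs the conditioning.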

preprint, 2025, arXiv

PolaRiS: Scalable Real-to-Sim Evaluations for Generalist Robot Policies

A significant challenge for robot learning research is our ability to accurately measure and compare the performance of robot policies. Benchmarking in robotics is historically challenging due to the stochasticity, reproducibility, and time-consuming nature of real-world rollouts. This challenge is exacerbated for recent generalist policies, which have to be evaluated across a wide variety of scenes and tasks. Evaluation in simulation offers a scalable complement to real world evaluations, but the visual and physical domain gap between existing simulation benchmarks and the real world has made them an unreliable signal for policy improvement. Furthermore, building realistic and diverse simulated environments has traditionally required significant human effort and expertise. To bridge the gap, we introduce Policy Evaluation and Environment Reconstruction in Simulation (PolaRiS), a scalable real-to-sim framework for high-fidelity simulated robot evaluation. PolaRiS utilizes neural reconstruction methods to turn short video scans of real-world scenes into interactive simulation environments. Additionally, we develop a simple simulation data co-training recipe that bridges remaining real-to-sim gaps and enables zero-shot evaluation in unseen simulation environments. Through extensive paired evaluations between simulation and the real world, we demonstrate that PolaRiS evaluations provide a much stronger correlation to real world generalist policy performance than existing simulated benchmarks. Its simplicity also enables rapid creation of diverse simulated environments. As such, this work takes a step towards distributed and democratized evaluation for the next generation of robotic foundation models.

preprint, 2023, arXiv

Multiform Evolution for High-Dimensional Problems with Low Effective Dimensionality

In this paper, we scale evolutionary algorithms to high-dimensional optimization problems that deceptively possess a low effective dimensionality (certain dimensions do not significantly affect the objective function). To this end, an instantiation of the multiform optimization paradigm is presented, where multiple low-dimensional counterparts of a target high-dimensional task are generated via random embeddings. Since the exact relationship between the auxiliary (low-dimensional) tasks and the target is a priori unknown, a multiform evolutionary algorithm is developed for unifying all formulations into a single multi-task setting. The resultant joint optimization enables the target task to efficiently reuse solutions evolved across various low-dimensional searches via cross-form genetic transfers, hence speeding up overall convergence characteristics. To validate the overall efficacy of our proposed algorithmic framework, comprehensive experimental studies are carried out on well-known continuous benchmark functions as well as a set of practical problems in the hyper-parameter tuning of machine learning models and deep learning models in classification tasks and Predator-Prey games, respectively.
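
As a rough illustration of the random-embedding idea, the sketch below builds one low-dimensional counterpart of a high-dimensional task by mapping a low-dimensional point y to x = A y with a random Gaussian matrix A. The toy objective, the box bounds, and all function names are assumptions for illustration; the multiform algorithm and its cross-form transfers are not reproduced here.

```python
import random


def make_embedding(high_dim, low_dim, rng):
    """Random Gaussian matrix A (high_dim x low_dim) defining one low-dimensional form."""
    return [[rng.gauss(0.0, 1.0) for _ in range(low_dim)] for _ in range(high_dim)]


def lift(A, y):
    """Map a low-dimensional point y to the high-dimensional space: x = A y."""
    return [sum(a * v for a, v in zip(row, y)) for row in A]


def clip(x, lo=-1.0, hi=1.0):
    """Project x back into the (assumed) box-constrained search space."""
    return [min(hi, max(lo, v)) for v in x]


def target(x):
    """Toy 100-D objective with low effective dimensionality: only x[0], x[1] matter."""
    return (x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2


def low_dim_task(A):
    """Auxiliary task: evaluate the target through the embedding A."""
    return lambda y: target(clip(lift(A, y)))


rng = random.Random(0)
# Several forms of the same target, each with its own random embedding.
forms = [low_dim_task(make_embedding(100, 4, rng)) for _ in range(3)]
y = [rng.uniform(-1.0, 1.0) for _ in range(4)]
print([f(y) for f in forms])
```

Each auxiliary task searches only 4 variables instead of 100; a multiform method would then share promising solutions across such forms and the original task.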

preprint, 2023, arXiv

Pendulum Actuated Spherical Robot: Dynamic Modeling & Analysis for Wobble & Precession

A spherical robot has many practical advantages, as the entire electronics are protected within a hull and it can be carried easily by any Unmanned Aerial Vehicle (UAV). However, its use is limited by the difficulty of finding mounts for sensors. A pendulum-actuated spherical robot provides space for mounting sensors at the yoke. We study the non-linear dynamics of a pendulum-actuated spherical robot to analyze the dynamics of the internal assembly (yoke) used for mounting sensors. For such robots, we provide a coupled dynamic model that takes care of the relationship between forward and sideways motion. We further demonstrate the effects of wobbling and precession captured by our model when the bot is controlled to execute a turning maneuver while moving with a moderate forward velocity, a practical situation encountered by spherical robots moving in an indoor setting. A simulation setup based on the developed model provides visualization of the spherical robot motion.

preprint, 2023, arXiv

Wobble control of a pendulum actuated spherical robot

Spherical robots can conduct surveillance in hostile, cluttered environments without being damaged, as their protective shell can safely house sensors such as cameras. However, lateral oscillations, also known as wobble, occur when these sphere-shaped robots operate at low speeds, leading to shaky camera feedback. These oscillations in a pendulum-actuated spherical robot are caused by the coupling between the forward and steering motions due to nonholonomic constraints. Designing a controller to limit wobbling in these robots is challenging due to their underactuated nature. We propose a model-based controller to navigate a pendulum-actuated spherical robot using wobble-free turning maneuvers consisting of circular arcs and straight lines. The model is developed using Lagrange-D'Alembert equations and accounts for the coupled forward and steering motions. The model is further analyzed to derive expressions for radius of curvature, precession rate, wobble amplitude, and wobble frequency during circular motions. Finally, we design an input-output feedback linearization-based controller to control the robot's heading direction and wobble. Overall, the proposed controller enables a teleoperator to command a specific forward velocity and pendulum angle as per the desired turning radius while limiting the robot's lateral oscillations to enhance the quality of camera feedback.

preprint, 2022, arXiv

Autonomous Reinforcement Learning: Formalism and Benchmarking

Reinforcement learning (RL) provides a naturalistic framing for learning through trial and error, which is appealing both because of its simplicity and effectiveness and because of its resemblance to how humans and animals acquire skills through experience. However, real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world, whereas common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts. This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms, such as robots. In this paper, we aim to address this discrepancy by laying out a framework for Autonomous Reinforcement Learning (ARL): reinforcement learning where the agent not only learns through its own experience, but also contends with lack of human supervision to reset between trials. We introduce a simulated benchmark EARL around this framework, containing a set of diverse and challenging simulated tasks reflective of the hurdles introduced to learning when only a minimal reliance on extrinsic intervention can be assumed. We show that standard approaches to episodic RL and existing approaches struggle as interventions are minimized, underscoring the need for developing new algorithms for reinforcement learning with a greater focus on autonomy.

preprint, 2022, arXiv

Combining Varied Learners for Binary Classification using Stacked Generalization

Machine learning offers many learning algorithms, each better than the others in some respect, but a common problem that all of them suffer from is training data with a very high-dimensional feature set. This usually leads the algorithms into generalization error that degrades performance. This can be addressed with an ensemble learning method known as stacking, also termed stacked generalization. In this paper we perform binary classification using stacked generalization on a high-dimensional Polycystic Ovary Syndrome dataset and show that the model generalizes better and the metrics improve significantly. The paper reports the various metrics and also points out a subtle transgression found with the Receiver Operating Characteristic curve that was proved to be incorrect.

preprint, 2022, arXiv

Demonstration-Bootstrapped Autonomous Practicing via Multi-Task Reinforcement Learning

Reinforcement learning systems have the potential to enable continuous improvement in unstructured environments, leveraging data collected autonomously. However, in practice these systems require significant amounts of instrumentation or human intervention to learn in the real world. In this work, we propose a system for reinforcement learning that leverages multi-task reinforcement learning bootstrapped with prior data to enable continuous autonomous practicing, minimizing the number of resets needed while being able to learn temporally extended behaviors. We show how appropriately provided prior data can help bootstrap both low-level multi-task policies and strategies for sequencing these tasks one after another to enable learning with minimal resets. This mechanism enables our robotic system to practice with minimal human intervention at training time while being able to solve long horizon tasks at test time. We show the efficacy of the proposed system on a challenging kitchen manipulation task both in simulation and in the real world, demonstrating the ability to practice autonomously in order to solve temporally extended problems.

preprint, 2022, arXiv

Detection of Tool based Edited Images from Error Level Analysis and Convolutional Neural Network

Image forgery is a problem in image forensics, and its detection can be addressed using deep learning. In this paper we present an approach for distinguishing authentic images from images tampered with using image editing tools, based on Error Level Analysis and a Convolutional Neural Network. The process is performed on the CASIA ITDE v2 dataset, with training runs of 50 and 100 epochs. The training and validation accuracies are presented as graphs.

preprint, 2022, arXiv

Discriminant Analysis in Contrasting Dimensions for Polycystic Ovary Syndrome Prognostication

Many prognostication methodologies have been formulated for early detection of Polycystic Ovary Syndrome, also known as PCOS, using Machine Learning. PCOS is a binary classification problem. Dimensionality Reduction methods impact the performance of Machine Learning to a great extent, and using a Supervised Dimensionality Reduction method can give us a new edge in tackling this problem. In this paper we present Discriminant Analysis in different dimensions with Linear and Quadratic form for binary classification, along with metrics. We were able to achieve good accuracy and less variation with Discriminant Analysis as compared to many commonly used classification algorithms, with training accuracy reaching 97.37% and testing accuracy of 95.92% using Quadratic Discriminant Analysis. The paper also gives an analysis of the data with visualizations for a deeper understanding of the problem.

preprint, 2022, arXiv

Effects of Parametric and Non-Parametric Methods on High Dimensional Sparse Matrix Representations

Semantics derived from textual data provide representations for Machine Learning algorithms. These representations are an interpretable form of high dimensional sparse matrix that is given as input to the machine learning algorithms. Since learning methods are broadly classified as parametric and non-parametric, in this paper we examine the effects of these types of algorithms on high dimensional sparse matrix representations. To derive the representations from the text data, we use the TF-IDF representation, with the reasoning given in the paper. We have formed representations of 50, 100, 500, 1000 and 5000 dimensions, over which we have performed classification using Linear Discriminant Analysis and Naive Bayes as parametric learning methods, and Decision Tree and Support Vector Machines as non-parametric learning methods. We then provide the metrics for each dimensionality of the representation and the effect of each algorithm.
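
To make the TF-IDF representation concrete, here is a minimal sketch that turns a few documents into a document-by-term matrix. It uses the plain textbook weighting tf * log(N / df); library implementations often add smoothing or normalization, and the paper's exact variant is not stated, so the function below is an assumption for illustration only.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows), where rows[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc)
        rows.append([counts[t] / total * idf[t] if total else 0.0 for t in vocab])
    return vocab, rows


vocab, rows = tfidf(["the cat sat", "the dog sat", "the cat ran"])
print(vocab)
for row in rows:
    print([round(v, 3) for v in row])
```

Most entries of each row are zero because a document contains only a small part of the vocabulary, which is why such matrices are high dimensional and sparse. A term that appears in every document (here "the") gets weight zero.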

preprint, 2022, arXiv

Forged Image Detection using SOTA Image Classification Deep Learning Methods for Image Forensics with Error Level Analysis

Advances in computer vision have been driven by deep learning. Image forensics is one of the major application areas of computer vision. Image forgery is a sub-category of image forensics and can be detected using Error Level Analysis. With such images as input, this becomes a binary classification problem that can be addressed using variations of convolutional neural networks. In this paper we perform transfer learning with state-of-the-art image classification models over the CASIA ITDE v.2 dataset processed with error level analysis. The algorithms used are VGG-19, Inception-V3, ResNet-152-V2, XceptionNet and EfficientNet-V2L, presented with their respective methodologies and results.

preprint, 2022, arXiv

HAA4D: Few-Shot Human Atomic Action Recognition via 3D Spatio-Temporal Skeletal Alignment

Human actions involve complex pose variations and their 2D projections can be highly ambiguous. Thus 3D spatio-temporal or 4D (i.e., 3D+T) human skeletons, which are photometric and viewpoint invariant, are an excellent alternative to 2D+T skeletons/pixels to improve action recognition accuracy. This paper proposes a new 4D dataset HAA4D which consists of more than 3,300 RGB videos in 300 human atomic action classes. HAA4D is clean, diverse, class-balanced where each class is viewpoint-balanced with the use of 4D skeletons, in which as few as one 4D skeleton per class is sufficient for training a deep recognition model. Further, the choice of atomic actions makes annotation even easier, because each video clip lasts for only a few seconds. All training and testing 3D skeletons in HAA4D are globally aligned, using a deep alignment model to the same global space, making each skeleton face the negative z-direction. Such alignment makes matching skeletons more stable by reducing intraclass variations and thus with fewer training samples per class needed for action recognition. Given the high diversity and skeletal alignment in HAA4D, we construct the first baseline few-shot 4D human atomic action recognition network without bells and whistles, which produces comparable or higher performance than relevant state-of-the-art techniques relying on embedded space encoding without explicit skeletal alignment, using the same small number of training samples of unseen classes.

preprint, 2022, arXiv

Half a Dozen Real-World Applications of Evolutionary Multitasking, and More

Until recently, the potential to transfer evolved skills across distinct optimization problem instances (or tasks) was seldom explored in evolutionary computation. The concept of evolutionary multitasking (EMT) fills this gap. It unlocks a population's implicit parallelism to jointly solve a set of tasks, hence creating avenues for skills transfer between them. Despite it being early days, the idea of EMT has begun to show promise in a range of real-world applications. In the backdrop of recent advances, the contribution of this paper is twofold. First, a review of several application-oriented explorations of EMT in the literature is presented; the works are assimilated into half a dozen broad categories according to their respective application domains. Each of these six categories elaborates fundamental motivations to multitask, and contains a representative experimental study (referred from the literature). Second, a set of recipes is provided showing how problem formulations of general interest, those that cut across different disciplines, could be transformed in the new light of EMT. Our discussions emphasize the many practical use-cases of EMT, and are intended to spark future research towards crafting novel algorithms for real-world deployment.

preprint, 2022, arXiv

Incentive Design and Profit Sharing in Multi-modal Transportation Network

We consider the situation where multiple transportation service providers cooperate to offer an integrated multi-modal platform to enhance the convenience to the passengers through ease in multi-modal journey planning, payment, and first and last mile connectivity. This market structure allows the multi-modal platform to coordinate profits across modes and also provide incentives to the passengers. Accordingly, in this paper, we use cooperative game theory coupled with the hyperpath-based stochastic user equilibrium framework to study such a market. We assume that the platform sets incentives (price discount or excess charge on passengers) along every edge in the transportation network. We derive the continuity and monotonicity properties of the equilibrium flow with respect to the incentives along every edge. The optimal incentives that maximize the profit of the platform are obtained through a two time-scale stochastic approximation algorithm. We use the asymmetric Nash bargaining solution to design a fair profit sharing scheme among the service providers. We show that the profit for each service provider increases after cooperation on such a platform. Finally, we complement the theoretical results through two numerical simulations.
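
The profit-sharing step can be illustrated with the closed form of the asymmetric Nash bargaining solution in the simplest setting. The sketch assumes profit is freely transferable between providers and that each provider's disagreement payoff is its standalone (non-cooperative) profit; under those assumptions each provider receives its standalone profit plus a share of the surplus proportional to its bargaining weight. The paper's actual formulation may differ, and the function name and numbers are hypothetical.

```python
def asymmetric_nash_split(total_profit, standalone, weights):
    """Split total_profit by maximizing prod((u_i - d_i) ** w_i) subject to sum(u_i) = total_profit.

    standalone: disagreement payoffs d_i (profit of each provider without cooperation).
    weights:    positive bargaining powers w_i.
    Closed form: u_i = d_i + w_i / sum(w) * (total_profit - sum(d)).
    """
    if len(standalone) != len(weights):
        raise ValueError("standalone and weights must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("bargaining weights must be positive")
    surplus = total_profit - sum(standalone)
    if surplus < 0:
        raise ValueError("cooperation must not reduce total profit")
    weight_sum = sum(weights)
    return [d + w / weight_sum * surplus for d, w in zip(standalone, weights)]


# Two hypothetical providers (for example bus and metro) with equal and unequal power.
print(asymmetric_nash_split(100.0, [20.0, 30.0], [1.0, 1.0]))  # [45.0, 55.0]
print(asymmetric_nash_split(100.0, [20.0, 30.0], [3.0, 1.0]))  # [57.5, 42.5]
```

Whenever the surplus is positive, every provider ends up strictly above its standalone profit, which is the property the abstract reports for cooperation on the platform.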

preprint, 2022, arXiv

Interference Constrained Beam Alignment for Time-Varying Channels via Kernelized Bandits

To fully utilize the abundant spectrum resources in millimeter wave (mmWave), Beam Alignment (BA) is necessary for large antenna arrays to achieve large array gains. In practical dynamic wireless environments, channel modeling is challenging due to time-varying and multipath effects. In this paper, we formulate the beam alignment problem as a non-stationary online learning problem with the objective to maximize the received signal strength under interference constraint. In particular, we employ the non-stationary kernelized bandit to leverage the correlation among beams and model the complex beamforming and multipath channel functions. Furthermore, to mitigate interference to other user equipment, we leverage the primal-dual method to design a constrained UCB-type kernelized bandit algorithm. Our theoretical analysis indicates that the proposed algorithm can adaptively adjust the beam in time-varying environments, such that both the cumulative regret of the received signal and constraint violations have sublinear bounds with respect to time. This result is of independent interest for applications such as adaptive pricing and news ranking. In addition, the algorithm assumes the channel is a black-box function and does not require any prior knowledge for dynamic channel modeling, and thus is applicable in a variety of scenarios. We further show that if the information about the channel variation is known, the algorithm will have better theoretical guarantees and performance. Finally, we conduct simulations to highlight the effectiveness of the proposed algorithm.

preprint, 2022, arXiv

Performance Comparison of Simple Transformer and Res-CNN-BiLSTM for Cyberbullying Classification

Text classification using bidirectional LSTM-based architectures is computationally expensive and time consuming to train. Transformers were introduced to address this, and they perform well compared to traditional deep learning architectures. In this paper we present a performance-based comparison between a simple transformer-based network and a Res-CNN-BiLSTM-based network for the cyberbullying text classification problem. The results show that the transformer we trained, with 0.65 million parameters, significantly outperforms the Res-CNN-BiLSTM with 48.82 million parameters, with faster training speeds and more generalized metrics. The paper also compares the 1-dimensional character-level embedding network and the 100-dimensional GloVe embedding network with the transformer.

preprint, 2022, arXiv

Prediction of Maneuvering Status for Aerial Vehicles using Supervised Learning Methods

Aerial Vehicles follow a guided approach based on Latitude, Longitude and Altitude. This information can be used for calculating the maneuvering status of the aerial vehicles along the line of trajectory. This is a binary classification problem, and Machine Learning can be leveraged for solving such a problem. In this paper we present a methodology for deriving the maneuvering status and predicting it using Linear, Distance Metric, Discriminant Analysis and Boosting Ensemble supervised learning methods. In the results section we provide various metrics that give a condensed comparison of the algorithms and indicate the appropriate one for predicting the maneuvering status.

preprint, 2022, arXiv

Requisites on viscoelasticity for exceptional points in passive elastodynamic metamaterials

The recent progress of non-Hermitian physics and the notion of exceptional point (EP) degeneracies in elastodynamics has led to the development of novel metamaterials for the control of elastic wave propagation, hypersensitive sensors, and actuators. The emergence of EPs in a Parity-Time symmetric system relies on judiciously engineered balanced gain and loss mechanisms. Creating gain requires complex circuits and amplification mechanisms, making engineering applications challenging. Here, we report strategies to achieve EPs in passive non-Hermitian elastodynamic systems with differential loss derived from viscoelastic materials. We compare different viscoelastic material models and show that the EP emerges only when the frequency-dependent loss-tangent of the viscoelastic material remains nearly constant in the frequency range of operation. This type of loss tangent occurs in materials that undergo stress-relaxation over a broad spectrum of relaxation times, for example, materials that follow the Kelvin-Voigt fractional derivative (KVFD) model. Using dynamic mechanical analysis, we show that a few common viscoelastic elastomers such as Polydimethylsiloxane (PDMS) and polyurethane rubber follow the KVFD behavior such that the loss tangent becomes almost constant after a particular frequency. The material models we present and the demonstration of the potential of a widely available material system in creating EPs pave the way for developing non-Hermitian metamaterials with hypersensitivity to perturbations or enhanced emissivity.
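
The claim about a nearly constant loss tangent can be checked numerically with a minimal sketch. It assumes the common form of the KVFD complex modulus, E*(ω) = E0 + η (iω)^α with 0 < α < 1, for which the loss tangent Im(E*)/Re(E*) approaches tan(απ/2) at high frequency; with α = 1 (the classical Kelvin-Voigt model) it instead grows linearly with ω. The parameter names and values are illustrative assumptions, not the paper's fitted material constants.

```python
import math


def kvfd_complex_modulus(omega, e0, eta, alpha):
    """Complex modulus E*(omega) = e0 + eta * (i * omega) ** alpha (assumed KVFD form)."""
    return e0 + eta * (1j * omega) ** alpha


def loss_tangent(omega, e0, eta, alpha):
    """Loss tangent tan(delta) = Im(E*) / Re(E*)."""
    modulus = kvfd_complex_modulus(omega, e0, eta, alpha)
    return modulus.imag / modulus.real


# Illustrative parameters (arbitrary units).
e0, eta = 1.0, 1.0
for omega in (1e0, 1e2, 1e4, 1e6):
    fractional = loss_tangent(omega, e0, eta, alpha=0.5)  # flattens with frequency
    classical = loss_tangent(omega, e0, eta, alpha=1.0)   # keeps growing
    print(f"omega={omega:.0e}  KVFD={fractional:.4f}  Kelvin-Voigt={classical:.1f}")

print("high-frequency limit for alpha=0.5:", math.tan(0.5 * math.pi / 2))
```

The plateau appears once η ω^α dominates E0, which is consistent with the abstract's observation that the loss tangent becomes almost constant after a particular frequency.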

preprint, 2022, arXiv

Res-CNN-BiLSTM Network for overcoming Mental Health Disturbances caused due to Cyberbullying through Social Media

Mental health disturbance has many causes, and cyberbullying, which uses social media as an instrument of exploitation, is one of the major ones. Cyberbullying is carried out on the basis of religion, ethnicity, age and gender, which makes it a sensitive psychological issue. This can be addressed using Natural Language Processing with Deep Learning, since social media is the medium and it generates massive amounts of textual data. Such data can be leveraged to find the semantics and derive what type of cyberbullying is taking place and who is involved, so that early measures can be taken. Since deriving semantics is essential, we propose a hybrid deep learning model named 1-Dimensional CNN-Bidirectional-LSTMs with Residuals, known in short as Res-CNN-BiLSTM. In this paper we propose the architecture and compare its performance with different approaches of Embedding Deep Learning Algorithms.

preprint, 2022, arXiv

Residual-Concatenate Neural Network with Deep Regularization Layers for Binary Classification

Many complex Deep Learning models are used in different variations for various prognostication tasks. A higher number of learnable parameters does not necessarily ensure great accuracy. This can be addressed by modifying very deep models with multiple regularization-based techniques. In this paper we train a deep neural network that uses many regularization layers with residual and concatenation processes for best fit on Polycystic Ovary Syndrome diagnosis prognostication. The network was built with improvements from every step of failure to meet the needs of the data and achieves an accuracy of 99.3%.

preprint, 2022, arXiv

Safe Model-based Off-policy Reinforcement Learning for Eco-Driving in Connected and Automated Hybrid Electric Vehicles

Connected and Automated Hybrid Electric Vehicles have the potential to reduce fuel consumption and travel time in real-world driving conditions. The eco-driving problem seeks to design optimal speed and power usage profiles based upon look-ahead information from connectivity and advanced mapping features. Recently, Deep Reinforcement Learning (DRL) has been applied to the eco-driving problem. While the previous studies synthesize simulators and model-free DRL to reduce online computation, this work proposes a Safe Off-policy Model-Based Reinforcement Learning algorithm for the eco-driving problem. The advantages over the existing literature are three-fold. First, the combination of off-policy learning and the use of a physics-based model improves the sample efficiency. Second, the training does not require any extrinsic rewarding mechanism for constraint satisfaction. Third, the feasibility of trajectory is guaranteed by using a safe set approximated by deep generative models. The performance of the proposed method is benchmarked against a baseline controller representing human drivers, a previously designed model-free DRL strategy, and the wait-and-see optimal solution. In simulation, the proposed algorithm leads to a policy with a higher average speed and a better fuel economy compared to the model-free agent. Compared to the baseline controller, the learned strategy reduces the fuel consumption by more than 21% while keeping the average speed comparable.

preprint2022arXiv

Scalable Transfer Evolutionary Optimization: Coping with Big Task Instances

In today's digital world, we are faced with an explosion of data and models produced and manipulated by numerous large-scale cloud-based applications. Under such settings, existing transfer evolutionary optimization frameworks grapple with simultaneously satisfying two important quality attributes, namely (1) scalability against a growing number of source tasks and (2) online learning agility against sparsity of relevant sources to the target task of interest. Satisfying these attributes shall facilitate practical deployment of transfer optimization to scenarios with big task-instances, while curbing the threat of negative transfer. While applications of existing algorithms are limited to tens of source tasks, in this paper, we take a quantum leap forward in enabling more than two orders of magnitude scale-up in the number of tasks; i.e., we efficiently handle scenarios beyond 1000 source task-instances. We devise a novel transfer evolutionary optimization framework comprising two co-evolving species for joint evolutions in the space of source knowledge and in the search space of solutions to the target problem. In particular, co-evolution enables the learned knowledge to be orchestrated on the fly, expediting convergence in the target optimization task. We have conducted an extensive series of experiments across a set of practically motivated discrete and continuous optimization examples comprising a large number of source task-instances, of which only a small fraction indicate source-target relatedness. The experimental results show that not only does our proposed framework scale efficiently with a growing number of source tasks but is also effective in capturing relevant knowledge against sparsity of related sources, fulfilling the two salient features of scalability and online learning agility.

preprint2022arXiv

State of AI Ethics Report (Volume 6, February 2022)

This report from the Montreal AI Ethics Institute (MAIEI) covers the most salient progress in research and reporting over the second half of 2021 in the field of AI ethics. Particular emphasis is placed on an "Analysis of the AI Ecosystem", "Privacy", "Bias", "Social Media and Problematic Information", "AI Design and Governance", "Laws and Regulations", "Trends", and other areas covered in the "Outside the Boxes" section. The two AI spotlights feature application pieces on "Constructing and Deconstructing Gender with AI-Generated Art" as well as "Will an Artificial Intellichef be Cooking Your Next Meal at a Michelin Star Restaurant?". Given MAIEI's mission to democratize AI, submissions from external collaborators have featured, such as pieces on the "Challenges of AI Development in Vietnam: Funding, Talent and Ethics" and using "Representation and Imagination for Preventing AI Harms". The report is a comprehensive overview of what the key issues in the field of AI ethics were in 2021, what trends are emergent, what gaps exist, and a peek into what to expect from the field of AI ethics in 2022. It is a resource for researchers and practitioners alike in the field to set their research and development agendas to make contributions to the field of AI ethics.

preprint2022arXiv

Succinct Differentiation of Disparate Boosting Ensemble Learning Methods for Prognostication of Polycystic Ovary Syndrome Diagnosis

Prognostication of medical problems from clinical data using machine learning techniques with high precision is one of the most important real-world challenges at present. Polycystic Ovary Syndrome (PCOS) is an emerging problem in women aged 15 to 49. In this paper we present the diagnosis of this disorder using various boosting ensemble methods. We give a detailed and compendious comparison of Adaptive Boost, Gradient Boosting Machine, XGBoost and CatBoost, reporting their respective performance metrics and highlighting hidden anomalies in the data and their effects on the results. The metrics used are the confusion matrix, precision, recall, F1 score, FPR, ROC curve and AUC.
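As a rough illustration of how several of the metrics named in this abstract relate to the confusion matrix, the following sketch computes precision, recall, F1 score and FPR for a binary classifier. The function name `binary_metrics` and the toy labels are assumptions made for this example; they are not taken from the paper, and no result from the paper is reproduced here.

```python
def binary_metrics(y_true, y_pred):
    """Derive common metrics from the binary confusion matrix.

    Hypothetical helper for illustration only; labels are 1 (positive) or 0 (negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard against empty denominators by returning 0.0 in those cases.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0

    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fpr,
    }


# Toy labels, invented for this sketch.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 0, 1, 0, 1, 0, 0, 1]
toy_result = binary_metrics(toy_true, toy_pred)
```

The ROC curve and AUC require predicted scores rather than hard labels, so they are left out of this sketch.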

preprint2022arXiv

Superior mechanical properties by exploiting size-effects and multiscale interactions in hierarchically architected foams

Protective applications in extreme environments demand thermally stable materials with superior modulus, strength, and specific energy absorption (SEA) at lightweight. However, these properties typically have a trade-off. Hierarchically architected materials--such as the architected vertically aligned carbon nanotube (VACNT) foams--offer the potential to overcome these trade-offs to achieve synergistic enhancement in mechanical properties. Here, we adopt a full-factorial design of experiments (DOE) approach to optimize multitier design parameters to achieve synergistic enhancement in SEA, strength, and modulus at lightweight in VACNT foams with mesoscale cylindrical architecture. We exploit the size effects from geometrically-confined synthesis and the highly interactive morphology of CNTs to enable higher-order design parameter interactions that intriguingly break the diameter-to-thickness (D/t)-dependent scaling laws found in common tubular architected materials. We show that exploiting complementary hierarchical mechanisms in architected material design can lead to unprecedented synergistic enhancement of mechanical properties and performance desirable for extreme protective applications.

preprint2022arXiv

The Golden Circle: Creating Socio-technical Alignment in Content Moderation

This paper outlines a conceptual framework titled The Golden Circle that describes the roles of actors at individual, organizational, and societal levels, and their dynamics in the content moderation ecosystem. Centering harm reduction and context moderation, it argues that the ML community must attend to multimodal content moderation solutions, align their work with their organizations' goals and values, and pay attention to the ever changing social contexts in which their sociotechnical systems are embedded. This is done by accounting for the why, how, and what of content moderation from a sociological and technical lens.

preprint2022arXiv

Weighted Gaussian Process Bandits for Non-stationary Environments

In this paper, we consider the Gaussian process (GP) bandit optimization problem in a non-stationary environment. To capture external changes, the black-box function is allowed to be time-varying within a reproducing kernel Hilbert space (RKHS). To this end, we develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression. A key challenge is how to cope with infinite-dimensional feature maps. To that end, we leverage kernel approximation techniques to prove a sublinear regret bound, which is the first (frequentist) sublinear regret guarantee on weighted time-varying bandits with general nonlinear rewards. This result generalizes both non-stationary linear bandits and standard GP-UCB algorithms. Further, a novel concentration inequality is achieved for weighted Gaussian process regression with general weights. We also provide universal upper bounds and weight-dependent upper bounds for weighted maximum information gains. These results are of independent interest for applications such as news ranking and adaptive pricing, where weights can be adopted to capture the importance or quality of data. Finally, we conduct experiments to highlight the favorable gains of the proposed algorithm in many cases when compared to existing methods.

preprint2021arXiv

An Algorithm to Warm Start Perturbed (WASP) Constrained Dynamic Programs

Receding horizon optimal control problems compute the solution at each time step to operate the system on a near-optimal path. However, in many practical cases, the boundary conditions, such as external inputs, constraint equations, or the objective function, vary only marginally from one time step to the next. In this case, recomputing the optimal solution at each time step represents a significant burden for real-time applications. This paper proposes a novel algorithm to approximately solve a perturbed constrained dynamic program that significantly reduces the computational burden when the objective function and the constraints are perturbed slightly. The method hinges on determining closed-form expressions for first-order perturbations in the optimal strategy and the Lagrange multipliers of the perturbed constrained dynamic programming problem. This information can be used to initialize any algorithm (such as the method of Lagrange multipliers, or the augmented Lagrangian method) to solve the perturbed dynamic programming problem with minimal computational resources.

preprint2021arXiv

Convergence of Recursive Stochastic Algorithms using Wasserstein Divergence

This paper develops a unified framework, based on iterated random operator theory, to analyze the convergence of constant stepsize recursive stochastic algorithms (RSAs). RSAs use randomization to efficiently compute expectations, and so their iterates form a stochastic process. The key idea of our analysis is to lift the RSA into an appropriate higher-dimensional space and then express it as an equivalent Markov chain. Instead of determining the convergence of this Markov chain (which may not converge under constant stepsize), we study the convergence of the distribution of this Markov chain. To study this, we define a new notion of Wasserstein divergence. We show that if the distribution of the iterates in the Markov chain satisfies a contraction property with respect to the Wasserstein divergence, then the Markov chain admits an invariant distribution. We show that convergence of a large family of constant stepsize RSAs can be understood using this framework, and we provide several detailed examples.

preprint2021arXiv

Coverage Analysis of Broadcast Networks with Users Having Heterogeneous Content/Advertisement Preferences

This work is focused on the system-level performance of a broadcast network. Since all transmitters in a broadcast network transmit the identical signal, received signals from multiple transmitters can be combined to improve system performance. We develop a stochastic geometry based analytical framework to derive the coverage of a typical receiver. We show that there may exist an optimal connectivity radius that maximizes the rate coverage. Our analysis accounts for the fact that users may have their individual content/advertisement preferences. We assume that there are multiple classes of users, with each user class preferring a particular type of content/advertisement, and that users will pay the network only when they can see content aligned with their interest. The operator may choose to transmit multiple contents simultaneously to cater to more users' interests and increase its revenue. We present revenue models to study the impact of the number of contents on the operator revenue. We consider two scenarios for users' distribution: one where users' interest depends on their geographical location and one where it doesn't. With the help of numerical results and analysis, we show the impact of various parameters including content granularity, connectivity radius, and rate threshold and present important design insights.

preprint2021arXiv

Making Responsible AI the Norm rather than the Exception

This report prepared by the Montreal AI Ethics Institute provides recommendations in response to the National Security Commission on Artificial Intelligence (NSCAI) Key Considerations for Responsible Development and Fielding of Artificial Intelligence document. The report centres on the idea that Responsible AI should be made the Norm rather than an Exception. It does so by utilizing the guiding principles of: (1) alleviating friction in existing workflows, (2) empowering stakeholders to get buy-in, and (3) conducting an effective translation of abstract standards into actionable engineering practices. After providing some overarching comments on the document from the NSCAI, the report dives into the primary contribution of an actionable framework to help operationalize the ideas presented in the document from the NSCAI. The framework consists of: (1) a learning, knowledge, and information exchange (LKIE), (2) the Three Ways of Responsible AI, (3) an empirically-driven risk-prioritization matrix, and (4) achieving the right level of complexity. All components reinforce each other to move from principles to practice in service of making Responsible AI the norm rather than the exception.

preprint2021arXiv

Rank Reduction in Bimatrix Games

The rank of a bimatrix game is defined as the rank of the sum of the payoff matrices of the two players. The rank of a game is known to impact both the most suitable computation methods for determining a solution and the expressive power of the game. Under certain conditions on the payoff matrices, we devise a method that reduces the rank of the game without changing the equilibrium of the game. We leverage matrix pencil theory and the Wedderburn rank reduction formula to arrive at our results. We also present a constructive proof of the fact that in a generic square game, the rank of the game can be reduced by 1, and in a generic rectangular game, the rank of the game can be reduced by 2 under certain assumptions.
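The definition in the first sentence of this abstract can be checked numerically. The sketch below computes the rank of A + B for two payoff matrices using exact rational Gaussian elimination. The helper names `matrix_rank` and `game_rank` and the example matrices are assumptions made for this illustration. The sketch covers only the definition of game rank; it does not implement the rank reduction method the paper describes.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination over exact rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate the column entries below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def game_rank(A, B):
    """Rank of a bimatrix game (A, B): the rank of the matrix sum A + B."""
    total = [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]
    return matrix_rank(total)


# A zero-sum game has B = -A, so A + B is the zero matrix and the rank is 0.
A_zero_sum = [[1, -1], [-1, 1]]
B_zero_sum = [[-1, 1], [1, -1]]

# Invented example whose payoff sum is the all-ones matrix, giving rank 1.
A_rank_one = [[3, 0], [0, 2]]
B_rank_one = [[-2, 1], [1, -1]]
```

With these inputs, `game_rank(A_zero_sum, B_zero_sum)` evaluates to 0 and `game_rank(A_rank_one, B_rank_one)` evaluates to 1.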

preprint2020arXiv

BISTRO: Berkeley Integrated System for Transportation Optimization

This article introduces BISTRO, a new open source transportation planning decision support system that uses an agent-based simulation and optimization approach to anticipate and develop adaptive plans for possible technological disruptions and growth scenarios. The new framework was evaluated in the context of a machine learning competition hosted within Uber Technologies, Inc., in which over 400 engineers and data scientists participated. For the purposes of this competition, a benchmark model, based on the city of Sioux Falls, South Dakota, was adapted to the BISTRO framework. An important finding of this study was that in spite of rigorous analysis and testing done prior to the competition, the two top-scoring teams discovered an unbounded region of the search space, rendering the solutions largely uninterpretable for the purposes of decision-support. On the other hand, a follow-on study, aimed at fixing the objective function, served to demonstrate BISTRO's utility as a human-in-the-loop cyberphysical system: one that uses scenario-based optimization algorithms as a feedback mechanism to assist urban planners with iteratively refining objective function and constraints specification on intervention strategies such that the portfolio of transportation intervention strategy alternatives eventually chosen achieves high-level regional planning goals developed through participatory stakeholder engagement practices.

preprint2020arXiv

Comprehensiveness of Archives: A Modern AI-enabled Approach to Build Comprehensive Shared Cultural Heritage

Archives play a crucial role in the construction and advancement of society. Humans place a great deal of trust in archives and depend on them to craft public policies and to preserve languages, cultures, self-identity, views and values. Yet, there are certain voices and viewpoints that remain elusive in the current processes deployed in the classification and discoverability of records and archives. In this paper, we explore the ramifications and effects of centralized, due process archival systems on marginalized communities. There is strong evidence of the need for progressive design and technological innovation in the pursuit of comprehensiveness, equity and justice. Intentionality and comprehensiveness are our greatest opportunity when it comes to improving archival practices and for the advancement and thrive-ability of societies at large today. Intentionality and comprehensiveness are achievable with the support of technology and the Information Age we live in today. Reopening, questioning and/or purposefully including others' voices in archival processes is the intention we present in our paper. We provide examples of marginalized communities who continue to lead "community archive" movements in efforts to reclaim and protect their cultural identity, knowledge, views and futures. In conclusion, we offer design and AI-dominant technological considerations worth further investigation in efforts to bridge systemic gaps and build robust archival processes.

preprint2020arXiv

Coverage Improvement of Wireless Sensor Networks via Spatial Profile Information

This paper considers a wireless sensor network deployed to sense an environment variable with a known spatial statistical profile. We propose to use the additional information of the spatial profile to improve the sensing range of sensors while allowing some tolerance in their sensing accuracy. We show that the use of this information improves the sensing performance of the total WSN. For this, we first derive analytical expressions for various performance metrics to measure the improvement in the sensing performance of WSN. We then discuss the sensing gains quantitatively using numerical results.

preprint2020arXiv

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

Deep reinforcement learning can learn effective policies for a wide range of tasks, but is notoriously difficult to use due to instability and sensitivity to hyperparameters. The reasons for this remain unclear. When using standard supervised methods (e.g., for bandits), on-policy data collection provides "hard negatives" that correct the model in precisely those states and actions that the policy is likely to visit. We call this phenomenon "corrective feedback." We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from this corrective feedback, and training on the experience collected by the algorithm is not sufficient to correct errors in the Q-function. In fact, Q-learning and related methods can exhibit pathological interactions between the distribution of experience collected by the agent and the policy induced by training on that experience, leading to potential instability, sub-optimal convergence, and poor results when learning from noisy, sparse or delayed rewards. We demonstrate the existence of this problem, both theoretically and empirically. We then show that a specific correction to the data distribution can mitigate this issue. Based on these observations, we propose a new algorithm, DisCor, which computes an approximation to this optimal distribution and uses it to re-weight the transitions used for training, resulting in substantial improvements in a range of challenging RL settings, such as multi-task learning and learning from noisy reward signals. A blog post presenting a summary of this work is available at: https://bair.berkeley.edu/blog/2020/03/16/discor/.

preprint2020arXiv

Ecological Reinforcement Learning

Much of the current work on reinforcement learning studies episodic settings, where the agent is reset between trials to an initial state distribution, often with well-shaped reward functions. Non-episodic settings, where the agent must learn through continuous interaction with the world without resets, and where the agent receives only delayed and sparse reward signals, are substantially more difficult, but arguably more realistic considering real-world environments do not present the learner with a convenient "reset mechanism" and easy reward shaping. In this paper, instead of studying algorithmic improvements that can address such non-episodic and sparse reward settings, we study the kinds of environment properties that can make learning under such conditions easier. Understanding how properties of the environment impact the performance of reinforcement learning agents can help us to structure our tasks in ways that make learning tractable. We first discuss what we term "environment shaping" -- modifications to the environment that provide an alternative to reward shaping, and may be easier to implement. We then discuss an even simpler property that we refer to as "dynamism," which describes the degree to which the environment changes independent of the agent's actions and can be measured by environment transition entropy. Surprisingly, we find that even this property can substantially alleviate the challenges associated with non-episodic RL in sparse reward settings. We provide an empirical evaluation on a set of new tasks focused on non-episodic learning with sparse rewards. Through this study, we hope to shift the focus of the community towards analyzing how properties of the environment can affect learning and the ultimate type of behavior that is learned via RL.

preprint2020arXiv

Fleet Sizing and Charger Allocation in Electric Vehicle Sharing Systems

In this paper, we propose a closed queueing network model for performance analysis of electric vehicle sharing systems with a certain number of chargers in each neighborhood. Depending on the demand distribution, we devise algorithms to compute the optimal fleet size and number of chargers required to maximize profit while maintaining a certain quality of service. We show that the profit is concave with respect to the fleet size and the number of chargers at each charging point. We show that installing more chargers within the city can not only reduce the fleet size but also improve the availability of vehicles at all points within the city. We further show through simulation that two slow chargers may outperform one fast charger when the variance of charging time becomes relatively large in comparison to the mean charging time.

preprint2020arXiv

Green Lighting ML: Confidentiality, Integrity, and Availability of Machine Learning Systems in Deployment

Security and ethics are both core to ensuring that a machine learning system can be trusted. In production machine learning, there is generally a hand-off from those who build a model to those who deploy a model. In this hand-off, the engineers responsible for model deployment are often not privy to the details of the model and thus, the potential vulnerabilities associated with its usage, exposure, or compromise. Techniques such as model theft, model inversion, or model misuse may not be considered in model deployment, and so it is incumbent upon data scientists and machine learning engineers to understand these potential risks so they can communicate them to the engineers deploying and hosting their models. This is an open problem in the machine learning community and in order to help alleviate this issue, automated systems for validating privacy and security of models need to be developed, which will help to lower the burden of implementing these hand-offs and increase the ubiquity of their adoption.

preprint2020arXiv

Montreal AI Ethics Institute's (MAIEI) Submission to the World Intellectual Property Organization (WIPO) Conversation on Intellectual Property (IP) and Artificial Intelligence (AI) Second Session

This document posits that, at best, a tenuous case can be made for providing AI exclusive IP over their "inventions". Furthermore, IP protections for AI are unlikely to confer the benefit of ensuring regulatory compliance. Rather, IP protections for AI "inventors" present a host of negative externalities and obscure the fact that the genuine inventor, deserving of IP, is the human agent. This document will conclude by recommending strategies for WIPO to bring IP law into the 21st century, enabling it to productively account for AI "inventions". Theme: IP Protection for AI-Generated and AI-Assisted Works. Based on insights from the Montreal AI Ethics Institute (MAIEI) staff and supplemented by workshop contributions from the AI Ethics community convened by MAIEI on July 5, 2020.

preprint2020arXiv

Montreal AI Ethics Institute's Response to Scotland's AI Strategy

In January and February 2020, the Scottish Government released two documents for review by the public regarding their artificial intelligence (AI) strategy. The Montreal AI Ethics Institute (MAIEI) reviewed these documents and published a response on 4 June 2020. MAIEI's response examines several questions that touch on the proposed definition of AI; the people-centered nature of the strategy; considerations to ensure that everyone benefits from AI; the strategy's overarching vision; Scotland's AI ecosystem; the proposed strategic themes; and how to grow public confidence in AI by building responsible and ethical systems. In addition to examining the points above, MAIEI suggests that the strategy be extended to include considerations on biometric data and how that will be processed and used in the context of AI. It also highlights the importance of tackling head-on the inherently stochastic nature of deep learning systems and developing concrete guidelines to ensure that these systems are built responsibly and ethically, particularly as machine learning becomes more accessible. Finally, it concludes that any national AI strategy must clearly address the measurements of success in regards to the strategy's stated goals and vision to ensure that they are interpreted and applied consistently. To do this, there must be inclusion and transparency between those building the systems and those using them in their work.

preprint2020arXiv

Multi-Objective Vehicle Rebalancing for Ridehailing System using a Reinforcement Learning Approach

The problem of designing a rebalancing algorithm for a large-scale ridehailing system with asymmetric demand is considered here. We pose the rebalancing problem within a semi Markov decision problem (SMDP) framework with closed queues of vehicles serving stationary, but asymmetric demand, over a large city with multiple nodes (representing neighborhoods). We assume that the passengers queue up at every node until they are matched with a vehicle. The goal of the SMDP is to minimize a convex combination of the waiting time of the passengers and the total empty vehicle miles traveled. The resulting SMDP appears to be difficult to solve in closed form for the rebalancing strategy. As a result, we use a deep reinforcement learning algorithm to determine the approximately optimal solution to the SMDP. The trained policy is compared with other well-known algorithms for rebalancing, which are designed to address other objectives (such as to minimize demand drop probability) for the ridehailing problem.

preprint2020arXiv

On the Coverage Performance of Boolean-Poisson Cluster Models for Wireless Sensor Networks

In this paper, we consider wireless sensor networks (WSNs) with sensor nodes exhibiting clustering in their deployment. We model the coverage region of such WSNs by Boolean Poisson cluster models (BPCM), where the sensor nodes are located according to a Poisson cluster process (PCP) and each sensor has an independent sensing range around it. We consider two variants of PCP, namely the Matérn and Thomas cluster processes, to form Boolean Matérn and Thomas cluster models. We first derive the capacity functional of these models. Using the derived expressions, we compute the sensing probability of an event and compare it with the sensing probability of a WSN modeled by a Boolean Poisson model where sensors are deployed according to a Poisson point process. We also derive the power required for each cluster to collect data from all of its sensors for the three considered WSNs. We show that a BPCM WSN has a lower power requirement in comparison to the Boolean Poisson WSN, but it suffers from lower coverage, leading to a trade-off between per-cluster power requirement and the sensing performance. A cluster process with desired clustering may provide better coverage while maintaining low power requirements.

preprint2020arXiv

Participatory Design to build better contact- and proximity-tracing apps

With the push for contact- and proximity-tracing solutions as a means to manage the spread of the pandemic, there is distrust between citizens and the authorities that are deploying these solutions. The efficacy of the solutions relies on meeting a minimum uptake threshold, which is hitting a barrier because of a lack of trust and transparency in how these solutions are being developed. We propose participatory design as a mechanism to evoke trust and explore how it might be applied to co-create technological solutions that not only meet the needs of the users better but also expand their reach to underserved and high-risk communities. We also highlight the role of the bazaar model of development and complement that with quantitative and qualitative metrics for evaluating the solutions and convincing policymakers and other stakeholders of the value of this approach with empirical evidence.

preprint2020arXiv

Reciprocal Collision Avoidance for General Nonlinear Agents using Reinforcement Learning

Finding feasible and collision-free paths for multiple nonlinear agents is challenging in decentralized scenarios due to limited available information about other agents and complex dynamics constraints. In this paper, we propose a fast multi-agent collision avoidance algorithm for general nonlinear agents with continuous action space, where each agent observes only positions and velocities of nearby agents. To reduce online computation, we first decompose the multi-agent scenario and solve a two-agent collision avoidance problem using reinforcement learning (RL). When extending the trained policy to a multi-agent problem, safety is ensured by introducing the optimal reciprocal collision avoidance (ORCA) as linear constraints, and the overall collision avoidance action can be found through simple convex optimization. Most existing RL-based multi-agent collision avoidance algorithms rely on the direct control of agent velocities. In sharp contrast, our approach is applicable to general nonlinear agents. Realistic simulations based on nonlinear bicycle agent models are performed with various challenging scenarios, indicating a competitive performance of the proposed method in avoiding collisions, congestion and deadlock with smooth trajectories.

preprint2020arXiv

Report prepared by the Montreal AI Ethics Institute In Response to Mila's Proposal for a Contact Tracing App

Contact tracing has grown in popularity as a promising solution to the COVID-19 pandemic. The benefits of automated contact tracing are two-fold. Contact tracing promises to reduce the number of infections by being able to: 1) systematically identify all of those that have been in contact with someone who has had COVID; and, 2) ensure those that have been exposed to the virus do not unknowingly infect others. "COVI" is the name of a recent contact tracing app developed by Mila and was proposed to help combat COVID-19 in Canada. The app was designed to inform each individual of their relative risk of being infected with the virus, which Mila claimed would empower citizens to make informed decisions about their movement and allow for a data-driven approach to public health policy; all the while ensuring data is safeguarded from governments, companies, and individuals. This article will provide a critical response to Mila's COVI White Paper. Specifically, this article will discuss: the extent to which diversity has been considered in the design of the app, assumptions surrounding users' interaction with the app and the app's utility, as well as unanswered questions surrounding transparency, accountability, and security. We see this as an opportunity to supplement the excellent risk analysis done by the COVI team to surface insights that can be applied to other contact- and proximity-tracing apps that are being developed and deployed across the world. Our hope is that, through a meaningful dialogue, we can ultimately help organizations develop better solutions that respect the fundamental rights and values of the communities these solutions are meant to serve.

preprint2020arXiv

Response by the Montreal AI Ethics Institute to the European Commission's Whitepaper on AI

In February 2020, the European Commission (EC) published a white paper entitled, On Artificial Intelligence - A European approach to excellence and trust. This paper outlines the EC's policy options for the promotion and adoption of artificial intelligence (AI) in the European Union. The Montreal AI Ethics Institute (MAIEI) reviewed this paper and published a response addressing the EC's plans to build an "ecosystem of excellence" and an "ecosystem of trust," as well as the safety and liability implications of AI, the internet of things (IoT), and robotics. MAIEI provides 15 recommendations in relation to the sections outlined above, including: 1) focus efforts on the research and innovation community, member states, and the private sector; 2) create alignment between trading partners' policies and EU policies; 3) analyze the gaps in the ecosystem between theoretical frameworks and approaches to building trustworthy AI; 4) focus on coordination and policy alignment; 5) focus on mechanisms that promote private and secure sharing of data; 6) create a network of AI research excellence centres to strengthen the research and innovation community; 7) promote knowledge transfer and develop AI expertise through Digital Innovation Hubs; 8) add nuance to the discussion regarding the opacity of AI systems; 9) create a process for individuals to appeal an AI system's decision or output; 10) implement new rules and strengthen existing regulations; 11) ban the use of facial recognition technology; 12) hold all AI systems to similar standards and compulsory requirements; 13) ensure biometric identification systems fulfill the purpose for which they are implemented; 14) implement a voluntary labelling system for systems that are not considered high-risk; 15) appoint individuals to the oversight process who understand AI systems well and are able to communicate potential risks.

preprint2020arXiv

Response by the Montreal AI Ethics Institute to the Santa Clara Principles on Transparency and Accountability in Online Content Moderation

In April 2020, the Electronic Frontier Foundation (EFF) publicly called for comments on expanding and improving the Santa Clara Principles on Transparency and Accountability (SCP), originally published in May 2018. The Montreal AI Ethics Institute (MAIEI) responded to this call by drafting a set of recommendations based on insights and analysis by the MAIEI staff and supplemented by workshop contributions from the AI Ethics community convened during two online public consultation workshops. In its submission, MAIEI provides 12 overarching recommendations for the SCP. These include: 1) ensure there is more diversity in the content moderation process; 2) increase transparency into how platforms guide content-ranking; 3) disclose anonymized data on the training and/or cultural background of the content moderators for a platform; 4) tailor content moderation tools for specific issues; 5) draft specific guidelines for messaging applications with regard to data protection in content moderation; 6) take into account cultural differences relevant to what constitutes acceptable behavior online; 7) ensure platforms are transparent with regard to political advertising; 8) ensure greater transparency into the user-generated flagging/reporting systems deployed by a platform; 9) clarify if user content is flagged or reported through an automated system; 10) provide more data on the types of content removed from platforms; 11) provide clear guidelines on the appeal process, as well as data on prior appeals; 12) create a system for periodically revisiting the SCP so it reflects various technological advancements, modifications in law and policy, as well as changing trends or movements in content moderation.

preprint2020arXiv

Response to Office of the Privacy Commissioner of Canada Consultation Proposals pertaining to amendments to PIPEDA relative to Artificial Intelligence

In February 2020, the Montreal AI Ethics Institute (MAIEI) was invited by the Office of the Privacy Commissioner of Canada (OPCC) to provide comments, both at a closed roundtable and in writing, on the OPCC consultation proposal for amendments relative to Artificial Intelligence (AI) to the Canadian privacy legislation, the Personal Information Protection and Electronic Documents Act (PIPEDA). The present document includes MAIEI's comments and recommendations in writing. Per MAIEI's mission and mandate to act as a catalyst for public feedback pertaining to AI Ethics and regulatory technology developments, as well as to provide public competence-building workshops on critical topics in such domains, the reader will also find such public feedback and propositions by Montrealers who participated at MAIEI's workshops, submitted as Schedule 1 to the present report. For each of the OPCC's 12 proposals, and underlying questions, as described on its website, MAIEI provides a short reply, a summary list of recommendations, as well as comments relevant to the question at hand. We leave you with three general statements to keep in mind while going through the next pages: 1) AI systems should be used to augment human capacity for meaningful and purposeful connections and associations, not as a substitute for trust. 2) Humans have collectively accepted to uphold the rule of law, but for machines, the code is rule. Where socio-technical systems are deployed to make important decisions, profiles or inferences about individuals, we will increasingly have to attempt the difficult exercise of drafting and encoding our law in a manner learnable by machines. 3) Let us work collectively towards a world where Responsible AI becomes the rule, before our socio-technical systems become "too connected to fail".

preprint2020arXiv

Road Grade Estimation Using Crowd-Sourced Smartphone Data

Estimates of road grade/slope can add another dimension of information to existing 2D digital road maps. Integration of road grade information will widen the scope of applications of digital maps, which are primarily used for navigation, by enabling driving safety and efficiency applications such as Advanced Driver Assistance Systems (ADAS), eco-driving, etc. The huge scale and dynamic nature of road networks make sensing road grade a challenging task. Traditional methods oftentimes suffer from limited scalability and update frequency, as well as poor sensing accuracy. To overcome these problems, we propose a cost-effective and scalable road grade estimation framework using sensor data from smartphones. Based on our understanding of the error characteristics of smartphone sensors, we intelligently combine data from the accelerometer and gyroscope with vehicle speed data from OBD-II or the smartphone's GPS to estimate road grade. To improve accuracy and robustness of the system, the estimations of road grade from multiple sources/vehicles are crowd-sourced to compensate for the effects of varying quality of sensor data from different sources. Extensive experimental evaluation on a test route of ~9km demonstrates the superior performance of our proposed method, achieving a $5\times$ improvement in road grade estimation accuracy over baselines, with 90\% of errors below 0.3$^\circ$.
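
As a rough illustration of how estimates from sources of varying quality might be combined, the sketch below uses inverse-variance weighting. This is an assumed fusion rule chosen for illustration; the abstract does not state which rule the authors use. The function name and the (grade, variance) input format are hypothetical.

```python
from typing import Iterable, Tuple


def fuse_grade_estimates(estimates: Iterable[Tuple[float, float]]) -> float:
    """Fuse road grade estimates by inverse-variance weighting.

    estimates: (grade_in_degrees, error_variance) pairs, one per source.
    Sources with a smaller variance (better sensors) get a larger weight.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for grade, variance in estimates:
        if variance <= 0.0:
            raise ValueError("variance must be positive")
        weight = 1.0 / variance
        weighted_sum += weight * grade
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one estimate is required")
    return weighted_sum / weight_total


# Two vehicles report the grade of the same road segment; the second
# report is noisier, so the fused value sits closer to the first.
fused = fuse_grade_estimates([(2.0, 1.0), (4.0, 3.0)])
```

With equal variances this reduces to a plain average; the weighting only matters when source quality differs.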

preprint2020arXiv

SECure: A Social and Environmental Certificate for AI Systems

In a world increasingly dominated by AI applications, an understudied aspect is the carbon and social footprint of these power-hungry algorithms that require copious computation and a trove of data for training and prediction. While profitable in the short-term, these practices are unsustainable and socially extractive from both a data-use and energy-use perspective. This work proposes an ESG-inspired framework combining socio-technical measures to build eco-socially responsible AI systems. The framework has four pillars: compute-efficient machine learning, federated learning, data sovereignty, and a LEEDesque certificate. Compute-efficient machine learning is the use of compressed network architectures that show marginal decreases in accuracy. Federated learning augments the first pillar's impact through the use of techniques that distribute computational loads across idle capacity on devices. This is paired with the third pillar of data sovereignty to ensure the privacy of user data via techniques like use-based privacy and differential privacy. The final pillar ties all these factors together and certifies products and services in a standardized manner on their environmental and social impacts, allowing consumers to align their purchase with their values.

preprint2020arXiv

Some Limit Properties of Markov Chains Induced by Stochastic Recursive Algorithms

Recursive stochastic algorithms have gained significant attention in the recent past due to data-driven applications. Examples include stochastic gradient descent for solving large-scale optimization problems and empirical dynamic programming algorithms for solving Markov decision problems. These recursive stochastic algorithms approximate certain contraction operators and can be viewed within the framework of iterated random operators. Accordingly, we consider iterated random operators over a Polish space that simulate an iterated contraction operator over that Polish space. Assume that the iterated random operators are indexed by certain batch sizes such that as batch sizes grow to infinity, each realization of the random operator converges (in some sense) to the contraction operator it is simulating. We show that starting from the same initial condition, the distribution of the random sequence generated by the iterated random operators converges weakly to the trajectory generated by the contraction operator. We further show that under certain conditions, the time average of the random sequence converges to the spatial mean of the invariant distribution. We then apply these results to logistic regression, empirical value iteration, and empirical Q value iteration for finite state finite action MDPs to illustrate the general theory developed here.

preprint2020arXiv

Steady-state fluctuations of a genetic feedback loop with fluctuating rate parameters using the unified colored noise approximation

A common model of stochastic auto-regulatory gene expression describes promoter switching via cooperative protein binding, effective protein production in the active state and dilution of proteins. Here we consider an extension of this model whereby colored noise with a short correlation time is added to the reaction rate parameters. We show that when the size and timescale of the noise are appropriately chosen, it accounts for fast reactions that are not explicitly modelled; e.g., in models with no mRNA description, fluctuations in the protein production rate can account for the rapid multiple stages of nuclear mRNA processing which precede translation in eukaryotes. We show how the unified colored noise approximation can be used to derive expressions for the protein number distribution that are in good agreement with stochastic simulations. We find that even when the noise in the rate parameters is small, the protein distributions predicted by our model can be significantly different from those of models assuming constant reaction rates.

preprint2020arXiv

The Ingredients of Real-World Robotic Reinforcement Learning

The success of reinforcement learning for real-world robotics has been, in many cases, limited to instrumented laboratory scenarios, often requiring arduous human effort and oversight to enable continuous learning. In this work, we discuss the elements that are needed for a robotic learning system that can continually and autonomously improve with data collected in the real world. We propose a particular instantiation of such a system, using dexterous manipulation as our case study. Subsequently, we investigate a number of challenges that come up when learning without instrumentation. In such settings, learning must be feasible without manually designed resets, using only on-board perception, and without hand-engineered reward functions. We propose simple and scalable solutions to these challenges, and then demonstrate the efficacy of our proposed system on a set of dexterous robotic manipulation tasks, providing an in-depth analysis of the challenges associated with this learning paradigm. We demonstrate that our complete system can learn without any human intervention, acquiring a variety of vision-based skills with a real-world three-fingered hand. Results and videos can be found at https://sites.google.com/view/realworld-rl/

preprint2020arXiv

The Social Contract for AI

Like any technology, AI systems come with inherent risks and potential benefits. They come with potential disruption of established norms and methods of work, societal impacts and externalities. One may think of the adoption of technology as a form of social contract, which may evolve or fluctuate in time, scale, and impact. It is important to keep in mind that for AI, meeting the expectations of this social contract is critical, because recklessly driving the adoption and implementation of unsafe, irresponsible, or unethical AI systems may trigger serious backlash against the industry and academia involved, which could take decades to resolve, if not actually seriously harm society. For the purpose of this paper, we consider that a social contract arises when there is sufficient consensus within society to adopt and implement this new technology. As such, to enable a social contract to arise for the adoption and implementation of AI, developing: 1) A socially accepted purpose, through 2) A safe and responsible method, with 3) A socially aware level of risk involved, for 4) A socially beneficial outcome, is key.

preprint2020arXiv

The State of AI Ethics Report (June 2020)

These past few months have been especially challenging, and the deployment of technology in ways hitherto untested at an unrivalled pace has left the internet and technology watchers aghast. Artificial intelligence has become the byword for technological progress and is being used in everything from helping us combat the COVID-19 pandemic to nudging our attention in different directions as we all spend increasingly large amounts of time online. It has never been more important that we keep a sharp eye on the development of this field and how it is shaping our society and interactions with each other. With this inaugural edition of the State of AI Ethics we hope to bring forward the most important developments that caught our attention at the Montreal AI Ethics Institute this past quarter. Our goal is to help you navigate this ever-evolving field swiftly and allow you and your organization to make informed decisions. This pulse-check for the state of discourse, research, and development is geared towards researchers and practitioners alike who are making decisions on behalf of their organizations in considering the societal impacts of AI-enabled solutions. We cover a wide set of areas in this report spanning Agency and Responsibility, Security and Risk, Disinformation, Jobs and Labor, the Future of AI Ethics, and more. Our staff has worked tirelessly over the past quarter surfacing signal from the noise so that you are equipped with the right tools and knowledge to confidently tread this complex yet consequential domain.

preprint2020arXiv

Unified Characterization Platform for Emerging NVM Technology: Neural Network Application Benchmarking Using off-the-shelf NVM Chips

In this paper, we present a unified FPGA-based electrical test-bench for characterizing different emerging Non-Volatile Memory (NVM) chips. In particular, we present detailed electrical characterization and benchmarking of multiple commercially available, off-the-shelf NVM chips, viz. MRAM, FeRAM, CBRAM, and ReRAM. We investigate important NVM parameters such as: (i) current consumption patterns, (ii) endurance, and (iii) error characterization. The proposed FPGA-based test-bench is then utilized for a Proof-of-Concept (PoC) Neural Network (NN) image classification application. Four emerging NVM chips are benchmarked against standard SRAM and Flash technology for the AI application as active weight memory during inference mode.

preprint2020arXiv

Unsupervised Meta-Learning for Reinforcement Learning

Meta-learning algorithms use past experience to learn to quickly solve new tasks. In the context of reinforcement learning, meta-learning algorithms acquire reinforcement learning procedures to solve new problems more efficiently by utilizing experience from prior tasks. The performance of meta-learning algorithms depends on the tasks available for meta-training: in the same way that supervised learning generalizes best to test points drawn from the same distribution as the training points, meta-learning methods generalize best to tasks from the same distribution as the meta-training tasks. In effect, meta-reinforcement learning offloads the design burden from algorithm design to task design. If we can automate the process of task design as well, we can devise a meta-learning algorithm that is truly automated. In this work, we take a step in this direction, proposing a family of unsupervised meta-learning algorithms for reinforcement learning. We motivate and describe a general recipe for unsupervised meta-reinforcement learning, and present an instantiation of this approach. Our conceptual and theoretical contributions consist of formulating the unsupervised meta-reinforcement learning problem and describing how task proposals based on mutual information can be used to train optimal meta-learners. Our experimental results indicate that unsupervised meta-reinforcement learning effectively acquires accelerated reinforcement learning procedures without the need for manual task design and these procedures exceed the performance of learning from scratch.

preprint2017arXiv

Addressing Expensive Multi-objective Games with Postponed Preference Articulation via Memetic Co-evolution

This paper presents algorithmic and empirical contributions demonstrating that the convergence characteristics of a co-evolutionary approach to tackle Multi-Objective Games (MOGs) with postponed preference articulation can often be hampered due to the possible emergence of the so-called Red Queen effect. Accordingly, it is hypothesized that the convergence characteristics can be significantly improved through the incorporation of memetics (local solution refinements as a form of lifelong learning), as a promising means of mitigating (or at least suppressing) the Red Queen phenomenon by providing a guiding hand to the purely genetic mechanisms of co-evolution. Our practical motivation is to address MOGs of a time-sensitive nature that are characterized by computationally expensive evaluations, wherein there is a natural need to reduce the total number of true function evaluations consumed in achieving good quality solutions. To this end, we propose novel enhancements to co-evolutionary approaches for tackling MOGs, such that memetic local refinements can be efficiently applied on evolved candidate strategies by searching on computationally cheap surrogate payoff landscapes (that preserve postponed preference conditions). The efficacy of the proposal is demonstrated on a suite of test MOGs that have been designed.