Source author record

Abhishek Gupta

Abhishek Gupta appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Catalog footprint

What is connected

74 works

31 topics

4 close collaborators


preprint, 2026, arXiv

Learning from Disagreement: Clinician Overrides as Implicit Preference Signals for Clinical AI in Value-Based Care

We reframe clinician overrides of clinical AI recommendations as implicit preference data: the same signal structure exploited by reinforcement learning from human feedback (RLHF), but richer, because the annotator is a domain expert, the alternatives carry real consequences, and downstream outcomes are observable. We present a formal framework extending standard preference learning with three contributions: a five-category override taxonomy mapping override types to distinct model update targets; a preference formulation conditioned on patient state s, organizational context c, and clinician capability kappa, where kappa decomposes into execution capability kappa-exec and alignment capability kappa-align; and a dual learning architecture that jointly trains a reward model and a capability model via alternating optimization, preventing a failure mode we term suppression bias, the systematic suppression of correct-but-difficult recommendations when clinician capability falls below the execution threshold. We argue that chronic disease management under outcome-based payment contracts produces override data with uniquely favorable properties (longitudinal density, concentrated decision space, outcome labels, and natural capability variation) and that training environments combining longitudinal outcome measurement with aligned financial incentives are a necessary condition for learning a reward model aligned with patient trajectory rather than with encounter economics. This framework emerged from operational work to improve clinician capability in a live value-based care deployment.

preprint, 2026, arXiv

Learning to Compress Time-to-Control: A Reinforcement Learning Framework for Chronic Disease Management

Reinforcement learning (RL) in healthcare has had mixed results, with reward sparsity, unreliable off-policy evaluation, and the deployment-simulation gap as recurring failure modes. We argue that chronic disease management is structurally a more tractable RL setting than the acute-care problems the field has primarily studied, but only if the problem is formalized to exploit chronic care's properties. We propose such a formalization. The agent's objective is to compress time-to-control (TTC) under a tiered reward calibrated to the CMS ACCESS Model. Two quantities from our companion preference-learning paper [Singh et al. 2026] enter as load-bearing structural elements: the execution intensity ε bounds action availability under a constrained Markov Decision Process, and the clinician capability κ weights offline-data transitions during RL training. Together they couple preference learning and RL into a two-loop architecture. We present simulation results on synthetic state machines for hypertension and type 2 diabetes (T2D). Capability-weighted offline RL outperforms uniform-weighted offline RL and the behavior policy by 15 percentage points on T2D TTC; the uniform-weighted formulation (the standard in existing healthcare RL) underperforms even the heterogeneous behavior policy. ε-aware policies generalize across deployment regimes while ε-naive policies do not.

preprint, 2026, arXiv

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

Generative control policies (GCPs), such as diffusion- and flow-based control policies, have emerged as effective parameterizations for robot learning. This work introduces Off-policy Generative Policy Optimization (OGPO), a sample-efficient algorithm for finetuning GCPs that maintains off-policy critic networks to maximize data reuse and propagates policy gradients through the full generative process of the policy via a modified PPO objective, using critics as the terminal reward. OGPO achieves state-of-the-art performance on manipulation tasks spanning multi-task settings, high-precision insertion, and dexterous control. To our knowledge, it is also the only method that can fine-tune poorly-initialized behavior cloning policies to near full task-success with no expert data in the online replay buffer, and does so with little task-specific hyperparameter tuning. Through extensive empirical investigations, we demonstrate that OGPO drastically outperforms alternative methods on policy steering and learning residual corrections, and identify the key mechanisms behind its performance. We further introduce practical stabilizers, including success-buffer regularization, conservative advantages, $\chi^2$ regularization, and Q-variance reduction, to mitigate critic over-exploitation across state- and pixel-based settings. Beyond proposing OGPO, we conduct a systematic empirical study of GCP finetuning, identifying the stabilizing mechanisms and failure modes that govern successful off-policy full-policy improvement.

preprint, 2026, arXiv

TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning

Fine-tuning pre-trained robot policies with reinforcement learning (RL) often inherits the bottlenecks introduced by pre-training with behavioral cloning (BC), which produces narrow action distributions that lack the coverage necessary for downstream exploration. We present a unified framework that provides the exploration needed for efficient robot policy finetuning by bridging BC pre-training and RL fine-tuning. Our pre-training method, Context-Smoothed Pre-training (CSP), injects forward-diffusion noise into policy inputs, creating a continuum between precise imitation and broad action coverage. We then fine-tune pre-trained policies via Timestep-Modulated Reinforcement Learning (TMRL), which trains the agent to dynamically adjust this conditioning during fine-tuning by modulating the diffusion timestep, granting explicit control over exploration. TMRL integrates seamlessly with arbitrary policy inputs, e.g., states, 3D point clouds, or image-based VLA policies, and we show that it improves RL fine-tuning sample efficiency. Notably, TMRL enables successful real-world fine-tuning on complex manipulation tasks in under one hour. Videos and code are available at https://weirdlabuw.github.io/tmrl/.

preprint, 2025, arXiv

PolaRiS: Scalable Real-to-Sim Evaluations for Generalist Robot Policies

A significant challenge for robot learning research is our ability to accurately measure and compare the performance of robot policies. Benchmarking in robotics is historically challenging due to the stochasticity, limited reproducibility, and time-consuming nature of real-world rollouts. This challenge is exacerbated for recent generalist policies, which have to be evaluated across a wide variety of scenes and tasks. Evaluation in simulation offers a scalable complement to real-world evaluations, but the visual and physical domain gap between existing simulation benchmarks and the real world has made them an unreliable signal for policy improvement. Furthermore, building realistic and diverse simulated environments has traditionally required significant human effort and expertise. To bridge the gap, we introduce Policy Evaluation and Environment Reconstruction in Simulation (PolaRiS), a scalable real-to-sim framework for high-fidelity simulated robot evaluation. PolaRiS utilizes neural reconstruction methods to turn short video scans of real-world scenes into interactive simulation environments. Additionally, we develop a simple simulation data co-training recipe that bridges remaining real-to-sim gaps and enables zero-shot evaluation in unseen simulation environments. Through extensive paired evaluations between simulation and the real world, we demonstrate that PolaRiS evaluations provide a much stronger correlation to real-world generalist policy performance than existing simulated benchmarks. Its simplicity also enables rapid creation of diverse simulated environments. As such, this work takes a step towards distributed and democratized evaluation for the next generation of robotic foundation models.

preprint, 2023, arXiv

Multiform Evolution for High-Dimensional Problems with Low Effective Dimensionality

In this paper, we scale evolutionary algorithms to high-dimensional optimization problems that deceptively possess a low effective dimensionality (certain dimensions do not significantly affect the objective function). To this end, an instantiation of the multiform optimization paradigm is presented, where multiple low-dimensional counterparts of a target high-dimensional task are generated via random embeddings. Since the exact relationship between the auxiliary (low-dimensional) tasks and the target is a priori unknown, a multiform evolutionary algorithm is developed for unifying all formulations into a single multi-task setting. The resultant joint optimization enables the target task to efficiently reuse solutions evolved across various low-dimensional searches via cross-form genetic transfers, hence speeding up overall convergence characteristics. To validate the overall efficacy of our proposed algorithmic framework, comprehensive experimental studies are carried out on well-known continuous benchmark functions as well as a set of practical problems in the hyper-parameter tuning of machine learning models and deep learning models in classification tasks and Predator-Prey games, respectively.
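The random-embedding idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's multiform evolutionary algorithm: the objective, the dimensions, the Gaussian embedding matrix and the simple random search are all assumptions chosen for illustration, and the cross-form genetic transfer between tasks is omitted. Each low-dimensional task searches over a vector y and evaluates the high-dimensional objective at x = A y.

```python
import random


def make_embedding(high_dim, low_dim, rng):
    """Random Gaussian matrix A (high_dim x low_dim); an assumed embedding choice."""
    return [[rng.gauss(0.0, 1.0) for _ in range(low_dim)] for _ in range(high_dim)]


def embed(matrix, y):
    """Map a low-dimensional point y to the high-dimensional point x = A y."""
    return [sum(a * b for a, b in zip(row, y)) for row in matrix]


def objective(x):
    """Hypothetical high-dimensional objective: only x[0] and x[1] matter."""
    return (x[0] - 1.0) ** 2 + (x[1] + 0.5) ** 2


def low_dim_task(matrix):
    """One low-dimensional counterpart of the target task."""
    return lambda y: objective(embed(matrix, y))


def random_search(task, low_dim, rng, iters=2000, step=0.3):
    """Stand-in optimizer (not an evolutionary algorithm): keep improving moves only."""
    best_y = [0.0] * low_dim
    best_val = task(best_y)
    for _ in range(iters):
        cand = [v + rng.gauss(0.0, step) for v in best_y]
        val = task(cand)
        if val < best_val:
            best_y, best_val = cand, val
    return best_y, best_val


if __name__ == "__main__":
    rng = random.Random(0)
    # Several independent embeddings give several formulations of the same task.
    results = []
    for _ in range(3):
        A = make_embedding(high_dim=100, low_dim=4, rng=rng)
        y, val = random_search(low_dim_task(A), 4, rng)
        results.append(val)
    print("best objective value per embedding:", results)
```

Searching in 4 dimensions instead of 100 is what makes the problem cheap here; a multiform method would additionally share promising solutions across the different embeddings rather than running them independently.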

preprint, 2023, arXiv

Pendulum Actuated Spherical Robot: Dynamic Modeling & Analysis for Wobble & Precession

A spherical robot has many practical advantages, as all of its electronics are protected within a hull and it can be carried easily by any Unmanned Aerial Vehicle (UAV). However, its use is limited by the difficulty of finding mounts for sensors. A pendulum-actuated spherical robot provides space for mounting sensors at the yoke. We study the non-linear dynamics of a pendulum-actuated spherical robot to analyze the dynamics of the internal assembly (yoke) for mounting sensors. For such robots, we provide a coupled dynamic model that accounts for the relationship between forward and sideways motion. We further demonstrate the effects of wobbling and precession captured by our model when the robot is controlled to execute a turning maneuver while moving with a moderate forward velocity, a practical situation encountered by spherical robots moving in an indoor setting. A simulation setup based on the developed model provides visualization of the spherical robot motion.

preprint, 2023, arXiv

Wobble control of a pendulum actuated spherical robot

Spherical robots can conduct surveillance in hostile, cluttered environments without being damaged, as their protective shell can safely house sensors such as cameras. However, lateral oscillations, also known as wobble, occur when these sphere-shaped robots operate at low speeds, leading to shaky camera feedback. These oscillations in a pendulum-actuated spherical robot are caused by the coupling between the forward and steering motions due to nonholonomic constraints. Designing a controller to limit wobbling in these robots is challenging due to their underactuated nature. We propose a model-based controller to navigate a pendulum-actuated spherical robot using wobble-free turning maneuvers consisting of circular arcs and straight lines. The model is developed using Lagrange-D'Alembert equations and accounts for the coupled forward and steering motions. The model is further analyzed to derive expressions for radius of curvature, precession rate, wobble amplitude, and wobble frequency during circular motions. Finally, we design an input-output feedback linearization-based controller to control the robot's heading direction and wobble. Overall, the proposed controller enables a teleoperator to command a specific forward velocity and pendulum angle as per the desired turning radius while limiting the robot's lateral oscillations to enhance the quality of camera feedback.

preprint, 2022, arXiv

Autonomous Reinforcement Learning: Formalism and Benchmarking

Reinforcement learning (RL) provides a naturalistic framing for learning through trial and error, which is appealing both because of its simplicity and effectiveness and because of its resemblance to how humans and animals acquire skills through experience. However, real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world, whereas common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts. This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms, such as robots. In this paper, we aim to address this discrepancy by laying out a framework for Autonomous Reinforcement Learning (ARL): reinforcement learning where the agent not only learns through its own experience, but also contends with lack of human supervision to reset between trials. We introduce a simulated benchmark EARL around this framework, containing a set of diverse and challenging simulated tasks reflective of the hurdles introduced to learning when only a minimal reliance on extrinsic intervention can be assumed. We show that standard approaches to episodic RL and existing approaches struggle as interventions are minimized, underscoring the need for developing new algorithms for reinforcement learning with a greater focus on autonomy.

preprint, 2022, arXiv

Combining Varied Learners for Binary Classification using Stacked Generalization

Machine Learning offers various learning algorithms, each better than the others in some respect, but a common problem that all algorithms suffer from is training data with a very high-dimensional feature set. This usually leads algorithms into generalization error that degrades performance. This can be addressed using an Ensemble Learning method known as Stacking, also termed Stacked Generalization. In this paper we perform binary classification using Stacked Generalization on a high-dimensional Polycystic Ovary Syndrome dataset and show that the model generalizes better and that metrics improve significantly. The paper reports the various metrics and also points out a subtle transgression found with the Receiver Operating Characteristic curve that was proved to be incorrect.

preprint, 2022, arXiv

Demonstration-Bootstrapped Autonomous Practicing via Multi-Task Reinforcement Learning

Reinforcement learning systems have the potential to enable continuous improvement in unstructured environments, leveraging data collected autonomously. However, in practice these systems require significant amounts of instrumentation or human intervention to learn in the real world. In this work, we propose a system for reinforcement learning that leverages multi-task reinforcement learning bootstrapped with prior data to enable continuous autonomous practicing, minimizing the number of resets needed while being able to learn temporally extended behaviors. We show how appropriately provided prior data can help bootstrap both low-level multi-task policies and strategies for sequencing these tasks one after another to enable learning with minimal resets. This mechanism enables our robotic system to practice with minimal human intervention at training time while being able to solve long horizon tasks at test time. We show the efficacy of the proposed system on a challenging kitchen manipulation task both in simulation and in the real world, demonstrating the ability to practice autonomously in order to solve temporally extended problems.

preprint, 2022, arXiv

Detection of Tool based Edited Images from Error Level Analysis and Convolutional Neural Network

Image forgery is a problem in image forensics, and its detection can be approached using Deep Learning. In this paper we present an approach for distinguishing authentic images from images tampered with using image editing tools, based on Error Level Analysis and a Convolutional Neural Network. The process is performed on the CASIA ITDE v2 dataset, with training for 50 and 100 epochs respectively. The respective accuracies on the training and validation sets are shown in graphs.

preprint, 2022, arXiv

Discriminant Analysis in Contrasting Dimensions for Polycystic Ovary Syndrome Prognostication

Many prognostication methodologies have been formulated for early detection of Polycystic Ovary Syndrome, also known as PCOS, using Machine Learning. PCOS detection is a binary classification problem. Dimensionality Reduction methods impact the performance of Machine Learning to a great extent, and using a Supervised Dimensionality Reduction method can give us a new edge in tackling this problem. In this paper we present Discriminant Analysis in different dimensions with Linear and Quadratic forms for binary classification, along with metrics. We achieved good accuracy and less variation with Discriminant Analysis compared to many commonly used classification algorithms, with training accuracy reaching 97.37% and testing accuracy of 95.92% using Quadratic Discriminant Analysis. The paper also analyzes the data with visualizations for a deeper understanding of the problem.

preprint, 2022, arXiv

Effects of Parametric and Non-Parametric Methods on High Dimensional Sparse Matrix Representations

Semantics derived from textual data provide representations for Machine Learning algorithms. These representations take the form of an interpretable high-dimensional sparse matrix that is given as input to the machine learning algorithms. Since learning methods are broadly classified as parametric and non-parametric, in this paper we examine the effects of these types of algorithms on high-dimensional sparse matrix representations. To derive the representations from the text data, we use the TF-IDF representation, for reasons given in the paper. We form representations of 50, 100, 500, 1000 and 5000 dimensions, over which we perform classification using Linear Discriminant Analysis and Naive Bayes as parametric learning methods, and Decision Tree and Support Vector Machines as non-parametric learning methods. We then report the metrics for every dimension of the representation and detail the effect of every algorithm.
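As a rough illustration of how a TF-IDF representation with a fixed number of dimensions can be built, here is a minimal pure-Python sketch. The whitespace tokenizer, the smoothed IDF formula, and the rule of keeping the terms with the highest document frequency are assumptions made for this example; the abstract does not state which variant the paper uses.

```python
import math
from collections import Counter


def tfidf_matrix(docs, max_features):
    """Return (vocab, rows); each row is a sparse dict {column index: weight}."""
    # Assumption: lowercase whitespace tokenization.
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    # Assumption: cap dimensionality by keeping the terms with the highest
    # document frequency (ties broken alphabetically).
    vocab = sorted(doc_freq, key=lambda t: (-doc_freq[t], t))[:max_features]
    index = {term: i for i, term in enumerate(vocab)}

    n_docs = len(docs)
    rows = []
    for tokens in tokenized:
        counts = Counter(t for t in tokens if t in index)
        total = sum(counts.values())
        row = {}
        for term, count in counts.items():
            # Smoothed inverse document frequency (one common variant).
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            row[index[term]] = (count / total) * idf
        rows.append(row)
    return vocab, rows


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat down", "a bird flew"]
    vocab, rows = tfidf_matrix(docs, max_features=2)
    print(vocab)
    print(rows)
```

Storing each row as a dictionary keeps only the nonzero entries, which is the point of a sparse representation: with 5000 columns, most documents would touch only a small fraction of them.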

preprint, 2022, arXiv

Forged Image Detection using SOTA Image Classification Deep Learning Methods for Image Forensics with Error Level Analysis

Advances in computer vision have been brought about by deep learning mechanisms. Image forensics is one of the major application areas of computer vision. Forgery of images is a sub-category of image forensics and can be detected using Error Level Analysis. Using such images as input, this becomes a binary classification problem that can be tackled using variations of convolutional neural networks. In this paper we perform transfer learning with state-of-the-art image classification models over the error level analysis induced CASIA ITDE v.2 dataset. The algorithms used are VGG-19, Inception-V3, ResNet-152-V2, XceptionNet and EfficientNet-V2L, presented with their respective methodologies and results.

preprint, 2022, arXiv

HAA4D: Few-Shot Human Atomic Action Recognition via 3D Spatio-Temporal Skeletal Alignment

Human actions involve complex pose variations and their 2D projections can be highly ambiguous. Thus 3D spatio-temporal or 4D (i.e., 3D+T) human skeletons, which are photometric and viewpoint invariant, are an excellent alternative to 2D+T skeletons/pixels to improve action recognition accuracy. This paper proposes a new 4D dataset, HAA4D, which consists of more than 3,300 RGB videos in 300 human atomic action classes. HAA4D is clean, diverse, and class-balanced, and each class is viewpoint-balanced through the use of 4D skeletons, so that as few as one 4D skeleton per class is sufficient for training a deep recognition model. Further, the choice of atomic actions makes annotation even easier, because each video clip lasts for only a few seconds. All training and testing 3D skeletons in HAA4D are globally aligned to the same global space using a deep alignment model, making each skeleton face the negative z-direction. Such alignment makes matching skeletons more stable by reducing intraclass variations, and thus fewer training samples per class are needed for action recognition. Given the high diversity and skeletal alignment in HAA4D, we construct the first baseline few-shot 4D human atomic action recognition network without bells and whistles, which produces comparable or higher performance than relevant state-of-the-art techniques relying on embedded space encoding without explicit skeletal alignment, using the same small number of training samples of unseen classes.

preprint, 2022, arXiv

Half a Dozen Real-World Applications of Evolutionary Multitasking, and More

Until recently, the potential to transfer evolved skills across distinct optimization problem instances (or tasks) was seldom explored in evolutionary computation. The concept of evolutionary multitasking (EMT) fills this gap. It unlocks a population's implicit parallelism to jointly solve a set of tasks, hence creating avenues for skills transfer between them. Despite it being early days, the idea of EMT has begun to show promise in a range of real-world applications. In the backdrop of recent advances, the contribution of this paper is twofold. First, a review of several application-oriented explorations of EMT in the literature is presented; the works are assimilated into half a dozen broad categories according to their respective application domains. Each of these six categories elaborates fundamental motivations to multitask, and contains a representative experimental study (referred from the literature). Second, a set of recipes is provided showing how problem formulations of general interest, those that cut across different disciplines, could be transformed in the new light of EMT. Our discussions emphasize the many practical use-cases of EMT, and are intended to spark future research towards crafting novel algorithms for real-world deployment.

preprint, 2022, arXiv

Incentive Design and Profit Sharing in Multi-modal Transportation Network

We consider the situation where multiple transportation service providers cooperate to offer an integrated multi-modal platform to enhance the convenience to the passengers through ease in multi-modal journey planning, payment, and first and last mile connectivity. This market structure allows the multi-modal platform to coordinate profits across modes and also provide incentives to the passengers. Accordingly, in this paper, we use cooperative game theory coupled with the hyperpath-based stochastic user equilibrium framework to study such a market. We assume that the platform sets incentives (price discount or excess charge on passengers) along every edge in the transportation network. We derive the continuity and monotonicity properties of the equilibrium flow with respect to the incentives along every edge. The optimal incentives that maximize the profit of the platform are obtained through a two time-scale stochastic approximation algorithm. We use the asymmetric Nash bargaining solution to design a fair profit sharing scheme among the service providers. We show that the profit for each service provider increases after cooperation on such a platform. Finally, we complement the theoretical results through two numerical simulations.

preprint, 2022, arXiv

Interference Constrained Beam Alignment for Time-Varying Channels via Kernelized Bandits

To fully utilize the abundant spectrum resources in millimeter wave (mmWave), Beam Alignment (BA) is necessary for large antenna arrays to achieve large array gains. In practical dynamic wireless environments, channel modeling is challenging due to time-varying and multipath effects. In this paper, we formulate the beam alignment problem as a non-stationary online learning problem with the objective to maximize the received signal strength under interference constraint. In particular, we employ the non-stationary kernelized bandit to leverage the correlation among beams and model the complex beamforming and multipath channel functions. Furthermore, to mitigate interference to other user equipment, we leverage the primal-dual method to design a constrained UCB-type kernelized bandit algorithm. Our theoretical analysis indicates that the proposed algorithm can adaptively adjust the beam in time-varying environments, such that both the cumulative regret of the received signal and constraint violations have sublinear bounds with respect to time. This result is of independent interest for applications such as adaptive pricing and news ranking. In addition, the algorithm assumes the channel is a black-box function and does not require any prior knowledge for dynamic channel modeling, and thus is applicable in a variety of scenarios. We further show that if the information about the channel variation is known, the algorithm will have better theoretical guarantees and performance. Finally, we conduct simulations to highlight the effectiveness of the proposed algorithm.

preprint, 2022, arXiv

Performance Comparison of Simple Transformer and Res-CNN-BiLSTM for Cyberbullying Classification

The task of text classification using Bidirectional LSTM based architectures is computationally expensive and time consuming to train. To address this, transformers were introduced, which give good performance compared to traditional deep learning architectures. In this paper we present a performance-based comparison between a simple transformer-based network and a Res-CNN-BiLSTM-based network for the cyberbullying text classification problem. The results show that the transformer we trained, with 0.65 million parameters, significantly outperforms the Res-CNN-BiLSTM with 48.82 million parameters, with faster training speeds and more generalized metrics. The paper also compares the 1-dimensional character-level embedding network and the 100-dimensional GloVe embedding network with the transformer.

preprint, 2022, arXiv

Prediction of Maneuvering Status for Aerial Vehicles using Supervised Learning Methods

Aerial Vehicles follow a guided approach based on Latitude, Longitude and Altitude. This information can be used to calculate the maneuvering status of aerial vehicles along their trajectory. This is a binary classification problem, and Machine Learning can be leveraged to solve such a problem. In this paper we present a methodology for deriving maneuvering status and predicting it using Linear, Distance Metric, Discriminant Analysis and Boosting Ensemble supervised learning methods. In the results section we provide various metrics that give a condensed comparison of the algorithms for predicting the maneuvering status.

preprint, 2022, arXiv

Requisites on viscoelasticity for exceptional points in passive elastodynamic metamaterials

The recent progress of non-Hermitian physics and the notion of exceptional point (EP) degeneracies in elastodynamics has led to the development of novel metamaterials for the control of elastic wave propagation, hypersensitive sensors, and actuators. The emergence of EPs in a Parity-Time symmetric system relies on judiciously engineered balanced gain and loss mechanisms. Creating gain requires complex circuits and amplification mechanisms, making engineering applications challenging. Here, we report strategies to achieve EPs in passive non-Hermitian elastodynamic systems with differential loss derived from viscoelastic materials. We compare different viscoelastic material models and show that the EP emerges only when the frequency-dependent loss tangent of the viscoelastic material remains nearly constant in the frequency range of operation. Such a loss tangent occurs in materials that undergo stress relaxation over a broad spectrum of relaxation times, for example, materials that follow the Kelvin-Voigt fractional derivative (KVFD) model. Using dynamic mechanical analysis, we show that a few common viscoelastic elastomers such as Polydimethylsiloxane (PDMS) and polyurethane rubber follow the KVFD behavior such that the loss tangent becomes almost constant after a particular frequency. The material models we present and the demonstration of the potential of a widely available material system in creating EPs pave the way for developing non-Hermitian metamaterials with hypersensitivity to perturbations or enhanced emissivity.

preprint, 2022, arXiv

Res-CNN-BiLSTM Network for overcoming Mental Health Disturbances caused due to Cyberbullying through Social Media

Mental health disturbance has many causes, and cyberbullying, which exploits social media as an instrument, is one of the major ones. Cyberbullying is carried out on the basis of religion, ethnicity, age and gender, which makes it a sensitive psychological issue. It can be addressed using Natural Language Processing with Deep Learning, since social media is the medium and it generates massive amounts of textual data. Such data can be leveraged to find the semantics and to derive what type of cyberbullying is taking place and who is involved, so that early measures can be taken. Since deriving semantics is essential, we propose a hybrid deep learning model named 1-Dimensional CNN-Bidirectional-LSTMs with Residuals, known for short as Res-CNN-BiLSTM. In this paper we propose the architecture and compare its performance with different approaches of Embedding Deep Learning Algorithms.

preprint, 2022, arXiv

Residual-Concatenate Neural Network with Deep Regularization Layers for Binary Classification

Many complex Deep Learning models are used with different variations for various prognostication tasks. A higher number of learnable parameters does not necessarily ensure greater accuracy. This can be addressed by modifying very deep models with many regularization-based techniques. In this paper we train a deep neural network that uses many regularization layers with residual and concatenation processes for best fit on Polycystic Ovary Syndrome diagnosis prognostication. The network was built with improvements from every step of failure to meet the needs of the data and achieves an accuracy of 99.3%.

preprint, 2022, arXiv

Safe Model-based Off-policy Reinforcement Learning for Eco-Driving in Connected and Automated Hybrid Electric Vehicles

Connected and Automated Hybrid Electric Vehicles have the potential to reduce fuel consumption and travel time in real-world driving conditions. The eco-driving problem seeks to design optimal speed and power usage profiles based upon look-ahead information from connectivity and advanced mapping features. Recently, Deep Reinforcement Learning (DRL) has been applied to the eco-driving problem. While the previous studies synthesize simulators and model-free DRL to reduce online computation, this work proposes a Safe Off-policy Model-Based Reinforcement Learning algorithm for the eco-driving problem. The advantages over the existing literature are three-fold. First, the combination of off-policy learning and the use of a physics-based model improves the sample efficiency. Second, the training does not require any extrinsic rewarding mechanism for constraint satisfaction. Third, the feasibility of trajectory is guaranteed by using a safe set approximated by deep generative models. The performance of the proposed method is benchmarked against a baseline controller representing human drivers, a previously designed model-free DRL strategy, and the wait-and-see optimal solution. In simulation, the proposed algorithm leads to a policy with a higher average speed and a better fuel economy compared to the model-free agent. Compared to the baseline controller, the learned strategy reduces the fuel consumption by more than 21% while keeping the average speed comparable.

preprint2022arXiv

Scalable Transfer Evolutionary Optimization: Coping with Big Task Instances

In today's digital world, we are faced with an explosion of data and models produced and manipulated by numerous large-scale cloud-based applications. Under such settings, existing transfer evolutionary optimization frameworks grapple with simultaneously satisfying two important quality attributes, namely (1) scalability against a growing number of source tasks and (2) online learning agility against sparsity of relevant sources to the target task of interest. Satisfying these attributes shall facilitate practical deployment of transfer optimization to scenarios with big task-instances, while curbing the threat of negative transfer. While applications of existing algorithms are limited to tens of source tasks, in this paper, we take a quantum leap forward in enabling more than two orders of magnitude scale-up in the number of tasks; i.e., we efficiently handle scenarios beyond 1000 source task-instances. We devise a novel transfer evolutionary optimization framework comprising two co-evolving species for joint evolutions in the space of source knowledge and in the search space of solutions to the target problem. In particular, co-evolution enables the learned knowledge to be orchestrated on the fly, expediting convergence in the target optimization task. We have conducted an extensive series of experiments across a set of practically motivated discrete and continuous optimization examples comprising a large number of source task-instances, of which only a small fraction indicate source-target relatedness. The experimental results show that not only does our proposed framework scale efficiently with a growing number of source tasks but is also effective in capturing relevant knowledge against sparsity of related sources, fulfilling the two salient features of scalability and online learning agility.

preprint2022arXiv

State of AI Ethics Report (Volume 6, February 2022)

This report from the Montreal AI Ethics Institute (MAIEI) covers the most salient progress in research and reporting over the second half of 2021 in the field of AI ethics. Particular emphasis is placed on an "Analysis of the AI Ecosystem", "Privacy", "Bias", "Social Media and Problematic Information", "AI Design and Governance", "Laws and Regulations", "Trends", and other areas covered in the "Outside the Boxes" section. The two AI spotlights feature application pieces on "Constructing and Deconstructing Gender with AI-Generated Art" as well as "Will an Artificial Intellichef be Cooking Your Next Meal at a Michelin Star Restaurant?". Given MAIEI's mission to democratize AI, submissions from external collaborators have featured, such as pieces on the "Challenges of AI Development in Vietnam: Funding, Talent and Ethics" and using "Representation and Imagination for Preventing AI Harms". The report is a comprehensive overview of what the key issues in the field of AI ethics were in 2021, what trends are emergent, what gaps exist, and a peek into what to expect from the field of AI ethics in 2022. It is a resource for researchers and practitioners alike in the field to set their research and development agendas to make contributions to the field of AI ethics.

preprint2022arXiv

Succinct Differentiation of Disparate Boosting Ensemble Learning Methods for Prognostication of Polycystic Ovary Syndrome Diagnosis

Prognostication of medical problems from clinical data, using Machine Learning techniques with high precision, is one of the most important real-world challenges at present. Polycystic Ovary Syndrome (PCOS) is an emerging problem in women aged 15 to 49. In this paper we diagnose this disorder using various Boosting Ensemble Methods. We present a detailed and compendious comparison of Adaptive Boost, Gradient Boosting Machine, XGBoost and CatBoost, reporting their respective performance metrics and highlighting hidden anomalies in the data and their effects on the results. The metrics used are Confusion Matrix, Precision, Recall, F1 Score, FPR, RoC Curve and AUC.
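As a rough illustration of how several of the metrics named in this abstract relate to one another, the sketch below derives Precision, Recall, F1 Score and FPR from the four cells of a binary confusion matrix. The function names and the toy labels are hypothetical and are not taken from the paper; the formulas are the standard textbook definitions.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts; empty denominators give 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


# Toy labels, purely illustrative (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
metrics = binary_metrics(*counts)
```

RoC Curve and AUC are omitted here because they need predicted scores across thresholds rather than a single set of hard labels.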

preprint2022arXiv

Superior mechanical properties by exploiting size-effects and multiscale interactions in hierarchically architected foams

Protective applications in extreme environments demand thermally stable materials with superior modulus, strength, and specific energy absorption (SEA) at lightweight. However, these properties typically have a trade-off. Hierarchically architected materials--such as the architected vertically aligned carbon nanotube (VACNT) foams--offer the potential to overcome these trade-offs to achieve synergistic enhancement in mechanical properties. Here, we adopt a full-factorial design of experiments (DOE) approach to optimize multitier design parameters to achieve synergistic enhancement in SEA, strength, and modulus at lightweight in VACNT foams with mesoscale cylindrical architecture. We exploit the size effects from geometrically-confined synthesis and the highly interactive morphology of CNTs to enable higher-order design parameter interactions that intriguingly break the diameter-to-thickness (D/t)-dependent scaling laws found in common tubular architected materials. We show that exploiting complementary hierarchical mechanisms in architected material design can lead to unprecedented synergistic enhancement of mechanical properties and performance desirable for extreme protective applications.

preprint2022arXiv

The Golden Circle: Creating Socio-technical Alignment in Content Moderation

This paper outlines a conceptual framework titled The Golden Circle that describes the roles of actors at individual, organizational, and societal levels, and their dynamics in the content moderation ecosystem. Centering harm reduction and context moderation, it argues that the ML community must attend to multimodal content moderation solutions, align their work with their organizations' goals and values, and pay attention to the ever changing social contexts in which their sociotechnical systems are embedded. This is done by accounting for the why, how, and what of content moderation from a sociological and technical lens.

preprint2022arXiv

Weighted Gaussian Process Bandits for Non-stationary Environments

In this paper, we consider the Gaussian process (GP) bandit optimization problem in a non-stationary environment. To capture external changes, the black-box function is allowed to be time-varying within a reproducing kernel Hilbert space (RKHS). To this end, we develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression. A key challenge is how to cope with infinite-dimensional feature maps. To that end, we leverage kernel approximation techniques to prove a sublinear regret bound, which is the first (frequentist) sublinear regret guarantee on weighted time-varying bandits with general nonlinear rewards. This result generalizes both non-stationary linear bandits and standard GP-UCB algorithms. Further, a novel concentration inequality is achieved for weighted Gaussian process regression with general weights. We also provide universal upper bounds and weight-dependent upper bounds for weighted maximum information gains. These results are of independent interest for applications such as news ranking and adaptive pricing, where weights can be adopted to capture the importance or quality of data. Finally, we conduct experiments to highlight the favorable gains of the proposed algorithm in many cases when compared to existing methods.

preprint2021arXiv

An Algorithm to Warm Start Perturbed (WASP) Constrained Dynamic Programs

Receding horizon optimal control problems compute the solution at each time step to operate the system on a near-optimal path. However, in many practical cases, the boundary conditions, such as external inputs, constraint equations, or the objective function, vary only marginally from one time step to the next. In this case, recomputing the optimal solution at each time step represents a significant burden for real-time applications. This paper proposes a novel algorithm to approximately solve a perturbed constrained dynamic program that significantly reduces the computational burden when the objective function and the constraints are perturbed slightly. The method hinges on determining closed-form expressions for first-order perturbations in the optimal strategy and the Lagrange multipliers of the perturbed constrained dynamic programming problem. This information can be used to initialize any algorithm (such as the method of Lagrange multipliers, or the augmented Lagrangian method) to solve the perturbed dynamic programming problem with minimal computational resources.

preprint2021arXiv

Convergence of Recursive Stochastic Algorithms using Wasserstein Divergence

This paper develops a unified framework, based on iterated random operator theory, to analyze the convergence of constant stepsize recursive stochastic algorithms (RSAs). RSAs use randomization to efficiently compute expectations, and so their iterates form a stochastic process. The key idea of our analysis is to lift the RSA into an appropriate higher-dimensional space and then express it as an equivalent Markov chain. Instead of determining the convergence of this Markov chain (which may not converge under constant stepsize), we study the convergence of the distribution of this Markov chain. To study this, we define a new notion of Wasserstein divergence. We show that if the distribution of the iterates in the Markov chain satisfies a contraction property with respect to the Wasserstein divergence, then the Markov chain admits an invariant distribution. We show that convergence of a large family of constant stepsize RSAs can be understood using this framework, and we provide several detailed examples.
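To make the distinction between convergence of the iterates and convergence of their distribution concrete, here is a minimal sketch of one very simple constant stepsize recursion: a running estimate of a mean, x <- x + alpha * (sample - x), with Gaussian samples. This toy recursion, the function names and all parameter values are assumptions chosen for illustration; it does not implement the lifting construction or the Wasserstein divergence from the paper. For this particular recursion the stationary variance can be worked out by hand as alpha * sigma^2 / (2 - alpha), which the simulation is compared against.

```python
import random


def run_constant_stepsize(alpha, mu, sigma, steps, seed=0):
    """Iterate x <- x + alpha * (sample - x) with samples drawn from N(mu, sigma^2)."""
    rng = random.Random(seed)
    x = 0.0
    trace = []
    for _ in range(steps):
        sample = rng.gauss(mu, sigma)
        x += alpha * (sample - x)  # the stepsize alpha is never decreased
        trace.append(x)
    return trace


def mean_and_variance(values):
    """Empirical mean and (population) variance of a list of floats."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


alpha, mu, sigma = 0.1, 2.0, 1.0
trace = run_constant_stepsize(alpha, mu, sigma, steps=200_000)

# Discard a burn-in period, then summarize the long-run behavior of the iterates.
late_mean, late_var = mean_and_variance(trace[1_000:])

# Stationary variance of this linear recursion, derived from
# Var = (1 - alpha)^2 * Var + alpha^2 * sigma^2.
predicted_var = alpha * sigma ** 2 / (2 - alpha)
```

The iterates keep fluctuating around mu with a variance that does not shrink, so they do not converge to a point, while their long-run distribution settles to a fixed shape.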

preprint2021arXiv

Coverage Analysis of Broadcast Networks with Users Having Heterogeneous Content/Advertisement Preferences

This work is focused on the system-level performance of a broadcast network. Since all transmitters in a broadcast network transmit the identical signal, received signals from multiple transmitters can be combined to improve system performance. We develop a stochastic geometry based analytical framework to derive the coverage of a typical receiver. We show that there may exist an optimal connectivity radius that maximizes the rate coverage. Our analysis includes the fact that users may have their individual content/advertisement preferences. We assume that there are multiple classes of users, with each user class preferring a particular type of content/advertisements, and that users will pay the network only when they can see content aligned with their interest. The operator may choose to transmit multiple contents simultaneously to cater to more users' interests and increase its revenue. We present revenue models to study the impact of the number of contents on the operator revenue. We consider two scenarios for users' distribution: one where users' interest depends on their geographical location and one where it does not. With the help of numerical results and analysis, we show the impact of various parameters including content granularity, connectivity radius, and rate threshold and present important design insights.

preprint2021arXiv

Making Responsible AI the Norm rather than the Exception

This report prepared by the Montreal AI Ethics Institute provides recommendations in response to the National Security Commission on Artificial Intelligence (NSCAI) Key Considerations for Responsible Development and Fielding of Artificial Intelligence document. The report centres on the idea that Responsible AI should be made the Norm rather than an Exception. It does so by utilizing the guiding principles of: (1) alleviating friction in existing workflows, (2) empowering stakeholders to get buy-in, and (3) conducting an effective translation of abstract standards into actionable engineering practices. After providing some overarching comments on the document from the NSCAI, the report dives into the primary contribution of an actionable framework to help operationalize the ideas presented in the document from the NSCAI. The framework consists of: (1) a learning, knowledge, and information exchange (LKIE), (2) the Three Ways of Responsible AI, (3) an empirically-driven risk-prioritization matrix, and (4) achieving the right level of complexity. All components reinforce each other to move from principles to practice in service of making Responsible AI the norm rather than the exception.

preprint2021arXiv

Rank Reduction in Bimatrix Games

The rank of a bimatrix game is defined as the rank of the sum of the payoff matrices of the two players. The rank of a game is known to impact both the most suitable computation methods for determining a solution and the expressive power of the game. Under certain conditions on the payoff matrices, we devise a method that reduces the rank of the game without changing the equilibrium of the game. We leverage matrix pencil theory and the Wedderburn rank reduction formula to arrive at our results. We also present a constructive proof of the fact that in a generic square game, the rank of the game can be reduced by 1, and in a generic rectangular game, the rank of the game can be reduced by 2 under certain assumptions.
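The definition in the first sentence of this abstract can be checked directly on small examples. The sketch below computes the rank of A + B by exact Gaussian elimination over fractions; the helper names and the example payoff matrices are hypothetical, and the rank reduction method of the paper itself is not implemented here.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix given as a list of rows, via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank


def game_rank(payoff_a, payoff_b):
    """Rank of a bimatrix game: the rank of the sum of the two payoff matrices."""
    total = [[a + b for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(payoff_a, payoff_b)]
    return matrix_rank(total)


# A zero-sum game has B = -A, so A + B is the zero matrix and the rank is 0.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
zero_sum_rank = game_rank(A, B)

# Here A + B = [[1, 1], [1, 1]], which has rank 1.
C = [[2, 0], [0, 1]]
D = [[-1, 1], [1, 0]]
rank_one_game = game_rank(C, D)
```

Zero-sum games are exactly the rank-0 case under this definition, which is why the rank is a natural measure of how far a game is from being zero-sum.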

74 published item(s)

Learning from Disagreement: Clinician Overrides as Implicit Preference Signals for Clinical AI in Value-Based Care

Learning to Compress Time-to-Control: A Reinforcement Learning Framework for Chronic Disease Management

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning

PolaRiS: Scalable Real-to-Sim Evaluations for Generalist Robot Policies

Multiform Evolution for High-Dimensional Problems with Low Effective Dimensionality

Pendulum Actuated Spherical Robot: Dynamic Modeling & Analysis for Wobble & Precession

Wobble control of a pendulum actuated spherical robot

Autonomous Reinforcement Learning: Formalism and Benchmarking

Combining Varied Learners for Binary Classification using Stacked Generalization

Demonstration-Bootstrapped Autonomous Practicing via Multi-Task Reinforcement Learning

Detection of Tool based Edited Images from Error Level Analysis and Convolutional Neural Network

Discriminant Analysis in Contrasting Dimensions for Polycystic Ovary Syndrome Prognostication

Effects of Parametric and Non-Parametric Methods on High Dimensional Sparse Matrix Representations

Forged Image Detection using SOTA Image Classification Deep Learning Methods for Image Forensics with Error Level Analysis

HAA4D: Few-Shot Human Atomic Action Recognition via 3D Spatio-Temporal Skeletal Alignment

Half a Dozen Real-World Applications of Evolutionary Multitasking, and More

Incentive Design and Profit Sharing in Multi-modal Transportation Network

Interference Constrained Beam Alignment for Time-Varying Channels via Kernelized Bandits

Performance Comparison of Simple Transformer and Res-CNN-BiLSTM for Cyberbullying Classification

Prediction of Maneuvering Status for Aerial Vehicles using Supervised Learning Methods

Requisites on viscoelasticity for exceptional points in passive elastodynamic metamaterials

Res-CNN-BiLSTM Network for overcoming Mental Health Disturbances caused due to Cyberbullying through Social Media

Residual-Concatenate Neural Network with Deep Regularization Layers for Binary Classification

Safe Model-based Off-policy Reinforcement Learning for Eco-Driving in Connected and Automated Hybrid Electric Vehicles

Scalable Transfer Evolutionary Optimization: Coping with Big Task Instances

State of AI Ethics Report (Volume 6, February 2022)

Succinct Differentiation of Disparate Boosting Ensemble Learning Methods for Prognostication of Polycystic Ovary Syndrome Diagnosis

Superior mechanical properties by exploiting size-effects and multiscale interactions in hierarchically architected foams

The Golden Circle: Creating Socio-technical Alignment in Content Moderation

Weighted Gaussian Process Bandits for Non-stationary Environments

An Algorithm to Warm Start Perturbed (WASP) Constrained Dynamic Programs

Convergence of Recursive Stochastic Algorithms using Wasserstein Divergence

Coverage Analysis of Broadcast Networks with Users Having Heterogeneous Content/Advertisement Preferences

Making Responsible AI the Norm rather than the Exception

Rank Reduction in Bimatrix Games

BISTRO: Berkeley Integrated System for Transportation Optimization

Comprehensiveness of Archives: A Modern AI-enabled Approach to Build Comprehensive Shared Cultural Heritage

Coverage Improvement of Wireless Sensor Networks via Spatial Profile Information

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

Ecological Reinforcement Learning

Fleet Sizing and Charger Allocation in Electric Vehicle Sharing Systems

Green Lighting ML: Confidentiality, Integrity, and Availability of Machine Learning Systems in Deployment

Montreal AI Ethics Institute's (MAIEI) Submission to the World Intellectual Property Organization (WIPO) Conversation on Intellectual Property (IP) and Artificial Intelligence (AI) Second Session

Montreal AI Ethics Institute's Response to Scotland's AI Strategy

Multi-Objective Vehicle Rebalancing for Ridehailing System using a Reinforcement Learning Approach

On the Coverage Performance of Boolean-Poisson Cluster Models for Wireless Sensor Networks

Participatory Design to build better contact- and proximity-tracing apps

Reciprocal Collision Avoidance for General Nonlinear Agents using Reinforcement Learning

Report prepared by the Montreal AI Ethics Institute In Response to Mila's Proposal for a Contact Tracing App

Response by the Montreal AI Ethics Institute to the European Commission's Whitepaper on AI

Response by the Montreal AI Ethics Institute to the Santa Clara Principles on Transparency and Accountability in Online Content Moderation

Response to Office of the Privacy Commissioner of Canada Consultation Proposals pertaining to amendments to PIPEDA relative to Artificial Intelligence

Road Grade Estimation Using Crowd-Sourced Smartphone Data

SECure: A Social and Environmental Certificate for AI Systems

Some Limit Properties of Markov Chains Induced by Stochastic Recursive Algorithms

Steady-state fluctuations of a genetic feedback loop with fluctuating rate parameters using the unified colored noise approximation

The Ingredients of Real-World Robotic Reinforcement Learning

The Social Contract for AI

The State of AI Ethics Report (June 2020)

Unified Characterization Platform for Emerging NVM Technology: Neural Network Application Benchmarking Using off-the-shelf NVM Chips

Unsupervised Meta-Learning for Reinforcement Learning

Addressing Expensive Multi-objective Games with Postponed Preference Articulation via Memetic Co-evolution

Genetic Transfer or Population Diversification? Deciphering the Secret Ingredients of Evolutionary Multitask Optimization

Learning Dexterous Manipulation Policies from Experience and Imitation

Learning Modular Neural Network Policies for Multi-Task and Multi-Robot Transfer

Modeling and Analyzing Millimeter Wave Cellular Systems

Search by Ideal Candidates: Next Generation of Talent Search at LinkedIn

Common Information based Markov Perfect Equilibria for Linear-Gaussian Games with Asymmetric Information

On the Existence of Optimal Policies for a Class of Static and Sequential Dynamic Teams

The Topology of Information on the Space of Probability Measures over Polish Spaces

The Topology of Information on the Space of Probability Measures over Product of Polish Spaces

Nash Equilibria for Stochastic Games with Asymmetric Information-Part 1: Finite Games

Multiple View Reconstruction of Calibrated Images using Singular Value Decomposition