Source author record

Bharathan Balaji

Bharathan Balaji appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Systems and Control; Human-Computer Interaction; Machine Learning; Artificial Intelligence; Computer Vision; cs.CY; Computation and Language; Distributed, Parallel, and Cluster Computing; eess.SY

Catalog footprint

What is connected

10 works

9 topics

4 close collaborators


preprint, 2026, arXiv

Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning

In LLM Reinforcement Fine-Tuning (RFT), curriculum learning drives both efficiency and performance. Yet current methods externalize curriculum judgment via handcrafted heuristics or auxiliary models, risking misalignment with the policy's training dynamics. In this paper, we introduce METIS (METacognitive Internalized Self-judgment), a novel framework that internalizes curriculum judgment as a native capability. Leveraging a critical observation that within-prompt reward variance effectively gauges prompt informativeness, METIS predicts this metric using recent training outcomes as lightweight in-context learning examples. This intrinsic self-judgment then dynamically dictates the training allocation. Moreover, METIS closes the loop between judgment and optimization by jointly optimizing the standard RFT rewards and a self-judgment reward. This allows the policy to learn what to learn next, as a form of metacognition. Across extensive discrete and continuous RFT benchmarks spanning mathematical reasoning, code generation, and agentic function-calling, METIS consistently delivers superior performance while accelerating convergence by up to 67%. By bypassing handcrafted heuristics and auxiliary models, our work establishes a simple, closed-loop, and highly efficient curriculum internalization paradigm for LLM reinforcement fine-tuning.
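To make the variance signal concrete, here is a minimal sketch. It is not the METIS implementation: per the abstract, METIS has the policy predict this quantity, whereas the code below only computes the observed within-prompt reward variance from sampled rollouts and ranks prompts by it. The function names, prompt ids, and reward values are all hypothetical.

```python
from statistics import pvariance

def reward_variance(rewards):
    """Population variance of the rewards from several rollouts of one prompt."""
    return pvariance(rewards)

def rank_prompts(rollout_rewards):
    """Return prompt ids sorted from highest to lowest within-prompt reward variance."""
    return sorted(
        rollout_rewards,
        key=lambda prompt_id: reward_variance(rollout_rewards[prompt_id]),
        reverse=True,
    )

# Hypothetical binary rewards (1 = correct) for four rollouts per prompt.
rollout_rewards = {
    "too_easy": [1, 1, 1, 1],    # always solved: variance 0
    "too_hard": [0, 0, 0, 0],    # never solved: variance 0
    "borderline": [1, 0, 1, 0],  # solved half the time: variance 0.25
}

ranking = rank_prompts(rollout_rewards)
print(ranking[0])
```

With binary rewards the variance is p(1 - p) for success rate p, so it peaks at p = 0.5 and vanishes for prompts that are always or never solved, which is why it can serve as an informativeness signal.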

preprint, 2026, arXiv

SpiderGen: Towards Procedure Generation For Carbon Life Cycle Assessments with Generative AI

Investigating the effects of climate change and global warming caused by GHG emissions has been a key concern worldwide. The production, use, and disposal of consumer products contribute a large share of these emissions. Thus, it is important to build tools to estimate the environmental impact of consumer goods, an essential part of which is conducting Life Cycle Assessments (LCAs). LCAs specify and account for the appropriate processes involved in the production, use, and disposal of the products. We present SpiderGen, an LLM-based workflow that integrates the taxonomy and methodology of traditional LCA with the reasoning capabilities and world knowledge of LLMs to generate graphical representations of the key procedural information used for LCA, known as Product Category Rules Process Flow Graphs (PCR PFGs). We additionally evaluate the output of SpiderGen by comparing it with 65 real-world LCA documents. We find that SpiderGen provides accurate LCA process information that is either fully correct or has minor errors, achieving an F1-Score of 65% across 10 sample data points, as compared to 53% using a one-shot prompting method. We observe that the remaining errors occur primarily due to differences in detail between LCA documents, as well as differences in the "scope" of which auxiliary processes must also be included. We also demonstrate that SpiderGen performs better than several baseline techniques, such as chain-of-thought prompting and one-shot prompting. Finally, we highlight SpiderGen's potential to reduce the human effort and costs for estimating carbon impact, as it is able to produce LCA process information for less than $1 USD in under 10 minutes as compared to the status quo LCA, which can cost over $25,000 USD and take up to 21 person-days.

preprint, 2025, arXiv

RedunCut: Measurement-Driven Sampling and Accuracy Performance Modeling for Low-Cost Live Video Analytics

Live video analytics (LVA) runs continuously across massive camera fleets, but inference cost with modern vision models remains high. To address this, dynamic model size selection (DMSS) is an attractive approach: it is content-aware but treats models as black boxes, and could potentially reduce cost by up to 10x without model retraining or modification. Without ground truth labels at runtime, we observe that DMSS methods use two stages per segment: (i) sampling a few models to calculate prediction statistics (e.g., confidences), then (ii) selecting the model size from those statistics. Prior systems fail to generalize to diverse workloads, particularly to mobile videos and lower accuracy targets. We identify that the failure modes stem from inefficient sampling whose cost exceeds its benefit, and inaccurate per-segment accuracy prediction. In this work, we present RedunCut, a new DMSS system that addresses both: it uses a measurement-driven planner that estimates the cost-benefit tradeoff of sampling, and a lightweight, data-driven performance model to improve accuracy prediction. Across road-vehicle, drone, and surveillance videos and multiple model families and tasks, RedunCut reduces compute cost by 14-62% at fixed accuracy and remains robust to limited historical data and to drift.
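The generic two-stage DMSS pattern described in the abstract can be sketched as follows. This is an illustration of the pattern only, not RedunCut: using mean confidence as a stand-in for accuracy is an assumption made here for brevity, and the abstract notes that accuracy prediction in prior systems is inaccurate. The model names, costs, confidences, and the `select_model` helper are hypothetical.

```python
def select_model(sample_confidences, costs, target):
    """Pick the cheapest sampled model whose mean confidence meets the target.

    sample_confidences: model name -> confidences observed on sampled frames
    costs: model name -> relative compute cost
    Falls back to the most expensive sampled model when none meets the target.
    """
    # Stage (i): summarize the sampled predictions into one statistic per model.
    stats = {m: sum(c) / len(c) for m, c in sample_confidences.items()}
    # Stage (ii): choose the model size from those statistics.
    eligible = [m for m, s in stats.items() if s >= target]
    if not eligible:
        return max(sample_confidences, key=costs.get)
    return min(eligible, key=costs.get)

# Hypothetical per-segment samples and relative costs.
confidences = {
    "small": [0.55, 0.60],
    "medium": [0.82, 0.86],
    "large": [0.93, 0.95],
}
costs = {"small": 1, "medium": 3, "large": 10}

print(select_model(confidences, costs, target=0.80))
```

Note that every sampled model adds inference cost for the segment, which is the cost-benefit tradeoff the abstract says a planner has to estimate.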

preprint, 2022, arXiv

Context-Aware Streaming Perception in Dynamic Environments

Efficient vision works maximize accuracy under a latency budget. These works evaluate accuracy offline, one image at a time. However, real-time vision applications like autonomous driving operate in streaming settings, where ground truth changes between inference start and finish. This results in a significant accuracy drop. Therefore, a recent work proposed to maximize accuracy in streaming settings on average. In this paper, we propose to maximize streaming accuracy for every environment context. We posit that scenario difficulty influences the initial (offline) accuracy difference, while obstacle displacement in the scene affects the subsequent accuracy degradation. Our method, Octopus, uses these scenario properties to select configurations that maximize streaming accuracy at test time. Our method improves tracking performance (S-MOTA) by 7.4% over the conventional static approach. Further, performance improvement using our method comes in addition to, and not instead of, advances in offline accuracy.

preprint, 2020, arXiv

ACES -- Automatic Configuration of Energy Harvesting Sensors with Reinforcement Learning

The Internet of Things forms the backbone of modern building applications. Wireless sensors are being increasingly adopted for their flexibility and reduced cost of deployment. However, most wireless sensors are powered by batteries today, and large deployments are inhibited by manual battery replacement. Energy harvesting sensors provide an attractive alternative, but they need to provide adequate quality of service to applications given uncertain energy availability. We propose using reinforcement learning to optimize the operation of energy harvesting sensors to maximize sensing quality with available energy. We present our system ACES, which uses reinforcement learning for periodic and event-driven sensing indoors with ambient light energy harvesting. Our custom-built board uses a supercapacitor to store energy temporarily, senses light and motion events, and relays them using Bluetooth Low Energy. Using simulations and real deployments, we show that our sensor nodes adapt to their lighting conditions and continuously send measurements and events across nights and weekends. We use deployment data to continually adapt sensing to changing environmental patterns and transfer learning to reduce the training time in real deployments. In our 60-node deployment lasting two weeks, we observe a dead time of 0.1%. The periodic sensors that measure luminosity have a mean sampling period of 90 seconds and the event sensors that detect motion with PIR captured 86% of the events on average compared to a battery-powered node.

preprint, 2020, arXiv

Quick Question: Interrupting Users for Microtasks with Reinforcement Learning

Human attention is a scarce resource in modern computing. A multitude of microtasks vie for user attention to crowdsource information, perform momentary assessments, personalize services, and execute actions with a single touch. A lot gets done when these tasks take up the invisible free moments of the day. However, an interruption at an inappropriate time degrades productivity and causes annoyance. Prior works have exploited contextual cues and behavioral data to identify interruptibility for microtasks with much success. With Quick Question, we explore the use of reinforcement learning (RL) to schedule microtasks while minimizing user annoyance and compare its performance with supervised learning. We model the problem as a Markov decision process and use the Advantage Actor-Critic algorithm to identify interruptible moments based on context and history of user interactions. In our 5-week, 30-participant study, we compare the proposed RL algorithm against supervised learning methods. While the mean number of responses is commensurate across the two methods, RL is more effective at avoiding dismissal of notifications and improves user experience over time.
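For readers unfamiliar with Advantage Actor-Critic, the quantity that gives the algorithm its name can be sketched in a few lines. This is the standard one-step advantage estimate, not code from the paper; the reward of -1 for a dismissed notification and the value estimates below are hypothetical, since the abstract does not specify the reward design.

```python
def one_step_advantage(reward, value_s, value_next, gamma=0.99, done=False):
    """One-step advantage estimate: A(s, a) = r + gamma * V(s') - V(s).

    The bootstrap term gamma * V(s') is dropped when the episode has ended.
    """
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s

# Hypothetical step: the agent interrupted the user and the notification was dismissed.
advantage = one_step_advantage(reward=-1.0, value_s=0.2, value_next=0.1, gamma=0.9)
print(advantage < 0)
```

A negative advantage lowers the probability of taking the same action in a similar context, which is how such an agent would learn to avoid interrupting at moments that lead to dismissals.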

preprint, 2016, arXiv

Genie: A Longitudinal Study Comparing Physical and Software-augmented Thermostats in Office Buildings

Thermostats are primary interfaces for occupants of office buildings to express their comfort preferences. However, standard thermostats are often ineffective due to inaccessibility, lack of information, or limited responsiveness, leading to occupant discomfort. Software thermostats based on web or smartphone applications provide alternative interfaces to occupants with minimal deployment cost. However, their usage and effectiveness have not been studied extensively in real settings. In this paper we present Genie, a novel software-augmented thermostat that we deployed and studied at our university over a period of 21 months. Our data shows that providing wider thermal control to users does not lead to system abuse and that the effect on energy consumption is minimal while improving comfort and energy awareness. We believe that increased introduction of software thermostats in office buildings will have important effects on comfort and energy consumption and we provide key design recommendations for their implementation and deployment.

preprint, 2016, arXiv

Managing Commercial HVAC Systems: What do Building Operators Really Need?

Buildings form an essential part of modern life; people spend a significant amount of their time in them, and they consume large amounts of energy. A variety of systems provide services such as lighting, air conditioning and security, which building operators manage using Building Management Systems (BMS). To better understand the capability of current BMS and characterize common practices of building operators, we investigated their use across five institutions in the US. We interviewed ten operators and discovered that BMS do not address a number of key concerns for the management of buildings. Our analysis is rooted in the everyday work of building operators and highlights a number of design suggestions to help improve the user experience and management of BMS, ultimately leading to improvements in productivity, as well as building comfort and energy efficiency.

preprint, 2016, arXiv

Quiver: Using Control Perturbations to Increase the Observability of Sensor Data in Smart Buildings

Modern buildings contain hundreds of sensors and actuators for monitoring and operation of systems such as HVAC, lighting and security. To enable portable applications in next-generation smart buildings, we need models and standardized ontologies that represent these sensors across diverse types of buildings. Recent research has shown that extracting information such as sensor type with available metadata and timeseries data analysis is difficult due to heterogeneity of systems and lack of support for interoperability. We propose perturbations in the control system as a mechanism to increase the observability of building systems to extract contextual information and develop standardized models. We design Quiver, an experimental framework for actuation of a building HVAC system that enables us to perturb the control system safely. Using Quiver, we demonstrate three applications using empirical experiments on a real commercial building: colocation of data points, identification of point type and mapping of dependency between actuators. Our results show that we can colocate data points in HVAC terminal units with 98.4% accuracy and 63% coverage. We can identify point types of the terminal units with 85.3% accuracy. Finally, we map the dependency links between actuators with an accuracy of 73.5%, with 8.1% and 18.4% false positives and false negatives respectively.

preprint, 2015, arXiv

HVACMeter: Apportionment of HVAC Power to Thermal Zones and Air Handler Units

Heating, Ventilation and Air Conditioning (HVAC) systems consume almost half of the total energy use of commercial buildings. To optimize HVAC energy usage, it is important to understand the energy consumption of individual HVAC components at fine granularities. However, buildings typically only have aggregate building level power and thermal meters. We present HVACMeter, a system which leverages existing sensors in commercial HVAC systems to estimate the energy consumed by individual components of the HVAC system, as well as by each thermal zone in buildings. HVACMeter can be generalized to any HVAC system as it uses a basic understanding of HVAC operation, heat transfer equations, and historical sensor data to estimate energy. We deploy HVACMeter in three buildings on our campus to identify the set of sensors that are important for accurately disaggregating energy use at the level of each Air Handler Unit and each thermal zone within these buildings. HVACMeter power estimations have on average 44.5% lower RMSE than mean power estimates. Furthermore, we highlight the usefulness of the HVACMeter energy estimation model for a building fault detection application by quantifying the amount of energy that can be saved by fixing particular faults.
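A figure such as "44.5% lower RMSE than mean power estimates" can be read as a relative RMSE reduction against a baseline. The sketch below shows one way to compute such a number, assuming the baseline is a constant prediction equal to the mean of the measured power; the abstract does not define the baseline precisely, so this interpretation, the function names, and the sample readings are assumptions.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def rmse_reduction_vs_mean(predicted, actual):
    """Fractional RMSE reduction relative to always predicting the mean of `actual`."""
    mean = sum(actual) / len(actual)
    baseline = rmse([mean] * len(actual), actual)
    return 1.0 - rmse(predicted, actual) / baseline

# Hypothetical measured and estimated power readings (kW) for one component.
measured = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 39.0]

reduction = rmse_reduction_vs_mean(estimated, measured)
print(f"{reduction:.1%}")
```

A value of 0 means the estimator is no better than the constant-mean baseline, and a value of 1 means it matches the measurements exactly.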

