Source author record

Arash Tavakoli

Arash Tavakoli appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, Artificial Intelligence, Human-Computer Interaction, Computer Vision, Multiagent Systems, quant-ph, Robotics

Catalog footprint

What is connected

9 works

7 topics

4 close collaborators


Preprint, 2026, arXiv

Can Vision-Language Models Understand Construction Workers? An Exploratory Study

As robots become increasingly integrated into construction workflows, their ability to interpret and respond to human behavior will be essential for enabling safe and effective collaboration. Vision-Language Models (VLMs) have emerged as a promising tool for visual understanding tasks and offer the potential to recognize human behaviors without extensive domain-specific training. This capability makes them particularly appealing in the construction domain, where labeled data is scarce and monitoring worker actions and emotional states is critical for safety and productivity. In this study, we evaluate the performance of three leading VLMs, GPT-4o, Florence 2, and LLaVA-1.5, in detecting construction worker actions and emotions from static site images. Using a curated dataset of 1,000 images annotated across ten action and ten emotion categories, we assess each model's outputs through standardized inference pipelines and multiple evaluation metrics. GPT-4o consistently achieved the highest scores across both tasks, with an average F1-score of 0.756 and accuracy of 0.799 in action recognition, and an F1-score of 0.712 and accuracy of 0.773 in emotion recognition. Florence 2 performed moderately, with F1-scores of 0.497 for action and 0.414 for emotion, while LLaVA-1.5 showed the lowest overall performance, with F1-scores of 0.466 for action and 0.461 for emotion. Confusion matrix analyses revealed that all models struggled to distinguish semantically close categories, such as collaborating in teams versus communicating with supervisors. While the results indicate that general-purpose VLMs can offer a baseline capability for human behavior recognition in construction environments, further improvements, such as domain adaptation, temporal modeling, or multimodal sensing, may be needed for real-world reliability.
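The accuracy and average F1-score reported above can be computed from predicted and annotated labels along the following lines. This is a minimal sketch, not the study's evaluation pipeline: it assumes the average is an unweighted (macro) mean of per-class F1-scores, and the labels in the usage note are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of images whose predicted label matches the annotation."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); guard against an empty class.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with the hypothetical annotations `["lift", "lift", "walk", "walk"]` and predictions `["lift", "walk", "walk", "walk"]`, the per-class F1-scores are 2/3 for `lift` and 4/5 for `walk`, while accuracy is 3/4. Macro averaging treats every category equally, which matters when some actions or emotions are rare.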

Preprint, 2022, arXiv

Driver State Modeling through Latent Variable State Space Framework in the Wild

Analyzing the impact of the environment on drivers' stress level and workload is of high importance for designing human-centered driver-vehicle interaction systems and, ultimately, for building a safer driving experience. However, a driver's state, including stress level and workload, consists of psychological constructs that cannot be measured directly and must be estimated from sensor measurements such as psychophysiological measures. We propose using a latent-variable state-space modeling framework for driver state analysis. With latent-variable state-space models, we model drivers' workload and stress levels as latent variables estimated from multimodal human sensing data, under the perturbations of the environment, in a state-space format and in a holistic manner. Using a case study of multimodal driving data collected from 11 participants, we first estimate the latent stress level and workload of drivers from their heart rate, gaze measures, and intensity of facial action units. We then show that external contextual elements, such as the number of vehicles as a proxy for traffic density and secondary task demands, may be associated with changes in drivers' stress levels and workload. We also show that different drivers may be impacted differently by these perturbations. We found that drivers' latent states at previous timesteps are highly associated with their current states. Additionally, we discuss the utility of state-space models in analyzing the possible lag between the two constructs of stress level and workload, which might be indicative of information transmission between the different parts of the driver's psychophysiology in the wild.
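To make the idea of a latent state estimated from noisy sensor readings under external perturbations concrete, here is a scalar linear-Gaussian state-space model with a standard Kalman filter step. This is a swapped-in textbook technique used for illustration only; the abstract does not specify the model form or the estimation method, and every parameter name and default value below is an assumption.

```python
def kalman_step(x, p, y, u, a=0.9, b=0.1, c=1.0, q=0.01, r=0.1):
    """One predict/update step for a scalar latent state.

    Assumed model (illustrative, not taken from the paper):
        x_t = a * x_{t-1} + b * u_t + process noise (variance q)
        y_t = c * x_t + measurement noise (variance r)
    x, p : previous latent estimate (e.g. stress level) and its variance
    y    : current sensor reading (e.g. a heart-rate feature)
    u    : external input (e.g. number of nearby vehicles)
    """
    # Predict: carry the previous state forward and add the external input.
    x_pred = a * x + b * u
    p_pred = a * p * a + q
    # Update: correct the prediction using the new measurement.
    gain = p_pred * c / (c * p_pred * c + r)
    x_new = x_pred + gain * (y - c * x_pred)
    p_new = (1.0 - gain * c) * p_pred
    return x_new, p_new
```

The coefficient `a` plays the role of the dependence of the current latent state on the previous timestep, and `b` the sensitivity to the external perturbation; in a fitted model these would be estimated from data rather than fixed by hand.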

Preprint, 2022, arXiv

On the Pitfalls of Heteroscedastic Uncertainty Estimation with Probabilistic Neural Networks

Capturing aleatoric uncertainty is a critical part of many machine learning systems. In deep learning, a common approach to this end is to train a neural network to estimate the parameters of a heteroscedastic Gaussian distribution by maximizing the logarithm of the likelihood function under the observed data. In this work, we examine this approach and identify potential hazards associated with the use of log-likelihood in conjunction with gradient-based optimizers. First, we present a synthetic example illustrating how this approach can lead to very poor but stable parameter estimates. Second, we identify the culprit to be the log-likelihood loss, along with certain conditions that exacerbate the issue. Third, we present an alternative formulation, termed $β$-NLL, in which each data point's contribution to the loss is weighted by the $β$-exponentiated variance estimate. We show that using an appropriate $β$ largely mitigates the issue in our illustrative example. Fourth, we evaluate this approach on a range of domains and tasks and show that it achieves considerable improvements and is more robust to hyperparameter choices, in terms of both predictive RMSE and log-likelihood.
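The weighting described above can be sketched numerically as follows. This is a minimal illustration, assuming a per-point Gaussian negative log-likelihood averaged over the batch; the function names and the default `beta` are illustrative choices, and the remark about excluding the weight from differentiation is an assumption about how such a loss would be used with an autodiff framework.

```python
import math


def gaussian_nll(y, mean, var):
    """Negative log-likelihood of y under a Gaussian N(mean, var)."""
    return 0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def beta_nll(ys, means, variances, beta=0.5):
    """Mean of per-point NLL terms, each weighted by var ** beta.

    Assumption: in a gradient-based setting the weight var ** beta would be
    treated as a constant (no gradient flows through it). Plain Python has
    no gradients, so this only computes the loss value.
    """
    total = 0.0
    for y, mean, var in zip(ys, means, variances):
        total += (var ** beta) * gaussian_nll(y, mean, var)
    return total / len(ys)
```

Setting `beta=0` gives every point a weight of 1 and recovers the ordinary mean negative log-likelihood. Larger values of `beta` give points with high predicted variance more influence on the loss.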

Preprint, 2022, arXiv

Orchestrated Value Mapping for Reinforcement Learning

We present a general convergent class of reinforcement learning algorithms that is founded on two distinct principles: (1) mapping value estimates to a different space using arbitrary functions from a broad class, and (2) linearly decomposing the reward signal into multiple channels. The first principle enables incorporating specific properties into the value estimator that can enhance learning. The second principle, on the other hand, allows for the value function to be represented as a composition of multiple utility functions. This can be leveraged for various purposes, e.g. dealing with highly varying reward scales, incorporating a priori knowledge about the sources of reward, and ensemble learning. Combining the two principles yields a general blueprint for instantiating convergent algorithms by orchestrating diverse mapping functions over multiple reward channels. This blueprint generalizes and subsumes algorithms such as Q-Learning, Log Q-Learning, and Q-Decomposition. In addition, our convergence proof for this general class relaxes certain required assumptions in some of these algorithms. Based on our theory, we discuss several interesting configurations as special cases. Finally, to illustrate the potential of the design space that our theory opens up, we instantiate a particular algorithm and evaluate its performance on the Atari suite.
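One way to read the two principles together is sketched below: the reward is split linearly into channels, each channel stores its value estimate in a mapped space, and the overall value is recovered by mapping each estimate back and recombining with the channel weights. This is a hedged illustration, not the paper's algorithm: the two-channel split into positive and negative parts, the logarithmic mapping, and all names (`CHANNELS`, `decompose_reward`, `combined_value`) are assumptions, and no learning update is shown.

```python
import math

# Hypothetical channel setup: weights lam_j, a mapping f_j and its inverse.
CHANNELS = [
    {"lam": 1.0, "f": math.log, "f_inv": math.exp},   # positive rewards
    {"lam": -1.0, "f": math.log, "f_inv": math.exp},  # magnitude of negative rewards
]


def decompose_reward(reward):
    """Split a scalar reward so that sum(lam_j * r_j) equals the reward."""
    return [max(reward, 0.0), max(-reward, 0.0)]


def combined_value(mapped_values):
    """Map each channel's stored estimate back and recombine linearly."""
    return sum(
        channel["lam"] * channel["f_inv"](value)
        for channel, value in zip(CHANNELS, mapped_values)
    )
```

Keeping each channel non-negative is what lets a logarithmic mapping be applied in this sketch; other mappings or decompositions would need their own channel definitions.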

Preprint, 2022, arXiv

The Impact of Surrounding Road Objects and Conditions on Drivers Abrupt Heart Rate Changes

Recent studies have pointed out the importance of mitigating drivers' stress and negative emotions. These studies show that certain road objects, such as big vehicles, might be associated with higher stress levels based on drivers' subjective stress measures. Additionally, research shows strong correlations between drivers' stress levels and increased heart rate (HR). In this paper, based on a naturalistic multimodal driving dataset, we analyze the visual scenes of driving in the vicinity of abrupt increases in drivers' HR for the presence of certain stress-inducing road objects. We show that the probability of the presence of such objects increases closer to the abrupt increase in drivers' HR. Additionally, we show that drivers' facial engagement changes significantly in the vicinity of abrupt increases in HR. Our results lay the ground for a human-centered driving experience by detecting and mitigating drivers' stress levels in the wild.

Preprint, 2022, arXiv

Time Limits in Reinforcement Learning

In reinforcement learning, it is common to let an agent interact for a fixed amount of time with its environment before resetting it and repeating the process in a series of episodes. The task that the agent has to learn can either be to maximize its performance over (i) that fixed period, or (ii) an indefinite period where time limits are only used during training to diversify experience. In this paper, we provide a formal account of how time limits could effectively be handled in each of the two cases and explain why not doing so can cause state aliasing and invalidation of experience replay, leading to suboptimal policies and training instability. In case (i), we argue that the terminations due to time limits are in fact part of the environment, and thus a notion of the remaining time should be included as part of the agent's input to avoid violation of the Markov property. In case (ii), the time limits are not part of the environment and are only used to facilitate learning. We argue that this insight should be incorporated by bootstrapping from the value of the state at the end of each partial episode. For both cases, we illustrate empirically the significance of our considerations in improving the performance and stability of existing reinforcement learning algorithms, showing state-of-the-art results on several control tasks.
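The two cases can be sketched in a few lines. This is a minimal illustration under stated assumptions: a one-step temporal-difference target is used for case (ii), remaining time is appended as a normalized fraction for case (i), and the function names and the discount default are illustrative rather than taken from the paper.

```python
def time_aware_observation(observation, step, time_limit):
    """Case (i): expose the remaining time to the agent as an extra input."""
    remaining = (time_limit - step) / time_limit
    return list(observation) + [remaining]


def td_target(reward, next_value, terminated, gamma=0.99):
    """Case (ii): one-step target that bootstraps through time-limit cutoffs.

    `terminated` should be True only for genuine environment terminations.
    When an episode is merely cut off by a training time limit, pass
    terminated=False so that the value of the final state is still used.
    """
    if terminated:
        return reward
    return reward + gamma * next_value
```

The important detail in case (ii) is that the training loop must distinguish a time-limit cutoff from a genuine termination. Treating both as terminal would zero out the bootstrap term for states that are not actually terminal.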

Preprint, 2020, arXiv

A neural network oracle for quantum nonlocality problems in networks

Characterizing quantum nonlocality in networks is a challenging but important problem. Using quantum sources, one can achieve distributions that are unattainable classically. A key point in investigations is to decide whether an observed probability distribution can be reproduced using only classical resources. This causal inference task is challenging even for simple networks, both analytically and using standard numerical techniques. We propose to use neural networks as numerical tools to overcome these challenges, by learning the classical strategies required to reproduce a distribution. As such, the neural network acts as an oracle, demonstrating that a behavior is classical if it can be learned. We apply our method to several examples in the triangle configuration. After demonstrating that the method is consistent with previously known results, we give solid evidence that the distribution presented in [N. Gisin, Entropy 21(3), 325 (2019)] is indeed nonlocal as conjectured. Finally, we examine the genuinely nonlocal distribution presented in [M.-O. Renou et al., PRL 123, 140401 (2019)], and, guided by the findings of the neural network, conjecture nonlocality in a new range of parameters in these distributions. The method allows us to estimate the noise robustness of all examined distributions.

Preprint, 2020, arXiv

Exploring Restart Distributions

We consider the generic approach of using an experience memory to help exploration by adapting a restart distribution. That is, given the capacity to reset the state to those corresponding to the agent's past observations, we help exploration by promoting faster state-space coverage via restarting the agent from a more diverse set of initial states, as well as allowing it to restart in states associated with significant past experiences. This approach is compatible with both on-policy and off-policy methods. However, a caveat is that altering the distribution of initial states could change the optimal policies when searching within a restricted class of policies. To reduce this unsought learning bias, we evaluate our approach in deep reinforcement learning, which benefits from the high representational capacity of deep neural networks. We instantiate three variants of our approach, each inspired by an idea in the context of experience replay. Using these variants, we show that performance gains can be achieved, especially in hard exploration problems.
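A restart distribution backed by an experience memory could look roughly like the sketch below. It is a generic illustration, not one of the paper's three variants: the class name `RestartMemory`, the first-in-first-out eviction, the priority weighting, and the mixing probability `p_memory` are all assumptions.

```python
import random


class RestartMemory:
    """Bounded store of past states from which restarts can be drawn."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.states = []
        self.priorities = []
        self.rng = random.Random(seed)

    def add(self, state, priority=1.0):
        """Store a visited state; priority must be positive."""
        if len(self.states) >= self.capacity:
            # Evict the oldest entry once the memory is full.
            self.states.pop(0)
            self.priorities.pop(0)
        self.states.append(state)
        self.priorities.append(priority)

    def sample_restart(self, default_state, p_memory=0.5):
        """Return a stored state with probability p_memory, else the default."""
        if not self.states or self.rng.random() >= p_memory:
            return default_state
        return self.rng.choices(self.states, weights=self.priorities, k=1)[0]
```

With equal priorities the stored states are sampled uniformly, which favors coverage; unequal priorities bias restarts toward states marked as significant. Keeping `p_memory` below 1 retains some restarts from the environment's own initial state.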

Preprint, 2016, arXiv

Multiplayer Games for Learning Multirobot Coordination Algorithms

Humans have an impressive ability to solve complex coordination problems in a fully distributed manner. This ability, if learned as a set of distributed multirobot coordination strategies, can enable programming large groups of robots to collaborate towards complex coordination objectives in a way similar to humans. Such strategies would offer robustness, adaptability, fault-tolerance, and, importantly, distributed decision-making. To that end, we have designed a networked gaming platform to investigate human group behavior, specifically in solving complex collaborative coordinated tasks. Through this platform, we are able to limit the communication, sensing, and actuation capabilities provided to the players. With the aim of learning coordination algorithms for robots in mind, we define these capabilities to mimic those of a simple ground robot.
