Researcher profile

Shehroz S. Khan

Shehroz S. Khan contributes to research discovery and scholarly infrastructure.

Published work

9 published items

preprint, 2023, arXiv

Bag of States: A Non-sequential Approach to Video-based Engagement Measurement

Automatic measurement of student engagement provides helpful information for instructors to meet learning program objectives and individualize program delivery. Students' behavioral and emotional states need to be analyzed at fine-grained time scales in order to measure their level of engagement. Many existing approaches have developed sequential and spatiotemporal models, such as recurrent neural networks, temporal convolutional networks, and three-dimensional convolutional neural networks, for measuring student engagement from videos. These models are trained to incorporate the order of behavioral and emotional states of students into video analysis and output their level of engagement. In this paper, backed by educational psychology, we question the necessity of modeling the order of behavioral and emotional states of students in measuring their engagement. We develop bag-of-words-based models in which only the occurrence of behavioral and emotional states of students is modeled and analyzed and not the order in which they occur. Behavioral and affective features are extracted from videos and analyzed by the proposed models to determine the level of engagement in an ordinal-output classification setting. Compared to the existing sequential and spatiotemporal approaches for engagement measurement, the proposed non-sequential approach improves the state-of-the-art results. According to experimental results, our method significantly improved engagement level classification accuracy on the IIITB Online SE dataset by 26% compared to sequential models and achieved engagement level classification accuracy as high as 66.58% on the DAiSEE student engagement dataset.
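The order-invariance behind the bag-of-states idea can be illustrated with a minimal sketch. It assumes each video has already been reduced to a sequence of discrete state labels. The label names and the `bag_of_states` helper are hypothetical and are not taken from the paper, and the sketch covers only the occurrence-histogram step, not the paper's feature extraction or ordinal classifier.

```python
from collections import Counter


def bag_of_states(states, vocabulary):
    """Return the normalized occurrence histogram of `states` over `vocabulary`.

    The order of `states` is deliberately ignored. Labels outside
    `vocabulary` still count toward the total but get no bin of their own.
    """
    counts = Counter(states)
    total = len(states)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[s] / total for s in vocabulary]


# Hypothetical state labels, for illustration only.
VOCAB = ["attentive", "confused", "distracted"]
clip_a = ["attentive", "attentive", "confused", "distracted"]
clip_b = ["distracted", "confused", "attentive", "attentive"]  # same states, reordered

print(bag_of_states(clip_a, VOCAB))
print(bag_of_states(clip_a, VOCAB) == bag_of_states(clip_b, VOCAB))
```

Both clips map to the same vector, so any classifier placed on top of this representation cannot depend on the order in which the states occurred.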

preprint, 2023, arXiv

Inconsistencies in the Definition and Annotation of Student Engagement in Virtual Learning Datasets: A Critical Review

Background: Student engagement (SE) in virtual learning can have a major impact on meeting learning objectives and program dropout risks. Developing Artificial Intelligence (AI) models for automatic SE measurement requires annotated datasets. However, existing SE datasets suffer from inconsistent definitions and annotation protocols mostly unaligned with the definition of SE in educational psychology. This issue could be misleading in developing generalizable AI models and make it hard to compare the performance of these models developed on different datasets. The objective of this critical review was to explore the existing SE datasets and highlight inconsistencies in terms of differing engagement definitions and annotation protocols. Methods: Several academic databases were searched for publications introducing new SE datasets. The datasets containing students' single- or multi-modal data in online or offline computer-based virtual learning sessions were included. The definition and annotation of SE in the existing datasets were analyzed based on our defined seven dimensions of engagement annotation: sources, data modalities, timing, temporal resolution, level of abstraction, combination, and quantification. Results: Thirty SE measurement datasets met the inclusion criteria. The reviewed SE datasets used very diverse and inconsistent definitions and annotation protocols. Unexpectedly, very few of the reviewed datasets used existing psychometrically validated scales in their definition of SE. Discussion: The inconsistent definition and annotation of SE are problematic for research on developing comparable AI models for automatic SE measurement. Some of the existing SE definitions and protocols in settings other than virtual learning that have the potential to be used in virtual learning are introduced.

preprint, 2023, arXiv

Privacy-Protecting Behaviours of Risk Detection in People with Dementia using Videos

People living with dementia often exhibit behavioural and psychological symptoms of dementia that can put their and others' safety at risk. Existing video surveillance systems in long-term care facilities can be used to monitor such behaviours of risk to alert the staff to prevent potential injuries or death in some cases. However, these behaviours of risk events are heterogeneous and infrequent in comparison to normal events. Moreover, analyzing raw videos can also raise privacy concerns. In this paper, we present two novel privacy-protecting video-based anomaly detection approaches to detect behaviours of risk in people with dementia. We either extracted body pose information as skeletons or used semantic segmentation masks to replace multiple humans in the scene with their semantic boundaries. Our work differs from most existing approaches for video anomaly detection that focus on appearance-based features, which can put the privacy of a person at risk and are also susceptible to pixel-based noise, including illumination and viewing direction. We used anonymized videos of normal activities to train customized spatio-temporal convolutional autoencoders and identify behaviours of risk as anomalies. We showed our results on a real-world study conducted in a dementia care unit with patients with dementia, containing approximately 21 hours of normal activities data for training and 9 hours of data containing normal and behaviours of risk events for testing. We compared our approaches with the original RGB videos and obtained a similar area under the receiver operating characteristic curve performance of 0.807 for the skeleton-based approach and 0.823 for the segmentation mask-based approach.

preprint, 2022, arXiv

Multi Visual Modality Fall Detection Dataset

Falls are one of the leading causes of injury-related deaths among the elderly worldwide. Effective detection of falls can reduce the risk of complications and injuries. Fall detection can be performed using wearable devices or ambient sensors; these methods may struggle with user compliance issues or false alarms. Video cameras provide a passive alternative; however, regular RGB cameras are impacted by changing lighting conditions and privacy concerns. From a machine learning perspective, developing an effective fall detection system is challenging because of the rarity and variability of falls. Many existing fall detection datasets lack important real-world considerations, such as varied lighting, continuous activities of daily living (ADLs), and camera placement. The lack of these considerations makes it difficult to develop predictive models that can operate effectively in the real world. To address these limitations, we introduce a novel multi-modality dataset (MUVIM) that contains four visual modalities: infra-red, depth, RGB and thermal cameras. These modalities offer benefits such as obfuscated facial features and improved performance in low-light conditions. We formulated fall detection as an anomaly detection problem, in which a customized spatio-temporal convolutional autoencoder was trained only on ADLs so that a fall would increase the reconstruction error. Our results showed that infra-red cameras provided the highest level of performance (AUC ROC=0.94), followed by thermal (AUC ROC=0.87), depth (AUC ROC=0.86) and RGB (AUC ROC=0.83). This research provides a unique opportunity to analyze the utility of camera modalities in detecting falls in a home setting while balancing performance, passiveness, and privacy.
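For readers unfamiliar with the AUC ROC figures quoted above, the metric can be sketched in a few lines. It equals the probability that a randomly chosen anomalous sample receives a higher anomaly score than a randomly chosen normal one, with ties counted as one half. The scores and labels below are made up for illustration and are not results from the paper, and `auc_roc` is a hypothetical helper name.

```python
def auc_roc(scores, labels):
    """Probability that a random anomalous sample outscores a random normal one.

    `labels` uses 1 for anomalous (for example, a fall) and 0 for normal.
    Tied scores contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical reconstruction errors: higher should mean "more likely a fall".
errors = [0.10, 0.12, 0.50, 0.11, 0.09, 0.47]
is_fall = [0, 0, 1, 0, 1, 1]
print(auc_roc(errors, is_fall))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a small illustration; a rank-based computation would be the usual choice for large test sets.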

preprint, 2022, arXiv

Supervised Contrastive Learning for Detecting Anomalous Driving Behaviours from Multimodal Videos

Distracted driving is one of the major reasons for vehicle accidents. Therefore, detecting distracted driving behaviors is of paramount importance to reduce the millions of deaths and injuries occurring worldwide. Distracted or anomalous driving behaviors are deviations from 'normal' driving that need to be identified correctly to alert the driver. However, these driving behaviors do not comprise one specific type of driving style and their distribution can be different during the training and test phases of a classifier. We formulate this problem as a supervised contrastive learning approach to learn a visual representation to detect normal, and seen and unseen anomalous driving behaviors. We made a change to the standard contrastive loss function to adjust the similarity of negative pairs to aid the optimization. Normally, in a (self) supervised contrastive framework, the projection head layers are omitted during the test phase as the encoding layers are considered to contain general visual representative information. However, we assert that for a video-based supervised contrastive learning task, including a projection head can be beneficial. We showed our results on a driver anomaly detection dataset that contains 783 minutes of video recordings of normal and anomalous driving behaviors of 31 drivers from the various top and front cameras (both depth and infrared). Out of 9 video modality combinations, our proposed contrastive approach improved the ROC AUC on 6 in comparison to the baseline models (from 4.23% to 8.91% for different modalities). We performed statistical tests that showed evidence that our proposed method performs better than the baseline contrastive learning setup. Finally, the results showed that the fusion of depth and infrared modalities from the top and front views achieved the best AUC ROC of 0.9738 and AUC PR of 0.9772.
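As background, the standard supervised contrastive loss that the abstract takes as its starting point can be sketched as follows. This is the unmodified baseline form, not the paper's adjusted treatment of negative pairs, which the abstract does not specify. The function name, the temperature value and the toy embeddings are assumptions made for illustration.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Standard supervised contrastive loss over L2-normalized embeddings.

    For each anchor, same-label samples are positives and every other sample
    in the batch appears in the softmax denominator. Anchors without any
    positive are skipped.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        logits = {a: dot(embeddings[i], embeddings[a]) / temperature
                  for a in range(n) if a != i}
        # Numerically stable log of the softmax denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy unit-length embeddings: two samples per direction.
z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(supcon_loss(z, [0, 0, 1, 1]))  # labels agree with the geometry
print(supcon_loss(z, [0, 1, 0, 1]))  # labels cut across the geometry
```

The loss is close to zero when same-label embeddings coincide and large when the labels cut across the embedding geometry, which is the signal the encoder is trained on.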

preprint, 2020, arXiv

DeepFall -- Non-invasive Fall Detection with Deep Spatio-Temporal Convolutional Autoencoders

Human falls rarely occur; however, detecting falls is very important from the health and safety perspective. Due to the rarity of falls, it is difficult to employ supervised classification techniques to detect them. Moreover, in these highly skewed situations it is also difficult to extract domain-specific features to identify falls. In this paper, we present a novel framework, DeepFall, which formulates the fall detection problem as an anomaly detection problem. The DeepFall framework presents the novel use of deep spatio-temporal convolutional autoencoders to learn spatial and temporal features from normal activities using non-invasive sensing modalities. We also present a new anomaly scoring method that combines the reconstruction score of frames across a temporal window to detect unseen falls. We tested the DeepFall framework on three publicly available datasets collected through non-invasive sensing modalities, thermal and depth cameras, and show superior results in comparison to traditional autoencoder methods to identify unseen falls.
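One plausible reading of combining reconstruction scores across a temporal window is sketched below. The abstract does not give the scoring formula, so the choice of averaging a frame's error over every overlapping window that contains it is an assumption, as are the function name and the example numbers.

```python
from statistics import fmean


def per_frame_scores(window_errors):
    """Average each frame's reconstruction error over all windows containing it.

    `window_errors[i][j]` is the error of frame `i + j` when it is
    reconstructed as part of window `i` (stride-1 windows of equal length).
    """
    if not window_errors:
        return []
    window_len = len(window_errors[0])
    n_frames = len(window_errors) + window_len - 1
    per_frame = [[] for _ in range(n_frames)]
    for i, window in enumerate(window_errors):
        for j, err in enumerate(window):
            per_frame[i + j].append(err)
    return [fmean(errs) for errs in per_frame]


# Hypothetical errors for three overlapping windows of length 3 (frames 0 to 4).
window_errors = [
    [0.1, 0.1, 0.9],
    [0.1, 0.7, 0.1],
    [0.8, 0.1, 0.1],
]
print(per_frame_scores(window_errors))
```

In this toy input, frame 2 is poorly reconstructed in every window that contains it, so its averaged score stands out from the other frames.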

preprint, 2020, arXiv

initKmix -- A Novel Initial Partition Generation Algorithm for Clustering Mixed Data using k-means-based Clustering

Mixed datasets consist of both numeric and categorical attributes. Various k-means-based clustering algorithms have been developed for these datasets. Generally, these algorithms use a random partition as a starting point, which tends to produce different clustering results for different runs. In this paper, we propose initKmix, a novel algorithm for finding an initial partition for k-means-based clustering algorithms for mixed datasets. In the initKmix algorithm, a k-means-based clustering algorithm is run many times, and in each run, one of the attributes is used to create initial clusters for that run. The clustering results of various runs are combined to produce the initial partition. This initial partition is then used as a seed to a k-means-based clustering algorithm to cluster mixed data. Experiments with various categorical and mixed datasets showed that initKmix produced accurate and consistent results, and outperformed the random initial partition method and other state-of-the-art initialization methods. Experiments also showed that k-means-based clustering for mixed datasets with initKmix performed similarly to or better than many state-of-the-art clustering algorithms for categorical and mixed datasets.
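The step of combining the results of several runs can be pictured with a small sketch. The abstract does not say how the combination is done, so grouping points by their tuple of cluster labels across runs is an assumption and not necessarily the paper's procedure. The helper name and the labels are hypothetical, and reducing the resulting groups to exactly k clusters is left out.

```python
from collections import defaultdict


def combine_runs(run_labels):
    """Group points whose cluster labels agree across every run.

    `run_labels[r][i]` is the cluster assigned to point `i` in run `r`.
    Returns a dict mapping each label signature to a list of point indices.
    """
    groups = defaultdict(list)
    for point, signature in enumerate(zip(*run_labels)):
        groups[signature].append(point)
    return dict(groups)


runs = [
    [0, 0, 1, 1],  # hypothetical labels from the run seeded by one attribute
    [0, 0, 1, 0],  # hypothetical labels from the run seeded by another attribute
]
print(combine_runs(runs))
```

Points 0 and 1 share a signature in both runs and therefore land in the same initial group, while points 2 and 3 are separated because the runs disagree about them.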

preprint, 2020, arXiv

Motion and Region Aware Adversarial Learning for Fall Detection with Thermal Imaging

Automatic fall detection is a vital technology for ensuring the health and safety of people. Home-based camera systems for fall detection often put people's privacy at risk. Thermal cameras can partially or fully obfuscate facial features, thus preserving the privacy of a person. Another challenge is the rare occurrence of falls in comparison to the normal activities of daily living. As falls occur rarely, it is non-trivial to train algorithms due to class imbalance. To handle these problems, we formulate fall detection as an anomaly detection problem within an adversarial framework using thermal imaging. We present a novel adversarial network that comprises two-channel 3D convolutional autoencoders, which reconstruct the thermal data and the optical flow input sequences, respectively. We introduce a technique to track the region of interest, a region-based difference constraint, and a joint discriminator to compute the reconstruction error. A larger reconstruction error indicates the occurrence of a fall. The experiments on a publicly available thermal fall dataset show the superior results obtained compared to the standard baseline.

preprint, 2020, arXiv

Spatio-Temporal Adversarial Learning for Detecting Unseen Falls

Fall detection is an important problem from both the health and machine learning perspectives. A fall can lead to severe injuries, long-term impairments or even death in some cases. In terms of machine learning, it presents a severe class imbalance problem with very few or no training data for falls, owing to the fact that falls occur rarely. In this paper, we adopt an alternative philosophy to detect falls in the absence of their training data, by training the classifier on only the normal activities (which are available in abundance) and identifying a fall as an anomaly. To realize such a classifier, we use an adversarial learning framework, which comprises a spatio-temporal autoencoder for reconstructing input video frames and a spatio-temporal convolution network to discriminate them against original video frames. 3D convolutions are used to learn spatial and temporal features from the input video frames. The adversarial learning of the spatio-temporal autoencoder will enable reconstructing the normal activities of daily living efficiently, thus rendering the detection of unseen falls plausible within this framework. We tested the performance of the proposed framework on camera sensing modalities that may preserve an individual's privacy (fully or partially), such as thermal and depth cameras. Our results on three publicly available datasets show that the proposed spatio-temporal adversarial framework performed better than other baseline frame-based (or spatial) adversarial learning methods.