Source author record

Rajib Rana

Rajib Rana appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

eess.AS, Sound, Machine Learning, cs.CY, Human-Computer Interaction, Computation and Language, Information Theory, math.IT, Quantitative Methods, Computer Vision, Networking and Internet Architecture, Other Computer Science, Artificial Intelligence, Neurons and Cognition, physics.med-ph, Social and Information Networks, Systems and Control

Catalog footprint

What is connected

32 works

17 topics

4 close collaborators

preprint · 2022 · arXiv

Emotion Intensity and its Control for Emotional Voice Conversion

Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity. In EVC, emotions are usually treated as discrete categories overlooking the fact that speech also conveys emotions with various intensity levels that the listener can perceive. In this paper, we aim to explicitly characterize and control the intensity of emotion. We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of emotion embedding. We further learn the actual emotion encoder from an emotion-labelled database and study the use of relative attributes to represent fine-grained emotion intensity. To ensure emotional intelligibility, we incorporate emotion classification loss and emotion embedding similarity loss into the training of the EVC network. As desired, the proposed network controls the fine-grained emotion intensity in the output speech. Through both objective and subjective evaluations, we validate the effectiveness of the proposed network for emotional expressiveness and emotion intensity control.

preprint · 2022 · arXiv

Multitask Learning from Augmented Auxiliary Data for Improving Speech Emotion Recognition

Despite the recent progress in speech emotion recognition (SER), state-of-the-art systems lack generalisation across different conditions. A key underlying reason for poor generalisation is the scarcity of emotion datasets, which is a significant roadblock to designing robust machine learning (ML) models. Recent works in SER focus on utilising multitask learning (MTL) methods to improve generalisation by learning shared representations. However, most of these studies propose MTL solutions that require meta labels for auxiliary tasks, which limits the training of SER systems. This paper proposes an MTL framework (MTL-AUG) that learns generalised representations from augmented data. We utilise augmentation-type classification and unsupervised reconstruction as auxiliary tasks, which allow training SER systems on augmented data without requiring any meta labels for auxiliary tasks. The semi-supervised nature of MTL-AUG allows for the exploitation of abundant unlabelled data to further boost the performance of SER. We comprehensively evaluate the proposed framework in the following settings: (1) within corpus, (2) cross-corpus and cross-language, (3) noisy speech, and (4) adversarial attacks. Our evaluations using the widely used IEMOCAP, MSP-IMPROV, and EMODB datasets show improved results compared to existing state-of-the-art methods.
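The idea that augmentation-type classification needs no meta labels can be sketched briefly: the label of each auxiliary example is simply the index of the transform that produced it. The sketch below is illustrative only. The three transforms (`add_noise`, `change_gain`, `reverse_time`) and the helper `make_auxiliary_examples` are hypothetical placeholders, not the augmentations or code used in the paper.

```python
import random

# Placeholder augmentations (assumptions for illustration, not the paper's set).
def add_noise(wave, rng, scale=0.05):
    # Add zero-mean Gaussian noise to every sample.
    return [s + rng.gauss(0.0, scale) for s in wave]

def change_gain(wave, rng, low=0.5, high=1.5):
    # Scale the whole waveform by one random gain factor.
    g = rng.uniform(low, high)
    return [g * s for s in wave]

def reverse_time(wave, rng):
    # Play the waveform backwards; rng is unused but keeps one signature.
    return wave[::-1]

AUGMENTATIONS = [add_noise, change_gain, reverse_time]

def make_auxiliary_examples(waves, rng):
    """Pair each augmented waveform with the index of the augmentation that made it."""
    examples = []
    for wave in waves:
        for label, augment in enumerate(AUGMENTATIONS):
            examples.append((augment(wave, rng), label))
    return examples

rng = random.Random(0)
examples = make_auxiliary_examples([[0.0, 0.1, -0.2, 0.3]], rng)
```

Because the auxiliary labels come from the augmentation pipeline itself, unlabelled speech can be used for this task as readily as labelled speech.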

preprint · 2022 · arXiv

Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition

Despite the recent advancement in speech emotion recognition (SER) within a single corpus setting, the performance of these SER systems degrades significantly for cross-corpus and cross-language scenarios. The key reason is the lack of generalisation in SER systems towards unseen conditions, which causes them to perform poorly in cross-corpus and cross-language settings. To address this issue, recent studies focus on utilising adversarial methods to learn domain-generalised representations for improving cross-corpus and cross-language SER. However, many of these methods only focus on cross-corpus SER without addressing the cross-language SER performance degradation due to a larger domain gap between source and target language data. This contribution proposes an adversarial dual discriminator (ADDi) network that uses a three-player adversarial game to learn generalised representations without requiring any target data labels. We also introduce a self-supervised ADDi (sADDi) network that utilises self-supervised pre-training with unlabelled data. We propose synthetic data generation as a pretext task in sADDi, enabling the network to produce emotionally discriminative and domain-invariant representations and providing complementary synthetic data to augment the system. The proposed model is rigorously evaluated using five publicly available datasets in three languages and compared with multiple studies on cross-corpus and cross-language SER. Experimental results demonstrate that the proposed model achieves improved performance compared to the state-of-the-art methods.

preprint · 2022 · arXiv

Speech Synthesis with Mixed Emotions

Emotional speech synthesis aims to synthesize human voices with various emotional effects. Current studies mostly focus on imitating an averaged style belonging to a specific emotion type. In this paper, we seek to generate speech with a mixture of emotions at run-time. We propose a novel formulation that measures the relative difference between the speech samples of different emotions. We then incorporate our formulation into a sequence-to-sequence emotional text-to-speech framework. During training, the framework not only explicitly characterizes emotion styles, but also explores the ordinal nature of emotions by quantifying the differences with other emotions. At run-time, we control the model to produce the desired emotion mixture by manually defining an emotion attribute vector. The objective and subjective evaluations have validated the effectiveness of the proposed framework. To the best of our knowledge, this research is the first study on modelling, synthesizing, and evaluating mixed emotions in speech.

preprint · 2021 · arXiv

A novel policy for pre-trained Deep Reinforcement Learning for Speech Emotion Recognition

Reinforcement Learning (RL) is a semi-supervised learning paradigm in which an agent learns by interacting with an environment. Combining deep learning with RL, an approach called Deep Reinforcement Learning (deep RL), provides an efficient method to learn how to interact with the environment. Deep RL has achieved tremendous success in gaming, for example with AlphaGo, but its potential has rarely been explored for challenging tasks like Speech Emotion Recognition (SER). Using deep RL for SER can potentially improve the performance of an automated call centre agent by dynamically learning emotion-aware responses to customer queries. While the policy employed by the RL agent plays a major role in action selection, there is currently no RL policy tailored for SER. In addition, an extended learning period is a general challenge for deep RL, which can impact the speed of learning for SER. Therefore, in this paper, we introduce a novel policy, the "Zeta policy", which is tailored for SER, and apply pre-training in deep RL to achieve a faster learning rate. Pre-training with a cross dataset was also studied to assess the feasibility of pre-training the RL agent with a similar dataset in a scenario where no real environmental data is available. The IEMOCAP and SAVEE datasets were used for the evaluation, with the task being to recognize four emotions (happy, sad, angry and neutral) in the utterances provided. Experimental results show that the proposed "Zeta policy" performs better than existing policies. The results also support that pre-training can reduce the training time by reducing the warm-up period and is robust to the cross-corpus scenario.
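For context on what an action-selection policy does, here is a minimal sketch of epsilon-greedy, a widely used generic RL policy. It is not the "Zeta policy", whose details are not given in this abstract. The value list `q` and the mapping of actions to the four emotion labels are assumptions made for illustration.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical setup: one action per emotion class named in the abstract.
EMOTIONS = ["happy", "sad", "angry", "neutral"]
q = [0.1, 0.7, 0.05, 0.15]  # made-up action values for one utterance

# With epsilon = 0 the policy is purely greedy and picks the largest value.
greedy_choice = EMOTIONS[epsilon_greedy(q, epsilon=0.0)]
```

A larger `epsilon` makes the agent explore more, which typically matters most early in training, when the value estimates are still unreliable.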

preprint · 2020 · arXiv

Augmenting Generative Adversarial Networks for Speech Emotion Recognition

Generative adversarial networks (GANs) have shown potential in learning emotional attributes and generating new data samples. However, their performance is usually hindered by the unavailability of larger speech emotion recognition (SER) datasets. In this work, we propose a framework that utilises the mixup data augmentation scheme to augment the GAN in feature learning and generation. To show the effectiveness of the proposed framework, we present results for SER on (i) synthetic feature vectors, (ii) augmentation of the training data with synthetic features, and (iii) encoded features in compressed representation. Our results show that the proposed framework can effectively learn compressed emotional representations and can generate synthetic samples that help improve performance in within-corpus and cross-corpus evaluation.
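As a reminder of what mixup does in general, the sketch below blends two samples and their one-hot labels with a single weight drawn from a Beta(alpha, alpha) distribution. It shows only the generic augmentation step, not how the paper couples it with the GAN. The function name `mixup`, the value `alpha=0.2`, and the toy vectors are assumptions.

```python
import random

def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two feature vectors and their one-hot labels with one shared weight."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam

rng = random.Random(0)
# Toy example: two 3-dimensional feature vectors from two different classes.
x_mix, y_mix, lam = mixup([0.0, 1.0, 2.0], [1.0, 0.0],
                          [2.0, 1.0, 0.0], [0.0, 1.0], rng=rng)
```

The mixed label `y_mix` still sums to one, so it can be used as a soft target for a standard cross-entropy loss.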

preprint · 2020 · arXiv

Automated Screening for Distress: A Perspective for the Future

Distress is a complex condition which affects a significant percentage of cancer patients and may lead to depression, anxiety, sadness, suicide and other forms of psychological morbidity. Compelling evidence supports screening for distress as a means of facilitating early intervention and subsequent improvements in psychological well-being and overall quality of life. Nevertheless, despite the existence of evidence based and easily administered screening tools, for example, the Distress Thermometer, routine screening for distress is yet to achieve widespread implementation. Efforts are intensifying to utilise innovative, cost effective methods now available through emerging technologies in the informatics and computational arenas.

preprint · 2020 · arXiv

Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition

Speech emotion recognition (SER) systems can achieve high accuracy when the training and test data are identically distributed, but this assumption is frequently violated in practice and the performance of SER systems plummets under unforeseen data shifts. The design of robust models for accurate SER is challenging, which limits its use in practical applications. In this paper we propose a deeper neural network architecture wherein we fuse DenseNet, LSTM and Highway Network to learn powerful discriminative features which are robust to noise. We also propose data augmentation with our network architecture to further improve the robustness. We comprehensively evaluate the architecture coupled with data augmentation against (1) noise, (2) adversarial attacks and (3) cross-corpus settings. Our evaluations on the widely used IEMOCAP and MSP-IMPROV datasets show promising results when compared with existing studies and state-of-the-art models.

preprint · 2020 · arXiv

Deep Reinforcement Learning with Pre-training for Time-efficient Training of Automatic Speech Recognition

Deep reinforcement learning (deep RL) combines deep learning with reinforcement learning principles to create efficient methods that can learn by interacting with their environment. This has led to breakthroughs in many complex tasks, such as playing the game "Go", that were previously difficult to solve. However, deep RL requires significant training time, making it difficult to use in various real-life applications such as Human-Computer Interaction (HCI). In this paper, we study pre-training in deep RL to reduce the training time and improve the performance of Speech Recognition, a popular application of HCI. To evaluate the performance improvement in training we use the publicly available "Speech Command" dataset, which contains utterances of 30 command keywords spoken by 2,618 speakers. Results show that pre-training with deep RL offers faster convergence compared to non-pre-trained RL while achieving improved speech recognition accuracy.

preprint · 2020 · arXiv

Direct Modelling of Speech Emotion from Raw Speech

Speech emotion recognition is a challenging task and heavily depends on hand-engineered acoustic features, which are typically crafted to echo human perception of speech signals. However, a filter bank that is designed from perceptual evidence is not always guaranteed to be the best in a statistical modelling framework where the end goal is, for example, emotion classification. This has fuelled the emerging trend of learning representations from raw speech, especially using deep neural networks. In particular, a combination of Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) has gained great traction: LSTM for its intrinsic ability to learn contextual information crucial for emotion recognition, and CNNs for their ability to overcome the scalability problem of regular neural networks. In this paper, we show that there are still opportunities to improve the performance of emotion recognition from raw speech by exploiting the properties of CNNs in modelling contextual information. We propose the use of parallel convolutional layers to harness multiple temporal resolutions in the feature extraction block, which is jointly trained with the LSTM-based classification network for the emotion recognition task. Our results suggest that the proposed model can reach the performance of a CNN trained with hand-engineered features on both the IEMOCAP and MSP-IMPROV datasets.

preprint · 2020 · arXiv

Guided Generative Adversarial Neural Network for Representation Learning and High Fidelity Audio Generation using Fewer Labelled Audio Data

Recent improvements in Generative Adversarial Neural Networks (GANs) have shown their ability to generate higher quality samples as well as to learn good representations for transfer learning. Most of the representation learning methods based on GANs learn representations ignoring their post-use scenario, which can lead to increased generalisation ability. However, the model can become redundant if it is intended for a specific task. For example, assume we have a vast unlabelled audio dataset, and we want to learn a representation from this dataset so that it can be used to improve the emotion recognition performance of a small labelled audio dataset. During the representation learning training, if the model does not know the post emotion recognition task, it can completely ignore emotion-related characteristics in the learnt representation. This is a fundamental challenge for any unsupervised representation learning model. In this paper, we aim to address this challenge by proposing a novel GAN framework: Guided Generative Neural Network (GGAN), which guides a GAN to focus on learning desired representations and generating superior quality samples for audio data leveraging fewer labelled samples. Experimental results show that using a very small amount of labelled data as guidance, a GGAN learns significantly better representations.

preprint · 2020 · arXiv

Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion Recognition

Despite the emerging importance of Speech Emotion Recognition (SER), the state-of-the-art accuracy is quite low and needs improvement to make commercial applications of SER viable. A key underlying reason for the low accuracy is the scarcity of emotion datasets, which is a challenge for developing any robust machine learning model in general. In this paper, we propose a solution to this problem: a multi-task learning framework that uses auxiliary tasks for which data is abundantly available. We show that utilisation of this additional data can improve the primary task of SER for which only limited labelled data is available. In particular, we use gender identification and speaker recognition as auxiliary tasks, which allow the use of very large datasets, e.g., speaker classification datasets. To maximise the benefit of multi-task learning, we further use an adversarial autoencoder (AAE) within our framework, which has a strong capability to learn powerful and discriminative features. Furthermore, the unsupervised AAE in combination with the supervised classification networks enables semi-supervised learning, which incorporates a discriminative component in the AAE unsupervised training pipeline. This semi-supervised learning essentially helps to improve generalisation of our framework and thus leads to improvements in SER performance. The proposed model is rigorously evaluated for categorical and dimensional emotion, and cross-corpus scenarios. Experimental results demonstrate that the proposed model achieves state-of-the-art performance on two publicly available datasets.

preprint · 2020 · arXiv

Phonocardiographic Sensing using Deep Learning for Abnormal Heartbeat Detection

Cardiac auscultation involves expert interpretation of abnormalities in heart sounds using a stethoscope. Deep learning based cardiac auscultation is of significant interest to the healthcare community as it can help reduce the burden of manual auscultation through automated detection of abnormal heartbeats. However, the problem of automatic cardiac auscultation is complicated by the requirement of reliability and high accuracy, and by the presence of background noise in the heartbeat sound. In this work, we propose a Recurrent Neural Network (RNN) based automated cardiac auscultation solution. Our choice of RNNs is motivated by the great success of deep learning in medical applications and by the observation that RNNs represent the deep learning configuration most suitable for dealing with sequential or temporal data even in the presence of noise. We explore the use of various RNN models, and demonstrate that these models deliver a significantly improved abnormal heartbeat classification score. Our proposed approach using RNNs can potentially be used for real-time abnormal heartbeat detection in the Internet of Medical Things for remote monitoring applications.

preprint · 2020 · arXiv

Transfer Learning for Improving Speech Emotion Classification Accuracy

The majority of existing speech emotion recognition research focuses on automatic emotion detection using training and testing data from the same corpus collected under the same conditions. The performance of such systems has been shown to drop significantly in cross-corpus and cross-language scenarios. To address the problem, this paper exploits a transfer learning technique, novel in cross-language and cross-corpus scenarios, to improve the performance of speech emotion recognition systems. Evaluations on five different corpora in three different languages show that Deep Belief Networks (DBNs) offer better accuracy than previous approaches on cross-corpus emotion recognition, relative to a Sparse Autoencoder and SVM baseline system. Results also suggest that using a large number of languages for training and using a small fraction of the target data in training can significantly boost accuracy compared with the baseline, even for a corpus with limited training examples.

preprint · 2020 · arXiv

Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study

Learning the latent representation of data in an unsupervised fashion is a very interesting process that provides relevant features for enhancing the performance of a classifier. For speech emotion recognition tasks, generating effective features is crucial. Currently, handcrafted features are mostly used for speech emotion recognition; however, features learned automatically using deep learning have shown strong success in many problems, especially in image processing. In particular, deep generative models such as Variational Autoencoders (VAEs) have gained enormous success for generating features for natural images. Inspired by this, we propose VAEs for deriving the latent representation of speech signals and use this representation to classify emotions. To the best of our knowledge, we are the first to propose VAEs for speech emotion classification. Evaluations on the IEMOCAP dataset demonstrate that features learned by VAEs can produce state-of-the-art results for speech emotion classification.

preprint · 2016 · arXiv

Emotion Classification from Noisy Speech - A Deep Learning Approach

This paper investigates the performance of Deep Learning for speech emotion classification when the speech is compounded with noise. It reports on the classification accuracy and concludes with the future directions for achieving greater robustness for emotion recognition from noisy speech.

preprint · 2016 · arXiv

Gated Recurrent Unit (GRU) for Emotion Classification from Noisy Speech

Despite the enormous interest in emotion classification from speech, the impact of noise on emotion classification is not well understood. This is important because, due to the tremendous advancement of smartphone technology, the smartphone can be a powerful medium for speech emotion recognition in natural environments outside the laboratory, where speech is likely to include background noise. We capitalize on the recent breakthroughs in Recurrent Neural Networks (RNNs) and seek to investigate their performance for emotion classification from noisy speech. We particularly focus on the recently proposed Gated Recurrent Unit (GRU), which is yet to be explored for emotion recognition from speech. Experiments conducted with speech compounded with eight different types of noise reveal that GRU incurs an 18.16% smaller run-time while performing quite comparably to the Long Short-Term Memory (LSTM), which is the most popular Recurrent Neural Network proposed to date. This result is promising for any embedded platform in general and will initiate further studies to utilize GRU to its full potential for emotion recognition on smartphones.
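For readers unfamiliar with the GRU, one time step of the standard cell can be sketched as follows. The cell uses two gates (update `z` and reset `r`), compared with the three gates of a standard LSTM, which is consistent with a lighter computational load, although this sketch says nothing about the specific run-time figure reported above. Biases are omitted for brevity, the parameter names (`W_z`, `U_z`, and so on) are assumptions, and some formulations swap the roles of `z` and `1 - z`.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def gru_step(x, h, p):
    """One GRU time step; p holds input weights W_* and recurrent weights U_*."""
    # Update gate: how much of the candidate state replaces the old state.
    z = [sigmoid(a + b) for a, b in zip(matvec(p["W_z"], x), matvec(p["U_z"], h))]
    # Reset gate: how much of the old state feeds the candidate.
    r = [sigmoid(a + b) for a, b in zip(matvec(p["W_r"], x), matvec(p["U_r"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["W_h"], x), matvec(p["U_h"], rh))]
    # Interpolate between the old state and the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

# Toy check with all-zero weights: both gates equal 0.5 and the candidate is 0,
# so the new state is exactly half of the old state.
zeros_in = [[0.0] * 3 for _ in range(2)]   # 2 hidden units, 3 input features
zeros_rec = [[0.0] * 2 for _ in range(2)]
params = {"W_z": zeros_in, "U_z": zeros_rec, "W_r": zeros_in,
          "U_r": zeros_rec, "W_h": zeros_in, "U_h": zeros_rec}
h_new = gru_step([0.2, -0.1, 0.4], [1.0, -2.0], params)
```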

preprint · 2015 · arXiv

Affect Sensing on Smartphone - Possibilities of Understanding Cognitive Decline in Aging Population

Due to their increasing sensing capacity, smartphones offer an unprecedented opportunity to monitor human health. Affect sensing is one such essential form of monitoring that can be achieved on smartphones. Information about affect can be useful for many modern applications. In particular, it can potentially be used for understanding cognitive decline in the aging population. In this paper we present an overview of the existing literature that offers affect sensing on the smartphone platform. Most importantly, we present the challenges that need to be addressed to make affect sensing on smartphones a reality.

preprint · 2015 · arXiv

Evaluating the Performance of BSBL Methodology for EEG Source Localization On a Realistic Head Model

Source localization in EEG represents a high-dimensional inverse problem, which is severely ill-posed by nature. Fortunately, sparsity constraints have come to the rescue, as they help solve ill-posed problems when the signal is sparse. When the signal has a structure such as block structure, considering block sparsity produces better results. Because sparse Bayesian learning is an important member of the family of sparse recovery methods, and a superior choice when the projection matrix is highly coherent (which is typically the case for EEG), in this work we evaluate the performance of the block sparse Bayesian learning (BSBL) method for EEG source localization. It is already accepted by the EEG community that a group of dipoles rather than a single dipole is activated during brain activity; thus, block structure is a reasonable choice for EEG. In this work we use two definitions of blocks, Brodmann areas and automated anatomical labelling (AAL), and analyze the reconstruction performance of the BSBL methodology for them. A realistic head model is used for the experiment, which was obtained from segmentation of MRI images. When the number of simultaneously active blocks is 2, BSBL produces overall localization accuracy of less than 5 mm without the presence of noise. The presence of more than 3 simultaneously active blocks and noise significantly affects the localization performance. Consideration of AAL-based blocks results in more accurate source localization in comparison to Brodmann-area-based blocks.

preprint · 2015 · arXiv

Gait Velocity Estimation using time interleaved between Consecutive Passive IR Sensor Activations

Gait velocity has been consistently shown to be an important indicator and predictor of health status, especially in older adults. It is often assessed clinically, but the assessments occur infrequently and do not allow optimal detection of key health changes when they occur. In this paper, we show that the time gap between activations of a pair of Passive Infrared (PIR) motion sensors installed in a consecutively visited room pair carries rich latent information about a person's gait velocity. We call this time gap the transition time and show that, despite a six-second refractory period of the PIR sensors, transition time can be used to obtain an accurate representation of gait velocity. Using a Support Vector Regression (SVR) approach to model the relationship between transition time and gait velocity, we show that gait velocity can be estimated with an average error of less than 2.5 cm/sec. This is demonstrated with data collected over a 5-year period from 74 older adults monitored in their own homes. This method is simple and cost-effective and has advantages over competing approaches, such as obtaining 20 to 100x more gait velocity measurements per day and offering the fusion of location-specific information with time-stamped gait estimates. These advantages allow stable estimates of gait parameters (maximum or average speed, variability) at shorter time scales than current approaches. This also provides a pervasive in-home method for context-aware gait velocity sensing that allows for monitoring of gait trajectories in space and time.
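The transition-time feature defined above can be sketched as a small extraction routine over a sensor event log. This covers only the feature computation; the SVR model that maps transition time to gait velocity is not reproduced here. The event format, the room names, and the function `transition_times` are hypothetical, and the six-second refractory period is not modelled.

```python
def transition_times(events, room_a, room_b):
    """Return gaps (seconds) between a firing in room_a and the next firing in room_b.

    events: list of (timestamp_seconds, room_name) tuples sorted by time.
    Only directly consecutive events are paired, so walks that pass through
    a third room in between are skipped.
    """
    gaps = []
    for (t0, r0), (t1, r1) in zip(events, events[1:]):
        if r0 == room_a and r1 == room_b:
            gaps.append(t1 - t0)
    return gaps

# Made-up event log for illustration.
events = [(0.0, "bedroom"), (4.5, "hallway"), (60.0, "kitchen"),
          (300.0, "bedroom"), (303.0, "hallway")]
gaps = transition_times(events, "bedroom", "hallway")  # [4.5, 3.0]
```

Each gap would then serve as an input to a regression model trained against clinically measured gait velocities.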

preprint · 2015 · arXiv

Opportunistic and Context-aware Affect Sensing on Smartphones: The Concept, Challenges and Opportunities

Opportunistic affect sensing offers unprecedented potential for capturing spontaneous affect ubiquitously, obviating biases inherent in the laboratory setting. Facial expression and voice are two major affective displays; however, most affect sensing systems on smartphones avoid them due to their extensive power requirements. Encouragingly, due to the recent advent of low-power DSP (Digital Signal Processing) co-processor and GPU (Graphics Processing Unit) technology, audio and video sensing are becoming more feasible. To properly evaluate opportunistically captured facial expression and voice, contextual information about the dynamic audio-visual stimuli needs to be inferred. This paper discusses recent advances in affect sensing on the smartphone and identifies the key barriers to, and potential solutions for, implementing opportunistic and context-aware affect sensing on smartphone platforms.

preprint · 2015 · arXiv

Sparse Bayesian Learning for EEG Source Localization

Purpose: Localizing the sources of electrical activity from electroencephalographic (EEG) data has gained considerable attention over the last few years. In this paper, we propose an innovative source localization method for EEG, based on Sparse Bayesian Learning (SBL). Methods: To better specify the sparsity profile and to ensure efficient source localization, the proposed approach considers grouping of the electrical current dipoles inside the human brain. SBL is used to solve the localization problem together with an imposed constraint that the electric current dipoles associated with the brain activity are isotropic. Results: Numerical experiments are conducted on a realistic head model that is obtained by segmentation of MRI images of the head and includes four major components, namely the scalp, the skull, the cerebrospinal fluid (CSF) and the brain, with appropriate relative conductivity values. The results demonstrate that the isotropy constraint significantly improves the performance of SBL. In a noiseless environment, the proposed method was found to accurately (with accuracy of >75%) locate up to 6 simultaneously active sources, whereas for SBL without the isotropy constraint, the accuracy of finding just 3 simultaneously active sources was <75%. Conclusions: Compared to the state-of-the-art algorithms, the proposed method is potentially more consistent in specifying the sparsity profile of human brain activity and is able to produce better source localization for EEG.

preprint · 2015 · arXiv

wHealth - Transforming Telehealth Services

A worldwide increase in the proportion of older people in the population poses the challenge of managing their increasing healthcare needs within limited resources. To achieve this, many countries are interested in adopting telehealth technology. Several shortcomings of state-of-the-art telehealth technology constrain widespread adoption of telehealth services. We present an ensemble-sensing framework, wHealth (short for wireless health), for effective delivery of telehealth services. It extracts personal health information using sensors embedded in everyday devices and allows effective and seamless communication between patients and clinicians. Due to the non-stigmatizing design, ease of maintenance, simple interaction and seamless intervention, our wHealth platform has the potential to enable widespread adoption of telehealth services for managing elderly healthcare. We discuss the key barriers and potential solutions to make the wHealth platform a reality.

preprint · 2014 · arXiv

Continuous Gait Velocity Estimation using Houseohld Motion Detectors

Gait velocity has been consistently shown to be an important indicator and predictor of health status, especially in older adults. Gait velocity is often assessed clinically, but the assessments occur infrequently and thus do not allow optimal detection of key health changes when they occur. In this paper, we show that the time it takes a person to move between rooms in their home, denoted 'transition times', can predict gait velocity when estimated from passive infrared motion detectors installed in a patient's own home. Using a support vector regression approach to model the relationship between transition times and gait velocities, we show that velocity can be predicted with an average error of less than 2.5 cm/sec. This is demonstrated with data collected over a 5-year period from 74 older adults monitored in their own homes. This method is simple and cost-effective, and has advantages over competing approaches, such as obtaining 20 to 100x more gait velocity measurements per day and offering the fusion of location-specific information with time-stamped gait estimates. These advantages allow stable estimates of gait parameters (maximum or average speed, variability) at shorter time scales than current approaches. This also provides a pervasive in-home method for context-aware gait velocity sensing that allows for monitoring of gait trajectories in space and time.

preprint · 2014 · arXiv

EEG source localization using a sparsity prior based on Brodmann areas

Localizing the sources of electrical activity in the brain from electroencephalographic (EEG) data is an important tool for non-invasive study of brain dynamics. Generally, the source localization process involves a high-dimensional inverse problem that has an infinite number of solutions and thus requires additional constraints to obtain a unique solution. In the context of EEG source localization, we propose a novel approach that is based on dividing the cerebral cortex of the brain into a finite number of Functional Zones which correspond to unitary functional areas in the brain. In this paper we investigate the use of Brodmann's areas as the Functional Zones. This approach allows us to apply a sparsity constraint to find a unique solution for the inverse EEG problem. Compared to previously published algorithms which use different sparsity constraints to solve this problem, the proposed method is potentially more consistent with the known sparsity profile of human brain activity and thus may be able to ensure better localization. Numerical experiments are conducted on a realistic head model that is obtained from segmentation of MRI images of the head and includes four major compartments, namely scalp, skull, cerebrospinal fluid (CSF) and brain, with relative conductivity values. Three different electrode setups are tested in the numerical experiments.
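The underdetermined inverse problem and a zone-level sparsity constraint can be written compactly as follows. The notation is an assumption introduced here for illustration: y holds the m electrode measurements, x the n candidate source amplitudes, L the lead-field (forward) matrix, e the noise, and x_g the sources belonging to zone g out of G zones. The second line is the generic group-sparse (group-lasso) form, shown only as one common way to express such a constraint; it is not necessarily the formulation or solver used in the paper.

```latex
\begin{align}
  \mathbf{y} &= \mathbf{L}\,\mathbf{x} + \mathbf{e},
  \qquad \mathbf{L} \in \mathbb{R}^{m \times n}, \quad m \ll n \\
  \hat{\mathbf{x}} &= \arg\min_{\mathbf{x}} \;
  \tfrac{1}{2}\,\lVert \mathbf{y} - \mathbf{L}\mathbf{x} \rVert_2^2
  + \lambda \sum_{g=1}^{G} \lVert \mathbf{x}_g \rVert_2
\end{align}
```

Because m is much smaller than n, the first equation alone admits infinitely many solutions; the penalty term favours solutions in which only a few zones are active at once.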

preprint2014arXiv

Ensemble Sensing on Smart Werables for a better Telehealth System

Telehealth offers interesting avenues for improving healthcare access in vulnerable populations through the use of electronic devices in the patient's home that monitor and assess for early complications. However, complicated operation and poor reliability hinder the wide acceptance of telehealth services. We propose ensemble sensing on everyday wearable devices, which does not impose the burden of carrying wearable sensors, yet offers a seamless and simple platform to deliver telehealth services.

preprint2014arXiv

Guiding Ebola Patients to Suitable Health Facilities: An SMS-based Approach

We propose to utilize mobile phone technology as a vehicle for people to report their symptoms and to receive immediate feedback about the health services readily available, and for predicting spatial disease outbreak risk. Once symptoms are extracted from the patient's text message, they undergo complex classification, pattern matching and prediction to recommend the nearest suitable health service. The added benefit of this approach is that it enables health care facilities to anticipate the arrival of new potential Ebola cases.

preprint2014arXiv

Novel Methods for Activity Classification and Occupany Prediction Enabling Fine-grained HVAC Control

Much of the energy consumption in buildings is due to HVAC systems, which has motivated several recent studies on making these systems more energy-efficient. Occupancy and activity are two important aspects that need to be correctly estimated for optimal HVAC control. However, state-of-the-art methods to estimate occupancy and classify activity require infrastructure and/or wearable sensors, which suffer from lower acceptability due to higher cost. Encouragingly, with the advancement of smartphones, these are becoming more achievable. Most of the existing occupancy estimation techniques have the underlying assumption that the phone is always carried by its user. However, phones are often left at the desk while the user attends meetings or other events, which generates estimation errors for the existing phone-based occupancy algorithms. Similarly, the emerging theory of Sparse Random Classifier (SRC) has recently been applied to activity classification on smartphones; however, there is room to improve the on-phone processing. We propose a novel sensor fusion method which offers almost 100% accuracy for occupancy estimation. We also propose an activity classification algorithm, which offers accuracy similar to that of the state-of-the-art SRC algorithms while offering a 50% reduction in processing.

preprint2014arXiv

Signal Reconstruction from Rechargeable Wireless Sensor Networks using Sparse Random Projections

Due to non-homogeneous spread of sunlight, sensing nodes possess non-uniform energy budgets in rechargeable Wireless Sensor Networks (WSNs). An energy-aware workload distribution strategy is therefore necessary to achieve good data accuracy subject to energy-neutral operation. Recently proposed signal approximation strategies assume uniform sampling and fail to ensure energy-neutral operation in rechargeable wireless sensor networks. We propose EAST (Energy Aware Sparse approximation Technique), which approximates a signal by adapting sensor node sampling workload according to solar energy availability. To the best of our knowledge, we are the first to propose sparse approximation to model energy-aware workload distribution in rechargeable WSNs. Experimental results, using data from an outdoor WSN deployment, suggest that EAST significantly improves the approximation accuracy, offering approximately 50% higher sensor on-time. EAST requires the approximation error to be known beforehand to determine the number of measurements. However, it is not always possible to decide the accuracy a priori. We improve EAST and propose EAST+, which, given only the energy budget of the nodes, computes the optimal number of measurements subject to energy-neutral operation.
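To make the notion of energy-aware workload distribution concrete, here is a minimal sketch that splits a fixed number of samples across nodes in proportion to their energy budgets. Proportional allocation, the function name `allocate_samples` and the node names are assumptions chosen for illustration; the abstract does not state the rule EAST actually uses.

```python
from typing import Dict


def allocate_samples(energy: Dict[str, float], total: int) -> Dict[str, int]:
    """Split `total` samples across nodes in proportion to available energy.

    This is an assumed, simplified rule, not the EAST algorithm itself.
    """
    budget = sum(energy.values())
    if budget <= 0:
        raise ValueError("total energy must be positive")
    shares = {n: total * e / budget for n, e in energy.items()}
    alloc = {n: int(s) for n, s in shares.items()}  # floor of each share
    # Hand out the samples lost to rounding, largest fractional part first.
    leftover = total - sum(alloc.values())
    by_fraction = sorted(shares, key=lambda k: shares[k] - alloc[k], reverse=True)
    for n in by_fraction[:leftover]:
        alloc[n] += 1
    return alloc


# Hypothetical energy budgets for three nodes with different sun exposure.
plan = allocate_samples({"sunny": 6.0, "partial": 3.0, "shaded": 1.0}, total=20)
```

Under this rule a shaded node still contributes a few samples, while nodes with more harvested energy carry most of the sensing load.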

preprint2014arXiv

SimpleTrack:Adaptive Trajectory Compression with Deterministic Projection Matrix for Mobile Sensor Networks

Some mobile sensor network applications require the sensor nodes to transfer their trajectories to a data sink. This paper proposes an adaptive (lossy) trajectory compression algorithm based on compressive sensing. The algorithm has two innovative elements. First, we propose a method to compute a deterministic projection matrix from a learnt dictionary. Second, we propose a method for the mobile nodes to adaptively predict the number of projections needed based on the speed of the mobile nodes. Extensive evaluation of the proposed algorithm using 6 datasets shows that our proposed algorithm can achieve sub-metre accuracy. In addition, our method of computing projection matrices outperforms two existing methods. Finally, comparison of our algorithm against a state-of-the-art trajectory compression algorithm shows that our algorithm can reduce the error by 10-60 cm for the same compression ratio.

preprint2013arXiv

A Deterministic Construction of Projection matrix for Adaptive Trajectory Compression

Compressive Sensing, which offers exact reconstruction of a sparse signal from a small number of measurements, has tremendous potential for trajectory compression. In order to optimize the compression, trajectory compression algorithms need to adapt the compression ratio to the compressibility of the trajectory. Intuitively, the trajectory of an object moving on a straight road is more compressible than the trajectory of an object moving on winding roads; therefore, higher compression is achievable in the former case than in the latter. We propose an in-situ compression technique, underpinned by support vector regression theory, which accurately predicts the compressibility of a trajectory given the mean speed of the object and then applies compressive sensing to adapt the compression to the compressibility of the trajectory. The conventional encoding and decoding process of compressive sensing uses predefined dictionary and measurement (or projection) matrix pairs. However, the selection of an optimal pair is nontrivial and exhaustive, and random selection of a pair does not guarantee the best compression performance. In this paper, we propose a deterministic and data-driven construction for the projection matrix, which is obtained by applying singular value decomposition to a sparsifying dictionary learned from the dataset. We analyze case studies of pedestrian and animal trajectory datasets, including GPS trajectory data from 127 subjects. The experimental results suggest that the proposed adaptive compression algorithm, incorporating the deterministic construction of the projection matrix, offers significantly better compression performance than the state-of-the-art alternatives.

preprint2013arXiv

Ear-Phone: A Context-Aware Noise Mapping using Smart Phones

A noise map facilitates the monitoring of environmental noise pollution in urban areas. However, state-of-the-art techniques for rendering noise maps in urban areas are expensive and rarely updated, as they rely on population and traffic models rather than on real data. Smart phone based urban sensing can be leveraged to create an open and inexpensive platform for rendering up-to-date noise maps. In this paper, we present the design, implementation and performance evaluation of an end-to-end, context-aware, noise mapping system called Ear-Phone. Ear-Phone investigates the use of different interpolation and regularization methods to address the fundamental problem of recovering the noise map from incomplete and random samples obtained by crowdsourced data collection. Ear-Phone, implemented on Nokia N95, Nokia N97, HP iPAQ and HTC One mobile devices, also addresses the challenge of collecting accurate noise pollution readings at a mobile device. A major challenge of using smart phones as sensors is that even at the same location, the sensor reading may vary depending on the phone orientation and user context (for example, whether the user is carrying the phone in a bag or holding it in her palm). To address this problem, Ear-Phone leverages context-aware sensing. We develop classifiers to accurately determine the phone sensing context. Upon context discovery, Ear-Phone automatically decides whether to sense or not. Ear-Phone also implements in-situ calibration, a simple procedure that can be carried out without any technical skills on the user's part. Extensive simulations and outdoor experiments demonstrate that Ear-Phone is a feasible platform to assess noise pollution, incurring reasonable system resource consumption at mobile devices and providing high reconstruction accuracy of the noise map.
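The problem of recovering a noise map from scattered samples can be illustrated with inverse distance weighting, one common interpolation scheme. It is used here purely as a stand-in: the abstract does not say which interpolation or regularization methods Ear-Phone evaluates, and the sample format and function name below are hypothetical. Averaging decibel values directly is also a simplification.

```python
import math
from typing import List, Tuple

# (x, y, noise level in dB): a hypothetical sample format for this sketch
Sample = Tuple[float, float, float]


def idw_estimate(samples: List[Sample], x: float, y: float, power: float = 2.0) -> float:
    """Estimate the level at (x, y) as a distance-weighted mean of the samples."""
    num = 0.0
    den = 0.0
    for sx, sy, level in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return level  # query point coincides with a measurement
        w = 1.0 / d ** power  # closer samples get larger weights
        num += w * level
        den += w
    if den == 0.0:
        raise ValueError("no samples given")
    return num / den


readings = [(0.0, 0.0, 60.0), (10.0, 0.0, 70.0)]
midpoint = idw_estimate(readings, 5.0, 0.0)
```

A query point equidistant from two readings receives their plain average, while points near a single reading are dominated by it.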

32 published item(s)

Emotion Intensity and its Control for Emotional Voice Conversion

Multitask Learning from Augmented Auxiliary Data for Improving Speech Emotion Recognition

Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition

Speech Synthesis with Mixed Emotions

A novel policy for pre-trained Deep Reinforcement Learning for Speech Emotion Recognition

Augmenting Generative Adversarial Networks for Speech Emotion Recognition

Automated Screening for Distress: A Perspective for the Future

Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition

Deep Reinforcement Learning with Pre-training for Time-efficient Training of Automatic Speech Recognition

Direct Modelling of Speech Emotion from Raw Speech

Guided Generative Adversarial Neural Network for Representation Learning and High Fidelity Audio Generation using Fewer Labelled Audio Data

Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion Recognition

Phonocardiographic Sensing using Deep Learning for Abnormal Heartbeat Detection

Transfer Learning for Improving Speech Emotion Classification Accuracy

Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study

Emotion Classification from Noisy Speech - A Deep Learning Approach

Gated Recurrent Unit (GRU) for Emotion Classification from Noisy Speech

Affect Sensing on Smartphone - Possibilities of Understanding Cognitive Decline in Aging Population

Evaluating the Performance of BSBL Methodology for EEG Source Localization On a Realistic Head Model

Gait Velocity Estimation using time interleaved between Consecutive Passive IR Sensor Activations

Opportunistic and Context-aware Affect Sensing on Smartphones: The Concept, Challenges and Opportunities

Sparse Bayesian Learning for EEG Source Localization

wHealth - Transforming Telehealth Services

Continuous Gait Velocity Estimation using Houseohld Motion Detectors

EEG source localization using a sparsity prior based on Brodmann areas

Ensemble Sensing on Smart Werables for a better Telehealth System

Guiding Ebola Patients to Suitable Health Facilities: An SMS-based Approach

Novel Methods for Activity Classification and Occupany Prediction Enabling Fine-grained HVAC Control

Signal Reconstruction from Rechargeable Wireless Sensor Networks using Sparse Random Projections

SimpleTrack:Adaptive Trajectory Compression with Deterministic Projection Matrix for Mobile Sensor Networks

A Deterministic Construction of Projection matrix for Adaptive Trajectory Compression

Ear-Phone: A Context-Aware Noise Mapping using Smart Phones