Source author record

Soumyabrata Dev

Soumyabrata Dev appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

physics.ao-ph · Computer Vision · eess.SP · Machine Learning · Artificial Intelligence · Cryptography and Security · Multimedia · Computation and Language · eess.AS · eess.IV · Human-Computer Interaction · Information Retrieval · Networking and Internet Architecture · physics.ins-det · Sound

Catalog footprint

What is connected

22 works

15 topics

4 close collaborators


preprint · 2025 · arXiv

A multimodal Transformer for InSAR-based ground deformation forecasting with cross-site generalization across Europe

Near-real-time regional-scale monitoring of ground deformation is increasingly required to support urban planning, critical infrastructure management, and natural hazard mitigation. While Interferometric Synthetic Aperture Radar (InSAR) and continental-scale services such as the European Ground Motion Service (EGMS) provide dense observations of past motion, predicting the next observation remains challenging due to the superposition of long-term trends, seasonal cycles, and occasional abrupt discontinuities (e.g., co-seismic steps), together with strong spatial heterogeneity. In this study, we propose a multimodal patch-based Transformer for single-step, fixed-interval next-epoch nowcasting of displacement maps from EGMS time series (resampled to a 64x64 grid over 100 km x 100 km tiles). The model ingests recent displacement snapshots together with (i) static kinematic indicators (mean velocity, acceleration, seasonal amplitude) computed in a leakage-safe manner from the training window only, and (ii) harmonic day-of-year encodings. On the eastern Ireland tile (E32N34), the spatio-temporal graph convolutional network (STGCN) is strongest in the displacement-only setting, whereas the multimodal Transformer clearly outperforms CNN-LSTM, CNN-LSTM+Attn, and multimodal STGCN when all models receive the same multimodal inputs, achieving RMSE = 0.90 mm and $R^2$ = 0.97 on the test set with the best threshold accuracies.
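A minimal Python sketch of two of the inputs described above, under stated assumptions: the harmonic day-of-year encoding is taken to be a sine/cosine pair with a 365.25-day period, and the mean-velocity indicator is estimated by least squares on the training epochs only. The function names, the period, and the estimator are illustrative assumptions, not the paper's published code.

```python
import math


def doy_encoding(day_of_year: int, period: float = 365.25) -> tuple[float, float]:
    """Map a day of year onto the unit circle so late December and early January are close."""
    angle = 2.0 * math.pi * day_of_year / period
    return math.sin(angle), math.cos(angle)


def training_window_velocity(times_days, displacement_mm, train_end):
    """Least-squares mean velocity (mm/day) fitted only on samples with index < train_end.

    Restricting the fit to the training window keeps validation and test
    epochs from leaking into this static feature.
    """
    t = list(times_days[:train_end])
    d = list(displacement_mm[:train_end])
    if len(t) < 2:
        raise ValueError("need at least two training epochs")
    t_mean = sum(t) / len(t)
    d_mean = sum(d) / len(d)
    cov = sum((ti - t_mean) * (di - d_mean) for ti, di in zip(t, d))
    var = sum((ti - t_mean) ** 2 for ti in t)
    return cov / var


# Hypothetical series on a 6-day revisit; the last two epochs are held out for testing.
times = [0, 6, 12, 18, 24, 30]
disp = [0.0, -0.6, -1.2, -1.8, 5.0, 9.0]
v = training_window_velocity(times, disp, train_end=4)  # approximately -0.1 mm/day
```

Because the fit stops at `train_end`, the abrupt jump in the held-out epochs has no influence on the velocity feature.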

preprint · 2022 · arXiv

A predictive analytics approach for stroke prediction using machine learning and neural networks

The negative impact of stroke on society has led to concerted efforts to improve the management and diagnosis of stroke. With increased synergy between technology and medical diagnosis, caregivers create opportunities for better patient management by systematically mining and archiving patients' medical records. It is therefore vital to study the interdependency of the risk factors recorded in patients' health records and to understand their relative contribution to stroke prediction. This paper systematically analyzes the various factors in electronic health records for effective stroke prediction. Using various statistical techniques and principal component analysis, we identify the most important factors for stroke prediction. We conclude that age, heart disease, average glucose level, and hypertension are the most important factors for detecting stroke in patients. Furthermore, a perceptron neural network using these four attributes provides the highest accuracy rate and lowest miss rate compared to using all available input features and other benchmarking algorithms. As the dataset is highly imbalanced with respect to the occurrence of stroke, we report our results on a balanced dataset created via sub-sampling techniques.
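Sub-sampling to balance the classes can be sketched as random undersampling of the majority class. This is a minimal illustration assuming records stored as dictionaries with a binary label under a hypothetical `stroke` key; the paper's exact sub-sampling procedure may differ.

```python
import random


def undersample(records, label_key="stroke", seed=0):
    """Randomly drop majority-class records until both classes are the same size."""
    rng = random.Random(seed)  # fixed seed so the balanced subset is reproducible
    positives = [r for r in records if r[label_key] == 1]
    negatives = [r for r in records if r[label_key] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 5 stroke cases and 95 non-stroke cases.
records = [{"id": i, "stroke": 1 if i < 5 else 0} for i in range(100)]
balanced = undersample(records)
print(len(balanced), sum(r["stroke"] for r in balanced))  # 10 5
```

Undersampling discards majority-class data, which is why it is usually paired with reporting on the balanced subset, as the abstract does.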

preprint · 2022 · arXiv

A semantic web approach to uplift decentralized household energy data

In a decentralized household energy system composed of various devices such as home appliances, electric vehicles, and solar panels, end-users can dig deeper into the system's details and further achieve energy sustainability if they are presented with data on electric energy consumption and production at device-level granularity. However, many databases in this field are siloed from other domains and contain only energy-related information. This may result in the loss of information (e.g., weather) relevant to each device's energy use. Meanwhile, many of these datasets have been used extensively in computational modeling techniques such as machine learning models. While such approaches achieve high accuracy and performance by concentrating only on a local view of the datasets, model reliability cannot be guaranteed, since such models are very vulnerable to fluctuations in input data when information is omitted. This article tackles the data isolation issue in smart energy systems by examining Semantic Web methods on top of a household energy system. We offer an ontology-based approach for managing decentralized data at device-level resolution. As a consequence, the scope of the data associated with each device can easily be expanded in an interoperable manner throughout the Web, and additional information, such as weather, can be obtained from the Web, provided that the data is organized according to W3C standards.

preprint · 2022 · arXiv

Air Quality in the New Delhi Metropolis under COVID-19 Lockdown

Air pollution has been rising continuously with increasing industrialization in the world's metropolitan cities. Several nations have implemented measures including strict climate laws and reductions in the number of vehicles. The COVID-19 pandemic provided a great opportunity to understand the effect of daily human activities on air pollution. Most nations restricted industrial activities and vehicular traffic to a large extent as a measure to limit the spread of COVID-19. In this paper, we analyzed the impact of such COVID-19-induced lockdowns on the air quality of the city of New Delhi, India. We analyzed the average concentration of common gaseous pollutants, viz. sulfur dioxide (SO$_2$), ozone (O$_3$), nitrogen dioxide (NO$_2$), and carbon monoxide (CO). These concentrations were obtained from tropospheric column data from Sentinel-5P (an Earth observation satellite of the European Space Agency). We found that the city experienced a significant drop in the concentrations of all major atmospheric pollutants as a result of strict lockdown measures. These findings are also validated with pollutant data obtained from ground-based monitoring stations. We observed that near-surface pollutant concentrations dropped significantly, by 50% for PM$_{2.5}$, 71.9% for NO$_2$, and 88% for CO, after the lockdown period. Such studies could pave the way for environmentalists to implement future air pollution control measures.
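The reported reductions are relative changes between two periods. A small sketch of that arithmetic, assuming the percentage is computed from the mean concentration of each period; the function names and readings below are hypothetical and not taken from the study.

```python
from statistics import mean


def percent_drop(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent (negative means an increase)."""
    if before == 0:
        raise ValueError("baseline concentration must be non-zero")
    return 100.0 * (before - after) / before


def lockdown_change(pre_lockdown, during_lockdown):
    """Percent drop between the mean concentrations of two periods."""
    return percent_drop(mean(pre_lockdown), mean(during_lockdown))


# Hypothetical daily readings in the same units for both periods.
print(lockdown_change([90, 110, 100], [45, 55, 50]))  # 50.0
```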

preprint · 2022 · arXiv

An Explore of Virtual Reality for Awareness of the Climate Change Crisis: A Simulation of Sea Level Rise

Virtual Reality (VR) technology has been shown to achieve remarkable results in multiple fields. Because Virtual Reality is an immersive medium, it follows that it can be used as a high-quality educational tool, as it potentially offers higher bandwidth than other media such as text, pictures, and videos. This short paper describes the development of a climate change awareness application for virtual reality that uses prediction data to simulate virtual scenes of local scenery and sea level rise up to 2100. The paper also reports on work in progress to port the system to Augmented Reality (AR) and on future work to evaluate the system.

preprint · 2022 · arXiv

Analyzing Air Pollutant Concentrations in New Delhi, India

Air pollutants have long been known to cause major health problems in humans and other living organisms. They also play a crucial role in temperature inversions in the atmospheric layers, thereby seriously affecting radio communications, increasing fog levels, and decreasing visibility. Recognizing the seriousness of these pollutants, this paper analyzes seven different pollutants for the New Delhi region in India and creates a publicly available, easily accessible dataset of them. This analysis and pre-processing are intended to assist researchers who wish to use the dataset for further studies, such as pollutant forecasting or correlation analysis, thereby promoting research in the domain.

preprint · 2022 · arXiv

Analyzing the impact of feature selection on the accuracy of heart disease prediction

Heart disease has become one of the most serious diseases, with a significant impact on human life. During the last decade, it has emerged as one of the leading causes of mortality worldwide. To prevent further harm to patients, timely and accurate diagnosis of heart disease is essential. Recently, non-invasive procedures such as artificial intelligence-based techniques have come into use in medicine. In particular, machine learning offers several algorithms and techniques that are widely used and highly useful for diagnosing heart disease accurately in less time. However, predicting heart disease is not an easy task. The increasing size of medical datasets has made it complicated for practitioners to understand complex feature relations and make disease predictions. Accordingly, the aim of this research is to identify the most important risk factors in a high-dimensional dataset, which helps in the accurate classification of heart disease with fewer complications. For a broader analysis, we use two heart disease datasets with various medical features. The classification results of the benchmarked models show that relevant features have a high impact on classification accuracy. Even with a reduced number of features, the performance of the classification models improved significantly, with reduced training time, compared with models trained on the full feature set.

preprint · 2022 · arXiv

DMCNet: Diversified Model Combination Network for Understanding Engagement from Video Screengrabs

Engagement is an essential indicator of the Quality-of-Learning Experience (QoLE) and plays a major role in developing intelligent educational interfaces. The number of people learning through Massive Open Online Courses (MOOCs) and other online resources has been increasing rapidly, because these resources offer the flexibility to learn from anywhere at any time. This provides a good learning experience for students. However, such learning interfaces require the ability to recognize students' level of engagement to provide a holistic learning experience, which is useful for students and educators alike. Understanding engagement is challenging because of its subjectivity and the difficulty of collecting data. In this paper, we propose a variety of models trained on an open-source dataset of video screengrabs. Our non-deep learning models are based on combinations of popular algorithms such as Histogram of Oriented Gradients (HOG), Support Vector Machine (SVM), Scale Invariant Feature Transform (SIFT), and Speeded Up Robust Features (SURF). The deep learning methods include Densely Connected Convolutional Networks (DenseNet-121), Residual Network (ResNet-18), and MobileNetV1. We report the performance of each model using a variety of metrics such as the Gini Index, Adjusted F-Measure (AGF), and Area Under the Receiver Operating Characteristic Curve (AUC). We use dimensionality reduction techniques such as Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) to understand the distribution of data in the feature subspace. Our work will thereby assist educators and students in obtaining a fruitful and efficient online learning experience.
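Two of the listed metrics are related: in classifier evaluation, the Gini index is commonly defined as 2 * AUC - 1. The sketch below assumes that definition and a binary engaged/not-engaged labelling (an assumption; the dataset may use more engagement levels), computing AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative. Function names are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini_index(scores, labels):
    """Gini coefficient of a scoring classifier, assuming the 2 * AUC - 1 definition."""
    return 2.0 * roc_auc(scores, labels) - 1.0


# Hypothetical classifier scores for two engaged (1) and two disengaged (0) clips.
print(gini_index([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.5
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a rank-based formula scales better on large test sets.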

preprint · 2022 · arXiv

Evaluating the Reliability of Air Temperature from ERA5 Reanalysis Data

This paper investigates the reliability of ERA5 satellite-based air temperature data. To evaluate it, ERA5 data is compared with land-based data obtained from weather stations in the Global Historical Climatology Network. Two climate regions are considered: temperate and tropical. Five years' worth of data is collected and compared using box plots, regression models, and statistical metrics. The results show that the satellite temperature performs better in the temperate region than in the tropical region. This suggests that the time of year and the climate region affect the accuracy of the satellite data, as milder temperatures produce better approximations.

preprint · 2022 · arXiv

Frequency-centroid features for word recognition of non-native English speakers

The objective of this work is to investigate complementary features that can aid the quintessential Mel frequency cepstral coefficients (MFCCs) in the task of closed, limited-set word recognition for non-native English speakers of different mother tongues. Unlike the MFCCs, which are derived from the spectral energy of the speech signal, the proposed frequency-centroids (FCs) encapsulate the spectral centres of the different bands of the speech spectrum, with the bands defined by the Mel filterbank. These features, in combination with the MFCCs, are observed to provide a relative performance improvement in English word recognition, particularly under varied noisy conditions. A two-stage Convolutional Neural Network (CNN) is used to model the features of the English words uttered with Arabic, French, and Spanish accents.
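A NumPy sketch of the frequency-centroid idea for a single frame: weight the magnitude spectrum by each triangular Mel filter and take the weighted mean frequency of each band. The Hann window, the 20 filters, the HTK-style Mel formula (2595 * log10(1 + f/700)), magnitude rather than power weighting, and all function names are assumptions; the paper's exact definition may differ.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(freqs, sample_rate, n_filters):
    """Triangular filters equally spaced on the Mel scale, evaluated at `freqs` (Hz)."""
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    bank = np.zeros((n_filters, freqs.size))
    for k in range(n_filters):
        lo, centre, hi = hz_edges[k], hz_edges[k + 1], hz_edges[k + 2]
        rising = (freqs - lo) / (centre - lo)
        falling = (hi - freqs) / (hi - centre)
        bank[k] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def frequency_centroids(frame, sample_rate, n_filters=20):
    """Magnitude-weighted mean frequency (Hz) inside each Mel band of one frame."""
    frame = np.asarray(frame, dtype=np.float64)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame.size)))
    freqs = np.fft.rfftfreq(frame.size, d=1.0 / sample_rate)
    weighted = mel_filterbank(freqs, sample_rate, n_filters) * spectrum  # (bands, bins)
    band_energy = weighted.sum(axis=1)
    # Guard against empty bands; their centroid is reported as 0 Hz.
    return (weighted * freqs).sum(axis=1) / np.maximum(band_energy, 1e-12)


# Hypothetical check: a 1 kHz tone yields a centroid near 1 kHz in the band around it.
sr = 16000
t = np.arange(512) / sr
fc = frequency_centroids(np.sin(2 * np.pi * 1000.0 * t), sr)
```

Per frame this yields one value per Mel band, so the FCs can be stacked alongside MFCCs computed from the same filterbank.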

preprint · 2022 · arXiv

LAMSkyCam: A Low-cost and Miniature Ground-based Sky Camera

Ground-based sky imagers (GSIs) are becoming increasingly popular among remote sensing analysts, because they offer attractive alternatives to satellite measurements for Earth observation. In this paper, we propose an extremely low-cost, miniature ground-based sky camera for atmospheric study. Built from 3D-printed and off-the-shelf components, our sky camera is lightweight and robust for use in diverse climatic conditions. With a $63^{\circ}$ field of view, the camera captures high-resolution sky/cloud images during both day and night at 5-minute intervals. The camera is designed to be mounted on a pole-like structure, and thanks to its compact form, it can be installed at any location without changes to the existing infrastructure. For remote areas, the camera also has a local backup facility from which data can easily be accessed manually. We have open-sourced the hardware design of our sky camera, so researchers can easily manufacture and deploy these cameras for their own use cases.

preprint · 2022 · arXiv

On the Relationship Between Ground- and Satellite- Based Global Horizontal Irradiance

Global horizontal irradiance (GHI) plays a significant role in maintaining the earth's ecological balance and in generating electricity in photovoltaic systems. While satellites offer wider coverage, they have been shown to over- or under-estimate the true GHI values observed at ground-based stations. Hence, this study analyzes the relationship between these two sources of GHI data in order to make better and more effective use of the reach of satellites for GHI analysis. The paper identifies a near-linear relationship between the two and concludes that an approximate mapping from satellite-based to ground-based GHI values can be obtained.
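An approximate satellite-to-ground mapping of this kind can be sketched as an ordinary least-squares line fitted to paired GHI samples. The function names, the choice of plain OLS, and the sample values are assumptions for illustration, not the paper's reported fit.

```python
def fit_linear_map(satellite, ground):
    """Ordinary least-squares fit of ground approx. slope * satellite + intercept."""
    n = len(satellite)
    if n != len(ground) or n < 2:
        raise ValueError("need two equally long series with at least two points")
    mx = sum(satellite) / n
    my = sum(ground) / n
    sxx = sum((x - mx) ** 2 for x in satellite)
    sxy = sum((x - mx) * (y - my) for x, y in zip(satellite, ground))
    slope = sxy / sxx
    return slope, my - slope * mx


def satellite_to_ground(value, slope, intercept):
    """Apply the fitted mapping to a new satellite GHI value (W/m^2)."""
    return slope * value + intercept


# Hypothetical paired hourly GHI values (W/m^2): satellite vs. ground station.
sat = [100.0, 200.0, 300.0, 400.0]
gnd = [100.0, 190.0, 280.0, 370.0]
slope, intercept = fit_linear_map(sat, gnd)
estimate = satellite_to_ground(250.0, slope, intercept)  # mapped ground-level estimate
```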

preprint · 2021 · arXiv

Validating Clustering Frameworks for Electric Load Demand Profiles

Large-scale deployment of smart meters has made it possible to collect sufficient, high-resolution data on residential electricity demand profiles. Clustering analysis of these profiles is important for further analyzing and interpreting electricity consumption patterns. Although many clustering techniques have been proposed in the literature over the years, different techniques are often found to fit different datasets best. To identify the most suitable technique, standard clustering validity indices are often used. These indices focus primarily on the intrinsic characteristics of the clustering results. Moreover, different indices often give conflicting recommendations, which can only be resolved with heuristics about the dataset and/or the expected cluster structures, information that is rarely available in practical situations. This paper presents a novel scheme to validate and compare clustering results objectively. Additionally, the proposed scheme considers all the steps prior to the clustering algorithm, including pre-processing and dimensionality reduction, in order to provide recommendations over the complete framework. Accordingly, the proposed strategy is shown to provide better, unbiased, and uniform recommendations compared with standard clustering validity indices.

preprint · 2020 · arXiv

An Advert Creation System for 3D Product Placements

Over the past decade, the evolution of video-sharing platforms has attracted significant investment in contextual advertising. Common contextual advertising platforms use information provided by users to integrate 2D visual ads into videos. Existing platforms face many technical challenges, such as ad integration with respect to occluding objects and 3D ad placement. This paper presents a Video Advertisement Placement & Integration (Adverts) framework, which is capable of perceiving the 3D geometry of the scene and camera motion to blend 3D virtual objects into videos and create the illusion of reality. The proposed framework contains several modules, such as monocular depth estimation, object segmentation, background-foreground separation, alpha matting, and camera tracking. Our experiments with the Adverts framework indicate the significant potential of this system for contextual ad integration and for pushing the limits of the advertising industry using mixed reality technologies.

preprint · 2020 · arXiv

DDoSNet: A Deep-Learning Model for Detecting Network Attacks

Software-Defined Networking (SDN) is an emerging paradigm that has evolved in recent years to address the weaknesses of traditional networks. The key feature of SDN, achieved by decoupling the control plane from the data plane, facilitates network management and makes the network efficiently programmable. However, the new architecture can be susceptible to several attacks that lead to resource exhaustion and prevent the SDN controller from serving legitimate users. One of these attacks, which is growing significantly nowadays, is the Distributed Denial of Service (DDoS) attack. DDoS attacks can exhaust network resources, leaving the target servers unable to serve valid users. Current methods deploy Machine Learning (ML) for intrusion detection against DDoS attacks in SDN networks using standard datasets. However, these methods suffer from several drawbacks, and the datasets used do not contain the most recent attack patterns, and hence lack attack diversity. In this paper, we propose DDoSNet, an intrusion detection system against DDoS attacks in SDN environments. Our method is based on a Deep Learning (DL) technique that combines a Recurrent Neural Network (RNN) with an autoencoder. We evaluate our model using the newly released CICDDoS2019 dataset, which contains a comprehensive variety of DDoS attacks and addresses the gaps in existing datasets. We obtain a significant improvement in attack detection compared to other benchmarking methods. Hence, our model provides greater confidence in securing these networks.

preprint · 2020 · arXiv

Detecting Abnormal Traffic in Large-Scale Networks

With rapid technological advancements, organizations need to scale up their information technology (IT) infrastructure, viz. hardware, software, and services, quickly and at low cost. However, the dynamic growth in network services and applications creates security vulnerabilities and new risks that can be exploited by various attacks. For example, the User to Root (U2R) and Remote to Local (R2L) attack categories can cause significant damage and paralyze the entire network system. Such attacks are not easy to detect because of their high degree of similarity to normal traffic. While network anomaly detection systems are widely used to classify and detect malicious traffic, there are many challenges in discovering and identifying minority attacks in imbalanced datasets. In this paper, we provide a detailed and systematic analysis of the existing Machine Learning (ML) approaches that can tackle most of these attacks. Furthermore, we propose a Deep Learning (DL) based framework using a Long Short-Term Memory (LSTM) autoencoder that can accurately detect malicious network traffic. We perform our experiments on a publicly available Intrusion Detection System (IDS) dataset. We obtain a significant improvement in attack detection compared to other benchmarking methods. Hence, our method provides greater confidence in securing these networks from malicious traffic.
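The LSTM autoencoder itself is beyond a short example, but the detection step commonly paired with autoencoders can be sketched: score each traffic window by its reconstruction error and flag it when the error exceeds a threshold learned from benign traffic. The 99th-percentile rule and all names below are assumptions, not the paper's reported configuration.

```python
def reconstruction_error(window, reconstruction):
    """Mean squared error between an input window and its autoencoder reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(window, reconstruction)) / len(window)


def percentile(values, q):
    """Linearly interpolated percentile of `values`, with q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("empty input")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def flag_anomalies(errors, benign_errors, q=99.0):
    """Flag samples whose error exceeds the q-th percentile of errors on benign traffic."""
    threshold = percentile(benign_errors, q)
    return [e > threshold for e in errors]


# Hypothetical reconstruction errors measured on benign training traffic.
benign = [0.01 * i for i in range(1, 101)]
flags = flag_anomalies([0.5, 1.2, 3.0], benign)
```

Fitting the threshold only on benign traffic means no labelled attacks are needed to calibrate it, which suits imbalanced datasets where minority attacks are rare.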

preprint · 2020 · arXiv

Forecasting Precipitable Water Vapor Using LSTMs

Long Short-Term Memory (LSTM) networks have been used extensively for time series forecasting in recent years due to their ability to learn patterns over different periods of time. In this paper, this ability is applied to learning the pattern of Global Positioning System (GPS)-based Precipitable Water Vapor (PWV) measurements over a period of 4 hours. The trained model was evaluated on more than 1500 hours of recorded data. It achieves a root mean square error (RMSE) of 0.098 mm for a forecasting interval of 5 minutes into the future, and outperforms the naive approach for lead times of up to 40 minutes.
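The "naive approach" is assumed here to be a persistence forecast that repeats the latest PWV value. A minimal sketch of its RMSE at a given lead time, assuming evenly spaced 5-minute samples so that a 40-minute lead corresponds to 8 steps; the names are hypothetical.

```python
import math


def rmse(pred, truth):
    """Root mean square error between two equally long sequences."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def persistence_rmse(series, lead_steps):
    """RMSE of the naive forecast that repeats the latest value: y_hat[t + lead] = y[t]."""
    if not 1 <= lead_steps < len(series):
        raise ValueError("lead_steps must be between 1 and len(series) - 1")
    return rmse(series[:-lead_steps], series[lead_steps:])


# With 5-minute sampling (assumed), a 40-minute lead time is 8 steps.
STEP_MINUTES = 5
lead_steps = 40 // STEP_MINUTES
```

A learned model is only useful at a given lead time if its RMSE there falls below `persistence_rmse(series, lead_steps)` on the same test series.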

preprint · 2019 · arXiv

A Data-Driven Approach for Accurate Rainfall Prediction

In recent years, there has been growing interest in using Precipitable Water Vapor (PWV) derived from Global Positioning System (GPS) signal delays to predict rainfall. However, the occurrence of rainfall depends on a myriad of atmospheric parameters. This paper proposes a systematic approach to analyzing the various parameters that affect precipitation in the atmosphere. Different ground-based weather features, such as Temperature, Relative Humidity, Dew Point, Solar Radiation, and PWV, along with Seasonal and Diurnal variables, are identified, and a detailed feature correlation study is presented. While all features play a significant role in rainfall classification, only a few of them, such as PWV, Solar Radiation, and the Seasonal and Diurnal features, stand out for rainfall prediction. Based on these findings, an optimum set of features is used in a data-driven machine learning algorithm for rainfall prediction. The experimental evaluation using a four-year (2012-2015) database shows a true detection rate of 80.4%, a false alarm rate of 20.3%, and an overall accuracy of 79.6%. Compared to the existing literature, our method significantly reduces the false alarm rate.
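The three reported figures can be computed from confusion-matrix counts. This sketch assumes rain is the positive class and that the false alarm rate is FP / (FP + TN); some rainfall studies instead use FP / (TP + FP), so the paper's exact definition should be checked. The counts in the example are hypothetical.

```python
def rain_detection_metrics(tp, fp, tn, fn):
    """Detection metrics from binary confusion-matrix counts (rain = positive class)."""
    return {
        "true_detection_rate": tp / (tp + fn),  # share of rain events detected
        "false_alarm_rate": fp / (fp + tn),     # share of no-rain cases flagged as rain
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, not the paper's results.
print(rain_detection_metrics(tp=80, fp=20, tn=80, fn=20))
# {'true_detection_rate': 0.8, 'false_alarm_rate': 0.2, 'accuracy': 0.8}
```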

preprint · 2019 · arXiv

CloudSegNet: A Deep Network for Nychthemeron Cloud Image Segmentation

We analyze clouds in the earth's atmosphere using ground-based sky cameras. Accurate segmentation of clouds in the captured sky/cloud images is difficult, owing to the fuzzy boundaries of clouds. Several techniques have been proposed that use color as the discriminatory feature for cloud detection. In the existing literature, however, daytime and nighttime images are analyzed separately, mainly because of differences in image characteristics and applications. In this paper, we propose a lightweight deep-learning architecture called CloudSegNet. It is the first to integrate daytime and nighttime (also known as nychthemeron) image segmentation in a single framework, and it achieves state-of-the-art results on public databases.
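For contrast with CloudSegNet, the colour-based approach mentioned above can be sketched as a red/blue ratio threshold: clear sky is strongly blue, while clouds are whitish, with red close to blue. The 0.8 threshold and the function name are hypothetical; this illustrates the classic baseline idea, not CloudSegNet.

```python
import numpy as np


def ratio_cloud_mask(rgb, threshold=0.8):
    """Flag pixels whose red/blue ratio exceeds `threshold` as cloud (True)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    red = rgb[..., 0]
    blue = rgb[..., 2]
    ratio = red / np.maximum(blue, 1.0)  # avoid division by zero on black pixels
    return ratio > threshold


# Hypothetical 1x2 image: a clear-sky pixel and a cloudy pixel.
mask = ratio_cloud_mask([[[60, 120, 220], [200, 200, 210]]])
print(mask.tolist())  # [[False, True]]
```

A fixed colour threshold of this kind struggles at night and near the sun, which is part of the motivation for learned models that handle both day and night images.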

preprint · 2016 · arXiv

Detecting Rainfall Onset Using Sky Images

Ground-based sky cameras (popularly known as Whole Sky Imagers) are increasingly used nowadays for continuous monitoring of the atmosphere. These imagers have higher temporal and spatial resolution than conventional satellite images. In this paper, we use ground-based sky cameras to detect the onset of rainfall. These images contain additional information about cloud coverage and movement and are therefore useful for accurate rainfall nowcasting. We validate our results using rain gauge measurements and achieve an accuracy of 89% in correctly detecting rainfall onset.

preprint · 2016 · arXiv

Machine Learning Techniques and Applications For Ground-based Image Analysis

Ground-based whole sky cameras have opened up new opportunities for monitoring the earth's atmosphere. These cameras are an important complement to satellite images, providing geoscientists with cheaper, faster, and more localized data. The images captured by whole sky imagers can have high spatial and temporal resolution, which is an important prerequisite for applications such as solar energy modeling, cloud attenuation analysis, and local weather prediction. Extracting valuable information from the huge amount of image data by detecting and analyzing the various entities in these images is challenging. However, powerful machine learning techniques have become available to aid with the image analysis. This article provides a detailed walk-through of recent developments in these techniques and their applications in ground-based imaging. We aim to bridge the gap between computer vision and remote sensing with the help of illustrative examples. We demonstrate the advantages of using machine learning techniques in ground-based image analysis via three primary applications: segmentation, classification, and denoising.

preprint · 2016 · arXiv

Short-term prediction of localized cloud motion using ground-based sky imagers

Fine-scale, short-term cloud motion prediction is needed for several applications, including solar energy generation and satellite communications. In tropical regions such as Singapore, clouds are mostly formed by convection; they are very localized and evolve quickly. We capture hemispherical images of the sky at regular intervals using ground-based cameras, which provide high-resolution, localized cloud images. We use two successive frames to compute optical flow and predict the future location of clouds. We achieve good prediction accuracy for lead times of up to 5 minutes.
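The paper computes optical flow between two successive frames. As a simpler stand-in, the NumPy sketch below estimates a single global cloud displacement by exhaustive block matching and then extrapolates the current frame along it. Block matching, the `max_shift` search radius, zero-filling of uncovered pixels, and all function names are assumptions of this sketch, not the paper's method.

```python
import numpy as np


def _overlap(h, w, dy, dx):
    """Slices (src, dst) such that src[y, x] pairs with dst[y + dy, x + dx]."""
    src = (slice(max(0, -dy), h - max(0, dy)), slice(max(0, -dx), w - max(0, dx)))
    dst = (slice(max(0, dy), h - max(0, -dy)), slice(max(0, dx), w - max(0, -dx)))
    return src, dst


def estimate_shift(prev, curr, max_shift=5):
    """Global (dy, dx) minimising the mean squared difference between prev and shifted curr."""
    prev = np.asarray(prev, dtype=np.float64)
    curr = np.asarray(curr, dtype=np.float64)
    h, w = prev.shape
    if max_shift >= min(h, w):
        raise ValueError("max_shift must be smaller than the frame size")
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            src, dst = _overlap(h, w, dy, dx)
            err = np.mean((prev[src] - curr[dst]) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best


def extrapolate(curr, shift, steps=1):
    """Move the whole frame by `steps` times the per-frame shift; uncovered pixels become 0."""
    curr = np.asarray(curr, dtype=np.float64)
    h, w = curr.shape
    dy, dx = shift[0] * steps, shift[1] * steps
    out = np.zeros_like(curr)
    if abs(dy) < h and abs(dx) < w:
        src, dst = _overlap(h, w, dy, dx)
        out[dst] = curr[src]
    return out


# Hypothetical binary cloud masks: a 4x4 cloud moving 1 row down and 2 columns right.
prev = np.zeros((20, 20)); prev[5:9, 5:9] = 1.0
curr = np.zeros((20, 20)); curr[6:10, 7:11] = 1.0
shift = estimate_shift(prev, curr)
forecast = extrapolate(curr, shift, steps=1)
```

A single global shift cannot capture clouds that grow, dissipate, or move in different directions within one image; dense optical flow estimates a vector per pixel for that reason.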
