Source author record

Sohini Roychowdhury

Sohini Roychowdhury appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning eess.IV physics.med-ph Artificial Intelligence Information Retrieval

Catalog footprint

What is connected

13works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

COFFEE: COdesign Framework for Feature Enriched Embeddings in Ads-Ranking Systems

Diverse and enriched data sources are essential for commercial ads-recommendation models to accurately assess user interest both before and after engagement with content. While extended user-engagement histories can improve the prediction of user interests, it is equally important to embed activity sequences from multiple sources to ensure freshness of user and ad-representations, following scaling law principles. In this paper, we present a novel three-dimensional framework for enhancing user-ad representations without increasing model inference or serving complexity. The first dimension examines the impact of incorporating diverse event sources, the second considers the benefits of longer user histories, and the third focuses on enriching data with additional event attributes and multi-modal embeddings. We assess the return on investment (ROI) of our source enrichment framework by comparing organic user engagement sources, such as content viewing, with ad-impression sources. The proposed method can boost the area under curve (AUC) and the slope of scaling curves for ad-impression sources by 1.56 to 2 times compared to organic usage sources even for short online-sequence lengths of 100 to 10K. Additionally, click-through rate (CTR) prediction improves by 0.56% AUC over the baseline production ad-recommendation system when using enriched ad-impression event sources, leading to improved sequence scaling resolutions for longer and offline user-ad representations.

preprint2022arXiv

QU-net++: Image Quality Detection Framework for Segmentation of Medical 3D Image Stacks

Automated segmentation of pathological regions of interest aids medical image diagnostics and follow-up care. However, accurate pathological segmentations require high quality of annotated data that can be both cost and time intensive to generate. In this work, we propose an automated two-step method that detects a minimal image subset required to train segmentation models by evaluating the quality of medical images from 3D image stacks using a U-net++ model. These images that represent a lack of quality training can then be annotated and used to fully train a U-net-based segmentation model. The proposed QU-net++ model detects this lack of quality training based on the disagreement in segmentations produced from the final two output layers. The proposed model isolates around 10% of the slices per 3D image stack and can scale across imaging modalities to segment cysts in OCT images and ground glass opacity (GGO) in lung CT images with Dice scores in the range 0.56-0.72. Thus, the proposed method can be applied for cost effective multi-modal pathology segmentation tasks.

preprint2022arXiv

Semi-supervised and Deep learning Frameworks for Video Classification and Key-frame Identification

Automating video-based data and machine learning pipelines poses several challenges including metadata generation for efficient storage and retrieval and isolation of key-frames for scene understanding tasks. In this work, we present two semi-supervised approaches that automate this process of manual frame sifting in video streams by automatically classifying scenes for content and filtering frames for fine-tuning scene understanding tasks. The first rule-based method starts from a pre-trained object detector and it assigns scene type, uncertainty and lighting categories to each frame based on probability distributions of foreground objects. Next, frames with the highest uncertainty and structural dissimilarity are isolated as key-frames. The second method relies on the simCLR model for frame encoding followed by label-spreading from 20% of frame samples to label the remaining frames for scene and lighting categories. Also, clustering the video frames in the encoded feature space further isolates key-frames at cluster boundaries. The proposed methods achieve 64-93% accuracy for automated scene categorization for outdoor image videos from public domain datasets of JAAD and KITTI. Also, less than 10% of all input frames can be filtered as key-frames that can then be sent for annotation and fine tuning of machine vision algorithms. Thus, the proposed framework can be scaled to additional video data streams for automated training of perception-driven systems with minimal training images.

preprint2021arXiv

OPAM: Online Purchasing-behavior Analysis using Machine learning

Customer purchasing behavior analysis plays a key role in developing insightful communication strategies between online vendors and their customers. To support the recent increase in online shopping trends, in this work, we present a customer purchasing behavior analysis system using supervised, unsupervised and semi-supervised learning methods. The proposed system analyzes session and user-journey level purchasing behaviors to identify customer categories/clusters that can be useful for targeted consumer insights at scale. We observe higher sensitivity to the design of online shopping portals for session-level purchasing prediction with accuracy/recall in range 91-98%/73-99%, respectively. The user-journey level analysis demonstrates five unique user clusters, wherein 'New Shoppers' are most predictable and 'Impulsive Shoppers' are most unique with low viewing and high carting behaviors for purchases. Further, cluster transformation metrics and partial label learning demonstrates the robustness of each user cluster to new/unlabelled events. Thus, customer clusters can aid strategic targeted nudge models.

preprint2020arXiv

Few Shot Learning Framework to Reduce Inter-observer Variability in Medical Images

Most computer aided pathology detection systems rely on large volumes of quality annotated data to aid diagnostics and follow up procedures. However, quality assuring large volumes of annotated medical image data can be subjective and expensive. In this work we present a novel standardization framework that implements three few-shot learning (FSL) models that can be iteratively trained by atmost 5 images per 3D stack to generate multiple regional proposals (RPs) per test image. These FSL models include a novel parallel echo state network (ParESN) framework and an augmented U-net model. Additionally, we propose a novel target label selection algorithm (TLSA) that measures relative agreeability between RPs and the manually annotated target labels to detect the "best" quality annotation per image. Using the FSL models, our system achieves 0.28-0.64 Dice coefficient across vendor image stacks for intra-retinal cyst segmentation. Additionally, the TLSA is capable of automatically classifying high quality target labels from their noisy counterparts for 60-97% of the images while ensuring manual supervision on remaining images. Also, the proposed framework with ParESN model minimizes manual annotation checking to 12-28% of the total number of images. The TLSA metrics further provide confidence scores for the automated annotation quality assurance. Thus, the proposed framework is flexible to extensions for quality image annotation curation of other image stacks as well.

preprint2020arXiv

FuSSI-Net: Fusion of Spatio-temporal Skeletons for Intention Prediction Network

Pedestrian intention recognition is very important to develop robust and safe autonomous driving (AD) and advanced driver assistance systems (ADAS) functionalities for urban driving. In this work, we develop an end-to-end pedestrian intention framework that performs well on day- and night- time scenarios. Our framework relies on objection detection bounding boxes combined with skeletal features of human pose. We study early, late, and combined (early and late) fusion mechanisms to exploit the skeletal features and reduce false positives as well to improve the intention prediction performance. The early fusion mechanism results in AP of 0.89 and precision/recall of 0.79/0.89 for pedestrian intention classification. Furthermore, we propose three new metrics to properly evaluate the pedestrian intention systems. Under these new evaluation metrics for the intention prediction, the proposed end-to-end network offers accurate pedestrian intention up to half a second ahead of the actual risky maneuver.

preprint2016arXiv

A generalized flow for multi-class and binary classification tasks: An Azure ML approach

The constant growth in the present day real-world databases pose computational challenges for a single computer. Cloud-based platforms, on the other hand, are capable of handling large volumes of information manipulation tasks, thereby necessitating their use for large real-world data set computations. This work focuses on creating a novel Generalized Flow within the cloud-based computing platform: Microsoft Azure Machine Learning Studio (MAMLS) that accepts multi-class and binary classification data sets alike and processes them to maximize the overall classification accuracy. First, each data set is split into training and testing data sets, respectively. Then, linear and nonlinear classification model parameters are estimated using the training data set. Data dimensionality reduction is then performed to maximize classification accuracy. For multi-class data sets, data centric information is used to further improve overall classification accuracy by reducing the multi-class classification to a series of hierarchical binary classification tasks. Finally, the performance of optimized classification model thus achieved is evaluated and scored on the testing data set. The classification characteristics of the proposed flow are comparatively evaluated on 3 public data sets and a local data set with respect to existing state-of-the-art methods. On the 3 public data sets, the proposed flow achieves 78-97.5% classification accuracy. Also, the local data set, created using the information regarding presence of Diabetic Retinopathy lesions in fundus images, results in 85.3-95.7% average classification accuracy, which is higher than the existing methods. Thus, the proposed generalized flow can be useful for a wide range of application-oriented "big data sets".

preprint2016arXiv

Automated OCT Segmentation for Images with DME

This paper presents a novel automated system that segments six sub-retinal layers from optical coherence tomography (OCT) image stacks of healthy patients and patients with diabetic macular edema (DME). First, each image in the OCT stack is denoised using a Wiener deconvolution algorithm that estimates the additive speckle noise variance using a novel Fourier-domain based structural error. This denoising method enhances the image SNR by an average of 12dB. Next, the denoised images are subjected to an iterative multi-resolution high-pass filtering algorithm that detects seven sub-retinal surfaces in six iterative steps. The thicknesses of each sub-retinal layer for all scans from a particular OCT stack are then compared to the manually marked groundtruth. The proposed system uses adaptive thresholds for denoising and segmenting each image and hence it is robust to disruptions in the retinal micro-structure due to DME. The proposed denoising and segmentation system has an average error of 1.2-5.8 $μm$ and 3.5-26$μm$ for segmenting sub-retinal surfaces in normal and abnormal images with DME, respectively. For estimating the sub-retinal layer thicknesses, the proposed system has an average error of 0.2-2.5 $μm$ and 1.8-18 $μm$ in normal and abnormal images, respectively. Additionally, the average inner sub-retinal layer thickness in abnormal images is estimated as 275$μm (r=0.92)$ with an average error of 9.3 $μm$, while the average thickness of the outer layers in abnormal images is estimated as 57.4$μm (r=0.74)$ with an average error of 3.5 $μm$. The proposed system can be useful for tracking the disease progression for DME over a period of time.

preprint2016arXiv

Automated Selection of Uniform Regions for CT Image Quality Detection

CT images are widely used in pathology detection and follow-up treatment procedures. Accurate identification of pathological features requires diagnostic quality CT images with minimal noise and artifact variation. In this work, a novel Fourier-transform based metric for image quality (IQ) estimation is presented that correlates to additive CT image noise. In the proposed method, two windowed CT image subset regions are analyzed together to identify the extent of variation in the corresponding Fourier-domain spectrum. The two square windows are chosen such that their center pixels coincide and one window is a subset of the other. The Fourier-domain spectral difference between these two sub-sampled windows is then used to isolate spatial regions-of-interest (ROI) with low signal variation (ROI-LV) and high signal variation (ROI-HV), respectively. Finally, the spatial variance ($var$), standard deviation ($std$), coefficient of variance ($cov$) and the fraction of abdominal ROI pixels in ROI-LV ($ν'(q)$), are analyzed with respect to CT image noise. For the phantom CT images, $var$ and $std$ correlate to CT image noise ($|r|>0.76$ ($p\ll0.001$)), though not as well as $ν'(q)$ ($r=0.96$ ($p\ll0.001$)). However, for the combined phantom and patient CT images, $var$ and $std$ do not correlate well with CT image noise ($|r|<0.46$ ($p\ll0.001$)) as compared to $ν'(q)$ ($r=0.95$ ($p\ll0.001$)). Thus, the proposed method and the metric, $ν'(q)$, can be useful to quantitatively estimate CT image noise.

preprint2016arXiv

Blind Analysis of CT Image Noise Using Residual Denoised Images

CT protocol design and quality control would benefit from automated tools to estimate the quality of generated CT images. These tools could be used to identify erroneous CT acquisitions or refine protocols to achieve certain signal to noise characteristics. This paper investigates blind estimation methods to determine global signal strength and noise levels in chest CT images. Methods: We propose novel performance metrics corresponding to the accuracy of noise and signal estimation. We implement and evaluate the noise estimation performance of six spatial- and frequency- based methods, derived from conventional image filtering algorithms. Algorithms were tested on patient data sets from whole-body repeat CT acquisitions performed with a higher and lower dose technique over the same scan region. Results: The proposed performance metrics can evaluate the relative tradeoff of filter parameters and noise estimation performance. The proposed automated methods tend to underestimate CT image noise at low-flux levels. Initial application of methodology suggests that anisotropic diffusion and Wavelet-transform based filters provide optimal estimates of noise. Furthermore, methodology does not provide accurate estimates of absolute noise levels, but can provide estimates of relative change and/or trends in noise levels.

preprint2016arXiv

Classification of Large-Scale Fundus Image Data Sets: A Cloud-Computing Framework

Large medical image data sets with high dimensionality require substantial amount of computation time for data creation and data processing. This paper presents a novel generalized method that finds optimal image-based feature sets that reduce computational time complexity while maximizing overall classification accuracy for detection of diabetic retinopathy (DR). First, region-based and pixel-based features are extracted from fundus images for classification of DR lesions and vessel-like structures. Next, feature ranking strategies are used to distinguish the optimal classification feature sets. DR lesion and vessel classification accuracies are computed using the boosted decision tree and decision forest classifiers in the Microsoft Azure Machine Learning Studio platform, respectively. For images from the DIARETDB1 data set, 40 of its highest-ranked features are used to classify four DR lesion types with an average classification accuracy of 90.1% in 792 seconds. Also, for classification of red lesion regions and hemorrhages from microaneurysms, accuracies of 85% and 72% are observed, respectively. For images from STARE data set, 40 high-ranked features can classify minor blood vessels with an accuracy of 83.5% in 326 seconds. Such cloud-based fundus image analysis systems can significantly enhance the borderline classification performances in automated screening systems.

preprint2015arXiv

A Survey of the Trends in Facial and Expression Recognition Databases and Methods

Automated facial identification and facial expression recognition have been topics of active research over the past few decades. Facial and expression recognition find applications in human-computer interfaces, subject tracking, real-time security surveillance systems and social networking. Several holistic and geometric methods have been developed to identify faces and expressions using public and local facial image databases. In this work we present the evolution in facial image data sets and the methodologies for facial identification and recognition of expressions such as anger, sadness, happiness, disgust, fear and surprise. We observe that most of the earlier methods for facial and expression recognition aimed at improving the recognition rates for facial feature-based methods using static images. However, the recent methodologies have shifted focus towards robust implementation of facial/expression recognition from large image databases that vary with space (gathered from the internet) and time (video recordings). The evolution trends in databases and methodologies for facial and expression recognition can be useful for assessing the next-generation topics that may have applications in security systems or personal identification systems that involve "Quantitative face" assessments.

preprint2015arXiv

Facial Expression Detection using Patch-based Eigen-face Isomap Networks

Automated facial expression detection problem pose two primary challenges that include variations in expression and facial occlusions (glasses, beard, mustache or face covers). In this paper we introduce a novel automated patch creation technique that masks a particular region of interest in the face, followed by Eigen-value decomposition of the patched faces and generation of Isomaps to detect underlying clustering patterns among faces. The proposed masked Eigen-face based Isomap clustering technique achieves 75% sensitivity and 66-73% accuracy in classification of faces with occlusions and smiling faces in around 1 second per image. Also, betweenness centrality, Eigen centrality and maximum information flow can be used as network-based measures to identify the most significant training faces for expression classification tasks. The proposed method can be used in combination with feature-based expression classification methods in large data sets for improving expression classification accuracies.

Sohini Roychowdhury

What is connected

Connect this record

See the researcher in context

Building this map preview

13 published item(s)

COFFEE: COdesign Framework for Feature Enriched Embeddings in Ads-Ranking Systems

QU-net++: Image Quality Detection Framework for Segmentation of Medical 3D Image Stacks

Semi-supervised and Deep learning Frameworks for Video Classification and Key-frame Identification

OPAM: Online Purchasing-behavior Analysis using Machine learning

Few Shot Learning Framework to Reduce Inter-observer Variability in Medical Images

FuSSI-Net: Fusion of Spatio-temporal Skeletons for Intention Prediction Network

A generalized flow for multi-class and binary classification tasks: An Azure ML approach

Automated OCT Segmentation for Images with DME

Automated Selection of Uniform Regions for CT Image Quality Detection

Blind Analysis of CT Image Noise Using Residual Denoised Images

Classification of Large-Scale Fundus Image Data Sets: A Cloud-Computing Framework

A Survey of the Trends in Facial and Expression Recognition Databases and Methods

Facial Expression Detection using Patch-based Eigen-face Isomap Networks