Researcher profile

Soumya Dutta

Soumya Dutta contributes to research discovery and scholarly infrastructure.

Researcher

Affiliation not imported

Open to collaborate

Trust snapshot

Quick read

Trust 15 - Unverified
Verification L1
Unclaimed author
3 works
0 followers
8 topics
4 close collaborators

Published work

3 published items

Preprint, 2024, arXiv

HCAM -- Hierarchical Cross Attention Model for Multi-modal Emotion Recognition

Emotion recognition in conversations is challenging due to the multi-modal nature of emotion expression. We propose a hierarchical cross-attention model (HCAM) approach to multi-modal emotion recognition using a combination of recurrent and co-attention neural network models. The input to the model consists of two modalities: i) audio data, processed through a learnable wav2vec approach, and ii) text data, represented using a bidirectional encoder representations from transformers (BERT) model. The audio and text representations are processed using a set of bi-directional recurrent neural network layers with self-attention that convert each utterance in a given conversation to a fixed-dimensional embedding. To incorporate contextual knowledge and information across the two modalities, the audio and text embeddings are combined using a co-attention layer that attempts to weigh the utterance-level embeddings relevant to the task of emotion recognition. The neural network parameters in the audio layers, the text layers and the multi-modal co-attention layers are hierarchically trained for the emotion classification task. We perform experiments on three established datasets, namely IEMOCAP, MELD and CMU-MOSI, where we illustrate that the proposed model improves significantly over other benchmarks and helps achieve state-of-the-art results on all these datasets.
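The co-attention step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, and the function names (`cross_attend`, `co_attention`) and the concatenation used to fuse the two views are hypothetical choices made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys_values):
    """For each query vector, return a softmax-weighted average of keys_values.

    Assumption: the same vectors serve as keys and values, and no learned
    projection matrices are applied (a real model would learn them).
    """
    dim = len(queries[0])
    out_dim = len(keys_values[0])
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(dim) for k in keys_values])
        attended.append([
            sum(w * kv[i] for w, kv in zip(weights, keys_values))
            for i in range(out_dim)
        ])
    return attended


def co_attention(audio, text):
    """Fuse utterance-level audio and text embeddings (hypothetical sketch).

    Audio utterances attend over text utterances and vice versa; the two
    attended views are concatenated per utterance.
    """
    audio_ctx = cross_attend(audio, text)
    text_ctx = cross_attend(text, audio)
    return [a + t for a, t in zip(audio_ctx, text_ctx)]


# Two utterances, each with a 2-dimensional embedding per modality (toy data).
audio_emb = [[1.0, 0.0], [0.0, 1.0]]
text_emb = [[1.0, 0.0], [0.0, 1.0]]
fused = co_attention(audio_emb, text_emb)
```

Each fused vector has twice the input dimension, because the audio-to-text and text-to-audio views are concatenated. In a trained model the attention weights would come from learned query, key and value projections rather than raw dot products.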

Preprint, 2020, arXiv

Geometry-Driven Detection, Tracking and Visual Analysis of Viscous and Gravitational Fingers

Viscous and gravitational flow instabilities cause a displacement front to break up into finger-like fluids. The detection and evolutionary analysis of these fingering instabilities are critical in multiple scientific disciplines such as fluid mechanics and hydrogeology. However, previous methods for detecting viscous and gravitational fingers are based on density thresholding, which provides limited geometric information about the fingers. The geometric structures of fingers and their evolution are important yet little studied in the literature. In this work, we explore the geometric detection and evolution of the fingers in detail to elucidate the dynamics of the instability. We propose a ridge voxel detection method to guide the extraction of finger cores from three-dimensional (3D) scalar fields. After skeletonizing the finger cores, we design a spanning-tree-based approach to capture how fingers branch spatially from the finger skeletons. Finally, we devise a novel geometric-glyph-augmented tracking graph to study how the fingers and their branches grow, merge, and split over time. Feedback from earth scientists demonstrates the usefulness of our approach for performing spatio-temporal geometric analyses of fingers.
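The idea of reading branching structure off a spanning tree of skeleton voxels can be sketched as follows. The abstract does not specify how the tree is built, so this example assumes a breadth-first spanning tree over 26-connected voxels; the function names and the toy Y-shaped skeleton are hypothetical and not taken from the paper.

```python
from collections import deque
from itertools import product

# All 26 neighbor offsets of a voxel in a 3D grid (assumed connectivity).
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def spanning_tree(voxels, root):
    """Build a breadth-first spanning tree over the skeleton voxels.

    Returns a dict mapping each reachable voxel to its parent
    (the root maps to None).
    """
    voxel_set = set(voxels)
    parent = {root: None}
    queue = deque([root])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in OFFSETS:
            neighbor = (x + dx, y + dy, z + dz)
            if neighbor in voxel_set and neighbor not in parent:
                parent[neighbor] = (x, y, z)
                queue.append(neighbor)
    return parent


def branch_points(parent):
    """Return the voxels with two or more children in the tree, sorted."""
    child_count = {}
    for child, par in parent.items():
        if par is not None:
            child_count[par] = child_count.get(par, 0) + 1
    return sorted(v for v, n in child_count.items() if n >= 2)


# Toy Y-shaped skeleton: a vertical trunk that splits into two arms.
skeleton = [
    (0, 0, 0), (0, 0, 1), (0, 0, 2),   # trunk
    (1, 0, 3), (2, 0, 4),              # right arm
    (-1, 0, 3), (-2, 0, 4),            # left arm
]
tree = spanning_tree(skeleton, root=(0, 0, 0))
forks = branch_points(tree)
```

For the toy skeleton, the only voxel with more than one child is the top of the trunk, which is where the two arms diverge. A breadth-first tree is just one possible choice; a weighted variant (for example, a minimum spanning tree over voxel distances) would follow the same pattern.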

Preprint, 2020, arXiv

Pitch-rotational manipulation of single cells and particles using single-beam thermo-optical tweezers

3D pitch rotation of microparticles and cells is important in a wide variety of applications in biology, physics, chemistry and medicine. Applications such as cell imaging and injection benefit from pitch-rotational manipulation. Generating such motion in single-beam optical tweezers has remained elusive due to the difficulty of generating high enough ellipticity perpendicular to the direction of propagation. Further, trapping an extended object at two locations can only generate partial pitch motion by moving one of the foci in the axial direction. Here, we use hexagonal-shaped upconverting particles and single cells trapped close to a gold-coated glass cover slip in a sample chamber to generate complete 360-degree, continuous pitch motion even with a single optical tweezers beam. The tweezers beam passing through the gold surface is partially absorbed and generates a hot spot that produces circulatory convective flows in the vicinity, which rotate the objects. The rotation rate can be controlled by the intensity of the laser light and the thickness of the gold layer. Thus, such a simple configuration can turn the particle in the pitch sense. The circulatory flows in this technique have a diameter of about 5 μm, which is smaller than those reported using acousto-fluidic techniques.