Source author record

Puneet Kumar

Puneet Kumar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, cs.CY, eess.AS, Machine Learning, Multimedia, Sound

Catalog footprint

What is connected

9 works

6 topics

4 close collaborators

preprint, 2023, arXiv

Interpretable Multimodal Emotion Recognition using Hybrid Fusion of Speech and Image Data

This paper proposes a multimodal emotion recognition system based on hybrid fusion that classifies the emotions depicted by speech utterances and corresponding images into discrete classes. A new interpretability technique has been developed to identify the important speech and image features leading to the prediction of particular emotion classes. The proposed system's architecture has been determined through intensive ablation studies. It fuses the speech and image features and then combines speech, image, and intermediate fusion outputs. The proposed interpretability technique incorporates a divide-and-conquer approach to compute Shapley values denoting each speech and image feature's importance. We have also constructed a large-scale dataset (IIT-R SIER dataset), consisting of speech utterances, corresponding images, and class labels, i.e., 'anger,' 'happy,' 'hate,' and 'sad.' The proposed system has achieved 83.29% accuracy for emotion recognition. The enhanced performance of the proposed system supports the importance of utilizing complementary information from multiple modalities for emotion recognition.
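As a rough illustration of what a Shapley value measures, the sketch below computes exact Shapley values by brute force for a toy scoring function. It is not the paper's divide-and-conquer technique; the feature names (`pitch`, `energy`, `face_region`), the contribution numbers, and the `toy_score` function are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values by enumerating every subset of the other features.

    `value` maps a frozenset of feature names to a score. The cost grows
    exponentially with the number of features, so this only suits tiny examples.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical per-feature contributions (assumption, not from the paper).
CONTRIB = {"pitch": 0.3, "energy": 0.1, "face_region": 0.4}


def toy_score(subset):
    """Toy model score: additive terms plus one speech-image interaction."""
    score = sum(CONTRIB[f] for f in subset)
    if "pitch" in subset and "face_region" in subset:
        score += 0.2  # hypothetical cross-modal interaction bonus
    return score


phi = shapley_values(list(CONTRIB), toy_score)
for name, importance in phi.items():
    print(f"{name}: {importance:.3f}")
```

In this toy setup the interaction bonus is shared equally between the two interacting features, and the values sum to the score of the full feature set, which is the efficiency property that makes Shapley values attractive for attributing a prediction to its inputs.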

preprint, 2022, arXiv

Affective Feedback Synthesis Towards Multimodal Text and Image Data

In this paper, we have defined a novel task of affective feedback synthesis that deals with generating feedback for input text and a corresponding image in a way similar to how humans respond to multimodal data. A feedback synthesis system has been proposed and trained using ground-truth human comments along with image-text input. We have also constructed a large-scale dataset consisting of images, text, Twitter user comments, and the number of likes for the comments by crawling news articles through Twitter feeds. The proposed system extracts textual features using a transformer-based textual encoder, while the visual features are extracted using a Faster region-based convolutional neural network (Faster R-CNN) model. The textual and visual features are concatenated to construct the multimodal features from which the decoder synthesizes the feedback. We have compared the results of the proposed system with the baseline models using quantitative and qualitative measures. The generated feedback has been analyzed using automatic and human evaluation. It has been found to be semantically similar to the ground-truth comments and relevant to the given text-image input.

preprint, 2020, arXiv

Fast Griffin Lim based Waveform Generation Strategy for Text-to-Speech Synthesis

The performance of text-to-speech (TTS) systems depends heavily on spectrogram-to-waveform generation, also known as the speech reconstruction phase. The time required for this phase is known as synthesis delay. In this paper, an approach to reduce speech synthesis delay has been proposed. It aims to enhance TTS systems for real-time applications such as digital assistants, mobile phones, and embedded devices. The proposed approach applies the Fast Griffin-Lim Algorithm (FGLA) instead of the Griffin-Lim Algorithm (GLA) as the vocoder in the speech synthesis phase. GLA and FGLA are both iterative, but FGLA converges faster than GLA. The proposed approach is tested on the LJSpeech, Blizzard and Tatoeba datasets, and the results for FGLA are compared against GLA and a neural Generative Adversarial Network (GAN) based vocoder. The performance is evaluated based on synthesis delay and speech quality. A 36.58% reduction in speech synthesis delay has been observed. The quality of the output speech has also improved, as indicated by higher Mean Opinion Scores (MOS) and faster convergence with FGLA as opposed to GLA.
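For context, the two iterations compared in this abstract are commonly written as below. The notation is an assumption of this sketch rather than something taken from the abstract: $s$ is the target magnitude spectrogram, $c_n$ is the complex spectrogram estimate at iteration $n$, $P_{C_1}$ projects onto the set of consistent spectrograms, $P_{C_2}$ imposes the target magnitude, and $\alpha$ is a momentum parameter (a value close to 1 is commonly used).

```latex
\begin{align*}
  % Projections shared by both algorithms
  P_{C_1}(c) &= \mathrm{STFT}\big(\mathrm{ISTFT}(c)\big), &
  P_{C_2}(c) &= s \odot \frac{c}{|c|} \\[4pt]
  % Griffin-Lim (GLA): alternate the two projections
  \text{GLA:}\quad c_n &= P_{C_1}\big(P_{C_2}(c_{n-1})\big) \\[4pt]
  % Fast Griffin-Lim (FGLA): same projections plus a momentum step
  \text{FGLA:}\quad c_n &= P_{C_1}\big(P_{C_2}(t_{n-1})\big), &
  t_n &= c_n + \alpha\,(c_n - c_{n-1})
\end{align*}
```

Setting $\alpha = 0$ reduces FGLA to GLA, so the momentum term is the only difference between the two; it adds negligible cost per iteration, which is consistent with a speed-up coming from needing fewer iterations.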

preprint, 2014, arXiv

E-Governance in India: Definitions, Challenges and Solutions

The Government of India is moving from its traditional modus operandi of governance towards technological involvement in the process of governance. Currently, the Government of India is in a transition phase and is harnessing the power of ICT in governance. The government is spending an enormous amount of money on the deployment of e-governance, but are these efforts going in the appropriate direction and leading towards the intended results? What do people perceive from the concept of e-governance? What is the global perspective on the perception of e-governance? What are the major challenges confronting the deployment of e-governance? In this paper, the authors attempt to answer these questions. Moreover, the authors offer some plausible suggestions that may help in the successful and sustainable deployment of e-governance in India.

preprint, 2014, arXiv

ICT in Local Self Governance: A Study of Rural India

The concept of local self-governance is not new; it has its roots in ancient times, even before the era of the Mauryan emperors. This paper traces the journey of local self-governance from ancient times to the 21st century. Further, in the current scenario, Information and Communication Technology (ICT) has emerged as a successful tool for the dissemination of various e-governance services, and in this regard the Government of India has formulated the National e-Governance Plan (NeGP) with an adequate service delivery mechanism. With the adoption of ICT, various applications have been designed by central as well as state governments, which has led to the strengthening of Panchayati Raj Institutions (PRIs) for rural reform. This paper also sheds some light on the necessity of ICT in self-governance, along with some case studies.

preprint, 2013, arXiv

A Conceptual E-Governance Framework for Improving Child Immunization Process in India

India is a country with a large population and great variations in educational level, economic conditions, population density, culture and awareness levels. Due to these variations, the immunization process is not as successful as the state and central governments expect. In some zones a significant quantity of vaccines is wasted, whereas other zones run out of vaccines. One of the reasons for this imbalance is improper estimation of the quantity of vaccines needed in a particular zone. Further, a large amount of funds is wasted in the form of vaccines. If ICT (Information and Communication Technology) is incorporated into the immunization process, the problem can be rectified to some extent; hence we propose a conceptual model that uses ICT to improve the vaccination process.

preprint, 2013, arXiv

Discriminative Parameter Estimation for Random Walks Segmentation

The Random Walks (RW) algorithm is one of the most efficient and easy-to-use probabilistic segmentation methods. By combining contrast terms with prior terms, it provides accurate segmentations of medical images in a fully automated manner. However, one of the main drawbacks of using the RW algorithm is that its parameters have to be hand-tuned. We propose a novel discriminative learning framework that estimates the parameters using a training dataset. The main challenge we face is that the training samples are not fully supervised. Specifically, they provide a hard segmentation of the images, instead of a probabilistic segmentation. We overcome this challenge by treating the optimal probabilistic segmentation that is compatible with the given hard segmentation as a latent variable. This allows us to employ the latent support vector machine formulation for parameter estimation. We show that our approach significantly outperforms the baseline methods on a challenging dataset consisting of real clinical 3D MRI volumes of skeletal muscles.

preprint, 2013, arXiv

Discriminative Parameter Estimation for Random Walks Segmentation: Technical Report

The Random Walks (RW) algorithm is one of the most efficient and easy-to-use probabilistic segmentation methods. By combining contrast terms with prior terms, it provides accurate segmentations of medical images in a fully automated manner. However, one of the main drawbacks of using the RW algorithm is that its parameters have to be hand-tuned. We propose a novel discriminative learning framework that estimates the parameters using a training dataset. The main challenge we face is that the training samples are not fully supervised. Specifically, they provide a hard segmentation of the images, instead of a probabilistic segmentation. We overcome this challenge by treating the optimal probabilistic segmentation that is compatible with the given hard segmentation as a latent variable. This allows us to employ the latent support vector machine formulation for parameter estimation. We show that our approach significantly outperforms the baseline methods on a challenging dataset consisting of real clinical 3D MRI volumes of skeletal muscles.

preprint, 2013, arXiv

Improved Service Delivery and Cost Effective Framework for e-Governance in India

In the current era, technologies such as virtualization, consolidation and cloud computing, together with the adoption of free and open source software in designing and deploying e-governance, can reduce the total associated cost and hence the financial burden borne by the state and central governments. The success of any e-governance project depends upon its utilization by the intended group, so its accessibility needs to be enhanced drastically through a reengineered framework. Here, we design an Improved Service Delivery and Cost Effective Framework for e-Governance that will be useful for the success of e-governance projects and the delivery mechanism in India, using free and open source software for the development and deployment of e-governance applications, virtualization and consolidation techniques for the management of e-services, and cloud computing for enhancing the accessibility of services.
