Source author record

Partha Pratim Das

Partha Pratim Das appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Applications, eess.AS, physics.med-ph, Sound, eess.IV

Catalog footprint

What is connected

5 works

6 topics

4 close collaborators

preprint, 2022, arXiv

Melody Extraction from Polyphonic Music by Deep Learning Approaches: A Review

Melody extraction is a vital music information retrieval task, with potential applications in education and the music industry. It is notoriously challenging because of background instruments, and because the melodic source often exhibits characteristics similar to those of the other instruments. Accompaniment that interferes with the vocals makes extracting the melody from the mixture signal harder still. Until recently, classical signal-processing methods were popular among melody extraction researchers. The ability of deep learning models to handle large-scale data and to learn features automatically by exploiting spatial and temporal dependencies has inspired many researchers to adopt them for melody extraction. This paper reviews up-to-date data-driven deep learning approaches for melody extraction from polyphonic music. The available deep models are categorized by the type of neural network used and by the output representation they use for predicting melody. The architectures of 25 melody extraction models are briefly presented. The loss functions used to optimize the model parameters are grouped into four broad categories and briefly described. The input representations adopted by the models, and their parameter settings, are described in depth. A section on the explainability of black-box melody extraction deep neural networks is included. The performance of 25 melody extraction methods is compared, and possible future directions for exploring and improving melody extraction methods are presented.

preprint, 2020, arXiv

Beat Detection and Automatic Annotation of the Music of Bharatanatyam Dance using Speech Recognition Techniques

Bharatanatyam, an Indian Classical Dance form, represents the rich cultural heritage of India. Analysis and recognition of such dance forms are critical for the preservation of cultural heritage. As in most dance forms, a Bharatanatyam dancer performs in synchronization with structured rhythmic music, called Sollukattu, which comprises instrumental beats and vocalized utterances (bols). Computer analysis of Bharatanatyam therefore requires a structural analysis of Sollukattus. In this paper, we use speech processing techniques to recognize bols. Exploiting the predefined structures of Sollukattus and the detected bols, we recognize the Sollukattu. We estimate the tempo period by two methods. Finally, we generate a complete annotation of the audio signal by beat marking, for which we also use the beats detected from the onset envelope of a Sollukattu signal. For training and testing, we create and annotate a data set of Sollukattus. We achieve 85% accuracy in bol recognition, 95% in Sollukattu recognition, 96% in tempo period estimation, and over 90% in beat marking. This is the first attempt to fully analyze the structure of the music of an Indian Classical Dance form, and the first to use speech processing techniques for beat marking.
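The abstract does not say which two methods are used to estimate the tempo period. As a rough illustration only, the sketch below uses one common approach, autocorrelation of an onset envelope, which may or may not match either of the paper's methods. The function name `tempo_period`, the frame rate, and the search bounds `min_period` and `max_period` are assumptions made for this example.

```python
import numpy as np

def tempo_period(onset_env, frame_rate, min_period=0.25, max_period=2.0):
    """Estimate the tempo period (seconds per beat) from an onset envelope.

    Hypothetical helper: picks the autocorrelation peak inside an assumed
    range of plausible beat periods.
    """
    env = np.asarray(onset_env, dtype=float)
    env = env - env.mean()  # remove the DC offset so lag 0 does not dominate
    # Keep only non-negative lags of the full autocorrelation.
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    lo = int(round(min_period * frame_rate))
    hi = min(int(round(max_period * frame_rate)), len(ac) - 1)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return lag / frame_rate

# Synthetic envelope: one onset every 50 frames at 100 frames per second.
env = np.zeros(1000)
env[::50] = 1.0
period = tempo_period(env, frame_rate=100.0)
```

With real audio the envelope would come from an onset detector, and the strongest peak can land on a multiple or a fraction of the true period, so the search bounds matter.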

preprint, 2020, arXiv

Early Response Assessment in Lung Cancer Patients using Spatio-temporal CBCT Images

We report a model to predict a patient's radiological response to curative radiation therapy (RT) for non-small-cell lung cancer (NSCLC). Cone-Beam Computed Tomography (CBCT) images acquired weekly during the six-week course of RT were contoured with the Gross Tumor Volume (GTV) by senior radiation oncologists for 53 patients (7 images per patient). Deformable registration of the images yielded six deformation fields per patient, one for each pair of consecutive images. The Jacobian of a field provides a measure of local expansion or contraction and is used in our model. Delineations were compared post-registration to compute unchanged ($U$), newly grown ($G$), and reduced ($R$) regions within the GTV. The mean Jacobians of these regions, $μ_U$, $μ_G$ and $μ_R$, are compared statistically and a response assessment model is proposed. A good response is hypothesized if $μ_R < 1.0$, $μ_R < μ_U$, and $μ_G < μ_U$. For early prediction of post-treatment response, images from the first three weeks are used. Our model predicted clinical response with a precision of $74\%$. Logistic regression using reduction in CT numbers (CTN) and percentage GTV reduction as features yielded an area-under-curve of 0.65 with p=0.005. Combining the logistic regression model with the proposed hypothesis yielded an odds ratio of 20.0 (p=0.0).
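The response rule above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes the deformation is given as a displacement field in voxel units, that the "Jacobian" is the determinant of the Jacobian matrix of the mapping $x \mapsto x + u(x)$ (values below 1 indicate local contraction), and that the three region masks are non-empty boolean arrays. The function names are hypothetical.

```python
import numpy as np

def jacobian_determinant(disp):
    """Jacobian determinant of x -> x + u(x) at every voxel.

    disp has shape (3, Z, Y, X): one displacement component per axis,
    in voxel units (an assumption of this sketch).
    """
    # grads[i][j] approximates d u_i / d x_j with finite differences.
    grads = [np.gradient(disp[i]) for i in range(3)]
    jac = np.empty(disp.shape[1:] + (3, 3))
    for i in range(3):
        for j in range(3):
            jac[..., i, j] = grads[i][j] + (1.0 if i == j else 0.0)
    return np.linalg.det(jac)

def good_response(det, unchanged, grown, reduced):
    """Apply the stated hypothesis: mu_R < 1.0, mu_R < mu_U, mu_G < mu_U."""
    mu_u = det[unchanged].mean()
    mu_g = det[grown].mean()
    mu_r = det[reduced].mean()
    return bool(mu_r < 1.0 and mu_r < mu_u and mu_g < mu_u)

# A uniform 10% contraction along each axis gives a determinant of 0.9**3.
zz, yy, xx = np.meshgrid(np.arange(4), np.arange(4), np.arange(4), indexing="ij")
disp = np.stack([-0.1 * zz, -0.1 * yy, -0.1 * xx]).astype(float)
det_uniform = jacobian_determinant(disp)
```

In practice the displacement field would come from a deformable registration tool, and the masks from comparing the registered GTV delineations.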

preprint, 2020, arXiv

Novel Radiomic Feature for Survival Prediction of Lung Cancer Patients using Low-Dose CBCT Images

Predicting a patient's survivability under tumor progression is useful for estimating the effectiveness of a treatment protocol. In our work, we present a model that takes the heterogeneous nature of a tumor into account to predict survival. Tumor heterogeneity is measured in terms of mass, by combining the radiodensity information obtained from images with the gross tumor volume (GTV). We propose a novel feature called Tumor Mass within a GTV (TMG), which improves the prediction of survivability compared to existing models that use GTV. Weekly variation in a patient's TMG is computed from the image data and also estimated from a cell survivability model. The parameters obtained from the cell survivability model are indicative of changes in TMG over the treatment period. We use these parameters, along with other patient metadata, to perform survival analysis and regression using Cox's Proportional Hazard model. The average concordance index improved significantly, from 0.47 to 0.64, when TMG was used in the model instead of GTV. The experiments show that treatment response differs between responsive and non-responsive patients and that the proposed method can be used to predict patient survivability.
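For readers unfamiliar with the concordance index reported above, the following is a minimal sketch of Harrell's C for right-censored data. It is an illustration of the metric only, not the authors' evaluation code; the function name and the toy data are made up for this example. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by predicted risk.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs whose risks are ordered correctly.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the record is censored
    risks  : predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had an observed event before j's time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data (hypothetical): the third patient is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
c_perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
c_reversed = concordance_index(times, events, [0.1, 0.5, 0.7, 0.9])
```

On this scale, the reported change from 0.47 to 0.64 moves the model from roughly chance-level ordering to a clearly better-than-chance one.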

preprint, 2018, arXiv

HSD-CNN: Hierarchically self decomposing CNN architecture using class specific filter sensitivity analysis

Conventional convolutional neural networks (CNNs) are trained on large domain datasets and are hence typically over-represented and inefficient in limited-class applications. An efficient way to convert such large many-class pre-trained networks into small few-class networks is through a hierarchical decomposition of their feature maps. We propose an automated four-step framework for such decomposition, the Hierarchically Self Decomposing CNN (HSD-CNN). HSD-CNN is derived automatically using a class-specific filter sensitivity analysis that quantifies the impact of specific features on a class prediction. The decomposed hierarchical network can be deployed directly to obtain sub-networks for a subset of classes, and these sub-networks are shown to perform better without retraining. Experimental results show that HSD-CNN generally does not degrade accuracy if the full set of classes is used. Interestingly, when operating on known subsets of classes, HSD-CNN improves accuracy with a much smaller model size and far fewer operations. The HSD-CNN flow is verified on the CIFAR10, CIFAR100 and CALTECH101 data sets. We report accuracies up to $85.6\%$ ($94.75\%$) on scenarios with 13 (4) classes of CIFAR100, using a VGG-16 network pre-trained on the full data set. In this case, the proposed HSD-CNN requires $3.97 \times$ fewer parameters and has $71.22\%$ savings in operations, in comparison to baseline VGG-16 containing features for all 100 classes.
