Source author record

Rohit Sinha

Rohit Sinha appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Computer Vision, eess.AS, Applications, Computer Science and Game Theory, Cryptography and Security, eess.SP, Human-Computer Interaction

Catalog footprint

What is connected

8 works

7 topics

4 close collaborators


Preprint · 2026 · arXiv

A Nash Equilibrium Framework For Training-Free Multimodal Step Verification

Multimodal large language models often generate reasoning chains containing subtle errors that lead to incorrect answers. Current verification approaches have notable limitations. Learned critics require extensive labeled data and perform inconsistently across tasks. Meanwhile, existing training-free methods simply average scores from different sources, missing a key insight: when these scores disagree, the disagreement itself carries information about whether a reasoning step is valid. We propose a training-free verification approach that treats step-wise verification as a coordination problem among specialized judges. We formalize the judges' interaction as a game analyzed at its Nash equilibrium, where agreement signals valid steps and disagreement reveals instability. Our method computes equilibrium scores in closed form, enabling both disagreement-aware filtering and stability-conscious ranking of reasoning steps. Evaluated on six benchmarks, our approach achieves consistent improvements of 2.4% to 5.2% over baseline models and performs competitively with learned critics, demonstrating that cross-modal agreement, not just average confidence, provides robust verification signals without task-specific adaptation.
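The abstract contrasts plain score averaging with disagreement-aware verification. The Python sketch below illustrates only that contrast; it is not the paper's Nash-equilibrium formulation or its closed-form solution, which the abstract does not spell out. It assumes each judge returns a score in [0, 1] for every step, and it uses a hypothetical mean-minus-spread penalty; the names `penalty` and `max_spread` are illustrative parameters, not the authors'.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def disagreement_aware_scores(
    step_scores: List[List[float]], penalty: float = 1.0
) -> List[Tuple[float, float, float]]:
    """For each reasoning step, combine the scores several judges gave it.

    Returns (mean, spread, adjusted) per step, where spread is the population
    standard deviation across judges and adjusted = mean - penalty * spread.
    This penalty is a hypothetical heuristic, not the paper's equilibrium score.
    """
    results = []
    for scores in step_scores:
        avg = mean(scores)
        spread = pstdev(scores)
        results.append((avg, spread, avg - penalty * spread))
    return results


def keep_stable_steps(step_scores: List[List[float]], max_spread: float = 0.1) -> List[int]:
    """Indices of steps on which the judges roughly agree (spread <= max_spread)."""
    return [
        i
        for i, (_, spread, _) in enumerate(disagreement_aware_scores(step_scores))
        if spread <= max_spread
    ]


# Two steps with the same average score (0.8) but different levels of agreement.
steps = [[0.8, 0.8, 0.8], [1.0, 0.6, 0.8]]
print(disagreement_aware_scores(steps))
print(keep_stable_steps(steps))  # [0]
```

Plain averaging would rank both steps equally; the spread term separates the step the judges agree on from the one they dispute.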

Preprint · 2026 · arXiv

Towards Prosodically Informed Mizo TTS without Explicit Tone Markings

This paper reports on the development of a text-to-speech (TTS) system for Mizo, a low-resource, tonal Tibeto-Burman language spoken primarily in the Indian state of Mizoram. Although the system was built with only 5.18 hours of data, subjective and objective evaluations found its outputs perceptually acceptable and intelligible. A baseline model was first built with Tacotron2, and a second TTS model was then built with VITS on the same data. The VITS model outperformed the Tacotron2 model in both subjective and objective evaluations and produced significantly fewer tone errors. The paper demonstrates that a non-autoregressive, end-to-end framework can synthesize speech of acceptable perceptual quality and intelligibility.

Preprint · 2022 · arXiv

Exploring the Role of Emotion Regulation Difficulties in the Assessment of Mental Disorders

Several studies in the literature address the automatic detection of mental disorders, and mental disorders are reported to be highly correlated. This correlation, however, has yet to be explored for automatic detection. Emotion regulation difficulties (ERD) characterize several mental disorders. Motivated by this, we investigated the use of ERD to detect two selected mental disorders. To this end, we collected audio-video data of human subjects conversing with a computer agent based on a specific questionnaire. Each subject's responses were then collected to obtain the ground truth for that subject's audio-video data. The results indicate that ERD can serve as an intermediate representation of audio-video data for detecting mental disorders.

Preprint · 2021 · arXiv

Enhancing the Intelligibility of Cleft Lip and Palate Speech using Cycle-consistent Adversarial Networks

Cleft lip and palate (CLP) refers to a congenital craniofacial condition that causes various speech-related disorders. Because of structural and functional deformities, affected subjects' speech intelligibility is significantly degraded, limiting the accessibility and usability of speech-controlled devices. Improving the intelligibility of CLP speech is therefore desirable, and it would also be useful during speech therapy. In this study, the cycle-consistent adversarial network (CycleGAN) method is used to improve CLP speech intelligibility. The model is trained on speech data from native Kannada-speaking children. The effectiveness of the proposed approach is measured using automatic speech recognition performance. A subjective evaluation further confirms that the enhanced speech is more intelligible than the original.
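A CycleGAN trains two generators between the source and target domains, and alongside its adversarial losses it uses an L1 cycle-consistency term: mapping a sample to the other domain and back should recover it. The Python sketch below shows only that cycle term. The feature frames and the toy mappings `double`, `halve` and `quarter` are hypothetical stand-ins for trained generators; the paper's acoustic features, network architectures and adversarial losses are not shown.

```python
from typing import Callable, List, Sequence

Frame = List[float]
Mapping = Callable[[Frame], Frame]


def mean_abs_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute (L1) difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(g: Mapping, f: Mapping, xs: List[Frame], ys: List[Frame]) -> float:
    """L1 cycle loss: x -> G -> F should return x, and y -> F -> G should return y.

    In the CLP setting, xs would be frames of CLP speech and ys frames of
    typical speech, with G mapping CLP -> typical and F the reverse (assumed).
    """
    forward = sum(mean_abs_error(f(g(x)), x) for x in xs) / len(xs)
    backward = sum(mean_abs_error(g(f(y)), y) for y in ys) / len(ys)
    return forward + backward


# Toy stand-ins for the two generators (hypothetical, not trained networks).
def double(v: Frame) -> Frame:
    return [2.0 * e for e in v]


def halve(v: Frame) -> Frame:
    return [e / 2.0 for e in v]


def quarter(v: Frame) -> Frame:
    return [e / 4.0 for e in v]


xs = [[1.0, 2.0]]
ys = [[4.0]]
print(cycle_consistency_loss(double, halve, xs, ys))    # 0.0: F exactly undoes G
print(cycle_consistency_loss(double, quarter, xs, ys))  # 2.75: the cycles do not close
```

The cycle term is what lets the model learn from unpaired CLP and typical speech, since no frame-aligned target is needed for each input.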

Preprint · 2020 · arXiv

Verification of Quantitative Hyperproperties Using Trace Enumeration Relations

Many important cryptographic primitives offer probabilistic security guarantees that can be specified as quantitative hyperproperties: specifications that stipulate the existence of a certain number of traces in the system satisfying certain constraints. Verifying such hyperproperties is extremely challenging because it requires simultaneous reasoning about an unbounded number of different traces. In this paper, we introduce a technique for verifying quantitative hyperproperties based on the notion of trace enumeration relations. These relations allow us to reduce the problem of trace counting to the model counting of formulas in first-order logic. We also introduce a set of inference rules for machine-checked reasoning about the number of satisfying solutions to first-order formulas (that is, model counting). Together, these two components enable semi-automated verification of quantitative hyperproperties on infinite-state systems. We use our methodology to prove confidentiality of access patterns in Path ORAMs of unbounded size, soundness of a simple interactive zero-knowledge proof protocol, and other applications of quantitative hyperproperties studied in past work.
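Model counting means counting the assignments that satisfy a formula. The Python sketch below does this by brute-force enumeration over a small finite domain, purely to make the term concrete; it is not the paper's technique, which targets infinite-state systems through trace enumeration relations and machine-checked inference rules. The second example, a hypothetical 3-bit one-time pad, shows the flavour of a quantitative property: the number of keys consistent with an observed ciphertext does not depend on the secret message.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Sequence

Assignment = Dict[str, int]


def count_models(
    formula: Callable[[Assignment], bool],
    variables: Sequence[str],
    domain: Iterable[int],
) -> int:
    """Count assignments of `variables` over a finite `domain` that satisfy `formula`."""
    values = list(domain)
    return sum(
        1
        for combo in product(values, repeat=len(variables))
        if formula(dict(zip(variables, combo)))
    )


# Example 1: pairs (x, y) in {0..3}^2 with x + y == 3.
print(count_models(lambda a: a["x"] + a["y"] == 3, ["x", "y"], range(4)))  # 4

# Example 2: a 3-bit one-time pad. For every message m and ciphertext c,
# exactly one key k satisfies m XOR k == c, so the count is the same for all m.
counts = {
    count_models(lambda a, m=m, c=c: (m ^ a["k"]) == c, ["k"], range(8))
    for m in range(8)
    for c in range(8)
}
print(counts)  # {1}
```

Enumeration grows exponentially with the number of variables and cannot handle unbounded domains at all, which is why the paper reasons about counts symbolically instead.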

Preprint · 2016 · arXiv

An Unsupervised Method for Detection and Validation of The Optic Disc and The Fovea

In this work, we present a novel method for detecting two retinal image features, the optic disc and the fovea, in colour fundus photographs of dilated eyes for a computer-aided diagnosis (CAD) system. A saliency-map-based method detects the optic disc, and unsupervised probabilistic latent semantic analysis then validates the detection. The validation relies on the distinct vessel structures in the optic disc. The macula region is estimated using clinical information on the standard location of the fovea relative to the optic disc. The method achieves 100% detection accuracy for the optic disc and the macula on MESSIDOR and DIARETDB1, and 98.8% detection accuracy on the STARE dataset.

Preprint · 2016 · arXiv

Electrocardiogram signal denoising using non-local wavelet transform domain filtering

ECG signals are commonly corrupted by baseline wander, power-line interference, muscle noise and other artifacts, and numerous methods have been proposed to remove them. When the ECG signal is recorded wirelessly, however, it is corrupted by additive white Gaussian noise (AWGN). Because AWGN affects all the diagnostic features, removing it from ECG signals is necessary for correct diagnosis. Natural signals exhibit correlation among their samples, a property that has been exploited in various signal restoration tasks. Motivated by this, we propose a nonlocal wavelet transform domain ECG denoising method that exploits the correlations among both local and nonlocal samples of the signal. In the proposed method, similar blocks of samples are grouped into a matrix, and denoising is achieved by shrinking the coefficients of its two-dimensional discrete wavelet transform. Experiments on a number of ECG signals show significant quantitative and qualitative improvements in denoising performance over existing ECG signal denoising methods.
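The grouping-and-shrinkage step described in the abstract can be sketched in pure Python as follows. This is a minimal illustration under several assumptions that the abstract does not state: a single-level orthonormal Haar wavelet stands in for the 2-D DWT, block similarity is squared Euclidean distance, detail coefficients are soft-thresholded, and overlapping estimates are averaged. The function `nonlocal_haar_denoise` and its `block`, `group` and `threshold` parameters are hypothetical, not the authors' implementation.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
SQRT2 = math.sqrt(2.0)


def haar_1d(x: List[float]) -> List[float]:
    """One level of the orthonormal Haar transform: approximations, then details."""
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]
    return approx + detail


def inverse_haar_1d(c: List[float]) -> List[float]:
    half = len(c) // 2
    out: List[float] = []
    for a, d in zip(c[:half], c[half:]):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def haar_2d(m: Matrix) -> Matrix:
    rows = [haar_1d(r) for r in m]  # along each block (within-block correlation)
    return transpose([haar_1d(c) for c in transpose(rows)])  # across similar blocks


def inverse_haar_2d(m: Matrix) -> Matrix:
    cols = [inverse_haar_1d(c) for c in transpose(m)]
    return [inverse_haar_1d(r) for r in transpose(cols)]


def soft_threshold(v: float, t: float) -> float:
    return math.copysign(max(abs(v) - t, 0.0), v)


def nonlocal_haar_denoise(
    signal: List[float], block: int = 8, group: int = 4, threshold: float = 0.5
) -> List[float]:
    """Group similar blocks, shrink their 2-D Haar coefficients, average the estimates."""
    n = len(signal)
    if block < 2 or group < 2 or block % 2 or group % 2 or n - block + 1 < group:
        raise ValueError("block and group must be even and the signal long enough")
    candidates = range(n - block + 1)
    # Reference blocks tile the signal; the last one is shifted to cover the tail.
    refs = list(range(0, n - block + 1, block))
    if refs[-1] != n - block:
        refs.append(n - block)
    acc, weight = [0.0] * n, [0.0] * n
    for r in refs:
        ref = signal[r:r + block]

        def distance(s: int) -> float:
            return sum((signal[s + i] - ref[i]) ** 2 for i in range(block))

        # The reference block is always in its own group, so every sample gets an estimate.
        others = sorted((s for s in candidates if s != r), key=distance)
        members = [r] + others[: group - 1]
        coeffs = haar_2d([signal[s:s + block] for s in members])
        for i, row in enumerate(coeffs):
            for j in range(block):
                # Keep the low-pass quadrant; soft-threshold every detail coefficient.
                if not (i < group // 2 and j < block // 2):
                    row[j] = soft_threshold(row[j], threshold)
        for s, row in zip(members, inverse_haar_2d(coeffs)):
            for i, v in enumerate(row):
                acc[s + i] += v
                weight[s + i] += 1.0
    return [a / w for a, w in zip(acc, weight)]


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Usage: a slowly varying test signal plus deterministic pseudo-random AWGN.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * t / 64) for t in range(128)]
noisy = [c + rng.gauss(0.0, 0.2) for c in clean]
denoised = nonlocal_haar_denoise(noisy, block=8, group=4, threshold=0.4)
print(mse(noisy, clean), mse(denoised, clean))
```

Because the Haar transform is orthonormal, a threshold of zero reconstructs the input exactly; the threshold alone controls how much detail is shrunk, and in practice it would be tied to the noise level.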

Preprint · 2015 · arXiv

A Gaussian Scale Space Approach For Exudates Detection, Classification And Severity Prediction

In the context of computer-aided diagnosis (CAD) systems for diabetic retinopathy, we present a novel method for detecting exudates and classifying them for disease severity prediction. The method is based on a Gaussian scale-space interest map and mathematical morphology. It uses a support vector machine for classification, and the locations of the optic disc and the macula region for severity prediction. It efficiently handles luminance variation and is suitable for exudates of varying sizes. The method has been evaluated on the publicly available DIARETDB1V2 and e-ophthaEX databases. For exudate detection on DIARETDB1V2, the proposed method achieved a sensitivity of 96.54% and a prediction of 98.35%.
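The abstract names a Gaussian scale-space interest map without giving its exact form. One common scale-space construction is a difference-of-Gaussians (DoG) response maximised over several scales, which highlights bright blobs such as exudates against a darker background. The pure-Python sketch below assumes that construction; the scales `sigmas`, the `ratio` between them and the clamped-border convolution are hypothetical choices, and the morphology, SVM and severity stages are not shown.

```python
import math
from typing import List, Sequence

Image = List[List[float]]


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalised 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image: Image, kernel: List[float]) -> Image:
    r = len(kernel) // 2
    width = len(image[0])
    # Border pixels are replicated by clamping the sample index.
    return [
        [
            sum(kernel[k] * row[min(max(x + k - r, 0), width - 1)] for k in range(len(kernel)))
            for x in range(width)
        ]
        for row in image
    ]


def transpose(image: Image) -> Image:
    return [list(col) for col in zip(*image)]


def gaussian_blur(image: Image, sigma: float) -> Image:
    """Separable 2-D Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def interest_map(image: Image, sigmas: Sequence[float] = (1.0, 2.0), ratio: float = 1.6) -> Image:
    """Maximum difference-of-Gaussians response over scales; bright blobs score high."""
    h, w = len(image), len(image[0])
    best = [[-math.inf] * w for _ in range(h)]
    for s in sigmas:
        fine, coarse = gaussian_blur(image, s), gaussian_blur(image, ratio * s)
        for y in range(h):
            for x in range(w):
                best[y][x] = max(best[y][x], fine[y][x] - coarse[y][x])
    return best


# Usage: a dark 21x21 image with one bright pixel, standing in for a small exudate.
img = [[0.0] * 21 for _ in range(21)]
img[10][10] = 1.0
resp = interest_map(img)
peak = max(((y, x) for y in range(21) for x in range(21)), key=lambda p: resp[p[0]][p[1]])
print(peak)  # (10, 10)
```

Taking the maximum over several scales is what lets a single map respond to both small and large bright lesions, which matches the abstract's point about exudates of varying sizes.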
