Source author record

Rishabh Jain

Rishabh Jain appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computation and Language, Artificial Intelligence, Computer Vision, hep-ph, Sound, astro-ph.EP, astro-ph.IM, cond-mat.mtrl-sci, eess.AS, gr-qc, hep-ex, Human-Computer Interaction, Machine Learning, math.OC, Programming Languages, Social and Information Networks

Catalog footprint

What is connected

14 works

16 topics

4 close collaborators

preprint, 2026, arXiv

From Hype to Insight: Rethinking Large Language Model Integration in Visual Speech Recognition

Advances in self-supervised encoders have improved Visual Speech Recognition (VSR). Recent approaches integrating these encoders with LLM decoders improve transcription accuracy; however, it remains unclear whether these gains stem from visual understanding or stronger language modeling. In this work, we systematically evaluate LLM decoders by freezing or selectively updating the visual encoder, scaling decoder size, comparing adaptation strategies and architectures, and varying training data across LRS2, LRS3, and their combination. Evaluation on LRS2, LRS3, and WildVSR shows that scaling and adaptation yield limited improvements, while combining datasets enhances generalization. Semantic analysis reveals that gains arise primarily from lexical rather than semantic processing. Our Llama-2-13B model trained on the combined set achieves 24.7% WER on LRS3 and 47.0% on WildVSR, establishing SOTA among models trained without additional supervision. Our findings indicate LLM decoders refine contextual reasoning rather than visual features, emphasizing the need for stronger visual encoders to drive meaningful progress.
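The WER figures quoted in this abstract are word error rates. As a minimal sketch, assuming the standard definition (word-level edit distance divided by the number of reference words), the metric can be computed as below. The function name `wer` and the whitespace tokenization are illustrative assumptions, not details taken from the paper.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()  # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.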

preprint, 2025, arXiv

A Super-Learner with Large Language Models for Medical Emergency Advising

Medical decision-support and advising systems are critical for emergency physicians to quickly and accurately assess patients' conditions and make a diagnosis. Artificial Intelligence (AI) has emerged as a transformative force in healthcare in recent years, and Large Language Models (LLMs) have been employed in various fields of medical decision-support systems. We studied responses of a group of different LLMs to real cases in emergency medicine. The results of our study on the five most renowned LLMs showed significant differences in the capabilities of LLMs for diagnosing acute diseases in medical emergencies, with accuracy ranging between 58% and 65%. This accuracy significantly exceeds the reported accuracy of human doctors. We built a super-learner, MEDAS (Medical Emergency Diagnostic Advising System), of five major LLMs (Gemini, Llama, Grok, GPT, and Claude). The super-learner produces higher diagnostic accuracy, 70%, even with a quite basic meta-learner. However, at least one of the integrated LLMs in the same super-learner produces 85% correct diagnoses. The super-learner integrates a cluster of LLMs using a meta-learner capable of learning the different capabilities of each LLM to leverage the diagnostic accuracy of the model through the collective capabilities of all LLMs in the cluster. The results of our study showed that aggregated diagnostic accuracy provided by a meta-learning approach exceeds that of any individual LLM, suggesting that the super-learner can take advantage of the combined knowledge of the medical datasets used to train the group of LLMs.
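The abstract does not specify the meta-learner, so the following is only a sketch of one very basic possibility: an accuracy-weighted vote, where each model's weight is its accuracy on held-out cases. The function names `fit_weights` and `weighted_vote`, the model identifiers, and the labels are all hypothetical and are not taken from MEDAS.

```python
from collections import defaultdict


def fit_weights(predictions, truths):
    """Weight each model by its accuracy on held-out cases.

    predictions: dict mapping model name -> list of predicted labels
    truths: list of correct labels, aligned with each prediction list
    """
    weights = {}
    for model, preds in predictions.items():
        correct = sum(p == t for p, t in zip(preds, truths))
        weights[model] = correct / len(truths)
    return weights


def weighted_vote(case_predictions, weights):
    """Return the label with the largest total weight for a single case.

    case_predictions: dict mapping model name -> predicted label
    """
    scores = defaultdict(float)
    for model, label in case_predictions.items():
        scores[label] += weights.get(model, 0.0)
    return max(scores, key=scores.get)


# Hypothetical validation data: three models, four cases.
validation = {
    "model_a": ["x", "y", "x", "y"],
    "model_b": ["x", "x", "x", "x"],
    "model_c": ["y", "y", "y", "x"],
}
truths = ["x", "y", "x", "y"]
weights = fit_weights(validation, truths)
```

With these weights, a case where `model_a` answers differently from the other two is decided in favor of `model_a`, which a plain majority vote would not do. This is the sense in which a meta-learner can exploit the differing capabilities of each model.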

preprint, 2022, arXiv

A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis

Speech synthesis has come a long way as current text-to-speech (TTS) models can now generate natural human-sounding speech. However, most of the TTS research focuses on using adult speech data and there has been very limited work done on child speech synthesis. This study developed and validated a training pipeline for fine-tuning state-of-the-art (SOTA) neural TTS models using child speech datasets. This approach adopts a multi-speaker TTS retuning workflow to provide a transfer-learning pipeline. A publicly available child speech dataset was cleaned to provide a smaller subset of approximately 19 hours, which formed the basis of our fine-tuning experiments. Both subjective and objective evaluations were performed using a pretrained MOSNet for objective evaluation and a novel subjective framework for mean opinion score (MOS) evaluations. Subjective evaluations achieved the MOS of 3.95 for speech intelligibility, 3.89 for voice naturalness, and 3.96 for voice consistency. Objective evaluation using a pretrained MOSNet showed a strong correlation between real and synthetic child voices. Speaker similarity was also verified by calculating the cosine similarity between the embeddings of utterances. An automatic speech recognition (ASR) model is also used to provide a word error rate (WER) comparison between the real and synthetic child voices. The final trained TTS model was able to synthesize child-like speech from reference audio samples as short as 5 seconds.
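The speaker-similarity check mentioned in this abstract relies on cosine similarity between utterance embeddings. A minimal sketch of that measure follows; how the embeddings are produced is not stated in the abstract, so the plain lists of floats used here are an assumption for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Values close to 1 indicate embeddings pointing in nearly the same direction, which is typically read as the same or a very similar speaker.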

preprint, 2022, arXiv

Detecting Heavy Higgs Bosons from Natural SUSY at a 100 TeV Hadron Collider

Supersymmetric models with radiatively-driven naturalness (RNS) enjoy low electroweak fine-tuning whilst respecting LHC search limits on gluinos and top squarks and allowing for $m_h\simeq 125$ GeV. While the heavier Higgs bosons $H,\ A$ may have TeV-scale masses, the SUSY conserving $μ$ parameter must lie in the few hundred GeV range. Thus, in natural SUSY models there should occur large heavy Higgs boson branching fractions to electroweakinos, with Higgs boson decays to higgsino plus gaugino dominating when they are kinematically accessible. These SUSY decays can open up new avenues for discovery. We investigate the prospects of discovering heavy neutral Higgs bosons $H$ and $A$ decaying into light plus heavy chargino pairs which can yield a four isolated lepton plus missing transverse energy signature at the LHC and at a future 100 TeV $pp$ collider. We find that discovery of heavy Higgs decay to electroweakinos via its $4\ell$ decay mode is very difficult at HL-LHC. For FCC-hh or SPPC, we study the $H,\ A \to $ SUSY reaction along with dominant physics backgrounds from the Standard Model and devise suitable selection requirements to extract a clean signal for FCC-hh or SPPC with $\sqrt{s}=100$ TeV, assuming an integrated luminosity of 15 $ab^{-1}$. We find that while a conventional cut-and-count analysis yields a signal statistical significance greater than $5σ$ for $m_{A,H}\sim 1.1-1.65$ TeV, a boosted-decision-tree analysis allows for heavy Higgs signal discovery at FCC-hh or SPPC for $m_{A,H}\sim 1-2$ TeV.

preprint, 2022, arXiv

On Hybrid Quantum and Classical Computing Algorithms for Mixed-Integer Programming

Quantum computing is emerging as a new computing resource that could be superior to conventional computing for certain classes of optimization problems. However, in principle, most existing approaches to quantum optimization are intended to solve unconstrained binary programming problems, while mixed-integer linear programming is of most interest in practice. We attempt to bridge the gap between the capability of quantum computing and real-world applications by developing a new approach for mixed-integer programming. The approach applies Benders decomposition to decompose the mixed-integer programming into binary programming and linear programming sub-problems, which are solved by a noisy intermediate-scale quantum processor and conventional processor, respectively. The algorithm is provably able to reach the optimal solution of the original mixed-integer programming problem. The algorithm is tested on a D-Wave 2000Q quantum processing unit and is shown to be effective for small-scaled test cases. We also test the algorithm on a mixed-integer programming inspired by power system applications. Many insights are drawn from the numerical results for both the capabilities and limitations of the proposed algorithm.

preprint, 2022, arXiv

Searching for Charged Higgs Bosons via $e^+ e^- \to H^+ H^- \to c\bar{b} \bar{c}b $ at Linear Colliders

We study a search for the charged Higgs boson via $e^+e^- \to H^+H^- \to c\bar{b}\bar{c}b$ at the 500 GeV ILC. In a general two Higgs doublet model without $Z_2$ symmetry, extra Yukawa couplings $ρ_{tt}$ and $ρ_{tc}$ can drive baryogenesis, but searches at the HL-LHC may still go empty-handed if the couplings are relatively weak. Taking $m_{H^+ } \simeq m_H \simeq m_A \simeq 200$ GeV, with $ρ_{tt}$, $ρ_{tc}\sim 0.1$ and no $h(125)$-$H$ mixing, $H^+ \to c\bar b$ decay is dominant, and the $c\bar{b}\bar{c}b$ final state is likely overwhelmed by QCD background at the LHC. We show that the electroweak production of $H^+ H^-$ at the ILC is discoverable with integrated luminosity of 1 ab$^{-1}$. Furthermore, we show that $m_{H^+}$ can be extracted by requiring the two pairs of $b$ and light jets be roughly equal in mass, without assuming the mass value. Thus, the ILC can probe low mass Higgs bosons in multijet final states to complement the HL-LHC in the future.

preprint, 2020, arXiv

A Search for Technosignatures Around 31 Sun-like Stars with the Green Bank Telescope at 1.15-1.73 GHz

We conducted a search for technosignatures in April of 2018 and 2019 with the L-band receiver (1.15-1.73 GHz) of the 100 m diameter Green Bank Telescope. These observations focused on regions surrounding 31 Sun-like stars near the plane of the Galaxy. We present the results of our search for narrowband signals in this data set as well as improvements to our data processing pipeline. Specifically, we applied an improved candidate signal detection procedure that relies on the topographic prominence of the signal power, which nearly doubles the signal detection count of some previously analyzed data sets. We also improved the direction-of-origin filters that remove most radio frequency interference (RFI) to ensure that they uniquely link signals observed in separate scans. We performed a preliminary signal injection and recovery analysis to test the performance of our pipeline. We found that our pipeline recovers 93% of the injected signals over the usable frequency range of the receiver and 98% if we exclude regions with dense RFI. In this analysis, 99.73% of the recovered signals were correctly classified as technosignature candidates. Our improved data processing pipeline classified over 99.84% of the ~26 million signals detected in our data as RFI. Of the remaining candidates, 4539 were detected outside of known RFI frequency regions. The remaining candidates were visually inspected and verified to be of anthropogenic nature. Our search compares favorably to other recent searches in terms of end-to-end sensitivity, frequency drift rate coverage, and signal detection count per unit bandwidth per unit integration time.
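The candidate detection step in this abstract relies on the topographic prominence of the signal power. The sketch below shows the general definition of prominence for a one-dimensional power spectrum; it is not the authors' pipeline, and the function names, the threshold parameter, and the sample data are hypothetical.

```python
def prominence(power, i):
    """Topographic prominence of the sample at index i.

    Walk left and right from the peak until a strictly higher sample (or
    the array edge) is reached, record the minimum on each side, and
    subtract the higher of the two minima from the peak height.
    """
    peak = power[i]

    left_min = peak
    j = i - 1
    while j >= 0 and power[j] <= peak:
        left_min = min(left_min, power[j])
        j -= 1

    right_min = peak
    j = i + 1
    while j < len(power) and power[j] <= peak:
        right_min = min(right_min, power[j])
        j += 1

    return peak - max(left_min, right_min)


def detect_candidates(power, min_prominence):
    """Indices of strict interior local maxima with enough prominence."""
    return [
        i
        for i in range(1, len(power) - 1)
        if power[i] > power[i - 1]
        and power[i] > power[i + 1]
        and prominence(power, i) >= min_prominence
    ]


# Hypothetical power values across adjacent frequency channels.
spectrum = [0, 1, 0, 5, 2, 3, 0]
```

A prominence threshold separates peaks by how far they stand out from their surroundings rather than by absolute height. In this sample, the peak at index 5 sits on the shoulder of the stronger peak at index 3 and has a prominence of only 1.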

preprint, 2020, arXiv

Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data

Can we develop visually grounded dialog agents that can efficiently adapt to new tasks without forgetting how to talk to people? Such agents could leverage a larger variety of existing data to generalize to new tasks, minimizing expensive data collection and annotation. In this work, we study a setting we call "Dialog without Dialog", which requires agents to develop visually grounded dialog models that can adapt to new tasks without language level supervision. By factorizing intention and language, our model minimizes linguistic drift after fine-tuning for new tasks. We present qualitative results, automated metrics, and human studies that all show our model can adapt to new tasks and maintain language quality. Baselines either fail to perform well at new tasks or experience language drift, becoming unintelligible to humans. Code has been made available at https://github.com/mcogswell/dialog_without_dialog

preprint, 2019, arXiv

nocaps: novel object captioning at scale

Image captioning models have achieved impressive results on datasets containing limited visual concepts and large amounts of paired image-caption training data. However, if these models are to ever function in the wild, a much larger variety of visual concepts must be learned, ideally from less supervision. To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task. Dubbed 'nocaps', for novel object captioning at scale, our benchmark consists of 166,100 human-generated captions describing 15,100 images from the OpenImages validation and test sets. The associated training data consists of COCO image-caption pairs, plus OpenImages image-level labels and object bounding boxes. Since OpenImages contains many more classes than COCO, nearly 400 object classes seen in test images have no or very few associated training captions (hence, nocaps). We extend existing novel object captioning models to establish strong baselines for this benchmark and provide analysis to guide future work on this task.

preprint, 2016, arXiv

The Jeans theorem and the "Tolman-Oppenheimer-Volkov equation" in an exact wave solution of R^2 gravity

Corda, Mosquera Cuesta and Lorduy Gomez have shown that spherically symmetric stationary states can be used as a model for galaxies in the framework of the linearized R^2 gravity. Those states could represent a partial solution to the Dark Matter Problem. Here we discuss an improvement of this work. In fact, as the star density is a functional of the invariants of the associated Vlasov equation, we show that any of these invariants is in its turn a functional of the local energy and the angular momentum. As a consequence, the star density depends only on these two integrals of the Vlasov system. This result is known as the "Jeans theorem". In addition, we find an analogue of the historical Tolman-Oppenheimer-Volkov equation for the system considered in this paper. For the sake of completeness, in the final Section of the paper we consider two additional models which argue that Dark Matter could not be an essential element.

preprint, 2015, arXiv

Effect of shear strain on band structure and electronic properties of phosphorene

We present an ab initio investigation of the effects of shear strain on the band structure and electronic properties of 2D phosphorene. We carried out DFT calculations to determine the shear stress as a function of shear strain and found that monolayer phosphorene reaches its ultimate strength at shear strains of 30% and 35% in the armchair and zigzag directions, respectively. We also found that the monolayer extends in the z direction when shear strain is applied in either direction. Additionally, we derived band structures of phosphorene along both directions under shear strain and show that the band gap decreases along both directions and that phosphorene shows a semi-metal nature at a shear strain of 30% in both directions. The electrical conductivity of phosphorene was estimated from the effective mass along the zigzag and armchair directions; it is far higher along the armchair direction, and it increases with shear strain along both the armchair direction (up to the ultimate strength) and the zigzag direction.

preprint, 2014, arXiv

Clear, Concise and Effective UI: Opinion and Suggestions

The most important aspect of any Software is the operability for the intended audience. This factor of operability is encompassed in the user interface, which serves as the only window to the features of the system. It is thus essential that the User Interface provided is robust, concise and lucid. Presently there are no properly defined rules or guidelines for user interface design enabling a perfect design, since such a system cannot be perceived. This article aims at providing suggestions in the design of the User Interface, which would make it easier for the user to navigate through the system features and also the developers to guide the users towards better utilization of the features.

preprint, 2014, arXiv

Mining and Analyzing Twitter trends: Frequency based ranking of descriptive Tweets

One of the major sources of trending news, events and opinion in the current age is microblogging. Twitter, being one such platform, is extensively used to mine data about public responses and event updates. This paper proposes methods to filter tweets to obtain the most accurately descriptive tweets, which communicate the content of the trend. It can also rank the tweets according to relevance. The principle behind the ranking mechanism is the assumed tendencies in the natural language used by the users. The mapping frequencies of occurrence of words and related hashtags are used to create a weighted score for each tweet in the sample space obtained from Twitter on a particular trend.
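As a rough illustration of frequency-based scoring, the sketch below counts how many tweets in a sample contain each word or hashtag, then scores each tweet by the average weighted frequency of its tokens. The tokenization, the `hashtag_weight` parameter, and the averaging are assumptions made for this example; the abstract does not give the paper's actual scoring formula.

```python
import re
from collections import Counter

# Assumption: a token is a run of word characters, optionally prefixed by '#'.
TOKEN = re.compile(r"#?\w+")


def tokenize(text):
    return TOKEN.findall(text.lower())


def rank_tweets(tweets, hashtag_weight=2.0):
    """Sort tweets from most to least representative of the sample."""
    # Document frequency: number of tweets in which each token appears.
    freq = Counter(tok for tweet in tweets for tok in set(tokenize(tweet)))

    def score(tweet):
        tokens = set(tokenize(tweet))
        if not tokens:
            return 0.0
        total = sum(
            freq[tok] * (hashtag_weight if tok.startswith("#") else 1.0)
            for tok in tokens
        )
        # Average so that long tweets are not favored merely for length.
        return total / len(tokens)

    return sorted(tweets, key=score, reverse=True)


# Hypothetical sample for one trend.
sample = [
    "lunch was great",
    "final tonight #worldcup goal",
    "#worldcup final tonight",
]
ranked = rank_tweets(sample)
```

Tweets whose vocabulary is shared by many other tweets on the trend rise to the top, while off-topic tweets with rare words sink to the bottom.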

preprint, 2014, arXiv

Optimizing the For loop: Comparison of For loop and micro For loop

Looping is one of the fundamental logical instructions used for repeating a block of code. It is used in programs across all programming languages. Traditionally, in languages like C, the for loop is used extensively for repeated execution of a block of code, due to its ease of use and simplified representation. This paper proposes a new way of representing the for loop to improve its runtime efficiency and compares the experimental statistics with the traditional for loop representation. It is found that for a small number of iterations, the difference in computational time may not be considerable, but for a large number of iterations, the difference is noticeable.
