Source author record

Jana Hutter

Jana Hutter appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, physics.med-ph, Computation and Language, eess.IV, Machine Learning, Sound

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators

preprint · 2026 · arXiv

SIREM: Speech-Informed MRI Reconstruction with Learned Sampling

Real-time magnetic resonance imaging (rtMRI) of speech production enables non-invasive visualization of dynamic vocal-tract motion and is valuable for speech science and clinical assessment. However, rtMRI is fundamentally constrained by trade-offs among spatial resolution, temporal resolution, and acquisition speed, often leading to undersampled k-space measurements and degraded reconstructions. We propose SIREM, a speech-informed MRI reconstruction framework that uses synchronized speech as a cross-modal prior. The central idea is that vocal-tract configurations during speech are correlated with the produced acoustics, making part of the image content predictable from audio. SIREM models each frame as a fusion of an audio-driven component and an MRI-driven component through a spatial weighting map. The audio branch predicts articulator-related structure from speech, while the MRI branch reconstructs complementary content from measured k-space data. We further introduce a learnable soft weighting profile over spiral arms, enabling a differentiable study of how k-space arm usage interacts with speech-informed fusion. This yields a unified multimodal formulation that combines audio-driven prediction, MRI reconstruction, and sampling adaptation. We evaluate SIREM on the USC speech rtMRI benchmark against standard baselines, including gridding, wavelet-based compressed sensing, and total variation. SIREM introduces a speech-informed reconstruction paradigm that operates in a substantially higher-throughput regime than iterative methods while preserving anatomically plausible vocal-tract structure. These results establish an initial benchmark for multimodal speech-informed rtMRI reconstruction and highlight the potential of synchronized speech as an auxiliary prior for fast reconstruction. The source code is available at https://github.com/mdhasanai/SIREM
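The fusion step described in this abstract can be illustrated with a minimal sketch. It assumes the spatial weighting map acts as a pixelwise convex blend, fused = w * audio + (1 - w) * mri, with weights in [0, 1]. The abstract does not state the exact formula, so this form, the function name `fuse_frame`, and the list-of-rows image layout are illustrative assumptions and are not taken from the SIREM code.

```python
def fuse_frame(audio_img, mri_img, weight_map):
    """Blend two equally sized 2D images pixel by pixel.

    ASSUMPTION: a convex combination w * audio + (1 - w) * mri.
    The abstract only says the two components are fused "through a
    spatial weighting map"; the real model may differ.
    """
    if not (len(audio_img) == len(mri_img) == len(weight_map)):
        raise ValueError("all inputs must have the same number of rows")
    fused = []
    for a_row, m_row, w_row in zip(audio_img, mri_img, weight_map):
        if not (len(a_row) == len(m_row) == len(w_row)):
            raise ValueError("all rows must have the same length")
        row = []
        for a, m, w in zip(a_row, m_row, w_row):
            if not 0.0 <= w <= 1.0:
                raise ValueError("weights must lie in [0, 1]")
            # w near 1 trusts the audio-driven prediction,
            # w near 0 trusts the MRI-driven reconstruction.
            row.append(w * a + (1.0 - w) * m)
        fused.append(row)
    return fused


if __name__ == "__main__":
    audio = [[2.0, 2.0], [2.0, 2.0]]    # hypothetical audio-branch output
    mri = [[4.0, 4.0], [4.0, 4.0]]      # hypothetical MRI-branch output
    weights = [[1.0, 0.5], [0.5, 0.0]]  # hypothetical weighting map
    print(fuse_frame(audio, mri, weights))
```

Under this assumption, a weight near 1 means the pixel is taken mostly from the audio-driven prediction, and a weight near 0 means it is taken mostly from the measured k-space reconstruction.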

preprint · 2026 · arXiv

Speech-Guided Multimodal Learning for Vocal Tract Segmentation in Real-Time MRI

Segmenting vocal tract articulators in real-time MRI (rtMRI) is a challenging dynamic image segmentation problem characterized by low contrast, rapid motion, and limited spatial resolution. However, while rtMRI acquisitions may provide synchronized acoustic signals, existing methods discard this information, and the few multimodal approaches that incorporate audio cannot be deployed when audio is unavailable. We propose a three-stage framework that leverages acoustic and phonological supervision during training while requiring only the rtMRI image at inference: phonological representations are converted into spatial bounding-box priors for articulator localization, visual and acoustic encoders are aligned via dual-level cross-modal contrastive pretraining, and the learned representations are fused through a cross-attention decoder, effectively transferring multimodal knowledge into a single-modality inference pipeline. Evaluated on 75-Speaker Annot-16 and USC-TIMIT datasets, our method outperforms existing unimodal and multimodal methods, demonstrating that multimodal supervision provides transferable benefits for precise and clinically deployable vocal tract segmentation.

preprint · 2020 · arXiv

Diffusion tensor driven image registration: a deep learning approach

Tracking microstructural changes in the developing brain relies on accurate inter-subject image registration. However, most methods rely on either structural or diffusion data to learn the spatial correspondences between two or more images, without taking into account the complementary information provided by using both. Here we propose a deep learning registration framework which combines the structural information provided by T2-weighted (T2w) images with the rich microstructural information offered by diffusion tensor imaging (DTI) scans. We perform a leave-one-out cross-validation study where we compare the performance of our multi-modality registration model with a baseline model trained on structural data only, in terms of Dice scores and differences in fractional anisotropy (FA) maps. Our results show that in terms of average Dice scores our model performs better in subcortical regions when compared to using structural data only. Moreover, average sum-of-squared differences between warped and fixed FA maps show that our proposed model performs better at aligning the diffusion data.
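The two evaluation measures named in this abstract, the Dice score and the sum-of-squared differences between FA maps, can be sketched as follows. This uses the standard textbook definitions on flattened voxel lists. The function names and toy inputs are illustrative assumptions, and the paper's exact evaluation protocol (regions, masking, averaging) is not given in the abstract.

```python
def dice_score(seg_a, seg_b, label):
    """Dice overlap 2|A∩B| / (|A| + |B|) for one label.

    seg_a and seg_b are flattened label maps of equal length.
    Returns 1.0 when the label is absent from both (a common convention,
    assumed here).
    """
    if len(seg_a) != len(seg_b):
        raise ValueError("segmentations must have the same length")
    in_a = [v == label for v in seg_a]
    in_b = [v == label for v in seg_b]
    intersection = sum(1 for x, y in zip(in_a, in_b) if x and y)
    total = sum(in_a) + sum(in_b)
    return 1.0 if total == 0 else 2.0 * intersection / total


def sum_squared_differences(fa_warped, fa_fixed):
    """Sum of squared voxelwise differences between two flattened FA maps."""
    if len(fa_warped) != len(fa_fixed):
        raise ValueError("FA maps must have the same length")
    return sum((w - f) ** 2 for w, f in zip(fa_warped, fa_fixed))


if __name__ == "__main__":
    warped_labels = [1, 1, 0, 0]  # hypothetical warped segmentation
    fixed_labels = [1, 0, 0, 0]   # hypothetical fixed segmentation
    print(dice_score(warped_labels, fixed_labels, label=1))
    print(sum_squared_differences([0.5, 0.25], [0.25, 0.25]))
```

A higher Dice score indicates better overlap of the warped and fixed segmentations, while a lower sum-of-squared differences indicates closer agreement between the warped and fixed FA maps.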

preprint · 2020 · arXiv

Scattered slice SHARD reconstruction for motion correction in multi-shell diffusion MRI

Diffusion MRI offers a unique probe into neural microstructure and connectivity in the developing brain. However, analysis of neonatal brain imaging data is complicated by inevitable subject motion, leading to a series of scattered slices that need to be aligned within and across diffusion-weighted contrasts. Here, we develop a reconstruction method for scattered slice multi-shell high angular resolution diffusion imaging (HARDI) data, jointly estimating an uncorrupted data representation and motion parameters at the slice or multiband excitation level. The reconstruction relies on a data-driven representation of multi-shell HARDI data using a bespoke spherical harmonics and radial decomposition (SHARD), which avoids imposing model assumptions, thus facilitating comparison of various microstructure imaging methods in the reconstructed output. Furthermore, the proposed framework integrates slice-level outlier rejection, distortion correction, and slice profile correction. We evaluate the method in the neonatal cohort of the developing Human Connectome Project (650 scans). Validation experiments demonstrate accurate slice-level motion correction across the age range and across the range of motion in the population. Results in the neonatal data show successful reconstruction even in severely motion-corrupted subjects. In addition, we illustrate how local tissue modelling can extract advanced microstructure features such as orientation distribution functions from the motion-corrected reconstructions.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint