Source author record

Yiming Xiao

Yiming Xiao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, eess.AS, Computation and Language, eess.IV, physics.med-ph, Sound

Catalog footprint

What is connected

7 works

6 topics

4 close collaborators

preprint, 2026, arXiv

DisastQA: A Comprehensive Benchmark for Evaluating Question Answering in Disaster Management

Accurate question answering (QA) in disaster management requires reasoning over uncertain and conflicting information, a setting poorly captured by existing benchmarks built on clean evidence. We introduce DisastQA, a large-scale benchmark of 3,000 rigorously verified questions (2,000 multiple-choice and 1,000 open-ended) spanning eight disaster types. The benchmark is constructed via a human-LLM collaboration pipeline with stratified sampling to ensure balanced coverage. Models are evaluated under varying evidence conditions, from closed-book to noisy evidence integration, enabling separation of internal knowledge from reasoning under imperfect information. For open-ended QA, we propose a human-verified keypoint-based evaluation protocol emphasizing factual completeness over verbosity. Experiments with 20 models reveal substantial divergences from general-purpose leaderboards such as MMLU-Pro. While recent open-weight models approach proprietary systems in clean settings, performance degrades sharply under realistic noise, exposing critical reliability gaps for disaster response. All code, data, and evaluation resources are available at https://github.com/TamuChen18/DisastQA_open.

preprint, 2026, arXiv

WD-FQDet: Multispectral Detection Transformer via Wavelet Decomposition and Frequency-aware Query Learning

Infrared-visible object detection improves detection performance by combining complementary features from multispectral images. Existing backbone-specific and backbone-shared approaches still suffer from severe bias in modality-shared features and insufficient modality-specific features. To address these issues, we propose a novel detection framework, WD-FQDet, that explicitly decouples modality-shared and modality-specific information in the infrared and visible modalities from the new perspective of low- and high-frequency domains, allowing fusion strategies tailored to their frequency characteristics. Specifically, a low-frequency homogeneity alignment module is proposed to align modality-shared features across modalities via a cross-modal attention mechanism, and a high-frequency specificity retention module is proposed to preserve modality-specific features through a multi-scale gradient consistency loss. To reinforce the feature representation in the frequency domain, we propose a hybrid feature enhancement module that incorporates spatial cues. Furthermore, considering that the contributions of homogeneous and modality-specific features to object detection vary across scenarios, we propose a frequency-aware query selection module to dynamically regulate their contributions. Experimental results on the FLIR, LLVIP, and M3FD datasets demonstrate that WD-FQDet achieves state-of-the-art performance across multiple evaluation metrics.

preprint, 2022, arXiv

ConferencingSpeech 2022 Challenge: Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge for Online Conferencing Applications

With the advances in speech communication systems such as online conferencing applications, we can seamlessly work with people regardless of where they are. However, during online meetings, speech quality can be significantly affected by background noise, reverberation, packet loss, network jitter, etc. Because of its nature, speech quality is traditionally assessed in subjective tests in laboratories and, more recently, through crowdsourcing, following the international standards of the ITU-T Rec. P.800 series. However, those approaches are costly and cannot be applied to customer data. Therefore, an effective objective assessment approach is needed to evaluate or monitor the speech quality of an ongoing conversation. The ConferencingSpeech 2022 challenge targets non-intrusive deep neural network models for the speech quality assessment task. We open-sourced a training corpus with more than 86K speech clips in different languages, with a wide range of synthesized and live degradations and their corresponding subjective quality scores obtained through crowdsourcing. 18 teams submitted their models for evaluation in this challenge. The blind test sets included about 4,300 clips covering a wide range of degradations. This paper describes the challenge, the datasets, and the evaluation methods and reports the final results.

preprint, 2022, arXiv

DiffeoRaptor: Diffeomorphic Inter-modal Image Registration using RaPTOR

Purpose: Diffeomorphic image registration is essential in many medical imaging applications. Several registration algorithms of this type have been proposed, but primarily for intra-contrast alignment. Currently, efficient inter-modal/contrast diffeomorphic registration, which is vital in numerous applications, remains a challenging task. Methods: We propose a novel inter-modal/contrast registration algorithm that leverages the Robust PaTch-based cOrrelation Ratio (RaPTOR) metric to allow inter-modal/contrast image alignment and the bandlimited geodesic shooting demonstrated in the Fourier Approximated Lie Algebras (FLASH) algorithm for fast diffeomorphic registration. Results: The proposed algorithm, named DiffeoRaptor, was validated with three public databases for the tasks of brain and abdominal image registration while comparing the results against three state-of-the-art techniques, including FLASH, NiftyReg, and Symmetric image normalization (SyN). Conclusions: Our results demonstrated that DiffeoRaptor offered comparable or better registration performance in terms of registration accuracy. Moreover, DiffeoRaptor produces smoother deformations than SyN in inter-modal and contrast registration. The code for DiffeoRaptor is publicly available at https://github.com/nimamasoumi/DiffeoRaptor.

preprint, 2022, arXiv

RESECT-SEG: Open access annotations of intra-operative brain tumor ultrasound images

Purpose: Registration and segmentation of magnetic resonance (MR) and ultrasound (US) images play an essential role in surgical planning and resection of brain tumors. However, validating these techniques is challenging due to the scarcity of publicly accessible sources with high-quality ground truth information. To this end, we propose a unique annotation dataset of tumor tissues and resection cavities from the previously published RESECT dataset (Xiao et al. 2017) to encourage more rigorous assessment of image processing techniques. Acquisition and validation methods: The RESECT database consists of MR and intraoperative US (iUS) images of 23 patients who underwent resection surgeries. The proposed dataset contains tumor tissue and resection cavity annotations of the iUS images. The quality of the annotations was validated by two highly experienced neurosurgeons through several assessment criteria. Data format and availability: Annotations of tumor tissues and resection cavities are provided in 3D NIfTI format. Both sets of annotations are accessible online at https://osf.io/6y4db. Discussion and potential applications: The proposed database includes tumor tissue and resection cavity annotations from real-world clinical ultrasound brain images to evaluate segmentation and registration methods. These labels could also be used to train deep learning approaches. Eventually, this dataset should further improve the quality of image guidance in neurosurgery.

preprint, 2020, arXiv

Do Public Datasets Assure Unbiased Comparisons for Registration Evaluation?

With the increasing availability of new image registration approaches, an unbiased evaluation is increasingly needed so that clinicians can choose the most suitable approaches for their applications. Current evaluations typically use landmarks in manually annotated datasets. As a result, the quality of annotations is crucial for unbiased comparisons. Even though most data providers claim to have quality control over their datasets, an objective third-party screening can be reassuring for intended users. In this study, we use the variogram to screen the manually annotated landmarks in two datasets used to benchmark registration in image-guided neurosurgeries. The variogram provides an intuitive 2D representation of the spatial characteristics of annotated landmarks. Using variograms, we identified potentially problematic cases and had them examined by experienced radiologists. We found that (1) a small number of annotations may have fiducial localization errors; (2) the landmark distribution for some cases is not ideal for fair comparisons. If unresolved, both findings could incur bias in registration evaluation.

preprint, 2020, arXiv

Improved Source Counting and Separation for Monaural Mixture

Single-channel speech separation in the time domain and frequency domain has been widely studied for voice-driven applications over the past few years. Most previous works assume that the number of speakers is known in advance; however, this number is not easily obtained from a monaural mixture in practice. In this paper, we propose a novel model of single-channel multi-speaker separation by jointly learning the time-frequency feature and the unknown number of speakers. Specifically, our model integrates the time-domain convolution-encoded feature map and the frequency-domain spectrogram via an attention mechanism, and the integrated features are projected into high-dimensional embedding vectors, which are then clustered with a deep attractor network to modify the encoded feature. Meanwhile, the number of speakers is counted by computing the Gerschgorin disks of the embedding vectors, which are orthogonal for different speakers. Finally, the modified encoded feature is inverted to the sound waveform using a linear decoder. Experimental evaluation on the GRID dataset shows that the proposed method with a single model can accurately estimate the number of speakers with 96.7% probability of success, while achieving state-of-the-art separation results on multi-speaker mixtures in terms of scale-invariant signal-to-noise ratio improvement (SI-SNRi) and signal-to-distortion ratio improvement (SDRi).
