Source author record

Xi Fang

Xi Fang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Computer Vision, Artificial Intelligence, eess.IV, Networking and Internet Architecture, physics.med-ph, Robotics

Catalog footprint

What is connected

6 works

6 topics

4 close collaborators

preprint, 2026, arXiv

SpecVQA: A Benchmark for Spectral Understanding and Visual Question Answering in Scientific Images

Spectra are a prevalent yet highly information-dense form of scientific imagery, presenting substantial challenges to multimodal large language models (MLLMs) due to their unstructured and domain-specific characteristics. Here we introduce SpecVQA, a professional scientific-image benchmark for evaluating multimodal models on scientific spectral understanding, covering 7 representative spectrum types with expert-annotated question-answer pairs. The benchmark has two aims: scientific QA evaluation on spectra and evaluation of the corresponding underlying tasks. SpecVQA contains 620 figures and 3100 QA pairs curated from peer-reviewed literature, targeting both direct information extraction and domain-specific reasoning. To effectively reduce token length while preserving essential curve characteristics, we propose a spectral data sampling and interpolation reconstruction approach. Ablation studies further confirm that the approach achieves substantial performance improvements on the proposed benchmark. We test the capability of prominent MLLMs in scientific spectral understanding on our benchmark and present a leaderboard. This work represents an essential step toward enhancing spectral understanding in multimodal large models and suggests promising directions for extending visual-language models to broader scientific research and data analysis.
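The abstract does not say how the sampling and interpolation reconstruction is carried out. As a rough illustration only, the sketch below assumes uniform subsampling of a curve followed by linear interpolation; the function names `downsample` and `reconstruct` are hypothetical and are not taken from the SpecVQA work.

```python
from bisect import bisect_right

def downsample(xs, ys, step):
    """Keep every `step`-th point of the curve, always retaining the last point.

    Assumption: uniform subsampling; the paper's actual scheme is not stated.
    """
    idx = list(range(0, len(xs), step))
    if idx[-1] != len(xs) - 1:
        idx.append(len(xs) - 1)
    return [xs[i] for i in idx], [ys[i] for i in idx]

def reconstruct(sample_xs, sample_ys, query_xs):
    """Linearly interpolate the sampled curve at each query x.

    Requires at least two sample points with strictly increasing x values.
    """
    out = []
    for x in query_xs:
        # Locate the segment [x0, x1] that contains x, clamped to the ends.
        j = min(max(bisect_right(sample_xs, x) - 1, 0), len(sample_xs) - 2)
        x0, x1 = sample_xs[j], sample_xs[j + 1]
        y0, y1 = sample_ys[j], sample_ys[j + 1]
        t = (x - x0) / (x1 - x0)
        out.append(y0 + t * (y1 - y0))
    return out
```

A shorter list of (x, y) pairs takes fewer tokens to pass to a model, and the reconstruction step shows how much of the curve shape survives the reduction.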

preprint, 2020, arXiv

Integrative Analysis for COVID-19 Patient Outcome Prediction

While image analysis of chest computed tomography (CT) for COVID-19 diagnosis has been intensively studied, little work has been performed for image-based patient outcome prediction. Management of high-risk patients with early intervention is key to lowering the fatality rate of COVID-19 pneumonia, as a majority of patients recover naturally. Therefore, an accurate prediction of disease progression with baseline imaging at the time of the initial presentation can help in patient management. Instead of using only size and volume information of pulmonary abnormalities and features obtained through deep learning based image segmentation, here we combine radiomics of lung opacities and non-imaging features from demographic data, vital signs, and laboratory findings to predict the need for intensive care unit (ICU) admission. To our knowledge, this is the first study that uses holistic information of a patient including both imaging and non-imaging data for outcome prediction. The proposed methods were thoroughly evaluated on datasets separately collected from three hospitals, one in the United States, one in Iran, and another in Italy, with a total of 295 patients with reverse transcription polymerase chain reaction (RT-PCR) assay positive COVID-19 pneumonia. Our experimental results demonstrate that adding non-imaging features can significantly improve the performance of prediction to achieve AUC up to 0.884 and sensitivity as high as 96.1%, which can be valuable to provide clinical decision support in managing COVID-19 patients. Our methods may also be applied to other lung diseases including but not limited to community acquired pneumonia. The source code of our work is available at https://github.com/DIAL-RPI/COVID19-ICUPrediction.
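For readers unfamiliar with the two reported metrics, the sketch below shows how sensitivity and AUC are commonly computed for a binary outcome such as ICU admission. It is a generic illustration, not the authors' evaluation code (which lives in the linked repository); the function names are hypothetical, and the AUC uses the standard pairwise-ranking definition.

```python
def sensitivity(y_true, y_pred):
    """True-positive rate: TP / (TP + FN), for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)

def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Both functions assume that at least one positive case (and, for `auc`, at least one negative case) is present.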

preprint, 2020, arXiv

Multi-organ Segmentation over Partially Labeled Datasets with Multi-scale Feature Abstraction

Shortage of fully annotated datasets has been a limiting factor in developing deep learning based image segmentation algorithms and the problem becomes more pronounced in multi-organ segmentation. In this paper, we propose a unified training strategy that enables a novel multi-scale deep neural network to be trained on multiple partially labeled datasets for multi-organ segmentation. In addition, a new network architecture for multi-scale feature abstraction is proposed to integrate pyramid input and feature analysis into a U-shape pyramid structure. To bridge the semantic gap caused by directly merging features from different scales, an equal convolutional depth mechanism is introduced. Furthermore, we employ a deep supervision mechanism to refine the outputs in different scales. To fully leverage the segmentation features from all the scales, we design an adaptive weighting layer to fuse the outputs in an automatic fashion. All these mechanisms together are integrated into a Pyramid Input Pyramid Output Feature Abstraction Network (PIPO-FAN). Our proposed method was evaluated on four publicly available datasets, including BTCV, LiTS, KiTS and Spleen, where very promising performance has been achieved. The source code of this work is publicly shared at https://github.com/DIAL-RPI/PIPO-FAN for others to easily reproduce the work and build their own models with the introduced mechanisms.
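The abstract mentions an adaptive weighting layer that fuses the outputs from all scales but gives no details. The sketch below shows one plausible form, under the assumption that each scale gets a learnable scalar that is normalized with a softmax before a weighted sum. The name `fuse_scales` and the softmax normalization are illustrative guesses, not the PIPO-FAN implementation, which is available in the linked repository.

```python
import math

def fuse_scales(outputs, logits):
    """Fuse per-scale predictions with softmax-normalized weights.

    outputs: list of per-scale prediction vectors, all of the same length.
    logits:  one scalar per scale (learnable in a real network; fixed here).
    """
    # Softmax with max-subtraction for numerical stability.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    n = len(outputs[0])
    # Weighted sum across scales, element by element.
    return [sum(w * o[k] for w, o in zip(weights, outputs)) for k in range(n)]
```

With equal logits this reduces to a plain average of the scales; training would shift weight toward the scales that are more reliable.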

preprint, 2020, arXiv

Towards Real-Time Advancement of Underwater Visual Quality with GAN

Low visual quality has prevented underwater robotic vision from a wide range of applications. Although several algorithms have been developed, real-time and adaptive methods are deficient for real-world tasks. In this paper, we address this difficulty based on generative adversarial networks (GAN), and propose a GAN-based restoration scheme (GAN-RS). In particular, we develop a multi-branch discriminator including an adversarial branch and a critic branch for the purpose of simultaneously preserving image content and removing underwater noise. In addition to adversarial learning, a novel dark channel prior loss also promotes the generator to produce realistic vision. More specifically, an underwater index is investigated to describe underwater properties, and a loss function based on the underwater index is designed to train the critic branch for underwater noise suppression. Through extensive comparisons on visual quality and feature restoration, we confirm the superiority of the proposed approach. Consequently, the GAN-RS can adaptively improve underwater visual quality in real time and induce an overall superior restoration performance. Finally, a real-world experiment is conducted on the seabed for grasping marine products, and the results are quite promising. The source code is publicly available at https://github.com/SeanChenxy/GAN_RS.
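The dark channel prior loss builds on the dark channel of an image: the per-pixel minimum over color channels, followed by a minimum over a local window. The sketch below computes only that quantity on a small image held as nested lists. How GAN-RS turns it into a loss is not described in the abstract, so no loss is shown; the function name and the `radius` parameter are assumptions for illustration.

```python
def dark_channel(image, radius=1):
    """Compute the dark channel of an RGB image.

    image:  H x W nested list of (r, g, b) tuples.
    radius: half-width of the square window; window side is 2 * radius + 1.
    Returns an H x W nested list of values.
    """
    h, w = len(image), len(image[0])
    # Step 1: per-pixel minimum over the color channels.
    min_rgb = [[min(px) for px in row] for row in image]
    # Step 2: minimum over the local window, clipped at the image borders.
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            out[i][j] = min(
                min_rgb[a][b]
                for a in range(max(0, i - radius), min(h, i + radius + 1))
                for b in range(max(0, j - radius), min(w, j + radius + 1))
            )
    return out
```

This brute-force version is meant for reading, not speed; a real-time pipeline would use an optimized minimum filter.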

preprint, 2019, arXiv

A Method of Rapid Quantification of Patient-Specific Organ Dose for CT Using Coupled Deep-Learning based Multi-Organ Segmentation and GPU-accelerated Monte Carlo Dose Computing

Purpose: This paper describes a new method to apply deep-learning algorithms for automatic segmentation of radiosensitive organs from 3D tomographic CT images before computing organ doses using a GPU-based Monte Carlo code. Methods: A deep convolutional neural network (CNN) for organ segmentation is trained to automatically delineate radiosensitive organs from CT. With a GPU-based Monte Carlo dose engine (ARCHER) to derive CT dose of a phantom made from a subject's CT scan, we are then able to compute the patient-specific CT dose for each of the segmented organs. The developed tool is validated by using Relative Dose Error (RDE) against the organ doses calculated by ARCHER with manual segmentation performed by radiologists. The dose computation results are also compared against organ doses from population-average phantoms to demonstrate the improvement achieved by using the developed tool. In this study, two datasets were used: The Lung CT Segmentation Challenge 2017 (LCTSC) dataset, which contains 60 thoracic CT scan patients each with 5 segmented organs, and the Pancreas-CT (PCT) dataset, which contains 43 abdominal CT scan patients each with 8 segmented organs. Five-fold cross-validation of the new method is performed on both datasets. Results: Compared with the traditional organ dose evaluation method, which is based on population-average phantoms, our proposed method achieved a smaller RDE range on all organs with -4.3%~1.5% vs -31.5%~33.9% (lung), -7.0%~2.3% vs -15.2%~125.1% (heart), -18.8%~40.2% vs -10.3%~124.1% (esophagus) in the LCTSC dataset and -5.6%~1.6% vs -20.3%~57.4% (spleen), -4.5%~4.6% vs -19.5%~61.0% (pancreas), -2.3%~4.4% vs -37.8%~75.8% (left kidney), -14.9%~5.4% vs -39.9%~14.6% (gall bladder), -0.9%~1.6% vs -30.1%~72.5% (liver), and -23.0%~11.1% vs -52.5%~-1.3% (stomach) in the PCT dataset.
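The abstract reports RDE as signed percentage ranges but does not spell out the formula. The sketch below assumes the usual signed relative error, with the dose from manual segmentation as the reference; both the formula and the function name are assumptions, and the numbers in the comments of any usage are hypothetical rather than taken from the paper.

```python
def relative_dose_error(dose, reference_dose):
    """Signed relative dose error in percent.

    Assumed definition: 100 * (dose - reference_dose) / reference_dose,
    where reference_dose is the organ dose from manual segmentation.
    """
    if reference_dose == 0:
        raise ValueError("reference_dose must be non-zero")
    return 100.0 * (dose - reference_dose) / reference_dose

def rde_range(doses, reference_doses):
    """Minimum and maximum RDE over a set of patients for one organ."""
    errors = [relative_dose_error(d, r) for d, r in zip(doses, reference_doses)]
    return min(errors), max(errors)
```

Under this definition, a negative value means the method underestimates the reference dose, which matches how the ranges in the abstract span both signs.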

preprint, 2011, arXiv

Wireless Communications and Networking Technologies for Smart Grid: Paradigms and Challenges

Smart grid, regarded as the next generation power grid, uses two-way flows of electricity and information to create a widely distributed automated energy delivery network. In this work we present our vision on smart grid from the perspective of wireless communications and networking technologies. We present wireless communication and networking paradigms for four typical scenarios in the future smart grid and also point out the research challenges of the wireless communication and networking technologies used in smart grid.
