Source author record

Fenglin Liu

Fenglin Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computation and Language, Artificial Intelligence, Computer Vision, physics.med-ph, Biological Physics, Machine Learning, eess.IV, eess.AS, physics.optics, Sound

Catalog footprint

What is connected

17 works

10 topics

4 close collaborators

preprint, 2026, arXiv

BioMedArena: An Open-source Toolkit for Building and Evaluating Biomedical Deep Research Agents

Building a deep research agent today is an exercise in glue code: the same backbone evaluated on the same benchmark can report different accuracies in different papers because harness and tool registry all differ, and integrating a new foundation model into a comparable evaluation surface costs weeks of model-specific engineering. We call this the per-paper engineering tax and release BioMedArena, an open-source toolkit that not only alleviates it but also provides an arena for fair comparison of different foundation models when evaluating them as deep-research agents. BioMedArena decouples six layers of biomedical agent evaluation -- benchmark loading, tool exposure, tool selection, execution mode, context management, and scoring -- and exposes 147 biomedical benchmarks and 75 biomedical tools across 9 functional families. Adding a new model, benchmark, or tool reduces to registering a few-line provider adapter. We further provide 6 agent harnesses with 6 context-management strategies, which provide 12 backbones with competitive research capabilities and significantly improved performance, achieving state-of-the-art (SOTA) results on 8 representative biomedical benchmarks, with an average lift of +15.03 percentage points over prior SOTA. The toolkit, configurations, and per-task traces are available at https://github.com/AI-in-Health/BioMedArena
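The "few-line provider adapter" idea can be pictured with a generic registry pattern. The sketch below is purely illustrative: `register_provider`, `PROVIDERS`, `EchoProvider`, and `run_task` are hypothetical names invented here, not BioMedArena's actual API, which is documented in the linked repository.

```python
from typing import Callable, Dict, Protocol


class Provider(Protocol):
    """Hypothetical interface that every model adapter is assumed to satisfy."""

    def complete(self, prompt: str) -> str: ...


# Hypothetical registry mapping a model name to a zero-argument factory.
PROVIDERS: Dict[str, Callable[[], Provider]] = {}


def register_provider(name: str):
    """Decorator that records an adapter class under `name`."""
    def decorator(cls):
        PROVIDERS[name] = cls
        return cls
    return decorator


@register_provider("echo")
class EchoProvider:
    """Toy adapter standing in for a real foundation-model client."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


def run_task(model_name: str, question: str) -> str:
    """Look up the adapter by name and ask it a single question."""
    provider = PROVIDERS[model_name]()
    return provider.complete(question)
```

Under this assumed design, adding a model means writing one small decorated class; the harness code in `run_task` does not change.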

preprint, 2026, arXiv

Biosignal Fingerprinting: A Cross-Modal PPG-ECG Foundation Model

Cardiovascular disease remains the leading cause of global mortality, yet scalable cardiac monitoring is hindered by the gap between diagnostic-rich ECG and ubiquitous wearable PPG. Bridging this gap requires representations that are compact, transferable across modalities and devices, and deployable without task-specific retraining. Here we introduce biosignal fingerprints: compact latent representations of cardiovascular state derived from a cross-modal foundation model, the Multi-modal Masked Autoencoder (M2AE), trained on over 3.4 million paired ECG and PPG signals. M2AE integrates modality-specific encoders with a shared bottleneck and dual decoders, jointly optimized using reconstruction and cross-modal contrastive objectives, yielding generalizable fingerprints that retain intra- and inter-modality features. Like a biometric fingerprint, these representations uniquely encode an individual's cardiovascular state in a modality-agnostic, privacy-preserving form reusable across clinical tasks without exposing raw waveform data or requiring model retraining. Across 7 downstream tasks, spanning cross-modal reconstruction, cardiovascular disease classification, hypertension detection, mortality prediction, and demographic inference, biosignal fingerprints achieve competitive or superior performance compared to leading domain-specialist foundation models in frozen settings, including an AUROC of 0.974 for five-class CVD classification and 0.877 for hypertension detection, with a maximum improvement of 27.7% in AUROC across 5 classification tasks. Critically, strong performance is maintained with only a single modality, enabling deployment in resource-constrained, single-sensor environments typical of real-world wearable monitoring, with direct implications for continuous cardiovascular monitoring across clinical and consumer health settings.
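A joint reconstruction and cross-modal contrastive objective of the kind the abstract mentions can be sketched in plain Python. The abstract does not give the loss form, so everything specific here is an assumption: the InfoNCE-style contrastive term, the cosine similarity, the `temperature` and `weight` values, and all function names are illustrative rather than the M2AE implementation. Latent vectors are assumed to be non-zero.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mse(x: Vector, y: Vector) -> float:
    """Mean squared error between a signal and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(ecg: List[Vector], ppg: List[Vector], temperature: float = 0.1) -> float:
    """ECG-to-PPG InfoNCE: pair i is the positive, other PPG latents are negatives."""
    total = 0.0
    for i, anchor in enumerate(ecg):
        logits = [cosine(anchor, p) / temperature for p in ppg]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(ecg)


def joint_loss(
    recon_pairs: List[Tuple[Vector, Vector]],
    ecg_latents: List[Vector],
    ppg_latents: List[Vector],
    weight: float = 1.0,
) -> float:
    """Average reconstruction error plus a weighted contrastive term."""
    recon = sum(mse(x, x_hat) for x, x_hat in recon_pairs) / len(recon_pairs)
    return recon + weight * info_nce(ecg_latents, ppg_latents)
```

In this toy form, the contrastive term is small when each ECG latent is closest to its own paired PPG latent and large when the pairing is scrambled, which is the behavior a cross-modal alignment objective is meant to reward.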

preprint, 2022, arXiv

A physical perspective to understand myelin. I. Peters quadrant mystery

In the development of oligodendrocytes in the central nervous system, the inner and outer tongue of the myelin sheath tend to be located within the same quadrant, a phenomenon named the Peters quadrant mystery. In this study, we conduct in silico investigations to explore the possible mechanisms underlying the Peters quadrant mystery. A biophysically detailed model of oligodendrocytes was used to simulate the effect of the action potential-induced electric field across the myelin sheath. Our simulation suggests that the paranodal channel connecting the inner and outer tongue forms a low impedance route, inducing two high-current zones at the area around the inner and outer tongue. When the inner tongue and outer tongue are located within the same quadrant, the interaction of these two high-current zones will induce a maximum amplitude and a polarity reversal of the voltage upon the inner tongue, resulting in the same-quadrant phenomenon. This model indicates that the growth of myelin follows a simple principle: an external negative or positive E-field can promote or inhibit the growth of the inner tongue, respectively.

preprint, 2022, arXiv

A physical perspective to understand myelin. II. The physical origin of myelin development

The physical principle of myelin development is obtained from our previous study by explaining the Peters quadrant mystery: an externally applied negative or positive E-field can promote or inhibit the growth of the inner tongue of the myelin sheath, respectively. In this study, this principle is considered as a fundamental hypothesis, named Hypothesis-E, to explain more phenomena about myelin development systematically. Specifically, the g-ratio and the fate of the Schwann cell's differentiation are explained in terms of the E-field. Moreover, an experiment is proposed to validate this theory.

preprint, 2022, arXiv

AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation

Recently, medical report generation, which aims to automatically generate a long and coherent descriptive paragraph of a given medical image, has received growing research interest. Different from the general image captioning tasks, medical report generation is more challenging for data-driven neural models. This is mainly due to 1) the serious data bias: the normal visual regions dominate the dataset over the abnormal visual regions, and 2) the very long sequence. To alleviate these two problems, we propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules: 1) the AHA module first predicts the disease tags from the input image and then learns the multi-grained visual features by hierarchically aligning the visual regions and disease tags. The acquired disease-grounded visual features can better represent the abnormal regions of the input image, which could alleviate the data bias problem; 2) the MGT module effectively uses the multi-grained features and the Transformer framework to generate the long medical report. The experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets. Moreover, the human evaluation conducted by professional radiologists further proves the effectiveness of our approach.

preprint, 2022, arXiv

End-to-end Spoken Conversational Question Answering: Task, Dataset and Model

In spoken question answering, the systems are designed to answer questions from contiguous text spans within the related speech transcripts. However, the most natural way that humans seek or test their knowledge is via human conversations. Therefore, we propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows given the speech documents. In this task, our main objective is to build a system that deals with conversational questions based on the audio recordings, and to explore the plausibility of providing systems with more cues from different modalities during information gathering. To this end, instead of directly adopting automatically generated speech transcripts with highly noisy data, we propose a novel unified data distillation approach, DDNet, which effectively ingests cross-modal information to achieve fine-grained representations of the speech and language modalities. Moreover, we propose a simple and novel mechanism, termed Dual Attention, which encourages better alignment between audio and text to ease the process of knowledge transfer. To evaluate the capacity of SCQA systems in a dialogue-style interaction, we assemble a Spoken Conversational Question Answering (Spoken-CoQA) dataset with more than 40k question-answer pairs from 4k conversations. The performance of the existing state-of-the-art methods degrades significantly on our dataset, hence demonstrating the necessity of cross-modal information integration. Our experimental results demonstrate that our proposed method achieves superior performance in spoken conversational question answering tasks.

preprint, 2022, arXiv

Graph-in-Graph Network for Automatic Gene Ontology Description Generation

Gene Ontology (GO) is the primary gene function knowledge base that enables computational tasks in biomedicine. The basic element of GO is a term, which includes a set of genes with the same function. Existing research efforts of GO mainly focus on predicting gene term associations. Other tasks, such as generating descriptions of new terms, are rarely pursued. In this paper, we propose a novel task: GO term description generation. This task aims to automatically generate a sentence that describes the function of a GO term belonging to one of the three categories, i.e., molecular function, biological process, and cellular component. To address this task, we propose a Graph-in-Graph network that can efficiently leverage the structural information of GO. The proposed network introduces a two-layer graph: the first layer is a graph of GO terms where each node is also a graph (gene graph). Such a Graph-in-Graph network can derive the biological functions of GO terms and generate proper descriptions. To validate the effectiveness of the proposed network, we build three large-scale benchmark datasets. By incorporating the proposed Graph-in-Graph network, the performances of seven different sequence-to-sequence models can be substantially boosted across all evaluation metrics, with up to 34.7%, 14.5%, and 39.1% relative improvements in BLEU, ROUGE-L, and METEOR, respectively.

preprint, 2022, arXiv

O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning

Video captioning combines video understanding and language generation. Different from image captioning, which describes a static image with details of almost every object, video captioning usually considers a sequence of frames and biases towards focused objects, e.g., the objects that stay in focus regardless of the changing background. Therefore, detecting and properly accommodating focused objects is critical in video captioning. To enforce the description of focused objects and achieve controllable video captioning, we propose an Object-Oriented Non-Autoregressive approach (O2NA), which performs caption generation in three steps: 1) identify the focused objects and predict their locations in the target caption; 2) generate the related attribute words and relation words of these focused objects to form a draft caption; and 3) combine video information to refine the draft caption to a fluent final caption. Since the focused objects are generated and located ahead of other words, it is difficult to apply the word-by-word autoregressive generation process; instead, we adopt a non-autoregressive approach. The experiments on two benchmark datasets, i.e., MSR-VTT and MSVD, demonstrate the effectiveness of O2NA, which achieves results competitive with the state of the art but with both higher diversity and higher inference speed.

preprint, 2022, arXiv

Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding

In sequence-to-sequence learning, e.g., natural language generation, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last encoder layer, recent work has proposed to use representations from different encoder layers for diversified levels of information. Nonetheless, the decoder still obtains only a single view of the source sequences, which might lead to insufficient training of the encoder layer stack due to the hierarchy bypassing problem. In this work, we propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences. Systematic experiments and analyses show that we successfully address the hierarchy bypassing problem, require almost negligible parameter increase, and substantially improve the performance of sequence-to-sequence learning with deep representations on five diverse tasks, i.e., machine translation, abstractive summarization, image captioning, video captioning, medical report generation, and paraphrase generation. In particular, our approach achieves new state-of-the-art results on ten benchmark datasets, including a low-resource machine translation dataset and two low-resource medical report generation datasets.

preprint, 2021, arXiv

Adaptive Bi-directional Attention: Exploring Multi-Granularity Representations for Machine Reading Comprehension

Recently, attention-enhanced multi-layer encoders, such as the Transformer, have been extensively studied in Machine Reading Comprehension (MRC). To predict the answer, it is common practice to employ a predictor to draw information only from the final encoder layer, which generates the coarse-grained representations of the source sequences, i.e., passage and question. Previous studies have shown that the representation of the source sequence shifts from fine-grained to coarse-grained as the number of encoding layers increases. It is generally believed that with the growing number of layers in deep neural networks, the encoding process will increasingly gather relevant information for each location, resulting in more coarse-grained representations, which increases the likelihood of similarity to other locations (referred to as homogeneity). Such a phenomenon will mislead the model into making wrong judgments and thus degrade performance. To this end, we propose a novel approach called Adaptive Bidirectional Attention, which adaptively exploits the source representations of different levels for the predictor. Experimental results on the benchmark dataset SQuAD 2.0 demonstrate the effectiveness of our approach, and the results are better than the previous state-of-the-art model by 2.5% EM and 2.3% F1.

preprint, 2020, arXiv

Exploring and Distilling Cross-Modal Information for Image Captioning

Recently, attention-based encoder-decoder models have been used extensively in image captioning. Yet there is still great difficulty for the current methods to achieve deep image understanding. In this work, we argue that such understanding requires visual attention to correlated image regions and semantic attention to coherent attributes of interest. Based on the Transformer, to perform effective attention, we explore image captioning from a cross-modal perspective and propose the Global-and-Local Information Exploring-and-Distilling approach that explores and distills the source information in vision and language. It globally provides the aspect vector, a spatial and relational representation of images based on caption contexts, through the extraction of salient region groupings and attribute collocations, and locally extracts the fine-grained regions and attributes in reference to the aspect vector for word selection. Our Transformer-based model achieves a CIDEr score of 129.3 in offline COCO evaluation on the COCO testing set with remarkable efficiency in terms of accuracy, speed, and parameter budget.

preprint, 2019, arXiv

Block Matching Frame based Material Reconstruction for Spectral CT

Spectral computed tomography (CT) has great potential in material identification and decomposition. To achieve high-quality material composition images and further suppress the x-ray beam hardening artifacts, we first propose a one-step material reconstruction model based on a first-order Taylor expansion. Then, we develop a basic material reconstruction method named material simultaneous algebraic reconstruction technique (MSART). Considering the local similarity of each material image, we incorporate a powerful block matching frame (BMF) into the material reconstruction (MR) model and generate a BMF based MR (BMFMR) method. Because the BMFMR model contains the L0-norm problem, we adopt a split-Bregman method for optimization. The numerical simulation and physical phantom experiment results validate the correctness of the material reconstruction algorithms and demonstrate that the BMF regularization outperforms the total variation and non-local means regularizations.

preprint, 2019, arXiv

DLIMD: Dictionary Learning based Image-domain Material Decomposition for spectral CT

The potentially huge advantage of spectral computed tomography (CT) is its capability to provide accurate material identification and quantitative tissue information. This can benefit clinical applications, such as brain angiography, early tumor recognition, etc. To achieve more accurate material components with higher material image quality, we develop a dictionary learning based image-domain material decomposition (DLIMD) for spectral CT in this paper. First, we reconstruct spectral CT images from projections and calculate the material coefficient matrix by selecting uniform regions of basis materials from the image reconstruction results. Second, we employ the direct inversion (DI) method to obtain initial material decomposition results, and a set of image patches is extracted from the mode-1 unfolding of the normalized material image tensor to train a united dictionary by the K-SVD technique. Third, the trained dictionary is employed to explore the similarities from decomposed material images by constructing the DLIMD model. Fourth, more constraints (i.e., volume conservation and the bounds of each pixel within material maps) are further integrated into the model to improve the accuracy of material decomposition. Finally, both physical phantom and preclinical experiments are employed to evaluate the performance of the proposed DLIMD in material decomposition accuracy, material image edge preservation and feature recovery.

preprint, 2016, arXiv

BPF-type Region-of-interest Reconstruction for Parallel Translational Computed Tomography

Recently, an ultra-low-cost linear scan based tomography architecture was proposed by our team. Similar to linear tomosynthesis, the source and detector are translated in opposite directions and the data acquisition system targets a region-of-interest (ROI) to acquire data for image reconstruction. This kind of tomography architecture was named parallel translational computed tomography (PTCT). In our previous studies, filtered backprojection (FBP)-type algorithms were developed to reconstruct images from PTCT. However, the reconstructed ROI images from truncated projections have severe truncation artifacts. In this paper, we propose two backprojection filtering (BPF)-type algorithms named MP-BPF and MZ-BPF to reconstruct ROI images from truncated PTCT data. A weight function is constructed to deal with data redundancy for multi-linear translation modes. Extensive numerical simulations are performed to evaluate the proposed MP-BPF and MZ-BPF algorithms for PTCT in fan-beam geometry. Qualitative and quantitative results demonstrate that the proposed BPF-type algorithms can not only accurately reconstruct ROI images from truncated projections but also provide high-quality images for the entire image support in some circumstances.

preprint, 2013, arXiv

Dynamic Bowtie for Fan-beam CT

A bowtie is a filter used to shape an x-ray beam and equalize its flux reaching different detector channels. For development of spectral CT with energy-discriminative photon-counting (EDPC) detectors, here we propose and evaluate a dynamic bowtie for performance optimization based on a patient model or a scout scan. Our dynamic bowtie modifies an x-ray beam intensity profile by mechanical rotation and adaptive adjustment of the x-ray source flux. First, a mathematical model for dynamic bowtie filtering is established for an elliptical section in fan-beam geometry, and the contour of the optimal bowtie is derived. Then, numerical simulation is performed to compare the performance of the dynamic bowtie in the cases of an ideal phantom and a realistic cross-section relative to the counterparts without any bowtie and with a fixed bowtie respectively. Our dynamic bowtie can equalize the expected numbers of photons in the case of an ideal phantom. In practical cases, our dynamic bowtie can effectively reduce the dynamic range of detected signals inside the field of view. Although our design is optimized for an elliptical phantom, the resultant dynamic bowtie can be applied to a real fan-beam scan if the underlying cross-section can be approximated as an ellipse. Furthermore, our design methodology can be applied to specify an optimized dynamic bowtie for any cross-section of a patient, preferably using rapid prototyping technology. This fan-beam dynamic bowtie work could be extended to the cone-beam geometry in a follow-up study.

preprint, 2013, arXiv

Micro-modulated luminescence tomography

The imaging depth of optical microscopy has been fundamentally limited to the millimeter or sub-millimeter range due to light scattering. X-ray microscopy can resolve spatial details of a few microns deep inside a sample, but the contrast resolution is still inadequate to depict heterogeneous features at cellular or sub-cellular levels. To enhance and enrich biological contrast at large imaging depth, various nanoparticles have been introduced and have become essential to basic research and molecular medicine. Nanoparticles can be functionalized as imaging probes, similar to fluorescent and bioluminescent proteins. LiGa5O8:Cr3+ nanoparticles were recently synthesized to facilitate luminescence energy storage with x-ray pre-excitation and the subsequently stimulated luminescence emission by visible/near-infrared (NIR) light. In this paper, we suggest a micro-modulated luminescence tomography (MLT) approach to quantify a nanophosphor distribution in a thick biological sample with high resolution. Our numerical simulation studies demonstrate the feasibility of the proposed approach.

preprint, 2013, arXiv

Top-level Design and Pilot Analysis of Low-end CT Scanners Based on Linear Scanning for Developing Countries

Purpose: The goal is to develop a new architecture for computed tomography (CT) which is at an ultra-low-dose for developing countries, especially in rural areas. Methods: The proposed scheme is inspired by the recently developed compressive sensing and interior tomography techniques, where the data acquisition system targets a region of interest (ROI) to acquire limited and truncated data. The source and detector are translated in opposite directions for either ROI reconstruction with one or more localized linear scans or global reconstruction by combining multiple ROI reconstructions. In other words, the popular slip ring is replaced by a translation based setup, and the instrumentation cost is reduced by a relaxation of the imaging speed requirement. Results: The various translational scanning modes are theoretically analyzed, and the scanning parameters are optimized. The numerical simulation results from different numbers of linear scans confirm the feasibility of the proposed scheme, and suggest two preferred low-end systems for horizontal and vertical patient positions respectively. Conclusion: Ultra-low-cost x-ray CT is feasible with our proposed combination of linear scanning, compressive sensing, and interior tomography. The proposed architecture can be tailored into permanent, movable, or reconfigurable systems as desirable. Advanced image registration and spectral imaging features can be included as well.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxiv, confidence 95%

external id: arxiv:2605.09579:author:3:fenglin-liu

Imported May 20, 2026. Synced May 21, 2026.

arxiv, confidence 95%

external id: arxiv:2605.06177:author:9:fenglin-liu

Imported May 20, 2026. Synced May 20, 2026.

5 works

Xian Wu

Researcher

4 works

Chenyu You

Researcher

4 works

Hengyong Yu

Researcher

4 works

Shen Ge

Researcher
