Researcher profile

Quan Liu

Quan Liu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
19works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

19 published item(s)

preprint2026arXiv

On the Intrinsic Limits of Transformer Image Embeddings in Non-Solvable Spatial Reasoning

Vision Transformers (ViTs) excel in semantic recognition but exhibit systematic failures in spatial reasoning tasks such as mental rotation. While often attributed to data scale, we propose that this limitation arises from the intrinsic circuit complexity of the architecture. We formalize spatial understanding as learning a Group Homomorphism: mapping image sequences to a latent space that preserves the algebraic structure of the underlying transformation group. We demonstrate that for non-solvable groups (e.g., the 3D rotation group $\mathrm{SO}(3)$), maintaining such a structure-preserving embedding is computationally lower-bounded by the Word Problem, which is $\mathsf{NC^1}$-complete. In contrast, we prove that constant-depth ViTs with polynomial precision are strictly bounded by $\mathsf{TC^0}$. Under the conjecture $\mathsf{TC^0} \subsetneq \mathsf{NC^1}$, we establish a complexity boundary: constant-depth ViTs fundamentally lack the logical depth to efficiently capture non-solvable spatial structures. We validate this complexity gap via latent-space probing, demonstrating that ViT representations suffer a structural collapse on non-solvable tasks as compositional depth increases.

preprint2026arXiv

Step Potential Advantage Estimation: Harnessing Intermediate Confidence and Correctness for Efficient Mathematical Reasoning

Reinforcement Learning with Verifiable Rewards (RLVR) elicits long chain-of-thought reasoning in large language models (LLMs), but outcome-based rewards lead to coarse-grained advantage estimation. While existing approaches improve RLVR via token-level entropy or sequence-level length control, they lack a semantically grounded, step-level measure of reasoning progress. As a result, LLMs fail to distinguish necessary deduction from redundant verification: they may continue checking after reaching a correct solution and, in extreme cases, overturn a correct trajectory into an incorrect final answer. To remedy the lack of process supervision, we introduce a training-free probing mechanism that extracts intermediate confidence and correctness and combines them into a Step Potential signal that explicitly estimates the reasoning state at each step. Building on this signal, we propose Step Potential Advantage Estimation (SPAE), a fine-grained credit assignment method that amplifies potential gains, penalizes potential drops, and applies penalty after potential saturates to encourage timely termination. Experiments across multiple benchmarks show SPAE consistently improves accuracy while substantially reducing response length, outperforming strong RL baselines and recent efficient reasoning and token-level advantage estimation methods. The code is available at https://github.com/cii030/SPAE-RL.

preprint2023arXiv

Omni-Seg: A Scale-aware Dynamic Network for Renal Pathological Image Segmentation

Comprehensive semantic segmentation on renal pathological images is challenging due to the heterogeneous scales of the objects. For example, on a whole slide image (WSI), the cross-sectional areas of glomeruli can be 64 times larger than that of the peritubular capillaries, making it impractical to segment both objects on the same patch, at the same scale. To handle this scaling issue, prior studies have typically trained multiple segmentation networks in order to match the optimal pixel resolution of heterogeneous tissue types. This multi-network solution is resource-intensive and fails to model the spatial relationship between tissue types. In this paper, we propose the Omni-Seg+ network, a scale-aware dynamic neural network that achieves multi-object (six tissue types) and multi-scale (5X to 40X scale) pathological image segmentation via a single neural network. The contribution of this paper is three-fold: (1) a novel scale-aware controller is proposed to generalize the dynamic neural network from single-scale to multi-scale; (2) semi-supervised consistency regularization of pseudo-labels is introduced to model the inter-scale correlation of unannotated tissue types into a single end-to-end learning paradigm; and (3) superior scale-aware generalization is evidenced by directly applying a model trained on human kidney images to mouse kidney images, without retraining. By learning from ~150,000 human pathological image patches from six tissue types at three different resolutions, our approach achieved superior segmentation performance according to human visual assessment and evaluation of image-omics (i.e., spatial transcriptomics). The official implementation is available at https://github.com/ddrrnn123/Omni-Seg.

preprint2022arXiv

CaCL: Class-aware Codebook Learning for Weakly Supervised Segmentation on Diffuse Image Patterns

Weakly supervised learning has been rapidly advanced in biomedical image analysis to achieve pixel-wise labels (segmentation) from image-wise annotations (classification), as biomedical images naturally contain image-wise labels in many scenarios. The current weakly supervised learning algorithms from the computer vision community are largely designed for focal objects (e.g., dogs and cats). However, such algorithms are not optimized for diffuse patterns in biomedical imaging (e.g., stains and fluorescence in microscopy imaging). In this paper, we propose a novel class-aware codebook learning (CaCL) algorithm to perform weakly supervised learning for diffuse image patterns. Specifically, the CaCL algorithm is deployed to segment protein expressed brush border regions from histological images of human duodenum. Our contribution is three-fold: (1) we approach the weakly supervised segmentation from a novel codebook learning perspective; (2) the CaCL algorithm segments diffuse image patterns rather than focal objects; and (3) the proposed algorithm is implemented in a multi-task framework based on Vector Quantised-Variational AutoEncoder (VQ-VAE) via joint image reconstruction, classification, feature embedding, and segmentation. The experimental results show that our method achieved superior performance compared with baseline weakly supervised algorithms. The code is available at https://github.com/ddrrnn123/CaCL.

preprint2022arXiv

Compound Figure Separation of Biomedical Images: Mining Large Datasets for Self-supervised Learning

With the rapid development of self-supervised learning (e.g., contrastive learning), the importance of having large-scale images (even without annotations) for training a more generalizable AI model has been widely recognized in medical image analysis. However, collecting large-scale task-specific unannotated data at scale can be challenging for individual labs. Existing online resources, such as digital books, publications, and search engines, provide a new resource for obtaining large-scale images. However, published images in healthcare (e.g., radiology and pathology) consist of a considerable amount of compound figures with subplots. In order to extract and separate compound figures into usable individual images for downstream learning, we propose a simple compound figure separation (SimCFS) framework without using the traditionally required detection bounding box annotations, with a new loss function and a hard case simulation. Our technical contribution is four-fold: (1) we introduce a simulation-based training framework that minimizes the need for resource extensive bounding box annotations; (2) we propose a new side loss that is optimized for compound figure separation; (3) we propose an intra-class image augmentation method to simulate hard cases; and (4) to the best of our knowledge, this is the first study that evaluates the efficacy of leveraging self-supervised learning with compound image separation. From the results, the proposed SimCFS achieved state-of-the-art performance on the ImageCLEF 2016 Compound Figure Separation Database. The pretrained self-supervised learning model using large-scale mined figures improved the accuracy of downstream image classification tasks with a contrastive learning algorithm. The source code of SimCFS is made publicly available at https://github.com/hrlblab/ImageSeperation.

preprint2022arXiv

Feature Aggregation in Zero-Shot Cross-Lingual Transfer Using Multilingual BERT

Multilingual BERT (mBERT), a language model pre-trained on large multilingual corpora, has impressive zero-shot cross-lingual transfer capabilities and performs surprisingly well on zero-shot POS tagging and Named Entity Recognition (NER), as well as on cross-lingual model transfer. At present, the mainstream methods to solve the cross-lingual downstream tasks are always using the last transformer layer's output of mBERT as the representation of linguistic information. In this work, we explore the complementary property of lower layers to the last transformer layer of mBERT. A feature aggregation module based on an attention mechanism is proposed to fuse the information contained in different layers of mBERT. The experiments are conducted on four zero-shot cross-lingual transfer datasets, and the proposed method obtains performance improvements on key multilingual benchmark tasks XNLI (+1.5 %), PAWS-X (+2.4 %), NER (+1.2 F1), and POS (+1.5 F1). Through the analysis of the experimental results, we prove that the layers before the last layer of mBERT can provide extra useful information for cross-lingual downstream tasks and explore the interpretability of mBERT empirically.

preprint2022arXiv

Learning Disentangled Representations for Counterfactual Regression via Mutual Information Minimization

Learning individual-level treatment effect is a fundamental problem in causal inference and has received increasing attention in many areas, especially in the user growth area which concerns many internet companies. Recently, disentangled representation learning methods that decompose covariates into three latent factors, including instrumental, confounding and adjustment factors, have witnessed great success in treatment effect estimation. However, it remains an open problem how to learn the underlying disentangled factors precisely. Specifically, previous methods fail to obtain independent disentangled factors, which is a necessary condition for identifying treatment effect. In this paper, we propose Disentangled Representations for Counterfactual Regression via Mutual Information Minimization (MIM-DRCFR), which uses a multi-task learning framework to share information when learning the latent factors and incorporates MI minimization learning criteria to ensure the independence of these factors. Extensive experiments including public benchmarks and real-world industrial user growth datasets demonstrate that our method performs much better than state-of-the-art methods.

preprint2022arXiv

Leaving No One Behind: A Multi-Scenario Multi-Task Meta Learning Approach for Advertiser Modeling

Advertisers play an essential role in many e-commerce platforms like Taobao and Amazon. Fulfilling their marketing needs and supporting their business growth is critical to the long-term prosperity of platform economies. However, compared with extensive studies on user modeling such as click-through rate predictions, much less attention has been drawn to advertisers, especially in terms of understanding their diverse demands and performance. Different from user modeling, advertiser modeling generally involves many kinds of tasks (e.g. predictions of advertisers' expenditure, active-rate, or total impressions of promoted products). In addition, major e-commerce platforms often provide multiple marketing scenarios (e.g. Sponsored Search, Display Ads, Live Streaming Ads) while advertisers' behavior tend to be dispersed among many of them. This raises the necessity of multi-task and multi-scenario consideration in comprehensive advertiser modeling, which faces the following challenges: First, one model per scenario or per task simply doesn't scale; Second, it is particularly hard to model new or minor scenarios with limited data samples; Third, inter-scenario correlations are complicated, and may vary given different tasks. To tackle these challenges, we propose a multi-scenario multi-task meta learning approach (M2M) which simultaneously predicts multiple tasks in multiple advertising scenarios.

preprint2022arXiv

Meta-optic Accelerators for Object Classifiers

Rapid advances in deep learning have led to paradigm shifts in a number of fields, from medical image analysis to autonomous systems. These advances, however, have resulted in digital neural networks with large computational requirements, resulting in high energy consumption and limitations in real-time decision making when computation resources are limited. Here, we demonstrate a meta-optic based neural network accelerator that can off-load computationally expensive convolution operations into high-speed and low-power optics. In this architecture, metasurfaces enable both spatial multiplexing and additional information channels, such as polarization, in object classification. End-to-end design is used to co-optimize the optical and digital systems resulting in a robust classifier that achieves 95% accurate classification of handwriting digits and 94% accuracy in classifying both the digit and its polarization state. This approach could enable compact, high-speed, and low-power image and information processing systems for a wide range of applications in machine-vision and artificial intelligence.

preprint2022arXiv

Multi-Level Contrastive Learning for Cross-Lingual Alignment

Cross-language pre-trained models such as multilingual BERT (mBERT) have achieved significant performance in various cross-lingual downstream NLP tasks. This paper proposes a multi-level contrastive learning (ML-CTL) framework to further improve the cross-lingual ability of pre-trained models. The proposed method uses translated parallel data to encourage the model to generate similar semantic embeddings for different languages. However, unlike the sentence-level alignment used in most previous studies, in this paper, we explicitly integrate the word-level information of each pair of parallel sentences into contrastive learning. Moreover, cross-zero noise contrastive estimation (CZ-NCE) loss is proposed to alleviate the impact of the floating-point error in the training process with a small batch size. The proposed method significantly improves the cross-lingual transfer ability of our basic model (mBERT) and outperforms on multiple zero-shot cross-lingual downstream tasks compared to the same-size models in the Xtreme benchmark.

preprint2022arXiv

Omni-Seg: A Single Dynamic Network for Multi-label Renal Pathology Image Segmentation using Partially Labeled Data

Computer-assisted quantitative analysis on Giga-pixel pathology images has provided a new avenue in histology examination. The innovations have been largely focused on cancer pathology (i.e., tumor segmentation and characterization). In non-cancer pathology, the learning algorithms can be asked to examine more comprehensive tissue types simultaneously, as a multi-label setting. The prior arts typically needed to train multiple segmentation networks in order to match the domain-specific knowledge for heterogeneous tissue types (e.g., glomerular tuft, glomerular unit, proximal tubular, distal tubular, peritubular capillaries, and arteries). In this paper, we propose a dynamic single segmentation network (Omni-Seg) that learns to segment multiple tissue types using partially labeled images (i.e., only one tissue type is labeled for each training image) for renal pathology. By learning from ~150,000 patch-wise pathological images from six tissue types, the proposed Omni-Seg network achieved superior segmentation accuracy and less resource consumption when compared to the previous the multiple-network and multi-head design. In the testing stage, the proposed method obtains "completely labeled" tissue segmentation results using only "partially labeled" training images. The source code is available at https://github.com/ddrrnn123/Omni-Seg

preprint2022arXiv

Survival Prediction of Brain Cancer with Incomplete Radiology, Pathology, Genomics, and Demographic Data

Integrating cross-department multi-modal data (e.g., radiological, pathological, genomic, and clinical data) is ubiquitous in brain cancer diagnosis and survival prediction. To date, such an integration is typically conducted by human physicians (and panels of experts), which can be subjective and semi-quantitative. Recent advances in multi-modal deep learning, however, have opened a door to leverage such a process to a more objective and quantitative manner. Unfortunately, the prior arts of using four modalities on brain cancer survival prediction are limited by a "complete modalities" setting (i.e., with all modalities available). Thus, there are still open questions on how to effectively predict brain cancer survival from the incomplete radiological, pathological, genomic, and demographic data (e.g., one or more modalities might not be collected for a patient). For instance, should we use both complete and incomplete data, and more importantly, how to use those data? To answer the preceding questions, we generalize the multi-modal learning on cross-department multi-modal data to a missing data setting. Our contribution is three-fold: 1) We introduce optimal multi-modal learning with missing data (MMD) pipeline with optimized hardware consumption and computational efficiency; 2) We extend multi-modal learning on radiological, pathological, genomic, and demographic data into missing data scenarios; 3) a large-scale public dataset (with 962 patients) is collected to systematically evaluate glioma tumor survival prediction using four modalities. The proposed method improved the C-index of survival prediction from 0.7624 to 0.8053.

preprint2020arXiv

Dually Interactive Matching Network for Personalized Response Selection in Retrieval-Based Chatbots

This paper proposes a dually interactive matching network (DIM) for presenting the personalities of dialogue agents in retrieval-based chatbots. This model develops from the interactive matching network (IMN) which models the matching degree between a context composed of multiple utterances and a response candidate. Compared with previous persona fusion approaches which enhance the representation of a context by calculating its similarity with a given persona, the DIM model adopts a dual matching architecture, which performs interactive matching between responses and contexts and between responses and personas respectively for ranking response candidates. Experimental results on PERSONA-CHAT dataset show that the DIM model outperforms its baseline model, i.e., IMN with persona fusion, by a margin of 14.5% and outperforms the current state-of-the-art model by a margin of 27.7% in terms of top-1 accuracy hits@1.

preprint2020arXiv

Filtering before Iteratively Referring for Knowledge-Grounded Response Selection in Retrieval-Based Chatbots

The challenges of building knowledge-grounded retrieval-based chatbots lie in how to ground a conversation on its background knowledge and how to match response candidates with both context and knowledge simultaneously. This paper proposes a method named Filtering before Iteratively REferring (FIRE) for this task. In this method, a context filter and a knowledge filter are first built, which derive knowledge-aware context representations and context-aware knowledge representations respectively by global and bidirectional attention. Besides, the entries irrelevant to the conversation are discarded by the knowledge filter. After that, iteratively referring is performed between context and response representations as well as between knowledge and response representations, in order to collect deep matching features for scoring response candidates. Experimental results show that FIRE outperforms previous methods by margins larger than 2.8% and 4.1% on the PERSONA-CHAT dataset with original and revised personas respectively, and margins larger than 3.1% on the CMU_DoG dataset in terms of top-1 accuracy. We also show that FIRE is more interpretable by visualizing the knowledge grounding process.

preprint2020arXiv

Fine-Tuning BERT for Schema-Guided Zero-Shot Dialogue State Tracking

We present our work on Track 4 in the Dialogue System Technology Challenges 8 (DSTC8). The DSTC8-Track 4 aims to perform dialogue state tracking (DST) under the zero-shot settings, in which the model needs to generalize on unseen service APIs given a schema definition of these target APIs. Serving as the core for many virtual assistants such as Siri, Alexa, and Google Assistant, the DST keeps track of the user's goal and what happened in the dialogue history, mainly including intent prediction, slot filling, and user state tracking, which tests models' ability of natural language understanding. Recently, the pretrained language models have achieved state-of-the-art results and shown impressive generalization ability on various NLP tasks, which provide a promising way to perform zero-shot learning for language understanding. Based on this, we propose a schema-guided paradigm for zero-shot dialogue state tracking (SGP-DST) by fine-tuning BERT, one of the most popular pretrained language models. The SGP-DST system contains four modules for intent prediction, slot prediction, slot transfer prediction, and user state summarizing respectively. According to the official evaluation results, our SGP-DST (team12) ranked 3rd on the joint goal accuracy (primary evaluation metric for ranking submissions) and 1st on the requsted slots F1 among 25 participant teams.

preprint2020arXiv

Periodic Fluorescence Variations of CdSe Quantum Dots Coupled to Aryleneethynylenes with Aggregation Induced Emission

CdSe nanocrystals and aggregates of an aryleneethynylene derivative are assembled into a hybrid thin film with dual fluorescence from both fluorophores. Under continuous excitation, the nanocrystals and the molecules exhibit anti-correlated fluorescence intensity variations, which become periodic at low temperature. We attribute this to a structure-dependent aggregation induced emission of the aryleneethynylene derivative, which impacts the rate of excitation energy transfer between the molecules and nanocrystals. Energy transfer also affects the electric transport properties of the hybrid material under optical excitation. This work highlights that combining semiconductor nanocrystals with molecular aggregates, which exhibit aggregation induced emission, can result in unprecedented emerging optical properties.

preprint2020arXiv

Pre-Trained and Attention-Based Neural Networks for Building Noetic Task-Oriented Dialogue Systems

The NOESIS II challenge, as the Track 2 of the 8th Dialogue System Technology Challenges (DSTC 8), is the extension of DSTC 7. This track incorporates new elements that are vital for the creation of a deployed task-oriented dialogue system. This paper describes our systems that are evaluated on all subtasks under this challenge. We study the problem of employing pre-trained attention-based network for multi-turn dialogue systems. Meanwhile, several adaptation methods are proposed to adapt the pre-trained language models for multi-turn dialogue systems, in order to keep the intrinsic property of dialogue systems. In the released evaluation results of Track 2 of DSTC 8, our proposed models ranked fourth in subtask 1, third in subtask 2, and first in subtask 3 and subtask 4 respectively.

preprint2020arXiv

Speaker-Aware BERT for Multi-Turn Response Selection in Retrieval-Based Chatbots

In this paper, we study the problem of employing pre-trained language models for multi-turn response selection in retrieval-based chatbots. A new model, named Speaker-Aware BERT (SA-BERT), is proposed in order to make the model aware of the speaker change information, which is an important and intrinsic property of multi-turn dialogues. Furthermore, a speaker-aware disentanglement strategy is proposed to tackle the entangled dialogues. This strategy selects a small number of most important utterances as the filtered context according to the speakers' information in them. Finally, domain adaptation is performed to incorporate the in-domain knowledge into pre-trained language models. Experiments on five public datasets show that our proposed model outperforms the present models on all metrics by large margins and achieves new state-of-the-art performances for multi-turn response selection.

preprint2019arXiv

Hypericin: Single molecule spectroscopy of an active natural ingredient

Hypericin can be found in nature in Hypericum perforatum (St. John's Wort) and has become subject of intense biochemical research. Studies report of antidepressive, antineoplastic, antitumor and antiviral activity of hypericin. Among the variety of potential applications hypericin can be used as photosensitizer in photodynamic therapy (PDT), where it is brought into cancer cells and produces singlet oxygen upon irradiation with a suitable light source. Therefore, the photophysical properties of hypericin are crucial for a successful application in a medical treatment. Here, we present the first single molecule optical spectroscopy study of hypericin. Its photostability is large enough to obtain single molecule fluorescence, surface enhanced Raman spectra (SERS), fluorescence lifetime, antibunching and blinking dynamics. Embedding hypericin in a PVA matrix changes the blinking dynamics, reduces the fluorescence lifetime and increases the photostability. Single molecule SERS spectra show both the neutral and deprotonated form of hypericin and exhibit sudden spectral changes, which can be associated with a reorientation of the single molecule with respect to the surface.