Source author record

Qing Liu

Qing Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language · astro-ph.CO · astro-ph.GA · math.AG · Methodology

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators

Preprint · 2026 · arXiv

Multi-domain Multi-modal Document Classification Benchmark with a Multi-level Taxonomy

Document classification forms the backbone of modern enterprise content management, yet existing benchmarks remain trapped in oversimplified paradigms (single-domain settings with flat label structures) that bear little resemblance to the hierarchical, multi-modal, and cross-domain nature of real-world business documents. This gap not only misrepresents practical complexity but also stifles progress toward industrially viable document intelligence. To bridge this gap, we construct the first Multi-level, Multi-domain, Multi-modal document classification Benchmark (MMM-Bench). MMM-Bench includes (1) a deeply hierarchical taxonomy spanning five levels that captures the authentic organizational logic of business documentation; and (2) 5,990 real-world multi-modal documents meticulously curated from 12 commercial domains in Alibaba. Each document is manually annotated with a complete hierarchical path by domain experts. We establish comprehensive baselines on MMM-Bench using both open-weight and API-based models. Through systematic experiments, we identify four fundamental challenges within MMM-Bench and propose corresponding insights. To provide a solid foundation for advancing research in multi-level, multi-domain document classification, we release all of the data and the evaluation toolkit of MMM-Bench at https://github.com/MMMDC-Bench/MMMDC-Bench.
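Since each document carries a complete hierarchical path, one natural way to score a classifier is level by level, counting a prediction as correct at a given level only if the whole path prefix down to that level matches. The sketch below is a minimal illustration under that assumption; the function name `per_level_accuracy` and the three-level example labels are hypothetical, and this is not the metric implemented in the MMM-Bench evaluation toolkit.

```python
from typing import List, Sequence

Path = Sequence[str]  # one label per taxonomy level, root first


def per_level_accuracy(gold: Sequence[Path], pred: Sequence[Path], depth: int) -> List[float]:
    """Fraction of documents whose predicted path prefix matches the gold prefix at each level."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("need at least one document")
    scores = []
    for level in range(1, depth + 1):
        # A hit requires every label from the root down to this level to agree,
        # so an error at an upper level also counts as an error below it.
        hits = sum(
            1 for g, p in zip(gold, pred)
            if tuple(g[:level]) == tuple(p[:level])
        )
        scores.append(hits / len(gold))
    return scores


# Hypothetical three-level paths (assumption: the real taxonomy has five levels).
gold_paths = [
    ("finance", "invoice", "vat"),
    ("finance", "contract", "lease"),
    ("legal", "notice", "recall"),
]
pred_paths = [
    ("finance", "invoice", "vat"),
    ("finance", "invoice", "vat"),
    ("hr", "notice", "recall"),
]

accuracy = per_level_accuracy(gold_paths, pred_paths, depth=3)
print(accuracy)
```

In this toy example, the third prediction gets the two lower labels right but is still counted as wrong at every level because its top-level label differs, which is the behavior a prefix-based score is meant to expose.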

Preprint · 2026 · arXiv

On the cohomological representations of finite automorphism groups of singular curves and compact complex spaces

Let G be a finite group acting tamely on a proper reduced curve C over an algebraically closed field. We study the G-module structure on the cohomology groups of a G-equivariant locally free sheaf F on C, and give formulas of Chevalley--Weil type, with values in the Grothendieck ring R_k(G)_Q of finitely generated G-modules. We also give a similar formula for the singular cohomology of compact complex spaces. The focus is on the case where C is nodal. Using the Chevalley--Weil formula, we compute the G-invariant part of the global sections of the pluricanonical bundle ω_C^{\otimes m}. In turn, we use the formula for m=2 to compute the equivariant deformation space of a stable G-curve C. We also obtain numerical criteria for the presence of any given irreducible representation in the space of global sections of ω_C\otimes F, where F is an ample locally free G-sheaf on C. Some new phenomena, pathological compared to the smooth curve case, are discussed.

Preprint · 2026 · arXiv

PrivGemo: Privacy-Preserving Dual-Tower Graph Retrieval for Empowering LLM Reasoning with Memory Augmentation

Knowledge graphs (KGs) provide structured evidence that can ground large language model (LLM) reasoning for knowledge-intensive question answering. However, many practical KGs are private, and sending retrieved triples or exploration traces to closed-source LLM APIs introduces leakage risk. Existing privacy treatments focus on masking entity names, but they still face four limitations: structural leakage under semantic masking, uncontrollable remote interaction, fragile multi-hop and multi-entity reasoning, and limited experience reuse for stability and efficiency. To address these issues, we propose PrivGemo, a privacy-preserving retrieval-augmented framework for KG-grounded reasoning with memory-guided exposure control. PrivGemo uses a dual-tower design to keep raw KG knowledge local while enabling remote reasoning over an anonymized view that goes beyond name masking to limit both semantic and structural exposure. PrivGemo supports multi-hop, multi-entity reasoning by retrieving anonymized long-hop paths that connect all topic entities, while keeping grounding and verification on the local KG. A hierarchical controller and a privacy-aware experience memory further reduce unnecessary exploration and remote interactions. Comprehensive experiments on six benchmarks show that PrivGemo achieves overall state-of-the-art results, outperforming the strongest baseline by up to 17.1%. Furthermore, PrivGemo enables smaller models (e.g., Qwen3-4B) to achieve reasoning performance comparable to that of GPT-4-Turbo.
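To make the "structural leakage under semantic masking" limitation concrete, the sketch below shows plain entity-name masking, the kind of existing treatment the abstract contrasts PrivGemo with. It is an illustrative assumption rather than PrivGemo's method: the function `mask_triples`, the alias scheme, and the example triples are all hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def mask_triples(triples: List[Triple]) -> Tuple[List[Triple], Dict[str, str]]:
    """Replace entity names with opaque aliases; the mapping is meant to stay local."""
    mapping: Dict[str, str] = {}

    def alias(name: str) -> str:
        # Reuse the same alias for repeated entities so paths stay connected.
        if name not in mapping:
            mapping[name] = f"E{len(mapping)}"
        return mapping[name]

    masked = [(alias(h), r, alias(t)) for h, r, t in triples]
    return masked, mapping


# Hypothetical private triples (not taken from any benchmark).
private_triples = [
    ("Alice", "works_at", "AcmeCorp"),
    ("AcmeCorp", "located_in", "Berlin"),
]

masked_triples, local_mapping = mask_triples(private_triples)
print(masked_triples)
```

The masked view hides the names, but the relation labels and the shape of the graph (which node connects to which) are unchanged, which is the residual exposure the abstract says name masking alone does not address.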

Preprint · 2026 · arXiv

Unsupervised dense random survival forests identify interpretable patient profiles with heterogeneous treatment benefit

Precision oncology aims to prescribe the optimal cancer treatment to the right patients, maximizing therapeutic benefits. However, identifying patient subgroups that may benefit more from experimental cancer treatments based on randomized clinical trials presents a significant analytical challenge. To address this, we introduce a novel unsupervised machine learning approach based on very dense random survival forests (up to 100,000 trees), equipped with a new splitting rule that explicitly targets treatment-effect heterogeneity. This method is robust, interpretable, and effectively identifies responsive subgroups. Extensive simulations confirm its ability to detect heterogeneous patient responses and distinguish between datasets with and without heterogeneity, while maintaining a stringent Type I error rate of 1%. We further validate its performance using Phase III randomized clinical trial datasets, demonstrating significant patient heterogeneity in treatment response based on baseline characteristics.

Preprint · 2025 · arXiv

Quantitative Morphology of Galactic Cirrus in Deep Optical Imaging

Imaging of optical Galactic cirrus, the spatially resolved form of diffuse Galactic light, provides important insights into the properties of the diffuse interstellar medium (ISM) in the Milky Way. While previous investigations have focused mainly on the intensity characteristics of optical cirrus, their morphological properties remain largely unexplored. In this study, we employ several complementary statistical approaches (local intensity statistics, angular power spectrum / $\Delta$-variance analysis, and wavelet scattering transform analysis) to characterize the morphology of cirrus in deep optical imaging data. We place our investigation of optical cirrus into a multi-wavelength context by comparing the morphology of cirrus seen with the Dragonfly Telephoto Array to that seen with space-based facilities working at longer wavelengths (Herschel 250 $\mu$m, WISE 12 $\mu$m, and Planck radiance), as well as with structures seen in the DHIGLS HI column density map. Our statistical methods quantify the similarities and the differences of cirrus morphology in all these datasets. The morphology of cirrus at visible wavelengths resembles that of far-infrared cirrus more closely than that of mid-infrared cirrus; on small scales, anisotropies in the cosmic infrared background and systematics may lead to differences. Across all dust tracers, cirrus morphology can be well described by a power spectrum with a common power-law index $\gamma \sim -2.9$. We demonstrate quantitatively that optical cirrus exhibits filamentary, coherent structures across a broad range of angular scales. Our results offer promising avenues for linking the analysis of coherent structures in optical cirrus to the underlying physical processes in the ISM that shape them. Furthermore, we demonstrate that these morphological signatures can be leveraged to distinguish and disentangle cirrus from extragalactic light.
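A power-law index like the one quoted above is commonly estimated as the slope of a straight-line fit in log-log space. The sketch below shows that step only, on a synthetic spectrum constructed with an index of exactly -2.9; the function name `fit_power_law_index`, the wavenumber grid, and the amplitude are assumptions for illustration and are not values or code from the paper.

```python
import math
from typing import Sequence


def fit_power_law_index(k: Sequence[float], power: Sequence[float]) -> float:
    """Ordinary least-squares slope of log10(power) against log10(k)."""
    if len(k) != len(power) or len(k) < 2:
        raise ValueError("need at least two matching (k, power) samples")
    x = [math.log10(v) for v in k]
    y = [math.log10(v) for v in power]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Synthetic, noise-free spectrum P(k) = A * k**gamma with gamma = -2.9
# (illustrative numbers, not measurements).
wavenumbers = [0.01 * 2 ** i for i in range(8)]
spectrum = [5.0 * kv ** -2.9 for kv in wavenumbers]

gamma = fit_power_law_index(wavenumbers, spectrum)
print(round(gamma, 3))
```

On real images the spectrum would first have to be measured (for example by azimuthally averaging the squared Fourier amplitudes) and the fit restricted to scales not dominated by noise or the point-spread function; those steps are outside this sketch.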
