Source author record

Xiaolong Liu

Xiaolong Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

cond-mat.mtrl-sci · Computer Vision · cond-mat.mes-hall · Artificial Intelligence · eess.IV · hep-th · Machine Learning · math-ph · math.AG · math.MP · physics.app-ph · physics.ins-det

Catalog footprint

What is connected

16 works

12 topics

4 close collaborators

preprint · 2026 · arXiv

Donaldson-Thomas invariants of $[\mathbb C^4/\mathbb Z_r]$

We compute the zero-dimensional Donaldson-Thomas invariants of the quotient stack $[\mathbb{C}^4/\mathbb{Z}_r]$, confirming a conjecture of Cao-Kool-Monavari. Our main theorem is established through an orbifold analogue of Cao-Zhao-Zhou's degeneration formula combined with the zero-dimensional Donaldson-Thomas invariants for $\mathcal{A}_{r-1}\times\mathbb{C}^2$ and an explicit determination of orientations of Hilbert schemes of points on $[\mathbb{C}^4/\mathbb{Z}_r]$.

preprint · 2026 · arXiv

The Perceptual Bandwidth Bottleneck in Vision-Language Models: Active Visual Reasoning via Sequential Experimental Design

Visual perception in modern Vision-Language Models (VLMs) is constrained by a perceptual bandwidth bottleneck: a broad field of view preserves global context but sacrifices the fine-grained details required for complex reasoning. We argue that high-resolution visual reasoning is therefore not only semantic reasoning but also task-relevant evidence acquisition under limited perceptual bandwidth. Inspired by active vision and information foraging, we formalise this process as sequential Bayesian optimal experimental design (S-BOED), where an agent decides which visual evidence to acquire before answering. Since exact Bayesian inference is intractable in continuous gigapixel spaces, we derive a tractable coverage--resolution objective as a proxy for task-relevant information gain. We instantiate this framework with FOVEA, a training-free procedure that refines VLM crop proposals through evidence-oriented probing. Experiments on high-resolution benchmarks show consistent gains over direct and ReAct-style baselines, with particularly strong improvements in search-dominated remote-sensing settings.

preprint · 2022 · arXiv

An Empirical Study of End-to-End Temporal Action Detection

Temporal action detection (TAD) is an important yet challenging task in video understanding. It aims to simultaneously predict the semantic label and the temporal interval of every action instance in an untrimmed video. Rather than end-to-end learning, most existing methods adopt a head-only learning paradigm, where the video encoder is pre-trained for action classification and only the detection head on top of the encoder is optimized for TAD. The effect of end-to-end learning has not been systematically evaluated, and an in-depth study of the efficiency-accuracy trade-off in end-to-end TAD is lacking. In this paper, we present an empirical study of end-to-end temporal action detection. We validate the advantage of end-to-end learning over head-only learning and observe up to 11% performance improvement. We also study the effects of multiple design choices that affect TAD performance and speed, including the detection head, video encoder, and resolution of input videos. Based on the findings, we build a mid-resolution baseline detector, which achieves state-of-the-art performance among end-to-end methods while running more than 4× faster. We hope that this paper can serve as a guide for end-to-end learning and inspire future research in this field. Code and models are available at https://github.com/xlliu7/E2E-TAD.

preprint · 2022 · arXiv

End-to-end Temporal Action Detection with Transformer

Temporal action detection (TAD) aims to determine the semantic label and the temporal interval of every action instance in an untrimmed video. It is a fundamental and challenging task in video understanding. Previous methods tackle this task with complicated pipelines. They often need to train multiple networks and involve hand-designed operations, such as non-maximal suppression and anchor generation, which limit the flexibility and prevent end-to-end learning. In this paper, we propose an end-to-end Transformer-based method for TAD, termed TadTR. Given a small set of learnable embeddings called action queries, TadTR adaptively extracts temporal context information from the video for each query and directly predicts action instances with the context. To adapt Transformer to TAD, we propose three improvements to enhance its locality awareness. The core is a temporal deformable attention module that selectively attends to a sparse set of key snippets in a video. A segment refinement mechanism and an actionness regression head are designed to refine the boundaries and confidence of the predicted instances, respectively. With such a simple pipeline, TadTR requires lower computation cost than previous detectors, while preserving remarkable performance. As a self-contained detector, it achieves state-of-the-art performance on THUMOS14 (56.7% mAP) and HACS Segments (32.09% mAP). Combined with an extra action classifier, it obtains 36.75% mAP on ActivityNet-1.3. Code is available at https://github.com/xlliu7/TadTR.
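The mAP figures quoted for temporal action detection are conventionally computed by matching predicted segments to ground-truth segments using temporal intersection-over-union (tIoU). The following is a minimal Python sketch of that overlap measure only; the function name `temporal_iou` and the `(start, end)` tuple layout are illustrative assumptions and are not taken from the TadTR code base.

```python
def temporal_iou(seg_a, seg_b):
    """Temporal IoU of two segments given as (start, end) in seconds.

    Hypothetical helper for illustration; not part of any cited repository.
    """
    start_a, end_a = seg_a
    start_b, end_b = seg_b
    # Length of the overlap, clamped at zero for disjoint segments.
    inter = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    # Union is the sum of both lengths minus the shared part.
    union = (end_a - start_a) + (end_b - start_b) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 10-second segments sharing 5 seconds.
    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
```

A prediction is typically counted as correct when its tIoU with a ground-truth instance of the same class exceeds a chosen threshold; the thresholds vary by benchmark and are not specified in the abstract above.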

preprint · 2022 · arXiv

Multi-modal Emotion Estimation for in-the-wild Videos

In this paper, we briefly introduce our submission to the Valence-Arousal Estimation Challenge of the 3rd Affective Behavior Analysis in-the-wild (ABAW) competition. Our method uses multi-modal information, i.e., visual and audio information, and employs a temporal encoder to model the temporal context in the videos. In addition, a smoothing processor is applied to obtain more reasonable predictions, and a model ensemble strategy is used to improve performance. The experimental results show that our method achieves a concordance correlation coefficient (CCC) of 65.55% for valence and 70.88% for arousal on the validation set of the Aff-Wild2 dataset, which demonstrates its effectiveness.
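The valence and arousal scores above are reported as CCC, the concordance correlation coefficient. As a rough illustration of what that metric measures, here is a minimal Python sketch of the standard definition; the function name `concordance_cc` is an assumption for this example, and the use of population (not sample) variance is a choice made here, not something the abstract specifies.

```python
import numpy as np


def concordance_cc(pred, target):
    """Concordance correlation coefficient between two 1-D sequences.

    Illustrative helper; uses population variance and covariance (ddof=0).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mean_p, mean_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = ((pred - mean_p) * (target - mean_t)).mean()
    # CCC penalises both low correlation and a shift in mean or scale.
    return 2.0 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)


if __name__ == "__main__":
    # Perfectly correlated but offset by 1: CCC is below 1.
    print(concordance_cc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

Unlike Pearson correlation, CCC drops when predictions are biased or mis-scaled, which is why it is a common choice for continuous emotion estimation.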

preprint · 2022 · arXiv

Multi-Task Learning Framework for Emotion Recognition in-the-wild

This paper presents our system for the Multi-Task Learning (MTL) Challenge in the 4th Affective Behavior Analysis in-the-wild (ABAW) competition. We explore the research problems of this challenge from three aspects: 1) To obtain efficient and robust visual feature representations, we propose MAE-based unsupervised representation learning and IResNet/DenseNet-based supervised representation learning methods; 2) Considering the importance of temporal information in videos, we explore three types of sequential encoders to capture it: a transformer-based encoder, an LSTM-based encoder, and a GRU-based encoder; 3) To model the correlation between the different tasks (i.e., valence, arousal, expression, and AU) for multi-task affective analysis, we first explore the dependency between these tasks and then propose three multi-task learning frameworks to model the correlations effectively. Our system achieves a score of 1.7607 on the validation dataset and 1.4361 on the test dataset, ranking first in the MTL Challenge. The code is available at https://github.com/AIM3-RUC/ABAW4.

preprint · 2020 · arXiv

CASNet: Common Attribute Support Network for image instance and panoptic segmentation

Instance segmentation and panoptic segmentation have received increasing attention in recent years. Compared with bounding-box-based object detection and semantic segmentation, instance segmentation provides more detailed, pixel-level results. Based on the insight that pixels belonging to one instance share one or more common attributes of that instance, we propose a one-stage instance segmentation network named Common Attribute Support Network (CASNet), which performs instance segmentation by predicting and clustering common attributes. CASNet is fully convolutional and can be trained and run end to end. CASNet also predicts instances without overlaps or holes, a problem that exists in most current instance segmentation algorithms. Furthermore, it can be easily extended to panoptic segmentation through minor modifications with little computational overhead. CASNet builds a bridge between semantic and instance segmentation, moving from finding a pixel's class ID to obtaining both class and instance IDs through operations on common attributes. In experiments on instance and panoptic segmentation, CASNet achieves 32.8% mAP and 59.0% PQ on the Cityscapes validation set with joint training, and 36.3% mAP and 66.1% PQ with separate training. For panoptic segmentation, CASNet achieves state-of-the-art performance on the Cityscapes validation set.
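PQ in the results above refers to panoptic quality, the usual metric for panoptic segmentation. The sketch below shows the standard formula in Python, assuming segment matching has already been done (a predicted and a ground-truth segment are matched when their IoU exceeds 0.5). The function name and argument layout are hypothetical and are not taken from CASNet.

```python
def panoptic_quality(matched_ious, num_false_pos, num_false_neg):
    """Panoptic quality for one class.

    matched_ious: IoU values of matched (true positive) segment pairs.
    num_false_pos: predicted segments with no match.
    num_false_neg: ground-truth segments with no match.
    Illustrative helper only; matching is assumed to be done upstream.
    """
    num_true_pos = len(matched_ious)
    denom = num_true_pos + 0.5 * num_false_pos + 0.5 * num_false_neg
    if denom == 0:
        return 0.0
    # Sum of matched IoUs rewards segmentation quality; the denominator
    # penalises unmatched predictions and missed ground-truth segments.
    return sum(matched_ious) / denom


if __name__ == "__main__":
    print(panoptic_quality([0.8, 0.6], num_false_pos=1, num_false_neg=1))
```

Benchmark PQ is normally averaged over classes; that averaging step is left out of this sketch.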

preprint · 2020 · arXiv

Nanoscale probing of image-potential states and electron transfer doping in borophene polymorphs

Using field-emission resonance spectroscopy with an ultrahigh vacuum scanning tunneling microscope, we reveal Stark-shifted image-potential states of the v_1/6 and v_1/5 borophene polymorphs on Ag(111) with long lifetimes, suggesting high borophene lattice and interface quality. These image-potential states allow the local work function and interfacial charge transfer of borophene to be probed at the nanoscale and test the widely employed self-doping model of borophene. Supported by apparent barrier height measurements and density functional theory calculations, electron transfer doping occurs for both borophene phases from the Ag(111) substrate. In contradiction with the self-doping model, a higher electron transfer doping level occurs for denser v_1/6 borophene compared to v_1/5 borophene, thus revealing the importance of substrate effects on borophene electron transfer.

preprint · 2016 · arXiv

Point Defects and Grain Boundaries in Rotationally Commensurate MoS2 on Epitaxial Graphene

With reduced degrees of freedom, structural defects are expected to play a greater role in two-dimensional materials in comparison to their bulk counterparts. In particular, mechanical strength, electronic properties, and chemical reactivity are strongly affected by crystal imperfections in the atomically thin limit. Here, ultra-high vacuum (UHV) scanning tunneling microscopy (STM) and spectroscopy (STS) are employed to interrogate point and line defects in monolayer MoS2 grown on epitaxial graphene (EG) at the atomic scale. Five types of point defects are observed with the majority species showing apparent structures that are consistent with vacancy and interstitial models. The total defect density is observed to be lower than MoS2 grown on other substrates, and is likely attributed to the van der Waals epitaxy of MoS2 on EG. Grain boundaries (GBs) with 30° and 60° tilt angles resulting from the rotational commensurability of MoS2 on EG are more easily resolved by STM than atomic force microscopy at similar scales due to the enhanced contrast from their distinct electronic states. For example, band gap reduction to ~0.8 eV and ~0.5 eV is observed with STS for 30° and 60° GBs, respectively. In addition, atomic resolution STM images of these GBs are found to agree well with proposed structure models. This work offers quantitative insight into the structure and properties of common defects in MoS2, and suggests pathways for tailoring the performance of MoS2/graphene heterostructures via defect engineering.

preprint · 2016 · arXiv

Rotationally Commensurate Growth of MoS2 on Epitaxial Graphene

Atomically thin MoS2/graphene heterostructures are promising candidates for nanoelectronic and optoelectronic technologies. Among different graphene substrates, epitaxial graphene (EG) on SiC provides several potential advantages for such heterostructures including high electronic quality, tunable substrate coupling, wafer-scale processability, and crystalline ordering that can template commensurate growth. Exploiting these attributes, we demonstrate here the thickness-controlled van der Waals epitaxial growth of MoS2 on EG via chemical vapor deposition, giving rise to transfer-free synthesis of a two-dimensional heterostructure with registry between its constituent materials. The rotational commensurability observed between the MoS2 and EG is driven by the energetically favorable alignment of their respective lattices and results in nearly strain-free MoS2, as evidenced by synchrotron X-ray scattering and atomic-resolution scanning tunneling microscopy (STM). The electronic nature of the MoS2/EG heterostructure is elucidated with STM and scanning tunneling spectroscopy, which reveals bias-dependent apparent thickness, band bending, and a reduced bandgap of ~0.4 eV at the monolayer MoS2 edges.

preprint · 2016 · arXiv

Stable Aqueous Dispersions of Optically and Electronically Active Phosphorene

Understanding and exploiting the remarkable optical and electronic properties of phosphorene require mass production methods that avoid chemical degradation. While solution-based strategies have been developed for scalable exfoliation of black phosphorus, these techniques have thus far employed anhydrous organic solvents in an effort to minimize exposure to known oxidants, but at the cost of limited exfoliation yield and flake size distribution. Here, we present an alternative phosphorene production method based on surfactant-assisted exfoliation and post-processing of black phosphorus in deoxygenated water. From comprehensive microscopic and spectroscopic analysis, this approach is shown to yield phosphorene dispersions that are stable, highly concentrated, and comparable to micromechanically exfoliated phosphorene in structure and chemistry. Due to the high exfoliation efficiency of this process, the resulting phosphorene flakes are thinner than anhydrous organic solvent dispersions, thus allowing the observation of layer-dependent photoluminescence down to the monolayer limit. Furthermore, to demonstrate preservation of electronic properties following solution processing, the aqueous-exfoliated phosphorene flakes are employed in field-effect transistors with high drive currents and current modulation ratios. Overall, this method enables the isolation and mass production of few-layer phosphorene, which will accelerate ongoing efforts to realize a diverse range of phosphorene-based applications.

preprint · 2015 · arXiv

In Situ Thermal Decomposition of Exfoliated Two-Dimensional Black Phosphorus

With a semiconducting band gap and high charge carrier mobility, two-dimensional (2D) black phosphorus (BP), often referred to as phosphorene, holds significant promise for next-generation electronics and optoelectronics. However, as a 2D material, it possesses a higher surface-area-to-volume ratio than bulk BP, suggesting that its chemical and thermal stability will be modified. Herein, an atomic-scale microscopic and spectroscopic study is performed to characterize the thermal degradation of mechanically exfoliated 2D BP. From in situ scanning/transmission electron microscopy, decomposition of 2D BP is observed to occur at ~400 °C in vacuum, in contrast to the 550 °C bulk BP sublimation temperature. This decomposition initiates via eye-shaped cracks along the [001] direction and then continues until only a thin, amorphous, red-phosphorus-like skeleton remains. In situ electron energy loss spectroscopy, energy-dispersive X-ray spectroscopy, and energy-loss near-edge structure changes provide quantitative insight into this chemical transformation process.

preprint · 2015 · arXiv

Planar carbon nanotube-graphene hybrid films for high-performance broadband photodetectors

Graphene has emerged as a promising material for photonic applications fuelled by its superior electronic and optical properties. However, the photoresponsivity is limited by the low absorption cross section and ultrafast recombination rates of photoexcited carriers. Here we demonstrate a photoconductive gain of ~10^5 electrons per photon in a carbon nanotube-graphene one-dimensional/two-dimensional hybrid due to efficient photocarrier generation and transport within the nanostructure. A broadband photodetector (covering 400 nm to 1550 nm) based on such hybrid films is fabricated with a high photoresponsivity of more than 100 A/W and a fast response time of approximately 100 μs. The combination of ultra-broad bandwidth, high responsivities and fast operating speeds affords new opportunities for facile and scalable fabrication of all-carbon optoelectronic devices.

preprint · 2015 · arXiv

Solvent Exfoliation of Electronic-Grade, Two-Dimensional Black Phosphorus

Solution dispersions of two-dimensional (2D) black phosphorus (BP), often referred to as phosphorene, are achieved by solvent exfoliation. These pristine, electronic-grade BP dispersions are produced with anhydrous, organic solvents in a sealed tip ultrasonication system, which circumvents BP degradation that would otherwise occur via solvated oxygen or water. Among conventional solvents, n-methyl-pyrrolidone (NMP) is found to provide stable, highly concentrated (~0.4 mg/mL) BP dispersions. Atomic force microscopy, scanning electron microscopy, transmission electron microscopy, Raman spectroscopy, and X-ray photoelectron spectroscopy show that the structure and chemistry of solvent-exfoliated BP nanosheets are comparable to mechanically exfoliated BP flakes. Additionally, residual NMP from the liquid-phase processing suppresses the rate of BP oxidation in ambient conditions. Solvent-exfoliated BP nanosheet field-effect transistors (FETs) exhibit ambipolar behavior with current on/off ratios and mobilities up to ~10000 and ~50 cm^2/(V*s), respectively. Overall, this study shows that stable, highly concentrated, electronic-grade 2D BP dispersions can be realized by scalable solvent exfoliation, thereby presenting opportunities for large-area, high-performance BP device applications.

preprint · 2015 · arXiv

The progress of neutron texture diffractometer at China Advanced Research Reactor

The first neutron texture diffractometer in China has been built at the China Advanced Research Reactor (CARR) in response to strong demand from the domestic user community for texture measurement with neutrons. This neutron texture diffractometer has high neutron intensity and moderate resolution, and is mainly applied to study texture in commonly used industrial materials and engineering components. In this paper, the design and characteristics of this instrument are described. The results of calibration with neutrons and quantitative texture analysis of a Zr alloy plate are presented. A comparison of texture measurements among HIPPO at LANSCE, Kowari at ANSTO, and the neutron texture diffractometer at CARR illustrates the reliable performance of this texture diffractometer.

preprint · 2014 · arXiv

Effective Passivation of Exfoliated Black Phosphorus Transistors against Ambient Degradation

Unencapsulated, exfoliated black phosphorus (BP) flakes are found to chemically degrade upon exposure to ambient conditions. Atomic force microscopy, electrostatic force microscopy, transmission electron microscopy, X-ray photoelectron spectroscopy, and Fourier transform infrared spectroscopy are employed to characterize the structure and chemistry of the degradation process, suggesting that O2-saturated H2O irreversibly reacts with BP to form oxidized phosphorus species. This interpretation is further supported by the observation that BP degradation occurs more rapidly on hydrophobic octadecyltrichlorosilane self-assembled monolayers and on H-Si(111) than on hydrophilic SiO2. For unencapsulated BP field-effect transistors (FETs), the ambient degradation causes large increases in threshold voltage after 6 hours in ambient, followed by a ~10^3 decrease in FET current on/off ratio and mobility after 48 hours. Atomic layer deposited AlOx overlayers effectively suppress ambient degradation, allowing encapsulated BP FETs to maintain high on/off ratios of ~10^3 and mobilities of ~100 cm^2/(V*s) for over two weeks in ambient. This work shows that the ambient degradation of BP can be managed effectively when the flakes are sufficiently passivated. In turn, our strategy for enhancing BP environmental stability will accelerate efforts to implement BP in electronic and optoelectronic applications.
