Source author record

Liang Xiao

Liang Xiao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.NT, Computer Vision, math.AG, eess.IV, eess.SP, Machine Learning, Robotics, Cryptography and Security, Information Theory, math.IT

Catalog footprint

What is connected

22 works

10 topics

4 close collaborators

preprint, 2026, arXiv

A Vision-Language-Action Model with Visual Prompt for OFF-Road Autonomous Driving

Efficient trajectory planning in off-road terrains presents a formidable challenge for autonomous vehicles, often necessitating complex multi-step pipelines. However, traditional approaches exhibit limited adaptability in dynamic environments. To address these limitations, this paper proposes OFF-EMMA, a novel end-to-end multimodal framework designed to overcome the deficiencies of insufficient spatial perception and unstable reasoning in visual-language-action (VLA) models for off-road autonomous driving scenarios. The framework explicitly annotates input images through the design of a visual prompt block and introduces a chain-of-thought with self-consistency (COT-SC) reasoning strategy to enhance the accuracy and robustness of trajectory planning. The visual prompt block utilizes semantic segmentation masks as visual prompts, enhancing the spatial understanding ability of pre-trained visual-language models for complex terrains. The COT-SC strategy effectively mitigates the error impact of outliers on planning performance through a multi-path reasoning mechanism. Experimental results on the RELLIS-3D off-road dataset demonstrate that OFF-EMMA significantly outperforms existing methods, reducing the average L2 error of the Qwen backbone model by 13.3% and decreasing the failure rate from 16.52% to 6.56%.
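The self-consistency idea in this abstract can be illustrated with a minimal sketch. It assumes each reasoning path yields a candidate trajectory as a list of (x, y) waypoints, and that candidates are combined with a coordinate-wise median so that a single outlier path has limited influence. The median rule and the name `aggregate_trajectories` are assumptions for illustration, not the aggregation described in the paper.

```python
from statistics import median


def aggregate_trajectories(candidates):
    """Combine candidate trajectories by a coordinate-wise median.

    candidates: list of trajectories, each a list of (x, y) waypoints.
    All trajectories must have the same number of waypoints.
    Hypothetical helper: the median rule is an assumption, not the
    paper's stated aggregation.
    """
    if not candidates:
        raise ValueError("at least one candidate trajectory is required")
    n_points = len(candidates[0])
    if any(len(c) != n_points for c in candidates):
        raise ValueError("all candidates must have the same length")
    return [
        (
            median(c[i][0] for c in candidates),  # median x at waypoint i
            median(c[i][1] for c in candidates),  # median y at waypoint i
        )
        for i in range(n_points)
    ]


# Two similar reasoning paths and one outlier path (made-up values).
paths = [
    [(1.0, 0.0), (2.0, 0.0)],
    [(1.2, 0.1), (2.2, 0.1)],
    [(9.0, 5.0), (9.0, 5.0)],
]
plan = aggregate_trajectories(paths)
```

With an odd number of candidates the median picks an actual sampled coordinate at each waypoint, so the outlier path in the example does not pull the result toward it.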

preprint, 2026, arXiv

Autonomous Driving in Unstructured Environments: How Far Have We Come?

Research on autonomous driving in unstructured outdoor environments is less advanced than in structured urban settings due to challenges like environmental diversity and scene complexity. These environments, such as rural areas and rugged terrains, pose unique obstacles that are not common in structured urban areas. Despite these difficulties, autonomous driving in unstructured outdoor environments is crucial for applications in agriculture, mining, and military operations. Our survey reviews over 250 papers for autonomous driving in unstructured outdoor environments, covering offline mapping, pose estimation, environmental perception, path planning, end-to-end autonomous driving, datasets, and relevant challenges. We also discuss emerging trends and future research directions. This review aims to consolidate knowledge and encourage further research for autonomous driving in unstructured environments. To support ongoing work, we maintain an active repository with up-to-date literature and open-source projects at: https://github.com/chaytonmin/Survey-Autonomous-Driving-in-Unstructured-Environments.

preprint, 2026, arXiv

GenDet: Painting Colored Bounding Boxes on Images via Diffusion Model for Object Detection

This paper presents GenDet, a novel framework that redefines object detection as an image generation task. In contrast to traditional approaches, GenDet adopts a pioneering approach by leveraging generative modeling: it conditions on the input image and directly generates bounding boxes with semantic annotations in the original image space. GenDet establishes a conditional generation architecture built upon the large-scale pre-trained Stable Diffusion model, formulating the detection task as semantic constraints within the latent space. It enables precise control over bounding box positions and category attributes, while preserving the flexibility of the generative model. This novel methodology effectively bridges the gap between generative models and discriminative tasks, providing a fresh perspective for constructing unified visual understanding systems. Systematic experiments demonstrate that GenDet achieves competitive accuracy compared to discriminative detectors, while retaining the flexibility characteristic of generative methods.

preprint, 2025, arXiv

Balanced Hierarchical Contrastive Learning with Decoupled Queries for Fine-grained Object Detection in Remote Sensing Images

Fine-grained remote sensing datasets often use hierarchical label structures to differentiate objects in a coarse-to-fine manner, with each object annotated across multiple levels. However, embedding this semantic hierarchy into the representation learning space to improve fine-grained detection performance remains challenging. Previous studies have applied supervised contrastive learning at different hierarchical levels to group objects under the same parent class while distinguishing sibling subcategories. Nevertheless, they overlook two critical issues: (1) imbalanced data distribution across the label hierarchy causes high-frequency classes to dominate the learning process, and (2) learning semantic relationships among categories interferes with class-agnostic localization. To address these issues, we propose a balanced hierarchical contrastive loss combined with a decoupled learning strategy within the detection transformer (DETR) framework. The proposed loss introduces learnable class prototypes and equilibrates gradients contributed by different classes at each hierarchical level, ensuring that each hierarchical class contributes equally to the loss computation in every mini-batch. The decoupled strategy separates DETR's object queries into classification and localization sets, enabling task-specific feature extraction and optimization. Experiments on three fine-grained datasets with hierarchical annotations demonstrate that our method outperforms state-of-the-art approaches.

preprint, 2022, arXiv

ORFD: A Dataset and Benchmark for Off-Road Freespace Detection

Freespace detection is an essential component of autonomous driving technology and plays an important role in trajectory planning. In the last decade, deep learning-based free space detection methods have been proved feasible. However, these efforts were focused on urban road environments and few deep learning-based methods were specifically designed for off-road free space detection due to the lack of off-road benchmarks. In this paper, we present the ORFD dataset, which, to our knowledge, is the first off-road free space detection dataset. The dataset was collected in different scenes (woodland, farmland, grassland, and countryside), different weather conditions (sunny, rainy, foggy, and snowy), and different light conditions (bright light, daylight, twilight, darkness), and contains a total of 12,198 LiDAR point cloud and RGB image pairs with the traversable area, non-traversable area and unreachable area annotated in detail. We propose a novel network named OFF-Net, which unifies Transformer architecture to aggregate local and global information, to meet the requirement of large receptive fields for free space detection tasks. We also propose the cross-attention to dynamically fuse LiDAR and RGB image information for accurate off-road free space detection. Dataset and code are publicly available at https://github.com/chaytonmin/OFF-Net.

preprint, 2021, arXiv

BDANet: Multiscale Convolutional Neural Network with Cross-directional Attention for Building Damage Assessment from Satellite Images

Fast and effective responses are required when a natural disaster (e.g., earthquake, hurricane, etc.) strikes. Building damage assessment from satellite imagery is critical before relief effort is deployed. With a pair of pre- and post-disaster satellite images, building damage assessment aims at predicting the extent of damage to buildings. With the powerful ability of feature representation, deep neural networks have been successfully applied to building damage assessment. Most existing works simply concatenate pre- and post-disaster images as input of a deep neural network without considering their correlations. In this paper, we propose a novel two-stage convolutional neural network for Building Damage Assessment, called BDANet. In the first stage, a U-Net is used to extract the locations of buildings. Then the network weights from the first stage are shared in the second stage for building damage assessment. In the second stage, a two-branch multi-scale U-Net is employed as backbone, where pre- and post-disaster images are fed into the network separately. A cross-directional attention module is proposed to explore the correlations between pre- and post-disaster images. Moreover, CutMix data augmentation is exploited to tackle the challenge of difficult classes. The proposed method achieves state-of-the-art performance on a large-scale dataset -- xBD. The code is available at https://github.com/ShaneShen/BDANet-Building-Damage-Assessment.

preprint, 2021, arXiv

Model Inspired Autoencoder for Unsupervised Hyperspectral Image Super-Resolution

This paper focuses on hyperspectral image (HSI) super-resolution that aims to fuse a low-spatial-resolution HSI and a high-spatial-resolution multispectral image to form a high-spatial-resolution HSI (HR-HSI). Existing deep learning-based approaches are mostly supervised and rely on a large number of labeled training samples, which is unrealistic. The commonly used model-based approaches are unsupervised and flexible but rely on hand-crafted priors. Inspired by the specific properties of the model, we make the first attempt to design a model-inspired deep network for HSI super-resolution in an unsupervised manner. This approach consists of an implicit autoencoder network built on the target HR-HSI that treats each pixel as an individual sample. The nonnegative matrix factorization (NMF) of the target HR-HSI is integrated into the autoencoder network, where the two NMF parts, spectral and spatial matrices, are treated as decoder parameters and hidden outputs respectively. In the encoding stage, we present a pixel-wise fusion model to estimate hidden outputs directly, and then reformulate and unfold the model's algorithm to form the encoder network. With the specific architecture, the proposed network is similar to a manifold prior-based model, and can be trained patch by patch rather than the entire image. Moreover, we propose an additional unsupervised network to estimate the point spread function and spectral response function. Experimental results conducted on both synthetic and real datasets demonstrate the effectiveness of the proposed approach.

preprint, 2020, arXiv

Drosophila-Inspired 3D Moving Object Detection Based on Point Clouds

3D moving object detection is one of the most critical tasks in dynamic scene analysis. In this paper, we propose a novel Drosophila-inspired 3D moving object detection method using Lidar sensors. According to the theory of elementary motion detector, we have developed a motion detector based on the shallow visual neural pathway of Drosophila. This detector is sensitive to the movement of objects and can well suppress background noise. Designing neural circuits with different connection modes, the approach searches for motion areas in a coarse-to-fine fashion and extracts point clouds of each motion area to form moving object proposals. An improved 3D object detection network is then used to estimate the point clouds of each proposal and efficiently generates the 3D bounding boxes and the object categories. We evaluate the proposed approach on the widely-used KITTI benchmark, and state-of-the-art performance was obtained by using the proposed approach on the task of motion detection.

preprint, 2020, arXiv

Eliminating NB-IoT Interference to LTE System: a Sparse Machine Learning Based Approach

Narrowband internet-of-things (NB-IoT) is a competitive 5G technology for massive machine-type communication scenarios, but meanwhile introduces narrowband interference (NBI) to existing broadband transmission such as the long term evolution (LTE) systems in enhanced mobile broadband (eMBB) scenarios. In order to facilitate the harmonic and fair coexistence in wireless heterogeneous networks, it is important to eliminate NB-IoT interference to LTE systems. In this paper, a novel sparse machine learning based framework and a sparse combinatorial optimization problem are formulated for accurate NBI recovery, which can be efficiently solved using the proposed iterative sparse learning algorithm called sparse cross-entropy minimization (SCEM). To further improve the recovery accuracy and convergence rate, regularization is introduced to the loss function in the enhanced algorithm called regularized SCEM. Moreover, exploiting the spatial correlation of NBI, the framework is extended to multiple-input multiple-output systems. Simulation results demonstrate that the proposed methods are effective in eliminating NB-IoT interference to LTE systems, and significantly outperform the state-of-the-art methods.

preprint, 2020, arXiv

Fast Reinforcement Learning for Anti-jamming Communications

This letter presents a fast reinforcement learning algorithm for anti-jamming communications which chooses the previous action with probability $τ$ and applies $ε$-greedy with probability $(1-τ)$. A dynamic threshold based on the average value of several previous actions is designed and probability $τ$ is formulated as a Gaussian-like function to guide the wireless devices. As a concrete example, the proposed algorithm is implemented in a wireless communication system against multiple jammers. Experimental results demonstrate that the proposed algorithm exceeds Q-learning, deep Q-networks (DQN), double DQN (DDQN), and prioritized experience replay based DDQN (PDDQN), in terms of signal-to-interference-plus-noise ratio and convergence rate.
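The action-selection rule stated in this abstract can be sketched as follows. The Gaussian-like form of τ and the dynamic threshold are not given in the abstract, so `tau` and `epsilon` are plain parameters here, and `select_action` and `q_values` are hypothetical names used only for illustration.

```python
import random


def select_action(q_values, prev_action, tau, epsilon, rng=random):
    """Return an action index.

    With probability tau, repeat prev_action. Otherwise apply
    epsilon-greedy over q_values: explore uniformly with probability
    epsilon, else pick the action with the highest estimated value.
    How tau is computed (Gaussian-like function, dynamic threshold)
    is not modeled here; that part is left as an input.
    """
    if rng.random() < tau:
        return prev_action
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Made-up value estimates for three transmit actions.
q = [0.1, 0.9, 0.3]
repeated = select_action(q, prev_action=2, tau=1.0, epsilon=0.5)
greedy = select_action(q, prev_action=2, tau=0.0, epsilon=0.0)
```

Setting `tau=1.0` always repeats the previous action, and `tau=0.0` with `epsilon=0.0` reduces to pure greedy selection, which makes the two branches easy to check in isolation.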

preprint, 2020, arXiv

Sparse Coding Driven Deep Decision Tree Ensembles for Nuclear Segmentation in Digital Pathology Images

In this paper, we propose an easily trained yet powerful representation learning approach with performance highly competitive to deep neural networks in a digital pathology image segmentation task. The method, called sparse coding driven deep decision tree ensembles that we abbreviate as ScD2TE, provides a new perspective on representation learning. We explore the possibility of stacking several layers based on non-differentiable pairwise modules and generate a densely concatenated architecture holding the characteristics of feature map reuse and end-to-end dense learning. Under this architecture, fast convolutional sparse coding is used to extract multi-level features from the output of each layer. In this way, rich image appearance models together with more contextual information are integrated by learning a series of decision tree ensembles. The appearance and the high-level context features of all the previous layers are seamlessly combined by concatenating them to feed-forward as input, which in turn makes the outputs of subsequent layers more accurate and the whole model efficient to train. Compared with deep neural networks, our proposed ScD2TE does not require back-propagation computation and depends on fewer hyper-parameters. ScD2TE is able to achieve a fast end-to-end pixel-wise training in a layer-wise manner. We demonstrated the superiority of our segmentation technique by evaluating it on the multi-disease state and multi-organ dataset, where it obtained consistently higher performance than several state-of-the-art deep learning methods such as convolutional neural networks (CNN), fully convolutional networks (FCN), etc.

preprint, 2016, arXiv

Newton slopes for Artin-Schreier-Witt towers

We fix a monic polynomial $f(x) \in \mathbb F_q[x]$ over a finite field and consider the Artin-Schreier-Witt tower defined by $f(x)$; this is a tower of curves $\cdots \to C_m \to C_{m-1} \to \cdots \to C_0 =\mathbb A^1$, with total Galois group $\mathbb Z_p$. We study the Newton slopes of zeta functions of this tower of curves. This reduces to the study of the Newton slopes of L-functions associated to characters of the Galois group of this tower. We prove that, when the conductor of the character is large enough, the Newton slopes of the L-function form arithmetic progressions which are independent of the conductor of the character. As a corollary, we obtain a result on the behavior of the slopes of the eigencurve associated to the Artin-Schreier-Witt tower, analogous to the result of Buzzard and Kilford.

preprint, 2016, arXiv

Partial Hasse invariants on splitting models of Hilbert modular varieties

Let $F$ be a totally real field of degree $g$, and let $p$ be a prime number. We construct $g$ partial Hasse invariants on the characteristic $p$ fiber of the Pappas-Rapoport splitting model of the Hilbert modular variety for $F$ with level prime to $p$, extending the usual partial Hasse invariants defined over the Rapoport locus. In particular, when $p$ ramifies in $F$, we solve the problem of lack of partial Hasse invariants. Using the stratification induced by these generalized partial Hasse invariants on the splitting model, we prove in complete generality the existence of Galois pseudo-representations attached to Hecke eigenclasses of paritious weight occurring in the coherent cohomology of Hilbert modular varieties $\mathrm{mod}$ $p^m$, extending a previous result of M. Emerton and the authors which required $p$ to be unramified in $F$.

preprint, 2016, arXiv

Slopes of eigencurves over boundary disks

Let $p$ be a prime number. We study the slopes of $U_p$-eigenvalues on the subspace of modular forms that can be transferred to a definite quaternion algebra. We give a sharp lower bound of the corresponding Newton polygon. The computation happens over a definite quaternion algebra by Jacquet-Langlands correspondence; it generalizes a prior work of Daniel Jacobs who treated the case of $p=3$ with a particular level. In case when the modular forms have a finite character of conductor highly divisible by $p$, we improve the lower bound to show that the slopes of $U_p$-eigenvalues grow roughly like arithmetic progressions as the weight $k$ increases. This is the first very positive evidence for Buzzard-Kilford's conjecture on the behavior of the eigencurve near the boundary of the weight space, that is proved for arbitrary $p$ and general level. We give the exact formula of a fraction of the slope sequence.

preprint, 2015, arXiv

On the parity conjecture in finite-slope families

We generalize to the finite-slope setting several techniques due to Nekovar concerning the parity conjecture for self-dual motives. In particular we show that, for a $p$-adic analytic family, with irreducible base, of symplectic self-dual global Galois representations whose $(φ,Γ)$-modules at places lying over $p$ satisfy a Panchishkin condition, the validity of the parity conjecture is constant among all specializations that are pure. As an application, we extend some other results of Nekovar for Hilbert modular forms from the ordinary case to the finite-slope case.

preprint, 2014, arXiv

Cleanliness and log-characteristic cycles for vector bundles with flat connections

Let $X$ be a proper smooth algebraic variety over a field $k$ of characteristic zero and let $D$ be a divisor with simple normal crossings. Let $M$ be a vector bundle over $X-D$ equipped with a flat connection with possible irregular singularities along $D$. We define a cleanliness condition which roughly says that the singularities of the connection are controlled by the singularities at the generic points of $D$. When this condition is satisfied, we compute explicitly the associated log-characteristic cycle, and relate it to the so-called refined irregularities. As a corollary of a log-variant of Kashiwara-Dubson formula, we obtain the Euler characteristic of the de Rham cohomology of the vector bundle, under a mild technical hypothesis on $M$.

preprint, 2014, arXiv

Galois representations and torsion in the coherent cohomology of Hilbert modular varieties

Let $F$ be a totally real number field and let $p$ be a prime unramified in $F$. We prove the existence of Galois pseudo-representations attached to mod $p^m$ Hecke eigenclasses of paritious weight occurring in the coherent cohomology of Hilbert modular varieties for $F$ of level prime to $p$.

preprint, 2014, arXiv

Gauss-Manin connections for p-adic families of nearly overconvergent modular forms

We interpolate the Gauss-Manin connection in p-adic families of nearly overconvergent modular forms. This gives a family of Maass-Shimura type differential operators from the space of nearly overconvergent modular forms of type r to the space of nearly overconvergent modular forms of type r + 1 with p-adic weight shifted by 2. Our construction is purely geometric, using Andreatta-Iovita-Stevens and Pilloni's geometric construction of eigencurves, and should thus generalize to higher rank groups.

preprint, 2013, arXiv

Cohomology of arithmetic families of (phi,Gamma)-modules

We prove the finiteness and compatibility with base change of the (phi,Gamma)-cohomology and the Iwasawa cohomology of arithmetic families of (phi,Gamma)-modules. Using this finiteness theorem, we show that a family of Galois representations that is densely pointwise refined in the sense of Mazur is actually trianguline as a family over a large subspace. In the case of the Coleman-Mazur eigencurve, we determine the behavior at all points.

preprint, 2011, arXiv

On refined ramification filtrations in the equal characteristic case

Let k be a complete discrete valuation field of equal characteristic p>0. Using the tools of p-adic differential modules, we define refined Artin and Swan conductors for a representation of the absolute Galois group $G_k$ with finite local monodromy; this leads to a description of the subquotients of the ramification filtration on $G_k$. We prove that our definition of the refined Swan conductors coincides with the one given by Saito, which uses etale cohomology. We also study its relation with the toroidal variation of the Swan conductors.

preprint, 2010, arXiv

On ramification filtrations and $p$-adic differential modules, I: equal characteristic case

Let $k$ be a complete discretely valued field of equal characteristic $p > 0$ with possibly imperfect residue field and let $G_k$ be its Galois group. We prove that the conductors computed by the arithmetic ramification filtrations on $G_k$ coincide with the differential Artin conductors and Swan conductors of Galois representations of $G_k$. As a consequence, we give a Hasse-Arf theorem for arithmetic ramification filtrations in this case. As applications, we obtain a Hasse-Arf theorem for finite flat group schemes; we also give a comparison theorem between the differential Artin conductors and Borger's conductors.

preprint, 2009, arXiv

Using the Physical Layer for Wireless Authentication in Time-Variant Channels

The wireless medium contains domain-specific information that can be used to complement and enhance traditional security mechanisms. In this paper we propose ways to exploit the spatial variability of the radio channel response in a rich scattering environment, as is typical of indoor environments. Specifically, we describe a physical-layer authentication algorithm that utilizes channel probing and hypothesis testing to determine whether current and prior communication attempts are made by the same transmit terminal. In this way, legitimate users can be reliably authenticated and false users can be reliably detected. We analyze the ability of a receiver to discriminate between transmitters (users) according to their channel frequency responses. This work is based on a generalized channel response with both spatial and temporal variability, and considers correlations among the time, frequency and spatial domains. Simulation results, using the ray-tracing tool WiSE to generate the time-averaged response, verify the efficacy of the approach under realistic channel conditions, as well as its capability to work under unknown channel variations.
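The hypothesis test described in this abstract can be illustrated as a threshold test on the difference between two channel frequency response estimates. This is a minimal sketch, assuming a noise-normalized squared-distance statistic and a fixed threshold. The statistic, the names `same_transmitter`, `noise_var` and `threshold`, and the sample values are assumptions for illustration, not the paper's exact test.

```python
def same_transmitter(h_prev, h_curr, noise_var, threshold):
    """Decide whether two channel probes come from the same transmitter.

    h_prev, h_curr: sequences of complex channel frequency response
    samples, one per probed frequency.
    Returns True when the noise-normalized squared distance between the
    two responses does not exceed the threshold (hypothetical rule).
    """
    if len(h_prev) != len(h_curr):
        raise ValueError("responses must cover the same frequencies")
    if noise_var <= 0:
        raise ValueError("noise_var must be positive")
    stat = sum(abs(a - b) ** 2 for a, b in zip(h_prev, h_curr)) / noise_var
    return stat <= threshold


# Made-up responses: a prior probe, a near-identical probe, and a
# clearly different one.
prior = [1 + 1j, 0.5 - 0.2j]
legit = [1.01 + 1j, 0.5 - 0.21j]
other = [-1 + 0j, 0.1 + 0.9j]

accepted = same_transmitter(prior, legit, noise_var=0.01, threshold=1.0)
rejected = same_transmitter(prior, other, noise_var=0.01, threshold=1.0)
```

In a sketch like this the threshold trades off false alarms against missed detections; how it should be set under channel variation is the subject of the paper's analysis and is not reproduced here.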
