Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
20 works
0 followers
15 topics
4 close collaborators

Published work

20 published item(s)

preprint · 2026 · arXiv

Value-Decomposed Reinforcement Learning Framework for Taxiway Routing with Hierarchical Conflict-Aware Observations

Taxiway routing and on-surface conflict avoidance are coupled safety-critical decision problems in airport surface operations. Existing planning and optimization methods are often limited by online computational cost, while reinforcement learning methods may struggle to represent downstream traffic conflicts and balance multiple objectives. This paper presents Conflict-aware Taxiway Routing (CaTR), a reinforcement learning framework for real-time multi-aircraft taxiway routing. CaTR constructs a grid-based airport surface environment with action masking, introduces a hierarchical foresight traffic representation to encode current and downstream conflict-related traffic conditions, and adopts a value-decomposed reinforcement learning strategy to prioritize sparse but safety-critical objectives. Experiments are conducted in a realistic environment based on Changsha Huanghua International Airport under multiple traffic density levels. Results show that CaTR achieves better safety–efficiency trade-offs than representative planning, optimization, and reinforcement learning baselines while maintaining practical runtime.
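The abstract names two mechanisms, action masking and value decomposition. The sketch below shows one simple way they can combine at action-selection time, assuming a small discrete action set (stay, or move to one of four neighboring grid cells), separate Q-value heads for a safety objective and an efficiency objective, and a fixed weighting that favors safety. The helper names `combine_q` and `masked_greedy_action`, the action layout, and the weights are all hypothetical; this is not CaTR's implementation, whose decomposition and prioritization scheme may differ.

```python
import math
from typing import Sequence


def combine_q(heads: Sequence[Sequence[float]], weights: Sequence[float]) -> list[float]:
    """Value decomposition: Q(a) = sum_k w_k * Q_k(a) over per-objective heads."""
    n_actions = len(heads[0])
    return [sum(w * q[a] for w, q in zip(weights, heads)) for a in range(n_actions)]


def masked_greedy_action(q_values: Sequence[float], mask: Sequence[bool]) -> int:
    """Return the index of the highest-valued action among those the mask allows."""
    best, best_q = -1, -math.inf
    for a, (q, allowed) in enumerate(zip(q_values, mask)):
        if allowed and q > best_q:
            best, best_q = a, q
    if best < 0:
        raise ValueError("mask leaves no valid action")
    return best


# Hypothetical actions: 0=stay, 1=north, 2=south, 3=east, 4=west.
safety = [0.0, -5.0, 0.0, 0.0, -1.0]     # penalizes moves toward a predicted conflict
efficiency = [-1.0, 2.0, 0.5, 1.0, 1.5]  # rewards progress toward the destination
q = combine_q([safety, efficiency], weights=[2.0, 1.0])
mask = [True, True, True, False, True]   # east leaves the taxiway, so it is masked out
print(q, masked_greedy_action(q, mask))
```

Without the mask, the argmax would pick action 3 (east); applying the mask before the argmax guarantees an illegal move is never selected, instead of relying on the learned values to avoid it.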

preprint · 2024 · arXiv

Equivariant Morse theory for Lie algebra actions on Riemannian foliations

Consider the transverse isometric action of a finite-dimensional Lie algebra g on a Riemannian foliation. This paper studies equivariant Morse-Bott theory on the leaf space of the Riemannian foliation in this setting. Among other things, we establish a foliated version of the Morse-Bott lemma for a g-invariant basic Morse-Bott function, and a foliated version of the usual handle presentation theorem. In the non-equivariant case, we apply these results to present a new proof of the Morse inequalities on Riemannian foliations. In the equivariant case, we apply these results to study the Hamiltonian action of an abelian Lie algebra on a presymplectic manifold whose underlying foliation is also Riemannian, and extend the Kirwan surjectivity and injectivity theorems in equivariant symplectic geometry to this situation. In particular, this implies that Kirwan surjectivity and injectivity hold for Hamiltonian torus actions on symplectic orbifolds.

preprint · 2022 · arXiv

A Comparative Study of Speaker Role Identification in Air Traffic Communication Using Deep Learning Approaches

Automatic spoken instruction understanding (SIU) of controller-pilot conversations in air traffic control (ATC) requires not only recognizing the words and semantics of the speech but also determining the role of the speaker. However, few of the published works on automatic understanding systems in air traffic communication focus on speaker role identification (SRI). In this paper, we formulate the SRI task of controller-pilot communication as a binary classification problem. Furthermore, text-based, speech-based, and speech-and-text-based multimodal methods are proposed to achieve a comprehensive comparison of approaches to the SRI task. To ablate the impacts of the comparative approaches, various advanced neural network architectures are applied to optimize the implementation of the text-based and speech-based methods. Most importantly, a multimodal speaker role identification network (MMSRINet) is designed to perform the SRI task by considering both speech and textual modality features. To aggregate modality features, a modal fusion module is proposed to fuse and squeeze acoustic and textual representations by a modal attention mechanism and a self-attention pooling layer, respectively. Finally, the comparative approaches are validated on the ATCSpeech corpus collected from a real-world ATC environment. The experimental results demonstrate that all the comparative approaches work for the SRI task, and the proposed MMSRINet shows more competitive performance and robustness than the other methods on both seen and unseen data, achieving 98.56% and 98.08% accuracy, respectively.
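The abstract says a self-attention pooling layer "squeezes" a representation. A minimal sketch of a generic attention-pooling layer, assuming the input is a variable-length sequence of per-frame feature vectors and that each frame is scored by a single learned vector `w` (hypothetical here; practical layers often use a small MLP scorer instead). It illustrates the technique, not MMSRINet's exact layer.

```python
import math
from typing import Sequence


def softmax(xs: Sequence[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_pool(frames: Sequence[Sequence[float]], w: Sequence[float]) -> list[float]:
    """Squeeze a sequence of frame vectors into one fixed-size vector.

    Each frame x_t gets a scalar score w . x_t; the scores are softmax-normalized
    over time and used as weights in a sum of the frames.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, frame)) for frame in frames]
    alphas = softmax(scores)
    dim = len(frames[0])
    return [sum(a * frame[d] for a, frame in zip(alphas, frames)) for d in range(dim)]


frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(self_attention_pool(frames, w=[0.0, 0.0]))  # uniform weights: the mean frame
print(self_attention_pool(frames, w=[8.0, 0.0]))  # weight shifts toward the first frame
```

Because the weights are normalized over time, the output size is independent of the utterance length, which is what makes this kind of layer useful before a fixed-size classifier.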

preprint · 2022 · arXiv

Accurate Scoliosis Vertebral Landmark Localization on X-ray Images via Shape-constrained Multi-stage Cascaded CNNs

Vertebral landmark localization is a crucial step for various spine-related clinical applications and requires detecting the corner points of 17 vertebrae. However, neighboring landmarks often interfere with each other because of the homogeneous appearance of vertebrae, which makes vertebral landmark localization extremely difficult. In this paper, we propose multi-stage cascaded convolutional neural networks (CNNs) that split the single task into two sequential steps: center point localization, which roughly locates the 17 center points of the vertebrae, and corner point localization, which finds the 4 corner points of each vertebra without being distracted by the others. Landmarks in each step are located gradually from a set of initialized points by regressing offsets via cascaded CNNs. Principal Component Analysis (PCA) is employed to preserve a shape constraint in offset regression to resist the mutual attraction of vertebrae. We evaluate our method on the AASCE dataset, which consists of 609 tight spinal anterior-posterior X-ray images; each image contains 17 vertebrae composed of the thoracic and lumbar spine for spinal shape characterization. Experimental results demonstrate superior vertebral landmark localization performance over other state-of-the-art methods, with the relative error decreasing from 3.2e-3 to 7.2e-4.

preprint · 2022 · arXiv

Basic Kirwan injectivity and its applications

Consider the Hamiltonian action of a torus on a transversely symplectic foliation that is also Riemannian. When the transverse hard Lefschetz property is satisfied, we establish a foliated version of the Kirwan injectivity theorem, and use it to study Hamiltonian torus actions on transversely Kähler foliations. Among other things, we prove a foliated analogue of the Carrell–Liberman theorem. As an application, this confirms a conjecture raised by Battaglia–Zaffran on the basic Hodge numbers of symplectic toric quasifolds. Our methods also allow us to present a symplectic approach to the calculation of the Betti numbers of symplectic toric quasifolds as diffeological spaces.

preprint · 2022 · arXiv

Bubble Planner: Planning High-speed Smooth Quadrotor Trajectories using Receding Corridors

Quadrotors are agile platforms. With human experts, they can perform extremely high-speed flights in cluttered environments. However, fully autonomous flight at high speed remains a significant challenge. In this work, we propose a motion planning algorithm based on the corridor-constrained minimum control effort trajectory optimization (MINCO) framework. Specifically, we use a series of overlapping spheres to represent the free space of the environment and propose two novel designs that enable the algorithm to plan high-speed quadrotor trajectories in real time. The first is a sampling-based corridor generation method that generates spheres with large overlapping regions between neighboring spheres (and hence a larger overall corridor). The second is a Receding Horizon Corridors (RHC) strategy, in which part of the previously generated corridor is reused in each replan. Together, these two designs enlarge the corridor spaces in accordance with the quadrotor's current state and hence allow the quadrotor to maneuver at high speeds. We benchmark our algorithm against other state-of-the-art planning methods to show its superiority in simulation. Comprehensive ablation studies are also conducted to show the necessity of the two designs. The proposed method is finally evaluated on an autonomous LiDAR-navigated quadrotor UAV in wooded environments, achieving flight speeds over 13.7 m/s without any prior map of the environment or external localization facility.
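A corridor of overlapping spheres reduces to a few geometric checks. The sketch below assumes the corridor is an ordered chain of spheres and uses the overlap depth r_a + r_b - |c_a - c_b| as a simple proxy for how much two neighboring spheres overlap; the names `overlap_depth`, `is_connected_corridor`, `in_corridor`, and the `min_overlap` threshold are hypothetical. It covers neither the paper's sampling-based sphere generation nor the MINCO trajectory optimization.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float, float]


@dataclass(frozen=True)
class Sphere:
    center: Point
    radius: float

    def contains(self, p: Point) -> bool:
        return math.dist(self.center, p) <= self.radius


def overlap_depth(a: Sphere, b: Sphere) -> float:
    """r_a + r_b - |c_a - c_b|: positive when the spheres share nonzero volume."""
    return a.radius + b.radius - math.dist(a.center, b.center)


def is_connected_corridor(spheres: list[Sphere], min_overlap: float = 0.0) -> bool:
    """A chain of spheres forms a corridor if every consecutive pair overlaps enough."""
    return all(overlap_depth(a, b) > min_overlap for a, b in zip(spheres, spheres[1:]))


def in_corridor(spheres: list[Sphere], p: Point) -> bool:
    """A point lies in the free-space corridor if some sphere contains it."""
    return any(s.contains(p) for s in spheres)


corridor = [Sphere((0.0, 0.0, 0.0), 2.0), Sphere((3.0, 0.0, 0.0), 2.0)]
print(overlap_depth(*corridor), is_connected_corridor(corridor),
      in_corridor(corridor, (1.5, 0.0, 0.0)))
```

Raising `min_overlap` is one way to express the abstract's preference for large overlaps between neighbors: a chain that barely touches would be rejected even though it is technically connected.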

preprint · 2022 · arXiv

Dynamic Free-Space Roadmap for Safe Quadrotor Motion Planning

Free-space-oriented roadmaps typically generate a series of convex geometric primitives, which constitute the safe region for motion planning. However, a static environment is assumed for this kind of roadmap. This assumption makes it unable to deal with dynamic obstacles and limits its applications. In this paper, we present a dynamic free-space roadmap, which provides feasible spaces and a navigation graph for safe quadrotor motion planning. Our roadmap is constructed by continuously seeding and extracting free regions in the environment. In order to adapt our map to environments with dynamic obstacles, we incrementally decompose the polyhedra intersecting with obstacles into obstacle-free regions, while the graph is also updated by our well-designed mechanism. Extensive simulations and real-world experiments demonstrate that our method is practically applicable and efficient.

preprint · 2022 · arXiv

Exciton-Driven Renormalization of Quasiparticle Band Structure in Monolayer MoS2

Optical excitation serves as a powerful approach to controlling the electronic structure of layered van der Waals materials, either via many-body screening effects induced by photoexcited free carriers or via light-driven coherence, such as the optical Stark and Bloch-Siegert effects. Although theoretical work has also pointed to an exotic mechanism of renormalizing the band structure via excitonic correlations in bound electron-hole pairs (excitons), experimental observation of such exciton-driven band renormalization, and the full extent of its implications, is still lacking, largely due to the limitations of optical probes and the impact of screening effects. Here, by using extreme-ultraviolet time-resolved angle-resolved photoemission spectroscopy together with excitonic many-body theoretical calculations, we directly unmask the band renormalization effects driven by excitonic correlations in a monolayer semiconductor. We reveal a surprising bandgap opening, with the gap increased by 40 meV, and a simultaneous enhancement of the band effective mass. Our findings unmask a novel exciton-driven mechanism for band engineering in photoexcited semiconducting materials, opening a new playground for manipulating transient energy states in layered quantum materials via optical control of excitonic many-body correlations.

preprint · 2022 · arXiv

Improving the estimation of directional area scattering factor (DASF) from canopy reflectance: theoretical basis and validation

The directional area scattering factor (DASF) is a critical canopy structural parameter for vegetation monitoring. It provides an efficient tool for decoupling canopy structure and leaf optics from canopy reflectance. The current standard approach to estimating DASF from the canopy bidirectional reflectance factor (BRF) is based on the assumption that, in the weakly absorbing 710 to 790 nm spectral interval, leaf scattering does not change much with the concentration of dry matter, so its variation can be neglected. This assumption results in biased estimates of DASF and consequently leads to uncertainty in DASF-related applications. This study proposes a new approach that accounts for variations in the concentration of this biochemical constituent by additionally using the canopy BRF at 2260 nm. In silico analysis suggests that the proposed approach significantly improves accuracy over the standard technique, as measured by the relative root mean square error (rRMSE), by 49% and 34% for one- and three-dimensional scenes, respectively. When compared with indoor multi-angular hyperspectral measurements reported in the literature, the mean absolute error was reduced by 68% for needleleaf and 20% for broadleaf canopies. Thus, the proposed DASF estimation approach outperforms the current one and can be used more reliably in DASF-related applications, such as vegetation monitoring of functional traits, dynamics, and radiation budget.

preprint · 2022 · arXiv

InsMix: Towards Realistic Generative Data Augmentation for Nuclei Instance Segmentation

Nuclei segmentation from histology images is a fundamental task in digital pathology analysis. However, deep-learning-based nuclei segmentation methods often suffer from limited annotations. This paper proposes a realistic data augmentation method for nuclei segmentation, named InsMix, that follows a Copy-Paste-Smooth principle and performs morphology-constrained generative instance augmentation. Specifically, we propose morphology constraints that enable the augmented images to acquire rich information about nuclei while maintaining their morphological characteristics (e.g., geometry and location). To fully exploit the pixel redundancy of the background and improve the model's robustness, we further propose a background perturbation method, which randomly shuffles background patches without disordering the original nuclei distribution. To achieve contextual consistency between original and template instances, a smooth-GAN is designed with a foreground similarity encoder (FSE) and a triplet loss. We validated the proposed method on two datasets, Kumar and CPS. Experimental results demonstrate the effectiveness of each component and the superior performance of our method compared with state-of-the-art methods.

preprint · 2022 · arXiv

Mapping the dispersion of the occupied and unoccupied band structure in photoexcited 1T-TiSe₂

Charge density waves (CDWs) are states of broken symmetry with a periodic modulation of charge and lattice, typically leading to the opening of a gap in the band structure. In the model CDW system 1T-TiSe₂, such a gap opens between its Se 4p valence band and Ti 3d conduction band, accompanied by a change of dispersion. These changes are crucial for understanding the CDW phase, as they provide a measure of the Se 4p-Ti 3d hybridization strength and characteristic mechanistic features. Using time- and angle-resolved photoelectron spectroscopy (trARPES), the unoccupied band structure is populated with near-infrared (NIR) pump pulses, which allows us to directly visualize the parabolically shaped Ti 3d conduction band. Furthermore, we observe a transient change of effective mass in the Se 4p valence band following photoexcitation. This occurs alongside an overall reduction due to weakening of the CDW phase and is accompanied by an oscillating component with the frequency of the characteristic A1g phonon. These observations, enabled by trARPES, highlight the importance of lattice contributions in establishing the CDW order in 1T-TiSe₂.

preprint · 2022 · arXiv

Revealing the order parameter dynamics of 1T-TiSe₂ following optical excitation

The formation of a charge density wave state is characterized by an order parameter. The way it is established provides unique information on both the role that correlation plays in driving charge density wave formation and the mechanism behind it. Here we use time- and angle-resolved photoelectron spectroscopy to optically perturb the charge density wave phase in 1T-TiSe₂ and follow the recovery of its order parameter as a function of energy, momentum, and excitation density. Our results reveal that two distinct orders contribute to the gap formation, a CDW order and a pseudogap-like order, manifested by an overall robustness to optical excitation. A detailed analysis of the magnitude of the gap as a function of excitation density and delay time reveals the excitonic long-range nature of the CDW gap and the short-range Jahn-Teller character of the pseudogap order. In contrast to the gap, the intensity of the folded Se 4p* band gives access only to the excitonic order. These results provide new insight into the long-standing debate on the origin of the gap in TiSe₂ and place it in the same context as other quantum materials where a pseudogap phase appears to be a precursor of long-range order.

preprint · 2022 · arXiv

Seg4Reg+: Consistency Learning between Spine Segmentation and Cobb Angle Regression

Automated methods for Cobb angle estimation are in high demand for scoliosis assessment. Existing methods typically calculate the Cobb angle from landmark estimation, or simply combine a low-level task (e.g., landmark detection or spine segmentation) with the Cobb angle regression task, without fully exploring the benefits each task can bring to the other. In this study, we propose a novel multi-task framework, named Seg4Reg+, which jointly optimizes the segmentation and regression networks. We thoroughly investigate both local and global consistency and knowledge transfer between the two tasks. Specifically, we propose an attention regularization module that leverages class activation maps (CAMs) from image-segmentation pairs to discover additional supervision for the regression network, and the CAMs can in turn serve as a region-of-interest enhancement gate to facilitate the segmentation task. Meanwhile, we design a novel triangle consistency learning scheme to train the two networks jointly for global optimization. Evaluations on the public AASCE Challenge dataset demonstrate the effectiveness of each module and the superior performance of our model compared with state-of-the-art methods.
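For context on the landmark-based route that the abstract contrasts Seg4Reg+ with, here is a simplified sketch. It assumes each vertebral endplate is represented by two landmark points and takes the Cobb angle as the largest angle between any two endplate lines. This is an illustrative simplification, not the AASCE Challenge's official evaluation code and not Seg4Reg+, which obtains the angle from a regression network; `line_angle_deg` and `cobb_angle` are hypothetical names.

```python
import math
from itertools import combinations

Point = tuple[float, float]


def line_angle_deg(p: Point, q: Point) -> float:
    """Orientation of the line through p and q, in degrees."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


def cobb_angle(endplates: list[tuple[Point, Point]]) -> float:
    """Largest angle between any two endplate lines, folded into [0, 90] degrees."""
    angles = [line_angle_deg(p, q) for p, q in endplates]
    best = 0.0
    for a, b in combinations(angles, 2):
        diff = abs(a - b) % 180.0  # lines are undirected, so 180 degrees apart is parallel
        best = max(best, min(diff, 180.0 - diff))
    return best


endplates = [((0.0, 0.0), (1.0, 0.0)),   # level endplate
             ((0.0, 0.0), (1.0, 1.0)),   # tilted 45 degrees one way
             ((0.0, 0.0), (1.0, -0.5))]  # tilted about 26.6 degrees the other way
print(round(cobb_angle(endplates), 2))
```

Because the result depends on every landmark pair, small landmark errors propagate directly into the angle, which is one motivation for regressing the angle jointly with a segmentation task instead.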

preprint · 2022 · arXiv

Spatiotemporal Propagation Learning for Network-Wide Flight Delay Prediction

Demystifying the delay propagation mechanisms among multiple airports is fundamental to precise and interpretable delay prediction, which is crucial during decision-making for all aviation industry stakeholders. The principal challenge lies in effectively leveraging the spatiotemporal dependencies and exogenous factors related to delay propagation. However, previous works consider only limited spatiotemporal patterns with few factors. To promote more comprehensive propagation modeling for delay prediction, we propose the SpatioTemporal Propagation Network (STPN), a space-time separable graph convolutional network that is novel in how it captures spatiotemporal dependencies. For spatial relation modeling, we propose a multi-graph convolution model that considers both geographic proximity and airline schedules. For temporal dependency capturing, we propose a multi-head self-attention mechanism that can be learned end-to-end and explicitly reasons about multiple kinds of temporal dependency in delay time series. We show that the joint spatial and temporal learning models yield a sum of Kronecker products, which factors the spatiotemporal dependence into the sum of several spatial and temporal adjacency matrices. In this way, STPN allows cross-talk between spatial and temporal factors when modeling delay propagation. Furthermore, a squeeze-and-excitation module is added to each layer of STPN to boost meaningful spatiotemporal features. We apply STPN to multi-step-ahead arrival and departure delay prediction in large-scale airport networks. To validate the effectiveness of our model, we experiment with two real-world delay datasets, covering U.S. and China flight delays, and show that STPN outperforms state-of-the-art methods. In addition, counterfactuals produced by STPN show that it learns explainable delay propagation patterns.
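The sum-of-Kronecker-products claim can be made concrete. With N airports and T time steps, a joint adjacency over all (airport, time) pairs can be written A = S_1 ⊗ T_1 + S_2 ⊗ T_2 + ..., pairing each spatial matrix S_k (for example geographic proximity or airline schedule) with a temporal matrix T_k. The pure-Python sketch below assumes node-time pairs are indexed as s * T + t, so that A[(s, t), (s', t')] = sum_k S_k[s][s'] * T_k[t][t']. The function names and toy matrices are hypothetical; in STPN the temporal side comes from learned self-attention, whereas here every matrix is a fixed toy input.

```python
Matrix = list[list[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: entry (i*p + k, j*q + l) equals a[i][j] * b[k][l]."""
    p, q = len(b), len(b[0])
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(q)]
            for i in range(len(a)) for k in range(p)]


def mat_add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def spatiotemporal_adjacency(spatial: list[Matrix], temporal: list[Matrix]) -> Matrix:
    """Sum over k of kron(spatial[k], temporal[k]); rows/cols indexed by s * T + t."""
    total = kron(spatial[0], temporal[0])
    for s_k, t_k in zip(spatial[1:], temporal[1:]):
        total = mat_add(total, kron(s_k, t_k))
    return total


geo = [[0.0, 1.0], [1.0, 0.0]]        # two neighboring airports (toy)
sched = [[0.0, 1.0], [0.0, 0.0]]      # a directed schedule link between airports 0 and 1
same_time = [[1.0, 0.0], [0.0, 1.0]]  # interactions within one time step
lag1 = [[0.0, 0.0], [1.0, 0.0]]       # step t=1 depends on step t=0
A = spatiotemporal_adjacency([geo, sched], [same_time, lag1])
for row in A:
    print(row)
```

The entry linking (airport 0, t=1) to (airport 1, t=0) is nonzero only through the schedule-times-lag term, which is the kind of space-time cross-talk the abstract describes; the factorized form also means only small N×N and T×T matrices need to be stored or learned.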

preprint · 2021 · arXiv

ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems

In this paper, a multilingual end-to-end framework, called ATCSpeechNet, is proposed to tackle the issue of translating communication speech into human-readable text in air traffic control (ATC) systems. In the proposed framework, we focus on integrating multilingual automatic speech recognition (ASR) into one model, in which an end-to-end paradigm is developed to convert the speech waveform into text directly, without any feature engineering or lexicon. To make up for the deficiency of handcrafted feature engineering caused by ATC challenges, a speech representation learning (SRL) network is proposed to capture robust and discriminative speech representations from the raw wave. A self-supervised training strategy is adopted to optimize the SRL network from unlabeled data and further to predict the speech features, i.e., wave-to-feature. An end-to-end architecture is improved to complete the ASR task, in which a grapheme-based modeling unit is applied to address the multilingual ASR issue. Facing the problem of few transcribed samples in the ATC domain, an unsupervised approach with mask prediction is applied to pre-train the backbone network of the ASR model on unlabeled data through a feature-to-feature process. Finally, by integrating the SRL with ASR, an end-to-end multilingual ASR framework is formulated in a supervised manner, which is able to translate the raw wave into text in one model, i.e., wave-to-text. Experimental results on the ATCSpeech corpus demonstrate that the proposed approach achieves high performance with a very small labeled corpus and lower resource consumption, with only a 4.20% label error rate on the 58-hour transcribed corpus. Compared to the baseline model, the proposed approach obtains over 100% relative performance improvement, which can be further enhanced as the size of the transcribed corpus increases.

preprint · 2021 · arXiv

Improving speech recognition models with small samples for air traffic control systems

In the domain of air traffic control (ATC) systems, efforts to train a practical automatic speech recognition (ASR) model always face the problem of small training samples, since the collection and annotation of speech samples are expert- and domain-dependent tasks. In this work, a novel training approach based on pretraining and transfer learning is proposed to address this issue, and an improved end-to-end deep learning model is developed to address the specific challenges of ASR in the ATC domain. An unsupervised pretraining strategy is first proposed to learn speech representations from unlabeled samples for a certain dataset. Specifically, a masking strategy is applied to improve the diversity of the samples without losing their general patterns. Subsequently, transfer learning is applied to fine-tune pretrained or other optimized baseline models to achieve the supervised ASR task. By virtue of the common terminology used in the ATC domain, the transfer learning task can be regarded as a sub-domain adaptation task, in which the transferred model is optimized using a joint corpus consisting of baseline samples and new transcribed samples from the target dataset. This joint corpus construction strategy enriches the size and diversity of the training samples, which is important for addressing the issue of the small transcribed corpus. In addition, speed perturbation is applied to augment the new transcribed samples to further improve the quality of the speech corpus. Three real ATC datasets are used to validate the proposed ASR model and training strategies. The experimental results demonstrate that ASR performance is significantly improved on all three datasets, with an absolute character error rate only one-third of that achieved through supervised training. The applicability of the proposed strategies to other ASR approaches is also validated.
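Results here are reported as character error rate (CER). A minimal sketch of the common definition, CER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a minimum-cost alignment (the Levenshtein distance) and N is the number of reference characters. The function names and the example phrase are hypothetical, and the papers' own scoring scripts may normalize text (spaces, case, punctuation) differently.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,             # delete r from the reference
                           cur[j - 1] + 1,          # insert h into the hypothesis
                           prev[j - 1] + (r != h))) # substitute (free if equal)
        prev = cur
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = (S + D + I) / N, with N the number of reference characters."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


print(character_error_rate("climb to flight level 330", "climb to flight level 300"))
```

A single wrong digit in a 25-character instruction gives a CER of 0.04, which also shows why a low CER alone does not guarantee that safety-relevant tokens such as flight levels are transcribed correctly.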

preprint · 2020 · arXiv

Harmonic and rational harmonic driving of microresonator soliton frequency combs

With demonstrated applications ranging from metrology to telecommunications, soliton microresonator frequency combs have emerged over the past decade as a remarkable new technology. However, standard implementations only allow for the generation of combs whose repetition rate is tied close to the fundamental resonator free-spectral range (FSR), offering little or no dynamic control over the comb line spacing. Here we propose and experimentally demonstrate harmonic and rational harmonic driving as novel techniques that allow for the robust generation of soliton frequency combs with discretely adjustable frequency spacing. By driving an integrated Kerr microresonator with a periodic train of picosecond pulses whose repetition rate is set close to an integer harmonic of the 3.23 GHz cavity FSR, we deterministically generate soliton frequency combs with frequency spacings discretely adjustable between 3.23 GHz and 19.38 GHz. More remarkably, we also demonstrate that driving the resonator at rational fractions of the FSR allows for the generation of combs whose frequency spacing corresponds to an integer harmonic of the pump repetition rate. By measuring the combs' radio-frequency spectrum, we confirm operation in the low-noise soliton regime with no supermode noise. The novel techniques demonstrated in our work provide new degrees of freedom for the design of synchronously pumped soliton frequency combs.
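The quoted spacing range follows from simple arithmetic: 19.38 GHz = 6 × 3.23 GHz, so integer-harmonic driving at n = 1 through 6 would give the discrete spacings below. This sketch assumes the accessible spacings are exactly n × FSR; in the experiment the pump repetition rate is set close to, not exactly at, each harmonic, and the rational-harmonic case is not modeled. `harmonic_spacings` is a hypothetical helper.

```python
FSR_GHZ = 3.23  # fundamental cavity free-spectral range quoted in the abstract


def harmonic_spacings(fsr_ghz: float, max_harmonic: int) -> list[float]:
    """Comb spacings n * FSR for n = 1..max_harmonic, rounded to 0.01 GHz."""
    return [round(n * fsr_ghz, 2) for n in range(1, max_harmonic + 1)]


spacings = harmonic_spacings(FSR_GHZ, 6)
print(spacings)  # [3.23, 6.46, 9.69, 12.92, 16.15, 19.38]
```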

preprint · 2020 · arXiv

Hybrid Attention for Automatic Segmentation of Whole Fetal Head in Prenatal Ultrasound Volumes

Background and Objective: Biometric measurements of the fetal head are important indicators for maternal and fetal health monitoring during pregnancy. 3D ultrasound (US) has unique advantages over 2D scans in covering the whole fetal head and may aid diagnosis. However, automatically segmenting the whole fetal head in US volumes remains an emerging and unsolved problem. The challenges that automated solutions need to tackle include poor image quality, boundary ambiguity, long-span occlusion, and appearance variability across different fetal poses and gestational ages. In this paper, we propose the first fully automated solution to segment the whole fetal head in US volumes. Methods: The segmentation task is first formulated as an end-to-end volumetric mapping under an encoder-decoder deep architecture. We then combine the segmentor with a proposed hybrid attention scheme (HAS) to select discriminative features and suppress non-informative volumetric features in a composite and hierarchical way. With little computational overhead, HAS proves effective in addressing boundary ambiguity and deficiency. To enhance spatial consistency in segmentation, we further organize multiple segmentors in a cascaded fashion to refine the results by revisiting context in the predictions of predecessors. Results: Validated on a large dataset collected from 100 healthy volunteers, our method presents superior segmentation performance (Dice Similarity Coefficient (DSC), 96.05%) and remarkable agreement with experts. With another 156 volumes collected from 52 volunteers, we achieve high reproducibility (mean standard deviation 11.524 mL) against scan variations. Conclusion: This is the first investigation into whole fetal head segmentation in 3D US. Our method is a promising and feasible solution for assisting volumetric US-based prenatal studies.
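The reported 96.05% is a Dice Similarity Coefficient. A minimal sketch of the standard definition, DSC = 2|P ∩ G| / (|P| + |G|), assuming binary masks represented as sets of (x, y, z) voxel indices. The value returned for two empty masks is a convention chosen here, and the paper's evaluation code may handle it (and mask representation) differently; `dice` is a hypothetical helper.

```python
Voxel = tuple[int, int, int]


def dice(pred: set[Voxel], truth: set[Voxel]) -> float:
    """DSC = 2|P ∩ G| / (|P| + |G|); defined here as 1.0 when both masks are empty."""
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))


pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
truth = {(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 1, 1)}
print(dice(pred, truth))  # 2 * 2 / (3 + 4) = 4/7
```

Dice rewards overlap relative to the combined mask sizes, so for a large structure like the whole fetal head, boundary errors of a few voxels change the score only slightly.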

preprint · 2019 · arXiv

ATCSpeech: a multilingual pilot-controller speech corpus from real Air Traffic Control environment

Automatic Speech Recognition (ASR) has developed greatly in recent years, which expedites many applications in other fields. For ASR research, a speech corpus is always an essential foundation, especially for vertical industries such as Air Traffic Control (ATC). There are some speech corpora for common applications, public or paid. However, for ATC, it is difficult to collect raw speech from real systems due to safety issues. More importantly, for a supervised learning task like ASR, annotating the transcription is even more laborious, which hugely restricts the prospects of ASR applications. In this paper, a multilingual speech corpus (ATCSpeech) from real ATC systems, including accented Mandarin Chinese and English, is built and released to encourage non-commercial ASR research in the ATC domain. The corpus is described in detail from the perspective of data amount, speaker gender and role, speech quality, and other attributes. In addition, the performance of our baseline ASR models is reported. A community edition of our speech database can be applied for and used under a special contract. To the best of our knowledge, this is the first work that aims at building a real and multilingual ASR corpus for air traffic related research.