Researcher profile

Leilei Sun

Leilei Sun contributes to research discovery and scholarly infrastructure.

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, unclaimed author
11 works
0 followers
8 topics
4 close collaborators

Published work

11 published items

Preprint, 2026, arXiv

The Sonar Moment: Benchmarking Audio-Language Models in Audio Geo-Localization

Geo-localization aims to infer the geographic origin of a given signal. In computer vision, geo-localization has served as a demanding benchmark for compositional reasoning and is relevant to public safety. In contrast, progress on audio geo-localization has been constrained by the lack of high-quality audio-location pairs. To address this gap, we introduce AGL1K, the first audio geo-localization benchmark for audio language models (ALMs), spanning 72 countries and territories. To extract reliably localizable samples from a crowd-sourced platform, we propose the Audio Localizability metric that quantifies the informativeness of each recording, yielding 1,444 curated audio clips. Evaluations on 16 ALMs show that ALMs have emerged with audio geo-localization capability. We find that closed-source models substantially outperform open-source models, and that linguistic clues often dominate as a scaffold for prediction. We further analyze ALMs' reasoning traces, regional bias, error causes, and the interpretability of the localizability metric. Overall, AGL1K establishes a benchmark for audio geo-localization and may advance ALMs with better geospatial reasoning capability.

Preprint, 2022, arXiv

Continuous-Time and Multi-Level Graph Representation Learning for Origin-Destination Demand Prediction

Traffic demand forecasting with deep neural networks has attracted widespread interest in both academia and industry. Within this area, pairwise Origin-Destination (OD) demand prediction is a valuable but challenging problem due to several factors: (i) the large number of possible OD pairs, (ii) the implicitness of spatial dependence, and (iii) the complexity of traffic states. To address these issues, this paper proposes a Continuous-time and Multi-level dynamic graph representation learning method for Origin-Destination demand prediction (CMOD). First, a continuous-time dynamic graph representation learning framework is constructed, which maintains a dynamic state vector for each traffic node (metro stations or taxi zones). The state vectors keep historical transaction information and are continuously updated according to the most recent transactions. Second, a multi-level structure learning module is proposed to model the spatial dependency of station-level nodes. It can not only exploit relations between nodes adaptively from data, but also share messages and representations via cluster-level and area-level virtual nodes. Last, a cross-level fusion module is designed to integrate multi-level memories and generate comprehensive node representations for the final prediction. Extensive experiments are conducted on two real-world datasets from Beijing Subway and New York Taxi, and the results demonstrate the superiority of our model over state-of-the-art approaches.

Preprint, 2022, arXiv

Dynamic Graph Learning Based on Hierarchical Memory for Origin-Destination Demand Prediction

Recent years have witnessed rapid growth in the application of deep spatiotemporal methods to traffic forecasting. However, the prediction of origin-destination (OD) demand is still a challenging problem since the number of OD pairs is usually quadratic in the number of stations. In this case, most of the existing spatiotemporal methods fail to handle spatial relations on such a large scale. To address this problem, this paper provides a dynamic graph representation learning framework for OD demand prediction. In particular, a hierarchical memory updater is first proposed to maintain a time-aware representation for each node, and the representations are updated according to the most recently observed OD trips in continuous-time and multiple discrete-time ways. Second, a spatiotemporal propagation mechanism is provided to aggregate representations of neighbor nodes along a random spatiotemporal route which treats origin and destination as two different semantic entities. Last, an objective function is designed to derive the future OD demand according to the most recent node representations, and also to tackle the data sparsity problem in OD prediction. Extensive experiments have been conducted on two real-world datasets, and the experimental results demonstrate the superiority of the proposed method. The code and data are available at https://github.com/Rising0321/HMOD.
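The scale and sparsity issues mentioned in this abstract can be illustrated with a toy example. The sketch below is not taken from the paper or its repository: the helper name `od_demand`, the station labels, and the trips are all hypothetical, and it assumes that OD pairs are ordered and exclude trips from a station to itself.

```python
from collections import Counter


def od_demand(trips):
    """Aggregate (origin, destination) trips into sparse demand counts.

    Hypothetical helper for illustration only; only observed pairs are stored.
    """
    return Counter(trips)


# Toy data (assumed): four stations and three observed trips.
stations = ["A", "B", "C", "D"]
trips = [("A", "B"), ("A", "B"), ("C", "A")]

demand = od_demand(trips)

# Ordered pairs with origin != destination grow as n * (n - 1).
possible_pairs = len(stations) * (len(stations) - 1)

# Share of possible OD pairs that actually carry demand in this toy sample.
observed_fraction = len(demand) / possible_pairs
```

Doubling the number of stations roughly quadruples `possible_pairs`, while the number of observed pairs depends only on the trips recorded, which is one way to see why OD demand data tends to be sparse.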

Preprint, 2022, arXiv

Heterogeneous Graph Representation Learning with Relation Awareness

Representation learning on heterogeneous graphs aims to obtain meaningful node representations to facilitate various downstream tasks, such as node classification and link prediction. Existing heterogeneous graph learning methods are primarily developed by following the propagation mechanism of node representations. There are few efforts on studying the role of relations for improving the learning of more fine-grained node representations. Indeed, it is important to collaboratively learn the semantic representations of relations and discern node representations with respect to different relation types. To this end, in this paper, we propose a novel Relation-aware Heterogeneous Graph Neural Network, namely R-HGNN, to learn node representations on heterogeneous graphs at a fine-grained level by considering relation-aware characteristics. Specifically, a dedicated graph convolution component is first designed to learn unique node representations from each relation-specific graph separately. Then, a cross-relation message passing module is developed to improve the interactions of node representations across different relations. Also, the relation representations are learned in a layer-wise manner to capture relation semantics, which are used to guide the node representation learning process. Moreover, a semantic fusing module is presented to aggregate relation-aware node representations into a compact representation with the learned relation representations. Finally, we conduct extensive experiments on a variety of graph learning tasks, and experimental results demonstrate that our approach consistently outperforms existing methods among all the tasks.

Preprint, 2022, arXiv

Learning the Evolutionary and Multi-scale Graph Structure for Multivariate Time Series Forecasting

Recent studies have shown great promise in applying graph neural networks to multivariate time series forecasting, where the interactions of time series are described as a graph structure and the variables are represented as graph nodes. Along this line, existing methods usually assume that the graph structure (or the adjacency matrix), which determines the aggregation manner of the graph neural network, is fixed either by definition or by self-learning. However, the interactions of variables can be dynamic and evolutionary in real-world scenarios. Furthermore, the interactions of time series are quite different if they are observed at different time scales. To equip the graph neural network with a flexible and practical graph structure, in this paper, we investigate how to model the evolutionary and multi-scale interactions of time series. In particular, we first provide a hierarchical graph structure combined with dilated convolution to capture the scale-specific correlations among time series. Then, a series of adjacency matrices is constructed in a recurrent manner to represent the evolving correlations at each layer. Moreover, a unified neural network is provided to integrate the components above to get the final prediction. In this way, we can capture the pair-wise correlations and temporal dependency simultaneously. Finally, experiments on both single-step and multi-step forecasting tasks demonstrate the superiority of our method over the state-of-the-art approaches.

Preprint, 2022, arXiv

WASP-35 and HAT-P-30/WASP-51: re-analysis using TESS and ground-based transit photometry

High-precision transit observations provide excellent opportunities for characterizing the physical properties of exoplanetary systems. These physical properties supply many pieces of information for unveiling the internal structure, external atmosphere, and dynamical history of the planets. We present revised properties of the transiting systems WASP-35 and HAT-P-30/WASP-51 by analyzing newly available TESS photometry and ground-based observations obtained with the 1 m telescope of Yunnan Observatories as well as from the literature. The improved system parameters are consistent with the previous results. Furthermore, we find that the transits of HAT-P-30b/WASP-51b show significant timing variation which cannot be explained by a decaying orbit due to tidal dissipation and the Rømer effect, while both apsidal precession and an additional perturbing body could reproduce this signal in our comprehensive dynamical simulations. Because both planets are valuable targets suitable for transmission spectroscopy, we make some predictions for the atmospheric properties of WASP-35b and HAT-P-30b/WASP-51b based on the newly derived system parameters.

Preprint, 2021, arXiv

A highly mutually-inclined, compact warm-Jupiter system KOI-984?

The discovery of a population of close-orbiting giant planets ($\le$ 1 au) has raised a number of questions about their origins and dynamical histories. These issues have not yet been fully resolved, despite over 20 years of exoplanet detections and a large number of discovered exoplanets. In particular, it is unclear whether warm Jupiters (WJs) form in situ, or whether they migrate from farther out and are even currently migrating to form hot Jupiters (HJs). Here, we report the possible discovery and characterization of the planets in a highly mutually-inclined ($I_{\rm mut}\simeq 45^\circ$), compact two-planet system (KOI-984), in which the newly discovered warm Jupiter KOI-984$c$ is on a 21.5-day, moderately eccentric ($e\simeq 0.4$) orbit, in addition to a previously known 4.3-day planet candidate KOI-984$b$. Meanwhile, the orbital configuration of a moderately inclined ($I_{\rm mut}\simeq 15^\circ$), low-mass ($m_{c}\simeq 24 M_{\oplus}$; $P_b\simeq 8.6$ days) perturbing planet near 1:2 mean motion resonance with KOI-984$b$ could also reproduce the observed transit timing variations and transit duration variations of KOI-984$b$ well. Such an eccentric WJ with a close-in sibling would pose a challenge to the proposed formation and migration mechanisms of WJs, if the first scenario is supported by more evidence in the near future; this system, together with several other well-measured inclined WJ systems (e.g., Kepler-419 and Kepler-108), may provide additional clues to the origin and dynamical histories of WJs.
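As a quick check of the "near 1:2 mean motion resonance" statement, the two periods quoted in the abstract can be compared directly. The symbols $P_{\rm in}$ (the 4.3-day candidate KOI-984$b$) and $P_{\rm pert}$ (the 8.6-day perturbing planet of the alternative scenario) are labels introduced only for this illustration, and both periods are the approximate values given above.

```latex
\frac{P_{\rm pert}}{P_{\rm in}} \simeq \frac{8.6\ \text{days}}{4.3\ \text{days}} = 2
```

A period ratio close to 2 means the perturber completes about one orbit for every two orbits of KOI-984$b$.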

Preprint, 2021, arXiv

LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification

Extreme Multi-label text Classification (XMC) is the task of finding the most relevant labels from a large label set. Deep learning-based methods have recently shown significant success in XMC. However, the existing methods (e.g., AttentionXML and X-Transformer) still suffer from 1) combining several models to train and predict for one dataset, and 2) sampling negative labels statically while training the label ranking model, which reduce both the efficiency and accuracy of the model. To address these problems, we propose LightXML, which adopts end-to-end training and dynamic negative label sampling. In LightXML, we use generative cooperative networks to recall and rank labels, in which the label recalling part generates negative and positive labels, and the label ranking part distinguishes positive labels from these labels. Through these networks, negative labels are sampled dynamically during training of the label ranking part, with both parts fed the same text representation. Extensive experiments show that LightXML outperforms state-of-the-art methods on five extreme multi-label datasets with much smaller model size and lower computational complexity. In particular, on the Amazon dataset with 670K labels, LightXML can reduce the model size by up to 72% compared to AttentionXML.
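The idea of sampling negatives dynamically can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the LightXML implementation: the function name `sample_candidates`, the dictionary of recall scores, and the choice to always keep ground-truth labels among the candidates are all hypothetical.

```python
def sample_candidates(recall_scores, positives, k):
    """Return (label, target) pairs for a ranking stage.

    recall_scores: dict mapping label -> current score from a recall stage.
    positives: set of ground-truth labels for this text.
    k: number of top-scoring labels taken from the recall stage.
    """
    top_k = sorted(recall_scores, key=recall_scores.get, reverse=True)[:k]
    # Keep every ground-truth label, then add recalled non-positives as negatives.
    candidates = sorted(positives) + [label for label in top_k if label not in positives]
    return [(label, 1 if label in positives else 0) for label in candidates]


# Early in training (assumed scores): label "c" is the confusing negative.
early = sample_candidates({"a": 0.9, "b": 0.2, "c": 0.8, "d": 0.1}, {"a"}, k=2)

# Later in training (assumed scores): the recall stage now confuses "d" instead.
later = sample_candidates({"a": 0.9, "b": 0.2, "c": 0.3, "d": 0.7}, {"a"}, k=2)
```

Because the negatives come from whatever the recall stage currently scores highly, they change as training proceeds, in contrast to a fixed negative set chosen before training.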

Preprint, 2020, arXiv

Defending Water Treatment Networks: Exploiting Spatio-temporal Effects for Cyber Attack Detection

While Water Treatment Networks (WTNs) are critical infrastructures for local communities and public health, WTNs are vulnerable to cyber attacks. Effective detection of attacks can defend WTNs against discharging contaminated water, denying access, destroying equipment, and causing public fear. While there are extensive studies on attack detection in WTNs, they only partially exploit the data characteristics to detect cyber attacks. After a preliminary exploration of the sensing data of WTNs, we find that integrating spatio-temporal knowledge, representation learning, and detection algorithms can improve attack detection accuracy. To this end, we propose a structured anomaly detection framework to defend WTNs by modeling the spatio-temporal characteristics of cyber attacks in WTNs. In particular, we propose a spatio-temporal representation framework specially tailored to cyber attacks after separating the sensing data of WTNs into a sequence of time segments. This framework has two key components. The first component is a temporal embedding module that preserves temporal patterns within a time segment by projecting the time segment of a sensor into a temporal embedding vector. We then construct Spatio-Temporal Graphs (STGs), where a node is a sensor and an attribute is the temporal embedding vector of the sensor, to describe the state of the WTNs. The second component is a spatial embedding module, which learns the final fused embedding of the WTNs from STGs. In addition, we devise an improved one-class SVM model that utilizes a newly designed pairwise kernel to detect cyber attacks. The devised pairwise kernel augments the distance between normal and attack patterns in the fused embedding space. Finally, we conducted extensive experimental evaluations with real-world data to demonstrate the effectiveness of our framework.

Preprint, 2020, arXiv

Hybrid Micro/Macro Level Convolution for Heterogeneous Graph Learning

Heterogeneous graphs are pervasive in practical scenarios, where each graph consists of multiple types of nodes and edges. Representation learning on heterogeneous graphs aims to obtain low-dimensional node representations that could preserve both node attributes and relation information. However, most of the existing graph convolution approaches were designed for homogeneous graphs, and therefore cannot handle heterogeneous graphs. Some recent methods designed for heterogeneous graphs are also faced with several issues, including the insufficient utilization of heterogeneous properties, structural information loss, and lack of interpretability. In this paper, we propose HGConv, a novel Heterogeneous Graph Convolution approach, to learn comprehensive node representations on heterogeneous graphs with a hybrid micro/macro level convolutional operation. Different from existing methods, HGConv could perform convolutions on the intrinsic structure of heterogeneous graphs directly at both micro and macro levels: A micro-level convolution to learn the importance of nodes within the same relation, and a macro-level convolution to distinguish the subtle difference across different relations. The hybrid strategy enables HGConv to fully leverage heterogeneous information with proper interpretability. Moreover, a weighted residual connection is designed to aggregate both inherent attributes and neighbor information of the focal node adaptively. Extensive experiments on various tasks demonstrate not only the superiority of HGConv over existing methods, but also the intuitive interpretability of our approach for graph analysis.

Preprint, 2020, arXiv

Predicting Temporal Sets with Deep Neural Networks

Given a sequence of sets, where each set contains an arbitrary number of elements, the problem of temporal sets prediction aims to predict the elements in the subsequent set. In practice, temporal sets prediction is much more complex than predictive modelling of temporal events and time series, and is still an open problem. Many existing methods, if adapted to temporal sets prediction, follow a two-step strategy, first projecting temporal sets into latent representations and then learning a predictive model on those representations. The two-step approach often leads to information loss and unsatisfactory prediction performance. In this paper, we propose an integrated solution based on deep neural networks for temporal sets prediction. A unique perspective of our approach is to learn element relationships by constructing a set-level co-occurrence graph and then performing graph convolutions on the dynamic relationship graphs. Moreover, we design an attention-based module to adaptively learn the temporal dependency of elements and sets. Finally, we provide a gated updating mechanism to find the hidden shared patterns in different sequences and fuse both static and dynamic information to improve the prediction performance. Experiments on real-world datasets demonstrate that our approach can achieve competitive performance even with a portion of the training data and can outperform existing methods by a significant margin.
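The set-level co-occurrence graph mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming that two elements are linked whenever they appear in the same set and that edge weights simply count such co-occurrences; the function name `cooccurrence_graph` and the toy baskets are hypothetical, and the paper's own construction may weight or normalise edges differently.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(sets_sequence):
    """Count how often each unordered pair of elements shares a set.

    Hypothetical helper for illustration; returns a Counter keyed by
    alphabetically ordered (element, element) pairs.
    """
    edges = Counter()
    for current_set in sets_sequence:
        # Sort so that each unordered pair maps to a single key.
        for a, b in combinations(sorted(current_set), 2):
            edges[(a, b)] += 1
    return edges


# Toy sequence of sets for one user (assumed data, e.g. baskets over time).
baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"eggs"}]
graph = cooccurrence_graph(baskets)
```

In this toy sequence, "bread" and "milk" share two sets, so their edge carries the largest weight, while the single-element set contributes no edges.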