Researcher profile

Xiaohong Li

Xiaohong Li contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
9 works
0 followers
8 topics
4 close collaborators

Published work

9 published items

preprint, 2022, arXiv

Inertia and spectral symmetry of eccentricity matrices of some clique trees

The eccentricity matrix $\mathcal E(G)$ of a connected graph $G$ is obtained from the distance matrix of $G$ by leaving unchanged the largest nonzero entries in each row and each column, and replacing the remaining ones with zeros. In this paper, we consider the set $\mathcal C \mathcal T$ of clique trees whose blocks have at most two cut-vertices of the clique tree. After proving the irreducibility of the eccentricity matrix of a clique tree in $\mathcal C \mathcal T$ and finding its inertia indices, we show that every graph in $\mathcal C \mathcal T$ with more than $4$ vertices and odd diameter has two positive and two negative $\mathcal E$-eigenvalues. Positive $\mathcal E$-eigenvalues and negative $\mathcal E$-eigenvalues turn out to be equal in number even for graphs in $\mathcal C \mathcal T$ with even diameter; that shared cardinality also counts the 'diametrally distinguished' vertices. Finally, we prove that the spectrum of the eccentricity matrix of a clique tree $G$ in $\mathcal C \mathcal T$ is symmetric with respect to the origin if and only if $G$ has an odd diameter and exactly two adjacent central vertices.
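The definition of the eccentricity matrix in this abstract can be illustrated with a small sketch. It is not taken from the paper: the function names and the adjacency-list input format are assumptions made for illustration, and the graph is assumed to be connected and unweighted. An entry $d(u,v)$ is the largest in row $u$ or in column $v$ exactly when it equals $\min(e(u), e(v))$, where $e$ is the vertex eccentricity, so the code uses that test.

```python
from collections import deque


def distance_matrix(adj):
    """All-pairs shortest-path distances of a connected unweighted graph (BFS)."""
    n = len(adj)
    dist = [[-1] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[source][v] < 0:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


def eccentricity_matrix(adj):
    """Keep d(u, v) only where it equals min(e(u), e(v)); zero elsewhere."""
    d = distance_matrix(adj)
    ecc = [max(row) for row in d]  # eccentricity = largest distance in the row
    n = len(adj)
    return [
        [d[u][v] if u != v and d[u][v] == min(ecc[u], ecc[v]) else 0
         for v in range(n)]
        for u in range(n)
    ]


# Hypothetical example: the path 0 - 1 - 2 - 3 (a tree, so every block is an edge).
path_adj = [[1], [0, 2], [1, 3], [2]]
for row in eccentricity_matrix(path_adj):
    print(row)
```

For this path the distance-1 entries all vanish, because every vertex has eccentricity at least 2; only the entries of length 2 and 3 that realise an eccentricity survive.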

preprint, 2022, arXiv

Short-term passenger flow prediction for multi-traffic modes: A Transformer and residual network based multi-task learning method

With the growing prevalence of mobility as a service (MaaS), it becomes increasingly important to manage multiple traffic modes simultaneously and cooperatively. As an important component of MaaS, short-term passenger flow prediction for multi-traffic modes has thus been brought into focus. It is a challenging problem because the spatiotemporal features of multi-traffic modes are highly complex. Moreover, the passenger flows of multi-traffic modes differ and fluctuate significantly. To solve these problems, this paper proposes a multitask learning-based model, called Res-Transformer, for short-term inflow prediction of multi-traffic modes (subway, taxi, and bus). Each traffic mode is treated as a single task in the model. The Res-Transformer consists of two parts: (1) several modified Transformer layers comprising the conv-Transformer layer and the multi-head attention mechanism, which help to extract the spatial and temporal features of multi-traffic modes, and (2) a residual network structure, which is used to obtain the correlations of different traffic modes and to prevent gradient vanishing, gradient explosion, and overfitting. The Res-Transformer model is evaluated on two large-scale real-world datasets from Beijing, China. One covers the region of a traffic hub and the other the region of a residential area. Experiments are conducted to compare the performance of the proposed model with several baseline models. Results prove the effectiveness and robustness of the proposed method. This paper can give critical insights into the short-term inflow prediction for multi-traffic modes.

preprint, 2022, arXiv

Towards Understanding the Faults of JavaScript-Based Deep Learning Systems

Quality assurance is of great importance for deep learning (DL) systems, especially when they are applied in safety-critical applications. While quality issues of native DL applications have been extensively analyzed, the issues of JavaScript-based DL applications have never been systematically studied. Compared with native DL applications, JavaScript-based DL applications can run on major browsers, making them platform- and device-independent. Specifically, the quality of JavaScript-based DL applications depends on three parts: the application, the third-party DL library used, and the underlying DL framework (e.g., TensorFlow.js), together called a JavaScript-based DL system. In this paper, we conduct the first empirical study on the quality issues of JavaScript-based DL systems. Specifically, we collect and analyze 700 real-world faults from relevant GitHub repositories, including the official TensorFlow.js repository, 13 third-party DL libraries, and 58 JavaScript-based DL applications. To better understand the characteristics of these faults, we manually analyze and construct taxonomies for the fault symptoms, root causes, and fix patterns, respectively. Moreover, we also study the fault distributions of symptoms and root causes, in terms of the different stages of the development lifecycle, the 3-level architecture in the DL system, and the 4 major components of the TensorFlow.js framework. Based on the results, we suggest actionable implications and research avenues that can potentially facilitate the development, testing, and debugging of JavaScript-based DL systems.

preprint, 2021, arXiv

Coronal rain in randomly heated arcades

Adopting the MPI-AMRVAC code, we present a 2.5-dimensional magnetohydrodynamic (MHD) simulation, which includes thermal conduction and radiative cooling, to investigate the formation and evolution of the coronal rain phenomenon. We perform the simulation in initially linear force-free magnetic fields which host chromospheric, transition region, and coronal plasma, with turbulent heating localized on their footpoints. Due to thermal instability, condensations start to occur at the loop top, and rebound shocks are generated by the siphon inflows. Condensations fragment into smaller blobs moving downwards and as they hit the lower atmosphere, concurrent upflows are triggered. Larger clumps show us clear "coronal rain showers" as dark structures in synthetic EUV hot channels and bright blobs with cool cores in the 304 Å channel, well resembling real observations. Following coronal rain dynamics for more than 10 hours, we carry out a statistical study of all coronal rain blobs to quantify their widths, lengths, areas, velocity distributions, and other properties. The coronal rain shows us continuous heating-condensation cycles, as well as cycles in EUV emissions. Compared to previous studies adopting steady heating, the rain happens faster and in more erratic cycles. Although most blobs are falling downward, upward-moving blobs exist at basically every moment. We also track the movement of individual blobs to study their dynamics and the forces driving their movements. The blobs have a prominence-corona transition-region-like structure surrounding them, and their movements are dominated by the pressure evolution in the very dynamic loop system.

preprint, 2021, arXiv

Generating Informative CVE Description From ExploitDB Posts by Extractive Summarization

ExploitDB is an important public website that contributes a large number of vulnerabilities to the official CVE database. Over 60% of these vulnerabilities have high- or critical-security risks. Unfortunately, over 73% of exploits appear publicly earlier than the corresponding CVEs, and about 40% of exploits do not even have CVEs. To assist in documenting CVEs for the ExploitDB posts, we propose an open information method to extract 9 key vulnerability aspects (vulnerable product/version/component, vulnerability type, vendor, attacker type, root cause, attack vector and impact) from the verbose and noisy ExploitDB posts. The extracted aspects from an ExploitDB post are then composed into a CVE description according to the suggested CVE description templates, which is required information for requesting new CVEs. Through the evaluation on 13,017 manually labeled sentences and the statistical sampling of 3,456 extracted aspects, we confirm the high accuracy of our extraction method. Compared with 27,230 reference CVE descriptions, our composed CVE descriptions achieve a high ROUGE-L score (0.38), a longest common subsequence based metric for evaluating text summarization methods.
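To make the metric named in this abstract concrete, here is a minimal sketch of a sentence-level, longest-common-subsequence based score in the style of ROUGE-L. It is a simplification under stated assumptions: whitespace tokenization, lowercasing, and the balanced F1 form are choices made here for illustration, the abstract does not say which ROUGE-L configuration or tooling the authors used, and the function names and example strings are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (DP)."""
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Simplified ROUGE-L style F1: LCS-based precision and recall, balanced."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(cand_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical strings, not taken from the paper's data.
reference = "overflow in parser allows remote code execution"
candidate = "overflow in parser allows local code execution"
score = rouge_l_f1(candidate, reference)
print(round(score, 3))
```

Because the score rewards in-order token overlap rather than contiguous n-grams, a composed description can score well even when it inserts or drops a few words relative to the reference.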

preprint, 2021, arXiv

Stereo Correspondence and Reconstruction of Endoscopic Data Challenge

The stereo correspondence and reconstruction of endoscopic data sub-challenge was organized during the Endovis challenge at MICCAI 2019 in Shenzhen, China. The task was to perform dense depth estimation using 7 training datasets and 2 test sets of structured light data captured using porcine cadavers. These were provided by a team at Intuitive Surgical. Ten teams participated on the challenge day. This paper contains 3 additional methods that were submitted after the challenge finished, as well as a supplemental section from these teams on issues they found with the dataset.

preprint, 2020, arXiv

A3Ident: A Two-phased Approach to Identify the Leading Authors of Android Apps

Authorship identification is the process of identifying and classifying authors from given code. Authorship identification can be used in a wide range of software domains, e.g., code authorship disputes, plagiarism detection, and exposure of attackers' identities. Besides the inherent challenges from legacy software development, framework programming and the crowdsourcing mode in Android significantly increase the difficulty of authorship identification. More specifically, widespread third-party libraries and inherited components (e.g., classes, methods, and variables) dilute the primary code within the entire Android app and blur the boundaries of code written by different authors. However, prior research has not addressed these challenges well. To this end, we design a two-phased approach to attribute the primary code of an Android app to the specific developer. In the first phase, we put forward three types of strategies to identify the relationships between Java packages in an app, which consist of context, semantic and structural relationships. A package aggregation algorithm is developed to cluster all packages that were, with high probability, written by the same authors. In the second phase, we develop three types of features to capture authors' coding habits and code stylometry. Based on that, we generate fingerprints for an author from the Android apps they developed and employ several machine learning algorithms for authorship classification. We evaluate our approach on three datasets that contain 15,666 apps from 257 distinct developers and achieve a 92.5% accuracy rate on average. Additionally, we test it on 2,900 obfuscated apps, which our approach classifies with an accuracy rate of 80.4%.

preprint, 2020, arXiv

Multi-graph convolutional network for short-term passenger flow forecasting in urban rail transit

Short-term passenger flow forecasting is a crucial task for urban rail transit operations. Emerging deep-learning technologies have become effective methods for addressing this problem. In this study, the authors propose a deep-learning architecture called Conv-GCN that combines a graph convolutional network (GCN) and a three-dimensional (3D) convolutional neural network (3D CNN). First, they introduce a multi-graph GCN to deal with three inflow and outflow patterns (recent, daily, and weekly) separately. Multi-graph GCN networks can capture spatiotemporal correlations and topological information within the entire network. A 3D CNN is then applied to deeply integrate the inflow and outflow information. High-level spatiotemporal features between different inflow and outflow patterns and between stations that are nearby and far away can be extracted by the 3D CNN. Finally, a fully connected layer is used to output results. The Conv-GCN model is evaluated on smart card data of the Beijing subway at time intervals of 10, 15, and 30 min. Results show that this model yields the best performance compared with seven other models. In terms of the root-mean-square errors, the performances under the three time intervals have been improved by 9.402%, 7.756%, and 9.256%, respectively. This study can provide critical insights for subway operators to optimise urban rail transit operations.

preprint, 2020, arXiv

Predicting Missing Information of Key Aspects in Vulnerability Reports

Software vulnerabilities have been continually disclosed and documented. An important practice in documenting vulnerabilities is to describe the key vulnerability aspects, such as vulnerability type, root cause, affected product, impact, attacker type and attack vector, for the effective search and management of fast-growing vulnerabilities. We investigate 120,103 vulnerability reports in the Common Vulnerabilities and Exposures (CVE) over the past 20 years. We find that 56%, 85%, 38% and 28% of CVEs miss vulnerability type, root causes, attack vector and attacker type respectively. To help complete the missing information of these vulnerability aspects, we propose a neural-network based approach for predicting the missing information of a key aspect of a vulnerability based on the known aspects of the vulnerability. We explore the design space of the neural network models and empirically identify the most effective model design. Using a large-scale vulnerability dataset from CVE, we show that we can effectively train a neural-network based classifier with less than 20% of historical CVEs. Our model achieves prediction accuracies of 94%, 79%, 89% and 70% for vulnerability type, root cause, attacker type and attack vector, respectively. Our ablation study reveals the prominent correlations among vulnerability aspects and further confirms the practicality of our approach.