Source author record

Xiaohong Li

Xiaohong Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Software Engineering, astro-ph.SR, Machine Learning, Applications, Computer Vision, Cryptography and Security, Information Retrieval, math.CO, physics.soc-ph

Catalog footprint

What is connected

11 works

9 topics

4 close collaborators


preprint · 2022 · arXiv

Inertia and spectral symmetry of eccentricity matrices of some clique trees

The eccentricity matrix $\mathcal E(G)$ of a connected graph $G$ is obtained from the distance matrix of $G$ by leaving unchanged the largest nonzero entries in each row and each column, and replacing the remaining ones with zeros. In this paper, we consider the set $\mathcal C \mathcal T$ of clique trees whose blocks have at most two cut-vertices of the clique tree. After proving the irreducibility of the eccentricity matrix of a clique tree in $\mathcal C \mathcal T$ and finding its inertia indices, we show that every graph in $\mathcal C \mathcal T$ with more than $4$ vertices and odd diameter has two positive and two negative $\mathcal E$-eigenvalues. Positive $\mathcal E$-eigenvalues and negative $\mathcal E$-eigenvalues turn out to be equal in number even for graphs in $\mathcal C \mathcal T$ with even diameter; that shared cardinality also counts the 'diametrally distinguished' vertices. Finally, we prove that the spectrum of the eccentricity matrix of a clique tree $G$ in $\mathcal C \mathcal T$ is symmetric with respect to the origin if and only if $G$ has an odd diameter and exactly two adjacent central vertices.
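The definition at the start of this abstract can be illustrated with a small sketch. It assumes the usual reading of that definition: the entry for vertices u and v keeps the distance d(u, v) when it equals the smaller of the two eccentricities e(u) and e(v), and is zero otherwise. The function names and the adjacency-list input format are hypothetical and are not taken from the paper.

```python
from collections import deque


def distance_matrix(adj):
    """All-pairs shortest-path distances of a connected, unweighted graph.

    `adj` is an adjacency list: adj[u] is the list of neighbours of vertex u.
    """
    n = len(adj)
    dist = [[-1] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if dist[source][v] < 0:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


def eccentricity_matrix(adj):
    """Keep d(u, v) only where it equals min(e(u), e(v)); zero elsewhere."""
    dist = distance_matrix(adj)
    ecc = [max(row) for row in dist]  # eccentricity of each vertex
    n = len(adj)
    return [
        [
            dist[u][v] if u != v and dist[u][v] == min(ecc[u], ecc[v]) else 0
            for v in range(n)
        ]
        for u in range(n)
    ]


# Path on four vertices 0 - 1 - 2 - 3 (illustrative input only).
path4 = [[1], [0, 2], [1, 3], [2]]
for row in eccentricity_matrix(path4):
    print(row)
```

On this path the entries of value 1 between adjacent vertices are dropped, because each of them is smaller than the eccentricities of both endpoints.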

preprint · 2022 · arXiv

Short-term passenger flow prediction for multi-traffic modes: A Transformer and residual network based multi-task learning method

With the growing prevalence of mobility as a service (MaaS), it is becoming increasingly important to manage multiple traffic modes simultaneously and cooperatively. As an important component of MaaS, short-term passenger flow prediction for multi-traffic modes has thus been brought into focus. It is a challenging problem because the spatiotemporal features of multi-traffic modes are highly complex. Moreover, the passenger flows of the different traffic modes differ from one another and fluctuate significantly. To solve these problems, this paper proposes a multitask learning-based model, called Res-Transformer, for short-term inflow prediction of multi-traffic modes (subway, taxi, and bus). Each traffic mode is treated as a single task in the model. The Res-Transformer consists of two parts: (1) several modified Transformer layers comprising the conv-Transformer layer and the multi-head attention mechanism, which help to extract the spatial and temporal features of multi-traffic modes, and (2) a residual network structure, which is used to capture the correlations between different traffic modes and to prevent gradient vanishing, gradient explosion, and overfitting. The Res-Transformer model is evaluated on two large-scale real-world datasets from Beijing, China: one covers the region of a traffic hub and the other the region of a residential area. Experiments are conducted to compare the performance of the proposed model with several baseline models. Results demonstrate the effectiveness and robustness of the proposed method. This paper can give critical insights into short-term inflow prediction for multi-traffic modes.

preprint · 2022 · arXiv

Towards Understanding the Faults of JavaScript-Based Deep Learning Systems

Quality assurance is of great importance for deep learning (DL) systems, especially when they are applied in safety-critical applications. While quality issues of native DL applications have been extensively analyzed, the issues of JavaScript-based DL applications have never been systematically studied. Compared with native DL applications, JavaScript-based DL applications can run on major browsers, making them platform- and device-independent. Specifically, the quality of JavaScript-based DL applications depends on three parts: the application, the third-party DL library used, and the underlying DL framework (e.g., TensorFlow.js), together called a JavaScript-based DL system. In this paper, we conduct the first empirical study on the quality issues of JavaScript-based DL systems. Specifically, we collect and analyze 700 real-world faults from relevant GitHub repositories, including the official TensorFlow.js repository, 13 third-party DL libraries, and 58 JavaScript-based DL applications. To better understand the characteristics of these faults, we manually analyze and construct taxonomies for the fault symptoms, root causes, and fix patterns, respectively. Moreover, we also study the fault distributions of symptoms and root causes, in terms of the different stages of the development lifecycle, the 3-level architecture in the DL system, and the 4 major components of the TensorFlow.js framework. Based on the results, we suggest actionable implications and research avenues that can potentially facilitate the development, testing, and debugging of JavaScript-based DL systems.

preprint · 2021 · arXiv

Coronal rain in randomly heated arcades

Adopting the MPI-AMRVAC code, we present a 2.5-dimensional magnetohydrodynamic (MHD) simulation, which includes thermal conduction and radiative cooling, to investigate the formation and evolution of the coronal rain phenomenon. We perform the simulation in initially linear force-free magnetic fields which host chromospheric, transition region, and coronal plasma, with turbulent heating localized on their footpoints. Due to thermal instability, condensations start to occur at the loop top, and rebound shocks are generated by the siphon inflows. Condensations fragment into smaller blobs moving downwards and as they hit the lower atmosphere, concurrent upflows are triggered. Larger clumps show us clear "coronal rain showers" as dark structures in synthetic EUV hot channels and bright blobs with cool cores in the 304 Å channel, well resembling real observations. Following coronal rain dynamics for more than 10 hours, we carry out a statistical study of all coronal rain blobs to quantify their widths, lengths, areas, velocity distributions, and other properties. The coronal rain shows us continuous heating-condensation cycles, as well as cycles in EUV emissions. Compared to previous studies adopting steady heating, the rain happens faster and in more erratic cycles. Although most blobs are falling downward, upward-moving blobs exist at basically every moment. We also track the movement of individual blobs to study their dynamics and the forces driving their movements. The blobs have a prominence-corona transition-region-like structure surrounding them, and their movements are dominated by the pressure evolution in the very dynamic loop system.

preprint · 2021 · arXiv

Generating Informative CVE Description From ExploitDB Posts by Extractive Summarization

ExploitDB is an important public website that contributes a large number of vulnerabilities to the official CVE database. Over 60% of these vulnerabilities have high or critical security risks. Unfortunately, over 73% of exploits appear publicly earlier than the corresponding CVEs, and about 40% of exploits do not even have CVEs. To assist in documenting CVEs for the ExploitDB posts, we propose an open information method to extract 9 key vulnerability aspects (vulnerable product/version/component, vulnerability type, vendor, attacker type, root cause, attack vector and impact) from the verbose and noisy ExploitDB posts. The extracted aspects from an ExploitDB post are then composed into a CVE description according to the suggested CVE description templates, which is must-provided information for requesting new CVEs. Through the evaluation on 13,017 manually labeled sentences and the statistical sampling of 3,456 extracted aspects, we confirm the high accuracy of our extraction method. Compared with 27,230 reference CVE descriptions, our composed CVE descriptions achieve a high ROUGE-L score (0.38), a longest-common-subsequence-based metric for evaluating text summarization methods.
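The abstract describes its evaluation metric as based on the longest common subsequence (LCS). A minimal sketch of such a score follows. It assumes whitespace tokenization and a balanced F1 combination of LCS precision and recall; the abstract does not say which variant or weighting the authors used, and the function names and example sentences are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """LCS-based F1 score between a candidate and a reference text.

    Assumption: tokens are split on whitespace and precision and recall are
    weighted equally; published ROUGE-L variants may weight recall more.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Illustrative sentences only; they are not taken from the paper's data.
score = rouge_l_f1("the cat on the mat", "the cat sat on the mat")
print(round(score, 3))
```

Here the LCS has five tokens, so precision is 5/5 and recall is 5/6, which gives an F1 of 10/11.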

preprint · 2021 · arXiv

Stereo Correspondence and Reconstruction of Endoscopic Data Challenge

The stereo correspondence and reconstruction of endoscopic data sub-challenge was organized during the Endovis challenge at MICCAI 2019 in Shenzhen, China. The task was to perform dense depth estimation using 7 training datasets and 2 test sets of structured light data captured using porcine cadavers. These were provided by a team at Intuitive Surgical. Ten teams participated on the challenge day. This paper also covers 3 additional methods that were submitted after the challenge finished, as well as a supplemental section from these teams on issues they found with the dataset.

preprint · 2020 · arXiv

A3Ident: A Two-phased Approach to Identify the Leading Authors of Android Apps

Authorship identification is the process of identifying and classifying authors from given code. Authorship identification can be used in a wide range of software domains, e.g., code authorship disputes, plagiarism detection, and exposure of attackers' identities. Besides the inherent challenges from legacy software development, framework programming and the crowdsourcing mode in Android significantly increase the difficulty of authorship identification. More specifically, widespread third-party libraries and inherited components (e.g., classes, methods, and variables) dilute the primary code within the entire Android app and blur the boundaries of code written by different authors. However, prior research has not addressed these challenges well. To this end, we design a two-phased approach to attribute the primary code of an Android app to the specific developer. In the first phase, we put forward three types of strategies to identify the relationships between Java packages in an app, which consist of context, semantic and structural relationships. A package aggregation algorithm is developed to cluster all packages that are highly likely to have been written by the same authors. In the second phase, we develop three types of features to capture authors' coding habits and code stylometry. Based on that, we generate fingerprints for an author from the Android apps they developed and employ several machine learning algorithms for authorship classification. We evaluate our approach on three datasets that contain 15,666 apps from 257 distinct developers and achieve a 92.5% accuracy rate on average. Additionally, we test it on 2,900 obfuscated apps, and our approach can classify apps with an accuracy rate of 80.4%.

preprint · 2020 · arXiv

Multi-graph convolutional network for short-term passenger flow forecasting in urban rail transit

Short-term passenger flow forecasting is a crucial task for urban rail transit operations. Emerging deep-learning technologies have become effective methods used to overcome this problem. In this study, the authors propose a deep-learning architecture called Conv-GCN that combines a graph convolutional network (GCN) and a three-dimensional (3D) convolutional neural network (3D CNN). First, they introduce a multi-graph GCN to deal with three inflow and outflow patterns (recent, daily, and weekly) separately. Multi-graph GCN networks can capture spatiotemporal correlations and topological information within the entire network. A 3D CNN is then applied to deeply integrate the inflow and outflow information. High-level spatiotemporal features between different inflow and outflow patterns and between stations that are nearby and far away can be extracted by 3D CNN. Finally, a fully connected layer is used to output results. The Conv-GCN model is evaluated on smart card data of the Beijing subway under the time interval of 10, 15, and 30 min. Results show that this model yields the best performance compared with seven other models. In terms of the root-mean-square errors, the performances under three time intervals have been improved by 9.402, 7.756, and 9.256%, respectively. This study can provide critical insights for subway operators to optimise urban rail transit operations.

preprint · 2020 · arXiv

Predicting Missing Information of Key Aspects in Vulnerability Reports

Software vulnerabilities have been continually disclosed and documented. An important practice in documenting vulnerabilities is to describe the key vulnerability aspects, such as vulnerability type, root cause, affected product, impact, attacker type and attack vector, for the effective search and management of fast-growing vulnerabilities. We investigate 120,103 vulnerability reports in the Common Vulnerabilities and Exposures (CVE) over the past 20 years. We find that 56%, 85%, 38% and 28% of CVEs miss vulnerability type, root causes, attack vector and attacker type respectively. To help complete the missing information of these vulnerability aspects, we propose a neural-network based approach for predicting the missing information of a key aspect of a vulnerability based on the known aspects of the vulnerability. We explore the design space of the neural network models and empirically identify the most effective model design. Using a large-scale vulnerability dataset from CVE, we show that we can effectively train a neural-network based classifier with less than 20% of historical CVEs. Our model achieves prediction accuracies of 94%, 79%, 89% and 70% for vulnerability type, root cause, attacker type and attack vector, respectively. Our ablation study reveals the prominent correlations among vulnerability aspects and further confirms the practicality of our approach.

preprint · 2015 · arXiv

Trigger of a blowout jet in a solar coronal mass ejection associated with a flare

Using the multi-wavelength images and the photospheric magnetograms from the Solar Dynamics Observatory, we study the flare that was associated with the only coronal mass ejection (CME) in active region (AR) 12192. The eruption of a filament caused a blowout jet, and then an M4.0 class flare occurred. This flare was located at the edge of the AR instead of in the core region. The flare was close to the apparently "open" fields, appearing as extreme-ultraviolet structures that fan out rapidly. Due to the interaction between flare materials and "open" fields, the flare became an eruptive flare, leading to the CME. Then at the same site of the first eruption, another small filament erupted. With the high spatial and temporal resolution Hα data from the New Vacuum Solar Telescope at the Fuxian Solar Observatory, we investigate the interaction between the second filament and the nearby "open" lines. The filament reconnected with the "open" lines, forming a new system. To our knowledge, the detailed process of this kind of interaction is reported for the first time. Then the new system rotated due to the untwisting motion of the filament, implying that the twist was transferred from the closed filament system to the "open" system. In addition, the twist seemed to propagate from the lower atmosphere to the upper layers, and was eventually spread by the CME to the interplanetary space.

preprint · 2010 · arXiv

Prediction-based classification for longitudinal biomarkers

Assessment of circulating CD4 count change over time in HIV-infected subjects on antiretroviral therapy (ART) is a central component of disease monitoring. The increasing number of HIV-infected subjects starting therapy and the limited capacity to support CD4 count testing within resource-limited settings have fueled interest in identifying correlates of CD4 count change such as total lymphocyte count, among others. The application of modeling techniques will be essential to this endeavor due to the typically nonlinear CD4 trajectory over time and the multiple input variables necessary for capturing CD4 variability. We propose a prediction-based classification approach that involves first stage modeling and subsequent classification based on clinically meaningful thresholds. This approach draws on existing analytical methods described in the receiver operating characteristic curve literature while presenting an extension for handling a continuous outcome. Application of this method to an independent test sample results in greater than 98% positive predictive value for CD4 count change. The prediction algorithm is derived based on a cohort of $n=270$ HIV-1 infected individuals from the Royal Free Hospital, London who were followed for up to three years from initiation of ART. A test sample comprised of $n=72$ individuals from Philadelphia and followed for a similar length of time is used for validation. Results suggest that this approach may be a useful tool for prioritizing limited laboratory resources for CD4 testing after subjects start antiretroviral therapy.
