Researcher profile

Ming Xu

Ming Xu contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
18 works
0 followers
12 topics
4 close collaborators

Published work

18 published item(s)

Preprint, 2024, arXiv

Advanced Unstructured Data Processing for ESG Reports: A Methodology for Structured Transformation and Enhanced Analysis

In the evolving field of corporate sustainability, analyzing unstructured Environmental, Social, and Governance (ESG) reports is a complex challenge due to their varied formats and intricate content. This study introduces an innovative methodology utilizing the "Unstructured Core Library", specifically tailored to address these challenges by transforming ESG reports into structured, analyzable formats. Our approach significantly advances the existing research by offering high-precision text cleaning, adept identification and extraction of text from images, and standardization of tables within these reports. Emphasizing its capability to handle diverse data types, including text, images, and tables, the method adeptly manages the nuances of differing page layouts and report styles across industries. This research marks a substantial contribution to the fields of industrial ecology and corporate sustainability assessment, paving the way for the application of advanced NLP technologies and large language models in the analysis of corporate governance and sustainability. Our code is available at https://github.com/linancn/TianGong-AI-Unstructure.git.

Preprint, 2024, arXiv

Global Feature Pyramid Network

The visual feature pyramid has proven its effectiveness and efficiency in target detection tasks. Yet, current methodologies tend to overly emphasize inter-layer feature interaction, neglecting the crucial aspect of intra-layer feature adjustment. Experience underscores the significant advantages of intra-layer feature interaction in enhancing target detection tasks. While some approaches endeavor to learn condensed intra-layer feature representations using attention mechanisms or visual transformers, they overlook the incorporation of global information interaction. This oversight results in increased false detections and missed targets. To address this critical issue, this paper introduces the Global Feature Pyramid Network (GFPNet), an augmented version of PAFPN that integrates global information for enhanced target detection. Specifically, we leverage a lightweight MLP to capture global feature information, utilize the VNC encoder to process these features, and employ a parallel learnable mechanism to extract intra-layer features from the input image. Building on this foundation, we retain the PAFPN method to facilitate inter-layer feature interaction, extracting rich feature details across various levels. Compared to conventional feature pyramids, GFPN not only effectively focuses on inter-layer feature information but also captures global feature details, fostering intra-layer feature interaction and generating a more comprehensive and impactful feature representation. GFPN consistently demonstrates performance improvements over object detection baselines.

Preprint, 2023, arXiv

A clean-label graph backdoor attack method in node classification task

Backdoor attacks in the traditional graph neural networks (GNNs) field are easily detectable due to the dilemma of confusing labels. To explore the backdoor vulnerability of GNNs and create a more stealthy backdoor attack method, a clean-label graph backdoor attack method (CGBA) in the node classification task is proposed in this paper. Unlike existing backdoor attack methods, CGBA requires modification of neither node labels nor graph structure. Specifically, to solve the problem of inconsistency between the contents and labels of the samples, CGBA selects poisoning samples in a specific target class and uses the label of each sample as the target label (i.e., clean-label) after injecting triggers into the target samples. To guarantee the similarity of neighboring nodes, the raw features of the nodes are elaborately picked as triggers to further improve the concealment of the triggers. Extensive experimental results show the effectiveness of our method. When the poisoning rate is 0.04, CGBA can achieve an average attack success rate of 87.8%, 98.9%, 89.1%, and 98.5%, respectively.
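The clean-label idea in this abstract (poison only nodes that already belong to the target class, and leave labels and graph structure untouched) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `poison_clean_label`, the dictionary-style `trigger`, the random choice of nodes, and the decision to measure the poisoning rate against the total node count are all assumptions made for illustration.

```python
import random


def poison_clean_label(features, labels, target_class, rate, trigger, seed=0):
    """Return (poisoned_features, chosen_nodes); labels are never modified.

    `trigger` is a hypothetical mapping {feature_index: value}. How CGBA
    actually picks trigger features is not specified here.
    """
    rng = random.Random(seed)
    # Clean-label constraint: candidates already carry the target label.
    candidates = [i for i, y in enumerate(labels) if y == target_class]
    # Assumption: the poisoning rate is relative to the total number of nodes.
    budget = min(len(candidates), int(rate * len(labels)))
    chosen = sorted(rng.sample(candidates, budget))
    poisoned = [row[:] for row in features]  # copy so the input stays intact
    for i in chosen:
        for dim, value in trigger.items():
            poisoned[i][dim] = value  # inject the trigger into node features
    return poisoned, chosen
```

Because only feature values change, every poisoned node keeps a label that is consistent with its original class, which is what makes the attack harder to spot than label-flipping variants.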

Preprint, 2023, arXiv

Two Wrongs Don't Make a Right: Combating Confirmation Bias in Learning with Label Noise

Noisy labels damage the performance of deep networks. For robust learning, a prominent two-stage pipeline alternates between eliminating possible incorrect labels and semi-supervised training. However, discarding part of noisy labels could result in a loss of information, especially when the corruption has a dependency on data, e.g., class-dependent or instance-dependent. Moreover, from the training dynamics of a representative two-stage method, DivideMix, we identify the domination of confirmation bias: pseudo-labels fail to correct a considerable amount of noisy labels, and consequently, the errors accumulate. To sufficiently exploit information from noisy labels and mitigate wrong corrections, we propose Robust Label Refurbishment (Robust LR), a new hybrid method that integrates pseudo-labeling and confidence estimation techniques to refurbish noisy labels. We show that our method successfully alleviates the damage of both label noise and confirmation bias. As a result, it achieves state-of-the-art performance across datasets and noise types, namely CIFAR under different levels of synthetic noise and Mini-WebVision and ANIMAL-10N with real-world noise.

Preprint, 2022, arXiv

3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera Pedestrian Localization

Although deep-learning based methods for monocular pedestrian detection have made great progress, they are still vulnerable to heavy occlusions. Using multi-view information fusion is a potential solution but has limited applications, due to the lack of annotated training samples in existing multi-view datasets, which increases the risk of overfitting. To address this problem, a data augmentation method is proposed to randomly generate 3D cylinder occlusions, on the ground plane, which are of the average size of pedestrians and projected to multiple views, to relieve the impact of overfitting in the training. Moreover, the feature map of each view is projected to multiple parallel planes at different heights, by using homographies, which allows the CNNs to fully utilize the features across the height of each pedestrian to infer the locations of pedestrians on the ground plane. The proposed 3DROM method has a greatly improved performance in comparison with the state-of-the-art deep-learning based methods for multi-view pedestrian detection.
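The multi-layer projection described here relies on one homography per horizontal plane. As a rough sketch (not the 3DROM code), if a camera has a 3x4 projection matrix `P`, the homography from the world plane at height `h` to the image is obtained by folding `h` into the translation column. The names `plane_homography` and `project_plane_point` and the example matrix are assumptions for illustration; warping an image feature map onto the plane would use the inverse of this homography.

```python
def plane_homography(P, h):
    """3x3 homography mapping (x, y, 1) on the plane z = h to image coordinates.

    For a world point (x, y, h, 1), P @ point equals
    x * P[:, 0] + y * P[:, 1] + (h * P[:, 2] + P[:, 3]).
    """
    return [[row[0], row[1], h * row[2] + row[3]] for row in P]


def project_plane_point(H, x, y):
    """Apply the homography to a plane point and dehomogenize."""
    u, v, w = (r[0] * x + r[1] * y + r[2] for r in H)
    return u / w, v / w


# Hypothetical camera looking along the z axis, one unit from the origin.
P = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
# One homography per plane height, as in a multi-layer projection.
homographies = {h: plane_homography(P, h) for h in (0.0, 1.0)}
```

With this toy camera, the same ground coordinates land at different image positions for different heights, which is why a separate homography is needed for each layer.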

Preprint, 2022, arXiv

Improving Road Segmentation in Challenging Domains Using Similar Place Priors

Road segmentation in challenging domains, such as night, snow or rain, is a difficult task. Most current approaches boost performance using fine-tuning, domain adaptation, style transfer, or by referencing previously acquired imagery. These approaches share one or more of three significant limitations: a reliance on large amounts of annotated training data that can be costly to obtain, both anticipation of and training data from the type of environmental conditions expected at inference time, and/or imagery captured from a previous visit to the location. In this research, we remove these restrictions by improving road segmentation based on similar places. We use Visual Place Recognition (VPR) to find similar but geographically distinct places, and fuse segmentations for query images and these similar place priors using a Bayesian approach and novel segmentation quality metric. Ablation studies show the need to re-evaluate notions of VPR utility for this task. We demonstrate the system achieving state-of-the-art road segmentation performance across multiple challenging condition scenarios including night time and snow, without requiring any prior training or previous access to the same geographical locations. Furthermore, we show that this method is network agnostic, improves multiple baseline techniques and is competitive against methods specialised for road prediction.

Preprint, 2022, arXiv

Improving Worst Case Visual Localization Coverage via Place-specific Sub-selection in Multi-camera Systems

6-DoF visual localization systems utilize principled approaches rooted in 3D geometry to perform accurate camera pose estimation of images to a map. Current techniques use hierarchical pipelines and learned 2D feature extractors to improve scalability and increase performance. However, despite gains in typical recall@0.25m type metrics, these systems still have limited utility for real-world applications like autonomous vehicles because of their 'worst' areas of performance - the locations where they provide insufficient recall at a certain required error tolerance. Here we investigate the utility of using 'place-specific configurations', where a map is segmented into a number of places, each with its own configuration for modulating the pose estimation step, in this case selecting a camera within a multi-camera system. On the Ford AV benchmark dataset, we demonstrate substantially improved worst-case localization performance compared to using off-the-shelf pipelines - minimizing the percentage of the dataset which has low recall at a certain error tolerance, as well as improved overall localization performance. Our proposed approach is particularly applicable to the crowdsharing model of autonomous vehicle deployment, where a fleet of AVs is regularly traversing a known route.

Preprint, 2022, arXiv

Mixed-UNet: Refined Class Activation Mapping for Weakly-Supervised Semantic Segmentation with Multi-scale Inference

Deep learning techniques have shown great potential in medical image processing, particularly through accurate and reliable image segmentation on magnetic resonance imaging (MRI) scans or computed tomography (CT) scans, which allow the localization and diagnosis of lesions. However, training these segmentation models requires a large number of manually annotated pixel-level labels, which are time-consuming and labor-intensive, in contrast to image-level labels that are easier to obtain. It is imperative to resolve this problem through weakly-supervised semantic segmentation models using image-level labels as supervision since it can significantly reduce human annotation efforts. Most of the advanced solutions exploit class activation mapping (CAM). However, the original CAMs rarely capture the precise boundaries of lesions. In this study, we propose the strategy of multi-scale inference to refine CAMs by reducing the detail loss in single-scale reasoning. For segmentation, we develop a novel model named Mixed-UNet, which has two parallel branches in the decoding phase. The results can be obtained after fusing the extracted features from two branches. We evaluate the designed Mixed-UNet against several prevalent deep learning-based segmentation approaches on our dataset collected from the local hospital and public datasets. The validation results demonstrate that our model surpasses available methods under the same supervision level in the segmentation of various lesions from brain imaging.

Preprint, 2022, arXiv

Naturally reductive $(α_1, α_2)$ metrics

Let $F$ be a homogeneous $(α_1,α_2)$ metric on the reductive homogeneous manifold $G/H$. Firstly, we characterize the natural reductiveness of $F$ as a local $f$-product between naturally reductive Riemannian metrics. Secondly, we prove the equivalence among several properties of $F$ for its mean Berwald curvature and S-curvature. Finally, we find an explicit flag curvature formula when $F$ is naturally reductive.

Preprint, 2022, arXiv

Neuro-Symbolic Learning: Principles and Applications in Ophthalmology

Neural networks have been rapidly expanding in recent years, with novel strategies and applications. However, challenges such as interpretability, explainability, robustness, safety, trust, and sensibility remain unsolved in neural network technologies, even though they must unavoidably be addressed for critical applications. Attempts have been made to overcome these challenges in neural network computing by representing and embedding domain knowledge in terms of symbolic representations. Thus, the notion of neuro-symbolic learning (NeSyL) emerged, which incorporates aspects of symbolic representation and brings common sense into neural networks. In domains where interpretability, reasoning, and explainability are crucial, such as video and image captioning, question-answering and reasoning, health informatics, and genomics, NeSyL has shown promising outcomes. This review presents a comprehensive survey on the state-of-the-art NeSyL approaches, their principles, advances in machine and deep learning algorithms, applications such as ophthalmology, and most importantly, future perspectives of this emerging field.

Preprint, 2022, arXiv

On the girth cycles of the bipartite graph $D(k,q)$

For integer $k\geq2$ and prime power $q$, the algebraic bipartite graph $D(k,q)$ proposed by Lazebnik and Ustimenko (1995) is meaningful not only in extremal graph theory but also in coding theory and cryptography. This graph is $q$-regular, edge-transitive and of girth at least $k+4$. For its exact girth $g=g(D(k,q))$, Füredi et al. (1995) conjectured $g=k+5$ for odd $k$ and $q\geq4$. This conjecture was shown to be valid in 2016 when $(k+5)/2$ is the product of an arbitrary factor of $q-1$ and an arbitrary power of the characteristic of $\mathbb{F}_q$. In this paper, we determine all the girth cycles of $D(k,q)$ for $3\leq k\leq 5$, $q>3$, and those for $3\leq k\leq8$, $q=3$.

Preprint, 2022, arXiv

Randers and $(α,β)$ equigeodesics for some compact homogeneous manifolds

A smooth curve on $G/H$ is called a Riemannian equigeodesic if it is a homogeneous geodesic for all $G$-invariant Riemannian metrics on $G/H$. With the $G$-invariant Riemannian metric replaced by other classes of $G$-invariant metrics, we can similarly define Finsler equigeodesic, Randers equigeodesic, $(α,β)$ equigeodesic, etc. In this paper, we study Randers and $(α,β)$ equigeodesics. For a compact homogeneous manifold, we prove Randers and $(α,β)$ equigeodesics are equivalent, and find a criterion for them. Using this criterion we can classify the equigeodesics on many compact homogeneous manifolds which permit non-Riemannian homogeneous Randers metrics, including four classes of homogeneous spheres.

Preprint, 2021, arXiv

Parallel translations for a left invariant spray

In this paper, we study the left invariant spray geometry on a connected Lie group. Using the technique of invariant frames, we find the ordinary differential equations on the Lie algebra that describe, for a left invariant spray structure, the linearly parallel translations along a geodesic and the nonlinearly parallel translations along a smooth curve. In these equations, the connection operator plays an important role. Using linearly parallel translations, we provide alternative interpretations or proofs for some homogeneous curvature formulae. Concerning the nonlinearly parallel translations, we propose two questions in left invariant spray geometry. One question generalizes the Landsberg Problem in Finsler geometry, and the other concerns the restricted holonomy group.

Preprint, 2021, arXiv

Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition

Visual Place Recognition is a challenging task for robotics and autonomous systems, which must deal with the twin problems of appearance and viewpoint change in an always changing world. This paper introduces Patch-NetVLAD, which provides a novel formulation for combining the advantages of both local and global descriptor methods by deriving patch-level features from NetVLAD residuals. Unlike the fixed spatial neighborhood regime of existing local keypoint features, our method enables aggregation and matching of deep-learned local features defined over the feature-space grid. We further introduce a multi-scale fusion of patch features that have complementary scales (i.e. patch sizes) via an integral feature space and show that the fused features are highly invariant to both condition (season, structure, and illumination) and viewpoint (translation and rotation) changes. Patch-NetVLAD outperforms both global and local feature descriptor-based methods with comparable compute, achieving state-of-the-art visual place recognition results on a range of challenging real-world datasets, including winning the Facebook Mapillary Visual Place Recognition Challenge at ECCV2020. It is also adaptable to user requirements, with a speed-optimised version operating over an order of magnitude faster than the state-of-the-art. By combining superior performance with improved computational efficiency in a configurable framework, Patch-NetVLAD is well suited to enhance both stand-alone place recognition capabilities and the overall performance of SLAM systems.

Preprint, 2020, arXiv

EPINE: Enhanced Proximity Information Network Embedding

Unsupervised homogeneous network embedding (NE) maps every vertex of a network to a low-dimensional vector while preserving the network information. Adjacency matrices retain most of the network information and directly characterize the first-order proximity. In this work, we devote ourselves to mining valuable information in adjacency matrices at a deeper level. Under the same objective, many NE methods calculate high-order proximity by the powers of adjacency matrices, which is neither accurate nor well-designed enough. Instead, we propose to redefine high-order proximity in a more intuitive manner. Besides, we design a novel algorithm for calculation, which alleviates the scalability problem in the field of accurate calculation for high-order proximity. Comprehensive experiments on real-world network datasets demonstrate the effectiveness of our method in downstream machine learning tasks such as network reconstruction, link prediction and node classification.
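For context, the baseline this abstract argues against, high-order proximity taken from powers of the adjacency matrix, can be sketched in a few lines. This illustrates only that baseline, not EPINE's redefined proximity, whose details are not given here. The helper names `mat_mul` and `adjacency_power` are assumptions.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def adjacency_power(adj, k):
    """Return A^k for k >= 1; entry (i, j) counts walks of length k from i to j."""
    result = adj
    for _ in range(k - 1):
        result = mat_mul(result, adj)
    return result


# Path graph 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
A2 = adjacency_power(A, 2)  # [[1, 0, 1], [0, 2, 0], [1, 0, 1]]
```

Note that the diagonal of `A2` is nonzero: walks that step out and immediately return are counted too, which is one way in which raw matrix powers can be a blunt measure of proximity between distinct vertices.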

Preprint, 2020, arXiv

On the characterization of some algebraically defined bipartite graphs of girth eight

For any field $\mathbb{F}$ and polynomials $f_{2},f_{3}\in\mathbb{F}[x,y]$, let $Γ_{\mathbb{F}}(f_{2},f_{3})$ denote the bipartite graph with vertex partition $P\cup L$, where $P$ and $L$ are two copies of $\mathbb{F}^{3}$, and $(p_{1},p_{2},p_{3})\in P$ is adjacent to $[l_{1},l_{2},l_{3}]\in L$ if and only if $p_{2}+l_{2}=f_{2}(p_{1},l_{1})$ and $p_{3}+l_{3}=f_{3}(p_{1},l_{1})$. The graph $Γ_{3}(\mathbb{F})=Γ_{\mathbb{F}}(xy,xy^{2})$ is known to be of girth eight. When $\mathbb{F}=\mathbb{F}_q$ is a finite field of odd size $q$ or $\mathbb{F}=\mathbb{F}_{\infty}$ is an algebraically closed field of characteristic zero, the graph $Γ_{3}(\mathbb{F})$ is conjectured to be the unique one with girth at least eight among those $Γ_{\mathbb{F}}(f_{2},f_{3})$ up to isomorphism. This conjecture has been confirmed for the case that both $f_{2},f_{3}$ are monomials over $\mathbb{F}_q$, and for the case that at least one of $f_{2},f_{3}$ is a monomial over $\mathbb{F}_{\infty}$. If one of $f_{2},f_{3}\in\mathbb{F}_q[x,y]$ is a monomial, it has also been proved that there exists a positive integer $M$ such that $G=Γ_{\mathbb{F}_{q^{M}}}(f_2,f_3)$ is isomorphic to $Γ_{3}(\mathbb{F}_{q^{M}})$ provided $G$ has girth at least eight. In this paper, these results are shown to be valid when the restriction on the polynomials $f_2,f_3$ is relaxed further to require only that one of them is the product of two univariate polynomials. Furthermore, all such polynomials $f_2,f_3$ are characterized completely.
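The adjacency rule above is concrete enough to build small instances directly. The following sketch constructs $Γ_{\mathbb{F}}(f_2,f_3)$ over a prime field $\mathbb{F}_p$ and measures its girth with a breadth-first search from every vertex. The function names `build_gamma` and `girth`, and the restriction to prime fields (so that arithmetic is plain integer arithmetic modulo $p$), are assumptions made for illustration.

```python
from collections import deque
from itertools import product


def build_gamma(p, f2, f3):
    """Adjacency lists of Gamma_F(f2, f3) over the prime field F_p."""
    adj = {}
    for p1, p2, p3 in product(range(p), repeat=3):
        point = ("P", p1, p2, p3)
        for l1 in range(p):
            # Adjacency: p2 + l2 = f2(p1, l1) and p3 + l3 = f3(p1, l1) in F_p,
            # so l2 and l3 are determined once the point and l1 are fixed.
            l2 = (f2(p1, l1) - p2) % p
            l3 = (f3(p1, l1) - p3) % p
            line = ("L", l1, l2, l3)
            adj.setdefault(point, []).append(line)
            adj.setdefault(line, []).append(point)
    return adj


def girth(adj):
    """Length of a shortest cycle, found by a BFS from every vertex."""
    best = float("inf")
    for root in adj:
        dist = {root: 0}
        parent = {root: None}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    parent[v] = u
                    queue.append(v)
                elif parent[u] != v:
                    # Non-tree edge closes a cycle through the BFS tree.
                    best = min(best, dist[u] + dist[v] + 1)
    return best


# Gamma_3(F_3) = Gamma_{F_3}(xy, xy^2): 54 vertices, every vertex of degree 3.
gamma3 = build_gamma(3, lambda x, y: x * y, lambda x, y: x * y * y)
print(len(gamma3), girth(gamma3))  # prints "54 8"
```

Swapping in other polynomial pairs for `f2` and `f3` gives a quick way to experiment with which choices keep the girth at eight on small fields.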

Preprint, 2020, arXiv

Side-On transition radiation detector: a detector prototype for TeV energy scale calibration of calorimeters in space

Transition Radiation (TR) plays an important role in particle identification in high-energy physics and its characteristics provide a feasible method of energy calibration in the energy range up to 10 TeV, which is of interest for dark matter searches in cosmic rays. In a Transition Radiation Detector (TRD), the TR signal is superimposed onto the ionization energy loss signal induced by incident charged particles. In order to make the TR signal stand out from the background of ionization energy loss in a significant way, we optimized both the radiators and the detector. We have designed a new prototype of regular radiator optimized for a maximal TR photon yield, combined with the Side-On TRD which is supposed to improve the detection efficiency of TR. We started a test beam experiment with the Side-On TRD at Conseil Européen pour la Recherche Nucléaire (CERN), and found that the experimental data is consistent with the simulation results.

Preprint, 2020, arXiv

Triaging moderate COVID-19 and other viral pneumonias from routine blood tests

COVID-19 is sweeping the world with deadly consequences. Its contagious nature and clinical similarity to other pneumonias make separating subjects who have contracted COVID-19 from those with non-COVID-19 viral pneumonia a priority and a challenge. However, COVID-19 testing has been greatly limited by the availability and cost of existing methods, even in developed countries like the US. Intrigued by the wide availability of routine blood tests, we propose to leverage them for COVID-19 testing using the power of machine learning. Two proven-robust machine learning model families, random forests (RFs) and support vector machines (SVMs), are employed to tackle the challenge. Trained on blood data from 208 moderate COVID-19 subjects and 86 subjects with non-COVID-19 moderate viral pneumonia, the best result is obtained in an SVM-based classifier with an accuracy of 84%, a sensitivity of 88%, a specificity of 80%, and a precision of 92%. The results are found explainable from both machine learning and medical perspectives. A privacy-protected web portal is set up to help medical personnel in their practice and the trained models are released for developers to further build other applications. We hope our results can help the world fight this pandemic and welcome clinical verification of our approach on larger populations.
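The four figures quoted in this abstract (accuracy, sensitivity, specificity, precision) all derive from the same confusion matrix. As a reminder of how they relate, here is a small sketch; the function name `binary_metrics` is an assumption, and the counts in the example are made up for illustration and are not the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # recall on the positive class
        "specificity": tn / (tn + fp),  # recall on the negative class
        "precision": tp / (tp + fp),    # share of positive calls that are right
    }


# Hypothetical counts, NOT taken from the paper.
metrics = binary_metrics(tp=8, fp=4, tn=6, fn=2)
```

With imbalanced classes, as in a cohort where positives outnumber negatives, precision can sit well above specificity, so the four numbers are best read together rather than in isolation.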