Researcher profile

Shengxuan Luo

Shengxuan Luo contributes to research discovery and scholarly infrastructure.

Published work

4 published items

preprint, 2022, arXiv

An Accurate Unsupervised Method for Joint Entity Alignment and Dangling Entity Detection

Knowledge graph integration typically suffers from widespread dangling entities that have no alignment across knowledge graphs (KGs). The dangling entity set is unavailable in most real-world scenarios, and manually mining entity pairs that share the same meaning is labor-intensive. In this paper, we propose a novel accurate Unsupervised method for joint Entity alignment (EA) and Dangling entity detection (DED), called UED. UED mines literal semantic information to generate pseudo entity pairs and globally guided alignment information for EA, and then uses the EA results to assist DED. We construct a medical cross-lingual knowledge graph dataset, MedED, providing data for both the EA and DED tasks. Extensive experiments demonstrate that in the EA task, UED achieves results comparable to those of state-of-the-art supervised EA baselines and, when combined with supervised EA data, outperforms the current state-of-the-art EA methods. For the DED task, UED obtains high-quality results without supervision.
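The pseudo-pair step mentioned in this abstract can be illustrated with a small sketch. It assumes that "literal semantic information" can be approximated by plain string similarity between entity names, which is a stand-in chosen for illustration and not the method the paper describes. The function names, the mutual-nearest-neighbour rule, and the `threshold` value are all hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (a stand-in for a semantic score)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pseudo_pairs(src_names, tgt_names, threshold=0.7):
    """Keep (source, target) pairs that are mutual nearest neighbours
    and whose similarity reaches the (hypothetical) threshold."""
    if not src_names or not tgt_names:
        return []
    sim = [[name_similarity(s, t) for t in tgt_names] for s in src_names]
    pairs = []
    for i, row in enumerate(sim):
        # Best target for source i.
        j = max(range(len(row)), key=row.__getitem__)
        # Best source for that target; the pair counts only if it points back to i.
        i_back = max(range(len(sim)), key=lambda k: sim[k][j])
        if i_back == i and row[j] >= threshold:
            pairs.append((src_names[i], tgt_names[j]))
    return pairs


src = ["Aspirin", "Type 2 diabetes", "Hepatic fibrosis"]
tgt = ["aspirin", "type 2 diabetes mellitus", "Migraine"]
pairs = pseudo_pairs(src, tgt)
```

In this toy input, "Hepatic fibrosis" and "Migraine" end up unpaired, which is the kind of entity a later dangling-detection step would have to handle.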

preprint, 2022, arXiv

BIOS: An Algorithmically Generated Biomedical Knowledge Graph

Biomedical knowledge graphs (BioMedKGs) are essential infrastructures for biomedical and healthcare big data and artificial intelligence (AI), facilitating natural language processing, model development, and data exchange. For decades, these knowledge graphs have been developed via expert curation; however, this method can no longer keep up with today's AI development, and a transition to algorithmically generated BioMedKGs is necessary. In this work, we introduce the Biomedical Informatics Ontology System (BIOS), the first large-scale publicly available BioMedKG generated completely by machine learning algorithms. BIOS currently contains 4.1 million concepts, 7.4 million terms in two languages, and 7.3 million relation triplets. We present the methodology for developing BIOS, including the curation of raw biomedical terms, computational identification of synonymous terms and aggregation of these terms to create concept nodes, semantic type classification of the concepts, relation identification, and biomedical machine translation. We provide statistics on the current BIOS content and perform preliminary assessments of term quality, synonym grouping, and relation extraction. The results suggest that machine learning-based BioMedKG development is a viable alternative to traditional expert curation.
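The step of aggregating synonymous terms into concept nodes can be sketched as a grouping problem. The example below assumes that synonym pairs have already been identified by some upstream model and merges them transitively with a union-find structure. This is one plausible way to do it, not necessarily how BIOS does, and the function name and sample terms are hypothetical.

```python
def group_synonyms(terms, synonym_pairs):
    """Merge terms connected by synonym pairs into concept groups (union-find)."""
    parent = {t: t for t in terms}

    def find(t):
        # Follow parent links to the root, shortening the path as we go.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in synonym_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for t in terms:
        groups.setdefault(find(t), []).append(t)
    # Sort for a deterministic result.
    return sorted(sorted(g) for g in groups.values())


terms = ["heart attack", "myocardial infarction", "MI",
         "aspirin", "acetylsalicylic acid", "fever"]
synonym_pairs = [("heart attack", "myocardial infarction"),
                 ("myocardial infarction", "MI"),
                 ("aspirin", "acetylsalicylic acid")]
concepts = group_synonyms(terms, synonym_pairs)
```

Transitive merging is a design choice with a known risk: a single wrong synonym pair fuses two concepts, so a real pipeline would likely need a stricter criterion than this sketch uses.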

preprint, 2022, arXiv

Semi-constraint Optimal Transport for Entity Alignment with Dangling Cases

Entity alignment (EA) merges knowledge graphs (KGs) by identifying the equivalent entities in different graphs, which can effectively enrich knowledge representations of KGs. However, in practice, different KGs often include dangling entities whose counterparts cannot be found in the other graph, which limits the performance of EA methods. To improve EA with dangling entities, we propose an unsupervised method called Semi-constraint Optimal Transport for Entity Alignment in Dangling cases (SoTead). Our main idea is to model the entity alignment between two KGs as an optimal transport problem from one KG's entities to the other's. First, we set pseudo entity pairs between KGs based on pretrained word embeddings. Then, we conduct contrastive metric learning to obtain the transport cost between each entity pair. Finally, we introduce a virtual entity for each KG to "align" the dangling entities from the other KG, which relaxes the optimization constraints and leads to a semi-constraint optimal transport. In the experiments, we first show the superiority of SoTead on a commonly used entity alignment dataset. In addition, to compare its dangling entity detection ability with other baselines, we construct a medical cross-lingual knowledge graph dataset, MedED, where SoTead also reaches state-of-the-art performance.
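The effect of the virtual entity can be seen in a toy version of the problem. The sketch below assumes a small, hand-written cost matrix and a fixed cost for sending an entity to the virtual entity (`dangling_cost`, a hypothetical parameter). It finds the cheapest plan by brute force, which is swapped in here for readability and is not the optimal transport solver of the paper.

```python
from itertools import product


def align_with_virtual_entity(cost, dangling_cost):
    """Assign each source entity to one real target (used at most once)
    or to a virtual entity (None), minimising the total cost by enumeration."""
    n_src = len(cost)
    n_tgt = len(cost[0]) if cost else 0
    options = list(range(n_tgt)) + [None]  # None stands for the virtual entity
    best_total, best_plan = float("inf"), None
    for plan in product(options, repeat=n_src):
        real = [t for t in plan if t is not None]
        if len(real) != len(set(real)):
            continue  # a real target may receive at most one source
        total = sum(dangling_cost if t is None else cost[s][t]
                    for s, t in enumerate(plan))
        if total < best_total:
            best_total, best_plan = total, plan
    return list(best_plan), best_total


# Rows are source entities, columns are target entities (made-up costs).
cost = [
    [0.1, 0.9, 0.8],
    [0.9, 0.2, 0.7],
    [0.8, 0.9, 0.9],
]
plan, total = align_with_virtual_entity(cost, dangling_cost=0.5)
# Targets that no source was assigned to are dangling on the target side.
dangling_targets = [t for t in range(len(cost[0])) if t not in plan]
```

Without the virtual entity, the third source would be forced onto the third target at a cost of 0.9. With it, both are left unmatched, which is the relaxation the abstract calls semi-constraint. Enumeration is only feasible for a handful of entities.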

preprint, 2022, arXiv

Sentence Alignment with Parallel Documents Facilitates Biomedical Machine Translation

Objective: Today's neural machine translation (NMT) can achieve near human-level translation quality and greatly facilitates international communications, but the lack of parallel corpora poses a key problem to the development of translation systems for highly specialized domains, such as biomedicine. This work presents an unsupervised algorithm for deriving parallel corpora from document-level translations by using sentence alignment and explores how training materials affect the performance of biomedical NMT systems. Materials and Methods: Document-level translations are mixed to train bilingual word embeddings (BWEs) for the evaluation of cross-lingual word similarity, and sentence distance is defined by combining semantic and positional similarities of the sentences. The alignment of sentences is formulated as an extended earth mover's distance problem. A Chinese-English biomedical parallel corpus is derived with the proposed algorithm using bilingual articles from UpToDate and translations of PubMed abstracts, which is then used for the training and evaluation of NMT. Results: On two manually aligned translation datasets, the proposed algorithm achieved accurate sentence alignment in the 1-to-1 cases and outperformed competing algorithms in the many-to-many cases. The NMT model fine-tuned on biomedical data significantly improved the in-domain translation quality (zh-en: +17.72 BLEU; en-zh: +17.02 BLEU). Both the size of the training data and the combination of different corpora can significantly affect the model's performance. Conclusion: The proposed algorithm relaxes the assumption for sentence alignment and effectively generates accurate translation pairs that facilitate training high quality biomedical NMT models.
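The sentence distance described under Materials and Methods combines a semantic term and a positional term. A minimal sketch of such a distance follows. It assumes each sentence is already represented by a vector (for example, averaged bilingual word embeddings), uses cosine distance for the semantic part and relative position for the positional part, and mixes them with a hypothetical weight `alpha`. The exact formula in the paper may differ, and the earth mover's distance alignment step is not implemented here.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def sentence_distance(src_vecs, tgt_vecs, alpha=0.5):
    """Distance matrix mixing semantic distance and relative-position distance."""
    m, n = len(src_vecs), len(tgt_vecs)
    dist = []
    for i, u in enumerate(src_vecs):
        row = []
        for j, v in enumerate(tgt_vecs):
            semantic = cosine_distance(u, v)
            # Compare where each sentence sits within its own document (0 to 1).
            positional = abs((i + 0.5) / m - (j + 0.5) / n)
            row.append(alpha * semantic + (1 - alpha) * positional)
        dist.append(row)
    return dist


# Two-sentence documents with made-up 2-dimensional sentence vectors.
src_vecs = [[1.0, 0.0], [0.0, 1.0]]
tgt_vecs = [[1.0, 0.0], [0.0, 1.0]]
dist = sentence_distance(src_vecs, tgt_vecs)
```

A matrix like `dist` would serve as the ground cost that an alignment solver minimises over. The positional term discourages matching sentences that sit far apart in their documents even when their vectors look similar.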