Researcher profile

Yibo Sun

Yibo Sun contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 17 (Unverified) · Verification L1 · Unclaimed author
4 works
0 followers
4 topics
4 close collaborators

Published work

4 published items

Preprint · 2022 · arXiv

ERNIE-GeoL: A Geography-and-Language Pre-trained Model and its Applications in Baidu Maps

Pre-trained models (PTMs) have become a fundamental backbone for downstream tasks in natural language processing and computer vision. Although applying generic PTMs to geo-related tasks at Baidu Maps initially yielded gains, we observed a clear performance plateau over time. One of the main reasons for this plateau is that generic PTMs lack readily available geographic knowledge. To address this problem, we present ERNIE-GeoL, a geography-and-language pre-trained model designed and developed to improve geo-related tasks at Baidu Maps. ERNIE-GeoL learns a universal geography-language representation by pre-training on large-scale data generated from a heterogeneous graph that contains abundant geographic knowledge. Extensive quantitative and qualitative experiments on large-scale real-world datasets demonstrate the superiority and effectiveness of ERNIE-GeoL. ERNIE-GeoL has been deployed in production at Baidu Maps since April 2021, where it significantly benefits the performance of various downstream tasks. This demonstrates that ERNIE-GeoL can serve as a fundamental backbone for a wide range of geo-related tasks.

Preprint · 2022 · arXiv

Understanding the Impact of the COVID-19 Pandemic on Transportation-related Behaviors with Human Mobility Data

The contained outbreak of COVID-19 in Mainland China has recently been regarded as a successful example of fighting this highly contagious virus. Both the short transmission period (about three months) and the sub-exponential increase of confirmed cases in Mainland China show that the Chinese authorities took effective epidemic prevention measures, such as case isolation, travel restrictions, closing recreational venues, and banning public gatherings. These measures can effectively control the spread of the COVID-19 pandemic. At the same time, they may dramatically change human mobility patterns, such as the public's daily transportation-related behaviors. To better understand the impact of COVID-19 on transportation-related behaviors and to support more targeted anti-epidemic measures, we use a large volume of human mobility data collected from Baidu Maps, a widely used Web mapping service in China, to examine in detail how people there responded during the pandemic. Specifically, we conduct a data-driven analysis of transportation-related behaviors during the pandemic from the perspectives of 1) means of transportation, 2) types of visited venues, 3) check-in times at venues, 4) preference for "origin-destination" distance, and 5) "origin-transportation-destination" patterns. For each topic, we also give specific insights and policy-making suggestions. Given that the COVID-19 pandemic is still spreading in more than 200 countries and territories worldwide, infecting millions of people, the insights and suggestions provided here may help fight COVID-19.

Preprint · 2020 · arXiv

Keyphrase Extraction with Span-based Feature Representations

Keyphrases provide semantic metadata that characterizes a document and gives an overview of its content. Because keyphrase extraction facilitates the management, categorization, and retrieval of information, it has received much attention in recent years. There are three main approaches to keyphrase extraction: (i) traditional two-step ranking methods, (ii) sequence labeling, and (iii) generation using neural networks. The two-step ranking approach relies on feature engineering, which is labor-intensive and domain-dependent. Sequence labeling cannot handle overlapping phrases. Generation methods (i.e., sequence-to-sequence neural network models) overcome these shortcomings, so they have been widely studied and achieve state-of-the-art performance. However, generation methods cannot use context information effectively. In this paper, we propose a novel Span Keyphrase Extraction model that extracts span-based feature representations of keyphrases directly from all the content tokens. In this way, our model obtains a representation for each keyphrase and further learns to capture the interactions between keyphrases in a document to produce better ranking results. In addition, because it builds spans directly from tokens, our model can extract overlapping keyphrases. Experimental results on benchmark datasets show that our proposed model outperforms existing methods by a large margin.

Preprint · 2020 · arXiv

Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation

In this paper, we focus on a new practical task, document-scale text content manipulation, which is the converse of text style transfer: it aims to preserve the text style while altering the content. Specifically, the input is a set of structured records together with a reference text that describes a different record set. The output is a summary that accurately describes part of the content in the source record set, written in the same style as the reference. The task is unsupervised because parallel data are lacking, and it is challenging because the model must select suitable records and style words from the two inputs respectively and generate a high-fidelity long document. To tackle these problems, we first build a dataset based on a basketball game report corpus as our testbed, and present an unsupervised neural model with an interactive attention mechanism that learns the semantic relationship between records and reference texts to achieve better content transfer and better style preservation. In addition, we explore the effectiveness of back-translation for constructing pseudo training pairs in this task. Empirical results show the superiority of our approaches over competitive methods, and the models also yield a new state-of-the-art result on a sentence-level dataset.