Source author record

Baichuan Zhang

Baichuan Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

4works
5topics
4close collaborators

Actions

Connect this record

Log in to claim

Research graph

See the researcher in context

Open full explorer

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2016arXiv

Bayesian Non-Exhaustive Classification A Case Study: Online Name Disambiguation using Temporal Record Streams

The name entity disambiguation task aims to partition the records of multiple real-life persons so that each partition contains records pertaining to a unique person. Most of the existing solutions for this task operate in a batch mode, where all records to be disambiguated are initially available to the algorithm. However, more realistic settings require that the name disambiguation task be performed in an online fashion, in addition to, being able to identify records of new ambiguous entities having no preexisting records. In this work, we propose a Bayesian non-exhaustive classification framework for solving online name disambiguation task. Our proposed method uses a Dirichlet process prior with a Normal * Normal * Inverse Wishart data model which enables identification of new ambiguous entities who have no records in the training data. For online classification, we use one sweep Gibbs sampler which is very efficient and effective. As a case study we consider bibliographic data in a temporal stream format and disambiguate authors by partitioning their papers into homogeneous groups. Our experimental results demonstrate that the proposed method is better than existing methods for performing online name disambiguation task.

preprint2016arXiv

Name Disambiguation from link data in a collaboration graph using temporal and topological features

In a social community, multiple persons may share the same name, phone number or some other identifying attributes. This, along with other phenomena, such as name abbreviation, name misspelling, and human error leads to erroneous aggregation of records of multiple persons under a single reference. Such mistakes affect the performance of document retrieval, web search, database integration, and more importantly, improper attribution of credit (or blame). The task of entity disambiguation partitions the records belonging to multiple persons with the objective that each decomposed partition is composed of records of a unique person. Existing solutions to this task use either biographical attributes, or auxiliary features that are collected from external sources, such as Wikipedia. However, for many scenarios, such auxiliary features are not available, or they are costly to obtain. Besides, the attempt of collecting biographical or external data sustains the risk of privacy violation. In this work, we propose a method for solving entity disambiguation task from link information obtained from a collaboration network. Our method is non-intrusive of privacy as it uses only the time-stamped graph topology of an anonymized network. Experimental results on two real-life academic collaboration networks show that the proposed method has satisfactory performance.

preprint2016arXiv

NOUS: Construction and Querying of Dynamic Knowledge Graphs

The ability to construct domain specific knowledge graphs (KG) and perform question-answering or hypothesis generation is a transformative capability. Despite their value, automated construction of knowledge graphs remains an expensive technical challenge that is beyond the reach for most enterprises and academic institutions. We propose an end-to-end framework for developing custom knowledge graph driven analytics for arbitrary application domains. The uniqueness of our system lies A) in its combination of curated KGs along with knowledge extracted from unstructured text, B) support for advanced trending and explanatory questions on a dynamic KG, and C) the ability to answer queries where the answer is embedded across multiple data sources.

preprint2016arXiv

Trust from the past: Bayesian Personalized Ranking based Link Prediction in Knowledge Graphs

Link prediction, or predicting the likelihood of a link in a knowledge graph based on its existing state is a key research task. It differs from a traditional link prediction task in that the links in a knowledge graph are categorized into different predicates and the link prediction performance of different predicates in a knowledge graph generally varies widely. In this work, we propose a latent feature embedding based link prediction model which considers the prediction task for each predicate disjointly. To learn the model parameters it utilizes a Bayesian personalized ranking based optimization technique. Experimental results on large-scale knowledge bases such as YAGO2 show that our link prediction approach achieves substantially higher performance than several state-of-art approaches. We also show that for a given predicate the topological properties of the knowledge graph induced by the given predicate edges are key indicators of the link prediction performance of that predicate in the knowledge graph.