Source author record

Hung Nghiep Tran

Hung Nghiep Tran appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Digital Libraries; Computation and Language; Distributed, Parallel, and Cluster Computing; Information Retrieval; Machine Learning

Catalog footprint

What is connected

4 works

5 topics

4 close collaborators


Preprint, 2017, arXiv

Author Name Disambiguation by Using Deep Neural Network

Author name ambiguity decreases the quality and reliability of information retrieved from digital libraries. Existing methods have tried to solve this problem by predefining a feature set based on expert knowledge for a specific dataset. In this paper, we propose a new approach that uses a deep neural network to learn features automatically from data. Additionally, we propose a general system architecture for author name disambiguation on any dataset. We evaluate the proposed method on a dataset containing Vietnamese author names. The results show that this method significantly outperforms other methods that use a predefined feature set. The proposed method achieves 99.31% accuracy. The prediction error rate decreases from 1.83% to 0.69%, i.e., by 1.14 percentage points, or 62.3% in relative terms, compared with other methods that use a predefined feature set (Table 3).
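The error figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only a worked check of the reported numbers; the function name `error_reduction` is hypothetical, and it assumes that accuracy is simply 100% minus the prediction error rate.

```python
def error_reduction(baseline_error, new_error):
    """Return (absolute reduction in percentage points, relative reduction in percent)."""
    absolute = baseline_error - new_error
    relative = absolute / baseline_error * 100.0
    return absolute, relative


# Error rates (in percent) as quoted in the abstract.
absolute, relative = error_reduction(1.83, 0.69)

# Assumption: accuracy = 100% - error rate.
accuracy = 100.0 - 0.69

print(f"absolute reduction: {absolute:.2f} percentage points")
print(f"relative reduction: {relative:.1f}%")
print(f"accuracy: {accuracy:.2f}%")
```

The absolute figure is a difference in percentage points, while the relative figure divides that difference by the baseline error rate, which is why the two numbers differ so much.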

Preprint, 2015, arXiv

A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Data are essential for experiments on scientific publication recommendation methods, but it is difficult to build ground truth data. A naturally promising solution is to use the publications that researchers reference to build their ground truth data. Unfortunately, this approach has not been explored in the literature, so its applicability remains a gap in our knowledge. In this research, we systematically study this approach through theoretical and empirical analyses. In general, the results show that this approach is reasonable and has many advantages. However, the empirical analysis shows both positive and negative results. We conclude that, in some situations, this is a useful alternative approach for overcoming data limitations. Based on this approach, we build and publish a dataset in the computer science domain to help advance other research.

Preprint, 2015, arXiv

Partitioning Algorithms for Improving Efficiency of Topic Modeling Parallelization

Topic modeling is a very powerful technique in data analysis and data mining but it is generally slow. Many parallelization approaches have been proposed to speed up the learning process. However, they are usually not very efficient because of the many kinds of overhead, especially the load-balancing problem. We address this problem by proposing three partitioning algorithms, which either run more quickly or achieve better load balance than current partitioning algorithms. These algorithms can easily be extended to improve parallelization efficiency on other topic models similar to LDA, e.g., Bag of Timestamps, which is an extension of LDA with time information. We evaluate these algorithms on two popular datasets, NIPS and NYTimes. We also build a dataset containing over 1,000,000 scientific publications in the computer science domain from 1951 to 2010 to experiment with Bag of Timestamps parallelization, which we design to demonstrate the proposed algorithms' extensibility. The results strongly confirm the advantages of these algorithms.
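To make the load-balancing problem concrete, here is a minimal sketch of a generic greedy partitioner. It is not one of the three algorithms proposed in the paper, which the abstract does not describe; it is the standard "longest item first, to the least-loaded worker" heuristic. The names `greedy_partition` and `balance_ratio` and the document lengths are hypothetical.

```python
import heapq


def greedy_partition(doc_lengths, num_workers):
    """Assign documents to workers, longest first, always to the least-loaded worker."""
    # Min-heap of (current load, worker id).
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_workers)]
    # Visit documents from longest to shortest.
    order = sorted(range(len(doc_lengths)), key=lambda d: doc_lengths[d], reverse=True)
    for doc in order:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(doc)
        heapq.heappush(heap, (load + doc_lengths[doc], worker))
    loads = [sum(doc_lengths[d] for d in docs) for docs in assignment]
    return assignment, loads


def balance_ratio(loads):
    """Mean load divided by maximum load; 1.0 means perfectly balanced."""
    return sum(loads) / (len(loads) * max(loads))


# Hypothetical token counts per document.
doc_lengths = [120, 80, 75, 60, 40, 30, 20, 10]
assignment, loads = greedy_partition(doc_lengths, 3)
print(loads, round(balance_ratio(loads), 3))
```

A ratio close to 1.0 means that no worker sits idle for long while waiting for the most heavily loaded one, which is the kind of overhead the abstract refers to.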

Preprint, 2015, arXiv

SciRecSys: A Recommendation System for Scientific Publication by Discovering Keyword Relationships

In this work, we propose a new approach for discovering various relationships among keywords in scientific publications based on a Markov chain model. This is an important problem because keywords are the basic elements for representing abstract objects such as documents, user profiles, topics, and many other things. Our model is very effective because it combines four important factors in scientific publications: content, publicity, impact, and randomness. In particular, we present a recommendation system (called SciRecSys) to help users efficiently find relevant articles.
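As a rough illustration of what a Markov chain over keywords can look like, the sketch below runs a random walk with a uniform random jump over a tiny keyword graph. It is not the SciRecSys model: the abstract does not say how the four factors are combined, so here only a "randomness" term is modeled, as a jump probability, and that reading is an assumption. The keywords, the co-occurrence counts, and the function name `stationary_distribution` are hypothetical.

```python
def stationary_distribution(transition, jump=0.15, iterations=100):
    """Approximate the stationary distribution of a random walk with uniform random jumps."""
    n = len(transition)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # With probability `jump`, move to a uniformly random keyword.
        new_rank = [jump / n] * n
        # Otherwise, follow the transition probabilities of the chain.
        for i, row in enumerate(transition):
            for j, p in enumerate(row):
                new_rank[j] += (1.0 - jump) * rank[i] * p
        rank = new_rank
    return rank


# Hypothetical keywords and symmetric co-occurrence counts.
keywords = ["topic model", "lda", "recommendation"]
cooccurrence = [
    [0, 3, 1],
    [3, 0, 1],
    [1, 1, 0],
]

# Row-normalize the counts into transition probabilities.
transition = [[c / sum(row) for c in row] for row in cooccurrence]
scores = stationary_distribution(transition)

for keyword, score in zip(keywords, scores):
    print(f"{keyword}: {score:.3f}")
```

In this toy graph the two keywords that co-occur most often receive equal, higher scores than the third, which is the kind of signal a keyword-relationship model could pass on to a recommender.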
