Source author record

Hyuk-Yoon Kwon

Hyuk-Yoon Kwon appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Databases, Information Retrieval

Catalog footprint

What is connected

3 works

2 topics

4 close collaborators

Preprint, 2022, arXiv

Historical Credibility for Movie Reviews and Its Application to Weakly Supervised Classification

In this study, we address the problem of judging the credibility of movie reviews. The problem is challenging because even experts cannot clearly and efficiently judge the credibility of a movie review, and the number of movie reviews is very large. To tackle this problem, we propose historical credibility, which judges the credibility of reviews based on the historical ratings and textual reviews written by each reviewer. For this, we present three kinds of criteria that can clearly classify reviews as trusted or distrusted. We validate the effectiveness of the proposed historical credibility through extensive analysis. Specifically, we show that the characteristics of trusted and distrusted reviews are clearly distinguishable from three viewpoints: 1) distribution, 2) statistics, and 3) correlation. Then, we apply historical credibility to a weakly supervised model to classify a given review as trusted or distrusted. First, we show that it is highly efficient because the entire data set is annotated according to the predefined criteria. Indeed, it can annotate 6,400 movie reviews in only 0.093 seconds, which accounts for only 0.55% to 1.88% of the total learning time when LSTM and SVM are used as the learning models. Second, we show that the historical credibility-based classification model clearly outperforms the textual review-based classification model. Specifically, the classification accuracy of the former exceeds that of the latter by up to 11.7% to 13.4%. In addition, we confirm that our classification model achieves higher accuracy as the data size increases.
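The timing figures quoted in this abstract can be sanity-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation from the numbers in the abstract, not code from the paper. The function names are hypothetical, and the abstract does not say which end of the percentage range belongs to LSTM and which to SVM, so the two shares are left unlabeled.

```python
def annotation_throughput(num_reviews: int, seconds: float) -> float:
    """Reviews annotated per second."""
    return num_reviews / seconds


def implied_total_time(annotation_seconds: float, share: float) -> float:
    """Total learning time implied if annotation takes `share` of it."""
    return annotation_seconds / share


# Figures taken from the abstract.
reviews, annotate_s = 6_400, 0.093

print(f"throughput: {annotation_throughput(reviews, annotate_s):,.0f} reviews/s")
for share in (0.0055, 0.0188):
    total = implied_total_time(annotate_s, share)
    print(f"annotation share {share:.2%} -> total learning time ~{total:.1f} s")
```

Under these figures, rule-based annotation runs at roughly 69,000 reviews per second, and the implied total learning times are on the order of 5 to 17 seconds.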

Preprint, 2014, arXiv

Odysseus/DFS: Integration of DBMS and Distributed File System for Transaction Processing of Big Data

The relational DBMS (RDBMS) has been widely used since it supports various high-level functionalities such as SQL, schemas, indexes, and transactions that do not exist in the O/S file system. However, the recent advent of big data technology has facilitated the development of new systems that sacrifice DBMS functionality in order to manage large-scale data efficiently. These so-called NoSQL systems use a distributed file system (DFS), which supports scalability and reliability. They achieve scalability by storing data on a large number of low-cost commodity machines and achieve reliability by replicating the data. However, they have the drawback that they do not adequately support high-level DBMS functionality. In this paper, we propose an architecture of a DBMS that uses the DFS as storage. With this novel architecture, the DBMS is capable of supporting the scalability and reliability of the DFS as well as the high-level functionality of a DBMS. Thus, a DBMS can utilize the virtually unlimited storage space provided by the DFS, making it suitable for big data analytics. As part of the architecture of the DBMS, we propose the notion of the meta DFS file, which allows the DBMS to use the DFS as storage, and an efficient transaction management method including recovery and concurrency control. We implement this architecture in Odysseus/DFS, an integration of the Odysseus relational DBMS, which has been under development at KAIST for over 24 years, with the DFS. Our experiments on transaction processing show that, due to the high-level functionality of Odysseus/DFS, it outperforms HBase, which is a representative open-source NoSQL system. We also show that, compared with an RDBMS with local storage, the performance of Odysseus/DFS is comparable or only marginally degraded, showing that the overhead Odysseus/DFS incurs to support scalability by using the DFS as storage is not significant.

Preprint, 2012, arXiv

ODYS: A Massively-Parallel Search Engine Using a DB-IR Tightly-Integrated Parallel DBMS

Recently, parallel search engines have been implemented based on scalable distributed file systems such as the Google File System. However, we claim that building a massively-parallel search engine using a parallel DBMS can be an attractive alternative since it supports a higher-level (i.e., SQL-level) interface than that of a distributed file system, allowing easier and less error-prone application development while providing scalability. In this paper, we propose a new approach to building a massively-parallel search engine using a DB-IR tightly-integrated parallel DBMS and demonstrate its commercial-level scalability and performance. In addition, we present a hybrid (i.e., analytic and experimental) performance model for the parallel search engine. We have built a five-node parallel search engine according to the proposed architecture using a DB-IR tightly-integrated DBMS. Through extensive experiments, we show the correctness of the model by comparing the projected output with the experimental results of the five-node engine. Our model demonstrates that ODYS is capable of handling 1 billion queries per day (81 queries/sec) for 30 billion web pages by using only 43,472 nodes with an average query response time of 211 ms, which is equivalent to or better than those of commercial search engines. We also show that, by using twice as many (86,944) nodes, ODYS can provide an average query response time of 162 ms, which is significantly lower than those of commercial search engines.
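The capacity figures in this abstract can be cross-checked with simple arithmetic. Spread evenly over a day, 1 billion queries is about 11,574 queries/sec in aggregate, far more than the quoted 81 queries/sec. One reading, which is an assumption of this sketch and is not stated in the abstract, is that 81 queries/sec is the throughput of a single replicated set of nodes. The function names below are hypothetical, and the calculation only checks that the quoted numbers are mutually consistent under that reading.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def aggregate_qps(queries_per_day: float) -> float:
    """Average queries per second if the load is spread evenly over a day."""
    return queries_per_day / SECONDS_PER_DAY


def sets_needed(total_qps: float, qps_per_set: float) -> int:
    """Number of replicated node sets needed to carry the aggregate load."""
    return math.ceil(total_qps / qps_per_set)


total_qps = aggregate_qps(1_000_000_000)

# ASSUMPTION: 81 queries/sec is the throughput of one replicated set.
n_sets = sets_needed(total_qps, 81)

# If so, the quoted node count should split evenly across the sets.
nodes_per_set, remainder = divmod(43_472, n_sets)

# Relative response-time reduction from doubling the node count.
latency_reduction = (211 - 162) / 211

print(f"aggregate load: {total_qps:,.0f} queries/sec")
print(f"sets: {n_sets}, nodes per set: {nodes_per_set}, remainder: {remainder}")
print(f"latency reduction with 2x nodes: {latency_reduction:.1%}")
```

Under this assumption the numbers line up: 143 sets of 304 nodes give exactly 43,472 nodes, and doubling to 86,944 nodes lowers the average response time from 211 ms to 162 ms, a reduction of roughly 23%.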