Source author record

Raymond Chi-Wing Wong

Raymond Chi-Wing Wong appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Topics: Databases; Artificial Intelligence; Computation and Language; Networking and Internet Architecture; Social and Information Networks

Catalog footprint

What is connected

8 works

5 topics

4 close collaborators


preprint, 2022, arXiv

Approximating Regret Minimizing Sets: A Happiness Perspective

A Regret Minimizing Set (RMS) is a useful concept in which a smaller subset of a database is selected while mostly preserving the best scores along every possible utility function. In this paper, we study the $k$-Regret Minimizing Sets ($k$-RMS) and Average Regret Minimizing Sets (ARMS) problems. $k$-RMS selects $r$ records from a database such that the maximum regret ratio between the $k$-th best score in the database and the best score in the selected records for any possible utility function is minimized. Meanwhile, ARMS minimizes the average of this ratio within a distribution of utility functions. Particularly, we study approximation algorithms for $k$-RMS and ARMS from the perspective of approximating the happiness ratio, which is equivalent to one minus the regret ratio. In this paper, we show that the problem of approximating the happiness of a $k$-RMS within any finite factor is NP-Hard when the dimensionality of the database is unconstrained and extend the result to an inapproximability proof for the regret. We then provide approximation algorithms for approximating the happiness of ARMS with better approximation ratios and time complexities than known algorithms for approximating the regret. We further provide dataset reduction schemes which can be used to reduce the runtime of existing heuristic based algorithms, as well as to derive polynomial-time approximation schemes for $k$-RMS when dimensionality is fixed. Finally, we provide experimental validation.
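The regret and happiness ratios described in this abstract can be illustrated with a small sketch. It assumes linear utility functions given as weight vectors, a finite sample of utilities standing in for "every possible utility function", and a regret ratio clipped at zero when the selected records already beat the $k$-th best score. The function names are hypothetical and this is not the paper's approximation algorithm, only an evaluation of the quantities it optimizes.

```python
def score(point, utility):
    """Linear utility score of one record (assumed utility model)."""
    return sum(p * u for p, u in zip(point, utility))


def happiness_ratio(database, subset, utility, k=1):
    """Best score in `subset` divided by the k-th best score in `database`, capped at 1."""
    ranked = sorted((score(p, utility) for p in database), reverse=True)
    kth_best = ranked[k - 1]
    if kth_best <= 0:
        return 1.0  # nothing to regret when the reference score is not positive
    best_selected = max(score(p, utility) for p in subset)
    return min(1.0, best_selected / kth_best)


def max_regret_ratio(database, subset, utilities, k=1):
    """k-RMS objective over a finite sample of utility functions."""
    return max(1.0 - happiness_ratio(database, subset, u, k) for u in utilities)


def average_regret_ratio(database, subset, utilities, k=1):
    """ARMS objective, with the distribution approximated by a uniform sample."""
    total = sum(1.0 - happiness_ratio(database, subset, u, k) for u in utilities)
    return total / len(utilities)


if __name__ == "__main__":
    database = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
    subset = [(0.6, 0.6)]
    utilities = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
    print(max_regret_ratio(database, subset, utilities))
    print(average_regret_ratio(database, subset, utilities))
```

Since happiness is one minus regret, minimizing the maximum regret ratio is the same as maximizing the minimum happiness ratio over the sampled utilities.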

preprint, 2022, arXiv

Fake News Detection with Heterogeneous Transformer

The dissemination of fake news on social networks has created a public need for effective and efficient fake news detection methods. Generally, fake news on social networks is multi-modal and has various connections with other entities such as users and posts. The heterogeneity in both news content and the relationship with other entities in social networks brings challenges to designing a model that comprehensively captures the local multi-modal semantics of entities in social networks and the global structural representation of the propagation patterns, so as to classify fake news effectively and accurately. In this paper, we propose a novel Transformer-based model, HetTransformer, to solve the fake news detection problem on social networks. It utilises the encoder-decoder structure of the Transformer to capture the structural information of news propagation patterns. We first capture the local heterogeneous semantics of news, post, and user entities in social networks. Then, we apply the Transformer to capture the global structural representation of the propagation patterns in social networks for fake news detection. Experiments on three real-world datasets demonstrate that our model is able to outperform the state-of-the-art baselines in fake news detection.

preprint, 2022, arXiv

Speech-to-SQL: Towards Speech-driven SQL Query Generation From Natural Language Question

Speech-based inputs have been gaining significant momentum with the popularity of smartphones and tablets in our daily lives, since voice is the easiest and most efficient way for human-computer interaction. This paper works towards designing more effective speech-based interfaces to query the structured data in relational databases. We first identify a new task named Speech-to-SQL, which aims to understand the information conveyed by human speech and directly translate it into structured query language (SQL) statements. A naive solution to this problem can work in a cascaded manner, that is, an automatic speech recognition (ASR) component followed by a text-to-SQL component. However, it requires a high-quality ASR system and also suffers from the error compounding problem between the two components, resulting in limited performance. To handle these challenges, we further propose a novel end-to-end neural architecture named SpeechSQLNet to directly translate human speech into SQL queries without an external ASR step. SpeechSQLNet has the advantage of making full use of the rich linguistic information present in speech. To the best of our knowledge, this is the first attempt to directly synthesize SQL based on arbitrary natural language questions, rather than a natural language-based version of SQL or its variants with a limited SQL grammar. To validate the effectiveness of the proposed problem and model, we further construct a dataset named SpeechQL by piggybacking on the widely used text-to-SQL datasets. Extensive experimental evaluations on this dataset show that SpeechSQLNet can directly synthesize high-quality SQL queries from human speech, outperforming various competitive counterparts as well as the cascaded methods in terms of exact match accuracies.

preprint, 2020, arXiv

Bridging the Gap Between Theory and Practice on Insertion-Intensive Database

With the prevalence of online platforms, data today is being generated and accessed by users at a very high rate. Besides, applications such as stock trading or high frequency trading require guaranteed low delays for performing an operation on a database. It is therefore important to design databases that guarantee data insertion and query at a consistently high rate without introducing any long delay during insertion. In this paper, we propose Nested B-trees (NB-trees), an index that can achieve a consistently high insertion rate on large volumes of data, while providing asymptotically optimal query performance that is very efficient in practice. Nested B-trees support insertions at rates higher than LSM-trees, the state-of-the-art index for insertion-intensive workloads, while avoiding their long insertion delays and improving on their query performance. They approach the query performance of B-trees when complemented with Bloom filters. In our experiments, NB-trees had worst-case delays up to 1000 times smaller than those of LevelDB, RocksDB and bLSM, which are commonly used LSM-tree data stores. They could also perform queries more than 4 times faster than LevelDB and 1.5 times faster than bLSM and RocksDB, while outperforming all three in terms of average insertion rate.

preprint, 2014, arXiv

Hop Doubling Label Indexing for Point-to-Point Distance Querying on Scale-Free Networks

We study the problem of point-to-point distance querying for massive scale-free graphs, which is important for numerous applications. Given a directed or undirected graph, we propose to build an index for answering such queries based on a hop-doubling labeling technique. We derive bounds on the index size, the computation costs and I/O costs based on the properties of unweighted scale-free graphs. We show that our method is much more efficient compared to the state-of-the-art technique, in terms of both querying time and indexing time. Our empirical study shows that our method can handle graphs that are orders of magnitude larger than existing methods.
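As a rough illustration of how a label-based distance index is queried, the sketch below assumes a generic 2-hop style layout: every vertex stores distances to a set of hub vertices, and a query takes the minimum over hubs shared by the source's outgoing label and the target's incoming label. The label dictionaries and the name `query_distance` are hypothetical. The sketch covers only the query step and says nothing about how hop doubling constructs the labels.

```python
import math


def query_distance(label_out, label_in, source, target):
    """Distance from `source` to `target` using precomputed hub labels.

    label_out[v] maps hub -> dist(v, hub); label_in[v] maps hub -> dist(hub, v).
    Returns math.inf when the two labels share no hub.
    """
    if source == target:
        return 0
    out_s = label_out.get(source, {})
    in_t = label_in.get(target, {})
    best = math.inf
    for hub, d_to_hub in out_s.items():
        d_from_hub = in_t.get(hub)
        if d_from_hub is not None:
            best = min(best, d_to_hub + d_from_hub)
    return best


if __name__ == "__main__":
    # Hand-built labels for the directed path a -> b -> c (unit edge weights).
    label_out = {"a": {"a": 0, "b": 1}, "b": {"b": 0}, "c": {"c": 0}}
    label_in = {"a": {"a": 0}, "b": {"b": 0}, "c": {"b": 1, "c": 0}}
    print(query_distance(label_out, label_in, "a", "c"))
    print(query_distance(label_out, label_in, "c", "a"))
```

The query cost depends only on the label sizes, not on the graph size, which is why bounds on index size matter for this family of methods.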

preprint, 2014, arXiv

Multi-channel Wireless Networks with Infrastructure Support: Capacity and Delay

In this paper, we propose a novel multi-channel network with infrastructure support, called an MC-IS network, which has not been studied in the literature. To the best of our knowledge, we are the first to study such an MC-IS network. Our MC-IS network is equipped with a number of infrastructure nodes which can communicate with common nodes using a number of channels where a communication between a common node and an infrastructure node is called an infrastructure communication and a communication between two common nodes is called an ad-hoc communication. Our proposed MC-IS network has a number of advantages over three existing conventional networks, namely a single-channel wireless ad hoc network (called an SC-AH network), a multi-channel wireless ad hoc network (called an MC-AH network) and a single-channel network with infrastructure support (called an SC-IS network). In particular, the network capacity of our proposed MC-IS network is $\sqrt{n \log n}$ times higher than that of an SC-AH network and an MC-AH network and the same as that of an SC-IS network, where $n$ is the number of nodes in the network. The average delay of our MC-IS network is $\sqrt{\log n/n}$ times lower than that of an SC-AH network and an MC-AH network, and $\min(C_I,m)$ times lower than the average delay of an SC-IS network, where $C_I$ and $m$ denote the number of channels dedicated for infrastructure communications and the number of interfaces mounted at each infrastructure node, respectively.

preprint, 2012, arXiv

IS-LABEL: an Independent-Set based Labeling Scheme for Point-to-Point Distance Querying on Large Graphs

We study the problem of computing shortest path or distance between two query vertices in a graph, which has numerous important applications. Quite a number of indexes have been proposed to answer such distance queries. However, all of these indexes can only process graphs of size barely up to 1 million vertices, which is rather small in view of many of the fast-growing real-world graphs today such as social networks and Web graphs. We propose an efficient index, which is a novel labeling scheme based on the independent set of a graph. We show that our method can handle graphs of size three orders of magnitude larger than those existing indexes.

preprint, 2012, arXiv

Small Count Privacy and Large Count Utility in Data Publishing

While the introduction of differential privacy has been a major breakthrough in the study of privacy preserving data publication, some recent work has pointed out a number of cases where it is not possible to limit inference about individuals. The dilemma that is intrinsic in the problem is the simultaneous requirement of data utility in the published data. Differential privacy does not aim to protect information about an individual that can be uncovered even without the participation of the individual. However, this lack of coverage may violate the principle of individual privacy. Here we propose a solution by providing protection to sensitive information, by which we refer to the answers for aggregate queries with small counts. Previous works based on $\ell$-diversity can be seen as providing a special form of this kind of protection. Our method is developed with another goal which is to provide differential privacy guarantee, and for that we introduce a more refined form of differential privacy to deal with certain practical issues. Our empirical studies show that our method can preserve better utilities than a number of state-of-the-art methods although these methods do not provide the protections that we provide.
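For readers unfamiliar with the baseline this abstract builds on, the sketch below shows the textbook Laplace mechanism for a single count query. It is not the refined notion of differential privacy that the abstract proposes, and it adds no special protection for small counts. The function name `noisy_count` and its parameters are hypothetical.

```python
import random


def noisy_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count with Laplace noise of scale sensitivity / epsilon.

    The difference of two independent exponential samples with the same rate
    follows a Laplace distribution centred at zero with scale 1 / rate.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    rate = epsilon / sensitivity
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


if __name__ == "__main__":
    # Smaller epsilon means stronger privacy and a noisier released count.
    for eps in (0.1, 1.0, 10.0):
        print(eps, noisy_count(3, eps))
```

With a fixed noise scale, the relative error is largest for small counts and negligible for large ones, which matches the small-count privacy and large-count utility trade-off named in the title.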
