Researcher profile

Yeow Meng Chee

Yeow Meng Chee contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
16 works
0 followers
13 topics
4 close collaborators

Published work

16 published item(s)

preprint | 2023 | arXiv

Learning Generalizable Models for Vehicle Routing Problems via Knowledge Distillation

Recent neural methods for vehicle routing problems always train and test the deep models on the same instance distribution (i.e., uniform). To tackle the consequent cross-distribution generalization concerns, we bring the knowledge distillation to this field and propose an Adaptive Multi-Distribution Knowledge Distillation (AMDKD) scheme for learning more generalizable deep models. Particularly, our AMDKD leverages various knowledge from multiple teachers trained on exemplar distributions to yield a light-weight yet generalist student model. Meanwhile, we equip AMDKD with an adaptive strategy that allows the student to concentrate on difficult distributions, so as to absorb hard-to-master knowledge more effectively. Extensive experimental results show that, compared with the baseline neural methods, our AMDKD is able to achieve competitive results on both unseen in-distribution and out-of-distribution instances, which are either randomly synthesized or adopted from benchmark datasets (i.e., TSPLIB and CVRPLIB). Notably, our AMDKD is generic, and consumes less computational resources for inference.

preprint | 2022 | arXiv

Cost-Effective Algorithms for Average-Case Interactive Graph Search

Interactive graph search (IGS) uses human intelligence to locate the target node in hierarchy, which can be applied for image classification, product categorization and searching a database. Specifically, IGS aims to categorize an object from a given category hierarchy via several rounds of interactive queries. In each round of query, the search algorithm picks a category and receives a boolean answer on whether the object is under the chosen category. The main efficiency goal asks for the minimum number of queries to identify the correct hierarchical category for the object. In this paper, we study the average-case interactive graph search (AIGS) problem that aims to minimize the expected number of queries when the objects follow a probability distribution. We propose a greedy search policy that splits the candidate categories as evenly as possible with respect to the probability weights, which offers an approximation guarantee of $O(\log n)$ for AIGS given the category hierarchy is a directed acyclic graph (DAG), where $n$ is the total number of categories. Meanwhile, if the input hierarchy is a tree, we show that a constant approximation factor of $(1+\sqrt{5})/2$ can be achieved. Furthermore, we present efficient implementations of the greedy policy, namely GreedyTree and GreedyDAG, that can quickly categorize the object in practice. Extensive experiments in real-world scenarios are carried out to demonstrate the superiority of our proposed methods.
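The greedy idea in the abstract above (query the category that splits the remaining probability mass as evenly as possible) can be illustrated for the tree case. This is a minimal sketch, not the authors' GreedyTree implementation: the names `subtree_nodes`, `greedy_tree_search`, `children`, `prob`, and the `oracle` callback are all hypothetical, and the naive recomputation of subtree masses makes it far slower than an efficient implementation would be.

```python
def subtree_nodes(children, root):
    """Return all nodes in the subtree rooted at `root` (inclusive)."""
    stack, seen = [root], []
    while stack:
        node = stack.pop()
        seen.append(node)
        stack.extend(children.get(node, []))
    return seen


def greedy_tree_search(children, prob, root, oracle):
    """Locate the target category; return (target, number_of_queries).

    children: dict mapping a node to the list of its child nodes (hypothetical format).
    prob:     dict mapping every node to its probability weight.
    oracle:   oracle(v) is True iff the target lies in the subtree rooted at v.
    """
    candidates = set(subtree_nodes(children, root))
    current_root = root
    queries = 0
    while len(candidates) > 1:
        total = sum(prob[v] for v in candidates)
        best, best_gap = None, None
        # sorted() only makes tie-breaking deterministic.
        for v in sorted(candidates):
            if v == current_root:
                continue  # the answer for the current root is already known
            mass = sum(prob[u] for u in subtree_nodes(children, v) if u in candidates)
            gap = abs(mass - total / 2)  # distance from an even split
            if best is None or gap < best_gap:
                best, best_gap = v, gap
        queries += 1
        inside = {u for u in subtree_nodes(children, best) if u in candidates}
        if oracle(best):
            candidates, current_root = inside, best
        else:
            candidates -= inside
    return candidates.pop(), queries
```

Each query removes at least one candidate, so the loop always terminates. An oracle for testing can be built as `lambda v: target in subtree_nodes(children, v)`.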

preprint | 2022 | arXiv

Neural Network Decoders for Permutation Codes Correcting Different Errors

Permutation codes were extensively studied in order to correct different types of errors for the applications on power line communication and rank modulation for flash memory. In this paper, we introduce the neural network decoders for permutation codes to correct these errors with one-shot decoding, which treat the decoding as $n$ classification tasks for non-binary symbols for a code of length $n$. These are actually the first general decoders introduced to deal with any error type for these two applications. The performance of the decoders is evaluated by simulations with different error models.

preprint | 2022 | arXiv

Primitive3D: 3D Object Dataset Synthesis from Randomly Assembled Primitives

Numerous advancements in deep learning can be attributed to the access to large-scale and well-annotated datasets. However, such a dataset is prohibitively expensive in 3D computer vision due to the substantial collection cost. To alleviate this issue, we propose a cost-effective method for automatically generating a large number of 3D objects with annotations. In particular, we synthesize objects simply by assembling multiple random primitives. These objects are thus auto-annotated with part labels originating from primitives. This allows us to perform multi-task learning by combining the supervised segmentation with unsupervised reconstruction. Considering the large overhead of learning on the generated dataset, we further propose a dataset distillation strategy to remove redundant samples regarding a target dataset. We conduct extensive experiments for the downstream tasks of 3D object classification. The results indicate that our dataset, together with multi-task pretraining on its annotations, achieves the best performance compared to other commonly used datasets. Further study suggests that our strategy can improve the model performance by pretraining and fine-tuning scheme, especially for the dataset with a small scale. In addition, pretraining with the proposed dataset distillation method can save 86% of the pretraining time with negligible performance degradation. We expect that our attempt provides a new data-centric perspective for training 3D deep models.

preprint | 2022 | arXiv

Serverless Data Science -- Are We There Yet? A Case Study of Model Serving

Machine learning (ML) is an important part of modern data science applications. Data scientists today have to manage the end-to-end ML life cycle that includes both model training and model serving, the latter of which is essential, as it makes their works available to end-users. Systems of model serving require high performance, low cost, and ease of management. Cloud providers are already offering model serving choices, including managed services and self-rented servers. Recently, serverless computing, whose advantages include high elasticity and a fine-grained cost model, brings another option for model serving. Our goal in this paper is to examine the viability of serverless as a mainstream model serving platform. To this end, we first conduct a comprehensive evaluation of the performance and cost of serverless against other model serving systems on Amazon Web Service and Google Cloud Platform. We find that serverless outperforms many cloud-based alternatives. Further, there are settings under which it even achieves better performance than GPU-based systems. Next, we present the design space of serverless model serving, which comprises multiple dimensions, including cloud platforms, serving runtimes, and other function-specific parameters. For each dimension, we analyze the impact of different choices and provide suggestions for data scientists to better utilize serverless model serving. Finally, we discuss challenges and opportunities in building a more practical serverless model serving system.

preprint | 2022 | arXiv

Two dimensional RC/Subarray Constrained Codes: Bounded Weight and Almost Balanced Weight

In this work, we study two types of constraints on two-dimensional binary arrays. In particular, given $p,ε>0$, we study (i) The $p$-bounded constraint: a binary vector of size $m$ is said to be $p$-bounded if its weight is at most $pm$, and (ii) The $ε$-balanced constraint: a binary vector of size $m$ is said to be $ε$-balanced if its weight is within $[(0.5-ε)*m,(0.5+ε)*m]$. Such constraints are crucial in several data storage systems that regard the information data as two-dimensional (2D) instead of one-dimensional (1D), such as the crossbar resistive memory arrays and the holographic data storage. In this work, efficient encoding/decoding algorithms are presented for binary arrays so that the weight constraint (either $p$-bounded constraint or $ε$-balanced constraint) is enforced over every row and every column, regarded as 2D row-column (RC) constrained codes; or over every subarray, regarded as 2D subarray constrained codes. While low-complexity designs have been proposed in the literature, mostly focusing on 2D RC constrained codes where $p = 1/2$ and $ε = 0$, this work provides efficient coding methods that work for both 2D RC constrained codes and 2D subarray constrained codes, and more importantly, the methods are applicable for arbitrary values of $p$ and $ε$. Furthermore, for certain values of $p$ and $ε$, we show that, for sufficiently large array size, there exists a linear-time encoding/decoding algorithm that incurs at most one redundant bit.
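The two weight constraints defined in the abstract above can be written down directly as checks. The following is a minimal sketch that only verifies whether a given array satisfies the row-column (RC) constraint; it does not implement the paper's encoding or decoding algorithms. The function names are hypothetical, and the array is assumed to be a rectangular list of 0/1 rows.

```python
def is_p_bounded(vec, p):
    """True if the binary vector has weight at most p * m, where m = len(vec)."""
    return sum(vec) <= p * len(vec)


def is_eps_balanced(vec, eps):
    """True if the weight lies in [(0.5 - eps) * m, (0.5 + eps) * m]."""
    m = len(vec)
    return (0.5 - eps) * m <= sum(vec) <= (0.5 + eps) * m


def satisfies_rc(array, check):
    """Apply `check` to every row and every column of a rectangular 2D binary array."""
    rows = [list(r) for r in array]
    cols = [list(c) for c in zip(*array)]  # transpose to obtain the columns
    return all(check(v) for v in rows + cols)
```

For example, `satisfies_rc(array, lambda v: is_p_bounded(v, 0.5))` tests the case $p = 1/2$ over all rows and columns.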

preprint | 2020 | arXiv

Constrained de Bruijn Codes: Properties, Enumeration, Constructions, and Applications

The de Bruijn graph, its sequences, and their various generalizations, have found many applications in information theory, including many new ones in the last decade. In this paper, motivated by a coding problem for emerging memory technologies, a set of sequences which generalize sequences in the de Bruijn graph are defined. These sequences can be also defined and viewed as constrained sequences. Hence, they will be called constrained de Bruijn sequences and a set of such sequences will be called a constrained de Bruijn code. Several properties and alternative definitions for such codes are examined and they are analyzed as generalized sequences in the de Bruijn graph (and its generalization) and as constrained sequences. Various enumeration techniques are used to compute the total number of sequences for any given set of parameters. A construction method of such codes from the theory of shift-register sequences is proposed. Finally, we show how these constrained de Bruijn sequences and codes can be applied in constructions of codes for correcting synchronization errors in the $\ell$-symbol read channel and in the racetrack memory channel. For this purpose, these codes are superior in their size on previously known codes.

preprint | 2020 | arXiv

Explicit Baranyai Partitions for Quadruples, Part I: Quadrupling Constructions

It is well known that, whenever $k$ divides $n$, the complete $k$-uniform hypergraph on $n$ vertices can be partitioned into disjoint perfect matchings. Equivalently, the set of $k$-subsets of an $n$-set can be partitioned into parallel classes so that each parallel class is a partition of the $n$-set. This result is known as Baranyai's theorem, which guarantees the existence of Baranyai partitions. Unfortunately, the proof of Baranyai's theorem uses network flow arguments, making this result non-explicit. In particular, there is no known method to produce Baranyai partitions in time and space that scale linearly with the number of hyperedges in the hypergraph. It is desirable for certain applications to have an explicit construction that generates Baranyai partitions in linear time. Such an efficient construction is known for $k=2$ and $k=3$. In this paper, we present an explicit recursive quadrupling construction for $k=4$ and $n=4t$, where $t \equiv 0,3,4,6,8,9 \pmod{12}$. In a follow-up paper (Part II), the other values of $t$, namely $t \equiv 1,2,5,7,10,11 \pmod{12}$, will be considered.
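To make the notion of a Baranyai partition concrete, the simplest case $k=2$ can be sketched with the classical round-robin (circle) method, which partitions all pairs of an even-sized set into perfect matchings. This is offered as an illustration only; the abstract does not say which $k=2$ construction it has in mind, and the function name `round_robin_partition` is hypothetical.

```python
def round_robin_partition(n):
    """Partition all 2-subsets of {0, ..., n-1} into n-1 perfect matchings.

    Uses the circle method: vertex n-1 is fixed, the others rotate modulo n-1.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("n must be a positive even integer")
    m = n - 1  # number of rotating vertices, and number of parallel classes
    classes = []
    for r in range(m):
        matching = [(r, m)]  # pair the fixed vertex with vertex r
        for i in range(1, n // 2):
            a, b = (r + i) % m, (r - i) % m
            matching.append((min(a, b), max(a, b)))
        classes.append(matching)
    return classes
```

Each of the $n-1$ classes contains $n/2$ disjoint pairs, and every pair appears in exactly one class, so the total work is linear in the number of pairs.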

preprint | 2019 | arXiv

Robust Positioning Patterns with Low Redundancy

A robust positioning pattern is a large array that allows a mobile device to locate its position by reading a possibly corrupted small window around it. In this paper, we provide constructions of binary positioning patterns, equipped with efficient locating algorithms, that are robust to a constant number of errors and have redundancy within a constant factor of optimality. Furthermore, we modify our constructions to correct rank errors and obtain binary positioning patterns robust to any errors of rank less than a constant number. Additionally, we construct $q$-ary robust positioning sequences robust to a large number of errors, some of which have length attaining the upper bound. Our construction of binary positioning sequences that are robust to a constant number of errors has the least known redundancy amongst those explicit constructions with efficient locating algorithms. On the other hand, for binary robust positioning arrays, our construction is the first explicit construction whose redundancy is within a constant factor of optimality. The locating algorithms accompanying both constructions run in time cubic in sequence length or array dimension.

preprint | 2010 | arXiv

Linear Size Optimal q-ary Constant-Weight Codes and Constant-Composition Codes

An optimal constant-composition or constant-weight code of weight $w$ has linear size if and only if its distance $d$ is at least $2w-1$. When $d\geq 2w$, the determination of the exact size of such a constant-composition or constant-weight code is trivial, but the case of $d=2w-1$ has been solved previously only for binary and ternary constant-composition and constant-weight codes, and for some sporadic instances. This paper provides a construction for quasicyclic optimal constant-composition and constant-weight codes of weight $w$ and distance $2w-1$ based on a new generalization of difference triangle sets. As a result, the sizes of optimal constant-composition codes and optimal constant-weight codes of weight $w$ and distance $2w-1$ are determined for all such codes of sufficiently large lengths. This solves an open problem of Etzion. The sizes of optimal constant-composition codes of weight $w$ and distance $2w-1$ are also determined for all $w\leq 6$, except in two cases.

preprint | 2010 | arXiv

Query-Efficient Locally Decodable Codes of Subexponential Length

We develop the algebraic theory behind the constructions of Yekhanin (2008) and Efremenko (2009), in an attempt to understand the "algebraic niceness" phenomenon in $\mathbb{Z}_m$. We show that every integer $m = pq = 2^t -1$, where $p$, $q$ and $t$ are prime, possesses the same good algebraic property as $m=511$ that allows savings in query complexity. We identify 50 numbers of this form by computer search, which together with 511, are then applied to gain improvements on query complexity via Itoh and Suzuki's composition method. More precisely, we construct a $3^{\lceil r/2\rceil}$-query LDC for every positive integer $r<104$ and a $\left\lfloor (3/4)^{51}\cdot 2^{r}\right\rfloor$-query LDC for every integer $r\geq 104$, both of length $N_{r}$, improving the $2^r$ queries used by Efremenko (2009) and $3\cdot 2^{r-2}$ queries used by Itoh and Suzuki (2010). We also obtain new efficient private information retrieval (PIR) schemes from the new query-efficient LDCs.
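The condition $m = pq = 2^t - 1$ with $p$, $q$ and $t$ prime is easy to test for small exponents. The sketch below uses plain trial division and is only meant to show what such a search looks for; it is not the authors' search procedure, it is far too slow to reach the 50 numbers reported in the abstract, and the function names are hypothetical.

```python
def prime_factors(n):
    """Prime factorisation of n >= 2 by trial division, with multiplicity."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is a prime factor
    return factors


def mersenne_semiprimes(max_t):
    """Return (t, p, q) with t prime, t <= max_t, and 2**t - 1 == p * q for primes p, q."""
    found = []
    for t in range(2, max_t + 1):
        if len(prime_factors(t)) != 1:
            continue  # the exponent t must itself be prime
        factors = prime_factors(2**t - 1)
        if len(factors) == 2:
            found.append((t, factors[0], factors[1]))
    return found
```

With `max_t = 31` the search returns the exponents 11 and 23, corresponding to $2047 = 23 \cdot 89$ and $8388607 = 47 \cdot 178481$. Note that $511 = 2^9 - 1 = 7 \cdot 73$ has a composite exponent, which is consistent with the abstract treating it separately.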

preprint | 2010 | arXiv

Spectrum of Sizes for Perfect Deletion-Correcting Codes

One peculiarity with deletion-correcting codes is that perfect $t$-deletion-correcting codes of the same length over the same alphabet can have different numbers of codewords, because the balls of radius $t$ with respect to the Levenshtein distance may be of different sizes. There is interest, therefore, in determining all possible sizes of a perfect $t$-deletion-correcting code, given the length $n$ and the alphabet size $q$. In this paper, we determine completely the spectrum of possible sizes for perfect $q$-ary 1-deletion-correcting codes of length three for all $q$, and perfect $q$-ary 2-deletion-correcting codes of length four for almost all $q$, leaving only a small finite number of cases in doubt.
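The observation that deletion balls have different sizes can be checked on small words. The sketch below assumes that the ball of a word means the set of words obtained by deleting exactly $t$ symbols, which is an interpretation of the abstract's terminology; the function name `deletion_ball` is hypothetical.

```python
from itertools import combinations


def deletion_ball(word, t):
    """Set of all words obtained from `word` by deleting exactly t symbols."""
    n = len(word)
    return {
        "".join(word[i] for i in keep)          # keep the chosen positions, in order
        for keep in combinations(range(n), n - t)
    }
```

For length three and $t=1$, `deletion_ball("aaa", 1)` contains a single word while `deletion_ball("aba", 1)` contains three, so a perfect code can cover the same space with different numbers of codewords depending on which words it uses.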

preprint | 2010 | arXiv

Universal Cycles for Minimum Coverings of Pairs by Triples, with Application to 2-Radius Sequences

A new ordering, extending the notion of universal cycles of Chung et al. (1992), is proposed for the blocks of $k$-uniform set systems. Existence of minimum coverings of pairs by triples that possess such an ordering is established for all orders. Application to the construction of short 2-radius sequences is given, with some new 2-radius sequences found through computer search.