Source author record

Andrew Lim

Andrew Lim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Machine Learning; Artificial Intelligence; Data Structures and Algorithms; Applications; Computer Vision; Information Theory; math.IT; Databases; Discrete Mathematics; Distributed, Parallel, and Cluster Computing; eess.IV; Information Retrieval; math.OC; Social and Information Networks

Catalog footprint

What is connected

16 works

14 topics

4 close collaborators


preprint, 2022, arXiv

Deep Reinforcement Learning for Solving the Heterogeneous Capacitated Vehicle Routing Problem

Existing deep reinforcement learning (DRL) based methods for solving the capacitated vehicle routing problem (CVRP) intrinsically cope with a homogeneous vehicle fleet, in which the fleet is treated as repetitions of a single vehicle. Hence, the key to constructing a solution lies solely in selecting the next node (customer) to visit, with no selection of a vehicle. However, vehicles in real-world scenarios are likely to be heterogeneous, with different characteristics that affect their capacity (or travel speed), rendering existing DRL methods less effective. In this paper, we tackle heterogeneous CVRP (HCVRP), where vehicles are mainly characterized by different capacities. We consider both min-max and min-sum objectives for HCVRP, which aim to minimize the longest or total travel time of the vehicle(s) in the fleet. To solve those problems, we propose a DRL method based on the attention mechanism with a vehicle selection decoder accounting for the heterogeneous fleet constraint and a node selection decoder accounting for the route construction, which learns to construct a solution by automatically selecting both a vehicle and a node for this vehicle at each step. Experimental results based on randomly generated instances show that, with desirable generalization to various problem sizes, our method outperforms the state-of-the-art DRL method and most of the conventional heuristics, and also delivers competitive performance against the state-of-the-art heuristic method, i.e., SISR. Additionally, the results of extended experiments demonstrate that our method is also able to solve CVRPLib instances with satisfactory performance.

preprint, 2022, arXiv

Proteus: A Self-Designing Range Filter

We introduce Proteus, a novel self-designing approximate range filter, which configures itself based on sampled data in order to optimize its false positive rate (FPR) for a given space requirement. Proteus unifies the probabilistic and deterministic design spaces of state-of-the-art range filters to achieve robust performance across a larger variety of use cases. At the core of Proteus lies our Contextual Prefix FPR (CPFPR) model - a formal framework for the FPR of prefix-based filters across their design spaces. We empirically demonstrate the accuracy of our model and Proteus' ability to optimize over both synthetic workloads and real-world datasets. We further evaluate Proteus in RocksDB and show that it is able to improve end-to-end performance by as much as 5.3x over more brittle state-of-the-art methods such as SuRF and Rosetta. Our experiments also indicate that the cost of modeling is not significant compared to the end-to-end performance gains and that Proteus is robust to workload shifts.

preprint, 2021, arXiv

Revisiting Modified Greedy Algorithm for Monotone Submodular Maximization with a Knapsack Constraint

Monotone submodular maximization with a knapsack constraint is NP-hard. Various approximation algorithms have been devised to address this optimization problem. In this paper, we revisit the widely known modified greedy algorithm. First, we show that this algorithm can achieve an approximation factor of $0.405$, which significantly improves the known factors of $0.357$ given by Wolsey and $(1-1/\mathrm{e})/2\approx 0.316$ given by Khuller et al. More importantly, our analysis closes a gap in Khuller et al.'s proof for the extensively mentioned approximation factor of $(1-1/\sqrt{\mathrm{e}})\approx 0.393$ in the literature to clarify a long-standing misconception on this issue. Second, we enhance the modified greedy algorithm to derive a data-dependent upper bound on the optimum. We empirically demonstrate the tightness of our upper bound with a real-world application. The bound enables us to obtain a data-dependent ratio typically much higher than $0.405$ between the solution value of the modified greedy algorithm and the optimum. It can also be used to significantly improve the efficiency of algorithms such as branch and bound.
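As a rough illustration of the algorithm this abstract revisits, the sketch below follows the commonly described form of modified greedy: run a density greedy (best marginal gain per unit cost among items that still fit) and return the better of that result and the best feasible single item. The function name `modified_greedy`, the toy coverage data, and the choice to skip items that no longer fit rather than stop are assumptions made for illustration, not details taken from the paper.

```python
from typing import Callable, Dict, Hashable, List, Set


def modified_greedy(
    items: List[Hashable],
    cost: Dict[Hashable, float],
    f: Callable[[Set[Hashable]], float],
    budget: float,
) -> Set[Hashable]:
    """Better of density greedy and the best feasible singleton.

    Assumes f is monotone submodular with f(set()) == 0 and all costs > 0.
    """
    chosen: Set[Hashable] = set()
    spent = 0.0
    while True:
        base = f(chosen)
        best, best_ratio = None, 0.0
        for x in items:
            # Skip items already taken or that would exceed the budget.
            if x in chosen or spent + cost[x] > budget:
                continue
            ratio = (f(chosen | {x}) - base) / cost[x]
            if ratio > best_ratio:
                best, best_ratio = x, ratio
        if best is None:  # nothing fits or nothing adds value
            break
        chosen.add(best)
        spent += cost[best]

    # Compare against the single most valuable item that fits on its own.
    singles = [x for x in items if cost[x] <= budget]
    if singles:
        top = max(singles, key=lambda x: f({x}))
        if f({top}) > f(chosen):
            return {top}
    return chosen


# Hypothetical toy instance: weighted set cover with unit element weights.
cover = {"a": {1}, "b": {2, 3, 4, 5, 6}}
cost = {"a": 1.0, "b": 6.0}


def coverage(selected: Set[Hashable]) -> float:
    return float(len(set().union(*(cover[x] for x in selected))))
```

With a budget of 6, density greedy takes `a` first and can then no longer afford `b`, so the singleton comparison is what rescues the solution; with a budget of 7 the greedy pass alone collects both items.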

preprint, 2021, arXiv

Why Are the ARIMA and SARIMA not Sufficient

The autoregressive moving average (ARMA) model holds a significant position in the analysis of wide-sense stationary time series. The difference operator and the seasonal difference operator, which are the bases of ARIMA and SARIMA (Seasonal ARIMA), respectively, were introduced to remove the trend and seasonal components so that the original non-stationary time series could be transformed into a wide-sense stationary one, which could then be handled by the Box-Jenkins methodology. However, such difference operators have so far rested more on practical experience than on exact theory. In this paper, we investigate the power of the (resp. seasonal) difference operator from the perspective of spectral analysis, linear system theory and digital filtering, and point out the characteristics and limitations of the (resp. seasonal) difference operator. In addition, we present a general method that transforms a stochastic process that is non-stationary in the mean into a wide-sense stationary one.
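For readers unfamiliar with the two operators named in this abstract, here is a minimal sketch of what they compute: the ordinary difference subtracts the previous value, and the seasonal difference subtracts the value one season earlier. The helper name `difference` and the toy series are hypothetical; the paper's spectral analysis of these operators is not reproduced here.

```python
from typing import List, Sequence


def difference(x: Sequence[float], lag: int = 1) -> List[float]:
    """Return y[t] = x[t] - x[t - lag] for t = lag, ..., len(x) - 1.

    lag=1 is the ordinary difference operator; lag=s is the seasonal
    difference operator for a season of length s.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


# A linear trend becomes a constant after one ordinary difference.
trend = [2 * t + 1 for t in range(6)]
detrended = difference(trend)

# A pattern repeating every 3 steps vanishes after a lag-3 difference.
seasonal = [1, 5, 3, 1, 5, 3, 1, 5, 3]
deseasonalized = difference(seasonal, lag=3)
```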

preprint, 2020, arXiv

An Exponential Factorization Machine with Percentage Error Minimization to Retail Sales Forecasting

This paper proposes a new approach to sales forecasting for new products with a long lead time but a short product life cycle. These SKUs are usually sold for one season only, without any replenishment. An exponential factorization machine (EFM) sales forecast model is developed to solve this problem, which considers not only SKU attributes but also their pairwise interactions. The EFM model differs significantly from the original Factorization Machines (FM) in two respects: (1) the attribute-level formulation for explanatory variables and (2) the exponential formulation for the positive response variable. The attribute-level formulation excludes infeasible intra-attribute interactions and results in more efficient feature engineering compared with conventional one-hot encoding, while the exponential formulation is shown to be more effective than the log-transformation for responses that are positive but not skewed. To estimate the parameters, percentage error squares (PES) and error squares (ES) are minimized over the training set by a proposed adaptive batch gradient descent method. Real-world data provided by a footwear retailer in Singapore is used for testing the proposed approach. The forecasting performance in terms of both mean absolute percentage error (MAPE) and mean absolute error (MAE) compares favourably not only with off-the-shelf models but also with results reported by extant sales and demand forecasting studies. The effectiveness of the proposed approach is also demonstrated on two external public datasets. Moreover, we prove the theoretical relationships between PES and ES minimization, and present an important property of PES minimization for regression models: it trains models to underestimate data. This property fits the situation of sales forecasting where the unit-holding cost is much greater than the unit-shortage cost.

preprint, 2020, arXiv

Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical Understanding of Outdoor Scene

Learning on 3D scene-based point clouds has received extensive attention owing to its promising applications in many fields, and well-annotated, multisource datasets can catalyze the development of these data-driven approaches. To facilitate research in this area, we present a richly annotated 3D point cloud dataset for multiple outdoor scene understanding tasks, together with an effective learning framework for its hierarchical segmentation task. The dataset was generated via photogrammetric processing of unmanned aerial vehicle (UAV) images of the National University of Singapore (NUS) campus, and has been annotated point-wise with both hierarchical and instance-based labels. Based on it, we formulate a hierarchical learning problem for 3D point cloud segmentation and propose a measure for evaluating consistency across the hierarchies. To solve this problem, a two-stage method including multi-task (MT) learning and hierarchical ensemble (HE) with consistency consideration is proposed. Experimental results demonstrate the superiority of the proposed method and the potential advantages of our hierarchical annotations. In addition, we benchmark results of semantic and instance segmentation, which are accessible online at https://3d.dataset.site together with the dataset and all source code.

preprint, 2020, arXiv

Directed Graph Convolutional Network

Graph Convolutional Networks (GCNs) have been widely used due to their outstanding performance in processing graph-structured data. However, their restriction to undirected graphs limits their application scope. In this paper, we extend spectral-based graph convolution to directed graphs by using first- and second-order proximity, which can not only retain the connection properties of the directed graph, but also expand the receptive field of the convolution operation. A new GCN model, called DGCN, is then designed to learn representations on the directed graph, leveraging both the first- and second-order proximity information. We empirically show that GCNs working only with DGCNs can encode more useful information from the graph and help achieve better performance when generalized to other models. Moreover, extensive experiments on citation networks and co-purchase datasets demonstrate the superiority of our model against the state-of-the-art methods.

preprint, 2020, arXiv

Efficient Approximation Algorithms for Adaptive Influence Maximization

Given a social network $G$ and an integer $k$, the influence maximization (IM) problem asks for a seed set $S$ of $k$ nodes from $G$ to maximize the expected number of nodes influenced via a propagation model. The majority of the existing algorithms for the IM problem are developed only under the non-adaptive setting, i.e., where all $k$ seed nodes are selected in one batch without observing how they influence other users in the real world. In this paper, we study the adaptive IM problem where the $k$ seed nodes are selected in batches of equal size $b$, such that the $i$-th batch is identified after the actual influence results of the former $i-1$ batches are observed. We propose the first practical algorithm for the adaptive IM problem that provides a worst-case approximation guarantee of $1-\mathrm{e}^{\rho_b(\varepsilon-1)}$, where $\rho_b=1-(1-1/b)^b$ and $\varepsilon \in (0, 1)$ is a user-specified parameter. In particular, we propose a general framework AdaptGreedy that can be instantiated by any existing non-adaptive IM algorithm with an expected approximation guarantee. Our approach is based on a novel randomized policy that is applicable to the general adaptive stochastic maximization problem, which may be of independent interest. In addition, we propose a novel non-adaptive IM algorithm called EPIC, which not only provides a strong expected approximation guarantee but also shows superior performance compared with the existing IM algorithms. Meanwhile, we clarify some existing misunderstandings in recent work and shed light on further study of the adaptive IM problem. We conduct experiments on real social networks to evaluate our proposed algorithms comprehensively, and the experimental results strongly corroborate the superiority and effectiveness of our approach.
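To get a feel for the stated guarantee, the short sketch below simply evaluates the two expressions from the abstract, the batch factor ρ_b = 1 - (1 - 1/b)^b and the bound 1 - e^{ρ_b(ε - 1)}, for a given batch size and ε. The function names `rho` and `adaptive_guarantee` are hypothetical; this only does the arithmetic and does not implement AdaptGreedy or EPIC.

```python
import math


def rho(b: int) -> float:
    """Batch factor rho_b = 1 - (1 - 1/b)^b for batch size b >= 1."""
    if b < 1:
        raise ValueError("batch size b must be at least 1")
    return 1.0 - (1.0 - 1.0 / b) ** b


def adaptive_guarantee(b: int, eps: float) -> float:
    """Worst-case ratio 1 - exp(rho_b * (eps - 1)) as stated in the abstract."""
    return 1.0 - math.exp(rho(b) * (eps - 1.0))


# Evaluate the bound for a few batch sizes at a fixed eps.
for b in (1, 2, 10):
    print(b, rho(b), adaptive_guarantee(b, eps=0.1))
```

Because ρ_b equals 1 at b = 1 and decreases toward 1 - 1/e as b grows, the bound is strongest for fully sequential selection and weakens as the batches get larger.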

preprint, 2020, arXiv

Learning Improvement Heuristics for Solving Routing Problems

Recent studies in using deep learning to solve routing problems focus on construction heuristics, the solutions of which are still far from optimality. Improvement heuristics have great potential to narrow this gap by iteratively refining a solution. However, classic improvement heuristics are all guided by hand-crafted rules which may limit their performance. In this paper, we propose a deep reinforcement learning framework to learn the improvement heuristics for routing problems. We design a self-attention based deep architecture as the policy network to guide the selection of the next solution. We apply our method to two important routing problems, i.e., the travelling salesman problem (TSP) and the capacitated vehicle routing problem (CVRP). Experiments show that our method outperforms state-of-the-art deep learning based approaches. The learned policies are more effective than the traditional hand-crafted ones, and can be further enhanced by simple diversifying strategies. Moreover, the policies generalize well to different problem sizes, initial solutions and even a real-world dataset.

preprint, 2020, arXiv

On Isometry Robustness of Deep 3D Point Cloud Models under Adversarial Attacks

While deep learning in the 3D domain has achieved revolutionary performance in many tasks, the robustness of these models has not been sufficiently studied or explored. Regarding 3D adversarial samples, most existing works focus on manipulation of local points, which may fail to invoke global geometric properties, like robustness under linear projection that preserves the Euclidean distance, i.e., isometry. In this work, we show that existing state-of-the-art deep 3D models are extremely vulnerable to isometry transformations. Armed with Thompson Sampling, we develop a black-box attack with a success rate over 95% on the ModelNet40 data set. Incorporating the Restricted Isometry Property, we propose a novel framework of white-box attack on top of spectral norm based perturbation. In contrast to previous works, our adversarial samples are experimentally shown to be strongly transferable. Evaluated on a sequence of prevailing 3D models, our white-box attack achieves success rates from 98.88% to 100%. It maintains a successful attack rate over 95% even within an imperceptible rotation range $[\pm 2.81^{\circ}]$.

preprint, 2019, arXiv

Deep Pattern of Time Series and Its Applications in Estimation, Forecasting, Fault Diagnosis and Target Tracking

The information contained in a time series is more than what the values themselves are. In this paper, the Time-variant Local Autocorrelated Polynomial model with Kalman filter is proposed to model the underlying dynamics of a time series (or signal) and to mine its deep pattern, which, beyond estimating the instantaneous mean function (also known as the trend function), includes: (1) identifying and predicting the peak and valley values of a time series; (2) reporting and forecasting the current changing pattern (increasing or decreasing pattern of the trend, and how fast it changes). We will show that it is this deep pattern that allows us to make higher-accuracy estimation and forecasting for a time series, to easily detect the anomalies (faults) of a sensor, and to track a highly maneuvering target.

preprint, 2016, arXiv

Distributed Graphical Simulation in the Cloud

Graphical simulations are a cornerstone of modern media and films. But existing software packages are designed to run on HPC nodes, and perform poorly in the computing cloud. These simulations have complex data access patterns over complex data structures, and mutate data arbitrarily, and so are a poor fit for existing cloud computing systems. We describe a software architecture for running graphical simulations in the cloud that decouples control logic, computations and data exchanges. This allows a central controller to balance load by redistributing computations, and recover from failures. Evaluations show that the architecture can run existing, state-of-the-art simulations in the presence of stragglers and failures, thereby enabling this large class of applications to use the computing cloud for the first time.

preprint, 2016, arXiv

Learning Robust Features using Deep Learning for Automatic Seizure Detection

We present and evaluate the capacity of a deep neural network to learn robust features from EEG to automatically detect seizures. This is a challenging problem because seizure manifestations on EEG are extremely variable both between and within patients. By simultaneously capturing spectral, temporal and spatial information, our recurrent convolutional neural network learns a general spatially invariant representation of a seizure. The proposed approach significantly exceeds previous results obtained with cross-patient classifiers in terms of both sensitivity and false positive rate. Furthermore, our model proves to be robust to missing channels and variable electrode montages.

preprint, 2014, arXiv

A Tabu Search Algorithm for the Multi-period Inspector Scheduling Problem

This paper introduces a multi-period inspector scheduling problem (MPISP), which is a new variant of the multi-trip vehicle routing problem with time windows (VRPTW). In the MPISP, each inspector is scheduled to perform a route in a given multi-period planning horizon. At the end of each period, each inspector is not required to return to the depot but has to stay at one of the vertices for recuperation. If the remaining time of the current period is insufficient for an inspector to travel from his/her current vertex $A$ to a certain vertex $B$, he/she can either wait at vertex $A$ until the start of the next period or travel to a vertex $C$ that is closer to vertex $B$. Therefore, the shortest transit time between any vertex pair is affected by the length of the period and the departure time. We first describe an approach for computing the shortest transit time between any pair of vertices with an arbitrary departure time. To solve the MPISP, we then propose several local search operators adapted from classical operators for the VRPTW and integrate them into a tabu search framework. In addition, we present a constrained knapsack model that is able to produce an upper bound for the problem. Finally, we evaluate the effectiveness of our algorithm with extensive experiments based on a set of test instances. Our computational results indicate that our approach generates high-quality solutions.

preprint, 2014, arXiv

An Enhanced Branch-and-bound Algorithm for the Talent Scheduling Problem

The talent scheduling problem is a simplified version of the real-world film shooting problem, which aims to determine a shooting sequence so as to minimize the total cost of the actors involved. In this article, we first formulate the problem as an integer linear programming model. Next, we devise a branch-and-bound algorithm to solve the problem. The branch-and-bound algorithm is enhanced by several accelerating techniques, including preprocessing, dominance rules and caching search states. Extensive experiments over two sets of benchmark instances suggest that our algorithm is superior to the current best exact algorithm. Finally, the impacts of different parameter settings are disclosed by some additional experiments.

preprint, 2014, arXiv

Branch-and-price-and-cut for the Split-collection Vehicle Routing Problem with Time Windows and Linear Weight-related Cost

This paper addresses a new vehicle routing problem that simultaneously involves time windows, split collection and linear weight-related cost, which is a generalization of the split delivery vehicle routing problem with time windows (SDVRPTW). This problem consists of determining least-cost vehicle routes to serve a set of customers while respecting the restrictions of vehicle capacity and time windows. The travel cost per unit distance is a linear function of the vehicle weight, and the customer demand can be fulfilled by multiple vehicles. To solve this problem, we propose an exact branch-and-price-and-cut algorithm, where the pricing subproblem is a resource-constrained elementary least-cost path problem. We first prove that at least one optimal solution to the pricing subproblem is associated with an extreme collection pattern, and then design a tailored and novel label-setting algorithm to solve it. Computational results show that our proposed algorithm can handle both the SDVRPTW and our problem effectively.
