Researcher profile

Jian Tan

Jian Tan contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2022arXiv

CobBO: Coordinate Backoff Bayesian Optimization with Two-Stage Kernels

Bayesian optimization is a popular method for optimizing expensive black-box functions. Yet it oftentimes struggles in high dimensions where the computation could be prohibitively heavy. To alleviate this problem, we introduce Coordinate backoff Bayesian Optimization (CobBO) with two-stage kernels. During each round, the first stage uses a simple coarse kernel that sacrifices the approximation accuracy for computational efficiency. It captures the global landscape by purposely smoothing away local fluctuations. Then, in the second stage of the same round, past observed points in the full space are projected to the selected subspace to form virtual points. These virtual points, along with the means and variances of their unknown function values estimated using the simple kernel of the first stage, are fitted to a more sophisticated kernel model in the second stage. Within the selected low dimensional subspace, the computational cost of conducting Bayesian optimization therein becomes affordable. To further enhance the performance, a sequence of consecutive observations in the same subspace are collected, which can effectively refine the approximation of the function. This refinement lasts until a stopping rule is met determining when to back off from a certain subspace and switch to another. This decoupling significantly reduces the computational burden in high dimensions, which fully leverages the observations in the whole space rather than only relying on observations in each coordinate subspace. Extensive evaluations show that CobBO finds solutions comparable to or better than other state-of-the-art methods for dimensions ranging from tens to hundreds, while reducing both the trial complexity and computational costs.

preprint2022arXiv

Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation

Recently, using automatic configuration tuning to improve the performance of modern database management systems (DBMSs) has attracted increasing interest from the database community. This is embodied with a number of systems featuring advanced tuning capabilities being developed. However, it remains a challenge to select the best solution for database configuration tuning, considering the large body of algorithm choices. In addition, beyond the applications on database systems, we could find more potential algorithms designed for configuration tuning. To this end, this paper provides a comprehensive evaluation of configuration tuning techniques from a broader perspective, hoping to better benefit the database community. In particular, we summarize three key modules of database configuration tuning systems and conduct extensive ablation studies using various challenging cases. Our evaluation demonstrates that the hyper-parameter optimization algorithms can be borrowed to further enhance the database configuration tuning. Moreover, we identify the best algorithm choices for different modules. Beyond the comprehensive evaluations, we offer an efficient and unified database configuration tuning benchmark via surrogates that reduces the evaluation cost to a minimum, allowing for extensive runs and analysis of new techniques.

preprint2022arXiv

LPC-AD: Fast and Accurate Multivariate Time Series Anomaly Detection via Latent Predictive Coding

This paper proposes LPC-AD, a fast and accurate multivariate time series (MTS) anomaly detection method. LPC-AD is motivated by the ever-increasing needs for fast and accurate MTS anomaly detection methods to support fast troubleshooting in cloud computing, micro-service systems, etc. LPC-AD is fast in the sense that its reduces the training time by as high as 38.2% compared to the state-of-the-art (SOTA) deep learning methods that focus on training speed. LPC-AD is accurate in the sense that it improves the detection accuracy by as high as 18.9% compared to SOTA sophisticated deep learning methods that focus on enhancing detection accuracy. Methodologically, LPC-AD contributes a generic architecture LPC-Reconstruct for one to attain different trade-offs between training speed and detection accuracy. More specifically, LPC-Reconstruct is built on ideas from autoencoder for reducing redundancy in time series, latent predictive coding for capturing temporal dependence in MTS, and randomized perturbation for avoiding overfitting of anomalous dependence in the training data. We present simple instantiations of LPC-Reconstruct to attain fast training speed, where we propose a simple randomized perturbation method. The superior performance of LPC-AD over SOTA methods is validated by extensive experiments on four large real-world datasets. Experiment results also show the necessity and benefit of each component of the LPC-Reconstruct architecture and that LPC-AD is robust to hyper parameters.

preprint2022arXiv

Towards Dynamic and Safe Configuration Tuning for Cloud Databases

Configuration knobs of database systems are essential to achieve high throughput and low latency. Recently, automatic tuning systems using machine learning methods (ML) have shown to find better configurations compared to experienced database administrators (DBAs). However, there are still gaps to apply the existing systems in production environments, especially in the cloud. First, they conduct tuning for a given workload within a limited time window and ignore the dynamicity of workloads and data. Second, they rely on a copied instance and do not consider the availability of the database when sampling configurations, making the tuning expensive, delayed, and unsafe. To fill these gaps, we propose OnlineTune, which tunes the online databases safely in changing cloud environments. To accommodate the dynamicity, OnlineTune embeds the environmental factors as context feature and adopts contextual Bayesian Optimization with context space partition to optimize the database adaptively and scalably. To pursue safety during tuning, we leverage the black-box and the white-box knowledge to evaluate the safety of configurations and propose a safe exploration strategy via subspace adaptation.%, greatly decreasing the risks of applying bad configurations. We conduct evaluations on dynamic workloads from benchmarks and real-world workloads. Compared with the state-of-the-art methods, OnlineTune achieves 14.4%~165.3% improvement on cumulative performance while reducing 91.0%~99.5% unsafe configuration recommendations.

preprint2020arXiv

Learning Latent Features with Pairwise Penalties in Low-Rank Matrix Completion

Low-rank matrix completion has achieved great success in many real-world data applications. A matrix factorization model that learns latent features is usually employed and, to improve prediction performance, the similarities between latent variables can be exploited by pairwise learning using the graph regularized matrix factorization (GRMF) method. However, existing GRMF approaches often use the squared loss to measure the pairwise differences, which may be overly influenced by dissimilar pairs and lead to inferior prediction. To fully empower pairwise learning for matrix completion, we propose a general optimization framework that allows a rich class of (non-)convex pairwise penalty functions. A new and efficient algorithm is developed to solve the proposed optimization problem, with a theoretical convergence guarantee under mild assumptions. In an important situation where the latent variables form a small number of subgroups, its statistical guarantee is also fully considered. In particular, we theoretically characterize the performance of the complexity-regularized maximum likelihood estimator, as a special case of our framework, which is shown to have smaller errors when compared to the standard matrix completion framework without pairwise penalties. We conduct extensive experiments on both synthetic and real datasets to demonstrate the superior performance of this general framework.