Source author record

Jian Tan

Jian Tan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Artificial Intelligence Databases Information Theory math.IT Biomolecules Cell Behavior math.OC Performance

Catalog footprint

What is connected

8works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

CobBO: Coordinate Backoff Bayesian Optimization with Two-Stage Kernels

Bayesian optimization is a popular method for optimizing expensive black-box functions. Yet it oftentimes struggles in high dimensions where the computation could be prohibitively heavy. To alleviate this problem, we introduce Coordinate backoff Bayesian Optimization (CobBO) with two-stage kernels. During each round, the first stage uses a simple coarse kernel that sacrifices the approximation accuracy for computational efficiency. It captures the global landscape by purposely smoothing away local fluctuations. Then, in the second stage of the same round, past observed points in the full space are projected to the selected subspace to form virtual points. These virtual points, along with the means and variances of their unknown function values estimated using the simple kernel of the first stage, are fitted to a more sophisticated kernel model in the second stage. Within the selected low dimensional subspace, the computational cost of conducting Bayesian optimization therein becomes affordable. To further enhance the performance, a sequence of consecutive observations in the same subspace are collected, which can effectively refine the approximation of the function. This refinement lasts until a stopping rule is met determining when to back off from a certain subspace and switch to another. This decoupling significantly reduces the computational burden in high dimensions, which fully leverages the observations in the whole space rather than only relying on observations in each coordinate subspace. Extensive evaluations show that CobBO finds solutions comparable to or better than other state-of-the-art methods for dimensions ranging from tens to hundreds, while reducing both the trial complexity and computational costs.

preprint2022arXiv

Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation

Recently, using automatic configuration tuning to improve the performance of modern database management systems (DBMSs) has attracted increasing interest from the database community. This is embodied with a number of systems featuring advanced tuning capabilities being developed. However, it remains a challenge to select the best solution for database configuration tuning, considering the large body of algorithm choices. In addition, beyond the applications on database systems, we could find more potential algorithms designed for configuration tuning. To this end, this paper provides a comprehensive evaluation of configuration tuning techniques from a broader perspective, hoping to better benefit the database community. In particular, we summarize three key modules of database configuration tuning systems and conduct extensive ablation studies using various challenging cases. Our evaluation demonstrates that the hyper-parameter optimization algorithms can be borrowed to further enhance the database configuration tuning. Moreover, we identify the best algorithm choices for different modules. Beyond the comprehensive evaluations, we offer an efficient and unified database configuration tuning benchmark via surrogates that reduces the evaluation cost to a minimum, allowing for extensive runs and analysis of new techniques.

preprint2022arXiv

LPC-AD: Fast and Accurate Multivariate Time Series Anomaly Detection via Latent Predictive Coding

This paper proposes LPC-AD, a fast and accurate multivariate time series (MTS) anomaly detection method. LPC-AD is motivated by the ever-increasing needs for fast and accurate MTS anomaly detection methods to support fast troubleshooting in cloud computing, micro-service systems, etc. LPC-AD is fast in the sense that its reduces the training time by as high as 38.2% compared to the state-of-the-art (SOTA) deep learning methods that focus on training speed. LPC-AD is accurate in the sense that it improves the detection accuracy by as high as 18.9% compared to SOTA sophisticated deep learning methods that focus on enhancing detection accuracy. Methodologically, LPC-AD contributes a generic architecture LPC-Reconstruct for one to attain different trade-offs between training speed and detection accuracy. More specifically, LPC-Reconstruct is built on ideas from autoencoder for reducing redundancy in time series, latent predictive coding for capturing temporal dependence in MTS, and randomized perturbation for avoiding overfitting of anomalous dependence in the training data. We present simple instantiations of LPC-Reconstruct to attain fast training speed, where we propose a simple randomized perturbation method. The superior performance of LPC-AD over SOTA methods is validated by extensive experiments on four large real-world datasets. Experiment results also show the necessity and benefit of each component of the LPC-Reconstruct architecture and that LPC-AD is robust to hyper parameters.

preprint2022arXiv

Towards Dynamic and Safe Configuration Tuning for Cloud Databases

Configuration knobs of database systems are essential to achieve high throughput and low latency. Recently, automatic tuning systems using machine learning methods (ML) have shown to find better configurations compared to experienced database administrators (DBAs). However, there are still gaps to apply the existing systems in production environments, especially in the cloud. First, they conduct tuning for a given workload within a limited time window and ignore the dynamicity of workloads and data. Second, they rely on a copied instance and do not consider the availability of the database when sampling configurations, making the tuning expensive, delayed, and unsafe. To fill these gaps, we propose OnlineTune, which tunes the online databases safely in changing cloud environments. To accommodate the dynamicity, OnlineTune embeds the environmental factors as context feature and adopts contextual Bayesian Optimization with context space partition to optimize the database adaptively and scalably. To pursue safety during tuning, we leverage the black-box and the white-box knowledge to evaluate the safety of configurations and propose a safe exploration strategy via subspace adaptation.%, greatly decreasing the risks of applying bad configurations. We conduct evaluations on dynamic workloads from benchmarks and real-world workloads. Compared with the state-of-the-art methods, OnlineTune achieves 14.4%~165.3% improvement on cumulative performance while reducing 91.0%~99.5% unsafe configuration recommendations.

preprint2020arXiv

Learning Latent Features with Pairwise Penalties in Low-Rank Matrix Completion

Low-rank matrix completion has achieved great success in many real-world data applications. A matrix factorization model that learns latent features is usually employed and, to improve prediction performance, the similarities between latent variables can be exploited by pairwise learning using the graph regularized matrix factorization (GRMF) method. However, existing GRMF approaches often use the squared loss to measure the pairwise differences, which may be overly influenced by dissimilar pairs and lead to inferior prediction. To fully empower pairwise learning for matrix completion, we propose a general optimization framework that allows a rich class of (non-)convex pairwise penalty functions. A new and efficient algorithm is developed to solve the proposed optimization problem, with a theoretical convergence guarantee under mild assumptions. In an important situation where the latent variables form a small number of subgroups, its statistical guarantee is also fully considered. In particular, we theoretically characterize the performance of the complexity-regularized maximum likelihood estimator, as a special case of our framework, which is shown to have smaller errors when compared to the standard matrix completion framework without pairwise penalties. We conduct extensive experiments on both synthetic and real datasets to demonstrate the superior performance of this general framework.

preprint2015arXiv

Yeast caspase 1 suppresses the burst of reactive oxygen species and maintains mitochondrial stability in Saccharomyces cerevisiae

Caspases are a family of cysteine proteases that play essential roles during apoptosis, and we presume some of them may also protect the cell from oxidative stress. We found that the absence of yeast caspase 1(Yca1)in Saccharomyces cerevisiae leads to a more intense burst of mitochondrial reactive oxygen species (ROS) In addition, compared to wild type yeast cells, the ability of yca1 mutant cells to maintain mitochondrial activity is significantly reduced after either oxidative stress treatment or aging. During mitochondrial ROS burst, deletion of the yca1 gene delayed structural damage of a green fluorescent protein (GFP) reporter bound in the inner mitochondrial membrane. This work implies that yeast caspase 1 is closely connected to the oxidative stress response. We speculate that Yca1 can discriminate proteins damaged by oxidation and accelerate their hydrolysis to attenuate the ROS burst.

preprint2012arXiv

Delay Asymptotics with Retransmissions and Incremental Redundancy Codes over Erasure Channels

Recent studies have shown that retransmissions can cause heavy-tailed transmission delays even when packet sizes are light-tailed. Moreover, the impact of heavy-tailed delays persists even when packets size are upper bounded. The key question we study in this paper is how the use of coding techniques to transmit information, together with different system configurations, would affect the distribution of delay. To investigate this problem, we model the underlying channel as a Markov modulated binary erasure channel, where transmitted bits are either received successfully or erased. Erasure codes are used to encode information prior to transmission, which ensures that a fixed fraction of the bits in the codeword can lead to successful decoding. We use incremental redundancy codes, where the codeword is divided into codeword trunks and these trunks are transmitted one at a time to provide incremental redundancies to the receiver until the information is recovered. We characterize the distribution of delay under two different scenarios: (I) Decoder uses memory to cache all previously successfully received bits. (II) Decoder does not use memory, where received bits are discarded if the corresponding information cannot be decoded. In both cases, we consider codeword length with infinite and finite support. From a theoretical perspective, our results provide a benchmark to quantify the tradeoff between system complexity and the distribution of delay.

preprint2011arXiv

Secrecy Outage Capacity of Fading Channels

This paper considers point to point secure communication over flat fading channels under an outage constraint. More specifically, we extend the definition of outage capacity to account for the secrecy constraint and obtain sharp characterizations of the corresponding fundamental limits under two different assumptions on the transmitter CSI (Channel state information). First, we find the outage secrecy capacity assuming that the transmitter has perfect knowledge of the legitimate and eavesdropper channel gains. In this scenario, the capacity achieving scheme relies on opportunistically exchanging private keys between the legitimate nodes. These keys are stored in a key buffer and later used to secure delay sensitive data using the Vernam's one time pad technique. We then extend our results to the more practical scenario where the transmitter is assumed to know only the legitimate channel gain. Here, our achievability arguments rely on privacy amplification techniques to generate secret key bits. In the two cases, we also characterize the optimal power control policies which, interestingly, turn out to be a judicious combination of channel inversion and the optimal ergodic strategy. Finally, we analyze the effect of key buffer overflow on the overall outage probability.

Jian Tan

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

CobBO: Coordinate Backoff Bayesian Optimization with Two-Stage Kernels

Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation

LPC-AD: Fast and Accurate Multivariate Time Series Anomaly Detection via Latent Predictive Coding

Towards Dynamic and Safe Configuration Tuning for Cloud Databases

Learning Latent Features with Pairwise Penalties in Low-Rank Matrix Completion

Yeast caspase 1 suppresses the burst of reactive oxygen species and maintains mitochondrial stability in Saccharomyces cerevisiae

Delay Asymptotics with Retransmissions and Incremental Redundancy Codes over Erasure Channels

Secrecy Outage Capacity of Fading Channels