Source author record

Chansoo Lee

Chansoo Lee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning; math.NT; Artificial Intelligence; Computer Science and Game Theory; Data Structures and Algorithms; Distributed, Parallel, and Cluster Computing; math.OC

Catalog footprint

What is connected

10 works

7 topics

4 close collaborators


preprint 2023 arXiv

Open Source Vizier: Distributed Infrastructure and API for Reliable and Flexible Blackbox Optimization

Vizier is the de facto blackbox and hyperparameter optimization service across Google, having optimized some of Google's largest products and research efforts. To operate at the scale of tuning thousands of users' critical systems, Google Vizier solved key design challenges in providing multiple different features, while remaining fully fault-tolerant. In this paper, we introduce Open Source (OSS) Vizier, a standalone Python-based interface for blackbox optimization and research, based on the Google-internal Vizier infrastructure and framework. OSS Vizier provides an API capable of defining and solving a wide variety of optimization problems, including multi-metric, early stopping, transfer learning, and conditional search. Furthermore, it is designed to be a distributed system that assures reliability, and allows multiple parallel evaluations of the user's objective function. The flexible RPC-based infrastructure allows users to access OSS Vizier from binaries written in any language. OSS Vizier also provides a back-end ("Pythia") API that gives algorithm authors a way to interface new algorithms with the core OSS Vizier system. OSS Vizier is available at https://github.com/google/vizier.

preprint 2022 arXiv

Pre-training helps Bayesian optimization too

Bayesian optimization (BO) has become a popular strategy for global optimization of many expensive real-world functions. Contrary to a common belief that BO is suited to optimizing black-box functions, it actually requires domain knowledge on characteristics of those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process priors that specify initial beliefs on functions. However, even with expert knowledge, it is not an easy task to select a prior. This is especially true for hyperparameter tuning problems on complex machine learning models, where landscapes of tuning objectives are often difficult to comprehend. We seek an alternative practice for setting these functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. To verify our approach in realistic model training setups, we collected a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art models on popular image and text datasets, as well as a protein sequence dataset. Our results show that on average, our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.

preprint 2022 arXiv

Task Selection for AutoML System Evaluation

Our goal is to assess if AutoML system changes - i.e., to the search space or hyperparameter optimization - will improve the final model's performance on production tasks. However, we cannot test the changes on production tasks. Instead, we only have access to limited descriptors about tasks that our AutoML system previously executed, like the number of data points or features. We also have a set of development tasks to test changes, e.g., sampled from OpenML with no usage constraints. However, the development and production task distributions are different, leading us to pursue changes that only improve development and not production. This paper proposes a method to leverage descriptor information about AutoML production tasks to select a filtered subset of the most relevant development tasks. Empirical studies show that our filtering strategy improves the ability to assess AutoML system changes on holdout tasks with different distributions than development.

preprint 2020 arXiv

Gradientless Descent: High-Dimensional Zeroth-Order Optimization

Zeroth-order optimization is the process of minimizing an objective $f(x)$, given oracle access to evaluations at adaptively chosen inputs $x$. In this paper, we present two simple yet powerful GradientLess Descent (GLD) algorithms that do not rely on an underlying gradient estimate and are numerically stable. We analyze our algorithm from a novel geometric perspective and present a novel analysis that shows convergence within an $\epsilon$-ball of the optimum in $O(kQ\log(n)\log(R/\epsilon))$ evaluations, for any monotone transform of a smooth and strongly convex objective with latent dimension $k < n$, where the input dimension is $n$, $R$ is the diameter of the input space and $Q$ is the condition number. Our rates are the first of their kind to be both 1) poly-logarithmically dependent on dimensionality and 2) invariant under monotone transformations. We further leverage our geometric perspective to show that our analysis is optimal. Both monotone invariance and its ability to utilize a low latent dimensionality are key to the empirical success of our algorithms, as demonstrated on BBOB and MuJoCo benchmarks.
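The comparison-only idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the halving radius schedule, and the test objective are assumptions chosen for illustration. Each step samples one candidate per radius from a geometric ladder and keeps the best point seen, so only the ordering of objective values matters, which is why a monotone transform of the objective would not change the choices made.

```python
import math
import random


def sample_in_ball(dim, radius, rng):
    """Draw a point uniformly from the Euclidean ball of the given radius."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in direction))
    # Scaling by U^(1/dim) makes the point uniform in the ball's volume.
    scale = radius * rng.random() ** (1.0 / dim) / norm
    return [d * scale for d in direction]


def gld_search(f, x0, max_radius, min_radius, steps, seed=0):
    """Hypothetical sketch: one candidate per radius max_radius * 2^-k per step."""
    rng = random.Random(seed)
    num_scales = int(math.log2(max_radius / min_radius)) + 1
    x, fx = list(x0), f(x0)
    for _ in range(steps):
        best_x, best_f = x, fx
        for k in range(num_scales):
            step = sample_in_ball(len(x), max_radius * 2.0 ** (-k), rng)
            cand = [xi + si for xi, si in zip(x, step)]
            f_cand = f(cand)
            if f_cand < best_f:  # only the ordering of values is used
                best_x, best_f = cand, f_cand
        x, fx = best_x, best_f
    return x, fx


def quadratic(x):
    """Illustrative ill-conditioned objective (an assumption, not from the paper)."""
    return sum((i + 1) * xi * xi for i, xi in enumerate(x))


start = [1.0] * 5
x_final, f_final = gld_search(
    quadratic, start, max_radius=4.0, min_radius=1e-3, steps=200
)
```

Because the current point is always kept as a fallback, the objective value never increases from one step to the next in this sketch.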

preprint 2016 arXiv

Hardness of Online Sleeping Combinatorial Optimization Problems

We show that several online combinatorial optimization problems that admit efficient no-regret algorithms become computationally hard in the sleeping setting where a subset of actions becomes unavailable in each round. Specifically, we show that the sleeping versions of these problems are at least as hard as PAC learning DNF expressions, a long-standing open problem. We show hardness for the sleeping versions of Online Shortest Paths, Online Minimum Spanning Tree, Online $k$-Subsets, Online $k$-Truncated Permutations, Online Minimum Cut, and Online Bipartite Matching. The hardness result for the sleeping version of the Online Shortest Paths problem resolves an open problem presented at COLT 2015 (Koolen et al., 2015).

preprint 2015 arXiv

Fighting Bandits with a New Kind of Smoothness

We define a novel family of algorithms for the adversarial multi-armed bandit problem, and provide a simple analysis technique based on convex smoothing. We prove two main results. First, we show that regularization via the Tsallis entropy, which includes EXP3 as a special case, achieves the $\Theta(\sqrt{TN})$ minimax regret. Second, we show that a wide class of perturbation methods achieve a near-optimal regret as low as $O(\sqrt{TN \log N})$ if the perturbation distribution has a bounded hazard rate. For example, the Gumbel, Weibull, Frechet, Pareto, and Gamma distributions all satisfy this key property.
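The bounded hazard rate condition can be checked numerically for one of the listed distributions. The sketch below assumes the standard definition of the hazard rate, $f(x) / (1 - F(x))$, and the standard Gumbel distribution with $F(x) = \exp(-e^{-x})$; the function name and the evaluation grid are illustrative choices, not taken from the paper.

```python
import math


def gumbel_hazard(x):
    """Hazard rate f(x) / (1 - F(x)) of the standard Gumbel distribution."""
    t = math.exp(-x)            # F(x) = exp(-t), f(x) = t * exp(-t)
    survival = -math.expm1(-t)  # 1 - F(x), computed without cancellation
    return t * math.exp(-t) / survival


# Evaluate on a grid from -5 to 30 in steps of 0.1.
grid = [i / 10.0 for i in range(-50, 301)]
hazards = [gumbel_hazard(x) for x in grid]
max_hazard = max(hazards)
```

Writing $t = e^{-x}$, the hazard rate simplifies to $t / (e^{t} - 1)$, which is below 1 for every $t > 0$ and approaches 1 as $x$ grows, so the values on the grid stay bounded by 1.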

preprint 2015 arXiv

Spectral Smoothing via Random Matrix Perturbations

We consider stochastic smoothing of spectral functions of matrices using perturbations commonly studied in random matrix theory. We show that a spectral function remains spectral when smoothed using a unitarily invariant perturbation distribution. We then derive state-of-the-art smoothing bounds for the maximum eigenvalue function using the Gaussian Orthogonal Ensemble (GOE). Smoothing the maximum eigenvalue function is important for applications in semidefinite optimization and online learning. As a direct consequence of our GOE smoothing results, we obtain an $O((N \log N)^{1/4} \sqrt{T})$ expected regret bound for the online variance minimization problem using an algorithm that performs only a single maximum eigenvector computation per time step. Here $T$ is the number of rounds and $N$ is the matrix dimension. Our algorithm and its analysis also extend to the more general online PCA problem where the learner has to output a rank $k$ subspace. The algorithm just requires computing $k$ maximum eigenvectors per step and enjoys an $O(k (N \log N)^{1/4} \sqrt{T})$ expected regret bound.

preprint 2014 arXiv

Cubic Irrationals and Periodicity via a Family of Multi-dimensional Continued Fraction Algorithms

We construct a countable family of multi-dimensional continued fraction algorithms, built out of five specific multidimensional continued fractions, and find a wide class of cubic irrational real numbers $a$ such that either $(a, a^2)$ or $(a, a-a^2)$ is purely periodic with respect to an element in the family. These cubic irrationals seem to be quite natural, as we show that, for every cubic number field, there exists a pair $(u,u')$ with $u$ a unit in the cubic number field (or possibly the quadratic extension of the cubic number field by the square root of the discriminant) such that $(u,u')$ has a periodic multidimensional continued fraction expansion under one of the maps in the family generated by the initial five maps. Thus these results are built on a careful technical analysis of certain units in cubic number fields and our family of multi-dimensional continued fractions. We then recast the linking of cubic irrationals with periodicity to the linking of cubic irrationals with the construction of a matrix with nonnegative integer entries for which at least one row is eventually periodic.

preprint 2014 arXiv

Online Linear Optimization via Smoothing

We present a new optimization-theoretic approach to analyzing Follow-the-Leader style algorithms, particularly in the setting where perturbations are used as a tool for regularization. We show that adding a strongly convex penalty function to the decision rule and adding stochastic perturbations to data correspond to deterministic and stochastic smoothing operations, respectively. We establish an equivalence between "Follow the Regularized Leader" and "Follow the Perturbed Leader" up to the smoothness properties. This intuition leads to a new generic analysis framework that recovers and improves the previously known regret bounds of the class of algorithms commonly known as Follow the Perturbed Leader.
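One well-known special case makes the regularization/perturbation correspondence concrete: in the experts setting, Follow the Perturbed Leader with Gumbel noise selects each arm with exactly the exponential-weights probabilities, which are what Follow the Regularized Leader yields under an entropic regularizer. The sketch below compares the two by simulation. It is an illustration of that special case only, not the paper's analysis framework; the function names, the loss vector, and the learning rate `eta` are assumptions.

```python
import math
import random


def softmax_of_negative_losses(losses, eta):
    """Exponential-weights distribution: p_i proportional to exp(-eta * L_i)."""
    shift = min(losses)  # subtract the minimum for numerical stability
    weights = [math.exp(-eta * (loss - shift)) for loss in losses]
    total = sum(weights)
    return [w / total for w in weights]


def gumbel(rng):
    """Sample a standard Gumbel variable as -log(-log(U)), U uniform on (0, 1)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def ftpl_choice(losses, eta, rng):
    """Follow the Perturbed Leader: pick the arm with the best perturbed score."""
    scores = [-eta * loss + gumbel(rng) for loss in losses]
    return max(range(len(losses)), key=scores.__getitem__)


def ftpl_frequencies(losses, eta, trials, seed=0):
    """Estimate FTPL's choice distribution by repeated sampling."""
    rng = random.Random(seed)
    counts = [0] * len(losses)
    for _ in range(trials):
        counts[ftpl_choice(losses, eta, rng)] += 1
    return [c / trials for c in counts]


cumulative_losses = [3.0, 1.0, 2.0, 1.5]  # illustrative values
ftrl_probs = softmax_of_negative_losses(cumulative_losses, eta=1.0)
ftpl_probs = ftpl_frequencies(cumulative_losses, eta=1.0, trials=20000)
```

With enough trials, the empirical frequencies in `ftpl_probs` should be close to `ftrl_probs`, up to sampling noise.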

preprint 2012 arXiv

A Generalized Family of Multidimensional Continued Fractions: TRIP Maps

Most well-known multidimensional continued fractions, including the Mönkemeyer map and the triangle map, are generated by repeatedly subdividing triangles. This paper constructs a family of multidimensional continued fractions by permuting the vertices of these triangles before and after each subdivision. We obtain an even larger class of multidimensional continued fractions by composing the maps in the family. These include the algorithms of Brun, Parry-Daniels and Güting. We give criteria for when multidimensional continued fractions associate sequences to unique points, which allows us to determine when periodicity of the corresponding multidimensional continued fraction corresponds to pairs of real numbers being cubic irrationals in the same number field.
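As a concrete instance of a map generated by subdividing a triangle, the triangle map mentioned above can be sketched as follows. The formula used here is the commonly cited form, $T(x, y) = (y/x, (1 - x - ky)/x)$ with $k = \lfloor (1 - x)/y \rfloor$ on the region $1 > x > y > 0$; it is not taken from this record, and the function name and the example input are assumptions. The integer $k$ records which sub-triangle the point falls in at each step.

```python
from fractions import Fraction


def triangle_map_digits(x, y, max_steps=20):
    """Iterate T(x, y) = (y/x, (1 - x - k*y)/x), k = floor((1 - x)/y), recording each k.

    Assumes 1 > x > y > 0 on entry; exact rational arithmetic avoids rounding drift.
    """
    digits = []
    for _ in range(max_steps):
        if y == 0:  # the point has reached the boundary; the expansion terminates
            break
        k = (1 - x) // y
        x, y = y / x, (1 - x - k * y) / x
        digits.append(int(k))
    return digits


digits = triangle_map_digits(Fraction(1, 2), Fraction(1, 3))
```

For the rational pair $(1/2, 1/3)$ the iteration passes through $(2/3, 1/3)$ and then $(1/2, 0)$, so the expansion terminates after two digits. Irrational inputs would need a different number representation, since `Fraction` only handles rationals.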
