Researcher profile

Kai Hou Yip

Kai Hou Yip contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
12 works
0 followers
6 topics
4 close collaborators

Published work

12 published items

preprint, 2026, arXiv

Adaptive Online Emulation for Accelerating Complex Physical Simulations

Complex physical simulations often require trade-offs between model fidelity and computational feasibility. We introduce Adaptive Online Emulation (AOE), which dynamically learns neural network surrogates during simulation execution to accelerate expensive components. Unlike existing methods requiring extensive offline training, AOE uses Online Sequential Extreme Learning Machines (OS-ELMs) to continuously adapt emulators along the actual simulation trajectory. We employ a numerically stable variant of the OS-ELM using cumulative sufficient statistics to avoid matrix inversion instabilities. AOE integrates with time-stepping frameworks through a three-phase strategy balancing data collection, updates, and surrogate usage, while requiring orders of magnitude less training data than conventional surrogate approaches. Demonstrated on a 1D atmospheric model of exoplanet GJ1214b, AOE achieves 11.1 times speedup (91% time reduction) across 200,000 timesteps while maintaining accuracy, potentially making previously intractable high-fidelity time-stepping simulations computationally feasible.
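
The abstract above does not spell out the update rule, so the following is only a minimal sketch of one way to read "cumulative sufficient statistics": keep running sums of H^T H and H^T y over the hidden-layer outputs H, and re-solve a ridge-regularised linear system after each chunk instead of recursively updating an inverse. The class name `OnlineELM`, the `ridge` term, the tanh activation and the toy sine target are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


class OnlineELM:
    """Single-input, single-output ELM fitted from running sums (illustrative)."""

    def __init__(self, n_hidden=20, ridge=1e-6, seed=0):
        rng = random.Random(seed)
        # Random hidden-layer parameters are drawn once and never trained.
        self.w = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.c = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.ridge = ridge  # assumed regulariser, keeps the system well posed
        # Sufficient statistics: A accumulates H^T H, b accumulates H^T y.
        self.A = [[0.0] * n_hidden for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.beta = [0.0] * n_hidden

    def hidden(self, x):
        return [math.tanh(w * x + c) for w, c in zip(self.w, self.c)]

    def update(self, xs, ys):
        """Fold a new chunk into the running sums, then re-solve for beta."""
        for x, y in zip(xs, ys):
            h = self.hidden(x)
            for i, hi in enumerate(h):
                self.b[i] += hi * y
                for j, hj in enumerate(h):
                    self.A[i][j] += hi * hj
        reg = [[a + (self.ridge if i == j else 0.0) for j, a in enumerate(row)]
               for i, row in enumerate(self.A)]
        self.beta = solve(reg, self.b)

    def predict(self, x):
        return sum(h * b for h, b in zip(self.hidden(x), self.beta))


# Toy target standing in for an expensive simulation component.
xs = [3.0 * k / 199 for k in range(200)]
ys = [math.sin(x) for x in xs]

streamed = OnlineELM()
for start in range(0, 200, 50):  # four chunks of 50 samples
    streamed.update(xs[start:start + 50], ys[start:start + 50])

batch = OnlineELM()
batch.update(xs, ys)  # the same data in a single chunk
```

Because the sums are accumulated sample by sample in the same order, `streamed` and `batch` end up with identical output weights. That is the property that makes this formulation attractive for online use: nothing about the fit depends on how the trajectory was chunked.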

preprint, 2023, arXiv

ESA-Ariel Data Challenge NeurIPS 2022: Introduction to exo-atmospheric studies and presentation of the Atmospheric Big Challenge (ABC) Database

This is an exciting era for exo-planetary exploration. The recently launched JWST and upcoming facilities such as Ariel, Twinkle and the ELTs are set to bring fresh insights to the convoluted processes of planetary formation and evolution and their connections to atmospheric compositions. However, with new opportunities come new challenges. The field of exoplanet atmospheres is already struggling with the incoming volume and quality of data, and machine learning (ML) techniques lend themselves as a promising alternative. Developing techniques of this kind is an inter-disciplinary task, one that requires domain knowledge of the field, access to relevant tools and expert insights on the capability and limitations of current ML models. These stringent requirements have so far limited the development of ML in the field to a few isolated initiatives. In this paper, we present the Atmospheric Big Challenge Database (ABC Database), a carefully designed, organised and publicly available database dedicated to the study of the inverse problem in the context of exoplanetary studies. We have generated 105,887 forward models and 26,109 complementary posterior distributions using the Nested Sampling algorithm. Alongside the database, this paper provides a jargon-free introduction for non-field experts interested in diving into the intricacies of atmospheric studies. This database forms the basis for a multitude of research directions, including, but not limited to, developing rapid inference techniques, benchmarking model performance and mitigating data drifts. A successful application of this database is demonstrated in the NeurIPS Ariel ML Data Challenge 2022.

preprint, 2022, arXiv

Don't Pay Attention to the Noise: Learning Self-supervised Representations of Light Curves with a Denoising Time Series Transformer

Astrophysical light curves are particularly challenging data objects due to the intensity and variety of noise contaminating them. Yet, despite the astronomical volumes of light curves available, the majority of algorithms used to process them still operate on a per-sample basis. To remedy this, we propose a simple Transformer model, called Denoising Time Series Transformer (DTST), and show that it excels at removing the noise and outliers in datasets of time series when trained with a masked objective, even when no clean targets are available. Moreover, the use of self-attention enables rich and illustrative queries into the learned representations. We present experiments on real stellar light curves from the Transiting Exoplanet Survey Satellite (TESS), showing advantages of our approach compared to traditional denoising techniques.
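
As a rough illustration of what a masked objective looks like when no clean targets exist, the sketch below hides a random subset of a noisy series and scores a prediction only at the hidden positions, against the noisy observations themselves. The helper names `mask_series` and `masked_mse`, the fill value, the mask fraction and the flux values are all hypothetical. The constant "model" is only a stand-in, not the DTST architecture.

```python
import random


def mask_series(values, mask_fraction=0.2, fill=0.0, seed=0):
    """Replace a random subset of samples with `fill`; return series and indices."""
    rng = random.Random(seed)
    n_masked = max(1, round(mask_fraction * len(values)))
    masked_idx = sorted(rng.sample(range(len(values)), n_masked))
    hidden = set(masked_idx)
    corrupted = [fill if i in hidden else v for i, v in enumerate(values)]
    return corrupted, masked_idx


def masked_mse(predicted, observed, masked_idx):
    """Mean squared error evaluated only at the masked positions."""
    return sum((predicted[i] - observed[i]) ** 2 for i in masked_idx) / len(masked_idx)


# Hypothetical noisy flux values with one outlier.
flux = [1.00, 1.02, 0.98, 0.75, 1.01, 0.99, 1.03, 0.97, 1.00, 1.01]
corrupted, masked_idx = mask_series(flux)

# Stand-in model: predict the mean of the visible samples everywhere.
visible = [v for i, v in enumerate(flux) if i not in masked_idx]
baseline = [sum(visible) / len(visible)] * len(flux)
loss = masked_mse(baseline, flux, masked_idx)
```

The model never sees the values it is scored on, so it has to infer them from context. Noise that cannot be predicted from neighbouring samples is therefore not rewarded, which is one common intuition for why this kind of objective can denoise.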

preprint, 2022, arXiv

ESA-Ariel Data Challenge NeurIPS 2022: Inferring Physical Properties of Exoplanets From Next-Generation Telescopes

The study of extra-solar planets, or simply exoplanets (planets outside our own Solar System), is fundamentally a grand quest to understand our place in the Universe. Discoveries in the last two decades have re-defined our understanding of planets, and helped us comprehend the uniqueness of our very own Earth. In recent years the focus has shifted from planet detection to planet characterisation, where key planetary properties are inferred from telescope observations using Monte Carlo-based methods. However, the efficiency of sampling-based methodologies is put under strain by the high-resolution observational data from next generation telescopes, such as the James Webb Space Telescope and the Ariel Space Mission. We are delighted to announce the acceptance of the Ariel ML Data Challenge 2022 as part of the NeurIPS competition track. The goal of this challenge is to identify a reliable and scalable method to perform planetary characterisation. Depending on the chosen track, participants are tasked with providing either quartile estimates or the approximate distribution of key planetary properties. To this end, a synthetic spectroscopic dataset has been generated from the official simulators for the ESA Ariel Space Mission. The aims of the competition are three-fold: 1) to offer a challenging application for comparing and advancing conditional density estimation methods; 2) to provide a valuable contribution towards reliable and efficient analysis of spectroscopic data, enabling astronomers to build a better picture of planetary demographics; and 3) to promote the interaction between ML and exoplanetary science. The competition is open from 15th June and will run until early October. Participants of all skill levels are more than welcome!

preprint, 2021, arXiv

Conservative Policy Construction Using Variational Autoencoders for Logged Data with Missing Values

In high-stakes applications of data-driven decision making like healthcare, it is of paramount importance to learn a policy that maximizes the reward while avoiding potentially dangerous actions when there is uncertainty. There are two main challenges usually associated with this problem. Firstly, learning through online exploration is not possible due to the critical nature of such applications. Therefore, we need to resort to observational datasets with no counterfactuals. Secondly, such datasets are usually imperfect, additionally cursed with missing values in the attributes of features. In this paper, we consider the problem of constructing personalized policies using logged data when there are missing values in the attributes of features in both training and test data. The goal is to recommend an action (treatment) when $\tilde{\mathbf{X}}$, a degraded version of $\mathbf{X}$ with missing values, is observed. We consider three strategies for dealing with missingness. In particular, we introduce the conservative strategy, where the policy is designed to safely handle the uncertainty due to missingness. To implement this strategy, we need to estimate the posterior distribution $p(\mathbf{X}|\tilde{\mathbf{X}})$, and we use a variational autoencoder to achieve this. Specifically, our method is based on partial variational autoencoders (PVAE), which are designed to capture the underlying structure of features with missing values.

preprint, 2020, arXiv

ARES I: WASP-76 b, A Tale of Two HST Spectra

We analyse the transmission and emission spectra of the ultra-hot Jupiter WASP-76 b, observed with the G141 grism of the Hubble Space Telescope's Wide Field Camera 3 (WFC3). We reduce and fit the raw data for each observation using the open-source software Iraclis before performing a fully Bayesian retrieval using the publicly available analysis suite TauREx 3. Previous studies of the WFC3 transmission spectra of WASP-76 b found hints of titanium oxide (TiO) and vanadium oxide (VO) or non-grey clouds. Accounting for a fainter stellar companion to WASP-76, we reanalyse these data and show that removing the effects of this background star changes the slope of the spectrum. As a result, these visible absorbers are no longer detected and a non-grey cloud model is no longer needed to adequately fit the data, while the strong water feature previously seen is maintained. However, our analysis of the emission spectrum suggests the presence of TiO and an atmospheric thermal inversion, along with a significant amount of water. Given the brightness of the host star and the size of the atmospheric features, WASP-76 b is an excellent target for further characterisation with HST, or with future facilities, to better understand the nature of its atmosphere, to confirm the presence of TiO and to search for other optical absorbers.

preprint, 2020, arXiv

ARES II: Characterising the Hot Jupiters WASP-127 b, WASP-79 b and WASP-62 b with HST

This paper presents the atmospheric characterisation of three large, gaseous planets: WASP-127 b, WASP-79 b and WASP-62 b. We analysed spectroscopic data obtained with the G141 grism (1.088 - 1.68 $\mu$m) of the Wide Field Camera 3 (WFC3) onboard the Hubble Space Telescope (HST) using the Iraclis pipeline and the TauREx 3 retrieval code, both of which are publicly available. For WASP-127 b, which is the least dense planet discovered so far and is located in the short-period Neptune desert, our retrieval results found strong water absorption corresponding to an abundance of log(H$_2$O) = $-2.71^{+0.78}_{-1.05}$, and absorption compatible with an iron hydride abundance of log(FeH) = $-5.25^{+0.88}_{-1.10}$, with an extended cloudy atmosphere. We also detected water vapour in the atmospheres of WASP-79 b and WASP-62 b, with best-fit models indicating the presence of iron hydride, too. We used the Atmospheric Detectability Index (ADI) as well as Bayesian log evidence to quantify the strength of the detection and compared our results to the hot Jupiter population study by Tsiaras et al. 2018. While all the planets studied here are suitable targets for characterisation with upcoming facilities such as the James Webb Space Telescope (JWST) and Ariel, WASP-127 b is of particular interest due to its low density, and a thorough atmospheric study would develop our understanding of planet formation and migration.

preprint, 2020, arXiv

ARES III: Unveiling the Two Faces of KELT-7 b with HST WFC3

We present the analysis of the hot Jupiter KELT-7 b using transmission and emission spectroscopy from the Hubble Space Telescope (HST), both taken with the Wide Field Camera 3 (WFC3). Our study uncovers a rich transmission spectrum which is consistent with a cloud-free atmosphere and suggests the presence of H$_2$O and H$^-$. In contrast, the extracted emission spectrum does not contain strong absorption features and, although it is not consistent with a simple blackbody, it can be explained by a varying temperature-pressure profile, collision-induced absorption (CIA) and H$^-$. KELT-7 b has also been studied with other space-based instruments and we explore the effects of introducing these additional datasets. Further observations with Hubble, or the next generation of space-based telescopes, are needed to allow for the optical opacity source in transmission to be confirmed and for molecular features to be disentangled in emission.

preprint, 2020, arXiv

Hubble WFC3 Spectroscopy of the Habitable-zone Super-Earth LHS 1140 b

Atmospheric characterisation of temperate, rocky planets is the holy grail of exoplanet studies. These worlds are at the limits of our capabilities with current instrumentation in transmission spectroscopy and challenge our state-of-the-art statistical techniques. Here we present the transmission spectrum of the temperate Super-Earth LHS 1140 b using the Hubble Space Telescope (HST). The Wide Field Camera 3 (WFC3) G141 grism data of this habitable zone (T$_{\rm{eq}}$ = 235 K) Super-Earth (R = 1.7 $R_\oplus$) show tentative evidence of water. However, the signal-to-noise ratio, and thus the significance of the detection, is low and stellar contamination models can cause modulation over the spectral band probed. We attempt to correct for contamination using these models and find that, while many still lead to evidence for water, some could provide reasonable fits to the data without the need for molecular absorption, although most of these also cause nonphysical features in the visible ground-based data. Future observations with the James Webb Space Telescope (JWST) would be capable of confirming, or refuting, this atmospheric detection.

preprint, 2020, arXiv

Original Research By Young Twinkle Students (ORBYTS): Ephemeris Refinement of Transiting Exoplanets

We report follow-up observations of transiting exoplanets that have either large uncertainties (>10 minutes) in their transit times or have not been observed for over three years. A fully robotic ground-based telescope network, observations from citizen astronomers and data from TESS have been used to study eight planets, refining their ephemeris and orbital data. Such follow-up observations are key for ensuring accurate transit times for upcoming ground and space-based telescopes which may seek to characterise the atmospheres of these planets. We find deviations from the expected transit time for all planets, with transits occurring outside the 1 sigma uncertainties for seven planets. Using the newly acquired observations, we subsequently refine their periods and reduce the current predicted ephemeris uncertainties to 0.28 - 4.01 minutes. A significant portion of this work has been completed by students at two high schools in London as part of the Original Research By Young Twinkle Students (ORBYTS) programme.
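
To show why stale ephemerides matter, here is a small worked sketch of standard linear error propagation for a predicted mid-transit time T_n = T0 + n P, assuming the errors on T0 and P are independent. This is a textbook relation, not necessarily the procedure used in the paper. The function name and the numerical values are hypothetical.

```python
import math


def predicted_transit_uncertainty(sigma_t0, sigma_period, n_epochs):
    """1-sigma uncertainty on T_n = T0 + n * P, assuming independent errors."""
    return math.hypot(sigma_t0, n_epochs * sigma_period)


# Hypothetical planet with a 3.5 d period; all values in days.
period = 3.5
sigma_t0 = 0.0005        # uncertainty on the reference mid-transit time
sigma_period = 0.000005  # uncertainty on the orbital period
epochs_in_three_years = round(3 * 365.25 / period)

# Uncertainty on the predicted transit time after three unobserved years.
drift_minutes = 24 * 60 * predicted_transit_uncertainty(
    sigma_t0, sigma_period, epochs_in_three_years
)
```

The period term grows linearly with the number of elapsed epochs, so even a small period uncertainty eventually dominates the prediction. A new observation resets the reference epoch and tightens the period at the same time.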

preprint, 2020, arXiv

Original Research By Young Twinkle Students (ORBYTS): Ephemeris Refinement of Transiting Exoplanets II

We report follow-up observations of four transiting exoplanets, TrES-2b, HAT-P-22b, HAT-P-36b and XO-2b, as part of the Original Research By Young Twinkle Students (ORBYTS) programme. These observations were taken using the Las Cumbres Observatory Global Telescope Network's (LCOGT) robotic 0.4 m telescopes and were analysed using the HOlomon Photometric Software (HOPS). Such observations are key for ensuring accurate transit times for upcoming telescopes, such as the James Webb Space Telescope (JWST), Twinkle and Ariel, which may seek to characterise the atmospheres of these planets. The data have been uploaded to ExoClock and a significant portion of this work has been completed by secondary school students in London.

preprint, 2020, arXiv

Pushing the Limits of Exoplanet Discovery via Direct Imaging with Deep Learning

Further advances in exoplanet detection and characterisation require sampling a diverse population of extrasolar planets. One technique to detect these distant worlds is through the direct detection of their thermal emission. The so-called direct imaging technique is suitable for observing young planets far from their star. These are very low signal-to-noise-ratio (SNR) measurements, and limited ground truth hinders the use of supervised learning approaches. In this paper, we combine deep generative and discriminative models to bypass the issues arising when directly training on real data. We use a Generative Adversarial Network to obtain a suitable dataset for training Convolutional Neural Network classifiers to detect and locate planets across a wide range of SNRs. Tested on artificial data, our detectors exhibit good predictive performance and robustness across SNRs. To demonstrate the limits of the detectors, we provide maps of the precision and recall of the model per pixel of the input image. On real data, the models can re-confirm bright source detections.
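
One plausible reading of "maps of the precision and recall of the model per pixel" is sketched below: true positives, false positives and false negatives are counted separately at every pixel position across a stack of test images, and the two ratios are formed per pixel. The function name, the binary-mask input format and the tiny example are assumptions for illustration. The paper's exact evaluation protocol is not given in the abstract.

```python
def precision_recall_maps(predictions, truths):
    """Per-pixel precision and recall over a stack of binary images.

    predictions, truths: lists of equally sized 2D lists of 0/1 values.
    Pixels with an undefined ratio (zero denominator) are reported as None.
    """
    rows, cols = len(truths[0]), len(truths[0][0])
    tp = [[0] * cols for _ in range(rows)]
    fp = [[0] * cols for _ in range(rows)]
    fn = [[0] * cols for _ in range(rows)]
    for pred, truth in zip(predictions, truths):
        for r in range(rows):
            for c in range(cols):
                if pred[r][c] and truth[r][c]:
                    tp[r][c] += 1
                elif pred[r][c]:
                    fp[r][c] += 1
                elif truth[r][c]:
                    fn[r][c] += 1

    def ratio(num, den):
        return num / den if den else None

    precision = [[ratio(tp[r][c], tp[r][c] + fp[r][c]) for c in range(cols)]
                 for r in range(rows)]
    recall = [[ratio(tp[r][c], tp[r][c] + fn[r][c]) for c in range(cols)]
              for r in range(rows)]
    return precision, recall


# Three hypothetical 1x2 test images: ground-truth planet masks and detections.
truths = [[[1, 0]], [[1, 0]], [[0, 0]]]
predictions = [[[1, 1]], [[0, 0]], [[1, 0]]]
precision, recall = precision_recall_maps(predictions, truths)
# precision == [[0.5, 0.0]] and recall == [[0.5, None]]
```

Recall is undefined at the second pixel because no planet was ever injected there, which is why the sketch returns `None` rather than a number. A map built this way shows where in the field of view a detector is reliable rather than giving a single global score.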