Researcher profile

Donggyu Kim

Donggyu Kim contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
11 works
0 followers
13 topics
4 close collaborators

Published work

11 published items

preprint | 2026 | arXiv

High-Dimensional Time-Varying Coefficient Estimation in Diffusion Models

In this paper, we develop a novel high-dimensional time-varying coefficient estimation method, based on high-dimensional Itô diffusion processes. To account for high-dimensional time-varying coefficients, we first estimate local (or instantaneous) coefficients using a time-localized Dantzig selection scheme under a sparsity condition, which results in biased local coefficient estimators due to the regularization. To handle the bias, we propose a debiasing scheme, which provides well-performing unbiased local coefficient estimators. With the unbiased local coefficient estimators, we estimate the integrated coefficient, and to further account for the sparsity of the coefficient process, we apply thresholding schemes. We call this Thresholding dEbiased Dantzig (TED). We establish asymptotic properties of the proposed TED estimator. In the empirical analysis, TED achieves a higher average out-of-sample $R^2$ across assets than benchmark estimators in most periods. Industry-related factors play a central role in explaining asset returns. The estimated integrated coefficients show pronounced time variation associated with firm-specific events and seasonal patterns.

preprint | 2022 | arXiv

Benchmark Dataset for Precipitation Forecasting by Post-Processing the Numerical Weather Prediction

Precipitation forecasting is an important scientific challenge that has wide-reaching impacts on society. Historically, this challenge has been tackled using numerical weather prediction (NWP) models, grounded in physics-based simulations. Recently, many works have proposed an alternative approach, using end-to-end deep learning (DL) models to replace physics-based NWP models. While these DL methods show improved performance and computational efficiency, they exhibit limitations in long-term forecasting and lack explainability. In this work, we present a hybrid NWP-DL workflow to fill the gap between standalone NWP and DL approaches. Under this workflow, the outputs of NWP models are fed into a deep neural network, which post-processes the data to yield a refined precipitation forecast. The deep model is trained with supervision, using Automatic Weather Station (AWS) observations as ground-truth labels. This workflow can achieve the best of both worlds, and can even benefit from future improvements in NWP technology. To facilitate study in this direction, we present a novel dataset focused on the Korean Peninsula, termed KoMet (Korea Meteorological Dataset), comprising NWP outputs and AWS observations. For the NWP model, the Global Data Assimilation and Prediction Systems-Korea Integrated Model (GDAPS-KIM) is utilized. We provide analysis on a comprehensive set of baseline methods aimed at addressing the challenges of KoMet, including the sparsity of AWS observations and class imbalance. To lower the barrier to entry and encourage further study, we also provide an extensive open-source Python package for data processing and model development. Our benchmark data and code are available at https://github.com/osilab-kaist/KoMet-Benchmark-Dataset.

preprint | 2022 | arXiv

Dynamic Realized Beta Models Using Robust Realized Integrated Beta Estimator

This paper introduces a unified parametric modeling approach for time-varying market betas that can accommodate continuous-time diffusion and discrete-time series models based on a continuous-time series regression model to better capture the dynamic evolution of market betas. We call this the dynamic realized beta (DR Beta). We first develop a non-parametric realized integrated beta estimator using high-frequency financial data contaminated by microstructure noise, which is robust to stylized features such as time-varying beta and the dependence structure of microstructure noise, and establish the estimator's asymptotic properties. Then, with the robust realized integrated beta estimator, we propose a quasi-likelihood procedure for estimating the model parameters based on the combined high-frequency data and low-frequency dynamic structure. We also establish asymptotic theorems for the proposed estimator and conduct a simulation study to check its finite-sample performance. The empirical study with the S&P 500 index and the top 50 large trading volume stocks from the S&P 500 illustrates that the proposed DR Beta model effectively accounts for dynamics in the market beta of individual stocks and better predicts future market betas.
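
As a rough illustration of what a realized beta measures, the sketch below computes the textbook ratio of realized covariance to realized market variance from intraday prices. It is not the robust estimator described in the abstract: it assumes noise-free, synchronously observed prices and a constant beta within the day, and the function name `realized_beta` is hypothetical.

```python
import math


def realized_beta(asset_prices, market_prices):
    """Naive realized beta: realized covariance / realized market variance.

    Assumes both price series are observed at the same times and are free
    of microstructure noise (an illustrative simplification).
    """
    if len(asset_prices) != len(market_prices) or len(asset_prices) < 2:
        raise ValueError("need two equally long series with at least 2 prices")

    # Intraday log-returns of the asset and of the market index.
    asset_ret = [math.log(b / a) for a, b in zip(asset_prices, asset_prices[1:])]
    market_ret = [math.log(b / a) for a, b in zip(market_prices, market_prices[1:])]

    realized_cov = sum(x * y for x, y in zip(asset_ret, market_ret))
    realized_var = sum(y * y for y in market_ret)
    if realized_var == 0.0:
        raise ValueError("market returns are all zero; beta is undefined")
    return realized_cov / realized_var


if __name__ == "__main__":
    market = [100.0, 101.0, 99.5, 100.7, 102.0]
    # An asset whose log-returns are exactly twice the market's.
    asset = [m ** 2 / 100.0 for m in market]
    print(realized_beta(asset, market))
```

With noisy or asynchronous data this naive ratio is known to be unreliable, which is the situation the robust estimator in the paper is designed to handle.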

preprint | 2022 | arXiv

Next Generation Models for Portfolio Risk Management: An Approach Using Financial Big Data

This paper proposes a dynamic process of portfolio risk measurement to address potential information loss. The proposed model takes advantage of financial big data to incorporate out-of-target-portfolio information that may be missed when one considers the Value at Risk (VaR) measures only from certain assets of the portfolio. We investigate how the curse of dimensionality can be overcome in the use of financial big data and discuss where and when benefits occur from a large number of assets. In this regard, the proposed approach is the first to suggest the use of financial big data to improve the accuracy of risk analysis. We compare the proposed model with benchmark approaches and empirically show that the use of financial big data improves small portfolio risk analysis. Our findings are useful for portfolio managers and financial regulators, who may seek innovations to improve the accuracy of portfolio risk estimation.

preprint | 2022 | arXiv

Overnight GARCH-Itô Volatility Models

Various parametric volatility models for financial data have been developed to incorporate high-frequency realized volatilities and better capture market dynamics. However, because high-frequency trading data are not available during the close-to-open period, the volatility models often ignore volatility information over the close-to-open period and thus may suffer from loss of important information relevant to market dynamics. In this paper, to account for whole-day market dynamics, we propose an overnight volatility model based on Itô diffusions to accommodate two different instantaneous volatility processes for the open-to-close and close-to-open periods. We develop a weighted least squares method to estimate model parameters for two different periods and investigate its asymptotic properties. We conduct a simulation study to check the finite sample performance of the proposed model and method. Finally, we apply the proposed approaches to real trading data.

preprint | 2022 | arXiv

Volatility Models for Stylized Facts of High-Frequency Financial Data

This paper introduces novel volatility diffusion models to account for the stylized facts of high-frequency financial data such as volatility clustering, intra-day U-shape, and leverage effect. For example, the daily integrated volatility of the proposed volatility process has a realized GARCH structure with an asymmetric effect on log-returns. To further explain the heavy-tailedness of the financial data, we assume that the log-returns have a finite $2b$-th moment for $b \in (1,2]$. Then, we propose a Huber regression estimator which has an optimal convergence rate of $n^{(1-b)/b}$. We also discuss how to adjust bias coming from Huber loss and show its asymptotic properties.
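
To make the role of the Huber loss concrete, here is a minimal sketch of the loss and of a Huber location estimate computed by iteratively reweighted averaging. It illustrates only the general idea of damping heavy-tailed observations; it is not the paper's regression estimator, it does not implement the bias adjustment mentioned above, and the function names and the threshold `c` are illustrative assumptions.

```python
def huber_loss(residual, c):
    """Huber loss: quadratic for |residual| <= c, linear beyond that."""
    a = abs(residual)
    if a <= c:
        return 0.5 * residual * residual
    return c * (a - 0.5 * c)


def huber_location(xs, c, iters=100):
    """Location estimate minimizing the summed Huber loss.

    Uses iteratively reweighted averaging: points within c of the current
    estimate get weight 1, points further away get weight c / distance.
    """
    if not xs:
        raise ValueError("need at least one observation")
    mu = sorted(xs)[len(xs) // 2]  # start from a median-like point
    for _ in range(iters):
        weights = [1.0 if abs(x - mu) <= c else c / abs(x - mu) for x in xs]
        mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
    return mu


if __name__ == "__main__":
    data = [0.9, 1.0, 1.1, 100.0]  # one heavy-tailed observation
    print(sum(data) / len(data))   # the sample mean is pulled toward 100
    print(huber_location(data, c=1.0))
```

The threshold trades robustness against bias: a small `c` ignores more of the tail but shifts the estimate, which is why a bias correction of the kind the abstract mentions becomes relevant.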

preprint | 2021 | arXiv

State Heterogeneity Analysis of Financial Volatility Using High-Frequency Financial Data

Recently, to account for low-frequency market dynamics, several volatility models, employing high-frequency financial data, have been developed. However, in financial markets, we often observe that financial volatility processes depend on economic states, so they have a state heterogeneous structure. In this paper, to study state heterogeneous market dynamics based on high-frequency data, we introduce a novel volatility model based on a continuous Ito diffusion process whose intraday instantaneous volatility process evolves depending on the exogenous state variable, as well as its integrated volatility. We call it the state heterogeneous GARCH-Ito (SG-Ito) model. We suggest a quasi-likelihood estimation procedure with the realized volatility proxy and establish its asymptotic behaviors. Moreover, to test the low-frequency state heterogeneity, we develop a Wald test-type hypothesis testing procedure. The results of empirical studies suggest the existence of leverage, investor attention, market illiquidity, stock market comovement, and post-holiday effect in S&P 500 index volatility.

preprint | 2021 | arXiv

Statistical Analysis of Quantum Annealing

Quantum computers use quantum resources to carry out computational tasks and may outperform classical computers in solving certain computational problems. Special-purpose quantum computers such as quantum annealers employ the quantum adiabatic theorem to solve combinatorial optimization problems. In this paper, we compare classical annealing, such as simulated annealing, with quantum annealing as performed by D-Wave machines, both theoretically and numerically. We show that if the classical and quantum annealing are characterized by equivalent Ising models, then solving an optimization problem, i.e., finding the minimal energy of each Ising model, by the two annealing procedures is mathematically identical. For quantum annealing, we also derive a lower bound on the probability of successfully solving an optimization problem by measuring the system at the end of the annealing procedure. Moreover, we present the Markov chain Monte Carlo (MCMC) method to realize quantum annealing by classical computers and investigate its statistical properties. In the numerical section, we discuss the discrepancies between the MCMC-based annealing approaches and the quantum annealing approach in solving optimization problems.
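
For readers unfamiliar with the classical side of this comparison, the following is a minimal sketch of simulated annealing on a small Ising model, with a brute-force search for reference. It assumes the energy convention H(s) = -sum J_ij s_i s_j - sum h_i s_i and a geometric cooling schedule; both are illustrative choices, and the code does not reproduce the paper's MCMC-based realization of quantum annealing.

```python
import itertools
import math
import random


def ising_energy(spins, J, h):
    """Energy H(s) = -sum_{(i,j)} J_ij s_i s_j - sum_i h_i s_i."""
    energy = -sum(h[i] * spins[i] for i in range(len(spins)))
    for (i, j), coupling in J.items():
        energy -= coupling * spins[i] * spins[j]
    return energy


def simulated_annealing(J, h, n_sweeps=200, t_start=2.0, t_end=0.05, seed=0):
    """Metropolis single-spin flips under a geometric cooling schedule.

    Returns the lowest-energy configuration seen and its energy.
    """
    rng = random.Random(seed)
    n = len(h)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    energy = ising_energy(spins, J, h)
    best_spins, best_energy = spins[:], energy
    for sweep in range(n_sweeps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (sweep / max(n_sweeps - 1, 1))
        for i in range(n):
            spins[i] = -spins[i]  # propose flipping spin i
            new_energy = ising_energy(spins, J, h)
            delta = new_energy - energy
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                energy = new_energy  # accept the flip
            else:
                spins[i] = -spins[i]  # reject: undo the flip
            if energy < best_energy:
                best_spins, best_energy = spins[:], energy
    return best_spins, best_energy


def brute_force_minimum(J, h):
    """Exact minimal energy by enumerating all 2^n configurations."""
    return min(
        ising_energy(s, J, h)
        for s in itertools.product((-1, 1), repeat=len(h))
    )


if __name__ == "__main__":
    # Ferromagnetic chain of 6 spins with no external field.
    J = {(i, i + 1): 1.0 for i in range(5)}
    h = [0.0] * 6
    print(simulated_annealing(J, h))
    print(brute_force_minimum(J, h))
```

Simulated annealing is a heuristic, so the returned energy is only guaranteed to be no lower than the exact minimum; the brute-force helper is practical only for a handful of spins.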

preprint | 2020 | arXiv

How and When the Cassie-Baxter Droplet Starts to Slide on the Textured Surfaces

A droplet in the Cassie-Baxter state has many local energy minima on a textured surface, and the energy barriers between them can be affected by gravity. When the droplet cannot find any local energy minimum on the surface, it starts to slide. Based on the Laplace pressure equation, the shape of a two-dimensional Cassie-Baxter droplet on a textured surface is predicted. Then the stability of the droplet is examined by considering the interference between the liquid and the surface microstructure as well as analyzing the free energy change upon de-pinning. Afterward, the theoretical analysis is validated against a simulation using the line-tension-based front tracking method (LTM), which seamlessly captures the attachment and detachment between the liquid and the substrate. We address two open debates in the field of sliding research: (i) whether sliding initiates with front-end slip or rear-end slip, and (ii) whether the advancing and receding contact angles measured on a horizontal surface are comparable with the front and rear contact angles of the droplet at the onset of sliding. Additionally, a new droplet translation mechanism promoted by a cycle of condensation and evaporation is suggested.

preprint | 2020 | arXiv

Unified Discrete-Time Factor Stochastic Volatility and Continuous-Time Ito Models for Combining Inference Based on Low-Frequency and High-Frequency

This paper introduces unified models for high-dimensional factor-based Ito processes, which can accommodate both continuous-time Ito diffusion and discrete-time stochastic volatility (SV) models by embedding the discrete SV model in the continuous instantaneous factor volatility process. We call it the SV-Ito model. Based on the series of daily integrated factor volatility matrix estimators, we propose quasi-maximum likelihood and least squares estimation methods. Their asymptotic properties are established. We apply the proposed method to predict the future vast volatility matrix and study the asymptotic behavior of the resulting prediction. A simulation study is conducted to check the finite sample performance of the proposed estimation and prediction method. An empirical analysis is carried out to demonstrate the advantage of the SV-Ito model in volatility prediction and portfolio allocation problems.

preprint | 2020 | arXiv

Volatility Analysis with Realized GARCH-Ito Models

This paper introduces a unified approach for modeling high-frequency financial data that can accommodate both the continuous-time jump-diffusion and discrete-time realized GARCH model by embedding the discrete realized GARCH structure in the continuous instantaneous volatility process. The key feature of the proposed model is that the corresponding conditional daily integrated volatility adopts an autoregressive structure where both integrated volatility and jump variation serve as innovations. We call it the realized GARCH-Ito model. Given the autoregressive structure in the conditional daily integrated volatility, we propose a quasi-likelihood function for parameter estimation and establish its asymptotic properties. To improve the parameter estimation, we propose a joint quasi-likelihood function that combines the daily integrated volatility estimated from high-frequency data with a nonparametric volatility estimator obtained from option data. We conduct a simulation study to check the finite sample performance of the proposed methodologies and an empirical study with the S&P500 stock index and option data.
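
The autoregressive structure described above can be pictured with a small recursion in which yesterday's conditional volatility, integrated volatility and jump variation feed into today's conditional volatility. The linear form h_n = omega + gamma * h_{n-1} + alpha * IV_{n-1} + beta * JV_{n-1}, the parameter names and the function name are assumptions made for illustration; the exact specification in the paper may differ.

```python
def conditional_volatility_path(iv, jv, omega, gamma, alpha, beta, h0):
    """Illustrative GARCH-type recursion for conditional daily volatility.

    h_n = omega + gamma * h_{n-1} + alpha * IV_{n-1} + beta * JV_{n-1},
    where IV is the daily integrated volatility and JV the daily jump
    variation (both taken as given inputs in this sketch).
    """
    if len(iv) != len(jv):
        raise ValueError("iv and jv must have the same length")
    path = [h0]
    for iv_prev, jv_prev in zip(iv, jv):
        path.append(omega + gamma * path[-1] + alpha * iv_prev + beta * jv_prev)
    return path


if __name__ == "__main__":
    iv = [1.0, 2.0]  # hypothetical daily integrated volatilities
    jv = [0.0, 1.0]  # hypothetical daily jump variations
    print(conditional_volatility_path(iv, jv, 0.5, 0.5, 0.25, 0.25, h0=1.0))
```

In practice the integrated volatility and jump variation are not observed and would be replaced by estimates from high-frequency data, which is where the quasi-likelihood procedure of the abstract comes in.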