Source author record

Jinsung Yoon

Jinsung Yoon appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Artificial Intelligence Computation and Language Computer Vision Cryptography and Security cs.CY

Catalog footprint

What is connected

13works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

LEAF: A Living Benchmark for Event-Augmented Forecasting

Large Language Models (LLMs) are increasingly applied to forecasting. To evaluate this capability while mitigating pre-training data contamination, several living benchmarks have been proposed. However, existing benchmarks either lack the multidimensional events essential for accurate forecasting due to data scarcity, or focus on relatively closed environments. To assess the predictive capabilities of LLMs in complex, real-world scenarios, we propose LEAF, the first living benchmark for event-augmented forecasting tasks, including future event probabilities, trend and time series forecasting. LEAF utilizes a recursive retrieval agent system paired with dual-agent cross-validation to provide comprehensive and relevant auxiliary text for forecasting. Evaluating state-of-the-art proprietary and open-weight LLMs, we find that these models can leverage signals extracted from complex events to enhance predictive performance. In the stock domain, we find that LLMs achieve better performance on equities they confidently identify as more predictable. Furthermore, the events demonstrate a strong correlation with the target equities. To this end, LEAF provides a necessary, dynamically updating testbed to continuously track and drive progress in event-driven forecasting tasks.

preprint2026arXiv

Nexus : An Agentic Framework for Time Series Forecasting

Time series forecasting is not just numerical extrapolation, but often requires reasoning with unstructured contextual data such as news or events. While specialized Time Series Foundation Models (TSFMs) excel at forecasting based on numerical patterns, they remain unaware to real-world textual signals. Conversely, while LLMs are emerging as zero-shot forecasters, their performance remains uneven across domains and contextual grounding. To bridge this gap, we introduce Nexus, a multi-agent forecasting framework that decomposes prediction into specialized stages: isolating macro-level and micro-level temporal fluctuations, and integrating contextual information when available before synthesizing a final forecast. This decomposition enables Nexus to adapt from seasonal signals to volatile, event-driven information without relying on external statistical anchors or monolithic prompting. We show that current-generation LLMs possess substantially stronger intrinsic forecasting ability than previously recognized, depending critically on how numerical and contextual reasoning are organized. Evaluated on data strictly succeeding LLM knowledge cutoffs spanning Zillow real estate metrics and volatile stock market equities, Nexus consistently matches or outperforms state-of-the-art TSFMs and strong LLM baselines. Beyond numerical accuracy, Nexus produces high-quality reasoning traces that explicitly show the fundamental drivers behind each forecast. Our results establish that real-world forecasting is an agentic reasoning problem extending well beyond only sequence modeling.

preprint2026arXiv

Reasoning-Aware Training for Time Series Forecasting

Time Series Foundation Models (TSFMs) excel at numerical forecasting but operate as black boxes lacking qualitative reasoning. Conversely, applying LLMs directly to temporal data introduces a modality gap: text tokenizers fragment continuous numerical values, degrading mathematical relationships and exploding sequence lengths, leading to computational overhead. To resolve this, we introduce STRIDE (Strategic Time-series Reasoning Injected via Distilled Embeddings), a novel framework natively integrating LLM reasoning into the continuous embedding space of TSFMs. Instead of discrete tokens, STRIDE distills reasoning traces into a lightweight LLM, dynamically projecting its mean-pooled hidden states as a cross-modal prior into the target numerical encoder. The architecture is jointly optimized using cross-entropy and quantile losses. Evaluations demonstrate STRIDE establishes state-of-the-art numerical forecasting on GIFT-Eval (0.674 MASE, 0.454 CRPS) compared to TSFMs and exhibits superior in-domain and out-of-domain numerical as well as reasoning performance on TFRBench. Specifically, STRIDE acts as a plug-and-play enhancement, consistently improving diverse TSFMs (e.g., Chronos-2, Timer-S1) across various LLM configurations. Thus, injecting semantic reasoning as a continuous prior equips TSFMs with human-interpretable reasoning while fundamentally improving predictive accuracy.

preprint2022arXiv

Self-supervise, Refine, Repeat: Improving Unsupervised Anomaly Detection

Anomaly detection (AD), separating anomalies from normal data, has many applications across domains, from security to healthcare. While most previous works were shown to be effective for cases with fully or partially labeled data, that setting is in practice less common due to labeling being particularly tedious for this task. In this paper, we focus on fully unsupervised AD, in which the entire training dataset, containing both normal and anomalous samples, is unlabeled. To tackle this problem effectively, we propose to improve the robustness of one-class classification trained on self-supervised representations using a data refinement process. Our proposed data refinement approach is based on an ensemble of one-class classifiers (OCCs), each of which is trained on a disjoint subset of training data. Representations learned by self-supervised learning on the refined data are iteratively updated as the data refinement improves. We demonstrate our method on various unsupervised AD tasks with image and tabular data. With a 10% anomaly ratio on CIFAR-10 image data / 2.5% anomaly ratio on Thyroid tabular data, the proposed method outperforms the state-of-the-art one-class classifier by 6.3 AUC and 12.5 average precision / 22.9 F1-score.

preprint2022arXiv

Towards Group Robustness in the presence of Partial Group Labels

Learning invariant representations is an important requirement when training machine learning models that are driven by spurious correlations in the datasets. These spurious correlations, between input samples and the target labels, wrongly direct the neural network predictions resulting in poor performance on certain groups, especially the minority groups. Robust training against these spurious correlations requires the knowledge of group membership for every sample. Such a requirement is impractical in situations where the data labeling efforts for minority or rare groups are significantly laborious or where the individuals comprising the dataset choose to conceal sensitive information. On the other hand, the presence of such data collection efforts results in datasets that contain partially labeled group information. Recent works have tackled the fully unsupervised scenario where no labels for groups are available. Thus, we aim to fill the missing gap in the literature by tackling a more realistic setting that can leverage partially available sensitive or group information during training. First, we construct a constraint set and derive a high probability bound for the group assignment to belong to the set. Second, we propose an algorithm that optimizes for the worst-off group assignments from the constraint set. Through experiments on image and tabular datasets, we show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.

preprint2021arXiv

6MapNet: Representing soccer players from tracking data by a triplet network

Although the values of individual soccer players have become astronomical, subjective judgments still play a big part in the player analysis. Recently, there have been new attempts to quantitatively grasp players' styles using video-based event stream data. However, they have some limitations in scalability due to high annotation costs and sparsity of event stream data. In this paper, we build a triplet network named 6MapNet that can effectively capture the movement styles of players using in-game GPS data. Without any annotation of soccer-specific actions, we use players' locations and velocities to generate two types of heatmaps. Our subnetworks then map these heatmap pairs into feature vectors whose similarity corresponds to the actual similarity of playing styles. The experimental results show that players can be accurately identified with only a small number of matches by our method.

preprint2021arXiv

Interpretable Sequence Learning for COVID-19 Forecasting

We propose a novel approach that integrates machine learning into compartmental disease modeling to predict the progression of COVID-19. Our model is explainable by design as it explicitly shows how different compartments evolve and it uses interpretable encoders to incorporate covariates and improve performance. Explainability is valuable to ensure that the model's forecasts are credible to epidemiologists and to instill confidence in end-users such as policy makers and healthcare institutions. Our model can be applied at different geographic resolutions, and here we demonstrate it for states and counties in the United States. We show that our model provides more accurate forecasts, in metrics averaged across the entire US, than state-of-the-art alternatives, and that it provides qualitatively meaningful explanatory insights. Lastly, we analyze the performance of our model for different subgroups based on the subgroup distributions within the counties.

preprint2020arXiv

Hide-and-Seek Privacy Challenge

The clinical time-series setting poses a unique combination of challenges to data modeling and sharing. Due to the high dimensionality of clinical time series, adequate de-identification to preserve privacy while retaining data utility is difficult to achieve using common de-identification techniques. An innovative approach to this problem is synthetic data generation. From a technical perspective, a good generative model for time-series data should preserve temporal dynamics, in the sense that new sequences respect the original relationships between high-dimensional variables across time. From the privacy perspective, the model should prevent patient re-identification by limiting vulnerability to membership inference attacks. The NeurIPS 2020 Hide-and-Seek Privacy Challenge is a novel two-tracked competition to simultaneously accelerate progress in tackling both problems. In our head-to-head format, participants in the synthetic data generation track (i.e. "hiders") and the patient re-identification track (i.e. "seekers") are directly pitted against each other by way of a new, high-quality intensive care time-series dataset: the AmsterdamUMCdb dataset. Ultimately, we seek to advance generative techniques for dense and high-dimensional temporal data streams that are (1) clinically meaningful in terms of fidelity and predictivity, as well as (2) capable of minimizing membership privacy risks in terms of the concrete notion of patient re-identification.

preprint2016arXiv

A Semi-Markov Switching Linear Gaussian Model for Censored Physiological Data

Critically ill patients in regular wards are vulnerable to unanticipated clinical dete- rioration which requires timely transfer to the intensive care unit (ICU). To allow for risk scoring and patient monitoring in such a setting, we develop a novel Semi- Markov Switching Linear Gaussian Model (SSLGM) for the inpatients' physiol- ogy. The model captures the patients' latent clinical states and their corresponding observable lab tests and vital signs. We present an efficient unsupervised learn- ing algorithm that capitalizes on the informatively censored data in the electronic health records (EHR) to learn the parameters of the SSLGM; the learned model is then used to assess the new inpatients' risk for clinical deterioration in an online fashion, allowing for timely ICU admission. Experiments conducted on a het- erogeneous cohort of 6,094 patients admitted to a large academic medical center show that the proposed model significantly outperforms the currently deployed risk scores such as Rothman index, MEWS, SOFA and APACHE.

preprint2016arXiv

Adaptive Ensemble Learning with Confidence Bounds

Extracting actionable intelligence from distributed, heterogeneous, correlated and high-dimensional data sources requires run-time processing and learning both locally and globally. In the last decade, a large number of meta-learning techniques have been proposed in which local learners make online predictions based on their locally-collected data instances, and feed these predictions to an ensemble learner, which fuses them and issues a global prediction. However, most of these works do not provide performance guarantees or, when they do, these guarantees are asymptotic. None of these existing works provide confidence estimates about the issued predictions or rate of learning guarantees for the ensemble learner. In this paper, we provide a systematic ensemble learning method called Hedged Bandits, which comes with both long run (asymptotic) and short run (rate of learning) performance guarantees. Moreover, our approach yields performance guarantees with respect to the optimal local prediction strategy, and is also able to adapt its predictions in a data-driven manner. We illustrate the performance of Hedged Bandits in the context of medical informatics and show that it outperforms numerous online and offline ensemble learning methods.

preprint2016arXiv

Personalized Donor-Recipient Matching for Organ Transplantation

Organ transplants can improve the life expectancy and quality of life for the recipient but carries the risk of serious post-operative complications, such as septic shock and organ rejection. The probability of a successful transplant depends in a very subtle fashion on compatibility between the donor and the recipient but current medical practice is short of domain knowledge regarding the complex nature of recipient-donor compatibility. Hence a data-driven approach for learning compatibility has the potential for significant improvements in match quality. This paper proposes a novel system (ConfidentMatch) that is trained using data from electronic health records. ConfidentMatch predicts the success of an organ transplant (in terms of the 3 year survival rates) on the basis of clinical and demographic traits of the donor and recipient. ConfidentMatch captures the heterogeneity of the donor and recipient traits by optimally dividing the feature space into clusters and constructing different optimal predictive models to each cluster. The system controls the complexity of the learned predictive model in a way that allows for assuring more granular and confident predictions for a larger number of potential recipient-donor pairs, thereby ensuring that predictions are "personalized" and tailored to individual characteristics to the finest possible granularity. Experiments conducted on the UNOS heart transplant dataset show the superiority of the prognostic value of ConfidentMatch to other competing benchmarks; ConfidentMatch can provide predictions of success with 95% confidence for 5,489 patients of a total population of 9,620 patients, which corresponds to 410 more patients than the most competitive benchmark algorithm (DeepBoost).

preprint2016arXiv

Personalized Risk Scoring for Critical Care Patients using Mixtures of Gaussian Process Experts

We develop a personalized real time risk scoring algorithm that provides timely and granular assessments for the clinical acuity of ward patients based on their (temporal) lab tests and vital signs. Heterogeneity of the patients population is captured via a hierarchical latent class model. The proposed algorithm aims to discover the number of latent classes in the patients population, and train a mixture of Gaussian Process (GP) experts, where each expert models the physiological data streams associated with a specific class. Self-taught transfer learning is used to transfer the knowledge of latent classes learned from the domain of clinically stable patients to the domain of clinically deteriorating patients. For new patients, the posterior beliefs of all GP experts about the patient's clinical status given her physiological data stream are computed, and a personalized risk score is evaluated as a weighted average of those beliefs, where the weights are learned from the patient's hospital admission information. Experiments on a heterogeneous cohort of 6,313 patients admitted to Ronald Regan UCLA medical center show that our risk score outperforms the currently deployed risk scores, such as MEWS and Rothman scores.

preprint2016arXiv

Personalized Risk Scoring for Critical Care Prognosis using Mixtures of Gaussian Processes

Objective: In this paper, we develop a personalized real-time risk scoring algorithm that provides timely and granular assessments for the clinical acuity of ward patients based on their (temporal) lab tests and vital signs; the proposed risk scoring system ensures timely intensive care unit (ICU) admissions for clinically deteriorating patients. Methods: The risk scoring system learns a set of latent patient subtypes from the offline electronic health record data, and trains a mixture of Gaussian Process (GP) experts, where each expert models the physiological data streams associated with a specific patient subtype. Transfer learning techniques are used to learn the relationship between a patient's latent subtype and her static admission information (e.g. age, gender, transfer status, ICD-9 codes, etc). Results: Experiments conducted on data from a heterogeneous cohort of 6,321 patients admitted to Ronald Reagan UCLA medical center show that our risk score significantly and consistently outperforms the currently deployed risk scores, such as the Rothman index, MEWS, APACHE and SOFA scores, in terms of timeliness, true positive rate (TPR), and positive predictive value (PPV). Conclusion: Our results reflect the importance of adopting the concepts of personalized medicine in critical care settings; significant accuracy and timeliness gains can be achieved by accounting for the patients' heterogeneity. Significance: The proposed risk scoring methodology can confer huge clinical and social benefits on more than 200,000 critically ill inpatient who exhibit cardiac arrests in the US every year.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint