Source author record

Jae Sung Park

Jae Sung Park appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Computer Vision, physics.flu-dyn, Robotics, Computation and Language, Machine Learning, cond-mat.soft, eess.SP, math.DS, nlin.PS

Catalog footprint

What is connected

12 works

9 topics

4 close collaborators

Preprint, 2026, arXiv

VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition

Videos are unique in their ability to capture actions which transcend multiple frames. Accordingly, for many years action recognition was the quintessential task for video understanding. Unfortunately, due to a lack of sufficiently diverse and challenging data, modern vision-language models (VLMs) are no longer evaluated on their action recognition capabilities. To revitalize action recognition in the era of VLMs, we advocate for a returned focus on domain-specific actions. To this end, we introduce VideoNet, a domain-specific action recognition benchmark covering 1,000 distinct actions from 37 domains. We begin with a multiple-choice evaluation setting, where the difference between closed and open models is stark: Gemini 3.1 Pro attains 69.9% accuracy while Qwen3-VL-8B gets a mere 45.0%. To understand why VLMs struggle on VideoNet, we relax the questions into a binary setting, where random chance is 50%. Still, Qwen achieves only 59.2% accuracy. Further relaxing the evaluation setup, we provide $k\in\{1,2,3\}$ in-context examples of the action. Some models excel in the few-shot setting, while others falter; Qwen improves $+7.0\%$, while Gemini declines $-4.8\%$. Notably, these gains fall short of the $+13.6\%$ improvement in non-expert humans when given few-shot examples. Finding that VLMs struggle to fully exploit in-context examples, we shift from test-time improvements to the training side. We collect the first large-scale training dataset for domain-specific actions, totaling nearly 500k video question-answer pairs. Fine-tuning a Molmo2-4B model on our data, we surpass all open-weight 8B models on the VideoNet benchmark.

Preprint, 2022, arXiv

The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning

Humans have a remarkable capacity to reason abductively and hypothesize about what lies beyond the literal content of an image. By identifying concrete visual clues scattered throughout a scene, we almost can't help but draw probable inferences beyond the literal scene based on our everyday experience and knowledge about the world. For example, if we see a "20 mph" sign alongside a road, we might assume the street sits in a residential area (rather than on a highway), even if no houses are pictured. Can machines perform similar visual reasoning? We present Sherlock, an annotated corpus of 103K images for testing machine capacity for abductive reasoning beyond literal image contents. We adopt a free-viewing paradigm: participants first observe and identify salient clues within images (e.g., objects, actions) and then provide a plausible inference about the scene, given the clue. In total, we collect 363K (clue, inference) pairs, which form a first-of-its-kind abductive visual reasoning dataset. Using our corpus, we test three complementary axes of abductive reasoning. We evaluate the capacity of models to: i) retrieve relevant inferences from a large candidate corpus; ii) localize evidence for inferences via bounding boxes, and iii) compare plausible inferences to match human judgments on a newly-collected diagnostic corpus of 19K Likert-scale judgments. While we find that fine-tuning CLIP-RN50x64 with a multitask objective outperforms strong baselines, significant headroom exists between model performance and human agreement. Data, models, and leaderboard available at http://visualabduction.com/

Preprint, 2021, arXiv

Exact coherent structures and phase space geometry of pre-turbulent 2D active nematic channel flow

Confined active nematics exhibit rich dynamical behavior, including spontaneous flows, periodic defect dynamics, and chaotic 'active turbulence'. Here, we study these phenomena using the framework of Exact Coherent Structures, which has been successful in characterizing the routes to high Reynolds number turbulence of passive fluids. Exact Coherent Structures are stationary, periodic, quasiperiodic, or traveling wave solutions of the hydrodynamic equations that, together with their invariant manifolds, serve as an organizing template of the dynamics. We compute the dominant Exact Coherent Structures and connecting orbits in a pre-turbulent active nematic channel flow, which enables a fully nonlinear but highly reduced order description in terms of a directed graph. Using this reduced representation, we compute instantaneous perturbations that switch the system between disparate spatiotemporal states occupying distant regions of the infinite dimensional phase space. Our results lay the groundwork for a systematic means of understanding and controlling active nematic flows in the moderate to high activity regime.
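As a rough illustration of the directed-graph description mentioned in this abstract, the sketch below treats Exact Coherent Structures as nodes and connecting orbits as directed edges, then searches for a route between two states. The node names, the edge list, and the `shortest_route` helper are all hypothetical and are not taken from the paper; the search is a plain breadth-first search, not the authors' method for computing switching perturbations.

```python
from collections import deque

# Hypothetical labels: nodes stand for exact coherent structures,
# directed edges stand for connecting orbits. None come from the paper.
connections = {
    "unidirectional_flow": ["periodic_orbit_A"],
    "periodic_orbit_A": ["periodic_orbit_B", "traveling_wave"],
    "periodic_orbit_B": ["periodic_orbit_A"],
    "traveling_wave": [],
}


def shortest_route(graph, start, goal):
    """Breadth-first search for the route with the fewest connecting orbits."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of connecting orbits reaches the goal


route = shortest_route(connections, "unidirectional_flow", "traveling_wave")
```

Because the edges are directed, a route in one direction does not imply a route back: in this toy graph nothing leads out of `traveling_wave`.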

Preprint, 2020, arXiv

Dynamics of laminar and transitional flows over slip surfaces: effects on the laminar-turbulent separatrix

The effect of slip surfaces on the laminar-turbulent separatrix of plane Poiseuille flow is studied by direct numerical simulation. Turbulence lifetimes, which indicate the likelihood that turbulence is sustained, are investigated for transitional flows with various slip lengths. Slip surfaces decrease the likelihood of sustained turbulence compared to the no-slip case, and the likelihood decreases further as slip length is increased. A deterministic analysis of the effects of slip surfaces on transition to turbulence is performed using nonlinear traveling wave solutions to the Navier-Stokes equations, also known as exact coherent solutions. Two solution families, dubbed P3 and P4, are used since their lower-branch solutions are embedded on the boundary of the basin of attraction of laminar and turbulent flows (Park & Graham 2015). Additionally, they exhibit distinct flow structures -- the P3 and P4 are denoted as core mode and critical layer mode, respectively. Distinct effects of slip surfaces on the solutions are observed by the skin friction evolution, linear growth rate, and phase-space projection of transitional trajectories. The slip surface modifies transition dynamics little for the core mode, but considerably for the critical layer mode. Most importantly, the slip surface promotes different transition dynamics -- early and bypass-like transition for the core mode and delayed and H-/K-type-like transition for the critical layer mode. Based on spatiotemporal and quadrant analyses, it is found that slip surfaces promote the prevalence of strong wall-toward motions (sweep-like events) near vortex cores close to the channel centre, inducing an early transition, while sustained ejection events are present in the region of the $\Lambda$-shaped vortex cores close to the critical layer, resulting in a delayed transition.

Preprint, 2020, arXiv

HMPO: Human Motion Prediction in Occluded Environments for Safe Motion Planning

We present a novel approach to generate collision-free trajectories for a robot operating in close proximity with a human obstacle in an occluded environment. The self-occlusions of the robot can significantly reduce the accuracy of human motion prediction, and we present a novel deep learning-based prediction algorithm. Our formulation uses CNNs and LSTMs and we augment human-action datasets with synthetically generated occlusion information for training. We also present an occlusion-aware planner that uses our motion prediction algorithm to compute collision-free trajectories. We highlight the performance of the overall approach (HMPO) in complex scenarios and observe up to a 68% improvement in motion prediction accuracy, and a 38% improvement in terms of error distance between the ground-truth and the predicted human joint positions.

Preprint, 2020, arXiv

Identity-Aware Multi-Sentence Video Description

Standard video and movie description tasks abstract away from person identities, thus failing to link identities across sentences. We propose a multi-sentence Identity-Aware Video Description task, which overcomes this limitation and requires re-identifying persons locally within a set of consecutive clips. We introduce an auxiliary task of Fill-in the Identity, which aims to predict persons' IDs consistently within a set of clips when the video descriptions are given. Our proposed approach to this task leverages a Transformer architecture allowing for coherent joint prediction of multiple IDs. One of the key components is a gender-aware textual representation as well as an additional gender prediction objective in the main model. This auxiliary task allows us to propose a two-stage approach to Identity-Aware Video Description. We first generate multi-sentence video descriptions, and then apply our Fill-in the Identity model to establish links between the predicted person entities. To be able to tackle both tasks, we augment the Large Scale Movie Description Challenge (LSMDC) benchmark with new annotations suited for our problem statement. Experiments show that our proposed Fill-in the Identity model is superior to several baselines and recent works, and allows us to generate descriptions with locally re-identified people.

Preprint, 2020, arXiv

LSTM-based Anomaly Detection for Non-linear Dynamical System

Anomaly detection for non-linear dynamical systems plays an important role in ensuring system stability. However, it is usually complex and has to be solved by large-scale simulation, which requires extensive computing resources. In this paper, we propose a novel anomaly detection scheme for non-linear dynamical systems based on Long Short-Term Memory (LSTM) to capture complex temporal changes of the time sequence and make multi-step predictions. Specifically, we first present the framework of LSTM-based anomaly detection in non-linear dynamical systems, including data preprocessing, multi-step prediction and anomaly detection. According to the prediction requirement, two types of training modes are explored in multi-step prediction, where samples in a wall shear stress dataset are collected by an adaptive sliding window. On the basis of the multi-step prediction result, a Local Average with Adaptive Parameters (LAAP) algorithm is proposed to extract local numerical features of the time sequence and estimate the upcoming anomaly. The experimental results show that our proposed multi-step prediction method achieves a higher prediction accuracy than the traditional method on the wall shear stress dataset, and the LAAP algorithm performs better than the absolute value-based method in the anomaly detection task.
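To make the pipeline in this abstract more concrete, here is a minimal sketch of two of its steps: cutting a time series into (input, multi-step target) samples, and flagging anomalies from a local average. It rests on several assumptions. The window is a fixed size rather than the adaptive one the abstract mentions, the LSTM predictor is omitted entirely, and `local_average_flags` uses a fixed window and threshold, so it is a simplified stand-in rather than the LAAP algorithm, whose details the abstract does not give. All function names and the toy signal are hypothetical.

```python
def sliding_windows(series, input_len, horizon):
    """Split a sequence into (input window, multi-step target) pairs."""
    pairs = []
    for start in range(len(series) - input_len - horizon + 1):
        x = series[start:start + input_len]
        y = series[start + input_len:start + input_len + horizon]
        pairs.append((x, y))
    return pairs


def local_average_flags(values, window, threshold):
    """Flag each index whose trailing local average exceeds a fixed threshold.

    Simplified stand-in: the window and threshold are fixed, not adaptive.
    """
    flags = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        local = values[lo:i + 1]
        flags.append(sum(local) / len(local) > threshold)
    return flags


# Toy signal with a burst in the middle (illustrative values only).
signal = [1.0, 1.1, 0.9, 1.0, 3.0, 3.2, 3.1, 1.0]
pairs = sliding_windows(signal, input_len=3, horizon=2)
flags = local_average_flags(signal, window=2, threshold=2.0)
```

In a fuller pipeline the flags would be computed on the model's multi-step predictions rather than on the raw signal, so that an anomaly can be estimated before it arrives.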

Preprint, 2020, arXiv

VisualCOMET: Reasoning about the Dynamic Context of a Still Image

Even from a single frame of a still image, people can reason about the dynamic story of the image before, after, and beyond the frame. For example, given an image of a man struggling to stay afloat in water, we can reason that the man fell into the water sometime in the past, the intent of that man at the moment is to stay alive, and he will need help in the near future or else he will get washed away. We propose VisualComet, a novel framework of visual commonsense reasoning tasks to predict events that might have happened before, events that might happen next, and the intents of the people at present. To support research toward visual commonsense reasoning, we introduce the first large-scale repository of Visual Commonsense Graphs that consists of over 1.4 million textual descriptions of visual commonsense inferences carefully annotated over a diverse set of 60,000 images, each paired with short video summaries of before and after. In addition, we provide person-grounding (i.e., co-reference links) between people appearing in the image and people mentioned in the textual commonsense descriptions, allowing for tighter integration between images and text. We establish strong baseline performances on this task and demonstrate that integration between visual and textual commonsense reasoning is the key and wins over non-integrative alternatives.

Preprint, 2016, arXiv

Efficient Probabilistic Collision Detection for Non-Convex Shapes

We present new algorithms to perform fast probabilistic collision queries between convex as well as non-convex objects. Our approach is applicable to general shapes, where one or more objects are represented using Gaussian probability distributions. We present a fast new algorithm for a pair of convex objects, and extend the approach to non-convex models using hierarchical representations. We highlight the performance of our algorithms with various convex and non-convex shapes on complex synthetic benchmarks and trajectory planning benchmarks for a 7-DOF Fetch robot arm.
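For readers unfamiliar with probabilistic collision queries, the sketch below shows what is being computed in the simplest possible case: the probability that two spheres overlap when the centre of one follows an isotropic Gaussian distribution. It uses naive Monte Carlo sampling, which is a slow reference baseline and not the fast algorithm the abstract describes. The function name, parameters, and test geometry are all hypothetical.

```python
import math
import random


def collision_probability(mean, std, radius_a, center_b, radius_b,
                          samples=20000, seed=0):
    """Estimate P(collision) by sampling the uncertain centre of sphere A.

    Sphere A's centre is drawn from an isotropic Gaussian (assumption);
    sphere B is fixed. Two spheres collide when the distance between
    their centres is at most the sum of their radii.
    """
    rng = random.Random(seed)  # seeded for reproducible estimates
    hits = 0
    for _ in range(samples):
        point = [rng.gauss(m, std) for m in mean]
        if math.dist(point, center_b) <= radius_a + radius_b:
            hits += 1
    return hits / samples


origin = (0.0, 0.0, 0.0)
near = collision_probability(origin, 0.1, 0.5, (0.5, 0.0, 0.0), 0.5)
grazing = collision_probability(origin, 0.1, 0.5, (1.0, 0.0, 0.0), 0.5)
far = collision_probability(origin, 0.1, 0.5, (5.0, 0.0, 0.0), 0.5)
```

The grazing case, where the mean configuration just touches, is the one a deterministic check handles poorly: a binary collision test returns a single yes or no, while the probabilistic query reports that roughly half of the plausible positions collide.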

Preprint, 2016, arXiv

Fast and Bounded Probabilistic Collision Detection in Dynamic Environments for High-DOF Trajectory Planning

We present a novel approach to perform probabilistic collision detection between a high-DOF robot and high-DOF obstacles in dynamic, uncertain environments. In dynamic environments with a high-DOF robot and moving obstacles, our approach efficiently computes accurate collision probability between the robot and obstacles with upper error bounds. Furthermore, we describe a prediction algorithm for future obstacle position and motion that accounts for both spatial and temporal uncertainties. We present a trajectory optimization algorithm for high-DOF robots in dynamic, uncertain environments based on probabilistic collision detection. We highlight motion planning performance in challenging scenarios with robot arms operating in environments with dynamically moving human obstacles.

Preprint, 2016, arXiv

Low-dimensional representations of exact coherent states of the Navier-Stokes equations from the resolvent model of wall turbulence

We report that many exact invariant solutions of the Navier-Stokes equations for both pipe and channel flows are well represented by just a few modes of the model of McKeon & Sharma, J. Fluid Mech. 658, 356 (2010). This model provides modes that act as a basis to decompose the velocity field, ordered by their amplitude of response to forcing arising from the interaction between scales. The model was originally derived from the Navier-Stokes equations to represent turbulent flows and has been used to explain coherent structure and to predict turbulent statistics. This establishes a surprising new link between the two distinct approaches to understanding turbulence.

Preprint, 2015, arXiv

Exact coherent states and connections to turbulent dynamics in minimal channel flow

Several new families of nonlinear three-dimensional travelling wave solutions to the Navier-Stokes equation, also known as exact coherent states, are computed for Newtonian plane Poiseuille flow. The symmetries and streak/vortex structures are reported and their possible connections to critical layer dynamics examined. While some of the solutions clearly display fluctuations that are localized around the critical layer (the surface on which the streamwise velocity matches the wave speed of the solution), for others this connection is not as clear. Dynamical trajectories along unstable directions of the solutions are computed. Over certain ranges of Reynolds number, two solution families are shown to lie on the basin boundary between laminar and turbulent flow. Direct comparison of nonlinear travelling wave solutions to turbulent flow in the same channel is presented. The state-space dynamics of the turbulent flow are organized around one of the newly-identified travelling wave families, and in particular the lower branch solutions of this family are closely approached during transient excursions away from the dominant behaviour. These observations provide a firm dynamical-systems foundation for prior observations that minimal channel turbulence displays time intervals of "active" turbulence punctuated by brief periods of "hibernation" (see e.g. Xi, L. and Graham, M. D., Phys. Rev. Lett., 104, 218301 (2010)). The hibernating intervals are approaches to lower branch nonlinear travelling waves. Representing these solutions on a Prandtl-von Karman plot illustrates how their bulk flow properties are related to those of Newtonian turbulence as well as the universal asymptotic state called maximum drag reduction (MDR) found in viscoelastic turbulent flow.
