Researcher profile

Yi Han

Yi Han contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

Adaptive Causal Coordination Detection for Social Media: A Memory-Guided Framework with Semi-Supervised Learning

Detecting coordinated inauthentic behavior on social media remains a critical and persistent challenge, as most existing approaches rely on superficial correlation analysis, employ static parameter settings, and demand extensive and labor-intensive manual annotation. To address these limitations systematically, we propose the Adaptive Causal Coordination Detection (ACCD) framework. ACCD adopts a three-stage, progressive architecture that leverages a memory-guided adaptive mechanism to dynamically learn and retain optimal detection configurations for diverse coordination scenarios. Specifically, in the first stage, ACCD introduces an adaptive Convergent Cross Mapping (CCM) technique to deeply identify genuine causal relationships between accounts. The second stage integrates active learning with uncertainty sampling within a semi-supervised classification scheme, significantly reducing the burden of manual labeling. The third stage deploys an automated validation module driven by historical detection experience, enabling self-verification and optimization of the detection outcomes. We conduct a comprehensive evaluation using real-world datasets, including the Twitter IRA dataset, Reddit coordination traces, and several widely-adopted bot detection benchmarks. Experimental results demonstrate that ACCD achieves an F1-score of 87.3\% in coordinated attack detection, representing a 15.2\% improvement over the strongest existing baseline. Furthermore, the system reduces manual annotation requirements by 68\% and achieves a 2.8x speedup in processing through hierarchical clustering optimization. In summary, ACCD provides a more accurate, efficient, and highly automated end-to-end solution for identifying coordinated behavior on social platforms, offering substantial practical value and promising potential for broad application.

preprint2026arXiv

CARLA-Round: A Multi-Factor Simulation Dataset for Roundabout Trajectory Prediction

Accurate trajectory prediction of vehicles at roundabouts is critical for reducing traffic accidents, yet it remains highly challenging due to their circular road geometry, continuous merging and yielding interactions, and absence of traffic signals. Developing accurate prediction algorithms relies on reliable, multimodal, and realistic datasets; however, such datasets for roundabout scenarios are scarce, as real-world data collection is often limited by incomplete observations and entangled factors that are difficult to isolate. We present CARLA-Round, a systematically designed simulation dataset for roundabout trajectory prediction. The dataset varies weather conditions (five types) and traffic density levels (spanning Level-of-Service A-E) in a structured manner, resulting in 25 controlled scenarios. Each scenario incorporates realistic mixtures of driving behaviors and provides explicit annotations that are largely absent from existing datasets. Unlike randomly sampled simulation data, this structured design enables precise analysis of how different conditions influence trajectory prediction performance. Validation experiments using standard baselines (LSTM, GCN, GRU+GCN) reveal traffic density dominates prediction difficulty with strong monotonic effects, while weather shows non-linear impacts. The best model achieves 0.312m ADE on real-world rounD dataset, demonstrating effective sim-to-real transfer. This systematic approach quantifies factor impacts impossible to isolate in confounded real-world datasets. Our CARLA-Round dataset is available at https://github.com/Rebecca689/CARLA-Round.

preprint2026arXiv

Herculean: An Agentic Benchmark for Financial Intelligence

As AI agents improve, the central question is no longer whether they can solve isolated well-defined financial tasks, but whether they can reliably carry out financial professional work. Existing financial benchmarks offer only a partial view of this ability, as they primarily evaluate static competencies such as question answering, retrieval, summarization, and classification. We introduce Herculean, the first skilled benchmark for agentic financial intelligence spanning four representative workflows, including Trading, Hedging, Market Insights, and Auditing. Each workflow is instantiated as a standardized MCP-based skill environment with its own tools, interaction dynamics, constraints, and success criteria, enabling consistent end-to-end assessment of heterogeneous agent systems. Across frontier agents, we find agents perform relatively well on Trading and Market Insights, but struggle substantially on Hedging and Auditing, where long-horizon coordination, state consistency, and structured verification are critical. Overall, our results point to a key gap in current agents in turning financial reasoning into dependable workflow execution in high-stakes financial workflows.

preprint2026arXiv

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Spatial referring is a fundamental capability of embodied robots to interact with the 3D physical world. However, even with the powerful pretrained vision language models (VLMs), recent approaches are still not qualified to accurately understand the complex 3D scenes and dynamically reason about the instruction-indicated locations for interaction. To this end, we propose RoboRefer, a 3D-aware VLM that can first achieve precise spatial understanding by integrating a disentangled but dedicated depth encoder via supervised fine-tuning (SFT). Moreover, RoboRefer advances generalized multi-step spatial reasoning via reinforcement fine-tuning (RFT), with metric-sensitive process reward functions tailored for spatial referring tasks. To support SFT and RFT training, we introduce RefSpatial, a large-scale dataset of 20M QA pairs (2x prior), covering 31 spatial relations (vs. 15 prior) and supporting complex reasoning processes (up to 5 steps). In addition, we introduce RefSpatial-Bench, a challenging benchmark filling the gap in evaluating spatial referring with multi-step reasoning. Experiments show that SFT-trained RoboRefer achieves state-of-the-art spatial understanding, with an average success rate of 89.6%. RFT-trained RoboRefer further outperforms all other baselines by a large margin, even surpassing Gemini-2.5-Pro by 17.4% in average accuracy on RefSpatial-Bench. Notably, RoboRefer can be integrated with various control policies to execute long-horizon, dynamic tasks across diverse robots (e,g., UR5, G1 humanoid) in cluttered real-world scenes.

preprint2026arXiv

RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics

Spatial tracing, as a fundamental embodied interaction ability for robots, is inherently challenging as it requires multi-step metric-grounded reasoning compounded with complex spatial referring and real-world metric measurement. However, existing methods struggle with this compositional task. To this end, we propose RoboTracer, a 3D-aware VLM that first achieves both 3D spatial referring and measuring via a universal spatial encoder and a regression-supervised decoder to enhance scale awareness during supervised fine-tuning (SFT). Moreover, RoboTracer advances multi-step metric-grounded reasoning via reinforcement fine-tuning (RFT) with metric-sensitive process rewards, supervising key intermediate perceptual cues to accurately generate spatial traces. To support SFT and RFT training, we introduce TraceSpatial, a large-scale dataset of 30M QA pairs, spanning outdoor/indoor/tabletop scenes and supporting complex reasoning processes (up to 9 steps). We further present TraceSpatial-Bench, a challenging benchmark filling the gap to evaluate spatial tracing. Experimental results show that RoboTracer surpasses baselines in spatial understanding, measuring, and referring, with an average success rate of 79.1%, and also achieves SOTA performance on TraceSpatial-Bench by a large margin, exceeding Gemini-2.5-Pro by 36% accuracy. Notably, RoboTracer can be integrated with various control policies to execute long-horizon, dynamic tasks across diverse robots (UR5, G1 humanoid) in cluttered real-world scenes. See the project page at https://zhoues.github.io/RoboTracer.

preprint2022arXiv

Smoothness of the density for McKean-Vlasov SDEs with measurable kernel

Consider the McKean-Vlasov SDE $$ dX_t=\langle b(X_t-\cdot),μ_t\rangle dt+dW_t,\quad μ_t=\operatorname{Law}(X_t), $$ where $W$ is the $n$-dimensional Brownian motion and $b:\mathbb{R}^d\to\mathbb{R}^d$ is a measurable function. First assuming $b\in L^\infty$, we prove that the law $μ_t$ of $X_t$ has a density $p_t$ with respect to the Lebesgue measure, which is continuously differentiable with gradient being $γ$-Hölder continuous for each $γ\in(0,1)$. Assume further that $b\in \mathcal{C}_b^1$, we prove that the density $p_t$ is infinitely differentiable. In the regularization by noise perspective, this shows McKean-Vlasov SDEs tend to have a smoother density function than SDEs without density dependence, under the same regularity assumption of the coefficients. We observe similar phenomenon for singular interaction kernels satisfying Krylov's integrability condition, for distributional kernels $b\in B_{\infty,\infty}^α$, $α\in(-1,0)$, and for processes driven by an $α$-stable noise for $α\in(1,2)$.

preprint2021arXiv

Example-based Real-time Clothing Synthesis for Virtual Agents

We present a real-time cloth animation method for dressing virtual humans of various shapes and poses. Our approach formulates the clothing deformation as a high-dimensional function of body shape parameters and pose parameters. In order to accelerate the computation, our formulation factorizes the clothing deformation into two independent components: the deformation introduced by body pose variation (Clothing Pose Model) and the deformation from body shape variation (Clothing Shape Model). Furthermore, we sample and cluster the poses spanning the entire pose space and use those clusters to efficiently calculate the anchoring points. We also introduce a sensitivity-based distance measurement to both find nearby anchoring points and evaluate their contributions to the final animation. Given a query shape and pose of the virtual agent, we synthesize the resulting clothing deformation by blending the Taylor expansion results of nearby anchoring points. Compared to previous methods, our approach is general and able to add the shape dimension to any clothing pose model. %and therefore it is more general. Furthermore, we can animate clothing represented with tens of thousands of vertices at 50+ FPS on a CPU. Moreover, our example database is more representative and can be generated in parallel, and thereby saves the training time. We also conduct a user evaluation and show that our method can improve a user's perception of dressed virtual agents in an immersive virtual environment compared to a conventional linear blend skinning method.

preprint2020arXiv

Adversarial Reinforcement Learning under Partial Observability in Autonomous Computer Network Defence

Recent studies have demonstrated that reinforcement learning (RL) agents are susceptible to adversarial manipulation, similar to vulnerabilities previously demonstrated in the supervised learning setting. While most existing work studies the problem in the context of computer vision or console games, this paper focuses on reinforcement learning in autonomous cyber defence under partial observability. We demonstrate that under the black-box setting, where the attacker has no direct access to the target RL model, causative attacks---attacks that target the training process---can poison RL agents even if the attacker only has partial observability of the environment. In addition, we propose an inversion defence method that aims to apply the opposite perturbation to that which an attacker might use to generate their adversarial samples. Our experimental results illustrate that the countermeasure can effectively reduce the impact of the causative attack, while not significantly affecting the training process in non-attack scenarios.

preprint2020arXiv

E-commerce Recommendation with Weighted Expected Utility

Different from shopping at retail stores, consumers on e-commerce platforms usually cannot touch or try products before purchasing, which means that they have to make decisions when they are uncertain about the outcome (e.g., satisfaction level) of purchasing a product. To study people's preferences, economics researchers have proposed the hypothesis of Expected Utility (EU) that models the subject value associated with an individual's choice as the statistical expectations of that individual's valuations of the outcomes of this choice. Despite its success in studies of game theory and decision theory, the effectiveness of EU, however, is mostly unknown in e-commerce recommendation systems. Previous research on e-commerce recommendation interprets the utility of purchase decisions either as a function of the consumed quantity of the product or as the gain of sellers/buyers in the monetary sense. As most consumers just purchase one unit of a product at a time and most alternatives have similar prices, such modeling of purchase utility is likely to be inaccurate in practice. In this paper, we interpret purchase utility as the satisfaction level a consumer gets from a product and propose a recommendation framework using EU to model consumers' behavioral patterns. We assume that consumer estimates the expected utilities of all the alternatives and choose products with maximum expected utility for each purchase. To deal with the potential psychological biases of each consumer, we introduce the usage of Probability Weight Function (PWF) and design our algorithm based on Weighted Expected Utility (WEU). Empirical study on real-world e-commerce datasets shows that our proposed ranking-based recommendation framework achieves statistically significant improvement against both classical Collaborative Filtering/Latent Factor Models and state-of-the-art deep models in top-K recommendation.

preprint2020arXiv

Graph Neural Networks with Continual Learning for Fake News Detection from Social Media

Although significant effort has been applied to fact-checking, the prevalence of fake news over social media, which has profound impact on justice, public trust and our society, remains a serious problem. In this work, we focus on propagation-based fake news detection, as recent studies have demonstrated that fake news and real news spread differently online. Specifically, considering the capability of graph neural networks (GNNs) in dealing with non-Euclidean data, we use GNNs to differentiate between the propagation patterns of fake and real news on social media. In particular, we concentrate on two questions: (1) Without relying on any text information, e.g., tweet content, replies and user descriptions, how accurately can GNNs identify fake news? Machine learning models are known to be vulnerable to adversarial attacks, and avoiding the dependence on text-based features can make the model less susceptible to the manipulation of advanced fake news fabricators. (2) How to deal with new, unseen data? In other words, how does a GNN trained on a given dataset perform on a new and potentially vastly different dataset? If it achieves unsatisfactory performance, how do we solve the problem without re-training the model on the entire data from scratch? We study the above questions on two datasets with thousands of labelled news items, and our results show that: (1) GNNs can achieve comparable or superior performance without any text information to state-of-the-art methods. (2) GNNs trained on a given dataset may perform poorly on new, unseen data, and direct incremental training cannot solve the problem---this issue has not been addressed in the previous work that applies GNNs for fake news detection. In order to solve the problem, we propose a method that achieves balanced performance on both existing and new datasets, by using techniques from continual learning to train GNNs incrementally.

preprint2020arXiv

Image Analysis Enhanced Event Detection from Geo-tagged Tweet Streams

Events detected from social media streams often include early signs of accidents, crimes or disasters. Therefore, they can be used by related parties for timely and efficient response. Although significant progress has been made on event detection from tweet streams, most existing methods have not considered the posted images in tweets, which provide richer information than the text, and potentially can be a reliable indicator of whether an event occurs or not. In this paper, we design an event detection algorithm that combines textual, statistical and image information, following an unsupervised machine learning approach. Specifically, the algorithm starts with semantic and statistical analyses to obtain a list of tweet clusters, each of which corresponds to an event candidate, and then performs image analysis to separate events from non-events---a convolutional autoencoder is trained for each cluster as an anomaly detector, where a part of the images are used as the training data and the remaining images are used as the test instances. Our experiments on multiple datasets verify that when an event occurs, the mean reconstruction errors of the training and test images are much closer, compared with the case where the candidate is a non-event cluster. Based on this finding, the algorithm rejects a candidate if the difference is larger than a threshold. Experimental results over millions of tweets demonstrate that this image analysis enhanced approach can significantly increase the precision with minimum impact on the recall.