Trust snapshot

Trust 21 (Emerging)
Verification L1
Unclaimed author
23 works
0 followers
19 topics
4 close collaborators


Published work

23 published items

Preprint · 2026 · arXiv

AirSpatialBot: A Spatially-Aware Aerial Agent for Fine-Grained Vehicle Attribute Recognization and Retrieval

Despite notable advances in remote sensing vision-language models (VLMs), existing models often struggle with spatial understanding, which limits their effectiveness in real-world applications. To push the boundaries of VLMs in remote sensing, we focus on vehicle imagery captured by drones and introduce AirSpatial, a spatially-aware dataset that comprises over 206K instructions and defines two novel tasks: Spatial Grounding and Spatial Question Answering. It is also the first remote sensing grounding dataset to provide 3D bounding boxes (3DBB). To extend the existing image understanding of VLMs to the spatial domain, we adopt a two-stage training strategy comprising Image Understanding Pre-training and Spatial Understanding Fine-tuning. Using this spatially-aware VLM, we develop an aerial agent, AirSpatialBot, which is capable of fine-grained vehicle attribute recognition and retrieval. By dynamically integrating task planning, image understanding, spatial understanding, and task execution, AirSpatialBot adapts to diverse query requirements. Experimental results validate the effectiveness of our approach, reveal the spatial limitations of existing VLMs, and provide valuable insights. The model, code, and datasets will be released at https://github.com/VisionXLab/AirSpatialBot

Preprint · 2026 · arXiv

Amory: Building Coherent Narrative-Driven Agent Memory through Agentic Reasoning

Long-term conversational agents face a fundamental scalability challenge as interactions extend over time: repeatedly processing entire conversation histories becomes computationally prohibitive. Current approaches address this with memory frameworks that predominantly fragment conversations into isolated embeddings or graph representations and retrieve relevant ones in a RAG style. While computationally efficient, these methods often treat memory formation minimally and fail to capture the subtlety and coherence of human memory. We introduce Amory, a working memory framework that actively constructs structured memory representations through enhanced agentic reasoning performed offline. Amory organizes conversational fragments into episodic narratives, consolidates memories with momentum, and semanticizes peripheral facts into semantic memory. At retrieval time, the system applies coherence-driven reasoning over narrative structures. Evaluated on the LOCOMO benchmark for long-term reasoning, Amory achieves considerable improvements over the previous state of the art, with performance comparable to full-context reasoning while reducing response time by 50%. Analysis shows that momentum-aware consolidation significantly enhances response quality, while coherence-driven retrieval provides better memory coverage than embedding-based approaches.

Preprint · 2026 · arXiv

Co-Training Vision Language Models for Remote Sensing Multi-task Learning

With Transformers achieving outstanding performance on individual remote sensing (RS) tasks, we are approaching a unified model that excels across multiple tasks through multi-task learning (MTL). Compared to single-task approaches, MTL methods offer better generalization, scalability, and practical applicability. Recently, vision language models (VLMs) have achieved promising results in RS image understanding, grounding, and ultra-high-resolution (UHR) image reasoning. Moreover, their unified text-based interface shows significant potential for MTL. Hence, in this work, we present RSCoVLM, a simple yet flexible VLM baseline for RS MTL. First, we build a data curation engine covering data acquisition, offline processing and integration, and online loading and weighting. This data engine handles the complex RS data environment and generates flexible vision-language conversations. Furthermore, we propose a unified dynamic-resolution strategy to address the diverse image scales inherent in RS imagery. For UHR images, we introduce the Zoom-in Chain mechanism together with its corresponding dataset, LRS-VQA-Zoom. These strategies are flexible and effectively mitigate the computational burden. Additionally, we significantly enhance the model's object detection capability and propose a novel evaluation protocol that ensures a fair comparison between VLMs and conventional detection models. Extensive experiments demonstrate that RSCoVLM achieves state-of-the-art performance across diverse tasks, outperforming existing RS VLMs and even rivaling specialized expert models. All training and evaluation tools, model weights, and datasets have been fully open-sourced to support reproducibility. We expect this baseline to promote further progress toward general-purpose RS models.

Preprint · 2026 · arXiv

DVGBench: Implicit-to-Explicit Visual Grounding Benchmark in UAV Imagery with Large Vision-Language Models

Remote sensing (RS) large vision-language models (LVLMs) have shown strong promise across visual grounding (VG) tasks. However, existing RS VG datasets predominantly rely on explicit referring expressions, such as relative position, relative size, and color cues, which constrains performance on implicit VG tasks that require scenario-specific domain knowledge. This article introduces DVGBench, a high-quality implicit VG benchmark for drones covering six major application scenarios: traffic, disaster, security, sport, social activity, and productive activity. Each object comes with both explicit and implicit queries. Based on the dataset, we design DroneVG-R1, an LVLM that integrates the novel Implicit-to-Explicit Chain-of-Thought (I2E-CoT) within a reinforcement learning paradigm. This enables the model to exploit scene-specific expertise, converting implicit references into explicit ones and thus reducing grounding difficulty. Finally, an evaluation of mainstream models on both explicit and implicit VG tasks reveals substantial limitations in their reasoning capabilities. These findings provide actionable insights for advancing the reasoning capacity of LVLMs for drone-based agents. The code and datasets will be released at https://github.com/zytx121/DVGBench

Preprint · 2026 · arXiv

Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy

Agentic reinforcement learning trains large language models on multi-turn trajectories that interleave long reasoning traces with short environment-facing actions. Common policy-gradient methods, such as PPO and GRPO, treat every token in a trajectory equally, leading to uniform credit assignment. In this paper, we demonstrate that such uniform credit assignment largely misallocates token-level training signals. From an energy-based modeling perspective, we show that token-level training signals, quantified by their correlation with the reward variance across rollouts sampled from a given prompt, concentrate sharply on action tokens rather than reasoning tokens, even though action tokens account for only a small fraction of the trajectory. We refer to this phenomenon as the Action Bottleneck. Motivated by this observation, we propose ActFocus, an embarrassingly simple token reweighting approach that downweights gradients on reasoning tokens, together with an energy-based redistribution mechanism that further increases the weights of action tokens with higher uncertainty. Across four environments and different model sizes, ActFocus consistently outperforms PPO and GRPO, yielding final-step gains of up to 65.2 and 63.7 percentage points, respectively, without any additional runtime or memory cost.
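To make the reweighting idea concrete, here is a minimal plain-Python sketch. The abstract does not give ActFocus's exact weighting rule, so every specific choice below is an assumption. `reasoning_weight` is a hypothetical constant down-weighting factor for reasoning tokens. Per-token policy entropy stands in for the "uncertainty" that drives the energy-based redistribution. Action tokens share a weight budget equal to their count, split by a softmax over their entropies. This is not the authors' implementation.

```python
import math
from typing import List


def actfocus_token_weights(is_action: List[bool], entropy: List[float],
                           reasoning_weight: float = 0.1,
                           temperature: float = 1.0) -> List[float]:
    """Hypothetical ActFocus-style per-token weights (not the paper's exact rule).

    Reasoning tokens get a constant, reduced weight. Action tokens share a
    budget equal to their count, redistributed by a softmax over their
    entropies, so more uncertain actions receive larger weights.
    """
    weights = [1.0 if a else reasoning_weight for a in is_action]
    action_idx = [t for t, a in enumerate(is_action) if a]
    if not action_idx:
        return weights
    scaled = [entropy[t] / temperature for t in action_idx]
    top = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    z = sum(exps)
    for t, e in zip(action_idx, exps):
        weights[t] = len(action_idx) * e / z   # mean action weight stays 1
    return weights


def weighted_pg_loss(logprobs: List[float], advantage: float,
                     weights: List[float]) -> float:
    """REINFORCE-style surrogate: -(1/T) * sum_t w_t * A * log pi(token_t)."""
    total = sum(w * advantage * lp for w, lp in zip(weights, logprobs))
    return -total / len(logprobs)
```

In a real trainer these weights would scale the per-token PPO or GRPO objective rather than a plain REINFORCE term. The sketch uses one scalar advantage per trajectory, as GRPO does.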

Preprint · 2026 · arXiv

SurgGoal: Rethinking Surgical Planning Evaluation via Goal-Satisfiability

Surgical planning integrates visual perception, long-horizon reasoning, and procedural knowledge, yet it remains unclear whether current evaluation protocols reliably assess vision-language models (VLMs) in safety-critical settings. Motivated by a goal-oriented view of surgical planning, we define planning correctness via phase-goal satisfiability, where plan validity is determined by expert-defined surgical rules. Based on this definition, we introduce a multicentric meta-evaluation benchmark with valid procedural variations and invalid plans containing order and content errors. Using this benchmark, we show that sequence similarity metrics systematically misjudge planning quality, penalizing valid plans while failing to identify invalid ones. We therefore adopt a rule-based goal-satisfiability metric as a high-precision meta-evaluation reference to assess Video-LLMs under progressively constrained settings, revealing failures due to perception errors and under-constrained reasoning. Structural knowledge consistently improves performance, whereas semantic guidance alone is unreliable and benefits larger models only when combined with structural constraints.

Preprint · 2026 · arXiv

Towards Vision-Language Geo-Foundation Model: A Survey

Vision-Language Foundation Models (VLFMs) have made remarkable progress on various multimodal tasks, such as image captioning, image-text retrieval, visual question answering, and visual grounding. However, most methods are trained on general image datasets, and the lack of geospatial data leads to poor performance on Earth observation. Numerous geospatial image-text pair datasets, and VLFMs fine-tuned on them, have been proposed recently. These new approaches aim to leverage large-scale, multimodal geospatial data to build versatile intelligent models with diverse geo-perceptive capabilities, which we refer to as Vision-Language Geo-Foundation Models (VLGFMs). This paper thoroughly reviews VLGFMs, summarizing and analyzing recent developments in the field. In particular, we introduce the background and motivation behind the rise of VLGFMs, highlighting their unique research significance. Then, we systematically summarize the core technologies employed in VLGFMs, including data construction, model architectures, and applications to various multimodal geospatial tasks. Finally, we conclude with insights, open issues, and discussions of future research directions. To the best of our knowledge, this is the first comprehensive literature review of VLGFMs. We continue to track related work at https://github.com/zytx121/Awesome-VLGFM.

Preprint · 2025 · arXiv

Protonic Nickelate Device Networks for Spatiotemporal Neuromorphic Computing

Computation in biological neural circuits arises from the interplay of nonlinear temporal responses and spatially distributed dynamic network interactions. Replicating this richness in hardware has remained challenging, as most neuromorphic devices emulate only isolated neuron- or synapse-like functions. In this work, we introduce an integrated neuromorphic computing platform in which both nonlinear spatiotemporal processing and programmable memory are realized within a single perovskite nickelate material system. By engineering symmetric and asymmetric hydrogenated NdNiO3 junction devices on the same wafer, we combine ultrafast, proton-mediated transient dynamics with stable multilevel resistance states. Networks of symmetric NdNiO3 junctions exhibit emergent spatial interactions mediated by proton redistribution, while each node simultaneously provides short-term temporal memory, enabling nanosecond-scale operation with an energy cost of 0.2 nJ per input. When interfaced with asymmetric output units serving as reconfigurable long-term weights, these networks allow both feature transformation and linear classification in the same material system. Leveraging these emergent interactions, the platform enables real-time pattern recognition and achieves high accuracy in spoken-digit classification and early seizure detection, outperforming temporal-only or uncoupled architectures. These results position protonic nickelates as a compact, energy-efficient, CMOS-compatible platform that integrates processing and memory for scalable intelligent hardware.

Preprint · 2022 · arXiv

Contrast mechanisms in pump-probe microscopy of melanin

Pump-probe microscopy of melanin in tumors has been proposed to improve the diagnosis of malignant melanoma, based on the hypothesis that aggressive cancers disaggregate melanin structure. However, measured melanin signals are complex superpositions of multiple nonlinear processes, which makes interpretation challenging. We use polarization control during measurement, together with data fitting, to decompose melanin signals into their underlying molecular mechanisms. We then identify the molecular mechanisms that are most susceptible to melanin disaggregation and derive false-coloring schemes to highlight these processes in biological tissue. As an example, we demonstrate that false-colored images of a small set of melanoma tumors correlate with clinical concern. More generally, our systematic approach to decomposing pump-probe signals can be applied to many different samples.

Preprint · 2022 · arXiv

MMRotate: A Rotated Object Detection Benchmark using PyTorch

We present MMRotate, an open-source toolbox that provides a coherent framework for training, inference, and evaluation of popular deep-learning-based rotated object detection algorithms. MMRotate implements 18 state-of-the-art algorithms and supports the three most frequently used angle definition methods. To facilitate future research and industrial applications involving rotated object detection, we also provide a large number of trained models and detailed benchmarks that give insight into the performance of rotated object detection. MMRotate is publicly released at https://github.com/open-mmlab/mmrotate.

Preprint · 2022 · arXiv

Multi-Robot Collaborative Perception with Graph Neural Networks

Multi-robot systems, such as swarms of aerial robots, are naturally suited to offer additional flexibility, resilience, and robustness in several tasks compared to a single robot, because they enable cooperation among agents. To enhance autonomous decision-making and situational awareness, multi-robot systems have to coordinate their perception capabilities. They must collect, share, and fuse environment information among agents efficiently and meaningfully, so as to obtain accurate, context-appropriate information or gain resilience to sensor noise or failures. In this paper, we propose a general-purpose Graph Neural Network (GNN) whose main goal is to increase each robot's inference accuracy in multi-robot perception tasks, as well as its resilience to sensor failures and disturbances. We show that the proposed framework can address multi-view visual perception problems such as monocular depth estimation and semantic segmentation. Experiments using both photo-realistic and real data gathered from multiple aerial robots' viewpoints show the effectiveness of the proposed approach in challenging inference conditions, including images corrupted by heavy noise and camera occlusions or failures.

Preprint · 2021 · arXiv

High Energy Irradiation Effects on Silicon Photonic Passive Devices

In this work, the radiation responses of silicon photonic passive devices built in silicon-on-insulator (SOI) technology are investigated through high-energy neutron and ⁶⁰Co gamma-ray irradiation. The wavelengths of both micro-ring resonators (MRRs) and Mach-Zehnder interferometers (MZIs) exhibit blue shifts after high-energy neutron irradiation to a fluence of 1 × 10¹² n/cm². The blue shift is smaller in MZI devices than in MRRs because of their different waveguide widths. Devices with an SiO2 upper cladding layer show strong tolerance to irradiation. Neutron irradiation leads to slight changes in the crystal symmetry of the Si cores of the optical devices and to accelerated oxidation for devices without SiO2 cladding. A 2 µm SiO2 top cladding layer significantly improves the radiation tolerance of these passive photonic devices.

Preprint · 2021 · arXiv

Remarks on Viscosity Super-Solutions of Quasi-Variational Inequalities

For Hamilton-Jacobi-Bellman (HJB) equations, with the standard definitions of viscosity super-solution and sub-solution, it is known that a comparison principle holds between any (viscosity) super-solution and sub-solution. One would expect the same for HJB-type quasi-variational inequalities (QVIs) arising from optimal impulse control problems. However, under a natural adaptation of the definition found in Barles 1985 and Barles 1985b, the uniqueness of the viscosity solution can be guaranteed, but the comparison between viscosity super- and sub-solutions cannot. This paper introduces a modification of the definition of viscosity super-solution for HJB-type QVIs so that the desired comparison theorem holds.

Preprint · 2020 · arXiv

A lower bound on the number of inequivalent APN functions

In this paper, we establish a lower bound on the total number of inequivalent APN functions on the finite field with $2^{2m}$ elements, where $m$ is even. We obtain this result by proving that the APN functions introduced by Pott and the second author, that depend on three parameters $k$, $s$ and $α$, are pairwise inequivalent for distinct choices of the parameters $k$ and $s$. Moreover, we determine the automorphism group of these APN functions.
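The abstract assumes the standard APN definition. A function $F$ on $\mathbb{F}_{2^n}$ is almost perfect nonlinear if $F(x+a)+F(x)=b$ has at most two solutions $x$ for every $a\neq 0$ and every $b$. A minimal brute-force check for small fields is sketched below in Python. The irreducible polynomials are hand-picked assumptions, and field elements are integers whose bits are polynomial coefficients, so addition is XOR. The sketch tests only the APN property, not the equivalence between functions that the paper counts.

```python
# Irreducible polynomials defining GF(2^n), as bit masks (our choice).
MODULI = {3: 0b1011, 4: 0b10011}  # x^3 + x + 1, x^4 + x + 1


def gf_mul(a: int, b: int, n: int) -> int:
    """Multiply two elements of GF(2^n): carry-less product, reduced."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:          # degree n reached: reduce by the modulus
            a ^= MODULI[n]
    return result


def gf_pow(a: int, e: int, n: int) -> int:
    """Square-and-multiply exponentiation in GF(2^n)."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a, n)
        a = gf_mul(a, a, n)
        e >>= 1
    return result


def is_apn(f, n: int) -> bool:
    """True if F(x + a) + F(x) = b has at most 2 solutions for all a != 0, b."""
    size = 1 << n
    values = [f(x) for x in range(size)]
    for a in range(1, size):
        counts = {}
        for x in range(size):
            b = values[x ^ a] ^ values[x]   # addition in GF(2^n) is XOR
            counts[b] = counts.get(b, 0) + 1
        if max(counts.values()) > 2:
            return False
    return True
```

For instance, the Gold function $x^3$ passes on $\mathbb{F}_{2^3}$ and $\mathbb{F}_{2^4}$. The function $x^5 = x^{2^2+1}$ fails on $\mathbb{F}_{2^4}$, because Gold functions $x^{2^k+1}$ are APN only when $\gcd(k,n)=1$.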

Preprint · 2020 · arXiv

A recursion for a symmetric function generalization of the $q$-Dyson constant term identity

In 2000, Kadell gave an orthogonality conjecture for a symmetric function generalization of the $q$-Dyson constant term identity, also known as the Zeilberger--Bressoud $q$-Dyson theorem. The non-zero part of Kadell's orthogonality conjecture is a constant term identity indexed by a weak composition $v=(v_1,\dots,v_n)$ in the case when only one $v_i\neq 0$. This conjecture was first proved by Károlyi, Lascoux and Warnaar in 2015. They further formulated a closed-form expression for the above-mentioned constant term in the case when all the parts of $v$ are distinct. We recently obtained a recursion for this constant term provided that the largest part of $v$ occurs with multiplicity one in $v$. In this paper, we generalize our previous result to all compositions $v$.

Preprint · 2020 · arXiv

A symmetric function generalization of the Zeilberger--Bressoud $q$-Dyson theorem

In 2000, Kadell gave an orthogonality conjecture for a symmetric function generalization of the Zeilberger--Bressoud $q$-Dyson theorem or the $q$-Dyson constant term identity. This conjecture was proved by Károlyi, Lascoux and Warnaar in 2015. In this paper, by slightly changing the variables of Kadell's conjecture, we obtain another symmetric function generalization of the $q$-Dyson constant term identity. This new generalized constant term admits a simple product-form expression.
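Both abstracts build on the $q$-Dyson constant term identity without stating it. In its standard form, $\mathrm{CT}\prod_{1\le i<j\le n}(x_i/x_j;q)_{a_i}(qx_j/x_i;q)_{a_j} = (q;q)_{a_1+\cdots+a_n}/\big((q;q)_{a_1}\cdots(q;q)_{a_n}\big)$, where $(z;q)_m=(1-z)(1-zq)\cdots(1-zq^{m-1})$ and CT takes the constant term in $x_1,\dots,x_n$. The Python sketch below checks this base identity numerically for small $a$, with $q$ set to an exact rational number. It represents Laurent polynomials as dictionaries from exponent tuples to `Fraction` coefficients. It does not cover Kadell's symmetric function generalization.

```python
from fractions import Fraction
from itertools import combinations


def poly_mul(p, r):
    """Multiply Laurent polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in r.items():
            e = tuple(x + y for x, y in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}


def factor(n, i, j, coeff):
    """The Laurent polynomial 1 - coeff * x_i / x_j in n variables."""
    e = [0] * n
    e[i] += 1
    e[j] -= 1
    return {(0,) * n: Fraction(1), tuple(e): -coeff}


def q_dyson_constant_term(a, q):
    """Constant term of prod_{i<j} (x_i/x_j; q)_{a_i} (q x_j/x_i; q)_{a_j}."""
    n = len(a)
    p = {(0,) * n: Fraction(1)}
    for i, j in combinations(range(n), 2):
        for k in range(a[i]):              # (x_i/x_j; q)_{a_i}
            p = poly_mul(p, factor(n, i, j, q ** k))
        for k in range(1, a[j] + 1):       # (q x_j/x_i; q)_{a_j}
            p = poly_mul(p, factor(n, j, i, q ** k))
    return p.get((0,) * n, Fraction(0))


def q_pochhammer(q, m):
    """(q; q)_m = (1 - q)(1 - q^2)...(1 - q^m)."""
    prod = Fraction(1)
    for k in range(1, m + 1):
        prod *= 1 - q ** k
    return prod


def q_multinomial(a, q):
    """Right-hand side of the q-Dyson identity."""
    result = q_pochhammer(q, sum(a))
    for ai in a:
        result /= q_pochhammer(q, ai)
    return result
```

For $n=2$ and $a=(1,1)$ both sides equal $1+q$, which is $3$ at $q=2$.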

Preprint · 2020 · arXiv

Asymptotics of Moore exponent sets

Let $n$ be a positive integer and $I$ a $k$-subset of integers in $[0,n-1]$. Given a $k$-tuple $A=(α_0, \cdots, α_{k-1})\in \mathbb{F}^k_{q^n}$, let $M_{A,I}$ denote the matrix $(α_i^{q^j})$ with $0\leq i\leq k-1$ and $j\in I$. When $I=\{0,1,\cdots, k-1\}$, $M_{A,I}$ is called a Moore matrix, which was introduced by E. H. Moore in 1896. It is well known that the determinant of a Moore matrix equals $0$ if and only if $α_0,\cdots, α_{k-1}$ are $\mathbb{F}_q$-linearly dependent. We call a set $I$ with this property a Moore exponent set. In fact, Moore exponent sets are equivalent to maximum rank-distance (MRD) codes with maximum left and right idealisers over finite fields. It is already known that $I=\{0,\cdots, k-1\}$ is not the unique Moore exponent set. For instance, (generalized) Delsarte-Gabidulin codes and the MRD codes recently discovered by Csajbók, Marino, Polverino and the second author both give rise to new Moore exponent sets. Using an algebraic geometry approach, we obtain an asymptotic classification result: for $q>5$, if $I$ is not an arithmetic progression, then there exists an integer $N$ depending on $I$ such that $I$ is not a Moore exponent set provided that $n>N$.
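The definition of a Moore exponent set can be checked by brute force in tiny cases. The Python sketch below assumes $q=2$ and hand-picked irreducible polynomials for $\mathbb{F}_{2^3}$ and $\mathbb{F}_{2^4}$. The paper's asymptotic result concerns $q>5$, so this only illustrates the definition. Field elements are integers whose bits are polynomial coefficients, so addition is XOR. `is_moore_exponent_set` is a hypothetical helper that tests, for every tuple $A$, whether $\det M_{A,I}=0$ exactly when $A$ is $\mathbb{F}_2$-linearly dependent.

```python
from itertools import permutations, product

# Irreducible polynomials defining GF(2^n), as bit masks (our choice).
MODULI = {3: 0b1011, 4: 0b10011}  # x^3 + x + 1, x^4 + x + 1


def gf_mul(a: int, b: int, n: int) -> int:
    """Multiply two elements of GF(2^n): carry-less product, reduced."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:          # degree n reached: reduce by the modulus
            a ^= MODULI[n]
    return result


def frobenius(a: int, j: int, n: int) -> int:
    """Return a^(q^j) with q = 2, i.e. square a j times."""
    for _ in range(j):
        a = gf_mul(a, a, n)
    return a


def moore_det(alphas, exponents, n: int) -> int:
    """det of M_{A,I} = (alpha_i^(2^j)), rows i, columns j in I.

    In characteristic 2 the Leibniz signs vanish, so the determinant is
    the XOR of all permutation products.
    """
    k = len(alphas)
    m = [[frobenius(a, j, n) for j in exponents] for a in alphas]
    total = 0
    for perm in permutations(range(k)):
        term = 1
        for i in range(k):
            term = gf_mul(term, m[i][perm[i]], n)
        total ^= term
    return total


def f2_dependent(alphas) -> bool:
    """True if some nonempty subset of alphas sums (XORs) to zero."""
    for coeffs in product((0, 1), repeat=len(alphas)):
        if any(coeffs):
            s = 0
            for c, a in zip(coeffs, alphas):
                if c:
                    s ^= a
            if s == 0:
                return True
    return False


def is_moore_exponent_set(exponents, n: int) -> bool:
    """Brute force: det(M_{A,I}) == 0 iff A is F_2-dependent, for every A."""
    for alphas in product(range(1 << n), repeat=len(exponents)):
        if (moore_det(alphas, exponents, n) == 0) != f2_dependent(alphas):
            return False
    return True
```

For example, `is_moore_exponent_set([0, 2], 3)` holds while `is_moore_exponent_set([0, 2], 4)` fails. In $\mathbb{F}_{2^4}$, a primitive cube root of unity $ω$ gives $\det M_{A,I}=ω^4+ω=0$ for the independent pair $A=(1,ω)$. The set $\{0,2\}$ is an arithmetic progression whose difference is not coprime to $n=4$, so this failure says nothing about the paper's asymptotic result.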

Preprint · 2020 · arXiv

Decentralized Ride-Sharing and Vehicle-Pooling Based on Fair Cost-Sharing Mechanisms

Ride-sharing or vehicle-pooling allows commuters to team up spontaneously to share transportation costs. This has become a popular trend in the emerging paradigm of the sharing economy. One crucial component of effective ride-sharing is the matching mechanism that pairs up suitable commuters. Traditionally, matching has been performed in a centralized manner, whereby an operator arranges ride-sharing according to a global objective (e.g., the total cost of all commuters). However, ride-sharing is a decentralized decision-making paradigm, in which commuters are self-interested and only motivated to team up based on their individual payments. In particular, it is not clear how transportation cost should be shared fairly between commuters, or what ramifications cost-sharing has for decentralized ride-sharing. This paper sheds light on the principles of decentralized ride-sharing and vehicle-pooling mechanisms based on stable matching, such that no one would be better off deviating from a stable matching outcome. We study various fair cost-sharing mechanisms and the induced stable matching outcomes. We compare the stable matching outcomes with a socially optimal outcome (one that minimizes total cost) through theoretical bounds on social optimality ratios, and show that several fair cost-sharing mechanisms can achieve high social optimality. We also corroborate our results with an empirical study of taxi sharing under fair cost-sharing mechanisms, based on an analysis of the New York City taxi trip dataset, and provide useful insights on effective decentralized mechanisms for practical ride-sharing and vehicle-pooling.
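To make "stable matching under a fair cost-sharing mechanism" concrete, here is a brute-force Python sketch for a handful of commuters who ride either alone or in pairs. Several parts are illustrative assumptions, not necessarily the paper's mechanisms or definitions. Under the assumed proportional rule, each rider pays the shared trip cost in proportion to their solo cost. A pair blocks a matching if both riders would strictly save by riding together. The assumed ratio is the worst stable total cost divided by the optimal total cost. The function names are hypothetical and the instance is made up.

```python
from itertools import combinations


def proportional_share(i, j, solo, pair_cost):
    """Rider i's payment when sharing with j: pair cost split by solo-cost ratio."""
    c = pair_cost[frozenset((i, j))]
    return c * solo[i] / (solo[i] + solo[j])


def all_matchings(people):
    """Yield every way to leave riders alone or group them into pairs."""
    if not people:
        yield {}
        return
    first, rest = people[0], people[1:]
    for m in all_matchings(rest):              # first rides alone
        yield {**m, first: None}
    for k, other in enumerate(rest):           # first shares with `other`
        remaining = rest[:k] + rest[k + 1:]
        for m in all_matchings(remaining):
            yield {**m, first: other, other: first}


def payments(matching, solo, pair_cost, rule):
    return {i: solo[i] if p is None else rule(i, p, solo, pair_cost)
            for i, p in matching.items()}


def is_stable(matching, solo, pair_cost, rule):
    pay = payments(matching, solo, pair_cost, rule)
    if any(pay[i] > solo[i] for i in pay):     # someone prefers riding alone
        return False
    for i, j in combinations(pay, 2):          # look for a blocking pair
        if matching[i] == j:
            continue
        if (rule(i, j, solo, pair_cost) < pay[i]
                and rule(j, i, solo, pair_cost) < pay[j]):
            return False
    return True


def stable_matchings(solo, pair_cost, rule=proportional_share):
    return [m for m in all_matchings(sorted(solo))
            if is_stable(m, solo, pair_cost, rule)]


def social_optimality_ratio(solo, pair_cost, rule=proportional_share):
    """Worst stable total cost over the minimum total cost of any matching.

    Assumes the rule's shares add up to the pair cost (true for the
    proportional rule) and that at least one stable matching exists.
    """
    def total(m):
        return sum(payments(m, solo, pair_cost, rule).values())
    optimum = min(total(m) for m in all_matchings(sorted(solo)))
    worst = max(total(m) for m in stable_matchings(solo, pair_cost, rule))
    return worst / optimum
```

Consider a toy instance with four riders whose solo cost is 10 each, where pairs A-B and C-D cost 12 and every other pair costs more. Pairing A-B and C-D is then the only stable matching, and the ratio is 1.0. Other sharing rules, such as an equal split, can be passed as `rule`.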

Preprint · 2020 · arXiv

MRD codes with maximum idealizers

Left and right idealizers are important invariants of linear rank-distance codes. In the case of maximum rank-distance (MRD for short) codes in $\mathbb{F}_q^{n\times n}$ the idealizers have been proved to be isomorphic to finite fields of size at most $q^n$. Up to now, the only known MRD codes with maximum left and right idealizers are generalized Gabidulin codes, which were first constructed in 1978 by Delsarte and later generalized by Kshevetskiy and Gabidulin in 2005. In this paper we classify MRD codes in $\mathbb{F}_q^{n\times n}$ for $n\leq 9$ with maximum left and right idealizers and connect them to Moore-type matrices. Apart from generalized Gabidulin codes, it turns out that there is a further family of rank-distance codes providing MRD ones with maximum idealizers for $n=7$, $q$ odd and for $n=8$, $q\equiv 1 \pmod 3$. These codes are not equivalent to any previously known MRD code. Moreover, we show that this family of rank-distance codes does not provide any further examples for $n\geq 9$.

Preprint · 2020 · arXiv

Rational Kernel on Pricing Models of Inflation Derivatives

The aim of this thesis is to analyze and renovate a few mainstream models of inflation derivatives. The first chapter introduces concepts of financial instruments and fundamental terms, such as coupon bonds, inflation-indexed bonds, and swaps. The second chapter introduces and analyzes classic models from the history of quantitative interest rate modeling. It also presents a classification of interest rate models to help readers understand the underlying ideas behind each type of model. The third chapter introduces the related mathematical background, which helps in understanding the terms in each previously introduced model and the relations among them. The fourth chapter introduces a renovation of the HJM framework and begins its analysis.

Preprint · 2020 · arXiv

Stock Index Prediction with Multi-task Learning and Word Polarity Over Time

Sentiment-based stock prediction systems aim to extract sentiment or event signals from online corpora and relate these signals to stock price variations. Both feature-based and neural-network-based approaches have delivered promising results. However, frequent minor fluctuations in stock prices make it hard to learn text sentiment from price patterns, and learning market sentiment from text can be biased if the text is irrelevant to the underlying market. In addition, when discrete word features are used, the polarity of a given term can change over time with different events. To address these issues, we propose a two-stage system. A sentiment extractor extracts the opinion on the market trend, and a summarizer predicts the direction of the index movement in the following week, given the opinions in the news of the current week. We adopt BERT with multi-task learning, which additionally predicts the worthiness of the news, and propose a metric called Polarity-Over-Time to extract word polarity across different event periods. A Weekly-Monday prediction framework and a new dataset, the 10-year Reuters financial news dataset, are also proposed.

Preprint · 2020 · arXiv

Using Machine Learning to Forecast Future Earnings

In this essay, we comprehensively evaluate the feasibility and suitability of adopting machine learning models to forecast corporate fundamentals (i.e., earnings). We compare the predictions of our method with both analysts' consensus estimates and traditional statistical models. Our results show that the model can serve as a favorable auxiliary tool that helps analysts make better predictions of company fundamentals. Compared with traditional statistical models widely adopted in industry, such as logistic regression, our method achieves satisfactory improvements in both prediction accuracy and speed. We also believe the model still has considerable potential to evolve, and we hope that in the near future machine learning models can perform even better than professional analysts.