Source author record

Yu Xia

Yu Xia appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Artificial Intelligence; Computer Vision; Data Structures and Algorithms; math.OC; Computation and Language; Computational Engineering, Finance, and Science; Distributed, Parallel, and Cluster Computing; Machine Learning; math.ST; Performance; Social and Information Networks; Statistics Theory

Catalog footprint

What is connected

11 works

12 topics

4 close collaborators

preprint, 2026, arXiv

Deep But Reliable: Advancing Multi-turn Reasoning for Thinking with Images

Recent advances in large Vision-Language Models (VLMs) have exhibited strong reasoning capabilities on complex visual tasks by thinking with images in their Chain-of-Thought (CoT), which is achieved by actively invoking tools to analyze visual inputs rather than merely perceiving them. However, existing models often struggle to reflect on and correct themselves when attempting incorrect reasoning trajectories. To address this limitation, we propose DRIM, a model that enables deep but reliable multi-turn reasoning when thinking with images in its multimodal CoT. Our pipeline comprises three stages: data construction, cold-start SFT and RL. Based on a high-resolution image dataset, we construct high-difficulty and verifiable visual question-answer pairs, where solving each task requires multi-turn tool calls to reach the correct answer. In the SFT stage, we collect tool trajectories as cold-start data, guiding a multi-turn reasoning pattern. In the RL stage, we introduce redundancy-penalized policy optimization, which incentivizes the model to develop a self-reflective reasoning pattern. The basic idea is to impose judgment on reasoning trajectories and penalize those that produce incorrect answers without sufficient multi-scale exploration. Extensive experiments demonstrate that DRIM achieves superior performance on visual understanding benchmarks.

preprint, 2026, arXiv

Democratizing planetary-scale analysis: An ultra-lightweight Earth embedding database for accurate and flexible global land monitoring

The rapid evolution of satellite-borne Earth Observation (EO) systems has revolutionized terrestrial monitoring, yielding petabyte-scale archives. However, the immense computational and storage requirements for global-scale analysis often preclude widespread use, hindering planetary-scale studies. To address these barriers, we present Embedded Seamless Data (ESD), an ultra-lightweight, 30-m global Earth embedding database spanning the 25-year period from 2000 to 2024. By transforming high-dimensional, multi-sensor observations from the Landsat series (5, 7, 8, and 9) and MODIS Terra into information-dense, quantized latent vectors, ESD distills essential geophysical and semantic features into a unified latent space. Utilizing the ESDNet architecture and Finite Scalar Quantization (FSQ), the dataset achieves a transformative ~340-fold reduction in data volume compared to raw archives. This compression allows the entire global land surface for a single year to be encapsulated within approximately 2.4 TB, enabling decadal-scale global analysis on standard local workstations. Rigorous validation demonstrates high reconstructive fidelity (MAE: 0.0130; RMSE: 0.0179; CC: 0.8543). By condensing the annual phenological cycle into 12 temporal steps, the embeddings provide inherent denoising and a semantically organized space that outperforms raw reflectance in land-cover classification, achieving 79.74% accuracy (vs. 76.92% for raw fusion). With robust few-shot learning capabilities and longitudinal consistency, ESD provides a versatile foundation for democratizing planetary-scale research and advancing next-generation geospatial artificial intelligence.
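The quantization step named in the abstract can be illustrated with a minimal sketch of Finite Scalar Quantization in its generic form: each latent channel is squashed into a bounded range and rounded to one of a small, fixed number of integer levels, so a whole vector maps to a single integer code. The level counts, the function names `fsq_quantize` and `code_index`, and the restriction to odd level counts are assumptions made for illustration; the abstract does not state the configuration ESDNet actually uses.

```python
import math


def fsq_quantize(z, levels):
    """Quantize each channel of z to one of levels[i] integer values.

    Hypothetical helper: only odd level counts are handled, so the grid
    is symmetric around zero (e.g. 5 levels -> {-2, -1, 0, 1, 2}).
    """
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    codes = []
    for value, n_levels in zip(z, levels):
        if n_levels % 2 == 0:
            raise ValueError("this sketch supports odd level counts only")
        half = n_levels // 2
        bounded = half * math.tanh(value)   # squash into (-half, half)
        codes.append(int(round(bounded)))   # snap to the nearest level
    return codes


def code_index(codes, levels):
    """Pack per-channel codes into one integer (mixed-radix encoding)."""
    index, stride = 0, 1
    for code, n_levels in zip(codes, levels):
        index += (code + n_levels // 2) * stride  # shift code to 0..n_levels-1
        stride *= n_levels
    return index


if __name__ == "__main__":
    levels = [5, 5, 5]                      # assumed: 5 * 5 * 5 = 125 codes
    codes = fsq_quantize([0.0, 10.0, -10.0], levels)
    print(codes, code_index(codes, levels))
```

With three channels of five levels each, every vector is stored as one of 125 integer codes, which is the kind of compact representation that makes a large reduction in data volume plausible.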

preprint, 2026, arXiv

Wetland mapping from sparse annotations with satellite image time series and temporal-aware segment anything model

Accurate wetland mapping is essential for ecosystem monitoring, yet dense pixel-level annotation is prohibitively expensive and practical applications usually rely on sparse point labels, under which existing deep learning models perform poorly. Strong seasonal and inter-annual wetland dynamics further render single-date imagery inadequate and lead to significant mapping errors. Although foundation models such as SAM show promising generalization from point prompts, they are inherently designed for static images and fail to model temporal information, resulting in fragmented masks in heterogeneous wetlands. To overcome these limitations, we propose WetSAM, a SAM-based framework that integrates satellite image time series for wetland mapping from sparse point supervision through a dual-branch design. A temporally prompted branch extends SAM with hierarchical adapters and dynamic temporal aggregation to disentangle wetland characteristics from phenological variability, and a spatial branch employs a temporally constrained region-growing strategy to generate reliable dense pseudo-labels. A bidirectional consistency regularization jointly optimizes both branches. Extensive experiments across eight global regions of approximately 5,000 km² each demonstrate that WetSAM substantially outperforms state-of-the-art methods, achieving an average F1-score of 85.58% and delivering accurate and structurally consistent wetland segmentation with minimal labeling effort. These results highlight its strong generalization capability and potential for scalable, low-cost, high-resolution wetland mapping.

preprint, 2022, arXiv

Block-STM: Scaling Blockchain Execution by Turning Ordering Curse to a Performance Blessing

Block-STM is a parallel execution engine for smart contracts, built around the principles of Software Transactional Memory. Transactions are grouped in blocks, and every execution of the block must yield the same deterministic outcome. Block-STM further enforces that the outcome is consistent with executing transactions according to a preset order, leveraging this order to dynamically detect dependencies and avoid conflicts during speculative transaction execution. At the core of Block-STM is a novel, low-overhead collaborative scheduler of execution and validation tasks. Block-STM is implemented on the main branch of the Diem Blockchain code-base and runs in production at Aptos. Our evaluation demonstrates that Block-STM is adaptive to workloads with different conflict rates and utilizes the inherent parallelism therein. Block-STM achieves up to 110k tps in the Diem benchmarks and up to 170k tps in the Aptos benchmarks, which is a 20x and 17x improvement over the sequential baseline with 32 threads, respectively. The throughput on a contended workload is up to 50k tps and 80k tps in the Diem and Aptos benchmarks, respectively.
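The idea of speculating first and then checking the result against a preset order can be shown with a toy model. The sketch below is a deliberately simplified stand-in, not Block-STM's scheduler: it executes every transaction against the same starting snapshot, then validates read sets one by one in block order and re-executes any transaction whose reads turned out stale. The transaction shape (a function that receives a `read` callable and returns a dict of writes), the helper names, and the default balance of 0 are all assumptions for illustration.

```python
def execute(txn, view):
    """Run txn against view, recording every key it reads."""
    reads = {}

    def read(key):
        value = view.get(key, 0)
        reads[key] = value
        return value

    writes = txn(read)
    return reads, writes


def run_sequential(txns, state):
    """Reference outcome: apply transactions one by one in the preset order."""
    state = dict(state)
    for txn in txns:
        _, writes = execute(txn, state)
        state.update(writes)
    return state


def run_optimistic(txns, state):
    """Speculate against a snapshot, then validate in the preset order."""
    snapshot = dict(state)
    # Speculative phase: these executions are independent of one another,
    # so a real engine could run them on separate threads.
    speculative = [execute(txn, snapshot) for txn in txns]

    committed = dict(state)
    reexecuted = 0
    for txn, (reads, writes) in zip(txns, speculative):
        # A read is stale if an earlier transaction changed that key.
        if any(committed.get(key, 0) != value for key, value in reads.items()):
            reads, writes = execute(txn, committed)
            reexecuted += 1
        committed.update(writes)
    return committed, reexecuted


def transfer(src, dst, amount):
    """Hypothetical transaction: move amount from src to dst."""
    def txn(read):
        return {src: read(src) - amount, dst: read(dst) + amount}
    return txn


if __name__ == "__main__":
    block = [transfer("a", "b", 5), transfer("c", "d", 1), transfer("b", "c", 2)]
    start = {"a": 10, "b": 0, "c": 3, "d": 0}
    print(run_optimistic(block, start))
```

In this block only the third transfer reads keys written by earlier transactions, so it is the only one re-executed, and the final state matches the sequential run. The sketch assumes transactions are deterministic and access state only through `read`.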

preprint, 2022, arXiv

Low Resource Style Transfer via Domain Adaptive Meta Learning

Text style transfer (TST) without parallel data has achieved some practical success. However, most existing unsupervised text style transfer methods suffer from (i) requiring massive amounts of non-parallel data to guide the transfer between text styles and (ii) colossal performance degradation when the model is fine-tuned in new domains. In this work, we propose DAML-ATM (Domain Adaptive Meta-Learning with Adversarial Transfer Model), which consists of two parts: DAML and ATM. DAML is a domain adaptive meta-learning approach that learns general knowledge in multiple heterogeneous source domains and can adapt to new unseen domains with a small amount of data. Moreover, we propose a new unsupervised TST approach, the Adversarial Transfer Model (ATM), which is composed of a sequence-to-sequence pre-trained language model and uses adversarial style training for better content preservation and style transfer. Results on multi-domain datasets demonstrate that our approach generalizes well on unseen low-resource domains, achieving state-of-the-art results against ten strong baselines.

preprint, 2016, arXiv

A Subgradient Method for Free Material Design

A small improvement in the structure of the material could save the manufactory a lot of money. The free material design can be formulated as an optimization problem. However, due to its large scale, second-order methods cannot solve free material design problems of reasonable size. We reformulate the free material optimization (FMO) problem in a saddle-point form in which the inverse of the stiffness matrix A(E) in the constraint is eliminated. The size of A(E) is generally large, denoted as N by N. This is the first formulation of FMO without A(E). We apply the primal-dual subgradient method [17] to solve the restricted saddle-point formulation. This is the first gradient-type method for FMO. Each iteration of our algorithm takes a total of $O(N^2)$ floating-point operations and an auxiliary vector storage of size $O(N)$, compared with formulations having the inverse of A(E), which require $O(N^3)$ arithmetic operations and an auxiliary vector storage of size $O(N^2)$. To solve the problem, we developed a closed-form solution to a semidefinite least squares problem and an efficient parameter update scheme for the gradient method, which are included in the appendix. We also approximate a solution to the bounded Lagrangian dual problem. The problem is decomposed into small problems, each having only one unknown k by k matrix (k = 3 or 6), which can be solved in parallel. The iteration bound of our algorithm is optimal for a general subgradient scheme. Finally, we present promising numerical results.

preprint, 2016, arXiv

Information Cascades on Arbitrary Topologies

In this paper, we study information cascades on graphs. In this setting, each node in the graph represents a person. One after another, each person has to take a decision based on a private signal as well as the decisions made by earlier neighboring nodes. Such information cascades commonly occur in practice and have been studied in complete graphs where everyone can overhear the decisions of every other player. It is known that information cascades can be fragile and based on very little information, and that they have a high likelihood of being wrong. Generalizing the problem to arbitrary graphs reveals interesting insights. In particular, we show that in a random graph $G(n,q)$, for the right value of $q$, the number of nodes making a wrong decision is logarithmic in $n$. That is, in the limit for large $n$, the fraction of players that make a wrong decision tends to zero. This is intriguing because it contrasts with the two natural corner cases: the empty graph (everyone decides independently based on their private signal) and the complete graph (all decisions are heard by all nodes). In both of these cases a constant fraction of nodes make a wrong decision in expectation. Thus, our result shows that while both too little and too much information sharing cause nodes to take wrong decisions, for exactly the right amount of information sharing, asymptotically everyone can be right. We further show that this result in random graphs is asymptotically optimal for any topology, even if nodes follow a globally optimal algorithmic strategy. Based on the analysis of random graphs, we explore how topology impacts global performance and construct an optimal deterministic topology among layer graphs.
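The sequential decision setting described above can be explored with a small simulation. The sketch below assumes a simple majority rule: each node combines its own private signal with the decisions it hears from earlier neighbors and follows the majority, falling back to its own signal on a tie. That rule, the function name `simulate_cascade`, and the parameter `p_correct` (the chance that a private signal is right) are assumptions for illustration; the abstract does not specify the decision rule the paper analyzes.

```python
import random


def simulate_cascade(n, q, p_correct, seed=0):
    """Simulate sequential decisions on a random graph G(n, q).

    Returns (signals, decisions), where True means "correct". Each edge
    between an earlier node j and the current node i exists independently
    with probability q, sampled when node i takes its turn.
    """
    rng = random.Random(seed)
    signals = [rng.random() < p_correct for _ in range(n)]
    decisions = []
    for i in range(n):
        # Decisions of earlier nodes that happen to be neighbors of i.
        heard = [decisions[j] for j in range(i) if rng.random() < q]
        votes_right = sum(heard) + signals[i]
        votes_wrong = len(heard) + 1 - votes_right
        if votes_right == votes_wrong:
            decision = signals[i]            # tie: trust the private signal
        else:
            decision = votes_right > votes_wrong
        decisions.append(decision)
    return signals, decisions


if __name__ == "__main__":
    for q in (0.0, 0.05, 1.0):
        _, decisions = simulate_cascade(n=2000, q=q, p_correct=0.7, seed=1)
        print(q, decisions.count(False))
```

Setting `q = 0.0` reproduces the empty-graph corner case, where every node simply follows its own signal, and `q = 1.0` reproduces the complete graph. The intermediate value is arbitrary, and a single run under this simplified rule says nothing definitive about the asymptotic results stated in the abstract.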

preprint, 2015, arXiv

Second-Order Cone Programming for P-Spline Simulation Metamodeling

This paper approximates simulation models by B-splines with a penalty on high-order finite differences of the coefficients of adjacent B-splines. The penalty prevents overfitting. The simulation output is assumed to be nonnegative. The nonnegative spline simulation metamodel is cast as a second-order cone programming model, which can be solved efficiently by modern optimization techniques. The method is implemented in MATLAB/GNU Octave.
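The penalty term mentioned above can be made concrete with a short sketch. It computes finite differences of a given order over a list of B-spline coefficients and sums their squares, which is the usual form of the roughness penalty in the P-spline literature. The squared form, the weight `lam`, and the function names are assumptions for illustration; the sketch only evaluates the penalty and does not reproduce the paper's second-order cone formulation or its nonnegativity constraint.

```python
def finite_differences(coeffs, order):
    """Return the order-th finite differences of adjacent coefficients."""
    diffs = list(coeffs)
    for _ in range(order):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return diffs


def difference_penalty(coeffs, order=2, lam=1.0):
    """Roughness penalty: lam times the sum of squared differences."""
    return lam * sum(d * d for d in finite_differences(coeffs, order))


if __name__ == "__main__":
    print(difference_penalty([1, 2, 3, 4]))   # linear coefficients
    print(difference_penalty([0, 1, 4, 9]))   # quadratic coefficients
```

Coefficients that grow linearly have zero second-order differences and so incur no penalty, while curved coefficient sequences are penalized. That is how such a term discourages overfitting without forcing the fit to be flat.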

preprint, 2014, arXiv

Efficient Clustering with Limited Distance Information

Given a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model we assume that we have access to one versus all queries that given a point s ∈ S return the distances between s and all other points. We show that given a natural assumption about the structure of the instance, we can efficiently find an accurate clustering using only O(k) distance queries. We use our algorithm to cluster proteins by sequence similarity. This setting nicely fits our model because we can use a fast sequence database search program to query a sequence against an entire dataset. We conduct an empirical study that shows that even though we query a small fraction of the distances between the points, we produce clusterings that are close to a desired clustering given by manual classification.

preprint, 2011, arXiv

Clustering Protein Sequences Given the Approximation Stability of the Min-Sum Objective Function

We study the problem of efficiently clustering protein sequences in a limited information setting. We assume that we do not know the distances between the sequences in advance, and must query them during the execution of the algorithm. Our goal is to find an accurate clustering using few queries. We model the problem as a point set $S$ with an unknown metric $d$ on $S$, and assume that we have access to one versus all distance queries that given a point $s \in S$ return the distances between $s$ and all other points. Our one versus all query represents an efficient sequence database search program such as BLAST, which compares an input sequence to an entire data set. Given a natural assumption about the approximation stability of the min-sum objective function for clustering, we design a provably accurate clustering algorithm that uses few one versus all queries. In our empirical study we show that our method compares favorably to well-established clustering algorithms when we compare computationally derived clusterings to gold-standard manual classifications.

preprint, 2011, arXiv

Efficient Clustering with Limited Distance Information

Given a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model we assume that we have access to one versus all queries that given a point s in S return the distances between s and all other points. We show that given a natural assumption about the structure of the instance, we can efficiently find an accurate clustering using only O(k) distance queries. Our algorithm uses an active selection strategy to choose a small set of points that we call landmarks, and considers only the distances between landmarks and other points to produce a clustering. We use our algorithm to cluster proteins by sequence similarity. This setting nicely fits our model because we can use a fast sequence database search program to query a sequence against an entire dataset. We conduct an empirical study that shows that even though we query a small fraction of the distances between the points, we produce clusterings that are close to a desired clustering given by manual classification.
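The query model described above, in which only distances from a few landmarks to all other points are ever seen, can be sketched as follows. The abstract does not describe the paper's active selection strategy or its clustering step, so this sketch swaps in a standard farthest-first selection and a nearest-landmark assignment as stand-ins. The function names and the one-dimensional toy metric are likewise assumptions.

```python
def one_versus_all(points, s):
    """Stand-in for a one versus all query: distances from point s to all points.

    A real setting would call a sequence database search here; this toy
    version uses absolute difference on the number line.
    """
    return [abs(points[s] - x) for x in points]


def landmark_clustering(points, k):
    """Pick k landmarks by farthest-first selection and assign each point
    to its nearest landmark, using exactly k one versus all queries."""
    n = len(points)
    landmarks = [0]                              # arbitrary first landmark
    dists = [one_versus_all(points, 0)]
    while len(landmarks) < k:
        # Distance from each point to its closest landmark so far.
        nearest = [min(row[i] for row in dists) for i in range(n)]
        farthest = max(range(n), key=lambda i: nearest[i])
        landmarks.append(farthest)
        dists.append(one_versus_all(points, farthest))
    labels = [min(range(k), key=lambda c: dists[c][i]) for i in range(n)]
    return landmarks, labels, len(dists)


if __name__ == "__main__":
    points = [0, 1, 2, 50, 51, 100, 101]
    print(landmark_clustering(points, k=3))
```

The design choice to note is that the algorithm never asks for a distance between two non-landmark points: all of its information comes from k rows of the distance matrix, which is what makes the approach compatible with a search tool that compares one sequence against a whole dataset.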
