Source author record

Min Wang

Min Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Artificial Intelligence, Machine Learning, Computer Vision, econ.TH, eess.IV, math.NA, Numerical Analysis, Robotics

Catalog footprint

What is connected

6 works

8 topics

4 close collaborators


Preprint, 2026, arXiv

Fed-SE: Federated Self-Evolution for Privacy-Constrained Multi-Environment LLM Agents

LLM agents are widely deployed in complex interactive tasks, yet privacy constraints often preclude centralized optimization and co-evolution across dynamic environments. Despite the demonstrated success of Federated Learning (FL) on static datasets, its effectiveness in open-ended, self-evolving agent systems remains largely unexplored. In such settings, the direct application of standard FL is particularly challenging, as heterogeneous tasks and sparse, trajectory-level reward signals give rise to severe gradient instability, which undermines the global optimization process. To bridge this gap, we propose Fed-SE, a Federated Self-Evolution framework for LLM agents that establishes a local evolution-global aggregation paradigm. Locally, agents employ parameter-efficient fine-tuning on filtered, high-return trajectories to achieve stable gradient updates. Globally, Fed-SE aggregates updates within a low-rank subspace, reducing communication cost across clients. Experiments across five heterogeneous environments demonstrate that Fed-SE improves average task success rates by 10% over the state-of-the-art FedIT, validating its effectiveness in cross-environment knowledge transfer under privacy constraints.
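As a rough illustration of the local-filter, global-aggregate pattern the abstract describes, the sketch below keeps only high-return trajectories on each client and then takes a weighted mean of per-client update vectors. Everything here is an assumption for illustration: the function names, the toy data, the return threshold, and the choice to weight clients by the number of retained trajectories are not taken from the paper, and the flat vectors merely stand in for flattened low-rank adapter parameters. The averaging step is generic FedAvg-style aggregation, not Fed-SE's actual rule.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def filter_high_return(trajectories: Sequence[dict], min_return: float) -> List[dict]:
    """Keep only trajectories whose total return reaches the threshold."""
    return [t for t in trajectories if t["return"] >= min_return]


def weighted_average(updates: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """FedAvg-style weighted mean of flat parameter-update vectors."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one client must contribute a trajectory")
    dim = len(updates[0])
    return [
        sum(w * u[i] for u, w in zip(updates, weights)) / total
        for i in range(dim)
    ]


# Hypothetical per-environment rollouts (return 1.0 = task success).
client_trajectories: Dict[str, List[dict]] = {
    "env_a": [{"return": 1.0}, {"return": 0.0}, {"return": 1.0}],
    "env_b": [{"return": 0.0}, {"return": 1.0}],
}
# Hypothetical flattened adapter updates produced by local fine-tuning.
client_updates: Dict[str, Vector] = {
    "env_a": [0.2, -0.4],
    "env_b": [0.8, 0.2],
}

kept = {name: filter_high_return(trajs, 1.0) for name, trajs in client_trajectories.items()}
weights = [float(len(kept[name])) for name in client_updates]
global_update = weighted_average(list(client_updates.values()), weights)
```

Only the small update vectors leave each client in this sketch; the raw trajectories stay local, which is the privacy property the abstract emphasizes.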

Preprint, 2026, arXiv

Implicit-Explicit Scheme with Multiscale Vanka Two-Grid Solver for Heterogeneous Unsaturated Poroelasticity

We consider a coupled nonlinear system of equations that describe unsaturated flow in heterogeneous poroelastic media. For the numerical solution, we use a finite element approximation in space and present an efficient multiscale two-grid solver for solving the coupled system of equations. The proposed two-grid solver contains two main parts: (i) accurate coarse grid approximation based on local spectral spaces and (ii) coupled smoothing iterations based on an overlapping multiscale Vanka method. A Vanka smoother and local spectral coarse grids come with significant computational cost in the setup phase. To avoid constructing a new solver for each time step and/or nonlinear iteration, we utilize an implicit-explicit integration scheme in time, where we partition the nonlinear operator as a sum of linear and nonlinear parts. In particular, we construct an implicit linear approximation of the stiff components that remains fixed across all time, while treating the remaining nonlinear residual explicitly. This allows us to construct a robust two-grid solver offline and utilize it for fast and efficient online time integration. A linear stability analysis of the proposed novel coupled scheme is presented based on the representation of the system as a two-step scheme. We show that the careful decomposition of linear and nonlinear parts guarantees a linearly stable scheme. A numerical study is presented for a two-dimensional nonlinear coupled test problem of unsaturated flow in heterogeneous poroelastic media. We demonstrate the robustness of the two-grid solver, particularly the efficacy of block smoothing compared with simple pointwise smoothing, and illustrate the accuracy and stability of implicit-explicit time integration.
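The idea of treating a fixed stiff linear part implicitly and the nonlinear remainder explicitly can be shown on a scalar toy problem. The sketch below applies first-order IMEX Euler to u' = -L u + N(u); the test equation, the values of `stiffness` and `dt`, and the function names are assumptions for illustration only, and the paper's coupled poroelastic scheme and two-grid solver are considerably more involved. The point is that the implicit operator (1 + dt·L) never changes, so it can be inverted once up front, mirroring the offline construction of the solver.

```python
import math
from typing import Callable


def imex_euler(u0: float, stiffness: float, nonlinear: Callable[[float], float],
               dt: float, steps: int) -> float:
    """IMEX Euler for u' = -stiffness * u + nonlinear(u)."""
    # The implicit operator is constant in time, so invert it once ("offline").
    inv_implicit = 1.0 / (1.0 + dt * stiffness)
    u = u0
    for _ in range(steps):
        # Stiff linear term is implicit; nonlinear term uses the old value.
        u = inv_implicit * (u + dt * nonlinear(u))
    return u


def explicit_euler(u0: float, stiffness: float, nonlinear: Callable[[float], float],
                   dt: float, steps: int) -> float:
    """Fully explicit Euler on the same problem, for comparison."""
    u = u0
    for _ in range(steps):
        u = u + dt * (-stiffness * u + nonlinear(u))
    return u


# dt * stiffness = 10 is far outside the explicit Euler stability limit.
u_imex = imex_euler(1.0, 100.0, math.cos, 0.1, 50)
u_explicit = explicit_euler(1.0, 100.0, math.cos, 0.1, 50)
```

With these values the IMEX iterate settles at the steady state where 100·u = cos(u), while the explicit iterate grows without bound.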

Preprint, 2026, arXiv

Offline Meta-Reinforcement Learning with Flow-Based Task Inference and Adaptive Correction of Feature Overgeneralization

Offline meta-reinforcement learning (OMRL) combines the strengths of learning from diverse datasets in offline RL with the adaptability to new tasks of meta-RL, promising safe and efficient knowledge acquisition by RL agents. However, OMRL still suffers from extrapolation errors due to out-of-distribution (OOD) actions, compromised by broad task distributions and Markov Decision Process (MDP) ambiguity in meta-RL setups. Existing research indicates that the generalization of the $Q$ network affects the extrapolation error in offline RL. This paper investigates this relationship by decomposing the $Q$ value into feature and weight components, observing that while decomposition enhances adaptability and convergence in the case of high-quality data, it often leads to policy degeneration or collapse in complex tasks. We observe that decomposed $Q$ values introduce a large estimation bias when the feature encounters OOD samples, a phenomenon we term "feature overgeneralization". To address this issue, we propose FLORA, which identifies OOD samples by modeling feature distributions and estimating their uncertainties. FLORA integrates a return feedback mechanism to adaptively adjust feature components. Furthermore, to learn precise task representations, FLORA explicitly models the complex task distribution using a chain of invertible transformations. We theoretically and empirically demonstrate that FLORA achieves rapid adaptation and meta-policy improvement compared to baselines across various environments.
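A "chain of invertible transformations" is the defining ingredient of a normalizing flow: a density is evaluated by mapping a sample to a simple base distribution and adding the log-determinant of each step. The sketch below uses one-dimensional affine layers with a standard normal base purely to show that mechanism; the layer type, the names, and the numbers are assumptions for illustration and say nothing about FLORA's actual architecture.

```python
import math
from typing import List, Tuple

AffineLayer = Tuple[float, float]  # (scale, shift), with scale != 0


def forward(x: float, layers: List[AffineLayer]) -> Tuple[float, float]:
    """Map x to the latent z and accumulate log|dz/dx| along the chain."""
    z, log_det = x, 0.0
    for scale, shift in layers:
        z = scale * z + shift
        log_det += math.log(abs(scale))
    return z, log_det


def inverse(z: float, layers: List[AffineLayer]) -> float:
    """Undo the chain by inverting each layer in reverse order."""
    x = z
    for scale, shift in reversed(layers):
        x = (x - shift) / scale
    return x


def log_prob(x: float, layers: List[AffineLayer]) -> float:
    """Change of variables: log p(x) = log N(z; 0, 1) + log|dz/dx|."""
    z, log_det = forward(x, layers)
    base = -0.5 * (z * z + math.log(2.0 * math.pi))
    return base + log_det


# Two hypothetical layers; composed they give z = x - 2.5, so x ~ N(2.5, 1).
layers: List[AffineLayer] = [(2.0, 1.0), (0.5, -3.0)]
z_value, _ = forward(0.7, layers)
x_roundtrip = inverse(z_value, layers)
peak_log_density = log_prob(2.5, layers)
```

Learned flows replace the fixed affine layers with parameterized invertible maps, but density evaluation follows the same forward pass plus log-determinant bookkeeping.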

Preprint, 2026, arXiv

RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models

Vision-Language-Action (VLA) models remain brittle in long-horizon, contact-rich manipulation because success-only imitation provides little supervision for execution drift, while failed rollouts are often discarded. We introduce RePO-VLA, a recovery-driven policy optimization framework that assigns distinct roles to success, recovery, and failure trajectories. RePO-VLA first applies Recovery-Aware Initialization (RAI), slicing recovery segments and resetting history so corrective actions depend on the current adverse state rather than the preceding failure. It then learns a Progress-Aware Semantic Value Function (PAS-VF), aligning spatiotemporal trajectory features with instructions and successful references. The resulting labels salvage useful failure prefixes via reliability decay, while low-value labels mark drift and terminal breakdowns, teaching differences among nominal, failed, and corrective actions. The data engine turns adverse states into planner-generated or human-collected corrective rollouts, teaching recovery to the success manifold. Value-Conditioned Refinement (VCR) trains the policy to prefer high-progress actions. At deployment, a fixed high value ($v=1.0$) biases actions toward the learned success manifold without online failure detectors or heuristic retries. We introduce FRBench, with standardized error injection and recovery-focused evaluation. Across simulated and real-world bimanual tasks, RePO-VLA improves robustness, raising adversarial success from 20% to 75% on average and up to 80% in scaled real-world trials.

Preprint, 2026, arXiv

The Origin of Corporate Control Power

We discover that the probability of a shareholder possessing optimal control power evolves in a Fibonacci sequence pattern and emerges as the simple harmonic oscillation of 1/2 - 2/3 - 1/2 in 12 operations. This novel feature suggests the efficiency of the allocation of corporate shareholders' rights and power. Data on the Chinese stock market support this prediction.

Preprint, 2025, arXiv

Exploiting Scale-Variant Attention for Segmenting Small Medical Objects

Early detection and accurate diagnosis can predict the risk of malignant disease transformation, thereby increasing the probability of effective treatment. Identifying mild syndromes with small pathological regions serves as an ominous warning and is fundamental in the early diagnosis of diseases. While deep learning algorithms, particularly convolutional neural networks (CNNs), have shown promise in segmenting medical objects, analyzing small areas in medical images remains challenging. This difficulty arises due to information losses and compression defects from convolution and pooling operations in CNNs, which become more pronounced as the network deepens, especially for small medical objects. To address these challenges, we propose a novel scale-variant attention-based network (SvANet) for accurately segmenting small-scale objects in medical images. The SvANet consists of scale-variant attention, cross-scale guidance, Monte Carlo attention, and vision transformer, which incorporates cross-scale features and alleviates compression artifacts for enhancing the discrimination of small medical objects. Quantitative experimental results demonstrate the superior performance of SvANet, achieving 96.12%, 96.11%, 89.79%, 84.15%, 80.25%, 73.05%, and 72.58% in mean Dice coefficient for segmenting kidney tumors, skin lesions, hepatic tumors, polyps, surgical excision cells, retinal vasculatures, and sperm, which occupy less than 1% of the image areas in the KiTS23, ISIC 2018, ATLAS, PolypGen, TissueNet, FIVES, and SpermHealth datasets, respectively.
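For readers unfamiliar with the metric reported above, the Dice coefficient of two binary masks is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for flattened masks; the function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, and the paper's exact averaging procedure for "mean Dice" is not specified in this record.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Toy 8-pixel masks: the true object is a single pixel.
target_mask = [0, 1, 0, 0, 0, 0, 0, 0]
pred_mask = [0, 1, 1, 0, 0, 0, 0, 0]  # one correct pixel, one false positive
score = dice_coefficient(pred_mask, target_mask)
```

A single false-positive pixel drops the score to 2/3 here, which illustrates why small objects are hard to score well on: each misclassified pixel is a large fraction of the object.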
