Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
32works
0followers
27topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

32 published item(s)

preprint2026arXiv

A Systematic Literature Review of Software Engineering Research on Jupyter Notebook

Context: Jupyter Notebook has emerged as a versatile tool that transforms how researchers, developers, and data scientists conduct and communicate their work. As the adoption of Jupyter notebooks continues to rise, so does the interest from the software engineering research community in improving the software engineering practices for Jupyter notebooks. Objective: The purpose of this study is to analyze trends, gaps, and methodologies used in software engineering research on Jupyter notebooks. Method: We selected 146 relevant publications from the DBLP Computer Science Bibliography up to the end of 2024, following established systematic literature review guidelines. We explored publication trends, categorized them based on software engineering topics, and reported findings based on those topics. Results: The most popular venues for publishing software engineering research on Jupyter notebooks are related to human-computer interaction instead of traditional software engineering venues. Researchers have addressed a wide range of software engineering topics on notebooks, such as code reuse, readability, and execution environment. Although reusability is one of the research topics for Jupyter notebooks, only 64 of the 146 studies can be reused based on their provided URLs. Additionally, most replication packages are not hosted on permanent repositories for long-term availability and adherence to open science principles. Conclusion: Solutions specific to notebooks for software engineering issues, including testing, refactoring, and documentation, are underexplored. Future research opportunities exist in automatic testing frameworks, refactoring clones between notebooks, and generating group documentation for coherent code cells.

preprint2026arXiv

A unified multimodal understanding and generation model for cross-disciplinary scientific research

Scientific discovery increasingly relies on integrating heterogeneous, high-dimensional data across disciplines nowadays. While AI models have achieved notable success across various scientific domains, they typically remain domain-specific or lack the capability of simultaneously understanding and generating multimodal scientific data, particularly for high-dimensional data. Yet, many pressing global challenges and scientific problems are inherently cross-disciplinary and require coordinated progress across multiple fields. Here, we present FuXi-Uni, a native unified multimodal model for scientific understanding and high-fidelity generation across scientific domains within a single architecture. Specifically, FuXi-Uni aligns cross-disciplinary scientific tokens within natural language tokens and employs science decoder to reconstruct scientific tokens, thereby supporting both natural language conversation and scientific numerical prediction. Empirically, we validate FuXi-Uni in Earth science and Biomedicine. In Earth system modeling, the model supports global weather forecasting, tropical cyclone (TC) forecast editing, and spatial downscaling driven by only language instructions. FuXi-Uni generates 10-day global forecasts at 0.25° resolution that outperform the SOTA physical forecasting system. It shows superior performance for both TC track and intensity prediction relative to the SOTA physical model, and generates high-resolution regional weather fields that surpass standard interpolation baselines. Regarding biomedicine, FuXi-Uni outperforms leading multimodal large language models on multiple biomedical visual question answering benchmarks. By unifying heterogeneous scientific modalities within a native shared latent space while maintaining strong domain-specific performance, FuXi-Uni provides a step forward more general-purpose, multimodal scientific models.

preprint2026arXiv

Assessing Building Heat Resilience Using UAV and Street-View Imagery with Coupled Global Context Vision Transformer

Climate change is intensifying human heat exposure, particularly in densely built urban centers of the Global South. Low-cost construction materials and high thermal-mass surfaces further exacerbate this risk. Yet scalable methods for assessing such heat-relevant building attributes remain scarce. We propose a machine learning framework that fuses openly available unmanned aerial vehicle (UAV) and street-view (SV) imagery via a coupled global context vision transformer (CGCViT) to learn heat-relevant representations of urban structures. Thermal infrared (TIR) measurements from HotSat-1 are used to quantify the relationship between building attributes and heat-associated health risks. Our dual-modality cross-view learning approach outperforms the best single-modality models by up to $9.3\%$, demonstrating that UAV and SV imagery provide valuable complementary perspectives on urban structures. The presence of vegetation surrounding buildings (versus no vegetation), brighter roofing (versus darker roofing), and roofing made of concrete, clay, or wood (versus metal or tarpaulin) are all significantly associated with lower HotSat-1 TIR values. Deployed across the city of Dar es Salaam, Tanzania, the proposed framework illustrates how household-level inequalities in heat exposure - often linked to socio-economic disadvantage and reflected in building materials - can be identified and addressed using machine learning. Our results point to the critical role of localized, data-driven risk assessment in shaping climate adaptation strategies that deliver equitable outcomes.

preprint2026arXiv

Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay

Large Language Models (LLMs) have achieved remarkable capabilities but remain vulnerable to adversarial ``jailbreak'' attacks designed to bypass safety guardrails. Current safety alignment methods depend heavily on static external red teaming, utilizing fixed defense prompts or pre-collected adversarial datasets. This leads to a rigid defense that overfits known patterns and fails to generalize to novel, sophisticated threats. To address this critical limitation, we propose empowering the model to be its own red teamer, capable of achieving autonomous and evolving adversarial attacks. Specifically, we introduce Safety Self- Play (SSP), a system that utilizes a single LLM to act concurrently as both the Attacker (generating jailbreaks) and the Defender (refusing harmful requests) within a unified Reinforcement Learning (RL) loop, dynamically evolving attack strategies to uncover vulnerabilities while simultaneously strengthening defense mechanisms. To ensure the Defender effectively addresses critical safety issues during the self-play, we introduce an advanced Reflective Experience Replay Mechanism, which uses an experience pool accumulated throughout the process. The mechanism employs a Upper Confidence Bound (UCB) sampling strategy to focus on failure cases with low rewards, helping the model learn from past hard mistakes while balancing exploration and exploitation. Extensive experiments demonstrate that our SSP approach autonomously evolves robust defense capabilities, significantly outperforming baselines trained on static adversarial datasets and establishing a new benchmark for proactive safety alignment.

preprint2026arXiv

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations

While large language models (LLMs) with Chain-of-Thought (CoT) reasoning excel in mathematics and coding, their potential for systematic reasoning in chemistry, a domain demanding rigorous structural analysis for real-world tasks like drug design and reaction engineering, remains untapped. Current benchmarks focus on simple knowledge retrieval, neglecting step-by-step reasoning required for complex tasks such as molecular optimization and reaction prediction. To address this, we introduce ChemCoTBench, a reasoning framework that bridges molecular structure understanding with arithmetic-inspired operations, including addition, deletion, and substitution, to formalize chemical problem-solving into transparent, step-by-step workflows. By treating molecular transformations as modular "chemical operations", the framework enables slow-thinking reasoning, mirroring the logic of mathematical proofs while grounding solutions in real-world chemical constraints. We evaluate models on two high-impact tasks: Molecular Property Optimization and Chemical Reaction Prediction. These tasks mirror real-world challenges while providing structured evaluability. By providing annotated datasets, a reasoning taxonomy, and baseline evaluations, ChemCoTBench bridges the gap between abstract reasoning methods and practical chemical discovery, establishing a foundation for advancing LLMs as tools for AI-driven scientific innovation.

preprint2026arXiv

Building Digital Twins of Different Human Organs for Personalized Healthcare

Digital twins are virtual replicas of physical entities and are poised to transform personalized medicine through the real-time simulation and prediction of human physiology. Translating this paradigm from engineering to biomedicine requires overcoming profound challenges, including anatomical variability, multi-scale biological processes, and the integration of multi-physics phenomena. This survey systematically reviews methodologies for building digital twins of human organs, structured around a pipeline decoupled into anatomical twinning (capturing patient-specific geometry and structure) and functional twinning (simulating multi-scale physiology from cellular to organ-level function). We categorize approaches both by organ-specific properties and by technical paradigm, with particular emphasis on multi-scale and multi-physics integration. A key focus is the role of artificial intelligence (AI), especially physics-informed AI, in enhancing model fidelity, scalability, and personalization. Furthermore, we discuss the critical challenges of clinical validation and translational pathways. This study not only charts a roadmap for overcoming current bottlenecks in single-organ twins but also outlines the promising, albeit ambitious, future of interconnected multi-organ digital twins for whole-body precision healthcare.

preprint2026arXiv

Debiased Orthogonal Boundary-Driven Efficient Noise Mitigation

Mitigating the detrimental effects of noisy labels on the training process has become increasingly critical, as obtaining entirely clean or human-annotated samples for large-scale pre-training tasks is often impractical. Nonetheless, existing noise mitigation methods often encounter limitations in practical applications due to their task-specific design, model dependency, and significant computational overhead. In this work, we exploit the properties of high-dimensional orthogonality to identify a robust and effective boundary in cone space for separating clean and noisy samples. Building on this, we propose One-Step Anti-noise (OSA), a model-agnostic noisy label mitigation paradigm that employs an estimator model and a scoring function to assess the noise level of input pairs through just one-step inference. We empirically validate the superiority of OSA, demonstrating its enhanced training robustness, improved task transferability, streamlined deployment, and reduced computational overhead across diverse benchmarks, models, and tasks. Our code is released at https://github.com/leolee99/OSA.

preprint2026arXiv

Defenses Against Prompt Attacks Learn Surface Heuristics

Large language models (LLMs) are increasingly deployed in security-sensitive applications, where they must follow system- or developer-specified instructions that define the intended task behavior, while completing benign user requests. When adversarial instructions appear in user queries or externally retrieved content, models may override intended logic. Recent defenses rely on supervised fine-tuning with benign and malicious labels. Although these methods achieve high attack rejection rates, we find that they rely on narrow correlations in defense data rather than harmful intent, leading to systematic rejection of safe inputs. We analyze three recurring shortcut behaviors induced by defense fine-tuning. \emph{Position bias} arises when benign content placed later in a prompt is rejected at much higher rates; across reasoning benchmarks, suffix-task rejection rises from below \textbf{10\%} to as high as \textbf{90\%}. \emph{Token trigger bias} occurs when strings common in attack data raise rejection probability even in benign contexts; inserting a single trigger token increases false refusals by up to \textbf{50\%}. \emph{Topic generalization bias} reflects poor generalization beyond the defense data distribution, with defended models suffering test-time accuracy drops of up to \textbf{40\%}. These findings suggest that current prompt-injection defenses frequently respond to attack-like surface patterns rather than the underlying intent. We introduce controlled diagnostic datasets and a systematic evaluation across two base models and multiple defense pipelines, highlighting limitations of supervised fine-tuning for reliable LLM security.

preprint2026arXiv

Density-based topology optimization for turbulent fluid flow using the standard k-epsilon RANS model with wall-functions imposed through an implicit wall penalty formulation

Turbulent flows have high requirements for very fine meshes near the boundary to ensure accuracy. In the context of topology optimization (TO), such fine meshes become unrealistic and common approaches are hampered by low accuracy and overestimation of boundary layer thickness. Wall-functions are a natural way to ease the computational requirements, but they are not naturally imposed in density-based TO due to the diffuse design parametrization. We propose an implicit wall-function formulation for the Reynolds-Averaged Navier-Stokes (RANS), standard k-epsilon model that extracts wall-normal information directly from the gradient of the design variable and enables a penalty-based formulation for imposing wall-functions to the RANS equations, without the need for body-fitted meshes. The method provides a reliable route to high Reynolds number turbulent topology optimization, delivering boundary layer accuracy comparable to explicit-wall body-fitted analyses, while retaining the flexibility of density-based TO. Furthermore, because wall effects are modeled using wall-functions, accurate solutions are obtained on substantially coarser meshes, leading to significant reductions in computational cost. The approach is validated on three canonical benchmarks over Reynolds numbers up to Re = 2e5: a pipe-bend; a U-bend; and a Tesla-valve. Across all cases, the proposed method accurately recovers near-wall velocity profiles, closely matching verification simulations on body-fitted meshes with explicit wall-functions. In contrast, a conventional turbulent TO formulation, without the proposed wall-function treatment, mispredicts boundary-layer development and yields sub-optimal results.

preprint2026arXiv

Exploring Lorentz Violation in Spacetime through Universal Finsler Geometry

Finsler geometry serves as a fundamental and natural extension of Riemannian geometry, providing a valuable framework for investigating Lorentz violation in spacetime. Previous studies have treated the Finsler structures associated with different particles as distinct entities. In this paper, we propose a novel hypothesis suggesting that the Finsler structure may represent an intrinsic property of the universe itself. Under this assumption, we derive a series of modified dispersion relations that have not been previously explored, and we analyze their implications. Our findings indicate that the scales of Lorentz violation for massive particles are proportional to their masses. Furthermore, we demonstrate that this hypothesis aligns well with existing phenomenological results regarding Lorentz violation observed in photons, neutrinos, and electrons.

preprint2026arXiv

Implicit Graph, Explicit Retrieval: Towards Efficient and Interpretable Long-horizon Memory for Large Language Models

Long-horizon applications increasingly require large language models (LLMs) to answer queries when relevant evidence is sparse and dispersed across very long contexts. Existing memory systems largely follow two paradigms: explicit structured memories offer interpretability but often become brittle under long-context overload, while latent memory mechanisms are efficient and stable yet difficult to inspect. We propose LatentGraphMem, a memory framework that combines implicit graph memory with explicit subgraph retrieval. LatentGraphMem stores a graph-structured memory in latent space for stability and efficiency, and exposes a task-specific subgraph retrieval interface that returns a compact symbolic subgraph under a fixed budget for downstream reasoning and human inspection. During training, an explicit graph view is materialized to interface with a frozen reasoner for question-answering supervision. At inference time, retrieval is performed in latent space and only the retrieved subgraph is externalized. Experiments on long-horizon benchmarks across multiple model scales show that LatentGraphMem consistently outperforms representative explicit-graph and latent-memory baselines, while enabling parameter-efficient adaptation and flexible scaling to larger reasoners without introducing large symbolic artifacts.

preprint2026arXiv

IntraStyler: Exemplar-based Style Synthesis for Cross-modality Domain Adaptation

Image-level domain alignment is the de facto approach for unsupervised domain adaptation, where unpaired image translation is used to minimize the domain gap. Prior studies mainly focus on the domain shift between the source and target domains, whereas the intra-domain variability remains under-explored. To address the latter, an effective strategy is to diversify the styles of the synthetic target domain data during image translation. However, previous methods typically require intra-domain variations to be pre-specified for style synthesis, which may be impractical. In this paper, we propose an exemplar-based style synthesis method named IntraStyler, which can capture diverse intra-domain styles without any prior knowledge. Specifically, IntraStyler uses an exemplar image to guide the style synthesis such that the output style matches the exemplar style. To extract the style-only features, we introduce a style encoder to learn styles discriminatively based on contrastive learning. We evaluate the proposed method on the largest public dataset for cross-modality domain adaptation, CrossMoDA 2023. Our experiments show the efficacy of our method in controllable style synthesis and the benefits of diverse synthetic data for downstream segmentation. Code is available at https://github.com/han-liu/IntraStyler.

preprint2026arXiv

Lance: Unified Multimodal Modeling by Multi-Task Synergy

We present Lance, a lightweight native unified model supporting multimodal understanding, generation, and editing for both images and videos. Rather than relying on model capacity scaling or text-image-dominant designs, Lance explores a practical paradigm for unified multimodal modeling via collaborative multi-task training. It is grounded in two core principles: unified context modeling and decoupled capability pathways. Specifically, Lance is trained from scratch and employs a dual-stream mixture-of-experts architecture on shared interleaved multimodal sequences, enabling joint context learning while decoupling the pathways for understanding and generation. We further introduce modality-aware rotary positional encoding to mitigate interference among heterogeneous visual tokens and boost cross-task alignment. During training, Lance adopts a staged multi-task training paradigm with capability-oriented objectives and adaptive data scheduling to strengthen both semantic comprehension and visual generation performance. Experimental results demonstrate that Lance substantially outperforms existing open-source unified models in image and video generation, while retaining strong multimodal understanding capabilities. The homepage is available at https://lance-project.github.io.

preprint2026arXiv

LLMRouterBench: A Massive Benchmark and Unified Framework for LLM Routing

Large language model (LLM) routing assigns each query to the most suitable model from an ensemble. We introduce LLMRouterBench, a large-scale benchmark and unified framework for LLM routing. It comprises over 400K instances from 21 datasets and 33 models. Moreover, it provides comprehensive metrics for both performance-oriented routing and performance-cost trade-off routing, and integrates 10 representative routing baselines. Using LLMRouterBench, we systematically re-evaluate the field. While confirming strong model complementarity-the central premise of LLM routing-we find that many routing methods exhibit similar performance under unified evaluation, and several recent approaches, including commercial routers, fail to reliably outperform a simple baseline. Meanwhile, a substantial gap remains to the Oracle, driven primarily by persistent model-recall failures. We further show that backbone embedding models have limited impact, that larger ensembles exhibit diminishing returns compared to careful model curation, and that the benchmark also enables latency-aware analysis. All code and data are available at https://github.com/ynulihao/LLMRouterBench.

preprint2026arXiv

Neural Operators for Biomedical Spherical Heterogeneity

Spherical deep learning has been widely applied to a broad range of real-world problems. Existing approaches often face challenges in balancing strong spherical geometric inductive biases with the need to model real-world heterogeneity. To solve this while retaining spherical geometry, we first introduce a designable Green's function framework (DGF) to provide new spherical operator solution strategy: Design systematic Green's functions under rotational group. Based on DGF, to model biomedical heterogeneity, we propose Green's-Function Spherical Neural Operator (GSNO) fusing 3 operator solutions: (1) Equivariant Solution derived from Equivariant Green's Function for symmetry-consistent modeling; (2) Invariant Solution derived from Invariant Green's Function to eliminate nuisance heterogeneity, e.g., consistent background field; (3) Anisotropic Solution derived from Anisotropic Green's Function to model anisotropic systems, especially fibers with preferred direction. Therefore, the resulting model, GSNO can adapt to real-world heterogeneous systems with nuisance variability and anisotropy while retaining spectral efficiency. Evaluations on spherical MNIST, Shallow Water Equation, diffusion MRI fiber prediction, cortical parcellation and molecule structure modeling demonstrate the superiority of GSNO.

preprint2026arXiv

PERSEUS: Perception with Semantic Endoscopic Understanding and SLAM

Purpose: Natural orifice surgeries minimize the need for incisions and reduce the recovery time compared to open surgery; however, they require a higher level of expertise due to visualization and orientation challenges. We propose a perception pipeline for these surgeries that allows semantic scene understanding. Methods: We bring learning-based segmentation, depth estimation, and 3D reconstruction modules together to create real-time segmented maps of the surgical scenes. Additionally, we use registration with robot poses to solve the scale ambiguity of mapping from monocular images, and allow the use of semantically informed real-time reconstructions in robotic surgeries. Results: We achieve sub-milimeter reconstruction accuracy based on average one-sided Chamfer distances, average pose registration RMSE of 0.9 mm, and an estimated scale within 2% of ground truth. Conclusion: We present a modular perception pipeline, integrating semantic segmentation with real-time monocular SLAM for natural orifice surgeries. This pipeline offers a promising solution for scene understanding that can facilitate automation or surgeon guidance.

preprint2026arXiv

R$^3$D: Regional-guided Residual Radar Diffusion

Millimeter-wave radar enables robust environment perception in autonomous systems under adverse conditions yet suffers from sparse, noisy point clouds with low angular resolution. Existing diffusion-based radar enhancement methods either incur high learning complexity by modeling full LiDAR distributions or fail to prioritize critical structures due to uniform regional processing. To address these issues, we propose R3D, a regional-guided residual radar diffusion framework that integrates residual diffusion modeling-focusing on the concentrated LiDAR-radar residual encoding complementary high-frequency details to reduce learning difficulty-and sigma-adaptive regional guidance-leveraging radar-specific signal properties to generate attention maps and applying lightweight guidance only in low-noise stages to avoid gradient imbalance while refining key regions. Extensive experiments on the ColoRadar dataset demonstrate that R3D outperforms state-of-the-art methods, providing a practical solution for radar perception enhancement. Our anonymous code and pretrained models are released here: https://anonymous.4open.science/r/r3d-F836

preprint2026arXiv

Radio AGN feedback sustains quiescence only in a minority of massive galaxies

Radio active galactic nuclei (AGNs) eject a huge amount of energy into the surrounding medium and are thought to potentially prevent gas cooling and maintain the quiescence of massive galaxies. The short-lived, sporadic, and anisotropic nature of radio activities, coupled with the detection of abundant cold gas around some massive quiescent galaxies, raise questions about the efficiency of radio feedback in massive galaxies. Here we present an innovative method rooted in artificial intelligence to separate galaxies in which radio feedback is effective (RFE), regardless of current radio emission, from those in which radio feedback is ineffective (RFI), according to their optical images. Galaxies categorized as RFE are all dynamically hot, whereas quiescent RFI (RFI-Q) galaxies usually have extended cold-disk components. At given stellar mass, dark matter halos hosting RFE galaxies are between four to ten times more massive than those of RFI-Q galaxies. We find, for the first time, that almost all RFE galaxies have scant cold gas, irrespective of AGN activity. In contrast, many RFI-Q galaxies are surrounded by substantial amounts of condensed atomic gas, indicating a different evolutionary path from RFE galaxies. Our finding provides direct and compelling evidence that a radio AGN has gone through about 300 on-off cycles and that radio feedback can prevent gas cooling over a timescale much longer than that of radio activity. Contrary to general belief, our analysis shows that only a small fraction of massive galaxies are influenced by strong radio AGNs, suggesting that current galaxy formation models need serious revision.

preprint2026arXiv

ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack

Large Language Models (LLMs) have enabled the development of powerful agentic systems capable of automating complex workflows across various fields. However, these systems are highly vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external data can hijack agent behavior. In this work, we present ReasAlign, a model-level solution to improve safety alignment against indirect prompt injection attacks. The core idea of ReasAlign is to incorporate structured reasoning steps to analyze user queries, detect conflicting instructions, and preserve the continuity of the user's intended tasks to defend against indirect injection attacks. To further ensure reasoning logic and accuracy, we introduce a test-time scaling mechanism with a preference-optimized judge model that scores reasoning steps and selects the best trajectory. Comprehensive evaluations across various benchmarks show that ReasAlign maintains utility comparable to an undefended model while consistently outperforming Meta SecAlign, the strongest prior guardrail. On the representative open-ended CyberSecEval2 benchmark, which includes multiple prompt-injected tasks, ReasAlign achieves 94.6% utility and only 3.6% ASR, far surpassing the state-of-the-art defensive model of Meta SecAlign (56.4% utility and 74.4% ASR). These results demonstrate that ReasAlign achieves the best trade-off between security and utility, establishing a robust and practical defense against prompt injection attacks in real-world agentic systems. Our code and experimental results could be found at https://github.com/leolee99/ReasAlign.

preprint2026arXiv

Rethinking Popularity Bias in Collaborative Filtering via Analytical Vector Decomposition

Popularity bias fundamentally undermines the personalization capabilities of collaborative filtering (CF) models, causing them to disproportionately recommend popular items while neglecting users' genuine preferences for niche content. While existing approaches treat this as an external confounding factor, we reveal that popularity bias is an intrinsic geometric artifact of Bayesian Pairwise Ranking (BPR) optimization in CF models. Through rigorous mathematical analysis, we prove that BPR systematically organizes item embeddings along a dominant "popularity direction" where embedding magnitudes directly correlate with interaction frequency. This geometric distortion forces user embeddings to simultaneously handle two conflicting tasks-expressing genuine preference and calibrating against global popularity-trapping them in suboptimal configurations that favor popular items regardless of individual tastes. We propose Directional Decomposition and Correction (DDC), a universally applicable framework that surgically corrects this embedding geometry through asymmetric directional updates. DDC guides positive interactions along personalized preference directions while steering negative interactions away from the global popularity direction, disentangling preference from popularity at the geometric source. Extensive experiments across multiple BPR-based architectures demonstrate that DDC significantly outperforms state-of-the-art debiasing methods, reducing training loss to less than 5% of heavily-tuned baselines while achieving superior recommendation quality and fairness. Code is available in https://github.com/LingFeng-Liu-AI/DDC.

preprint2026arXiv

Searth Transformer: A Transformer Architecture Incorporating Earth's Geospheric Physical Priors for Global Mid-Range Weather Forecasting

Accurate global medium-range weather forecasting is fundamental to Earth system science. Most existing Transformer-based forecasting models adopt vision-centric architectures that neglect the Earth's spherical geometry and zonal periodicity. In addition, conventional autoregressive training is computationally expensive and limits forecast horizons due to error accumulation. To address these challenges, we propose the Shifted Earth Transformer (Searth Transformer), a physics-informed architecture that incorporates zonal periodicity and meridional boundaries into window-based self-attention for physically consistent global information exchange. We further introduce a Relay Autoregressive (RAR) fine-tuning strategy that enables learning long-range atmospheric evolution under constrained memory and computational budgets. Based on these methods, we develop YanTian, a global medium-range weather forecasting model. YanTian achieves higher accuracy than the high-resolution forecast of the European Centre for Medium-Range Weather Forecasts and performs competitively with state-of-the-art AI models at one-degree resolution, while requiring roughly 200 times lower computational cost than standard autoregressive fine-tuning. Furthermore, YanTian attains a longer skillful forecast lead time for Z500 (10.3 days) than HRES (9 days). Beyond weather forecasting, this work establishes a robust algorithmic foundation for predictive modeling of complex global-scale geophysical circulation systems, offering new pathways for Earth system science.

preprint2026arXiv

Understanding Prompt Management in GitHub Repositories: A Call for Best Practices

The rapid adoption of foundation models (e.g., large language models) has given rise to promptware, i.e., software built using natural language prompts. Effective management of prompts, such as organization and quality assurance, is essential yet challenging. In this study, we perform an empirical analysis of 24,800 open-source prompts from 92 GitHub repositories to investigate prompt management practices and quality attributes. Our findings reveal critical challenges such as considerable inconsistencies in prompt formatting, substantial internal and external prompt duplication, and frequent readability and spelling issues. Based on these findings, we provide actionable recommendations for developers to enhance the usability and maintainability of open-source prompts within the rapidly evolving promptware ecosystem.

preprint2026arXiv

Unraveling MMDiT Blocks: Training-free Analysis and Enhancement of Text-conditioned Diffusion

Recent breakthroughs of transformer-based diffusion models, particularly with Multimodal Diffusion Transformers (MMDiT) driven models like FLUX and Qwen Image, have facilitated thrilling experiences in text-to-image generation and editing. To understand the internal mechanism of MMDiT-based models, existing methods tried to analyze the effect of specific components like positional encoding and attention layers. Yet, a comprehensive understanding of how different blocks and their interactions with textual conditions contribute to the synthesis process remains elusive. In this paper, we first develop a systematic pipeline to comprehensively investigate each block's functionality by removing, disabling and enhancing textual hidden-states at corresponding blocks. Our analysis reveals that 1) semantic information appears in earlier blocks and finer details are rendered in later blocks, 2) removing specific blocks is usually less disruptive than disabling text conditions, and 3) enhancing textual conditions in selective blocks improves semantic attributes. Building on these observations, we further propose novel training-free strategies for improved text alignment, precise editing, and acceleration. Extensive experiments demonstrated that our method outperforms various baselines and remains flexible across text-to-image generation, image editing, and inference acceleration. Our method improves T2I-Combench++ from 56.92% to 63.00% and GenEval from 66.42% to 71.63% on SD3.5, without sacrificing synthesis quality. These results advance understanding of MMDiT models and provide valuable insights to unlock new possibilities for further improvements.

preprint2026arXiv

Wow, wo, val! A Comprehensive Embodied World Model Evaluation Turing Test

As world models gain momentum in Embodied AI, an increasing number of works explore using video foundation models as predictive world models for downstream embodied tasks like 3D prediction or interactive generation. However, before exploring these downstream tasks, video foundation models still have two critical questions unanswered: (1) whether their generative generalization is sufficient to maintain perceptual fidelity in the eyes of human observers, and (2) whether they are robust enough to serve as a universal prior for real-world embodied agents. To provide a standardized framework for answering these questions, we introduce the Embodied Turing Test benchmark: WoW-World-Eval (Wow,wo,val). Building upon 609 robot manipulation data, Wow-wo-val examines five core abilities, including perception, planning, prediction, generalization, and execution. We propose a comprehensive evaluation protocol with 22 metrics to assess the models' generation ability, which achieves a high Pearson Correlation between the overall score and human preference (>0.93) and establishes a reliable foundation for the Human Turing Test. On Wow-wo-val, models achieve only 17.27 on long-horizon planning and at best 68.02 on physical consistency, indicating limited spatiotemporal consistency and physical reasoning. For the Inverse Dynamic Model Turing Test, we first use an IDM to evaluate the video foundation models' execution accuracy in the real world. However, most models collapse to $\approx$ 0% success, while WoW maintains a 40.74% success rate. These findings point to a noticeable gap between the generated videos and the real world, highlighting the urgency and necessity of benchmarking World Model in Embodied AI.

preprint2025arXiv

EndoStreamDepth: Temporally Consistent Monocular Depth Estimation for Endoscopic Video Streams

This work presents EndoStreamDepth, a monocular depth estimation framework for endoscopic video streams. It provides accurate depth maps with sharp anatomical boundaries for each frame, temporally consistent predictions across frames, and real-time throughput. Unlike prior work that uses batched inputs, EndoStreamDepth processes individual frames with a temporal module to propagate inter-frame information. The framework contains three main components: (1) a single-frame depth network with endoscopy-specific transformation to produce accurate depth maps, (2) multi-level Mamba temporal modules that leverage inter-frame information to improve accuracy and stabilize predictions, and (3) a hierarchical design with comprehensive multi-scale supervision, where complementary loss terms jointly improve local boundary sharpness and global geometric consistency. We conduct comprehensive evaluations on two publicly available colonoscopy depth estimation datasets. Compared to state-of-the-art monocular depth estimation methods, EndoStreamDepth substantially improves performance, and it produces depth maps with sharp, anatomically aligned boundaries, which are essential to support downstream tasks such as automation for robotic surgery. The code is publicly available at https://github.com/MedICL-VU/EndoStreamDepth

preprint2025arXiv

From Target Tracking to Targeting Track -- Part II: Regularized Polynomial Trajectory Optimization

Target tracking entails the estimation of the evolution of the target state over time, namely the target trajectory. Different from the classical state space model, our series of studies, including this paper, model the collection of the target state as a stochastic process (SP) that is further decomposed into a deterministic part which represents the trend of the trajectory and a residual SP representing the residual fitting error. Subsequently, the tracking problem is formulated as a learning problem regarding the trajectory SP for which a key part is to estimate a trajectory FoT (T-FoT) best fitting the measurements in time series. For this purpose, we consider the polynomial T-FoT and address the regularized polynomial T-FoT optimization employing two distinct regularization strategies seeking trade-off between the accuracy and simplicity. One limits the order of the polynomial and then the best choice is determined by grid searching in a narrow, bounded range while the other adopts $\ell_0$ norm regularization for which the hybrid Newton solver is employed. Simulation results obtained in both single and multiple maneuvering target scenarios demonstrate the effectiveness of our approaches.

preprint2025arXiv

Impact of helium ion irradiation on the thermal properties of superconducting nanowire single-photon detectors

SNSPDs are indispensable for applications ranging from quantum information processing to deep-space optical communications, owing to their high detection efficiency, low dark counts, and excellent timing resolution. However, further improving the intrinsic detection efficiency (IDE) remains crucial for optimizing SNSPD performance. Ion irradiation has recently emerged as a powerful post-fabrication method to enhance SNSPD characteristics. Here, we studied the effects of He-ion irradiation on the thermal properties of NbN SNSPDs. We systematically examine the evolution of thermal boundary conductance as a function of ion fluence (0-1.1E17 ions/cm2), observing a 57% decrease from 127 to 54 W/m^2K^4 with increasing fluence, followed by saturation at approximately 9E16 ions/cm2. At this fluence, the minimum hotspot relaxation time measurements indicate a 41% increase, rising from 17 to 24 ps,while the electron-phonon interaction time decreases, with the magnitude of change depending on temperature and sample batch.TEM reveals defect formation at the NbN/SiO2 interface (6-8 nm) and He-bubble formation within the SiO2 layer (30-260 nm), contributing to the extended thermal relaxation time. These irradiation-induced modifications play a key role in enhancing the IDE of the treated devices. We further demonstrate a post-irradiation SNSPD showing a saturated IDE plateau at 2000 nm from 2.7 K to 28 mK, enabled by thermal modifications and a weakly wavelength-dependent avalanche-assisted mechanism. Our findings highlight ion irradiation as a valuable tool for thermal tailoring in SNSPDs and advance the understanding of detection physics and defect engineering in superconducting optoelectronics.

preprint2025arXiv

Inferring Solar Magnetic fields From the Polarization of the Mg II h and k Lines

The polarization of the Mg II h and k lines holds significant diagnostic potential for measuring chromospheric magnetic fields, which are crucial for understanding the physical processes governing the energy transport and dissipation in the solar upper atmosphere, as well as the subsequent heating of the chromosphere and corona. The Chromospheric Layer Spectropolarimeter was launched twice in 2019 and 2021, successfully acquiring spectropolarimetric observations across the Mg II h and k lines. The analysis of these observations confirms the capability of these lines for inferring magnetic fields in the upper chromosphere. In this review, we briefly introduce the physical mechanisms behind the polarization of the Mg II h and k lines, including the joint action of the Zeeman and Hanle effects, the magneto-optical effect, partial frequency redistribution, and atomic level polarization. We also provide an overview of recent progress in the interpretation of the Stokes profiles of the Mg II h and k lines.

preprint2025arXiv

Resolving exciton and polariton multi-particle correlations in an optical microcavity in the strong coupling regime

Multi-particle correlations of exciton-polaritons and reservoir-excitons in the strong light-matter coupling regime dictate the quantum dynamics of optical microcavities. In this letter, we examine the many-body exciton-polariton dynamics in a Fabry-Pérot microcavity of a two-dimensional metal-halide semiconductor over timescales involving polariton ($\ll 1$\,ps) and exciton ($\gg 1$\,ps) scattering. We find enhanced exciton nonlinear dynamics in the microcavity versus the bare semiconductor, concomitant with ultrafast polariton scattering dynamics. We measure, by means of coherent spectroscopy, the coupling between exciton-polaritons, bright excitons, and reservoir-excitons that highlight the complex scattering landscape that fundamentally drives polariton condensation.

preprint2025arXiv

SALP-CG: Standard-Aligned LLM Pipeline for Classifying and Grading Large Volumes of Online Conversational Health Data

Online medical consultations generate large volumes of conversational health data that often embed protected health information, requiring robust methods to classify data categories and assign risk levels in line with policies and practice. However, existing approaches lack unified standards and reliable automated methods to fulfill sensitivity classification for such conversational health data. This study presents a large language model-based extraction pipeline, SALP-CG, for classifying and grading privacy risks in online conversational health data. We concluded health-data classification and grading rules in accordance with GB/T 39725-2020. Combining few-shot guidance, JSON Schema constrained decoding, and deterministic high-risk rules, the backend-agnostic extraction pipeline achieves strong category compliance and reliable sensitivity across diverse LLMs. On the MedDialog-CN benchmark, models yields robust entity counts, high schema compliance, and accurate sensitivity grading, while the strongest model attains micro-F1=0.900 for maximum-level prediction. The category landscape stratified by sensitivity shows that Level 2-3 items dominate, enabling re-identification when combined; Level 4-5 items are less frequent but carry outsize harm. SALP-CG reliably helps classify categories and grading sensitivity in online conversational health data across LLMs, offering a practical method for health data governance. Code is available at https://github.com/dommii1218/SALP-CG.

preprint2025arXiv

Static Spherical Vacuum Solution to Bumblebee Gravity with Time-like VEVs

The static spherical vacuum solution in a bumblebee gravity model where the bumblebee field \(B_μ\) has a one-component time-like vacuum expectation value \(b_μ\) is studied. We show that in general curved space-time solutions are not allowed and only the Minkowski space-time exists. However, it is surprising that non-trivial solutions can be obtained so long as a unique condition for the vacuum expectation \(b^2\equiv -b^μb_μ=2/κ\), where \(κ=8πG\), is satisfied. We argue that naturally these solutions are not stable since quantum corrections would invalidate the likely numerical coincidence, unless there are some unknown \emph{fine-tuning} mechanisms preventing any deviation from this condition. Nevertheless, the naked singularities and the photon sphere of these novel but peculiar solutions are discussed, and we show that the extremal Reissner-Nordstr{ö}m solution is a limit of one of our solutions.

preprint2024arXiv

Interplay of Quantum Resources in Nonlocality Tests

Nonlocality, evidenced by the violation of Bell inequalities, not only signifies entanglement but also highlights measurement incompatibility in quantum systems. Utilizing the generalized Clauser-Horne-Shimony-Holt (CHSH) Bell inequality, our high-efficiency optical setup achieves a loophole-free violation of $2.0132$. This result provides a device-independent lower bound on entanglement, quantified as the entanglement of formation at $0.0159$. Moreover, by tuning the parameters of the generalized Bell inequality, we enhance the estimation of measurement incompatibility, which is quantified by an effective overlap of $4.3883 \times 10^{-5}$. To explore the intricate interplay among nonlocality, entanglement, and measurement incompatibility, we generate mixed states, allowing for flexible modulation of entanglement via fast switching among the four Bell states using Pockels cells, achieving a fidelity above $99.10\%$. Intriguingly, our results reveal a counterintuitive relationship where increasing incompatibility initially boosts nonlocality but eventually leads to its reduction. Typically, maximal nonlocality does not coincide with maximal incompatibility. This experimental study sheds light on the optimal management of quantum resources for Bell-inequality-based quantum information processing.