Source author record

Senkang Hu

Senkang Hu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Networking and Internet Architecture Artificial Intelligence Machine Learning Computation and Language Computer Vision Robotics

Catalog footprint

What is connected

6works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs

When a chest X-ray shows consolidation but the question asks which finding is present, a medical vision-language model may answer "No consolidation." This is more than an incorrect choice: it is a polarity reversal that emits a clinical statement contradicting the image. We study this failure as negated-option attraction, where a model is drawn to a negated answer option even when it conflicts with both the visual evidence and the question. We introduce CXR-ContraBench (Chest X-Ray Contradiction Benchmark), a diagnostic benchmark spanning internal ReXVQA slices and external OpenI and CheXpert protocols. The benchmark centers on present-finding questions, where selecting "No X" despite visible X creates the main clinical risk, and uses absent-finding questions as secondary tests of whether models copy negated wording. Across CheXpert protocols, the failure is substantial and persistent. On a strict direct presence probe, MedGemma and Qwen2.5-VL reach only 31.49% and 30.21% accuracy, respectively; on a matched 135,754-record CheXpert training-split protocol, both models select negated options on over 62% of presence questions. Chain-of-thought prompting reduces some presence-side reversals but does not eliminate them and can amplify absence-side contradictions. Finally, QCCV-Neg (Question-Conditioned Consistency Verifier for Negation) deterministically repairs the measured polarity-confused subset without retraining, raising MedGemma and Qwen2.5-VL to 96.60% and 95.32% accuracy on the direct presence probe. These results show that standard accuracy can hide a clinically meaningful inference-time polarity failure. Source code and benchmark construction scripts are available at https://github.com/fangzr/cxr-contrabench-code.

preprint2026arXiv

HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts

While federated learning (FL) enables fine-tuning of large language models (LLMs) without compromising data privacy, the substantial size of an LLM renders on-device training impractical for resource-constrained clients, such as mobile devices. Thus, Mixture-of-Experts (MoE) models have emerged as a computation-efficient solution, which activates only a sparse subset of experts during model training to reduce computing burden without sacrificing performance. Though integrating MoE into FL fine-tuning holds significant potential, it still encounters three key challenges: i) selecting appropriate experts for clients remains challenging due to the lack of a reliable metric to measure each expert's impact on local fine-tuning performance, ii) the heterogeneous computing resources across clients severely hinder MoE-based LLM fine-tuning, as dynamic expert activations across diverse input samples can overwhelm resource-constrained devices, and iii) client-specific expert subsets and routing preference undermine global aggregation, where misaligned expert updates and inconsistent gating networks in troduce destructive interference. To address these challenges, we propose HFedMoE, a heterogeneous MoE-based FL fine-tuning framework that customizes a subset of experts to each client for computation-efficient LLM fine-tuning. Specifically, HFedMoE identifies the expert importance based on its contributions to fine-tuning performance, and then adaptively selects a subset of experts from an information bottleneck perspective to align with each client' s computing budget. A sparsity-aware model aggregation strategy is also designed to aggregate the actively fine-tuned experts and gating parameters with importance weighted contributions. Extensive experiments demonstrate that HFedMoE outperforms state-of-the-art benchmarks in training accuracy and convergence speed.

preprint2026arXiv

Pelican-Unified 1.0: A Unified Embodied Intelligence Model for Understanding, Reasoning, Imagination and Action

We present Pelican-Unified 1.0, the first embodied foundation model trained according to the principle of unification. Pelican-Unified 1.0 uses a single VLM as a unified understanding module, mapping scenes, instructions, visual contexts, and action histories into a shared semantic space. The same VLM also serves as a unified reasoning module, autoregressively producing task-, action-, and future-oriented chains of thought in a single forward pass and projecting the final hidden state into a dense latent variable. A Unified Future Generator (UFG) then conditions on this latent variable and jointly generates future videos and future actions through two modality-specific output heads within the same denoising process. The language, video, and action losses are all backpropagated into the shared representation, enabling the model to jointly optimize understanding, reasoning, imagination, and action during training, rather than training three isolated expert systems. Experiments demonstrate that unification does not imply compromise. With a single checkpoint, Pelican-Unified 1.0 achieves strong performance across all three capabilities: 64.7 on eight VLM benchmarks, the best among comparable-scale models; 66.03 on WorldArena, ranking first; and 93.5 on RoboTwin, the second-best average among compared action methods. These results show that the unified paradigm succeeds in preserving specialist strength while bringing understanding, reasoning, imagination, and action into one model.

preprint2026arXiv

Self-Induced Outcome Potential: Turn-Level Credit Assignment for Agents without Verifiers

Long-horizon LLM agents depend on intermediate information-gathering turns, yet training feedback is usually observed only at the final answer, because process-level rewards require high-quality human annotation. Existing turn-level shaping methods reward turns that increase the likelihood of a gold answer, but they require answer supervision or stable task-specific verifiers. Conversely, label-free RL methods extract self-signals from output distributions, but mainly at the answer or trajectory level and therefore cannot assign credit to intermediate turns. We propose Self-Induced Outcome Potential (SIOP), which treats semantic clusters of final answers as latent future outcome states for potential-based turn-level credit assignment. For each query, SIOP samples multiple rollouts, clusters final answers into semantic outcome modes, and builds a reliability-aware target distribution over these states. It then rewards turns for increasing posterior support for reliable future states using a tractable cluster-level approximation. The objective generalizes information-potential shaping from gold-answer supervision to settings without task-specific gold verifiers while avoiding the broadcasted rollout-level advantages used by standard GRPO. We formalize the framework, characterize its supervised gold-answer limit, and show that SIOP improves average performance over verifier-free outcome-level baselines on seven search-augmented agentic reasoning benchmarks while approaching a gold-supervised outcome baseline. Code is available at https://github.com/dl-m9/SIOP.git.

preprint2026arXiv

UAV-enabled Computing Power Networks: Design and Performance Analysis under Energy Constraints

This paper presents an innovative framework that boosts computing power by utilizing ubiquitous computing power distribution and enabling higher computing node accessibility via adaptive UAV positioning, establishing a UAV-enabled Computing Power Network (UAV-CPN). In a UAV-CPN, a UAV functions as a dynamic relay, outsourcing computing tasks from the request zone to an expanded service zone with diverse computing nodes, including vehicle onboard units, edge servers, and dedicated powerful nodes. This approach has the potential to alleviate communication bottlenecks and overcome the "island effect" observed in multi-access edge computing. A significant challenge is to quantify computing power performance under complex dynamics of communication and computing. To address this challenge, we introduce task completion probability to capture the capability of UAV-CPNs for task computing. We further enhance UAV-CPN performance under a hybrid energy architecture by jointly optimizing UAV altitude and transmit power, where fuel cells and batteries collectively power both UAV propulsion and communication systems. Extensive evaluations show significant performance gains, highlighting the importance of balancing communication and computing capabilities, especially under dual-energy constraints. These findings underscore the potential of UAV-CPNs to significantly boost computing power.

preprint2026arXiv

UAV-enabled Computing Power Networks: Task Completion Probability Analysis

This paper presents an innovative framework that synergistically enhances computing performance through ubiquitous computing power distribution and dynamic computing node accessibility control via adaptive unmanned aerial vehicle (UAV) positioning, establishing UAV-enabled Computing Power Networks (UAV-CPNs). In UAV-CPNs, UAVs function as dynamic aerial relays, outsourcing tasks generated in the request zone to an expanded service zone, consisting of a diverse range of computing devices, from vehicles with onboard computational capabilities and edge servers to dedicated computing nodes. This approach has the potential to alleviate communication bottlenecks in traditional computing power networks and overcome the "island effect" observed in multi-access edge computing. However, how to quantify the network performance under the complex spatio-temporal dynamics of both communication and computing power is a significant challenge, which introduces intricacies beyond those found in conventional networks. To address this, in this paper, we introduce task completion probability as the primary performance metric for evaluating the ability of UAV-CPNs to complete ground users' tasks within specified end-to-end latency requirements. Utilizing theories from stochastic processes and stochastic geometry, we derive analytical expressions that facilitate the assessment of this metric. Our numerical results emphasize that striking a delicate balance between communication and computational capabilities is essential for enhancing the performance of UAV-CPNs. Moreover, our findings show significant performance gains from the widespread distribution of computing nodes.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint