Source author record

Yi Zhou

Yi Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision; math.DG; Artificial Intelligence; Biomolecules; Computation and Language; Computational Engineering, Finance, and Science; Data Structures and Algorithms; eess.SP; nucl-ex; physics.ins-det; Robotics

Catalog footprint

What is connected

11 works

11 topics

4 close collaborators


preprint · 2026 · arXiv

Bidirectional Channel-selective Semantic Interaction for Semi-Supervised Medical Segmentation

Semi-supervised medical image segmentation is an effective method for addressing scenarios with limited labeled data. Existing methods mainly rely on frameworks such as mean teacher and dual-stream consistency learning. These approaches often face issues like error accumulation and model structural complexity, while also neglecting the interaction between labeled and unlabeled data streams. To overcome these challenges, we propose a Bidirectional Channel-selective Semantic Interaction (BCSI) framework for semi-supervised medical image segmentation. First, we propose a Semantic-Spatial Perturbation (SSP) mechanism, which disturbs the data using two strong augmentation operations and leverages unsupervised learning with pseudo-labels from weak augmentations. Additionally, we employ consistency on the predictions from the two strong augmentations to further improve model stability and robustness. Second, to reduce noise during the interaction between labeled and unlabeled data, we propose a Channel-selective Router (CR) component, which dynamically selects the most relevant channels for information exchange. This mechanism ensures that only highly relevant features are activated, minimizing unnecessary interference. Finally, the Bidirectional Channel-wise Interaction (BCI) strategy is employed to supplement additional semantic information and enhance the representation of important channels. Experimental results on multiple benchmark 3D medical datasets demonstrate that the proposed method outperforms existing semi-supervised approaches.
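The SSP idea in the abstract above (pseudo-labels from a weak view supervising two strong views, plus a consistency term between the strong views) can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the function names, the choice of cross-entropy and mean squared error, and the `consistency_weight` parameter are all assumptions, since the abstract does not specify the loss functions.

```python
import math


def argmax(probs):
    """Index of the largest probability (the pseudo-label for one pixel)."""
    return max(range(len(probs)), key=probs.__getitem__)


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class, clamped for stability."""
    return -math.log(max(probs[target], 1e-12))


def mse(p, q):
    """Mean squared difference between two probability vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def ssp_loss(weak, strong_a, strong_b, consistency_weight=1.0):
    """Hypothetical SSP-style loss over per-pixel class probabilities.

    weak, strong_a, strong_b: lists of probability vectors, one per pixel,
    predicted from the weak view and the two strongly augmented views.
    """
    total = 0.0
    for w, a, b in zip(weak, strong_a, strong_b):
        pseudo = argmax(w)  # pseudo-label taken from the weak view
        # Both strong views are supervised by the same pseudo-label.
        total += cross_entropy(a, pseudo) + cross_entropy(b, pseudo)
        # Assumed consistency term between the two strong predictions.
        total += consistency_weight * mse(a, b)
    return total / len(weak)
```

When both strong predictions agree with a confident weak prediction, the loss approaches zero; disagreement with the pseudo-label or between the two strong views increases it.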

preprint · 2026 · arXiv

DST-Calib: A Dual-Path, Self-Supervised, Target-Free LiDAR-Camera Extrinsic Calibration Network

LiDAR-camera extrinsic calibration is essential for multi-modal data fusion in robotic perception systems. However, existing approaches typically rely on handcrafted calibration targets (e.g., checkerboards) or specific, static scene types, limiting their adaptability and deployment in real-world autonomous and robotic applications. This article presents the first self-supervised LiDAR-camera extrinsic calibration network that operates in an online fashion and eliminates the need for specific calibration targets. We first identify a significant generalization degradation problem in prior methods, caused by the conventional single-sided data augmentation strategy. To overcome this limitation, we propose a novel double-sided data augmentation technique that generates multi-perspective camera views using estimated depth maps, thereby enhancing robustness and diversity during training. Built upon this augmentation strategy, we design a dual-path, self-supervised calibration framework that reduces the dependence on high-precision ground truth labels and supports fully adaptive online calibration. Furthermore, to improve cross-modal feature association, we replace the traditional dual-branch feature extraction design with a difference map construction process that explicitly correlates LiDAR and camera features. This not only enhances calibration accuracy but also reduces model complexity. Extensive experiments conducted on five public benchmark datasets, as well as our own recorded dataset, demonstrate that the proposed method significantly outperforms existing approaches in terms of generalizability.

preprint · 2026 · arXiv

Exact Clique Number Manipulation via Edge Interdiction

The Edge Interdiction Clique Problem (EICP) aims to remove at most $k$ edges from a graph so as to minimize the size of the largest clique in the remaining graph. This problem captures a fundamental question in graph manipulation: which edges are structurally critical for preserving large cliques? Such a problem is also motivated by practical applications including protein function maintenance and image matching. The EICP is computationally challenging and belongs to a complexity class beyond NP. Existing approaches rely on general mixed-integer bilevel programming solvers or reformulate the problem into a single-level mixed-integer linear program. However, they are still not scalable when the graph size and interdiction budget $k$ grow. To overcome this, we investigate new mixed-integer linear formulations, which recast the problem into a sequence of parameterized Edge Blocker Clique Problems (EBCP). This perspective decomposes the original problem into simpler subproblems and enables tighter modeling of clique-related inequalities. Furthermore, we propose a two-stage exact algorithm, RLCM, which first applies problem-specific reduction techniques to shrink the graph and then solves the reduced problem using a tailored branch-and-cut framework. Extensive computational experiments on maximum clique benchmark graphs, large real-world sparse networks, and random graphs demonstrate that RLCM consistently outperforms existing approaches.
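To make the problem statement above concrete, the following sketch solves the EICP by exhaustive enumeration on a tiny graph. It only illustrates the definition (remove at most $k$ edges, minimize the clique number); it is not the RLCM algorithm or any formulation from the paper, and the toy graph and function names are hypothetical.

```python
from itertools import combinations


def clique_number(nodes, edges):
    """Size of the largest clique, by exhaustive search (tiny graphs only)."""
    edge_set = {frozenset(e) for e in edges}
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all(frozenset(p) in edge_set for p in combinations(subset, 2)):
                return size
    return 0


def interdict(nodes, edges, k):
    """Try every set of at most k edges to remove.

    Returns (smallest achievable clique number, edges removed to reach it).
    """
    best = (clique_number(nodes, edges), ())
    for r in range(1, k + 1):
        for removed in combinations(edges, r):
            remaining = [e for e in edges if e not in removed]
            omega = clique_number(nodes, remaining)
            if omega < best[0]:
                best = (omega, removed)
    return best


# Hypothetical toy graph: a 4-clique on {0, 1, 2, 3} plus a pendant vertex 4.
NODES = [0, 1, 2, 3, 4]
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]

if __name__ == "__main__":
    for budget in range(3):
        print(budget, interdict(NODES, EDGES, budget))
```

The enumeration grows combinatorially in both the number of edges and the budget, which is the scalability problem that motivates exact methods with reductions and branch-and-cut.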

preprint · 2026 · arXiv

KALE-LM-Chem: Vision and Practice Toward an AI Brain for Chemistry

Recent advancements in large language models (LLMs) have demonstrated strong potential for enabling domain-specific intelligence. In this work, we present our vision for building an AI-powered chemical brain, which frames chemical intelligence around four core capabilities: information extraction, semantic parsing, knowledge-based QA, and reasoning & planning. We argue that domain knowledge and logic are essential pillars for enabling such a system to assist and accelerate scientific discovery. To initiate this effort, we introduce our first generation of large language models for chemistry: KALE-LM-Chem and KALE-LM-Chem-1.5, which have achieved outstanding performance in tasks related to the field of chemistry. We hope that our work serves as a strong starting point, helping to realize more intelligent AI and promoting the advancement of human science and technology, as well as societal development.

preprint · 2026 · arXiv

Normal Scalar Curvature Inequality on a Class of Austere Submanifolds

In this paper, we establish new normal scalar curvature inequalities on a class of austere submanifolds by proving sharper DDVV-type inequalities on associated austere subspaces. We also provide some examples of austere submanifolds in this class and point out that one of them achieves equality in our normal scalar curvature inequality everywhere. As a byproduct, we obtain a Simons-type gap theorem for closed austere submanifolds in unit spheres that belong to this class.

preprint · 2026 · arXiv

Rigidity Results for Compact Submanifolds with Pinched Ricci Curvature in Euclidean and Spherical Space Forms

For compact submanifolds in Euclidean and spherical space forms with Ricci curvature bounded below by a function $α(n,k,H,c)$ of mean curvature, we prove that the submanifold is either isometric to the Einstein Clifford torus, or a topological sphere for the maximal bound $α(n,[\frac{n}{2}],H,c)$, or has up to $k$-th homology groups vanishing. This gives an almost complete (except for the differentiable sphere theorem) characterization of compact submanifolds with pinched Ricci curvature, generalizing celebrated rigidity results obtained by Ejiri, Xu-Tian, Xu-Gu, Xu-Leng-Gu, Vlachos, and Dajczer-Vlachos.

preprint · 2026 · arXiv

Seeing Together: Multi-Robot Cooperative Egocentric Spatial Reasoning with Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) have made substantial progress in egocentric video understanding, but their ability to reason cooperatively from multiple embodied viewpoints remains largely unexplored. We study this problem through multi-robot cooperative dynamic spatial reasoning, where a model must answer spatial, temporal, visibility, and coordination questions by integrating synchronized egocentric videos from a team of moving robots. To support this setting, we introduce CoopSR, the first benchmark for this task, together with EgoTeam, a multi-robot egocentric QA dataset. EgoTeam contains 114,227 QA pairs spanning 19 question types, four difficulty tiers, and three team sizes in Habitat and iGibson, along with a real-world test set of around 2,326 QAs collected using two quadruped robots. We further propose SP-CoR (Spectral and Physics-Informed Cooperative Reasoner), an MLLM framework for fine-grained cooperative spatial reasoning. SP-CoR combines dynamics-aware multi-robot frame sampling, spectral- and physics-guided view fusion, and physics-aligned prompt distillation, enabling the model to benefit from privileged robot-pose supervision during training while requiring only egocentric videos at test time. Across 22 MLLM baselines, SP-CoR consistently improves cooperative reasoning, outperforming the strongest fine-tuned baseline by +3.87% on Habitat and +7.12% on iGibson. It also shows stronger generalization to unseen team sizes and real-world robot tests. Code can be found at https://github.com/KPeng9510/seeing-together.git.

preprint · 2026 · arXiv

Spectral point transformer for significant wave height estimation from sea clutter

This paper presents a method for estimating significant wave height (Hs) from sparse spectral points using a Transformer-based approach, the Spectral Point Transformer (SPT). Based on empirical observations that only a minority of spectral points with strong power contribute to wave energy, the proposed SPT effectively integrates geometric and spectral characteristics of ocean surface waves to estimate Hs through multi-dimensional feature representation. The experiment reveals an intriguing phenomenon: the learned features of SPT align well with physical dispersion relations, where the contribution-score map of selected points is concentrated along dispersion curves. Compared to conventional vision networks that process image sequences and full spectra, SPT demonstrates superior performance in Hs regression while consuming significantly fewer computational resources. On a consumer-grade GPU, SPT completes training of the regression model on 1080 sea clutter image sequences within 4 minutes, showcasing its potential to reduce deployment costs for radar wave-measuring systems. The open-source implementation of SPT will be available at https://github.com/joeyee/spt
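For readers unfamiliar with the dispersion relation the abstract refers to, the sketch below uses the standard linear gravity-wave relation, $\omega^2 = g k \tanh(k h)$, which reduces to $\omega^2 = g k$ in deep water. The abstract does not state which form the authors use (for example, whether a current-induced Doppler term is included), so treat this as background only; the function names and the tolerance parameter are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def dispersion_omega(k, depth=None):
    """Angular frequency (rad/s) of a linear gravity wave with wavenumber k (rad/m).

    depth=None selects the deep-water limit; otherwise depth is in metres.
    """
    if depth is None:
        return math.sqrt(G * k)
    return math.sqrt(G * k * math.tanh(k * depth))


def near_dispersion_curve(k, omega, depth=None, rel_tol=0.1):
    """True if a spectral point (k, omega) lies within rel_tol of the curve."""
    expected = dispersion_omega(k, depth)
    return abs(omega - expected) <= rel_tol * expected
```

A check like `near_dispersion_curve` is one simple way to test whether high-scoring spectral points cluster along the dispersion curve, as the abstract reports for the learned contribution scores.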

preprint · 2026 · arXiv

Toward Stable Semi-Supervised Remote Sensing Segmentation via Co-Guidance and Co-Fusion

Semi-supervised remote sensing (RS) image semantic segmentation offers a promising solution to alleviate the burden of exhaustive annotation, yet it fundamentally struggles with pseudo-label drift, a phenomenon where confirmation bias leads to the accumulation of errors during training. In this work, we propose Co2S, a stable semi-supervised RS segmentation framework that synergistically fuses priors from vision-language models and self-supervised models. Specifically, we construct a heterogeneous dual-student architecture comprising two distinct ViT-based vision foundation models initialized with pretrained CLIP and DINOv3 to mitigate error accumulation and pseudo-label drift. To effectively incorporate these distinct priors, an explicit-implicit semantic co-guidance mechanism is introduced that utilizes text embeddings and learnable queries to provide explicit and implicit class-level guidance, respectively, thereby jointly enhancing semantic consistency. Furthermore, a global-local feature collaborative fusion strategy is developed to effectively fuse the global contextual information captured by CLIP with the local details produced by DINOv3, enabling the model to generate highly precise segmentation results. Extensive experiments on six popular datasets demonstrate the superiority of the proposed method, which consistently achieves leading performance across various partition protocols and diverse scenarios. Project page is available at https://xavierjiezou.github.io/Co2S/.

preprint · 2025 · arXiv

A Low Background Beta Detection System using a Time Projection Chamber

In this paper, we present a Time Projection Chamber (TPC) system for low-background beta radiation measurements. The system consists of a TPC with two-dimensional-strip readout Micromegas and an anti-coincidence detector with readout pads for cosmic ray veto. The detector system utilizes an AGET-based waveform sampling system for data acquisition. The beta detection capability of the system was verified through an experimental test using a $^{90}$Sr beta source. Additionally, a dedicated simulation program based on Geant4 was developed to model the entire detection process, including responses to both the beta source and background radiation. Simulation results were compared with experimental data for both beta and background samples, showing good agreement. The simulation samples were utilized to optimize and train classification models for beta and background discrimination. By applying the selected model to test data, the system achieved a background rate of 0.49 $\rm cpm/cm^2$ while retaining more than 55% of $^{90}$Sr beta signals within a 7 cm diameter detection region. Further analysis revealed that approximately 70% of the background originates from environmental gamma radiation, while the remaining contribution mainly comes from intrinsic radioactivity of detector materials, particularly the FR-4 based field cage and readout plane. Based on the knowledge gained from the experiments and simulations, an optimization of the TPC system has been proposed, with simulation predicting a potential reduction of the background rate to 0.0012 $\rm cpm/cm^2$.
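As a quick sanity check on the figures quoted above, the per-area background rates can be converted into total counts per minute over the 7 cm diameter region. This assumes the quoted rate applies uniformly over that circular region, which the abstract does not state explicitly; the function name is hypothetical.

```python
import math


def region_counts_per_minute(rate_cpm_per_cm2, diameter_cm):
    """Total background counts per minute over a circular region.

    Assumes the per-area rate is uniform across the whole region.
    """
    area_cm2 = math.pi * (diameter_cm / 2) ** 2
    return rate_cpm_per_cm2 * area_cm2


if __name__ == "__main__":
    measured = region_counts_per_minute(0.49, 7.0)      # rate reported for the current system
    predicted = region_counts_per_minute(0.0012, 7.0)   # rate predicted after optimization
    print(f"measured:  {measured:.2f} cpm")
    print(f"predicted: {predicted:.4f} cpm")
    print(f"reduction factor: {0.49 / 0.0012:.0f}x")
```

Under this assumption, the region has an area of about 38.5 cm², so the measured rate corresponds to roughly 19 counts per minute, and the proposed optimization would lower the background by a factor of about 400.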

preprint · 2025 · arXiv

SeedFold: Scaling Biomolecular Structure Prediction

Highly accurate biomolecular structure prediction is a key component of developing biomolecular foundation models, and one of the most critical aspects of building foundation models is identifying the recipes for scaling the model. In this work, we present SeedFold, a folding model that successfully scales up the model capacity. Our contributions are threefold: first, we identify an effective width-scaling strategy for the Pairformer to increase representation capacity; second, we introduce a novel linear triangular attention that reduces computational complexity to enable efficient scaling; finally, we construct a large-scale distillation dataset to substantially enlarge the training set. Experiments on FoldBench show that SeedFold outperforms AlphaFold3 on most protein-related tasks.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint