Source author record

Shuo Ye

Shuo Ye appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Artificial Intelligence, Computer Vision, astro-ph.GA

Catalog footprint

What is connected

3 works

3 topics

4 close collaborators

preprint, 2026, arXiv

Seg-Agent: Test-Time Multimodal Reasoning for Training-Free Language-Guided Segmentation

Language-guided segmentation transcends the scope limitations of traditional semantic segmentation, enabling models to segment arbitrary target regions based on natural language instructions. Existing approaches typically adopt a two-stage framework: employing Multimodal Large Language Models (MLLMs) to interpret instructions and generate visual prompts, followed by foundational segmentation models (e.g., SAM) to produce masks. However, due to the limited spatial grounding capabilities of off-the-shelf MLLMs, these methods often rely on extensive training on large-scale datasets to achieve satisfactory accuracy. While recent advances have introduced reasoning mechanisms to improve performance, they predominantly operate within the textual domain, performing chain-of-thought reasoning solely based on abstract text representations without direct visual feedback. In this paper, we propose Seg-Agent, a completely training-free framework that pioneers Explicit Multimodal Chain-of-Reasoning. Unlike prior text-only reasoning, our approach constructs an interactive visual reasoning loop comprising three stages: generation, selection, and refinement. Specifically, we leverage Set-of-Mark (SoM) visual prompting to render candidate regions directly onto the image, allowing the MLLM to "see" and iteratively reason about spatial relationships in the visual domain rather than just the textual one. This explicit multimodal interaction enables Seg-Agent to achieve performance comparable to state-of-the-art training-based methods without any parameter updates. Furthermore, to comprehensively evaluate generalization across diverse scenarios, we introduce Various-LangSeg, a novel benchmark covering explicit semantic, generic object, and reasoning-guided segmentation tasks. Extensive experiments demonstrate the effectiveness and robustness of our method.

preprint, 2025, arXiv

Scalable Stellar Parameter Inference Using Python-based LASP: From CPU Optimization to GPU Acceleration

To enhance the efficiency, scalability, and cross-survey applicability of stellar parameter inference in large spectroscopic datasets, we present a modular, parallelized Python framework with automated error estimation, built on the LAMOST Atmospheric Parameter Pipeline (LASP) originally implemented in IDL. Rather than a direct code translation, this framework refactors LASP with two complementary modules: LASP-CurveFit, a new implementation of the LASP fitting procedure that runs on a CPU, preserving legacy logic while improving data I/O and multithreaded execution efficiency; and LASP-Adam-GPU, a GPU-accelerated method that introduces grouped optimization by constructing a joint residual function over multiple observed and model spectra, enabling high-throughput parameter inference across tens of millions of spectra. Applied to 10 million LAMOST spectra, the framework reduces runtime from 84 to 48 hr on the same CPU platform and to 7 hr on an NVIDIA A100 GPU, while producing results consistent with those from the original pipeline. The inferred errors agree well with the parameter variations from repeat observations of the same target (excluding radial velocities), while the official empirical errors used in LASP are more conservative. When applied to DESI DR1, our effective temperatures and surface gravities agree better with APOGEE than those from the DESI pipeline, particularly for cool giants, while the latter performs slightly better in radial velocity and metallicity. These results suggest that the framework delivers reliable accuracy, efficiency, and transferability, offering a practical approach to parameter inference in large spectroscopic surveys. The code and DESI-based catalog are available via https://doi.org/10.12149/101679 (DOI: 10.12149/101679) and https://doi.org/10.12149/101675 (DOI: 10.12149/101675), respectively.
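The grouped optimization mentioned in this abstract can be pictured with a toy example. The sketch below is not the LASP-Adam-GPU code: it assumes a hypothetical one-parameter model in which each spectrum is a scaled copy of a shared template, and it uses a plain-Python Adam loop on the CPU. The names `fit_group`, `template`, and `observed` are illustrative only. What it shows is the general idea of summing residuals over a group of spectra into one joint objective and updating every spectrum's parameter in the same optimizer step.

```python
import math

def fit_group(observed, template, steps=500, lr=0.05):
    """Jointly fit one scale parameter per spectrum with Adam.

    Toy model (an assumption, not the LASP spectral model):
    flux_i[j] = theta_i * template[j].
    Joint residual: sum_i sum_j (observed[i][j] - theta_i * template[j])**2
    """
    n = len(observed)
    theta = [0.0] * n           # one parameter per spectrum in the group
    m = [0.0] * n               # Adam first-moment estimates
    v = [0.0] * n               # Adam second-moment estimates
    b1, b2, eps = 0.9, 0.999, 1e-8

    for t in range(1, steps + 1):
        # Gradient of the joint residual with respect to each theta_i.
        grad = [
            -2.0 * sum(tj * (yj - theta[i] * tj)
                       for yj, tj in zip(observed[i], template))
            for i in range(n)
        ]
        # A single Adam step updates the whole group at once.
        for i in range(n):
            m[i] = b1 * m[i] + (1 - b1) * grad[i]
            v[i] = b2 * v[i] + (1 - b2) * grad[i] ** 2
            m_hat = m[i] / (1 - b1 ** t)
            v_hat = v[i] / (1 - b2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Synthetic, noise-free group of two spectra built from the same template.
template = [1.0, 2.0, 3.0]
true_scales = [2.0, 0.5]
observed = [[s * t for t in template] for s in true_scales]
fitted = fit_group(observed, template)
```

In a GPU setting the per-spectrum loops would presumably become batched tensor operations, which is where a grouped formulation would be expected to pay off; that mapping is an inference from the abstract, not something it spells out.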

preprint, 2022, arXiv

R2-Trans: Fine-Grained Visual Categorization with Redundancy Reduction

Fine-grained visual categorization (FGVC) aims to discriminate between similar subcategories; its main challenge is large intra-class diversity combined with subtle inter-class differences. Existing FGVC methods usually select discriminative regions found by a trained model, an approach prone to neglecting other potentially discriminative information. Moreover, the massive interactions among the sequence of image patches in ViT leave the resulting class token with a lot of redundant information, which may also degrade FGVC performance. In this paper, we present a novel approach for FGVC that simultaneously makes use of partial yet sufficient discriminative information in environmental cues and compresses the redundant information in the class token with respect to the target. Specifically, our model calculates the ratio of high-weight regions in a batch, adaptively adjusts the masking threshold, and achieves moderate extraction of background information in the input space. We also use the Information Bottleneck (IB) approach to guide our network to learn a minimal sufficient representation in the feature space. Experimental results on three widely used benchmark datasets verify that our approach outperforms other state-of-the-art approaches and baseline models.
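The abstract does not give the rule used to adapt the masking threshold, so the following is only a minimal sketch of one way such a batch-level adjustment could work. Everything in it is an assumption made for illustration: the function names, the `target_ratio` and `step` parameters, and the simple step-up/step-down update. It computes the fraction of patches in a batch whose weight reaches the current threshold, then nudges the threshold toward a target fraction.

```python
def high_weight_ratio(batch_weights, threshold):
    """Fraction of patches in the batch whose weight is >= threshold."""
    total = sum(len(weights) for weights in batch_weights)
    high = sum(1 for weights in batch_weights for w in weights if w >= threshold)
    return high / total

def adjust_threshold(threshold, ratio, target_ratio=0.3, step=0.05):
    """Hypothetical update rule: move the threshold toward the target ratio."""
    if ratio > target_ratio:
        # Too many patches count as high-weight: raise the bar.
        return min(1.0, threshold + step)
    if ratio < target_ratio:
        # Too few patches count as high-weight: lower the bar.
        return max(0.0, threshold - step)
    return threshold

def mask_patches(weights, threshold):
    """Keep-mask for one image: True where the patch weight passes."""
    return [w >= threshold for w in weights]

# Two images with four patch weights each (made-up values).
batch = [[0.9, 0.1, 0.8, 0.2], [0.7, 0.6, 0.1, 0.3]]
threshold = 0.5
ratio = high_weight_ratio(batch, threshold)
new_threshold = adjust_threshold(threshold, ratio)
masks = [mask_patches(weights, new_threshold) for weights in batch]
```

Here half of the patches pass the initial threshold, which is above the assumed target, so the threshold is raised by one step before the masks are built.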

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint