Source author record

Yutao Sun

Yutao Sun appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence · astro-ph.GA · astro-ph.SR · Computer Vision · Machine Learning

Catalog footprint

What is connected

2 works

5 topics

4 close collaborators

preprint · 2026 · arXiv

Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models

Visual latent reasoning lets a multimodal large language model (MLLM) create intermediate visual evidence as continuous tokens, avoiding external tools or image generators. However, existing methods usually follow an output-as-input latent paradigm and yield unstable gains. We identify evidence for a feature-space mismatch that can contribute to this instability: dominant visual-latent models build on pre-norm MLLMs and reuse decoder hidden states as predicted latent inputs, even though these states occupy a substantially different norm regime from the input embeddings the model was trained to consume \citep{xie2025mhc,li2026siamesenorm,team2026attention}. This mismatch can make direct latent feedback unreliable. Motivated by this diagnosis, we propose GAP, a Granular Alignment Paradigm for visual latent modeling. GAP aligns visual latent reasoning at three levels: feature-level alignment maps decoder outputs into input-compatible visual latents through a lightweight PCA-aligned latent head; context-level alignment grounds latent targets with inspectable auxiliary visual supervision; and capacity-guided alignment assigns latent supervision selectively to examples where the base MLLM struggles. On Qwen2.5-VL 7B, the resulting model achieves the best mean aggregate perception and reasoning performance among our supervised variants. Inference-time intervention probing further suggests that generated latents provide task-relevant visual signal beyond merely adding token slots.
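
The norm-regime mismatch described in this abstract can be illustrated with a toy diagnostic. The sketch below is not the paper's method: it uses synthetic Gaussian vectors as stand-ins for input embeddings and decoder hidden states (the dimensions and standard deviations are arbitrary assumptions), measures the gap in mean L2 norm, and applies a naive per-vector rescaling. GAP's PCA-aligned latent head is a learned component that this baseline does not reproduce.

```python
import math
import random


def l2_norm(vector):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vector))


def mean_norm(vectors):
    """Average L2 norm over a collection of vectors."""
    return sum(l2_norm(v) for v in vectors) / len(vectors)


def rescale_to_norm(vector, target_norm):
    """Rescale a vector to a target L2 norm (naive baseline, hypothetical helper)."""
    norm = l2_norm(vector)
    if norm == 0.0:
        return list(vector)
    return [x * target_norm / norm for x in vector]


rng = random.Random(0)
dim = 64  # assumed toy dimensionality

# Synthetic stand-ins, NOT real model activations: small-norm "input
# embeddings" and large-norm "decoder hidden states".
input_embeddings = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(200)]
hidden_states = [[rng.gauss(0.0, 5.0) for _ in range(dim)] for _ in range(200)]

embed_norm = mean_norm(input_embeddings)
hidden_norm = mean_norm(hidden_states)
norm_ratio = hidden_norm / embed_norm  # how far apart the two regimes are

# Naive fix: bring every hidden state to the embedding norm scale.
rescaled = [rescale_to_norm(h, embed_norm) for h in hidden_states]

print(f"mean embedding norm: {embed_norm:.4f}")
print(f"mean hidden-state norm: {hidden_norm:.4f}")
print(f"norm ratio: {norm_ratio:.1f}")
```

Rescaling matches only the norm; it leaves the direction of each vector untouched, which is one reason a learned alignment head of the kind the abstract describes might be preferred over this baseline.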

preprint · 2012 · arXiv

Determining gravitational wave radiation from close galaxy pairs using a binary population synthesis approach

Context. The early phase of the coalescence of supermassive black hole (SMBH) binaries formed in merging host galaxies provides a guaranteed source of low-frequency (nHz-$\mu$Hz) gravitational wave (GW) radiation detectable by pulsar timing observations. These types of GW sources would survive the coalescence and be potentially identifiable. Aims. We aim to outline a new method for detecting GW radiation from individual SMBH systems based on Sloan Digital Sky Survey (SDSS) observational results, which can be verified by future observations. Methods. Combining the sensitivities of the International Pulsar Timing Array (PTA) and the Square Kilometer Array (SKA) detectors, we used a binary population synthesis (BPS) approach to determine GW radiation from close galaxy pairs under the assumption that SMBHs formed at the cores of merged galaxies. We also applied second post-Newtonian approximation methods to estimate the variation of the strain amplitude with time. Results. We find that the value of the strain amplitude $h$ varies from about $10^{-14}$ to $10^{-17}$ for 20 years of observations, and we estimate that about 100 SMBH sources can be detected with the SKA detector.
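
For orientation on the strain amplitudes quoted in this abstract, the sketch below evaluates the standard leading-order (quadrupole) strain of a circular binary, $h = 4 (G M_c)^{5/3} (\pi f)^{2/3} / (c^4 d)$, where $M_c$ is the chirp mass, $f$ the GW frequency and $d$ the distance. This is a simplification, not the second post-Newtonian treatment or the population synthesis used in the paper; the function names and the example masses, frequency and distance are illustrative assumptions.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg
MPC = 3.086e22     # megaparsec, m


def chirp_mass(m1, m2):
    """Chirp mass (m1*m2)^(3/5) / (m1+m2)^(1/5), in the units of m1 and m2."""
    return (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2


def strain_amplitude(m1_msun, m2_msun, f_gw_hz, distance_mpc):
    """Leading-order strain amplitude of a circular binary (optimal orientation).

    Masses in solar masses, GW frequency in Hz, distance in Mpc.
    Cosmological redshift corrections are ignored in this sketch.
    """
    mc_kg = chirp_mass(m1_msun, m2_msun) * M_SUN
    d_m = distance_mpc * MPC
    return 4.0 * (G * mc_kg) ** (5.0 / 3.0) * (math.pi * f_gw_hz) ** (2.0 / 3.0) / (C ** 4 * d_m)


# Illustrative (assumed) system: two 1e9 solar-mass black holes at 100 Mpc,
# radiating at 10 nHz, inside the pulsar-timing band.
h_example = strain_amplitude(1e9, 1e9, 1e-8, 100.0)
print(f"h = {h_example:.2e}")
```

The formula scales as $f^{2/3}$ and $1/d$, so lower-mass or more distant pairs fall toward the faint end of the range reported in the abstract.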