Source author record

Yi Yang

Yi Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Machine Learning, Computation and Language, Artificial Intelligence, astro-ph.HE, astro-ph.SR, Graphics, math.GN, Robotics

Catalog footprint

What is connected

11 works

9 topics

4 close collaborators

Preprint, 2026, arXiv

A binary merger product as the direct progenitor of a Type II-P supernova

Type II-P supernovae (SNe II-P) are the most common class of core-collapse SNe in the local Universe and play critical roles in many aspects of astrophysics. For decades, theorists have predicted that SNe II-P may originate not only from single stars but also from interacting binaries. While ~20 SN II-P progenitors have been directly detected in pre-explosion images, observational evidence remains scarce for this speculated binary progenitor channel. In this work, we report the discovery of a red supergiant progenitor for the Type II-P SN 2018gj. While the progenitor resembles those of other SNe II-P in terms of effective temperature and luminosity, it is located in a very old environment and SN 2018gj has an abnormally short plateau in the light curve. With state-of-the-art binary evolution simulations, we find these characteristics can only be explained if the progenitor of SN 2018gj is the merger product of a close binary system, which developed a different interior structure and evolved over a longer timescale compared with single-star evolution. This work provides the first compelling evidence for the long-sought binary progenitor channel toward SNe II-P, and our methodology serves as an innovative and pragmatic tool to motivate further investigations into this previously hidden population of SNe II-P from binaries.

Preprint, 2026, arXiv

A Classification of Fractal Squares

Let $\lambda_K:\mathbb{R}^2\rightarrow\{0,1,\ldots\}\cup\{\infty\}$ be the lambda function of a planar compactum $K$, as defined in MR4488162. It is known that a planar continuum is locally connected if and only if its lambda function vanishes everywhere, or equivalently, $\lambda_K(K)=\{0\}$. In this article we show that every fractal square $K$ satisfies $\lambda_K(K)\subset\{0,1\}$ and find criteria that determine when $\lambda_K(K)$ equals $\{0\}$, $\{1\}$ or $\{0,1\}$. Here, for any integer $N\ge2$ and any set $\mathcal{D}\subset\left\{(i,j): 0\le i,j\le N-1\right\}$ with cardinality $\ge2$, if we set $K^{(0)}=[0,1]^2$ and $\displaystyle K^{(n)}=\left\{\frac{x+d}{N}: x\in K^{(n-1)}, d\in\mathcal{D}\right\}$ $(n\ge1)$, then $K=\bigcap_nK^{(n)}$ is called a fractal square.
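The iterative definition of $K^{(n)}$ can be made concrete with a short sketch. This is an illustration only, not code from the paper: the function name `fractal_square_cells` and the choice to represent $K^{(n)}$ by the integer lower-left corners of its closed squares of side $N^{-n}$ are assumptions made here. The Sierpinski carpet ($N=3$, all digits except the centre) is used as a familiar example.

```python
from itertools import product


def fractal_square_cells(N, digits, n):
    """Return the cells of K^(n) as integer lower-left corners.

    A corner (a, b) stands for the closed square
    [a / N**n, (a + 1) / N**n] x [b / N**n, (b + 1) / N**n].
    (Hypothetical helper; the representation is an assumption of this sketch.)
    """
    cells = {(0, 0)}  # K^(0) = [0, 1]^2 is a single cell at scale N**0
    for level in range(1, n + 1):
        scale = N ** (level - 1)
        # (x + d) / N maps a level-(n-1) cell with corner c to the
        # level-n cell with corner c + d * N**(n-1).
        cells = {
            (cx + dx * scale, cy + dy * scale)
            for (cx, cy) in cells
            for (dx, dy) in digits
        }
    return cells


# Sierpinski carpet: N = 3, every digit except the centre (1, 1).
carpet_digits = [d for d in product(range(3), repeat=2) if d != (1, 1)]
level1 = fractal_square_cells(3, carpet_digits, 1)
level2 = fractal_square_cells(3, carpet_digits, 2)
```

Each step multiplies the number of cells by the cardinality of the digit set, so the level-$n$ approximation of the carpet has $8^n$ cells of side $3^{-n}$.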

Preprint, 2026, arXiv

A Highly Magnetic Ultra Massive White Dwarf with a 23-minute Rotation Period

We present a physical characterization of TMTS J00063798+3104160 (J0006), a rapidly rotating, ultra-massive white dwarf (WD) identified in high-cadence light curves from the Tsinghua University-Ma Huateng Telescope for Survey (TMTS). A coherent 23-minute periodicity is detected in TMTS, TESS, and ZTF photometry. A time series of low-resolution spectra with the Keck-I 10 m telescope reveals broad, shallow hydrogen absorption features indicative of an extreme magnetic field and shows no evidence for radial-velocity variations. Atmospheric modeling yields a magnetic field strength of $\sim$ 250 MG, while Gaia astrometry and photometry imply a mass of 1.06 $\pm$ 0.01 M$_{\odot}$. A significant infrared excess is detected in the WISE W1 band and is well fitted by a 550 K blackbody, likely arising from residual material of a merger. We interpret the 23-minute photometric modulation as the rotation period of an isolated, massive WD that likely formed through the merger of a double WD binary. With one of the shortest rotation periods known among candidate merger remnants and with constraints from a deep Einstein Probe X-ray nondetection, J0006 provides a rare and important observational window into the poorly explored intermediate stages of post-merger evolution.

Preprint, 2026, arXiv

CktGen: Automated Analog Circuit Design with Generative Artificial Intelligence

The automatic synthesis of analog circuits presents significant challenges. Most existing approaches formulate the problem as a single-objective optimization task, overlooking that design specifications for a given circuit type vary widely across applications. To address this, we introduce specification-conditioned analog circuit generation, a task that directly generates analog circuits based on target specifications. The motivation is to leverage existing well-designed circuits to improve automation in analog circuit design. Specifically, we propose CktGen, a simple yet effective variational autoencoder that maps discretized specifications and circuits into a joint latent space and reconstructs the circuit from that latent vector. Notably, as a single specification may correspond to multiple valid circuits, naively fusing specification information into the generative model does not capture these one-to-many relationships. To address this, we decouple the encoding of circuits and specifications and align their mapped latent space. Then, we employ contrastive training with a filter mask to maximize differences between encoded circuits and specifications. Furthermore, classifier guidance along with latent feature alignment promotes the clustering of circuits sharing the same specification, avoiding model collapse into trivial one-to-one mappings. By canonicalizing the latent space with respect to specifications, we can search for an optimal circuit that meets valid target specifications. We conduct comprehensive experiments on the open circuit benchmark and introduce metrics to evaluate cross-model consistency. Experimental results demonstrate that CktGen achieves substantial improvements over state-of-the-art methods.

Preprint, 2026, arXiv

CM-EVS: Sparse Panoramic RGB-D-Pose Data for Complete Scene Coverage

Modern 3D visual learning relies on observations sampled from metric 3D assets, yet existing scans, meshes, point clouds, simulations, and reconstructions do not directly provide a sparse, comparable, and geometry-consistent panoramic training interface. Dense trajectories duplicate nearby views, source-specific rendering policies yield heterogeneous annotations, and sparse heuristics may miss important regions or introduce depth-inconsistent observations. We study how to convert 3D assets into sparse panoramic RGB-D-pose data that preserves complete scene coverage with low redundancy and auditable provenance. We propose COVER (Coverage-Oriented Viewpoint curation with ERP Range-depth warping), a training-free ERP viewpoint curator that projects geometry observed from selected views into candidate ERP probes, scores incremental coverage, and penalizes depth conflicts. Under bounded proxy error, its greedy coverage proxy preserves the standard coverage-style approximation behavior up to an additive error term. Using COVER, we build CM-EVS (Coverage-curated Metric ERP View Set), a panoramic RGB-D-pose dataset with 36,373 curated ERP frames from 1,275 indoor scenes across Blender indoor, HM3D, and ScanNet++, complemented by outdoor panoramas from TartanGround and OB3D re-encoded into the same schema. Each frame provides full-sphere RGB, metric range depth, and calibrated pose; COVER-produced indoor frames include per-step provenance logs. With a median of only 25 frames per indoor scene, CM-EVS covers all 13 unified room types while maintaining compact scene-level coverage. Experiments show that COVER improves the coverage-conflict trade-off, making CM-EVS a sparse, compact, and auditable RGB-D-pose resource for geometry-consistent panoramic 3D learning.
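The idea of scoring incremental coverage while penalizing conflicts can be illustrated with a generic greedy selection loop. This is a minimal sketch under stated assumptions, not the authors' COVER implementation: the function `greedy_view_selection`, the set-based coverage representation, the scalar per-view conflict score, and the `penalty` weight are all hypothetical stand-ins, and the ERP projection and range-depth warping described in the abstract are not modelled.

```python
def greedy_view_selection(coverage, conflict, budget, penalty=1.0):
    """Greedily pick views by marginal coverage minus a conflict penalty.

    coverage: dict mapping view id -> set of surface elements it observes
    conflict: dict mapping view id -> non-negative conflict score
    budget:   maximum number of views to keep
    (All names and data structures are assumptions of this sketch.)
    """
    selected, covered = [], set()
    remaining = set(coverage)

    def score(view):
        # Newly covered elements, discounted by the view's conflict score.
        return len(coverage[view] - covered) - penalty * conflict[view]

    while remaining and len(selected) < budget:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        if score(best) <= 0:
            break  # no remaining view adds net value
        selected.append(best)
        covered |= coverage[best]
        remaining.remove(best)
    return selected, covered


# Toy example: view "c" sees everything but carries a large conflict score.
coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {1, 2, 3, 4}}
conflict = {"a": 0.0, "b": 0.0, "c": 3.5}
selected, covered = greedy_view_selection(coverage, conflict, budget=3)
```

In this toy case the two conflict-free views are chosen and the high-conflict view is skipped, even though it alone covers every element.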

Preprint, 2026, arXiv

Compositional Feature Augmentation for Unbiased Scene Graph Generation

Scene Graph Generation (SGG) aims to detect all the visual relation triplets <sub, pred, obj> in a given image. With the emergence of various advanced techniques for better utilizing both the intrinsic and extrinsic information in each relation triplet, SGG has achieved great progress in recent years. However, due to the ubiquitous long-tailed predicate distributions, today's SGG models are still easily biased toward the head predicates. Currently, the most prevalent debiasing solutions for SGG are re-balancing methods, e.g., changing the distributions of original training samples. In this paper, we argue that all existing re-balancing strategies fail to increase the diversity of the relation triplet features of each predicate, which is critical for robust SGG. To this end, we propose a novel Compositional Feature Augmentation (CFA) strategy, which is the first unbiased SGG work to mitigate the bias issue from the perspective of increasing the diversity of triplet features. Specifically, we first decompose each relation triplet feature into two components: intrinsic feature and extrinsic feature, which correspond to the intrinsic characteristics and extrinsic contexts of a relation triplet, respectively. Then, we design two different feature augmentation modules to enrich the feature diversity of original relation triplets by replacing or mixing up either their intrinsic or extrinsic features from other samples. Due to its model-agnostic nature, CFA can be seamlessly incorporated into various SGG frameworks. Extensive ablations have shown that CFA achieves a new state-of-the-art performance on the trade-off between different metrics.

Preprint, 2026, arXiv

DecoupledESC: Enhancing Emotional Support Generation via Strategy-Response Decoupled Preference Optimization

Recent advances in Emotional Support Conversation (ESC) have improved emotional support generation by fine-tuning Large Language Models (LLMs) via Supervised Fine-Tuning (SFT). However, common psychological errors still persist. While Direct Preference Optimization (DPO) shows promise in reducing such errors through pairwise preference learning, its effectiveness in ESC tasks is limited by two key challenges: (1) Entangled data structure: Existing ESC data inherently entangles psychological strategies and response content, making it difficult to construct high-quality preference pairs; and (2) Optimization ambiguity: Applying vanilla DPO to such entangled pairwise data leads to ambiguous training objectives. To address these issues, we introduce Inferential Preference Mining (IPM) to construct high-quality preference data, forming the IPM-PrefDial dataset. Building upon this data, we propose a Decoupled ESC framework inspired by Gross's Extended Process Model of Emotion Regulation, which decomposes the ESC task into two sequential subtasks: strategy planning and empathic response generation. Each was trained via SFT and subsequently enhanced by DPO to align with the psychological preference. Extensive experiments demonstrate that our Decoupled ESC framework outperforms joint optimization baselines, reducing preference bias and improving response quality.

Preprint, 2026, arXiv

KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs

While LLMs are powerful embedding backbones, their application in training-free settings faces two structural challenges: causal attention restricts early tokens from accessing subsequent context, and the next-token prediction objective biases representations toward generation rather than semantic compression. To address these limitations, we propose KV-Embedding, a framework that activates the latent representation power of frozen LLMs. Our method leverages the observation that the key-value (KV) states of the final token at each layer encode a compressed view of the sequence. By re-routing these states as a prepended prefix, we enable all tokens to access sequence-level context within a single forward pass. To ensure model-agnostic applicability, we introduce an automated layer selection strategy based on intrinsic dimensionality. Evaluations on MTEB across Qwen, Mistral, and Llama backbones show that KV-Embedding outperforms existing training-free baselines by up to 10%, while maintaining robust performance on sequences up to 4,096 tokens. These results demonstrate that internal state manipulation offers an efficient alternative to input modification, and we hope this work encourages further exploration of LLM internals for representation learning.

Preprint, 2026, arXiv

RSATalker: Realistic Socially-Aware Talking Head Generation for Multi-Turn Conversation

Talking head generation is increasingly important in virtual reality (VR), especially for social scenarios involving multi-turn conversation. Existing approaches face notable limitations: mesh-based 3D methods can model dual-person dialogue but lack realistic textures, while large-model-based 2D methods produce natural appearances but incur prohibitive computational costs. Recent methods based on 3D Gaussian Splatting (3DGS) achieve efficient and realistic rendering but remain speaker-only and ignore social relationships. We introduce RSATalker, the first framework that leverages 3DGS for realistic and socially-aware talking head generation with support for multi-turn conversation. Our method first drives mesh-based 3D facial motion from speech, then binds 3D Gaussians to mesh facets to render high-fidelity 2D avatar videos. To capture interpersonal dynamics, we propose a socially-aware module that encodes social relationships, including blood and non-blood as well as equal and unequal, into high-level embeddings through a learnable query mechanism. We design a three-stage training paradigm and construct the RSATalker dataset with speech-mesh-image triplets annotated with social relationships. Extensive experiments demonstrate that RSATalker achieves state-of-the-art performance in both realism and social awareness. The code and dataset will be released.

Preprint, 2026, arXiv

Unified Generation and Self-Verification for Vision-Language Models via Advantage Decoupled Preference Optimization

Parallel test-time scaling typically trains separate generation and verification models, incurring high training and inference costs. We propose Advantage Decoupled Preference Optimization (ADPO), a unified reinforcement learning framework that jointly learns answer generation and self-verification within a single policy. ADPO introduces two innovations: a preference verification reward that improves verification capability, and a decoupled optimization mechanism that enables synergistic optimization of generation and verification. Specifically, the preference verification reward computes mean verification scores from positive and negative samples as decision thresholds, providing positive feedback when prediction correctness aligns with answer correctness. Meanwhile, the advantage decoupled optimization computes separate advantages for generation and verification, applies token masks to isolate gradients, and combines masked GRPO objectives, preserving generation quality while calibrating verification scores. ADPO achieves up to 34.1% higher verification AUC and 53.5% lower inference time, with significant gains of +2.8%/+1.4% accuracy on MathVista/MMMU, +1.9 cIoU on ReasonSeg, and +1.7%/+1.0% step success rate on AndroidControl/GUI Odyssey.

Preprint, 2026, arXiv

V-Zero: Self-Improving Multimodal Reasoning with Zero Annotation

Recent advances in multimodal learning have significantly enhanced the reasoning capabilities of vision-language models (VLMs). However, state-of-the-art approaches rely heavily on large-scale human-annotated datasets, which are costly and time-consuming to acquire. To overcome this limitation, we introduce V-Zero, a general post-training framework that facilitates self-improvement using exclusively unlabeled images. V-Zero establishes a co-evolutionary loop by instantiating two distinct roles: a Questioner and a Solver. The Questioner learns to synthesize high-quality, challenging questions by leveraging a dual-track reasoning reward that contrasts intuitive guesses with reasoned results. The Solver is optimized using pseudo-labels derived from majority voting over its own sampled responses. Both roles are trained iteratively via Group Relative Policy Optimization (GRPO), driving a cycle of mutual enhancement. Remarkably, without a single human annotation, V-Zero achieves consistent performance gains on Qwen2.5-VL-7B-Instruct, improving visual mathematical reasoning by +1.7 and general vision-centric tasks by +2.6, demonstrating the potential of self-improvement in multimodal systems. Code is available at https://github.com/SatonoDia/V-Zero
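Deriving a pseudo-label by majority voting over sampled responses can be sketched in a few lines. This is an illustration under assumptions, not the V-Zero code: the helper names `majority_vote` and `pseudo_label_rewards`, the exact-match comparison of answers, and the binary 1/0 reward are all hypothetical choices made here; the abstract only states that pseudo-labels come from majority voting.

```python
from collections import Counter


def majority_vote(responses):
    """Return the most frequent answer and its share of the votes.

    Ties are resolved in favour of the answer that appears first.
    (Hypothetical helper; answers are compared by exact equality.)
    """
    if not responses:
        raise ValueError("need at least one sampled response")
    label, votes = Counter(responses).most_common(1)[0]
    return label, votes / len(responses)


def pseudo_label_rewards(responses):
    """Assign reward 1.0 to responses that agree with the majority answer."""
    label, _ = majority_vote(responses)
    return [1.0 if r == label else 0.0 for r in responses]


# Four sampled answers to one question; "B" wins with half of the votes.
samples = ["B", "A", "B", "C"]
label, agreement = majority_vote(samples)
rewards = pseudo_label_rewards(samples)
```

The agreement ratio returned alongside the label could serve as a rough confidence signal, for example to discard questions on which the sampled answers are too scattered; whether V-Zero does this is not stated in the abstract.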
