Source author record

Mingxin Yang

Mingxin Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Graphics, physics.optics

Catalog footprint

What is connected

3 works

3 topics

4 close collaborators

Preprint, 2026, arXiv

Tango3D: Towards Alignment for Global and Local 2D-3D Correspondence

Existing 3D foundation models typically align point clouds to frozen vision-language spaces like CLIP, achieving strong cross-modal retrieval by compressing 3D shape into a global vector. However, this global-only alignment cannot establish fine-grained pixel-to-point correspondence. To solve this, we present Tango3D, a foundation model that unifies dense correspondence and global retrieval. We use a geometry-aware 2D visual backbone and a pretrained 3D VAE to encode images into 2D patches and point clouds into 3D tokens. These are mapped into a single shared space to achieve both local pixel-to-point alignment and global semantic alignment. To stabilize the joint learning of dense and global objectives, we introduce a three-stage progressive training strategy. Experiments show that our model achieves object-level pixel-to-point alignment while maintaining competitive global retrieval, a joint capability not offered by existing 3D foundation models. By establishing a fine-grained alignment feature space, Tango3D injects rich semantics into purely geometric 3D tokens, paving the way for a wide range of dense 3D downstream tasks.
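As a rough illustration of what a single shared space allows, the sketch below matches 2D patch embeddings to 3D tokens locally and compares pooled embeddings globally. It is not the Tango3D implementation: cosine similarity, mean pooling, and every function name here are assumptions made for the example, and the embeddings are toy vectors.

```python
import math

def normalize(v):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def dense_matches(patches, tokens):
    """Local alignment: for each 2D patch embedding, return the index
    of the most similar 3D token (hypothetical nearest-neighbour rule)."""
    return [
        max(range(len(tokens)), key=lambda j: cosine(p, tokens[j]))
        for p in patches
    ]

def mean_pool(vectors):
    """Average a list of vectors into one global vector (an assumption;
    the abstract does not say how global features are formed)."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def global_similarity(patches, tokens):
    """Global alignment: similarity of the pooled image and shape vectors."""
    return cosine(mean_pool(patches), mean_pool(tokens))

# Toy embeddings in a 2-dimensional shared space.
patches = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[0.0, 2.0], [3.0, 0.1]]
matches = dense_matches(patches, tokens)
score = global_similarity(patches, tokens)
```

Because both modalities live in the same space, the same embeddings serve the per-patch lookup and the pooled retrieval score.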

Preprint, 2026, arXiv

Ultralow-noise microwave oscillator via optical frequency division with a co-self-injection-locked miniature Fabry-Perot reference

Optical frequency division (OFD) provides the purest microwaves by down-converting the stability of optical cavity references. State-of-the-art references typically rely on electronic co-Pound-Drever-Hall locking to ultrahigh-Q microresonators, a complex approach that introduces servo bumps and increases footprint. Alternatively, optical co-self-injection-locking (co-SIL) offers inherent simplicity but is limited by the large thermo-refractive noise and confined mode volumes of integrated cavities. Here, we demonstrate a two-point OFD-based microwave oscillator that combines an ultrahigh-Q miniature Fabry-Perot cavity with optical co-SIL. By combining this optical reference, which has low relative phase noise, with an integrated soliton microcomb, the system generates a microwave with phase noise of -147 dBc/Hz at 4 kHz offset (scaled to 10 GHz), performance rivalling most electronically stabilized systems. This work marries the superior noise floor of ultrahigh-Q cavities with the simplicity of optical locking, providing a compact, cost-effective, and field-deployable path to pure microwaves for next-generation communications, radar and metrology.
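The phrase "scaled to 10 GHz" refers to the standard rule that ideal frequency division or multiplication by a factor N changes phase noise by 20 log10(N) dB. The sketch below applies that rule. The function names and the 20 GHz carrier with -141 dBc/Hz are hypothetical values chosen for illustration; the abstract does not state the native carrier frequency or the division factor.

```python
import math

def scale_phase_noise(l_dbc_hz, carrier_hz, target_hz):
    """Rescale single-sideband phase noise from one carrier frequency
    to another, assuming ideal (noiseless) frequency conversion."""
    return l_dbc_hz + 20.0 * math.log10(target_hz / carrier_hz)

def ofd_noise_reduction_db(division_factor):
    """Phase-noise reduction in dB for ideal division by division_factor."""
    return 20.0 * math.log10(division_factor)

# Hypothetical example: a 20 GHz carrier measured at -141 dBc/Hz,
# expressed at the 10 GHz reference carrier used for comparison.
scaled = scale_phase_noise(-141.0, 20e9, 10e9)

# Hypothetical example: dividing a noise source by a factor of 100.
reduction = ofd_noise_reduction_db(100)
```

Halving the carrier frequency lowers the quoted phase noise by about 6 dB, which is why results are normalized to a common carrier before systems are compared.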

Preprint, 2022, arXiv

Self-supervised Re-renderable Facial Albedo Reconstruction from Single Image

Reconstructing high-fidelity 3D facial texture from a single image is a challenging task due to the lack of complete face information and the domain gap between the 3D face and 2D image. Further, obtaining re-renderable 3D faces has become a strongly desired property in many applications, where the term 're-renderable' demands that the facial texture be spatially complete and disentangled from environmental illumination. In this paper, we propose a new self-supervised deep learning framework for reconstructing high-quality and re-renderable facial albedos from single-view images in-the-wild. Our main idea is to first utilize a prior generation module based on the 3DMM proxy model to produce an unwrapped texture and a globally parameterized prior albedo. Then we apply a detail refinement module to synthesize the final texture with both high-frequency details and completeness. To further disentangle facial textures from illumination, we propose a novel detailed illumination representation which is reconstructed jointly with the detailed albedo. We also design several novel regularization losses on both the albedo and illumination maps to facilitate the disentanglement of these two factors. Finally, by leveraging a differentiable renderer, each face attribute can be jointly trained in a self-supervised manner without requiring ground-truth facial reflectance. Extensive comparisons and ablation studies on challenging datasets demonstrate that our framework outperforms state-of-the-art approaches.