Source author record

Xin Wen

Xin Wen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, physics.optics, quant-ph, physics.atom-ph, Information Theory, math.IT, math.OC

Catalog footprint

What is connected

12 works

7 topics

4 close collaborators

Preprint · 2026 · arXiv

Vision Foundation Models as Generalist Tokenizers for Image Generation

In this work, we explore a largely unexplored direction: building a generalist image tokenizer directly on top of a frozen vision foundation model (VFM). To build this tokenizer, we use a frozen VFM as the encoder and introduce two key innovations: (1) a region-adaptive quantization framework that eliminates spatial redundancy in standard 2D grid features, and (2) a semantic reconstruction objective that aligns the decoded outputs with the VFM's representations to preserve semantic fidelity. Building on these designs, we propose VFMTok, a generalist visual tokenizer that operates in both discrete and continuous latent spaces. VFMTok substantially improves synthesis quality while greatly increasing token efficiency. For discrete autoregressive (AR) generation, it accelerates model convergence threefold and achieves a state-of-the-art gFID of 1.36 on ImageNet class-conditional synthesis. For continuous-space generation, integrating VFMTok with a denoising model yields a gFID of 1.25. Furthermore, because the latent space inherently captures rich spatial semantics, VFMTok enables high-fidelity class-conditional synthesis without classifier-free guidance (w/o CFG) in both generative paradigms, which significantly speeds up inference. Beyond these empirical results, we systematically investigate the underlying mechanisms of our approach and find that the self-supervised learning objectives used during VFM pre-training determine its effectiveness as a tokenizer. Specifically, a VFM jointly optimized with global contrastive learning and latent masked image modeling provides the best representations for image tokenization. These insights offer guidance for the design of future image tokenizers.
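
In its discrete mode, a tokenizer of this kind turns encoder features into token indices. The sketch below shows only the generic nearest-codebook vector-quantization step that discrete tokenizers build on; it does not reproduce VFMTok's region-adaptive quantization or its semantic reconstruction objective, and the function names (`quantize`, `squared_distance`) and toy codebook are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Map each feature vector to the index of its nearest codebook entry.

    Returns the token indices (the discrete latent) and the quantized vectors.
    Hypothetical helper: plain VQ, not the region-adaptive scheme in the abstract.
    """
    indices = []
    for f in features:
        best = min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        indices.append(best)
    quantized = [codebook[k] for k in indices]
    return indices, quantized


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]
    tokens, _ = quantize(features, codebook)
    print(tokens)  # [0, 1, 2]
```

In a real tokenizer the features would come from the frozen encoder and the codebook would be learned; here both are toy values chosen only to make the lookup visible.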

Preprint · 2022 · arXiv

3D Shape Reconstruction from 2D Images with Disentangled Attribute Flow

Reconstructing a 3D shape from a single 2D image is a challenging task, as it requires estimating detailed 3D structures from the semantic attributes of the 2D image. Most previous methods still struggle to extract semantic attributes for 3D reconstruction. Since the semantic attributes of a single image are usually implicit and entangled with each other, it remains challenging to reconstruct a 3D shape with the detailed semantic structures represented by the input image. To address this problem, we propose 3DAttriFlow, which disentangles and extracts semantic attributes at different semantic levels of the input image. These disentangled semantic attributes are integrated into the 3D shape reconstruction process, where they provide explicit guidance for reconstructing specific attributes of the 3D shape. As a result, the 3D decoder can explicitly capture high-level semantic features at the bottom of the network and use low-level features at the top of the network, which allows more accurate 3D shapes to be reconstructed. Note that the explicit disentangling is learned without extra labels: the only supervision used in training is the input image and its corresponding 3D shape. Comprehensive experiments on the ShapeNet dataset demonstrate that 3DAttriFlow outperforms state-of-the-art shape reconstruction methods, and we also validate its generalization ability on the shape completion task.

Preprint · 2022 · arXiv

Learning Deep Implicit Functions for 3D Shapes with Dynamic Code Clouds

Deep Implicit Function (DIF) has gained popularity as an efficient 3D shape representation. To capture geometric details, current methods usually learn DIF using local latent codes, which discretize the space into a regular 3D grid (or octree) and store local codes at grid points (or octree nodes). Given a query point, the local feature is computed by interpolating its neighboring local codes according to their positions. However, the local codes are constrained to discrete, regular positions such as grid points, which makes the code positions difficult to optimize and limits their representation ability. To solve this problem, we propose to learn DIF with a Dynamic Code Cloud, named DCC-DIF. Our method explicitly associates local codes with learnable position vectors; the position vectors are continuous and can be dynamically optimized, which improves the representation ability. In addition, we propose a novel code position loss to optimize the code positions, which heuristically guides more local codes to be distributed around complex geometric details. In contrast to previous methods, DCC-DIF represents 3D shapes more efficiently with a small number of local codes and improves reconstruction quality. Experiments demonstrate that DCC-DIF outperforms previous methods. Code and data are available at https://github.com/lity20/DCCDIF.
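
The interpolation step described above can be illustrated with codes stored at arbitrary continuous positions rather than grid vertices. This is a minimal sketch assuming inverse-distance weighting over the k nearest codes; the weighting scheme, the function name `interpolate_feature`, and its parameters are hypothetical, the exact interpolation used by DCC-DIF may differ, and no learning of positions or codes is shown.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def interpolate_feature(query: Point, positions: List[Point], codes: List[List[float]],
                        k: int = 2, eps: float = 1e-8) -> List[float]:
    """Inverse-distance-weighted blend of the k codes closest to `query`.

    Assumption: inverse-distance weights. Positions are free continuous
    coordinates, which is what would let them be optimized with the codes.
    """
    dists = [math.dist(query, p) for p in positions]
    nearest = sorted(range(len(positions)), key=lambda i: dists[i])[:k]
    weights = [1.0 / (dists[i] + eps) for i in nearest]  # eps avoids division by zero
    total = sum(weights)
    dim = len(codes[0])
    return [sum(w * codes[i][d] for w, i in zip(weights, nearest)) / total for d in range(dim)]


if __name__ == "__main__":
    positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
    codes = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
    f = interpolate_feature([0.25, 0.0, 0.0], positions, codes, k=2)
    print([round(x, 3) for x in f])  # [0.75, 0.25]
```

The distant third code does not contribute because only the two nearest codes are blended, which mirrors the "neighboring local codes" idea in the abstract.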

Preprint · 2022 · arXiv

PMP-Net++: Point Cloud Completion by Transformer-Enhanced Multi-step Point Moving Paths

Point cloud completion aims to predict the missing parts of incomplete 3D shapes. A common strategy is to generate the complete shape from the incomplete input. However, the unordered nature of point clouds degrades the generation of high-quality 3D shapes, as the detailed topology and structure of unordered points are hard to capture during a generative process based on an extracted latent code. We address this problem by formulating completion as a point cloud deformation process. Specifically, we design a novel neural network, named PMP-Net++, to mimic the behavior of an earth mover. It moves each point of the incomplete input to obtain a complete point cloud, such that the total distance of the point moving paths (PMPs) is the shortest. PMP-Net++ therefore predicts a unique PMP for each point under a constraint on point moving distances. The network learns a strict and unique point-level correspondence, which improves the quality of the predicted complete shape. Moreover, since moving points relies heavily on the per-point features learned by the network, we further introduce a transformer-enhanced representation learning network, which significantly improves the completion performance of PMP-Net++. We conduct comprehensive experiments on shape completion and further explore an application to point cloud up-sampling; both demonstrate non-trivial improvements of PMP-Net++ over state-of-the-art point cloud completion and up-sampling methods.
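
The "earth mover" objective above (a one-to-one assignment of points that minimizes total moving distance) can be evaluated exactly on toy inputs. The sketch below is only that objective, solved by brute force over permutations; it is not the PMP-Net++ network, which learns the paths, and the helper name `shortest_moving_paths` is hypothetical.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def shortest_moving_paths(source: List[Point], target: List[Point]) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Find the assignment source[i] -> target[perm[i]] with the smallest total
    Euclidean moving distance (the earth mover's objective).

    Brute force over all permutations, so only usable for a handful of points.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same number of points")
    best_perm, best_cost = None, math.inf
    for perm in itertools.permutations(range(len(target))):
        cost = sum(math.dist(source[i], target[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


if __name__ == "__main__":
    print(shortest_moving_paths([[0, 0], [10, 0]], [[10, 1], [0, 1]]))  # ((1, 0), 2.0)
```

Brute force grows factorially with the number of points; for realistic sizes an assignment solver such as SciPy's `linear_sum_assignment` (or an approximation) would be used instead.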

Preprint · 2021 · arXiv

Enhancement of spin noise spectroscopy of rubidium atomic ensemble by using of the polarization squeezed light

We measured the spin noise spectroscopy (SNS) of a rubidium atomic ensemble with two different atomic vapor cells (one filled with buffer gases, the other coated with paraffin film on the inner wall), and demonstrated an enhancement of the signal-to-noise ratio (SNR) by using a polarization squeezed state (PSS) of 795 nm light with the Stokes operator S2 squeezed. The PSS is prepared by locking, via quantum noise locking, the relative phase between a squeezed vacuum state of light obtained from a sub-threshold optical parametric oscillator and an orthogonally polarized local oscillator beam. Under the same conditions and compared with a polarization coherent state (PCS), the PSS not only improves the SNR but also keeps the full width at half maximum (FWHM) of the SNS unchanged, and the SNR enhancement is positively correlated with the squeezing level of the PSS. As the probe laser power and atomic number density increase, the SNR and FWHM of the SNS increase correspondingly. Using the S2 polarization squeezed light and controlling either its optical power or the atomic number density, we experimentally demonstrated quantum enhancement of both the SNR and the FWHM of the SNS signal.

Preprint · 2020 · arXiv

Distilling Visual Priors from Self-Supervised Learning

Convolutional Neural Networks (CNNs) are prone to overfitting small training datasets. We present a novel two-phase pipeline that leverages self-supervised learning and knowledge distillation to improve the generalization ability of CNN models for image classification in the data-deficient setting. The first phase learns a teacher model with rich and generalizable visual representations via self-supervised learning; the second phase distills these representations into a student model in a self-distillation manner while fine-tuning the student model for the image classification task. We also propose a novel margin loss for the self-supervised contrastive learning proxy task to better learn representations in the data-deficient scenario. Together with other tricks, we achieve competitive performance in the VIPriors image classification challenge.
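
The abstract does not give the form of its margin loss. As an illustration only, the sketch below assumes a common variant: an InfoNCE-style contrastive loss with an additive margin subtracted from the positive-pair similarity. The function name `margin_info_nce`, the default `margin` and `temperature` values, and the additive form itself are assumptions, not the authors' formulation.

```python
import math
from typing import List


def margin_info_nce(pos_sim: float, neg_sims: List[float],
                    margin: float = 0.4, temperature: float = 0.2) -> float:
    """InfoNCE loss with an additive margin on the positive similarity (assumed form).

    Subtracting `margin` makes the positive pair "harder": the loss only becomes
    small once the positive similarity beats the negatives by more than the margin.
    """
    logits = [(pos_sim - margin) / temperature] + [s / temperature for s in neg_sims]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]  # equals -log(softmax(logits)[0])


if __name__ == "__main__":
    for mg in (0.0, 0.4):
        print(mg, round(margin_info_nce(0.9, [0.1, 0.2], margin=mg), 4))
```

With `margin=0` this reduces to the standard InfoNCE loss; a positive margin raises the loss for the same similarities, which pushes the encoder toward larger separation between positive and negative pairs.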

Preprint · 2020 · arXiv

Laser Intensity Noise Suppression for Preparing Audio-Frequency 795 nm Squeezed Vacuum State of Light at Rubidium D1 Line

Laser intensity noise suppression strongly affects the preparation and characterization of audio-frequency squeezed vacuum states of light based on a sub-threshold optical parametric oscillator (OPO). We implemented two feedback loops using acousto-optical modulators (AOMs) to stabilize the intensity of the 795 nm near-infrared (NIR) fundamental laser and of the 397.5 nm ultraviolet (UV) laser generated by cavity-enhanced frequency doubling. Over half an hour, the typical peak-to-peak intensity fluctuation within a bandwidth of ~10 kHz was reduced from ±7.45% to ±0.06% for the 795 nm NIR beam and from ±9.04% to ±0.05% for the 397.5 nm UV beam. The squeezing level of the 795 nm squeezed vacuum state prepared by the sub-threshold OPO with a PPKTP crystal improved from -3.3 dB to -4.0 dB over the 3 to 9 kHz audio analysis frequency range.
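
To put the quoted figures in linear terms: a squeezing level in dB is conventionally 10·log10 of the measured noise power relative to the shot-noise (vacuum) level, so it converts back as 10^(dB/10). The short calculation below applies that conversion and also computes how many times smaller the intensity fluctuations became; the helper names are hypothetical and the inputs are the numbers quoted in the abstract.

```python
def db_to_noise_ratio(level_db: float) -> float:
    """Convert a squeezing level in dB (relative to shot noise) to a linear noise-power ratio."""
    return 10 ** (level_db / 10)


def improvement_factor(before_pct: float, after_pct: float) -> float:
    """How many times smaller the peak-to-peak intensity fluctuation became."""
    return before_pct / after_pct


if __name__ == "__main__":
    for level in (-3.3, -4.0):
        print(f"{level} dB -> {db_to_noise_ratio(level):.3f} of shot noise")
    print(f"795 nm: {improvement_factor(7.45, 0.06):.0f}x, 397.5 nm: {improvement_factor(9.04, 0.05):.0f}x")
```

So the squeezing improvement from -3.3 dB to -4.0 dB corresponds to the noise power dropping from about 0.468 to about 0.398 of the shot-noise level, and the intensity stabilization reduced the fluctuations by roughly 124x (795 nm) and 181x (397.5 nm).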

Preprint · 2020 · arXiv

Point Cloud Completion by Skip-attention Network with Hierarchical Folding

Point cloud completion aims to infer the complete geometry of missing regions of 3D objects from incomplete point clouds. Previous methods usually predict the complete point cloud from a global shape representation extracted from the incomplete input. However, this global representation often loses structural details of local regions in the incomplete point cloud. To address this problem, we propose the Skip-Attention Network (SA-Net) for 3D point cloud completion. Our main contributions are twofold. First, we propose a skip-attention mechanism that effectively exploits the local structural details of incomplete point clouds during the inference of missing parts. The skip-attention mechanism selectively conveys geometric information from local regions of the incomplete point cloud to the generation of the complete one at different resolutions, and it reveals the completion process in an interpretable way. Second, to fully utilize the geometric information selected by the skip-attention mechanism at different resolutions, we propose a novel structure-preserving decoder with hierarchical folding for complete shape generation. Hierarchical folding preserves the structure of the complete point cloud generated in the upper layer by progressively detailing its local regions, using the skip-attended geometry at the same resolution. Comprehensive experiments on the ShapeNet and KITTI datasets demonstrate that SA-Net outperforms state-of-the-art point cloud completion methods.

Preprint · 2016 · arXiv

Comparison and characterization of efficient frequency doubling at 397.5 nm with PPKTP, LBO and BiBO crystals

A continuous-wave Ti:sapphire laser at 795 nm is frequency doubled in a bow-tie four-mirror enhancement ring cavity with LiB3O5 (LBO), BiB3O6 (BiBO), and periodically poled KTiOPO4 (PPKTP) crystals, respectively. The 397.5 nm ultraviolet (UV) output power, beam quality, and stability obtained with these different nonlinear crystals are investigated and compared. For the PPKTP crystal, the highest doubling efficiency of 58.1% is achieved, converting 191 mW of mode-matched 795 nm fundamental power to 111 mW of 397.5 nm UV output. For the LBO crystal, 1.34 W of mode-matched 795 nm power yields 770 mW of 397.5 nm UV output, implying a doubling efficiency of 57.4%. For the BiBO crystal, 323 mW of mode-matched 795 nm power yields 116 mW of 397.5 nm UV output, corresponding to a doubling efficiency of 35.9%. The generated UV radiation has potential applications in the fields of quantum physics.
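
The quoted doubling efficiencies are simply the UV output power divided by the mode-matched fundamental power. The check below recomputes them from the figures in the abstract; the helper name `doubling_efficiency` is hypothetical.

```python
def doubling_efficiency(p_uv_mw: float, p_fund_mw: float) -> float:
    """Second-harmonic conversion efficiency in percent: UV output / mode-matched fundamental."""
    return 100.0 * p_uv_mw / p_fund_mw


# Powers quoted in the abstract, in mW: (397.5 nm UV output, mode-matched 795 nm fundamental).
CASES = {"PPKTP": (111, 191), "LBO": (770, 1340), "BiBO": (116, 323)}

if __name__ == "__main__":
    print(", ".join(f"{name}: {doubling_efficiency(uv, fund):.1f} %" for name, (uv, fund) in CASES.items()))
    # PPKTP: 58.1 %, LBO: 57.5 %, BiBO: 35.9 %
```

The PPKTP and BiBO values match the abstract. For LBO, the rounded inputs give about 57.46%, slightly above the quoted 57.4%; the small gap is presumably because the fundamental power is stated only to three significant figures (1.34 W).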

Preprint · 2016 · arXiv

Improved Sufficient Conditions for Exact Convex Relaxation of Storage-Concerned ED

To avoid simultaneous charging and discharging of storage units, complementarity constraints are introduced into storage-concerned economic dispatch (ED), which makes the problem non-convex. This letter studies the conditions under which the convex relaxation of storage-concerned ED with complementarity constraints is exact. Two new sufficient conditions are proposed, proved, and verified to significantly reduce the conservatism of recent results [3], [4].
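
The complementarity constraint requires that, at every time step, a storage unit's charging and discharging powers are not both positive (charge[t] · discharge[t] = 0 with both nonnegative). The convex relaxation drops this constraint, and it is exact whenever the relaxed optimum happens to satisfy it anyway. The sketch below is only an after-the-fact check of a solved schedule; it does not implement the letter's sufficient conditions, which guarantee exactness in advance. The function name `violates_complementarity` and the tolerance are hypothetical.

```python
from typing import List


def violates_complementarity(charge: List[float], discharge: List[float], tol: float = 1e-6) -> List[int]:
    """Return the time steps at which a storage unit both charges and discharges.

    An empty result means the relaxed schedule is feasible for the original
    non-convex ED, so (being optimal for the relaxation) it is optimal there too.
    """
    return [t for t, (c, d) in enumerate(zip(charge, discharge)) if c > tol and d > tol]


if __name__ == "__main__":
    print(violates_complementarity([5.0, 0.0, 2.0], [0.0, 3.0, 1.0]))  # [2]
```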

Preprint · 2016 · arXiv

Improvement of vacuum squeezing resonant on the rubidium D1 line at 795 nm

We report the efficient generation of second-harmonic light and of 795 nm single-mode squeezed vacuum light with periodically poled KTiOPO4 (PPKTP) crystals. We obtained 111 mW of ultraviolet (UV) light at 397.5 nm from 191 mW of fundamental light with a PPKTP crystal in a doubling cavity, corresponding to a conversion efficiency of 58.1%. Using the UV light to pump an optical parametric oscillator with a PPKTP crystal, we achieved a maximum squeezing of -5.6 dB. We analyzed the pump power dependence of the squeezing level and concluded that UV-light-induced losses limit further improvement of the squeezing level. The generated squeezed light has great potential for applications in quantum memory and ultra-precise measurement.

Preprint · 2015 · arXiv

General Rank Multiuser Downlink Beamforming With Shaping Constraints Using Real-valued OSTBC

In this paper we consider optimal multiuser downlink beamforming in the presence of a massive number of arbitrary quadratic shaping constraints. We combine beamforming with full-rate high dimensional real-valued orthogonal space time block coding (OSTBC) to increase the number of beamforming weight vectors and associated degrees of freedom in the beamformer design. The original multi-constraint beamforming problem is converted into a convex optimization problem using semidefinite relaxation (SDR) which can be solved efficiently. In contrast to conventional (rank-one) beamforming approaches in which an optimal beamforming solution can be obtained only when the SDR solution (after rank reduction) exhibits the rank-one property, in our approach optimality is guaranteed when a rank of eight is not exceeded. We show that our approach can incorporate up to 79 additional shaping constraints for which an optimal beamforming solution is guaranteed as compared to a maximum of two additional constraints that bound the conventional rank-one downlink beamforming designs. Simulation results demonstrate the flexibility of our proposed beamformer design.