Source author record

Cheng Sun

Cheng Sun appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: Computer Vision, physics.optics, math.CO, Biological Physics, Graphics, Subcellular Processes

Catalog footprint

What is connected

14 works

6 topics

4 close collaborators

preprint, 2026, arXiv

OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding

We propose OpenVoxel, a training-free algorithm for grouping and captioning sparse voxels for open-vocabulary 3D scene understanding tasks. Given the sparse voxel rasterization (SVR) model obtained from multi-view images of a 3D scene, OpenVoxel produces meaningful groups that describe the different objects in the scene. By leveraging powerful Vision Language Models (VLMs) and Multi-modal Large Language Models (MLLMs), OpenVoxel also builds an informative scene map by captioning each group, enabling further 3D scene understanding tasks such as open-vocabulary segmentation (OVS) and referring expression segmentation (RES). Unlike previous methods, our method is training-free and does not introduce embeddings from a CLIP/BERT text encoder. Instead, we directly perform text-to-text search using MLLMs. Through extensive experiments, our method demonstrates superior performance compared to recent studies, particularly in complex referring expression segmentation (RES) tasks. The code will be made open source.

preprint, 2022, arXiv

A bijection between the sets of $(a,b,b^2)$-Generalized Motzkin paths avoiding $\mathbf{uvv}$-patterns and $\mathbf{uvu}$-patterns

A generalized Motzkin path, called G-Motzkin path for short, of length $n$ is a lattice path from $(0, 0)$ to $(n, 0)$ in the first quadrant of the XOY-plane that consists of up steps $\mathbf{u}=(1, 1)$, down steps $\mathbf{d}=(1, -1)$, horizontal steps $\mathbf{h}=(1, 0)$ and vertical steps $\mathbf{v}=(0, -1)$. An $(a,b,c)$-G-Motzkin path is a weighted G-Motzkin path such that the $\mathbf{u}$-steps, $\mathbf{h}$-steps, $\mathbf{v}$-steps and $\mathbf{d}$-steps are weighted respectively by $1, a, b$ and $c$. Let $τ$ be a word on $\{\mathbf{u}, \mathbf{h}, \mathbf{v}, \mathbf{d}\}$, and denote by $\mathcal{G}_n^τ(a,b,c)$ the set of $τ$-avoiding $(a,b,c)$-G-Motzkin paths of length $n$. In this paper, we consider the $\mathbf{uvv}$-avoiding $(a,b,c)$-G-Motzkin paths and provide a direct bijection $σ$ between $\mathcal{G}_n^{\mathbf{uvv}}(a,b,b^2)$ and $\mathcal{G}_n^{\mathbf{uvu}}(a,b,b^2)$. Finally, the set of fixed points of $σ$ is also described and counted.
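A bijection between the two sets implies that their total weights agree, which can be checked by brute force for small $n$. The sketch below is not the bijection $σ$ itself. It enumerates G-Motzkin paths from the step definitions in the abstract and compares weighted totals, assuming that "avoiding" means the pattern never occurs as consecutive steps. The function names are hypothetical.

```python
from math import prod

# Step vectors (dx, dy) as defined in the abstract.
STEPS = {"u": (1, 1), "d": (1, -1), "h": (1, 0), "v": (0, -1)}


def g_motzkin_paths(n):
    """Yield every G-Motzkin path of length n as a string over 'udhv'."""
    def extend(path, x, y):
        if x == n and y == 0:
            yield path
        for step, (dx, dy) in STEPS.items():
            nx, ny = x + dx, y + dy
            # Stay in the first quadrant and never move past x = n.
            if nx <= n and ny >= 0:
                yield from extend(path + step, nx, ny)
    yield from extend("", 0, 0)


def weighted_count(n, pattern, a, b, c):
    """Total weight of length-n paths with no consecutive occurrence of pattern."""
    weight = {"u": 1, "h": a, "v": b, "d": c}
    return sum(
        prod(weight[s] for s in path)
        for path in g_motzkin_paths(n)
        if pattern not in path
    )


def counts_agree(n, a, b):
    """Compare the uvv-avoiding and uvu-avoiding totals when c = b**2."""
    c = b * b
    return weighted_count(n, "uvv", a, b, c) == weighted_count(n, "uvu", a, b, c)
```

Calling `counts_agree(n, a, b)` for small integer weights gives a quick sanity check of the equal-cardinality claim. With a $\mathbf{d}$-weight other than $b^2$, for example `a=2, b=3, c=1` at `n=3`, the two totals differ, which is consistent with the specific weight $b^2$ in the title.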

preprint, 2022, arXiv

Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction

We present a super-fast convergence approach to reconstructing the per-scene radiance field from a set of images that capture the scene with known poses. This task, which is often applied to novel view synthesis, has recently been revolutionized by Neural Radiance Field (NeRF) for its state-of-the-art quality and flexibility. However, NeRF and its variants require a lengthy training time ranging from hours to days for a single scene. In contrast, our approach achieves NeRF-comparable quality and converges rapidly from scratch in less than 15 minutes on a single GPU. We adopt a representation consisting of a density voxel grid for scene geometry and a feature voxel grid with a shallow network for complex view-dependent appearance. Modeling with explicit and discretized volume representations is not new, but we propose two simple yet non-trivial techniques that contribute to fast convergence speed and high-quality output. First, we introduce post-activation interpolation on voxel density, which is capable of producing sharp surfaces at lower grid resolution. Second, direct voxel density optimization is prone to suboptimal geometry solutions, so we robustify the optimization process by imposing several priors. Finally, evaluation on five inward-facing benchmarks shows that our method matches, if not surpasses, NeRF's quality, yet it takes only about 15 minutes to train from scratch for a new scene.
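The effect of post-activation interpolation can be illustrated in one dimension. The sketch below assumes a softplus density activation followed by the usual alpha formula `1 - exp(-density * step)`. The abstract does not state which activation is used, so treat both as assumptions, and all function names as hypothetical. Interpolating the raw grid values first and activating afterwards lets a single grid cell contain a sharp transition, whereas activating first and interpolating afterwards can only produce a linear ramp across the cell.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def alpha(raw_density, step=1.0):
    # Assumed activation: softplus, then opacity over a ray segment of length step.
    return 1.0 - math.exp(-softplus(raw_density) * step)


def lerp(v0, v1, t):
    # Linear interpolation between two neighbouring grid values, t in [0, 1].
    return (1.0 - t) * v0 + t * v1


def pre_activation(v0, v1, t):
    # Activate at the grid points, then interpolate the opacities.
    return lerp(alpha(v0), alpha(v1), t)


def post_activation(v0, v1, t):
    # Interpolate the raw grid values, then activate the result.
    return alpha(lerp(v0, v1, t))


if __name__ == "__main__":
    # One cell whose two corners hold raw values -10 (empty) and +10 (solid).
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(t, round(pre_activation(-10, 10, t), 4), round(post_activation(-10, 10, t), 4))
```

In this toy cell the pre-activation opacity rises linearly with `t`, while the post-activation opacity stays near zero in the first quarter of the cell and is close to one by the last quarter.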

preprint, 2022, arXiv

Improved Direct Voxel Grid Optimization for Radiance Fields Reconstruction

In this technical report, we improve the DVGO framework (called DVGOv2), which is based on PyTorch and uses the simplest dense grid representation. First, we re-implement part of the PyTorch operations in CUDA, achieving a 2-3x speedup. The CUDA extension is automatically compiled just in time. Second, we extend DVGO to support forward-facing and unbounded inward-facing captures. Third, we improve the space and time complexity of the distortion loss proposed by mip-NeRF 360 from O(N^2) to O(N). The distortion loss improves our quality and training speed. Our efficient implementation could allow more future work to benefit from the loss.
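One way to reach O(N) for a loss of this kind is a prefix-sum rewrite. The sketch below assumes the distortion loss takes the form sum over i, j of `w[i] * w[j] * |m[i] - m[j]|` plus one third of the sum of `w[i]**2 * d[i]`, where `w` are per-sample weights along a ray, `m` are interval midpoints sorted in increasing order, and `d` are interval lengths. The abstract gives neither the formula nor the implementation, so the formula, the names, and the prefix-sum approach are all assumptions for illustration.

```python
def distortion_loss_quadratic(w, m, d):
    """Reference O(N^2) evaluation over all sample pairs."""
    n = len(w)
    pair = sum(w[i] * w[j] * abs(m[i] - m[j]) for i in range(n) for j in range(n))
    self_term = sum(wi * wi * di for wi, di in zip(w, d)) / 3.0
    return pair + self_term


def distortion_loss_linear(w, m, d):
    """O(N) evaluation using running sums; requires m sorted ascending."""
    pair = 0.0
    w_prefix = 0.0   # sum of w[j] for j < i
    wm_prefix = 0.0  # sum of w[j] * m[j] for j < i
    for wi, mi in zip(w, m):
        # For sorted m, |m[i] - m[j]| = m[i] - m[j] when j < i;
        # the factor 2 accounts for the symmetric pair (j, i).
        pair += 2.0 * wi * (mi * w_prefix - wm_prefix)
        w_prefix += wi
        wm_prefix += wi * mi
    self_term = sum(wi * wi * di for wi, di in zip(w, d)) / 3.0
    return pair + self_term
```

Both functions return the same value for sorted midpoints. The linear version keeps only two running scalars per ray instead of materializing an N by N matrix of pairwise distances.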

preprint, 2022, arXiv

Multiview Regenerative Morphing with Dual Flows

This paper aims to address a new task of image morphing under a multiview setting, which takes two sets of multiview images as the input and generates intermediate renderings that not only exhibit smooth transitions between the two input sets but also ensure visual consistency across different views at any transition state. To achieve this goal, we propose a novel approach called Multiview Regenerative Morphing that formulates the morphing process as an optimization to solve for rigid transformation and optimal-transport interpolation. Given the multiview input images of the source and target scenes, we first learn a volumetric representation that models the geometry and appearance for each scene to enable the rendering of novel views. Then, the morphing between the two scenes is obtained by solving optimal transport between the two volumetric representations in Wasserstein metrics. Our approach does not rely on user-specified correspondences or 2D/3D input meshes, and we do not assume any predefined categories of the source and target scenes. The proposed view-consistent interpolation scheme directly works on multiview images to yield a novel and visually plausible effect of multiview free-form morphing.
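Optimal-transport interpolation is easiest to see in one dimension. The sketch below is a toy illustration only, not the paper's method, which works on volumetric scene representations and also solves for a rigid transformation. It assumes two point sets of equal size with uniform mass on the real line, where the optimal matching for convex costs pairs points in sorted order. The function names are hypothetical.

```python
def displacement_interpolation(source, target, t):
    """Move each source point a fraction t of the way to its matched target point.

    In 1D with equal-size, equal-mass point sets, matching points in sorted
    order is an optimal transport plan for convex costs.
    """
    if len(source) != len(target):
        raise ValueError("point sets must have the same size")
    xs, ys = sorted(source), sorted(target)
    return [(1.0 - t) * x + t * y for x, y in zip(xs, ys)]


def wasserstein_1(source, target):
    """1-Wasserstein distance between the two uniform empirical measures."""
    xs, ys = sorted(source), sorted(target)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)
```

At `t = 0` the result is the sorted source set and at `t = 1` the sorted target set. Intermediate values of `t` move mass along the transport plan rather than cross-fading the two sets, which is the property that makes transport-based interpolation attractive for morphing.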

preprint, 2022, arXiv

Self-supervised 360$^{\circ}$ Room Layout Estimation

We present the first self-supervised method to train panoramic room layout estimation models without any labeled data. Unlike per-pixel dense depth that provides abundant correspondence constraints, layout representation is sparse and topological, hindering the use of self-supervised reprojection consistency on images. To address this issue, we propose Differentiable Layout View Rendering, which can warp a source image to the target camera pose given the estimated layout from the target image. As each rendered pixel is differentiable with respect to the estimated layout, we can now train the layout estimation model by minimizing reprojection loss. Besides, we introduce regularization losses to encourage Manhattan alignment, ceiling-floor alignment, cycle consistency, and layout stretch consistency, which further improve our predictions. Finally, we present the first self-supervised results on the ZillowIndoor and MatterportLayout datasets. Our approach also shows promising solutions in data-scarce scenarios and active learning, which would have immediate value in real estate virtual tour software. Code is available at https://github.com/joshua049/Stereo-360-Layout.

preprint, 2022, arXiv

SVMAC: Unsupervised 3D Human Pose Estimation from a Single Image with Single-view-multi-angle Consistency

Recovering 3D human pose from 2D joints is still a challenging problem, especially without any 3D annotation, video information, or multi-view information. In this paper, we present an unsupervised GAN-based model consisting of multiple weight-sharing generators to estimate a 3D human pose from a single image without 3D annotations. In our model, we introduce single-view-multi-angle consistency (SVMAC) to significantly improve the estimation performance. With 2D joint locations as input, our model estimates a 3D pose and a camera simultaneously. During training, the estimated 3D pose is rotated by random angles and the estimated camera projects the rotated 3D poses back to 2D. The 2D reprojections will be fed into weight-sharing generators to estimate the corresponding 3D poses and cameras, which are then mixed to impose SVMAC constraints to self-supervise the training process. The experimental results show that our method outperforms the state-of-the-art unsupervised methods on Human 3.6M and MPI-INF-3DHP. Moreover, qualitative results on MPII and LSP show that our method can generalize well to unknown data.
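The rotate-then-reproject step described above can be sketched without any learning components. The code below assumes rotation about the vertical (y) axis and a weak-perspective camera reduced to a single scale factor. The abstract specifies neither, so both choices and all function names are hypothetical, and the generators and the SVMAC constraint itself are not shown.

```python
import math
import random


def rotate_about_y(pose, angle):
    """Rotate a list of (x, y, z) joints about the vertical axis by angle radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in pose]


def project_weak_perspective(pose, scale=1.0):
    """Drop depth and apply a uniform scale (assumed camera model)."""
    return [(scale * x, scale * y) for x, y, _ in pose]


def random_reprojections(pose, num_views, rng, scale=1.0):
    """Return (angle, 2D joints) pairs for randomly rotated copies of the pose."""
    views = []
    for _ in range(num_views):
        angle = rng.uniform(-math.pi, math.pi)
        rotated = rotate_about_y(pose, angle)
        views.append((angle, project_weak_perspective(rotated, scale)))
    return views


if __name__ == "__main__":
    estimated_pose = [(0.0, 0.0, 0.0), (0.2, 0.5, 0.1), (-0.2, 0.5, -0.1)]
    for angle, joints_2d in random_reprojections(estimated_pose, 3, random.Random(0)):
        print(round(angle, 3), joints_2d)
```

Each returned 2D joint set would play the role of a new input view of the same pose. Since the rotation is known, 3D estimates recovered from these views can be compared after undoing it, which is the kind of consistency the abstract describes.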

preprint, 2022, arXiv

The $\mathbf{uvu}$-avoiding $(a,b,c)$-Generalized Motzkin paths with vertical steps: bijections and statistic enumerations

A generalized Motzkin path, called G-Motzkin path for short, of length $n$ is a lattice path from $(0, 0)$ to $(n, 0)$ in the first quadrant of the XOY-plane that consists of up steps $\mathbf{u}=(1, 1)$, down steps $\mathbf{d}=(1, -1)$, horizontal steps $\mathbf{h}=(1, 0)$ and vertical steps $\mathbf{v}=(0, -1)$. An $(a,b,c)$-G-Motzkin path is a weighted G-Motzkin path such that the $\mathbf{u}$-steps, $\mathbf{h}$-steps, $\mathbf{v}$-steps and $\mathbf{d}$-steps are weighted respectively by $1, a, b$ and $c$. In this paper, we first give bijections between the set of $\mathbf{uvu}$-avoiding $(a,b,b^2)$-G-Motzkin paths of length $n$ and the set of $(a,b)$-Schröder paths as well as the set of $(a+b,b)$-Dyck paths of length $2n$, between the set of $\{\mathbf{uvu, uu}\}$-avoiding $(a,b,b^2)$-G-Motzkin paths of length $n$ and the set of $(a+b,ab)$-Motzkin paths of length $n$, between the set of $\{\mathbf{uvu,uu}\}$-avoiding $(a,b,b^2)$-G-Motzkin paths of length $n+1$ beginning with an $\mathbf{h}$-step weighted by $a$ and the set of $(a,b)$-Dyck paths of length $2n+2$. In the last section, we focus on the enumeration of statistics "number of $\mathbf{z}$-steps" for $\mathbf{z}\in \{\mathbf{u}, \mathbf{h}, \mathbf{v}, \mathbf{d}\}$ and "number of points" at given level in $\mathbf{uvu}$-avoiding G-Motzkin paths. These counting results are linked with Riordan arrays.
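The link to Schröder paths can be spot-checked numerically. The sketch below counts $\mathbf{uvu}$-avoiding $(a,b,b^2)$-G-Motzkin paths with a memoized recursion over the state (position, height, last two steps) and, for $a=b=1$, compares the result with the large Schröder numbers computed from their standard recurrence. It assumes that avoidance refers to consecutive steps and that the $(1,1)$-weighted Schröder paths are the ordinary ones. The function names are hypothetical.

```python
from functools import lru_cache

# Step vectors (dx, dy) as defined in the abstract.
MOVES = {"u": (1, 1), "d": (1, -1), "h": (1, 0), "v": (0, -1)}


def count_uvu_avoiding(n, a=1, b=1):
    """Total weight of uvu-avoiding (a, b, b^2)-G-Motzkin paths of length n."""
    weight = {"u": 1, "h": a, "v": b, "d": b * b}

    @lru_cache(maxsize=None)
    def completions(x, y, last_two):
        # Weighted number of ways to finish a path from (x, y).
        total = 1 if (x == n and y == 0) else 0
        for step, (dx, dy) in MOVES.items():
            if last_two == "uv" and step == "u":
                continue  # would create the consecutive pattern uvu
            nx, ny = x + dx, y + dy
            if nx <= n and ny >= 0:
                total += weight[step] * completions(nx, ny, (last_two + step)[-2:])
        return total

    return completions(0, 0, "")


def large_schroder(n):
    """Large Schröder numbers via S_k = S_{k-1} + sum_i S_i * S_{k-1-i}."""
    s = [1]
    for k in range(1, n + 1):
        s.append(s[k - 1] + sum(s[i] * s[k - 1 - i] for i in range(k)))
    return s[n]
```

Comparing `count_uvu_avoiding(n)` with `large_schroder(n)` for small `n` gives a quick consistency check on the first bijection in the abstract. Passing other integer values for `a` and `b` yields the weighted totals.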

preprint, 2015, arXiv

Ultrafast All-optical Modulation Exploiting the Vibrational Dynamic of Metallic Meta-atoms

Optical control over elementary molecular vibration establishes fundamental capabilities for exploiting the broad range of optical linear and nonlinear phenomena. However, experimental demonstration of coherently driven molecular vibration remains a challenging task due to the weak optical force imposed on natural materials. Here we report the design of a "meta-atom" that exhibits giant artificial optical nonlinearity. These "meta-atoms" support co-localized magnetic resonance at optical frequency and vibration resonance at GHz frequency with a deep-sub-diffraction-limit spatial confinement ($λ^2/100$). The coherent coupling of those two distinct resonances manifests a strong optical force, which is fundamentally different from the commonly studied radiation forces, gradient forces, or photo-thermally induced deformation. It results in a giant third-order susceptibility $χ^{(3)}$ of $10^{-13}$ $\mathrm{m}^2/\mathrm{V}^2$, which is more than six orders of magnitude larger than that found in natural materials. All-optical modulation at frequencies well above 1 GHz has thus been demonstrated experimentally.

preprint, 2011, arXiv

Construction of Chiral Metamaterial with a Helix Array

Here we report the design of a chiral metamaterial with a metallic helix array. The effective electric and magnetic dipoles, which originate from the surface electric current induced by the incident light, are collinear at the resonant frequency. Consequently, a negative refractive index is realized for circularly polarized incident light. Our design provides a unique approach to tuning the optical properties by assembling helices, and demonstrates a different approach to exploring three-dimensional chiral metamaterials.

preprint, 2011, arXiv

Direct measurement of the correlated dynamics of the protein-backbone and proximal waters of hydration in mechanically strained elastin

We report on the direct measurement of the correlation times of the protein backbone carbons and proximal waters of hydration in mechanically strained elastin by nuclear magnetic resonance methods. The experimental data indicate a decrease in the correlation times of the carbonyl carbons as the strain on the biopolymer is increased. These observations are in good agreement with short 4 ns molecular dynamics simulations of (VPGVG)$_3$, a well-studied mimetic peptide of elastin. The experimental results also indicate a reduction in the correlation time of proximal waters of hydration with increasing strain applied to the elastomer. A simple model is suggested that correlates the increase in the motion of proximal waters of hydration with the increase in the libration frequency of the protein backbone that develops with increasing strain. Together, the reduction in the protein entropy and the accompanying increase in the entropy of the proximal waters of hydration with increasing strain support the notion that the source of elasticity is driven by an entropic mechanism arising from the change in entropy of the protein backbone.

preprint, 2011, arXiv

Hiding a Realistic Object Using a Broadband Terahertz Invisibility Cloak

The invisibility cloak has been a long-standing dream for many researchers over the decades. The introduction of transformational optics has revitalized this field by providing a general method to design material distributions to hide the subject from detection. By transforming space and light propagation, a three-dimensional (3D) object is perceived as having a reduced number of dimensions, in the form of points, lines, and thin sheets, making it "undetectable" judging from the scattered field. Although a variety of cloaking devices have been reported at microwave and optical frequencies, the spectroscopically important Terahertz (THz) domain remains unexplored. Moreover, due to the difficulties in fabricating cloaking devices that are optically large in all three dimensions, hiding realistic 3D objects has yet to be demonstrated. Here, we report the first experimental demonstration of a 3D THz cloaking device fabricated using a scalable Projection Microstereolithography process. The cloak operates over a broad frequency range between 0.3 and 0.6 THz, and is placed over an α-lactose monohydrate absorber with a rectangular shape. Characterized using angular-resolved reflection THz time-domain spectroscopy (THz-TDS), the results indicate that the THz invisibility cloak has successfully concealed both the geometrical and spectroscopic signatures of the absorber, making it undetectable to the observer.
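To put the operating band in terms of length scales, the free-space wavelength follows from the relation wavelength = c / f. The short sketch below converts the two band edges quoted above. The function name is hypothetical and the calculation ignores the refractive index of the cloak material.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def free_space_wavelength_mm(frequency_thz):
    """Free-space wavelength in millimetres for a frequency given in THz."""
    frequency_hz = frequency_thz * 1e12
    return SPEED_OF_LIGHT_M_PER_S / frequency_hz * 1e3


if __name__ == "__main__":
    for f in (0.3, 0.6):
        print(f, "THz ->", round(free_space_wavelength_mm(f), 3), "mm")
```

The band therefore spans free-space wavelengths of roughly 1 mm down to roughly 0.5 mm, which gives a sense of what "optically large" means for a device at these frequencies.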

preprint, 2010, arXiv

Three-Dimensional Cloaking Device Operates at Terahertz Frequencies

The invisibility cloak has been a long-standing dream for many researchers over the decades. By transforming space and light propagation, a three-dimensional (3D) object can be perceived as having a reduced number of dimensions, in the form of points, lines, and thin sheets, making it "undetectable" judging from the scattered field. Although a variety of cloaking devices have been reported at microwave and optical frequencies, the Terahertz (THz) domain remains unexplored. Moreover, all previous experimental demonstrations were performed in a two-dimensional (2D) waveguide configuration. Although those works represent a critical step in validating the concept of the invisibility cloak, one would expect the cloaking device to be realized in 3D with the ability to cloak an object of realistic size. This requires the construction of an optically large cloaking device with features much smaller than the wavelength. Fabricating 3D structures with an aspect ratio close to 100:1 is a challenging task. Here, we report an experimental demonstration of a 3D THz ground plane cloak. Reflection terahertz time-domain spectroscopy (THz-TDS) was employed to characterize the cloaking samples. Two distinct reflection peaks can be clearly observed across a broad frequency range, which are caused by the reflection at the surface of the bump. The measured peak positions are consistent with the peak positions from numerical simulation. By contrast, in the spectral map of the cloak sample, the wavefront is relatively smooth with a single peak.

preprint, 2009, arXiv

Construction of Chiral Metamaterial with U-Shaped Resonator Assembly

Chiral structures can be used to construct metamaterials with a negative refractive index (NRI). In an assembly of double-layered metallic U-shaped resonators with two resonant frequencies $ω_H$ and $ω_L$, the effective induced electric and magnetic dipoles, which arise from the specific surface current distributions, are collinear at the same frequency. Consequently, for left circularly polarized light, NRI occurs at $ω_H$, whereas for right circularly polarized light it occurs at $ω_L$. Our design provides a new example of applying chiral structures to tune electromagnetic properties, and could be enlightening in exploring chiral metamaterials.
