Source author record

Xin Yu

Xin Yu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision; hep-ph; math.AP; Machine Learning; Artificial Intelligence; eess.IV; Human-Computer Interaction; Multimedia; Robotics; Multiagent Systems; Computational Engineering, Finance, and Science; cond-mat.mes-hall; cond-mat.mtrl-sci; math.OC; Neural and Evolutionary Computing; physics.ao-ph; physics.optics

Catalog footprint

What is connected

53 works

17 topics

4 close collaborators

preprint, 2026, arXiv

ResiHMR: Residual-Limb Aware Single-Image 3D Human Mesh Recovery for Individuals with Limb Loss

Single-image human mesh recovery provides a compact 3D, person-centric representation that supports analysis, animation, AR and VR, rehabilitation, and human-computer interaction. However, prevailing systems impose an intact-limb prior and degrade on people with limb loss, because fixed-topology models cannot represent residual limbs. In this work, we present ResiHMR, a residual-limb aware framework for single-image 3D human modeling. ResiHMR adopts residual-limb keypoints and introduces two components: (i) a topology-adaptive Residual Anchor-Factor Optimization module that constrains estimation to the observed kinematic subgraph of anatomically valid structures, and (ii) a geometry-based Residual-Limb Reconstruction module that estimates residual-limb boundaries and convex limb-termination geometry. These components introduce topology-aware optimization and explicit termination geometry as tools for human mesh recovery under non-standard limb anatomy. Unlike joint-removal methods in a fixed topology, ResiHMR explicitly reconstructs residual-limb surfaces and aligns optimization with limb-loss topology, which better matches prosthetic biomechanics and real-world use. To the best of our knowledge, this is the first single-image HMR system that explicitly reconstructs residual-limb surfaces and performs topology-adaptive optimization for individuals with limb loss. On a curated dataset of real-world images with limb loss, ResiHMR improves reconstruction quality under both SMPLify-X and HSMR backbones, reducing intact-joint 2D MPJPE from 41.32 to 37.40 with SMPLify-X and residual-limb 2D MPJPE from 73.61 to 23.19 with HSMR.
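
The abstract reports results as 2D MPJPE but does not define it. Under the usual reading (mean per-joint position error, here the average Euclidean distance between predicted and annotated 2D joints), the metric can be sketched as follows. The function name, the pixel units, and the toy joints are assumptions for illustration, not values or code from the paper.

```python
import math


def mpjpe_2d(pred, target):
    """Mean Euclidean distance between matching 2D joints.

    pred and target are equal-length lists of (x, y) pairs.
    Units are assumed to be pixels.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, target):
        total += math.hypot(px - tx, py - ty)
    return total / len(pred)


# Hypothetical joints, not data from the paper.
pred = [(0.0, 0.0), (3.0, 4.0)]
target = [(0.0, 0.0), (0.0, 0.0)]
error = mpjpe_2d(pred, target)  # joint errors are 0 and 5, so the mean is 2.5
```

Lower is better, which is why the abstract describes a drop from 73.61 to 23.19 as an improvement.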

preprint, 2025, arXiv

Mirage: One-Step Video Diffusion for Photorealistic and Coherent Asset Editing in Driving Scenes

Vision-centric autonomous driving systems rely on diverse and scalable training data to achieve robust performance. While video object editing offers a promising path for data augmentation, existing methods often struggle to maintain both high visual fidelity and temporal coherence. In this work, we propose Mirage, a one-step video diffusion model for photorealistic and coherent asset editing in driving scenes. Mirage builds upon a text-to-video diffusion prior to ensure temporal consistency across frames. However, 3D causal variational autoencoders often suffer from degraded spatial fidelity due to compression, and directly passing 3D encoder features to decoder layers breaks temporal causality. To address this, we inject temporally agnostic latents from a pretrained 2D encoder into the 3D decoder to restore detail while preserving causal structures. Furthermore, because scene objects and inserted assets are optimized under different objectives, their Gaussians exhibit a distribution mismatch that leads to pose misalignment. To mitigate this, we introduce a two-stage data alignment strategy combining coarse 3D alignment and fine 2D refinement, thereby improving alignment and providing cleaner supervision. Extensive experiments demonstrate that Mirage achieves high realism and temporal consistency across diverse editing scenarios. Beyond asset editing, Mirage can also generalize to other video-to-video translation tasks, serving as a reliable baseline for future research. Our code is available at https://github.com/wm-research/mirage.

preprint, 2024, arXiv

EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation

Generating vivid and diverse 3D co-speech gestures is crucial for various applications in animating virtual avatars. While most existing methods can generate gestures from audio directly, they usually overlook that emotion is one of the key factors of authentic co-speech gesture generation. In this work, we propose EmotionGesture, a novel framework for synthesizing vivid and diverse emotional co-speech 3D gestures from audio. Because emotion is often entangled with the rhythmic beat in speech audio, we first develop an Emotion-Beat Mining module (EBM) to extract the emotion and audio beat features and to model their correlation via a transcript-based visual-rhythm alignment. Then, we propose an initial-pose-based Spatial-Temporal Prompter (STP) to generate future gestures from the given initial poses. STP effectively models the spatial-temporal correlations between the initial poses and the future gestures, thus producing a spatially and temporally coherent pose prompt. Once we obtain the pose prompts, emotion features, and audio beat features, we generate 3D co-speech gestures through a transformer architecture. However, the poses in existing datasets often contain jitter, which would lead to unstable generated gestures. To address this issue, we propose an effective objective function, dubbed Motion-Smooth Loss. Specifically, we model a motion offset to compensate for the jittering ground truth by forcing gestures to be smooth. Finally, we present an emotion-conditioned VAE to sample emotion features, enabling us to generate diverse emotional results. Extensive experiments demonstrate that our framework outperforms the state-of-the-art, achieving vivid and diverse emotional co-speech 3D gestures. Our code and dataset will be released at the project page: https://xingqunqi-lab.github.io/Emotion-Gesture-Web/

preprint, 2023, arXiv

ESP: Exploiting Symmetry Prior for Multi-Agent Reinforcement Learning

Multi-agent reinforcement learning (MARL) has achieved promising results in recent years. However, most existing reinforcement learning methods require a large amount of data for model training. In addition, data-efficient reinforcement learning requires the construction of strong inductive biases, which are ignored in the current MARL approaches. Inspired by the symmetry phenomenon in multi-agent systems, this paper proposes a framework for exploiting prior knowledge by integrating data augmentation and a well-designed consistency loss into the existing MARL methods. In addition, the proposed framework is model-agnostic and can be applied to most of the current MARL algorithms. Experimental tests on multiple challenging tasks demonstrate the effectiveness of the proposed framework. Moreover, the proposed framework is applied to a physical multi-robot testbed to show its superiority.
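
The abstract names two ingredients, symmetry-based data augmentation and a consistency loss, without giving their form. The sketch below shows one plausible reading for a single agent in a 2D world with a left-right reflection symmetry. The choice of symmetry, the `reflect`, `augment`, and `consistency_loss` helpers, and the toy policies are all assumptions for illustration, not the authors' implementation.

```python
def reflect(vec):
    """Assumed symmetry: mirror a 2D vector across the y-axis."""
    x, y = vec
    return (-x, y)


def augment(transition):
    """Create a mirrored copy of an (obs, action, reward, next_obs) tuple.

    The reward is left unchanged, which assumes it is invariant
    under the reflection.
    """
    obs, action, reward, next_obs = transition
    return (reflect(obs), reflect(action), reward, reflect(next_obs))


def consistency_loss(policy, obs):
    """Squared gap between acting on the mirrored observation and
    mirroring the action chosen for the original observation."""
    ax, ay = policy(reflect(obs))
    bx, by = reflect(policy(obs))
    return (ax - bx) ** 2 + (ay - by) ** 2


# Hypothetical policies: the first respects the symmetry, the second does not.
def symmetric_policy(o):
    return (-0.5 * o[0], -0.5 * o[1])


def biased_policy(o):
    return (o[0] + 1.0, o[1])


obs = (2.0, 1.0)
loss_symmetric = consistency_loss(symmetric_policy, obs)
loss_biased = consistency_loss(biased_policy, obs)
mirrored = augment(((2.0, 1.0), (1.0, 0.0), 0.5, (3.0, 1.0)))
```

In this toy setup the loss is zero for the policy that commutes with the reflection and positive for the one that does not, which is the signal such a term would add to an existing training objective.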

preprint, 2023, arXiv

Getting Away with More Network Pruning: From Sparsity to Geometry and Linear Regions

One surprising trait of neural networks is the extent to which their connections can be pruned with little to no effect on accuracy. But when we cross a critical level of parameter sparsity, pruning any further leads to a sudden drop in accuracy. This drop plausibly reflects a loss in model complexity, which we aim to avoid. In this work, we explore how sparsity also affects the geometry of the linear regions defined by a neural network, and consequently reduces the expected maximum number of linear regions based on the architecture. We observe that pruning affects accuracy similarly to how sparsity affects the number of linear regions and our proposed bound for the maximum number. Conversely, we find out that selecting the sparsity across layers to maximize our bound very often improves accuracy in comparison to pruning as much with the same sparsity in all layers, thereby providing us guidance on where to prune.

preprint, 2023, arXiv

Leveraging Partial Symmetry for Multi-Agent Reinforcement Learning

Incorporating symmetry as an inductive bias into multi-agent reinforcement learning (MARL) has led to improvements in generalization, data efficiency, and physical consistency. While prior research has succeeded in using a perfect symmetry prior, the realm of partial symmetry in the multi-agent domain remains unexplored. To fill this gap, we introduce the partially symmetric Markov game, a new subclass of the Markov game. We then theoretically show that the performance error introduced by utilizing symmetry in MARL is bounded, implying that the symmetry prior can still be useful in MARL even in partial symmetry situations. Motivated by this insight, we propose the Partial Symmetry Exploitation (PSE) framework, which adaptively incorporates the symmetry prior in MARL under different symmetry-breaking conditions. Specifically, by adaptively adjusting the exploitation of symmetry, our framework achieves superior sample efficiency and overall performance of MARL algorithms. Extensive experiments demonstrate the superior performance of the proposed framework over baselines. Finally, we implement the proposed framework on a real-world multi-robot testbed to show its superiority.

preprint, 2023, arXiv

MHR-Net: Multiple-Hypothesis Reconstruction of Non-Rigid Shapes from 2D Views

We propose MHR-Net, a novel method for recovering Non-Rigid Shapes from Motion (NRSfM). MHR-Net aims to find a set of reasonable reconstructions for a 2D view, and it also selects the most likely reconstruction from the set. To deal with the challenging unsupervised generation of non-rigid shapes, we develop a new Deterministic Basis and Stochastic Deformation scheme in MHR-Net. The non-rigid shape is first expressed as the sum of a coarse shape basis and a flexible shape deformation, then multiple hypotheses are generated with uncertainty modeling of the deformation part. MHR-Net is optimized with reprojection loss on the basis and the best hypothesis. Furthermore, we design a new Procrustean Residual Loss, which reduces the rigid rotations between similar shapes and further improves the performance. Experiments show that MHR-Net achieves state-of-the-art reconstruction accuracy on Human3.6M, SURREAL and 300-VW datasets.

preprint, 2022, arXiv

A Novel Position-based VR Online Shopping Recommendation System based on Optimized Collaborative Filtering Algorithm

This paper proposes a VR supermarket with intelligent recommendation, which consists of three parts: the VR supermarket, the recommendation system, and the database. The VR supermarket provides a 360-degree virtual environment in which users move and interact through VR devices. The recommendation system makes intelligent recommendations to target users based on the data in the database. It is built on an optimized item-based collaborative filtering (ICF) algorithm that addresses the cold start problem of ICF. This allows the VR supermarket to present real-time recommendations in any situation. The system not only makes up for the lack of user perception of item attributes in traditional online shopping systems but also improves users' shopping efficiency through intelligent recommendation. The application can be extended to enterprise-level systems, which adds new possibilities for users to do VR shopping at home.
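
For readers unfamiliar with item-based collaborative filtering, the plain (unoptimized) technique can be sketched as below: items are compared by the cosine similarity of their rating columns, and each unrated item is scored by its similarity to the items the user has already rated. The rating table, the `recommend` helper, and the use of 0 for "unrated" are assumptions for illustration. The sketch does not include the paper's position-based optimization or its cold start handling, which the abstract does not describe.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(ratings, user, top_k=1):
    """Rank the user's unrated items by similarity-weighted ratings.

    ratings maps a user name to a list of per-item ratings, 0 meaning unrated.
    """
    n_items = len(ratings[user])
    # One column per item, holding every user's rating of that item.
    columns = [[row[i] for row in ratings.values()] for i in range(n_items)]
    scores = {}
    for i in range(n_items):
        if ratings[user][i] != 0:
            continue  # already rated, nothing to recommend
        scores[i] = sum(
            cosine(columns[i], columns[j]) * ratings[user][j]
            for j in range(n_items)
            if ratings[user][j] != 0
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical rating table, not data from the paper.
ratings = {
    "ann": [5, 0, 0],
    "bob": [5, 4, 0],
    "cat": [4, 5, 1],
    "dan": [0, 1, 5],
}
ranked = recommend(ratings, "ann", top_k=2)  # item 1 ranks ahead of item 2
```

A brand-new item with no ratings has an all-zero column and therefore a score of zero here, which is the cold start gap the paper says its optimized variant addresses.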

preprint, 2022, arXiv

Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image Matching

We address the problem of ground-to-satellite image geo-localization, that is, estimating the camera latitude, longitude and orientation (azimuth angle) by matching a query image captured at the ground level against a large-scale database of geotagged satellite images. Our prior work treats this task as pure image retrieval, selecting the satellite reference image most similar to the ground-level query image. However, such an approach often produces coarse location estimates because the geotag of the retrieved satellite image only corresponds to the image center while the ground camera can be located at any point within the image. To further consolidate our prior research findings, we present a novel geometry-aware geo-localization method. Our new method is able to achieve the fine-grained location of a query image, up to pixel size precision of the satellite image, once its coarse location and orientation have been determined. Moreover, we propose a new geometry-aware image retrieval pipeline to improve the coarse localization accuracy. In addition to the polar transform used in our conference work, this new pipeline also maps satellite image pixels to the ground-level plane in the ground-view via a geometry-constrained projective transform to emphasize informative regions, such as road structures, for cross-view geo-localization. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our newly proposed framework. We also significantly improve the performance of coarse localization results compared to the state-of-the-art in terms of location recalls.

preprint, 2022, arXiv

Characterizing Renal Structures with 3D Block Aggregate Transformers

Efficiently quantifying renal structures can provide distinct spatial context and facilitate biomarker discovery for kidney morphology. However, developing and evaluating a transformer model to segment the renal cortex, medulla, and collecting system remains challenging due to data inefficiency. Inspired by the hierarchical structures in vision transformers, we propose a novel method using a 3D block aggregation transformer for segmenting kidney components on contrast-enhanced CT scans. We construct the first cohort for a renal substructure segmentation dataset, with 116 subjects, under institutional review board (IRB) approval. With its data-efficient design, our method yields state-of-the-art performance (Dice of 0.8467) against the baseline approach (Dice of 0.8308). The Pearson R between the proposed method and manual standards reaches 0.9891, indicating strong correlation and reproducibility for volumetric analysis. We extend the proposed method to the public KiTS dataset, where it leads to improved accuracy compared to transformer-based approaches. We show that the 3D block aggregation transformer can achieve local communication between sequence representations without modifying self-attention, and it can serve as an accurate and efficient quantification tool for characterizing renal structures.
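
The Dice scores quoted above (0.8467 versus 0.8308) measure overlap between a predicted and a reference segmentation. A minimal sketch of the standard Dice coefficient follows, with each mask represented as a set of voxel coordinates. The representation, the function name, and the toy voxels are assumptions for illustration and are not taken from the paper.

```python
def dice(pred_voxels, truth_voxels):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two voxel sets."""
    pred_set, truth_set = set(pred_voxels), set(truth_voxels)
    denom = len(pred_set) + len(truth_set)
    if denom == 0:
        return 1.0  # two empty masks are treated as a perfect match
    return 2 * len(pred_set & truth_set) / denom


# Hypothetical masks: three voxels each, two of them shared.
pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
truth = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}
score = dice(pred, truth)  # 2 * 2 / (3 + 3)
```

The score ranges from 0 (no overlap) to 1 (identical masks).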

preprint, 2022, arXiv

CVLNet: Cross-View Semantic Correspondence Learning for Video-based Camera Localization

This paper tackles the problem of Cross-view Video-based camera Localization (CVL). The task is to localize a query camera by leveraging information from its past observations, i.e., a continuous sequence of images observed at previous time stamps, and matching them to a large overhead-view satellite image. The critical challenge of this task is to learn a powerful global feature descriptor for the sequential ground-view images while considering its domain alignment with reference satellite images. For this purpose, we introduce CVLNet, which first projects the sequential ground-view images into an overhead view by exploring the ground-and-overhead geometric correspondences and then leverages the photo consistency among the projected images to form a global representation. In this way, the cross-view domain differences are bridged. Since the reference satellite images are usually pre-cropped and regularly sampled, there is always a misalignment between the query camera location and its matching satellite image center. Motivated by this, we propose estimating the query camera's relative displacement to a satellite image before similarity matching. In this displacement estimation process, we also consider the uncertainty of the camera location. For example, a camera is unlikely to be on top of trees. To evaluate the performance of the proposed method, we collect satellite images from Google Maps for the KITTI dataset and construct a new cross-view video-based localization benchmark dataset, KITTI-CVL. Extensive experiments have demonstrated the effectiveness of video-based localization over single image-based localization and the superiority of each proposed module over other alternatives.

preprint, 2022, arXiv

Gait Recognition with Mask-based Regularization

Most gait recognition methods exploit spatial-temporal representations from static appearances and dynamic walking patterns. However, we observe that many part-based methods neglect representations at boundaries. In addition, overfitting on training data is relatively common in gait recognition, perhaps due to insufficient data and low-information gait silhouettes. Motivated by these observations, we propose a novel mask-based regularization method named ReverseMask. By injecting perturbation on the feature map, the proposed regularization method helps convolutional architectures learn discriminative representations and enhances generalization. Also, we design an Inception-like ReverseMask Block, which has three branches: a global branch, a feature dropping branch, and a feature scaling branch. Specifically, the dropping branch can extract fine-grained representations when partial activations are zeroed out. Meanwhile, the scaling branch randomly scales the feature map, keeping structural information of activations and preventing overfitting. The plug-and-play Inception-like ReverseMask block is a simple and effective way to improve network generalization, and it also improves the performance of many state-of-the-art methods. Extensive experiments demonstrate that the ReverseMask regularization helps the baseline achieve higher accuracy and better generalization. Moreover, the baseline with the Inception-like Block significantly outperforms state-of-the-art methods on the two most popular datasets, CASIA-B and OUMVLP. The source code will be released.

preprint, 2022, arXiv

GaitGL: Learning Discriminative Global-Local Feature Representations for Gait Recognition

Existing gait recognition methods either directly establish Global Feature Representation (GFR) from original gait sequences or generate Local Feature Representation (LFR) from several local parts. However, GFR tends to neglect local details of human postures as the receptive fields become larger in the deeper network layers. Although LFR allows the network to focus on the detailed posture information of each local region, it neglects the relations among different local parts and thus only exploits limited local information of several specific regions. To solve these issues, we propose a global-local based gait recognition network, named GaitGL, to generate more discriminative feature representations. To be specific, a novel Global and Local Convolutional Layer (GLCL) is developed to take full advantage of both global visual information and local region details in each layer. GLCL is a dual-branch structure that consists of a GFR extractor and a mask-based LFR extractor. GFR extractor aims to extract contextual information, e.g., the relationship among various body parts, and the mask-based LFR extractor is presented to exploit the detailed posture changes of local regions. In addition, we introduce a novel mask-based strategy to improve the local feature extraction capability. Specifically, we design pairs of complementary masks to randomly occlude feature maps, and then train our mask-based LFR extractor on various occluded feature maps. In this manner, the LFR extractor will learn to fully exploit local information. Extensive experiments demonstrate that GaitGL achieves better performance than state-of-the-art gait recognition methods. The average rank-1 accuracy on CASIA-B, OU-MVLP, GREW and Gait3D is 93.6%, 98.7%, 68.0% and 63.8%, respectively, significantly outperforming the competing methods. The proposed method has won the first prize in two competitions: HID 2020 and HID 2021.
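
The mask-based strategy described above (pairs of complementary masks that randomly occlude feature maps) can be illustrated with a small sketch. Here a mask hides one random horizontal band of a 2D feature map and its complement hides everything else. The band layout, the helper names, and the toy feature map are assumptions for illustration, since the abstract does not specify how GaitGL builds its masks.

```python
import random


def complementary_masks(height, width, rng):
    """Return (mask, complement): binary maps that sum to all ones.

    The mask zeroes one random horizontal band of rows. The band
    shape is an assumption, not taken from the paper.
    """
    start = rng.randrange(height)
    band = rng.randint(1, height - start)
    mask = [
        [0 if start <= r < start + band else 1 for _ in range(width)]
        for r in range(height)
    ]
    complement = [[1 - v for v in row] for row in mask]
    return mask, complement


def apply_mask(feature, mask):
    """Element-wise product of a 2D feature map and a binary mask."""
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(feature, mask)]


rng = random.Random(0)
feature = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
mask, complement = complementary_masks(4, 2, rng)
occluded_a = apply_mask(feature, mask)
occluded_b = apply_mask(feature, complement)
```

Because the two masks are complementary, every activation is visible in exactly one of the two occluded maps, so a local extractor trained on both still sees the whole feature map over the pair.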

preprint, 2022, arXiv

Geometry-Guided Street-View Panorama Synthesis from Satellite Imagery

This paper presents a new approach for synthesizing a novel street-view panorama given an overhead satellite image. Taking a small satellite image patch as input, our method generates an omnidirectional panorama of the kind shown in Google Street View, as if it were captured from the same geographical location as the center of the satellite patch. Existing works tackle this task as an image generation problem, adopting generative adversarial networks to implicitly learn the cross-view transformations while ignoring the domain relevance. In this paper, we propose to explicitly establish the geometric correspondences between the two-view images so as to facilitate the cross-view transformation learning. Specifically, we observe that when a 3D point in the real world is visible in both views, there is a deterministic mapping between the projected points in the two-view images given the height information of this 3D point. Motivated by this, we develop a novel Satellite to Street-view image Projection (S2SP) module which explicitly establishes such geometric correspondences and projects the satellite images to the street viewpoint. With these projected satellite images as network input, we next employ a generator to synthesize realistic street-view panoramas that are geometrically consistent with the satellite images. Our S2SP module is differentiable and the whole framework is trained in an end-to-end manner. Extensive experimental results on two cross-view benchmark datasets demonstrate that our method generates images that better respect the scene geometry than existing approaches.

preprint, 2022, arXiv

Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation

Modeling temporal information for both detection and tracking in a unified framework has proven to be a promising solution to video instance segmentation (VIS). However, how to effectively incorporate the temporal information into an online model remains an open problem. In this work, we propose a new online VIS paradigm named Instance As Identity (IAI), which models temporal information for both detection and tracking in an efficient way. In detail, IAI employs a novel identification module to explicitly predict an identification number for each tracked instance. To pass temporal information across frames, IAI utilizes an association module which combines current features and past embeddings. Notably, IAI can be integrated with different image models. We conduct extensive experiments on three VIS benchmarks. IAI outperforms all the online competitors on YouTube-VIS-2019 (ResNet-101 43.7 mAP) and YouTube-VIS-2021 (ResNet-50 38.0 mAP). Surprisingly, on the more challenging OVIS, IAI achieves SOTA performance (20.6 mAP). Code is available at https://github.com/zfonemore/IAI

preprint, 2022, arXiv

Learning Implicit Body Representations from Double Diffusion Based Neural Radiance Fields

In this paper, we present a novel double diffusion based neural radiance field, dubbed DD-NeRF, to reconstruct human body geometry and render the human body appearance in novel views from a sparse set of images. We first propose a double diffusion mechanism to achieve expressive representations of input images by fully exploiting human body priors and image appearance details at two levels. At the coarse level, we first model the coarse human body poses and shapes via an unclothed 3D deformable vertex model as guidance. At the fine level, we present a multi-view sampling network to capture subtle geometric deformations and image detailed appearances, such as clothing and hair, from multiple input views. Considering the sparsity of the two level features, we diffuse them into feature volumes in the canonical space to construct neural radiance fields. Then, we present a signed distance function (SDF) regression network to construct body surfaces from the diffused features. Thanks to our double diffused representations, our method can even synthesize novel views of unseen subjects. Experiments on various datasets demonstrate that our approach outperforms the state-of-the-art in both geometric reconstruction and novel view synthesis.

preprint, 2022, arXiv

Pro-UIGAN: Progressive Face Hallucination from Occluded Thumbnails

In this paper, we study the task of hallucinating an authentic high-resolution (HR) face from an occluded thumbnail. We propose a multi-stage Progressive Upsampling and Inpainting Generative Adversarial Network, dubbed Pro-UIGAN, which exploits facial geometry priors to replenish and upsample (8×) the occluded and tiny faces (16×16 pixels). Pro-UIGAN iteratively (1) estimates facial geometry priors for low-resolution (LR) faces and (2) acquires non-occluded HR face images under the guidance of the estimated priors. Our multi-stage hallucination network super-resolves and inpaints occluded LR faces in a coarse-to-fine manner, thus reducing unwanted blurriness and artifacts significantly. Specifically, we design a novel cross-modal transformer module for facial priors estimation, in which an input face and its landmark features are formulated as queries and keys, respectively. Such a design encourages joint feature learning across the input facial and landmark features, and deep feature correspondences will be discovered by attention. Thus, facial appearance features and facial geometry priors are learned in a mutual promotion manner. Extensive experiments demonstrate that our Pro-UIGAN achieves visually pleasing HR faces, reaching superior performance in downstream tasks, i.e., face alignment, face parsing, face recognition and expression classification, compared with other state-of-the-art (SotA) methods.

preprint, 2022, arXiv

Pseudo-Label Guided Multi-Contrast Generalization for Non-Contrast Organ-Aware Segmentation

Non-contrast computed tomography (NCCT) is commonly acquired for lung cancer screening, assessment of general abdominal pain or suspected renal stones, trauma evaluation, and many other indications. However, the absence of contrast makes the boundaries between organs hard to distinguish. In this paper, we propose a novel unsupervised approach that leverages pairwise contrast-enhanced CT (CECT) context to compute non-contrast segmentation without ground-truth labels. Unlike generative adversarial approaches, we compute the pairwise morphological context with CECT to provide teacher guidance instead of generating fake anatomical context. Additionally, we augment the intensity correlations in 'organ-specific' settings and increase the sensitivity to organ-aware boundaries. We validate our approach on multi-organ segmentation with paired non-contrast and contrast-enhanced CT scans using five-fold cross-validation. Full external validation is performed on an independent non-contrast cohort for aorta segmentation. Compared with the current fully supervised state of the art in abdominal organ segmentation, our proposed pipeline achieves a significantly higher Dice, by 3.98% (internal multi-organ annotated) and 8.00% (external aorta annotated). The code and pretrained models are publicly available at https://github.com/MASILab/ContrastMix.

preprint, 2022, arXiv

PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences

Point cloud sequences are irregular and unordered in the spatial dimension while exhibiting regularities and order in the temporal dimension. Therefore, existing grid based convolutions for conventional video processing cannot be directly applied to spatio-temporal modeling of raw point cloud sequences. In this paper, we propose a point spatio-temporal (PST) convolution to achieve informative representations of point cloud sequences. The proposed PST convolution first disentangles space and time in point cloud sequences. Then, a spatial convolution is employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension. Furthermore, we incorporate the proposed PST convolution into a deep network, namely PSTNet, to extract features of point cloud sequences in a hierarchical manner. Extensive experiments on widely-used 3D action recognition and 4D semantic segmentation datasets demonstrate the effectiveness of PSTNet to model point cloud sequences.

preprint, 2022, arXiv

Realization of a photonic topological insulator in Kagome crystals at terahertz wavelengths

Topological systems are inherently robust to disorder and continuous perturbations, resulting in dissipation-free edge transport of electrons in quantum solids, or reflectionless guiding of photons and phonons in classical wave systems characterized by topological invariants. Despite considerable efforts, direct experimental demonstration of the theoretically predicted robust, lossless energy transport in topological insulators operating at terahertz frequencies still needs further investigation to shed light on the unique properties enabled by topological protection. Here, we introduce a Kagome lattice that exhibits a new class of symmetry-protected topological phases with very low Berry curvature but nontrivial bulk polarization, and fabricate an optical topological insulator that provides the valley Hall effect. Theoretical analysis shows that four types of edge states can be obtained. THz-TDS measurements with high time resolution demonstrate that terahertz waves propagating along the straight topological edge and along the Z-shaped edge with sharp turns have almost the same high transmission in the 0.440 THz to 0.457 THz range. These results quantitatively illustrate the suppression of backscattering due to the non-trivial topology of the structure. The THz-TDS measurement yields amplitude and phase information, showing a significant advantage compared to general broadband infrared, single-wavelength continuous-wave THz measurements and visible spectroscopy. It allows further exploration of the effective refractive index, group velocity and dispersion relations of edge states. Our work offers possibilities for advanced control of the propagation and manipulation of THz waves, and facilitates applications including sixth-generation (6G) wireless communication, terahertz integrated circuits, and interconnects for intrachip and interchip communication.

preprint, 2022, arXiv

RGB-D Saliency Detection via Cascaded Mutual Information Minimization

Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning. In this paper, we introduce a novel multi-stage cascaded learning framework via mutual information minimization to explicitly model the multi-modal information between RGB image and depth data. Specifically, we first map the features of each modality to a lower-dimensional feature vector, and adopt mutual information minimization as a regularizer to reduce the redundancy between appearance features from RGB and geometric features from depth. We then perform multi-stage cascaded learning to impose the mutual information minimization constraint at every stage of the network. Extensive experiments on benchmark RGB-D saliency datasets illustrate the effectiveness of our framework. Further, to promote the development of this field, we contribute the largest dataset (7x larger than NJU2K), which contains 15,625 image pairs with high-quality polygon-/scribble-/object-/instance-/rank-level annotations. Based on these rich labels, we additionally construct four new benchmarks with strong baselines and observe some interesting phenomena, which can motivate future model design. Source code and dataset are available at https://github.com/JingZhang617/cascaded_rgbd_sod.

preprint2022arXiv

Surrogate Neural Network Model for Sensitivity Analysis and Uncertainty Quantification of the Mechanical Behavior in the Optical Lens-Barrel Assembly

Surrogate neural network-based models have been lately trained and used in a variety of science and engineering applications where the number of evaluations of a target function is limited by execution time. In cell phone camera systems, various errors, such as interferences at the lens-barrel and lens-lens interfaces and axial, radial, and tilt misalignments, accumulate and alter profile of the lenses in a stochastic manner which ultimately changes optical focusing properties. Nonlinear finite element analysis of the stochastic mechanical behavior of lenses due to the interference fits is used on high-performance computing (HPC) to generate sufficient training and testing data for subsequent deep learning. Once properly trained and validated, the surrogate neural network model enabled accurate and almost instant evaluations of millions of function evaluations providing the final lens profiles. This computational model, enhanced by artificial intelligence, enabled us to efficiently perform Monte-Carlo analysis for sensitivity and uncertainty quantification of the final lens profile to various interferences. It can be further coupled with an optical analysis to perform ray tracing and analyze the focal properties of the lens module. Moreover, it can provide a valuable tool for optimizing tolerance design and intelligent components matching for many similar press-fit assembly processes.

preprint2022arXiv

The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks

Neural networks tend to achieve better accuracy with training if they are larger -- even if the resulting models are overparameterized. Nevertheless, carefully removing such excess parameters before, during, or after training may also produce models with similar or even improved accuracy. In many cases, that can be curiously achieved by heuristics as simple as removing a percentage of the weights with the smallest absolute value -- even though magnitude is not a perfect proxy for weight relevance. With the premise that obtaining significantly better performance from pruning depends on accounting for the combined effect of removing multiple weights, we revisit one of the classic approaches for impact-based pruning: the Optimal Brain Surgeon (OBS). We propose a tractable heuristic for solving the combinatorial extension of OBS, in which we select weights for simultaneous removal, as well as a systematic update of the remaining weights. Our selection method outperforms other methods under high sparsity, and the weight update is advantageous even when combined with the other methods.
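The simple baseline mentioned in this abstract, removing a percentage of the weights with the smallest absolute value, can be sketched as follows. This is only an illustration of magnitude pruning, not the paper's combinatorial OBS heuristic; the function name `magnitude_prune` and its parameters are hypothetical.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, fraction: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-magnitude entries zeroed.

    `fraction` is the share of entries to remove (0.0 to 1.0).
    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    k = int(fraction * weights.size)  # number of weights to remove
    pruned = weights.copy()
    if k == 0:
        return pruned
    # Flat indices of the k entries with the smallest absolute value.
    idx = np.argpartition(np.abs(weights).ravel(), k - 1)[:k]
    pruned.flat[idx] = 0.0
    return pruned


if __name__ == "__main__":
    w = np.array([[0.1, -2.0], [0.05, 3.0]])
    print(magnitude_prune(w, 0.5))
```

Magnitude pruning scores each weight independently, which is exactly the limitation the abstract points to: it ignores the combined effect of removing several weights at once.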

preprint2022arXiv

Towards Efficient and Scale-Robust Ultra-High-Definition Image Demoireing

With the rapid development of mobile devices, modern widely-used mobile phones typically allow users to capture 4K resolution (i.e., ultra-high-definition) images. However, for image demoireing, a challenging task in low-level vision, existing works are generally carried out on low-resolution or synthetic images. Hence, the effectiveness of these methods on 4K resolution images is still unknown. In this paper, we explore moire pattern removal for ultra-high-definition images. To this end, we propose the first ultra-high-definition demoireing dataset (UHDM), which contains 5,000 real-world 4K resolution image pairs, and conduct a benchmark study on current state-of-the-art methods. Further, we present an efficient baseline model ESDNet for tackling 4K moire images, wherein we build a semantic-aligned scale-aware module to address the scale variation of moire patterns. Extensive experiments manifest the effectiveness of our approach, which outperforms state-of-the-art methods by a large margin while being much more lightweight. Code and dataset are available at https://xinyu-andy.github.io/uhdm-page.

preprint2022arXiv

Video Demoireing with Relation-Based Temporal Consistency

Moire patterns, appearing as color distortions, severely degrade image and video qualities when filming a screen with digital cameras. Considering the increasing demands for capturing videos, we study how to remove such undesirable moire patterns in videos, namely video demoireing. To this end, we introduce the first hand-held video demoireing dataset with a dedicated data collection pipeline to ensure spatial and temporal alignments of captured data. Further, a baseline video demoireing model with implicit feature space alignment and selective feature aggregation is developed to leverage complementary information from nearby frames to improve frame-level video demoireing. More importantly, we propose a relation-based temporal consistency loss to encourage the model to learn temporal consistency priors directly from ground-truth reference videos, which facilitates producing temporally consistent predictions and effectively maintains frame-level qualities. Extensive experiments manifest the superiority of our model. Code is available at https://daipengwa.github.io/VDmoire_ProjectPage/.

preprint2021arXiv

ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring

Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions. In order to utilize neighboring sharp patches, typical methods rely mainly on homography or optical flows to spatially align neighboring blurry frames. However, such explicit approaches are less effective in the presence of fast motions with large pixel displacements. In this work, we propose a novel implicit method to learn spatial correspondence among blurry frames in the feature space. To construct distant pixel correspondences, our model builds a correlation volume pyramid among all the pixel-pairs between neighboring frames. To enhance the features of the reference frame, we design a correlative aggregation module that maximizes the pixel-pair correlations with its neighbors based on the volume pyramid. Finally, we feed the aggregated features into a reconstruction module to obtain the restored frame. We design a generative adversarial paradigm to optimize the model progressively. Our proposed method is evaluated on the widely-adopted DVD dataset, along with a newly collected High-Frame-Rate (1000 fps) Dataset for Video Deblurring (HFR-DVD). Quantitative and qualitative experiments show that our model performs favorably on both datasets against previous state-of-the-art methods, confirming the benefit of modeling all-range spatial correspondence for video deblurring.

preprint2021arXiv

Iterative Optimisation with an Innovation CNN for Pose Refinement

Object pose estimation from a single RGB image is a challenging problem due to variable lighting conditions and viewpoint changes. The most accurate pose estimation networks implement pose refinement via reprojection of a known, textured 3D model; however, such methods cannot be applied without high quality 3D models of the observed objects. In this work we propose an approach, namely an Innovation CNN, to object pose estimation refinement that overcomes the requirement for reprojecting a textured 3D model. Our approach improves initial pose estimation progressively by applying the Innovation CNN iteratively in a stochastic gradient descent (SGD) framework. We evaluate our method on the popular LINEMOD and Occlusion LINEMOD datasets and obtain state-of-the-art performance on both datasets.

preprint2021arXiv

Modeling the Probabilistic Distribution of Unlabeled Data for One-shot Medical Image Segmentation

Existing image segmentation networks mainly leverage large-scale labeled datasets to attain high accuracy. However, labeling medical images is very expensive since it requires sophisticated expert knowledge. Thus, it is more desirable to employ only a few labeled data in pursuing high segmentation performance. In this paper, we develop a data augmentation method for one-shot brain magnetic resonance imaging (MRI) image segmentation which exploits only one labeled MRI image (named atlas) and a few unlabeled images. In particular, we propose to learn the probability distributions of deformations (including shapes and intensities) of different unlabeled MRI images with respect to the atlas via 3D variational autoencoders (VAEs). In this manner, our method is able to exploit the learned distributions of image deformations to generate new authentic brain MRI images, and the number of generated samples will be sufficient to train a deep segmentation network. Furthermore, we introduce a new standard segmentation benchmark to evaluate the generalization performance of a segmentation network through a cross-dataset setting (collected from different sources). Extensive experiments demonstrate that our method outperforms the state-of-the-art one-shot medical segmentation methods. Our code has been released at https://github.com/dyh127/Modeling-the-Probabilistic-Distribution-of-Unlabeled-Data.

preprint2020arXiv

6DoF Object Pose Estimation via Differentiable Proxy Voting Loss

Estimating a 6DOF object pose from a single image is very challenging due to occlusions or textureless appearances. Vector-field based keypoint voting has demonstrated its effectiveness and superiority on tackling those issues. However, direct regression of vector-fields neglects that the distances between pixels and keypoints also affect the deviations of hypotheses dramatically. In other words, small errors in direction vectors may generate severely deviated hypotheses when pixels are far away from a keypoint. In this paper, we aim to reduce such errors by incorporating the distances between pixels and keypoints into our objective. To this end, we develop a simple yet effective differentiable proxy voting loss (DPVL) which mimics the hypothesis selection in the voting procedure. By exploiting our voting loss, we are able to train our network in an end-to-end manner. Experiments on widely used datasets, i.e., LINEMOD and Occlusion LINEMOD, manifest that our DPVL improves pose estimation performance significantly and speeds up the training convergence.
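The geometric point in this abstract, that a small direction error produces a larger hypothesis deviation the farther a pixel is from the keypoint, can be checked with a short calculation. The sketch below assumes a 2D setting in which the deviation is the perpendicular distance from the true keypoint to the ray cast along the predicted direction; it illustrates the motivation only and is not the paper's DPVL loss. The function name is hypothetical.

```python
import math


def hypothesis_deviation(distance_px: float, angle_error_deg: float) -> float:
    """Perpendicular distance (pixels) from the true keypoint to the ray
    cast from a pixel along a direction that is off by `angle_error_deg`.

    Assumes simple planar geometry: deviation = distance * sin(angle error).
    """
    return distance_px * math.sin(math.radians(angle_error_deg))


if __name__ == "__main__":
    # The same 2-degree direction error at increasing pixel-keypoint distances.
    for d in (10.0, 100.0, 400.0):
        print(d, round(hypothesis_deviation(d, 2.0), 2))
```

Because the deviation scales linearly with distance, an objective that penalizes only the direction error treats near and far pixels alike, which is the gap the abstract says the distance-aware loss is meant to close.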

preprint2020arXiv

Copy and Paste GAN: Face Hallucination from Shaded Thumbnails

Existing face hallucination methods based on convolutional neural networks (CNN) have achieved impressive performance on low-resolution (LR) faces in a normal illumination condition. However, their performance degrades dramatically when LR faces are captured in low or non-uniform illumination conditions. This paper proposes a Copy and Paste Generative Adversarial Network (CPGAN) to recover authentic high-resolution (HR) face images while compensating for low and non-uniform illumination. To this end, we develop two key components in our CPGAN: internal and external Copy and Paste nets (CPnets). Specifically, our internal CPnet exploits facial information residing in the input image to enhance facial details; while our external CPnet leverages an external HR face for illumination compensation. A new illumination compensation loss is thus developed to capture illumination from the external guided face image effectively. Furthermore, our method offsets illumination and upsamples facial details alternately in a coarse-to-fine fashion, thus alleviating the correspondence ambiguity between LR inputs and external HR inputs. Extensive experiments demonstrate that our method manifests authentic HR face images in a uniform illumination condition and outperforms state-of-the-art methods qualitatively and quantitatively.

preprint2020arXiv

Face Hallucination with Finishing Touches

Obtaining a high-quality frontal face image from a low-resolution (LR) non-frontal face image is primarily important for many facial analysis applications. However, mainstreams either focus on super-resolving near-frontal LR faces or frontalizing non-frontal high-resolution (HR) faces. It is desirable to perform both tasks seamlessly for daily-life unconstrained face images. In this paper, we present a novel Vivid Face Hallucination Generative Adversarial Network (VividGAN) for simultaneously super-resolving and frontalizing tiny non-frontal face images. VividGAN consists of coarse-level and fine-level Face Hallucination Networks (FHnet) and two discriminators, i.e., Coarse-D and Fine-D. The coarse-level FHnet generates a frontal coarse HR face and then the fine-level FHnet makes use of the facial component appearance prior, i.e., fine-grained facial components, to attain a frontal HR face image with authentic details. In the fine-level FHnet, we also design a facial component-aware module that adopts the facial geometry guidance as clues to accurately align and merge the frontal coarse HR face and prior information. Meanwhile, two-level discriminators are designed to capture both the global outline of a face image as well as detailed facial characteristics. The Coarse-D enforces the coarsely hallucinated faces to be upright and complete while the Fine-D focuses on the fine hallucinated ones for sharper details. Extensive experiments demonstrate that our VividGAN achieves photo-realistic frontal HR faces, reaching superior performance in downstream tasks, i.e., face recognition and expression classification, compared with other state-of-the-art methods.

preprint2020arXiv

Learning Object Relation Graph and Tentative Policy for Visual Navigation

Target-driven visual navigation aims at navigating an agent towards a given target based on the observation of the agent. In this task, it is critical to learn informative visual representation and robust navigation policy. Aiming to improve these two components, this paper proposes three complementary techniques, object relation graph (ORG), trial-driven imitation learning (IL), and a memory-augmented tentative policy network (TPN). ORG improves visual representation learning by integrating object relationships, including category closeness and spatial correlations, e.g., a TV usually co-occurs with a remote spatially. Both Trial-driven IL and TPN underlie robust navigation policy, instructing the agent to escape from deadlock states, such as looping or being stuck. Specifically, trial-driven IL is a type of supervision used in policy network training, while TPN, mimicking the IL supervision in unseen environments, is applied in testing. Experiments in the artificial environment AI2-Thor validate that each of the techniques is effective. When combined, the techniques bring significant improvement over baseline methods in navigation effectiveness and efficiency in unseen environments. We report 22.8% and 23.5% increase in success rate and Success weighted by Path Length (SPL), respectively. The code is available at https://github.com/xiaobaishu0097/ECCV-VN.git.
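For readers unfamiliar with the SPL metric reported here, the sketch below uses the commonly cited definition: each episode contributes its success indicator times the ratio of the shortest path length to the longer of the shortest and actually travelled path lengths, averaged over episodes. The abstract does not spell the formula out, so treat this as an assumption about the metric; the function and argument names are hypothetical.

```python
from typing import Sequence


def spl(successes: Sequence[int],
        shortest_lengths: Sequence[float],
        taken_lengths: Sequence[float]) -> float:
    """Success weighted by Path Length over a set of episodes.

    successes[i]        1 if episode i reached the target, else 0
    shortest_lengths[i] shortest possible path length for episode i
    taken_lengths[i]    length of the path the agent actually took
    """
    if not (len(successes) == len(shortest_lengths) == len(taken_lengths)):
        raise ValueError("all inputs must have the same length")
    if len(successes) == 0:
        raise ValueError("at least one episode is required")
    total = 0.0
    for s, shortest, taken in zip(successes, shortest_lengths, taken_lengths):
        total += s * shortest / max(taken, shortest)
    return total / len(successes)


if __name__ == "__main__":
    # One detour, one failure, one optimal episode.
    print(spl([1, 0, 1], [10.0, 5.0, 4.0], [20.0, 5.0, 4.0]))
```

Under this definition a successful episode scores 1 only when the agent follows a shortest path, so SPL measures efficiency as well as success.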

preprint2020arXiv

LyRN (Lyapunov Reaching Network): A Real-Time Closed Loop approach from Monocular Vision

We propose a closed-loop, multi-instance control algorithm for visually guided reaching based on novel learning principles. A control Lyapunov function methodology is used to design a reaching action for a complex multi-instance task in the case where full state information (poses of all potential reaching points) is available. The proposed algorithm uses monocular vision and manipulator joint angles as the input to a deep convolution neural network to predict the value of the control Lyapunov function (cLf) and corresponding velocity control. The resulting network output is used in real-time as visual control for the grasping task with the multi-instance capability emerging naturally from the design of the control Lyapunov function. We demonstrate the proposed algorithm grasping mugs (textureless and symmetric objects) on a table-top from an over-the-shoulder monocular RGB camera. The manipulator dynamically converges to the best-suited target among multiple identical instances from any random initial pose within the workspace. The system trained with only simulated data is able to achieve 90.3% grasp success rate in the real-world experiments with up to 85Hz closed-loop control on one GTX 1080Ti GPU and significantly outperforms a Pose-Based-Visual-Servo (PBVS) grasping system adapted from a state-of-the-art single shot RGB 6D pose estimation algorithm. A key contribution of the paper is the inclusion of a first-order differential constraint associated with the cLf as a regularisation term during learning, and we provide evidence that this leads to more robust and reliable reaching/grasping performance than vanilla regression on general control inputs.

preprint2020arXiv

Transferring Cross-domain Knowledge for Video Sign Language Recognition

Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation. It requires models to recognize isolated sign words from videos. However, annotating WSLR data needs expert knowledge, thus limiting WSLR dataset acquisition. On the contrary, there are abundant subtitled sign news videos on the internet. Since these videos have no word-level annotation and exhibit a large domain gap from isolated signs, they cannot be directly used for training WSLR models. We observe that despite the existence of a large domain gap, isolated and news signs share the same visual concepts, such as hand gestures and body movements. Motivated by this observation, we propose a novel method that learns domain-invariant visual concepts and fertilizes WSLR models by transferring knowledge of subtitled news sign to them. To this end, we extract news signs using a base WSLR model, and then design a classifier jointly trained on news and isolated signs to coarsely align these two domain features. In order to learn domain-invariant features within each class and suppress domain-specific features, our method further resorts to an external memory to store the class centroids of the aligned news signs. We then design a temporal attention based on the learnt descriptor to improve recognition performance. Experimental results on standard WSLR datasets show that our method outperforms previous state-of-the-art methods significantly. We also demonstrate the effectiveness of our method on automatically localizing signs from sign news, achieving 28.1 for AP@0.5.

preprint2020arXiv

Using observed bacteria concentration and modeled transit time under an analytical framework to estimate overall removal rate of fecal coliform in an estuary

Abundance of fecal coliform (FC) is widely used to indicate the potential presence of pathogens, the No. 1 cause of water impairments in the U.S. Despite extensive monitoring efforts, assessing and modeling FC pollution still faces challenges, largely owing to the uncertainties in estimation of overall removal rate (K). This study proposes an alternative method to estimate in situ K by combining observational data, hydrodynamic simulation, and analytical solution. The method requires the observed spatial distribution of FC concentration along an estuarine channel and the numerically-simulated transit time, and converts the K estimation from a temporal problem into a spatial problem, potentially reducing survey duration, effort, and cost. Application of the method gave an estimation of K = 0.5 d$^{-1}$ on average for the Nassawadox Creek in Chesapeake Bay. The numerical and analytical model results with the estimated K agreed well with the observation, demonstrating the credibility of the method.
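One way to picture the spatial approach described in this abstract is to assume simple first-order decay along the channel, C = C0 * exp(-K * tau), where tau is the modeled transit time at each sampling station. Under that assumption K is the negative slope of ln(C) against tau. The abstract does not give the paper's actual analytical solution, so the decay model, the function name and the synthetic numbers below are all assumptions for illustration.

```python
import numpy as np


def estimate_removal_rate(transit_time_days, concentration):
    """Least-squares fit of ln(C) = ln(C0) - K * tau.

    Returns (K in 1/day, C0). Assumes first-order decay, which is an
    illustrative simplification rather than the paper's formulation.
    """
    tau = np.asarray(transit_time_days, dtype=float)
    conc = np.asarray(concentration, dtype=float)
    if tau.shape != conc.shape or tau.size < 2:
        raise ValueError("need at least two matching (tau, C) samples")
    if np.any(conc <= 0):
        raise ValueError("concentrations must be positive to take logs")
    slope, intercept = np.polyfit(tau, np.log(conc), 1)
    return float(-slope), float(np.exp(intercept))


if __name__ == "__main__":
    # Synthetic stations generated with K = 0.5 per day and C0 = 1000.
    tau = np.array([0.0, 1.0, 2.0, 4.0])
    conc = 1000.0 * np.exp(-0.5 * tau)
    print(estimate_removal_rate(tau, conc))
```

The fit needs only one spatial snapshot of concentrations plus simulated transit times, which mirrors the abstract's point about turning a temporal estimation problem into a spatial one.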

preprint2020arXiv

Weakly-Supervised Salient Object Detection via Scribble Annotations

Compared with laborious pixel-wise dense labeling, it is much easier to label data by scribbles, which costs only 1$\sim$2 seconds per image. However, using scribble labels to learn salient object detection has not been explored. In this paper, we propose a weakly-supervised salient object detection model to learn saliency from such annotations. In doing so, we first relabel an existing large-scale salient object detection dataset with scribbles, namely S-DUTS dataset. Since object structure and detail information is not identified by scribbles, directly training with scribble labels will lead to saliency maps of poor boundary localization. To mitigate this problem, we propose an auxiliary edge detection task to localize object edges explicitly, and a gated structure-aware loss to place constraints on the scope of structure to be recovered. Moreover, we design a scribble boosting scheme to iteratively consolidate our scribble annotations, which are then employed as supervision to learn high-quality saliency maps. As existing saliency evaluation metrics neglect to measure structure alignment of the predictions, the saliency map ranking metric may not comply with human perception. We present a new metric, termed saliency structure measure, to measure the structure alignment of the predicted saliency maps, which is more consistent with human perception. Extensive experiments on six benchmark datasets demonstrate that our method not only outperforms existing weakly-supervised/unsupervised methods, but also is on par with several fully-supervised state-of-the-art models. Our code and data are publicly available at https://github.com/JingZhang617/Scribble_Saliency.

preprint2020arXiv

Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching

Cross-view geo-localization is the problem of estimating the position and orientation (latitude, longitude and azimuth angle) of a camera at ground level given a large-scale database of geo-tagged aerial (e.g., satellite) images. Existing approaches treat the task as a pure location estimation problem by learning discriminative feature descriptors, but neglect orientation alignment. It is well-recognized that knowing the orientation between ground and aerial images can significantly reduce matching ambiguity between these two views, especially when the ground-level images have a limited Field of View (FoV) instead of a full field-of-view panorama. Therefore, we design a Dynamic Similarity Matching network to estimate cross-view orientation alignment during localization. In particular, we address the cross-view domain gap by applying a polar transform to the aerial images to approximately align the images up to an unknown azimuth angle. Then, a two-stream convolutional network is used to learn deep features from the ground and polar-transformed aerial images. Finally, we obtain the orientation by computing the correlation between cross-view features, which also provides a more accurate measure of feature similarity, improving location recall. Experiments on standard datasets demonstrate that our method significantly improves state-of-the-art performance. Remarkably, we improve the top-1 location recall rate on the CVUSA dataset by a factor of 1.5x for panoramas with known orientation, by a factor of 3.3x for panoramas with unknown orientation, and by a factor of 6x for 180-degree FoV images with unknown orientation.
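The orientation step described in this abstract, correlating ground features against polar-transformed aerial features to recover the unknown azimuth, can be illustrated with a circular cross-correlation along the azimuth axis. The sketch below assumes both feature maps have already been reduced to shape (channels, width), with width covering the full 360 degrees; it is not the paper's network, and the function name is hypothetical.

```python
import numpy as np


def estimate_azimuth_shift(aerial_feat: np.ndarray, ground_feat: np.ndarray):
    """Return (shift, correlation) where `shift` is the column offset that
    best aligns `ground_feat` with `aerial_feat`.

    Both inputs have shape (channels, width) and share the same width
    (an assumption; limited-FoV inputs would need extra handling).
    """
    if aerial_feat.shape != ground_feat.shape:
        raise ValueError("feature maps must have the same shape")
    # Circular cross-correlation via FFT along the azimuth (width) axis:
    # corr[s] = sum over channels c and columns n of aerial[c, n + s] * ground[c, n]
    fa = np.fft.fft(aerial_feat, axis=1)
    fg = np.fft.fft(ground_feat, axis=1)
    corr = np.fft.ifft(fa * np.conj(fg), axis=1).real.sum(axis=0)
    return int(np.argmax(corr)), corr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    aerial = rng.standard_normal((8, 64))
    ground = np.roll(aerial, -5, axis=1)  # ground view rotated by 5 columns
    shift, corr = estimate_azimuth_shift(aerial, ground)
    print(shift, shift * 360.0 / aerial.shape[1])  # offset in columns and degrees
```

The peak value of the correlation also serves as a similarity score between the two views, which matches the abstract's remark that the correlation gives a better measure of feature similarity as well as the orientation.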

preprint2020arXiv

Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison

Vision-based sign language recognition aims at helping deaf people to communicate with others. However, most existing sign language datasets are limited to a small number of words. Due to the limited vocabulary size, models learned from those datasets cannot be applied in practice. In this paper, we introduce a new large-scale Word-Level American Sign Language (WLASL) video dataset, containing more than 2000 words performed by over 100 signers. This dataset will be made publicly available to the research community. To our knowledge, it is by far the largest public ASL dataset to facilitate word-level sign recognition research. Based on this new large-scale dataset, we are able to experiment with several deep learning methods for word-level sign recognition and evaluate their performances in large scale scenarios. Specifically, we implement and compare two different models, i.e., (i) holistic visual appearance-based approach, and (ii) 2D human pose based approach. Both models are valuable baselines that will benefit the community for method benchmarking. Moreover, we also propose a novel pose-based temporal graph convolution networks (Pose-TGCN) that models spatial and temporal dependencies in human pose trajectories simultaneously, which has further boosted the performance of the pose-based method. Our results show that pose-based and appearance-based models achieve comparable performances up to 66% at top-10 accuracy on 2,000 words/glosses, demonstrating the validity and challenges of our dataset. Our dataset and baseline deep models are available at https://dxli94.github.io/WLASL/.

preprint2014arXiv

Global existence of null-form wave equations on small asymptotically Euclidean manifolds

We prove the global existence of the small solutions to the Cauchy problem for quasilinear wave equations satisfying the null condition on $(R^3, g)$, where the metric $g$ is a small perturbation of the flat metric and approaches the Euclidean metric like $(1+|x|)^{-a}$ with $a>1$. Global and almost global existence for systems without the null condition are also discussed for certain small time-dependent perturbations of the flat metric in the appendix.

preprint2014arXiv

Perturbative QCD study of $B_s$ decays to a pseudoscalar meson and a tensor meson

We study two-body hadronic $B_s\to PT$ decays, with $P (T)$ being a light pseudoscalar (tensor) meson, in the perturbative QCD approach. The CP-averaged branching ratios and the direct CP asymmetries of the $ΔS=0$ modes are predicted, where $ΔS$ is the difference between the strange numbers of final and initial states. We also define and calculate experimental observables for the $ΔS=1$ modes under the $B_s^0-\bar{B}_s^0$ mixing, including CP averaged branching ratios, time-integrated CP asymmetries, and the CP observables $C_{f}$, $D_{f}$ and $S_{f}$. Results are compared to the $B_s\to PV$ ones in the literature, and to the $B\to PT$ ones, which indicate considerable U-spin symmetry breaking. Our work provides theoretical predictions for the $B_s\to PT$ decays for the first time, some of which will be potentially measurable at future experiments.

preprint2014arXiv

Semileptonic decays $B_c^+\to D^{(*)}_{(s)}(l^+ν,l^+l^-,ν\barν)$ in the perturbative QCD approach

In this paper we study the semileptonic decays of $B_c^+\to D^{(*)}_{(s)}(l^+ν_l,l^+l^-,ν\barν)$ (here $l$ stands for $e$, $μ$, or $τ$). After evaluating the $B_c^+ \to (D_{(s)},D^*_{(s)})$ transition form factors $F_{0,+,T}(q^2)$ and $V(q^2), A_{0,1,2}(q^2), T_{1,2,3}(q^2)$ by employing the perturbative QCD factorization approach, we calculate the branching ratios for all these semileptonic decays. Our predictions for the values of the $B_c^+ \to D_{(s)}$ and $B_c^+ \to D^*_{(s)}$ transition form factors are consistent with those obtained by using other methods. The branching ratios of the decay modes with $\barνν$ are almost an order of magnitude larger than the corresponding decays with $l^+l^-$ after the summation over the three neutrino generations. The branching ratios for the decays with $b\to d$ transitions are much smaller than those decays with the $b\to s$ transitions, due to the Cabibbo-Kobayashi-Maskawa suppression. We define ratios $R_D$ and $R_{D^*}$ for the branching ratios with the $τ$ lepton versus $μ$, $e$ lepton final states to cancel the uncertainties of the form factors, which could possibly be tested in the near future.

preprint2014arXiv

The NLO twist-3 contributions to $B \to π$ form factors in $k_{T}$ factorization

In this paper, we calculate the next-to-leading-order (NLO) twist-3 contribution to the form factors of $B \to π$ transitions by employing the $k_{T}$ factorization theorem. All the infrared divergences regulated by the logarithms $\ln(k_{iT}^{2})$ cancel between those from the quark diagrams and from the effective diagrams for the initial $B$ meson wave function and the final pion meson wave function. An infrared finite NLO hard kernel is therefore obtained, which confirms the application of the $k_{T}$ factorization theorem to $B$ meson semileptonic decays at twist-3 level. From our analytical and numerical evaluations, we find that the NLO twist-3 contributions to the form factors $f^{+,0}(q^2)$ of $B \to π$ transition are similar in size to the NLO twist-2 contribution but opposite in sign, which leads to a large cancelation between these two NLO parts. For the case of $f^+(0)$, for example, the $24\%$ NLO twist-2 enhancement to the full LO prediction is largely canceled by the negative (about $-17\%$) NLO twist-3 contribution, leaving a small and stable $7\%$ enhancement to the full LO prediction in the whole range of $0\leq q^2\leq 12$ GeV$^2$. At the full NLO level, the perturbative QCD prediction is $F^{B \to π}(0)=0.269^{+0.054}_{-0.050}$. We also studied the possible effects on the pQCD predictions when different sets of the B meson and pion distribution amplitudes are used in the numerical evaluation.

preprint2013arXiv

Time-dependent CP-violations of B(Bs) decays in the perturbative QCD approach

We study the decay modes of B_{s}^{0}(\bar{B}_{s}^{0})-->D_{s}^{\pm} K^{\mp}, B_{s}^{0}(\bar{B}_{s}^{0})-->D^{\pm} π^{\mp} and B^{0}(\bar{B}^{0})-->D^{\pm} π^{\mp} in the perturbative QCD approach based on k_T factorization, including the branching ratios and CP violation parameters which provide a clear way to extract the Cabibbo-Kobayashi-Maskawa angle γ. Our results of branching ratios of B_{s}^{0}(\bar{B}_{s}^{0})-->D_{s}^{\pm} K^{\mp} and B^{0}(\bar{B}^{0})--> D^{\pm} π^{\mp} and the CP asymmetry of B^{0}(\bar{B}^{0})--> D^{\pm} π^{\mp} agree well with the experimental data. We also give the predictions of the other observables, which provide some guidance for experiments in the future, especially for the LHCb experiment.

preprint2012arXiv

Charmed B(B_{s}) decays involving a light tensor meson in PQCD approach

We study the $B(B_{s})\rightarrow D_{(s)}(\bar{D}_{(s)})\,T$ and $D_{(s)}^{\ast}(\bar{D}_{(s)}^{*})\,T$ decays in the perturbative QCD approach, where T denotes a light tensor meson. In addition to the branching ratios, we also give predictions for the polarization fractions of the $D^{*}_{(s)}(\bar{D}_{(s)}^{*})T$ decays. For those decays with a tensor meson emitted, the factorizable emission diagrams do not contribute because of Lorentz covariance considerations. In order to give predictions for these decay channels, it is necessary to go beyond the naive factorization to calculate the nonfactorizable and annihilation diagrams.

preprint2012arXiv

Charmed Scalar Meson Production in $B$ Decays

The study of charmed scalar meson spectroscopy has become a hot topic both experimentally and theoretically. The $B_{(s)}$ decays provide an ideal place to study their properties. We employ the $B$-meson light-cone sum rules to compute the $\bar B_s^0\to D_s^{*+}(2317)$ and $B^-\to D_0^{*0}(2400)$ transition form factors at large recoil, assuming $D_s^{*+}(2317)$ and $D_0^{*0}(2400)$ to be scalar quark-anti-quark states. The results are extrapolated to the whole momentum region with the help of HQET. Considering large uncertainties, our results can be consistent with the previous studies, while the power corrections should be large. We also estimate the semi-leptonic decays $\bar B_s^0\to D_s^{*+}(2317)l \barν_l$ and $B^-\to D_0^{*0}(2400)l \barν_l$. The branching fraction of the semi-leptonic $\bar B_s^0\to D_s^{*+}(2317)l \barν_l$ decay is around $6\times 10^{-3} $ for light leptons and $0.8\times 10^{-3} $ for tau final state. The predicted branching ratio of $B^-\to D_0^{*0}(2400)l \barν_l$ is slightly larger than that of $\bar B_s^0\to D_s^{*+}(2317)l \barν_l$, and we hope the future data in LHCb can test these results.

preprint2012arXiv

Nonleptonic two-body charmless B decays involving a tensor meson in the Perturbative QCD Approach

Two-body charmless hadronic $B$ decays involving a light tensor meson in the final state are studied in the perturbative QCD approach based on $k_T$ factorization. From our calculations, we find that the branching ratios for the color-allowed tree-dominated $B\to a_{2}^{0}\pi^{+}$ and $B\to a_{2}^{-}\pi^{+}$ modes are of order $10^{-6}$ and $10^{-5}$, respectively, while the other color-suppressed tree-dominated decays have very small branching ratios. In general, the branching ratios of most decays are in the range of $10^{-5}$ to $10^{-8}$, which is larger by one or two orders of magnitude than the predictions obtained in the Isgur-Scora-Grinstein-Wise II model and in the covariant light-front approach, but consistent with recent experimental measurements and the QCD factorization calculations. Since decays with a tensor meson emitted from the vacuum are prohibited in naive factorization, the contributions of nonfactorizable and annihilation diagrams, which are calculable in our perturbative QCD approach, are very important to these decays. We also give predictions for the direct CP asymmetries, some of which are large enough for future experiments to measure. Because we considered the mixing between $f_{2}$ and $f_{2}^{\prime}$, the decay rates are enhanced significantly for some decays involving the $f_{2}^{\prime}$ meson, even with a small mixing angle.

preprint2012arXiv

Recent works on the Strauss conjecture

In this review paper, we summarize the current state of the art on the Strauss conjecture with nontrapping obstacles. Among other things, three essential estimates are emphasized and presented: Morawetz-KSS estimates (also known as local energy estimates), weighted Strichartz estimates, and generalized Strichartz estimates.

preprint2012arXiv

The $B_c\rightarrow D^{(*)}T$ decays in perturbative QCD approach

In this work, we investigate the $B_{c}\rightarrow D^{(*)}T$ decays in the perturbative QCD approach, based on $k_T$ factorization, where $T$ denotes a light tensor meson. For all decays considered in this work, there are no contributions from factorizable emission diagrams because the emitted meson is a tensor meson. We find that the annihilation amplitudes, which are calculable only in the pQCD approach, are dominant in these decays because of the large Cabibbo-Kobayashi-Maskawa elements. The numerical results show that the predicted branching ratios of most decays are of the order of $10^{-6}$ or even larger, which can be observed in ongoing experiments. We also predict a large fraction of transverse polarization in the $B_{c} \rightarrow D^{*}T$ decay channels dominated by $W$-annihilation diagrams.

preprint2012arXiv

The $B(B_{s})\rightarrow D_{(s)}(\bar{D}_{(s)}) T$ and $D_{(s)}^{*}(\bar{D}_{(s)}^{*})T$ decays in perturbative QCD approach

In the perturbative QCD approach, we investigate the $B(B_{s})\rightarrow D_{(s)}(\bar{D}_{(s)})\,T$ and $D_{(s)}^{*}(\bar{D}_{(s)}^{*})\,T$ decays, which include the Cabibbo-Kobayashi-Maskawa (CKM) favored decays and the CKM-suppressed decays, where $T$ denotes a light tensor meson. From our calculation, we find that the nonfactorizable emission diagrams and the annihilation-type diagrams are important, especially for the color-suppressed channels. For those decays with a tensor meson emitted, the factorizable emission diagrams vanish because a tensor meson cannot be produced through the local $(V-A)$ or tensor current. The numerical results show that the predicted branching ratios of the charmed $B$ decays considered are in the range of $10^{-4}$ to $10^{-6}$ for the CKM-favored decays (governed by $|V_{cb}|$) and in the range of $10^{-5}$ to $10^{-8}$ for the CKM-suppressed decays (governed by $|V_{ub}|$). We also predict large transverse polarization contributions in many of the $B(B_{s}) \rightarrow D_{(s)}^{\ast}(\bar{D}_{(s)}^{*})T$ decay channels.

preprint2011arXiv

Circuit Modeling of Tunneling Real-Space Transfer Transistors: Toward Terahertz Frequency Operation

High-frequency operation of the tunneling real-space transfer transistor (TRSTT) in the negative differential resistance (NDR) regime is assessed by calculating the range of the device's common-source unity-current-gain frequency ($f_T$) with a small-signal equivalent circuit model that includes tunneling. Our circuit model is based on an In$_{0.2}$Ga$_{0.8}$As and delta-doped GaAs dual-channel structure with various gate lengths. The calculated TRSTT $f_T$ agrees very well with experimental data, the limiting factor being the resistance of the delta-doped GaAs layer. By optimizing the gate dimensions and channel materials, we find that $f_T$ in the NDR region approaches the terahertz range, which suggests the potential use of TRSTTs as terahertz sources.

preprint2011arXiv

Concerning the Strauss conjecture on asymptotically Euclidean manifolds

In this paper we verify the Strauss conjecture for semilinear wave equations on asymptotically Euclidean manifolds when $n=3,4$. We also give an almost sharp life span for the subcritical case $2\le p<p_c$ when $n=3$. The main ingredients include a KSS-type estimate with $0<\mu<1/2$ and weighted Strichartz estimates of order two.

preprint2011arXiv

Generalized and weighted Strichartz estimates

In this paper, we explore the relations between different kinds of Strichartz estimates and give new estimates in Euclidean space $\mathbb{R}^n$. In particular, we prove the generalized and weighted Strichartz estimates for a large class of dispersive operators, including the Schrödinger and wave equations. As a sample application of these new estimates, we are able to prove the Strauss conjecture with low regularity in dimensions 2 and 3.

preprint2011arXiv

Generalized Strichartz Estimates on Perturbed Wave Equation and Applications on Strauss Conjecture

In this paper we show a general Strichartz estimate for certain perturbed wave equations. Here we can drop the nontrapping hypothesis and handle trapping obstacles with some loss of derivatives for data in the local energy decay estimates. We then give the obstacle version of the sharp life span for semilinear wave equations when $n=3$, $p<p_c$, by using a real interpolation method and obtaining the corresponding finite-time Strichartz estimates (see Section 3). Finally, as an application of the general Strichartz estimates we have obtained, we prove the Strauss conjecture for semilinear wave equations with several convex obstacles when $n=3,4$ (see Section 4).

53 published item(s)

ResiHMR: Residual-Limb Aware Single-Image 3D Human Mesh Recovery for Individuals with Limb Loss

Mirage: One-Step Video Diffusion for Photorealistic and Coherent Asset Editing in Driving Scenes

EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation

ESP: Exploiting Symmetry Prior for Multi-Agent Reinforcement Learning

Getting Away with More Network Pruning: From Sparsity to Geometry and Linear Regions

Leveraging Partial Symmetry for Multi-Agent Reinforcement Learning

MHR-Net: Multiple-Hypothesis Reconstruction of Non-Rigid Shapes from 2D Views

A Novel Position-based VR Online Shopping Recommendation System based on Optimized Collaborative Filtering Algorithm

Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image Matching

Characterizing Renal Structures with 3D Block Aggregate Transformers

CVLNet: Cross-View Semantic Correspondence Learning for Video-based Camera Localization

Gait Recognition with Mask-based Regularization

GaitGL: Learning Discriminative Global-Local Feature Representations for Gait Recognition

Geometry-Guided Street-View Panorama Synthesis from Satellite Imagery

Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation

Learning Implicit Body Representations from Double Diffusion Based Neural Radiance Fields

Pro-UIGAN: Progressive Face Hallucination from Occluded Thumbnails

Pseudo-Label Guided Multi-Contrast Generalization for Non-Contrast Organ-Aware Segmentation

PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences

Realization of a photonic topological insulator in Kagome crystals at terahertz wavelengths

RGB-D Saliency Detection via Cascaded Mutual Information Minimization

Surrogate Neural Network Model for Sensitivity Analysis and Uncertainty Quantification of the Mechanical Behavior in the Optical Lens-Barrel Assembly

The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks

Towards Efficient and Scale-Robust Ultra-High-Definition Image Demoireing

Video Demoireing with Relation-Based Temporal Consistency

ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring

Iterative Optimisation with an Innovation CNN for Pose Refinement

Modeling the Probabilistic Distribution of Unlabeled Data for One-shot Medical Image Segmentation

6DoF Object Pose Estimation via Differentiable Proxy Voting Loss

Copy and Paste GAN: Face Hallucination from Shaded Thumbnails

Face Hallucination with Finishing Touches

Learning Object Relation Graph and Tentative Policy for Visual Navigation

LyRN (Lyapunov Reaching Network): A Real-Time Closed Loop approach from Monocular Vision

Transferring Cross-domain Knowledge for Video Sign Language Recognition

Using observed bacteria concentration and modeled transit time under an analytical framework to estimate overall removal rate of fecal coliform in an estuary

Weakly-Supervised Salient Object Detection via Scribble Annotations

Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching

Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison

Global existence of null-form wave equations on small asymptotically Euclidean manifolds

Perturbative QCD study of $B_s$ decays to a pseudoscalar meson and a tensor meson

Semileptonic decays $B_c^+\to D^{(*)}_{(s)}(l^+\nu,l^+l^-,\nu\bar{\nu})$ in the perturbative QCD approach

The NLO twist-3 contributions to $B \to \pi$ form factors in $k_{T}$ factorization

Time-dependent CP-violations of B(Bs) decays in the perturbative QCD approach

Charmed B(B_{s}) decays involving a light tensor meson in PQCD approach

Charmed Scalar Meson Production in $B$ Decays

Nonleptonic two-body charmless B decays involving a tensor meson in the Perturbative QCD Approach

Recent works on the Strauss conjecture

The $B_c\rightarrow D^{(*)}T$ decays in perturbative QCD approach

The $B(B_{s})\rightarrow D_{(s)}(\bar{D}_{(s)}) T$ and $D_{(s)}^{*}(\bar{D}_{(s)}^{*})T$ decays in perturbative QCD approach

Circuit Modeling of Tunneling Real-Space Transfer Transistors: Toward Terahertz Frequency Operation

Concerning the Strauss conjecture on asymptotically Euclidean manifolds

Generalized and weighted Strichartz estimates

Generalized Strichartz Estimates on Perturbed Wave Equation and Applications on Strauss Conjecture
