Source author record

Li Cheng

Li Cheng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Machine Learning, cond-mat.mtrl-sci, eess.IV, cond-mat.str-el, eess.AS, hep-ex, hep-ph, physics.app-ph, physics.ins-det, Sound, Artificial Intelligence, cond-mat.mes-hall, cond-mat.other, cs.CY, Multimedia, nlin.AO, physics.comp-ph

Catalog footprint

What is connected

32 works

18 topics

4 close collaborators

preprint · 2022 · arXiv

Band degeneration and evolution in nonlinear triatomic chain superlattices

Nonlinear superlattices exhibit unique features allowing for wave manipulations. Despite the increasing attention received, the underlying physical mechanisms and the evolution process of the band structures and bandgaps in strongly nonlinear superlattices remain unclear. Here we establish and examine strongly nonlinear superlattice models (three triatomic models) to show the evolution process of typical nonlinear band structures based on analytical and numerical approaches. We find that the strongly nonlinear superlattices present particular band degeneration and bifurcation, accompanied by the vibration mode transfer in their unit cells. The evolution processes and the physical mechanisms of the band degeneration in different models are clarified with the consideration of the mode transfer. The observed degeneration may occur as the shifting, bifurcating, shortening, merging or disappearing of dispersion curves, all depending on the arrangement of the coupled nonlinear elements. Meanwhile, the dimension of the unit cell reduces, alongside changes in the frequency range and mechanisms (Bragg and local resonance) of the bandgaps. These findings answer some fundamental questions pertinent to the study of nonlinear periodic structures, nonlinear crystals and nonlinear metamaterials, which are of interest to the broad community of physics.

preprint · 2022 · arXiv

Dual Learning Music Composition and Dance Choreography

Music and dance have always co-existed as pillars of human activities, contributing immensely to the cultural, social, and entertainment functions in virtually all societies. Notwithstanding the gradual systematization of music and dance into two independent disciplines, their intimate connection is undeniable and one art form often appears incomplete without the other. Recent research has studied generative models for dance sequences conditioned on music. The dual task of composing music for given dances, however, has been largely overlooked. In this paper, we propose a novel extension, where we jointly model both tasks in a dual learning approach. To leverage the duality of the two modalities, we introduce an optimal transport objective to align feature embeddings, as well as a cycle consistency loss to foster overall consistency. Experimental results demonstrate that our dual learning framework improves individual task performance, delivering generated music compositions and dance choreographies that are realistic and faithful to the conditioned inputs.

preprint · 2022 · arXiv

Music-to-Dance Generation with Optimal Transport

Dance choreography for a piece of music is a challenging task, having to be creative in presenting distinctive stylistic dance elements while taking into account the musical theme and rhythm. It has been tackled by different approaches such as similarity retrieval, sequence-to-sequence modeling and generative adversarial networks, but their generated dance sequences often fall short in motion realism, diversity and music consistency. In this paper, we propose a Music-to-Dance with Optimal Transport Network (MDOT-Net) for learning to generate 3D dance choreographies from music. We introduce an optimal transport distance for evaluating the authenticity of the generated dance distribution and a Gromov-Wasserstein distance to measure the correspondence between the dance distribution and the input music. This gives a well-defined and non-divergent training objective that mitigates the limitations of standard GAN training, which is frequently plagued by instability and divergent generator loss. Extensive experiments demonstrate that our MDOT-Net can synthesize realistic and diverse dances which achieve an organic unity with the input music, reflecting the shared intentionality and matching the rhythmic articulation. Sample results are found at https://www.youtube.com/watch?v=dErfBkrlUO8.
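To make the idea of an optimal transport distance concrete, here is a minimal sketch for the simplest possible case: two equal-size sets of 1D samples, where the optimal plan just pairs the sorted values. This is an illustration only; the function name `wasserstein_1d` is hypothetical, and the abstract does not specify how MDOT-Net computes its distances over high-dimensional dance features.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1D empirical samples.

    Illustrative assumption: both samples carry uniform weights and have
    the same number of points, which is not stated in the abstract.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # In 1D the optimal transport plan matches the sorted samples in order.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)


if __name__ == "__main__":
    generated = [0.0, 1.0, 2.0]
    reference = [3.0, 1.0, 2.0]
    print(wasserstein_1d(generated, reference))
```

Unlike a discriminator-based GAN loss, this quantity is bounded below by zero and shrinks smoothly as the two samples approach each other, which is the general property the abstract appeals to.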

preprint · 2022 · arXiv

Object Wake-up: 3D Object Rigging from a Single Image

Given a single image of a general object such as a chair, could we also restore its articulated 3D shape similar to human modeling, so as to animate its plausible articulations and diverse motions? This is an interesting new question that may have numerous downstream augmented reality and virtual reality applications. Compared with previous efforts on object manipulation, our work goes beyond 2D manipulation and rigid deformation, and involves articulated manipulation. To achieve this goal, we propose an automated approach to build such 3D generic objects from single images and embed articulated skeletons in them. Specifically, our framework starts by reconstructing the 3D object from an input image. Afterwards, to extract skeletons for generic 3D objects, we develop a novel skeleton prediction method with a multi-head structure for skeleton probability field estimation using deep implicit functions. A dataset of generic 3D objects with ground-truth annotated skeletons is collected. Empirically, our approach demonstrates satisfactory performance on public datasets as well as our in-house dataset; our results surpass the state of the art by a noticeable margin on both 3D reconstruction and skeleton prediction.

preprint · 2022 · arXiv

Promoting Saliency From Depth: Deep Unsupervised RGB-D Saliency Detection

Growing interest in RGB-D salient object detection (RGB-D SOD) has been witnessed in recent years, owing partly to the popularity of depth sensors and the rapid progress of deep learning techniques. Unfortunately, existing RGB-D SOD methods typically demand a large quantity of training images thoroughly annotated at the pixel level. The laborious and time-consuming manual annotation has become a real bottleneck in various practical scenarios. On the other hand, current unsupervised RGB-D SOD methods still rely heavily on handcrafted feature representations. This inspires us to propose in this paper a deep unsupervised RGB-D saliency detection approach, which requires no manual pixel-level annotation during training. It is realized by two key ingredients in our training pipeline. First, a depth-disentangled saliency update (DSU) framework is designed to automatically produce pseudo-labels with iterative follow-up refinements, which provides more trustworthy supervision signals for training the saliency network. Second, an attentive training strategy is introduced to tackle the issue of noisy pseudo-labels, by properly re-weighting to highlight the more reliable pseudo-labels. Extensive experiments demonstrate the superior efficiency and effectiveness of our approach in tackling the challenging unsupervised RGB-D SOD scenarios. Moreover, our approach can also be adapted to work in the fully supervised situation. Empirical studies show that incorporating our approach gives rise to a notable performance improvement in existing supervised RGB-D SOD models.

preprint · 2022 · arXiv

TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts

Inspired by the strong ties between vision and language, the two intimate human sensing and communication modalities, our paper aims to explore the generation of 3D human full-body motions from texts, as well as its reciprocal task, shorthanded as text2motion and motion2text, respectively. To tackle the existing challenges, especially to enable the generation of multiple distinct motions from the same text, and to avoid the undesirable production of trivial motionless pose sequences, we propose the use of motion tokens, a discrete and compact motion representation. This provides a level playing field when considering both motion and text signals, as motion and text tokens, respectively. Moreover, our motion2text module is integrated into the inverse alignment process of our text2motion training pipeline, where a significant deviation of synthesized text from the input text would be penalized by a large training loss; empirically this is shown to effectively improve performance. Finally, the mappings between the two modalities of motions and texts are facilitated by adapting the neural model for machine translation (NMT) to our context. This autoregressive modeling of the distribution over discrete motion tokens further enables non-deterministic production of pose sequences, of variable lengths, from an input text. Our approach is flexible and can be used for both text2motion and motion2text tasks. Empirical evaluations on two benchmark datasets demonstrate the superior performance of our approach on both tasks over a variety of state-of-the-art methods. Project page: https://ericguo5513.github.io/TM2T/

preprint · 2021 · arXiv

Deep Learning for Visual Tracking: A Comprehensive Survey

Visual target tracking is one of the most sought-after yet challenging research topics in computer vision. Given the ill-posed nature of the problem and its popularity in a broad range of real-world scenarios, a number of large-scale benchmark datasets have been established, on which a considerable number of methods have been developed and demonstrated with significant progress in recent years -- predominantly by recent deep learning (DL)-based methods. This survey aims to systematically investigate the current DL-based visual tracking methods, benchmark datasets, and evaluation metrics. It also extensively evaluates and analyzes the leading visual tracking methods. First, the fundamental characteristics, primary motivations, and contributions of DL-based methods are summarized from nine key aspects: network architecture, network exploitation, network training for visual tracking, network objective, network output, exploitation of correlation filter advantages, aerial-view tracking, long-term tracking, and online tracking. Second, popular visual tracking benchmarks and their respective properties are compared, and their evaluation metrics are summarized. Third, the state-of-the-art DL-based methods are comprehensively examined on a set of well-established benchmarks: OTB2013, OTB2015, VOT2018, LaSOT, UAV123, UAVDT, and VisDrone2019. Finally, by conducting critical analyses of these state-of-the-art trackers quantitatively and qualitatively, their pros and cons under various common scenarios are investigated. It may serve as a gentle usage guide for practitioners to weigh when and under what conditions to choose which method(s). It also facilitates a discussion on ongoing issues and sheds light on promising research directions.

preprint · 2021 · arXiv

Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction

Predicting human motion from a historical pose sequence is crucial for a machine to succeed in intelligent interactions with humans. One aspect that has been overlooked so far is that how we represent the skeletal pose has a critical impact on the prediction results. Yet no existing effort investigates different pose representation schemes. We conduct an in-depth study of various pose representations with a focus on their effects on the motion prediction task. Moreover, recent approaches build upon off-the-shelf RNN units for motion prediction. These approaches process the input pose sequence sequentially and inherently have difficulties in capturing long-term dependencies. In this paper, we propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction which simultaneously models local motion contexts and a global context. We further explore a geodesic loss and a forward kinematics loss for the motion prediction task, which have more geometric significance than the widely employed L2 loss. Interestingly, we applied our method to a range of articulated objects including human, fish, and mouse. Empirical results show that our approach outperforms the state-of-the-art methods in short-term prediction and achieves much enhanced long-term prediction proficiency, such as retaining natural human-like motions over 50-second predictions. Our code is released.
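One common way to define a geodesic loss on joint rotations is the rotation angle between a predicted and a target rotation matrix. The sketch below assumes that definition and a 3x3 nested-list representation; the abstract does not spell out the exact loss used in AHMR, and the names `rot_z` and `geodesic_distance` are hypothetical.

```python
import math


def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def geodesic_distance(r1, r2):
    """Angle (radians) of the relative rotation r1^T r2 between 3x3 rotations."""
    # trace(r1^T r2) equals the sum of elementwise products of r1 and r2.
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    cos_angle = (trace - 1.0) / 2.0
    # Clamp to guard against round-off pushing the value outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_angle)))


if __name__ == "__main__":
    predicted = rot_z(0.3)
    target = rot_z(1.0)
    print(geodesic_distance(predicted, target))  # relative angle in radians
```

Unlike an L2 penalty on raw matrix entries or Euler angles, this distance depends only on the relative rotation, so it is unaffected by how the rotation happens to be parameterized.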

preprint · 2021 · arXiv

Understanding the flat band in 1T-TaS2 using a rotated basis

Electronic flat bands serve as a unique platform to achieve strongly-correlated phases. The emergence of a flat band around the Fermi level in 1T-TaS$_2$, accompanying the development of a $\sqrt{13}\times\sqrt{13}$ charge density wave (CDW) superlattice, has long been noticed experimentally, but a transparent theoretical understanding remains elusive. We show that without CDW, the primary feature of the $1\times1$ bands can be fitted by a simple trigonometric function, and physically understood by choosing a rotated $\tilde{t}_{2g}$ basis with the principal axes aligned to the tilted TaS$_6$ octahedron. Using this basis, we trace the band evolution in the $\sqrt{13}\times\sqrt{13}$ superlattice by progressively including different CDW effects. We point out that CDW strongly rehybridizes the three $\tilde{t}_{2g}$ orbitals, which leads to the formation of a well-localized molecular orbital and spawns the flat band.

preprint · 2020 · arXiv

3D Human Shape Reconstruction from a Polarization Image

This paper tackles the problem of estimating the 3D body shape of clothed humans from single polarized 2D images, i.e. polarization images. Polarization images are known to be able to capture polarized reflected light that preserves rich geometric cues of an object, which has motivated their recent application in reconstructing the surface normals of objects of interest. Inspired by the recent advances in human shape estimation from single color images, in this paper, we attempt to estimate human body shapes by leveraging the geometric cues from single polarization images. A dedicated two-stage deep learning approach, SfP, is proposed: given a polarization image, stage one aims at inferring the fine-detailed body surface normal; stage two aims to reconstruct the 3D body shape with clothing details. Empirical evaluations on a synthetic dataset (SURREAL) as well as a real-world dataset (PHSPD) demonstrate the qualitative and quantitative performance of our approach in estimating human poses and shapes. This indicates that the polarization camera is a promising alternative to the more conventional color or depth imaging for human shape estimation. Further, normal maps inferred from polarization imaging play a significant role in accurately recovering the body shapes of clothed people.

preprint · 2020 · arXiv

Achromatic metasurfaces with inversely customized dispersion for ultra-broadband acoustic beam engineering

Metasurfaces, the ultrathin media with extraordinary wavefront modulation ability, have shown versatile potential in manipulating waves. However, existing acoustic metasurfaces are limited by their narrow-band frequency-dependent capability, which severely hinders their real-world applications that usually require customized dispersion. To address this bottleneck, we report ultra-broadband achromatic metasurfaces that are capable of delivering arbitrary and frequency-independent wave properties by bottom-up topology optimization. We successively demonstrate three ultra-broadband functionalities, including acoustic beam steering, focusing and levitation, featuring record-breaking relative bandwidths of 93.3%, 120% and 118.9%, respectively. All metasurface elements show novel asymmetric geometries containing multiple scatterers, curved air channels and local cavities. Moreover, we reveal that the inversely designed metasurfaces can support integrated internal resonances, bi-anisotropy and multiple scattering, which collectively form the mechanism underpinning the ultra-broadband customized dispersion. Our study opens new horizons for ultra-broadband high-efficiency achromatic functional devices on demand, with promising extension to optical and elastic achromatic metamaterials.

preprint · 2020 · arXiv

Action2Motion: Conditioned Generation of 3D Human Motions

Action recognition is a relatively established task, where given an input sequence of human motion, the goal is to predict its action category. This paper, on the other hand, considers a relatively new problem, which could be thought of as an inverse of action recognition: given a prescribed action type, we aim to generate plausible human motion sequences in 3D. Importantly, the set of generated motions are expected to maintain its diversity to be able to explore the entire action-conditioned motion space; meanwhile, each sampled sequence faithfully resembles a natural human body articulation dynamics. Motivated by these objectives, we follow the physics law of human kinematics by adopting the Lie Algebra theory to represent the natural human motions; we also propose a temporal Variational Auto-Encoder (VAE) that encourages a diverse sampling of the motion space. A new 3D human motion dataset, HumanAct12, is also constructed. Empirical experiments over three distinct human motion datasets (including ours) demonstrate the effectiveness of our approach.

preprint · 2020 · arXiv

COMET: Context-Aware IoU-Guided Network for Small Object Tracking

We consider the problem of tracking an unknown small target from aerial videos of medium to high altitudes. This is a challenging problem, which is even more pronounced in unavoidable scenarios of drastic camera motion and high density. To address this problem, we introduce a context-aware IoU-guided tracker (COMET) that exploits a multitask two-stream network and an offline reference proposal generation strategy. The proposed network fully exploits target-related information by multi-scale feature learning and attention modules. The proposed strategy introduces an efficient sampling strategy to generalize the network on the target and its parts without imposing extra computational complexity during online tracking. These strategies contribute considerably to handling significant occlusions and viewpoint changes. Empirically, COMET outperforms the state of the art on a range of aerial-view datasets that focus on tracking small objects. Specifically, COMET outperforms the celebrated ATOM tracker by an average margin of 6.2% (and 7%) in precision (and success) score on the challenging benchmarks of UAVDT, VisDrone-2019, and Small-90.
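The "IoU" in COMET's name is the intersection-over-union overlap between two bounding boxes, the standard quantity behind success scores in tracking benchmarks. A minimal sketch follows, assuming axis-aligned boxes in `(x_min, y_min, x_max, y_max)` form; the box format and the function name `iou` are illustrative choices, not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Overlap extent along each axis; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(iou(predicted, ground_truth))
```

For small targets a shift of a few pixels changes this ratio sharply, which is one reason small-object tracking is scored so harshly by overlap-based metrics.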

preprint · 2020 · arXiv

Giant renormalization of correlation strength in 1T-TaS2 by lattice vibration

The lattice thermodynamics of a 1T-TaS$_2$ layer, e.g. the spontaneous formation of a $\sqrt{13}\times\sqrt{13}$ commensurate charge density wave (CCDW) and vibrations around the equilibrium position, is calculated by ab initio molecular dynamics. Based on that, we examine how the ground-state electronic structure is renormalized by lattice temperature. We show that the band gap within the density functional theory plus onsite-U correction shrinks by half when the temperature rises from 0 K to 200 K. The gap size reduction is one order of magnitude larger than the temperature variation in energy. This giant temperature dependence is closely related to the CCDW-triggered Mottness in 1T-TaS$_2$, and is expected to result in unconventional thermodynamic properties.
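As a rough reading of the scale comparison in this abstract, the thermal energy corresponding to a 200 K temperature change follows from the Boltzmann constant. The implied gap reduction on the right is only an order-of-magnitude inference from the abstract's wording, not a value reported in it.

```latex
k_B \,\Delta T \approx \left(8.617\times10^{-5}\ \mathrm{eV/K}\right)\times 200\ \mathrm{K} \approx 17\ \mathrm{meV},
\qquad
\Delta E_{\mathrm{gap}} \sim 10\, k_B \,\Delta T \sim 0.17\ \mathrm{eV}.
```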

preprint · 2020 · arXiv

Outlier Detection Ensemble with Embedded Feature Selection

Feature selection plays an important role in improving the performance of outlier detection, especially for noisy data. Existing methods usually perform feature selection and outlier scoring separately, which would select feature subsets that may not optimally serve outlier detection, leading to unsatisfactory performance. In this paper, we propose an outlier detection ensemble framework with embedded feature selection (ODEFS) to address this issue. Specifically, for each random sub-sampling based learning component, ODEFS unifies feature selection and outlier detection into a pairwise ranking formulation to learn feature subsets that are tailored for the outlier detection method. Moreover, we adopt thresholded self-paced learning to simultaneously optimize feature selection and example selection, which helps to improve the reliability of the training set. After that, we design an alternating algorithm with proven convergence to solve the resultant optimization problem. In addition, we analyze the generalization error bound of the proposed framework, which provides a theoretical guarantee on the method and insightful practical guidance. Comprehensive experimental results on 12 real-world datasets from diverse domains validate the superiority of the proposed ODEFS.

preprint · 2020 · arXiv

Polarization Human Shape and Pose Dataset

Polarization images are known to be able to capture polarized reflected light that preserves rich geometric cues of an object, which has motivated their recent application in reconstructing detailed surface normals of objects of interest. Meanwhile, inspired by the recent breakthroughs in human shape estimation from a single color image, we attempt to investigate the new question of whether the geometric cues from a polarization camera could be leveraged in estimating detailed human body shapes. This has led to the curation of the Polarization Human Shape and Pose Dataset (PHSPD), our home-grown polarization image dataset of various human shapes and poses.

preprint · 2020 · arXiv

Reconstruct high-resolution multi-focal plane images from a single 2D wide field image

High-resolution 3D medical images are important for analysis and diagnosis, but axial scanning to acquire them is very time-consuming. In this paper, we propose a fast end-to-end multi-focal plane imaging network (MFPINet) to reconstruct high-resolution multi-focal plane images from a single 2D low-resolution wide-field image without relying on scanning. To acquire realistic MFP images fast, the proposed MFPINet adopts a generative adversarial network framework and the strategies of post-sampling and refocusing all focal planes at one time. We conduct a series of experiments on cytology microscopy images and demonstrate that MFPINet performs well on both axial refocusing and horizontal super resolution. Furthermore, MFPINet is approximately 24 times faster than current refocusing methods for reconstructing the same volume images. The proposed method has the potential to greatly increase the speed of high-resolution 3D imaging and expand the application of low-resolution wide-field images.

preprint · 2020 · arXiv

Renormalization of the Mott gap by lattice entropy: The case of 1T-TaS2

In many transition-metal oxides and dichalcogenides, the electronic and lattice degrees of freedom are strongly coupled, giving rise to remarkable phenomena, such as metal-insulator transition (MIT) and charge-density wave (CDW) order. We study this interplay by tracing the instant electronic structure under ab initio molecular dynamics. Applying this method to a 1T-TaS2 layer, we show that the CDW-triggered Mott gap undergoes a continuous reduction as the lattice temperature rises, despite a nearly constant CDW amplitude. Before the CDW order undergoes a sharp first-order transition around room temperature, the dynamical CDW fluctuation already shrinks the Mott gap size by half. The gap size reduction is one order of magnitude larger than the lattice temperature variation. Our calculation not only provides an important clue to understanding the thermodynamic behavior of 1T-TaS2, but also demonstrates a general approach to quantify the lattice entropy effect in MIT.

preprint · 2020 · arXiv

SparseFusion: Dynamic Human Avatar Modeling from Sparse RGBD Images

In this paper, we propose a novel approach to reconstruct 3D human body shapes based on a sparse set of RGBD frames using a single RGBD camera. We specifically focus on the realistic settings where human subjects move freely during the capture. The main challenge is how to robustly fuse these sparse frames into a canonical 3D model, under pose changes and surface occlusions. This is addressed by our new framework consisting of the following steps. First, based on a generative human template, an initial pairwise alignment is performed for every two frames having sufficient overlap. It is followed by a global non-rigid registration procedure, in which partial results from RGBD frames are collected into a unified 3D shape, under the guidance of correspondences from the pairwise alignment. Finally, the texture map of the reconstructed human model is optimized to deliver a clear and spatially consistent texture. Empirical evaluations on synthetic and real datasets demonstrate both quantitatively and qualitatively the superior performance of our framework in reconstructing complete 3D human models with high fidelity. It is worth noting that our framework is flexible, with potential applications going beyond shape reconstruction. As an example, we showcase its use in reshaping and reposing to a new avatar.

preprint · 2020 · arXiv

Stabilizing Training of Generative Adversarial Nets via Langevin Stein Variational Gradient Descent

Generative adversarial networks (GANs), famous for their capability of learning complex underlying data distributions, are however known to be tricky to train, which can result in mode collapse or performance deterioration. Current approaches to dealing with GANs' issues mostly rely on practical training techniques for the purpose of regularization, which on the other hand undermines the convergence and theoretical soundness of GANs. In this paper, we propose to stabilize GAN training via a novel particle-based variational inference method -- Langevin Stein variational gradient descent (LSVGD), which not only inherits the flexibility and efficiency of the original SVGD but also aims to address its instability issues by incorporating an extra disturbance into the update dynamics. We further demonstrate that by properly adjusting the noise variance, LSVGD simulates a Langevin process whose stationary distribution is exactly the target distribution. We also show that LSVGD dynamics has an implicit regularization which is able to enhance the particles' spread and diversity. Finally, we present an efficient way of applying particle-based variational inference to a general GAN training procedure no matter what loss function is adopted. Experimental results on one synthetic dataset and three popular benchmark datasets -- Cifar-10, Tiny-ImageNet and CelebA -- validate that LSVGD can remarkably improve the performance and stability of various GAN models.
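The particle update described here can be illustrated with a small 1D sketch: a standard SVGD step with an RBF kernel, plus an optional Gaussian perturbation. The noise scaling `noise_scale * sqrt(2 * step)` is an assumption made for illustration, since the abstract does not give the variance LSVGD actually uses, and the names `svgd_step`, `bandwidth` and `noise_scale` are hypothetical.

```python
import math
import random


def svgd_step(particles, grad_log_p, step=0.1, bandwidth=1.0,
              noise_scale=0.0, rng=random):
    """One SVGD update on 1D particles, with optional added Gaussian noise.

    noise_scale=0.0 gives plain SVGD; the noise term is an assumed form,
    not the variance derived in the paper.
    """
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / (2.0 * bandwidth ** 2))
            # Driving term pulls particles toward high-density regions.
            drive = k * grad_log_p(xj)
            # Repulsive term: derivative of the kernel with respect to xj.
            repulse = -(xj - xi) / bandwidth ** 2 * k
            phi += drive + repulse
        phi /= n
        noise = noise_scale * math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        updated.append(xi + step * phi + noise)
    return updated


if __name__ == "__main__":
    # Target: a unit-variance Gaussian centred at 2, so grad log p(x) = -(x - 2).
    points = [-1.0 + 0.1 * i for i in range(20)]
    for _ in range(500):
        points = svgd_step(points, lambda x: -(x - 2.0))
    print(sum(points) / len(points))
```

The repulsive term is what keeps the particles from collapsing onto the mode; the added noise is the extra disturbance the abstract refers to, included here only in a generic form.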

preprint · 2019 · arXiv

Robust 3D multi-polar acoustic metamaterials with broadband double negativity

Acoustic negative-index metamaterials show promise in achieving superlensing for diagnostic medical imaging. In spite of the recent progress made in this field, most metamaterials suffer from deficiencies such as low spatial symmetry, sophisticated labyrinth topologies and narrow-band features, which make them difficult to be utilized for symmetric subwavelength imaging applications. Here, we propose a category of robust multi-cavity metamaterials and reveal their common double-negative mechanism enabled by multi-polar (dipole, quadrupole and octupole) resonances in both two-dimensional (2D) and three-dimensional (3D) scenarios. In particular, we discover explicit relationships governing the double-negative frequency bounds from equivalent circuit analogy. Moreover, broadband single-source and double-source subwavelength imaging is realized and verified by 2D and 3D superlens. More importantly, the analogical 3D superlens can ensure the subwavelength imaging in all directions. The proposed multi-polar resonance-enabled robust metamaterials and design methodology open horizons for easier manipulation of subwavelength waves and realization of practical 3D metamaterial devices.

preprint · 2019 · arXiv

TBC-Net: A real-time detector for infrared small target detection using semantic constraint

Infrared small target detection is a key technique in infrared search and tracking (IRST) systems. Although deep learning has recently been widely used in vision tasks on visible-light images, it is rarely used in infrared small target detection due to the difficulty of learning small target features. In this paper, we propose a novel lightweight convolutional neural network, TBC-Net, for infrared small target detection. TBC-Net consists of a target extraction module (TEM) and a semantic constraint module (SCM), which are used to extract small targets from infrared images and to classify the extracted target images during training, respectively. Meanwhile, we propose a joint loss function and a training method. The SCM imposes a semantic constraint on the TEM by incorporating the high-level classification task, and solves the difficulty of learning features caused by class imbalance. During training, the targets are extracted from the input image and then classified by the SCM. During inference, only the TEM is used to detect the small targets. We also propose a data synthesis method to generate training data. The experimental results show that, compared with traditional methods, TBC-Net can better reduce the false alarms caused by complicated backgrounds, and that the proposed network structure and joint loss significantly improve small target feature learning. Besides, TBC-Net can achieve real-time detection on the NVIDIA Jetson AGX Xavier development board, which is suitable for applications such as field research with drones equipped with infrared sensors.

preprint · 2016 · arXiv

Hand Action Detection from Ego-centric Depth Sequences with Error-correcting Hough Transform

Detecting hand actions from ego-centric depth sequences is a practically challenging problem, owing mostly to the complex and dexterous nature of hand articulations as well as non-stationary camera motion. We address this problem via a Hough transform based approach coupled with a discriminatively learned error-correcting component to tackle the well-known issue of incorrect votes from the Hough transform. In this framework, local parts vote collectively for the start and end positions of each action over time. We also construct an in-house annotated dataset of 300 long videos, containing 3,177 single-action subsequences over 16 action classes collected from 26 individuals. Our system is empirically evaluated on this real-life dataset for both the action recognition and detection tasks, and is shown to produce satisfactory results. To facilitate reproduction, the new dataset and our implementation are also provided online.

preprint · 2016 · arXiv

Learning to Search on Manifolds for 3D Pose Estimation of Articulated Objects

This paper focuses on the challenging problem of 3D pose estimation of a diverse spectrum of articulated objects from single depth images. A novel structured prediction approach is considered, where 3D poses are represented as skeletal models that naturally operate on manifolds. Given an input depth image, the problem of predicting the most proper articulation of the underlying skeletal model is thus formulated as sequentially searching for the optimal skeletal configuration. This is subsequently addressed by convolutional neural nets trained end-to-end to render sequential prediction of the joint locations as regressing a set of tangent vectors of the underlying manifolds. Our approach is examined on various articulated objects including human hand, mouse, and fish benchmark datasets. Empirically it is shown to deliver highly competitive performance with respect to the state of the art, while operating in real-time (over 30 FPS).

preprint 2016 arXiv

Lie-X: Depth Image Based Articulated Object Pose Estimation, Tracking, and Action Recognition on Lie Groups

Pose estimation, tracking, and action recognition of articulated objects from depth images are important and challenging problems, which are normally considered separately. In this paper, a unified paradigm based on Lie group theory is proposed, which enables us to address these related problems collectively. Our approach is also applicable to a wide range of articulated objects. Empirically, it is evaluated on lab animals, including mouse and fish, as well as on the human hand. On these applications, it is shown to deliver competitive results compared to the state of the art and to non-trivial baselines, including convolutional neural networks and regression forest methods.

preprint 2016 arXiv

The test of the electronics system for the BESIII ETOF upgrade

It is proposed to upgrade the endcap time-of-flight (ETOF) detector of the Beijing Spectrometer III (BESIII) with multi-gap resistive plate chambers (MRPC), aiming at an overall time resolution of about 80 ps. After the entire electronics system is ready, several experiments, such as heat radiating, irradiation hardness and large-current beam tests, are carried out to certify the reliability and stability of the electronics. The on-detector test of the electronics is also performed with the beam at the BEPCII E3 line. The test results indicate that the electronics system fulfills its design requirements.

preprint 2016 arXiv

Two Higgs doublet models augmented by a scalar color octet

The LHC is now studying in detail the couplings of the Higgs boson in order to determine if there is new physics. Many recent studies have examined the available fits to Higgs couplings from the perspective of constraining two Higgs doublet models (2HDM). In this paper, we extend those studies to include constraints on the one-loop couplings of the Higgs to gluons and photons. These couplings are particularly sensitive to the existence of new colored particles that are otherwise hard to detect, and we use them to constrain a 2HDM augmented with a color-octet scalar, a possibility motivated by minimal flavor violation. We first study theoretical constraints on this model and then compare them with LHC measurements.

preprint 2015 arXiv

Mouse Pose Estimation From Depth Images

We focus on the challenging problem of efficient mouse 3D pose estimation based on static images, and especially single depth images. We introduce an approach to discriminatively train the split nodes of trees in a random forest to improve their performance in estimating the 3D joint positions of a mouse. Our algorithm is capable of working with different types of rodents and with different types of depth cameras and imaging setups. In particular, this paper demonstrates that when a top-mounted depth camera is combined with a bottom-mounted color camera, the final system is capable of delivering full-body pose estimation, including the four limbs and the paws. Empirical examinations on synthesized and real-world depth images confirm the applicability of our approach to mouse pose estimation, as well as to the closely related task of part-based labeling of a mouse.

preprint 2014 arXiv

Monte Carlo Simulation of RPC-based PET with GEANT4

Resistive Plate Chambers (RPC) are low-cost charged-particle detectors with good timing resolution and potentially good spatial resolution. Using RPCs as gamma detectors provides an opportunity for application in positron emission tomography (PET). In this work, we use the GEANT4 simulation package to study various methods of improving the detection efficiency of a realistic RPC-based PET model for 511 keV photons: adding more detection units, changing the thickness of each layer, choosing different converters and using the multi-gap RPC (MRPC) technique. The proper balance among these factors is discussed. It is found that although RPCs with materials of high atomic number can reach a higher efficiency, they may contribute to a poorer spatial resolution and a higher background level.

preprint 2014 arXiv

Top-quark forward-backward asymmetry from a color-octet t-channel resonance

We consider new physics contributions to the top-quark forward-backward asymmetry from a neutral $V^0_8$ or charged $V^+_8$ color-octet vector exchanged in the $t$-channel. We study the phenomenological constraints on these particles arising from the Tevatron and LHC7 measurements and compare them with those on their color-singlet counterparts $Z^\prime$ and $W^\prime$. We find that the color octets fare better than the singlets in that they generate a lower $A_C$, a lower high-invariant-mass cross-section at LHC7 and a lower same-sign top-pair cross-section. However, they also generate a lower $A_{FB}$ than their color-singlet counterparts.

preprint 2014 arXiv

Transduction on Directed Graphs via Absorbing Random Walks

In this paper, we consider the problem of graph-based transductive classification, and we are particularly interested in the directed graph scenario, which is a natural form for many real-world applications. Unlike existing research efforts that either deal only with undirected graphs or circumvent directionality by means of symmetrization, we propose a novel random walk approach on directed graphs using absorbing Markov chains, which can be regarded as maximizing the accumulated expected number of visits from the unlabeled transient states. Our algorithm is simple, easy to implement, and works with large-scale graphs. In particular, it is capable of preserving the graph structure even when the input graph is sparse and changes over time, as well as retaining weak signals present in the directed edges. We present its close connections to a number of existing methods, including graph kernels, graph Laplacian based methods, and, interestingly, spanning forests of graphs. Its computational complexity and generalization error are also studied. Empirically, our algorithm is systematically evaluated on a wide range of applications, where it is shown to perform competitively compared to a suite of state-of-the-art methods.
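
As a rough illustration of classifying with absorbing Markov chains on a directed graph, the sketch below treats labeled nodes as absorbing states and unlabeled nodes as transient states, then assigns each unlabeled node the class it is most likely to be absorbed into. This is a simplified textbook variant chosen for brevity, not the paper's objective, which the abstract describes in terms of expected numbers of visits. The function names and the toy graph are hypothetical.

```python
def absorption_probabilities(edges, labels, num_iters=200):
    """Estimate per-class absorption probabilities by fixed-point iteration.

    edges:  dict mapping each node to a list of its successors (directed).
    labels: dict mapping labeled (absorbing) nodes to their class.
    A walker at an unlabeled node moves to a uniformly random successor;
    a walker reaching a labeled node stops there.
    """
    classes = sorted(set(labels.values()))
    nodes = set(edges) | set(labels)
    for successors in edges.values():
        nodes.update(successors)
    prob = {n: {c: 0.0 for c in classes} for n in nodes}
    for node, cls in labels.items():
        prob[node][cls] = 1.0
    for _ in range(num_iters):
        updated = {}
        for n in nodes:
            successors = edges.get(n, [])
            if n in labels or not successors:
                updated[n] = prob[n]  # absorbing or dead-end node: unchanged
            else:
                updated[n] = {
                    c: sum(prob[v][c] for v in successors) / len(successors)
                    for c in classes
                }
        prob = updated
    return prob


def classify_unlabeled(edges, labels):
    """Assign each unlabeled node the class with highest absorption probability."""
    prob = absorption_probabilities(edges, labels)
    return {n: max(p, key=p.get) for n, p in prob.items() if n not in labels}


# Toy directed graph: x and y are labeled, a, b and c are not.
toy_edges = {"a": ["b"], "b": ["x", "c"], "c": ["y", "b"]}
toy_labels = {"x": "pos", "y": "neg"}
toy_prob = absorption_probabilities(toy_edges, toy_labels)
toy_pred = classify_unlabeled(toy_edges, toy_labels)
```

On the toy graph, node `b` is absorbed at `x` with probability 2/3 and node `c` with probability 1/3, so `a` and `b` are assigned `pos` and `c` is assigned `neg`. Edge direction matters here: `a` points to `b` but nothing points back to `a`, so `a` inherits `b`'s probabilities without influencing them.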

preprint 2012 arXiv

Big-Five Personality Prediction Based on User Behaviors at Social Network Sites

Many customer services are already available at Social Network Sites (SNSs), including user recommendation and media interaction, to name a few. There is a strong desire to provide online users with more dedicated and personalized services that fit each individual's needs, which usually depend strongly on the user's inner personality. However, little has been done to conduct proper psychological analysis, which is crucial for explaining users' outer behaviors from their inner personalities. In this paper, we propose an approach that intends to facilitate this line of research by directly predicting the so-called Big-Five Personality from users' SNS behaviors. In comparison with conventional inventory-based psychological analysis, we demonstrate via experimental studies that users' personalities can be predicted with reasonable precision based on their online behaviors. Besides confirming some earlier behavior-personality correlation results, our experiments show that extraversion is positively related to one's status republishing proportion and neuroticism is positively related to the proportion of one's angry blogs (blogs making people angry).

32 published item(s)

Band degeneration and evolution in nonlinear triatomic chain superlattices

Dual Learning Music Composition and Dance Choreography

Music-to-Dance Generation with Optimal Transport

Object Wake-up: 3D Object Rigging from a Single Image

Promoting Saliency From Depth: Deep Unsupervised RGB-D Saliency Detection

TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts

Deep Learning for Visual Tracking: A Comprehensive Survey

Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction

Understanding the flat band in 1T-TaS2 using a rotated basis

3D Human Shape Reconstruction from a Polarization Image

Achromatic metasurfaces with inversely customized dispersion for ultra-broadband acoustic beam engineering

Action2Motion: Conditioned Generation of 3D Human Motions

COMET: Context-Aware IoU-Guided Network for Small Object Tracking

Giant renormalization of correlation strength in 1T-TaS2 by lattice vibration

Outlier Detection Ensemble with Embedded Feature Selection

Polarization Human Shape and Pose Dataset

Reconstruct high-resolution multi-focal plane images from a single 2D wide field image

Renormalization of the Mott gap by lattice entropy: The case of 1T-TaS2

SparseFusion: Dynamic Human Avatar Modeling from Sparse RGBD Images

Stabilizing Training of Generative Adversarial Nets via Langevin Stein Variational Gradient Descent

Robust 3D multi-polar acoustic metamaterials with broadband double negativity

TBC-Net: A real-time detector for infrared small target detection using semantic constraint

Hand Action Detection from Ego-centric Depth Sequences with Error-correcting Hough Transform

Learning to Search on Manifolds for 3D Pose Estimation of Articulated Objects

Lie-X: Depth Image Based Articulated Object Pose Estimation, Tracking, and Action Recognition on Lie Groups

The test of the electronics system for the BESIII ETOF upgrade

Two Higgs doublet models augmented by a scalar color octet

Mouse Pose Estimation From Depth Images

Monte Carlo Simulation of RPC-based PET with GEANT4

Top-quark forward-backward asymmetry from a color-octet t-channel resonance

Transduction on Directed Graphs via Absorbing Random Walks

Big-Five Personality Prediction Based on User Behaviors at Social Network Sites