Source author record

He Tang

He Tang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Artificial Intelligence cs.CY eess.IV Machine Learning

Catalog footprint

What is connected

4works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

MotionCharacter: Fine-Grained Motion Controllable Human Video Generation

Recent advancements in personalized Text-to-Video (T2V) generation have made significant strides in synthesizing character-specific content. However, these methods face a critical limitation: the inability to perform fine-grained control over motion intensity. This limitation stems from an inherent entanglement of action semantics and their corresponding magnitudes within coarse textual descriptions, hindering the generation of nuanced human videos and limiting their applicability in scenarios demanding high precision, such as animating virtual avatars or synthesizing subtle micro-expressions. Furthermore, existing approaches often struggle to preserve high identity fidelity when other attributes are modified. To address these challenges, we introduce MotionCharacter, a framework for high-fidelity human video generation with precise motion control. At its core, MotionCharacter explicitly decouples motion into two independently controllable components: action type and motion intensity. This is achieved through two key technical contributions: (1) a Motion Control Module that leverages textual phrases to specify the action type and a quantifiable metric derived from optical flow to modulate its intensity, guided by a region-aware loss that localizes motion to relevant subject areas; and (2) an ID Content Insertion Module coupled with an ID-Consistency loss to ensure robust identity preservation during dynamic motions. To facilitate training for such fine-grained control, we also curate Human-Motion, a new large-scale dataset with detailed annotations for both motion and facial features. Extensive experiments demonstrate that MotionCharacter achieves substantial improvements over existing methods. Our framework excels in generating videos that are not only identity-consistent but also precisely adhere to specified motion types and intensities.

preprint2024arXiv

Synthetic Data in AI: Challenges, Applications, and Ethical Implications

In the rapidly evolving field of artificial intelligence, the creation and utilization of synthetic datasets have become increasingly significant. This report delves into the multifaceted aspects of synthetic data, particularly emphasizing the challenges and potential biases these datasets may harbor. It explores the methodologies behind synthetic data generation, spanning traditional statistical models to advanced deep learning techniques, and examines their applications across diverse domains. The report also critically addresses the ethical considerations and legal implications associated with synthetic datasets, highlighting the urgent need for mechanisms to ensure fairness, mitigate biases, and uphold ethical standards in AI development.

preprint2022arXiv

OSFormer: One-Stage Camouflaged Instance Segmentation with Transformers

We present OSFormer, the first one-stage transformer framework for camouflaged instance segmentation (CIS). OSFormer is based on two key designs. First, we design a location-sensing transformer (LST) to obtain the location label and instance-aware parameters by introducing the location-guided queries and the blend-convolution feedforward network. Second, we develop a coarse-to-fine fusion (CFF) to merge diverse context information from the LST encoder and CNN backbone. Coupling these two components enables OSFormer to efficiently blend local features and long-range context dependencies for predicting camouflaged instances. Compared with two-stage frameworks, our OSFormer reaches 41% AP and achieves good convergence efficiency without requiring enormous training data, i.e., only 3,040 samples under 60 epochs. Code link: https://github.com/PJLallen/OSFormer.

preprint2020arXiv

Automatic Lumbar Spinal CT Image Segmentation with a Dual Densely Connected U-Net

The clinical treatment of degenerative and developmental lumbar spinal stenosis (LSS) is different. Computed tomography (CT) is helpful in distinguishing degenerative and developmental LSS due to its advantage in imaging of osseous and calcified tissues. However, boundaries of the vertebral body, spinal canal and dural sac have low contrast and hard to identify in a CT image, so the diagnosis depends heavily on the knowledge of expert surgeons and radiologists. In this paper, we develop an automatic lumbar spinal CT image segmentation method to assist LSS diagnosis. The main contributions of this paper are the following: 1) a new lumbar spinal CT image dataset is constructed that contains 2393 axial CT images collected from 279 patients, with the ground truth of pixel-level segmentation labels; 2) a dual densely connected U-shaped neural network (DDU-Net) is used to segment the spinal canal, dural sac and vertebral body in an end-to-end manner; 3) DDU-Net is capable of segmenting tissues with large scale-variant, inconspicuous edges (e.g., spinal canal) and extremely small size (e.g., dural sac); and 4) DDU-Net is practical, requiring no image preprocessing such as contrast enhancement, registration and denoising, and the running time reaches 12 FPS. In the experiment, we achieve state-of-the-art performance on the lumbar spinal image segmentation task. We expect that the technique will increase both radiology workflow efficiency and the perceived value of radiology reports for referring clinicians and patients.