Source author record

Jian Chang

Jian Chang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision · eess.IV · Multimedia · physics.comp-ph

Catalog footprint

What is connected

5 works

4 topics

4 close collaborators


Preprint · 2026 · arXiv

EditEmoTalk: Controllable Speech-Driven 3D Facial Animation with Continuous Expression Editing

Speech-driven 3D facial animation aims to generate realistic and expressive facial motions directly from audio. While recent methods achieve high-quality lip synchronization, they often rely on discrete emotion categories, limiting continuous and fine-grained emotional control. We present EditEmoTalk, a controllable speech-driven 3D facial animation framework with continuous emotion editing. The key idea is a boundary-aware semantic embedding that learns the normal directions of inter-emotion decision boundaries, enabling a continuous expression manifold for smooth emotion manipulation. Moreover, we introduce an emotional consistency loss that enforces semantic alignment between the generated motion dynamics and the target emotion embedding through a mapping network, ensuring faithful emotional expression. Extensive experiments demonstrate that EditEmoTalk achieves superior controllability, expressiveness, and generalization while maintaining accurate lip synchronization. Code and pretrained models will be released.

Preprint · 2024 · arXiv

MGNN: Moment Graph Neural Network for Universal Molecular Potentials

The quest for efficient and robust deep learning models for representing molecular systems is increasingly critical in scientific exploration. The advent of message passing neural networks has marked a transformative era in graph-based learning, particularly for predicting chemical properties and accelerating molecular dynamics studies. We present the Moment Graph Neural Network (MGNN), a rotation-invariant message passing neural network architecture that capitalizes on moment representation learning of 3D molecular graphs and is adept at capturing the nuanced spatial relationships inherent in three-dimensional molecular structures. MGNN achieves new state-of-the-art performance over contemporary methods on benchmark datasets such as QM9 and the revised MD17. MGNN's capability also extends to dynamic simulations: it accurately predicts the structural and kinetic properties of complex systems such as amorphous electrolytes, with results that closely agree with those from ab initio simulations. The application of MGNN to the simulation of molecular spectra exemplifies its potential to significantly enhance the computational workflow, offering a promising alternative to traditional electronic structure methods.

Preprint · 2020 · arXiv

Shallow2Deep: Indoor Scene Modeling by Single Image Understanding

Dense indoor scene modeling from 2D images has been hindered by the absence of depth information and by cluttered occlusions. We present an automatic indoor scene modeling approach using deep features from neural networks. Given a single RGB image, our method simultaneously recovers semantic content, 3D geometry and object relationships by reasoning about the context of the indoor environment. In particular, we design a shallow-to-deep architecture based on convolutional networks for semantic scene understanding and modeling. It uses multi-level convolutional networks to parse indoor semantics and geometry into non-relational and relational knowledge. Non-relational knowledge extracted by the shallow-end networks (e.g. room layout, object geometry) is fed forward into deeper levels to parse relational semantics (e.g. support relationships). A Relation Network is proposed to infer the support relationships between objects. All of the structured semantics and geometry above are assembled to guide a global optimization for 3D scene modeling. Qualitative and quantitative analysis in terms of reconstruction accuracy, computational performance and scene complexity demonstrates the feasibility of our method for understanding and modeling semantics-enriched indoor scenes.

Preprint · 2020 · arXiv

Symmetric Dilated Convolution for Surgical Gesture Recognition

Automatic surgical gesture recognition is a prerequisite for intra-operative computer assistance and objective surgical skill assessment. Prior works either require additional sensors to collect kinematics data or are limited in capturing temporal information from long, untrimmed surgical videos. To tackle these challenges, we propose a novel temporal convolutional architecture that automatically detects and segments surgical gestures, along with their boundaries, using only RGB videos. Our method uses a symmetric dilation structure bridged by a self-attention module to encode and decode long-term temporal patterns and to establish frame-to-frame relationships accordingly. We validate the effectiveness of our approach on a fundamental robotic suturing task from the JIGSAWS dataset. The experimental results demonstrate our method's ability to capture long-term frame dependencies: it outperforms state-of-the-art methods by up to ~6 points in frame-wise accuracy and by ~6 points in F1@50 score.

Preprint · 2020 · arXiv

Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

Semantic reconstruction of indoor scenes refers to both scene understanding and object reconstruction. Existing works either address one part of this problem or focus on independent objects. In this paper, we bridge the gap between understanding and reconstruction and propose an end-to-end solution that jointly reconstructs room layout, object bounding boxes and meshes from a single image. Instead of resolving scene understanding and object reconstruction separately, our method builds upon a holistic scene context and proposes a coarse-to-fine hierarchy with three components: 1. room layout with camera pose; 2. 3D object bounding boxes; 3. object meshes. We argue that understanding the context of each component can assist the task of parsing the others, which enables joint understanding and reconstruction. Experiments on the SUN RGB-D and Pix3D datasets demonstrate that our method consistently outperforms existing methods in indoor layout estimation, 3D object detection and mesh reconstruction.
