Source author record

Jianwen Jiang

Jianwen Jiang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Artificial Intelligence Biomolecules Machine Learning

Catalog footprint

What is connected

3works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

FlowAct-R1: Towards Interactive Humanoid Video Generation

Interactive humanoid video generation aims to synthesize lifelike visual agents that can engage with humans through continuous and responsive video. Despite recent advances in video synthesis, existing methods often grapple with the trade-off between high-fidelity synthesis and real-time interaction requirements. In this paper, we propose FlowAct-R1, a framework specifically designed for real-time interactive humanoid video generation. Built upon a MMDiT architecture, FlowAct-R1 enables the streaming synthesis of video with arbitrary durations while maintaining low-latency responsiveness. We introduce a chunkwise diffusion forcing strategy, complemented by a novel self-forcing variant, to alleviate error accumulation and ensure long-term temporal consistency during continuous interaction. By leveraging efficient distillation and system-level optimizations, our framework achieves a stable 25fps at 480p resolution with a time-to-first-frame (TTFF) of only around 1.5 seconds. The proposed method provides holistic and fine-grained full-body control, enabling the agent to transition naturally between diverse behavioral states in interactive scenarios. Experimental results demonstrate that FlowAct-R1 achieves exceptional behavioral vividness and perceptual realism, while maintaining robust generalization across diverse character styles.

preprint2022arXiv

Rethinking Supervised Pre-training for Better Downstream Transferring

The pretrain-finetune paradigm has shown outstanding performance on many applications of deep learning, where a model is pre-trained on a upstream large dataset (e.g. ImageNet), and is then fine-tuned to different downstream tasks. Though for most cases, the pre-training stage is conducted based on supervised methods, recent works on self-supervised pre-training have shown powerful transferability and even outperform supervised pre-training on multiple downstream tasks. It thus remains an open question how to better generalize supervised pre-training model to downstream tasks. In this paper, we argue that the worse transferability of existing supervised pre-training methods arise from the negligence of valuable intra-class semantic difference. This is because these methods tend to push images from the same class close to each other despite of the large diversity in their visual contents, a problem to which referred as "overfit of upstream tasks". To alleviate this problem, we propose a new supervised pre-training method based on Leave-One-Out K-Nearest-Neighbor, or LOOK for short. It relieves the problem of overfitting upstream tasks by only requiring each image to share its class label with most of its k nearest neighbors, thus allowing each class to exhibit a multi-mode distribution and consequentially preserving part of intra-class difference for better transferring to downstream tasks. We developed efficient implementation of the proposed method that scales well to large datasets. Experimental studies on multiple downstream tasks show that LOOK outperforms other state-of-the-art methods for supervised and self-supervised pre-training.

preprint2011arXiv

Evaluation of Competing J domain:Hsp70 Complex Models in Light of Existing Mutational and NMR Data

Ahmad et al. recently presented an NMR-based model for a bacterial DnaJ J domain:DnaK(Hsp70):ADP complex(1) that differs significantly from the crystal structure of a disulfide linked mammalian auxilin J domain:Hsc70 complex that we previously published(2). They claimed that their model could better account for existing mutational data, was in better agreement with previous NMR studies, and that the presence of a cross-link in our structure made it irrelevant to understanding J:Hsp70 interactions. Here we detail extensive NMR and mutational data relevant to understanding J:Hsp70 function and show that, in fact, our structure is much better able to account for the mutational data and is in much better agreement with a previous NMR study of a mammalian polyoma virus T-ag J domain:Hsc70 complex than is the Ahmad et al. complex, and that our structure is predictive and provides insight into J:Hsp70 interactions and mechanism of ATPase activation.