Source author record

Junhui Liu

Junhui Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, astro-ph.HE, Artificial Intelligence, astro-ph.SR, eess.AS, Machine Learning, Robotics, Sound

Catalog footprint

What is connected

8 works

8 topics

4 close collaborators

preprint · 2026 · arXiv

Position: Embodied AI Requires a Privacy-Utility Trade-off

Embodied AI (EAI) systems are rapidly transitioning from simulations into real-world domestic and other sensitive environments. However, recent EAI solutions have largely demonstrated advancements within isolated stages such as instruction, perception, planning and interaction, without considering their coupled privacy implications in high-frequency deployments where privacy leakage is often irreversible. This position paper argues that optimizing these components independently creates a systemic privacy crisis when deployed in sensitive settings, thereby advancing the position that privacy in EAI is a life cycle-level architectural constraint rather than a stage-local feature. To address these challenges, we propose Secure Privacy Integration in Next-generation Embodied AI (SPINE), a unified privacy-aware framework that treats privacy as a dynamic control signal governing cross-stage coupling throughout the entire EAI life cycle. SPINE decomposes the EAI pipeline into various stages and establishes a multi-criterion privacy classification matrix to orchestrate contextual sensitivity across stage boundaries. We conduct preliminary simulation and real-world case studies to conceptually validate how privacy constraints propagate downstream to reshape system behavior, illustrating the insufficiency of fragmented privacy patches and motivating future research directions into secure yet functional embodied AI systems. We detail the SPINE framework and case studies at https://github.com/rminshen03/EAI_Privacy_Position.

preprint · 2022 · arXiv

ClothFormer: Taming Video Virtual Try-on in All Module

The task of video virtual try-on aims to fit the target clothes to a person in the video with spatio-temporal consistency. Despite tremendous progress in image virtual try-on, image-based methods produce inconsistency between frames when applied to videos. A limited body of work has also explored video-based virtual try-on but failed to produce visually pleasing and temporally coherent results. Moreover, there are two other key challenges: 1) how to generate accurate warping when occlusions appear in the clothing region; 2) how to generate clothes and non-target body parts (e.g. arms, neck) in harmony with the complicated background. To address them, we propose a novel video virtual try-on framework, ClothFormer, which successfully synthesizes realistic, harmonious, and spatio-temporally consistent results in complicated environments. In particular, ClothFormer involves three major modules. First, a two-stage anti-occlusion warping module predicts an accurate dense flow mapping between the body regions and the clothing regions. Second, an appearance-flow tracking module utilizes ridge regression and optical flow correction to smooth the dense flow sequence and generate a temporally smooth warped clothing sequence. Third, a dual-stream transformer extracts and fuses clothing textures, person features, and environment information to generate realistic try-on videos. Through rigorous experiments, we demonstrate that our method substantially surpasses the baselines in synthesized video quality, both qualitatively and quantitatively.

preprint · 2022 · arXiv

Migrating Face Swap to Mobile Devices: A lightweight Framework and A Supervised Training Solution

Existing face swap methods rely heavily on large-scale networks for adequate capacity to generate visually plausible results, which inhibits their application on resource-constrained platforms. In this work, we propose MobileFSGAN, a novel lightweight GAN for face swap that can run on mobile devices with far fewer parameters while achieving competitive performance. A lightweight encoder-decoder structure is designed especially for image synthesis tasks; it is only 10.2MB and can run on mobile devices at real-time speed. To tackle the instability of training such a small network, we construct the FSTriplets dataset utilizing facial attribute editing techniques. FSTriplets provides source-target-result training triplets, yielding pixel-level labels and thus, for the first time, making the training process supervised. We also design multi-scale gradient losses for efficient back-propagation, resulting in faster and better convergence. Experimental results show that our model reaches performance comparable to state-of-the-art methods while significantly reducing the number of network parameters. Codes and the dataset have been released.

preprint · 2022 · arXiv

X-ray emission of contact binary variables within 1 kpc

By assembling the largest sample to date of X-ray emitting EW-type binaries (EWXs), we carried out correlation analyses for the X-ray luminosity log$L_{\textrm{X}}$, and X-ray activity level log($L_{\textrm{X}}$/$L_{\textrm{bol}}$) versus the orbital period $P$ and effective temperature $T_{\rm eff}$. We find strong $P$-log$L_{\textrm{X}}$ and $P$-log($L_{\textrm{X}}$/$L_{\textrm{bol}}$) correlations for EWXs with $P$ < 0.44 days and we provide the linear parametrizations for these relations, on the basis of which the orbital period can be treated as a good predictor for log$L_{\textrm{X}}$ and log($L_{\textrm{X}}$/$L_{\textrm{bol}}$). The aforementioned binary stellar parameters are all correlated with log$L_{\textrm{X}}$, while only $T_{\rm eff}$ exhibits a strong correlation with log($L_{\textrm{X}}$/$L_{\textrm{bol}}$). Then, EWXs with higher temperature show lower X-ray activity level, which could indicate the thinning of the convective area related to the magnetic dynamo mechanism. The total X-ray luminosity of an EWX is essentially consistent with that of an X-ray saturated main sequence star with the same mass as its primary, which may imply that the primary star dominates the X-ray emission. The monotonically decreasing $P$-log($L_{\textrm{X}}$/$L_{\textrm{bol}}$) relation and the short orbital periods indicate that EWXs could all be in the X-ray saturated state, and they may inherit the changing trend of the saturated X-ray luminosities along with the mass shown by single stars. For EWXs, the orbital period, mass, and effective temperature increase in concordance. We demonstrate that the period $P=0.44$ days corresponds to the primary mass of $\sim1.1 \rm M_\odot$, beyond which the saturated X-ray luminosity of single stars will not continue to increase with mass. This explains the break in the positive $P$-log$L_{\textrm{X}}$ relation for EWXs with $P>0.44$ days.

preprint · 2021 · arXiv

The Disk Veiling Effect of the Black Hole Low-Mass X-ray Binary A0620-00

The optical light curves of quiescent black hole low-mass X-ray binaries often exhibit significant non-ellipsoidal variabilities, showing that the photospheric radiation of the companion star is veiled by another source of optical emission. Assessing this "veiling" effect is critical to the black hole mass measurement. In this work, we carry out a strictly simultaneous spectroscopic and photometric campaign on the prototypical black hole low-mass X-ray binary A0620-00. We find that for each observation epoch, the extra optical flux beyond a pure ellipsoidal modulation is positively correlated with the fraction of veiling emission, indicating that the accretion disk contributes most of the non-ellipsoidal variations. We also obtain a K2V spectral classification of the companion, as well as measurements of the companion's rotational velocity $v \sin i = 83.8\pm1.9$ km s$^{-1}$ and the mass ratio between the companion and the black hole $q=0.063\pm0.004$.
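The two quantities reported at the end of this abstract are commonly tied together through the rotational broadening relation for a tidally locked, Roche-lobe-filling companion, $v \sin i / K_2 = 0.462\, q^{1/3} (1+q)^{2/3}$, which rests on Paczynski's approximation for the Roche-lobe radius. The abstract does not say which relation the authors used, and it does not quote $K_2$ (the companion's radial-velocity semi-amplitude), so the sketch below is only an illustration of how the quoted central values could be combined under that assumed relation. The function name `vsini_over_k2` is hypothetical.

```python
def vsini_over_k2(q: float) -> float:
    """Ratio v sin i / K2 for a tidally locked, Roche-lobe-filling companion.

    Assumption: Paczynski's Roche-lobe approximation, R_L/a = 0.462 * (q/(1+q))**(1/3),
    with q = M_companion / M_black_hole. This relation is not stated in the abstract.
    """
    return 0.462 * q ** (1 / 3) * (1 + q) ** (2 / 3)


# Central values quoted in the abstract (uncertainties are ignored in this sketch).
V_SIN_I_KM_S = 83.8
MASS_RATIO_Q = 0.063

# Implied radial-velocity semi-amplitude of the companion under the assumed relation.
implied_k2 = V_SIN_I_KM_S / vsini_over_k2(MASS_RATIO_Q)
print(f"implied K2: {implied_k2:.1f} km/s")
```

Because the ratio scales roughly as $q^{1/3}$, a small fractional error in $v \sin i$ translates into about three times that fractional error in $q$, which is why a precise rotational velocity matters for the mass ratio.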

preprint · 2020 · arXiv

Boundary Content Graph Neural Network for Temporal Action Proposal Generation

Temporal action proposal generation plays an important role in video action understanding, which requires localizing high-quality action content precisely. However, generating temporal proposals with both precise boundaries and high-quality action content is extremely challenging. To address this issue, we propose a novel Boundary Content Graph Neural Network (BC-GNN) to model the relations between the boundary and action content of temporal proposals using graph neural networks. In BC-GNN, the boundaries and content of temporal proposals are taken as the nodes and edges of the graph neural network, respectively, where they are spontaneously linked. Then a novel graph computation operation is proposed to update the features of edges and nodes. After that, one updated edge and the two nodes it connects are used to predict boundary probabilities and a content confidence score, which are combined to generate a final high-quality proposal. Experiments are conducted on two mainstream datasets: ActivityNet-1.3 and THUMOS14. Without bells and whistles, BC-GNN outperforms previous state-of-the-art methods in both temporal action proposal and temporal action detection tasks.
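The graph layout described in this abstract can be pictured with a small sketch: boundary locations act as nodes, each candidate proposal is an edge from a start node to a later end node, and a proposal's final score combines the two boundary probabilities with the edge's content confidence. The sketch omits the learned graph computation that updates node and edge features. The fusion rule (a plain product), the `Proposal` type, the function names, and all numeric values are assumptions made for illustration, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    start: int
    end: int
    score: float


def build_edges(num_locations: int) -> list[tuple[int, int]]:
    """One edge per (start, end) pair with start < end; each edge is a candidate proposal."""
    return [(s, e) for s in range(num_locations) for e in range(s + 1, num_locations)]


def score_proposals(start_prob, end_prob, content_conf) -> list[Proposal]:
    """Combine the two node scores and the edge score, then rank the proposals."""
    proposals = []
    for s, e in build_edges(len(start_prob)):
        # Assumed fusion rule: multiply boundary probabilities by content confidence.
        score = start_prob[s] * end_prob[e] * content_conf[(s, e)]
        proposals.append(Proposal(s, e, score))
    return sorted(proposals, key=lambda p: p.score, reverse=True)


# Made-up scores for a video discretized into four temporal locations.
start_prob = [0.9, 0.2, 0.1, 0.3]
end_prob = [0.1, 0.2, 0.8, 0.4]
content_conf = {edge: 0.5 for edge in build_edges(4)}
content_conf[(0, 2)] = 0.9  # pretend the segment from 0 to 2 contains the action

best = score_proposals(start_prob, end_prob, content_conf)[0]
print(best)
```

With these made-up inputs, the top-ranked proposal is the one whose start node, end node, and connecting edge all score highly, which is the intuition behind predicting from an edge together with the two nodes it connects.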

preprint · 2020 · arXiv

Cartoon Face Recognition: A Benchmark Dataset

Recent years have witnessed increasing attention to cartoon media, powered by the strong demands of industrial applications. As the first step to understand this media, cartoon face recognition is a crucial but less-explored task with few datasets proposed. In this work, we first present a new challenging benchmark dataset, consisting of 389,678 images of 5,013 cartoon characters annotated with identity, bounding box, pose, and other auxiliary attributes. The dataset, named iCartoonFace, is currently the largest-scale, high-quality, richly annotated dataset spanning multiple occurrences in the field of image recognition, including near-duplications, occlusions, and appearance changes. In addition, we provide two types of annotations for cartoon media, i.e., face recognition and face detection, with the help of a semi-automatic labeling algorithm. To further investigate this challenging dataset, we propose a multi-task domain adaptation approach that jointly utilizes human and cartoon domain knowledge with three discriminative regularizations. We then perform a benchmark analysis of the proposed dataset and verify the superiority of the proposed approach in the cartoon face recognition task. We believe the public availability of this dataset will attract more research attention in broad practical application scenarios.

preprint · 2020 · arXiv

Cross-modal supervised learning for better acoustic representations

Obtaining large-scale human-labeled datasets to train acoustic representation models is a very challenging task. By contrast, data with machine-generated labels are easy to collect. In this work, we propose to exploit machine-generated labels to learn better acoustic representations, based on the synchronization between vision and audio. First, we collect a large-scale video dataset with 15 million samples, totaling 16,320 hours. Each video is 3 to 5 seconds in length and annotated automatically by publicly available visual and audio classification models. Second, we train various classical convolutional neural networks (CNNs) including VGGish, ResNet-50 and MobileNet v2. We also make several improvements to VGGish and achieve better results. Finally, we transfer our models to three external standard benchmarks for the audio classification task, and achieve a significant performance boost over the state-of-the-art results. Models and codes are available at: https://github.com/Deeperjia/vgg-like-audio-models.
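The dataset figures quoted in this abstract can be cross-checked with simple arithmetic: the stated total duration divided by the stated number of clips should give a mean clip length inside the stated 3 to 5 second range. The sketch below does that check. It treats "15 million" as an exact count, which is an assumption, since the abstract most likely rounds the figure; the function names are hypothetical.

```python
# Figures quoted in the abstract.
NUM_CLIPS = 15_000_000          # "15 million samples" (assumed exact here)
TOTAL_HOURS = 16_320            # total duration in hours
MIN_CLIP_S, MAX_CLIP_S = 3, 5   # each video is 3 to 5 seconds long

SECONDS_PER_HOUR = 3600


def mean_clip_seconds(total_hours: float, num_clips: int) -> float:
    """Average clip length implied by the total duration and clip count."""
    return total_hours * SECONDS_PER_HOUR / num_clips


def duration_bounds_hours(num_clips: int, min_s: float, max_s: float) -> tuple[float, float]:
    """Smallest and largest total duration possible for the given clip-length range."""
    return num_clips * min_s / SECONDS_PER_HOUR, num_clips * max_s / SECONDS_PER_HOUR


mean_s = mean_clip_seconds(TOTAL_HOURS, NUM_CLIPS)
low_h, high_h = duration_bounds_hours(NUM_CLIPS, MIN_CLIP_S, MAX_CLIP_S)

print(f"mean clip length: {mean_s:.2f} s")
print(f"possible total duration: {low_h:.0f} to {high_h:.0f} hours")
print(f"figures consistent: {low_h <= TOTAL_HOURS <= high_h}")
```

The implied mean of roughly 3.9 seconds per clip falls inside the stated range, so the three figures are mutually consistent.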
