Source author record

Zan Gao

Zan Gao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Computer Vision | Information Retrieval | cond-mat.mtrl-sci | physics.chem-ph

Catalog footprint

What is connected

5 works

4 topics

4 close collaborators

preprint | 2026 | arXiv

MFVLR: Multi-domain Fine-grained Vision-Language Reconstruction for Generalizable Diffusion Face Forgery Detection and Localization

The swift advancement in photo-realistic face generation technology has sparked considerable concerns across society and academia, emphasizing the need for generalizable face forgery detection and localization methods. Prior works tend to capture face forgery patterns across multiple domains using the image modality, while other modalities such as fine-grained texts are not comprehensively investigated, which restricts the generalization capability of models. Besides, they usually analyze facial images created by GANs, but struggle to identify and localize those synthesized by diffusion models. To solve these problems, we devise a novel multi-domain fine-grained vision-language reconstruction (MFVLR) model, which explores comprehensive and diverse visual forgery traces via language-guided face forgery representation learning, to achieve generalizable diffusion-synthesized face forgery detection and localization (DFFDL). Specifically, we devise a fine-grained language transformer that studies general fine-grained language embeddings using language reconstruction. We propose a multi-domain vision encoder to capture general and complementary visual forgery patterns across the image and residual domains. A vision decoder is designed to reconstruct image appearance and achieve forgery localization. Besides, we propose an innovative plug-and-play vision injection module to enhance the interaction between the vision and language embeddings. Extensive experiments and visualizations demonstrate that our network outperforms the state of the art in different settings such as cross-generator, cross-forgery, and cross-dataset evaluations.

preprint | 2022 | arXiv

Disentangled Graph Neural Networks for Session-based Recommendation

Session-based recommendation (SBR) has drawn increasing research attention in recent years, due to its great practical value in exploiting only the limited user behavior history in the current session. Existing methods typically learn the session embedding at the item level, namely, aggregating the embeddings of items with or without the attention weights assigned to items. However, they ignore the fact that a user's intent in adopting an item is driven by certain factors of the item (e.g., the leading actors of a movie). In other words, they have not explored finer-granularity interests of users at the factor level to generate the session embedding, leading to sub-optimal performance. To address the problem, we propose a novel method called Disentangled Graph Neural Network (Disen-GNN) to capture the session purpose with the consideration of factor-level attention on each item. Specifically, we first employ the disentangled learning technique to cast item embeddings into the embeddings of multiple factors, and then use the gated graph neural network (GGNN) to learn the embedding factor-wise based on the item adjacent similarity matrix computed for each factor. Moreover, the distance correlation is adopted to enhance the independence between each pair of factors. After representing each item with independent factors, an attention mechanism is designed to learn user intent toward different factors of each item in the session. The session embedding is then generated by aggregating the item embeddings with attention weights of each item's factors. In this way, our model takes user intents at the factor level into account to infer the user purpose in a session. Extensive experiments on three benchmark datasets demonstrate the superiority of our method over existing methods.
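
The abstract above names distance correlation as the measure used to keep factors independent. As a rough illustration only, the following is a minimal sketch of the standard sample distance correlation between two sets of vectors, written in plain Python. It is not taken from the paper: the function names are hypothetical, and how Disen-GNN turns this quantity into a training loss is not stated in the abstract.

```python
import math


def pairwise_distances(rows):
    """Euclidean distance between every pair of row vectors."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def double_center(d):
    """Subtract row and column means, then add back the grand mean."""
    n = len(d)
    row_means = [sum(row) / n for row in d]
    col_means = [sum(d[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
        for i in range(n)
    ]


def dcov_sq(a, b):
    """Squared sample distance covariance from two centered matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation of two equally long lists of vectors.

    Returns a value in [0, 1]; values near 0 suggest the two
    representations are close to independent on this sample.
    """
    a = double_center(pairwise_distances(x))
    b = double_center(pairwise_distances(y))
    denom = math.sqrt(dcov_sq(a, a) * dcov_sq(b, b))
    if denom == 0.0:
        return 0.0  # at least one input is constant
    return math.sqrt(max(dcov_sq(a, b), 0.0) / denom)
```

Under this reading, a model would compute the value for each pair of factor embeddings over a batch of items and penalize large values, so that the factors carry less overlapping information.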

preprint | 2022 | arXiv

Review Polarity-wise Recommender

Review-involved recommender systems, the de facto way of utilizing review information to enhance recommendation, have received increasing interest over the past few years. Among them, one advanced branch extracts salient aspects from textual reviews (i.e., the item attributes that users express) and combines them with the matrix factorization technique. However, existing approaches all ignore the fact that semantically different reviews often include opposite aspect information. In particular, positive reviews usually express aspects that users prefer, while negative ones describe aspects that users reject. As a result, ignoring this distinction may mislead recommender systems into making incorrect decisions in user preference modeling. To this end, we propose a Review Polarity-wise Recommender model, dubbed RPR, to treat reviews with different polarities separately. Specifically, in this model, positive and negative reviews are separately gathered and utilized to model the user-preferred and user-rejected aspects, respectively. Besides, in order to overcome the imbalance problem of semantically different reviews, we also develop an aspect-aware importance weighting approach to align the aspect importance for these two kinds of reviews. Extensive experiments conducted on eight benchmark datasets have demonstrated the superiority of our model compared to a series of state-of-the-art review-involved baselines. Moreover, our method can provide certain explanations for real-world rating prediction scenarios.

preprint | 2022 | arXiv

Temporal Action Localization with Multi-temporal Scales

Temporal action localization, which aims to localize and classify actions in untrimmed videos, plays an important role in video analysis. Previous methods often predict actions on a feature space of a single temporal scale. However, the temporal features of a low-level scale lack enough semantics for action classification, while a high-level scale cannot provide rich details of the action boundaries. To address this issue, we propose to predict actions on a feature space of multi-temporal scales. Specifically, we use refined feature pyramids of different scales to pass semantics from high-level scales to low-level scales. Besides, to establish the long temporal scale of the entire video, we use a spatial-temporal transformer encoder to capture the long-range dependencies of video frames. Then the refined features with long-range dependencies are fed into a classifier for the coarse action prediction. Finally, to further improve the prediction accuracy, we propose to use a frame-level self-attention module to refine the classification and boundaries of each action instance. Extensive experiments show that the proposed method outperforms state-of-the-art approaches on the THUMOS14 dataset and achieves comparable performance on the ActivityNet1.3 dataset. Compared with A2Net (TIP20, Avg{0.3:0.7}), Sub-Action (CSVT2022, Avg{0.1:0.5}), and AFSD (CVPR21, Avg{0.3:0.7}) on the THUMOS14 dataset, the proposed method achieves improvements of 12.6%, 17.4% and 2.2%, respectively.
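
The Avg ranges quoted in the abstract above are not defined there. A common convention on THUMOS14, assumed here rather than confirmed by the source, is to score predictions by temporal intersection over union (tIoU) against ground-truth segments and to average results over a range of tIoU thresholds in steps of 0.1. The sketch below shows only the tIoU computation and the threshold grid; the function names are hypothetical and the full mean average precision calculation is omitted.

```python
def temporal_iou(pred, truth):
    """Intersection over union of two (start, end) segments in seconds."""
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0


def thresholds(start, stop, step=0.1):
    """Evenly spaced tIoU thresholds from start to stop, inclusive.

    The 0.1 step is an assumption; the abstract gives only the endpoints.
    """
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 10) for i in range(count)]


def matched_thresholds(pred, truth, levels):
    """Thresholds at which the prediction counts as a correct localization."""
    iou = temporal_iou(pred, truth)
    return [t for t in levels if iou >= t]


if __name__ == "__main__":
    levels = thresholds(0.3, 0.7)
    # A prediction that overlaps half of a 10-second ground-truth action.
    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
    print(matched_thresholds((0.0, 10.0), (5.0, 15.0), levels))
```

Stricter thresholds reward tighter boundaries, which is why a boundary refinement step such as the one described in the abstract would be expected to matter most at the upper end of the range.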

preprint | 2020 | arXiv

Polyacrylonitrile/Graphene Nanocomposite: Towards the Next Generation of Carbon Fibers

Carbon fibers (CFs) are the key solution for future lightweight vehicles with enhanced fuel efficiency and reduced emissions, owing to their ultrahigh strength-to-weight ratio. However, the high cost of the currently dominant PAN-based CFs hinders their application. The use of low-cost alternative precursors may overcome this issue. Unfortunately, low-cost CFs derived from cheaper single-component precursors suffer from poor mechanical properties. Developing composite CFs by adding nanoadditives is very promising for low-cost CFs. Therefore, a fundamental understanding of the impact of carbonization conditions and of polymer/additive conversion mechanisms during the whole CF production process is essential to develop low-cost CFs. In this work, we demonstrate how the carbonization temperature affects the properties of PAN/graphene CFs by performing a series of ReaxFF-based molecular dynamics simulations. We found that graphene edges, along with the nitrogen and oxygen functional groups, have a catalytic role and act as seeds for the graphitic structure growth. Our MD simulations unveil that the addition of graphene to the PAN precursor modifies all-carbon membered rings in CFs and enhances the alignment of 6-member carbon rings during carbonization, which leads to superior mechanical properties compared to PAN-based CFs. These ReaxFF simulation results are validated by experimental structural and mechanical characterizations. Interestingly, mechanical characterizations indicate that PAN/graphene CFs carbonized at 1250 °C demonstrate a 90.9% increase in strength and a 101.9% enhancement in Young's modulus compared to the PAN-based CFs carbonized at 1500 °C. The superior mechanical properties of PAN/graphene CFs at lower carbonization temperatures offer a path to both energy savings and cost reduction by decreasing the carbonization temperature, and could provide key insights for the development of low-cost CFs.
