Source author record

Yujie Wei

Yujie Wei appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision · cond-mat.mtrl-sci · Biomolecules · cond-mat.mes-hall · Machine Learning

Catalog footprint

What is connected

10 works

5 topics

4 close collaborators


preprint · 2026 · arXiv

AesRM: Improving Video Aesthetics with Expert-Level Feedback

Despite rapid advances in photorealistic video generation, real-world applications such as filmmaking require video aesthetics, e.g., harmonious colors and cinematic lighting, beyond visual fidelity. Prior work on visual aesthetics largely focuses on images, often reducing aesthetics to coarse definitions, e.g., visual pleasure, without a rigorous and systematic evaluation. To improve video aesthetics, we propose a hierarchical rubric that decomposes video aesthetics into three core dimensions, Visual Aesthetics (VA), Visual Fidelity (VF), and Visual Plausibility (VP), with 15 fine-grained criteria, e.g., shot composition. This framework enables a large-scale expert-annotated preference dataset and an evaluation benchmark, AesVideo-Bench, containing about 2500 video pairs with expert annotations on VA, VF, and VP. We then build a family of Video Aesthetic Reward Models (AesRM): AesRM-Base, which directly predicts pairwise preferences on these dimensions to provide efficient post-training rewards, and AesRM-CoT, which additionally generates CoT aligned with all 15 criteria to improve assessment interpretability. Specifically, we train AesRM with a three-stage progressive scheme: (1) Atomic Aesthetic Capability Learning, which strengthens AesRM's recognition of fundamental aesthetic concepts, e.g., accurately identifying centered composition; (2) Cold-Start, aligning the model with structured reasoning protocols; and (3) GRPO, further improving evaluation accuracy. To enhance AesRM-CoT, we additionally propose self-consistency-based CoT synthesis to improve CoT quality and design CoT-based process rewards during GRPO. Extensive experiments show AesRM outperforms baselines on multiple aesthetics benchmarks and is more robust, with lower position bias. Finally, we align Wan2.2 with AesRM and observe clear aesthetic gains over existing aesthetic reward models.

preprint · 2026 · arXiv

Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos

Recent video generative models have greatly improved the realism of AI-generated videos, yet their outputs still exhibit artifacts such as temporal inconsistencies, structural distortions, and semantic incoherence. While Multimodal Large Language Models (MLLMs) show strong visual understanding capabilities, their ability to perceive and reason about such artifacts remains unclear. Existing benchmarks often lack systematic evaluation of artifact-aware perception and fine-grained diagnostic reasoning, especially across diverse AI-generated video domains beyond photorealistic content. To address this gap, we introduce Artifact-Bench, a comprehensive benchmark for evaluating MLLMs on AI-generated video artifact detection and analysis. We first establish a three-level hierarchical taxonomy of realism artifacts, covering photorealistic, animated, and CG-style videos. Based on this taxonomy, Artifact-Bench defines three complementary tasks: real vs. AI-generated video classification, pairwise realism comparison, and fine-grained artifact identification. Experiments on 19 leading MLLMs reveal substantial limitations in artifact perception and reasoning, with many models approaching random or even below-random performance in challenging settings. We further observe significant misalignment between MLLM judgments and human perceptual preferences, highlighting their limited reliability as general evaluators for AI-generated video realism.

preprint · 2026 · arXiv

Bridging Brain and Semantics: A Hierarchical Framework for Semantically Enhanced fMRI-to-Video Reconstruction

Reconstructing dynamic visual experiences as videos from functional magnetic resonance imaging (fMRI) is pivotal for advancing the understanding of neural processes. However, current fMRI-to-video reconstruction methods are hindered by a semantic gap between noisy fMRI signals and the rich content of videos, stemming from a reliance on incomplete semantic embeddings that neither capture video-specific cues (e.g., actions) nor integrate prior knowledge. To this end, we draw inspiration from the dual-pathway processing mechanism in the human brain and introduce CineNeuron, a novel hierarchical framework for semantically enhanced video reconstruction from fMRI signals with two synergistic stages. First, a bottom-up semantic enrichment stage maps fMRI signals to a rich embedding space that comprehensively captures textual semantics, image contents, action concepts, and object categories. Second, a top-down memory integration stage utilizes the proposed Mixture-of-Memories method to dynamically select relevant "memories" from previously seen data and fuse them with the fMRI embedding to refine the video reconstruction. Extensive experimental results on two fMRI-to-video benchmarks demonstrate that CineNeuron surpasses state-of-the-art methods across various metrics.

preprint · 2026 · arXiv

DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models

Reinforcement learning has emerged as a powerful tool for improving diffusion-based text-to-image models, but existing methods are largely limited to single-task optimization. Extending RL to multiple tasks is challenging: joint optimization suffers from cross-task interference and imbalance, while cascade RL is cumbersome and prone to catastrophic forgetting. We propose DiffusionOPD, a new multi-task training paradigm for diffusion models based on Online Policy Distillation (OPD). DiffusionOPD first trains task-specific teachers independently, then distills their capabilities into a unified student along the student's own rollout trajectories. This decouples single-task exploration from multi-task integration and avoids the optimization burden of solving all tasks jointly from scratch. Theoretically, we lift the OPD framework from discrete tokens to continuous-state Markov processes, deriving a closed-form per-step KL objective that unifies both stochastic SDE and deterministic ODE refinement via mean-matching. We formally and empirically demonstrate that this analytic gradient provides lower variance and better generality compared to conventional PPO-style policy gradients. Extensive experiments show that DiffusionOPD consistently surpasses both multi-reward RL and cascade RL baselines in training efficiency and final performance, while achieving state-of-the-art results on all evaluated benchmarks.

preprint · 2026 · arXiv

MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation

Video generation is rapidly evolving from single-shot synthesis to complex multi-shot audio-video (MSAV) narratives to meet real-world demands. However, evaluating such frontier models remains a fundamental challenge. Existing benchmarks are limited in scope and data diversity, and rely on rigid evaluation pipelines, preventing systematic and reliable assessment of modern MSAV models. To bridge these gaps, we introduce MSAVBench, the first comprehensive benchmark and adaptive hybrid evaluation framework for multi-shot audio-video generation. Our benchmark spans four key dimensions, video, audio, shot, and reference, covering diverse task settings, varying shot counts of up to 15, and challenging non-realistic scenarios. Our evaluation framework improves robustness through an adaptive self-correction mechanism for shot segmentation, instance-wise rubrics for subjective metrics, and tool-grounded evidence extraction for complex judgments. Furthermore, MSAVBench achieves high alignment with human judgments, reaching a Spearman rank correlation of 91.5%. Our systematic evaluation of 19 state-of-the-art closed- and open-source models shows that current systems still struggle with director-level control and fine-grained audio-visual synchronization, while modular or agentic generation pipelines offer a promising path toward narrowing the gap between open- and closed-source models. We will release the benchmark data and evaluation code to facilitate future research.

preprint · 2020 · arXiv

Influence of Small Molecule Property on Antibody Response

Antibodies with high titer and affinity to small molecules are critical for the development of vaccines against drugs of abuse, antidotes to toxins, and immunoassays for compounds. However, little is known about how the properties of a small molecule influence the antibody response, or which chemical descriptors could indicate its degree. Based on our previous study, we designed and synthesized two groups of small molecules, called haptens, with varied hydrophobicities to investigate the relationship between the properties of small molecules and the antibody response in terms of titer and affinity. We found that the magnitude of the antibody response is positively correlated with the degree of molecular hydrophobicity and related chemical descriptors. This study provides insight into the immunological characteristics of small molecules themselves and useful clues for producing high-quality antibodies against small molecules.

preprint · 2016 · arXiv

New Insights on Stacking Fault Behavior in Twin Induced Plasticity from Meta-Atom Molecular Dynamics Simulations

There is growing interest in promoting deformation twinning for plasticity in advanced materials, as highly organized twin boundaries give a better strength-ductility combination than disordered grain boundaries. Twinning deformation typically involves the kinetics of stacking faults, their interaction with dislocations, and dislocation-twin boundary interactions. While the latter have been intensively investigated, the dynamics of stacking faults are less well understood. In this work, we report several new insights on the stacking fault behavior in twin induced plasticity from our meta-atom molecular dynamics simulation: the stacking fault interactions are dominated by dislocation reactions that take place spontaneously, unlike the mechanism proposed in the literature; and the competition among generating a single stacking fault, a twinning partial and a trailing partial dislocation depends on a unique parameter, i.e., the stacking fault energy, which in turn determines deformation twinning behavior. The complex twin-slip and twin-dislocation interactions demonstrate the dual role of deformation twins as both dislocation barrier and storage, potentially contributing to the high strength and ductility of advanced materials like TWIP steels, where plasticity dominated by deformation twinning accounts for the superb strength-ductility combination.

preprint · 2016 · arXiv

Super-stretchable borophene and its stability under straining

Recent success in synthesizing two-dimensional borophene on a silver substrate has attracted strong interest in exploring its possible extraordinary physical properties. Using density functional theory calculations, we show that borophene is highly stretchable along the transverse direction. The strain-to-failure in the transverse direction is nearly twice that along the longitudinal direction. The straining-induced flattening and subsequent stretch of the flat borophene account for the large strain-to-failure for tension in the transverse direction. The mechanical properties in the other two directions exhibit strong anisotropy. Phonon dispersions of the strained borophene monolayers suggest that negative frequencies are present, which indicates the instability of free-standing borophene even under high tensile stress.

preprint · 2013 · arXiv

Mechanics and Tunable Bandgap by Straining in Single-Layer Hexagonal Boron-Nitride

Current interest in two-dimensional materials extends from graphene to other systems like single-layer hexagonal boron-nitride (h-BN), for the possibility of making heterogeneous structures to achieve exceptional properties that cannot be realized in graphene. The electrically insulating h-BN and semi-metal graphene may open good opportunities to realize a semiconductor by manipulating the morphology and composition of such heterogeneous structures. Here we report the mechanical properties of h-BN and its band structures tuned by mechanical straining, using density functional theory calculations. The elastic properties, both the Young's modulus and bending rigidity for h-BN, are isotropic. We reveal that there is a bi-linear dependence of band gap on the applied tensile strains in h-BN. Mechanical strain can tune single-layer h-BN from an insulator to a semiconductor, with a band gap in the 4.7 eV to 1.5 eV range.

preprint · 2013 · arXiv

Tunable Band Structures of Polycrystalline Graphene by External and Mismatch Strains

The lack of a band gap largely limits the application of graphene in electronic devices. Previous studies show that grain boundaries (GBs) in polycrystalline graphene can dramatically alter the electrical properties of graphene. Here, we investigate the band structure of polycrystalline graphene tuned by externally imposed strains and intrinsic mismatch strains at the GB by density functional theory (DFT) calculations. We found that graphene with symmetrical GBs typically has zero band gap even with large uniaxial and biaxial strain. However, some particular asymmetrical GBs can open a band gap in graphene and their band structures can be substantially tuned by external strains. A maximum band gap of about 0.19 eV was observed in matched-armchair GB (5, 5) | (3, 7) with a misorientation of θ = 13° when the applied uniaxial strain increases to 9%. Although mismatch strain is inevitable in asymmetrical GBs, it has a small influence on the band gap of polycrystalline graphene.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint