Source author record

Yilin Wang

Yilin Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Computer Vision, eess.IV, cond-mat.mtrl-sci, cond-mat.str-el, Multimedia, cond-mat.mes-hall, Machine Learning, cond-mat.soft, hep-ph, Artificial Intelligence, Biomolecules, Computation and Language, cond-mat.stat-mech, cond-mat.supr-con, Human-Computer Interaction, math.CV, math.PR, physics.comp-ph

Catalog footprint

What is connected

41 works

18 topics

4 close collaborators

preprint 2026 arXiv

CIRAG: Construction-Integration Retrieval and Adaptive Generation for Multi-hop Question Answering

Triple-based Iterative Retrieval-Augmented Generation (iRAG) mitigates document-level noise for multi-hop question answering. However, existing methods still face limitations: (i) greedy single-path expansion, which propagates early errors and fails to capture parallel evidence from different reasoning branches, and (ii) granularity-demand mismatch, where a single evidence representation struggles to balance noise control with contextual sufficiency. In this paper, we propose the Construction-Integration Retrieval and Adaptive Generation model, CIRAG. It introduces an Iterative Construction-Integration module that constructs candidate triples and history-conditionally integrates them to distill core triples and generate the next-hop query. This module mitigates the greedy trap by preserving multiple plausible evidence chains. Besides, we propose an Adaptive Cascaded Multi-Granularity Generation module that progressively expands contextual evidence based on the problem requirements, from triples to supporting sentences and full passages. Moreover, we introduce Trajectory Distillation, which distills the teacher model's integration policy into a lightweight student, enabling efficient and reliable long-horizon reasoning. Extensive experiments demonstrate that CIRAG achieves superior performance compared to existing iRAG methods.

preprint 2026 arXiv

Preferences Order, Ratings Anchor: From Fused Expert Aesthetic Ground Truth to Self-Distillation

Pairwise preferences and pointwise ratings are the two dominant annotation protocols in image aesthetic assessment (IAA), yet existing benchmarks adopt only one, leaving their complementarity unmeasured under controlled conditions. We introduce PPaint, a matched dual-protocol benchmark in which 15 domain experts, 5 per category, annotate 150 Chinese paintings under both protocols across five aesthetic dimensions, collecting 45,900 pairwise expert judgments through a locally dense preference design alongside the matched ratings. The matched design reveals complementary strengths: preferences yield more consistent ordinal rankings, while ratings anchor the absolute score scale. Fusing both signals via two independent preference-to-score methods yields a fused expert ground truth on which the two constructions converge to nearly identical scores. The same preference-to-score principle extends to label-free VLM training. PSDistill converts VLM pairwise judgments into calibrated pseudo-scores via an Elo reference pool, and trains the same VLM with confidence-weighted ranking optimization to produce a single-pass aesthetic scorer. Trained on a single painting category, the distilled Qwen3-VL-8B improves mean SRCC from 0.504 to 0.709 across all three categories, outperforming all open-source baselines including the dedicated aesthetic model ArtiMuse and matching closed-source Gemini-3.1-Pro within 0.04 SRCC at single-pass inference cost, with cross-domain transfer further validated on APDDv2. We will release the full PPaint dataset and training code.
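The abstract mentions turning pairwise judgments into scores via an Elo reference pool. As a rough illustration of the general idea only, here is a minimal sketch of the standard Elo update applied to a list of (winner, loser) judgments. The function names, the K-factor of 32, and the base rating of 1500 are assumptions chosen for illustration; this is not the PSDistill procedure, whose details are not given in this record.

```python
def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_from_pairs(items, judgments, k=32.0, base=1500.0):
    """Turn pairwise judgments into scalar ratings.

    items:     iterable of item identifiers (hypothetical names).
    judgments: iterable of (winner, loser) pairs.
    k, base:   conventional Elo constants, assumed here for illustration.
    """
    ratings = {item: base for item in items}
    for winner, loser in judgments:
        # The update is proportional to how surprising the outcome was.
        delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += delta
        ratings[loser] -= delta
    return ratings


# Usage: three items, with "a" preferred over "b" and "c", and "b" over "c".
ratings = elo_from_pairs(["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")])
ranking = sorted(ratings, key=ratings.get, reverse=True)
```

Each update adds to the winner exactly what it subtracts from the loser, so the sum of ratings is conserved and only relative positions carry meaning.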

preprint 2026 arXiv

RecruitScope: A Visual Analytics System for Multidimensional Recruitment Data Analysis

Online recruitment platforms have become the dominant channel for modern hiring, yet most platforms offer only basic filtering capabilities, such as job title, keyword, and salary range. This hinders comprehensive analysis of multi-attribute relationships and job market patterns across different scales. We present RecruitScope, a visual analytics system designed to support multidimensional and cross-level exploration of recruitment data for job seekers and employers, particularly HR specialists. Through coordinated visualizations, RecruitScope enables users to analyze job positions and salary patterns from multiple perspectives, interpret industry dynamics at the macro level, and identify emerging positions at the micro level. We demonstrate the effectiveness of RecruitScope through case studies that reveal regional salary distribution patterns, characterize industry growth trajectories, and discover high-demand emerging roles in the job market.

preprint 2022 arXiv

A Video Anomaly Detection Framework based on Appearance-Motion Semantics Representation Consistency

Video anomaly detection refers to the identification of events that deviate from the expected behavior. Due to the lack of anomalous samples in training, video anomaly detection becomes a very challenging task. Almost all existing methods follow a reconstruction or future-frame prediction paradigm. However, these methods ignore the consistency between appearance and motion information of samples, which limits their anomaly detection performance. Anomalies only occur in the moving foreground of surveillance videos, so the semantics expressed by video frame sequences and optical flow without background information in anomaly detection should be highly consistent and significant for anomaly detection. Based on this idea, we propose Appearance-Motion Semantics Representation Consistency (AMSRC), a framework that uses normal data's appearance and motion semantic representation consistency to handle anomaly detection. Firstly, we design a two-stream encoder to encode the appearance and motion information representations of normal samples and introduce constraints to further enhance the consistency of the feature semantics between appearance and motion information of normal samples, so that abnormal samples with low-consistency appearance and motion feature representations can be identified. Moreover, the lower consistency of appearance and motion features of anomalous samples can be used to generate predicted frames with larger reconstruction error, which makes anomalies easier to spot. Experimental results demonstrate the effectiveness of the proposed method.

preprint 2022 arXiv

Accidental symmetries in the scalar potential of the Standard Model extended with two Higgs triplets

The extension of the Standard Model (SM) with two Higgs triplets offers an appealing way to account for both tiny Majorana neutrino masses via the type-II seesaw mechanism and the cosmological matter-antimatter asymmetry via the triplet leptogenesis. In this paper, we classify all possible accidental symmetries in the scalar potential of the two-Higgs-triplet model (2HTM). Based on the bilinear-field formalism, we show that the maximal symmetry group of the 2HTM potential is ${\rm SO(4)}$ and eight types of accidental symmetries in total can be identified. Furthermore, we examine the impact of the couplings between the SM Higgs doublet and the Higgs triplets on the accidental symmetries. The bounded-from-below conditions on the scalar potential with specific accidental symmetries are also derived. Taking the ${\rm SO(4)}$-invariant scalar potential as an example, we investigate the vacuum structures and the scalar mass spectra of the 2HTM.

preprint 2022 arXiv

CONVIQT: Contrastive Video Quality Estimator

Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms. Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner. Distortion type identification and degradation level determination is employed as an auxiliary task to train a deep learning model containing a deep Convolutional Neural Network (CNN) that extracts spatial features, as well as a recurrent unit that captures temporal information. The model is trained using a contrastive loss and we therefore refer to this training framework and resulting model as CONtrastive VIdeo Quality EstimaTor (CONVIQT). During testing, the weights of the trained model are frozen, and a linear regressor maps the learned features to quality scores in a no-reference (NR) setting. We conduct comprehensive evaluations of the proposed model on multiple VQA databases by analyzing the correlations between model predictions and ground-truth quality ratings, and achieve competitive performance when compared to state-of-the-art NR-VQA models, even though it is not trained on those databases. Our ablation experiments demonstrate that the learned representations are highly robust and generalize well across synthetic and realistic distortions. Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning. The implementations used in this work have been made available at https://github.com/pavancm/CONVIQT.

preprint 2022 arXiv

Interactive Portrait Harmonization

Current image harmonization methods consider the entire background as the guidance for harmonization. However, this may limit the user's ability to choose a specific object or person in the background to guide the harmonization. To enable flexible interaction between user and harmonization, we introduce interactive harmonization, a new setting where the harmonization is performed with respect to a selected region in the reference image instead of the entire background. A new flexible framework that allows users to pick certain regions of the background image and use them to guide the harmonization is proposed. Inspired by professional portrait harmonization users, we also introduce a new luminance matching loss to optimally match the color/luminance conditions between the composite foreground and the selected reference region. This framework provides more control to the image harmonization pipeline, achieving visually pleasing portrait edits. Furthermore, we also introduce a new dataset carefully curated for validating portrait harmonization. Extensive experiments on both synthetic and real-world datasets show that the proposed approach is efficient and robust compared to previous harmonization baselines, especially for portraits. Project webpage: https://jeya-maria-jose.github.io/IPH-web/

preprint 2022 arXiv

Making Video Quality Assessment Models Sensitive to Frame Rate Distortions

We consider the problem of capturing distortions arising from changes in frame rate as part of Video Quality Assessment (VQA). Variable frame rate (VFR) videos have become much more common, and streamed videos commonly range from 30 frames per second (fps) up to 120 fps. VFR-VQA offers unique challenges in terms of distortion types as well as in making non-uniform comparisons of reference and distorted videos having different frame rates. The majority of current VQA models require compared videos to be of the same frame rate, but are unable to adequately account for frame rate artifacts. The recently proposed Generalized Entropic Difference (GREED) VQA model succeeds at this task, using natural video statistics models of entropic differences of temporal band-pass coefficients, delivering superior performance on predicting video quality changes arising from frame rate distortions. Here we propose a simple fusion framework, whereby temporal features from GREED are combined with existing VQA models, towards improving model sensitivity towards frame rate distortions. We find through extensive experiments that this feature fusion significantly boosts model performance on both HFR/VFR datasets as well as fixed frame rate (FFR) VQA databases. Our results suggest that employing efficient temporal representations can result in much more robust and accurate VQA models when frame rate variations can occur.

preprint 2022 arXiv

New insights on carbon black suspension rheology -- anisotropic thixotropy and anti-thixotropy

We report a detailed experimental study of peculiar thixotropic dynamics of carbon black (CB, Vulcan XC-72) suspensions in mineral oil, specifically the observation of sequential stress increase then decrease at a fixed shear rate in a step-down test. We verify that such dynamics, though peculiar, come from a true material response rather than experimental artifacts. We also reveal how this long-time stress decay is associated with anti-thixotropy, rather than viscoelasticity, by using orthogonal superposition (OSP) rheometry to probe viscoelastic moduli during the step-down tests. The orthogonal storage and loss modulus are present, showing this two-timescale recovery then decay response, which demonstrates that this response is anti-thixotropic, and it involves shear-induced structuring. We further show a mechanical anisotropy in the CB suspension under shear using OSP. Based on the rheological results, a microstructural schematic is proposed, considering qualitatively thixotropic structure build-up, anti-thixotropic densification, and anisotropic structure evolution. Our observation for these CB suspensions is outside the standard paradigm of thixotropic structure-parameter models, and the elastic response provides us with new insight into the transient dynamics of CB suspensions.

preprint 2022 arXiv

On the Role of Generalization in Transferability of Adversarial Examples

Black-box adversarial attacks designing adversarial examples for unseen neural networks (NNs) have received great attention over the past years. While several successful black-box attack schemes have been proposed in the literature, the underlying factors driving the transferability of black-box adversarial examples still lack a thorough understanding. In this paper, we aim to demonstrate the role of the generalization properties of the substitute classifier used for generating adversarial examples in the transferability of the attack scheme to unobserved NN classifiers. To do this, we apply the max-min adversarial example game framework and show the importance of the generalization properties of the substitute NN in the success of the black-box attack scheme in application to different NN classifiers. We prove theoretical generalization bounds on the difference between the attack transferability rates on training and test samples. Our bounds suggest that a substitute NN with better generalization behavior could result in more transferable adversarial examples. In addition, we show that standard operator norm-based regularization methods could improve the transferability of the designed adversarial examples. We support our theoretical results by performing several numerical experiments showing the role of the substitute network's generalization in generating transferable adversarial examples. Our empirical results indicate the power of Lipschitz regularization methods in improving the transferability of adversarial examples.

preprint 2022 arXiv

Perceptual Quality Assessment of UGC Gaming Videos

In recent years, with the vigorous development of the video game industry, the proportion of gaming videos on major video websites like YouTube has dramatically increased. However, relatively little research has been done on the automatic quality prediction of gaming videos, especially on those that fall in the category of "User-Generated-Content" (UGC). Since current leading general-purpose Video Quality Assessment (VQA) models do not perform well on this type of gaming videos, we have created a new VQA model specifically designed to succeed on UGC gaming videos, which we call the Gaming Video Quality Predictor (GAME-VQP). GAME-VQP successfully predicts the unique statistical characteristics of gaming videos by drawing upon features designed under modified natural scene statistics models, combined with gaming specific features learned by a Convolutional Neural Network. We study the performance of GAME-VQP on a very recent large UGC gaming video database called LIVE-YT-Gaming, and find that it outperforms both mainstream general VQA models and VQA models specifically designed for gaming videos. The new model will be made public after the paper is accepted.

preprint 2022 arXiv

Probing CP Violation in Neutrino-antineutrino Oscillations with Non-unitary Flavor Mixing

If massive neutrinos are Majorana particles, then the lepton number should be violated and neutrino-antineutrino oscillations will take place. In this talk, we present the properties of CP violation in neutrino-antineutrino oscillations with a non-unitary leptonic flavor mixing matrix, which naturally arises in the type-I seesaw model due to the mixing between light and heavy Majorana neutrinos. Taking into account current experimental bounds on the leptonic unitarity violation, we show that the CP asymmetries induced by the non-unitary mixing parameters can significantly deviate from those in the unitarity limit.

preprint 2022 arXiv

Site-specific electronic and magnetic excitations of the skyrmion material Cu$_2$OSeO$_3$

The manifestation of skyrmions in the Mott-insulator Cu$_2$OSeO$_3$ originates from a delicate balance between magnetic and electronic energy scales. As a result of these intertwined couplings, the two symmetry-inequivalent magnetic ions, Cu-I and Cu-II, bond into a spin S=1 entangled tetrahedron. However, conceptualizing the unconventional properties of this material and the energy of the competing interactions is a challenging task due to the complexity of this system. Here we combine X-ray Absorption Spectroscopy and Resonant Inelastic X-ray Scattering to uncover the electronic and magnetic excitations of Cu$_2$OSeO$_3$ with site-specificity. We quantify the energies of the 3d crystal-field splitting for both Cu-I and Cu-II, fundamental to optimize model Hamiltonians. Additionally, we unveil a site-specific magnetic mode, indicating that individual spin character is preserved within the entangled-tetrahedron picture. Our results thus provide experimental constraints for validating theories that describe the interactions of Cu$_2$OSeO$_3$, highlighting the site-selective capabilities of resonant spectroscopies.

preprint 2022 arXiv

Subjective and Objective Analysis of Streamed Gaming Videos

The rising popularity of online User-Generated-Content (UGC) in the form of streamed and shared videos has hastened the development of perceptual Video Quality Assessment (VQA) models, which can be used to help optimize their delivery. Gaming videos, which are a relatively new type of UGC videos, are created when skilled gamers post videos of their gameplay. These kinds of screenshots of UGC gameplay videos have become extremely popular on major streaming platforms like YouTube and Twitch. Synthetically-generated gaming content presents challenges to existing VQA algorithms, including those based on natural scene/video statistics models. Synthetically generated gaming content presents different statistical behavior than naturalistic videos. A number of studies have been directed towards understanding the perceptual characteristics of professionally generated gaming videos arising in gaming video streaming, online gaming, and cloud gaming. However, little work has been done on understanding the quality of UGC gaming videos, and how it can be characterized and predicted. Towards boosting the progress of gaming video VQA model development, we conducted a comprehensive study of subjective and objective VQA models on UGC gaming videos. To do this, we created a novel UGC gaming video resource, called the LIVE-YouTube Gaming video quality (LIVE-YT-Gaming) database, comprising 600 real UGC gaming videos. We conducted a subjective human study on this data, yielding 18,600 human quality ratings recorded by 61 human subjects. We also evaluated a number of state-of-the-art (SOTA) VQA models on the new database, including a new one, called GAME-VQP, based on both natural video statistics and CNN-learned features. To help support work in this field, we are making the new LIVE-YT-Gaming Database publicly available through the link: https://live.ece.utexas.edu/research/LIVE-YT-Gaming/index.html.

preprint 2021 arXiv

Image Quality Assessment using Contrastive Learning

We consider the problem of obtaining image quality representations in a self-supervised manner. We use prediction of distortion type and degree as an auxiliary task to learn features from an unlabeled image dataset containing a mixture of synthetic and realistic distortions. We then train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve the auxiliary problem. We refer to the proposed training framework and resulting deep IQA model as the CONTRastive Image QUality Evaluator (CONTRIQUE). During evaluation, the CNN weights are frozen and a linear regressor maps the learned representations to quality scores in a No-Reference (NR) setting. We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models, even without any additional fine-tuning of the CNN backbone. The learned representations are highly robust and generalize well across images afflicted by either synthetic or authentic distortions. Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets. The implementations used in this paper are available at https://github.com/pavancm/CONTRIQUE.

preprint 2021 arXiv

Regression or Classification? New Methods to Evaluate No-Reference Picture and Video Quality Models

Video and image quality assessment has long been projected as a regression problem, which requires predicting a continuous quality score given an input stimulus. However, recent efforts have shown that accurate quality score regression on real-world user-generated content (UGC) is a very challenging task. To make the problem more tractable, we propose two new methods - binary, and ordinal classification - as alternatives to evaluate and compare no-reference quality models at coarser levels. Moreover, the proposed new tasks convey more practical meaning on perceptually optimized UGC transcoding, or for preprocessing on media processing platforms. We conduct a comprehensive benchmark experiment of popular no-reference quality models on recent in-the-wild picture and video quality datasets, providing reliable baselines for both evaluation methods to support further studies. We hope this work promotes coarse-grained perceptual modeling and its applications to efficient UGC processing.

preprint 2020 arXiv

BBAND Index: A No-Reference Banding Artifact Predictor

Banding artifact, or false contouring, is a common video compression impairment that tends to appear on large flat regions in encoded videos. These staircase-shaped color bands can be very noticeable in high-definition videos. Here we study this artifact, and propose a new distortion-specific no-reference video quality model for predicting banding artifacts, called the Blind BANding Detector (BBAND index). BBAND is inspired by human visual models. The proposed detector can generate a pixel-wise banding visibility map and output a banding severity score at both the frame and video levels. Experimental results show that our proposed method outperforms state-of-the-art banding detection algorithms and delivers better consistency with subjective evaluations.

preprint 2020 arXiv

GIFnets: Differentiable GIF Encoding Framework

Graphics Interchange Format (GIF) is a widely used image file format. Due to the limited number of palette colors, GIF encoding often introduces color banding artifacts. Traditionally, dithering is applied to reduce color banding, but this introduces dotted-pattern artifacts. To reduce artifacts and provide a better and more efficient GIF encoding, we introduce a differentiable GIF encoding pipeline, which includes three novel neural networks: PaletteNet, DitherNet, and BandingNet. Each of these three networks provides an important functionality within the GIF encoding pipeline. PaletteNet predicts a near-optimal color palette given an input image. DitherNet manipulates the input image to reduce color banding artifacts and provides an alternative to traditional dithering. Finally, BandingNet is designed to detect color banding, and provides a new perceptual loss specifically for GIF images. As far as we know, this is the first fully differentiable GIF encoding pipeline based on deep neural networks and compatible with existing GIF decoders. A user study shows that our algorithm is better than Floyd-Steinberg based GIF encoding.
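For context on the traditional baseline named in the abstract, the following is a minimal sketch of classical Floyd-Steinberg error diffusion. It assumes a grayscale image stored as a list of rows and a two-level palette (0 and 255), which is a simplification of GIF's color palettes. It illustrates the conventional dithering step only and is not DitherNet or any other part of the GIFnets pipeline.

```python
def floyd_steinberg(image, threshold=128):
    """Binarize a grayscale image (rows of values in 0..255) with error diffusion.

    Each pixel is snapped to 0 or 255 and its quantization error is pushed
    to unvisited neighbours with the classical 7/16, 3/16, 5/16, 1/16 weights.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    work = [[float(v) for v in row] for row in image]  # mutable working copy
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 255 if old >= threshold else 0
            out[y][x] = new
            err = old - new
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16          # right
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16  # below left
                work[y + 1][x] += err * 5 / 16          # below
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16  # below right
    return out


# Usage: a flat mid-gray patch becomes a mix of black and white pixels.
dithered = floyd_steinberg([[128] * 4 for _ in range(4)])
```

The scattered black and white pixels this produces on flat regions are the dotted-pattern artifacts the abstract refers to.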

preprint 2020 arXiv

Incorporating Reinforced Adversarial Learning in Autoregressive Image Generation

Autoregressive models recently achieved comparable results versus state-of-the-art Generative Adversarial Networks (GANs) with the help of Vector Quantized Variational AutoEncoders (VQ-VAE). However, autoregressive models have several limitations, such as exposure bias, and their training objective does not guarantee visual fidelity. To address these limitations, we propose to use Reinforced Adversarial Learning (RAL) based on policy gradient optimization for autoregressive models. By applying RAL, we enable a similar process for training and testing to address the exposure bias issue. In addition, visual fidelity has been further optimized with adversarial loss inspired by their strong counterparts: GANs. Due to the slow sampling speed of autoregressive models, we propose to use partial generation for faster training. RAL also empowers the collaboration between different modules of the VQ-VAE framework. To the best of our knowledge, the proposed method is the first to enable adversarial learning in autoregressive models for image generation. Experiments on synthetic and real-world datasets show improvements over the MLE trained models. The proposed method improves both negative log-likelihood (NLL) and Fréchet Inception Distance (FID), which indicates improvements in terms of visual quality and diversity. The proposed method achieves state-of-the-art results on CelebA for 64 $\times$ 64 image resolution, showing promise for large scale image generation.

preprint 2020 arXiv

Large deviations of radial SLE$_{\infty}$

We derive the large deviation principle for radial Schramm-Loewner evolution ($\operatorname{SLE}$) on the unit disk with parameter $\kappa \rightarrow \infty$. Restricting to the time interval $[0,1]$, the good rate function is finite only on a certain family of Loewner chains driven by absolutely continuous probability measures $\{\phi_t^2(\zeta)\, d\zeta\}_{t \in [0,1]}$ on the unit circle and equals $\int_0^1 \int_{S^1} |\phi_t'|^2/2\, d\zeta\, dt$. Our proof relies on the large deviation principle for the long-time average of the Brownian occupation measure by Donsker and Varadhan.
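Written as a display, the rate function stated in the abstract reads as follows. The labels $I$ and $\rho_t$ are introduced here only for readability and are assumptions of this sketch, not notation taken from the paper.

```latex
\[
  I\bigl(\{\rho_t\}_{t \in [0,1]}\bigr)
  = \frac{1}{2} \int_0^1 \!\! \int_{S^1} \bigl|\phi_t'(\zeta)\bigr|^2 \, d\zeta \, dt,
  \qquad \text{where } \rho_t(d\zeta) = \phi_t^2(\zeta)\, d\zeta ,
\]
```

with $I = +\infty$ for driving measures outside this absolutely continuous family, consistent with the statement that the rate function is finite only on that family.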

preprint 2020 arXiv

Multimodal Style Transfer via Graph Cuts

An assumption widely used in recent neural style transfer methods is that image styles can be described by global statistics of deep features like Gram or covariance matrices. Alternative approaches have represented styles by decomposing them into local pixel or neural patches. Despite the recent progress, most existing methods treat the semantic patterns of the style image uniformly, resulting in unpleasing results on complex styles. In this paper, we introduce a more flexible and general universal style transfer technique: multimodal style transfer (MST). MST explicitly considers the matching of semantic patterns in content and style images. Specifically, the style image features are clustered into sub-style components, which are matched with local content features under a graph cut formulation. A reconstruction network is trained to transfer each sub-style and render the final stylized result. We also generalize MST to improve some existing methods. Extensive experiments demonstrate the superior effectiveness, robustness, and flexibility of MST.

preprint 2020 arXiv

Shape Adaptor: A Learnable Resizing Module

We present a novel resizing module for neural networks: shape adaptor, a drop-in enhancement built on top of traditional resizing layers, such as pooling, bilinear sampling, and strided convolution. Whilst traditional resizing layers have fixed and deterministic reshaping factors, our module allows for a learnable reshaping factor. Our implementation enables shape adaptors to be trained end-to-end without any additional supervision, through which network architectures can be optimised for each individual task, in a fully automated way. We performed experiments across seven image classification datasets, and results show that by simply using a set of our shape adaptors instead of the original resizing layers, performance increases consistently over human-designed networks, across all datasets. Additionally, we show the effectiveness of shape adaptors on two other applications: network compression and transfer learning. The source code is available at: https://github.com/lorenmt/shape-adaptor.

preprint 2020 arXiv

Subjective Quality Assessment for YouTube UGC Dataset

Due to the scale of social video sharing, User Generated Content (UGC) is getting more attention from academia and industry. To facilitate compression-related research on UGC, YouTube has released a large-scale dataset. The initial dataset only provided videos, limiting its use in quality assessment. We used a crowd-sourcing platform to collect subjective quality scores for this dataset. We analyzed the distribution of Mean Opinion Score (MOS) in various dimensions, and investigated some fundamental questions in video quality assessment, like the correlation between full video MOS and corresponding chunk MOS, and the influence of chunk variation in quality score aggregation.

preprint 2019 arXiv

YouTube UGC Dataset for Video Compression Research

Non-professional video, commonly known as User Generated Content (UGC), has become very popular in today's video sharing applications. However, traditional metrics used in compression and quality assessment, like BD-Rate and PSNR, are designed for pristine originals. Thus, their accuracy drops significantly when being applied on non-pristine originals (the majority of UGC). Understanding difficulties for compression and quality assessment in the scenario of UGC is important, but there are few public UGC datasets available for research. This paper introduces a large scale UGC dataset (1500 20 sec video clips) sampled from millions of YouTube videos. The dataset covers popular categories like Gaming, Sports, and new features like High Dynamic Range (HDR). Besides a novel sampling method based on features extracted from encoding, challenges for UGC compression and quality evaluation are also discussed. Shortcomings of traditional reference-based metrics on UGC are addressed. We demonstrate a promising way to evaluate UGC quality by no-reference objective quality metrics, and evaluate the current dataset with three no-reference metrics (Noise, Banding, and SLEEQ).
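To make concrete why a reference-based metric such as PSNR presumes a pristine original, here is a minimal sketch of the standard PSNR formula, 10 * log10(MAX^2 / MSE). It assumes flat sequences of 8-bit sample values; the function name and input layout are illustrative and not taken from the paper.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences.

    The score measures fidelity to `reference` only, so it says nothing about
    perceived quality when the reference itself is already degraded.
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no measurable distortion
    return 10.0 * math.log10(max_value ** 2 / mse)


# Usage: every sample off by one code value out of 255.
score_db = psnr([0, 0, 0, 0], [1, 1, 1, 1])
```

Because the score depends only on the difference from the reference, a faithful encode of a noisy upload can score highly while still looking poor, which is the shortcoming the abstract describes for non-pristine originals.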

preprint 2016 arXiv

Doping-driven orbital-selective Mott transition in multi-band Hubbard models with crystal field splitting

We have studied the doping-driven orbital-selective Mott transition in multi-band Hubbard models with equal band width in the presence of crystal field splitting. Crystal field splitting lifts one of the bands while leaving the others degenerate. We use single-site dynamical mean-field theory combined with a continuous-time quantum Monte Carlo impurity solver to calculate a phase diagram as a function of total electron filling $N$ and crystal field splitting $\Delta$. We find a large region of orbital-selective Mott phase in the phase diagram when the doping is large enough. Further analysis indicates that the large region of orbital-selective Mott phase is driven and stabilized by doping. Such models may account for the orbital-selective Mott transition in some doped realistic strongly correlated materials.

preprint2016arXiv

Hierarchical Attention Network for Action Recognition in Videos

Understanding human actions in wild videos is an important task with a broad range of applications. In this paper we propose a novel approach named Hierarchical Attention Network (HAN), which incorporates static spatial information, short-term motion information and long-term video temporal structures for complex human action understanding. Compared to recent convolutional neural network based approaches, HAN has the following advantages: (1) HAN can efficiently capture video temporal structures over a longer range; (2) HAN is able to reveal temporal transitions between frame chunks with different time steps, i.e., it explicitly models the temporal transitions between frames as well as video segments; and (3) with a multiple-step spatial-temporal attention mechanism, HAN automatically learns important regions in video frames and temporal segments in the video. The proposed model is trained and evaluated on the standard video action benchmarks, i.e., UCF-101 and HMDB-51, and it significantly outperforms the state of the art.

preprint2015arXiv

$i$QIST: An open source continuous-time quantum Monte Carlo impurity solver toolkit

Quantum impurity solvers have a broad range of applications in theoretical studies of strongly correlated electron systems. In particular, they play a key role in dynamical mean-field theory calculations of correlated lattice models and realistic materials. Therefore, the development and implementation of efficient quantum impurity solvers is an important task. In this paper, we present an open source interacting quantum impurity solver toolkit (dubbed $i$QIST). This package contains several highly optimized quantum impurity solvers which are based on the hybridization expansion continuous-time quantum Monte Carlo algorithm, as well as some essential pre- and post-processing tools. We first introduce the basic principle of the continuous-time quantum Monte Carlo algorithm and then discuss the implementation details and optimization strategies. The software framework, major features, and installation procedure for $i$QIST are also explained. Finally, several simple tutorials are presented in order to demonstrate the usage and power of $i$QIST.

preprint2015arXiv

Breakdown of compensation and persistence of non-saturating magnetoresistance in WTe2 thin flakes

We present a detailed study of magnetoresistance ρxx(H), Hall effect ρxy(H), and electrolyte gating effect in thin (<100 nm) exfoliated crystals of WTe2. We observe quantum oscillations in H of both ρxx(H) and ρxy(H), and identify four oscillation frequencies consistent with previous reports in thick crystals. ρxy(H) is linear in H at low H, consistent with near-perfect electron-hole compensation, but becomes nonlinear and changes sign with increasing H, implying a breakdown of compensation. A field-dependent ratio of carrier concentrations p/n can consistently explain ρxx(H) and ρxy(H) within a two-fluid model. We also employ an electrolytic gate to highly electron-dope WTe2 with Li. The non-saturating ρxx(H) persists to H = 14 T with magnetoresistance ratio exceeding 2 × 10^4 %, even with significant deviation from perfect electron-hole compensation (p/n = 0.84), where the two-fluid model predicts a saturating ρxx(H). Our results suggest electron-hole compensation is not the mechanism for extremely large magnetoresistance in WTe2; alternative explanations need to be considered.
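The contrast between a compensated and an uncompensated two-fluid picture can be illustrated with a minimal sketch. It uses the textbook two-band Drude expressions for ρxx and ρxy, which are assumed here and are not necessarily the exact model fitted in the paper. The function names, carrier densities and mobilities are hypothetical placeholders, not values from the study.

```python
# Minimal sketch of the textbook two-band (electron + hole) Drude model.
# Assumption: isotropic, field-independent mobilities; not the authors' fit.

E_CHARGE = 1.602176634e-19  # elementary charge in coulombs


def two_band_resistivity(B, n, p, mu_e, mu_h):
    """Return (rho_xx, rho_xy) in ohm*m.

    B      magnetic field in tesla
    n, p   electron and hole densities in m^-3
    mu_e, mu_h  electron and hole mobilities in m^2/(V s)
    """
    denom = (n * mu_e + p * mu_h) ** 2 + ((p - n) * mu_e * mu_h * B) ** 2
    rho_xx = ((n * mu_e + p * mu_h)
              + (n * mu_h + p * mu_e) * mu_e * mu_h * B ** 2) / (E_CHARGE * denom)
    rho_xy = B * ((p * mu_h ** 2 - n * mu_e ** 2)
                  + (p - n) * (mu_e * mu_h * B) ** 2) / (E_CHARGE * denom)
    return rho_xx, rho_xy


def magnetoresistance(B, n, p, mu_e, mu_h):
    """Return the dimensionless ratio [rho_xx(B) - rho_xx(0)] / rho_xx(0)."""
    rho_0, _ = two_band_resistivity(0.0, n, p, mu_e, mu_h)
    rho_b, _ = two_band_resistivity(B, n, p, mu_e, mu_h)
    return (rho_b - rho_0) / rho_0


if __name__ == "__main__":
    n = 1.0e25          # hypothetical electron density
    mu = 1.0            # hypothetical mobility, same for both carriers
    for ratio in (1.0, 0.84):
        for B in (14.0, 1.0e3, 1.0e4):
            mr = magnetoresistance(B, n, ratio * n, mu, mu)
            print(f"p/n = {ratio:.2f}, B = {B:g} T, MR = {mr:.2f}")
```

In this model perfect compensation (p = n) gives a magnetoresistance of μeμhB² that grows without bound, while any p ≠ n makes ρxx level off at high field. That is the saturation the abstract says the two-fluid model predicts at p/n = 0.84.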

preprint2015arXiv

Electronic transport properties of Ir-decorated graphene

Graphene decorated with 5d transition metal atoms is predicted to exhibit many intriguing properties; for example, iridium adatoms are proposed to induce a substantial topological gap in graphene. We extensively investigated the conductivity of single-layer graphene decorated with iridium deposited in ultra-high vacuum at low temperature (7 K) as a function of Ir concentration, carrier density, temperature, and annealing conditions. Our results are consistent with the formation of Ir clusters of ~100 atoms at low temperature, with each cluster donating a single electronic charge to graphene. Annealing graphene increases the cluster size, reducing the doping and increasing the mobility. We do not observe any sign of an energy gap induced by spin-orbit coupling, possibly due to the clustering of Ir.

preprint2015arXiv

Interaction-induced quantum anomalous Hall phase in (111) bilayer of LaCoO$_3$

In the present paper, the Gutzwiller density functional theory (LDA+G) has been applied to study a bilayer system of LaCoO$_3$ grown along the $(111)$ direction on SrTiO$_3$. The LDA calculations show that there are two nearly flat bands located at the top and bottom of $e_{g}$ bands of Co atoms with the Fermi level crossing the lower one, which is almost half-filled. After including both the spin-orbit coupling and the Coulomb interaction in the LDA+G method, we find that the interplay between spin-orbit coupling and Coulomb interaction stabilizes a very robust ferromagnetic insulator phase with non-zero Chern number, which indicates the possibility to realize quantum anomalous Hall effect in this system.

preprint2015arXiv

Neutral-current Hall effects in disordered graphene

A non-local Hall bar geometry is used to detect neutral-current Hall effects in graphene on silicon dioxide. Disorder is tuned by the addition of Au or Ir adatoms in ultra-high vacuum. A reproducible neutral-current Hall effect is found in both as-fabricated and adatom-decorated graphene. The Hall angle exhibits a complex but reproducible dependence on gate voltage and disorder, and notably breaks electron-hole symmetry. An exponential dependence on length between Hall and inverse-Hall probes indicates a neutral current relaxation length of approximately 300 nm. The short relaxation length and lack of precession in parallel magnetic field suggest that the neutral currents are valley currents. The near lack of temperature dependence from 7-300 K is unprecedented and promising for using controlled disorder for room temperature neutral-current electronics.

preprint2015arXiv

Strong charge and spin fluctuations in La$_2$O$_3$Fe$_2$Se$_2$

The electronic structure and magnetic properties of the strongly correlated material La$_2$O$_3$Fe$_2$Se$_2$ are studied by using both the density functional theory plus $U$ (DFT+$U$) method and the DFT plus Gutzwiller (DFT+G) variational method. The ground-state magnetic structure of this material obtained with DFT+$U$ is consistent with recent experiments, but its band gap is significantly overestimated by DFT+$U$, even with a small Hubbard $U$ value. In contrast, the DFT+G method yields a band gap of 0.1 - 0.2 eV, in excellent agreement with experiment. Detailed analysis shows that the electronic and magnetic properties of La$_2$O$_3$Fe$_2$Se$_2$ are strongly affected by charge and spin fluctuations which are missing in the DFT+$U$ method.

preprint2015arXiv

Unsupervised Video Analysis Based on a Spatiotemporal Saliency Detector

Visual saliency, which predicts regions in the field of view that draw the most visual attention, has attracted a lot of interest from researchers. It has already been used in several vision tasks, e.g., image classification, object detection, and foreground segmentation. Recently, the spectrum analysis based visual saliency approach, in which the phase information of the image is used to construct the saliency map, has attracted a lot of interest due to its simplicity and good performance. In this paper, we propose a new approach for detecting spatiotemporal visual saliency based on the phase spectrum of the videos, which is easy to implement and computationally efficient. With the proposed algorithm, we also study how the spatiotemporal saliency can be used in two important vision tasks, abnormality detection and spatiotemporal interest point detection. The proposed algorithm is evaluated on several commonly used datasets with comparison to the state-of-the-art methods from the literature. The experiments demonstrate the effectiveness of the proposed approach to spatiotemporal visual saliency detection and its application to the above vision tasks.

preprint2013arXiv

Dynamical Screening Effect on Local Two-Particle Vertex Functions

In principle, the electronic Coulomb interaction among the correlated orbitals is frequency-dependent. Though it is generally believed that the dynamically screened interaction may play a crucial role in understanding the subtle electronic structures of strongly correlated materials, hitherto we know very little about it. In this Letter, we demonstrate that within the framework of single-site dynamical mean-field theory the local two-particle Green's functions $χ$ and vertex functions $Γ$ are strongly modified by the dynamically screened interaction. Since both $χ$ and $Γ$ represent the main ingredients to compute momentum-resolved response functions and to treat non-local spatial correlations by means of diagrammatic extensions of dynamical mean-field theory, it is urgent to reexamine previous results by taking the dynamical screening effect into account. The modifications should be very considerable.

preprint2012arXiv

Driven polymer translocation through a cylindrical nanochannel: Interplay between the channel length and the chain length

Using analytical techniques and Langevin dynamics simulations, we investigate the dynamics of polymer translocation through a nanochannel embedded in two dimensions under an applied external field. We examine the translocation time for various ratios of the channel length $L$ to the polymer length $N$. For short channels $L\ll N$, the translocation time $τ\sim N^{1+ν}$ under weak driving force $F$, while $τ\sim F^{-1}L$ for long channels $L\gg N$, independent of the chain length $N$. Moreover, we observe a minimum of translocation time as a function of $L/N$ for different driving forces and channel widths. These results are interpreted by the waiting time of a single segment.
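The two limiting scaling laws quoted above can be made concrete with a minimal sketch that compares translocation times only as ratios, since the abstract gives no prefactors. The value ν = 0.75 is an assumption here: it is the Flory exponent of a self-avoiding chain in two dimensions, which the abstract does not state explicitly. The function names are hypothetical.

```python
# Sketch of the two limiting scaling laws; only ratios are meaningful because
# the prefactors are not given in the abstract.

NU_2D = 0.75  # assumed Flory exponent for a self-avoiding chain in 2D


def short_channel_time_ratio(n_ref, n_new, nu=NU_2D):
    """Ratio tau(n_new) / tau(n_ref) for L << N, where tau ~ N**(1 + nu)."""
    return (n_new / n_ref) ** (1.0 + nu)


def long_channel_time_ratio(l_ref, f_ref, l_new, f_new):
    """Ratio tau_new / tau_ref for L >> N, where tau ~ L / F (no N dependence)."""
    return (l_new / l_ref) * (f_ref / f_new)


if __name__ == "__main__":
    # Doubling the chain length in the short-channel regime.
    print(short_channel_time_ratio(100, 200))
    # Doubling both channel length and driving force in the long-channel regime.
    print(long_channel_time_ratio(10.0, 1.0, 20.0, 2.0))
```

Under these assumptions, doubling the chain length in the short-channel regime multiplies the translocation time by 2^1.75, whereas in the long-channel regime the chain length drops out and only the ratio L/F matters.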

preprint2012arXiv

Dynamical screening in strongly correlated metal SrVO3

The consequences of dynamical screening of the Coulomb interaction among correlated electrons in realistic materials have not been widely considered before. In this Letter we incorporate a frequency-dependent Coulomb interaction into the state-of-the-art ab initio electronic structure computing framework of local density approximation plus dynamical mean-field theory, and then choose SrVO3 as a prototype material to demonstrate the importance of the dynamical screening effect. It is shown to renormalise the spectral weight near the Fermi level, to increase the effective mass, and to noticeably suppress the t2g quasiparticle band width. The calculated results are in accordance with very recent angle-resolved photoemission spectroscopy experiments and Bose factor ansatz calculations.

preprint2011arXiv

Fermi Level Tuning of Epitaxial Sb2Te3 Thin Films on Graphene by Regulating Intrinsic Defects and Substrate Transfer Doping

High-quality Sb2Te3 films are obtained by molecular beam epitaxy on graphene substrate and investigated by in situ scanning tunneling microscopy/spectroscopy. Intrinsic defects responsible for the natural p-type conductivity of Sb2Te3 are identified to be the Sb vacancies and SbTe antisites in agreement with first-principles calculations. By minimizing defect densities, coupled with a transfer doping by the graphene substrate, the Fermi level of Sb2Te3 thin films can be tuned over the entire range of the bulk band gap. This establishes the necessary condition to explore topological insulator behaviors near the Dirac point.

preprint2011arXiv

Landau quantization and the thickness limit of topological insulator thin films of Sb2Te3

We report the experimental observation of Landau quantization of molecular beam epitaxy grown Sb2Te3 thin films by a low-temperature scanning tunneling microscope. Different from all the reported systems, the Landau quantization in Sb2Te3 topological insulator is not sensitive to the intrinsic substitutional defects in the films. As a result, a nearly perfect linear energy dispersion of surface states as 2D massless Dirac fermion system is achieved. We demonstrate that 4 quintuple layers are the thickness limit for Sb2Te3 thin film being a 3D topological insulator. The mechanism of the Landau level broadening is discussed in terms of enhanced quasiparticle lifetime.

preprint2011arXiv

Phases and phase transitions in two dimensional superconducting films

This paper has been withdrawn by the author

preprint2011arXiv

Pressure-Driven Orbital Selective Insulator to Metal Transition and Spin State Crossover in Cubic CoO

The metal-insulator and spin state transitions of CoO under high pressure are studied by using density functional theory combined with dynamical mean-field theory. Our calculations predict that the metal-insulator transition in CoO is a typical orbital selective Mott transition, where the $t_{2g}$ orbitals of the Co 3d shell first become metallic around 60 GPa while the $e_g$ orbitals remain insulating until 170 GPa. Further studies of the spin states of the Co 3d shell reveal that the orbital selective Mott phase in the intermediate pressure regime is mainly stabilized by the high-spin state of the Co 3d shell, and the transition from this phase to the fully metallic state is driven by the high-spin to low-spin transition of the Co$^{2+}$ ions. Our results are in good agreement with the most recent transport and x-ray emission experiments under high pressure.

preprint2010arXiv

Landau Quantization of Massless Dirac Fermions in Topological Insulator

The recent theoretical prediction and experimental realization of topological insulators (TI) have generated intense interest in this new state of quantum matter. The surface states of a three-dimensional (3D) TI such as Bi_2Te_3, Bi_2Se_3 and Sb_2Te_3 consist of a single massless Dirac cone. Crossing of the two surface state branches with opposite spins in the materials is fully protected by the time reversal (TR) symmetry at the Dirac points, which cannot be destroyed by any TR invariant perturbation. Recent advances in thin-film growth have permitted this unique two-dimensional electron system (2DES) to be probed by scanning tunneling microscopy (STM) and spectroscopy (STS). The intriguing TR symmetry protected topological states were revealed in STM experiments where the backscattering induced by non-magnetic impurities was forbidden. Here we report the Landau quantization of the topological surface states in Bi_2Se_3 in magnetic field by using STM/STS. The direct observation of the discrete Landau levels (LLs) strongly supports the 2D nature of the topological states and gives direct proof of the nondegenerate structure of LLs in TI. We demonstrate the linear dispersion of the massless Dirac fermions by the square-root dependence of LLs on magnetic field. The formation of LLs implies the high mobility of the 2DES, which has been predicted to lead to the topological magneto-electric effect of the TI.
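The square-root dependence mentioned above follows from the standard Landau level spectrum of massless Dirac fermions, E_n = E_D + sgn(n) v_F sqrt(2 e ħ |n| B). The sketch below evaluates that textbook formula; the Fermi velocity and field values are hypothetical placeholders rather than numbers from the paper, and the function name is illustrative.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant in J*s
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs


def landau_level_energy(n, B, v_fermi, e_dirac=0.0):
    """Energy in eV of the n-th Landau level of a massless Dirac cone.

    n        integer level index (negative below the Dirac point)
    B        magnetic field in tesla
    v_fermi  Fermi velocity in m/s (hypothetical input)
    e_dirac  Dirac point energy in eV
    """
    if n == 0:
        # The zeroth level sits at the Dirac point for any field.
        return e_dirac
    magnitude = v_fermi * math.sqrt(2.0 * E_CHARGE * HBAR * abs(n) * B) / E_CHARGE
    return e_dirac + math.copysign(magnitude, n)


if __name__ == "__main__":
    v_f = 5.0e5  # placeholder Fermi velocity
    for n in range(0, 5):
        print(n, landau_level_energy(n, 11.0, v_f))
```

Plotting the measured level energies against sqrt(|n| B) should therefore give a straight line whose slope fixes v_F, in contrast to the equally spaced levels, linear in B, of a conventional massive 2D electron gas.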

41 published item(s)

CIRAG: Construction-Integration Retrieval and Adaptive Generation for Multi-hop Question Answering

Preferences Order, Ratings Anchor: From Fused Expert Aesthetic Ground Truth to Self-Distillation

RecruitScope: A Visual Analytics System for Multidimensional Recruitment Data Analysis

A Video Anomaly Detection Framework based on Appearance-Motion Semantics Representation Consistency

Accidental symmetries in the scalar potential of the Standard Model extended with two Higgs triplets

CONVIQT: Contrastive Video Quality Estimator

Interactive Portrait Harmonization

Making Video Quality Assessment Models Sensitive to Frame Rate Distortions

New insights on carbon black suspension rheology -- anisotropic thixotropy and anti-thixotropy

On the Role of Generalization in Transferability of Adversarial Examples

Perceptual Quality Assessment of UGC Gaming Videos

Probing CP Violation in Neutrino-antineutrino Oscillations with Non-unitary Flavor Mixing

Site-specific electronic and magnetic excitations of the skyrmion material Cu$_2$OSeO$_3$

Subjective and Objective Analysis of Streamed Gaming Videos

Image Quality Assessment using Contrastive Learning

Regression or Classification? New Methods to Evaluate No-Reference Picture and Video Quality Models

BBAND Index: A No-Reference Banding Artifact Predictor

GIFnets: Differentiable GIF Encoding Framework

Incorporating Reinforced Adversarial Learning in Autoregressive Image Generation

Large deviations of radial SLE$_{\infty}$

Multimodal Style Transfer via Graph Cuts

Shape Adaptor: A Learnable Resizing Module

Subjective Quality Assessment for YouTube UGC Dataset

YouTube UGC Dataset for Video Compression Research

Doping-driven orbital-selective Mott transition in multi-band Hubbard models with crystal field splitting

Hierarchical Attention Network for Action Recognition in Videos

$i$QIST: An open source continuous-time quantum Monte Carlo impurity solver toolkit

Breakdown of compensation and persistence of non-saturating magnetoresistance in WTe2 thin flakes

Electronic transport properties of Ir-decorated graphene

Interaction-induced quantum anomalous Hall phase in (111) bilayer of LaCoO$_3$

Neutral-current Hall effects in disordered graphene

Strong charge and spin fluctuations in La$_2$O$_3$Fe$_2$Se$_2$

Unsupervised Video Analysis Based on a Spatiotemporal Saliency Detector

Dynamical Screening Effect on Local Two-Particle Vertex Functions

Driven polymer translocation through a cylindrical nanochannel: Interplay between the channel length and the chain length

Dynamical screening in strongly correlated metal SrVO3

Fermi Level Tuning of Epitaxial Sb2Te3 Thin Films on Graphene by Regulating Intrinsic Defects and Substrate Transfer Doping

Landau quantization and the thickness limit of topological insulator thin films of Sb2Te3

Phases and phase transitions in two dimensional superconducting films

Pressure-Driven Orbital Selective Insulator to Metal Transition and Spin State Crossover in Cubic CoO

Landau Quantization of Massless Dirac Fermions in Topological Insulator