Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
33 works
0 followers
16 topics
4 close collaborators


Published work

33 published item(s)

preprint, 2026, arXiv

AesRM: Improving Video Aesthetics with Expert-Level Feedback

Despite rapid advances in photorealistic video generation, real-world applications such as filmmaking require video aesthetics, e.g., harmonious colors and cinematic lighting, beyond visual fidelity. Prior work on visual aesthetics largely focuses on images, often reducing aesthetics to coarse definitions, e.g., visual pleasure, without a rigorous and systematic evaluation. To improve video aesthetics, we propose a hierarchical rubric that decomposes video aesthetics into three core dimensions, Visual Aesthetics (VA), Visual Fidelity (VF), and Visual Plausibility (VP), with 15 fine-grained criteria, e.g., shot composition. This framework enables a large-scale expert-annotated preference dataset and an evaluation benchmark, AesVideo-Bench, containing about 2500 video pairs with expert annotations on VA, VF, and VP. We then build a family of Video Aesthetic Reward Models (AesRM): AesRM-Base, which directly predicts pairwise preferences on these dimensions to provide efficient post-training rewards, and AesRM-CoT, which additionally generates CoT aligned with all 15 criteria to improve assessment interpretability. Specifically, we train AesRM with a three-stage progressive scheme: (1) Atomic Aesthetic Capability Learning, which strengthens AesRM's recognition of fundamental aesthetic concepts, e.g., accurately identifying centered composition; (2) Cold-Start, aligning the model with structured reasoning protocols; and (3) GRPO, further improving evaluation accuracy. To enhance AesRM-CoT, we additionally propose self-consistency-based CoT synthesis to improve CoT quality and design CoT-based process rewards during GRPO. Extensive experiments show AesRM outperforms baselines on multiple aesthetics benchmarks and is more robust, with lower position bias. Finally, we align Wan2.2 with AesRM and observe clear aesthetic gains over existing aesthetic reward models.
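The third training stage above uses GRPO. As a rough illustration of the group-relative advantage that GRPO is generally built around (not the authors' training code), the sketch below normalizes rewards within one sampled group. The function name, the example rewards, and the choice of the population standard deviation are assumptions made for this example.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled judgments of the same video pair:
# 1.0 if the predicted preference matches the expert label, else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 1.0])
```

Responses that beat the group average receive a positive advantage and the rest a negative one, so no separate value network is needed to form the baseline.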

preprint, 2026, arXiv

Bridging Brain and Semantics: A Hierarchical Framework for Semantically Enhanced fMRI-to-Video Reconstruction

Reconstructing dynamic visual experiences as videos from functional magnetic resonance imaging (fMRI) is pivotal for advancing the understanding of neural processes. However, current fMRI-to-video reconstruction methods are hindered by a semantic gap between noisy fMRI signals and the rich content of videos, stemming from a reliance on incomplete semantic embeddings that neither capture video-specific cues (e.g., actions) nor integrate prior knowledge. To this end, we draw inspiration from the dual-pathway processing mechanism in the human brain and introduce CineNeuron, a novel hierarchical framework for semantically enhanced video reconstruction from fMRI signals with two synergistic stages. First, a bottom-up semantic enrichment stage maps fMRI signals to a rich embedding space that comprehensively captures textual semantics, image contents, action concepts, and object categories. Second, a top-down memory integration stage utilizes the proposed Mixture-of-Memories method to dynamically select relevant "memories" from previously seen data and fuse them with the fMRI embedding to refine the video reconstruction. Extensive experimental results on two fMRI-to-video benchmarks demonstrate that CineNeuron surpasses state-of-the-art methods across various metrics.

preprint, 2026, arXiv

DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models

Reinforcement learning has emerged as a powerful tool for improving diffusion-based text-to-image models, but existing methods are largely limited to single-task optimization. Extending RL to multiple tasks is challenging: joint optimization suffers from cross-task interference and imbalance, while cascade RL is cumbersome and prone to catastrophic forgetting. We propose DiffusionOPD, a new multi-task training paradigm for diffusion models based on Online Policy Distillation (OPD). DiffusionOPD first trains task-specific teachers independently, then distills their capabilities into a unified student along the student's own rollout trajectories. This decouples single-task exploration from multi-task integration and avoids the optimization burden of solving all tasks jointly from scratch. Theoretically, we lift the OPD framework from discrete tokens to continuous-state Markov processes, deriving a closed-form per-step KL objective that unifies both stochastic SDE and deterministic ODE refinement via mean-matching. We formally and empirically demonstrate that this analytic gradient provides lower variance and better generality compared to conventional PPO-style policy gradients. Extensive experiments show that DiffusionOPD consistently surpasses both multi-reward RL and cascade RL baselines in training efficiency and final performance, while achieving state-of-the-art results on all evaluated benchmarks.
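The abstract does not state the per-step objective itself. As a hedged illustration of why a per-step KL can reduce to mean-matching, assume that student and teacher transitions at a given step are Gaussians sharing the same isotropic variance $\sigma_t^2$ (an assumption of this note, as are the symbols $\mu_S$, $\mu_T$ and $d$). The standard identity for that case is:

```latex
\mathrm{KL}\!\left(\mathcal{N}(\mu_S, \sigma_t^2 I)\,\middle\|\,\mathcal{N}(\mu_T, \sigma_t^2 I)\right)
  = \frac{1}{2}\left[\operatorname{tr}(I) + \frac{\lVert \mu_T - \mu_S \rVert^2}{\sigma_t^2} - d + \ln 1\right]
  = \frac{\lVert \mu_S - \mu_T \rVert^2}{2\sigma_t^2}
```

Under that assumption, minimizing the KL at each step is equivalent to regressing the student mean onto the teacher mean, weighted by $1/(2\sigma_t^2)$.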

preprint, 2026, arXiv

MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation

Video generation is rapidly evolving from single-shot synthesis to complex multi-shot audio-video (MSAV) narratives to meet real-world demands. However, evaluating such frontier models remains a fundamental challenge. Existing benchmarks are limited in scope and data diversity, and rely on rigid evaluation pipelines, preventing systematic and reliable assessment of modern MSAV models. To bridge these gaps, we introduce MSAVBench, the first comprehensive benchmark and adaptive hybrid evaluation framework for multi-shot audio-video generation. Our benchmark spans four key dimensions, video, audio, shot, and reference, covering diverse task settings, varying shot counts of up to 15, and challenging non-realistic scenarios. Our evaluation framework improves robustness through an adaptive self-correction mechanism for shot segmentation, instance-wise rubrics for subjective metrics, and tool-grounded evidence extraction for complex judgments. Furthermore, MSAVBench achieves high alignment with human judgments, reaching a Spearman rank correlation of 91.5%. Our systematic evaluation of 19 state-of-the-art closed- and open-source models shows that current systems still struggle with director-level control and fine-grained audio-visual synchronization, while modular or agentic generation pipelines offer a promising path toward narrowing the gap between open- and closed-source models. We will release the benchmark data and evaluation code to facilitate future research.
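The human-alignment figure above is a Spearman rank correlation. For readers unfamiliar with the metric, here is a minimal pure-Python sketch that ranks both score lists (averaging ranks over ties) and takes the Pearson correlation of the ranks. The score lists are invented for illustration and are not benchmark data.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical automatic scores and human ratings for five models.
auto_scores = [0.9, 0.4, 0.7, 0.2, 0.5]
human_scores = [4, 2, 5, 1, 3]
rho = spearman(auto_scores, human_scores)
```

Because only ranks enter the computation, the metric rewards an evaluator for ordering models the way humans do, regardless of the scale of its raw scores.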

preprint, 2025, arXiv

Critical gate distance for Wigner crystallization in the two-dimensional electron gas

We report on the properties of the two-dimensional electron gas in a dual-gate geometry, using quantum Monte Carlo methods to obtain aspects of the phase diagram as a function of electron density and gate distance. We identify the critical gate distance below which the Wigner crystal phase disappears. For larger gate distances, the system undergoes a re-entrant transition from crystal to liquid at sufficiently low density. We also present preliminary evidence for a fully polarized ferromagnetic liquid state at low electron density and intermediate gate distances. The quantum Monte Carlo results are compared with simpler approximate methods, which are shown to be semi-quantitatively reliable for determining key features of the phase diagram. These methods are then used to obtain the phase boundary between the Wigner crystal and liquid in the single-gate geometry.

preprint, 2023, arXiv

HyRSM++: Hybrid Relation Guided Temporal Set Matching for Few-shot Action Recognition

Recent attempts mainly focus on learning deep representations for each video individually under the episodic meta-learning regime and then performing temporal alignment to match query and support videos. However, they still suffer from two drawbacks: (i) learning individual features without considering the entire task may result in limited representation capability, and (ii) existing alignment strategies are sensitive to noise and misaligned instances. To handle the two limitations, we propose a novel Hybrid Relation guided temporal Set Matching (HyRSM++) approach for few-shot action recognition. The core idea of HyRSM++ is to integrate all videos within the task to learn discriminative representations and involve a robust matching technique. To be specific, HyRSM++ consists of two key components, a hybrid relation module and a temporal set matching metric. Given the basic representations from the feature extractor, the hybrid relation module is introduced to fully exploit associated relations within and across videos in an episodic task and thus can learn task-specific embeddings. Subsequently, in the temporal set matching metric, we carry out the distance measure between query and support videos from a set matching perspective and design a Bi-MHM to improve the resilience to misaligned instances. In addition, we explicitly exploit the temporal coherence in videos to regularize the matching process. Furthermore, we extend the proposed HyRSM++ to deal with the more challenging semi-supervised few-shot action recognition and unsupervised few-shot action recognition tasks. Experimental results on multiple benchmarks demonstrate that our method achieves state-of-the-art performance under various few-shot settings. The source code is available at https://github.com/alibaba-mmai-research/HyRSMPlusPlus.

preprint, 2022, arXiv

Context-aware Proposal Network for Temporal Action Detection

This technical report presents our first-place solution for the temporal action detection task in the CVPR-2022 ActivityNet Challenge. The task aims to localize temporal boundaries of action instances with specific classes in long untrimmed videos. Recent mainstream attempts are based on dense boundary matchings and enumerate all possible combinations to produce proposals. We argue that the generated proposals contain rich contextual information, which may benefit detection confidence prediction. To this end, our method mainly consists of the following three steps: 1) action classification and feature extraction by Slowfast, CSN, TimeSformer, TSP, I3D-flow, VGGish-audio, TPN and ViViT; 2) proposal generation, where our proposed Context-aware Proposal Network (CPN) builds on top of BMN, GTAD and PRN to aggregate contextual information by randomly masking some proposal features; 3) action detection, where the final detection prediction is calculated by assigning the proposals the corresponding video-level classification results. Finally, we ensemble the results under different feature combination settings and achieve 45.8% performance on the test set, which improves the champion result in the CVPR-2021 ActivityNet Challenge by 1.1% in terms of average mAP.

preprint, 2022, arXiv

Discovery-and-Selection: Towards Optimal Multiple Instance Learning for Weakly Supervised Object Detection

Weakly supervised object detection (WSOD) is a challenging task that requires simultaneously learning object classifiers and estimating object locations under the supervision of image category labels. A major line of WSOD methods is rooted in multiple instance learning, which regards images as bags of instances and selects positive instances from each bag to learn the detector. However, a grand challenge emerges when the detector tends to converge to discriminative parts of objects rather than the whole objects. In this paper, under the hypothesis that optimal solutions are included in local minima, we propose a discovery-and-selection approach fused with multiple instance learning (DS-MIL), which finds rich local minima and selects the optimal solution from multiple local minima. To implement DS-MIL, an attention module is proposed so that more context information can be captured by feature maps and more valuable proposals can be collected during training. With proposal candidates, a selection module is proposed to select informative instances for the object detector. Experimental results on commonly used benchmarks show that our proposed DS-MIL approach can consistently improve the baselines, reporting state-of-the-art performance.

preprint, 2022, arXiv

End-to-end Temporal Action Detection with Transformer

Temporal action detection (TAD) aims to determine the semantic label and the temporal interval of every action instance in an untrimmed video. It is a fundamental and challenging task in video understanding. Previous methods tackle this task with complicated pipelines. They often need to train multiple networks and involve hand-designed operations, such as non-maximal suppression and anchor generation, which limit the flexibility and prevent end-to-end learning. In this paper, we propose an end-to-end Transformer-based method for TAD, termed TadTR. Given a small set of learnable embeddings called action queries, TadTR adaptively extracts temporal context information from the video for each query and directly predicts action instances with the context. To adapt Transformer to TAD, we propose three improvements to enhance its locality awareness. The core is a temporal deformable attention module that selectively attends to a sparse set of key snippets in a video. A segment refinement mechanism and an actionness regression head are designed to refine the boundaries and confidence of the predicted instances, respectively. With such a simple pipeline, TadTR requires lower computation cost than previous detectors, while preserving remarkable performance. As a self-contained detector, it achieves state-of-the-art performance on THUMOS14 (56.7% mAP) and HACS Segments (32.09% mAP). Combined with an extra action classifier, it obtains 36.75% mAP on ActivityNet-1.3. Code is available at https://github.com/xlliu7/TadTR.
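The abstract names non-maximal suppression as one of the hand-designed steps that TadTR is meant to avoid. For context, a generic greedy temporal NMS over (start, end) segments might look like the sketch below. This is a textbook-style illustration, not code from the TadTR repository, and the function names, threshold and example segments are assumptions.

```python
def temporal_iou(a, b):
    """Intersection-over-union of two (start, end) segments on the time axis."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(segments, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring segment, drop heavily overlapping ones."""
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(temporal_iou(segments[i], segments[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections in seconds: the first two overlap almost entirely.
segments = [(0.0, 10.0), (1.0, 11.0), (20.0, 30.0)]
scores = [0.9, 0.8, 0.7]
kept = temporal_nms(segments, scores)
```

The threshold and the greedy ordering are tuned by hand and sit outside the training loss, which is the kind of component a query-based detector that predicts a sparse set of instances directly is designed to remove.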

preprint, 2022, arXiv

Hybrid Relation Guided Set Matching for Few-shot Action Recognition

Current few-shot action recognition methods reach impressive performance by learning discriminative features for each video via episodic training and designing various temporal alignment strategies. Nevertheless, they are limited in that (a) learning individual features without considering the entire task may lose the most relevant information in the current episode, and (b) these alignment strategies may fail in misaligned instances. To overcome the two limitations, we propose a novel Hybrid Relation guided Set Matching (HyRSM) approach that incorporates two key components: a hybrid relation module and a set matching metric. The purpose of the hybrid relation module is to learn task-specific embeddings by fully exploiting associated relations within and across videos in an episode. Built upon the task-specific features, we reformulate the distance measure between query and support videos as a set matching problem and further design a bidirectional Mean Hausdorff Metric to improve the resilience to misaligned instances. By this means, the proposed HyRSM can be highly informative and flexible to predict query categories under the few-shot settings. We evaluate HyRSM on six challenging benchmarks, and the experimental results show its superiority over the state-of-the-art methods by a convincing margin. Project page: https://hyrsm-cvpr2022.github.io/.
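To make the set-matching idea concrete, the sketch below computes a generic bidirectional mean Hausdorff distance between two sets of frame features: each frame is matched to its nearest frame in the other set, and the two directed averages are summed. The cosine distance, the function names and the toy features are assumptions; the paper's exact formulation may differ.

```python
import math


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def bidirectional_mean_hausdorff(query, support):
    """Sum of the two directed mean nearest-neighbor distances between frame sets."""
    q_to_s = sum(min(cosine_distance(q, s) for s in support) for q in query) / len(query)
    s_to_q = sum(min(cosine_distance(s, q) for q in query) for s in support) / len(support)
    return q_to_s + s_to_q


# Toy 2-D frame features: the support video shows the same content in reverse order.
query_frames = [[1.0, 0.0], [0.0, 1.0]]
support_frames = [[0.0, 2.0], [3.0, 0.0]]
distance = bidirectional_mean_hausdorff(query_frames, support_frames)
```

Because every frame looks for its best match anywhere in the other video, the distance does not depend on frame order, which is the property that makes a set-based metric tolerant of temporally misaligned instances.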

preprint, 2022, arXiv

Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency

Natural videos provide rich visual contents for self-supervised learning. Yet most existing approaches for learning spatio-temporal representations rely on manually trimmed videos, leading to limited diversity in visual patterns and limited performance gain. In this work, we aim to learn representations by leveraging more abundant information in untrimmed videos. To this end, we propose to learn a hierarchy of consistencies in videos, i.e., visual consistency and topical consistency, corresponding respectively to clip pairs that tend to be visually similar when separated by a short time span and share similar topics when separated by a long time span. Specifically, a hierarchical consistency learning framework HiCo is presented, where the visually consistent pairs are encouraged to have the same representation through contrastive learning, while the topically consistent pairs are coupled through a topical classifier that distinguishes whether they are topic related. Further, we impose a gradual sampling algorithm for the proposed hierarchical consistency learning, and demonstrate its theoretical superiority. Empirically, we show that HiCo not only generates stronger representations on untrimmed videos, but also improves the representation quality when applied to trimmed videos. This is in contrast to standard contrastive learning, which fails to learn appropriate representations from untrimmed videos.

preprint, 2022, arXiv

Open-world Semantic Segmentation for LIDAR Point Clouds

Current methods for LIDAR semantic segmentation are not robust enough for real-world applications, e.g., autonomous driving, since they are closed-set and static. The closed-set assumption makes the network only able to output labels of trained classes, even for objects never seen before, while a static network cannot update its knowledge base according to what it has seen. Therefore, in this work, we propose the open-world semantic segmentation task for LIDAR point clouds, which aims to 1) identify both old and novel classes using open-set semantic segmentation, and 2) gradually incorporate novel objects into the existing knowledge base using incremental learning without forgetting old classes. For this purpose, we propose a REdundAncy cLassifier (REAL) framework to provide a general architecture for both the open-set semantic segmentation and incremental learning problems. The experimental results show that REAL can simultaneously achieve state-of-the-art performance in the open-set semantic segmentation task on the SemanticKITTI and nuScenes datasets, and alleviate the catastrophic forgetting problem by a large margin during incremental learning.

preprint, 2022, arXiv

Precision Many-Body Study of the Berezinskii-Kosterlitz-Thouless Transition and Temperature-Dependent Properties in the Two-Dimensional Fermi Gas

We perform large-scale, numerically exact calculations on the two-dimensional interacting Fermi gas with a contact attraction. Reaching much larger lattice sizes and lower temperatures than previously possible, we determine systematically the finite-temperature phase diagram of the Berezinskii-Kosterlitz-Thouless (BKT) transitions for interaction strengths ranging from BCS to crossover to BEC regimes. The evolutions of the pairing wavefunctions and the fermion and Cooper pair momentum distributions with temperature are accurately characterized. In the crossover regime, we find that the contact has a non-monotonic temperature dependence, first increasing as temperature is lowered, and then showing a slight decline below the BKT transition temperature to approach the ground-state value from above.

preprint, 2022, arXiv

Solving 2D and 3D lattice models of correlated fermions -- combining matrix product states with mean field theory

Correlated electron states are at the root of many important phenomena including unconventional superconductivity (USC), where electron-pairing arises from repulsive interactions. Computing the properties of correlated electrons, such as the critical temperature $T_c$ for the onset of USC, efficiently and reliably from the microscopic physics with quantitative methods remains a major challenge for almost all models and materials. In this theoretical work we combine matrix product states (MPS) with static mean field (MF) to provide a solution to this challenge for quasi-one-dimensional (Q1D) systems: Two- and three-dimensional (2D/3D) materials comprised of weakly coupled correlated 1D fermions. This MPS+MF framework for the ground state and thermal equilibrium properties of Q1D fermions is developed and validated for attractive Hubbard systems first, and further enhanced via analytical field theory. We then deploy it to compute $T_c$ for superconductivity in 3D arrays of weakly coupled, doped and repulsive Hubbard ladders. The MPS+MF framework thus enables the reliable, quantitative and unbiased study of USC and high-$T_c$ superconductivity - and potentially many more correlated phases - in fermionic Q1D systems from microscopic parameters, in ways inaccessible to previous methods. It opens the possibility of designing deliberately optimized Q1D superconductors, from experiments in ultracold gases to synthesizing new materials.

preprint, 2022, arXiv

Stripes and spin-density waves in the doped two-dimensional Hubbard model: ground state phase diagram

We determine the spin and charge orders in the ground state of the doped two-dimensional (2D) Hubbard model in its simplest form, namely with only nearest-neighbor hopping and on-site repulsion. At half-filling, the ground state is known to be an anti-ferromagnetic Mott insulator. Doping Mott insulators is believed to be relevant to the superconductivity observed in cuprates. A variety of candidates have been proposed for the ground state of the doped 2D Hubbard model. A recent work employing a combination of several state-of-the-art numerical many-body methods established the stripe order as the ground state near $1/8$ doping at strong interactions. In this work, we apply one of these methods, the cutting-edge constrained-path auxiliary field quantum Monte Carlo method with self-consistently optimized gauge constraints, to systematically study the model as a function of doping and interaction strength. With careful finite size scaling based on large-scale computations, we map out the ground state phase diagram in terms of its spin and charge order. We find that modulated antiferromagnetic order persists from near half-filling to about $1/5$ doping. At lower interaction strengths or larger doping, these ordered states are best described as spin-density waves, with essentially delocalized holes and modest oscillations in charge correlations. When the charge correlations are stronger (large interaction or small doping), they are best described as stripe states, with the holes more localized near the node in the antiferromagnetic spin order. In both cases, we find that the wavelength in the charge correlations is consistent with so-called filled stripes in the pure Hubbard model.
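For reference, the "simplest form" described above, with only nearest-neighbor hopping and on-site repulsion, corresponds to the standard Hubbard Hamiltonian. The symbols $t$ (hopping amplitude), $U$ (on-site repulsion) and the operator notation follow common textbook convention rather than anything stated in the abstract:

```latex
H = -t \sum_{\langle i,j \rangle, \sigma}
      \left( c^{\dagger}_{i\sigma} c_{j\sigma} + \mathrm{h.c.} \right)
    + U \sum_{i} n_{i\uparrow} n_{i\downarrow},
\qquad n_{i\sigma} = c^{\dagger}_{i\sigma} c_{i\sigma}
```

Here $\langle i,j \rangle$ runs over nearest-neighbor pairs of lattice sites. The interaction strength and doping that the study varies correspond to the ratio $U/t$ and to the hole density measured from half-filling.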

preprint, 2022, arXiv

TAda! Temporally-Adaptive Convolutions for Video Understanding

Spatial convolutions are widely used in numerous deep video models. They fundamentally assume spatio-temporal invariance, i.e., using shared weights for every location in different frames. This work presents Temporally-Adaptive Convolutions (TAdaConv) for video understanding, which shows that adaptive weight calibration along the temporal dimension is an efficient way to facilitate modelling complex temporal dynamics in videos. Specifically, TAdaConv empowers the spatial convolutions with temporal modelling abilities by calibrating the convolution weights for each frame according to its local and global temporal context. Compared to previous temporal modelling operations, TAdaConv is more efficient as it operates over the convolution kernels instead of the features, whose dimension is an order of magnitude smaller than the spatial resolutions. Further, the kernel calibration brings an increased model capacity. We construct TAda2D and TAdaConvNeXt networks by replacing the 2D convolutions in ResNet and ConvNeXt with TAdaConv, which leads to at least on par or better performance compared to state-of-the-art approaches on multiple video action recognition and localization benchmarks. We also demonstrate that as a readily plug-in operation with negligible computation overhead, TAdaConv can effectively improve many existing video models with a convincing margin.
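The idea of calibrating a shared kernel separately for each frame can be illustrated with a deliberately simplified toy: a pointwise (1x1) convolution applied to per-frame descriptors, where each frame scales the shared weight matrix by a factor derived from local and global temporal context. Everything specific here is an assumption for illustration: the three-frame local window, the `tanh` calibration rule and all names. The actual TAdaConv calibration function is learned and is not described in the abstract.

```python
import math


def temporally_calibrated_pointwise_conv(frames, base_weight):
    """frames: T descriptors of length C_in; base_weight: C_out x C_in, shared by all frames."""
    num_frames, c_in = len(frames), len(frames[0])
    # Global context: per-channel mean over all frames.
    global_ctx = [sum(f[c] for f in frames) / num_frames for c in range(c_in)]
    outputs = []
    for t in range(num_frames):
        prev_f = frames[max(t - 1, 0)]
        next_f = frames[min(t + 1, num_frames - 1)]
        # Local context: per-channel mean over a three-frame window (assumption).
        local_ctx = [(prev_f[c] + frames[t][c] + next_f[c]) / 3 for c in range(c_in)]
        # Hypothetical calibration factor per input channel, equal to 1 when
        # the local and global context agree.
        alpha = [1.0 + math.tanh(local_ctx[c] - global_ctx[c]) for c in range(c_in)]
        # Calibrate the kernel, not the features, then apply it to this frame.
        weight_t = [[w * a for w, a in zip(row, alpha)] for row in base_weight]
        outputs.append([sum(w * x for w, x in zip(row, frames[t])) for row in weight_t])
    return outputs


base_weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
static_clip = [[1.0, 2.0]] * 3  # identical frames: calibration factor stays at 1
static_out = temporally_calibrated_pointwise_conv(static_clip, base_weight)
moving_clip = [[0.0, 2.0], [1.0, 2.0], [5.0, 2.0]]
moving_out = temporally_calibrated_pointwise_conv(moving_clip, base_weight)
```

The calibration acts on a C_out x C_in weight matrix once per frame rather than on every spatial location of a feature map, which is the efficiency argument the abstract makes for operating on kernels instead of features.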

preprint, 2022, arXiv

TCTrack: Temporal Contexts for Aerial Tracking

Temporal contexts among consecutive frames are far from being fully utilized in existing visual trackers. In this work, we present TCTrack, a comprehensive framework to fully exploit temporal contexts for aerial tracking. The temporal contexts are incorporated at two levels: the extraction of features and the refinement of similarity maps. Specifically, for feature extraction, an online temporally adaptive convolution is proposed to enhance the spatial features using temporal information, which is achieved by dynamically calibrating the convolution weights according to the previous frames. For similarity map refinement, we propose an adaptive temporal transformer, which first effectively encodes temporal knowledge in a memory-efficient way, before the temporal knowledge is decoded for accurate adjustment of the similarity map. TCTrack is effective and efficient: evaluation on four aerial tracking benchmarks shows its impressive performance; real-world UAV tests show its high speed of over 27 FPS on NVIDIA Jetson AGX Xavier.

preprint, 2021, arXiv

Ab initio electronic density in solids by many-body plane-wave auxiliary-field quantum Monte Carlo calculations

We present accurate many-body results of the electronic densities in several solid materials, including Si, NaCl, and Cu. These results are obtained using the ab initio auxiliary-field quantum Monte Carlo (AFQMC) method working in a plane-wave basis with norm-conserving, multiple-projector pseudopotentials. AFQMC has been shown to be an excellent many-body total energy method. Computation of observables and correlation functions other than the ground-state energy requires back-propagation, whose adaption and implementation in the plane-wave basis AFQMC framework are discussed in the present paper. This development allows us to compute correlation functions, electronic densities and interatomic forces, paving the way for geometry optimizations and calculations of thermodynamic properties in solids. Finite supercell size effects are considerably more subtle in the many-body framework than in independent-electron calculations. We analyze the convergence of the electronic density, and obtain best estimates for the thermodynamic limit. The densities from several typical density functionals are benchmarked against our near-exact results. The electronic densities we have obtained can also be used to help construct improved density functionals.

preprint, 2021, arXiv

Does QA-based intermediate training help fine-tuning language models for text classification?

Fine-tuning pre-trained language models for downstream tasks has become a norm for NLP. Recently, it was found that intermediate training based on high-level inference tasks such as Question Answering (QA) can improve the performance of some language models for target tasks. However, it is not clear if intermediate training generally benefits various language models. In this paper, using the SQuAD-2.0 QA task for intermediate training for target text classification tasks, we experimented on eight tasks for single-sequence classification and eight tasks for sequence-pair classification using two base and two compact language models. Our experiments show that QA-based intermediate training generates varying transfer performance across different language models, except for similar QA tasks.

preprint, 2021, arXiv

Exotic Superfluid Phases in Spin Polarized Systems on Optical Lattices

Leveraging cutting-edge numerical methodologies, we study the ground state of the two-dimensional spin-polarized Fermi gas in an optical lattice. We focus on systems at high density and small spin polarization, corresponding to the parameter regime believed to be most favorable to the formation of the elusive Fulde-Ferrell-Larkin-Ovchinnikov (FFLO) superfluid phase. Our systematic study of large lattice sizes, hosting nearly $500$ atoms, provides strong evidence of the stability of the FFLO state in this regime, as well as a high-accuracy characterization of its properties. Our results for the density correlation function reveal the existence of density order in the system, suggesting the possibility of an intricate coexistence of long-range orders in the ground state. The ground-state properties are seen to differ significantly from the standard mean-field description, providing a compelling avenue for future theoretical and experimental explorations of the interplay between interaction and superfluidity in an exotic phase of matter.

preprint, 2020, arXiv

A Pseudo-BCS Wavefunction from Density Matrix Decomposition: Application in Auxiliary-Field Quantum Monte Carlo

We present a method to construct pseudo-BCS wave functions from the one-body density matrix. The resulting many-body wave function, which can be produced for any fermion system, including those with purely repulsive interactions, has a number-projected BCS form, or antisymmetrized geminal power (AGP). Such wave functions provide a better ansatz for correlated fermion systems than a single Slater determinant, and often better than a linear combination of Slater determinants (for example from a truncated active space calculation). We describe a procedure to build such a wave function conveniently from a given reduced density matrix of the system, rather than from a mean-field solution (which gives a Slater determinant for repulsive interactions). The pseudo-BCS wave function thus obtained reproduces the density matrix or minimizes the difference between the input and resulting density matrices. One application of the pseudo-BCS wave function is in auxiliary-field quantum Monte Carlo (AFQMC) calculations as the trial wave function to control the sign/phase problem. AFQMC is often among the most accurate general methods for correlated fermion systems. We show that the pseudo-BCS form further reduces the constraint bias and leads to improved accuracy compared to the usual Slater determinant trial wave functions, using the two-dimensional Hubbard model as an example. Furthermore, the pseudo-BCS trial wave function allows a new systematically improvable self-consistent approach, with the pseudo-BCS trial wave function iteratively generated by AFQMC via the one-body density matrix.

preprint, 2020, arXiv

A3: An Automatic Topology-Aware Malfunction Detection and Fixation System in Data Center Networks

Link failures and cable miswirings are not uncommon in building data center networks, and they prevent the existing automatic address configuration methods from functioning correctly. However, accurately detecting such malfunctions is not an easy task because there could be no observable node degree changes. Fixing or correcting such malfunctions is even harder, as almost no existing work can provide accurate fixation suggestions. To solve these problems, we design and implement A3, an automatic topology-aware malfunction detection and fixation system. A3 reduces the problem of finding the minimal fixation to the problem of computing the minimum graph difference (NP-hard) and solves it in O(k^6) and O(k^3) for fewer than k/2 and k/4 undirected link malfunctions in FatTree, respectively. Our evaluation demonstrates that for fewer than k/2 undirected link malfunctions, A3 is 100% accurate for malfunction detection and provides the minimum fixation result. For k/2 or more undirected link malfunctions, A3 still has an accuracy of about 100% and provides a near-optimal fixation result.

preprint2020arXiv

Absence of superconductivity in the pure two-dimensional Hubbard model

We study the superconducting pairing correlations in the ground state of the doped Hubbard model -- in its original form without hopping beyond nearest neighbor or other perturbing parameters -- in two dimensions at intermediate to strong coupling and near optimal doping. The nature of such correlations has been a central question ever since the discovery of cuprate high-temperature superconductors. Despite unprecedented effort and tremendous progress in understanding the properties of this fundamental model, a definitive answer to whether the ground state is superconducting in the parameter regime most relevant to cuprates has proved exceedingly difficult to establish. In this work, we employ two complementary, state-of-the-art many-body computational methods, constrained path (CP) auxiliary-field quantum Monte Carlo (AFQMC) and density matrix renormalization group (DMRG) methods, deploying the most recent algorithmic advances in each. Systematic and detailed comparisons between the two methods are performed. The DMRG is extremely reliable on small width cylinders, where we use it to validate the AFQMC. The AFQMC is then used to study wide systems as well as fully periodic systems, to establish that we have reached the thermodynamic limit. The ground state is found to be non-superconducting in the moderate to strong coupling regime in the vicinity of optimal hole doping.

preprint2020arXiv

CBR-Net: Cascade Boundary Refinement Network for Action Detection: Submission to ActivityNet Challenge 2020 (Task 1)

In this report, we present our solution for the task of temporal action localization (detection) (task 1) in ActivityNet Challenge 2020. The purpose of this task is to temporally localize intervals where actions of interest occur and predict the action categories in a long untrimmed video. Our solution mainly includes three components: 1) feature encoding: we apply three kinds of backbones, including TSN [7], Slowfast [3] and I3d [1], which are all pretrained on the Kinetics dataset [2]. Applying these models, we can extract snippet-level video representations; 2) proposal generation: we choose BMN [5] as our baseline, based on which we design a Cascade Boundary Refinement Network (CBR-Net) to conduct proposal detection. The CBR-Net mainly contains two modules: temporal feature encoding, which applies BiLSTM to encode long-term temporal information, and the CBR module, which aims to refine the proposal precision under different parameter settings; 3) action localization: in this stage, we combine the video-level classification results obtained by the fine-tuned networks to predict the category of each proposal. Moreover, we also apply different ensemble strategies to improve the performance of the designed solution, by which we achieve 42.788% on the testing set of the ActivityNet v1.3 dataset in terms of mean Average Precision.

preprint2020arXiv

Ground-state properties of the hydrogen chain: insulator-to-metal transition, dimerization, and magnetic phases

Accurate and predictive computations of the quantum-mechanical behavior of many interacting electrons in realistic atomic environments are critical for the theoretical design of materials with desired properties, and require solving the grand-challenge problem of the many-electron Schrödinger equation. An infinite chain of equispaced hydrogen atoms is perhaps the simplest realistic model for a bulk material, embodying several central themes of modern condensed matter physics and chemistry, while retaining a connection to the paradigmatic Hubbard model. Here we report a combined application of cutting-edge computational methods to determine the properties of the hydrogen chain in its quantum-mechanical ground state. Varying the separation between the nuclei leads to a rich phase diagram, including a Mott phase with quasi long-range antiferromagnetic order, electron density dimerization with power-law correlations, an insulator-to-metal transition and an intricate set of intertwined magnetic orders.

preprint2020arXiv

Less is More: Rejecting Unreliable Reviews for Product Question Answering

Promptly and accurately answering questions on products is important for e-commerce applications. Manually answering product questions (e.g. on community question answering platforms) results in slow response and does not scale. Recent studies show that product reviews are a good source for real-time, automatic product question answering (PQA). In the literature, PQA is formulated as a retrieval problem with the goal to search for the most relevant reviews to answer a given product question. In this paper, we focus on the issue of answerability and answer reliability for PQA using reviews. Our investigation is based on the intuition that many questions may not be answerable with a finite set of reviews. When a question is not answerable, a system should return nil answers rather than providing a list of irrelevant reviews, which can have significant negative impact on user experience. Moreover, for answerable questions, only the most relevant reviews that answer the question should be included in the result. We propose a conformal prediction based framework to improve the reliability of PQA systems, where we reject unreliable answers so that the returned results are more concise and accurate at answering the product question, including returning nil answers for unanswerable questions. Experiments on a widely used Amazon dataset show encouraging results of our proposed framework. More broadly, our results demonstrate a novel and effective application of conformal methods to a retrieval task.
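The rejection idea in this abstract can be illustrated with a minimal split-conformal sketch. This is not the authors' framework: the nonconformity scores, the calibration set, the function names and the `epsilon` threshold below are all assumptions made for illustration. Each candidate review gets a conformal p-value against calibration scores from known-relevant question-review pairs, and reviews whose p-value does not exceed `epsilon` are rejected. An empty result plays the role of a nil answer.

```python
from bisect import bisect_left


def conformal_p_value(calibration_scores, score):
    """Conformal p-value: share of calibration nonconformity scores that are
    at least as large as the new score, with the usual +1 correction."""
    ordered = sorted(calibration_scores)
    n_at_least = len(ordered) - bisect_left(ordered, score)
    return (n_at_least + 1) / (len(ordered) + 1)


def filter_reviews(calibration_scores, candidates, epsilon=0.1):
    """Keep reviews whose p-value exceeds epsilon (hypothetical helper).

    candidates: list of (review_id, nonconformity_score) pairs, where a
    higher score means the review looks less like a relevant answer.
    Returns a possibly empty list; empty means "no reliable answer".
    """
    kept = []
    for review_id, score in candidates:
        if conformal_p_value(calibration_scores, score) > epsilon:
            kept.append(review_id)
    return kept


# Assumed calibration scores from question-review pairs known to be relevant.
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(filter_reviews(calibration, [("r1", 0.25), ("r2", 0.95)]))
print(filter_reviews(calibration, [("r2", 0.95)]))
```

In this toy setup, `epsilon` bounds how often a genuinely relevant review would be rejected, provided the calibration and test scores are exchangeable.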

preprint2020arXiv

Magnetic and charge orders in the ground state of the Emery model -- accurate numerical results

We perform extensive auxiliary-field quantum Monte Carlo (AFQMC) calculations for the three-band Hubbard (Emery) model in order to study the ground-state properties of copper-oxygen planes in the cuprates. Employing cutting-edge AFQMC techniques with a self-consistent gauge constraint in auxiliary-field space to control the sign problem, we reach supercells containing around 500 atoms to capture collective modes in the charge and spin orders and characterize the behavior in the thermodynamic limit. The self-consistency scheme interfacing with generalized Hartree-Fock calculations allows high accuracy in AFQMC to resolve small energy scales, which is crucial for determining the complex candidate orders in such a system. We present detailed information on the charge order, spin order, momentum distribution, and localization properties as a function of charge-transfer energy for the under-doped regime. In contrast with the stripe and spiral orders under hole-doping, we find that the corresponding 1/8 electron-doped system exhibits purely antiferromagnetic order in the three-band model, consistent with the asymmetry between electron and hole-doping in the phase diagram of cuprates.

preprint2020arXiv

Multi-Level Temporal Pyramid Network for Action Detection

Currently, one-stage frameworks have been widely applied for temporal action detection, but they still suffer from the challenge that action instances span a wide range of durations. The reason is that these one-stage detectors, e.g., Single Shot Multi-Box Detector (SSD), extract temporal features using only a single-level layer for each head, which is not discriminative enough to perform classification and regression. In this paper, we propose a Multi-Level Temporal Pyramid Network (MLTPN) to improve the discrimination of the features. Specifically, we first fuse the features from multiple layers with different temporal resolutions, to encode multi-layer temporal information. We then apply a multi-level feature pyramid architecture on the features to enhance their discriminative abilities. Finally, we design a simple yet effective feature fusion module to fuse the multi-level multi-scale features. By this means, the proposed MLTPN can learn rich and discriminative features for different action instances with different durations. We evaluate MLTPN on two challenging datasets, THUMOS'14 and ActivityNet v1.3, and the experimental results show that MLTPN obtains competitive performance on ActivityNet v1.3 and significantly outperforms the state-of-the-art approaches on THUMOS'14.

preprint2020arXiv

Plaquette versus ordinary $d$-wave pairing in the $t'$-Hubbard model on a width 4 cylinder

The Hubbard model and its extensions are important microscopic models for understanding high-$T_c$ superconductivity in cuprates. In the model with next-nearest-neighbor hopping $t'$ (the $t'$-Hubbard model), pairing is strongly influenced by $t'$. In particular, a recent study on a width-4 cylinder observed quasi-long-range superconducting order, associated with a negative $t'$, which was taken to imply superconductivity in the two-dimensional (2D) limit. In this work we study more carefully pairing in the width-4 $t'$-Hubbard model. We show that in this specific system, the pairing symmetry with $t'<0$ is not the ordinary $d$-wave one would expect in the 2D limit. Instead we observe a so-called plaquette $d$-wave pairing. The plaquette $d$-wave exists only on a width-4 cylinder, and so is not representative of the 2D limit. We find that a negative $t'$ suppresses the conventional $d$-wave, leading to plaquette pairing. In contrast, a different $t''$ coupling acting diagonally on the plaquettes suppresses plaquette pairing, leading to conventional $d$-wave pairing.

preprint2020arXiv

Predicting Ligand-Dissociation Energies of 3d Coordination Complexes with Auxiliary-Field Quantum Monte Carlo

Transition metal complexes are ubiquitous in biology and chemical catalysis, yet they remain difficult to accurately describe with ab initio methods due to the presence of a large degree of dynamic electron correlation, and, in some cases, strong static correlation which results from a manifold of low-lying states. Progress has been hindered by a scarcity of high quality gas-phase experimental data, while exact ab initio predictions are usually computationally unaffordable due to the large size of the systems. In this work, we present a data set of 34 3d metal-containing complexes with gas-phase ligand-dissociation energies that have reported uncertainties of $\leq$ 2 kcal/mol. We perform all-electron phaseless auxiliary-field quantum Monte Carlo (ph-AFQMC) utilizing multi-determinant trial wavefunctions selected by a blackbox procedure. We compare the results with those from DFT with various functionals, and DLPNO-CCSD(T). We find MAE of 1.09 $\pm$ 0.28 kcal/mol for our best ph-AFQMC method, vs 2.89 kcal/mol for DLPNO-CCSD(T) and 1.57 - 3.87 kcal/mol for DFT. We find maximum errors of 2.96 $\pm$ 1.71 kcal/mol for our best ph-AFQMC method, vs 9.15 kcal/mol for DLPNO-CCSD(T) and 5.98 - 13.69 kcal/mol for DFT. The reasonable performance of several functionals is in stark contrast to the much poorer accuracy previously demonstrated for diatomics, suggesting a moderation in electron correlation due to ligand coordination. However, the unpredictably large errors for a small subset of cases with both DFT and DLPNO-CCSD(T) leave cause for concern, especially due to the unreliability of common multi-reference indicators. In contrast, the robust and, in principle, systematically improvable results of ph-AFQMC for these realistic complexes establish it as a useful tool for elucidating the electronic structure of transition metal-containing complexes and predicting their gas-phase properties.

preprint2020arXiv

Some Recent Developments in Auxiliary-Field Quantum Monte Carlo for Real Materials

The auxiliary-field quantum Monte Carlo (AFQMC) method is a general numerical method for correlated many-electron systems, which is being increasingly applied in lattice models, atoms, molecules, and solids. Here we introduce the theory and algorithm of the method specialized for real materials, and present several recent developments. We give a systematic exposition of the key steps of AFQMC, closely tracking the framework of a modern software library we are developing. The building of a Monte Carlo Hamiltonian, projecting to the ground state, sampling two-body operators, phaseless approximation, and measuring ground state properties are discussed in detail. An advanced implementation for multi-determinant trial wave functions is described which dramatically speeds up the algorithm and reduces the memory cost. We propose a self-consistent constraint for real materials, and discuss two flavors for its realization, either by coupling the AFQMC calculation to an effective independent-electron calculation, or via the natural orbitals of the computed one-body density matrix.

preprint2020arXiv

Temporal Fusion Network for Temporal Action Localization: Submission to ActivityNet Challenge 2020 (Task E)

This technical report analyzes a temporal action localization method we used in the HACS competition hosted in the ActivityNet Challenge 2020. The goal of our task is to locate the start time and end time of the action in the untrimmed video, and predict the action category. Firstly, we utilize the video-level feature information to train multiple video-level action classification models. In this way, we can get the category of action in the video. Secondly, we focus on generating high-quality temporal proposals. For this purpose, we apply BMN to generate a large number of proposals to obtain high recall rates. We then refine these proposals by employing a cascade structure network called Refine Network, which can predict the position offset and a new IoU under the supervision of ground truth. To make the proposals more accurate, we use bidirectional LSTM, Nonlocal and Transformer to capture temporal relationships between local features of each proposal and global features of the video data. Finally, by fusing the results of multiple models, our method obtains 40.55% on the validation set and 40.53% on the test set in terms of mAP, and achieves Rank 1 in this challenge.
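As a small illustration of the quantities mentioned above, the following sketch computes the temporal IoU between a proposal and a ground-truth segment and applies a predicted boundary offset. The abstract does not say how the Refine Network parameterizes its offsets, so the additive offsets in seconds and the function names `temporal_iou` and `apply_offsets` are assumptions, not the authors' implementation.

```python
def temporal_iou(a, b):
    """Temporal IoU of two (start, end) intervals, e.g. in seconds."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def apply_offsets(proposal, d_start, d_end):
    """Shift proposal boundaries by additive offsets (assumed parameterization)."""
    start, end = proposal
    return (start + d_start, end + d_end)


ground_truth = (1.0, 10.0)
proposal = (2.0, 9.0)
refined = apply_offsets(proposal, -1.0, 1.5)  # hypothetical predicted offsets

print(temporal_iou(proposal, ground_truth))
print(temporal_iou(refined, ground_truth))
```

A refinement stage of this kind is useful when the second value is larger than the first, i.e. when the shifted boundaries overlap the ground truth more closely than the initial proposal.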

preprint2019arXiv

Direct comparison of many-body methods for realistic electronic Hamiltonians

A large collaboration carefully benchmarks 20 first principles many-body electronic structure methods on a test set of 7 transition metal atoms, and their ions and monoxides. Good agreement is attained between the 3 systematically converged methods, resulting in experiment-free reference values. These reference values are used to assess the accuracy of modern emerging and scalable approaches to the many-electron problem. The most accurate methods obtain energies indistinguishable from experimental results, with the agreement mainly limited by the experimental uncertainties. Comparison between methods enables a unique perspective on calculations of many-body systems of electrons.