Source author record

Chao Yang

Chao Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

87 works

49 topics

4 close collaborators

preprint · 2026 · arXiv

Fusing in 3D: Free-Viewpoint Fusion Rendering with a 3D Infrared-Visible Scene Representation

Infrared-visible image fusion aims to integrate infrared and visible information into a single fused image. Existing 2D fusion methods focus on fusing images from fixed camera viewpoints, neglecting a comprehensive understanding of complex scenarios, which results in the loss of critical information about the scene. To address this limitation, we propose a novel Infrared-Visible Gaussian Fusion (IVGF) framework, which reconstructs scene geometry from multimodal 2D inputs and enables direct rendering of fused images. Specifically, we propose a cross-modal adjustment (CMA) module that modulates the opacity of Gaussians to solve the problem of cross-modal conflicts. Moreover, to preserve the distinctive features from both modalities, we introduce a fusion loss that guides the optimization of CMA, thus ensuring that the fused image retains the critical characteristics of each modality. Comprehensive qualitative and quantitative experiments demonstrate the effectiveness of the proposed method.

preprint · 2026 · arXiv

LongDPM: Overlap-Aware 4D Reconstruction from Long Monocular Videos

Recovering a dynamic 3D scene from a long monocular video requires dense geometry, camera motion, and temporal correspondence to remain consistent in a shared coordinate system. Existing methods face two key challenges: (1) feed-forward reconstruction models provide accurate local predictions but are limited to short clips, and (2) long-range trackers preserve correspondences without producing dense sequence-level reconstruction. This paper presents LongDPM, a novel overlap-aware framework for scalable long-range monocular dynamic reconstruction. First, LongDPM processes long videos in overlapping chunks, keeping inference memory bounded by the chunk length. Second, it connects chunk-local coordinate systems through confidence-weighted registration with static-aware overlap abstraction. Third, it associates dynamic identities across chunk boundaries and fuses matched trajectories to recover coherent long-range 3D motion. Experimental results demonstrate that LongDPM achieves superior long-range reconstruction and tracking performance, reducing dense tracking EPE over V-DPM on PointOdyssey, Kubric-F, and Kubric-G, while obtaining the best TUM-dynamics ATE for camera pose estimation.
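
The chunking step in the abstract above can be illustrated with a minimal sketch. It is not the LongDPM implementation: the function names `overlapping_chunks` and `shared_frames`, and the parameters `chunk_len` and `overlap`, are hypothetical. The sketch only shows how a long frame sequence might be split so that consecutive chunks share frames, which is what would make registration between chunk-local coordinate systems possible.

```python
def overlapping_chunks(num_frames, chunk_len, overlap):
    """Split frames 0..num_frames-1 into (start, end) ranges, end exclusive.

    Consecutive chunks share `overlap` frames. Names and parameters are
    illustrative assumptions, not taken from the paper.
    """
    if chunk_len <= 0 or not 0 <= overlap < chunk_len:
        raise ValueError("need chunk_len > 0 and 0 <= overlap < chunk_len")
    if num_frames <= 0:
        return []
    stride = chunk_len - overlap  # how far each new chunk advances
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_len, num_frames)
        chunks.append((start, end))
        if end >= num_frames:
            break
        start += stride
    return chunks


def shared_frames(a, b):
    """Frame indices that two (start, end) chunks have in common."""
    return list(range(max(a[0], b[0]), min(a[1], b[1])))


chunks = overlapping_chunks(num_frames=10, chunk_len=4, overlap=1)
# chunks == [(0, 4), (3, 7), (6, 10)]; frame 3 links the first two chunks
```

Because each chunk holds at most `chunk_len` frames, per-chunk memory stays bounded regardless of video length, which matches the motivation stated in the abstract.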

preprint · 2026 · arXiv

Modality-Decoupled RGB-Thermal Object Detector via Query Fusion

The advantage of RGB-Thermal (RGB-T) detection lies in its ability to perform modality fusion and integrate cross-modality complementary information, enabling robust detection under diverse illumination and weather conditions. However, under extreme conditions where one modality exhibits poor quality and disturbs detection, modality separation is necessary to mitigate the impact of noise. To address this problem, we propose a Modality-Decoupled RGB-T detection framework with Query Fusion (MDQF) to balance modality complementation and separation. In this framework, DETR-like detectors are employed as separate branches for the RGB and TIR images, with query fusion interspersed between the two branches in each refinement stage. Herein, query fusion is performed by feeding the high-quality queries from one branch to the other one after query selection and adaptation. This design effectively excludes the degraded modality and corrects the predictions using high-quality queries. Moreover, the decoupled framework allows us to optimize each individual branch with unpaired RGB or TIR images, eliminating the need for paired RGB-T data. Extensive experiments demonstrate that our approach delivers superior performance to existing RGB-T detectors and achieves better modality independence.

preprint · 2026 · arXiv

Multi-proposal Collaboration and Multi-task Training for Weakly-supervised Video Moment Retrieval

This study focuses on weakly-supervised Video Moment Retrieval (VMR), aiming to identify a moment semantically similar to the given query within an untrimmed video using only video-level correspondences, without relying on temporal annotations during training. Previous methods either aggregate predictions for all instances in the video, or indirectly address the task by proposing reconstructions for the query. However, these methods often produce low-quality temporal proposals, struggle with distinguishing misaligned moments in the same video, or lack stability due to a reliance on a single auxiliary task. To address these limitations, we present a novel weakly-supervised method called Multi-proposal Collaboration and Multi-task Training (MCMT). Initially, we generate multiple proposals and derive corresponding learnable Gaussian masks from them. These masks are then combined to create a high-quality positive sample mask, highlighting video clips most relevant to the query. Concurrently, we classify other clips in the same video as the easy negative sample and the entire video as the hard negative sample. During training, we introduce forward and inverse masked query reconstruction tasks to impose more substantial constraints on the network, promoting more robust and stable retrieval performance. Extensive experiments on two standard benchmarks affirm the effectiveness of the proposed method in VMR.

preprint · 2026 · arXiv

Multimodal Signal Processing For Thermo-Visible-Lidar Fusion In Real-time 3D Semantic Mapping

In complex environments, autonomous robot navigation and environmental perception pose higher requirements for SLAM technology. This paper presents a novel method for semantically enhancing 3D point cloud maps with thermal information. By first performing pixel-level fusion of visible and infrared images, the system projects real-time LiDAR point clouds onto this fused image stream. It then segments heat source features in the thermal channel to instantly identify high temperature targets and applies this temperature information as a semantic layer on the final 3D map. This approach generates maps that not only have accurate geometry but also possess a critical semantic understanding of the environment, making it highly valuable for specific applications like rapid disaster assessment and industrial preventive maintenance.

preprint · 2026 · arXiv

Report of the 5th PVUW Challenge: Towards More Diverse Modalities in Pixel-Level Understanding

This report summarizes the objectives, datasets, and top-performing methodologies of the 2026 Pixel-level Video Understanding in the Wild (PVUW) Challenge, hosted at CVPR 2026, which evaluates state-of-the-art models under highly unconstrained conditions. To provide a comprehensive assessment, the 2026 edition features three specialized tracks: the MOSE track for tracking objects within densely cluttered and severely occluded scenarios; the MeViS-Text track for localizing targets via motion-focused linguistic expressions; and the newly inaugurated MeViS-Audio track, which pioneers acoustic-driven object segmentation. By introducing previously unreleased challenging data and analyzing the cutting-edge, multimodal solutions submitted by participants, this report highlights the community's latest technical advancements and charts promising future directions for robust video scene comprehension.

preprint · 2026 · arXiv

SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

Reusable skills are becoming a common interface for extending large language model agents, packaging procedural guidance with access to files, tools, memory, and execution environments. However, this modularity introduces attack surfaces that are largely missed by existing safety evaluations: even when the user request is benign, task-relevant skill materials or local artifacts can steer an agent toward unsafe actions. We present SkillSafetyBench, a runnable benchmark for evaluating such skill-mediated safety failures. SkillSafetyBench includes 155 adversarial cases across 47 tasks, 6 risk domains, and 30 safety categories, each evaluated with a case-specific rule-based verifier. Experiments with multiple CLI agents and model backends show that localized non-user attacks can consistently induce unsafe behavior, with distinct failure patterns across domains, attack methods, and scaffold-model pairings. Our findings suggest that agent safety depends not only on model-level alignment, but also on how agents interpret skills, trust workflow context, and act through executable environments.

preprint · 2026 · arXiv

Thermo-LIO: A Novel Multi-Sensor Integrated System for Structural Health Monitoring

Traditional two-dimensional thermography, despite being non-invasive and useful for defect detection in the construction field, is limited in effectively assessing complex geometries, inaccessible areas, and subsurface defects. This paper introduces Thermo-LIO, a novel multi-sensor system that can enhance Structural Health Monitoring (SHM) by fusing thermal imaging with high-resolution LiDAR. To achieve this, the study first develops a multimodal fusion method combining thermal imaging and LiDAR, enabling precise calibration and synchronization of multimodal data streams to create accurate representations of temperature distributions in buildings. Second, it integrates this fusion approach with LiDAR-Inertial Odometry (LIO), enabling full coverage of large-scale structures and allowing for detailed monitoring of temperature variations and defect detection across inspection cycles. Experimental validations, including case studies on a bridge and a hall building, demonstrate that Thermo-LIO can detect detailed thermal anomalies and structural defects more accurately than traditional methods. The system enhances diagnostic precision, enables real-time processing, and expands inspection coverage, highlighting the crucial role of multimodal sensor integration in advancing SHM methodologies for large-scale civil infrastructure.

preprint · 2024 · arXiv

The Dust Attenuation Scaling Relation of Star-Forming Galaxies in the EAGLE Simulations

Dust attenuation in star-forming galaxies (SFGs), as parameterized by the infrared excess (IRX $\equiv L_{\rm IR}/L_{\rm UV}$), is found to be tightly correlated with star formation rate (SFR), metallicity and galaxy size, following a universal IRX relation up to $z=3$. This scaling relation can provide a fundamental constraint for theoretical models to reconcile galaxy star formation, chemical enrichment, and structural evolution across cosmic time. We attempt to reproduce the universal IRX relation over $0.1\leq z\leq 2.5$ using the EAGLE hydrodynamical simulations and examine sensitive parameters in determining galaxy dust attenuation. Our findings show that while the predicted universal IRX relation from EAGLE approximately aligns with observations at $z\leq 0.5$, noticeable disparities arise at different stellar masses and higher redshifts. Specifically, we investigate how modifying various galaxy parameters can affect the predicted universal IRX relation in comparison to the observed data. We demonstrate that the simulated gas-phase metallicity is the critical quantity for the shape of the predicted universal IRX relation. We find that the influence of the infrared luminosity and infrared excess is less important while galaxy size has virtually no significant effect. Overall, the EAGLE simulations are not able to replicate some of the observed characteristics between IRX and galaxy parameters of SFGs, emphasizing the need for further investigation and testing for our current state-of-the-art theoretical models.

preprint · 2024 · arXiv

Undecidability of tiling the plane with a fixed number of Wang bars

To study the fixed-parameter undecidability of the tiling problem for a set of Wang tiles, Jeandel and Rolin showed that the tiling problem for a set of 44 Wang bars is undecidable. In this paper, we improve their result by proving that whether a set of 29 Wang bars can tile the plane is undecidable. As a consequence, the tiling problem for a set of Wang tiles with color deficiency of 25 is also undecidable.

preprint · 2023 · arXiv

SAS: Self-Augmentation Strategy for Language Model Pre-training

The core of self-supervised learning for pre-training language models includes pre-training task design as well as appropriate data augmentation. Most data augmentations in language model pre-training are context-independent. A seminal contextualized augmentation was recently proposed in ELECTRA and achieved state-of-the-art performance by introducing an auxiliary generation network (generator) to produce contextualized data augmentation for the training of a main discrimination network (discriminator). This design, however, introduces extra computation cost of the generator and a need to adjust the relative capability between the generator and the discriminator. In this paper, we propose a self-augmentation strategy (SAS) where a single network is utilized for both regular pre-training and contextualized data augmentation for the training in later epochs. Essentially, this strategy eliminates a separate generator and uses the single network to jointly conduct two pre-training tasks with MLM (Masked Language Modeling) and RTD (Replaced Token Detection) heads. It avoids the challenge to search for an appropriate size of the generator, which is critical to the performance as evidenced in ELECTRA and its subsequent variant models. In addition, SAS is a general strategy that can be seamlessly combined with many new techniques emerging recently or in the future, such as the disentangled attention mechanism from DeBERTa. Our experiments show that SAS is able to outperform ELECTRA and other state-of-the-art models in the GLUE tasks with similar or less computation cost.

preprint · 2022 · arXiv

A Robust Multimodal Remote Sensing Image Registration Method and System Using Steerable Filters with First- and Second-order Gradients

Co-registration of multimodal remote sensing images is still an ongoing challenge because of nonlinear radiometric differences (NRD) and significant geometric distortions (e.g., scale and rotation changes) between these images. In this paper, a robust matching method based on steerable filters is proposed, consisting of two critical steps. First, to address severe NRD, a novel structural descriptor named the Steerable Filters of first- and second-Order Channels (SFOC) is constructed, which combines first- and second-order gradient information by using steerable filters with a multi-scale strategy to depict more discriminative structural features of images. Then, a fast similarity measure called Fast Normalized Cross-Correlation (Fast-NCC$_{SFOC}$) is established, which employs the Fast Fourier Transform technique and the integral image to improve the matching efficiency. Furthermore, to achieve reliable registration performance, a coarse-to-fine multimodal registration system is designed, consisting of two pivotal modules. The local coarse registration is first conducted by involving both detection of interest points (IPs) and local geometric correction, which effectively utilizes the prior georeferencing information of RS images to address global geometric distortions. In the fine registration stage, the proposed SFOC is used to resist significant NRD and to detect control points between multimodal images by a template matching scheme. The performance of the proposed matching method has been evaluated with many different kinds of multimodal RS images. The results show its superior matching performance compared with the state-of-the-art methods. Moreover, the designed registration system also outperforms the popular commercial software in both registration accuracy and computational efficiency. Our system is available at https://github.com/yeyuanxin110.

preprint · 2022 · arXiv

An extension on neighbor sum distinguishing total coloring of graphs

Let $f: V(G)\cup E(G)\rightarrow \{1,2,\dots,k\}$ be a non-proper total $k$-coloring of $G$. Define a weight function on total coloring as $$\phi(x)=f(x)+\sum\limits_{e\ni x}f(e)+\sum\limits_{y\in N(x)}f(y),$$ where $N(x)=\{y\in V(G)\mid xy\in E(G)\}$. If $\phi(x)\neq \phi(y)$ for any edge $xy\in E(G)$, then $f$ is called a neighbor full sum distinguishing total $k$-coloring of $G$. The smallest value $k$ for which $G$ has such a coloring is called the neighbor full sum distinguishing total chromatic number of $G$ and denoted by fgndi$_{\sum}(G)$. The coloring is an extension of neighbor sum distinguishing non-proper total coloring. In this paper we conjecture that fgndi$_{\sum}(G)\leq 3$ for any connected graph $G$ of order at least three. We prove that the conjecture is true for (i) paths and cycles; (ii) 3-regular graphs and (iii) stars, complete graphs, trees, hypercubes, bipartite graphs and complete $r$-partite graphs. In particular, complete graphs can achieve the upper bound for the above conjecture.
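
The weight function defined in the abstract above can be checked by brute force on small graphs. The following is a minimal sketch, assuming vertices are given as hashable labels and edges as tuples of two labels; the helper names `weights`, `is_distinguishing` and `min_k` are hypothetical and the exhaustive search is only practical for very small graphs.

```python
from itertools import product


def weights(vertices, edges, f):
    """phi(x) = f(x) + colors of edges incident to x + colors of neighbors of x."""
    phi = {v: f[v] for v in vertices}
    for (u, v) in edges:
        phi[u] += f[(u, v)] + f[v]
        phi[v] += f[(u, v)] + f[u]
    return phi


def is_distinguishing(vertices, edges, f):
    """True if phi differs across the two ends of every edge."""
    phi = weights(vertices, edges, f)
    return all(phi[u] != phi[v] for (u, v) in edges)


def min_k(vertices, edges, k_max=3):
    """Smallest k <= k_max admitting a distinguishing coloring, else None."""
    elements = list(vertices) + list(edges)
    for k in range(1, k_max + 1):
        for colors in product(range(1, k + 1), repeat=len(elements)):
            f = dict(zip(elements, colors))
            if is_distinguishing(vertices, edges, f):
                return k
    return None


path3 = (["a", "b", "c"], [("a", "b"), ("b", "c")])
triangle = (["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")])
print(min_k(*path3), min_k(*triangle))
```

On the triangle, the vertex colors contribute the same total to every $\phi(x)$, so the three edge colors must be pairwise distinct and three colors are needed; this agrees with the abstract's remark that complete graphs can attain the conjectured bound. A single edge has $\phi(x)=\phi(y)$ for every coloring, which is consistent with the restriction to graphs of order at least three.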

preprint · 2022 · arXiv

DAS-PINNs: A deep adaptive sampling method for solving high-dimensional partial differential equations

In this work we propose a deep adaptive sampling (DAS) method for solving partial differential equations (PDEs), where deep neural networks are utilized to approximate the solutions of PDEs and deep generative models are employed to generate new collocation points that refine the training set. The overall procedure of DAS consists of two components: solving the PDEs by minimizing the residual loss on the collocation points in the training set, and generating a new training set to further improve the accuracy of the current approximate solution. In particular, we treat the residual as a probability density function and approximate it with a deep generative model, called KRnet. The new samples from KRnet are consistent with the distribution induced by the residual, i.e., more samples are located in the region of large residual and fewer samples are located in the region of small residual. Analogous to classical adaptive methods such as the adaptive finite element, KRnet acts as an error indicator that guides the refinement of the training set. Compared to the neural network approximation obtained with uniformly distributed collocation points, the developed algorithms can significantly improve the accuracy, especially for low regularity and high-dimensional problems. We demonstrate the effectiveness of the proposed DAS method with numerical experiments.
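
The idea of treating the residual as a sampling density can be illustrated without a generative model. The sketch below deliberately swaps KRnet for plain weighted resampling from a fixed candidate grid, so it is not the DAS method itself; the function name `resample_by_residual`, the squared-residual weighting and the toy residual are all assumptions made for illustration.

```python
import math
import random


def resample_by_residual(candidates, residual, n_new, rng):
    """Draw n_new points with probability proportional to residual(x)**2.

    A stand-in for sampling from a learned density: here the density is
    only evaluated on a fixed list of candidate points.
    """
    weights = [residual(x) ** 2 for x in candidates]
    return rng.choices(candidates, weights=weights, k=n_new)


def toy_residual(x):
    """Hypothetical residual that is large only near x = 0.8."""
    return math.exp(-200.0 * (x - 0.8) ** 2)


grid = [i / 100 for i in range(101)]  # candidate collocation points on [0, 1]
new_points = resample_by_residual(grid, toy_residual, 200, random.Random(0))
near_peak = sum(1 for x in new_points if abs(x - 0.8) < 0.1)
print(near_peak, "of", len(new_points), "new points fall near the residual peak")
```

In a full adaptive loop, the new points would be added to the training set, the network retrained, and the residual re-evaluated before the next round of sampling.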

preprint · 2022 · arXiv

GUIM -- General User and Item Embedding with Mixture of Representation in E-commerce

Our goal is to build a general representation (embedding) for each user and each product item across Alibaba's businesses, including Taobao and Tmall, which are among the world's biggest e-commerce websites. The representation of users and items has been playing a critical role in various downstream applications, including recommendation systems, search, marketing, demand forecasting and so on. Inspired by the BERT model in the natural language processing (NLP) domain, we propose a GUIM (General User Item embedding with Mixture of representation) model to achieve this goal with massive, structured, multi-modal data including the interactions among hundreds of millions of users and items. We utilize mixture of representation (MoR) as a novel representation form to model the diverse interests of each user. In addition, we use InfoNCE from contrastive learning to avoid the intractable computational cost caused by the large size of the item (token) vocabulary. Finally, we propose a set of representative downstream tasks to serve as a standard benchmark to evaluate the quality of the learned user and/or item embeddings, analogous to the GLUE benchmark in the NLP domain. Our experimental results in these downstream tasks clearly show the comparative value of embeddings learned from our GUIM model.
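
The InfoNCE objective mentioned in the abstract above replaces a softmax over the full item vocabulary with a softmax over one positive and a handful of sampled negatives. The sketch below shows the standard form of that loss for a single query; the dot-product similarity, the `temperature` value and the function names are assumptions, and nothing here reflects how GUIM samples negatives or builds embeddings.

```python
import math


def dot(u, v):
    """Plain dot product used as the similarity score."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(query, positive, negatives, temperature=0.1):
    """-log( exp(s_pos/t) / (exp(s_pos/t) + sum_j exp(s_neg_j/t)) )."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating for stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


user = [1.0, 0.0]
clicked_item = [1.0, 0.0]
sampled_items = [[0.0, 1.0], [-1.0, 0.0]]
print(info_nce(user, clicked_item, sampled_items))
```

The cost per query grows with the number of sampled negatives rather than with the vocabulary size, which is the computational saving the abstract refers to.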

preprint · 2022 · arXiv

KSSOLV 2.0: An efficient MATLAB toolbox for solving the Kohn-Sham equations with plane-wave basis set

KSSOLV (Kohn-Sham Solver) is a MATLAB toolbox for performing Kohn-Sham density functional theory (DFT) calculations with a plane-wave basis set. KSSOLV 2.0 preserves the design features of the original KSSOLV software to allow users and developers to easily set up a problem and perform ground-state calculations as well as to prototype and test new algorithms. Furthermore, it includes new functionalities such as new iterative diagonalization algorithms, k-point sampling for electron band structures, geometry optimization and advanced algorithms for performing DFT calculations with local, semi-local, and hybrid exchange-correlation functionals. It can be used to study the electronic structures of both molecules and solids. We describe these new capabilities in this work through a few use cases. We also demonstrate the numerical accuracy and computational efficiency of KSSOLV on a variety of examples.

preprint · 2022 · arXiv

Randomized Algorithms for Scientific Computing (RASC)

Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and scalability. This report summarizes the outcomes of the workshop "Randomized Algorithms for Scientific Computing (RASC)," held virtually across four days in December 2020 and January 2021.

preprint · 2022 · arXiv

Research on Multi-Objective Planning of Electric Vehicle Charging Stations Considering the Condition of Urban Traffic Network

As an important supporting facility for electric vehicles, charging stations must be planned and laid out sensibly for electric vehicles to develop. However, the planning and layout of charging stations is affected by various complex factors such as policy, economics, charging demand, user charging comfort, and road traffic conditions. How to weigh these factors to construct a reasonable model of charging station location and capacity has become a major difficulty in the field of electric vehicle charging facility planning. First, this paper constructs a location and capacity optimization model of the charging station with the goals of maximizing the revenue of operators and minimizing the additional charging cost of users. At the same time, a road time-consumption index is introduced to quantify the impact of road congestion on the additional charging cost of users, so as to effectively improve user satisfaction during charging. Then, for the charging station planning model, a non-dominated sorting genetic algorithm with an elite strategy (NSGA-II) based on chaos initialization and an arithmetic crossover operator is proposed. Finally, taking the Haidian District of Beijing as the simulation object, the results show that, compared with the case in which the urban traffic network is not considered, the proposed model reduces the cost of users' lost time by 11.4% and the total additional charging cost of users by 7.6%. It not only ensures the economy of the system but also effectively improves the charging satisfaction of users, which further verifies the feasibility and effectiveness of the model and can provide a reference for the future planning and layout of charging stations.
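
The non-dominated sorting step at the core of NSGA-II can be sketched in a few lines. This is only the ranking step, written in a simple quadratic form; it omits crowding distance, the elite strategy, chaos initialization and the arithmetic crossover described in the abstract. The candidate plans and their numbers are made up for illustration, and operator revenue is negated so that both objectives are minimized.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points):
    """Return fronts as lists of indices; front 0 is the Pareto-optimal set."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical plans as (operator revenue, additional user cost).
plans = [(100, 30), (120, 40), (90, 35), (120, 30)]
objectives = [(-revenue, cost) for revenue, cost in plans]  # minimize both
print(non_dominated_sort(objectives))
```

Plan 3 has the highest revenue and the lowest user cost, so it forms the first front alone; plans 0 and 1 trade revenue against cost and share the second front.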

preprint · 2022 · arXiv

Sim2Real Object-Centric Keypoint Detection and Description

Keypoint detection and description play a central role in computer vision. Most existing methods are in the form of scene-level prediction, without returning the object classes of different keypoints. In this paper, we propose the object-centric formulation, which, beyond the conventional setting, requires further identifying which object each interest point belongs to. With such fine-grained information, our framework enables more downstream potentials, such as object-level matching and pose estimation in a clustered environment. To get around the difficulty of label collection in the real world, we develop a sim2real contrastive learning mechanism that can generalize the model trained in simulation to real-world applications. The novelties of our training method are three-fold: (i) we integrate the uncertainty into the learning framework to improve feature description of hard cases, e.g., less-textured or symmetric patches; (ii) we decouple the object descriptor into two output branches -- intra-object salience and inter-object distinctness, resulting in a better pixel-wise description; (iii) we enforce cross-view semantic consistency for enhanced robustness in representation learning. Comprehensive experiments on image matching and 6D pose estimation verify the encouraging generalization ability of our method from simulation to reality. Particularly for 6D pose estimation, our method significantly outperforms typical unsupervised/sim2real methods, achieving a closer gap with the fully supervised counterpart. Additional results and videos can be found at https://zhongcl-thu.github.io/rock/

preprint · 2022 · arXiv

Topological Authentication Technique In Topologically Asymmetric Cryptosystem

Turning topological authentication from theory into practical application is an important and challenging task. More and more researchers are paying attention to the coming of quantum computation, private data protection, lattices and cryptography. Research shows that topological authentications built from graph operations, various matrices, graph colorings and graph labelings have the following advantages: they relate to two or more different mathematical areas, are not pictures, admit a huge number of colorings and labelings, are rooted in modern mathematics, and offer diversity of asymmetric ciphers, simplicity and convenience, ease of creation, irreversibility, computational security, provable security, and so on. Topological authentications are based on various graph homomorphisms, degree-sequence homomorphisms and graph-set homomorphisms. Randomly topological coding and topological authentications are based on Hanzi authentication, randomly adding-edge-removing operations, randomly leaf-adding algorithms, graph random increasing techniques, operation graphic lattices, and dynamic networked models together with their spanning trees and maximum leaf spanning trees. Realization of topological authentication is an important topic; we study number-based strings generated from colored graphs, particular graphs (complete graphs, trees, planar graphs), and some methods of generating public keys. Some techniques of the topologically asymmetric cryptosystem are: W-type matching labelings, dual-type labelings, reciprocal-type labelings, topological homomorphisms, indexed colorings, graphic lattices, degree-sequence lattices, every-zero Cds-matrix groups of degree-sequences, every-zero graphic groups, graphic lattices having the coloring closure property, and self-similar networked lattices.

preprint · 2022 · arXiv

Using dynamic mode decomposition to predict the dynamics of a two-time non-equilibrium Green's function

Computing the numerical solution of the Kadanoff-Baym equations (KBE), a set of nonlinear integro-differential equations satisfied by two-time Green's functions derived from many-body perturbation theory for a quantum many-body system away from equilibrium, is a challenging task. Recently, we have successfully applied dynamic mode decomposition (DMD) to construct a data-driven reduced order model that can be used to extrapolate the time-diagonal of a two-time Green's function from the numerical solution of the KBE within a small time window. In this paper, we extend the previous work and use DMD to predict off-diagonal elements of the two-time Green's function. We partition the two-time Green's function into a number of one-time functions along the diagonal and subdiagonals of the two-time window as well as in horizontal and vertical directions. We use DMD to construct separate reduced order models to predict the dynamics of these one-time functions in a two-step procedure. In the first step, we extrapolate along the diagonal and several subdiagonals within a subdiagonal band of a two-time window. In the second step, we use DMD to extrapolate the Green's function outside of the subdiagonal band. We demonstrate the efficiency and accuracy of this approach by applying it to a two-band Hubbard model problem.

preprint · 2022 · arXiv

WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model. To further improve ASR performance and facilitate various production requirements, in this paper, we present WeNet 2.0 with four important updates. (1) We propose U2++, a unified two-pass framework with bidirectional attention decoders, which includes the future contextual information by a right-to-left attention decoder to improve the representative ability of the shared encoder and the performance during the rescoring stage. (2) We introduce an n-gram based language model and a WFST-based decoder into WeNet 2.0, promoting the use of rich text data in production scenarios. (3) We design a unified contextual biasing framework, which leverages user-specific context (e.g., contact lists) to provide rapid adaptation ability for production and improves ASR accuracy in both with-LM and without-LM scenarios. (4) We design a unified IO to support large-scale data for effective model training. In summary, the brand-new WeNet 2.0 achieves up to 10% relative recognition performance improvement over the original WeNet on various corpora and makes available several important production-oriented features.

preprint · 2022 · arXiv

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours of high-quality labeled speech, 2400+ hours of weakly labeled speech, and about 10000 hours of unlabeled speech, with 22400+ hours in total. We collect the data from YouTube and Podcast, which covers a variety of speaking styles, scenarios, domains, topics, and noisy conditions. An optical character recognition (OCR) based method is introduced to generate the audio/text segmentation candidates for the YouTube data from its corresponding video captions, while a high-quality ASR transcription system is used to generate audio/text pair candidates for the Podcast data. Then we propose a novel end-to-end label error detection approach to further validate and filter the candidates. We also provide three manually labelled high-quality test sets along with WenetSpeech for evaluation: Dev, for cross-validation purposes in training; Test_Net, collected from the Internet for matched tests; and Test_Meeting, recorded from real meetings for more challenging mismatched tests. Baseline systems trained with WenetSpeech are provided for three popular speech recognition toolkits, namely Kaldi, ESPnet, and WeNet, and recognition results on the three test sets are also provided as benchmarks. To the best of our knowledge, WenetSpeech is currently the largest open-sourced Mandarin speech corpus with transcriptions, which benefits research on production-level speech recognition.

preprint · 2021 · arXiv

A greedy algorithm for computing eigenvalues of a symmetric matrix

We present a greedy algorithm for computing selected eigenpairs of a large sparse matrix $H$ that can exploit localization features of the eigenvector. When the eigenvector to be computed is localized, meaning only a small number of its components have large magnitudes, the proposed algorithm identifies the location of these components in a greedy manner, and obtains approximations to the desired eigenpairs of $H$ by computing eigenpairs of a submatrix extracted from the corresponding rows and columns of $H$. Even when the eigenvector is not completely localized, the approximate eigenvectors obtained by the greedy algorithm can be used as good starting guesses to accelerate the convergence of an iterative eigensolver applied to $H$. We discuss a few possibilities for selecting important rows and columns of $H$ and techniques for constructing good initial guesses for an iterative eigensolver using the approximate eigenvectors returned from the greedy algorithm. We demonstrate the effectiveness of this approach with examples from nuclear quantum many-body calculations, many-body localization studies of quantum spin chains and road network analysis.
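The greedy selection described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `greedy_eigenpair`, the choice of the smallest diagonal entry as the starting row, and the use of the largest residual component as the selection rule are all hypothetical choices (the abstract says several selection rules are discussed but does not fix one).

```python
import numpy as np


def greedy_eigenpair(H, k, seed_index=None):
    """Approximate the lowest eigenpair of a symmetric matrix H from a
    principal submatrix with at most k rows/columns, grown greedily.

    Assumptions (not taken from the paper): the starting row is the one
    with the smallest diagonal entry, and the next row is the one with
    the largest residual magnitude.
    """
    n = H.shape[0]
    if seed_index is None:
        seed_index = int(np.argmin(np.diag(H)))
    selected = [seed_index]

    while True:
        # Solve the small eigenproblem on the currently selected rows/columns.
        sub = H[np.ix_(selected, selected)]
        vals, vecs = np.linalg.eigh(sub)
        theta, y = vals[0], vecs[:, 0]
        if len(selected) >= k:
            break

        # Residual of the zero-padded vector, restricted to unselected rows.
        r = H[:, selected] @ y
        r[selected] = 0.0
        candidate = int(np.argmax(np.abs(r)))
        if r[candidate] == 0.0:
            break  # no coupling to the remaining rows; nothing left to add
        selected.append(candidate)

    # Pad the small eigenvector back to full length.
    x = np.zeros(n)
    x[selected] = y
    return theta, x, selected
```

The zero-padded vector `x` returned here is the kind of approximation the abstract suggests passing to an iterative eigensolver as a starting guess when the eigenvector is only partly localized.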

preprint2021arXiv

A strange metal in a bosonic system

Fermi liquid theory forms the basis for our understanding of the majority of metals, which is manifested in the description of transport properties that the electrical resistivity goes as temperature squared in the limit of zero temperature. However, the observations of strange metal states in various quantum materials, notably high-temperature superconductors, bring this spectacularly successful theoretical framework into crisis. When the electron scattering rate $1/\tau$ hits its limit, $k_B T/\hbar$, where $\hbar$ is the reduced Planck's constant, $T$ represents absolute temperature and $k_B$ denotes Boltzmann's constant, Planckian dissipation occurs and lends strange metals a surprising link to black holes, gravity, and quantum information theory. Here, we show the characteristic signature of strange metallicity arising unprecedentedly in a bosonic system. Our nanopatterned YBa$_2$Cu$_3$O$_{7-\delta}$ (YBCO) film arrays reveal $T$-linear resistance as well as $B$-linear magnetoresistance over an extended temperature and magnetic field range in a quantum critical region in the phase diagram. Moreover, the slope of the $T$-linear resistance $\alpha_{cp}$ appears bounded by $\alpha_{cp} \approx (h/2e^2)\,[1/T_c^{\mathrm{onset}}]$, where $T_c^{\mathrm{onset}}$ is the temperature at which Cooper pairs form, intimating a common scale-invariant transport mechanism corresponding to Planckian dissipation. In contrast to fermionic systems, where the temperature and magnetic field dependent scattering rates combine in quadrature as $\hbar/\tau \approx \sqrt{(k_B T)^2 + (\mu_B B)^2}$, both terms combine linearly in the present bosonic system, i.e. $\hbar/\tau \approx k_B T + \gamma \mu_B B$, where $\gamma$ is a constant. By extending the reach of strange metal phenomenology to a bosonic system, our results suggest that there is a fundamental principle governing their transport which transcends particle statistics.

preprint2021arXiv

Accelerating quantum many-body configuration interaction with directives

Many-Fermion Dynamics-nuclear, or MFDn, is a configuration interaction (CI) code for nuclear structure calculations. It is a platform-independent Fortran 90 code using a hybrid MPI+X programming model. For CPU platforms the application has a robust and optimized OpenMP implementation for shared memory parallelism. As part of the NESAP application readiness program for NERSC's latest Perlmutter system, MFDn has been updated to take advantage of accelerators. The current mainline GPU port is based on OpenACC. In this work we describe some of the key challenges of creating an efficient GPU implementation. Additionally, we compare the support of OpenMP and OpenACC on AMD and NVIDIA GPUs.

preprint2021arXiv

CoRe: An Efficient Coarse-refined Training Framework for BERT

In recent years, BERT has made significant breakthroughs on many natural language processing tasks and attracted great attention. Despite its accuracy gains, the BERT model generally involves a huge number of parameters and needs to be trained on massive datasets, so training such a model is computationally very challenging and time-consuming. Hence, training efficiency is a critical issue. In this paper, we propose a novel coarse-refined training framework named CoRe to speed up the training of BERT. Specifically, we decompose the training process of BERT into two phases. In the first phase, by introducing a fast attention mechanism and decomposing the large parameters in the feed-forward network sub-layer, we construct a relaxed BERT model which has far fewer parameters and much lower model complexity than the original BERT, so the relaxed model can be quickly trained. In the second phase, we transform the trained relaxed BERT model into the original BERT and further retrain the model. Thanks to the desired initialization provided by the relaxed model, the retraining phase requires far fewer training steps than training an original BERT model from scratch with a random initialization. Experimental results show that the proposed CoRe framework can greatly reduce the training time without reducing the performance.

preprint2021arXiv

Symmetric-Constrained Irregular Structure Inpainting for Brain MRI Registration with Tumor Pathology

Deformable registration of magnetic resonance images between patients with brain tumors and healthy subjects has been an important tool to specify tumor geometry through location alignment and facilitate pathological analysis. Since the tumor region does not match any ordinary brain tissue, it has been difficult to deformably register a patient's brain to a normal one. Many patient images are associated with irregularly distributed lesions, resulting in further distortion of normal tissue structures and complicating the registration's similarity measure. In this work, we follow a multi-step context-aware image inpainting framework to generate synthetic tissue intensities in the tumor region. A coarse image-to-image translation is applied to make a rough inference of the missing parts. Then, a feature-level patch-match refinement module is applied to refine the details by modeling the semantic relevance between patch-wise features. A symmetry constraint reflecting a large degree of anatomical symmetry in the brain is further proposed to achieve better structure understanding. Deformable registration is applied between inpainted patient images and normal brains, and the resulting deformation field is eventually used to deform original patient data for the final alignment. The method was applied to the Multimodal Brain Tumor Segmentation (BraTS) 2018 challenge database and compared against three existing inpainting methods. The proposed method yielded results with increased peak signal-to-noise ratio, structural similarity index, inception score, and reduced L1 error, leading to successful patient-to-normal brain image registration.

preprint2021arXiv

U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition

The unified streaming and non-streaming two-pass (U2) end-to-end model for speech recognition has shown great performance in terms of streaming capability, accuracy, real-time factor (RTF), and latency. In this paper, we present U2++, an enhanced version of U2 to further improve the accuracy. The core idea of U2++ is to use the forward and the backward information of the labeling sequences at the same time at training to learn richer information, and combine the forward and backward prediction at decoding to give more accurate recognition results. We also propose a new data augmentation method called SpecSub to help the U2++ model to be more accurate and robust. Our experiments show that, compared with U2, U2++ shows faster convergence at training, better robustness to the decoding method, as well as a consistent 5% - 8% word error rate reduction over U2. In the AISHELL-1 experiment, we achieve a 4.63% character error rate (CER) with a non-streaming setup and 5.05% with a streaming setup with 320ms latency by U2++. To the best of our knowledge, 5.05% is the best-published streaming result on the AISHELL-1 test set.
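As a rough illustration of "combine the forward and backward prediction at decoding", the sketch below rescores first-pass hypotheses with a weighted sum of a left-to-right score, a right-to-left score, and a first-pass CTC score. Everything here is an assumption made for illustration: the field names, the function names, the weight values, and the exact form of the interpolation are hypothetical and are not given in the abstract.

```python
def combined_score(hyp, ctc_weight=0.5, reverse_weight=0.3):
    """Interpolate the two attention-decoder log-scores, then add the
    weighted first-pass CTC log-score.

    `hyp` is a dict with hypothetical keys "l2r_score", "r2l_score"
    and "ctc_score" (all log-probabilities).
    """
    attention = ((1.0 - reverse_weight) * hyp["l2r_score"]
                 + reverse_weight * hyp["r2l_score"])
    return attention + ctc_weight * hyp["ctc_score"]


def rescore(hyps, ctc_weight=0.5, reverse_weight=0.3):
    """Return the hypothesis with the highest combined score."""
    return max(hyps, key=lambda h: combined_score(h, ctc_weight, reverse_weight))


# Two made-up candidates: the right-to-left decoder prefers B.
hyps = [
    {"text": "A", "ctc_score": -1.0, "l2r_score": -2.0, "r2l_score": -4.0},
    {"text": "B", "ctc_score": -1.2, "l2r_score": -2.5, "r2l_score": -2.0},
]
best_bidirectional = rescore(hyps)                    # uses both decoders
best_forward_only = rescore(hyps, reverse_weight=0.0)  # ignores r2l scores
```

Setting `reverse_weight` to zero reduces the sketch to rescoring with the forward decoder alone, which makes it easy to see when the backward scores change the selected hypothesis.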

preprint2021arXiv

VoxelHop: Successive Subspace Learning for ALS Disease Classification Using Structural MRI

Deep learning has great potential for accurate detection and classification of diseases with medical imaging data, but the performance is often limited by the number of training datasets and memory requirements. In addition, many deep learning models are considered a "black-box," thereby often limiting their adoption in clinical applications. To address this, we present a successive subspace learning model, termed VoxelHop, for accurate classification of Amyotrophic Lateral Sclerosis (ALS) using T2-weighted structural MRI data. Compared with popular convolutional neural network (CNN) architectures, VoxelHop has modular and transparent structures with fewer parameters without any backpropagation, so it is well-suited to small dataset size and 3D imaging data. Our VoxelHop has four key components, including (1) sequential expansion of near-to-far neighborhood for multi-channel 3D data; (2) subspace approximation for unsupervised dimension reduction; (3) label-assisted regression for supervised dimension reduction; and (4) concatenation of features and classification between controls and patients. Our experimental results demonstrate that our framework using a total of 20 controls and 26 patients achieves an accuracy of 93.48% and an AUC score of 0.9394 in differentiating patients from controls, even with a relatively small number of datasets, showing its robustness and effectiveness. Our thorough evaluations also show its validity and superiority to the state-of-the-art 3D CNN classification methods. Our framework can easily be generalized to other classification tasks using different imaging modalities.

preprint2020arXiv

A Multilevel Spectral Indicator Method for Eigenvalues of Large Non-Hermitian Matrices

Recently a novel family of eigensolvers, called spectral indicator methods (SIMs), was proposed. Given a region on the complex plane, SIMs first compute an indicator by the spectral projection. The indicator is used to test if the region contains eigenvalue(s). Then the region containing eigenvalue(s) is subdivided and tested. The procedure is repeated until the eigenvalues are identified within a specified precision. In this paper, using Cayley transformation and Krylov subspaces, a memory efficient multilevel eigensolver is proposed. The method uses less memory than the early versions of SIMs and is particularly suitable for computing many eigenvalues of large sparse (non-Hermitian) matrices. Several examples are presented for demonstration.
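The idea of testing a region with a spectral projection can be sketched as follows. This is a simplified illustration and not the method of the paper: it assumes a circular region, approximates the contour integral of the resolvent with a midpoint quadrature rule, and uses dense direct solves in place of the Cayley transformation and Krylov subspaces mentioned in the abstract. The function name and parameters are hypothetical.

```python
import numpy as np


def spectral_indicator(A, center, radius, f, num_nodes=64):
    """Norm of the approximate spectral projection of f onto the
    eigenspace of A associated with eigenvalues inside a circle.

    P f = (1 / (2*pi*i)) * contour integral of (z I - A)^{-1} f dz,
    approximated with `num_nodes` equally spaced quadrature nodes.
    A value near zero suggests that the circle contains no eigenvalue
    (or that f has no component in that eigenspace).
    """
    n = A.shape[0]
    identity = np.eye(n)
    thetas = 2.0 * np.pi * (np.arange(num_nodes) + 0.5) / num_nodes
    projection = np.zeros(n, dtype=complex)
    for theta in thetas:
        offset = radius * np.exp(1j * theta)
        z = center + offset
        # dz = i * offset * dtheta, so the factor i cancels with 1/(2*pi*i).
        projection += offset * np.linalg.solve(z * identity - A, f)
    projection /= num_nodes
    return float(np.linalg.norm(projection))
```

In a subdivision scheme of the kind the abstract describes, a region whose indicator exceeds a chosen threshold would be split into smaller regions and each of those tested again.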

preprint2020arXiv

A Shift Selection Strategy for Parallel Shift-Invert Spectrum Slicing in Symmetric Self-Consistent Eigenvalue Computation

The central importance of large scale eigenvalue problems in scientific computation necessitates the development of massively parallel algorithms for their solution. Recent advances in dense numerical linear algebra have enabled the routine treatment of eigenvalue problems with dimensions on the order of hundreds of thousands on the world's largest supercomputers. In cases where dense treatments are not feasible, Krylov subspace methods offer an attractive alternative due to the fact that they do not require storage of the problem matrices. However, demonstration of scalability of either of these classes of eigenvalue algorithms on computing architectures capable of expressing massive parallelism is non-trivial due to communication requirements and serial bottlenecks, respectively. In this work, we introduce the SISLICE method: a parallel shift-invert algorithm for the solution of the symmetric self-consistent field (SCF) eigenvalue problem. The SISLICE method drastically reduces the communication requirement of current parallel shift-invert eigenvalue algorithms through various shift selection and migration techniques based on density of states estimation and k-means clustering, respectively. This work demonstrates the robustness and parallel performance of the SISLICE method on a representative set of SCF eigenvalue problems and outlines research directions which will be explored in future work.

preprint2020arXiv

Constrained R-CNN: A general image manipulation detection model

Recently, deep learning-based models have exhibited remarkable performance for image manipulation detection. However, most of them suffer from poor universality of handcrafted or predetermined features. Meanwhile, they only focus on manipulation localization and overlook manipulation classification. To address these issues, we propose a coarse-to-fine architecture named Constrained R-CNN for complete and accurate image forensics. First, the learnable manipulation feature extractor learns a unified feature representation directly from data. Second, the attention region proposal network effectively discriminates manipulated regions for the next manipulation classification and coarse localization. Then, the skip structure fuses low-level and high-level information to refine the global manipulation features. Finally, the coarse localization information guides the model to further learn the finer local features and segment out the tampered region. Experimental results show that our model achieves state-of-the-art performance. In particular, the F1 score is increased by 28.4%, 73.2%, and 13.3% on the NIST16, COVERAGE, and Columbia datasets, respectively.

preprint2020arXiv

Context-Integrated and Feature-Refined Network for Lightweight Object Parsing

Semantic segmentation for lightweight object parsing is a very challenging task, because both accuracy and efficiency (e.g., execution speed, memory footprint or computational complexity) must be taken into account. However, most previous works pay too much attention to a one-sided perspective, either accuracy or speed, and ignore the other, which poses a great limitation to the actual demands of intelligent devices. To tackle this dilemma, we propose a novel lightweight architecture named Context-Integrated and Feature-Refined Network (CIFReNet). The core components of CIFReNet are the Long-skip Refinement Module (LRM) and the Multi-scale Context Integration Module (MCIM). The LRM is designed to ease the propagation of spatial information between low-level and high-level stages. Furthermore, a channel attention mechanism is introduced into the process of long-skip learning to boost the quality of low-level feature refinement. Meanwhile, the MCIM consists of three cascaded Dense Semantic Pyramid (DSP) blocks with image-level features, which is presented to encode multiple context information and enlarge the field of view. Specifically, the proposed DSP block exploits a dense feature sampling strategy to enhance the information representations without significantly increasing the computation cost. Comprehensive experiments are conducted on three benchmark datasets for object parsing including Cityscapes, CamVid, and Helen. As indicated, the proposed method reaches a better trade-off between accuracy and efficiency compared with the other state-of-the-art methods.

preprint2020arXiv

ELSI -- An Open Infrastructure for Electronic Structure Solvers

Routine applications of electronic structure theory to molecules and periodic systems need to compute the electron density from given Hamiltonian and, in case of non-orthogonal basis sets, overlap matrices. System sizes can range from few to thousands or, in some examples, millions of atoms. Different discretization schemes (basis sets) and different system geometries (finite non-periodic vs. infinite periodic boundary conditions) yield matrices with different structures. The ELectronic Structure Infrastructure (ELSI) project provides an open-source software interface to facilitate the implementation and optimal use of high-performance solver libraries covering cubic scaling eigensolvers, linear scaling density-matrix-based algorithms, and other reduced scaling methods in between. In this paper, we present recent improvements and developments inside ELSI, mainly covering (1) new solvers connected to the interface, (2) matrix layout and communication adapted for parallel calculations of periodic and/or spin-polarized systems, (3) routines for density matrix extrapolation in geometry optimization and molecular dynamics calculations, and (4) general utilities such as parallel matrix I/O and JSON output. The ELSI interface has been integrated into four electronic structure code projects (DFTB+, DGDFT, FHI-aims, SIESTA), allowing us to rigorously benchmark the performance of the solvers on an equal footing. Based on results of a systematic set of large-scale benchmarks performed with Kohn-Sham density-functional theory and density-functional tight-binding theory, we identify factors that strongly affect the efficiency of the solvers, and propose a decision layer that assists with the solver selection process. Finally, we describe a reverse communication interface encoding matrix-free iterative solver strategies that are amenable, e.g., for use with planewave basis sets.

preprint2020arXiv

Fast and Robust Registration of Aerial Images and LiDAR data Based on Structrual Features and 3D Phase Correlation

Co-registration of aerial imagery and Light Detection and Ranging (LiDAR) data is quite challenging because the different imaging mechanisms cause significant geometric and radiometric distortions between such data. To tackle the problem, this paper proposes an automatic registration method based on structural features and three-dimensional (3D) phase correlation. In the proposed method, the LiDAR point cloud data is first transformed into the intensity map, which is used as the reference image. Then, we employ the Fast operator to extract uniformly distributed interest points in the aerial image by a partition strategy and perform a local geometric correction by using the collinearity equation to eliminate scale and rotation differences between images. Subsequently, a robust structural feature descriptor is built based on dense gradient features, and the 3D phase correlation is used to detect control points (CPs) between aerial images and LiDAR data in the frequency domain, where the image matching is accelerated by the 3D Fast Fourier Transform (FFT). Finally, the obtained CPs are employed to correct the exterior orientation elements, which are used to achieve co-registration of aerial images and LiDAR data. Experiments with two datasets of aerial images and LiDAR data show that the proposed method is much faster and more robust than state-of-the-art methods.

preprint2020arXiv

Multi-time scale identification for multi-energy system

Multi-energy systems have been advancing rapidly owing to their various benefits, e.g., energy conservation and emission reduction. Coupling components are capable of transmitting energy from one time scale system to another time scale system, so the multi-energy system exhibits multi-time scale characteristics and broad bandwidth, thereby causing difficulties in dynamic modeling. In this work, two-time scale system identification is studied. A method that uses signal pre-filtering and subtraction is developed to solve the problem. The high and low frequency parts of the two-time scale system are identified separately and then combined in a parallel structure. The consistency of the method is proved and case studies are used to verify the effectiveness of the method.

preprint2020arXiv

On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters

The predominance of Kohn-Sham density functional theory (KS-DFT) for the theoretical treatment of large experimentally relevant systems in molecular chemistry and materials science relies primarily on the existence of efficient software implementations which are capable of leveraging the latest advances in modern high performance computing (HPC). With recent trends in HPC leading towards an increasing reliance on heterogeneous accelerator-based architectures such as graphics processing units (GPU), existing code bases must embrace these architectural advances to maintain the high levels of performance which have come to be expected for these methods. In this work, we propose a three-level parallelism scheme for the distributed numerical integration of the exchange-correlation (XC) potential in the Gaussian basis set discretization of the Kohn-Sham equations on large computing clusters consisting of multiple GPUs per compute node. In addition, we propose and demonstrate the efficacy of the use of batched kernels, including batched level-3 BLAS operations, in achieving high levels of performance on the GPU. We demonstrate the performance and scalability of the implementation of the proposed method in the NWChemEx software package by comparing to the existing scalable CPU XC integration in NWChem.

preprint2020arXiv

One-Shot Domain Adaptation For Face Generation

In this paper, we propose a framework capable of generating face images that fall into the same distribution as that of a given one-shot example. We leverage a pre-trained StyleGAN model that already learned the generic face distribution. Given the one-shot target, we develop an iterative optimization scheme that rapidly adapts the weights of the model to shift the output's high-level distribution to the target's. To generate images of the same distribution, we introduce a style-mixing technique that transfers the low-level statistics from the target to faces randomly generated with the model. With that, we are able to generate an unlimited number of faces that inherit from the distribution of both generic human faces and the one-shot example. The newly generated faces can serve as augmented training data for other downstream tasks. Such setting is appealing as it requires labeling very few, or even one example, in the target domain, which is often the case of real-world face manipulations that result from a variety of unknown and unique distributions, each with extremely low prevalence. We show the effectiveness of our one-shot approach for detecting face manipulations and compare it with other few-shot domain adaptation methods qualitatively and quantitatively.

preprint2020arXiv

Optical System Design of Bionic Compound Eye with Broad Field of View

In nature, many common insects have compound eyes composed of many small eyes arranged on a curved retina. This kind of vision system has many advantages, such as small size, large FOV (field of view) and high sensitivity, which have attracted extensive attention and research from researchers worldwide. It has good application prospects in military strikes and machine vision. In this paper, a new type of miniature compound eye system with large FOV is designed, which contains a micro-lens array and a relay system. Hexagonal micro-lens arrays are spliced seamlessly as a curved shell in the designed compound eye system. The intermediate curved image formed by the curved array is converted to a planar image by introducing a relay system. After combination and optimization of the micro-lens array and the relay system, the MTF values at 89.3 lp/mm for each FOV within 120.5° are greater than 0.3, and the corresponding RMS spot radii are less than the radius of the Airy disk, which demonstrates the good imaging quality of the compound eye. The clear aperture of a single micro lens is 250 μm with an FOV of 6°. After tolerance analysis, the results show that the image quality remains good enough and meets the requirements of the additive manufacturing process.

preprint2020arXiv

Parallel energy-stable solver for a coupled Allen-Cahn and Cahn-Hilliard system

In this paper, we study numerical methods for solving the coupled Allen-Cahn/Cahn-Hilliard system associated with a free energy functional of logarithmic type. To tackle the challenge posed by the special free energy functional, we propose a method to approximate the discrete variational derivatives in polynomial forms, such that the corresponding finite difference scheme is unconditionally energy stable and the energy dissipation law is maintained. To further improve the performance of the algorithm, a modified adaptive time stepping strategy is adopted such that the time step size can be flexibly controlled based on the dynamical evolution of the problem. To achieve high performance on parallel computers, we introduce a domain decomposition based, parallel Newton-Krylov-Schwarz method to solve the nonlinear algebraic system constructed from the discretization at each time step. Numerical experiments show that the proposed algorithm is second-order accurate in both space and time, energy stable with large time steps, and highly scalable to over ten thousand processor cores on the Sunway TaihuLight supercomputer.

preprint2020arXiv

Parallel multilevel restricted Schwarz preconditioners for implicit simulation of subsurface flows with Peng-Robinson equation of state

Parallel algorithms and simulators with good scalabilities are particularly important for large-scale reservoir simulations on modern supercomputers with a large number of processors. In this paper, we introduce and study a family of highly scalable multilevel restricted additive Schwarz (RAS) methods for the fully implicit solution of subsurface flows with Peng-Robinson equation of state in two and three dimensions. With the use of a second-order fully implicit scheme, the proposed simulator is unconditionally stable with the relaxation of the time step size by the stability condition. The investigation then focuses on the development of several types of multilevel overlapping additive Schwarz methods for the preconditioning of the resultant linear system arising from the inexact Newton iteration, and some fast solver technologies are presented for the assurance of the multilevel approach efficiency and scalability. We numerically show that the proposed fully implicit framework is highly efficient for solving both standard benchmarks as well as realistic problems with several hundreds of millions of unknowns and scalable to 8192 processors on the Tianhe-2 supercomputer.

preprint2020arXiv

PFNN: A Penalty-Free Neural Network Method for Solving a Class of Second-Order Boundary-Value Problems on Complex Geometries

We present PFNN, a penalty-free neural network method, to efficiently solve a class of second-order boundary-value problems on complex geometries. To reduce the smoothness requirement, the original problem is reformulated to a weak form so that the evaluations of high-order derivatives are avoided. Two neural networks, rather than just one, are employed to construct the approximate solution, with one network satisfying the essential boundary conditions and the other handling the rest of the domain. In this way, an unconstrained optimization problem, instead of a constrained one, is solved without adding any penalty terms. The entanglement of the two networks is eliminated with the help of a length factor function that is scale invariant and can adapt to complex geometries. We prove the convergence of the PFNN method and conduct numerical experiments on a series of linear and nonlinear second-order boundary-value problems to demonstrate that PFNN is superior to several existing approaches in terms of accuracy, flexibility and robustness.

preprint2020arXiv

Quantum Fourier Transform Revisited

The fast Fourier transform (FFT) is one of the most successful numerical algorithms of the 20th century and has found numerous applications in many branches of computational science and engineering. The FFT algorithm can be derived from a particular matrix decomposition of the discrete Fourier transform (DFT) matrix. In this paper, we show that the quantum Fourier transform (QFT) can be derived by further decomposing the diagonal factors of the FFT matrix decomposition into products of matrices with Kronecker product structure. We analyze the implication of this Kronecker product structure on the discrete Fourier transform of rank-1 tensors on a classical computer. We also explain why such a structure can take advantage of an important quantum computer feature that enables the QFT algorithm to attain an exponential speedup on a quantum computer over the FFT algorithm on a classical computer. Further, the connection between the matrix decomposition of the DFT matrix and a quantum circuit is made. We also discuss a natural extension of a radix-2 QFT decomposition to a radix-d QFT decomposition. No prior knowledge of quantum computing is required to understand what is presented in this paper. Yet, we believe this paper may help readers to gain some rudimentary understanding of the nature of quantum computing from a matrix computation point of view.
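The matrix view in this abstract can be checked numerically. The sketch below uses the standard radix-2 factorization of the unnormalized DFT matrix, $F_n = (F_2 \otimes I_{n/2})\,\mathrm{diag}(I_{n/2}, \Omega_{n/2})\,(I_2 \otimes F_{n/2})\,P_n$, and then writes the diagonal twiddle factor $\Omega_{n/2}$ as a Kronecker product of $2 \times 2$ diagonal matrices. It is an illustration of the general idea under those assumptions, not a reproduction of the paper's derivation; the function names and the sign convention $\omega = e^{-2\pi i/n}$ are choices made here.

```python
from functools import reduce

import numpy as np


def dft_matrix(n):
    """Unnormalized DFT matrix with entries w**(j*k), w = exp(-2*pi*i/n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def twiddle_diagonal(n):
    """Omega = diag(w**0, ..., w**(n/2 - 1)) for even n."""
    m = n // 2
    return np.diag(np.exp(-2j * np.pi * np.arange(m) / n))


def twiddle_as_kronecker(n):
    """The same diagonal, built as a Kronecker product of 2x2 diagonals.

    Requires n to be a power of two. Each factor diag(1, w**(2**j))
    accounts for one bit of the index; most significant bit first.
    """
    m = n // 2
    num_bits = m.bit_length() - 1  # m == 2**num_bits
    w = np.exp(-2j * np.pi / n)
    factors = [np.diag([1.0, w ** (2 ** j)]) for j in reversed(range(num_bits))]
    return reduce(np.kron, factors, np.eye(1))


def radix2_factorization(n):
    """Rebuild F_n from F_2, F_{n/2}, the twiddle diagonal and a permutation."""
    m = n // 2
    identity_m = np.eye(m)
    zeros_m = np.zeros((m, m))
    butterfly = np.kron(dft_matrix(2), identity_m)
    diagonal = np.block([[identity_m, zeros_m], [zeros_m, twiddle_diagonal(n)]])
    half_size = np.kron(np.eye(2), dft_matrix(m))
    # Even-odd sort: even-indexed entries first, then odd-indexed entries.
    permutation = np.eye(n)[np.r_[0:n:2, 1:n:2]]
    return butterfly @ diagonal @ half_size @ permutation
```

Each $2 \times 2$ diagonal factor acts on a single bit of the index, which is the structure that a quantum circuit can realize with one phase gate per qubit.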

preprint2020arXiv

Quench dynamics of Hopf insulators

Hopf insulators are exotic topological states of matter outside the standard ten-fold way classification based on discrete symmetries. Their topology is captured by an integer invariant that describes the linking structures of the Hamiltonian in the three-dimensional momentum space. In this paper, we investigate the quantum dynamics of Hopf insulators across a sudden quench and show that the quench dynamics is characterized by a $\mathbb{Z}_2$ invariant $\nu$ which reveals a rich interplay between quantum quench and static band topology. We construct the $\mathbb{Z}_2$ topological invariant using the loop unitary operator, and prove that $\nu$ relates the pre- and post-quench Hopf invariants through $\nu=(\mathcal{L}-\mathcal{L}_0)\bmod 2$. The $\mathbb{Z}_2$ nature of the dynamical invariant is in sharp contrast to the $\mathbb{Z}$ invariant for the quench dynamics of Chern insulators in two dimensions. The non-trivial dynamical topology is further attributed to the emergence of $\pi$-defects in the phase band of the loop unitary. These $\pi$-defects are generally closed curves in the momentum-time space, for example, as nodal rings carrying Hopf charge.

preprint2020arXiv

Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations

Deep encoder-decoder based CNNs have advanced image inpainting methods for hole filling. While existing methods recover structures and textures step-by-step in the hole regions, they typically use two encoder-decoders for separate recovery. The CNN features of each encoder are learned to capture either missing structures or textures without considering them as a whole. The insufficient utilization of these encoder features limits the performance of recovering both structures and textures. In this paper, we propose a mutual encoder-decoder CNN for joint recovery of both. We use CNN features from the deep and shallow layers of the encoder to represent structures and textures of an input image, respectively. The deep layer features are sent to a structure branch and the shallow layer features are sent to a texture branch. In each branch, we fill holes in multiple scales of the CNN features. The filled CNN features from both branches are concatenated and then equalized. During feature equalization, we reweigh channel attentions first and propose a bilateral propagation activation function to enable spatial equalization. To this end, the filled CNN features of structure and texture mutually benefit each other to represent image content at all feature levels. We use the equalized feature to supplement decoder features for output image generation through skip connections. Experiments on the benchmark datasets show that the proposed method is effective at recovering structures and textures and performs favorably against state-of-the-art approaches.

preprint2020arXiv

Solving the k-sparse Eigenvalue Problem with Reinforcement Learning

We examine the possibility of using a reinforcement learning (RL) algorithm to solve large-scale eigenvalue problems in which the desired eigenvector can be approximated by a sparse vector with at most $k$ nonzero elements, where $k$ is relatively small compared to the dimension of the matrix to be partially diagonalized. This type of problem arises in applications in which the desired eigenvector exhibits localization properties and in large-scale eigenvalue computations in which the amount of computational resource is limited. When the positions of these nonzero elements can be determined, we can obtain the $k$-sparse approximation to the original problem by computing eigenvalues of a $k\times k$ submatrix extracted from $k$ rows and columns of the original matrix. We review a previously developed greedy algorithm for incrementally probing the positions of the nonzero elements in a $k$-sparse approximate eigenvector and show that the greedy algorithm can be improved by using an RL method to refine the selection of $k$ rows and columns of the original matrix. We describe how to represent states, actions, rewards and policies in an RL algorithm designed to solve the $k$-sparse eigenvalue problem and demonstrate the effectiveness of the RL algorithm on two examples originating from quantum many-body physics.
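
The submatrix step described in this abstract can be sketched as follows. This is a minimal illustration, assuming a real symmetric matrix and a support set that is already known; the function name `k_sparse_eigenpair` and the test matrix are hypothetical, and neither the greedy probing nor the RL refinement from the paper is reproduced here.

```python
import numpy as np

def k_sparse_eigenpair(A, support):
    """Approximate the lowest eigenpair of symmetric A on a fixed support.

    `support` lists the k row/column indices assumed to carry the nonzero
    entries of the eigenvector (how to choose them is not shown here).
    """
    idx = np.asarray(sorted(support))
    sub = A[np.ix_(idx, idx)]            # k x k principal submatrix
    vals, vecs = np.linalg.eigh(sub)     # eigenvalues in ascending order
    x = np.zeros(A.shape[0])
    x[idx] = vecs[:, 0]                  # embed into the full space
    return vals[0], x

# Illustrative matrix with a localized lowest eigenvector:
# a growing diagonal plus weak nearest-neighbour coupling.
n, k = 200, 5
A = np.diag(np.arange(n, dtype=float))
A += 0.1 * (np.eye(n, k=1) + np.eye(n, k=-1))

approx_value, approx_vector = k_sparse_eigenpair(A, range(k))
exact_value = np.linalg.eigvalsh(A)[0]
```

Because the submatrix is a principal submatrix of a symmetric matrix, its smallest eigenvalue is never below the smallest eigenvalue of the full matrix, so the sparse approximation converges from above as the support improves.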

preprint2020arXiv

Testing Ghasemi-Nodehi-Bambi metric parameters with quasi-periodic oscillations

Quasi-periodic oscillations (QPOs) are seen as peak features in the X-ray power spectral density of stellar mass black holes and neutron stars, whose frequencies can be measured with high precision. These QPO frequencies are believed to be related to fundamental frequencies of test particles, which are mainly determined by the background metric. We consider the metric introduced in Ghasemi-Nodehi and Bambi (Eur Phys J C 76:290, 2016). The fundamental frequencies in this metric are sensitive to some of the metric parameters but insensitive to other parameters, which means that the differences between the fundamental frequencies in this metric and those in the Kerr metric can be significant for small changes of some but not all parameters around the Kerr value. By comparing with the QPO observations of GRO J1655-40, we find that only one parameter of the Ghasemi-Nodehi-Bambi metric can be strongly constrained, but other parameters cannot. We also use a nested algorithm to investigate whether better constraints on the Ghasemi-Nodehi-Bambi metric parameters can be obtained from QPO observations of multiple objects by simulations. We find that four parameters can be strongly constrained while other parameters cannot. Our results suggest that QPOs may be important tools for testing the Kerr metric.

preprint2019arXiv

Influence of point defects on the electronic and topological properties of monolayer WTe$_2$

In some topological insulators, such as graphene and WTe$_2$, band inversion originates from chemical bonding and space group symmetry, in contrast to materials such as Bi$_2$Se$_3$, where the band inversion derives from relativistic effects in the atoms. In the former, band inversion is susceptible to changes of the chemical environment, e.g. by defects, while the latter are less affected by defects due to the larger energy scale associated with atomic relativistic effects. Motivated by recent experiments, we study the effect of Te-vacancies and Te-adatoms on the electronic properties of WTe$_2$. We find that the Te-vacancies have a formation energy of $2.21$ eV, while the formation energy of the Te-adatoms is much lower with $0.72$ eV. The vacancies strongly influence the band structure and we present evidence that band inversion is already reversed at the nominal composition of WTe$_{1.97}$. In contrast, we show that the adatoms do not change the electronic structure in the vicinity of the Fermi level and thus the topological properties. Our findings indicate that Te-adatoms should be present in thin films that are grown in a Te-rich environment, and we suggest that they have been observed in scanning tunneling microscopy experiments.

preprint2018arXiv

Optimized pair natural orbitals for the coupled cluster methods

We present the coupled-cluster singles and doubles method formulated in terms of truncated pair-natural orbitals (PNO) that are optimized to minimize the effect of truncation. Compared to the standard ground-state PNO coupled-cluster approaches, in which truncated PNOs derived from first-order Møller-Plesset (MP1) amplitudes are used to compress the CC wave operator, the iteratively-optimized PNOs ("iPNOs") offer moderate improvement for small PNO ranks but rapidly increase their effectiveness for large PNO ranks. The error introduced by PNO truncation in the CCSD energy is reduced by orders of magnitude in the asymptotic regime, with an insignificant increase in PNO ranks. The effect of PNO optimization is particularly effective when combined with Neese's perturbative correction for the PNO incompleteness of the CCSD energy. The use of the perturbative correction in combination with the PNO optimization procedure seems to produce the most precise approximation to the canonical CCSD energies for small and large PNO ranks. For the standard benchmark set of noncovalent binding energies, remarkable improvements with respect to the standard PNO approach range from a factor of 3 with PNO truncation threshold $τ_\text{PNO}=10^{-6}$ (with the maximum PNO truncation error in the binding energy of only 0.1 kcal/mol) to more than 2 orders of magnitude with $τ_\text{PNO}=10^{-9}$.

preprint2016arXiv

A hybrid approach for cooperative output regulation with sampled compensator

This work investigates the cooperative output regulation problem of linear multi-agent systems with hybrid sampled-data control. Due to limited data sensing and communication, in many practical situations only sampled data are available for the cooperation of multi-agent systems. To overcome this problem, a distributed hybrid controller is presented for cooperative output regulation, which is achieved by a well-designed state feedback law. We then propose a method for designing a sampled-data controller that solves the cooperative output regulation problem for continuous linear systems with discrete-time communication data. Finally, a numerical simulation example for cooperative tracking and a simulation example for optimal control of micro-grids are presented to illustrate the sampled-data control law.

preprint2016arXiv

A Left-Looking Selected Inversion Algorithm and Task Parallelism on Shared Memory Systems

Given a sparse matrix $A$, the selected inversion algorithm is an efficient method for computing certain selected elements of $A^{-1}$. These selected elements correspond to all or some nonzero elements of the LU factors of $A$. In many ways, the type of matrix updates performed in the selected inversion algorithm is similar to that performed in the LU factorization, although the sequence of operations is different. In the context of LU factorization, it is known that the left-looking and right-looking algorithms exhibit different memory access and data communication patterns, and hence different behavior on shared memory and distributed memory parallel machines. Corresponding to right-looking and left-looking LU factorization, the selected inversion algorithm can be organized as a left-looking and a right-looking algorithm. The parallel right-looking version of the algorithm has been developed in [1]. The sequence of operations performed in this version of the selected inversion algorithm is similar to those performed in a left-looking LU factorization algorithm. In this paper, we describe the left-looking variant of the selected inversion algorithm and, based on a task-parallel method, present an efficient implementation of the algorithm for shared memory machines. We demonstrate that with the task scheduling features provided by OpenMP 4.0, the left-looking selected inversion algorithm can scale well both on the Intel Haswell multicore architecture and on the Intel Knights Corner (KNC) manycore architecture. Compared to the right-looking selected inversion algorithm, the left-looking formulation facilitates pipelining of work along different branches of the elimination tree, and can be a promising candidate for future development of massively parallel selected inversion algorithms on heterogeneous architectures.

preprint2016arXiv

An efficient basis set representation for calculating electrons in molecules

The method of McCurdy, Baertschy, and Rescigno, J. Phys. B, 37, R137 (2004) is generalized to obtain a straightforward, surprisingly accurate, and scalable numerical representation for calculating the electronic wave functions of molecules. It uses a basis set of product sinc functions arrayed on a Cartesian grid, and yields 1 kcal/mol precision for valence transition energies with a grid resolution of approximately 0.1 bohr. The Coulomb matrix elements are replaced with matrix elements obtained from the kinetic energy operator. A resolution-of-the-identity approximation renders the primitive one- and two-electron matrix elements diagonal; in other words, the Coulomb operator is local with respect to the grid indices. The calculation of contracted two-electron matrix elements among orbitals requires only O(N log(N)) multiplication operations, not O(N^4), where N is the number of basis functions; N = n^3 on cubic grids. The representation not only is numerically expedient, but also produces energies and properties superior to those calculated variationally. Absolute energies, absorption cross sections, transition energies, and ionization potentials are reported for one- (He^+, H_2^+), two- (H_2, He), ten- (CH_4) and 56-electron (C_8H_8) systems.

preprint2016arXiv

BSEPACK User's Guide

This is the user manual for the software package BSEPACK (Bethe-Salpeter Eigenvalue Solver Package).

preprint2016arXiv

Chebyshev polynomial filtered subspace iteration in the Discontinuous Galerkin method for large-scale electronic structure calculations

The Discontinuous Galerkin (DG) electronic structure method employs an adaptive local basis (ALB) set to solve the Kohn-Sham equations of density functional theory (DFT) in a discontinuous Galerkin framework. The adaptive local basis is generated on-the-fly to capture the local material physics, and can systematically attain chemical accuracy with only a few tens of degrees of freedom per atom. A central issue for large-scale calculations, however, is the computation of the electron density (and subsequently, ground state properties) from the discretized Hamiltonian in an efficient and scalable manner. We show in this work how Chebyshev polynomial filtered subspace iteration (CheFSI) can be used to address this issue and push the envelope in large-scale materials simulations in a discontinuous Galerkin framework. We describe how the subspace filtering steps can be performed in an efficient and scalable manner using a two-dimensional parallelization scheme, thanks to the orthogonality of the DG basis set and block-sparse structure of the DG Hamiltonian matrix. The on-the-fly nature of the ALBs requires additional care in carrying out the subspace iterations. We demonstrate the parallel scalability of the DG-CheFSI approach in calculations of large-scale two-dimensional graphene sheets and bulk three-dimensional lithium-ion electrolyte systems. Employing 55,296 computational cores, the time per self-consistent field iteration for a sample of the bulk 3D electrolyte containing 8,586 atoms is 90 seconds, and the time for a graphene sheet containing 11,520 atoms is 75 seconds.
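
The filtered subspace iteration named in this abstract can be illustrated on a small dense matrix. The sketch below is the textbook Chebyshev filtering loop (filter, orthonormalize, Rayleigh-Ritz), not the DG-CheFSI implementation; the function names, the block padding `extra`, and the choice of filter bounds `a` and `b` are assumptions made for illustration, and no parallelization is shown.

```python
import numpy as np

def chebyshev_filter(H, X, degree, a, b):
    """Apply T_degree((H - c I) / e) to the block X.

    [a, b] is the unwanted part of the spectrum; it is mapped to [-1, 1],
    where Chebyshev polynomials stay bounded, so components with
    eigenvalues below a are amplified relative to the rest.
    """
    e = (b - a) / 2.0
    c = (b + a) / 2.0
    Y = (H @ X - c * X) / e                      # T_1 applied to X
    for _ in range(2, degree + 1):
        Y_next = 2.0 * (H @ Y - c * Y) / e - X   # three-term recurrence
        X, Y = Y, Y_next
    return Y

def chefsi(H, k, a, b, degree=10, iterations=20, extra=2, seed=0):
    """Lowest k eigenpairs of symmetric H by filtered subspace iteration."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((H.shape[0], k + extra))
    for _ in range(iterations):
        X = chebyshev_filter(H, X, degree, a, b)
        X, _ = np.linalg.qr(X)                    # re-orthonormalize block
        vals, vecs = np.linalg.eigh(X.T @ H @ X)  # Rayleigh-Ritz step
        X = X @ vecs
    return vals[:k], X[:, :k]

# Small dense example with a known spectrum on [0, 10].
rng = np.random.default_rng(1)
spectrum = np.linspace(0.0, 10.0, 50)
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
H = (Q * spectrum) @ Q.T
values, vectors = chefsi(H, k=4, a=2.0, b=10.0)
```

In practice the upper bound `b` must not be smaller than the largest eigenvalue, and it is usually estimated cheaply (for example with a few Lanczos steps) rather than known exactly as in this toy setup.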

preprint2016arXiv

Preconditioned iterative methods for eigenvalue counts

We describe preconditioned iterative methods for estimating the number of eigenvalues of a Hermitian matrix within a given interval. Such estimation is useful in a number of applications. In particular, it can be used to develop an efficient spectrum-slicing strategy to compute many eigenpairs of a Hermitian matrix. Our method is based on the Lanczos- and Arnoldi-type of iterations. We show that with a properly defined preconditioner, only a few iterations may be needed to obtain a good estimate of the number of eigenvalues within a prescribed interval. We also demonstrate that the number of iterations required by the proposed preconditioned schemes is independent of the size and condition number of the matrix. The efficiency of the methods is illustrated on several problems arising from density functional theory based electronic structure calculations.

preprint2016arXiv

Task-space coordinated tracking of multiple heterogeneous manipulators via controller-estimator approaches

This paper studies the task-space coordinated tracking of a time-varying leader for multiple heterogeneous manipulators (MHMs), containing redundant manipulators and nonredundant ones. Different from the traditional coordinated control, distributed controller-estimator algorithms (DCEA), which consist of local algorithms and networked algorithms, are developed for MHMs with parametric uncertainties and input disturbances. By invoking differential inclusions, nonsmooth analysis, and input-to-state stability, some conditions (including sufficient conditions, necessary and sufficient conditions) on the asymptotic stability of the task-space tracking errors and the subtask errors are developed. Simulation results are given to show the effectiveness of the presented DCEA.

preprint2016arXiv

Time-varying formation tracking of multiple manipulators via distributed finite-time control

Compared with a traditional fixed formation for a group of dynamical systems, a time-varying formation can produce the following benefits: i) covering the greater part of complex environments; ii) collision avoidance. This paper studies the time-varying formation tracking for multiple manipulator systems (MMSs) under fixed and switching directed graphs with a dynamic leader, whose acceleration cannot change too fast. An explicit mathematical formulation of time-varying formation is developed based on the related practical applications. A class of extended inverse dynamics control algorithms combined with distributed sliding-mode estimators is developed to address the aforementioned problem. By invoking finite-time stability arguments, several novel criteria (including sufficient criteria, necessary and sufficient criteria) for global finite-time stability of MMSs are established. Finally, numerical experiments are presented to verify the effectiveness of the theoretical results.

preprint2016arXiv

Versatile electronic properties and exotic edge states in single-layer tetragonal silicon carbides

Three single-layer tetragonal silicon carbides (SiC), termed T1, T2 and T3, are proposed by density functional theory (DFT) computations. Although the three structures have the same topological geometry, they show versatile electronic properties from semiconductor (T1), semimetal (T2) to metal (T3). The versatile properties originate from the rich bonds between Si and C atoms. The nanoribbons of the three SiC also show interesting electronic properties. Especially, T1 nanoribbons possess exotic edge states, where electrons only distribute on one edge's silicon or carbon atoms. The band gaps of the T1 nanoribbons are constant because there is no interaction between the edge states.

preprint2015arXiv

A Projected Preconditioned Conjugate Gradient Algorithm for Computing Many Extreme Eigenpairs of a Hermitian Matrix

We present an iterative algorithm for computing an invariant subspace associated with the algebraically smallest eigenvalues of a large sparse or structured Hermitian matrix A. We are interested in the case in which the dimension of the invariant subspace is large (e.g., over several hundreds or thousands) even though it may still be small relative to the dimension of A. These problems arise from, for example, density functional theory based electronic structure calculations for complex materials. The key feature of our algorithm is that it performs fewer Rayleigh-Ritz calculations compared to existing algorithms such as the locally optimal preconditioned conjugate gradient or the Davidson algorithm. It is a block algorithm, hence can take advantage of efficient BLAS3 operations and be implemented with multiple levels of concurrency. We discuss a number of practical issues that must be addressed in order to implement the algorithm efficiently on a high performance computer.

preprint2015arXiv

A Thick-Restart Lanczos algorithm with polynomial filtering for Hermitian eigenvalue problems

Polynomial filtering can provide a highly effective means of computing all eigenvalues of a real symmetric (or complex Hermitian) matrix that are located in a given interval, anywhere in the spectrum. This paper describes a technique for tackling this problem by combining a Thick-Restart version of the Lanczos algorithm with deflation ('locking') and a new type of polynomial filters obtained from a least-squares technique. The resulting algorithm can be utilized in a 'spectrum-slicing' approach whereby a very large number of eigenvalues and associated eigenvectors of the matrix are computed by extracting eigenpairs located in different sub-intervals independently from one another.

preprint2015arXiv

Ab Initio No Core Shell Model - Recent Results and Further Prospects

There has been significant recent progress in solving the long-standing problems of how nuclear shell structure and collective motion emerge from underlying microscopic inter-nucleon interactions. We review a selection of recent significant results within the ab initio No Core Shell Model (NCSM) closely tied to three major factors enabling this progress: (1) improved nuclear interactions that accurately describe the experimental two-nucleon and three-nucleon interaction data; (2) advances in algorithms to simulate the quantum many-body problem with strong interactions; and (3) continued rapid development of high-performance computers now capable of performing $20 \times 10^{15}$ floating point operations per second. We also comment on prospects for further developments.

preprint2015arXiv

Adaptive local basis set for Kohn-Sham density functional theory in a discontinuous Galerkin framework II: Force, vibration, and molecular dynamics calculations

Recently, we have proposed the adaptive local basis set for electronic structure calculations based on Kohn-Sham density functional theory in a pseudopotential framework. The adaptive local basis set is efficient and systematically improvable for total energy calculations. In this paper, we present the calculation of atomic forces, which can be used for a range of applications such as geometry optimization and molecular dynamics simulation. We demonstrate that, under mild assumptions, the computation of atomic forces can scale nearly linearly with the number of atoms in the system using the adaptive local basis set. We quantify the accuracy of the Hellmann-Feynman forces for a range of physical systems, benchmarked against converged planewave calculations, and find that the adaptive local basis set is efficient for both force and energy calculations, requiring at most a few tens of basis functions per atom to attain accuracy required in practice. Since the adaptive local basis set has implicit dependence on atomic positions, Pulay forces are in general nonzero. However, we find that the Pulay force is numerically small and systematically decreasing with increasing basis completeness, so that the Hellmann-Feynman force is sufficient for basis sizes of a few tens of basis functions per atom. We verify the accuracy of the computed forces in static calculations of quasi-1D and 3D disordered Si systems, vibration calculation of a quasi-1D Si system, and molecular dynamics calculations of H$_2$ and liquid Al-Si alloy systems, where we find excellent agreement with independent benchmark results in the literature.

preprint2015arXiv

Characterization of symmetry-protected topological phases in polymerized models by trajectories of Majorana stars

By using Majorana's stellar representation, we give a clear geometrical interpretation of the topological phases of inversion-symmetric polymerized models by mapping the Bloch states of multi-band systems to Majorana stars on the Bloch sphere. While trajectories of Majorana stars of a filled Bloch band exhibit quite different geometrical structures for topologically trivial and nontrivial phases, we further demonstrate that these structures are uniquely determined by distributions of Majorana stars of two high-symmetrical momentum states, which have different parities for topologically different states.

preprint2015arXiv

DGDFT: A Massively Parallel Method for Large Scale Density Functional Theory Calculations

We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) [J. Comput. Phys. 2012, 231, 2140] method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field (SCF) iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. It minimizes the number of degrees of freedom required to represent the solution to the Kohn-Sham problem for a desired level of accuracy. In particular, DGDFT can reach the planewave accuracy with far fewer degrees of freedom. By using the pole expansion and selected inversion (PEXSI) technique to compute electron density, energy and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of two-dimensional (2D) phosphorene systems with 3,500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

preprint2015arXiv

Edge reconstruction in armchair phosphorene nanoribbons revealed by discontinuous Galerkin density functional theory

With the help of our recently developed massively parallel DGDFT (Discontinuous Galerkin Density Functional Theory) methodology, we perform large-scale Kohn-Sham density functional theory calculations on phosphorene nanoribbons with armchair edges (ACPNRs) containing a few thousand to ten thousand atoms. The use of DGDFT allows us to systematically achieve conventional plane wave basis set type of accuracy, but with a much smaller number (about 15) of adaptive local basis (ALB) functions per atom for this system. The relatively small number of degrees of freedom required to represent the Kohn-Sham Hamiltonian, together with the use of the pole expansion and selected inversion (PEXSI) technique that circumvents the need to diagonalize the Hamiltonian, results in a highly efficient and scalable computational scheme for analyzing the electronic structures of ACPNRs as well as their dynamics. The total wall clock time for calculating the electronic structures of large-scale ACPNRs containing 1080-10800 atoms is only 10-25 s per self-consistent field (SCF) iteration, with accuracy fully comparable to that obtained from conventional planewave DFT calculations. For the ACPNR system, we observe that the DGDFT methodology can scale to 5,000-50,000 processors. We use DGDFT based ab-initio molecular dynamics (AIMD) calculations to study the thermodynamic stability of ACPNRs. Our calculations reveal that a $2 \times 1$ edge reconstruction appears in ACPNRs at room temperature.

preprint2015arXiv

Enhancing the scalability and load balancing of the parallel selected inversion algorithm via tree-based asynchronous communication

We develop a method for improving the parallel scalability of the recently developed parallel selected inversion algorithm [Jacquelin, Lin and Yang 2014], named PSelInv, on massively parallel distributed memory machines. In the PSelInv method, we compute selected elements of the inverse of a sparse matrix $A$ that can be decomposed as $A = LU$, where $L$ is lower triangular and $U$ is upper triangular. Updating these selected elements of $A^{-1}$ requires restricted collective communications among a subset of processors within each column or row communication group created by a block cyclic distribution of $L$ and $U$. We describe how this type of restricted collective communication can be implemented by using asynchronous point-to-point MPI communication functions combined with a binary tree based data propagation scheme. Because multiple restricted collective communications may take place at the same time in the parallel selected inversion algorithm, we need to use a heuristic to prevent processors participating in multiple collective communications from receiving too many messages. This heuristic allows us to reduce communication load imbalance and improve the overall scalability of the selected inversion algorithm. For instance, when 6,400 processors are used, we observe over 5x speedup for test matrices. It also mitigates the performance variability introduced by an inhomogeneous network topology.

preprint2015arXiv

Exact Hybrid Covariance Thresholding for Joint Graphical Lasso

This paper considers the problem of estimating multiple related Gaussian graphical models from a $p$-dimensional dataset consisting of different classes. Our work is based upon the formulation of this problem as group graphical lasso. This paper proposes a novel hybrid covariance thresholding algorithm that can effectively identify zero entries in the precision matrices and split a large joint graphical lasso problem into small subproblems. Our hybrid covariance thresholding method is superior to existing uniform thresholding methods in that our method can split the precision matrix of each individual class using different partition schemes and thus split group graphical lasso into much smaller subproblems, each of which can be solved very fast. In addition, this paper establishes necessary and sufficient conditions for our hybrid covariance thresholding algorithm. The superior performance of our thresholding method is thoroughly analyzed and illustrated by a few experiments on simulated data and real gene expression data.

preprint2015arXiv

Feature selection for classification with class-separability strategy and data envelopment analysis

In this paper, a novel feature selection method is presented, which is based on Class-Separability (CS) strategy and Data Envelopment Analysis (DEA). To better capture the relationship between features and the class, class labels are separated into individual variables and relevance and redundancy are explicitly handled on each class label. Super-efficiency DEA is employed to evaluate and rank features via their conditional dependence scores on all class labels, and the feature with maximum super-efficiency score is then added to the conditioning set for conditional dependence estimation in the next iteration, in such a way as to iteratively select features and get the final selected features. Finally, experiments are conducted to evaluate the effectiveness of the proposed method compared with four state-of-the-art methods from the viewpoint of classification accuracy. Empirical results verify the feasibility and the superiority of the proposed feature selection method.

preprint2015arXiv

Generalized preconditioned locally harmonic residual method for non-Hermitian eigenproblems

We introduce the Generalized Preconditioned Locally Harmonic Residual (GPLHR) method for solving standard and generalized non-Hermitian eigenproblems. The method is particularly useful for computing a subset of eigenvalues, and their eigen- or Schur vectors, closest to a given shift. The proposed method is based on block iterations and can take advantage of a preconditioner if it is available. It does not need to perform exact shift-and-invert transformation. Standard and generalized eigenproblems are handled in a unified framework. Our numerical experiments demonstrate that GPLHR is generally more robust and efficient than existing methods, especially if the available memory is limited.

preprint2015arXiv

Large-scale ab initio simulations based on systematically improvable atomic basis

We present a first-principles computer code package (ABACUS) that is based on density functional theory and numerical atomic basis sets. Theoretical foundations and numerical techniques used in the code are described, with focus on the accuracy and transferability of the hierarchical atomic basis sets as generated using a scheme proposed by Chen, Guo and He [J. Phys.: Condens. Matter 22, 445501 (2010)]. Benchmark results are presented for a variety of systems including molecules, solids, surfaces, and defects. All results show that the ABACUS package with its associated atomic basis sets is an efficient and reliable tool for simulating both small and large-scale materials.

preprint2015arXiv

PSelInv -- A Distributed Memory Parallel Algorithm for Selected Inversion : the Symmetric Case

We describe an efficient parallel implementation of the selected inversion algorithm for distributed memory computer systems, which we call PSelInv. The PSelInv method computes selected elements of the inverse of a general sparse matrix $A$ that can be decomposed as $A = LU$, where $L$ is lower triangular and $U$ is upper triangular. The implementation described in this paper focuses on the case of sparse symmetric matrices. It contains an interface that is compatible with the distributed memory parallel sparse direct factorization SuperLU_DIST. However, the underlying data structure and design of PSelInv allow it to be easily combined with other factorization routines such as PARDISO. We discuss general parallelization strategies such as data and task distribution schemes. In particular, we describe how to exploit the concurrency exposed by the elimination tree associated with the $LU$ factorization of $A$. We demonstrate the efficiency and accuracy of PSelInv by presenting a number of numerical experiments. In particular, we show that PSelInv can run efficiently on more than 4,000 cores for a modestly sized matrix. We also demonstrate how PSelInv can be used to accelerate large-scale electronic structure calculations.

preprint2015arXiv

Structure Preserving Parallel Algorithms for Solving the Bethe-Salpeter Eigenvalue Problem

The Bethe-Salpeter eigenvalue problem is a dense structured eigenvalue problem arising from the discretized Bethe-Salpeter equation in the context of computing exciton energies and states. A computational challenge is that at least half of the eigenvalues and the associated eigenvectors are desired in practice. We establish the equivalence between Bethe-Salpeter eigenvalue problems and real Hamiltonian eigenvalue problems. Based on theoretical analysis, structure preserving algorithms for a class of Bethe-Salpeter eigenvalue problems are proposed. We also show that for this class of problems all eigenvalues obtained from the Tamm-Dancoff approximation are overestimated. In order to solve large scale problems of practical interest, we discuss parallel implementations of our algorithms targeting distributed memory systems. Several numerical examples are presented to demonstrate the efficiency and accuracy of our algorithms.
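
The overestimation claim can be checked numerically on a small example. The abstract does not spell out the matrix structure, so the sketch below assumes the commonly used real form $H = \begin{pmatrix} A & B \\ -B & -A \end{pmatrix}$ with symmetric $A$ and $B$ and with $A + B$ and $A - B$ positive definite; under that assumption the Tamm-Dancoff approximation drops $B$ and keeps only the eigenvalues of $A$. The helper names and the random test matrices are hypothetical, and the Cholesky-based reduction is one standard route for the real case, not necessarily the algorithm of the paper.

```python
import numpy as np

def bse_matrices(n, coupling=0.3, seed=0):
    """Random real test pair (A, B) with A +/- B positive definite."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)        # symmetric, eigenvalues >= n
    S = rng.standard_normal((n, n))
    B = coupling * (S + S.T) / 2.0     # symmetric, small next to A
    return A, B

def bse_positive_eigenvalues(A, B):
    """Positive eigenvalues of [[A, B], [-B, -A]] in ascending order.

    Uses omega^2 = eig(L^T (A + B) L) with A - B = L L^T, which keeps
    the computation symmetric. Cholesky raises if A - B is not SPD.
    """
    L = np.linalg.cholesky(A - B)
    M = L.T @ (A + B) @ L
    return np.sqrt(np.linalg.eigvalsh(M))

n = 6
A, B = bse_matrices(n)
omega = bse_positive_eigenvalues(A, B)   # full problem
tda = np.linalg.eigvalsh(A)              # Tamm-Dancoff approximation
```

Comparing `tda` and `omega` entry by entry shows each Tamm-Dancoff eigenvalue lying at or above its counterpart from the full problem for this test pair.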

preprint2015arXiv

Topological invariants for phase transition points of one-dimensional $\mathbb{Z}_2$ topological systems

We study topological properties of phase transition points of two topologically non-trivial $\mathbb{Z}_2$ classes (D and DIII) in one dimension by assigning a Berry phase defined on closed circles around the gap closing points in the parameter space of momentum and a transition driving parameter. While the topological property of the $\mathbb{Z}_2$ system is generally characterized by a $\mathbb{Z}_2$ topological invariant, we identify that it has a correspondence to the quantized Berry phase protected by the particle-hole symmetry, and then give a proper definition of Berry phase to the phase transition point. By applying our scheme to some specific models of class D and DIII, we demonstrate that the topological phase transition can be well characterized by the Berry phase of the transition point, which reflects the change of Berry phases of topologically different phases across the phase transition point.

preprint2015arXiv

Winding numbers of phase transition points for one-dimensional topological systems

We study topological properties of phase transition points of one-dimensional topological quantum phase transitions by assigning winding numbers defined on closed circles around the gap closing points in the parameter space of momentum and a transition driving parameter, which overcomes the problem of ill definition of winding numbers on the transition points. By applying our scheme to the extended Kitaev model and extended Su-Schrieffer-Heeger model, we demonstrate that the topological phase transition can be well characterized by winding numbers of transition points, which reflect the change of the winding number of topologically different phases across the phase transition points.
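The idea of assigning a winding number to a closed circle around a gap closing point can be illustrated with a minimal sketch. It assumes the standard (not the extended) Su-Schrieffer-Heeger model with off-diagonal Bloch element $h(k) = t_1 + t_2 e^{ik}$, whose gap closes at $k = \pi$, $t_1 = t_2$. The function names `ssh_h` and `winding_on_circle`, the circle radius and the sample count are illustrative choices, not taken from the paper.

```python
import cmath
import math


def ssh_h(k, t1, t2=1.0):
    """Off-diagonal element h(k) = t1 + t2*exp(ik) of the standard SSH model."""
    return t1 + t2 * cmath.exp(1j * k)


def winding_on_circle(k0, t0, radius, t2=1.0, samples=2000):
    """Winding number of h around the origin along a circle in the (k, t1) plane.

    The circle is centred at (k0, t0) and traversed once counter-clockwise.
    Assumes h does not vanish anywhere on the circle itself.
    """
    total = 0.0
    prev = None
    for n in range(samples + 1):
        phi = 2.0 * math.pi * n / samples
        k = k0 + radius * math.cos(phi)
        t1 = t0 + radius * math.sin(phi)
        h = ssh_h(k, t1, t2)
        if prev is not None:
            # Phase increment between consecutive samples, in (-pi, pi].
            total += cmath.phase(h / prev)
        prev = h
    return round(total / (2.0 * math.pi))


# Circle enclosing the gap closing point (k = pi, t1 = t2 = 1).
around_gap_closing = winding_on_circle(math.pi, 1.0, 0.3)
# Circle of the same radius that does not enclose any gap closing point.
away_from_gap_closing = winding_on_circle(math.pi, 2.0, 0.3)
```

In this sketch the accumulated phase is nonzero only when the circle encloses the gap closing point, which is the property that lets a winding number be attached to the transition point even though the usual Brillouin-zone winding number is ill defined there.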

preprint2014arXiv

Approximating spectral densities of large matrices

In physics, it is sometimes desirable to compute the so-called \emph{Density Of States} (DOS), also known as the \emph{spectral density}, of a real symmetric matrix $A$. The spectral density can be viewed as a probability density distribution that measures the likelihood of finding eigenvalues near some point on the real line. The most straightforward way to obtain this density is to compute all eigenvalues of $A$. But this approach is generally costly and wasteful, especially for matrices of large dimension. There exist alternative methods that allow us to estimate the spectral density function at much lower cost. The major computational cost of these methods is in multiplying $A$ with a number of vectors, which makes them appealing for large-scale problems where products of the matrix $A$ with arbitrary vectors are relatively inexpensive. This paper defines the problem of estimating the spectral density carefully, and discusses how to measure the accuracy of an approximate spectral density. It then surveys a few known methods for estimating the spectral density, and proposes some new variations of existing methods. All methods are discussed from a numerical linear algebra point of view.
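As a rough illustration of a method whose cost is dominated by matrix-vector products, the sketch below estimates Chebyshev moments $\mu_k = \mathrm{trace}(T_k(A))/n$ with random sign vectors, the ingredient behind kernel-polynomial-type estimates of the spectral density. It is only one possible approach and is not claimed to be the specific variant proposed in the paper. The names `matvec`, `dot` and `chebyshev_moments` are hypothetical, and the code assumes the spectrum of $A$ has already been scaled into $[-1, 1]$.

```python
import random


def matvec(a, v):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


def chebyshev_moments(a, num_moments, num_vectors, seed=0):
    """Estimate mu_k = trace(T_k(A)) / n for k = 0, ..., num_moments - 1.

    Assumes A is real symmetric with all eigenvalues already in [-1, 1].
    Each random vector costs num_moments - 1 matrix-vector products.
    """
    rng = random.Random(seed)
    n = len(a)
    moments = [0.0] * num_moments
    for _ in range(num_vectors):
        # Random sign (Rademacher) probe vector.
        v = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        t_prev = v                     # T_0(A) v
        moments[0] += dot(v, t_prev)
        if num_moments == 1:
            continue
        t_curr = matvec(a, v)          # T_1(A) v
        moments[1] += dot(v, t_curr)
        for k in range(2, num_moments):
            # Three-term recurrence: T_k(A) v = 2 A T_{k-1}(A) v - T_{k-2}(A) v.
            av = matvec(a, t_curr)
            t_next = [2.0 * x - y for x, y in zip(av, t_prev)]
            moments[k] += dot(v, t_next)
            t_prev, t_curr = t_curr, t_next
    return [m / (n * num_vectors) for m in moments]


# Small symmetric example whose eigenvalues lie well inside [-1, 1].
example = [[0.2, 0.1, 0.0],
           [0.1, -0.3, 0.2],
           [0.0, 0.2, 0.4]]
mu = chebyshev_moments(example, num_moments=4, num_vectors=50)
```

The matrix is touched only through `matvec`, so a sparse or matrix-free product could be substituted without changing the rest of the sketch. Turning the moments into a density additionally requires a damped Chebyshev expansion, which is omitted here.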

preprint2014arXiv

Electronic Structure of Large-Scale Graphene Nanoflakes

With the help of the recently developed SIESTA-PEXSI method [J. Phys.: Condens. Matter \textbf{26}, 305503 (2014)], we perform Kohn-Sham density functional theory (DFT) calculations to study the stability and electronic structure of hexagonal graphene nanoflakes (GNFs) with up to 11,700 atoms. We find that the electronic properties of GNFs, including their cohesive energy, HOMO-LUMO energy gap, edge states and aromaticity, depend sensitively on the type of edges (ACGNFs and ZZGNFs), size and the number of electrons. We observe that, due to the edge-induced strain effect in ACGNFs, large-scale ACGNFs' cohesive energy decreases as their size increases. This trend does not hold for ZZGNFs due to the presence of many edge states in ZZGNFs. We find that the energy gaps $E_g$ of GNFs all decay with respect to $1/L$, where $L$ is the size of the GNF, in a linear fashion. But as their size increases, ZZGNFs exhibit more localized edge states. We believe the presence of these states makes their gap decrease more rapidly. In particular, when $L$ is larger than 6.40 nm, we find that ZZGNFs exhibit metallic characteristics. Furthermore, we find that the aromatic structures of GNFs appear to depend only on whether the system has $4N$ or $4N+2$ electrons, where $N$ is an integer.

preprint2014arXiv

Low-rank SIFT: An Affine Invariant Feature for Place Recognition

In this paper, we present a novel affine-invariant feature based on SIFT, leveraging the regular appearance of man-made objects. The feature achieves full affine invariance without needing to simulate over the affine parameter space. Low-rank SIFT, as we name the feature, is based on our observation that local tilts, which are caused by changes of camera axis orientation, can be normalized by converting local patches to standard low-rank forms. Rotation, translation and scaling invariance can be achieved in ways similar to SIFT. As an extension of SIFT, our method adds a prior to solve the ill-posed affine parameter estimation problem and normalizes them directly, and is applicable to objects with regular structures. Furthermore, owing to recent breakthroughs in convex optimization, such parameters can be computed efficiently. We demonstrate its effectiveness in place recognition as our major application. As extra contributions, we also describe our pipeline for constructing a geotagged building database from the ground up, as well as an efficient scheme for automatic feature selection.

preprint2014arXiv

Numerical integration for ab initio many-electron self energy calculations within the GW approximation

We present a numerical integration scheme for evaluating the convolution of a Green's function with a screened Coulomb potential on the real axis in the GW approximation of the self energy. Our scheme takes the zero broadening limit in the Green's function first, replaces the numerator of the integrand with a piecewise polynomial approximation, and performs principal value integration on subintervals analytically. We give the error bound of our numerical integration scheme and show by numerical examples that it is more reliable and accurate than standard quadrature rules such as the composite trapezoidal rule. We also discuss the benefit of using different self energy expressions to perform the numerical convolution at different frequencies.
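The general idea of replacing the numerator by a piecewise polynomial and integrating each subinterval analytically can be sketched for the model integral $\mathrm{PV}\!\int_a^b f(x)/(x-\omega)\,dx$. The sketch assumes a piecewise-linear interpolant on a uniform grid, which is the simplest case and not necessarily the polynomial degree or grid used in the paper. The function name `pv_integral_piecewise_linear` and all parameters are hypothetical.

```python
import math


def pv_integral_piecewise_linear(f, a, b, omega, intervals):
    """Approximate the principal value of int_a^b f(x) / (x - omega) dx.

    f is replaced by its piecewise-linear interpolant on a uniform grid and
    every subinterval is integrated analytically. Assumes a < omega < b and
    that omega does not coincide with a grid node.
    """
    h = (b - a) / intervals
    total = 0.0
    for j in range(intervals):
        x0 = a + j * h
        x1 = a + (j + 1) * h
        if x0 == omega or x1 == omega:
            raise ValueError("omega must not coincide with a grid node")
        f0, f1 = f(x0), f(x1)
        slope = (f1 - f0) / (x1 - x0)
        # On [x0, x1] the interpolant is p(x) = p(omega) + slope * (x - omega),
        # so p(x) / (x - omega) = slope + p(omega) / (x - omega).
        p_at_omega = f0 + slope * (omega - x0)
        total += slope * (x1 - x0)
        # PV integral of 1 / (x - omega) over [x0, x1], valid whether or not
        # omega lies inside the subinterval.
        total += p_at_omega * math.log(abs((x1 - omega) / (x0 - omega)))
    return total


# Example: a smooth numerator with the singularity at omega = 0.3.
value = pv_integral_piecewise_linear(math.cos, -1.0, 1.0, 0.3, 64)
```

Because the singular factor is integrated exactly, the only error in this sketch comes from the interpolation of the numerator, whereas a quadrature rule applied directly to the integrand has to sample it near the pole.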

preprint2014arXiv

SIESTA-PEXSI: Massively parallel method for efficient and accurate \textit{ab initio} materials simulation without matrix diagonalization

We describe a scheme for efficient large-scale electronic-structure calculations based on the combination of the pole expansion and selected inversion (PEXSI) technique with the SIESTA method, which uses numerical atomic orbitals within the Kohn-Sham density functional theory (KSDFT) framework. The PEXSI technique can efficiently utilize the sparsity pattern of the Hamiltonian and overlap matrices generated in SIESTA, and for large systems has a much lower computational complexity than that associated with the matrix diagonalization procedure. The PEXSI technique can be used to evaluate the electron density, free energy, atomic forces, density of states and local density of states without computing any eigenvalue or eigenvector of the Kohn-Sham Hamiltonian. It can achieve accuracy fully comparable to that obtained from a matrix diagonalization procedure for general systems, including metallic systems at low temperature. The PEXSI method is also highly scalable. With the recently developed massively parallel PEXSI technique, we can make efficient use of more than $10,000$ processors on high performance machines. We demonstrate the performance and accuracy of the SIESTA-PEXSI method using several examples of large scale electronic structure calculations, including 1D, 2D and bulk problems with insulating, semi-metallic, and metallic character.

preprint2013arXiv

Accelerating Atomic Orbital-based Electronic Structure Calculation via Pole Expansion and Selected Inversion

We describe how to apply the recently developed pole expansion and selected inversion (PEXSI) technique to Kohn-Sham density functional theory (DFT) electronic structure calculations that are based on atomic orbital discretization. We give analytic expressions for evaluating the charge density, the total energy, the Helmholtz free energy and the atomic forces (including both the Hellmann-Feynman force and the Pulay force) without using the eigenvalues and eigenvectors of the Kohn-Sham Hamiltonian. We also show how to update the chemical potential without using Kohn-Sham eigenvalues. The advantage of using PEXSI is that it has a much lower computational complexity than that associated with the matrix diagonalization procedure. We demonstrate the performance gain by comparing the timing of PEXSI with that of diagonalization on insulating and metallic nanotubes. For these quasi-1D systems, the complexity of PEXSI is linear with respect to the number of atoms. This linear scaling can be observed in our computational experiments when the number of atoms in a nanotube is larger than a few hundred. Both the wall clock time and the memory requirement of PEXSI are modest. This even makes it possible to perform Kohn-Sham DFT calculations for 10,000-atom nanotubes with a sequential implementation of the selected inversion algorithm. We also perform an accurate geometry optimization calculation on a truncated (8,0) boron-nitride nanotube system containing 1024 atoms. Numerical results indicate that the use of PEXSI does not lead to loss of accuracy required in a practical DFT calculation.

preprint2013arXiv

Augmented projections for ptychographic imaging

Ptychography is a popular technique to achieve diffraction limited resolution images of a two or three dimensional sample using high frame rate detectors. We introduce a relaxation of common projection algorithms to account for instabilities given by intensity and background fluctuations, position errors, or poor calibration using multiplexing illumination. This relaxation introduces an additional phasing optimization at every step that enhances the convergence rate of common projection algorithms. Numerical tests exhibit the exact recovery of the object and the noise when there is high redundancy in the data.

preprint2013arXiv

Elliptic preconditioner for accelerating the self consistent field iteration in Kohn-Sham density functional theory

We discuss techniques for accelerating the self consistent field (SCF) iteration for solving the Kohn-Sham equations. These techniques are all based on constructing approximations to the inverse of the Jacobian associated with a fixed point map satisfied by the total potential. They can be viewed as preconditioners for a fixed point iteration. We point out different requirements for constructing preconditioners for insulating and metallic systems respectively, and discuss how to construct preconditioners to keep the convergence rate of the fixed point iteration independent of the size of the atomistic system. We propose a new preconditioner that can treat insulating and metallic systems in a unified way. The new preconditioner, which we call an elliptic preconditioner, is constructed by solving an elliptic partial differential equation. The elliptic preconditioner is shown to be more effective in accelerating the convergence of a fixed point iteration than the existing approaches for large inhomogeneous systems at low temperature.

preprint2013arXiv

Vector cross product in n-dimensional vector space

The definition of vector cross product (VCP) introduced by Eckmann only exists in the three- and the seven-dimensional vector space. In this paper, according to the orthogonal completeness, magnitude of basis vector cross product and all kinds of combinations of basis vector $\hat{e}_i$, the generalized definition of VCP in the odd n-dimensional vector space is given by introducing a cross term $X_{AB}$. In addition, the definition is validated by reducing the generalized definition to the fundamental three- and seven-dimensional vector space.

preprint2012arXiv

A Combinatorial Perspective of the Protein Inference Problem

In a shotgun proteomics experiment, proteins are the most biologically meaningful output. The success of proteomics studies depends on the ability to accurately and efficiently identify proteins. Many methods have been proposed to facilitate the identification of proteins from the results of peptide identification. However, the relationship between protein identification and peptide identification has not been thoroughly explained before. In this paper, we take a combinatorial perspective on the protein inference problem. We employ combinatorial mathematics to calculate the conditional protein probabilities (protein probability means the probability that a protein is correctly identified) under three assumptions, which lead to a lower bound, an upper bound and an empirical estimation of protein probabilities, respectively. The combinatorial perspective enables us to obtain a closed-form formulation for protein inference. Based on our model, we study the impact of unique peptides and degenerate peptides on protein probabilities. Here, degenerate peptides are peptides shared by at least two proteins. We also study the relationship of our model with other methods such as ProteinProphet. A probability confidence interval can be calculated and used together with probability to filter the protein identification result. Our method achieves competitive results with ProteinProphet in a more efficient manner in the experiment based on two datasets of standard protein mixtures and two datasets of real samples. We name our program ProteinInfer. Its Java source code is available at http://bioinformatics.ust.hk/proteininfer

preprint2012arXiv

Running PeptideProphet Separately on Replicates Improves Peptide Identification Results

Limited spectrum coverage is a problem in shotgun proteomics. Replicates are generated to improve the spectrum coverage. When integrating peptide identification results obtained from replicates, the state-of-the-art algorithm PeptideProphet combines Peptide-Spectrum Matches (PSMs) before building the statistical model to calculate peptide probabilities. In this paper, we find the connection between merging results of replicates and Bagging, which is a standard routine to improve the power of statistical methods. Following Bagging's philosophy, we propose to run PeptideProphet separately on each replicate and combine the outputs to obtain the final peptide probabilities. In our experiments, we show that the proposed routine can improve PeptideProphet consistently on a standard protein dataset, a Human dataset and a Yeast dataset.

preprint2009arXiv

{\it Ab initio} nuclear structure - the large sparse matrix eigenvalue problem

The structure and reactions of light nuclei represent fundamental and formidable challenges for microscopic theory based on realistic strong interaction potentials. Several {\it ab initio} methods have now emerged that provide nearly exact solutions for some nuclear properties. The {\it ab initio} no core shell model (NCSM) and the no core full configuration (NCFC) method frame this quantum many-particle problem as a large sparse matrix eigenvalue problem where one evaluates the Hamiltonian matrix in a basis space consisting of many-fermion Slater determinants and then solves for a set of the lowest eigenvalues and their associated eigenvectors. The resulting eigenvectors are employed to evaluate a set of experimental quantities to test the underlying potential. For fundamental problems of interest, the matrix dimension often exceeds $10^{10}$ and the number of nonzero matrix elements may saturate available storage on present-day leadership class facilities. We survey recent results and advances in solving this large sparse matrix eigenvalue problem. We also outline the challenges that lie ahead for achieving further breakthroughs in fundamental nuclear theory using these {\it ab initio} approaches.

87 published item(s)

Fusing in 3D: Free-Viewpoint Fusion Rendering with a 3D Infrared-Visible Scene Representation

LongDPM: Overlap-Aware 4D Reconstruction from Long Monocular Videos

Modality-Decoupled RGB-Thermal Object Detector via Query Fusion

Multi-proposal Collaboration and Multi-task Training for Weakly-supervised Video Moment Retrieval

Multimodal Signal Processing For Thermo-Visible-Lidar Fusion In Real-time 3D Semantic Mapping

Report of the 5th PVUW Challenge: Towards More Diverse Modalities in Pixel-Level Understanding

SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

Thermo-LIO: A Novel Multi-Sensor Integrated System for Structural Health Monitoring

The Dust Attenuation Scaling Relation of Star-Forming Galaxies in the EAGLE Simulations

Undecidability of tiling the plane with a fixed number of Wang bars

SAS: Self-Augmentation Strategy for Language Model Pre-training

A Robust Multimodal Remote Sensing Image Registration Method and System Using Steerable Filters with First- and Second-order Gradients

An extension on neighbor sum distinguishing total coloring of graphs

DAS-PINNs: A deep adaptive sampling method for solving high-dimensional partial differential equations

GUIM -- General User and Item Embedding with Mixture of Representation in E-commerce

KSSOLV 2.0: An efficient MATLAB toolbox for solving the Kohn-Sham equations with plane-wave basis set

Randomized Algorithms for Scientific Computing (RASC)

Research on Multi-Objective Planning of Electric Vehicle Charging Stations Considering the Condition of Urban Traffic Network

Sim2Real Object-Centric Keypoint Detection and Description

Topological Authentication Technique In Topologically Asymmetric Cryptosystem

Using dynamic mode decomposition to predict the dynamics of a two-time non-equilibrium Green's function

WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

A greedy algorithm for computing eigenvalues of a symmetric matrix

A strange metal in a bosonic system

Accelerating quantum many-body configuration interaction with directives

CoRe: An Efficient Coarse-refined Training Framework for BERT

Symmetric-Constrained Irregular Structure Inpainting for Brain MRI Registration with Tumor Pathology

U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition

VoxelHop: Successive Subspace Learning for ALS Disease Classification Using Structural MRI

A Multilevel Spectral Indicator Method for Eigenvalues of Large Non-Hermitian Matrices

A Shift Selection Strategy for Parallel Shift-Invert Spectrum Slicing in Symmetric Self-Consistent Eigenvalue Computation

Constrained R-CNN: A general image manipulation detection model

Context-Integrated and Feature-Refined Network for Lightweight Object Parsing

ELSI -- An Open Infrastructure for Electronic Structure Solvers

Fast and Robust Registration of Aerial Images and LiDAR data Based on Structrual Features and 3D Phase Correlation

Multi-time scale identification for multi-energy system

On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters

One-Shot Domain Adaptation For Face Generation

Optical System Design of Bionic Compound Eye with Broad Field of View

Parallel energy-stable solver for a coupled Allen-Cahn and Cahn-Hilliard system

Parallel multilevel restricted Schwarz preconditioners for implicit simulation of subsurface flows with Peng-Robinson equation of state

PFNN: A Penalty-Free Neural Network Method for Solving a Class of Second-Order Boundary-Value Problems on Complex Geometries

Quantum Fourier Transform Revisited

Quench dynamics of Hopf insulators

Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations

Solving the k-sparse Eigenvalue Problem with Reinforcement Learning

Testing Ghasemi-Nodehi-Bambi metric parameters with quasi-periodic oscillations

Influence of point defects on the electronic and topological properties of monolayer WTe$_2$

Optimized pair natural orbitals for the coupled cluster methods

A hybrid approach for cooperative output regulation with sampled compensator

A Left-Looking Selected Inversion Algorithm and Task Parallelism on Shared Memory Systems

An efficient basis set representation for calculating electrons in molecules

BSEPACK User's Guide

Chebyshev polynomial filtered subspace iteration in the Discontinuous Galerkin method for large-scale electronic structure calculations

Preconditioned iterative methods for eigenvalue counts

Task-space coordinated tracking of multiple heterogeneous manipulators via controller-estimator approaches

Time-varying formation tracking of multiple manipulators via distributed finite-time control

Versatile electronic properties and exotic edge states in single-layer tetragonal silicon carbides

A Projected Preconditioned Conjugate Gradient Algorithm for Computing Many Extreme Eigenpairs of a Hermitian Matrix

A Thick-Restart Lanczos algorithm with polynomial filtering for Hermitian eigenvalue problems

Ab Initio No Core Shell Model - Recent Results and Further Prospects

Adaptive local basis set for Kohn-Sham density functional theory in a discontinuous Galerkin framework II: Force, vibration, and molecular dynamics calculations

Characterization of symmetry-protected topological phases in polymerized models by trajectories of Majorana stars

DGDFT: A Massively Parallel Method for Large Scale Density Functional Theory Calculations

Edge reconstruction in armchair phosphorene nanoribbons revealed by discontinuous Galerkin density functional theory

Enhancing the scalability and load balancing of the parallel selected inversion algorithm via tree-based asynchronous communication

Exact Hybrid Covariance Thresholding for Joint Graphical Lasso

Feature selection for classification with class-separability strategy and data envelopment analysis

Generalized preconditioned locally harmonic residual method for non-Hermitian eigenproblems

Large-scale ab initio simulations based on systematically improvable atomic basis

PSelInv -- A Distributed Memory Parallel Algorithm for Selected Inversion : the Symmetric Case

Structure Preserving Parallel Algorithms for Solving the Bethe-Salpeter Eigenvalue Problem

Topological invariants for phase transition points of one-dimensional $\mathbb{Z}_2$ topological systems