Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
106 works
0 followers
47 topics
4 close collaborators


Published work

106 published item(s)

preprint, 2026, arXiv

Done, But Not Sure: Disentangling World Completion from Self-Termination in Embodied Agents

Standard embodied evaluations do not independently score whether an agent correctly commits to task completion at episode closure, a capacity we call terminal commitment. Behaviorally distinct failures--never completing the task, completing it but failing to stop, and reporting success without sufficient evidence--collapse into the same benchmark failure. We introduce VIGIL, an evaluation framework that makes terminal commitment independently measurable. Under VIGIL's default protocol, agents observe only egocentric RGB, receive no action-success signals, and must end each episode with a semantic report checked deterministically against hidden world state. This yields two separate scores: world-state completion (W) and benchmark success (B), where B additionally requires a correct terminal report. This decoupling makes four outcome categories distinguishable: missed execution, post-attainment drift, unsupported commitment, and verified success. Across 20 models on 1,000 frozen episodes, systems with comparable W differ by up to 19.7 pp in B: one model converts achieved states into correct reports, while another with near-identical execution drifts past the goal without closing. An action-feedback intervention further tests the separation: execution-oriented signals improve W broadly, yet commitment failures persist in models that do not already ground terminal reports in the achieved state. VIGIL provides a protocol that makes terminal commitment independently visible and scorable.
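As a rough illustration of how two separate scores can distinguish the four outcome categories named in the abstract, the sketch below maps a pair of per-episode flags onto those categories. The flag names (`world_complete`, `reported_done`) and the exact mapping are assumptions made for illustration only; the actual protocol checks a semantic report against hidden world state and may define the categories differently.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    # Hypothetical per-episode flags, not taken from the VIGIL codebase.
    world_complete: bool  # hidden world state satisfies the goal (counts toward W)
    reported_done: bool   # agent ended with a terminal report claiming completion


def categorize(ep: Episode) -> str:
    """Map one episode onto the four outcome categories (assumed mapping)."""
    if ep.world_complete and ep.reported_done:
        return "verified_success"
    if ep.world_complete:
        return "post_attainment_drift"
    if ep.reported_done:
        return "unsupported_commitment"
    return "missed_execution"


def scores(episodes):
    """Return (W, B) as fractions of episodes."""
    n = len(episodes)
    w = sum(ep.world_complete for ep in episodes) / n
    b = sum(categorize(ep) == "verified_success" for ep in episodes) / n
    return w, b


episodes = [
    Episode(True, True),
    Episode(True, False),
    Episode(False, True),
    Episode(False, False),
]
print(Counter(categorize(ep) for ep in episodes))
print(scores(episodes))
```

Under this simplified mapping, B can never exceed W, and the gap between them counts the episodes where the goal state was reached but never committed to.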

preprint, 2026, arXiv

Video Generation with Predictive Latents

The video variational autoencoder (VAE) enables latent video generative modeling by mapping the visual world into compact spatiotemporal latent spaces, improving training efficiency and stability. While existing video VAEs achieve commendable reconstruction quality, continued optimization of reconstruction does not necessarily translate into improved generative performance. How to enhance the diffusability of video latents remains a critical and unresolved challenge. In this work, inspired by principles of predictive world modeling, we investigate the potential of predictive learning to improve video generative modeling. To this end, we introduce a simple and effective predictive reconstruction objective that unifies predictive learning with video reconstruction. Specifically, we randomly discard future frames and encode only partial past observations, while training the decoder to reconstruct the observed frames and predict future ones simultaneously. This design encourages the latent space to encode temporally predictive structures and build a more coherent understanding of video dynamics, thereby improving generation quality. Our model, termed Predictive Video VAE (PV-VAE), achieves superior performance on video generation, with 52% faster convergence and a 34.42 FVD improvement over the Wan2.2 VAE on UCF101. Furthermore, comprehensive analyses demonstrate that PV-VAE not only exhibits favorable scalability, with generative performance improving alongside VAE training, but also yields consistent gains in downstream video understanding, underscoring a latent space that effectively captures temporal coherence and motion priors.

preprint, 2024, arXiv

The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models

In the era of large language models (LLMs), hallucination (i.e., the tendency to generate factually incorrect content) poses a great challenge to the trustworthy and reliable deployment of LLMs in real-world applications. To tackle LLM hallucination, three key questions should be well studied: how to detect hallucinations (detection), why LLMs hallucinate (source), and what can be done to mitigate them (mitigation). To address these challenges, this work presents a systematic empirical study on LLM hallucination, focused on the three aspects of hallucination detection, source and mitigation. Specifically, we construct a new hallucination benchmark, HaluEval 2.0, and design a simple yet effective detection method for LLM hallucination. Furthermore, we zoom into the different training or utilization stages of LLMs and extensively analyze the potential factors that lead to LLM hallucination. Finally, we implement and examine a series of widely used techniques to mitigate hallucinations in LLMs. Our work has led to several important findings for understanding the origin of hallucination and mitigating hallucinations in LLMs. Our code and data can be accessed at https://github.com/RUCAIBox/HaluEval-2.0.

preprint, 2023, arXiv

Flexible Alignment Super-Resolution Network for Multi-Contrast MRI

Magnetic resonance imaging plays an essential role in clinical diagnosis by acquiring the structural information of biological tissue. Recently, many multi-contrast MRI super-resolution networks have achieved good results. However, most studies ignore the impact of an inappropriate foreground scale and patch size in multi-contrast MRI, which probably leads to inappropriate feature alignment. To tackle this problem, we propose the Flexible Alignment Super-Resolution Network (FASR-Net) for multi-contrast MRI super-resolution. The Flexible Alignment module of FASR-Net consists of two modules for feature alignment. (1) The Single-Multi Pyramid Alignment (S-A) module handles the situation where low-resolution (LR) images and reference (Ref) images have different scales. (2) The Multi-Multi Pyramid Alignment (M-A) module handles the situation where LR and Ref images have the same scale. In addition, we propose the Cross-Hierarchical Progressive Fusion (CHPF) module, which aims to fuse the features effectively and further improve image quality. Compared with other state-of-the-art methods, FASR-Net achieves the most competitive results on the FastMRI and IXI datasets. Our code will be available at https://github.com/yimingliu123/FASR-Net.

preprint, 2023, arXiv

GUAP: Graph Universal Attack Through Adversarial Patching

Graph neural networks (GNNs) are a class of effective deep learning models for node classification tasks; yet their predictive capability may be severely compromised under adversarially designed unnoticeable perturbations to the graph structure and/or node data. Most of the current work on graph adversarial attacks aims at lowering the overall prediction accuracy, but we argue that the resulting abnormal model performance may catch attention easily and invite quick counterattack. Moreover, attacks through modification of existing graph data may be hard to conduct if good security protocols are implemented. In this work, we consider an attack that is easier to conduct and harder to notice, carried out by adversarially patching the graph with new nodes and edges. The attack is universal: it targets a single node each time and flips its connection to the same set of patch nodes. The attack is unnoticeable: it does not modify the predictions of nodes other than the target. We develop an algorithm, named GUAP, that achieves a high attack success rate while preserving the prediction accuracy. GUAP is fast to train thanks to a sampling strategy. We demonstrate that a 5% sampling in each epoch yields a 20x speedup in training, with only a slight degradation in attack performance. Additionally, we show that the adversarial patch trained with the graph convolutional network transfers well to other GNNs, such as the graph attention network.

preprint, 2022, arXiv

A Dual-Masked Auto-Encoder for Robust Motion Capture with Spatial-Temporal Skeletal Token Completion

Multi-person motion capture can be challenging due to ambiguities caused by severe occlusion, fast body movement, and complex interactions. Existing frameworks build on 2D pose estimation and triangulate to 3D coordinates by reasoning about the appearance, trajectory, and geometric consistencies among multi-camera observations. However, 2D joint detection is usually incomplete and suffers from wrong identity assignments due to limited observation angles, which leads to noisy 3D triangulation results. To overcome this issue, we propose to explore the short-range autoregressive characteristics of skeletal motion using a transformer. First, we propose an adaptive, identity-aware triangulation module to reconstruct 3D joints and identify the missing joints for each identity. To generate complete 3D skeletal motion, we then propose a Dual-Masked Auto-Encoder (D-MAE) which encodes the joint status with both skeletal-structural and temporal position encoding for trajectory completion. D-MAE's flexible masking and encoding mechanism enables arbitrary skeleton definitions to be conveniently deployed under the same framework. To demonstrate the proposed model's capability in dealing with severe data loss scenarios, we contribute a high-accuracy and challenging motion capture dataset of multi-person interactions with severe occlusion. Evaluations on both benchmark data and our new dataset demonstrate the efficiency of our proposed model, as well as its advantage over other state-of-the-art methods.

preprint, 2022, arXiv

A Survey of Decision Making in Adversarial Games

Game theory has by now found numerous applications in various fields, including economics, industry, jurisprudence, and artificial intelligence, where each player only cares about its own interest in a noncooperative or cooperative manner, but without obvious malice to other players. However, in many practical applications, such as poker, chess, evader pursuing, drug interdiction, coast guard, cyber-security, and national defense, players often have apparently adversarial stances, that is, selfish actions of each player inevitably or intentionally inflict loss or wreak havoc on other players. Along this line, this paper provides a systematic survey on three main game models widely employed in adversarial games, i.e., zero-sum normal-form and extensive-form games, Stackelberg (security) games, zero-sum differential games, from an array of perspectives, including basic knowledge of game models, (approximate) equilibrium concepts, problem classifications, research frontiers, (approximate) optimal strategy seeking techniques, prevailing algorithms, and practical applications. Finally, promising future research directions are also discussed for relevant adversarial games.

preprint, 2022, arXiv

Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining

Improving the training and inference performance of graph neural networks (GNNs) is faced with a challenge uncommon in general neural networks: creating mini-batches requires a lot of computation and data movement due to the exponential growth of multi-hop graph neighborhoods along network layers. Such a unique challenge gives rise to a diverse set of system design choices. We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment, under which we identify major performance bottlenecks hitherto under-explored by developers: mini-batch preparation and transfer. We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler, a shared-memory parallelization strategy, and the pipelining of batch transfer with GPU computation. We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised. Such an observation unifies training and inference, simplifying model implementation. We report comprehensive experimental results with several benchmark data sets and GNN architectures, including a demonstration that, for the ogbn-papers100M data set, our system SALIENT achieves a speedup of 3x over a standard PyTorch-Geometric implementation with a single GPU and a further 8x parallel speedup with 16 GPUs. Therein, training a 3-layer GraphSAGE model with sampling fanout (15, 10, 5) takes 2.0 seconds per epoch and inference with fanout (20, 20, 20) takes 2.4 seconds, attaining test accuracy 64.58%.
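The neighborhood sampling mentioned in the abstract can be sketched in a few lines: at each hop, every node in the current frontier keeps at most `fanout` of its neighbors, which caps the otherwise exponential growth of the multi-hop neighborhood. This is a minimal single-process sketch on a toy graph, not the SALIENT sampler; the function name, the dictionary-based adjacency format, and the tiny fanout `(2, 1)` are all assumptions for illustration (the abstract reports fanouts such as (15, 10, 5)).

```python
import random


def sample_neighborhood(adj, seeds, fanouts, rng):
    """Sample a multi-hop neighborhood, one fanout per hop.

    adj: dict mapping node -> list of neighbor nodes.
    Returns a list of per-hop edge lists [(src, dst), ...], where dst is a
    node from the current frontier and src is one of its sampled neighbors.
    """
    frontier = list(seeds)
    hops = []
    for fanout in fanouts:
        edges = []
        next_frontier = set(frontier)  # keep targets so their own features stay available
        for dst in frontier:
            neighbors = adj.get(dst, [])
            k = min(fanout, len(neighbors))
            for src in rng.sample(neighbors, k):  # sample without replacement
                edges.append((src, dst))
                next_frontier.add(src)
        hops.append(edges)
        frontier = sorted(next_frontier)
    return hops


# Toy graph: a ring of 8 nodes, each connected to its two ring neighbors.
adj = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
rng = random.Random(0)
hops = sample_neighborhood(adj, seeds=[0], fanouts=(2, 1), rng=rng)
for depth, edges in enumerate(hops, start=1):
    print(f"hop {depth}: {edges}")
```

Because each frontier node contributes at most `fanout` edges per hop, the mini-batch size is bounded by the product of the fanouts rather than by the full neighborhood size.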

preprint, 2022, arXiv

Active Learning for Point Cloud Semantic Segmentation via Spatial-Structural Diversity Reasoning

The expensive annotation cost is notoriously known as the main constraint for the development of the point cloud semantic segmentation technique. Active learning methods endeavor to reduce such cost by selecting and labeling only a subset of the point clouds, yet previous attempts ignore the spatial-structural diversity of the selected samples, inducing the model to select clustered candidates with similar shapes in a local area while missing other representative ones in the global environment. In this paper, we propose a new 3D region-based active learning method to tackle this problem. Dubbed SSDR-AL, our method groups the original point clouds into superpoints and incrementally selects the most informative and representative ones for label acquisition. We achieve the selection mechanism via a graph reasoning network that considers both the spatial and structural diversities of superpoints. To deploy SSDR-AL in a more practical scenario, we design a noise-aware iterative labeling strategy to confront the "noisy annotation" problem introduced by the previous "dominant labeling" strategy in superpoints. Extensive experiments on two point cloud benchmarks demonstrate the effectiveness of SSDR-AL in the semantic segmentation task. Particularly, SSDR-AL significantly outperforms the baseline method and reduces the annotation cost by up to 63.0% and 24.0% when achieving 90% performance of fully supervised learning, respectively.

preprint, 2022, arXiv

Adaptive Random Fourier Features Kernel LMS

We propose the adaptive random Fourier features Gaussian kernel LMS (ARFF-GKLMS). Like most kernel adaptive filters based on stochastic gradient descent, this algorithm uses a preset number of random Fourier features to save computation cost. However, as an extra degree of flexibility, it can adapt the inherent kernel bandwidth in the random Fourier features in an online manner. This adaptation mechanism alleviates the problem of selecting the kernel bandwidth beforehand and improves tracking in non-stationary circumstances. Simulation results confirm that the proposed algorithm achieves a performance improvement in terms of convergence rate, steady-state error, and tracking ability over other kernel adaptive filters with a preset kernel bandwidth.
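For context, the baseline that the abstract builds on, a kernel LMS filter run on random Fourier features with a preset bandwidth, can be sketched as follows. This is not the ARFF-GKLMS algorithm: the online bandwidth adaptation is omitted because the abstract does not specify its update rule. The function names, the toy regression task, and the parameter values are assumptions chosen for illustration.

```python
import math
import random


def make_rff(dim, num_features, bandwidth, rng):
    """Draw random Fourier features approximating the Gaussian kernel
    exp(-||x - y||^2 / (2 * bandwidth^2))."""
    omegas = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
              for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(omegas, phases)]

    return features


def rff_klms(samples, features, num_features, step_size):
    """Run LMS on the random-feature vector; returns weights and squared errors."""
    weights = [0.0] * num_features
    sq_errors = []
    for x, d in samples:
        z = features(x)
        err = d - sum(w * z_i for w, z_i in zip(weights, z))  # a priori error
        weights = [w + step_size * err * z_i for w, z_i in zip(weights, z)]
        sq_errors.append(err * err)
    return weights, sq_errors


rng = random.Random(0)
# Toy nonlinear regression task (assumed): d = sin(3x) plus small noise.
samples = []
for _ in range(3000):
    x = rng.uniform(-1.0, 1.0)
    samples.append(([x], math.sin(3.0 * x) + 0.01 * rng.gauss(0.0, 1.0)))

features = make_rff(dim=1, num_features=50, bandwidth=0.5, rng=rng)
_, sq_errors = rff_klms(samples, features, num_features=50, step_size=0.5)
print("mean squared error, first 100 samples:", sum(sq_errors[:100]) / 100)
print("mean squared error, last 100 samples:", sum(sq_errors[-100:]) / 100)
```

The bandwidth enters only through the spread of the sampled frequencies in `make_rff`, which is why a poorly chosen preset value affects the whole filter and why adapting it online is attractive.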

preprint, 2022, arXiv

An Optimal Distributed Algorithm with Operator Extrapolation for Stochastic Aggregative Games

This work studies Nash equilibrium seeking for a class of stochastic aggregative games, where each player has an expectation-valued objective function depending on its local strategy and the aggregate of all players' strategies. We propose a distributed algorithm with operator extrapolation, in which each player maintains an estimate of this aggregate by exchanging this information with its neighbors over a time-varying network, and updates its decision through the mirror descent method. An operator extrapolation at the search direction is applied such that the two step historical gradient samples are utilized to accelerate the convergence. Under the strongly monotone assumption on the pseudo-gradient mapping, we prove that the proposed algorithm can achieve the optimal convergence rate of $\mathcal{O}(1/k)$ for Nash equilibrium seeking of stochastic games. Finally, the algorithm performance is demonstrated via numerical simulations.

preprint, 2022, arXiv

Attention-Guided NIR Image Colorization via Adaptive Fusion of Semantic and Texture Clues

Near infrared (NIR) imaging has been widely applied in low-light imaging scenarios; however, it is difficult for humans and algorithms to perceive the real scene in the colorless NIR domain. While the Generative Adversarial Network (GAN) has been widely employed in various image colorization tasks, it is challenging for a direct mapping mechanism, such as a conventional GAN, to transform an image from the NIR to the RGB domain with correct semantic reasoning, well-preserved textures, and vivid color combinations concurrently. In this work, we propose a novel Attention-based NIR image colorization framework via Adaptive Fusion of Semantic and Texture clues, aiming at achieving these goals within the same framework. The tasks of texture transfer and semantic reasoning are carried out in two separate network blocks. Specifically, the Texture Transfer Block (TTB) aims at extracting texture features from the NIR image's Laplacian component and transferring them for subsequent color fusion. The Semantic Reasoning Block (SRB) extracts semantic clues and maps the NIR pixel values to the RGB domain. Finally, a Fusion Attention Block (FAB) is proposed to adaptively fuse the features from the two branches and generate an optimized colorization result. To enhance the network's learning capacity in semantic reasoning as well as its mapping precision in texture transfer, we propose the Residual Coordinate Attention Block (RCAB), which incorporates coordinate attention into a residual learning framework, enabling the network to capture long-range dependencies along the channel direction while preserving precise positional information along the spatial directions. RCAB is also incorporated into FAB to facilitate accurate texture alignment during fusion. Both quantitative and qualitative evaluations show that the proposed method outperforms state-of-the-art NIR image colorization methods.

preprint, 2022, arXiv

Carrying out CNN Channel Pruning in a White Box

Channel pruning has long been studied as a way to compress CNNs, since it significantly reduces the overall computation. Prior works implement channel pruning in an unexplainable manner, which tends to reduce the final classification errors while failing to consider the internal influence of each channel. In this paper, we conduct channel pruning in a white box. Through deep visualization of feature maps activated by different channels, we observe that different channels have a varying contribution to different categories in image classification. Inspired by this, we choose to preserve channels contributing to most categories. Specifically, to model the contribution of each channel to differentiating categories, we develop a class-wise mask for each channel, implemented in a dynamic training manner w.r.t. the input image's category. On the basis of the learned class-wise mask, we perform a global voting mechanism to remove channels with less category discrimination. Lastly, a fine-tuning process is conducted to recover the performance of the pruned model. To the best of our knowledge, this is the first time that CNN interpretability theory has been considered to guide channel pruning. Extensive experiments on representative image classification tasks demonstrate the superiority of our White-Box over many state-of-the-art methods. For instance, on CIFAR-10, it reduces FLOPs by 65.23% with even a 0.62% accuracy improvement for ResNet-110. On ILSVRC-2012, White-Box achieves a 45.6% FLOPs reduction with only a small loss of 0.83% in the top-1 accuracy for ResNet-50.

preprint, 2022, arXiv

CentripetalText: An Efficient Text Instance Representation for Scene Text Detection

Scene text detection remains a grand challenge due to the variation in text curvatures, orientations, and aspect ratios. One of the hardest problems in this task is how to represent text instances of arbitrary shapes. Although many methods have been proposed to model irregular texts in a flexible manner, most of them lose simplicity and robustness. Their complicated post-processing and regression under a Dirac delta distribution undermine the detection performance and the generalization ability. In this paper, we propose an efficient text instance representation named CentripetalText (CT), which decomposes text instances into the combination of text kernels and centripetal shifts. Specifically, we utilize the centripetal shifts to implement pixel aggregation, guiding the external text pixels to the internal text kernels. The relaxation operation is integrated into the dense regression for centripetal shifts, allowing a prediction to be correct within a range instead of at a specific value. The convenient reconstruction of text contours and the tolerance of prediction errors in our method guarantee high detection accuracy and fast inference speed, respectively. In addition, we shrink our text detector into a proposal generation module, namely the CentripetalText Proposal Network, which replaces the Segmentation Proposal Network in Mask TextSpotter v3 and produces more accurate proposals. To validate the effectiveness of our method, we conduct experiments on several commonly used scene text benchmarks, including both curved and multi-oriented text datasets. For the task of scene text detection, our approach achieves superior or competitive performance compared to other existing methods, e.g., an F-measure of 86.3% at 40.0 FPS on Total-Text and an F-measure of 86.1% at 34.8 FPS on MSRA-TD500. For the task of end-to-end scene text recognition, our method outperforms Mask TextSpotter v3 by 1.1% on Total-Text.

preprint, 2022, arXiv

ChoreoGraph: Music-conditioned Automatic Dance Choreography over a Style and Tempo Consistent Dynamic Graph

Generating dance that temporally and aesthetically matches the music is a challenging problem, as the following factors need to be considered. First, the aesthetic styles and messages conveyed by the motion and music should be consistent. Second, the beats of the generated motion should be locally aligned to the musical features. Finally, basic choreomusical rules should be observed, and the motion generated should be diverse. To address these challenges, we propose ChoreoGraph, which choreographs high-quality dance motion for a given piece of music over a Dynamic Graph. A data-driven learning strategy is proposed to evaluate the aesthetic style and rhythmic connections between music and motion in a progressively learned cross-modality embedding space. The motion sequences are beat-aligned based on the music segments and then incorporated as nodes of a Dynamic Motion Graph. Compatibility factors such as the style and tempo consistency, motion context connection, action completeness, and transition smoothness are comprehensively evaluated to determine the node transition in the graph. We demonstrate that our repertoire-based framework can generate motions that are aesthetically consistent and robustly extensible in diversity. Both quantitative and qualitative experimental results show that our proposed model outperforms other baseline models.

preprint, 2022, arXiv

Complex valued semi-linear heat equations in super-critical spaces $E^s_\sigma$

We consider the Cauchy problem for the complex valued semi-linear heat equation $$ \partial_t u - \Delta u - u^m = 0, \ \ u(0,x) = u_0(x), $$ where $m\geq 2$ is an integer and the initial data belong to super-critical spaces $E^s_\sigma$ for which the norms are defined by $$ \|f\|_{E^s_\sigma} = \|\langle \xi\rangle^{\sigma} 2^{s|\xi|}\hat{f}(\xi)\|_{L^2}, \ \ \sigma\in \mathbb{R}, \ s<0. $$ If $s<0$, then any Sobolev space $H^{r}$ is a subspace of $E^s_\sigma$, i.e., $\cup_{r \in \mathbb{R}} H^r \subset E^s_\sigma$. We obtain the global existence and uniqueness of the solutions if the initial data belong to $E^s_\sigma$ ($s<0, \ \sigma\geq d/2-2/(m-1)$) and their Fourier transforms are supported in the first octant; smallness conditions on the initial data in $E^s_\sigma$ are not required for the global solutions. Moreover, we show that the error between the solution $u$ and the iteration solution $u^{(j)}$ is $C^j/(j\,!)^2$. Similar results also hold if the nonlinearity $u^m$ is replaced by an exponential function $e^u-1$.

preprint, 2022, arXiv

Data-driven Self-triggered Control via Trajectory Prediction

Self-triggered control, a well-documented technique for reducing the communication overhead while ensuring desired system performance, is gaining increasing popularity. However, existing methods for self-triggered control require explicit system models that are assumed perfectly known a priori. An end-to-end control paradigm known as data-driven control learns control laws directly from data, and offers a competing alternative to the routine system identification-then-control method. In this context, the present paper puts forth data-driven self-triggered control schemes for unknown linear systems using data collected offline. Specifically, for output feedback control systems, a data-driven model predictive control (MPC) scheme is proposed, which computes a sequence of control inputs while generating a predicted system trajectory. A data-driven self-triggering law is designed using the predicted trajectory, to determine the next triggering time once a new measurement becomes available. For state feedback control systems, instead of capitalizing on MPC to predict the trajectory, a data-fitting problem using the pre-collected input-state data is solved, whose solution is employed to construct the self-triggering mechanism. Both feasibility and stability are established for the proposed self-triggered controllers, which are validated using numerical examples.

preprint, 2022, arXiv

Data-Efficient Graph Grammar Learning for Molecular Generation

The problem of molecular generation has received significant attention recently. Existing methods are typically based on deep neural networks and require training on large datasets with tens of thousands of samples. In practice, however, the size of class-specific chemical datasets is usually limited (e.g., dozens of samples) due to labor-intensive experimentation and data collection. This presents a considerable challenge for the deep learning generative models to comprehensively describe the molecular design space. Another major challenge is to generate only physically synthesizable molecules. This is a non-trivial task for neural network-based generative models since the relevant chemical knowledge can only be extracted and generalized from the limited training data. In this work, we propose a data-efficient generative model that can be learned from datasets with orders of magnitude smaller sizes than common benchmarks. At the heart of this method is a learnable graph grammar that generates molecules from a sequence of production rules. Without any human assistance, these production rules are automatically constructed from training data. Furthermore, additional chemical knowledge can be incorporated in the model by further grammar optimization. Our learned graph grammar yields state-of-the-art results on generating high-quality molecules for three monomer datasets that contain only ${\sim}20$ samples each. Our approach also achieves remarkable performance in a challenging polymer generation task with only $117$ training samples and is competitive against existing methods using $81$k data points. Code is available at https://github.com/gmh14/data_efficient_grammar.

preprint, 2022, arXiv

Decentralized Nash Equilibria Learning for Online Game with Bandit Feedback

This paper studies distributed online bandit learning of generalized Nash equilibria for online games, where the cost functions of all players and the coupled constraints are time-varying. Only the values, rather than full information, of the cost and local constraint functions are revealed to local players gradually. The goal of each player is to selfishly minimize its own cost function with no future information, subject to a strategy set constraint and time-varying coupled inequality constraints. To this end, a distributed online algorithm based on mirror descent and one-point bandit feedback is designed for seeking generalized Nash equilibria of the online game. It is shown that the devised online algorithm achieves sublinear expected regrets and accumulated constraint violation if the path variation of the generalized Nash equilibrium sequence is sublinear. Furthermore, the proposed algorithm is extended to the scenario of delayed bandit feedback, that is, the values of cost and constraint functions are disclosed to local players with time delays. It is also demonstrated that the online algorithm with delayed bandit feedback still has sublinear expected regrets and accumulated constraint violation under some conditions on the path variation and delay. Simulations are presented to illustrate the efficiency of the theoretical results.
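The one-point bandit feedback mentioned in the abstract refers to estimating a gradient from a single function value per round. A standard textbook form of such an estimator queries the cost once at a randomly perturbed point and rescales the result along the perturbation direction. The sketch below shows that generic estimator on a toy linear cost; it is not the paper's distributed algorithm, and the function names, the cost, and the parameter values are assumptions for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere by normalizing a Gaussian."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def one_point_gradient(cost, x, delta, rng):
    """Gradient estimate built from a single cost value at a perturbed point."""
    dim = len(x)
    u = random_unit_vector(dim, rng)
    value = cost([x_i + delta * u_i for x_i, u_i in zip(x, u)])  # the only query
    return [(dim / delta) * value * u_i for u_i in u]


# Toy linear cost with known gradient a, so the estimator's mean can be checked.
a = [1.0, -2.0]
cost = lambda x: sum(a_i * x_i for a_i, x_i in zip(a, x))

rng = random.Random(0)
x = [0.3, -0.1]
num_samples = 40000
mean = [0.0, 0.0]
for _ in range(num_samples):
    g = one_point_gradient(cost, x, delta=0.5, rng=rng)
    mean = [m + g_i / num_samples for m, g_i in zip(mean, g)]
print("averaged estimate:", [round(m, 2) for m in mean], "true gradient:", a)
```

A single estimate is very noisy, which is why many samples are averaged here; online algorithms of this kind instead feed each noisy estimate directly into a descent step and control the noise through the step size and the perturbation radius `delta`.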

preprint, 2022, arXiv

Discrete Boltzmann modeling of Rayleigh-Taylor instability: effects of interfacial tension, viscosity and heat conductivity

The Rayleigh-Taylor Instability (RTI) in compressible flow with inter-molecular interactions is probed via the Discrete Boltzmann Method (DBM). The effects of interfacial tension, viscosity and heat conduction are investigated. It is found that the influences of interfacial tension on the perturbation amplitude, bubble velocity, and two kinds of entropy production rates all show differences at different stages of RTI evolution. It inhibits the RTI evolution at the bubble acceleration stage, while at the asymptotic velocity stage, it first promotes and then inhibits the RTI evolution. Viscosity and heat conduction inhibit the RTI evolution. Viscosity shows a suppressive effect on entropy generation rate related to heat flow at the early stage but a first promotive and then suppressive effect on entropy generation rate related to heat flow at a later stage. Heat conduction shows a promotive effect on entropy generation rate related to heat flow at an early stage. Still, it offers a first promotive and then suppressive effect on entropy generation rate related to heat flow at a later stage. By introducing the morphological boundary length, we found that the stage of exponential growth of interface length with time corresponds to the bubble acceleration stage. The first maximum point of interface length change rate and the first maximum point of the change rate of entropy generation rate related to viscous stress can be used as a new criterion for RTI to enter the asymptotic velocity stage.

preprint, 2022, arXiv

Distributed coordination for seeking the optimal Nash equilibrium of aggregative games

This paper aims to design a distributed coordination algorithm for solving a multi-agent decision problem with a hierarchical structure. The primary goal is to search for the Nash equilibrium of a noncooperative game such that each player has no incentive to deviate from the equilibrium under its private objective. Meanwhile, the agents can coordinate to optimize the social cost within the set of Nash equilibria of the underlying game. Such an optimal Nash equilibrium problem can be modeled as a distributed optimization problem with variational inequality constraints. We consider the scenario where the objective functions of both the underlying game and the social cost optimization problem have a special aggregation structure. Since each player only has access to its local objectives and cannot know all players' decisions, a distributed algorithm is highly desirable. By utilizing Tikhonov regularization and the dynamical averaging tracking technique, we propose a distributed coordination algorithm that introduces an incentive term in addition to the gradient-based Nash equilibrium seeking, so as to intervene in players' decisions and improve the system efficiency. We prove its convergence to the optimal Nash equilibrium of a monotone aggregative game and support the result with simulation studies.

preprint2022arXiv

Distributed Momentum-based Frank-Wolfe Algorithm for Stochastic Optimization

This paper considers distributed stochastic optimization, in which a number of agents cooperate to optimize a global objective function through local computations and information exchanges with neighbors over a network. Stochastic optimization problems are usually tackled by variants of projected stochastic gradient descent. However, projecting a point onto a feasible set is often expensive. The Frank-Wolfe (FW) method has well-documented merits in handling convex constraints, but existing stochastic FW algorithms are basically developed for centralized settings. In this context, the present work puts forth a distributed stochastic Frank-Wolfe solver, by judiciously combining Nesterov's momentum and gradient tracking techniques for stochastic convex and nonconvex optimization over networks. It is shown that the convergence rate of the proposed algorithm is $\mathcal{O}(k^{-\frac{1}{2}})$ for convex optimization, and $\mathcal{O}(1/\mathrm{log}_2(k))$ for nonconvex optimization. The efficacy of the algorithm is demonstrated by numerical simulations against a number of competing alternatives.

preprint2022arXiv

Distributed Optimization with Projection-free Dynamics

We consider continuous-time dynamics for distributed optimization with set constraints in this paper. To handle the computational complexity of projection-based dynamics due to solving a general quadratic optimization subproblem with projection, we propose distributed projection-free dynamics by employing the Frank-Wolfe method, also known as the conditional gradient algorithm. The process searches for a feasible descent direction by solving an alternative linear optimization instead of a quadratic one. To make the approach applicable over weight-balanced digraphs, we design one dynamics for the consensus of local decision variables and another dynamics of auxiliary variables to track the global gradient. Then we prove the convergence of the dynamical systems to the optimal solution, and provide detailed numerical comparisons with both projection-based dynamics and other distributed projection-free algorithms. Also, we derive the distributed discrete-time scheme following the instructive ideas of the proposed dynamics and provide its corresponding convergence rate.
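To illustrate the projection-free idea mentioned in this abstract, here is a minimal sketch of the classical, centralized, discrete-time Frank-Wolfe iteration. It is not the distributed continuous-time dynamics proposed in the paper. The constraint set (the probability simplex), the quadratic objective, the step size 2/(k+2), and the function name `frank_wolfe_simplex` are all assumptions chosen for illustration.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, num_iters=2000):
    """Classical Frank-Wolfe over the probability simplex (illustrative only).

    grad: callable returning the gradient of the objective at x.
    x0:   feasible starting point (non-negative entries summing to 1).
    """
    x = np.asarray(x0, dtype=float).copy()
    for k in range(num_iters):
        g = grad(x)
        # Linear subproblem: minimize <g, s> over the simplex. Its solution is
        # the vertex with the smallest gradient entry, so no projection
        # (quadratic subproblem) is ever solved.
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0
        gamma = 2.0 / (k + 2.0)  # standard diminishing step size
        # Convex combination of feasible points keeps the iterate feasible.
        x = (1.0 - gamma) * x + gamma * s
    return x

# Hypothetical test problem: minimize ||x - b||^2 with b inside the simplex,
# so the constrained minimizer is b itself.
b = np.array([0.2, 0.5, 0.3])
x_star = frank_wolfe_simplex(lambda x: 2.0 * (x - b), np.array([1.0, 0.0, 0.0]))
```

Each iterate is a convex combination of simplex vertices, which is why feasibility is maintained without any projection step.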

preprint2022arXiv

Distributed stochastic projection-free solver for constrained optimization

This paper proposes a distributed stochastic projection-free algorithm for large-scale constrained finite-sum optimization whose constraint set is complicated such that the projection onto the constraint set can be expensive. The global cost function is allocated to multiple agents, each of which computes its local stochastic gradients and communicates with its neighbors to solve the global problem. Stochastic gradient methods enable low computational cost, while they are hard and slow to converge due to the variance caused by random sampling. To construct a convergent distributed stochastic projection-free algorithm, this paper incorporates a variance reduction technique and gradient tracking technique in the Frank-Wolfe update. We develop a sampling rule for the variance reduction technique to reduce the variance introduced by stochastic gradients. Complete and rigorous proofs show that the proposed distributed projection-free algorithm converges with a sublinear convergence rate and enjoys superior complexity guarantees for both convex and non-convex objective functions. By comparative simulations, we demonstrate the convergence and computational efficiency of the proposed algorithm.

preprint2022arXiv

Distributed Variable Sample-size Stochastic Optimization with Fixed Step-sizes

The paper considers distributed stochastic optimization over randomly switching networks, where agents collaboratively minimize the average of all agents' local expectation-valued convex cost functions. Due to the stochasticity in gradient observations, distributedness of local functions, and randomness of communication topologies, distributed algorithms with a convergence guarantee under fixed step-sizes have not been achieved yet. This work incorporates a variance reduction scheme into the distributed stochastic gradient tracking algorithm, where local gradients are estimated by averaging across a variable number of sampled gradients. With an independently and identically distributed (i.i.d.) random network, we show that all agents' iterates converge almost surely to the same optimal solution under fixed step-sizes. When the global cost function is strongly convex and the sample size increases at a geometric rate, we prove that the iterates geometrically converge to the unique optimal solution, and establish the iteration, oracle, and communication complexity. The algorithm performance, including rate and complexity analysis, is further investigated with constant step-sizes and a polynomially increasing sample size. Finally, the empirical algorithm performance is illustrated with numerical examples.

preprint2022arXiv

Dynamical Primal-Dual Accelerated Method with Applications to Network Optimization

This paper develops a continuous-time primal-dual accelerated method with an increasing damping coefficient for a class of convex optimization problems with affine equality constraints. This paper analyzes critical values for parameters in the proposed method and proves that the rate of convergence in terms of the duality gap function is $O(\tfrac{1}{t^2})$ by choosing suitable parameters. As far as we know, this is the first continuous-time primal-dual accelerated method that can obtain the optimal rate. Then this work applies the proposed method to two network optimization problems, a distributed optimization problem with consensus constraints and a distributed extended monotropic optimization problem, and obtains two variant distributed algorithms. Finally, numerical simulations are given to demonstrate the efficacy of the proposed method.

preprint2022arXiv

Enhancing Innate and Adaptive Immune Systems by Cold Atmospheric Plasma (CAP) and Its Antitumor Immunity

Cold atmospheric plasma (CAP) is a near room temperature ionized gas, generated under non-equilibrium discharge conditions. Here we show that a short exposure of rat peritoneal exudate macrophages and T-cells to CAP in vitro triggered an inflammatory phenotype leading to better antigen-presenting and effector cell function, respectively. Different from previous studies mainly using immortalized cell lines, both macrophages and T-cells in this study were primary cells isolated from mice. Furthermore, ex-vivo exposure of T-cells to CAP, followed by their adoptive transfer into tumor-bearing mice, resulted in a strong antitumor effect in vivo. Mechanistically, CAP seems to disrupt tolerogenic pathways, leading to enhanced production of pro-inflammatory cytokines while limiting the production of anti-inflammatory cytokines and the expression of inhibitory molecules such as programmed death-ligand 1 (PD-L1). CAP therefore represents a novel, non-toxic and easy to deliver technology to augment the function of immune cells and enhance antitumor responses when used as a component of T-cell adoptive immunotherapy strategies or, potentially, in combination with other cancer immunotherapeutic approaches.

preprint2022arXiv

Explaining Adverse Actions in Credit Decisions Using Shapley Decomposition

When a financial institution declines an application for credit, an adverse action (AA) is said to occur. The applicant is then entitled to an explanation for the negative decision. This paper focuses on credit decisions based on a predictive model for probability of default and proposes a methodology for AA explanation. The problem involves identifying the important predictors responsible for the negative decision and is straightforward when the underlying model is additive. However, it becomes non-trivial even for linear models with interactions. We consider models with low-order interactions and develop a simple and intuitive approach based on first principles. We then show how the methodology generalizes to the well-known Shapley decomposition and the recently proposed concept of Baseline Shapley (B-Shap). Unlike other Shapley techniques in the literature for local interpretability of machine learning results, B-Shap is computationally tractable since it involves just function evaluations. An illustrative case study is used to demonstrate the usefulness of the method. The paper also discusses situations with highly correlated predictors and desirable properties of fitted models in the credit-lending context, such as monotonicity and continuity.
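As a rough illustration of why Baseline Shapley needs only function evaluations, the sketch below computes exact Shapley values for a toy model by switching each feature between the observed point and a baseline point. The model, the feature values, the zero baseline, and the name `baseline_shapley` are hypothetical and not taken from the paper. The brute-force enumeration over all subsets is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

def baseline_shapley(f, x, baseline):
    """Exact Baseline Shapley values by subset enumeration (small n only)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their observed values; the rest stay at
        # the baseline. Only evaluations of f are required.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                phi[i] += w * (value(s | {i}) - value(s))
    return phi

# Hypothetical linear model with one two-way interaction.
def model(z):
    return 2.0 * z[0] + z[1] + 3.0 * z[0] * z[2]

x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = baseline_shapley(model, x, baseline)
```

In this toy case the attributions sum to `model(x) - model(baseline)`, and the interaction term is split equally between the two features involved.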

preprint2022arXiv

Fluid laminarization in protein-based high internal phase emulsions process

Protein-based high internal phase emulsions (HIPEs) have gained tremendous attention in diverse fields, but their mechanism in the emulsification process remains elusive. In this article, HIPEs were stabilized directly by food-grade proteins, depending on a self-organized process featuring a fluid laminarization. We elucidated that the emulsification with the rotor-stator mixer is a typical non-equilibrium process. The crucial factor for the process is related to the irreversible energy dissipation, while the internal phase volume fraction is the threshold determining the laminarization. The feasible explanation speculated that the transition corresponds to the dissipative structure, i.e., compressive droplets, arising from the spatiotemporal self-organization, to dissipate the turbulent kinetic energy. We found a new paradigm of dissipative structure, comprehending such structure in the HIPEs emulsification process, which is expected to pave the way for its industrial-scale production with the virtue of low-cost proteins.

preprint2022arXiv

Global Cauchy problems for the nonlocal (derivative) NLS in $E^s_σ$

We consider the Cauchy problem for the (derivative) nonlocal NLS in super-critical function spaces $E^s_σ$ for which the norms are defined by $$ \|f\|_{E^s_σ} = \|\langle ξ\rangle^σ 2^{s|ξ|}\hat{f}(ξ)\|_{L^2}, \ s<0, \ σ\in \mathbb{R}. $$ Any Sobolev space $H^{r}$ is a subspace of $E^s_σ$, i.e., $H^r \subset E^s_σ$ for any $ r,σ\in \mathbb{R}$ and $s<0$. Let $s<0$ and $σ>-1/2$ ($σ>0$) for the nonlocal NLS (for the nonlocal derivative NLS). We show the global existence and uniqueness of the solutions if the initial data belong to $E^s_σ$ and their Fourier transforms are supported in $(0, \infty)$; smallness conditions on the initial data in $E^s_σ$ are not required for the global solutions.

preprint2022arXiv

Graph Decoupling Attention Markov Networks for Semi-supervised Graph Node Classification

Graph neural networks (GNNs) have been ubiquitous in graph node classification tasks. Most GNN methods update the node embedding iteratively by aggregating its neighbors' information. However, they often suffer from negative disturbance, due to edges connecting nodes with different labels. One approach to alleviate this negative disturbance is to use attention to learn the weights of aggregation, but current attention-based GNNs only consider feature similarity and also suffer from the lack of supervision. In this paper, we consider the label dependency of graph nodes and propose a decoupling attention mechanism to learn both hard and soft attention. The hard attention is learned on labels for a refined graph structure with fewer inter-class edges, so that the aggregation's negative disturbance can be reduced. The soft attention aims to learn the aggregation weights based on features over the refined graph structure to enhance information gains during message passing. In particular, we formulate our model under the EM framework, and the learned attention is used to guide the label propagation in the M-step and the feature propagation in the E-step, respectively. Extensive experiments are performed on six well-known benchmark graph datasets to verify the effectiveness of the proposed method.

preprint2022arXiv

Graph-Augmented Normalizing Flows for Anomaly Detection of Multiple Time Series

Anomaly detection is a widely studied task for a broad variety of data types; among them, multiple time series appear frequently in applications, including for example, power grids and traffic networks. Detecting anomalies for multiple time series, however, is a challenging subject, owing to the intricate interdependencies among the constituent series. We hypothesize that anomalies occur in low density regions of a distribution and explore the use of normalizing flows for unsupervised anomaly detection, because of their superior quality in density estimation. Moreover, we propose a novel flow model by imposing a Bayesian network among constituent series. A Bayesian network is a directed acyclic graph (DAG) that models causal relationships; it factorizes the joint probability of the series into the product of easy-to-evaluate conditional probabilities. We call such a graph-augmented normalizing flow approach GANF and propose joint estimation of the DAG with flow parameters. We conduct extensive experiments on real-world datasets and demonstrate the effectiveness of GANF for density estimation, anomaly detection, and identification of time series distribution drift.

preprint2022arXiv

Hyperspectral Image Super-resolution with Deep Priors and Degradation Model Inversion

To overcome inherent hardware limitations of hyperspectral imaging systems with respect to their spatial resolution, fusion-based hyperspectral image (HSI) super-resolution is attracting increasing attention. This technique aims to fuse a low-resolution (LR) HSI and a conventional high-resolution (HR) RGB image in order to obtain an HR HSI. Recently, deep learning architectures have been used to address the HSI super-resolution problem and have achieved remarkable performance. However, they ignore the degradation model even though this model has a clear physical interpretation and may contribute to improve the performance. We address this problem by proposing a method that, on the one hand, makes use of the linear degradation model in the data-fidelity term of the objective function and, on the other hand, utilizes the output of a convolutional neural network for designing a deep prior regularizer in spectral and spatial gradient domains. Experiments show the performance improvement achieved with this strategy.

preprint2022arXiv

Instantaneous indirect measurement principle in quantum mechanics

In quantum systems, the measurement of operators and the measurement of the quantum states of the system are very challenging tasks. In this Letter, we propose a method to obtain the average value of one operator in a certain state by measuring the instantaneous change of the average value of another operator with the assistance of a known reference state. We refer to this measurement method as the instantaneous indirect measurement method. By studying the application of this method to some typical models, we find that this measurement can be applied to the measurement of an arbitrary state of a quantum system. Furthermore, for the system to be measured, we find that such measurement neither significantly affects the wave function of the system nor causes wave function collapse of the system. Also, our study shows that when two independent systems are coupled, the information mapping between them is done instantaneously. Finally, we discuss applying this measurement method to the measurement of quantum Fisher information, which quantifies the limit on the accuracy of estimating a parameter from a quantum state.

preprint2022arXiv

Interpretable Feature Engineering for Time Series Predictors using Attention Networks

Regression problems with time-series predictors are common in banking and many other areas of application. In this paper, we use multi-head attention networks to develop interpretable features and use them to achieve good predictive performance. The customized attention layer explicitly uses multiplicative interactions and builds feature-engineering heads that capture temporal dynamics in a parsimonious manner. Convolutional layers are used to combine multivariate time series. We also discuss methods for handling static covariates in the modeling process. Visualization and explanation tools are used to interpret the results and explain the relationship between the inputs and the extracted features. Both simulated and real datasets are used to illustrate the usefulness of the methodology. Keywords: Attention heads, Deep neural networks, Interpretable feature engineering

preprint2022arXiv

Joint learning of object graph and relation graph for visual question answering

Modeling visual question answering (VQA) through scene graphs can significantly improve the reasoning accuracy and interpretability. However, existing models answer poorly for complex reasoning questions with attributes or relations, which causes false attribute selection or missing relations, as in Figure 1(a). This is because these models cannot balance all kinds of information in scene graphs, neglecting relation and attribute information. In this paper, we introduce a novel Dual Message-passing enhanced Graph Neural Network (DM-GNN), which can obtain a balanced representation by properly encoding multi-scale scene graph information. Specifically, we (i) transform the scene graph into two graphs with diversified focuses on objects and relations, then design a dual structure to encode them, which increases the weights from relations; (ii) fuse the encoder output with attribute features, which increases the weights from attributes; (iii) propose a message-passing mechanism to enhance the information transfer between objects, relations and attributes. We conduct extensive experiments on datasets including GQA, VG, and motif-VG and achieve a new state of the art.

preprint2022arXiv

Length L-function for Network-Constrained Point Data

Network constrained points are referred to as points restricted to road networks, such as taxi pick up and drop off locations. A significant pattern of network constrained points is referred to as an aggregation; e.g., the aggregation of pick up points may indicate a high taxi demand in a particular area. Although the network K function using the shortest path network distance has been proposed to detect point aggregation, its statistical unit is still radius based. R neighborhood, in particular, has inconsistent network length owing to the complex configuration of road networks which cause unfair counts and identification errors in networks (e.g., the length of the r neighborhood located at an intersection is longer than that on straight roads, which may include more points). In this study, we derived the length L function for network constrained points to identify the aggregation by designing a novel neighborhood as the statistical unit; the total length of this is consistent throughout the network. Compared to the network K function, our method can detect a true to life aggregation scale, identify the aggregation with higher network density, as well as identify the aggregations that the network K function cannot. We validated our method using taxi trips pick up location data within Zhongguancun Area in Beijing, analyzing differences in maximal aggregation between workdays and weekends to understand taxi demand in the morning and evening peak.

preprint2022arXiv

Locality Guidance for Improving Vision Transformers on Tiny Datasets

While the Vision Transformer (VT) architecture is becoming trendy in computer vision, pure VT models perform poorly on tiny datasets. To address this issue, this paper proposes the locality guidance for improving the performance of VTs on tiny datasets. We first analyze that the local information, which is of great importance for understanding images, is hard to be learned with limited data due to the high flexibility and intrinsic globality of the self-attention mechanism in VTs. To facilitate local information, we realize the locality guidance for VTs by imitating the features of an already trained convolutional neural network (CNN), inspired by the built-in local-to-global hierarchy of CNN. Under our dual-task learning paradigm, the locality guidance provided by a lightweight CNN trained on low-resolution images is adequate to accelerate the convergence and improve the performance of VTs to a large extent. Therefore, our locality guidance approach is very simple and efficient, and can serve as a basic performance enhancement method for VTs on tiny datasets. Extensive experiments demonstrate that our method can significantly improve VTs when training from scratch on tiny datasets and is compatible with different kinds of VTs and datasets. For example, our proposed method can boost the performance of various VTs on tiny datasets (e.g., 13.07% for DeiT, 8.98% for T2T and 7.85% for PVT), and enhance even stronger baseline PVTv2 by 1.86% to 79.30%, showing the potential of VTs on tiny datasets. The code is available at https://github.com/lkhl/tiny-transformers.

preprint2022arXiv

Memory-based Message Passing: Decoupling the Message for Propogation from Discrimination

Message passing is a fundamental procedure for graph neural networks in the field of graph representation learning. Based on the homophily assumption, the current message passing always aggregates features of connected nodes, such as the graph Laplacian smoothing process. However, real-world graphs tend to be noisy and/or non-smooth. The homophily assumption does not always hold, leading to sub-optimal results. A revised message passing method needs to maintain each node's discriminative ability when aggregating the message from neighbors. To this end, we propose a Memory-based Message Passing (MMP) method to decouple the message of each node into a self-embedding part for discrimination and a memory part for propagation. Furthermore, we develop a control mechanism and a decoupling regularization to control the ratio of absorbing and excluding the message in the memory for each node. More importantly, our MMP is a general technique that can work as an additional layer to help improve the performance of traditional GNNs. Extensive experiments on various datasets with different homophily ratios demonstrate the effectiveness and robustness of the proposed method.

preprint2022arXiv

NDF: Neural Deformable Fields for Dynamic Human Modelling

We propose Neural Deformable Fields (NDF), a new representation for dynamic human digitization from a multi-view video. Recent works proposed to represent a dynamic human body with shared canonical neural radiance fields which links to the observation space with deformation fields estimations. However, the learned canonical representation is static and the current design of the deformation fields is not able to represent large movements or detailed geometry changes. In this paper, we propose to learn a neural deformable field wrapped around a fitted parametric body model to represent the dynamic human. The NDF is spatially aligned by the underlying reference surface. A neural network is then learned to map pose to the dynamics of NDF. The proposed NDF representation can synthesize the digitized performer with novel views and novel poses with a detailed and reasonable dynamic appearance. Experiments show that our method significantly outperforms recent human synthesis methods.

preprint2022arXiv

Neural Optimization Machine: A Neural Network Approach for Optimization

A novel neural network (NN) approach is proposed for constrained optimization. The proposed method uses a specially designed NN architecture and training/optimization procedure called Neural Optimization Machine (NOM). The objective functions for the NOM are approximated with NN models. The optimization process is conducted by the neural network's built-in backpropagation algorithm. The NOM solves optimization problems by extending the architecture of the NN objective function model. This is achieved by appropriately designing the NOM's structure, activation function, and loss function. The NN objective function can have arbitrary architectures and activation functions. The application of the NOM is not limited to specific optimization problems, e.g., linear and quadratic programming. It is shown that the increase of dimension of design variables does not increase the computational cost significantly. Then, the NOM is extended for multiobjective optimization. Finally, the NOM is tested using numerical optimization problems and applied for the optimal design of processing parameters in additive manufacturing.

preprint2022arXiv

Observation of short-period helical spin order and magnetic transition in a non-chiral centrosymmetric helimagnet

The search for materials exhibiting nanoscale spiral order continues to be fuelled by the promise of emergent inductors. Although such spin textures have been reported in many materials, most of them exhibit long periods or are limited to operate far below room temperature. Here, we present the real-space observation of an ordered helical spin order with a period of 3.2 nm in a non-chiral centrosymmetric helimagnet MnCoSi at room temperature via multi-angle and multi-azimuth approach of Lorentz transmission electron microscopy (TEM). A magnetic transition from the ordered helical spin order to a cycloidal spin order below 228 K is clearly revealed by in situ neutron powder diffraction and Lorentz TEM, which is closely correlated with temperature-induced variation in magneto-crystalline anisotropy. These results reveal the origin of spiral ordered spin textures in non-chiral centrosymmetric helimagnet, which can serve as a new strategy for searching materials with nanoscale spin order with potential applications in emergent electromagnetism.

preprint2022arXiv

OpenMedIA: Open-Source Medical Image Analysis Toolbox and Benchmark under Heterogeneous AI Computing Platforms

In this paper, we present OpenMedIA, an open-source toolbox library containing a rich set of deep learning methods for medical image analysis under heterogeneous Artificial Intelligence (AI) computing platforms. Various medical image analysis methods, including 2D/3D medical image classification, segmentation, localisation, and detection, have been included in the toolbox with PyTorch and/or MindSpore implementations under heterogeneous NVIDIA and Huawei Ascend computing systems. To the best of our knowledge, OpenMedIA is the first open-source algorithm library providing compared PyTorch and MindSpore implementations and results on several benchmark datasets. The source code and models are available at https://git.openi.org.cn/OpenMedIA.

preprint2022arXiv

Parallel and distributed asynchronous adaptive stochastic gradient methods

Stochastic gradient methods (SGMs) are the predominant approaches to train deep learning models. The adaptive versions (e.g., Adam and AMSGrad) have been extensively used in practice, partly because they achieve faster convergence than the non-adaptive versions while incurring little overhead. On the other hand, asynchronous (async) parallel computing has exhibited significantly higher speed-up over its synchronous (sync) counterpart. Async-parallel non-adaptive SGMs have been well studied in the literature from the perspectives of both theory and practical performance. Adaptive SGMs can also be implemented without much difficulty in an async-parallel way. However, to the best of our knowledge, no theoretical result of async-parallel adaptive SGMs has been established. The difficulty for analyzing adaptive SGMs with async updates originates from the second moment term. In this paper, we propose an async-parallel adaptive SGM based on AMSGrad. We show that the proposed method inherits the convergence guarantee of AMSGrad for both convex and non-convex problems, if the staleness (also called delay) caused by asynchrony is bounded. Our convergence rate results indicate a nearly linear parallelization speed-up if $τ=o(K^{\frac{1}{4}})$, where $τ$ is the staleness and $K$ is the number of iterations. The proposed method is tested on both convex and non-convex machine learning problems, and the numerical results demonstrate its clear advantages over the sync counterpart and the async-parallel nonadaptive SGM.

preprint2022arXiv

Performance and Interpretability Comparisons of Supervised Machine Learning Algorithms: An Empirical Study

This paper compares the performances of three supervised machine learning algorithms in terms of predictive ability and model interpretation on structured or tabular data. The algorithms considered were scikit-learn implementations of extreme gradient boosting machines (XGB) and random forests (RFs), and feedforward neural networks (FFNNs) from TensorFlow. The paper is organized in a findings-based manner, with each section providing general conclusions supported by empirical results from simulation studies that cover a wide range of model complexity and correlation structures among predictors. We considered both continuous and binary responses of different sample sizes. Overall, XGB and FFNNs were competitive, with FFNNs showing better performance in smooth models and tree-based boosting algorithms performing better in non-smooth models. This conclusion held generally for predictive performance, identification of important variables, and determining correct input-output relationships as measured by partial dependence plots (PDPs). FFNNs generally had less over-fitting, as measured by the difference in performance between training and testing datasets. However, the difference with XGB was often small. RFs did not perform well in general, confirming the findings in the literature. All models exhibited different degrees of bias seen in PDPs, but the bias was especially problematic for RFs. The extent of the biases varied with correlation among predictors, response type, and data set sample size. In general, tree-based models tended to over-regularize the fitted model in the tails of predictor distributions. Finally, as to be expected, performances were better for continuous responses compared to binary data and with larger samples.

preprint2022arXiv

Prediction of Depression Severity Based on the Prosodic and Semantic Features with Bidirectional LSTM and Time Distributed CNN

Depression is increasingly impacting individuals both physically and psychologically worldwide. It has become a major global public health problem and attracts attention from various research fields. Traditionally, the diagnosis of depression is formulated through semi-structured interviews and supplementary questionnaires, which makes the diagnosis rely heavily on physicians' experience and subject to bias. Mental health monitoring and cloud-based remote diagnosis can be implemented through an automated depression diagnosis system. In this article, we propose an attention-based multimodality speech and text representation for depression prediction. Our model is trained to estimate the depression severity of participants using the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset. For the audio modality, we use the collaborative voice analysis repository (COVAREP) features provided by the dataset and employ a Bidirectional Long Short-Term Memory Network (Bi-LSTM) followed by a Time-distributed Convolutional Neural Network (T-CNN). For the text modality, we use global vectors for word representation (GloVe) to perform word embeddings and the embeddings are fed into the Bi-LSTM network. Results show that both audio and text models perform well on the depression severity estimation task, with a best sequence-level F1 score of 0.9870 and patient-level F1 score of 0.9074 for the audio model over five classes (healthy, mild, moderate, moderately severe, and severe), as well as a sequence-level F1 score of 0.9709 and patient-level F1 score of 0.9245 for the text model over five classes. Results are similar for the multimodality fused model, with the highest F1 score of 0.9580 on the patient-level depression detection task over five classes. Experiments show statistically significant improvements over previous works.

preprint2022arXiv

PSP: Million-level Protein Sequence Dataset for Protein Structure Prediction

Proteins are an essential component of human life, and their structures are important for function and mechanism analysis. Recent work has shown the potential of AI-driven methods for protein structure prediction. However, the development of new models is restricted by the lack of datasets and benchmark training procedures. To the best of our knowledge, the existing open source datasets fall far short of the needs of modern protein sequence-structure related research. To solve this problem, we present the first million-level protein structure prediction dataset with high coverage and diversity, named PSP. This dataset consists of 570k true structure sequences (10TB) and 745k complementary distillation sequences (15TB). In addition, we provide the benchmark training procedure for a SOTA protein structure prediction model on this dataset. We validate the utility of this dataset for training by participating in the CAMEO contest, in which our model won first place. We hope our PSP dataset together with the training benchmark can enable a broader community of AI/biology researchers to pursue AI-driven protein related research.

preprint2022arXiv

Shapley Computations Using Surrogate Model-Based Trees

Shapley-related techniques have gained attention as both global and local interpretation tools because of their desirable properties. However, their computation using conditional expectations is computationally expensive. Approximation methods suggested in the literature have limitations. This paper proposes the use of a surrogate model-based tree to compute Shapley and SHAP values based on conditional expectation. Simulation studies show that the proposed algorithm provides improvements in accuracy, unifies global Shapley and SHAP interpretation, and the thresholding method provides a way to trade-off running time and accuracy.
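
To make the cost mentioned in this abstract concrete, here is a minimal sketch of exact Shapley values for a generic cooperative game, computed by enumerating every coalition. It is not the surrogate model-based tree algorithm from the paper, and it does not use conditional expectations; the function names and the toy game (`shapley_values`, `toy_value`) are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions.

    `value` maps a frozenset of players to a number. The number of
    value calls grows as 2^n, which is why approximations are needed.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical game: additive payoffs plus an a-b interaction bonus."""
    base = {"a": 1.0, "b": 2.0, "c": 3.0}
    v = sum(base[p] for p in coalition)
    if "a" in coalition and "b" in coalition:
        v += 2.0
    return v


phi = shapley_values(["a", "b", "c"], toy_value)
print(phi)
```

In this toy game the interaction bonus is shared equally between the two interacting players, and the values sum to the payoff of the full coalition.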

preprint2022arXiv

Temporal-MPI: Enabling Multi-Plane Images for Dynamic Scene Modelling via Temporal Basis Learning

Novel view synthesis of static scenes has achieved remarkable advancements in producing photo-realistic results. However, key challenges remain for immersive rendering of dynamic scenes. One of the seminal image-based rendering methods, the multi-plane image (MPI), produces high novel-view synthesis quality for static scenes, but modelling dynamic content with MPI has not been studied. In this paper, we propose a novel Temporal-MPI representation which is able to encode the rich 3D and dynamic variation information throughout the entire video as a compact temporal basis and coefficients that are jointly learned. Time-instance MPIs for rendering can be generated efficiently, within milliseconds, by linear combinations of the temporal basis and coefficients from Temporal-MPI. Thus novel views at arbitrary time instances can be rendered via Temporal-MPI in real time with high visual quality. Our method is trained and evaluated on the Nvidia Dynamic Scene Dataset. We show that our proposed Temporal-MPI is much faster and more compact than other state-of-the-art dynamic scene modelling methods.

preprint2022arXiv

Time-Frequency Mask Aware Bi-directional LSTM: A Deep Learning Approach for Underwater Acoustic Signal Separation

Underwater acoustic signal separation is a key technique for underwater communications. The existing methods are mostly model-based and cannot accurately characterise the practical underwater acoustic communication environment. They are only suitable for binary signal separation and cannot handle multivariate signal separation. On the other hand, the recurrent neural network (RNN) shows powerful capability in extracting the features of temporal sequences. Inspired by this, in this paper, we present a data-driven approach for underwater acoustic signal separation using deep learning technology. We use the Bi-directional Long Short-Term Memory (Bi-LSTM) to explore the features of the Time-Frequency (T-F) mask, and propose a T-F mask aware Bi-LSTM for signal separation. Taking advantage of the sparseness of the T-F image, the designed Bi-LSTM network is able to extract the discriminative features for separation, which further improves the separation performance. In particular, this method breaks through the limitations of the existing methods: it not only achieves good results in multivariate separation, but also effectively separates signals when mixed with 40dB Gaussian noise signals. The experimental results show that this method can achieve a $97\%$ guarantee ratio (PSR), and the average similarity coefficient of the multivariate signal separation is stable above 0.8 under high noise conditions.

preprint2022arXiv

Training-free Transformer Architecture Search

Recently, Vision Transformer (ViT) has achieved remarkable success in several computer vision tasks. This progress is closely tied to architecture design, so it is worthwhile to propose Transformer Architecture Search (TAS) to search for better ViTs automatically. However, current TAS methods are time-consuming, and existing zero-cost proxies for CNNs do not generalize well to the ViT search space according to our experimental observations. In this paper, for the first time, we investigate how to conduct TAS in a training-free manner and devise an effective training-free TAS (TF-TAS) scheme. Firstly, we observe that the properties of multi-head self-attention (MSA) and multi-layer perceptron (MLP) in ViTs are quite different and that the synaptic diversity of MSA affects the performance notably. Secondly, based on this observation, we devise a modular strategy in TF-TAS that evaluates and ranks ViT architectures from two theoretical perspectives: synaptic diversity and synaptic saliency, termed the DSS-indicator. With the DSS-indicator, evaluation results are strongly correlated with the test accuracies of ViT models. Experimental results demonstrate that our TF-TAS achieves competitive performance against state-of-the-art manually or automatically designed ViT architectures, and it greatly improves search efficiency in the ViT search space: from about $24$ GPU days to less than $0.5$ GPU days. Moreover, the proposed DSS-indicator outperforms the existing cutting-edge zero-cost approaches (e.g., TE-score and NASWOT).

preprint2022arXiv

Traversing the Local Polytopes of ReLU Neural Networks: A Unified Approach for Network Verification

Although neural networks (NNs) with ReLU activation functions have found success in a wide range of applications, their adoption in risk-sensitive settings has been limited by concerns about robustness and interpretability. Previous works to examine robustness and to improve interpretability partially exploited the piecewise linear function form of ReLU NNs. In this paper, we explore the unique topological structure that ReLU NNs create in the input space, identifying the adjacency among the partitioned local polytopes and developing a traversing algorithm based on this adjacency. Our polytope traversing algorithm can be adapted to verify a wide range of network properties related to robustness and interpretability, providing a unified approach to examine the network behavior. As the traversing algorithm explicitly visits all local polytopes, it returns a clear and full picture of the network behavior within the traversed region. The time and space complexity of the traversing algorithm is determined by the number of a ReLU NN's partitioning hyperplanes passing through the traversing region.

preprint2022arXiv

Universal Deep GNNs: Rethinking Residual Connection in GNNs from a Path Decomposition Perspective for Preventing the Over-smoothing

The performance of GNNs degrades as they become deeper due to the over-smoothing. Among all the attempts to prevent over-smoothing, residual connection is one of the promising methods due to its simplicity. However, recent studies have shown that GNNs with residual connections only slightly slow down the degeneration. The reason why residual connections fail in GNNs is still unknown. In this paper, we investigate the forward and backward behavior of GNNs with residual connections from a novel path decomposition perspective. We find that the recursive aggregation of the median length paths from the binomial distribution of residual connection paths dominates output representation, resulting in over-smoothing as GNNs go deeper. Entangled propagation and weight matrices cause gradient smoothing and prevent GNNs with residual connections from optimizing to the identity mapping. Based on these findings, we present a Universal Deep GNNs (UDGNN) framework with cold-start adaptive residual connections (DRIVE) and feedforward modules. Extensive experiments demonstrate the effectiveness of our method, which achieves state-of-the-art results over non-smooth heterophily datasets by simply stacking standard GNNs.

preprint2022arXiv

ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

Visual appearance is considered to be the most important cue to understand images for cross-modal retrieval, while sometimes the scene text appearing in images can provide valuable information to understand the visual semantics. Most existing cross-modal retrieval approaches ignore scene text information, and directly adding this information may lead to performance degradation in scene text free scenarios. To address this issue, we propose a full transformer architecture to unify these cross-modal retrieval scenarios in a single $\textbf{Vi}$sion and $\textbf{S}$cene $\textbf{T}$ext $\textbf{A}$ggregation framework (ViSTA). Specifically, ViSTA utilizes transformer blocks to directly encode image patches and fuse scene text embedding to learn an aggregated visual representation for cross-modal retrieval. To tackle the modality missing problem of scene text, we propose a novel fusion token based transformer aggregation approach to exchange the necessary scene text information only through the fusion token and concentrate on the most important features in each modality. To further strengthen the visual modality, we develop dual contrastive learning losses to embed both image-text pairs and fusion-text pairs into a common cross-modal space. Compared to existing methods, ViSTA can aggregate relevant scene text semantics with visual appearance, and hence improve results under both scene text free and scene text aware scenarios. Experimental results show that ViSTA outperforms other methods by at least $\bf{8.4}\%$ at Recall@1 for the scene text aware retrieval task. Compared with state-of-the-art scene text free retrieval methods, ViSTA can achieve better accuracy on Flickr30K and MSCOCO while running at least three times faster during the inference stage, which validates the effectiveness of the proposed framework.

preprint2021arXiv

An energetic perspective on rapid quenches in quantum annealing

There are well developed theoretical tools to analyse how quantum dynamics can solve computational problems by varying Hamiltonian parameters slowly, near the adiabatic limit. On the other hand, there are relatively few tools to understand the opposite limit of rapid quenches, as used in quantum annealing and (in the limit of infinitely rapid quenches) in quantum walks. In this paper, we develop several tools which are applicable in the rapid quench regime. Firstly, we analyse the energy expectation value of different elements of the Hamiltonian. From this, we show that monotonic quenches, where the strength of the problem Hamiltonian is consistently increased relative to fluctuation (driver) terms, will yield a better result on average than random guessing. Secondly, we develop methods to determine whether dynamics will occur locally under rapid quench Hamiltonians, and identify cases where a rapid quench will lead to a substantially improved solution. In particular, we find that a technique we refer to as "pre-annealing" can significantly improve the performance of quantum walks. We also show how these tools can provide efficient heuristic estimates for Hamiltonian parameters, a key requirement for practical application of quantum annealing.

preprint2021arXiv

Attention-Guided Progressive Neural Texture Fusion for High Dynamic Range Image Restoration

High Dynamic Range (HDR) imaging via multi-exposure fusion is an important task for most modern imaging platforms. In spite of recent developments in both hardware and algorithm innovations, challenges remain over content association ambiguities caused by saturation, motion, and various artifacts introduced during multi-exposure fusion such as ghosting, noise, and blur. In this work, we propose an Attention-guided Progressive Neural Texture Fusion (APNT-Fusion) HDR restoration model which aims to address these issues within one framework. An efficient two-stream structure is proposed which separately focuses on texture feature transfer over saturated regions and multi-exposure tonal and texture feature fusion. A neural feature transfer mechanism is proposed which establishes spatial correspondence between different exposures based on multi-scale VGG features in the masked saturated HDR domain for discriminative contextual clues over the ambiguous image areas. A progressive texture blending module is designed to blend the encoded two-stream features in a multi-scale and progressive manner. In addition, we introduce several novel attention mechanisms, i.e., the motion attention module detects and suppresses the content discrepancies among the reference images; the saturation attention module facilitates differentiating the misalignment caused by saturation from those caused by motion; and the scale attention module ensures texture blending consistency between different coder/decoder scales. We carry out comprehensive qualitative and quantitative evaluations and ablation studies, which validate that these novel modules work coherently under the same framework and outperform state-of-the-art methods.

preprint2021arXiv

Deep reinforcement learning for RAN optimization and control

Due to the high variability of the traffic in the radio access network (RAN), fixed network configurations are not flexible enough to achieve optimal performance. Our vendors provide several eNodeB settings to optimize RAN performance, such as the media access control scheduler, load balancing, etc. But the detailed mechanisms of the eNodeB configurations are usually very complicated and not disclosed, not to mention the large space of key performance indicators (KPIs) that needs to be considered. These factors make constructing a simulator, offline tuning, or rule-based solutions difficult. We aim to build an intelligent controller that makes no strong assumptions and requires no domain knowledge about the RAN, and that can run 24/7 without supervision. To achieve this goal, we first build a closed-loop control testbed RAN in a lab environment with one eNodeB provided by one of the largest wireless vendors and four smartphones. Next, we build a double Q network agent trained with the live feedback of the key performance indicators from the RAN. Our work demonstrates the effectiveness of applying deep reinforcement learning to improve network performance in a real RAN environment.

preprint2021arXiv

Directed Acyclic Graph Neural Networks

Graph-structured data ubiquitously appears in science and engineering. Graph neural networks (GNNs) are designed to exploit the relational inductive bias exhibited in graphs; they have been shown to outperform other forms of neural networks in scenarios where structure information supplements node features. The most common GNN architecture aggregates information from neighborhoods based on message passing. Its generality has made it broadly applicable. In this paper, we focus on a special, yet widely used, type of graphs -- DAGs -- and inject a stronger inductive bias -- partial ordering -- into the neural network design. We propose the directed acyclic graph neural network, DAGNN, an architecture that processes information according to the flow defined by the partial order. DAGNN can be considered a framework that entails earlier works as special cases (e.g., models for trees and models updating node representations recurrently), but we identify several crucial components that prior architectures lack. We perform comprehensive experiments, including ablation studies, on representative DAG datasets (i.e., source code, neural architectures, and probabilistic graphical models) and demonstrate the superiority of DAGNN over simpler DAG architectures as well as general graph architectures.

preprint2021arXiv

Distributed proximal gradient algorithm for non-smooth non-convex optimization over time-varying networks

This note studies the distributed non-convex optimization problem with non-smooth regularization, which has wide applications in decentralized learning, estimation and control. The objective function is the sum of different local objective functions, each of which consists of a differentiable (possibly non-convex) cost function and a non-smooth convex function. This note presents a distributed proximal gradient algorithm for the non-smooth non-convex optimization problem over time-varying multi-agent networks. Each agent updates its local variable estimate using the multi-step consensus operator and the proximal operator. We prove that the generated local variables achieve consensus and converge to the set of critical points with convergence rate $O(1/T)$. Finally, we verify the efficacy of the proposed algorithm by numerical simulations.
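
The two building blocks named in this abstract, a consensus step and a proximal step, can be illustrated with a deliberately simplified sketch. Everything below is an assumption made for illustration: scalar variables, quadratic local costs, an $\ell_1$ regularizer, a fixed mixing matrix, a single consensus round per iteration, and a constant step size. It does not reproduce the paper's multi-step consensus operator, its time-varying networks, or its convergence conditions.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |x| for a scalar x."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def distributed_prox_grad(targets, weights, lam, step, iters):
    """Simplified distributed proximal gradient iteration.

    Agent i holds the (hypothetical) local cost 0.5 * (x - targets[i])**2
    and all agents share the regularizer lam * |x|. Each round performs
    consensus averaging with `weights`, a local gradient step, and then
    the proximal step.
    """
    n = len(targets)
    x = [0.0] * n
    for _ in range(iters):
        # Consensus: each agent averages its neighbours' estimates.
        mixed = [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Local gradient step followed by the proximal (shrinkage) step.
        x = [
            soft_threshold(mixed[i] - step * (mixed[i] - targets[i]), step * lam)
            for i in range(n)
        ]
    return x


# Three agents with different local targets and uniform mixing weights.
uniform = [[1.0 / 3.0] * 3 for _ in range(3)]
estimates = distributed_prox_grad([2.0, 3.0, 4.0], uniform, lam=1.0, step=0.5, iters=100)
print(estimates)
```

The soft-thresholding function is the standard proximal operator of the absolute value; a different non-smooth convex regularizer would need its own proximal operator in its place.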

preprint2021arXiv

Extreme expected values and their applications in quantum information processing

We consider the probability distribution when the monotonic function $F(X)$ of the independent variable $X$ takes the maximum or minimum expected value under the two constraints of a certain probability and a certain expected value of the independent variable $X$. We propose an equal probability and equal expected value splitting method. With this method, we prove four inequalities, two of which can be reduced to Jensen's inequalities. Subsequently, we find that after dividing the non-monotone function $H(X)$ into multiple monotone intervals, the problem of solving the maximum and minimum expected values of $H(X)$ can be transformed into the problem of solving the extreme value of a multiple-variable function. Finally, we apply the proved theory to solve three problems in quantum information processing. When studying quantum parameter estimation in the Mach-Zehnder interferometer, for an equal total input photon number, we find an optimal path-symmetric input state that makes the quantum Fisher information take the maximum value, and we prove that the NOON state is the path-symmetric state that makes the quantum Fisher information take the minimum value. When studying quantum parameter estimation in the Landau-Zener-Jaynes-Cummings model, we find the optimal initial state of the cavity field that makes the system obtain the maximum quantum Fisher information. Finally, for an equal initial average photon number, we find the optimal initial state of the cavity field that makes the Tavis-Cummings quantum battery have the maximum stored energy and the maximum average charging power.

preprint2021arXiv

Generating a Doppelganger Graph: Resembling but Distinct

Deep generative models, since their inception, have become increasingly capable of generating novel and perceptually realistic signals (e.g., images and sound waves). With the emergence of deep models for graph structured data, there is natural interest in extending these generative models to graphs. Successful extensions were seen recently in the case of learning from a collection of graphs (e.g., protein data banks), but learning from a single graph has been largely underexplored. The latter case, however, is important in practice. For example, graphs in financial and healthcare systems contain so much confidential information that their public accessibility is nearly impossible, but open science in these fields can only advance when similar data are available for benchmarking. In this work, we propose an approach to generating a doppelganger graph that resembles a given one in many graph properties but nonetheless can hardly be used to reverse engineer the original one, in the sense of a near zero edge overlap. The approach is an orchestration of graph representation learning, generative adversarial networks, and graph realization algorithms. Through comparison with several graph generative models (either parameterized by neural networks or not), we demonstrate that our result barely reproduces the given graph but closely matches its properties. We further show that downstream tasks, such as node classification, on the generated graphs reach similar performance to the use of the original ones.

preprint2021arXiv

Heat conduction theory including phonon coherence

Understanding and quantifying the fundamental physical property of coherence of thermal excitations is a long-standing and general problem in physics. The conventional theory, i.e. the phonon gas model, fails to describe coherence and its impact on thermal transport. In this letter, we propose a general heat conduction formalism supported by theoretical arguments and direct atomic simulations, which takes into account both the conventional phonon gas model and the wave nature of thermal phonons. By naturally introducing wavepackets in the heat flux from fundamental concepts, we derive an original thermal conductivity expression including coherence times and lifetimes. Our theory and simulations reveal two distinct types of coherence, i.e., intrinsic and mutual, appearing in two different temperature ranges. This contribution establishes a fundamental frame for understanding and quantifying the coherence of thermal phonons, which should have a general impact on the estimation of the thermal properties of solids.

preprint2021arXiv

Intra- and interband excitations induced residue decay of the Bose polaron in a one-dimensional double-well

We investigate the polaronic properties of a single impurity immersed in a weakly interacting bosonic environment confined within a one-dimensional double-well potential using an exact diagonalization approach. We find that an increase of the impurity-bath coupling results in a vanishing residue, signifying the occurrence of the polaron orthogonality catastrophe. Asymptotic configurations of the systems' ground state wave function in the strongly interacting regime are obtained by means of a Schmidt decomposition, which in turn accounts for the observed orthogonality catastrophe of the polaron. We exemplify that depending on the repulsion of the Bose gas, three distinct residue behaviors appear with respect to the impurity-bath coupling. These residue regimes are characterized by two critical values of the bosonic repulsion and originate from the interplay between the intra- and the interband excitations of the impurity. Moreover, they can be clearly distinguished in the corresponding species reduced density matrices with the latter revealing a phase separation on either the one- or the two-body level. The impact of the interspecies mass-imbalance on the impurity's excitation processes is appreciated yielding an interaction shift of the residue regions. Our results explicate the interplay of intra- and interband excitation processes for the polaron generation in multiwell traps and for designing specific polaron entangled states motivating their exposure in current experiments.

preprint2021arXiv

Learning Structural coherence Via Generative Adversarial Network for Single Image Super-Resolution

Among the major remaining challenges for single image super resolution (SISR) is the capacity to recover coherent images with global shapes and local details conforming to the human visual system. Recent generative adversarial network (GAN) based SISR methods have yielded overall realistic SR images; however, unpleasant textures accompanied by structural distortions still appear in local regions. To target these issues, we introduce a gradient branch into the generator to preserve structural information by restoring high-resolution gradient maps in the SR process. In addition, we utilize a U-net based discriminator to consider both the whole image and the detailed per-pixel authenticity, which could encourage the generator to maintain overall coherence of the reconstructed images. Moreover, we have studied objective functions, and an LPIPS perceptual loss is added to generate more realistic and natural details. Experimental results show that our proposed method outperforms state-of-the-art perceptual-driven SR methods in perception index (PI), and obtains more geometrically consistent and visually pleasing textures in natural image restoration.

preprint2021arXiv

Resilient Control under Quantization and Denial-of-Service: Co-designing a Deadbeat Controller and Transmission Protocol

This paper is concerned with the problem of stabilizing continuous-time linear time-invariant systems subject to quantization and Denial-of-Service (DoS) attacks. In this context, two DoS-induced challenges emerge with the design of resilient encoding schemes, namely, the coupling between encoding strategies of different signals, and the synchronization between the encoder and decoder. To address these challenges, a novel structure that is equipped with a deadbeat controller as well as a delicate transmission protocol for the input and output channels, co-designed leveraging the controllability index, is put forward. When both input and output channels are subject to DoS attacks and quantization, the proposed structure is shown to be able to decouple the encoding schemes for input, output, and estimated output signals. This property is further corroborated by designing encoding schemes as well as conditions that ensure exponential stability of the closed-loop system. On the other hand, when only the output channel is subject to network phenomenon, the proposed structure can achieve exponential stabilization without acknowledgment (ACK) signals, in contrast to existing ACK-based results. Finally, a numerical example is given to demonstrate the practical merits of the proposed approach as well as the theory.

preprint2021arXiv

Sparse Linear Spectral Unmixing of Hyperspectral images using Expectation-Propagation

This paper presents a novel Bayesian approach for hyperspectral image unmixing. The observed pixels are modeled by a linear combination of material signatures weighted by their corresponding abundances. A spike-and-slab abundance prior is adopted to promote sparse mixtures and an Ising prior model is used to capture spatial correlation of the mixture support across pixels. We approximate the posterior distribution of the abundances using the expectation-propagation (EP) method. We show that it can significantly reduce the computational complexity of the unmixing stage while still providing uncertainty measures, compared to expensive Monte Carlo strategies traditionally considered for uncertainty quantification. Moreover, many variational parameters within each EP factor can be updated in a parallel manner, which enables mapping of efficient algorithmic architectures based on graphics processing units (GPU). Under the same approximate Bayesian framework, we then extend the proposed algorithm to semi-supervised unmixing, whereby the abundances are viewed as latent variables and the expectation-maximization (EM) algorithm is used to refine the endmember matrix. Experimental results on synthetic data and real hyperspectral data illustrate the benefits of the proposed framework over state-of-the-art linear unmixing methods.

preprint2021arXiv

Towards Unbiased COVID-19 Lesion Localisation and Segmentation via Weakly Supervised Learning

Despite tremendous efforts, it is very challenging to generate a robust model to assist in the accurate quantification assessment of COVID-19 on chest CT images. Due to the nature of blurred boundaries, the supervised segmentation methods usually suffer from annotation biases. To support unbiased lesion localisation and to minimise the labeling costs, we propose a data-driven framework supervised by only image-level labels. The framework can explicitly separate potential lesions from original images, with the help of a generative adversarial network and a lesion-specific decoder. Experiments on two COVID-19 datasets demonstrate the effectiveness of the proposed framework and its superior performance to several existing methods.

preprint2021arXiv

Transient Performance Analysis of the $\ell_1$-RLS

The recursive least-squares algorithm with $\ell_1$-norm regularization ($\ell_1$-RLS) exhibits excellent performance in terms of convergence rate and steady-state error in identification of sparse systems. Nevertheless, few works have studied its stochastic behavior, in particular its transient performance. In this letter, we derive analytical models of the transient behavior of the $\ell_1$-RLS in the mean and mean-square sense. Simulation results illustrate the accuracy of these models.

preprint2020arXiv

AD-Cluster: Augmented Discriminative Clustering for Domain Adaptive Person Re-identification

Domain adaptive person re-identification (re-ID) is a challenging task, especially when person identities in target domains are unknown. Existing methods attempt to address this challenge by transferring image styles or aligning feature distributions across domains, whereas the rich unlabeled samples in target domains are not sufficiently exploited. This paper presents a novel augmented discriminative clustering (AD-Cluster) technique that estimates and augments person clusters in target domains and enforces the discrimination ability of re-ID models with the augmented clusters. AD-Cluster is trained by iterative density-based clustering, adaptive sample augmentation, and discriminative feature learning. It learns an image generator and a feature encoder which aim to maximize the intra-cluster diversity in the sample space and minimize the intra-cluster distance in the feature space in an adversarial min-max manner. Finally, AD-Cluster increases the diversity of sample clusters and improves the discrimination capability of re-ID models greatly. Extensive experiments over Market-1501 and DukeMTMC-reID show that AD-Cluster outperforms the state-of-the-art with large margins.

preprint2020arXiv

Adaptive Explainable Neural Networks (AxNNs)

While machine learning techniques have been successfully applied in several fields, the black-box nature of the models presents challenges for interpreting and explaining the results. We develop a new framework called Adaptive Explainable Neural Networks (AxNN) for achieving the dual goals of good predictive performance and model interpretability. For predictive performance, we build a structured neural network made up of ensembles of generalized additive model networks and additive index models (through explainable neural networks) using a two-stage process. This can be done using either a boosting or a stacking ensemble. For interpretability, we show how to decompose the results of AxNN into main effects and higher-order interaction effects. The computations are inherited from Google's open source tool AdaNet and can be efficiently accelerated by training with distributed computing. The results are illustrated on simulated and real datasets.

preprint2020arXiv

Adaptive generalized multiscale approximation of a mixed finite element method with velocity elimination

In this paper, we propose offline and online adaptive enrichment algorithms for the generalized multiscale approximation of a mixed finite element method with velocity elimination to solve the subsurface flow problem in high-contrast and heterogeneous porous media. We give the theoretical analysis for the convergence of these two adaptive methods, which shows that sufficient initial basis functions (belonging to the offline space) lead to a faster convergence rate. A series of numerical examples are provided to highlight the performance of both adaptive methods and also validate the theoretical analysis. Both offline and online adaptive methods are effective and can reduce the relative error substantially. In addition, the online adaptive method generally performs better than the offline adaptive method, as online basis functions contain important global information such as distant effects that cannot be captured by offline basis functions. The numerical results also show that with a suitable initial multiscale space that includes all offline basis functions corresponding to relatively smaller eigenvalues of local spectral decompositions in the offline stage, the convergence rate of the online enrichment is independent of the permeability contrast.

preprint2020arXiv

Adaptively Aligned Image Captioning via Adaptive Attention Time

Recent neural models for image captioning usually employ an encoder-decoder framework with an attention mechanism. However, the attention mechanism in such a framework aligns one single (attended) image feature vector to one caption word, assuming a one-to-one mapping between source image regions and target caption words, which is never possible. In this paper, we propose a novel attention model, namely Adaptive Attention Time (AAT), to align the source and the target adaptively for image captioning. AAT allows the framework to learn how many attention steps to take to output a caption word at each decoding step. With AAT, an image region can be mapped to an arbitrary number of caption words while a caption word can also attend to an arbitrary number of image regions. AAT is deterministic and differentiable, and doesn't introduce any noise to the parameter gradients. In this paper, we empirically show that AAT improves over state-of-the-art methods on the task of image captioning. Code is available at https://github.com/husthuaan/AAT.

preprint2020arXiv

Affine Combination of Diffusion Strategies over Networks

Diffusion adaptation is a powerful strategy for distributed estimation and learning over networks. Motivated by the concept of combining adaptive filters, this work proposes a combination framework that aggregates the operation of multiple diffusion strategies for enhanced performance. By assigning a combination coefficient to each node, and using an adaptation mechanism to minimize the network error, we obtain a combined diffusion strategy that benefits from the best characteristics of all component strategies simultaneously in terms of excess-mean-square error (EMSE). Analyses of the universality are provided to show the superior performance of the affine combination scheme and to characterize its behavior in the mean and mean-square sense. Simulation results are presented to demonstrate the effectiveness of the proposed strategies, as well as the accuracy of the theoretical findings.

preprint2020arXiv

Asymptotic population imbalance of an ultracold bosonic ensemble in a driven double-well

We demonstrate that an ultracold many-body bosonic ensemble confined in a one-dimensional (1D) double-well potential exhibits a population imbalance between the two wells at large timescales, when the depths of the wells are modulated by a time-dependent driving force. The specific form of the driving force is shown to break spatial parity and time-reversal symmetries, which leads to such an asymptotic population imbalance (API). The value of the API can be flexibly controlled by changing the phase of the driving force and the total number of particles. While the API is highly sensitive to the initial state in the few-particle regime, this dependence on the initial state is lost as we approach the classical limit of large particle numbers. We perform a Floquet analysis in the few-particle regime and an analysis based on a driven classical non-rigid pendulum in the many-particle regime. Although the API values obtained in the many-particle regime agree very well with those obtained in the classical limit, we show that there exists a significant disagreement in the corresponding real-time population imbalance due to quantum correlations.

preprint2020arXiv

Chart Auto-Encoders for Manifold Structured Data

Deep generative models have made tremendous advances in image and signal representation learning and generation. These models employ the full Euclidean space or a bounded subset as the latent space, whose flat geometry, however, is often too simplistic to meaningfully reflect the manifold structure of the data. In this work, we advocate the use of a multi-chart latent space for better data representation. Inspired by differential geometry, we propose a Chart Auto-Encoder (CAE) and prove a universal approximation theorem on its representation capability. We show that the training data size and the network size scale exponentially in approximation error with an exponent depending on the intrinsic dimension of the data manifold. CAE admits desirable manifold properties that auto-encoders with a flat latent space fail to obey, predominantly proximity of data. We conduct extensive experimentation with synthetic and real-life examples to demonstrate that CAE provides reconstruction with high fidelity, preserves proximity in the latent space, and generates new data remaining near the manifold. These experiments show that CAE is advantageous over existing auto-encoders and variants by preserving the topology of the data manifold as well as its geometry.

preprint2020arXiv

Convolution Neural Network Architecture Learning for Remote Sensing Scene Classification

Remote sensing image scene classification is a fundamental but challenging task in understanding remote sensing images. Recently, deep learning-based methods, especially convolutional neural network-based (CNN-based) methods, have shown enormous potential to understand remote sensing images. CNN-based methods meet with success by utilizing features learned from data rather than features designed manually. The feature-learning procedure of a CNN largely depends on its architecture. However, most of the CNN architectures used for remote sensing scene classification are still designed by hand, which demands a considerable amount of architecture engineering skill and domain knowledge, and may not realize the CNN's full potential on a specific dataset. In this paper, we propose an automatic architecture learning procedure for remote sensing scene classification. We design a parameter space in which every set of parameters represents a certain CNN architecture (i.e., some parameters represent the type of operators used in the architecture, such as convolution, pooling, no connection, or identity, and the others represent how these operators connect). To discover the optimal set of parameters for a given dataset, we introduce a learning strategy that allows efficient search in the architecture space by means of gradient descent. An architecture generator finally maps the set of parameters into the CNN used in our experiments.

preprint2020arXiv

Cooperative Pursuit with Multi-Pursuer and One Faster Free-moving Evader

This paper addresses a multi-pursuer single-evader pursuit-evasion game where the free-moving evader moves faster than the pursuers. Most of the existing works impose constraints on the faster evader such as limited moving area and moving direction. When the faster evader is allowed to move freely without any constraint, the main issues are how to form an encirclement to trap the evader into the capture domain, how to balance between forming an encirclement and approaching the faster evader, and what conditions make the capture possible. In this paper, a distributed pursuit algorithm is proposed to enable pursuers to form an encirclement and approach the faster evader. An algorithm that balances between forming an encirclement and approaching the faster evader is proposed. Moreover, sufficient capture conditions are derived based on the initial spatial distribution and the speed ratios of the pursuers and the evader. Simulation and experimental results on ground robots validate the effectiveness and practicability of the proposed method.

preprint2020arXiv

Deep Spatial-angular Regularization for Compressive Light Field Reconstruction over Coded Apertures

Coded aperture is a promising approach for capturing the 4-D light field (LF), in which the 4-D data are compressively modulated into 2-D coded measurements that are further decoded by reconstruction algorithms. The bottleneck lies in the reconstruction algorithms, resulting in rather limited reconstruction quality. To tackle this challenge, we propose a novel learning-based framework for the reconstruction of high-quality LFs from acquisitions via learned coded apertures. The proposed method incorporates the measurement observation into the deep learning framework elegantly to avoid relying entirely on data-driven priors for LF reconstruction. Specifically, we first formulate the compressive LF reconstruction as an inverse problem with an implicit regularization term. Then, we construct the regularization term with an efficient deep spatial-angular convolutional sub-network to comprehensively explore the signal distribution free from the limited representation ability and inefficiency of deterministic mathematical modeling. Experimental results show that the reconstructed LFs not only achieve much higher PSNR/SSIM but also preserve the LF parallax structure better, compared with state-of-the-art methods on both real and synthetic LF benchmarks. In addition, experiments show that our method is efficient and robust to noise, which is an essential advantage for a real camera system. The code is publicly available at https://github.com/angmt2008/LFCA.

preprint2020arXiv

Embedding Compression with Isotropic Iterative Quantization

Continuous representation of words is a standard component in deep learning-based NLP models. However, representing a large vocabulary requires significant memory, which can cause problems, particularly on resource-constrained platforms. Therefore, in this paper we propose an isotropic iterative quantization (IIQ) approach for compressing embedding vectors into binary ones, leveraging the iterative quantization technique well established for image retrieval, while satisfying the desired isotropic property of PMI based models. Experiments with pre-trained embeddings (i.e., GloVe and HDC) demonstrate a more than thirty-fold compression ratio with comparable and sometimes even improved performance over the original real-valued embedding vectors.

preprint2020arXiv

Generalized multiscale approximation of a multipoint flux mixed finite element method for Darcy-Forchheimer model

In this paper, we propose a multiscale method for the Darcy-Forchheimer model in highly heterogeneous porous media. The problem is solved in the framework of generalized multiscale finite element methods (GMsFEM) combined with a multipoint flux mixed finite element (MFMFE) method. We consider the MFMFE method that utilizes the lowest order Brezzi-Douglas-Marini ($\textrm{BDM}_1$) mixed finite element spaces for the velocity and pressure approximation. The symmetric trapezoidal quadrature rule is employed for the integration of bilinear forms relating to the velocity variables so that the local velocity elimination is allowed and leads to a cell-centered system for the pressure. We construct a multiscale space for the pressure and solve the problem on the coarse grid following the GMsFEM framework. In the offline stage, we construct local snapshot spaces and perform spectral decompositions to get the offline space with a smaller dimension. In the online stage, we use the Newton iterative algorithm to solve the nonlinear problem and obtain the offline solution, which greatly reduces the number of iterations compared to the standard Picard iteration. Based on the offline space and offline solution, we calculate online basis functions which contain important global information to enrich the multiscale space iteratively. The online basis functions are efficient and accurate, and reduce relative errors substantially. Numerical examples are provided to highlight the performance of the proposed multiscale method.

preprint2020arXiv

Hyperspectral Image Super-resolution via Deep Progressive Zero-centric Residual Learning

This paper explores the problem of hyperspectral image (HSI) super-resolution that merges a low resolution HSI (LR-HSI) and a high resolution multispectral image (HR-MSI). The cross-modality distribution of the spatial and spectral information makes the problem challenging. Inspired by the classic wavelet decomposition-based image fusion, we propose a novel lightweight deep neural network-based framework, namely progressive zero-centric residual network (PZRes-Net), to address this problem efficiently and effectively. Specifically, PZRes-Net learns a high resolution and zero-centric residual image, which contains high-frequency spatial details of the scene across all spectral bands, from both inputs in a progressive fashion along the spectral dimension. The resulting residual image is then superimposed onto the up-sampled LR-HSI in a mean-value invariant manner, leading to a coarse HR-HSI, which is further refined by exploring the coherence across all spectral bands simultaneously. To learn the residual image efficiently and effectively, we employ spectral-spatial separable convolution with dense connections. In addition, we propose zero-mean normalization implemented on the feature maps of each layer to realize the zero-mean characteristic of the residual image. Extensive experiments over both real and synthetic benchmark datasets demonstrate that our PZRes-Net outperforms state-of-the-art methods to a significant extent in terms of both four quantitative metrics and visual quality, e.g., our PZRes-Net improves the PSNR by more than 3 dB, while using 2.3$\times$ fewer parameters and 15$\times$ fewer FLOPs. The code is publicly available at https://github.com/zbzhzhy/PZRes-Net.

preprint2020arXiv

Large anisotropic topological Hall effect in a hexagonal non-collinear magnet Fe5Sn3

We report the observation of a large anisotropic topological Hall effect (THE) in single crystals of the hexagonal non-collinear magnet Fe5Sn3. It is found that the sign of the topological Hall resistivity is negative when the magnetic field H is perpendicular to the bc-plane ($H \perp bc$-plane); however, it changes from negative to positive when H is parallel to the c-axis ($H \parallel c$-axis). The value of the topological Hall resistivity increased with increasing temperature and reached approximately -2.12 μΩ cm ($H \perp bc$-plane) and 0.5 μΩ cm ($H \parallel c$-axis) at 350 K, respectively. Quantitative analyses of the measured data suggest that the observed anisotropic THE may originate from the opposite scalar spin chirality induced by the magnetic fields perpendicular and parallel to the c-axis, respectively.

preprint2020arXiv

Large anomalous Hall angle in a topological semimetal candidate TbPtBi

The magnetotransport properties of antiferromagnetic half-Heusler single crystals of TbPtBi, a magnetic-field-induced topological semimetal with a simple band structure, are investigated. We found a nonmonotonic magnetic field dependence of the anomalous Hall resistivity in a high magnetic field (B > 7 T), which comes from the change of band structure induced by the Zeeman-like splitting when applying the external magnetic field. The experimental results show that the credible anomalous Hall resistivity and conductivity reach up to 0.6798 mΩ cm and 125 $\Omega^{-1}\,\mathrm{cm}^{-1}$, respectively. A large anomalous Hall angle (AHA) of up to 33% is obtained in TbPtBi, which is comparable to that of a typical ferromagnetic Weyl semimetal. The analysis of the results shows that it should be attributed to the topological band around $E_F$ and the low carrier density.

preprint2020arXiv

Large anomalous Hall effect in a hexagonal ferromagnetic Fe5Sn3 single crystal

In this paper, we report an experimental observation of the large anomalous Hall effect (AHE) in a hexagonal ferromagnetic Fe5Sn3 single crystal with current along the b axis and a magnetic field normal to the bc plane. The intrinsic contribution of the anomalous Hall conductance $\sigma_{AH}^{\mathrm{int}}$ was approximately 613 $\Omega^{-1}\,\mathrm{cm}^{-1}$, which was more than 3 times the maximum value in the frustrated kagome magnet Fe3Sn2 and nearly independent of the temperature over a wide range between 5 and 350 K. The analysis results revealed that the large AHE was dominated by a common, intrinsic term, while the extrinsic contribution, i.e., the skew scattering and side jump, turned out to be small. In addition to the large AHE, it was found that the type of majority carrier changed at approximately 275 and 30 K, consistent with the critical temperatures of the spin reorientation. These findings suggest that the hexagonal ferromagnetic Fe5Sn3 single crystal is an excellent candidate for the study of topological features in ferromagnets.

preprint2020arXiv

Learning a Weakly-Supervised Video Actor-Action Segmentation Model with a Wise Selection

We address weakly-supervised video actor-action segmentation (VAAS), which extends general video object segmentation (VOS) to additionally consider action labels of the actors. The most successful methods on VOS synthesize a pool of pseudo-annotations (PAs) and then refine them iteratively. However, they face challenges as to how to select high-quality PAs from a massive pool, how to set an appropriate stop condition for weakly-supervised training, and how to initialize PAs pertaining to VAAS. To overcome these challenges, we propose a general Weakly-Supervised framework with a Wise Selection of training samples and model evaluation criterion (WS^2). Instead of blindly trusting quality-inconsistent PAs, WS^2 employs a learning-based selection to select effective PAs and a novel region integrity criterion as a stopping condition for weakly-supervised training. In addition, a 3D-Conv GCAM is devised to adapt to the VAAS task. Extensive experiments show that WS^2 achieves state-of-the-art performance on both weakly-supervised VOS and VAAS tasks and is on par with the best fully-supervised method on VAAS.

preprint2020arXiv

Light Field Spatial Super-resolution via Deep Combinatorial Geometry Embedding and Structural Consistency Regularization

Light field (LF) images acquired by hand-held devices usually suffer from low spatial resolution as the limited sampling resources have to be shared with the angular dimension. LF spatial super-resolution (SR) thus becomes an indispensable part of the LF camera processing pipeline. The high-dimensionality characteristic and complex geometrical structure of LF images make the problem more challenging than traditional single-image SR. The performance of existing methods is still limited as they fail to thoroughly explore the coherence among LF views and are insufficient in accurately preserving the parallax structure of the scene. In this paper, we propose a novel learning-based LF spatial SR framework, in which each view of an LF image is first individually super-resolved by exploring the complementary information among views with combinatorial geometry embedding. For accurate preservation of the parallax structure among the reconstructed views, a regularization network trained over a structure-aware loss function is subsequently appended to enforce correct parallax relationships over the intermediate estimation. Our proposed approach is evaluated over datasets with a large number of testing images including both synthetic and real-world scenes. Experimental results demonstrate the advantage of our approach over state-of-the-art methods, i.e., our method not only improves the average PSNR by more than 1.0 dB but also preserves more accurate parallax details, at a lower computational cost.

preprint2020arXiv

Light Field Super-resolution via Attention-Guided Fusion of Hybrid Lenses

This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses, including a high-resolution camera surrounded by multiple low-resolution cameras. To tackle this challenge, we propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input from two complementary and parallel perspectives. Specifically, one module regresses a spatially consistent intermediate estimation by learning a deep multidimensional and cross-domain feature representation; the other one constructs another intermediate estimation, which maintains the high-frequency textures, by propagating the information of the high-resolution view. We finally leverage the advantages of the two intermediate estimations via the learned attention maps, leading to the final high-resolution LF image. Extensive experiments demonstrate the significant superiority of our approach over state-of-the-art ones. That is, our method not only improves the PSNR by more than 2 dB, but also preserves the LF structure much better. To the best of our knowledge, this is the first end-to-end deep learning method for reconstructing a high-resolution LF image with a hybrid input. We believe our framework could potentially decrease the cost of high-resolution LF data acquisition and also be beneficial to LF data storage and transmission. The code is available at https://github.com/jingjin25/LFhybridSR-Fusion.

preprint2020arXiv

Linearly Convergent Algorithm with Variance Reduction for Distributed Stochastic Optimization

This paper considers a distributed stochastic strongly convex optimization problem, where agents connected over a network aim to cooperatively minimize the average of all agents' local cost functions. Due to the stochasticity of gradient estimation and the distributed nature of the local objectives, fast linearly convergent distributed algorithms have not been achieved yet. This work proposes a novel distributed stochastic gradient tracking algorithm with variance reduction, where the local gradients are estimated by an increasing batch-size of sampled gradients. With an undirected connected communication graph and a geometrically increasing batch-size, the iterates are shown to converge in mean to the optimal solution at a geometric rate (achieving linear convergence). The iteration, communication, and oracle complexity for obtaining an $ε$-optimal solution are established as well. Particularly, the communication complexity is $\mathcal{O}(\ln (1/ε))$ while the oracle complexity (number of sampled gradients) is $\mathcal{O}(1/ε^2)$, which is of the same order as that of centralized approaches. Hence, the proposed scheme is communication-efficient without requiring extra sampled gradients. Numerical simulations are given to demonstrate the theoretical results.

preprint2020arXiv

Magnetically induced metal-insulator transition in Pb2CaOsO6

We report on the structural, magnetic, and electronic properties of two new double-perovskites synthesized under high pressure: Pb2CaOsO6 and Pb2ZnOsO6. Upon cooling below 80 K, Pb2CaOsO6 simultaneously undergoes a metal-insulator transition and develops antiferromagnetic order. Pb2ZnOsO6, on the other hand, remains a paramagnetic metal down to 2 K. The key difference between the two compounds lies in their crystal structure. The Os atoms in Pb2ZnOsO6 are arranged on an approximately face-centred cubic lattice with strong antiferromagnetic nearest-neighbor exchange couplings. The geometrical frustration inherent to this lattice prevents magnetic order from forming down to the lowest temperatures. In contrast, the unit cell of Pb2CaOsO6 is heavily distorted up to at least 500 K, including antiferroelectric-like displacements of the Pb and O atoms despite metallic conductivity above 80 K. This distortion relieves the magnetic frustration, facilitating magnetic order which in turn drives the metal-insulator transition. Our results suggest that the phase transition in Pb2CaOsO6 is spin-driven, and could be a rare example of a Slater transition.

preprint2020arXiv

Mobile Energy Transfer in Internet of Things

The Internet of things (IoT) is powering up smart cities by connecting all kinds of electronic devices. The power supply problem of IoT devices constitutes a major challenge in current IoT development, due to poor battery endurance as well as troublesome cable deployment. Wireless power transfer (WPT) technology has recently emerged as a promising solution. Yet, existing WPT advances cannot support free and mobile charging like Wi-Fi communications. To this end, the concept of mobile energy transfer (MET) is proposed, which relies critically on a resonant beam charging (RBC) technology. The adaptive RBC (ARBC) technology builds on RBC, but aims at improving the charging efficiency by adaptively charging devices at device-preferred current and voltage levels. A mobile ARBC scheme is developed relying on an adaptive source power control. Extensive numerical simulations using a 1,000 mAh Li-ion battery show that the mobile ARBC outperforms simple charging schemes such as constant power charging, profile-adaptive charging, and distance-adaptive charging in saving energy.

preprint2020arXiv

Policy Gradient from Demonstration and Curiosity

With reinforcement learning, an agent could learn complex behaviors from high-level abstractions of the task. However, exploration and reward shaping remained challenging for existing methods, especially in scenarios where the extrinsic feedback was sparse. Expert demonstrations have been investigated to solve these difficulties, but a tremendous number of high-quality demonstrations were usually required. In this work, an integrated policy gradient algorithm was proposed to boost exploration and facilitate intrinsic reward learning from only a limited number of demonstrations. We achieved this by reformulating the original reward function with two additional terms, where the first term measured the Jensen-Shannon divergence between the current policy and the expert, and the second term estimated the agent's uncertainty about the environment. The presented algorithm was evaluated on a range of simulated tasks with sparse extrinsic reward signals where only one single demonstrated trajectory was provided for each task; superior exploration efficiency and high average return were demonstrated in all tasks. Furthermore, it was found that the agent could imitate the expert's behavior and meanwhile sustain high return.

preprint2020arXiv

Ro-SOS: Metric Expression Network (MEnet) for Robust Salient Object Segmentation

Although deep CNNs have brought significant improvement to image saliency detection, most CNN based models are sensitive to distortion such as compression and noise. In this paper, we propose an end-to-end generic salient object segmentation model called Metric Expression Network (MEnet) to deal with saliency detection with the tolerance of distortion. Within MEnet, a new topological metric space is constructed, whose implicit metric is determined by the deep network. As a result, we manage to group all the pixels in the observed image semantically within this latent space into two regions: a salient region and a non-salient region. With this architecture, all feature extractions are carried out at the pixel level, enabling fine granularity of output boundaries of the salient objects. What's more, we try to give a general analysis for the noise robustness of the network in the sense of Lipschitz and Jacobian literature. Experiments demonstrate that robust salient maps facilitating object segmentation can be generated by the proposed metric. Tests on several public benchmarks show that MEnet has achieved desirable performance. Furthermore, by direct computation and measuring the robustness, the proposed method outperforms previous CNN-based methods on distorted inputs.

preprint2020arXiv

Room-temperature ferrimagnetism of anti-site-disordered Ca2MnOsO6

Room-temperature ferrimagnetism was discovered for the anti-site-disordered perovskite Ca2MnOsO6 with Tc = 305 K. Ca2MnOsO6 crystallizes into an orthorhombic structure with a space group of Pnma, in which Mn and Os share the oxygen-coordinated-octahedral site at an equal ratio without a noticeable ordered arrangement. The material is electrically semiconducting with variable-range-hopping behavior. X-ray absorption spectroscopy confirmed the trivalent state of the Mn and the pentavalent state of the Os. X-ray magnetic circular dichroism spectroscopy reveals that the Mn and Os magnetic moments are aligned antiferromagnetically, thereby classifying the material as a ferrimagnet which is in accordance with band structure calculations. It is intriguing that the magnetic signal of the Os is very weak, and that the observed total magnetic moment is primarily due to the Mn. The Tc = 305 K is the second highest in the material category of so-called disordered ferromagnets such as CaRu1-xMnxO3, SrRu1-xCrxO3, and CaIr1-xMnxO3, and hence, may support the development of spintronic oxides with relaxed requirements concerning the anti-site disorder of the magnetic ions.

preprint2020arXiv

RSI-CB: A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data

In recent years, deep convolutional neural networks (DCNNs) have seen breakthrough progress in natural image recognition because of three points: the universal approximation ability of DCNNs, large-scale databases (such as ImageNet), and supercomputing ability powered by GPUs. The remote sensing field is still lacking a large-scale benchmark comparable to ImageNet and Place2. In this paper, we propose a remote sensing image classification benchmark (RSI-CB) based on massive, scalable, and diverse crowdsource data. Using crowdsource data, such as Open Street Map (OSM) data, ground objects in remote sensing images can be annotated effectively by points of interest, vector data from OSM, or other crowdsource data. The annotated images can be used in remote sensing image classification tasks. Based on this method, we construct a worldwide large-scale benchmark for remote sensing image classification. This benchmark has two sub-datasets with 256 by 256 and 128 by 128 sizes because different DCNNs require different image sizes. The former contains 6 categories with 35 subclasses of more than 24,000 images. The latter contains 6 categories with 45 subclasses of more than 36,000 images. This classification system of ground objects is defined according to the national standard of land-use classification in China and is inspired by the hierarchy mechanism of ImageNet. Finally, we conduct many experiments to compare RSI-CB with the SAT-4, SAT-6, and UC-Merced datasets on handcrafted features, such as scale-invariant feature transform, color histogram, local binary patterns, and GIST, and classical DCNN models, such as AlexNet, VGGNet, GoogLeNet, and ResNet.

preprint2020arXiv

Scaling features in the spreading of COVID-19

Since the outbreak of COVID-19, many data analyses have been done. Some of them are based on the classical epidemiological approach that assumes an exponential growth, but a few studies report that a power-law scaling may provide a better fit to the currently available data. Here, we examine the data in China (01/20/2020 to 02/24/2020), and indeed find that the growth closely follows power-law kinetics over a significantly wide time period. The exponents are $2.48(20)$, $2.21(6)$ and $4.26(12)$ for the number of confirmed infections, deaths and cured cases, respectively, indicating an underlying small-world network structure in the pandemic. While no obvious deviations from the power-law growth can be seen yet for the number of deaths and cured cases, negative deviations have clearly appeared in the number of infections, particularly that for the region outside Hubei. This suggests the beginning of the slowing-down of the virus spreading due to the huge containment effort. Meanwhile, we find that despite the dramatic difference in magnitudes, the growth kinetics of the infection number exhibits much similarity for Hubei province and the region outside Hubei. On this basis, in a log-log plot, we rescale the infection number for the region outside Hubei such that it overlaps as much as possible with the total infection number in China, from which an approximate extrapolation yields the maximum of the pandemic around March 3, 2020, with the number of infections about $83,000$. Further, by analyzing the kinetics of the mortality in log-log scale, we obtain a rough estimate that near March 3, the death rate of COVID-19 would be about $4.7\%\sim 5.0\%$ for Hubei province and $0.7\%\sim 1.0\%$ for the region outside Hubei. We emphasize that our predictions may be quantitatively unreliable, since the data analysis is purely empirical and various assumptions are used.
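As a rough illustration of what fitting power-law kinetics in log-log scale involves, the sketch below estimates the exponent of $N(t) = a\,t^{b}$ by ordinary least squares on $\log N$ versus $\log t$. The function name `fit_power_law` and the synthetic series are assumptions made for this example only; they are not the authors' code or data, and the abstract does not state which fitting procedure was actually used.

```python
import math


def fit_power_law(times, counts):
    """Fit counts ~ a * times**b by least squares on the log-log values.

    Returns (a, b). All inputs must be strictly positive.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(n) for n in counts]
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space = exponent
    a = math.exp(y_mean - b * x_mean)  # intercept in log-log space = log(a)
    return a, b


# Synthetic, noise-free series (hypothetical values, not epidemiological data).
days = list(range(1, 31))
cases = [5.0 * d ** 2.5 for d in days]
prefactor, exponent = fit_power_law(days, cases)
print(f"prefactor={prefactor:.3f}, exponent={exponent:.3f}")
```

On a log-log plot a pure power law is a straight line whose slope is the exponent, whereas exponential growth curves upward, which is why the two hypotheses can be told apart in that representation.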

preprint2020arXiv

Secure State Estimation with Byzantine Sensors: A Probabilistic Approach

This paper studies static state estimation in multi-sensor settings, with a caveat that an unknown subset of the sensors are compromised by an adversary, whose measurements can be manipulated arbitrarily. The attacker is able to compromise $q$ out of $m$ sensors. A new performance metric, which quantifies the asymptotic decay rate for the probability of having an estimation error larger than $δ$, is proposed. We develop an optimal estimator for the new performance metric with a fixed $δ$, which is the Chebyshev center of a union of ellipsoids. We further provide an estimator that is optimal for every $δ$, for the special case where the sensors are homogeneous. Numerical examples are given to elaborate the results.

preprint2020arXiv

Self-synchronization of thermal phonons at equilibrium

Self-synchronization is a ubiquitous phenomenon in nature, in which oscillators are collectively locked in frequency and phase through mutual interactions. While self-synchronization requires the forced excitation of at least one of the oscillators, we demonstrate that this mechanism spontaneously appears due to the activation from thermal fluctuations. By performing molecular dynamics simulations, we demonstrate the self-synchronization of thermal phonons in a platform supporting doped silicon resonators. We find that thermal phonons spontaneously converge to the same frequency and phase. In addition, the dependences on intrinsic frequency difference and coupling strength agree well with the Kuramoto model predictions. More interestingly, we find that a balance between energy dissipation resulting from phonon-phonon scattering and potential energy between oscillators is required to maintain synchronization. Finally, a wavelet transform approach corroborates the generation of coherent thermal phonons in the collective state of oscillators. Our study provides a new perspective on self-synchronization and on the relationship between fluctuations and coherence.

preprint2020arXiv

Study of polycrystalline bulk Sr$_3$OsO$_6$ double-perovskite insulator: comparison with 1000 K ferromagnetic epitaxial films

Polycrystalline Sr$_3$OsO$_6$, which is an ordered double-perovskite insulator, is synthesized via solid-state reaction under high-temperature and high-pressure conditions of 1200 $^\circ$C and 6 GPa. The synthesis enables us to conduct a comparative study of the bulk form of Sr$_3$OsO$_6$ toward revealing the driving mechanism of 1000 K ferromagnetism, which has recently been discovered for epitaxially grown Sr$_3$OsO$_6$ films. Unlike the film, the bulk is dominated by antiferromagnetism rather than ferromagnetism. Therefore, robust ferromagnetic order appears only when Sr$_3$OsO$_6$ is under the influence of interfaces. A specific heat capacity of 39.6(9) $\times$ 10$^{-3}$ J mol$^{-1}$ K$^{-2}$ is found at low temperatures ($<$17 K). This value is remarkably high, suggesting the presence of possible fermionic-like excitations at the magnetic ground state. Although the bulk and film forms of Sr$_3$OsO$_6$ share the same lattice basis and electrically insulating state, the magnetism is entirely different between them.

preprint2020arXiv

Supervised Machine Learning Techniques: An Overview with Applications to Banking

This article provides an overview of Supervised Machine Learning (SML) with a focus on applications to banking. The SML techniques covered include Bagging (Random Forest or RF), Boosting (Gradient Boosting Machine or GBM) and Neural Networks (NNs). We begin with an introduction to ML tasks and techniques. This is followed by a description of: i) tree-based ensemble algorithms including Bagging with RF and Boosting with GBMs, ii) Feedforward NNs, iii) a discussion of hyper-parameter optimization techniques, and iv) machine learning interpretability. The paper concludes with a comparison of the features of different ML algorithms. Examples taken from credit risk modeling in banking are used throughout the paper to illustrate the techniques and interpret the results of the algorithms.

preprint2020arXiv

Surrogate Locally-Interpretable Models with Supervised Machine Learning Algorithms

Supervised Machine Learning (SML) algorithms, such as Gradient Boosting, Random Forest, and Neural Networks, have become popular in recent years due to their superior predictive performance over traditional statistical methods. However, their complexity makes the results hard to interpret without additional tools. There has been a lot of recent work in developing global and local diagnostics for interpreting SML models. In this paper, we propose a locally-interpretable model that takes the fitted ML response surface, partitions the predictor space using model-based regression trees, and fits interpretable main-effects models at each of the nodes. We adapt the algorithm to be efficient in dealing with high-dimensional predictors. While the main focus is on interpretability, the resulting surrogate model also has reasonably good predictive performance.

preprint2020arXiv

The Transient Responses of An Axisymmetric Tropical Cyclone to Instantaneous Surface Roughening and Drying. Part I: Numerical Experiments

Inland tropical cyclone (TC) impacts due to high winds and rainfall-induced flooding depend strongly on the evolution of the wind field and precipitation distribution after landfall. However, research has yet to test the detailed response of a mature TC and its hazards to changes in surface forcing in idealized settings. This work tests the transient response of an idealized hurricane to instantaneous transitions in two key surface properties associated with landfall: surface roughening and drying. Simplified axisymmetric experiments are performed in CM1 where surface drag coefficient and evaporative fraction are each systematically modified beneath a mature hurricane. Surface drying stabilizes the eyewall and consequently weakens the overturning circulation, thereby reducing inward angular momentum transport that slowly decays the wind field only within the inner-core. In contrast, surface roughening initially ($\sim$12 hours) rapidly weakens the entire low-level wind field and enhances the overturning circulation dynamically despite the concurrent thermodynamic stabilization of the eyewall; thereafter the storm gradually decays similar to drying. As a result, total precipitation temporarily increases with roughening but uniformly decreases with drying. Storm size decreases monotonically and rapidly with surface roughening, while the radius of maximum wind can increase with moderate surface drying. Overall, this work provides a mechanistic foundation for understanding the inland evolution of real storms in nature.

preprint2020arXiv

Ultralow thermal conductivity from transverse acoustic phonon suppression in distorted crystalline α-MgAgSb

Low thermal conductivity is favorable for preserving the temperature gradient between the two ends of a thermoelectric material in order to ensure continuous electron current generation. In high-performance thermoelectric materials, there are two main low thermal conductivity mechanisms: phonon anharmonicity in PbTe and SnSe, and phonon scattering resulting from the dynamic disorder in AgCrSe2 and CuCrSe2, both of which have been successfully revealed by inelastic neutron scattering. Using neutron scattering and ab initio calculations, we report here a mechanism of static local structure distortion combined with phonon-anharmonic-induced ultralow lattice thermal conductivity in α-MgAgSb. Since the transverse acoustic phonons are almost fully scattered by the compound's intrinsic distorted rocksalt sublattice, the heat is mainly transported by the longitudinal acoustic phonons. The ultralow thermal conductivity in α-MgAgSb is attributed to its atomic dynamics being altered by the structure distortion, which presents a possible microscopic route to enhance the performance of similar thermoelectric materials.

preprint2019arXiv

Charging of quantum batteries with general harmonic power

We analyse the charging process of quantum batteries with general harmonic power. To describe the charge efficiency, we introduce the charge saturation and the charging power, and divide the charging mode into the saturated charging mode and the unsaturated charging mode. The relationships between the time-dependent charge saturation and the parameters of the general driving field are discussed both analytically and numerically. According to the Floquet theorem, we give the expressions of the time-dependent charge saturation in terms of the quasienergy and the Floquet states of the system. With both the analytical and numerical results, we find the optimal parameters to reach the best charging efficiency.

preprint2019arXiv

Comb-mode-resolved adaptive sampling terahertz dual-comb spectroscopy with a free-running single-cavity fiber laser

Mode-resolved dual-comb spectroscopy (DCS) is an emerging spectroscopic tool with the potential to simultaneously achieve a broad spectral coverage and ultrahigh spectral resolution in terahertz (THz) spectroscopy. However, the need for two independently stabilized ultrafast lasers significantly hampers the potential application of DCS techniques. In this article, we demonstrate mode-resolved DCS in the THz region based on a free-running single-cavity dual-comb fiber laser with adaptive sampling. Low-pressure spectroscopy of acetonitrile gas with absorption features approaching the Doppler limit is demonstrated by comb-mode-resolved measurements with a spectral sampling spacing of 48.8 MHz, a spectral resolution of less than 5 MHz and a signal-to-noise ratio of ~50 dB. The successful demonstration of the proposed method clearly indicates the great potential for the realization of low-complexity, MHz-resolution THz spectroscopy instrumentation.

preprint2019arXiv

Relaxation-Based Coarsening for Multilevel Hypergraph Partitioning

Multilevel partitioning methods that are inspired by principles of multiscaling are the most powerful practical hypergraph partitioning solvers. Hypergraph partitioning has many applications in disciplines ranging from scientific computing to data science. In this paper we introduce the concept of algebraic distance on hypergraphs and demonstrate its use as an algorithmic component in the coarsening stage of multilevel hypergraph partitioning solvers. The algebraic distance is a vertex distance measure that extends hyperedge weights for capturing the local connectivity of vertices which is critical for hypergraph coarsening schemes. The practical effectiveness of the proposed measure and corresponding coarsening scheme is demonstrated through extensive computational experiments on a diverse set of problems. Finally, we propose a benchmark of hypergraph partitioning problems to compare the quality of other solvers.