Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
50 works
0 followers
39 topics
4 close collaborators

Published work

50 published item(s)

preprint | 2026 | arXiv

Benchmarking and Improving GUI Agents in High-Dynamic Environments

Recent advancements in Graphical User Interface (GUI) agents have predominantly focused on training paradigms like supervised fine-tuning (SFT) and reinforcement learning (RL). However, the challenge of high-dynamic GUI environments remains largely underexplored. Existing agents typically rely on a single screenshot after each action for decision-making, leading to a partially observable (or even unobservable) Markov decision process, where the key GUI state, including information important for actions, is often inadequately captured. To systematically explore this challenge, we introduce DynamicGUIBench, a comprehensive online GUI benchmark spanning ten applications and diverse interaction scenarios characterized by important interface changes between actions. Furthermore, we present DynamicUI, an agent designed for dynamic interfaces, which takes screen-recording videos of the interaction process as input and consists of three components: a dynamic perceiver, a refinement strategy, and a reflection module. Specifically, the dynamic perceiver clusters frames of the GUI video, generates captions for the centroids, and iteratively selects the most informative frames as the salient dynamic context. Because there may be inconsistencies and noise between the selected frames and the textual context of the agent, the refinement strategy employs action-conditioned filtering to refine the agent's thoughts, mitigating thought-action inconsistency and redundancy. Based on the refined agent trajectories, the reflection module provides effective and accurate guidance for further actions. Experiments on DynamicGUIBench demonstrate that DynamicUI significantly improves performance in dynamic GUI environments, while maintaining competitive performance on other public benchmarks.

preprint | 2022 | arXiv

A Wavelet-CNN-LSTM Model for Tailings Pond Risk Prediction

Tailings ponds store industrial waste. If a tailings pond collapses, nearby villages can be destroyed and the harmful chemicals cause serious environmental pollution. There is an urgent need for a reliable forecasting model that can track the trend of the stability coefficient of the tailings dam and issue early warnings. To fill this gap, this work presents a hybrid network, a wavelet-based Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) model, namely the Wavelet-CNN-LSTM network, for predicting tailings pond risk. First, we construct a special nonlinear data processing method that imputes missing values with the numerical inversion (NI) method, which combines correlation analysis, sensitivity analysis, and Random Forest (RF) algorithms. Second, a new forecasting model is proposed to monitor the saturation line, which is the lifeline of the tailings pond and directly reflects its stability. After the discrete wavelet transform (DWT) decomposes the original saturation line data into 4-layer wavelets and de-noises the data, the CNN identifies and learns the spatial structures in the time series, followed by LSTM cells that detect the long- and short-term dependence. Finally, experiments evaluate the effectiveness of our model by comparing it with other state-of-the-art algorithms. The results show that Wavelet-CNN-LSTM achieves the best score in mean absolute percentage error (MAPE), root-mean-square error (RMSE), and $R^2$.
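
The three evaluation metrics named in this abstract can be sketched as follows. This is a minimal illustration using the common textbook definitions, not the authors' evaluation code; the function names and the sample values are hypothetical.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no true value is zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical saturation-line readings and forecasts (illustrative values only).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 33.0, 38.0]

print(f"MAPE = {mape(observed, predicted):.2f}%")
print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"R^2  = {r2(observed, predicted):.3f}")
```

Lower MAPE and RMSE and a higher $R^2$ indicate a better fit, which is the sense in which a model can score best on all three at once.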

preprint | 2022 | arXiv

An End-to-End Cascaded Image Deraining and Object Detection Neural Network

While deep learning-based image deraining methods have made great progress in recent years, there are two major shortcomings in their application to real-world situations. First, the gap between the low-level vision task represented by rain removal and the high-level vision task represented by object detection is significant, and the low-level vision task can hardly contribute to the high-level vision task. Second, the quality of deraining datasets needs to be improved. In fact, the rain lines in many baselines differ greatly from real rain lines, and the resolution of deraining dataset images is generally not ideal. Meanwhile, there are few common datasets for both the low-level vision task and the high-level vision task. In this paper, we explore the combination of the low-level vision task with the high-level vision task. Specifically, we propose an end-to-end object detection network for reducing the impact of rainfall, which consists of two cascaded networks: an improved image deraining network and an object detection network. We also design the components of the loss function to accommodate the characteristics of the different sub-networks. We then propose a dataset based on the KITTI dataset for rainfall removal and object detection, on which our network surpasses the state-of-the-art with a significant improvement in metrics. In addition, our proposed network is evaluated on driving videos collected by self-driving vehicles and shows positive results for rain removal and object detection.

preprint | 2022 | arXiv

Attacking Black-box Recommendations via Copying Cross-domain User Profiles

Recently, recommender systems that aim to suggest personalized lists of items for users to interact with online have drawn a lot of attention. In fact, many of these state-of-the-art techniques have been deep learning based. Recent studies have shown that these deep learning models (in particular for recommendation systems) are vulnerable to attacks, such as data poisoning, which generates users to promote a selected set of items. However, more recently, defense strategies have been developed to detect these generated users with fake profiles. Thus, designing advanced injection attacks that create more 'realistic' user profiles to promote a set of items remains a key challenge in the domain of deep learning based recommender systems. In this work, we present our framework CopyAttack, which is a reinforcement learning based black-box attack method that harnesses real users from a source domain by copying their profiles into the target domain with the goal of promoting a subset of items. CopyAttack is constructed to both efficiently and effectively learn policy gradient networks that first select, and then further refine/craft, user profiles from the source domain to ultimately copy into the target domain. CopyAttack's goal is to maximize the hit ratio of the targeted items in the Top-$k$ recommendation list of the users in the target domain. We have conducted experiments on two real-world datasets, empirically verified the effectiveness of our proposed framework, and performed a thorough model analysis.
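
The attack objective above, the hit ratio of targeted items in users' Top-$k$ lists, can be illustrated with a short sketch. It assumes one common definition (the fraction of users whose Top-$k$ list contains at least one target item); the paper may define or average the metric differently, and all names and data here are hypothetical.

```python
def hit_ratio_at_k(ranked_lists, target_items, k):
    """Fraction of users whose top-k recommendations include any target item.

    ranked_lists: one ranked item list per user, best item first.
    target_items: the items the attacker wants to promote.
    """
    targets = set(target_items)
    hits = sum(1 for items in ranked_lists if targets.intersection(items[:k]))
    return hits / len(ranked_lists)


# Hypothetical recommendation lists for four target-domain users.
recommendations = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [3, 1, 7],
]

print(hit_ratio_at_k(recommendations, target_items=[3], k=2))
print(hit_ratio_at_k(recommendations, target_items=[3], k=3))
```

Under this definition a promotion attack succeeds to the extent that the value rises after the copied profiles are injected.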

preprint | 2022 | arXiv

Combining Intra-Risk and Contagion Risk for Enterprise Bankruptcy Prediction Using Graph Neural Networks

Predicting the bankruptcy risk of small and medium-sized enterprises (SMEs) is an important step for financial institutions when making decisions about loans. Existing studies in both finance and AI research fields, however, tend to consider only either the intra-risk or the contagion risk of enterprises, ignoring their interactions and combinatorial effects. This study is the first to consider both types of risk and their joint effects in bankruptcy prediction. Specifically, we first propose an enterprise intra-risk encoder based on statistically significant enterprise risk indicators for intra-risk learning. Then, we propose an enterprise contagion risk encoder based on enterprise relation information from an enterprise knowledge graph for contagion risk embedding. In particular, the contagion risk encoder includes both the newly proposed Hyper-Graph Neural Networks and Heterogeneous Graph Neural Networks, which model contagion risk in two different aspects, i.e., common risk factors based on hyperedges and direct diffusion risk from neighbors, respectively. To evaluate the model, we collect real-world multi-source data on SMEs and build a novel benchmark dataset called SMEsD. We provide open access to the dataset, which is expected to further promote research on financial risk analysis. Experiments on SMEsD against twelve state-of-the-art baselines demonstrate the effectiveness of the proposed model for bankruptcy prediction.

preprint | 2022 | arXiv

Conceptor Learning for Class Activation Mapping

Class Activation Mapping (CAM) has been widely adopted to generate saliency maps which provide visual explanations for deep neural networks (DNNs). The saliency maps are conventionally generated by fusing the channels of the target feature map using a weighted average scheme. It is a weak model for the inter-channel relation, in the sense that it only models the relation among channels in a contrastive way (i.e., channels that play key roles in the prediction are given higher weights for them to stand out in the fusion). The collaborative relation, which makes the channels work together to provide cross reference, has been ignored. Furthermore, the model has neglected the intra-channel relation entirely. In this paper, we address this problem by introducing Conceptor learning into CAM generation. Conceptor learning was originally proposed to model the patterns of state changes in recurrent neural networks (RNNs). By relaxing the dependency of Conceptor learning on RNNs, we make Conceptor-CAM not only generalizable to more DNN architectures but also able to learn both the inter- and intra-channel relations for better saliency map generation. Moreover, we have enabled the use of Boolean operations to combine the positive and pseudo-negative evidence, which has made the CAM inference more robust and comprehensive. The effectiveness of Conceptor-CAM has been validated with both formal verifications and experiments on the dataset of the largest scale in the literature. The experimental results show that Conceptor-CAM is compatible with and can bring significant improvement to all well-recognized CAM-based methods, and has outperformed the state-of-the-art methods by 43.14%~72.79% (88.39%~168.15%) on ILSVRC2012 in Average Increase (Drop), 15.42%~42.55% (47.09%~372.09%) on VOC, and 17.43%~31.32% (47.54%~206.45%) on COCO, respectively.
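
For context, the conventional channel fusion that this abstract criticizes can be sketched as a per-channel weighted sum. This is an illustration of the baseline scheme only, not of Conceptor-CAM; the trailing ReLU follows Grad-CAM-style practice and is an assumption here, as are the function name and the toy data.

```python
def weighted_channel_fusion(feature_maps, weights):
    """Fuse channels into one saliency map: ReLU(sum_k w_k * A_k).

    feature_maps: list of channels, each a 2-D list of activations.
    weights: one scalar weight per channel.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for channel, weight in zip(feature_maps, weights):
        for i in range(height):
            for j in range(width):
                fused[i][j] += weight * channel[i][j]
    # Keep only positive evidence (assumed, Grad-CAM style).
    return [[max(0.0, value) for value in row] for row in fused]


# Two hypothetical 2x2 channels; the second gets a negative weight.
channels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
saliency = weighted_channel_fusion(channels, weights=[1.0, -0.5])
print(saliency)
```

Each channel contributes only through its own scalar weight, which is why the abstract describes this scheme as capturing contrast between channels but not how channels cooperate or how positions within a channel relate.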

preprint | 2022 | arXiv

Cycle Encoding of a StyleGAN Encoder for Improved Reconstruction and Editability

GAN inversion aims to invert an input image into the latent space of a pre-trained GAN. Despite the recent advances in GAN inversion, there remain challenges to mitigate the tradeoff between distortion and editability, i.e. reconstructing the input image accurately and editing the inverted image with a small visual quality drop. The recently proposed pivotal tuning model makes significant progress towards reconstruction and editability, by using a two-step approach that first inverts the input image into a latent code, called pivot code, and then alters the generator so that the input image can be accurately mapped into the pivot code. Here, we show that both reconstruction and editability can be improved by a proper design of the pivot code. We present a simple yet effective method, named cycle encoding, for a high-quality pivot code. The key idea of our method is to progressively train an encoder in varying spaces according to a cycle scheme: W->W+->W. This training methodology preserves the properties of both W and W+ spaces, i.e. high editability of W and low distortion of W+. To further decrease the distortion, we also propose to refine the pivot code with an optimization-based method, where a regularization term is introduced to reduce the degradation in editability. Qualitative and quantitative comparisons to several state-of-the-art methods demonstrate the superiority of our approach.

preprint | 2022 | arXiv

Graph Trend Filtering Networks for Recommendations

Recommender systems aim to provide personalized services to users and are playing an increasingly important role in our daily lives. The key to recommender systems is predicting how likely users are to interact with items based on their historical online behaviors, e.g., clicks, add-to-cart actions, and purchases. To exploit these user-item interactions, there are increasing efforts to treat them as a user-item bipartite graph and then perform information propagation in the graph via Graph Neural Networks (GNNs). Given the power of GNNs in graph representation learning, these GNN-based recommendation methods have remarkably boosted recommendation performance. Despite their success, most existing GNN-based recommender systems overlook the existence of interactions caused by unreliable behaviors (e.g., random/bait clicks) and treat all interactions uniformly, which can lead to sub-optimal and unstable performance. In this paper, we investigate the drawbacks (e.g., non-adaptive propagation and non-robustness) of existing GNN-based recommendation methods. To address these drawbacks, we introduce a principled graph trend collaborative filtering method and propose the Graph Trend Filtering Networks for recommendations (GTN) that can capture the adaptive reliability of the interactions. Comprehensive experiments and ablation studies are presented to verify and understand the effectiveness of the proposed framework. Our implementation based on PyTorch is available at https://github.com/wenqifan03/GTN-SIGIR2022.

preprint | 2022 | arXiv

Hexagonal Boron Nitride (hBN) as a Low-loss Dielectric for Superconducting Quantum Circuits and Qubits

Dielectrics with low loss at microwave frequencies are imperative for high-coherence solid-state quantum computing platforms. We study the dielectric loss of hexagonal boron nitride (hBN) thin films in the microwave regime by measuring the quality factor of parallel-plate capacitors (PPCs) made of NbSe$_{2}$-hBN-NbSe$_{2}$ heterostructures integrated into superconducting circuits. The extracted microwave loss tangent of hBN is bounded to be at most in the mid-10$^{-6}$ range in the low-temperature, single-photon regime. We integrate hBN PPCs with aluminum Josephson junctions to realize transmon qubits with coherence times reaching 25 $\mu$s, consistent with the hBN loss tangent inferred from resonator measurements. The hBN PPC reduces the qubit feature size by approximately two orders of magnitude compared to conventional all-aluminum coplanar transmons. Our results establish hBN as a promising dielectric for building high-coherence quantum circuits with a substantially reduced footprint and with a high energy participation that helps to reduce unwanted qubit cross-talk.

preprint | 2022 | arXiv

Hierarchical Deep Reinforcement Learning for VWAP Strategy Optimization

Designing an intelligent volume-weighted average price (VWAP) strategy is a critical concern for brokers, since traditional rule-based strategies are relatively static and cannot achieve a lower transaction cost in a dynamic market. Many studies have tried to minimize the cost via reinforcement learning, but there are bottlenecks in improvement, especially for long-duration strategies such as the VWAP strategy. To address this issue, we propose a joint deep learning and hierarchical reinforcement learning architecture termed Macro-Meta-Micro Trader (M3T) to capture market patterns and execute orders at different temporal scales. The Macro Trader first allocates a parent order into tranches based on volume profiles as the traditional VWAP strategy does, but a long short-term memory neural network is used to improve the forecasting accuracy. Then the Meta Trader selects a short-term subgoal appropriate to instant liquidity within each tranche to form a mini-tranche. The Micro Trader subsequently extracts the instant market state and fulfils the subgoal with the lowest transaction cost. Our experiments on stocks listed on the Shanghai Stock Exchange demonstrate that our approach outperforms baselines in terms of VWAP slippage, with an average cost saving of 1.16 basis points compared to the optimal baseline.
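
The VWAP slippage metric mentioned above can be made concrete with a small sketch. It assumes the usual definitions: VWAP as the volume-weighted mean price, and slippage as the relative gap between the execution VWAP and the market VWAP, expressed in units of one hundredth of a percent. The function names and trade data are hypothetical and are not taken from the paper.

```python
def vwap(prices, volumes):
    """Volume-weighted average price of a sequence of trades."""
    return sum(p * v for p, v in zip(prices, volumes)) / sum(volumes)


def slippage_bps(execution_vwap, market_vwap, side="buy"):
    """Execution cost relative to market VWAP, in hundredths of a percent.

    Positive values mean the order did worse than the market VWAP:
    a buy filled above it, or a sell filled below it.
    """
    sign = 1.0 if side == "buy" else -1.0
    return sign * (execution_vwap - market_vwap) / market_vwap * 1e4


# Hypothetical market trades and our own fills over the same interval.
market = vwap([10.00, 10.10, 10.20], [1000, 2000, 1000])
ours = vwap([10.00, 10.10, 10.22], [100, 200, 100])

print(f"market VWAP = {market:.4f}")
print(f"our VWAP    = {ours:.4f}")
print(f"slippage    = {slippage_bps(ours, market):.2f}")
```

On this scale, a saving of about one unit corresponds to roughly 0.01% of traded value.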

preprint | 2022 | arXiv

Improved three-dimensional thermal multiphase lattice Boltzmann model for liquid-vapor phase change

Modeling liquid-vapor phase change using the lattice Boltzmann (LB) method has attracted significant attention in recent years. In this paper, we propose an improved three-dimensional (3D) thermal multiphase LB model for simulating liquid-vapor phase change. The proposed model has the following features. First, it is still within the framework of the thermal LB method using a temperature distribution function and therefore retains the fundamental advantages of the thermal LB method. Second, in the existing thermal LB models for liquid-vapor phase change, the finite-difference computations of the gradient terms $\nabla \cdot u$ and $\nabla T$ usually require special treatment at boundary nodes, while in the proposed thermal LB model these two terms are calculated locally. Moreover, in some of the existing thermal LB models, the error term ${\partial _{t_0}}\left( {Tu} \right)$ is eliminated by adding local correction terms to the collision process in the moment space, which causes these thermal LB models to be limited to the D2Q9 lattice in two dimensions and the D3Q15 or D3Q19 lattice in three dimensions. Conversely, the proposed model does not suffer from such an error term and therefore the thermal LB equation can be constructed on the D3Q7 lattice, which simplifies the model and improves the computational efficiency. Numerical simulations are carried out to validate the accuracy and efficiency of the proposed thermal multiphase LB model for simulating liquid-vapor phase change.

preprint | 2022 | arXiv

Indicative Image Retrieval: Turning Blackbox Learning into Grey

Deep learning became the game changer for image retrieval soon after it was introduced. It promotes feature extraction (by representation learning) as the core of image retrieval, with relevance/matching evaluation reduced to simple similarity metrics. In many applications, we need the matching evidence to be indicated rather than just a ranked list (e.g., the locations of the target proteins/cells/lesions in medical images), much as matched words are highlighted in search engines. However, this is not easy to implement without explicit relevance/matching modeling. Deep representation learning models are not suitable because of their blackbox nature. In this paper, we revisit the importance of relevance/matching modeling in the deep learning era with an indicative retrieval setting. The study shows that it is possible to skip the representation learning and model the matching evidence directly. Removing the dependency on pre-trained models avoids many related issues (e.g., the domain gap between classification and retrieval, the detail diffusion caused by convolution, and so on). More importantly, the study demonstrates that the matching can be explicitly modeled and backtracked later to generate the matching evidence indications, which can improve the explainability of deep inference. Our method obtains the best performance in the literature on both Oxford-5k and Paris-6k, and sets a new record of 97.77% on Oxford-5k (97.81% on Paris-6k) without extracting any deep features.

preprint | 2022 | arXiv

IPDAE: Improved Patch-Based Deep Autoencoder for Lossy Point Cloud Geometry Compression

The point cloud is a crucial representation of 3D content and has been widely used in areas such as virtual reality, mixed reality, and autonomous driving. As the number of points in the data grows, compressing point clouds efficiently becomes a challenging problem. In this paper, we propose a set of significant improvements to patch-based point cloud compression, i.e., a learnable context model for entropy coding, octree coding for sampling centroid points, and an integrated compression and training process. In addition, we propose an adversarial network to improve the uniformity of points during reconstruction. Our experiments show that the improved patch-based autoencoder outperforms the state-of-the-art in terms of rate-distortion performance, on both sparse and large-scale point clouds. More importantly, our method can maintain a short compression time while ensuring the reconstruction quality.

preprint | 2022 | arXiv

Knowledge-enhanced Black-box Attacks for Recommendations

Recent studies have shown that deep neural network-based recommender systems are vulnerable to adversarial attacks, where attackers can inject carefully crafted fake user profiles (i.e., a set of items that fake users have interacted with) into a target recommender system to achieve malicious purposes, such as promoting or demoting a set of target items. Due to security and privacy concerns, it is more practical to perform adversarial attacks under the black-box setting, where the architecture/parameters and training data of target systems cannot be easily accessed by attackers. However, generating high-quality fake user profiles under the black-box setting is rather challenging with limited access to target systems. To address this challenge, in this work, we introduce a novel strategy that leverages items' attribute information (i.e., the items' knowledge graph), which can be publicly accessible and provides rich auxiliary knowledge to enhance the generation of fake user profiles. More specifically, we propose a knowledge graph-enhanced black-box attacking framework (KGAttack) to effectively learn attacking policies through deep reinforcement learning techniques, in which the knowledge graph is seamlessly integrated into hierarchical policy networks to generate fake user profiles for performing adversarial black-box attacks. Comprehensive experiments on various real-world datasets demonstrate the effectiveness of the proposed attacking framework under the black-box setting.

preprint | 2022 | arXiv

Learning Bi-typed Multi-relational Heterogeneous Graph via Dual Hierarchical Attention Networks

Bi-typed multi-relational heterogeneous graphs (BMHGs) are among the most common graphs in practice, for example, academic networks, e-commerce user behavior graphs, and enterprise knowledge graphs. Learning a numerical representation for each node that characterizes subtle structures is a critical and challenging problem. However, most previous studies treat all node relations in a BMHG as the same class of relation without distinguishing the different characteristics of the intra-class and inter-class relations of the bi-typed nodes, causing the loss of significant structural information. To address this issue, we propose novel Dual Hierarchical Attention Networks (DHAN) based on bi-typed multi-relational heterogeneous graphs to learn comprehensive node representations with intra-class and inter-class attention-based encoders under a hierarchical mechanism. Specifically, the former encoder aggregates information from nodes of the same type, while the latter aggregates node representations from neighbors of different types. Moreover, to sufficiently model multi-relational node information in a BMHG, we adopt a newly proposed hierarchical mechanism. By doing so, the proposed dual hierarchical attention operations enable our model to fully capture the complex structures of bi-typed multi-relational heterogeneous graphs. Experimental results on various tasks against state-of-the-art methods confirm the capability of DHAN in learning node representations on BMHGs.

preprint | 2022 | arXiv

Mass and Age determination of the LAMOST data with different Machine Learning methods

We present a catalog of 948,216 stars with mass labels and a catalog of 163,105 red clump (RC) stars with both mass and age labels. The training dataset is cross-matched between LAMOST (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope) DR5 and high-resolution asteroseismology data; mass and age are predicted by the random forest method or the convex hull algorithm. The stellar parameters highly correlated with mass and age are extracted, and the test dataset shows that the median relative error of the prediction model is 3% for the mass of the large sample, and 4% and 7% for the mass and age of red clump stars, respectively. We also compare the predicted ages of red clump stars with recent works and find that the final uncertainty of the RC sample could reach 18% for age and 9% for mass; meanwhile, the final precision of the mass for the large sample with different types of stars could reach 13% without considering systematics. These results imply that this method could be widely used in the future. Moreover, we explore the performance of different machine learning methods for our sample, including Bayesian linear regression (BYS), gradient boosting decision tree (GBDT), multilayer perceptron (MLP), multiple linear regression (MLR), random forest (RF), and support vector regression (SVR). Finally, we find that the performance of nonlinear models is generally better than that of linear models, and that the GBDT and RF methods perform relatively better.

preprint | 2022 | arXiv

Multi-View Self-Attention Based Transformer for Speaker Recognition

Initially developed for natural language processing (NLP), the Transformer model is now widely used for speech processing tasks such as speaker recognition, due to its powerful sequence modeling capabilities. However, conventional self-attention mechanisms were originally designed for modeling textual sequences without considering the characteristics of speech and speaker modeling. Moreover, different Transformer variants for speaker recognition have not been well studied. In this work, we propose a novel multi-view self-attention mechanism and present an empirical study of different Transformer variants with or without the proposed attention mechanism for speaker recognition. Specifically, to balance the capabilities of capturing global dependencies and modeling locality, we propose a multi-view self-attention mechanism for the speaker Transformer, in which different attention heads can attend to different ranges of the receptive field. Furthermore, we introduce and compare five Transformer variants with different network architectures, embedding locations, and pooling methods to learn speaker embeddings. Experimental results on the VoxCeleb1 and VoxCeleb2 datasets show that the proposed multi-view self-attention mechanism improves speaker recognition performance, and the proposed speaker Transformer network attains excellent results compared with state-of-the-art models.
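
One way to picture heads that attend to different ranges of the receptive field is a set of per-head band masks. The sketch below is an assumption made for illustration: the abstract does not state how the ranges are enforced, and the window sizes and function name are hypothetical.

```python
def band_mask(seq_len, window):
    """Boolean attention mask: entry [i][j] is True if frame i may attend to frame j.

    window is the maximum allowed distance |i - j|; None means unrestricted
    (global) attention.
    """
    return [
        [window is None or abs(i - j) <= window for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Hypothetical three-head layout: two local views and one global view.
head_windows = [1, 2, None]
masks = [band_mask(5, w) for w in head_windows]

for window, mask in zip(head_windows, masks):
    print(f"window={window}: frame 0 attends to {mask[0]}")
```

In a full attention layer, masked-out positions would typically have their scores set to a large negative value before the softmax, so local heads focus on nearby frames while the global head still captures long-range dependencies.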

preprint | 2022 | arXiv

Observation of anomalous amplitude modes in the kagome metal CsV$_3$Sb$_5$

The charge-density wave (CDW) phase is often accompanied by the condensation of a soft acoustic phonon mode, giving rise to lattice distortion and charge density modulation. This picture was challenged for the recently discovered kagome metal CsV$_3$Sb$_5$, based on the evidence of absence of soft phonons. Here we report the observation of Raman-active CDW amplitude modes in this material, which are collective excitations typically thought to emerge out of frozen soft phonons. The amplitude modes strongly hybridize with other superlattice modes, imparting them with clear temperature-dependent frequency shift and broadening, rarely seen in other known CDW materials. Both the mode mixing and the large amplitude mode frequencies suggest that the CDW exhibits the character of strong electron-phonon coupling, a regime in which acoustic phonon softening can cease to exist. The observation of amplitude modes in the absence of soft phonons highlights the unconventional nature of the CDW in CsV$_3$Sb$_5$.

preprint | 2022 | arXiv

Pareto Optimization for Active Learning under Out-of-Distribution Data Scenarios

Pool-based Active Learning (AL) has achieved great success in minimizing labeling cost by sequentially selecting informative unlabeled samples from a large unlabeled data pool and querying their labels from oracle/annotators. However, existing AL sampling strategies might not work well in out-of-distribution (OOD) data scenarios, where the unlabeled data pool contains some data samples that do not belong to the classes of the target task. Achieving good AL performance under OOD data scenarios is a challenging task due to the natural conflict between AL sampling strategies and OOD sample detection. AL selects data that are hard to be classified by the current basic classifier (e.g., samples whose predicted class probabilities have high entropy), while OOD samples tend to have more uniform predicted class probabilities (i.e., high entropy) than in-distribution (ID) data. In this paper, we propose a sampling scheme, Monte-Carlo Pareto Optimization for Active Learning (POAL), which selects optimal subsets of unlabeled samples with fixed batch size from the unlabeled data pool. We cast the AL sampling task as a multi-objective optimization problem, and thus we utilize Pareto optimization based on two conflicting objectives: (1) the normal AL data sampling scheme (e.g., maximum entropy), and (2) the confidence of not being an OOD sample. Experimental results show its effectiveness on both classical Machine Learning (ML) and Deep Learning (DL) tasks.
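
The conflict between the two objectives can be illustrated with a per-sample Pareto front. This is a simplification: the abstract describes Monte-Carlo Pareto optimization over fixed-size subsets, whereas the sketch below only filters individual candidates that are not dominated on (entropy, in-distribution confidence). All names and scores are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pareto_front(points):
    """Indices of points not dominated by any other, maximising both objectives."""
    front = []
    for i, (a1, a2) in enumerate(points):
        dominated = any(
            b1 >= a1 and b2 >= a2 and (b1 > a1 or b2 > a2)
            for j, (b1, b2) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# A near-uniform prediction has higher entropy than a confident one, which is
# why informative samples and OOD samples can look alike to an entropy sampler.
print(entropy([0.5, 0.5]) > entropy([0.9, 0.1]))

# Hypothetical (entropy score, confidence of being in-distribution) pairs.
candidates = [(0.9, 0.2), (0.6, 0.8), (0.5, 0.7), (0.2, 0.95), (0.9, 0.1)]
print(pareto_front(candidates))
```

Candidates on the front trade informativeness against the risk of being out-of-distribution; picking among them avoids committing to a single weighting of the two objectives.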

preprint | 2022 | arXiv

Saliency Attack: Towards Imperceptible Black-box Adversarial Attack

Deep neural networks are vulnerable to adversarial examples, even in the black-box setting where the attacker can access only the model output. Recent studies have devised effective black-box attacks with high query efficiency. However, such performance is often accompanied by compromises in attack imperceptibility, hindering the practical use of these approaches. In this paper, we propose to restrict the perturbations to a small salient region to generate adversarial examples that can hardly be perceived. This approach is readily compatible with many existing black-box attacks and can significantly improve their imperceptibility with little degradation in attack success rate. Further, we propose the Saliency Attack, a new black-box attack that refines the perturbations in the salient region to achieve even better imperceptibility. Extensive experiments show that, compared to state-of-the-art black-box attacks, our approach achieves much better imperceptibility scores, including most apparent distortion (MAD), $L_0$ and $L_2$ distances, and also obtains significantly higher success rates judged by a human-like threshold on MAD. Importantly, the perturbations generated by our approach are interpretable to some extent. Finally, the approach is also demonstrated to be robust to different detection-based defenses.
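
Restricting a perturbation to a salient region and measuring its $L_0$ and $L_2$ distances can be sketched as follows. The sketch assumes a single-channel image with pixel values in [0, 1] and a binary saliency mask supplied from elsewhere; it does not reproduce the Saliency Attack search procedure, and the names and values are hypothetical.

```python
import math


def apply_masked_perturbation(image, delta, mask):
    """Add delta only where mask is 1, clipping pixels to the range [0, 1]."""
    return [
        [min(1.0, max(0.0, px + d * m)) for px, d, m in zip(row_i, row_d, row_m)]
        for row_i, row_d, row_m in zip(image, delta, mask)
    ]


def l0_distance(a, b):
    """Number of pixels that differ between two images."""
    return sum(1 for ra, rb in zip(a, b) for x, y in zip(ra, rb) if x != y)


def l2_distance(a, b):
    """Euclidean distance between two images."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


# Hypothetical 2x3 grey image, uniform perturbation, and saliency mask.
image = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
delta = [[0.25, 0.25, 0.25], [0.25, 0.25, 0.25]]
mask = [[0, 1, 0], [0, 1, 1]]

adversarial = apply_masked_perturbation(image, delta, mask)
print(l0_distance(image, adversarial))
print(f"{l2_distance(image, adversarial):.4f}")
```

Because only masked pixels change, the $L_0$ distance is bounded by the size of the salient region, which is one reason a small mask helps imperceptibility.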

preprint | 2022 | arXiv

Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods

Actor-critic Reinforcement Learning (RL) algorithms have achieved impressive performance in continuous control tasks. However, they still suffer from two nontrivial obstacles, i.e., low sample efficiency and overestimation bias. To this end, we propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL). Our SDQ-CAL boosts Double Q-learning for off-policy actor-critic RL based on a modification of the Bellman optimality operator with Advantage Learning. Specifically, SDQ-CAL improves sample efficiency by modifying the reward to make it easier to distinguish, from experience, the optimal actions from the others. In addition, it mitigates the overestimation issue by updating a pair of critics simultaneously upon double estimators. Extensive experiments reveal that our algorithm realizes less biased value estimation and achieves state-of-the-art performance in a range of continuous control benchmark tasks. We release the source code of our method at https://github.com/LQNew/SDQ-CAL.

preprint2022arXiv

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning. The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets. After preprocessing the input speech/text through the pre-nets, the shared encoder-decoder network models the sequence-to-sequence transformation, and then the post-nets generate the output in the speech/text modality based on the output of the decoder. Leveraging large-scale unlabeled speech and text data, we pre-train SpeechT5 to learn a unified-modal representation, hoping to improve the modeling capability for both speech and text. To align the textual and speech information into this unified semantic space, we propose a cross-modal vector quantization approach that randomly mixes up speech/text states with latent units as the interface between encoder and decoder. Extensive evaluations show the superiority of the proposed SpeechT5 framework on a wide variety of spoken language processing tasks, including automatic speech recognition, speech synthesis, speech translation, voice conversion, speech enhancement, and speaker identification. We release our code and model at https://github.com/microsoft/SpeechT5.

preprint2022arXiv

Stock Movement Prediction Based on Bi-typed Hybrid-relational Market Knowledge Graph via Dual Attention Networks

Stock Movement Prediction (SMP) aims at predicting the future price trend of listed companies' stocks, which is a challenging task due to the volatile nature of financial markets. Recent financial studies show that the momentum spillover effect plays a significant role in stock fluctuation. However, previous studies typically learn only the simple connection information among related companies, and thus inevitably fail to model the complex relations of listed companies in the real financial market. To address this issue, we first construct a more comprehensive Market Knowledge Graph (MKG) which contains bi-typed entities, including listed companies and their associated executives, and hybrid relations, including explicit and implicit relations. Afterward, we propose DanSmp, a novel Dual Attention Network that learns the momentum spillover signals based upon the constructed MKG for stock prediction. Empirical experiments on our constructed datasets against nine SOTA baselines demonstrate that the proposed DanSmp is capable of improving stock prediction with the constructed MKG.

preprint2022arXiv

Towards General Deep Leakage in Federated Learning

Unlike traditional central training, federated learning (FL) improves the performance of the global model by sharing and aggregating local models rather than local data to protect the users' privacy. Although this training approach appears secure, some research has demonstrated that an attacker can still recover private data based on the shared gradient information. This on-the-fly reconstruction attack deserves to be studied in depth because it can occur at any stage of training, whether at the beginning or at the end of model training; no relevant dataset is required and no additional models need to be trained. We break through some unrealistic assumptions and limitations to apply this reconstruction attack in a broader range of scenarios. We propose methods that can reconstruct the training data from shared gradients or weights, corresponding to the FedSGD and FedAvg usage scenarios, respectively. We propose a zero-shot approach to restore labels even if there are duplicate labels in the batch. We study the relationship between the label and image restoration. We find that image restoration fails even if there is only one incorrectly inferred label in the batch; we also find that when batch images have the same label, the corresponding image is restored as a fusion of that class of images. Our approaches are evaluated on classic image benchmarks, including CIFAR-10 and ImageNet. The batch size, image quality, and the adaptability of the label distribution of our approach exceed those of GradInversion, the state-of-the-art.

preprint2022arXiv

Towards Sustainable Satellite Edge Computing

Recently, Low Earth Orbit (LEO) satellites have experienced rapid development, and satellite edge computing has emerged to address the limitation of the bent-pipe architecture in existing satellite systems. Introducing energy-consuming computing components in satellite edge computing increases the depth of battery discharge. This shortens battery life and affects the satellites' operation in orbit. In this paper, we aim to extend battery life by minimizing the depth of discharge for Earth observation missions. Facing the challenges of wireless uncertainty and energy harvesting dynamics, our work develops an online energy scheduling algorithm within an online convex optimization framework. Our algorithm achieves sub-linear regret, and the constraint violation asymptotically approaches zero. Simulation results show that our algorithm can reduce the depth of discharge significantly.

preprint2022arXiv

Tuning the competition between superconductivity and charge order in kagome superconductor Cs(V1-xNbx)3Sb5

The recently discovered coexistence of superconductivity and charge density wave order in the kagome systems AV3Sb5 (A = K, Rb, Cs) has stimulated enormous interest. According to theory, a vanadium-based kagome system may host a flat band, nontrivial linear dispersive Dirac surface states and electronic correlation. Despite intensive investigations, the origin of the charge density wave (CDW) order, how the superconductivity relates to the CDW, and whether the anomalous Hall effect (AHE) arises primarily from the kagome lattice or the CDW order all remain controversial. We report an extensive investigation on Cs(V1-xNbx)3Sb5 samples with systematic Nb doping. Our results show that the Nb doping induces apparent suppression of CDW order and promotes superconductivity; meanwhile, the AHE and magnetoresistance (MR) are significantly weakened together with the CDW order. Combined with our density functional calculations, we interpret these effects by an antiphase shift of the Fermi energy with respect to the saddle points near M and the Fermi surface centered around Γ. It is found that the former depletes the filled states for the CDW instability and worsens the nesting condition for CDW order, while the latter lifts the Fermi level upward and enlarges the Fermi surface surrounding the Γ point, and thus promotes superconductivity. Our results uncover a delicate but unusual competition between the CDW order and superconductivity.

preprint2021arXiv

No observation of chiral flux current in the topological kagome metal CsV$_{3}$Sb$_{5}$

Compounds with a kagome lattice usually host many exotic quantum states, including the quantum spin liquid, non-trivial topological Dirac bands and a strongly renormalized flat band. Recently an interesting vanadium-based kagome family $A$V$_{3}$Sb$_{5}$ ($A$ = K, Rb, or Cs) was discovered, and these materials exhibit multiple interesting properties, including an unconventional saddle-point-driven charge density wave (CDW) state and superconductivity. Furthermore, some experiments show an anomalous Hall effect, which suggests that there might be chiral flux current states. Here we report scanning tunneling measurements using spin-polarized tips. Although we have clearly observed the $2a_0\times2a_0$ CDW and $4a_0$ stripe orders, the well-designed experiments with refined spin-polarized tips do not reveal any trace of the chiral flux current phase in CsV$_3$Sb$_5$ within the limits of experimental accuracy. The absence of any observed local magnetic moment in our experiments may put an upper bound on the magnitude of magnetic moments induced by the possible chiral loop current, which breaks time-reversal symmetry along the $c$-axis in CsV$_{3}$Sb$_{5}$.

preprint2021arXiv

Solving Cold Start Problem in Recommendation with Attribute Graph Neural Networks

Matrix completion is a classic problem underlying recommender systems. It is traditionally tackled with matrix factorization. Recently, deep learning based methods, especially graph neural networks, have made impressive progress on this problem. Despite their effectiveness, existing methods focus on modeling the user-item interaction graph. The inherent drawback of such methods is that their performance is bound to the density of the interactions, which are usually highly sparse. More importantly, for a cold start user/item that does not have any interactions, such methods are unable to learn the preference embedding of the user/item since there is no link to this user/item in the graph. In this work, we develop a novel framework, Attribute Graph Neural Networks (AGNN), by exploiting the attribute graph rather than the commonly used interaction graph. This leads to the capability of learning embeddings for cold start users/items. Our AGNN can produce the preference embedding for a cold user/item by learning on the distribution of attributes with an extended variational auto-encoder structure. Moreover, we propose a new graph neural network variant, i.e., gated-GNN, to effectively aggregate various attributes of different modalities in a neighborhood. Empirical results on two real-world datasets demonstrate that our model yields significant improvements for cold start recommendations and outperforms or matches state-of-the-art performance in the warm start scenario.

preprint2021arXiv

The ergodic and non-ergodic phases in one dimensional clean Jaynes-Cummings-Hubbard system

We study the ergodic and non-ergodic behaviors of a clean Jaynes-Cummings-Hubbard chain for different parameters based on the average level spacings and the generalized fractal dimensions of eigenstates by using exact diagonalization. We find that a transition from ergodic to non-ergodic phases happens when the atom-photon detuning is large, and that the non-ergodic phases may exist in the thermodynamic limit. We also find that the non-ergodic phase violates the eigenstate thermalization hypothesis. Finally, we study the many-body multifractality of the ground state and find that the derivative of the generalized fractal dimensions can determine the critical point of the superfluid-Mott-insulator phase transition in a small range of parameters under different boundary conditions, and that there is no ergodicity for the ground state.
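
One common way to turn level spacings from exact diagonalization into an ergodicity diagnostic is the mean ratio of adjacent level spacings. The sketch below shows that standard statistic in plain Python; whether the paper uses exactly this definition is an assumption, and the function name `mean_gap_ratio` is hypothetical.

```python
def mean_gap_ratio(energies):
    """Mean ratio of adjacent level spacings, min(s_n, s_{n+1}) / max(s_n, s_{n+1}).

    energies -- iterable of eigenvalues from one symmetry sector (any order).
    """
    levels = sorted(energies)
    # Consecutive spacings s_n = E_{n+1} - E_n.
    gaps = [b - a for a, b in zip(levels, levels[1:])]
    ratios = []
    for s1, s2 in zip(gaps, gaps[1:]):
        larger = max(s1, s2)
        # Skip exactly degenerate triples, where the ratio is undefined.
        if larger > 0:
            ratios.append(min(s1, s2) / larger)
    if not ratios:
        raise ValueError("need at least three levels with non-zero spacing")
    return sum(ratios) / len(ratios)
```

For reference, uncorrelated (Poisson) spectra give a mean ratio of about 0.386, while spectra following Gaussian orthogonal ensemble statistics give about 0.53, so values near the former are usually read as a sign of non-ergodic behavior.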

preprint2021arXiv

Tiansuan Constellation: An Open Research Platform

Satellite networks are the first step toward interstellar voyages. They can provide Internet connectivity everywhere on Earth, where most areas cannot access the Internet through terrestrial infrastructure because of limited geographic accessibility and high cost. The space industry is experiencing a rise in large low-earth-orbit satellite constellations to achieve universal connectivity. The research community is also eager to conduct leading research to bridge the connectivity divide. Researchers now conduct their work by simulation, which is far from enough. However, experiments on real satellites are blocked by the high threshold of space technology, such as deployment cost and unknown risks. To resolve this dilemma and contribute to universal connectivity, we build an open research platform, the Tiansuan constellation, to support experiments on real satellite networks. We discuss the potential research topics that would benefit from the Tiansuan constellation. We provide two case studies that have already been deployed on two experimental satellites of the Tiansuan constellation.

preprint2021arXiv

Unsupervised Person Re-Identification with Multi-Label Learning Guided Self-Paced Clustering

Although unsupervised person re-identification (Re-ID) has drawn increasing research attention recently, it remains challenging to learn discriminative features without annotations across disjoint camera views. In this paper, we address the unsupervised person Re-ID with a conceptually novel yet simple framework, termed as Multi-label Learning guided self-paced Clustering (MLC). MLC mainly learns discriminative features with three crucial modules, namely a multi-scale network, a multi-label learning module, and a self-paced clustering module. Specifically, the multi-scale network generates multi-granularity person features in both global and local views. The multi-label learning module leverages a memory feature bank and assigns each image with a multi-label vector based on the similarities between the image and feature bank. After multi-label training for several epochs, the self-paced clustering joins in training and assigns a pseudo label for each image. The benefits of our MLC come from three aspects: i) the multi-scale person features for better similarity measurement, ii) the multi-label assignment based on the whole dataset ensures that every image can be trained, and iii) the self-paced clustering removes some noisy samples for better feature learning. Extensive experiments on three popular large-scale Re-ID benchmarks demonstrate that our MLC outperforms previous state-of-the-art methods and significantly improves the performance of unsupervised person Re-ID.

preprint2020arXiv

A Competence-aware Curriculum for Visual Concepts Learning via Question Answering

Humans can progressively learn visual concepts from easy to hard questions. To mimic this efficient learning ability, we propose a competence-aware curriculum for visual concept learning in a question-answering manner. Specifically, we design a neural-symbolic concept learner for learning the visual concepts and a multi-dimensional Item Response Theory (mIRT) model for guiding the learning process with an adaptive curriculum. The mIRT effectively estimates the concept difficulty and the model competence at each learning step from accumulated model responses. The estimated concept difficulty and model competence are further utilized to select the most profitable training samples. Experimental results on CLEVR show that with a competence-aware curriculum, the proposed method achieves state-of-the-art performances with superior data efficiency and convergence speed. Specifically, the proposed model only uses 40% of training data and converges three times faster compared with other state-of-the-art methods.

preprint2020arXiv

A new theory of fluid-solid coupling in a porous medium for application to the ultrasonic evaluation of tissue remodeling using bioelastomers

Bioelastomers have demonstrated tremendous value and potential in the field of tissue repair due to increasing health demands. Improved non-invasive methods are required for monitoring tissue development assisted by bioelastomers. In this paper, we present a novel theory of fluid-solid coupling in a porous medium for application to the ultrasonic evaluation of tissue remodeling using bioelastomers. The common assumption of equal solid and liquid displacements used in the conventional description of a fluid-saturated porous solid cannot be applied to soft media, such as bioelastomers. We revise the geoacoustic theory of Biot to allow for relative motion between a fluid and a solid in an aggregate and derive an expression for a characteristic fluid-solid coupling parameter. Unlike the conventional method, the propagation speed of shear waves observed by ultrasound shear wave elastography is considered a known quantity in the novel theory, and the calculated value of the coupling parameter is used to evaluate the status of tissue repair. The model is validated by analyzing selected cases. The conditions under which the model can be applied are identified. However, further development of the theory is required to extract dynamic parameters that can be used to monitor the entire tissue remodeling process. In this paper, a theoretical approach is developed that can be used to analyze the mechanics of tissue repair. The theory has potential applications in the field of acellular in situ tissue engineering for non-invasive monitoring of the complex mechanical remodeling process of tissue regeneration and bioelastomer degradation.

preprint2020arXiv

A3: An Automatic Topology-Aware Malfunction Detection and Fixation System in Data Center Networks

Link failures and cable miswirings are not uncommon when building data center networks, and they prevent the existing automatic address configuration methods from functioning correctly. However, accurately detecting such malfunctions is not an easy task because there may be no observable node degree changes. Fixing or correcting such malfunctions is even harder, as almost no existing work can provide accurate fixation suggestions. To solve these problems, we design and implement A3, an automatic topology-aware malfunction detection and fixation system. A3 innovatively formulates the problem of finding the minimal fixation as the problem of computing the minimum graph difference (NP-hard) and solves it in O(k^6) and O(k^3) for fewer than k/2 and k/4 undirected link malfunctions in FatTree, respectively. Our evaluation demonstrates that for fewer than k/2 undirected link malfunctions, A3 is 100% accurate for malfunction detection and provides the minimum fixation result. For k/2 or more undirected link malfunctions, A3 still has an accuracy of about 100% and provides a near-optimal fixation result.

preprint2020arXiv

An Investigation of Few-Shot Learning in Spoken Term Classification

In this paper, we investigate the feasibility of applying few-shot learning algorithms to a speech task. We formulate a user-defined scenario of spoken term classification as a few-shot learning problem. In most few-shot learning studies, it is assumed that all the N classes are new in an N-way problem. We suggest that this assumption can be relaxed and define an (N+M)-way problem where N and M are the numbers of new classes and fixed classes, respectively. We propose a modification to the Model-Agnostic Meta-Learning (MAML) algorithm to solve the problem. Experiments on the Google Speech Commands dataset show that our approach outperforms the conventional supervised learning approach and the original MAML.
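
The difference between an ordinary N-way episode and the N+M-way setting described above can be sketched as an episode sampler in which N classes are drawn afresh each time while M fixed classes appear in every episode. This is an illustrative sketch only; the function name, the data layout, and the choice to draw support examples for the fixed classes as well are assumptions, not details taken from the paper.

```python
import random


def sample_episode(data_by_class, fixed_classes, n_new, k_shot, rng):
    """Sample one N+M-way, K-shot episode (illustrative only).

    data_by_class -- dict mapping class name -> list of examples
    fixed_classes -- list of the M classes present in every episode
    n_new         -- N, the number of new classes drawn for this episode
    k_shot        -- number of support examples per class
    rng           -- random.Random instance, for reproducibility
    """
    # New classes are drawn only from classes that are not fixed.
    new_pool = sorted(c for c in data_by_class if c not in fixed_classes)
    new_classes = rng.sample(new_pool, n_new)
    # The episode label set is the N new classes followed by the M fixed ones.
    classes = new_classes + list(fixed_classes)
    support = {c: rng.sample(data_by_class[c], k_shot) for c in classes}
    return classes, support
```

Setting `fixed_classes` to an empty list recovers a conventional N-way episode in which every class is new.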

preprint2020arXiv

Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning

The goal of neural-symbolic computation is to integrate the connectionist and symbolist paradigms. Prior methods learn the neural-symbolic models using reinforcement learning (RL) approaches, which ignore the error propagation in the symbolic reasoning module and thus converge slowly with sparse rewards. In this paper, we address these issues and close the loop of neural-symbolic learning by (1) introducing the grammar model as a symbolic prior to bridge neural perception and symbolic reasoning, and (2) proposing a novel back-search algorithm which mimics the top-down human-like learning procedure to propagate the error through the symbolic reasoning module efficiently. We further interpret the proposed learning framework as maximum likelihood estimation using Markov chain Monte Carlo sampling and the back-search algorithm as a Metropolis-Hastings sampler. The experiments are conducted on two weakly-supervised neural-symbolic tasks: (1) handwritten formula recognition on the newly introduced HWF dataset; (2) visual question answering on the CLEVR dataset. The results show that our approach significantly outperforms the RL methods in terms of performance, converging speed, and data efficiency. Our code and data are released at https://liqing-ustc.github.io/NGS.

preprint2020arXiv

How does boiling occur in lattice Boltzmann simulations?

In recent years, the lattice Boltzmann (LB) method has been widely employed to simulate boiling phenomena [A. Márkus and G. Házi, Phys. Rev. E 83, 046705 (2011); Biferale et al., Phys. Rev. Lett. 108, 104502 (2012); Li et al., Phys. Rev. E 96, 063303 (2017); Wu et al., Int. J. Heat Mass Transfer 126, 773 (2018)]. However, a very important issue still remains open, i.e., how does boiling occur in the LB simulations? For instance, the existing LB studies showed that the boiling on a hydrophobic surface begins at a lower wall superheat than that on a hydrophilic surface, which qualitatively agrees well with experimental studies, but no one has yet explained how this phenomenon appears in the LB simulations and what happened in the simulations after changing the wettability of the heating surface. In this paper, the LB boiling mechanism is revealed by analyzing boiling on a flat surface with mixed wettability and boiling on a structured surface with homogeneous wettability. Through a theoretical analysis, we demonstrate that, when the same wall superheat is applied, in the LB boiling simulations the fluid density near the heating surface decreases faster on a hydrophobic surface than that on a hydrophilic surface. Accordingly, a lower wall superheat can induce the phase transition from liquid to vapor on a hydrophobic surface than that on a hydrophilic surface. Furthermore, a similar theoretical analysis shows that the fluid density decreases fastest at concave corners in the case of a structured surface with homogeneous wettability, which explains why vapor bubbles are nucleated at concave corners in the LB simulations of boiling on structured surfaces.

preprint2020arXiv

Imaging viscous flow of the Dirac fluid in graphene

The electron-hole plasma in charge-neutral graphene is predicted to realize a quantum critical system whose transport features a universal hydrodynamic description, even at room temperature. This quantum critical "Dirac fluid" is expected to have a shear viscosity close to a minimum bound, with an inter-particle scattering rate saturating at the Planckian time $\hbar/(k_B T)$. While electrical transport measurements at finite carrier density are consistent with hydrodynamic electron flow in graphene, a "smoking gun" of viscous behavior remains elusive. In this work, we directly image viscous Dirac fluid flow in graphene at room temperature via measurement of the associated stray magnetic field. Nanoscale magnetic imaging is performed using quantum spin magnetometers realized with nitrogen vacancy (NV) centers in diamond. Scanning single-spin and wide-field magnetometry reveals a parabolic Poiseuille profile for electron flow in a graphene channel near the charge neutrality point, establishing the viscous transport of the Dirac fluid. This measurement is in contrast to the conventional uniform flow profile imaged in an Ohmic conductor. Via combined imaging-transport measurements, we obtain viscosity and scattering rates, and observe that these quantities are comparable to the universal values expected at quantum criticality. This finding establishes a nearly-ideal electron fluid in neutral graphene at room temperature. Our results pave the way to study hydrodynamic transport in quantum critical fluids relevant to strongly-correlated electrons in high-$T_c$ superconductors. This work also highlights the capability of quantum spin magnetometers to probe correlated-electronic phenomena at the nanoscale.
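
The contrast drawn in this abstract between a parabolic Poiseuille profile and the uniform profile of an Ohmic conductor can be made concrete with the idealised textbook forms of the two current-density profiles across a channel. The sketch below assumes perfect no-slip boundaries for the viscous case; real devices have a finite boundary slip and scattering length, and the function names and normalisation are assumptions for illustration only.

```python
def poiseuille_profile(y, width, j_max=1.0):
    """Idealised viscous current density at transverse position y.

    The channel spans -width/2 <= y <= width/2, with no-slip walls so that
    the current density vanishes at the edges and peaks at the centre.
    """
    if abs(y) > width / 2:
        raise ValueError("y lies outside the channel")
    return j_max * (1.0 - (2.0 * y / width) ** 2)


def uniform_profile(y, width, j0=1.0):
    """Idealised Ohmic current density: the same value everywhere in the channel."""
    if abs(y) > width / 2:
        raise ValueError("y lies outside the channel")
    return j0
```

Evaluating both functions on a grid of `y` values across the channel gives the two shapes that the imaging experiment is designed to tell apart.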

preprint2020arXiv

Incorporating Effective Global Information via Adaptive Gate Attention for Text Classification

The dominant text classification studies focus on training classifiers using textual instances only or introducing external knowledge (e.g., hand-crafted features and domain expert knowledge). In contrast, some corpus-level statistical features, like word frequency and distribution, are not well exploited. Our work shows that such simple statistical information can enhance classification performance both efficiently and significantly compared with several baseline models. In this paper, we propose a classifier with a gate mechanism named Adaptive Gate Attention model with Global Information (AGA+GI), in which the adaptive gate mechanism incorporates global statistical features into latent semantic features and the attention layer captures dependency relationships within the sentence. To alleviate the overfitting issue, we propose a novel Leaky Dropout mechanism to improve generalization ability and performance stability. Our experiments show that the proposed method can achieve better accuracy than CNN-based and RNN-based approaches without global information on several benchmarks.

preprint2020arXiv

Interference in Atomic Magnetometry

Atomic magnetometers are highly sensitive detectors of magnetic fields that monitor the evolution of the macroscopic magnetic moment of atomic vapors, and they are opening new applications in biological, physical, and chemical science. However, the performance of atomic magnetometers is often limited by hidden systematic effects that may cause misdiagnosis in a variety of applications, e.g., in NMR and in biomagnetism. In this work, we uncover a hitherto unexplained interference effect in atomic magnetometers, which causes an important systematic effect that greatly deteriorates the accuracy of magnetic field measurements. We present a standard approach to detecting and characterizing the interference effect in, but not limited to, atomic magnetometers. As applications of our work, we consider the effect of the interference in NMR structural determination and in locating brain electrophysiological symptoms, and show that taking interference effects into account helps to improve the measurement accuracy. Through our experiments, we indeed find good agreement between our prediction and the asymmetric amplitudes of resonant lines in ultralow-field NMR spectra, an effect that has not been understood so far. We anticipate that our work will stimulate interesting new research on magnetic interference phenomena in a wide range of magnetometers and their applications.

preprint2020arXiv

LO-Net: Deep Real-time Lidar Odometry

We present a novel deep convolutional network pipeline, LO-Net, for real-time lidar odometry estimation. Unlike most existing lidar odometry (LO) estimations that go through an individually designed feature selection, feature matching, and pose estimation pipeline, LO-Net can be trained in an end-to-end manner. With a new mask-weighted geometric constraint loss, LO-Net can effectively learn feature representation for LO estimation, and can implicitly exploit the sequential dependencies and dynamics in the data. We also design a scan-to-map module, which uses the geometric and semantic information learned in LO-Net, to improve the estimation accuracy. Experiments on benchmark datasets demonstrate that LO-Net outperforms existing learning based approaches and has accuracy similar to that of the state-of-the-art geometry-based approach, LOAM.

preprint2020arXiv

MiniNet: An extremely lightweight convolutional neural network for real-time unsupervised monocular depth estimation

Predicting depth from a single image is an attractive research topic since it provides one more dimension of information to enable machines to better perceive the world. Recently, deep learning has emerged as an effective approach to monocular depth estimation. As obtaining labeled data is costly, there is a recent trend to move from supervised learning to unsupervised learning to obtain monocular depth. However, most unsupervised learning methods capable of achieving high depth prediction accuracy require a deep network architecture that is too heavy and complex to run on embedded devices with limited storage and memory. To address this issue, we propose a new powerful network with a recurrent module to achieve the capability of a deep network while maintaining an extremely lightweight size for real-time high performance unsupervised monocular depth prediction from video sequences. Besides, a novel efficient upsample block is proposed to fuse the features from the associated encoder layer and recover the spatial size of features with a small number of model parameters. We validate the effectiveness of our approach via extensive experiments on the KITTI dataset. Our new model can run at a speed of about 110 frames per second (fps) on a single GPU, 37 fps on a single CPU, and 2 fps on a Raspberry Pi 3. Moreover, it achieves higher depth accuracy with nearly 33 times fewer model parameters than state-of-the-art models. To the best of our knowledge, this work is the first extremely lightweight neural network trained on monocular video sequences for real-time unsupervised monocular depth estimation, which opens up the possibility of implementing deep learning-based real-time unsupervised monocular depth prediction on low-cost embedded devices.

preprint2020arXiv

PoseGAN: A Pose-to-Image Translation Framework for Camera Localization

Camera localization is a fundamental requirement in robotics and computer vision. This paper introduces a pose-to-image translation framework to tackle the camera localization problem. We present PoseGANs, a framework based on conditional generative adversarial networks (cGANs) for the implementation of pose-to-image translation. PoseGANs feature a number of innovations including a distance metric based conditional discriminator to conduct camera localization and a pose estimation technique for generated camera images as a stronger constraint to improve camera localization performance. Compared with learning-based regression methods such as PoseNet, PoseGANs can achieve better performance with model sizes that are 70% smaller. In addition, PoseGANs introduce the view synthesis technique to establish the correspondence between the 2D images and the scene, i.e., given a pose, PoseGANs are able to synthesize its corresponding camera images. Furthermore, we demonstrate that PoseGANs differ in principle from structure-based localization and learning-based regressions for camera localization, and show that PoseGANs exploit the geometric structures to accomplish the camera localization task, and are therefore more stable than and superior to learning-based regressions, which rely on local texture features instead. In addition to camera localization and view synthesis, we also demonstrate that PoseGANs can be successfully used for other interesting applications such as moving object elimination and frame interpolation in video sequences.

preprint2020arXiv

RMB-DPOP: Refining MB-DPOP by Reducing Redundant Inferences

MB-DPOP is an important complete algorithm for solving Distributed Constraint Optimization Problems (DCOPs) by exploiting a cycle-cut idea to implement memory-bounded inference. However, each cluster root in the algorithm is responsible for enumerating all the instantiations of its cycle-cut nodes, which causes redundant inferences when its branches do not have the same cycle-cut nodes. Additionally, a large number of cycle-cut nodes and the iterative nature of MB-DPOP further exacerbate the pathology. As a result, MB-DPOP can suffer from huge coordination overheads and cannot scale up well. Therefore, we present RMB-DPOP, which incorporates several novel mechanisms to reduce redundant inferences and improve the scalability of MB-DPOP. First, using the independence among the cycle-cut nodes in different branches, we distribute the enumeration of instantiations into different branches, whereby the number of nonconcurrent instantiations is reduced significantly and each branch can perform memory-bounded inference asynchronously. Then, taking the topology into consideration, we propose an iterative allocation mechanism to choose the cycle-cut nodes that cover a maximum of active nodes in a cluster and break ties according to their relative positions in a pseudo-tree. Finally, a caching mechanism is proposed to further reduce unnecessary inferences when the historical results are compatible with the current instantiations. We theoretically show that with the same number of cycle-cut nodes RMB-DPOP requires as many messages as MB-DPOP in the worst case, and the experimental results show its superiority over the state of the art.

preprint2020arXiv

Self-awareness based resource allocation strategy for containment of epidemic spreading

Resource support between individuals is of particular importance in controlling or mitigating epidemic spreading, especially during pandemics. However, the question remains of how we can protect ourselves from being infected while helping others by donating resources to fight the epidemic. To answer this question, we propose a novel resource allocation model that considers individuals' awareness of self-protection. In the model, a tuning parameter is introduced to quantify the reaction strength of individuals when they are aware of the disease. A coupled model of resource allocation and disease spreading is then proposed to study the impact of self-awareness on resource allocation and, in turn, on the dynamics of epidemic spreading. Through theoretical analysis and extensive Monte Carlo simulations, we find that in the stationary state the system converges to one of two states, entirely healthy or completely infected, which indicates an abrupt increase in the prevalence when there is a shortage of resources. More importantly, we find that being either too cautious or too selfless during the outbreak of an epidemic is unsuitable for disease control. Through extensive simulations, we find the optimal point, at which the epidemic threshold reaches its maximum value and an outbreak can be delayed to the greatest extent. Finally, we further study the effects of network structure on the coupled dynamics. We find that degree heterogeneity promotes the outbreak of disease, and that the network structure does not alter the optimal phenomenon in behavior response.

preprint2020arXiv

The prescribed time sliding mode control for attitude tracking of spacecraft

With the development of space missions, many missions now demand prescribed-time convergence. However, it remains difficult to combine the prescribed-time method with sliding mode control, owing to the infinite gain of the prescribed-time method as time approaches the prescribed time and to the two periods of sliding mode control. In this paper, a new prescribed-time sliding mode control method is proposed for general systems with matched disturbances, from second-order to high-order systems. A novel sliding mode variable with an explicit time term is designed to achieve prescribed-time convergence. More importantly, the singularity of the control input is avoided as time approaches the prescribed time. Finally, this paper presents a disturbance-observer-based prescribed-time sliding mode control method for attitude tracking of spacecraft, and the efficiency of this method is verified through numerical simulations.

preprint2020arXiv

Tips and Tricks for Webly-Supervised Fine-Grained Recognition: Learning from the WebFG 2020 Challenge

WebFG 2020 is an international challenge hosted by Nanjing University of Science and Technology, University of Edinburgh, Nanjing University, The University of Adelaide, Waseda University, etc. This challenge focuses on the webly-supervised fine-grained recognition problem. In the literature, existing deep learning methods rely heavily on large-scale, high-quality labeled training data, which limits their practicability and scalability in real-world applications. In particular, for fine-grained recognition, a visual task that requires professional knowledge for labeling, the cost of acquiring labeled training data is quite high, which makes it extremely difficult to obtain a large amount of high-quality training data. Therefore, utilizing free web data to train fine-grained recognition models has attracted increasing attention from researchers in the fine-grained community. This challenge expects participants to develop webly-supervised fine-grained recognition methods, which leverage web images in training fine-grained recognition models to ease the extreme dependence of deep learning methods on large-scale manually labeled datasets and to enhance their practicability and scalability. In this technical report, we pull together the top WebFG 2020 solutions from a total of 54 competing teams, and discuss which methods worked best across the set of winning teams and what, surprisingly, did not help.

preprint2019arXiv

Absence of superconductivity in bulk Nd$_{1-x}$Sr$_x$NiO$_2$

Recently, superconductivity at 9-15 K was discovered in an infinite-layer nickelate (Nd$_{0.8}$Sr$_{0.2}$NiO$_2$ films), which has received enormous attention. Since the Ni$^{1+}$ ionic state in NdNiO$_2$ may have the $3d^9$ outer-shell electronic configuration, which resembles that of the cuprates, it is of great interest to know whether the superconductivity discovered here has a mechanism similar to that in the cuprates. Using a three-step method, we successfully synthesize bulk samples of Nd$_{1-x}$Sr$_x$NiO$_2$ (x = 0, 0.2, 0.4). X-ray diffraction reveals that all the samples contain mainly the infinite-layer 112 phase with some amount of segregated Ni. This is also confirmed by the SEM images and the EDS composition analysis. The resistive measurements on the Sr-doped samples show insulating behavior without the presence of superconductivity. The temperature dependence of the magnetic moment under high magnetic fields exhibits a Curie-Weiss law feature with a paramagnetic moment of about 2$μ_B$/f.u. By applying pressure on Nd$_{0.8}$Sr$_{0.2}$NiO$_2$ up to about 50.2 GPa, we find that the strong insulating behavior at ambient pressure is significantly suppressed, but superconductivity is still not observed. Since the lattice constants derived from our XRD data are very close to those of the reported superconducting films, we argue that the superconductivity in the reported films may not originate from the expected Nd$_{0.8}$Sr$_{0.2}$NiO$_2$, but may instead arise from the interface or the stress effect.

preprint2019arXiv

Decoupling of itinerant and localized $d$-orbital electrons in the compound Sc$_{0.5}$Zr$_{0.5}$Co

Using the arc-melting method, we successfully synthesized the compound Sc$_{0.5}$Zr$_{0.5}$Co with the space group $Pm\bar{3}m$. Both the resistivity and magnetic susceptibility measurements reveal a phase transition at about 86 K. This transition might be attributed to the establishment of an antiferromagnetic order. The magnetization hysteresis loop measurements over a wide temperature range show a weak ferromagnetic feature, which suggests a possible canted arrangement of the magnetic moments. Bounded by the phase transition temperature, the resistivity at ambient pressure shows a change from Fermi liquid behavior to a super-linear behavior as temperature increases. Under applied pressures up to 32.1 GPa, the transition temperature does not show a clear change and no superconductivity is observed above 2 K. The density functional theory (DFT) calculations confirm the existence of the antiferromagnetic order and reveal a gap between the spin-up and spin-down $d$-orbital electrons. This kind of behavior may suggest that the antiferromagnetic order in this compound originates from the localized $d$-electrons, which do not contribute to the conductance. Thus the itinerant and localized $d$-orbital electrons in the compound are decoupled.

preprint2019arXiv

Milliwatt-threshold visible-telecom optical parametric oscillation using silicon nanophotonics

The on-chip creation of coherent light at visible wavelengths is crucial to field-level deployment of spectroscopy and metrology systems. Although on-chip lasers have been implemented in specific cases, a general solution that is not restricted by limitations of specific gain media has not been reported. Here, we propose creating visible light from an infrared pump by widely-separated optical parametric oscillation (OPO) using silicon nanophotonics. The OPO creates signal and idler light in the 700 nm and 1300 nm bands, respectively, with a 900 nm pump. It operates at a threshold power of (0.9 +/- 0.1) mW, over 50x smaller than other widely-separated microcavity OPO works, which have only been reported in the infrared. This low threshold enables direct pumping without need of an intermediate optical amplifier. We further show how the device design can be modified to generate 780 nm and 1500 nm light with a similar power efficiency. Our nanophotonic OPO shows distinct advantages in power efficiency, operation stability, and device scalability, and is a major advance towards flexible on-chip generation of coherent visible light.
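
The wavelengths quoted in this abstract can be checked with a short energy-conservation sketch. It assumes the OPO is a degenerate four-wave-mixing process (two pump photons convert into one signal and one idler photon), which is inferred here from the pump lying between the signal and idler wavelengths rather than stated in the abstract. The function names are hypothetical, and the inputs are the nominal band values from the abstract, not measured device wavelengths.

```python
def idler_wavelength_nm(pump_nm: float, signal_nm: float) -> float:
    """Idler wavelength from 2/lambda_pump = 1/lambda_signal + 1/lambda_idler.

    Assumes degenerate four-wave mixing (an assumption, see the lead-in).
    """
    inv_idler = 2.0 / pump_nm - 1.0 / signal_nm
    if inv_idler <= 0:
        raise ValueError("signal photon energy exceeds that of two pump photons")
    return 1.0 / inv_idler


def pump_wavelength_nm(signal_nm: float, idler_nm: float) -> float:
    """Pump wavelength that satisfies the same energy-conservation relation."""
    return 2.0 / (1.0 / signal_nm + 1.0 / idler_nm)


if __name__ == "__main__":
    # Nominal values from the abstract: 900 nm pump, 700 nm signal band.
    print(f"idler: {idler_wavelength_nm(900.0, 700.0):.1f} nm")  # 1260.0 nm
    # Second design point from the abstract: 780 nm and 1500 nm outputs.
    print(f"pump:  {pump_wavelength_nm(780.0, 1500.0):.1f} nm")
```

Under this assumption, a 900 nm pump with a 700 nm signal gives an idler at 1260 nm, consistent with the "1300 nm band" in the abstract, and the 780 nm / 1500 nm design point would call for a pump a little above 1 µm.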