Source author record

Ang Li

Ang Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

119 works

48 topics

4 close collaborators

Preprint · 2026 · arXiv

Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision Transformer

Decision Transformer (DT) formulates offline reinforcement learning as autoregressive sequence modeling, achieving promising results by predicting actions from a sequence of Return-to-Go (RTG), state, and action tokens. However, RTG is a scalar that summarizes future rewards, containing far less information than typical state or action vectors, yet it consumes the same computational budget per token. Worse, the self-attention cost of Transformers grows quadratically with sequence length, so including RTG as a separate token adds unnecessary overhead. We propose SlimDT, which removes RTG from the autoregressive sequence. Instead, we inject RTG information into the state representations before the sequential modeling step, allowing the Transformer to process only a compact (state, action) sequence. This reduces the sequence length by one-third, directly improving inference efficiency. On the D4RL benchmark, SlimDT surpasses standard DT across various tasks and achieves performance comparable to existing state-of-the-art methods. Decoupling a sparse conditioning signal from an information-rich sequence thus yields both computational gains and higher task performance.
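
The one-third reduction can be checked with simple token counting. The sketch below is a minimal illustration, not SlimDT itself: it assumes, hypothetically, that RTG is injected by adding `rtg * w` to each state vector (in a trained model that mapping would be learned), and the helper names `dt_tokens`, `inject_rtg` and `slim_tokens` are invented for illustration.

```python
from typing import List, Tuple

Vector = List[float]
Token = Tuple[str, Vector]


def dt_tokens(rtgs: List[float], states: List[Vector], actions: List[Vector]) -> List[Token]:
    """Standard DT layout: R_1, s_1, a_1, ..., R_T, s_T, a_T (3 tokens per step)."""
    seq: List[Token] = []
    for r, s, a in zip(rtgs, states, actions):
        seq.extend([("rtg", [r]), ("state", s), ("action", a)])
    return seq


def inject_rtg(state: Vector, rtg: float, w: Vector) -> Vector:
    """Hypothetical additive injection: state + rtg * w.
    In a real model, w (or a small network) would be learned."""
    return [x + rtg * wi for x, wi in zip(state, w)]


def slim_tokens(rtgs: List[float], states: List[Vector],
                actions: List[Vector], w: Vector) -> List[Token]:
    """SlimDT-style layout: RTG folded into the state, 2 tokens per step."""
    seq: List[Token] = []
    for r, s, a in zip(rtgs, states, actions):
        seq.extend([("state", inject_rtg(s, r, w)), ("action", a)])
    return seq
```

For a context of T = 20 timesteps, `dt_tokens` yields 60 tokens and `slim_tokens` yields 40. Because self-attention cost grows with the square of sequence length, the attention term shrinks to (40/60)^2 = 4/9 of its original size, while per-token costs shrink by one-third.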

Preprint · 2023 · arXiv

Block-Level MU-MISO Interference Exploitation Precoding: Optimal Structure and Explicit Duality

This paper investigates block-level interference exploitation (IE) precoding for multi-user multiple-input single-output (MU-MISO) downlink systems. To avoid the frequent precoding-matrix updates required by symbol-level IE precoding, we propose to jointly optimize all the precoders or transmit signals within a transmission block. The resulting precoders only need to be updated once per block, and since they need not be constant across all symbol slots, we refer to the technique as block-level slot-variant IE precoding. By carefully examining the optimal structure and the explicit duality inherent in the block-level power minimization (PM) and signal-to-interference-plus-noise ratio (SINR) balancing (SB) problems, we find that the joint optimization can be decomposed into subproblems with fewer variables. Going a step further, we propose block-level slot-invariant IE precoding, which adds a structural constraint to slot-variant IE precoding so that the precoder stays constant throughout the block. We further present a novel linear precoder for IE and prove that the proposed slot-variant and slot-invariant IE precoding schemes share an identical solution when the number of symbol slots does not exceed the number of users. Numerical simulations demonstrate that the proposed precoders achieve a significant complexity reduction compared with benchmark schemes, without sacrificing performance.

Preprint · 2023 · arXiv

Close the Optical Sensing Domain Gap by Physics-Grounded Active Stereo Sensor Simulation

In this paper, we focus on the simulation of active stereovision depth sensors, which are popular in both academia and industry. Inspired by the underlying mechanism of these sensors, we designed a fully physics-grounded simulation pipeline that includes material acquisition, ray-tracing-based infrared (IR) image rendering, IR noise simulation, and depth estimation. The pipeline can generate, in real time, depth maps with material-dependent error patterns similar to those of a real depth sensor. We conduct real-world experiments to show that perception algorithms and reinforcement learning policies trained on our simulation platform can transfer well to real-world test cases without any fine-tuning. Furthermore, owing to the high realism of this simulation, our depth sensor simulator can serve as a convenient testbed for evaluating algorithm performance in the real world, which can greatly reduce the human effort required to develop robotic algorithms. The entire pipeline has been integrated into the SAPIEN simulator and is open-sourced to promote research in the vision and robotics communities.

Preprint · 2022 · arXiv

A Length Adaptive Algorithm-Hardware Co-design of Transformer on FPGA Through Sparse Attention and Dynamic Pipelining

Transformers have been considered one of the most important deep learning models since 2018, in part because they establish state-of-the-art (SOTA) records and could potentially replace existing Deep Neural Networks (DNNs). Despite these remarkable triumphs, the prolonged turnaround time of Transformer models is a widely recognized roadblock. Varying sequence lengths impose additional computing overhead, because inputs must be zero-padded to the maximum sentence length in the batch to suit parallel computing platforms. This paper targets the field-programmable gate array (FPGA) and proposes a coherent, sequence-length-adaptive algorithm-hardware co-design for Transformer acceleration. In particular, we develop a hardware-friendly sparse attention operator and a length-aware hardware resource scheduling algorithm. The proposed sparse attention operator reduces the complexity of attention-based models to linear and alleviates off-chip memory traffic. The proposed length-aware hardware resource scheduling algorithm dynamically allocates hardware resources to fill the pipeline slots and eliminates bubbles for NLP tasks. Experiments show that our design incurs very small accuracy loss and achieves 80.2× and 2.6× speedups over CPU and GPU implementations, respectively, and 4× higher energy efficiency than a state-of-the-art GPU accelerator optimized via cuBLAS GEMM.
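
The abstract does not state which sparsity pattern the operator uses. As a plainly labeled stand-in, the sketch below implements sliding-window (local) attention, one common way to make attention cost linear in sequence length n: each query attends only to keys within `w` positions, so the work is O(n·w·d) instead of O(n²·d). It is not the paper's hardware operator, and `window_attention` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]


def window_attention(q: Matrix, k: Matrix, v: Matrix, w: int) -> Matrix:
    """Scaled dot-product attention restricted to a window of +/- w positions."""
    n, d = len(q), len(q[0])
    out: Matrix = []
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)
        # Scores only for keys inside the local window.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(lo, hi)]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Weighted average of the windowed value vectors.
        out.append([sum(weights[t] * v[lo + t][c] for t in range(hi - lo)) / z
                    for c in range(len(v[0]))])
    return out
```

With `w` fixed, doubling the sequence length doubles the work, which is the linear scaling the abstract refers to; choosing `w >= n - 1` recovers ordinary full attention.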

Preprint · 2022 · arXiv

A Rare Topic Discovery Model for Short Texts Based on Co-occurrence word Network

We provide a simple and general solution for discovering scarce topics in unbalanced short-text datasets: a word co-occurrence network-based model, CWIBTD, which simultaneously addresses the sparsity and imbalance of short-text topics and attenuates the effect of occasional pairwise word occurrences, allowing the model to focus on discovering scarce topics. Unlike previous approaches, CWIBTD uses co-occurrence word networks to model the topic distribution of each word, which improves the semantic density of the data space; it also ensures sensitivity to rare topics by improving the way node activity is calculated and, to some extent, normalizing scarce and large topics. In addition, because CWIBTD uses the same Gibbs sampling as LDA, it is easy to extend to various application scenarios. Extensive experiments on an unbalanced short-text dataset confirm the superiority of CWIBTD over the baseline approach in discovering rare topics. Our model can be used for early and accurate discovery of emerging topics or unexpected events on social platforms.
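
A word co-occurrence network of the kind described can be built by counting, for each short text, every unordered pair of distinct words. The sketch below shows only that construction step. Its `min_count` cutoff is a hypothetical, much cruder stand-in for the paper's handling of occasional pairwise occurrences (which the abstract attributes to an improved node-activity calculation), and `cooccurrence_network` is an illustrative name.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def cooccurrence_network(docs: List[List[str]],
                         min_count: int = 2) -> Dict[Tuple[str, str], int]:
    """Weighted undirected edges: (word_a, word_b) -> number of texts containing both."""
    counts: Counter = Counter()
    for doc in docs:
        words = sorted(set(doc))          # each pair counted once per text
        counts.update(combinations(words, 2))
    # Hypothetical filter: drop pairs seen fewer than min_count times.
    return {pair: c for pair, c in counts.items() if c >= min_count}
```

For the three texts `["flood", "river", "rain"]`, `["flood", "rain"]` and `["river", "boat"]`, only the edge `("flood", "rain")` survives with `min_count=2`, because it is the only pair that co-occurs in more than one text.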

Preprint · 2022 · arXiv

A sentiment analysis model for car review texts based on adversarial training and whole word mask BERT

In the field of car evaluation, more and more netizens choose to express their opinions on Internet platforms, and these comments affect buyers' decision-making and the trend of car word-of-mouth. As an important branch of natural language processing (NLP), sentiment analysis provides an effective research method for analyzing the sentiment of massive volumes of car review texts. However, because review texts in the automotive field contain specialized vocabulary and substantial noise, a general sentiment analysis model applied to car reviews achieves poor accuracy. To overcome these challenges, we target the sentiment analysis task for car review texts. From the perspective of word vectors, pre-training is carried out by whole-word masking of proprietary automotive vocabulary, and the model is then trained with an adversarial training strategy. Based on this, we propose a car review text sentiment analysis model based on adversarial training and whole word mask BERT (ATWWM-BERT).

Preprint · 2022 · arXiv

Academic Resource Text Level Multi-label Classification based on Attention

Hierarchical multi-label academic text classification (HMTC) assigns academic texts to a hierarchically structured labeling system. We propose an attention-based hierarchical multi-label classification algorithm of academic texts (AHMCA) that integrates text, keyword, and hierarchical-structure features to classify academic documents into the most relevant categories. We use word2vec and BiLSTM to obtain embedding and latent vector representations of text, keywords, and hierarchies. We use a hierarchical attention mechanism to capture the associations between keywords, label hierarchies, and text word vectors, generating hierarchy-specific document embedding vectors that replace the original text embeddings in HMCN-F. Experimental results on an academic text dataset demonstrate the effectiveness of the AHMCA algorithm.

Preprint · 2022 · arXiv

Accurate Portraits of Scientific Resources and Knowledge Service Components

With the advent of the cloud computing era, the cost of creating, capturing and managing information has gradually decreased. The amount of data on the Internet is growing explosively, and more and more scientific and technological resources are uploaded to the network. Unlike the news and social media data ubiquitous on the Internet, scientific and technological resources consist mainly of academic resources or entities such as papers, patents, authors, and research institutions. These resources form a rich relationship network from which a large amount of cutting-edge scientific and technological information can be mined. Many management and classification standards exist for scientific and technological resources, but they can hardly cover all of the entities and associations involved, and they cannot accurately extract the important information these resources contain. How to construct a complete and accurate representation of scientific and technological resources from structured and unstructured reports and texts on the network, and how to tap their potential value, is an urgent problem. The solution is to construct accurate portraits of scientific and technological resources in combination with knowledge-graph-related technologies.

Preprint · 2022 · arXiv

AEVA: Black-box Backdoor Detection Using Adversarial Extreme Value Analysis

Deep neural networks (DNNs) have been shown to be vulnerable to backdoor attacks. A backdoor is often embedded in a target DNN by injecting a backdoor trigger into training examples, which can cause the target DNN to misclassify any input carrying the trigger. Existing backdoor detection methods often require access to the original poisoned training data, the parameters of the target DNN, or the predictive confidence for each given input, which is impractical in many real-world applications, e.g., DNNs deployed on devices. We address the black-box hard-label backdoor detection problem, in which the DNN is fully black-box and only its final output label is accessible. We approach this problem from an optimization perspective and show that the objective of backdoor detection is bounded by an adversarial objective. Further theoretical and empirical studies reveal that this adversarial objective leads to a solution with a highly skewed distribution; a singularity is often observed in the adversarial map of a backdoor-infected example, which we call the adversarial singularity phenomenon. Based on this observation, we propose adversarial extreme value analysis (AEVA) to detect backdoors in black-box neural networks. AEVA is based on an extreme value analysis of the adversarial map, computed via Monte Carlo gradient estimation. Extensive experiments across multiple popular tasks and backdoor attacks show that our approach is effective in detecting backdoor attacks in black-box hard-label scenarios.

Preprint · 2022 · arXiv

Aspect-Based Sentiment Analysis using Local Context Focus Mechanism with DeBERTa

Text sentiment analysis, also known as opinion mining, is the computational study of people's views, evaluations, attitudes, and emotions expressed toward entities. It can be divided into text-level, sentence-level, and aspect-level sentiment analysis. Aspect-Based Sentiment Analysis (ABSA) is a fine-grained task in sentiment analysis that aims to predict the polarity of aspects. Research on pre-trained neural models has significantly improved the performance of many natural language processing tasks, and in recent years pre-trained models (PTMs) have been applied to ABSA. This raises the question of whether PTMs contain sufficient syntactic information for ABSA. In this paper, we explore the recent DeBERTa model (Decoding-enhanced BERT with disentangled attention) for Aspect-Based Sentiment Analysis. DeBERTa is a Transformer-based neural language model pre-trained with self-supervised learning on large raw text corpora. Building on the Local Context Focus (LCF) mechanism and integrating DeBERTa, we propose a multi-task learning model for aspect-based sentiment analysis. Experimental results on the commonly used laptop and restaurant datasets of SemEval-2014 and the ACL Twitter dataset show that the LCF mechanism with DeBERTa yields significant improvements.

Preprint · 2022 · arXiv

Astrophysical implications on hyperon couplings and hyperon star properties with relativistic equations of states

Hyperons are essential constituents of the neutron star interior. The poorly known hyperonic interaction is a source of uncertainty for studying laboratory hypernuclei and neutron star observations. In this work, we perform Bayesian inference of phenomenological hyperon-nucleon interactions using the tidal-deformability measurement of the GW170817 binary neutron star merger as detected by LIGO/Virgo and the mass-radius measurements of PSR J0030+0451 and PSR J0740+6620 as detected by NICER. The analysis is based on a set of stiff relativistic neutron-star-matter equations of state with hyperons from relativistic mean-field theory, naturally fulfilling the causality requirement and empirical nuclear matter properties. We specifically utilize the strong correlation recently deduced between the scalar and vector meson hyperon couplings, imposed by the measured $Λ$ separation energy in single-$Λ$ hypernuclei, and perform four different tests with or without the strong correlation. We find that the laboratory hypernuclear constraint ensures a large enough $Λ$-scalar-meson coupling to match the large vector coupling in hyperon star matter. When adopting the current most probable intervals of hyperon couplings from the joint analysis of laboratory and astrophysical data, we find that the maximum mass of hyperon stars is at most $2.176^{+0.085}_{-0.202}M_{\odot}$ ($68\%$ credible interval) for the chosen set of stiff equations of state. The reduction of the stellar radius due to hyperons is quantified based on our analysis, and various hyperon star properties are provided.

Preprint · 2022 · arXiv

Bi-convolution matrix factorization algorithm based on improved ConvMF

With the rapid development of information technology, "information overload" has become a major problem plaguing people's online lives. As an effective tool for helping users quickly find useful information, personalized recommendation is increasingly popular. To address the sparsity problem of traditional matrix factorization and the low utilization of review document information, this paper proposes a BiconvMF algorithm based on an improved ConvMF. The algorithm uses two parallel convolutional neural networks to extract deep features from the user review set and the item review set, respectively, and fuses these features into the decomposition of the rating matrix, so as to construct the user latent model and the item latent model more accurately. Experimental results show that, compared with traditional recommendation algorithms such as PMF, ConvMF, and DeepCoNN, the proposed method has lower prediction error and achieves better recommendation performance. Specifically, compared with these three algorithms, its prediction errors are reduced by 45.8%, 16.6%, and 34.9%, respectively.

Preprint · 2022 · arXiv

Block-Level Interference Exploitation Precoding without Symbol-by-Symbol Optimization

Symbol-level precoding (SLP) based on the concept of constructive interference (CI) has been shown to outperform traditional block-level precoding (BLP), but at the cost of symbol-by-symbol optimization during precoding design. In this paper, we propose a CI-based block-level precoding (CI-BLP) scheme for downlink transmission in a multi-user multiple-input single-output (MU-MISO) communication system, where we design a constant precoding matrix for a block of symbol slots that exploits CI in every symbol slot simultaneously. A single optimization problem is formulated to maximize the minimum CI effect over the entire block, which reduces the computational cost relative to traditional SLP because the problem only needs to be solved once per block. By leveraging the Karush-Kuhn-Tucker (KKT) conditions and the dual problem formulation, the original optimization problem is shown to be equivalent to a quadratic program (QP) over a simplex. Numerical results validate our derivations and show superior performance of the proposed CI-BLP scheme over traditional BLP and SLP methods, thanks to the relaxed block-level power constraint.
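
A QP over a simplex has the generic form "minimize uᵀQu subject to u ≥ 0 and Σu = 1". The abstract does not specify the solver, so the sketch below uses a generic choice, projected gradient descent with the standard sort-based Euclidean projection onto the probability simplex. The matrix `Q` is a hypothetical positive semidefinite stand-in, not the paper's derived dual matrix.

```python
from typing import List


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection onto {u : u_i >= 0, sum(u) = 1} (sort-based algorithm)."""
    mu = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, m in enumerate(mu):
        cumsum += m
        t = (cumsum - 1.0) / (j + 1)
        if m - t > 0:          # condition holds for a prefix; keep the last one
            theta = t
    return [max(x - theta, 0.0) for x in v]


def solve_simplex_qp(Q: List[List[float]], iters: int = 2000) -> List[float]:
    """Minimize u^T Q u over the simplex by projected gradient descent (Q symmetric PSD)."""
    n = len(Q)
    u = [1.0 / n] * n
    # Gershgorin bound: lambda_max(Q) <= max absolute row sum, so step 1/(2*lam) is safe.
    lam = max(sum(abs(q) for q in row) for row in Q)
    step = 1.0 / (2.0 * lam)
    for _ in range(iters):
        grad = [2.0 * sum(Q[i][j] * u[j] for j in range(n)) for i in range(n)]
        u = project_to_simplex([ui - step * gi for ui, gi in zip(u, grad)])
    return u
```

For example, with `Q = [[1, 0], [0, 2]]` the minimizer of u₁² + 2u₂² on the simplex is (2/3, 1/3), which the iteration approaches quickly because the problem is well conditioned.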

Preprint · 2022 · arXiv

BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Partition-Parallelism and Random Boundary Node Sampling

Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art method for graph-based learning tasks. However, training GCNs at scale is still challenging, hindering both the exploration of more sophisticated GCN architectures and their application to real-world large graphs. While it might be natural to consider graph partitioning and distributed training to tackle this challenge, previous works have only scratched the surface of this direction due to the limitations of existing designs. In this work, we first analyze why distributed GCN training is ineffective and identify the underlying cause: the excessive number of boundary nodes in each partitioned subgraph, which easily explodes the memory and communication costs of GCN training. We then propose a simple yet effective method, dubbed BNS-GCN, that adopts random Boundary-Node-Sampling to enable efficient and scalable distributed GCN training. Experiments and ablation studies consistently validate the effectiveness of BNS-GCN, e.g., boosting throughput by up to 16.2x and reducing memory usage by up to 58%, while maintaining full-graph accuracy. Furthermore, both theoretical and empirical analyses show that BNS-GCN converges better than existing sampling-based methods. We believe that BNS-GCN opens up a new paradigm for enabling GCN training at scale. The code is available at https://github.com/RICE-EIC/BNS-GCN.
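
The notion of boundary nodes and their random sampling can be illustrated on a toy partition. In the minimal sketch below, a partition's boundary nodes are nodes owned by other partitions that neighbor at least one of its inner nodes, and a fraction `p` of them is kept uniformly at random. The function names and the rounding rule for the sample size are assumptions, not taken from the BNS-GCN code.

```python
import random
from typing import Dict, List, Set


def boundary_nodes(adj: Dict[int, List[int]], part: Dict[int, int], pid: int) -> Set[int]:
    """Nodes outside partition `pid` that are adjacent to at least one inner node."""
    return {v for u in adj if part[u] == pid for v in adj[u] if part[v] != pid}


def sample_boundary(adj: Dict[int, List[int]], part: Dict[int, int], pid: int,
                    p: float, rng: random.Random) -> Set[int]:
    """Keep a random fraction p of the boundary nodes; only these are communicated."""
    bnd = sorted(boundary_nodes(adj, part, pid))  # sorted for reproducibility
    k = round(p * len(bnd))
    return set(rng.sample(bnd, k))
```

On a six-node ring split into {0, 1, 2} and {3, 4, 5}, partition 0 has boundary nodes {3, 5}, and `p = 0.5` keeps one of them. Drawing a fresh sample each epoch bounds the memory and communication volume per partition, which is the cost the abstract identifies as the bottleneck.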

Preprint · 2022 · arXiv

CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Designed Adaptive Lossy Compression

As HPC systems continue to grow to exascale, the amount of data that needs to be saved or transmitted is exploding. To address this, many previous works have studied using error-bounded lossy compressors to reduce data size and improve I/O performance. However, little work has been done on effectively offloading lossy compression onto FPGA-based SmartNICs to reduce the compression overhead. In this paper, we propose a hardware-algorithm co-design of an efficient and adaptive lossy compressor for scientific data on FPGAs (called CEAZ), which is the first lossy compressor that can achieve high compression ratios and throughputs simultaneously. Specifically, we propose an efficient Huffman coding approach that can adaptively update Huffman codewords online based on codewords generated offline from a variety of representative scientific datasets. Moreover, we derive a theoretical analysis to support precise control of the compression ratio under an error-bounded compression mode, enabling accurate offline generation of Huffman codewords. This also helps us create a fixed-ratio compression mode for consistent throughput. In addition, we develop an efficient compression pipeline by adapting cuSZ's dual-quantization algorithm to our hardware use cases. Finally, we evaluate CEAZ on five real-world datasets with both a single FPGA board and 128 nodes (to accelerate parallel I/O). Experiments show that CEAZ outperforms the second-best FPGA-based lossy compressor by 2.3X in throughput and 3.0X in compression ratio. It also improves MPI_File_write and MPI_Gather throughputs by up to 28.9X and 37.8X, respectively.
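
The error-bounded quantization stage can be sketched in one dimension. The code below is a simplified reading of the dual-quantization idea, not CEAZ's or cuSZ's actual kernels. Values are first snapped to an integer grid of bin width 2·eb, which already guarantees an absolute error of at most eb. A previous-value (1D Lorenzo) predictor then runs on those integers, so the residuals are exact small integers that a Huffman coder can compress. The function names are hypothetical.

```python
from typing import List


def dual_quantize(data: List[float], eb: float) -> List[int]:
    """Return integer residuals; reconstruction error is bounded by eb."""
    if not data:
        return []
    # Prequantization: round each value to the nearest multiple of 2*eb.
    q = [round(x / (2.0 * eb)) for x in data]
    # Postquantization: previous-value prediction on integers, so residuals are exact.
    return [q[0]] + [q[i] - q[i - 1] for i in range(1, len(q))]


def dual_dequantize(deltas: List[int], eb: float) -> List[float]:
    """Invert the prediction with a running sum, then scale back to the value grid."""
    out: List[float] = []
    acc = 0
    for d in deltas:
        acc += d
        out.append(acc * 2.0 * eb)
    return out
```

Because prediction happens after rounding, no error accumulates through the predictor. Smooth data yields residuals clustered near zero, which is what makes offline-generated Huffman codewords effective.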

Preprint · 2022 · arXiv

Chinese Word Sense Embedding with SememeWSD and Synonym Set

Word embedding is a fundamental natural language processing task that learns vector representations of words. However, most word embedding methods assign only one vector to a word, even though polysemous words have multiple senses. To address this limitation, we propose the SememeWSD Synonym (SWSDS) model, which assigns a different vector to each sense of a polysemous word with the help of word sense disambiguation (WSD) and the synonym sets in OpenHowNet. We use the SememeWSD model, an unsupervised word sense disambiguation model based on OpenHowNet, to perform word sense disambiguation and annotate each polysemous word with a sense id. Then, we obtain the top 10 synonyms of each word sense from OpenHowNet and take the average vector of these synonyms as the vector of the word sense. In experiments, we evaluate the SWSDS model on semantic similarity calculation with Gensim's wmdistance method, and it improves accuracy. We also examine the SememeWSD model on different BERT models to find the more effective one.
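
The sense-vector step reduces to averaging the embeddings of a sense's top-ranked synonyms. The sketch below assumes the synonym list has already been retrieved and ranked (the OpenHowNet lookup is not shown), and that embeddings are a plain word-to-vector dictionary. Skipping synonyms without an embedding is an assumption, and `sense_vector` is a hypothetical name.

```python
from typing import Dict, List


def sense_vector(synonyms: List[str], embeddings: Dict[str, List[float]],
                 top_k: int = 10) -> List[float]:
    """Average the vectors of the first top_k synonyms that have an embedding."""
    vecs = [embeddings[w] for w in synonyms[:top_k] if w in embeddings]
    if not vecs:
        raise ValueError("no synonym of this sense has an embedding")
    dim = len(vecs[0])
    # Component-wise mean over the available synonym vectors.
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
```

Each sense of a polysemous word then gets its own vector, for example one averaged from "bank (river)" synonyms and another from "bank (finance)" synonyms, instead of one shared vector for the surface form.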

Preprint · 2022 · arXiv

CollComm: Enabling Efficient Collective Quantum Communication Based on EPR buffering

The noisy and lengthy nature of quantum communication hinders the development of distributed quantum computing, and the inefficient design of existing compilers for distributed quantum computing worsens the situation. Previous compilation frameworks couple communication hardware with the implementation of expensive remote gates. However, we discover that the efficiency of quantum communication, especially collective communication, can be significantly boosted by decoupling communication resources from remote operations; that is, the communication hardware is used only for preparing remote entanglement, and the computational hardware, the component used to store program information, is used for conducting remote gates. Based on this observation, we develop a compiler framework to optimize the collective communication occurring in distributed quantum programs. In this framework, we decouple the communication preparation process in communication hardware from the remote gates conducted in computational hardware by buffering EPR pairs generated by communication hardware in qubits of the computational hardware. Experimental results show that the proposed framework can halve the communication cost of various distributed quantum programs compared with state-of-the-art compilers for distributed quantum computing.

Preprint · 2022 · arXiv

Cross-media Scientific Research Achievements Query based on Ranking Learning

With the advent of the information age, the scale of data on the Internet keeps growing, and it is full of text, images, videos, and other information. Unlike social media and news data, information on scientific research achievements features many proper nouns and strong ambiguity. The traditional single-modal, keyword-based query method can no longer meet the needs of scientific researchers and managers at the Ministry of Science and Technology. Scientific research project information and scholar information contain a large amount of valuable information on research achievements. Evaluating the output capability of scientific research projects and teams can effectively assist managers in decision-making. Against this background, this paper reviews the state of research from four aspects: feature learning of scientific research results, cross-media research result query, ranking learning of scientific research results, and cross-media scientific research achievement query systems.

Preprint · 2022 · arXiv

Dark matter admixed neutron star properties in the light of X-ray pulse profile observations

The dark matter (DM) in DM-admixed neutron stars (DANSs) is expected to be distributed either as a dense dark core or as an extended dark halo, depending on the DM fraction of the DANS ($f_χ$) and on DM properties such as the particle mass ($m_χ$) and the self-interaction strength ($y$). In this paper, we perform an in-depth analysis of the formation criterion for a dark core or dark halo and point out that the relative distribution of these two components is essentially determined by the ratio of the central enthalpy of the DM component to that of the baryonic matter component inside DANSs. For the critical case where the radii of DM and baryonic matter are the same, we further derive an analytical formula describing the dependence of $f^{\rm crit}_χ$ on $m_χ$ and $y$ for a given DANS mass. The relative distribution of the two components in DANSs can lead to different observational effects. Here we focus on the modification of the pulsar pulse profile due to the extra light-bending effect when a dark halo exists, and conduct the first investigation of dark-halo effects on the pulse profile. We find that the peak flux deviation depends strongly on the ratio of the halo mass to the radius of the DM component. Lastly, we perform Bayesian parameter estimation of the DM particle properties based on recent X-ray observations of PSR J0030+0451 and PSR J0740+6620 by the Neutron Star Interior Composition Explorer.

Preprint · 2022 · arXiv

Efficient Hierarchical State Vector Simulation of Quantum Circuits via Acyclic Graph Partitioning

Early but promising results in quantum computing have been enabled by the concurrent development of quantum algorithms, devices, and materials. Classical simulation of quantum programs has enabled the design and analysis of algorithms and implementation strategies targeting current and anticipated quantum device architectures. In this paper, we present a graph-based approach to efficient quantum circuit simulation. Our approach partitions the graph representation of a given quantum circuit into acyclic sub-graphs (sub-circuits) that exhibit better data locality. Simulation of each sub-circuit is organized hierarchically, with the iterative construction and simulation of smaller state vectors, improving overall performance. This partitioning also reduces the number of passes through the data, improving the total computation time. We present three partitioning strategies and observe that acyclic graph partitioning typically yields the best time-to-solution, whereas the other strategies reduce partitioning time at the expense of potentially longer simulation times. Experimental evaluation demonstrates the effectiveness of our approach.
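
One simple way to obtain acyclic sub-circuits is to cut a gate list that is already in a valid execution order into consecutive chunks. Every dependency then points from an earlier chunk to a later one, so the chunk graph cannot contain a cycle. The greedy sketch below closes a chunk whenever adding the next gate would make it touch more than `max_qubits` qubits, so each chunk needs a state vector of at most 2^max_qubits amplitudes. It is a simplified stand-in for the paper's acyclic graph partitioner, and `greedy_partition` is a hypothetical name.

```python
from typing import List, Sequence, Set


def greedy_partition(gates: Sequence[Sequence[int]], max_qubits: int) -> List[List[int]]:
    """Split gates (given in execution order, as tuples of qubit indices)
    into consecutive chunks touching at most max_qubits qubits each."""
    parts: List[List[int]] = []
    current: List[int] = []
    qubits: Set[int] = set()
    for idx, gate in enumerate(gates):
        merged = qubits | set(gate)
        if current and len(merged) > max_qubits:
            parts.append(current)      # close the chunk; order is preserved
            current = []
            merged = set(gate)
        current.append(idx)
        qubits = merged
    if current:
        parts.append(current)
    return parts
```

For the gates (0,1), (1,2), (2,3), (0,1) with `max_qubits = 3`, the chunks are [[0, 1], [2], [3]] (gate indices). A real partitioner can also group gates that are not adjacent in the list, which is where better data locality comes from.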

Preprint · 2022 · arXiv

Efimov resonance position near a narrow Feshbach resonance in $^6$Li-$^{133}$Cs mixture

Near a narrow Feshbach resonance, Efimov features are expected to be characterized by the resonance's properties rather than by the van der Waals length of the interatomic potential. Although this theoretical prediction is by now well established, it still lacks experimental confirmation. Here, we apply our recently developed three-channel model [Yudkin and Khaykovich, Phys. Rev. A 103, 063303 (2021)] to the experimental result obtained in a mass-imbalanced $^6$Li-$^{133}$Cs mixture near the narrowest resonance explored to date [Johansen et al., Nat. Phys. 13, 731 (2017)]. We confirm that the observed position of the Efimov resonance is dictated mainly by the resonance physics, while the influence of the van der Waals tail of the interatomic potential is minor. We show that the resonance position is strongly influenced by the presence of another Feshbach resonance, which significantly alters the effective background scattering length at the narrow resonance position.

Preprint · 2022 · arXiv

FastMapSVM: Classifying Complex Objects Using the FastMap Algorithm and Support-Vector Machines

Neural Networks and related Deep Learning methods are currently at the leading edge of technologies used for classifying objects. However, they generally demand large amounts of time and data for model training, and their learned models can sometimes be difficult to interpret. In this paper, we advance FastMapSVM, an interpretable Machine Learning framework for classifying complex objects, as an advantageous alternative to Neural Networks for general classification tasks. FastMapSVM extends the applicability of Support-Vector Machines (SVMs) to domains with complex objects by combining the complementary strengths of FastMap and SVMs. FastMap is an efficient linear-time algorithm that maps complex objects to points in a Euclidean space while preserving pairwise domain-specific distances between them. We demonstrate the efficiency and effectiveness of FastMapSVM in the context of classifying seismograms. We show that its performance, in terms of precision, recall, and accuracy, is comparable to that of other state-of-the-art methods. However, compared with other methods, FastMapSVM requires significantly less time and data for model training. It also provides a perspicuous visualization of the objects and the classification boundaries between them. We expect FastMapSVM to be viable for classification tasks in many other real-world domains.
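
The FastMap step can be sketched directly from its classic formulation. For each output dimension, two distant pivot objects a and b are chosen with the usual farthest-point heuristic. Each object o is then placed at x = (d(a,o)² + d(a,b)² − d(b,o)²) / (2·d(a,b)), and later dimensions work on the residual distances left after subtracting earlier coordinates. The sketch below assumes only a user-supplied distance function. The SVM stage (for example, any off-the-shelf SVM trained on the returned coordinates) is not shown.

```python
import math
from typing import Callable, List, Sequence


def fastmap(objects: Sequence, dist: Callable[[object, object], float],
            k: int) -> List[List[float]]:
    """Embed objects into k Euclidean dimensions, approximately preserving dist."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def d2(i: int, j: int, dim: int) -> float:
        # Squared residual distance after removing the first `dim` coordinates.
        base = dist(objects[i], objects[j]) ** 2
        for c in range(dim):
            base -= (coords[i][c] - coords[j][c]) ** 2
        return max(base, 0.0)

    for dim in range(k):
        # Pivot heuristic: farthest object from object 0, then farthest from that one.
        b = max(range(n), key=lambda j: d2(0, j, dim))
        a = max(range(n), key=lambda j: d2(b, j, dim))
        dab2 = d2(a, b, dim)
        if dab2 == 0.0:
            break  # all residual distances are zero; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            coords[i][dim] = (d2(a, i, dim) + dab2 - d2(b, i, dim)) / (2.0 * dab)
    return coords
```

Each dimension costs a constant number of passes over the objects, so the embedding is linear in the number of objects for fixed `k`. For points on a line, one dimension reproduces all pairwise distances exactly.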

Preprint · 2022 · arXiv

GAAF: Searching Activation Functions for Binary Neural Networks through Genetic Algorithm

Binary neural networks (BNNs) show promise in cost- and power-restricted domains such as edge devices and mobile systems, owing to their significantly lower computation and storage demands, albeit at the cost of degraded performance. To close the accuracy gap, in this paper we propose to add a complementary activation function (AF) ahead of the sign-based binarization and rely on a genetic algorithm (GA) to automatically search for ideal AFs. These AFs can help extract extra information from the input data in the forward pass, while allowing improved gradient approximation in the backward pass. Fifteen novel AFs are identified through our GA-based search, most of which show improved performance (up to 2.54% on ImageNet) when tested on different datasets and network models. Our method offers a novel approach to designing general and application-specific BNN architectures. Our code is available at http://github.com/flying-Yan/GAAF.

preprint2022arXiv

GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing

With the increasing popularity of robotics in industrial control and autonomous driving, deep reinforcement learning (DRL) has attracted attention in many fields. However, DRL computation on modern, powerful GPU platforms remains inefficient because of its heterogeneous workloads and interleaved execution paradigm. To this end, we propose GMI-DRL, a systematic design that accelerates multi-GPU DRL via GPU spatial multiplexing. We introduce a novel design of resource-adjustable GPU multiplexing instances (GMIs) that match the actual needs of DRL tasks, an adaptive GMI management strategy that simultaneously achieves high GPU utilization and computation throughput, and highly efficient inter-GMI communication support that meets the demands of various DRL communication patterns. Comprehensive experiments show that GMI-DRL outperforms state-of-the-art NVIDIA Isaac Gym with NCCL (up to 2.81X) and Horovod (up to 2.34X) support in training throughput on the latest DGX-A100 platform. Our work provides an initial user experience with GPU spatial multiplexing for processing heterogeneous workloads that mix computation and communication.

preprint2022arXiv

H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture

Graph Neural Networks (GNNs) have drawn tremendous attention due to their unique capability to extend Machine Learning (ML) approaches to applications broadly defined as having unstructured data, especially graphs. Compared with other ML modalities, the acceleration of GNNs is more challenging due to the irregularity and heterogeneity derived from graph topologies. Existing efforts, however, have focused mainly on handling graphs' irregularity and have not studied their heterogeneity. To this end, we propose H-GCN, a PL (Programmable Logic) and AIE (AI Engine) based hybrid accelerator that leverages the emerging heterogeneity of Xilinx Versal Adaptive Compute Acceleration Platforms (ACAPs) to achieve high-performance GNN inference. In particular, H-GCN partitions each graph into three subgraphs based on its inherent heterogeneity and processes them using PL and AIE. To further improve performance, we explore the sparsity support of AIE and develop an efficient density-aware method to automatically map tiles of sparse matrix-matrix multiplication (SpMM) onto the systolic tensor array. Compared with state-of-the-art GCN accelerators, H-GCN achieves average speedups of 1.1X to 2.3X.

preprint2022arXiv

I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization

Graph Convolutional Networks (GCNs) have drawn tremendous attention in the past three years. Compared with other deep learning modalities, high-performance hardware acceleration of GCNs is just as critical but even more challenging. The hurdles arise from poor data locality and redundant computation caused by the large size, high sparsity, and irregular non-zero distribution of real-world graphs. In this paper we propose a novel hardware accelerator for GCN inference, called I-GCN, that significantly improves data locality and reduces unnecessary computation. The mechanism is a new online graph restructuring algorithm we refer to as islandization. The proposed algorithm finds clusters of nodes with strong internal but weak external connections. The islandization process yields two major benefits. First, by processing islands rather than individual nodes, there is better on-chip data reuse and fewer off-chip memory accesses. Second, there is less redundant computation, as aggregation for common/shared neighbors in an island can be reused. The parallel search, identification, and leverage of graph islands are all handled purely in hardware at runtime, working in an incremental pipeline. This is done without any preprocessing of the graph data or adjustment of the GCN model structure. Experimental results show that I-GCN can significantly reduce off-chip accesses and prune 38% of aggregation operations, leading to average speedups of 5549x, 403x, and 5.7x over CPUs, GPUs, and prior-art GCN accelerators, respectively.

preprint2022arXiv

Information-theoretic Online Memory Selection for Continual Learning

A challenging problem in task-free continual learning is the online selection of a representative replay memory from data streams. In this work, we investigate the online memory selection problem from an information-theoretic perspective. To gather the most information, we propose the surprise and learnability criteria to pick informative points and to avoid outliers. We present a Bayesian model that computes the criteria efficiently by exploiting rank-one matrix structures. We demonstrate that these criteria encourage selecting informative points in a greedy algorithm for online memory selection. Furthermore, by identifying the importance of the timing of memory updates, we introduce a stochastic information-theoretic reservoir sampler (InfoRS), which samples among selected points with high information. Compared to reservoir sampling, InfoRS demonstrates improved robustness against data imbalance. Finally, empirical results on continual learning benchmarks demonstrate its efficiency and efficacy.
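The idea of reservoir sampling restricted to informative points can be sketched as follows. This is an illustration, not the InfoRS implementation: the paper's surprise and learnability scores come from a Bayesian model that is not reproduced here, so the `score` callable and the fixed `threshold` are hypothetical stand-ins. Points that pass the filter are fed to standard reservoir sampling (Algorithm R), so the memory stays a uniform sample of the high-information points seen so far.

```python
import random
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class SelectiveReservoir(Generic[T]):
    """Reservoir sampling over points whose (hypothetical) information score passes a threshold."""

    def __init__(self, capacity: int, score: Callable[[T], float],
                 threshold: float, rng: Optional[random.Random] = None) -> None:
        self.capacity = capacity
        self.score = score          # hypothetical stand-in for surprise/learnability criteria
        self.threshold = threshold  # hypothetical fixed cut-off
        self.rng = rng or random.Random(0)
        self.memory: List[T] = []
        self.n_selected = 0         # how many points have passed the filter so far

    def observe(self, x: T) -> None:
        if self.score(x) < self.threshold:
            return  # low-information point: never enters the memory
        self.n_selected += 1
        if len(self.memory) < self.capacity:
            self.memory.append(x)
        else:
            # Algorithm R: the i-th selected point replaces a random slot
            # with probability capacity / i.
            j = self.rng.randrange(self.n_selected)
            if j < self.capacity:
                self.memory[j] = x
```

Keeping the filter separate from the sampler makes it easy to swap in a learned score without touching the replacement logic.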

preprint2022arXiv

Interacting $ud$ and $uds$ quark matter at finite densities and quark stars

The stability and equation of state of quark matter are studied within both two-flavor and (2+1)-flavor Nambu-Jona-Lasinio (NJL) models including vector interactions. With a free parameter $\alpha$, the Lagrangian is constructed from two parts, the original NJL Lagrangian and its Fierz transformation, as $L=(1-\alpha) L_{\rm NJL}+\alpha L_{\rm Fierz}$. We find that both $ud$ nonstrange and $uds$ strange matter can possibly be absolutely stable, depending on the interplay of confinement with the quark vector interaction and the exchange interaction channels. The calculated quark star properties are consistent with the recently measured masses and radii of PSR J0030+0451 and PSR J0740+6620, as well as the tidal deformability of GW170817. Furthermore, the more strongly interacting quark matter in nonstrange stars allows a stiffer equation of state and consequently a higher maximum mass ($\sim2.7\, M_{\odot}$) than in strange stars ($\sim2.1\, M_{\odot}$). The sound velocities in strange and nonstrange quark star matter are briefly discussed and compared with those of neutron star matter.

preprint2022arXiv

Iterative Geometry-Aware Cross Guidance Network for Stereo Image Inpainting

Currently, single image inpainting has achieved promising results based on deep convolutional neural networks. However, inpainting of stereo images with missing regions has not been thoroughly explored, although it is a significant and distinct problem. One crucial requirement for stereo image inpainting is stereo consistency. To achieve it, we propose an Iterative Geometry-Aware Cross Guidance Network (IGGNet). IGGNet contains two key ingredients: a Geometry-Aware Attention (GAA) module and an Iterative Cross Guidance (ICG) strategy. The GAA module relies on epipolar geometry cues and learns geometry-aware guidance from one view to the other, which helps make the corresponding regions in the two views consistent. However, learning guidance from co-existing missing regions is challenging. To address this issue, we propose the ICG strategy, which alternately narrows down the missing regions of the two views in an iterative manner. Experimental results demonstrate that our proposed network outperforms the latest stereo image inpainting model and state-of-the-art single image inpainting models.

preprint2022arXiv

Knowledge Graph and Accurate Portrait Construction of Scientific and Technological Academic Conferences

In recent years, with the continuous progress of science and technology, the number of scientific research achievements has been increasing day by day. As platforms and media for exchanging these achievements, scientific and technological academic conferences have become increasingly numerous. Each conference produces a large number of academic papers, researchers, research institutions, and other data, and this massive volume of data makes it difficult for researchers to obtain valuable information. It is therefore of great significance to use deep learning to mine the core information in scientific and technological academic conference data and to build a knowledge graph and accurate portrait system for these conferences, so that researchers can obtain scientific research information faster.

preprint2022arXiv

Learning and Fast Adaptation for Grid Emergency Control via Deep Meta Reinforcement Learning

As power systems undergo a significant transformation, with more uncertainty, less inertia, and operation closer to their limits, the risk of large outages is increasing. There is thus an imperative need to enhance grid emergency control to maintain system reliability and security. Towards this end, great progress has been made in recent years in developing deep reinforcement learning (DRL) based grid control solutions. However, existing DRL-based solutions have two main limitations: 1) they do not handle a wide range of grid operating conditions, system parameters, and contingencies well; 2) they generally lack the ability to adapt quickly to new grid operating conditions, system parameters, and contingencies, limiting their applicability to real-world applications. In this paper, we mitigate these limitations by developing a novel deep meta reinforcement learning (DMRL) algorithm. DMRL combines meta strategy optimization with DRL, and trains policies modulated by a latent space that can quickly adapt to new scenarios. We test the developed DMRL algorithm on the IEEE 300-bus system. Using the proposed method, we demonstrate fast adaptation of the meta-trained DRL policies with latent variables to new operating conditions and scenarios, and achieve superior performance compared to state-of-the-art DRL and model predictive control (MPC) methods.

preprint2022arXiv

Mining and searching association relation of scientific papers based on deep learning

The data of scientific papers exhibit complex correlations. These correlations reveal the characteristics, patterns, and relationships contained in the data of scientific and technological papers in specific fields; mining them enables the analysis of scientific and technological big data and helps in designing applications that serve researchers. Research on mining and searching the association relationships of scientific papers based on deep learning therefore has far-reaching practical significance.

preprint2022arXiv

Neural Mean Discrepancy for Efficient Out-of-Distribution Detection

Various approaches have been proposed for out-of-distribution (OOD) detection by augmenting models, input examples, training sets, and optimization objectives. Deviating from existing work, we start from a simple hypothesis: standard off-the-shelf models may already contain sufficient information about the training set distribution, which can be leveraged for reliable OOD detection. Our empirical study validating this hypothesis, which measures the mean of model activations for OOD and in-distribution (ID) mini-batches, surprisingly finds that the activation means of OOD mini-batches consistently deviate more from those of the training data. In addition, the training data's activation means can be computed offline efficiently or retrieved from batch normalization layers as a 'free lunch'. Based on this observation, we propose a novel metric called Neural Mean Discrepancy (NMD), which compares the neural means of the input examples and the training data. Leveraging the simplicity of NMD, we propose an efficient OOD detector that computes neural means with a standard forward pass followed by a lightweight classifier. Extensive experiments show that NMD outperforms state-of-the-art OOD approaches across multiple datasets and model architectures in terms of both detection accuracy and computational cost.
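The quantity NMD compares can be sketched in a few lines. This is a simplified illustration under stated assumptions: activations for one layer are given as a list of per-example channel vectors, the training means are precomputed (for a PyTorch model they could, for instance, be read from a BatchNorm layer's `running_mean` buffer), and the scalar `nmd_score` (an L2 norm) is a hypothetical stand-in for the lightweight classifier the paper trains on the discrepancy features.

```python
import math
from typing import List, Sequence


def channel_means(acts: Sequence[Sequence[float]]) -> List[float]:
    """Per-channel mean over a mini-batch (rows = examples, columns = channels)."""
    n = len(acts)
    return [sum(row[c] for row in acts) / n for c in range(len(acts[0]))]


def neural_mean_discrepancy(batch_acts: Sequence[Sequence[float]],
                            train_means: Sequence[float]) -> List[float]:
    """Per-channel gap between the batch's activation means and the training means."""
    return [b - t for b, t in zip(channel_means(batch_acts), train_means)]


def nmd_score(batch_acts: Sequence[Sequence[float]],
              train_means: Sequence[float]) -> float:
    """Hypothetical scalar summary (L2 norm); the paper uses a learned classifier instead."""
    return math.sqrt(sum(d * d for d in neural_mean_discrepancy(batch_acts, train_means)))
```

Because only means are needed, the per-layer cost is one pass over the batch's activations, which is consistent with the efficiency argument in the abstract.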

preprint2022arXiv

On the moment of inertia of PSR J0737-3039 A from LIGO/Virgo and NICER

We perform a Bayesian analysis of the neutron star moment of inertia using the available gravitational-wave data from LIGO/Virgo (GW170817 and GW190425) and mass-radius measurements from the Neutron Star Interior Composition Explorer (PSR J0030+0451 and PSR J0740+6620), incorporating a possible phase transition in the pulsar inner core. We find that the moment of inertia of pulsar A in the double pulsar binary J0737-3039 is $\sim1.30\times10^{45}\,{\rm g\,cm^2}$, which depends only slightly on the employed hadronic equation of state. We also demonstrate how a moment of inertia measurement would improve our knowledge of the equation of state and the mass-radius relation for neutron stars, and discuss whether a quark deconfinement phase transition is supported by the available data and by forthcoming data that could be consistent with this hypothesis. We find that if pulsar A is a quark star, its moment of inertia takes a large value of $\sim1.55\times10^{45}\,{\rm g\,cm^2}$, suggesting that measurements of the moment of inertia of PSR J0737-3039A could distinguish it from (hybrid-)neutron stars. We finally demonstrate the moment-of-inertia-compactness universal relations and provide analytical fits for both (hybrid-)neutron star and quark star results based on our analysis.

preprint2022arXiv

Prediction of Depression Severity Based on the Prosodic and Semantic Features with Bidirectional LSTM and Time Distributed CNN

Depression is increasingly impacting individuals both physically and psychologically worldwide. It has become a major global public health problem and attracts attention from various research fields. Traditionally, depression is diagnosed through semi-structured interviews and supplementary questionnaires, which makes the diagnosis heavily reliant on physicians' experience and subject to bias. An automated depression diagnosis system can enable mental health monitoring and cloud-based remote diagnosis. In this article, we propose an attention-based multimodal speech and text representation for depression prediction. Our model is trained to estimate the depression severity of participants using the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset. For the audio modality, we use the collaborative voice analysis repository (COVAREP) features provided by the dataset and employ a Bidirectional Long Short-Term Memory Network (Bi-LSTM) followed by a Time-distributed Convolutional Neural Network (T-CNN). For the text modality, we use global vectors for word representation (GloVe) to perform word embeddings, and the embeddings are fed into the Bi-LSTM network. Results show that both audio and text models perform well on the depression severity estimation task over five classes (healthy, mild, moderate, moderately severe, and severe): the audio model achieves a best sequence-level F1 score of 0.9870 and a patient-level F1 score of 0.9074, and the text model achieves a sequence-level F1 score of 0.9709 and a patient-level F1 score of 0.9245. Results are similar for the multimodal fused model, with the highest F1 score of 0.9580 on the patient-level depression detection task over five classes. Experiments show statistically significant improvements over previous works.

preprint2022arXiv

Probabilities of Causation with Nonbinary Treatment and Effect

This paper deals with the problem of estimating the probabilities of causation when treatment and effect are not binary. For binary treatment and effect, Tian and Pearl derived sharp bounds for the probability of necessity and sufficiency (PNS), the probability of sufficiency (PS), and the probability of necessity (PN) using experimental and observational data. In this paper, we provide theoretical bounds for all types of probabilities of causation with multivalued treatments and effects. We further discuss examples where our bounds guide practical decisions and use simulation studies to evaluate how informative the bounds are for various combinations of data.
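For reference, the binary case that the paper generalizes can be written down directly. The sketch below computes the Tian and Pearl bounds on PNS that use experimental data alone, max{0, P(y_x) - P(y_x')} <= PNS <= min{P(y_x), P(y'_x')}. The tighter bounds that also use observational data, and the multivalued bounds proposed in the paper, are not reproduced here, and the function name is hypothetical.

```python
from typing import Tuple


def pns_bounds_experimental(p_y_do_x: float, p_y_do_not_x: float) -> Tuple[float, float]:
    """Bounds on PNS for binary treatment and effect, from experimental data only.

    p_y_do_x     = P(y_x):  probability of the effect when the treatment is applied
    p_y_do_not_x = P(y_x'): probability of the effect when the treatment is withheld
    (Hypothetical helper; the paper's multivalued bounds are not reproduced.)
    """
    for p in (p_y_do_x, p_y_do_not_x):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    lower = max(0.0, p_y_do_x - p_y_do_not_x)
    upper = min(p_y_do_x, 1.0 - p_y_do_not_x)  # P(y'_x') = 1 - P(y_x')
    return lower, upper
```

For example, P(y_x) = 0.7 and P(y_x') = 0.3 bound PNS to roughly [0.4, 0.7]; the width of that interval is what additional observational data can help narrow.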

preprint2022arXiv

QASMBench: A Low-level QASM Benchmark Suite for NISQ Evaluation and Simulation

The rapid development of quantum computing (QC) in the NISQ era urgently demands a low-level benchmark suite and insightful evaluation metrics for characterizing the properties of prototype NISQ devices, the efficiency of QC programming compilers, schedulers, and assemblers, and the capability of quantum system simulators on a classical computer. In this work, we fill this gap by proposing a low-level, easy-to-use benchmark suite called QASMBench, based on the OpenQASM assembly representation. It consolidates commonly used quantum routines and kernels from a variety of domains, including chemistry, simulation, linear algebra, searching, optimization, arithmetic, machine learning, fault tolerance, and cryptography, trading off generality against usability. To analyze these kernels in terms of NISQ device execution, in addition to circuit width and depth, we propose four circuit metrics (gate density, retention lifespan, measurement density, and entanglement variance) to extract more insights about execution efficiency, susceptibility to NISQ error, and the potential gain from machine-specific optimizations. Applications in QASMBench can be launched and verified on several NISQ platforms, including IBM-Q, Rigetti, IonQ, and Quantinuum. For evaluation, we measure the execution fidelity of a subset of QASMBench applications on 12 IBM-Q machines through density matrix state tomography, comprising 25K circuit evaluations. We also compare the fidelity of executions among the IBM-Q machines, the IonQ QPU, and the Rigetti Aspen M-1 system. QASMBench is released at: http://github.com/pnnl/QASMBench.

preprint2022arXiv

Quantum interference visibility spectroscopy in two-color photoemission from tungsten needle tips

When two-color femtosecond laser pulses interact with matter, electrons can be emitted through various multiphoton excitation pathways. Quantum interference between these pathways gives rise to a strong oscillation of the photoemitted electron current, experimentally characterized by its visibility. In this work, we demonstrate two-color visibility spectroscopy of multiphoton photoemission from a solid-state nanoemitter. We investigate the quantum pathway interference visibility over an almost octave-spanning wavelength range of the fundamental femtosecond laser pulses and their second harmonic. The photoemission shows a high visibility of 90% ± 5%, with a remarkably constant distribution. Furthermore, by varying the relative intensity ratio of the two colors, we find that the visibility can be tuned between 0 and close to 100%. A simple but highly insightful theoretical model explains all observations with excellent quantitative agreement. We expect this work to apply universally to all kinds of photo-driven quantum interference, including quantum control in physics, chemistry, and quantum engineering.
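Visibility here is the usual fringe contrast of the emitted current as the relative phase between the two colors is scanned, V = (I_max - I_min) / (I_max + I_min). The sketch below assumes a simple sinusoidal phase dependence, I(phi) = I0 (1 + V cos phi), purely to generate illustrative data; it is not the theoretical model used in the paper, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def visibility(currents: Sequence[float]) -> float:
    """Fringe visibility of a current trace recorded over a relative-phase scan."""
    i_max, i_min = max(currents), min(currents)
    return (i_max - i_min) / (i_max + i_min)


def toy_phase_scan(i0: float, v: float, steps: int = 64) -> List[float]:
    """Toy model (an assumption for illustration): I(phi) = i0 * (1 + v * cos(phi))."""
    return [i0 * (1.0 + v * math.cos(2.0 * math.pi * k / steps)) for k in range(steps)]
```

With an even number of steps the scan includes phi = 0 and phi = pi, so `visibility(toy_phase_scan(1.0, 0.9))` recovers approximately 0.9, and a phase-independent current gives a visibility of 0.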

preprint2022arXiv

QuClassi: A Hybrid Deep Neural Network Architecture based on Quantum State Fidelity

In the past decade, remarkable progress has been achieved in deep learning related systems and applications. In the post-Moore's Law era, however, the limits of semiconductor fabrication technology, along with increasing data sizes, have slowed down the development of learning algorithms. In parallel, the fast development of quantum computing has pushed it into a new era. Google demonstrated quantum supremacy by completing a specific task (a random sampling problem) in 200 seconds, a task that is impracticable for the largest classical computers. Because of this potential, quantum-based learning is an area of interest, in the hope that certain systems might offer a quantum speedup. In this work, we propose a novel architecture, QuClassi, a quantum neural network for both binary and multi-class classification. Powered by a quantum differentiation function along with a hybrid quantum-classical design, QuClassi encodes the data with a reduced number of qubits, generates the quantum circuit, and iteratively pushes it to the quantum platform to find the best states. We conduct intensive experiments on both a simulator and the IBM-Q quantum platform. The evaluation results demonstrate that QuClassi outperforms the state-of-the-art quantum-based solutions TensorFlow Quantum and QuantumFlow by up to 53.75% and 203.00% for binary and multi-class classification. Compared with traditional deep neural networks, QuClassi achieves comparable performance with 97.37% fewer parameters.
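Quantum state fidelity, which the title refers to, can be illustrated classically for pure states as F = |<psi|phi>|^2. On hardware, fidelity is commonly estimated with a SWAP test, whose ancilla reads 0 with probability (1 + F) / 2; we assume that kind of estimator here for illustration, without claiming it matches QuClassi's exact circuit. The single-qubit angle encoding below is likewise a hypothetical example, not the paper's data encoding.

```python
import math
from typing import List, Sequence


def fidelity(psi: Sequence[complex], phi: Sequence[complex]) -> float:
    """Fidelity |<psi|phi>|^2 between two normalized pure states given as amplitude lists."""
    overlap = sum(complex(a).conjugate() * complex(b) for a, b in zip(psi, phi))
    return abs(overlap) ** 2


def swap_test_p0(psi: Sequence[complex], phi: Sequence[complex]) -> float:
    """Assumed estimator: probability that a SWAP test's ancilla is measured as 0."""
    return 0.5 * (1.0 + fidelity(psi, phi))


def angle_encode(theta: float) -> List[complex]:
    """Hypothetical single-qubit encoding: cos(theta/2)|0> + sin(theta/2)|1>."""
    return [complex(math.cos(theta / 2)), complex(math.sin(theta / 2))]
```

Inverting the SWAP-test relation, F = 2 * P(0) - 1, is what lets a fidelity-based loss be estimated from repeated measurements of a single ancilla qubit.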

preprint2022arXiv

Research on accurate stereo portrait generation algorithm of scientific research team

To smoothly promote the establishment of scientific research projects, accurately identify excellent scientific research teams, and describe them intuitively and comprehensively, it is of great significance for scientific research management departments to comprehensively understand and objectively evaluate scientific research teams. At present, relatively little research has addressed the construction of accurate three-dimensional portraits of scientific research teams. In view of the practical demands of scientific research management departments, this paper proposes an accurate stereo portrait generation algorithm for scientific research teams. The algorithm includes three modules: research team identification, research topic extraction, and research team portrait generation. First, the leader of a scientific research team is identified with an iterative middle centrality ranking method, and the members of the team are identified through the 2-faction and snowball methods, which together identify the research team. Then, considering both the statistical information of words and their co-occurrence features within the research team, the team's research topics are extracted, improving the accuracy of research topic extraction. Finally, the research team portrait generation module generates an accurate three-dimensional portrait of the research team by generating the team profile, constructing the research cooperation relationships, and constructing the team topic cloud. Research teams are identified on a data set of scientific research achievements, and their accurate three-dimensional portraits are generated and visualized. Experiments verify the effectiveness of the proposed algorithm.

preprint2022arXiv

Research on Intellectual Property Resource Profile and Evolution Law

In the era of big data, intellectual property-oriented scientific and technological resources show a trend toward large data scale, high information density, and low value density, which poses severe challenges to the effective use of intellectual property resources, and the demand for mining hidden information in intellectual property is increasing. As a result, portraits of intellectual property-oriented science and technology resources and the analysis of their evolution have become current research hotspots. This paper reviews methods for constructing intellectual property resource portraits, along with their preliminary steps of entity extraction and entity completion, from the perspectives of algorithm classification and general process, and outlines directions for improving future methods.

preprint2022arXiv

Retrieval of Scientific and Technological Resources for Experts and Scholars

Institutions of higher learning, research institutes, and other scientific research units have abundant scientific and technological resources in the form of experts and scholars, and these talents, with their strong capacity for scientific and technological innovation, are an important force in promoting industrial upgrading. The scientific and technological resources of experts and scholars consist mainly of basic attributes and scientific research achievements. The basic attributes include information such as research interests, institutions, and education and work experience. However, due to information asymmetry and other reasons, these resources cannot be connected with society in a timely manner, and social needs cannot be accurately matched with experts and scholars. It is therefore necessary to build an expert and scholar information database and provide relevant expert and scholar retrieval services. This paper reviews related research in this field from four aspects: text relation extraction, text knowledge representation learning, text vector retrieval, and visualization systems.

preprint2022arXiv

Scientific and Technological Text Knowledge Extraction Method Based on Word Mixing and GRU

The knowledge extraction task is to extract relation triples (head entity, relation, tail entity) from unstructured text data. Existing knowledge extraction methods fall into two categories: "pipeline" methods and joint extraction methods. The "pipeline" method separates named entity recognition from entity relation extraction and handles each with its own module. Although this method is more flexible, its training speed is slow. Joint extraction uses an end-to-end neural network model that performs entity recognition and relation extraction at the same time, which preserves the association between entities and relations well and converts the joint extraction of entities and relations into a sequence annotation problem. In this paper, we propose a knowledge extraction method for scientific and technological resources based on word mixture and GRU, combined with a word mixture vector mapping method and a self-attention mechanism, to effectively improve relation extraction from Chinese scientific and technological text.

preprint2022arXiv

Searching Similarity Measure for Binarized Neural Networks

As a promising model for deployment on resource-limited devices, Binarized Neural Networks (BNNs) have drawn extensive attention from both academia and industry. However, compared to full-precision deep neural networks (DNNs), BNNs suffer from non-trivial accuracy degradation, limiting their applicability in various domains. This is partially because existing network components, such as the similarity measure, are specially designed for DNNs and might be sub-optimal for BNNs. In this work, we focus on a key component of BNNs, the similarity measure, which quantifies the distance between input feature maps and filters, and propose an automatic search method, based on a genetic algorithm, for BNN-tailored similarity measures. Evaluation results on CIFAR-10 and CIFAR-100 using ResNet, NIN, and VGG show that most of the identified similarity measures achieve considerable accuracy improvement (up to 3.39%) over the commonly used cross-correlation approach.
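The cross-correlation baseline is what makes BNNs cheap: for vectors with entries in {-1, +1}, the dot product equals 2 * popcount(XNOR(a, b)) - n, so it reduces to bitwise operations. A minimal sketch of that standard identity follows (bit 1 encodes +1, bit 0 encodes -1). It shows only the baseline similarity measure, not the measures found by the paper's genetic search, and the function names are our own.

```python
from typing import Sequence


def pack_signs(v: Sequence[int]) -> int:
    """Pack a {-1, +1} vector into an integer bit mask (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, s in enumerate(v):
        if s not in (-1, 1):
            raise ValueError("entries must be -1 or +1")
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a: Sequence[int], b: Sequence[int]) -> int:
    """Baseline binary cross-correlation <a, b> computed with XNOR and popcount."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n = len(a)
    mask = (1 << n) - 1
    agree = ~(pack_signs(a) ^ pack_signs(b)) & mask  # XNOR: 1 where the signs match
    matches = bin(agree).count("1")                  # popcount
    return 2 * matches - n                           # matches minus mismatches
```

A searched similarity measure would replace the final combination of `matches` and `n` (or the bitwise operation itself) while keeping the same cheap bit-level structure.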

preprint2022arXiv

Semantic Similarity Computing for Scientific Academic Conferences fused with domain features

To address the problem that current general-purpose semantic text similarity methods struggle to exploit the semantic information in scientific academic conference data, a semantic similarity calculation algorithm for scientific academic conferences that fuses domain features is proposed. First, the domain feature information of a conference is obtained through entity recognition and keyword extraction, and it is fed into the BERT network as a feature together with the conference information. A Siamese network structure is used to address the anisotropy problem of BERT. The output of the network is pooled and normalized, and finally cosine similarity is used to calculate the similarity between the two conferences. Experimental results show that the SBFD algorithm achieves good results on different data sets, and its Spearman correlation coefficient improves on that of the comparison algorithms.

preprint2022arXiv

Semi-Supervised Vision Transformers

We study the training of Vision Transformers for semi-supervised image classification. Transformers have recently demonstrated impressive performance on a multitude of supervised learning tasks. Surprisingly, we show that Vision Transformers perform significantly worse than Convolutional Neural Networks when only a small set of labeled data is available. Inspired by this observation, we introduce a joint semi-supervised learning framework, Semiformer, which contains a transformer stream, a convolutional stream, and a carefully designed fusion module for knowledge sharing between these streams. The convolutional stream is trained on limited labeled data and further used to generate pseudo labels to supervise the training of the transformer stream on unlabeled data. Extensive experiments on ImageNet demonstrate that Semiformer achieves 75.5% top-1 accuracy, outperforming the state of the art by a clear margin. In addition, we show that Semiformer is a general framework compatible with most modern transformer and convolutional neural architectures. Code is available at https://github.com/wengzejia1/Semiformer.

preprint2022arXiv

Sentiment Analysis of Online Travel Reviews Based on Capsule Network and Sentiment Lexicon

With the development of online travel services, timely mining of users' sentiment toward travel services, and using it as an indicator to guide improvements in online travel service quality, has great application prospects. In this paper, we study text sentiment classification of online travel reviews based on social media comments and propose the SCCL model, based on a capsule network and a sentiment lexicon. SCCL addresses a shortcoming of language models such as BERT and GRU: they extract text context features efficiently but do not consider the local features and emotional semantic features of the text. SCCL makes two improvements to address this. On the one hand, building on BERT-BiGRU, it introduces a capsule network to extract local features while retaining good context features. On the other hand, it introduces a sentiment lexicon to extract the emotional sequence of the text, providing richer emotional semantic features for the model. To make the sentiment lexicon more general, an improved TF-IDF-based SO-PMI algorithm is used to expand it, so that the lexicon also performs well in the domain of online travel reviews.
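Lexicon expansion with SO-PMI scores a candidate word by how much more strongly it co-occurs with positive seed words than with negative ones: SO-PMI(w) = sum over positive seeds p of PMI(w, p) minus sum over negative seeds n of PMI(w, n), with PMI(a, b) = log2(P(a, b) / (P(a) P(b))). The sketch below implements only this classic form over document-level co-occurrence. The paper's TF-IDF-based improvement is not described in the abstract and is not reproduced; the `eps` smoothing constant and any seed words are our own assumptions.

```python
import math
from typing import Iterable, List, Set


def so_pmi(word: str, docs: List[Set[str]], pos_seeds: Iterable[str],
           neg_seeds: Iterable[str], eps: float = 0.01) -> float:
    """Classic SO-PMI orientation score: > 0 leans positive, < 0 leans negative."""
    n = len(docs)

    def prob(*words: str) -> float:
        # Fraction of documents containing all of the given words.
        return sum(1 for d in docs if all(w in d for w in words)) / n

    def pmi(a: str, b: str) -> float:
        # eps smoothing (an assumption of this sketch) avoids log(0) and division by zero.
        return math.log2((prob(a, b) + eps) / (prob(a) * prob(b) + eps))

    return (sum(pmi(word, s) for s in pos_seeds)
            - sum(pmi(word, s) for s in neg_seeds))
```

Candidate words whose score exceeds a chosen margin would be added to the positive or negative side of the lexicon.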

preprint2022arXiv

SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations

Pre-training has become a standard paradigm in many computer vision tasks. However, most methods are designed for the RGB image domain. Due to the discrepancy between the two-dimensional image plane and three-dimensional space, such pre-trained models fail to perceive spatial information and are sub-optimal for 3D-related tasks. To bridge this gap, we aim to learn a spatial-aware visual representation that can describe three-dimensional space and is more suitable and effective for these tasks. To leverage point clouds, which are far superior to images in providing spatial information, we propose a simple yet effective 2D Image and 3D Point cloud Unsupervised pre-training strategy, called SimIPU. Specifically, we develop a multi-modal contrastive learning framework that consists of an intra-modal spatial perception module, which learns a spatial-aware representation from point clouds, and an inter-modal feature interaction module, which transfers the capability of perceiving spatial information from the point cloud encoder to the image encoder. Positive pairs for the contrastive losses are established by the matching algorithm and the projection matrix. The whole framework is trained in an unsupervised, end-to-end fashion. To the best of our knowledge, this is the first study to explore contrastive learning pre-training strategies for outdoor multi-modal datasets containing paired camera images and LIDAR point clouds. Code and models are available at https://github.com/zhyever/SimIPU.

preprint2022arXiv

Social Network Community Detection Based on Textual Content Similarity and Sentimental Tendency

Shared travel has gradually become one of the hot topics discussed on social networking platforms such as Micro Blog. Timely, in-depth community detection on the evaluation content about shared travel in social networks can support research and analysis of public opinion on shared travel, which has great application prospects. Existing community detection algorithms generally measure the similarity of nodes in a network from the perspective of spatial distance. This paper proposes a Community detection algorithm based on Textual content Similarity and sentimental Tendency (CTST), which considers network structure and node attributes at the same time. The content similarity and sentimental tendency of network community users are taken as node attributes, and on this basis an undirected weighted network is constructed for community detection. Experiments on real data show that the detected communities have high modularity, indicating good detection quality.
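The CTST evaluation above is based on modularity. The following sketch computes the standard Newman modularity for an undirected weighted graph:

Q = Σ_c [W_c / m − (S_c / 2m)²]

- m is the total edge weight.
- W_c is the weight of the edges inside community c.
- S_c is the summed strength of the nodes in community c.

It is an assumption that the paper uses this exact variant. How CTST derives edge weights from content similarity and sentiment is not reproduced, so the edge list in the example is hypothetical.

```python
from collections import defaultdict


def weighted_modularity(edges, community):
    """Newman modularity of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight) tuples, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    total_weight = 0.0             # m: sum of all edge weights
    internal = defaultdict(float)  # W_c: weight of edges inside community c
    strength = defaultdict(float)  # S_c: summed node strength in community c
    for u, v, w in edges:
        total_weight += w
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    if total_weight == 0:
        return 0.0
    q = 0.0
    for c, s_c in strength.items():
        q += internal[c] / total_weight - (s_c / (2 * total_weight)) ** 2
    return q
```

Consider two triangles joined by a single unit-weight edge, with each triangle treated as one community. In that case Q = 2 × (3/7 − 1/4) = 5/14 ≈ 0.357.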

preprint2022arXiv

SphereFed: Hyperspherical Federated Learning

Federated Learning aims at training a global model from multiple decentralized devices (i.e., clients) without exchanging their private local data. A key challenge is handling data that are not independent and identically distributed (non-i.i.d.) across clients, which may induce disparities in their local features. We introduce the Hyperspherical Federated Learning (SphereFed) framework to address the non-i.i.d. issue by constraining the learned representations of data points to lie on a unit hypersphere shared by all clients. Specifically, all clients learn their local representations by minimizing the loss with respect to a fixed classifier whose weights span the unit hypersphere. After federated training of the global model, this classifier is further calibrated with a closed-form solution that minimizes a mean squared loss. We show that the calibration solution can be computed efficiently and in a distributed manner without direct access to local data. Extensive experiments indicate that SphereFed improves the accuracy of multiple existing federated learning algorithms by a considerable margin (up to 6% on challenging datasets). It also improves computation and communication efficiency across datasets and model architectures.
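A closed-form calibration under a mean squared loss can be pictured as distributed least squares. Each client shares only the aggregate statistics HᵀH and HᵀY of its features and one-hot labels. The server then solves the normal equations without seeing raw data.

The following sketch assumes exactly that scheme, with an optional ridge term and a hand-written Gauss-Jordan solver. SphereFed's precise formulation may differ.

```python
def client_statistics(features, targets):
    """Local statistics (H^T H, H^T Y) for one client.

    features: list of feature vectors h (length d).
    targets: list of target vectors y (length c), e.g. one-hot labels.
    """
    d, c = len(features[0]), len(targets[0])
    gram = [[0.0] * d for _ in range(d)]
    cross = [[0.0] * c for _ in range(d)]
    for h, y in zip(features, targets):
        for i in range(d):
            for j in range(d):
                gram[i][j] += h[i] * h[j]
            for j in range(c):
                cross[i][j] += h[i] * y[j]
    return gram, cross


def solve(a, b):
    """Solve A X = B (A square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; try ridge > 0")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def calibrate_classifier(stats, ridge=0.0):
    """Server side: sum client statistics, then solve
    (sum H^T H + ridge * I) W = sum H^T Y for the d x c classifier W."""
    d, c = len(stats[0][0]), len(stats[0][1][0])
    gram = [[0.0] * d for _ in range(d)]
    cross = [[0.0] * c for _ in range(d)]
    for g, x in stats:
        for i in range(d):
            for j in range(d):
                gram[i][j] += g[i][j]
            for j in range(c):
                cross[i][j] += x[i][j]
    for i in range(d):
        gram[i][i] += ridge
    return solve(gram, cross)
```

Only d × d and d × c matrices leave each client, so the communication cost does not depend on the number of local samples.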

preprint2022arXiv

Topological EEG Nonlinear Dynamics Analysis for Emotion Recognition

Emotion recognition based on the characteristics of electroencephalography (EEG) has been widely studied in recent years. Nonlinear analysis and feature extraction methods help in understanding the complex dynamical phenomena associated with the EEG patterns of different emotions. Phase space reconstruction is a typical nonlinear technique for revealing the dynamics of the brain neural system. Recently, topological data analysis (TDA) has been used to explore the properties of spaces, providing a powerful tool for examining the phase space. In this work, we propose a topological EEG nonlinear dynamics analysis approach. It uses the phase space reconstruction (PSR) technique to convert EEG time series into phase space, and persistent homology to explore the topological properties of that phase space. We perform topological analysis of EEG signals in different rhythm bands to build emotion feature vectors, which show high discriminative ability. We evaluate the approach on two well-known benchmark datasets, DEAP and DREAMER. On DEAP, the approach achieved accuracies of 99.37% and 99.35% in the arousal and valence classification tasks, respectively. On DREAMER, it achieved 99.96%, 99.93%, and 99.95% in the arousal, valence, and dominance classification tasks, respectively. The performance appears to exceed current state-of-the-art approaches on DREAMER (by 1% to 10%, depending on temporal length), while being comparable to other related work evaluated on DEAP. This work is the first investigation of EEG topological feature analysis for emotion recognition. It brings a novel insight into nonlinear dynamics analysis and feature extraction for the brain neural system.
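Phase space reconstruction is commonly done by time-delay embedding, which maps a series x to the vectors (x_t, x_{t+τ}, ..., x_{t+(m−1)τ}). The following sketch assumes that embedding, with the dimension m and delay τ chosen by the caller.

As a simplification, it computes only the 0-dimensional persistent homology of the resulting point cloud. These are the death times of connected components in a Vietoris-Rips filtration, which equal the minimum-spanning-tree edge lengths. The paper's features may also use higher-dimensional homology and other summaries.

```python
import math
from itertools import combinations


def delay_embed(series, dim, tau):
    """Time-delay embedding: row i is (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n = len(series) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this dim and tau")
    return [tuple(series[i + k * tau] for k in range(dim)) for i in range(n)]


def h0_persistence(points):
    """Death times of 0-dimensional features (all born at 0) in a
    Vietoris-Rips filtration, via Kruskal's algorithm with union-find.
    The one component that never dies is omitted."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:       # this edge merges two components
            parent[ri] = rj
            deaths.append(d)
    return deaths          # ascending, since edges were sorted
```

Summaries of these death times, such as their sum or maximum, could then serve as per-band features.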

preprint2022arXiv

Unified neutron star EOSs and neutron star structures in RMF models

In the framework of the Thomas-Fermi approximation, we systematically study the EOSs and microscopic structures of neutron star matter over a vast density range, $n_\mathrm{b}\approx 10^{-10}$ to 2 fm$^{-3}$. We adopt various covariant density functionals, i.e., those with nonlinear self-couplings (NL3, PK1, TM1, GM1, MTVTC) and those with density-dependent couplings (DD-LZ1, DDME-X, PKDD, DD-ME2, DD2, TW99). The EOSs generally coincide with each other at $n_\mathrm{b}\lesssim 10^{-4}$ fm$^{-3}$ and at 0.1 fm$^{-3}\lesssim n_\mathrm{b} \lesssim 0.3$ fm$^{-3}$. In other density regions they are sensitive to the effective interactions between nucleons. For functionals with a larger slope $L$ of the symmetry energy, the curvature parameter $K_\mathrm{sym}$ and the neutron drip density generally increase. The droplet size, proton number of nuclei, core-crust transition density, and onset density of non-spherical nuclei decrease. All functionals predict maximum neutron star masses exceeding the two-solar-mass limit, while DD2, DD-LZ1, DD-ME2, and DDME-X predict neutron star radii that best agree with observational constraints. Nevertheless, the corresponding skewness coefficients $J$ are much larger than expected, and only the functionals MTVTC and TW99 meet the state-of-the-art constraints on $J$. More accurate measurements of the radius of PSR J0740+6620 and of the maximum neutron star mass are thus essential. They are needed to identify a functional that satisfies all constraints from nuclear physics and astrophysical observations. Approximate linear correlations are also observed among the neutron star radii at $M=1.4 M_{\odot}$ and $2 M_{\odot}$ and the slope $L$ and curvature parameter $K_\mathrm{sym}$ of the symmetry energy. These correlations are mainly attributed to the curvature-slope correlations in the functionals adopted here.

preprint2022arXiv

Unified nuclear matter EOSs constrained by the in-medium balance in density-dependent covariant density functionals

Considering the effects of charge screening, we propose a new numerical recipe within the framework of the Thomas-Fermi approximation, in which the properties of nuclear matter over a vast density range can be obtained self-consistently. Assuming spherical and cylindrical approximations for the Wigner-Seitz cell, typical nuclear matter structures (droplet, rod, slab, tube, bubble, and uniform) are observed. We then investigate the EOSs and microscopic structures of nuclear matter with both fixed proton fractions and $β$-equilibrium, adopting two covariant density functionals, DD-LZ1 and DD-ME2. The functional DD-LZ1 yields a smaller slope $L$ of the symmetry energy, yet its curvature parameter $K_\mathrm{sym}$ is much larger than that of DD-ME2. This is attributed to the peculiar density-dependent behavior of its meson-nucleon couplings, guided by the restoration of pseudo-spin symmetry around the Fermi levels in finite nuclei. Consequently, the two functionals predict different mass-radius relations for neutron stars. They also yield different microscopic structures of nonuniform nuclear matter, which are expected to affect various physical processes relevant to neutron star properties and evolution.

preprint2022arXiv

Unit Selection with Nonbinary Treatment and Effect

The unit selection problem aims to identify a set of individuals who are most likely to exhibit a desired mode of behavior, for example, individuals who would respond one way if encouraged and a different way if not encouraged. Using a combination of experimental and observational data, Li and Pearl derived tight bounds on the "benefit function", which is the payoff/cost associated with selecting an individual with given characteristics. This paper extends the benefit function to a general form in which the treatment and effect are not restricted to be binary. We propose an algorithm to test the identifiability of the nonbinary benefit function and an algorithm to compute its bounds using experimental and observational data.
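The following sketch gives intuition for why a benefit function is bounded rather than point-identified. It covers only the binary case and uses experimental data alone.

It assumes the usual four response types: complier, always-taker, never-taker and defier. Their payoffs `beta`, `gamma`, `theta` and `delta` are hypothetical names. The experimental marginals P(y_x) and P(y_x') leave a single free parameter, the always-taker share.

This is not the Li and Pearl bound, which also uses observational data and can be tighter. It is also not this paper's nonbinary algorithm.

```python
def benefit_bounds(p_y_x, p_y_xp, beta, gamma, theta, delta):
    """Bounds on the expected benefit for binary treatment and effect,
    using experimental data only.

    p_y_x  = P(y_x):  probability of the desired outcome when encouraged.
    p_y_xp = P(y_x'): probability of the desired outcome when not encouraged.
    Payoffs: beta (complier), gamma (always-taker), theta (never-taker),
    delta (defier).
    """
    # Free parameter t = P(always-taker). The other shares follow as
    # complier = p_y_x - t, defier = p_y_xp - t,
    # never-taker = 1 - p_y_x - p_y_xp + t, and all must be non-negative.
    t_low = max(0.0, p_y_x + p_y_xp - 1.0)
    t_high = min(p_y_x, p_y_xp)

    def benefit(t):
        return (beta * (p_y_x - t) + gamma * t
                + theta * (1.0 - p_y_x - p_y_xp + t) + delta * (p_y_xp - t))

    # The benefit is linear in t, so its extremes lie at the endpoints.
    values = (benefit(t_low), benefit(t_high))
    return min(values), max(values)
```

With beta = 1 and all other payoffs 0, the result is the familiar experimental bounds on the complier probability. For example, P(y_x) = 0.6 and P(y_x') = 0.3 give [0.3, 0.6].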

preprint2022arXiv

Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training

Monocular 3D object detection (Mono3D) has achieved unprecedented success with the advent of deep learning techniques and emerging large-scale autonomous driving datasets. However, practical cross-domain deployment suffers drastic performance degradation because labels are lacking in the target domain, and this challenge remains under-studied. In this paper, we first comprehensively investigate the key factor underlying the domain gap in Mono3D. The critical observation is a depth-shift issue caused by the geometric misalignment of domains. We then propose STMono3D, a new self-teaching framework for unsupervised domain adaptation (UDA) on Mono3D. To mitigate the depth shift, we introduce a geometry-aligned multi-scale training strategy that disentangles the camera parameters and guarantees the geometric consistency of domains. Based on this, we develop a teacher-student paradigm to generate adaptive pseudo labels on the target domain. The end-to-end framework provides richer information about the pseudo labels. Building on this, we propose a quality-aware supervision strategy that takes instance-level pseudo confidences into account and improves the effectiveness of target-domain training. Moreover, a positive focusing training strategy and a dynamic threshold are proposed to handle the large numbers of false-negative (FN) and false-positive (FP) pseudo samples. STMono3D achieves remarkable performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection dataset. To the best of our knowledge, this is the first study to explore effective UDA methods for Mono3D.

preprint2021arXiv

Cramér-Rao Bound Optimization for Joint Radar-Communication Design

In this paper, we propose multi-input multi-output (MIMO) beamforming designs for joint radar sensing and multi-user communications. We employ the Cramér-Rao bound (CRB) as the performance metric for target estimation, under both point and extended target scenarios. We then propose minimizing the CRB of radar sensing while guaranteeing a pre-defined signal-to-interference-plus-noise ratio (SINR) for each communication user. For the single-user scenario, we derive the optimal solution in closed form for both point and extended targets. For the multi-user scenario, we show that both problems can be relaxed into semidefinite programs using semidefinite relaxation, and we prove that the global optimum can always be obtained. Finally, we demonstrate numerically that the globally optimal solutions are reachable via the proposed methods, which provide significant gains in target estimation performance over state-of-the-art benchmarks.

preprint2021arXiv

Growth and Strain Relaxation Mechanisms of InAs/InP/GaAsSb Core-Dual-Shell Nanowires

The combination of core/shell geometry and band gap engineering in nanowire heterostructures can be employed to realize systems with novel transport and optical properties. Here, we report on the growth of InAs/InP/GaAsSb core-dual-shell nanowires by catalyst-free chemical beam epitaxy on Si(111) substrates. We carried out detailed morphological, structural, and compositional analyses of the nanowires as a function of growth parameters. These used scanning and transmission electron microscopy and energy-dispersive X-ray spectroscopy. Furthermore, by combining the scanning transmission electron microscopy moiré technique with geometric phase analysis, we studied the residual strain and the relaxation mechanisms in this system. We found that the InP shell facets are well developed along all crystallographic directions only when the nominal thickness is above 1 nm, suggesting an island growth mode. Moreover, the crystallographic analysis indicates that both the InP and GaAsSb shells grow almost coherently with the InAs core along the 112 direction and are elastically compressed along the 110 direction. For InP shell thicknesses above 8 nm, some dislocations and roughening occur at the interfaces. This study provides useful general guidelines for the fabrication of high-quality devices based on these core-dual-shell nanowires.

preprint2021arXiv

Hermes: Decentralized Dynamic Spectrum Access System for Massive Devices Deployment in 5G

With the arrival of 5G networks, ubiquitous Internet of Things (IoT) devices, such as smart cameras and drones, can benefit our daily lives. With the introduction of the millimeter-wave band and the rapidly growing number of IoT devices, it is critical to design a new dynamic spectrum access (DSA) system. Such a system must coordinate spectrum allocation across massive numbers of devices in 5G. In this paper, we present Hermes, the first decentralized DSA system for massive device deployment. Specifically, we propose an efficient multi-agent reinforcement learning algorithm and introduce a novel shuffle mechanism, addressing the collision and fairness drawbacks of existing decentralized systems. We implement Hermes in a 5G network via simulation. Extensive evaluations show that Hermes significantly reduces collisions and improves fairness compared to state-of-the-art decentralized methods. Furthermore, Hermes adapts to environmental changes within 0.5 seconds, showing its practicality for deployment in dynamic 5G environments.

preprint2021arXiv

High-resolution ARPES endstation for in-situ electronic structure investigations at SSRF

Angle-resolved photoemission spectroscopy (ARPES) is one of the most powerful experimental techniques in condensed matter physics. Synchrotron ARPES, which uses photons with high flux and continuously tunable energy, has become particularly important. However, an excellent synchrotron ARPES system must have features such as a small beam spot, ultrahigh energy resolution, and a user-friendly operation interface. A synchrotron beamline and an endstation (BL03U) were designed and constructed at the Shanghai Synchrotron Radiation Facility (SSRF). The beam spot size at the sample position is 7.5 (V) $μ$m $\times$ 67 (H) $μ$m, and the fundamental photon energy range is 7-165 eV. The ARPES system achieves an energy resolution of 2.67 meV at 21.2 eV. The endstation's ARPES system is also equipped with a six-axis cryogenic sample manipulator (lowest temperature 7 K). It is integrated with an oxide molecular beam epitaxy system and a scanning tunneling microscope, providing an advanced platform for in-situ characterization of the fine electronic structure of condensed matter.

preprint2021arXiv

On Provable Backdoor Defense in Collaborative Learning

As collaborative learning allows a model to be trained jointly on multiple sources of data, security has become a central concern. Malicious users can upload poisoned data to prevent the model from converging or to inject hidden backdoors. Such backdoor attacks are especially difficult to detect because the model behaves normally on standard test data but gives wrong outputs when triggered by certain backdoor keys. Although Byzantine-tolerant training algorithms provide convergence guarantees, provable defense against backdoor attacks remains largely unsolved. Methods based on randomized smoothing can only correct a small number of corrupted pixels or labels. Methods based on subset aggregation suffer a severe drop in classification accuracy due to low data utilization. We propose a novel framework that generalizes existing subset aggregation methods. The framework shows that the subset selection process, a deciding factor for subset aggregation methods, can be viewed as a code design problem. We derive a theoretical bound on the data utilization ratio and provide an optimal code construction. Experiments on non-IID versions of MNIST and CIFAR-10 show that our method with optimal codes significantly outperforms baselines using non-overlapping partitions and random selection. Additionally, integration with existing coding theory results shows that special codes can track the locations of the attackers. This capability provides new countermeasures against backdoor attacks.
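Subset aggregation, as described above, trains one model per data subset and predicts by majority vote. Suppose every training sample appears in at most r subsets, which is the quantity a code design controls. Then p poisoned samples can alter at most p·r models. Each altered vote can narrow the winning margin over any other class by at most 2.

The following sketch turns this counting argument into a simple certificate. The function and parameter names are hypothetical. It is a generic illustration, not the paper's optimal code construction or its bound.

```python
from collections import Counter


def aggregate_and_certify(predictions, poisoned_samples, subsets_per_sample):
    """Majority vote over per-subset model predictions, plus a certificate.

    predictions: one predicted label per subset model.
    poisoned_samples: assumed number of poisoned training samples (p).
    subsets_per_sample: max number of subsets any sample appears in (r).
    Returns (label, certified): certified is True when no placement of the
    poison can change the majority label.
    """
    ranked = Counter(predictions).most_common()
    top_label, top_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # At most p * r models saw poisoned data; each flipped vote can reduce
    # the top-vs-other margin by at most 2.
    affected_models = poisoned_samples * subsets_per_sample
    certified = top_count - runner_up > 2 * affected_models
    return top_label, certified
```

The trade-off the paper studies is visible here. Larger r (more overlap) uses the data better but weakens the certificate for a fixed vote margin.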

preprint2021arXiv

Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint

Neural networks have achieved remarkable success in many cognitive tasks. However, when they are trained sequentially on multiple tasks without access to old data, their performance on early tasks tends to drop significantly. This problem, often referred to as catastrophic forgetting, is a key challenge in continual learning of neural networks. The regularization-based approach is one of the primary classes of methods for alleviating catastrophic forgetting. In this paper, we provide a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task. This viewpoint leads to a unified framework that can be instantiated to derive many existing algorithms, such as Elastic Weight Consolidation and Kronecker-factored Laplace approximation. Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as the generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning. Our theoretical results indicate the importance of accurately approximating the Hessian matrix. The experimental results on several benchmarks provide empirical validation of our theoretical findings.
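Under the second-order Taylor viewpoint above, an old task's loss near its optimum θ* is approximated as follows:

L(θ*) + ½ (θ − θ*)ᵀ H (θ − θ*)

The first-order term vanishes at an optimum. Keeping only the diagonal of H gives the familiar Elastic Weight Consolidation (EWC) style penalty.

The following minimal sketch illustrates only the penalty, not the paper's convergence or generalization analysis. It assumes a diagonal Hessian estimate supplied by the caller, for example a diagonal Fisher estimate. The strength `lam` is a hypothetical parameter.

```python
def quadratic_penalty(theta, theta_star, h_diag, lam=1.0):
    """lam/2 * sum_i h_i * (theta_i - theta*_i)^2: the second-order Taylor
    term of an old task's loss around its optimum, with diagonal Hessian."""
    return 0.5 * lam * sum(h * (t - s) ** 2 for t, s, h in zip(theta, theta_star, h_diag))


def quadratic_penalty_grad(theta, theta_star, h_diag, lam=1.0):
    """Gradient of quadratic_penalty with respect to theta."""
    return [lam * h * (t - s) for t, s, h in zip(theta, theta_star, h_diag)]


def sgd_step(theta, new_task_grad, theta_star, h_diag, lam, lr):
    """One gradient step on the new-task loss plus the old-task penalty."""
    penalty_grad = quadratic_penalty_grad(theta, theta_star, h_diag, lam)
    return [t - lr * (g + p) for t, g, p in zip(theta, new_task_grad, penalty_grad)]
```

Parameters with large curvature h_i are pulled strongly back toward θ*. This is why the accuracy of the Hessian approximation matters.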

preprint2021arXiv

PredCoin: Defense against Query-based Hard-label Attack

Many adversarial attacks and defenses have recently been proposed for Deep Neural Networks (DNNs). Most of them assume the white-box setting, which is impractical. Meanwhile, a new class of query-based hard-label (QBHL) black-box attacks poses a significant threat to real-world applications (e.g., Google Cloud, Tencent API). To date, no generalizable and practical approach has been proposed to defend against such attacks. This paper proposes and evaluates PredCoin, a practical and generalizable method for providing robustness against QBHL attacks. PredCoin poisons the gradient estimation step, an essential component of most QBHL attacks. PredCoin successfully identifies gradient estimation queries crafted by an attacker and introduces uncertainty into the output. Extensive experiments show that PredCoin successfully defends against four state-of-the-art QBHL attacks across various settings and tasks while preserving the target model's overall accuracy. PredCoin is also shown to be robust and effective against several defense-aware attacks, which may have full knowledge of PredCoin's internal mechanisms.

preprint2020arXiv

A Parallel Sparse Tensor Benchmark Suite on CPUs and GPUs

Tensor computations present significant performance challenges that affect a wide spectrum of applications, ranging from machine learning, healthcare analytics, social network analysis, and data mining to quantum chemistry and signal processing. Efforts to improve the performance of tensor computations include exploring data layout, execution scheduling, and parallelism in common tensor kernels. This work presents a benchmark suite for arbitrary-order sparse tensor kernels on CPUs and GPUs. It uses two state-of-the-art tensor formats, coordinate (COO) and hierarchical coordinate (HiCOO). It provides a set of reference tensor kernel implementations that are compatible with real-world tensors and with power-law tensors extended from synthetic graph generation techniques. We also propose Roofline performance models for these kernels to provide insight into computer platforms from a sparse tensor perspective.
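The following sketch illustrates the COO format and the Roofline model. It computes a sparse tensor-times-vector (TTV) product on an arbitrary-order COO tensor stored as (index tuple, value) pairs. It also gives the basic Roofline bound, which caps attainable performance at the lower of the compute peak and arithmetic intensity × memory bandwidth.

TTV is used as a representative kernel. The suite's actual kernel list and implementations are not reproduced here. The function names and units are assumptions.

```python
from collections import defaultdict


def coo_ttv(entries, vector, mode):
    """Sparse tensor-times-vector along `mode` for a COO tensor of any order.

    entries: list of (index_tuple, value) pairs (the COO format).
    Returns a COO dict mapping the remaining indices to accumulated values.
    """
    out = defaultdict(float)
    for idx, val in entries:
        rest = idx[:mode] + idx[mode + 1:]  # drop the contracted mode
        out[rest] += val * vector[idx[mode]]
    return dict(out)


def roofline_gflops(peak_gflops, arithmetic_intensity, bandwidth_gbs):
    """Attainable GFLOP/s = min(compute peak, intensity * memory bandwidth).

    arithmetic_intensity is in FLOPs per byte moved; bandwidth is in GB/s.
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)
```

Sparse kernels like this one perform roughly one multiply-add per nonzero while streaming indices and values. Their arithmetic intensity is therefore low, and the Roofline bound usually places them on the bandwidth-limited side.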

preprint2020arXiv

AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing

Deep learning systems have been successfully applied to Euclidean data such as images, video, and audio. In many applications, however, data and their relationships are better expressed as graphs. Graph Convolutional Networks (GCNs) appear to be a promising approach for learning efficiently from graph data structures and have shown advantages in many critical applications. As with other deep learning modalities, hardware acceleration is critical. The challenge is that real-world graphs are often extremely large and unbalanced, which poses significant performance demands and design challenges. In this paper, we propose Autotuning-Workload-Balancing GCN (AWB-GCN) to accelerate GCN inference. To address workload imbalance in processing real-world graphs, three hardware-based autotuning techniques are proposed: dynamic distribution smoothing, remote switching, and row remapping. In particular, AWB-GCN continuously monitors the sparse graph pattern, dynamically adjusts the workload distribution among a large number of processing elements (up to 4K PEs), and, after converging, reuses the ideal configuration. Evaluation is performed on an Intel D5005 FPGA with five commonly used datasets. Results show that the 4K-PE AWB-GCN improves PE utilization by 7.7x on average. It also achieves considerable speedups over CPUs (3255x), GPUs (80.3x), and a prior GCN accelerator (5.1x).

preprint2020arXiv

Benchmarking Machine Learning Techniques with Di-Higgs Production at the LHC

Many domains of high energy physics analysis are starting to explore machine learning techniques. Powerful methods can be used to identify and measure rare processes from previously insurmountable backgrounds. One of the most profound Standard Model signatures still to be discovered at the LHC is the pair production of Higgs bosons through the Higgs self-coupling. The small cross section of this process makes detection very difficult even for the decay channel with the largest branching fraction ($hh\rightarrow b\bar{b}b\bar{b}$). This paper benchmarks a variety of approaches (boosted decision trees, various neural network architectures, semi-supervised algorithms) against one another to catalog a few of the various techniques available to high energy physicists as the era of the HL-LHC approaches.

preprint2020arXiv

Comprehensive analysis of the tidal effect in gravitational waves and implication for cosmology

Detection of gravitational waves (GWs) produced by the coalescence of compact binaries provides a novel way to measure the luminosity distance of GW events. Combined with their redshifts, these events can act as standard sirens to constrain cosmological parameters. For various GW detector networks of the 2nd generation (2G), 2.5G, and 3G, we comprehensively analyze a method to constrain the equation of state (EOS) of binary neutron stars (BNSs). The same method extracts their redshifts through the imprints of tidal effects in GW waveforms. We find that, for these events, observations of electromagnetic counterparts in the low-redshift range $z < 0.1$ are important for constraining the tidal effects. Considering 17 different EOSs of NSs or quark stars, we find that GW observations have a strong capability to determine the EOS. We then apply these events as standard sirens, taking the NS EOS constraints derived from low-redshift observations as a prior, to constrain the dark-energy EOS parameters $w_0$ and $w_a$. In the 3G era, the potential constraints are $Δw_0\in (0.0006,0.004)$ and $Δw_a\in(0.004,0.02)$. These are 1 to 3 orders of magnitude smaller than those from traditional methods, including Type Ia supernovae and baryon acoustic oscillations. They are also an order of magnitude smaller than those from the GW standard-siren method that fixes redshifts through short-hard $γ$-ray bursts, owing to the larger number of GW events available in this method. Therefore, GW standard sirens based on tidal-effect measurements provide a realizable and much more powerful tool for cosmology.

preprint2020arXiv

CSB-RNN: A Faster-than-Realtime RNN Acceleration Framework with Compressed Structured Blocks

Recurrent neural networks (RNNs) have been widely adopted in temporal sequence analysis, where realtime performance is often in demand. However, RNNs suffer from a heavy computational workload because the models often come with large weight matrices. Pruning schemes have been proposed for RNNs to eliminate redundant (close-to-zero) weight values. On the one hand, non-structured pruning methods achieve a high pruning rate but introduce computational irregularity (random sparsity), which is unfriendly to parallel hardware. On the other hand, hardware-oriented structured pruning suffers from a low pruning rate due to strict constraints on the allowable pruning structure. This paper presents CSB-RNN, an optimized full-stack RNN framework with a novel compressed structured block (CSB) pruning technique. The CSB-pruned RNN model has both a fine pruning granularity, which facilitates a high pruning rate, and a regular structure, which benefits hardware parallelism. To address the challenges of parallelizing CSB-pruned model inference with fine-grained structural sparsity, we propose a novel hardware architecture with a dedicated compiler. Benefiting from this architecture-compilation co-design, the hardware supports various RNN cell types. It also addresses the challenging workload imbalance issue and therefore significantly improves hardware efficiency.
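The following sketch illustrates the contrast between irregular sparsity and block-structured sparsity. It prunes weak columns independently inside each fixed-size block of a weight matrix, so the surviving nonzeros stay in regular per-block patterns.

This is a simplified, hypothetical scheme in which the block sizes and the L2-norm threshold are assumptions. It is not the exact CSB format or its compressed encoding.

```python
import math


def block_column_prune(weights, block_rows, block_cols, threshold):
    """Zero out, within each block, every column whose L2 norm (restricted
    to that block) is below `threshold`. Returns a new matrix; the input,
    a list of rows, is left unchanged."""
    rows, cols = len(weights), len(weights[0])
    pruned = [row[:] for row in weights]
    for r0 in range(0, rows, block_rows):
        r_end = min(r0 + block_rows, rows)
        for c0 in range(0, cols, block_cols):
            for c in range(c0, min(c0 + block_cols, cols)):
                # Norm of this column segment inside the current block only.
                norm = math.sqrt(sum(weights[r][c] ** 2 for r in range(r0, r_end)))
                if norm < threshold:
                    for r in range(r0, r_end):
                        pruned[r][c] = 0.0
    return pruned
```

Because each block keeps whole column segments, a hardware lane assigned to a block can skip pruned columns without per-element indices. Different blocks can still keep different columns, which allows finer granularity than pruning entire matrix columns.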

preprint2020arXiv

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

Deep Reinforcement Learning (RL) has proven powerful for decision making in simulated environments. However, training deep RL models is challenging in real-world applications such as production-scale health care or recommender systems. There, interactions are expensive and the deployment budget is limited. One source of data inefficiency is the expensive hyper-parameter tuning required when optimizing deep neural networks. We propose Adaptive Behavior Policy Sharing (ABPS), a data-efficient training algorithm. ABPS lets agents share experience collected by a behavior policy that is adaptively selected from a pool of agents trained with an ensemble of hyper-parameters. We further extend ABPS to evolve hyper-parameters during training by hybridizing it with an adapted version of Population Based Training (ABPS-PBT). We conduct experiments on multiple Atari games with up to 16 hyper-parameter/architecture setups. We compare ABPS with conventional hyper-parameter tuning with independent training. ABPS achieves superior overall performance, reduced variance among the top 25% of agents, and equivalent performance on the best agent. It does so while requiring only as many environmental interactions as training a single agent. We also show that ABPS-PBT further improves convergence speed and reduces variance.

preprint2020arXiv

FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters

Deep Neural Networks (DNNs) have revolutionized numerous applications, but the demand for ever more performance remains unabated. Scaling DNN computations to larger clusters is generally done by distributing tasks in batch mode using methods such as distributed synchronous SGD. One issue with this approach is that high cluster utilization requires a large workload on each node, which implies nontrivial growth in the SGD mini-batch size. In this paper, we propose a framework called FPDeep, which uses a hybrid of model and layer parallelism to configure distributed reconfigurable clusters to train DNNs. This approach has numerous benefits. First, the design does not suffer from batch size growth. Second, novel workload and weight partitioning balances both workloads and weights among nodes. Third, the entire system is a fine-grained pipeline. This leads to high parallelism and utilization and also minimizes the time features must be cached while waiting for back-propagation. As a result, storage demand is reduced to the point where only on-chip memory is used for the convolution layers. We evaluate FPDeep with the AlexNet, VGG-16, and VGG-19 benchmarks. Experimental results show that FPDeep scales well to a large number of FPGAs, the limiting factor being the FPGA-to-FPGA bandwidth. With 6 transceivers per FPGA, FPDeep scales linearly up to 83 FPGAs. Energy efficiency is evaluated in GOPs/J; on average, FPDeep provides 6.36x higher energy efficiency than comparable GPU servers.

preprint2020arXiv

Generative Image Inpainting with Submanifold Alignment

Image inpainting aims at restoring missing regions of corrupted images and has many applications, such as image restoration and object removal. However, current GAN-based generative inpainting models do not explicitly exploit the structural or textural consistency between restored contents and their surrounding contexts. To address this limitation, we propose to enforce alignment (or closeness) during the training of GAN-based inpainting models. The alignment is between the local data submanifolds (or subspaces) around restored images and those around the original (uncorrupted) images. We exploit Local Intrinsic Dimensionality (LID) to measure this alignment in deep feature space, between the data submanifolds learned by a GAN model and those of the original data. We measure it both for whole images (denoted iLID) and for local image patches (denoted pLID). We then apply iLID and pLID as regularizers for GAN-based inpainting models to encourage two levels of submanifold alignment: 1) an image-level alignment for improving structural consistency, and 2) a patch-level alignment for improving textural details. Experimental results on four benchmark datasets show that our proposed model generates more accurate results than state-of-the-art models.
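LID is commonly estimated with the maximum-likelihood estimator over a point's k nearest neighbors:

LID = −1 / ((1/k) · Σ log(r_i / r_k))

Here r_1, ..., r_k are the neighbor distances and r_k is the largest of them. The following sketch assumes that estimator with plain Euclidean distances on given feature vectors. How iLID and pLID turn it into regularizers over deep features is not reproduced.

```python
import math


def lid_mle(query, reference, k):
    """Maximum-likelihood LID estimate of `query` from its k nearest
    neighbors in `reference` (points at distance 0 are ignored)."""
    dists = sorted(math.dist(query, r) for r in reference)
    dists = [d for d in dists if d > 0][:k]
    if len(dists) < k:
        raise ValueError("need at least k reference points at nonzero distance")
    r_k = dists[-1]
    mean_log = sum(math.log(d / r_k) for d in dists) / k
    if mean_log == 0:
        raise ValueError("all k neighbors are equidistant; LID is undefined")
    return -1.0 / mean_log
```

Neighbor distances 1, 2 and 4 give LID = 1 / ln 2 ≈ 1.443. The distances double at each neighbor, which matches a locally one-dimensional growth rate below 2.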

preprint2020arXiv

Hybrid Models for Open Set Recognition

Open set recognition requires a classifier to detect samples that do not belong to any of the classes in its training set. Existing methods fit a probability distribution to the training samples in their embedding space and detect outliers according to this distribution. The embedding space is often obtained from a discriminative classifier. However, such a discriminative representation focuses only on known classes, which may not be sufficient for distinguishing unknown classes. We argue that the representation space should be jointly learned by the inlier classifier and the density estimator (serving as an outlier detector). We propose the OpenHybrid framework, which has three components. An encoder maps input data into a joint embedding space. A classifier assigns samples to inlier classes. A flow-based density estimator detects whether a sample belongs to the unknown category. A typical problem of existing flow-based models is that they may assign higher likelihood to outliers. However, we empirically observe that this issue does not occur in our experiments when a joint representation is learned for the discriminative and generative components. Experiments on standard open set benchmarks also reveal that an end-to-end trained OpenHybrid model significantly outperforms state-of-the-art methods and flow-based baselines.

preprint2020arXiv

Kinetic Control of Morphology and Composition in Ge/GeSn Core/Shell Nanowires

The growth of Sn-rich group-IV semiconductors at the nanoscale provides new paths for understanding the fundamental properties of metastable GeSn alloys. Here, we demonstrate the effect of the growth conditions on the morphology and composition of Ge/GeSn core/shell nanowires by correlating the experimental observations with a theoretical interpretation based on a multi-scale approach. We show that the cross-sectional morphology of Ge/GeSn core/shell nanowires changes from hexagonal to dodecagonal upon increasing the supply of the Sn precursor. This transformation strongly influences the Sn distribution as a higher Sn content is measured under the {112} growth front. Ab-initio DFT calculations provide an atomic-scale explanation by showing that Sn incorporation is favored at the {112} surfaces, where the Ge bonds are tensile-strained. A phase-field continuum model was developed to reproduce the morphological transformation and the Sn distribution within the wire, shedding light on the complex growth mechanism and unveiling the relation between segregation and faceting. The tunability of the photoluminescence emission with the change in composition and morphology of the GeSn shell highlights the potential of the core/shell nanowire system for opto-electronic devices operating at mid-infrared wavelengths.

preprint2020arXiv

Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification

Modern deep neural networks (DNNs) often require high memory consumption and large computational loads. To deploy DNN algorithms efficiently on edge or mobile devices, a series of DNN compression algorithms have been explored, including factorization methods. Factorization methods approximate the weight matrix of a DNN layer with the product of two or more low-rank matrices. However, it is hard to measure the ranks of DNN layers during the training process. Previous works mainly induce low rank through implicit approximations or via a costly singular value decomposition (SVD) at every training step. The former approach usually incurs a high accuracy loss, while the latter has low efficiency. In this work, we propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD at every step. SVD training first decomposes each layer into the form of its full-rank SVD, then performs training directly on the decomposed weights. We add orthogonality regularization to the singular vectors, which ensures a valid SVD form and avoids vanishing/exploding gradients. Low rank is encouraged by applying sparsity-inducing regularizers to the singular values of each layer. Singular value pruning is applied at the end to explicitly reach a low-rank model. We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a higher reduction in computation load at the same accuracy, compared not only to previous factorization methods but also to state-of-the-art filter pruning methods.
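The three ingredients named above (orthogonality regularization on the singular vectors, a sparsity-inducing penalty on the singular values, and final singular value pruning) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: we assume a squared-Frobenius orthogonality penalty $\|U^\top U - I\|_F^2$, an L1 penalty on the singular values, and an energy threshold for pruning; the helper names and the weights `lam_orth`, `lam_sparse` and `energy` are hypothetical. In actual training these terms would be added to the task loss and differentiated by an autodiff framework; here they are only evaluated.

```python
import numpy as np


def orthogonality_penalty(M: np.ndarray) -> float:
    """Squared Frobenius norm of M^T M - I; zero iff M has orthonormal columns."""
    r = M.shape[1]
    return float(np.sum((M.T @ M - np.eye(r)) ** 2))


def svd_regularizer(U, s, V, lam_orth=1.0, lam_sparse=1e-3):
    """Hypothetical regularizer for a layer stored as W = U @ diag(s) @ V.T.

    The orthogonality terms keep U and V close to valid singular-vector
    matrices; the L1 term pushes many singular values toward zero.
    """
    orth = orthogonality_penalty(U) + orthogonality_penalty(V)
    sparsity = float(np.sum(np.abs(s)))
    return lam_orth * orth + lam_sparse * sparsity


def prune_singular_values(U, s, V, energy=0.99):
    """Keep the fewest singular values whose squares retain `energy` of the total."""
    order = np.argsort(-np.abs(s))                 # largest magnitude first
    s_sq = np.abs(s[order]) ** 2
    cumulative = np.cumsum(s_sq) / np.sum(s_sq)
    k = int(np.searchsorted(cumulative, energy)) + 1
    keep = order[:k]
    return U[:, keep], s[keep], V[:, keep]


# Usage: a 6x4 layer of rank 2, decomposed once and then pruned.
W = np.zeros((6, 4))
W[0, 0], W[1, 1] = 3.0, 1.0
U, s, Vt = np.linalg.svd(W, full_matrices=False)
V = Vt.T
U_k, s_k, V_k = prune_singular_values(U, s, V)
W_low = U_k @ np.diag(s_k) @ V_k.T
print(len(s_k), np.allclose(W_low, W))   # 2 True
```

Pruning by retained energy is only one possible criterion; a fixed magnitude threshold on the singular values would serve the same illustrative purpose.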

preprint2020arXiv

LotteryFL: Personalized and Communication-Efficient Federated Learning with Lottery Ticket Hypothesis on Non-IID Datasets

Federated learning is a popular distributed machine learning paradigm with enhanced privacy. Its primary goal is to learn a global model that performs well for as many participants as possible. The technology is advancing rapidly but faces many unsolved challenges, among which statistical heterogeneity (i.e., non-IID data) and communication efficiency are two critical ones that hinder the development of federated learning. In this work, we propose LotteryFL, a personalized and communication-efficient federated learning framework that exploits the Lottery Ticket hypothesis. In LotteryFL, each client learns a lottery ticket network (i.e., a subnetwork of the base model) by applying the Lottery Ticket hypothesis, and only these lottery networks are communicated between the server and clients. Rather than learning a shared global model as in classic federated learning, each client learns a personalized model via LotteryFL, and the communication cost can be significantly reduced owing to the compact size of lottery networks. To support the training and evaluation of our framework, we construct non-IID datasets based on MNIST, CIFAR-10 and EMNIST by taking feature distribution skew, label distribution skew and quantity skew into consideration. Experiments on these non-IID datasets demonstrate that LotteryFL significantly outperforms existing solutions in terms of personalization and communication cost.
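A toy NumPy sketch of the two mechanisms the abstract relies on: each client keeping a sparse subnetwork (a binary mask), and the server aggregating only the weights that clients actually send. We assume masks come from one-shot magnitude pruning and that the server averages each weight over the clients whose masks keep it; both choices and the helper names are illustrative assumptions rather than LotteryFL's exact procedure (the Lottery Ticket hypothesis is usually applied with iterative pruning and weight rewinding).

```python
import numpy as np


def magnitude_mask(weights: np.ndarray, prune_frac: float) -> np.ndarray:
    """Keep roughly the (1 - prune_frac) largest-magnitude weights (ties may keep a few more)."""
    k = int(round(weights.size * (1.0 - prune_frac)))
    if k <= 0:
        return np.zeros(weights.shape, dtype=bool)
    threshold = np.sort(np.abs(weights).ravel())[-k]
    return np.abs(weights) >= threshold


def masked_average(client_weights, client_masks, global_weights):
    """Average each coordinate over the clients whose subnetwork contains it.

    Coordinates that no client kept fall back to the previous global value.
    """
    w = np.stack(client_weights)
    m = np.stack(client_masks).astype(float)
    counts = m.sum(axis=0)
    summed = (w * m).sum(axis=0)
    return np.where(counts > 0, summed / np.maximum(counts, 1.0), global_weights)


# Usage: two clients, each keeping (and uploading) half of a 4-weight layer.
w1 = np.array([1.0, -4.0, 0.5, 2.0])
w2 = np.array([3.0, 0.1, -2.0, 0.2])
m1, m2 = magnitude_mask(w1, 0.5), magnitude_mask(w2, 0.5)
new_global = masked_average([w1, w2], [m1, m2], np.zeros(4))
print(m1.mean(), new_global)   # fraction uploaded per client, aggregated weights
```

Because each client uploads only the entries its mask keeps, the upload size in this sketch scales with `mask.mean()`, which is where the communication saving comes from.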

preprint2020arXiv

Near-Optimal Interference Exploitation 1-Bit Massive MIMO Precoding via Partial Branch-and-Bound

In this paper, we focus on 1-bit precoding for large-scale antenna systems in the downlink based on the concept of constructive interference (CI). By formulating the optimization problem that aims to maximize the CI effect subject to the 1-bit constraint on the transmit signals, we mathematically prove that, when the 1-bit constraint is relaxed, the majority of the obtained transmit signals already satisfy the 1-bit constraint. Based on this important observation, we propose a 1-bit precoding method via a partial branch-and-bound (P-BB) approach, where the BB procedure is performed only for the entries that do not comply with the 1-bit constraint. The proposed P-BB enables the use of the BB framework in large-scale antenna scenarios, where it was previously inapplicable due to its prohibitive complexity. Numerical results demonstrate near-optimal error rate performance for the proposed 1-bit precoding algorithm.
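The key observation above, that most relaxed entries already sit at the 1-bit levels so only the remainder needs a combinatorial search, can be illustrated with a toy sketch. Assumptions: real-valued ±1 entries (the paper's signals are complex, with real and imaginary parts each restricted to 1 bit), a caller-supplied `objective` standing in for the CI metric, and exhaustive enumeration over the unresolved entries in place of a true branch-and-bound. The function name `partial_search` is hypothetical.

```python
import itertools
from typing import Callable, List, Sequence


def partial_search(
    relaxed: Sequence[float],
    objective: Callable[[List[int]], float],
    tol: float = 1e-6,
) -> List[int]:
    """Fix relaxed entries already at +/-1, then search only over the rest.

    `relaxed` holds a real-valued relaxed solution in [-1, 1]. Entries with
    |x| >= 1 - tol already satisfy the 1-bit constraint and are kept; the
    remaining entries are enumerated exhaustively (a stand-in for
    branch-and-bound) to maximize `objective`.
    """
    base = [1 if v > 0 else -1 for v in relaxed]
    free = [i for i, v in enumerate(relaxed) if abs(v) < 1.0 - tol]
    best, best_val = list(base), float("-inf")
    for signs in itertools.product((-1, 1), repeat=len(free)):
        candidate = list(base)
        for i, sign in zip(free, signs):
            candidate[i] = sign
        value = objective(candidate)
        if value > best_val:
            best, best_val = candidate, value
    return best


# Usage with a toy linear objective: only indices 2 and 3 are searched (4 candidates, not 32).
coeffs = [-5.0, -1.0, -2.0, 3.0, 1.0]
relaxed = [1.0, -1.0, 0.3, -0.2, 1.0]
x = partial_search(relaxed, lambda c: sum(a * b for a, b in zip(coeffs, c)))
print(x)  # [1, -1, -1, 1, 1]
```

Note that index 0 stays at +1 even though the toy objective would prefer -1: entries fixed by the relaxation are never revisited, which is exactly what keeps the search small.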

preprint2020arXiv

Orbital selectivity of layer resolved tunneling on iron superconductor Ba0.6K0.4Fe2As2

We use scanning tunneling microscopy/spectroscopy (STM/S) to elucidate the Cooper pairing of the iron pnictide superconductor Ba0.6K0.4Fe2As2. Using a cold-cleaving technique, we obtain atomically resolved termination surfaces with different layer identities. Remarkably, we observe that the low-energy tunneling spectrum related to superconductivity shows an unprecedented dependence on layer identity. By cross-referencing with angle-resolved photoemission results and the tunneling data of LiFeAs, we find that tunneling on each termination surface probes superconductivity by selecting distinct Fe-3d orbitals. These findings point to real-space orbital features of the Cooper pairing in iron pnictide superconductors and suggest a new, general concept: for complex multi-orbital materials, tunneling on different terminating layers can be orbitally selective.

preprint2020arXiv

PoliteCamera: Respecting Strangers' Privacy in Mobile Photographing

The camera is a standard on-board sensor of modern mobile phones, and its convenience and high resolution have made photo taking popular. However, when users take a photo of scenery, a building or a target person, a stranger may also be unintentionally captured in the photo. Such photos expose the location and activity of strangers and hence may breach their privacy. In this paper, we propose a cooperative mobile photographing scheme called PoliteCamera to protect strangers' privacy. Through cooperation between a photographer and a stranger, the stranger's face in a photo can be automatically blurred upon request when the photo is taken. Since multiple strangers near the photographer might send out blurring requests but not all of them are in the photo, an adapted balanced convolutional neural network (ABCNN) is proposed to determine whether the requesting stranger is in the photo based on facial attributes. Evaluations demonstrate that the ABCNN can accurately predict facial attributes and that PoliteCamera can provide accurate privacy protection for strangers.

preprint2020arXiv

Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control

Many real-world sequential decision-making problems can be formulated as optimal control with high-dimensional observations and unknown dynamics. A promising approach is to embed the high-dimensional observations into a lower-dimensional latent representation space, estimate the latent dynamics model, and then use this model for control in the latent space. An important open question is how to learn a representation that is amenable to existing control algorithms. In this paper, we focus on learning representations for locally-linear control algorithms, such as iterative LQR (iLQR). By formulating and analyzing the representation learning problem from an optimal control perspective, we establish three underlying principles that the learned representation should satisfy: 1) accurate prediction in the observation space, 2) consistency between latent and observation space dynamics, and 3) low curvature in the latent space transitions. These principles naturally correspond to a loss function that consists of three terms: prediction, consistency, and curvature (PCC). Crucially, to make PCC tractable, we derive an amortized variational bound for the PCC loss function. Extensive experiments on benchmark domains demonstrate that the new variational-PCC learning algorithm benefits from significantly more stable and reproducible training and leads to superior control performance. Further ablation studies support the importance of all three PCC components for learning a good latent space for control.

preprint2020arXiv

Reconfigurable Intelligent Surface (RIS)-Enhanced Two-Way OFDM Communications

In this paper, we focus on reconfigurable intelligent surface (RIS)-enhanced two-way device-to-device (D2D) multi-pair orthogonal-frequency-division-multiplexing (OFDM) communication systems. Specifically, we maximize the minimum bidirectional weighted sum-rate by jointly optimizing the sub-band allocation, the power allocation and the discrete phase shift (PS) design at the RIS. To tackle the main difficulty, the non-convex PS design at the RIS, we first formulate a semi-definite relaxation problem and then devise a low-complexity solution for the PS design by leveraging the projected sub-gradient method. Numerical results demonstrate the desirable performance gains of the proposed designs.

preprint2020arXiv

Reinforcement Learning-based Black-Box Evasion Attacks to Link Prediction in Dynamic Graphs

Link prediction in dynamic graphs (LPDG) is an important research problem with diverse applications such as online recommendations, studies on disease contagion, and organizational studies. Various LPDG methods based on graph embedding and graph neural networks have recently been proposed and have achieved state-of-the-art performance. In this paper, we study the vulnerability of LPDG methods and propose the first practical black-box evasion attack. Specifically, given a trained LPDG model, our attack aims to perturb the graph structure, without knowledge of the model parameters, model architecture, etc., such that the LPDG model mispredicts as many links as possible. We design our attack based on a stochastic policy-based RL algorithm and evaluate it on three real-world graph datasets from different application domains. Experimental results show that our attack is both effective and efficient.

preprint2020arXiv

RF-Rhythm: Secure and Usable Two-Factor RFID Authentication

Passive RFID technology is widely used in user authentication and access control. We propose RF-Rhythm, a secure and usable two-factor RFID authentication system with strong resilience to lost/stolen/cloned RFID cards. In RF-Rhythm, each legitimate user performs a sequence of taps on his/her RFID card according to a self-chosen secret melody. Such rhythmic taps can induce phase changes in the backscattered signals, which the RFID reader can detect to recover the user's tapping rhythm. In addition to verifying the RFID card's identification information as usual, the backend server compares the extracted tapping rhythm with the one acquired in the user enrollment phase. The user passes authentication if and only if both verifications succeed. We also propose a novel phase-hopping protocol in which the RFID reader emits a continuous wave (CW) with random phases to extract the user's secret tapping rhythm. Our protocol can prevent a capable adversary from extracting and then replaying a legitimate tapping rhythm from sniffed RFID signals. Comprehensive user experiments confirm the high security and usability of RF-Rhythm, with false-positive and false-negative rates close to zero.

preprint2020arXiv

Short-Term and Long-Term Context Aggregation Network for Video Inpainting

Video inpainting aims to restore missing regions of a video and has many applications such as video editing and object removal. However, existing methods either suffer from inaccurate short-term context aggregation or rarely explore long-term frame information. In this work, we present a novel context aggregation network to effectively exploit both short-term and long-term frame information for video inpainting. In the encoding stage, we propose boundary-aware short-term context aggregation, which aligns local regions from neighboring frames that are closely related to the boundary context of missing regions and aggregates them into the target frame. Furthermore, we propose dynamic long-term context aggregation to globally refine the feature map generated in the encoding stage using long-term frame features, which are dynamically updated throughout the inpainting process. Experiments show that our method outperforms state-of-the-art methods, with better inpainting results and fast inpainting speed.

preprint2020arXiv

The $v_1$-Periodic Region of the Complex-Motivic Ext

We establish a $v_1$-periodicity theorem in Ext over the complex-motivic Steenrod algebra. The element $h_1$ of Ext, which detects the homotopy class $\eta$ in the motivic Adams spectral sequence, is non-nilpotent and therefore generates $h_1$-towers. Our result is that, apart from these $h_1$-towers, $v_1$-periodicity behaves as it does classically.

preprint2020arXiv

The AVA-Kinetics Localized Human Actions Video Dataset

This paper describes the AVA-Kinetics localized human actions video dataset. The dataset is collected by annotating videos from the Kinetics-700 dataset using the AVA annotation protocol, and extending the original AVA dataset with these new AVA annotated Kinetics clips. The dataset contains over 230k clips annotated with the 80 AVA action classes for each of the humans in key-frames. We describe the annotation process and provide statistics about the new dataset. We also include a baseline evaluation using the Video Action Transformer Network on the AVA-Kinetics dataset, demonstrating improved performance for action classification on the AVA test set. The dataset can be downloaded from https://research.google.com/ava/

preprint2020arXiv

TIPRDC: Task-Independent Privacy-Respecting Data Crowdsourcing Framework for Deep Learning with Anonymized Intermediate Representations

The success of deep learning partially benefits from the availability of various large-scale datasets. These datasets are often crowdsourced from individual users and contain private information such as gender and age. Users' emerging privacy concerns about data sharing hinder the generation and use of crowdsourced datasets and lead to a shortage of training data for new deep learning applications. One naïve solution is to pre-process the raw data to extract features on the user side, so that only the extracted features are sent to the data collector. Unfortunately, attackers can still exploit these extracted features to train an adversary classifier that infers private attributes. Some prior work has leveraged game theory to protect private attributes. However, these defenses are designed for known primary learning tasks, and the extracted features work poorly for unknown learning tasks. To tackle the case where the learning task may be unknown or changing, we present TIPRDC, a task-independent privacy-respecting data crowdsourcing framework with anonymized intermediate representations. The goal of this framework is to learn a feature extractor that hides private information from the intermediate representations while maximally retaining the original information embedded in the raw data, so that the data collector can accomplish unknown learning tasks. We design a hybrid training method to learn the anonymized intermediate representation: (1) an adversarial training process that hides private information from the features; and (2) a neural-network-based mutual information estimator that maximally retains the original information.

preprint2020arXiv

Visual Localization Using Semantic Segmentation and Depth Prediction

In this paper, we propose a monocular visual localization pipeline leveraging semantic and depth cues. We apply semantic consistency evaluation to rank the image retrieval results and a practical clustering technique to reject estimation outliers. In addition, we demonstrate a substantial performance boost achieved by combining multiple feature extractors. Furthermore, by using depth prediction with a deep neural network, we show that a significant number of falsely matched keypoints are identified and eliminated. The proposed pipeline outperforms most of the existing approaches on the Long-Term Visual Localization benchmark 2020.

preprint2019arXiv

A Generalized Framework for Population Based Training

Population Based Training (PBT) is a recent approach that jointly optimizes neural network weights and hyperparameters by periodically copying the weights of the best performers and mutating hyperparameters during training. Previous PBT implementations have been synchronized glass-box systems. We propose a general, black-box PBT framework that distributes many asynchronous "trials" (a small number of training steps with warm-starting) across a cluster, coordinated by the PBT controller. The black-box design makes no assumptions about model architectures, loss functions or training procedures. Our system supports dynamic hyperparameter schedules to optimize both differentiable and non-differentiable metrics. We apply our system to train a state-of-the-art WaveNet generative model for human voice synthesis. We show that our PBT system achieves better accuracy, lower sensitivity and faster convergence than existing methods, given the same computational resources.
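The controller's core decision in PBT, copying weights from strong members and mutating their hyperparameters, can be sketched as a single synchronous step. This deliberately omits what the abstract presents as new (asynchronous, black-box trials distributed across a cluster). The truncation-selection rule, the 25% fraction and the 0.8/1.2 perturbation factors are assumptions borrowed from the common PBT recipe, and the `Member` type is hypothetical.

```python
import copy
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Member:
    """One population member: model weights, hyperparameters and latest score."""
    weights: Dict[str, float]
    hparams: Dict[str, float]
    score: float = 0.0


def exploit_and_explore(
    population: List[Member],
    rng: random.Random,
    frac: float = 0.25,
    factors: Sequence[float] = (0.8, 1.2),
) -> None:
    """Truncation selection: the bottom `frac` copy a top member, then perturb."""
    ranked = sorted(population, key=lambda m: m.score, reverse=True)
    n = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:n], ranked[-n:]
    for member in bottom:
        source = rng.choice(top)
        member.weights = copy.deepcopy(source.weights)          # exploit: warm-start
        member.hparams = {k: v * rng.choice(factors)            # explore: mutate
                          for k, v in source.hparams.items()}


# Usage: the worst of four members inherits the best member's state.
pop = [Member({"w": 10 * s}, {"lr": lr}, s)
       for s, lr in [(0.1, 0.1), (0.9, 0.2), (0.5, 0.3), (0.3, 0.4)]]
exploit_and_explore(pop, random.Random(0))
print(pop[0].weights, pop[0].hparams)
```

In a black-box, asynchronous setting the same decision would be made per finished trial rather than for the whole population at once; the sketch keeps it synchronous for readability.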

preprint2018arXiv

PhotoSafer: Content-Based and Context-Aware Private Photo Protection for Smartphones

Nowadays, many people store photos on smartphones. Many of these photos contain sensitive, private information, such as photocopies of driver's licenses and credit cards. An emerging privacy concern is unauthorized access to such private photos by installed apps. Coarse-grained access control systems such as the Android permission system offer all-or-nothing access to photos stored on smartphones, and users are unaware of the exact behavior of installed apps. Our analysis finds that 82% of the top 200 free apps in a popular Android app store have complete access to both stored photos and the network on a user's smartphone, which indicates possible private photo leakage. In addition, our user survey reveals that 87.5% of the 112 respondents are not aware that certain apps can access their photos without informing users, and all the respondents believe that the photos stored on their smartphones contain different types of private information. Hence, we propose PhotoSafer, a content-based, context-aware private photo protection system for Android phones. PhotoSafer can detect private photos based on photo content with a well-trained deep convolutional neural network, and control access to photos based on system status (e.g., screen locked or not) and app-running status (e.g., app in the background). Evaluations demonstrate that PhotoSafer can accurately identify private photos in real time. The efficacy and efficiency of the implemented prototype system show its potential for practical use.

preprint2018arXiv

Privacy-Preserving Outsourcing of Large-Scale Nonlinear Programming to the Cloud

The massive and growing volume of data generated by various sources has given birth to big data analytics. Solving large-scale nonlinear programming problems (NLPs) is one important big data analytics task that has applications in many domains such as transport and logistics. However, NLPs are usually too computationally expensive for resource-constrained users. Fortunately, cloud computing provides an alternative and economical service for resource-constrained users to outsource their computation tasks to the cloud. However, one major concern with outsourcing NLPs is the leakage of users' private information contained in NLP formulations and results. Although much work has been done on privacy-preserving outsourcing of computation tasks, little attention has been paid to NLPs. In this paper, we investigate, for the first time, secure outsourcing of general large-scale NLPs with nonlinear constraints. A secure and efficient transformation scheme at the user side is proposed to protect users' private information; at the cloud side, the generalized reduced gradient method is applied to effectively solve the transformed large-scale NLPs. The proposed protocol is implemented on a cloud computing testbed. Experimental evaluations demonstrate that significant time can be saved for users and that the proposed mechanism has the potential for practical use.

preprint2016arXiv

$\Delta$(1232) effects in density-dependent relativistic Hartree-Fock theory and neutron stars

The density-dependent relativistic Hartree-Fock (DDRHF) theory is extended to include $\Delta$-isobars for the study of dense nuclear matter and neutron stars. To this end, we solve the Rarita-Schwinger equation for spin-3/2 particles. Both the direct and exchange terms of the $\Delta$-isobars' self-energies are evaluated in detail. In comparison with relativistic mean field theory (the Hartree approximation), a weaker parameter dependence is found for DDRHF. An early appearance of $\Delta$-isobars is found at $\rho_B \sim 0.27$ fm$^{-3}$, comparable with that of hyperons. We also find that the softening of the equation of state by $\Delta$-isobars is mainly due to the reduced Fock contributions from the coupling of the isoscalar mesons, while the pion contributions are negligibly small. We finally conclude that, with typical parameter sets, neutron stars with $\Delta$-isobars in their interiors could be as heavy as the two massive pulsars whose masses are precisely measured, with slightly smaller radii than normal neutron stars.

preprint2016arXiv

Accumulation tests for FDR control in ordered hypothesis testing

Multiple testing problems arising in modern scientific applications can involve simultaneously testing thousands or even millions of hypotheses, with relatively few true signals. In this paper, we consider the multiple testing problem where prior information is available (for instance, from an earlier study under different experimental conditions) that can allow us to test the hypotheses as a ranked list in order to increase the number of discoveries. Given an ordered list of n hypotheses, the aim is to select a data-dependent cutoff k and declare the first k hypotheses to be statistically significant while bounding the false discovery rate (FDR). Generalizing several existing methods, we develop a family of "accumulation tests" to choose a cutoff k that adapts to the amount of signal at the top of the ranked list. We introduce a new method in this family, the HingeExp method, which offers higher power to detect true signals compared to existing techniques. Our theoretical results prove that these methods control a modified FDR on finite samples and characterize the power of the methods in the family. We apply the tests to simulated data, including a high-dimensional model selection problem for linear regression. We also compare accumulation tests to existing methods for multiple testing on a real data problem of identifying differential gene expression over a dosage gradient.
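A minimal sketch of the stopping rule, assuming the accumulation-test form $\hat{k} = \max\{k : \frac{1}{k}\sum_{i=1}^{k} h(p_i) \le \alpha\}$ for an accumulation function $h$ that integrates to 1 on $[0,1]$. With $h(p) = -\log(1-p)$ this is the ForwardStop rule; the HingeExp function below, $h(p) = C\log\frac{1}{C(1-p)}$ for $p > 1 - 1/C$ and 0 otherwise, is our reading of the proposed method. The exact normalization behind the paper's finite-sample (modified) FDR guarantee may differ, so treat this as illustrative; the function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def forward_stop(p: float) -> float:
    """ForwardStop accumulation function h(p) = -log(1 - p)."""
    return -math.log1p(-p)


def hinge_exp(C: float) -> Callable[[float], float]:
    """HingeExp-style accumulation function with parameter C >= 1.

    h(p) = C * log(1 / (C * (1 - p))) if p > 1 - 1/C, else 0; it integrates to 1 on [0, 1].
    """
    def h(p: float) -> float:
        if p <= 1.0 - 1.0 / C:
            return 0.0
        return C * math.log(1.0 / (C * (1.0 - p)))
    return h


def accumulation_cutoff(pvals: Sequence[float], h: Callable[[float], float], alpha: float) -> int:
    """Largest k with mean(h(p_1), ..., h(p_k)) <= alpha, or 0 if there is none.

    p-values must lie in [0, 1) and be listed in the prior ranking order.
    """
    total, k_hat = 0.0, 0
    for k, p in enumerate(pvals, start=1):
        total += h(p)
        if total / k <= alpha:
            k_hat = k
    return k_hat


# Usage: strong signals at the top of the list, nulls at the bottom.
pvals = [0.01, 0.02, 0.03, 0.9, 0.95]
print(accumulation_cutoff(pvals, forward_stop, 0.2))    # 3
print(accumulation_cutoff(pvals, hinge_exp(2.0), 0.2))  # 3
```

HingeExp assigns zero accumulation to small p-values, so true signals near the top of the list cost nothing, while large (null-looking) p-values are penalized more steeply than under ForwardStop.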

preprint2016arXiv

Anisotropic Pauli spin blockade in hole quantum dots

We present measurements on gate-defined double quantum dots in Ge-Si core-shell nanowires, which we tune to a regime with visible shell filling in both dots. We observe Pauli spin blockade and can assign the measured leakage current at low magnetic fields to spin-flip cotunneling, for which we measure a strong anisotropy related to an anisotropic g-factor. At higher magnetic fields, we see signatures of leakage current caused by spin-orbit coupling between (1,1)-singlet and (2,0)-triplet states. Taking these anisotropic spin-flip mechanisms into account, we can choose the magnetic field direction with the longest spin lifetime for improved spin-orbit qubits.

preprint2016arXiv

Electric-field dependent g-factor anisotropy in Ge-Si core-shell nanowire quantum dots

We present angle-dependent measurements of the effective g-factor g* in a Ge-Si core-shell nanowire quantum dot. g* is found to be maximum when the magnetic field is pointing perpendicular to both the nanowire and the electric field induced by local gates. Alignment of the magnetic field with the electric field reduces g* significantly. g* is almost completely quenched when the magnetic field is aligned with the nanowire axis. These findings confirm recent calculations, where the obtained anisotropy is attributed to a Rashba-type spin-orbit interaction induced by heavy-hole light-hole mixing. In principle, this facilitates manipulation of spin-orbit qubits by means of a continuous high-frequency electric field.

preprint2016arXiv

Fast radio bursts and their gamma-ray or radio afterglows as Kerr-Newman black hole binaries

Fast radio bursts (FRBs) are radio transients lasting only a few milliseconds. They seem to occur at cosmological distances. We propose that these events can originate in the collapse of the magnetosphere of Kerr-Newman black holes (KNBHs). We show that the closed orbits of charged particles in the magnetosphere of these objects are unstable. After examining the dependencies on the specific charge of the particle and the spin and charge of the KNBH, we conclude that the resulting timescale and radiation mechanism fit well with the extant observations of FRBs. Furthermore, we argue that the merger of a KNBH binary is one of the plausible central engines for potential gamma-ray or radio afterglows following certain FRBs, and can also account for gravitational wave (GW) events like GW150914. Our model leads to predictions that can be tested by combined multi-wavelength electromagnetic and GW observations.

preprint2016arXiv

Highly tuneable hole quantum dots in Ge-Si core-shell nanowires

We define single quantum dots with lengths ranging from 60 nm to nearly half a micron in Ge-Si core-shell nanowires. The charging energies, between 18 and 4 meV, scale inversely with the quantum dot length. We then split a long dot into a double quantum dot with separate control over the tunnel couplings and the electrochemical potential of each dot. Both single and double quantum dot configurations prove to be very stable and show excellent control over the electrostatic environment of the dots, making this system a highly versatile platform for spin-based quantum computing.

preprint2016arXiv

Internal X-ray plateau in short GRBs: Signature of supramassive fast-rotating quark stars?

A supramassive, strongly magnetized millisecond neutron star (NS) has been proposed as the candidate central engine of at least some short gamma-ray bursts (SGRBs), based on the "internal plateau" commonly observed in the early X-ray afterglow. While a previous analysis showed qualitative consistency between this suggestion and the Swift SGRB data, the distribution of the observed break time $t_b$ is much narrower than the distribution of the collapse time of supramassive NSs for the several NS equations of state (EoSs) investigated. In this paper, we study four recently constructed "unified" NS EoSs, as well as three strange quark star (QS) EoSs developed within the new confinement density-dependent mass model. All the EoSs chosen here satisfy the recent observational constraints of the two massive pulsars whose masses are precisely measured. We construct sequences of rigidly rotating NS/QS configurations with increasing spin frequency $f$, from non-rotating ($f = 0$) to the Keplerian frequency ($f = f_{\rm K}$), and provide convenient analytical parametrizations of the results. Assuming that cosmological NS-NS merger systems have the same mass distribution as Galactic NS-NS systems, we demonstrate that all except the BCPM NS EoS can reproduce the current $22\%$ supramassive NS/QS fraction constraint derived from the SGRB data. We simultaneously simulate the observed quantities of SGRBs (the break time $t_b$, the break time luminosity $L_b$ and the total energy in the electromagnetic channel $E_{\rm total}$), and find that while equally well reproducing other observational constraints, QS EoSs predict a much narrower $t_b$ distribution than NS EoSs, better matching the data. We therefore suggest that the post-merger product of NS-NS mergers might be a fast-rotating supramassive QS rather than an NS.

preprint2016arXiv

ModelHub: Towards Unified Data and Lifecycle Management for Deep Learning

Deep learning has improved state-of-the-art results in many important fields and has been the subject of much research in recent years, leading to the development of several systems for facilitating deep learning. Current systems, however, mainly focus on the model building and training phases, while the issues of data management, model sharing, and lifecycle management are largely ignored. The deep learning modeling lifecycle generates a rich set of data artifacts, such as learned parameters and training logs, and comprises several frequently conducted tasks, e.g., understanding model behaviors and trying out new models. Dealing with such artifacts and tasks is cumbersome and largely left to the users. This paper describes our vision and implementation of a data and lifecycle management system for deep learning. First, we generalize model exploration and model enumeration queries from tasks commonly conducted by deep learning modelers, and propose a high-level domain specific language (DSL), inspired by SQL, to raise the abstraction level and accelerate the modeling process. Second, to manage the data artifacts, especially the large number of checkpointed float parameters, we design a novel model versioning system (dlv) and a read-optimized parameter archival storage system (PAS) that minimizes storage footprint and accelerates query workloads without losing accuracy. PAS archives versioned models using deltas in a multi-resolution fashion by separately storing the less significant bits, and features a novel progressive query (inference) evaluation algorithm. Third, we show that archiving versioned models using deltas poses a new dataset versioning problem, and we develop efficient algorithms for solving it. We conduct extensive experiments over several real datasets from the computer vision domain to show the efficiency of the proposed techniques.
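The idea of storing the less significant bits separately can be pictured with a simple float32 split: the high 16 bits (sign, exponent and 7 mantissa bits) give a coarse version of each parameter, and the low 16 bits refine it exactly. This NumPy sketch is our illustration only; the 16/16 split point and the helper names are assumptions, and PAS's delta encoding, multi-resolution layout and progressive query evaluation are not reproduced.

```python
import numpy as np


def split_float32(x: np.ndarray):
    """Split float32 values into high and low 16-bit halves of their bit patterns."""
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    high = (bits >> 16).astype(np.uint16)     # sign, 8 exponent bits, top 7 mantissa bits
    low = (bits & 0xFFFF).astype(np.uint16)   # remaining 16 mantissa bits
    return high, low


def merge_float32(high: np.ndarray, low=None) -> np.ndarray:
    """Rebuild float32 values; without `low`, the result is a truncated approximation."""
    bits = high.astype(np.uint32) << 16
    if low is not None:
        bits |= low.astype(np.uint32)
    return bits.view(np.float32)


# Usage: a coarse read touches only the high halves; adding the low halves is exact.
params = np.array([3.14159, -0.001, 1.0e6, 1.0], dtype=np.float32)
high, low = split_float32(params)
coarse = merge_float32(high)
exact = merge_float32(high, low)
print(np.abs(coarse - params) / np.abs(params))   # relative errors below 2**-7
```

Truncating the low mantissa bits bounds the relative error of each normal float32 value by $2^{-7}$, which is why a query can often start from the high halves and fetch the low halves only when the coarse answer is not decisive.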

preprint2016arXiv

Role of Arsenic in Iron-based Superconductivity at Atomic Scale

In iron-based superconductors, a unique Fe-As (Se, Te, P) tri-layer plays an essential role in controlling the electronic properties, especially the Cooper pairing interaction. Here we use scanning tunneling microscopy/spectroscopy (STM/S) to investigate the role of arsenic atoms in superconducting Ba0.4K0.6Fe2As2 by directly breaking and restoring the Fe-As structure at the atomic scale. After the upper As layer is peeled away, the tunneling spectrum of the exposed iron surface reveals a shallow incoherent gap, indicating a severe suppression of superconductivity without arsenic coverage. When a pair of arsenic atoms is placed on such an iron surface, a localized topographic feature forms due to Fe-As orbital hybridization, and the superconducting coherence peaks recover locally, with the same gap magnitude as on the iron layer fully covered by arsenic. These observations unravel the Fe-As interactions at the atomic scale and imply their essential role in iron-based superconductivity.

preprint2015arXiv

Accelerating Non-volatile/Hybrid Processor Cache Design Space Exploration for Application Specific Embedded Systems

In this article, we propose a technique to accelerate design space exploration of nonvolatile, or hybrid volatile/nonvolatile, processor caches for application-specific embedded systems. Using a novel cache behavior modeling equation and a new, accurate cache miss prediction mechanism, our proposed technique can accelerate NVM or hybrid FIFO processor cache design space exploration for SPEC CPU 2000 applications by up to 249 times compared to the conventional approach.

preprint2015arXiv

Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards

We discuss an approach for solving sparse or dense banded linear systems ${\bf A} {\bf x} = {\bf b}$ on a Graphics Processing Unit (GPU) card. The matrix ${\bf A} \in {\mathbb{R}}^{N \times N}$ is possibly nonsymmetric and moderately large, i.e., $10000 \leq N \leq 500000$. The ${\it split\ and\ parallelize}$ (${\tt SaP}$) approach seeks to partition the matrix ${\bf A}$ into diagonal sub-blocks ${\bf A}_i$, $i=1,\ldots,P$, which are independently factored in parallel. The solution strategy may either account for or ignore the matrices that couple the diagonal sub-blocks ${\bf A}_i$. This approach, along with the Krylov subspace-based iterative method that it preconditions, is implemented in a solver called ${\tt SaP::GPU}$, which is compared in terms of efficiency with three commonly used sparse direct solvers: ${\tt PARDISO}$, ${\tt SuperLU}$, and ${\tt MUMPS}$. ${\tt SaP::GPU}$, which runs entirely on the GPU except for several stages involved in preliminary row-column permutations, is robust and compares well in terms of efficiency with the aforementioned direct solvers. In a comparison against Intel's ${\tt MKL}$, ${\tt SaP::GPU}$ also fares well when used to solve dense banded systems that are close to being diagonally dominant. ${\tt SaP::GPU}$ is publicly available and distributed as open source under a permissive BSD3 license.
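The variant that ignores the coupling matrices amounts to a block-Jacobi preconditioner: factor each diagonal block ${\bf A}_i$ independently, then apply all block solves in parallel. The CPU/NumPy sketch below illustrates that idea under stated assumptions: equal-size blocks, explicit inverses standing in for the per-block LU factorizations, and plain preconditioned Richardson iteration in place of the Krylov solver that ${\tt SaP::GPU}$ wraps. None of the function names come from ${\tt SaP::GPU}$.

```python
import numpy as np


def factor_diagonal_blocks(A: np.ndarray, num_blocks: int):
    """Split A's diagonal into `num_blocks` sub-blocks and 'factor' each one.

    Explicit inverses stand in for the per-block LU factorizations; each block
    is independent, which is what makes this step parallel.
    """
    n = A.shape[0]
    bounds = np.linspace(0, n, num_blocks + 1, dtype=int)
    return [(lo, hi, np.linalg.inv(A[lo:hi, lo:hi]))
            for lo, hi in zip(bounds[:-1], bounds[1:])]


def apply_block_preconditioner(blocks, r: np.ndarray) -> np.ndarray:
    """z = M^{-1} r, where M keeps only the diagonal blocks (coupling ignored)."""
    z = np.empty_like(r)
    for lo, hi, block_inv in blocks:
        z[lo:hi] = block_inv @ r[lo:hi]
    return z


def preconditioned_richardson(A, b, num_blocks=4, tol=1e-10, max_iter=500):
    """Iterate x <- x + M^{-1}(b - A x) until the relative residual is below tol."""
    blocks = factor_diagonal_blocks(A, num_blocks)
    x = np.zeros_like(b)
    for it in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        x = x + apply_block_preconditioner(blocks, r)
    return x, max_iter


# Usage: a diagonally dominant pentadiagonal (banded) system.
n = 40
A = 6.0 * np.eye(n)
for d in (1, 2):
    A -= np.eye(n, k=d) + np.eye(n, k=-d)
b = np.arange(1.0, n + 1)
x, iters = preconditioned_richardson(A, b)
print(iters, np.linalg.norm(A @ x - b))
```

Dropping the coupling blocks works well here because the test matrix is strictly diagonally dominant, consistent with the abstract's observation that the approach fares well on systems close to being diagonally dominant; weaker dominance would call for a Krylov method or for keeping the coupling terms.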

preprint2015arXiv

Efficient Discriminative Nonorthogonal Binary Subspace with its Application to Visual Tracking

One of the crucial problems in visual tracking is how the object is represented. Conventional appearance-based trackers use increasingly complex features in order to be robust. However, complex representations typically not only require more computation for feature extraction but also complicate state inference. We show that, with a careful feature selection scheme, extremely simple yet discriminative features can be used for robust object tracking. The central component of the proposed method is a succinct and discriminative representation of the object using a discriminative non-orthogonal binary subspace (DNBS) spanned by Haar-like features. The DNBS representation inherits the merits of the original NBS in that it describes the object efficiently, and it also incorporates discriminative information to distinguish foreground from background. However, finding the DNBS bases from an over-complete dictionary is NP-hard. We propose a greedy algorithm called discriminative optimized orthogonal matching pursuit (D-OOMP) to solve this problem. An iterative formulation, named iterative D-OOMP, is further developed to drastically reduce redundant computation between iterations, and a hierarchical selection strategy is integrated to reduce the search space of features. The proposed DNBS representation is applied to object tracking through SSD-based template matching. We validate the effectiveness of our method through extensive experiments on challenging videos, with comparisons against several state-of-the-art trackers, and demonstrate its capability to track objects in clutter and against moving backgrounds.
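D-OOMP adds a discriminative (foreground versus background) criterion, Haar-like atoms, and iterative and hierarchical speed-ups, none of which are reproduced here. As a hedged illustration of the greedy pursuit family its name refers to, the Python sketch below implements plain orthogonal matching pursuit over a generic dense dictionary. The function name `orthogonal_matching_pursuit` is hypothetical, and random Gaussian atoms stand in for Haar-like features.

```python
import numpy as np


def orthogonal_matching_pursuit(D, y, k):
    """Greedily pick k atoms (columns of D) to approximate y.

    After each pick, the coefficients of all selected atoms are refit by least squares,
    which keeps the residual orthogonal to the span of the selected atoms.
    """
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    norms = np.linalg.norm(D, axis=0)
    selected, coef = [], np.zeros(0)
    for _ in range(k):
        scores = np.abs(D.T @ residual) / norms  # normalized correlation with the residual
        if selected:
            scores[selected] = -np.inf           # never pick the same atom twice
        selected.append(int(np.argmax(scores)))
        basis = D[:, selected]
        coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
        residual = y - basis @ coef
    return selected, coef, residual


rng = np.random.default_rng(1)
D = rng.standard_normal((64, 256))     # over-complete dictionary: 256 atoms in 64 dimensions
y = 3.0 * D[:, 10] - 2.0 * D[:, 200]   # signal built from two atoms
atoms, coef, residual = orthogonal_matching_pursuit(D, y, k=2)
print(atoms, np.round(coef, 3), np.linalg.norm(residual))
```

Exhaustively testing every subset of k atoms is what makes the exact problem intractable; the greedy loop instead costs one correlation pass and one small least-squares fit per selected atom.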

preprint2015arXiv

Recognizing Temporal Linguistic Expression Pattern of Individual with Suicide Risk on Social Media

Suicide is a global public health problem. Early detection of individual suicide risk plays a key role in suicide prevention. In this paper, we propose to assess individual suicide risk through time series analysis of personal linguistic expression on Chinese social media (Weibo). We first examined temporal patterns in the linguistic expression of individuals, and then used these temporal patterns as predictor variables to build classification models that estimate levels of individual suicide risk. The results show that the characteristics of the time series curves of linguistic features, including parentheses, auxiliary verbs, personal pronouns and body words, affect the prediction performance most, and that the prediction model achieves an accuracy higher than 0.60. This paper confirms the effectiveness of social media data for detecting individual suicide risk. The results of this study may help improve the performance of suicide prevention programs.

preprint2015arXiv

Vertical convection in neutrino-dominated accretion flows

We present the effects of vertical convection on the structure and luminosity of a neutrino-dominated accretion flow (NDAF) around a stellar-mass black hole in spherical coordinates. We find that convective energy transfer can suppress the radial advection in the NDAF, while the density, temperature and opening angle change only slightly. As a result, the neutrino luminosity and the annihilation luminosity increase, which helps meet the energy requirement of gamma-ray bursts.

preprint2014arXiv

High-T_c superconductivity in ultrathin Bi_2Sr_2CaCu_2O_8+x down to half-unit-cell thickness by protection with graphene

High-T_c superconductors confined to two dimensions exhibit novel physical phenomena, such as the superconductor-insulator transition. In the Bi_2Sr_2CaCu_2O_8+x (Bi2212) model system, despite extensive studies, the intrinsic superconducting properties at the thickness limit have been difficult to determine. Here we report a method to fabricate high-quality single-crystal Bi2212 films down to half-unit-cell thickness in the form of a graphene/Bi2212 van der Waals heterostructure, in which sharp superconducting transitions are observed. The heterostructure also exhibits a nonlinear current-voltage characteristic due to the Dirac nature of the graphene band structure. More interestingly, although the critical temperature remains essentially unchanged as the Bi2212 thickness is reduced, the slope of the normal-state T-linear resistivity varies by a factor of 4-5, and the sheet resistance increases by three orders of magnitude, indicating a surprising decoupling of the normal-state resistance and superconductivity. The developed technique is versatile and can be applied to investigate other two-dimensional (2D) superconducting materials.

preprint2014arXiv

Observation of a Robust Zero-energy Bound State in Iron-based Superconductor Fe(Te,Se)

A robust zero-energy bound state (ZBS) in a superconductor, such as a Majorana or Andreev bound state, is often a consequence of non-trivial topological or symmetry-related properties, and can provide indispensable information about the superconducting state. Here we use scanning tunneling microscopy/spectroscopy to demonstrate, on the atomic scale, that an isotropic ZBS emerges at randomly distributed interstitial excess Fe sites in superconducting Fe(Te,Se). This ZBS is localized, with a short decay length of ~10 Å, and is surprisingly robust against magnetic fields up to 8 T, as well as against perturbations by neighboring impurities. We find no natural explanation for such a robust zero-energy bound state, which points to either a novel impurity mechanism or an exotic pairing symmetry of iron-based superconductivity.

preprint2014arXiv

Phase Diagram and Weak-link Behavior in Nd-doped CaFe_2As_2

The transport properties, phase diagram, and dopant distribution are investigated in systematically Nd-doped CaFe_2As_2 single crystals. Two coexisting superconducting (SC) phases with different critical transition temperatures (T_c) are observed. The low-T_c phase emerges for x >= 0.031, and its T_c increases to a maximum of about 20 K at x = 0.083, the highest doping level in our study. For x >= 0.060, a high-T_c phase with a T_c of about 40 K is observed. The structural transition (STr) from the tetragonal to the orthorhombic phase vanishes abruptly around x = 0.060, where a new STr from the tetragonal to the collapsed tetragonal phase begins to emerge. Compared with the low-T_c phase, the end point of the SC transition of the high-T_c phase is more sensitive to magnetic field, a characteristic of Josephson weak-link behavior. Possible scenarios for this system are discussed based on our observations. Detailed energy dispersive X-ray spectroscopy (EDS) measurements also show that the non-uniform SC properties cannot be attributed to a heterogeneous Nd distribution on the micro scale.

preprint2014arXiv

Revisiting the boiling of quark nuggets at nonzero chemical potential

The boiling of possible quark nuggets during the quark-hadron phase transition of the Universe at nonzero chemical potential is revisited within the microscopic Brueckner-Hartree-Fock approach employed for the hadron phase, using two kinds of baryon interactions as fundamental inputs. To describe the deconfined phase of quark matter, we use a recently developed quark mass density-dependent model with a fully self-consistent thermodynamic treatment of confinement. We study the baryon number limit $A_{\rm boil}$ (above which boiling may be important) with three typical values of the confinement parameter $D$. We first find that a baryon interaction giving a softer equation of state for the hadron phase leads to only a small increase of $A_{\rm boil}$. However, the results depend sensitively on the confinement parameter of the quark model. Specifically, boiling might be important during the cooling of the Universe for a limited parameter range around $D^{1/2} = 170$ MeV, a value consistent with recent lattice QCD calculations of the vacuum chiral condensate, while for other choices of this parameter boiling might not occur, and cosmological quark nuggets with $10^2 < A < 10^{50}$ could survive.

preprint2014arXiv

Sensing Subjective Well-being from Social Media

Subjective Well-being (SWB), which refers to how people experience the quality of their lives, is of great value to public policy-makers as well as to economic and sociological research. Traditionally, SWB is measured with time-consuming and costly self-report questionnaires. Nowadays, people are motivated to share their experiences and feelings on social media, so we propose to sense SWB from the vast amount of user-generated data on social media. Using the social media data of 1785 users with SWB labels, we train machine learning models that are able to "sense" individual SWB from users' social media activity. Our model, which attains state-of-the-art prediction accuracy, can then be used to identify the SWB of large populations of social media users in a timely manner and at very low cost.

preprint2014arXiv

The amount of crustal entrainment and the type of Vela-like pulsars

The "glitch crisis" of Vela-like pulsars has recently been the subject of considerable debate. It might challenge the standard two-component glitch model: because a large fraction of the superfluid neutrons is thought to be entrained in the crustal lattice, there may not be enough superfluid neutrons to trigger the large glitches in Vela-like pulsars. However, the amount of entrainment, which could effectively constrain the fractional moment of inertia of a pulsar, is very uncertain. To examine the importance of this parameter for the inner structure of neutron stars, we relax the "glitch crisis" argument, employ a set of advanced equations of state derived within microscopic many-body approaches that satisfy the recent 2-solar-mass constraint, and evaluate their predictions for the fractional moment of inertia under two extremes of crustal entrainment. We find that a final determination of the amount of entrainment could be closely related to the type of neutron star. If a large enough fraction of neutrons is entrained, a Vela-like pulsar could not be a hybrid star, namely, no free quarks would be present in its core. In addition, we use the Vela data to narrow the parameter space of hyperon-meson couplings in the popular phenomenological relativistic mean field model.

preprint2014arXiv

When reputation enforces evolutionary cooperation in unreliable MANETs

In self-organized mobile ad hoc networks (MANETs), network functions rely on the cooperation of self-interested nodes, and a key challenge is to enforce their mutual cooperation. In this paper, we study cooperative packet forwarding over a one-hop unreliable channel, where the unreliability results from packet loss and noisy observation of transmissions. We propose an indirect reciprocity framework based on evolutionary game theory and use it to enforce cooperative packet forwarding strategies in both structured and unstructured MANETs. Furthermore, we analyze the evolutionary dynamics of cooperative strategies and derive the threshold benefit-to-cost ratio that guarantees convergence to cooperation. Numerical simulations verify that the proposed evolutionary game theoretic solution enforces cooperation when the benefit-to-cost ratio of altruistic behavior exceeds this critical threshold. In addition, we measure the network throughput of the proposed strategy in structured MANETs and find it to be in close agreement with that of the fully cooperative strategy.

preprint2013arXiv

Electronic Band Structure of Wurtzite GaP Nanowires via Resonance Raman Spectroscopy

Raman measurements are performed on defect-free wurtzite (WZ) GaP nanowires. Resonance Raman measurements are carried out over the excitation energy range between 2.19 and 2.71 eV. Resonances of the E1(LO) mode at 2.38 eV and 2.67 eV, and of the A1(LO) mode at 2.67 eV, are observed. The presence of these intensity resonances clearly demonstrates the existence of energy states with Gamma_9hh and Gamma_7V (Gamma_7C) symmetries of the valence (conduction) band and allows us to measure the band energies of WZ-phase GaP at the Gamma point. In addition, we have carried out temperature-dependent resonance Raman measurements, which allowed us to extrapolate the zero-temperature values of the Gamma-point energies, along with the crystal field and spin-orbit splitting energies. These results provide feedback for refining available theoretical calculations to derive the correct wurtzite III-V semiconductor band structure.

preprint2013arXiv

Precisely aligned graphene grown on hexagonal boron nitride by catalyst free chemical vapor deposition

Growing precisely aligned graphene on h-BN without a metal catalyst is extremely important, because it allows the intriguing physical properties and devices of graphene/h-BN heterostructures to be studied in a controllable manner. In this report, such heterostructures were fabricated and investigated by atomic-resolution scanning probe microscopy. Moiré patterns are observed, and the sensitivity of moiré interferometry shows that the graphene grains align precisely with the underlying h-BN lattice, within an error of less than 0.05 degrees. The occurrence of the moiré pattern clearly indicates that the graphene locks into the h-BN via van der Waals epitaxy, with its interfacial stress greatly released. Notably, the edges of the graphene grains are primarily oriented along the armchair direction. The field-effect mobility in such graphene flakes exceeds 20,000 cm²/(V·s) under ambient conditions. This work opens the door to atomic engineering of graphene on h-BN and sheds light on fundamental research as well as electronic applications based on graphene/h-BN heterostructures.

preprint2012arXiv

Delineating effects of tensor force on the density dependence of nuclear symmetry energy

In this talk, we report results of our recent studies delineating the effects of the tensor force on the density dependence of nuclear symmetry energy within phenomenological models. The tensor force active in the isosinglet neutron-proton interaction channel leads to appreciable depletion/population of nucleons below/above the Fermi surface in the single-nucleon momentum distribution in cold symmetric nuclear matter (SNM). We find that, as a consequence of the high-momentum tail in SNM, the kinetic part of the symmetry energy $E^{kin}_{sym}(ρ)$ is significantly below the well-known Fermi gas model prediction of approximately $12.5 (ρ/ρ_0)^{2/3}$ MeV. With about 15% of nucleons in the high-momentum tail, as indicated by recent experiments at J-Lab by the CLAS Collaboration, $E^{kin}_{sym}(ρ)$ is negligibly small. It even becomes negative when more nucleons populate the high-momentum tail in SNM. These features have recently been confirmed by three independent studies based on state-of-the-art microscopic nuclear many-body theories. In addition, we estimate the second-order tensor force contribution to the potential part of the symmetry energy. Implications of these findings for extracting information about nuclear symmetry energy from nuclear reactions are briefly discussed.
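The free Fermi gas value referred to above follows from the kinetic energy of a noninteracting, nonrelativistic nucleon gas. A short derivation, assuming equal neutron and proton masses $m$, isospin asymmetry $\delta = (\rho_n - \rho_p)/\rho$, and the symmetric-matter Fermi momentum $k_F = (3\pi^2\rho/2)^{1/3}$:

```latex
\begin{aligned}
E_{\rm kin}(\rho,\delta) &= \frac{3}{5}E_F\,\frac{(1+\delta)^{5/3} + (1-\delta)^{5/3}}{2},
  \qquad E_F = \frac{\hbar^2 k_F^2}{2m},\\
(1\pm\delta)^{5/3} &= 1 \pm \tfrac{5}{3}\delta + \tfrac{5}{9}\delta^2 + O(\delta^3)
  \;\Rightarrow\;
  E_{\rm kin}(\rho,\delta) = \frac{3}{5}E_F + \frac{1}{3}E_F\,\delta^2 + O(\delta^4),\\
E^{kin}_{sym}(\rho) &= \frac{1}{3}E_F(\rho)
  = \frac{\hbar^2}{6m}\left(\frac{3\pi^2\rho}{2}\right)^{2/3}
  = E^{kin}_{sym}(\rho_0)\left(\frac{\rho}{\rho_0}\right)^{2/3}.
\end{aligned}
```

Taking an assumed saturation density $\rho_0 \approx 0.16$ fm$^{-3}$ gives $k_F \approx 1.33$ fm$^{-1}$ and $E_F \approx 37$ MeV, so $E^{kin}_{sym}(\rho_0) \approx 12.3$ MeV, consistent with the coefficient of about 12.5 MeV. A high-momentum tail removes the assumption that all nucleons sit below $k_F$, which is why the result quoted above falls below this estimate.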

preprint2012arXiv

Too massive neutron stars: The role of dark matter?

The maximum mass of a neutron star is generally determined by the equation of state of the stellar material. In this study, we consider dark matter particles as a possible constituent of neutron stars, assuming that they behave like fermions and introducing a free parameter to account for the interaction strength among them. We find that dark matter inside the star would soften the equation of state more strongly than hyperons do and would greatly reduce the maximum mass of the star. However, the maximum mass is sensitive to the dark matter particle mass, and a very high neutron star mass, larger than 2 solar masses, could be achieved when the particle mass is small enough. Such dark-matter-admixed neutron stars could explain the recent Shapiro delay measurement of the radio pulsar PSR J1614-2230, which yielded a neutron star mass of 2 solar masses that can hardly be reached when only hyperons are considered, as in the case of the microscopic Brueckner theory. Furthermore, in this particular case, we point out that dark matter around a neutron star should also contribute to the mass measurement through its purely gravitational effect. However, our numerical calculation shows that this contribution can safely be ignored because of the dilute dark matter environment usually assumed. We conclude that a very high measured mass of about 2 solar masses requires a very stiff equation of state in neutron stars, and we find a strong upper limit (<= 0.64 GeV) on the particle mass of non-self-annihilating dark matter based on the present model.

preprint2011arXiv

Experimental Study of Active LRC Circuits with PT-Symmetries

Mutually coupled modes of a pair of active LRC circuits, one with amplification and the other with an equivalent amount of attenuation, provide an experimental realization of a wide class of systems in which gain/loss mechanisms break Hermiticity while preserving parity-time (PT) symmetry. At a critical value of the gain/loss strength parameter, the eigenfrequencies undergo a spontaneous phase transition from real to complex values, while the normal modes coalesce and acquire a definite chirality. The consequences of this phase transition for the spatiotemporal evolution of the energy are also presented.
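The real-to-complex transition can be illustrated with the textbook two-mode model of PT symmetry: two modes of frequency $\omega_0$ coupled with strength $\kappa$, one with gain $\gamma$ and the other with equal loss. This generic coupled-mode matrix is an assumption for illustration, not the circuit equations of the LRC experiment. In this model the eigenfrequencies are $\omega_0 \pm \sqrt{\kappa^2 - \gamma^2}$, so the transition (an exceptional point) occurs at $\gamma = \kappa$. The function name `pt_dimer_eigenfrequencies` is hypothetical.

```python
import numpy as np


def pt_dimer_eigenfrequencies(omega0, kappa, gamma):
    """Eigenfrequencies of the coupled-mode matrix [[w0 + i*g, k], [k, w0 - i*g]].

    The +i*gamma / -i*gamma entries model gain on one mode and equal loss on the other;
    which sign counts as gain depends on the time convention (e^{-i w t} vs e^{+i w t}).
    """
    H = np.array([[omega0 + 1j * gamma, kappa],
                  [kappa, omega0 - 1j * gamma]], dtype=complex)
    return np.sort_complex(np.linalg.eigvals(H))


for gamma in (0.0, 0.5, 0.99, 1.5):
    w = pt_dimer_eigenfrequencies(omega0=1.0, kappa=1.0, gamma=gamma)
    phase = "unbroken (real)" if np.allclose(w.imag, 0.0, atol=1e-9) else "broken (complex)"
    print(f"gamma={gamma:4.2f}  eigenfrequencies={np.round(w, 3)}  {phase}")
```

Below $\gamma = \kappa$ the two frequencies stay real and approach each other; above it they share the real part $\omega_0$ and acquire imaginary parts of opposite sign, one growing mode and one decaying mode.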

preprint2011arXiv

Short-range tensor interaction and high-density nuclear symmetry energy

Effects of the short-range tensor interaction on the density-dependence of nuclear symmetry energy are examined by applying an approximate expression for the second-order tensor contribution to the symmetry energy derived earlier by G.E. Brown and R. Machleidt. It is found that the uncertainty in the short-range tensor force leads directly to a divergent high-density behavior of the nuclear symmetry energy.

preprint2010arXiv

BaFe2As2 Surface Domains and Domain Walls: Mirroring the Bulk Spin Structure

High-resolution scanning tunneling microscopy (STM) measurements on BaFe2As2, one of the parent compounds of the iron-based superconductors, reveal a (1×1) As-terminated unit cell on the (001) surface. However, the surface unit cell differs significantly from the bulk: only one of the two As atoms in the unit cell is imaged, and domain walls between different (1×1) regions display C2 symmetry at the surface. The symmetry would have been C2v if the STM image reflected the geometric structure of the surface or of the orthorhombic bulk. The inequivalent As atoms and the bias dependence of the domain walls indicate that the origin of the STM image is primarily electronic, not geometric. We argue that the surface electronic topography mirrors the bulk spin structure of BaFe2As2 via strong orbital-spin coupling.

preprint2009arXiv

Strange stars with different quark mass scalings

We investigate the stability of strange quark matter and the properties of the corresponding strange stars within a wide range of quark mass scalings. The calculation shows that the resulting maximum mass always lies between 1.5 and 1.8 solar masses for all the scalings chosen here. Strange star sequences with a linear scaling would support less gravitational mass, and a change (increase or decrease) of the scaling around the linear scaling would lead to a larger maximum mass. Radii invariably decrease with the mass scaling, so the larger the scaling, the faster the star might spin. In addition, varying the scaling would change the strong electric field at the quark surface by an order of magnitude; this field is essential for supporting possible crusts of strange stars against gravity and may therefore have astrophysical implications.

preprint2009arXiv

Surface Geometric and Electronic Structure of BaFe2As2(001)

BaFe2As2 exhibits properties characteristic of the parent compounds of the newly discovered iron (Fe)-based high-T_c superconductors. By combining real-space imaging by scanning tunneling microscopy/spectroscopy (STM/S) with momentum-space quantitative low-energy electron diffraction (LEED), we have identified the surface plane of cleaved BaFe2As2 crystals as the As-terminated Fe-As layer, the plane where superconductivity occurs. LEED and STM/S data on the BaFe2As2(001) surface indicate an ordered, arsenic (As)-terminated metallic surface without reconstruction or lattice distortion. Surprisingly, STM images the different Fe-As orbitals associated with the orthorhombic structure, not the As atoms in the surface plane.


119 published item(s)

Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision Transformer

Block-Level MU-MISO Interference Exploitation Precoding: Optimal Structure and Explicit Duality

Close the Optical Sensing Domain Gap by Physics-Grounded Active Stereo Sensor Simulation

A Length Adaptive Algorithm-Hardware Co-design of Transformer on FPGA Through Sparse Attention and Dynamic Pipelining

A Rare Topic Discovery Model for Short Texts Based on Co-occurrence word Network

A sentiment analysis model for car review texts based on adversarial training and whole word mask BERT

Academic Resource Text Level Multi-label Classification based on Attention

Accurate Portraits of Scientific Resources and Knowledge Service Components

AEVA: Black-box Backdoor Detection Using Adversarial Extreme Value Analysis

Aspect-Based Sentiment Analysis using Local Context Focus Mechanism with DeBERTa

Astrophysical implications on hyperon couplings and hyperon star properties with relativistic equations of states

Bi-convolution matrix factorization algorithm based on improved ConvMF

Block-Level Interference Exploitation Precoding without Symbol-by-Symbol Optimization

BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Partition-Parallelism and Random Boundary Node Sampling

CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Designed Adaptive Lossy Compression

Chinese Word Sense Embedding with SememeWSD and Synonym Set

CollComm: Enabling Efficient Collective Quantum Communication Based on EPR buffering

Cross-media Scientific Research Achievements Query based on Ranking Learning

Dark matter admixed neutron star properties in the light of X-ray pulse profile observations

Efficient Hierarchical State Vector Simulation of Quantum Circuits via Acyclic Graph Partitioning

Efimov resonance position near a narrow Feshbach resonance in $^6$Li-$^{133}$Cs mixture

FastMapSVM: Classifying Complex Objects Using the FastMap Algorithm and Support-Vector Machines

GAAF: Searching Activation Functions for Binary Neural Networks through Genetic Algorithm

GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing

H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture

I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization

Information-theoretic Online Memory Selection for Continual Learning

Interacting $ud$ and $uds$ quark matter at finite densities and quark stars

Iterative Geometry-Aware Cross Guidance Network for Stereo Image Inpainting

Knowledge Graph and Accurate Portrait Construction of Scientific and Technological Academic Conferences

Learning and Fast Adaptation for Grid Emergency Control via Deep Meta Reinforcement Learning

Mining and searching association relation of scientific papers based on deep learning

Neural Mean Discrepancy for Efficient Out-of-Distribution Detection

On the moment of inertia of PSR J0737-3039 A from LIGO/Virgo and NICER

Prediction of Depression Severity Based on the Prosodic and Semantic Features with Bidirectional LSTM and Time Distributed CNN

Probabilities of Causation with Nonbinary Treatment and Effect

QASMBench: A Low-level QASM Benchmark Suite for NISQ Evaluation and Simulation

Quantum interference visibility spectroscopy in two-color photoemission from tungsten needle tips

QuClassi: A Hybrid Deep Neural Network Architecture based on Quantum State Fidelity

Research on accurate stereo portrait generation algorithm of scientific research team

Research on Intellectual Property Resource Profile and Evolution Law

Retrieval of Scientific and Technological Resources for Experts and Scholars

Scientific and Technological Text Knowledge Extraction Method of based on Word Mixing and GRU

Searching Similarity Measure for Binarized Neural Networks

Semantic Similarity Computing for Scientific Academic Conferences fused with domain features

Semi-Supervised Vision Transformers

Sentiment Analysis of Online Travel Reviews Based on Capsule Network and Sentiment Lexicon

SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations

Social Network Community Detection Based on Textual Content Similarity and Sentimental Tendency

SphereFed: Hyperspherical Federated Learning

Topological EEG Nonlinear Dynamics Analysis for Emotion Recognition

Unified neutron star EOSs and neutron star structures in RMF models

Unified nuclear matter EOSs constrained by the in-medium balance in density-dependent covariant density functionals

Unit Selection with Nonbinary Treatment and Effect

Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training

Cramér-Rao Bound Optimization for Joint Radar-Communication Design

Growth and Strain Relaxation Mechanisms of InAs/InP/GaAsSb Core-Dual-Shell Nanowires

Hermes: Decentralized Dynamic Spectrum Access System for Massive Devices Deployment in 5G

High-resolution ARPES endstation for in-situ electronic structure investigations at SSRF

On Provable Backdoor Defense in Collaborative Learning

Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint

PredCoin: Defense against Query-based Hard-label Attack

A Parallel Sparse Tensor Benchmark Suite on CPUs and GPUs

AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing

Benchmarking Machine Learning Techniques with Di-Higgs Production at the LHC

Comprehensive analysis of the tidal effect in gravitational waves and implication for cosmology

CSB-RNN: A Faster-than-Realtime RNN Acceleration Framework with Compressed Structured Blocks

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters

Generative Image Inpainting with Submanifold Alignment

Hybrid Models for Open Set Recognition

Kinetic Control of Morphology and Composition in Ge/GeSn Core/Shell Nanowires

Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification

LotteryFL: Personalized and Communication-Efficient Federated Learning with Lottery Ticket Hypothesis on Non-IID Datasets