Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
36 works
0 followers
34 topics
4 close collaborators

Published work

36 published item(s)

preprint · 2026 · arXiv

Checkerboard: A Simple, Effective, Efficient and Learning-free Clean Label Backdoor Attack with Low Poisoning Budget

Backdoor attacks threaten the deep learning supply chain by poisoning a small fraction of the training data so that a model behaves normally on clean inputs but misclassifies trigger-carrying inputs to an attacker-chosen target class. Clean-label backdoor attacks are especially dangerous because poisoned samples remain label-consistent and are therefore harder to detect. Yet existing clean-label attacks typically rely on expensive optimization, surrogate-model training, or nontrivial data access. We present Checkerboard, a theoretically grounded, learning-free clean-label backdoor attack that is effective, efficient, and simple to implement. From a linear separability formulation, we derive a checkerboard trigger in closed form, removing the need for surrogate-model training and trigger optimization. For texture-rich datasets, we introduce Complexity-driven Sample Selection, which uses only target-class data to improve trigger-to-background contrast by selecting low-complexity images for poisoning. Across four benchmark datasets, Checkerboard outperforms 8 baseline attacks and achieves state-of-the-art performance under low poisoning budgets. For example, on CIFAR-10, under a trigger perturbation budget of $10/255$, poisoning 20 training samples achieves $99.99\%$ Attack Success Rate (ASR). On ImageNet-100, a poisoning rate of only $0.46\%$ yields over $94\%$ ASR without degrading clean accuracy. The proposed attack also remains effective against state-of-the-art backdoor defenses and shows strong resistance to adaptive defenses.

preprint · 2023 · arXiv

${S}^{2}$Net: Accurate Panorama Depth Estimation on Spherical Surface

Monocular depth estimation is an ambiguous problem, thus global structural cues play an important role in current data-driven single-view depth estimation methods. Panorama images capture the complete spatial information of their surroundings utilizing the equirectangular projection which introduces large distortion. This requires the depth estimation method to be able to handle the distortion and extract global context information from the image. In this paper, we propose an end-to-end deep network for monocular panorama depth estimation on a unit spherical surface. Specifically, we project the feature maps extracted from equirectangular images onto unit spherical surface sampled by uniformly distributed grids, where the decoder network can aggregate the information from the distortion-reduced feature maps. Meanwhile, we propose a global cross-attention-based fusion module to fuse the feature maps from skip connection and enhance the ability to obtain global context. Experiments are conducted on five panorama depth estimation datasets, and the results demonstrate that the proposed method substantially outperforms previous state-of-the-art methods. All related codes will be open-sourced in the upcoming days.

preprint · 2022 · arXiv

Content-oriented learned image compression

In recent years, with the development of deep neural networks, end-to-end optimized image compression has made significant progress and exceeded the classic methods in terms of rate-distortion performance. However, most learning-based image compression methods are unlabeled and do not consider image semantics or content when optimizing the model. In fact, human eyes have different sensitivities to different content, so the image content also needs to be considered. In this paper, we propose a content-oriented image compression method, which handles different kinds of image contents with different strategies. Extensive experiments show that the proposed method achieves competitive subjective results compared with state-of-the-art end-to-end learned image compression methods or classic methods.

preprint · 2022 · arXiv

Decentralized Unsupervised Learning of Visual Representations

Collaborative learning enables distributed clients to learn a shared model for prediction while keeping the training data local on each client. However, existing collaborative learning methods require fully-labeled data for training, which is inconvenient or sometimes infeasible to obtain due to the high labeling cost and the requirement of expertise. The lack of labels makes collaborative learning impractical in many realistic settings. Self-supervised learning can address this challenge by learning from unlabeled data. Contrastive learning (CL), a self-supervised learning approach, can effectively learn visual representations from unlabeled image data. However, the distributed data collected on clients are usually not independent and identically distributed (non-IID) among clients, and each client may only have few classes of data, which degrades the performance of CL and learned representations. To tackle this problem, we propose a collaborative contrastive learning framework consisting of two approaches: feature fusion and neighborhood matching, by which a unified feature space among clients is learned for better data representations. Feature fusion provides remote features as accurate contrastive information to each client for better local learning. Neighborhood matching further aligns each client's local features to the remote features such that well-clustered features among clients can be learned. Extensive experiments show the effectiveness of the proposed framework. It outperforms other methods by 11% on IID data and matches the performance of centralized learning.

preprint · 2022 · arXiv

Federated Learning with Non-IID Data

Federated learning enables resource-constrained edge compute devices, such as mobile phones and IoT devices, to learn a shared model for prediction, while keeping the training data local. This decentralized approach to train models provides privacy, security, regulatory and economic benefits. In this work, we focus on the statistical challenge of federated learning when local data is non-IID. We first show that the accuracy of federated learning reduces significantly, by up to 55% for neural networks trained for highly skewed non-IID data, where each client device trains only on a single class of data. We further show that this accuracy reduction can be explained by the weight divergence, which can be quantified by the earth mover's distance (EMD) between the distribution over classes on each device and the population distribution. As a solution, we propose a strategy to improve training on non-IID data by creating a small subset of data which is globally shared between all the edge devices. Experiments show that accuracy can be increased by 30% for the CIFAR-10 dataset with only 5% globally shared data.
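
As a rough illustration of the skew measure mentioned in this abstract, the sketch below compares one client's class distribution with the population distribution. It assumes the distance reduces to a sum of absolute differences between class probabilities; the helper names (`class_distribution`, `class_distribution_distance`) and the toy label lists are hypothetical and are not taken from the paper.

```python
from collections import Counter


def class_distribution(labels, num_classes):
    """Return the empirical probability of each class in `labels`."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def class_distribution_distance(client_labels, population_labels, num_classes):
    """Sum of absolute differences between client and population class
    probabilities (assumed form of the distance, for illustration only)."""
    p_client = class_distribution(client_labels, num_classes)
    p_population = class_distribution(population_labels, num_classes)
    return sum(abs(a - b) for a, b in zip(p_client, p_population))


# Toy data: a balanced 10-class population of 1000 labels.
population = [c for c in range(10) for _ in range(100)]
# A highly skewed client that only holds class 3.
single_class_client = [3] * 100
# A client whose labels follow the population distribution.
iid_client = [c for c in range(10) for _ in range(10)]

skewed = class_distribution_distance(single_class_client, population, 10)
balanced = class_distribution_distance(iid_client, population, 10)
print(f"single-class client: {skewed:.2f}, IID client: {balanced:.2f}")
```

Under this assumed form, a client that matches the population scores zero and a single-class client scores close to the maximum, which is the kind of skew the abstract associates with accuracy loss.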

preprint · 2022 · arXiv

Few-Shot Class-Incremental Learning from an Open-Set Perspective

The continual appearance of new objects in the visual world poses considerable challenges for current deep learning methods in real-world deployments. The challenge of new task learning is often exacerbated by the scarcity of data for the new categories due to rarity or cost. Here we explore the important task of Few-Shot Class-Incremental Learning (FSCIL) and its extreme data scarcity condition of one-shot. An ideal FSCIL model needs to perform well on all classes, regardless of their presentation order or paucity of data. It also needs to be robust to open-set real-world conditions and be easily adapted to the new tasks that always arise in the field. In this paper, we first reevaluate the current task setting and propose a more comprehensive and practical setting for the FSCIL task. Then, inspired by the similarity of the goals for FSCIL and modern face recognition systems, we propose our method -- Augmented Angular Loss Incremental Classification or ALICE. In ALICE, instead of the commonly used cross-entropy loss, we propose to use the angular penalty loss to obtain well-clustered features. As the obtained features not only need to be compactly clustered but also diverse enough to maintain generalization for future incremental classes, we further discuss how class augmentation, data augmentation, and data balancing affect classification performance. Experiments on benchmark datasets, including CIFAR100, miniImageNet, and CUB200, demonstrate the improved performance of ALICE over the state-of-the-art FSCIL methods.

preprint · 2022 · arXiv

Horizontal Layer Constrained Attention Neural Network for Semblance Velocity Picking

Semblance velocity analysis is a crucial step in seismic data processing. To avoid the large time cost of manual picking, several deep learning methods have been proposed for automatic semblance velocity picking. However, the application of existing deep learning methods is still restricted by the shortage of labels in practice. In this letter, we propose an attention neural network combined with a point-to-point regression velocity picking strategy to mitigate this problem. In our method, the semblance patch and the velocity value serve as network input and output, respectively. In this way, global and local features hidden in the semblance patch can be effectively extracted by the attention neural network. A down-sampling strategy based on horizontal layer extraction is also designed to improve picking efficiency during prediction. Tests on synthetic and field datasets demonstrate that the proposed method can produce reasonable results and maintain a global velocity trend consistent with the labels. In addition, robustness against random noise is tested on the field data.

preprint · 2022 · arXiv

Inference in Functional Linear Quantile Regression

In this paper, we study statistical inference in functional quantile regression for scalar response and a functional covariate. Specifically, we consider a functional linear quantile regression model where the effect of the covariate on the quantile of the response is modeled through the inner product between the functional covariate and an unknown smooth regression parameter function that varies with the level of quantile. The objective is to test that the regression parameter is constant across several quantile levels of interest. The parameter function is estimated by combining ideas from functional principal component analysis and quantile regression. An adjusted Wald testing procedure is proposed for this hypothesis of interest, and its chi-square asymptotic null distribution is derived. The testing procedure is investigated numerically in simulations involving sparse and noisy functional covariates and in a capital bike share data application. The proposed approach is easy to implement and the R code is published online at https://github.com/xylimeng/fQR-testing.

preprint · 2022 · arXiv

MHTTS: Fast multi-head text-to-speech for spontaneous speech with imperfect transcription

Neural network based end-to-end Text-to-Speech (TTS) has greatly improved the quality of synthesized speech. However, how to efficiently use massive spontaneous speech without transcription remains an open problem. In this paper, we propose MHTTS, a fast multi-speaker TTS system that is robust to transcription errors and speaking style speech data. Specifically, we introduce a multi-head model and transfer text information from a high-quality corpus with manual transcription to spontaneous speech with imperfectly recognized transcription by jointly training them. MHTTS has three advantages: 1) Our system synthesizes better quality multi-speaker voice with faster inference speed. 2) Our system is capable of transferring correct text information to data with imperfect transcription, simulated using corruption, or provided by an Automatic Speech Recogniser (ASR). 3) Our system can utilize massive real spontaneous speech with imperfect transcription and synthesize expressive voice.

preprint · 2022 · arXiv

New Penalized Stochastic Gradient Methods for Linearly Constrained Strongly Convex Optimization

For minimizing a strongly convex objective function subject to linear inequality constraints, we consider a penalty approach that allows one to utilize stochastic methods for problems with a large number of constraints and/or objective function terms. We provide upper bounds on the distance between the solutions to the original constrained problem and the penalty reformulations, guaranteeing the convergence of the proposed approach. We give a nested accelerated stochastic gradient method and propose a novel way for updating the smoothness parameter of the penalty function and the step-size. The proposed algorithm requires at most $\tilde{O}(1/\sqrt{\epsilon})$ expected stochastic gradient iterations to produce a solution within an expected distance of $\epsilon$ to the optimal solution of the original problem, which is the best complexity for this problem class to the best of our knowledge. We also show how to query an approximate dual solution after stochastically solving the penalty reformulations, leading to results on the convergence of the duality gap. Moreover, the nested structure of the algorithm and the upper bounds on the distance to the optimal solutions allow one to safely eliminate constraints that are inactive at an optimal solution throughout the algorithm, which leads to improved complexity results. Finally, we present computational results that demonstrate the effectiveness and robustness of our algorithm.

preprint · 2022 · arXiv

Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet

From wearables to powerful smart devices, modern automatic speech recognition (ASR) models run on a variety of edge devices with different computational budgets. To navigate the Pareto front of model accuracy vs model size, researchers are trapped in a dilemma of optimizing model accuracy by training and fine-tuning models for each individual edge device while keeping the training GPU-hours tractable. In this paper, we propose Omni-sparsity DNN, where a single neural network can be pruned to generate optimized models for a large range of model sizes. We develop training strategies for Omni-sparsity DNN that allow it to find models along the Pareto front of word-error-rate (WER) vs model size while keeping the training GPU-hours to no more than that of training a single model. We demonstrate the Omni-sparsity DNN with streaming E2E ASR models. Our results show large savings in training time and resources with similar or better accuracy on LibriSpeech compared to individually pruned sparse models: 2%-6.6% better WER on Test-other.

preprint · 2022 · arXiv

Ordered and tunable Majorana-zero-mode lattice in naturally strained LiFeAs

Majorana zero modes (MZMs) obey non-Abelian statistics and are considered building blocks for constructing topological qubits. Iron-based superconductors with topological band structures have emerged as promising hosting materials, since isolated candidate MZMs in the quantum limit have been observed inside the topological vortex cores. However, these materials suffer from issues related to alloying-induced disorder, uncontrolled vortex lattices and a low yield of topological vortices. Here, we report the formation of an ordered and tunable MZM lattice in naturally-strained stoichiometric LiFeAs by scanning tunneling microscopy/spectroscopy (STM/S). We observe biaxial charge density wave (CDW) stripes along the Fe-Fe and As-As directions in the strained regions. The vortices are pinned on the CDW stripes in the As-As direction and form an ordered lattice. We detect more than 90 percent of the vortices to be topological and possess the characteristics of isolated MZMs at the vortex center, forming an ordered MZM lattice with the density and the geometry tunable by an external magnetic field. Remarkably, with decreasing the spacing of neighboring vortices, the MZMs start to couple with each other. Our findings provide a new pathway towards tunable and ordered MZM lattices as a platform for future topological quantum computation.

preprint · 2022 · arXiv

Resilience-Motivated Distribution System Restoration Considering Electricity-Water-Gas Interdependency

A major outage in the electricity distribution system may affect the operation of water and natural gas supply systems, leading to an interruption of multiple services to critical customers. Therefore, enhancing resilience of critical infrastructures requires joint efforts of multiple sectors. In this paper, a distribution system service restoration method considering the electricity-water-gas interdependency is proposed. The objective is to provide electricity, water, and natural gas supplies to critical customers in the desired ratio according to their needs after an extreme event. The operational constraints of electricity, water, and natural gas networks are considered. The characteristics of electricity-driven coupling components, including water pumps and gas compressors, are also modeled. Relaxation techniques are applied to nonconvex constraints posed by physical laws of those networks. Consequently, the restoration problem is formulated as a mixed-integer second-order cone program, which can readily be solved by the off-the-shelf solvers. The proposed method is validated by numerical simulations on electricity-water-gas integrated systems, developed based on benchmark models of the subsystems. The results indicate that considering the interdependency refines the allocation of limited generation resources and demonstrate the exactness of the proposed convex relaxation.

preprint · 2022 · arXiv

SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems

We design deep neural networks (DNNs) and corresponding networks' splittings to distribute DNNs' workload to camera sensors and a centralized aggregator on head mounted devices to meet system performance targets in inference accuracy and latency under the given hardware resource constraints. To achieve an optimal balance among computation, communication, and performance, a split-aware neural architecture search framework, SplitNets, is introduced to conduct model designing, splitting, and communication reduction simultaneously. We further extend the framework to multi-view systems for learning to fuse inputs from multiple camera sensors with optimal performance and systemic efficiency. We validate SplitNets for single-view system on ImageNet as well as multi-view system on 3D classification, and show that the SplitNets framework achieves state-of-the-art (SOTA) performance and system latency compared with existing approaches.

preprint · 2021 · arXiv

Aggregate Modeling and Equilibrium Analysis of the Crowdsourcing Market for Autonomous Vehicles

Autonomous vehicles (AVs) have the potential of reshaping the human mobility in a wide variety of aspects. This paper focuses on a new possibility that the AV owners have the option of "renting" their AVs to a company, which can use these collected AVs to provide on-demand ride services without any drivers. We call such a mobility market with AV renting options the "AV crowdsourcing market". This paper establishes an aggregate equilibrium model with multiple transport modes to analyze the AV crowdsourcing market. The modeling framework can capture the customers' mode choices and AV owners' rental decisions with the presence of traffic congestion. Then, we explore different scenarios that either maximize the crowdsourcing platform's profit or maximize social welfare. Gradient-based optimization algorithms are designed for solving the problems. The results obtained by numerical examples reveal the welfare enhancement and the strong profitability of the AV crowdsourcing service. However, when the crowdsourcing scale is small, the crowdsourcing platform might not be profitable. A second-best pricing scheme is able to avoid such undesirable cases. The insights generated from the analyses provide guidance for regulators, service providers and citizens to make future decisions regarding the utilization of the AV crowdsourcing markets for serving the good of the society.

preprint · 2021 · arXiv

Exploring wav2vec 2.0 on speaker verification and language identification

Wav2vec 2.0 is a recently proposed self-supervised framework for speech representation learning. It follows a two-stage training process of pre-training and fine-tuning, and performs well in speech recognition tasks, especially in ultra-low-resource cases. In this work, we attempt to extend this self-supervised framework to speaker verification and language identification. First, we use some preliminary experiments to indicate that wav2vec 2.0 can capture information about the speaker and the language. Then we demonstrate the effectiveness of wav2vec 2.0 on the two tasks respectively. For speaker verification, we obtain a new state-of-the-art result, an Equal Error Rate (EER) of 3.61% on the VoxCeleb1 dataset. For language identification, we obtain an EER of 12.02% on the 1 second condition and an EER of 3.47% on the full-length condition of the AP17-OLR dataset. Finally, we use one model to achieve unified modeling of the two tasks through multi-task learning.

preprint · 2021 · arXiv

Functional Group Bridge for Simultaneous Regression and Support Estimation

This article is motivated by studying multisensory effects on brain activities in intracranial electroencephalography (iEEG) experiments. Differential brain activities to multisensory stimulus presentations are zero in most regions and non-zero in some local regions, yielding locally sparse functions. Such studies are essentially a function-on-scalar regression problem, with interest being focused not only on estimating nonparametric functions but also on recovering the function supports. We propose a weighted group bridge approach for simultaneous function estimation and support recovery in function-on-scalar mixed effect models, while accounting for heterogeneity present in functional data. We use B-splines to transform sparsity of functions to its sparse vector counterpart of increasing dimension, and propose a fast non-convex optimization algorithm using nested alternating direction method of multipliers (ADMM) for estimation. Large sample properties are established. In particular, we show that the estimated coefficient functions are rate optimal in the minimax sense under the $L_2$ norm and resemble a phase transition phenomenon. For support estimation, we derive a convergence rate under the $L_{\infty}$ norm that leads to a sparsistency property under $\delta$-sparsity, and provide a simple sufficient regularity condition under which a strict sparsistency property is established. An adjusted extended Bayesian information criterion is proposed for parameter tuning. The developed method is illustrated through simulation and an application to a novel iEEG dataset to study multisensory integration. We integrate the proposed method into RAVE, an R package that is gaining popularity in the iEEG community.

preprint · 2021 · arXiv

Improving the performance of reputation evaluating by combining the structure of network and nonlinear recovery

Characterizing the reputation of an evaluator is particularly significant for consumers seeking useful information from online rating systems. Furthermore, overcoming spam attacks on the rating system and obtaining reliable evaluator reputations is an important research topic. Most existing evaluator reputation methods rely only on the evaluator's rating information and abnormal behavior to establish a reputation system, and thus miss systematic aspects of rating systems, including the structure of the evaluator-object bipartite network and nonlinear effects. In this study, we propose an improved reputation evaluation method that combines the structure of the evaluator-object bipartite network with rating information and introduces penalty and reward factors. The method has been empirically analyzed on a large-scale artificial data set and two real data sets. The results show that the proposed method is more accurate and robust in the presence of spam attacks. This idea contributes a new way of building reputation evaluation models in sparse bipartite rating networks.

preprint · 2021 · arXiv

Network-level rhythmic control of heterogeneous automated traffic with buses

Guaranteeing the quality of transit service is of great importance to promote the attractiveness of buses and alleviate urban traffic issues such as congestion and pollution. Emerging technologies of automated driving and V2X communication have the potential to enable the accurate control of vehicles and the efficient organization of traffic to enhance both the schedule adherence of buses and the overall network mobility. This study proposes an innovative network-level control scheme for heterogeneous automated traffic composed of buses and private cars under a full connected and automated environment. Inheriting the idea of network-level rhythmic control proposed by Lin et al. (2020), an augmented rhythmic control scheme for heterogeneous traffic, i.e., RC-H, is established to organize the mixed traffic in a rhythmic manner. Realized virtual platoons are designed for accommodating vehicles to pass through the network, including dedicated virtual platoons for buses to provide exclusive right-of-ways (ROWs) on their trips and regular virtual platoons for private cars along with an optimal assignment plan to minimize the total travel cost. A mixed-integer linear program (MILP) is formulated to optimize the RC-H scheme and a bilevel heuristic solution method is designed to relieve the computational burden of MILP. Numerical examples and simulation experiments are conducted to evaluate the performance of the RC-H scheme under different scenarios. The results show that the bus operation can be guaranteed and the travel delay can be minimized under various demand levels with transit priority. Moreover, compared with traffic signal control strategies, the RC-H scheme has significant advantages in handling massive traffic demand, in terms of both vehicle delay and network throughput.

preprint · 2021 · arXiv

Protonation-induced discrete superconducting phases in bulk FeSe single crystals

The superconducting transition temperature, $T_{\rm{c}}$, of FeSe can be significantly enhanced several-fold by applying pressure, electron doping, intercalating spacing layer, and reducing dimensionality. Various ordered electronic phases, such as nematicity and spin density waves, have also been observed accompanying high-$T_{\rm{c}}$ superconductivity. Investigation on the evolution of the electronic structure with $T_{\rm{c}}$ is essential to understanding electronic behavior and high-$T_{\rm{c}}$ superconductivity in FeSe and its derived superconductors. In this report, we have found a series of discrete superconducting phases, with a maximum $T_{\rm{c}}$ up to 44 K, in H$^+$-intercalated FeSe single crystals using an ionic liquid gating method. Accompanied with the increase of $T_{\rm{c}}$, suppression of the nematic phase and evolution from non-Fermi-liquid to Fermi-liquid behavior was observed. An abrupt change in the Fermi surface topology was proposed to explain the discrete superconducting phases. A band structure that favors the high-$T_{\rm{c}}$ superconducting phase was also revealed.

preprint · 2020 · arXiv

A gridded establishment dataset as a proxy for economic activity in China

Measuring the geographical distribution of economic activity plays a key role in scientific research and policymaking. However, previous studies and data on economic activity either have a coarse spatial resolution or cover a limited time span, and the high-resolution characteristics of socioeconomic dynamics are largely unknown. Here, we construct a dataset on the economic activity of mainland China, the gridded establishment dataset (GED), which measures the volume of establishments at a 0.01$^{\circ}$ latitude by 0.01$^{\circ}$ longitude scale. Specifically, our dataset captures the geographically based opening and closing of approximately 25.5 million firms that registered in mainland China over the period 2005-2015. The characteristics of fine granularity and long-term observability give the GED a high application value. The dataset not only allows us to quantify the spatiotemporal patterns of the establishments, urban vibrancy and socioeconomic activity, but also helps us uncover the fundamental principles underlying the dynamics of industrial and economic development.

preprint · 2020 · arXiv

Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks

Neural Architecture Search (NAS) has demonstrated its power on various AI accelerating platforms such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs). However, how to integrate NAS with Application-Specific Integrated Circuits (ASICs) remains an open problem, despite ASICs being the most powerful AI accelerating platforms. The major bottleneck comes from the large design freedom associated with ASIC designs. Moreover, considering that multiple DNNs will run in parallel for different workloads with diverse layer operations and sizes, integrating heterogeneous ASIC sub-accelerators for distinct DNNs in one design can significantly boost performance while further complicating the design space. To address these challenges, in this paper we build an ASIC template set based on existing successful designs, described by their unique dataflows, so that the design space is significantly reduced. Based on the templates, we further propose a framework, namely NASAIC, which can simultaneously identify multiple DNN architectures and the associated heterogeneous ASIC accelerator design, such that the design specifications (specs) can be satisfied, while the accuracy can be maximized. Experimental results show that compared with successive NAS and ASIC design optimizations which lead to design spec violations, NASAIC can guarantee the results to meet the design specs with 17.77%, 2.49x, and 2.32x reductions on latency, energy, and area and with 0.76% accuracy loss. To the best of the authors' knowledge, this is the first work on neural architecture and ASIC accelerator design co-exploration.

preprint · 2020 · arXiv

Energy-Aware Neural Architecture Optimization with Fast Splitting Steepest Descent

Designing energy-efficient networks is of critical importance for enabling state-of-the-art deep learning in mobile and edge settings where the computation and energy budgets are highly limited. Recently, Liu et al. (2019) framed the search of efficient neural architectures into a continuous splitting process: it iteratively splits existing neurons into multiple off-springs to achieve progressive loss minimization, thus finding novel architectures by gradually growing the neural network. However, this method was not specifically tailored for designing energy-efficient networks, and is computationally expensive on large-scale benchmarks. In this work, we substantially improve Liu et al. (2019) in two significant ways: 1) we incorporate the energy cost of splitting different neurons to better guide the splitting process, thereby discovering more energy-efficient network architectures; 2) we substantially speed up the splitting process of Liu et al. (2019), which requires expensive eigen-decomposition, by proposing a highly scalable Rayleigh-quotient stochastic gradient algorithm. Our fast algorithm allows us to reduce the computational cost of splitting to the same level of typical back-propagation updates and enables efficient implementation on GPU. Extensive empirical results show that our method can train highly accurate and energy-efficient networks on challenging datasets such as ImageNet, improving a variety of baselines, including the pruning-based methods and expert-designed architectures.

preprint2020arXiv

Generalized exceptional quantum walk search

We mainly study exceptional configurations for coined quantum walk search. For searching on a two-dimensional grid with the AKR algorithm, we find some new classes of exceptional configurations that the AKR algorithm cannot find effectively, and the known diagonal configuration can be regarded as a special case of them. Meanwhile, we give two modified quantum walk models that, as shown by numerical simulation, can improve the success probability in the exceptional configurations. Furthermore, we introduce the concept of a generalized exceptional configuration and consider search by quantum walk on a cycle with a Grover coin. We find that the most natural coin combination model (G,-), where G is a Grover diffusion transformation, is a generalized exceptional configuration when searching for just one marked vertex on the cycle. Finally, we find that a generalized exceptional configuration has a different evolution of quantum coherence from an exceptional configuration. In some sense, these results largely extend the range of exceptional configurations of quantum walk search.
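
For readers unfamiliar with the Grover diffusion transformation G, the sketch below builds the standard d-dimensional Grover coin, whose entries are 2/d off the diagonal and 2/d - 1 on it. The helper names are assumptions for this example; the snippet only constructs the coin and does not reproduce the search models studied in the paper.

```python
def grover_coin(d):
    """Return the d x d Grover diffusion matrix G = 2|s><s| - I,
    where |s> is the uniform superposition over d coin states."""
    return [[2.0 / d - (1.0 if i == j else 0.0) for j in range(d)]
            for i in range(d)]


def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


coin_cycle = grover_coin(2)  # two directions on a cycle
coin_grid = grover_coin(4)   # four directions on a 2D grid
```

On a cycle the coin space has two states, so the Grover coin reduces to the swap matrix; on a two-dimensional grid it is the 4 x 4 matrix with -0.5 on the diagonal and 0.5 elsewhere. In both cases G is real, symmetric and its own inverse.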

preprint2020arXiv

Improving Efficiency in Neural Network Accelerator Using Operands Hamming Distance optimization

Neural network accelerators are a key enabler for on-device AI inference, for which energy efficiency is an important metric. The data-path energy, including the computation energy and the data movement energy among the arithmetic units, accounts for a significant part of the total accelerator energy. By revisiting the basic physics of arithmetic logic circuits, we show that the data-path energy is highly correlated with the bit flips when streaming the input operands into the arithmetic units, defined as the Hamming distance of the input operand matrices. Based on this insight, we propose a post-training optimization algorithm and a Hamming-distance-aware training algorithm to co-design and co-optimize the accelerator and the network synergistically. The experimental results based on post-layout simulation with MobileNetV2 demonstrate on average 2.85X data-path energy reduction and up to 8.51X data-path energy reduction for certain layers.
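
The bit-flip quantity described above can be sketched in a few lines: the snippet counts how many bits toggle between consecutive operands in a stream. The 8-bit width, the function names and the sorting step are assumptions made for illustration; sorting is only a stand-in to show that operand order changes the flip count and is not the optimization algorithm proposed in the paper.

```python
def hamming_distance(a, b, bits=8):
    """Number of differing bits between two unsigned `bits`-wide operands."""
    mask = (1 << bits) - 1
    return bin((a ^ b) & mask).count("1")


def stream_bit_flips(operands, bits=8):
    """Total bit flips when operands are streamed one after another."""
    return sum(hamming_distance(x, y, bits)
               for x, y in zip(operands, operands[1:]))


stream = [0, 255, 1, 254]
flips_original = stream_bit_flips(stream)           # 8 + 7 + 8 = 23
flips_reordered = stream_bit_flips(sorted(stream))  # 1 + 8 + 1 = 10
```

In this toy stream, reordering the same four operands cuts the toggles from 23 to 10, which is the kind of effect a Hamming-distance-aware method would try to exploit.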

preprint2020arXiv

Modeling light-controlled actuation of flexible magnetic composite structures using the finite element method (FEM)

Photoactive materials hold great promise for a variety of applications. We present a finite element model of a light-controlled flexible magnetic composite structure composed of 33.3% chromium dioxide (CrO2) and 66.7% polydimethylsiloxane (PDMS) by weight. The structure has dimensions of 8 mm x 2 mm x 100 um and has previously been studied experimentally. Due to the low Curie temperature, the structure acts as an actuator: it shows significant deflection under an external magnetic field and relaxes under laser heating. Thermal and magnetic deflection analysis has been performed using the FEM model. The simulation results show a maximum structural deflection of 6.08 mm (76% of the length of the structure) when subjected to 30 mT magnetic flux density and 160 mW laser power at 303 K (room temperature). We will present the results of the simulation model and a comparison to experimental data, reproducing the previously observed motion of the CrO2+PDMS structure. This model will enable future fracture and fatigue analysis as well as extension to new photoactive geometries.

preprint2020arXiv

Multi-branch and Multi-scale Attention Learning for Fine-Grained Visual Categorization

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is one of the most authoritative academic competitions in the field of Computer Vision (CV) in recent years. However, applying ILSVRC's annual champion directly to fine-grained visual categorization (FGVC) tasks does not achieve good performance. For FGVC tasks, the small inter-class variations and the large intra-class variations make it a challenging problem. Our attention object location module (AOLM) can predict the position of the object, and our attention part proposal module (APPM) can propose informative part regions without the need for bounding-box or part annotations. The obtained object images contain almost the entire structure of the object as well as more details, the part images have many different scales and more fine-grained features, and the raw images contain the complete object. The three kinds of training images are supervised by our multi-branch network. Therefore, our multi-branch and multi-scale learning network (MMAL-Net) has good classification ability and robustness for images of different scales. Our approach can be trained end-to-end while providing short inference time. Comprehensive experiments demonstrate that our approach achieves state-of-the-art results on the CUB-200-2011, FGVC-Aircraft and Stanford Cars datasets. Our code will be available at https://github.com/ZF1044404254/MMAL-Net

preprint2020arXiv

Normalized solutions for a coupled fractional schrodinger system in low dimensions

We consider the following coupled fractional Schrödinger system: \begin{equation*} \left\{ \begin{aligned} &(-\Delta)^su+\lambda_1u=\mu_1|u|^{2p-2}u+\beta|v|^p|u|^{p-2}u\\ &(-\Delta)^sv+\lambda_2v=\mu_2|v|^{2p-2}v+\beta|u|^p|v|^{p-2}v\\ \end{aligned} \right.\quad\text{in}~{\mathbb{R}^N}, \end{equation*} with $0<s<1$, $2s<N\le 4s$ and $1+\frac{2s}{N}<p<\frac{N}{N-2s}$, under the following constraint \begin{align*} \int_{\mathbb{R}^N}|u|^2dx=a_1^2\quad\text{and}\quad \int_{\mathbb{R}^N}|v|^2dx=a_2^2. \end{align*} Assuming that the parameters $\mu_1,\mu_2,a_1,a_2$ are fixed quantities, we prove the existence of normalized solutions for different ranges of the coupling parameter $\beta>0$.

preprint2020arXiv

SAFE: Scalable Automatic Feature Engineering Framework for Industrial Tasks

Machine learning techniques have been widely applied in Internet companies for various tasks, acting as an essential driving force, and feature engineering has been generally recognized as a crucial step when constructing machine learning systems. Recently, a growing effort has been devoted to the development of automatic feature engineering methods, so that the substantial and tedious manual effort can be reduced. However, for industrial tasks, the efficiency and scalability of these methods are still far from satisfactory. In this paper, we propose a staged method named SAFE (Scalable Automatic Feature Engineering), which can provide excellent efficiency and scalability, along with requisite interpretability and promising performance. Extensive experiments are conducted and the results show that the proposed method can provide prominent efficiency and competitive effectiveness when compared with other methods. What's more, the adequate scalability of the proposed method allows it to be deployed in large-scale industrial tasks.

preprint2020arXiv

Scalable Double Regularization for 3D Nano-CT Reconstruction

Nano-CT (computerized tomography) has emerged as a non-destructive high-resolution cross-sectional imaging technique to effectively study the sub-$μ$m pore structure of shale, which is of fundamental importance to the evaluation and development of shale oil and gas. Nano-CT poses unique challenges to the inverse problem of reconstructing the 3D structure due to the lower signal-to-noise ratio (than Micro-CT) at the nano-scale, increased sensitivity to the misaligned geometry caused by the movement of object manipulator, limited sample size, and a larger volume of data at higher resolution. In this paper, we propose a scalable double regularization (SDR) method to utilize the entire dataset for simultaneous 3D structural reconstruction across slices through total variation regularization within slices and $L_1$ regularization between adjacent slices. SDR allows information borrowing both within and between slices, contrasting with the traditional methods that usually build on slice by slice reconstruction. We develop a scalable and memory-efficient algorithm by exploiting the systematic sparsity and consistent geometry induced by such Nano-CT data. We illustrate the proposed method using synthetic data and two Nano-CT imaging datasets of Jiulaodong (JLD) shale and Longmaxi (LMX) shale acquired in the Sichuan Basin. These numerical experiments show that the proposed method substantially outperforms selected alternatives both visually and quantitatively.
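
To make the two regularizers concrete, one plausible form of such an objective is written out below. This is a sketch under stated assumptions, not the formulation from the paper: the least-squares data-fidelity term, the per-slice projection operators $A_k$, the measurements $b_k$ and the weights $\lambda_1,\lambda_2$ are all notation introduced here for illustration. Only the total variation penalty within slices and the $L_1$ penalty between adjacent slices come from the abstract.

```latex
\begin{equation*}
\min_{x_1,\dots,x_K}\;
\sum_{k=1}^{K}\tfrac{1}{2}\,\|A_k x_k-b_k\|_2^2
\;+\;\lambda_1\sum_{k=1}^{K}\mathrm{TV}(x_k)
\;+\;\lambda_2\sum_{k=1}^{K-1}\|x_{k+1}-x_k\|_1
\end{equation*}
```

Here $x_k$ denotes the $k$-th reconstructed slice. The last term couples neighbouring slices, which is what distinguishes a joint reconstruction from a slice-by-slice one: setting $\lambda_2=0$ would decouple the problem into $K$ independent TV-regularized reconstructions.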

preprint2020arXiv

SID: Incremental Learning for Anchor-Free Object Detection via Selective and Inter-Related Distillation

Incremental learning requires a model to continually learn new tasks from streaming data. However, traditional fine-tuning of a well-trained deep neural network on a new task will dramatically degrade performance on the old task -- a problem known as catastrophic forgetting. In this paper, we address this issue in the context of anchor-free object detection, which is a new trend in computer vision as it is simple, fast, and flexible. Simply adapting current incremental learning strategies fails on these anchor-free detectors due to a lack of consideration of their specific model structures. To deal with the challenges of incremental learning on anchor-free object detectors, we propose a novel incremental learning paradigm called Selective and Inter-related Distillation (SID). In addition, a novel evaluation metric is proposed to better assess the performance of detectors under incremental learning conditions. By selectively distilling at the proper locations and further transferring additional instance relation knowledge, our method demonstrates significant advantages on the benchmark datasets PASCAL VOC and COCO.

preprint2020arXiv

Simulation Comparisons of Vehicle-based and Phase-based Traffic Control for Autonomous Vehicles at Isolated Intersections

With the advent of autonomous driving technologies, traffic control at intersections is expected to experience revolutionary changes. Various novel intersection control methods have been proposed in the existing literature, and they can be roughly divided into two categories: vehicle-based traffic control and phase-based traffic control. Phase-based traffic control can be treated as an updated version of the current intersection signal control with the incorporation of the performance of autonomous vehicle functions. Meanwhile, vehicle-based traffic control utilizes some brand-new methods, mostly in real-time fashion, to organize traffic at intersections for safe and efficient vehicle passages. However, to date, no systematic comparison between these two control categories has been performed to suggest their advantages and disadvantages. This paper conducts a series of numerical simulations under various traffic scenarios to perform a fair comparison of their performances. Specifically, we allow trajectory adjustments of incoming vehicles under phase-based traffic control, while for its vehicle-based counterpart, we implement two strategies, i.e., the first-come-first-serve strategy and the conflict-point based rolling-horizon optimization strategy. Overall, the simulation results show that vehicle-based traffic control generally incurs a negligible delay when traffic demand is low but leads to an excessive queuing time as the traffic volume becomes high. However, the performance of vehicle-based traffic control may benefit from a reduction in conflicting vehicle pairs. We also discovered that when autonomous driving technologies are not mature, the advantages of phase-based traffic control are much more distinct.
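
The qualitative behaviour of a first-come-first-serve strategy can be seen in a toy model. The sketch below assumes a single shared conflict zone that every vehicle occupies for a fixed crossing time, so that every pair of vehicles conflicts. This setup, the function name and the numbers are assumptions for illustration and are far simpler than the simulations in the paper.

```python
def fcfs_delays(arrivals, crossing_time):
    """Per-vehicle waiting time when vehicles enter one shared conflict
    zone strictly in arrival order (toy first-come-first-serve model)."""
    delays = []
    free_at = 0.0  # time at which the conflict zone is next available
    for t in sorted(arrivals):
        start = max(t, free_at)
        delays.append(start - t)
        free_at = start + crossing_time
    return delays


low_demand = fcfs_delays([0.0, 10.0, 20.0], crossing_time=2.0)
high_demand = fcfs_delays([0.0, 1.0, 2.0], crossing_time=2.0)
```

With sparse arrivals nobody waits, while with arrivals spaced closer than the crossing time the delay grows with each additional vehicle. Letting non-conflicting vehicles cross simultaneously would shrink the queue in this toy model.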

preprint2020arXiv

TimingCamouflage+: Netlist Security Enhancement with Unconventional Timing (with Appendix)

With recent advances in reverse engineering, attackers can reconstruct a netlist to counterfeit chips by opening the die and scanning all layers of authentic chips. This relatively easy counterfeiting is made possible by the use of the standard simple clocking scheme, where all combinational blocks function within one clock period, so that a netlist of combinational logic gates and flip-flops is sufficient to duplicate a design. In this paper, we propose to invalidate the assumption that a netlist completely represents the function of a circuit with unconventional timing. With the introduced wave-pipelining paths, attackers have to capture gate and interconnect delays during reverse engineering, or to test a huge number of combinational paths to identify the wave-pipelining paths. To hinder the test-based attack, we construct false paths with wave-pipelining to increase the counterfeiting challenge. Experimental results confirm that wave-pipelining true paths and false paths can be constructed in benchmark circuits successfully with only a negligible cost, thus thwarting the potential attack techniques.

preprint2020arXiv

Unsaturated Single Atoms on Monolayer Transition Metal Dichalcogenides for Ultrafast Hydrogen Evolution

Large-scale implementation of electrochemical water splitting for hydrogen evolution requires cheap and efficient catalysts to replace expensive platinum. Molybdenum disulfide is one of the most promising alternative catalysts but its intrinsic activity is still inferior to platinum. There is therefore a need to explore new active site origins in molybdenum disulfide with ultrafast reaction kinetics and to understand their mechanisms. Here, we report a universal cold hydrogen plasma reduction method for synthesizing different single atoms sitting on two-dimensional monolayers. In the case of molybdenum disulfide, we design and identify a new type of active site, i.e., unsaturated Mo single atoms on cogenetic monolayer molybdenum disulfide. The catalyst shows exceptional intrinsic activity with a Tafel slope of 35.1 mV dec$^{-1}$ and a turnover frequency of ~$10^3$ s$^{-1}$ at 100 mV, based on single flake microcell measurements. Theoretical studies indicate that coordinately unsaturated Mo single atoms sitting on molybdenum disulfide increase the bond strength between adsorbed hydrogen atoms and the substrates through hybridization, leading to fast hydrogen adsorption/desorption kinetics and superior hydrogen evolution activity. This work sheds fresh light on preparing highly efficient electrocatalysts for water splitting and other electrochemical processes, and provides a general method to synthesize single atoms on two-dimensional monolayers.

preprint2019arXiv

Function-on-Scalar Quantile Regression with Application to Mass Spectrometry Proteomics Data

Mass spectrometry proteomics, characterized by spiky, spatially heterogeneous functional data, can be used to identify potential cancer biomarkers. Existing mass spectrometry analyses utilize mean regression to detect spectral regions that are differentially expressed across groups. However, given the inter-patient heterogeneity that is a key hallmark of cancer, many biomarkers are only present at aberrant levels for a subset of, not all, cancer samples. Differences in these biomarkers can easily be missed by mean regression, but might be more easily detected by quantile-based approaches. Thus, we propose a unified Bayesian framework to perform quantile regression on functional responses. Our approach utilizes an asymmetric Laplace working likelihood, represents the functional coefficients with basis representations which enable borrowing of strength from nearby locations, and places a global-local shrinkage prior on the basis coefficients to achieve adaptive regularization. Different types of basis transform and continuous shrinkage priors can be used in our framework. A scalable Gibbs sampler is developed to generate posterior samples that can be used to perform Bayesian estimation and inference while accounting for multiple testing. Our framework performs quantile regression and coefficient regularization in a unified manner, allowing them to inform each other and leading to improvement in performance over competing methods as demonstrated by simulation studies. We also introduce an adjustment procedure to the model to improve its frequentist properties of posterior inference. We apply our model to identify proteomic biomarkers of pancreatic cancer that are differentially expressed for a subset of cancer patients compared to the normal controls, which were missed by previous mean-regression based approaches. Supplementary materials for this article are available online.
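
The difference between mean and quantile regression can be illustrated with the check (pinball) loss, which is the loss whose minimization corresponds to maximizing an asymmetric Laplace likelihood. The sketch below fits only a constant to scalar data; the function names and the toy data are assumptions for this example, and nothing here reproduces the functional Bayesian model of the paper.

```python
def check_loss(u, tau):
    """Check (pinball) loss for residual u at quantile level tau."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(ys, tau):
    """Observed value that minimizes the total check loss, i.e. a
    sample tau-quantile found by brute force over the data points."""
    return min(ys, key=lambda q: sum(check_loss(y - q, tau) for y in ys))


# Four typical samples and one aberrant sample.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(ys, tau=0.5)
upper_fit = best_constant(ys, tau=0.9)
mean_fit = sum(ys) / len(ys)
```

In this toy data the median fit stays at 3.0, the 0.9-quantile fit lands on the aberrant sample at 100.0, and the mean (22.0) describes neither group. Fitting several quantile levels therefore exposes a signal carried by a subset of samples.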

preprint2019arXiv

RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing

Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations with unique irregular memory access patterns that pose a fundamental challenge to accelerate. This paper proposes a lightweight, commodity DRAM compliant, near-memory processing solution to accelerate personalized recommendation inference. The in-depth characterization of production-grade recommendation models shows that embedding operations with high model-, operator- and data-level parallelism lead to memory bandwidth saturation, limiting recommendation inference performance. We propose RecNMP which provides a scalable solution to improve system throughput, supporting a broad range of sparse embedding models. RecNMP is specifically tailored to production environments with heavy co-location of operators on a single server. Several hardware/software co-optimization techniques such as memory-side caching, table-aware packet scheduling, and hot entry profiling are studied, resulting in up to 9.8x memory latency speedup over a highly-optimized baseline. Overall, RecNMP offers 4.2x throughput improvement and 45.8% memory energy savings.
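
A sparse embedding operation of the kind described above can be sketched as a gather followed by a sum: each request selects a few rows of a large table by index and pools them. The function name, the bag layout (a flat index list plus per-bag lengths) and the tiny table are assumptions for illustration, not the interface of any production system or of RecNMP itself.

```python
def pooled_embedding_lookup(table, indices, lengths):
    """For each bag, gather the indexed rows of `table` and sum them
    element-wise (hypothetical helper illustrating embedding pooling)."""
    dim = len(table[0])
    pooled, offset = [], 0
    for n in lengths:
        bag = indices[offset:offset + n]
        pooled.append([sum(table[i][d] for i in bag) for d in range(dim)])
        offset += n
    return pooled


table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Two bags: rows {0, 2} and row {1}.
result = pooled_embedding_lookup(table, indices=[0, 2, 1], lengths=[2, 1])
```

Each output element needs only one addition per gathered value, and the rows touched depend on the input indices, so the work is dominated by scattered memory reads. This is consistent with the memory-bound behaviour the abstract describes.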