Source author record

Jun Zhu

Jun Zhu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

110 works

30 topics

4 close collaborators


preprint · 2026 · arXiv

Physical Adversarial Clothing Evades Visible-Thermal Detectors via Non-Overlapping RGB-T Pattern

Visible-thermal (RGB-T) object detection is a crucial technology for applications such as autonomous driving, where multimodal fusion enhances performance in challenging conditions like low light. However, the security of RGB-T detectors, particularly in the physical world, has been largely overlooked. This paper proposes a novel approach to RGB-T physical attacks using adversarial clothing with a non-overlapping RGB-T pattern (NORP). To simulate full-view (0° to 360°) RGB-T attacks, we construct 3D RGB-T models for humans and adversarial clothing. NORP is a new adversarial pattern design using distinct visible and thermal materials without overlap, avoiding the light reduction in overlapping RGB-T patterns (ORP). To optimize the NORP on adversarial clothing, we propose a spatial discrete-continuous optimization (SDCO) method. We systematically evaluated our method on RGB-T detectors with different fusion architectures, demonstrating high attack success rates both in the digital and physical worlds. Additionally, we introduce a fusion-stage ensemble method that enhances the transferability of adversarial attacks across unseen RGB-T detectors with different fusion architectures.

preprint · 2025 · arXiv

Audio Super-Resolution with Latent Bridge Models

Audio super-resolution (SR), i.e., upsampling the low-resolution (LR) waveform to the high-resolution (HR) version, has recently been explored with diffusion and bridge models, while previous methods often suffer from sub-optimal upsampling quality due to their uninformative generation prior. Towards high-quality audio super-resolution, we present a new system with latent bridge models (LBMs), where we compress the audio waveform into a continuous latent space and design an LBM to enable a latent-to-latent generation process that naturally matches the LR-to-HR upsampling process, thereby fully exploiting the instructive prior information contained in the LR waveform. To further enhance the training results despite the limited availability of HR samples, we introduce frequency-aware LBMs, where the prior and target frequency are taken as model input, enabling LBMs to explicitly learn an any-to-any upsampling process at the training stage. Furthermore, we design cascaded LBMs and present two prior augmentation strategies, where we make the first attempt to unlock the audio upsampling beyond 48 kHz and empower a seamless cascaded SR process, providing higher flexibility for audio post-production. Comprehensive experimental results evaluated on the VCTK, ESC-50, Song-Describer benchmark datasets and two internal test sets demonstrate that we achieve state-of-the-art objective and perceptual quality for any-to-48kHz SR across speech, audio, and music signals, as well as setting the first record for any-to-192kHz audio SR. Demo at https://AudioLBM.github.io/.

preprint · 2024 · arXiv

DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

Most reinforcement learning algorithms seek a single optimal strategy that solves a given task. However, it can often be valuable to learn a diverse set of solutions, for instance, to make an agent's interaction with users more engaging, or to improve the robustness of a policy to an unexpected perturbation. We propose Diversity-Guided Policy Optimization (DGPO), an on-policy algorithm that discovers multiple strategies for solving a given task. Unlike prior work, it achieves this with a shared policy network trained over a single run. Specifically, we design an intrinsic reward based on an information-theoretic diversity objective. Our final objective alternately imposes constraints on the diversity of the strategies and on the extrinsic reward. We solve the constrained optimization problem by casting it as a probabilistic inference task and use policy iteration to maximize the derived lower bound. Experimental results show that our method efficiently discovers diverse strategies in a wide variety of reinforcement learning tasks. Compared to baseline methods, DGPO achieves comparable rewards, while discovering more diverse strategies, and often with better sample efficiency.

preprint · 2022 · arXiv

A Roadmap for Big Model

With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and the application of BMs in many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts: Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory & Interpretability, Commonsense Reasoning, Reliability & Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. In each topic, we clearly summarize the current studies and propose some future research directions. At the end of this paper, we conclude with the further development of BMs from a more general view.

preprint · 2022 · arXiv

A Semismooth Newton based Augmented Lagrangian Method for Nonsmooth Optimization on Matrix Manifolds

This paper is devoted to studying an augmented Lagrangian method for solving a class of manifold optimization problems, which have nonsmooth objective functions and nonlinear constraints. Under the constant positive linear dependence condition on manifolds, we show that the proposed method converges to a stationary point of the nonsmooth manifold optimization problem. Moreover, we propose a globalized semismooth Newton method to solve the augmented Lagrangian subproblem on manifolds efficiently. The local superlinear convergence of the manifold semismooth Newton method is also established under some suitable conditions. We also prove that the semismoothness on submanifolds can be inherited from that in the ambient manifold. Finally, numerical experiments on compressed modes and (constrained) sparse principal component analysis illustrate the advantages of the proposed method.

preprint · 2022 · arXiv

BadDet: Backdoor Attacks on Object Detection

Deep learning models have been deployed in numerous real-world applications such as autonomous driving and surveillance. However, these models are vulnerable in adversarial environments. Backdoor attack is emerging as a severe security threat which injects a backdoor trigger into a small portion of training data such that the trained model behaves normally on benign inputs but gives incorrect predictions when the specific trigger appears. While most research in backdoor attacks focuses on image classification, backdoor attacks on object detection have not been explored but are of equal importance. Object detection has been adopted as an important module in various security-sensitive applications such as autonomous driving. Therefore, backdoor attacks on object detection could pose severe threats to human lives and properties. We propose four kinds of backdoor attacks for object detection task: 1) Object Generation Attack: a trigger can falsely generate an object of the target class; 2) Regional Misclassification Attack: a trigger can change the prediction of a surrounding object to the target class; 3) Global Misclassification Attack: a single trigger can change the predictions of all objects in an image to the target class; and 4) Object Disappearance Attack: a trigger can make the detector fail to detect the object of the target class. We develop appropriate metrics to evaluate the four backdoor attacks on object detection. We perform experiments using two typical object detection models -- Faster-RCNN and YOLOv3 on different datasets. More crucially, we demonstrate that even fine-tuning on another benign dataset cannot remove the backdoor hidden in the object detection model. To defend against these backdoor attacks, we propose Detector Cleanse, an entropy-based run-time detection framework to identify poisoned testing samples for any deployed object detector.

preprint · 2022 · arXiv

Boosting Transferability of Targeted Adversarial Examples via Hierarchical Generative Networks

Transfer-based adversarial attacks can evaluate model robustness in the black-box setting. Several methods have demonstrated impressive untargeted transferability; however, it is still challenging to efficiently produce targeted transferability. To this end, we develop a simple yet effective framework to craft targeted transfer-based adversarial examples, applying a hierarchical generative network. In particular, we contribute amortized designs that adapt well to multi-class targeted attacks. Extensive experiments on ImageNet show that our method improves the success rates of targeted black-box attacks by a significant margin over the existing methods -- it reaches an average success rate of 29.1% against six diverse models based only on one substitute white-box model, which significantly outperforms the state-of-the-art gradient-based attack methods. Moreover, the proposed method is also more than an order of magnitude more efficient than gradient-based methods.

preprint · 2022 · arXiv

Controllable Evaluation and Generation of Physical Adversarial Patch on Face Recognition

Recent studies have revealed the vulnerability of face recognition models against physical adversarial patches, which raises security concerns about the deployed face recognition systems. However, it is still challenging to ensure the reproducibility for most attack algorithms under complex physical conditions, which leads to the lack of a systematic evaluation of the existing methods. It is therefore imperative to develop a framework that can enable a comprehensive evaluation of the vulnerability of face recognition in the physical world. To this end, we propose to simulate the complex transformations of faces in the physical world via 3D-face modeling, which serves as a digital counterpart of physical faces. The generic framework allows us to control different face variations and physical conditions to conduct reproducible evaluations comprehensively. With this digital simulator, we further propose a Face3DAdv method considering the 3D face transformations and realistic physical variations. Extensive experiments validate that Face3DAdv can significantly improve the effectiveness of diverse physically realizable adversarial patches in both simulated and physical environments, against various white-box and black-box face recognition models.

preprint · 2022 · arXiv

CoSCL: Cooperation of Small Continual Learners is Stronger than a Big One

Continual learning requires incremental compatibility with a sequence of tasks. However, the design of model architecture remains an open question: In general, learning all tasks with a shared set of parameters suffers from severe interference between tasks, while learning each task with a dedicated parameter subspace is limited by scalability. In this work, we theoretically analyze the generalization errors for learning plasticity and memory stability in continual learning, which can be uniformly upper-bounded by (1) discrepancy between task distributions, (2) flatness of loss landscape and (3) cover of parameter space. Then, inspired by the robust biological learning system that processes sequential experiences with multiple parallel compartments, we propose Cooperation of Small Continual Learners (CoSCL) as a general strategy for continual learning. Specifically, we present an architecture with a fixed number of narrower sub-networks to learn all incremental tasks in parallel, which can naturally reduce the two errors through improving the three components of the upper bound. To strengthen this advantage, we encourage these sub-networks to cooperate by penalizing the difference of predictions made by their feature representations. With a fixed parameter budget, CoSCL can improve a variety of representative continual learning approaches by a large margin (e.g., up to 10.64% on CIFAR-100-SC, 9.33% on CIFAR-100-RS, 11.45% on CUB-200-2011 and 6.72% on Tiny-ImageNet) and achieve the new state-of-the-art performance.

preprint · 2022 · arXiv

DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR. This new formulation directly uses box coordinates as queries in Transformer decoders and dynamically updates them layer-by-layer. Using box coordinates not only helps use explicit positional priors to improve the query-to-feature similarity and eliminate the slow training convergence issue in DETR, but also allows us to modulate the positional attention map using the box width and height information. Such a design makes it clear that queries in DETR can be implemented as performing soft ROI pooling layer-by-layer in a cascade manner. As a result, it leads to the best performance on the MS-COCO benchmark among the DETR-like detection models under the same setting, e.g., AP 45.7% using ResNet50-DC5 as backbone trained in 50 epochs. We also conducted extensive experiments to confirm our analysis and verify the effectiveness of our methods. Code is available at https://github.com/SlongLiu/DAB-DETR.

preprint · 2022 · arXiv

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

We present DINO (DETR with Improved deNoising anchOr boxes), a state-of-the-art end-to-end object detector. DINO improves over previous DETR-like models in performance and efficiency by using a contrastive way for denoising training, a mixed query selection method for anchor initialization, and a look forward twice scheme for box prediction. DINO achieves 49.4 AP in 12 epochs and 51.3 AP in 24 epochs on COCO with a ResNet-50 backbone and multi-scale features, yielding a significant improvement of +6.0 AP and +2.7 AP, respectively, compared to DN-DETR, the previous best DETR-like model. DINO scales well in both model size and data size. Without bells and whistles, after pre-training on the Objects365 dataset with a SwinL backbone, DINO obtains the best results on both COCO val2017 (63.2 AP) and test-dev (63.3 AP). Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results. Our code will be available at https://github.com/IDEACVR/DINO.

preprint · 2022 · arXiv

Estimating the Optimal Covariance with Imperfect Mean in Diffusion Probabilistic Models

Diffusion probabilistic models (DPMs) are a class of powerful deep generative models (DGMs). Despite their success, the iterative generation process over the full timesteps is much less efficient than other DGMs such as GANs. Thus, the generation performance on a subset of timesteps is crucial, which is greatly influenced by the covariance design in DPMs. In this work, we consider diagonal and full covariances to improve the expressive power of DPMs. We derive the optimal result for such covariances, and then correct it when the mean of DPMs is imperfect. Both the optimal and the corrected ones can be decomposed into terms of conditional expectations over functions of noise. Building upon it, we propose to estimate the optimal covariance and its correction given imperfect mean by learning these conditional expectations. Our method can be applied to DPMs with both discrete and continuous timesteps. We consider the diagonal covariance in our implementation for computational efficiency. For an efficient practical implementation, we adopt a parameter sharing scheme and a two-stage training process. Empirically, our method outperforms a wide variety of covariance design on likelihood results, and improves the sample quality especially on a small number of timesteps.

preprint · 2022 · arXiv

Exploring Memorization in Adversarial Training

Deep learning models have a propensity for fitting the entire training set even with random labels, which requires memorization of every training sample. In this paper, we explore the memorization effect in adversarial training (AT) for promoting a deeper understanding of model capacity, convergence, generalization, and especially robust overfitting of the adversarially trained models. We first demonstrate that deep networks have sufficient capacity to memorize adversarial examples of training data with completely random labels, but not all AT algorithms can converge under the extreme circumstance. Our study of AT with random labels motivates further analyses on the convergence and generalization of AT. We find that some AT approaches suffer from a gradient instability issue and most recently suggested complexity measures cannot explain robust generalization by considering models trained on random labels. Furthermore, we identify a significant drawback of memorization in AT that it could result in robust overfitting. We then propose a new mitigation algorithm motivated by detailed memorization analyses. Extensive experiments on various datasets validate the effectiveness of the proposed method.

preprint · 2022 · arXiv

Fast Lossless Neural Compression with Integer-Only Discrete Flows

By applying entropy codecs with learned data distributions, neural compressors have significantly outperformed traditional codecs in terms of compression ratio. However, the high inference latency of neural networks hinders the deployment of neural compressors in practical applications. In this work, we propose Integer-only Discrete Flows (IODF), an efficient neural compressor with integer-only arithmetic. Our work is built upon integer discrete flows, which consist of invertible transformations between discrete random variables. We propose efficient invertible transformations with integer-only arithmetic based on 8-bit quantization. Our invertible transformation is equipped with learnable binary gates to remove redundant filters during inference. We deploy IODF with TensorRT on GPUs, achieving 10x inference speedup compared to the fastest existing neural compressors, while retaining the high compression rates on ImageNet32 and ImageNet64.

preprint · 2022 · arXiv

GSmooth: Certified Robustness against Semantic Transformations via Generalized Randomized Smoothing

Certified defenses such as randomized smoothing have shown promise towards building reliable machine learning systems against $\ell_p$-norm bounded attacks. However, existing methods are insufficient or unable to provably defend against semantic transformations, especially those without closed-form expressions (such as defocus blur and pixelate), which are more common in practice and often unrestricted. To fill up this gap, we propose generalized randomized smoothing (GSmooth), a unified theoretical framework for certifying robustness against general semantic transformations via a novel dimension augmentation strategy. Under the GSmooth framework, we present a scalable algorithm that uses a surrogate image-to-image network to approximate the complex transformation. The surrogate model provides a powerful tool for studying the properties of semantic transformations and certifying robustness. Experimental results on several datasets demonstrate the effectiveness of our approach for robustness certification against multiple kinds of semantic transformations and corruptions, which is not achievable by the alternative baselines.

preprint · 2022 · arXiv

Maximum Likelihood Training for Score-Based Diffusion ODEs by High-Order Denoising Score Matching

Score-based generative models have excellent performance in terms of generation quality and likelihood. They model the data distribution by matching a parameterized score network with first-order data score functions. The score network can be used to define an ODE ("score-based diffusion ODE") for exact likelihood evaluation. However, the relationship between the likelihood of the ODE and the score matching objective is unclear. In this work, we prove that matching the first-order score is not sufficient to maximize the likelihood of the ODE, by showing a gap between the maximum likelihood and score matching objectives. To fill up this gap, we show that the negative likelihood of the ODE can be bounded by controlling the first, second, and third-order score matching errors; and we further present a novel high-order denoising score matching method to enable maximum likelihood training of score-based diffusion ODEs. Our algorithm guarantees that the higher-order matching error is bounded by the training error and the lower-order errors. We empirically observe that by high-order score matching, score-based diffusion ODEs achieve better likelihood on both synthetic data and CIFAR-10, while retaining the high generation quality.

preprint · 2022 · arXiv

Memory Replay with Data Compression for Continual Learning

Continual learning needs to overcome catastrophic forgetting of the past. Memory replay of representative old training samples has been shown as an effective solution, and achieves the state-of-the-art (SOTA) performance. However, existing work is mainly built on a small memory buffer containing a few original data, which cannot fully characterize the old data distribution. In this work, we propose memory replay with data compression (MRDC) to reduce the storage cost of old training samples and thus increase their amount that can be stored in the memory buffer. Observing that the trade-off between the quality and quantity of compressed data is highly nontrivial for the efficacy of memory replay, we propose a novel method based on determinantal point processes (DPPs) to efficiently determine an appropriate compression quality for currently-arrived training samples. In this way, using a naive data compression algorithm with a properly selected quality can largely boost recent strong baselines by saving more compressed data in a limited storage space. We extensively validate this across several benchmarks of class-incremental learning and in a realistic scenario of object detection for autonomous driving.

preprint · 2022 · arXiv

OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization

Deep learning has achieved tremendous success with independent and identically distributed (i.i.d.) data. However, the performance of neural networks often degenerates drastically when encountering out-of-distribution (OoD) data, i.e., when training and test data are sampled from different distributions. While a plethora of algorithms have been proposed for OoD generalization, our understanding of the data used to train and evaluate these algorithms remains stagnant. In this work, we first identify and measure two distinct kinds of distribution shifts that are ubiquitous in various datasets. Next, through extensive experiments, we compare OoD generalization algorithms across two groups of benchmarks, each dominated by one of the distribution shifts, revealing their strengths on one shift as well as limitations on the other shift. Overall, we position existing datasets and algorithms from different research areas seemingly unconnected into the same coherent picture. It may serve as a foothold that can be resorted to by future OoD generalization research. Our code is available at https://github.com/ynysjtu/ood_bench.

preprint · 2022 · arXiv

Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model

In high-stakes scenarios like medical treatment and auto-piloting, it's risky or even infeasible to collect online experimental data to train the agent. Simulation-based training can alleviate this issue, but may suffer from inherent mismatches between the simulator and the real environment. It is therefore imperative to utilize the simulator to learn a robust policy for real-world deployment. In this work, we consider policy learning for Robust Markov Decision Processes (RMDP), where the agent tries to seek a robust policy with respect to unexpected perturbations on the environments. Specifically, we focus on the setting where the training environment can be characterized as a generative model and a constrained perturbation can be added to the model during testing. Our goal is to identify a near-optimal robust policy for the perturbed testing environment, which introduces additional technical difficulties as we need to simultaneously estimate the training environment uncertainty from samples and find the worst-case perturbation for testing. To solve this issue, we propose a generic method which formalizes the perturbation as an opponent to obtain a two-player zero-sum game, and further show that the Nash Equilibrium corresponds to the robust policy. We prove that, with a polynomial number of samples from the generative model, our algorithm can find a near-optimal robust policy with a high probability. Our method is able to deal with general perturbations under some mild assumptions and can also be extended to more complex problems like robust partially observable Markov decision processes, thanks to the game-theoretical formulation.

preprint · 2022 · arXiv

Query-Efficient Black-box Adversarial Attacks Guided by a Transfer-based Prior

Adversarial attacks have been extensively studied in recent years since they can identify the vulnerability of deep learning models before they are deployed. In this paper, we consider the black-box adversarial setting, where the adversary needs to craft adversarial examples without access to the gradients of a target model. Previous methods attempted to approximate the true gradient either by using the transfer gradient of a surrogate white-box model or based on the feedback of model queries. However, the existing methods inevitably suffer from low attack success rates or poor query efficiency since it is difficult to estimate the gradient in a high-dimensional input space with limited information. To address these problems and improve black-box attacks, we propose two prior-guided random gradient-free (PRGF) algorithms based on biased sampling and gradient averaging, respectively. Our methods can take advantage of a transfer-based prior given by the gradient of a surrogate model and the query information simultaneously. Through theoretical analyses, the transfer-based prior is appropriately integrated with model queries by an optimal coefficient in each method. Extensive experiments demonstrate that, in comparison with the alternative state-of-the-art methods, both of our methods require far fewer queries to attack black-box models with higher success rates.

preprint · 2022 · arXiv

Regret Analysis for Hierarchical Experts Bandit Problem

We study an extension of the standard bandit problem in which there are R layers of experts. Multi-layered experts make selections layer by layer, and only the experts in the last layer can play arms. The goal of the learning policy is to minimize the total regret in this hierarchical experts setting. We first analyze the case in which total regret grows linearly with the number of layers. Then we focus on the case in which all experts play the Upper Confidence Bound (UCB) strategy and give several sub-linear upper bounds for different circumstances. Finally, we design some experiments to help the regret analysis for the general case of hierarchical UCB structure and show the practical significance of our theoretical results. This article gives many insights about reasonable hierarchical decision structures.
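
For readers unfamiliar with the UCB strategy named in this abstract, the following is a minimal sketch of the textbook UCB1 selection rule for a single, flat set of arms. It is not the paper's hierarchical algorithm; the function names `ucb1_index` and `select_arm` are hypothetical and chosen only for illustration.

```python
import math


def ucb1_index(mean_reward, pulls, total_pulls):
    """Textbook UCB1 index: empirical mean plus an exploration bonus."""
    if pulls == 0:
        # An arm that has never been played is tried first.
        return math.inf
    return mean_reward + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def select_arm(means, counts):
    """Return the position of the arm with the largest UCB1 index."""
    total = sum(counts)
    indices = [ucb1_index(m, n, total) for m, n in zip(means, counts)]
    return max(range(len(indices)), key=indices.__getitem__)
```

In a layered setting such as the one the abstract describes, each expert could in principle run a rule of this kind over the experts (or arms) beneath it; how the paper actually composes the layers is not stated in the abstract.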

preprint · 2022 · arXiv

Robust Learning of Deep Time Series Anomaly Detection Models with Contaminated Training Data

Time series anomaly detection (TSAD) is an important data mining task with numerous applications in the IoT era. In recent years, a large number of deep neural network-based methods have been proposed, demonstrating significantly better performance than conventional methods on addressing challenging TSAD problems in a variety of areas. Nevertheless, these deep TSAD methods typically rely on a clean training dataset that is not polluted by anomalies to learn the "normal profile" of the underlying dynamics. This requirement is nontrivial since a clean dataset can hardly be provided in practice. Moreover, without the awareness of their robustness, blindly applying deep TSAD methods with potentially contaminated training data can possibly incur significant performance degradation in the detection phase. In this work, to tackle this important challenge, we firstly investigate the robustness of commonly used deep TSAD methods with contaminated training data which provides a guideline for applying these methods when the provided training data are not guaranteed to be anomaly-free. Furthermore, we propose a model-agnostic method which can effectively improve the robustness of learning mainstream deep TSAD models with potentially contaminated data. Experiment results show that our method can consistently prevent or mitigate performance degradation of mainstream deep TSAD models on widely used benchmark datasets.

preprint · 2022 · arXiv

Robustness and Accuracy Could Be Reconcilable by (Proper) Definition

The trade-off between robustness and accuracy has been widely studied in the adversarial literature. Although still controversial, the prevailing view is that this trade-off is inherent, either empirically or theoretically. Thus, we dig for the origin of this trade-off in adversarial training and find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance -- an overcorrection towards smoothness. Given this, we advocate employing local equivariance to describe the ideal behavior of a robust model, leading to a self-consistent robust error named SCORE. By definition, SCORE facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty via robust optimization. By simply substituting KL divergence with variants of distance metrics, SCORE can be efficiently minimized. Empirically, our models achieve top-rank performance on RobustBench under AutoAttack. Besides, SCORE provides instructive insights for explaining the overfitting phenomenon and semantic input gradients observed on robust models. Code is available at https://github.com/P2333/SCORE.

preprint · 2022 · arXiv

Thompson Sampling for (Combinatorial) Pure Exploration

Existing methods of combinatorial pure exploration mainly focus on the UCB approach. To make the algorithm efficient, they usually use the sum of upper confidence bounds within arm set $S$ to represent the upper confidence bound of $S$, which can be much larger than the tight upper confidence bound of $S$ and leads to a much higher complexity than necessary, since the empirical means of different arms in $S$ are independent. To deal with this challenge, we explore the idea of Thompson Sampling (TS) that uses independent random samples instead of the upper confidence bounds, and design the first TS-based algorithm TS-Explore for (combinatorial) pure exploration. In TS-Explore, the sum of independent random samples within arm set $S$ will not exceed the tight upper confidence bound of $S$ with high probability. Hence it solves the above challenge, and achieves a lower complexity upper bound than existing efficient UCB-based algorithms in general combinatorial pure exploration. As for pure exploration of classic multi-armed bandit, we show that TS-Explore achieves an asymptotically optimal complexity upper bound.
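
The gap this abstract points to, between the sum of per-arm upper confidence bounds and a tight bound for the whole arm set, can be illustrated numerically. The sketch below assumes rewards in [0, 1] and Hoeffding-style confidence radii, which are assumptions of this example rather than details taken from the paper; all function names are hypothetical.

```python
import math


def confidence_radius(pulls, delta=0.05):
    """Hoeffding-style radius for one arm with rewards in [0, 1] (assumed)."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * pulls))


def naive_set_ucb(means, pulls, delta=0.05):
    """Sum of the per-arm upper confidence bounds over the arm set."""
    return sum(m + confidence_radius(n, delta) for m, n in zip(means, pulls))


def tight_set_ucb(means, pulls, delta=0.05):
    """Bound on the sum of means that uses independence across arms:
    the radii combine in quadrature instead of adding linearly."""
    radius = math.sqrt(sum(confidence_radius(n, delta) ** 2 for n in pulls))
    return sum(means) + radius
```

With k arms pulled equally often, the naive radius grows like k while the tight one grows like the square root of k, which is the kind of slack the abstract says Thompson-Sampling-style independent samples avoid.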

preprint2022arXiv

Tianshou: a Highly Modularized Deep Reinforcement Learning Library

In this paper, we present Tianshou, a highly modularized Python library for deep reinforcement learning (DRL) that uses PyTorch as its backend. Tianshou intends to be research-friendly by providing a flexible and reliable infrastructure of DRL algorithms. It supports online and offline training with more than 20 classic algorithms through a unified interface. To facilitate related research and prove Tianshou's reliability, we have released Tianshou's benchmark of MuJoCo environments, covering eight classic algorithms with state-of-the-art performance. We open-sourced Tianshou at https://github.com/thu-ml/tianshou/.

preprint2022arXiv

Towards Job-Transition-Tag Graph for a Better Job Title Representation Learning

Work on learning job title representations is mainly based on the Job-Transition Graph, built from the working history of talents. However, since these records are usually messy, this graph is very sparse, which affects the quality of the learned representation and hinders further analysis. To address this issue, we propose to enrich the graph with additional nodes that improve the quality of job title representation. Specifically, we construct the Job-Transition-Tag Graph, a heterogeneous graph containing two types of nodes: job titles and tags (i.e., words related to job responsibilities or functionalities). Along this line, we reformulate job title representation learning as the task of learning node embeddings on the Job-Transition-Tag Graph. Experiments on two datasets demonstrate the value of our approach.

preprint2022arXiv

Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart

Correctly classifying adversarial examples is an essential but challenging requirement for safely deploying machine learning models. As reported in RobustBench, even the state-of-the-art adversarially trained models struggle to exceed 67% robust test accuracy on CIFAR-10, which is far from practical. A complementary way towards robustness is to introduce a rejection option, allowing the model to not return predictions on uncertain inputs, where confidence is a commonly used certainty proxy. Along with this routine, we find that confidence and a rectified confidence (R-Con) can form two coupled rejection metrics, which could provably distinguish wrongly classified inputs from correctly classified ones. This intriguing property sheds light on using coupling strategies to better detect and reject adversarial examples. We evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under several attacks including adaptive ones, and demonstrate that the RR module is compatible with different adversarial training frameworks on improving robustness, with little extra computation. The code is available at https://github.com/P2333/Rectified-Rejection.
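A minimal sketch of the rejection option described above, covering only plain confidence thresholding. It does not implement the rectified confidence (R-Con) or the RR module; the function name and the threshold value are hypothetical choices made for illustration.

```python
from typing import Optional, Sequence


def predict_or_reject(probs: Sequence[float], threshold: float = 0.9) -> Optional[int]:
    """Return the most likely class index, or None to abstain.

    `probs` is assumed to be a softmax output; `threshold` is an
    illustrative value, not one taken from the paper.
    """
    # Confidence is the largest predicted probability.
    best = max(range(len(probs)), key=lambda i: probs[i])
    # Abstain when the model is not confident enough.
    return best if probs[best] >= threshold else None
```

Returning `None` lets the caller route uncertain inputs to a fallback instead of acting on a prediction that may be wrong.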

preprint2021arXiv

Cognitive Visual Inspection Service for LCD Manufacturing Industry

With the rapid growth of display devices, quality inspection via machine vision technology has become increasingly important for the flat-panel display (FPD) industry. This paper discloses a novel visual inspection system for liquid crystal displays (LCDs), currently a dominant type in the FPD industry. The system is based on two cornerstones: a robust, high-performance defect recognition model and a cognitive visual inspection service architecture. A hybrid application of conventional computer vision techniques and the latest deep convolutional neural network (DCNN) leads to an integrated defect detection, classification and impact evaluation model that can be economically trained with only image-level class annotations to achieve high inspection accuracy. In addition, the properly trained model is robust to variation in image quality, significantly alleviating the dependency between the model's prediction performance and the image acquisition environment. This in turn justifies decoupling the defect recognition functions from the front-end device to the back-end service, motivating the design and realization of the cognitive visual inspection service architecture. An empirical case study is performed on a large-scale real-world LCD dataset from a manufacturing line with different layers and products. It shows the promising utility of our system, which has been deployed in a real-world LCD manufacturing line of a major player in the world.

preprint2021arXiv

DNN2LR: Automatic Feature Crossing for Credit Scoring

Credit scoring is a major application of machine learning for financial institutions to decide whether to approve or reject a credit loan. For the sake of reliability, credit scoring models need to be both accurate and globally interpretable. Simple classifiers, e.g., Logistic Regression (LR), are white-box models, but not powerful enough to model complex nonlinear interactions among features. Fortunately, automatic feature crossing is a promising way to find cross features that make simple classifiers more accurate without heavy handcrafted feature engineering. However, credit scoring is usually based on different aspects of users, and the data usually contains hundreds of feature fields. This makes existing automatic feature crossing methods inefficient for credit scoring. In this work, we find that local piece-wise interpretations in Deep Neural Networks (DNNs) of a specific feature are usually inconsistent across samples, which is caused by feature interactions in the hidden layers. Accordingly, we can design an automatic feature crossing method to find feature interactions in a DNN and use them as cross features in LR. We give a definition of the interpretation inconsistency in DNNs, based on which a novel feature crossing method for credit scoring prediction called DNN2LR is proposed. The final model generated by DNN2LR, i.e., an LR model empowered with cross features, is a white-box model. Extensive experiments have been conducted on both public and business datasets from real-world credit scoring applications. Experiments show that DNN2LR can outperform the DNN model, as well as several feature crossing methods. Moreover, compared with the state-of-the-art feature crossing method AutoCross, DNN2LR can accelerate feature crossing by about 10 to 40 times on datasets with large numbers of feature fields.

preprint2021arXiv

Efficient Inference of Flexible Interaction in Spiking-neuron Networks

Hawkes process provides an effective statistical framework for analyzing the time-dependent interaction of neuronal spiking activities. Although utilized in many real applications, the classic Hawkes process is incapable of modelling inhibitory interactions among neurons. Instead, the nonlinear Hawkes process allows for a more flexible influence pattern with excitatory or inhibitory interactions. In this paper, three sets of auxiliary latent variables (Pólya-Gamma variables, latent marked Poisson processes and sparsity variables) are augmented to make functional connection weights in a Gaussian form, which allows for a simple iterative algorithm with analytical updates. As a result, an efficient expectation-maximization (EM) algorithm is derived to obtain the maximum a posteriori (MAP) estimate. We demonstrate the accuracy and efficiency performance of our algorithm on synthetic and real data. For real neural recordings, we show our algorithm can estimate the temporal dynamics of interaction and reveal the interpretable functional connectivity underlying neural spike trains.

preprint2021arXiv

Fork or Fail: Cycle-Consistent Training with Many-to-One Mappings

Cycle-consistent training is widely used for jointly learning a forward and inverse mapping between two domains of interest without the cumbersome requirement of collecting matched pairs within each domain. In this regard, the implicit assumption is that there exists (at least approximately) a ground-truth bijection such that a given input from either domain can be accurately reconstructed from successive application of the respective mappings. But in many applications no such bijection can be expected to exist, and large reconstruction errors can compromise the success of cycle-consistent training. As one important instance of this limitation, we consider practically relevant situations where there exists a many-to-one or surjective mapping between domains. To address this regime, we develop a conditional variational autoencoder (CVAE) approach that can be viewed as converting surjective mappings to implicit bijections whereby reconstruction errors in both directions can be minimized, and as a natural byproduct, realistic output diversity can be obtained in the one-to-many direction. As theoretical motivation, we analyze a simplified scenario whereby minima of the proposed CVAE-based energy function align with the recovery of ground-truth surjective mappings. On the empirical side, we consider a synthetic image dataset with known ground-truth, as well as a real-world application involving natural language generation from knowledge graphs and vice versa, a prototypical surjective case. For the latter, our CVAE pipeline can capture such many-to-one mappings during cycle training while promoting textual diversity for graph-to-text tasks. Our code is available at github.com/QipengGuo/CycleGT. A condensed version of this paper has been accepted to AISTATS 2021; this version contains additional content and updates.

preprint2021arXiv

ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning

Continual learning usually assumes the incoming data are fully labeled, which might not be applicable in real applications. In this work, we consider semi-supervised continual learning (SSCL) that incrementally learns from partially labeled data. Observing that existing continual learning methods lack the ability to continually exploit the unlabeled data, we propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN), which continually passes the learned data distribution to the classifier. In particular, ORDisCo replays data sampled from the conditional generator to the classifier in an online manner, exploiting unlabeled data in a time- and storage-efficient way. Further, to explicitly overcome the catastrophic forgetting of unlabeled data, we selectively stabilize parameters of the discriminator that are important for discriminating the pairs of old unlabeled data and their pseudo-labels predicted by the classifier. We extensively evaluate ORDisCo on various semi-supervised learning benchmark datasets for SSCL, and show that ORDisCo achieves significant performance improvement on SVHN, CIFAR10 and Tiny-ImageNet, compared to strong baselines.

preprint2021arXiv

Relaxed Conditional Image Transfer for Semi-supervised Domain Adaptation

Semi-supervised domain adaptation (SSDA), which aims to learn models in a partially labeled target domain with the assistance of the fully labeled source domain, attracts increasing attention in recent years. To explicitly leverage the labeled data in both domains, we naturally introduce a conditional GAN framework to transfer images without changing the semantics in SSDA. However, we identify a label-domination problem in such an approach. In fact, the generator tends to overlook the input source image and only memorizes prototypes of each class, which results in unsatisfactory adaptation performance. To this end, we propose a simple yet effective Relaxed conditional GAN (Relaxed cGAN) framework. Specifically, we feed the image without its label to our generator. In this way, the generator has to infer the semantic information of input data. We formally prove that its equilibrium is desirable and empirically validate its practical convergence and effectiveness in image transfer. Additionally, we propose several techniques to make use of unlabeled data in the target domain, enhancing the model in SSDA settings. We validate our method on the well-adopted datasets: Digits, DomainNet, and Office-Home. We achieve state-of-the-art performance on DomainNet, Office-Home and most digit benchmarks in low-resource and high-resource settings.

preprint2021arXiv

Rethinking Natural Adversarial Examples for Classification Models

Recently, it was found that many real-world examples without intentional modifications can fool machine learning models, and such examples are called "natural adversarial examples". ImageNet-A is a famous dataset of natural adversarial examples. By analyzing this dataset, we hypothesized that large, cluttered and/or unusual background is an important reason why the images in this dataset are difficult to be classified. We validated the hypothesis by reducing the background influence in ImageNet-A examples with object detection techniques. Experiments showed that the object detection models with various classification models as backbones obtained much higher accuracy than their corresponding classification models. A detection model based on the classification model EfficientNet-B7 achieved a top-1 accuracy of 53.95%, surpassing previous state-of-the-art classification models trained on ImageNet, suggesting that accurate localization information can significantly boost the performance of classification models on ImageNet-A. We then manually cropped the objects in images from ImageNet-A and created a new dataset, named ImageNet-A-Plus. A human test on the new dataset showed that the deep learning-based classifiers still performed quite poorly compared with humans. Therefore, the new dataset can be used to study the robustness of classification models to the internal variance of objects without considering the background disturbance.

preprint2020arXiv

A Wasserstein Minimum Velocity Approach to Learning Unnormalized Models

Score matching provides an effective approach to learning flexible unnormalized models, but its scalability is limited by the need to evaluate a second-order derivative. In this paper, we present a scalable approximation to a general family of learning objectives including score matching, by observing a new connection between these objectives and Wasserstein gradient flows. We present applications with promise in learning neural density estimators on manifolds, and training implicit variational and Wasserstein auto-encoders with a manifold-valued prior.
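For context on the second-order derivative mentioned above, the standard (Hyvärinen) score matching objective can be written as follows. This is the textbook form rather than a formula quoted from this abstract, and the symbols $s_\theta$ and $p_\theta$ are notation assumed here.

```latex
J(\theta) = \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[ \operatorname{tr}\big(\nabla_x s_\theta(x)\big) + \tfrac{1}{2}\,\lVert s_\theta(x) \rVert_2^2 \right],
\qquad s_\theta(x) = \nabla_x \log p_\theta(x)
```

The trace term is the trace of the Hessian of $\log p_\theta$ with respect to the input, which is the part that becomes expensive in high dimensions.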

preprint2020arXiv

Analyzing the Noise Robustness of Deep Neural Networks

Adversarial examples, generated by adding small but intentionally imperceptible perturbations to normal examples, can mislead deep neural networks (DNNs) to make incorrect predictions. Although much work has been done on both adversarial attack and defense, a fine-grained understanding of adversarial examples is still lacking. To address this issue, we present a visual analysis method to explain why adversarial examples are misclassified. The key is to compare and analyze the datapaths of both the adversarial and normal examples. A datapath is a group of critical neurons along with their connections. We formulate the datapath extraction as a subset selection problem and solve it by constructing and training a neural network. A multi-level visualization consisting of a network-level visualization of data flows, a layer-level visualization of feature maps, and a neuron-level visualization of learned features, has been designed to help investigate how datapaths of adversarial and normal examples diverge and merge in the prediction process. A quantitative evaluation and a case study were conducted to demonstrate the promise of our method to explain the misclassification of adversarial examples.

preprint2020arXiv

Design and Interpretation of Universal Adversarial Patches in Face Detection

We consider universal adversarial patches for faces -- small visual elements whose addition to a face image reliably destroys the performance of face detectors. Unlike previous work that mostly focused on the algorithmic design of adversarial examples in terms of improving the success rate as an attacker, in this work we show an interpretation of such patches that can prevent the state-of-the-art face detectors from detecting the real faces. We investigate a phenomenon: patches designed to suppress real face detection appear face-like. This phenomenon holds generally across different initialization, locations, scales of patches, backbones, and state-of-the-art face detection frameworks. We propose new optimization-based approaches to automatic design of universal adversarial patches for varying goals of the attack, including scenarios in which true positives are suppressed without introducing false positives. Our proposed algorithms perform well on real-world datasets, deceiving state-of-the-art face detectors in terms of multiple precision/recall metrics and transferability.

preprint2020arXiv

Dynamic Window-level Granger Causality of Multi-channel Time Series

Granger causality method analyzes the time series causalities without building a complex causality graph. However, the traditional Granger causality method assumes that the causalities lie between time series channels and remain constant, which cannot model the real-world time series data with dynamic causalities along the time series channels. In this paper, we present the dynamic window-level Granger causality method (DWGC) for multi-channel time series data. We build the causality model on the window-level by doing the F-test with the forecasting errors on the sliding windows. We propose the causality indexing trick in our DWGC method to reweight the original time series data. Essentially, the causality indexing is to decrease the auto-correlation and increase the cross-correlation causal effects, which improves the DWGC method. Theoretical analysis and experimental results on two synthetic and one real-world datasets show that the improved DWGC method with causality indexing better detects the window-level causalities.

preprint2020arXiv

Improving Black-box Adversarial Attacks with a Transfer-based Prior

We consider the black-box adversarial setting, where the adversary has to generate adversarial perturbations without access to the target models to compute gradients. Previous methods tried to approximate the gradient either by using a transfer gradient of a surrogate white-box model, or based on the query feedback. However, these methods often suffer from low attack success rates or poor query efficiency since it is non-trivial to estimate the gradient in a high-dimensional space with limited information. To address these problems, we propose a prior-guided random gradient-free (P-RGF) method to improve black-box adversarial attacks, which takes the advantage of a transfer-based prior and the query information simultaneously. The transfer-based prior given by the gradient of a surrogate model is appropriately integrated into our algorithm by an optimal coefficient derived by a theoretical analysis. Extensive experiments demonstrate that our method requires much fewer queries to attack black-box models with higher success rates compared with the alternative state-of-the-art methods.
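A rough sketch of a random gradient-free estimator whose search directions are biased toward a transfer-based prior, in the spirit of the description above. The fixed mixing coefficient `lam` is a hypothetical stand-in for the optimal coefficient derived in the paper, and all function names and default values are assumptions.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def prior_guided_rgf(f, x, prior, lam=0.5, q=20, sigma=1e-4, rng=None):
    """Estimate the gradient of a black-box function f at x from q queries.

    `prior` plays the role of a surrogate-model gradient. `lam` in [0, 1)
    is a fixed, illustrative weight on the prior direction. Assumes
    len(x) >= 2 so the orthogonal component is non-zero.
    """
    if rng is None:
        rng = random.Random(0)
    v = normalize(prior)
    fx = f(x)
    grad = [0.0] * len(x)
    for _ in range(q):
        xi = [rng.gauss(0.0, 1.0) for _ in x]
        # Remove the component along the prior, then renormalize.
        proj = dot(xi, v)
        xi_perp = normalize([a - proj * b for a, b in zip(xi, v)])
        # Unit-norm direction mixing the prior with random exploration.
        u = [math.sqrt(lam) * b + math.sqrt(1.0 - lam) * a
             for a, b in zip(xi_perp, v)]
        # Finite-difference slope along u, obtained from one extra query.
        slope = (f([a + sigma * b for a, b in zip(x, u)]) - fx) / sigma
        grad = [g + slope * b / q for g, b in zip(grad, u)]
    return grad
```

Setting `lam` close to 0 recovers an estimator that largely ignores the prior, while larger values concentrate queries near the surrogate gradient direction.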

preprint2020arXiv

Learning Implicit Generative Models by Teaching Explicit Ones

Implicit generative models are difficult to train as no explicit density functions are defined. Generative adversarial nets (GANs) present a minimax framework to train such models, which, however, can suffer from mode collapse due to the nature of the JS-divergence. This paper presents a learning by teaching (LBT) approach to learning implicit models, which intrinsically avoids the mode collapse problem by optimizing a KL-divergence rather than the JS-divergence in GANs. In LBT, an auxiliary density estimator is introduced to fit the implicit model's distribution while the implicit model teaches the density estimator to match the data distribution. LBT is formulated as a bilevel optimization problem, whose optimal generator matches the true data distribution. LBT can be naturally integrated with GANs to derive a hybrid LBT-GAN that enjoys complementary benefits. Finally, we present a stochastic gradient ascent algorithm with unrolling to solve the challenging learning problems. Experimental results demonstrate the effectiveness of our method.

preprint2020arXiv

Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks

It has been widely recognized that adversarial examples can be easily crafted to fool deep networks, which mainly stems from the locally non-linear behavior near input examples. Applying mixup in training provides an effective mechanism to improve generalization performance and model robustness against adversarial perturbations by introducing globally linear behavior in between training examples. However, in previous work, mixup-trained models only passively defend against adversarial attacks at inference by directly classifying the inputs, so the induced global linearity is not well exploited. Namely, because adversarial perturbations are local, it would be more efficient to actively break the locality via the globality of the model predictions. Inspired by simple geometric intuition, we develop an inference principle, named mixup inference (MI), for mixup-trained models. MI mixes the input with other random clean samples, which can shrink and transfer the equivalent perturbation if the input is adversarial. Our experiments on CIFAR-10 and CIFAR-100 demonstrate that MI can further improve the adversarial robustness of models trained by mixup and its variants.
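The inference step described above can be sketched roughly as follows: mix the input with randomly drawn clean samples and average the model outputs. This is a simplified illustration, not the paper's exact procedure; `model`, `clean_pool`, the mixing ratio `lam`, and the sample count are all hypothetical.

```python
import random


def mixup_inference(model, x, clean_pool, lam=0.6, num_samples=10, rng=None):
    """Average model outputs over mixes of x with random clean samples.

    `model` maps a list of floats to a list of scores; `clean_pool` is a
    list of clean inputs with the same length as x. All names and default
    values are illustrative assumptions.
    """
    if rng is None:
        rng = random.Random(0)
    total = None
    for _ in range(num_samples):
        x_s = rng.choice(clean_pool)
        # Convex combination of the (possibly adversarial) input and a clean sample.
        mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x, x_s)]
        out = list(model(mixed))
        total = out if total is None else [p + q for p, q in zip(total, out)]
    return [p / num_samples for p in total]
```

Because the input is scaled by `lam` before classification, any perturbation it carries is scaled down by the same factor, which is the intuition behind the "shrink" effect mentioned in the abstract.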

preprint2020arXiv

Nonparametric Score Estimators

Estimating the score, i.e., the gradient of log density function, from a set of samples generated by an unknown distribution is a fundamental task in inference and learning of probabilistic models that involve flexible yet intractable densities. Kernel estimators based on Stein's methods or score matching have shown promise, however their theoretical properties and relationships have not been fully-understood. We provide a unifying view of these estimators under the framework of regularized nonparametric regression. It allows us to analyse existing estimators and construct new ones with desirable properties by choosing different hypothesis spaces and regularizers. A unified convergence analysis is provided for such estimators. Finally, we propose score estimators based on iterative regularization that enjoy computational benefits from curl-free kernels and fast convergence.
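To make the definition of the score concrete, here is a small sketch for a one-dimensional Gaussian, where the gradient of the log density has a closed form that can be checked against a finite difference. It illustrates the definition only, not any of the nonparametric estimators discussed; the function names are chosen for this example.

```python
import math


def gaussian_log_density(x, mu=0.0, sigma=1.0):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def gaussian_score(x, mu=0.0, sigma=1.0):
    """Closed-form score: derivative of the log density with respect to x."""
    return -(x - mu) / sigma ** 2


def finite_difference_score(log_density, x, h=1e-5):
    """Central-difference approximation of the score."""
    return (log_density(x + h) - log_density(x - h)) / (2.0 * h)
```

The score does not depend on the normalizing constant, which is why it is a convenient target when densities are intractable.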

preprint2020arXiv

Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness

Previous work shows that adversarially robust generalization requires larger sample complexity, and the same dataset, e.g., CIFAR-10, which enables good standard accuracy may not suffice to train robust models. Since collecting new training data could be costly, we focus on better utilizing the given data by inducing the regions with high sample density in the feature space, which could lead to locally sufficient samples for robust learning. We first formally show that the softmax cross-entropy (SCE) loss and its variants convey inappropriate supervisory signals, which encourage the learned feature points to spread over the space sparsely in training. This inspires us to propose the Max-Mahalanobis center (MMC) loss to explicitly induce dense feature regions in order to benefit robustness. Namely, the MMC loss encourages the model to concentrate on learning ordered and compact representations, which gather around the preset optimal centers for different classes. We empirically demonstrate that applying the MMC loss can significantly improve robustness even under strong adaptive attacks, while keeping state-of-the-art accuracy on clean inputs with little extra computation compared to the SCE loss.

preprint2020arXiv

SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models

Standard variational lower bounds used to train latent variable models produce biased estimates of most quantities of interest. We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series. If parameterized by an encoder-decoder architecture, the parameters of the encoder can be optimized to minimize the variance of this estimator. We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost. This estimator also allows use of latent variable models for tasks where unbiased estimators, rather than marginal likelihood lower bounds, are preferred, such as minimizing reverse KL divergences and estimating score functions.
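Randomized truncation of an infinite series can be sketched generically as below, using a "Russian roulette" stopping rule with a geometric number of terms. This illustrates only the unbiasedness trick on an arbitrary convergent series; it is not the SUMO estimator itself, and the function name and continuation probability are assumptions.

```python
import random


def russian_roulette_sum(term, continue_prob=0.7, rng=None):
    """Unbiased single-sample estimate of sum_{k>=0} term(k).

    After each term, continue with probability `continue_prob`. Term k is
    reached with probability continue_prob**k, so dividing by that
    probability keeps the estimate unbiased.
    """
    if rng is None:
        rng = random.Random()
    k = 0
    survival = 1.0  # probability that term k is included
    total = term(0)
    while rng.random() < continue_prob:
        k += 1
        survival *= continue_prob
        total += term(k) / survival
    return total
```

The continuation probability trades compute for variance: a larger value evaluates more terms on average but keeps the reweighted terms smaller.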

preprint2020arXiv

Superconducting proximity effect in a transparent van der Waals superconductor-metal junction

We report on Andreev reflections at clean NbSe2-bilayer graphene junctions. The high transparency of the junction, which manifests as a large conductance enhancement of up to 1.8, enables us to see clear evidence of a proximity-induced superconducting gap in bilayer graphene and two Andreev reflections through a vertical NbSe2-graphene and a lateral graphene-graphene junction respectively. Quantum transport simulations capture the complexity of the experimental data and illuminate the impact of various microscopic parameters on the transmission of the junction. Our work establishes the practice and understanding of an all-van-der-Waals, high-performance superconducting junction. The realization of a highly transparent proximized graphene-graphene junction opens up possibilities to engineer emergent quantum phenomena.

preprint2020arXiv

To Relieve Your Headache of Training an MRF, Take AdVIL

We propose a black-box algorithm called Adversarial Variational Inference and Learning (AdVIL) to perform inference and learning on a general Markov random field (MRF). AdVIL employs two variational distributions to approximately infer the latent variables and estimate the partition function of an MRF, respectively. The two variational distributions provide an estimate of the negative log-likelihood of the MRF as a minimax optimization problem, which is solved by stochastic gradient descent. AdVIL is proven convergent under certain conditions. On one hand, compared with contrastive divergence, AdVIL requires a minimal assumption about the model structure and can deal with a broader family of MRFs. On the other hand, compared with existing black-box methods, AdVIL provides a tighter estimate of the log partition function and achieves much better empirical results.

preprint2020arXiv

Triple Generative Adversarial Networks

We propose a unified game-theoretical framework to perform classification and conditional image generation given limited supervision. It is formulated as a three-player minimax game consisting of a generator, a classifier and a discriminator, and is therefore referred to as Triple Generative Adversarial Network (Triple-GAN). The generator and the classifier characterize the conditional distributions between images and labels to perform conditional generation and classification, respectively. The discriminator solely focuses on identifying fake image-label pairs. Under a nonparametric assumption, we prove that the unique equilibrium of the game is that the distributions characterized by the generator and the classifier converge to the data distribution. As a byproduct of the three-player mechanism, Triple-GAN can flexibly incorporate different semi-supervised classifiers and GAN architectures. We evaluate Triple-GAN in two challenging settings, namely, semi-supervised learning and the extreme low data regime. In both settings, Triple-GAN can achieve excellent classification results and generate meaningful samples in a specific class simultaneously. In particular, using a commonly adopted 13-layer CNN classifier, Triple-GAN substantially outperforms a wide range of semi-supervised learning methods on more than 10 benchmarks, whether or not data augmentation is applied.

preprint2020arXiv

Triple Memory Networks: a Brain-Inspired Method for Continual Learning

Continual acquisition of novel experience without interfering with previously learned knowledge, i.e., continual learning, is critical for artificial neural networks, but limited by catastrophic forgetting. A neural network adjusts its parameters when learning a new task, but then fails to perform the old tasks well. By contrast, the brain has a powerful ability to continually learn new experience without catastrophic interference. The underlying neural mechanisms are possibly attributable to the interplay of the hippocampus-dependent memory system and the neocortex-dependent memory system, mediated by the prefrontal cortex. Specifically, the two memory systems develop specialized mechanisms to consolidate information in more specific forms and more generalized forms, respectively, and complement the two forms of information in the interplay. Inspired by this brain strategy, we propose a novel approach named triple memory networks (TMNs) for continual learning. TMNs model the interplay of the hippocampus, prefrontal cortex and sensory cortex (a neocortex region) as a triple-network architecture of generative adversarial networks (GANs). The input information is encoded as a specific representation of the data distributions in a generator, or as generalized knowledge of solving tasks in a discriminator and a classifier, with appropriate brain-inspired algorithms implemented to alleviate catastrophic forgetting in each module. In particular, the generator replays generated data of the learned tasks to the discriminator and the classifier, both of which are implemented with a weight consolidation regularizer to complement the information lost in the generation process. TMNs achieve new state-of-the-art performance on a variety of class-incremental learning benchmarks on MNIST, SVHN, CIFAR-10 and ImageNet-50, compared with strong baseline methods.

preprint2020arXiv

Understanding and Stabilizing GANs' Training Dynamics with Control Theory

Generative adversarial networks (GANs) are effective in generating realistic images but the training is often unstable. There are existing efforts that model the training dynamics of GANs in the parameter space but the analysis cannot directly motivate practically effective stabilizing methods. To this end, we present a conceptually novel perspective from control theory to directly model the dynamics of GANs in the function space and provide simple yet effective methods to stabilize GANs' training. We first analyze the training dynamic of a prototypical Dirac GAN and adopt the widely-used closed-loop control (CLC) to improve its stability. We then extend CLC to stabilize the training dynamic of normal GANs, where CLC is implemented as a squared $L2$ regularizer on the output of the discriminator. Empirical results show that our method can effectively stabilize the training and obtain state-of-the-art performance on data generation tasks.
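A minimal sketch of the kind of penalty described above: a squared L2 term on the discriminator's outputs, to be added to whatever discriminator loss is in use. The coefficient, the batch averaging, and the function name are assumptions; the paper's full method may differ in how the term is applied.

```python
def closed_loop_penalty(disc_outputs, coeff=0.1):
    """Squared-L2 penalty on a batch of raw discriminator outputs.

    `disc_outputs` is a non-empty list of floats; `coeff` is an
    illustrative regularization weight.
    """
    return coeff * sum(d * d for d in disc_outputs) / len(disc_outputs)
```

Penalizing the magnitude of the outputs acts like negative feedback, pulling the discriminator back toward zero output when it starts to oscillate.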

preprint2020arXiv

VFlow: More Expressive Generative Flows with Variational Data Augmentation

Generative flows are promising tractable models for density modeling that define probabilistic distributions with invertible transformations. However, tractability imposes architectural constraints on generative flows, making them less expressive than other types of generative models. In this work, we study a previously overlooked constraint that all the intermediate representations must have the same dimensionality with the original data due to invertibility, limiting the width of the network. We tackle this constraint by augmenting the data with some extra dimensions and jointly learning a generative flow for augmented data as well as the distribution of augmented dimensions under a variational inference framework. Our approach, VFlow, is a generalization of generative flows and therefore always performs better. Combining with existing generative flows, VFlow achieves a new state-of-the-art 2.98 bits per dimension on the CIFAR-10 dataset and is more compact than previous models to reach similar modeling quality.
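For readers unfamiliar with the bits-per-dimension metric quoted above, the conversion from a negative log-likelihood in nats can be sketched as follows. The helper name is hypothetical; the dimensionality used in the usage note is the standard 3×32×32 size of CIFAR-10 images.

```python
import math


def bits_per_dim(nll_nats, num_dims):
    """Convert a per-example negative log-likelihood in nats to bits per dimension."""
    return nll_nats / (num_dims * math.log(2.0))
```

For a CIFAR-10 image, `num_dims` is 3 * 32 * 32 = 3072, so 2.98 bits per dimension corresponds to roughly 2.98 * 3072 * ln 2 nats per image.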

preprint2019arXiv

Ferromagnetic van der Waals compound MnSb$_{1.8}$Bi$_{0.2}$Te$_4$

The intersection of topology and magnetism represents a new playground to discover novel quantum phenomena and device concepts. In this work, we show that a van der Waals compound MnSb$_{1.8}$Bi$_{0.2}$Te$_4$ exhibits a ferromagnetic ground state with a Curie temperature of 26 K, in contrast to the antiferromagnetic order previously found for other members of the Mn(Sb, Bi)$_2$Te$_4$ family. We employ magneto-transport, bulk magnetization and neutron scattering studies to illustrate the magnetic and electrical properties of MnSb$_{1.8}$Bi$_{0.2}$Te$_4$ and report on the observation of an unusual anomalous Hall effect. Our results are an important step in the synthesis and understanding of ferromagnetic topological insulators.

preprint2016arXiv

A Model-Based Scatter Artifacts Correction for Cone Beam CT

The purpose of this work is to provide a fast and accurate scatter artifacts correction algorithm for cone beam CT (CBCT) imaging. The method starts with an estimation of coarse scatter profiles for a set of CBCT data in either image domain or projection domain. A denoising algorithm designed specifically for Poisson signals is then applied to derive the final scatter distribution. Qualitative and quantitative evaluations using thorax and abdomen phantoms with Monte Carlo (MC) simulations, experimental Catphan phantom data, and in vivo human data acquired for a clinical image guided radiation therapy were performed. Results show that the proposed algorithm can significantly reduce scatter artifacts and recover the correct HU in either projection domain or image domain. For the MC thorax phantom study, four-component segmentation yields the best results, while the results of three-component segmentation are still acceptable. For the Catphan phantom data, the mean value over all pixels in the residual image is reduced from -21.8 HU to -0.2 HU and 0.7 HU for projection domain and image domain, respectively. The contrast of the in vivo human images is greatly improved after correction. The software-based technique has a number of advantages, such as high computational efficiency and accuracy, and the capability of performing scatter correction without modifying the clinical workflow or modifying the imaging hardware. When implemented practically, this should improve the accuracy of CBCT image quantitation and significantly impact CBCT-based interventional procedures and adaptive radiation therapy.

preprint2016arXiv

Analysis and Design of Secure Massive MIMO Systems in the Presence of Hardware Impairments

To keep the hardware costs of future communications systems manageable, the use of low-cost hardware components is desirable. This is particularly true for the emerging massive multiple-input multiple-output (MIMO) systems which equip base stations (BSs) with a large number of antenna elements. However, low-cost transceiver designs will further accentuate the hardware impairments which are present in any practical communication system. In this paper, we investigate the impact of hardware impairments on the secrecy performance of downlink massive MIMO systems in the presence of a passive multiple-antenna eavesdropper. Thereby, for the BS and the legitimate users, the joint effects of multiplicative phase noise, additive distortion noise, and amplified receiver noise are taken into account, whereas the eavesdropper is assumed to employ ideal hardware. We derive a lower bound for the ergodic secrecy rate of a given user when matched filter (MF) data precoding and artificial noise (AN) transmission are employed at the BS. Based on the derived analytical expression, we investigate the impact of the various system parameters on the secrecy rate and optimize both the pilot sets used for uplink training and the AN precoding. Our analytical and simulation results reveal that 1) the additive distortion noise at the BS may be beneficial for the secrecy performance, especially if the power assigned for AN emission is not sufficient; 2) all other hardware impairments have a negative impact on the secrecy performance; 3) despite their susceptibility to pilot interference in the presence of phase noise, so-called spatially orthogonal pilot sequences are preferable unless the phase noise is very strong; 4) the proposed generalized null-space (NS) AN precoding method can efficiently mitigate the negative effects of phase noise.

preprint2016arXiv

Conditional Generative Moment-Matching Networks

Maximum mean discrepancy (MMD) has been successfully applied to learn deep generative models for characterizing a joint distribution of variables via kernel mean embedding. In this paper, we present conditional generative moment-matching networks (CGMMN), which learn a conditional distribution given some input variables based on a conditional maximum mean discrepancy (CMMD) criterion. The learning is performed by stochastic gradient descent with the gradient calculated by back-propagation. We evaluate CGMMN on a wide range of tasks, including predictive modeling, contextual generation, and Bayesian dark knowledge, which distills knowledge from a Bayesian model by learning a relatively small CGMMN student network. Our results demonstrate competitive performance in all the tasks.
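For readers unfamiliar with the underlying criterion, the sketch below computes the standard biased sample estimate of the squared (unconditional) MMD with a Gaussian kernel. It is not the conditional CMMD criterion of the paper. The function names and the `bandwidth` parameter are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a (n, d) and b (m, d)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise squared Euclidean distances via broadcasting, shape (n, m).
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    # Mean within-sample similarities minus twice the cross-sample similarity.
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()
```

The estimate is zero when both samples are identical and grows as the two samples move apart, which is what makes it usable as a training signal for a generator.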

preprint2016arXiv

DeePM: A Deep Part-Based Model for Object Detection and Semantic Part Localization

In this paper, we propose a deep part-based model (DeePM) for symbiotic object detection and semantic part localization. For this purpose, we annotate semantic parts for all 20 object categories on the PASCAL VOC 2012 dataset, which provides information on object pose, occlusion, viewpoint and functionality. DeePM is a latent graphical model based on the state-of-the-art R-CNN framework, which learns an explicit representation of the object-part configuration with flexible type sharing (e.g., a sideview horse head can be shared by a fully-visible sideview horse and a highly truncated sideview horse with head and neck only). For comparison, we also present an end-to-end Object-Part (OP) R-CNN which learns an implicit feature representation for jointly mapping an image ROI to the object and part bounding boxes. We evaluate the proposed methods for both the object and part detection performance on PASCAL VOC 2012, and show that DeePM consistently outperforms OP R-CNN in detecting objects and parts. In addition, it obtains superior performance to Fast and Faster R-CNNs in object detection.

preprint2016arXiv

Fast Sampling for Bayesian Max-Margin Models

Bayesian max-margin models have shown superiority in various practical applications, such as text categorization, collaborative prediction, social network link prediction and crowdsourcing, and they conjoin the flexibility of Bayesian modeling and predictive strengths of max-margin learning. However, Monte Carlo sampling for these models still remains challenging, especially for applications that involve large-scale datasets. In this paper, we present the stochastic subgradient Hamiltonian Monte Carlo (HMC) methods, which are easy to implement and computationally efficient. We show the approximate detailed balance property of subgradient HMC which reveals a natural and validated generalization of the ordinary HMC. Furthermore, we investigate the variants that use stochastic subsampling and thermostats for better scalability and mixing. Using stochastic subgradient Markov Chain Monte Carlo (MCMC), we efficiently solve the posterior inference task of various Bayesian max-margin models and extensive experimental results demonstrate the effectiveness of our approach.

preprint2016arXiv

Generative Topic Embedding: a Continuous Representation of Documents (Extended Version with Proofs)

Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by utilizing the global word collocation patterns in the same document. These two types of patterns are complementary. In this paper, we propose a generative topic embedding model to combine the two types of patterns. In our model, topics are represented by embedding vectors, and are shared across documents. The probability of each word is influenced by both its local context and its topic. A variational inference method yields the topic embeddings as well as the topic mixing proportions for each document. Jointly they represent the document in a low-dimensional continuous space. In two document classification tasks, our method performs better than eight existing methods, with fewer features. In addition, we illustrate with an example that our method can generate coherent topics even based on only one document.

preprint2016arXiv

Kernel Bayesian Inference with Posterior Regularization

We propose a vector-valued regression problem whose solution is equivalent to the reproducing kernel Hilbert space (RKHS) embedding of the Bayesian posterior distribution. This equivalence provides a new understanding of kernel Bayesian inference. Moreover, the optimization problem induces a new regularization for the posterior embedding estimator, which is faster and has comparable performance to the squared regularization in kernel Bayes' rule. This regularization coincides with a former thresholding approach used in kernel POMDPs whose consistency remains to be established. Our theoretical work solves this open problem and provides consistency analysis in regression settings. Based on our optimizational formulation, we propose a flexible Bayesian posterior regularization framework which for the first time enables us to put regularization at the distribution level. We apply this method to nonparametric state-space filtering tasks with extremely nonlinear dynamics and show performance gains over all other baselines.

preprint2016arXiv

Learning Deep Generative Models with Doubly Stochastic MCMC

We present doubly stochastic gradient MCMC, a simple and generic method for (approximate) Bayesian inference of deep generative models (DGMs) in a collapsed continuous parameter space. At each MCMC sampling step, the algorithm randomly draws a mini-batch of data samples to estimate the gradient of log-posterior and further estimates the intractable expectation over hidden variables via a neural adaptive importance sampler, where the proposal distribution is parameterized by a deep neural network and learnt jointly. We demonstrate the effectiveness on learning various DGMs in a wide range of tasks, including density estimation, data generation and missing data imputation. Our method outperforms many state-of-the-art competitors.

preprint2016arXiv

Learning to Generate with Memory

Memory units have been widely used to enrich the capabilities of deep networks on capturing long-term dependencies in reasoning and prediction tasks, but little investigation exists on deep generative models (DGMs) which are good at inferring high-level invariant representations from unlabeled data. This paper presents a deep generative model with a possibly large external memory and an attention mechanism to capture the local detail information that is often lost in the bottom-up abstraction process in representation learning. By adopting a smooth attention model, the whole network is trained end-to-end by optimizing a variational bound of data likelihood via auto-encoding variational Bayesian methods, where an asymmetric recognition network is learnt jointly to infer high-level invariant representations. The asymmetric architecture can reduce the competition between bottom-up invariant feature extraction and top-down generation of instance details. Our experiments on several datasets demonstrate that memory can significantly boost the performance of DGMs and even achieve state-of-the-art results on various tasks, including density estimation, image generation, and missing value imputation.

preprint2016arXiv

Low-Complexity QoS-Aware Coordinated Scheduling for Heterogenous Networks

In this paper, we consider a heterogenous network (HetNet), where low-power indoor femtocells are deployed in the coverage area of the existing macro base station (MBS). This paper proposes a novel coordinated random beamforming and user scheduling strategy to improve the throughput of users served by the femtocell access point (FAP) while satisfying the quality-of-service (QoS) requirements of users served by both MBS and FAP. The strategy, termed QoS-Aware Coordinated Scheduling (QACS), requires limited coordination between the MBS and FAP, i.e., only the indexes of the qualified beams are shared. An exact statistical analysis of the ergodic achievable rate of both the FAP and the MBS under the proposed strategy is presented. Scheduling fairness is also addressed for the proposed QACS.

preprint2016arXiv

Max-Margin Deep Generative Models for (Semi-)Supervised Learning

Deep generative models (DGMs) are effective on learning multilayered representations of complex data and performing inference of input data by exploring the generative ability. However, it is relatively insufficient to empower the discriminative ability of DGMs on making accurate predictions. This paper presents max-margin deep generative models (mmDGMs) and a class-conditional variant (mmDCGMs), which explore the strongly discriminative principle of max-margin learning to improve the predictive performance of DGMs in both supervised and semi-supervised learning, while retaining the generative capability. In semi-supervised learning, we use the predictions of a max-margin classifier as the missing labels instead of performing full posterior inference for efficiency; we also introduce additional max-margin and label-balance regularization terms of unlabeled data for effectiveness. We develop an efficient doubly stochastic subgradient algorithm for the piecewise linear objectives in different settings. Empirical results on various datasets demonstrate that: (1) max-margin learning can significantly improve the prediction performance of DGMs and meanwhile retain the generative ability; (2) in supervised learning, mmDGMs are competitive to the best fully discriminative networks when employing convolutional neural networks as the generative and recognition models; and (3) in semi-supervised learning, mmDCGMs can perform efficient inference and achieve state-of-the-art classification results on several benchmarks.

preprint2016arXiv

Max-Margin Nonparametric Latent Feature Models for Link Prediction

Link prediction is a fundamental task in statistical network analysis. Recent advances have been made on learning flexible nonparametric Bayesian latent feature models for link prediction. In this paper, we present a max-margin learning method for such nonparametric latent feature relational models. Our approach attempts to unite the ideas of max-margin learning and Bayesian nonparametrics to discover discriminative latent features for link prediction. It inherits the advances of nonparametric Bayesian methods to infer the unknown latent social dimension, while for discriminative link prediction, it adopts the max-margin learning principle by minimizing a hinge-loss using the linear expectation operator, without dealing with a highly nonlinear link likelihood function. For posterior inference, we develop an efficient stochastic variational inference algorithm under a truncated mean-field assumption. Our methods can scale up to large-scale real networks with millions of entities and tens of millions of positive links. We also provide a full Bayesian formulation, which can avoid tuning regularization hyper-parameters. Experimental results on a diverse range of real datasets demonstrate the benefits inherited from max-margin learning and Bayesian nonparametric inference.

preprint2016arXiv

Nematic Quantum Critical Fluctuations in BaFe$_{2-x}$Ni$_x$As$_2$

We have systematically studied the nematic fluctuations in the electron-doped iron-based superconductor BaFe$_{2-x}$Ni$_x$As$_2$ by measuring the in-plane resistance change under uniaxial pressure. While the nematic quantum critical point can be identified through the measurements along the (110) direction as studied previously, quantum and thermal critical fluctuations cannot be distinguished due to similar Curie-Weiss-like behaviors. Here we find that a sizable pressure-dependent resistivity along the (100) direction is present in all doping levels, which is against the simple picture of an Ising-type nematic model. The signal along the (100) direction becomes maximum at optimal doping, suggesting that it is associated with nematic quantum critical fluctuations. Our results indicate that thermal fluctuations from striped antiferromagnetic order dominate the underdoped regime along the (110) direction. We argue that either there is a strong coupling between the quantum critical fluctuations and the fermions, or more exotically, a higher symmetry may be present around optimal doping.

preprint2016arXiv

Physical Layer Security for Massive MIMO Systems Impaired by Phase Noise

In this paper, we investigate the impact of phase noise on the secrecy performance of downlink massive MIMO systems in the presence of a passive multiple-antenna eavesdropper. Thereby, for the base station (BS) and the legitimate users, the effect of multiplicative phase noise is taken into account, whereas the eavesdropper is assumed to employ ideal hardware. We derive a lower bound for the ergodic secrecy rate of a given user when matched filter data precoding and artificial noise transmission are employed at the BS. Based on the derived analytical expression, we investigate the impact of the various system parameters on the secrecy rate. Our analytical and simulation results reveal that distributively deployed local oscillators (LOs) can achieve a better performance than one common LO for all BS antennas as long as a sufficient amount of power is assigned for data transmission.

preprint2016arXiv

PSDVec: a Toolbox for Incremental and Scalable Word Embedding

PSDVec is a Python/Perl toolbox that learns word embeddings, i.e. the mapping of words in a natural language to continuous vectors which encode the semantic/syntactic regularities between the words. PSDVec implements a word embedding learning method based on a weighted low-rank positive semidefinite approximation. To scale up the learning process, we implement a blockwise online learning algorithm to learn the embeddings incrementally. This strategy greatly reduces the learning time of word embeddings on a large vocabulary, and can learn the embeddings of new words without re-learning the whole vocabulary. On 9 word similarity/analogy benchmark sets and 2 Natural Language Processing (NLP) tasks, PSDVec produces embeddings that have the best average performance among popular word embedding tools. PSDVec provides a new option for NLP practitioners.

preprint2016arXiv

Scaling up Dynamic Topic Models

Dynamic topic models (DTMs) are very effective in discovering topics and capturing their evolution trends in time series data. To do posterior inference of DTMs, existing methods are all batch algorithms that scan the full dataset before each update of the model and make inexact variational approximations with mean-field assumptions. Due to a lack of a more scalable inference algorithm, despite the usefulness, DTMs have not captured large topic dynamics. This paper fills this research void, and presents a fast and parallelizable inference algorithm using Gibbs Sampling with Stochastic Gradient Langevin Dynamics that does not make any unwarranted assumptions. We also present a Metropolis-Hastings based $O(1)$ sampler for topic assignments for each word token. In a distributed environment, our algorithm requires very little communication between workers during sampling (almost embarrassingly parallel) and scales up to large-scale applications. We are able to learn the largest Dynamic Topic Model to our knowledge, and learned the dynamics of 1,000 topics from 2.6 million documents in less than half an hour, and our empirical results show that our algorithm is not only orders of magnitude faster than the baselines but also achieves lower perplexity.

preprint2016arXiv

Secure Massive MIMO Systems with Limited RF Chains

In future practical deployments of massive multi-input multi-output (MIMO) systems, the number of radio frequency (RF) chains at the base stations (BSs) may be much smaller than the number of BS antennas to reduce the overall expenditure. In this paper, we propose a novel design framework for joint data and artificial noise (AN) precoding in a multiuser massive MIMO system with a limited number of RF chains, which improves the wireless security performance. With imperfect channel state information (CSI), we analytically derive an achievable lower bound on the ergodic secrecy rate of any mobile terminal (MT), for both analog and hybrid precoding schemes. The closed-form lower bound is used to determine the optimal power splitting between data and AN that maximizes the secrecy rate through a simple one-dimensional search. Analytical and numerical results together reveal that the proposed hybrid precoder, although it suffers from a reduced secrecy rate compared with the theoretical full-dimensional precoder, is free of the high computational complexity of large-scale matrix inversion and null-space calculations, and largely reduces the hardware cost.

preprint2016arXiv

SeDMiD for Confusion Detection: Uncovering Mind State from Time Series Brain Wave Data

Understanding how the brain functions has been an intriguing topic for years. With the recent progress on collecting massive data and developing advanced technology, people have become interested in addressing the challenge of decoding brain wave data into meaningful mind states, with many machine learning models and algorithms being revisited and developed, especially the ones that handle time series data because of the nature of brain waves. However, many of these time series models, like HMM with hidden state in discrete space or State Space Model with hidden state in continuous space, only work with one source of data and cannot handle different sources of information simultaneously. In this paper, we propose an extension of State Space Model to work with different sources of information together with its learning and inference algorithms. We apply this model to decode the mind state of students during lectures based on their brain waves and reach significantly better results compared to traditional methods.

preprint2016arXiv

Spectral Learning for Supervised Topic Models

Supervised topic models simultaneously model the latent topic structure of large collections of documents and a response variable associated with each document. Existing inference methods are based on variational approximation or Monte Carlo sampling, which often suffers from the local minimum defect. Spectral methods have been applied to learn unsupervised topic models, such as latent Dirichlet allocation (LDA), with provable guarantees. This paper investigates the possibility of applying spectral methods to recover the parameters of supervised LDA (sLDA). We first present a two-stage spectral method, which recovers the parameters of LDA followed by a power update method to recover the regression model parameters. Then, we further present a single-phase spectral algorithm to jointly recover the topic distribution matrix as well as the regression weights. Our spectral algorithms are provably correct and computationally efficient. We prove a sample complexity bound for each algorithm and subsequently derive a sufficient condition for the identifiability of sLDA. Thorough experiments on synthetic and real-world datasets verify the theory and demonstrate the practical effectiveness of the spectral algorithms. In fact, our results on a large-scale review rating dataset demonstrate that our single-phase spectral algorithm alone gets comparable or even better performance than state-of-the-art methods, while previous work on spectral methods has rarely reported such promising performance.

preprint2016arXiv

Statistically-estimated tree composition for the northeastern United States at the time of Euro-American settlement

We present a gridded 8 km-resolution data product of the estimated composition of tree taxa at the time of Euro-American settlement of the northeastern United States and the statistical methodology used to produce the product from trees recorded by land surveyors. Composition is defined as the proportion of stems larger than approximately 20 cm diameter at breast height for 22 tree taxa, generally at the genus level. The data come from settlement-era public survey records that are transcribed and then aggregated spatially, giving count data. The domain is divided into two regions, eastern (Maine to Ohio) and midwestern (Indiana to Minnesota). Public Land Survey point data in the midwestern region (ca. 0.8-km resolution) are aggregated to a regular 8 km grid, while data in the eastern region, from Town Proprietor Surveys, are aggregated at the township level in irregularly-shaped local administrative units. The product is based on a Bayesian statistical model fit to the count data that estimates composition on a regular 8 km grid across the entire domain. The statistical model is designed to handle data from both the regular grid and the irregularly-shaped townships and allows us to estimate composition at locations with no data and to smooth over noise caused by limited counts in locations with data. The model also allows us to quantify uncertainty in our composition estimates, making the product suitable for applications employing data assimilation. We expect this data product to be useful for understanding the state of vegetation in the northeastern United States prior to large-scale Euro-American settlement. In addition to specific regional questions, the data product can also serve as a baseline against which to investigate how forests and ecosystems change after intensive settlement. The data product is available at the NIS data portal as version 1.0.

preprint2016arXiv

Streaming Gibbs Sampling for LDA Model

Streaming variational Bayes (SVB) is successful in learning LDA models in an online manner. However, previous attempts toward developing online Monte-Carlo methods for LDA have had little success, often yielding much worse perplexity than their batch counterparts. We present a streaming Gibbs sampling (SGS) method, an online extension of the collapsed Gibbs sampling (CGS). Our empirical study shows that SGS can reach similar perplexity as CGS, much better than SVB. Our distributed version of SGS, DSGS, is much more scalable than SVB mainly because the updates' communication complexity is small.

preprint2016arXiv

Towards Better Analysis of Deep Convolutional Neural Networks

Deep convolutional neural networks (CNNs) have achieved breakthrough performance in many pattern recognition tasks such as image classification. However, the development of high-quality deep models typically relies on a substantial amount of trial-and-error, as there is still no clear understanding of when and why a deep model works. In this paper, we present a visual analytics approach for better understanding, diagnosing, and refining deep CNNs. We formulate a deep CNN as a directed acyclic graph. Based on this formulation, a hybrid visualization is developed to disclose the multiple facets of each neuron and the interactions between them. In particular, we introduce a hierarchical rectangle packing algorithm and a matrix reordering algorithm to show the derived features of a neuron cluster. We also propose a biclustering-based edge bundling method to reduce visual clutter caused by a large number of connections between neurons. We evaluated our method on a set of CNNs and the results are generally favorable.

preprint2016arXiv

Using Edge-Preserving Algorithm with Non-local Mean for Significantly Improved Image-Domain Material Decomposition in Dual Energy CT

Increased noise is a general concern for dual-energy material decomposition. Here, we develop an image-domain material decomposition algorithm for dual-energy CT (DECT) by incorporating an edge-preserving filter into the Local HighlY constrained backPRojection Reconstruction (HYPR-LR) framework. With effective use of the non-local mean, the proposed algorithm, which is referred to as HYPR-NLM, reduces the noise in dual energy decomposition while preserving the accuracy of quantitative measurement and spatial resolution of the material-specific dual energy images. We demonstrate the noise reduction and resolution preservation of the algorithm with an iodine concentrate numerical phantom by comparing the HYPR-NLM algorithm to the direct matrix inversion, HYPR-LR and iterative image-domain material decomposition (Iter-DECT). We also show the superior performance of the HYPR-NLM over the existing methods by using two sets of cardiac perfusion imaging data. The conclusions drawn from the comparison study include: (1) HYPR-NLM significantly reduces the DECT material decomposition noise while preserving quantitative measurements and high-frequency edge information, and (2) HYPR-NLM is robust with respect to parameter selection.

preprint2016arXiv

WarpLDA: a Cache Efficient O(1) Algorithm for Latent Dirichlet Allocation

Developing efficient and scalable algorithms for Latent Dirichlet Allocation (LDA) is of wide interest for many applications. Previous work has developed an O(1) Metropolis-Hastings sampling method for each token. However, the performance is far from being optimal due to random accesses to the parameter matrices and frequent cache misses. In this paper, we first carefully analyze the memory access efficiency of existing algorithms for LDA by the scope of random access, which is the size of the memory region in which random accesses fall, within a short period of time. We then develop WarpLDA, an LDA sampler which achieves both the best O(1) time complexity per token and the best O(K) scope of random access. Our empirical results in a wide range of testing conditions demonstrate that WarpLDA is consistently 5-15x faster than the state-of-the-art Metropolis-Hastings based LightLDA, and is comparable to or faster than the sparsity aware F+LDA. With WarpLDA, users can learn up to one million topics from hundreds of millions of documents in a few hours, at an unprecedented throughput of 11G tokens per second.

preprint2015arXiv

A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution

Most existing word embedding methods can be categorized into Neural Embedding Models and Matrix Factorization (MF)-based methods. However, some models are opaque to probabilistic interpretation, and MF-based methods, typically solved using Singular Value Decomposition (SVD), may incur loss of corpus information. In addition, it is desirable to incorporate global latent factors, such as topics, sentiments or writing styles, into the word embedding model. Since generative models provide a principled way to incorporate latent factors, we propose a generative word embedding model, which is easy to interpret, and can serve as a basis of more sophisticated latent factor models. The model inference reduces to a low rank weighted positive semidefinite approximation problem. Its optimization is approached by eigendecomposition on a submatrix, followed by online blockwise regression, which is scalable and avoids the information loss in SVD. In experiments on 7 common benchmark datasets, our vectors are competitive with word2vec, and better than other MF-based methods.

preprint2015arXiv

Bayesian Matrix Completion via Adaptive Relaxed Spectral Regularization

Bayesian matrix completion has been studied based on a low-rank matrix factorization formulation with promising results. However, little work has been done on Bayesian matrix completion based on the more direct spectral regularization formulation. We fill this gap by presenting a novel Bayesian matrix completion method based on spectral regularization. In order to circumvent the difficulties of dealing with the orthonormality constraints of singular vectors, we derive a new equivalent form with relaxed constraints, which then leads us to design an adaptive version of spectral regularization feasible for Bayesian inference. Our Bayesian method requires no parameter tuning and can infer the number of latent factors automatically. Experiments on synthetic and real datasets demonstrate encouraging results on rank recovery and collaborative filtering, with notably good results for very sparse matrices.

preprint2015arXiv

Bounded-Distortion Metric Learning

Metric learning aims to embed one metric space into another to benefit tasks like classification and clustering. Although a greatly distorted metric space has a high degree of freedom to fit training data, it is prone to overfitting and numerical inaccuracy. This paper presents bounded-distortion metric learning (BDML), a new metric learning framework which amounts to finding an optimal Mahalanobis metric space with a bounded-distortion constraint. An efficient solver based on the multiplicative weights update method is proposed. Moreover, we generalize BDML to pseudo-metric learning and devise the semidefinite relaxation and a randomized algorithm to approximately solve it. We further provide theoretical analysis to show that distortion is a key ingredient for stability and generalization ability of our BDML algorithm. Extensive experiments on several benchmark datasets yield promising results.

preprint2015arXiv

Building Memory with Concept Learning Capabilities from Large-scale Knowledge Base

We present a new perspective on neural knowledge base (KB) embeddings, from which we build a framework that can model symbolic knowledge in the KB together with its learning process. We show that this framework effectively regularizes previous neural KB embedding models for superior performance in reasoning tasks, while being capable of dealing with unseen entities, that is, of learning their embeddings from natural language descriptions, much as humans learn semantic concepts.

preprint2015arXiv

Discriminative Nonparametric Latent Feature Relational Models with Data Augmentation

We present a discriminative nonparametric latent feature relational model (LFRM) for link prediction to automatically infer the dimensionality of latent features. Under the generic RegBayes (regularized Bayesian inference) framework, we handily incorporate the prediction loss with probabilistic inference of a Bayesian model; set distinct regularization parameters for different types of links to handle the imbalance issue in real networks; and unify the analysis of both the smooth logistic log-loss and the piecewise linear hinge loss. For the nonconjugate posterior inference, we present a simple Gibbs sampler via data augmentation, without making restricting assumptions as done in variational methods. We further develop an approximate sampler using stochastic gradient Langevin dynamics to handle large networks with hundreds of thousands of entities and millions of links, orders of magnitude larger than what existing LFRM models can process. Extensive studies on various real networks show promising performance.

preprint2015arXiv

Dropout Training for SVMs with Data Augmentation

Dropout and other feature noising schemes have shown promising results in controlling over-fitting by artificially corrupting the training data. Though extensive theoretical and empirical studies have been performed for generalized linear models, little work has been done for support vector machines (SVMs), one of the most successful approaches for supervised learning. This paper presents dropout training for both linear SVMs and the nonlinear extension with latent representation learning. For linear SVMs, to deal with the intractable expectation of the non-smooth hinge loss under corrupting distributions, we develop an iteratively re-weighted least squares (IRLS) algorithm by exploring data augmentation techniques. Our algorithm iteratively minimizes the expectation of a re-weighted least squares problem, where the re-weights are analytically updated. For nonlinear latent SVMs, we consider learning one layer of latent representations in SVMs and extend the data augmentation technique in conjunction with first-order Taylor-expansion to deal with the intractable expected non-smooth hinge loss and the nonlinearity of latent representations. Finally, we apply similar data augmentation ideas to develop a new IRLS algorithm for the expected logistic loss under corrupting distributions, and we further develop a non-linear extension of logistic regression by incorporating one layer of latent representations. Our algorithms offer insights on the connection and difference between the hinge loss and logistic loss in dropout training. Empirical results on several real datasets demonstrate the effectiveness of dropout training in significantly boosting the classification accuracy of both linear and nonlinear SVMs. In addition, the nonlinear SVMs further improve the prediction performance on several image datasets.

preprint2015arXiv

Fast Scatter Artifacts Correction for Cone-Beam CT without System Modification and Repeat Scan

We provide a fast and accurate scatter artifacts correction algorithm for cone beam CT (CBCT) imaging. The method starts with an estimation of a coarse scatter profile for a set of CBCT images. A total-variation denoising algorithm designed specifically for Poisson signals is then applied to derive the final scatter distribution. Qualitative and quantitative evaluations using Monte Carlo (MC) simulations, experimental CBCT phantom data, and in vivo human data acquired for clinical image guided radiation therapy were performed. Results show that the proposed algorithm can significantly reduce scatter artifacts and recover the correct HU within either the projection domain or the image domain. Further tests show that the method is robust with respect to the segmentation procedure.

preprint2015arXiv

Jointly Modeling Topics and Intents with Global Order Structure

Modeling document structure is of great importance for discourse analysis and related applications. The goal of this research is to capture the document intent structure by modeling documents as a mixture of topic words and rhetorical words. While the topics are relatively unchanged through one document, the rhetorical functions of sentences usually change following certain orders in discourse. We propose GMM-LDA, a topic modeling based Bayesian unsupervised model, to analyze the document intent structure together with order information. Our model is flexible in that it can incorporate annotations and perform supervised learning. Additionally, entropic regularization can be introduced to model the significant divergence between topics and intents. We perform experiments in both unsupervised and supervised settings; the results show the superiority of our model over several state-of-the-art baselines.

preprint2015arXiv

Linear Precoding of Data and Artificial Noise in Secure Massive MIMO Systems

In this paper, we consider secure downlink transmission in a multi-cell massive multiple-input multiple-output (MIMO) system where the numbers of base station (BS) antennas, mobile terminals, and eavesdropper antennas are asymptotically large. The channel state information of the eavesdropper is assumed to be unavailable at the BS and hence, linear precoding of data and artificial noise (AN) are employed for secrecy enhancement. Four different data precoders (i.e., selfish zero-forcing (ZF)/regularized channel inversion (RCI) and collaborative ZF/RCI precoders) and three different AN precoders (i.e., random, selfish/collaborative null-space based precoders) are investigated and the corresponding achievable ergodic secrecy rates are analyzed. Our analysis includes the effects of uplink channel estimation, pilot contamination, multi-cell interference, and path-loss. Furthermore, to strike a balance between complexity and performance, linear precoders that are based on matrix polynomials are proposed for both data and AN precoding. Using tools from free probability and random matrix theory, the polynomial coefficients of the data precoder are optimized to minimize the sum mean squared error of the mobile terminals in the cell of interest, and those of the AN precoder are optimized to minimize the AN leakage to these terminals. Our analytical and simulation results provide interesting insights for the design of secure multi-cell massive MIMO systems and reveal that the proposed polynomial data and AN precoders closely approach the performance of selfish RCI data and null-space based AN precoders, respectively.

preprint2015arXiv

Max-margin Deep Generative Models

preprint2015arXiv

Multi-Objective Optimization for Robust Power Efficient and Secure Full-Duplex Wireless Communication Systems

In this paper, we investigate the power efficient resource allocation algorithm design for secure multiuser wireless communication systems employing a full-duplex (FD) base station (BS) for serving multiple half-duplex (HD) downlink (DL) and uplink (UL) users simultaneously. We propose a multi-objective optimization framework to study two conflicting yet desirable design objectives, i.e., total DL transmit power minimization and total UL transmit power minimization. To this end, the weighted Tchebycheff method is adopted to formulate the resource allocation algorithm design as a multi-objective optimization problem (MOOP). The considered MOOP takes into account the quality-of-service (QoS) requirements of all legitimate users for guaranteeing secure DL and UL transmission in the presence of potential eavesdroppers. Thereby, secure UL transmission is enabled by the FD BS and would not be possible with an HD BS. The imperfectness of the channel state information of the eavesdropping channels and the inter-user interference channels is incorporated for robust resource allocation algorithm design. Although the considered MOOP is non-convex, we solve it optimally by semidefinite programming (SDP) relaxation. Simulation results not only unveil the trade-off between the total DL transmit power and the total UL transmit power, but also confirm the robustness of the proposed algorithm against potential eavesdroppers.
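
To make the Tchebycheff scalarization concrete, the sketch below applies the standard form, minimize max_i w_i (f_i - f_i*), to a small list of candidate operating points. It is only a toy under stated assumptions: the paper solves a non-convex problem by SDP relaxation, whereas here the candidates, the function names and the (DL power, UL power) values are hypothetical and the search is a plain enumeration.

```python
def tchebycheff_cost(objectives, ideal, weights):
    """Weighted Tchebycheff scalarisation: max_i w_i * (f_i - f_i^*)."""
    return max(w * (f - z) for f, z, w in zip(objectives, ideal, weights))


def pick_best(candidates, weights):
    """Return the candidate with the smallest scalarised cost.

    The ideal point f^* is taken as the per-objective minimum over the
    candidate list (an assumption suited to this toy setting).
    """
    ideal = [min(c[i] for c in candidates) for i in range(len(weights))]
    return min(candidates, key=lambda c: tchebycheff_cost(c, ideal, weights))


# Hypothetical (total DL power, total UL power) pairs, arbitrary units.
candidates = [(1.0, 9.0), (3.0, 4.0), (8.0, 1.5)]

balanced = pick_best(candidates, weights=(0.5, 0.5))   # -> (3.0, 4.0)
favour_dl = pick_best(candidates, weights=(0.9, 0.1))  # -> (1.0, 9.0)
favour_ul = pick_best(candidates, weights=(0.1, 0.9))  # -> (8.0, 1.5)
```

Sweeping the weights traces out different points of the trade-off between the two objectives, which is the role the scalarization plays in a multi-objective formulation.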

preprint2015arXiv

Per-Antenna Constant Envelope Precoding for Secure Transmission in Large-Scale MISO Systems

Secure transmission in large-scale MISO systems employing artificial noise (AN) is studied under the per-antenna constant envelope (CE) constraint. The achievable secrecy rate of the per-antenna CE precoding scheme for large-scale MISO is analyzed and compared with that of matched filter linear precoding. A two-stage per-antenna CE precoding scheme for joint signal-plus-AN transmission is proposed. The first stage determines the per-antenna CE precoding for the information-bearing signal. In the second stage, AN properly generated by an iterative algorithm is incorporated into the transmit signal such that the combined signal-plus-AN satisfies the per-antenna CE constraint and the AN is orthogonal to the user channel. It is shown that, compared to conventional per-antenna CE transmission, this joint signal-plus-AN secure transmission scheme does not require additional transmit power. An alternative low-complexity AN generation scheme, which uses a separate antenna to cancel the leakage of randomly generated AN to the intended user, is also proposed.

preprint2015arXiv

Pose-Guided Human Parsing with Deep Learned Features

Parsing the human body into semantic regions is crucial to human-centric analysis. In this paper, we propose a segment-based parsing pipeline that exploits human pose information, i.e. the joint locations of a human model, which improves the part proposals, accelerates the inference and regularizes the parsing process at the same time. Specifically, we first generate part segment proposals with respect to human joints predicted by a deep model; then part-specific ranking models are trained for segment selection using both pose-based features and deep-learned part potential features. Finally, the best ensemble of the proposed part segments is inferred through an And-Or Graph. We evaluate our approach on the popular Penn-Fudan pedestrian parsing dataset, and demonstrate the effectiveness of using the pose information for each stage of the parsing pipeline. Finally, we show that our approach yields superior part segmentation accuracy compared to the state-of-the-art methods.

preprint2014arXiv

Bayesian Inference with Posterior Regularization and applications to Infinite Latent SVMs

Existing Bayesian models, especially nonparametric Bayesian methods, rely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations. While priors can affect posterior distributions through Bayes' rule, imposing posterior regularization is arguably more direct and in some cases more natural and general. In this paper, we present regularized Bayesian inference (RegBayes), a novel computational framework that performs posterior inference with a regularization term on the desired post-data posterior distribution under an information theoretical formulation. RegBayes is more flexible than the procedure that elicits expert knowledge via priors, and it covers both directed Bayesian networks and undirected Markov networks whose Bayesian formulation results in hybrid chain graph models. When the regularization is induced from a linear operator on the posterior distributions, such as the expectation operator, we present a general convex-analysis theorem to characterize the solution of RegBayes. Furthermore, we present two concrete examples of RegBayes, infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the large-margin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets, which appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics. Such results were not available until now, and they help push forward the interface between these two important subfields, which have been largely treated as isolated in the community.

preprint2014arXiv

Characterizations of all-derivable points in $B(H)$

Let ${\mathcal{K}}$ and ${\mathcal{H}}$ be two Hilbert spaces, and let $B({\mathcal{K}},{\mathcal{H}})$ be the algebra of all bounded linear operators from ${\mathcal{K}}$ into ${\mathcal{H}}$. We say that an element $G\in B({\mathcal{H}},{\mathcal{H}})$ is an all-derivable point in $B({\mathcal{H}},{\mathcal{H}})$ if every derivable linear mapping $φ$ at $G$ (i.e. $φ(ST)=φ(S)T+Sφ(T)$ for any $S,T\in B(H)$ with $ST=G$) is a derivation. Let $φ: B({\mathcal{H}},{\mathcal{K}})\rightarrow B({\mathcal{H}},{\mathcal{K}})$ and $ψ: B({\mathcal{K}},{\mathcal{H}})\rightarrow B({\mathcal{K}},{\mathcal{H}})$ be linear mappings. In this paper, the following result is proved: if $Yφ(W)=ψ(Y)W$ for any $Y\in B({\mathcal{K}},{\mathcal{H}})$ and $W\in B({\mathcal{H}},{\mathcal{K}})$, then $φ(W)=DW$ and $ψ(Y)=YD$ for some $D\in B({\mathcal{K}})$. As an important application, we show that an operator $G$ is an all-derivable point in $B({\mathcal{H}},{\mathcal{H}})$ if and only if $G\neq 0$.

preprint2014arXiv

Contrastive Feature Induction for Efficient Structure Learning of Conditional Random Fields

Structure learning of Conditional Random Fields (CRFs) can be cast into an L1-regularized optimization problem. To avoid optimizing over a fully linked model, gain-based or gradient-based feature selection methods start from an empty model and incrementally add top ranked features to it. However, for high-dimensional problems like statistical relational learning, training time of these incremental methods can be dominated by the cost of evaluating the gain or gradient of a large collection of candidate features. In this study we propose a fast feature evaluation algorithm called Contrastive Feature Induction (CFI), which only evaluates a subset of features that involve both variables with high signals (deviation from mean) and variables with high errors (residue). We prove that the gradient of candidate features can be represented solely as a function of signals and errors, and that CFI is an efficient approximation of gradient-based evaluation methods. Experiments on synthetic and real data sets show competitive learning speed and accuracy of CFI on pairwise CRFs, compared to state-of-the-art structure learning methods such as full optimization over all features, and Grafting.

preprint2014arXiv

Dropout Training for Support Vector Machines

preprint2014arXiv

Local Adaptive Grouped Regularization and its Oracle Properties for Varying Coefficient Regression

Varying coefficient regression is a flexible technique for modeling data where the coefficients are functions of some effect-modifying parameter, often time or location in a certain domain. While there are a number of methods for variable selection in a varying coefficient regression model, the existing methods are mostly for global selection, which includes or excludes each covariate over the entire domain. Presented here is a new local adaptive grouped regularization (LAGR) method for local variable selection in spatially varying coefficient linear and generalized linear regression. LAGR selects the covariates that are associated with the response at any point in space, and simultaneously estimates the coefficients of those covariates by tailoring the adaptive group Lasso toward a local regression model with locally linear coefficient estimates. Oracle properties of the proposed method are established under local linear regression and local generalized linear regression. The finite sample properties of LAGR are assessed in a simulation study and for illustration, the Boston housing price data set is analyzed.

preprint2014arXiv

Secure Transmission in Multi-Cell Massive MIMO Systems

In this paper, we consider physical layer security provisioning in multi-cell massive multiple-input multiple-output (MIMO) systems. Specifically, we consider secure downlink transmission in a multi-cell massive MIMO system with matched-filter precoding and artificial noise (AN) generation at the base station (BS) in the presence of a passive multi-antenna eavesdropper. We investigate the resulting achievable ergodic secrecy rate and the secrecy outage probability for the cases of perfect training and pilot contamination. Thereby, we consider two different AN shaping matrices, namely, the conventional AN shaping matrix, where the AN is transmitted in the null space of the matrix formed by all user channels, and a random AN shaping matrix, which avoids the complexity associated with finding the null space of a large matrix. Our analytical and numerical results reveal that in multi-cell massive MIMO systems employing matched-filter precoding (1) AN generation is required to achieve a positive ergodic secrecy rate if the user and the eavesdropper experience the same path-loss, (2) even with AN generation secure transmission may not be possible if the number of eavesdropper antennas is too large and not enough power is allocated to channel estimation, (3) for a given fraction of power allocated to AN and a given number of users, in case of pilot contamination, the ergodic secrecy rate is not a monotonically increasing function of the number of BS antennas, and (4) random AN shaping matrices provide a favourable performance/complexity tradeoff and are an attractive alternative to conventional AN shaping matrices.

preprint2013arXiv

An optimal problem for relative entropy

Relative entropy is an essential tool in quantum information theory, and many problems are related to it. In this article, we obtain the optimal values $\displaystyle\max_{U\in U(\mathcal{X}_{d})} S(Uρ U^{\ast}\parallel σ)$ and $\displaystyle\min_{U\in U(\mathcal{X}_{d})} S(Uρ U^{\ast}\parallel σ)$ for two positive definite operators $ρ,σ\in \mathrm{Pd}(\mathcal{X})$. Moreover, the set of values $S(Uρ U^{\ast}\parallel σ)$ taken over all unitary operators $U$ fills the interval $[\displaystyle\min_{U\in U(\mathcal{X}_{d})} S(Uρ U^{\ast}\parallel σ),\displaystyle\max_{U\in U(\mathcal{X}_{d})} S(Uρ U^{\ast}\parallel σ)]$.

preprint2013arXiv

Discriminative Relational Topic Models

Many scientific and engineering fields involve analyzing network data. For document networks, relational topic models (RTMs) provide a probabilistic generative process to describe both the link structure and document contents, and they have shown promise on predicting network structures and discovering latent topic representations. However, existing RTMs are limited by both their restricted model expressiveness and their inability to deal with imbalanced network data. To expand the scope and improve the inference accuracy of RTMs, this paper presents three extensions: 1) unlike the common link likelihood with a diagonal weight matrix that allows same-topic interactions only, we generalize it to use a full weight matrix that captures all pairwise topic interactions and is applicable to asymmetric networks; 2) instead of doing standard Bayesian inference, we perform regularized Bayesian inference (RegBayes) with a regularization parameter to deal with the imbalanced link structure issue in common real networks and improve the discriminative ability of learned latent representations; and 3) instead of doing variational approximation with strict mean-field assumptions, we present collapsed Gibbs sampling algorithms for the generalized relational topic models by exploring data augmentation without making restricting assumptions. Under the generic RegBayes framework, we carefully investigate two popular discriminative loss functions, namely, the logistic log-loss and the max-margin hinge loss. Experimental results on several real network datasets demonstrate the significance of these extensions on improving the prediction performance, and the time efficiency can be dramatically improved with a simple fast approximation method.

preprint2013arXiv

Gibbs Max-margin Topic Models with Data Augmentation

Max-margin learning is a powerful approach to building classifiers and structured output predictors. Recent work on max-margin supervised topic models has successfully integrated it with Bayesian topic models to discover discriminative latent semantic structures and make accurate predictions for unseen testing data. However, the resulting learning problems are usually hard to solve because of the non-smoothness of the margin loss. Existing approaches to building max-margin supervised topic models rely on an iterative procedure to solve multiple latent SVM subproblems with additional mean-field assumptions on the desired posterior distributions. This paper presents an alternative approach by defining a new max-margin loss. Namely, we present Gibbs max-margin supervised topic models, a latent variable Gibbs classifier to discover hidden topic representations for various tasks, including classification, regression and multi-task learning. Gibbs max-margin supervised topic models minimize an expected margin loss, which is an upper bound of the existing margin loss derived from an expected prediction rule. By introducing augmented variables and integrating out the Dirichlet variables analytically by conjugacy, we develop simple Gibbs sampling algorithms with no restricting assumptions and no need to solve SVM subproblems. Furthermore, each step of the "augment-and-collapse" Gibbs sampling algorithms has an analytical conditional distribution, from which samples can be easily drawn. Experimental results demonstrate significant improvements on time efficiency. The classification performance is also significantly improved over competitors on binary, multi-class and multi-label classification tasks.

preprint2013arXiv

Improved Bayesian Logistic Supervised Topic Models with Data Augmentation

Supervised topic models with a logistic likelihood have two issues that potentially limit their practical use: 1) response variables are usually over-weighted by document word counts; and 2) existing variational inference methods make strict mean-field assumptions. We address these issues by: 1) introducing a regularization constant to better balance the two parts based on an optimization formulation of Bayesian inference; and 2) developing a simple Gibbs sampling algorithm by introducing auxiliary Polya-Gamma variables and collapsing out Dirichlet variables. Our augment-and-collapse sampling algorithm has analytical forms of each conditional distribution without making any restricting assumptions and can be easily parallelized. Empirical results demonstrate significant improvements on prediction performance and time efficiency.

preprint2013arXiv

Linear mappings of local preserving-majorization on matrix algebras

Let $M_{n\times n}$ be the algebra of all $n\times n$ matrices. For $x,y\in {R}^{n}$ it is said that $x$ is majorized by $y$ if there is a doubly stochastic matrix $A\in {M}_{n\times n}$ such that $x=Ay$ (denoted by $x\prec y$). Suppose that $Φ$ is a linear mapping from ${R}^{n}$ into ${R}^{n}$, which is said to be strictly isotone if $Φ(x)\prec Φ(y)$ whenever $x\prec y$. We say that an element $α\in {R}^{n}$ is a strictly all-isotone point if every linear mapping $Φ$ that is strictly isotone at $α$ (i.e. $Φ(α)\precΦ(x)$ whenever $x\in {R}^{n}$ with $α\prec x$, and $Φ(x)\precΦ(α)$ whenever $x\in {R}^{n}$ with $x\prec α$) is strictly isotone. In this paper we show that every $α=(α_{1},α_{2},...,α_{n})\in {R}^{n}$ with $α_{1}>α_{2}>...>α_{n}$ is a strictly all-isotone point.
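
The majorization relation defined above can be checked numerically without constructing the doubly stochastic matrix $A$. By the classical Hardy-Littlewood-Pólya characterization, $x\prec y$ holds exactly when the two vectors have equal sums and every partial sum of $x$ sorted in decreasing order is bounded by the corresponding partial sum of $y$. The following is a minimal sketch of that test; the function name and the tolerance are choices made for this example.

```python
def is_majorized_by(x, y, tol=1e-9):
    """Return True if x is majorized by y (x ≺ y), up to tolerance `tol`.

    Uses the partial-sum criterion: after sorting both vectors in
    decreasing order, each leading partial sum of x must not exceed that
    of y, and the total sums must agree.
    """
    if len(x) != len(y):
        return False
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    partial_x = partial_y = 0.0
    for a, b in zip(xs[:-1], ys[:-1]):
        partial_x += a
        partial_y += b
        if partial_x > partial_y + tol:
            return False
    return abs(sum(x) - sum(y)) <= tol


# The uniform vector is majorized by any vector with the same total.
uniform_below = is_majorized_by([2, 2, 2], [1, 2, 3])   # True
spread_below = is_majorized_by([1, 2, 3], [2, 2, 2])    # False
```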

preprint2013arXiv

Online Bayesian Passive-Aggressive Learning

Online Passive-Aggressive (PA) learning is an effective framework for performing max-margin online learning. But the deterministic formulation and estimated single large-margin model could limit its capability in discovering descriptive structures underlying complex data. This paper presents online Bayesian Passive-Aggressive (BayesPA) learning, which subsumes the online PA and extends naturally to incorporate latent variables and perform nonparametric Bayesian inference, thus providing great flexibility for explorative analysis. We apply BayesPA to topic modeling and derive efficient online learning algorithms for max-margin topic models. We further develop nonparametric methods to resolve the number of topics. Experimental results on real datasets show that our approaches significantly improve time efficiency while maintaining comparable results with the batch counterparts.
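
For orientation, the sketch below shows the classical, non-Bayesian PA update for binary classification that BayesPA is said to subsume: the weights stay unchanged when the hinge loss is zero ("passive") and otherwise move just far enough to satisfy the margin on the current example ("aggressive"). It does not implement BayesPA itself; the function name and the toy data are assumptions for illustration.

```python
def pa_update(w, x, y):
    """One classical Passive-Aggressive step for a label y in {-1, +1}.

    Hinge loss: l = max(0, 1 - y * <w, x>).
    Update:     w <- w + tau * y * x, with tau = l / ||x||^2.
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)
    norm_sq = sum(xi * xi for xi in x)
    if loss == 0.0 or norm_sq == 0.0:
        return list(w)  # passive: the margin is already satisfied
    tau = loss / norm_sq
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


def hinge_loss(w, x, y):
    return max(0.0, 1.0 - y * sum(wi * xi for wi, xi in zip(w, x)))


# Toy example: start from zero weights and see one positive example.
w_new = pa_update([0.0, 0.0], x=[1.0, 2.0], y=1)
```

After the update the hinge loss on the example just seen is zero, which is the defining property of this variant. Soft-margin variants that cap the step size also exist.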

preprint2012arXiv

Max-Margin Nonparametric Latent Feature Models for Link Prediction

We present a max-margin nonparametric latent feature model, which unites the ideas of max-margin learning and Bayesian nonparametrics to discover discriminative latent features for link prediction and automatically infer the unknown latent social dimension. By minimizing a hinge-loss using the linear expectation operator, we can perform posterior inference efficiently without dealing with a highly nonlinear link likelihood function; by using a fully-Bayesian formulation, we can avoid tuning regularization constants. Experimental results on real datasets appear to demonstrate the benefits inherited from max-margin learning and fully-Bayesian nonparametric inference.

preprint2012arXiv

Penalized maximum likelihood estimation and variable selection in geostatistics

We consider the problem of selecting covariates in spatial linear models with Gaussian process errors. Penalized maximum likelihood estimation (PMLE) that enables simultaneous variable selection and parameter estimation is developed and, for ease of computation, PMLE is approximated by one-step sparse estimation (OSE). To further improve computational efficiency, particularly with large sample sizes, we propose penalized maximum covariance-tapered likelihood estimation (PMLE$_{\mathrm{T}}$) and its one-step sparse estimation (OSE$_{\mathrm{T}}$). General forms of penalty functions with an emphasis on smoothly clipped absolute deviation are used for penalized maximum likelihood. Theoretical properties of PMLE and OSE, as well as their approximations PMLE$_{\mathrm{T}}$ and OSE$_{\mathrm{T}}$ using covariance tapering, are derived, including consistency, sparsity, asymptotic normality and the oracle properties. For covariance tapering, a by-product of our theoretical results is consistency and asymptotic normality of maximum covariance-tapered likelihood estimates. Finite-sample properties of the proposed methods are demonstrated in a simulation study and, for illustration, the methods are applied to analyze two real data sets.
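
The smoothly clipped absolute deviation (SCAD) penalty emphasized above has a simple closed form, sketched below in its standard Fan and Li parameterization with shape parameter a > 2 (a = 3.7 is the commonly used default). The function name and the example values are illustrative assumptions and are not taken from the paper.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(theta) for lam > 0 and a > 2.

    Linear (Lasso-like) near zero, quadratic in between, and constant
    for large |theta|, so large coefficients are not shrunk further.
    """
    if lam <= 0 or a <= 2:
        raise ValueError("SCAD requires lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


small = scad_penalty(0.5, lam=1.0)    # linear region
large = scad_penalty(10.0, lam=1.0)   # flat region
```

The three pieces join continuously at |theta| = lam and |theta| = a * lam.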

preprint2012arXiv

Sparse Topical Coding

We present sparse topical coding (STC), a non-probabilistic formulation of topic models for discovering latent representations of large collections of data. Unlike probabilistic topic models, STC relaxes the normalization constraint of admixture proportions and the constraint of defining a normalized likelihood function. Such relaxations allow STC to: 1) directly control the sparsity of inferred representations by using sparsity-inducing regularizers; 2) be seamlessly integrated with a convex error function (e.g., SVM hinge loss) for supervised learning; and 3) be efficiently learned with a simply structured coordinate descent algorithm. Our results demonstrate the advantages of STC and supervised MedSTC on identifying topical meanings of words and improving classification accuracy and time efficiency.

preprint2012arXiv

Unifying Treatment of Discord via Relative Entropy

A new form of zero-discord state via Petz's monotonicity condition on relative entropy with equality has been derived systematically. A generalization of symmetric zero-discord states is presented and the related physical implications are discussed.

preprint2011arXiv

Characterizations of all-derivable points in nest algebras

Let $\mathcal{A}$ be an operator algebra on a Hilbert space. We say that an element $G\in {\mathcal{A}}$ is an all-derivable point of ${\mathcal{A}}$ if every derivable linear mapping $ϕ$ at $G$ (i.e. $ϕ(ST)=ϕ(S)T+Sϕ(T)$ for any $S,T\in alg{\mathcal{N}}$ with $ST=G$) is a derivation. Suppose that $\mathcal{N}$ is a nontrivial complete nest on a Hilbert space $H$. We show in this paper that $G\in {alg\mathcal{N}}$ is an all-derivable point if and only if $G\neq0$.

preprint2011arXiv

Jordan Higher All-Derivable Points in Nest Algebras

Let $\mathcal{N}$ be a non-trivial and complete nest on a Hilbert space $H$. Suppose $d=\{d_n: n\in N\}$ is a family of linear mappings from Alg$\mathcal{N}$ into itself. We say that $d=\{d_n: n\in N\}$ is a Jordan higher derivable mapping at a given point $G$ if $d_{n}(ST+TS)=\sum\limits_{i+j=n}\{d_{i}(S)d_{j}(T)+d_{j}(T)d_{i}(S)\}$ for any $S,T\in Alg \mathcal{N}$ with $ST=G$. An element $G\in Alg \mathcal{N}$ is called a Jordan higher all-derivable point if every Jordan higher derivable mapping at $G$ is a higher derivation. In this paper, we mainly prove that any given point $G$ of Alg$\mathcal{N}$ is a Jordan higher all-derivable point. This extends some results in [Chen11] to the case of higher derivations.

preprint2011arXiv

Jordan higher all-derivable points in triangular algebras

Let ${\mathcal{T}}$ be a triangular algebra. We say that $D=\{D_{n}: n\in N\}\subseteq L({\mathcal{T}})$ is a Jordan higher derivable mapping at $G$ if $D_{n}(ST+TS)=\sum_{i+j=n}(D_{i}(S)D_{j}(T)+D_{i}(T)D_{j}(S))$ for any $S,T\in {\mathcal{T}}$ with $ST=G$. An element $G\in {\mathcal{T}}$ is called a Jordan higher all-derivable point of ${\mathcal{T}}$ if every Jordan higher derivable linear mapping $D=\{D_{n}\}_{n\in N}$ at $G$ is a higher derivation. In this paper, under some mild conditions on ${\mathcal{T}}$, we prove that some elements of ${\mathcal{T}}$ are Jordan higher all-derivable points. This extends some results in [6] to the case of Jordan higher derivations.

preprint2010arXiv

Photoluminescence from nanocrystalline graphite monofluoride

We synthesize and study the structural and optical properties of nanocrystalline graphene monofluoride and graphite monofluoride, which are carbon-based wide bandgap materials. Using laser excitations of 2.41 to 5.08 eV, we identify six emission modes of graphite monofluoride, spanning the visible spectrum from red to violet. The energy and linewidth of the modes point to defect-induced midgap states as the source of the photoemission. We discuss possible candidates. Our findings open the window to electro-optical applications of graphene fluoride.

preprint2009arXiv

MedLDA: A General Framework of Maximum Margin Supervised Topic Models

Supervised topic models utilize documents' side information for discovering predictive low-dimensional representations of documents. Existing models apply likelihood-based estimation. In this paper, we present a general framework of max-margin supervised topic models for both continuous and categorical response variables. Our approach, the maximum entropy discrimination latent Dirichlet allocation (MedLDA), utilizes the max-margin principle to train supervised topic models and estimate predictive topic representations that are arguably more suitable for prediction tasks. The general principle of MedLDA can be applied to perform joint max-margin learning and maximum likelihood estimation for arbitrary topic models, directed or undirected, and supervised or unsupervised, when the supervised side information is available. We develop efficient variational methods for posterior inference and parameter estimation, and demonstrate qualitatively and quantitatively the advantages of MedLDA over likelihood-based topic models on movie review and 20 Newsgroups data sets.

preprint2009arXiv

The Deposition of High-Quality HfO2 on Graphene and the Effect of Remote Oxide Phonon Scattering

We demonstrate the atomic layer deposition of a high-quality HfO2 film on graphene and report the magnitude of remote oxide phonon (ROP) scattering in dual-oxide graphene transistors. Top gates with a 30 nm HfO2 oxide layer exhibit excellent doping capacity of greater than $1.5\times10^{13}$/cm$^2$. The carrier mobility in HfO2-covered graphene reaches 20,000 cm$^2$/Vs at low temperature, which is the highest among oxide-covered graphene and is comparable to that of pristine samples. The temperature-dependent resistivity exhibits the effect of ROP scattering from both the SiO2 substrate and the HfO2 over-layer. At room temperature, surface phonon modes of the HfO2 film centered at 54 meV dominate and limit the carrier mobility to ~20,000 cm$^2$/Vs. Our results highlight the important choice of oxide in graphene devices.

110 published items

Physical Adversarial Clothing Evades Visible-Thermal Detectors via Non-Overlapping RGB-T Pattern

Audio Super-Resolution with Latent Bridge Models

DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

A Roadmap for Big Model

A Semismooth Newton based Augmented Lagrangian Method for Nonsmooth Optimization on Matrix Manifolds

BadDet: Backdoor Attacks on Object Detection

Boosting Transferability of Targeted Adversarial Examples via Hierarchical Generative Networks

Controllable Evaluation and Generation of Physical Adversarial Patch on Face Recognition

CoSCL: Cooperation of Small Continual Learners is Stronger than a Big One

DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

Estimating the Optimal Covariance with Imperfect Mean in Diffusion Probabilistic Models

Exploring Memorization in Adversarial Training

Fast Lossless Neural Compression with Integer-Only Discrete Flows

GSmooth: Certified Robustness against Semantic Transformations via Generalized Randomized Smoothing

Maximum Likelihood Training for Score-Based Diffusion ODEs by High-Order Denoising Score Matching

Memory Replay with Data Compression for Continual Learning

OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization

Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model

Query-Efficient Black-box Adversarial Attacks Guided by a Transfer-based Prior

Regret Analysis for Hierarchical Experts Bandit Problem

Robust Learning of Deep Time Series Anomaly Detection Models with Contaminated Training Data

Robustness and Accuracy Could Be Reconcilable by (Proper) Definition

Thompson Sampling for (Combinatorial) Pure Exploration

Tianshou: a Highly Modularized Deep Reinforcement Learning Library

Towards Job-Transition-Tag Graph for a Better Job Title Representation Learning

Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart

Cognitive Visual Inspection Service for LCD Manufacturing Industry

DNN2LR: Automatic Feature Crossing for Credit Scoring

Efficient Inference of Flexible Interaction in Spiking-neuron Networks

Fork or Fail: Cycle-Consistent Training with Many-to-One Mappings

ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning

Relaxed Conditional Image Transfer for Semi-supervised Domain Adaptation

Rethinking Natural Adversarial Examples for Classification Models

A Wasserstein Minimum Velocity Approach to Learning Unnormalized Models

Analyzing the Noise Robustness of Deep Neural Networks

Design and Interpretation of Universal Adversarial Patches in Face Detection

Dynamic Window-level Granger Causality of Multi-channel Time Series

Improving Black-box Adversarial Attacks with a Transfer-based Prior

Learning Implicit Generative Models by Teaching Explicit Ones

Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks

Nonparametric Score Estimators

Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness

SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models

Superconducting proximity effect in a transparent van der Waals superconductor-metal junction

To Relieve Your Headache of Training an MRF, Take AdVIL

Triple Generative Adversarial Networks

Triple Memory Networks: a Brain-Inspired Method for Continual Learning

Understanding and Stabilizing GANs' Training Dynamics with Control Theory

VFlow: More Expressive Generative Flows with Variational Data Augmentation

Ferromagnetic van der Waals compound MnSb$_{1.8}$Bi$_{0.2}$Te$_4$

A Model-Based Scatter Artifacts Correction for Cone Beam CT

Analysis and Design of Secure Massive MIMO Systems in the Presence of Hardware Impairments

Conditional Generative Moment-Matching Networks

DeePM: A Deep Part-Based Model for Object Detection and Semantic Part Localization

Fast Sampling for Bayesian Max-Margin Models

Generative Topic Embedding: a Continuous Representation of Documents (Extended Version with Proofs)

Kernel Bayesian Inference with Posterior Regularization

Learning Deep Generative Models with Doubly Stochastic MCMC

Learning to Generate with Memory

Low-Complexity QoS-Aware Coordinated Scheduling for Heterogenous Networks

Max-Margin Deep Generative Models for (Semi-)Supervised Learning

Max-Margin Nonparametric Latent Feature Models for Link Prediction

Nematic Quantum Critical Fluctuations in BaFe$_{2-x}$Ni$_x$As$_2$

Physical Layer Security for Massive MIMO Systems Impaired by Phase Noise

PSDVec: a Toolbox for Incremental and Scalable Word Embedding

Scaling up Dynamic Topic Models

Secure Massive MIMO Systems with Limited RF Chains

SeDMiD for Confusion Detection: Uncovering Mind State from Time Series Brain Wave Data

Spectral Learning for Supervised Topic Models

Statistically-estimated tree composition for the northeastern United States at the time of Euro-American settlement

Streaming Gibbs Sampling for LDA Model

Towards Better Analysis of Deep Convolutional Neural Networks

Using Edge-Preserving Algorithm with Non-local Mean for Significantly Improved Image-Domain Material Decomposition in Dual Energy CT