Source author record

Bo Han

Bo Han appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

65 works

24 topics

4 close collaborators

preprint · 2026 · arXiv

Beyond Rigid Alignment: Graph Federated Learning via Dual Manifold Calibration

Graph Federated Learning (GFL) enables collaborative representation learning across distributed subgraphs while preserving privacy. However, heterogeneity remains a critical challenge, as subgraphs across clients typically differ significantly in both semantics and structures. Existing methods address heterogeneity by enforcing the rigid alignment of model parameters or prototypes between clients and the server. However, these alignments implicitly rely on a restrictive global linearity assumption that summarizes local data distributions using a single and globally consistent representation space. This severely compresses the personalized representation space of clients and fails to preserve diverse local graph distributions. To overcome these limitations, we propose Federated Graph Manifold Calibration (FedGMC), a novel paradigm that tackles semantic heterogeneity and structural heterogeneity from a unified manifold perspective. Instead of enforcing rigid alignment, FedGMC introduces a dual manifold calibration mechanism that preserves global commonalities while maximizing the personalized representation space of local clients. Specifically, for semantic heterogeneity, the server constructs a geometrically optimal semantic manifold via equidistant semantic anchors, so as to guide the calibration of local semantic manifolds. For structural heterogeneity, the server constructs a global structural manifold by building global structural templates, so as to guide the calibration of local structural manifolds. Finally, the server dynamically refines both global semantic manifolds and structural manifolds by aggregating local manifolds. Extensive experiments on eleven homophilic and heterophilic graphs demonstrate that FedGMC effectively balances global commonality and local personalization, thereby significantly outperforming state-of-the-art baseline methods.
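
One standard way to obtain equidistant anchors is a regular simplex, in which every pair of unit vectors has the same inner product. The sketch below is an illustration under that assumption; the abstract does not state which construction FedGMC uses, and the names `simplex_anchors` and `dot` are hypothetical.

```python
import math

def simplex_anchors(num_classes):
    """Return num_classes unit vectors with identical pairwise inner products.

    Assumption: a regular simplex is used as the equidistant layout.
    """
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    # Each anchor is a one-hot vector with the mean removed, then rescaled to unit norm.
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]

def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

anchors = simplex_anchors(4)
```

With `k` anchors, each has unit norm and every distinct pair has inner product `-1/(k-1)`, so no class anchor is closer to one class than to another.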

preprint · 2023 · arXiv

Counterfactual Fairness with Partially Known Causal Graph

Fair machine learning aims to avoid treating individuals or sub-populations unfavourably based on sensitive attributes, such as gender and race. Those methods in fair machine learning that are built on causal inference ascertain discrimination and bias through causal effects. Though causality-based fair learning is attracting increasing attention, current methods assume the true causal graph is fully known. This paper proposes a general method to achieve the notion of counterfactual fairness when the true causal graph is unknown. To be able to select features that lead to counterfactual fairness, we derive the conditions and algorithms to identify ancestral relations between variables on a Partially Directed Acyclic Graph (PDAG), specifically, a class of causal DAGs that can be learned from observational data combined with domain knowledge. Interestingly, we find that counterfactual fairness can be achieved as if the true causal graph were fully known, when specific background knowledge is provided: the sensitive attributes do not have ancestors in the causal graph. Results on both simulated and real-world datasets demonstrate the effectiveness of our method.
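
The feature-selection idea can be illustrated on the simpler case of a fully directed graph: a predictor that uses only non-descendants of the sensitive attribute is counterfactually fair. The sketch below assumes every edge direction is known, which is exactly what the paper relaxes; the toy graph and the function names are hypothetical.

```python
def descendants(graph, source):
    """Collect every node reachable from source.

    graph maps a node to the list of its children (assumed fully directed DAG).
    """
    seen = set()
    stack = [source]
    while stack:
        node = stack.pop()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def fair_features(graph, sensitive, features):
    """Keep only features that are not descendants of the sensitive attribute."""
    blocked = descendants(graph, sensitive)
    return [f for f in features if f != sensitive and f not in blocked]

# Hypothetical toy graph: gender -> job_title -> salary, age -> salary, age -> experience
toy_graph = {
    "gender": ["job_title"],
    "job_title": ["salary"],
    "age": ["salary", "experience"],
}
selected = fair_features(toy_graph, "gender", ["job_title", "salary", "age", "experience"])
```

On a PDAG some edges are undirected, so the reachability test above would have to be replaced by the ancestral-relation conditions the abstract refers to.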

preprint · 2023 · arXiv

FRAS: Federated Reinforcement Learning empowered Adaptive Point Cloud Video Streaming

Point cloud video transmission is challenging due to high encoding/decoding complexity, high video bitrate, and low latency requirements. Consequently, conventional adaptive streaming methods fall short in three respects: 1) current algorithms reuse existing quality of experience (QoE) definitions while overlooking the unique features of point cloud video, thus failing to provide an optimal user experience; 2) most deep learning approaches require long-span data collection to learn sufficiently varied network conditions, resulting in long training periods and capacity occupation; 3) cloud training approaches pose privacy risks caused by leakage of user-reported service usage and networking conditions. To overcome these limitations, we present FRAS, the first federated reinforcement learning framework, to the best of our knowledge, for adaptive point cloud video streaming. We define a new QoE model which takes the unique features of point cloud video into account. Each client uses reinforcement learning (RL) to train video quality selection with the objective of optimizing the user's QoE under multiple constraints. Then, a federated learning framework is integrated with the RL algorithm to enhance training performance with privacy preservation. Extensive simulations using real point cloud videos and network traces reveal the superiority of the proposed scheme over baseline schemes. We also implement a prototype that demonstrates the performance of FRAS via real-world tests.

preprint · 2022 · arXiv

Bilateral Dependency Optimization: Defending Against Model-inversion Attacks

Using only a well-trained classifier, model-inversion (MI) attacks can recover the data used for training the classifier, leading to the privacy leakage of the training data. To defend against MI attacks, previous work utilizes a unilateral dependency optimization strategy, i.e., minimizing the dependency between inputs (i.e., features) and outputs (i.e., labels) while training the classifier. However, such a minimization process conflicts with minimizing the supervised loss that aims to maximize the dependency between inputs and outputs, causing an explicit trade-off between model robustness against MI attacks and model utility on classification tasks. In this paper, we aim to minimize the dependency between the latent representations and the inputs while maximizing the dependency between latent representations and the outputs, named a bilateral dependency optimization (BiDO) strategy. In particular, we use the dependency constraints as a universally applicable regularizer in addition to commonly used losses for deep neural networks (e.g., cross-entropy), which can be instantiated with appropriate dependency criteria according to different tasks. To verify the efficacy of our strategy, we propose two implementations of BiDO, by using two different dependency measures: BiDO with constrained covariance (BiDO-COCO) and BiDO with Hilbert-Schmidt Independence Criterion (BiDO-HSIC). Experiments show that BiDO achieves state-of-the-art defense performance for a variety of datasets, classifiers, and MI attacks while suffering only a minor classification-accuracy drop compared to the well-trained classifier with no defense, which opens a novel road to defending against MI attacks.
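
The bilateral objective can be sketched with the standard biased HSIC estimator, trace(KHLH)/(n-1)^2, computed from Gaussian Gram matrices. This is an illustration only: the kernel bandwidth, the weights `lam_x` and `lam_y`, and the function names are assumptions, and the paper's exact loss may differ.

```python
import math

def gaussian_gram(points, sigma=1.0):
    """Gram matrix of a Gaussian kernel over a list of vectors."""
    n = len(points)
    return [
        [
            math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) / (2 * sigma ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]

def center(gram):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - 11^T/n."""
    n = len(gram)
    row_mean = [sum(row) / n for row in gram]
    col_mean = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [gram[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]

def hsic(xs, ys, sigma=1.0):
    """Biased HSIC estimate between paired samples xs and ys."""
    n = len(xs)
    k_centered = center(gaussian_gram(xs, sigma))
    l_gram = gaussian_gram(ys, sigma)
    trace = sum(k_centered[i][j] * l_gram[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2

def bido_style_loss(cross_entropy, hsic_input_latent, hsic_latent_label, lam_x, lam_y):
    """Penalize input-latent dependency and reward latent-label dependency."""
    return cross_entropy + lam_x * hsic_input_latent - lam_y * hsic_latent_label

samples = [[0.0], [1.0], [2.0], [3.0]]
```

A variable paired with a constant has zero estimated dependency, while a variable paired with itself has a strictly positive one, which is the behavior the regularizer relies on.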

preprint · 2022 · arXiv

Brightening of a dark monolayer semiconductor via strong light-matter coupling in a cavity

Engineering the properties of quantum materials via strong light-matter coupling is a compelling research direction with a multiplicity of modern applications. These range from modifying charge transport in organic molecules and steering particle correlations and interactions to controlling chemical reactions. Here, we study the modification of the material properties via strong coupling and demonstrate an effective inversion of the excitonic band-ordering in a monolayer of WSe2 with a spin-forbidden, optically dark ground state. In our experiments, we harness the strong light-matter coupling between the cavity photon and the high-energy, spin-allowed bright exciton, thus creating two bright polaritonic modes in the optical bandgap with the lower polariton mode pushed below the WSe2 dark state. We demonstrate that in this regime the commonly observed luminescence quenching stemming from the fast relaxation to the dark ground state is prevented, which results in the brightening of this intrinsically dark material. We probe this effective brightening by temperature-dependent photoluminescence, and we find excellent agreement with a theoretical model accounting for the inversion of the band ordering and phonon-assisted polariton relaxation.

preprint · 2022 · arXiv

CausalAdv: Adversarial Robustness through the Lens of Causality

The adversarial vulnerability of deep neural networks has attracted significant attention in machine learning. As causal reasoning has an instinct for modelling distribution change, it is essential to incorporate causality into analyzing this specific type of distribution change induced by adversarial attacks. However, causal formulations of the intuition of adversarial attacks and the development of robust DNNs are still lacking in the literature. To bridge this gap, we construct a causal graph to model the generation process of adversarial examples and define the adversarial distribution to formalize the intuition of adversarial attacks. From the causal perspective, we study the distinction between the natural and adversarial distribution and conclude that the origin of adversarial vulnerability is the focus of models on spurious correlations. Inspired by the causal understanding, we propose the Causal inspired Adversarial distribution alignment method, CausalAdv, to eliminate the difference between natural and adversarial distributions by considering spurious correlations. Extensive experiments demonstrate the efficacy of the proposed method. Our work is the first attempt towards using causality to understand and mitigate the adversarial vulnerability.

preprint · 2022 · arXiv

Contrastive Learning with Boosted Memorization

Self-supervised learning has achieved great success in the representation learning of visual and textual data. However, the current methods are mainly validated on well-curated datasets, which do not exhibit the real-world long-tailed distribution. Recent attempts to consider self-supervised long-tailed learning are made by rebalancing in the loss perspective or the model perspective, resembling the paradigms in supervised long-tailed learning. Nevertheless, without the aid of labels, these explorations have not shown the expected significant promise due to the limitation in tail sample discovery or the heuristic structure design. Different from previous works, we explore this direction from an alternative perspective, i.e., the data perspective, and propose a novel Boosted Contrastive Learning (BCL) method. Specifically, BCL leverages the memorization effect of deep neural networks to automatically drive the information discrepancy of the sample views in contrastive learning, which is more efficient to enhance the long-tailed learning in the label-unaware context. Extensive experiments on a range of benchmark datasets demonstrate the effectiveness of BCL over several state-of-the-art methods. Our code is available at https://github.com/MediaBrain-SJTU/BCL.

preprint · 2022 · arXiv

DeepMix: Mobility-aware, Lightweight, and Hybrid 3D Object Detection for Headsets

Mobile headsets should be capable of understanding 3D physical environments to offer a truly immersive experience for augmented/mixed reality (AR/MR). However, their small form factor and limited computation resources make it extremely challenging to execute 3D vision algorithms in real time, as these are known to be more compute-intensive than their 2D counterparts. In this paper, we propose DeepMix, a mobility-aware, lightweight, and hybrid 3D object detection framework for improving the user experience of AR/MR on mobile headsets. Motivated by our analysis and evaluation of state-of-the-art 3D object detection models, DeepMix intelligently combines edge-assisted 2D object detection and novel, on-device 3D bounding box estimations that leverage depth data captured by headsets. This leads to low end-to-end latency and significantly boosts detection accuracy in mobile scenarios. A unique feature of DeepMix is that it fully exploits the mobility of headsets to fine-tune detection results and boost detection accuracy. To the best of our knowledge, DeepMix is the first 3D object detection framework that achieves 30 FPS (an end-to-end latency much lower than the stringent 100 ms requirement of interactive AR/MR). We implement a prototype of DeepMix on Microsoft HoloLens and evaluate its performance via both extensive controlled experiments and a user study with 30+ participants. DeepMix not only improves detection accuracy by 9.1–37.3% but also reduces end-to-end latency by 2.68–9.15x, compared to the baseline that uses existing 3D object detection models.

preprint · 2022 · arXiv

Device-Cloud Collaborative Recommendation via Meta Controller

On-device machine learning enables the lightweight deployment of recommendation models in local clients, which reduces the burden of the cloud-based recommenders and simultaneously incorporates more real-time user features. Nevertheless, cloud-based recommendation remains very important in industry, considering its powerful model capacity and the efficient candidate generation from the billion-scale item pool. Previous attempts to integrate the merits of both paradigms mainly resort to a sequential mechanism, which builds the on-device recommender on top of the cloud-based recommendation. However, such a design is inflexible when user interests change dramatically: the on-device model is limited by the small item cache, while the cloud-based recommendation based on the large item pool does not respond without new refresh feedback. To overcome this issue, we propose a meta controller to dynamically manage the collaboration between the on-device recommender and the cloud-based recommender, and introduce a novel efficient sample construction from the causal perspective to solve the dataset absence issue of the meta controller. On the basis of the counterfactual samples and the extended training, extensive experiments in industrial recommendation scenarios show the promise of the meta controller in device-cloud collaboration.

preprint · 2022 · arXiv

Direct observation of local antiferroelectricity induced phonon softening at a SrTiO3 defect

Defects in oxides usually exhibit exotic properties that may be associated with the local lattice dynamics. Here, at atomic spatial resolution, we directly measure phonon modes of an antiphase boundary (APB) in a freestanding SrTiO3 membrane and correlate them with the picometer-level structural distortion. We find that the SrTiO3 APB introduces new defect phonon modes that are absent in bulk SrTiO3. These modes are highly sensitive to the subtle structural distortion, i.e., the SrTiO3 APB generates local electric dipoles forming an antiferroelectric configuration, which significantly softens the transverse optical (TO) and longitudinal optical (LO) modes at the Γ point. Correlating the local phonons with the subtle structural distortion, our findings provide valuable insights into understanding the defect properties in complex oxides and essential information for their applications such as thermoelectric devices.

preprint · 2022 · arXiv

Do We Need to Penalize Variance of Losses for Learning with Label Noise?

Algorithms which minimize the averaged loss have been widely designed for dealing with noisy labels. Intuitively, when there is a finite training sample, penalizing the variance of losses will improve the stability and generalization of the algorithms. Interestingly, we found that the variance should be increased for the problem of learning with noisy labels. Specifically, increasing the variance will boost the memorization effects and reduce the harmfulness of incorrect labels. By exploiting the label noise transition matrix, regularizers can be easily designed to increase the variance of losses and be plugged into many existing algorithms. Empirically, the proposed method, which increases the variance of losses, significantly improves the generalization ability of baselines on both synthetic and real-world datasets.
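
The direction of the effect can be shown with a toy objective that subtracts a variance term from the mean loss, so that a larger spread of per-sample losses lowers the objective. This is only a sketch of the sign convention: the paper builds its regularizer from the label noise transition matrix, which is not reproduced here, and `alpha` and the function name are hypothetical.

```python
from statistics import mean, pvariance

def variance_rewarding_objective(losses, alpha):
    """Mean loss minus alpha times the population variance of the losses.

    alpha > 0 rewards variance; alpha < 0 would recover the usual variance penalty.
    """
    return mean(losses) - alpha * pvariance(losses)

uniform_losses = [1.0, 1.0, 1.0, 1.0]   # mean 1.0, variance 0.0
spread_losses = [0.0, 2.0, 0.0, 2.0]    # mean 1.0, variance 1.0
```

Both loss vectors share the same mean, so any difference in the objective comes purely from the variance term.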

preprint · 2022 · arXiv

EAGAN: Efficient Two-stage Evolutionary Architecture Search for GANs

Generative adversarial networks (GANs) have proven successful in image generation tasks. However, GAN training is inherently unstable. Although many works try to stabilize it by manually modifying GAN architecture, it requires much expertise. Neural architecture search (NAS) has become an attractive solution to search GANs automatically. The early NAS-GANs search only generators to reduce search complexity but lead to a sub-optimal GAN. Some recent works try to search both generator (G) and discriminator (D), but they suffer from the instability of GAN training. To alleviate the instability, we propose an efficient two-stage evolutionary algorithm-based NAS framework to search GANs, namely EAGAN. We decouple the search of G and D into two stages, where stage-1 searches G with a fixed D and adopts the many-to-one training strategy, and stage-2 searches D with the optimal G found in stage-1 and adopts the one-to-one training and weight-resetting strategies to enhance the stability of GAN training. Both stages use the non-dominated sorting method to produce Pareto-front architectures under multiple objectives (e.g., model size, Inception Score (IS), and Fréchet Inception Distance (FID)). EAGAN is applied to the unconditional image generation task and can efficiently finish the search on the CIFAR-10 dataset in 1.2 GPU days. Our searched GANs achieve competitive results (IS=8.81±0.10, FID=9.91) on the CIFAR-10 dataset and surpass prior NAS-GANs on the STL-10 dataset (IS=10.44±0.087, FID=22.18). Source code: https://github.com/marsggbo/EAGAN.
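
Non-dominated sorting itself is a generic procedure and can be sketched independently of the search framework. The example below assumes all objectives are minimized, so Inception Score is negated; the candidate values are made up for illustration and are not results from the paper.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(points):
    """Split points into successive Pareto fronts (front 0 is the best)."""
    remaining = list(points)
    fronts = []
    while remaining:
        front = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts

def to_minimization(model_size, inception_score, fid):
    """Negate IS so that every objective is 'lower is better'."""
    return (model_size, -inception_score, fid)

# Hypothetical candidates: (model size, IS, FID)
candidates = [
    to_minimization(10.0, 8.5, 12.0),
    to_minimization(12.0, 8.8, 10.0),
    to_minimization(15.0, 8.4, 13.0),
]
fronts = non_dominated_sort(candidates)
```

The first two candidates trade size against quality and so share the first front, while the third is worse on every objective and falls into the second.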

preprint · 2022 · arXiv

Estimating Instance-dependent Bayes-label Transition Matrix using a Deep Neural Network

In label-noise learning, estimating the transition matrix is a hot topic as the matrix plays an important role in building statistically consistent classifiers. Traditionally, the transition from clean labels to noisy labels (i.e., clean-label transition matrix (CLTM)) has been widely exploited to learn a clean label classifier by employing the noisy data. Motivated by the fact that classifiers mostly output Bayes optimal labels for prediction, in this paper, we directly model the transition from Bayes optimal labels to noisy labels (i.e., Bayes-label transition matrix (BLTM)) and learn a classifier to predict Bayes optimal labels. Note that given only noisy data, it is ill-posed to estimate either the CLTM or the BLTM. Favorably, however, Bayes optimal labels have less uncertainty than clean labels, i.e., the class posteriors of Bayes optimal labels are one-hot vectors while those of clean labels are not. This brings two advantages for estimating the BLTM: (a) a set of examples with theoretically guaranteed Bayes optimal labels can be collected out of noisy data; (b) the feasible solution space is much smaller. By exploiting these advantages, we estimate the BLTM parametrically by employing a deep neural network, leading to better generalization and superior classification performance.
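
The role of a transition matrix can be sketched with a single fixed matrix: row i gives the probability that an example whose Bayes optimal label is i receives each noisy label. In the paper the matrix depends on the instance and is produced by a neural network; the fixed matrix and the function name below are assumptions made for illustration.

```python
def noisy_posterior(bayes_posterior, transition):
    """Map a posterior over Bayes optimal labels to a posterior over noisy labels.

    transition[i][j] = P(noisy label = j | Bayes optimal label = i).
    """
    num_classes = len(bayes_posterior)
    return [
        sum(bayes_posterior[i] * transition[i][j] for i in range(num_classes))
        for j in range(num_classes)
    ]

# Hypothetical 3-class transition matrix; each row sums to 1.
transition = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
]
one_hot = [1.0, 0.0, 0.0]  # Bayes optimal posteriors are one-hot, as the abstract notes
predicted_noisy = noisy_posterior(one_hot, transition)
```

Because the Bayes optimal posterior is one-hot, the noisy posterior is simply the corresponding row of the matrix, which is one way to see why the feasible solution space shrinks.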

preprint · 2022 · arXiv

Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack

AutoAttack (AA) has been the most reliable method to evaluate adversarial robustness when considerable computational resources are available. However, the high computational cost (e.g., 100 times more than that of the projected gradient descent attack) makes AA infeasible for practitioners with limited computational resources, and also hinders applications of AA in adversarial training (AT). In this paper, we propose a novel method, the minimum-margin (MM) attack, to evaluate adversarial robustness quickly and reliably. Compared with AA, our method achieves comparable performance but costs only 3% of the computational time in extensive experiments. The reliability of our method lies in evaluating the quality of adversarial examples using the margin between two targets, which can precisely identify the most adversarial example. The computational efficiency of our method lies in an effective Sequential TArget Ranking Selection (STARS) method, ensuring that the cost of the MM attack is independent of the number of classes. The MM attack opens a new way of evaluating adversarial robustness and provides a feasible and reliable way to generate high-quality adversarial examples in AT.
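
The idea of ranking adversarial candidates by a margin can be sketched with a plain logit margin: the true-class logit minus the largest other logit. This is an assumption for illustration; the abstract does not give the exact margin definition or the STARS procedure, and the function names are hypothetical.

```python
def logit_margin(logits, true_label):
    """True-class logit minus the best competing logit; negative means misclassified."""
    best_other = max(value for index, value in enumerate(logits) if index != true_label)
    return logits[true_label] - best_other

def most_adversarial(candidate_logits, true_label):
    """Pick the candidate whose margin is smallest, i.e. the most adversarial one."""
    return min(candidate_logits, key=lambda logits: logit_margin(logits, true_label))

# Hypothetical logits produced by two perturbed versions of the same input
candidates = [
    [2.0, 0.5, 1.0],   # still classified as class 0
    [1.0, 1.5, 0.2],   # flipped to class 1
]
worst_case = most_adversarial(candidates, true_label=0)
```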

preprint · 2022 · arXiv

FedNoiL: A Simple Two-Level Sampling Method for Federated Learning with Noisy Labels

Federated learning (FL) aims at training a global model on the server side while the training data are collected and located at the local devices. Hence, the labels in practice are usually annotated by clients of varying expertise or criteria and thus contain different amounts of noise. Local training on noisy labels can easily result in overfitting to noisy labels, which is devastating to the global model through aggregation. Although recent robust FL methods take malicious clients into account, they have not addressed local noisy labels on each device and their impact on the global model. In this paper, we develop a simple two-level sampling method "FedNoiL" that (1) selects clients for more robust global aggregation on the server; and (2) selects clean labels and correct pseudo-labels at the client end for more robust local training. The sampling probabilities are built upon clean label detection by the global model. Moreover, we investigate different schedules for changing the local epochs between aggregations over the course of FL, which notably improves the communication and computation efficiency in the noisy label setting. In experiments with homogeneous/heterogeneous data distributions and noise ratios, we observed that direct combinations of SOTA FL methods with SOTA noisy-label learning methods can easily fail but our method consistently achieves better and more robust performance.

preprint · 2022 · arXiv

From the synthesis of hBN crystals to their use as nanosheets for optoelectronic devices

In the wide world of 2D materials, hexagonal boron nitride (hBN) holds a special place due to its excellent characteristics. In addition to its thermal, chemical and mechanical stability, hBN demonstrates high thermal conductivity, low compressibility, and a wide band gap of around 6 eV, making it a promising candidate for many groundbreaking applications and more specifically for optoelectronic devices. Millimeter-scale hexagonal boron nitride crystals are obtained through a disruptive dual method (PDC/PCS) consisting of a complementary coupling of the Polymer Derived Ceramics route and a Pressure-Controlled Sintering process. In addition to their excellent chemical and crystalline quality, these crystals exhibit a free exciton lifetime of 0.43 ns, as determined by time-resolved cathodoluminescence measurements, confirming their interesting optical properties. To go further in applicative fields, hBN crystals are then exfoliated, and the resulting Boron Nitride NanoSheets (BNNSs) are used to encapsulate transition metal dichalcogenides (TMDs). Such van der Waals heterostructures are tested by optical spectroscopy. BNNSs do not luminesce in the emission spectral range of TMDs, and the photoluminescence width of the exciton at 4 K is in the range 2-3 meV. All these results demonstrate that these BNNSs are relevant for future optoelectronic applications.

preprint · 2022 · arXiv

Improving Adversarial Robustness via Mutual Information Estimation

Deep neural networks (DNNs) are found to be vulnerable to adversarial noise. They are typically misled by adversarial samples to make wrong predictions. To alleviate this negative effect, in this paper, we investigate the dependence between outputs of the target model and input adversarial samples from the perspective of information theory, and propose an adversarial defense method. Specifically, we first measure the dependence by estimating the mutual information (MI) between outputs and the natural patterns of inputs (called natural MI) and MI between outputs and the adversarial patterns of inputs (called adversarial MI), respectively. We find that adversarial samples usually have larger adversarial MI and smaller natural MI compared with those w.r.t. natural samples. Motivated by this observation, we propose to enhance the adversarial robustness by maximizing the natural MI and minimizing the adversarial MI during the training process. In this way, the target model is expected to pay more attention to the natural pattern that contains objective semantics. Empirical evaluations demonstrate that our method could effectively improve the adversarial accuracy against multiple attacks.

preprint · 2022 · arXiv

Instance-dependent Label-noise Learning under a Structural Causal Model

Label noise degrades the performance of deep learning algorithms because deep neural networks easily overfit label errors. Let X and Y denote the instance and clean label, respectively. When Y is a cause of X, according to which many datasets have been constructed, e.g., SVHN and CIFAR, the distributions of P(X) and P(Y|X) are entangled. This means that the unsupervised instances are helpful to learn the classifier and thus reduce the side effect of label noise. However, it remains unclear how to exploit the causal information to handle the label noise problem. In this paper, by leveraging a structural causal model, we propose a novel generative approach for instance-dependent label-noise learning. In particular, we show that properly modeling the instances will contribute to the identifiability of the label noise transition matrix and thus lead to a better classifier. Empirically, our method outperforms all state-of-the-art methods on both synthetic and real-world label-noise datasets.

preprint · 2022 · arXiv

Instance-Dependent Label-Noise Learning with Manifold-Regularized Transition Matrix Estimation

In label-noise learning, estimating the transition matrix has attracted more and more attention as the matrix plays an important role in building statistically consistent classifiers. However, it is very challenging to estimate the transition matrix T(x), where x denotes the instance, because it is unidentifiable under instance-dependent noise (IDN). To address this problem, we note that there is psychological and physiological evidence showing that humans are more likely to annotate instances of similar appearance to the same classes, and thus poor-quality or ambiguous instances of similar appearance are more easily mislabeled to the correlated or same noisy classes. Therefore, we propose an assumption on the geometry of T(x) that "the closer two instances are, the more similar their corresponding transition matrices should be". More specifically, we formulate the above assumption as a manifold embedding, to effectively reduce the degrees of freedom of T(x) and make it stably estimable in practice. The proposed manifold-regularized technique works by directly reducing the estimation error without hurting the approximation error of the estimation problem for T(x). Experimental evaluations on four synthetic and two real-world datasets demonstrate that our method is superior to state-of-the-art approaches for label-noise learning under the challenging IDN.

preprint · 2022 · arXiv

Learning with Multiple Complementary Labels

A complementary label (CL) simply indicates an incorrect class of an example, but learning with CLs results in multi-class classifiers that can predict the correct class. Unfortunately, the problem setting only allows a single CL for each example, which notably limits its potential since our labelers may easily identify multiple CLs (MCLs) to one example. In this paper, we propose a novel problem setting to allow MCLs for each example and two ways for learning with MCLs. In the first way, we design two wrappers that decompose MCLs into many single CLs, so that we could use any method for learning with CLs. However, the supervision information that MCLs hold is conceptually diluted after decomposition. Thus, in the second way, we derive an unbiased risk estimator; minimizing it processes each set of MCLs as a whole and possesses an estimation error bound. We further improve the second way into minimizing properly chosen upper bounds. Experiments show that the former way works well for learning with MCLs but the latter is even better.
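
The first approach, decomposing each set of complementary labels into single ones, can be sketched in a few lines. The abstract does not describe how the two wrappers differ, so the version below is a generic illustration; the function names and the toy data are hypothetical.

```python
def decompose(examples):
    """Turn (x, set of complementary labels) pairs into (x, single complementary label) pairs."""
    return [(x, label) for x, labels in examples for label in sorted(labels)]

def candidate_labels(complementary_labels, num_classes):
    """Classes not ruled out by the complementary labels; the true class is among them."""
    return [c for c in range(num_classes) if c not in complementary_labels]

# Hypothetical data: example "a" is known not to be class 0 or 2, "b" not class 1
examples = [("a", {2, 0}), ("b", {1})]
single_cl_examples = decompose(examples)
```

After decomposition each pair is treated independently, which is where the joint information carried by a whole set of complementary labels gets diluted.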

preprint · 2022 · arXiv

Low-rank Tensor Learning with Nonconvex Overlapped Nuclear Norm Regularization

Nonconvex regularization has been popularly used in low-rank matrix learning. However, extending it for low-rank tensor learning is still computationally expensive. To address this problem, we develop an efficient solver for use with a nonconvex extension of the overlapped nuclear norm regularizer. Based on the proximal average algorithm, the proposed algorithm can avoid expensive tensor folding/unfolding operations. A special "sparse plus low-rank" structure is maintained throughout the iterations, and allows fast computation of the individual proximal steps. Empirical convergence is further improved with the use of adaptive momentum. We provide convergence guarantees to critical points on smooth losses and also on objectives satisfying the Kurdyka-Łojasiewicz condition. While the optimization problem is nonconvex and nonsmooth, we show that its critical points still have good statistical performance on the tensor completion problem. Experiments on various synthetic and real-world data sets show that the proposed algorithm is efficient in both time and space and more accurate than the existing state-of-the-art.

preprint · 2022 · arXiv

Meta Discovery: Learning to Discover Novel Classes given Very Limited Data

In novel class discovery (NCD), we are given labeled data from seen classes and unlabeled data from unseen classes, and we train clustering models for the unseen classes. However, the implicit assumptions behind NCD are still unclear. In this paper, we demystify assumptions behind NCD and find that high-level semantic features should be shared among the seen and unseen classes. Based on this finding, NCD is theoretically solvable under certain assumptions and can be naturally linked to meta-learning that has exactly the same assumption as NCD. Thus, we can empirically solve the NCD problem by meta-learning algorithms after slight modifications. This meta-learning-based methodology significantly reduces the amount of unlabeled data needed for training and makes it more practical, as demonstrated in experiments. The use of very limited data is also justified by the application scenario of NCD: since it is unnatural to label only seen-class data, NCD is sampling instead of labeling in causality. Therefore, unseen-class data should be collected on the way of collecting seen-class data, which is why they are novel and first need to be clustered.

preprint · 2022 · arXiv

Modeling Adversarial Noise for Adversarial Training

Deep neural networks have been demonstrated to be vulnerable to adversarial noise, prompting the development of defenses against adversarial attacks. Motivated by the fact that adversarial noise contains well-generalizing features and that the relationship between adversarial data and natural data can help infer natural data and make reliable predictions, in this paper, we model adversarial noise by learning the transition relationship between adversarial labels (i.e., the flipped labels used to generate adversarial data) and natural labels (i.e., the ground truth labels of the natural data). Specifically, we introduce an instance-dependent transition matrix to relate adversarial labels and natural labels, which can be seamlessly embedded with the target model (enabling us to model stronger adaptive adversarial noise). Empirical evaluations demonstrate that our method could effectively improve adversarial accuracy.

preprint2022arXiv

MSR: Making Self-supervised learning Robust to Aggressive Augmentations

Most recent self-supervised learning methods learn visual representation by contrasting different augmented views of images. Compared with supervised learning, more aggressive augmentations have been introduced to further improve the diversity of training pairs. However, aggressive augmentations may distort images' structures, leading to a severe semantic shift problem: augmented views of the same image may not share the same semantics, which degrades the transfer performance. To address this problem, we propose a new SSL paradigm, which counteracts the impact of semantic shift by balancing the role of weak and aggressively augmented pairs. Specifically, semantically inconsistent pairs are a minority, and we treat them as noisy pairs. Note that deep neural networks (DNNs) have a crucial memorization effect: DNNs tend to first memorize clean (majority) examples before overfitting to noisy (minority) examples. Therefore, we set a relatively large weight for aggressively augmented data pairs at the early learning stage. As training proceeds, the model begins to overfit noisy pairs. Accordingly, we gradually reduce the weights of aggressively augmented pairs. In doing so, our method can better embrace the aggressive augmentations and neutralize the semantic shift problem. Experiments show that our model achieves 73.1% top-1 accuracy on ImageNet-1K with ResNet-50 for 200 epochs, which is a 2.5% improvement over BYOL. Moreover, experiments also demonstrate that the learned representations can transfer well for various downstream tasks.

preprint2022arXiv

Nebula: Reliable Low-latency Video Transmission for Mobile Cloud Gaming

Mobile cloud gaming enables high-end games on constrained devices by streaming the game content from powerful servers through mobile networks. Mobile networks suffer from highly variable bandwidth, latency, and losses that affect the gaming experience. This paper introduces Nebula, an end-to-end cloud gaming framework to minimize the impact of network conditions on the user experience. Nebula relies on an end-to-end distortion model adapting the video source rate and the amount of frame-level redundancy based on the measured network conditions. As a result, it minimizes the motion-to-photon (MTP) latency while protecting the frames from losses. We fully implement Nebula and evaluate its performance against the state of the art techniques and latest research in real-time mobile cloud gaming transmission on a physical testbed over emulated and real wireless networks. Nebula consistently balances MTP latency (<140 ms) and visual quality (>31 dB) even in highly variable environments. A user experiment confirms that Nebula maximizes the user experience with high perceived video quality, playability, and low user load.

preprint2022arXiv

NoiLIn: Improving Adversarial Training and Correcting Stereotype of Noisy Labels

Adversarial training (AT), formulated as a minimax optimization problem, can effectively enhance a model's robustness against adversarial attacks. Existing AT methods have mainly focused on manipulating the inner maximization to generate quality adversarial variants or manipulating the outer minimization to design effective learning objectives. However, empirical results of AT always exhibit robustness at odds with accuracy and the existence of the cross-over mixture problem, which motivates us to study whether some label randomness can benefit AT. First, we thoroughly investigate noisy label (NL) injection into AT's inner maximization and outer minimization, respectively, and observe when NL injection benefits AT. Second, based on these observations, we propose a simple but effective method, NoiLIn, that randomly injects NLs into training data at each training epoch and dynamically increases the NL injection rate once robust overfitting occurs. Empirically, NoiLIn can significantly mitigate AT's undesirable issue of robust overfitting and even further improve the generalization of state-of-the-art AT methods. Philosophically, NoiLIn sheds light on a new perspective of learning with NLs: NLs should not always be deemed detrimental, and even in the absence of NLs in the training set, we may consider injecting them deliberately. Code is available at https://github.com/zjfheart/NoiLIn.
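The injection step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the symmetric flipping rule, the function names `inject_noisy_labels` and `update_rate`, and the `step` and `max_rate` values are all assumptions made here for the example.

```python
import numpy as np

def inject_noisy_labels(labels, rate, num_classes, rng):
    """Return a copy of `labels` with a fraction `rate` of entries flipped.

    Assumption: each selected label is flipped to a uniformly chosen
    *different* class (symmetric noise); the abstract does not fix this.
    """
    labels = np.asarray(labels)
    noisy = labels.copy()
    n_flip = int(round(rate * len(labels)))
    idx = rng.choice(len(labels), size=n_flip, replace=False)
    # An offset in [1, num_classes - 1] modulo num_classes always changes the label.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[idx] = (labels[idx] + offsets) % num_classes
    return noisy

def update_rate(rate, robust_overfitting, step=0.05, max_rate=0.5):
    """Hypothetical schedule: raise the rate once robust overfitting is detected."""
    return min(rate + step, max_rate) if robust_overfitting else rate
```

In a training loop, `inject_noisy_labels` would be called once per epoch on the clean labels, and `update_rate` after each evaluation of robust accuracy.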

preprint2022arXiv

Pluralistic Image Completion with Probabilistic Mixture-of-Experts

Pluralistic image completion focuses on generating both visually realistic and diverse results for image completion. Prior methods have enjoyed empirical success on this task. However, the constraints they use for pluralistic image completion are arguably hard to interpret and unsatisfactory in two respects. First, the constraints for visual reality can be weakly correlated to the objective of image completion or even redundant. Second, the constraints for diversity are designed to be task-agnostic, which causes the constraints to not work well. In this paper, to address the issues, we propose an end-to-end probabilistic method. Specifically, we introduce a unified probabilistic graph model that represents the complex interactions in image completion. The entire procedure of image completion is then mathematically divided into several sub-procedures, which helps efficient enforcement of constraints. The sub-procedure directly related to pluralistic results is identified, where the interaction is established by a Gaussian mixture model (GMM). The inherent parameters of GMM are task-related, which are optimized adaptively during training, while the number of its primitives can control the diversity of results conveniently. We formally establish the effectiveness of our method and demonstrate it with comprehensive experiments.

preprint2022arXiv

Pointwise Binary Classification with Pairwise Confidence Comparisons

To alleviate the data requirement for training effective binary classifiers in binary classification, many weakly supervised learning settings have been proposed. Among them, some consider using pairwise but not pointwise labels, when pointwise labels are not accessible due to privacy, confidentiality, or security reasons. However, as a pairwise label denotes whether or not two data points share a pointwise label, it cannot be easily collected if either point is equally likely to be positive or negative. Thus, in this paper, we propose a novel setting called pairwise comparison (Pcomp) classification, where we have only pairs of unlabeled data that we know one is more likely to be positive than the other. Firstly, we give a Pcomp data generation process, derive an unbiased risk estimator (URE) with theoretical guarantee, and further improve URE using correction functions. Secondly, we link Pcomp classification to noisy-label learning to develop a progressive URE and improve it by imposing consistency regularization. Finally, we demonstrate by experiments the effectiveness of our methods, which suggests Pcomp is a valuable and practically useful type of pairwise supervision besides the pairwise label.

preprint2022arXiv

Probabilistic Margins for Instance Reweighting in Adversarial Training

Reweighting adversarial data during training has been recently shown to improve adversarial robustness, where data closer to the current decision boundaries are regarded as more critical and given larger weights. However, existing methods measuring the closeness are not very reliable: they are discrete and can take only a few values, and they are path-dependent, i.e., they may change given the same start and end points with different attack paths. In this paper, we propose three types of probabilistic margin (PM), which are continuous and path-independent, for measuring the aforementioned closeness and reweighting adversarial data. Specifically, a PM is defined as the difference between two estimated class-posterior probabilities, e.g., the probability of the true label minus the probability of the most confusing label given some natural data. Though different PMs capture different geometric properties, all three PMs share a negative correlation with the vulnerability of data: data with larger/smaller PMs are safer/riskier and should have smaller/larger weights. Experiments demonstrate that PMs are reliable measurements and PM-based reweighting methods outperform state-of-the-art methods.
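As a rough illustration of the margin described above, the sketch below computes one PM variant (true-label probability minus the largest other-class probability) from a matrix of estimated class posteriors. The weighting function `pm_weights`, its sigmoid form, and the `scale` parameter are assumptions chosen only to satisfy "larger PM, smaller weight"; the abstract does not specify them.

```python
import numpy as np

def probabilistic_margin(probs, y):
    """PM = p(true label) - p(most confusing other label), one value per row."""
    probs = np.asarray(probs, dtype=float)
    y = np.asarray(y)
    rows = np.arange(len(y))
    p_true = probs[rows, y]
    masked = probs.copy()
    masked[rows, y] = -np.inf          # exclude the true class from the max
    p_confusing = masked.max(axis=1)
    return p_true - p_confusing

def pm_weights(pm, scale=5.0):
    """Hypothetical weighting: decreasing in PM, normalized to mean 1."""
    w = 1.0 / (1.0 + np.exp(scale * np.asarray(pm)))
    return w / w.mean()
```

For example, posteriors `[0.7, 0.2, 0.1]` with true label 0 give a margin of 0.5, while `[0.3, 0.4, 0.3]` with true label 0 give a margin of -0.1 and therefore a larger weight.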

preprint2022arXiv

Reliable Adversarial Distillation with Unreliable Teachers

In ordinary distillation, student networks are trained with soft labels (SLs) given by pretrained teacher networks, and students are expected to improve upon teachers since SLs are stronger supervision than the original hard labels. However, when considering adversarial robustness, teachers may become unreliable and adversarial distillation may not work: teachers are pretrained on their own adversarial data, and it is too demanding to require that teachers are also good at every adversarial data queried by students. Therefore, in this paper, we propose reliable introspective adversarial distillation (IAD) where students partially instead of fully trust their teachers. Specifically, IAD distinguishes between three cases given a query of a natural data (ND) and the corresponding adversarial data (AD): (a) if a teacher is good at AD, its SL is fully trusted; (b) if a teacher is good at ND but not AD, its SL is partially trusted and the student also takes its own SL into account; (c) otherwise, the student only relies on its own SL. Experiments demonstrate the effectiveness of IAD for improving upon teachers in terms of adversarial robustness.
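The three cases (a) to (c) can be sketched as a small decision rule. This is only an illustration under stated assumptions: "good at" is interpreted here as "predicts the correct label", and the function name `iad_trust` and the fixed `partial` weight are hypothetical, since the abstract does not say how the partial trust is computed.

```python
def iad_trust(teacher_pred_ad, teacher_pred_nd, label, partial=0.5):
    """Return (teacher_weight, student_weight) for mixing soft labels.

    Assumption: a teacher is 'good at' a sample when its prediction
    matches the label; `partial` is an illustrative constant.
    """
    if teacher_pred_ad == label:
        # (a) teacher handles the adversarial data: trust its SL fully
        return 1.0, 0.0
    if teacher_pred_nd == label:
        # (b) teacher handles only the natural data: trust it partially
        return partial, 1.0 - partial
    # (c) otherwise the student relies on its own SL
    return 0.0, 1.0
```

The two returned weights would then multiply the teacher's and the student's soft labels, respectively, in the distillation loss.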

preprint2022arXiv

Rethinking Class-Prior Estimation for Positive-Unlabeled Learning

Given only positive (P) and unlabeled (U) data, PU learning can train a binary classifier without any negative data. It has two building blocks: PU class-prior estimation (CPE) and PU classification; the latter has been well studied while the former has received less attention. Hitherto, the distributional-assumption-free CPE methods rely on a critical assumption that the support of the positive data distribution cannot be contained in the support of the negative data distribution. If this is violated, those CPE methods will systematically overestimate the class prior; even worse, we cannot verify the assumption based on the data. In this paper, we rethink CPE for PU learning: can we remove the assumption to make CPE always valid? We show an affirmative answer by proposing Regrouping CPE (ReCPE) that builds an auxiliary probability distribution such that the support of the positive data distribution is never contained in the support of the negative data distribution. ReCPE can work with any CPE method by treating it as the base method. Theoretically, ReCPE does not affect its base if the assumption already holds for the original probability distribution; otherwise, it reduces the positive bias of its base. Empirically, ReCPE improves all state-of-the-art CPE methods on various datasets, implying that the assumption has indeed been violated here.

preprint2022arXiv

Robust Weight Perturbation for Adversarial Training

Overfitting widely exists in adversarial robust training of deep networks. An effective remedy is adversarial weight perturbation, which injects the worst-case weight perturbation during network training by maximizing the classification loss on adversarial examples. Adversarial weight perturbation helps reduce the robust generalization gap; however, it also undermines the robustness improvement. A criterion that regulates the weight perturbation is therefore crucial for adversarial training. In this paper, we propose such a criterion, namely Loss Stationary Condition (LSC) for constrained perturbation. With LSC, we find that it is essential to conduct weight perturbation on adversarial data with small classification loss to eliminate robust overfitting. Weight perturbation on adversarial data with large classification loss is not necessary and may even lead to poor robustness. Based on these observations, we propose a robust perturbation strategy to constrain the extent of weight perturbation. The perturbation strategy prevents deep networks from overfitting while avoiding the side effect of excessive weight perturbation, significantly improving the robustness of adversarial training. Extensive experiments demonstrate the superiority of the proposed method over the state-of-the-art adversarial training methods.

preprint2022arXiv

Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model

The drastic increase of data quantity often brings a severe decrease of data quality, such as incorrect label annotations, which poses a great challenge for robustly training Deep Neural Networks (DNNs). Existing learning methods with label noise either employ ad-hoc heuristics or are restricted to specific noise assumptions. However, more general situations, such as instance-dependent label noise, have not been fully explored, as few studies focus on their label corruption process. By categorizing instances into confusing and unconfusing instances, this paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances. The resultant model can be realized by DNNs, where the training procedure is accomplished by employing an alternating optimization algorithm. Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness over state-of-the-art counterparts.

preprint2022arXiv

TOHAN: A One-step Approach towards Few-shot Hypothesis Adaptation

In few-shot domain adaptation (FDA), classifiers for the target domain are trained with accessible labeled data in the source domain (SD) and few labeled data in the target domain (TD). However, data usually contain private information in the current era, e.g., data distributed on personal phones. Thus, the private information will be leaked if we directly access data in SD to train a target-domain classifier (required by FDA methods). In this paper, to thoroughly prevent the privacy leakage in SD, we consider a very challenging problem setting, where the classifier for the TD has to be trained using few labeled target data and a well-trained SD classifier, named few-shot hypothesis adaptation (FHA). In FHA, we cannot access data in SD, as a result, the private information in SD will be protected well. To this end, we propose a target orientated hypothesis adaptation network (TOHAN) to solve the FHA problem, where we generate highly-compatible unlabeled data (i.e., an intermediate domain) to help train a target-domain classifier. TOHAN maintains two deep networks simultaneously, where one focuses on learning an intermediate domain and the other takes care of the intermediate-to-target distributional adaptation and the target-risk minimization. Experimental results show that TOHAN outperforms competitive baselines significantly.

preprint2022arXiv

Understanding and Improving Graph Injection Attack by Promoting Unnoticeability

Recently, Graph Injection Attack (GIA) has emerged as a practical attack scenario on Graph Neural Networks (GNNs), where the adversary can merely inject a few malicious nodes instead of modifying existing nodes or edges, i.e., Graph Modification Attack (GMA). Although GIA has achieved promising results, little is known about why it is successful and whether there is any pitfall behind the success. To understand the power of GIA, we compare it with GMA and find that GIA can be provably more harmful than GMA due to its relatively high flexibility. However, the high flexibility will also lead to great damage to the homophily distribution of the original graph, i.e., similarity among neighbors. Consequently, the threats of GIA can be easily alleviated or even prevented by homophily-based defenses designed to recover the original homophily. To mitigate the issue, we introduce a novel constraint, homophily unnoticeability, that enforces GIA to preserve the homophily, and propose Harmonious Adversarial Objective (HAO) to instantiate it. Extensive experiments verify that GIA with HAO can break homophily-based defenses and outperform previous GIA attacks by a significant margin. We believe our methods can serve for a more reliable evaluation of the robustness of GNNs.

preprint2022arXiv

Understanding Robust Overfitting of Adversarial Training and Beyond

Robust overfitting widely exists in adversarial training of deep networks. The exact underlying reasons for this are still not completely understood. Here, we explore the causes of robust overfitting by comparing the data distribution of non-overfit (weak adversary) and overfitted (strong adversary) adversarial training, and observe that the distribution of the adversarial data generated by the weak adversary mainly contains small-loss data. However, the adversarial data generated by the strong adversary is more diversely distributed over the large-loss data and the small-loss data. Given these observations, we further design data ablation adversarial training and identify that some small-loss data which are not worthy of the adversary strength cause robust overfitting in the strong adversary mode. To relieve this issue, we propose minimum loss constrained adversarial training (MLCAT): in a minibatch, we learn large-loss data as usual, and adopt additional measures to increase the loss of the small-loss data. Technically, MLCAT hinders data fitting when they become easy to learn to prevent robust overfitting; philosophically, MLCAT reflects the spirit of turning waste into treasure and making the best use of each adversarial data; algorithmically, we design two realizations of MLCAT, and extensive experiments demonstrate that MLCAT can eliminate robust overfitting and further boost adversarial robustness.

preprint2022arXiv

Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning

In federated learning (FL), model performance typically suffers from client drift induced by data heterogeneity, and mainstream works focus on correcting client drift. We propose a different approach named virtual homogeneity learning (VHL) to directly "rectify" the data heterogeneity. In particular, VHL conducts FL with a virtual homogeneous dataset crafted to satisfy two conditions: containing no private information and being separable. The virtual dataset can be generated from pure noise shared across clients, aiming to calibrate the features from the heterogeneous clients. Theoretically, we prove that VHL can achieve provable generalization performance on the natural distribution. Empirically, we demonstrate that VHL endows FL with drastically improved convergence speed and generalization performance. VHL is the first attempt towards using a virtual dataset to address data heterogeneity, offering new and effective means to FL.

preprint2021arXiv

A Survey of Label-noise Representation Learning: Past, Present and Future

Classical machine learning implicitly assumes that labels of the training data are sampled from a clean distribution, which can be too restrictive for real-world scenarios. However, statistical-learning-based methods may not train deep learning models robustly with these noisy labels. Therefore, it is urgent to design Label-Noise Representation Learning (LNRL) methods for robustly training deep models with noisy labels. To fully understand LNRL, we conduct a survey study. We first clarify a formal definition for LNRL from the perspective of machine learning. Then, via the lens of learning theory and empirical study, we figure out why noisy labels affect deep models' performance. Based on the theoretical guidance, we categorize different LNRL methods into three directions. Under this unified taxonomy, we provide a thorough discussion of the pros and cons of different categories. More importantly, we summarize the essential components of robust LNRL, which can spark new directions. Lastly, we propose possible research directions within LNRL, such as new datasets, instance-dependent LNRL, and adversarial LNRL. We also envision potential directions beyond LNRL, such as learning with feature-noise, preference-noise, domain-noise, similarity-noise, graph-noise and demonstration-noise.

preprint2021arXiv

Butterfly: One-step Approach towards Wildly Unsupervised Domain Adaptation

In unsupervised domain adaptation (UDA), classifiers for the target domain (TD) are trained with clean labeled data from the source domain (SD) and unlabeled data from TD. However, in the wild, it is difficult to acquire a large amount of perfectly clean labeled data in SD given limited budget. Hence, we consider a new, more realistic and more challenging problem setting, where classifiers have to be trained with noisy labeled data from SD and unlabeled data from TD -- we name it wildly UDA (WUDA). We show that WUDA ruins all UDA methods if taking no care of label noise in SD, and to this end, we propose a Butterfly framework, a powerful and efficient solution to WUDA. Butterfly maintains four deep networks simultaneously, where two take care of all adaptations (i.e., noisy-to-clean, labeled-to-unlabeled, and SD-to-TD-distributional) and then the other two can focus on classification in TD. As a consequence, Butterfly possesses all the conceptually necessary components for solving WUDA. Experiments demonstrate that, under WUDA, Butterfly significantly outperforms existing baseline methods.

preprint2021arXiv

Confidence Scores Make Instance-dependent Label-noise Learning Possible

In learning with noisy labels, for every instance, its label can randomly walk to other classes following a transition distribution which is named a noise model. Well-studied noise models are all instance-independent, namely, the transition depends only on the original label but not the instance itself, and thus they are less practical in the wild. Fortunately, methods based on instance-dependent noise have been studied, but most of them have to rely on strong assumptions on the noise models. To alleviate this issue, we introduce confidence-scored instance-dependent noise (CSIDN), where each instance-label pair is equipped with a confidence score. We find with the help of confidence scores, the transition distribution of each instance can be approximately estimated. Similarly to the powerful forward correction for instance-independent noise, we propose a novel instance-level forward correction for CSIDN. We demonstrate the utility and effectiveness of our method through multiple experiments under synthetic label noise and real-world unknown noise.
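For readers unfamiliar with forward correction, the sketch below shows the standard instance-independent form that the abstract refers to: the clean class posterior is pushed through a transition matrix before the loss is taken against the noisy label. The convention `T[i, j] = P(noisy = j | clean = i)` and the function name are assumptions made here. An instance-level variant would pass a different `T` for each instance; how CSIDN estimates that matrix from confidence scores is not shown.

```python
import numpy as np

def forward_corrected_loss(clean_posterior, T, noisy_label):
    """Cross-entropy against the noisy label after forward correction.

    Assumed convention: T[i, j] = P(noisy label = j | clean label = i),
    so the noisy posterior is T^T applied to the clean posterior.
    """
    clean_posterior = np.asarray(clean_posterior, dtype=float)
    T = np.asarray(T, dtype=float)
    noisy_posterior = T.T @ clean_posterior
    return -np.log(noisy_posterior[noisy_label])
```

With clean posterior `[0.9, 0.1]` and `T = [[0.8, 0.2], [0.3, 0.7]]`, the noisy posterior is `[0.75, 0.25]`, so the loss for noisy label 0 is `-log(0.75)`.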

preprint2021arXiv

Engineering of Atomic-Scale Flexoelectricity at Grain Boundaries

Flexoelectricity is a type of ubiquitous and prominent electromechanical coupling, pertaining to the response of electrical polarization to mechanical strain gradients while not restricted to the symmetry of materials. However, large elastic deformation in most solids is usually difficult to achieve and the strain gradient at minuscule scales is challenging to control. Here we exploit the exotic structural inhomogeneity of grain boundaries to achieve a huge strain gradient (~1.2 nm⁻¹) within 3 to 4 unit cells, and thus obtain atomic-scale flexoelectric polarization up to ~38 μC/cm² at a 24° LaAlO₃ grain boundary. The nanoscale flexoelectricity also modifies the electrical activity of grain boundaries. Moreover, we prove that it is a general and feasible way to form large strain gradients at atomic scale by altering the misorientation angles of grain boundaries in different dielectric materials. Thus, engineering of grain boundaries provides an effective pathway to achieve tunable flexoelectricity and broadens the electromechanical functionalities of non-piezoelectric materials.

preprint2021arXiv

Torsion, energy magnetization and thermal Hall effect

We study the effective action of hydrostatic response to torsion in the absence of spin connections in gapped $\left(2+1\right)$-dimensional topological phases. In previous studies, a torsional Chern-Simons term with a temperature-squared ($T^2$) coefficient was proposed as an alternative action to describe the thermal Hall effect with the idea of balancing the diffusion of heat by a torsional field. However, the question remains whether this action leads to a local bulk thermal response which is not suppressed by the gap. In our hydrostatic effective action, we show that the $T^2$ bulk term is invariant under variations up to boundary terms considering the back reaction of the geometry on local temperature, which precisely describes the edge thermal current. Furthermore, there are no boundary diffeomorphism anomalies or bulk inflow thermal currents at equilibrium and therefore no edge-to-edge adiabatic thermal current pumping. These results are consistent with the exponentially suppressed thermal current for gapped phases.

preprint2021arXiv

Understanding the Interaction of Adversarial Training with Noisy Labels

Noisy labels (NL) and adversarial examples both undermine trained models, but interestingly they have hitherto been studied independently. A recent adversarial training (AT) study showed that the number of projected gradient descent (PGD) steps to successfully attack a point (i.e., find an adversarial example in its proximity) is an effective measure of the robustness of this point. Given that natural data are clean, this measure reveals an intrinsic geometric property -- how far a point is from its class boundary. Based on this breakthrough, in this paper, we figure out how AT would interact with NL. Firstly, we find that if a point is too close to its noisy-class boundary (e.g., one step is enough to attack it), this point is likely to be mislabeled, which suggests adopting the number of PGD steps as a new criterion for sample selection for correcting NL. Secondly, we confirm that AT with strong smoothing effects suffers less from NL (without NL corrections) than standard training (ST), which suggests AT itself is an NL correction. Hence, AT with NL is helpful for improving even the natural accuracy, which again illustrates the superiority of AT as a general-purpose robust learning criterion.

preprint2020arXiv

Attacks Which Do Not Kill Training Make Adversarial Learning Stronger

Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models. However, it is conservative or even pessimistic so that it sometimes hurts the natural generalization. In this paper, we raise a fundamental question---do we have to trade off natural generalization for adversarial robustness? We argue that adversarial training is to employ confident adversarial data for updating the current model. We propose a novel approach of friendly adversarial training (FAT): rather than employing most adversarial data maximizing the loss, we search for least adversarial (i.e., friendly adversarial) data minimizing the loss, among the adversarial data that are confidently misclassified. Our novel formulation is easy to implement by just stopping the most adversarial data searching algorithms such as PGD (projected gradient descent) early, which we call early-stopped PGD. Theoretically, FAT is justified by an upper bound of the adversarial risk. Empirically, early-stopped PGD allows us to answer the earlier question negatively---adversarial robustness can indeed be achieved without compromising the natural generalization.
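The idea of early-stopped PGD can be sketched on a toy model. The example below is an illustration under several assumptions that do not come from the abstract: a binary linear classifier with logistic loss, an L-infinity ball of radius `eps`, and a stopping rule that halts at the first misclassification. The function name and default values are hypothetical.

```python
import numpy as np

def early_stopped_pgd(x, y, w, b, eps=0.5, step=0.1, max_steps=10):
    """PGD on a linear logistic model that stops once the point is misclassified.

    Returns the perturbed point and the number of PGD steps taken.
    Assumptions: y is 0 or 1, the model predicts 1 when w @ x + b > 0,
    and the perturbation is constrained to an L-infinity ball of radius eps.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    x_adv = x.copy()
    for k in range(max_steps):
        logit = w @ x_adv + b
        if int(logit > 0) != y:
            return x_adv, k            # early stop: already misclassified
        p = 1.0 / (1.0 + np.exp(-logit))
        grad = (p - y) * w             # gradient of the logistic loss w.r.t. x
        x_adv = x_adv + step * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project back into the ball
    return x_adv, max_steps            # budget exhausted without an early stop
```

Ordinary PGD would always run all `max_steps` iterations; here the loop exits as soon as the current point crosses the decision boundary, so the returned example stays close to it.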

preprint2020arXiv

Beyond Unfolding: Exact Recovery of Latent Convex Tensor Decomposition under Reshuffling

Exact recovery of tensor decomposition (TD) methods is a desirable property in both unsupervised learning and scientific data analysis. The numerical defects of TD methods, however, limit their practical applications on real-world data. As an alternative, convex tensor decomposition (CTD) was proposed to alleviate these problems, but its exact-recovery property is not properly addressed so far. To this end, we focus on latent convex tensor decomposition (LCTD), a practically widely-used CTD model, and rigorously prove a sufficient condition for its exact-recovery property. Furthermore, we show that such property can be also achieved by a more general model than LCTD. In the new model, we generalize the classic tensor (un-)folding into reshuffling operation, a more flexible mapping to relocate the entries of the matrix into a tensor. Armed with the reshuffling operations and exact-recovery property, we explore a totally novel application for (generalized) LCTD, i.e., image steganography. Experimental results on synthetic data validate our theory, and results on image steganography show that our method outperforms the state-of-the-art methods.

preprint2020arXiv

Entanglement Entropy of Generalized Moore-Read Fractional Quantum Hall State Interfaces

Topologically ordered phases of matter can be characterized by the presence of a universal, constant contribution to the entanglement entropy known as the topological entanglement entropy (TEE). The TEE can be calculated for Abelian phases via a "cut-and-glue" approach by treating the entanglement cut as a physical cut, coupling the resulting gapless edges with explicit tunneling terms, and computing the entanglement between the two edges. We provide a first step towards extending this methodology to non-Abelian topological phases, focusing on the generalized Moore-Read (MR) fractional quantum Hall states at filling fractions $\nu=1/n$. We consider interfaces between different MR states, write down explicit gapping interactions, which we motivate using an anyon condensation picture, and compute the entanglement entropy for an entanglement cut lying along the interface. Our work provides new insight towards understanding the connections between anyon condensation, gapped interfaces of non-Abelian phases, and TEE.

preprint2020arXiv

GeneticKNN: A Weighted KNN Approach Supported by Genetic Algorithm for Photometric Redshift Estimation of Quasars

We combine K-Nearest Neighbors (KNN) with a genetic algorithm (GA) for photometric redshift estimation of quasars; the resulting method, GeneticKNN, is a weighted KNN approach supported by GA. This approach has two improvements compared to KNN: first, the features are weighted by GA; second, the predicted redshift is not the average redshift of the K neighbors but the weighted average of the median and mean of their redshifts, i.e., $p\times z_{\mathrm{median}} + (1-p)\times z_{\mathrm{mean}}$. Based on the SDSS and SDSS-WISE quasar samples, we explore the performance of GeneticKNN for photometric redshift estimation, comparing it with six traditional machine learning methods, i.e., least absolute shrinkage and selection operator (LASSO), support vector regression (SVR), multi-layer perceptrons (MLP), XGBoost, KNN and random forest. KNN and random forest show their superiority. Considering the easy implementation of KNN, we improve KNN into GeneticKNN and apply GeneticKNN to photometric redshift estimation of quasars. Finally, the performance of GeneticKNN is better than that of LASSO, SVR, MLP, XGBoost, KNN and random forest in all cases. Moreover, the accuracy is better with the additional WISE magnitudes for the same method.
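The prediction rule above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the feature weights are taken as given (the GA search that produces them is omitted), and applying them as per-feature scale factors inside a Euclidean distance is an assumption made here.

```python
import numpy as np

def weighted_knn_redshift(X_train, z_train, x_query, feature_weights, k=3, p=0.5):
    """Predict a redshift as p * median + (1 - p) * mean over the k nearest neighbors.

    Assumption: `feature_weights` scale each feature before a Euclidean
    distance is computed; in GeneticKNN these weights come from a GA.
    """
    X_train = np.asarray(X_train, dtype=float)
    z_train = np.asarray(z_train, dtype=float)
    diff = (X_train - np.asarray(x_query, dtype=float)) * np.asarray(feature_weights)
    dist = np.sqrt((diff ** 2).sum(axis=1))
    nearest = np.argsort(dist)[:k]     # indices of the k closest training points
    z_k = z_train[nearest]
    return p * np.median(z_k) + (1.0 - p) * np.mean(z_k)
```

Setting `p = 0` recovers the plain KNN average, and `p = 1` uses only the median of the neighbors' redshifts.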

preprint2020arXiv

Hamiltonian approach to the torsional anomalies and its dimensional ladder

Torsion can cause various anomalies in various dimensions, including the $\left(3+1\right)$-dimensional $[(3+1)D]$ Nieh-Yan anomaly, the $\left(2+1\right)$D Hughes-Leigh-Fradkin (HLF) parity anomaly, and the $\left(3+1\right)$D, $\left(1+1\right)$D chiral energy-momentum anomaly. We study these anomalies from the Hamiltonian approach. We derive the $\left(1+1\right)$D chiral energy-momentum anomaly from the single-body Hamiltonian. We then show how other torsional anomalies can be related to the $\left(1+1\right)$D chiral energy-momentum anomaly in a straightforward way. Finally, the Nieh-Yan anomaly and the $\left(3+1\right)$D chiral energy-momentum anomaly are obtained from the parity anomaly and the HLF effective action, respectively. Hence, we have constructed the dimensional ladder for the torsional anomalies from the single-body Hamiltonian picture.

preprint2020arXiv

Multi-Class Classification from Noisy-Similarity-Labeled Data

A similarity label indicates whether two instances belong to the same class while a class label shows the class of the instance. Without class labels, a multi-class classifier could be learned from similarity-labeled pairwise data by meta classification learning. However, since the similarity label is less informative than the class label, it is more likely to be noisy. Deep neural networks can easily remember noisy data, leading to overfitting in classification. In this paper, we propose a method for learning from only noisy-similarity-labeled data. Specifically, to model the noise, we employ a noise transition matrix to bridge the class-posterior probability between clean and noisy data. We further estimate the transition matrix from only noisy data and build a novel learning system to learn a classifier which can assign noise-free class labels for instances. Moreover, we theoretically justify how our proposed method generalizes for learning classifiers. Experimental results demonstrate the superiority of the proposed method over the state-of-the-art method on benchmark-simulated and real-world noisy-label datasets.

preprint2020arXiv

Nieh-Yan Anomaly: Torsional Landau Levels, central charge and anomalous thermal Hall effect

The Nieh-Yan anomaly is the anomalous breakdown of the chiral U(1) symmetry caused by the interaction between torsion and fermions. We study this anomaly from the point of view of torsional Landau levels. It was found that the torsional Landau levels are gapless, while their contributions to the chiral anomaly are canceled, except those from the lowest torsional Landau levels. Hence, the dimension is effectively reduced from (3+1)-dimensional to (1+1)-dimensional. We further show that the coefficient of the Nieh-Yan anomaly is the free energy density in (1+1) dimensions. Especially, at finite temperature, the thermal Nieh-Yan anomaly is proportional to the central charge. The anomalous thermal Hall conductance in Weyl semimetals is then shown to be proportional to the central charge, which is the experimental fingerprint of the thermal Nieh-Yan anomaly.

preprint2020arXiv

Searching to Exploit Memorization Effect in Learning from Corrupted Labels

Sample selection approaches are popular in robust learning from noisy labels. However, how to properly control the selection process so that deep networks can benefit from the memorization effect is a hard problem. In this paper, motivated by the success of automated machine learning (AutoML), we model this issue as a function approximation problem. Specifically, we design a domain-specific search space based on general patterns of the memorization effect and propose a novel Newton algorithm to solve the bi-level optimization problem efficiently. We further provide theoretical analysis of the algorithm, which ensures a good approximation to critical points. Experiments are performed on benchmark data sets. Results demonstrate that the proposed method is much better than the state-of-the-art noisy-label-learning approaches, and also much more efficient than existing AutoML algorithms.

preprint2020arXiv

Torsional Anomalies and Bulk-Dislocation Correspondence in Weyl Systems

Based on the supersymmetric quantum mechanical approach, we have systematically studied both the $U\left(1\right)$ gauge anomaly and the diffeomorphism anomaly in Weyl systems with torsion, curvature and external electromagnetic fields. These anomalies relate to the chiral current (or current) non-conservation and chiral energy-momentum (or energy-momentum) non-conservation, respectively, which can be applied to the $^{3}\text{He-A}$ phase, the chiral superconductors and the Weyl semimetals with dislocations and disclinations. In sharp difference with other anomalies, there exist torsional anomalies depending on the position of Weyl nodes in the energy-momentum space. These anomalies originate from particles pumped up through the Weyl nodes and they are thus insensitive to the ultra-violet physics, while the Nieh-Yan anomaly is from the particle inflow through the ultra-violet cut-off. The current non-conservation as well as the energy-momentum non-conservation are found, which stem from the zero modes trapped in the dislocations and they can be understood from the Callan-Harvey mechanism. Finally, by comparing our results with the well-established momentum anomaly in the $^{3}\text{He-A}$ phase, the Nieh-Yan term as well as other cut-off dependent terms are shown to be negligible, because the ratio between the Lorentz symmetry breaking scale and the chemical potential is of order $10^{-5}$.

preprint2019arXiv

Correlating the Electronic Structures of Metallic/Semiconductor MoTe2 Interface to its Atomic Structures

Contact interface properties are important in determining the performances of devices based on atomically thin two-dimensional (2D) materials, especially those with short channels. Understanding the contact interface is therefore quite important to design better devices. Herein, we use scanning transmission electron microscopy, electron energy loss spectroscopy, and first-principles calculations to reveal the electronic structures within the metallic (1T')-semiconducting (2H) MoTe2 coplanar phase boundary across a wide spectral range and correlate its properties and atomic structure. We find that the 2H-MoTe2 excitonic peaks cross the phase boundary into the 1T' phase within a range of approximately 150 nm. The 1T'-MoTe2 crystal field can penetrate the boundary and extend into the 2H phase by approximately two unit cells. The plasmonic oscillations exhibit strong angle dependence, i.e., a red-shift (approximately 0.3 eV-1.2 eV) occurs within 4 nm at 1T'/2H-MoTe2 boundaries with large tilt angles, but there is no shift at zero-tilted boundaries. These atomic-scale measurements reveal the structure-property relationships of 1T'/2H-MoTe2 boundary, providing useful information for phase boundary engineering and device development based on 2D materials.

preprint2019arXiv

The general fifth-order nonlinear Schrödinger equation with nonzero boundary conditions: Inverse scattering transform and multisoliton solutions

Under investigation in this work is the inverse scattering transform of the general fifth-order nonlinear Schrödinger equation with nonzero boundary conditions (NZBCs), which can be reduced to several integrable equations. Firstly, a matrix Riemann-Hilbert problem for the equation with NZBCs at infinity is systematically investigated. Then the inverse problems are solved through the investigation of the matrix Riemann-Hilbert problem. Thus, the general solutions for the potentials and explicit expressions for the reflectionless potentials are constructed. Furthermore, the trace formulae and theta conditions are also presented. In particular, we analyze the simple-pole and double-pole solutions for the equation with NZBCs. Finally, the dynamics of the obtained solutions are discussed graphically. The results of this work can help to enrich and explain the related nonlinear wave phenomena in nonlinear fields.

preprint2016arXiv

:telephone::person::sailboat::whale::okhand:; or "Call me Ishmael" - How do you translate emoji?

We report on an exploratory analysis of Emoji Dick, a project that leverages crowdsourcing to translate Melville's Moby Dick into emoji. This distinctive use of emoji removes textual context, and leads to a varying translation quality. In this paper, we use statistical word alignment and part-of-speech tagging to explore how people use emoji. Despite these simple methods, we observed differences in token and part-of-speech distributions. Experiments also suggest that semantics are preserved in the translation, and repetition is more common in emoji.

preprint2016arXiv

Astronomical Data Fusion Tool Based on PostgreSQL

With the application of advanced astronomical technologies, equipment and methods all over the world, astronomy now covers the radio, infrared, visible light, ultraviolet, X-ray and gamma-ray bands, and has entered the era of full-wavelength astronomy. Effectively integrating data from different ground- and space-based observation equipment, different observers, different bands and different observation times requires data fusion technology. In this paper we introduce a cross-match tool that is developed in Python, is based on the PostgreSQL database and uses Q3C as the core index, facilitating the cross-matching of massive astronomical data. It provides four different cross-match functions, namely: I) cross-match of custom error range; II) cross-match of catalog error; III) cross-match based on the elliptic error range; IV) cross-match of the nearest algorithm. The cross-match result set provides a good foundation for subsequent data mining and statistics based on multiwavelength data. The main advantage of this tool is that it is user-oriented and runs locally. With this tool, users can easily create their own databases, manage their own data and cross-match databases according to their requirements. In addition, the tool can transfer data from one database into another. More importantly, the tool is easy to get started with and can be used by astronomers without writing any code.

preprint2016arXiv

LeaveNow: A Social Network-based Smart Evacuation System for Disaster Management

The importance of timely response to natural disasters and evacuating affected people to safe areas is paramount to save lives. Emergency services are often handicapped by the amount of rescue resources at their disposal. We present a system that leverages the power of a social network forming new connections among people based on real-time location and expands the rescue resources pool by adding private sector cars. We also introduce a car-sharing algorithm to identify safe routes in an emergency with the aim of minimizing evacuation time, maximizing pick-up of people without cars, and avoiding traffic congestion.

preprint2016arXiv

On the Convergence of A Family of Robust Losses for Stochastic Gradient Descent

The convergence of Stochastic Gradient Descent (SGD) using convex loss functions has been widely studied. However, vanilla SGD methods using convex losses cannot perform well with noisy labels, which adversely affect the update of the primal variable in SGD methods. Unfortunately, noisy labels are ubiquitous in real world applications such as crowdsourcing. To handle noisy labels, in this paper, we present a family of robust losses for SGD methods. By employing our robust losses, SGD methods successfully reduce negative effects caused by noisy labels on each update of the primal variable. We not only reveal that the convergence rate is O(1/T) for SGD methods using robust losses, but also provide the robustness analysis on two representative robust losses. Comprehensive experimental results on six real-world datasets show that SGD methods using robust losses are obviously more robust than other baseline methods in most situations with fast convergence.

preprint2016arXiv

Photometric Redshift Estimation for Quasars by Integration of KNN and SVM

The massive photometric data collected from multiple large-scale sky surveys offer significant opportunities for measuring distances of celestial objects by photometric redshifts. However, catastrophic failure has long been an unsolved problem and still exists in current photometric redshift estimation approaches (such as $k$-nearest-neighbor). In this paper, we propose a novel two-stage approach that integrates the $k$-nearest-neighbor (KNN) and support vector machine (SVM) methods. In the first stage, we apply the KNN algorithm to photometric data and estimate the corresponding $z_{\rm phot}$. By analysis, we find two dense regions with catastrophic failure, one in the range of $z_{\rm phot}\in[0.3,1.2]$, the other in the range of $z_{\rm phot}\in [1.2,2.1]$. In the second stage, we map the photometric input pattern of points falling into the two ranges from the original attribute space into a high-dimensional feature space using the Gaussian kernel function in SVM. In the high-dimensional feature space, many outlier points resulting from catastrophic failure by simple Euclidean distance computation in KNN can be identified by a classification hyperplane of SVM and then corrected. Experimental results based on the SDSS (Sloan Digital Sky Survey) quasar data show that the two-stage fusion approach can significantly mitigate catastrophic failure and improve the estimation accuracy of photometric redshifts of quasars. The percentages in different $|\Delta z|$ ranges and the rms (root mean square) error of the integrated method are $83.47\%$, $89.83\%$, $90.90\%$ and 0.192, respectively, compared to the results of KNN ($71.96\%$, $83.78\%$, $89.73\%$ and 0.204).

preprint2015arXiv

600-T Magnetic Fields due to Cold Electron Flow in a simple Cu-Coil irradiated by High Power Laser pulses

A new, simple mechanism based on cold electron flow for producing strong magnetic fields is proposed. A 600-T magnetic field is generated in free space at a laser intensity of $5.7\times10^{15}$ W cm$^{-2}$. Theoretical analysis indicates that the magnetic field strength is proportional to laser intensity. Such a strong magnetic field offers a new experimental test bed to study laser-plasma physics, in particular, fast-ignition laser fusion research and laboratory astrophysics.

preprint2015arXiv

Modeling non local thermodynamic equilibrium plasma using the Flexible Atomic Code data

We present a new code, RCF ("Radiative-Collisional code based on FAC"), which is used to simulate steady-state plasmas under non-local thermodynamic equilibrium conditions, especially photoionization-dominated plasmas. RCF includes almost all of the radiative and collisional atomic processes in the rate equations to interpret the plasmas systematically. The Flexible Atomic Code (FAC) supplies all the atomic data RCF needs, which ensures the completeness and consistency of the atomic data. With four input parameters relating to the radiation source and target plasma, RCF calculates the populations of levels and charge states, and potentially the emission spectrum. In a preliminary application, RCF successfully reproduces the results of a photoionization experiment with reliable atomic data. The effects of the most important atomic processes on the charge state distribution are also discussed.

preprint2014arXiv

A global minimization algorithm for Tikhonov functionals with sparsity constraints

In this paper we present a globally convergent algorithm for the computation of a minimizer of the Tikhonov functional with sparsity promoting penalty term for nonlinear forward operators in Banach space. The dual TIGRA method uses a gradient descent iteration in the dual space at decreasing values of the regularization parameter $α_j$, where the approximation obtained with $α_j$ serves as the starting value for the dual iteration with parameter $α_{j+1}$. With the discrepancy principle as a global stopping rule the method further yields an automatic parameter choice. We prove convergence of the algorithm under suitable step-size selection and stopping rules and illustrate our theoretic results with numerical experiments for the nonlinear autoconvolution problem.

preprint2014arXiv

HSR: L1/2 Regularized Sparse Representation for Fast Face Recognition using Hierarchical Feature Selection

In this paper, we propose a novel method for fast face recognition called L1/2 Regularized Sparse Representation using Hierarchical Feature Selection (HSR). By employing hierarchical feature selection, we can compress the scale and dimension of global dictionary, which directly contributes to the decrease of computational cost in sparse representation that our approach is strongly rooted in. It consists of Gabor wavelets and Extreme Learning Machine Auto-Encoder (ELM-AE) hierarchically. For Gabor wavelets part, local features can be extracted at multiple scales and orientations to form Gabor-feature based image, which in turn improves the recognition rate. Besides, in the presence of occluded face image, the scale of Gabor-feature based global dictionary can be compressed accordingly because redundancies exist in Gabor-feature based occlusion dictionary. For ELM-AE part, the dimension of Gabor-feature based global dictionary can be compressed because high-dimensional face images can be rapidly represented by low-dimensional feature. By introducing L1/2 regularization, our approach can produce sparser and more robust representation compared to regularized Sparse Representation based Classification (SRC), which also contributes to the decrease of the computational cost in sparse representation. In comparison with related work such as SRC and Gabor-feature based SRC (GSRC), experimental results on a variety of face databases demonstrate the great advantage of our method for computational cost. Moreover, we also achieve approximate or even better recognition rate.

preprint2014arXiv

LARSEN-ELM: Selective Ensemble of Extreme Learning Machines using LARS for Blended Data

Extreme learning machine (ELM), a neural network algorithm, has shown good performance, including fast speed and a simple structure, but weak robustness on blended data is an unavoidable defect of the original ELM. We present a new machine learning framework called LARSEN-ELM to overcome this problem. LARSEN-ELM has two key steps. In the first step, preprocessing, we select the input variables highly related to the output using least angle regression (LARS). In the second step, training, we employ Genetic Algorithm (GA) based selective ensemble and the original ELM. In the experiments, we apply a sum of two sines and four datasets from the UCI repository to verify the robustness of our approach. The experimental results show that, compared with the original ELM and other methods such as OP-ELM, GASEN-ELM and LSBoost, LARSEN-ELM significantly improves robustness while keeping a relatively high speed.

preprint2014arXiv

RMSE-ELM: Recursive Model based Selective Ensemble of Extreme Learning Machines for Robustness Improvement

Extreme learning machine (ELM), an emerging branch of shallow networks, has shown excellent generalization and fast learning speed. However, for blended data, the robustness of ELM is weak because the weights and biases of its hidden nodes are set randomly. Moreover, noisy data exert a negative effect. To solve this problem, a new framework called RMSE-ELM is proposed in this paper. It is a two-layer recursive model. In the first layer, the framework trains many ELMs in different groups concurrently, then employs selective ensemble to pick out an optimal set of ELMs in each group, which can be merged into a large group of ELMs called the candidate pool. In the second layer, selective ensemble is recursively applied to the candidate pool to obtain the final ensemble. In the experiments, we apply UCI blended datasets to confirm the robustness of our new approach in two key aspects (mean square error and standard deviation). The space complexity of our method is increased to some degree, but the results show that RMSE-ELM significantly improves robustness at a small cost in computational time compared with representative methods (ELM, OP-ELM, GASEN-ELM, GASEN-BP and E-GASEN). It is a potential framework for addressing the robustness issue of ELM on high-dimensional blended data in the future.

65 published item(s)

Beyond Rigid Alignment: Graph Federated Learning via Dual Manifold Calibration

Counterfactual Fairness with Partially Known Causal Graph

FRAS: Federated Reinforcement Learning empowered Adaptive Point Cloud Video Streaming

Bilateral Dependency Optimization: Defending Against Model-inversion Attacks

Brightening of a dark monolayer semiconductor via strong light-matter coupling in a cavity

CausalAdv: Adversarial Robustness through the Lens of Causality

Contrastive Learning with Boosted Memorization

DeepMix: Mobility-aware, Lightweight, and Hybrid 3D Object Detection for Headsets

Device-Cloud Collaborative Recommendation via Meta Controller

Direct observation of local antiferroelectricity induced phonon softening at a SrTiO3 defect

Do We Need to Penalize Variance of Losses for Learning with Label Noise?

EAGAN: Efficient Two-stage Evolutionary Architecture Search for GANs

Estimating Instance-dependent Bayes-label Transition Matrix using a Deep Neural Network

Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack

FedNoiL: A Simple Two-Level Sampling Method for Federated Learning with Noisy Labels

From the synthesis of hBN crystals to their use as nanosheets for optoelectronic devices

Improving Adversarial Robustness via Mutual Information Estimation

Instance-dependent Label-noise Learning under a Structural Causal Model

Instance-Dependent Label-Noise Learning with Manifold-Regularized Transition Matrix Estimation

Learning with Multiple Complementary Labels

Low-rank Tensor Learning with Nonconvex Overlapped Nuclear Norm Regularization

Meta Discovery: Learning to Discover Novel Classes given Very Limited Data

Modeling Adversarial Noise for Adversarial Training

MSR: Making Self-supervised learning Robust to Aggressive Augmentations

Nebula: Reliable Low-latency Video Transmission for Mobile Cloud Gaming

NoiLIn: Improving Adversarial Training and Correcting Stereotype of Noisy Labels

Pluralistic Image Completion with Probabilistic Mixture-of-Experts

Pointwise Binary Classification with Pairwise Confidence Comparisons

Probabilistic Margins for Instance Reweighting in Adversarial Training

Reliable Adversarial Distillation with Unreliable Teachers

Rethinking Class-Prior Estimation for Positive-Unlabeled Learning

Robust Weight Perturbation for Adversarial Training

Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model

TOHAN: A One-step Approach towards Few-shot Hypothesis Adaptation

Understanding and Improving Graph Injection Attack by Promoting Unnoticeability

Understanding Robust Overfitting of Adversarial Training and Beyond

Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning

A Survey of Label-noise Representation Learning: Past, Present and Future

Butterfly: One-step Approach towards Wildly Unsupervised Domain Adaptation

Confidence Scores Make Instance-dependent Label-noise Learning Possible

Engineering of Atomic-Scale Flexoelectricity at Grain Boundaries

Torsion, energy magnetization and thermal Hall effect

Understanding the Interaction of Adversarial Training with Noisy Labels

Attacks Which Do Not Kill Training Make Adversarial Learning Stronger

Beyond Unfolding: Exact Recovery of Latent Convex Tensor Decomposition under Reshuffling

Entanglement Entropy of Generalized Moore-Read Fractional Quantum Hall State Interfaces

GeneticKNN: A Weighted KNN Approach Supported by Genetic Algorithm for Photometric Redshift Estimation of Quasars

Hamiltonian approach to the torsional anomalies and its dimensional ladder

Multi-Class Classification from Noisy-Similarity-Labeled Data

Nieh-Yan Anomaly: Torsional Landau Levels, central charge and anomalous thermal Hall effect

Searching to Exploit Memorization Effect in Learning from Corrupted Labels

Torsional Anomalies and Bulk-Dislocation Correspondence in Weyl Systems

Correlating the Electronic Structures of Metallic/Semiconductor MoTe2 Interface to its Atomic Structures

The general fifth-order nonlinear Schrödinger equation with nonzero boundary conditions: Inverse scattering transform and multisoliton solutions

:telephone::person::sailboat::whale::okhand:; or "Call me Ishmael" - How do you translate emoji?

Astronomical Data Fusion Tool Based on PostgreSQL

LeaveNow: A Social Network-based Smart Evacuation System for Disaster Management

On the Convergence of A Family of Robust Losses for Stochastic Gradient Descent

Photometric Redshift Estimation for Quasars by Integration of KNN and SVM

600-T Magnetic Fields due to Cold Electron Flow in a simple Cu-Coil irradiated by High Power Laser pulses

Modeling non local thermodynamic equilibrium plasma using the Flexible Atomic Code data

A global minimization algorithm for Tikhonov functionals with sparsity constraints

HSR: L1/2 Regularized Sparse Representation for Fast Face Recognition using Hierarchical Feature Selection

LARSEN-ELM: Selective Ensemble of Extreme Learning Machines using LARS for Blended Data

RMSE-ELM: Recursive Model based Selective Ensemble of Extreme Learning Machines for Robustness Improvement