Trust snapshot

Quick read

Trust 21 · Emerging · Verification L1 · Unclaimed author
85 works
0 followers
38 topics
4 close collaborators

Published work

85 published items

preprint · 2026 · arXiv

Research on Vision-Language Question Answering Models for Industrial Robots

A hierarchical cross-modal fusion model is proposed for vision-language question answering (VLQA) in industrial robotics, targeting the semantic ambiguity, complex environmental layouts, and domain-specific terminology common in modern manufacturing. The framework integrates object detection, multi-scale visual encoding, syntactic parsing, and task-aware semantic attention to unite vision and language signals in a joint reasoning space. Region-based deep networks extract visual features, weighted embeddings aggregate these features, and recurrent neural parsing encodes sentence structure. Through fine-grained semantic alignment driven by adaptive fusion and cross-attention mechanisms, the system handles operational queries, instruction steps, and anomaly detection more reliably. Validation experiments on the IVQA and RIF benchmarks indicate improvements over existing VLQA approaches in semantic alignment, Top-1 accuracy, and robustness to ambiguous or procedural task queries. Ablation studies further quantify the contribution of each architectural module, confirming the need for multi-level feature integration and context-driven gating for dependable industrial deployment. These advances provide core methods for improving the interpretability and operational effectiveness of industrial robots across diverse human-robot interaction tasks.
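The abstract names cross-attention and context-driven gating as the fusion mechanisms. A minimal NumPy sketch of that general pattern follows, assuming scaled dot-product attention from text tokens (queries) to visual region features (keys and values) and a sigmoid gate over the concatenated features; the weight names `Wq`, `Wk`, `Wv`, `Wg`, `bg` and all dimensions are hypothetical and are not taken from the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text, regions, Wq, Wk, Wv):
    """Each text token (query) attends over visual region features (keys/values)."""
    q, k, v = text @ Wq, regions @ Wk, regions @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_tokens, n_regions)
    attn = softmax(scores, axis=-1)
    return attn @ v, attn

def gated_fusion(text, attended, Wg, bg):
    """Sigmoid gate chooses, per feature, how much visual context to mix into each token."""
    gate = 1.0 / (1.0 + np.exp(-(np.concatenate([text, attended], axis=-1) @ Wg + bg)))
    return gate * attended + (1.0 - gate) * text

# Hypothetical toy inputs: 5 token embeddings and 12 detected-region features of width 8.
rng = np.random.default_rng(0)
d = 8
text = rng.standard_normal((5, d))
regions = rng.standard_normal((12, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
Wg, bg = rng.standard_normal((2 * d, d)) / np.sqrt(2 * d), np.zeros(d)

attended, attn = cross_attention(text, regions, Wq, Wk, Wv)
fused = gated_fusion(text, attended, Wg, bg)
```

Because the gate lies in (0, 1), every fused feature is a convex combination of the text feature and its attended visual context, which is one simple way to let the model fall back on language when the visual evidence is ambiguous.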

preprint · 2025 · arXiv

Tidally Trapped Two-pole Pulsations Discovered in a Close Binary with a Massive $β$ Cephei Star

Tidally tilted pulsators (TTPs), whose pulsation axes align with the binary's semi-major axis, are a newly established class of oscillators in binary systems. All previously known TTPs are either $δ$ Scuti or subdwarf B-type stars; none had yet been identified among the more massive $β$ Cephei variables. Here, we report the discovery of tidally trapped pulsations in the massive ellipsoidal variable HD 329379, based on photometry from the Transiting Exoplanet Survey Satellite (TESS). Our analysis reveals a $β$ Cephei pulsator in a 2.25-day orbit whose pulsation mode amplitude is strongly modulated with the orbital frequency. We conclude that this modulation can be explained by pulsations with significantly larger amplitude near the star's two tidal poles (apsides). We interpret this as a tidally distorted quadrupole pulsation characterized by trapped two-pole pulsations, with a pulsation axis aligned with the tidal axis. This is the first identification of such a pulsation mode in a $β$ Cephei star; it differs from the single-sided pulsations observed in previous work, marking a rare and important discovery. Our work extends the family of TTPs beyond $δ$ Scuti and subdwarf B-type stars to the more massive $β$ Cephei variables. In particular, the two-pole pulsator HD 329379 stands out as the prototype of a new class of TTPs in massive stars. These results not only provide a new probe of the interior structure and evolutionary state of massive stars but also offer a unique opportunity to study the interaction between pulsations and strong tidal distortions.
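In a deliberately simplified picture, a mode whose observed amplitude peaks once near each of the two tidal poles is amplitude-modulated at twice the orbital frequency, which splits a single pulsation peak into a central peak plus sidelobes at $f_\text{puls} \pm 2 f_\text{orb}$. The toy light curve below illustrates only that signature; the 2.25-day period comes from the abstract, while the 5 d$^{-1}$ pulsation frequency and the 0.8 modulation depth are invented for illustration and are not measurements of HD 329379. Viewing-angle effects and pulsation phase jumps are ignored.

```python
import numpy as np

# Illustrative toy numbers, not measurements of HD 329379.
P_ORB = 2.25           # orbital period in days (from the abstract)
F_ORB = 1.0 / P_ORB    # orbital frequency, cycles per day
F_PULS = 5.0           # hypothetical pulsation frequency, cycles per day
DT = 0.01              # sampling step, days
T_SPAN = 100 * P_ORB   # 100 full orbits, so all frequencies fall on the FFT grid

t = np.arange(int(round(T_SPAN / DT))) * DT
# Amplitude peaks twice per orbit (once per tidal pole), i.e. it is modulated at 2 * F_ORB.
amplitude = 1.0 + 0.8 * np.cos(2 * np.pi * 2 * F_ORB * t)
flux = amplitude * np.sin(2 * np.pi * F_PULS * t)

spectrum = np.abs(np.fft.rfft(flux)) * 2 / len(t)  # single-sided amplitude spectrum
freqs = np.fft.rfftfreq(len(t), d=DT)

def amp_at(f):
    """Spectral amplitude at the FFT bin closest to frequency f."""
    return spectrum[np.argmin(np.abs(freqs - f))]
```

By the product-to-sum identity, $(1 + 0.8\cos a)\sin b = \sin b + 0.4\,[\sin(b+a) + \sin(b-a)]$, so the spectrum shows amplitude 1.0 at `F_PULS`, 0.4 at `F_PULS ± 2 * F_ORB`, and essentially nothing at `F_PULS ± F_ORB`.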

preprint · 2024 · arXiv

Origin of zigzag antiferromagnetic orders in XPS3 (X= Fe, Ni) monolayers

Recently, two monolayer magnetic materials, FePS3 and NiPS3, have been successfully fabricated. Although they share the same atomic structure, the two monolayers exhibit distinct magnetic properties: FePS3 has an out-of-plane zigzag antiferromagnetic (AFM-ZZ) structure, while NiPS3 has an in-plane AFM-ZZ structure. However, no existing theoretical model properly describes their magnetic ground states, owing to an incomplete understanding of their magnetic interactions. Here, by combining first-principles calculations with a newly developed machine learning method, we construct an exact spin Hamiltonian for the two materials. Unlike previous studies, which did not fully account for spin-orbit coupling, we find that the AFM-ZZ ground state in FePS3 is stabilized by competing ferromagnetic nearest-neighbor and antiferromagnetic third-nearest-neighbor exchange interactions, together with single-ion anisotropy. In contrast, the often-ignored nearest-neighbor biquadratic exchange is responsible for the in-plane AFM-ZZ ground state in NiPS3. We additionally calculate the spin-wave spectra of the AFM-ZZ structure in both monolayers from the exact spin Hamiltonian; these spectra can be verified directly by experiment. Our work provides a theoretical framework for the origin of the AFM-ZZ ground state in two-dimensional materials.
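Why competing ferromagnetic $J_1$ and antiferromagnetic $J_3$ favor the zigzag order can be seen from the classical energies of the four standard collinear states on a honeycomb lattice. The sketch below uses the textbook Heisenberg model $H = \sum_{\langle ij \rangle} J_{ij}\, \mathbf{S}_i \cdot \mathbf{S}_j$ with $|\mathbf{S}| = 1$ and $J > 0$ antiferromagnetic; it is not the paper's fitted Hamiltonian, and the single-ion anisotropy and biquadratic terms the abstract identifies as decisive for spin orientation are omitted. The coupling values in the test are hypothetical.

```python
# Classical per-site energy of collinear states on a honeycomb lattice for
# H = sum_<ij> J_ij S_i . S_j with |S| = 1 (J > 0 is antiferromagnetic).
# Each entry: (parallel, antiparallel) neighbor counts in the 1st, 2nd, 3rd shells.
NEIGHBOR_COUNTS = {
    "ferromagnetic": ((3, 0), (6, 0), (3, 0)),
    "neel":          ((0, 3), (6, 0), (0, 3)),
    "zigzag":        ((2, 1), (2, 4), (0, 3)),
    "stripy":        ((1, 2), (2, 4), (3, 0)),
}

def energy_per_site(state, j1, j2=0.0, j3=0.0):
    """Energy per site; the factor 0.5 avoids double-counting bonds shared by two sites."""
    total = 0.0
    for j, (n_par, n_anti) in zip((j1, j2, j3), NEIGHBOR_COUNTS[state]):
        total += 0.5 * j * (n_par - n_anti)
    return total

def ground_state(j1, j2=0.0, j3=0.0):
    """Lowest-energy state among the four collinear candidates."""
    return min(NEIGHBOR_COUNTS, key=lambda s: energy_per_site(s, j1, j2, j3))
```

With $J_2 = 0$ this gives $E_\text{ZZ} = J_1/2 - 3J_3/2$ and $E_\text{FM} = 3J_1/2 + 3J_3/2$, so for $J_1 < 0$ the zigzag state beats all three alternatives exactly when $|J_1| < 3J_3$; for example, `ground_state(-1.0, 0.0, 0.5)` returns `"zigzag"`.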

preprint · 2023 · arXiv

Defending Backdoor Attacks on Vision Transformer via Patch Processing

Vision Transformers (ViTs) have a radically different architecture with significantly less inductive bias than Convolutional Neural Networks. Alongside their performance gains, the security and robustness of ViTs are also important to study. In contrast to many recent works that examine the robustness of ViTs against adversarial examples, this paper investigates a representative causative attack, i.e., backdoor. We first examine the vulnerability of ViTs against various backdoor attacks and find that ViTs are also quite vulnerable to existing attacks. However, we observe that the clean-data accuracy and backdoor attack success rate of ViTs respond distinctively to patch transformations before the positional encoding. Based on this finding, we propose an effective method for ViTs to defend against both patch-based and blending-based trigger backdoor attacks via patch processing. Evaluations on several benchmark datasets, including CIFAR10, GTSRB, and TinyImageNet, show that the proposed defense is very successful in mitigating backdoor attacks on ViTs. To the best of our knowledge, this paper presents the first defensive strategy that utilizes a unique characteristic of ViTs against backdoor attacks. The paper will appear in the Proceedings of the AAAI'23 Conference. This work was initially submitted in November 2021 to CVPR'22, then re-submitted to ECCV'22. The paper was made public in June 2022. The authors sincerely thank all the referees from the Program Committees of CVPR'22, ECCV'22, and AAAI'23.
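A patch transformation applied before positional encoding amounts to rearranging the image's patch grid. The sketch below shows one such transformation (random patch shuffling) and a hypothetical detection rule that flags an input when the model's label is unstable under shuffling. The rule, its `threshold` and `trials` parameters, and the `model` callable (any function that returns a label) are assumptions made for illustration; this is not the paper's exact defense.

```python
import numpy as np

def to_patches(img, p):
    """Split an (H, W, C) image into a (num_patches, p, p, C) stack in row-major order."""
    h, w, c = img.shape
    assert h % p == 0 and w % p == 0, "image size must be a multiple of the patch size"
    return img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4).reshape(-1, p, p, c)

def from_patches(patches, h, w):
    """Inverse of to_patches."""
    _, p, _, c = patches.shape
    return patches.reshape(h // p, w // p, p, p, c).transpose(0, 2, 1, 3, 4).reshape(h, w, c)

def shuffle_patches(img, p, rng):
    """Randomly permute the patch positions of one image."""
    patches = to_patches(img, p)
    return from_patches(patches[rng.permutation(len(patches))], img.shape[0], img.shape[1])

def flag_suspicious(model, img, p=4, trials=10, threshold=0.5, seed=0):
    """Hypothetical rule: flag the input if its predicted label changes under
    patch shuffling in at least `threshold` of the trials."""
    rng = np.random.default_rng(seed)
    base = model(img)
    changed = sum(model(shuffle_patches(img, p, rng)) != base for _ in range(trials))
    return bool(changed / trials >= threshold)
```

Shuffling only permutes pixels, so global image statistics are unchanged; whether the prediction survives depends on how strongly the model relies on where a particular patch (such as a trigger) sits.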

preprint · 2023 · arXiv

Exploration in Model-based Reinforcement Learning with Randomized Reward

Model-based Reinforcement Learning (MBRL) has been widely adopted due to its sample efficiency. However, existing worst-case regret analyses typically require optimistic planning, which is not realistic in general. In contrast, empirical studies motivated by the theory use ensembles of models, which achieve state-of-the-art performance in various test environments. This gap between theory and practice leads us to ask whether a randomized model ensemble guarantees optimism, and hence the optimal worst-case regret. This paper partially answers this question from the perspective of reward randomization, a scarcely explored direction for exploration in MBRL. We show that under the kernelized nonlinear regulator (KNR) model, reward randomization guarantees partial optimism, which in turn yields a near-optimal worst-case regret in terms of the number of interactions. We further extend our theory to generalized function approximation and identify conditions under which reward randomization attains provably efficient exploration. Correspondingly, we propose concrete examples of efficient reward randomization. To the best of our knowledge, our analysis establishes the first worst-case regret analysis of randomized MBRL with function approximation.
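The basic idea of reward randomization can be shown in a small tabular setting: before each planning step, perturb the estimated rewards with zero-mean noise that shrinks as state-action pairs are visited more often, then plan greedily in the learned model. The sketch assumes a finite MDP, Gaussian noise scaled by $\sigma/\sqrt{n(s,a)}$, and value iteration as the planner; it is a generic illustration, not the paper's KNR analysis or its function-approximation algorithm.

```python
import numpy as np

def perturb_rewards(r_hat, counts, sigma, rng):
    """Randomized reward: estimated reward plus zero-mean Gaussian noise whose
    scale shrinks like 1/sqrt(visit count)."""
    scale = sigma / np.sqrt(np.maximum(counts, 1))
    return r_hat + scale * rng.standard_normal(r_hat.shape)

def value_iteration(P, r, gamma=0.9, iters=500):
    """Plan in the learned model. P: (S, A, S) transitions, r: (S, A) rewards."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = r + gamma * (P @ V)  # (S, A, S) @ (S,) -> (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1), V

def plan_with_randomized_reward(P_hat, r_hat, counts, sigma, rng, gamma=0.9):
    """One episode's policy: plan greedily against a freshly perturbed reward."""
    return value_iteration(P_hat, perturb_rewards(r_hat, counts, sigma, rng), gamma)
```

Rarely visited pairs receive large random bonuses or penalties, so across episodes the planner sometimes treats them optimistically; this is the partial optimism the abstract refers to, obtained without an explicit optimistic planner.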

preprint · 2023 · arXiv

Sharper Analysis for Minibatch Stochastic Proximal Point Methods: Stability, Smoothness, and Deviation

Stochastic proximal point (SPP) methods have recently gained attention for stochastic optimization, offering strong convergence guarantees and greater robustness than classic stochastic gradient descent (SGD) at little or no additional computational cost. In this article, we study a minibatch variant of SPP, namely M-SPP, for solving convex composite risk minimization problems. The core contribution is a set of novel excess risk bounds for M-SPP derived through the lens of algorithmic stability theory. In particular, under smoothness and quadratic growth conditions, we show that M-SPP with minibatch size $n$ and iteration count $T$ enjoys a fast in-expectation rate of convergence consisting of an $\mathcal{O}\left(\frac{1}{T^2}\right)$ bias-decaying term and an $\mathcal{O}\left(\frac{1}{nT}\right)$ variance-decaying term. In the small-$n$-large-$T$ setting, this result substantially improves the best known results for SPP-type approaches by revealing the impact of the model's noise level on the convergence rate. In the complementary small-$T$-large-$n$ regime, we provide a two-phase extension of M-SPP that achieves comparable convergence rates. Moreover, we derive a near-tight high-probability bound (over the randomness of data) on the parameter estimation error of a sampling-without-replacement variant of M-SPP. Numerical evidence supports our theoretical predictions when the method is instantiated for Lasso and logistic regression models.
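One M-SPP step replaces a gradient step with an exact proximal step on a minibatch loss: $w_{t+1} = \arg\min_w \frac{1}{2n}\lVert X_b w - y_b\rVert^2 + \frac{1}{2\eta}\lVert w - w_t\rVert^2$. For the squared loss, setting the gradient to zero gives the linear system $(X_b^\top X_b/n + I/\eta)\, w = X_b^\top y_b/n + w_t/\eta$. The sketch below assumes least squares so that each step has this closed form; the paper treats general convex composite losses, where each step requires an inner solver.

```python
import numpy as np

def m_spp_least_squares(X, y, batch_size, n_iters, eta, rng):
    """Minibatch stochastic proximal point for f(w) = mean_i 0.5 * (x_i . w - y_i)^2.

    Each step solves  argmin_w (1/(2n)) ||X_b w - y_b||^2 + (1/(2*eta)) ||w - w_t||^2
    exactly, using the closed form available for the squared loss.
    """
    n_samples, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        idx = rng.choice(n_samples, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        A = Xb.T @ Xb / batch_size + np.eye(d) / eta
        b = Xb.T @ yb / batch_size + w / eta
        w = np.linalg.solve(A, b)
    return w
```

Because the proximal step is solved exactly, a large `eta` does not cause divergence the way a large SGD step size can; this is the robustness property the abstract contrasts with SGD.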

preprint · 2022 · arXiv

$k$-Median Clustering via Metric Embedding: Towards Better Initialization with Differential Privacy

When designing clustering algorithms, the choice of initial centers is crucial for the quality of the learned clusters. In this paper, we develop a new initialization scheme, called HST initialization, for the $k$-median problem in general metric spaces (e.g., discrete spaces induced by graphs), based on constructing a metric embedding tree structure of the data. From the tree, we derive a novel and efficient search algorithm for good initial centers, which can subsequently be used by the local search algorithm. Our proposed HST initialization produces initial centers with lower errors than those from another popular initialization method, $k$-median++, with comparable efficiency. HST initialization can also be extended to the setting of differential privacy (DP) to generate private initial centers. We show that applying DP local search after our private HST initialization improves previous results on the approximation error and approaches the lower bound within a small factor. Experiments justify the theory and demonstrate the effectiveness of our proposed method. Our approach can also be extended to the $k$-means problem.
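For reference, the $k$-median objective and the $k$-median++ baseline that HST initialization is compared against can be sketched in a few lines. This is the baseline only, not the HST construction: each new center is sampled with probability proportional to a point's (unsquared) distance to its nearest existing center, the $k$-median analogue of $k$-means++ seeding. Euclidean data is assumed for simplicity, although the paper targets general metric spaces.

```python
import numpy as np

def kmedian_cost(X, centers):
    """k-median objective: sum of (unsquared) distances to the nearest center."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    return d.min(axis=1).sum()

def kmedian_pp_init(X, k, rng):
    """k-median++ seeding: the next center is sampled with probability proportional to D(x)."""
    idx = [int(rng.integers(len(X)))]
    dist = np.linalg.norm(X - X[idx[0]], axis=1)
    for _ in range(1, k):
        nxt = int(rng.choice(len(X), p=dist / dist.sum()))
        idx.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return X[idx]
```

Points that are already centers have zero distance and therefore zero sampling probability, so the returned centers are distinct whenever the data points are.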

preprint · 2022 · arXiv

A Class of Two-Timescale Stochastic EM Algorithms for Nonconvex Latent Variable Models

The Expectation-Maximization (EM) algorithm is a popular choice for learning latent variable models. Variants of EM were initially introduced that use incremental updates to scale to large datasets and Monte Carlo (MC) approximations to bypass the intractable conditional expectation of the latent data in most nonconvex models. In this paper, we propose a general class of methods called Two-Timescale EM Methods, based on a two-stage approach of stochastic updates, to tackle an essential nonconvex optimization task for latent variable models. We motivate this double dynamic by the variance reduction that each stage of the method provides on both sources of noise: the index sampling of the incremental update and the MC approximation. We establish finite-time and global convergence bounds for nonconvex objective functions. Numerical applications to various models, such as a deformable template for image analysis and nonlinear models for pharmacokinetics, illustrate our findings.
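The stochastic-approximation view of EM that such methods build on can be illustrated with online EM for a 1-D two-component Gaussian mixture with unit variances. Running sufficient statistics are updated with a decaying step size, and the M-step is applied in closed form. This is a simplified single-timescale relative of the proposed Two-Timescale EM; the paper's second, variance-reduced stage is not shown, and the mixture, step-size exponent `alpha`, and batch size are illustrative assumptions.

```python
import numpy as np

def online_em_gmm(data_stream, mu_init, n_steps, batch_size, rng, alpha=0.6):
    """Online EM for a 1-D two-component Gaussian mixture with unit variances."""
    mu = np.array(mu_init, dtype=float)
    pi = np.full(2, 0.5)
    s0, s1 = pi.copy(), pi * mu  # running sufficient statistics E[z_k], E[z_k x]
    for t in range(n_steps):
        x = data_stream(batch_size, rng)
        # E-step on the minibatch: responsibilities under the current parameters
        logp = np.log(pi) - 0.5 * (x[:, None] - mu[None, :]) ** 2
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # Stochastic-approximation (Robbins-Monro) step on the sufficient statistics
        gamma = (t + 1) ** (-alpha)
        s0 = (1 - gamma) * s0 + gamma * resp.mean(axis=0)
        s1 = (1 - gamma) * s1 + gamma * (resp * x[:, None]).mean(axis=0)
        # M-step: closed-form parameters from the statistics
        pi, mu = s0 / s0.sum(), s1 / s0
    return pi, mu

def mixture_stream(n, rng):
    """Hypothetical data source: equal mixture of N(-2, 1) and N(2, 1)."""
    z = rng.random(n) < 0.5
    return np.where(z, rng.normal(-2.0, 1.0, n), rng.normal(2.0, 1.0, n))
```

The minibatch responsibilities introduce the "index sampling" noise mentioned in the abstract; in models without closed-form responsibilities, an MC approximation would replace the exact E-step and add the second noise source.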

preprint · 2022 · arXiv

A Tale of Two Flows: Cooperative Learning of Langevin Flow and Normalizing Flow Toward Energy-Based Model

This paper studies the cooperative learning of two generative flow models, in which the two models are iteratively updated based on jointly synthesized examples. The first flow model is a normalizing flow that transforms an initial simple density into a target density by applying a sequence of invertible transformations. The second flow model is a Langevin flow that runs finite steps of gradient-based MCMC toward an energy-based model. We start by proposing a generative framework that trains an energy-based model with a normalizing flow as an amortized sampler to initialize the MCMC chains of the energy-based model. In each learning iteration, we generate synthesized examples using a normalizing flow initialization followed by a short-run Langevin flow revision toward the current energy-based model. We then treat the synthesized examples as fair samples from the energy-based model and update the model parameters with the maximum likelihood learning gradient, while the normalizing flow learns directly from the synthesized examples by maximizing the tractable likelihood. Under the short-run non-mixing MCMC scenario, the estimation of the energy-based model is shown to follow the perturbation of maximum likelihood, and the short-run Langevin flow and the normalizing flow form a two-flow generator that we call CoopFlow. We provide an understanding of the CoopFlow algorithm through information geometry and show that it is a valid generator, as it converges to a moment matching estimator. We demonstrate that the trained CoopFlow is capable of synthesizing realistic images, reconstructing images, and interpolating between images.
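The "short-run Langevin flow revision" step can be illustrated in one dimension. Samples from an initializer (here a fixed Gaussian standing in for the normalizing flow) are moved toward an energy-based model by the discretized Langevin update $x \leftarrow x - \frac{s^2}{2}\nabla_x E(x) + s\,\xi$ with $\xi \sim \mathcal{N}(0, 1)$. The toy energy $E(x) = x^2/2$, the step size, and the step count are assumptions; only the sampling half of CoopFlow is shown, not the parameter updates of either model.

```python
import numpy as np

def grad_energy(x):
    """Gradient of a toy energy E(x) = x^2 / 2, whose EBM density is standard normal."""
    return x

def langevin_revise(x, n_steps, step, rng):
    """Short-run Langevin flow: x <- x - (step^2 / 2) * dE/dx + step * noise."""
    for _ in range(n_steps):
        x = x - 0.5 * step**2 * grad_energy(x) + step * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(3.0, 0.5, size=5000)  # stand-in for normalizing-flow samples
x = langevin_revise(x0, n_steps=60, step=0.5, rng=rng)
```

With `step=0.5` each update shrinks the mean by a factor of 0.875, so 60 steps move the samples from around 3 to around 0. The discretization also inflates the stationary variance slightly (to about 1.07 rather than 1), a small-scale example of the bias that short-run, non-mixing chains introduce.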

preprint · 2022 · arXiv

Best Subset Selection with Efficient Primal-Dual Algorithm

Best subset selection is considered the "gold standard" for many sparse learning problems. A variety of optimization techniques have been proposed to attack this non-convex and NP-hard problem. In this paper, we investigate the dual forms of a family of $\ell_0$-regularized problems. We develop an efficient primal-dual method based on the primal and dual problem structures. By leveraging dual range estimation together with an incremental strategy, our algorithm can reduce redundant computation and improve the solutions of best subset selection. Theoretical analysis and experiments on synthetic and real-world datasets validate the efficiency and statistical properties of the proposed solutions.
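To make the "gold standard" concrete, the brute-force form of best subset selection fits least squares on every support of size $k$ and keeps the one with the smallest residual sum of squares. Its cost grows as $\binom{p}{k}$, which is why scalable methods such as the primal-dual approach above are needed. This exhaustive search is shown only as a reference point; it is not the paper's algorithm.

```python
import itertools
import numpy as np

def best_subset(X, y, k):
    """Exhaustive best subset selection: least-squares fit over every size-k support."""
    best_rss, best_support = np.inf, None
    for support in itertools.combinations(range(X.shape[1]), k):
        cols = list(support)
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        rss = float(np.sum((y - X[:, cols] @ coef) ** 2))
        if rss < best_rss:
            best_rss, best_support = rss, support
    return best_support, best_rss
```

Even at a modest $p = 100$ and $k = 10$, there are more than $10^{13}$ supports to check.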

preprint · 2022 · arXiv

Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images

Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images. However, RGB-D data is not easily acquired, which limits the development of RGB-D SOD techniques. To alleviate this issue, we present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) that leverages unlabeled RGB images to boost RGB-D saliency detection. We first devise a depth decoupling convolutional neural network (DDCNN), which contains a depth estimation branch and a saliency detection branch. The depth estimation branch is trained with RGB-D images and then used to estimate pseudo depth maps for all unlabeled RGB images, forming paired data. The saliency detection branch fuses RGB and depth features to predict RGB-D saliency. The whole DDCNN then serves as the backbone of a teacher-student framework for semi-supervised learning. Moreover, we introduce a consistency loss on the intermediate attention and saliency maps for the unlabeled data, as well as supervised depth and saliency losses for the labeled data. Experimental results on seven widely used benchmark datasets demonstrate that our DDCNN outperforms state-of-the-art methods both quantitatively and qualitatively. We also demonstrate that our semi-supervised DS-Net can further improve performance, even when using RGB images with pseudo depth maps.

preprint · 2022 · arXiv

Boosting the Confidence of Generalization for $L_2$-Stable Randomized Learning Algorithms

Exponential generalization bounds with near-tight rates have recently been established for uniformly stable learning algorithms. The notion of uniform stability, however, is stringent in the sense that it is independent of the data-generating distribution. Under weaker, distribution-dependent notions of stability such as hypothesis stability and $L_2$-stability, the literature suggests that only polynomial generalization bounds are possible in general. The present paper addresses this long-standing tension between the two regimes of results and makes progress towards relaxing it within the classic framework of confidence boosting. To this end, we first establish an in-expectation first-moment generalization error bound for potentially randomized learning algorithms with $L_2$-stability, from which we then show that a properly designed subbagging process leads to near-tight exponential generalization bounds over the randomness of both data and algorithm. We further instantiate these generic results for stochastic gradient descent (SGD) to derive improved high-probability generalization bounds for convex and non-convex optimization problems with natural time-decaying learning rates, which could not be proved with existing results based on hypothesis stability or uniform stability.

preprint · 2022 · arXiv

Breaking the Linear Error Barrier in Differentially Private Graph Distance Release

Releasing all pairwise shortest path (APSP) distances between vertices of general graphs under weight Differential Privacy (DP) is known to be challenging. In previous work (Sealfon 2016), adding Laplace noise to each edge weight or to each output distance to achieve DP with a fixed budget yields, with high probability, a maximal absolute error among all published pairwise distances of roughly $O(n)$, where $n$ is the number of nodes. This error was shown to be reducible for some special graphs, but doing so is hard for general graphs. Whether the approximation error can be reduced to sublinear in $n$ was therefore posed as an interesting open problem. We break the linear barrier on the distance approximation error of the previous result by proposing an algorithm that privately releases a constructed synthetic graph. Computing all pairwise distances on the constructed graph introduces only $\tilde O(n^{1/2})$ error in answering all pairwise shortest path distances for a fixed privacy parameter. Our method is based on a novel graph diameter (link length) augmentation that constructs "shortcuts" for the paths. By adding a set of shortcut edges to the original graph, we show that every node pair has a shortest path with link length $\tilde O(n^{1/2})$. Then, by adding noise with a positive mean to the edge weights, we show that the new graph is differentially private and can be published to answer all pairwise shortest path distances with $\tilde O(n^{1/2})$ approximation error using standard APSP computation. Additionally, we consider graphs with a small feedback vertex set number. A feedback vertex set (FVS) of a graph is a set of vertices whose removal leaves a graph without cycles, and the feedback vertex set number of a graph, $k$, is the size of a smallest feedback vertex set. We propose a DP algorithm with error rate $\tilde O(k)$.
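The input-perturbation baseline described above (Laplace noise on each edge weight, followed by ordinary APSP on the noisy graph) can be sketched directly. The sketch assumes an undirected graph given as a symmetric weight matrix with `np.inf` for missing edges, and neighboring weight vectors at $\ell_1$ distance at most 1, so $\text{Laplace}(1/\varepsilon)$ noise per edge gives $\varepsilon$-DP. Clipping negative weights to zero is post-processing and does not affect privacy. This illustrates the baseline, not the paper's shortcut construction.

```python
import numpy as np

def floyd_warshall(W):
    """All-pairs shortest paths; W[i, j] is the edge weight, np.inf if there is no edge."""
    D = W.copy()
    np.fill_diagonal(D, 0.0)
    for k in range(len(D)):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

def laplace_edge_release(W, epsilon, rng):
    """Baseline input perturbation: add Laplace(1/epsilon) noise once per undirected edge."""
    n = len(W)
    iu = np.triu_indices(n, k=1)
    upper = W[iu].copy()
    has_edge = np.isfinite(upper)
    upper[has_edge] += rng.laplace(0.0, 1.0 / epsilon, size=has_edge.sum())
    upper = np.maximum(upper, 0.0)  # clipping is post-processing, so DP is preserved
    out = np.full((n, n), np.inf)
    out[iu] = upper
    out[iu[1], iu[0]] = upper       # mirror to keep the graph undirected
    return out
```

A released distance sums the noise of every edge on its path, so on a path with $\Theta(n)$ edges the error grows with the number of hops. The shortcut edges in the proposed method reduce this hop count to $\tilde O(n^{1/2})$.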

preprint · 2022 · arXiv

Catoni-style Confidence Sequences under Infinite Variance

In this paper, we extend confidence sequences to settings where the variance of the data-generating distribution does not exist or is infinite. Confidence sequences furnish confidence intervals that are valid at arbitrary data-dependent stopping times, and so have a wide range of applications. We first establish a lower bound on the width of Catoni-style confidence sequences in the finite variance case to highlight the looseness of existing results. Next, we derive tight Catoni-style confidence sequences for data distributions with a relaxed bounded $p$-th moment, where $p \in (1,2]$, and strengthen the results for the finite variance case $p = 2$. The derived results are shown to be better than confidence sequences obtained using the Dubins-Savage inequality.
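The "Catoni-style" construction starts from Catoni's M-estimator of the mean: the root $\theta$ of $\sum_i \psi(\alpha(x_i - \theta)) = 0$, where $\psi(x) = \log(1 + x + x^2/2)$ for $x \ge 0$ is extended as an odd function. Because $\psi$ grows only logarithmically, single extreme observations have limited influence. The sketch below computes the classical (finite-variance) point estimate by bisection with a user-chosen scale `alpha`; the paper's bounded-$p$-th-moment variant and its time-uniform confidence sequences are not reproduced here.

```python
import numpy as np

def catoni_psi(x):
    """Catoni's influence function: log(1 + |x| + x^2 / 2), extended as an odd function."""
    return np.sign(x) * np.log1p(np.abs(x) + 0.5 * x**2)

def catoni_mean(x, alpha, n_iter=200):
    """Catoni M-estimate: the root in theta of sum_i psi(alpha * (x_i - theta)), by bisection."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()  # the sum is >= 0 at min(x) and <= 0 at max(x)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if catoni_psi(alpha * (x - mid)).sum() > 0:
            lo = mid  # the sum decreases in theta, so the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, on Student-t data with 1.5 degrees of freedom (finite mean, infinite variance) the estimate stays close to the true center 0, which is the heavy-tailed regime that the paper's confidence sequences target.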

preprint · 2022 · arXiv

CGAR: Critic Guided Action Redistribution in Reinforcement Learning

Training a game-playing reinforcement learning agent requires many interactions with the environment. Uninformed random exploration can waste time and resources, so reducing this waste is essential. In this paper, we show that in the setting of off-policy actor-critic algorithms, the critic can achieve expected discounted rewards greater than or at least equal to those of the actor. Thus, the Q value predicted by the critic is a better signal for redistributing the action originally sampled from the policy distribution predicted by the actor. This paper introduces the novel Critic Guided Action Redistribution (CGAR) algorithm and tests it on the OpenAI MuJoCo tasks. The experimental results demonstrate that our method improves sample efficiency and achieves state-of-the-art performance. Our code can be found at https://github.com/tairanhuang/CGAR.

preprint · 2022 · arXiv

Channel Estimation for Wideband MmWave MIMO OFDM System Exploiting Block Sparsity

In this letter, we investigate time-domain channel estimation for wideband millimeter wave (mmWave) MIMO OFDM systems. By transmitting frequency-domain pilot symbols with different beamforming vectors, we observe that time-domain mmWave MIMO channels exhibit delay sparsity and, in particular, block sparsity across different spatial directions. We then propose a time-domain channel estimation scheme exploiting block sparsity (TDCEBS), which at each iteration selects the nonzero block achieving the largest projection of the residual. We evaluate system performance using QuaDRiGa, which is recommended by 5G New Radio for generating wideband mmWave MIMO channels. Simulation results verify the effectiveness of the proposed TDCEBS scheme, which outperforms existing schemes.
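Selecting, at each iteration, the block with the largest projection of the residual is the greedy rule of block orthogonal matching pursuit (BOMP). The real-valued sketch below uses the correlation norm $\lVert A_b^{H} r\rVert$ as the block score, refits by least squares on all chosen blocks, and assumes contiguous, equal-sized blocks. It is a generic BOMP illustration, not the paper's TDCEBS implementation, pilot design, or QuaDRiGa setup.

```python
import numpy as np

def block_omp(A, y, block_size, n_blocks):
    """Block OMP: each iteration adds the block whose columns best explain the residual."""
    m, n = A.shape
    assert n % block_size == 0, "columns must split into equal blocks"
    blocks = [list(range(i, i + block_size)) for i in range(0, n, block_size)]
    chosen, residual = [], y.copy()
    for _ in range(n_blocks):
        scores = [np.linalg.norm(A[:, b].conj().T @ residual) for b in blocks]
        for j in chosen:
            scores[j] = -np.inf  # never pick the same block twice
        chosen.append(int(np.argmax(scores)))
        cols = [c for j in chosen for c in blocks[j]]
        coef, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
        residual = y - A[:, cols] @ coef  # residual is orthogonal to all chosen blocks
    x = np.zeros(n, dtype=coef.dtype)
    x[cols] = coef
    return x, sorted(chosen)
```

The least-squares refit over all selected blocks is what makes the pursuit "orthogonal": a block that is already explained contributes nothing to later residuals.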

preprint · 2022 · arXiv

Communication-Efficient TeraByte-Scale Model Training Framework for Online Advertising

Click-Through Rate (CTR) prediction is a crucial component of the online advertising industry. To produce personalized CTR predictions, an industry-level CTR prediction model commonly takes as input a high-dimensional (e.g., 100 billion or 1 trillion features) sparse vector encoded from query keywords, user portraits, etc. As a result, the model requires terabyte-scale parameters to embed the high-dimensional input. A hierarchical distributed GPU parameter server has been proposed to enable GPUs with limited memory to train such massive networks by leveraging CPU main memory and SSDs as secondary storage. We identify two major challenges in the existing GPU training framework for massive-scale ad models and propose a collection of optimizations to tackle them: (a) GPUs, CPUs, and SSDs communicate rapidly with each other during training, and the connections between GPUs and CPUs are non-uniform due to the hardware topology, so the data communication route should be optimized according to that topology; (b) GPUs in different computing nodes frequently communicate to synchronize parameters, and this communication must be optimized for the distributed system to scale. In this paper, we propose a hardware-aware training workflow that couples the hardware topology into the algorithm design. To reduce the extensive communication between computing nodes, we introduce a $k$-step model merging algorithm for the popular Adam optimizer and provide its convergence rate in non-convex optimization. To the best of our knowledge, this is the first application of a $k$-step adaptive optimization method in industrial-level CTR model training. Numerical results on real-world data confirm that the optimized system design considerably reduces the training time of the massive model, with essentially no loss in accuracy.

preprint · 2022 · arXiv

Discriminative Similarity for Data Clustering

Similarity-based clustering methods separate data into clusters according to the pairwise similarity between data points, and this pairwise similarity is crucial for their performance. In this paper, we propose Clustering by Discriminative Similarity (CDS), a novel method that learns discriminative similarity for data clustering. CDS learns an unsupervised similarity-based classifier from each data partition and searches for the optimal partition of the data by minimizing the generalization error of the learned classifiers associated with the data partitions. Through generalization analysis via Rademacher complexity, the generalization error bound for the unsupervised similarity-based classifier is expressed as the sum of discriminative similarity between data from different classes. It is proved that the derived discriminative similarity can also be induced by the integrated squared error bound for kernel density classification. To evaluate the performance of the proposed discriminative similarity, we propose a new clustering method that uses a kernel as the similarity function, CDS via unsupervised kernel classification (CDSK), whose effectiveness is demonstrated by experimental results.

preprint · 2022 · arXiv

Distances Release with Differential Privacy in Tree and Grid Graph

Data about individuals may contain private and sensitive information. Differential privacy (DP) was proposed to protect the privacy of each individual while preserving useful information about a population. Sealfon (2016) introduced a private graph model in which the graph topology is assumed to be public while the weight information is assumed to be private. That model can express hidden congestion patterns in a known transportation system. In this paper, we revisit the problem of privately releasing approximate distances between all pairs of vertices studied in (Sealfon 2016). Our goal is to minimize the additive error, namely the difference between the released distance and the actual distance, under the private setting. We propose improved solutions to that problem for several cases. For privately releasing all-pairs distances, we show that for a tree with depth $h$, we can release all-pairs distances with additive error $O(\log^{1.5} h \cdot \log^{1.5} V)$ for a fixed privacy parameter, where $V$ is the number of vertices in the tree; this improves the previous error bound $O(\log^{2.5} V)$, since $h$ can be as small as $O(\log V)$. Our result implies that a $\log V$ factor is saved, and that the additive error on a tree can be smaller than the error on an array/path. Additionally, for grid graphs with arbitrary edge weights, we propose a method to release all-pairs distances with additive error $\tilde O(V^{3/4})$ for fixed privacy parameters. On the application side, many cities, such as Manhattan, are composed of horizontal streets and vertical avenues, which can be modeled as a grid graph.

preprint · 2022 · arXiv

Fast ABC-Boost: A Unified Framework for Selecting the Base Class in Multi-Class Classification

The work in ICML'09 showed that the derivatives of the classical multi-class logistic regression loss function could be rewritten in terms of a pre-chosen "base class", and applied the new derivatives in the popular boosting framework. To make use of the new derivatives, one must have a strategy to identify/choose the base class at each boosting iteration. The idea of "adaptive base class boost" (ABC-Boost) in ICML'09 adopted a computationally expensive "exhaustive search" strategy for the base class at each iteration. It has been well demonstrated that ABC-Boost, when integrated with trees, can achieve substantial improvements in many multi-class classification tasks. Furthermore, the work in UAI'10 derived the explicit second-order tree split gain formula, which typically improved classification accuracy considerably compared with using only first-order information for tree splitting, for both multi-class and binary-class classification tasks. In this paper, we develop a unified framework for effectively selecting the base class by introducing a series of ideas to improve the computational efficiency of ABC-Boost. Our framework has parameters $(s,g,w)$. At each boosting iteration, we search only the "$s$-worst classes" (instead of all classes) to determine the base class. We also allow a "gap" $g$ when conducting the search; that is, we search for the base class only every $g+1$ iterations. We further allow a "warm-up" stage by starting the search only after $w$ boosting iterations. The parameters $s$, $g$, and $w$ can be viewed as tunable parameters, and certain combinations of $(s,g,w)$ may even lead to better test accuracy than the "exhaustive search" strategy. Overall, our proposed framework provides a robust and reliable scheme for implementing ABC-Boost in practice.
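The $(s, g, w)$ schedule can be expressed as a small scheduling function. The hypothetical helper below returns the candidate base classes to search at a 0-based boosting iteration `t`, or `None` when no search happens. It assumes that "worst" means the largest current per-class training loss and that the first search occurs at iteration `w`; the sketch does not specify what the booster does during warm-up or inside a gap, beyond not running a new search.

```python
import numpy as np

def base_class_candidates(t, class_losses, s, g, w):
    """Hypothetical (s, g, w) schedule for choosing which classes to try as the base class.

    Returns a sorted list of the s classes with the largest loss when a search is due
    at 0-based iteration t, or None when no search happens (warm-up or inside a gap).
    """
    if t < w:
        return None  # warm-up stage: no base-class search yet
    if (t - w) % (g + 1) != 0:
        return None  # inside a gap: reuse the previously selected base class
    order = np.argsort(class_losses)[::-1]  # classes with the largest loss first
    return sorted(order[:s].tolist())
```

With $K$ classes, the number of candidate evaluations per searched iteration drops from $K$ to $s$, and searches happen only once every $g + 1$ iterations after warm-up. This is where the computational savings over exhaustive search come from.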

preprint · 2022 · arXiv

Finding Control Synthesis for Kinematic Shortest Paths

This work analyzes the properties of the shortest path control synthesis for rigid body systems. The systems we focus on have only kinematic constraints. However, even for seemingly simple systems and constraints, the shortest paths for generic rigid body systems were found only recently, especially for 3D systems. Based on Pontryagin's Maximum Principle (PMP) and Lagrange equations, we present the necessary conditions for optimal switches, which form the control synthesis boundaries. We formally show that the shortest paths for nearby configurations have similar adjoint functions and parameters, i.e., Lagrange multipliers. We further show that the gradients of the necessary condition equation can be used to verify whether a configuration is inside a control synthesis region or on its boundary. We present a procedure to find the shortest kinematic paths and control synthesis using the gradients of the control constraints. Given the shortest path and the corresponding control sequences, the optimal control sequence for nearby configurations can be derived if and only if they belong to the same control synthesis region. The proposed procedure works for both 2D and 3D rigid body systems. We use a 2D Dubins vehicle system to verify the correctness of the proposed approach. More verifications and experiments will be presented in extensions of this work.

preprint · 2022 · arXiv

From a few Accurate 2D Correspondences to 3D Point Clouds

Key points, correspondences, projection matrices, point clouds, and dense clouds form the skeleton of image-based 3D reconstruction, and among them point clouds play an important role in generating a realistic and natural model of a reconstructed 3D object. To achieve a good 3D reconstruction, the point clouds must cover almost the entire surface of the object. In this article, with the main purpose of building point clouds that cover the entire surface of the object, we propose a new feature named the geodesic feature, or geo-feature. Based on the geo-feature, if several initial world points on the object's surface are given along with all accurately estimated projection matrices, new world points on the geodesics connecting any two of these given world points can be reconstructed. The regions of the surface bordered by these initial world points are then covered by the point clouds. Thus, if the initial world points are spread around the surface, the point clouds will cover the entire surface. This article also proposes a new method to estimate the world points and projection matrices from their correspondences. This method derives closed-form and iterative solutions for the world points and projection matrices and proves that when the number of world points is less than seven and the number of images is at least five, the proposed solutions are globally optimal. We propose an algorithm named World points from their Correspondences (WPfC) to estimate the world points and projection matrices from their correspondences, and another algorithm named Creating Point Clouds (CrPC) to create the point clouds from the world points and projection matrices given by the first algorithm.

preprint2022arXiv

GCWSNet: Generalized Consistent Weighted Sampling for Scalable and Accurate Training of Neural Networks

We develop "generalized consistent weighted sampling" (GCWS) for hashing the "powered-GMM" (pGMM) kernel, which has a tuning parameter $p$. GCWS turns out to provide a numerically stable scheme for applying a power transformation to the original data, regardless of the magnitude of $p$ and of the data. The power transformation is often effective for boosting performance, in many cases considerably. We feed the hashed data to neural networks on a variety of public classification datasets and name our method "GCWSNet". Our extensive experiments show that GCWSNet often improves classification accuracy. Furthermore, the experiments show that GCWSNet converges substantially faster. In fact, GCWS often reaches a reasonable accuracy with (less than) one epoch of training. This property is highly desirable because many applications, such as advertisement click-through rate (CTR) prediction models, or data streams (i.e., data seen only once), often train for just one epoch. Another beneficial side effect is that the computations of the first layer of the neural network become additions instead of multiplications, because the input data become binary (and highly sparse). Empirical comparisons with (normalized) random Fourier features (NRFF) are provided. We also propose to reduce the model size of GCWSNet by count-sketch and develop the theory for analyzing the impact of count-sketch on the accuracy of GCWS. Our analysis shows that an "8-bit" strategy should work well, in that we can always apply an 8-bit count-sketch hashing on the output of GCWS hashing without hurting accuracy much. There are many other ways to take advantage of GCWS when training deep neural networks. For example, one can apply GCWS on the outputs of the last layer to boost the accuracy of trained deep neural networks.
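As a rough illustration of how consistent weighted sampling can hash a power-transformed vector stably, here is a minimal sketch. It assumes the GMM construction that splits each real vector into its positive and negative parts, and it uses Ioffe's ICWS sampler with the power applied in log space ($p \log u$ instead of $u^p$, for $p > 0$). This is our reading of the idea, not the paper's code, and all function names are hypothetical. The fraction of matching hashes between two vectors estimates their pGMM similarity.

```python
import numpy as np


def to_nonnegative(x):
    """Split a real vector into its positive and negative parts (length 2d), as in the GMM kernel."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)])


def pgmm_kernel(x, y, p=1.0):
    """pGMM similarity: sum of elementwise minima over sum of elementwise maxima of u**p and v**p."""
    u, v = to_nonnegative(x) ** p, to_nonnegative(y) ** p
    return np.minimum(u, v).sum() / np.maximum(u, v).sum()


def gcws_hashes(x, p=1.0, num_hashes=256, seed=0):
    """ICWS-style consistent weighted sampling of u**p, computed in log space (hypothetical helper).

    Every vector must be hashed with the same seed so that all vectors share the
    random draws (r, c, beta); that sharing is what makes the sampling consistent.
    x must have at least one nonzero entry.
    """
    u = to_nonnegative(x)
    nz = u > 0
    rng = np.random.default_rng(seed)
    r = rng.gamma(2.0, 1.0, size=(num_hashes, u.size))
    c = rng.gamma(2.0, 1.0, size=(num_hashes, u.size))
    beta = rng.uniform(0.0, 1.0, size=(num_hashes, u.size))
    # log(u**p) = p * log(u): the power itself never has to be evaluated.
    log_w = np.full(u.size, -np.inf)
    log_w[nz] = p * np.log(u[nz])
    t = np.floor(log_w / r + beta)
    log_a = np.log(c) - r * (t - beta) - r   # log of c / (exp(r * (t - beta)) * exp(r))
    log_a[:, ~nz] = np.inf                    # zero entries can never be selected
    i_star = np.argmin(log_a, axis=1)
    t_star = t[np.arange(num_hashes), i_star].astype(np.int64)
    return np.stack([i_star, t_star], axis=1)  # one (index, t) pair per hash


# Usage: the collision rate of the hashes estimates the pGMM similarity.
rng = np.random.default_rng(1)
x = rng.standard_normal(10)
y = x + 0.5 * rng.standard_normal(10)
hx = gcws_hashes(x, p=2.0, num_hashes=4000)
hy = gcws_hashes(y, p=2.0, num_hashes=4000)
collision_rate = np.mean(np.all(hx == hy, axis=1))
print(collision_rate, pgmm_kernel(x, y, p=2.0))
```

Working in log space is what keeps large values of $p$ from overflowing: only $p \log u$ is ever formed, never $u^p$.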

preprint2022arXiv

Integrity Authentication in Tree Models

Tree models are very widely used in machine learning and data mining practice. In this paper, we study the problem of model integrity authentication in tree models. In general, the task of model integrity authentication is the design and implementation of mechanisms for checking or detecting whether the model deployed for end users has been tampered with or compromised, e.g., by malicious modifications. We propose an authentication framework that enables model builders or distributors to embed a signature into a tree model and to authenticate the existence of the signature with only a small number of black-box queries to the model. To the best of our knowledge, this is the first study of signature embedding in tree models. Our proposed method simply locates a collection of leaves and modifies their prediction values; it requires neither training/testing data nor re-training. Experiments on a large number of public classification datasets confirm that the proposed signature embedding process has a high success rate while introducing only a minimal loss in prediction accuracy.

preprint2022arXiv

Intriguing Magnetoelectric Effect in Two-dimensional Ferromagnetic/Perovskite Oxide Ferroelectric Heterostructure

Two-dimensional (2D) magnets have broad application prospects in spintronics, but how to control them effectively with a small electric field is still an open issue. Here we propose that 2D magnets can be efficiently controlled in a multiferroic heterostructure composed of a 2D magnetic material and a perovskite oxide ferroelectric (POF) whose dielectric polarization is easily flipped under a small electric field. We illustrate the feasibility of this strategy in the bilayer CrI3/BiFeO3(001) heterostructure using first-principles calculations. Unlike traditional POF multiferroic heterostructures, which have strong interface interactions, we find that the interface interaction between CrI3 and BiFeO3(001) is of van der Waals type. Nevertheless, the heterostructure exhibits particularly strong magnetoelectric coupling: the bilayer CrI3 can be efficiently switched between ferromagnetic and antiferromagnetic states by the polarization state of BiFeO3(001). We also discover a competing effect between electron doping and the additional electric field on the interlayer exchange coupling of CrI3, which is responsible for the magnetic phase transition. Our results provide a new avenue for tuning 2D magnets with a small electric field.

preprint2022arXiv

Intrinsic anomalous spin Hall effect

Charge-spin interconversion in magnetic materials is investigated by using first-principles calculations. In addition to the conventional spin Hall effect (SHE) that requires mutual orthogonality of the charge current, spin-flow direction, and spin polarization, the recently proposed anomalous SHE (ASHE) is confirmed in Mn2Au and WTe2. The interaction of the order parameter with conduction electrons leads to sizeable nonzero spin Berry curvatures that give rise to nonzero anomalous spin Hall conductivity (ASHC). Our calculations show that the ASHE is intrinsic and originates from the order-parameter-controlled spin-orbit interaction, which generates an extra anomalous effective field. A useful relationship among the order parameter, spin Berry curvature and ASHC is revealed. Our findings open a new avenue for arbitrary-type spin current generation and detection.

preprint2022arXiv

Layer-wise and Dimension-wise Locally Adaptive Federated Learning

In the emerging paradigm of Federated Learning (FL), a large number of clients, such as mobile devices, are used to train possibly high-dimensional models on their respective data. Combining (dimension-wise) adaptive gradient methods (e.g., Adam, AMSGrad) with FL has been an active direction and has been shown to outperform traditional SGD-based FL in many cases. In this paper, we focus on training federated deep neural networks and propose a novel FL framework that further introduces layer-wise adaptivity into the local model updates. Our framework can be applied to locally adaptive FL methods, including two recent algorithms, Mime and Fed-AMS. Theoretically, we provide a convergence analysis of our layer-wise FL methods, coined Fed-LAMB and Mime-LAMB, which matches the convergence rate of state-of-the-art results in FL and exhibits linear speedup in the number of workers. Experimental results on various datasets and models, under both IID and non-IID local data settings, show that both Fed-LAMB and Mime-LAMB achieve faster convergence and better generalization than various recent adaptive FL methods.
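To make "layer-wise adaptivity on top of dimension-wise adaptivity" concrete, here is a minimal sketch of a LAMB-style local step: an Adam-like per-coordinate direction, rescaled per layer by the trust ratio $\|w\|/\|d\|$. The function names, the hyperparameters, the omission of bias correction, and the plain server averaging are assumptions for illustration; the abstract does not give the exact Fed-LAMB or Mime-LAMB update.

```python
import numpy as np


def lamb_local_step(params, grads, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-6):
    """One LAMB-style step over a dict of layers (hypothetical helper, no bias correction)."""
    new_params = {}
    for name, w in params.items():
        g = grads[name]
        m, v = state.get(name, (np.zeros_like(w), np.zeros_like(w)))
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        state[name] = (m, v)
        direction = m / (np.sqrt(v) + eps)  # dimension-wise adaptivity (Adam-like)
        w_norm, d_norm = np.linalg.norm(w), np.linalg.norm(direction)
        # Layer-wise adaptivity: rescale so this layer's step norm is lr * ||w||.
        trust = w_norm / d_norm if w_norm > 0 and d_norm > 0 else 1.0
        new_params[name] = w - lr * trust * direction
    return new_params


def server_average(client_params):
    """Average each layer across clients (plain FedAvg-style aggregation, assumed here)."""
    return {name: np.mean([p[name] for p in client_params], axis=0)
            for name in client_params[0]}


# Usage: two layers of very different scale receive steps proportional to their own norms.
rng = np.random.default_rng(0)
params = {"layer1": 2.0 * np.ones(4), "layer2": 0.1 * np.ones(3)}
grads = {name: rng.standard_normal(w.shape) for name, w in params.items()}
state = {}
new_params = lamb_local_step(params, grads, state, lr=0.01)
```

In a federated round, each client would call `lamb_local_step` several times on its own data before `server_average` combines the resulting models.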

preprint2022arXiv

Low-dose CT reconstruction by self-supervised learning in the projection domain

To minimize patients' exposure to excessive X-ray radiation, low-dose computed tomography (LDCT) has become a distinct trend in radiology. However, while lowering the radiation dose reduces the risk to the patient, it also increases noise and artifacts, compromising image quality and clinical diagnosis. Most supervised learning methods require paired CT images, but such pairs are unlikely to be available in the clinic. We present a self-supervised learning model (Noise2Projection) that fully exploits the raw projection images to reduce noise and improve the quality of reconstructed LDCT images. Unlike existing self-supervised algorithms, the proposed method requires only noisy CT projection images and reduces noise by exploiting the correlation between nearby projection images. We trained and tested the model on clinical data, and the quantitative and qualitative results suggest that our model can effectively reduce LDCT image noise while also drastically removing artifacts in LDCT images.

preprint2022arXiv

MISF: Multi-level Interactive Siamese Filtering for High-Fidelity Image Inpainting

Despite significant progress, existing deep generative inpainting methods are still far from real-world application because they generalize poorly across different scenes. As a result, the generated images usually contain artifacts, or the filled pixels differ greatly from the ground truth. Image-level predictive filtering is a widely used image restoration technique that predicts suitable kernels adaptively for different input scenes. Inspired by this inherent advantage, we explore the possibility of addressing image inpainting as a filtering task. To this end, we first study the advantages and challenges of image-level predictive filtering for image inpainting: the method can preserve local structures and avoid artifacts, but fails to fill large missing areas. Then, we propose semantic filtering, which performs filtering at the deep feature level and fills in the missing semantic information but fails to recover the details. To address these issues while retaining the respective advantages, we propose a novel filtering technique, Multi-level Interactive Siamese Filtering (MISF), which contains two branches: a kernel prediction branch (KPB) and a semantic & image filtering branch (SIFB). The two branches are interactively linked: SIFB provides multi-level features for KPB, while KPB predicts dynamic kernels for SIFB. As a result, the final method takes advantage of effective semantic and image-level filling for high-fidelity inpainting. We validate our method on three challenging datasets: Dunhuang, Places2, and CelebA. Our method outperforms state-of-the-art baselines on four metrics: L1, PSNR, SSIM, and LPIPS. The code and model are released at https://github.com/tsingqguo/misf.

preprint2022arXiv

Multi-view Geometry: Correspondences Refinement Based on Algebraic Properties

Correspondence estimation, or feature matching, is a key step in image-based 3D reconstruction. In this paper, we propose two algebraic properties of correspondences. The first is a rank-deficient matrix constructed from the correspondences of at least nine key points on two images (two-view correspondences), and the second is a rank-deficient matrix built from the correspondences of six key points on at least five images (multi-view correspondences). To our knowledge, there were no theoretical results for multi-view correspondences prior to this paper. For obtaining accurate correspondences, multi-view correspondences appear to be more useful than two-view correspondences. From these two algebraic properties, we propose a refinement algorithm for correspondences that combines correspondence refinement, outlier recognition, and recovery of missing key points. Real experiments from a project reconstructing a Buddha statue show that the proposed refinement algorithm reduces the average correspondence estimation error from 77 pixels to 55 pixels. This substantial drop validates our results.

preprint2022arXiv

NL2GDPR: Automatically Develop GDPR Compliant Android Application Features from Natural Language

Recent privacy leakage incidents and stricter policy regulations demand a much higher standard of compliance from companies and mobile apps. However, such obligations also pose significant challenges for app developers in complying with regulations that cover various perspectives, activities, and roles, especially for small companies and developers who are less experienced in this matter or have limited resources. To address these hurdles, we develop an automatic tool, NL2GDPR, which generates policies from the developer's natural-language descriptions while also ensuring that the app's functionalities comply with the General Data Protection Regulation (GDPR). NL2GDPR is built on OIA (Open Information Annotation), an information extraction tool developed by Baidu Cognitive Computing Lab. At its core, NL2GDPR is a privacy-centric information extraction model, appended with a GDPR policy finder and a policy generator. We perform a comprehensive study of the challenges in extracting privacy-centric information and generating privacy policies, and exploit optimizations for this specific task. With NL2GDPR, we achieve 92.9%, 95.2%, and 98.4% accuracy in correctly identifying GDPR policies related to personal data storage, processing, and sharing, respectively. To the best of our knowledge, NL2GDPR is the first tool that allows a developer to automatically generate GDPR-compliant policies, requiring only a natural-language description of the app features. Note that other, non-GDPR-related features may be integrated with the generated features to build a complex app.

preprint2022arXiv

Noisy $\ell^{0}$-Sparse Subspace Clustering on Dimensionality Reduced Data

Sparse subspace clustering methods with sparsity induced by the $\ell^{0}$-norm, such as $\ell^{0}$-Sparse Subspace Clustering ($\ell^{0}$-SSC) (Yang et al., 2016), have been demonstrated to be more effective than their $\ell^{1}$ counterparts, such as Sparse Subspace Clustering (SSC) (Elhamifar and Vidal, 2013). However, the theoretical analysis of $\ell^{0}$-SSC is restricted to clean data that lie exactly in subspaces. Real data often suffer from noise and may only lie close to subspaces. In this paper, we show that an optimal solution to the optimization problem of noisy $\ell^{0}$-SSC achieves the subspace detection property (SDP), a key element with which data from different subspaces are separated, under deterministic and semi-random models. Our results provide, for the first time, a theoretical guarantee on the correctness of noisy $\ell^{0}$-SSC in terms of SDP on noisy data, which reveals the advantage of noisy $\ell^{0}$-SSC in terms of a much less restrictive condition on subspace affinity. To improve the efficiency of noisy $\ell^{0}$-SSC, we propose Noisy-DR-$\ell^{0}$-SSC, which provably recovers the subspaces on dimensionality-reduced data. Noisy-DR-$\ell^{0}$-SSC first projects the data onto a lower-dimensional space by random projection, then performs noisy $\ell^{0}$-SSC on the projected data for improved efficiency. Experimental results demonstrate the effectiveness of Noisy-DR-$\ell^{0}$-SSC.

preprint2022arXiv

Offline Change Detection under Contamination

In this work, we propose a non-parametric and robust change detection algorithm to detect multiple change points in time series data under contamination. The contamination model is sufficiently general that the most common model used in the context of change detection, the Huber contamination model, is a special case. The contamination model is also oblivious and arbitrary. The change detection algorithm is designed for the offline setting, where the objective is to detect changes once all data have been received. We make only weak moment assumptions on the inliers (uncorrupted data) in order to handle a large class of distributions. The robust scan statistic in the algorithm is built from mean estimators based on influence functions. We establish the consistency of the estimated change point indices as the number of samples increases, and provide empirical evidence supporting the consistency results.
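The abstract names the ingredients (a robust mean estimator built on an influence function, plugged into a scan statistic) but not the exact procedure. The sketch below is a single-change-point simplification under our own assumptions: Catoni's influence function $\psi(z)=\operatorname{sign}(z)\log(1+|z|+z^2/2)$ stands in for the paper's estimators, and the scale `alpha`, the minimum segment length `min_seg`, and the CUSUM-style weight $\sqrt{k(n-k)/n}$ are hypothetical choices. The paper itself handles multiple change points.

```python
import numpy as np


def catoni_psi(z):
    """Catoni's influence function: it grows only logarithmically, so outliers have bounded pull."""
    return np.sign(z) * np.log1p(np.abs(z) + 0.5 * z ** 2)


def catoni_mean(x, alpha=1.0, iters=60):
    """Robust mean: solve sum(psi(alpha * (x - theta))) = 0 for theta by bisection."""
    lo, hi = float(np.min(x)), float(np.max(x))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.sum(catoni_psi(alpha * (x - mid))) > 0:
            lo = mid  # the sum decreases in theta, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def robust_scan(x, alpha=1.0, min_seg=10):
    """Single change point: the split k maximizing sqrt(k(n-k)/n) * |robust mean difference|."""
    x = np.asarray(x, dtype=float)
    n = x.size
    best_k, best_stat = None, -np.inf
    for k in range(min_seg, n - min_seg + 1):
        diff = catoni_mean(x[:k], alpha) - catoni_mean(x[k:], alpha)
        stat = np.sqrt(k * (n - k) / n) * abs(diff)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


# Usage: a mean shift at index 100, with five gross outliers before the change.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])
x[[10, 25, 40, 55, 70]] = 50.0
k_hat, stat = robust_scan(x)
print(k_hat, catoni_mean(x[:100]), np.mean(x[:100]))
```

The five outliers pull the plain sample mean of the first segment up by roughly 2.5, while the bounded influence of $\psi$ keeps the robust estimate much closer to 0.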

preprint2022arXiv

On Convergence of FedProx: Local Dissimilarity Invariant Bounds, Non-smoothness and Beyond

The FedProx algorithm is a simple yet powerful distributed proximal point optimization method widely used for federated learning (FL) over heterogeneous data. Despite its popularity and remarkable success in practice, the theoretical understanding of FedProx is largely under-investigated: its appealing convergence behavior has so far been characterized only under certain non-standard and unrealistic dissimilarity assumptions on the local functions, and the results are limited to smooth optimization problems. To remedy these deficiencies, we develop a novel local-dissimilarity-invariant convergence theory for FedProx and its minibatch stochastic extension through the lens of algorithmic stability. As a result, we derive several new and deeper insights into FedProx for non-convex federated optimization, including: 1) convergence guarantees independent of local dissimilarity type conditions; 2) convergence guarantees for non-smooth FL problems; and 3) linear speedup with respect to the minibatch size and the number of sampled devices. Our theory reveals for the first time that local dissimilarity and smoothness are not required for FedProx to achieve favorable complexity bounds. Preliminary experimental results on a series of benchmark FL datasets demonstrate the benefit of minibatching for improving the sample efficiency of FedProx.
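FedProx's local step is the proximal subproblem $\min_w f_k(w) + \frac{\mu}{2}\|w - w^t\|^2$. The sketch below solves it approximately with plain gradient descent and averages the results on the server, assuming full device participation and exact gradients (the paper also covers minibatch stochastic updates); the function names and step counts are hypothetical.

```python
import numpy as np


def fedprox_local_update(grad_fn, w_global, mu=0.1, lr=0.1, steps=200):
    """Approximately solve min_w f_k(w) + (mu/2) * ||w - w_global||^2 by gradient descent."""
    w = w_global.copy()
    for _ in range(steps):
        w = w - lr * (grad_fn(w) + mu * (w - w_global))
    return w


def fedprox_round(grad_fns, w_global, mu=0.1, lr=0.1, steps=200):
    """One round with full participation: each device solves its subproblem, the server averages."""
    local_models = [fedprox_local_update(g, w_global, mu, lr, steps) for g in grad_fns]
    return np.mean(local_models, axis=0)


# Usage: quadratic local losses f_k(w) = 0.5 * ||w - c_k||^2 with different centers c_k.
centers = [np.array([1.0, 0.0]), np.array([-1.0, 2.0])]
grad_fns = [lambda w, c=c: w - c for c in centers]
w = np.zeros(2)
for _ in range(100):
    w = fedprox_round(grad_fns, w, mu=0.5)
print(w)
```

On these quadratic losses the subproblem has the closed form $(c_k+\mu w^t)/(1+\mu)$, which makes the sketch easy to check, and the rounds converge to the average of the centers.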

preprint2022arXiv

On Distributed Adaptive Optimization with Gradient Compression

We study COMP-AMS, a distributed optimization framework based on gradient averaging and the adaptive AMSGrad algorithm. Gradient compression with error feedback is applied to reduce the communication cost of gradient transmission. Our convergence analysis of COMP-AMS shows that this compressed gradient averaging strategy yields the same convergence rate as standard AMSGrad and also exhibits a linear speedup effect with respect to the number of local workers. Compared with recently proposed protocols for distributed adaptive methods, COMP-AMS is simple and convenient. Numerical experiments justify the theoretical findings and demonstrate that the proposed method can achieve the same test accuracy as full-gradient AMSGrad with substantial communication savings. With its simplicity and efficiency, COMP-AMS can serve as a useful distributed training framework for adaptive gradient methods.
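COMP-AMS combines two standard pieces named in the abstract: compressed gradients with error feedback on the workers, and an AMSGrad step on the averaged result. The sketch assumes top-$k$ sparsification as the compressor and omits bias correction; the class names and hyperparameters are hypothetical, not the paper's implementation.

```python
import numpy as np


def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out


class ErrorFeedbackWorker:
    """Compresses each gradient and carries the compression residual into the next round."""

    def __init__(self, dim, k):
        self.error = np.zeros(dim)
        self.k = k

    def compress(self, grad):
        corrected = grad + self.error   # add back what was dropped last time
        sent = top_k(corrected, self.k)
        self.error = corrected - sent   # remember what is dropped this time
        return sent


class AMSGradServer:
    """AMSGrad update applied to the average of the workers' compressed gradients."""

    def __init__(self, dim, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.m = np.zeros(dim)
        self.v = np.zeros(dim)
        self.v_hat = np.zeros(dim)

    def step(self, w, avg_grad):
        self.m = self.b1 * self.m + (1 - self.b1) * avg_grad
        self.v = self.b2 * self.v + (1 - self.b2) * avg_grad ** 2
        self.v_hat = np.maximum(self.v_hat, self.v)  # AMSGrad: non-decreasing second moment
        return w - self.lr * self.m / (np.sqrt(self.v_hat) + self.eps)


# Usage: one worker sends sparse vectors; the server steps on what it receives.
rng = np.random.default_rng(0)
worker = ErrorFeedbackWorker(dim=6, k=2)
grads = [rng.standard_normal(6) for _ in range(20)]
sent = [worker.compress(g) for g in grads]
server = AMSGradServer(dim=6)
w = server.step(np.zeros(6), sent[0])
```

Error feedback ensures nothing is lost permanently: the sum of everything a worker has sent plus its current residual always equals the sum of its true gradients.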

preprint2022arXiv

On the Power-Law Hessian Spectrums in Deep Learning

It is well known that the Hessian of the deep loss landscape matters to the optimization, generalization, and even robustness of deep learning. Recent works empirically discovered that the Hessian spectrum in deep learning has a two-component structure consisting of a small number of large eigenvalues and a large number of nearly-zero eigenvalues. However, the theoretical mechanism, or the mathematics, behind the Hessian spectrum is still largely under-explored. To the best of our knowledge, we are the first to demonstrate that the Hessian spectra of well-trained deep neural networks exhibit simple power-law structures. Inspired by statistical physics theories and the spectral analysis of natural proteins, we provide a maximum-entropy theoretical interpretation of why the power-law structure exists and suggest a spectral parallel between protein evolution and the training of deep neural networks. By conducting extensive experiments, we further use the power-law spectral framework as a useful tool to explore multiple novel behaviors of deep learning.

preprint2022arXiv

One Loss for Quantization: Deep Hashing with Discrete Wasserstein Distributional Matching

Image hashing is a principled approximate nearest neighbor approach for finding items similar to a query in a large collection of images. Hashing aims to learn a binary-output function that maps an image to a binary vector. For optimal retrieval performance, it is important to produce balanced hash codes with low quantization error, bridging the gap between the continuous relaxation used in the learning stage and the discrete quantization used in the inference stage. However, in existing deep supervised hashing methods, coding balance and low quantization error are difficult to achieve and involve several losses. We argue that this is because the existing quantization approaches in these methods are heuristically constructed and not effective at achieving these objectives. This paper considers an alternative approach to learning the quantization constraints. The task of learning balanced codes with low quantization error is reformulated as matching the learned distribution of the continuous codes to a pre-defined discrete, uniform distribution. This is equivalent to minimizing the distance between two distributions. We then propose a computationally efficient distributional distance by leveraging the discrete property of the hash functions. This distributional distance is a valid distance and enjoys lower time and sample complexities. The proposed single-loss quantization objective can be integrated into any existing supervised hashing method to improve code balance and quantization error. Experiments confirm that the proposed approach substantially improves the performance of several representative hashing methods.

preprint2022arXiv

Package for Fast ABC-Boost

This report presents the open-source package that implements the series of our boosting works over the past years. In particular, the package includes three main lines of techniques, the first two of which are already standard implementations in popular boosted tree platforms: (i) The histogram-based (feature-binning) approach makes tree implementation convenient and efficient. In Li et al. (2007), a simple fixed-length adaptive binning algorithm was developed. In this report, we demonstrate that such a simple algorithm is still surprisingly effective compared to more sophisticated variants in popular tree platforms. (ii) The explicit gain formula in Li (2010) for tree splitting, based on second-order derivatives of the loss function, typically improves, often considerably, over first-order methods. Although the gain formula in Li (2010) was derived for the logistic regression loss, it is a generic formula for loss functions with second derivatives. For example, the open-source package also includes $L_p$ regression for $p\geq 1$. The main contribution of this package is ABC-Boost (adaptive base class boosting) for multi-class classification. The initial work in Li (2008) derived a new set of derivatives of the classical multi-class logistic regression by specifying a "base class". The accuracy can be substantially improved if the base class is chosen properly. The major technical challenge is to design a search strategy for selecting the base class. Prior published works implemented an exhaustive search procedure to find the base class, which is computationally too expensive. Recently, a new report (Li and Zhao, 2022) presented a unified framework of "Fast ABC-Boost" that allows users to efficiently choose the proper search space for the base class. The package provides interfaces for Linux, Windows, Mac, MATLAB, R, and Python.
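The two standard ingredients in (i) and (ii) can be combined in a short sketch: feature values are bucketed into histogram bins, and each candidate split is scored with the second-order gain $G_L^2/H_L + G_R^2/H_R - G^2/H$, where $G$ and $H$ sum the first and second derivatives of the loss. Quantile bin edges, the unregularized form of the gain, and the function name are assumptions; this is not the package's API, and the exact binning algorithm of Li et al. (2007) is not reproduced.

```python
import numpy as np


def best_histogram_split(feature, y, p, n_bins=16):
    """Histogram split search scored by a second-order gain, for the logistic loss (sketch)."""
    g = p - y            # first derivative of the logistic loss w.r.t. the raw score
    h = p * (1.0 - p)    # second derivative
    edges = np.quantile(feature, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])  # n_bins - 1 inner edges
    bins = np.searchsorted(edges, feature, side="right")                    # bin ids 0 .. n_bins - 1
    G = np.bincount(bins, weights=g, minlength=n_bins)
    H = np.bincount(bins, weights=h, minlength=n_bins)
    G_tot, H_tot = G.sum(), H.sum()
    best_gain, best_threshold = 0.0, None
    G_left = H_left = 0.0
    for b in range(n_bins - 1):  # left child = bins 0..b, i.e. feature < edges[b]
        G_left += G[b]
        H_left += H[b]
        G_right, H_right = G_tot - G_left, H_tot - H_left
        if H_left <= 0.0 or H_right <= 0.0:
            continue
        gain = G_left ** 2 / H_left + G_right ** 2 / H_right - G_tot ** 2 / H_tot
        if gain > best_gain:
            best_gain, best_threshold = gain, edges[b]
    return best_threshold, best_gain


# Usage: labels flip at 0.5; with p = 0.5 everywhere the best split should be at 0.5.
feature = np.linspace(0.0, 1.0, 200)
y = (feature >= 0.5).astype(float)
threshold, gain = best_histogram_split(feature, y, p=np.full(200, 0.5))
print(threshold, gain)
```

Because only per-bin sums of $g$ and $h$ are needed, the split search costs $O(\text{bins})$ per feature after one pass over the data, which is what makes the histogram approach efficient.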

preprint2022arXiv

pGMM Kernel Regression and Comparisons with Boosted Trees

In this work, we demonstrate the advantage of the pGMM ("powered generalized min-max") kernel in the context of (ridge) regression. In recent prior studies, the pGMM kernel has been extensively evaluated for classification tasks with logistic regression, support vector machines, and deep neural networks. In this paper, we provide an experimental study on ridge regression, comparing pGMM kernel regression with ordinary ridge linear regression as well as RBF kernel ridge regression. Perhaps surprisingly, even without tuning (i.e., with $p=1$ for the power parameter of the pGMM kernel), the pGMM kernel already performs well. Furthermore, by tuning the parameter $p$, this (deceptively simple) pGMM kernel performs quite comparably to boosted trees. Boosting and boosted trees are very popular in machine learning practice. For regression tasks, practitioners typically use $L_2$ boost, i.e., boosting that minimizes the $L_2$ loss. Sometimes, for the purpose of robustness, $L_1$ boost may be chosen. In this study, we implement $L_p$ boost for $p\geq 1$ and include it in the "Fast ABC-Boost" package. Perhaps also surprisingly, the best performance (in terms of $L_2$ regression loss) is often attained at $p>2$, in some cases at $p\gg 2$. This phenomenon has already been demonstrated by Li et al. (UAI 2010) in the context of k-nearest neighbor classification using $L_p$ distances. In summary, the implementation of $L_p$ boost provides practitioners the additional flexibility of tuning boosting algorithms to potentially achieve better accuracy in regression applications.

preprint2022arXiv

Planar Turán number of disjoint union of $C_3$ and $C_4$

The planar Turán number of $H$, denoted by $ex_{\mathcal{P}}(n,H)$, is the maximum number of edges in an $H$-free planar graph on $n$ vertices. The planar Turán number of a vertex-disjoint union of $k\geq 3$ cycles is the trivial value $3n-6$. Lan, Shi and Song determined the exact value of $ex_{\mathcal{P}}(n,2C_3)$. We continue the study of the planar Turán number of vertex-disjoint unions of cycles and obtain the exact value of $ex_{\mathcal{P}}(n,H)$, where $H$ is the vertex-disjoint union of $C_3$ and $C_4$. The extremal graphs are also characterized. We also improve the lower bound on $ex_{\mathcal{P}}(n,2C_k)$ when $k$ is sufficiently large.
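For context on why $3n-6$ is the ceiling for every planar Turán number, the textbook Euler-formula bound for a simple connected planar graph with $n \ge 3$ vertices, $e$ edges, and $f$ faces can be written out as follows (background only, not a result of the paper; disconnected planar graphs have even fewer edges, since edges can be added to connect them):

```latex
\begin{aligned}
&n - e + f = 2 && \text{(Euler's formula)}\\
&3f \le 2e && \text{(each face has at least 3 boundary edges; each edge borders at most 2 faces)}\\
&2 = n - e + f \le n - e + \tfrac{2e}{3} = n - \tfrac{e}{3}
\;\Longrightarrow\; e \le 3n - 6,
\end{aligned}
```

so $ex_{\mathcal{P}}(n,H) \le 3n-6$ for every $H$, with equality exactly when some triangulation on $n$ vertices avoids $H$.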

preprint2022arXiv

Planning and Formulations in Pursuit-Evasion: Keep-away Games and Their Strategies

We study a pursuit-evasion problem that can be viewed as an extension of the keep-away game. In the game, one or more pursuers attempt to intercept or catch the evader, while the evader can visit a fixed set of locations, which we call anchors. These anchors may or may not be stationary. When the pursuers' velocity is limited and low compared to the evader's, we are interested in whether a winning strategy exists for the pursuers or the evader, or whether the game ends in a draw. When the anchors are stationary, we present an algorithm that helps answer this question. The primary motivation for this study is to explore the boundary between kinematic and dynamic constraints; in particular, whether the solution of the kinematic problem can be used to speed up the search for problems with dynamic constraints, and how best to discretize the problem to exploit such relations. In this work, we show that a geometric branch-and-bound approach can solve the stationary-anchor problem, and that the approach and its solution can be extended to the dynamic problem in which the pursuers have dynamic constraints, including velocity and acceleration bounds.

preprint2022arXiv

Planning through Workspace Constraint Satisfaction and Optimization

In this work, we present a workspace-based planning framework which, although it uses redundant workspace key points to represent robot states, can take advantage of interpretable geometric information to derive good-quality collision-free paths even for complex robots. Using workspace geometries, we first find collision-free piecewise linear paths for each key point such that, at the endpoints of each segment, the distance constraints among the key points are satisfied. Using these piecewise linear paths as initial conditions, we perform optimization steps to quickly find paths that satisfy various constraints, and piece together all segments to obtain a valid path. We show that these adjusted paths are unlikely to create collisions, and that the proposed approach is fast and produces good-quality results.

preprint2022arXiv

Proximity Graph Maintenance for Fast Online Nearest Neighbor Search

Approximate Nearest Neighbor (ANN) search is a fundamental technique for, e.g., deploying recommender systems. Recent studies have brought proximity graph-based methods to practitioners' attention: they outperform other solutions such as the quantization, hashing, and tree-based ANN algorithm families. In current recommendation systems, data point insertions, deletions, and queries are streamed into the system in an online fashion as users and items change dynamically. Because proximity graphs are constructed incrementally by inserting data points as new vertices, online insertions and queries are well supported. However, deleting a data point requires removing a vertex from the proximity graph index, and previous studies have not discussed a proper mechanism for updating the graph index. To tackle this challenge, we propose an incremental proximity graph maintenance (IPGM) algorithm for online ANN. IPGM supports both vertex deletion and insertion on proximity graphs. Given a vertex deletion request, we thoroughly investigate solutions for updating the connections of the vertex. The proposed updating scheme eliminates the performance drop of online ANN methods on proximity graphs, making the algorithm suitable for practical systems.
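The abstract does not spell out IPGM's reconnection rule, so the following is only a hypothetical local repair that shows where the difficulty lies: when a vertex is deleted, every vertex that pointed to it loses an edge. Here each such vertex re-selects its closest neighbors from its remaining neighbors plus the deleted vertex's out-neighbors, subject to a degree cap. `delete_vertex` and `max_degree` are illustrative names, not part of IPGM.

```python
import numpy as np


def delete_vertex(graph, vectors, v, max_degree=8):
    """Remove vertex v and repair the edges of every vertex that pointed to it (hypothetical rule).

    graph maps each vertex id to a set of out-neighbor ids; vectors[i] is the point for id i.
    """
    out_nbrs = graph.pop(v)
    for u in list(graph):
        nbrs = graph[u]
        if v not in nbrs:
            continue
        candidates = (nbrs | out_nbrs) - {u, v}
        # Re-select the max_degree candidates closest to u.
        ranked = sorted(candidates, key=lambda c: np.linalg.norm(vectors[c] - vectors[u]))
        graph[u] = set(ranked[:max_degree])
    return graph


# Usage: build a 5-nearest-neighbor graph on random points, then delete vertex 7.
rng = np.random.default_rng(0)
vectors = rng.standard_normal((50, 4))
graph = {}
for i in range(50):
    dists = np.linalg.norm(vectors - vectors[i], axis=1)
    graph[i] = set(np.argsort(dists)[1:6].tolist())  # skip index 0, which is i itself
delete_vertex(graph, vectors, 7)
```

Simply dropping the dangling edges would leave in-neighbors of the deleted vertex with fewer links and can disconnect regions of the graph; routing them through the deleted vertex's out-neighbors is one simple way to preserve local connectivity.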

preprint2022arXiv

Quantum Anomalous Hall and Valley Quantum Anomalous Hall Effects in Two-Dimensional d0 Orbital XY Monolayers

We propose a new family of d0 orbital XY (X = K, Rb, Cs; Y = N, P, As, Sb, Bi) monolayers with abundant and novel topological and valley properties. The KN, RbN, RbP, RbAs, CsP, CsAs, and CsSb monolayers exhibit a remarkable quantum anomalous Hall effect (QAHE). The CsSb monolayer also exhibits an extraordinary valley QAHE with giant splitting. Moreover, the topological properties of XY monolayers can be efficiently tuned by in-plane strain, owing to the strain-induced band inversion between the $p_{x,y}$ and $p_z$ orbitals. Our findings suggest that the d0 orbital XY monolayers are good candidates for applications in spintronics and multifunctional topology-based devices.

preprint2022arXiv

Regression with Label Permutation in Generalized Linear Model

The assumption that the response and predictor belong to the same statistical unit may be violated in practice. Unbiased estimation and recovery of the true label ordering based on unlabeled data are challenging tasks and have attracted increasing attention in the recent literature. In this paper, we present a relatively complete analysis of the label permutation problem for the generalized linear model with multivariate responses. The theory is established under different scenarios: with knowledge of the true parameters, with partial knowledge of the underlying label permutation matrix, and without any knowledge. Our results remove the stringent conditions required by the current literature and are further extended to the missing-observation setting, which has never been considered in the field of label permutation problems. On the computational side, we propose two methods, a "maximum likelihood estimation" algorithm and a "two-step estimation" algorithm, to accommodate different settings. When the proportion of permuted labels is moderate, both methods work effectively. Multiple numerical experiments are provided and corroborate our theoretical findings.

preprint2022arXiv

SpaceE: Knowledge Graph Embedding by Relational Linear Transformation in the Entity Space

Translation distance-based knowledge graph embedding (KGE) methods, such as TransE and RotatE, model a relation in a knowledge graph as a translation or rotation in the vector space. Both translation and rotation are injective; that is, translating or rotating different vectors yields different results. In knowledge graphs, different entities may have a relation with the same entity; for example, many actors starred in one movie. Such a non-injective relation pattern cannot be well modeled by the translation or rotation operations in existing translation distance-based KGE methods. To tackle this challenge, we propose a translation distance-based KGE method called SpaceE that models relations as linear transformations. SpaceE embeds both entities and relations in knowledge graphs as matrices, and naturally models non-injective relations with singular linear transformations. We theoretically demonstrate that SpaceE is a fully expressive model with the ability to infer multiple desired relation patterns, including symmetry, skew-symmetry, inversion, Abelian composition, and non-Abelian composition. Experimental results on link prediction datasets show that SpaceE substantially outperforms many previous translation distance-based knowledge graph embedding methods, especially on datasets with many non-injective relations. The code, based on the PaddlePaddle deep learning platform (https://www.paddlepaddle.org.cn), is available.
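The non-injectivity argument can be checked numerically. Assuming a distance-style score $-\|R H - T\|_F$ with entity matrices $H, T$ and relation matrix $R$ (our simplification; the abstract does not give SpaceE's exact scoring function, and the names below are hypothetical), a singular $R$ maps two different heads to exactly the same tail, which no rotation can do.

```python
import numpy as np


def spacee_style_score(head, relation, tail):
    """Hypothetical distance score -||R @ H - T||_F; higher means a more plausible triple."""
    return -np.linalg.norm(relation @ head - tail)


# A rank-1 (singular) relation matrix collapses the second row of any head entity.
R_singular = np.array([[1.0, 0.0], [0.0, 0.0]])
H_actor1 = np.array([[1.0, 2.0], [0.0, 0.0]])
H_actor2 = np.array([[1.0, 2.0], [5.0, 7.0]])
T_movie = R_singular @ H_actor1
print(spacee_style_score(H_actor1, R_singular, T_movie),
      spacee_style_score(H_actor2, R_singular, T_movie))

# An invertible rotation keeps the two heads apart, so both cannot hit the same tail exactly.
theta = 0.3
R_rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
print(np.allclose(R_rot @ H_actor1, R_rot @ H_actor2))
```

Both "actors" achieve the best possible score against the same "movie" under the singular relation, which is the many-to-one pattern the abstract describes.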

preprint2022arXiv

Stability and Risk Bounds of Iterative Hard Thresholding

In this paper, we analyze the generalization performance of the Iterative Hard Thresholding (IHT) algorithm, which is widely used for sparse recovery problems. The parameter estimation and sparsity recovery consistency of IHT have long been known in compressed sensing. From the perspective of statistical learning, another fundamental question is how well the IHT estimate predicts on unseen data. This paper makes progress towards answering this open question by introducing a novel sparse generalization theory for IHT under the notion of algorithmic stability. Our theory reveals that: 1) under natural conditions on the empirical risk function over $n$ samples of dimension $p$, IHT with sparsity level $k$ enjoys an $\tilde{\mathcal{O}}(n^{-1/2}\sqrt{k\log(n)\log(p)})$ rate of convergence in sparse excess risk; 2) a tighter $\tilde{\mathcal{O}}(n^{-1/2}\sqrt{\log(n)})$ bound can be established by imposing an additional iteration stability condition on a hypothetical IHT procedure applied to the population risk; and 3) a fast rate of order $\tilde{\mathcal{O}}\left(n^{-1}k(\log^3(n)+\log(p))\right)$ can be derived for strongly convex risk functions under proper strong-signal conditions. The results are specialized to sparse linear regression and sparse logistic regression models to demonstrate the applicability of our theory. Preliminary numerical evidence is provided to confirm our theoretical predictions.
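For readers unfamiliar with IHT, here is a minimal sketch of the iteration for sparse least squares, the simplest of the models mentioned above: a gradient step on the empirical risk followed by hard thresholding to the $k$ largest-magnitude coordinates. The step size, iteration count and synthetic data are illustrative assumptions, and the sketch does not reproduce the stability-based bounds themselves.

```python
import random


def hard_threshold(vec, k):
    """Keep the k largest-magnitude entries of vec and set the rest to zero."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out


def iht_least_squares(X, y, k, step, iters=200):
    """IHT for least squares: w <- H_k(w - step * X^T (X w - y) / n)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        resid = [sum(X[i][j] * w[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        w = hard_threshold([w[j] - step * grad[j] for j in range(p)], k)
    return w


# Hypothetical noiseless problem: p = 10 features, true support {1, 4}.
rng = random.Random(0)
n, p = 100, 10
w_true = [0.0] * p
w_true[1], w_true[4] = 3.0, -2.0
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(X[i][j] * w_true[j] for j in range(p)) for i in range(n)]

w_hat = iht_least_squares(X, y, k=2, step=0.5)
support = [j for j in range(p) if w_hat[j] != 0.0]
print(support, [round(w_hat[j], 4) for j in support])
```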

preprint2022arXiv

Variational Flow Graphical Model

This paper introduces a novel approach to embedding flow-based models with hierarchical structures. The proposed framework is named the Variational Flow Graphical (VFG) model. VFGs learn representations of high-dimensional data via a message-passing scheme by integrating flow-based functions through variational inference. By leveraging the expressive power of neural networks, VFGs produce lower-dimensional representations of the data, thus overcoming a drawback of many flow-based models, which usually require a high-dimensional latent space involving many trivial variables. Aggregation nodes are introduced in VFG models to integrate forward-backward hierarchical information via message passing. Maximizing the evidence lower bound (ELBO) of the data likelihood aligns the forward and backward messages in each aggregation node, achieving a consistent node state. Algorithms are developed to learn the model parameters through gradient updates on the ELBO objective. The consistency of the aggregation nodes makes VFGs applicable to tractable inference on graphical structures. Besides representation learning and numerical inference, VFGs provide a new approach to distribution modeling on datasets with graphical latent structures. Additionally, a theoretical study shows that VFGs are universal approximators by leveraging the implicitly invertible flow-based structures. With flexible graphical structures and superior expressive power, VFGs could potentially be used to improve probabilistic inference. In the experiments, VFGs achieve improved ELBO and likelihood values on multiple datasets.

preprint2021arXiv

Absorption of massless scalar field by furry black holes in de Rham-Gabadadze-Tolley theory

We study the absorption of a massless scalar field by two types of furry charged black holes in de Rham-Gabadadze-Tolley (dRGT) theory. The absorption cross section is calculated in the high-frequency limit, $\sigma_{hf}$, and in the low-frequency limit, $\sigma_{lf}$. We show that $\sigma_{hf}$ equals the area of the shadow and $\sigma_{lf}$ equals the area of the horizon. The ratio $R_{f}=\frac{\sigma_{hf}}{\sigma_{lf}}$ is used to measure the impact of charges on the absorption spectra of furry black holes. If the black hole possesses an extra charge besides its mass, the range of the absorption ratio $R_{f}$ differs: $[1,\frac{27}{16}]$ for electric charge, $[0.7675,\frac{27}{16}]$ for positive charge, and $[\frac{27}{16},3.1835]$ for negative charge. We also use a numerical method to compute the absorption cross section in the finite-frequency regime and present a series of numerical results.
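As a sanity check on the endpoint $\frac{27}{16}$ shared by all three intervals, consider the uncharged (Schwarzschild) limit with mass $M$, assuming units $G=c=1$. The worked example below uses the textbook photon-sphere critical impact parameter $b_c = 3\sqrt{3}M$ and horizon radius $r_h = 2M$; it is not a result taken from the paper.

```latex
\begin{aligned}
\sigma_{hf} &= \pi b_c^{2} = \pi\,(3\sqrt{3}\,M)^{2} = 27\pi M^{2},\\
\sigma_{lf} &= 4\pi r_h^{2} = 4\pi\,(2M)^{2} = 16\pi M^{2},\\
R_f &= \frac{\sigma_{hf}}{\sigma_{lf}} = \frac{27}{16} \approx 1.6875.
\end{aligned}
```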

preprint2021arXiv

High-Order Statistical Functional Expansion and Its Application To Some Nonsmooth Problems

Let $\mathbf{x}_j = \boldsymbol{\theta} + \boldsymbol{\varepsilon}_j$, $j=1,\dots,n$, be observations of an unknown parameter $\boldsymbol{\theta}$ in a Euclidean or separable Hilbert space $\mathcal{H}$, where the noises $\boldsymbol{\varepsilon}_j$ are random elements in $\mathcal{H}$ from a general distribution. We study the estimation of $f(\boldsymbol{\theta})$ for a given functional $f:\mathcal{H}\rightarrow \mathbb{R}$ based on the $\mathbf{x}_j$'s. The key element of our approach is a new method that we call High-Order Degenerate Statistical Expansion. It combines the classical multivariate Taylor expansion with degenerate $U$-statistics and yields an elegant explicit formula. In the univariate case $\mathcal{H}=\mathbb{R}$, the formula expresses the error of the proposed estimator as a sum of order-$k$ degenerate $U$-products of the noises with coefficients $f^{(k)}(\boldsymbol{\theta})/k!$, plus an explicit remainder term in the form of the Riemann-Liouville integral, as in the Taylor expansion around the true $\boldsymbol{\theta}$. For general $\mathcal{H}$, the formula expresses the estimation error in terms of the inner product of $f^{(k)}(\boldsymbol{\theta})/k!$ with the average of the tensor products of $k$ noises with distinct indices, plus a parallel extension of the remainder term from the univariate case. This makes the proposed method a natural statistical version of the classical Taylor expansion. The proposed estimator can be viewed as a jackknife estimator of an ideal degenerate expansion of $f(\cdot)$ around the true $\boldsymbol{\theta}$ with the degenerate $U$-product of the noises, and it can be approximated by the bootstrap. Thus, the jackknife, bootstrap and Taylor expansion approaches all converge to the proposed estimator. We develop risk bounds for the proposed estimator and a central limit theorem under a second-moment condition (even for expansions of higher than second order). We apply this new method to generalize several existing results with smooth and nonsmooth $f$ to general noises $\boldsymbol{\varepsilon}_j$ under only minimal moment constraints.
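The simplest nonlinear case, the univariate quadratic $f(\theta)=\theta^2$, can be checked by hand: averaging $x_i x_j$ over distinct pairs $i\neq j$ gives $\theta^2 + 2\theta\,\bar\varepsilon + \frac{1}{n(n-1)}\sum_{i\neq j}\varepsilon_i\varepsilon_j$, i.e. first- and second-order degenerate terms with coefficients $f'(\theta)/1!$ and $f''(\theta)/2!$ and no remainder. The Python sketch below is a hypothetical illustration of that special case only, not the paper's general estimator; it contrasts the distinct-index average with the plug-in $\bar x^2$, whose bias is $\operatorname{Var}(\varepsilon)/n$.

```python
import random


def plug_in_square(xs):
    """Plug-in estimate of theta^2, i.e. (sample mean)^2; biased upward by Var/n."""
    m = sum(xs) / len(xs)
    return m * m


def u_stat_square(xs):
    """Average of x_i * x_j over ordered pairs i != j (unbiased for theta^2).
    Uses sum_{i != j} x_i x_j = (sum x)^2 - sum x^2, so it runs in O(n)."""
    n = len(xs)
    s = sum(xs)
    s2 = sum(x * x for x in xs)
    return (s * s - s2) / (n * (n - 1))


# Hypothetical Monte Carlo check: theta = 1, Gaussian noise with sd 2, n = 5.
rng = random.Random(1)
theta, sd, n, reps = 1.0, 2.0, 5, 20000
plug = ustat = 0.0
for _ in range(reps):
    xs = [theta + rng.gauss(0.0, sd) for _ in range(n)]
    plug += plug_in_square(xs) / reps
    ustat += u_stat_square(xs) / reps

print(f"plug-in: {plug:.3f} (theory {theta**2 + sd**2 / n:.3f})")
print(f"U-stat:  {ustat:.3f} (theory {theta**2:.3f})")
```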

preprint2021arXiv

Quantization Algorithms for Random Fourier Features

The method of random projection (RP) is a standard technique in machine learning and many other areas, used for dimensionality reduction, approximate near neighbor search, compressed sensing, etc. Basically, RP provides a simple and effective scheme for approximating pairwise inner products and Euclidean distances in massive data. Closely related to RP, the method of random Fourier features (RFF) has also become popular for approximating the Gaussian kernel. RFF applies a specific nonlinear transformation to the data after random projection. In practice, using the (nonlinear) Gaussian kernel often leads to better performance than the linear kernel (inner product), partly due to the tuning parameter $\gamma$ introduced in the Gaussian kernel. Recently, there has been a surge of interest in studying properties of RFF. After random projection, quantization is an important step for efficient data storage, computation, and transmission, and quantization for RP has been extensively studied in the literature. In this paper, we focus on developing quantization algorithms for RFF. The task is in a sense challenging because of the tuning parameter $\gamma$ in the Gaussian kernel: for example, the quantizer and the quantized data might be tied to each specific value of $\gamma$. Our contribution begins with an interesting discovery: the marginal distribution of RFF is actually free of the Gaussian kernel parameter $\gamma$. This small finding significantly simplifies the design of the Lloyd-Max (LM) quantization scheme for RFF, in that only one LM quantizer is needed for RFF, regardless of $\gamma$. We also develop a variant named the LM$^2$-RFF quantizer, which in certain cases is more accurate. Experiments confirm that the proposed quantization schemes perform well.
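The $\gamma$-free marginal can be seen directly: with $\tau$ uniform on $[0, 2\pi)$, the feature $\sqrt{2}\cos(w^\top x + \tau)$ has the distribution of $\sqrt{2}\cos\tau$ whatever value $w^\top x$ takes. The pure-Python sketch below assumes the parameterization $k(x,y)=\exp(-\gamma^2\lVert x-y\rVert^2/2)$ with $w\sim N(0,\gamma^2 I)$ (conventions for $\gamma$ differ across papers). It checks the kernel approximation for two values of $\gamma$ and then fits a single sample-based Lloyd-Max quantizer (1-D Lloyd iterations) to that marginal. Function names are hypothetical, and the LM$^2$-RFF variant is not reproduced.

```python
import math
import random


def rff_features(x, ws, taus):
    """Random Fourier features z_i(x) = sqrt(2) * cos(w_i . x + tau_i)."""
    return [math.sqrt(2.0) * math.cos(sum(a * b for a, b in zip(w, x)) + t)
            for w, t in zip(ws, taus)]


def kernel_estimate(x, y, gamma, num_features, rng):
    """Monte Carlo estimate of exp(-gamma^2 * ||x - y||^2 / 2)."""
    ws = [[rng.gauss(0.0, gamma) for _ in x] for _ in range(num_features)]
    taus = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    zx, zy = rff_features(x, ws, taus), rff_features(y, ws, taus)
    return sum(a * b for a, b in zip(zx, zy)) / num_features


def lloyd_max(samples, levels, iters=50):
    """Sample-based Lloyd-Max: alternate midpoint thresholds and conditional means."""
    lo, hi = min(samples), max(samples)
    codes = [lo + (hi - lo) * (i + 0.5) / levels for i in range(levels)]
    for _ in range(iters):
        cuts = [(codes[i] + codes[i + 1]) / 2 for i in range(levels - 1)]
        buckets = [[] for _ in range(levels)]
        for s in samples:
            buckets[sum(s > c for c in cuts)].append(s)
        codes = [sum(b) / len(b) if b else codes[i] for i, b in enumerate(buckets)]
    return codes


rng = random.Random(0)
x, y = [0.3, -0.1], [0.0, 0.4]
for gamma in (0.5, 2.0):
    exact = math.exp(-gamma ** 2 * sum((a - b) ** 2 for a, b in zip(x, y)) / 2)
    print(gamma, round(exact, 3), round(kernel_estimate(x, y, gamma, 20000, rng), 3))

# The marginal sqrt(2) * cos(tau) does not involve gamma: one quantizer for all gamma.
marginal = [math.sqrt(2.0) * math.cos(rng.uniform(0.0, 2.0 * math.pi)) for _ in range(20000)]
print([round(c, 3) for c in lloyd_max(marginal, levels=2)])
```

With two levels, the reconstruction points should land near $\pm 2\sqrt{2}/\pi \approx \pm 0.900$, and because the marginal does not depend on $\gamma$ they would be reused unchanged for any kernel parameter.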

preprint2021arXiv

Simulation on the Transparency of Electrons and Ion Back Flow for a Time Projection Chamber based on Staggered Multiple THGEMs

Ion back flow (IBF) and electron transparency are two essential indicators of a time projection chamber (TPC), affecting its energy resolution and counting rate, respectively. In this paper, we propose several novel staggered multi-THGEM strategies to suppress IBF, in which the geometry of the first-layer THGEM is optimized to increase electron transparency. Garfield++ simulations show that the electron transparency of a single THGEM can exceed 90% with an optimized large hole. By simulating triple and quadruple THGEM configurations, we conclude that the IBF can be reduced to the 0.2% level in an optimized configuration denoted "ACBA". This staggered-THGEM strategy could have potential applications in future TPC projects.

preprint2020arXiv

A Two-Stage Approach to Multivariate Linear Regression with Sparsely Mismatched Data

A tacit assumption in linear regression is that (response, predictor) pairs correspond to identical observational units. A series of recent works have studied scenarios in which this assumption is violated, under terms such as "Unlabeled Sensing" and "Regression with Unknown Permutation". In this paper, we study the setup of multiple response variables and a notion of mismatches that generalizes permutations to allow for missing matches as well as one-to-many matches. A two-stage method is proposed under the assumption that most pairs are correctly matched. In the first stage, the regression parameter is estimated by treating mismatches as contaminations; subsequently, the generalized permutation is estimated by a basic variant of matching. The approach is both computationally convenient and equipped with favorable statistical guarantees. Specifically, it is shown that the conditions for permutation recovery become considerably less stringent as the number of responses $m$ per observation increases. In particular, for $m = \Omega(\log n)$, the required signal-to-noise ratio no longer depends on the sample size $n$. Numerical results on synthetic and real data are presented to support the main findings of our analysis.
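The second stage can be illustrated with a minimal pure-Python sketch. It assumes the first-stage estimate $\hat B$ is already available and uses plain nearest-neighbour assignment as a stand-in for the "basic variant of matching" (the paper's exact matching rule may differ). Each response row is matched to the predictor row whose fitted value $x_j^\top \hat B$ is closest in squared Euclidean distance; since the assignment is not forced to be one-to-one, one-to-many matches arise naturally. All names and data are hypothetical.

```python
def fitted(x_row, B):
    """Fitted response x^T B for one predictor row (B is p x m, given as rows)."""
    m = len(B[0])
    return [sum(x_row[i] * B[i][k] for i in range(len(x_row))) for k in range(m)]


def match_rows(Y, X, B_hat):
    """Assign each response row Y[i] to argmin_j ||Y[i] - X[j]^T B_hat||^2.
    Several responses may share one predictor (one-to-many matches)."""
    fits = [fitted(x, B_hat) for x in X]
    matches = []
    for y in Y:
        dists = [sum((a - b) ** 2 for a, b in zip(y, f)) for f in fits]
        matches.append(min(range(len(fits)), key=dists.__getitem__))
    return matches


# Hypothetical example with p = 2 predictors and m = 3 responses.
B_hat = [[1.0, 0.0, 2.0],
         [0.0, 1.0, -1.0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Responses listed in shuffled order, with small noise.
Y = [[0.1, 1.0, -0.9],   # generated from X[1]
     [1.0, 1.1, 1.0],    # generated from X[2]
     [0.9, 0.0, 2.1]]    # generated from X[0]
print(match_rows(Y, X, B_hat))  # [1, 2, 0]
```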

preprint2020arXiv

Adaptive Randomization in Network Data

Network data appear frequently in recent research. For example, in comparing the effects of different types of treatment, network models have been proposed to improve the quality of estimation and hypothesis testing. In this paper, we focus on efficiently estimating the average treatment effect using an adaptive randomization procedure in networks. We work with causal models in which the treatment outcome of a subject is affected by its own covariates as well as those of its neighbors. Moreover, we consider the case in which, when we assign a treatment to the current subject, only the subnetwork of existing subjects has been revealed. New randomized procedures are proposed to minimize the mean squared error of the estimated differences between treatment effects. For network data, theoretical properties are usually difficult to obtain because the numbers of nodes and connections increase simultaneously. Under mild assumptions, our proposed procedure is closely related to a time-varying inhomogeneous Markov chain, and we use Lyapunov functions to derive the theoretical properties of the proposed procedures. The advantages of the proposed procedures are also demonstrated by extensive simulations and experiments on real network data.

preprint2020arXiv

An Inverse-free Truncated Rayleigh-Ritz Method for Sparse Generalized Eigenvalue Problem

This paper considers the sparse generalized eigenvalue problem (SGEP), which aims to find the leading eigenvector with at most $k$ nonzero entries. SGEP naturally arises in many applications in machine learning, statistics, and scientific computing, for example, sparse principal component analysis (SPCA), sparse discriminant analysis (SDA), and sparse canonical correlation analysis (SCCA). We develop a three-stage algorithm, the inverse-free truncated Rayleigh-Ritz method (IFTRR), to solve SGEP efficiently. Each iteration of IFTRR requires only a small number of matrix-vector products, which makes IFTRR well suited to large-scale problems. In particular, a new truncation strategy is proposed that finds the support set of the leading eigenvector effectively. Theoretical results are developed to explain why IFTRR works well, and numerical simulations demonstrate its merits.
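To illustrate why only matrix-vector products are needed, here is a hedged sketch of a simpler relative of IFTRR: truncated gradient ascent on the generalized Rayleigh quotient $\rho(x)=x^\top A x / x^\top B x$. Each step uses $Ax$ and $Bx$ only (no $B^{-1}$), keeps the $k$ largest-magnitude entries, and rescales to unit $B$-norm. It is not the paper's three-stage Rayleigh-Ritz procedure; the step size, the test pencil and the function names are assumptions.

```python
import math


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def truncate(v, k):
    """Zero all but the k largest-magnitude entries of v."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


def b_normalize(v, B):
    """Rescale v so that v^T B v = 1."""
    scale = math.sqrt(dot(v, matvec(B, v)))
    return [x / scale for x in v]


def truncated_rq_ascent(A, B, x0, k, step=0.5, iters=100):
    """Sparse leading generalized eigenpair of (A, B) by truncated ascent on
    rho(x) = x^T A x / x^T B x, using only products with A and B (no inverse)."""
    x = b_normalize(truncate(x0, k), B)
    for _ in range(iters):
        Ax, Bx = matvec(A, x), matvec(B, x)
        rho = dot(x, Ax) / dot(x, Bx)
        # A x - rho B x is proportional to the gradient of rho at x.
        x = b_normalize(truncate([xi + step * (a - rho * b) for xi, a, b in zip(x, Ax, Bx)], k), B)
    rho = dot(x, matvec(A, x)) / dot(x, matvec(B, x))
    return x, rho


# Hypothetical 4x4 pencil whose leading generalized eigenvector is supported on {0, 1}.
A = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 0.5]]
B = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 0.0],
     [0.0, 0.0, 0.0, 2.0]]
x, rho = truncated_rq_ascent(A, B, [0.2, 1.0, 0.9, 0.1], k=2)
print([round(v, 4) for v in x], round(rho, 6))
```

Note that the starting point deliberately has its two largest entries on the wrong support; the gradient step moves mass onto coordinates 0 and 1 after one iteration, which is the role a truncation strategy plays in locating the support.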

preprint2020arXiv

Characterization on projective submanifolds of codimensions 2 and 3

In this article we give a necessary and sufficient condition to characterize projective submanifolds in ${\mathbb P}^N$ with codimensions 2 and 3. The conditions involve the Chern classes of the manifold and a very ample line bundle on the manifold. This generalizes our earlier characterization for hypersurfaces. The higher codimensional cases are proposed as a general question.

preprint2020arXiv

Chern class inequalities on polarized manifolds and nef vector bundles

This article is concerned with Chern class and Chern number inequalities on polarized manifolds and nef vector bundles. For a polarized pair $(M,L)$ with $L$ very ample, our first main result is a family of sharp Chern class inequalities. Among them the first one is a variant of a classical result and the equality case of the second one is a characterization of hypersurfaces. The second main result is a Chern number inequality on it, which includes a reverse Miyaoka-Yau type inequality. The third main result is that the Chern numbers of a nef vector bundle over a compact Kähler manifold are bounded below by the Euler number. As an application, we classify compact Kähler manifolds with nonnegative bisectional curvature whose Chern numbers are all positive. A conjecture related to the Euler number of compact Kähler manifolds with nonpositive bisectional curvature is proposed, which can be regarded as a complex analogue to the Hopf conjecture.

preprint2020arXiv

Cluster-Adaptive Network A/B Testing: From Randomization to Estimation

A/B testing is an important decision-making tool in product development for evaluating user engagement or satisfaction with a new service, feature or product. The goal of A/B testing is to estimate the average treatment effect (ATE) of a new change, which becomes complicated when users interact. A key assumption of A/B testing is the Stable Unit Treatment Value Assumption (SUTVA), which states that each individual's response is affected only by their own treatment; when SUTVA does not hold, the classical estimate of the ATE usually leads to a wrong conclusion. In this paper, we propose a cluster-adaptive network A/B testing procedure, which involves a sequential cluster-adaptive randomization and a cluster-adjusted estimator. The cluster-adaptive randomization is employed to minimize the cluster-level Mahalanobis distance between the two treatment groups, so that the variance of the ATE estimate can be reduced. In addition, the cluster-adjusted estimator is used to eliminate the bias caused by network interference, resulting in a consistent estimate of the ATE. Numerical studies suggest that our cluster-adaptive network A/B testing achieves consistent estimation with higher efficiency. An empirical study based on a real-world network illustrates how our method can benefit decision-making in applications.
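A minimal sketch of sequential, imbalance-minimizing randomization at the cluster level, in the spirit of the randomization step described above. Each arriving cluster is summarized by a covariate vector and is tentatively added to each arm. The resulting imbalance between arms is measured, and the cluster goes to the arm with the smaller imbalance with probability `q`. For simplicity, the imbalance is a Mahalanobis-type distance between the arms' covariate sums with a user-supplied diagonal covariance, and the biased-coin probability `q` is an assumed tuning parameter. The paper's exact distance and its cluster-adjusted estimator are not reproduced, and all names are hypothetical.

```python
import random


def imbalance(sum_treat, sum_ctrl, inv_var):
    """Mahalanobis-type distance between the arms' covariate sums,
    assuming a diagonal covariance given by its inverse variances."""
    return sum(w * (t - c) ** 2 for w, t, c in zip(inv_var, sum_treat, sum_ctrl))


def cluster_adaptive_assign(cluster_covariates, inv_var, q=0.85, rng=random):
    """Sequentially assign clusters (1 = treatment, 0 = control), choosing with
    probability q the arm that yields the smaller imbalance."""
    d = len(inv_var)
    sums = {1: [0.0] * d, 0: [0.0] * d}
    arms = []
    for z in cluster_covariates:
        score = {}
        for arm in (1, 0):
            trial = {a: list(s) for a, s in sums.items()}
            trial[arm] = [s + v for s, v in zip(trial[arm], z)]
            score[arm] = imbalance(trial[1], trial[0], inv_var)
        if score[1] == score[0]:
            arm = rng.choice((1, 0))  # tie: fair coin
        else:
            better = 1 if score[1] < score[0] else 0
            arm = better if rng.random() < q else 1 - better
        sums[arm] = [s + v for s, v in zip(sums[arm], z)]
        arms.append(arm)
    return arms, imbalance(sums[1], sums[0], inv_var)


# Hypothetical data: 40 clusters, each summarized by 2 mean covariates.
rng = random.Random(0)
clusters = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
arms, final_imbalance = cluster_adaptive_assign(clusters, inv_var=[1.0, 1.0], rng=rng)
print(sum(arms), "treated clusters; final imbalance", round(final_imbalance, 3))
```

Keeping `q` below 1 preserves some randomness in every assignment, trading a little balance for protection against predictable allocation.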

preprint2020arXiv

Complexity results for two kinds of colored disconnections of graphs

The concept of the rainbow disconnection number of graphs was introduced by Chartrand et al. in 2018. Inspired by this concept, we put forward the concepts of rainbow vertex-disconnection and proper disconnection in graphs. In this paper, we first show that it is $NP$-complete to decide whether a given edge-colored graph $G$ with maximum degree $\Delta(G)=4$ is proper disconnected. Then, for a graph $G$ with $\Delta(G)\leq 3$, we show that the proper disconnection number satisfies $pd(G)\leq 2$, and we determine the graphs with $pd(G)=1$ and $pd(G)=2$, respectively. Furthermore, we show that for a general graph $G$, deciding whether $pd(G)=1$ is $NP$-complete, even if $G$ is bipartite. We also show that it is $NP$-complete to decide whether a given vertex-colored graph $G$ is rainbow vertex-disconnected, even if $G$ has $\Delta(G)=3$ or is bipartite.

preprint2020arXiv

Cooperative Rate-Splitting for Secrecy Sum-Rate Enhancement in Multi-antenna Broadcast Channels

In this paper, we employ the Cooperative Rate-Splitting (CRS) technique to enhance the Secrecy Sum Rate (SSR) of the Multiple Input Single Output (MISO) Broadcast Channel (BC) consisting of two legitimate users and one eavesdropper, with perfect Channel State Information (CSI) available at all nodes. In CRS based on the three-node relay channel, the transmitter splits the messages of the legitimate users and encodes them into common and private streams based on Rate-Splitting (RS). With the goal of maximizing the SSR, the proposed CRS strategy opportunistically asks the relaying legitimate user to forward its decoded common message. During the transmission, the eavesdropper keeps wiretapping silently. To ensure secure transmission, the common message serves a dual purpose, acting both as a desired message and as Artificial Noise (AN), without consuming extra transmit power compared to the conventional AN design. Taking into account the total power constraint and Physical Layer (PHY) security, the precoders and time-slot allocation are jointly optimized by solving the non-convex SSR maximization problem with a Sequential Convex Approximation (SCA) algorithm. Numerical results show that the proposed CRS secure transmission scheme outperforms existing Multi-User Linear Precoding (MU-LP) and Cooperative Non-Orthogonal Multiple Access (C-NOMA) strategies. Therefore, CRS is a promising strategy for enhancing PHY security in multi-antenna BC systems.

preprint2020arXiv

Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems

Neural networks in ads systems usually take input from multiple sources, e.g., query-ad relevance, ad features and user portraits. These inputs are encoded into one-hot or multi-hot binary features, with typically only a tiny fraction of nonzero feature values per example. Deep learning models in the online advertising industry can have terabyte-scale parameters that fit in neither the GPU memory nor the CPU main memory of a computing node. For example, a sponsored online advertising system can contain more than $10^{11}$ sparse features, making the neural network a massive model with around 10 TB of parameters. In this paper, we introduce a distributed hierarchical GPU parameter server for massive-scale deep learning ads systems. We propose a hierarchical workflow that uses GPU High-Bandwidth Memory, CPU main memory and SSD as 3-layer hierarchical storage. All the neural network training computations are performed on GPUs. Extensive experiments on real-world data confirm the effectiveness and scalability of the proposed system. A 4-node hierarchical GPU parameter server can train a model more than 2X faster than a 150-node in-memory distributed parameter server in an MPI cluster. In addition, the price-performance ratio of our proposed system is 4-9 times better than that of an MPI-cluster solution.
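The three-layer storage hierarchy can be pictured as a tiered cache for sparse parameters. The sketch below is a hypothetical, single-process toy: class and method names are illustrative and capacities are tiny. It uses LRU eviction from a small "HBM" tier to a larger "MEM" tier and then to an unbounded "SSD" tier, promoting every accessed parameter back into HBM. The real system's sharding, pipelining and eviction policy are not described in the abstract and are not reproduced here.

```python
from collections import OrderedDict


class TieredParamStore:
    """Toy 3-tier parameter store: HBM (smallest) -> MEM -> SSD (unbounded)."""

    def __init__(self, hbm_capacity, mem_capacity, init_fn):
        self.hbm = OrderedDict()   # least recently used keys first
        self.mem = OrderedDict()
        self.ssd = {}
        self.hbm_capacity, self.mem_capacity = hbm_capacity, mem_capacity
        self.init_fn = init_fn     # creates a parameter the first time a feature appears
        self.hits = {"hbm": 0, "mem": 0, "ssd": 0, "new": 0}

    def _put_hbm(self, key, value):
        self.hbm[key] = value
        self.hbm.move_to_end(key)
        if len(self.hbm) > self.hbm_capacity:        # spill LRU entry to MEM
            old_key, old_val = self.hbm.popitem(last=False)
            self.mem[old_key] = old_val
            if len(self.mem) > self.mem_capacity:    # spill LRU entry to SSD
                k2, v2 = self.mem.popitem(last=False)
                self.ssd[k2] = v2

    def get(self, key):
        """Fetch a parameter, promoting it into the HBM tier."""
        if key in self.hbm:
            self.hits["hbm"] += 1
            self.hbm.move_to_end(key)
            return self.hbm[key]
        if key in self.mem:
            self.hits["mem"] += 1
            value = self.mem.pop(key)
        elif key in self.ssd:
            self.hits["ssd"] += 1
            value = self.ssd.pop(key)
        else:
            self.hits["new"] += 1
            value = self.init_fn(key)
        self._put_hbm(key, value)
        return value


store = TieredParamStore(hbm_capacity=2, mem_capacity=2, init_fn=lambda key: [0.0] * 4)
for feature_id in [1, 2, 3, 1, 4, 5, 6, 1, 2, 2]:
    store.get(feature_id)
print(store.hits)  # {'hbm': 1, 'mem': 2, 'ssd': 1, 'new': 6}
```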

preprint2020arXiv

Distributed Primal-Dual Optimization for Online Multi-Task Learning

Conventional online multi-task learning algorithms suffer from two critical limitations: 1) heavy communication caused by delivering high-velocity sequential data to a central machine; 2) expensive runtime complexity for building task relatedness. To address these issues, we consider a setting where multiple tasks are geographically distributed and each task can synchronize data with others to leverage knowledge of related tasks. Specifically, we propose an adaptive primal-dual algorithm, which not only captures task-specific noise in adversarial learning but also carries out a projection-free update with runtime efficiency. Moreover, our model is well suited to decentralized, periodically connected tasks, as it allows energy-starved or bandwidth-constrained tasks to postpone the update. Theoretical results demonstrate the convergence guarantee of our distributed algorithm with an optimal regret. Empirical results confirm that the proposed model is highly effective on various real-world datasets.

preprint2020arXiv

FedSKETCH: Communication-Efficient and Private Federated Learning via Sketching

Communication complexity and privacy are the two key challenges in Federated Learning, where the goal is to perform distributed learning across a large number of devices. In this work, we introduce the FedSKETCH and FedSKETCHGATE algorithms to address both challenges jointly; these algorithms are intended for homogeneous and heterogeneous data distribution settings, respectively. The key idea is to compress the accumulation of local gradients using count sketch, so that the server does not have access to the gradients themselves, which provides privacy. Furthermore, because the sketches have lower dimension, our method is also communication-efficient. We provide sharp convergence guarantees for both schemes. Finally, we back up our theory with a variety of experiments.
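The count sketch referred to above can be written in a few lines. The pure-Python sketch below, with hypothetical names and sizes, shows the two properties the scheme relies on. First, sketches are linear, so the server can add clients' sketches instead of receiving individual gradients. Second, large coordinates of the aggregate can be recovered approximately with a median over rows. It uses seeded Python pseudo-random hashing shared by all clients, an assumption standing in for proper hash functions, and omits FedSKETCH's local-update and gradient-reconstruction details.

```python
import random
import statistics


class CountSketch:
    """Count sketch of a dim-dimensional vector with `depth` rows of `width` buckets."""

    def __init__(self, dim, width, depth, seed=0):
        rng = random.Random(seed)  # same seed => same hash/sign functions on every client
        self.buckets = [[rng.randrange(width) for _ in range(dim)] for _ in range(depth)]
        self.signs = [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(depth)]
        self.table = [[0.0] * width for _ in range(depth)]

    def add_vector(self, vec):
        """table[r][h_r(i)] += s_r(i) * vec[i] for every row r and coordinate i."""
        for r in range(len(self.table)):
            for i, v in enumerate(vec):
                self.table[r][self.buckets[r][i]] += self.signs[r][i] * v

    def merge(self, other):
        """Add another sketch built with the same seed (sketching is linear)."""
        for row, other_row in zip(self.table, other.table):
            for j, v in enumerate(other_row):
                row[j] += v

    def estimate(self, i):
        """Median over rows of the signed bucket value for coordinate i."""
        return statistics.median(
            [self.signs[r][i] * self.table[r][self.buckets[r][i]] for r in range(len(self.table))]
        )


# Two hypothetical client gradients sharing one large coordinate (index 3).
dim = 200
g1 = [0.01] * dim
g1[3] += 60.0
g2 = [0.01] * dim
g2[3] += 40.0

server = CountSketch(dim, width=50, depth=5)
server.add_vector(g1)
client_2 = CountSketch(dim, width=50, depth=5)
client_2.add_vector(g2)
server.merge(client_2)                    # the server only combines sketches

direct = CountSketch(dim, width=50, depth=5)
direct.add_vector([a + b for a, b in zip(g1, g2)])
print(round(server.estimate(3), 2), round(direct.estimate(3), 2))
```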

preprint2020arXiv

Generalization Bounds for High-dimensional M-estimation under Sparsity Constraint

The $\ell_0$-constrained empirical risk minimization ($\ell_0$-ERM) is a promising tool for high-dimensional statistical estimation. Existing analyses of the $\ell_0$-ERM estimator focus mostly on parameter estimation and support recovery consistency. From the perspective of statistical learning, another fundamental question is how well the $\ell_0$-ERM estimator performs on unseen samples. The answer to this question is important for understanding the learnability of such a non-convex (and also NP-hard) M-estimator, yet it remains relatively underexplored. In this paper, we investigate this problem and develop a generalization theory for $\ell_0$-ERM. We establish, in both white-box and black-box statistical regimes, a set of generalization gap and excess risk bounds for $\ell_0$-ERM to characterize its sparse prediction and optimization capability. Our theory mainly reveals three findings: 1) tighter generalization bounds can be attained by $\ell_0$-ERM than by $\ell_2$-ERM if the risk function is (with high probability) restricted strongly convex; 2) tighter uniform generalization bounds can be established for $\ell_0$-ERM than for conventional dense ERM; and 3) sparsity-level-invariant bounds can be established by imposing additional strong-signal conditions to ensure the stability of $\ell_0$-ERM. In light of these results, we further provide generalization guarantees for the Iterative Hard Thresholding (IHT) algorithm, one of the most popular greedy pursuit methods for approximately solving $\ell_0$-ERM. Numerical evidence is provided to confirm our theoretical predictions when applied to sparsity-constrained linear regression and logistic regression models.

preprint2020arXiv

IVFS: Simple and Efficient Feature Selection for High Dimensional Topology Preservation

Feature selection is an important tool for dealing with high-dimensional data. In the unsupervised case, many popular algorithms aim to maintain the structure of the original data. In this paper, we propose a simple and effective feature selection algorithm that enhances sample similarity preservation from a new perspective, topology preservation, which is represented by persistence diagrams from computational topology. The method is built on a unified feature selection framework called IVFS, which is inspired by the random subset method. The scheme is flexible and can handle cases where the problem is analytically intractable. The proposed algorithm preserves both the pairwise distances and the topological patterns of the full data well. We demonstrate that our algorithm can provide satisfactory performance under a sharp sub-sampling rate, which supports efficient implementation of our method on large-scale datasets. Extensive experiments validate the effectiveness of the proposed feature selection scheme.

preprint2020arXiv

MeDaS: An open-source platform as service to help break the walls between medicine and informatics

In the past decade, deep learning (DL) has achieved unprecedented success in numerous fields, including computer vision, natural language processing, and healthcare. In particular, DL is increasingly applied to advanced medical image analysis, including segmentation, classification, and more. On the one hand, there is a tremendous need for researchers from medical, clinical, and informatics backgrounds to jointly share their expertise, knowledge, skills, and experience in leveraging DL for medical image analysis. On the other hand, barriers between disciplines often hamper full and efficient collaboration. To this end, we propose a novel open-source platform, MeDaS, the MeDical open-source platform as Service. To the best of our knowledge, MeDaS is the first open-source platform providing a collaborative and interactive service that lets researchers from a medical background easily use DL-related toolkits and, at the same time, helps scientists or engineers from information sciences understand the medical side. Based on a series of toolkits and utilities following the idea of RINV (Rapid Implementation aNd Verification), the MeDaS platform can implement pre-processing, post-processing, augmentation, visualization, and other phases needed in medical image analysis. Five tasks, involving the lung, liver, brain, chest, and pathology, are validated and shown to be efficiently realisable using MeDaS.

preprint2020arXiv

Meta-CoTGAN: A Meta Cooperative Training Paradigm for Improving Adversarial Text Generation

Training generative models that can generate high-quality text with sufficient diversity is an important open problem for the Natural Language Generation (NLG) community. Recently, generative adversarial models have been applied extensively to text generation tasks, where adversarially trained generators alleviate the exposure bias experienced by conventional maximum likelihood approaches and achieve promising generation quality. However, due to the notorious problem of mode collapse in adversarial training, adversarially trained generators face a quality-diversity trade-off, i.e., the generator models tend to severely sacrifice generation diversity to increase generation quality. In this paper, we propose a novel approach that aims to improve the performance of adversarial text generation by efficiently decelerating the mode collapse of adversarial training. To this end, we introduce a cooperative training paradigm, where a language model is cooperatively trained with the generator, and we use the language model to efficiently shape the data distribution of the generator against mode collapse. Moreover, instead of engaging the cooperative update for the generator in a principled way, we formulate a meta-learning mechanism, where the cooperative update to the generator serves as a high-level meta task, with the intuition of ensuring that the parameters of the generator after the adversarial update stay resistant against mode collapse. In the experiments, we demonstrate that our proposed approach can efficiently slow down the pace of mode collapse for adversarial text generators. Overall, our proposed method outperforms the baseline approaches by significant margins in terms of both generation quality and diversity in the tested domains.

preprint2020arXiv

Monochromatic disconnection: Erdős-Gallai-type problems and product graphs

For an edge-colored graph $G$, we call an edge-cut $M$ of $G$ monochromatic if all edges of $M$ have the same color. The graph $G$ is called monochromatically disconnected if any two distinct vertices of $G$ are separated by a monochromatic edge-cut. The monochromatic disconnection number of a connected graph $G$, denoted by $md(G)$, is the maximum number of colors that can be used to make $G$ monochromatically disconnected. In this paper, we solve the Erdős-Gallai-type problems for monochromatic disconnection, and give the monochromatic disconnection numbers for four graph products, i.e., the Cartesian, strong, lexicographic, and tensor products.
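Because deleting every edge of one color is the largest possible monochromatic edge set, a pair $u,v$ is separated by some monochromatic edge-cut exactly when deleting all edges of some single color disconnects $u$ from $v$. The brute-force Python sketch below uses that observation to test a coloring and to compute $md(G)$ by exhaustive search. Its helper names are hypothetical, and it is exponential in the number of edges, so it is only meant for tiny graphs.

```python
from itertools import product


def same_component(n, edges, u, v):
    """True if u and v are connected in the graph (n vertices, given edges)."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in edges:
        parent[find(a)] = find(b)
    return find(u) == find(v)


def is_mono_disconnected(n, edges, colors):
    """Check that every pair u < v is split by deleting all edges of one color."""
    for u in range(n):
        for v in range(u + 1, n):
            split = any(
                not same_component(n, [e for e, c in zip(edges, colors) if c != col], u, v)
                for col in set(colors)
            )
            if not split:
                return False
    return True


def md(n, edges):
    """Monochromatic disconnection number of a small connected graph, by brute force."""
    best = 0
    for colors in product(range(len(edges)), repeat=len(edges)):
        if is_mono_disconnected(n, edges, colors):
            best = max(best, len(set(colors)))
    return best


triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
print(md(3, triangle), md(3, path))  # 1 2
```

On a tree every edge is a bridge, so giving each edge its own color works and $md$ equals the number of edges; in the triangle, any two colors leave some pair still joined by an edge of the other color.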

preprint2020arXiv

Nonnegative Hermitian vector bundles and Chern numbers

We show in this article that if a holomorphic vector bundle has a nonnegative Hermitian metric in the sense of Bott and Chern, which always exists on globally generated holomorphic vector bundles, then some special linear combinations of Chern forms are strongly nonnegative. This particularly implies that all the Chern numbers of such a holomorphic vector bundle are nonnegative and can be bounded below and above respectively by two special Chern numbers. As applications, we obtain a family of new results on compact connected complex manifolds which are homogeneous or can be holomorphically immersed into complex tori, some of which improve several classical results.

preprint2020arXiv

Rainbow monochromatic $k$-edge-connection colorings of graphs

A path in an edge-colored graph is called a monochromatic path if all edges of the path have the same color. We call $k$ paths $P_1,\ldots,P_k$ rainbow monochromatic paths if every $P_i$ is monochromatic and, for any $i\neq j$, $P_i$ and $P_j$ have different colors. An edge-coloring of a graph $G$ is said to be a rainbow monochromatic $k$-edge-connection coloring ($RMC_k$-coloring for short) if every two distinct vertices of $G$ are connected by at least $k$ rainbow monochromatic paths. We use $rmc_k(G)$ to denote the maximum number of colors that ensures $G$ has an $RMC_k$-coloring; this number is called the rainbow monochromatic $k$-edge-connection number. We prove the existence of $RMC_k$-colorings of graphs, then give some bounds on $rmc_k(G)$ and present some graphs whose $rmc_k(G)$ attains the lower bound. We also obtain the threshold function for $rmc_k(G(n,p))\geq f(n)$, where $\lfloor\frac{n}{2}\rfloor> k\geq 1$.

preprint2020arXiv

Randomized Kernel Multi-view Discriminant Analysis

In many artificial intelligence and computer vision systems, the same object can be observed from distinct viewpoints or by diverse sensors, which raises the challenge of recognizing objects from different, even heterogeneous, views. Multi-view discriminant analysis (MvDA) is an effective multi-view subspace learning method, which finds a discriminant common subspace by jointly learning multiple view-specific linear projections for object recognition from multiple views, in a non-pairwise way. In this paper, we propose a kernel version of multi-view discriminant analysis, called kernel multi-view discriminant analysis (KMvDA). To overcome the well-known computational bottleneck of kernel methods, we also study the performance of using random Fourier features (RFF) to approximate Gaussian kernels in KMvDA for large-scale learning. A theoretical analysis of the stability of this approximation is developed. We also conduct experiments on several popular multi-view datasets to illustrate the effectiveness of our proposed strategy.

preprint2020arXiv

RGB-D SLAM in Dynamic Environments Using Point Correlations

In this paper, a simultaneous localization and mapping (SLAM) method that eliminates the influence of moving objects in dynamic environments is proposed. This method utilizes the correlation between map points to separate points that are part of the static scene and points that are part of different moving objects into different groups. A sparse graph is first created using Delaunay triangulation from all map points. In this graph, the vertices represent map points, and each edge represents the correlation between adjacent points. If the relative position between two points remains consistent over time, there is correlation between them, and they are considered to be moving together rigidly. If not, they are considered to have no correlation and to be in separate groups. After the edges between the uncorrelated points are removed during point-correlation optimization, the remaining graph separates the map points of the moving objects from the map points of the static scene. The largest group is assumed to be the group of reliable static map points. Finally, motion estimation is performed using only these points. The proposed method was implemented for RGB-D sensors, evaluated with a public RGB-D benchmark, and tested in several additional challenging environments. The experimental results demonstrate that robust and accurate performance can be achieved by the proposed SLAM method in both slightly and highly dynamic environments. Compared with other state-of-the-art methods, the proposed method can provide competitive accuracy with good real-time performance.
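The edge-pruning and grouping steps can be viewed as a union-find over the graph edges whose length stays nearly constant between observations. This is a minimal sketch under several assumptions. The edges are given as input, for example from a Delaunay triangulation, which is not computed here. Only two frames are compared, and a fixed tolerance `tol` (a hypothetical parameter) stands in for the paper's point-correlation optimization.

```python
import math


def static_point_group(pos_t0, pos_t1, edges, tol=0.05):
    """Split map points into rigidly moving groups; return the largest one."""
    n = len(pos_t0)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in edges:
        # Keep an edge (correlated points) only if its length is nearly unchanged.
        d0 = math.dist(pos_t0[i], pos_t0[j])
        d1 = math.dist(pos_t1[i], pos_t1[j])
        if abs(d1 - d0) <= tol:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # The largest group is assumed to be the static scene.
    return max(groups.values(), key=len)
```

In a full system, motion estimation would then use only the returned indices.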

preprint2020arXiv

Selective Convolutional Network: An Efficient Object Detector with Ignoring Background

It is well known that attention mechanisms can effectively improve the performance of many CNNs, including object detectors. Instead of refining feature maps everywhere, as is common practice, we use attention in a novel way to reduce the prohibitive computational complexity. To this end, we introduce an efficient object detector called Selective Convolutional Network (SCN), which computes only at locations that contain meaningful and useful information. The basic idea is to exclude insignificant background areas, which effectively reduces the computational cost, especially during feature extraction. To achieve this, we design an elaborate structure with negligible overhead that guides the network on where to look next. It is end-to-end trainable and easy to embed. Without additional segmentation datasets, we explore two training strategies: direct supervision and indirect supervision. Extensive experiments assess the performance on the PASCAL VOC2007 and MS COCO detection datasets. Results show that integrating our method into SSD and Pelee reduces computation on average by 1/5 to 1/3, with a slight loss of accuracy, demonstrating the feasibility of SCN.
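The saving comes from computing a convolution only at the locations a mask marks as foreground. The toy sketch below shows that idea for a single-channel "valid" 2-D cross-correlation with a given boolean mask. In SCN the mask is predicted by the network, and that part is not reproduced here; `masked_conv2d` is a hypothetical name.

```python
def masked_conv2d(image, kernel, mask):
    """Valid 2-D cross-correlation evaluated only where mask[i][j] is True.

    Returns (output, number_of_multiplications); masked-out outputs stay 0.0.
    """
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = H - kh + 1, W - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    mults = 0
    for i in range(oh):
        for j in range(ow):
            if not mask[i][j]:
                continue  # background location: skip the computation entirely
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += image[i + a][j + b] * kernel[a][b]
            mults += kh * kw
            out[i][j] = acc
    return out, mults
```

The multiplication count scales with the fraction of selected locations, which is the mechanism behind the reported reduction in computation.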

preprint2020arXiv

Solving the Robust Matrix Completion Problem via a System of Nonlinear Equations

We consider the problem of robust matrix completion, which aims to recover a low-rank matrix $L_*$ and a sparse matrix $S_*$ from incomplete observations of their sum $M=L_*+S_*\in\mathbb{R}^{m\times n}$. Algorithmically, the robust matrix completion problem is transformed into a system of nonlinear equations, which is then solved with an alternating direction method. In addition, the algorithm is highly parallelizable and suitable for large-scale problems. Theoretically, we characterize sufficient conditions under which $L_*$ can be approximated by a low-rank approximation of the observed $M$. Under proper assumptions, the algorithm is shown to converge linearly to the true solution. Numerical simulations show that this simple method works as expected and is comparable with state-of-the-art methods.
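One way to read "a system of nonlinear equations" is to parametrize the low-rank part as $L = XY^\top$ and require $(XY^\top + S)_{ij} = M_{ij}$ for every observed entry $(i,j)$. These equations are bilinear in $X$ and $Y$. The parametrization is an assumption made for illustration, and the paper's exact system and solver may differ. The sketch evaluates the residual of these equations and performs a hard-thresholding update of $S$ for fixed factors. An alternating scheme would interleave this update with updates of $X$ and $Y$, which are not shown. All names are hypothetical.

```python
def low_rank_entry(X, Y, i, j):
    """Entry (i, j) of L = X Y^T, with X of shape m x r and Y of shape n x r."""
    return sum(x * y for x, y in zip(X[i], Y[j]))


def observed_residual(X, Y, S, M_obs):
    """Residual of (X Y^T + S - M)_{ij} = 0 over observed entries.

    M_obs maps observed (i, j) to M_ij; S is a sparse dict {(i, j): value}.
    """
    return {(i, j): low_rank_entry(X, Y, i, j) + S.get((i, j), 0.0) - m
            for (i, j), m in M_obs.items()}


def sparse_update(X, Y, M_obs, tau):
    """Estimate S by hard-thresholding the observed residual M - X Y^T at level tau."""
    S = {}
    for (i, j), m in M_obs.items():
        r = m - low_rank_entry(X, Y, i, j)
        if abs(r) > tau:
            S[(i, j)] = r
    return S
```

Because every equation involves a single observed entry, both functions loop independently over the entries. This fits the abstract's remark that the approach parallelizes well.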

preprint2020arXiv

Structure-Feature based Graph Self-adaptive Pooling

Various methods for handling graph data have been proposed in recent years. However, most of these methods focus on graph feature aggregation rather than graph pooling. Moreover, existing top-k selection graph pooling methods have the following problems. First, to construct the pooled graph topology, current top-k selection methods evaluate node importance from a single perspective only, which is simplistic and not objective. Second, the feature information of unselected nodes is discarded directly during pooling, which inevitably leads to a massive loss of graph feature information. To solve these problems, we propose a novel graph self-adaptive pooling method with the following objectives: (1) to construct a reasonable pooled graph topology, structure and feature information of the graph are considered simultaneously, which makes node selection more accurate and objective; and (2) to make the pooled nodes contain sufficiently effective graph information, node feature information is aggregated before the unimportant nodes are discarded. The selected nodes thus contain information from their neighbors, which makes better use of the features of the unselected nodes. Experimental results on four different datasets demonstrate that our method is effective for graph classification and outperforms state-of-the-art graph pooling methods.
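The two objectives can be illustrated with a toy top-k pooling step. Features are aggregated over each node's neighborhood before selection, and nodes are scored from both a structural and a feature perspective. The scoring rule is a hypothetical stand-in, not the paper's learned scores: a weighted sum of normalized degree and normalized feature norm, with a made-up weight `alpha`.

```python
import math


def self_adaptive_pool(adj, feats, k, alpha=0.5):
    """Toy top-k pooling using both structure and features.

    adj: dict node -> set of neighbors; feats: dict node -> list of floats.
    Returns (pooled adjacency, pooled features) restricted to the kept nodes.
    """
    # 1. Aggregate before selection: each node's feature becomes the mean over
    #    itself and its neighbors, so dropped nodes still contribute information.
    agg = {}
    for v in adj:
        group = [v, *adj[v]]
        agg[v] = [sum(feats[u][d] for u in group) / len(group) for d in range(len(feats[v]))]
    # 2. Score each node from two perspectives: normalized degree (structure)
    #    and normalized feature norm (features).
    max_deg = max(len(adj[v]) for v in adj) or 1
    norms = {v: math.sqrt(sum(f * f for f in agg[v])) for v in adj}
    max_norm = max(norms.values()) or 1.0
    score = {v: alpha * len(adj[v]) / max_deg + (1 - alpha) * norms[v] / max_norm
             for v in adj}
    # 3. Keep the k best nodes (ties broken by node id) and their induced subgraph.
    keep = sorted(adj, key=lambda v: (-score[v], v))[:k]
    kept = set(keep)
    return {v: adj[v] & kept for v in keep}, {v: agg[v] for v in keep}
```

Setting `alpha` to 1 or 0 collapses the score to a single perspective, which is the limitation of existing top-k methods described above.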

preprint2020arXiv

The Benefits of Diversity: Permutation Recovery in Unlabeled Sensing from Multiple Measurement Vectors

In "Unlabeled Sensing", one observes a set of linear measurements of an underlying signal with incomplete or missing information about their ordering, which can be modeled in terms of an unknown permutation. Previous work on the case of a single noisy measurement vector has exposed two main challenges: 1) a high requirement on the \emph{signal-to-noise ratio} ($\mathrm{SNR}$), approximately of the order of $n^{5}$, and 2) a massive computational burden, since the problem is NP-hard in general. In this paper, we study the case of \emph{multiple} noisy measurement vectors (MMVs) resulting from a \emph{common} permutation and investigate to what extent the number of MMVs $m$ facilitates permutation recovery by "borrowing strength". Our work at least partially resolves both challenges. First, we show that a large stable rank of the signal significantly reduces the required $\mathrm{SNR}$, which can drop from a polynomial in $n$ for $m = 1$ to a constant for $m = \Omega(\log n)$, where $m$ denotes the number of MMVs and $n$ denotes the number of measurements per MV. This bound is shown to be sharp and is associated with a phase transition phenomenon. Second, we propose a computational scheme for recovering the unknown permutation in practice. For the "oracle case" with a known signal, the maximum likelihood (ML) estimator reduces to a linear assignment problem whose global optimum can be obtained efficiently. For the case in which both the signal and the permutation are unknown, the problem is reformulated as a bi-convex optimization problem with an auxiliary variable, which can be solved by the Alternating Direction Method of Multipliers (ADMM). Numerical experiments based on the proposed computational scheme confirm the tightness of our theoretical analysis.
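The oracle-case estimator can be sketched as follows. Assume i.i.d. Gaussian noise and a known noiseless matrix $Z$ whose rows are the unpermuted measurement vectors (for example $Z = BX$ in a model $Y = \Pi Z + W$; this notation is an assumption). The ML permutation then minimizes $\sum_i \|y_i - z_{\pi(i)}\|^2$, which is a linear assignment problem. For illustration only, the sketch brute-forces all $n!$ permutations. Realistic sizes call for the Hungarian algorithm, for example SciPy's `scipy.optimize.linear_sum_assignment` applied to the same cost matrix.

```python
from itertools import permutations


def oracle_ml_permutation(Y, Z):
    """ML permutation for Y[i] = Z[perm[i]] + Gaussian noise (brute force, tiny n only)."""
    n = len(Y)
    # cost[i][j]: squared error of matching observed row i to noiseless row j
    cost = [[sum((a - b) ** 2 for a, b in zip(Y[i], Z[j])) for j in range(n)]
            for i in range(n)]
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best)
```

Each row holds all $m$ measurement vectors at one position, so additional MMVs make the rows more distinct. This is one intuition for why more MMVs ease recovery.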

preprint2020arXiv

The rainbow vertex-disconnection in graphs

Let $G$ be a nontrivial connected and vertex-colored graph. A subset $X$ of the vertex set of $G$ is called rainbow if any two vertices in $X$ have distinct colors. The graph $G$ is called \emph{rainbow vertex-disconnected} if for any two vertices $x$ and $y$ of $G$, there exists a vertex subset $S$ of $G$ such that when $x$ and $y$ are nonadjacent, $S$ is rainbow and $x$ and $y$ belong to different components of $G-S$; whereas when $x$ and $y$ are adjacent, $S+x$ or $S+y$ is rainbow and $x$ and $y$ belong to different components of $(G-xy)-S$. For a connected graph $G$, the \emph{rainbow vertex-disconnection number} of $G$, denoted by $rvd(G)$, is the minimum number of colors that are needed to make $G$ rainbow vertex-disconnected. In this paper, we characterize all graphs of order $n$ with rainbow vertex-disconnection number $k$ for $k\in\{1,2,n\}$, and determine the rainbow vertex-disconnection numbers of some special graphs. Moreover, we study the extremal problems on the number of edges of a connected graph $G$ with order $n$ and $rvd(G)=k$ for given integers $k$ and $n$ with $1\leq k\leq n$.
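The definition can be checked directly on toy graphs by enumerating candidate sets $S \subseteq V(G)\setminus\{x,y\}$. The exponential brute-force sketch below does this for a given vertex coloring (a list indexed by vertex). Its helper names are hypothetical. $rvd(G)$ is then the smallest number of colors for which some coloring passes the check.

```python
from itertools import combinations


def _separated(n, edges, removed, x, y, drop_edge=None):
    """True if y is unreachable from x after deleting the vertices in `removed`
    (and the edge `drop_edge`, if given)."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        if drop_edge is not None and {u, v} == set(drop_edge):
            continue
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {x}, [x]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return y not in seen


def _rainbow(vertices, color):
    cols = [color[v] for v in vertices]
    return len(cols) == len(set(cols))


def is_rainbow_vertex_disconnected(n, edges, color):
    """Brute-force check of rainbow vertex-disconnection (toy graphs only)."""
    edge_set = {frozenset(e) for e in edges}
    for x, y in combinations(range(n), 2):
        others = [v for v in range(n) if v not in (x, y)]
        adjacent = frozenset((x, y)) in edge_set
        found = False
        for r in range(len(others) + 1):
            for S in map(set, combinations(others, r)):
                if adjacent:
                    # S + x or S + y rainbow, and S separates x, y in G - xy.
                    found = ((_rainbow(S | {x}, color) or _rainbow(S | {y}, color))
                             and _separated(n, edges, S, x, y, drop_edge=(x, y)))
                else:
                    # S rainbow, and S separates x, y in G.
                    found = _rainbow(S, color) and _separated(n, edges, S, x, y)
                if found:
                    break
            if found:
                break
        if not found:
            return False
    return True
```

For instance, a path with all vertices in one color passes, while a triangle with one color fails. After the edge $xy$ is removed, the third vertex must be deleted, and it shares a color with both $x$ and $y$.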

preprint2020arXiv

Two-dimensional transition metal oxides Mn2O3 realized quantum anomalous Hall effect

The quantum anomalous Hall effect is an intriguing topologically nontrivial phase arising from spontaneous magnetization and spin-orbit coupling. However, the extremely demanding conditions for realizing the quantum anomalous Hall effect in magnetic topological insulators such as Cr- or V-doped (Bi,Sb)2Te3 films hinder its practical applications. Here, we use first-principles calculations to predict that the three Mn2O3 structures are intrinsic ferromagnetic Chern insulators. Remarkably, a quantum anomalous Hall phase with Chern number C = -2 is found, with two corresponding gapless chiral edge states inside the bulk gap. More interestingly, only a small tensile strain is needed to induce the phase transition from the Cmm2 and C222 phases to the P6/mmm phase. Meanwhile, a topological quantum phase transition between a quantum anomalous Hall phase and a trivial insulating phase can be realized. The combination of these novel properties makes this two-dimensional ferromagnet a promising platform for high-efficiency electronic and spintronic applications.
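For reference, the textbook relation between the Chern number and the quantized Hall conductance gives the following for $C = -2$. This is a standard result rather than a number taken from the paper, and sign conventions vary:

```latex
\sigma_{xy} = C\,\frac{e^{2}}{h} = -2\,\frac{e^{2}}{h},
\qquad \frac{e^{2}}{h} \approx 3.874\times 10^{-5}\ \mathrm{S}
\;\Rightarrow\; |\sigma_{xy}| \approx 7.75\times 10^{-5}\ \mathrm{S}.
```

By the bulk-edge correspondence, $|C| = 2$ is consistent with the two gapless chiral edge states.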

preprint2020arXiv

Understanding and Detecting Convergence for Stochastic Gradient Descent with Momentum

Convergence detection of iterative stochastic optimization methods is of great practical interest. This paper considers stochastic gradient descent (SGD) with a constant learning rate and momentum. We show that there exists a transient phase in which iterates move towards a region of interest, and a stationary phase in which iterates remain bounded in that region around a minimum point. We construct a statistical diagnostic test for convergence to the stationary phase using the inner product between successive gradients and demonstrate that the proposed diagnostic works well. We characterize, both theoretically and empirically, how momentum can affect the test statistic of the diagnostic, and how the test statistic captures a relatively sparse signal within the gradients near convergence. Finally, we demonstrate an application that automatically tunes the learning rate by reducing it each time stationarity is detected, and show that the procedure is robust to misspecified initial rates.
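The diagnostic and the learning-rate application can be sketched as follows. Accumulate the inner products of successive stochastic gradients, declare stationarity when the running sum drops below zero, then shrink the learning rate and restart. This follows the classic inner-product idea. The burn-in, the halving factor and the toy objective ($f(x) = \|x\|^2/2$ with unit Gaussian gradient noise) are assumptions, and the paper's exact statistic may differ. For this toy problem, a short calculation for the heavy-ball update suggests that the stationary expected inner product is negative whenever $\beta \le 1/2$, but can become positive for larger $\beta$ at small step sizes. This is one concrete way momentum can affect the statistic, so the sketch uses $\beta = 0.3$.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def run_until_stationary(grad, x0, lr, beta, rng, burnin=10, max_iter=200_000):
    """Heavy-ball SGD that stops when the running sum of successive-gradient
    inner products drops below zero. Returns (x, t); t is None if never detected."""
    x = list(x0)
    v = [0.0] * len(x)
    prev_g, stat = None, 0.0
    for t in range(1, max_iter + 1):
        g = grad(x, rng)
        v = [beta * vi + gi for vi, gi in zip(v, g)]  # momentum buffer
        x = [xi - lr * vi for xi, vi in zip(x, v)]    # parameter update
        if prev_g is not None:
            stat += dot(g, prev_g)
            if t > burnin and stat < 0:
                return x, t
        prev_g = g
    return x, None


def tune_learning_rate(grad, x0, lr0, beta=0.3, n_phases=3, factor=0.5, seed=0):
    """Shrink the learning rate by `factor` each time stationarity is detected."""
    rng = random.Random(seed)
    x, lr, history = list(x0), lr0, []
    for _ in range(n_phases):
        x, t = run_until_stationary(grad, x, lr, beta, rng)
        history.append((lr, t))
        if t is None:
            break
        lr *= factor
    return x, history


def noisy_quadratic_grad(x, rng):
    """Gradient of the toy objective ||x||^2 / 2 plus unit Gaussian noise."""
    return [xi + rng.gauss(0.0, 1.0) for xi in x]
```

In the transient phase, successive gradients point the same way and the sum grows. Near the minimum, the noise makes successive gradients negatively correlated on average, and the sum drifts downward.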

preprint2020arXiv

Upper bounds for the $MD$-numbers and characterization of extremal graphs

For an edge-colored graph $G$, we call an edge-cut $M$ of $G$ monochromatic if all edges of $M$ have the same color. The graph $G$ is called monochromatic disconnected if any two distinct vertices of $G$ are separated by a monochromatic edge-cut. For a connected graph $G$, the monochromatic disconnection number (or $MD$-number for short) of $G$, denoted by $md(G)$, is the maximum number of colors that are allowed in order to make $G$ monochromatic disconnected. Graphs with diameter one are complete graphs, so their $MD$-numbers are $1$. For graphs with diameter at least 3, we can construct $2$-connected graphs whose $MD$-numbers are arbitrarily large, whereas for graphs $G$ with diameter two, we show that if $G$ is $2$-connected then $md(G)\leq 2$, and if $G$ has a cut-vertex then $md(G)$ equals the number of blocks of $G$. We therefore focus on $2$-connected graphs with diameter two and give two upper bounds on their $MD$-numbers, depending on their connectivity and independence number, respectively. We also characterize the $\left\lfloor\frac{n}{2}\right\rfloor$-connected graphs (with large connectivity) whose $MD$-numbers are $2$, and the $2$-connected graphs (with small connectivity) whose $MD$-numbers attain the upper bound $\left\lfloor\frac{n}{2}\right\rfloor.$ For graphs with connectivity less than $\frac n 2$, we show that if the connectivity of a graph is linear in its order $n$, then its $MD$-number is bounded above by a constant. This leads us to conjecture that $md(G)\leq \left\lfloor\frac{n}{k}\right\rfloor$ for every $k$-connected graph $G$.
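A small verifier follows from one observation. If some subset of the color-$c$ edges separates $x$ and $y$, then deleting all color-$c$ edges separates them too. The whole color class is then itself a monochromatic cut separating them. So it suffices to try deleting each color class in turn. The brute-force sketch below uses that fact. Its names are hypothetical, and it checks a given coloring rather than computing $md(G)$.

```python
from itertools import combinations


def _connected(n, edges, x, y):
    """True if x and y lie in the same component of the graph (range(n), edges)."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {x}, [x]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return y in seen


def is_monochromatic_disconnected(n, colored_edges):
    """Every pair must be separated by deleting some single color class."""
    colors = {c for _, _, c in colored_edges}
    for x, y in combinations(range(n), 2):
        if not any(
            not _connected(n, [(u, v) for u, v, col in colored_edges if col != c], x, y)
            for c in colors
        ):
            return False
    return True
```

For example, coloring the opposite edges of the 4-cycle alike uses two colors and passes. In contrast, any 2-coloring of the triangle fails, matching $md(K_3) = 1$.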

preprint2019arXiv

Fully spin-polarized quadratic non-Dirac bands realized quantum anomalous Hall effect

The quantum anomalous Hall effect is an intriguing quantum state that exhibits chiral edge states in the absence of a magnetic field. Although the search for quantum anomalous Hall insulators remains active, it has mainly focused on systems containing magnetic atoms. Here, based on first-principles density functional theory, we predict a new family of Chern insulators with fully spin-polarized quadratic $p_{x,y}$ non-Dirac bands in the alkaline-earth-metal compounds BaX (X = Si, Ge, Sn). We show that the BaX monolayer has a half-metallic ferromagnetic ground state. The ferromagnetism originates mainly from the p orbitals of the Si, Ge and Sn atoms. The 2D BaSn monolayer exhibits a large magnetocrystalline anisotropy energy of 12.20 meV/cell and a nontrivial band gap of 159.10 meV. Interestingly, both the spin polarization of the chiral edge currents and the sign of the Chern number can be tuned by doping. Furthermore, a 4% compressive strain can drive a structural phase transition, while the nontrivial topological properties of the 2D BaX systems are preserved. Our findings not only extend these novel concepts but also provide fascinating opportunities for the experimental realization of the quantum anomalous Hall effect.

preprint2019arXiv

Joule-Thomson expansion of the Bardeen-AdS black holes

The Joule-Thomson expansion process is studied for Bardeen-AdS black holes in the extended phase space. We first obtain the Joule-Thomson coefficient and find that its divergent point coincides with the zero point of the temperature. The inversion curves are obtained from the zero point of the Joule-Thomson coefficient, and the minimum inversion temperature and the corresponding mass are then derived. In addition, the ratio between the minimum inversion temperature and the critical temperature for Bardeen-AdS black holes is calculated. We obtain the isenthalpic curves in the $T$-$P$ graph and use the inversion curve to identify the cooling and heating regions. An interesting finding is that the black hole is always in a heating process. The same phenomenon is also obtained for charged AdS black holes.
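Once the black hole mass $M$ is identified with the enthalpy in the extended phase space, the coefficient and the inversion condition follow from standard thermodynamic identities. These are general relations; the Bardeen-specific expressions derived in the paper are not reproduced here:

```latex
\mu = \left(\frac{\partial T}{\partial P}\right)_{M}
    = \frac{1}{C_P}\left[T\left(\frac{\partial V}{\partial T}\right)_{P} - V\right],
\qquad
\mu = 0 \;\Longleftrightarrow\; T_i = V\left(\frac{\partial T}{\partial V}\right)_{P}.
```

Along an isenthalpic expansion the pressure decreases, so $\mu > 0$ corresponds to cooling and $\mu < 0$ to heating.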

preprint2019arXiv

On some applications of Gauduchon metrics

We use the existence and special properties of Gauduchon metrics to obtain several applications. The first concerns the algebro-geometric implications of the existence of a Hermitian metric with nonnegative holomorphic sectional curvature. The second shows the nonexistence of holomorphic sections of Hermitian vector bundles under certain conditions. The third gives a restriction on the $\partial\bar{\partial}$-closedness of certain real $(n-1,n-1)$-forms on compact complex manifolds.
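As background (a standard definition and theorem, not part of the paper's contribution): a Hermitian metric $\omega$ on a compact complex manifold of complex dimension $n \ge 2$ is called Gauduchon if

```latex
\partial\bar{\partial}\,\omega^{\,n-1} = 0 .
```

Gauduchon's theorem states that every conformal class of Hermitian metrics contains such a metric, unique up to a positive constant multiple. This is the existence result that the applications above build on.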

preprint2019arXiv

Outlier Detection and Data Clustering via Innovation Search

Innovation Search was originally proposed as a data clustering method in which the directions of innovation were used to compute the adjacency matrix, and it was shown that Innovation Pursuit can notably outperform self-representation-based subspace clustering methods. In this paper, we show that the directions of innovation can also be used to design a provable and strongly outlier-robust PCA method. The proposed approach, dubbed iSearch, solves the direction search optimization problem to compute an optimal direction corresponding to each data point. iSearch uses the directions of innovation to measure the innovation of the data points and identifies the outliers as the most innovative data points. Analytical performance guarantees are derived for the proposed robust PCA method under different models for the distribution of the outliers, including randomly distributed outliers, clustered outliers, and linearly dependent outliers. In addition, we study the problem of outlier detection in a union of subspaces and show that iSearch provably recovers the span of the inliers when the inliers lie in a union of subspaces. Moreover, we present theoretical studies showing that the proposed measure of innovation remains stable in the presence of noise and that the performance of iSearch is robust to noisy data. In challenging scenarios in which the outliers are close to each other or close to the span of the inliers, iSearch is shown to substantially outperform most existing methods. The presented method shows that the directions of innovation are a useful representation of the data that can be used for both data clustering and outlier detection.
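The direction search problem for a point $d_i$ is $\min_c \|D^\top c\|_1$ subject to $c^\top d_i = 1$. An outlier that lies outside the inlier span admits a direction nearly orthogonal to all inliers, so its optimal objective is small. The toy sketch below solves this problem exactly for 2-D data. On the constraint line $c = d_i/\|d_i\|^2 + t\,d_i^{\perp}$, the objective is convex and piecewise linear in $t$, so checking its breakpoints suffices. Two assumptions apply: the innovation score $1/\|D^\top c_i\|_1$ and the unit-norm normalization, either of which may differ from the paper. Higher dimensions need a linear-programming solver.

```python
import math


def _normalize(d):
    s = math.hypot(d[0], d[1])
    return (d[0] / s, d[1] / s)


def direction_search_2d(D, i):
    """Exact value of min ||D^T c||_1 s.t. c . d_i = 1, for 2-D data points D."""
    di = D[i]
    sq = di[0] ** 2 + di[1] ** 2
    p = (-di[1], di[0])  # orthogonal to d_i, so c = d_i/|d_i|^2 + t p keeps c . d_i = 1
    a = [(di[0] * d[0] + di[1] * d[1]) / sq for d in D]  # c . d_j at t = 0
    b = [p[0] * d[0] + p[1] * d[1] for d in D]            # slope of c . d_j in t

    def objective(t):
        return sum(abs(aj + t * bj) for aj, bj in zip(a, b))

    # A convex piecewise-linear function that is either constant or tends to
    # infinity in both directions attains its minimum at t = 0 or a breakpoint.
    candidates = [0.0] + [-aj / bj for aj, bj in zip(a, b) if abs(bj) > 1e-12]
    return min(objective(t) for t in candidates)


def innovation_scores(points):
    """Hypothetical innovation score 1 / ||D^T c_i||_1 per point (larger = more innovative)."""
    D = [_normalize(d) for d in points]
    return [1.0 / direction_search_2d(D, i) for i in range(len(D))]
```

With inliers on one line and a few points off it, the off-line points receive the highest scores. Ranking by score and taking the top few therefore flags them as outliers.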

preprint2018arXiv

A Tight Bound of Hard Thresholding

This paper is concerned with the hard thresholding operator, which sets all but the $k$ largest absolute elements of a vector to zero. We establish a {\em tight} bound that quantitatively characterizes the deviation of the thresholded solution from a given signal. Our theoretical result is universal in the sense that it holds for all choices of parameters, and the underlying analysis depends only on fundamental arguments in mathematical optimization. We discuss the implications for two domains. Compressed sensing: building on this key estimate, we establish the connection between the restricted isometry property (RIP) and the sparsity parameter for a large family of hard-thresholding-based algorithms, which yields an improved RIP condition, especially when the true sparsity is unknown. In essence, this suggests that many more kinds of sensing matrices, or fewer measurements, are admissible for the data acquisition procedure. Machine learning: in large-scale machine learning, a significant yet challenging problem is learning accurate sparse models efficiently. In contrast to prior work that relied on $\ell_1$-relaxation to promote sparsity, we present a novel stochastic algorithm that performs hard thresholding in each iteration, thereby ensuring parsimonious solutions. Equipped with the developed bound, we prove {\em global linear convergence} for a number of prevalent statistical models under mild assumptions, even though the problem is non-convex.
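The sketch below implements the hard thresholding operator $H_k$ together with one stochastic gradient step followed by $H_k$, the basic pattern behind stochastic hard-thresholding algorithms. The least-squares loss $\frac12(a^\top x - y)^2$ and the fixed step size are assumptions for illustration. This is a plain stochastic step, not the paper's exact algorithm.

```python
def hard_threshold(x, k):
    """H_k: keep the k entries of largest magnitude and set the rest to zero."""
    if k <= 0:
        return [0.0] * len(x)
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [xi if i in keep else 0.0 for i, xi in enumerate(x)]


def stochastic_ht_step(x, a, y, lr, k):
    """Gradient step on 0.5 * (a . x - y)^2 for one sample (a, y), then H_k."""
    r = sum(ai * xi for ai, xi in zip(a, x)) - y  # residual of this sample
    return hard_threshold([xi - lr * r * ai for xi, ai in zip(x, a)], k)
```

Because $H_k$ is applied after every step, each iterate has at most $k$ nonzero entries. The bound in the paper controls how far this projection can move the iterate away from the target signal.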