Source author record

Ping Li

Ping Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

161 works

46 topics

4 close collaborators

preprint · 2026 · arXiv

Research on Vision-Language Question Answering Models for Industrial Robots

A hierarchical cross-modal fusion model is proposed for vision-language question answering (VLQA) in industrial robotics, targeting the challenges of semantic ambiguity, complex environmental layouts, and domain-specific terminology common in modern manufacturing. The framework integrates advanced object detection, multi-scale visual encoding, syntactic parsing, and task-aware semantic attention to unite vision and language signals into a joint reasoning space. Region-based deep networks extract visual features, weighted embeddings aggregate them, and recurrent neural parsing encodes sentence structure. Through fine-grained semantic alignment driven by adaptive fusion and cross-attention mechanisms, the system can handle operational queries, instruction steps, and anomaly detection with higher reliability. Validation experiments on the IVQA and RIF benchmarks indicate improvements over existing VLQA approaches in semantic alignment, Top-1 accuracy, and robustness to ambiguous or procedural task queries. Ablation studies further quantify the impact of each architectural module, confirming the necessity of multi-level feature integration and context-driven gating for dependable industrial deployment. The technical advancements reported here provide core methodologies to improve the interpretability and operational effectiveness of industrial robots faced with diverse human-robot interaction tasks.

preprint · 2025 · arXiv

Tidally Trapped Two-pole Pulsations Discovered in a Close Binary with a Massive $β$ Cephei Star

Tidally tilted pulsators (TTPs), whose pulsation axis aligns with the binary's semi-major axis, represent a newly established class of oscillators in binary systems. While all previously known TTPs are either $δ$ Scuti or subdwarf B-type stars, their existence has remained unidentified in more massive $β$ Cephei variables. Here, we report the discovery of tidally trapped pulsations in the massive ellipsoidal variable HD 329379, based on photometry from the Transiting Exoplanet Survey Satellite (TESS). Our analysis reveals a $β$ Cephei pulsator in a 2.25-day orbit whose pulsation mode amplitude is strongly modulated with the orbital frequency. We conclude that this modulation can be explained by pulsations with significantly larger amplitude near the star's two tidal poles (apsides). We interpret this as a tidally distorted quadrupole pulsation characterized by trapped two-pole pulsations, with a pulsation axis aligned with the tidal axis. This represents the first identification of such a pulsation mode in a $β$ Cephei star, which differs from the single-sided pulsations observed in previous works, marking a rare and important discovery. Our work extends the family of TTPs beyond $δ$ Scuti and subdwarf B-type stars to include more massive $β$ Cephei variables. In particular, the two-pole pulsator HD 329379 stands out as the prototype of a new class of TTPs in massive stars. These results not only provide new insight into the interior structure and evolutionary state of massive stars but also offer a unique opportunity to study the interaction between pulsations and strong tidal distortions.

preprint · 2024 · arXiv

Origin of zigzag antiferromagnetic orders in XPS3 (X= Fe, Ni) monolayers

Recently, two monolayer magnetic materials, FePS3 and NiPS3, have been successfully fabricated. Although they share the same atomic structure, the two monolayers exhibit distinct magnetic properties. FePS3 holds an out-of-plane zigzag antiferromagnetic (AFM-ZZ) structure, while NiPS3 exhibits an in-plane AFM-ZZ structure. However, no theoretical model properly describes their magnetic ground states, owing to the lack of a full understanding of their magnetic interactions. Here, by combining first-principles calculations with a newly developed machine learning method, we construct an exact spin Hamiltonian for the two magnetic materials. Unlike previous studies, which failed to fully consider the spin-orbit coupling effect, we find that the AFM-ZZ ground state in FePS3 is stabilized by competing ferromagnetic nearest-neighbor and antiferromagnetic third-nearest-neighbor exchange interactions, combined with single-ion anisotropy. In contrast, the often-ignored nearest-neighbor biquadratic exchange is responsible for the in-plane AFM-ZZ ground state in NiPS3. We additionally calculate the spin-wave spectrum of the AFM-ZZ structure in the two monolayers based on the exact spin Hamiltonian, which can be directly verified experimentally. Our work provides a theoretical framework for the origin of the AFM-ZZ ground state in two-dimensional materials.

preprint · 2023 · arXiv

Defending Backdoor Attacks on Vision Transformer via Patch Processing

Vision Transformers (ViTs) have a radically different architecture with significantly less inductive bias than Convolutional Neural Networks. Along with the improvement in performance, the security and robustness of ViTs are also of great importance to study. In contrast to many recent works that exploit the robustness of ViTs against adversarial examples, this paper investigates a representative causative attack, i.e., backdoor. We first examine the vulnerability of ViTs against various backdoor attacks and find that ViTs are also quite vulnerable to existing attacks. However, we observe that the clean-data accuracy and backdoor attack success rate of ViTs respond distinctively to patch transformations before the positional encoding. Based on this finding, we propose an effective method for ViTs to defend against both patch-based and blending-based trigger backdoor attacks via patch processing. Performance is evaluated on several benchmark datasets, including CIFAR10, GTSRB, and TinyImageNet, and the results show that the proposed defense is very successful in mitigating backdoor attacks for ViTs. To the best of our knowledge, this paper presents the first defensive strategy that utilizes a unique characteristic of ViTs against backdoor attacks. The paper will appear in the Proceedings of the AAAI'23 Conference. This work was initially submitted in November 2021 to CVPR'22, then it was re-submitted to ECCV'22. The paper was made public in June 2022. The authors sincerely thank all the referees from the Program Committees of CVPR'22, ECCV'22, and AAAI'23.

preprint · 2023 · arXiv

Exploration in Model-based Reinforcement Learning with Randomized Reward

Model-based Reinforcement Learning (MBRL) has been widely adopted due to its sample efficiency. However, existing worst-case regret analysis typically requires optimistic planning, which is not realistic in general. In contrast, motivated by the theory, empirical studies utilize ensembles of models, which achieve state-of-the-art performance on various testing environments. This gap between theory and empirical study leads us to ask whether a randomized model ensemble guarantees optimism, and hence the optimal worst-case regret. This paper partially answers this question from the perspective of reward randomization, a scarcely explored direction of exploration in MBRL. We show that under the kernelized nonlinear regulator (KNR) model, reward randomization guarantees a partial optimism, which further yields a near-optimal worst-case regret in terms of the number of interactions. We further extend our theory to generalized function approximation and identify conditions for reward randomization to attain provably efficient exploration. Correspondingly, we propose concrete examples of efficient reward randomization. To the best of our knowledge, our analysis establishes the first worst-case regret analysis on randomized MBRL with function approximation.

preprint · 2023 · arXiv

Sharper Analysis for Minibatch Stochastic Proximal Point Methods: Stability, Smoothness, and Deviation

Stochastic proximal point (SPP) methods have gained recent attention for stochastic optimization, with strong convergence guarantees and robustness superior to classic stochastic gradient descent (SGD) methods, at little to no added computational overhead. In this article, we study a minibatch variant of SPP, namely M-SPP, for solving convex composite risk minimization problems. The core contribution is a set of novel excess risk bounds of M-SPP derived through the lens of algorithmic stability theory. In particular, under smoothness and quadratic growth conditions, we show that M-SPP with minibatch size $n$ and iteration count $T$ enjoys an in-expectation fast rate of convergence consisting of an $\mathcal{O}\left(\frac{1}{T^2}\right)$ bias decaying term and an $\mathcal{O}\left(\frac{1}{nT}\right)$ variance decaying term. In the small-$n$-large-$T$ setting, this result substantially improves the best known results of SPP-type approaches by revealing the impact of the noise level of the model on the convergence rate. In the complementary small-$T$-large-$n$ regime, we provide a two-phase extension of M-SPP to achieve comparable convergence rates. Moreover, we derive a near-tight high-probability (over the randomness of data) bound on the parameter estimation error of a sampling-without-replacement variant of M-SPP. Numerical evidence is provided to support our theoretical predictions when instantiated for Lasso and logistic regression models.
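To make the minibatch proximal point idea concrete, here is a minimal sketch, assuming the common form of the update in which each step minimizes the minibatch loss plus a quadratic penalty on the distance to the previous iterate. The function name `mspp_least_squares`, the fixed penalty `gamma`, and the 1-D least-squares loss are illustrative assumptions and are not taken from the paper, which treats general convex composite problems.

```python
import random


def mspp_least_squares(data, batch_size, num_iters, gamma, w0=0.0, seed=0):
    """Minibatch stochastic proximal point sketch for 1-D least squares.

    Assumed update (hypothetical simplification): each step solves
        w_t = argmin_w (1/n) * sum_i 0.5 * (a_i * w - b_i)**2
                       + (gamma / 2) * (w - w_prev)**2
    in closed form over a freshly sampled minibatch of size n = batch_size.
    """
    rng = random.Random(seed)
    w = w0
    for _ in range(num_iters):
        batch = rng.sample(data, batch_size)
        mean_aa = sum(a * a for a, _ in batch) / batch_size
        mean_ab = sum(a * b for a, b in batch) / batch_size
        # Zero derivative: mean_aa * w - mean_ab + gamma * (w - w_prev) = 0.
        w = (gamma * w + mean_ab) / (gamma + mean_aa)
    return w


# Noiseless toy data generated from b = 3 * a.
toy_data = [(a, 3.0 * a) for a in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)]
estimate = mspp_least_squares(toy_data, batch_size=4, num_iters=50, gamma=1.0)
```

Unlike an SGD step, the proximal step solves a small regularized subproblem exactly, which is why it stays stable for any positive `gamma` in this toy setting.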

preprint · 2022 · arXiv

$k$-Median Clustering via Metric Embedding: Towards Better Initialization with Differential Privacy

When designing clustering algorithms, the choice of initial centers is crucial for the quality of the learned clusters. In this paper, we develop a new initialization scheme, called HST initialization, for the $k$-median problem in the general metric space (e.g., discrete space induced by graphs), based on the construction of a metric embedding tree structure of the data. From the tree, we propose a novel and efficient search algorithm for good initial centers that can be used subsequently for the local search algorithm. Our proposed HST initialization can produce initial centers achieving lower errors than those from another popular initialization method, $k$-median++, with comparable efficiency. The HST initialization can also be extended to the setting of differential privacy (DP) to generate private initial centers. We show that the error from applying DP local search after our private HST initialization improves previous results on the approximation error, and approaches the lower bound within a small factor. Experiments justify the theory and demonstrate the effectiveness of our proposed method. Our approach can also be extended to the $k$-means problem.

preprint · 2022 · arXiv

A Class of Two-Timescale Stochastic EM Algorithms for Nonconvex Latent Variable Models

The Expectation-Maximization (EM) algorithm is a popular choice for learning latent variable models. Variants of the EM have been initially introduced, using incremental updates to scale to large datasets, and using Monte Carlo (MC) approximations to bypass the intractable conditional expectation of the latent data for most nonconvex models. In this paper, we propose a general class of methods called Two-Timescale EM Methods based on a two-stage approach of stochastic updates to tackle an essential nonconvex optimization task for latent variable models. We motivate the choice of a double dynamic by invoking the variance reduction virtue of each stage of the method on both sources of noise: the index sampling for the incremental update and the MC approximation. We establish finite-time and global convergence bounds for nonconvex objective functions. Numerical applications on various models such as deformable template for image analysis or nonlinear models for pharmacokinetics are also presented to illustrate our findings.

preprint · 2022 · arXiv

A Tale of Two Flows: Cooperative Learning of Langevin Flow and Normalizing Flow Toward Energy-Based Model

This paper studies the cooperative learning of two generative flow models, in which the two models are iteratively updated based on the jointly synthesized examples. The first flow model is a normalizing flow that transforms an initial simple density to a target density by applying a sequence of invertible transformations. The second flow model is a Langevin flow that runs finite steps of gradient-based MCMC toward an energy-based model. We start by proposing a generative framework that trains an energy-based model with a normalizing flow as an amortized sampler to initialize the MCMC chains of the energy-based model. In each learning iteration, we generate synthesized examples by using a normalizing flow initialization followed by a short-run Langevin flow revision toward the current energy-based model. Then we treat the synthesized examples as fair samples from the energy-based model and update the model parameters with the maximum likelihood learning gradient, while the normalizing flow directly learns from the synthesized examples by maximizing the tractable likelihood. Under the short-run non-mixing MCMC scenario, the estimation of the energy-based model is shown to follow the perturbation of maximum likelihood, and the short-run Langevin flow and the normalizing flow form a two-flow generator that we call CoopFlow. We provide an understanding of the CoopFlow algorithm by information geometry and show that it is a valid generator as it converges to a moment matching estimator. We demonstrate that the trained CoopFlow is capable of synthesizing realistic images, reconstructing images, and interpolating between images.
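The synthesis step described above (an initial sample revised by a few Langevin updates toward an energy-based model) can be sketched in one dimension. This is only an illustration under stated assumptions: a standard Gaussian draw stands in for the normalizing-flow sample, the energy is a fixed toy quadratic rather than a learned network, and the names `short_run_langevin`, `synthesize`, and `toy_grad_energy` are hypothetical. No parameter learning is shown.

```python
import random


def short_run_langevin(x, grad_energy, step_size, num_steps, rng):
    """Run a fixed, small number of Langevin updates from the starting point x."""
    for _ in range(num_steps):
        noise = rng.gauss(0.0, 1.0)
        # Gradient step on the energy plus Gaussian noise scaled by the step size.
        x = x - 0.5 * step_size ** 2 * grad_energy(x) + step_size * noise
    return x


def synthesize(num_samples, grad_energy, step_size=0.3, num_steps=100, seed=0):
    """Draw initial points, then revise each one with short-run Langevin."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        x0 = rng.gauss(0.0, 1.0)  # placeholder for a normalizing-flow sample
        samples.append(
            short_run_langevin(x0, grad_energy, step_size, num_steps, rng)
        )
    return samples


def toy_grad_energy(x):
    """Gradient of the toy energy E(x) = 0.5 * (x - 2)**2."""
    return x - 2.0


revised = synthesize(500, toy_grad_energy)
```

With this toy energy the revised samples drift from the initial distribution centered at 0 toward the low-energy region around 2, which is the role the Langevin revision plays in the framework.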

preprint · 2022 · arXiv

Best Subset Selection with Efficient Primal-Dual Algorithm

Best subset selection is considered the "gold standard" for many sparse learning problems. A variety of optimization techniques have been proposed to attack this non-convex and NP-hard problem. In this paper, we investigate the dual forms of a family of $\ell_0$-regularized problems. An efficient primal-dual method has been developed based on the primal and dual problem structures. By leveraging the dual range estimation along with the incremental strategy, our algorithm potentially reduces redundant computation and improves the solutions of best subset selection. Theoretical analysis and experiments on synthetic and real-world datasets validate the efficiency and statistical properties of the proposed solutions.

preprint · 2022 · arXiv

Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images

Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images. However, RGB-D data is not easily acquired, which limits the development of RGB-D SOD techniques. To alleviate this issue, we present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection. We first devise a depth decoupling convolutional neural network (DDCNN), which contains a depth estimation branch and a saliency detection branch. The depth estimation branch is trained with RGB-D images and then used to estimate the pseudo depth maps for all unlabeled RGB images to form the paired data. The saliency detection branch is used to fuse the RGB feature and depth feature to predict the RGB-D saliency. Then, the whole DDCNN is assigned as the backbone in a teacher-student framework for semi-supervised learning. Moreover, we also introduce a consistency loss on the intermediate attention and saliency maps for the unlabeled data, as well as a supervised depth and saliency loss for labeled data. Experimental results on seven widely-used benchmark datasets demonstrate that our DDCNN outperforms state-of-the-art methods both quantitatively and qualitatively. We also demonstrate that our semi-supervised DS-Net can further improve the performance, even when using an RGB image with the pseudo depth map.

preprint · 2022 · arXiv

Boosting the Confidence of Generalization for $L_2$-Stable Randomized Learning Algorithms

Exponential generalization bounds with near-tight rates have recently been established for uniformly stable learning algorithms. The notion of uniform stability, however, is stringent in the sense that it is invariant to the data-generating distribution. Under the weaker and distribution-dependent notions of stability such as hypothesis stability and $L_2$-stability, the literature suggests that only polynomial generalization bounds are possible in general cases. The present paper addresses this long-standing tension between these two regimes of results and makes progress towards relaxing it inside a classic framework of confidence-boosting. To this end, we first establish an in-expectation first moment generalization error bound for potentially randomized learning algorithms with $L_2$-stability, based on which we then show that a properly designed subbagging process leads to near-tight exponential generalization bounds over the randomness of both data and algorithm. We further instantiate these generic results for stochastic gradient descent (SGD) to derive improved high-probability generalization bounds for convex or non-convex optimization problems with natural time-decaying learning rates, which have not been possible to prove with the existing hypothesis stability or uniform stability based results.

preprint · 2022 · arXiv

Breaking the Linear Error Barrier in Differentially Private Graph Distance Release

Releasing all pairwise shortest path (APSP) distances between vertices on general graphs under weight Differential Privacy (DP) is known to be a challenging task. In the previous attempt of (Sealfon 2016), adding Laplace noise to each edge weight or to each output distance achieves DP with some fixed budget, but with high probability the maximal absolute error among all published pairwise distances is roughly $O(n)$, where $n$ is the number of nodes. It was shown that this error could be reduced for some special graphs, which, however, is hard for general graphs. Therefore, whether the approximation error can be reduced to sublinear in $n$ is posed as an interesting open problem. We break the linear barrier on the distance approximation error of the previous result by proposing an algorithm that releases a constructed synthetic graph privately. Computing all pairwise distances on the constructed graph only introduces $\tilde O(n^{1/2})$ error in answering all pairwise shortest path distances for a fixed privacy parameter. Our method is based on a novel graph diameter (link length) augmentation via constructing "shortcuts" for the paths. By adding a set of shortcut edges to the original graph, we show that any node pair has a shortest path with link length $\tilde O(n^{1/2})$. Then by adding noise with some positive mean to the edge weights, we show that the new graph is differentially private and can be published to answer all pairwise shortest path distances with $\tilde O(n^{1/2})$ approximation error using standard APSP computation. Additionally, we consider graphs with a small feedback vertex set number. A feedback vertex set (FVS) of a graph is a set of vertices whose removal leaves a graph without cycles, and the feedback vertex set number of a graph, $k$, is the size of a smallest feedback vertex set. We propose a DP algorithm with error rate $\tilde O(k)$.
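The baseline mentioned above, adding Laplace noise to each edge weight and then running a standard APSP computation, can be sketched as follows. This illustrates only that baseline, not the shortcut-based algorithm the paper proposes. It assumes the public-topology, private-weight model with neighboring weight functions differing by at most 1 in $\ell_1$ norm, so that Laplace noise of scale $1/\epsilon$ per edge suffices. The function names and the clamping of negative noisy weights to zero (a post-processing choice) are assumptions of this sketch.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def floyd_warshall(n, weights):
    """All-pairs shortest paths on an undirected graph given as {(u, v): w}."""
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for (u, v), w in weights.items():
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def private_distances_edge_noise(n, weights, epsilon, seed=0):
    """Perturb every edge weight with Laplace(1/epsilon) noise, then run APSP."""
    rng = random.Random(seed)
    noisy = {}
    for edge, w in weights.items():
        # Clamp at zero so the noisy graph has no negative edges (post-processing).
        noisy[edge] = max(0.0, w + laplace_noise(1.0 / epsilon, rng))
    return floyd_warshall(n, noisy)


graph = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 5.0}
released = private_distances_edge_noise(3, graph, epsilon=1.0)
```

Because a shortest path can traverse up to $n-1$ noisy edges, the per-edge noise accumulates along the path, which is the source of the roughly linear error that the abstract sets out to beat.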

preprint · 2022 · arXiv

Catoni-style Confidence Sequences under Infinite Variance

In this paper, we provide an extension of confidence sequences for settings where the variance of the data-generating distribution does not exist or is infinite. Confidence sequences furnish confidence intervals that are valid at arbitrary data-dependent stopping times, naturally having a wide range of applications. We first establish a lower bound for the width of the Catoni-style confidence sequences for the finite variance case to highlight the looseness of the existing results. Next, we derive tight Catoni-style confidence sequences for data distributions having a relaxed bounded $p^{th}$-moment, where $p \in (1,2]$, and strengthen the results for the finite variance case of $p = 2$. The derived results are shown to be better than confidence sequences obtained using the Dubins-Savage inequality.

preprint · 2022 · arXiv

CGAR: Critic Guided Action Redistribution in Reinforcement Leaning

Training a game-playing reinforcement learning agent requires multiple interactions with the environment. Ignorant random exploration may cause a waste of time and resources. It's essential to alleviate such waste. As discussed in this paper, under the settings of the off-policy actor-critic algorithms, we demonstrate that the critic can bring expected discounted rewards greater than or at least equal to those of the actor. Thus, the Q value predicted by the critic is a better signal to redistribute the action originally sampled from the policy distribution predicted by the actor. This paper introduces the novel Critic Guided Action Redistribution (CGAR) algorithm and tests it on the OpenAI MuJoCo tasks. The experimental results demonstrate that our method improves the sample efficiency and achieves state-of-the-art performance. Our code can be found at https://github.com/tairanhuang/CGAR.

preprint · 2022 · arXiv

Channel Estimation for Wideband MmWave MIMO OFDM System Exploiting Block Sparsity

In this letter, we investigate time-domain channel estimation for wideband millimeter wave (mmWave) MIMO OFDM systems. By transmitting frequency-domain pilot symbols as well as different beamforming vectors, we observe that the time-domain mmWave MIMO channels exhibit channel delay sparsity and especially block sparsity among different spatial directions. Then we propose a time-domain channel estimation exploiting block sparsity (TDCEBS) scheme, which always aims at finding the best nonzero block achieving the largest projection of the residue at each iteration. In particular, we evaluate the system performance using QuaDRiGa, which is recommended by 5G New Radio, to generate wideband mmWave MIMO channels. The effectiveness of the proposed TDCEBS scheme is verified by the simulation results, as the proposed scheme outperforms the existing schemes.

preprint · 2022 · arXiv

Communication-Efficient TeraByte-Scale Model Training Framework for Online Advertising

Click-Through Rate (CTR) prediction is a crucial component in the online advertising industry. In order to produce a personalized CTR prediction, an industry-level CTR prediction model commonly takes a high-dimensional (e.g., 100 or 1000 billion features) sparse vector (that is encoded from query keywords, user portraits, etc.) as input. As a result, the model requires Terabyte-scale parameters to embed the high-dimensional input. A hierarchical distributed GPU parameter server has been proposed to enable GPUs with limited memory to train the massive network by leveraging CPU main memory and SSDs as secondary storage. We identify two major challenges in the existing GPU training framework for massive-scale ad models and propose a collection of optimizations to tackle these challenges: (a) the GPU, CPU, and SSD rapidly communicate with each other during the training. The connections between GPUs and CPUs are non-uniform due to the hardware topology. The data communication route should be optimized according to the hardware topology; (b) GPUs in different computing nodes frequently communicate to synchronize parameters. We are required to optimize the communications so that the distributed system can become scalable. In this paper, we propose a hardware-aware training workflow that couples the hardware topology into the algorithm design. To reduce the extensive communication between computing nodes, we introduce a $k$-step model merging algorithm for the popular Adam optimizer and provide its convergence rate in non-convex optimization. To the best of our knowledge, this is the first application of a $k$-step adaptive optimization method in industrial-level CTR model training. The numerical results on real-world data confirm that the optimized system design considerably reduces the training time of the massive model, with essentially no loss in accuracy.

preprint · 2022 · arXiv

Discriminative Similarity for Data Clustering

Similarity-based clustering methods separate data into clusters according to the pairwise similarity between the data, and the pairwise similarity is crucial for their performance. In this paper, we propose Clustering by Discriminative Similarity (CDS), a novel method which learns discriminative similarity for data clustering. CDS learns an unsupervised similarity-based classifier from each data partition, and searches for the optimal partition of the data by minimizing the generalization error of the learnt classifiers associated with the data partitions. By generalization analysis via Rademacher complexity, the generalization error bound for the unsupervised similarity-based classifier is expressed as the sum of discriminative similarity between the data from different classes. It is proved that the derived discriminative similarity can also be induced by the integrated squared error bound for kernel density classification. In order to evaluate the performance of the proposed discriminative similarity, we propose a new clustering method using a kernel as the similarity function, CDS via unsupervised kernel classification (CDSK), with its effectiveness demonstrated by experimental results.

preprint · 2022 · arXiv

Distances Release with Differential Privacy in Tree and Grid Graph

Data about individuals may contain private and sensitive information. Differential privacy (DP) was proposed to address the problem of protecting the privacy of each individual while keeping useful information about a population. Sealfon (2016) introduced a private graph model in which the graph topology is assumed to be public while the weight information is assumed to be private. That model can express hidden congestion patterns in a known transportation system. In this paper, we revisit the problem of privately releasing approximate distances between all pairs of vertices in (Sealfon 2016). Our goal is to minimize the additive error, namely the difference between the released distance and the actual distance under the private setting. We propose improved solutions to that problem for several cases. For the problem of privately releasing all-pairs distances, we show that for a tree with depth $h$, we can release all-pairs distances with additive error $O(\log^{1.5} h \cdot \log^{1.5} V)$ for a fixed privacy parameter, where $V$ is the number of vertices in the tree, which improves the previous error bound $O(\log^{2.5} V)$, since the size of $h$ can be as small as $O(\log V)$. Our result implies that a $\log V$ factor is saved, and the additive error in a tree can be smaller than the error on an array/path. Additionally, for the grid graph with arbitrary edge weights, we also propose a method to release all-pairs distances with additive error $\tilde O(V^{3/4})$ for fixed privacy parameters. On the application side, many cities like Manhattan are composed of horizontal streets and vertical avenues, which can be modeled as a grid graph.

preprint · 2022 · arXiv

Fast ABC-Boost: A Unified Framework for Selecting the Base Class in Multi-Class Classification

The work in ICML'09 showed that the derivatives of the classical multi-class logistic regression loss function could be re-written in terms of a pre-chosen "base class" and applied the new derivatives in the popular boosting framework. In order to make use of the new derivatives, one must have a strategy to identify/choose the base class at each boosting iteration. The idea of "adaptive base class boost" (ABC-Boost) in ICML'09 adopted a computationally expensive "exhaustive search" strategy for the base class at each iteration. It has been well demonstrated that ABC-Boost, when integrated with trees, can achieve substantial improvements in many multi-class classification tasks. Furthermore, the work in UAI'10 derived the explicit second-order tree split gain formula which typically improved the classification accuracy considerably, compared with using only the first-order information for tree-splitting, for both multi-class and binary-class classification tasks. In this paper, we develop a unified framework for effectively selecting the base class by introducing a series of ideas to improve the computational efficiency of ABC-Boost. Our framework has parameters $(s,g,w)$. At each boosting iteration, we only search for the "$s$-worst classes" (instead of all classes) to determine the base class. We also allow a "gap" $g$ when conducting the search. That is, we only search for the base class at every $g+1$ iterations. We furthermore allow a "warm up" stage by only starting the search after $w$ boosting iterations. The parameters $s$, $g$, $w$ can be viewed as tunable parameters, and certain combinations of $(s,g,w)$ may even lead to better test accuracy than the "exhaustive search" strategy. Overall, our proposed framework provides a robust and reliable scheme for implementing ABC-Boost in practice.
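The roles of $(s,g,w)$ can be illustrated with a small scheduling sketch. It covers only the bookkeeping of when to search and which classes to consider, not the boosting or tree fitting itself. Several details are assumptions of this sketch rather than statements from the abstract: iterations are counted from 0, the first search happens exactly at iteration $w$, and "worst" is taken to mean the classes with the largest current training loss. The function names are hypothetical.

```python
def should_search(iteration, g, w):
    """Return True if a base-class search is scheduled at this iteration.

    Assumes 0-based iterations, no search during the first w warm-up
    iterations, and one search every g + 1 iterations afterwards.
    """
    if iteration < w:
        return False
    return (iteration - w) % (g + 1) == 0


def candidate_base_classes(per_class_loss, s):
    """Return the indices of the s classes with the largest loss.

    Treating "worst" as "largest training loss" is an assumption here.
    """
    order = sorted(
        range(len(per_class_loss)),
        key=lambda k: per_class_loss[k],
        reverse=True,
    )
    return order[:s]


schedule = [should_search(t, g=2, w=3) for t in range(10)]
candidates = candidate_base_classes([0.1, 0.9, 0.5, 0.7], s=2)
```

With $K$ classes, restricting the search to $s$ candidates every $g+1$ iterations cuts the number of trial base classes from $K$ per iteration to roughly $s/(g+1)$ per iteration after warm-up.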

preprint · 2022 · arXiv

Finding Control Synthesis for Kinematic Shortest Paths

This work presents the analysis of the properties of the shortest path control synthesis for the rigid body system. The systems we focus on in this work have only kinematic constraints. However, even for seemingly simple systems and constraints, the shortest paths for generic rigid body systems were only found recently, especially for 3D systems. Based on Pontryagin's Maximum Principle (PMP) and Lagrange equations, we present the necessary conditions for optimal switches, which form the control synthesis boundaries. We formally show that the shortest path for nearby configurations will have similar adjoint functions and parameters, i.e., Lagrange multipliers. We further show that the gradients of the necessary condition equation can be used to verify whether a configuration is inside a control synthesis region or on the boundary. We present a procedure to find the shortest kinematic paths and control synthesis, using the gradients of the control constraints. Given the shortest path and the corresponding control sequences, the optimal control sequence for nearby configurations can be derived if and only if they belong to the same control synthesis region. The proposed procedure can work for both 2D and 3D rigid body systems. We use a 2D Dubins vehicle system to verify the correctness of the proposed approach. More verifications and experiments will be presented in the extensions of this work.

preprint · 2022 · arXiv

From a few Accurate 2D Correspondences to 3D Point Clouds

Key points, correspondences, projection matrices, point clouds and dense clouds are the skeletons in image-based 3D reconstruction, of which point clouds play an important role in generating a realistic and natural model for a 3D reconstructed object. To achieve a good 3D reconstruction, the point clouds must be almost everywhere on the surface of the object. In this article, with the main purpose of building point clouds that cover the entire surface of the object, we propose a new feature named a geodesic feature or geo-feature. Based on the new geo-feature, if there are several (given) initial world points on the object's surface along with all accurately estimated projection matrices, some new world points on the geodesics connecting any two of these given world points will be reconstructed. Then the regions on the surface bordered by these initial world points will be covered by the point clouds. Thus, if the initial world points are around the surface, the point clouds will cover the entire surface. This article proposes a new method to estimate the world points and projection matrices from their correspondences. This method derives the closed-form and iterative solutions for the world points and projection matrices and proves that when the number of world points is less than seven and the number of images is at least five, the proposed solutions are globally optimal. We propose an algorithm named World points from their Correspondences (WPfC) to estimate the world points and projection matrices from their correspondences, and another algorithm named Creating Point Clouds (CrPC) to create the point clouds from the world points and projection matrices given by the first algorithm.

preprint2022arXiv

GCWSNet: Generalized Consistent Weighted Sampling for Scalable and Accurate Training of Neural Networks

We develop the "generalized consistent weighted sampling" (GCWS) for hashing the "powered-GMM" (pGMM) kernel (with a tuning parameter $p$). It turns out that GCWS provides a numerically stable scheme for applying power transformation on the original data, regardless of the magnitude of $p$ and the data. The power transformation is often effective for boosting the performance, in many cases considerably so. We feed the hashed data to neural networks on a variety of public classification datasets and name our method ``GCWSNet''. Our extensive experiments show that GCWSNet often improves the classification accuracy. Furthermore, it is evident from the experiments that GCWSNet converges substantially faster. In fact, GCWS often reaches a reasonable accuracy with merely (less than) one epoch of the training process. This property is much desired because many applications, such as advertisement click-through rate (CTR) prediction models, or data streams (i.e., data seen only once), often train just one epoch. Another beneficial side effect is that the computations of the first layer of the neural networks become additions instead of multiplications because the input data become binary (and highly sparse). Empirical comparisons with (normalized) random Fourier features (NRFF) are provided. We also propose to reduce the model size of GCWSNet by count-sketch and develop the theory for analyzing the impact of using count-sketch on the accuracy of GCWS. Our analysis shows that an ``8-bit'' strategy should work well in that we can always apply an 8-bit count-sketch hashing on the output of GCWS hashing without hurting the accuracy much. There are many other ways to take advantage of GCWS when training deep neural networks. For example, one can apply GCWS on the outputs of the last layer to boost the accuracy of trained deep neural networks.

preprint2022arXiv

Integrity Authentication in Tree Models

Tree models are very widely used in the practice of machine learning and data mining. In this paper, we study the problem of model integrity authentication in tree models. In general, the task of model integrity authentication is the design and implementation of mechanisms for checking/detecting whether the model deployed for the end-users has been tampered with or compromised, e.g., by malicious modifications of the model. We propose an authentication framework that enables the model builders/distributors to embed a signature in the tree model and authenticate the existence of the signature by making only a small number of black-box queries to the model. To the best of our knowledge, this is the first study of signature embedding on tree models. Our proposed method simply locates a collection of leaves and modifies their prediction values, which requires neither training/testing data nor any re-training. The experiments on a large number of public classification datasets confirm that the proposed signature embedding process has a high success rate while introducing only a minimal prediction accuracy loss.

preprint2022arXiv

Intriguing Magnetoelectric Effect in Two-dimensional Ferromagnetic/Perovskite Oxide Ferroelectric Heterostructure

Two-dimensional (2D) magnets have broad application prospects in spintronics, but how to effectively control them with a small electric field is still an open issue. Here we propose that 2D magnets can be efficiently controlled in a multiferroic heterostructure composed of a 2D magnetic material and a perovskite oxide ferroelectric (POF) whose dielectric polarization is easily flipped under a small electric field. We illustrate the feasibility of such a strategy in the bilayer CrI3/BiFeO3(001) heterostructure by using first-principles calculations. Unlike the traditional POF multiferroic heterostructures, which have strong interface interactions, we find that the interface interaction between CrI3 and BiFeO3(001) is of van der Waals type. Nevertheless, the heterostructure has particularly strong magnetoelectric coupling, where the bilayer CrI3 can be efficiently switched between ferromagnetic and antiferromagnetic types by the polarized states of BiFeO3(001). We also discover a competing effect between electron doping and the additional electric field on the interlayer exchange coupling interaction of CrI3, which is responsible for the magnetic phase transition. Our results provide a new avenue for the tuning of 2D magnets with a small electric field.

preprint2022arXiv

Intrinsic anomalous spin Hall effect

Charge-spin interconversion in magnetic materials is investigated by using first-principles calculations. In addition to the conventional spin Hall effect (SHE) that requires mutual orthogonality of the charge current, spin-flow direction, and spin polarization, the recently proposed anomalous SHE (ASHE) is confirmed in Mn2Au and WTe2. The interaction of the order parameter with conduction electrons leads to sizeable nonzero spin Berry curvatures that give rise to nonzero anomalous spin Hall conductivity (ASHC). Our calculations show that the ASHE is intrinsic and originates from the order-parameter-controlled spin-orbit interaction, which generates an extra anomalous effective field. A useful relationship among the order parameter, spin Berry curvature and ASHC is revealed. Our findings open a new avenue for arbitrary-type spin current generation and detection.

preprint2022arXiv

Layer-wise and Dimension-wise Locally Adaptive Federated Learning

In the emerging paradigm of Federated Learning (FL), a large number of clients such as mobile devices are used to train possibly high-dimensional models on their respective data. Combining (dimension-wise) adaptive gradient methods (e.g., Adam, AMSGrad) with FL has been an active direction, which has been shown to outperform traditional SGD-based FL in many cases. In this paper, we focus on the problem of training federated deep neural networks, and propose a novel FL framework which further introduces layer-wise adaptivity to the local model updates. Our framework can be applied to locally adaptive FL methods including two recent algorithms, Mime and Fed-AMS. Theoretically, we provide a convergence analysis of our layer-wise FL methods, coined Fed-LAMB and Mime-LAMB, which matches the convergence rate of state-of-the-art results in FL and exhibits linear speedup in terms of the number of workers. Experimental results on various datasets and models, under both IID and non-IID local data settings, show that both Fed-LAMB and Mime-LAMB achieve faster convergence speed and better generalization performance, compared to various recent adaptive FL methods.

preprint2022arXiv

Low-dose CT reconstruction by self-supervised learning in the projection domain

In the intention of minimizing excessive X-ray radiation administration to patients, low-dose computed tomography (LDCT) has become a distinct trend in radiology. However, while lowering the radiation dose reduces the risk to the patient, it also increases noise and artifacts, compromising image quality and clinical diagnosis. In most supervised learning methods, paired CT images are required, but such images are unlikely to be available in the clinic. We present a self-supervised learning model (Noise2Projection) that fully exploits the raw projection images to reduce noise and improve the quality of reconstructed LDCT images. Unlike existing self-supervised algorithms, the proposed method only requires noisy CT projection images and reduces noise by exploiting the correlation between nearby projection images. We trained and tested the model using clinical data and the quantitative and qualitative results suggest that our model can effectively reduce LDCT image noise while also drastically removing artifacts in LDCT images.

preprint2022arXiv

MISF: Multi-level Interactive Siamese Filtering for High-Fidelity Image Inpainting

Although achieving significant progress, existing deep generative inpainting methods are far from real-world applications due to the low generalization across different scenes. As a result, the generated images usually contain artifacts or the filled pixels differ greatly from the ground truth. Image-level predictive filtering is a widely used image restoration technique, predicting suitable kernels adaptively according to different input scenes. Inspired by this inherent advantage, we explore the possibility of addressing image inpainting as a filtering task. To this end, we first study the advantages and challenges of image-level predictive filtering for image inpainting: the method can preserve local structures and avoid artifacts but fails to fill large missing areas. Then, we propose semantic filtering by conducting filtering on the deep feature level, which fills the missing semantic information but fails to recover the details. To address the issues while adopting the respective advantages, we propose a novel filtering technique, i.e., Multilevel Interactive Siamese Filtering (MISF), which contains two branches: kernel prediction branch (KPB) and semantic & image filtering branch (SIFB). These two branches are interactively linked: SIFB provides multi-level features for KPB while KPB predicts dynamic kernels for SIFB. As a result, the final method takes the advantage of effective semantic & image-level filling for high-fidelity inpainting. We validate our method on three challenging datasets, i.e., Dunhuang, Places2, and CelebA. Our method outperforms state-of-the-art baselines on four metrics, i.e., L1, PSNR, SSIM, and LPIPS. Please try the released code and model at https://github.com/tsingqguo/misf.

preprint2022arXiv

Multi-view Geometry: Correspondences Refinement Based on Algebraic Properties

Correspondence estimation, or feature matching, is a key step in the image-based 3D reconstruction problem. In this paper, we propose two algebraic properties for correspondences. The first is a rank-deficient matrix constructed from the correspondences of at least nine key-points on two images (two-view correspondences), and the second is another rank-deficient matrix built from the correspondences of six key-points on at least five images (multi-view correspondences). To our knowledge, there are no theoretical results for multi-view correspondences prior to this paper. To obtain accurate correspondences, multi-view correspondences seem to be more useful than two-view correspondences. From these two algebraic properties, we propose a refinement algorithm for correspondences. This algorithm is a combination of correspondence refinement, outlier recognition and missing key-point recovery. Real experiments from a project reconstructing a Buddha statue show that the proposed refinement algorithm can reduce the average error of correspondence estimation from 77 pixels to 55 pixels. This drop is substantial and it validates our results.

preprint2022arXiv

NL2GDPR: Automatically Develop GDPR Compliant Android Application Features from Natural Language

The recent privacy leakage incidences and the more strict policy regulations demand a much higher standard of compliance for companies and mobile apps. However, such obligations also impose significant challenges on app developers for complying with these regulations that contain various perspectives, activities, and roles, especially for small companies and developers who are less experienced in this matter or with limited resources. To address these hurdles, we develop an automatic tool, NL2GDPR, which can generate policies from natural language descriptions from the developer while also ensuring the app's functionalities are compliant with General Data Protection Regulation (GDPR). NL2GDPR is developed by leveraging an information extraction tool, OIA (Open Information Annotation), developed by Baidu Cognitive Computing Lab. At the core, NL2GDPR is a privacy-centric information extraction model, appended with a GDPR policy finder and a policy generator. We perform a comprehensive study to grasp the challenges in extracting privacy-centric information and generating privacy policies, while exploiting optimizations for this specific task. With NL2GDPR, we can achieve 92.9%, 95.2%, and 98.4% accuracy in correctly identifying GDPR policies related to personal data storage, process, and share types, respectively. To the best of our knowledge, NL2GDPR is the first tool that allows a developer to automatically generate GDPR compliant policies, with only the need of entering the natural language for describing the app features. Note that other non-GDPR-related features might be integrated with the generated features to build a complex app.

preprint2022arXiv

Noisy $\ell^{0}$-Sparse Subspace Clustering on Dimensionality Reduced Data

Sparse subspace clustering methods with sparsity induced by $\ell^{0}$-norm, such as $\ell^{0}$-Sparse Subspace Clustering ($\ell^{0}$-SSC)~\citep{YangFJYH16-L0SSC-ijcv}, are demonstrated to be more effective than its $\ell^{1}$ counterpart such as Sparse Subspace Clustering (SSC)~\citep{ElhamifarV13}. However, the theoretical analysis of $\ell^{0}$-SSC is restricted to clean data that lie exactly in subspaces. Real data often suffer from noise and they may lie close to subspaces. In this paper, we show that an optimal solution to the optimization problem of noisy $\ell^{0}$-SSC achieves subspace detection property (SDP), a key element with which data from different subspaces are separated, under deterministic and semi-random model. Our results provide theoretical guarantee on the correctness of noisy $\ell^{0}$-SSC in terms of SDP on noisy data for the first time, which reveals the advantage of noisy $\ell^{0}$-SSC in terms of much less restrictive condition on subspace affinity. In order to improve the efficiency of noisy $\ell^{0}$-SSC, we propose Noisy-DR-$\ell^{0}$-SSC which provably recovers the subspaces on dimensionality reduced data. Noisy-DR-$\ell^{0}$-SSC first projects the data onto a lower dimensional space by random projection, then performs noisy $\ell^{0}$-SSC on the projected data for improved efficiency. Experimental results demonstrate the effectiveness of Noisy-DR-$\ell^{0}$-SSC.

preprint2022arXiv

Offline Change Detection under Contamination

In this work, we propose a non-parametric and robust change detection algorithm to detect multiple change points in time series data under contamination. The contamination model is sufficiently general, in that, the most common model used in the context of change detection -- Huber contamination model -- is a special case. Also, the contamination model is oblivious and arbitrary. The change detection algorithm is designed for the offline setting, where the objective is to detect changes when all data are received. We only make weak moment assumptions on the inliers (uncorrupted data) to handle a large class of distributions. The robust scan statistic in the algorithm is fashioned using mean estimators based on influence functions. We establish the consistency of the estimated change point indexes as the number of samples increases, and provide empirical evidence to support the consistency results.

preprint2022arXiv

On Convergence of FedProx: Local Dissimilarity Invariant Bounds, Non-smoothness and Beyond

The FedProx algorithm is a simple yet powerful distributed proximal point optimization method widely used for federated learning (FL) over heterogeneous data. Despite its popularity and remarkable success witnessed in practice, the theoretical understanding of FedProx is largely underinvestigated: the appealing convergence behavior of FedProx is so far characterized under certain non-standard and unrealistic dissimilarity assumptions on local functions, and the results are limited to smooth optimization problems. In order to remedy these deficiencies, we develop a novel local dissimilarity invariant convergence theory for FedProx and its minibatch stochastic extension through the lens of algorithmic stability. As a result, we derive several new and deeper insights into FedProx for non-convex federated optimization including: 1) convergence guarantees independent of local dissimilarity type conditions; 2) convergence guarantees for non-smooth FL problems; and 3) linear speedup with respect to the size of the minibatch and the number of sampled devices. Our theory for the first time reveals that local dissimilarity and smoothness are not required for FedProx to get favorable complexity bounds. Preliminary experimental results on a series of benchmark FL datasets are reported to demonstrate the benefit of minibatching for improving the sample efficiency of FedProx.
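As an illustration of the proximal point idea mentioned in this abstract, the following is a minimal sketch of one FedProx-style round in plain Python. It assumes the standard FedProx local subproblem, in which each client approximately minimizes its own loss plus a penalty (mu/2)·||w − w_global||², solved here by a fixed number of gradient steps and followed by plain averaging. The function names, the step size and the quadratic toy clients are illustrative assumptions and are not taken from the paper.

```python
def fedprox_local_update(w_global, grad_fn, mu=0.1, lr=0.1, steps=50):
    """Approximately minimize f_i(w) + (mu/2) * ||w - w_global||^2 by gradient descent.

    grad_fn(w) returns the gradient of the client's own loss f_i at w
    (hypothetical callback used only for this sketch).
    """
    w = list(w_global)
    for _ in range(steps):
        g = grad_fn(w)
        # Gradient of the proximal term is mu * (w - w_global).
        w = [wi - lr * (gi + mu * (wi - w0))
             for wi, gi, w0 in zip(w, g, w_global)]
    return w


def fedprox_round(w_global, client_grad_fns, mu=0.1, lr=0.1, steps=50):
    """One communication round: local proximal updates, then simple averaging."""
    local_models = [fedprox_local_update(w_global, fn, mu, lr, steps)
                    for fn in client_grad_fns]
    return [sum(coords) / len(local_models) for coords in zip(*local_models)]


if __name__ == "__main__":
    # Two toy clients with losses 0.5 * (w - c)^2 for c = 2 and c = -2.
    clients = [lambda w: [w[0] - 2.0], lambda w: [w[0] + 2.0]]
    print(fedprox_round([0.0], clients, mu=1.0, lr=0.1, steps=500))
```

A larger `mu` keeps each local model closer to the global one, which is the usual lever for coping with heterogeneous client data.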

preprint2022arXiv

On Distributed Adaptive Optimization with Gradient Compression

We study COMP-AMS, a distributed optimization framework based on gradient averaging and the adaptive AMSGrad algorithm. Gradient compression with error feedback is applied to reduce the communication cost in the gradient transmission process. Our convergence analysis of COMP-AMS shows that such a compressed gradient averaging strategy yields the same convergence rate as standard AMSGrad, and also exhibits the linear speedup effect w.r.t. the number of local workers. Compared with recently proposed protocols on distributed adaptive methods, COMP-AMS is simple and convenient. Numerical experiments are conducted to justify the theoretical findings, and demonstrate that the proposed method can achieve the same test accuracy as the full-gradient AMSGrad with substantial communication savings. With its simplicity and efficiency, COMP-AMS can serve as a useful distributed training framework for adaptive gradient methods.
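The phrase "gradient compression with error feedback" can be made concrete with a small sketch. The code below assumes a top-k sparsifier as the compressor, which is one common choice and not necessarily the one used in the paper. Each worker adds its stored residual to the fresh gradient, sends only the compressed vector, and keeps what was dropped for the next step. The class and function names are hypothetical, and the server-side AMSGrad update is omitted.

```python
def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


class ErrorFeedbackWorker:
    """Worker-side compression with an error-feedback residual (illustrative)."""

    def __init__(self, dim, k):
        self.k = k
        self.residual = [0.0] * dim  # mass dropped by earlier compressions

    def compress(self, grad):
        # Add back what was dropped previously, then compress.
        corrected = [g + e for g, e in zip(grad, self.residual)]
        sent = top_k(corrected, self.k)
        # Remember what the compressor dropped this time.
        self.residual = [c - s for c, s in zip(corrected, sent)]
        return sent


if __name__ == "__main__":
    worker = ErrorFeedbackWorker(dim=3, k=1)
    print(worker.compress([1.0, 0.5, 0.0]))
    print(worker.compress([0.0, 0.25, 0.125]))
```

In the second call the small coordinate dropped in the first step is transmitted once its accumulated value becomes the largest, which is the point of keeping the residual.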

preprint2022arXiv

On the Power-Law Hessian Spectrums in Deep Learning

It is well known that the Hessian of the deep loss landscape matters to optimization, generalization, and even robustness of deep learning. Recent works empirically discovered that the Hessian spectrum in deep learning has a two-component structure that consists of a small number of large eigenvalues and a large number of nearly-zero eigenvalues. However, the theoretical mechanism or the mathematics behind the Hessian spectrum is still largely under-explored. To the best of our knowledge, we are the first to demonstrate that the Hessian spectrums of well-trained deep neural networks exhibit simple power-law structures. Inspired by statistical physical theories and the spectral analysis of natural proteins, we provide a maximum-entropy theoretical interpretation for explaining why the power-law structure exists and suggest a spectral parallel between protein evolution and training of deep neural networks. By conducting extensive experiments, we further use the power-law spectral framework as a useful tool to explore multiple novel behaviors of deep learning.

preprint2022arXiv

One Loss for Quantization: Deep Hashing with Discrete Wasserstein Distributional Matching

Image hashing is a principled approximate nearest neighbor approach to find similar items to a query in a large collection of images. Hashing aims to learn a binary-output function that maps an image to a binary vector. For optimal retrieval performance, producing balanced hash codes with low-quantization error to bridge the gap between the learning stage's continuous relaxation and the inference stage's discrete quantization is important. However, in the existing deep supervised hashing methods, coding balance and low-quantization error are difficult to achieve and involve several losses. We argue that this is because the existing quantization approaches in these methods are heuristically constructed and not effective to achieve these objectives. This paper considers an alternative approach to learning the quantization constraints. The task of learning balanced codes with low quantization error is re-formulated as matching the learned distribution of the continuous codes to a pre-defined discrete, uniform distribution. This is equivalent to minimizing the distance between two distributions. We then propose a computationally efficient distributional distance by leveraging the discrete property of the hash functions. This distributional distance is a valid distance and enjoys lower time and sample complexities. The proposed single-loss quantization objective can be integrated into any existing supervised hashing method to improve code balance and quantization error. Experiments confirm that the proposed approach substantially improves the performance of several representative hashing methods.

preprint2022arXiv

Package for Fast ABC-Boost

This report presents the open-source package which implements the series of our boosting works in the past years. In particular, the package includes mainly three lines of techniques, among which the following two are already the standard implementations in popular boosted tree platforms: (i) The histogram-based (feature-binning) approach makes the tree implementation convenient and efficient. In Li et al (2007), a simple fixed-length adaptive binning algorithm was developed. In this report, we demonstrate that such a simple algorithm is still surprisingly effective compared to more sophisticated variants in popular tree platforms. (ii) The explicit gain formula in Li (2010) for tree splitting based on second-order derivatives of the loss function typically improves, often considerably, over the first-order methods. Although the gain formula in Li (2010) was derived for logistic regression loss, it is a generic formula for loss functions with second derivatives. For example, the open-source package also includes $L_p$ regression for $p\geq 1$. The main contribution of this package is the ABC-Boost (adaptive base class boosting) for multi-class classification. The initial work in Li (2008) derived a new set of derivatives of the classical multi-class logistic regression by specifying a "base class". The accuracy can be substantially improved if the base class is chosen properly. The major technical challenge is to design a search strategy to select the base class. The prior published works implemented an exhaustive search procedure to find the base class, which is computationally too expensive. Recently, a new report (Li and Zhao, 2022) presents a unified framework of "Fast ABC-Boost" which allows users to efficiently choose the proper search space for the base class. The package provides interfaces for Linux, Windows, Mac, MATLAB, R, and Python.
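To make item (ii) more tangible, here is a minimal sketch of a second-order split gain in plain Python. It assumes the widely used form in which a node with first derivatives g_i and second derivatives h_i scores (Σ g_i)² / Σ h_i, and the gain of a split is the left score plus the right score minus the parent score. The abstract does not spell out the formula, so this form, the function names and the toy data are assumptions for illustration only and are not the package's API.

```python
def node_score(grads, hess):
    """Second-order score of a node: (sum of gradients)^2 / (sum of Hessians)."""
    total_h = sum(hess)
    return (sum(grads) ** 2) / total_h if total_h > 0 else 0.0


def split_gain(grads, hess, split):
    """Gain of splitting samples (sorted by a feature) into [0:split] and [split:]."""
    return (node_score(grads[:split], hess[:split])
            + node_score(grads[split:], hess[split:])
            - node_score(grads, hess))


def best_split(grads, hess):
    """Return (gain, split_index) of the best split position."""
    return max((split_gain(grads, hess, s), s) for s in range(1, len(grads)))


if __name__ == "__main__":
    # Toy node: the first two samples pull one way, the last two the other.
    g = [-1.0, -1.0, 1.0, 1.0]
    h = [1.0, 1.0, 1.0, 1.0]
    print(best_split(g, h))
```

In a histogram-based implementation the same sums would be accumulated per feature bin rather than per sample, so each candidate split costs only a few additions.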

preprint2022arXiv

pGMM Kernel Regression and Comparisons with Boosted Trees

In this work, we demonstrate the advantage of the pGMM (``powered generalized min-max'') kernel in the context of (ridge) regression. In recent prior studies, the pGMM kernel has been extensively evaluated for classification tasks, for logistic regression, support vector machines, as well as deep neural networks. In this paper, we provide an experimental study on ridge regression, to compare the pGMM kernel regression with the ordinary ridge linear regression as well as the RBF kernel ridge regression. Perhaps surprisingly, even without a tuning parameter (i.e., $p=1$ for the power parameter of the pGMM kernel), the pGMM kernel already performs well. Furthermore, by tuning the parameter $p$, this (deceptively simple) pGMM kernel even performs quite comparably to boosted trees. Boosting and boosted trees are very popular in machine learning practice. For regression tasks, typically, practitioners use $L_2$ boost, i.e., for minimizing the $L_2$ loss. Sometimes for the purpose of robustness, the $L_1$ boost might be a choice. In this study, we implement $L_p$ boost for $p\geq 1$ and include it in the package of ``Fast ABC-Boost''. Perhaps also surprisingly, the best performance (in terms of $L_2$ regression loss) is often attained at $p>2$, in some cases at $p\gg 2$. This phenomenon has already been demonstrated by Li et al (UAI 2010) in the context of k-nearest neighbor classification using $L_p$ distances. In summary, the implementation of $L_p$ boost provides practitioners the additional flexibility of tuning boosting algorithms for potentially achieving better accuracy in regression applications.
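For readers unfamiliar with the kernel, the following sketch computes a pGMM similarity in plain Python. It assumes the definition commonly given in the related pGMM papers: each signed vector is first expanded into a nonnegative vector of twice the length (positive parts and magnitudes of negative parts), and the kernel is the sum of coordinate-wise minima raised to the power p divided by the sum of coordinate-wise maxima raised to the power p. The abstract itself does not state the formula, so treat this as an assumption; the function names are illustrative.

```python
def gmm_transform(u):
    """Expand a signed vector into a nonnegative vector of twice the length."""
    out = []
    for x in u:
        out.append(x if x > 0 else 0.0)    # positive part
        out.append(-x if x < 0 else 0.0)   # magnitude of the negative part
    return out


def pgmm_kernel(u, v, p=1.0):
    """pGMM similarity between two equal-length vectors (p = 1 gives plain GMM)."""
    a, b = gmm_transform(u), gmm_transform(v)
    num = sum(min(x, y) ** p for x, y in zip(a, b))
    den = sum(max(x, y) ** p for x, y in zip(a, b))
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    u, v = [2.0, -1.0], [1.0, -1.0]
    print(pgmm_kernel(u, v, p=1.0))
    print(pgmm_kernel(u, v, p=2.0))
```

A kernel ridge regression would then use the matrix of pairwise `pgmm_kernel` values in place of the usual RBF kernel matrix, with `p` tuned on validation data.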

preprint2022arXiv

Planar Turán number of disjoint union of $C_3$ and $C_4$

The planar Turán number of $H$, denoted by $ex_{\mathcal{P}}(n,H)$, is the maximum number of edges in an $H$-free planar graph. The planar Turán number of the vertex-disjoint union of $k\geq 3$ cycles is trivially $3n-6$. Lan, Shi and Song determined the exact value of $ex_{\mathcal{P}}(n,2C_3)$. We continue to study the planar Turán number of vertex-disjoint unions of cycles and obtain the exact value of $ex_{\mathcal{P}}(n,H)$, where $H$ is the vertex-disjoint union of $C_3$ and $C_4$. The extremal graphs are also characterized. We also improve the lower bound of $ex_{\mathcal{P}}(n,2C_k)$ when $k$ is sufficiently large.

preprint2022arXiv

Planning and Formulations in Pursuit-Evasion: Keep-away Games and Their Strategies

We study a pursuit-evasion problem which can be viewed as an extension of the keep-away game. In the game, pursuer(s) will attempt to intersect or catch the evader, while the evader can visit a fixed set of locations, which we denote as the anchors. These anchors may or may not be stationary. When the velocity of the pursuers is limited and considered low compared to the evaders, we are interested in whether a winning strategy exists for the pursuers or the evaders, or the game will draw. When the anchors are stationary, we show an algorithm that can help answer the above question. The primary motivation for this study is to explore the boundaries between kinematic and dynamic constraints. In particular, whether the solution of the kinematic problem can be used to speed up the search for the problems with dynamic constraints and how to discretize the problem to utilize such relations best. In this work, we show that a geometric branch-and-bound type of approach can be used to solve the stationary anchor problem, and the approach and the solution can be extended to solve the dynamic problem where the pursuers have dynamic constraints, including velocity and acceleration bounds.

preprint2022arXiv

Planning through Workspace Constraint Satisfaction and Optimization

In this work, we present a workspace-based planning framework, which though using redundant workspace key-points to represent robot states, can take advantage of the interpretable geometric information to derive good quality collision-free paths for even complex robots. Using workspace geometries, we first find collision-free piece-wise linear paths for each key point so that at the endpoints of each segment, the distance constraints are satisfied among the key points. Using these piece-wise linear paths as initial conditions, we can perform optimization steps to quickly find paths that satisfy various constraints and piece together all segments to obtain a valid path. We show that these adjusted paths are unlikely to create a collision, and the proposed approach is fast and can produce good quality results.

preprint2022arXiv

Proximity Graph Maintenance for Fast Online Nearest Neighbor Search

Approximate Nearest Neighbor (ANN) search is a fundamental technique for, e.g., the deployment of recommender systems. Recent studies have brought proximity graph-based methods to practitioners' attention -- proximity graph-based methods outperform other solutions such as quantization, hashing, and tree-based ANN algorithm families. In current recommendation systems, data point insertions, deletions, and queries are streamed into the system in an online fashion as users and items change dynamically. As proximity graphs are constructed incrementally by inserting data points as new vertices into the graph, online insertions and queries are well supported in proximity graphs. However, a data point deletion requires removing a vertex from the proximity graph index, while no proper graph index updating mechanisms are discussed in previous studies. To tackle the challenge, we propose an incremental proximity graph maintenance (IPGM) algorithm for online ANN. IPGM supports both vertex deletion and insertion on proximity graphs. Given a vertex deletion request, we thoroughly investigate solutions to update the connections of the vertex. The proposed updating scheme eliminates the performance drop in online ANN methods on proximity graphs, making the algorithm suitable for practical systems.

preprint2022arXiv

Quantum Anomalous Hall and Valley Quantum Anomalous Hall Effects in Two-Dimensional d0 Orbital XY Monolayers

We propose a new family of d0 orbital XY (X = K, Rb, Cs; Y = N, P, As, Sb, Bi) monolayers with abundant and novel topological and valley properties. The KN, RbN, RbP, RbAs, CsP, CsAs, and CsSb monolayers possess a remarkable quantum anomalous Hall effect (QAHE). The CsSb monolayer also exhibits an extraordinary valley QAHE with giant splitting. Moreover, the topological properties of XY monolayers can be efficiently tuned by in-plane strain, owing to the strain-induced band inversion between the $p_{x,y}$ and $p_z$ orbitals. Our findings suggest that the d0 orbital XY monolayers can be good candidates for promising applications in spintronics and multifunctional topological-based devices.

preprint2022arXiv

Regression with Label Permutation in Generalized Linear Model

The assumption that response and predictor belong to the same statistical unit may be violated in practice. Unbiased estimation and recovery of the true label ordering based on unlabeled data are challenging tasks and have attracted increasing attention in the recent literature. In this paper, we present a relatively complete analysis of the label permutation problem for the generalized linear model with multivariate responses. The theory is established under different scenarios: with knowledge of the true parameters, with partial knowledge of the underlying label permutation matrix, and without any knowledge. Our results remove the stringent conditions required by the current literature and are further extended to the missing observation setting, which has never been considered in the field of the label permutation problem. On the computational side, we propose two methods, a "maximum likelihood estimation" algorithm and a "two-step estimation" algorithm, to accommodate different settings. When the proportion of permuted labels is moderate, both methods work effectively. Multiple numerical experiments are provided and corroborate our theoretical findings.

preprint2022arXiv

SpaceE: Knowledge Graph Embedding by Relational Linear Transformation in the Entity Space

Translation distance based knowledge graph embedding (KGE) methods, such as TransE and RotatE, model the relation in knowledge graphs as translation or rotation in the vector space. Both translation and rotation are injective; that is, the translation or rotation of different vectors results in different results. In knowledge graphs, different entities may have a relation with the same entity; for example, many actors starred in one movie. Such a non-injective relation pattern cannot be well modeled by the translation or rotation operations in existing translation distance based KGE methods. To tackle the challenge, we propose a translation distance-based KGE method called SpaceE to model relations as linear transformations. The proposed SpaceE embeds both entities and relations in knowledge graphs as matrices and SpaceE naturally models non-injective relations with singular linear transformations. We theoretically demonstrate that SpaceE is a fully expressive model with the ability to infer multiple desired relation patterns, including symmetry, skew-symmetry, inversion, Abelian composition, and non-Abelian composition. Experimental results on link prediction datasets illustrate that SpaceE substantially outperforms many previous translation distance based knowledge graph embedding methods, especially on datasets with many non-injective relations. The code is available based on the PaddlePaddle deep learning platform https://www.paddlepaddle.org.cn.

preprint2022arXiv

Stability and Risk Bounds of Iterative Hard Thresholding

In this paper, we analyze the generalization performance of the Iterative Hard Thresholding (IHT) algorithm widely used for sparse recovery problems. The parameter estimation and sparsity recovery consistency of IHT have long been known in compressed sensing. From the perspective of statistical learning, another fundamental question is how well the IHT estimation would predict on unseen data. This paper makes progress towards answering this open question by introducing a novel sparse generalization theory for IHT under the notion of algorithmic stability. Our theory reveals that: 1) under natural conditions on the empirical risk function over $n$ samples of dimension $p$, IHT with sparsity level $k$ enjoys an $\tilde{\mathcal{O}}(n^{-1/2}\sqrt{k\log(n)\log(p)})$ rate of convergence in sparse excess risk; 2) a tighter $\tilde{\mathcal{O}}(n^{-1/2}\sqrt{\log(n)})$ bound can be established by imposing an additional iteration stability condition on a hypothetical IHT procedure invoked to the population risk; and 3) a fast rate of order $\tilde{\mathcal{O}}\left(n^{-1}k(\log^3(n)+\log(p))\right)$ can be derived for strongly convex risk functions under proper strong-signal conditions. The results have been specialized to sparse linear regression and sparse logistic regression models to demonstrate the applicability of our theory. Preliminary numerical evidence is provided to confirm our theoretical predictions.
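For readers unfamiliar with the algorithm being analyzed, here is a minimal sketch of IHT for least-squares regression: a gradient step followed by keeping only the $k$ largest-magnitude coefficients. The function name `iht`, the step size, and the tiny identity design matrix are assumptions made for illustration; the paper's analysis covers general empirical risk functions, which this sketch does not attempt to reproduce.

```python
def iht(X, y, k, step, iters):
    """Iterative hard thresholding for the loss (1/2n) * ||Xw - y||^2.

    Each iteration takes a gradient step, then zeroes all but the
    k largest-magnitude entries of w.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        # Residual r = Xw - y.
        r = [sum(X[i][j] * w[j] for j in range(p)) - y[i] for i in range(n)]
        # Gradient g = X^T r / n.
        g = [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(p)]
        w = [w[j] - step * g[j] for j in range(p)]
        # Hard thresholding: keep the k entries of largest magnitude.
        keep = set(sorted(range(p), key=lambda j: abs(w[j]), reverse=True)[:k])
        w = [w[j] if j in keep else 0.0 for j in range(p)]
    return w

# Hypothetical toy problem: identity design, two strong and two weak signals.
X = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
y = [3.0, 0.1, -2.0, -0.2]

w_hat = iht(X, y, k=2, step=4.0, iters=5)
support = [j for j, v in enumerate(w_hat) if v != 0.0]
print(w_hat, support)
```

With the identity design the iteration settles immediately on the two strong coordinates; on correlated designs the step size and number of iterations matter far more than this toy case suggests.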

preprint2022arXiv

Variational Flow Graphical Model

This paper introduces a novel approach to embed flow-based models with hierarchical structures. The proposed framework is named the Variational Flow Graphical (VFG) Model. VFGs learn the representation of high dimensional data via a message-passing scheme by integrating flow-based functions through variational inference. By leveraging the expressive power of neural networks, VFGs produce a representation of the data using a lower dimension, thus overcoming a drawback of many flow-based models, which usually require a high dimensional latent space involving many trivial variables. Aggregation nodes are introduced in the VFG models to integrate forward-backward hierarchical information via a message-passing scheme. Maximizing the evidence lower bound (ELBO) of the data likelihood aligns the forward and backward messages in each aggregation node, achieving a consistent node state. Algorithms have been developed to learn model parameters through gradient updates on the ELBO objective. The consistency of aggregation nodes enables VFGs to perform tractable inference on graphical structures. Besides representation learning and numerical inference, VFGs provide a new approach for distribution modeling on datasets with graphical latent structures. Additionally, a theoretical study shows that VFGs are universal approximators by leveraging the implicitly invertible flow-based structures. With flexible graphical structures and superior expressive power, VFGs could potentially be used to improve probabilistic inference. In the experiments, VFGs achieve improved evidence lower bound (ELBO) and likelihood values on multiple datasets.

preprint2021arXiv

Absorption of massless scalar field by furry black holes in de Rham-Gabadadze-Tolley theory

We study the absorption of a massless scalar field by two types of furry charged black holes in de Rham-Gabadadze-Tolley (dRGT) theory. The absorption cross section is calculated in the high frequency limit $\sigma_{hf}$ and the low frequency limit $\sigma_{lf}$. We show that the high frequency limit $\sigma_{hf}$ is the area of the shadow and the low frequency limit $\sigma_{lf}$ is the area of the horizon. The ratio $R_{f}=\frac{\sigma_{hf}}{\sigma_{lf}}$ is used to measure the impact of charges on the absorption spectra of furry black holes. If the black hole possesses an extra charge besides mass, the interval of values of the absorption ratio $R_{f}$ is different: $[1,\frac{27}{16}]$ for electric charge, $[0.7675,\frac{27}{16}]$ for positive charge and $[\frac{27}{16},3.1835]$ for negative charge. We also use a numerical method to compute the absorption cross section in the finite frequency domain. A series of numerical results are presented.
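The value $\frac{27}{16}$ that bounds each interval is, presumably, the uncharged (Schwarzschild) limit. As a short check under that assumption, in units $G=c=1$ with mass $M$, the horizon radius is $r_h = 2M$ and the critical impact parameter of the photon sphere is $b_c = 3\sqrt{3}\,M$:

```latex
\begin{align*}
\sigma_{lf} &= 4\pi r_h^{2} = 4\pi (2M)^{2} = 16\pi M^{2}, \\
\sigma_{hf} &= \pi b_c^{2} = \pi \left(3\sqrt{3}\,M\right)^{2} = 27\pi M^{2}, \\
R_{f} &= \frac{\sigma_{hf}}{\sigma_{lf}} = \frac{27\pi M^{2}}{16\pi M^{2}} = \frac{27}{16}.
\end{align*}
```

The other endpoints quoted in the abstract depend on the charge parameters of the dRGT solutions and are not reproduced by this calculation.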

preprint2021arXiv

High-Order Statistical Functional Expansion and Its Application To Some Nonsmooth Problems

Let $\mathbf{x}_j = \boldsymbol{\theta} + \boldsymbol{\varepsilon}_j$, $j=1,\dots,n$, be observations of an unknown parameter $\boldsymbol{\theta}$ in a Euclidean or separable Hilbert space $\mathscr{H}$, where $\boldsymbol{\varepsilon}_j$ are noises as random elements in $\mathscr{H}$ from a general distribution. We study the estimation of $f(\boldsymbol{\theta})$ for a given functional $f:\mathscr{H}\rightarrow \mathbb{R}$ based on $\mathbf{x}_j$'s. The key element of our approach is a new method which we call High-Order Degenerate Statistical Expansion. It leverages the use of classical multivariate Taylor expansion and degenerate $U$-statistic and yields an elegant explicit formula. In the univariate case of $\mathscr{H}=\mathbb{R}$, the formula expresses the error of the proposed estimator as a sum of order $k$ degenerate $U$-products of the noises with coefficient $f^{(k)}(\boldsymbol{\theta})/k!$ and an explicit remainder term in the form of the Riemann-Liouville integral as in the Taylor expansion around the true $\boldsymbol{\theta}$. For general $\mathscr{H}$, the formula expresses the estimation error in terms of the inner product of $f^{(k)}(\boldsymbol{\theta})/k!$ and the average of the tensor products of $k$ noises with distinct indices and a parallel extension of the remainder term from the univariate case. This makes the proposed method a natural statistical version of the classical Taylor expansion. The proposed estimator can be viewed as a jackknife estimator of an ideal degenerate expansion of $f(\cdot)$ around the true $\boldsymbol{\theta}$ with the degenerate $U$-product of the noises, and can be approximated by bootstrap. Thus, the jackknife, bootstrap and Taylor expansion approaches all converge to the proposed estimator. We develop risk bounds for the proposed estimator and a central limit theorem under a second moment condition (even in expansions of higher than the second order). We apply this new method to generalize several existing results with smooth and nonsmooth $f$ to universal $\boldsymbol{\varepsilon}_j$'s with only minimum moment constraints.

preprint2021arXiv

Quantization Algorithms for Random Fourier Features

The method of random projection (RP) is the standard technique in machine learning and many other areas, for dimensionality reduction, approximate near neighbor search, compressed sensing, etc. Basically, RP provides a simple and effective scheme for approximating pairwise inner products and Euclidean distances in massive data. Closely related to RP, the method of random Fourier features (RFF) has also become popular, for approximating the Gaussian kernel. RFF applies a specific nonlinear transformation on the projected data from random projections. In practice, using the (nonlinear) Gaussian kernel often leads to better performance than the linear kernel (inner product), partly due to the tuning parameter $(\gamma)$ introduced in the Gaussian kernel. Recently, there has been a surge of interest in studying properties of RFF. After random projections, quantization is an important step for efficient data storage, computation, and transmission. Quantization for RP has also been extensively studied in the literature. In this paper, we focus on developing quantization algorithms for RFF. The task is in a sense challenging due to the tuning parameter $\gamma$ in the Gaussian kernel. For example, the quantizer and the quantized data might be tied to each specific tuning parameter $\gamma$. Our contribution begins with an interesting discovery, that the marginal distribution of RFF is actually free of the Gaussian kernel parameter $\gamma$. This small finding significantly simplifies the design of the Lloyd-Max (LM) quantization scheme for RFF in that there would be only one LM quantizer for RFF (regardless of $\gamma$). We also develop a variant named LM$^2$-RFF quantizer, which in certain cases is more accurate. Experiments confirm that the proposed quantization schemes perform well.
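As background for the RFF construction this abstract builds on, the sketch below approximates a Gaussian kernel with random cosine features. It does not implement the paper's quantizers. The kernel parameterization $k(x,y)=\exp(-\gamma^2\|x-y\|^2/2)$, the feature map $\sqrt{2/D}\cos(w^\top x + b)$ with $w\sim N(0,\gamma^2 I)$ and $b\sim U[0,2\pi]$, and all variable names are assumptions chosen for this illustration; the paper may use a different normalization.

```python
import math
import random

def rff_features(x, weights, offsets):
    """Map x to D random Fourier features sqrt(2/D) * cos(w.x + b)."""
    d = len(weights)
    scale = math.sqrt(2.0 / d)
    return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(weights, offsets)]

def gaussian_kernel(x, y, gamma):
    """Exact kernel exp(-gamma^2 * ||x - y||^2 / 2) (assumed parameterization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * gamma ** 2 * sq_dist)

rng = random.Random(0)
gamma = 1.5
num_features = 20000
dim = 3

# Random projection directions w ~ N(0, gamma^2 I) and phases b ~ U[0, 2*pi].
weights = [[rng.gauss(0.0, gamma) for _ in range(dim)]
           for _ in range(num_features)]
offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]

x = [0.3, -0.2, 0.5]
y = [0.1, 0.4, 0.2]
zx = rff_features(x, weights, offsets)
zy = rff_features(y, weights, offsets)

approx = sum(a * b for a, b in zip(zx, zy))  # inner product of features
exact = gaussian_kernel(x, y, gamma)
print(approx, exact)
```

Each feature is a cosine with a uniformly random phase, so its value is bounded regardless of $\gamma$; this is the kind of feature a fixed quantizer would be applied to.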

preprint2021arXiv

Simulation on the Transparency of Electrons and Ion Back Flow for a Time Projection Chamber based on Staggered Multiple THGEMs

The ion back flow (IBF) and the electron transparency rate are two essential indicators of a time projection chamber (TPC), which affect the energy resolution and counting rate respectively. In this paper, we propose several novel strategies of staggered multi-THGEM to suppress IBF, where the geometry of the first layer THGEM is optimized to increase the electron transparency rate. In Garfield++ simulations, the electron transparency rate can exceed 90% for a single THGEM with an optimized large hole. By simulating these configurations of triple and quadruple THGEM structures, we conclude that the IBF can be reduced to the 0.2% level in an optimized configuration denoted as "ACBA". This strategy for staggered THGEM could have potential applications in future TPC projects.

preprint2020arXiv

A Two-Stage Approach to Multivariate Linear Regression with Sparsely Mismatched Data

A tacit assumption in linear regression is that (response, predictor)-pairs correspond to identical observational units. A series of recent works have studied scenarios in which this assumption is violated under terms such as "Unlabeled Sensing" and "Regression with Unknown Permutation". In this paper, we study the setup of multiple response variables and a notion of mismatches that generalizes permutations in order to allow for missing matches as well as for one-to-many matches. A two-stage method is proposed under the assumption that most pairs are correctly matched. In the first stage, the regression parameter is estimated by handling mismatches as contaminations, and subsequently the generalized permutation is estimated by a basic variant of matching. The approach is both computationally convenient and equipped with favorable statistical guarantees. Specifically, it is shown that the conditions for permutation recovery become considerably less stringent as the number of responses $m$ per observation increases. Particularly, for $m = \Omega(\log n)$, the required signal-to-noise ratio no longer depends on the sample size $n$. Numerical results on synthetic and real data are presented to support the main findings of our analysis.

preprint2020arXiv

Adaptive Randomization in Network Data

Network data have appeared frequently in recent research. For example, in comparing the effects of different types of treatment, network models have been proposed to improve the quality of estimation and hypothesis testing. In this paper, we focus on efficiently estimating the average treatment effect using an adaptive randomization procedure in networks. We work on models of causal frameworks, for which the treatment outcome of a subject is affected by its own covariate as well as those of its neighbors. Moreover, we consider the case in which, when we assign treatments to the current subject, only the subnetwork of existing subjects is revealed. New randomized procedures are proposed to minimize the mean squared error of the estimated differences between treatment effects. In network data, it is usually difficult to obtain theoretical properties because the numbers of nodes and connections increase simultaneously. Under mild assumptions, our proposed procedure is closely related to a time-varying inhomogeneous Markov chain. We then use Lyapunov functions to derive the theoretical properties of the proposed procedures. The advantages of the proposed procedures are also demonstrated by extensive simulations and experiments on real network data.

preprint2020arXiv

An Inverse-free Truncated Rayleigh-Ritz Method for Sparse Generalized Eigenvalue Problem

This paper considers the sparse generalized eigenvalue problem (SGEP), which aims to find the leading eigenvector with at most $k$ nonzero entries. SGEP naturally arises in many applications in machine learning, statistics, and scientific computing, for example, the sparse principal component analysis (SPCA), the sparse discriminant analysis (SDA), and the sparse canonical correlation analysis (SCCA). In this paper, we focus on the development of a three-stage algorithm named inverse-free truncated Rayleigh-Ritz method (IFTRR) to efficiently solve SGEP. In each iteration of IFTRR, only a small number of matrix-vector products is required. This makes IFTRR well-suited for large scale problems. Particularly, a new truncation strategy is proposed, which is able to find the support set of the leading eigenvector effectively. Theoretical results are developed to explain why IFTRR works well. Numerical simulations demonstrate the merits of IFTRR.

preprint2020arXiv

Characterization on projective submanifolds of codimensions 2 and 3

In this article we give a necessary and sufficient condition to characterize projective submanifolds in ${\mathbb P}^N$ with codimensions 2 and 3. The conditions involve the Chern classes of the manifold and a very ample line bundle on the manifold. This generalizes our earlier characterization for hypersurfaces. The higher codimensional cases are proposed as a general question.

preprint2020arXiv

Chern class inequalities on polarized manifolds and nef vector bundles

This article is concerned with Chern class and Chern number inequalities on polarized manifolds and nef vector bundles. For a polarized pair $(M,L)$ with $L$ very ample, our first main result is a family of sharp Chern class inequalities. Among them the first one is a variant of a classical result and the equality case of the second one is a characterization of hypersurfaces. The second main result is a Chern number inequality on it, which includes a reverse Miyaoka-Yau type inequality. The third main result is that the Chern numbers of a nef vector bundle over a compact Kähler manifold are bounded below by the Euler number. As an application, we classify compact Kähler manifolds with nonnegative bisectional curvature whose Chern numbers are all positive. A conjecture related to the Euler number of compact Kähler manifolds with nonpositive bisectional curvature is proposed, which can be regarded as a complex analogue to the Hopf conjecture.

preprint2020arXiv

Cluster-Adaptive Network A/B Testing: From Randomization to Estimation

A/B testing is an important decision-making tool in product development for evaluating user engagement or satisfaction from a new service, feature or product. The goal of A/B testing is to estimate the average treatment effects (ATE) of a new change, which becomes complicated when users are interacting. When the important assumption of A/B testing, the Stable Unit Treatment Value Assumption (SUTVA), which states that each individual's response is affected by their own treatment only, is not valid, the classical estimate of the ATE usually leads to a wrong conclusion. In this paper, we propose a cluster-adaptive network A/B testing procedure, which involves a sequential cluster-adaptive randomization and a cluster-adjusted estimator. The cluster-adaptive randomization is employed to minimize the cluster-level Mahalanobis distance within the two treatment groups, so that the variance of the estimate of the ATE can be reduced. In addition, the cluster-adjusted estimator is used to eliminate the bias caused by network interference, resulting in a consistent estimation for the ATE. Numerical studies suggest our cluster-adaptive network A/B testing achieves consistent estimation with higher efficiency. An empirical study is conducted based on a real world network to illustrate how our method can benefit decision-making in application.

preprint2020arXiv

Complexity results for two kinds of colored disconnections of graphs

The concept of rainbow disconnection number of graphs was introduced by Chartrand et al. in 2018. Inspired by this concept, we put forward the concepts of rainbow vertex-disconnection and proper disconnection in graphs. In this paper, we first show that it is $NP$-complete to decide whether a given edge-colored graph $G$ with maximum degree $Δ(G)=4$ is proper disconnected. Then, for a graph $G$ with $Δ(G)\leq 3$ we show that $pd(G)\leq 2$ and determine the graphs with $pd(G)=1$ and $2$, respectively. Furthermore, we show that for a general graph $G$, deciding whether $pd(G)=1$ is $NP$-complete, even if $G$ is bipartite. We also show that it is $NP$-complete to decide whether a given vertex-colored graph $G$ is rainbow vertex-disconnected, even though the graph $G$ has $Δ(G)=3$ or is bipartite.

preprint2020arXiv

Cooperative Rate-Splitting for Secrecy Sum-Rate Enhancement in Multi-antenna Broadcast Channels

In this paper, we employ the Cooperative Rate-Splitting (CRS) technique to enhance the Secrecy Sum Rate (SSR) for the Multiple Input Single Output (MISO) Broadcast Channel (BC), consisting of two legitimate users and one eavesdropper, with perfect Channel State Information (CSI) available at all nodes. For CRS based on the three-node relay channel, the transmitter splits and encodes the messages of legitimate users into common and private streams based on Rate-Splitting (RS). With the goal of maximizing SSR, the proposed CRS strategy opportunistically asks the relaying legitimate user to forward its decoded common message. During the transmission, the eavesdropper keeps wiretapping silently. To ensure secure transmission, the common message is used for a dual purpose, serving both as a desired message and as Artificial Noise (AN), without consuming extra transmit power compared to the conventional AN design. Taking into account the total power constraint and the Physical Layer (PHY) security, the precoders and time-slot allocation are jointly optimized by solving the non-convex SSR maximization problem based on the Sequential Convex Approximation (SCA) algorithm. Numerical results show that the proposed CRS secure transmission scheme outperforms existing Multi-User Linear Precoding (MU-LP) and Cooperative Non-Orthogonal Multiple Access (C-NOMA) strategies. Therefore, CRS is a promising strategy to enhance the PHY security in multi-antenna BC systems.

preprint2020arXiv

Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems

Neural networks of ads systems usually take input from multiple resources, e.g., query-ad relevance, ad features and user portraits. These inputs are encoded into one-hot or multi-hot binary features, with typically only a tiny fraction of nonzero feature values per example. Deep learning models in online advertising industries can have terabyte-scale parameters that do not fit in the GPU memory nor the CPU main memory on a computing node. For example, a sponsored online advertising system can contain more than $10^{11}$ sparse features, making the neural network a massive model with around 10 TB parameters. In this paper, we introduce a distributed GPU hierarchical parameter server for massive scale deep learning ads systems. We propose a hierarchical workflow that utilizes GPU High-Bandwidth Memory, CPU main memory and SSD as 3-layer hierarchical storage. All the neural network training computations are contained in GPUs. Extensive experiments on real-world data confirm the effectiveness and the scalability of the proposed system. A 4-node hierarchical GPU parameter server can train a model more than 2X faster than a 150-node in-memory distributed parameter server in an MPI cluster. In addition, the price-performance ratio of our proposed system is 4-9 times better than an MPI-cluster solution.

preprint2020arXiv

Distributed Primal-Dual Optimization for Online Multi-Task Learning

Conventional online multi-task learning algorithms suffer from two critical limitations: 1) heavy communication caused by delivering high-velocity sequential data to a central machine; 2) expensive runtime complexity for building task relatedness. To address these issues, in this paper we consider a setting where multiple tasks are geographically located in different places, where one task can synchronize data with others to leverage knowledge of related tasks. Specifically, we propose an adaptive primal-dual algorithm, which not only captures task-specific noise in adversarial learning but also carries out a projection-free update with runtime efficiency. Moreover, our model is well-suited to decentralized, periodically connected tasks as it allows the energy-starved or bandwidth-constrained tasks to postpone the update. Theoretical results demonstrate the convergence guarantee of our distributed algorithm with an optimal regret. Empirical results confirm that the proposed model is highly effective on various real-world datasets.

preprint2020arXiv

FedSKETCH: Communication-Efficient and Private Federated Learning via Sketching

Communication complexity and privacy are the two key challenges in Federated Learning, where the goal is to perform distributed learning across a large number of devices. In this work, we introduce the FedSKETCH and FedSKETCHGATE algorithms to address both challenges in Federated Learning jointly, where these algorithms are intended to be used for homogeneous and heterogeneous data distribution settings respectively. The key idea is to compress the accumulation of local gradients using count sketch; therefore, the server does not have access to the gradients themselves, which provides privacy. Furthermore, due to the lower dimension of the sketches used, our method is communication-efficient as well. We provide sharp convergence guarantees for the aforementioned schemes. Finally, we back up our theory with various sets of experiments.
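The count sketch primitive mentioned in this abstract can be sketched in a few lines. This is a toy version, not the FedSKETCH algorithm: the class name `CountSketch`, the table sizes, and the use of seeded lookup tables in place of hash functions are all assumptions made for illustration. It shows the two properties the abstract relies on: the sketch is much smaller than the vector, and sketches of local gradients add up to the sketch of the summed gradient.

```python
import random
import statistics

class CountSketch:
    """Toy count sketch; seeded lookup tables stand in for hash functions."""

    def __init__(self, rows, cols, dim, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        # bucket[r][i]: column that coordinate i hashes to in row r.
        self.bucket = [[rng.randrange(cols) for _ in range(dim)]
                       for _ in range(rows)]
        # sign[r][i]: random +1/-1 sign for coordinate i in row r.
        self.sign = [[rng.choice((-1, 1)) for _ in range(dim)]
                     for _ in range(rows)]

    def sketch(self, vec):
        """Compress vec into a rows x cols table."""
        table = [[0.0] * self.cols for _ in range(self.rows)]
        for i, v in enumerate(vec):
            for r in range(self.rows):
                table[r][self.bucket[r][i]] += self.sign[r][i] * v
        return table

    def estimate(self, table, i):
        """Estimate coordinate i as the median of its signed buckets."""
        return statistics.median(
            self.sign[r][i] * table[r][self.bucket[r][i]]
            for r in range(self.rows))

dim = 32
cs = CountSketch(rows=5, cols=8, dim=dim)

# Two hypothetical local gradients (integer-valued so sums are exact).
grad_a = [float(i % 5) for i in range(dim)]
grad_b = [float((3 * i) % 7) for i in range(dim)]

sk_a, sk_b = cs.sketch(grad_a), cs.sketch(grad_b)
merged = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(sk_a, sk_b)]
sk_sum = cs.sketch([a + b for a, b in zip(grad_a, grad_b)])

# A vector with a single nonzero entry is recovered exactly.
spike = [0.0] * dim
spike[7] = 10.0
spike_estimate = cs.estimate(cs.sketch(spike), 7)
print(merged == sk_sum, spike_estimate)
```

Because the sketch is linear, a server can aggregate the 5 x 8 tables instead of the 32-dimensional gradients; dense vectors are recovered only approximately, which is the compression trade-off.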

preprint2020arXiv

Generalization Bounds for High-dimensional M-estimation under Sparsity Constraint

The $\ell_0$-constrained empirical risk minimization ($\ell_0$-ERM) is a promising tool for high-dimensional statistical estimation. The existing analysis of the $\ell_0$-ERM estimator is mostly on parameter estimation and support recovery consistency. From the perspective of statistical learning, another fundamental question is how well the $\ell_0$-ERM estimator would perform on unseen samples. The answer to this question is important for understanding the learnability of such a non-convex (and also NP-hard) M-estimator but is still relatively underexplored. In this paper, we investigate this problem and develop a generalization theory for $\ell_0$-ERM. We establish, in both white-box and black-box statistical regimes, a set of generalization gap and excess risk bounds for $\ell_0$-ERM to characterize its sparse prediction and optimization capability. Our theory mainly reveals three findings: 1) tighter generalization bounds can be attained by $\ell_0$-ERM than those of $\ell_2$-ERM if the risk function is (with high probability) restricted strongly convex; 2) tighter uniform generalization bounds can be established for $\ell_0$-ERM than the conventional dense ERM; and 3) sparsity level invariant bounds can be established by imposing additional strong-signal conditions to ensure the stability of $\ell_0$-ERM. In light of these results, we further provide generalization guarantees for the Iterative Hard Thresholding (IHT) algorithm, which serves as one of the most popular greedy pursuit methods for approximately solving $\ell_0$-ERM. Numerical evidence is provided to confirm our theoretical predictions when applied to sparsity-constrained linear regression and logistic regression models.

preprint2020arXiv

IVFS: Simple and Efficient Feature Selection for High Dimensional Topology Preservation

Feature selection is an important tool to deal with high dimensional data. In the unsupervised case, many popular algorithms aim at maintaining the structure of the original data. In this paper, we propose a simple and effective feature selection algorithm to enhance sample similarity preservation through a new perspective, topology preservation, which is represented by persistence diagrams from the context of computational topology. This method is designed upon a unified feature selection framework called IVFS, which is inspired by the random subset method. The scheme is flexible and can handle cases where the problem is analytically intractable. The proposed algorithm is able to preserve well the pairwise distances, as well as topological patterns, of the full data. We demonstrate that our algorithm can provide satisfactory performance under a sharp sub-sampling rate, which supports efficient implementation of our proposed method on large scale datasets. Extensive experiments validate the effectiveness of the proposed feature selection scheme.

preprint2020arXiv

MeDaS: An open-source platform as service to help break the walls between medicine and informatics

In the past decade, deep learning (DL) has achieved unprecedented success in numerous fields including computer vision, natural language processing, and healthcare. In particular, DL is seeing increasing development in applications for advanced medical image analysis, including segmentation, classification, and more. On the one hand, there is a tremendous need to leverage the power of DL for medical image analysis, arising from a research community with medical, clinical, and informatics backgrounds that wants to jointly share expertise, knowledge, skills, and experience. On the other hand, barriers between disciplines often hamper full and efficient collaboration. To this end, we propose our novel open-source platform, i.e., MeDaS -- the MeDical open-source platform as Service. To the best of our knowledge, MeDaS is the first open-source platform providing a collaborative and interactive service that lets researchers from a medical background easily use DL-related toolkits and, at the same time, helps scientists and engineers from the information sciences understand the medical knowledge side. Based on a series of toolkits and utilities built on the idea of RINV (Rapid Implementation aNd Verification), our proposed MeDaS platform can implement pre-processing, post-processing, augmentation, visualization, and other phases needed in medical image analysis. Five tasks, covering the lung, liver, brain, chest, and pathology, are validated and demonstrated to be efficiently realisable using MeDaS.

preprint2020arXiv

Meta-CoTGAN: A Meta Cooperative Training Paradigm for Improving Adversarial Text Generation

Training generative models that can generate high-quality text with sufficient diversity is an important open problem for the Natural Language Generation (NLG) community. Recently, generative adversarial models have been applied extensively to text generation tasks, where the adversarially trained generators alleviate the exposure bias experienced by conventional maximum likelihood approaches and result in promising generation quality. However, due to the notorious defect of mode collapse in adversarial training, the adversarially trained generators face a quality-diversity trade-off, i.e., the generator models tend to severely sacrifice generation diversity to increase generation quality. In this paper, we propose a novel approach which aims to improve the performance of adversarial text generation by efficiently decelerating mode collapse during adversarial training. To this end, we introduce a cooperative training paradigm, where a language model is cooperatively trained with the generator and we utilize the language model to efficiently shape the data distribution of the generator against mode collapse. Moreover, instead of engaging the cooperative update for the generator in a principled way, we formulate a meta learning mechanism, where the cooperative update to the generator serves as a high level meta task, with the intuition of ensuring that the parameters of the generator after the adversarial update stay resistant to mode collapse. In the experiments, we demonstrate that our proposed approach can efficiently slow down the pace of mode collapse for the adversarial text generators. Overall, our proposed method is able to outperform the baseline approaches by significant margins in terms of both generation quality and diversity in the tested domains.

preprint2020arXiv

Monochromatic disconnection: Erdős-Gallai-type problems and product graphs

For an edge-colored graph $G$, we call an edge-cut $M$ of $G$ monochromatic if the edges of $M$ are colored with the same color. The graph $G$ is called monochromatically disconnected if any two distinct vertices of $G$ are separated by a monochromatic edge-cut. The monochromatic disconnection number, denoted by $md(G)$, of a connected graph $G$ is the maximum number of colors that are allowed to make $G$ monochromatically disconnected. In this paper, we solve the Erdős-Gallai-type problems for the monochromatic disconnection, and give the monochromatic disconnection numbers for four graph products, i.e., Cartesian, strong, lexicographic, and tensor products.

preprint2020arXiv

Nonnegative Hermitian vector bundles and Chern numbers

We show in this article that if a holomorphic vector bundle has a nonnegative Hermitian metric in the sense of Bott and Chern, which always exists on globally generated holomorphic vector bundles, then some special linear combinations of Chern forms are strongly nonnegative. This particularly implies that all the Chern numbers of such a holomorphic vector bundle are nonnegative and can be bounded below and above respectively by two special Chern numbers. As applications, we obtain a family of new results on compact connected complex manifolds which are homogeneous or can be holomorphically immersed into complex tori, some of which improve several classical results.

preprint2020arXiv

Rainbow monochromatic $k$-edge-connection colorings of graphs

A path in an edge-colored graph is called a monochromatic path if all edges of the path have the same color. We call $k$ paths $P_1,\cdots,P_k$ rainbow monochromatic paths if every $P_i$ is monochromatic and for any $i\neq j$, $P_i$ and $P_j$ have different colors. An edge-coloring of a graph $G$ is said to be a rainbow monochromatic $k$-edge-connection coloring (or $RMC_k$-coloring for short) if every two distinct vertices of $G$ are connected by at least $k$ rainbow monochromatic paths. We use $rmc_k(G)$ to denote the maximum number of colors that ensures $G$ has an $RMC_k$-coloring, and this number is called the rainbow monochromatic $k$-edge-connection number. We prove the existence of $RMC_k$-colorings of graphs, and then give some bounds on $rmc_k(G)$ and present some graphs whose $rmc_k(G)$ reaches the lower bound. We also obtain the threshold function for $rmc_k(G(n,p))\geq f(n)$, where $\lfloor\frac{n}{2}\rfloor> k\geq 1$.

preprint2020arXiv

Randomized Kernel Multi-view Discriminant Analysis

In many artificial intelligence and computer vision systems, the same object can be observed at distinct viewpoints or by diverse sensors, which raises the challenges for recognizing objects from different, even heterogeneous views. Multi-view discriminant analysis (MvDA) is an effective multi-view subspace learning method, which finds a discriminant common subspace by jointly learning multiple view-specific linear projections for object recognition from multiple views, in a non-pairwise way. In this paper, we propose the kernel version of multi-view discriminant analysis, called kernel multi-view discriminant analysis (KMvDA). To overcome the well-known computational bottleneck of kernel methods, we also study the performance of using random Fourier features (RFF) to approximate Gaussian kernels in KMvDA, for large scale learning. Theoretical analysis on stability of this approximation is developed. We also conduct experiments on several popular multi-view datasets to illustrate the effectiveness of our proposed strategy.

preprint2020arXiv

RGB-D SLAM in Dynamic Environments Using Point Correlations

In this paper, a simultaneous localization and mapping (SLAM) method that eliminates the influence of moving objects in dynamic environments is proposed. This method utilizes the correlation between map points to separate points that are part of the static scene and points that are part of different moving objects into different groups. A sparse graph is first created using Delaunay triangulation from all map points. In this graph, the vertices represent map points, and each edge represents the correlation between adjacent points. If the relative position between two points remains consistent over time, there is correlation between them, and they are considered to be moving together rigidly. If not, they are considered to have no correlation and to be in separate groups. After the edges between the uncorrelated points are removed during point-correlation optimization, the remaining graph separates the map points of the moving objects from the map points of the static scene. The largest group is assumed to be the group of reliable static map points. Finally, motion estimation is performed using only these points. The proposed method was implemented for RGB-D sensors, evaluated with a public RGB-D benchmark, and tested in several additional challenging environments. The experimental results demonstrate that robust and accurate performance can be achieved by the proposed SLAM method in both slightly and highly dynamic environments. Compared with other state-of-the-art methods, the proposed method can provide competitive accuracy with good real-time performance.
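The grouping step described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions: the graph edges are supplied directly (the paper builds them by Delaunay triangulation), "consistent relative position" is approximated by a nearly constant pairwise distance across frames, and the tolerance `tol` and the function name are hypothetical.

```python
import math
from collections import defaultdict

def largest_rigid_group(edges, frames, tol=0.05):
    """Return the indices of the largest group of points that move together.

    edges  : list of (i, j) index pairs between neighbouring map points
    frames : list of frames, each a list of 3D point coordinates
    tol    : assumed threshold on the change of the pairwise distance
    """
    n = len(frames[0])
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        dists = [math.dist(f[i], f[j]) for f in frames]
        # Keep the edge only if the two points stay at a consistent distance.
        if max(dists) - min(dists) <= tol:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    # The largest connected group is assumed to be the static scene.
    return max(groups.values(), key=len)
```

Motion estimation would then use only the returned points, in line with the abstract's assumption that the largest group is static.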

preprint2020arXiv

Selective Convolutional Network: An Efficient Object Detector with Ignoring Background

It is well known that attention mechanisms can effectively improve the performance of many CNNs, including object detectors. Instead of refining feature maps everywhere, we reduce the prohibitive computational complexity through a novel use of attention. We introduce an efficient object detector called Selective Convolutional Network (SCN), which selectively computes only on the locations that contain meaningful and conducive information. The basic idea is to exclude the insignificant background areas, which effectively reduces the computational cost, especially during feature extraction. To this end, we design an elaborate structure with negligible overhead to guide the network where to look next. It is end-to-end trainable and easy to embed. Without additional segmentation datasets, we explore two different training strategies: direct supervision and indirect supervision. Extensive experiments assess the performance on the PASCAL VOC2007 and MS COCO detection datasets. Results show that SSD and Pelee integrated with our method on average reduce the calculations in a range of 1/5 and 1/3 with a slight loss of accuracy, demonstrating the feasibility of SCN.

preprint2020arXiv

Solving the Robust Matrix Completion Problem via a System of Nonlinear Equations

We consider the problem of robust matrix completion, which aims to recover a low rank matrix $L_*$ and a sparse matrix $S_*$ from incomplete observations of their sum $M=L_*+S_*\in\mathbb{R}^{m\times n}$. Algorithmically, the robust matrix completion problem is transformed into a problem of solving a system of nonlinear equations, and the alternative direction method is then used to solve the nonlinear equations. In addition, the algorithm is highly parallelizable and suitable for large scale problems. Theoretically, we characterize the sufficient conditions for when $L_*$ can be approximated by a low rank approximation of the observed $M_*$. And under proper assumptions, it is shown that the algorithm converges to the true solution linearly. Numerical simulations show that the simple method works as expected and is comparable with state-of-the-art methods.

preprint2020arXiv

Structure-Feature based Graph Self-adaptive Pooling

Various methods to deal with graph data have been proposed in recent years. However, most of these methods focus on graph feature aggregation rather than graph pooling. Besides, the existing top-k selection graph pooling methods have a few problems. First, to construct the pooled graph topology, current top-k selection methods evaluate the importance of the node from a single perspective only, which is simplistic and unobjective. Second, the feature information of unselected nodes is directly lost during the pooling process, which inevitably leads to a massive loss of graph feature information. To solve these problems mentioned above, we propose a novel graph self-adaptive pooling method with the following objectives: (1) to construct a reasonable pooled graph topology, structure and feature information of the graph are considered simultaneously, which provide additional veracity and objectivity in node selection; and (2) to make the pooled nodes contain sufficiently effective graph information, node feature information is aggregated before discarding the unimportant nodes; thus, the selected nodes contain information from neighbor nodes, which can enhance the use of features of the unselected nodes. Experimental results on four different datasets demonstrate that our method is effective in graph classification and outperforms state-of-the-art graph pooling methods.

preprint2020arXiv

The Benefits of Diversity: Permutation Recovery in Unlabeled Sensing from Multiple Measurement Vectors

In "Unlabeled Sensing", one observes a set of linear measurements of an underlying signal with incomplete or missing information about their ordering, which can be modeled in terms of an unknown permutation. Previous work on the case of a single noisy measurement vector has exposed two main challenges: 1) a high requirement concerning the \emph{signal-to-noise ratio} (SNR), i.e., approximately of the order of $n^{5}$, and 2) a massive computational burden in light of NP-hardness in general. In this paper, we study the case of \emph{multiple} noisy measurement vectors (MMVs) resulting from a \emph{common} permutation and investigate to what extent the number of MMVs $m$ facilitates permutation recovery by "borrowing strength". The above two challenges have at least partially been resolved within our work. First, we show that a large stable rank of the signal significantly reduces the required SNR, which can drop from a polynomial in $n$ for $m = 1$ to a constant for $m = Ω(\log n)$, where $m$ denotes the number of MMVs and $n$ denotes the number of measurements per MV. This bound is shown to be sharp and is associated with a phase transition phenomenon. Second, we propose a computational scheme for recovering the unknown permutation in practice. For the "oracle case" with the known signal, the maximum likelihood (ML) estimator reduces to a linear assignment problem whose global optimum can be obtained efficiently. For the case in which both the signal and permutation are unknown, the problem is reformulated as a bi-convex optimization problem with an auxiliary variable, which can be solved by the Alternating Direction Method of Multipliers (ADMM). Numerical experiments based on the proposed computational scheme confirm the tightness of our theoretical analysis.
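To make the "oracle case" concrete, the sketch below recovers a row permutation when the noiseless measurements are known. It assumes Gaussian noise, so that the ML estimate minimizes the total squared distance between matched rows, and it solves the resulting linear assignment problem by brute force over all permutations. That brute-force search is a stand-in chosen only to keep the example dependency-free and is feasible only for tiny $n$; an efficient solver such as the Hungarian algorithm would be used in practice. All names are hypothetical.

```python
from itertools import permutations

def recover_permutation(Y, Z):
    """Return p such that row i of Y is matched to row p[i] of Z.

    Y : observed rows (permuted and noisy), one row per measurement
    Z : known noiseless rows in their original order (oracle case)
    """
    n = len(Y)
    # Assignment cost: squared Euclidean distance between row i of Y and row j of Z.
    cost = [[sum((a - b) ** 2 for a, b in zip(Y[i], Z[j])) for j in range(n)]
            for i in range(n)]
    # Brute-force linear assignment (illustration only, O(n!) time).
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best)
```

Each row here holds the $m$ values observed for one measurement index, so adding more measurement vectors lengthens the rows and makes them easier to tell apart.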

preprint2020arXiv

The rainbow vertex-disconnection in graphs

Let $G$ be a nontrivial connected and vertex-colored graph. A subset $X$ of the vertex set of $G$ is called rainbow if any two vertices in $X$ have distinct colors. The graph $G$ is called \emph{rainbow vertex-disconnected} if for any two vertices $x$ and $y$ of $G$, there exists a vertex subset $S$ of $G$ such that when $x$ and $y$ are nonadjacent, $S$ is rainbow and $x$ and $y$ belong to different components of $G-S$; whereas when $x$ and $y$ are adjacent, $S+x$ or $S+y$ is rainbow and $x$ and $y$ belong to different components of $(G-xy)-S$. For a connected graph $G$, the \emph{rainbow vertex-disconnection number} of $G$, denoted by $rvd(G)$, is the minimum number of colors that are needed to make $G$ rainbow vertex-disconnected. In this paper, we characterize all graphs of order $n$ with rainbow vertex-disconnection number $k$ for $k\in\{1,2,n\}$, and determine the rainbow vertex-disconnection numbers of some special graphs. Moreover, we study the extremal problems on the number of edges of a connected graph $G$ with order $n$ and $rvd(G)=k$ for given integers $k$ and $n$ with $1\leq k\leq n$.

preprint2020arXiv

Two-dimensional transition metal oxides Mn2O3 realized quantum anomalous Hall effect

The quantum anomalous Hall effect is an intriguing topologically nontrivial phase arising from spontaneous magnetization and spin-orbit coupling. However, the tremendously harsh requirements for realizing the quantum anomalous Hall effect in magnetic topological insulators of Cr- or V-doped (Bi,Sb)2Te3 films hinder its practical applications. Here, we use first-principles calculations to predict that the three Mn2O3 structure is an intrinsic ferromagnetic Chern insulator. Remarkably, a quantum anomalous Hall phase of Chern number C = -2 is found, and there are two corresponding gapless chiral edge states appearing inside the bulk gap. More interestingly, only a small tensile strain is needed to induce the phase transition from the Cmm2 and C222 phases to the P6/mmm phase. Meanwhile, a topological quantum phase transition between a quantum anomalous Hall phase and a trivial insulating phase can be realized. The combination of these novel properties renders the two-dimensional ferromagnet a promising platform for high-efficiency electronic and spintronic applications.

preprint2020arXiv

Understanding and Detecting Convergence for Stochastic Gradient Descent with Momentum

Convergence detection of iterative stochastic optimization methods is of great practical interest. This paper considers stochastic gradient descent (SGD) with a constant learning rate and momentum. We show that there exists a transient phase in which iterates move towards a region of interest, and a stationary phase in which iterates remain bounded in that region around a minimum point. We construct a statistical diagnostic test for convergence to the stationary phase using the inner product between successive gradients and demonstrate that the proposed diagnostic works well. We theoretically and empirically characterize how momentum can affect the test statistic of the diagnostic, and how the test statistic captures a relatively sparse signal within the gradients in convergence. Finally, we demonstrate an application to automatically tune the learning rate by reducing it each time stationarity is detected, and show the procedure is robust to mis-specified initial rates.
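A minimal sketch of a diagnostic built on inner products of successive gradients is given below. It follows the classical Pflug-style rule of flagging stationarity once the running sum of inner products turns negative. That rule, the `burn_in` parameter, and the function name are assumptions for illustration; the paper's exact test statistic and its treatment of momentum may differ.

```python
def detect_stationarity(gradients, burn_in=0):
    """Return the first iteration at which the running sum of inner products
    of successive stochastic gradients is negative, or None if it never is.

    In the transient phase successive gradients tend to point the same way
    (positive inner products); near a minimum they tend to oppose each other.
    """
    running = 0.0
    prev = None
    for t, g in enumerate(gradients):
        if prev is not None:
            running += sum(a * b for a, b in zip(prev, g))
            if t > burn_in and running < 0:
                return t
        prev = g
    return None
```

A learning-rate schedule in the spirit of the abstract would reduce the step size each time the function reports a detection and then restart the running sum.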

preprint2020arXiv

Upper bounds for the $MD$-numbers and characterization of extremal graphs

For an edge-colored graph $G$, we call an edge-cut $M$ of $G$ monochromatic if the edges of $M$ are colored with the same color. The graph $G$ is called monochromatic disconnected if any two distinct vertices of $G$ are separated by a monochromatic edge-cut. For a connected graph $G$, the monochromatic disconnection number (or $MD$-number for short) of $G$, denoted by $md(G)$, is the maximum number of colors that are allowed in order to make $G$ monochromatic disconnected. Graphs with diameter one are complete graphs, and so their $MD$-numbers are $1$. For graphs with diameter at least 3, we can construct $2$-connected graphs such that their $MD$-numbers can be arbitrarily large; whereas for graphs $G$ with diameter two, we show that if $G$ is a $2$-connected graph then $md(G)\leq 2$, and if $G$ has a cut-vertex then $md(G)$ is equal to the number of blocks of $G$. So, we will focus on studying $2$-connected graphs with diameter two, and give two upper bounds on their $MD$-numbers depending on their connectivity and independence numbers, respectively. We also characterize the $\left\lfloor\frac{n}{2}\right\rfloor$-connected graphs (with large connectivity) whose $MD$-numbers are $2$ and the $2$-connected graphs (with small connectivity) whose $MD$-numbers achieve the upper bound $\left\lfloor\frac{n}{2}\right\rfloor.$ For graphs with connectivity less than $\frac n 2$, we show that if the connectivity of a graph is linear in its order $n$, then its $MD$-number is upper bounded by a constant, and this leads us to conjecture that for a $k$-connected graph $G$, $md(G)\leq \left\lfloor\frac{n}{k}\right\rfloor$.

preprint2019arXiv

Fully spin-polarized quadratic non-Dirac bands realized quantum anomalous Hall effect

The quantum anomalous Hall effect is an intriguing quantum state which exhibits chiral edge states in the absence of a magnetic field. While the search for quantum anomalous Hall insulators is still active, researchers mainly search for systems containing magnetic atoms. Here, based on first-principles density functional theory, we predict a new family of Chern insulators with fully spin-polarized quadratic $p_{x,y}$ non-Dirac bands in the alkaline earth metal BaX (X = Si, Ge, Sn) system. We show that the BaX monolayer has a half-metallic ferromagnetic ground state. The ferromagnetism mainly originates from the p orbitals of the Si, Ge and Sn atoms. The 2D BaSn monolayer exhibits a large magnetocrystalline anisotropy energy of 12.20 meV/cell and a nontrivial band gap of 159.10 meV. Interestingly, both the spin polarization of the chiral edge currents and the sign of the Chern number can be tuned by doping. Furthermore, a 4 % compressive strain can drive a structural phase transition, but the nontrivial topological properties remain preserved in the 2D BaX systems. Our findings not only extend the novel concepts but also provide fascinating opportunities for the experimental realization of the quantum anomalous Hall effect.

preprint2019arXiv

Joule-Thomson expansion of the Bardeen-AdS black holes

The Joule-Thomson expansion process is studied for Bardeen-AdS black holes in the extended phase space. First, we obtain the Joule-Thomson coefficient and find that its divergent point coincides with the zero point of the temperature. The inversion curves are also obtained from the zero point of the Joule-Thomson coefficient. Then the minimum inversion temperature and the corresponding mass are obtained. In addition, the ratio between the minimum inversion temperature and the critical temperature for Bardeen-AdS black holes is also calculated. We obtain the isenthalpic curves in the $T$-$P$ graph and demonstrate the cooling-heating regions using the inversion curve. An interesting phenomenon we find is that the black hole is always in the heating process. The same phenomenon can also be obtained for charged AdS black holes.

preprint2019arXiv

On some applications of Gauduchon metrics

We apply the existence and special properties of Gauduchon metrics to give several applications. The first one is concerned with the implications of algebro-geometric nature under the existence of a Hermitian metric with nonnegative holomorphic sectional curvature. The second one is to show the non-existence of holomorphic sections on Hermitian vector bundles under certain conditions. The third one is to give a restriction on the $\partial\bar{\partial}$-closedness of some real $(n-1,n-1)$-forms on compact complex manifolds.

preprint2019arXiv

Outlier Detection and Data Clustering via Innovation Search

The idea of Innovation Search was proposed as a data clustering method in which the directions of innovation were utilized to compute the adjacency matrix, and it was shown that Innovation Pursuit can notably outperform the self-representation based subspace clustering methods. In this paper, we present a new discovery that the directions of innovation can be used to design a provable and strongly robust (to outliers) PCA method. The proposed approach, dubbed iSearch, uses the direction search optimization problem to compute an optimal direction corresponding to each data point. iSearch utilizes the directions of innovation to measure the innovation of the data points, and it identifies the outliers as the most innovative data points. Analytical performance guarantees are derived for the proposed robust PCA method under different models for the distribution of the outliers, including randomly distributed outliers, clustered outliers, and linearly dependent outliers. In addition, we study the problem of outlier detection in a union of subspaces, and it is shown that iSearch provably recovers the span of the inliers when the inliers lie in a union of subspaces. Moreover, we present theoretical studies which show that the proposed measure of innovation remains stable in the presence of noise and that the performance of iSearch is robust to noisy data. In the challenging scenarios in which the outliers are close to each other or close to the span of the inliers, iSearch is shown to remarkably outperform most of the existing methods. The presented method shows that the directions of innovation are a useful representation of the data which can be used to perform both data clustering and outlier detection.

preprint2018arXiv

A Tight Bound of Hard Thresholding

This paper is concerned with the hard thresholding operator which sets all but the $k$ largest absolute elements of a vector to zero. We establish a {\em tight} bound to quantitatively characterize the deviation of the thresholded solution from a given signal. Our theoretical result is universal in the sense that it holds for all choices of parameters, and the underlying analysis depends only on fundamental arguments in mathematical optimization. We discuss the implications for two domains: Compressed Sensing. On account of the crucial estimate, we bridge the connection between the restricted isometry property (RIP) and the sparsity parameter for a vast volume of hard thresholding based algorithms, which renders an improvement on the RIP condition especially when the true sparsity is unknown. This suggests that in essence, many more kinds of sensing matrices or fewer measurements are admissible for the data acquisition procedure. Machine Learning. In terms of large-scale machine learning, a significant yet challenging problem is learning accurate sparse models in an efficient manner. In stark contrast to prior work that attempted the $\ell_1$-relaxation for promoting sparsity, we present a novel stochastic algorithm which performs hard thresholding in each iteration, hence ensuring such parsimonious solutions. Equipped with the developed bound, we prove the {\em global linear convergence} for a number of prevalent statistical models under mild assumptions, even though the problem turns out to be non-convex.
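The hard thresholding operator defined in the first sentence of this abstract can be written in a few lines. The sketch below is a plain illustration of that definition; the function name and the tie-breaking rule (earlier index wins) are assumptions, and it does not implement the stochastic algorithm of the paper.

```python
def hard_threshold(x, k):
    """Keep the k entries of x with the largest absolute value; zero the rest."""
    if k <= 0:
        return [0.0] * len(x)
    # Indices ordered by decreasing magnitude; the stable sort breaks ties by position.
    order = sorted(range(len(x)), key=lambda i: -abs(x[i]))
    keep = set(order[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]
```

For example, `hard_threshold([0.5, -3.0, 2.0, 0.1], 2)` keeps the entries -3.0 and 2.0 and sets the other two to zero.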

preprint2016arXiv

2-Bit Random Projections, NonLinear Estimators, and Approximate Near Neighbor Search

The method of random projections has become a standard tool for machine learning, data mining, and search with massive data at Web scale. The effective use of random projections requires efficient coding schemes for quantizing (real-valued) projected data into integers. In this paper, we focus on a simple 2-bit coding scheme. In particular, we develop accurate nonlinear estimators of data similarity based on the 2-bit strategy. This work will have important practical applications. For example, in the task of near neighbor search, a crucial step (often called re-ranking) is to compute or estimate data similarities once a set of candidate data points have been identified by hash table techniques. This re-ranking step can take advantage of the proposed coding scheme and estimator. As a related task, in this paper, we also study a simple uniform quantization scheme for the purpose of building hash tables with projected data. Our analysis shows that typically only a small number of bits are needed. For example, when the target similarity level is high, 2 or 3 bits might be sufficient. When the target similarity level is not so high, it is preferable to use only 1 or 2 bits. Therefore, a 2-bit scheme appears to be overall a good choice for the task of sublinear time approximate near neighbor search via hash tables. Combining these results, we conclude that 2-bit random projections should be recommended for approximate near neighbor search and similarity estimation. Extensive experimental results are provided.
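To illustrate what quantizing projected data to 2 bits can look like, the sketch below maps each Gaussian random projection to one of four codes using the regions $(-\infty,-w)$, $[-w,0)$, $[0,w)$, $[w,\infty)$. The region layout, the bin width `w = 0.75`, and all function names are assumptions made for illustration; the nonlinear similarity estimators developed in the paper are not reproduced here.

```python
import random

def code_2bit(value, w=0.75):
    """Map a real-valued projection to a 2-bit code in {0, 1, 2, 3}."""
    if value < -w:
        return 0
    if value < 0.0:
        return 1
    if value < w:
        return 2
    return 3

def project_and_code(x, n_projections, w=0.75, seed=0):
    """Project x onto Gaussian random directions and quantize each projection.

    The same seed must be used for every data vector so that all vectors
    share the same random projection matrix.
    """
    rng = random.Random(seed)
    codes = []
    for _ in range(n_projections):
        direction = [rng.gauss(0.0, 1.0) for _ in x]
        codes.append(code_2bit(sum(r * v for r, v in zip(direction, x)), w))
    return codes
```

Two vectors coded with the same seed can then be compared code by code; the fraction of matching codes is one crude similarity signal, whereas the paper derives more accurate estimators.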

preprint2016arXiv

A Comparison Study of Nonlinear Kernels

In this paper, we compare 5 different nonlinear kernels: min-max, RBF, fRBF (folded RBF), acos, and acos-$χ^2$, on a wide range of publicly available datasets. The proposed fRBF kernel performs very similarly to the RBF kernel. Both RBF and fRBF kernels require an important tuning parameter ($γ$). Interestingly, for a significant portion of the datasets, the min-max kernel outperforms the best-tuned RBF/fRBF kernels. The acos kernel and acos-$χ^2$ kernel also perform well in general and in some datasets achieve the best accuracies. One crucial issue with the use of nonlinear kernels is the excessive computational and memory cost. These days, one increasingly popular strategy is to linearize the kernels through various randomization algorithms. In our study, the randomization method for the min-max kernel demonstrates excellent performance compared to the randomization methods for other types of nonlinear kernels, measured in terms of the number of nonzero terms in the transformed dataset. Our study provides evidence for supporting the use of the min-max kernel and the corresponding randomized linearization method (i.e., the so-called "0-bit CWS"). Furthermore, the results motivate at least two directions for future research: (i) To develop new (and linearizable) nonlinear kernels for better accuracies; and (ii) To develop better linearization algorithms for improving the current linearization methods for the RBF kernel, the acos kernel, and the acos-$χ^2$ kernel. One attempt is to combine the min-max kernel with the acos kernel or the acos-$χ^2$ kernel. The advantages of these two new and tuning-free nonlinear kernels are demonstrated via our extensive experiments.
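For reference, the min-max kernel on nonnegative vectors is the ratio of the sum of coordinate-wise minima to the sum of coordinate-wise maxima. The sketch below computes it directly; the function name and the convention of returning 0 for two all-zero vectors are assumptions, and the randomized linearization (0-bit CWS) is not shown.

```python
def min_max_kernel(x, y):
    """Min-max kernel: sum_i min(x_i, y_i) / sum_i max(x_i, y_i), for x, y >= 0."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("the min-max kernel is defined for nonnegative data")
    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    # Convention (assumed): two all-zero vectors have similarity 0.
    return numerator / denominator if denominator > 0 else 0.0
```

The kernel has no tuning parameter, which is the contrast the abstract draws with the RBF and fRBF kernels and their parameter $γ$.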

preprint2016arXiv

Analytical expression for a class of spherically symmetric solutions in Lorentz breaking massive gravity

We present a detailed study of the spherically symmetric solutions in Lorentz breaking massive gravity. There is an undetermined function $\mathcal{F}(X, w_1, w_2, w_3)$ in the action of Stückelberg fields $S_ϕ=Λ^4\int{d^4x\sqrt{-g}\mathcal{F}}$, which should be resolved through physical means. In general relativity, the spherically symmetric solution to the Einstein equation is a benchmark, and its massive deformation also plays a crucial role in Lorentz breaking massive gravity. $\mathcal{F}$ will satisfy the constraint equation $T_0^1=0$ from the spherically symmetric Einstein tensor $G_0^1=0$, if we maintain that any reasonable physical theory should possess the spherically symmetric solutions. The Stückelberg field $ϕ^i$ is taken as a 'hedgehog' configuration $ϕ^i=ϕ(r)x^i/r$, whose stability is guaranteed by the topological one. Under this ansatz, $T_0^1=0$ is reduced to $d\mathcal{F}=0$. The functions $\mathcal{F}$ for $d\mathcal{F}=0$ form a commutative ring $R^{\mathcal{F}}$. We obtain a general expression of the solution to the functional differential equation with spherical symmetry if $\mathcal{F}\in R^{\mathcal{F}}$. If $\mathcal{F}\in R^{\mathcal{F}}$ and $\partial\mathcal{F}/\partial X=0$, the functions $\mathcal{F}$ form a subring $S^{\mathcal{F}}\subset R^{\mathcal{F}}$. We show that the metric is Schwarzschild, AdS or dS if $\mathcal{F}\in S^{\mathcal{F}}$. When $\mathcal{F}\in R^{\mathcal{F}}$ but $\mathcal{F}\notin S^{\mathcal{F}}$, we will obtain some new metric solutions. Using the general formula and the basic property of the function ring $R^{\mathcal{F}}$, we give some analytical examples and their phenomenological applications. Furthermore, we also discuss the stability of the gravitational field by the analysis of the Komar integral and the results of QNMs.

preprint2016arXiv

Binary and Multi-Bit Coding for Stable Random Projections

We develop efficient binary (i.e., 1-bit) and multi-bit coding schemes for estimating the scale parameter of $α$-stable distributions. The work is motivated by the recent work on one scan 1-bit compressed sensing (sparse signal recovery) using $α$-stable random projections, which requires estimating the scale parameter at the bit level. Our technique can be naturally applied to data stream computations for estimating the $α$-th frequency moment. In fact, the method applies to the general scale family of distributions, not limited to $α$-stable distributions. Due to the heavy-tailed nature of $α$-stable distributions, using traditional estimators will potentially need many bits to store each measurement in order to ensure sufficient accuracy. Interestingly, our paper demonstrates that using a simple closed-form estimator with merely 1-bit information does not result in a significant loss of accuracy if the parameter is chosen appropriately. For example, when $α=0+$, 1, and 2, the coefficients of the optimal estimation variances using full (i.e., infinite-bit) information are 1, 2, and 2, respectively. With the 1-bit scheme and appropriately chosen parameters, the corresponding variance coefficients are 1.544, $π^2/4$, and 3.066, respectively. Theoretical tail bounds are also provided. Using 2 or more bits per measurement reduces the estimation variance and, importantly, stabilizes the estimate so that the variance is not sensitive to parameters. With look-up tables, the computational cost is minimal.

preprint2016arXiv

Bismuthylene Monolayer: A Promising Quantum Spin Hall Insulator with Large Band Gaps

By means of first-principles calculations, we predict a new 2D QSH insulator in the porous allotrope of the bismuth monolayer, bismuthylene, its dynamical stability being confirmed by phonon spectrum and molecular dynamics simulations. The analyses of electronic structures reveal that it is a native QSH state with a gap as large as 0.29 eV at the Γ point, which is larger than those of buckled (0.2 eV) and flattened (0.2 eV) bismuth, Bi4Br4 (0.18 eV), and stanene (0.1 eV); it is also more stable energetically than these systems. Interestingly, bismuthylene has tunable band gaps and nontrivial band topology under strains within -6 - 5 % and electric fields up to 0.8 eV/Å. Furthermore, a tight-binding model is constructed to explain the low-energy physics behind the band topology induced by spin-orbit coupling. We also propose a quantum well formed by sandwiching bismuthylene between two BN sheets and reveal that this structure remains topologically nontrivial with a sizeable band gap. These findings on the QSH effect of bismuthylene provide a viable platform for a new generation of dissipationless electronic and spintronic devices.

preprint2016arXiv

Black hole solutions in de Rham-Gabadadze-Tolley massive gravity

We present a detailed study of the static spherically symmetric solutions in de Rham-Gabadadze-Tolley (dRGT) theory. Since the diffeomorphism invariance can be restored by introducing the Stückelberg fields $ϕ^a$, there is a new invariant $I^{ab}=g^{μν}\partial_μϕ^a\partial_νϕ^b$ in massive gravity, which adds to the ones usually encountered in general relativity (GR). In the unitary gauge $ϕ^a=x^μδ_μ^a$, any inverse metric $g^{μν}$ that has a divergence, including the coordinate singularity in GR, would exhibit a singularity in the invariant $I^{ab}$. Therefore, there is no conventional Schwarzschild metric if we choose the unitary gauge. In this paper, we obtain a self-consistent static spherically symmetric ansatz in the nonunitary gauge. Under this ansatz, we find that there are seven solutions including the Schwarzschild solution, the Reissner-Nordström solution and five other solutions. These solutions may possess an event horizon depending upon the physical parameters (Schwarzschild radius $r_s$, scalar charge $S$ and/or electric charge $Q$). If these solutions possess an event horizon, we show that the singularity of $I^{ab}$ is absent at the horizon. Therefore, these solutions may become candidates for black holes in dRGT.

preprint2016arXiv

Constrained Low-Rank Learning Using Least Squares-Based Regularization

Low-rank learning has attracted much attention recently due to its efficacy in a rich variety of real-world tasks, e.g., subspace segmentation and image categorization. Most low-rank methods are incapable of capturing low-dimensional subspace for supervised learning tasks, e.g., classification and regression. This paper aims to learn both the discriminant low-rank representation (LRR) and the robust projecting subspace in a supervised manner. To achieve this goal, we cast the problem into a constrained rank minimization framework by adopting the least squares regularization. Naturally, the data label structure tends to resemble that of the corresponding low-dimensional representation, which is derived from the robust subspace projection of clean data by low-rank learning. Moreover, the low-dimensional representation of original data can be paired with some informative structure by imposing an appropriate constraint, e.g., Laplacian regularizer. Therefore, we propose a novel constrained LRR method. The objective function is formulated as a constrained nuclear norm minimization problem, which can be solved by the inexact augmented Lagrange multiplier algorithm. Extensive experiments on image classification, human pose estimation, and robust face recovery have confirmed the superiority of our method.

preprint2016arXiv

Generalized Intersection Kernel

Following the very recent line of work on the ``generalized min-max'' (GMM) kernel, this study proposes the ``generalized intersection'' (GInt) kernel and the related ``normalized generalized min-max'' (NGMM) kernel. In computer vision, the (histogram) intersection kernel has been popular, and the GInt kernel generalizes it to data which can have both negative and positive entries. Through an extensive empirical classification study on 40 datasets from the UCI repository, we are able to show that this (tuning-free) GInt kernel performs fairly well. The empirical results also demonstrate that the NGMM kernel typically outperforms the GInt kernel. Interestingly, the NGMM kernel has another interpretation --- it is the ``asymmetrically transformed'' version of the GInt kernel, based on the idea of ``asymmetric hashing''. Just like the GMM kernel, the NGMM kernel can be efficiently linearized through (e.g.,) generalized consistent weighted sampling (GCWS), as empirically validated in our study. Owing to the discrete nature of hashed values, it also provides a scheme for approximate near neighbor search.

preprint2016arXiv

Large-gap quantum spin Hall state in functionalized dumbbell stanene

Two-dimensional dumbbell (DB) stanene has been proposed as a promising candidate material for realizing the quantum spin Hall effect (QSHE) by Tang et al. [P. Tang, P. Chen, W. Cao, H. Huang, S. Cahangirov, L. Xian, Y. Xu, S. C. Zhang, W. Duan, A. Rubio. Phys. Rev. B, 90, 121408 (2014)]. However, the small bulk gap limits its possible applications at room temperature. Based on first-principles calculations, we predict that its band gap can be enhanced to 148 meV under methyl-functionalization, and can be further tuned by applying lattice strain. The QSHE is confirmed by the s-px,y band inversion, the topological invariant Z2 = 1, and helical gapless edge states within the bulk band gap. Notably, the characteristic properties of the edge states, such as the large Fermi velocity and Dirac cone, can be modulated by edge modification. The effects of substrates on the topological properties are explored for growth on various substrates, such as SiC, h-BN, and Bi2Te3 sheets. These findings provide significant guidance for future fabrication and realistic applications of QSHE based on stanene in spintronics.

preprint2016arXiv

Linear signal recovery from $b$-bit-quantized linear measurements: precise analysis of the trade-off between bit depth and number of measurements

We consider the problem of recovering a high-dimensional structured signal from independent Gaussian linear measurements each of which is quantized to $b$ bits. Our interest is in linear approaches to signal recovery, where "linear" means that non-linearity resulting from quantization is ignored and the observations are treated as if they arose from a linear measurement model. Specifically, the focus is on a generalization of a method for one-bit observations due to Plan and Vershynin [IEEE Trans. Inform. Theory, 59 (2013), 482-494]. At the heart of the present paper is a precise characterization of the optimal trade-off between the number of measurements $m$ and the bit depth per measurement $b$ given a total budget of $B = m \cdot b$ bits when the goal is to minimize the $\ell_2$-error in estimating the signal. It turns out that the choice $b = 1$ is optimal for estimating the unit vector (direction) corresponding to the signal for any level of additive Gaussian noise before quantization as well as for a specific model of adversarial noise, while the choice $b = 2$ is optimal for estimating the direction and the norm (scale) of the signal. Moreover, Lloyd-Max quantization is shown to be an optimal quantization scheme w.r.t. $\ell_2$-estimation error. Our analysis is corroborated by numerical experiments showing nearly perfect agreement with our theoretical predictions. The paper is complemented by an empirical comparison to alternative methods of signal recovery taking the non-linearity resulting from quantization into account. The results of that comparison point to a regime change depending on the noise level: in a low-noise setting, linear signal recovery falls short of more sophisticated competitors while being competitive in moderate- and high-noise settings.
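As a rough illustration of what "treating quantized observations as if they were linear" can look like, the sketch below handles only the one-bit case ($b = 1$) for an unstructured signal. It averages $y_i a_i$ over the measurements and normalizes the result. The dimensions, the random seed, and the function name `linear_direction_estimate` are illustrative assumptions, not taken from the paper, and no structural constraint (such as sparsity) is imposed.

```python
import numpy as np

def linear_direction_estimate(A, y):
    """Ignore the quantizer: average y_i * a_i over measurements, then normalize."""
    x_hat = A.T @ y / A.shape[0]
    return x_hat / np.linalg.norm(x_hat)

rng = np.random.default_rng(0)
n, m = 50, 5000                      # signal dimension and measurement count (assumed values)
x = rng.standard_normal(n)
x /= np.linalg.norm(x)               # unit-norm target direction
A = rng.standard_normal((m, n))      # independent Gaussian measurement vectors
y = np.sign(A @ x)                   # b = 1: keep only the sign of each measurement
x_hat = linear_direction_estimate(A, y)
cosine = float(x @ x_hat)            # alignment between true and estimated direction
print(f"cosine similarity: {cosine:.3f}")
```

Only the direction is estimated here, since sign measurements carry no information about the norm of the signal.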

preprint2016arXiv

Methods for Sparse and Low-Rank Recovery under Simplex Constraints

The de-facto standard approach of promoting sparsity by means of $\ell_1$-regularization becomes ineffective in the presence of simplex constraints, i.e., the target is known to have non-negative entries summing up to a given constant. The situation is analogous for the use of nuclear norm regularization for low-rank recovery of Hermitian positive semidefinite matrices with given trace. In the present paper, we discuss several strategies to deal with this situation, from simple to more complex. As a starting point, we consider empirical risk minimization (ERM). It follows from existing theory that ERM enjoys better theoretical properties w.r.t. prediction and $\ell_2$-estimation error than $\ell_1$-regularization. In light of this, we argue that ERM combined with a subsequent sparsification step like thresholding is superior to the heuristic of using $\ell_1$-regularization after dropping the sum constraint and subsequent normalization. At the next level, we show that any sparsity-promoting regularizer under simplex constraints cannot be convex. A novel sparsity-promoting regularization scheme based on the inverse or negative of the squared $\ell_2$-norm is proposed, which avoids shortcomings of various alternative methods from the literature. Our approach naturally extends to Hermitian positive semidefinite matrices with given trace. Numerical studies concerning compressed sensing, sparse mixture density estimation, portfolio optimization and quantum state tomography are used to illustrate the key points of the paper.
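A small numerical check may make the first claim concrete. On the simplex every feasible point has the same $\ell_1$-norm, so the penalty cannot tell sparse from dense solutions, whereas the negative squared $\ell_2$-norm is smallest at the sparsest points. The two vectors below are illustrative assumptions chosen for this sketch.

```python
import numpy as np

sparse = np.array([1.0, 0.0, 0.0, 0.0])      # a vertex of the unit simplex
dense = np.array([0.25, 0.25, 0.25, 0.25])   # the center of the unit simplex

# The l1-norm equals the sum of entries for non-negative vectors, hence is constant.
l1_sparse = np.abs(sparse).sum()
l1_dense = np.abs(dense).sum()

# The negative squared l2-norm is lower (more favorable) at the sparse vertex.
neg_sq_l2_sparse = -np.sum(sparse ** 2)
neg_sq_l2_dense = -np.sum(dense ** 2)

print(l1_sparse, l1_dense, neg_sq_l2_sparse, neg_sq_l2_dense)
```

The negative squared $\ell_2$-norm is concave, which is in line with the statement that a sparsity-promoting regularizer under simplex constraints cannot be convex.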

preprint2016arXiv

Nystrom Method for Approximating the GMM Kernel

The GMM (generalized min-max) kernel was recently proposed (Li, 2016) as a measure of data similarity and was demonstrated to be effective in machine learning tasks. In order to use the GMM kernel for large-scale datasets, the prior work resorted to (generalized) consistent weighted sampling (GCWS) to convert the GMM kernel to a linear kernel. We call this approach ``GMM-GCWS''. In the machine learning literature, there is a popular algorithm which we call ``RBF-RFF''. That is, one can use ``random Fourier features'' (RFF) to convert the ``radial basis function'' (RBF) kernel to a linear kernel. It was empirically shown in (Li, 2016) that RBF-RFF typically requires substantially more samples than GMM-GCWS in order to achieve comparable accuracies. The Nystrom method is a general tool for computing nonlinear kernels, which again converts nonlinear kernels into linear kernels. We apply the Nystrom method for approximating the GMM kernel, a strategy which we name ``GMM-NYS''. In this study, our extensive experiments on a set of fairly large datasets confirm that GMM-NYS is also a strong competitor of RBF-RFF.
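The following is a minimal sketch of the general idea, not the authors' implementation. It computes the GMM kernel by splitting each coordinate into its positive and negative parts and taking the ratio of summed minima to summed maxima, then builds Nystrom features from a set of landmark points. The landmark selection (uniform sampling), the eigenvalue threshold `tol`, the data sizes and all function names are assumptions made for illustration.

```python
import numpy as np

def gmm_transform(X):
    """Map each row to non-negative data: positive parts, then magnitudes of negative parts."""
    return np.hstack([np.maximum(X, 0.0), np.maximum(-X, 0.0)])

def gmm_kernel(X, Y):
    """Pairwise GMM kernel: sum of coordinate-wise minima over sum of maxima."""
    Xt, Yt = gmm_transform(X), gmm_transform(Y)
    mins = np.minimum(Xt[:, None, :], Yt[None, :, :]).sum(axis=2)
    maxs = np.maximum(Xt[:, None, :], Yt[None, :, :]).sum(axis=2)
    return mins / maxs

def nystrom_features(X, landmarks, tol=1e-10):
    """Features Z such that Z @ Z.T approximates the kernel matrix of X."""
    K_mm = gmm_kernel(landmarks, landmarks)
    K_nm = gmm_kernel(X, landmarks)
    evals, evecs = np.linalg.eigh(K_mm)
    keep = evals > tol                       # drop numerically zero directions
    return K_nm @ evecs[:, keep] / np.sqrt(evals[keep])

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))           # synthetic data (assumed)
landmarks = X[rng.choice(200, size=50, replace=False)]
Z = nystrom_features(X, landmarks)
mean_abs_error = np.abs(Z @ Z.T - gmm_kernel(X, X)).mean()
print(f"mean absolute kernel error: {mean_abs_error:.4f}")
```

The rows of `Z` can be passed to any linear learner, which is the sense in which the Nystrom method converts a nonlinear kernel into a linear one.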

preprint2016arXiv

On $\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$-$(1+u)$-additive constacyclic

In this paper, we study $\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$-$(1+u)$-additive constacyclic codes of arbitrary length. Firstly, we study the algebraic structure of this family of codes and a set of generator polynomials for this family as a $(\mathbb{Z}_{2}+u\mathbb{Z}_{2})[x]$-submodule of the ring $R_{α,β}$. Secondly, we give the minimal generating sets of this family of codes, determine the relationship between the generators of a $\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$-$(1+u)$-additive constacyclic code and those of its dual, and give the parameters in terms of the degrees of the generator polynomials of the code. Lastly, we also study $\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$-$(1+u)$-additive constacyclic codes in terms of their Gray images.

preprint2016arXiv

Online Optimization for Large-Scale Max-Norm Regularization

The max-norm regularizer has been extensively studied in the last decade as it promotes an effective low-rank estimation for the underlying data. However, such max-norm regularized problems are typically formulated and solved in a batch manner, which prevents them from processing big data under a limited memory budget. In this paper, hence, we propose an online algorithm that is scalable to the large-scale setting. Particularly, we consider the matrix decomposition problem as an example, although a simple variant of the algorithm and analysis can be adapted to other important problems such as matrix completion. The crucial technique in our implementation is to reformulate the max-norm as an equivalent matrix factorization form, where the factors consist of a (possibly overcomplete) basis component and a coefficient component. In this way, we may maintain the basis component in memory and optimize over it and the coefficients for each sample alternately. Since the memory footprint of the basis component is independent of the sample size, our algorithm is appealing when manipulating a large collection of samples. We prove that the sequence of the solutions (i.e., the basis component) produced by our algorithm converges to a stationary point of the expected loss function asymptotically. Numerical study demonstrates encouraging results for the efficacy and robustness of our algorithm compared to the widely used nuclear norm solvers.

preprint2016arXiv

Relational Multi-Manifold Co-Clustering

Co-clustering aims at grouping the samples (e.g., documents, users) and the features (e.g., words, ratings) simultaneously. It employs the dual relation and the bilateral information between the samples and features. In many real-world applications, data usually reside on a submanifold of the ambient Euclidean space, but it is nontrivial to estimate the intrinsic manifold of the data space in a principled way. In this study, we focus on improving the co-clustering performance via manifold ensemble learning, which is able to maximally approximate the intrinsic manifolds of both the sample and feature spaces. To achieve this, we develop a novel co-clustering algorithm called Relational Multi-manifold Co-clustering (RMC) based on symmetric nonnegative matrix tri-factorization, which decomposes the relational data matrix into three submatrices. This method considers the inter-type relationship revealed by the relational data matrix, and also the intra-type information reflected by the affinity matrices encoded on the sample and feature data distributions. Specifically, we assume the intrinsic manifold of the sample or feature space lies in a convex hull of some pre-defined candidate manifolds. We want to learn a convex combination of them to maximally approach the desired intrinsic manifold. To optimize the objective function, multiplicative rules are utilized to update the submatrices alternately. Besides, both the entropic mirror descent algorithm and the coordinate descent algorithm are exploited to learn the manifold coefficient vector. Extensive experiments on documents, images and gene expression data sets have demonstrated the superiority of the proposed algorithm compared to other well-established methods.

preprint2016arXiv

Robust Room-Temperature Quantum Spin Hall Effect in Methyl-functionalized InBi honeycomb film

Two-dimensional (2D) group-III-V honeycomb films have attracted significant interest for their potential application in the fields of quantum computing and nanoelectronics. Searching for 2D III-V films with high structural stability and a large gap is crucial for the realization of dissipationless transport edge states using the quantum spin Hall (QSH) effect. Based on first-principles calculations, we predict that the methyl-functionalized InBi monolayer (InBiCH3) has no dynamic instability, and hosts a QSH state with a band gap as large as 0.29 eV, exhibiting an interesting electronic behavior viable for room-temperature applications. The topological characteristic is confirmed by s-px,y band inversion, the topological invariant Z2 number, and the time-reversal symmetry protected helical edge states. Noticeably, the QSH states are tunable and robust against mechanical strain, electric field and different levels of methyl coverage. We also find that InBiCH3 supported on an h-BN substrate maintains a nontrivial QSH state, which harbors the edge states lying within the band gap of the substrate. These findings demonstrate that the methyl-functionalized III-V films may be a good QSH platform for device design and fabrication in spintronics.

preprint2016arXiv

Some results of linear codes over the ring $\mathbb{Z}_4+u\mathbb{Z}_4+v\mathbb{Z}_4+uv\mathbb{Z}_4$

In this paper, we mainly study the theory of linear codes over the ring $R =\mathbb{Z}_4+u\mathbb{Z}_4+v\mathbb{Z}_4+uv\mathbb{Z}_4$. By the Chinese Remainder Theorem, $R$ is isomorphic to the direct sum of four rings $\mathbb{Z}_4$. We define a Gray map $Φ$ from $R^{n}$ to $\mathbb{Z}_4^{4n}$, which is a distance-preserving map. The Gray image of a cyclic code over $R^{n}$ is a linear code over $\mathbb{Z}_4$. Furthermore, we study the MacWilliams identities of linear codes over $R$ and give the generator polynomials of cyclic codes over $R$. Finally, we discuss some properties of MDS codes over $R$.

preprint2016arXiv

Theory of the GMM Kernel

We develop some theoretical results for a robust similarity measure named "generalized min-max" (GMM). This similarity has direct applications in machine learning as a positive definite kernel and can be efficiently computed via probabilistic hashing. Owing to the discrete nature, the hashed values can also be used for efficient near neighbor search. We prove the theoretical limit of GMM and the consistency result, assuming that the data follow an elliptical distribution, which is a very general family of distributions and includes the multivariate $t$-distribution as a special case. The consistency result holds as long as the data have bounded first moment (an assumption which essentially holds for datasets commonly encountered in practice). Furthermore, we establish the asymptotic normality of GMM. Compared to the "cosine" similarity which is routinely adopted in current practice in statistics and machine learning, the consistency of GMM requires much weaker conditions. Interestingly, when the data follow the $t$-distribution with $ν$ degrees of freedom, GMM typically provides a better measure of similarity than "cosine" roughly when $ν<8$ (which is already very close to normal). These theoretical results will help explain the recent success of GMM in learning tasks.

preprint2016arXiv

Two-Dimensional Group-IV Chalcogenide Si2Te2 film: A New Quantum Spin Hall Insulator with Sizable Band Gap

The quantum spin Hall (QSH) effect is promising for achieving dissipationless transport devices due to the robust gapless states inside the insulating bulk gap. Here, by using first-principles calculations, we discover the group-IV chalcogenide Si2Te2 film to be a 2D QSH insulator with a fundamental band gap of 0.29 eV, which is tunable under external strain. This nontrivial topological phase stems from band inversion between the Si-px,y and Te-px,y orbitals, demonstrated by a single pair of topologically protected helical edge states with the Dirac point located in the bulk gap. Notably, the characteristic properties of edge states, such as the Fermi velocity and edge shape, can be tuned by edge modifications. Additionally, the h-BN semiconductor is an ideal substrate for experimental realization of the 2D Si2Te2 film, without destroying its nontrivial topology. Our work opens a new route for designing topological spintronic devices based on 2D silicon-based films.

preprint2016arXiv

Vaidya solution and its generalization in de Rham-Gabadadze-Tolley massive gravity

We present a detailed study of the Vaidya solution and its generalization in de Rham-Gabadadze-Tolley (dRGT) theory. Since the diffeomorphism invariance can be restored with the Stückelberg fields $ϕ^a$ introduced, there is a new invariant $I^{ab}=g^{μν}\partial_μϕ^a\partial_νϕ^b$ in the massive gravity, which adds to the ones usually encountered in general relativity. There is no conventional Vaidya solution if we choose unitary gauge. In this paper, we obtain three types of self-consistent ansatz with some nonunitary gauge, and find accordingly the Vaidya, generalized Vaidya and furry Vaidya solutions. As by-products, we obtain a series of furry black holes. The Vaidya solution and its generalization in dRGT massive gravity describe black holes with a variable horizon.

preprint2015arXiv

A new structure of two-dimensional allotropes of group V elements

The elemental two-dimensional (2D) materials such as graphene, silicene, germanene, and black phosphorus have attracted considerable attention due to their fascinating physical properties. Structurally they possess honeycomb or distorted honeycomb lattices, which are composed of six-atom rings. Here we find a new structure of 2D allotropes of group V elements composed of eight-atom rings, which we name the octagonal tiling (OT) structure. First-principles calculations indicate that these allotropes are dynamically stable and are also thermally stable at temperatures up to 600 K. These allotropes are semiconductors with band gaps ranging from 0.3 to 2.0 eV, thus they are potentially useful in near- and mid-infrared optoelectronic devices. OT-Bi is also a 2D topological insulator (TI) with a band gap of 0.33 eV, which is the largest among the reported elemental 2D TIs, and this gap can be increased further by applying compressive strains.

preprint2015arXiv

Cauchy-Schwarz-type inequalities on Kähler manifolds-II

We establish in this note some Cauchy-Schwarz-type inequalities on compact Kähler manifolds, which generalize the classical Khovanskii-Teissier inequalities to higher-dimensional cases. Our proof makes full use of the mixed Hodge-Riemann bilinear relations due to Dinh and Nguyên. A proportionality problem related to our main result is also proposed.

preprint2015arXiv

Functionalized Thallium Antimony Films as Excellent Candidates for Large-Gap Quantum Spin Hall Insulator

Group III-V films are of great importance for their potential application in spintronics and quantum computing. The search for two-dimensional III-V films with a nontrivial large gap is crucial for the realization of dissipationless transport edge channels using quantum spin Hall (QSH) effects. Here we use first-principles calculations to predict a class of large-gap QSH insulators in functionalized TlSb monolayers (TlSbX2; X = H, F, Cl, Br, I), with sizable bulk gaps of 0.22-0.40 eV. The QSH state is identified by the Z2 topological invariant together with helical edge states induced by spin-orbit coupling (SOC). Noticeably, the inverted band gap in the nontrivial states can be effectively tuned by the electric field and strain. Additionally, these films on a BN substrate also maintain a nontrivial QSH state, which harbors a Dirac cone lying within the band gap. These findings may shed new light on the future design and fabrication of QSH insulators based on two-dimensional honeycomb lattices in spintronics.

preprint2015arXiv

Min-Max Kernels

The min-max kernel is a generalization of the popular resemblance kernel (which is designed for binary data). In this paper, we demonstrate, through an extensive classification study using kernel machines, that the min-max kernel often provides an effective measure of similarity for nonnegative data. As the min-max kernel is nonlinear and might be difficult to use for industrial applications with massive data, we show that the min-max kernel can be linearized via hashing techniques. This allows practitioners to apply the min-max kernel to large-scale applications using well-matured linear algorithms such as linear SVM or logistic regression. The previous remarkable work on consistent weighted sampling (CWS) produces samples in the form of ($i^*, t^*$) where the $i^*$ records the location (and in fact also the weights) information analogous to the samples produced by classical minwise hashing on binary data. Because the $t^*$ is theoretically unbounded, it was not immediately clear how to effectively implement CWS for building large-scale linear classifiers. In this paper, we provide a simple solution by discarding $t^*$ (which we refer to as the "0-bit" scheme). Via an extensive empirical study, we show that this 0-bit scheme does not lose essential information. We then apply the "0-bit" CWS for building linear classifiers to approximate min-max kernel classifiers, as extensively validated on a wide range of publicly available classification datasets. We expect this work will generate interest among data mining practitioners who would like to efficiently utilize the nonlinear information of non-binary and nonnegative data.
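To illustrate the sampling scheme described above, here is a hedged sketch of consistent weighted sampling in the form commonly attributed to Ioffe (2010), where each hash uses Gamma(2, 1) variables `r`, `c` and a uniform variable `beta` per coordinate. The abstract does not spell out these steps, so the concrete recipe, the synthetic data and all names below are assumptions. The sketch compares the full $(i^*, t^*)$ collision rate and the "0-bit" rate (matching on $i^*$ only) against the exact min-max kernel.

```python
import numpy as np

def min_max_kernel(u, v):
    """Exact min-max kernel for non-negative vectors."""
    return np.minimum(u, v).sum() / np.maximum(u, v).sum()

def cws_samples(u, r, c, beta):
    """One (i*, t*) sample per row of r, c, beta; zero entries of u are never selected."""
    idx = np.flatnonzero(u > 0)
    t = np.floor(np.log(u[idx]) / r[:, idx] + beta[:, idx])
    # log a_i = log c_i - r_i * (t_i - beta_i) - r_i; the sample is the argmin over i
    log_a = np.log(c[:, idx]) - r[:, idx] * (t - beta[:, idx] + 1.0)
    j = np.argmin(log_a, axis=1)
    return idx[j], t[np.arange(j.size), j]

rng = np.random.default_rng(0)
D, k = 20, 4000                                  # dimension and number of hashes (assumed)
u = rng.lognormal(size=D)
v = u * rng.lognormal(sigma=0.5, size=D)         # a perturbed copy of u
u[:3] = 0.0                                      # include some zero entries
r = rng.gamma(2.0, 1.0, size=(k, D))             # randomness shared by both vectors
c = rng.gamma(2.0, 1.0, size=(k, D))
beta = rng.uniform(size=(k, D))

iu, tu = cws_samples(u, r, c, beta)
iv, tv = cws_samples(v, r, c, beta)
full_estimate = np.mean((iu == iv) & (tu == tv)) # uses both i* and t*
zero_bit_estimate = np.mean(iu == iv)            # "0-bit": t* is discarded
exact = min_max_kernel(u, v)
print(exact, full_estimate, zero_bit_estimate)
```

The 0-bit rate can never fall below the full collision rate, because a match on $(i^*, t^*)$ implies a match on $i^*$. How close the two are on real data is the empirical question the abstract addresses.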

preprint2015arXiv

New approach for fabrication germanene with Dirac electrons preserved: A first principle study

How to obtain germanene with Dirac electrons preserved is still an open challenge. Here we report a sandwich-dehydrogenation approach, i.e., fabricating germanene by dehydrogenating germanane in a sandwiched structure. The dehydrogenation can occur spontaneously in the sandwiched structure, which overcomes the problem of amorphization in the heating dehydrogenation approach. The obtained germanene preserves the Dirac electronic properties very well. Moreover, the Fermi velocity of germanene can be efficiently manipulated by controlling the interlayer spacing between germanane and the sandwiching surfaces. Our results provide a guideline for the fabrication of perfect two-dimensional materials.

preprint2015arXiv

One Scan 1-Bit Compressed Sensing

Based on $α$-stable random projections with small $α$, we develop a simple algorithm for compressed sensing (sparse signal recovery) by utilizing only the signs (i.e., 1-bit) of the measurements. Using only 1-bit information of the measurements results in substantial cost reduction in collection, storage, communication, and decoding for compressed sensing. The proposed algorithm is efficient in that the decoding procedure requires only one scan of the coordinates. Our analysis can precisely show that, for a $K$-sparse signal of length $N$, $12.3K\log N/δ$ measurements (where $δ$ is the confidence) would be sufficient for recovering the support and the signs of the signal. While the method is very robust against typical measurement noises, we also provide the analysis of the scheme under random flipping of the signs of the measurements. Compared to the well-known work on 1-bit marginal regression (which can also be viewed as a one-scan method), the proposed algorithm requires orders of magnitude fewer measurements. Compared to 1-bit Iterative Hard Thresholding (IHT) (which is not a one-scan algorithm), our method is still significantly more accurate. Furthermore, the proposed method is reasonably robust against random sign flipping while IHT is known to be very sensitive to this type of noise.

preprint2015arXiv

Recovery of Sparse Signals Using Multiple Orthogonal Least Squares

We study the problem of recovering sparse signals from compressed linear measurements. This problem, often referred to as sparse recovery or sparse reconstruction, has generated a great deal of interest in recent years. To recover the sparse signals, we propose a new method called multiple orthogonal least squares (MOLS), which extends the well-known orthogonal least squares (OLS) algorithm by allowing multiple ($L$) indices to be chosen per iteration. Owing to the inclusion of multiple support indices in each selection, the MOLS algorithm converges in far fewer iterations and improves the computational efficiency over the conventional OLS algorithm. Theoretical analysis shows that MOLS ($L > 1$) performs exact recovery of all $K$-sparse signals within $K$ iterations if the measurement matrix satisfies the restricted isometry property (RIP) with isometry constant $δ_{LK} < \frac{\sqrt{L}}{\sqrt{K} + 2 \sqrt{L}}$. The recovery performance of MOLS in the noisy scenario is also studied. It is shown that stable recovery of sparse signals can be achieved with the MOLS algorithm when the signal-to-noise ratio (SNR) scales linearly with the sparsity level of input signals.

preprint2015arXiv

Recovery of Sparse Signals via Generalized Orthogonal Matching Pursuit: A New Analysis

As an extension of orthogonal matching pursuit (OMP) improving the recovery performance of sparse signals, generalized OMP (gOMP) has recently been studied in the literature. In this paper, we present a new analysis of the gOMP algorithm using restricted isometry property (RIP). We show that if the measurement matrix $\mathbf{Φ} \in \mathcal{R}^{m \times n}$ satisfies the RIP with $$δ_{\max \left\{9, S + 1 \right\}K} \leq \frac{1}{8},$$ then gOMP performs stable reconstruction of all $K$-sparse signals $\mathbf{x} \in \mathcal{R}^n$ from the noisy measurements $\mathbf{y} = \mathbf{Φ}\mathbf{x} + \mathbf{v}$ within $\max \left\{K, \left\lfloor \frac{8K}{S} \right\rfloor \right\}$ iterations where $\mathbf{v}$ is the noise vector and $S$ is the number of indices chosen in each iteration of the gOMP algorithm. For Gaussian random measurements, our results indicate that the number of required measurements is essentially $m = \mathcal{O}(K \log \frac{n}{K})$, which is a significant improvement over the existing result $m = \mathcal{O}(K^2 \log \frac{n}{K})$, especially for large $K$.
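For readers unfamiliar with gOMP, the sketch below shows the basic loop as it is usually described: in each iteration, pick the $S$ columns most correlated with the residual, add them to the support, and refit by least squares. It is a minimal illustration under assumed settings (noiseless measurements, the stopping tolerance `tol`, and the problem sizes are all choices made here), with the iteration cap borrowed from the bound $\max\{K, \lfloor 8K/S \rfloor\}$ quoted above.

```python
import numpy as np

def gomp(Phi, y, K, S, tol=1e-10):
    """Generalized OMP sketch: select S indices per iteration, refit on the support."""
    m, n = Phi.shape
    support = np.array([], dtype=int)
    residual = y.copy()
    for _ in range(max(K, (8 * K) // S)):
        if support.size + S > m:                 # keep the least-squares fit well-posed
            break
        corr = np.abs(Phi.T @ residual)
        corr[support] = -np.inf                  # never reselect chosen indices
        picked = np.argpartition(corr, -S)[-S:]  # indices of the S largest correlations
        support = np.concatenate([support, picked])
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
        if np.linalg.norm(residual) <= tol * np.linalg.norm(y):
            break
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
m, n, K, S = 100, 128, 4, 2                      # illustrative sizes (assumed)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian measurement matrix
x = np.zeros(n)
nonzero = rng.choice(n, size=K, replace=False)
x[nonzero] = rng.choice([-1.0, 1.0], size=K) * rng.uniform(1.0, 2.0, size=K)
y = Phi @ x                                      # noiseless measurements
x_hat = gomp(Phi, y, K, S)
error = np.linalg.norm(x_hat - x)
print(f"recovery error: {error:.2e}")
```

The returned support may contain more than $K$ indices. Once it covers the true support, the least-squares refit assigns the extra entries values close to zero in the noiseless case.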

preprint2015arXiv

Reexamining generalized Chaplygin gas with the sign-changeable interaction

In this paper, we reexamine the generalized Chaplygin gas (GCG) cosmology with the sign-changeable interaction. The dynamical analysis shows that there exist de Sitter attractors in this model, which means that the late-time behavior of the model is insensitive to the initial condition and thus alleviates the coincidence problem. Furthermore, by numerically solving the nonlinear equations, we find that this interaction oscillates and finally tends to zero. In some specific cases of initial conditions, the interaction is positive (the energy transition from dark energy to dark matter) at high redshift while it is negative (the energy transition from dark matter to dark energy) at low redshift for suitable ranges of the parameters.

preprint2015arXiv

Regularization-free estimation in trace regression with symmetric positive semidefinite matrices

Over the past few years, trace regression models have received considerable attention in the context of matrix completion, quantum state tomography, and compressed sensing. Estimation of the underlying matrix via regularization-based approaches promoting low-rankedness, notably nuclear norm regularization, has enjoyed great popularity. In the present paper, we argue that such regularization may no longer be necessary if the underlying matrix is symmetric positive semidefinite (spd) and the design satisfies certain conditions. In this situation, simple least squares estimation subject to an spd constraint may perform as well as regularization-based approaches with a proper choice of the regularization parameter, which entails knowledge of the noise level and/or tuning. By contrast, constrained least squares estimation comes without any tuning parameter and may hence be preferred due to its simplicity.

preprint2015arXiv

Sign Stable Random Projections for Large-Scale Learning

We study the use of "sign $α$-stable random projections" (where $0<α\leq 2$) for building basic data processing tools in the context of large-scale machine learning applications (e.g., classification, regression, clustering, and near-neighbor search). After the processing by sign stable random projections, the inner products of the processed data approximate various types of nonlinear kernels depending on the value of $α$. Thus, this approach provides an effective strategy for approximating nonlinear learning algorithms essentially at the cost of linear learning. When $α=2$, it is known that the corresponding nonlinear kernel is the arc-cosine kernel. When $α=1$, the procedure approximates the arc-cos-$χ^2$ kernel (under certain conditions). When $α\rightarrow0+$, it corresponds to the resemblance kernel. From a practitioner's perspective, the method of sign $α$-stable random projections is ready to be tested for large-scale learning applications, where $α$ can be simply viewed as a tuning parameter. What is missing in the literature is an extensive empirical study to show the effectiveness of sign stable random projections, especially for $α\neq 2$ or 1. The paper supplies such a study on a wide variety of classification datasets. In particular, we compare sign stable random projections side by side with the recently proposed "0-bit consistent weighted sampling (CWS)" (Li 2015).
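The $α=2$ case can be checked numerically with a few lines. With Gaussian projections, two vectors at angle $θ$ produce the same sign with probability $1 - θ/π$, so the average product of their signs approaches $1 - 2θ/π$. The sketch below covers only this Gaussian case; the dimensions, the seed and the way the second vector is generated are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 30, 20000                         # data dimension and number of projections (assumed)
u = rng.standard_normal(d)
v = u + 0.8 * rng.standard_normal(d)     # a correlated second vector

R = rng.standard_normal((d, k))          # alpha = 2: Gaussian projection matrix
su, sv = np.sign(u @ R), np.sign(v @ R)  # keep only the signs of the projections

agreement = np.mean(su == sv)            # empirical collision rate of the signs
theta = np.arccos(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
expected = 1.0 - theta / np.pi           # theoretical collision probability
sign_inner_product = np.mean(su * sv)    # approaches 1 - 2 * theta / pi
print(agreement, expected, sign_inner_product)
```

Feeding the sign vectors to a linear learner is what lets the method approximate a nonlinear kernel essentially at the cost of linear learning.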

preprint2015arXiv

Tests for High-Dimensional Covariance Matrices Using Random Matrix Projection

The classic likelihood ratio test for testing the equality of two covariance matrices breaks down due to the singularity of the sample covariance matrices when the data dimension $p$ is larger than the sample size $n$. In this paper, we present a conceptually simple method using random projection to project the data onto a one-dimensional random subspace so that the conventional methods can be applied. Both one-sample and two-sample tests for high-dimensional covariance matrices are studied. Asymptotic results are established and numerical results are given to compare our method with state-of-the-art methods in the literature.

preprint2015arXiv

Tunable Quantum Spin Hall Effect via Strain in two-Dimensional Arsenene Monolayer

The search for new quantum spin Hall (QSH) phases and effective manipulation of their edge states are very important for both fundamental science and practical applications. Here, we use first-principles calculations to study the strain-driven topological phase transition of the two-dimensional (2D) arsenene monolayer. We find that the band gap of arsenene decreases with increasing strain and changes from indirect to direct, and then the s-p band inversion takes place at the Γ point when the tensile strain is larger than 11.14%, which leads to a nontrivial topological state. A single pair of topologically protected helical edge states is established for the edge of arsenene, and their QSH states are confirmed by the nontrivial topological invariant Z2 = 1. We also propose high-dielectric BN as an ideal substrate for the experimental synthesis of arsenene, maintaining its nontrivial topology. These findings provide a promising candidate platform for topological phenomena and new quantum devices in nanoelectronics.

preprint2014arXiv

-1-Phenomena for the pluri $χ_y$-genus and elliptic genus

Several independent articles have observed that the Hirzebruch $χ_y$-genus has an important feature, which the author calls the -1-phenomenon: the coefficients of the Taylor expansion of the $χ_y$-genus at $y=-1$ have explicit expressions. Hirzebruch's original $χ_y$-genus can be extended in two directions: the pluri-case and the case of elliptic genus. This paper contains two parts in which we investigate the -1-phenomena in these two generalized cases respectively and show that in each case there exists a -1-phenomenon in a suitable sense. Our main results in the first part have an application, which states that all characteristic numbers (Chern numbers and Pontrjagin numbers) on manifolds can be expressed, in a very explicit way, in terms of some rationally linear combination of indices of some elliptic operators. This gives an analytic interpretation of characteristic numbers and affirmatively answers a question posed by the author several years ago. The second part contains our attempt to generalize this -1-phenomenon to elliptic genus, a modern version of the $χ_y$-genus. We first extend the elliptic genus of an almost-complex manifold to a twisted version where an extra complex vector bundle is involved, and show that it is a weak Jacobi form under some assumptions. A suitable manipulation of the theory of Jacobi forms will produce new modular forms from this weak Jacobi form and thus much arithmetic information related to the underlying manifold can be obtained, in which the -1-phenomenon of the original $χ_y$-genus is hidden.

preprint2014arXiv

A New Space for Comparing Graphs

Finding new mathematical representations for graphs, which allow direct comparison between different graph structures, is an open-ended research direction. Having such a representation is the first prerequisite for a variety of machine learning algorithms like classification, clustering, etc., over graph datasets. In this paper, we propose a symmetric positive semidefinite matrix with the $(i,j)$-th entry equal to the covariance between normalized vectors $A^ie$ and $A^je$ ($e$ being the vector of all ones) as a representation for a graph with adjacency matrix $A$. We show that the proposed matrix representation encodes the spectrum of the underlying adjacency matrix and it also contains information about the counts of small sub-structures present in the graph such as triangles and small paths. In addition, we show that this matrix is a "graph invariant". All these properties make the proposed matrix a suitable object for representing graphs. The representation, being a covariance matrix in a fixed dimensional metric space, gives a mathematical embedding for graphs. This naturally leads to a measure of similarity on graph objects. We define similarity between two given graphs as a Bhattacharyya similarity measure between their corresponding covariance matrix representations. As shown in our experimental study on the task of social network classification, such a similarity measure outperforms other widely used state-of-the-art methodologies. Our proposed method is also computationally efficient. The computation of both the matrix representation and the similarity value can be performed in operations linear in the number of edges. This makes our method scalable in practice. We believe our theoretical and empirical results provide evidence for studying truncated power iterations of the adjacency matrix to characterize social networks.
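The representation described above can be sketched in a few lines. The version below treats the entries of each normalized vector $A^ie$ as samples and computes a $k \times k$ covariance matrix across the first $k$ power iterations. The abstract does not state the normalization or the value of $k$, so dividing by the entry sum and using $k = 4$ are assumptions, as are the toy graph and the function name.

```python
import numpy as np

def graph_covariance(A, k=4):
    """k x k covariance matrix between normalized power-iteration vectors A^i e."""
    v = np.ones(A.shape[0])
    rows = []
    for _ in range(k):
        v = A @ v
        v = v / v.sum()          # assumed normalization: entries sum to one
        rows.append(v)
    return np.cov(np.vstack(rows))   # rows are variables, node entries are samples

# Toy undirected graph: a triangle (0, 1, 2) with a tail 2-3-4.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

C = graph_covariance(A)

# Relabeling the nodes should leave the representation unchanged.
perm = np.array([4, 2, 0, 3, 1])
C_perm = graph_covariance(A[np.ix_(perm, perm)])
print(np.abs(C - C_perm).max())
```

Relabeling the nodes permutes the entries of every vector in the same way, which leaves each covariance unchanged. That is the sense in which the matrix is a graph invariant.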

preprint2014arXiv

Adaptive Stochastic Alternating Direction Method of Multipliers

The Alternating Direction Method of Multipliers (ADMM) has been studied for years. The traditional ADMM algorithm needs to compute, at each iteration, an (empirical) expected loss function on all training examples, resulting in a computational complexity proportional to the number of training examples. To reduce the time complexity, stochastic ADMM algorithms were proposed to replace the expected function with a random loss function associated with one uniformly drawn example plus a Bregman divergence. The Bregman divergence, however, is derived from a simple second order proximal function, the half squared norm, which could be a suboptimal choice. In this paper, we present a new family of stochastic ADMM algorithms with optimal second order proximal functions, which produce a new family of adaptive subgradient methods. We theoretically prove that their regret bounds are as good as the bounds which could be achieved by the best proximal function that can be chosen in hindsight. Encouraging empirical results on a variety of real-world datasets confirm the effectiveness and efficiency of the proposed algorithms.

preprint2014arXiv

Advancing Matrix Completion by Modeling Extra Structures beyond Low-Rankness

A well-known method for completing low-rank matrices based on convex optimization has been established by Candès and Recht. Although theoretically complete, the method may not entirely solve the low-rank matrix completion problem. This is because the method captures only the low-rankness property which gives merely a rough constraint that the data points lie on some low-dimensional subspace, but generally ignores the extra structures which specify in more detail how the data points lie on the subspace. Whenever the geometric distribution of the data points is not uniform, the coherence parameters of data might be large and, accordingly, the method might fail even if the latent matrix we want to recover is fairly low-rank. To better handle non-uniform data, in this paper we propose a method termed Low-Rank Factor Decomposition (LRFD), which imposes an additional restriction that the data points must be represented as linear combinations of the bases in a dictionary constructed or learnt in advance. We show that LRFD can handle non-uniform data well, provided that the dictionary is configured properly: We mathematically prove that if the dictionary itself is low-rank then LRFD is immune to the coherence parameters which might be large on non-uniform data. This provides an elementary principle for learning the dictionary in LRFD and, naturally, leads to a practical algorithm for advancing matrix completion. Extensive experiments on randomly generated matrices and motion datasets show encouraging results.

preprint2014arXiv

Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS)

We present the first provably sublinear time algorithm for approximate \emph{Maximum Inner Product Search} (MIPS). Our proposal is also the first hashing algorithm for searching with (un-normalized) inner product as the underlying similarity measure. Finding hashing schemes for MIPS was considered hard. We formally show that the existing Locality Sensitive Hashing (LSH) framework is insufficient for solving MIPS, and then we extend the existing LSH framework to allow asymmetric hashing schemes. Our proposal is based on an interesting mathematical phenomenon in which inner products, after independent asymmetric transformations, can be converted into the problem of approximate near neighbor search. This key observation makes an efficient sublinear hashing scheme for MIPS possible. In the extended asymmetric LSH (ALSH) framework, we provide an explicit construction of a provably fast hashing scheme for MIPS. The proposed construction and the extended LSH framework could be of independent theoretical interest. Our proposed algorithm is simple and easy to implement. We evaluate the method, for retrieving inner products, in the collaborative filtering task of item recommendations on Netflix and Movielens datasets.
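
The "asymmetric transformations" mentioned in this abstract can be sketched as follows. The abstract does not spell out the construction, so this uses one commonly cited form as an assumption: data vectors with norm below 1 get the powers $\|x\|^2, \|x\|^4, \dots, \|x\|^{2^m}$ appended, while the unit-normalized query gets $m$ entries equal to $1/2$. The function names, the value $m=3$ and the sample vectors are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def preprocess_item(x, m=3):
    """P(x): append ||x||^2, ||x||^4, ..., ||x||^(2^m). Assumes ||x|| < 1."""
    power = dot(x, x)
    extras = []
    for _ in range(m):
        extras.append(power)
        power = power * power  # squaring doubles the exponent
    return list(x) + extras


def preprocess_query(q, m=3):
    """Q(q): normalize q to unit length and append m entries equal to 1/2."""
    norm = math.sqrt(dot(q, q))
    return [c / norm for c in q] + [0.5] * m


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


m = 3
items = [[0.3, 0.4], [0.6, 0.1], [-0.2, 0.3]]  # all norms are below 1
query = [1.0, 2.0]
unit_query = preprocess_query(query, m)[:2]

inner_products = [dot(unit_query, x) for x in items]
distances = [squared_distance(preprocess_query(query, m), preprocess_item(x, m))
             for x in items]
# With these transformations the squared distance expands to
# 1 + m/4 - 2 * <q, x> + ||x||^(2^(m+1)), so the last term vanishes quickly.
predicted = [1 + m / 4 - 2 * ip + dot(x, x) ** (2 ** m)
             for ip, x in zip(inner_products, items)]
best_by_inner_product = max(range(len(items)), key=lambda i: inner_products[i])
best_by_distance = min(range(len(items)), key=lambda i: distances[i])
```

Under these assumptions the item with the largest inner product is also the nearest neighbor after transformation, which is why an ordinary near neighbor hashing scheme can then be applied.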

preprint2014arXiv

Asymmetric Minwise Hashing

Minwise hashing (Minhash) is a widely popular indexing scheme in practice. Minhash is designed for estimating set resemblance and is known to be suboptimal in many applications where the desired measure is set overlap (i.e., inner product between binary vectors) or set containment. Minhash has inherent bias towards smaller sets, which adversely affects its performance in applications where such a penalization is not desirable. In this paper, we propose asymmetric minwise hashing (MH-ALSH), to provide a solution to this problem. The new scheme utilizes asymmetric transformations to cancel the bias of traditional minhash towards smaller sets, making the final "collision probability" monotonic in the inner product. Our theoretical comparisons show that for the task of retrieving with binary inner products asymmetric minhash is provably better than traditional minhash and other recently proposed hashing algorithms for general inner products. Thus, we obtain an algorithmic improvement over existing approaches in the literature. Experimental evaluations on four publicly available high-dimensional datasets validate our claims and the proposed scheme outperforms, often significantly, other hashing algorithms on the task of near neighbor retrieval with set containment. Our proposal is simple and easy to implement in practice.
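
The bias toward smaller sets, and the way an asymmetric padding can cancel it, can be seen in a small worked example. This is only a sketch: the padding rule below (pad every data set with dummy elements up to a common size $M$ and leave the query unpadded) is an assumption consistent with the abstract's description, and the exact construction in the paper may differ. All names and sets are hypothetical.

```python
def resemblance(s1, s2):
    """Set resemblance |s1 & s2| / |s1 | s2|, the quantity minhash collisions estimate."""
    union = len(s1 | s2)
    return len(s1 & s2) / union if union else 0.0


def pad_data_set(x, max_size):
    """Assumed transformation: add dummy elements so every data set has size max_size."""
    return set(x) | {("pad", i) for i in range(max_size - len(x))}


query = {1, 2, 3, 4}
small = {1, 2}                      # overlap 2 with the query
large = {1, 2, 3, 5, 6, 7, 8, 9}    # overlap 3 with the query
max_size = max(len(small), len(large))  # M in the formula below

plain_small = resemblance(query, small)
plain_large = resemblance(query, large)
# After padding, resemblance becomes a / (M + |query| - a), which grows with the overlap a.
padded_small = resemblance(query, pad_data_set(small, max_size))
padded_large = resemblance(query, pad_data_set(large, max_size))
```

Plain resemblance ranks `small` above `large` even though `large` shares more elements with the query. After padding, the ranking follows the overlap.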

preprint2014arXiv

Characterizations of minimal graphs with equal edge connectivity and spanning tree packing number

With graphs considered as natural models for many network design problems, edge connectivity $κ'(G)$ and maximum number of edge-disjoint spanning trees $τ(G)$ of a graph $G$ have been used as measures for reliability and strength in communication networks modeled as graph $G$ (see \cite{Cunn85, Matula87}, among others). Mader \cite{Mader71} and Matula \cite{Matula72} introduced the maximum subgraph edge connectivity $\overline{κ'}(G)=\max \{κ'(H): H \mbox{ is a subgraph of } G \}$. Motivated by their applications in network design and by the established inequalities \[ \overline{κ'}(G)\ge κ'(G) \ge τ(G), \] we present the following in this paper: (i) For each integer $k>0$, a characterization for graphs $G$ with the property that $\overline{κ'}(G) \le k$ but for any edge $e$ not in $G$, $\overline{κ'}(G+e)\ge k+1$. (ii) For any integer $n > 0$, a characterization for graphs $G$ with $|V(G)| = n$ such that $κ'(G) = τ(G)$ with $|E(G)|$ minimized.

preprint2014arXiv

Coding for Random Projections and Approximate Near Neighbor Search

This technical note compares two coding (quantization) schemes for random projections in the context of sub-linear time approximate near neighbor search. The first scheme is based on uniform quantization while the second scheme utilizes a uniform quantization plus a uniformly random offset (which has been popular in practice). The prior work compared the two schemes in the context of similarity estimation and training linear classifiers, with the conclusion that the step of random offset is not necessary and may hurt the performance (depending on the similarity level). The task of near neighbor search is related to similarity estimation, but with important distinctions, and requires its own study. In this paper, we demonstrate that in the context of near neighbor search, the step of random offset is not needed either and may hurt the performance (sometimes significantly so, depending on the similarity and other parameters).
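
The two coding schemes compared in this abstract can be written down in a few lines. This sketch assumes the usual floor-based form with bin width `w`; the function names, the bin width and the sample projected values are hypothetical, and the offset is fixed here rather than drawn uniformly from $[0, w)$ so that the output is reproducible.

```python
import math


def code_uniform(projected, w):
    """Uniform quantization: the code of each projected value is floor(value / w)."""
    return [math.floor(v / w) for v in projected]


def code_with_offset(projected, w, offset):
    """Uniform quantization after shifting every value by a shared offset in [0, w)."""
    return [math.floor((v + offset) / w) for v in projected]


projected_values = [-1.3, -0.2, 0.4, 1.7]  # stand-ins for random projections of a data vector
bin_width = 1.0
plain_codes = code_uniform(projected_values, bin_width)
offset_codes = code_with_offset(projected_values, bin_width, offset=0.7)
```

In a hashing setting two vectors collide on a projection when their codes are equal. The abstract's finding is that the extra offset step is not needed for near neighbor search.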

preprint2014arXiv

Compressed Sensing with Very Sparse Gaussian Random Projections

We study the use of very sparse random projections for compressed sensing (sparse signal recovery) when the signal entries can be either positive or negative. In our setting, the entries of a Gaussian design matrix are randomly sparsified so that only a very small fraction of the entries are nonzero. Our proposed decoding algorithm is simple and efficient in that the major cost is one linear scan of the coordinates. We have developed two estimators: (i) the {\em tie estimator}, and (ii) the {\em absolute minimum estimator}. Using only the tie estimator, we are able to recover a $K$-sparse signal of length $N$ using $1.551 eK \log K/δ$ measurements (where $δ\leq 0.05$ is the confidence). Using only the absolute minimum estimator, we can detect the support of the signal using $eK\log N/δ$ measurements. For a particular coordinate, the absolute minimum estimator requires fewer measurements (i.e., with a constant $e$ instead of $1.551e$). Thus, the two estimators can be combined to form an even more practical decoding framework. Prior studies have shown that existing one-scan (or roughly one-scan) recovery algorithms using sparse matrices would require substantially more (e.g., one order of magnitude) measurements than L1 decoding by linear programming, when the nonzero entries of signals can be either negative or positive. In this paper, following a known experimental setup, we show that, at the same number of measurements, the recovery accuracies of our proposed method are (at least) similar to the standard L1 decoding.

preprint2014arXiv

CoRE Kernels

The term "CoRE kernel" stands for correlation-resemblance kernel. In many applications (e.g., vision), the data are often high-dimensional, sparse, and non-binary. We propose two types of (nonlinear) CoRE kernels for non-binary sparse data and demonstrate the effectiveness of the new kernels through a classification experiment. CoRE kernels are simple with no tuning parameters. However, training nonlinear kernel SVM can be (very) costly in time and memory and may not be suitable for truly large-scale industrial applications (e.g. search). In order to make the proposed CoRE kernels more practical, we develop basic probabilistic hashing algorithms which transform nonlinear kernels into linear kernels.

preprint2014arXiv

Epitaxial growth of large-gap quantum spin Hall insulator on semiconductor surface

Formation of a topological quantum phase on a conventional semiconductor surface is of both scientific and technological interest. Here, we demonstrate epitaxial growth of a 2D topological insulator, i.e. a quantum spin Hall (QSH) state, on the Si(111) surface with a large energy gap, based on first-principles calculations. We show that the Si(111) surface functionalized with 1/3 monolayer of halogen atoms [Si(111)-sqrt(3) x sqrt(3)-X (X=Cl, Br, I)], exhibiting a trigonal superstructure, provides an ideal template for epitaxial growth of heavy metals, such as Bi, which self-assemble into a hexagonal lattice with high kinetic and thermodynamic stability. Most remarkably, the Bi overlayer is "atomically" bonded to but "electronically" decoupled from the underlying Si substrate, exhibiting an isolated QSH state with an energy gap as large as 0.8 eV. This surprising phenomenon originates from an intriguing substrate orbital filtering effect, which critically selects the orbital composition around the Fermi level, leading to different topological phases. In particular, the substrate-orbital-filtering effect converts the otherwise topologically trivial freestanding Bi lattice into a nontrivial phase, while the reverse is true for the Au lattice. The underlying physical mechanism is generally applicable, opening a new and exciting avenue for exploration of large-gap topological surface/interface states.

preprint2014arXiv

Graph Kernels via Functional Embedding

We propose a representation of a graph as a functional object derived from the power iteration of the underlying adjacency matrix. The proposed functional representation is a graph invariant, i.e., the functional remains unchanged under any reordering of the vertices. This property eliminates the difficulty of handling exponentially many isomorphic forms. The Bhattacharyya kernel constructed between these functionals significantly outperforms the state-of-the-art graph kernels on 3 out of the 4 standard benchmark graph classification datasets, demonstrating the superiority of our approach. The proposed methodology is simple and runs in time linear in the number of edges, which makes our kernel more efficient and scalable compared to many widely adopted graph kernels with running time cubic in the number of vertices.

preprint2014arXiv

Improved Asymmetric Locality Sensitive Hashing (ALSH) for Maximum Inner Product Search (MIPS)

Recently it was shown that the problem of Maximum Inner Product Search (MIPS) is efficient and it admits provably sub-linear hashing algorithms. Asymmetric transformations before hashing were the key in solving MIPS which was otherwise hard. In the prior work, the authors use asymmetric transformations which convert the problem of approximate MIPS into the problem of approximate near neighbor search which can be efficiently solved using hashing. In this work, we provide a different transformation which converts the problem of approximate MIPS into the problem of approximate cosine similarity search which can be efficiently solved using signed random projections. Theoretical analysis shows that the new scheme is significantly better than the original scheme for MIPS. Experimental evaluations strongly support the theoretical findings.

preprint2014arXiv

Improved Densification of One Permutation Hashing

The existing work on densification of one permutation hashing reduces the query processing cost of the $(K,L)$-parameterized Locality Sensitive Hashing (LSH) algorithm with minwise hashing, from $O(dKL)$ to merely $O(d + KL)$, where $d$ is the number of nonzeros of the data vector, $K$ is the number of hashes in each hash table, and $L$ is the number of hash tables. While that is a substantial improvement, our analysis reveals that the existing densification scheme is sub-optimal. In particular, there is not enough randomness in that procedure, which affects its accuracy on very sparse datasets. In this paper, we provide a new densification procedure which is provably better than the existing scheme. This improvement is more significant for very sparse datasets which are common over the web. The improved technique has the same cost of $O(d + KL)$ for query processing, thereby making it strictly preferable over the existing procedure. Experimental evaluations on public datasets, in the task of hashing based near neighbor search, support our theoretical findings.

preprint2014arXiv

In Defense of MinHash Over SimHash

MinHash and SimHash are the two widely adopted Locality Sensitive Hashing (LSH) algorithms for large-scale data processing applications. Deciding which LSH to use for a particular problem at hand is an important question, which has no clear answer in the existing literature. In this study, we provide a theoretical answer (validated by experiments) that MinHash virtually always outperforms SimHash when the data are binary, as common in practice such as search. The collision probability of MinHash is a function of resemblance similarity ($\mathcal{R}$), while the collision probability of SimHash is a function of cosine similarity ($\mathcal{S}$). To provide a common basis for comparison, we evaluate retrieval results in terms of $\mathcal{S}$ for both MinHash and SimHash. This evaluation is valid as we can prove that MinHash is a valid LSH with respect to $\mathcal{S}$, by using a general inequality $\mathcal{S}^2\leq \mathcal{R}\leq \frac{\mathcal{S}}{2-\mathcal{S}}$. Our worst case analysis can show that MinHash significantly outperforms SimHash in high similarity region. Interestingly, our intensive experiments reveal that MinHash is also substantially better than SimHash even in datasets where most of the data points are not too similar to each other. This is partly because, in practical data, often $\mathcal{R}\geq \frac{\mathcal{S}}{z-\mathcal{S}}$ holds where $z$ is only slightly larger than 2 (e.g., $z\leq 2.1$). Our restricted worst case analysis by assuming $\frac{\mathcal{S}}{z-\mathcal{S}}\leq \mathcal{R}\leq \frac{\mathcal{S}}{2-\mathcal{S}}$ shows that MinHash indeed significantly outperforms SimHash even in low similarity region. We believe the results in this paper will provide valuable guidelines for search in practice, especially when the data are sparse.
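
The general inequality $\mathcal{S}^2\leq \mathcal{R}\leq \frac{\mathcal{S}}{2-\mathcal{S}}$ quoted in this abstract can be checked numerically for binary data. The sketch below represents binary vectors as sets, with $\mathcal{R} = a/(f_1+f_2-a)$ and $\mathcal{S} = a/\sqrt{f_1 f_2}$, where $a$ is the overlap and $f_1, f_2$ are the set sizes. The helper names, the random seed and the universe size are arbitrary choices made here.

```python
import math
import random


def resemblance(s1, s2):
    """R = a / (f1 + f2 - a) for two non-empty sets."""
    a = len(s1 & s2)
    return a / (len(s1) + len(s2) - a)


def cosine(s1, s2):
    """S = a / sqrt(f1 * f2) for two non-empty sets viewed as binary vectors."""
    return len(s1 & s2) / math.sqrt(len(s1) * len(s2))


def inequality_holds(s1, s2, tol=1e-12):
    r, s = resemblance(s1, s2), cosine(s1, s2)
    return s * s <= r + tol and r <= s / (2 - s) + tol


rng = random.Random(0)
universe = range(30)
checks = []
for _ in range(1000):
    first = set(rng.sample(universe, rng.randint(1, 20)))
    second = set(rng.sample(universe, rng.randint(1, 20)))
    checks.append(inequality_holds(first, second))

# A single worked pair: overlap a = 2, sizes 4 and 6, so R = 0.25 and S = 2 / sqrt(24).
example_r = resemblance({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8})
example_s = cosine({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8})
```

The lower bound is tight when one set contains the other, and the upper bound is tight when the two sets have equal size.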

preprint2014arXiv

Multi-label ensemble based on variable pairwise constraint projection

Multi-label classification has attracted an increasing amount of attention in recent years. To this end, many algorithms have been developed to classify multi-label data in an effective manner. However, they usually do not consider the pairwise relations indicated by sample labels, which actually play important roles in multi-label classification. Inspired by this, we naturally extend the traditional pairwise constraints to the multi-label scenario via a flexible thresholding scheme. Moreover, to improve the generalization ability of the classifier, we adopt a boosting-like strategy to construct a multi-label ensemble from a group of base classifiers. To achieve these goals, this paper presents a novel multi-label classification framework named Variable Pairwise Constraint projection for Multi-label Ensemble (VPCME). Specifically, we take advantage of the variable pairwise constraint projection to learn a lower-dimensional data representation, which preserves the correlations between samples and labels. Thereafter, the base classifiers are trained in the new data space. For the boosting-like strategy, we employ both the variable pairwise constraints and the bootstrap steps to diversify the base classifiers. Empirical studies have shown the superiority of the proposed method in comparison with other approaches.

preprint2014arXiv

RAPTT: An Exact Two-Sample Test in High Dimensions Using Random Projections

In high dimensions, the classical Hotelling's $T^2$ test tends to have low power or becomes undefined due to singularity of the sample covariance matrix. In this paper, this problem is overcome by projecting the data matrix onto lower dimensional subspaces through multiplication by random matrices. We propose RAPTT (RAndom Projection T-Test), an exact test for equality of means of two normal populations based on projected lower dimensional data. RAPTT does not require any constraints on the dimension of the data or the sample size. A simulation study indicates that in high dimensions the power of this test is often greater than that of competing tests. The advantage of RAPTT is illustrated on high-dimensional gene expression data involving the discrimination of tumor and normal colon tissues.

preprint2014arXiv

Recovery of Coherent Data via Low-Rank Dictionary Pursuit

The recently established RPCA method provides us a convenient way to restore low-rank matrices from grossly corrupted observations. While elegant in theory and powerful in reality, RPCA may not be the ultimate solution to the low-rank matrix recovery problem. Indeed, its performance may not be perfect even when data are strictly low-rank. This is because conventional RPCA ignores the clustering structures of the data which are ubiquitous in modern applications. As the number of clusters grows, the coherence of data keeps increasing, and accordingly, the recovery performance of RPCA degrades. We show that the challenges raised by coherent data (i.e., the data with high coherence) could be alleviated by Low-Rank Representation (LRR), provided that the dictionary in LRR is configured appropriately. More precisely, we mathematically prove that if the dictionary itself is low-rank then LRR is immune to the coherence parameter which increases with the underlying cluster number. This provides an elementary principle for dealing with coherent data. Subsequently, we devise a practical algorithm to obtain proper dictionaries in unsupervised environments. Our extensive experiments on randomly generated matrices verify our claims.

preprint2013arXiv

Bayesian Ultrahigh-Dimensional Screening Via MCMC

We explore the theoretical and numerical property of a fully Bayesian model selection method in sparse ultrahigh-dimensional settings, i.e., $p\gg n$, where $p$ is the number of covariates and $n$ is the sample size. Our method consists of (1) a hierarchical Bayesian model with a novel prior placed over the model space which includes a hyperparameter $t_n$ controlling the model size, and (2) an efficient MCMC algorithm for automatic and stochastic search of the models. Our theory shows that, when specifying $t_n$ correctly, the proposed method yields selection consistency, i.e., the posterior probability of the true model asymptotically approaches one; when $t_n$ is misspecified, the selected model is still asymptotically nested in the true model. The theory also reveals insensitivity of the selection result with respect to the choice of $t_n$. In implementations, a reasonable prior is further assumed on $t_n$ which allows us to draw its samples stochastically. Our approach conducts selection, estimation and even inference in a unified framework. No additional prescreening or dimension reduction step is needed. Two novel $g$-priors are proposed to make our approach more flexible. A simulation study is given to display the numerical advantage of our method.

preprint2013arXiv

Coding for Random Projections

The method of random projections has become very popular for large-scale applications in statistical learning, information retrieval, bio-informatics and other applications. Using a well-designed coding scheme for the projected data, which determines the number of bits needed for each projected value and how to allocate these bits, can significantly improve the effectiveness of the algorithm, in storage cost as well as computational speed. In this paper, we study a number of simple coding schemes, focusing on the task of similarity estimation and on an application to training linear classifiers. We demonstrate that uniform quantization outperforms the standard existing influential method (Datar et al. 2004). Indeed, we argue that in many cases coding with just a small number of bits suffices. Furthermore, we also develop a non-uniform 2-bit coding scheme that generally performs well in practice, as confirmed by our experiments on training linear support vector machines (SVM).

preprint2013arXiv

Compressed Counting Meets Compressed Sensing

Compressed sensing (sparse signal recovery) has been a popular and important research topic in recent years. By observing that natural signals are often nonnegative, we propose a new framework for nonnegative signal recovery using Compressed Counting (CC). CC is a technique built on maximally-skewed p-stable random projections originally developed for data stream computations. Our recovery procedure is computationally very efficient in that it requires only one linear scan of the coordinates. Our analysis demonstrates that, when $0<p\leq 0.5$, it suffices to use $M = O(C/\epsilon^p \log N)$ measurements so that all coordinates will be recovered within $\epsilon$ additive precision, in one scan of the coordinates. The constant $C=1$ when $p\to 0$ and $C=\pi/2$ when $p=0.5$. In particular, when $p\to 0$ the required number of measurements is essentially $M=K\log N$, where $K$ is the number of nonzero coordinates of the signal.

preprint2013arXiv

Effect of sampling on the estimation of drift parameter of continuous time AR(1) processes

We study the effect of stochastic sampling on the estimation of the drift parameter of a continuous time AR(1) process. A natural distribution-free moment estimator is considered for the drift based on stochastically observed time points. The effect of the constraint of the minimum separation between successive samples on the estimation of the drift is studied.

preprint2013arXiv

Exact Sparse Recovery with L0 Projections

Many applications concern sparse signals, for example, detecting anomalies from the differences between consecutive images taken by surveillance cameras. This paper focuses on the problem of recovering a K-sparse signal x in N dimensions. In the mainstream framework of compressed sensing (CS), the vector x is recovered from M non-adaptive linear measurements y = xS, where S (of size N x M) is typically a Gaussian (or Gaussian-like) design matrix, through some optimization procedure such as linear programming (LP). In our proposed method, the design matrix S is generated from an $α$-stable distribution with $α\approx 0$. Our decoding algorithm mainly requires one linear scan of the coordinates, followed by a few iterations on a small number of coordinates which are "undetermined" in the previous iteration. Comparisons with two strong baselines, linear programming (LP) and orthogonal matching pursuit (OMP), demonstrate that our algorithm can be significantly faster in decoding speed and more accurate in recovery quality, for the task of exact sparse recovery. Our procedure is robust against measurement noise. Even when there are not sufficient measurements, our algorithm can still reliably recover a significant portion of the nonzero coordinates. To provide the intuition for understanding our method, we also analyze the procedure by assuming an idealistic setting. Interestingly, when K=2, the "idealized" algorithm achieves exact recovery with merely 3 measurements, regardless of N. For general K, the required sample size of the "idealized" algorithm is about 5K.

preprint2013arXiv

Gradient Hard Thresholding Pursuit for Sparsity-Constrained Optimization

Hard Thresholding Pursuit (HTP) is an iterative greedy selection procedure for finding sparse solutions of underdetermined linear systems. This method has been shown to have strong theoretical guarantees and impressive numerical performance. In this paper, we generalize HTP from compressive sensing to a generic problem setup of sparsity-constrained convex optimization. The proposed algorithm iterates between a standard gradient descent step and a hard thresholding step with or without debiasing. We prove that our method enjoys strong guarantees analogous to HTP in terms of rate of convergence and parameter estimation accuracy. Numerical evidence shows that our method is superior to the state-of-the-art greedy selection methods in sparse logistic regression and sparse precision matrix estimation tasks.
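
The alternation between a gradient step and a hard thresholding step can be sketched for a least-squares loss. This is an illustration under stated assumptions, not the paper's algorithm as published: it uses the variant without debiasing, a fixed step size, the loss $\tfrac12\|Ax-y\|^2$, and a small design matrix with orthogonal columns chosen so that convergence is easy to verify. All names are hypothetical.

```python
def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and set the rest to zero."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def least_squares_gradient(A, y, x):
    """Gradient of 0.5 * ||A x - y||^2, i.e. A^T (A x - y)."""
    residual = [sum(a * b for a, b in zip(row, x)) - target
                for row, target in zip(A, y)]
    return [sum(A[r][c] * residual[r] for r in range(len(A)))
            for c in range(len(x))]


def gradient_hard_thresholding(A, y, k, step=0.25, iterations=60):
    """Iterate: gradient descent step, then projection onto k-sparse vectors."""
    x = [0.0] * len(A[0])
    for _ in range(iterations):
        grad = least_squares_gradient(A, y, x)
        x = hard_threshold([xi - step * g for xi, g in zip(x, grad)], k)
    return x


# Columns of A are orthogonal with squared norm 2, so A^T A = 2 I.
A = [[1.0, 1.0, 0.0, 0.0],
     [1.0, -1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0, -1.0]]
true_signal = [0.0, 3.0, 0.0, -2.0]          # 2-sparse ground truth
y = [sum(a * b for a, b in zip(row, true_signal)) for row in A]
recovered = gradient_hard_thresholding(A, y, k=2)
```

For this particular matrix each iteration halves the distance to `true_signal`, so 60 iterations recover it to within rounding error. A debiasing variant would additionally re-fit the loss restricted to the selected support after each thresholding step.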

preprint2013arXiv

High-Dimensional Bayesian Inference in Nonparametric Additive Models

A fully Bayesian approach is proposed for ultrahigh-dimensional nonparametric additive models in which the number of additive components may be larger than the sample size, though ideally the true model is believed to include only a small number of components. Bayesian approaches can conduct stochastic model search and fulfill flexible parameter estimation by stochastic draws. The theory shows that the proposed model selection method has satisfactory properties. For instance, when the hyperparameter associated with the model prior is correctly specified, the true model has posterior probability approaching one as the sample size goes to infinity; when this hyperparameter is incorrectly specified, the selected model is still acceptable since asymptotically it is proved to be nested in the true model. To enhance model flexibility, two new $g$-priors are proposed and their theoretical performance is examined. We also propose an efficient MCMC algorithm to handle the computational issues. Several simulation examples are provided to demonstrate the computational advantages of our method.

preprint2013arXiv

Learning Pairwise Graphical Models with Nonlinear Sufficient Statistics

We investigate a generic problem of learning pairwise exponential family graphical models with pairwise sufficient statistics defined by a global mapping function, e.g., Mercer kernels. This subclass of pairwise graphical models allows us to flexibly capture complex interactions among variables beyond pairwise product. We propose two $\ell_1$-norm penalized maximum likelihood estimators to learn the model parameters from i.i.d. samples. The first one is a joint estimator which estimates all the parameters simultaneously. The second one is a node-wise conditional estimator which estimates the parameters individually for each node. For both estimators, we show that under proper conditions the extra flexibility gained in our model comes at almost no cost of statistical and computational efficiency. We demonstrate the advantages of our model over state-of-the-art methods on synthetic and real datasets.

preprint2013arXiv

Sign Stable Projections, Sign Cauchy Projections and Chi-Square Kernels

The method of stable random projections is popular for efficiently computing the Lp distances in high dimension (where 0<p<=2), using small space. Because it adopts nonadaptive linear projections, this method is naturally suitable when the data are collected in a dynamic streaming fashion (i.e., turnstile data streams). In this paper, we propose to use only the signs of the projected data and analyze the probability of collision (i.e., when the two signs differ). We derive a bound of the collision probability which is exact when p=2 and becomes less sharp when p moves away from 2. Interestingly, when p=1 (i.e., Cauchy random projections), we show that the probability of collision can be accurately approximated as functions of the chi-square similarity. For example, when the (un-normalized) data are binary, the maximum approximation error of the collision probability is smaller than 0.0192. In text and vision applications, the chi-square similarity is a popular measure for nonnegative data when the features are generated from histograms. Our experiments confirm that the proposed method is promising for large-scale learning applications.

preprint2013arXiv

Sparse Recovery with Very Sparse Compressed Counting

Compressed sensing (sparse signal recovery) often encounters nonnegative data (e.g., images). Recently we developed the methodology of using (dense) Compressed Counting for recovering nonnegative K-sparse signals. In this paper, we adopt very sparse Compressed Counting for nonnegative signal recovery. Our design matrix is sampled from a maximally-skewed p-stable distribution (0<p<1), and we sparsify the design matrix so that on average (1-g)-fraction of the entries become zero. The idea is related to very sparse stable random projections (Li et al 2006 and Li 2007), the prior work for estimating summary statistics of the data. In our theoretical analysis, we show that, when p->0, it suffices to use M = K/(1-exp(-gK)) log N measurements, so that all coordinates can be recovered in one scan of the coordinates. If g = 1 (i.e., dense design), then M = K log N. If g = 1/K or 2/K (i.e., very sparse design), then M = 1.58K log N or M = 1.16K log N. This means the design matrix can be indeed very sparse at only a minor inflation of the sample complexity. Interestingly, as p->1, the required number of measurements is essentially M = 2.7K log N, provided g = 1/K. It turns out that this result is a general worst-case bound.
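
The constants quoted in this abstract follow from reading the bound as M = K/(1-exp(-gK)) log N, so that the factor multiplying K log N is 1/(1-exp(-gK)). The short check below reproduces them. The function name and the choice K = 100 are illustrative assumptions.

```python
import math


def inflation_factor(g, K):
    """Factor 1 / (1 - exp(-g K)) multiplying K log N in the p -> 0 bound."""
    return 1.0 / (1.0 - math.exp(-g * K))


K = 100  # any sparsity level works for g = 1/K and g = 2/K, since only g*K matters
dense = inflation_factor(1.0, K)           # g = 1, dense design
very_sparse_1 = inflation_factor(1.0 / K, K)   # g = 1/K
very_sparse_2 = inflation_factor(2.0 / K, K)   # g = 2/K
```

With g = 1/K the factor is 1/(1 - 1/e), about 1.58, and with g = 2/K it is about 1.16, matching the figures in the abstract. For the dense design the factor is indistinguishable from 1 once K is moderately large.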

preprint2012arXiv

A New Class of Parametrization for Dark Energy without Divergence

In this paper, we propose a new class of parametrization of the equation of state of dark energy. In contrast with the famous CPL parametrization, this new parametrization of the equation of state does not diverge during the evolution of the Universe, even in the future. Also, we perform an observational constraint on the two simplest dark energy models belonging to this new class of parametrization, by using the Markov Chain Monte Carlo (MCMC) method and the combined latest observational data from the type Ia supernova compilations including Union2(557), cosmic microwave background, and baryon acoustic oscillation.

preprint2012arXiv

Approximating Higher-Order Distances Using Random Projections

We provide a simple method and relevant theoretical analysis for efficiently estimating higher-order lp distances. While the analysis mainly focuses on l4, our methodology extends naturally to p = 6,8,10..., (i.e., when p is even). Distance-based methods are popular in machine learning. In large-scale applications, storing, computing, and retrieving the distances can be both space and time prohibitive. Efficient algorithms exist for estimating lp distances if 0 < p <= 2. The task for p > 2 is known to be difficult. Our work partially fills this gap.

preprint 2012 arXiv

b-Bit Minwise Hashing in Practice: Large-Scale Batch and Online Learning and Using GPUs for Fast Preprocessing with Simple Hash Functions

In this paper, we study several critical issues which must be tackled before one can apply b-bit minwise hashing to the volumes of data often used in industrial applications, especially in the context of search. 1. (b-bit) Minwise hashing requires an expensive preprocessing step that computes k (e.g., 500) minimal values after applying the corresponding permutations for each data vector. We developed a parallelization scheme using GPUs and observed that the preprocessing time can be reduced by a factor of 20-80 and becomes substantially smaller than the data loading time. 2. One major advantage of b-bit minwise hashing is that it can substantially reduce the amount of memory required for batch learning. However, as online algorithms become increasingly popular for large-scale learning in the context of search, it is not clear if b-bit minwise hashing yields significant improvements for them. This paper demonstrates that b-bit minwise hashing provides an effective data size/dimension reduction scheme and hence it can dramatically reduce the data loading time for each epoch of the online training process. This is significant because online learning often requires many (e.g., 10 to 100) epochs to reach a sufficient accuracy. 3. Another critical issue is that for very large data sets it becomes impossible to store a (fully) random permutation matrix, due to its space requirements. Our paper is the first study to demonstrate that b-bit minwise hashing implemented using simple hash functions, e.g., the 2-universal (2U) and 4-universal (4U) hash families, can produce learning results very similar to those obtained using fully random permutations. Experiments on datasets of up to 200GB are presented.
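
To make the third point concrete, here is a minimal Python sketch of b-bit minwise hashing in which each permutation is replaced by a 2-universal hash of the form (a*x + c) mod p. The prime, the seeds, the function names and the sparse-data collision correction 1/2^b are assumptions made for illustration; they are not the paper's implementation.

```python
import random

PRIME = (1 << 61) - 1  # a Mersenne prime far larger than any feature index used here

def make_2u_hashes(k, seed=0):
    """Draw k pairs (a, c) defining the hash functions h(x) = (a*x + c) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]

def b_bit_minhash(nonzeros, hashes, b):
    """For each hash function keep only the lowest b bits of the minimum hashed value."""
    mask = (1 << b) - 1
    return [min((a * x + c) % PRIME for x in nonzeros) & mask for a, c in hashes]

def estimate_resemblance(sig1, sig2, b):
    """Match fraction corrected for accidental b-bit collisions (sparse-data approximation)."""
    p_hat = sum(u == v for u, v in zip(sig1, sig2)) / len(sig1)
    chance = 1.0 / (1 << b)
    return (p_hat - chance) / (1.0 - chance)

# Two hypothetical sets with true resemblance 500 / 1500 = 1/3
pool = random.Random(1).sample(range(10**6), 1500)
A, B = set(pool[:1000]), set(pool[500:])
hashes = make_2u_hashes(500, seed=7)
sig_a, sig_b = b_bit_minhash(A, hashes, 8), b_bit_minhash(B, hashes, 8)
print(estimate_resemblance(sig_a, sig_b, 8))
```

Only the k pairs (a, c) need to be stored, which is the space advantage over keeping a full permutation matrix.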

preprint 2012 arXiv

Improving Compressed Counting

Compressed Counting (CC) [22] was recently proposed for estimating the a-th frequency moments of data streams, where 0 < a <= 2. CC can be used for estimating Shannon entropy, which can be approximated by certain functions of the a-th frequency moments as a -> 1. Monitoring Shannon entropy for anomaly detection (e.g., DDoS attacks) in large networks is an important task. This paper presents a new algorithm for improving CC. The improvement is most substantial when a -> 1-. For example, when a = 0.99, the new algorithm reduces the estimation variance roughly by 100-fold. This new algorithm would make CC considerably more practical for estimating Shannon entropy. Furthermore, the new algorithm is statistically optimal when a = 0.5.
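
The link between frequency moments and Shannon entropy can be illustrated with the Tsallis entropy, which depends on the data only through the a-th frequency moment and converges to the Shannon entropy as a -> 1. Using Tsallis entropy as the "certain function" is an assumption made for this sketch, and the moments below are computed exactly rather than estimated from a stream by CC.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (natural log) of the empirical distribution of `counts`."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def tsallis_entropy(counts, a):
    """(1 - sum_i p_i^a) / (a - 1), written in terms of the a-th frequency moment F_a."""
    total = sum(counts)                              # first moment F_1
    moment = sum(c ** a for c in counts if c > 0)    # a-th frequency moment F_a
    return (1.0 - moment / total ** a) / (a - 1.0)

counts = [5, 3, 1, 1]  # hypothetical item frequencies in a stream
for a in (0.9, 0.99, 0.999):
    print(a, tsallis_entropy(counts, a), shannon_entropy(counts))
```

Because the denominator a - 1 shrinks as a -> 1, any error in the estimated moment is amplified, which is why the variance reduction near a = 1 matters for entropy estimation.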

preprint 2012 arXiv

One Permutation Hashing for Efficient Search and Learning

Recently, the method of b-bit minwise hashing has been applied to large-scale linear learning and sublinear time near-neighbor search. The major drawback of minwise hashing is the expensive preprocessing cost, as the method requires applying (e.g.,) k=200 to 500 permutations on the data. The testing time can also be expensive if a new data point (e.g., a new document or image) has not been processed, which might be a significant issue in user-facing applications. We develop a very simple solution based on one permutation hashing. Conceptually, given a massive binary data matrix, we permute the columns only once and divide the permuted columns evenly into k bins; and we simply store, for each data vector, the smallest nonzero location in each bin. The interesting probability analysis (which is validated by experiments) reveals that our one permutation scheme should perform very similarly to the original (k-permutation) minwise hashing. In fact, the one permutation scheme can be even slightly more accurate, due to the "sample-without-replacement" effect. Our experiments with training linear SVM and logistic regression on the webspam dataset demonstrate that this one permutation hashing scheme can achieve the same (or even slightly better) accuracies compared to the original k-permutation scheme. To test the robustness of our method, we also experiment with the small news20 dataset which is very sparse and has merely on average 500 nonzeros in each data vector. Interestingly, our one permutation scheme noticeably outperforms the k-permutation scheme when k is not too small on the news20 dataset. In summary, our method can achieve at least the same accuracy as the original k-permutation scheme, at merely 1/k of the original preprocessing cost.
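
The scheme described above (permute once, split into k bins, keep the smallest nonzero location per bin) can be sketched in a few lines of Python. The function names, the handling of empty bins via None, and the estimator that divides matches by the number of bins not jointly empty are assumptions for illustration, not details given in the abstract.

```python
import random

def one_permutation_hash(nonzeros, perm, k):
    """Permute columns once, split the D permuted columns into k equal bins, and keep
    the smallest permuted location (as an offset within its bin); None marks an empty bin."""
    bin_size = len(perm) // k  # assumes k divides D
    sketch = [None] * k
    for col in nonzeros:
        j, offset = divmod(perm[col], bin_size)
        if sketch[j] is None or offset < sketch[j]:
            sketch[j] = offset
    return sketch

def estimate_resemblance(s1, s2):
    """Matching bins divided by the bins that are non-empty in at least one sketch."""
    n_empty = sum(1 for u, v in zip(s1, s2) if u is None and v is None)
    n_match = sum(1 for u, v in zip(s1, s2) if u is not None and u == v)
    usable = len(s1) - n_empty
    return n_match / usable if usable else 0.0

D, k = 10000, 500
perm = list(range(D))
random.Random(0).shuffle(perm)                 # the single permutation
A, B = range(0, 3000), range(1000, 4000)       # hypothetical sets, true resemblance 0.5
sk_a, sk_b = one_permutation_hash(A, perm, k), one_permutation_hash(B, perm, k)
print(estimate_resemblance(sk_a, sk_b))
```

Each data vector is scanned once, compared with k passes for the k-permutation scheme, which is where the 1/k preprocessing cost comes from.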

preprint 2012 arXiv

Robust LogitBoost and Adaptive Base Class (ABC) LogitBoost

Logitboost is an influential boosting algorithm for classification. In this paper, we develop robust logitboost to provide an explicit formulation of the tree-split criterion for building weak learners (regression trees) for logitboost. This formulation leads to a numerically stable implementation of logitboost. We then propose abc-logitboost for multi-class classification, by combining robust logitboost with the prior work of abc-boost. Previously, abc-boost was implemented as abc-mart using the mart algorithm. Our extensive experiments on multi-class classification compare four algorithms: mart, abc-mart, (robust) logitboost, and abc-logitboost, and demonstrate the superiority of abc-logitboost. Comparisons with other learning methods including SVM and deep learning are also available through prior publications.

preprint 2011 arXiv

Accurate Estimators for Improving Minwise Hashing and b-Bit Minwise Hashing

Minwise hashing is the standard technique in the context of search and databases for efficiently estimating set (e.g., high-dimensional 0/1 vector) similarities. Recently, b-bit minwise hashing was proposed which significantly improves upon the original minwise hashing in practice by storing only the lowest b bits of each hashed value, as opposed to using 64 bits. b-bit hashing is particularly effective in applications which mainly concern sets of high similarities (e.g., the resemblance >0.5). However, there are other important applications in which not just pairs of high similarities matter. For example, many learning algorithms require all pairwise similarities and it is expected that only a small fraction of the pairs are similar. Furthermore, many applications care more about containment (e.g., how much one object is contained by another object) than the resemblance. In this paper, we show that the estimators for minwise hashing and b-bit minwise hashing used in the current practice can be systematically improved and the improvements are most significant for set pairs of low resemblance and high containment.

preprint 2011 arXiv

b-Bit Minwise Hashing for Large-Scale Linear SVM

In this paper, we propose to (seamlessly) integrate b-bit minwise hashing with linear SVM to substantially improve the training (and testing) efficiency using much smaller memory, with essentially no loss of accuracy. Theoretically, we prove that the resemblance matrix, the minwise hashing matrix, and the b-bit minwise hashing matrix are all positive definite matrices (kernels). Interestingly, our proof for the positive definiteness of the b-bit minwise hashing kernel naturally suggests a simple strategy to integrate b-bit hashing with linear SVM. Our technique is particularly useful when the data cannot fit in memory, which is an increasingly critical issue in large-scale machine learning. Our preliminary experimental results on a publicly available webspam dataset (350K samples and 16 million dimensions) verified the effectiveness of our algorithm. For example, the training time was reduced to merely a few seconds. In addition, our technique can be easily extended to many other linear and nonlinear machine learning applications such as logistic regression.

preprint 2011 arXiv

Hashing Algorithms for Large-Scale Learning

In this paper, we first demonstrate that b-bit minwise hashing, whose estimators are positive definite kernels, can be naturally integrated with learning algorithms such as SVM and logistic regression. We adopt a simple scheme to transform the nonlinear (resemblance) kernel into a linear (inner product) kernel; and hence large-scale problems can be solved extremely efficiently. Our method provides a simple effective solution to large-scale learning in massive and extremely high-dimensional datasets, especially when data do not fit in memory. We then compare b-bit minwise hashing with the Vowpal Wabbit (VW) algorithm (which is related to the Count-Min (CM) sketch). Interestingly, VW has the same variances as random projections. Our theoretical and empirical comparisons illustrate that usually b-bit minwise hashing is significantly more accurate (at the same storage) than VW (and random projections) in binary data. Furthermore, b-bit minwise hashing can be combined with VW to achieve further improvements in terms of training speed, especially when b is large.
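
One way to read the "simple scheme" for turning the resemblance kernel into an inner product is a one-hot expansion: each of the k b-bit hashed values becomes a block of 2^b binary features, so the inner product of two expanded vectors counts the matching hashed values. This reading is an assumption for the sketch below, and the function names and example signatures are hypothetical.

```python
def expand_b_bit(signature, b):
    """Map k b-bit hashed values to a binary vector of length k * 2**b with exactly k ones."""
    width = 1 << b
    vec = [0] * (len(signature) * width)
    for j, value in enumerate(signature):
        vec[j * width + value] = 1  # one-hot position of the j-th hashed value
    return vec

def inner_product(u, v):
    return sum(x * y for x, y in zip(u, v))

sig_x = [3, 0, 2, 1]  # hypothetical b-bit hashed values with b = 2, k = 4
sig_y = [3, 1, 2, 2]
matches = sum(a == c for a, c in zip(sig_x, sig_y))
# The linear inner product of the expanded vectors equals the number of matches
print(inner_product(expand_b_bit(sig_x, 2), expand_b_bit(sig_y, 2)), matches)
```

The expanded vectors are sparse with exactly k nonzeros each, so they can be passed directly to a linear SVM or logistic regression solver.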

preprint 2011 arXiv

Optimizing and controlling functions of complex networks by manipulating rich-club connections

Traditionally, there is no evidence suggesting that there are strong ties between the rich-club property and the function of complex networks. In this study, we find that whether or not a very small portion of rich nodes are connected to each other can strongly affect the frequency of occurrence of basic building blocks (motifs) within a heterogeneous network, and therefore its function. Conversely, whether a homogeneous network has a rich-club property or not generally has no significant effect on its structure and function. These findings open the possibility to optimize and control the function of complex networks by manipulating rich-club connections. Furthermore, based on the subgraph ratio profile, we develop a more rigorous approach to judge whether a network has a rich-club or not. The new method does not calculate how many links there are among rich nodes but depends on how the links among rich nodes can affect the overall structure as well as function of a given network. These results can also help us to understand the evolution of dynamical networks and design new models for characterizing real-world networks.

preprint 2011 arXiv

Training Logistic Regression and SVM on 200GB Data Using b-Bit Minwise Hashing and Comparisons with Vowpal Wabbit (VW)

We generated a dataset of 200 GB with 10^9 features, to test our recent b-bit minwise hashing algorithms for training very large-scale logistic regression and SVM. The results confirm our prior work that, compared with the VW hashing algorithm (which has the same variance as random projections), b-bit minwise hashing is substantially more accurate at the same storage. For example, with merely 30 hashed values per data point, b-bit minwise hashing can achieve similar accuracies as VW with 2^14 hashed values per data point. We demonstrate that the preprocessing cost of b-bit minwise hashing is roughly on the same order of magnitude as the data loading time. Furthermore, by using a GPU, the preprocessing cost can be reduced to a small fraction of the data loading time. Minwise hashing has been widely used in industry, at least in the context of search. One reason for its popularity is that one can efficiently simulate permutations by (e.g.,) universal hashing. In other words, there is no need to store the permutation matrix. In this paper, we empirically verify this practice, by demonstrating that even using the simplest 2-universal hashing does not degrade the learning performance.

preprint 2011 arXiv

User-level sentiment analysis incorporating social networks

We show that information about social relationships can be used to improve user-level sentiment analysis. The main motivation behind our approach is that users that are somehow "connected" may be more likely to hold similar opinions; therefore, relationship information can complement what we can extract about a user's viewpoints from their utterances. Employing Twitter as a source for our experimental data, and working within a semi-supervised framework, we propose models that are induced either from the Twitter follower/followee network or from the network in Twitter formed by users referring to each other using "@" mentions. Our transductive learning results reveal that incorporating social-network information can indeed lead to statistically significant sentiment-classification improvements over the performance of an approach based on Support Vector Machines having access only to textual features.

preprint 2010 arXiv

An Empirical Evaluation of Four Algorithms for Multi-Class Classification: Mart, ABC-Mart, Robust LogitBoost, and ABC-LogitBoost

This empirical study is mainly devoted to comparing four tree-based boosting algorithms: mart, abc-mart, robust logitboost, and abc-logitboost, for multi-class classification on a variety of publicly available datasets. Some of those datasets have been thoroughly tested in prior studies using a broad range of classification algorithms including SVM, neural nets, and deep learning. In terms of the empirical classification errors, our experiment results demonstrate: 1. Abc-mart considerably improves mart. 2. Abc-logitboost considerably improves (robust) logitboost. 3. (Robust) logitboost considerably improves mart on most datasets. 4. Abc-logitboost considerably improves abc-mart on most datasets. 5. These four boosting algorithms (especially abc-logitboost) outperform SVM on many datasets. 6. Compared to the best deep learning methods, these four boosting algorithms (especially abc-logitboost) are competitive.

preprint 2010 arXiv

Fast ABC-Boost for Multi-Class Classification

Abc-boost is a new line of boosting algorithms for multi-class classification, by utilizing the commonly used sum-to-zero constraint. To implement abc-boost, a base class must be identified at each boosting step. Prior studies used a very expensive procedure based on exhaustive search for determining the base class at each boosting step. Good testing performances of abc-boost (implemented as abc-mart and abc-logitboost) on a variety of datasets were reported. For large datasets, however, the exhaustive search strategy adopted in prior abc-boost algorithms can be prohibitive. To overcome this serious limitation, this paper suggests a heuristic by introducing Gaps when computing the base class during training. That is, we update the choice of the base class only every G boosting steps (i.e., G = 1 in prior studies). We test this idea on large datasets (Covertype and Poker) as well as datasets of moderate sizes. Our preliminary results are very encouraging. On the large datasets, even with G = 100 (or larger), there is essentially no loss of test accuracy. On the moderate datasets, no obvious loss of test accuracy is observed when G <= 20~50. Therefore, aided by this heuristic, it is promising that abc-boost will be a practical tool for accurate multi-class classification.
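
The gap heuristic amounts to a change in the training schedule: the expensive base-class search runs only at every G-th boosting step, and the last choice is reused in between. The Python skeleton below shows that schedule only; `choose_base_class` and `boost_step` are hypothetical placeholder callbacks, not the actual abc-boost routines.

```python
def train_with_gap(num_steps, G, choose_base_class, boost_step):
    """Re-select the base class only every G boosting steps (G = 1 searches at every step).
    Returns how many times the expensive search was run."""
    base = None
    searches = 0
    for m in range(num_steps):
        if m % G == 0:
            base = choose_base_class(m)  # placeholder for the exhaustive search
            searches += 1
        boost_step(m, base)              # placeholder for one boosting update
    return searches

# Stub callbacks, used only to count how often the search would run
def pick(m):
    return 0

def step(m, base):
    return None

print(train_with_gap(1000, 1, pick, step), train_with_gap(1000, 100, pick, step))
```

With 1000 boosting steps, G = 100 cuts the number of base-class searches from 1000 to 10 in this skeleton.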

preprint 2010 arXiv

On Practical Algorithms for Entropy Estimation and the Improved Sample Complexity of Compressed Counting

Estimating the p-th frequency moment of a data stream is a very heavily studied problem. The problem is actually trivial when p = 1, assuming the strict Turnstile model. The sample complexity of our proposed algorithm is essentially O(1) near p = 1. This is a very large improvement over the previously believed O(1/eps^2) bound. The proposed algorithm makes the long-standing problem of entropy estimation an easy task, as verified by the experiments included in the appendix.

161 published item(s)

Research on Vision-Language Question Answering Models for Industrial Robots

Tidally Trapped Two-pole Pulsations Discovered in a Close Binary with a Massive $β$ Cephei Star

Origin of zigzag antiferromagnetic orders in XPS3 (X= Fe, Ni) monolayers

Defending Backdoor Attacks on Vision Transformer via Patch Processing

Exploration in Model-based Reinforcement Learning with Randomized Reward

Sharper Analysis for Minibatch Stochastic Proximal Point Methods: Stability, Smoothness, and Deviation

$k$-Median Clustering via Metric Embedding: Towards Better Initialization with Differential Privacy

A Class of Two-Timescale Stochastic EM Algorithms for Nonconvex Latent Variable Models

A Tale of Two Flows: Cooperative Learning of Langevin Flow and Normalizing Flow Toward Energy-Based Model

Best Subset Selection with Efficient Primal-Dual Algorithm

Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images

Boosting the Confidence of Generalization for $L_2$-Stable Randomized Learning Algorithms

Breaking the Linear Error Barrier in Differentially Private Graph Distance Release

Catoni-style Confidence Sequences under Infinite Variance

CGAR: Critic Guided Action Redistribution in Reinforcement Leaning

Channel Estimation for Wideband MmWave MIMO OFDM System Exploiting Block Sparsity

Communication-Efficient TeraByte-Scale Model Training Framework for Online Advertising

Discriminative Similarity for Data Clustering

Distances Release with Differential Privacy in Tree and Grid Graph

Fast ABC-Boost: A Unified Framework for Selecting the Base Class in Multi-Class Classification

Finding Control Synthesis for Kinematic Shortest Paths

From a few Accurate 2D Correspondences to 3D Point Clouds

GCWSNet: Generalized Consistent Weighted Sampling for Scalable and Accurate Training of Neural Networks

Integrity Authentication in Tree Models

Intriguing Magnetoelectric Effect in Two-dimensional Ferromagnetic/Perovskite Oxide Ferroelectric Heterostructure

Intrinsic anomalous spin Hall effect

Layer-wise and Dimension-wise Locally Adaptive Federated Learning

Low-dose CT reconstruction by self-supervised learning in the projection domain

MISF: Multi-level Interactive Siamese Filtering for High-Fidelity Image Inpainting

Multi-view Geometry: Correspondences Refinement Based on Algebraic Properties

NL2GDPR: Automatically Develop GDPR Compliant Android Application Features from Natural Language

Noisy $\ell^{0}$-Sparse Subspace Clustering on Dimensionality Reduced Data

Offline Change Detection under Contamination

On Convergence of FedProx: Local Dissimilarity Invariant Bounds, Non-smoothness and Beyond

On Distributed Adaptive Optimization with Gradient Compression

On the Power-Law Hessian Spectrums in Deep Learning

One Loss for Quantization: Deep Hashing with Discrete Wasserstein Distributional Matching

Package for Fast ABC-Boost

pGMM Kernel Regression and Comparisons with Boosted Trees

Planar Turán number of disjoint union of $C_3$ and $C_4$

Planning and Formulations in Pursuit-Evasion: Keep-away Games and Their Strategies

Planning through Workspace Constraint Satisfaction and Optimization

Proximity Graph Maintenance for Fast Online Nearest Neighbor Search

Quantum Anomalous Hall and Valley Quantum Anomalous Hall Effects in Two-Dimensional d0 Orbital XY Monolayers

Regression with Label Permutation in Generalized Linear Model

SpaceE: Knowledge Graph Embedding by Relational Linear Transformation in the Entity Space

Stability and Risk Bounds of Iterative Hard Thresholding

Variational Flow Graphical Model

Absorption of massless scalar field by furry black holes in de Rham-Gabadadze-Tolley theory

High-Order Statistical Functional Expansion and Its Application To Some Nonsmooth Problems

Quantization Algorithms for Random Fourier Features

Simulation on the Transparency of Electrons and Ion Back Flow for a Time Projection Chamber based on Staggered Multiple THGEMs

A Two-Stage Approach to Multivariate Linear Regression with Sparsely Mismatched Data

Adaptive Randomization in Network Data

An Inverse-free Truncated Rayleigh-Ritz Method for Sparse Generalized Eigenvalue Problem

Characterization on projective submanifolds of codimensions 2 and 3

Chern class inequalities on polarized manifolds and nef vector bundles

Cluster-Adaptive Network A/B Testing: From Randomization to Estimation

Complexity results for two kinds of colored disconnections of graphs

Cooperative Rate-Splitting for Secrecy Sum-Rate Enhancement in Multi-antenna Broadcast Channels

Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems

Distributed Primal-Dual Optimization for Online Multi-Task Learning

FedSKETCH: Communication-Efficient and Private Federated Learning via Sketching

Generalization Bounds for High-dimensional M-estimation under Sparsity Constraint

IVFS: Simple and Efficient Feature Selection for High Dimensional Topology Preservation

MeDaS: An open-source platform as service to help break the walls between medicine and informatics

Meta-CoTGAN: A Meta Cooperative Training Paradigm for Improving Adversarial Text Generation

Monochromatic disconnection: Erdős-Gallai-type problems and product graphs

Nonnegative Hermitian vector bundles and Chern numbers

Rainbow monochromatic $k$-edge-connection colorings of graphs

Randomized Kernel Multi-view Discriminant Analysis

RGB-D SLAM in Dynamic Environments Using Point Correlations

Selective Convolutional Network: An Efficient Object Detector with Ignoring Background

Solving the Robust Matrix Completion Problem via a System of Nonlinear Equations