Source author record

Han Liu

Han Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

113 works

35 topics

4 close collaborators

preprint · 2026 · arXiv

Low Rank Adaptation for Adversarial Perturbation

Low-Rank Adaptation (LoRA), which leverages the insight that model updates typically reside in a low-dimensional space, has significantly improved the training efficiency of Large Language Models (LLMs) by updating neural network layers using low-rank matrices. Since the generation of adversarial examples is an optimization process analogous to model training, this naturally raises the question: Do adversarial perturbations exhibit a similar low-rank structure? In this paper, we provide both theoretical analysis and extensive empirical investigation across various attack methods, model architectures, and datasets to show that adversarial perturbations indeed possess an inherently low-rank structure. This insight opens up new opportunities for improving both adversarial attacks and defenses. We mainly focus on leveraging this low-rank property to improve the efficiency and effectiveness of black-box adversarial attacks, which often suffer from excessive query requirements. Our method follows a two-step approach. First, we use a reference model and auxiliary data to guide the projection of gradients into a low-dimensional subspace. Next, we confine the perturbation search in black-box attacks to this low-rank subspace, significantly improving the efficiency and effectiveness of the adversarial attacks. We evaluated our approach across a range of attack methods, benchmark models, datasets, and threat models. The results demonstrate substantial and consistent improvements in the performance of our low-rank adversarial attacks compared to conventional methods.
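The authors' implementation is not reproduced here. As a hedged illustration of the second step, confining a search direction to a low-dimensional subspace, the sketch below orthonormalizes a few reference directions with Gram-Schmidt and projects a full gradient onto their span. The reference vectors are hypothetical stand-ins for directions a reference model might supply, and `orthonormalize` and `project` are illustrative names, not the paper's API.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(vectors: List[Vector], eps: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:  # drop vectors that are already in the span
            basis.append([wi / norm for wi in w])
    return basis


def project(g: Vector, basis: List[Vector]) -> Vector:
    """Orthogonal projection onto span(basis): sum_k <g, q_k> q_k."""
    out = [0.0] * len(g)
    for q in basis:
        c = dot(g, q)
        out = [oi + c * qi for oi, qi in zip(out, q)]
    return out


# Hypothetical directions from a reference model in a 4-D input space.
refs = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
basis = orthonormalize(refs)
g = [3.0, -2.0, 5.0, 1.0]  # e.g. a full-dimensional gradient estimate
g_low = project(g, basis)
print(len(basis), g_low)  # 2 [3.0, -2.0, 0.0, 0.0]
```

Under this assumption, a black-box attack would search only over coefficients in the small basis rather than over every input coordinate, which is where query savings would come from.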

preprint · 2023 · arXiv

Beyond PID Controllers: PPO with Neuralized PID Policy for Proton Beam Intensity Control in Mu2e

We introduce a novel Proximal Policy Optimization (PPO) algorithm aimed at maintaining uniform proton beam intensity delivery in the Muon to Electron Conversion Experiment (Mu2e) at Fermi National Accelerator Laboratory (Fermilab). Our primary objective is to regulate the spill process to ensure a consistent intensity profile, with the ultimate goal of creating an automated controller capable of providing real-time feedback and calibration of the Spill Regulation System (SRS) parameters on a millisecond timescale. We treat the Mu2e accelerator system as a Markov Decision Process suitable for Reinforcement Learning (RL), using PPO to reduce bias and enhance training stability. A key innovation in our approach is the integration of a neuralized Proportional-Integral-Derivative (PID) controller into the policy function, which yields a significant 13.6% improvement in the Spill Duty Factor (SDF), surpassing the current PID controller baseline by an additional 1.6%. This paper presents preliminary offline results based on a differentiable simulator of the Mu2e accelerator. It lays the groundwork for real-time implementations and applications, representing a crucial step towards automated proton beam intensity control for the Mu2e experiment.
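For context on the baseline being improved upon, here is a minimal discrete PID controller together with a duty-factor helper. It assumes the common slow-extraction definition SDF = ⟨I⟩²/⟨I²⟩, which equals 1.0 for a perfectly flat spill; the gains, time step, and names are illustrative, and the paper's neuralized PID policy trained with PPO is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, setpoint: float, measured: float) -> float:
        """Control output u = kp*e + ki*integral(e) + kd*de/dt (discrete)."""
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def spill_duty_factor(intensity: List[float]) -> float:
    """Assumed definition <I>^2 / <I^2>; 1.0 for a perfectly flat spill."""
    n = len(intensity)
    mean = sum(intensity) / n
    mean_sq = sum(x * x for x in intensity) / n
    return mean * mean / mean_sq


pid = PID(kp=0.5, ki=0.1, kd=0.0, dt=1.0)
print(pid.step(setpoint=1.0, measured=0.8))
print(spill_duty_factor([1.0, 1.0, 1.0, 1.0]), spill_duty_factor([2.0, 0.0, 2.0, 0.0]))
```

A spiky spill such as `[2.0, 0.0, 2.0, 0.0]` has the same mean intensity as a flat one but half the duty factor, which is why the metric rewards uniform delivery.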

preprint · 2023 · arXiv

HS-GCN: Hamming Spatial Graph Convolutional Networks for Recommendation

An efficient solution for large-scale recommender systems is to represent users and items as binary hash codes in the Hamming space. To this end, existing methods tend to code users by modeling their Hamming similarities with the items they have historically interacted with, which we term first-order similarities. Despite their efficiency, these methods suffer from suboptimal representational capacity, since they forgo the correlations established by connecting multiple first-order similarities, i.e., the relations among indirect instances, which we define as high-order similarity. To tackle this drawback, we propose to model both first- and high-order similarities in the Hamming space through the user-item bipartite graph. Specifically, we develop a novel learning-to-hash framework, namely Hamming Spatial Graph Convolutional Networks (HS-GCN), which explicitly models the Hamming similarity and embeds it into the codes of users and items. Extensive experiments on three public benchmark datasets demonstrate that our proposed model significantly outperforms several state-of-the-art hashing models and achieves performance comparable to real-valued recommendation models.
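Part of the efficiency of Hamming-space recommenders comes from a simple identity: for codes in {-1, +1}^K, the Hamming distance is d_H = (K - ⟨b_u, b_i⟩)/2, so ranking items by Hamming similarity is equivalent to ranking by inner product. A small sketch under that ±1 convention (an assumption; the toy codes are hypothetical, and HS-GCN's graph convolution is not shown):

```python
from typing import List

Code = List[int]  # binary hash code with entries in {-1, +1}


def hamming_distance(a: Code, b: Code) -> int:
    return sum(1 for x, y in zip(a, b) if x != y)


def inner_product(a: Code, b: Code) -> int:
    return sum(x * y for x, y in zip(a, b))


def hamming_similarity(a: Code, b: Code) -> float:
    """Fraction of matching bits, 1 - d_H / K."""
    return 1.0 - hamming_distance(a, b) / len(a)


user = [1, -1, 1, 1, -1, 1, -1, -1]
items = {"i1": [1, -1, 1, 1, -1, 1, -1, 1], "i2": [-1, 1, 1, -1, -1, 1, 1, -1]}
K = len(user)
for name, code in items.items():
    # For +/-1 codes the identity d_H = (K - <u, i>) / 2 always holds.
    assert hamming_distance(user, code) == (K - inner_product(user, code)) // 2

ranking = sorted(items, key=lambda n: hamming_similarity(user, items[n]), reverse=True)
print(ranking)  # ['i1', 'i2']
```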

preprint · 2023 · arXiv

STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction

We present STanHop-Net (Sparse Tandem Hopfield Network) for multivariate time series prediction with memory-enhanced capabilities. At the heart of our approach is STanHop, a novel Hopfield-based neural network block that sparsely learns and stores both temporal and cross-series representations in a data-dependent fashion. In essence, STanHop sequentially learns temporal and cross-series representations using two tandem sparse Hopfield layers. In addition, STanHop incorporates two external memory modules, a Plug-and-Play module and a Tune-and-Play module, for train-less and task-aware memory enhancement, respectively. These modules allow STanHop-Net to respond swiftly to certain sudden events. Methodologically, we construct STanHop-Net by stacking STanHop blocks hierarchically, enabling multi-resolution feature extraction with resolution-specific sparsity. Theoretically, we introduce a sparse extension of the modern Hopfield model (the Generalized Sparse Modern Hopfield Model) and show that it yields a tighter memory retrieval error bound than its dense counterpart without sacrificing memory capacity. Empirically, we validate the efficacy of our framework in both synthetic and real-world settings.
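Sparse modern Hopfield models replace the softmax in the dense retrieval rule with a sparse normalizer. The sketch below shows one retrieval step with sparsemax, one sparse choice in that family, assuming the update x_new = Σ_μ p_μ x_μ with p = sparsemax(β⟨x_μ, query⟩). The stored patterns, β, and function names are illustrative, and this is not the STanHop block itself.

```python
import math
from typing import Callable, List


def sparsemax(z: List[float]) -> List[float]:
    """Euclidean projection of z onto the probability simplex."""
    zs = sorted(z, reverse=True)
    cumsum, k_star, support_sum = 0.0, 0, 0.0
    for k, zk in enumerate(zs, start=1):
        cumsum += zk
        if 1.0 + k * zk > cumsum:  # z_(k) is still inside the support
            k_star, support_sum = k, cumsum
    tau = (support_sum - 1.0) / k_star
    return [max(zi - tau, 0.0) for zi in z]


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def retrieve(patterns: List[List[float]], query: List[float], beta: float,
             normalize: Callable[[List[float]], List[float]] = sparsemax) -> List[float]:
    """One step: x_new = sum_mu p_mu * x_mu, p = normalize(beta * <x_mu, query>)."""
    scores = [beta * sum(a * b for a, b in zip(x, query)) for x in patterns]
    weights = normalize(scores)
    return [sum(w * x[d] for w, x in zip(weights, patterns)) for d in range(len(query))]


patterns = [[1.0, 1.0, -1.0, -1.0], [1.0, -1.0, 1.0, -1.0], [-1.0, 1.0, 1.0, -1.0]]
query = [1.0, 0.5, -1.0, -0.5]  # noisy version of the first pattern
print(retrieve(patterns, query, beta=1.0))  # sparse: exactly the first pattern
print(retrieve(patterns, query, beta=1.0, normalize=softmax))  # dense: a blend of all three
```

With sparsemax, the weights on the two unrelated patterns are exactly zero, so they contribute nothing to the retrieved vector; softmax always leaves them a small positive weight.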

preprint · 2022 · arXiv

A Multi-rater Comparative Study of Automatic Target Localization Methods for Epilepsy Deep Brain Stimulation Procedures

Epilepsy is the fourth most common neurological disorder and affects people of all ages worldwide. Deep Brain Stimulation (DBS) has emerged as an alternative treatment option when anti-epileptic drugs or resective surgery do not lead to satisfactory outcomes. To facilitate planning of the procedure and its standardization, it is desirable to develop an algorithm that automatically localizes the DBS stimulation target, i.e., the Anterior Nucleus of the Thalamus (ANT), which is a challenging target to plan. In this work, we perform an extensive comparative study by benchmarking various localization methods for ANT-DBS. Specifically, the methods in this study include a traditional registration method and deep-learning-based methods, namely heatmap matching and the differentiable spatial to numerical transform (DSNT). Our experimental results show that deep-learning (DL)-based localization methods trained with pseudo labels can achieve performance comparable to the inter-rater and intra-rater variability, and that they are orders of magnitude faster than traditional methods.
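DSNT turns a heatmap into coordinates by treating the normalized heatmap as a probability distribution over pixels and taking the expected location, which keeps the mapping differentiable. A minimal 2D sketch, assuming softmax normalization and raw pixel-index coordinates (the original DSNT formulation uses normalized coordinates, and ANT localization is a 3D problem); `dsnt_2d` is a hypothetical helper.

```python
import math
from typing import List, Tuple


def dsnt_2d(logits: List[List[float]]) -> Tuple[float, float]:
    """Expected (row, col) under a softmax-normalized heatmap."""
    m = max(max(row) for row in logits)  # subtract the max for numerical stability
    weights = [[math.exp(v - m) for v in row] for row in logits]
    total = sum(sum(row) for row in weights)
    r = sum(i * w for i, row in enumerate(weights) for w in row) / total
    c = sum(j * w for row in weights for j, w in enumerate(row)) / total
    return r, c


# A peak that straddles two pixels in the middle row.
heat = [[0.0, 0.0, 0.0], [0.0, 5.0, 5.0], [0.0, 0.0, 0.0]]
print(dsnt_2d(heat))
```

The predicted column lands between the two peak pixels (close to 1.5), a sub-pixel estimate that an argmax over the heatmap could not produce.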

preprint · 2022 · arXiv

A Simple Meta-learning Paradigm for Zero-shot Intent Classification with Mixture Attention Mechanism

Zero-shot intent classification is a vital and challenging task in dialogue systems; it aims to handle numerous fast-emerging unseen intents without annotated training data. Obtaining more satisfactory performance hinges on two aspects: extracting better utterance features and strengthening the model's generalization ability. In this paper, we propose a simple yet effective meta-learning paradigm for zero-shot intent classification. To learn better semantic representations for utterances, we introduce a new mixture attention mechanism, which encodes pertinent word occurrence patterns by leveraging distributional signature attention and multi-layer perceptron attention simultaneously. To strengthen the model's ability to transfer from seen classes to unseen classes, we reformulate zero-shot intent classification with a meta-learning strategy, which trains the model by simulating multiple zero-shot classification tasks on seen categories and promotes generalization with a meta-adapting procedure on simulated unseen categories. Extensive experiments on two real-world dialogue datasets in different languages show that our model outperforms other strong baselines on both standard and generalized zero-shot intent classification tasks.

preprint · 2022 · arXiv

An Optical Parametric Amplifier via $\chi^{(2)}$ in AlGaAs Waveguides

We report parametric gain obtained by utilizing $\chi^{(2)}$ nonlinearities in a semiconductor Bragg Reflection Waveguide (BRW) chip. Under two-mode degenerate type-II phase matching, it can be shown that more than 18 dB of parametric gain for both TE and TM modes is attainable in hundreds of micrometers of device length. Polarization-insensitive parametric gain can be attained within the 1550 nm region of the spectrum. These AlGaAs BRWs exhibit sub-photon-per-pulse sensitivity. This is in sharp contrast to other types of parametric gain devices that utilize $\chi^{(3)}$, where the pump wavelength is in the vicinity of the signal wavelength. This sensitivity, which reached 0.1 photon/pulse, can usher in a new era for on-chip quantum information processing using compact, micrometer-scale devices.

preprint · 2022 · arXiv

Bregman Proximal Langevin Monte Carlo via Bregman–Moreau Envelopes

We propose efficient Langevin Monte Carlo algorithms for sampling distributions with nonsmooth convex composite potentials, which are the sum of a continuously differentiable function and a possibly nonsmooth function. We devise such algorithms by leveraging recent advances in convex analysis and optimization methods involving Bregman divergences, namely the Bregman–Moreau envelopes and the Bregman proximity operators, and in Langevin Monte Carlo algorithms reminiscent of mirror descent. The proposed algorithms extend existing Langevin Monte Carlo algorithms in two aspects: the ability to sample nonsmooth distributions with mirror descent-like algorithms, and the use of the more general Bregman–Moreau envelope in place of the Moreau envelope as a smooth approximation of the nonsmooth part of the potential. A particular case of the proposed scheme is reminiscent of the Bregman proximal gradient algorithm. The efficiency of the proposed methodology is illustrated with various sampling tasks on which existing Langevin Monte Carlo methods are known to perform poorly.
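With the Euclidean mirror map ψ(x) = ½‖x‖², the Bregman–Moreau envelope reduces to the ordinary Moreau envelope, and the scheme specializes to Moreau-Yosida regularized unadjusted Langevin (MYULA). A one-dimensional sketch of that special case only, assuming the nonsmooth part g(x) = |x|, the smooth part f(x) = x²/2, and illustrative values for the step size γ and smoothing parameter λ; this is not the Bregman variant proposed in the paper.

```python
import math
import random
from typing import List


def prox_abs(x: float, lam: float) -> float:
    """Proximal operator of lam*|x| (soft thresholding)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def moreau_env_abs(x: float, lam: float) -> float:
    """Moreau envelope of |x|, which is the Huber function."""
    return x * x / (2 * lam) if abs(x) <= lam else abs(x) - lam / 2


def grad_env_abs(x: float, lam: float) -> float:
    """Gradient of the Moreau envelope: (x - prox(x)) / lam."""
    return (x - prox_abs(x, lam)) / lam


def myula(n_steps: int, gamma: float, lam: float, seed: int = 0) -> List[float]:
    """Approximate samples from pi(x) ∝ exp(-x^2/2 - |x|), with |x| smoothed."""
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n_steps):
        grad = x + grad_env_abs(x, lam)  # grad f(x) = x, plus the envelope gradient
        x = x - gamma * grad + math.sqrt(2 * gamma) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


samples = myula(20000, gamma=0.05, lam=0.1)
print(sum(samples) / len(samples))  # near 0, since the target is symmetric
```

Smaller λ makes the envelope a closer approximation of |x| but increases its gradient's Lipschitz constant (1/λ), which in turn forces a smaller step size γ for a stable chain.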

preprint · 2022 · arXiv

Cats: Complementary CNN and Transformer Encoders for Segmentation

Recently, deep learning methods have achieved state-of-the-art performance in many medical image segmentation tasks. Many of these are based on convolutional neural networks (CNNs). For such methods, the encoder is the key part for extracting global and local information from input images; the extracted features are then passed to the decoder to predict the segmentations. In contrast, several recent works show superior performance with transformers, which can better model long-range spatial dependencies and capture low-level details. However, a transformer used as the sole encoder underperforms on some tasks, where it cannot efficiently replace the convolution-based encoder. In this paper, we propose a model with double encoders for 3D biomedical image segmentation. Our model is a U-shaped CNN augmented with an independent transformer encoder. We fuse the information from the convolutional encoder and the transformer, and pass it to the decoder to obtain the results. We evaluate our methods on three public datasets from three different challenges: BTCV, MoDA and Decathlon. Compared to the state-of-the-art models with and without transformers on each task, our proposed method obtains higher Dice scores across the board.

preprint · 2022 · arXiv

Label-enhanced Prototypical Network with Contrastive Learning for Multi-label Few-shot Aspect Category Detection

Multi-label aspect category detection allows a given review sentence to contain multiple aspect categories; it has been shown to be more practical for sentiment analysis and is attracting increasing attention. As annotating large amounts of data is time-consuming and labor-intensive, data scarcity occurs frequently in real-world scenarios, which motivates multi-label few-shot aspect category detection. However, research on this problem is still in its infancy and few methods are available. In this paper, we propose a novel label-enhanced prototypical network (LPN) for multi-label few-shot aspect category detection. The highlights of LPN can be summarized as follows. First, it leverages label descriptions as auxiliary knowledge to learn more discriminative prototypes, which retain aspect-relevant information while eliminating the harmful effect of irrelevant aspects. Second, it integrates contrastive learning, which pulls sentences with the same aspect label together in the embedding space while pushing apart sentences with different aspect labels. In addition, it introduces a simple yet effective adaptive multi-label inference module that predicts the number of aspects in a sentence. Extensive experimental results on three datasets demonstrate that our proposed model LPN consistently achieves state-of-the-art performance.
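At the core of prototypical networks in general is a class prototype computed as the mean embedding of that class's support examples, with queries scored by their distance to each prototype. A minimal sketch of that baseline step with hypothetical 2-D embeddings; LPN's label-description enhancement and contrastive loss are not shown, and the fixed `k` here is an assumed stand-in for the aspect count that LPN's adaptive inference module would predict.

```python
from typing import Dict, List, Sequence

Vec = Sequence[float]


def prototype(support: List[Vec]) -> List[float]:
    """Mean embedding of a class's support examples."""
    dim = len(support[0])
    return [sum(v[d] for v in support) / len(support) for d in range(dim)]


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_top_k(query: Vec, support_sets: Dict[str, List[Vec]], k: int) -> List[str]:
    """Return the k aspect labels whose prototypes are closest to the query."""
    protos = {label: prototype(vs) for label, vs in support_sets.items()}
    ranked = sorted(protos, key=lambda label: sq_dist(query, protos[label]))
    return ranked[:k]


# Hypothetical sentence embeddings for three aspect categories.
support_sets = {
    "food": [[1.0, 0.0], [0.8, 0.2]],
    "service": [[0.0, 1.0], [0.2, 0.8]],
    "price": [[-1.0, -1.0], [-0.8, -1.2]],
}
print(predict_top_k([0.6, 0.5], support_sets, k=2))  # ['food', 'service']
```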

preprint · 2022 · arXiv

Learning to Infer Belief Embedded Communication

In multi-agent collaboration problems with communication, an agent's ability to encode its intention and interpret other agents' strategies is critical for planning its future actions. This paper introduces a novel algorithm called Intention Embedded Communication (IEC) to mimic an agent's language learning ability. IEC contains a perception module for decoding other agents' intentions in response to their past actions. It also includes a language generation module for learning implicit grammar during communication with two or more agents. Such grammar, by construction, should be compact for efficient communication. Both modules undergo conjoint evolution, similar to an infant's babbling that enables it to learn a language of choice by trial and error. We utilised three multi-agent environments, namely predator/prey, traffic junction and level-based foraging, and illustrate that such co-evolution enables us to learn 50% faster than state-of-the-art algorithms like MADDPG. Ablation studies further show that disabling the belief-inference module, the communication module, and the hidden states reduces model performance by 38%, 60% and 30%, respectively. Hence, we suggest that modelling other agents' behaviour helps an agent learn grammar faster and develop a language to communicate efficiently. We evaluate our method on a set of cooperative scenarios and show that it outperforms other multi-agent baselines. We also demonstrate that it is essential for agents to reason about others' states and to learn this ability through continuous communication.

preprint · 2022 · arXiv

ModDrop++: A Dynamic Filter Network with Intra-subject Co-training for Multiple Sclerosis Lesion Segmentation with Missing Modalities

Multiple Sclerosis (MS) is a chronic neuroinflammatory disease, and multi-modality MRIs are routinely used to monitor MS lesions. Many automatic MS lesion segmentation models have been developed and have reached human-level performance. However, most established methods assume the MRI modalities used during training are also available during testing, which is not guaranteed in clinical practice. Previously, a training strategy termed Modality Dropout (ModDrop) was applied to MS lesion segmentation to achieve state-of-the-art performance with missing modalities. In this paper, we present a novel method dubbed ModDrop++ to train a unified network adaptive to an arbitrary number of input MRI sequences. ModDrop++ upgrades the main idea of ModDrop in two key ways. First, we devise a plug-and-play dynamic head and adopt a filter scaling strategy to improve the expressiveness of the network. Second, we design a co-training strategy to leverage the intra-subject relation between full-modality and missing-modality data. Specifically, the intra-subject co-training strategy aims to guide the dynamic head to generate similar feature representations between the full- and missing-modality data from the same subject. We use two public MS datasets to show the superiority of ModDrop++. Source code and trained models are available at https://github.com/han-liu/ModDropPlusPlus.

preprint · 2022 · arXiv

Review Polarity-wise Recommender

Review-involved recommender systems, which utilize review information to enhance recommendation, have received increasing interest over the past few years. One advanced branch extracts salient aspects from textual reviews (i.e., the item attributes that users express) and combines them with matrix factorization. However, all existing approaches ignore the fact that semantically different reviews often include opposite aspect information. In particular, positive reviews usually express aspects that users prefer, while negative ones describe aspects that users reject. Ignoring this distinction may mislead recommender systems into making incorrect decisions when modeling user preferences. To this end, in this paper, we propose a Review Polarity-wise Recommender model, dubbed RPR, that treats reviews with different polarities discriminately. Specifically, positive and negative reviews are gathered separately and used to model the user-preferred and user-rejected aspects, respectively. Besides, to overcome the imbalance problem of semantically different reviews, we also develop an aspect-aware importance weighting approach to align the aspect importance for these two kinds of reviews. Extensive experiments conducted on eight benchmark datasets demonstrate the superiority of our model compared to a series of state-of-the-art review-involved baselines. Moreover, our method can provide explanations in real-world rating prediction scenarios.

preprint · 2022 · arXiv

Survival Prediction of Brain Cancer with Incomplete Radiology, Pathology, Genomics, and Demographic Data

Integrating cross-department multi-modal data (e.g., radiological, pathological, genomic, and clinical data) is ubiquitous in brain cancer diagnosis and survival prediction. To date, such integration is typically conducted by human physicians (and panels of experts), which can be subjective and semi-quantitative. Recent advances in multi-modal deep learning, however, have opened the door to making this process more objective and quantitative. Unfortunately, prior work using four modalities for brain cancer survival prediction is limited to a "complete modalities" setting (i.e., with all modalities available). Thus, there are still open questions on how to effectively predict brain cancer survival from incomplete radiological, pathological, genomic, and demographic data (e.g., one or more modalities might not be collected for a patient). For instance, should we use both complete and incomplete data, and, more importantly, how should we use those data? To answer these questions, we generalize multi-modal learning on cross-department multi-modal data to a missing-data setting. Our contribution is three-fold: 1) we introduce an optimal multi-modal learning with missing data (MMD) pipeline with optimized hardware consumption and computational efficiency; 2) we extend multi-modal learning on radiological, pathological, genomic, and demographic data to missing-data scenarios; 3) we collect a large-scale public dataset (with 962 patients) to systematically evaluate glioma tumor survival prediction using four modalities. The proposed method improved the C-index of survival prediction from 0.7624 to 0.8053.
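The C-index reported above measures how often a model orders patients' risks consistently with their observed survival times. A minimal sketch of Harrell's concordance index for right-censored data, assuming that higher risk scores mean shorter expected survival, that tied risk scores count as half-concordant, and that tied times are ignored; the function name and toy data are illustrative, not from the paper.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell's C: fraction of comparable pairs ordered correctly by risk."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Patient 3 is censored at t=6; one of five comparable pairs is misordered.
print(concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1]))  # 0.8
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported change from 0.7624 to 0.8053 on a familiar scale.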

preprint · 2022 · arXiv

Switch Trajectory Transformer with Distributional Value Approximation for Multi-Task Reinforcement Learning

We propose SwitchTT, a multi-task extension of Trajectory Transformer enhanced with two striking features: (i) exploiting a sparsely activated model to reduce computation cost in multi-task offline model learning and (ii) adopting a distributional trajectory value estimator that improves policy performance, especially in sparse-reward settings. These two enhancements make SwitchTT suitable for solving multi-task offline reinforcement learning problems, where model capacity is critical for absorbing the vast quantities of knowledge available in the multi-task dataset. More specifically, SwitchTT exploits the switch transformer model architecture for multi-task policy learning, allowing us to improve model capacity without a proportional increase in computation cost. Also, SwitchTT approximates the distribution rather than the expectation of trajectory value, mitigating the poor sample complexity of the Monte Carlo value estimator, especially in the sparse-reward setting. We evaluate our method using a suite of ten sparse-reward tasks from the gym-mini-grid environment. We show an improvement of 10% over Trajectory Transformer across 10-task learning and obtain up to a 90% increase in offline model training speed. Our results also demonstrate the advantage of the switch transformer model for absorbing expert knowledge and the importance of value distribution in evaluating trajectories.

preprint · 2022 · arXiv

Synthetic CT Skull Generation for Transcranial MR Imaging-Guided Focused Ultrasound Interventions with Conditional Adversarial Networks

Transcranial MRI-guided focused ultrasound (TcMRgFUS) is a therapeutic ultrasound method that noninvasively focuses sound through the skull onto a small region under MRI guidance. It is clinically approved to thermally ablate regions of the thalamus and is being explored for other therapies, such as blood-brain barrier opening and neuromodulation. To accurately target ultrasound through the skull, the transmitted waves must constructively interfere at the target region. However, the sound speed, density, and ultrasound attenuation vary across individuals' skulls, so optimal treatment planning requires patient-specific estimates of these parameters. CT imaging is currently the gold standard for estimating the acoustic properties of an individual skull during clinical procedures, but it exposes patients to radiation and increases the overall number of imaging procedures required for therapy. A method to estimate acoustic parameters in the skull without the need for CT would be desirable. Here, we synthesized CT images from routinely acquired T1-weighted MRI by using a 3D patch-based conditional generative adversarial network and evaluated the performance of the synthesized CT images for treatment planning with transcranial focused ultrasound. We compared the performance of synthetic CT to real CT images using Kranion and k-Wave acoustic simulation. Our work demonstrates the feasibility of replacing real CT with MR-synthesized CT for TcMRgFUS planning.

preprint · 2022 · arXiv

Wasserstein Distributionally Robust Optimization with Wasserstein Barycenters

In many applications in statistics and machine learning, data samples from multiple, possibly heterogeneous, sources have become increasingly available. Meanwhile, in distributionally robust optimization, we seek data-driven decisions that perform well under the most adverse distribution within a certain discrepancy of a nominal distribution constructed from data samples. However, it remains unclear how to achieve such distributional robustness in model learning and estimation when data samples from multiple sources are available. In this work, we propose constructing the nominal distribution in optimal transport-based distributionally robust optimization problems through the notion of the Wasserstein barycenter, as an aggregation of data samples from multiple sources. Under specific choices of the loss function, the proposed formulation admits a tractable reformulation as a finite convex program, with powerful finite-sample and asymptotic guarantees. As an illustrative example, we demonstrate with the problem of distributionally robust sparse inverse covariance matrix estimation for zero-mean Gaussian random vectors that our proposed scheme outperforms other widely used estimators in both the low- and high-dimensional regimes.
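The nominal distribution here is a Wasserstein barycenter of the sources. In one dimension, the 2-Wasserstein barycenter has a closed form: its quantile function is the weighted average of the sources' quantile functions. A minimal sketch for empirical distributions with equal sample sizes, an assumption that lets sorted samples serve directly as quantiles; the paper's multivariate, DRO-specific construction is not reproduced, and the data are toy values.

```python
import math
from typing import List, Sequence


def barycenter_1d(sources: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """W2 barycenter of equal-size 1-D empirical distributions (quantile averaging)."""
    n = len(sources[0])
    if any(len(s) != n for s in sources):
        raise ValueError("all sources must have the same number of samples")
    total = sum(weights)
    sorted_sources = [sorted(s) for s in sources]
    return [sum(w * s[k] for w, s in zip(weights, sorted_sources)) / total for k in range(n)]


def w2_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """W2 distance between two equal-size empirical distributions on the line."""
    sa, sb = sorted(a), sorted(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)) / len(sa))


source_a = [0.0, 1.0, 2.0]
source_b = [4.0, 6.0, 5.0]
bary = barycenter_1d([source_a, source_b], [1.0, 1.0])
print(bary)  # [2.0, 3.0, 4.0]
print(w2_1d(source_a, bary), w2_1d(source_b, bary))  # 2.0 2.0
```

With equal weights, the barycenter sits halfway between the two sources in Wasserstein distance, rather than being the two-bump mixture that simply pooling the samples would give.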

preprint · 2021 · arXiv

A Survey on Epistemic (Model) Uncertainty in Supervised Learning: Recent Advances and Applications

Quantifying the uncertainty of supervised learning models plays an important role in making more reliable predictions. Epistemic uncertainty, which is usually due to insufficient knowledge about the model, can be reduced by collecting more data or refining the learning models. Over the last few years, scholars have proposed many epistemic uncertainty handling techniques, which can be roughly grouped into two categories: Bayesian and ensemble methods. This paper provides a comprehensive review of epistemic uncertainty learning techniques in supervised learning over the last five years. We first decompose epistemic uncertainty into bias and variance terms. Then, we introduce a hierarchical categorization of epistemic uncertainty learning techniques along with their representative models. In addition, we present applications in areas such as computer vision (CV) and natural language processing (NLP), followed by a discussion of research gaps and possible future research directions.
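For the ensemble family, a common practical proxy for the variance term of epistemic uncertainty is disagreement among ensemble members, measured as the variance of their predictions around the ensemble mean. A minimal regression sketch, assuming each member is simply a callable; the toy members are hypothetical stand-ins for independently trained networks.

```python
from statistics import mean, pvariance
from typing import Callable, List, Tuple

Model = Callable[[float], float]


def ensemble_predict(models: List[Model], x: float) -> Tuple[float, float]:
    """Return (ensemble mean, variance across members) at input x."""
    preds = [m(x) for m in models]
    return mean(preds), pvariance(preds)


# Hypothetical members that agree near x = 0 but diverge away from it,
# mimicking disagreement outside the region covered by training data.
models: List[Model] = [
    lambda x: 2 * x,
    lambda x: 2 * x + 0.1 * x ** 2,
    lambda x: 2 * x - 0.1 * x ** 2,
]
print(ensemble_predict(models, 0.0))   # members agree: zero variance
print(ensemble_predict(models, 10.0))  # members disagree: large variance
```

This captures only the variance part of the decomposition; estimating the bias term would additionally require ground-truth targets.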

preprint · 2021 · arXiv

BLOCKEYE: Hunting For DeFi Attacks on Blockchain

Decentralized finance, i.e., DeFi, has become the most popular type of application on many public blockchains (e.g., Ethereum) in recent years. Compared to traditional finance, DeFi allows customers to flexibly participate in diverse blockchain financial services (e.g., lending, borrowing, collateralizing, and exchanging) via smart contracts at a relatively low cost of trust. However, the open nature of DeFi inevitably introduces a large attack surface, which is a severe threat to the security of participants' funds. In this paper, we propose BLOCKEYE, a real-time attack detection system for DeFi projects on the Ethereum blockchain. BLOCKEYE provides two key capabilities: (1) potentially vulnerable DeFi projects are identified by an automatic security analysis process, which performs symbolic reasoning on the data flow of important service states, e.g., asset price, and checks whether they can be externally manipulated; (2) a transaction monitor is then installed off-chain for each vulnerable DeFi project. Transactions sent not only to that project but also to other associated projects are collected for further security analysis. A potential attack is flagged if a violation of a critical invariant configured in BLOCKEYE is detected, e.g., a benefit that is obtained within a very short time and is far larger than the cost. We applied BLOCKEYE to several popular DeFi projects and discovered previously unreported potential security attacks. A video of BLOCKEYE is available at https://youtu.be/7DjsWBLdlQU.

preprint · 2021 · arXiv

Converse, Focus and Guess – Towards Multi-Document Driven Dialogue

We propose a novel task, Multi-Document Driven Dialogue (MD3), in which an agent guesses the target document that the user is interested in by leading a dialogue. To benchmark progress, we introduce a new dataset, GuessMovie, which contains 16,881 documents, each describing a movie, and 13,434 associated dialogues. Further, we propose the MD3 model. With the goal of guessing the target document, it converses with the user conditioned on both document engagement and user feedback. To incorporate large-scale external documents into the dialogue, it pretrains a document representation that is sensitive to the attributes it describes about an object. It then tracks the dialogue state by detecting the evolution of document belief and attribute belief, and finally optimizes the dialogue policy following the principles of decreasing entropy and increasing reward, which is expected to guess the user's target successfully in a minimum number of turns. Experiments show that our method significantly outperforms several strong baseline methods and is very close to human performance.

preprint · 2021 · arXiv

High-Temperature Structure Detection in Ferromagnets

This paper studies structure detection problems in high-temperature ferromagnetic (positive interactions only) Ising models. The goal is to distinguish whether the underlying graph is empty, i.e., the model consists of independent Rademacher variables, versus the alternative that the underlying graph contains a subgraph of a certain structure. We give matching upper and lower minimax bounds under which testing this problem is possible and impossible, respectively. Our results reveal that a key quantity called graph arboricity drives the testability of the problem. On the computational front, under a conjecture on the computational hardness of sparse principal component analysis, we prove that, unless the signal is strong enough, no polynomial-time tests are capable of testing this problem. To prove this result, we exhibit a way to give sharp inequalities for the even moments of sums of i.i.d. Rademacher random variables, which may be of independent interest.

preprint · 2020 · arXiv

"Why is 'Chicago' deceptive?" Towards Building Model-Driven Tutorials for Humans

To support human decision making with machine learning models, we often need to elucidate patterns embedded in the models that are unsalient, unknown, or counterintuitive to humans. While existing approaches focus on explaining machine predictions with real-time assistance, we explore model-driven tutorials to help humans understand these patterns in a training phase. We consider both tutorials with guidelines from scientific papers, analogous to current practices of science communication, and automatically selected examples from training data with explanations. We use deceptive review detection as a testbed and conduct large-scale, randomized human-subject experiments to examine the effectiveness of such tutorials. We find that tutorials indeed improve human performance, with and without real-time assistance. In particular, although deep learning provides better predictive performance than simple models, tutorials and explanations from simple models are more useful to humans. Our work suggests future directions for human-centered tutorials and explanations towards a synergy between humans and AI.

preprint · 2020 · arXiv

A Deep Learning based Wearable Healthcare IoT Device for AI-enabled Hearing Assistance Automation

With the recent boom in artificial intelligence (AI), particularly deep learning techniques, digital healthcare is one of the prevalent areas that could benefit from AI-enabled functionality. This research presents a novel AI-enabled Internet of Things (IoT) device built on the ESP-8266 platform that can help people with hearing impairment or deafness communicate with others in conversations. In the proposed solution, a server application leverages Google's online speech recognition service to convert the received conversations into text, which is then sent to a micro-display attached to glasses so that deaf users can read the conversation and converse normally with the general population. Furthermore, to raise alerts about traffic or other dangerous scenarios, an 'urban-emergency' classifier is developed using a deep learning model, Inception-v4, with transfer learning to detect and recognize alerting or alarming sounds, such as a horn or a fire alarm, and to generate text alerts for the user. Inception-v4 was trained on a consumer desktop PC and then deployed in the AI-based IoT application. The empirical results indicate that the developed prototype system achieves an accuracy of 92% for sound recognition and classification with real-time performance.

preprint2020arXiv

Covariance-based sample selection for heterogeneous data: Applications to gene expression and autism risk gene detection

Risk for autism can be influenced by genetic mutations in hundreds of genes. Based on findings showing that genes with highly correlated expression are functionally interrelated, "guilt by association" methods such as DAWN have been developed to identify these autism risk genes. Previous research has analyzed the BrainSpan dataset, which contains gene expression of brain tissues from varying regions and developmental periods. Because the spatiotemporal properties of brain tissue are known to affect the covariance of gene expression, previous studies have focused on only a specific subset of samples to avoid the issue of heterogeneity. This potentially reduces power when detecting risk genes. In this article, we develop a new method called COBS (COvariance-Based sample Selection) to find a larger and more homogeneous subset of samples that share the same population covariance matrix for the downstream DAWN analysis. To demonstrate COBS's effectiveness, we use genetic risk scores from two sequential data freezes obtained in 2014 and 2019. We show that COBS improves DAWN's ability to predict risk genes detected in the newer data freeze when the risk scores of the older data freeze are used as input.

preprint2020arXiv

Enhancing classical target detection performance using nonclassical Light

In this article, we demonstrate theoretically and experimentally how correlations generated in monolithic semiconductor quantum light sources can be exploited to enhance the performance of optical target detection. We discuss a prototype target detection protocol, the quantum time-correlation (QTC) detection protocol, which uses spontaneous parametric down-converted photon-pair sources. The QTC protocol requires only time-resolved photon-counting detection, which is phase-insensitive and therefore suitable for optical target detection. For comparison with the QTC detection protocol, we also consider a classical phase-insensitive target detection protocol based on intensity detection. We formulate the target detection problem as a probe light transmission estimation problem, and we quantify target detection performance with the Fisher information criterion and receiver operating characteristic analysis. Unlike classical target detection and ranging protocols, the probe photons in our QTC detection protocol are completely indistinguishable from the background noise and are therefore useful for covert ranging applications. Finally, our technological platform is highly scalable and tunable and thus amenable to the large-scale integration necessary for practical applications.
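The intuition behind time-correlation detection can be shown with a minimal coincidence-counting sketch. It assumes idealized, sorted photon timestamps in nanoseconds, a known round-trip delay and a hypothetical coincidence window; it ignores detector jitter, dead time, dark counts and the Fisher-information and ROC analysis, and it is not the authors' processing code. The function name and the timestamps are illustrative only.

```python
from bisect import bisect_left


def count_coincidences(reference_ns, probe_ns, window_ns, delay_ns=0.0):
    """Count probe clicks that have a reference (herald) click within
    +/- window_ns once the expected round-trip delay is removed.

    reference_ns must be sorted in ascending order. Each probe click is
    counted at most once.
    """
    count = 0
    for t in probe_ns:
        target = t - delay_ns  # align the probe click with its reference partner
        i = bisect_left(reference_ns, target - window_ns)
        if i < len(reference_ns) and reference_ns[i] <= target + window_ns:
            count += 1
    return count


reference = [100.0, 200.0, 300.0, 400.0]     # heralding detector clicks
probe = [150.5, 250.2, 333.0, 371.0, 450.1]  # returned partner photons plus two noise clicks
print(count_coincidences(reference, probe, window_ns=1.0, delay_ns=50.0))  # 3
print(count_coincidences(reference, probe, window_ns=1.0, delay_ns=0.0))   # 0
```

Uncorrelated background clicks rarely land inside the narrow window around a reference click, while the partner photon of a pair does, which is the reason a correlation-based receiver can pick the probe out of much stronger noise.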

preprint2020arXiv

Enhancing LIDAR performance metrics using continuous-wave photon-pair sources

To enhance LIDAR performance metrics such as target detection sensitivity, noise resilience and ranging accuracy, we exploit the strong temporal correlation within the photon pairs generated in continuous-wave pumped semiconductor waveguides. The enhancement attained with such non-classical sources is measured and compared to a corresponding target detection scheme based on simple photon-counting detection. The performance of both schemes is quantified by the estimation uncertainty and Fisher information of the probe photon transmission, a widely adopted sensing figure of merit. The target detection experiments are conducted with high probe channel loss ($\simeq 1-5\times10^{-5}$) and environmental noise up to 36 dB stronger than the detected probe power of $1.64\times 10^{-5}$ pW. The experimental results show significant advantages for the enhanced scheme, with up to 26.3 dB improvement in estimation uncertainty, which is equivalent to reducing the target detection time by a factor of 430 or to 146 times (21.6 dB) greater resilience to noise. We also experimentally demonstrate ranging with these non-classical photon pairs generated with a continuous-wave pump in the presence of strong noise and loss, achieving $\approx$5 cm distance resolution that is limited by the temporal resolution of the detectors.
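The decibel figures in this abstract are related by the usual power-ratio convention, factor $= 10^{\mathrm{dB}/10}$. A quick arithmetic check, assuming that convention applies to the quoted values (the helper names are illustrative):

```python
import math


def db_to_ratio(db):
    """Linear factor for a power-like quantity given in decibels."""
    return 10 ** (db / 10)


def ratio_to_db(ratio):
    """Decibel value of a linear power-like ratio."""
    return 10 * math.log10(ratio)


# 26.3 dB improvement in estimation uncertainty -> detection-time factor
print(round(db_to_ratio(26.3)))    # 427, consistent with the quoted factor of about 430
# 146x greater noise resilience -> decibels
print(round(ratio_to_db(146), 1))  # 21.6
```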

preprint2020arXiv

EQL -- an extremely easy to learn knowledge graph query language, achieving highspeed and precise search

EQL, also called Extremely Simple Query Language, can be widely used in knowledge graphs, precise search, strong artificial intelligence, databases, smart speakers, patent search and other fields. EQL follows the principle of minimalism in design and aims to be simple and easy to learn, so that anyone can master it quickly. EQL and lambda calculus are interconvertible, which reveals the mathematical nature of EQL and lays a solid foundation for its rigor and logical integrity. Together with a comprehensive knowledge graph system covering the world's commonsense knowledge, EQL can form the foundation of strong AI in the future and make up for current AI systems' lack of commonsense understanding of the world. EQL can be used not only by humans but also as a basic language for data query and data exchange between robots.

preprint2020arXiv

Few-shot Slot Tagging with Collapsed Dependency Transfer and Label-enhanced Task-adaptive Projection Network

In this paper, we explore slot tagging with only a few labeled support sentences (a.k.a. few-shot). Few-shot slot tagging faces a unique challenge compared to other few-shot classification problems, as it calls for modeling the dependencies between labels. But it is hard to apply previously learned label dependencies to an unseen domain because of the discrepancy between label sets. To tackle this, we introduce a collapsed dependency transfer mechanism into the conditional random field (CRF) to transfer abstract label dependency patterns as transition scores. In the few-shot setting, the emission score of the CRF can be calculated as a word's similarity to the representation of each label. To calculate such similarity, we propose a Label-enhanced Task-Adaptive Projection Network (L-TapNet) based on TapNet, a state-of-the-art few-shot classification model, by leveraging label name semantics in representing labels. Experimental results show that our model significantly outperforms the strongest few-shot learning baseline by 14.64 F1 points in the one-shot setting.
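The idea of transferring transitions over abstract labels can be sketched as follows: scores are defined once over collapsed labels (O, B, I, with a prefix `s` or `d` marking a transition within the same slot or to a different slot), expanded onto whatever concrete label set a new domain has, and combined with emission scores in Viterbi decoding. This is a simplified sketch under stated assumptions: the collapsed key scheme, all score values and the helper names are hypothetical, and the emissions are hand-set stand-ins for L-TapNet's word-label similarities.

```python
def abstract_key(prev, cur):
    """Collapse a concrete transition such as B-city -> I-city into an
    abstract one such as ("B", "sI"): 's' = same slot, 'd' = different slot."""
    p, c = prev[0], cur[0]  # 'O', 'B' or 'I'
    if p == "O" or c == "O":
        return (p, c)
    same = prev[2:] == cur[2:]  # compare slot names after the "B-"/"I-" prefix
    return (p, ("s" if same else "d") + c)


def expand_transitions(labels, abstract):
    """Build concrete transition scores for an unseen label set."""
    return {(a, b): abstract[abstract_key(a, b)] for a in labels for b in labels}


def viterbi(emissions, labels, trans):
    """Highest-scoring label sequence for a linear-chain CRF."""
    best = [{y: emissions[0][y] for y in labels}]
    back = []
    for t in range(1, len(emissions)):
        scores, ptr = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + trans[(p, y)])
            scores[y] = best[-1][prev] + trans[(prev, y)] + emissions[t][y]
            ptr[y] = prev
        best.append(scores)
        back.append(ptr)
    y = max(labels, key=lambda lab: best[-1][lab])
    path = [y]
    for ptr in reversed(back):  # follow back-pointers from the last token
        y = ptr[y]
        path.append(y)
    return path[::-1]


# Hypothetical abstract scores (in practice these would be learned on source domains).
ABSTRACT = {
    ("O", "O"): 0.5, ("O", "B"): 0.3, ("O", "I"): -5.0,
    ("B", "O"): 0.2, ("B", "sB"): -1.0, ("B", "dB"): 0.0, ("B", "sI"): 1.0, ("B", "dI"): -5.0,
    ("I", "O"): 0.3, ("I", "sB"): -1.0, ("I", "dB"): 0.0, ("I", "sI"): 0.8, ("I", "dI"): -5.0,
}
labels = ["O", "B-city", "I-city", "B-date", "I-date"]
trans = expand_transitions(labels, ABSTRACT)


def row(o, bc, ic, bd, id_):
    return dict(zip(labels, (o, bc, ic, bd, id_)))


# Hand-set emission scores for the tokens "fly to new york".
emissions = [row(1.0, 0, 0, 0, 0), row(1.0, 0, 0, 0, 0),
             row(0.3, 0.9, 0.1, 0.2, 0.0), row(0.2, 0.1, 0.6, 0.1, 0.7)]
print(viterbi(emissions, labels, trans))  # ['O', 'O', 'B-city', 'I-city']
```

Although "york" has a slightly higher emission score for I-date, the abstract penalty on a `dI` transition keeps the slot consistent. The same ABSTRACT table would serve any other slot names, which is what makes such scores transferable across label sets.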

preprint2020arXiv

Joint measurement of time-frequency entanglement via sum frequency generation

We propose, analyze, and evaluate a technique for the joint measurement of time-frequency entanglement between two photons. In particular, we show that the frequency sum and time difference of two photons can be measured simultaneously through the sum-frequency generation process, without measuring the time or frequency of each individual photon. We demonstrate the usefulness of this technique by using it to design a continuous-variable superdense coding protocol based on time-frequency entanglement and a quantum illumination protocol. Performance analysis of these two protocols suggests that jointly measuring the strong time-frequency entanglement of non-classical photon pairs can significantly enhance the performance of joint-measurement-based quantum communication and metrology protocols.

preprint2020arXiv

Label-Wise Document Pre-Training for Multi-Label Text Classification

A major challenge of multi-label text classification (MLTC) is to simultaneously exploit possible label differences and label correlations. In this paper, we tackle this challenge by developing a Label-Wise Pre-Training (LW-PT) method that produces a document representation with label-aware information. The basic idea is that a multi-label document can be represented as a combination of multiple label-wise representations, and that correlated labels always co-occur in the same or similar documents. LW-PT implements this idea by constructing label-wise document classification tasks and training label-wise document encoders. Finally, the pre-trained label-wise encoder is fine-tuned on the downstream MLTC task. Extensive experimental results validate that the proposed method has significant advantages over previous state-of-the-art models and is able to discover reasonable label relationships. The code is released to facilitate other researchers.

preprint2020arXiv

Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees

We propose a meta path planning algorithm named Neural Exploration-Exploitation Trees (NEXT) that learns from prior experience to solve new path planning problems in high-dimensional continuous state and action spaces. Compared to more classical sampling-based methods like RRT, our approach achieves much better sample efficiency in high dimensions and can benefit from prior experience of planning in similar environments. More specifically, NEXT exploits a novel neural architecture that can learn promising search directions from problem structures. The learned prior is then integrated into a UCB-type algorithm to achieve an online balance between exploration and exploitation when solving a new problem. We conduct thorough experiments showing that NEXT solves new planning problems with more compact search trees and significantly outperforms state-of-the-art methods on several benchmarks.

preprint2020arXiv

Neural Polysynthetic Language Modelling

Research in natural language processing commonly assumes that approaches that work well for English and other widely used languages are "language agnostic". In high-resource languages, especially those that are analytic, a common approach is to treat morphologically distinct variants of a common root as completely independent word types. This assumes that there are limited morphological inflections per root, and that the majority will appear in a large enough corpus for the model to adequately learn statistics about each form. Approaches like stemming, lemmatization, or subword segmentation are often used when either of those assumptions does not hold, particularly in the case of synthetic languages like Spanish or Russian that have more inflection than English. In the literature, languages like Finnish or Turkish are held up as extreme examples of complexity that challenge common modelling assumptions. Yet, when considering all of the world's languages, Finnish and Turkish are closer to the average case. When we consider polysynthetic languages (those at the extreme of morphological complexity), approaches like stemming, lemmatization, or subword modelling may not suffice. These languages have very high numbers of hapax legomena (words that occur only once in a corpus), showing the need for appropriate morphological handling of words, without which it is not possible for a model to capture enough word statistics. We examine the current state of the art in language modelling, machine translation, and text prediction for four polysynthetic languages: Guaraní, St. Lawrence Island Yupik, Central Alaskan Yupik, and Inuktitut. We then propose a novel framework for language modelling that combines knowledge representations from finite-state morphological analyzers with Tensor Product Representations in order to enable neural language models capable of handling the full range of typologically variant languages.
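The hapax legomena argument can be made concrete with a simple corpus statistic: the share of word types that appear exactly once. A minimal sketch, assuming whitespace tokenization and a toy English sentence rather than data from the paper; the function name is illustrative. The abstract's point is that this share is very high for polysynthetic languages, which leaves a word-level model with too few repeated forms to learn from.

```python
from collections import Counter


def hapax_ratio(tokens):
    """Share of word types that occur exactly once (hapax legomena)."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(counts)


tokens = "the cat saw the dog and the dog ran".split()
print(hapax_ratio(tokens))  # 4 of 6 types occur once, about 0.667
```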

preprint2020arXiv

Non-classical Semiconductor Photon Sources Enhancing the Performance of Classical Target Detection Systems

We demonstrate and analyze how deploying non-classical intensity correlations obtained from a monolithic semiconductor quantum photon source can enhance classical target detection systems. We do so by examining the advantages offered by non-classical correlations in a correlation-based target detection protocol. We experimentally demonstrate that, under the same conditions, the target contrast obtained from the protocol when non-classical correlations are used improves by up to 17.79 dB over the best classical intensity-correlation-based target detection protocol, under 29.69 dB channel loss and excess noise 13.40 dB stronger than the probe signal. We also assess how the strong frequency correlations within the non-classical photon pairs can be used to further enhance this protocol.

preprint2020arXiv

Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python

We describe a new library named picasso, which implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems (e.g., sparse linear regression, sparse logistic regression, sparse Poisson regression and scaled sparse linear regression) combined with efficient active set selection strategies. In addition, the library allows users to choose among different sparsity-inducing regularizers, including the convex $\ell_1$, nonconvex MCP and SCAD regularizers. The library is coded in C++ and has user-friendly R and Python wrappers. Numerical experiments demonstrate that picasso scales efficiently to large problems.
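Pathwise coordinate optimization can be illustrated for the $\ell_1$-penalized least-squares case. A minimal pure-Python sketch, assuming the objective $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$; it leaves out picasso's active set strategies, the MCP and SCAD regularizers and the non-Gaussian losses, and the function names are hypothetical, not the library's R or Python API.

```python
def soft_threshold(z, lam):
    """Proximal operator of lam * |.|."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, beta=None, n_iter=200, tol=1e-8):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.

    X is a list of n rows with p entries each; beta is an optional warm start.
    """
    n, p = len(X), len(X[0])
    beta = list(beta) if beta is not None else [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # inner product of column j with the partial residual that excludes j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def lasso_path(X, y, lambdas):
    """Solve along a decreasing lambda sequence, warm-starting each fit."""
    path, beta = [], None
    for lam in sorted(lambdas, reverse=True):
        beta = lasso_cd(X, y, lam, beta)
        path.append((lam, beta))
    return path


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.0, 2.0, 0.0]  # y depends on the first column only
for lam, beta in lasso_path(X, y, [1.0, 0.1]):
    print(lam, [round(b, 4) for b in beta])
# 1.0 [0.0, 0.0]
# 0.1 [1.8, 0.0]
```

Starting from a large $\lambda$, where the solution is all zeros, and warm-starting each smaller $\lambda$ from the previous solution is what makes the pathwise scheme cheap: each fit begins close to its optimum.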

preprint2020arXiv

SDFN: Segmentation-based Deep Fusion Network for Thoracic Disease Classification in Chest X-ray Images

This study aims to automatically diagnose thoracic diseases depicted on chest x-ray (CXR) images using deep convolutional neural networks. Existing methods generally use entire CXR images for training, but this strategy may suffer from two drawbacks. First, potential misalignment or the presence of irrelevant objects in the entire CXR images may introduce unnecessary noise and thus limit network performance. Second, the relatively low image resolution caused by resizing, a common preprocessing step for training neural networks, may lose image details, making it difficult to detect pathologies with small lesion regions. To address these issues, we present a novel method termed the segmentation-based deep fusion network (SDFN), which leverages domain knowledge and the higher-resolution information of local lung regions. Specifically, the local lung regions were identified and cropped by the Lung Region Generator (LRG). Two CNN-based classification models were then used as feature extractors to obtain discriminative features of the entire CXR images and the cropped lung region images. Lastly, the obtained features were fused by the feature fusion module for disease classification. On the NIH benchmark split of the Chest X-ray 14 Dataset, receiver operating characteristic (ROC) analyses showed that the developed method achieved more accurate disease classification than the available approaches. It was also found that SDFN could localize lesion regions more precisely than the traditional method.

preprint2020arXiv

Target Detection aided by Quantum Temporal Correlations: Theoretical Analysis and Experimental Validation

The detection of objects in the presence of significant background noise is a problem of fundamental interest in sensing. In this work, we theoretically analyze a prototype target detection protocol, the quantum temporal correlation (QTC) detection protocol, which we implement using spontaneous parametric down-converted photon-pair sources. The QTC detection protocol requires only time-resolved photon-counting detection, which is phase-insensitive and therefore suitable for optical target detection. For comparison with the QTC detection protocol, we also consider a classical phase-insensitive target detection protocol based on intensity detection that is practical in the optical regime. We formulate the target detection problem as a total probe photon transmission estimation problem and obtain an analytical expression for the receiver operating characteristic (ROC) curves. We carry out experiments using a semiconductor waveguide source that we developed and reported previously. The experimental results agree very well with the theoretical prediction. In particular, we find that under high environmental noise and loss, the QTC detection protocol can achieve performance comparable to that of the classical protocol (that is practical in the optical regime) but with a detection time $\simeq 57$ times shorter, in terms of the ROC curve metric. The performance of the QTC detection experimental setup could be further improved with higher transmission of the reference photon and lower detector timing uncertainty. Furthermore, the probe photons in the QTC detection protocol are completely indistinguishable from the background noise and are therefore useful for covert ranging applications. Finally, our technological platform is highly scalable as well as tunable and thus amenable to the large-scale integration necessary for practical applications.

preprint2020arXiv

The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation in R

This paper describes an R package named flare, which implements a family of new high dimensional regression methods (LAD Lasso, SQRT Lasso, $\ell_q$ Lasso, and Dantzig selector) and their extensions to sparse precision matrix estimation (TIGER and CLIME). These methods exploit different nonsmooth loss functions to gain modeling flexibility, estimation robustness, and tuning insensitivity. The solver is based on the alternating direction method of multipliers (ADMM). The flare package is coded in double precision C and called from R through a user-friendly interface. Memory usage is optimized by using sparse matrix output. The experiments show that flare is efficient and can scale up to large problems.

preprint2020arXiv

The huge Package for High-dimensional Undirected Graph Estimation in R

We describe an R package named huge, which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of Fortran, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) it offers more functions, such as data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) it allows the user to apply both lossless and lossy screening rules to scale up to large problems, trading off computational and statistical efficiency.
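As an illustration of what a lossless screening rule can look like, the sketch below uses the known exact block-diagonal screening property of the graphical lasso: the variables split into the connected components of the graph whose edges are the off-diagonal entries with $|S_{ij}| > \lambda$, and each component can then be solved as a separate, smaller problem. This is an assumption about the kind of rule meant here, not code taken from huge; the function name and the covariance matrix are hypothetical.

```python
def screening_blocks(S, lam):
    """Connected components of the graph with an edge (i, j), i != j,
    whenever |S[i][j]| > lam. Components can be fitted independently."""
    p = len(S)
    seen = [False] * p
    blocks = []
    for start in range(p):
        if seen[start]:
            continue
        stack, comp = [start], []
        seen[start] = True
        while stack:  # depth-first search from the current start node
            i = stack.pop()
            comp.append(i)
            for j in range(p):
                if j != i and not seen[j] and abs(S[i][j]) > lam:
                    seen[j] = True
                    stack.append(j)
        blocks.append(sorted(comp))
    return blocks


S = [[1.00, 0.60, 0.05, 0.00],
     [0.60, 1.00, 0.10, 0.02],
     [0.05, 0.10, 1.00, 0.00],
     [0.00, 0.02, 0.00, 1.00]]
print(screening_blocks(S, 0.20))  # [[0, 1], [2], [3]]
print(screening_blocks(S, 0.08))  # [[0, 1, 2], [3]]
```

Larger $\lambda$ values produce more, smaller blocks, so the savings grow exactly in the sparse regime where graph estimation is most often run.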

preprint2016arXiv

A Partially Linear Framework for Massive Heterogeneous Data

We consider a partially linear framework for modelling massive heterogeneous data. The major goal is to extract common features across all sub-populations while exploring the heterogeneity of each sub-population. In particular, we propose an aggregation-type estimator for the commonality parameter that possesses the (non-asymptotic) minimax optimal bound and asymptotic distribution as if there were no heterogeneity. This oracle result holds when the number of sub-populations does not grow too fast. A plug-in estimator for the heterogeneity parameter is further constructed and shown to possess the asymptotic distribution as if the commonality information were available. We also test for heterogeneity among a large number of sub-populations. All of the above results require each sub-estimation to be regularized as though it had the entire sample size. Our general theory applies to the divide-and-conquer approach that is often used to deal with massive homogeneous data. A technical by-product of this paper is statistical inference for general kernel ridge regression. Thorough numerical results are also provided to back up our theory.

preprint2016arXiv

A Unified Theory of Confidence Regions and Testing for High Dimensional Estimating Equations

We propose a new inferential framework for constructing confidence regions and testing hypotheses in statistical models specified by a system of high dimensional estimating equations. We construct an influence function by projecting the fitted estimating equations onto a sparse direction obtained by solving a large-scale linear program. Our main theoretical contribution is to establish a unified Z-estimation theory of confidence regions for high dimensional problems. Unlike existing methods, all of which require the specification of the likelihood or pseudo-likelihood, our framework is likelihood-free. As a result, our approach provides valid inference for a broad class of high dimensional constrained estimating equation problems that are not covered by existing methods. Such examples include noisy compressed sensing, instrumental variable regression, undirected graphical models, discriminant analysis and vector autoregressive models. We present detailed theoretical results for all these examples. Finally, we conduct thorough numerical simulations and a real data analysis to back up the developed theoretical results.

preprint2016arXiv

ECA: High Dimensional Elliptical Component Analysis in non-Gaussian Distributions

We present a robust alternative to principal component analysis (PCA), called elliptical component analysis (ECA), for analyzing high dimensional, elliptically distributed data. ECA estimates the eigenspace of the covariance matrix of the elliptical data. To cope with heavy-tailed elliptical distributions, it exploits a multivariate rank statistic. At the model level, we consider two settings: either the leading eigenvectors of the covariance matrix are non-sparse or they are sparse. Methodologically, we propose ECA procedures for both the non-sparse and sparse settings. Theoretically, we provide both non-asymptotic and asymptotic analyses quantifying the performance of ECA. In the non-sparse setting, we show that ECA's performance is closely related to the effective rank of the covariance matrix. In the sparse setting, the results are twofold: (i) we show that the sparse ECA estimator based on a combinatoric program attains the optimal rate of convergence; (ii) building on recent developments in estimating sparse leading eigenvectors, we show that a computationally efficient sparse ECA estimator attains the optimal rate of convergence under a suboptimal scaling.
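The multivariate rank statistic can be illustrated with the sample multivariate Kendall's tau matrix, $\widehat{K} = \binom{n}{2}^{-1}\sum_{i<j} \frac{(x_i - x_j)(x_i - x_j)^\top}{\|x_i - x_j\|_2^2}$, whose leading eigenvectors are used in place of those of the sample covariance in the non-sparse setting. A minimal sketch, assuming this standard definition; the sparse ECA estimators are not shown, and the helper names and data are hypothetical.

```python
import math
from itertools import combinations


def multivariate_kendall_tau(X):
    """Average over pairs of the outer product of the unit vector (x_i - x_j)/||x_i - x_j||.
    Pairs of identical points are skipped, since they carry no direction."""
    d = len(X[0])
    K = [[0.0] * d for _ in range(d)]
    pairs = 0
    for a, b in combinations(X, 2):
        diff = [ai - bi for ai, bi in zip(a, b)]
        sq = sum(v * v for v in diff)
        if sq == 0.0:
            continue
        for r in range(d):
            for c in range(d):
                K[r][c] += diff[r] * diff[c] / sq
        pairs += 1
    if pairs == 0:
        raise ValueError("need at least two distinct points")
    return [[v / pairs for v in row] for row in K]


def leading_eigenvector(M, n_iter=500):
    """Power iteration for the top eigenvector of a symmetric PSD matrix."""
    d = len(M)
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(M[r][c] * v[c] for c in range(d)) for r in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


X = [(-3.0, 0.1), (-1.0, -0.2), (0.0, 0.3), (2.0, -0.1), (4.0, 0.2), (1.0, 0.0)]
K = multivariate_kendall_tau(X)
print(round(K[0][0] + K[1][1], 12))  # 1.0: every pair contributes a unit-trace term
v = leading_eigenvector(K)
print([round(x, 3) for x in v])      # close to the first coordinate axis
```

Because every pair enters through a unit vector, a single extreme observation can shift $\widehat{K}$ only by a bounded amount, which is the source of robustness to heavy tails.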

preprint2016arXiv

Heterogeneity Adjustment with Applications to Graphical Model Inference

Heterogeneity is unwanted variation when analyzing aggregated datasets from multiple sources. Though different methods have been proposed for heterogeneity adjustment, no systematic theory exists to justify them. In this work, we propose a generic framework named ALPHA (short for Adaptive Low-rank Principal Heterogeneity Adjustment) to model, estimate, and adjust heterogeneity from the original data. Once the heterogeneity is adjusted, we are able to remove the biases of batch effects and to enhance inferential power by aggregating the homogeneous residuals from multiple sources. Under a pervasive assumption that the latent heterogeneity factors simultaneously affect a large fraction of observed variables, we provide a rigorous theory to justify the proposed framework. Our framework also allows the incorporation of informative covariates and appeals to the "blessing of dimensionality". As an illustrative application of this generic framework, we consider the problem of estimating a high-dimensional precision matrix for graphical model inference based on multiple datasets. We also provide thorough numerical studies on both synthetic datasets and a brain imaging dataset to demonstrate the efficacy of the developed theory and methods.

preprint2016arXiv

How2Sketch: Generating Easy-To-Follow Tutorials for Sketching 3D Objects

Accurately drawing 3D objects is difficult for untrained individuals, as it requires an understanding of perspective and its effects on geometry and proportions. Step-by-step tutorials break the complex task of sketching an entire object down into easy-to-follow steps that even a novice can follow. However, creating such tutorials requires expert knowledge and is time-consuming. As a result, the availability of tutorials for a given object or viewpoint is limited. How2Sketch addresses this problem by automatically generating easy-to-follow tutorials for arbitrary 3D objects. Given a segmented 3D model and a camera viewpoint, it computes a sequence of steps for constructing a drawing scaffold composed of geometric primitives, which helps the user draw the final contours in correct perspective and proportion. To make the drawing scaffold easy to construct, the algorithm solves for an ordering among the scaffolding primitives and explicitly makes small geometric modifications to the size and location of the object parts to simplify relative positioning. Technically, we formulate this scaffold construction as a single selection problem that simultaneously solves for the ordering and geometric changes of the primitives. We demonstrate our algorithm for generating tutorials on a variety of man-made objects and evaluate how easily the tutorials can be followed with a user study.

preprint2016arXiv

Max-Norm Optimization for Robust Matrix Recovery

This paper studies the matrix completion problem under arbitrary sampling schemes. We propose a new estimator incorporating both max-norm and nuclear-norm regularization, based on which we can conduct efficient low-rank matrix recovery using a random subset of entries observed with additive noise under general non-uniform and unknown sampling distributions. This method significantly relaxes the uniform sampling assumption imposed for the widely used nuclear-norm penalized approach, and makes low-rank matrix recovery feasible in more practical settings. Theoretically, we prove that the proposed estimator achieves fast rates of convergence under different settings. Computationally, we propose an alternating direction method of multipliers algorithm to efficiently compute the estimator, which bridges a gap between theory and practice of machine learning methods with max-norm regularization. Further, we provide thorough numerical studies to evaluate the proposed method using both simulated and real datasets.

preprint2016arXiv

Nonparametric Heterogeneity Testing For Massive Data

A massive dataset often consists of a growing number of (potentially) heterogeneous sub-populations. This paper is concerned with testing various forms of heterogeneity arising from massive data. In a general nonparametric framework, a set of testing procedures is designed to accommodate a growing number of sub-populations, denoted as $s$, while remaining computationally feasible. In theory, their null limit distributions are shown to be nearly chi-square with diverging degrees of freedom as long as $s$ does not grow too fast. Interestingly, we find that a lower bound on $s$ is needed to obtain a sufficiently powerful test, the so-called "blessing of aggregation". As a by-product, a type of homogeneity test is also proposed, with a test statistic aggregated over all sub-populations. Numerical results are presented to support our theory.

preprint2016arXiv

Provable Sparse Tensor Decomposition

We propose a novel sparse tensor decomposition method, the Tensor Truncated Power (TTP) method, that incorporates variable selection into the estimation of decomposition components. Sparsity is achieved via an efficient truncation step embedded in the tensor power iteration. Our method applies to a broad family of high dimensional latent variable models, including high dimensional Gaussian mixtures and mixtures of sparse regressions. A thorough theoretical investigation is further conducted. In particular, we show that the final decomposition estimator is guaranteed to achieve a local statistical rate, and we further strengthen this to the global statistical rate by introducing a proper initialization procedure. In high dimensional regimes, the obtained statistical rate significantly improves on those of existing non-sparse decomposition methods. The empirical advantages of TTP are confirmed in extensive simulations and two real applications: click-through rate prediction and high-dimensional gene clustering.
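The truncation step embedded in the tensor power iteration can be sketched for a single component of a symmetric third-order tensor: apply the power map $u \mapsto T(I, u, u)$, keep the $s$ largest-magnitude entries, renormalize and repeat. A minimal sketch, assuming a known sparsity level $s$ and a hand-picked starting vector; the paper's initialization procedure, the recovery of multiple components and tuning are not shown, and the function names and toy tensor are made up for illustration.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def truncate(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:s])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]


def truncated_tensor_power(T, u0, s, n_iter=50):
    """One sparse component of a symmetric 3-way tensor via the power map
    u -> T(I, u, u), followed by truncation to s nonzeros and renormalization."""
    d = len(T)
    u = normalize(truncate(u0, s))
    for _ in range(n_iter):
        w = [sum(T[i][j][k] * u[j] * u[k] for j in range(d) for k in range(d))
             for i in range(d)]
        u = normalize(truncate(w, s))
    return u


# Toy tensor: 2 * (a outer a outer a) with sparse a, plus a small symmetric perturbation.
a = normalize([3.0, 4.0, 0.0, 0.0, 0.0])
d = len(a)
T = [[[2.0 * a[i] * a[j] * a[k] + 0.01 * ((i + j + k) % 3 - 1)
       for k in range(d)] for j in range(d)] for i in range(d)]
u = truncated_tensor_power(T, [0.3, 0.5, 0.6, 0.2, 0.1], s=2)
print([round(x, 3) for x in u])  # support {0, 1}, close to a = [0.6, 0.8, 0, 0, 0]
```

The starting vector deliberately puts its largest weight on a coordinate outside the true support; one power step is enough for truncation to lock onto the correct support in this toy example.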

preprint2016arXiv

Statistical analysis of latent generalized correlation matrix estimation in transelliptical distribution

Correlation matrices play a key role in many multivariate methods (e.g., graphical model estimation and factor analysis). The current state of the art in estimating large correlation matrices focuses on the use of Pearson's sample correlation matrix. Although Pearson's sample correlation matrix enjoys various good properties under Gaussian models, it is not an effective estimator for heavy-tailed distributions. As a robust alternative, Han and Liu [J. Am. Stat. Assoc. 109 (2015) 275-287] advocated the use of a transformed version of the Kendall's tau sample correlation matrix in estimating the high dimensional latent generalized correlation matrix under the transelliptical distribution family (or elliptical copula). The transelliptical family assumes that after unspecified marginal monotone transformations, the data follow an elliptical distribution. In this paper, we study the theoretical properties of the Kendall's tau sample correlation matrix and its transformed version proposed in Han and Liu [J. Am. Stat. Assoc. 109 (2015) 275-287] for estimating the population Kendall's tau correlation matrix and the latent Pearson's correlation matrix under both spectral and restricted spectral norms. With regard to the spectral norm, we highlight the role of "effective rank" in quantifying the rate of convergence. With regard to the restricted spectral norm, we present for the first time a "sign sub-Gaussian condition", which is sufficient to guarantee that the rank-based correlation matrix estimator attains the fast rate of convergence. In both cases, no moment condition is needed.
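The rank-based estimator can be sketched as pairwise Kendall's tau followed by the sine transform $\sin(\frac{\pi}{2}\hat\tau_{jk})$, which, for elliptical and transelliptical families, maps the population Kendall's tau to the latent Pearson correlation. A minimal sketch, assuming no ties in the data and skipping any projection to a positive semidefinite matrix; the function names are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length samples (no tie correction)."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))


def latent_correlation_matrix(columns):
    """Rank-based estimate sin(pi/2 * tau) of the latent Pearson correlations."""
    d = len(columns)
    R = [[1.0] * d for _ in range(d)]
    for j in range(d):
        for k in range(j + 1, d):
            r = math.sin(math.pi / 2 * kendall_tau(columns[j], columns[k]))
            R[j][k] = R[k][j] = r
    return R


x = [0.1, 0.5, 1.2, 2.0, 3.3]
R = latent_correlation_matrix([x, [math.exp(t) for t in x], [-t for t in x]])
print(R[0][1], R[0][2])  # 1.0 -1.0: monotone transforms leave tau unchanged
```

Since only the signs of pairwise differences enter, the estimate is unchanged by any strictly increasing marginal transformation, which is exactly the invariance the transelliptical model requires.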

preprint2016arXiv

Typical dynamics of plane rational maps with equal degrees

Let $f:\mathbb{CP}^2\dashrightarrow\mathbb{CP}^2$ be a rational map with algebraic and topological degrees both equal to $d\geq 2$. Little is known in general about the ergodic properties of such maps. We show here, however, that for an open set of automorphisms $T:\mathbb{CP}^2\to\mathbb{CP}^2$, the perturbed map $T\circ f$ admits exactly two ergodic measures of maximal entropy $\log d$, one of saddle and one of repelling type. Neither measure is supported in an algebraic curve, and $T\circ f$ is 'fully two-dimensional' in the sense that it does not preserve any singular holomorphic foliation. Absence of an invariant foliation extends to all $T$ outside a countable union of algebraic subsets. Finally, we illustrate all of our results in a concrete instance connected with a two-dimensional version of the well-known quadratic Chebyshev map.

preprint2015arXiv

A General Framework for Robust Testing and Confidence Regions in High-Dimensional Quantile Regression

We propose a robust inferential procedure for assessing uncertainties of parameter estimation in high-dimensional linear models, where the dimension $p$ can grow exponentially fast with the sample size $n$. Our method combines the de-biasing technique with the composite quantile function to construct an estimator that is asymptotically normal. Hence it can be used to construct valid confidence intervals and conduct hypothesis tests. Our estimator is robust and does not require the existence of the first or second moment of the noise distribution. It also preserves efficiency in the sense that the worst-case efficiency loss is less than 30% compared to the square-loss-based de-biased Lasso estimator. In many cases our estimator is close to or better than the latter, especially when the noise is heavy-tailed. Our de-biasing procedure does not require solving the $L_1$-penalized composite quantile regression. Instead, it allows for any first-stage estimator with the desired convergence rate and empirical sparsity. The paper also provides new proof techniques for developing theoretical guarantees of inferential procedures with non-smooth loss functions. To establish the main results, we exploit the local curvature of the conditional expectation of the composite quantile loss and apply empirical process theory to control the difference between empirical quantities and their conditional expectations. Our results are established under weaker assumptions than existing work on inference for high-dimensional quantile regression. Furthermore, we consider a high-dimensional simultaneous test for the regression parameters by applying Gaussian approximation and multiplier bootstrap theories. We also study distributed learning and exploit the divide-and-conquer estimator to reduce computational complexity when the sample size is massive. Finally, we provide empirical results to verify the theory.
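The composite quantile loss that the estimator builds on can be written down directly. A minimal sketch, assuming $K$ equally spaced levels $\tau_k = k/(K+1)$ with one intercept $b_k$ per level, which is the usual composite quantile regression setup; it evaluates the loss for given residuals $r_i = y_i - x_i^\top\beta$ and does not implement the de-biasing, the bootstrap or any inference step. The function names are hypothetical.

```python
def check_loss(u, tau):
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def composite_quantile_loss(residuals, intercepts, taus):
    """(1/n) * sum_k sum_i rho_{tau_k}(r_i - b_k), with residuals r_i = y_i - x_i'beta."""
    n = len(residuals)
    total = 0.0
    for tau, b in zip(taus, intercepts):
        total += sum(check_loss(r - b, tau) for r in residuals)
    return total / n


K = 3
taus = [k / (K + 1) for k in range(1, K + 1)]  # [0.25, 0.5, 0.75]
print(composite_quantile_loss([-1.0, 0.0, 2.0], [0.0] * K, taus))  # 1.5
```

Each term grows only linearly in the residual, so no moment of the noise is needed for the loss to be well behaved, which matches the robustness claim above.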

preprint2015arXiv

A General Theory of Hypothesis Tests and Confidence Regions for Sparse High Dimensional Models

We consider the problem of uncertainty assessment for low dimensional components in high dimensional models. Specifically, we propose a decorrelated score function to handle the impact of high dimensional nuisance parameters. We consider both hypothesis tests and confidence regions for generic penalized M-estimators. Unlike most existing inferential methods, which are tailored to individual models, our approach provides a general framework for high dimensional inference and is applicable to a wide range of problems. From the testing perspective, we develop general theorems characterizing the limiting distributions of the decorrelated score test statistic under both the null hypothesis and local alternatives. These results provide asymptotic guarantees on the type I error and local power of the proposed test. Furthermore, we show that the decorrelated score function can be used to construct point and confidence region estimators that are semiparametrically efficient. We also generalize this framework in three directions. First, we extend it to high dimensional null hypotheses, where the number of parameters of interest can increase exponentially fast with the sample size. Second, we establish the theory under model misspecification. Third, we go beyond the likelihood framework by introducing a generalized score test based on general loss functions. Thorough numerical studies are conducted to back up the theoretical results.

preprint2015arXiv

A Likelihood Ratio Framework for High Dimensional Semiparametric Regression

We propose a likelihood ratio based inferential framework for high dimensional semiparametric generalized linear models. This framework addresses a variety of challenging problems in high dimensional data analysis, including incomplete data, selection bias, and heterogeneous multitask learning. Our work makes three main contributions. (i) We develop a regularized statistical chromatography approach to infer the parameter of interest under the proposed semiparametric generalized linear model without estimating the unknown base measure function. (ii) We propose a new framework to construct post-regularization confidence regions and tests for the low dimensional components of high dimensional parameters. Unlike existing post-regularization inferential methods, our approach is based on a novel directional likelihood. In particular, the framework naturally handles generic regularized estimators with nonconvex penalty functions, and it can be used to infer least false parameters under misspecified models. (iii) We develop new concentration inequalities and normal approximation results for U-statistics with unbounded kernels, which are of independent interest. We demonstrate the consequences of the general theory with a missing data example. Extensive simulation studies and real data analysis are provided to illustrate the proposed approach.

preprint2015arXiv

An Overview on the Estimation of Large Covariance and Precision Matrices

Estimating large covariance and precision matrices is fundamental in modern multivariate analysis. These problems arise in the statistical analysis of large panels of economic and financial data. The covariance matrix reveals marginal correlations between variables, while the precision matrix encodes conditional correlations between pairs of variables given the remaining variables. In this paper, we provide a selective review of several recent developments in estimating large covariance and precision matrices. We focus on two general approaches: rank-based methods and factor-model-based methods. Theories and applications of both approaches are presented. These methods are expected to be widely applicable to the analysis of economic and financial data.

preprint2015arXiv

Distributed Estimation and Inference with Statistical Guarantees

This paper studies hypothesis testing and parameter estimation in the context of the divide and conquer algorithm. In a unified likelihood based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from $k$ subsamples of size $n/k$, where $n$ is the sample size. In both low dimensional and high dimensional settings, we address the important question of how to choose $k$ as $n$ grows large, providing a theoretical upper bound on $k$ such that the information loss due to the divide and conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as a practically infeasible oracle with access to the full sample. Thorough numerical results are provided to back up the theory.
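
To make the aggregation step concrete, here is a minimal sketch that splits $n$ observations into $k$ blocks of size $n/k$, fits a least-squares slope on each block, and averages the $k$ slopes. The model (simple linear regression with Gaussian noise, where least squares coincides with the MLE), the simple averaging rule, and the function names are illustrative assumptions, not the paper's test statistics or estimators.

```python
import random
from statistics import fmean


def ols_slope(xs, ys):
    """Least-squares slope of y on x (with an intercept)."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def divide_and_conquer_slope(xs, ys, k):
    """Split into k contiguous blocks of size n // k and average the k block slopes."""
    m = len(xs) // k
    slopes = [ols_slope(xs[i * m:(i + 1) * m], ys[i * m:(i + 1) * m]) for i in range(k)]
    return fmean(slopes)


rng = random.Random(0)
n, true_slope = 20_000, 2.0
xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [1.0 + true_slope * x + rng.gauss(0, 1) for x in xs]

full = ols_slope(xs, ys)  # the full-sample ("oracle") estimate
for k in (1, 10, 100):
    print(k, round(divide_and_conquer_slope(xs, ys, k), 4), round(full, 4))
```

With $k = 1$ the procedure reduces to the full-sample fit; as $k$ grows, each block shrinks, which is why the paper's question of how large $k$ may grow with $n$ matters.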

preprint2015arXiv

Graphical Fermat's Principle and Triangle-Free Graph Estimation

We consider the problem of estimating undirected triangle-free graphs of high dimensional distributions. Triangle-free graphs form a rich graph family that allows arbitrary loopy structures except 3-cliques. For inferential tractability, we propose a graphical Fermat's principle to regularize the distribution family. This principle enforces the existence of a distribution-dependent pseudo-metric such that any two nodes are closer to each other than any two other nodes whose geodesic path includes them. Guided by this principle, we show that a greedy strategy is able to recover the true graph. The resulting algorithm requires only a pairwise distance matrix as input and is computationally even more efficient than calculating the minimum spanning tree. We consider graph estimation problems under different settings, including discrete and nonparametric distribution families. Thorough numerical results are provided to illustrate the usefulness of the proposed method.

preprint2015arXiv

High Dimensional Expectation-Maximization Algorithm: Statistical Optimization and Asymptotic Normality

We provide a general theory of the expectation-maximization (EM) algorithm for inferring high dimensional latent variable models. In particular, we make two contributions: (i) For parameter estimation, we propose a novel high dimensional EM algorithm which naturally incorporates sparsity structure into parameter estimation. With an appropriate initialization, this algorithm converges at a geometric rate and attains an estimator with the (near-)optimal statistical rate of convergence. (ii) Based on the obtained estimator, we propose new inferential procedures for testing hypotheses and constructing confidence intervals for low dimensional components of high dimensional parameters. For a broad family of statistical models, our framework establishes the first computationally feasible approach for optimal estimation and asymptotic inference in high dimensions. Our theory is supported by thorough numerical results.
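
As an illustration of how sparsity can be built into EM, the sketch below runs EM on an assumed symmetric two-component Gaussian mixture $y = z\beta^* + \varepsilon$, $z \in \{\pm 1\}$ equally likely, $\varepsilon \sim N(0, \sigma^2 I)$ with known $\sigma$, and follows each M-step with a truncation step that keeps the $s$ largest entries. The model, the known sparsity level $s$, the initialization near $\beta^*$, and the function names are all assumptions for illustration.

```python
import math
import random


def truncate(v, s):
    """Truncation step: keep the s largest-magnitude entries, zero the rest."""
    keep = set(sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:s])
    return [vj if j in keep else 0.0 for j, vj in enumerate(v)]


def truncated_em(Y, beta0, s, sigma=1.0, iters=20):
    """Sparse EM for y = z * beta + N(0, sigma^2 I), with z = +1 or -1 equally likely."""
    n, d = len(Y), len(beta0)
    beta = list(beta0)
    for _ in range(iters):
        new = [0.0] * d
        for y in Y:
            # E-step: 2 * P(z = +1 | y) - 1 equals tanh(<beta, y> / sigma^2)
            t = math.tanh(sum(b * yj for b, yj in zip(beta, y)) / sigma**2)
            for j in range(d):
                new[j] += t * y[j] / n
        # `new` is the M-step update; truncation then enforces s-sparsity
        beta = truncate(new, s)
    return beta


rng = random.Random(4)
d, s, n = 20, 3, 1000
beta_star = [2.0] * s + [0.0] * (d - s)
Y = []
for _ in range(n):
    z = rng.choice((-1.0, 1.0))
    Y.append([z * b + rng.gauss(0, 1) for b in beta_star])
beta0 = [b + rng.uniform(-0.5, 0.5) for b in beta_star]  # assumed initialization near beta*
beta_hat = truncated_em(Y, beta0, s)
```

The initialization matters here: EM for this mixture is only identifiable up to the sign of $\beta^*$, and the abstract's guarantees are likewise stated for an appropriate initialization.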

preprint2015arXiv

Large Covariance Estimation through Elliptical Factor Models

We propose a general Principal Orthogonal complEment Thresholding (POET) framework for large-scale covariance matrix estimation based on an approximate factor model. To better understand how POET works, we present a set of high-level sufficient conditions under which the procedure achieves optimal rates of convergence under different matrix norms. This framework recovers the results for sub-Gaussian data in a more transparent way that depends only on the concentration properties of the sample covariance matrix. As a new theoretical contribution, the framework allows us, for the first time, to exploit conditional sparsity covariance structure for heavy-tailed data. In particular, for elliptical data, we propose a robust estimator based on marginal and multivariate Kendall's tau that satisfies these conditions. In addition, we study the conditional graphical model under the same framework. The technical tools developed in this paper are of general interest to high dimensional principal component analysis. Thorough numerical results are also provided to back up the developed theory.
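
The marginal Kendall's tau ingredient rests on a standard identity for elliptical distributions: the correlation (shape) parameter equals $\sin(\pi\tau/2)$. The sketch below computes that rank-based correlation on simulated bivariate Student-$t$ data, which is elliptical and heavy-tailed; the simulation settings and function names are illustrative assumptions, and the multivariate Kendall's tau and POET thresholding steps are not shown.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / (n choose 2), in O(n^2)."""
    n, s = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))


def kendall_correlation(x, y):
    """Rank-based correlation sin(pi/2 * tau), consistent for elliptical distributions."""
    return math.sin(math.pi / 2.0 * kendall_tau(x, y))


# Bivariate Student-t with 3 degrees of freedom (elliptical, heavy-tailed), correlation 0.6
rng = random.Random(5)
rho, nu, n = 0.6, 3, 500
x, y = [], []
for _ in range(n):
    g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
    scale = math.sqrt(nu / sum(rng.gauss(0, 1) ** 2 for _ in range(nu)))
    x.append(scale * g1)
    y.append(scale * (rho * g1 + math.sqrt(1 - rho**2) * g2))

r_hat = kendall_correlation(x, y)
```

Because it uses only the signs of pairwise differences, the estimate is unaffected by the occasional huge draws that make the sample covariance unstable for heavy-tailed data.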

preprint2015arXiv

Local and Global Inference for High Dimensional Nonparanormal Graphical Models

This paper proposes a unified framework to quantify local and global inferential uncertainty for high dimensional nonparanormal graphical models. In particular, we consider the problems of testing the presence of a single edge and constructing a uniform confidence subgraph. Due to the presence of unknown marginal transformations, we propose a pseudo likelihood based inferential approach. In sharp contrast to the existing high dimensional score test method, our method is free of tuning parameters given an initial estimator, and extends the scope of the existing likelihood based inferential framework. Furthermore, we propose a U-statistic multiplier bootstrap method to construct the confidence subgraph. We show that the constructed subgraph is contained in the true graph with probability greater than a given nominal level. Compared with existing methods for constructing confidence subgraphs, our method does not rely on Gaussian or sub-Gaussian assumptions. The theoretical properties of the proposed inferential methods are verified by thorough numerical experiments and real data analysis.

preprint2015arXiv

On Semiparametric Exponential Family Graphical Models

We propose a new class of semiparametric exponential family graphical models for the analysis of high dimensional mixed data. Unlike existing mixed graphical models, we allow the nodewise conditional distributions to be semiparametric generalized linear models with unspecified base measure functions. Thus, one advantage of our method is that the type of each node need not be specified, which makes the method more convenient to apply in practice. Under the proposed model, we consider both parameter estimation and hypothesis testing in high dimensions. In particular, we propose a symmetric pairwise score test for the presence of a single edge in the graph. Compared to existing methods for hypothesis testing, our approach takes into account the symmetry of the parameters, so that the inferential results are invariant to the different parametrizations of the same edge. Thorough numerical simulations and a real data example are provided to back up our results.

preprint2015arXiv

Optimal computational and statistical rates of convergence for sparse nonconvex learning problems

We provide a theoretical analysis of the statistical and computational properties of penalized $M$-estimators that can be formulated as the solution to a possibly nonconvex optimization problem. Many important estimators fall into this category, including least squares regression with nonconvex regularization, generalized linear models with nonconvex regularization, and sparse elliptical random design regression. For these problems, calculating the global solution is intractable because of the nonconvex formulation. In this paper, we propose an approximate regularization path-following method for solving a variety of learning problems with nonconvex objective functions. Under a unified analytic framework, we simultaneously provide explicit statistical and computational rates of convergence for any local solution attained by the algorithm. Computationally, our algorithm attains a global geometric rate of convergence for calculating the full regularization path, which is optimal among all first-order algorithms. Unlike most existing methods, which attain geometric rates of convergence only for a single regularization parameter, our algorithm calculates the full regularization path with the same iteration complexity. In particular, we provide a refined iteration complexity bound that sharply characterizes the performance of each stage along the regularization path. Statistically, we provide a sharp sample complexity analysis for all the approximate local solutions along the regularization path. In particular, our analysis improves upon existing results by providing a more refined sample complexity bound as well as an exact support recovery result for the final estimator. These results show that the final estimator attains an oracle statistical property due to the use of a nonconvex penalty.
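
The abstract does not name a specific nonconvex penalty. As an assumed example, the sketch below contrasts the univariate thresholding operator of the Lasso with that of the minimax concave penalty (MCP), the building block that first-order methods apply coordinatewise. MCP leaves large coefficients unshrunk, which is the intuition behind the oracle property mentioned above. Function names are hypothetical, and the path-following scheme itself is not reproduced.

```python
import math


def soft_threshold(z, lam):
    """Minimizer of 0.5 * (z - b)**2 + lam * |b| (Lasso): shrinks every coefficient by lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def mcp_threshold(z, lam, gamma):
    """Minimizer of 0.5 * (z - b)**2 + MCP_{lam, gamma}(b), for gamma > 1 ("firm" thresholding)."""
    if gamma <= 1:
        raise ValueError("gamma must exceed 1 for the univariate problem to be convex")
    if abs(z) <= lam:
        return 0.0
    if abs(z) <= gamma * lam:
        return math.copysign((abs(z) - lam) / (1.0 - 1.0 / gamma), z)
    return z  # beyond gamma * lam the penalty is flat, so no shrinkage (no bias)
```

For example, with $\lambda = 1$ and $\gamma = 3$, an input of 5 is left at 5 by MCP but shrunk to 4 by the soft-thresholding (Lasso) operator.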

preprint2015arXiv

Optimal linear estimation under unknown nonlinear transform

Linear regression studies the problem of estimating a model parameter $\beta^* \in \mathbb{R}^p$ from $n$ observations $\{(y_i,\mathbf{x}_i)\}_{i=1}^n$ of the linear model $y_i = \langle \mathbf{x}_i,\beta^* \rangle + \varepsilon_i$. We consider a significant generalization in which the relationship between $\langle \mathbf{x}_i,\beta^* \rangle$ and $y_i$ is noisy, quantized to a single bit, potentially nonlinear, noninvertible, and unknown. This model is known as the single-index model in statistics, and, among other things, it represents a significant generalization of one-bit compressed sensing. We propose a novel spectral-based estimation procedure and show that we can recover $\beta^*$ in settings (i.e., classes of link functions $f$) where previous algorithms fail. In general, our algorithm requires only very mild restrictions on the (unknown) functional relationship between $y_i$ and $\langle \mathbf{x}_i,\beta^* \rangle$. We also consider the high dimensional setting where $\beta^*$ is sparse, and introduce a two-stage nonconvex framework that addresses estimation challenges in high dimensional regimes where $p \gg n$. For a broad class of link functions between $\langle \mathbf{x}_i,\beta^* \rangle$ and $y_i$, we establish minimax lower bounds that demonstrate the optimality of our estimators in both the classical and high dimensional regimes.
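
For intuition about why an unknown link can still reveal $\beta^*$, the sketch below uses the classical first-moment estimator, not the paper's spectral procedure. For Gaussian design $\mathbf{x} \sim N(0, I)$, Stein's lemma gives $E[y\,\mathbf{x}] \propto \beta^*$ whenever the proportionality constant is nonzero, so the normalized average of $y_i \mathbf{x}_i$ recovers the direction of $\beta^*$. The one-bit link with 10% label flips, the dimensions, and the function name are illustrative assumptions.

```python
import math
import random


def first_moment_direction(X, y):
    """Normalized average of y_i * x_i; estimates beta*/||beta*|| (up to sign) when
    x ~ N(0, I) and the link has a nonzero first-order moment."""
    d, n = len(X[0]), len(y)
    v = [sum(yi * xi[j] for xi, yi in zip(X, y)) / n for j in range(d)]
    norm = math.sqrt(sum(vj * vj for vj in v))
    return [vj / norm for vj in v]


rng = random.Random(1)
d, n = 5, 20_000
beta_star = [1.0, -1.0, 0.5, 0.0, 0.0]
unit = math.sqrt(sum(b * b for b in beta_star))
X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
# One-bit observations y = sign(<x, beta*>), each flipped with probability 0.1
y = []
for x in X:
    s = 1.0 if sum(a * b for a, b in zip(x, beta_star)) >= 0 else -1.0
    y.append(-s if rng.random() < 0.1 else s)

est = first_moment_direction(X, y)
cosine = sum(e * b / unit for e, b in zip(est, beta_star))
```

This first-moment estimator breaks down when the constant vanishes (for example, an even link such as $y = \langle \mathbf{x}, \beta^* \rangle^2$), which is one kind of setting where methods built on higher-order moments are needed.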

preprint2015arXiv

QUADRO: A supervised dimension reduction method via Rayleigh quotient optimization

We propose a novel Rayleigh quotient based sparse quadratic dimension reduction method, named QUADRO (Quadratic Dimension Reduction via Rayleigh Optimization), for analyzing high-dimensional data. Unlike in the linear setting, where Rayleigh quotient optimization coincides with classification, these two problems are very different under nonlinear settings. In this paper, we clarify this difference and show that Rayleigh quotient optimization may be of independent scientific interest. One major challenge of Rayleigh quotient optimization is that the variance of quadratic statistics involves all fourth cross-moments of the predictors, which are infeasible to compute for high-dimensional applications and may accumulate too many stochastic errors. This issue is resolved by considering a family of elliptical models. Moreover, for heavy-tailed distributions, robust estimates of mean vectors and covariance matrices are employed to guarantee uniform convergence in estimating nonpolynomially many parameters, even though only fourth moments are assumed. Methodologically, QUADRO is based on elliptical models, which allow us to formulate the Rayleigh quotient maximization as a convex optimization problem. Computationally, we propose an efficient linearized augmented Lagrangian method to solve the constrained optimization problem. Theoretically, we provide explicit rates of convergence in terms of the Rayleigh quotient under both Gaussian and general elliptical models. Thorough numerical results on both synthetic and real datasets are also provided to back up our theoretical results.

preprint2015arXiv

Robust Inference of Risks of Large Portfolios

We propose a bootstrap-based robust high-confidence level upper bound (Robust H-CLUB) for assessing the risks of large portfolios. The proposed approach exploits rank-based and quantile-based estimators, and can be viewed as a robust extension of the H-CLUB method (Fan et al., 2015). Such an extension allows us to handle possibly misspecified models and heavy-tailed data. Under mixing conditions, we analyze the proposed approach and demonstrate its advantage over the H-CLUB. We further provide thorough numerical results to back up the developed theory. We also apply the proposed method to analyze a stock market dataset.

preprint2015arXiv

Sharp Computational-Statistical Phase Transitions via Oracle Computational Model

We study the fundamental tradeoffs between computational tractability and statistical accuracy for a general family of hypothesis testing problems with combinatorial structures. Based upon an oracle model of computation, which captures the interactions between algorithms and data, we establish a general lower bound that explicitly connects the minimum testing risk under computational budget constraints with the intrinsic probabilistic and combinatorial structures of statistical problems. This lower bound mirrors the classical statistical lower bound by Le Cam (1986) and allows us to quantify the optimal statistical performance achievable given limited computational budgets in a systematic fashion. Under this unified framework, we sharply characterize the statistical-computational phase transition for two testing problems, namely, normal mean detection and sparse principal component detection. For normal mean detection, we consider two combinatorial structures, namely, sparse set and perfect matching. For these problems we identify significant gaps between the optimal statistical accuracy that is achievable under computational tractability constraints and the classical statistical lower bounds. Compared with existing works on computational lower bounds for statistical problems, which consider general polynomial-time algorithms on Turing machines, and rely on computational hardness hypotheses on problems like planted clique detection, we focus on the oracle computational model, which covers a broad range of popular algorithms, and do not rely on unproven hypotheses. Moreover, our result provides an intuitive and concrete interpretation for the intrinsic computational intractability of high-dimensional statistical problems. One byproduct of our result is a lower bound for a strict generalization of the matrix permanent problem, which is of independent interest.

preprint2015arXiv

Sparse Nonlinear Regression: Parameter Estimation and Asymptotic Inference

We study parameter estimation and asymptotic inference for sparse nonlinear regression. More specifically, we assume the data are given by $y = f(x^\top \beta^*) + \varepsilon$, where $f$ is nonlinear. To recover $\beta^*$, we propose an $\ell_1$-regularized least-squares estimator. Unlike in classical linear regression, the corresponding optimization problem is nonconvex because of the nonlinearity of $f$. In spite of the nonconvexity, we prove that under mild conditions, every stationary point of the objective enjoys an optimal statistical rate of convergence. In addition, we provide an efficient algorithm that provably converges to a stationary point. We also assess the uncertainty of the obtained estimator. Specifically, based on any stationary point of the objective, we construct valid hypothesis tests and confidence intervals for the low dimensional components of the high-dimensional parameter $\beta^*$. Detailed numerical results are provided to back up our theory.
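
A minimal sketch of the estimator's objective, $\frac{1}{2n}\sum_i (y_i - f(x_i^\top\beta))^2 + \lambda\|\beta\|_1$, minimized by proximal gradient descent (a gradient step followed by soft-thresholding). The link $f(u) = 2u + \sin u$, the step size, $\lambda$, and the simulation settings are assumptions chosen for illustration; the paper's own algorithm and its inference procedure are not reproduced.

```python
import math
import random


def f(u):
    return 2.0 * u + math.sin(u)  # hypothetical known link: nonlinear but increasing


def f_prime(u):
    return 2.0 + math.cos(u)


def soft(z, t):
    """Proximal map of t * |b| (soft-thresholding)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def l1_nonlinear_ls(X, y, lam, step=0.05, iters=200):
    """Proximal gradient on (1/2n) * sum_i (y_i - f(x_i' beta))^2 + lam * ||beta||_1."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            u = sum(a * b for a, b in zip(xi, beta))
            w = -(yi - f(u)) * f_prime(u) / n  # chain rule through the nonlinear link
            for j in range(d):
                grad[j] += w * xi[j]
        beta = [soft(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta


rng = random.Random(6)
n, d = 400, 10
beta_star = [1.0, -1.0, 0.5] + [0.0] * (d - 3)
X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [f(sum(a * b for a, b in zip(x, beta_star))) + rng.gauss(0, 0.5) for x in X]
beta_hat = l1_nonlinear_ls(X, y, lam=0.05)
```

The objective is nonconvex in $\beta$ because $f$ is nonlinear, so this iteration targets a stationary point, which is exactly the kind of solution the abstract's guarantees cover.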

preprint2015arXiv

Statistical Limits of Convex Relaxations

Many high dimensional sparse learning problems are formulated as nonconvex optimization problems. A popular approach to solving these nonconvex optimization problems is through convex relaxations such as linear and semidefinite programming. In this paper, we study the statistical limits of convex relaxations. In particular, we consider two problems: mean estimation for a sparse principal submatrix and edge probability estimation for the stochastic block model. We exploit the sum-of-squares relaxation hierarchy to sharply characterize the limits of a broad class of convex relaxations. Our result shows that statistical optimality must be compromised to achieve computational tractability using convex relaxations. Compared with existing results on computational lower bounds for statistical problems, which consider general polynomial-time algorithms and rely on computational hardness hypotheses for problems like planted clique detection, our theory focuses on a broad class of convex relaxations and does not rely on unproven hypotheses.

preprint2015arXiv

The Knowledge Gradient Policy Using A Sparse Additive Belief Model

We propose a sequential learning policy for noisy discrete global optimization and ranking and selection (R&S) problems with high dimensional sparse belief functions, where there are hundreds or even thousands of features but only a small portion of them have explanatory power. We aim to identify the sparsity pattern and select the best alternative before the finite budget is exhausted. We derive a knowledge gradient policy for sparse linear models (KGSpLin) with a group Lasso penalty. This policy is a novel hybrid of Bayesian R&S with frequentist learning. In particular, our method naturally incorporates B-spline basis expansion and generalizes to the nonparametric additive model (KGSpAM) and the functional ANOVA model. Theoretically, we provide estimation error bounds for the posterior mean estimate and the functional estimate. Controlled experiments show that the algorithm efficiently learns the correct set of nonzero parameters even when the model is embedded with hundreds of dummy parameters. It also outperforms the knowledge gradient for a linear model.
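
The knowledge gradient scores each alternative by the expected improvement in the best posterior mean after one more measurement. The sketch below implements the standard closed form for independent normal beliefs, $\mathrm{KG}_x = \tilde\sigma_x\,(\zeta_x\Phi(\zeta_x) + \varphi(\zeta_x))$ with $\tilde\sigma_x = \sigma_x^2/\sqrt{\sigma_x^2 + \sigma_W^2}$ and $\zeta_x = -|\mu_x - \max_{x' \neq x}\mu_{x'}|/\tilde\sigma_x$. This is the textbook baseline policy, not KGSpLin, which replaces independent beliefs with a sparse linear model. Argument names are hypothetical.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def knowledge_gradient(mu, sigma2, noise_var):
    """KG value of measuring each alternative once, under independent normal beliefs.

    mu: posterior means; sigma2: posterior variances (> 0); noise_var: measurement noise variance.
    Requires at least two alternatives.
    """
    kg = []
    for x in range(len(mu)):
        sigma_tilde = sigma2[x] / math.sqrt(sigma2[x] + noise_var)
        best_other = max(m for i, m in enumerate(mu) if i != x)
        zeta = -abs(mu[x] - best_other) / sigma_tilde
        kg.append(sigma_tilde * (zeta * normal_cdf(zeta) + normal_pdf(zeta)))
    return kg


kg = knowledge_gradient([1.0, 1.2, 0.5], [1.0, 0.01, 1.0], 1.0)
choice = kg.index(max(kg))  # the policy measures the alternative with the largest KG value
```

In this toy call, the currently best alternative (index 1) has almost no remaining uncertainty, so measuring it is nearly worthless, and the policy instead picks the uncertain runner-up (index 0).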

preprint2015arXiv

Towards High-Performance Two-Dimensional Black Phosphorus Optoelectronic Devices: the Role of Metal Contacts

The metal contacts on 2D black phosphorus field-effect transistors and photodetectors are studied. The metal work function can significantly affect the Schottky barrier at the metal-semiconductor contact in black phosphorus devices. Higher metal work functions lead to larger output hole currents in p-type transistors, while ambipolar characteristics can be observed with lower work function metals. Photodetectors with a record high photoresponsivity (223 mA/W) are demonstrated on black phosphorus through contact engineering.

preprint2014arXiv

A Direct Estimation of High Dimensional Stationary Vector Autoregressions

The vector autoregressive (VAR) model is a powerful tool for modeling complex time series and has been exploited in many fields. However, fitting a high dimensional VAR model poses some unique challenges. On one hand, the dimensionality, caused by modeling a large number of time series and higher order autoregressive processes, is usually much higher than the time series length. On the other hand, the temporal dependence structure in the VAR model gives rise to extra theoretical challenges. In high dimensions, one popular approach is to assume that the transition matrix is sparse and fit the VAR model using the "least squares" method with a lasso-type penalty. In this manuscript, we propose an alternative way of estimating the VAR model. The main idea is to exploit the temporal dependence structure and formulate the estimation problem as a linear program. The proposed approach has an immediate advantage over the lasso-type estimators: the estimating equation can be decomposed into multiple sub-equations, which can be solved efficiently in parallel. In addition, our method brings new theoretical insights into VAR model analysis. So far, the theoretical results developed in high dimensions (e.g., Song and Bickel (2011) and Kock and Callot (2012)) mainly impose assumptions on the design matrix of the formulated regression problems. Such conditions are only indirectly related to the transition matrices and are not transparent. In contrast, our results show that the operator norm of the transition matrices plays an important role in estimation accuracy. We provide explicit rates of convergence for both estimation and prediction. In addition, we provide thorough experiments on both synthetic and real-world equity data showing empirical advantages of our method over the lasso-type estimators in both parameter estimation and forecasting.
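
The temporal dependence structure that such an estimator can exploit is the Yule-Walker relation: for a stationary VAR(1) $x_t = A x_{t-1} + \varepsilon_t$, the lag-one covariance satisfies $\Sigma_1 = A\Sigma_0$, equivalently $\Sigma_0 A^\top = \Sigma_1^\top$, and each column of $A^\top$ solves its own small system. The sketch below solves those systems exactly for a 2-dimensional example. It is an unpenalized illustration under assumed settings (the matrix `A`, unit Gaussian noise, series length); the paper's $\ell_1$-minimizing linear program is not reproduced.

```python
import random


def simulate_var1(A, T, rng, burn_in=100):
    """Simulate x_t = A x_{t-1} + N(0, I) noise; drop a burn-in toward stationarity."""
    d = len(A)
    x = [0.0] * d
    path = []
    for t in range(T + burn_in):
        x = [sum(A[i][k] * x[k] for k in range(d)) + rng.gauss(0, 1) for i in range(d)]
        if t >= burn_in:
            path.append(x)
    return path


def lag_cov(path, lag):
    """Sample E[x_{t+lag} x_t^T] (the process has mean zero, so no centering)."""
    d, m = len(path[0]), len(path) - lag
    return [[sum(path[t + lag][i] * path[t][j] for t in range(m)) / m for j in range(d)]
            for i in range(d)]


def solve2(S, b):
    """Solve the 2x2 linear system S m = b by Cramer's rule."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [(b[0] * S[1][1] - S[0][1] * b[1]) / det, (S[0][0] * b[1] - b[0] * S[1][0]) / det]


rng = random.Random(2)
A = [[0.5, 0.2], [0.0, 0.4]]
path = simulate_var1(A, 20_000, rng)
S0, S1 = lag_cov(path, 0), lag_cov(path, 1)
# Sigma_0 a_i = (row i of Sigma_1) for each row a_i of A: independent systems,
# which is what allows a column-by-column (parallel) solve.
A_hat = [solve2(S0, S1[i]) for i in range(2)]
```

In high dimensions $\Sigma_0$ is estimated poorly and is not invertible when $d$ exceeds the series length, which is why the paper replaces these exact solves with sparsity-seeking linear programs.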

preprint2014arXiv

Black Phosphorus-Monolayer MoS2 van der Waals Heterojunction P-N Diode

Phosphorene, an elemental 2D material that is the monolayer of black phosphorus, has recently been mechanically exfoliated. In its bulk form, black phosphorus shows high carrier mobility (~10,000 cm²/V·s) and a ~0.3 eV direct bandgap. Well-behaved p-type field-effect transistors with mobilities of up to 1000 cm²/V·s, as well as phototransistors, have been demonstrated on few-layer black phosphorus, showing its promise for electronics and optoelectronics applications due to its high hole mobility and thickness-dependent direct bandgap. However, p-n junctions, the basic building blocks of modern electronic and optoelectronic devices, have not yet been realized in black phosphorus. In this paper, we demonstrate a gate-tunable p-n diode based on a p-type black phosphorus/n-type monolayer MoS2 van der Waals p-n heterojunction. Upon illumination, these ultra-thin p-n diodes show a maximum photodetection responsivity of 418 mA/W at a wavelength of 633 nm, and photovoltaic energy conversion with an external quantum efficiency of 0.3%. These p-n diodes show promise for broadband photodetection and solar energy harvesting.

preprint2014arXiv

Challenges of Big Data Analysis

Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that cannot be detected with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This article gives an overview of the salient features of Big Data and how these features drive paradigm changes in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. Relying on these assumptions can lead to wrong statistical inferences and consequently wrong scientific conclusions.
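
Spurious correlation can be seen in a small simulation: with a fixed sample size, the largest absolute sample correlation between one variable and many mutually independent variables grows with the number of variables, even though every population correlation is zero. The sample size, dimensions, and seed below are arbitrary illustrative choices.

```python
import math
import random


def corr(a, b):
    """Pearson sample correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


rng = random.Random(3)
n, p = 60, 1000
# p mutually independent N(0, 1) variables, n observations each
cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
abs_corr = [abs(corr(cols[0], cols[j])) for j in range(1, p)]
for dim in (10, 100, 1000):
    # largest spurious correlation with the first variable among the first `dim` variables
    print(dim, round(max(abs_corr[: dim - 1]), 3))
```

With only 60 observations, the maximum spurious correlation among 1000 independent variables is typically large enough to be mistaken for a real signal.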

preprint2014arXiv

Chloride Molecular Doping Technique on 2D Materials: WS2 and MoS2

A low-resistivity metal-semiconductor (M-S) contact is one of the urgent challenges in the research of 2D transition metal dichalcogenides (TMDs). Here, we report a chloride molecular doping technique that greatly reduces the contact resistance (Rc) in few-layer WS2 and MoS2. After doping, the Rc of WS2 and MoS2 decreases to 0.7 kΩ·µm and 0.5 kΩ·µm, respectively. The significant reduction of Rc is attributed to the high electron doping density achieved, which significantly reduces the Schottky barrier width. As a proof of concept, high-performance few-layer WS2 field-effect transistors (FETs) are demonstrated, exhibiting a high drain current of 380 µA/µm, an on/off ratio of 4×10^6, and a peak field-effect mobility of 60 cm²/V·s. This doping technique provides a highly viable route to reducing Rc in TMDs, paving the way for high-performance 2D nanoelectronic devices.

preprint2014arXiv

Contact Research Strategy for Emerging Molybdenum Disulfide and Other Two-Dimensional Field-effect Transistors

Layered two-dimensional (2D) semiconducting transition metal dichalcogenides (TMDs) have recently been widely isolated, synthesized, and characterized. Numerous 2D materials have been identified as potential channel materials for future thin-film technology due to their high mobility and the bandgaps they exhibit. Although many TMD field-effect transistors (FETs) have been demonstrated, along with significant progress in understanding the device physics, large contact resistance at the metal/semiconductor interface remains a challenge. From the 2D device research point of view, minimizing Schottky barrier effects at the contacts, and thereby reducing the contact resistance of metals on 2D materials, is critical for the further development of the field. Here, we present a review of contact research on molybdenum disulfide and other TMD FETs, starting from the fundamental understanding of metal-semiconductor interfaces on 2D materials. A clear contact research strategy for 2D semiconducting materials is developed for future high-performance 2D FETs with aggressively scaled dimensions.

preprint2014arXiv

Device Perspective for Black Phosphorus Field-Effect Transistors: Contact Resistance, Ambipolar and Scaling

Although monolayer black phosphorus (BP), or phosphorene, has been successfully exfoliated and its optical properties have been explored, most electrical device performance has been demonstrated on few-layer phosphorene and ultra-thin BP films. In this paper, we study the channel length scaling of ultra-thin BP field-effect transistors (FETs) and discuss a scheme for using various contact metals to change transistor characteristics. By studying transistor behavior at various channel lengths, we extract the contact resistance with the transfer length method (TLM). With different contact metals, we find that the metal/BP interface has different Schottky barrier heights, leading to significant differences in contact resistance. This is quite different from previous studies of transition metal dichalcogenides (TMDs) such as MoS2, where the Fermi level is strongly pinned near the conduction band edge at the metal/MoS2 interface. BP transistors are by nature Schottky barrier FETs, whose on and off states are controlled by tuning the Schottky barriers at the two contacts. We also observe ambipolar characteristics of BP transistors with enhanced n-type drain current and demonstrate that the p-type carriers can easily be shifted to n-type, or vice versa, by controlling the gate bias and drain bias, showing the potential to realize BP CMOS logic circuits.
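
The transfer length method rests on the linear model $R_{tot} W = 2 R_c W + R_{sh} L$: fitting width-normalized total resistance against channel length $L$ gives $2R_cW$ as the intercept and the sheet resistance $R_{sh}$ as the slope. The sketch below performs that fit. The channel lengths and resistance values are synthetic numbers constructed to lie exactly on a line, not measurements from this work, and the function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def tlm_extract(lengths_um, r_times_w_kohm_um):
    """TLM fit of R_tot * W = 2 * Rc * W + R_sh * L.

    Returns (Rc * W in kohm*um, R_sh in kohm/sq).
    """
    intercept, slope = linear_fit(lengths_um, r_times_w_kohm_um)
    return intercept / 2.0, slope


lengths = [0.5, 1.0, 2.0, 4.0]        # channel lengths in um (synthetic)
r_times_w = [9.0, 14.0, 24.0, 44.0]   # synthetic R_tot * W values in kohm*um
rc_w, r_sh = tlm_extract(lengths, r_times_w)
transfer_length = rc_w / r_sh         # L_T = Rc * W / R_sh, in um
```

With real devices the points scatter around the line, so several channel lengths per contact metal are needed for a stable intercept.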

preprint2014arXiv

High Dimensional Semiparametric Latent Graphical Model for Mixed Data

Graphical models are commonly used tools for modeling multivariate random variables. While there exist many convenient multivariate distributions, such as the Gaussian distribution for continuous data, data with discrete variables, or with a combination of continuous and discrete variables, pose new challenges in statistical modeling. In this paper, we propose a semiparametric model named the latent Gaussian copula model for binary and mixed data. The observed binary data are assumed to be obtained by dichotomizing a latent variable satisfying the Gaussian copula distribution or the nonparanormal distribution. The latent Gaussian model, which assumes that the latent variables are multivariate Gaussian, is a special case of the proposed model. A novel rank-based approach is proposed for both latent graph estimation and latent principal component analysis. Theoretically, the proposed methods achieve the same rates of convergence for both precision matrix estimation and eigenvector estimation as if the latent variables were observed. Under similar conditions, the consistency of graph structure recovery and feature selection for leading eigenvectors is established. The performance of the proposed methods is numerically assessed through simulation studies, and the usage of our methods is illustrated by a genetic dataset.

preprint2014arXiv

High Dimensional Semiparametric Scale-Invariant Principal Component Analysis

We propose a new high dimensional semiparametric principal component analysis (PCA) method, named Copula Component Analysis (COCA). The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. COCA improves upon PCA and sparse PCA in three aspects: (i) It is robust to modeling assumptions; (ii) It is robust to outliers and data contamination; (iii) It is scale-invariant and yields more interpretable results. We prove that the COCA estimators obtain fast estimation rates and are feature selection consistent when the dimension is nearly exponentially large relative to the sample size. Careful experiments confirm that COCA outperforms sparse PCA on both synthetic and real-world datasets.
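
A rank-based, scale-invariant PCA of this kind can be sketched in two steps. First, estimate the latent Gaussian correlation of each pair via the standard nonparanormal identity $\Sigma_{jk} = 2\sin(\pi\rho^s_{jk}/6)$, where $\rho^s$ is Spearman's rho. Second, extract the leading eigenvector by power iteration. Whether COCA uses exactly this transform, and the omission of its sparsity step, are assumptions of this sketch; the simulated transforms and function names are illustrative.

```python
import math
import random


def ranks(v):
    """Ranks 1..n (continuous data, so ties are ignored)."""
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def latent_correlation(data):
    """Matrix of 2 * sin(pi/6 * Spearman's rho) over all variable pairs, unit diagonal."""
    p = len(data)
    return [[1.0 if j == k else 2.0 * math.sin(math.pi / 6.0 * spearman(data[j], data[k]))
             for k in range(p)] for j in range(p)]


def leading_eigenvector(S, iters=200):
    """Power iteration for the top eigenvector of a symmetric matrix."""
    v = [1.0] * len(S)
    for _ in range(iters):
        w = [sum(row[k] * v[k] for k in range(len(v))) for row in S]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


rng = random.Random(7)
n = 1000
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [0.8 * a + 0.6 * rng.gauss(0, 1) for a in z1]  # latent corr(z1, z2) = 0.8
z3 = [rng.gauss(0, 1) for _ in range(n)]
# Observed variables: monotone transforms and rescaling of the latent Gaussians
data = [[math.exp(a) for a in z1], [b**3 for b in z2], [10.0 * c for c in z3]]
S = latent_correlation(data)
v = leading_eigenvector(S)
```

Because ranks ignore monotone transforms and rescaling, the result is the same as if the latent Gaussian variables had been observed, which is the sense in which the method is scale-invariant.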

preprint2014arXiv

High-Performance MoS2 Field-Effect Transistors Enabled by Chloride Doping: Record Low Contact Resistance (0.5 kΩ·µm) and Record High Drain Current (460 µA/µm)

In this paper, we report a novel chemical doping technique that reduces the contact resistance (Rc) of transition metal dichalcogenides (TMDs), eliminating two major roadblocks (namely, doping and high Rc) toward high-performance TMD field-effect transistors (FETs). Using 1,2-dichloroethane (DCE) as the doping reagent, we demonstrate an active n-type doping density of more than 2 × 10¹⁹ cm⁻³ in a few-layer MoS2 film. This enables us to reduce Rc to a record low value of 0.5 kΩ·µm, about 10 times lower than that of the undoped control sample. The corresponding specific contact resistivity (ρc) decreases by two orders of magnitude. With such a low Rc, we demonstrate a MoS2 FET with a 100 nm channel length (Lch) and a drain current (Ids) of 460 µA/µm at Vds = 1.6 V, twice the best value reported so far for MoS2 FETs.

preprint2014arXiv

Joint Estimation of Multiple Graphical Models from High Dimensional Time Series

In this manuscript, we consider the problem of jointly estimating multiple graphical models in high dimensions. We assume that the data are collected from n subjects, each of which contributes T possibly dependent observations. The graphical models of the subjects vary, but are assumed to change smoothly according to a measure of closeness between subjects. We propose a kernel-based method for jointly estimating all graphical models. Theoretically, under a double asymptotic framework in which both (T, n) and the dimension d can increase, we provide an explicit rate of convergence for parameter estimation. This rate characterizes how much strength can be borrowed across individuals and how data dependence affects parameter estimation. Empirically, experiments on both synthetic data and real resting-state functional magnetic resonance imaging (rs-fMRI) data illustrate the effectiveness of the proposed method.
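The idea of borrowing strength from nearby subjects can be sketched as a kernel-weighted average of per-subject covariance matrices, each computed from that subject's T observations. This is a minimal sketch under stated assumptions, not the authors' estimator: closeness is encoded by a hypothetical scalar covariate per subject, the kernel is Gaussian, and the subsequent sparse precision-matrix step that turns the smoothed covariance into a graph is not shown.

```python
import numpy as np


def kernel_smoothed_covariance(covs, positions, u0, bandwidth):
    """Gaussian-kernel weighted average of per-subject covariance matrices.

    covs: array of shape (n_subjects, d, d), one sample covariance per subject
    positions: shape (n_subjects,), a hypothetical 1-D closeness covariate
    u0: target position at which the covariance is estimated
    bandwidth: kernel bandwidth controlling how much strength is borrowed
    """
    diffs = (np.asarray(positions, dtype=float) - u0) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    weights /= weights.sum()
    # contract the subject axis: sum_i weights[i] * covs[i]
    return np.tensordot(weights, covs, axes=1)
```

The bandwidth governs the trade-off the rate describes: a tiny bandwidth reproduces the nearest subject's own covariance, while a very large one pools all subjects equally.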

preprint2014arXiv

Molecular Doping of Multilayer MoS2 Field-effect Transistors: Reduction in Sheet and Contact Resistances

For the first time, polyethyleneimine (PEI) doping of multilayer MoS2 field-effect transistors is investigated. A 2.6-fold reduction in sheet resistance and a 1.2-fold reduction in contact resistance are achieved. The enhanced electrical characteristics are also reflected in a 70% improvement in ON current and a 50% improvement in extrinsic field-effect mobility. The threshold voltage also shows a negative shift upon molecular doping. These results demonstrate the feasibility of PEI molecular doping in MoS2 transistors and its potential applications in layer-structured semiconducting 2D crystals.

preprint2014arXiv

MoS2 Field-effect Transistors with Graphene/Metal Heterocontacts

For the first time, n-type few-layer MoS2 field-effect transistors with graphene/Ti hetero-contacts have been fabricated, showing a drain current of more than 160 mA/mm at a 1 μm gate length with an on/off current ratio of 10⁷. The enhanced electrical characteristics are reflected in a nearly 2.1-fold improvement in on-resistance and a 3.3-fold improvement in contact resistance compared with MoS2 FETs without a graphene contact layer. A temperature-dependent study of the MoS2/graphene hetero-contacts was also performed, which shows that they still behave as Schottky contacts. The transfer length method and a devised I-V method are introduced to study the contact resistance and Schottky barrier height of the MoS2/graphene/metal hetero-contact structure.

preprint2014arXiv

Nonconvex Statistical Optimization: Minimax-Optimal Sparse PCA in Polynomial Time

Sparse principal component analysis (PCA) involves nonconvex optimization for which the global solution is hard to obtain. One popular approach to this issue is convex relaxation. However, such an approach may produce suboptimal estimators due to the relaxation effect. To optimally estimate sparse principal subspaces, we propose a two-stage computational framework named "tighten after relax": in the "relax" stage, we approximately solve a convex relaxation of sparse PCA with early stopping to obtain a desired initial estimator; in the "tighten" stage, we propose a novel algorithm called sparse orthogonal iteration pursuit (SOAP), which iteratively refines the initial estimator by directly solving the underlying nonconvex problem. A key concept of this two-stage framework is the basin of attraction, a local region within which the "tighten" stage has the desired computational and statistical guarantees. We prove that the initial estimator obtained from the "relax" stage falls into such a region, and hence SOAP converges geometrically to a principal subspace estimator that is minimax-optimal within a certain model class. Unlike most existing sparse PCA estimators, our approach applies to non-spiked covariance models and adapts to non-Gaussianity as well as dependent data settings. Moreover, by analyzing the computational complexity of the two stages, we illustrate an interesting phenomenon: a larger sample size can reduce the total iteration complexity. Our framework motivates a general paradigm for solving many complex statistical problems that involve nonconvex optimization with provable guarantees.
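The "tighten" stage can be illustrated with a simplified orthogonal iteration that truncates rows after each power step. This is a minimal sketch, not the authors' SOAP implementation: the truncation rule (keep the s rows with the largest norms), the fixed iteration count, and the use of a population covariance in the check are our assumptions, and the convex "relax" stage is replaced by a perturbed copy of the true subspace as the starting point.

```python
import numpy as np


def row_truncate(V, s):
    """Keep the s rows of V with the largest Euclidean norms and zero the rest."""
    keep = np.argsort(np.linalg.norm(V, axis=1))[-s:]
    W = np.zeros_like(V)
    W[keep] = V[keep]
    return W


def truncated_orthogonal_iteration(sigma, V0, s, n_iter=50):
    """Orthogonal iteration with row truncation, started from an initial estimate V0."""
    V, _ = np.linalg.qr(row_truncate(V0, s))
    for _ in range(n_iter):
        V, _ = np.linalg.qr(sigma @ V)           # power step on the covariance
        V, _ = np.linalg.qr(row_truncate(V, s))  # enforce row sparsity, re-orthonormalize
    return V
```

The result is compared with the true subspace U through the projection distance ‖VVᵀ − UUᵀ‖_F, because an orthonormal basis of a subspace is only identified up to rotation.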

preprint2014arXiv

Nonparametric Latent Tree Graphical Models: Inference, Estimation, and Structure Learning

Tree-structured graphical models are powerful at expressing long-range or hierarchical dependency among many variables, and have been widely applied in different areas of computer science and statistics. However, existing methods for parameter estimation, inference, and structure learning mainly rely on Gaussian or discrete assumptions, which are restrictive in many applications. In this paper, we propose new nonparametric methods based on reproducing kernel Hilbert space embeddings of distributions that can recover latent tree structures, estimate parameters, and perform inference for high dimensional continuous and non-Gaussian variables. The usefulness of the proposed methods is illustrated by thorough numerical results.

preprint2014arXiv

On the Impact of Dimension Reduction on Graphical Structures

Statisticians and quantitative neuroscientists have actively promoted the use of independence relationships for investigating brain networks, genomic networks, and networks from other measurement technologies. Estimating these graphs involves two steps. First, features are extracted by summarizing measurements within a parcellation, region, or set definition to create nodes. Second, these summaries are used to create a graph representing the relationships of interest. In this manuscript, we study the impact of dimension reduction on graphs that describe different notions of relations among a set of random variables. We are particularly interested in undirected graphs that capture the random variables' independence and conditional independence relations. A dimension reduction procedure can be any mapping from high dimensional spaces to low dimensional spaces. We adopt a general framework for modeling the raw data and argue that, in estimating undirected graphs, any acceptable dimension reduction procedure should be a graph-homotopic mapping, i.e., the graphical structure of the data after dimension reduction should inherit the main characteristics of the graphical structure of the raw data. We show that, for inferring undirected graphs that characterize the conditional independence relations among random variables, many dimension reduction procedures, such as the mean, median, or principal components, cannot be theoretically guaranteed to be graph-homotopic mappings. The implications of this work are broad. Even in the most charitable setting for researchers, where the correct node definition is known, graphical relationships can be contaminated merely by the dimension reduction. The manuscript ends with a concrete example characterizing a subset of graphical structures for which dimension reduction using principal components can be a graph-homotopic mapping.

preprint2014arXiv

Phosphorene: A New 2D Material with High Carrier Mobility

Preceding the current interest in layered materials for electronic applications, research in the 1960s found that black phosphorus combines high carrier mobility with a fundamental band gap. We introduce its counterpart, dubbed few-layer phosphorene, as a new 2D p-type material. Like graphene and MoS2, phosphorene is flexible and can be mechanically exfoliated. We find phosphorene to be stable and, unlike graphene, to have an inherent, direct, and appreciable band gap that depends on the number of layers. Our transport studies indicate a carrier mobility that reflects its structural anisotropy and is superior to that of MoS2. At room temperature, our phosphorene field-effect transistors with a 1.0 µm channel length display a high on-current of 194 mA/mm, a high hole field-effect mobility of 286 cm2/Vs, and an on/off ratio of up to 10⁴. We demonstrate the possibility of phosphorene integration by constructing the first 2D CMOS inverter from phosphorene PMOS and MoS2 NMOS transistors.

preprint2014arXiv

Semiconducting Black Phosphorus: Synthesis, Transport Properties and Electronic Applications

Phosphorus is one of the most abundant elements on Earth, making up about 0.1% of the Earth's crust. Phosphorus has several allotropes. The two most common, white and red phosphorus, are widely used in explosives and safety matches. Black phosphorus, though rarely mentioned, is a layered semiconductor with great potential in optical and electronic applications. Remarkably, owing to its van der Waals structure, this layered material can be thinned to a single atomic layer in the vertical direction, known as phosphorene, whose physical properties can be tremendously different from those of its bulk counterpart. In this review article, we trace the 100-year history of research on black phosphorus, from synthesis to material properties, and extend the topic from black phosphorus to phosphorene. The physical and transport properties are highlighted, with a view to further applications in electronic and optoelectronic devices.

preprint2014arXiv

Smooth projected density estimation

We introduce and analyse a new nonparametric estimator of a multi-dimensional density. Our smooth projection estimator (SPE) is defined by a least squares projection of the sample onto an infinite dimensional mixture class via an undersmoothed nonparametric pilot estimate, which acts as a structural filter to regularise the solution. The undersmoothing is required to optimise the convergence rate of the SPE, which is jointly determined by that of the pilot estimator to the true density in squared $\mathbb{L}_{2}$ norm, and by that of the pilot distribution function to the empirical distribution function in uniform norm. Our procedure was conceived with a view to exploiting well known results in convex analysis and their connection to mixture densities. In the context of our work, this translates to the observation that the infinite dimensional minimisation problem, implicit in the construction of the SPE, possesses a solution of dimension at most $n+1$, where $n$ is the sample size. The SPE thus enjoys practical advantages such as computational efficiency, ease of storage and rapid evaluation at a new data point.

preprint2014arXiv

Stochastic Compositional Gradient Descent: Algorithms for Minimizing Compositions of Expected-Value Functions

Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function involving expected values, or of a composition of two expected-value functions, i.e., problems of the form $\min_x \mathbf{E}_v [f_v\big(\mathbf{E}_w [g_w(x)]\big)]$. To solve this stochastic composition problem, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as stochastic versions of the quasi-gradient method. SCGD algorithms update the solution based on noisy sample gradients of $f_v$ and $g_w$, and use an auxiliary variable to track the unknown quantity $\mathbf{E}_w[g_w(x)]$. We prove that SCGD converges almost surely to an optimal solution of convex optimization problems, as long as such a solution exists. The convergence involves the interplay of two iterations with different time scales. For nonsmooth convex problems, SCGD achieves a convergence rate of $O(k^{-1/4})$ in the general case and $O(k^{-2/3})$ in the strongly convex case, after taking $k$ samples. For smooth convex problems, SCGD can be accelerated to converge at a rate of $O(k^{-2/7})$ in the general case and $O(k^{-4/5})$ in the strongly convex case. For nonconvex problems, we prove that any limit point generated by SCGD is a stationary point, and we also provide a convergence rate analysis. The stochastic setting of optimizing compositions of expected-value functions is very common in practice, and the proposed SCGD methods find wide applications in learning, estimation, dynamic programming, and beyond.
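The two-time-scale update can be sketched as follows. This is a minimal illustration, not the authors' code: the step sizes α_k = k^{-3/4} for x and β_k = k^{-1/2} for the auxiliary variable y are illustrative choices, and the one-dimensional toy problem (g_w(x) = w·x with w ~ N(1, 0.3²), f_v(y) = (y − 3)² + v·y with v ~ N(0, 0.3²)) is hypothetical; its composed objective (x − 3)² is minimized at x = 3.

```python
import numpy as np


def scgd(x0, n_iter, sample, g, jac_g, grad_f, rng):
    """Basic stochastic compositional gradient descent with illustrative step sizes.

    sample(rng) -> (w, v); g(x, w) is a sample of the inner map, jac_g(x, w) its
    Jacobian (m x n), and grad_f(y, v) a sampled gradient of the outer function.
    """
    x = np.asarray(x0, dtype=float)
    w, _ = sample(rng)
    y = g(x, w)                   # auxiliary estimate of E_w[g_w(x)]
    for k in range(1, n_iter + 1):
        alpha = k ** -0.75        # slower step size for x
        beta = k ** -0.5          # faster step size for the tracking variable y
        w, v = sample(rng)
        y = (1.0 - beta) * y + beta * g(x, w)
        x = x - alpha * jac_g(x, w).T @ grad_f(y, v)  # chain rule with sampled pieces
    return x


# Hypothetical toy problem: E_w[g_w(x)] = x and E_v[f_v(y)] = (y - 3)^2 + const.
NOISE_STD = 0.3


def sample(rng):
    return rng.normal(1.0, NOISE_STD), rng.normal(0.0, NOISE_STD)


def g(x, w):
    return w * x


def jac_g(x, w):
    return np.array([[w]])


def grad_f(y, v):
    return 2.0 * (y - 3.0) + v
```

The auxiliary variable matters here: plugging single samples into f directly would minimize E[(w·x − 3)²] = 1.09x² − 6x + 9, whose minimizer is 3/1.09 ≈ 2.75 rather than 3.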

preprint2014arXiv

Temporal and Thermal Stability of Al2O3-passivated Phosphorene MOSFETs

This letter evaluates the temporal and thermal stability of a state-of-the-art few-layer phosphorene MOSFET with Al2O3 surface passivation and a Ti/Au top gate. As fabricated, the phosphorene MOSFET was stable in atmosphere for at least 100 h. After annealing at 200°C in dry nitrogen for 1 h, its drain current increased by an order of magnitude to approximately 100 mA/mm, which could be attributed to a reduction of trapped charge in the Al2O3 and/or of the Schottky barrier at the source and drain contacts. Thereafter, the drain current was stable between -50°C and 150°C for at least 1000 h. These promising results suggest that environmental protection of phosphorene should not be a major concern, and that passivation of phosphorene should focus on its effect on electronic control and transport, as in conventional silicon MOSFETs. With cutoff frequencies approaching the gigahertz range, the present phosphorene MOSFET, although far from optimized, can meet the frequency and stability requirements of most flexible electronics, for which phosphorene is intrinsically advantageous due to its corrugated lattice structure.

preprint2014arXiv

Testing and Confidence Intervals for High Dimensional Proportional Hazards Model

This paper proposes a decorrelation-based approach to testing hypotheses and constructing confidence intervals for the low dimensional component of high dimensional proportional hazards models. Motivated by the geometric projection principle, we propose new decorrelated score, Wald, and partial likelihood ratio statistics. Without assuming model selection consistency, we prove the asymptotic normality of these test statistics and establish their semiparametric optimality. We also develop new procedures for constructing pointwise confidence intervals for the baseline hazard function and the baseline survival function. Thorough numerical results are provided to back up our theory.

preprint2014arXiv

The Effect of Dielectric Capping on Few-Layer Phosphorene Transistors: Tuning the Schottky Barrier Heights

Phosphorene is a unique single-element semiconductor with a two-dimensional layered structure. In this letter, we study the behavior of top-gated transistors on mechanically exfoliated few-layer phosphorene. We achieve a high on-current of 144 mA/mm and a hole mobility of 95.6 cm2/Vs. We deposit Al2O3 by atomic layer deposition (ALD) and study the effects of dielectric capping. We observe that the polarity of the transistors changes from p-type to ambipolar with Al2O3 grown on top. We attribute this transition to changes in the effective Schottky barrier heights for both electrons and holes at the metal contact edges, which originate from fixed charges in the ALD dielectric.

preprint2014arXiv

Two-Dimensional TaSe2 Metallic Crystals: Spin-Orbit Scattering Length and Breakdown Current Density

We have determined the spin-orbit scattering length of two-dimensional layered 2H-TaSe2 metallic crystals by detailed characterization of the weak anti-localization phenomena in this strong spin-orbit interaction material. By fitting the observed magneto-conductivity, the spin-orbit scattering length for 2H-TaSe2 is determined to be 17 nm in the few-layer films. This small spin-orbit scattering length is comparable to that of Pt, which is widely used to study the spin Hall effect, and indicates the potential of TaSe2 for use in spin Hall effect devices. In addition to strong spin-orbit coupling, a material must also support large charge currents to achieve spin-transfer-torque via the spin Hall effect. Therefore, we have characterized the room temperature breakdown current density of TaSe2 in air, where the best breakdown current density reaches 3.7$\times$10$^7$ A/cm$^2$. This large breakdown current further indicates the potential of TaSe2 for use in spin-torque devices and two-dimensional device interconnect applications.

preprint2013arXiv

Graph Estimation From Multi-attribute Data

Many real-world network problems concern multivariate nodal attributes, such as image, textual, and multi-view feature vectors on nodes, rather than simple univariate nodal attributes. Existing graph estimation methods built on Gaussian graphical models and covariance selection algorithms cannot handle such data, nor can the theory developed for these methods be directly applied. In this paper, we propose a new principled framework for estimating graphs from multi-attribute data. Instead of estimating partial correlations as in the current literature, our method estimates partial canonical correlations, which naturally accommodate complex nodal features. Computationally, we provide an efficient algorithm that exploits the multi-attribute structure. Theoretically, we provide sufficient conditions that guarantee consistent graph recovery. Extensive simulation studies demonstrate the performance of our method under various conditions. Furthermore, we provide illustrative applications to uncovering gene regulatory networks from gene and protein profiles, and to uncovering a brain connectivity graph from functional magnetic resonance imaging data.

preprint2013arXiv

Magneto-Transport in MoS2: Phase Coherence, Spin Orbit Scattering and the Hall Factor

We have characterized the phase coherence length, the spin-orbit scattering length, and the Hall factor in n-type MoS2 2D crystals via weak localization and Hall-effect measurements. Weak localization measurements reveal a phase coherence length of ~50 nm at T = 400 mK for a few-layer MoS2 film, decreasing as $T^{-1/2}$ with increasing temperature. Weak localization measurements also allow us, for the first time without optical techniques, to estimate the spin-orbit scattering length to be 430 nm, pointing to the potential of MoS2 for spintronics applications. Via Hall-effect measurements, we observe a low-temperature Hall mobility of 311 cm2/Vs at T = 1 K, which decreases as a power law with a characteristic exponent γ = 1.5 from 10 K to 60 K. At room temperature, we observe a Hall mobility of 24 cm2/Vs. By determining the Hall factor for MoS2 to be 1.35 at T = 1 K and 2.4 at room temperature, we obtain drift mobilities of 420 cm2/Vs and 56 cm2/Vs at T = 1 K and room temperature, respectively.

preprint2013arXiv

Optimal Feature Selection in High-Dimensional Discriminant Analysis

We consider the high-dimensional discriminant analysis problem. For this problem, different methods have been proposed and justified by establishing exact convergence rates for the classification risk, as well as $\ell_2$ convergence results for the discriminative rule. However, a sharp theoretical analysis of the variable selection performance of these procedures has not been established, even though model interpretation is of fundamental importance in scientific data analysis. This paper bridges the gap by providing sharp sufficient conditions for consistent variable selection using sparse discriminant analysis (Mai et al., 2012). Through careful analysis, we establish rates of convergence that are significantly faster than the best known results and admit an optimal scaling of the sample size n, dimensionality p, and sparsity level s in the high-dimensional setting. The sufficient conditions are complemented by necessary information-theoretic limits on variable selection in the context of high-dimensional discriminant analysis. Exploiting a numerical equivalence result, our analysis also establishes optimal results for the ROAD estimator (Fan et al., 2012) and the sparse optimal scaling estimator (Clemmensen et al., 2011). Furthermore, we analyze an exhaustive search procedure, whose performance serves as a benchmark, and show that it is variable selection consistent under weaker conditions. Extensive simulations demonstrating the sharpness of the bounds are also provided.

preprint2013arXiv

Optimal Tests of Treatment Effects for the Overall Population and Two Subpopulations in Randomized Trials, using Sparse Linear Programming

We propose new, optimal methods for analyzing randomized trials when it is suspected that treatment effects may differ in two predefined subpopulations. Such subpopulations could be defined by a biomarker or risk factor measured at baseline. The goal is to simultaneously learn which subpopulations benefit from an experimental treatment while providing strong control of the familywise Type I error rate. We formalize this as a multiple testing problem and show that it is computationally infeasible to solve using existing techniques. Our solution involves a novel approach in which we first transform the original multiple testing problem into a large, sparse linear program. We then solve this problem using advanced optimization techniques. This general method can solve a variety of multiple testing problems and decision theory problems related to optimal trial design, for which no solution was previously available. In particular, we construct new multiple testing procedures that satisfy minimax and Bayes optimality criteria. For a given optimality criterion, our new approach yields the optimal tradeoff between power to detect an effect in the overall population and power to detect effects in subpopulations. We demonstrate our approach in examples motivated by two randomized trials of new treatments for HIV.

preprint2013arXiv

Optimization for Compressed Sensing: the Simplex Method and Kronecker Sparsification

In this paper, we present two new approaches to efficiently solving large-scale compressed sensing problems. The two ideas are independent of each other and can therefore be used either separately or together; we consider all possibilities. For the first approach, we note that the zero vector can be taken as the initial basic (infeasible) solution of the linear programming problem; therefore, if the true signal is very sparse, some variants of the simplex method can be expected to take only a small number of pivots to arrive at a solution. We implemented one such variant and demonstrate a dramatic improvement in computation time on very sparse signals. The second approach requires a redesigned sensing mechanism in which the vector signal is stacked into a matrix. This allows us to exploit the Kronecker compressed sensing (KCS) mechanism. We show that Kronecker sensing requires stronger conditions for perfect recovery than the original vector problem. However, Kronecker sensing, when modeled correctly, yields a much sparser linear optimization problem. Hence, algorithms that benefit from a sparse problem representation, such as interior-point methods, can solve Kronecker sensing problems much faster than the corresponding vector problem. In our numerical studies, we demonstrate a ten-fold improvement in computation time.
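Kronecker sensing rests on the identity vec(A X Bᵀ) = (B ⊗ A) vec(X), where vec stacks columns: sensing the stacked signal X with two small matrices is equivalent to sensing vec(X) with their Kronecker product. A minimal sketch, assuming the sensing model Y = A X Bᵀ with dense Gaussian A and B (our assumption; the abstract does not specify the sensing matrices), checks the identity numerically. It does not reproduce the paper's sparse LP formulation or its simplex variant, and the function names are hypothetical.

```python
import numpy as np


def kronecker_sense(A, B, X):
    """Kronecker compressed sensing of a matrix signal: Y = A X B^T."""
    return A @ X @ B.T


def vec(M):
    """Column-stacking vectorization (Fortran order)."""
    return M.reshape(-1, order="F")
```

With A of size 3 × 6 and B of size 4 × 5, the two factors hold 38 entries, while the equivalent dense sensing matrix B ⊗ A has 12 × 30 = 360 entries, which hints at why the Kronecker model admits a much sparser formulation.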

preprint2013arXiv

Soft Null Hypotheses: A Case Study of Image Enhancement Detection in Brain Lesions

This work is motivated by a study of a population of multiple sclerosis (MS) patients using dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) to identify active brain lesions. At each visit, a contrast agent is administered intravenously to a subject, and a series of images is acquired to reveal the location and activity of MS lesions within the brain. Our goal is to identify and quantify lesion enhancement locations at the subject level and lesion enhancement patterns at the population level. With this example, we aim to address the difficult problem of transforming a qualitative scientific null hypothesis, such as "this voxel does not enhance", into a well-defined and numerically testable null hypothesis based on existing data. We call the procedure "soft null hypothesis" testing, as opposed to standard "hard null hypothesis" testing. This problem is fundamentally different from 1) testing when a quantitative null hypothesis is given; 2) clustering using a mixture distribution; or 3) identifying a reasonable threshold with a parametric null assumption. We analyze a total of 20 subjects scanned at 63 visits (~30 Gb), the largest population of such clinical brain images.

preprint2013arXiv

Sparse Median Graphs Estimation in a High Dimensional Semiparametric Model

In this manuscript a unified framework for conducting inference on complex aggregated data in high dimensional settings is proposed. The data are assumed to be a collection of multiple non-Gaussian realizations with underlying undirected graphical structures. Utilizing the concept of median graphs in summarizing the commonality across these graphical structures, a novel semiparametric approach to modeling such complex aggregated data is provided along with robust estimation of the median graph, which is assumed to be sparse. The estimator is proved to be consistent in graph recovery and an upper bound on the rate of convergence is given. Experiments on both synthetic and real datasets are conducted to illustrate the empirical usefulness of the proposed models and methods.

preprint2013arXiv

Sparse Nonparametric Graphical Models

We present some nonparametric methods for graphical modeling. In the discrete case, where the data are binary or drawn from a finite alphabet, Markov random fields are already essentially nonparametric, since the cliques can take only a finite number of values. Continuous data are different. The Gaussian graphical model is the standard parametric model for continuous data, but it makes distributional assumptions that are often unrealistic. We discuss two approaches to building more flexible graphical models. One allows arbitrary graphs and a nonparametric extension of the Gaussian; the other uses kernel density estimation and restricts the graphs to trees and forests. Examples of both methods are presented. We also discuss possible future research directions for nonparametric graphical modeling.
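For the tree-and-forest approach, structure learning is typically a Chow-Liu style maximum-weight spanning tree over estimated pairwise mutual informations, stopped early to yield a forest. A minimal sketch, assuming the mutual-information matrix has already been estimated (for example from kernel density estimates, not shown) and that the number of edges is given rather than chosen by held-out likelihood; the function name is hypothetical.

```python
def max_weight_spanning_forest(weights, max_edges):
    """Kruskal's algorithm on a symmetric weight matrix, stopping after max_edges edges."""
    d = len(weights)
    parent = list(range(d))

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # candidate edges (weight, i, j) sorted from heaviest to lightest
    edges = sorted(
        ((weights[i][j], i, j) for i in range(d) for j in range(i + 1, d)),
        reverse=True,
    )
    forest = []
    for _, i, j in edges:
        if len(forest) >= max_edges:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            forest.append((i, j))
    return forest
```

Allowing d − 1 edges gives the full Chow-Liu tree; fewer edges give a forest, which restricts the graph to trees and forests as described above.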

preprint2013arXiv

Sparse Principal Component Analysis for High Dimensional Vector Autoregressive Models

We study sparse principal component analysis for high dimensional vector autoregressive time series under a doubly asymptotic framework, which allows the dimension $d$ to scale with the series length $T$. We treat the transition matrix of the time series as a nuisance parameter and directly apply sparse principal component analysis to the multivariate time series as if the data were independent. We provide explicit non-asymptotic rates of convergence for leading eigenvector estimation and extend this result to principal subspace estimation. Our analysis illustrates that the spectral norm of the transition matrix plays an essential role in determining the final rates. We also characterize sufficient conditions under which sparse principal component analysis attains the optimal parametric rate. Our theoretical results are backed up by thorough numerical studies.

preprint2013arXiv

Statistical Study of Deep Sub-Micron Dual-Gated Field-Effect Transistors on Monolayer CVD Molybdenum Disulfide Films

Monolayer molybdenum disulfide (MoS2), with a direct band gap of 1.8 eV, is a promising two-dimensional material with the potential to surpass graphene in next-generation nanoelectronic applications. In this letter, we synthesize monolayer MoS2 on Si/SiO2 substrates via chemical vapor deposition (CVD) and comprehensively study the performance of dual-gated MoS2 field-effect transistors. Over 100 devices are studied to obtain a statistical description of device performance in CVD MoS2. We scale the channel length of the transistors down to 100 nm and achieve a drain current of 62.5 mA/mm, the highest ever reported for CVD monolayer MoS2 films. We further extract the intrinsic contact resistance of the low work function metal Ti on monolayer CVD MoS2, with an expectation value of 175 Ω·mm, which can be significantly decreased to 10 Ω·mm by appropriate gating. Finally, field-effect mobilities (μFE) of the carriers at various channel lengths are obtained. Taking the impact of contact resistance into account, the average and maximum intrinsic μFE are estimated to be 13.0 and 21.6 cm2/Vs, respectively, in monolayer CVD MoS2 films.

preprint2013arXiv

Switching Mechanism in Single-Layer Molybdenum Disulfide Transistors: an Insight into Current Flow across Schottky Barriers

In this article, we study the properties of metal contacts to single-layer molybdenum disulfide (MoS2) crystals, revealing the nature of the switching mechanism in MoS2 transistors. By investigating transistor behavior as the contact length changes, we find that the contact resistivity of metal/MoS2 junctions is defined by the contact area instead of the contact width. The minimum gate-dependent transfer length is ~0.63 μm in the on-state for Ti-contacted single-layer MoS2. These results reveal that MoS2 transistors are Schottky barrier transistors, in which the on/off states are switched by tuning the Schottky barriers at the contacts. The effective heights of the source and drain barriers are primarily controlled by the gate and drain biases, respectively. We discuss the drain-induced barrier narrowing effect in short-channel devices, which may reduce the influence of the large contact resistance of MoS2 Schottky barrier transistors at the channel length scaling limit.

preprint2012arXiv

Channel Length Scaling of MoS2 MOSFETs

In this article, we investigate electrical transport properties in ultrathin body (UTB) MoS2 two-dimensional (2D) crystals with channel lengths ranging from 2 μm down to 50 nm. We compare the short-channel behavior of sets of MOSFETs with various channel thicknesses and reveal the superior immunity of MoS2 transistors to short-channel effects. We observe no obvious short-channel effects in a device with a 100 nm channel length (Lch) fabricated on a 5 nm thick MoS2 2D crystal, even when using 300 nm thick SiO2 as the gate dielectric; this device also has a current on/off ratio of up to ~10⁹. We also observe on-current saturation in short-channel devices as scaling continues, due to carrier velocity saturation. Finally, we show that the performance limit of short-channel MoS2 transistors is dominated by the large contact resistance from the Schottky barrier at the Ni/MoS2 interface, so a fully transparent contact is needed to achieve a high-performance short-channel device.

preprint2012arXiv

High Dimensional Semiparametric Gaussian Copula Graphical Models

In this paper, we propose a semiparametric approach, named the nonparanormal skeptic, for efficiently and robustly estimating high dimensional undirected graphical models. To achieve modeling flexibility, we consider Gaussian copula graphical models (or the nonparanormal) as proposed by Liu et al. (2009). To achieve estimation robustness, we exploit nonparametric rank-based correlation coefficient estimators, including Spearman's rho and Kendall's tau. In high dimensional settings, we prove that the nonparanormal skeptic achieves the optimal parametric rate of convergence in both graph and parameter estimation. This result suggests that Gaussian copula graphical models can be used as a safe replacement for the popular Gaussian graphical models, even when the data are truly Gaussian. Besides the theoretical analysis, we also conduct thorough numerical simulations to compare the graph recovery performance of different estimators under both ideal and noisy settings. The proposed methods are then applied to a large-scale genomic dataset to illustrate their empirical usefulness. The R package huge implementing the proposed methods is available on the Comprehensive R Archive Network: http://cran.r-project.org/.
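The Spearman-based version of the estimator replaces each Pearson correlation with 2 sin(π ρ̂_jk / 6), where ρ̂_jk is Spearman's rho; for Gaussian data this transformation recovers the Pearson correlation. A minimal sketch, assuming continuous data without ties; the resulting matrix would then be plugged into a sparse precision-matrix estimator (for example the graphical lasso) to recover the graph, a step not shown here. This is not the huge package's implementation, and the function name is hypothetical.

```python
import numpy as np


def spearman_skeptic(X):
    """Nonparanormal SKEPTIC correlation estimate based on Spearman's rho.

    X: array of shape (n samples, d variables) with continuous, untied entries.
    """
    # ranks within each column; Spearman's rho is Pearson's r of the ranks
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    rho = np.corrcoef(ranks, rowvar=False)
    S = 2.0 * np.sin(np.pi / 6.0 * rho)
    np.fill_diagonal(S, 1.0)
    return S
```

Since only ranks enter, applying strictly increasing transformations such as x³ or exp to the columns of latent Gaussian data gives exactly the same estimate as the untransformed data, which is the robustness property the nonparanormal model relies on.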

preprint · 2012 · arXiv

MoS2 Nanoribbon Transistors: Transition from Depletion-mode to Enhancement-mode by Channel Width Trimming

We study the channel width scaling of back-gated MoS2 metal-oxide-semiconductor field-effect transistors (MOSFETs) from 2 μm down to 60 nm. We reveal that the channel conductance scales linearly with channel width, indicating no evident edge damage for MoS2 nanoribbons with widths down to 60 nm as defined by plasma dry etching. However, these transistors show a strong positive threshold voltage (VT) shift with narrow channel widths of less than 200 nm. Our results also show that transistors with thinner channels have larger VT shifts associated with width scaling. Devices fabricated on a 6 nm thick MoS2 crystal underwent the transition from depletion-mode to enhancement-mode.

preprint · 2012 · arXiv

The Integration of High-k Dielectric on Two-Dimensional Crystals by Atomic Layer Deposition

We investigate the integration of Al2O3 high-k dielectric on two-dimensional (2D) crystals of boron nitride (BN) and molybdenum disulfide (MoS2) by atomic layer deposition (ALD). We demonstrate the feasibility of direct ALD growth on both 2D crystals using trimethylaluminum (TMA) and water as precursors. Through theoretical and experimental studies, we find that the initial ALD cycles play a critical role, during which physical adsorption dominates precursor adsorption at the semiconductor surface. We model the initial ALD growth stages at the 2D surface by analyzing Lennard-Jones potentials, which could guide future optimization of the ALD process on 2D crystals.
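For reference, the standard 12-6 Lennard-Jones pair potential is V(r) = 4ε[(σ/r)^12 − (σ/r)^6], whose minimum V = −ε lies at r = 2^(1/6)σ; the well depth ε sets how strongly a physisorbed molecule is held. The short sketch below evaluates that form. The values of ε and σ are placeholders, not parameters fitted for TMA or water on BN or MoS2, and the abstract does not state which parameters the authors used.

```python
def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones pair potential V(r) = 4*eps*[(sigma/r)**12 - (sigma/r)**6]."""
    if r <= 0:
        raise ValueError("separation r must be positive")
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def equilibrium_distance(sigma):
    """Separation where dV/dr = 0, i.e. the bottom of the well: 2**(1/6) * sigma."""
    return 2.0 ** (1.0 / 6.0) * sigma
```

The potential crosses zero at r = σ and reaches −ε at the equilibrium distance; comparing ε with the thermal energy at the deposition temperature is one simple way to judge how weakly bound a physisorbed precursor is.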

preprint · 2012 · arXiv

The Nonparanormal SKEPTIC

We propose a semiparametric approach, named nonparanormal skeptic, for estimating high dimensional undirected graphical models. In terms of modeling, we consider the nonparanormal family proposed by Liu et al. (2009). In terms of estimation, we exploit nonparametric rank-based correlation coefficient estimators, including Spearman's rho and Kendall's tau. In high dimensional settings, we prove that the nonparanormal skeptic achieves the optimal parametric rate of convergence in both graph and parameter estimation. This result suggests that nonparanormal graphical models are a safe replacement for Gaussian graphical models, even when the data are Gaussian.
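The Spearman-based version of the skeptic uses the analogous identity Σ_jk = 2 sin(π/6 · ρ_jk), where ρ_jk is Spearman's rank correlation. A minimal sketch, with our own function names, computes ρ as the Pearson correlation of average ranks (so ties are handled) and then applies the transform to one pair of variables.

```python
import math


def ranks(values):
    """1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def skeptic_spearman_entry(x, y):
    """Latent correlation estimate 2*sin(pi/6 * rho) from Spearman's rho."""
    rho = pearson(ranks(x), ranks(y))
    return 2.0 * math.sin(math.pi / 6.0 * rho)
```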

preprint · 2012 · arXiv

TIGER: A Tuning-Insensitive Approach for Optimally Estimating Gaussian Graphical Models

We propose a new procedure for estimating high dimensional Gaussian graphical models. Our approach is asymptotically tuning-free and non-asymptotically tuning-insensitive: it requires very little effort to choose the tuning parameter in finite sample settings. Computationally, our procedure is significantly faster than existing methods because of its tuning-insensitive property. Theoretically, the obtained estimator is simultaneously minimax optimal for precision matrix estimation under different norms. Empirically, we illustrate the advantages of our method using thorough simulations and real-data examples. The R package bigmatrix implementing the proposed methods is available on the Comprehensive R Archive Network: http://cran.r-project.org/.
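The abstract does not spell out the estimator. A standard building block for column-by-column precision matrix estimation, and, as we understand it, the basis of TIGER (which solves a penalized square-root Lasso regression per column), is the identity linking Θ = Σ^(-1) to the regression of X_j on the remaining variables: with coefficients β_j and residual variance τ_j^2, Θ_jj = 1/τ_j^2 and Θ_kj = −β_jk/τ_j^2. The sketch below checks that identity on a known covariance using exact, unpenalized regression; it has no sparsity penalty or tuning parameter, is not the TIGER algorithm itself, and its function names are ours.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def precision_column(sigma, j):
    """Column j of inv(sigma) via the population regression of X_j on the other variables."""
    d = len(sigma)
    rest = [k for k in range(d) if k != j]
    a = [[sigma[r][c] for c in rest] for r in rest]
    b = [sigma[r][j] for r in rest]
    beta = solve(a, b)                                              # regression coefficients
    tau2 = sigma[j][j] - sum(sigma[j][k] * bk for k, bk in zip(rest, beta))  # residual variance
    col = [0.0] * d
    col[j] = 1.0 / tau2
    for k, bk in zip(rest, beta):
        col[k] = -bk / tau2
    return col
```

In a sample-based estimator, the exact regression is replaced by a sparse, penalized regression on data, which is where the choice of tuning parameter enters.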

preprint · 2011 · arXiv

Atomic-Layer-Deposited Al2O3 on Bi2Te3 for Topological Insulator Field-Effect Transistors

We report dual-gate modulation of topological insulator field-effect transistors (TI FETs) made on Bi2Te3 thin flakes with integration of atomic-layer-deposited (ALD) Al2O3 high-k dielectric. Atomic force microscopy study shows that ALD Al2O3 is uniformly grown on this layer-structured channel material. Electrical characterization reveals that the right selection of ALD precursors and the related surface chemistry play a critical role in device performance of Bi2Te3 based TI FETs. We realize both top-gate and bottom-gate control on these devices, and the highest modulation rate of 76.1% is achieved by using simultaneous dual gate control.

preprint · 2011 · arXiv

Compressive Network Analysis

Modern data acquisition routinely produces massive amounts of network data. Though many methods and models have been proposed to analyze such data, research on network data remains largely disconnected from the classical theory of statistical learning and signal processing. In this paper, we present a new framework for modeling network data that connects two seemingly different areas: network data analysis and compressed sensing. From a nonparametric perspective, we model an observed network using a large dictionary. In particular, we consider the network clique detection problem and show connections between our formulation and a new algebraic tool, namely Radon basis pursuit in homogeneous spaces. This connection allows us to identify rigorous recovery conditions for clique detection problems. Though this paper is mainly conceptual, we also develop practical approximation algorithms for solving empirical problems and demonstrate their usefulness on real-world datasets.

preprint · 2011 · arXiv

MoS2 Dual-Gate MOSFET with Atomic-Layer-Deposited Al2O3 as Top-Gate Dielectric

We demonstrate atomic-layer-deposited (ALD) high-k dielectric integration on two-dimensional (2D) layer-structured molybdenum disulfide (MoS2) crystals and MoS2 dual-gate n-channel MOSFETs with ALD Al2O3 as the top-gate dielectric. Our C-V study of MOSFET structures shows a good interface between the 2D MoS2 crystal and ALD Al2O3. Maximum drain currents using the back gate and the top gate are measured to be 7.07 mA/mm and 6.42 mA/mm, respectively, at Vds = 2 V, with a channel width of 3 μm, a channel length of 9 μm, and a top-gate length of 3 μm. The highest electron field-effect mobility, obtained with back-gate control, is 517 cm^2/Vs. The highest current on/off ratio is over 10^8.
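The abstract does not state how the mobility was extracted. The usual linear-region definition of field-effect mobility, and a conversion of the quoted width-normalized currents into absolute currents using the stated 3 μm channel width, are:

```latex
\mu_{\mathrm{FE}} = \frac{L}{W\,C_{\mathrm{ox}}\,V_{ds}}\,\frac{\partial I_d}{\partial V_{gs}},
\qquad
C_{\mathrm{ox}} = \frac{\varepsilon_0\,\varepsilon_r}{t_{\mathrm{ox}}}

I_d^{\mathrm{back}} = 7.07~\mathrm{mA/mm} \times 3\times10^{-3}~\mathrm{mm} \approx 21.2~\mu\mathrm{A},
\qquad
I_d^{\mathrm{top}} = 6.42~\mathrm{mA/mm} \times 3\times10^{-3}~\mathrm{mm} \approx 19.3~\mu\mathrm{A}
```

Here C_ox is the gate capacitance per unit area, with ε_r and t_ox the relative permittivity and thickness of the gate dielectric in use; the abstract does not give those values for this device.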

preprint · 2010 · arXiv

Forest Density Estimation

We study graph estimation and density estimation in high dimensions, using a family of density estimators based on forest structured undirected graphical models. For density estimation, we do not assume the true distribution corresponds to a forest; rather, we form kernel density estimates of the bivariate and univariate marginals, and apply Kruskal's algorithm to estimate the optimal forest on held out data. We prove an oracle inequality on the excess risk of the resulting estimator relative to the risk of the best forest. For graph estimation, we consider the problem of estimating forests with restricted tree sizes. We prove that finding a maximum weight spanning forest with restricted tree size is NP-hard, and develop an approximation algorithm for this problem. Viewing the tree size as a complexity parameter, we then select a forest using data splitting, and prove bounds on excess risk and structure selection consistency of the procedure. Experiments with simulated data and microarray data indicate that the methods are a practical alternative to Gaussian graphical models.
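The forest-fitting step named in the abstract is Kruskal's algorithm for a maximum-weight spanning forest. A minimal sketch follows, assuming the edge weights are already estimated; in this setting they would typically be estimated mutual informations computed from the kernel density estimates, but here they are hypothetical inputs. The abstract describes using held-out data to decide how far to grow the forest; the sketch replaces that with an explicit edge cap `max_edges`, and it does not address the NP-hard restricted-tree-size variant.

```python
def max_weight_forest(n_nodes, weighted_edges, max_edges=None):
    """Kruskal's algorithm on edges taken in decreasing order of weight.

    weighted_edges: iterable of (weight, u, v) with nodes numbered 0..n_nodes-1.
    max_edges: optional cap on the number of edges, yielding a forest.
    Returns the selected (u, v) edges in the order they were added.
    """
    parent = list(range(n_nodes))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    chosen = []
    for _, u, v in sorted(weighted_edges, reverse=True):
        if max_edges is not None and len(chosen) >= max_edges:
            break
        ru, rv = find(u), find(v)
        if ru != rv:          # (u, v) joins two trees, so no cycle is formed
            parent[ru] = rv
            chosen.append((u, v))
    return chosen
```

Stopping Kruskal early keeps the heaviest edges that form no cycle, so every prefix of the edge sequence is itself a forest; this is what makes a simple cutoff (or a held-out criterion) a natural way to trade fit against complexity.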

preprint · 2010 · arXiv

Graph-Valued Regression

Undirected graphical models encode in a graph $G$ the dependency structure of a random vector $Y$. In many applications, it is of interest to model $Y$ given another random vector $X$ as input. We refer to the problem of estimating the graph $G(x)$ of $Y$ conditioned on $X=x$ as "graph-valued regression." In this paper, we propose a semiparametric method for estimating $G(x)$ that builds a tree on the $X$ space just as in CART (classification and regression trees), but at each leaf of the tree estimates a graph. We call the method "Graph-optimized CART," or Go-CART. We study the theoretical properties of Go-CART using dyadic partitioning trees, establishing oracle inequalities on risk minimization and tree partition consistency. We also demonstrate the application of Go-CART to a meteorological dataset, showing how graph-valued regression can provide a useful tool for analyzing complex data.
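Go-CART partitions the $X$ space with a tree and fits a separate graph in each leaf. The sketch below illustrates only the dyadic-partition idea for a scalar $X$ in [0, 1]. The depth is fixed rather than chosen by risk minimization, and each leaf's graph comes from thresholding absolute sample correlations at a hypothetical level `tau`, a simple stand-in for the penalized graph estimator a real implementation would use. Function names and the threshold rule are ours, not the paper's.

```python
import math


def correlation(x, y):
    """Sample Pearson correlation; returns 0.0 if either sample is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def threshold_graph(rows, tau):
    """Edges (j, k) whose absolute sample correlation exceeds tau (stand-in estimator)."""
    d = len(rows[0])
    cols = [[row[k] for row in rows] for k in range(d)]
    return {(j, k) for j in range(d) for k in range(j + 1, d)
            if abs(correlation(cols[j], cols[k])) > tau}


def dyadic_graphs(xs, ys, depth, tau):
    """Split [0, 1] into 2**depth equal cells and estimate one graph per cell.

    xs: scalar covariates in [0, 1]; ys: matching rows of the response vector Y.
    Returns {(lo, hi): edge set} for every cell holding at least two observations.
    """
    n_cells = 2 ** depth
    cells = {}
    for x, y in zip(xs, ys):
        idx = min(int(x * n_cells), n_cells - 1)   # x == 1.0 falls in the last cell
        cells.setdefault(idx, []).append(y)
    return {(i / n_cells, (i + 1) / n_cells): threshold_graph(rows, tau)
            for i, rows in sorted(cells.items()) if len(rows) >= 2}
```

In this toy setup, if $Y_1$ and $Y_2$ move together for small $x$ but are uncorrelated for large $x$, a depth-1 partition recovers an edge in the left cell only, which is the kind of covariate-dependent graph that graph-valued regression targets.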

preprint · 2010 · arXiv

Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models

A challenging problem in estimating high-dimensional graphical models is to choose the regularization parameter in a data-dependent way. The standard techniques include $K$-fold cross-validation ($K$-CV), Akaike information criterion (AIC), and Bayesian information criterion (BIC). Though these methods work well for low-dimensional problems, they are not suitable in high dimensional settings. In this paper, we present StARS: a new stability-based method for choosing the regularization parameter in high dimensional inference for undirected graphs. The method has a clear interpretation: we use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling. This interpretation requires essentially no conditions. Under mild conditions, we show that StARS is partially sparsistent in terms of graph estimation: i.e. with high probability, all the true edges will be included in the selected model even when the graph size diverges with the sample size. Empirically, the performance of StARS is compared with the state-of-the-art model selection procedures, including $K$-CV, AIC, and BIC, on both synthetic data and a real microarray dataset. StARS outperforms all these competing procedures.
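StARS measures how often each candidate edge is selected across random subsamples. For an edge selected with frequency θ, the instability 2θ(1 − θ) is averaged over all candidate edges to give D(λ); this is made monotone by taking, at each λ, the maximum over all more heavily regularized values, and the smallest λ whose monotonized instability stays at or below a cutoff β is chosen. A minimal sketch follows, assuming the per-subsample edge sets have already been produced by some graph estimator (not shown). The function names are ours, and β = 0.05 is used as a default cutoff.

```python
def selection_frequencies(edge_sets, n_vars):
    """Fraction of subsamples in which each candidate edge (j < k) was selected."""
    pairs = [(j, k) for j in range(n_vars) for k in range(j + 1, n_vars)]
    return [sum((j, k) in s for s in edge_sets) / len(edge_sets) for j, k in pairs]


def edge_instability(freqs):
    """Average of 2*theta*(1 - theta) over all candidate edges."""
    return sum(2.0 * t * (1.0 - t) for t in freqs) / len(freqs)


def stars_select(freqs_by_lambda, beta=0.05):
    """Smallest lambda whose monotonized instability is <= beta (None if none qualifies).

    freqs_by_lambda: {lambda: selection frequencies}; larger lambda = more regularization.
    The monotonized instability at lambda is the max of D(t) over all t >= lambda.
    """
    running_max = 0.0
    selected = None
    for lam in sorted(freqs_by_lambda, reverse=True):   # most regularized first
        running_max = max(running_max, edge_instability(freqs_by_lambda[lam]))
        if running_max > beta:
            break          # the monotonized curve can only grow from here on
        selected = lam     # still stable: relax the regularization further
    return selected
```

Scanning from the sparsest graph toward denser ones and stopping at the first violation matches the abstract's description: the least amount of regularization that still yields a sparse, replicable graph.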

113 published item(s)

Low Rank Adaptation for Adversarial Perturbation

Beyond PID Controllers: PPO with Neuralized PID Policy for Proton Beam Intensity Control in Mu2e

HS-GCN: Hamming Spatial Graph Convolutional Networks for Recommendation

STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction

A Multi-rater Comparative Study of Automatic Target Localization Methods for Epilepsy Deep Brain Stimulation Procedures

A Simple Meta-learning Paradigm for Zero-shot Intent Classification with Mixture Attention Mechanism

An Optical Parametric Amplifier via $\chi^{(2)}$ in AlGaAs Waveguides

Bregman Proximal Langevin Monte Carlo via Bregman--Moreau Envelopes

Cats: Complementary CNN and Transformer Encoders for Segmentation

Label-enhanced Prototypical Network with Contrastive Learning for Multi-label Few-shot Aspect Category Detection

Learning to Infer Belief Embedded Communication

ModDrop++: A Dynamic Filter Network with Intra-subject Co-training for Multiple Sclerosis Lesion Segmentation with Missing Modalities

Review Polarity-wise Recommender

Survival Prediction of Brain Cancer with Incomplete Radiology, Pathology, Genomics, and Demographic Data

Switch Trajectory Transformer with Distributional Value Approximation for Multi-Task Reinforcement Learning

Synthetic CT Skull Generation for Transcranial MR Imaging-Guided Focused Ultrasound Interventions with Conditional Adversarial Networks

Wasserstein Distributionally Robust Optimization with Wasserstein Barycenters

A Survey on Epistemic (Model) Uncertainty in Supervised Learning: Recent Advances and Applications

BLOCKEYE: Hunting For DeFi Attacks on Blockchain

Converse, Focus and Guess -- Towards Multi-Document Driven Dialogue

High-Temperature Structure Detection in Ferromagnets

"Why is 'Chicago' deceptive?" Towards Building Model-Driven Tutorials for Humans

A Deep Learning based Wearable Healthcare IoT Device for AI-enabled Hearing Assistance Automation

Covariance-based sample selection for heterogeneous data: Applications to gene expression and autism risk gene detection

Enhancing classical target detection performance using nonclassical Light

Enhancing LIDAR performance metrics using continuous-wave photon-pair sources

EQL -- an extremely easy to learn knowledge graph query language, achieving highspeed and precise search

Few-shot Slot Tagging with Collapsed Dependency Transfer and Label-enhanced Task-adaptive Projection Network

Joint measurement of time-frequency entanglement via sum frequency generation

Label-Wise Document Pre-Training for Multi-Label Text Classification

Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees

Neural Polysynthetic Language Modelling

Non-classical Semiconductor Photon Sources Enhancing the Performance of Classical Target Detection Systems

Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python

SDFN: Segmentation-based Deep Fusion Network for Thoracic Disease Classification in Chest X-ray Images

Target Detection aided by Quantum Temporal Correlations: Theoretical Analysis and Experimental Validation

The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation in R

The huge Package for High-dimensional Undirected Graph Estimation in R

A Partially Linear Framework for Massive Heterogeneous Data

A Unified Theory of Confidence Regions and Testing for High Dimensional Estimating Equations

ECA: High Dimensional Elliptical Component Analysis in non-Gaussian Distributions

Heterogeneity Adjustment with Applications to Graphical Model Inference

How2Sketch: Generating Easy-To-Follow Tutorials for Sketching 3D Objects

Max-Norm Optimization for Robust Matrix Recovery

Nonparametric Heterogeneity Testing For Massive Data

Provable Sparse Tensor Decomposition

Statistical analysis of latent generalized correlation matrix estimation in transelliptical distribution

Typical dynamics of plane rational maps with equal degrees

A General Framework for Robust Testing and Confidence Regions in High-Dimensional Quantile Regression

A General Theory of Hypothesis Tests and Confidence Regions for Sparse High Dimensional Models

A Likelihood Ratio Framework for High Dimensional Semiparametric Regression

An Overview on the Estimation of Large Covariance and Precision Matrices

Distributed Estimation and Inference with Statistical Guarantees

Graphical Fermat's Principle and Triangle-Free Graph Estimation

High Dimensional Expectation-Maximization Algorithm: Statistical Optimization and Asymptotic Normality

Large Covariance Estimation through Elliptical Factor Models

Local and Global Inference for High Dimensional Nonparanormal Graphical Models

On Semiparametric Exponential Family Graphical Models

Optimal computational and statistical rates of convergence for sparse nonconvex learning problems

Optimal linear estimation under unknown nonlinear transform

QUADRO: A supervised dimension reduction method via Rayleigh quotient optimization

Robust Inference of Risks of Large Portfolios

Sharp Computational-Statistical Phase Transitions via Oracle Computational Model

Sparse Nonlinear Regression: Parameter Estimation and Asymptotic Inference

Statistical Limits of Convex Relaxations

The Knowledge Gradient Policy Using A Sparse Additive Belief Model

Towards High-Performance Two-Dimensional Black Phosphorus Optoelectronic Devices: the Role of Metal Contacts

A Direct Estimation of High Dimensional Stationary Vector Autoregressions

Black Phosphorus-Monolayer MoS2 van der Waals Heterojunction P-N Diode

Challenges of Big Data Analysis

Chloride Molecular Doping Technique on 2D Materials: WS2 and MoS2

Contact Research Strategy for Emerging Molybdenum Disulfide and Other Two-Dimensional Field-effect Transistors

Device Perspective for Black Phosphorus Field-Effect Transistors: Contact Resistance, Ambipolar and Scaling

High Dimensional Semiparametric Latent Graphical Model for Mixed Data