Source author record

Tao Lei

Tao Lei appears in the imported research catalog. Authorship, coauthor, and topic links are available, although profile ownership has not yet been claimed.

Researcher · Unclaimed source record

Computation and Language, Artificial Intelligence, cond-mat.mes-hall, cond-mat.mtrl-sci, Machine Learning, Neural and Evolutionary Computing, Computer Vision, eess.AS, eess.IV, Human-Computer Interaction, Information Retrieval, math.PR, Sound

Catalog footprint


11 works

13 topics

4 close collaborators


Preprint · 2022 · arXiv

Simple Recurrence Improves Masked Language Models

In this work, we explore whether building recurrence into the Transformer architecture can be both beneficial and efficient, by adding an extremely simple recurrent module to the Transformer. We compare our model to baselines following the training and evaluation recipe of BERT. Our results confirm that recurrence can indeed improve Transformer models by a consistent margin, without requiring low-level performance optimizations and while keeping the number of parameters constant. For example, our base model achieves an absolute improvement of 2.1 points averaged across 10 tasks and also shows increased stability in fine-tuning over a range of learning rates.
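The abstract does not spell out the recurrent module, so the following is only a rough illustration. It assumes an SRU-style elementwise recurrence, c_t = f_t ⊙ c_{t-1} + (1 - f_t) ⊙ u_t, in which the candidate u_t and the forget-gate pre-activations are projections of the layer input computed ahead of time. The gating form and the function names are assumptions for this sketch, not the paper's module.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def simple_recurrence(u: List[Vector], f_pre: List[Vector], c0: Vector) -> List[Vector]:
    """Assumed SRU-style recurrence: c_t = f_t * c_{t-1} + (1 - f_t) * u_t, elementwise.

    u[t] and f_pre[t] are hypothetical precomputed projections of the input at
    step t, and f_t = sigmoid(f_pre[t]). Returns the states c_1..c_T.
    """
    c = list(c0)
    states = []
    for u_t, g_t in zip(u, f_pre):
        f_t = [sigmoid(g) for g in g_t]
        # The forget gate mixes the previous state with the new candidate.
        c = [f * c_prev + (1.0 - f) * x for f, c_prev, x in zip(f_t, c, u_t)]
        states.append(list(c))
    return states
```

Because the projections can be batched over all time steps, only the cheap elementwise loop is sequential, which is one plausible reason a module of this kind can stay efficient without custom kernels.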

Preprint · 2021 · arXiv

Medical Image Segmentation Using Deep Learning: A Survey

Deep learning has been widely used for medical image segmentation, and a large number of papers have documented its success in the field. In this paper, we present a comprehensive thematic survey of medical image segmentation using deep learning techniques. This paper makes two original contributions. First, whereas traditional surveys directly divide the deep learning literature on medical image segmentation into many groups and describe each group in detail, we classify the currently popular literature according to a multi-level structure, from coarse to fine. Second, this paper focuses on supervised and weakly supervised learning approaches and excludes unsupervised approaches, since these have been covered in many earlier surveys and are not currently popular. For supervised learning approaches, we analyze the literature in three aspects: the selection of backbone networks, the design of network blocks, and the improvement of loss functions. For weakly supervised learning approaches, we review the literature on data augmentation, transfer learning, and interactive segmentation separately. Compared with existing surveys, this survey classifies the literature very differently, making it easier for readers to understand the relevant rationale and guiding them toward appropriate improvements in deep-learning-based medical image segmentation.

Preprint · 2020 · arXiv

ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition

In this paper we present state-of-the-art (SOTA) performance on the LibriSpeech corpus with two novel neural network architectures, a multistream CNN for acoustic modeling and a self-attentive simple recurrent unit (SRU) for language modeling. In the hybrid ASR framework, the multistream CNN acoustic model processes an input of speech frames in multiple parallel pipelines where each stream has a unique dilation rate for diversity. Trained with the SpecAugment data augmentation method, it achieves relative word error rate (WER) improvements of 4% on test-clean and 14% on test-other. We further improve the performance via N-best rescoring using a 24-layer self-attentive SRU language model, achieving WERs of 1.75% on test-clean and 4.46% on test-other.
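N-best rescoring, mentioned in the abstract, reranks the first-pass decoder's candidate transcriptions using a second-pass language model. Below is a minimal sketch, assuming a log-linear combination with a single tunable weight. The `Hypothesis` fields, `lm_weight`, and both functions are hypothetical, and the self-attentive SRU language model itself is not implemented here. The second helper shows how a relative WER improvement such as the 4% quoted in the abstract is computed.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Hypothesis:
    text: str
    first_pass_score: float  # log-score from the first-pass (hybrid ASR) decoder
    lm_score: float          # log-probability from the second-pass language model


def rescore_nbest(hyps: Sequence[Hypothesis], lm_weight: float = 0.5) -> List[Hypothesis]:
    """Rerank by first_pass_score + lm_weight * lm_score (assumed log-linear combination)."""
    return sorted(hyps, key=lambda h: h.first_pass_score + lm_weight * h.lm_score, reverse=True)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction; 0.04 means a 4% relative improvement."""
    return (baseline_wer - new_wer) / baseline_wer
```

In practice `lm_weight` would be tuned on a development set; with `lm_weight=0` the ranking simply falls back to the first-pass order.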

Preprint · 2020 · arXiv

Interactive Classification by Asking Informative Questions

We study the potential for interaction in natural language classification. We add a limited form of interaction for intent classification, where users provide an initial query in natural language and the system asks for additional information using binary or multiple-choice questions. At each turn, our system decides between asking the most informative question and making the final classification prediction. The simplicity of the model allows the system to be bootstrapped without interaction data, relying instead on simple crowdsourcing tasks. We evaluate our approach on two domains, showing the benefit of interaction and the advantage of learning to balance between asking additional questions and making the final prediction.
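The abstract describes choosing, at each turn, between asking the most informative question and predicting. The sketch below is a hand-written stand-in for that decision, not the paper's learned policy. It assumes a posterior over intents, a hypothetical table of answer probabilities p(answer | intent) for each question, expected information gain as the informativeness score, and a fixed confidence threshold for stopping.

```python
import math
from typing import Dict

Dist = Dict[str, float]  # intent -> probability


def entropy(p: Dist) -> float:
    return -sum(v * math.log(v) for v in p.values() if v > 0)


def update(prior: Dist, answer_probs: Dict[str, Dist], answer: str) -> Dist:
    """Bayes update p(y | a) proportional to p(y) * p(a | y)."""
    unnorm = {y: prior[y] * answer_probs[y].get(answer, 0.0) for y in prior}
    z = sum(unnorm.values())
    return {y: v / z for y, v in unnorm.items()} if z > 0 else dict(prior)


def expected_info_gain(prior: Dist, answer_probs: Dict[str, Dist]) -> float:
    """H(Y) minus the expected posterior entropy over possible answers."""
    answers = {a for dist in answer_probs.values() for a in dist}
    gain = entropy(prior)
    for a in answers:
        p_a = sum(prior[y] * answer_probs[y].get(a, 0.0) for y in prior)
        if p_a > 0:
            gain -= p_a * entropy(update(prior, answer_probs, a))
    return gain


def next_action(prior: Dist, questions: Dict[str, Dict[str, Dist]], threshold: float = 0.9) -> str:
    """Return 'predict:<intent>' if confident enough, else 'ask:<question>'."""
    best_intent = max(prior, key=prior.get)
    if prior[best_intent] >= threshold or not questions:
        return "predict:" + best_intent
    best_q = max(questions, key=lambda q: expected_info_gain(prior, questions[q]))
    return "ask:" + best_q
```

A question whose answer distribution is identical for every intent has zero expected gain, so the rule never asks it; the paper instead learns when to stop asking.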

Preprint · 2020 · arXiv

Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport

Selecting input features of top relevance has become a popular method for building self-explaining models. In this work, we extend this selective rationalization approach to text matching, where the goal is to jointly select and align text pieces, such as tokens or sentences, as a justification for the downstream prediction. Our approach employs optimal transport (OT) to find a minimal cost alignment between the inputs. However, directly applying OT often produces dense and therefore uninterpretable alignments. To overcome this limitation, we introduce novel constrained variants of the OT problem that result in highly sparse alignments with controllable sparsity. Our model is end-to-end differentiable using the Sinkhorn algorithm for OT and can be trained without any alignment annotations. We evaluate our model on the StackExchange, MultiNews, e-SNLI, and MultiRC datasets. Our model achieves very sparse rationale selections with high fidelity while preserving prediction accuracy compared to strong attention baseline models.
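The Sinkhorn algorithm named in the abstract alternately rescales the rows and columns of a Gibbs kernel exp(-C/ε) until the transport plan's marginals match the two input distributions. The sketch below is plain entropy-regularized OT, assumed here for illustration. It does not include the paper's constrained sparse variants, and its plans are typically dense, which is the limitation the abstract describes. The names `sinkhorn`, `eps`, and `n_iters` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn(cost: Matrix, a: List[float], b: List[float],
             eps: float = 0.1, n_iters: int = 200) -> Matrix:
    """Entropy-regularized OT plan P with row sums ~a and column sums ~b.

    Plain Sinkhorn scaling (assumed for illustration); no sparsity constraints.
    """
    n, m = len(a), len(b)
    # Gibbs kernel: low-cost pairs get large weights.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows to match a, then columns to match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

Every operation is differentiable, which is what allows end-to-end training. For very small `eps` the kernel can underflow, and a log-domain implementation is the usual remedy.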

Preprint · 2020 · arXiv

Spontaneous Formation of a Superconductor-Topological Insulator-Normal Metal Layered Heterostructure

The discovery of graphene has spurred vigorous investigation of 2D materials, revealing a wide range of extraordinary properties and functionalities. 2D heterostructures have recently been fabricated by assembling isolated planes layer by layer in a desired sequence, and unusual properties and novel physical phenomena have been unveiled in such layered heterostructures. For example, Hofstadter's butterfly, an intriguing pattern in the energy spectrum of Bloch electrons, was predicted several decades ago to be observable in conventional materials only under unfeasibly strong magnetic fields. It has recently been observed under current experimental conditions in graphene/BN layered heterostructures, one of the outstanding new kinds of 2D materials. Moreover, another remarkable physical phenomenon, Majorana fermions, was predicted to exist in heterostructures consisting of a superconductor (SC) and a topological insulator (TI).

Preprint · 2016 · arXiv

Rationalizing Neural Predictions

Prediction without justification has limited applicability. As a remedy, we learn to extract pieces of input text as justifications -- rationales -- that are tailored to be short and coherent, yet sufficient for making the same prediction. Our approach combines two modular components, a generator and an encoder, which are trained to operate well together. The generator specifies a distribution over text fragments as candidate rationales, and these are passed through the encoder for prediction. Rationales are never given during training. Instead, the model is regularized by desiderata for rationales. We evaluate the approach on multi-aspect sentiment analysis against manually annotated test cases. Our approach outperforms an attention-based baseline by a significant margin. We also successfully illustrate the method on the question retrieval task.
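The abstract says rationales are never supervised; instead the model is regularized toward short, coherent selections. One common way to express those desiderata, assumed here rather than taken from the abstract, is a penalty on a binary token mask z that charges for each selected token and for each switch between selected and unselected tokens. The helper names and the weights `lambda_len` and `lambda_coh` are hypothetical.

```python
from typing import List, Sequence


def rationale_penalty(z: Sequence[int], lambda_len: float = 1.0, lambda_coh: float = 1.0) -> float:
    """Assumed regularizer on a binary mask z (1 = token kept in the rationale).

    The length term favors short rationales; the transition term favors
    contiguous (coherent) spans.
    """
    length = sum(z)
    transitions = sum(abs(z[t] - z[t - 1]) for t in range(1, len(z)))
    return lambda_len * length + lambda_coh * transitions


def extract_rationale(tokens: Sequence[str], z: Sequence[int]) -> List[str]:
    """Keep only the tokens selected by the mask."""
    return [tok for tok, keep in zip(tokens, z) if keep]
```

Two masks selecting the same number of tokens are penalized differently: a single contiguous span costs less than scattered tokens, which is the "coherent" desideratum in the abstract.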

Preprint · 2016 · arXiv

Semi-supervised Question Retrieval with Gated Convolutions

Question answering forums are rapidly growing in size, with no effective automated way to refer to and reuse answers already available for previously posted questions. In this paper, we develop a methodology for finding semantically related questions. The task is difficult because 1) key pieces of information are often buried in extraneous details in the question body and 2) available annotations on similar questions are scarce and fragmented. We design a recurrent and convolutional model (gated convolution) to effectively map questions to their semantic representations. The models are pre-trained within an encoder-decoder framework (from body to title) on the entire raw corpus and fine-tuned discriminatively on limited annotations. Our evaluation demonstrates that our model yields substantial gains over a standard IR baseline and various neural network architectures (including CNNs, LSTMs, and GRUs).
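Finding semantically related questions ultimately reduces to encoding each question and ranking candidates by similarity. In the sketch below, `encode` is a hypothetical bag-of-words placeholder standing in for a learned encoder such as the gated-convolution model, and cosine similarity is an assumed scoring function. Only the ranking step is meant to carry over.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def encode(text: str) -> Dict[str, float]:
    """Hypothetical placeholder encoder: bag-of-words counts.

    Stands in for a learned question encoder; it is NOT the gated-convolution model.
    """
    return {word: float(count) for word, count in Counter(text.lower().split()).items()}


def cosine(p: Dict[str, float], q: Dict[str, float]) -> float:
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def rank_similar(query: str, candidates: List[str]) -> List[Tuple[float, str]]:
    """Return (score, candidate) pairs, most similar first."""
    q = encode(query)
    scored = [(cosine(q, encode(c)), c) for c in candidates]
    return sorted(scored, key=lambda s: s[0], reverse=True)
```

Swapping in a learned encoder changes only `encode`; the ranking code stays the same.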

Preprint · 2015 · arXiv

Molding CNNs for text: non-linear, non-consecutive convolutions

The success of deep learning often derives from well-chosen operational building blocks. In this work, we revise the temporal convolution operation in CNNs to better adapt it to text processing. Instead of concatenating word representations, we appeal to tensor algebra and use low-rank n-gram tensors to directly exploit interactions between words already at the convolution stage. Moreover, we extend the n-gram convolution to non-consecutive words to recognize patterns with intervening words. Through a combination of low-rank tensors and pattern weighting, we can efficiently evaluate the resulting convolution operation via dynamic programming. We test the resulting architecture on standard sentiment classification and news categorization tasks. Our model achieves state-of-the-art performance in terms of both accuracy and training speed. For instance, we obtain 51.2% accuracy on the fine-grained sentiment classification task.
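The non-consecutive n-gram convolution described above can be evaluated by dynamic programming instead of enumerating every word pair. The sketch below assumes a simplified bigram case. Each position has already been projected into two feature vectors p[t] and q[t] (hypothetical stand-ins for the paper's low-rank projections), and a pair (i, j) with intervening words is down-weighted by lam**(j - i - 1). The DP keeps a decayed running sum of earlier p vectors, giving O(n) work instead of O(n²). A brute-force version is included for comparison. The paper's exact recurrence and normalization may differ.

```python
from typing import List

Vector = List[float]


def noncontiguous_bigrams_dp(p: List[Vector], q: List[Vector], lam: float) -> List[Vector]:
    """g[j] = sum over i < j of lam**(j - i - 1) * p[i] * q[j] (elementwise), in O(n)."""
    dim = len(p[0]) if p else 0
    s = [0.0] * dim  # decayed sum of first-word features p[i] for i < j
    out = []
    for j in range(len(p)):
        out.append([s_k * q_k for s_k, q_k in zip(s, q[j])])
        # Decay older contributions by one more gap step, then add p[j].
        s = [lam * s_k + p_k for s_k, p_k in zip(s, p[j])]
    return out


def noncontiguous_bigrams_brute(p: List[Vector], q: List[Vector], lam: float) -> List[Vector]:
    """Same quantity by explicit O(n^2) enumeration, for checking."""
    dim = len(p[0]) if p else 0
    out = []
    for j in range(len(p)):
        g = [0.0] * dim
        for i in range(j):
            w = lam ** (j - i - 1)
            g = [g_k + w * p_k * q_k for g_k, p_k, q_k in zip(g, p[i], q[j])]
        out.append(g)
    return out
```

Setting `lam = 0` recovers purely consecutive bigrams, since every pair with a gap then gets weight zero.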

Preprint · 2014 · arXiv

Effects of Oxygen Adsorption on the Surface State of Epitaxial Silicene on Ag(111)

Epitaxial silicene, a single layer of silicon atoms packed in a honeycomb structure, interacts strongly with its substrate, which dramatically affects its electronic structure. The role of this electronic coupling between silicene and the substrate in the chemical reactivity of silicene remains unclear, yet it is of great importance for the functionalization of silicene layers. Here, we report the reconstructions and hybridized electronic structures of epitaxial 4×4 silicene on Ag(111), revealed by scanning tunneling microscopy and angle-resolved photoemission spectroscopy. Hybridization between Si and Ag results in a metallic surface state, which gradually decays upon oxygen adsorption. X-ray photoemission spectroscopy confirms the decoupling of Si-Ag bonds after oxygen treatment, as well as the relative oxygen resistance of the Ag(111) surface, in contrast to 4×4 silicene [with respect to Ag(111)]. First-principles calculations confirm the evolution of the electronic structure of silicene during oxidation. Both experiment and theory show that the high chemical activity of 4×4 silicene is attributable to the Si pz state, while the Ag(111) substrate is relatively inert.

Preprint · 2012 · arXiv

The mixing time of the Newman--Watts small world

"Small worlds" are large systems in which any given node has only a few connections to other points, but possessing the property that all pairs of points are connected by a short path, typically logarithmic in the number of nodes. The use of random walks for sampling a uniform element from a large state space is by now a classical technique; to prove that such a technique works for a given network, a bound on the mixing time is required. However, little detailed information is known about the behaviour of random walks on small-world networks, though many predictions can be found in the physics literature. The principal contribution of this paper is to show that for a famous small-world random graph model known as the Newman--Watts small world, the mixing time is of order (log n)^2. This confirms a prediction of Richard Durrett, who proved a lower bound of order (log n)^2 and an upper bound of order (log n)^3.
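The definitions behind this abstract can be made concrete for small graphs. The sketch below assumes a common Newman-Watts variant (an n-cycle plus, independently for each non-cycle pair of vertices, a shortcut with probability c/n), a lazy random walk, and the usual convention that the mixing time is the first t at which the worst-case total variation distance to stationarity is at most 1/4. It computes this exactly by iterating distributions from every start vertex, which is only feasible for small n and says nothing about the asymptotic (log n)^2 result. All function names are hypothetical.

```python
import random
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def newman_watts(n: int, c: float, seed: int = 0) -> Graph:
    """Assumed variant: n-cycle plus each non-cycle pair joined with probability c/n."""
    rng = random.Random(seed)
    adj = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
    for u in range(n):
        for v in range(u + 2, n):
            if (u, v) == (0, n - 1):
                continue  # already a cycle edge
            if rng.random() < c / n:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def lazy_step(dist: List[float], adj: Graph) -> List[float]:
    """Lazy walk: stay put with probability 1/2, else move to a uniform neighbour."""
    new = [0.5 * p for p in dist]
    for u, p in enumerate(dist):
        if p:
            share = 0.5 * p / len(adj[u])
            for v in adj[u]:
                new[v] += share
    return new


def mixing_time(adj: Graph, eps: float = 0.25, t_max: int = 10_000) -> int:
    """First t with max over start vertices of TV(P^t(x, .), pi) <= eps."""
    n = len(adj)
    total_deg = sum(len(adj[v]) for v in range(n))
    pi = [len(adj[v]) / total_deg for v in range(n)]  # stationary law is proportional to degree
    dists = [[1.0 if v == x else 0.0 for v in range(n)] for x in range(n)]
    for t in range(1, t_max + 1):
        dists = [lazy_step(d, adj) for d in dists]
        worst = max(0.5 * sum(abs(d[v] - pi[v]) for v in range(n)) for d in dists)
        if worst <= eps:
            return t
    raise RuntimeError("did not mix within t_max steps")
```

With c = 0 the graph is a plain cycle, whose mixing time grows quadratically in n; adding shortcuts is what brings it down toward the polylogarithmic regime the paper analyzes.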
