Source author record

Nevin L. Zhang

Nevin L. Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Machine Learning, Computation and Language, Information Retrieval, Artificial Intelligence, Computer Vision, Applications

Catalog footprint

What is connected

13 works

6 topics

4 close collaborators

Preprint, 2022, arXiv

Deep Clustering with Features from Self-Supervised Pretraining

A deep clustering model conceptually consists of a feature extractor that maps data points to a latent space, and a clustering head that groups data points into clusters in the latent space. Although the two components used to be trained jointly in an end-to-end fashion, recent work has shown that it is beneficial to train them separately in two stages. In the first stage, the feature extractor is trained via self-supervised learning, which enables the preservation of the cluster structures among the data points. To preserve the cluster structures even better, we propose to replace the first stage with another model that is pretrained on a much larger dataset via self-supervised learning. The method is simple and might suffer from domain shift. Nonetheless, we have empirically shown that it can achieve superior clustering performance. When a vision transformer (ViT) architecture is used for feature extraction, our method achieves clustering accuracies of 94.0%, 55.6% and 97.9% on CIFAR-10, CIFAR-100 and STL-10 respectively. The corresponding previous state-of-the-art results are 84.3%, 47.7% and 80.8%. Our code will be available online with the publication of the paper.
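The two-stage idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the synthetic 2-D points stand in for features from a frozen pretrained extractor, plain k-means is swapped in as the clustering head (the abstract does not say which head the authors use), and accuracy is computed with the commonly used best one-to-one mapping between clusters and classes, found here by brute force.

```python
import itertools
import numpy as np


def kmeans(features, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [features[rng.integers(len(features))]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        d = np.min([np.sum((features - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(features[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels


def clustering_accuracy(true_labels, cluster_labels, k):
    """Best accuracy over all one-to-one cluster-to-class mappings (small k only)."""
    best = 0.0
    for perm in itertools.permutations(range(k)):
        mapped = np.array([perm[c] for c in cluster_labels])
        best = max(best, float(np.mean(mapped == true_labels)))
    return best


# Stage 1 stand-in: pretend these are features from a frozen pretrained model.
rng = np.random.default_rng(0)
class_means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
true_labels = np.repeat(np.arange(3), 50)
features = class_means[true_labels] + rng.normal(scale=0.5, size=(150, 2))

# Stage 2: fit only the clustering head on the fixed features.
cluster_labels = kmeans(features, k=3)
accuracy = clustering_accuracy(true_labels, cluster_labels, k=3)
print(f"clustering accuracy: {accuracy:.3f}")
```

The brute-force mapping is only practical for a handful of clusters; for datasets such as CIFAR-100 an assignment solver would be needed instead.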

Preprint, 2022, arXiv

Example Perplexity

Some examples are easier for humans to classify than others. The same should be true for deep neural networks (DNNs). We use the term example perplexity to refer to the level of difficulty of classifying an example. In this paper, we propose a method to measure the perplexity of an example and investigate what factors contribute to high example perplexity. The related code and resources are available at https://github.com/vaynexie/Example-Perplexity.

Preprint, 2022, arXiv

Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation

Building models of natural language processing (NLP) is challenging in low-resource scenarios where only limited data are available. Optimization-based meta-learning algorithms achieve promising results in low-resource scenarios by adapting a well-generalized model initialization to handle new tasks. Nonetheless, these approaches suffer from the memorization overfitting issue, where the model tends to memorize the meta-training tasks while ignoring support sets when adapting to new tasks. To address this issue, we propose a memory imitation meta-learning (MemIML) method that enhances the model's reliance on support sets for task adaptation. Specifically, we introduce a task-specific memory module to store support set information and construct an imitation module to force query sets to imitate the behaviors of some representative support-set samples stored in the memory. A theoretical analysis is provided to prove the effectiveness of our method, and empirical results also demonstrate that our method outperforms competitive baselines on both text classification and generation tasks.

Preprint, 2020, arXiv

Handling Collocations in Hierarchical Latent Tree Analysis for Topic Modeling

Topic modeling has been one of the most active research areas in machine learning in recent years. Hierarchical latent tree analysis (HLTA) has been recently proposed for hierarchical topic modeling and has shown superior performance over state-of-the-art methods. However, the models used in HLTA have a tree structure and cannot represent the different meanings of multiword expressions sharing the same word appropriately. Therefore, we propose a method for extracting and selecting collocations as a preprocessing step for HLTA. The selected collocations are replaced with single tokens in the bag-of-words model before running HLTA. Our empirical evaluation shows that the proposed method led to better performance of HLTA on three of the four data sets tested.
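The replacement step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `merge_collocations`, the underscore-joined token format, and the hand-picked `selected` set are all hypothetical, and the extraction and selection of collocations is not shown.

```python
from collections import Counter


def merge_collocations(tokens, collocations):
    """Replace each selected multiword expression with one joined token.

    Matching is greedy and tries longer phrases first.
    """
    by_length = sorted(collocations, key=len, reverse=True)
    merged, i = [], 0
    while i < len(tokens):
        for phrase in by_length:
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                merged.append("_".join(phrase))
                i += len(phrase)
                break
        else:
            # No collocation starts at position i; keep the single word.
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical output of a collocation selection step.
selected = {("neural", "network"), ("social", "network")}
doc = "a neural network differs from a social network".split()

tokens = merge_collocations(doc, selected)
bag = Counter(tokens)  # bag-of-words over the merged tokens
print(tokens)
```

After merging, the two senses of "network" become separate vocabulary entries, so a tree-structured model no longer has to attach a single "network" variable to one place in the tree.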

Preprint, 2020, arXiv

Response-Anticipated Memory for On-Demand Knowledge Integration in Response Generation

Neural conversation models are known to generate appropriate but non-informative responses in general. A scenario where informativeness can be significantly enhanced is Conversing by Reading (CbR), where conversations take place with respect to a given external document. In previous work, the external document is utilized by (1) creating a context-aware document memory that integrates information from the document and the conversational context, and then (2) generating responses referring to the memory. In this paper, we propose to create the document memory with some anticipated responses in mind. This is achieved using a teacher-student framework. The teacher is given the external document, the context, and the ground-truth response, and learns how to build a response-aware document memory from three sources of information. The student learns to construct a response-anticipated document memory from the first two sources, and the teacher's insight on memory creation. Empirical results show that our model outperforms the previous state-of-the-art for the CbR task.

Preprint, 2016, arXiv

A data-driven method for syndrome type identification and classification in traditional Chinese medicine

Objective: The efficacy of traditional Chinese medicine (TCM) treatments for Western medicine (WM) diseases relies heavily on the proper classification of patients into TCM syndrome types. We develop a data-driven method for solving the classification problem, where syndrome types are identified and quantified based on patterns detected in unlabeled symptom survey data. Method: Latent class analysis (LCA) has been applied in WM research to solve a similar problem, i.e., to identify subtypes of a patient population in the absence of a gold standard. A widely known weakness of LCA is that it makes an unrealistically strong independence assumption. We relax the assumption by first detecting symptom co-occurrence patterns from survey data and then using those patterns, instead of the symptoms, as features for LCA. Results: The result of the investigation is a six-step method: data collection, symptom co-occurrence pattern discovery, pattern interpretation, syndrome identification, syndrome type identification, and syndrome type classification. A software package called Lantern is developed to support the application of the method. The method is illustrated using a data set on Vascular Mild Cognitive Impairment (VMCI). Conclusions: A data-driven method for TCM syndrome identification and classification is presented. The method can be used to answer the following questions about a Western medicine disease: What TCM syndrome types are there among the patients with the disease? What is the prevalence of each syndrome type? What are the statistical characteristics of each syndrome type in terms of occurrence of symptoms? How can we determine the syndrome type(s) of a patient?

Preprint, 2016, arXiv

Identification and classification of TCM syndrome types among patients with vascular mild cognitive impairment using latent tree analysis

Objective: To treat patients with vascular mild cognitive impairment (VMCI) using TCM, it is necessary to classify the patients into TCM syndrome types and to apply different treatments to different types. We investigate how to properly carry out the classification using a novel data-driven method known as latent tree analysis. Method: A cross-sectional survey on VMCI was carried out in several regions in northern China from 2008 to 2011, which resulted in a data set that involves 803 patients and 93 symptoms. Latent tree analysis was performed on the data to reveal symptom co-occurrence patterns, and the patients were partitioned into clusters in multiple ways based on the patterns. The patient clusters were matched up with syndrome types, and population statistics of the clusters were used to quantify the syndrome types and to establish classification rules. Results: Eight syndrome types are identified: Qi Deficiency, Qi Stagnation, Blood Deficiency, Blood Stasis, Phlegm-Dampness, Fire-Heat, Yang Deficiency, and Yin Deficiency. The prevalence and symptom occurrence characteristics of each syndrome type are determined. Quantitative classification rules are established for determining whether a patient belongs to each of the syndrome types. Conclusions: A solution for the TCM syndrome classification problem associated with VMCI is established based on the latent tree analysis of unlabeled symptom survey data. The results can be used as a reference in clinical practice to improve the quality of syndrome differentiation and to reduce diagnosis variances across physicians. They can also be used for patient selection in research projects aimed at finding biomarkers for the syndrome types and in randomized controlled trials aimed at determining the efficacy of TCM treatments of VMCI.

Preprint, 2016, arXiv

Latent Tree Analysis

Latent tree analysis seeks to model the correlations among a set of random variables using a tree of latent variables. It was proposed as an improvement to latent class analysis, a method widely used in social sciences and medicine to identify homogeneous subgroups in a population. It provides new and fruitful perspectives on a number of machine learning areas, including cluster analysis, topic detection, and deep probabilistic modeling. This paper gives an overview of the research on latent tree analysis and various ways it is used in practice.

Preprint, 2016, arXiv

Latent Tree Models for Hierarchical Topic Detection

We present a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree models (HLTMs). The variables at the bottom level of an HLTM are observed binary variables that represent the presence/absence of words in a document. The variables at other levels are binary latent variables, with those at the lowest latent level representing word co-occurrence patterns and those at higher levels representing co-occurrence of patterns at the level below. Each latent variable gives a soft partition of the documents, and document clusters in the partitions are interpreted as topics. Latent variables at high levels of the hierarchy capture long-range word co-occurrence patterns and hence give thematically more general topics, while those at low levels of the hierarchy capture short-range word co-occurrence patterns and give thematically more specific topics. Unlike LDA-based topic models, HLTMs do not refer to a document generation process and use word variables instead of token variables. They use a tree structure to model the relationships between topics and words, which is conducive to the discovery of meaningful topics and topic hierarchies.
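The bottom level of such a model can be illustrated with a short sketch of the input representation: one binary variable per vocabulary word, recording presence or absence in each document rather than per-token counts. The function name `binary_word_matrix`, the whitespace tokenizer, and the toy vocabulary are assumptions made for illustration; learning the latent levels is not shown.

```python
import numpy as np


def binary_word_matrix(documents, vocabulary):
    """Return a (documents x words) 0/1 matrix of word presence."""
    index = {word: j for j, word in enumerate(vocabulary)}
    matrix = np.zeros((len(documents), len(vocabulary)), dtype=np.int8)
    for i, doc in enumerate(documents):
        for token in doc.lower().split():
            j = index.get(token)
            if j is not None:
                matrix[i, j] = 1  # repeated tokens still yield a single 1
    return matrix


vocabulary = ["topic", "tree", "latent", "image"]
documents = [
    "Latent tree models give a topic tree",
    "An image is not a topic",
]
X = binary_word_matrix(documents, vocabulary)
print(X)
```

Note that "tree" occurs twice in the first document but contributes a single 1, which is the difference between word variables and token variables mentioned in the abstract.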

Preprint, 2016, arXiv

Topic Browsing for Research Papers with Hierarchical Latent Tree Analysis

Academic researchers often have to deal with a large collection of research papers in the literature. This problem may be even worse for postgraduate students who are new to a field and may not know where to start. To address this problem, we have developed an online catalog of research papers where the papers have been automatically categorized by a topic model. The catalog contains 7719 papers from the proceedings of two artificial intelligence conferences from 2000 to 2015. Rather than the commonly used Latent Dirichlet Allocation, we use a recently proposed method called hierarchical latent tree analysis for topic modeling. The resulting topic model contains a hierarchy of topics so that users can browse the topics from the top level to the bottom level. The topic model contains a manageable number of general topics at the top level and allows thousands of fine-grained topics at the bottom level. It can also detect topics that have emerged recently.

Preprint, 2015, arXiv

Progressive EM for Latent Tree Models and Hierarchical Topic Detection

Hierarchical latent tree analysis (HLTA) was recently proposed as a new method for topic detection. It differs fundamentally from the LDA-based methods in terms of topic definition, topic-document relationship, and learning method. It has been shown to discover significantly more coherent topics and better topic hierarchies. However, HLTA relies on the Expectation-Maximization (EM) algorithm for parameter estimation and hence is not efficient enough to deal with large datasets. In this paper, we propose a method to drastically speed up HLTA using a technique inspired by recent advances in the moments method. Empirical experiments show that our method greatly improves the efficiency of HLTA. It is as efficient as the state-of-the-art LDA-based method for hierarchical topic detection and finds substantially better topics and topic hierarchies.

Preprint, 2014, arXiv

A Survey on Latent Tree Models and Applications

In data analysis, latent variables play a central role because they help provide powerful insights into a wide variety of phenomena, ranging from biological to human sciences. The latent tree model, a particular type of probabilistic graphical model, deserves attention. Its simple structure, a tree, allows simple and efficient inference, while its latent variables capture complex relationships. In the past decade, the latent tree model has been subject to significant theoretical and methodological developments. In this review, we propose a comprehensive study of this model. First we summarize key ideas underlying the model. Second we explain how it can be efficiently learned from data. Third we illustrate its use within three types of applications: latent structure discovery, multidimensional clustering, and probabilistic inference. Finally, we conclude and give promising directions for future research in this field.

Preprint, 2014, arXiv

Latent Tree Models and Approximate Inference in Bayesian Networks

We propose a novel method for approximate inference in Bayesian networks (BNs). The idea is to sample data from a BN, learn a latent tree model (LTM) from the data offline, and then, when online, perform inference with the LTM instead of the original BN. Because LTMs are tree-structured, inference takes linear time. At the same time, they can represent complex relationships among leaf nodes, and hence the approximation accuracy is often good. Empirical evidence shows that our method can achieve good approximation accuracy at low online computational cost.
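The linear-time claim can be illustrated on the simplest tree-structured case: a single latent root Y with observed leaves X_i (a latent class model). This is a sketch under stated assumptions, not the paper's method: the parameters below are made up by hand rather than learned from samples of a Bayesian network, and the function name `leaf_marginal` is hypothetical.

```python
import numpy as np


def leaf_marginal(prior, cpts, evidence, query):
    """Compute P(X_query | evidence) with one latent root Y and leaves X_i.

    prior:    array of shape (S,), P(Y)
    cpts:     list of arrays of shape (S, V_i), rows are P(X_i | Y = s)
    evidence: dict mapping leaf index -> observed value
    query:    index of the unobserved leaf of interest
    """
    message = prior.copy()
    # One multiplication per observed leaf: cost is linear in the number of leaves.
    for leaf, value in evidence.items():
        message = message * cpts[leaf][:, value]
    posterior_y = message / message.sum()  # P(Y | evidence)
    return posterior_y @ cpts[query]       # marginalize out Y


# Hand-picked illustrative parameters (not learned from any data).
prior = np.array([0.5, 0.5])
cpt = np.array([[0.9, 0.1],
                [0.2, 0.8]])
cpts = [cpt, cpt, cpt]

no_evidence = leaf_marginal(prior, cpts, {}, query=2)
with_evidence = leaf_marginal(prior, cpts, {0: 1, 1: 1}, query=2)
print(no_evidence, with_evidence)
```

Observing two leaves in state 1 shifts the belief about the third leaf toward state 1 as well, which shows how the latent variable carries dependence among leaves even though each query costs only one pass over the evidence.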
