Source author record

Alexander Liu

Alexander Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: astro-ph, Computation and Language, Machine Learning

Catalog footprint

What is connected

3 works

3 topics

4 close collaborators

Preprint, 2022, arXiv

Masked Autoencoders As The Unified Learners For Pre-Trained Sentence Representation

Despite progress on pre-trained language models, there is a lack of unified frameworks for pre-trained sentence representation. As a result, different pre-training methods are needed for specific scenarios, and the pre-trained models are likely to be limited in universality and representation quality. In this work, we extend the recently proposed MAE-style pre-training strategy, RetroMAE, so that it can effectively support a wide variety of sentence representation tasks. The extended framework consists of two stages, with RetroMAE conducted throughout the process. The first stage performs RetroMAE over generic corpora, such as Wikipedia and BookCorpus, from which the base model is learned. The second stage takes place on domain-specific data, e.g., MS MARCO and NLI, where the base model is continually trained with RetroMAE and contrastive learning. The pre-training outputs of the two stages may serve different applications, whose effectiveness is verified with comprehensive experiments. Concretely, the base model is shown to be effective for zero-shot retrieval, with remarkable performance achieved on the BEIR benchmark. The continually pre-trained models further benefit more downstream tasks, including domain-specific dense retrieval on MS MARCO and Natural Questions, and the quality of sentence embeddings for standard STS and transfer tasks in SentEval. The empirical insights of this work may inspire the future design of sentence representation pre-training. Our pre-trained models and source code will be released to the public.
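The abstract names contrastive learning as part of the second stage but does not specify the loss. As a rough illustration only, the sketch below shows a common in-batch contrastive (InfoNCE-style) objective, where each query is paired with the passage at the same index and every other passage in the batch acts as a negative. The function names, the dot-product similarity, and the temperature value are all assumptions for this example, not details from the paper.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_contrastive_loss(queries, passages, temperature=0.05):
    """Mean InfoNCE-style loss over a batch (illustrative, assumed form).

    queries[i] is treated as matching passages[i]; all other passages
    in the batch serve as negatives for that query.
    """
    losses = []
    for i, q in enumerate(queries):
        logits = [dot(q, p) / temperature for p in passages]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        # Negative log-probability of the matching passage.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)
```

With toy embeddings, a batch whose queries align with their own passages should score a much lower loss than the same batch with the passages swapped.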

Preprint, 2022, arXiv

Monarch: Expressive Structured Matrices for Efficient and Accurate Training

Large neural networks excel in many domains, but they are expensive to train and fine-tune. A popular approach to reduce their compute or memory requirements is to replace dense weight matrices with structured ones (e.g., sparse, low-rank, Fourier transform). These methods have not seen widespread adoption (1) in end-to-end training due to unfavorable efficiency-quality tradeoffs, and (2) in dense-to-sparse fine-tuning due to lack of tractable algorithms to approximate a given dense weight matrix. To address these issues, we propose a class of matrices (Monarch) that is hardware-efficient (they are parameterized as products of two block-diagonal matrices for better hardware utilization) and expressive (they can represent many commonly used transforms). Surprisingly, the problem of approximating a dense weight matrix with a Monarch matrix, though nonconvex, has an analytical optimal solution. These properties of Monarch matrices unlock new ways to train and fine-tune sparse and dense models. We empirically validate that Monarch can achieve favorable accuracy-efficiency tradeoffs in several end-to-end sparse training applications: speeding up ViT and GPT-2 training on ImageNet classification and Wikitext-103 language modeling by 2x with comparable model quality, and reducing the error on PDE solving and MRI reconstruction tasks by 40%. In sparse-to-dense training, with a simple technique called "reverse sparsification," Monarch matrices serve as a useful intermediate representation to speed up GPT-2 pretraining on OpenWebText by 2x without quality drop. The same technique brings 23% faster BERT pretraining than even the very optimized implementation from Nvidia that set the MLPerf 1.1 record. In dense-to-sparse fine-tuning, as a proof-of-concept, our Monarch approximation algorithm speeds up BERT fine-tuning on GLUE by 1.7x with comparable accuracy.
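The abstract describes Monarch matrices as products of two block-diagonal matrices. A minimal sketch of the resulting matrix-vector product follows, assuming a dimension n = m * m, m blocks of size m x m in each factor, and a fixed reshape-and-transpose permutation applied between and after the factors. The permutation, the block sizes, and all function names are illustrative assumptions and are not stated in the abstract.

```python
def block_diag_matvec(blocks, x):
    """Multiply a block-diagonal matrix, given as a list of b x b blocks, by x."""
    b = len(blocks[0])
    out = []
    for k, block in enumerate(blocks):
        chunk = x[k * b:(k + 1) * b]
        out.extend(sum(row[j] * chunk[j] for j in range(b)) for row in block)
    return out


def permute(x, m):
    """View x as an m x m row-major grid, transpose it, and flatten again."""
    return [x[j * m + i] for i in range(m) for j in range(m)]


def monarch_matvec(left_blocks, right_blocks, x):
    """Apply right factor, permute, apply left factor, permute back (assumed layout)."""
    m = len(right_blocks)
    y = block_diag_matvec(right_blocks, x)
    y = permute(y, m)
    y = block_diag_matvec(left_blocks, y)
    return permute(y, m)


def param_counts(m):
    """Parameter counts for n = m * m: dense n * n versus two block-diagonal factors."""
    n = m * m
    return {"dense": n * n, "monarch": 2 * m * m * m}
```

Under these assumptions the two factors hold 2 * m^3 parameters against m^4 for a dense matrix of the same size, and each product touches only small dense blocks, which is the kind of structure the abstract credits for better hardware utilization.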

Preprint, 2004, arXiv

Evidence for large superhumps in TX Col and V4742 Sgr

Since the discovery of the largest positive superhump period in TV Col, we have started a program to search for superhumps in CVs with large orbital periods. Here, we summarize preliminary results for TX Col and V4742 Sgr. TX Col is an intermediate polar with a 5.7-h orbital period. V4742 Sgr is a recent nova with no known periods. Continuous unfiltered CCD photometry of these two objects was carried out over 56 nights in 2002-2003. In TX Col, in addition to the orbital period of 5.7 h, we found peaks at 7.1 h and 5.0 h. These are interpreted as positive and negative superhumps, respectively, although the effects of the quasi-periodic oscillations at about 2 h were not taken into consideration. In the light curve of V4742 Sgr, two long periods are detected, 6.1 and 5.4 h, as well as a short-term period at 1.6 h. This result suggests that V4742 Sgr is an intermediate polar candidate and a permanent superhump system with a large orbital period (5.4 h) and a superhump period excess of 13 percent. If these results are confirmed, TX Col, V4742 Sgr and TV Col form a group of intermediate polars with extremely large superhump periods. There now seems to be growing evidence that superhumps can occur in intermediate polars with long orbital periods, which is very likely inconsistent with the theoretical prediction that superhumps can only occur in systems with mass ratios below 0.33. Alternatively, if the mass ratio in these systems is nevertheless below the theoretical limit, they should harbour undermassive secondaries and massive white dwarfs near the Chandrasekhar limit, which would make them excellent candidates for progenitors of Type Ia supernovae.
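The 13 percent figure quoted for V4742 Sgr follows from the standard definition of the superhump period excess, (P_superhump - P_orbital) / P_orbital, applied to the 6.1 h and 5.4 h periods. The short sketch below reproduces that arithmetic; the function name is hypothetical, and the TX Col values are derived here from the periods in the abstract rather than quoted from it.

```python
def period_excess(p_superhump_h, p_orbital_h):
    """Fractional superhump period excess; negative for negative superhumps."""
    return (p_superhump_h - p_orbital_h) / p_orbital_h


# V4742 Sgr: superhump period 6.1 h, proposed orbital period 5.4 h.
v4742_sgr = period_excess(6.1, 5.4)

# TX Col: orbital period 5.7 h, candidate superhump peaks at 7.1 h and 5.0 h.
tx_col_positive = period_excess(7.1, 5.7)
tx_col_negative = period_excess(5.0, 5.7)

print(f"V4742 Sgr excess: {100 * v4742_sgr:.1f}%")
print(f"TX Col positive excess: {100 * tx_col_positive:.1f}%")
print(f"TX Col negative excess: {100 * tx_col_negative:.1f}%")
```

For V4742 Sgr this gives roughly 0.13, matching the abstract. The same formula applied to TX Col gives an excess of about 25 percent for the 7.1 h peak and a deficit of about 12 percent for the 5.0 h peak.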