Source author record

Zhuohan Li

Zhuohan Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning; Computation and Language; astro-ph.GA; astro-ph.IM; cond-mat.mtrl-sci; Distributed, Parallel, and Cluster Computing; Programming Languages

Catalog footprint

What is connected

5 works

7 topics

4 close collaborators

preprint · 2026 · arXiv

Li+/H+ exchange in solid-state oxide Li-ion conductors

Understanding the moisture stability of oxide Li-ion conductors is important for their practical applications in solid-state batteries. Unlike sulfide or halide conductors, oxide conductors generally better resist degradation when in contact with water, but can still undergo topotactic Li+/H+ exchange (LHX). Here, we combine density functional theory (DFT) calculations with a machine-learning interatomic potential model to investigate the thermodynamic driving force of the LHX reaction for two representative oxide Li-ion conductor families: garnets and NASICONs. Li-stuffed garnets exhibit a strong driving force for proton exchange due to their high Li chemical potential. In contrast, NASICONs demonstrate a higher resistance against proton exchange due to the lower Li chemical potential and the lower O-H bond covalency for polyanion-bonded oxygens. Our findings reveal a critical trade-off: Li stuffing enhances conductivity but increases moisture susceptibility. This study underscores the importance of designing Li-ion conductors that possess both high conductivity and high stability in practical environments.

preprint · 2022 · arXiv

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

Alpa automates model-parallel training of large deep learning (DL) models by generating execution plans that unify data, operator, and pipeline parallelism. Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations. They do not suffice to scale out complex DL models on distributed compute devices. Alpa distributes the training of large DL models by viewing parallelisms as two hierarchical levels: inter-operator and intra-operator parallelisms. Based on this view, Alpa constructs a new hierarchical space for massive model-parallel execution plans. Alpa designs a number of compilation passes to automatically derive efficient parallel execution plans at each parallelism level. Alpa implements an efficient runtime to orchestrate the two-level parallel execution on distributed compute devices. Our evaluation shows Alpa generates parallelization plans that match or outperform hand-tuned model-parallel training systems even on models they are designed for. Unlike specialized systems, Alpa also generalizes to models with heterogeneous architectures and models without manually designed plans. Alpa's source code is publicly available at https://github.com/alpa-projects/alpa
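The two parallelism levels named in the abstract can be illustrated with a toy example. This is a minimal sketch in plain Python, not Alpa's API or its compilation passes: devices are only simulated, and the helper names (`split_columns`, `intra_operator_matmul`, `inter_operator_forward`) are hypothetical. Intra-operator parallelism shards a single matrix multiplication across devices; inter-operator parallelism assigns whole layers to pipeline stages.

```python
def matmul(a, b):
    """Plain dense matrix multiplication on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(matrix, num_shards):
    """Split a matrix column-wise into num_shards contiguous, equal shards."""
    num_cols = len(matrix[0])
    if num_cols % num_shards != 0:
        raise ValueError("column count must be divisible by num_shards")
    width = num_cols // num_shards
    return [[row[s * width:(s + 1) * width] for row in matrix]
            for s in range(num_shards)]


def intra_operator_matmul(x, w, num_devices):
    """Intra-operator: each simulated device computes x @ (its column shard of w)."""
    partials = [matmul(x, shard) for shard in split_columns(w, num_devices)]
    # Concatenate the per-device output columns back into full rows.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


def inter_operator_forward(x, layer_weights, num_stages):
    """Inter-operator: contiguous groups of layers form stages that run in order."""
    per_stage = -(-len(layer_weights) // num_stages)  # ceiling division
    stages = [layer_weights[i:i + per_stage]
              for i in range(0, len(layer_weights), per_stage)]
    for stage in stages:  # in a real system each stage sits on its own device group
        for w in stage:
            x = matmul(x, w)
    return x
```

Both functions return the same result as the unpartitioned computation; only the placement of work differs, which is the property an automatically derived plan has to preserve.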

preprint · 2022 · arXiv

The stellar parameters and elemental abundances from low-resolution spectra I: 1.2 million giants from LAMOST DR8

As a typical data-driven method, deep learning has become a natural choice for analysing astronomical data. In this study, we built a deep convolutional neural network to estimate the basic stellar parameters $T_{\rm eff}$, log g, metallicity ([M/H] and [Fe/H]) and [$\alpha$/M], along with nine individual elemental abundances ([C/Fe], [N/Fe], [O/Fe], [Mg/Fe], [Al/Fe], [Si/Fe], [Ca/Fe], [Mn/Fe], [Ni/Fe]). The neural network is trained using stars common to the APOGEE survey and the LAMOST survey. We used low-resolution spectra from the LAMOST survey as input, and measurements from APOGEE as labels. For stellar spectra in the test set with a g-band signal-to-noise ratio larger than 10, the mean absolute error (MAE) is 29 K for $T_{\rm eff}$, 0.07 dex for log g, 0.03 dex for both [Fe/H] and [M/H], and 0.02 dex for [$\alpha$/M]. The MAE of most elements is between 0.02 dex and 0.04 dex. The trained neural network was applied to 1,210,145 giants, including sub-giants, from LAMOST DR8 within the range of stellar parameters 3500 K < $T_{\rm eff}$ < 5500 K, 0.0 dex < log g < 4.0 dex, -2.5 dex < [Fe/H] < 0.5 dex. The distribution of our results in the chemical spaces is highly consistent with the APOGEE labels, and the stellar parameters are consistent with external high-resolution measurements from GALAH. The results of this study enable further studies based on LAMOST data and deepen our understanding of the accretion and evolution history of the Milky Way. The electronic version of the value-added catalog is available at http://www.lamost.org/dr8/v1.1/doc/vac.
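The mean absolute error quoted in the abstract is the average of |prediction - label| over the test stars. A minimal sketch of that metric follows; the function name and the sample temperatures are hypothetical and are not values from the catalog.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute difference between predictions and reference labels."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical effective temperatures in kelvin (illustration only).
network_teff = [4800.0, 5010.0, 4490.0]
label_teff = [4830.0, 5000.0, 4500.0]
teff_mae = mean_absolute_error(network_teff, label_teff)
```

The same function applies unchanged to log g or to an abundance such as [Fe/H]; only the unit of the result (K or dex) differs.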

preprint · 2020 · arXiv

Fast Structured Decoding for Sequence Models

Autoregressive sequence models achieve state-of-the-art performance in domains like machine translation. However, because of their autoregressive factorization, these models suffer from heavy latency during inference. Recently, non-autoregressive sequence models were proposed to reduce the inference time. However, these models assume that the decoding process of each token is conditionally independent of the others. Such a generation process sometimes makes the output sentence inconsistent, and thus the learned non-autoregressive models could only achieve inferior accuracy compared to their autoregressive counterparts. To improve the decoding consistency and reduce the inference cost at the same time, we propose to incorporate a structured inference module into the non-autoregressive models. Specifically, we design an efficient approximation for Conditional Random Fields (CRF) for non-autoregressive sequence models, and further propose a dynamic transition technique to model positional contexts in the CRF. Experiments in machine translation show that while adding little latency (8-14 ms), our model could achieve significantly better translation performance than previous non-autoregressive models on different translation datasets. In particular, for the WMT14 En-De dataset, our model obtains a BLEU score of 26.80, which largely outperforms the previous non-autoregressive baselines and is only 0.61 lower in BLEU than purely autoregressive models.
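The role of a CRF layer on top of independent per-position scores can be illustrated with exact Viterbi decoding for a linear-chain CRF. This is a minimal sketch under stated assumptions: it is the textbook exact algorithm, not the paper's efficient approximation or its dynamic transition technique, and the function name and toy scores are hypothetical.

```python
from typing import List


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring label sequence of a linear-chain CRF.

    emissions[t][y]   : score of label y at position t
    transitions[p][y] : score of moving from label p to label y
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each label
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for y in range(num_labels):
            best_prev = max(range(num_labels),
                            key=lambda p: score[p] + transitions[p][y])
            new_score.append(score[best_prev] + transitions[best_prev][y]
                             + emissions[t][y])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(range(num_labels), key=lambda y: score[y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy scores: position-wise argmax alone would pick labels 0, 1, 0.
toy_emissions = [[1.0, 0.0], [0.0, 0.6], [1.0, 0.0]]
# Transition scores that penalise switching labels between neighbours.
toy_transitions = [[0.0, -1.0], [-1.0, 0.0]]
decoded = viterbi_decode(toy_emissions, toy_transitions)
```

With all transition scores set to zero the decoder falls back to the independent per-position choice, which is the conditional-independence assumption the abstract identifies as the source of inconsistent outputs.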

preprint · 2020 · arXiv

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, the most compute-efficient training strategy is to counterintuitively train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.
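The two compression techniques named in the abstract, pruning and quantization, can be sketched on a flat list of weights. This is an illustrative sketch only: magnitude pruning and symmetric uniform quantization are common variants chosen here as assumptions, not necessarily the exact procedures used in the paper, and the function names are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitudes."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    num_to_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_to_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def quantize_uniform(weights, num_bits):
    """Round weights to a symmetric uniform grid with 2**(num_bits-1)-1 positive levels."""
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2")
    levels = 2 ** (num_bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / levels  # width of one quantization step
    return [round(w / scale) * scale for w in weights]


# Hypothetical weights (illustration only).
example_weights = [0.5, -0.1, 0.03, -0.9]
half_sparse = magnitude_prune(example_weights, 0.5)
eight_bit = quantize_uniform(example_weights, 8)
```

Comparing a model's accuracy before and after applying such functions at increasing sparsity or decreasing bit width is one way to measure the robustness to compression that the abstract describes.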