Source author record

Hongzhi Liu

Hongzhi Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computation and Language, math.OA, Machine Learning

Catalog footprint

What is connected

4 works

3 topics

4 close collaborators

Preprint, 2022, arXiv

Seeking Patterns, Not just Memorizing Procedures: Contrastive Learning for Solving Math Word Problems

Math Word Problem (MWP) solving requires discovering the quantitative relationships in natural language narratives. Recent work shows that existing models memorize procedures from context and rely on shallow heuristics to solve MWPs. In this paper, we examine this issue and argue that the cause is a lack of overall understanding of MWP patterns. We first investigate how a neural network understands patterns from semantics alone, and observe that problems with the same prototype equation mostly receive nearby representations, while representations that lie far from that cluster or close to other prototypes tend to produce wrong solutions. Motivated by this observation, we propose a contrastive learning approach in which the neural network perceives the divergence of patterns. We collect contrastive examples by converting the prototype equation into a tree and seeking similar tree structures. The solving model is trained with an auxiliary objective on the collected examples, so that the representations of problems with similar prototypes are pulled closer. We conduct experiments on the Chinese dataset Math23k and the English dataset MathQA. Our method greatly improves the performance in monolingual and multilingual settings.
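The idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes equation trees are nested tuples of the form `(operator, left, right)`, that "similar tree structure" is simplified to an exact match of the operator skeleton, and that the auxiliary objective is a standard triplet margin loss. The names `skeleton`, `triplet_loss`, `training_loss`, and the `weight` and `margin` parameters are hypothetical.

```python
import math


def skeleton(tree):
    """Mask the leaves of an equation tree, keeping only its operator structure.

    Assumption: a tree is either a leaf (any non-tuple value) or a tuple
    (operator, left_subtree, right_subtree).
    """
    if isinstance(tree, tuple):
        op, left, right = tree
        return (op, skeleton(left), skeleton(right))
    return "N"  # every quantity or constant becomes the same placeholder


def distance(u, v):
    """Euclidean distance between two representation vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (an assumed stand-in for the auxiliary objective).

    It is zero once the positive is closer to the anchor than the negative
    by at least `margin`.
    """
    return max(0.0, distance(anchor, positive) - distance(anchor, negative) + margin)


def training_loss(solver_loss, anchor, positive, negative, weight=0.5):
    """Solver loss plus a weighted contrastive term (the weighting is hypothetical)."""
    return solver_loss + weight * triplet_loss(anchor, positive, negative)


# Two problems whose prototype equations share the skeleton x + y * z
# would be treated as a positive pair under this simplification.
same = skeleton(("+", "n1", ("*", "n2", "n3"))) == skeleton(("+", "a", ("*", "b", "c")))
```

In this sketch a problem with a matching skeleton serves as the positive example and one with a different skeleton as the negative, so minimizing the extra term pulls same-pattern representations together.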

Preprint, 2020, arXiv

LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning

While pre-training and fine-tuning approaches, e.g., BERT (Devlin et al., 2018) and GPT-2 (Radford et al., 2019), have achieved great success in language understanding and generation tasks, the pre-trained models are usually too big for online deployment in terms of both memory cost and inference speed, which hinders their practical online usage. In this paper, we propose LightPAFF, a Lightweight Pre-training And Fine-tuning Framework that leverages two-stage knowledge distillation to transfer knowledge from a big teacher model to a lightweight student model in both the pre-training and fine-tuning stages. In this way the lightweight model can achieve accuracy similar to that of the big teacher model, but with far fewer parameters and thus faster online inference. LightPAFF can support different pre-training methods (such as BERT, GPT-2 and MASS (Song et al., 2019)) and be applied to many downstream tasks. Experiments on three language understanding tasks, three language modeling tasks and three sequence-to-sequence generation tasks demonstrate that, while achieving accuracy similar to the big BERT, GPT-2 and MASS models, LightPAFF reduces the model size by nearly 5x and improves online inference speed by 5x-7x.
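Knowledge distillation of the kind this abstract describes is commonly written as a mix of a hard-label loss and a soft-target loss against the teacher's output distribution. The sketch below shows that generic form for a single prediction. It is an assumption, not LightPAFF's published objective. The function names, the mixing weight `lam`, and the `temperature` parameter are hypothetical. In a two-stage setup the same kind of loss would be applied once during pre-training and again during fine-tuning.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lam=0.5, temperature=1.0):
    """Generic distillation loss for one example (assumed form, not the paper's).

    hard: cross-entropy of the student against the ground-truth label.
    soft: cross-entropy of the student against the teacher's softened distribution.
    lam:  hypothetical weight on the soft term; lam=0 ignores the teacher.
    """
    student_probs = softmax(student_logits)
    hard = -math.log(student_probs[label])

    student_soft = softmax(student_logits, temperature)
    teacher_soft = softmax(teacher_logits, temperature)
    soft = -sum(t * math.log(s) for t, s in zip(teacher_soft, student_soft))

    return (1.0 - lam) * hard + lam * soft


# With uniform logits over two classes, both terms equal ln 2.
example = distillation_loss([0.0, 0.0], [0.0, 0.0], label=0)
```

Setting `lam` closer to 1 makes the student follow the teacher's distribution more closely, while `lam=0` reduces the loss to ordinary supervised training.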

Preprint, 2016, arXiv

Smooth crossed products induced by minimal unique ergodic diffeomorphisms on odd spheres

For minimal uniquely ergodic diffeomorphisms $\alpha_n$ of $S^{2n+1}$ $(n>0)$ and $\alpha_m$ of $S^{2m+1}$ $(m>0)$, the $C^*$-crossed product algebra $C(S^{2n+1})\rtimes_{\alpha_n} \mathbb{Z}$ is isomorphic to $C(S^{2m+1})\rtimes_{\alpha_m} \mathbb{Z}$ even when $n\neq m$. However, using cyclic cohomology, we show that the smooth crossed product algebra $C^\infty(S^{2n+1})\rtimes_{\alpha_n} \mathbb{Z}$ is not isomorphic to $C^\infty(S^{2m+1})\rtimes_{\alpha_m} \mathbb{Z}$ if $n\neq m$.

Preprint, 2016, arXiv

Two minimal unique ergodic diffeomorphisms on a manifolds and their smooth crossed product algebras

In this article we construct two minimal uniquely ergodic diffeomorphisms $\alpha$ and $\beta$ on $S^3 \times S^6 \times S^8$. We show that $C(S^3 \times S^6 \times S^8) \rtimes_\alpha \mathbb{Z}$ and $C(S^3 \times S^6 \times S^8) \rtimes_\beta \mathbb{Z}$ are equivalent to each other, while $C^\infty(S^3 \times S^6 \times S^8) \rtimes_\alpha \mathbb{Z}$ and $C^\infty(S^3 \times S^6 \times S^8) \rtimes_\beta \mathbb{Z}$ are not.