Source author record

Yongyi Mao

Yongyi Mao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Information Theory, Machine Learning, math.IT, Artificial Intelligence, Computation and Language

Catalog footprint

What is connected

12 works

5 topics

4 close collaborators


Preprint, 2026, arXiv

Parallel Recursive LSTM

Transformers have become the dominant architecture for sequence modeling by using self-attention to enable expressive and highly parallel processing. However, the resulting quadratic time and memory costs limit efficiency in long-context settings. Recurrent models such as LSTMs provide explicit nonlinear state updates and strong state-tracking capabilities, yet their strictly sequential computation limits parallelism. We introduce the Parallel Recursive LSTM (PR-LSTM), a hierarchical recurrent architecture that replaces left-to-right recurrence with recursive nonlinear state composition over a balanced computation tree. Tokens are first mapped independently to latent states, which are then recursively merged by a learned gated composition block. This structure uses the reduction pattern underlying parallel scans as a fixed execution schedule, rather than assuming an associative recurrence. As a result, PR-LSTM retains nonlinear gated state representations while reducing recurrent parallel depth from linear to logarithmic. Empirically, PR-LSTM achieves strong sequence-length generalization on formal-language benchmarks, solving more tasks than standard RNN, LSTM, and Transformer baselines, while avoiding the quadratic scaling of attention. These results suggest that recurrent computation can be reorganized hierarchically to expose parallelism without restricting the transition dynamics to linear or associative forms.
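The tree-structured reduction described in this abstract can be sketched in a few lines. This is a minimal illustration, not the PR-LSTM implementation: the scalar states, the `embed` map, the fixed `params`, and the exact form of `merge` are all hypothetical stand-ins for the learned components. It only shows how a non-associative gated merge can be applied on a balanced schedule with logarithmic depth, where all merges within one level are independent of each other.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def embed(token):
    # Hypothetical per-token map to a scalar latent state (no dependence on neighbours).
    return math.tanh(0.1 * token)


def merge(left, right, params=(0.5, -0.3, 0.8, 0.4)):
    # Hypothetical gated composition: a gate mixes the left state with a
    # nonlinear candidate built from both children. Not associative in general.
    wg_l, wg_r, wc_l, wc_r = params
    gate = sigmoid(wg_l * left + wg_r * right)
    candidate = math.tanh(wc_l * left + wc_r * right)
    return gate * left + (1.0 - gate) * candidate


def tree_reduce(states):
    # Merge adjacent pairs level by level; an unpaired last state is carried up.
    if not states:
        raise ValueError("need at least one state")
    level = list(states)
    depth = 0
    while len(level) > 1:
        nxt = [merge(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth


def sequential_reduce(states):
    # Left-to-right recurrence for comparison: one dependent step per token.
    acc = states[0]
    steps = 0
    for s in states[1:]:
        acc = merge(acc, s)
        steps += 1
    return acc, steps


if __name__ == "__main__":
    states = [embed(t) for t in range(1000)]
    print("tree depth:", tree_reduce(states)[1])
    print("sequential steps:", sequential_reduce(states)[1])
```

Because `merge` is not associative, the tree and the left-to-right schedule generally produce different values. The tree is used here as a fixed execution order, not as a reordering of an equivalent computation.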

Preprint, 2022, arXiv

Cross Domain Few-Shot Learning via Meta Adversarial Training

Few-shot relation classification (RC) is one of the critical problems in machine learning. Current research focuses only on set-ups in which training and test data come from the same domain. In practice, however, this assumption does not always hold. In this study, we present a novel model that takes this cross-domain situation into account. Unlike previous models, we use only the source domain data to train the prototypical networks and test the model on target domain data. A meta-based adversarial training framework (MBATF) is proposed to fine-tune the trained networks to adapt to data from the target domain. Empirical studies confirm the effectiveness of the proposed model.
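For readers unfamiliar with prototypical networks, the classification rule they rely on can be sketched as follows. This is a generic illustration under stated assumptions: the embeddings are given as plain lists (a real model would produce them with a trained encoder), the relation labels are made up, and nothing here reflects the adversarial fine-tuning step (MBATF) of the paper.

```python
def mean(vectors):
    # Component-wise mean of equally sized vectors.
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def build_prototypes(support):
    # support: dict mapping a label to the embeddings of its few labelled examples.
    return {label: mean(vectors) for label, vectors in support.items()}


def classify(query, prototypes):
    # Assign the query to the label whose prototype is nearest in embedding space.
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


if __name__ == "__main__":
    # Hypothetical 2-way, 2-shot episode with made-up 2-d embeddings.
    support = {
        "founder_of": [[0.0, 1.0], [0.2, 0.8]],
        "born_in": [[1.0, 0.0], [0.8, 0.2]],
    }
    print(classify([0.1, 0.9], build_prototypes(support)))
```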

Preprint, 2022, arXiv

Dual Contrastive Learning: Text Classification via Label-Aware Data Augmentation

Contrastive learning has achieved remarkable success in representation learning via self-supervision in unsupervised settings. However, effectively adapting contrastive learning to supervised learning tasks remains a challenge in practice. In this work, we introduce a dual contrastive learning (DualCL) framework that simultaneously learns the features of input samples and the parameters of classifiers in the same space. Specifically, DualCL regards the parameters of the classifiers as augmented samples associated with different labels and then exploits contrastive learning between the input samples and the augmented samples. Empirical studies on five benchmark text classification datasets and their low-resource versions demonstrate the improvement in classification accuracy and confirm DualCL's capability of learning discriminative representations.

Preprint, 2022, arXiv

On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications

This paper follows up on a recent work of Neu et al. (2021) and presents some new information-theoretic upper bounds for the generalization error of machine learning models, such as neural networks, trained with SGD. We apply these bounds to analyzing the generalization behaviour of linear and two-layer ReLU networks. Experimental study of these bounds provides some insights into the SGD training of neural networks. The bounds also point to a new and simple regularization scheme which we show performs comparably to the current state of the art.

Preprint, 2020, arXiv

Recurrent Interaction Network for Jointly Extracting Entities and Classifying Relations

The idea of using multi-task learning approaches to address the joint extraction of entities and relations is motivated by the relatedness between the entity recognition task and the relation classification task. Existing methods that use multi-task learning techniques to address the problem learn interactions between the two tasks through a shared network, where the shared information is passed into the task-specific networks for prediction. However, such an approach hinders the model from learning explicit interactions between the two tasks to improve the performance on the individual tasks. As a solution, we design a multi-task learning model, which we refer to as the recurrent interaction network, that allows interactions to be learned dynamically in order to effectively model task-specific features for classification. Empirical studies on two real-world datasets confirm the superiority of the proposed model.

Preprint, 2016, arXiv

On the representation and embedding of knowledge bases beyond binary relations

The models developed to date for knowledge base embedding are all based on the assumption that the relations contained in knowledge bases are binary. For the training and testing of these embedding models, multi-fold (or n-ary) relational data are converted to triples (e.g., in the FB15K dataset) and interpreted as instances of binary relations. This paper presents a canonical representation of knowledge bases containing multi-fold relations. We show that the existing embedding models on the popular FB15K dataset correspond to a sub-optimal modelling framework, resulting in a loss of structural information. We advocate a novel modelling framework, which models multi-fold relations directly using this canonical representation. Using this framework, the existing TransH model is generalized to a new model, m-TransH. We demonstrate experimentally that m-TransH outperforms TransH by a large margin, thereby establishing a new state of the art.
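As background for the binary-relation baseline named above, the standard TransH scoring rule can be sketched as below. This illustrates TransH only, as commonly described: project head and tail onto a relation-specific hyperplane, then measure how well a translation vector connects them. The m-TransH generalization is not reproduced here, and the vectors in the example are made up.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(vector, normal):
    # Projection onto the hyperplane with the given normal.
    # Assumes `normal` has unit length.
    scale = dot(vector, normal)
    return [a - scale * n for a, n in zip(vector, normal)]


def transh_score(head, tail, normal, translation):
    # Squared distance between the translated projected head and the
    # projected tail; lower means the triple is more plausible.
    h = project(head, normal)
    t = project(tail, normal)
    return sum((a + d - b) ** 2 for a, d, b in zip(h, translation, t))


if __name__ == "__main__":
    normal = [0.0, 0.0, 1.0]        # hypothetical relation hyperplane normal
    translation = [1.0, 0.0, 0.0]   # hypothetical relation translation vector
    print(transh_score([1.0, 0.0, 5.0], [2.0, 0.0, -3.0], normal, translation))
```

In the example, head and tail differ strongly along the normal direction, but that component is discarded by the projection, so only the in-plane offset is compared with the translation.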

Preprint, 2014, arXiv

On Stochastic Estimation of Partition Function

In this paper, we show analytically that the duality of normal factor graphs (NFG) can facilitate stochastic estimation of partition functions. In particular, our analysis suggests that for the $q$-ary two-dimensional nearest-neighbor Potts model, sampling from the primal NFG of the model and sampling from its dual exhibit opposite behaviours with respect to the temperature of the model. For high-temperature models, sampling from the primal NFG gives rise to better estimators whereas for low-temperature models, sampling from the dual gives rise to better estimators. This analysis is validated by experiments.
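To make "stochastic estimation of a partition function" concrete, here is a small sketch for a $q$-ary nearest-neighbor Potts model on a tiny grid. It is not the NFG-based estimator analysed in the paper. The assumptions are a free-boundary grid, a weight of exp(beta * number of agreeing neighbor pairs), and a plain uniform-sampling estimator. The brute-force sum is feasible only for very small grids and serves as a reference value.

```python
import itertools
import math
import random


def grid_edges(rows, cols):
    # Nearest-neighbour pairs on a rows x cols grid with free boundaries.
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1))
            if r + 1 < rows:
                edges.append((i, i + cols))
    return edges


def weight(config, edges, beta):
    # Unnormalised Potts weight: exp(beta * number of agreeing neighbour pairs).
    agree = sum(1 for i, j in edges if config[i] == config[j])
    return math.exp(beta * agree)


def exact_partition(rows, cols, q, beta):
    # Brute-force sum over all q**(rows*cols) configurations.
    edges = grid_edges(rows, cols)
    configs = itertools.product(range(q), repeat=rows * cols)
    return sum(weight(cfg, edges, beta) for cfg in configs)


def estimate_partition(rows, cols, q, beta, samples, rng):
    # Z = q**N * E[weight(X)] for X uniform over configurations.
    edges = grid_edges(rows, cols)
    n = rows * cols
    total = 0.0
    for _ in range(samples):
        cfg = [rng.randrange(q) for _ in range(n)]
        total += weight(cfg, edges, beta)
    return (q ** n) * total / samples


if __name__ == "__main__":
    rng = random.Random(0)
    for beta in (0.1, 2.0):  # small beta = high temperature
        z = exact_partition(3, 3, 2, beta)
        z_hat = estimate_partition(3, 3, 2, beta, 5000, rng)
        print(beta, z, z_hat)
```

The uniform estimator is exact at beta = 0 and its variance grows with beta, which gives a rough sense of why the quality of a sampling scheme can depend on temperature.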

Preprint, 2012, arXiv

Convolutional Factor Graphs as Probabilistic Models

Based on a recent development in the area of error control coding, we introduce the notion of convolutional factor graphs (CFGs) as a new class of probabilistic graphical models. In this context, the conventional factor graphs are referred to as multiplicative factor graphs (MFGs). This paper shows that CFGs are natural models for probability functions when summation of independent latent random variables is involved. In particular, CFGs capture a large class of linear models, where the linearity is in the sense that the observed variables are obtained as a linear transformation of the latent variables taking arbitrary distributions. We use Gaussian models and independent factor models as examples to demonstrate the use of CFGs. The requirement of a linear transformation between latent variables (with certain independence restriction) and the observed variables, to an extent, limits the modelling flexibility of CFGs. This structural restriction however provides a powerful analytic tool to the framework of CFGs; that is, upon taking the Fourier transform of the function represented by the CFG, the resulting function is represented by a FG with identical structure. This Fourier transform duality allows inference problems on a CFG to be solved on the corresponding dual MFG.
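The elementary fact behind this duality can be checked numerically: the distribution of a sum of independent discrete variables is the convolution of their distributions, and the Fourier transform turns that convolution into a pointwise product. The sketch below demonstrates only this basic identity, with a naive DFT and two fair dice as a made-up example. It does not implement CFGs or inference on them.

```python
import cmath


def convolve(p, q):
    # Distribution of X + Y for independent X ~ p and Y ~ q (index = value).
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def dft(x, n):
    # Naive length-n discrete Fourier transform of x, zero-padded to n.
    padded = list(x) + [0.0] * (n - len(x))
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(padded))
        for k in range(n)
    ]


if __name__ == "__main__":
    die = [1.0 / 6.0] * 6          # index 0..5 stands for faces 1..6
    total = convolve(die, die)     # index 0..10 stands for sums 2..12
    n = len(total)
    lhs = dft(total, n)
    rhs = [a * b for a, b in zip(dft(die, n), dft(die, n))]
    print(max(abs(l - r) for l, r in zip(lhs, rhs)))
```

Zero-padding to the full length of the convolved distribution makes the circular convolution implied by the DFT agree with the linear convolution.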

Preprint, 2012, arXiv

Normal Factor Graphs as Probabilistic Models

We present a new probabilistic modelling framework based on the recent notion of normal factor graph (NFG). We show that the proposed NFG models and their transformations unify some existing models such as factor graphs, convolutional factor graphs, and cumulative distribution networks. The two subclasses of the NFG models, namely the constrained and generative models, exhibit a duality in their dependence structure. Transformation of NFG models further extends the power of this modelling framework. We point out the well-known NFG representations of parity and generator realizations of a linear code as generative and constrained models, and comment on a more prevailing duality in this context. Finally, we address the algorithmic aspect of computing the exterior function of NFGs and the inference problem on NFGs.

Preprint, 2011, arXiv

Normal Factor Graphs and Holographic Transformations

This paper stands at the intersection of two distinct lines of research. One line is "holographic algorithms," a powerful approach introduced by Valiant for solving various counting problems in computer science; the other is "normal factor graphs," an elegant framework proposed by Forney for representing codes defined on graphs. We introduce the notion of holographic transformations for normal factor graphs, and establish a very general theorem, called the generalized Holant theorem, which relates a normal factor graph to its holographic transformation. We show that the generalized Holant theorem on the one hand underlies the principle of holographic algorithms, and on the other hand reduces to a general duality theorem for normal factor graphs, a special case of which was first proved by Forney. In the course of our development, we formalize a new semantics for normal factor graphs, which highlights various linear algebraic properties that potentially enable the use of normal factor graphs as a linear algebraic tool.

Preprint, 2011, arXiv

Normal Factor Graphs: A Diagrammatic Approach to Linear Algebra

Inspired by some new advances on normal factor graphs (NFGs), we introduce NFGs as a simple and intuitive diagrammatic approach towards encoding some concepts from linear algebra. We illustrate with examples the workings of such an approach and settle a conjecture of Peterson on the Pfaffian.

Preprint, 2010, arXiv

On Holant Theorem and Its Proof

Holographic algorithms are a recent breakthrough in computer science and have found applications in information theory. This paper provides a proof of the central component of holographic algorithms, namely, the Holant theorem. Compared with previous works, the proof appears simpler and more direct. In the course of the proof, we also develop a mathematical tool, which we call c-tensor. We expect that the notion of c-tensor may be applicable over a wide range of analysis.
