
Principal Word Vectors

We generalize principal component analysis for embedding words into a vector space. The generalization is made at two major levels. The first is to generalize the concept of the corpus as a counting process defined by three key elements: a vocabulary set, a feature (annotation) set, and a context. This generalization enables the principal word embedding method to generate word vectors with regard to different types of contexts and different types of annotations provided for a corpus. The second is to generalize the transformation step used in most word embedding methods. To this end, we define two levels of transformation. The first is a quadratic transformation, which accounts for different types of weighting over the vocabulary units and contextual features. The second is an adaptive non-linear transformation, which reshapes the data distribution to be meaningful to principal component analysis. The effect of these generalizations on the word vectors is studied intrinsically with regard to the spread and the discriminability of the word vectors. We also provide an extrinsic evaluation of the contribution of the principal word vectors on a word similarity benchmark and the t
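
As a rough illustration of the general pipeline the abstract describes (count contexts, transform the counts, apply principal component analysis), here is a minimal sketch in plain Python. It is not the authors' method: the window-based context, the fixed square-root transform (`power=0.5`) standing in for the adaptive non-linear transformation, and the names `cooccurrence` and `principal_vectors` are all assumptions made for illustration, and the quadratic weighting step is omitted entirely.

```python
import math
from collections import Counter

def cooccurrence(sentences, window=1):
    """Count (word, context word) pairs inside a symmetric window (assumed context type)."""
    counts = Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(w, sent[j])] += 1
    return counts

def principal_vectors(sentences, k=2, power=0.5, iters=200):
    """Return k-dimensional word vectors from PCA over transformed co-occurrence counts."""
    vocab = sorted({w for s in sentences for w in s})
    idx = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    # Rows are contextual features, columns are words.
    M = [[0.0] * n for _ in range(n)]
    for (w, c), v in cooccurrence(sentences).items():
        M[idx[c]][idx[w]] = float(v) ** power  # assumed stand-in transform
    # Centre each feature (row) across all words.
    for row in M:
        mu = sum(row) / n
        for j in range(n):
            row[j] -= mu
    # Feature covariance matrix C = M M^T / n.
    C = [[sum(M[a][j] * M[b][j] for j in range(n)) / n for b in range(n)]
         for a in range(n)]
    # Top-k eigenvectors by power iteration, orthogonalised against earlier ones.
    comps = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(n)]
        for _ in range(iters):
            u = [sum(C[a][b] * v[b] for b in range(n)) for a in range(n)]
            for p in comps:
                d = sum(x * y for x, y in zip(u, p))
                u = [x - d * y for x, y in zip(u, p)]
            norm = math.sqrt(sum(x * x for x in u))
            if norm < 1e-12:
                break
            v = [x / norm for x in u]
        comps.append(v)
    # Project each word's (centred) feature column onto the components.
    return {w: [sum(p[a] * M[a][idx[w]] for a in range(n)) for p in comps]
            for w in vocab}

# Toy corpus: "cat" and "dog" occur in identical contexts.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"],
          ["a", "cat", "ran"], ["a", "dog", "ran"]]
vectors = principal_vectors(corpus, k=2)
```

In this toy corpus "cat" and "dog" share exactly the same contexts, so the sketch assigns them the same vector. A real setup would use a much larger corpus and a library eigensolver instead of hand-written power iteration.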

Authors: Ali Basirat, Christian Hardmeier, Joakim Nivre
Topic: Computation and Language


preprint / 2020
