Researcher profile

Neil Dey

Neil Dey contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified · Verification L1 · Unclaimed author
2 works
0 followers
2 topics
4 close collaborators

Published work

2 published items

Preprint · 2023 · arXiv

Word Embeddings as Statistical Estimators

Word embeddings are a fundamental tool in natural language processing. Currently, word embedding methods are evaluated on the basis of empirical performance on benchmark data sets, and there is a lack of rigorous understanding of their theoretical properties. This paper studies word embeddings from a statistical theoretical perspective, which is essential for formal inference and uncertainty quantification. We propose a copula-based statistical model for text data and show that under this model, the now-classical Word2Vec method can be interpreted as a statistical estimation method for estimating the theoretical pointwise mutual information (PMI). Next, by building on the work of Levy and Goldberg (2014), we develop a missing value-based estimator as a statistically tractable and interpretable alternative to the Word2Vec approach. The estimation error of this estimator is comparable to Word2Vec and improves upon the truncation-based method proposed by Levy and Goldberg (2014). The proposed estimator also performs comparably to Word2Vec in a benchmark sentiment analysis task on the IMDb Movie Reviews data set.
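The abstract above centers on pointwise mutual information (PMI) and on the truncation-based approach of Levy and Goldberg (2014). As a rough illustration only, the sketch below computes an empirical PMI matrix from a word-context co-occurrence count matrix, applies a shifted, zero-truncated variant, and factorizes it with an SVD. It is not the paper's copula model or its missing-value estimator. The function names, the toy counts, and the shift parameter `k` are assumptions made for this example, and every word and context is assumed to occur at least once.

```python
import numpy as np

def pmi_matrix(counts):
    """Empirical PMI, log p(w, c) - log p(w) - log p(c), from co-occurrence counts.

    Assumes every row (word) and column (context) has a nonzero total.
    Cells with a zero count come out as -inf.
    """
    counts = np.asarray(counts, dtype=float)
    p_wc = counts / counts.sum()
    p_w = p_wc.sum(axis=1, keepdims=True)  # marginal over contexts
    p_c = p_wc.sum(axis=0, keepdims=True)  # marginal over words
    with np.errstate(divide="ignore"):
        return np.log(p_wc) - np.log(p_w) - np.log(p_c)

def shifted_positive_pmi(pmi, k=1):
    """Truncation step: subtract log(k) and clip negative values (and -inf) to 0."""
    return np.maximum(pmi - np.log(k), 0.0)

def svd_embeddings(matrix, dim):
    """Rank-`dim` embeddings U * sqrt(S) from a truncated SVD (one common choice)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])

# Hypothetical 3-word by 3-context count matrix, for illustration only.
toy_counts = [[4, 1, 0],
              [1, 3, 1],
              [0, 1, 4]]
toy_pmi = pmi_matrix(toy_counts)
toy_sppmi = shifted_positive_pmi(toy_pmi, k=1)
toy_vectors = svd_embeddings(toy_sppmi, dim=2)
```

The truncation replaces every unobserved pair (PMI of negative infinity) with zero, which is the step a missing-value treatment would handle differently.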

Preprint · 2022 · arXiv

Robust Coordinate Ascent Variational Inference with Markov chain Monte Carlo simulations

Variational Inference (VI) is a method that approximates a difficult-to-compute posterior density using better-behaved distributional families. VI is an alternative to the already well-studied Markov chain Monte Carlo (MCMC) method of approximating densities. Each algorithm has benefits and drawbacks; does there exist a combination of the two that mitigates the flaws of both? We propose a method to combine Coordinate Ascent Variational Inference (CAVI) with MCMC. This new methodology, termed Hybrid CAVI, seeks to reduce CAVI's sensitivity to initialization and its convergence problems by initializing with method-of-moments estimates obtained from a short MCMC burn-in period. Unlike CAVI, Hybrid CAVI also proves effective when the posterior is not from a conditionally conjugate exponential family.
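For readers unfamiliar with CAVI, the following is a minimal sketch of plain coordinate ascent on a textbook conjugate model: data x_i ~ N(mu, 1/tau) with priors mu | tau ~ N(mu0, 1/(lam0 * tau)) and tau ~ Gamma(a0, b0), and a mean-field factorization q(mu) q(tau). It is not the Hybrid CAVI method from the abstract. The model, the function name `cavi_normal`, and all parameter names are assumptions chosen for illustration. The `e_tau_init` argument only marks where an initial value enters the iteration; how the preprint derives its initialization from MCMC draws is not reproduced here.

```python
import numpy as np

def cavi_normal(x, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, e_tau_init=1.0, iters=50):
    """Plain CAVI for a Normal likelihood with unknown mean and precision.

    Returns the parameters of q(mu) = N(mu_n, 1/lam_n) and
    q(tau) = Gamma(a_n, rate=b_n).
    """
    x = np.asarray(x, dtype=float)
    n, xbar = x.size, x.mean()

    # These two updates do not depend on the other factor, so they are fixed.
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
    a_n = a0 + (n + 1) / 2.0

    e_tau = e_tau_init  # current value of E_q[tau]; the initialization enters here
    for _ in range(iters):
        # Update q(mu): its precision scales with the current E_q[tau].
        lam_n = (lam0 + n) * e_tau
        # Update q(tau): expectations under q(mu) add a 1/lam_n variance term.
        e_sq = np.sum((x - mu_n) ** 2) + n / lam_n
        e_prior = lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n)
        b_n = b0 + 0.5 * (e_sq + e_prior)
        e_tau = a_n / b_n
    return mu_n, lam_n, a_n, b_n

# Toy data, for illustration only.
mu_n, lam_n, a_n, b_n = cavi_normal([1.0, 2.0, 3.0, 4.0])
```

In this conjugate toy model the iteration settles on the same fixed point from any positive starting value, so it does not exhibit the initialization sensitivity that the abstract describes.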