How the analogy-completion task changed word representation
This essay discusses the development of the word2vec and GloVe algorithms in relation to a secondary purpose to which they have been applied: the analysis of concepts contained within text corpora. First, the word2vec algorithm is discussed in light of its historical context. Then, the analogy-completion task, which highlighted the potential of semantic arithmetic with word2vec embeddings, is described. Finally, the development of the GloVe algorithm is contrasted with that of word2vec.
The word2vec algorithm (Mikolov et al., 2013a) combines two main technical insights: (1) continuous vectors can be used to represent semantic information, and (2) the internal representations learned by neural networks are conceptually meaningful. When the algorithm was introduced in 2013, however, neither the continuous representation of semantic information nor the conceptual value of internal representations was a new idea. More specifically, in the information retrieval space, latent semantic analysis (LSA; Deerwester et al., 1990) and latent Dirichlet allocation (Blei et al., 2003) had been proposed as statistical methods that leverage the semantic information latent in texts to improve upon methods that treated words as indexical features (that exist…