Word Embeddings

Class 4 — July 21, 2020

What are word embeddings?

A word embedding is a numerical vector that represents a given word
These vectors are spatially related to each other, such that more similar words are closer together in the embedding space
- e.g. The vector for “man” should be more similar to the vector for “boy” than the vector for “sky”
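As a rough illustration of what "more similar" means here, the sketch below compares vectors with cosine similarity. The 3-dimensional vectors are made up for this example (real embeddings typically have hundreds of dimensions and are learned from text), and `cosine_similarity` and `toy_vectors` are hypothetical names, not part of any particular library.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up toy vectors, chosen by hand so that "man" and "boy" point
# in a similar direction while "sky" points elsewhere.
toy_vectors = {
    "man": [0.9, 0.8, 0.1],
    "boy": [0.8, 0.9, 0.2],
    "sky": [0.1, 0.2, 0.9],
}

sim_man_boy = cosine_similarity(toy_vectors["man"], toy_vectors["boy"])
sim_man_sky = cosine_similarity(toy_vectors["man"], toy_vectors["sky"])

print(f"man vs boy: {sim_man_boy:.3f}")
print(f"man vs sky: {sim_man_sky:.3f}")
```

With these toy values, "man" scores higher against "boy" than against "sky", which is the relationship a trained embedding is expected to capture.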
Why are these useful?
- Search
  - If I’m searching for “pandemic”, word embeddings could be used to surface documents that contain very similar words, e.g. “epidemic”, “virus”, etc.
  - A simple string search wouldn’t be able to catch similar words on its own
- Simple text classification
  - For sentiment analysis, we could simply calculate whether the words in a document are more similar to “good” than they are to “bad”, or something similar
- Be careful: these use cases are susceptible to bias (as are virtually all NLP models); read further for more details
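The sentiment idea above can be sketched in a few lines. This is only one possible reading of "more similar to good than to bad": each known word is scored by its cosine similarity to “good” minus its similarity to “bad”, and the scores are averaged over the document. The 2-dimensional vectors are invented for illustration, and `sentiment_score` is a hypothetical helper, not an established method or library function.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up toy vectors; real ones would come from a pretrained embedding.
toy_vectors = {
    "good": [1.0, 0.1],
    "bad": [-1.0, 0.1],
    "great": [0.9, 0.3],
    "awful": [-0.8, 0.4],
    "movie": [0.0, 1.0],
}

def sentiment_score(tokens, vectors):
    """Average of (similarity to "good" - similarity to "bad") over known tokens.

    Positive means the document leans toward "good"; unknown tokens are skipped.
    """
    diffs = []
    for tok in tokens:
        if tok in vectors:
            v = vectors[tok]
            diffs.append(
                cosine_similarity(v, vectors["good"])
                - cosine_similarity(v, vectors["bad"])
            )
    return sum(diffs) / len(diffs) if diffs else 0.0

positive = sentiment_score(["great", "movie"], toy_vectors)
negative = sentiment_score(["awful", "movie"], toy_vectors)
print(positive, negative)
```

Because the score depends entirely on where words sit relative to “good” and “bad”, any bias baked into the embedding carries straight through to the classification.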

Because word embeddings are spatially related, we can use them to solve simple analogy problems
Below we can see GloVe’s predictions for the following problems, along with the associated similarity scores (higher means a closer match)
France : Paris :: England : ?
- London (0.646), Manchester (0.510), Birmingham (0.486)
man : woman :: king : ?
- queen (0.690), monarch (0.558), throne (0.557)
tall : taller :: warm : ?
- warmer (0.650), warmed (0.569), cooler (0.554)
author : book :: artist : ?
- artwork (0.642), painting (0.605), art (0.582)
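One common way to answer "a : b :: c : ?" is vector arithmetic: compute b - a + c, then rank the remaining words by cosine similarity to that point. The notes do not say which method produced the scores above, so treat this as an assumption. The 2-dimensional vectors below are invented so the arithmetic is easy to follow, and `solve_analogy` is a hypothetical helper.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up toy vectors: the first axis loosely encodes gender,
# the second loosely encodes royalty.
toy_vectors = {
    "man": [1.0, 0.0],
    "woman": [-1.0, 0.0],
    "king": [1.0, 1.0],
    "queen": [-1.0, 1.0],
    "throne": [0.0, 1.0],
    "apple": [0.2, -1.0],
}

def solve_analogy(a, b, c, vectors, top_n=3):
    """Rank candidate answers to "a : b :: c : ?" by similarity to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded so they cannot be returned as answers.
    candidates = [
        (word, cosine_similarity(target, vec))
        for word, vec in vectors.items()
        if word not in {a, b, c}
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_n]

results = solve_analogy("man", "woman", "king", toy_vectors)
for word, score in results:
    print(f"{word} ({score:.3f})")
```

In this toy space the top-ranked answer is "queen", followed by "throne", mirroring the shape of the ranked lists above.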

Many popular word embeddings are trained largely on Wikipedia text
84% of Wikipedia's writers are male, so these articles exhibit biases
In all prominent word embeddings, distances are skewed based on stereotypes
- “black” is closer to “good” than “white”
- “female” is closer to “irrational” than “male”
- etc.
Below is a table from some old work I did with GloVe embeddings
- Distances are Euclidean (you should generally use cosine distances)
- All of these comparisons exhibit negative biases