Text Classification with Naive Bayes
Class 3 — July 16, 2020
What is Naive Bayes?
- A simple probabilistic method for text classification
- Given a document, classify it as one of $n$ classes
- What are some applications of text classification?
  - Spam detection
  - Sentiment analysis
  - Email categorization (e.g. Gmail sorting emails into inbox tabs)
  - Language identification (e.g. on Google Translate)
  - Topic labeling for news articles
How does Naive Bayes work?
- Pick your training corpus
  - List of documents with their labels (e.g. list of emails and whether or not each email is spam)
- Represent each document as a “bag of words”
  - Downside: word order isn’t used
- Count how many times each word appears
- Work through the math
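As a brief sketch of that last step (this is the standard multinomial formulation, which may differ in notation from what was on the board): write a document as its words $w_1, \dots, w_m$ and let $c$ range over the $n$ classes. Bayes' rule, plus the "naive" assumption that words are independent of each other given the class, gives the decision rule

$$
\hat{c} \;=\; \arg\max_{c} P(c \mid w_1, \dots, w_m) \;=\; \arg\max_{c} \; P(c) \prod_{i=1}^{m} P(w_i \mid c)
$$

where both factors are estimated from counts in the training corpus:

$$
P(c) \approx \frac{\text{number of documents labeled } c}{\text{total number of documents}},
\qquad
P(w \mid c) \approx \frac{\operatorname{count}(w, c)}{\sum_{w'} \operatorname{count}(w', c)}
$$

Here $\operatorname{count}(w, c)$ is how many times word $w$ appears across all training documents labeled $c$. The denominator $P(w_1, \dots, w_m)$ from Bayes' rule is the same for every class, so it can be dropped from the $\arg\max$.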
In class, I went through a couple of derivations and a simplified example. If you’d like to review these, they’re very well illustrated in the reading below.
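For anyone who wants to see the steps end to end, here is a minimal Python sketch. It is not the example from class: the four-document corpus, the `train` and `predict` function names, the whitespace tokenization, and the use of add-one (Laplace) smoothing to avoid zero probabilities are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs is a list of (text, label) pairs; returns the counts Naive Bayes needs."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # per-label bag of words
    vocab = set()                       # every word seen in training
    for text, label in docs:
        words = text.lower().split()    # bag of words: word order is discarded
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def predict(text, label_counts, word_counts, vocab):
    """Return the label with the highest log-probability score."""
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        # Start from the log prior, log P(c).
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # assumption: skip words never seen in training
            # Add-one smoothing (an assumption of this sketch) keeps
            # unseen word/label pairs from producing log(0).
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy corpus, made up for this sketch.
corpus = [
    ("win cash now", "spam"),
    ("free cash prize", "spam"),
    ("lunch at noon", "ham"),
    ("meeting notes attached", "ham"),
]
model = train(corpus)
print(predict("free cash", *model))  # prints "spam"
```

Summing logs instead of multiplying probabilities gives the same winner but avoids numerical underflow on long documents.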