# Text Classification with Naive Bayes

Class 3 — July 16, 2020

## What is Naive Bayes?

- Solves the problem of **text classification**
  - Given a document, classify it as one of $n$ classes

- What are potential applications of this problem?
  - Spam detection
  - Sentiment analysis
  - Email categorization (e.g. Gmail’s inbox tabs)
  - Language identification on Google Translate
  - Topic labeling for news articles

## How does Naive Bayes work?

- Pick your **training corpus**
  - List of documents with their labels (e.g. a list of emails and whether or not each email is spam)

- Represent each document as a “bag of words”
  - Downside: word order isn’t used

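- Count how many times each word appears in the documents of each class
- Work through the math: use Bayes’ rule, with the “naive” assumption that words are independent given the class, to turn those counts into a probability for each class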
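
As a sketch of where these steps lead (this is the standard multinomial form; the notation is an assumption and may differ from what was used in class): for a document $d$ with words $w_1, \dots, w_m$, Bayes’ rule plus the independence assumption picks the class

$$
\hat{c} = \arg\max_{c} P(c \mid d) = \arg\max_{c} \frac{P(c)\, P(d \mid c)}{P(d)} = \arg\max_{c} P(c) \prod_{i=1}^{m} P(w_i \mid c)
$$

The denominator $P(d)$ is dropped because it is the same for every class. Both remaining factors can be estimated from counts over the training corpus:

$$
\hat{P}(c) = \frac{N_c}{N}, \qquad \hat{P}(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\sum_{w'} \mathrm{count}(w', c) + |V|}
$$

Here $N_c$ is the number of training documents labeled $c$, $N$ is the total number of training documents, $\mathrm{count}(w, c)$ is how often word $w$ appears in documents of class $c$, and $|V|$ is the vocabulary size. The $+1$ and $+|V|$ terms are add-one (Laplace) smoothing, one common choice assumed here so that a word never seen with a class does not force the whole product to zero.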

In class, I went through a couple of derivations and a simplified example. If you’d like to review these, they’re very well illustrated in the reading below.
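
For a quick hands-on review, the steps above (training corpus, bag of words, word counts, class probabilities) can be sketched in a few lines of Python. This is a minimal sketch and not the example from class: the toy corpus, the function names `train` and `classify`, the whitespace tokenization, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(corpus):
    """corpus: list of (text, label) pairs. Returns the counts the model needs."""
    doc_counts = Counter()               # number of training documents per class
    word_counts = defaultdict(Counter)   # word frequencies per class
    vocab = set()
    for text, label in corpus:
        words = text.lower().split()     # bag of words: word order is discarded
        doc_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return doc_counts, word_counts, vocab


def classify(text, doc_counts, word_counts, vocab):
    """Return the class with the highest log-probability score."""
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior: fraction of training documents in this class.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # assumption: ignore words never seen in training
            # Add-one smoothing keeps unseen (word, class) pairs above zero.
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(p)         # sum logs instead of multiplying
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, for illustration only.
toy_corpus = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
model = train(toy_corpus)
print(classify("free money", *model))
print(classify("meeting tomorrow", *model))
```

Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow on long documents and does not change which class scores highest.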

## Additional Resources