Why is Natural Language Processing important?
Class 1 — July 09, 2020
NLP is Everywhere
- Siri, Google Assistant, Alexa
- Gmail’s “Smart Compose”
- Google Search
- Email spam detection
- Speech-to-text (e.g. in smartphone keyboards)
- “Predictive Text” on smartphone keyboards
- Google Duplex (video)
- etc.
Problems in NLP
- Automatic Summarization: generating short summaries of articles
  - Not yet reliable, but quality is improving
- Language Modeling: predicting what follows a given sequence of words or characters
  - e.g. next-word prediction on smartphone keyboards
- Machine Translation: translating content between languages
  - e.g. Google Translate
- Question Answering: answering questions about a body of text
  - Also not yet reliable, but improving
- Speech Recognition: transcribing freeform speech
  - e.g. Siri, Google Assistant, Alexa, rev.ai
- Speech Synthesis: generating synthetic voices that read text aloud
  - e.g. Siri, Google Assistant, Alexa, Amazon Polly
- Text Classification: sorting bodies of text (Tweets, emails, etc.) into categories
  - e.g. sentiment analysis, spam detection
- Text Generation: generating text, usually in response to a prompt
  - e.g. chatbots, GPT-2, GPT-3
- and many, many more
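
To make the language modeling item concrete, here is a minimal sketch of next-word prediction that counts which word follows which in a tiny corpus. The corpus and the helper names (`next_word_probs`, `predict_next`) are made up for illustration; real keyboards use far larger models and datasets.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
corpus = [
    "i like green tea",
    "i like black coffee",
    "i drink green tea",
]

# For each word, count the words that immediately follow it (bigram counts).
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1


def next_word_probs(prev):
    """Estimate the probability of each next word, given the previous word."""
    counts = following.get(prev, Counter())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def predict_next(prev):
    """Return the most frequent follower of `prev`, or None if `prev` was never seen."""
    counts = following.get(prev)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(next_word_probs("i"))
print(predict_next("green"))
```

In this toy corpus, "like" follows "i" in two of three sentences, so it is the predicted next word after "i".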
Basic Terminology
- NLP (Natural Language Processing): study of how computers can process and manipulate human languages
- dataset: a large body of data that can be used to train a model
- corpus: usually a very large, unstructured textual dataset
- training: providing a machine learning model with data to learn from
- testing: evaluating how well the trained model performs on data it did not see during training
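
One common way to set up training and testing is to shuffle a dataset and hold part of it back for evaluation. The sketch below assumes a small, made-up list of labeled messages and a hypothetical helper `train_test_split`; the 80/20 split is a typical choice, not a rule.

```python
import random

# Hypothetical labeled examples (text, label), invented for illustration.
examples = [
    ("win a free prize now", "spam"),
    ("meeting moved to 3pm", "ham"),
    ("claim your reward today", "spam"),
    ("lunch tomorrow?", "ham"),
    ("cheap loans available", "spam"),
    ("see attached report", "ham"),
    ("you have been selected", "spam"),
    ("thanks for your help", "ham"),
    ("limited time offer", "spam"),
    ("call me when you land", "ham"),
]


def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of `data` and split it into (train, test) lists."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train_set, test_set = train_test_split(examples)
print(len(train_set), "training examples,", len(test_set), "test examples")
```

The model learns only from `train_set`; `test_set` is kept aside so the evaluation reflects performance on unseen data.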
Math Review
- Probability notation
  - P(A) = probability that event A occurs
  - P(B | A) = probability that B occurs, given that A has occurred (conditional probability)
- (Basic) Linear algebra
  - What is a vector? → an ordered list of numbers that can be read as a point or direction in space
  - How can you measure similarity between two vectors? → Euclidean distance (smaller means more similar), cosine similarity (closer to 1 means more similar)
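
The two vector measures above can be sketched in plain Python. The function names and the example vectors `a`, `b`, and `c` are chosen here for illustration; in practice a library such as NumPy would usually handle the arithmetic.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice as long
c = [3.0, 0.0, -1.0]  # perpendicular to a (dot product is 3 + 0 - 3 = 0)

print(euclidean_distance(a, b))
print(cosine_similarity(a, b))
print(cosine_similarity(a, c))
```

Note that `a` and `b` are far apart by Euclidean distance but point in the same direction, so their cosine similarity is 1 (up to floating-point rounding). Which measure is appropriate depends on whether vector length should matter.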
Expectations
- Short readings before each class
- One 5-minute presentation on a topic of your choice
- Final project
- Please email me if you ever have any questions
- Enjoy your summer! :)
Homework
- Make sure Python 3 and Jupyter are set up on your laptop
  - Python/NumPy tutorial: https://cs231n.github.io/python-numpy-tutorial/
- No readings this week; use the time to brush up on probability and Python as needed