NLP Basics Repository

Welcome to the NLP-basics repository! This repository contains a collection of Jupyter notebooks designed to teach the foundational concepts of Natural Language Processing (NLP). Working step by step, you will learn key techniques such as tokenization, stemming, lemmatization, and text vectorization, and then apply machine learning models to real-world datasets.

Repository Structure

Below is an overview of the notebooks and files included in this repository:

1-Lesson.ipynb

An introductory notebook that walks through basic concepts in NLP and provides an overview of the steps required to build a simple NLP pipeline.

2-Tokenization.ipynb

Demonstrates how to split text into meaningful units (tokens), covering word tokenization and sentence tokenization.
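As a rough illustration of the idea, here is a minimal regex-based sketch. The function names `split_sentences` and `split_words` are made up for this example and are not taken from the notebook, which may rely on a dedicated NLP library instead.

```python
import re

def split_sentences(text):
    # Split on whitespace that follows ., ! or ? (a simplification:
    # abbreviations such as "Dr." would be split incorrectly).
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def split_words(text):
    # A token is a run of word characters (optionally with one internal
    # apostrophe, e.g. "don't") or a single punctuation character.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

text = "NLP is fun. Tokenization comes first!"
sentences = split_sentences(text)
words = split_words(sentences[0])
print(sentences)  # ['NLP is fun.', 'Tokenization comes first!']
print(words)      # ['NLP', 'is', 'fun', '.']
```

Library tokenizers handle many edge cases (abbreviations, URLs, emoji) that this sketch ignores.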

3-Stemming.ipynb

Shows how to reduce words to a root form (stem) using stemmers such as Porter and Snowball.
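To give a feel for what a stemmer does, the sketch below strips a few common English suffixes. It is a deliberately naive stand-in, not the Porter or Snowball algorithm, and the suffix list and the `naive_stem` helper are assumptions made for illustration.

```python
# Suffixes are checked in this order; the first match is removed.
SUFFIXES = ("ing", "ed", "ly", "es", "s")

def naive_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words such as "bus" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

stems = [naive_stem(w) for w in ["jumping", "jumped", "jumps", "studies", "bus"]]
print(stems)  # ['jump', 'jump', 'jump', 'studi', 'bus']
```

Note that "studi" is not a dictionary word. Stems only need to be consistent across related word forms, which is the main difference from lemmatization.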

4-Lemmatization.ipynb

Explores lemmatization, which reduces words to their base or dictionary form (lemma) while taking context, such as part of speech, into account.
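The role of context can be sketched with a tiny lookup table keyed by word and part of speech. The `LEMMAS` lexicon and the `lemmatize` helper below are hypothetical; a real lemmatizer consults a full dictionary rather than four hand-written entries.

```python
# Hypothetical mini-lexicon mapping (word, part of speech) to a lemma.
LEMMAS = {
    ("saw", "verb"): "see",
    ("saw", "noun"): "saw",
    ("better", "adj"): "good",
    ("mice", "noun"): "mouse",
}

def lemmatize(word, pos):
    # Fall back to the lowercased word when the lexicon has no entry.
    return LEMMAS.get((word.lower(), pos), word.lower())

print(lemmatize("saw", "verb"))  # see
print(lemmatize("saw", "noun"))  # saw
```

The same surface form maps to different lemmas depending on its part of speech, which is why lemmatization is usually paired with POS tagging.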

5-stopWords.ipynb

Covers the concept of stop words and how to remove them to clean up text data.
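A minimal sketch of stop-word removal follows. The `STOP_WORDS` set is a small hand-picked sample chosen for this example; the lists shipped with NLP libraries are much longer.

```python
# A tiny hand-picked stop-word set, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "on"}

def remove_stop_words(tokens):
    # Compare in lowercase so "The" and "the" are treated alike.
    return [t for t in tokens if t.lower() not in STOP_WORDS]

tokens = ["The", "cat", "is", "on", "the", "mat"]
filtered = remove_stop_words(tokens)
print(filtered)  # ['cat', 'mat']
```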

6-POS-tagging.ipynb

Walks through part-of-speech tagging, a process that labels words with their respective part of speech (e.g., noun, verb).

7-Named-Entity-Recognition.ipynb

Explains Named Entity Recognition (NER), which is used to identify entities like names, locations, and organizations within text.

8-NextSteps.ipynb

Provides an outline of advanced NLP topics that can be explored after mastering the basics.

9-One-Hot-Encoding.ipynb

Introduces One-Hot Encoding, a common method for representing categorical data as binary vectors.
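The encoding can be sketched in a few lines of plain Python. The helper names `build_vocabulary` and `one_hot` are chosen for this example and are not taken from the notebook.

```python
def build_vocabulary(tokens):
    # Sort the unique tokens so that index assignment is deterministic.
    return {word: i for i, word in enumerate(sorted(set(tokens)))}

def one_hot(word, vocab):
    # A vector of zeros with a single 1 at the word's index.
    vector = [0] * len(vocab)
    vector[vocab[word]] = 1
    return vector

tokens = ["cat", "dog", "fish", "dog"]
vocab = build_vocabulary(tokens)
encoded = [one_hot(t, vocab) for t in tokens]
print(vocab)    # {'cat': 0, 'dog': 1, 'fish': 2}
print(encoded)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
```

Each vector is as long as the vocabulary, so this representation grows quickly and carries no notion of similarity between words.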

10-BagofWords.ipynb

Introduces the Bag of Words (BoW) model, an important text vectorization technique for representing text data as numerical features.
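As a small sketch of the model, the example below counts word occurrences per document over a shared vocabulary. The two sample sentences and the `bow_vector` helper are made up for illustration.

```python
from collections import Counter

docs = ["the cat sat", "the cat saw the dog"]
tokenized = [d.split() for d in docs]

# Shared, sorted vocabulary across all documents.
vocab = sorted({w for doc in tokenized for w in doc})

def bow_vector(tokens, vocab):
    # Count each vocabulary word in the document; word order is discarded.
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

matrix = [bow_vector(doc, vocab) for doc in tokenized]
print(vocab)   # ['cat', 'dog', 'sat', 'saw', 'the']
print(matrix)  # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```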

11-TF-IDF.ipynb

Explains Term Frequency-Inverse Document Frequency (TF-IDF), a technique to weigh the importance of words in a document relative to a corpus.
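The weighting can be sketched with the textbook formulas tf = count / document length and idf = log(N / df). This is an assumption about the variant; libraries often apply smoothing or normalization, so their numbers will differ from the ones below.

```python
import math
from collections import Counter

def tf_idf(tokenized_docs):
    n_docs = len(tokenized_docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    weights = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [["the", "cat", "sat"], ["the", "dog", "barked"]]
weights = tf_idf(docs)
print(weights[0]["the"])  # 0.0, because "the" appears in every document
print(weights[0]["cat"] > weights[0]["the"])  # True
```

Words that occur in every document get a weight of zero, while words concentrated in a few documents are weighted up.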

12-Word2Vec.ipynb

Covers Word2Vec, a popular word embedding model that captures semantic meaning by representing words as vectors in a continuous space.
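Training Word2Vec needs a corpus and a library, so the sketch below only illustrates how similarity is measured once vectors exist. The three-dimensional vectors are hand-made for this example and are not trained embeddings.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot product over norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors, NOT the output of a trained model.
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}
sim_royal = cosine_similarity(vectors["king"], vectors["queen"])
sim_fruit = cosine_similarity(vectors["king"], vectors["apple"])
print(sim_royal > sim_fruit)  # True
```

With real embeddings, words used in similar contexts end up with a high cosine similarity in the same way.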

13-Spam Ham Classification Project Using BOW And ML.ipynb

A project notebook that demonstrates how to classify SMS messages as spam or ham (not spam) using the Bag of Words model and machine learning algorithms.
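One possible shape of such a classifier is sketched below as a multinomial Naive Bayes model over word counts, written in plain Python. The choice of Naive Bayes, the four training messages, and the helper names are assumptions for illustration; the notebook may use different models and the full dataset.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """samples: list of (label, tokens). Returns the counts used at prediction."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, tokens in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(tokens, model):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior plus Laplace-smoothed log likelihood of each known word.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token in vocab:  # words never seen in training are skipped
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data made up for this example.
training = [
    ("spam", "win a free prize now".split()),
    ("spam", "free cash win".split()),
    ("ham", "are we meeting for lunch".split()),
    ("ham", "see you at lunch".split()),
]
model = train_naive_bayes(training)
print(predict("free prize".split(), model))     # spam
print(predict("lunch meeting".split(), model))  # ham
```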

14-Spam Ham Classification Project Using tf-idf And ML.ipynb

Another spam/ham classification project, this time using TF-IDF for feature extraction, along with machine learning models.

15-Spam Ham Projects Using Word2vec, AvgWord2vec.ipynb

A project notebook that applies Word2Vec and Average Word2Vec for spam and ham classification, using the vectorized representation of text.
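Average Word2Vec turns a whole message into one fixed-length vector by averaging the vectors of its words. The sketch below uses hand-made two-dimensional vectors in place of trained embeddings, and `average_vector` is a hypothetical helper name.

```python
def average_vector(tokens, vectors, dim):
    # Collect vectors for the tokens that have one; unknown words are skipped.
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        # No known words: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

# Toy vectors made up for this example, not trained embeddings.
vectors = {"free": [1.0, 0.0], "prize": [0.0, 1.0], "hello": [0.5, 0.5]}
doc_vector = average_vector(["free", "prize", "now"], vectors, dim=2)
print(doc_vector)  # [0.5, 0.5]
```

The resulting vectors can be fed to any standard classifier, at the cost of losing word order.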

16-Kindle Review Sentiment Analysis.ipynb

A sentiment analysis project on Kindle reviews, showcasing how to preprocess reviews and use machine learning models for sentiment classification.

Other Files

SMSSpamCollection.txt: A dataset of SMS messages used for spam/ham classification.
all_kindle_review.csv: A dataset of Kindle reviews used for sentiment analysis.
finalNLPnotes.pdf: A summary of key NLP concepts covered in the repository.
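A hedged sketch of reading the SMS dataset follows. It assumes each line holds a label and a message separated by a tab, as in the commonly distributed version of this dataset; this README does not confirm the layout, and `parse_sms_lines` is a helper name invented for the example.

```python
import csv
import io

def parse_sms_lines(lines):
    # QUOTE_NONE: quote characters inside messages are treated as plain text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Skip malformed rows that lack a label or a message.
    return [(row[0], row[1]) for row in reader if len(row) >= 2]

# In-memory sample standing in for the real file (assumed format).
sample = io.StringIO("ham\tSee you at lunch\nspam\tWIN a \"free\" prize now\n")
records = parse_sms_lines(sample)
print(records[0])  # ('ham', 'See you at lunch')
```

To use it on the real file, pass an open file handle for SMSSpamCollection.txt instead of the in-memory sample.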

Getting Started

Clone the repository:

git clone https://github.com/Abhigyan-RA/NLP-basics.git
cd NLP-basics

Install Dependencies: Make sure you have Python and Jupyter installed. Assuming a requirements.txt file is present in the repository root, you can install the necessary packages by running:
```
pip install -r requirements.txt
```
