In this course, we study the basics of text mining.
The course teaches the basic operations for converting unstructured text into vectors and for reading different types of data from public archives.
Building on this, we use Natural Language Processing (NLP) to pre-process our dataset.
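As a rough illustration of these first steps, here is a minimal pure-Python sketch that pre-processes a toy dataset (lowercasing, tokenizing, removing stopwords) and then turns each document into a bag-of-words count vector. The stopword list, the function names, and the two sample sentences are hypothetical and chosen only for illustration. The course itself may use different libraries and steps.

```python
import re
from collections import Counter

# Hypothetical mini stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "was", "is", "of", "to"}

def preprocess(text):
    """Lowercase the text, split it into runs of letters, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def build_vocabulary(token_lists):
    """Sorted list of every distinct token across the pre-processed documents."""
    return sorted({t for tokens in token_lists for t in tokens})

def vectorize(tokens, vocabulary):
    """Term-count vector for one document, aligned with `vocabulary`."""
    counts = Counter(tokens)  # missing terms count as 0
    return [counts[term] for term in vocabulary]

# Hypothetical toy dataset of two short hotel reviews.
docs = ["The hotel was clean and quiet", "The room was dirty and the room was noisy"]
token_lists = [preprocess(d) for d in docs]
vocab = build_vocabulary(token_lists)
vectors = [vectorize(tokens, vocab) for tokens in token_lists]
print(vocab)    # ['clean', 'dirty', 'hotel', 'noisy', 'quiet', 'room']
print(vectors)  # [[1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 2]]
```

Raw counts keep the sketch simple. Libraries such as scikit-learn provide `CountVectorizer` and `TfidfVectorizer` for the same job at scale, with TF-IDF weighting often preferred for classification and clustering.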
Machine Learning techniques are then applied to document classification and clustering, along with the evaluation of the resulting models.
Information extraction is covered with the help of topic modeling.
Sentiment analysis is covered with both a classifier-based and a dictionary-based approach.
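A dictionary-based sentiment approach can be sketched in a few lines: each word of a review is looked up in a sentiment lexicon, and the word scores are summed into an overall label. The tiny lexicon and the function name below are hypothetical; practical systems rely on much larger published lexicons and usually also handle negation and intensifiers.

```python
import re

# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"clean": 1, "friendly": 1, "great": 1, "dirty": -1, "rude": -1, "noisy": -1}

def lexicon_sentiment(review):
    """Label a review by summing the lexicon scores of its words."""
    tokens = re.findall(r"[a-z]+", review.lower())
    score = sum(LEXICON.get(t, 0) for t in tokens)  # unknown words score 0
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_sentiment("Great location, friendly staff"))  # positive
print(lexicon_sentiment("The room was dirty and noisy"))    # negative
```

Unlike a trained classifier, this approach needs no labelled data, but its accuracy depends entirely on how well the lexicon covers the domain vocabulary.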
Almost all modules include practice assignments.
Two projects combine most of the topics covered separately in these modules.
Finally, a list of project suggestions is provided, from which students can choose and build their own project.
Who this course is for:
- Beginners in Python who are curious about data science
- Those who know Python programming and the basic concepts of data science but cannot yet relate the two in practice
Requirements:
- Basics of programming (any language; Python is a bonus)
- Basic understanding of Machine Learning
- Ability to code with lists, loops and conditions, and a basic understanding of how models learn patterns from data
What you'll learn:
- The basics of text mining, and how to build on them to perform document categorization, document grouping and sentiment analysis
- Practicals carried out in Python, with Natural Language Processing (NLP) used for pre-processing
- Moving from a very small dummy dataset to existing databases, and then to building your own database on which to perform text mining tasks
- Sentiment analysis of user-written hotel reviews