Course: Hadoop & Data Science NLP (All in One Course)

  • Lifetime Access
  • Certificate on Completion
  • Access on Android and iOS App
  • Self-Paced
About this Course

The demand for Big Data Hadoop Developers, Architects, Data Scientists, and Machine Learning Engineers is increasing day by day, and one of the main reasons is that companies are increasingly keen to get more accurate predictions and forecasting results from data. They want to make sense of their data and provide a 360-degree view of customers, thereby delivering a better customer experience.

This course is designed so that you get the best of both worlds, i.e. both Hadoop and Data Science. You will not only be able to perform Hadoop-related operations to gather data directly from the source, but also perform Data Science-specific tasks and build models on the collected data. You will also be able to do transformations using Hadoop Ecosystem tools. In a nutshell, this course helps students learn both Hadoop and Data Science Natural Language Processing in one course.

Companies like Google, Amazon, Facebook, eBay, LinkedIn, Twitter, and Yahoo! are using Hadoop at large scale these days, and more and more companies have started adopting these digital technologies. Text Analytics in particular has several applications (given below), and hence companies prefer professionals who have both of these skill sets.

  • One application of text classification is building a faster emergency response system by classifying panic conversations on social media
  • Another is automating the classification of users into cohorts, so that marketers can monitor and classify users based on how they talk about products, services, or brands online
  • Content or product tagging using categories, as a way to improve the browsing experience or to identify related content on a website. Platforms such as news agencies, directories, e-commerce sites, blogs, and content curators can use automated technologies to classify and tag content and products

Companies these days lean towards candidates who are equipped with the best of both worlds, and this course will prove to be a very good starting point. It covers the complete pipeline of modern-day ELT (Extract, Load, and Transform) and Analytics, as shown below:

Get data from Source --> Load data into Structured/Semi Structured/Unstructured form --> Perform Transformations --> Pre-process the Data further --> Build the Data Science Model --> Visualize the Results

Learn and get started with the popular Hadoop Ecosystem technologies as well as one of the hottest topics in Data Science: Natural Language Processing. In this course you will:

  • Install Hadoop using the Hortonworks Sandbox. You will also get an opportunity for hands-on work with Hadoop operations as well as the Hadoop management service called Ambari on your computer
  • Perform HDFS operations to work with continuous streams of data
  • Install SSH and file-transfer tools which help with the operational activities of Hadoop
  • Install NIFI and develop a complete workflow on its Web UI to move data from source to destination. Also perform transformations on this data using NIFI processors
  • Spin up Apache Solr, which enables full-text search, and have it receive text for Real Time Text Analysis
  • Use the Banana Dashboard to visualize Real Time Analytics on streaming data
  • Store real-time streaming JSON data in structured form using Hive tables, as well as in flat-file format in HDFS
  • Visualize the data in the form of charts and histograms using Apache Zeppelin
  • Learn the building blocks of Natural Language Processing to develop Text Analytics skills
  • Unleash Machine Learning capabilities using Data Science Natural Language Processing and build a Machine Learning model to classify text data

Who this course is for:

  • Anyone who wants to learn both Hadoop and Data Science from scratch
  • Developers, Programmers, or Database Administrators who want to transition to Hadoop and Hadoop Ecosystem tools like HDFS, Hive, Solr, NIFI, and Banana, and also want to explore Data Science
  • Aspiring Data Scientists, Data Analysts, and Business Analysts who want to add Natural Language Processing to their arsenal and learn Hadoop as well
  • Product, Program, or Project Managers who want to understand the complete architecture as well as how Hadoop and Data Science can be integrated
  • Enterprise Architects and Solution Architects who want to learn about the Hadoop Ecosystem and related technologies to design Big Data solutions
Basic knowledge
  • Basic Python Programming
  • A computer with at least 8 GB of RAM
What you will learn
  • You will be able to develop a real-world, end-to-end application which encompasses both Hadoop and Natural Language Processing (Data Science)
  • Set up a Hadoop cluster on your laptop free of cost and then connect to different Hadoop services
  • Develop distributed applications based on the Hadoop framework; learn the different Hadoop pillars, HDFS architecture, MapReduce, and the different types of data in Hadoop
  • Visualize Hadoop Ecosystem services as well as components like memory usage, cluster load, etc. in the form of a dashboard on a web interface called Ambari
  • Design and develop scalable, fault-tolerant, and flexible applications which can store and distribute large data sets across inexpensive servers
  • Develop scripts based on the many Hadoop commands to manage files and datasets
  • Understand the different building blocks of Apache NIFI that help with data movement, transformation, etc. Also learn about the NIFI architecture and its various applications
  • Learn the steps to install Apache NIFI and the configuration-file changes needed to run it seamlessly
  • Develop a complete workflow application in NIFI which takes data from a streaming source, performs transformations on it, and then stores it in Hadoop
  • Spin up Apache Solr as a service and configure it to receive streaming data from a NIFI processor in order to perform real-time analytics on this data
  • Understand the architecture and concepts related to Apache Solr, as well as several of its features
  • Create a Banana Dashboard to visualize real-time analytics on live streaming data, after getting an understanding of the dashboard's components and structure
  • See where Hive fits in the Hadoop Ecosystem, its architecture, and how exactly it works
  • Develop an understanding of how data can be stored in structured form in Apache Hive, with in-depth knowledge of several of its components
  • Develop and visualize the data in the form of graphs, histograms, pie charts, etc. using another Hadoop Ecosystem tool (a notebook) called Apache Zeppelin
  • Develop the concepts of Natural Language Processing and integrate them all to build a working NLP application
  • Develop the basic building blocks of Natural Language Processing and write the associated Python scripts
  • Build a machine learning model using Python for the application to be built
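The course builds its text classifier in Python (typically with NLTK's Naive Bayes tools). To give a feel for the idea behind that final step, here is a minimal bag-of-words Naive Bayes sketch in plain Python; the data and function names are illustrative, not the course's own:

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Train a toy multinomial Naive Bayes text classifier.

    examples: list of (text, label) pairs.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)   # label -> Counter of words
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict(model, text):
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # log prior + sum of log likelihoods with add-one smoothing
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Tiny hand-made sentiment dataset in the spirit of the course's Twitter use case
model = train_naive_bayes([
    ("great product love it", "pos"),
    ("happy with the service", "pos"),
    ("terrible awful experience", "neg"),
    ("hate the slow support", "neg"),
])
print(predict(model, "love the great service"))   # → pos
```

In practice you would train on a large labeled corpus and pickle the trained model for reuse, as the NLP lectures do.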
Number of Lectures: 49
Total Duration: 11:08:09
Introduction to Hadoop
  • General Overview of Hadoop  
  • A quick look at Hadoop History  
  • Hadoop Framework and Ecosystem  
  • Let's learn about HDFS and MapReduce
  • Peek into Hadoop YARN
Let's Tame the Elephant - Install Hadoop Sandbox and Run few Hadoop Commands
  • Download Hadoop and other supporting tools on your Desktop/Laptop  
  • Install Hadoop and make Configuration changes  
  • Access Hadoop Sandbox and Welcome Page  
The Niagara Files - Introduction to Apache NIFI
  • NIFI Concepts  
  • Acquire knowledge on Apache NIFI's UI Canvas Components  
  • Apache NIFI Architecture  
Install and Configure NIFI
  • Download and Install Apache NIFI  
  • Configure Apache NIFI  
Full Text Search with Apache Solr - An Introduction
  • An introduction of Apache Solr and some of its features  
  • Learn Basics and Components of Search Engine  
  • How a Search Engine Works
  • Peek into the Architecture of Apache Solr
  • Apache Solr - Basic Concepts  
Install and Configure Apache Solr
  • Spin up Apache Solr and configure it to receive data  
Twitter App Setup for Bringing Data into Hadoop
  • Create Twitter App to get the tweets into Hadoop  
Banana Dashboard for Visualizing Real Time Streaming Data
  • Introduction to Banana Dashboard - Overview, Components and Structure  
  • Spin up Banana Dashboard for Real Time Stream Analytics Visualization  
Apache Hive
  • An Introduction to Apache Hive  
  • Apache Hive Architecture  
  • How Does Apache Hive Work?
  • Apache Hive Data Types  
  • Apache Hive - Create Database and Table  
  • Apache Hive - Table Partitioning  
  • Apache Hive - Operators and Functions  
  • Apache Hive - Views and Indexes  
  • Setup Hive Tables to receive JSON Format Data  
  • Create Hive Tables and Views for storing JSON Format Data  
  • Visualize Data using Apache Zeppelin
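The Hive lectures above cover creating databases, partitioned tables, and views for JSON-format data. As a rough sketch of the kind of HiveQL involved (the database, table, and column names here are hypothetical, not taken from the course), shown as Python strings so it can be previewed without a cluster:

```python
# Hypothetical HiveQL of the kind covered in the Hive lectures.
# All names below are illustrative, not the course's own.
create_db = "CREATE DATABASE IF NOT EXISTS tweets_db;"

create_table = """
CREATE TABLE IF NOT EXISTS tweets_db.tweets (
    id BIGINT,
    user_name STRING,
    text STRING
)
PARTITIONED BY (dt STRING)  -- as in the Table Partitioning lecture
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'  -- JSON-format data
STORED AS TEXTFILE;
"""

create_view = """
CREATE VIEW IF NOT EXISTS tweets_db.recent_tweets AS
SELECT id, user_name, text FROM tweets_db.tweets WHERE dt >= '2020-01-01';
"""

for stmt in (create_db, create_table, create_view):
    print(stmt.strip())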
Data Science - Natural Language Processing
  • NLP - Tokenizing Words and Sentences  
  • NLP - Word Stemming  
  • NLP - Get an understanding of Stopwords  
  • NLP - Dive into Part of Speech Tagging  
  • NLP - Locate and Classify entities using Named Entity Recognition  
  • NLP - Understand the concept of Lemmatization  
  • NLP - Build an Algorithmic classifier to classify the Text  
  • NLP - Importance of Words as Features  
  • NLP - Train a Machine Learning model using Naive Bayes Algorithm  
  • NLP - Get the Machine Learning model loaded faster using Pickling  
  • NLP - Putting everything together for Sentiment Analysis  
  • NLP - Real Time Live Twitter Sentiment Analysis  
  • NLP - Plotting Live Twitter Sentiments  
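The NLP lectures above are written in Python, typically using NLTK, which ships real tokenizers, stemmers, and stopword lists. As a dependency-free illustration of what the first few building blocks do, here is a deliberately crude toy version (NLTK's word_tokenize, stopwords corpus, and PorterStemmer do this properly):

```python
import re

# A tiny stopword list for illustration; NLTK ships a much larger one.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of"}

def tokenize(sentence):
    """Split a sentence into lowercase word tokens (toy word tokenizer)."""
    return re.findall(r"[a-z']+", sentence.lower())

def remove_stopwords(tokens):
    """Drop common words that carry little signal for classification."""
    return [t for t in tokens if t not in STOPWORDS]

def stem(token):
    """Crude suffix-stripping stemmer; the course would use a real one like Porter."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

tokens = tokenize("The engines are running and the tests passed")
filtered = remove_stopwords(tokens)
print([stem(t) for t in filtered])
```

Note how the crude stemmer maps "running" to "runn" rather than "run"; a real stemmer or a lemmatizer (also covered above) handles such cases correctly.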
Free Bonus Materials
  • Free eBooks  
  • Free Apache Hive Book  
  • Free Natural Language Processing with Python eBook  