Mastering Apache Sqoop with Hadoop, Hive, MySQL, Hortonworks Data Platform

  • Lifetime Access
  • Certificate on Completion
  • Access on Android and iOS App
About this Course

WHY APACHE SQOOP

Apache Sqoop is designed to import data from relational databases such as Oracle and MySQL into Hadoop. Hadoop is ideal for batch processing of huge volumes of data and is an industry standard today. In real-world scenarios, Sqoop lets you transfer data from relational tables into Hadoop, leverage Hadoop's parallel processing capabilities to process huge amounts of data, and generate meaningful insights. The results of that processing can then be stored back into relational tables using Sqoop's export functionality.
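
To make this concrete, here is a minimal sketch of both directions; the hostname, database, tables, credentials, and paths are placeholders for illustration only, not values used in the course:

    # Import a MySQL table into HDFS (connection details and names are hypothetical)
    sqoop import \
      --connect jdbc:mysql://mysql-host/retail_db \
      --username sqoop_user --password sqoop_pass \
      --table orders \
      --target-dir /user/hadoop/orders

    # Export processed results from HDFS back into an existing MySQL table
    sqoop export \
      --connect jdbc:mysql://mysql-host/retail_db \
      --username sqoop_user --password sqoop_pass \
      --table order_summary \
      --export-dir /user/hadoop/order_summary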

ABOUT THIS COURSE

In this course, you will learn step by step everything you need to know about Apache Sqoop and how to integrate it into the Hadoop ecosystem. With every concept explained through real-world-style examples, you will learn how to build data pipelines that move data into and out of Hadoop. The course covers the following major concepts in great detail (a short command sketch follows each topic list below):

APACHE SQOOP - IMPORT TOPICS << MySQL to Hadoop/Hive >>

  • Warehouse directory on Hadoop storage
  • Specific target directory on Hadoop storage
  • Controlling parallelism
  • Overwriting existing data
  • Append data
  • Load specific columns from MySQL table
  • Control data splitting logic
  • Default to single mapper when needed
  • Sqoop Option files
  • Debugging Sqoop Operations
  • Importing data in various file formats - TEXT, SEQUENCE, AVRO, PARQUET & ORC
  • Data compression while importing
  • Custom query execution
  • Handling null strings and non-string values
  • Setting delimiters for imported data files
  • Setting escaped characters
  • Incremental loading of data
  • Writing directly to a Hive table
  • Using HCATALOG parameters
  • Importing all tables from MySQL database 
  • Importing entire MySQL database into Hive database
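
As a rough illustration of how several of the import options above combine on the command line (all connection details, table names, and paths below are hypothetical placeholders):

    # Parallel, incremental import of selected columns as compressed, pipe-delimited text
    sqoop import \
      --connect jdbc:mysql://mysql-host/retail_db \
      --username sqoop_user --password sqoop_pass \
      --table orders \
      --columns "order_id,order_date,order_status" \
      --warehouse-dir /user/hadoop/warehouse \
      --num-mappers 4 --split-by order_id \
      --as-textfile --compress \
      --fields-terminated-by '|' \
      --null-string '\\N' --null-non-string '\\N' \
      --incremental append --check-column order_id --last-value 0

    # Import the same table straight into a Hive table stored as ORC via HCatalog
    sqoop import \
      --connect jdbc:mysql://mysql-host/retail_db \
      --username sqoop_user --password sqoop_pass \
      --table orders \
      --hcatalog-database sales --hcatalog-table orders \
      --create-hcatalog-table \
      --hcatalog-storage-stanza "STORED AS ORC"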

APACHE SQOOP - EXPORT TOPICS << Hadoop/Hive to MySQL >> 

  • Move data from Hadoop to MySQL table
  • Move specific columns from Hadoop to MySQL table
  • Avoid partial export issues
  • Update Operation while exporting
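
A rough sketch of what these export variants look like on the command line (table names, paths, and delimiters are hypothetical placeholders):

    # Insert-mode export guarded by a staging table to avoid partially exported data
    sqoop export \
      --connect jdbc:mysql://mysql-host/retail_db \
      --username sqoop_user --password sqoop_pass \
      --table order_summary \
      --export-dir /user/hive/warehouse/sales.db/order_summary \
      --input-fields-terminated-by '\001' \
      --staging-table order_summary_stage --clear-staging-table

    # Update existing rows (and insert new ones) keyed on order_id
    sqoop export \
      --connect jdbc:mysql://mysql-host/retail_db \
      --username sqoop_user --password sqoop_pass \
      --table order_summary \
      --export-dir /user/hive/warehouse/sales.db/order_summary \
      --input-fields-terminated-by '\001' \
      --update-key order_id --update-mode allowinsert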

APACHE SQOOP - JOBS TOPICS << Automation >>

  • Create a Sqoop job
  • List existing Sqoop jobs
  • Check metadata about Sqoop jobs
  • Execute a Sqoop job
  • Delete a Sqoop job
  • Enable password storage for easy execution in production
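
For a feel of the job workflow, here is an illustrative sketch; the job name, connection details, and password file path are hypothetical, and in practice passwords are typically kept out of the command line with a --password-file on HDFS or by enabling sqoop.metastore.client.record.password in sqoop-site.xml:

    # Create a reusable Sqoop job (everything after the bare "--" is an ordinary import command)
    sqoop job --create daily_orders_import -- import \
      --connect jdbc:mysql://mysql-host/retail_db \
      --username sqoop_user --password-file /user/hadoop/.mysql_password \
      --table orders \
      --warehouse-dir /user/hadoop/warehouse \
      --incremental append --check-column order_id --last-value 0

    # List, inspect, execute, and delete saved jobs
    sqoop job --list
    sqoop job --show daily_orders_import
    sqoop job --exec daily_orders_import
    sqoop job --delete daily_orders_import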

WHAT YOU WILL ACHIEVE AFTER COMPLETING THIS COURSE

After completing this course, you will have covered one of the topics that is heavily tested in the certifications below. You will need to take other lessons as well to fully prepare for these exams; we will be launching further courses soon.

  • CCA Spark and Hadoop Developer Exam (CCA175)
  • Hortonworks Data Platform (HDP) Certified Developer Exam (HDPCD)

WHO ARE YOUR INSTRUCTORS

This course is taught by professionals with extensive experience in building big data applications for Fortune 100 companies around the world. They have created data pipelines that extract, transform, and process hundreds of terabytes of data per day for their clients, powering data analytics for user-facing services. After the successful launch of their course Complete ElasticSearch with LogStash, Hive, Pig, MR & Kibana, the same team now brings you a complete course on Apache Sqoop with Hadoop, Hive, and MySQL.

You will also get step-by-step instructions for installing all required tools and components on your machine so you can run every example provided in this course. Each video explains the entire process in a detailed and easy-to-understand manner.

You will get access to working code that you can play with and expand on. All code examples work and are demonstrated in the video lessons.

Windows users will need to install a virtual machine on their device to set up a single-node Hadoop cluster, while MacBook and Linux users can install the Hadoop and Sqoop components directly on their machines. The step-by-step process is illustrated within the course.

Basic knowledge
  • The Complete Course on Apache SQOOP. Great for CCA175 Spark & Hortonworks Big Data Hadoop Developer Certifications
What you will learn

You will get the following from this course:

  • Get Ready for CCA Spark and Hadoop Developer Exam (CCA175)
  • Get Ready for Hortonworks Data Platform (HDP) Certified Developer Exam (HDPCD)
  • Advance your career by applying for high-paying Big Data jobs
  • Crack Big Data developer interviews
  • Develop a sound understanding of the data ingestion process from a relational system (MySQL) into the Hadoop ecosystem and vice versa
Curriculum
Number of lectures: 44
Total duration: 03:29:55
Introduction
  • Course Objectives  
Apache SQOOP in a nutshell
  • What is Apache SQOOP  
  • Why Apache SQOOP  
  • How SQOOP Works  
Environment Setup
  • Install Hortonworks Data Platform Sandbox - (FOR WINDOWS PC USERS ONLY)  

    Please follow the instructions from the attached e-book to install Hortonworks Data Platform Sandbox on Windows PC.

  • Install Hadoop & SQOOP on Machine - (FOR MAC/LINUX USERS)  

    Please follow the instructions from the attached e-book to install Hadoop/Sqoop on MacBook/Linux machines.

  • Connect to HDP Sandbox Shell  

    Please follow the instructions from the attached e-book to access HDP Sandbox Shell.

  • Get to know SQOOP CLI  
  • What is my Hostname  
  • Data Setup for Exercises  
  • Let's Understand Your Data  
Apache SQOOP - IMPORT
  • Import a Simple MySQL Table into Hadoop HDFS  
  • Import a MySQL Table with Custom Name into Hadoop  
  • Controlling Parallelism in SQOOP Import Flow  
  • Overwrite Existing Data on Hadoop while Importing  
  • Append to Existing Data on Hadoop while Importing  
  • Only load specific columns from MySQL table into Hadoop  
  • Import MySQL tables with No Primary keys in them - 1st Approach  
  • Import MySQL tables with No Primary keys in them - 2nd Approach  
  • Using SQOOP Option files to simplify CLI Commands  
  • Running SQOOP Import in Debug mode  
  • Importing & Storing Data in Textual Format on Hadoop  
  • Importing & Storing Data in AVRO Format on Hadoop  
  • Importing & Storing Data in SEQUENCE Format on Hadoop  
  • Importing & Storing Data in PARQUET Format on Hadoop  
  • Compressing Imported Data  
  • Running Custom MySQL Queries on Source Tables  
  • Handling NULL values in Source Dataset  
  • Setting Custom Field Separators in Imported Data  
  • Handling Escape Characters while Importing  
  • Avoid Enclosing all Data Values while Importing  
  • Incremental Loading of Delta data while Importing - Part 1  
  • Incremental Loading of Delta data while Importing - Part 2  
  • Importing Data Directly into Hive Table  
  • Using HCATALOG to Load Data in ORC File Format  
  • Load ALL tables from MySQL to Hadoop  
  • Load ALL tables from MySQL to Hive Database  
Apache SQOOP - EXPORT
  • Export a Hive table to MySQL table  
  • Export Specific Columns from Hive table to a MySQL table  
  • Avoid Partial Data Exports in SQOOP  
  • When Update Record is OK in SQOOP Export  
Apache SQOOP - JOBS
  • SQOOP Jobs - Create, List, Show, Execute & Delete Operations  
  • Make SQOOP job remember MySQL Database Password For Subsequent executions  
Conclusion
  • What's Next?  