Apache Spark and Scala

  • Lifetime Access
  • Certificate on Completion
  • Access on Android and iOS App
  • Self-Paced
About this Course

This course on Apache Spark and Scala aims to provide advanced expertise in the big data Hadoop ecosystem. It builds a standard skill set that helps a Big Data Hadoop developer become a specialist.

The course starts with a detailed description of the limitations of MapReduce and how Spark can help overcome them. It then takes a deeper dive into the Scala programming language.

Moving on, it covers Spark as a standalone cluster and builds an understanding of Resilient Distributed Datasets (RDDs).

The course also covers Spark SQL concepts, including running SQL queries through SQLContext and Hive queries through HiveContext.

This course provides the material required to build a career path from Big Data Hadoop developer to Big Data Hadoop architect.

Basic knowledge
  • Prior knowledge of Apache Hadoop is an added advantage but not compulsory
  • Fundamental understanding of any programming language
What you will learn
  • Understand the limitations of Hadoop MapReduce and how Spark overcomes them
  • Gain expertise in the Scala programming language and its characteristics
  • Work with RDDs and create applications in Spark
  • Gain a thorough understanding of Spark SQL by running SQL queries in Spark
Curriculum
Number of Lectures: 57
Total Duration: 06:09:27
Module 1: Introduction to Big Data, Hadoop and Spark
  • 1.1 Overview of Big Data  
  • 1.2 Introduction to Apache Hadoop  
  • 1.3 Hadoop Distributed File System  
  • 1.4 Hadoop MapReduce  
  • 1.5 Introduction to Apache Spark  
  • 1.6 Characteristics of Apache Spark  
  • 1.7 Users and Use Cases of Apache Spark  
  • 1.8 Job Execution Flow and Spark Execution  
  • 1.9 Spark Unified Stack  
  • 1.10 Complete Picture of Apache Spark  
  • 1.11 Why Spark with Scala  
  • 1.12 Apache Spark Architecture  
Module 2: Introduction to Scala Programming Language
  • 2.1 Introduction to Scala  
  • 2.2 Scala Basic Syntax  
  • 2.3 Scala Class and Objects  
  • 2.4 If else Statements in Scala  
  • 2.5 Loops in Scala  
Module 3: Advanced Scala Programming
  • 3.1 Functions and Procedures in Scala  
  • 3.2 Access Modifiers  
  • 3.3 Strings and Arrays  
  • 3.4 Scala Collections  
  • 3.5 Scala Traits  
  • 3.6 Pattern Matching  
  • 3.7 Scala Extractors  
  • 3.8 Scala Exception Handling  
  • 3.9 Scala Files IO  
Module 4: Apache Spark RDDs
  • 4.1 Programming with RDDs  
  • 4.2 Starting with Spark  
  • 4.3 Creating RDDs  
  • 4.4 RDD Operations  
  • 4.5 Lifecycle of Spark  
Module 5: Apache Spark RDDs II
  • 5.1 Spark Caching  
  • 5.2 Common Transformations and Actions  
  • 5.3 Spark Functions  
  • 5.4 Some more Spark functions  
Module 6: Working with Key-Value Pairs
  • 6.1 Key value pairs  
  • 6.2 Aggregate Functions  
  • 6.3 Working with Aggregate Functions  
  • 6.4 Joins in Spark  
  • 6.5 Practical on Word count example  
Module 7: Advanced Spark Programming
  • 7.1 Spark Shared Variables  
  • 7.2 Spark and Fault Tolerance  
  • 7.3 Broadcast Variables  
  • 7.4 Numeric RDD Operations  
Module 8: Running Spark Jobs on Cluster
  • 8.1 Spark Runtime Architecture  
  • 8.2 Spark Driver  
  • 8.3 Executors  
  • 8.4 Cluster Managers  
  • 8.5 Cluster Managers II  
Module 9: Spark SQL
  • 9.1 Introduction to Spark SQL  
  • 9.2 Starting Point-SQL Context  
  • 9.3 Hive with Spark SQL  
  • 9.4 Spark SQL caching  
Module 10: Spark Streaming
  • 10.1 Spark Streaming  
  • 10.2 Stream Processing  
  • 10.3 Programming Model  
  • 10.4 Operations Transformations  