Introduction to Big Data and Hadoop




Instructor:
Engr. Palash Gupta
Overview:

‘Introduction to Big Data and Hadoop’ is a course package for individuals who want to understand the basic concepts of Big Data and Hadoop. On completing this course, trainees will be able to explain what goes on behind the processing of large volumes and varieties of data as the industry moves from relational database analytics, which can require long processing times, toward real-time analytics with much faster processing. In addition, trainees will be able to judge when it is efficient to design a solution on top of Big Data, and will understand how such a solution stores large data sets in distributed mode using a cost-effective model.

Learning Outcomes:

The course focuses on the basics of Big Data, Hadoop, and Spark, covering both distributed storage and distributed computing. It blends theory with hands-on demonstrations and exercises, giving trainees working-level experience of the operational process and associated benefits of Big Data.

Evaluation Criteria:

  • Assignments: 50 Marks + Final Examination: 50 Marks
  • Thirty (30) multiple-choice questions (1 Mark each) on the topics covered in Weeks 1-5, and
  • Two programming assignments on the topics covered in Weeks 6-8 (10 Marks each).
  • Final Examination: 25 multiple-choice questions (2 Marks each)

Big Data Hands-on Training Syllabus:

Week 1: Introduction to Big Data & Hadoop (2 Hours, Theory)
  • What is Big Data?
  • How Big is Big Data?
  • What are we trying to solve in Big Data?
  • Types of Data Structure
  • Hadoop System Principles
  • History of Hadoop
  • Comparison with RDBMS
  • Hadoop Ecosystem
  • Hadoop Distributions
  • Supported Operating Systems, Hardware, and Resources

Week 2: Understanding Hadoop HDFS & MapReduce (2 Hours, Theory)
  • HDFS Concept
  • HDFS Architecture
  • Introduction to MapReduce
  • Working Methodology of MapReduce

Week 3: Basic Hadoop Configuration, Setup, Administration, and Command Reference (2 Hours, Theory & Practical)
  • Designing a Hadoop Cluster
  • Procedure for Setting Up a Basic Hadoop Cluster
  • Setting Up a Hadoop Cluster
  • Basic Administration of Hadoop
  • Basic Command Reference

Week 4: Understanding Spark Essentials and Architecture (2 Hours, Theory)
  • Introduction to Apache Spark
  • Spark Architecture
  • Introduction to Spark RDD, Dataset, DataFrame, and DAG
  • Introduction to Spark Components
  • Understanding the Spark Execution Model
  • Why Spark When We Have Hadoop?

Week 5: Spark Configuration, Administration & Setup (2 Hours, Theory & Practical)
  • Designing a Spark Cluster
  • Procedure for Setting Up a Basic Spark Cluster
  • Setting Up a Spark Cluster
  • Basic Administration of Spark
  • Understanding How Spark Runs Our Job

Week 6: Fundamentals of Python and Shell Scripting (2 Hours, Practical)
  • Introduction to Python
  • Installing & Using New Python Packages
  • Introduction to Shell Scripting
  • Running a Python Script from a Shell Script on a Schedule

Week 7: Programming with HDFS & Spark (2 Hours, Theory & Practical)
  • Accessing HDFS Files Using Python
  • Loading HDFS Files in Spark
  • Running Basic Transformations and Actions in Spark
  • Understanding How Spark Runs Our Job, with an Example

Week 8: Practical Experiment with a Real-Life Problem (2 Hours, Practical)
  • Understand the Problem
  • Design the Solution
  • Implement the Solution
  • Q&A