HADOOP Using Cloudera Development & Admin Course


I have been working in the IT industry for 13+ years and have spent the past 5 years conducting Big Data Hadoop training and providing job support.

I have hands-on experience with the following technologies:

Programming Languages: Hadoop with Java, Spark with Scala, C#.NET, Python

ETL Tools: DataStage, Ab Initio

Scripting: UNIX Shell, Perl

Databases: Oracle, Teradata, Netezza

Third-Party Schedulers: Control-M, TWS

 

Key Learnings:

  • Provide training on the complete Hadoop ecosystem
  • Explain real-time scenarios and discuss case studies
  • Explain a real-time project end to end: the different stages of a project, and how data is imported, cleaned, and processed in Hadoop
  • Share knowledge of the different environments: development, testing, and production
  • Explain production-support project issues
  • Explain the development project cycle: development of mappings, testing, and code deployment
  • Interview preparation and certification guidance
  • Resume preparation
  • Big Data Hadoop installation is free

Introduction to Big Data and Hadoop:-

  • Big Data Introduction
  • Hadoop Introduction
  • What is Hadoop? Why Hadoop?
  • Hadoop history
  • The different components of Hadoop
  • HDFS, MapReduce, PIG, Hive, SQOOP, HBASE, OOZIE, Flume, Zookeeper and so on…
  • What is the scope of Hadoop?

Deep Dive into HDFS (for Storing the Data):-

  • Introduction of HDFS
  • HDFS Design
  • HDFS role in Hadoop
  • Features of HDFS
  • Daemons of Hadoop and its functionality
    • NameNode
    • Secondary NameNode
    • JobTracker
    • DataNode
    • TaskTracker
  • Anatomy of a File Write
  • Anatomy of a File Read
  • Network Topology
    • Nodes
    • Racks
    • Data Center
  • Parallel Copying using DistCp
  • Basic Configuration for HDFS
  • Data Organization
    • Blocks
    • Replication
  • Rack Awareness
  • Heartbeat Signal
  • How to Store the Data into HDFS
  • How to Read the Data from HDFS
  • Accessing HDFS (Introduction of Basic UNIX commands)
  • CLI commands
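The rack-awareness and replication topics above can be sketched in plain Python. The following is a simplified, hypothetical model of HDFS's default replica-placement policy (first replica on the writer's node, second on a node in a different rack, third on a different node in that same remote rack); it is illustrative only, not actual NameNode code.

```python
import random

def place_replicas(writer_node, topology, replication=3):
    """Simplified sketch of HDFS default replica placement.

    topology: dict mapping rack name -> list of node names.
    Returns the nodes chosen for one block's replicas.
    """
    rack_of = {n: r for r, nodes in topology.items() for n in nodes}
    replicas = [writer_node]                       # 1st replica: the local node
    local_rack = rack_of[writer_node]
    remote_racks = [r for r in topology if r != local_rack]
    remote_rack = random.choice(remote_racks)      # 2nd replica: a different rack
    second = random.choice(topology[remote_rack])
    replicas.append(second)
    # 3rd replica: a different node on the same rack as the 2nd
    candidates = [n for n in topology[remote_rack] if n != second]
    if candidates:
        replicas.append(random.choice(candidates))
    return replicas[:replication]

topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
print(place_replicas("n1", topology))
```

This placement balances write cost (one off-rack transfer) against fault tolerance (the block survives the loss of a whole rack).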

 

Planning Your Hadoop Cluster

  • Local Mode Cluster
  • Single Node Cluster Configuration
  • Multi-Node Cluster Configuration

 

Cluster Monitoring, Troubleshooting, and Optimizing

  • General System conditions to Monitor
  • NameNode and JobTracker Web UIs
  • View and Manage Hadoop’s Log files
  • Common cluster issues and their resolutions
  • Populating HDFS from External Sources
  • How to use Sqoop to import data from RDBMSs to HDFS
  • How to gather logs from multiple systems using Flume
  • Features of Hive, Hbase and Pig

 

MapReduce using Java (Processing the Data):-

  • Introduction of MapReduce.
  • MapReduce Architecture
  • Data flow in MapReduce
    • Splits
    • Mapper
    • Partitioning
    • Sort and shuffle
    • Combiner
    • Reducer
  • Understand Difference Between Block and InputSplit
  • Role of RecordReader
  • Basic Configuration of MapReduce
  • MapReduce life cycle
    • Driver Code
    • Mapper Code
    • Reducer Code
  • How MapReduce Works
  • Writing and Executing the Basic MapReduce Program using Java
  • Submission & Initialization of MapReduce Job.
  • File Input/output Formats in MapReduce Jobs
    • Text Input Format
    • Key Value Input Format
    • Sequence File Input Format
    • NLine Input Format
  • Joins
    • Map-side Joins
    • Reducer-side Joins
  • Word Count Example
  • Partition MapReduce Program
  • Side Data Distribution
    • Distributed Cache (with Program)
  • Counters (with Program)
    • Types of Counters
    • Task Counters
    • Job Counters
    • User Defined Counters
    • Propagation of Counters
  • Job Scheduling

PIG:-

  • Introduction to Apache PIG
  • Introduction to PIG Data Flow Engine
  • MapReduce vs PIG in detail
  • When should PIG be used?
  • Data Types in PIG
  • Basic PIG programming
  • Modes of Execution in PIG
    • Local Mode
    • MapReduce Mode
  • Execution Mechanisms
    • Grunt Shell
    • Script
    • Embedded
  • Operators/Transformations in PIG
  • PIG UDFs with Programs
  • Word Count Example in PIG

SQOOP:-

  • Introduction to SQOOP
  • Use of SQOOP
  • Connect to MySQL database
  • SQOOP commands
    • Import
    • Export
    • Eval
    • Codegen, etc.
  • Joins in SQOOP
  • Export to MySQL

HIVE:-

  • Introduction to HIVE
  • HIVE Metastore
  • HIVE Architecture
  • XML data cleaning
  • JSON data cleaning
  • Log file cleaning
  • Tables in HIVE
    • Managed Tables
    • External Tables
  • Hive Data Types
    • Primitive Types
    • Complex Types
  • Partitions
  • Dynamic Partitions
  • Static Partitions
  • Buckets
  • Joins in HIVE
  • HIVE UDFs and UDAFs with Programs
  • Word Count Example
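The partitioning and bucketing topics above can be illustrated conceptually: a partition maps a column's value to a directory, while a bucket hashes a column into one of a fixed number of files inside that directory. A hypothetical Python sketch (the path layout and hash are illustrative; Hive's actual bucket hashing differs):

```python
def hive_path(table, partition_col, partition_val, bucket_col_val, num_buckets):
    """Sketch: where a row lands under Hive-style partitioning + bucketing.

    The partition value picks the directory; a hash of the bucketing
    column picks one of `num_buckets` files. Illustrative only.
    """
    bucket = hash(bucket_col_val) % num_buckets
    return f"/warehouse/{table}/{partition_col}={partition_val}/bucket_{bucket:05d}"

print(hive_path("sales", "dt", "2024-01-01", "customer_42", 4))
```

Partition pruning works because a query filtering on `dt` only has to read the matching directories; bucketing additionally lets joins and sampling target a known subset of files.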

HBASE:-

  • Introduction to HBASE
  • Basic Configurations of HBASE
  • Fundamentals of HBase
  • What is NoSQL?
  • HBase DataModel
    • Table and Row
    • Column Family and Column Qualifier
    • Cell and its Versioning
  • Categories of NoSQL Databases
    • Key-Value Database
    • Document Database
    • Column Family Database
  • SQL vs NoSQL
  • How HBASE differs from an RDBMS
  • HDFS vs HBase
  • Client-side buffering and bulk uploads
  • HBase Designing Tables
  • HBase Operations
    • Get
    • Scan
    • Put
    • Delete

MongoDB:-

  • What is MongoDB?
  • Where to use it?
  • Configuration on Windows
  • Inserting data into MongoDB
  • Reading data from MongoDB

Cluster Setup:-

  • Downloading and installing Ubuntu 12.x
  • Installing Java
  • Installing Hadoop
  • Creating the Cluster
  • Increasing/Decreasing the Cluster Size
  • Monitoring the Cluster Health
  • Starting and Stopping the Nodes

OOZIE

  • Introduction to OOZIE
  • Use of OOZIE
  • Where to use?

SPARK

  • What is Spark?
  • Modes of Spark
  • Spark Installation Demo
  • Overview of Spark on a cluster
  • Spark Standalone Cluster
  • SCALA (Object Oriented and Functional Programming)
  • Scala Environment Set up
  • Functional Programming
  • Collections (very important for Spark)
  • Object-Oriented Programming
  • Integrations
  • Invoking Spark Shell
  • Creating the Spark Context
  • Loading a File in Shell
  • Performing Some Basic Operations on Files in Spark Shell
  • Spark SQL
  • Spark Streaming Overview
  • Spark MLlib
  • Example: Streaming Word Count
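Spark's core transformations from the list above (flatMap, map, reduceByKey) can be mimicked in plain Python to show the data flow of a word count. This is a conceptual sketch, not PySpark; in a real Spark shell the same pipeline would run on an RDD distributed across the cluster.

```python
from collections import defaultdict
from functools import reduce

def flat_map(func, data):
    # Like RDD.flatMap: one input record expands to many output records
    return [out for item in data for out in func(item)]

def reduce_by_key(func, pairs):
    # Like RDD.reduceByKey: combine all values that share a key
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}

lines = ["spark makes word count easy", "word count again"]
words = flat_map(lambda line: line.split(), lines)   # split lines into words
pairs = [(w, 1) for w in words]                      # map each word to (word, 1)
counts = reduce_by_key(lambda a, b: a + b, pairs)    # sum counts per word
print(counts)
```

The same three-step shape (flatMap → map → reduceByKey) is the canonical first Spark program, and the streaming variant applies it to micro-batches of incoming lines.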

 

Hadoop Ecosystem Overview

Oozie

HBase

Sqoop

Cassandra

ZooKeeper

Flume