I have been working in the IT industry for 13+ years, and for the past 5 years I have been conducting BigData Hadoop training and providing job support.
I have solid working knowledge of the technologies below.
Programming Languages: Hadoop with Java, Spark with Scala, C#.NET, Python
ETL Tools: DataStage, Ab Initio
Scripting: Unix shell/Perl
Databases: Oracle, Teradata, Netezza
Third-Party Schedulers: Control-M/TWS
Key Learnings:
- Provide training on the complete Hadoop ecosystem
- Explain real-time scenarios and discuss case studies
- Explain a real-time project end to end: the different stages of a project, and how data is imported, cleaned, and processed in Hadoop
- Share knowledge of the different environments: development, testing, and production
- Explain production-support project issues
- Explain the development project cycle: development of mappings, testing, and code deployment
- Prepare you for interviews and provide certification guidance
- Resume preparation
- BigData Hadoop installation support is included free
Introduction to BigData and Hadoop:-
- Big Data Introduction
- Hadoop Introduction
- What is Hadoop? Why Hadoop?
- Hadoop history
- Different components of Hadoop
- HDFS, MapReduce, PIG, Hive, SQOOP, HBASE, OOZIE, Flume, Zookeeper and so on…
- What is the scope of Hadoop?
Deep Dive into HDFS (for Storing the Data):-
- Introduction of HDFS
- HDFS Design
- HDFS role in Hadoop
- Features of HDFS
- Daemons of Hadoop and their functionality
- Name Node
- Secondary Name Node
- Job Tracker
- Data Node
- Task Tracker
- Anatomy of a File Write
- Anatomy of File Read
- Network Topology
- Nodes
- Racks
- Data Center
- Parallel Copying using DistCp
- Basic Configuration for HDFS
- Data Organization
- Blocks
- Replication
- Rack Awareness
- Heartbeat Signal
- How to Store Data in HDFS
- How to Read Data from HDFS
- Accessing HDFS (Introduction of Basic UNIX commands)
- CLI commands
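The CLI commands above can be sketched as follows; the paths and filenames are illustrative, and the DistCp line assumes two reachable clusters.

```shell
# List the contents of an HDFS directory
hadoop fs -ls /user/training

# Copy a local file into HDFS (storing data)
hadoop fs -put sales.csv /user/training/input/

# Read a file back from HDFS to the terminal
hadoop fs -cat /user/training/input/sales.csv

# Copy a file from HDFS back to the local filesystem
hadoop fs -get /user/training/input/sales.csv ./sales_copy.csv

# Check block and replication information for a file
hdfs fsck /user/training/input/sales.csv -files -blocks

# Parallel copy between clusters using DistCp
hadoop distcp hdfs://cluster1:8020/data hdfs://cluster2:8020/backup
```

These commands require a running Hadoop installation; on a fresh cluster, start by creating your user directory with `hadoop fs -mkdir -p /user/training`.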
Planning Your Hadoop Cluster
- Local Mode Cluster
- Single Node Cluster Configuration
- Multi-Node Cluster Configuration
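A minimal single-node (pseudo-distributed) configuration is sketched below; the hostname and port follow common defaults, and a single node means a replication factor of 1.

```xml
<!-- core-site.xml: point the default filesystem at the local NameNode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>

<!-- hdfs-site.xml: only one DataNode, so replicate each block once -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

A multi-node cluster changes `fs.defaultFS` to the NameNode's hostname and raises `dfs.replication` (typically to 3).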
Cluster Monitoring, Troubleshooting, and Optimizing
- General System conditions to Monitor
- Name Node and Job Tracker Web UIs
- View and Manage Hadoop’s Log files
- Common cluster issues and their resolutions
- Populating HDFS from External Sources
- How to use Sqoop to import data from RDBMSs to HDFS
- How to gather logs from multiple systems using Flume
- Features of Hive, HBase, and Pig
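The Sqoop import and Flume log collection described above can be sketched as follows; the database, table, agent name, and file names are all illustrative.

```shell
# Import a table from an RDBMS into HDFS
# (MySQL connection details here are assumptions)
sqoop import \
  --connect jdbc:mysql://dbhost:3306/retail \
  --username training \
  --password-file /user/training/.dbpass \
  --table orders \
  --target-dir /user/training/orders \
  --num-mappers 4

# Start a Flume agent to gather logs into HDFS
# (the agent name and config file are assumptions)
flume-ng agent --name a1 --conf-file log-to-hdfs.conf
```

Sqoop runs the import as a MapReduce job, so `--num-mappers` controls how many parallel connections hit the source database.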
MapReduce using Java (Processing the Data):-
- Introduction of MapReduce.
- MapReduce Architecture
- Data flow in MapReduce
- Splits
- Mapper
- Partitioning
- Sort and shuffle
- Combiner
- Reducer
- Understand Difference Between Block and InputSplit
- Role of RecordReader
- Basic Configuration of MapReduce
- MapReduce life cycle
- Driver Code
- Mapper
- Reducer
- How MapReduce Works
- Writing and Executing the Basic MapReduce Program using Java
- Submission & Initialization of MapReduce Job.
- File Input/output Formats in MapReduce Jobs
- Text Input Format
- Key Value Input Format
- Sequence File Input Format
- NLine Input Format
- Joins
- Map-side Joins
- Reduce-side Joins
- Word Count Example
- Partition MapReduce Program
- Side Data Distribution
- Distributed Cache (with Program)
- Counters (with Program)
- Types of Counters
- Task Counters
- Job Counters
- User Defined Counters
- Propagation of Counters
- Job Scheduling
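The Word Count example listed above is the canonical first MapReduce program: the mapper emits (word, 1) pairs, the reducer sums them, and the driver wires the job together. A minimal sketch using the org.apache.hadoop.mapreduce API (it must be compiled against the Hadoop client libraries and run with `hadoop jar`):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emit (word, 1) for every token in the input split
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts for each word (also reused as the combiner)
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver code: configure and submit the job
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note how the three parts of the MapReduce life cycle listed above (driver code, mapper, reducer) each appear as a separate piece of the class.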