HADOOP

Trainer's Profile

Professionally qualified, with more than 12 years of experience, including extensive Big Data experience leading diverse teams spread across different geographical locations.

Training Approach:

The training approach considers the following:

  • 1. In-depth explanation of each concept to lay a strong foundation.
  • 2. Application of each concept in a close-to-real-time environment, with examples of real-time use cases.
  • 3. Explanation of likely certification and interview questions.

HADOOP Course Contents

Big Data & Hadoop:

  • 1. What is Big Data
  • 2. Sources of Big Data
  • 3. IBM Definition for Big Data
  • 4. Definition of Hadoop
  • 5. History of Hadoop
  • 6. Features of Hadoop
  • 7. Hadoop Eco-System
  • 8. Other Hadoop-related Apache products

Hadoop Distributed File System:

  • 1. Distributed File System
  • 2. Definition of HDFS
  • 3. Where not to use HDFS
  • 4. HDFS Concepts
  • 5. Hadoop Architecture
  • 6. NameNode, DataNode & SNN
  • 7. HDFS Federation
  • 8. HDFS High Availability
  • 9. Hadoop IO Operations (Read & Write)
  • 10. HDFS Rack Awareness
  • 11. Hadoop Modes
  • 12. Hadoop Configuration
  • 13. Linux & Hadoop Commands

Java:

  • 1. OOP and Java
  • 2. Object Oriented Concepts
  • 3. Language Fundamentals
  • 4. Inheritance
  • 5. Polymorphism
  • 6. Interface
  • 7. Collections
  • 8. Exceptions
  • 9. Multithreading

MapReduce:

  • 1. What is MapReduce & Key Value Concepts
  • 2. Traditional Solution
  • 3. MapReduce Solution
  • 4. Input & Output of M/R
  • 5. MapReduce Phases
  • 6. Anatomy of MapReduce
  • 7. WordCount FlowChart
  • 8. Advantages of MapReduce
  • 9. Input Split in M/R
  • 10. Box Classes in Hadoop
  • 11. Execution of WordCount Program
  • 12. Combiner
  • 13. Partitioner
  • 14. MapReduce Joins
  • 15. Distributed Cache
  • 16. Counters
  • 17. MapReduce Formats (Input & Output)

YARN:

  • 1. Challenges in Hadoop 1.x
  • 2. Hadoop 2.x Features
  • 3. Apache YARN
  • 4. Hadoop 2.x Eco-system
  • 5. Hadoop 2.x High Availability
  • 6. Anatomy of YARN Application Run
  • 7. Run a MapReduce application on YARN

Hive:

  • 1. Applications of Hive
  • 2. Advantages & Disadvantages of Hive
  • 3. Hive Installation & Invoking
  • 4. Hive Metastore
  • 5. Hive Architecture
  • 6. Hive Concepts
  • 7. Hive Data Types
  • 8. Demonstration of Database Commands
  • 9. Hive Tables
  • 10. Demonstration of Create, Rename, Alter & Drop
  • 11. Partitions in Hive
  • 12. Bucketing in Hive
  • 13. Hive Joins
  • 14. Complex Data Types
  • 15. Demonstration of External Table
  • 16. SubQueries
  • 17. Views
  • 18. User Defined Functions (UDFs)

PIG:

  • 1. Need for PIG
  • 2. PIG versus MapReduce
  • 3. Where to use PIG
  • 4. Where NOT to use PIG
  • 5. What is PIG
  • 6. Applications of PIG
  • 7. PIG Installation
  • 8. Execution Types
  • 9. Running PIG programs
  • 10. PIG data types
  • 11. RDBMS Vs Pig
  • 12. Comments in Pig
  • 13. Case Sensitivity in Pig
  • 14. Logical and Physical Plan
  • 15. Pig Operators
  • 16. Pig Built in Functions
  • 17. Diagnostic Operators in PIG
  • 18. Special Joins in PIG
  • 19. Parameter Substitution in PIG
  • 20. PIG UDFs
  • 21. Pig Best Practices

HBase:

  • 1. What is HBase & NoSQL
  • 2. History
  • 3. Installation
  • 4. Invoke HBase
  • 5. HBase Vs RDBMS
  • 6. Uses of HBase
  • 7. Where Not to Use HBase
  • 8. HBase Write Path
  • 9. HBase Read Path
  • 10. HBase Terminology
  • 11. Row Vs Column Oriented DB
  • 12. HBase Architecture
  • 13. Data loading Techniques in HBase
  • 14. HBase Shell Commands
  • 15. Demonstration of HBase shell Commands

Sqoop:

  • 1. Introduction and Installation
  • 2. Sqoop Tools
  • 3. Sqoop Connectors
  • 4. Creating a DB and table in MySQL
  • 5. Loading the MySQL DB
  • 6. Sqoop Import Process
  • 7. Import Hive Data
  • 8. Sqoop Export Process

Flume:

  • 1. Introduction
  • 2. Applications of Flume
  • 3. Advantages of Flume
  • 4. Features of Flume
  • 5. Data Transfer in Hadoop
  • 6. Apache Flume - Architecture
  • 7. Components of Flume
  • 8. Installation of Flume
  • 9. Fetching Data using Flume

Spark:

  • 1. MapReduce Vs Spark
  • 2. Apache Spark - By Definition
  • 3. Features of Spark
  • 4. Spark Deployment
  • 5. Spark Core & Components
  • 6. Spark Context & Invoking
  • 7. Prerequisites for Spark
  • 8. Resilient Distributed Datasets (RDDs)
  • 9. RDD Operations
  • 10. RDD Persistence
  • 11. Lazy Evaluation & Lineage Graph
  • 12. Spark SQL
  • 13. Spark SQL Capabilities
  • 14. SchemaRDD, DataFrame & Datasets
  • 15. Linking with SparkSQL
  • 16. Initializing Spark SQL
  • 17. SQL Method
  • 18. Creating a DataFrame
  • 19. Transformations, Actions, Laziness
  • 20. Spark Streaming
  • 21. Input Sources and Batches of Input
  • 22. Discretized Streams
  • 23. Initializing StreamingContext
  • 24. Transformation on DStreams
  • 25. Output Operations on DStreams
  • 26. Machine Learning
  • 27. Spark MLlib
  • 28. Spark MLlib Packages
  • 29. MLlib Data Types
  • 30. Spark MLlib Algorithms
  • 31. Spark GraphX
  • 32. Graph - Getting Started
  • 33. Graph Operators
  • 34. Graph Builder
  • 35. Graph Algorithms

Scala:

  • 1. Why Scala
  • 2. Functional programming & First Scala Program
  • 3. Data Types
  • 4. Variables
  • 5. Classes
  • 6. Objects
  • 7. Access Modifiers
  • 8. Operators & Conditional Statements
  • 9. Functions
  • 10. Closures

About Instructor

KudVenkat

Software Architect, Trainer, Author and Speaker at Pragim Technologies.
