Course description


Big Data refers to collections of data sets so large that they cannot be processed using traditional computing techniques. Big Data is not merely data; it has become a complete subject in its own right, involving various tools, techniques, and frameworks.

Hadoop is an open-source framework that allows storing and processing big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

Unique Selling Point (USP)


Most data today is stored on local networks, with servers that may be clustered and share storage. This approach has had time to develop into a stable architecture and provides decent redundancy when deployed correctly. Big Data adds the variety characteristic to this picture. Considering volume, velocity, and variety, analytic techniques have also evolved to accommodate these characteristics and to scale up to the complex and sophisticated analytics needed. Some practitioners and researchers have introduced a fourth characteristic, veracity, whose implication is data assurance: both the data and the analytics and outcomes must be error-free and credible.

What is BigData & Hadoop?

Big Data is a unique approach that helps you act on data for real business gain: the point is not what a tool can do, but what you can do with the output from the tool. Big data, as defined by Wikipedia, is a collection of data sets so large and complex that it becomes difficult to process them using on-hand database management tools or traditional data processing applications.
Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware. It is an open-source data management framework with scale-out storage and distributed processing.
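
As a rough illustration of what scale-out storage looks like from a developer's point of view, the sketch below uses the standard HDFS Java client (org.apache.hadoop.fs.FileSystem) to create a directory, upload a local file, and list the result. The class name, the paths, and the assumption of a locally running pseudo-distributed cluster are illustrative only, not part of the course material.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsQuickTour {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath,
        // e.g. hdfs://localhost:9000 for a pseudo-distributed setup (assumed here).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path dir = new Path("/user/training/input");   // example path only
        fs.mkdirs(dir);                                // create a directory in HDFS

        // Copy a local file into the distributed file system.
        fs.copyFromLocalFile(new Path("data/sample.txt"), new Path(dir, "sample.txt"));

        // List what is stored under the directory.
        for (FileStatus status : fs.listStatus(dir)) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }

        fs.close();
    }
}
```

The same operations are available from the HDFS shell, which the HDFS Operations and Command Reference modules of the outline below cover in detail.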

BigData & Hadoop Training Course Prerequisites?

1. Basic Unix commands
2. Core Java (OOP concepts, collections, exceptions) for MapReduce programming
3. SQL query knowledge for Hive queries

BigData & Hadoop Training Course Duration?

The course duration is 3 months.

What is a Job in Hadoop? What can be accomplished via a Job?

In Hadoop, a Job is a MapReduce program used to process or analyze data. The term MapReduce actually refers to two separate and distinct tasks that Hadoop programs perform: the map task, which takes a set of data and converts it into another set of data where individual elements are broken down into key/value pairs, and the reduce task, which takes the output of a map as input and combines those pairs into a smaller set of results.
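
To make this concrete, the classic word-count example below (a minimal sketch, not taken from the course material; class and path names are illustrative) shows how a map task and a reduce task are wired together into a Job using the standard org.apache.hadoop.mapreduce API.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map task: emit (word, 1) for every word in the input split.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce task: sum the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count"); // the Job ties mapper, reducer, and I/O together
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);     // optional local aggregation before the shuffle
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A job like this is typically packaged into a JAR and submitted with hadoop jar wordcount.jar WordCount <input> <output>, where the input and output are HDFS paths. Writing, compiling, and running such programs is covered in the MapReduce module of the outline below.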

1. BigData Overview
  • What is BigData
  • What comes under BigData
  • Benefits of BigData
  • BigData Technologies
  • Operational vs. Analytical Systems
  • BigData Challenges

2. BigData Solutions
  • Traditional Enterprise Approach
  • Google’s Solution
  • Hadoop

3. Introduction to Hadoop
  • Hadoop Architecture
  • MapReduce
  • Hadoop Distributed File System
  • How Does Hadoop Work?
  • Advantages of Hadoop

4. Environment Setup
  • Pre-installation Setup
  • Installing Java
  • Downloading Hadoop
  • Hadoop Operation Modes
  • Installing Hadoop in Standalone Mode
  • Installing Hadoop in Pseudo-Distributed Mode
  • Verifying Hadoop Installation

5. HDFS Overview
  • Features of HDFS
  • HDFS Architecture
  • Goals of HDFS

6. HDFS Operations
  • Starting HDFS
  • Listing Files in HDFS
  • Inserting Data into HDFS
  • Retrieving Data from HDFS
  • Shutting Down the HDFS

7. Command Reference
  • HDFS Command Reference

8. MapReduce
  • What is MapReduce?
  • The Algorithm
  • Inputs and Outputs (Java Perspective)
  • Terminology
  • Example Scenario
  • Compilation and Execution of the ProcessUnits Program
  • Important Commands
  • How to Interact with MapReduce Jobs

9. Streaming
  • Example using Python
  • How Streaming Works
  • Important Commands

10. Multi-Node Cluster
  • Installing Java
  • Creating User Account
  • Mapping the nodes
  • Configuring Key Based Login
  • Installing Hadoop
  • Configuring Hadoop
  • Installing Hadoop on Slave Servers
  • Configuring Hadoop on Master Server
  • Starting Hadoop Services
  • Adding a New DataNode in the Hadoop Cluster
  • Adding a User and SSH Access
  • Set Hostname of New Node
  • Start the DataNode on New Node
  • Removing a DataNode from the Hadoop Cluster