Big Data is, quite literally, very large data: a collection of data sets so large that they cannot be processed using traditional computing techniques. Big Data is not merely data; it has grown into a complete subject in its own right, involving a variety of tools, techniques, and frameworks.
Hadoop is an open-source framework that allows big data to be stored and processed in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
Unique Selling Point (USP)
Traditionally, most data is stored on local networks, with servers that may be clustered and share storage. This approach has had time to develop into a stable architecture and provides decent redundancy when deployed correctly. Big Data, by contrast, is commonly characterized by volume, velocity, and variety, and analytic techniques have evolved to accommodate these characteristics and scale up to the complex, sophisticated analytics needed. Some practitioners and researchers add a fourth characteristic, veracity. Its implication is data assurance: both the data and the analytics and outcomes must be error-free and credible.
Big Data is a unique approach to acting on data for real business gain: the value lies not in what a tool can do, but in what you can do with the tool's output. Wikipedia defines big data as a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications.
Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. It is an open-source data management framework with scale-out storage and distributed processing.
1. Basic Unix commands
2. Core Java (OOP concepts, collections, exceptions) for MapReduce programming
3. SQL query knowledge for Hive queries
Course duration: 3 months
In Hadoop, a job is a MapReduce program that processes and analyzes data. The term MapReduce actually refers to two separate and distinct tasks that Hadoop programs perform: the map task, which converts input data into intermediate key/value pairs, and the reduce task, which combines those pairs into a smaller, summarized result.
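To make the two tasks concrete, here is a minimal plain-Java sketch of word counting, the classic MapReduce example. It is an illustration only, not real Hadoop code: the `WordCountSketch` class, its `map` and `reduce` methods, and the sample input are all assumptions introduced here, and the shuffle step that Hadoop performs between the two tasks is simulated with a simple grouping collector.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch of the two MapReduce phases without Hadoop:
// the map step emits (word, 1) pairs, the grouping simulates the
// shuffle, and the reduce step sums the counts for each word.
public class WordCountSketch {

    // Map task: split one input line into (word, 1) pairs.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1));
    }

    // Shuffle + reduce task: group the pairs by word and sum the counts.
    static Map<String, Integer> reduce(Stream<Map.Entry<String, Integer>> pairs) {
        return pairs.collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("big data big hadoop", "hadoop big");
        Map<String, Integer> counts =
                reduce(lines.stream().flatMap(WordCountSketch::map));
        System.out.println(counts); // e.g. {big=3, data=1, hadoop=2}
    }
}
```

In real Hadoop code these two methods would live in `Mapper` and `Reducer` subclasses, and the framework, rather than a stream pipeline, would distribute the map and reduce tasks across the cluster.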