Friday, January 27, 2017

Introduction to Hadoop

Hadoop is an open source,java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.

Hadoop is designed to be robust,in that your Big Data applications will continue to run even when individual servers or clusters fail. It is also designed to scale up from a single server to thousands of machines,with a very high degree of fault tolerance.

Hadoop an efficient distributed file system and not a database.It is designed specially for information that comes in many forms,such as server log files or personal productivity documents. Anything that can be stored as a file can be placed in a Hadoop repository.

Big Data means really a big data, it is a collection of large datasets that can not be processed using traditional computing techniques. Big data is not merely subject,which involves various tools,techniques and frameworks.

Hadoop can provide accurate analysis,which may lead to more concentrate decision-making resulting in greater operational efficiencies,cost reductions and reduced risks for the business.

Recommended to Read:  Prerequisites for Learning Hadoop

History of Hadoop:

Hadoop was inspired by Google's MapReduce,a software framework in which an application is broken down into numerous small parts. Any of these blocks or parts can be run on any node in the cluster.



What type of problems we can achieve using Hadoop?

The Hadoop platform was designed to solve problems where you have a lot of data. Perhaps a mixture of complex and structured data and it does not fit nicely into tables. It is for situations where you want to run analytics that are deep and computationally extensive,like clustering and targeting. That is exactly what Google was doing when it was indexing the web and examining user behavior to improve performance algorithm.

Hadoop applies to a bunch of markets. In finance,if you want to do accurate portfolio evaluation and risk analysis,you can build sophisticated models that are hard to jam into a database engine. But Hadoop can handle it. In online retail,if you want to deliver better search answers to your customers so they are more likely to buy thing you show them,that sort of problem is well addressed by the platform Google built.

Where exactly Hadoop is used in Real time:


  1.  Search   -  Yahoo,Amazing,Z vents
  2.  Log Processing -  Facebook, Yahoo
  3.  Data Warehouse - Facebook, AOL
  4. Video and image Analysis - New York times, Eyealike
Advantage of Hadoop:

It's scalable:

New nodes can be added as needed and added without needing to change data formats,how data is loaded,how jobs are written,or the applications on top.

It's cost effective:

Hadoop brings massively parllel computing to commodity servers. The result is a sizeable decreases in the cost per terabyte of storage,which in turn makes it affordable to model all your data.

It's Flexible:

Hadoop is schema less,and can absorb any type of data,structured or not,from any number of resources. Data from multiple sources can be joined and aggregated in arbitrary ways enabling deeper analysis than any one system can provide.

It's Fault tolerant:

When you lose a node,the system redirects work to another processing without missing a beat.


No comments:

Post a Comment

High Paying Jobs after Learning Python

Everyone knows Python is one of the most demand Programming Language. It is a computer programming language to build web applications and sc...