Sunday, April 23, 2017

Why we need Big Data

Big Data is the practice of storing and analyzing data so that it makes sense for the organization. In simple terms, Big Data is data that is very large in size and still growing exponentially over time. The volume of publicly available data also increases every year. Organizations no longer have to manage only their own data; their future success will depend to a large extent on their ability to extract value from other organizations’ data.

For an application that holds a limited amount of data, we normally use a SQL database such as Oracle or MySQL. But what about large applications like Facebook, Google, and YouTube? Their data is so large and complex that traditional data management systems struggle to store and process it.

Facebook generates 500+ TB of data per day as people upload images, videos, posts, and so on. Similarly, sending text and multimedia messages, updating Facebook or WhatsApp statuses, and posting comments all generate huge amounts of data. Handling this with traditional data processing applications (SQL databases such as Oracle or MySQL) leads to a loss of efficiency. To keep up with the exponential growth of data, analysis at this scale becomes a necessity, and Big Data technologies are used to overcome the problem. Big Data includes structured, semi-structured, and unstructured data.

Structured data is data that can be stored and processed in a table format. It is very simple to enter, store, and analyze. Example: data in an RDBMS.

Unstructured data is data with no known form or structure. Examples: text files, images, videos, web pages, PDF files, PPT files, social media data, etc.

Semi-structured data is a combination of structured and unstructured data. Example: XML data.
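To make the three categories concrete, here is a minimal Python sketch. The sample records and the helper `xml_to_row` are made up for illustration; they only show how a semi-structured XML record can be flattened into a table-like row, while free text has no fields to extract.

```python
import xml.etree.ElementTree as ET

# Structured: fixed columns, like one row of an RDBMS table (hypothetical sample).
structured_row = {"id": "1", "name": "Asha", "hobby": "music"}

# Semi-structured: XML has tags, but each record may carry different fields.
xml_text = "<user id='2'><name>Ravi</name><hobby>cricket</hobby></user>"
semi_structured = ET.fromstring(xml_text)

# Unstructured: free text with no predefined fields.
unstructured = "Had a great time at the match today!"


def xml_to_row(element):
    """Flatten one XML element into a dict so it can sit beside table rows."""
    row = dict(element.attrib)      # attributes such as id='2'
    for child in element:           # child tags such as <name>, <hobby>
        row[child.tag] = child.text
    return row


print(xml_to_row(semi_structured))
# {'id': '2', 'name': 'Ravi', 'hobby': 'cricket'}
```

The structured row can go straight into a table, the XML record needs a small parsing step first, and the free text would need further processing before it fits any column.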


Traditional management systems and existing tools have difficulty processing data this big. R is one of the main computing tools used in statistical education and research. It is also widely used for data analytics and numerical computing in scientific research.

Big Data of this type comes from social media, e-commerce, share markets, airplanes, and other sources.

1. Social Media: Data coming from social media services, such as Facebook likes, photo and video uploads, comments, tweets, and YouTube views. Facebook hosts more than 240 billion photos, growing at 7 petabytes per month.
2. Transport Data: Includes the model, capacity, distance, and availability of a vehicle.
3. Share Market: A stock exchange generates a huge amount of data through its daily transactions. The New York Stock Exchange generates about 4 to 5 terabytes of data per day.
4. E-Commerce Sites: E-commerce sites like Flipkart, Amazon, and Snapdeal generate a huge amount of data.
5. Search Engine Data: Search engines retrieve lots of data from different databases.
6. Airplane: A single airplane can generate 10+ TB of data in 30 minutes of flight time.
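
The volumes quoted in this post are easier to compare on a common scale. The sketch below converts two of them (500 TB per day for Facebook, 10 TB per 30 minutes for an airplane) to gigabytes per second. It assumes decimal units (1 TB = 10^12 bytes) and treats "500+" and "10+" as exactly 500 and 10, so the results are rough lower bounds.

```python
TB = 10**12  # assumption: decimal terabyte; the post does not state a convention
GB = 10**9


def bytes_per_second(total_bytes, seconds):
    """Average data rate over the given period."""
    return total_bytes / seconds


# Figures taken from the post, read as lower bounds.
facebook = bytes_per_second(500 * TB, 24 * 60 * 60)  # 500 TB per day
airplane = bytes_per_second(10 * TB, 30 * 60)        # 10 TB per 30 minutes

print(f"Facebook: {facebook / GB:.2f} GB/s")  # Facebook: 5.79 GB/s
print(f"Airplane: {airplane / GB:.2f} GB/s")  # Airplane: 5.56 GB/s
```

On these assumptions both sources produce several gigabytes every second.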

You may ask why we need to store such a huge amount of data. The main reason is analysis. Data analysis is the process of cleaning, transforming, and remodeling data in order to reach a conclusion for a given situation. More accurate analysis leads to better decision making, which in turn leads to increased efficiency and reduced risk.
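
The clean, transform, and conclude steps can be sketched in a few lines of Python. The order records and the functions `clean` and `average_by_city` below are hypothetical and stand in for a much larger dataset and pipeline.

```python
from statistics import mean

# Hypothetical raw order records; an empty string stands for a missing amount.
raw_orders = [
    {"city": " Pune ", "amount": "1200"},
    {"city": "pune", "amount": "800"},
    {"city": "Delhi", "amount": ""},
    {"city": "delhi", "amount": "1500"},
]


def clean(records):
    """Drop rows with a missing amount and normalise city names."""
    cleaned = []
    for record in records:
        if not record["amount"]:
            continue  # skip incomplete rows
        cleaned.append({
            "city": record["city"].strip().title(),
            "amount": float(record["amount"]),
        })
    return cleaned


def average_by_city(records):
    """Transform cleaned rows into an average order amount per city."""
    amounts = {}
    for record in records:
        amounts.setdefault(record["city"], []).append(record["amount"])
    return {city: mean(values) for city, values in amounts.items()}


summary = average_by_city(clean(raw_orders))
print(summary)  # {'Pune': 1000.0, 'Delhi': 1500.0}
```

The final dictionary is the kind of summary a decision can be based on, for example which city has the higher average order value.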

Example: When we search on e-commerce websites (Flipkart, Amazon), we get recommendations related to the product we searched for. These websites analyze the data we entered and display related products accordingly.

Example: When we search for a smartphone, we get recommendations to buy back covers, screen guards, etc.
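
One simple way to produce such recommendations is to count which items were bought together in past orders. This is only an illustrative sketch with made-up orders and a hypothetical `recommend` function; it is not how Flipkart or Amazon actually build their recommendations.

```python
from collections import Counter

# Hypothetical past orders; each set holds the items bought together.
orders = [
    {"smartphone", "back cover", "screen guard"},
    {"smartphone", "back cover"},
    {"smartphone", "screen guard"},
    {"smartphone", "earphones"},
    {"laptop", "mouse"},
]


def recommend(item, orders, top_n=2):
    """Return the items most often bought together with `item`."""
    counts = Counter()
    for order in orders:
        if item in order:
            counts.update(order - {item})  # count every companion item
    # Sort by count (highest first), then by name so that ties are stable.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:top_n]]


print(recommend("smartphone", orders))  # ['back cover', 'screen guard']
```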

Similarly, why does Facebook store our images and videos? The reason is advertising.
There are two types of marketing:
a) Global Marketing: Show an advertisement to all users.
b) Target Marketing: Show an advertisement to particular groups or people. In target marketing, Facebook analyzes its data and shows the advertisement to selected people.

Example: Suppose an advertiser wants to advertise a cricket kit and show the advertisement only to people who are interested. Facebook keeps a record of all the people who are members of cricket groups or who post anything related to cricket, and displays the advertisement to them.
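
The selection step in this example can be sketched as a simple filter. The user records and the `target_audience` function are hypothetical; a real system would work on far more signals and data, but the idea of choosing users by group membership or post content is the same.

```python
# Hypothetical user records with group memberships and recent posts.
users = [
    {"name": "Asha", "groups": ["Cricket Fans"], "posts": []},
    {"name": "Ravi", "groups": ["Cooking"], "posts": ["Watching cricket tonight"]},
    {"name": "Meera", "groups": ["Photography"], "posts": ["New camera day"]},
]


def target_audience(users, keyword):
    """Return names of users whose groups or posts mention the keyword."""
    keyword = keyword.lower()
    selected = []
    for user in users:
        in_group = any(keyword in group.lower() for group in user["groups"])
        in_post = any(keyword in post.lower() for post in user["posts"])
        if in_group or in_post:
            selected.append(user["name"])
    return selected


print(target_audience(users, "cricket"))  # ['Asha', 'Ravi']
```

Global marketing would skip the filter and show the advertisement to every user in the list.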

Watch Video: What is Big Data?

ALSO READ:

Introduction to Hadoop
What are the prerequisites to learn Hadoop
Basic Hadoop Interview Questions

