Jenny (Xiao) Zhang

Big Data – Big Messy Data

Big Data is a very hot topic. Everybody talks about it.

As with all those media tech buzzwords, I hear "Big Data" a lot, but I never really understood what exactly it is.

After reading through some good articles, I have summarized some key points that I feel will be helpful for understanding the basics of Big Data.

Big Data is about four Vs:

Volume: a large amount of data, at least at the terabyte (TB) level. That is why Big Data is normally associated with cloud computing: processing that much data needs the power of the cloud.

Velocity: the data is generated in real time and processed in real time.

Variety: the data comes in different types and is mostly unstructured (e.g., photos, videos).

Veracity: the data is messy. The quality is low and the accuracy is hard to control.

The picture below from IBM is a very good illustration of the 4 Vs.

[Figure: the four Vs of Big Data (IBM)]

Big Data technology has gone through three generations:

Batch processing: represented by MapReduce

Real-time processing: represented by Storm

Hybrid: represented by MillWheel
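To make the difference between the first two generations a bit more concrete, here is a minimal sketch in plain Python. It does not use MapReduce or Storm themselves; the function and class names (`map_phase`, `reduce_phase`, `batch_word_count`, `StreamingWordCount`) are made up for illustration. The batch version waits until all records are collected and then counts words in one pass, while the streaming version updates its counts as each record arrives.

```python
from collections import Counter


def map_phase(record):
    # "Map" step: emit a (word, 1) pair for every word in one record.
    return [(word.lower(), 1) for word in record.split()]


def reduce_phase(pairs):
    # "Reduce" step: sum the counts for each distinct word.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts


def batch_word_count(records):
    # Batch style: the whole data set must be available before we start.
    pairs = [pair for record in records for pair in map_phase(record)]
    return reduce_phase(pairs)


class StreamingWordCount:
    # Streaming style (hypothetical helper): keep running counts and
    # update them every time a new record shows up.
    def __init__(self):
        self.counts = Counter()

    def on_record(self, record):
        for word, n in map_phase(record):
            self.counts[word] += n
        return self.counts


records = ["big data is big", "data is messy"]

# Batch: one job over all records.
batch_result = batch_word_count(records)

# Streaming: the same records, handled one at a time as they "arrive".
stream = StreamingWordCount()
for record in records:
    stream.on_record(record)
```

Both approaches end with the same counts here. The difference is when the answer is available: the batch job only has a result at the end, while the streaming counter has an up-to-date result after every record.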

The picture below shows the history of Big Data processing technologies.

[Figure: history of Big Data processing technologies]

Does Big Data really have value?

In a LinkedIn post, Bernard Marr mentioned another V of Big Data: Value.

A golden example of how Big Data brings value is Target. Target used Big Data to predict whether a customer was pregnant, based on the lotion and certain types of vitamins she bought. Target then sent baby-item coupons to these customers, including a girl who had not told her family that she was pregnant. Another example is Wall Street using Big Data from Twitter to predict investors’ emotions and make trading decisions.

I believe there is always some value in data, no matter whether it is big or small. But the two important questions are: 1) Can you justify the value in a cost-benefit analysis? 2) Is your organization agile enough to take advantage of the value?

Hope this is helpful. To learn more, check out the Big Data Guru. Feel free to share your thoughts!