Thursday, September 25, 2014

Five “V”s of Big Data

The value of Big Data starts with asking the BIGGER QUESTIONS.

There are still many big puzzles in Big Data. Big Data is about managing the five V's of data: Volume, Velocity, Variety, Veracity, and Value. What are these five V's all about, and how can they be handled effectively?

Volume: There is no strict line that defines how much data constitutes Big Data. The volume attribute of Big Data can be summed up as "the ability to store Big Data": not so much the physical amount, but whether you have the ability to hold that data for use.
Variety: How many different data sources are needed to constitute a wide variety of data? The variety aspect of Big Data is "the ability to acquire a wide variety of data". Put simply: instead of using a single source for a particular type of measurement, businesses use data from many different and unrelated sources to produce the measurement and allow the sources to cross-corroborate each other.

Velocity: The third V of Big Data is "the ability to process at the required velocity". Can we take a transaction, process it, and run algorithms on it at the required pace? There are two aspects to this: (1) the ability of the platform to capture the raw data as it happens, and (2) the agility to aggregate, analyze, and report on it in near real time. The platform should be flexible enough to incorporate new data models on the fly, and the business owners should be empowered to tweak the models and algorithms as market needs and demand change.
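To make the "capture, then aggregate in near real time" idea concrete, here is a minimal Python sketch of a sliding-window total. It is only an illustration: the class name SlidingWindowTotal, the 60-second window, and the assumption that events arrive in timestamp order are all hypothetical choices, not something a particular Big Data platform prescribes.

```python
from collections import deque
from typing import Deque, Tuple


class SlidingWindowTotal:
    """Running total of event amounts seen in the last `window_seconds`.

    Assumes timestamps arrive in non-decreasing order (a simplification;
    real streams often need handling for late or out-of-order events).
    """

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, float]] = deque()
        self._total = 0.0

    def add(self, timestamp: float, amount: float) -> None:
        # Capture the raw event as it happens.
        self._events.append((timestamp, amount))
        self._total += amount
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, old_amount = self._events.popleft()
            self._total -= old_amount

    def total(self, now: float) -> float:
        # Report the aggregate for the current window.
        self._evict(now)
        return self._total


# Hypothetical usage: three transactions, 60-second window.
window = SlidingWindowTotal(window_seconds=60)
window.add(timestamp=0, amount=10.0)
window.add(timestamp=30, amount=5.0)
window.add(timestamp=70, amount=1.0)  # the event at t=0 has now expired
print(window.total(now=70))
```

Each event is added and removed at most once, so the aggregate stays cheap to maintain as the data arrives instead of being recomputed from the full history.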

Veracity: Veracity is the V that focuses on making the data useful and trustworthy. Veracity is the linchpin of all these aspects. How do you ensure accuracy in unstructured data? What kind of accuracy are you trying to ensure? And how is veracity assured in cases where a human can never look at the data before it is acted upon? Instead of trusting a single source for ground truth, you let several different systems "vote" on the ground truth. Veracity is achieved by putting as many different data sources into the data model as possible. In addition, veracity has little to do with curation and auditing, even though many people treat them as equivalent. Veracity is about minimizing the risk of adverse outcomes due to the inevitable errors, omissions, and noise that are part of every sufficiently large data set. So the business should drive the decisions to invest in building data processes that improve veracity.
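The "voting" idea can be sketched in a few lines of Python. This is one possible reading, assuming a simple majority rule; the function name vote_on_value, the min_agreement threshold, and the sample source names are hypothetical and only there for illustration.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote_on_value(
    readings: Sequence[Hashable], min_agreement: float = 0.5
) -> Optional[Hashable]:
    """Return the value reported by more than `min_agreement` of the sources.

    Returns None when no value reaches that share, so the caller can
    treat the record as unverified instead of acting on it.
    """
    if not readings:
        return None
    value, count = Counter(readings).most_common(1)[0]
    return value if count / len(readings) > min_agreement else None


# Hypothetical example: three unrelated systems report a customer's city.
reports = {"crm": "Berlin", "billing": "Berlin", "web_form": "Munich"}
consensus = vote_on_value(list(reports.values()))
print(consensus)  # Berlin (2 of 3 sources agree)
```

Returning None when the sources disagree is a deliberate choice here: it keeps noisy records from being acted upon automatically, which is the risk veracity is meant to reduce.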

Value: (Volume + Variety + Velocity + Veracity) * Visualization = Value. Value is still a very fuzzy topic for most people, and the value proposition is difficult to show. Trying to give an answer to a question that hasn't been asked yet is a tough sell to any organization. As inputs, the first three V's (Volume, Variety, and Velocity) still apply in defining any collection process; veracity should be applied post-exploration against the variables that have been "discovered" as being business relevant; and the value to be derived from a Big Data implementation then depends on what questions you can ask of the data: "Ask Bigger Questions". So, unless the right and bigger questions are asked of the data, expecting the answers to provide value is very difficult. The business needs to marry the capability of Big Data technology with domain expertise and business insight to carve value out of such an implementation. Visualization is a bonus V that helps clarify the value of Big Data.

With Big Data, we run the risk of focusing too much on technology and too little on the more arduous parts, such as the organizational aspects. Unless we know which business objective or decision we are churning the data for, we will just end up spending millions without any result. Hence, thought leadership and close collaboration are the keys to Big Data success.



