Friday, June 12, 2015

Three Aspects of Big Data

Big Data is like raw diamonds: it becomes useful only once it has been treated and processed, the way rough stones are turned into industrial diamonds.

"Big Data" refers to the ability to store large volumes of data quickly, in their raw form, on cheap commodity hardware using a distributed file system specially tailored to that task. But more specifically, what are the characteristics of Big Data, and what further aspects matter in deploying it and getting true business VALUE from it?

Five “Vs” of Big Data: Big Data is the storage of very large, unstructured datasets; analyzing them to support fact-based decision making is the domain of data science. The storage could be on-premises, a DB appliance, or Cloud storage, and in most cases it uses NoSQL data structures, so storage isn't necessarily a "low-cost, distributed file system." Big Data is about five “Vs”: (1) VOLUME: the defining trait is the amount of data, not the infrastructure. (2) VARIETY: the main reason RDBMS data structures haven't been successful in the Big Data space is their limited capability to store data that are not alphanumeric, such as images, video, Web content, social media, and other unstructured data and relationships. (3) VELOCITY: can you take a transaction, process it, and run algorithms on it at the required pace? RDBMS carry a large admin overhead, because DBAs have to model what they want to store, and how they want it stored, before any data are loaded, so there is always a long lead time before any analysis or ROI can be achieved. An unstructured database doesn't incur the same overhead, but it requires a different set of analysis tools to pull business value out of the data. (4) VERACITY: making the data useful and trustworthy. Veracity is the linchpin of all these aspects. (5) VALUE: it’s all about capturing insight and foresight via data analytics and visualization. In short, Big Data can be described in terms of the VOLUME, VARIETY, and VELOCITY of the data, plus the VISUALIZATION that puts the data to business use. Seen this way, Big Data can be genuinely helpful rather than just a marketing and business term.
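To make the point about modeling overhead concrete, here is a minimal Python sketch of the "store raw, apply structure when reading" idea. The records, field names, and the `total_purchases` helper are all hypothetical and only illustrate the contrast; this is not how any particular NoSQL product works.

```python
import json

# Hypothetical raw events with differing shapes, stored as-is.
# No schema was designed before these lines were written.
raw_lines = [
    '{"type": "purchase", "user": "u1", "amount": 19.99}',
    '{"type": "photo", "user": "u2", "url": "http://example.com/a.jpg"}',
    '{"type": "purchase", "user": "u2", "amount": 5.00, "coupon": "SPRING"}',
]

def total_purchases(lines):
    """Apply structure at read time: pick only the fields this question needs."""
    total = 0.0
    for line in lines:
        record = json.loads(line)
        # Records of other types, or with extra fields, are simply tolerated.
        if record.get("type") == "purchase":
            total += record.get("amount", 0.0)
    return total

print(round(total_purchases(raw_lines), 2))
```

The trade-off described above shows up directly: nothing had to be modeled before loading, but every question asked of the data needs its own read-time logic.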

Only “treatment” makes Big Data valuable: “Big data” refers to the large amounts of information that have become accessible. Don't let the word “data” confuse you: Big Data is useful only if its information content is evaluated for accuracy, relevance, and timeliness. What used to be called “knowledge-based enterprises” are designed to transform unevaluated raw data into knowledge, that is, information whose accuracy and authenticity have been verified. It is still a matter of manipulating information to make it usable; how it is then used is a different issue. Raw diamonds become useful only when they are treated and processed into industrial diamonds. Only treatment makes them valuable and able to serve a purpose. The human role in Big Data:
-People are the mine: This raw material, these raw data, embody our experiences, behavior, and customs, and include our wishes and dreams. They are valuable from the beginning. We deliver these data preprocessed. Without us, these data wouldn’t exist.
-People are the audience as well: The processing of raw data into useful data is aimed at science and research, with the anticipated outcome of improving human lives, and at getting customers to buy new products and services, some of which are not even developed yet.
-People are both subjects and objects. But first, people are the owners of their individual data.
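The "treatment" step, evaluating raw data for accuracy, relevance, and timeliness, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `treat` function, the record fields (`topic`, `amount`, `day`), and the thresholds are hypothetical, and real checks would depend on the business question.

```python
from datetime import date

def treat(records, today, relevant_topics, max_age_days=30):
    """Keep only records that pass simple accuracy, relevance, and timeliness checks."""
    kept = []
    for record in records:
        amount = record.get("amount")
        # Accuracy (assumed rule): the amount must be a non-negative number.
        accurate = isinstance(amount, (int, float)) and amount >= 0
        # Relevance (assumed rule): the topic must be one we care about.
        relevant = record.get("topic") in relevant_topics
        # Timeliness (assumed rule): not in the future and not older than max_age_days.
        timely = 0 <= (today - record["day"]).days <= max_age_days
        if accurate and relevant and timely:
            kept.append(record)
    return kept

raw = [
    {"id": 1, "topic": "sales", "amount": 120.0, "day": date(2015, 6, 10)},
    {"id": 2, "topic": "sales", "amount": -5.0, "day": date(2015, 6, 11)},   # fails accuracy
    {"id": 3, "topic": "weather", "amount": 3.0, "day": date(2015, 6, 11)},  # fails relevance
    {"id": 4, "topic": "sales", "amount": 80.0, "day": date(2014, 1, 1)},    # fails timeliness
]
usable = treat(raw, today=date(2015, 6, 12), relevant_topics={"sales"})
print([r["id"] for r in usable])
```

Only the first record survives all three checks, which is the "industrial diamond" left after the raw material has been processed.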

Another perspective on Big Data & Advanced Analytics is the technology evolution: The classical DW and BI tool sets are very good at reporting on events that occurred in the past. Data are loaded at some frequency, but they are always historical data. For business executives, the reports, dashboards, and analyses coming out of a DW have always been like "driving fast with the only view of the road through the rear-view mirror." The platform is deterministic and not very conducive to predicting future events. The emerging NoSQL databases aim to solve the data processing problems caused by Big Data. They are distributed: data are spread across many machines, which is known as data distribution. Some RDBMS also offer it, but when it comes to Big Data (volume, variety, velocity), relational databases are not very adequate, not only in terms of read speed (because of their relational nature and all the joins), but also in terms of scalability and fault tolerance. Once you've got your data into such a system, you then need to do something with it. The more data you have about a certain activity, the better you're able to model it. The drawback is that more data can end up as a "can't see the wood for the trees" problem: it's difficult to draw correct insights when too many variables are involved and the data are noisy.

Big Data and Advanced Analytics give executives a view into the future: some insight into what's likely to occur, backed up with metrics, so the platform is probabilistic. Where business executives struggle is in framing the right questions so that the probabilities are as accurate as possible.
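As a rough illustration of what "probabilistic, backed up with metrics" can mean, the Python sketch below turns historical yes/no outcomes into an estimated probability with an approximate 95% interval. The `estimate_rate` function and the renewal data are hypothetical, and the interval uses the simple normal approximation, which is only one of several possible choices.

```python
import math

def estimate_rate(outcomes):
    """Estimate the probability of an event from past 0/1 outcomes.

    Returns the observed rate and an approximate 95% interval
    (normal approximation; reasonable only for fairly large samples).
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one observation")
    p = sum(outcomes) / n
    standard_error = math.sqrt(p * (1 - p) / n)
    low = max(0.0, p - 1.96 * standard_error)
    high = min(1.0, p + 1.96 * standard_error)
    return p, (low, high)

# Hypothetical history: 30 of 100 past customers renewed their contract.
history = [1] * 30 + [0] * 70
rate, (low, high) = estimate_rate(history)
print(f"estimated renewal probability: {rate:.2f} (roughly {low:.2f} to {high:.2f})")
```

The question being framed matters as much as the arithmetic: the same history answers "how likely is a renewal?" but says nothing about a question it was never collected for.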

