Monday, March 23, 2015

Big Data or Wide Data: Which Is More Critical to Getting Results?

The shape of the data one needs depends heavily on the problem domain.

For most organizations, Big Data is still a big puzzle: although there is an abundance of data, how should they collect it, store it, clean it up, and analyze it effectively? Should data be big, wide, or deep to get accurate, or nearly accurate, results?



It is not about data size, but about the breadth, connection, and quality of data: Rather than thinking about the physical size of data, consider the importance of being able to relate different sets of data. More often, the complexity lies not in the need to store more data but in how best to link, arbitrate, and cleanse all of the data required to support regulatory needs. So it becomes less about how big the data is and more about its breadth and reach across the organization.


Neither big nor wide data will be of much use if it is collected with built-in biases: Most biostatisticians know about biases due to confounding, attrition, treatment non-compliance, cross-arm contamination, and other similar issues in study design. Most survey statisticians know about biases due to lack of coverage, non-response, mode effects, and other similar issues in study design. Do Big Data folks understand issues like these? Lack of a design in "organic" data collection is also a design; it is called a "convenience sample." Data, whether big or wide, makes an impact where it brings value through its application and through the intelligence and actions it drives. But in many cases, poor decisions are made because of poor statistical literacy.
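To make the convenience-sample point concrete, here is a small simulation sketch. The population of scores, the assumed self-selection mechanism (higher-scoring units are more likely to be captured, as when satisfied customers respond more often), and all function names are hypothetical, chosen only for illustration. Under these assumptions, a very large convenience sample stays clearly biased, while a much smaller simple random sample lands close to the true mean: more data does not cure a biased collection design.

```python
import random
import statistics


def make_population(size, seed=1):
    """Hypothetical population of scores (e.g. satisfaction), mean about 5, sd about 2."""
    rng = random.Random(seed)
    return [rng.gauss(5.0, 2.0) for _ in range(size)]


def convenience_sample(population, k, rng):
    """Assumed self-selection: a unit's chance of being captured grows with its score.

    Draws with replacement; the +0.1 keeps low or negative scores capturable.
    """
    weights = [max(x, 0.0) + 0.1 for x in population]
    return rng.choices(population, weights=weights, k=k)


def random_sample(population, k, rng):
    """Simple random sample without replacement: every unit equally likely."""
    return rng.sample(population, k)


def compare(seed=2):
    """Compare a small random sample with a large convenience sample."""
    population = make_population(100_000)
    rng = random.Random(seed)
    return {
        "true_mean": statistics.fmean(population),
        "random_1k": statistics.fmean(random_sample(population, 1_000, rng)),
        "convenience_50k": statistics.fmean(convenience_sample(population, 50_000, rng)),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name:>16}: {value:.2f}")
```

The convenience sample here is fifty times larger than the random one, yet its mean should sit well above the population mean, because the selection mechanism itself favors high scores.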


Wide data “requires” Big Data: In general, the wider the data (that is, the greater the number of variables), the more observations you need in order to get trustworthy models that you can be confident are not just fitting noise. So at a certain point, wide data requires big data. Hence, instead of saying you need "deep" data or "wide" data, why not explain the part each type plays and let people make problem-driven decisions? The shape of the data one needs depends heavily on the problem domain. If the problem domain is not well understood, how can data be expected to magically change that? It is merely a case of the blind leading the blind. The point is knowing how to soundly extract information from the *right* data, to answer the *right* questions, with the *right* perspective in approaching them, and understanding what expanding the depth and breadth of data scope should truly look like.
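A minimal sketch of why wide data needs more observations, using a deliberately simplified proxy for "fitting noise": every variable and the target are pure random noise, so any correlation found is spurious. The function names and the choice of Pearson correlation as the measure are assumptions for illustration, not a full modeling workflow. With few observations and many variables, the best-looking variable can appear strongly related to the target; with many observations, that spurious signal shrinks.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_correlation(n_obs, n_vars, seed=0):
    """Largest |correlation| between a noise target and n_vars noise variables.

    No variable is truly related to the target, so whatever is found is noise.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_vars):
        feature = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(feature, target)))
    return best


if __name__ == "__main__":
    # Narrow vs. wide data with few observations, then wide data with many.
    for n_obs, n_vars in [(20, 5), (20, 500), (2000, 500)]:
        r = max_spurious_correlation(n_obs, n_vars)
        print(f"n={n_obs:5d} observations, p={n_vars:4d} variables -> best |r| = {r:.2f}")
```

Changing the seed alters the exact numbers but should not change the pattern: widening the data with a fixed number of observations inflates the best spurious correlation, and adding observations pulls it back toward zero.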

People are the data masters who handle Big Data, Wide Data, or Deep Data effectively: We are often faced with making decisions based on incomplete or imperfect data, and it is the role of leadership to leverage every viewpoint and experience to direct the outcome in these circumstances and to ensure that the associated risks are understood. It is also their role to identify when consequences are not aligned with expectations and to either rethink or adjust course as required, so that knowledge-based big data supports goal and mission performance tracking and covers the accurate data needed for analytic modeling.


Usually, data does not come before questions, unless you use data to ask new questions: It is a bit of a riot when companies take a backwards "data first, questions later" approach to understanding themselves or to taking some particular action based on data. First, you begin with questions, refining them into more scientific or technical terms whose outputs and meaning are translatable back to the business. Data does not come before the questions, unless you are using data to ask new questions not previously thought of, which is the subject of data mining. Rarely have companies exhausted their list of questions at a basic level of understanding.


So the fundamental point is still what is needed for a successful analytics application:
1) Good-quality data, whether big, wide, or deep
2) Disparate sources that will reinforce good quality and fine-tune the findings
3) Good analytics tools (investigative, descriptive, predictive, and prescriptive)
4) Competent analytics experts who will properly interpret data and recognize useful patterns
5) Good project management and leadership


The question of big vs. wide for accurate analytics results gets muddied depending on the technical and statistical prowess of whoever is working on the data. Moreover, "accurate results" can itself be a fuzzy notion, which introduces additional components into the big vs. wide discussion. Back to basics: it is all about data quality, talent expertise, insightful questions, efficient tools, and effective leadership to manage a successful analytics project and deliver value to customers or users.
