Monday, September 30, 2013

The Big Pitfalls of Big Data

There’s no doubt that Big Data is creating big value for businesses; however, reality can differ sharply from predictions. This is already happening: even in today’s technologically advanced and innovative world, predictions built on very comprehensive data still turn out wrong. So what makes such projects go bad, what are the limitations of data analytics, and what are the pitfalls of managing Big Data?

  • Not understanding the business problem clearly: not thinking hard enough about the RIGHT data, which may be only a subset of 'big data', specifically the data that helps answer the right, correctly posed question. When the problem is not framed, Big Data becomes a big answer in search of a big question, pursued without a thorough understanding of the domain simply because reams of data are available.
  • Using the wrong statistical techniques or misinterpreting results: managerial pressure to deliver 'insights' tends to grow in direct proportion to the investment in tools. The pitfalls include not understanding the limits of applicability of a tool or 'solution', disregarding the 'fitness for purpose' of the tools used, and believing one can number-crunch one's way to spectacular results, let alone interpret them appropriately.
  • Ignoring ongoing structural change, and insufficient understanding of data quality or erroneous data conditioning, either of which can leave existing data with low predictive value. Predictive analysis assumes that history predicts the future; this is a very strong assumption that may not always hold. Data sanity, data quality, and data filtering can all be issues.
  • Letting one's biases get in the way of analysis and decision making. Going through a detailed list of biases, it’s easy to find instances where they affect analysis. Ego and a lack of humility are unhealthy in any research endeavor.

  • A data model covers only the Known-Known and Known-Unknown sets; Unknown-Unknown sets cannot be modeled. So predictive analytics can tell only part of the story, which can be misleading: seeing the trees but missing the forest.
  • Limited Big Data talent or resources. We tend to believe that past data can predict the future, but that may not hold when the factors governing the results themselves change, and modelers often leave out many factors because the data, the resources, or something else is unavailable.
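
To make the structural-change pitfall above concrete, here is a minimal Python sketch. All numbers are invented for illustration, and the helper names (fit_line, mean_abs_error) are hypothetical, not taken from any particular tool. It fits a straight line to "historical" data, then applies that line after the underlying relationship has changed.

```python
# Hypothetical illustration: every number here is made up.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def mean_abs_error(a, b, xs, ys):
    """Average absolute gap between the fitted line and the observed values."""
    return sum(abs(a + b * x - y) for x, y in zip(xs, ys)) / len(xs)

# "History": periods 0..9, where the quantity grows by 2 per period.
history_x = list(range(10))
history_y = [10 + 2 * x for x in history_x]

# "Future": periods 10..19, after an assumed structural change
# that makes the quantity shrink by 1 per period instead.
future_x = list(range(10, 20))
future_y = [28 - (x - 10) for x in future_x]

a, b = fit_line(history_x, history_y)
in_sample_error = mean_abs_error(a, b, history_x, history_y)
out_of_sample_error = mean_abs_error(a, b, future_x, future_y)

print("error on history:", in_sample_error)
print("error after the change:", out_of_sample_error)
```

The fit looks perfect on the history it was trained on, yet its error is large once the governing factors change. A real project would face noisier data and subtler shifts, but the failure mode is the same.
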
Does history always repeat itself? Indeed, successful predictive analytics aims to stop the worst cases in history from happening again, but first of all, Big Data projects must steer clear of the big pitfalls above.

