Thursday, April 24, 2014

The Brutal Truth about Data Analysis

As with any emerging digital deployment, there is both an optimistic perspective and a brutal truth about data analysis!

Although data analysis sits at the top of the business agenda in every forward-looking organization, the ROI and success rate of such projects are still very low. So what, more specifically, is the brutal truth about data analysis, and how can organizations overcome these challenges?

Analytic experts often lack the ability to ask good questions quickly and iteratively. The right approach is to pose the problem by considering the business question to be answered, the available data, and domain-specific information, without any bias from known or available algorithms. In many cases known algorithms can be used, but that should be decided only after the problem is properly posed. If no known algorithm fits the problem under consideration, then new methods need to be developed. Of course, developing new methods for data science requires a good understanding of probability theory, linear algebra, multivariate calculus, and optimization.

Do not let data analysis fool you; many things are easy to get wrong. It is the consequences of getting it wrong that matter, not the fact that it is easy to do so. Even highly qualified analysis practitioners can make big mistakes. What they often lack, but can learn given the time and interest, is the context and dynamics of the business or organization they are working with. This is why so many analysts do not achieve maximum impact with their work. Only when you bring sophistication to the right problems do business executives suddenly listen and get excited about your work. This underscores the need for training in the company's data systems, its business problems, and how the two interface, so that you can analyze when, what, and why you got something wrong, and how to avoid it.

You will never be able to automate true knowledge discovery. A creative human interpretation process is needed to tell the most likely story behind the analysis results. What you can automate is the input to that creative discovery process, the input that results from feature selection, so that the predicted probabilities are stable and replicable at the individual level rather than arbitrary. When you start with non-arbitrary inputs, you are much more likely to make real discoveries instead of discovering artifacts. It is the error-reduction mechanism in the modeling that yields such stable feature selection and replicable predicted probabilities at the individual level, without the need for any ensemble averaging (which obscures the interpretation of models).
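The post does not say how to check whether a feature selection is stable rather than arbitrary. One common check is to repeat the selection on bootstrap resamples of the data and count how often each feature is chosen. Below is a minimal sketch of that idea, assuming a simple filter selector that ranks features by absolute Pearson correlation with the target. The function names (`pearson`, `top_k_features`, `selection_frequency`), the number of resamples and the synthetic data are hypothetical choices for illustration; this is a resampling stability check, not the error-reduction mechanism the post refers to.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def top_k_features(rows, target, k):
    """Hypothetical filter selector: indices of the k features most correlated with the target."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, target)), j))
    scores.sort(reverse=True)  # strongest absolute correlation first
    return {j for _, j in scores[:k]}


def selection_frequency(rows, target, k, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each feature is selected."""
    rng = random.Random(seed)
    n = len(rows)
    counts = [0] * len(rows[0])
    for _ in range(n_boot):
        # Resample rows with replacement, keeping each row paired with its target value.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_rows = [rows[i] for i in idx]
        sample_target = [target[i] for i in idx]
        for j in top_k_features(sample_rows, sample_target, k):
            counts[j] += 1
    return [c / n_boot for c in counts]


if __name__ == "__main__":
    # Synthetic data (an assumption for illustration): only feature 0 drives the target.
    gen = random.Random(42)
    rows = [[gen.gauss(0, 1) for _ in range(5)] for _ in range(100)]
    target = [3.0 * row[0] + gen.gauss(0, 1) for row in rows]
    for j, freq in enumerate(selection_frequency(rows, target, k=1)):
        print(f"feature {j}: selected in {freq:.0%} of resamples")
```

A feature picked in nearly every resample is a stable input; one that shows up only sporadically is a candidate artifact. A different selector (for example, a penalized regression) could be dropped in by replacing `top_k_features`.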

Data visualization needs to be optimized. The game changer radically altering the behavior of those who spend the money is visual analysis and data visualization. Why? Regular business managers and executives are suddenly empowered to see what is happening, and they are demanding clear explanations that address their business needs. Additionally, the people who work for those executives, who have deep business knowledge but minimal to moderate analytic knowledge, can learn new tools and techniques that radically improve their insight into changing business dynamics and increase their perceived ability to ask what-if questions for future planning.

Data analysis pathfinder. There are two parallel paths. On the one hand, problems that were relatively complex in the past will gradually migrate to a broader and less demanding field, so people with less extensive preparation will be able to extract more value from analysis than past generations did. A growing number of tools attempt to provide increasingly useful indicators of result quality. For example, automation tools will help explore the data as quickly as possible, and visualization tools yield better business insight and can potentially reveal a way forward. For a growing number of routine applications this will be enough, and any inaccuracies will not have significant negative impacts across application domains. On the other hand, a new set of problems of increasing digital complexity will require deep knowledge and will not qualify for automatic solutions.

Clarity about the concept and use of "judgment" in a scientific context. Several concepts need to be clarified in analysis practice. For example, what is "judgment"? Is it an individual's personal choice about which rules should apply? Or an individual's choice about how closely something meets a certain standard? Or is it the application of some method to balance the influence of contradictory diagnostics? How can anything be replicated if one "scientist" uses his or her "judgment"? How does the replicator take into account the judgment applied by the original researcher? If judgment is used in applying the scientific method, does that mean the results can be replicated because the same judgment is applied, or that they can be replicated because they are independent of the judgment applied? If it is the latter, "judgment" would be irrelevant.

Lack of cross-disciplinary analysis talent: Data analysis, statistics, data mining, and forecasting are very hard to master; it is hard to master even one of them! In addition, neuroscience and cognitive neuroscience are not data science, but data scientists who wish to develop more reliable artificial intelligence applications will probably need an understanding of these more basic fields, in much the same way that pharmaceutical scientists need an understanding of basic chemistry and biochemistry. Data analytics is underpinned by (1) statistical, (2) mathematical, and (3) modeling methodologies, along with (4) the reality of data centricity and (5) software and code as tools for these disciplines. Unless people understand and appreciate the education they received in these disciplines, analysis will continue to be a big challenge.

As with any emerging digital deployment, there is both an optimistic perspective and a brutal truth about data analysis. With more mature data analysis tools and technologies available, and with highly professional analysis talent trained, organizations can reap the fruits of their analysis experiments and efforts.

