As an emerging digital deployment, data analysis invites both an optimistic perspective and some brutal truths.
Although data
analysis is at the top of the business agenda in every forward-looking
organization, the ROI and success rate of such projects are still very low. More
specifically, what are the brutal truths about data analysis, and how can organizations overcome these challenges?
The ability to quickly and iteratively ask good
questions is often lacking in analytic expertise. The right approach is to pose the
problem by considering the business question to be answered, the available data,
and domain-specific information, without any bias from known or available
algorithms. In many cases, known algorithms can be used, but that needs to be
decided after the problem is properly posed. If there is no known algorithm
for the problem under consideration, then one needs to develop new methods. Of
course, developing new methods for data science requires a good understanding of
probability theory, linear algebra, multivariate calculus, and optimization.
Do not let data analysis fool you, as lots
of things are easy to get wrong. It is the consequences of getting it wrong that matter, not
the fact that it is easy to do so. Even highly qualified analysis
practitioners can make big mistakes. What they often lack, but can learn
given the time and interest, is the context and dynamics of the business or
organization they are working with. This is why so many analysts do not
achieve maximum impact with their work. It is only when you bring sophistication
to the right problems that business executives suddenly listen and get excited
about your work. This emphasizes the need for training on the data
systems, the company's problems, and how the two interface, to help you analyze when, what, and why you get things wrong and
how to avoid it.
You will never be able to automate true
knowledge discovery. The
creative human interpretation process is needed to tell the most likely story
about the analysis results. What you can automate is the input into that
creative discovery process, namely the output of feature selection, so that the
predicted probabilities are stable and replicable at the individual level and
not arbitrary. When you start with non-arbitrary inputs, you are much more
likely to make real discoveries instead of discovering artifacts. It is
the error reduction mechanism in the modeling that gives such stable feature
selection and replicable predicted probabilities at the individual level,
without the need to do any ensemble averaging (which obscures the
interpretation of models).
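
One way to check whether a feature selection step is "stable" in the sense described above is to rerun it on resampled data and see how much the selected sets agree. The following is only a minimal sketch under stated assumptions, not the modeling method this post refers to: the correlation-based selector `select_top_k`, the Jaccard-based score, and all function names are hypothetical choices made for illustration.

```python
import random
from itertools import combinations
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def select_top_k(rows, target, k):
    """Hypothetical selector: the k feature indices most correlated with the target."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, target)), j))
    scores.sort(reverse=True)
    return frozenset(j for _, j in scores[:k])


def jaccard(a, b):
    """Overlap of two selected-feature sets: 1.0 means identical, 0.0 means disjoint."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(selections):
    """Average pairwise Jaccard similarity across repeated selections."""
    pairs = list(combinations(selections, 2))
    if not pairs:
        return 1.0
    return mean(jaccard(a, b) for a, b in pairs)


def bootstrap_stability(rows, target, k, n_resamples=20, seed=0):
    """Rerun the selector on bootstrap resamples and score how much the results agree."""
    rng = random.Random(seed)
    n = len(rows)
    selections = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        selections.append(
            select_top_k([rows[i] for i in idx], [target[i] for i in idx], k)
        )
    return selection_stability(selections)
```

A score near 1 suggests the same features are chosen regardless of which sample is drawn, while a score near 0 suggests the selection is largely arbitrary. Interpreting what the selected features mean for the business remains a human task.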
Data visualization needs to be optimized. The game changer that is radically altering
the outlook of those who spend the money is visual analysis and data visualization. Why?
Regular business managers and executives are suddenly empowered to see what is
happening and are demanding clear explanations that address their business needs.
Additionally, the people who work for the executives, with deep business
knowledge but minimal to moderate analytic knowledge, can learn new tools and
techniques that radically improve their insight into changing business dynamics
and increase their perceived ability to ask what-if questions for future
planning as well.
Data analysis pathfinder. There are two parallel paths. On the one hand,
problems that were relatively complex in the past will gradually migrate to a broader
and less demanding field, so people with less extensive preparation will be
able to extract more value from analysis than past generations could. A growing
number of tools attempt to provide increasingly useful indicators of the quality of results. For
example, automation tools will help explore the data as quickly as possible, and visualization tools will provide better business insight and
potentially reveal a way forward. For a growing number of regular applications,
this will be enough, and any inaccuracies will not produce significant negative
impacts in the various application domains. On the other hand, a new set of
problems of increasing digital complexity will require deep knowledge and will not qualify for automatic solutions.
The concept and clear use of "judgment"
in a scientific context. Several concepts need to be clarified in analysis practice. For example: What is "judgment"? Is it an
individual's personal choice about what rules should apply? Or an individual's
choice about how closely something meets a certain standard? Or is it the
application of some method to balance the influence of contradictory
diagnostics? How can anything be
replicated if one "scientist" uses his or her "judgment"? How does the replicator take into
account the application of judgment by the original researcher? If judgment is used in applying the
scientific method, does it mean that the results can be replicated because the
same judgment is applied, or that the results can be replicated because they
are independent of the judgment applied? If it is the latter, "judgment"
would be irrelevant.
Lack of cross-disciplinary analysis talent: Data analysis, statistics, data mining, and
forecasting are very hard to master. It is hard to master even one of
these! In addition, neuroscience and cognitive neuroscience are not data
science, but data scientists who wish to develop more reliable artificial
intelligence applications will probably need an understanding of these
more basic fields, in much the same way that pharmaceutical scientists need
an understanding of basic chemistry and biochemistry. Data analytics is
underpinned by (1) statistical, (2)
mathematical, and (3) modeling methodologies, along with (4) the reality of data
centricity and (5) software and code as tools for these disciplines. Unless
people understand and appreciate the education they received in these
disciplines, analysis will continue to be a big
challenge.
As an emerging
digital deployment, data analysis comes with both an optimistic perspective and brutal truths.
With more mature data analysis tools and technologies available and
highly professional analysis talent being trained, organizations can reap the fruits of
their analysis experiments and efforts.