Sunday, January 11, 2015

Analytics vs. Statistics

Data analytics is statistics at speed.
According to Wikipedia: “Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data. In applying statistics to, e.g., a scientific, industrial, or societal problem, it is necessary to begin with a population or process to be studied. Populations can be diverse topics such as "all persons living in a country" or "every atom composing a crystal". It deals with all aspects of data including the planning of data collection in terms of the design of surveys and experiments.”

“Analytics is the discovery and communication of meaningful patterns in data. Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming and operations research to quantify performance. Analytics often favors data visualization to communicate insight.”


Analytics and statistics are not substitutes but complements. Analytics is increasingly emerging as the convergence of statistics, information technology, problem-solving, and insight discovery. Analytics has been defined as “the use of data, information technology, statistical analysis, quantitative methods, and mathematical or computer-based models to help managers gain improved insight about their business operations and make better, fact-based decisions.” Under this definition, analytics divides broadly into three categories: descriptive (using data to understand past and present performance), predictive (analyzing past performance to predict the future), and prescriptive (focused primarily on optimization, that is, identifying the best alternatives to minimize or maximize a desired objective). Statistics plays a role in all three categories, as follows: descriptive (statistical measures, probabilities, distributions, sampling, and estimation); predictive (regression, forecasting, simulation, and risk analysis); prescriptive (linear, non-linear, and integer optimization). However, analytics cannot be confined to the application of statistics, because it goes beyond quantitative analysis.
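
To make the three categories concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the monthly sales figures are made up, and the function names (describe, fit_line, best_order_quantity), the prices, and the candidate order quantities are hypothetical. It is one possible way to picture the split, not a definitive method.

```python
from statistics import mean, stdev

# Hypothetical monthly sales figures (made-up illustration data).
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def describe(values):
    """Descriptive: summarize past performance."""
    return {"mean": mean(values), "stdev": stdev(values)}


def fit_line(xs, ys):
    """Predictive: least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def best_order_quantity(forecast, candidates, unit_cost, unit_price):
    """Prescriptive: pick the candidate quantity with the highest profit."""
    def profit(quantity):
        # Units beyond the forecast demand are assumed to go unsold.
        return unit_price * min(quantity, forecast) - unit_cost * quantity
    return max(candidates, key=profit)


summary = describe(sales)                    # what happened
intercept, slope = fit_line(months, sales)   # what is likely to happen
forecast = intercept + slope * 7             # forecast for month 7
order = best_order_quantity(                 # what to do about it
    forecast, candidates=range(100, 201, 10), unit_cost=4.0, unit_price=10.0
)

print(summary, forecast, order)
```

Note that statistics shows up in every step: summary measures in the first, regression in the second, and an objective to optimize in the third.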


If analytics is the tool, then statistics provides the framework for understanding sources of uncertainty. There will always be a gap between our observations and what actually occurs in the world, so there will always be uncertainty no matter how precise the measurements. As long as you are concerned with uncertainty and variation, you will be concerned with incorporating that uncertainty into your predictions and into the reasoning behind your decisions. Statistics provides a framework for thinking about those kinds of problems, so it will have a seat at the table for the foreseeable future, though there is and will continue to be debate about which methods are best for modeling a given scenario. Optimal decision making comes from understanding sources of uncertainty. The only completely representative and accurate model of the world is the world itself. Even data is only a quantitative representation of some aspects of the world. We will never have the universe of data to interpret. We are lucky to have much larger samples now, but we still have to understand that quantitative data is a subset of reality, and that what happened yesterday will not be a perfect image of today or tomorrow. We do need to modify our tools and understanding to accommodate the larger datasets available today, but contextual understanding and awareness of the implications of our assumptions are still critical. Statistics has been doing that for more than two centuries. Anyone who declares the end of statistics had better have a replacement that is not just a rebranding of statistics, because uncertainty is not going away.
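
A small Python sketch can illustrate the point that larger samples shrink uncertainty without removing it. The assumptions here are mine, not part of the argument above: a simulated normal population with mean 50 and standard deviation 10, a fixed random seed, a normal-approximation 95% interval (z = 1.96), and the hypothetical function name mean_confidence_interval.

```python
import math
import random
from statistics import mean, stdev


def mean_confidence_interval(sample, z=1.96):
    """Return (sample mean, half-width) of an approximate 95% interval.

    Uses the normal approximation: half-width = z * s / sqrt(n).
    """
    n = len(sample)
    half_width = z * stdev(sample) / math.sqrt(n)
    return mean(sample), half_width


# Simulated population parameters (assumed for illustration only).
rng = random.Random(0)
half_widths = {}
for n in (100, 10_000):
    sample = [rng.gauss(50.0, 10.0) for _ in range(n)]
    estimate, half_width = mean_confidence_interval(sample)
    half_widths[n] = half_width
    print(f"n={n}: mean is about {estimate:.2f}, plus or minus {half_width:.2f}")
```

The interval narrows roughly with the square root of the sample size, so a hundred times more data buys about ten times more precision, and the half-width never reaches zero. The sketch also says nothing about whether the sample represents tomorrow's world, which is the other half of the uncertainty discussed above.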


Inferential statistics is still important and always will be, in particular modeling paradigms such as (logistic) regression, even on 'big data' sets, whenever we build predictive or inferential models or want to understand the impact of certain effects on various response variables. There seems to be a misunderstanding here, because predictions and forecasts, which are fairly common terms these days, are quintessentially inferential methods. Statistical literacy is needed more than ever to make sense of the results. There is a danger in the current analytics/big data trend: many business people without the appropriate background may increasingly be required to make decisions based on 'advanced' reports, predictions, and models, which ideally require a reasonable appreciation and understanding of statistics. Scientists are trained from a very early stage to do this, as it is the foundation of the scientific method and all observation depends upon it. Analytics and big data might bring with them the misconception that this kind of understanding is not required, but it always will be. You can use more advanced analytics and visualisation tools to make things more accessible, but the underlying logic (statistics) will never go away.


Data science does not equal statistics. It is much broader than that: mathematical science + computation + intuition for the real data (domain expertise). These three pieces are all critically important. The mathematical sciences of course include statistics, but they also include much, much more. Algebraic topology, variational analysis, partial differential equations, geometric measure theory, probability, and information theory are just some examples of areas of mathematical science that can be used for data science and data analysis. Even more broadly, the academic disciplines with a strong intersection with data science include at least mathematics, statistics, computer science, electrical engineering, physics, and economics. Computation is obviously important; sometimes the computational effort to get the data from its truly raw state to something that can be analyzed is most of the work! The last piece, intuition for the data, a feeling for the data, is critical as well. This is not controversial to those who have a lot of experience with the analysis of data. Especially when the data comes from measurements of some physical system, whether time series or video streams or both, you must use prior knowledge or expert knowledge to make analysis possible. One caveat: it is true that certain types of data analysis go after such low-hanging fruit that some of these more intricate questions are not encountered. This is especially true of business data from companies that have loads of data they have not even begun to exploit. Either way, domain expertise will be important.

Analytics = applied statistics + domain expertise + logical thinking. The best definition of modern data analytics is statistics at speed. Statistics developed in an era when computation was difficult and data was scarce; in today's world, both have changed. Contemporary analytics tools have made it easier to apply statistics and to share inferences. Information technology has provided the capability to analyze entire populations instead of samples in some scenarios, but certainly not all. Business analytics is not the end of statistics. There are dangers that may befall executives when analytics or Big Data is construed as an end in itself rather than a means to an end. This can be attributed to an influx of off-the-shelf solutions (some of which are highly customizable) that are being aggressively marketed. Under competitive pressure, some vendors increasingly create the impression that such solutions will substitute for decision making, when they should remain decision support solutions. Some of these solutions can facilitate complex decision making, but the human element cannot be eliminated. They only take decision making to a whole new level.


Analytics is not the end of statistics; rather, statistics is the beginning of analytics! Indeed, we will need a deeper understanding of statistics to enhance the value of analytics.


2 comments:

I am considering a Master's degree in business analytics. I am worried that a statistics degree may be too narrow but also understand that a Master's should be a specialization. In your opinion and considering the current climate in business these days, what would your recommendation be?

Hi, thank you for reading my blog. According to IT industry surveys, Big Data/Analytics is a hot trend with strong demand for talent in the upcoming years, so I think you are making a great choice in pursuing a Master's degree in this area. Happy holidays.
