Saturday, November 14, 2015

How to Apply Data Algorithms to Business Problem-Solving

An algorithm is a procedure or formula for solving a problem. 

An algorithm is a model of the real world. A good algorithm is developed by integrating knowledge-based data into analytic models, testing those models through simulation, and implementing them for problem-solving. However, keep in mind that underlying these algorithms are models, each with its own assumptions, strengths, and weaknesses. In addition, these algorithms require data, and understanding the idiosyncrasies of that data is critical to model performance.

The model depends on the nature and structure of the data and on the questions that need to be answered. The very first question to ask, before any modeling gets done, is: what produced the data you have? If you cannot answer that question, your effort has to be directed toward discovering what produced the data. Data is produced by causal effects, data acquisition processes, and errors in observation. There is almost always a causal effect, there may be systematic effects due to the way the data was observed, and there will always be noise in the observations. Once you understand what produced the data, you can use it to develop a math model of the causal effects, so that you can make practical predictions. The parameter values of that math model will depend on the data once the observation errors and data acquisition effects have been removed. In this intermediate step, you need models of what produces the observation errors and of what produces the data acquisition effects; otherwise you have little ability to remove them.
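To make the three sources of data concrete, here is a minimal Python sketch (assuming NumPy is available) that simulates them with made-up numbers. It combines a linear causal effect, a hypothetical systematic offset from one of two instruments, and Gaussian noise. A naive straight-line fit absorbs the instrument offset, mostly into its intercept. Subtracting the offset first, using an acquisition model that is assumed to be known, lets the fit recover the causal parameters. The slope, intercept, offset, and noise level are all illustrative assumptions, not values from any real dataset.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical causal effect: y depends linearly on x (true slope 2.0, intercept 1.0).
TRUE_SLOPE, TRUE_INTERCEPT = 2.0, 1.0
n = 1000
x = rng.uniform(0.0, 10.0, size=n)
causal = TRUE_SLOPE * x + TRUE_INTERCEPT

# Hypothetical data-acquisition effect: roughly half the rows come from
# instrument "B", which we assume reads 3.0 units high.
from_b = rng.random(n) < 0.5
acquisition = np.where(from_b, 3.0, 0.0)

# Observation noise, assumed Gaussian with standard deviation 0.5.
noise = rng.normal(0.0, 0.5, size=n)

# What we actually observe is the sum of all three effects.
y = causal + acquisition + noise

# Naive fit: the acquisition offset is silently absorbed, mostly into the intercept.
naive_slope, naive_intercept = np.polyfit(x, y, deg=1)

# Fit after removing the acquisition effect with the (assumed known) instrument model.
y_corrected = y - acquisition
slope, intercept = np.polyfit(x, y_corrected, deg=1)

print(f"naive:     slope={naive_slope:.2f}, intercept={naive_intercept:.2f}")
print(f"corrected: slope={slope:.2f}, intercept={intercept:.2f}")
```

In practice the acquisition model is rarely known this exactly. The sketch only shows why having one matters: without it, the offset cannot be told apart from the causal intercept.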

Data models can take different forms. Data science models should preferably be formulated using data, information, experience, and known scientific laws, and each model should have a purpose based on the questions you want answered. Beyond that, models can take many different forms. Every single prediction depends on some kind of model. It may be a regression model, a Gaussian process model for machine learning, a Support Vector Machine model, an Artificial Neural Network (ANN) model, a CFD (computational fluid dynamics) model, a boosting model, or a random forest model, but every single prediction, past or future, depends on some kind of model. Going a step further, boosting creates a model too, but it is a different kind of model, with branches, leaves, and splitting criteria. Still, it’s a model. You don’t have to write an equation to make a model; a model can be algorithmic. A math model can be an equation, an algorithm, or anything that can be implemented in software. It may even be a big table with markers indicating the positions of squadrons. There is still a model, there is still a lot of data analysis going on, and there is still a need for experts. The problem is that without some domain knowledge and knowledge of the data acquisition process, these effects (and the noise) are not separable. Whatever fit is done, with whatever modeling process (a priori terms, algorithmically selected terms, or boosting), the resulting math model terms will absorb all of these effects.
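To illustrate that a model can be algorithmic rather than an equation, here is a minimal Python sketch of a decision stump. A decision stump is a one-split regression tree, the kind of simple tree that tree-based boosting combines. Its "parameters" are a threshold and two leaf values, chosen by a splitting criterion (here, the lowest total squared error). The names `Stump` and `fit_stump` and the toy data are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """A one-split regression tree: a model expressed as a rule, not an equation."""
    threshold: float
    left_value: float   # prediction when x <= threshold
    right_value: float  # prediction when x > threshold

    def predict(self, x: float) -> float:
        return self.left_value if x <= self.threshold else self.right_value


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean (the splitting criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(xs, ys):
    """Try every split between distinct sorted x values; keep the lowest-SSE one."""
    pairs = sorted(zip(xs, ys))
    best_cost, best_stump = None, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best_cost is None or cost < best_cost:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_cost, best_stump = cost, Stump(threshold, mean(left), mean(right))
    if best_stump is None:
        raise ValueError("need at least two distinct x values to split")
    return best_stump


# Toy data with an obvious jump between x = 3 and x = 4.
stump = fit_stump([1, 2, 3, 4, 5, 6], [10, 11, 10, 30, 31, 29])
print(stump.threshold)   # 3.5
print(stump.predict(5))  # 30.0
```

Nothing in the fitted stump is a formula. It is a learned rule, yet it is still a model that makes predictions, and it would absorb any acquisition effects in the data just as a regression would.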

There are two different kinds of data analytics models. First, there is an a priori model: the understanding of how the world behaves, however approximate or loose it may be. At one extreme, it may be a very complex mathematical model. At the other extreme, it may be informal, with some very vague beliefs, such as that people’s ages are positive and below 1000 years. Ideally, you do not build any more assumptions into that model than are necessary, and you make those beliefs explicit. Then there is the model arising out of the statistical analysis, where the analysis builds a statistical model. It may be a regression, boosting, or any other kind of model, and it may or may not choose which variables are influential. It is a generic, abstract model. The tools used to derive it are often generic and abstract; they have no knowledge of the semantics of the data, of what the data means. The second type of model sits on top of the first (or maybe below it, depending on which way you are standing). The final result is model A plus model B. Whether you call the first model a physical model, a causal model, business expertise, or plain common sense is less important than recognizing that whenever you do any kind of data analysis, you almost always have these two models.
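The "model A plus model B" idea can be sketched in a few lines of Python, assuming NumPy. Here, model A consists of two hypothetical pieces of a priori knowledge. The first is the explicit belief from the text that ages are positive and below 1000 years, which is used to reject impossible rows. The second is an assumed domain rule that the outcome rises by 0.5 units per year of age. Model B is a generic least-squares line fitted to whatever model A leaves unexplained, and the final prediction is their sum. The data, the 0.5 coefficient, and all function names are illustrative assumptions.

```python
import numpy as np


def check_a_priori(ages: np.ndarray) -> np.ndarray:
    """Model A, informal part: the explicit belief that 0 < age < 1000 years."""
    return (ages > 0) & (ages < 1000)


def model_a(ages):
    """Model A, structural part: a hypothetical domain rule (0.5 units per year)."""
    return 0.5 * ages


def fit_model_b(ages, residuals):
    """Model B: a generic least-squares line with no knowledge of what 'age' means."""
    return np.poly1d(np.polyfit(ages, residuals, deg=1))


rng = np.random.default_rng(seed=1)

# Hypothetical data: 500 plausible ages plus two impossible rows.
ages = np.concatenate([rng.uniform(18, 80, size=500), [-5.0, 1200.0]])
# Assumed true relationship 0.6 * age + 4, plus Gaussian noise.
outcome = 0.6 * ages + 4.0 + rng.normal(0.0, 1.0, size=ages.size)

# Apply the explicit a priori beliefs before any statistical fitting.
valid = check_a_priori(ages)
ages, outcome = ages[valid], outcome[valid]

# Model B only has to explain what model A leaves over.
residuals = outcome - model_a(ages)
model_b = fit_model_b(ages, residuals)


def predict(age):
    """Final result: model A plus model B."""
    return model_a(age) + model_b(age)


print(f"rows kept: {valid.sum()} of {valid.size}")
print(f"model B: slope={model_b.coeffs[0]:.2f}, intercept={model_b.coeffs[1]:.2f}")
print(f"prediction at age 40: {predict(40.0):.1f}")
```

Model B knows nothing about what ages mean; only model A carries that semantics. This is the division of labor the text describes.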

It’s important to investigate the data and to make every attempt to understand the business context. The point is that we should all have some humility, recognize the limitations of our own expertise, and partner with other experts to apply analytical algorithms to problem-solving. Then there is the opportunity to make something great.

