Though Big Data is hot, the overall success rate of analytics projects is low; as a result, some business leaders are holding back on investing in data analytics due to a lack of trust. Incidents of inaccurate analysis caused by bad data, or even by the wrong model being used for the analysis, have cost companies millions of dollars. So what are the effective ways to initiate and manage Big Data projects? Is there a framework that governs trust in data analytics?
1. Understand the Objectives of Analysis and Technology Trends
It is critical to understand the objective of the analysis, grounded in an in-depth understanding of business fundamentals: bottom-line performance and profitability, economies of scale, innovating for the future, etc. Start with what you can actually do, and stay away from trying to solve all the world's problems in one go-round. Start with an immediate business need and work your way out from there. The immediate need should promise a large ROI to start with and should NOT revolve around a lot of data cleanup.
Capture technology trends: If you are putting together a framework, begin with fairly recent, forward-looking technological trends in data storage and computing that can generate realizable cost savings and efficiencies today. This gives C-level executives the big picture and a vision of where their world is headed. But collect and communicate the project requirements well; otherwise you will have SCOPE CREEP on your hands and the project will never get off the ground.
A framework of standards may accomplish two things:
1) A deep understanding of the various relationships and patterns in the data. This will help identify which transformations are needed and how missing values should be treated.
2) Identification of data integrity issues. Note that when these are found, they need to be discussed with the IT folks, as there may be solid reasons for the data being as it is. If these issues are flagged to the executives right away, the databases will quickly lose credibility and nobody wins.
2. Data Quality Is a Key Factor in Project Success
Any data mining/analytics project can be roughly divided into two parts: cleaning the data and analyzing the data. For many data analytics projects, half the battle is in preparing the data: not just dealing with missing values, but coding and normalizing as well.
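To make those three preparation steps concrete, here is a minimal sketch in plain Python. The field names and records are hypothetical, and the specific choices (median imputation for missing numbers, integer codes for categories, min-max scaling to [0, 1]) are just one common option among many, not a prescribed method.

```python
from statistics import median


def prepare(records, numeric_field, category_field):
    """Impute, code, and normalize two fields of a list of dict records.

    Assumptions (illustrative only): missing numbers are None and are filled
    with the median; categories get integer codes in sorted order; the numeric
    field is min-max scaled to [0, 1].
    """
    # Median of the values that are actually present
    present = [r[numeric_field] for r in records if r[numeric_field] is not None]
    fill = median(present)

    # Deterministic integer coding of the categorical field
    categories = sorted({r[category_field] for r in records})
    codes = {c: i for i, c in enumerate(categories)}

    # Fill missing values, then min-max normalize
    values = [r[numeric_field] if r[numeric_field] is not None else fill for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal

    prepared = [
        {numeric_field: (v - lo) / span, category_field: codes[r[category_field]]}
        for r, v in zip(records, values)
    ]
    return prepared, codes


# Hypothetical example data: one revenue value is missing
rows = [
    {"revenue": 100.0, "region": "west"},
    {"revenue": None, "region": "east"},
    {"revenue": 300.0, "region": "west"},
]
prepared, codes = prepare(rows, "revenue", "region")
print(prepared)  # revenue scaled to 0.0, 0.5, 1.0; region coded as 1, 0, 1
```

Even in this toy case the imputation choice shapes the result: the missing revenue becomes the midpoint of the range, which may or may not be what the business data actually implies.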
The causes of data quality issues: Data quality can depend heavily on the system that captures data into computer files. It also depends on the quality of the methods used for duplicate detection (within or across files) and of the methods for filling in missing data or 'correcting' contradictory combinations of data. There are many questions to ponder, such as: How do you determine whether 5% or more of your data has errors? If your data has 5% or more errors, what analyses (if any) can you reliably do on it?
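One rough way to start answering the "5% or more" question is to count the records that fail at least one basic check. The sketch below is a hypothetical illustration, not a standard tool: it flags missing required fields, duplicate keys within a single file, and contradictory combinations expressed as simple rules. The field names and the "shipped without a ship date" rule are assumptions made up for the example.

```python
def error_rate(records, key, required, rules):
    """Return the fraction of records that fail at least one check.

    Checks (illustrative): a required field is None, the key was already
    seen earlier in the file (the first occurrence is treated as valid),
    or any rule predicate returns True for a contradictory record.
    """
    seen = set()
    bad = 0
    for r in records:
        failed = any(r.get(f) is None for f in required)
        # Duplicate detection within a single file
        if r.get(key) in seen:
            failed = True
        seen.add(r.get(key))
        # Each rule returns True when the record holds a contradictory combination
        if any(rule(r) for rule in rules):
            failed = True
        if failed:
            bad += 1
    return bad / len(records) if records else 0.0


# Hypothetical rule: an order cannot be "shipped" without a ship date
def shipped_without_date(r):
    return r.get("status") == "shipped" and r.get("ship_date") is None


orders = [
    {"id": 1, "status": "shipped", "ship_date": "2020-01-02", "amount": 10},
    {"id": 2, "status": "shipped", "ship_date": None, "amount": 20},  # contradictory
    {"id": 2, "status": "open", "ship_date": None, "amount": 5},      # duplicate id
    {"id": 4, "status": "open", "ship_date": None, "amount": None},   # missing amount
]
rate = error_rate(orders, key="id", required=["id", "amount"], rules=[shipped_without_date])
print(f"{rate:.0%} of records failed at least one check")
print("Above the 5% level:", rate > 0.05)
```

A check like this only catches the errors you thought to encode; values that are wrong but plausible still need domain review with the business users.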
Data cleansing is often overlooked: Although the data cleaning part is by far the most time-consuming, it is frequently neglected during the planning stage. To obtain buy-in from senior management, it is important that they are educated up front about the data preparation phase. Sufficient time and resources must be budgeted to allow the data to be properly prepared in advance of any analysis. Without that, management will have unrealistic expectations about the timing and ROI of results. Disappointment will be inevitable, and future data mining projects will be jeopardized.
Set project priority: While you are doing that first project with better ROI and less data cleanup, start finding other projects that do require data cleanup, and do that cleanup while you are working on the first or second data projects. The key point is resources. Your hard-core IT types do not like to do data cleanup, so always make sure the business users are on board and willing to do their own data cleanup. The success of data analytics takes a collective, collaborative, cross-functional effort.
The criticality of data quality also depends on the business case: If there are systematic data issues that can be explained, the analysis might still be valid. For example, if you are interested in rank-ordering predicted product performance (using the predicted values from some model), then the actual predicted values are not critical as long as the bias is spread across all products. However, if you are developing a model to determine which customers can be targeted for a specific promotion, knowing that you make money if the response rate is above X% and lose money otherwise, then the data issues could be terminal.
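The difference between those two business cases can be shown with a few numbers. In this hypothetical sketch every predicted response rate is overstated by the same constant amount. The rank order does not change, but the set of items that clears a break-even threshold (standing in for the "X%" above) does. All scores, the bias, and the threshold are made-up values for illustration.

```python
def rank(scores):
    """Return names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def targeted(scores, threshold):
    """Return the names whose predicted response rate exceeds the break-even threshold."""
    return {name for name, s in scores.items() if s > threshold}


# Hypothetical predicted response rates
true_rates = {"A": 0.031, "B": 0.024, "C": 0.019, "D": 0.012}

# Hypothetical systematic bias applied equally to every item
bias = 0.008
biased = {name: s + bias for name, s in true_rates.items()}

# Hypothetical break-even response rate (the "X%" in the text)
threshold = 0.025

print("Ranking unchanged:", rank(true_rates) == rank(biased))
print("Targeted with true rates:", sorted(targeted(true_rates, threshold)))
print("Targeted with biased rates:", sorted(targeted(biased, threshold)))
```

Adding the same constant to every score preserves the ordering, so a rank-based use of the model survives the bias. The biased scores, however, push two unprofitable items over the threshold, which is exactly the kind of error that turns a promotion from profit into loss.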
Indeed, there are quite a few roadblocks in managing and analyzing data; still, many forward-looking organizations are making continuous progress and accumulating enough experience to transform their businesses into data-driven, intelligent powerhouses.