Today, in many organizations, a large proportion of the analysis the business relies on is built on unofficial data. Data lineage tools help identify some of it, but often the data is extracted and manipulated offline. To manage the data life cycle more effectively, governance needs to be put in place in the context of the business process that the data supports. But how do you enforce data governance through technical processes, methods, or knowledge, or what some call “analytics on analytics”?
Analytics goal setting: First, confirm the goals, the required results, and the time schedule with the organization's senior leadership team. Also confirm the logistics, budget, and financial provisions the Big Data project needs in order to deliver those results. Keep the end in mind: Big Data is only the means, and the end is business value, such as improving customer satisfaction or harnessing employee engagement.
Data categorization: How many types of data are you gathering, generating, creating, or capturing as inputs for your organization? Are the present inputs sufficient for the organization's goals? Are there duplications in the present inputs? Can the inputs be rectified and shortened? What new fields or information should be added to the existing input formats, and where, to get the proper results? As far as possible, all input data should first be categorized into small units with key fields and signals, organized in rows and columns. The purpose of such categorization is to help the business manage its data life cycle in a systematic way.
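The categorization and duplication questions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the category names, and the choice of key fields per category are hypothetical and do not come from any particular system.

```python
from collections import defaultdict

# Hypothetical input records; sources and field names are illustrative assumptions.
RECORDS = [
    {"source": "crm", "category": "customer", "customer_id": 1, "email": "a@example.com"},
    {"source": "web", "category": "customer", "customer_id": 1, "email": "a@example.com"},
    {"source": "crm", "category": "customer", "customer_id": 2, "email": "b@example.com"},
    {"source": "erp", "category": "order", "order_id": 10, "customer_id": 1},
]

# Assumed key fields that identify a record within each category.
KEY_FIELDS = {"customer": ("customer_id", "email"), "order": ("order_id",)}


def categorize(records):
    """Group records into small units by their category."""
    groups = defaultdict(list)
    for record in records:
        groups[record["category"]].append(record)
    return dict(groups)


def find_duplicates(records, key_fields):
    """Return {key: [records]} for every key that occurs more than once."""
    by_key = defaultdict(list)
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        by_key[key].append(record)
    return {key: recs for key, recs in by_key.items() if len(recs) > 1}


def duplicate_report(records, key_fields_by_category):
    """Categorize the inputs, then look for duplicates inside each category."""
    return {
        category: find_duplicates(recs, key_fields_by_category[category])
        for category, recs in categorize(records).items()
    }


report = duplicate_report(RECORDS, KEY_FIELDS)
for category, duplicates in report.items():
    print(category, "duplicate keys:", list(duplicates))
```

In this sample, the same customer arrives from both the CRM and the web channel, so the report flags that key as a candidate for consolidation. In practice the key fields would be agreed with the data owners rather than hard-coded.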
Planning, staffing, undertaking, and moderating an enterprise data initiative is where data governance must be applied. Share the exact time schedule with everyone involved, covering inputs, analytical procedures, outputs, and the cooperation required from the owners of the organization. Applying governance at these stages ensures that architectures, standards, stewardship, and compliance are covered. It also provides a capable operating model for bringing a dispute, discrepancy, or failure back into line, over and above the compliance process built into the design of the business process that the data supports.
Analytics on analytics: One way to address data integrity issues is a business data lake built on big data technology. If you provide a single pool of all data, make it available to everyone, and provide an environment that encourages local usage, then you know the data source and can start to analyze your analysis effectively. The business data lake concept then builds more conformed layers where you need them (customer service, sales, and so on), and these too can be opened up to broader access. You now have a "quality/integrity" measure on the sources people are using, and therefore on the reliability of their analysis. This calls for managing and connecting data lakes in a more standard and uniform way, which becomes the basis for "analytics on analytics." The approach is aimed at a successful migration to Big Data via the lake: not analytics per se, but metrics designed to reveal insights into how the data is used as use cases proliferate.
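As a rough illustration of what such usage metrics could look like, the Python sketch below counts how often analyses read from each layer of the lake and assigns each analysis a reliability score. Everything here is an assumption made for the example: the catalog entries, the quality scores, the usage log format, and the rule that an analysis is only as reliable as its weakest source.

```python
from collections import Counter

# Hypothetical catalog of lake sources: layer name and a quality score in [0, 1].
CATALOG = {
    "lake.raw.orders": {"layer": "raw", "quality": 0.7},
    "lake.conformed.sales": {"layer": "conformed", "quality": 0.95},
    "lake.conformed.customer_service": {"layer": "conformed", "quality": 0.9},
}

# Hypothetical usage log: the sources each analysis read from.
USAGE_LOG = [
    {"analysis": "q1_forecast", "sources": ["lake.conformed.sales"]},
    {"analysis": "churn_model",
     "sources": ["lake.conformed.customer_service", "lake.raw.orders"]},
    {"analysis": "adhoc_pricing", "sources": ["laptop/pricing_v7.xlsx"]},
]


def source_layer(name):
    """Return the lake layer of a source, or 'unofficial' if it is not cataloged."""
    entry = CATALOG.get(name)
    return entry["layer"] if entry else "unofficial"


def usage_by_layer(log):
    """Count source reads per layer across all analyses."""
    return dict(Counter(source_layer(s) for entry in log for s in entry["sources"]))


def reliability(sources):
    """Assumed rule: score an analysis by its weakest source; unknown sources score 0.0."""
    return min((CATALOG.get(s, {"quality": 0.0})["quality"] for s in sources), default=0.0)


def reliability_report(log):
    """Map each analysis to its reliability score."""
    return {entry["analysis"]: reliability(entry["sources"]) for entry in log}


print(usage_by_layer(USAGE_LOG))
print(reliability_report(USAGE_LOG))
```

In this sample, the ad hoc pricing analysis reads from a spreadsheet outside the lake, so it is counted as unofficial and scores 0.0. That is the kind of signal a governance team could use to decide which sources to bring into a conformed layer next.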
The volume, velocity, and variety of Big Data make data governance one of the most challenging tasks in data management. However, to extract value from Big Data, both a governance mechanism and an analytics-on-analytics approach are necessary for managing the data life cycle systematically and unleashing Big Data's potential smoothly.