A Framework for Defining the Scope of Data Governance Strategy Projects (Part 1)
This article is for managers who need to create an effective and sustainable data governance strategy and are wondering where to start.
This is Part 1 of this series. Part 2 is here.
As measured by the frequency with which Google searches are conducted for the term “data governance,” interest has been increasing over the past 5 years, as shown by this Google Trends search performed on October 6, 2017:
Figure 1. 5-Year Growth in “Data Governance” Searches
I’ve also been pleased this past year to see data governance discussed in the tech press from not only technical but also business management perspectives. I have also been lucky to discuss data governance with several experts (see Acknowledgements below) who have helped me tie together what I’ve learned in my own consulting and project management work.
It’s not just technology
As with most topics related to IT, the need for business-aligned management and planning efforts related to data governance should not come as a surprise. Data resources, like technology, are only valuable and useful if they are planned and managed effectively. Continued interest in data governance is evidence of this.
Still, I’m not completely comfortable with the term “data governance.” I don't think it expresses the depth of what is needed when an organization realizes it needs to do a better job of managing and making use of its data.
Doing so may require more than better or different technologies. Policies, people, and processes associated with data management and analysis may also need attention. This is especially true for those who are interested not just in short-term results but also in sustainability and long-term improvements.
Data governance can be approached initially as a mainly technical exercise that focuses on software and the systems and tools associated with security, privacy, data quality, and data lifecycle management. Placing primary responsibility for this in the IT department makes good sense.
You quickly realize, though, that the “tendrils” of data governance extend throughout the organization. A serious focus on data security, for example, will quickly reveal that the really important issues impacting data security have to do with human behavior, regardless of the technologies that are adopted. Effective planning for improved security, data provenance, and quality control may have to extend across departmental and functional boundaries into how data are managed and used both internally and externally to the organization. Things get complicated very quickly.
Experienced "systems integration" consultants will be familiar with this “the knee bone is connected to the thigh bone” phenomenon and the need to trace, account for, and prioritize all the system and process changes associated with enterprise-level system transformation efforts. Detailed data modeling can be one component of this.
Also, once you move beyond attending to financial transactions and standardized accounting and financial recordkeeping practices, many important data definitions – and the potential for miscommunication – begin to surface. Hence, the need for a well-defined project scope. You need to understand and control your project's focus.
Data program management and metadata
I’ve tried to recognize a more comprehensive approach to data governance by introducing the term data program management. I have also learned that improved data governance requires better metadata management -- managing data about data -- so the organization understands who is responsible for data as they are generated, managed, manipulated, and used.
One priority is to both define and manage the scope of your data governance improvement effort while avoiding a bound-to-fail “boil the ocean” approach that attempts too much.
Analysis-focused roadmap
What follows is an outline of how I would approach this when helping someone to plan out an initial “strategic data governance planning” effort. One deliverable of this effort is a prioritized set of specific actions that supports both short- and long-term goals based on improving how data are used analytically.
While we want to “start small” so we can deliver useful results as quickly as possible, we want to do so in a way that, as more problems or issues are addressed, we build on a common framework that can scale up over time. I’ve tried to represent such a framework in Figure 2’s three layers:
APPLICATION AREA describes the specific problem area or issue to be addressed. Ideally this will be a defined problem that is really important to the organization, not just a “nice to have” or “low hanging fruit” type of application. We want to engage with key stakeholders early on. Focusing on an important problem or issue will help us to get and keep their attention and support.
I'm focusing here on problem areas that can be addressed by making better use of data through improvements in how data are gathered, organized, managed, and analyzed. Perhaps due to my own background as a "number cruncher," an important focus is on data analysis and helping people to do a better job of planning and decision-making. This does not mean that important issues like security, privacy, and data quality should be ignored. It does mean that data literacy is an important feature of any data governance effort. One of the best ways to promote data literacy is to focus on problem areas that people think are important.
METADATA need to be gathered and organized. These are data describing human-human, human-machine, & machine-machine communication that are associated with the Application Area. Metadata categories include:
- Data: Raw or processed data of any type (financial, numeric, image, text, etc.) relevant to the Application Area. These should be identified regardless of where they originate.
- Systems: Hardware, software, & telecom systems that are used to collect, organize, & process data (regardless of whether they are inside or outside the organization).
- Processes: Manual and automated tasks associated with managing & using data – remember, this is not just about software!
- Users: Individuals & groups that analyze, consume, or apply data supplied by systems & processes.
- Costs: Direct & indirect cost of resources required to provide data to users.
- Benefits: Measures of the value derived from using or analyzing the data.
Figure 2. Data Governance Project Scope Framework
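For readers who think in code, the application area and the six metadata categories listed above can be sketched as a simple inventory structure. This is only an illustrative sketch: the class names (ProjectScope, MetadataInventory), the helper method, and the sample entries are hypothetical and are not part of the framework itself. The point is that a scoping effort can start as a short checklist that shows which categories have not yet been documented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetadataInventory:
    """Metadata gathered for one application area.

    Field names mirror the six categories in the list above;
    the structure itself is a hypothetical illustration.
    """
    data: List[str] = field(default_factory=list)
    systems: List[str] = field(default_factory=list)
    processes: List[str] = field(default_factory=list)
    users: List[str] = field(default_factory=list)
    costs: List[str] = field(default_factory=list)
    benefits: List[str] = field(default_factory=list)

    def missing_categories(self) -> List[str]:
        """Return the categories for which nothing has been recorded yet."""
        return [name for name, items in vars(self).items() if not items]


@dataclass
class ProjectScope:
    """One application area plus the metadata collected for it."""
    application_area: str
    metadata: MetadataInventory = field(default_factory=MetadataInventory)


# Illustrative entries only; a real inventory would come from stakeholder interviews.
scope = ProjectScope(application_area="Standardize subsidiary sales reports")
scope.metadata.data.append("Monthly sales reports from subsidiaries")
scope.metadata.users.append("Staff who prepare the monthly Board report")

# Lists the categories that still need attention before planning can proceed.
print(scope.metadata.missing_categories())
```

Keeping one such record per application area is one way to "start small" while still using a common structure as more problem areas are added.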
It’s about communication
The focus of the above outline is on answering the question, “What kind of information will we need in order to begin planning an effective data governance program that will help us deliver benefits quickly while providing a foundation that can be scaled larger as we progress?”
Note the reference in the metadata discussion to “human-human, human-machine, & machine-machine communication.” We’re not just talking about formal data dictionaries for defined database contents. Nor are we limiting ourselves to process flow diagrams that emphasize key decision points in repetitive or defined workflows. Instead, our focus is on the exchange of data among people and systems (“communication”), which in any complex organization gets messy very quickly as more people and organizations become involved.
Start at the beginning
Hence the need, at the outset, to define an important application area in order to control the scope of the design and implementation process. The application area chosen for initial attention should be an important one, perhaps something like the following:
- “This is the third month we’ve had to backorder components for our most popular product. Can’t we do a better job of predicting demand?”
- “We’re spending too much time manually crunching numbers for our monthly Board report. Can’t we do a better job of standardizing the sales reports we get from our subsidiaries?”
- “The CEO thinks that Global Warming is going to impact our seasonal demand correction factors for our products that are temperature sensitive. Can we get in front of this before her next data request?”
- “I think we’re paying out too much on fraudulent claims for new electronic medical devices. How can we get a better handle on this?”
- “Our neighborhood crime and fire reports are delayed because the systems we’re drawing data from use different geographic coordinate systems. How can we get these data onto the county’s web site faster?”
I wish to thank Jim Lola, Will Moore, and Mark Bruscke for helping me think about these topics. Of course, any mistakes or misunderstandings are my own!