How Important Is ‘Total Cost of Standardization’ to the DATA Act?
Last week I attended a meeting in DC sponsored by the Data Transparency Coalition, PwC, and Intel. Representatives of Federal agencies likely to manage implementation of the DATA Act presented thoughts on implementing the Act’s requirements for standardizing and reporting on Federal financial data.
Successful implementation of the Act’s required standardization of Federal spending data should bring several benefits, including:
- People and systems that previously had difficulties communicating will be able to talk with each other.
- Data currently locked in difficult-to-access paper and .pdf documents will be made machine-readable, accessible, and reusable.
- Links and metadata tailored to specific organizations and industries will be standardized, thereby improving contextual access and data filtering.
- The ability to track and reconcile appropriations and disbursements will improve.
None of this will happen overnight. Based on the presentations by Treasury and Recovery Accountability and Transparency Board staff, there is a high probability it will happen eventually. Having been involved in various data conversion and standardization efforts myself, I am certainly supportive of the planned efforts. Still, there are caveats:
- Will the DATA Act provide sufficient resources to ensure that staff, technology, and industry collaboration are available and effectively planned and managed?
- Will appropriate governance mechanisms be in place to ensure that what the DATA Act requires is accompanied by appropriate authority and accountability mechanisms?
- If a choice has to be made between (1) making nonstandardized or even “dirty” data accessible and usable in the short term and (2) waiting longer until data and metadata standards are developed, agreed upon, and implemented, how will such decisions be made?
Total Cost of Standardization
How the third question is addressed will depend partly on who has to bear the costs of “standardization” and how significant those costs are. For example, it might be possible to quickly publish some existing data files “as is” or with minimal editing without waiting for complex data and metadata standards to be agreed upon and implemented. Commercial cloud-based services such as Socrata already exist to support this option.
If this option is followed, data can be made available rapidly for both private sector and public sector use and reuse. Doing so will enhance “transparency” in the short term, and APIs can be provided to enable user-driven customization, which further advances access and usability.
Meanwhile, work can still be ongoing in parallel to develop and implement more comprehensive standards and process improvements.
From a total cost of standardization perspective, such a “dual track” approach might not be optimal. Putting short-term access procedures in place in parallel with more comprehensive approaches to standardizing data, systems, and processes could increase costs over time if, at some future point, the systems and process dependencies associated with the short-term option have to be modified and synchronized with a longer-term and more comprehensive solution. As the number of system interconnections and user dependencies increases, so too do potential resistance to change and total cost.
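The trade-off described above can be sketched with a toy cost model. Everything in it is an illustrative assumption and is not drawn from the DATA Act or any agency estimate: the function names, the linear rework term, and the sample figures are all hypothetical.

```python
# Toy model of the "dual track" trade-off. All names and numbers are
# hypothetical; the linear rework term is an assumption for illustration.

def single_track_cost(standardize_cost):
    """Cost of waiting for full standardization before publishing anything."""
    return standardize_cost


def dual_track_cost(publish_cost, standardize_cost,
                    rework_per_dependency, dependencies_at_cutover):
    """Cost of publishing data "as is" now and standardizing in parallel.

    Each system or user dependency built on the short-term option is
    assumed to need rework when the long-term standard replaces it.
    """
    rework = rework_per_dependency * dependencies_at_cutover
    return publish_cost + standardize_cost + rework


def dual_track_premium(publish_cost, rework_per_dependency,
                       dependencies_at_cutover):
    """Extra cost of the dual track relative to the single track."""
    return publish_cost + rework_per_dependency * dependencies_at_cutover


if __name__ == "__main__":
    # Hypothetical figures in arbitrary cost units.
    for deps in (0, 10, 50):
        total = dual_track_cost(10, 100, 2, deps)
        print(deps, total, total - single_track_cost(100))
```

Under these assumptions the premium for the dual track grows with the number of dependencies that accumulate before cutover. The model deliberately leaves out the value of earlier access to the data, which is the other side of the trade-off and would have to be weighed against that premium.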
Short Term vs. Long Term
Ideally, a governance mechanism is in place that is capable of analyzing such trade-offs and empowered to act upon them. Several factors would influence the ability to effectively manage possible conflicts between short-term and long-term solutions.
The first factor, as already noted, is whether an effective governance mechanism exists that is empowered to assess and act upon such trade-offs. Given that multiple Federal agencies will be impacted, such collaboration will need to be managed and not just “encouraged.” We have already seen an example of what happens when centralized management authority is lacking in such complex situations.
The second is the nature of the project management environment within the Federal government today. Given today’s realities of politics, sequestration, continuing budget resolutions, procurement complexities, and senior staff retirements, the management environment will likely continue to be a challenge to navigate.
The third consideration is changing technology. This impact may be difficult to predict. Given constant improvements in data analysis and visualization tools, it might be appropriate to ask whether data standardization is as important as it once was, given the potential ability of some systems and data sets to interoperate in real time without massive data standardization at the front end.
Whether benefits can be delivered without complete standardization, the question raised by this third point, will depend on the types of data involved, the types of analysis required, and the risks of error. Cutting benefits checks and preparing statutorily required financial reports will be less tolerant of “dirty” or nonstandardized data than industry analyses in which patterns and trends, not individual transactions, are the focus of analysis using evolving “big data” methods.
When you start to peel back the layers involved in data standardization and evaluate how people, processes, and industries are impacted, you can’t ignore costs and benefits. As is common with many systems, the costs are often easier to quantify than the benefits.
Having been through a variety of data standardization and transformation exercises in several different industries and application areas, I continue to be a firm believer in the value of data standardization.
I’m also aware that there’s more than one way to skin a cat, especially given how much progress has already been made via mechanisms such as Data.gov and Recovery.gov. I’m looking forward to seeing how those responsible for implementing the DATA Act take all of this into account!
Copyright (c) 2013 by Dennis D. McDonald. For more about Data Program Management go here. Contact Dennis by email at email@example.com or by phone at 703-402-7382. Check out his curated Managing Data collection on Google+.