Even now, when you read about big data or attend conferences and webinars, you're much more likely to hear about products and tools. You don't hear as much about the "back room" management issues you need to address to make sure all members of a project team are sharing information and marching in the same direction.
While it may be inevitable that all government data collection efforts will have to tighten their belts, one hopes that tough prioritization decisions will be made in light of rational factors such as the value of the data to users, the cost of collecting it, the availability of alternatives, and the manner in which data management processes are governed.
This group understands that the data resources stewarded by Commerce programs both reflect and serve as critical inputs to U.S. technical and industrial competitiveness. Ideally, the group will be able to facilitate an exchange of useful "lessons learned" and resources across the varied Commerce programs.
Such challenges are not unique to the Federal Government. All large organizations that want to take a more strategic approach to how data, the lifeblood of organizational processes, are managed and released will have to address similar governance issues.
A key feature of the Project Open Data effort being managed by OMB and OSTP is that so much of it is being conducted in the open, using accessible resources such as shared documentation, a defined metadata schema, and GitHub for capturing comments and issues. Agencies that want to involve private sector vendors in their open data efforts should consider the use and management of such tools a required part of program governance and oversight (assuming, of course, that sufficient staff and resources are provided to manage such efforts).
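To make the "defined metadata schema" concrete, here is a minimal sketch of a dataset entry following the published Project Open Data metadata schema (the field names reflect that schema, but the agency, dataset, and contact values are hypothetical), along with a simple pre-publication check that the required fields are present:

```python
import json

# A hypothetical dataset entry using Project Open Data schema field names.
dataset = {
    "title": "Example Agency Dataset",
    "description": "A hypothetical dataset entry for illustration.",
    "keyword": ["example", "open data"],
    "modified": "2015-01-01",
    "publisher": {"name": "Example Agency"},
    "contactPoint": {"fn": "Jane Doe", "hasEmail": "mailto:open.data@example.gov"},
    "identifier": "example-agency-dataset-001",
    "accessLevel": "public",
}

# Fields the schema designates as required for every dataset entry.
REQUIRED = {"title", "description", "keyword", "modified", "publisher",
            "contactPoint", "identifier", "accessLevel"}

# Report any required fields that are missing before publishing.
missing = REQUIRED - dataset.keys()
print(json.dumps(sorted(missing)))  # → "[]" when all required fields exist
```

A lightweight check like this is exactly the kind of routine governance task that benefits from being run and tracked in the open, e.g., as part of an agency's GitHub-based review workflow.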
It's good to see the Federal government and the private sector working together to create value from data, value that might not be realized if the data's use were restricted to specifically funded and legislated programs.
What's different about this newly announced NOAA program is not just its potential "big data" scope but the way private sector cloud vendors are involved as intermediaries, not only to the public but also to potential data vendors and resellers.
Realistically, it's also impossible to control what people say online about companies. Just type the name of any large company into Google followed by the word "sucks" and you'll see what I mean.
By itself, just documenting a process is never enough. Also needed is a sufficiently resourced governance structure that supports program management in how systems and processes are changed and in how they operate.
What also sets the NOAA data partnership program apart is not just its focus on making data available for access and exploitation via the cloud but the changing market environment in which it will operate.
Still, knowing that data exist (which is what the inventories will tell us) is not the same as being able to access and interpret them. Even assuming the public eventually gains access to the inventoried data, we'll still need contextual information about the programs the data describe and measures of the impacts those programs have.
Much of what the EPA staff talked about involved processes and activities that are necessarily associated not only with "open data" but with any data-intensive business process. Data must be managed. Systems that share data need to be coordinated. Resources need to be allocated and shared. Such requirements are not unique to "open data"; they are universally relevant.
While it is true that a platform such as GitHub is not really designed to be as user friendly as, say, Facebook, the sharing of technical expertise among mid-level IT staff and data administrators across government agencies has probably been at least as important to open data progress as the Administration's top-down support.