New Government Web Site Makes “Restricted Microdata” More Accessible To Researchers
ResearchDataGov
The article Accessing U.S. data for research just got easier in the December 8 issue of AAAS Science describes a step forward for researchers who need government-generated socioeconomic research data that is sometimes difficult to access. From the article:
The U.S. government has just made it easier for social scientists to get their hands on federally collected data they need for research.
Starting today, an online portal offers one-stop shopping to apply for access to protected data sets maintained by 16 federal agencies. Scientists can search the site to find the data set they want and then, with one click, file an application. The site also allows them to track the status of their request.
“This will be transformational,” says economist Margaret Levenstein of the University of Michigan, Ann Arbor, where she manages the country’s largest curated social science data archive. “I work with thousands of researchers, and I know what they have to go through to find what they need and then get access to it.”
To understand the significance of the ResearchDataGov online portal, it’s important to understand (a) what the word “access” means and (b) to whom such access is important.
Knowing that data may exist is the first step in obtaining that data. By developing ResearchDataGov, the US government helps make researchers aware of the specialized data files available for analysis from 16 separate agencies*. Data files identified through ResearchDataGov may have a specialized focus or content that makes them inappropriate for general public access via existing public data sources such as Data.gov, which was established when the “open data” movement became popular a decade ago. (Disclosure: for a time “open data” was a major focus for my own consulting and writing; for example, see Understanding How Open Data Reaches the Public from 2013.)
ResearchDataGov acts as a searchable front-end catalog to over 1,000 specialized data files via a standardized set of metadata fields described here. Examples of these standardized metadata fields are data content, source, and, importantly, access permission requirements.
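To make the idea of a catalog built on standardized metadata more concrete, here is a minimal Python sketch. It is not ResearchDataGov's actual schema or software: the class name CatalogRecord, its field names, the search function, and the two sample entries are all hypothetical, chosen only to mirror the kinds of fields mentioned above (content, source, and access requirements).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CatalogRecord:
    """One hypothetical catalog entry; field names are illustrative only."""
    title: str
    source_agency: str
    topics: Tuple[str, ...]
    access_requirements: str


def search(records: List[CatalogRecord],
           keyword: str = "",
           agency: Optional[str] = None) -> List[CatalogRecord]:
    """Return records whose title or topics contain the keyword
    (case-insensitive), optionally restricted to a single agency."""
    needle = keyword.lower()
    hits = []
    for rec in records:
        haystack = " ".join((rec.title, *rec.topics)).lower()
        if needle in haystack and (agency is None or rec.source_agency == agency):
            hits.append(rec)
    return hits


# Made-up sample entries, not real catalog content.
SAMPLE = [
    CatalogRecord(
        title="Example household survey file",
        source_agency="Census Bureau",
        topics=("Demographics", "Poverty"),
        access_requirements="Application required; enclave access",
    ),
    CatalogRecord(
        title="Example farm finance file",
        source_agency="USDA Economic Research Service",
        topics=("Agriculture and Food",),
        access_requirements="Application required; virtual access",
    ),
]

if __name__ == "__main__":
    for rec in search(SAMPLE, keyword="poverty"):
        print(rec.title, "|", rec.source_agency, "|", rec.access_requirements)
```

The point of the sketch is that every entry carries its access requirements alongside its description, so a researcher can see what is needed to get at a file at the same moment they discover it exists.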
The ResearchDataGov system may not itself provide the physical data files; the researcher must still interact with the data’s “owners” to download and physically interact with the data. From ResearchDataGov’s “about” page:
The data described in ResearchDataGov.org are owned by and accessed through the agencies and units of the federal statistical system. Data access is determined by the owning or distributing agency and is limited to specific physical or virtual data enclaves. Even though all data assets are listed in a single inventory, they are not necessarily available for use in the same location(s). Multiple data assets accessed in the same location may not be able to be used together due to disclosure risk and other requirements. Please note the access modality of the data in which you are interested and seek guidance from the owning agency about whether assets can be linked or otherwise used together.
At this point the reader might say “Ha! The system does not really provide access to specialized data! The researcher still has to jump through hoops with data owners to get the data!”
I suppose that’s true, but it's important not to underestimate the value of a standardized catalog or inventory of what’s potentially available. It’s also appropriate to take into account the real-world situation for each of the data files covered by ResearchDataGov, given the many different topics** addressed by the data files. Specialized researchers who are already members of the research community surrounding each file may already (a) know about the existence of such data files and (b) know the “ins and outs” of what it takes to get physical access to the files for their own analytical purposes. Tools such as ResearchDataGov can be especially valuable to researchers who are not already members of each file’s existing “in group” of knowledgeable researchers.
Operation of ResearchDataGov does not remove the responsibility each data file owner has to protect the contents of that file while making sure its use complies with legitimate requirements. Perhaps special consent restrictions exist that are associated with protecting the privacy of the individual research subjects who provided the data. Perhaps there are special content-related security or intellectual property conditions associated with the data. Maybe there are special secondary or tertiary data files needed to enable accurate interpretation of the data in the main source file.
Are there many “second tier” users? That’s hard to say. Modern analytical and “big data” tools do make it possible to sift through very large and complex data sets in ways that the data files’ creators may not have anticipated. Opening up access to a broader group of researchers might therefore expand the potential for learning something new. As a public benefit, that should interest all thoughtful taxpayers. (In some jurisdictions, though, regulations may limit future uses of human-sourced data that go beyond the original subject’s “consent” agreement; see Implications of Research Data "Informed Re-Consent" Requirements for a discussion of this issue in the context of China’s “reconsent” requirements.)
Which gets us to the question of costs and benefits.
It takes resources to maintain a system like the one described here. Nor can you predict how the data’s usage or value might evolve, especially if you expand the potential user pool beyond those initially responsible for generating and analyzing the data. We have learned over the years that libraries, archives, data repositories, and data inventories provide the means for increasing the potential for generating social value down the road. ResearchDataGov is an example: it not only makes initial access easier for knowledgeable users but may also increase the pool of potential users who might not otherwise find out about what’s available.
Copyright 2022 by Dennis D. McDonald
*Agencies
Agencies currently (December 2022) making data available via ResearchDataGov include:
Bureau of Economic Analysis
Bureau of Justice Statistics
Bureau of Labor Statistics
Bureau of Transportation Statistics
Census Bureau
Energy Information Administration
FRB Microeconomic Surveys Unit
IRS Statistics of Income Division
National Center for Education Statistics
National Center for Health Statistics
National Center for Science and Engineering Statistics
SAMHSA Center for Behavioral Health Statistics and Quality
SSA Office of Research, Evaluation, and Statistics
USDA Economic Research Service
USDA National Agricultural Statistics Service
USDA National Animal Health Monitoring System
**Topics
Topics currently addressed (December 2022) by ResearchDataGov data files include:
Agriculture and Food
Arts, Culture, and Religion
Assistance Programs
Behavioral Risk Factors
Business and Economy
Community and Economic Development
Crime
Demographics
Disability
Diseases and Conditions
Early Childhood
Education
Employment Characteristics
Energy
Environment
Families and Living Arrangements
Geography
Government
Health
Health Care and Insurance
Housing and Homeownership
Immigration, Migration, and Commuting Patterns
Income, Pay, and Benefits
Industry and Occupation
Inflation and Prices
Injuries and Illnesses
International
Multinational Enterprises
Poverty
Research and Development
Retirement
Rural Economy and Population
Spending and Time Use
STEM
Taxation
Transportation
Unemployment
Vital Statistics
Voting and Registration