www.ddmcd.com


New Government Web Site Makes “Restricted Microdata” More Accessible To Researchers

By Dennis D. McDonald

ResearchDataGov

The article “Accessing U.S. data for research just got easier” in the December 8 issue of AAAS Science describes a step forward for researchers who need access to government-generated socioeconomic research data that can sometimes be difficult to obtain. From the article:

The U.S. government has just made it easier for social scientists to get their hands on federally collected data they need for research.

Starting today, an online portal offers one-stop shopping to apply for access to protected data sets maintained by 16 federal agencies. Scientists can search the site to find the data set they want and then, with one click, file an application. The site also allows them to track the status of their request.

“This will be transformational,” says economist Margaret Levenstein of the University of Michigan, Ann Arbor, where she manages the country’s largest curated social science data archive. “I work with thousands of researchers, and I know what they have to go through to find what they need and then get access to it.”

To understand the significance of the ResearchDataGov online portal, it’s important to understand (a) what the word “access” means and (b) to whom such access is important.

Knowing that data may exist is the first step toward obtaining it. By developing ResearchDataGov, the US government helps make researchers aware of the specialized data files available for analysis from 16 separate agencies*. Data files identified through ResearchDataGov may have a specialized focus or content that makes them inappropriate for general public access via existing public data sources such as Data.gov, which was established when the “open data” movement became popular a decade ago. (Disclosure: for a time “open data” was a major focus of my own consulting and writing; for example, see Understanding How Open Data Reaches the Public from 2013.)

ResearchDataGov acts as a searchable front-end catalog to over 1,000 specialized data files, described using a standardized set of metadata fields described here. Examples of these standardized metadata fields include data content, source, and, importantly, access permission requirements.

The ResearchDataGov system may not itself provide the physical data files; the researcher must still deal with the data’s “owners” to download or otherwise work with the data. From ResearchDataGov’s “about” page:

The data described in ResearchDataGov.org are owned by and accessed through the agencies and units of the federal statistical system. Data access is determined by the owning or distributing agency and is limited to specific physical or virtual data enclaves. Even though all data assets are listed in a single inventory, they are not necessarily available for use in the same location(s). Multiple data assets accessed in the same location may not be able to be used together due to disclosure risk and other requirements. Please note the access modality of the data in which you are interested and seek guidance from the owning agency about whether assets can be linked or otherwise used together.

At this point the reader might say, “Ha! The system does not really provide access to specialized data! The researcher still has to jump through hoops with data owners to get the data!”

I suppose that’s true, but it’s important not to underestimate the value of a standardized catalog or inventory of what’s potentially available. It’s also appropriate to take into account the real-world situation for each of the data files covered by ResearchDataGov, given the many different topics** the files address. Specialized researchers who are already members of the research community surrounding each file may already (a) know about the existence of such data files and (b) know the “ins and outs” of what it takes to get physical access to the files for their own analytical purposes. Tools such as ResearchDataGov can be especially valuable to researchers who are not already members of each file’s existing “in group” of knowledgeable researchers.

Operation of ResearchDataGov does not remove the responsibility each data file owner has to protect the contents of that file while making sure its use complies with legitimate requirements. Perhaps special consent restrictions exist that protect the privacy of the individual research subjects who provided the data. Perhaps there are special content-related security or intellectual property conditions associated with the data. Maybe there are secondary or tertiary data files needed to interpret the data in the main source file accurately.

Are there many “second tier” users? That’s hard to say. Modern analytical and “big data” tools do make it possible to sift through very large and complex data sets in ways that the data files’ creators may not have anticipated. Opening up access to a broader group of researchers might therefore expand the potential for learning something new. As a public benefit, that should interest all thoughtful taxpayers. (In some jurisdictions, however, regulations may limit future uses of human-sourced data that go beyond the original subject’s “consent” agreement; see Implications of Research Data "Informed Re-Consent" Requirements for a discussion of this issue in the context of China’s “reconsent” requirements.)

Which gets us to the question of costs and benefits.

It takes resources to maintain a system like the one described here. And you can’t predict how data’s usage or value might evolve, especially if you expand the potential user pool beyond those initially responsible for generating and analyzing the data. We have learned over the years that libraries, archives, data repositories, and data inventories increase the potential for generating social value down the road. ResearchDataGov is an example: it not only makes initial access easier for knowledgeable users but may also expand the pool of potential users who might not otherwise find out about what’s available.

Copyright 2022 by Dennis D. McDonald

*Agencies

Agencies currently (December 2022) making data available via ResearchDataGov include:

  1. Bureau of Economic Analysis

  2. Bureau of Justice Statistics

  3. Bureau of Labor Statistics

  4. Bureau of Transportation Statistics

  5. Census Bureau

  6. Energy Information Administration 

  7. FRB Microeconomic Surveys Unit

  8. IRS Statistics of Income Division

  9. National Center for Education Statistics

  10. National Center for Health Statistics

  11. National Center for Science and Engineering Statistics

  12. SAMHSA Center for Behavioral Health Statistics and Quality

  13. SSA Office of Research, Evaluation, and Statistics

  14. USDA Economic Research Service

  15. USDA National Agricultural Statistics Service 

  16. USDA National Animal Health Monitoring System

**Topics

Topics currently addressed (December 2022) by ResearchDataGov data files include:

  • Agriculture and Food

  • Arts, Culture, and Religion

  • Assistance Programs

  • Behavioral Risk Factors

  • Business and Economy

  • Community and Economic Development

  • Crime

  • Demographics

  • Disability

  • Diseases and Conditions

  • Early Childhood

  • Education

  • Employment Characteristics

  • Energy

  • Environment

  • Families and Living Arrangements

  • Geography

  • Government

  • Health

  • Health Care and Insurance

  • Housing and Homeownership

  • Immigration, Migration, and Commuting Patterns

  • Income, Pay, and Benefits

  • Industry and Occupation

  • Inflation and Prices

  • Injuries and Illnesses

  • International

  • Multinational Enterprises

  • Poverty

  • Research and Development

  • Retirement

  • Rural Economy and Population

  • Spending and Time Use

  • STEM

  • Taxation

  • Transportation

  • Unemployment

  • Vital Statistics

  • Voting and Registration
