(Click "Read more" below for a July 14, 2021 update and link to Part II)
Articles and resources provided May 20, 2021 by Beth Jarosz and the SDA Board
- The Census Bureau's resources: (link)
- You can find easy-to-access data for your own analyses at NHGIS (link). We encourage you to conduct analyses in May 2021, as the deadline for feedback on redistricting data products is May 28, 2021.
- The Census2020Now page offers a variety of resources and perspectives: (link)
- CNSTAT held a workshop on the initial demonstration data. You can find CNSTAT workshop materials (presentations and recordings) (link) and report (link).
(Notes: These resources will remain at the top of this thread for context. We changed the news series title to "disclosure avoidance" rather than "differential privacy" because DP is just one step in the DAS framework.)
(Updated July 14, 2021)
Click here for a link to Part II of Articles on Disclosure Avoidance
(Updated June 27, 2021)
Working paper contributed by Steven Ruggles and David Van Riper
- Click here for a link to the working paper "The Role of Chance in the Census Bureau Database Reconstruction Experiment"
(Updated June 22, 2021)
Articles contributed by Alexis R. Santos
1. How differential privacy will impact our understanding of health disparities in the United States (https://www.pnas.org/content/117/24/13405)
Using the 2010 decennial counts produced with proposed differential privacy and traditional techniques, we evaluate how the implementation of differential privacy can affect understandings of mortality rates by obscuring accurate denominators. We find that the implementation of differential privacy will produce dramatic changes in population counts for racial/ethnic minorities in small areas and less urban settings, significantly altering knowledge about health disparities in mortality.
1b. Census differential privacy products — implications for health disparities research (https://medium.com/@alexisrsantos/census-differential-privacy-products-implications-for-health-disparities-research-5d56159a7165)
I replicate the analysis published in Santos, Howard and Verdery (2020) using a second specification of the demonstration product. The revised demonstration products continue to perform well for the total population and non-Hispanic whites. The results continue to show substantial variation in artificial changes in population counts and mortality rate estimates for non-Hispanic Blacks and Hispanics. Since the main conclusions of the article published in PNAS still hold with the revised demonstration product, I think we need more time to evaluate the implementation of DP and its implications for Census 2020 tabulations and subsequent census products.
2. Differential Privacy in the 2020 Census Will Distort COVID-19 Rates (https://journals.sagepub.com/doi/full/10.1177/2378023121994014)
Using empirical COVID-19 mortality curves, the authors show that differential privacy will introduce substantial distortion in COVID-19 mortality rates, sometimes causing mortality rates to exceed 100 percent, hindering our ability to understand the pandemic. This distortion is particularly large for population groupings with fewer than 1,000 persons: 40 percent of all county-level age-sex groupings and 60 percent of race groupings. The U.S. Census Bureau should consider a larger privacy budget, and data users should consider pooling data to minimize differential privacy’s distortion.
3. How differential privacy will affect our understanding of population growth in the United States (https://osf.io/preprints/socarxiv/pmux7/)
We test the potential impact of this change in disclosure avoidance systems on the tracking of population growth and distribution using county-level population counts. We ask how population counts produced under the differential privacy algorithm might lead to different conclusions about population growth for the total population and three major racial/ethnic groups, compared with counts produced using the traditional methods. Our results suggest that the implementation of differential privacy, as proposed, will impact our understanding of population changes in the US. We find potential for overstating and understating growth and decline, with these effects being more pronounced for non-Hispanic blacks and Hispanics, as well as for non-metropolitan counties.
4. Proposed U.S. Census Bureau Differential Privacy Method is Biased Against Rural and Non-white Populations (Tom Mueller and Alexis R. Santos) (https://osf.io/preprints/socarxiv/69mtk/)
We investigate how the proposed differential privacy method alters population counts for the total population and ethnoracial groups, with emphasis on rural-urban continuum codes and regional divisions of the United States. We found, with both levels of epsilon (4.5 and 12.2), substantial variation in ethnoracial population counts by level of rurality and regional subdivision. For Hispanics, non-Hispanic blacks, and non-Hispanic American Indians, we find that the proposed disclosure avoidance system, at both epsilon levels, results in highly variable estimates of population growth.
5. Changes in Census Data Will Affect Our Understanding of Infant Health (Alexis R. Santos)
This data visualization illustrates how the implementation of the proposed disclosure avoidance system, which relies on differential privacy, affects infant mortality rate estimation. Results indicate that infant mortality rates produced using the proposed DAS differ from those produced using the traditional methods, with higher variation observed for nonmetropolitan counties and areas with smaller populations. The results add to my two previous studies (items 1 and 2), which illustrate how the implementation obscures our understanding of health disparities in the United States.
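The denominator effect described in items 1, 2, and 5 can be illustrated with a toy calculation. This is a minimal sketch, not the Census Bureau's Disclosure Avoidance System or any author's method: the death counts, true populations, and "published" noisy populations below are hypothetical values chosen only to show why the same absolute error in a population count distorts rates far more for small groups than for large ones.

```python
def mortality_rate_per_1000(deaths: int, population: int) -> float:
    """Crude mortality rate per 1,000 persons."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * deaths / population


def relative_distortion(deaths: int, true_pop: int, noisy_pop: int) -> float:
    """Proportional change in the rate when a noisy count replaces the true count."""
    if deaths <= 0:
        raise ValueError("deaths must be positive to compare rates")
    true_rate = mortality_rate_per_1000(deaths, true_pop)
    noisy_rate = mortality_rate_per_1000(deaths, noisy_pop)
    return noisy_rate / true_rate - 1


# Hypothetical small group: 12 deaths, true population 40, published count 9.
small_true = mortality_rate_per_1000(12, 40)   # 300 per 1,000
small_noisy = mortality_rate_per_1000(12, 9)   # exceeds 1,000 per 1,000, i.e. over 100 percent

# Hypothetical large group: 400 deaths, true population 40,000, published count 39,969.
large_change = relative_distortion(400, 40_000, 39_969)

print(f"small group: {small_true:.1f} -> {small_noisy:.1f} per 1,000")
print(f"large group: rate changes by {large_change:.4%}")
```

Both hypothetical groups have the same absolute error of 31 persons in the denominator, yet the small group's rate more than quadruples (and passes 100 percent) while the large group's rate barely moves, which is the pattern the papers above describe for groups with small populations.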
(Updated June 9, 2021)
Articles and resources provided by David A. Swanson
- Click here for a link to "The Fundamental Flaw in Synthetic Microdata: A Simple Example." The US Census Bureau is examining the possibility of creating “synthetic microdata sets” to be used by researchers in place of the “PUMS” (Public Use Microdata Sample) files now being made available. Along with others, I believe that this is a mistake. These synthetic data sets will be created by modeling the relationships in the “real” data and then using the models to generate synthetic values. There is a fundamental flaw in this approach: those attempting to use a synthetic microdata set will have no idea of the relationships in the “real” data underlying the synthetic microdata, or of the form(s) of the model(s) applied to the real data to create the synthetic data.
- Click here for a link to "Census Bureau Sets Key Parameters to Protect Privacy in 2020 Census Results" on the United States Census Bureau website.
- Click here for a link to "Census Releases Guidelines for Controversial Privacy Tool" by Mike Schneider, Associated Press.
(Updated June 7, 2021)
Article provided by Teresa A. Sullivan and Qian Cai
- Click here for a link to the article "Differential Privacy and the Upcoming Process of Redistricting" published in Sabato's Crystal Ball
(Updated June 4, 2021)
Article and brief description contributed by Margo Anderson
- Click here for a ResearchGate link to "Redistricting Criteria and Testing the Adequacy of the 2020 Census Disclosure Avoidance System." This is a response to the Census Bureau's April 28, 2021 Disclosure Avoidance System Demonstration Products in light of other responses submitted to the U.S. Census Bureau. It focuses on the adequacy of the 2020 Census Disclosure Avoidance System to meet the standard legal requirements for redistricting and voting rights enforcement.
(Updated June 3, 2021)
Information provided by David Van Riper
- IPUMS created state-level summaries of differences between census block counts in the 2010 Decennial Census and the recently released 2021-04-28 demonstration data. Click here to access.
(Updated June 2, 2021)
Paper written by Christopher T. Kenny, Shiro Kuriwaki, Cory McCartan, Evan Rosenman, Tyler Simko, and Kosuke Imai
- Click here for a blog-style post on their paper "Impact of the Census Disclosure Avoidance System on Redistricting"
- Click here for a direct link to the same paper
(Updated May 30, 2021)
Analysis files and feedback letters provided by Mike Mohrman
- Click here for the direct link to their feedback letter.
- Click here for all of their analysis files, and previous feedback letters. They are under the ‘Impacts of privacy protections’ section of their ‘2020 census data quality and accuracy’ page.
(Updated May 27, 2021)
Article written by Mike Schneider and provided by David A. Swanson
- Click here for the link to a story in The Washington Post on the U.S. Census Bureau's plan to use "synthetic data" for ACS PUMS entitled "Census Bureau's use of 'synthetic data' worries researchers"
- Click here for the article via AP News if you encounter a paywall with the previous link.
(Updated May 25, 2021)
The following is contributed by Michael Cline
- Click here for an overview comparing differential privacy to the original Summary File.
Report contributed by William P. O'Hare
- Click here for the description of the report "Analysis of Census Bureau's April 2021 Differential Privacy Demonstration Product: Implications for Data on Children" posted on the Count All Kids website. The link to the full report is at the end of the description.
(Updated May 19, 2021)
Articles provided by David A. Swanson
- There’s a move afoot to change the way the Census Bureau presents its findings; the Bureau says it will protect privacy. Intrepid writer and statistician David Swanson explains the problems with that. See his article, “The Census: Protecting Privacy versus Creating Useless Data.”
- Here is a link to a paper on the effect of Differential Privacy on census block population numbers in Alaska. It is by David A. Swanson, Tom Bryan, and Rich Sewell and compares the change in accuracy in going from an earlier (higher level of privacy protection) to the latest "demonstration product" released by the Census Bureau on April 28th (lower level of privacy protection). It will be presented at the Symposium on Data Science and Statistics on June 4th.
- Here is another link to a study of the effects of Differential Privacy on census block populations in Mississippi. It is by David A. Swanson and Ron Cossman and, like the Alaska paper (linked here and in the article in the first bullet point), it examines the change in accuracy in going from a higher to a lower level of privacy protection. It has been accepted for presentation at the annual meeting of the American Statistical Association this summer.