Abstract: We describe a role for differential privacy in open data repositories that handle sensitive data. Archival repositories in the human sciences must balance discoverability and replicability against their legal liabilities and ethical obligations to protect sensitive information. The ability to explore differentially private releases of archived data allows a curve-bending change in this trade-off. We further describe PSI, an implementation of a curator system for differentially private queries and statistical models, and its integration with the Dataverse repository. We discuss some of the pragmatics of implementing a general-purpose curator that works across a wide variety of data types and uses, and of presenting differential privacy to an applied audience new to these concepts.
Bio: James Honaker is a Research Associate at CRCS. Previously he was a Senior Research Scientist at IQSS and a faculty member at Penn State and UCLA. He leads development of PSI, the Private data Sharing Interface. His research focuses on statistical software solutions for broad problems in quantitative social science. He is an author of several widely used statistical software packages for quantitative social science, including Amelia (for missing data), Zelig (for statistical inference and interpretation), and TwoRavens (for data exploration of repositories). He won the 2014 Award for Best Statistical Research Software from the Society for Political Methodology.