The goal of this project is to improve replicability and reproducibility in social science by developing easy-to-use tools for researchers to share confidential research data in a privacy-protective manner, supported by rigorous computational, institutional, and legal foundations. We are motivated by the opportunities in social science created by massive new sources of data and developments in data analysis and sharing, and by the threat that privacy concerns pose to realizing the full potential of social science research. By leveraging ongoing multidisciplinary collaborations and theoretical advances in computation, statistics, law, and social science, the project aims to improve the replicability and reproducibility of data in empirical social science. Our goal is to develop and extend integrated privacy-preserving tools for enabling access to, use, and disclosure of social science data. The proposed project builds on a successful, ongoing multidisciplinary collaboration supported by an NSF Frontier grant: Privacy Tools for Sharing Research Data.

Objectives for this project include (1) analyzing the institutional and stakeholder incentives for managing research data privacy and the policy consequences of implementing new computational and legal privacy tools and concepts; (2) designing a blueprint for securing large-scale confidential archival data in the Dataverse repository; (3) exploring applications of our new computational and legal privacy tools to massive data and selected use cases, including online education data, human subjects research data, and economic data protected by NDAs; and (4) expanding research collaborations to engage with other differential privacy and privacy law experts, ongoing data privacy and dissemination efforts at MIT and Harvard, and several related Sloan Foundation projects.

This project is supported by a grant from the Sloan Foundation. For more information, please see the original proposed project description:

Principal Investigators

Salil Vadhan

Principal Investigator
Vicky Joseph Professor of Computer Science and Applied Mathematics, SEAS, Harvard

Home Page

Urs Gasser

Executive Director, Berkman Center for Internet & Society
Professor of Practice, Harvard Law School
Current Member of Datatags Team

Home Page

Micah Altman

Director of Research and Head/Scientist, Program on Information Science for the MIT Libraries, MIT
Non-Resident Senior Fellow, The Brookings Institution
Current Member of Datatags Team

Home Page

Gary King

Co-PI
Albert J. Weatherhead III University Professor, Harvard
Director, IQSS, Harvard

Home Page

Mercè Crosas

Co-PI
Director of Data Science, IQSS, Harvard

Home Page

Publications

Jack Murtagh, Kathryn Taylor, George Kellaris, and Salil Vadhan. 2018. “Usable Differential Privacy: A Case Study with PSI”. arXiv Page

PDF

Micah Altman, Alexandra Wood, David O'Brien, and Urs Gasser. 3/2018. “Practical Approaches to Big Data Privacy Over Time.” International Data Privacy Law, 8, 1, Pp. 29-51. Publisher's Version

Micah Altman, Alexandra Wood, and Effy Vayena. 2018. “A Harm-Reduction Framework for Algorithmic Accountability over Personal Information.” IEEE Security & Privacy , 16, 3, Pp. 34-45. Publisher's Version

harm-reduction_framework.pdf

Mark Bun, Jelani Nelson, and Uri Stemmer. 6/2018. “Heavy Hitters and the Structure of Local Privacy.” In ACM SIGMOD/PODS Conference International Conference on Management of Data (PODS 2018).

PDF

Victor Balcer and Salil Vadhan. 9/2019. “Differential Privacy on Finite Computers.” Journal of Privacy and Confidentiality, 9, 2. JPC Page Abstract

JPC 2019

ITCS 2018

ArXiv

Version History:

Also presented at TPDP 2017; preliminary version posted as arXiv:1709.05396 [cs.DS].

2018: Published in Anna R. Karlin, editor, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), volume 94 of Leibniz International Proceedings in Informatics (LIPIcs), pp 43:1-43:21. http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=8353

We consider the problem of designing and analyzing differentially private algorithms that can be implemented on discrete models of computation in strict polynomial time, motivated by known attacks on floating point implementations of real-arithmetic differentially private algorithms (Mironov, CCS 2012) and the potential for timing attacks on expected polynomial-time algorithms. As a case study, we examine the basic problem of approximating the histogram of a categorical dataset over a possibly large data universe \(X\). The classic Laplace Mechanism (Dwork, McSherry, Nissim, Smith, TCC 2006 and J. Privacy & Condentiality 2017) does not satisfy our requirements, as it is based on real arithmetic, and natural discrete analogues, such as the Geometric Mechanism (Ghosh, Roughgarden, Sundarajan, STOC 2009 and SICOMP 2012), take time at least linear in \(|X|\), which can be exponential in the bit length of the input.

In this paper, we provide strict polynomial-time discrete algorithms for approximate histograms whose simultaneous accuracy (the maximum error over all bins) matches that of the Laplace Mechanism up to constant factors, while retaining the same (pure) differential privacy guarantee. One of our algorithms produces a sparse histogram as output. Its "per-bin accuracy" (the error on individual bins) is worse than that of the Laplace Mechanism by a factor of \(\log |X|\), but we prove a lower bound showing that this is necessary for any algorithm that produces a sparse histogram. A second algorithm avoids this lower bound, and matches the per-bin accuracy of the Laplace Mechanism, by producing a compact and eciently computable representation of a dense histogram; it is based on an \((n + 1)\) - wise independent implementation of an appropriately clamped version of the Discrete Geometric Mechanism.

Merce Crosas. 2017. “The DataTags System: Sharing Sensitive Data with Confidence.” Research Data Alliance (RDA) 8th Plenary on Privacy Implications of Research Data Sets, during International Data Week 2016.

PDF

Harvard University Privacy Tools Project

Applying Theoretical Advances in Privacy to Computational Social Science Practice

Principal Investigators

Salil Vadhan

Urs Gasser

Micah Altman

Gary King

Mercè Crosas

Publications

h_seas_logo_rgb.png