Applying Theoretical Advances in Privacy to Computational Social Science Practice

The goal of this project is to improve replicability and reproducibility in social science by developing easy-to-use tools for researchers to share confidential research data in a privacy-protective manner, supported by rigorous computational, institutional, and legal foundations. We are motivated by the opportunities in social science created by massive new sources of data and developments in data analysis and sharing, and by the threat that privacy concerns pose to realizing the full potential of social science research. By leveraging ongoing multidisciplinary collaborations and theoretical advances in computation, statistics, law, and social science, the project aims to improve the replicability and reproducibility of data in empirical social science. Our goal is to develop and extend integrated privacy-preserving tools for enabling access to, use, and disclosure of social science data. The proposed project builds on a successful, ongoing multidisciplinary collaboration supported by an NSF Frontier grant: Privacy Tools for Sharing Research Data.

Objectives for this project include (1) analyzing the institutional and stakeholder incentives for managing research data privacy and the policy consequences of implementing new computational and legal privacy tools and concepts; (2) designing a blueprint for securing large-scale confidential archival data in the Dataverse repository; (3) exploring applications of our new computational and legal privacy tools to massive data and selected use cases, including online education data, human subjects research data, and economic data protected by NDAs; and (4) expanding research collaborations to engage with other differential privacy and privacy law experts, ongoing data privacy and dissemination efforts at MIT and Harvard, and several related Sloan Foundation projects. 

This project is supported by a grant from the Sloan Foundation. For more information, please see the original proposed project description:

cover sheet
project proposal
appendices

Principal Investigators

Salil Vadhan

Salil Vadhan

Vicky Joseph Professor of Computer Science and Applied Mathematics, SEAS, Harvard
Area Chair for Computer Science

Salil Vadhan is the lead PI of Privacy Tools for Sharing Research Data project and the Vicky Joseph Professor of Computer Science and Applied Mathematics...

Read more about Salil Vadhan
Urs Gasser

Urs Gasser

Executive Director, Berkman Center for Internet & Society
Professor of Practice, Harvard Law School
Current Member of Datatags Team
Micah Altman

Micah Altman

Director of Research and Head/Scientist, Program on Information Science for the MIT Libraries, MIT
Non-Resident Senior Fellow, The Brookings Institution
Current Member of Datatags Team
Gary King

Gary King

Albert J. Weatherhead III University Professor, Harvard
Director, IQSS, Harvard

Publications

Victor Balcer and Salil Vadhan. 1/2018. “Differential Privacy on Finite Computers.” in 9th Innovations in Theoretical Computer Science Conference (ITCS 2018); also presented at Theory and Practice of Differential Privacy Conference (TPDP 2017). arXiv PageAbstract

We consider the problem of designing and analyzing differentially private algorithms that can be implemented on discrete models of computation in strict polynomial time, motivated by known attacks on floating point implementations of real-arithmetic differentially private algorithms (Mironov, CCS 2012) and the potential for timing attacks on expected polynomialtime algorithms. We use a case study the basic problem of approximating the histogram of a categorical dataset over a possibly large data universe X . The classic Laplace Mechanism (Dwork, McSherry, Nissim, Smith, TCC 2006 and J. Privacy & Confidentiality 2017) does not satisfy our requirements, as it is based on real arithmetic, and natural discrete analogues, such as the Geometric Mechanism (Ghosh, Roughgarden, Sundarajan, STOC 2009 and SICOMP 2012), take time at least linear in |X |, which can be exponential in the bit length of the input.

In this paper, we provide strict polynomial-time discrete algorithms for approximate histograms whose simultaneous accuracy (the maximum error over all bins) matches that of the Laplace Mechanism up to constant factors, while retaining the same (pure) differential privacy guarantee. One of our algorithms produces a sparse histogram as output. Its “per-bin accuracy” (the error on individual bins) is worse than that of the Laplace Mechanism by a factor of log |X |, but we prove a lower bound showing that this is necessary for any algorithm that produces a sparse histogram. A second algorithm avoids this lower bound, and matches the per-bin accuracy of the Laplace Mechanism, by producing a compact and efficiently computable representation of a dense histogram; it is based on an (n + 1)-wise independent implementation of an appropriately clamped version of the Discrete Geometric Mechanism.

Vishesh Karwa and Salil Vadhan. 2018. “Finite Sample Differentially Private Confidence Intervals.” 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). arXiv PageAbstract
We study the problem of estimating finite sample confidence intervals of the mean of a normal population under the constraint of differential privacy. We consider both the known and unknown variance cases and construct differentially private algorithms to estimate confidence intervals. Crucially, our algorithms guarantee a finite sample coverage, as opposed to an asymptotic coverage. Unlike most previous differentially private algorithms, we do not require the domain of the samples to be bounded. We also prove lower bounds on the expected size of any differentially private confidence set showing that our the parameters are optimal up to polylogarithmic factors.
  •  
  • 1 of 4
  • »
More