Applying Theoretical Advances in Privacy to Computational Social Science Practice

The goal of this project is to improve replicability and reproducibility in social science by developing easy-to-use tools for researchers to share confidential research data in a privacy-protective manner, supported by rigorous computational, institutional, and legal foundations. We are motivated by the opportunities in social science created by massive new sources of data and developments in data analysis and sharing, and by the threat that privacy concerns pose to realizing the full potential of social science research. By leveraging ongoing multidisciplinary collaborations and theoretical advances in computation, statistics, law, and social science, the project aims to improve the replicability and reproducibility of data in empirical social science. Our goal is to develop and extend integrated privacy-preserving tools for enabling access to, use, and disclosure of social science data. The proposed project builds on a successful, ongoing multidisciplinary collaboration supported by an NSF Frontier grant: Privacy Tools for Sharing Research Data.

Objectives for this project include (1) analyzing the institutional and stakeholder incentives for managing research data privacy and the policy consequences of implementing new computational and legal privacy tools and concepts; (2) designing a blueprint for securing large-scale confidential archival data in the Dataverse repository; (3) exploring applications of our new computational and legal privacy tools to massive data and selected use cases, including online education data, human subjects research data, and economic data protected by NDAs; and (4) expanding research collaborations to engage with other differential privacy and privacy law experts, ongoing data privacy and dissemination efforts at MIT and Harvard, and several related Sloan Foundation projects. 

This project is supported by a grant from the Sloan Foundation. For more information, please see the original proposed project description:

cover sheet
project proposal
appendices

Principal Investigators

Salil Vadhan

Salil Vadhan

Principal Investigator
Vicky Joseph Professor of Computer Science and Applied Mathematics, SEAS, Harvard
Urs Gasser

Urs Gasser

Executive Director, Berkman Center for Internet & Society
Professor of Practice, Harvard Law School
Current Member of Datatags Team
Micah Altman

Micah Altman

Director of Research and Head/Scientist, Program on Information Science for the MIT Libraries, MIT
Non-Resident Senior Fellow, The Brookings Institution
Current Member of Datatags Team
Gary King

Gary King

Co-PI
Albert J. Weatherhead III University Professor, Harvard
Director, IQSS, Harvard

Publications

Victor Balcer and Salil Vadhan. 9/2019. “Differential Privacy on Finite Computers.” Journal of Privacy and Confidentiality, 9, 2. JPC PageAbstract

Version History: 

Also presented at TPDP 2017; preliminary version posted as arXiv:1709.05396 [cs.DS].

2018: Published in Anna R. Karlin, editor, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), volume 94 of Leibniz International Proceedings in Informatics (LIPIcs), pp 43:1-43:21. http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=8353

We consider the problem of designing and analyzing differentially private algorithms that can be implemented on discrete models of computation in strict polynomial time, motivated by known attacks on floating point implementations of real-arithmetic differentially private algorithms (Mironov, CCS 2012) and the potential for timing attacks on expected polynomial-time algorithms. As a case study, we examine the basic problem of approximating the histogram of a categorical dataset over a possibly large data universe \(X\). The classic Laplace Mechanism (Dwork, McSherry, Nissim, Smith, TCC 2006 and J. Privacy & Condentiality 2017) does not satisfy our requirements, as it is based on real arithmetic, and natural discrete analogues, such as the Geometric Mechanism (Ghosh, Roughgarden, Sundarajan, STOC 2009 and SICOMP 2012), take time at least linear in \(|X|\), which can be exponential in the bit length of the input.

In this paper, we provide strict polynomial-time discrete algorithms for approximate histograms whose simultaneous accuracy (the maximum error over all bins) matches that of the Laplace Mechanism up to constant factors, while retaining the same (pure) differential privacy guarantee.  One of our algorithms produces a sparse histogram as output. Its "per-bin accuracy" (the error on individual bins) is worse than that of the Laplace Mechanism by a factor of \(\log |X|\), but we prove a lower bound showing that this is necessary for any algorithm that produces a sparse histogram.  A second algorithm avoids this lower bound, and matches the per-bin accuracy of the Laplace Mechanism, by producing a compact and eciently computable representation of a dense histogram; it is based on an \((n + 1)\) - wise independent implementation of an appropriately clamped version of the Discrete Geometric Mechanism.

 

  •  
  • 1 of 4
  • »
More