Privacy Tools for Sharing Research Data
Privacy Tools for Sharing Research Data
Information technology, advances in statistical computing, and the deluge of data available through the Internet are transforming computational social science. However, a major challenge is maintaining the privacy of human subjects. This project is a broad, multidisciplinary effort to help enable the collection, analysis, and sharing of sensitive data while providing privacy for individual subjects. Bringing together computer science, social science, statistics, and law, the investigators seek to refine and develop definitions and measures of privacy and data utility, and design an array of technological, legal, and policy tools for dealing with sensitive data. In addition to contributing to research infrastructure around the world, the ideas developed in this project will benefit society more broadly as it grapples with data privacy issues in many other domains, including public health and electronic commerce.
This project will define and measure privacy in both mathematical and legal terms, and explore alternate definitions of privacy that may be more general or more practical. The project will study variants of differential privacy and develop new theoretical results for use in contexts where it is currently inappropriate or impractical. The research will provide a better understanding of the practical performance and usability of a variety of algorithms for analyzing and sharing privacy-sensitive data. The project will develop secure implementations of these algorithms and legal instruments, which will be made publicly available and used to enable wider access to privacy-sensitive data sets at the Harvard Institute for Quantitative Social Science's Dataverse Network.
For more information, see the original proposed project description to NSF (2012) in PDF format below.
Urs Gasser
James Honaker
Gary King
Phil Malone
Kobbi Nissim
David O’Brien
2020
We study the problem of verifying differential privacy for loop-free programs with probabilistic choice. Programs in this class can be seen as randomized Boolean circuits, which we will use as a formal model to answer two different questions: first...
2019
- description Publisher's Version
- picture_as_pdf JPC 2019
- picture_as_pdf ITCS 2018
- picture_as_pdf ArXiv
Version History:
Also presented at TPDP 2017; preliminary version posted as arXiv:1709.05396 [cs.DS].
2018: Published in Anna R. Karlin, editor, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), volume 94 of Leibniz International...
2018
This document is a primer on differential privacy, which is a formal mathematical framework for guaranteeing privacy protection when analyzing or releasing statistical data. Recently emerging from the theoretical computer science literature, differential...
In the study of differential privacy, composition theorems (starting with the original paper of Dwork, McSherry, Nissim, and Smith (TCC'06)) bound the degradation of privacy when composing several differentially private algorithms. Kairouz, Oh, and...
2017
We consider the problem of answering queries about a sensitive dataset subject to differential privacy. The queries may be chosen adversarially from a larger set Q of allowable queries in one of three ways, which we list in order from easiest to hardest...