Computing Over Distributed Sensitive Data

Large amounts of data are being collected about individuals by a variety of organizations: government agencies, banks, hospitals, research institutions, privacy companies, etc. Many of these organizations collect similar data, or data about similar populations. Sharing this data between organizations could bring about many benefits in social, scientific, business, and security domains. For example, by sharing their data, hospitals and small clinics can obtain statistically significant results in cases where the individual datasets are otherwise too small. Unfortunately, much of the collected data is sensitive: it contains personal details about individuals or information that may damage an organization’s reputation and competitiveness. The sharing of data is hence often curbed for ethical, legal, or business reasons. 

This project develops a collection of tools that will enable the benefits of data sharing without requiring data owners to share their data. The techniques developed respect principles of data ownership and privacy requirement, and draw on recent scientific developments in privacy, cryptography, machine learning, computational statistics, program verification, and system security. The tools developed in this project will contribute to existing research and business infrastructure, and hence enable new ways to create value in information whose use would otherwise have been restricted. The project supports the development of new curricula material and trains a new generation of researchers and citizens with the multidisciplinary perspectives required to address the complex issues surrounding data privacy.

This project is funded by grant 1565387 from the National Science Foundation to Harvard University and SUNY at Buffalo.


Kobbi Nissim

Kobbi Nissim

Senior Research Fellow at Harvard University
Professor of Computer Science at Georgetown University
  • «
  • 2 of 2


A new line of work [6, 9, 15, 2] demonstrates how differential privacy [8] can be used as a mathematical tool for guaranteeing generalization in adaptive data analysis. Specifically, if a differentially private analysis is applied on a sample S of i.i.d. examples to select a lowsensitivity function f , then w.h.p. f (S) is close to its expectation, although f is being chosen based on the data. Very recently, Steinke and Ullman [16] observed that these generalization guarantees can be used for proving concentration bounds in the non-adaptive setting, where the low-sensitivity function is fixed beforehand. In particular, they obtain alternative proofs for classical concentration bounds for low-sensitivity functions, such as the Chernoff bound and McDiarmid’s Inequality. In this work, we set out to examine the situation for functions with high-sensitivity, for which differential privacy does not imply generalization guarantees under adaptive analysis. We show that differential privacy can be used to prove concentration bounds for such functions in the non-adaptive setting.

E. Cicek, G. Barthe, M. Gaboardi, D. Garg, and J. Hoffmann. 1/2017. “Relational Cost Analysis.” Symposium on the Principle of Programming Languages, ACM.
G. Barthe, G. P. Farina, M. Gaboardi, E. J. Gallego Arias, A. D. Gordon, J. Hsu, and P.-Y. Strub. 10/2016. “Differentially Private Bayesian Programming .” 23rd ACM Conference on Computer and Communications Security, CCS.
  • 1 of 2
  • »