Computing Over Distributed Sensitive Data

Large amounts of data are being collected about individuals by a variety of organizations: government agencies, banks, hospitals, research institutions, privacy companies, etc. Many of these organizations collect similar data, or data about similar populations. Sharing this data between organizations could bring about many benefits in social, scientific, business, and security domains. For example, by sharing their data, hospitals and small clinics can obtain statistically significant results in cases where the individual datasets are otherwise too small. Unfortunately, much of the collected data is sensitive: it contains personal details about individuals or information that may damage an organization’s reputation and competitiveness. The sharing of data is hence often curbed for ethical, legal, or business reasons. 

This project develops a collection of tools that will enable the benefits of data sharing without requiring data owners to share their data. The techniques developed respect principles of data ownership and privacy requirement, and draw on recent scientific developments in privacy, cryptography, machine learning, computational statistics, program verification, and system security. The tools developed in this project will contribute to existing research and business infrastructure, and hence enable new ways to create value in information whose use would otherwise have been restricted. The project supports the development of new curricula material and trains a new generation of researchers and citizens with the multidisciplinary perspectives required to address the complex issues surrounding data privacy.

This project is funded by grant 1565387 from the National Science Foundation to Harvard University and SUNY at Buffalo.


Kobbi Nissim

Kobbi Nissim

Senior Research Fellow at Harvard University
Professor of Computer Science at Georgetown University
  • «
  • 2 of 2


Owen Arden, Anitha Gollamudi, Ethan Cecchetti, Stephen Chong, and Andrew C. Myers. 2020. “A Calculus for Flow-Limited Authorization.” Journal of Computer Security.Abstract

Real-world applications routinely make authorization decisions based on dynamic computation. Reasoning about dynamically computed authority is challenging. Integrity of the system might be compromised if attackers can improperly influence the authorizing computation. Confidentiality can also be compromised by authorization, since authorization decisions are often based on sensitive data such as membership lists and passwords. Previous formal models for authorization do not fully address the security implications of permitting trust relationships to change, which limits their ability to reason about authority that derives from dynamic computation. Our goal is a way to construct dynamic authorization mechanisms that do not violate confidentiality or integrity.

We introduce the Flow-Limited Authorization Calculus (FLAC), which is both a simple, expressive model for reasoning about dynamic authorization and also an information flow control language for securely implementing various authorization mechanisms. FLAC combines the insights of two previous models: it extends the Dependency Core Calculus with features made possible by the Flow-Limited Authorization Model. FLAC provides strong end-to-end information security guarantees even for programs that incorporate and implement rich dynamic authorization mechanisms. These guarantees include noninterference and robust declassification, which prevent attackers from influencing information disclosures in unauthorized ways. We prove these security properties formally for all FLAC programs and explore the expressiveness of FLAC with several examples.

Amos Beimel, Shay Moran, Kobbi Nissim, and Uri Stemmer. 2/2019. “Private Center Points and Learning of Halfspaces.” In 32nd Annual Conference on Learning Theory, vol 99: Pp. 1–14. Publisher's VersionAbstract
We present a private learner for halfspaces over an arbitrary finite domain X⊂ℝd with sample complexity mathrmpoly(d,2log∗|X|). The building block for this learner is a differentially private algorithm for locating an approximate center point of m>poly(d,2log∗|X|) points -- a high dimensional generalization of the median function. Our construction establishes a relationship between these two problems that is reminiscent of the relation between the median and learning one-dimensional thresholds [Bun et al.\ FOCS '15]. This relationship suggests that the problem of privately locating a center point may have further applications in the design of differentially private algorithms.
We also provide a lower bound on the sample complexity for privately finding a point in the convex hull. For approximate differential privacy, we show a lower bound of m=Ω(d+log|X|), whereas for pure differential privacy m=Ω(d log|X|).
Benny Applebaum, Amos Beimel, Oriol Farr`as, Oded Nir, and Naty Peter. 5/2019. “Secret-Sharing Schemes for General and Uniform Access Structures.” In Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT 2019). Springer VersionAbstract
A secret-sharing scheme allows some authorized sets of parties to reconstruct a secret; the collection of authorized sets is called the access structure. For over 30 years, it was known that any (monotone) collection of authorized sets can be realized by a secret-sharing scheme whose shares are of size2n−o(n)and until recently no better scheme was known. In a recent breakthrough, Liu and Vaikuntanathan (STOC 2018) have reduced the share size to O(20.994n) Our first contribution is improving the exponent of secret sharing down to 0.892. For the special case of linear secret-sharing schemes, we get an exponent of 0.942 (compared to 0.999 of Liu and Vaikuntanathan).Motivated by the construction of Liu and Vaikuntanathan, we study secret-sharing schemes for uniform access structures. An access structure is k-uniform if all sets of size larger than k are authorized, all sets of size smaller than k are unauthorized, and each set of size k can be either authorized or unauthorized. The construction of Liu and Vaikuntanathan starts from protocols for conditional disclosure of secrets, constructs secret-sharing schemes for uniform access structures from them, and combines these schemes in order to obtain secret-sharing schemes for general access structures. Our second contribution in this paper is constructions of secret-sharing schemes for uniform access structures. We achieve the following results:A secret-sharing scheme for k-uniform access structures for large secrets in which the share size is O(k2) times the size of the secret.A linear secret-sharing scheme for k-uniform access structures for a binary secret in which the share size is~O(2h(k/n)n/2) (where h is the binary entropy function). By counting arguments, this construction is optimal (up to polynomial factors).A secret-sharing scheme for k-uniform access structures for a binary secret in which the share size is 2~O(√klogn). Our third contribution is a construction of ad-hoc PSM protocols, i.e., PSM protocols in which only a subset of the parties will compute a function on their inputs. This result is based on ideas we used in the construction of secret-sharing schemes for k-uniform access structures for a binary secret.
Amos Beimel, Aleksandra Korolova, Kobbi Nissim, Or Sheffet, and Uri Stemmer. 6/2020. “The Power of Synergy in Differential Privacy: Combining a Small Curator with Local Randomizers.” In Information-Theoretic Cryptography (To appear - ITC 2020). ArXiv VersionAbstract

Motivated by the desire to bridge the utility gap between local and trusted curator modelsof differential privacy for practical applications, we initiate the theoretical study of a hybridmodel introduced by “Blender” [Avent et al., USENIX Security ’17], in which differentially private protocols of n agents that work in the local-model are assisted by a differentially private curator that has access to the data of m additional users. We focus on the regime where mn and study the new capabilities of this (m;n)-hybrid model. We show that, despite the fact that the hybrid model adds no significant new capabilities for the basic task of simple hypothesistesting, there are many other tasks (under a wide range of parameters) that can be solved in the hybrid model yet cannot be solved either by the curator or by the local-users separately. Moreover, we exhibit additional tasks where at least one round of interaction between the curator and the local-users is necessary – namely, no hybrid model protocol without such interaction can solve these tasks. Taken together, our results show that the combination of the local model with a small curator can become part of a promising toolkit for designing and implementing differential privacy.

Victor Balcer and Albert Cheu. 4/2020. “Separating Local & Shuffled Differential Privacy via Histograms.” In Information-Theoretic Cryptography (To appear - ITC 2020). ArXiv VersionAbstract

Recent work in differential privacy has highlighted the shuffled model as a promising avenue to compute accurate statistics while keeping raw data in users’ hands. We present a protocol in this model that estimates histograms with error independent of the domain size. This impliesan arbitrarily large gap in sample complexity between the shuffled and local models. On theother hand, we show that the models are equivalent when we impose the constraints of pure differential privacy and single-message randomizers.

Borja Balle, James Bell, Adria Gascon, and Kobbi Nissim. 6/2/2019. “The Privacy Blanket of the Shuffle Model.” In International Cryptology Conference (CRYPTO 2019). Publisher's VersionAbstract
This work studies differential privacy in the context of the recently proposed shuffle model. Unlike in the local model, where the server collecting privatized data from users can track back an input to a specific user, in the shuffle model users submit their privatized inputs to a server anonymously. This setup yields a trust model which sits in between the classical curator and local models for differential privacy. The shuffle model is the core idea in the Encode, Shuffle, Analyze (ESA) model introduced by Bittau et al. (SOPS 2017). Recent work by Cheu et al. (EUROCRYPT 2019) analyzes the differential privacy properties of the shuffle model and shows that in some cases shuffled protocols provide strictly better accuracy than local protocols. Additionally, Erlingsson et al. (SODA 2019) provide a privacy amplification bound quantifying the level of curator differential privacy achieved by the shuffle model in terms of the local differential privacy of the randomizer used by each user. In this context, we make three contributions. First, we provide an optimal single message protocol for summation of real numbers in the shuffle model. Our protocol is very simple and has better accuracy and communication than the protocols for this same problem proposed by Cheu et al. Optimality of this protocol follows from our second contribution, a new lower bound for the accuracy of private protocols for summation of real numbers in the shuffle model. The third contribution is a new amplification bound for analyzing the privacy of protocols in the shuffle model in terms of the privacy provided by the corresponding local randomizer. Our amplification bound generalizes the results by Erlingsson et al. to a wider range of parameters, and provides a whole family of methods to analyze privacy amplification in the shuffle model.
  • 1 of 4
  • »