Publications

2017
Robert M Groves, Michael E Chernew, Piet Daas, Cynthia Dwork, Ophir Frieder, Hosagrahar V Jagadish, Frauke Kreuter, Sharon Lohr, James P Lynch, Colm O'Muircheartaigh, Trivellore Raghunathan, Roberto Rigobon, and Marc Rotenberg. 2017. “Innovations in Federal Statistics: Combining Data Sources While Protecting Privacy.” National Academies of Sciences, Engineering, and Medicine paper.Abstract
Federal government statistics provide critical information to the country and serve a key role in a democracy. For decades, sample surveys with instruments carefully designed for particular data needs have been one of the primary methods for collecting data for federal statistics. However, the costs of conducting such surveys have been increasing while response rates have been declining, and many surveys are not able to fulfill growing demands for more timely information and for more detailed information at state and local levels.
Mark Bun, Thomas Steinke, and Jonathan Ullman. 2017. “Make Up Your Mind: The Price of Online Queries in Differential Privacy.” Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). arXiv PageAbstract

We consider the problem of answering queries about a sensitive dataset subject to differential privacy. The queries may be chosen adversarially from a larger set Q of allowable queries in one of three ways, which we list in order from easiest to hardest to answer:

• Offline: The queries are chosen all at once and the differentially private mechanism answers the queries in a single batch.

• Online: The queries are chosen all at once, but the mechanism only receives the queries in a streaming fashion and must answer each query before seeing the next query.

• Adaptive: The queries are chosen one at a time and the mechanism must answer each query before the next query is chosen. In particular, each query may depend on the answers given to previous queries.

Many differentially private mechanisms are just as efficient in the adaptive model as they are in the offline model. Meanwhile, most lower bounds for differential privacy hold in the offline setting. This suggests that the three models may be equivalent. We prove that these models are all, in fact, distinct. Specifically, we show that there is a family of statistical queries such that exponentially more queries from this family can be answered in the offline model than in the online model. We also exhibit a family of search queries such that exponentially more queries from this family can be answered in the online model than in the adaptive model. We also investigate whether such separations might hold for simple queries like threshold queries over the real line.

Michael Bar-Sinai. 2017. “Modeling Welfare Benefit Policies using the DataTags Toolset.” In European Social Policy Analysis network (ESPAnet). Israel.Abstract
We apply Tags, a framework for modeling data handling policies, to a welfare policy. The generated model is useful for assessing entitlements of specific cases, and for gaining insights into the modeled policy as a whole.
PDF
Mitali Bafna and Jonathan Ullman. 2017. “The Price of Selection in Differential Privacy.” Proceedings of The 30th Conference on Learning Theory Conference (COLT 2017). arXiv Page
Shiva Kasiviswanathan, Kobbi Nissim, and Hongxia Jin. 2017. “Private Incremental Regression.” in the ACM SIGMOD/PODS Conference (PODS 2017).
Latanya Sweeney, Ji Su Yoo, Laura Perovich, Katherine E Boronow, Phil Brown, and Julia Green Brody. 2017. “Reidentification Risks in HIPAA Safe Harbor Data: A study of data from one environmental health study.” in Technology Science.
Gustavo Durand, Michael Bar-Sinai, and Merce Crosas. 2017. “Securing Dataverse with an Adapted Command Design Pattern.” in the IEEE Secure Development Conference (IEEE 2017).
Thomas Steinke and Jonathan Ullman. 2017. “Tight Lower Bounds for Differentially Private Selection.” in the 58th Annual Symposium on Foundations of Computer Science (FOCS 2017). arXiv Page
Latanya Sweeney, Ji Su Yoo, and Jinyan Zang. 2017. “Voter Identity Theft: Submitting Changes to Voter Registrations Online to Disrupt Elections.” in Technology Science.
Cynthia Dwork. 2017. “What's Fair?” in the 23rd SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2017).
2016
Urs Gasser. 12/9/2016. “Recoding Privacy Law: Reflections on the Future Relationship Among Law, Technology, and Privacy.” Harvard Law Review . Publisher's Version
G. Barthe, M. Gaboardi, E. J. Gallego Arias, J. Hsu, A. Roth, and P.-Y. Strub. 12/2016. “Computer-Aided Verification in Mechanism Design.” Conference on Internet and Economics, WINE .
G. Barthe, N. Fong, M. Gaboardi, B. Gregoire, J. Hsu, and P.-Y. Strub. 10/2016. “Advanced Probabilistic Couplings for Differential Privacy .” 23rd ACM Conference on Computer and Communications Security, CCS.
G. Barthe, G. P. Farina, M. Gaboardi, E. J. Gallego Arias, A. D. Gordon, J. Hsu, and P.-Y. Strub. 10/2016. “Differentially Private Bayesian Programming .” 23rd ACM Conference on Computer and Communications Security, CCS.
Georgios Kellaris, George Kollios, Kobbi Nissim, and Adam O'Neill. 10/2016. “Generic Attacks on Secure Outsourced Databases.” 23rd ACM Conference on Computer and Communications Security.Abstract

Recently, various protocols have been proposed for securely outsourcing database storage to a third party server, ranging from systems with “full-fledged” security based on strong cryptographic primitives such as fully homomorphic encryption or oblivious RAM, to more practical implementations based on searchable symmetric encryption or even on deterministic and order-preserving encryption. On the flip side, various attacks have emerged that show that for some of these protocols confidentiality of the data can be compromised, usually given certain auxiliary information. We take a step back and identify a need for a formal understanding of the inherent efficiency/privacy trade-off in outsourced database systems, independent of the details of the system. We propose abstract models that capture secure outsourced storage systems in sufficient generality, and identify two basic sources of leakage, namely access pattern and communication volume. We use our models to distinguish certain classes of outsourced database systems that have been proposed, and deduce that all of them exhibit at least one of these leakage sources. We then develop generic reconstruction attacks on any system supporting range queries where either access pattern or communication volume is leaked. These attacks are in a rather weak passive adversarial model, where the untrusted server knows only the underlying query distribution. In particular, to perform our attack the server need not have any prior knowledge about the data, and need not know any of the issued queries nor their results. Yet, the server can reconstruct the secret attribute of every record in the database after about N 4 queries, where N is the domain size. We provide a matching lower bound showing that our attacks are essentially optimal. Our reconstruction attacks using communication volume apply even to systems based on homomorphic encryption or oblivious RAM in the natural way. Finally, we provide experimental results demonstrating the efficacy of our attacks on real datasets with a variety of different features. On all these datasets, after the required number of queries our attacks successfully recovered the secret attributes of every record in at most a few seconds.

PDF
Michael Bar-Sinai, Latanya Sweeney, and Merce Crosas. 8/4/2016. “DataTags, Data Handling Policy Spaces and the Tags Language.” In Security and Privacy Workshops (SPW), 2016 IEEE. San Jose, California : IEEE. Publisher's VersionAbstract

Widespread sharing of scientific datasets holds great promise for new scientific discoveries and great risks for personal privacy. Dataset handling policies play the critical role of balancing privacy risks and scientific value. We propose an extensible, formal, theoretical model for dataset handling policies. We define binary operators for policy composition and for comparing policy strictness, such that propositions like "this policy is stricter than that policy" can be formally phrased. Using this model, The policies are described in a machine-executable and human-readable way. We further present the Tags programming language and toolset, created especially for working with the proposed model. Tags allows composing interactive, friendly questionnaires which, when given a dataset, can suggest a data handling policy that follows legal and technical guidelines. Currently, creating such a policy is a manual process requiring access to legal and technical experts, which are not always available. We present some of Tags' tools, such as interview systems, visualizers, development environment, and questionnaire inspectors. Finally, we discuss methodologies for questionnaire development. Data for this paper include a questionnaire for suggesting a HIPAA compliant data handling policy, and formal description of the set of data tags proposed by the authors in a recent paper.

Salil Vadhan. 8/2016. “The Complexity of Differential Privacy”.Abstract

Differential Privacy is a theoretical framework for ensuring the privacy of individual-level data when performing statistical analysis of privacy-sensitive datasets. The goal of this tutorial is to convey the deep connections between differential privacy and a variety of other topics in computational complexity, cryptography, and theoretical computer science at large. This tutorial was written starting from notes taken during a minicourse given by the author and Kunal Talwar at the 26th McGill Invitational Workshop on Computational Complexity in February 2014, at the Bellairs Institute in Holetown, Barbados.

August 2016 March 2017 Errata
Marco Gaboardi, James Honaker, Gary King, Kobbi Nissim, Jonathan Ullman, Salil Vadhan, and Jack Murtagh. 6/2016. “PSI (Ψ): a Private data Sharing Interface.” In Theory and Practice of Differential Privacy. New York, NY. ArXiv VersionAbstract

We provide an overview of PSI (“a Private data Sharing Interface”), a system we are devel- oping to enable researchers in the social sciences and other fields to share and explore privacy- sensitive datasets with the strong privacy protections of differential privacy.

Poster presented at Theory and Practice of Differential Privacy (TPDP 2016).

TPDP Poster PSI.pdf
Yiling Chen, Stephen Chong, Ian Kash, Tal Moran, and Salil Vadhan. 6/2016. “Truthful Mechanisms for Agents that Value Privacy.” ACM Transactions on Economics and Computation (TEAC), 4, 3. TEAC VersionAbstract

Recent work has constructed economic mechanisms that are both truthful and differentially private. In these mechanisms, privacy is treated separately from truthfulness; it is not incorporated in players’ utility functions (and doing so has been shown to lead to nontruthfulness in some cases). In this work, we propose a new, general way of modeling privacy in players’ utility functions. Specifically, we only assume that if an outcome o has the property that any report of player i would have led to o with approximately the same probability, then o has a small privacy cost to player i. We give three mechanisms that are truthful with respect to our modeling of privacy: for an election between two candidates, for a discrete version of the facility location problem, and for a general social choice problem with discrete utilities (via a VCG-like mechanism). As the number n of players increases, the social welfare achieved by our mechanisms approaches optimal (as a fraction of n).

Rachel Cummings, Katrina Ligett, Kobbi Nissim, Aaron Roth, and Zhiwei Steven Wu. 2016. “Adaptive Learning with Robust Generalization Guarantees.” Conference on Learning Theory (COLT). arXiv VersionAbstract

The traditional notion of generalization---i.e., learning a hypothesis whose empirical error is close to its true error---is surprisingly brittle. As has recently been noted in [DFH+15b], even if several algorithms have this guarantee in isolation, the guarantee need not hold if the algorithms are composed adaptively. In this paper, we study three notions of generalization---increasing in strength---that are robust to postprocessing and amenable to adaptive composition, and examine the relationships between them. We call the weakest such notion Robust Generalization. A second, intermediate, notion is the stability guarantee known as differential privacy. The strongest guarantee we consider we call Perfect Generalization. We prove that every hypothesis class that is PAC learnable is also PAC learnable in a robustly generalizing fashion, with almost the same sample complexity. It was previously known that differentially private algorithms satisfy robust generalization. In this paper, we show that robust generalization is a strictly weaker concept, and that there is a learning task that can be carried out subject to robust generalization guarantees, yet cannot be carried out subject to differential privacy. We also show that perfect generalization is a strictly stronger guarantee than differential privacy, but that, nevertheless, many learning tasks can be carried out subject to the guarantees of perfect generalization.

PDF

Pages