# Publications

2011
Anupam Gupta, Moritz Hardt, Aaron Roth, and Jonathan Ullman. 2011. “Privately releasing conjunctions and the statistical query barrier.” In Proceedings of the 43rd ACM Symposium on Theory of Computing, STOC 2011, Pp. 803-812. San Jose, CA, USA: ACM. ACM Digital LibraryAbstract
Suppose we would like to know all answers to a set of statistical queries C on a data set up to small error, but we can only access the data itself using statistical queries. A trivial solution is to exhaustively ask all queries in C. Can we do any better? We show that the number of statistical queries necessary and sufficient for this task is---up to polynomial factors---equal to the agnostic learning complexity of C in Kearns' statistical query (SQ)model. This gives a complete answer to the question when running time is not a concern. We then show that the problem can be solved efficiently (allowing arbitrary error on a small fraction of queries) whenever the answers to C can be described by a submodular function. This includes many natural concept classes, such as graph cuts and Boolean disjunctions and conjunctions. While interesting from a learning theoretic point of view, our main applications are in privacy-preserving data analysis: Here, our second result leads to an algorithm that efficiently releases differentially private answers to all Boolean conjunctions with 1% average error. This presents progress on a key open problem in privacy-preserving data analysis. Our first result on the other hand gives unconditional lower bounds on any differentially private algorithm that admits a (potentially non-privacy-preserving) implementation using only statistical queries. Not only our algorithms, but also most known private algorithms can be implemented using only statistical queries, and hence are constrained by these lower bounds. Our result therefore isolates the complexity of agnostic learning in the SQ-model as a new barrier in the design of differentially private algorithms.
Yiling Chen, Stephen Chong, Ian A. Kash, Tal Moran, and Salil P. Vadhan. 2011. “Truthful Mechanisms for Agents that Value Privacy.” CoRR, abs/1111.5472. ArXiv VersionAbstract

Recent work has constructed economic mechanisms that are both truthful and differentially private. In these mechanisms, privacy is treated separately from the truthfulness; it is not incorporated in players' utility functions (and doing so has been shown to lead to non-truthfulness in some cases). In this work, we propose a new, general way of modelling privacy in players' utility functions. Specifically, we only assume that if an outcome $o$ has the property that any report of player $i$ would have led to $o$ with approximately the same probability, then $o$ has small privacy cost to player $i$. We give three mechanisms that are truthful with respect to our modelling of privacy: for an election between two candidates, for a discrete version of the facility location problem, and for a general social choice problem with discrete utilities (via a VCG-like mechanism). As the number $n$ of players increases, the social welfare achieved by our mechanisms approaches optimal (as a fraction of $n$).

2010
Cynthia Dwork, Guy Rothblum, and Salil Vadhan. 2010. “Boosting and Differential Privacy.” In Proceedings of the 51st Annual {IEEE} Symposium on Foundations of Computer Science (FOCS 10), Pp. 51–60. Las Vegas, NV: IEEE. DOIAbstract

Boosting is a general method for improving the accuracy of learning algorithms. We use boosting to construct improved privacy-pre serving synopses of an input database. These are data structures that yield, for a given set Q of queries over an input database, reasonably accurate estimates of the responses to every query in Q, even when the number of queries is much larger than the number of rows in the database. Given a base synopsis generator that takes a distribution on Q and produces a "weak" synopsis that yields "good" answers for a majority of the weight in Q, our Boosting for Queries algorithm obtains a synopsis that is good for all of Q. We ensure privacy for the rows of the database, but the boosting is performed on the queries. We also provide the first synopsis generators for arbitrary sets of arbitrary low-sensitivity queries, i.e., queries whose answers do not vary much under the addition or deletion of a single row. In the execution of our algorithm certain tasks, each incurring some privacy loss, are performed many times. To analyze the cumulative privacy loss, we obtain an O(ε2) bound on the expected privacy loss from a single e-differentially private mechanism. Combining this with evolution of confidence arguments from the literature, we get stronger bounds on the expected cumulative privacy loss due to multiple mechanisms, each of which provides e-differential privacy or one of its relaxations, and each of which operates on (potentially) different, adaptively chosen, databases.

Andrew McGregor, Ilya Mironov, Toniann Pitassi, Omer Reingold, Kunal Talwar, and Salil Vadhan. 2010. “The Limits of Two-Party Differential Privacy.” In Proceedings of the 51st Annual {IEEE} Symposium on Foundations of Computer Science (FOCS 10), Pp. 81–90. Las Vegas, NV: IEEE. DOIAbstract

We study differential privacy in a distributed setting where two parties would like to perform analysis of their joint data while preserving privacy for both datasets. Our results imply almost tight lower bounds on the accuracy of such data analyses, both for specific natural functions (such as Hamming distance) and in general. Our bounds expose a sharp contrast between the two-party setting and the simpler client-server setting (where privacy guarantees are one-sided). In addition, those bounds demonstrate a dramatic gap between the accuracy that can be obtained by differentially private data analysis versus the accuracy obtainable when privacy is relaxed to a computational variant of differential privacy. The first proof technique we develop demonstrates a connection between differential privacy and deterministic extraction from Santha-Vazirani sources. A second connection we expose indicates that the ability to approximate a function by a low-error differentially private protocol is strongly related to the ability to approximate it by a low communication protocol. (The connection goes in both directions).

2009
Cynthia Dwork, Moni Naor, Omer Reingold, Guy Rothblum, and Salil Vadhan. 2009. “On the Complexity of Differentially Private Data Release: Efficient Algorithms and Hardness Results.” In Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC 09), Pp. 381–390. Bethesda, MD. ACM Digital LibraryAbstract

We consider private data analysis in the setting in which a trusted and trustworthy curator, having obtained a large data set containing private information, releases to the public a "sanitization" of the data set that simultaneously protects the privacy of the individual contributors of data and offers utility to the data analyst. The sanitization may be in the form of an arbitrary data structure, accompanied by a computational procedure for determining approximate answers to queries on the original data set, or it may be a "synthetic data set" consisting of data items drawn from the same universe as items in the original data set; queries are carried out as if the synthetic data set were the actual input. In either case the process is non-interactive; once the sanitization has been released the original data and the curator play no further role. For the task of sanitizing with a synthetic dataset output, we map the boundary between computational feasibility and infeasibility with respect to a variety of utility measures. For the (potentially easier) task of sanitizing with unrestricted output format, we show a tight qualitative and quantitative connection between hardness of sanitizing and the existence of traitor tracing schemes.

Ilya Mironov, Omkant Pandey, Omer Reingold, and Salil Vadhan. 2009. “Computational Differential Privacy.” In Advances in Cryptology–-CRYPTO 09, 5677: Pp. 126–142. Santa Barbara, CA: Springer-Verlag. Springer LinkAbstract

The definition of differential privacy has recently emerged as a leading standard of privacy guarantees for algorithms on statistical databases. We offer several relaxations of the definition which require privacy guarantees to hold only against efficient—i.e., computationally-bounded—adversaries. We establish various relationships among these notions, and in doing so, we observe their close connection with the theory of pseudodense sets by Reingold et al.[1]. We extend the dense model theorem of Reingold et al. to demonstrate equivalence between two definitions (indistinguishability- and simulatability-based) of computational differential privacy. Our computational analogues of differential privacy seem to allow for more accurate constructions than the standard information-theoretic analogues. In particular, in the context of private approximation of the distance between two vectors, we present a differentially-private protocol for computing the approximation, and contrast it with a substantially more accurate protocol that is only computationally differentially private.

2000
Latanya Sweeney. 2000. “Simple Demographics Often Identify People Uniquely.” Carnegie Mellon University, Data Privacy. Project website