Salil Vadhan. 2017. “The Complexity of Differential Privacy.” In Tutorials on the Foundations of Cryptography (Yehuda Lindell, ed.), pp. 347–450. Springer.
Version History:
August 2016: Manuscript v1
March 2017: Manuscript v2; Errata
April 2017: Published version (in Tutorials on the Foundations of Cryptography; see above)
Abstract: Differential privacy is a theoretical framework for ensuring the privacy of individual-level data when performing statistical analysis of privacy-sensitive datasets. This tutorial provides an introduction to and overview of differential privacy, with the goal of conveying its deep connections to a variety of other topics in computational complexity, cryptography, and theoretical computer science at large. This tutorial is written in celebration of Oded Goldreich’s 60th birthday, starting from notes taken during a minicourse given by the author and Kunal Talwar at the 26th McGill Invitational Workshop on Computational Complexity [1].
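As a quick illustration of the framework the tutorial surveys, here is a minimal sketch of the standard Laplace mechanism for a counting query. The function and variable names are illustrative assumptions, not taken from the tutorial.

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    """Release a counting query under epsilon-differential privacy.

    A count changes by at most 1 when one record is added or removed
    (sensitivity 1), so Laplace noise of scale 1/epsilon suffices.
    """
    true_count = sum(1 for row in data if predicate(row))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example (hypothetical data): how many records exceed 5, with epsilon = 0.5.
data = [3, 7, 1, 9, 4, 8]
print(laplace_count(data, lambda x: x > 5, epsilon=0.5))
```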
Merce Crosas. 2017. “The DataTags System: Sharing Sensitive Data with Confidence.” Research Data Alliance (RDA) 8th Plenary on Privacy Implications of Research Data Sets, International Data Week 2016.
Cynthia Dwork, Nicole Immorlica, Adam Kalai, and Max Leiserson. 2017. “Decoupled Classifiers for Fair and Efficient Machine Learning.” In the Workshop on Fairness, Accountability, and Transparency in Machine Learning (FATML).
Abstract: When it is ethical and legal to use a sensitive attribute (such as gender or race) in machine learning systems, the question remains how to do so. We show that the naive application of machine learning algorithms using sensitive features leads to an inherent tradeoff in accuracy between groups. We provide a simple and efficient decoupling technique that can be added on top of any black-box machine learning algorithm to learn different classifiers for different groups. Transfer learning is used to mitigate the problem of having too little data on any one group.
The method applies to a range of fairness criteria. In particular, we require the application designer to specify a joint loss function that makes explicit the trade-off between fairness and accuracy. Our reduction is shown to efficiently find the minimum loss as long as the objective has a certain natural monotonicity property, which may be of independent interest in the study of fairness in algorithms.
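To make the decoupling step concrete, here is a minimal sketch that trains one classifier per group and routes predictions by group membership. It omits the paper's transfer-learning and joint-loss components; the function names are illustrative, and scikit-learn availability is assumed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_decoupled(X, y, groups):
    """Fit one classifier per group value (the decoupling step).

    `groups` holds the sensitive attribute for each row; the paper's
    transfer-learning component is omitted from this sketch.
    """
    models = {}
    for g in np.unique(groups):
        mask = groups == g
        models[g] = LogisticRegression().fit(X[mask], y[mask])
    return models

def predict_decoupled(models, X, groups):
    """Route each example to its own group's classifier."""
    out = np.empty(len(X), dtype=int)
    for g, model in models.items():
        mask = groups == g
        if mask.any():
            out[mask] = model.predict(X[mask])
    return out
```

A joint loss over the per-group classifiers would then be evaluated on candidate model combinations to trade off fairness against accuracy, as the abstract describes.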
Marko Mitrovic, Mark Bun, Andreas Krause, and Amin Karbasi. 2017. “Differentially Private Submodular Maximization: Data Summarization in Disguise.” In Proceedings of the 34th International Conference on Machine Learning (ICML 2017).
Abstract: Many data summarization applications are captured by the general framework of submodular maximization. As a consequence, a wide range of efficient approximation algorithms have been developed. However, when such applications involve sensitive data about individuals, their privacy concerns are not automatically addressed. To remedy this problem, we propose a general and systematic study of differentially private submodular maximization. We present privacy-preserving algorithms for both monotone and non-monotone submodular maximization under cardinality, matroid, and p-extendible system constraints, with guarantees that are competitive with optimal. Along the way, we analyze a new algorithm for non-monotone submodular maximization, which is the first (even non-privately) to achieve a constant approximation ratio while running in linear time. We additionally provide two concrete experiments to validate the efficacy of these algorithms.
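For intuition, the sketch below shows what a differentially private greedy selection can look like: the standard exponential mechanism applied at each of k greedy steps under a cardinality constraint. This is a generic construction, not necessarily the paper's algorithm; the names and the simple per-step budget split are assumptions.

```python
import numpy as np

def private_greedy(ground_set, marginal_gain, k, epsilon, sensitivity=1.0):
    """Greedy submodular maximization with each of the k selections made
    by the exponential mechanism (a standard construction; the notation
    here is illustrative, not the paper's).

    marginal_gain(S, e) is the gain of adding element e to the current
    set S, assumed to have sensitivity `sensitivity` in the private data.
    """
    S = []
    eps_step = epsilon / k  # naive composition across the k steps
    for _ in range(k):
        candidates = [e for e in ground_set if e not in S]
        gains = np.array([marginal_gain(S, e) for e in candidates])
        # Exponential mechanism: sample with probability proportional to
        # exp(eps_step * gain / (2 * sensitivity)).
        logits = eps_step * gains / (2.0 * sensitivity)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        S.append(candidates[np.random.choice(len(candidates), p=probs)])
    return S
```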
Cynthia Dwork, Adam Smith, Thomas Steinke, and Jonathan Ullman. 2017. “Exposed! A Survey of Attacks on Private Data.” Annual Review of Statistics and Its Application.
Abstract: Privacy-preserving statistical data analysis addresses the general question of protecting privacy when publicly releasing information about a sensitive dataset. A privacy attack takes seemingly innocuous released information and uses it to discern the private details of individuals, thus demonstrating that such information compromises privacy. For example, re-identification attacks have shown that it is easy to link supposedly de-identified records to the identity of the individual concerned. This survey focuses on attacking aggregate data, such as statistics about how many individuals have a certain disease, genetic trait, or combination thereof. We consider two types of attacks: reconstruction attacks, which approximately determine a sensitive feature of all the individuals covered by the dataset, and tracing attacks, which determine whether or not a target individual's data are included in the dataset. We also discuss techniques from the differential privacy literature for releasing approximate aggregate statistics while provably thwarting any privacy attack.
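The sketch below illustrates the flavor of a reconstruction attack in the simplest setting: noisy answers to random subset-sum queries are inverted by least squares and rounded. It is a simplified Dinur-Nissim-style demonstration on synthetic data, not code from the survey.

```python
import numpy as np

def reconstruction_attack(query_matrix, noisy_answers):
    """Linear reconstruction attack (Dinur-Nissim style, simplified).

    Each row of query_matrix is a 0/1 subset-sum query over the n secret
    bits; noisy_answers are the released approximate counts. Least squares
    followed by rounding recovers most bits when the per-query noise is
    much smaller than sqrt(n).
    """
    estimate, *_ = np.linalg.lstsq(query_matrix, noisy_answers, rcond=None)
    return (estimate > 0.5).astype(int)

# Toy demonstration with hypothetical data.
rng = np.random.default_rng(0)
n, m = 100, 400                              # 100 secret bits, 400 random queries
secret = rng.integers(0, 2, n)
A = rng.integers(0, 2, (m, n))
answers = A @ secret + rng.normal(0, 1, m)   # small noise added to each answer
print((reconstruction_attack(A, answers) == secret).mean())  # close to 1.0
```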
Robert M. Groves, Michael E. Chernew, Piet Daas, Cynthia Dwork, Ophir Frieder, Hosagrahar V. Jagadish, Frauke Kreuter, Sharon Lohr, James P. Lynch, Colm O'Muircheartaigh, Trivellore Raghunathan, Roberto Rigobon, and Marc Rotenberg. 2017. “Innovations in Federal Statistics: Combining Data Sources While Protecting Privacy.” National Academies of Sciences, Engineering, and Medicine report.
Abstract: Federal government statistics provide critical information to the country and serve a key role in a democracy. For decades, sample surveys with instruments carefully designed for particular data needs have been one of the primary methods for collecting data for federal statistics. However, the costs of conducting such surveys have been increasing while response rates have been declining, and many surveys are not able to fulfill growing demands for more timely information and for more detailed information at state and local levels.
Mark Bun, Thomas Steinke, and Jonathan Ullman. 2017. “Make Up Your Mind: The Price of Online Queries in Differential Privacy.” In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2017).
Abstract: We consider the problem of answering queries about a sensitive dataset subject to differential privacy. The queries may be chosen adversarially from a larger set Q of allowable queries in one of three ways, which we list in order from easiest to hardest to answer:
• Offline: The queries are chosen all at once and the differentially private mechanism answers the queries in a single batch.
• Online: The queries are chosen all at once, but the mechanism only receives the queries in a streaming fashion and must answer each query before seeing the next query.
• Adaptive: The queries are chosen one at a time and the mechanism must answer each query before the next query is chosen. In particular, each query may depend on the answers given to previous queries.
Many differentially private mechanisms are just as efficient in the adaptive model as they are in the offline model. Meanwhile, most lower bounds for differential privacy hold in the offline setting. This suggests that the three models may be equivalent. We prove that these models are all, in fact, distinct. Specifically, we show that there is a family of statistical queries such that exponentially more queries from this family can be answered in the offline model than in the online model. We also exhibit a family of search queries such that exponentially more queries from this family can be answered in the online model than in the adaptive model. We also investigate whether such separations might hold for simple queries like threshold queries over the real line.
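For intuition about the online model, the following sketch answers counting queries one at a time out of a fixed privacy budget, so each answer is returned before the next query arrives; in the adaptive model, the next predicate may additionally depend on the returned answer. The class and its naive budget-splitting rule are illustrative assumptions, far simpler than the mechanisms the paper studies.

```python
import numpy as np

class OnlineLaplaceMechanism:
    """Answers a stream of counting queries one at a time (online model),
    splitting a total privacy budget across an announced number of queries."""

    def __init__(self, data, total_epsilon, num_queries):
        self.data = data
        self.eps_per_query = total_epsilon / num_queries  # basic composition
        self.remaining = num_queries

    def answer(self, predicate):
        """Answer one counting query; each count has sensitivity 1."""
        if self.remaining == 0:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= 1
        true_count = sum(1 for row in self.data if predicate(row))
        return true_count + np.random.laplace(0.0, 1.0 / self.eps_per_query)

# Each query is answered before the next is seen; an adaptive adversary
# could choose the next predicate based on the value printed here.
mech = OnlineLaplaceMechanism([1, 5, 2, 9, 7], total_epsilon=1.0, num_queries=2)
print(mech.answer(lambda x: x > 4))
```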
Mitali Bafna and Jonathan Ullman. 2017. “The Price of Selection in Differential Privacy.” In Proceedings of the 30th Conference on Learning Theory (COLT 2017).
Shiva Kasiviswanathan, Kobbi Nissim, and Hongxia Jin. 2017. “Private Incremental Regression.” In the ACM SIGMOD/PODS Conference (PODS 2017).
Michael Bar-Sinai and Rotem Medzini. 2017. “Public Policy Modeling using the DataTags Toolset.” In European Social Policy Analysis network (ESPAnet), Israel.
Abstract: We apply Tags, a framework for modeling data handling policies, to a welfare policy. The generated model is useful for assessing entitlements of specific cases, and for gaining insights into the modeled policy as a whole.
Latanya Sweeney, Ji Su Yoo, Laura Perovich, Katherine E. Boronow, Phil Brown, and Julia Green Brody. 2017. “Reidentification Risks in HIPAA Safe Harbor Data: A study of data from one environmental health study.” Technology Science.
Gustavo Durand, Michael Bar-Sinai, and Merce Crosas. 2017. “Securing Dataverse with an Adapted Command Design Pattern.” In the IEEE Secure Development Conference (IEEE SecDev 2017).
Thomas Steinke and Jonathan Ullman. 2017. “Tight Lower Bounds for Differentially Private Selection.” In the 58th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2017).
Cynthia Dwork. 2017. “What's Fair?” In the 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2017).