Publications

Forthcoming
M. Altman, A. Wood, D. O'Brien, S. Vadhan, and U. Gasser, “Towards a Modern Approach to Privacy-Aware Government Data Releases,” Berkeley Technology Law Journal, forthcoming. modernopendataprivacy.pdf
2016
M. Bun and M. Zhandry, “Order-revealing encryption and the hardness of private learning,” in Proceedings of the 13th Theory of Cryptography Conference (TCC 2016), Tel Aviv, Israel, 2016. ArXiv Version. Abstract:

An order-revealing encryption scheme gives a public procedure by which two ciphertexts can be compared to reveal the ordering of their underlying plaintexts. We show how to use order-revealing encryption to separate computationally efficient PAC learning from efficient (ϵ,δ)-differentially private PAC learning. That is, we construct a concept class that is efficiently PAC learnable, but for which every efficient learner fails to be differentially private. This answers a question of Kasiviswanathan et al. (FOCS '08, SIAM J. Comput. '11).
To prove our result, we give a generic transformation from an order-revealing encryption scheme into one with strongly correct comparison, which enables the consistent comparison of ciphertexts that are not obtained as the valid encryption of any message. We believe this construction may be of independent interest.

BunZhandry.pdf
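
As a reading aid, the following is a minimal sketch of the order-revealing encryption syntax the abstract refers to: secret-key encryption together with a public comparison procedure on ciphertexts. The class and function names are our own illustration of the interface, not the paper's construction, and no concrete (secure) scheme is implied.

    # Sketch of the order-revealing encryption (ORE) interface; names are illustrative.
    from abc import ABC, abstractmethod

    class OrderRevealingEncryption(ABC):
        """Secret-key encryption with a PUBLIC comparison procedure on ciphertexts."""

        @abstractmethod
        def keygen(self) -> bytes:
            """Sample a secret key."""

        @abstractmethod
        def encrypt(self, key: bytes, plaintext: int) -> bytes:
            """Encrypt an integer plaintext under the secret key."""

        @abstractmethod
        def compare(self, ct1: bytes, ct2: bytes) -> int:
            """Public procedure: return -1, 0, or +1 according to the order of the
            underlying plaintexts, without using the secret key."""

    def correctness_holds(scheme: OrderRevealingEncryption, m1: int, m2: int) -> bool:
        """Correctness: comparing encryptions of m1 and m2 reveals exactly their order.
        ('Strongly correct comparison' in the paper additionally constrains behavior on
        ciphertexts that are not valid encryptions of any message.)"""
        key = scheme.keygen()
        expected = (m1 > m2) - (m1 < m2)
        return scheme.compare(scheme.encrypt(key, m1), scheme.encrypt(key, m2)) == expected
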
J. Murtagh and S. Vadhan, “The Complexity of Computing the Optimal Composition of Differential Privacy,” in Theory of Cryptography Conference (TCC 2016), 2016. ArXiv Version. Abstract:

In the study of differential privacy, composition theorems (starting with the original paper of Dwork, McSherry, Nissim, and Smith (TCC'06)) bound the degradation of privacy when composing several differentially private algorithms. Kairouz, Oh, and Viswanath (ICML'15) showed how to compute the optimal bound for composing k arbitrary (ϵ,δ)-differentially private algorithms. We characterize the optimal composition for the more general case of k arbitrary (ϵ_1,δ_1), …, (ϵ_k,δ_k)-differentially private algorithms where the privacy parameters may differ for each algorithm in the composition. We show that computing the optimal composition in general is #P-complete. Since computing optimal composition exactly is infeasible (unless FP=#P), we give an approximation algorithm that computes the composition to arbitrary accuracy in polynomial time. The algorithm is a modification of Dyer's dynamic programming approach to approximately counting solutions to knapsack problems (STOC'03).

MurtaghVadhan.pdf
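
For concreteness, below is a small sketch of the homogeneous case handled by Kairouz, Oh, and Viswanath, where all k algorithms share the same (ϵ,δ): given a target ϵ_g, it evaluates the corresponding optimal δ. The function and parameter names are ours, and the formula is our paraphrase of the published bound (the papers are authoritative); the heterogeneous case, where the (ϵ_i,δ_i) differ, is exactly what Murtagh and Vadhan show is #P-complete to compute exactly and give a polynomial-time approximation algorithm for.

    import math

    def optimal_composition_delta(eps: float, delta: float, k: int, eps_g: float) -> float:
        """Homogeneous optimal composition (Kairouz-Oh-Viswanath), as we understand it:
        the delta_g such that the k-fold composition of (eps, delta)-DP mechanisms is
        (eps_g, delta_g)-DP under this bound.  Direct summation, so intended only for
        moderate k."""
        assert eps > 0 and 0.0 <= delta < 1.0 and k >= 1 and eps_g >= 0.0
        # Sum over outcomes of k randomized responses (the worst case) whose
        # privacy loss ell*eps - (k - ell)*eps is at least eps_g.
        start = math.ceil((eps_g / eps + k) / 2.0)
        numerator = sum(
            math.comb(k, ell) * (math.exp(ell * eps) - math.exp(eps_g + (k - ell) * eps))
            for ell in range(start, k + 1)
        )
        delta_prime = numerator / (1.0 + math.exp(eps)) ** k
        # Fold in the per-mechanism delta as in the published statement.
        return 1.0 - (1.0 - delta) ** k * (1.0 - delta_prime)

    # Example: ten (0.1, 1e-6)-DP algorithms composed, asking for eps_g = 0.5.
    # print(optimal_composition_delta(0.1, 1e-6, 10, 0.5))
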
V. Karwa and A. Slavković, “Inference using noisy degrees: Differentially private β-model and synthetic graphs,” Annals of Statistics, vol. 44, no. 1, pp. 87-112, 2016. ArXiv Version. Abstract:

The β-model of random graphs is an exponential family model with the degree sequence as a sufficient statistic. In this paper, we contribute three key results. First, we characterize conditions that lead to a quadratic time algorithm to check for the existence of MLE of the β-model, and show that the MLE never exists for the degree partition β-model. Second, motivated by privacy problems with network data, we derive a differentially private estimator of the parameters of β-model, and show it is consistent and asymptotically normally distributed - it achieves the same rate of convergence as the nonprivate estimator. We present an efficient algorithm for the private estimator that can be used to release synthetic graphs. Our techniques can also be used to release degree distributions and degree partitions accurately and privately, and to perform inference from noisy degrees arising from contexts other than privacy. We evaluate the proposed estimator on real graphs and compare it with a current algorithm for releasing degree distributions and find that it does significantly better. Finally, our paper addresses shortcomings of current approaches to a fundamental problem of how to perform valid statistical inference from data released by privacy mechanisms, and lays a foundational groundwork on how to achieve optimal and private statistical inference in a principled manner by modeling the privacy mechanism; these principles should be applicable to a class of models beyond the β-model.
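
The "noisy degrees" release that the abstract builds on can be sketched briefly: under edge-level differential privacy, adding or removing one edge changes two entries of the degree sequence by one each, so the degree sequence has L1 sensitivity 2 and can be released with calibrated noise. The sketch below (our illustration) covers only this release step with continuous Laplace noise; the paper's estimator additionally denoises and projects the output and fits the β-model, and its exact noise distribution may differ.

    import numpy as np

    def release_noisy_degrees(degrees: np.ndarray, epsilon: float, rng=None) -> np.ndarray:
        """Release a graph's degree sequence with epsilon-differential privacy at the
        edge level via the Laplace mechanism: adding/removing one edge changes two
        degrees by 1 each, so the L1 sensitivity is 2.  Only the release step is
        shown; the estimation and projection steps of the paper are omitted."""
        rng = np.random.default_rng() if rng is None else rng
        sensitivity = 2.0
        noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=degrees.shape)
        return degrees + noise

    # Example with a hypothetical 5-node graph:
    # noisy = release_noisy_degrees(np.array([2., 3., 1., 2., 2.]), epsilon=1.0)
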

A. Wood et al., “Comments on the Proposed Rules to Revise the Federal Policy for the Protection of Human Subjects,” 2016. Regulations.gov. Abstract:

Alexandra Wood, Edo Airoldi, Micah Altman, Yves-Alexandre de Montjoye, Urs Gasser, David O'Brien, and Salil Vadhan submitted comments in response to the September 2015 notice of proposed rulemaking to revise the Federal Policy for the Protection of Human Subjects. With the ability to collect and analyze massive quantities of data related to human characteristics, behaviors, and interactions, researchers are increasingly able to explore phenomena in finer detail and with greater confidence. A major challenge for realizing the full potential of these recent advances will be protecting the privacy of human subjects. Drawing from their research findings and a forthcoming article articulating a modern approach to privacy analysis, the authors offer recommendations for updating the Common Rule to reflect recent developments in the scientific understanding of privacy. The suggested revisions ultimately aim to enable wider collection, use, and sharing of research data while providing stronger privacy protection for human subjects.


Specific recommendations include:


  • Incorporating clear and consistent definitions for privacy, confidentiality, and security.

  • Providing similar levels of protection to research activities that pose similar risks.

  • Relying on standards and requirements that recognize the limitations of traditional de-identification techniques, the inadequacy of binary conceptions of “identifiable” and “publicly-available” information, and the significance of inference risks to privacy.

  • Creating a new privacy standard based not on a binary identifiability standard, but on the extent to which attributes that may be revealed or inferred depend on an individual’s data and the potential harm that may result.

  • Requiring investigators to conduct systematic privacy analyses and calibrate their use of privacy and security controls to the specific intended uses and privacy risks at every stage of the information lifecycle.

  • Addressing informational risks using a combination of privacy and security controls rather than relying on a single control such as consent or de-identification and adopting tiered access models where appropriate.

  • Forming an advisory committee of data privacy experts to help the Secretary of Health and Human Services develop guidance on applying privacy and security controls that are closely matched to the intended uses and privacy risks in specific research activities.


The authors argue that addressing these issues will help lead researchers towards state-of-the-art privacy practices and advance the exciting research opportunities enabled by new data sources and technologies for collecting, analyzing, and sharing data about individuals.

The full comments are also available through Regulations.gov.
privacy_tools_project_response_to_common_rule_nprm.pdf
2015
Y. Chen, K. Nissim, and B. Waggoner, “Fair Information Sharing for Treasure Hunting,” in AAAI Conference on Artificial Intelligence, North America, 2015. PDF. Abstract:

In a search task, a group of agents compete to be the first to find the solution. Each agent has different private information to incorporate into its search. This problem is inspired by settings such as scientific research, Bitcoin hash inversion, or hunting for some buried treasure. A social planner such as a funding agency, mining pool, or pirate captain might like to convince the agents to collaborate, share their information, and greatly reduce the cost of searching. However, this cooperation is in tension with the individuals' competitive desire to each be the first to win the search. The planner's proposal should incentivize truthful information sharing, reduce the total cost of searching, and satisfy fairness properties that preserve the spirit of the competition. We design contract-based mechanisms for information sharing without money. The planner solicits the agents' information and assigns search locations to the agents, who may then search only within their assignments. Truthful reporting of information to the mechanism maximizes an agent's chance to win the search. Epsilon-voluntary participation is satisfied for large search spaces. In order to formalize the planner's goals of fairness and reduced search cost, we propose a simplified, simulated game as a benchmark and quantify fairness and search cost relative to this benchmark scenario. The game is also used to implement our mechanisms. Finally, we extend to the case where coalitions of agents may participate in the mechanism, forming larger coalitions recursively.

C. Dwork, A. Smith, T. Steinke, J. Ullman, and S. Vadhan, “Robust Traceability from Trace Amounts,” in IEEE Symposium on Foundations of Computer Science (FOCS 2015), Berkeley, California, 2015. robust.pdf
M. Bun, K. Nissim, and U. Stemmer, “Simultaneous private learning of multiple concepts,” 2015.
S. Zheng, The Differential Privacy of Bayesian Inference. Bachelor's thesis, Harvard College, 2015. DASH Version
M. Bun, K. Nissim, U. Stemmer, and S. Vadhan, “Differentially Private Release and Learning of Threshold Functions,” in 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2015), Berkeley, California, 2015. ArXiv Version. Abstract:

We prove new upper and lower bounds on the sample complexity of (ϵ,δ) differentially private algorithms for releasing approximate answers to threshold functions. A threshold function c_x over a totally ordered domain X evaluates to c_x(y) = 1 if y ≤ x, and evaluates to 0 otherwise. We give the first nontrivial lower bound for releasing thresholds with (ϵ,δ) differential privacy, showing that the task is impossible over an infinite domain X, and moreover requires sample complexity n ≥ Ω(log*|X|), which grows with the size of the domain. Inspired by the techniques used to prove this lower bound, we give an algorithm for releasing thresholds with n ≤ 2^{(1+o(1)) log*|X|} samples. This improves the previous best upper bound of 8^{(1+o(1)) log*|X|} (Beimel et al., RANDOM '13).
Our sample complexity upper and lower bounds also apply to the tasks of learning distributions with respect to Kolmogorov distance and of properly PAC learning thresholds with differential privacy. The lower bound gives the first separation between the sample complexity of properly learning a concept class with (ϵ,δ) differential privacy and learning without privacy. For properly learning thresholds in ℓ dimensions, this lower bound extends to n ≥ Ω(ℓ · log*|X|).
To obtain our results, we give reductions in both directions from releasing and properly learning thresholds and the simpler interior point problem. Given a database D of elements from X, the interior point problem asks for an element between the smallest and largest elements in D. We introduce new recursive constructions for bounding the sample complexity of the interior point problem, as well as further reductions and techniques for proving impossibility results for other basic problems in differential privacy.

bunnissimstemmervadhan.pdf
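
As a concrete anchor for the reductions mentioned above, the interior point problem is easy to state and trivial without privacy; the paper's point is that solving it with (ϵ,δ)-differential privacy already requires a database whose size grows with log*|X|. The snippet below (our illustration, not an algorithm from the paper) just states the problem and its trivial non-private solution.

    def is_interior_point(candidate: int, database: list[int]) -> bool:
        """Interior point problem: given a database D over a totally ordered domain,
        output any y with min(D) <= y <= max(D); y need not belong to D."""
        return min(database) <= candidate <= max(database)

    def nonprivate_interior_point(database: list[int]) -> int:
        """Without privacy, any single element of D is a valid answer."""
        return database[0]

    # Example: is_interior_point(4, [2, 7, 5]) is True.
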
D. Abrams, “What Stays in Vegas: The Road to ‘Zero Privacy’,” New England Law Review, vol. 49, no. 4, 2015. Publisher's Version
L. Sweeney, M. Crosas, and M. Bar-Sinai, “Sharing Sensitive Data with Confidence: The Datatags System,” Technology Science, 2015. Technology Science. Abstract:

Society generates data on a scale previously unimagined. Wide sharing of these data promises to improve personal health, lower healthcare costs, and provide a better quality of life. There is a tendency to want to share data freely. However, these same data often include sensitive information about people that could cause serious harms if shared widely. A multitude of regulations, laws and best practices protect data that contain sensitive personal information. Government agencies, research labs, and corporations that share data, as well as review boards and privacy officers making data sharing decisions, are vigilant but uncertain. This uncertainty creates a tendency not to share data at all. Some data are more harmful than other data; sharing should not be an all-or-nothing choice. How do we share data in ways that ensure access is commensurate with risks of harm?

O. Sheffet, “Private Approximations of the 2nd-Moment Matrix Using Existing Techniques in Linear Regression,” 2015. ArXiv Version. Abstract:

We introduce three differentially private algorithms that approximate the 2nd-moment matrix of the data. These algorithms, which in contrast to existing algorithms output positive-definite matrices, correspond to existing techniques in the linear regression literature. Specifically, we discuss the following three techniques. (i) For Ridge Regression, we propose setting the regularization coefficient so that by approximating the solution using a Johnson-Lindenstrauss transform we preserve privacy. (ii) We show that adding a small batch of random samples to our data preserves differential privacy. (iii) We show that sampling the 2nd-moment matrix from a Bayesian posterior inverse-Wishart distribution is differentially private provided the prior is set correctly. We also evaluate our techniques experimentally and compare them to the existing "Analyze Gauss" algorithm of Dwork et al.
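
To fix notation, the object being approximated is the 2nd-moment (Gram) matrix A^T A of an n×d data matrix A, and technique (i) rests on the fact that a Johnson-Lindenstrauss sketch RA has a Gram matrix approximating A^T A. The snippet below (our illustration) shows only that non-private fact; choosing the ridge regularization coefficient so that the sketch-based solution itself preserves privacy is the paper's contribution and is not reproduced here.

    import numpy as np

    def jl_gram_approximation(A: np.ndarray, r: int, rng=None) -> np.ndarray:
        """Approximate the 2nd-moment matrix A.T @ A with a Johnson-Lindenstrauss
        sketch: R has i.i.d. N(0, 1/r) entries, so E[(R @ A).T @ (R @ A)] = A.T @ A,
        and the approximation concentrates as the sketch dimension r grows.
        Non-private illustration only."""
        rng = np.random.default_rng() if rng is None else rng
        n, _ = A.shape
        R = rng.normal(loc=0.0, scale=1.0 / np.sqrt(r), size=(r, n))
        sketched = R @ A
        return sketched.T @ sketched

    # Example: for A = rng.normal(size=(1000, 5)) and r = 200,
    # jl_gram_approximation(A, 200) is close to A.T @ A.
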

M. Bun and J. Thaler, “Hardness Amplification and the Approximate Degree of Constant-Depth Circuits,” in International Colloquium on Automata, Languages, and Programming (ICALP 2015), 2015. ArXiv Version. hardnessamplification.pdf
A. Askarov, S. Moore, C. Dimoulas, and S. Chong, “Cryptographic Enforcement of Language-Based Erasure,” Proceedings of the 28th IEEE Computer Security Foundations Symposium (CSF), 2015.
L. Sweeney and M. Crosas, “An Open Science Platform for the Next Generation of Data,” ArXiv.org Computer Science, Computers and Society [Internet], 2015. ArXiv Version. Abstract:

Imagine an online work environment where researchers have direct and immediate access to myriad data sources and tools and data management resources, useful throughout the research lifecycle. This is our vision for the next generation of the Dataverse Network: an Open Science Platform (OSP). For the first time, researchers would be able to seamlessly access and create primary and derived data from a variety of sources: prior research results, public data sets, harvested online data, physical instruments, private data collections, and even data from other standalone repositories. Researchers could recruit research participants and conduct research directly on the OSP, if desired, using readily available tools. Researchers could create private or shared workspaces to house data, access tools, and computation and could publish data directly on the platform or publish elsewhere with persistent data citations on the OSP. This manuscript describes the details of an Open Science Platform and its construction. Having an Open Science Platform will especially impact the rate of new scientific discoveries and make scientific findings more credible and accountable.

sweeneycrosas.pdf
L. Sweeney, “All the Data on All the People,” The Privacy Law Scholars Conference (PLSC), UC Berkeley Law School & GWU Law School (Berkeley Center for Law & Technology). Berkeley, CA. 2015.
L. Sweeney, “Privacy as a Sword and Shield in Public Health,” New York City Department of Public Health. New York, NY. 2015.
M. Altman, “Privacy Principles (framing talk),” United Nations Global Pulse Workshop on ICT4D Principle 8: Address Privacy & Security In Development Programs. New York, USA. 2015.
L. Cranor, T. Rabin, V. Shmatikov, S. Vadhan, and D. Weitzner, “Towards a Privacy Research Roadmap for the Computing Community,” Report for the Computing Community Consortium (CCC), 2015. CCC Version