Recently, various protocols have been proposed for securely outsourcing database storage to a third party server, ranging from systems with “full-fledged” security based on strong cryptographic primitives such as fully homomorphic encryption or oblivious RAM, to more practical implementations based on searchable symmetric encryption or even on deterministic and order-preserving encryption. On the flip side, various attacks have emerged that show that for some of these protocols confidentiality of the data can be compromised, usually given certain auxiliary information. We take a step back and identify a need for a formal understanding of the inherent efficiency/privacy trade-off in outsourced database systems, independent of the details of the system. We propose abstract models that capture secure outsourced storage systems in sufficient generality, and identify two basic sources of leakage, namely access pattern and communication volume. We use our models to distinguish certain classes of outsourced database systems that have been proposed, and deduce that all of them exhibit at least one of these leakage sources. We then develop generic reconstruction attacks on any system supporting range queries where either access pattern or communication volume is leaked. These attacks are in a rather weak passive adversarial model, where the untrusted server knows only the underlying query distribution. In particular, to perform our attack the server need not have any prior knowledge about the data, and need not know any of the issued queries nor their results. Yet, the server can reconstruct the secret attribute of every record in the database after about N^4 queries, where N is the domain size. We provide a matching lower bound showing that our attacks are essentially optimal. Our reconstruction attacks using communication volume apply even to systems based on homomorphic encryption or oblivious RAM in the natural way.
Finally, we provide experimental results demonstrating the efficacy of our attacks on real datasets with a variety of different features. On all these datasets, after the required number of queries our attacks successfully recovered the secret attributes of every record in at most a few seconds.
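As a hedged illustration of the frequency-based flavor of these reconstruction attacks (a toy sketch, not the paper's algorithm), the Python code below assumes uniformly random range queries over a domain {1, ..., N} — a simplification of the paper's known-but-arbitrary query distribution — and recovers each record's secret value, up to the inherent reflection v ↔ N+1−v, purely from how often each record appears in the leaked access pattern. All function names here are hypothetical.

```python
import random

def coverage_prob(v, N):
    """Chance a random range (two uniform endpoints on 1..N) covers v:
    1 - P(both endpoints below v) - P(both endpoints above v)."""
    return 1.0 - ((v - 1) / N) ** 2 - ((N - v) / N) ** 2

def reconstruct(secret_values, N, num_queries, seed=0):
    """Recover each record's value (up to the reflection v <-> N+1-v)
    from access-pattern leakage alone: the server only observes which
    records match each query, never the queries or the data."""
    rng = random.Random(seed)
    hits = [0] * len(secret_values)
    for _ in range(num_queries):
        a, b = rng.randint(1, N), rng.randint(1, N)
        lo, hi = min(a, b), max(a, b)
        for i, v in enumerate(secret_values):
            if lo <= v <= hi:          # leaked: record i matched this query
                hits[i] += 1
    guesses = []
    for h in hits:
        freq = h / num_queries
        # Pick the value whose model coverage probability best matches
        # the record's empirical match frequency.
        guesses.append(min(range(1, N + 1),
                           key=lambda v: abs(coverage_prob(v, N) - freq)))
    return guesses
```

Because coverage_prob(v, N) equals coverage_prob(N+1-v, N) under uniform queries, a guess can only be resolved up to that reflection, which mirrors the symmetry inherent in access-pattern leakage.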

%B 23rd ACM Conference on Computer and Communications Security %8 Oct 2016 %G eng %0 Conference Paper %B Security and Privacy Workshops (SPW), 2016 IEEE %D 2016 %T DataTags, Data Handling Policy Spaces and the Tags Language %A Bar-Sinai, Michael %A Sweeney, Latanya %A Merce Crosas %XWidespread sharing of scientific datasets holds great promise for new scientific discoveries and great risks for personal privacy. Dataset handling policies play the critical role of balancing privacy risks and scientific value. We propose an extensible, formal, theoretical model for dataset handling policies. We define binary operators for policy composition and for comparing policy strictness, such that propositions like "this policy is stricter than that policy" can be formally phrased. Using this model, policies can be described in a machine-executable and human-readable way. We further present the Tags programming language and toolset, created especially for working with the proposed model. Tags allows composing interactive, friendly questionnaires which, when given a dataset, can suggest a data handling policy that follows legal and technical guidelines. Currently, creating such a policy is a manual process requiring access to legal and technical experts, who are not always available. We present some of Tags' tools, such as interview systems, visualizers, development environment, and questionnaire inspectors. Finally, we discuss methodologies for questionnaire development. Data for this paper include a questionnaire for suggesting a HIPAA compliant data handling policy, and a formal description of the set of data tags proposed by the authors in a recent paper.

%B Security and Privacy Workshops (SPW), 2016 IEEE %I IEEE %C San Jose, California %8 22-26 May %G eng %U http://ieeexplore.ieee.org/document/7527746/?reload=true&arnumber=7527746 %0 Manuscript %D 2016 %T The Complexity of Differential Privacy %A Salil Vadhan %XDifferential Privacy is a theoretical framework for ensuring the privacy of individual-level data when performing statistical analysis of privacy-sensitive datasets. The goal of this tutorial is to convey the deep connections between differential privacy and a variety of other topics in computational complexity, cryptography, and theoretical computer science at large. This tutorial was written starting from notes taken during a minicourse given by the author and Kunal Talwar at the 26th McGill Invitational Workshop on Computational Complexity in February 2014, at the Bellairs Institute in Holetown, Barbados.

We provide an overview of PSI (“a Private data Sharing Interface”), a system we are developing to enable researchers in the social sciences and other fields to share and explore privacy-sensitive datasets with the strong privacy protections of differential privacy.

Poster presented at Theory and Practice of Differential Privacy (TPDP 2016).

%B Theory and Practice of Differential Privacy %C New York, NY %8 2016 %G eng %U https://arxiv.org/abs/1609.04340 %0 Journal Article %J ACM Transactions on Economics and Computation (TEAC) %D 2016 %T Truthful Mechanisms for Agents that Value Privacy %A Yiling Chen %A Stephen Chong %A Ian Kash %A Tal Moran %A Salil Vadhan %XRecent work has constructed economic mechanisms that are both truthful and differentially private. In these mechanisms, privacy is treated separately from truthfulness; it is not incorporated in players’ utility functions (and doing so has been shown to lead to nontruthfulness in some cases). In this work, we propose a new, general way of modeling privacy in players’ utility functions. Specifically, we only assume that if an outcome *o* has the property that any report of player *i* would have led to *o* with approximately the same probability, then *o* has a small privacy cost to player *i*. We give three mechanisms that are truthful with respect to our modeling of privacy: for an election between two candidates, for a discrete version of the facility location problem, and for a general social choice problem with discrete utilities (via a VCG-like mechanism). As the number *n* of players increases, the social welfare achieved by our mechanisms approaches optimal (as a fraction of *n*).

Stochastic gradient descent procedures have gained popularity for parameter estimation from large data sets. However, their statistical properties are not well understood in theory, and in practice avoiding numerical instability requires careful tuning of key parameters. Here, we introduce implicit stochastic gradient descent procedures, which involve parameter updates that are implicitly defined. Intuitively, implicit updates shrink standard stochastic gradient descent updates. The amount of shrinkage depends on the observed Fisher information matrix, which does not need to be explicitly computed; thus, implicit procedures increase stability without increasing the computational burden. Our theoretical analysis provides the first full characterization of the asymptotic behavior of both standard and implicit stochastic gradient descent-based estimators, including finite-sample error bounds. Importantly, analytical expressions for the variances of these stochastic gradient-based estimators reveal their exact loss of efficiency. We also develop new algorithms to compute implicit stochastic gradient descent-based estimators in practice for generalized linear models, Cox proportional hazards models, and M-estimation, and perform extensive experiments. Our results suggest that implicit stochastic gradient descent procedures are poised to become a workhorse for approximate inference from large data sets.
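As a minimal sketch of the idea (one-dimensional least squares only, not the paper's general procedure): for squared loss the implicit update has a closed form that shrinks the explicit step by 1/(1 + γx²), which is exactly what prevents divergence at large learning rates.

```python
import random

def run_sgd(implicit, gamma=1.0, steps=1000, theta_star=2.0, seed=0):
    """1-D least squares y = theta*x + noise, fit through the origin.
    Explicit SGD:  theta += gamma * (y - theta*x) * x
    Implicit SGD:  theta += gamma * (y - theta_new*x) * x, which for
    squared loss solves in closed form to a shrunken explicit step."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        x = rng.uniform(-3.0, 3.0)
        y = theta_star * x + rng.gauss(0.0, 0.1)
        if implicit:
            # Closed-form implicit update: the step is shrunk by
            # 1 / (1 + gamma * x**2), so the iterate never overshoots.
            theta += gamma / (1.0 + gamma * x * x) * (y - theta * x) * x
        else:
            theta += gamma * (y - theta * x) * x
        if abs(theta) > 1e6:           # iterate blew up (explicit SGD)
            return float("inf")
    return theta
```

With the same deliberately large step size gamma=1, the explicit iterate diverges while the implicit one converges to the true parameter, illustrating the stability claim.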

%B Annals of Statistics %G eng %U https://arxiv.org/abs/1408.2923 %0 Journal Article %J NIPS 2015 Workshop on Learning and Privacy with Incomplete Data and Weak Supervision %D 2016 %T Private Posterior Distributions from Variational Approximations %A Vishesh Karwa %A Dan Kifer %A Aleksandra Slavkovic %XPrivacy preserving mechanisms such as differential privacy inject additional randomness in the form of noise in the data, beyond the sampling mechanism. Ignoring this additional noise can lead to inaccurate and invalid inferences. In this paper, we incorporate the privacy mechanism explicitly into the likelihood function by treating the original data as missing, with an end goal of estimating posterior distributions over model parameters. This leads to a principled way of performing valid statistical inference using private data, however, the corresponding likelihoods are intractable. In this paper, we derive fast and accurate variational approximations to tackle such intractable likelihoods that arise due to privacy. We focus on estimating posterior distributions of parameters of the naive Bayes log-linear model, where the sufficient statistics of this model are shared using a differentially private interface. Using a simulation study, we show that the posterior approximations outperform the naive method of ignoring the noise addition mechanism.

%B NIPS 2015 Workshop on Learning and Privacy with Incomplete Data and Weak Supervision %G eng %U https://arxiv.org/abs/1511.07896 %0 Journal Article %J The Ethics of Biomedical Big Data %D 2016 %T Strictly Biomedical? Sketching the Ethics of the Big Data Ecosystem in Biomedicine %A Effy Vayena %A Gasser, Urs %XIn today’s ever evolving data ecosystem it is evident that data generated for a wide range of purposes unrelated to biomedicine possess tremendous potential value for biomedical research. Analyses of our Google searches, social media content, loyalty card points and the like are used to draw a fairly accurate picture of our health, our future health, our attitudes towards vaccination, disease outbreaks within a county and epidemic trajectories in other continents. These data sets are different from traditional biomedical data, if a biomedical purpose is the categorical variable. Yet the results their analyses yield are of serious biomedical relevance. This paper discusses important but unresolved challenges within typical biomedical data, and it explores examples of non-biomedical Big Data with high biomedical value, including the specific conundrums these engender, especially when we apply biomedical data concepts to them. It also highlights the “digital phenotype” project, illustrating the Big Data ecosystem in action and an approach believed likely to yield biomedical and health knowledge. We argue that to address the challenges and make full use of the opportunities that Big Data offers to biomedicine, a new ethical framework taking a data ecosystem approach is urgently needed. We conclude by discussing key components, design requirements and substantive normative elements of such a framework.

%B The Ethics of Biomedical Big Data %G eng %U http://link.springer.com/chapter/10.1007/978-3-319-33525-4_2 %0 Journal Article %J Conference on Learning Theory (COLT) %D 2016 %T Adaptive Learning with Robust Generalization Guarantees %A Rachel Cummings %A Katrina Ligett %A Kobbi Nissim %A Aaron Roth %A Zhiwei Steven Wu %XThe traditional notion of generalization---i.e., learning a hypothesis whose empirical error is close to its true error---is surprisingly brittle. As has recently been noted in [DFH+15b], even if several algorithms have this guarantee in isolation, the guarantee need not hold if the algorithms are composed adaptively. In this paper, we study three notions of generalization---increasing in strength---that are robust to postprocessing and amenable to adaptive composition, and examine the relationships between them. We call the weakest such notion Robust Generalization. A second, intermediate, notion is the stability guarantee known as differential privacy. The strongest guarantee we consider we call Perfect Generalization. We prove that every hypothesis class that is PAC learnable is also PAC learnable in a robustly generalizing fashion, with almost the same sample complexity. It was previously known that differentially private algorithms satisfy robust generalization. In this paper, we show that robust generalization is a strictly weaker concept, and that there is a learning task that can be carried out subject to robust generalization guarantees, yet cannot be carried out subject to differential privacy. We also show that perfect generalization is a strictly stronger guarantee than differential privacy, but that, nevertheless, many learning tasks can be carried out subject to the guarantees of perfect generalization.

%B Conference on Learning Theory (COLT) %G eng %U https://arxiv.org/abs/1602.07726 %0 Journal Article %J 48th Annual Symposium on the Theory of Computing %D 2016 %T Algorithmic Stability for Adaptive Data Analysis %A Raef Bassily %A Kobbi Nissim %A Smith, Adam %A Thomas Steinke %A Uri Stemmer %A Jonathan Ullman %XAdaptivity is an important feature of data analysis---the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014) initiated the formal study of this problem, and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution P and a set of n independent samples x is drawn from P. We seek an algorithm that, given x as input, accurately answers a sequence of adaptively chosen queries about the unknown distribution P. How many samples n must we draw from the distribution, as a function of the type of queries, the number of queries, and the desired level of accuracy? In this work we make two new contributions: (i) We give upper bounds on the number of samples n that are needed to answer statistical queries. The bounds improve and simplify the work of Dwork et al. (STOC, 2015), and have been applied in subsequent work by those authors (Science, 2015, NIPS, 2015). (ii) We prove the first upper bounds on the number of samples required to answer more general families of queries. These include arbitrary low-sensitivity queries and an important class of optimization queries. As in Dwork et al., our algorithms are based on a connection with algorithmic stability in the form of differential privacy. 
We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that stability implies low generalization error. We also study weaker stability guarantees such as bounded KL divergence and total variation distance.
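A minimal sketch of the underlying connection, in the simplest case: answering a statistical query (a bounded function averaged over the sample) with the Laplace mechanism, whose ε-differential privacy is what drives the stability-based generalization bounds. The paper's algorithms for low-sensitivity and optimization queries are more involved; the names below are illustrative.

```python
import random

def laplace(scale, rng):
    # The difference of two exponentials is Laplace-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def answer_query(sample, query, eps, rng):
    """Answer a statistical query q: X -> [0,1] with the Laplace
    mechanism. Changing one of the n sample points moves the empirical
    mean by at most 1/n, so noise of scale 1/(eps*n) gives eps-DP."""
    n = len(sample)
    emp = sum(query(x) for x in sample) / n
    return emp + laplace(1.0 / (eps * n), rng)
```

Each noisy answer deviates from the empirical mean only by noise on the order of 1/(εn), so accuracy degrades gracefully as the privacy (and hence stability) guarantee is strengthened.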

%B 48th Annual Symposium on the Theory of Computing %G eng %U https://arxiv.org/abs/1511.02513v1 %0 Journal Article %J PLoS Med %D 2016 %T Between Openness and Privacy in Genomics. %A Effy Vayena %A Gasser, Urs %X- Advancing genomic research depends on the accessing and sharing of genomic data. However, the increasing need for sharing escalates the tension between genomic privacy and openness.
- Promoting openness while protecting privacy is a challenge that cannot be overcome only with technical solutions such as encryption and differential privacy. Although such solutions are crucial, we still need to confront some fundamental normative tensions that are intensified in the era of genomics and big data. Here are at least three:
- The right to genomic privacy is not an absolute right. If privacy is understood as control over information or data, privacy is not about maximal levels of control, but rather about reasonable measures of control.
- Although individual control is necessary, it is not a sufficient safeguard of privacy. Individuals’ willingness to be open about their data does not obviate responsibility for reducing privacy risks on the part of the data users.
- Data governance models, such as data cooperatives, that enable meaningful and continuous roles of the individuals whose data are at stake hold promise for reconciling privacy and openness.

This is a graduate textbook of advanced tutorials on the theory of cryptography and computational complexity. In particular, the chapters explain aspects of garbled circuits, public-key cryptography, pseudorandom functions, one-way functions, homomorphic encryption, the simulation proof technique, and the complexity of differential privacy. Most chapters progress methodically through motivations, foundations, definitions, major results, issues surrounding feasibility, surveys of recent developments, and suggestions for further study.

This book honors Professor Oded Goldreich, a pioneering scientist, educator, and mentor. Oded was instrumental in laying down the foundations of cryptography, and he inspired the contributing authors, Benny Applebaum, Boaz Barak, Andrej Bogdanov, Iftach Haitner, Shai Halevi, Yehuda Lindell, Alon Rosen, and Salil Vadhan, themselves leading researchers on the theory of cryptography and computational complexity. The book is appropriate for graduate tutorials and seminars, and for self-study by experienced researchers, assuming prior knowledge of the theory of cryptography.

%B Tutorials on the Foundations of Cryptography %I Yehuda Lindell %G eng %0 Journal Article %J 14th Theory of Cryptography Conference %D 2016 %T Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds %A Mark Bun %A Thomas Steinke %X"Concentrated differential privacy" was recently introduced by Dwork and Rothblum as a relaxation of differential privacy, which permits sharper analyses of many privacy-preserving computations. We present an alternative formulation of the concept of concentrated differential privacy in terms of the Renyi divergence between the distributions obtained by running an algorithm on neighboring inputs. With this reformulation in hand, we prove sharper quantitative results, establish lower bounds, and raise a few new questions. We also unify this approach with approximate differential privacy by giving an appropriate definition of "approximate concentrated differential privacy."
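As a numerical sanity check of the Rényi-divergence view (a standard fact, not code from the paper): for the Gaussian mechanism with sensitivity Δ and noise σ, the Rényi divergence of order α between the output distributions on neighboring inputs is αΔ²/(2σ²). The sketch below verifies this by direct grid integration, working in log space to avoid underflow in the tails.

```python
import math

def renyi_gaussians(alpha, delta, sigma, lo=-40.0, hi=40.0, step=1e-3):
    """Riemann-sum estimate of D_alpha(N(0, s^2) || N(delta, s^2)) =
    (1/(alpha-1)) * ln INT p(x)^alpha q(x)^(1-alpha) dx."""
    log_norm = -math.log(sigma * math.sqrt(2.0 * math.pi))
    def logpdf(x, mu):
        return log_norm - (x - mu) ** 2 / (2.0 * sigma ** 2)
    total, x = 0.0, lo
    while x < hi:
        li = alpha * logpdf(x, 0.0) + (1.0 - alpha) * logpdf(x, delta)
        total += math.exp(li) * step   # exp underflows harmlessly to 0 in the tails
        x += step
    return math.log(total) / (alpha - 1.0)
```

The numeric value matches the closed form αΔ²/(2σ²), e.g. α=2, Δ=σ=1 gives exactly 1.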

%B 14th Theory of Cryptography Conference %G eng %U https://arxiv.org/abs/1605.02065 %0 Journal Article %J 23rd ACM Conference on Computer and Communications Security, CCS %D 2016 %T Differentially Private Bayesian Programming %A G. Barthe %A P.Y. Strub %A J. Hsu %A A. D. Gordon %A E. J. Gallego Arias %A M. Gaboardi %A G. P. Farina %B 23rd ACM Conference on Computer and Communications Security, CCS %G eng %0 Journal Article %D 2016 %T Differentially Private Chi-Squared Hypothesis Testing: Goodness of Fit and Independence Testing %A Marco Gaboardi %A Hyun woo Lim %A Ryan Rogers %A Salil Vadhan %XHypothesis testing is a useful statistical tool in determining whether a given model should be rejected based on a sample from the population. Sample data may contain sensitive information about individuals, such as medical information. Thus it is important to design statistical tests that guarantee the privacy of subjects in the data. In this work, we study hypothesis testing subject to differential privacy, specifically chi-squared tests for goodness of fit for multinomial data and independence between two categorical variables.

We propose new tests for goodness of fit and independence testing that like the classical versions can be used to determine whether a given model should be rejected or not, and that additionally can ensure differential privacy. We give both Monte Carlo based hypothesis tests as well as hypothesis tests that more closely follow the classical chi-squared goodness of fit test and the Pearson chi-squared test for independence. Crucially, our tests account for the distribution of the noise that is injected to ensure privacy in determining significance.

We show that these tests can be used to achieve desired significance levels, in sharp contrast to direct applications of classical tests to differentially private contingency tables which can result in wildly varying significance levels. Moreover, we study the statistical power of these tests. We empirically show that to achieve the same level of power as the classical non-private tests our new tests need only a relatively modest increase in sample size.
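A hedged sketch of the Monte Carlo calibration idea (simplified; the function names and the choice of ε-DP Laplace noise with histogram sensitivity 2 are illustrative assumptions, not the paper's exact tests): simulate the null distribution of the chi-squared statistic *including* the privacy noise, and take the critical value from that simulated distribution.

```python
import random

def laplace(scale, rng):
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def chi2_stat(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def private_gof_critical(null_probs, n, eps, sims=2000, level=0.05, seed=0):
    """Monte Carlo critical value for a chi-squared goodness-of-fit
    statistic computed on Laplace-noised counts (histogram sensitivity 2,
    hence per-cell noise scale 2/eps). The null is simulated with the
    noise-addition mechanism included, so significance stays valid."""
    rng = random.Random(seed)
    k = len(null_probs)
    expected = [n * p for p in null_probs]
    stats = []
    for _ in range(sims):
        draws = rng.choices(range(k), weights=null_probs, k=n)
        counts = [0] * k
        for d in draws:
            counts[d] += 1
        noisy = [c + laplace(2.0 / eps, rng) for c in counts]
        stats.append(chi2_stat(noisy, expected))
    stats.sort()
    return stats[int((1 - level) * sims)]
```

The resulting critical value sits well above the classical χ² quantile (about 5.99 for two degrees of freedom at the 0.05 level), which is exactly why plugging noisy tables into the classical test yields wildly varying significance levels.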

Emerging large-scale data sources hold tremendous potential for new scientific research into human biology, behaviors, and relationships. At the same time, big data research presents privacy and ethical challenges that the current regulatory framework is ill-suited to address. In light of the immense value of large-scale research data, the central question moving forward is not whether such data should be made available for research, but rather how the benefits can be captured in a way that respects fundamental principles of ethics and privacy.

In response, this Essay outlines elements of a new ethical framework for big data research. It argues that oversight should aim to provide universal coverage of human subjects research, regardless of funding source, across all stages of the information lifecycle. New definitions and standards should be developed based on a modern understanding of privacy science and the expectations of research subjects. In addition, researchers and review boards should be encouraged to incorporate systematic risk-benefit assessments and new procedural and technological solutions from the wide range of interventions that are available. Finally, oversight mechanisms and the safeguards implemented should be tailored to the intended uses, benefits, threats, harms, and vulnerabilities associated with a specific research activity.

Development of a new ethical framework with these elements should be the product of a dynamic multistakeholder process that is designed to capture the latest scientific understanding of privacy, analytical methods, available safeguards, community and social norms, and best practices for research ethics as they evolve over time. Such a framework would support big data utilization and help harness the value of big data in a sustainable and trust-building manner.

%B Washington and Lee Law Review %V 72 %8 31 Mar, 2016 %G eng %U http://lawreview.journals.wlu.io/elements-of-a-new-ethical-framework-for-big-data-research/ %N 3 %0 Journal Article %J Annals of Statistics %D 2016 %T Inference Using Noisy Degrees: Differentially Private Beta-Model and Synthetic Graphs %A Vishesh Karwa %A Aleksandra Slavković %XThe β-model of random graphs is an exponential family model with the degree sequence as a sufficient statistic. In this paper, we contribute three key results. First, we characterize conditions that lead to a quadratic time algorithm to check for the existence of MLE of the β-model, and show that the MLE never exists for the degree partition β-model. Second, motivated by privacy problems with network data, we derive a differentially private estimator of the parameters of β-model, and show it is consistent and asymptotically normally distributed - it achieves the same rate of convergence as the nonprivate estimator. We present an efficient algorithm for the private estimator that can be used to release synthetic graphs. Our techniques can also be used to release degree distributions and degree partitions accurately and privately, and to perform inference from noisy degrees arising from contexts other than privacy. We evaluate the proposed estimator on real graphs and compare it with a current algorithm for releasing degree distributions and find that it does significantly better. Finally, our paper addresses shortcomings of current approaches to a fundamental problem of how to perform valid statistical inference from data released by privacy mechanisms, and lays a foundational groundwork on how to achieve optimal and private statistical inference in a principled manner by modeling the privacy mechanism; these principles should be applicable to a class of models beyond the β-model.

%B Annals of Statistics %V 44 %P 87-112 %G eng %N 1 %0 Journal Article %J Proceedings of the 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS) %D 2016 %T Lipschitz Extensions for Node-Private Graph Statistics and the Generalized Exponential Mechanism %A Raskhodnikova, Sofya %A Smith, Adam %B Proceedings of the 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS) %G eng %0 Conference Paper %B PODS 2016 %D 2016 %T Locating a Small Cluster Privately %A Kobbi Nissim %A Uri Stemmer %A Salil Vadhan %XWe present a new algorithm for locating a small cluster of points with differential privacy [Dwork, McSherry, Nissim, and Smith, 2006]. Our algorithm has implications for private data exploration, clustering, and removal of outliers. Furthermore, we use it to significantly relax the requirements of the sample and aggregate technique [Nissim, Raskhodnikova, and Smith, 2007], which allows the compilation of "off the shelf" (non-private) analyses into analyses that preserve differential privacy.
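The sample-and-aggregate technique that the paper relaxes can be sketched as follows. This is a minimal illustrative version using a clipped, Laplace-noised average as the aggregator, rather than the paper's private cluster-location algorithm; all names are hypothetical.

```python
import random

def laplace(scale, rng):
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def sample_and_aggregate(data, estimator, k, lo, hi, eps, rng):
    """Split the data into k disjoint blocks, run the (non-private)
    estimator on each block, then privately aggregate the k block
    estimates with a clipped, Laplace-noised average. One individual
    affects one block, hence one estimate, so the sensitivity of the
    average is (hi - lo) / k."""
    m = len(data) // k
    blocks = [data[i * m:(i + 1) * m] for i in range(k)]
    ests = [min(max(estimator(b), lo), hi) for b in blocks]
    avg = sum(ests) / k
    return avg + laplace((hi - lo) / (k * eps), rng)
```

When the block estimates concentrate in a small cluster, as in well-behaved estimation problems, the noisy aggregate is accurate; the paper's algorithm exploits exactly this concentration more directly.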

%B PODS 2016 %C ACM SIGMOD/PODS Conference, San Francisco, USA, 2016 %8 June 26 2016 %G eng %0 Conference Paper %B Proceedings of the 12th Theory of Cryptography Conference (TCC 2016) %D 2016 %T Order revealing encryption and the hardness of private learning %A Mark Bun %A Mark Zhandry %XAn order-revealing encryption scheme gives a public procedure by which two ciphertexts can be compared to reveal the ordering of their underlying plaintexts. We show how to use order-revealing encryption to separate computationally efficient PAC learning from efficient (ϵ,δ)-differentially private PAC learning. That is, we construct a concept class that is efficiently PAC learnable, but for which every efficient learner fails to be differentially private. This answers a question of Kasiviswanathan et al. (FOCS '08, SIAM J. Comput. '11).

To prove our result, we give a generic transformation from an order-revealing encryption scheme into one with strongly correct comparison, which enables the consistent comparison of ciphertexts that are not obtained as the valid encryption of any message. We believe this construction may be of independent interest.

Increasingly, governments and businesses are collecting, analyzing, and sharing detailed information about individuals over long periods of time. Vast quantities of data from new sources and novel methods for large-scale data analysis promise to yield deeper understanding of human characteristics, behavior, and relationships and advance the state of science, public policy, and innovation. At the same time, the collection and use of fine-grained personal data over time is associated with significant risks to individuals, groups, and society at large. In this article, we examine a range of long-term data collections, conducted by researchers in social science, in order to identify the characteristics of these programs that drive their unique sets of risks and benefits. We also examine the practices that have been established by social scientists to protect the privacy of data subjects in light of the challenges presented in long-term studies. We argue that many uses of big data, across academic, government, and industry settings, have characteristics similar to those of traditional long-term research studies. We discuss the lessons that can be learned from longstanding data management practices in research and potentially applied in the context of newly emerging data sources and uses.

%B Brussels Privacy Symposium %C Brussels, Belgium %8 2016 %G eng %0 Journal Article %J Neural Information Processing Systems (NIPS) %D 2016 %T Privacy Odometers and Filters: Pay-as-you-Go Composition %A Ryan Rogers %A Aaron Roth %A Jonathan Ullman %A Salil Vadhan %XIn this paper we initiate the study of adaptive composition in differential privacy when the length of the composition, and the privacy parameters themselves can be chosen adaptively, as a function of the outcome of previously run analyses. This case is much more delicate than the setting covered by existing composition theorems, in which the algorithms themselves can be chosen adaptively, but the privacy parameters must be fixed up front. Indeed, it isn't even clear how to define differential privacy in the adaptive parameter setting. We proceed by defining two objects which cover the two main use cases of composition theorems. A privacy filter is a stopping time rule that allows an analyst to halt a computation before his pre-specified privacy budget is exceeded. A privacy odometer allows the analyst to track realized privacy loss as he goes, without needing to pre-specify a privacy budget. We show that unlike the case in which privacy parameters are fixed, in the adaptive parameter setting, these two use cases are distinct. We show that there exist privacy filters with bounds comparable (up to constants) with existing privacy composition theorems. We also give a privacy odometer that nearly matches non-adaptive private composition theorems, but is sometimes worse by a small asymptotic factor. Moreover, we show that this is inherent, and that any valid privacy odometer in the adaptive parameter setting must lose this factor, which shows a formal separation between the filter and odometer use-cases.
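A minimal sketch of the privacy-filter use case under basic composition (the paper's filters and odometers also handle (ε, δ) parameters and advanced composition; this toy version only tracks a running sum of adaptively chosen ε's):

```python
class PrivacyFilter:
    """Pay-as-you-go basic composition: the analyst may choose each
    epsilon adaptively, and HALT is returned once the running sum
    would exceed the pre-specified privacy budget."""
    def __init__(self, budget):
        self.budget = budget
        self.spent = 0.0

    def spend(self, eps):
        if self.spent + eps > self.budget:
            return "HALT"          # refuse; the budget stays intact
        self.spent += eps
        return "CONTINUE"
```

Note the filter refuses a request that would overrun the budget without consuming it, so a smaller subsequent request can still proceed; an odometer, by contrast, would simply report the realized loss so far.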

%B Neural Information Processing Systems (NIPS) %8 2016 %G eng %0 Journal Article %J 43rd International Colloquium on Automata, Languages, and Programming %D 2016 %T Sensitivity of Counting Queries %A Myrto Arapinis %A Diego Figueira %A Marco Gaboardi %XIn the context of statistical databases, the release of accurate statistical information about the collected data often puts at risk the privacy of the individual contributors. The goal of differential privacy is to maximize the utility of a query while protecting the individual records in the database. A natural way to achieve differential privacy is to add statistical noise to the result of the query. In this context, a mechanism for releasing statistical information is thus a trade-off between utility and privacy. In order to balance these two "conflicting" requirements, privacy preserving mechanisms calibrate the added noise to the so-called sensitivity of the query, and thus a precise estimate of the sensitivity of the query is necessary to determine the amplitude of the noise to be added. In this paper, we initiate a systematic study of sensitivity of counting queries over relational databases. We first observe that the sensitivity of a Relational Algebra query with counting is not computable in general, and that while the sensitivity of Conjunctive Queries with counting is computable, it becomes unbounded as soon as the query includes a join. We then consider restricted classes of databases (databases with constraints), and study the problem of computing the sensitivity of a query given such constraints. We are able to establish bounds on the sensitivity of counting conjunctive queries over constrained databases. The kind of constraints studied here are: functional dependencies and cardinality dependencies. The latter is a natural generalization of functional dependencies that allows us to provide tight bounds on the sensitivity of counting conjunctive queries.
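A small sketch of sensitivity-calibrated noise (the standard Laplace mechanism, not code from the paper): the point of bounding a counting query's sensitivity is that the bound directly fixes the noise scale.

```python
import random

def laplace(scale, rng):
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(true_count, sensitivity, eps, rng):
    """Laplace mechanism: if adding or removing one individual changes
    the count by at most `sensitivity`, then noise drawn from
    Lap(sensitivity / eps) yields eps-differential privacy."""
    return true_count + laplace(sensitivity / eps, rng)
```

For a plain SELECT COUNT(*) the sensitivity is 1; for a count over a join where a cardinality dependency guarantees that each individual matches at most c joined rows, the sensitivity is bounded by c; without such a constraint the sensitivity of a join count is unbounded, as the abstract above notes.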

%B 43rd International Colloquium on Automata, Languages, and Programming %G eng %0 Journal Article %J Proceedings of the 14th Theory of Cryptography Conference (TCC 2016-B) %D 2016 %T Separating Computational and Statistical Differential Privacy in the Client-Server Model %A Mark Bun %A Yi Hsiu Chen %A Salil Vadhan %XDifferential privacy is a mathematical definition of privacy for statistical data analysis. It guarantees that any (possibly adversarial) data analyst is unable to learn too much information that is specific to an individual. Mironov et al. (CRYPTO 2009) proposed several computational relaxations of differential privacy (CDP), which relax this guarantee to hold only against computationally bounded adversaries. Their work and subsequent work showed that CDP can yield substantial accuracy improvements in various multiparty privacy problems. However, these works left open whether such improvements are possible in the traditional client-server model of data analysis. In fact, Groce, Katz and Yerukhimovich (TCC 2011) showed that, in this setting, it is impossible to take advantage of CDP for many natural statistical tasks. Our main result shows that, assuming the existence of sub-exponentially secure one-way functions and 2-message witness indistinguishable proofs (zaps) for NP, there is in fact a computational task in the client-server model that can be efficiently performed with CDP, but is infeasible to perform with information-theoretic differential privacy.

%B Proceedings of the 14th Theory of Cryptography Conference (TCC 2016-B) %8 Aug 2016 %G eng %0 Journal Article %J Berkeley Technology Law Journal %D 2016 %T Towards a Modern Approach to Privacy-Aware Government Data Releases %A Micah Altman %A Alexandra Wood %A O'Brien, David %A Salil Vadhan %A Gasser, Urs %XThis article summarizes research exploring various models by which governments release data to the public and the interventions in place to protect the privacy of individuals in the data. Applying concepts from the recent scientific and legal literature on privacy, the authors propose a framework for a modern privacy analysis and illustrate how governments can use the framework to select appropriate privacy controls that are calibrated to the specific benefits and risks in individual data releases.

%B Berkeley Technology Law Journal %V 30 %G eng %N 3 %0 Thesis %B Computer Science Department at Harvard University %D 2016 %T Upper and Lower Bounds for Privacy and Adaptivity in Algorithmic Data Analysis %A Thomas Steinke %B Computer Science Department at Harvard University %G eng %9 PhD Thesis