This article presents a self-help software system that makes rights accessible through an on-line interview. The interview is based on a formal model of the relevant jurisprudence and does not require the involvement of a service representative, only a user who wants to understand his or her rights. In addition, the article provides a methodology for building models and interviews for similar social contexts and describes building a model for workers’ rights according to Israeli law, upon completing their employment. In addition to conducting interviews, these models can be used to create diagrams and perform legal queries. This kind of system can fulfill a central role in empowering disadvantaged populations, as it enables people to asses their rights in a user-friendly manner, which is personalized to the situation of the interviewee and not overburdened with large amounts of information that it is difficult to navigate.
In a recent paper, Chan et al. [SODA ’19] proposed a relaxation of the notion of (full) memory obliviousness, which was introduced by Goldreich and Ostrovsky [J. ACM ’96] and extensively researched by cryptographers. The new notion, differential obliviousness, requires that any two neighboring inputs exhibit similar memory access patterns, where the similarity requirement is that of differential privacy.
Chan et al. demonstrated that differential obliviousness allows achieving improved efficiency for several algorithmic tasks, including sorting, merging of sorted lists, and range query data structures.
In this work, we continue the exploration of differential obliviousness, focusing on algorithms that do not necessarily examine all their input. This choice is motivated by the fact that the existence of logarithmic overhead ORAM protocols implies that differential obliviousness can yield at most a logarithmic improvement in efficiency for computations that need to examine all their input. In particular, we explore property testing, where we show that differential obliviousness yields an almost linear improvement in overhead in the dense graph model, and at most quadratic improvement in the bounded degree model.
We also explore tasks where a non-oblivious algorithm would need to explore different portions of the input, where the latter would depend on the input itself, and where we show that such a behavior can be maintained under differential obliviousness, but not under full obliviousness. Our examples suggest that there would be benefits in further exploring which class of computational tasks are amenable to differential obliviousness.
Symbolic execution is a classical program analysis technique used to show that programs satisfy or violate given specifications. In this work we generalize symbolic execution to support program analysis for relational specifications in the form of relational properties - these are properties about two runs of two programs on related inputs, or about two executions of a single program on related inputs. Relational properties are useful to formalize notions in security and privacy, and to reason about program optimizations. We design a relational symbolic execution engine, named RelSymwhich supports interactive refutation, as well as proving of relational properties for programs written in a language with arrays and for-like loops.
We consider the problem of designing and analyzing differentially private algorithms that can be implemented on discrete models of computation in strict polynomial time, motivated by known attacks on floating point implementations of real-arithmetic differentially private algorithms (Mironov, CCS 2012) and the potential for timing attacks on expected polynomial-time algorithms. As a case study, we examine the basic problem of approximating the histogram of a categorical dataset over a possibly large data universe \(X\). The classic Laplace Mechanism (Dwork, McSherry, Nissim, Smith, TCC 2006 and J. Privacy & Condentiality 2017) does not satisfy our requirements, as it is based on real arithmetic, and natural discrete analogues, such as the Geometric Mechanism (Ghosh, Roughgarden, Sundarajan, STOC 2009 and SICOMP 2012), take time at least linear in \(|X|\), which can be exponential in the bit length of the input.
In this paper, we provide strict polynomial-time discrete algorithms for approximate histograms whose simultaneous accuracy (the maximum error over all bins) matches that of the Laplace Mechanism up to constant factors, while retaining the same (pure) differential privacy guarantee. One of our algorithms produces a sparse histogram as output. Its "per-bin accuracy" (the error on individual bins) is worse than that of the Laplace Mechanism by a factor of \(\log |X|\), but we prove a lower bound showing that this is necessary for any algorithm that produces a sparse histogram. A second algorithm avoids this lower bound, and matches the per-bin accuracy of the Laplace Mechanism, by producing a compact and eciently computable representation of a dense histogram; it is based on an \((n + 1)\) - wise independent implementation of an appropriately clamped version of the Discrete Geometric Mechanism.
This work studies differential privacy in the context of the recently proposed shuffle model. Unlike in the local model, where the server collecting privatized data from users can track back an input to a specific user, in the shuffle model users submit their privatized inputs to a server anonymously. This setup yields a trust model which sits in between the classical curator and local models for differential privacy. The shuffle model is the core idea in the Encode, Shuffle, Analyze (ESA) model introduced by Bittau et al. (SOPS 2017). Recent work by Cheu et al. (EUROCRYPT 2019) analyzes the differential privacy properties of the shuffle model and shows that in some cases shuffled protocols provide strictly better accuracy than local protocols. Additionally, Erlingsson et al. (SODA 2019) provide a privacy amplification bound quantifying the level of curator differential privacy achieved by the shuffle model in terms of the local differential privacy of the randomizer used by each user. In this context, we make three contributions. First, we provide an optimal single message protocol for summation of real numbers in the shuffle model. Our protocol is very simple and has better accuracy and communication than the protocols for this same problem proposed by Cheu et al. Optimality of this protocol follows from our second contribution, a new lower bound for the accuracy of private protocols for summation of real numbers in the shuffle model. The third contribution is a new amplification bound for analyzing the privacy of protocols in the shuffle model in terms of the privacy provided by the corresponding local randomizer. Our amplification bound generalizes the results by Erlingsson et al. to a wider range of parameters, and provides a whole family of methods to analyze privacy amplification in the shuffle model.
Distributed applications cannot assume that their security policies will be enforced on untrusted hosts. Trusted execution environments (TEEs) combined with cryptographic mechanisms enable execution of known code on an untrusted host and the exchange of confidential and authenticated messages with it. TEEs do not, however, establish the trustworthiness of code executing in a TEE. Thus, developing secure applications using TEEs requires specialized expertise and careful auditing. This paper presents DFLATE, a core security calculus for distributed applications with TEEs. DFLATE offers high-level abstractions that reflect both the guarantees and limitations of the underlying security mechanisms they are based on. The accuracy of these abstractions is exhibited by asymmetry between confidentiality and integrity in our formal results: DFLATE enforces a strong form of noninterference for confidentiality, but only a weak form for integrity. This reflects the asymmetry of the security guarantees of a TEE: a malicious host cannot access secrets in the TEE or modify its contents, but they can suppress or manipulate the sequence of its inputs and outputs. Therefore DFLATE cannot protect against the suppression of high-integrity messages, but when these messages are delivered, their contents cannot have been influenced by an attacker.
A secret-sharing scheme allows some authorized sets of parties to reconstruct a secret; the collection of authorized sets is called the access structure. For over 30 years, it was known that any (monotone) collection of authorized sets can be realized by a secret-sharing scheme whose shares are of size2n−o(n)and until recently no better scheme was known. In a recent breakthrough, Liu and Vaikuntanathan (STOC 2018) have reduced the share size to O(20.994n) Our first contribution is improving the exponent of secret sharing down to 0.892. For the special case of linear secret-sharing schemes, we get an exponent of 0.942 (compared to 0.999 of Liu and Vaikuntanathan).Motivated by the construction of Liu and Vaikuntanathan, we study secret-sharing schemes for uniform access structures. An access structure is k-uniform if all sets of size larger than k are authorized, all sets of size smaller than k are unauthorized, and each set of size k can be either authorized or unauthorized. The construction of Liu and Vaikuntanathan starts from protocols for conditional disclosure of secrets, constructs secret-sharing schemes for uniform access structures from them, and combines these schemes in order to obtain secret-sharing schemes for general access structures. Our second contribution in this paper is constructions of secret-sharing schemes for uniform access structures. We achieve the following results:A secret-sharing scheme for k-uniform access structures for large secrets in which the share size is O(k2) times the size of the secret.A linear secret-sharing scheme for k-uniform access structures for a binary secret in which the share size is~O(2h(k/n)n/2) (where h is the binary entropy function). By counting arguments, this construction is optimal (up to polynomial factors).A secret-sharing scheme for k-uniform access structures for a binary secret in which the share size is 2~O(√klogn). Our third contribution is a construction of ad-hoc PSM protocols, i.e., PSM protocols in which only a subset of the parties will compute a function on their inputs. This result is based on ideas we used in the construction of secret-sharing schemes for k-uniform access structures for a binary secret.
Hypothesis testing plays a central role in statistical inference, and is used in many settings where privacy concerns are paramount. This work answers a basic question about privately testing simple hypotheses: given two distributions P and Q, and a privacy level ε, how many i.i.d. samples are needed to distinguish P from Q subject to ε-differential privacy, and what sort of tests have optimal sample complexity? Specifically, we characterize this sample complexity up to constant factors in terms of the structure of P and Q and the privacy level ε, and show that this sample complexity is achieved by a certain randomized and clamped variant of the log-likelihood ratio test. Our result is an analogue of the classical Neyman–Pearson lemma in the setting of private hypothesis testing. We also give an application of our result to private change-point detection. Our characterization applies more generally to hypothesis tests satisfying essentially any notion of algorithmic stability, which is known to imply strong generalization bounds in adaptive data analysis, and thus our results have applications even when privacy is not a primary concern.
A new line of work [6, 9, 15, 2] demonstrates how differential privacy  can be used as a mathematical tool for guaranteeing generalization in adaptive data analysis. Specifically, if a differentially private analysis is applied on a sample S of i.i.d. examples to select a lowsensitivity function f , then w.h.p. f (S) is close to its expectation, although f is being chosen based on the data. Very recently, Steinke and Ullman  observed that these generalization guarantees can be used for proving concentration bounds in the non-adaptive setting, where the low-sensitivity function is fixed beforehand. In particular, they obtain alternative proofs for classical concentration bounds for low-sensitivity functions, such as the Chernoff bound and McDiarmid’s Inequality. In this work, we set out to examine the situation for functions with high-sensitivity, for which differential privacy does not imply generalization guarantees under adaptive analysis. We show that differential privacy can be used to prove concentration bounds for such functions in the non-adaptive setting.
We present a private learner for halfspaces over an arbitrary finite domain X⊂ℝd with sample complexity mathrmpoly(d,2log∗|X|). The building block for this learner is a differentially private algorithm for locating an approximate center point of m>poly(d,2log∗|X|) points -- a high dimensional generalization of the median function. Our construction establishes a relationship between these two problems that is reminiscent of the relation between the median and learning one-dimensional thresholds [Bun et al.\ FOCS '15]. This relationship suggests that the problem of privately locating a center point may have further applications in the design of differentially private algorithms. We also provide a lower bound on the sample complexity for privately finding a point in the convex hull. For approximate differential privacy, we show a lower bound of m=Ω(d+log∗|X|), whereas for pure differential privacy m=Ω(d log|X|).
Is it possible to piece together the confidential data of almost everyone in the US from statistics published by the Census Bureau—without breaching Census security or policy? Could someone—a doctor, a nosy neighbor, or a foreign state actor—determine whether a particular person participated in a genetic study of hundreds of individuals, when each individual contributed only tiny trace amounts of DNA to a highly complex and aggregated genetic mixture? Could police detectives re-trace a suspect’s every movement over the course of many months and thereby learn intimate details about the suspect’s political, religious, and sexual associations—without having to deploy any sort of surveillance or tracking devices? Could someone reliably deduce the sexual preferences of a Facebook user without looking at any content that user has shared?
Until recently, most people probably never imagined that their highly sensitive personal data could be so vulnerable to discovery from seemingly innocuous sources. Many continue to believe that the privacy risks from purely public, statistical, and anonymised data are merely theoretical, and that the practical risks are negligibly small. Yet all of the privacy violations described above are not only theoretically possible—they have already been successfully executed.
The foregoing examples of real-world privacy attacks all leverage one particular vulnerability that we refer to as composition effects. This vulnerability stems from the cumulative erosions of privacy that inhere in every piece of data about people. These erosions occur no matter how aggregated, insignificant, or anonymised the data may seem, and even small erosions can combine in unanticipated ways to create big risks.
Privacy and data protection failures from unanticipated composition effects reflect a type of data myopia—a short-sighted approach toward addressing increasingly-ubiquitous surveillance and privacy risks from Big Data analytics, characterized by a near-total focus on individual data processors and processes and by pervasive underestimation of systemic risks accumulating from independent data products. The failure to recognize accumulation of risk in the information ecosystem reflects a more general societal blind spot to cumulative systemic risks, with parallels in collective failures to foresee or forestall global financial crises, and to adequately address mounting risks to the natural environment.
As the volume and complexity of data uses and publications grow rapidly across a broad range of contexts, the need to develop frameworks for addressing cumulative privacy risks is likely to become an increasingly urgent and widespread problem. Threats to privacy are growing due to the accelerating abundance, and richness, of data about individuals being generated and made publicly available. Furthermore, substantial increases in computing power and algorithmic improvements are making the execution of such attacks more technically feasible. These threats will be impossible to overcome unless regulations are designed to explicitly regulate cumulative risk in a manner that is consistent with the science of composition effects.