Publications by Year: 2013

2013
George Alter, Micah Altman, Mark Abrahamson, Roper Center, Merce Crosas, Jon Crabtree, and Ted Hull. 6/25/2013. “Response to the National Institute of Health Request Information: Input on Development of NIH Data Catalog”.
ShivaPrasad Kasiviswanathan, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. 2013. “Analyzing Graphs with Node Differential Privacy.” In Proceedings of the 10th Theory of Cryptography Conference on Theory of Cryptography, Pp. 457–476. Berlin, Heidelberg: Springer-Verlag. Publisher's Version
Jonathan Ullman. 2013. “Answering n{2+o(1)} counting queries with differential privacy is hard.” In Proceedings of the 45th annual ACM symposium on Symposium on theory of computing, Pp. 361-370. Palo Alto, California, USA: ACM. DOIAbstract
A central problem in differentially private data analysis is how to design efficient algorithms capable of answering large numbers of counting queries on a sensitive database. Counting queries are of the form "What fraction of individual records in the database satisfy the property q?" We prove that if one-way functions exist, then there is no algorithm that takes as input a database db ∈ dbset, and k = ~Θ(n2) arbitrary efficiently computable counting queries, runs in time poly(d, n), and returns an approximate answer to each query, while satisfying differential privacy. We also consider the complexity of answering "simple" counting queries, and make some progress in this direction by showing that the above result holds even when we require that the queries are computable by constant-depth (AC0) circuits. Our result is almost tight because it is known that ~Ω(n2) counting queries can be answered efficiently while satisfying differential privacy. Moreover, many more than n2 queries (even exponential in n) can be answered in exponential time. We prove our results by extending the connection between differentially private query release and cryptographic traitor-tracing schemes to the setting where the queries are given to the sanitizer as input, and by constructing a traitor-tracing scheme that is secure in this setting.
Amos Beimel, Kobbi Nissim, and Uri Stemmer. 2013. “Characterizing the sample complexity of private learners.” In Proceedings of the 4th conference on Innovations in Theoretical Computer Science, Pp. 97-110. Berkeley, California, USA: ACM. DOIAbstract
In 2008, Kasiviswanathan el al. defined private learning as a combination of PAC learning and differential privacy [16]. Informally, a private learner is applied to a collection of labeled individual information and outputs a hypothesis while preserving the privacy of each individual. Kasiviswanathan et al. gave a generic construction of private learners for (finite) concept classes, with sample complexity logarithmic in the size of the concept class. This sample complexity is higher than what is needed for non-private learners, hence leaving open the possibility that the sample complexity of private learning may be sometimes significantly higher than that of non-private learning. We give a combinatorial characterization of the sample size sufficient and necessary to privately learn a class of concepts. This characterization is analogous to the well known characterization of the sample complexity of non-private learning in terms of the VC dimension of the concept class. We introduce the notion of probabilistic representation of a concept class, and our new complexity measure RepDim corresponds to the size of the smallest probabilistic representation of the concept class. We show that any private learning algorithm for a concept class C with sample complexity m implies RepDim(C) = O(m), and that there exists a private learning algorithm with sample complexity m = O(RepDim(C)). We further demonstrate that a similar characterization holds for the database size needed for privately computing a large class of optimization problems and also for the well studied problem of private data release.
Justin Hsu, Aaron Roth, and Jonathan Ullman. 2013. “Differential privacy for the analyst via private equilibrium computation.” In Proceedings of the 45th annual ACM symposium on Symposium on theory of computing, Pp. 341-350. Palo Alto, California, USA: ACM. DOIAbstract
We give new mechanisms for answering exponentially many queries from multiple analysts on a private database, while protecting dif- ferential privacy both for the individuals in the database and for the analysts. That is, our mechanism's answer to each query is nearly insensitive to changes in the queries asked by other analysts. Our mechanism is the first to offer differential privacy on the joint distribution over analysts' answers, providing privacy for data an- alysts even if the other data analysts collude or register multiple accounts. In some settings, we are able to achieve nearly optimal error rates (even compared to mechanisms which do not offer an- alyst privacy), and we are able to extend our techniques to handle non-linear queries. Our analysis is based on a novel view of the pri- vate query-release problem as a two-player zero-sum game, which may be of independent interest.
Latanya Sweeney. 2013. “Discrimination in online ad delivery.” Commun. ACM, 56, 5, Pp. 44–54. DOIAbstract
Mark Bun and Justin Thaler. 2013. “Dual Lower Bounds for Approximate Degree and Markov-Bernstein Inequalities.” Edited by FedorV. Fomin, Rūsiņš Freivalds, Marta Kwiatkowska, and David Peleg. Automata, Languages, and Programming, 7965, Pp. 303-314. DOIAbstract
The ε-approximate degree of a Boolean function f: { − 1, 1} n  → { − 1, 1} is the minimum degree of a real polynomial that approximates f to within ε in the ℓ ∞  norm. We prove several lower bounds on this important complexity measure by explicitly constructing solutions to the dual of an appropriate linear program. Our first result resolves the ε-approximate degree of the two-level AND-OR tree for any constant ε > 0. We show that this quantity is Θ(n‾‾√) , closing a line of incrementally larger lower bounds [3,11,21,30,32]. The same lower bound was recently obtained independently by Sherstov using related techniques [25]. Our second result gives an explicit dual polynomial that witnesses a tight lower bound for the approximate degree of any symmetric Boolean function, addressing a question of Špalek [34]. Our final contribution is to reprove several Markov-type inequalities from approximation theory by constructing explicit dual solutions to natural linear programs. These inequalities underly the proofs of many of the best-known approximate degree lower bounds, and have important uses throughout theoretical computer science.
S.H. Chan, T.B. Costa, and E.M. Airoldi. 2013. “Estimation of exchangeable graph models by stochastic blockmodel approximation.” In Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE, Pp. 293-296.
Karthekeyan Chandrasekaran, Justin Thaler, Jonathan Ullman, and Andrew Wan. 2013. “Faster Private Release of Marginals on Small Databases.” CoRR, abs/1304.3754. arXiv.orgAbstract
We study the problem of answering \emph{$k$-way marginal} queries on a database $D \in (\{0,1\}^d)^n$, while preserving differential privacy. The answer to a $k$-way marginal query is the fraction of the database's records $x \in \{0,1\}^d$ with a given value in each of a given set of up to $k$ columns. Marginal queries enable a rich class of statistical analyses on a dataset, and designing efficient algorithms for privately answering marginal queries has been identified as an important open problem in private data analysis. For any $k$, we give a differentially private online algorithm that runs in time $$\min{\exp(d^{1-\Omega(1/\sqrt{k})}), \exp(d / \log^{.99} d)\}$$ per query and answers any (possibly superpolynomially long and adaptively chosen) sequence of $k$-way marginal queries up to error at most $\pm .01$ on every query, provided $n \gtrsim d^{.51}$. To the best of our knowledge, this is the first algorithm capable of privately answering marginal queries with a non-trivial worst-case accuracy guarantee on a database of size $\poly(d, k)$ in time $\exp(o(d))$. Our algorithms are a variant of the private multiplicative weights algorithm (Hardt and Rothblum, FOCS '10), but using a different low-weight representation of the database. We derive our low-weight representation using approximations to the OR function by low-degree polynomials with coefficients of bounded $L_1$-norm. We also prove a strong limitation on our approach that is of independent approximation-theoretic interest. Specifically, we show that for any $k = o(\log d)$, any polynomial with coefficients of $L_1$-norm $poly(d)$ that pointwise approximates the $d$-variate OR function on all inputs of Hamming weight at most $k$ must have degree $d^{1-O(1/\sqrt{k})}$.
Latanya Sweeney, Akua Abu, and Julia Winn. 2013. “Identifying Participants in the Personal Genome Project by Name.” Data Privacy Lab, IQSS, Harvard University. Project website
Guy N. Rothblum, Salil Vadhan, and Avi Wigderson. 2013. “Interactive proofs of proximity: delegating computation in sublinear time.” In Proceedings of the 45th annual ACM symposium on Symposium on theory of computing, Pp. 793-802. Palo Alto, California, USA: ACM. DOIAbstract

We study interactive proofs with sublinear-time verifiers. These proof systems can be used to ensure approximate correctness for the results of computations delegated to an untrusted server. Following the literature on property testing, we seek proof systems where with high probability the verifier accepts every input in the language, and rejects every input that is far from the language. The verifier's query complexity (and computation complexity), as well as the communication, should all be sublinear. We call such a proof system an Interactive Proof of Proximity (IPP). On the positive side, our main result is that all languages in NC have Interactive Proofs of Proximity with roughly √n query and communication and complexities, and polylog(n) communication rounds. This is achieved by identifying a natural language, membership in an affine subspace (for a structured class of subspaces), that is complete for constructing interactive proofs of proximity, and providing efficient protocols for it. In building an IPP for this complete language, we show a tradeoff between the query and communication complexity and the number of rounds. For example, we give a 2-round protocol with roughly n3/4 queries and communication. On the negative side, we show that there exist natural languages in NC1, for which the sum of queries and communication in any constant-round interactive proof of proximity must be polynomially related to n. In particular, for any 2-round protocol, the sum of queries and communication must be at least ~Ω(√n). Finally, we construct much better IPPs for specific functions, such as bipartiteness on random or well-mixing graphs, and the majority function. The query complexities of these protocols are provably better (by exponential or polynomial factors) than what is possible in the standard property testing model, i.e. without a prover.

Latanya Sweeney. 2013. “Matching Known Patients to Health Records in Washington State Data.” Data Privacy Lab, IQSS, Harvard University. Project website
Amos Beimel, Kobbi Nissim, Uri Stemmer, Prasad Raghavendra, Sofya Raskhodnikova, Klaus Jansen, and JoséD.P. Rolim. 2013. “Private Learning and Sanitization: Pure vs. Approximate Differential Privacy.” In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 8096: Pp. 363-378. Springer Berlin Heidelberg. Publisher's Version
Latanya Sweeney, William A Yasnoff, and Edward H Shortliffe. 2013. “Putting Health IT on the Path to Success.” JAMA, 309, 10, Pp. 989-990. DOIAbstract
The promise of health information technology (HIT) is comprehensive electronic patient records when and where needed, leading to improved quality of care at reduced cost. However, physician experience and other available evidence suggest that this promise is largely unfulfilled. Serious flaws in current approaches to health information exchanges: (1) complex and expensive; (2) prone to error and insecurity; (3) increase liability; (4) not financially sustainable; (5) unable to protect privacy; (6) unable to ensure stakeholder cooperation; and, (7) unable to facilitate robust data sharing. The good news is that personal health record banks pose a viable alternative that is: (a) simpler; (b) scalable; (c) less expensive; (d) more secure; (e) community oriented to ensure stakeholder participation; and, (e) capable of providing the most comprehensive records. The idea of patient controlled records is not new, but what is new is how personally controlled records can help achieve the HIT vision.
Sean Hooley and Latanya Sweeney. 2013. “Survey of Publicly Available State Health Databases.” Data Privacy Lab, IQSS, Harvard University. Project website
Yiling Chen, Stephen Chong, Ian A. Kash, Tal Moran, and Salil Vadhan. 2013. “Truthful mechanisms for agents that value privacy.” In Proceedings of the fourteenth ACM conference on Electronic commerce, Pp. 215-232. Philadelphia, Pennsylvania, USA: ACM. DOIAbstract
Recent work has constructed economic mechanisms that are both truthful and differentially private. In these mechanisms, privacy is treated separately from the truthfulness; it is not incorporated in players' utility functions (and doing so has been shown to lead to non-truthfulness in some cases). In this work, we propose a new, general way of modelling privacy in players' utility functions. Specifically, we only assume that if an outcome o has the property that any report of player i would have led to o with approximately the same probability, then o has small privacy cost to player i. We give three mechanisms that are truthful with respect to our modelling of privacy: for an election between two candidates, for a discrete version of the facility location problem, and for a general social choice problem with discrete utilities (via a VCG-like mechanism). As the number n of players increases, the social welfare achieved by our mechanisms approaches optimal (as a fraction of n).