PCPs and the Hardness of Generating Synthetic Data


Jon Ullman and Salil Vadhan. 2011. “PCPs and the Hardness of Generating Synthetic Data.” In Proceedings of the 8th IACR Theory of Cryptography Conference (TCC `11), edited by Yuval Ishai, Lecture Notes on Computer Science, 5978: Pp. 572–587. Providence, RI: Springer-Verlag. Date Presented: 28–30 March. Springer Link


Assuming the existence of one-way functions, we show that there is no polynomial-time, differentially private algorithm A that takes a database D\in ({0,1}^d)^n and outputs a ``synthetic database'' D' all of whose two-way marginals are approximately equal to those of D. (A two-way marginal is the fraction of database rows x\in {0,1}^d with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time poly(n,2^d). Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with PCP-based Levin-reductions from NP search problems to finding approximate solutions to CSPs.


Full version posted as {\em ECCC} TR10-017.

Acknowledgements: This paper was supported, in part, by NSF grant CNS-0831289.
Last updated on 04/13/2019