"Privacy Integrated Queries: A Programming Language for Differentially-Private Computation" (CRCS Lunch Seminar)

Presentation Date: Monday, November 1, 2010

Speaker: Frank McSherry, Microsoft Research Silicon Valley

Abstract: Large volumes of sensitive data are currently collected by an array of agencies, companies, and other organizations. While these data clearly hold great potential for analysis, they can also reveal sensitive information about their participants. Scientists have struggled to balance extracting valuable statistical information from these datasets against the risk of accidentally disclosing specifics of individual records.

A recent privacy criterion, differential privacy, formally constrains the disclosure of specifics of individual records without precluding the release of statistical information. Differential privacy requires that every outcome of a computation be almost equally likely whether or not any one record is included; to each participant, the analysis behaves almost as if it did not have access to that participant’s data.
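In symbols, the standard formulation (ε-differential privacy, stated for datasets that differ by adding or removing a single record, as described above) reads as follows. The names M (the randomized computation), D and D′ (the two datasets), S (a set of outcomes), and ε (the privacy parameter) are notation chosen here, not taken from the talk.

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S]
\quad \text{for all } D, D' \text{ differing in one record and all sets of outcomes } S.
```

A smaller ε forces the two probabilities closer together, which means a stronger privacy guarantee.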

While differential privacy offers a very strong guarantee, its use to date has been restricted to privacy experts: a small collection of highly trained individuals who, however motivated, cannot satisfy the enormous volume of the world’s data analysis needs. To address this, we have assembled a programming language in which every program provides differential privacy, without requiring an expert privacy analysis. The language is almost identical to LINQ, a SQL-like extension to C#, and is readily usable by analysts with only a modest background in programming. We will discuss the design, implementation, and application of this language across a variety of data analysis contexts.
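To make the idea concrete, here is a minimal Python sketch of the general pattern (it is not the actual language from the talk, which is built on C# and LINQ). Data is reachable only through a wrapper that applies stable transformations such as filtering and releases aggregates only with added noise, charging each release against a shared privacy budget. The class and method names (`PrivacyBudget`, `PrivateQueryable`, `where`, `noisy_count`) are hypothetical, and the use of the standard Laplace mechanism for counting is an assumption of this sketch.

```python
import random
from typing import Callable, Generic, Iterable, List, Optional, TypeVar

T = TypeVar("T")


class PrivacyBudget:
    """Hypothetical: tracks the total epsilon that queries may still spend."""

    def __init__(self, total_epsilon: float) -> None:
        self.remaining = total_epsilon

    def spend(self, epsilon: float) -> None:
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise PermissionError("privacy budget exhausted")
        # Sequential composition: the epsilons of successive releases add up.
        self.remaining -= epsilon


class PrivateQueryable(Generic[T]):
    """Hypothetical wrapper: raw records are never returned, only noisy aggregates."""

    def __init__(
        self,
        records: Iterable[T],
        budget: PrivacyBudget,
        rng: Optional[random.Random] = None,
    ) -> None:
        self._records: List[T] = list(records)
        self._budget = budget
        self._rng = rng or random.Random()

    def where(self, predicate: Callable[[T], bool]) -> "PrivateQueryable[T]":
        # Filtering is stable: adding or removing one input record changes the
        # output by at most one record, so the result can share the same budget.
        kept = [r for r in self._records if predicate(r)]
        return PrivateQueryable(kept, self._budget, self._rng)

    def noisy_count(self, epsilon: float) -> float:
        # A count changes by at most 1 when one record is added or removed, so
        # Laplace noise with scale 1/epsilon yields epsilon-differential privacy.
        self._budget.spend(epsilon)
        # The difference of two i.i.d. exponentials with rate epsilon
        # is Laplace-distributed with mean 0 and scale 1/epsilon.
        noise = self._rng.expovariate(epsilon) - self._rng.expovariate(epsilon)
        return len(self._records) + noise


if __name__ == "__main__":
    budget = PrivacyBudget(total_epsilon=1.0)
    ages = PrivateQueryable([23, 35, 41, 52, 67], budget)
    # The true count is 3; the printed value includes Laplace noise.
    print(ages.where(lambda a: a >= 40).noisy_count(0.5))
    print("remaining epsilon:", budget.remaining)
```

The design choice this illustrates is that privacy is enforced by the wrapper rather than by the analyst: any chain of `where` and `noisy_count` calls is allowed, and the shared budget caps the total ε an analysis can consume.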

Bio: Frank McSherry is a researcher at Microsoft Research’s Silicon Valley lab. His research focuses on large-scale data analysis, recently with an emphasis on issues of privacy and confidentiality. In particular, he helped develop the recent definition of differential privacy, and he designed and implemented the Privacy Integrated Queries data analysis platform, which provides these guarantees.