Raquel Hill: "Evaluating the Utility of a Differentially Private Behavioral Science Dataset"

Date: Wednesday, October 2, 2013
Time: 12:00pm – 1:30pm
Place: Maxwell Dworkin 119

Speaker: Raquel Hill, Harvard CRCS and Indiana University

Title: Evaluating the Utility of a Differentially Private Behavioral Science Dataset

Abstract: Social and behavioral scientists often collect and maintain datasets that are high-dimensional (including some combination of demographic, medical, sexual, and other personal information), which presents opportunities to characterize participants in unique ways. The conventional wisdom for protecting the privacy of such participants is to either not ask certain questions or to remove or recode potentially identifiable information. The premise of the research discussed here is that neither approach may be sufficient for preventing the (re)identification of participants in large and/or multidimensional datasets. Per human subjects guidelines, researchers need to consider all of the potential risks, including whether any disclosure of the subjects’ responses outside of the research could reasonably place the subjects at risk of criminal or civil liability or be damaging to the subjects’ financial standing, employability, insurability, or reputation.

In this work, I present new results of a use-case analysis that evaluates Differential Privacy (hereafter referred to as DP) as a technique to protect behavioral science datasets while preserving their research utility. DP is a data perturbation technique that provides strong and formal privacy guarantees. The essential goal is to prevent a possible adversary from discovering whether or not some specific individual's data is present in a differentially private dataset, given some risk threshold. The use cases are derived from a study by the Kinsey Institute for Research in Sex, Gender and Reproduction that looks at predictors of unprotected sex and unplanned pregnancy. The specific use cases evaluate the likelihood of a participant reporting an unplanned pregnancy and the likelihood of a participant reporting having unsafe sex in the last 12 months (both binary outcomes). While there are many theoretical results ('in vitro') for DP and utility outside of actual data cases, very little has been done to evaluate its effect on data utility in a real-world research setting ('in vivo'). To my knowledge, this is the first work to evaluate the utility of differentially private behavioral science data.

Bio: Raquel Hill is an Associate Professor of Computer Science in the School of Informatics and Computing at Indiana University. She is also a Visiting Scholar at the Center for Research on Computation and Society at Harvard University. Dr. Hill’s primary research interests are in the areas of trust and security of distributed computing environments and data privacy, with a specific interest in privacy protection mechanisms for medical-related social science datasets. Her research is funded by various sources, including the National Science Foundation. She holds B.S. and M.S. degrees in Computer Science from the Georgia Institute of Technology and a Ph.D. in Computer Science from Harvard University. Prior to joining the School of Informatics and Computing, Dr. Hill was a Post-Doctoral Research Associate at the University of Illinois Urbana-Champaign, with a joint appointment with the Department of Computer Science and the National Center for Supercomputing Applications (NCSA).