Privacy-Preserving Scientific Data Analysis in an Open Cloud


Abstract: The amount of digitally available research data is growing at an unprecedented rate. Unfortunately, we have long observed that a segment of our scientific user base cannot benefit from the full transformative capacity of our cyberinfrastructure. Because of concerns about privacy and confidentiality, the threat of data spills, and the potential for misuse or exploitation, many researchers isolate themselves rather than share data for the benefit of fellow scientists and society at large. In this talk, I will describe our work in progress to build a data repository and computational framework that enables participants to perform analytics over data sets (even ones they cannot read) in a cryptographically protected manner. This work exploits the synergy among three projects: Dataverse provides the data management and access control infrastructure, Conclave provides a method for cryptographically secure multiparty computation at scale, and the Massachusetts Open Cloud provides the isolated computational environments and low-latency communication that underlie Conclave's security and performance guarantees. This work addresses an important need in research computing: enabling scientific workflows, such as collaborative experiments or the replication and extension of existing results, when the underlying data are encumbered by privacy concerns.
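To make "analytics over data sets they cannot read" concrete, here is a minimal sketch of additive secret sharing, one basic building block of secure multiparty computation. It is a generic textbook illustration, not Conclave's protocol or API. The modulus PRIME and the functions share, reconstruct, and secure_sum are hypothetical names chosen for this example. The sketch assumes semi-honest (honest-but-curious) parties and non-negative integer inputs whose total is below PRIME. Each input owner splits its value into random shares, so no single party sees another party's raw value, yet combining every party's local sum reveals only the aggregate.

```python
import secrets

# Hypothetical modulus for illustration: 2**61 - 1 is a Mersenne prime.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split value into n_parties random additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the value they encode."""
    return sum(shares) % PRIME


def secure_sum(private_inputs):
    """Compute the sum of all inputs, revealing only the total (semi-honest model)."""
    n = len(private_inputs)
    # Input owner i splits its value; share j of that value is sent to party j.
    all_shares = [share(v, n) for v in private_inputs]
    # Party j adds up the shares it received. Each share on its own is uniformly
    # random, so party j learns nothing about any individual input.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Only the partial sums are published, and together they open to the total.
    return reconstruct(partial_sums)


if __name__ == "__main__":
    print(secure_sum([120, 45, 300]))  # 465
```

Addition is the easy case because shares can be added locally. Real MPC frameworks also need interactive protocols for multiplication, comparison, and joins, which is where the low-latency communication mentioned above becomes important.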

Bio: Mayank Varia is the Director of the MACS project. His research interests span theoretical and applied cryptography and their application to problems throughout computer science. Previously, he worked for four years at MIT Lincoln Laboratory, where he designed and evaluated high-performance privacy-enhancing data search technology, created information-theoretic metrics to quantify privacy, and developed algorithms to capture linguistic provenance automatically. He received a Ph.D. in mathematics from MIT for his work on program obfuscation.