Bridging Privacy Definitions

This working group, composed of privacy experts across disciplines, examines the range of privacy-related definitions from law, computer science, and social science, covering topics such as measures of informational harm, de-identification techniques, formal privacy models such as differential privacy, and privacy standards from laws such as FERPA and HIPAA. The group studies the nature of these definitions, the relationships and gaps between them, and potential methods of bridging the disciplinary divide.

A recent product from this working group is a methodology for extracting a mathematical model from a legal standard such as FERPA. The methodology can be used to argue rigorously that a privacy technology satisfies the requirements of a given legal standard.

For the 2016-2017 year, we plan to focus on questions related to the broad conceptualization of informational harms, including group harms like discrimination and their relationship to the types of harms addressed by formal privacy definitions like differential privacy. We are also looking to develop methods for setting formal privacy parameters (like the differential privacy parameter epsilon) based on accepted legal, ethical, and social notions.

We are excited to hear from anyone seeking to explore multidisciplinary approaches to privacy. For more information and to join our mailing list, please contact Lindsay Froess at



Salil Vadhan

Principal Investigator
Vicky Joseph Professor of Computer Science and Applied Mathematics, SEAS, Harvard

Salome Viljoen

Research Fellow, Berkman Klein Center for Internet & Society, Harvard

Michel Reymond

Postdoctoral Researcher and Teaching Assistant at the University of Geneva, Switzerland
Visiting Researcher at the Berkman Center
Legal Intern at Byrne-Sutton, Bollen, Kern

Aaron Bembenek

Graduate researcher (summer 2015), Special Student affiliated with the School of Engineering and Applied Sciences, Harvard University
Graduate researcher (fall 2015)

Aaron Bembenek is a graduate researcher in programming languages with Steve Chong. He joined as a summer 2015 intern, and continued his work into the...



Micah Altman, Alexandra Wood, David R. O'Brien, and Urs Gasser. 2016. “Practical Approaches to Big Data Privacy Over Time.” Brussels Privacy Symposium.

Increasingly, governments and businesses are collecting, analyzing, and sharing detailed information about individuals over long periods of time. Vast quantities of data from new sources and novel methods for large-scale data analysis promise to yield deeper understanding of human characteristics, behavior, and relationships and advance the state of science, public policy, and innovation. At the same time, the collection and use of fine-grained personal data over time is associated with significant risks to individuals, groups, and society at large. In this article, we examine a range of long-term data collections, conducted by researchers in social science, in order to identify the characteristics of these programs that drive their unique sets of risks and benefits. We also examine the practices that have been established by social scientists to protect the privacy of data subjects in light of the challenges presented in long-term studies. We argue that many uses of big data, across academic, government, and industry settings, have characteristics similar to those of traditional long-term research studies. In this article, we discuss the lessons that can be learned from longstanding data management practices in research and potentially applied in the context of newly emerging data sources and uses.

K. Nissim, A. Bembenek, A. Wood, M. Bun, M. Gaboardi, U. Gasser, D. O'Brien, T. Steinke, and S. Vadhan. 2018. “Bridging the Gap between Computer Science and Legal Approaches to Privacy.” Harvard Journal of Law & Technology, 31, 2, Pp. 687-780.
The fields of law and computer science incorporate contrasting notions of the privacy risks associated with the analysis and release of statistical data about individuals and groups of individuals. Emerging concepts from the theoretical computer science literature provide formal mathematical models for quantifying and mitigating privacy risks, where the set of risks they take into account is much broader than the privacy risks contemplated by many privacy laws. An example of such a model is differential privacy, which provides a provable guarantee of privacy against a wide range of potential attacks, including types of attacks currently unknown or unforeseen. The subject of much theoretical investigation, new privacy technologies based on formal models have recently been making significant strides towards practical implementation. For these tools to be used with sensitive personal information, it is important to demonstrate that they satisfy relevant legal requirements for privacy protection. However, making such an argument is challenging due to the conceptual gaps between the legal and technical approaches to defining privacy. Notably, information privacy laws are generally subject to interpretation and some degree of flexibility, which creates uncertainty for the implementation of more formal approaches. This Article articulates the gaps between legal and technical approaches to privacy and presents a methodology for rigorously arguing that a technological method for privacy protection satisfies the requirements of a particular law. The proposed methodology has two main components: (i) extraction of a formal mathematical requirement of privacy based on a legal standard found in an information privacy law, and (ii) construction of a rigorous mathematical proof for establishing that a technological privacy solution satisfies the mathematical requirement derived from the law. 
To handle ambiguities that can lead to different interpretations of a legal standard, the methodology takes a conservative “worst-case” approach and attempts to extract a mathematical requirement that is robust to potential ambiguities. Under this approach, the mathematical proof demonstrates that a technological method satisfies a broad range of reasonable interpretations of a legal standard. The Article demonstrates the application of the proposed methodology with an example bridging between the requirements of the Family Educational Rights and Privacy Act of 1974 and differential privacy.
Effy Vayena, Urs Gasser, Alexandra Wood, David R. O'Brien, and Micah Altman. 2016. “Elements of a New Ethical Framework for Big Data Research.” Washington and Lee Law Review, 72, 3.

Emerging large-scale data sources hold tremendous potential for new scientific research into human biology, behaviors, and relationships. At the same time, big data research presents privacy and ethical challenges that the current regulatory framework is ill-suited to address. In light of the immense value of large-scale research data, the central question moving forward is not whether such data should be made available for research, but rather how the benefits can be captured in a way that respects fundamental principles of ethics and privacy.

In response, this Essay outlines elements of a new ethical framework for big data research. It argues that oversight should aim to provide universal coverage of human subjects research, regardless of funding source, across all stages of the information lifecycle. New definitions and standards should be developed based on a modern understanding of privacy science and the expectations of research subjects. In addition, researchers and review boards should be encouraged to incorporate systematic risk-benefit assessments and new procedural and technological solutions from the wide range of interventions that are available. Finally, oversight mechanisms and the safeguards implemented should be tailored to the intended uses, benefits, threats, harms, and vulnerabilities associated with a specific research activity.

Development of a new ethical framework with these elements should be the product of a dynamic multistakeholder process that is designed to capture the latest scientific understanding of privacy, analytical methods, available safeguards, community and social norms, and best practices for research ethics as they evolve over time. Such a framework would support big data utilization and help harness the value of big data in a sustainable and trust-building manner.

Micah Altman, Alexandra Wood, David O'Brien, Salil Vadhan, and Urs Gasser. 2016. “Towards a Modern Approach to Privacy-Aware Government Data Releases.” Berkeley Technology Law Journal, 30, 3.

This article summarizes research exploring various models by which governments release data to the public and the interventions in place to protect the privacy of individuals in the data. Applying concepts from the recent scientific and legal literature on privacy, the authors propose a framework for a modern privacy analysis and illustrate how governments can use the framework to select appropriate privacy controls that are calibrated to the specific benefits and risks in individual data releases.

David O'Brien, Jonathan Ullman, Micah Altman, Urs Gasser, Michael Bar-Sinai, Kobbi Nissim, Salil Vadhan, Michael Wojcik, and Alexandra Wood. 2015. “Integrating Approaches to Privacy Across the Research Lifecycle: When is Information Purely Public?” Social Science Research Network.

On September 24-25, 2013, the Privacy Tools for Sharing Research Data project at Harvard University held a workshop titled "Integrating Approaches to Privacy across the Research Data Lifecycle." Over forty leading experts in computer science, statistics, law, policy, and social science research convened to discuss the state of the art in data privacy research. The resulting conversations centered on the emerging tools and approaches from the participants’ various disciplines and how they should be integrated in the context of real-world use cases that involve the management of confidential research data.

Researchers are increasingly obtaining data from social networking websites, publicly-placed sensors, government records and other public sources. Much of this information appears public, at least to first impressions, and it is capable of being used in research for a wide variety of purposes with seemingly minimal legal restrictions. The insights about human behaviors we may gain from research that uses this data are promising. However, members of the research community are questioning the ethics of these practices, and at the heart of the matter are some difficult questions about the boundaries between public and private information. This workshop report, the second in a series, identifies selected questions and explores issues around the meaning of “public” in the context of using data about individuals for research purposes.

Policy Commentary

2018 Submitted Comments
  • On March 13, 2018, members of the Privacy Tools team submitted comments to the Chief Statistician of the United States and the Statistical and Science Policy Branch in the U.S. Office of Management and Budget. In response to a request for information, the comments focus on privacy or confidentiality issues that arise when combining data from multiple sources in the course of federal statistical activities.
2017 Submitted Comments
2016 Submitted Comments
  • Members of the project team (PI Salil Vadhan, Co-PI Edo Airoldi, Co-PI Urs Gasser, Co-Investigator Micah Altman, Research Fellow Yves-Alexandre de Montjoye, Sr. Researcher David R. O'Brien, and Research Fellow Alexandra Wood) submitted comments on the Proposed Rules to Revise the Federal Policy for the Protection of Human Subjects ("Common Rule"), HHS-OPHS-2015-0008 (January 6, 2016). This commentary is available at;D=HHS-OPHS-2015-0008-2015 
  • On May 23, 2016, Micah Altman provided testimony and written comments in a Hearing on “De-Identification and the Health Insurance Portability and Accountability Act (HIPAA)” before the Subcommittee on Privacy, Confidentiality & Security, National Committee on Vital and Health Statistics.
2014 Submitted Comments
2013 Submitted Comments
  • M. Altman, M. Crosas, et al., on behalf of DataPASS, “Response to the National Institutes of Health Request for Information: Input on Development of NIH Data Catalog.” 2013.