#  DataTags Research 

 



   ![screen_shot_2015-03-05_at_3.43.29_pm.png](/sites/g/files/omnuum6656/files/styles/hwp_1_1__360x360_scale/public/privacytools/files/screen_shot_2015-03-05_at_3.43.29_pm.png?itok=phOkRkxC) 

 

**Research Overview:**

Members of the Privacy Tools project are developing DataTags, a suite of tools to help researchers share and use sensitive data in a standardized and responsible way.

Proper handling of human subjects data requires knowledge of relevant federal and state data privacy laws, applicable data sharing agreements, best practices for confidentiality and security, and available mechanisms for privacy protection. The goal of DataTags is to help researchers who are not legal or technical experts navigate these considerations and make informed decisions when collecting, storing, and sharing privacy-sensitive data.

This project is a collaboration with the IQSS Dataverse team. For more information, please see the FAQ below and visit [DataTags.org](http://datatags.org/) to try the demo.



 

##  About the DataTags project 

 





### What problem is DataTags designed to address?

 

Making data widely available to researchers is good policy and crucial to good science. It enables replication and validation of scientific findings, supports extensions of studies, and maximizes return on research investment. For these reasons, sponsors and publishers expect or mandate the sharing of data where possible.

However, data containing sensitive information about individuals cannot be shared openly without appropriate safeguards. An extensive body of statutes, regulations, institutional policies, consent forms, data sharing agreements, and best practices govern how sensitive data should be used and disclosed in different contexts. Researchers and institutions that manage and share data must interpret how the various legal requirements and other data privacy and security standards constrain their handling of a given dataset. DataTags helps researchers navigate these complex issues.



 

 

 



### How does DataTags work?

 

DataTags is designed to enable computer-assisted assessments of the legal, contractual, and policy restrictions that govern data sharing decisions. Assessments are performed through interactive computation, in which the DataTags system asks a user a series of questions to elicit the key properties of a given dataset and applies inference rules to determine which laws, contracts, and best practices are applicable. The output is a set of recommended DataTags, which are simple, iconic labels that represent a human-readable and machine-actionable data policy, together with a license agreement tailored to the individual dataset. The DataTags system is being designed to integrate with the open source data repository software [Dataverse](http://dataverse.org/) and its suite of access controls and statistical analysis tools. It will also operate as a standalone tool and as an application that can be integrated with other platforms.
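The interview-then-inference flow described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the questions, the rule names (`health-privacy-statute` and so on), and the tag labels are hypothetical placeholders, not the actual DataTags questionnaire, legal rules, or tag vocabulary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Facts = Dict[str, bool]

# Hypothetical interview questions (not the real DataTags questionnaire).
QUESTIONS = {
    "has_identifiers": "Does the dataset contain direct identifiers?",
    "has_health_info": "Does the dataset contain health information?",
    "has_sharing_agreement": "Is the dataset covered by a data sharing agreement?",
}


@dataclass(frozen=True)
class Rule:
    name: str                         # placeholder for a law, contract, or best practice
    applies: Callable[[Facts], bool]  # inference condition over the elicited facts


# Hypothetical inference rules; real rules would come from legal analysis.
RULES = [
    Rule("health-privacy-statute", lambda f: f["has_health_info"] and f["has_identifiers"]),
    Rule("data-sharing-agreement", lambda f: f["has_sharing_agreement"]),
    Rule("identifier-best-practices", lambda f: f["has_identifiers"]),
]


def run_interview(ask: Callable[[str], bool]) -> Facts:
    """Ask every question and record the yes/no answers as facts."""
    return {key: ask(text) for key, text in QUESTIONS.items()}


def applicable_rules(facts: Facts) -> List[str]:
    """Return the names of all rules whose condition holds for these facts."""
    return [rule.name for rule in RULES if rule.applies(facts)]


def recommend_tag(facts: Facts) -> str:
    """Map the applicable rules to an illustrative (made-up) tag label."""
    names = applicable_rules(facts)
    if "health-privacy-statute" in names:
        return "restricted"
    if names:
        return "click-through"
    return "open"


# Scripted answers stand in for a human user answering the interview.
scripted = {
    QUESTIONS["has_identifiers"]: True,
    QUESTIONS["has_health_info"]: False,
    QUESTIONS["has_sharing_agreement"]: True,
}
facts = run_interview(lambda question: scripted[question])
print(applicable_rules(facts))
print(recommend_tag(facts))
```

Keeping the questions, the rules, and the tag mapping in separate structures is a design choice of this sketch: it lets each part be reviewed or replaced independently.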



 

 

 



### What are DataTags?

 

The DataTags recommended by the system are human-readable and machine-actionable labels that express conditions under which datasets can be stored, transmitted, or used. Colloquially, each DataTag tells you that there are specific things you can safely do with the data, such as making it available to any user who accepts a prespecified click-through agreement, without requiring further human analysis or decision making. Requirements that cannot be automated and expressed by a simple label are encoded instead in a custom license agreement that complements the DataTags assigned to a dataset.

More formally, a DataTag is an informative label from a controlled vocabulary that can be applied to a dataset. It carries distinct semantics, summarizing sufficient conditions for a specific set of automated actions over the data. A dataset is labelled with a tag on the basis of a systematic interrogation of a data controller, conducted using a specified set of survey questions, and inferential rules for tag assignment. Each label formally corresponds to a set of assertions regarding permissible or impermissible actions over the dataset.
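As a rough illustration of the formal description above, a tag can be modeled as a member of a fixed vocabulary, with each member mapped to the set of actions it asserts to be permissible. The tag names, the action names, and the permission table in this Python sketch are hypothetical and do not reproduce the project's actual vocabulary.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Action(Enum):
    # Hypothetical automated actions over a dataset.
    STORE_UNENCRYPTED = "store_unencrypted"
    TRANSMIT_UNENCRYPTED = "transmit_unencrypted"
    PUBLIC_DOWNLOAD = "public_download"
    CLICK_THROUGH_DOWNLOAD = "click_through_download"


class Tag(Enum):
    # Hypothetical controlled vocabulary, not the project's actual tag set.
    OPEN = "open"
    CLICK_THROUGH = "click_through"
    RESTRICTED = "restricted"


# Each tag corresponds to a set of assertions: the actions it permits.
# Any action not listed for a tag is treated as impermissible.
PERMITTED: Dict[Tag, FrozenSet[Action]] = {
    Tag.OPEN: frozenset(Action),
    Tag.CLICK_THROUGH: frozenset({
        Action.STORE_UNENCRYPTED,
        Action.TRANSMIT_UNENCRYPTED,
        Action.CLICK_THROUGH_DOWNLOAD,
    }),
    Tag.RESTRICTED: frozenset(),
}


def is_permitted(tag: Tag, action: Action) -> bool:
    """Machine-actionable check: may this action run on data carrying this tag?"""
    return action in PERMITTED[tag]


print(is_permitted(Tag.CLICK_THROUGH, Action.CLICK_THROUGH_DOWNLOAD))
print(is_permitted(Tag.CLICK_THROUGH, Action.PUBLIC_DOWNLOAD))
```

A repository could consult such a table before serving a download, which is one way to read "machine-actionable" in this context.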



 

 

 



### Who are the members of the DataTags team?

 

### [Salil Vadhan](/people/salil-vadhan)

Principal Investigator

Vicky Joseph Professor of Computer Science and Applied Mathematics, SEAS, Harvard

![salil-vadhan.jpg](/sites/g/files/omnuum6656/files/styles/hwp_4_5__690x865/public/privacytools/files/salil-vadhan.jpg?itok=Ba7yWYPB)

 

 

 

### [Micah Altman](/people/micah-altman)

Director of Research and Head/Scientist, Program on Information Science for the MIT Libraries, MIT

Non-Resident Senior Fellow, The Brookings Institution

Current Member of DataTags Team

![drmaltman_1315862855_45.jpg](/sites/g/files/omnuum6656/files/styles/hwp_4_5__690x865/public/drmaltman_1315862855_45.jpg?itok=UAYee9D8)

 

 

 

### [Michael Bar-Sinai](/people/michael-bar-sinai)

Graduate Student, Ben-Gurion University of the Negev, Israel

Visiting Graduate Student, Harvard University, IQSS

Current Member of DataTags Team

![screen_shot_2013-03-29_at_10-1.09.31_am.png](/sites/g/files/omnuum6656/files/styles/hwp_4_5__690x865/public/privacytools/files/screen_shot_2013-03-29_at_10-1.09.31_am.png?itok=vXdRPlMI)

 

 

 

### [Stephen Chong](/people/stephen-chong)

Gordon McKay Professor of Computer Science, SEAS, Harvard

Current Member of DataTags Team

![chong-jul13-01.jpg](/sites/g/files/omnuum6656/files/styles/hwp_4_5__690x865/public/privacytools/files/chong-jul13-01.jpg?itok=To9WDGtI)

 

 

 

### [Mercè Crosas](/people/merc%C3%A8-crosas)

Co-PI

Director of Data Science, IQSS, Harvard

![crosas.jpg](/sites/g/files/omnuum6656/files/styles/hwp_4_5__690x865/public/crosas_1.jpg?itok=QUNkHVCk)

 

 

 

### [Marco Gaboardi](/people/marco-gaboardi)

Visiting Scholar, Center for Research on Computation & Society

State University of New York at Buffalo

Current Member of DataTags Team

![io.jpg](/sites/g/files/omnuum6656/files/styles/hwp_4_5__690x865/public/privacytools/files/io.jpg?itok=zB7vqRoi)

 

 

 

### [Urs Gasser](/people/urs-gasser)

Executive Director, Berkman Center for Internet & Society

Professor of Practice, Harvard Law School

Current Member of DataTags Team

![author_photo_urs_gasser_cropped.jpg.jpeg](/sites/g/files/omnuum6656/files/styles/hwp_4_5__690x865/public/privacytools/files/author_photo_urs_gasser_cropped.jpg.jpeg?itok=4G6x9y71)

 

 

 

### [Bryan Lee](/people/bryan-lee)

Law Student Intern (Summer 2014), Berkman Center

Past Member of DataTags Team

![photo-bryanlee.jpeg](/sites/g/files/omnuum6656/files/styles/hwp_4_5__690x865/public/privacytools/files/photo-bryanlee_1.jpeg?itok=edc1C7N6)

 

 

 

### [Jeremy Merkel](/people/jeremy-merkel)

Law Student Intern (Summer 2014), Berkman Center

American University, Washington College of Law, 2015

Past Member of DataTags Team

![headshot_614.jpeg](/sites/g/files/omnuum6656/files/styles/hwp_4_5__690x865/public/privacytools/files/headshot_614.jpeg?itok=UhVSXdUA)

 

 

 

### [Anna Myers](/people/anna-myers)

Law Student Intern (Summer 2014), Berkman Center

Past Member of DataTags Team

![amyers_headshot.png](/sites/g/files/omnuum6656/files/styles/hwp_4_5__690x865/public/privacytools/files/amyers_headshot.png?itok=vTcV60Pq)

 

 

 

  

 

 

 

 

 

 



### Publications

 

**2017**

Bar-Sinai M, Medzini R. [Public Policy Modeling using the DataTags Toolset](https://datatags.org/publications/public-policy-modeling-using-datatags-toolset). 2017.

**2016**

Bar-Sinai M, Sweeney L, Crosas M. [DataTags, Data Handling Policy Spaces and the Tags Language](https://datatags.org/publications/datatags-data-handling-policy-spaces-and-tags-language), in Proceedings of the International Workshop on Privacy Engineering, IEEE. San Jose, CA, USA: IEEE; 2016.

**2015**

Crosas M, King G, Honaker J, Sweeney L. [Automating Open Science for Big Data](https://datatags.org/publications/automating-open-science-big-data). The ANNALS of the American Academy of Political and Social Science \[Internet\]. 2015;659(1):260-273.

Sweeney L. [All the Data on All the People](https://datatags.org/publications/all-data-all-people), in The Privacy Law Scholars Conference (PLSC). Berkeley, California: UC Berkeley Law School & GWU Law School (Berkeley Center for Law & Technology); 2015.

Sweeney L, Crosas M. [An Open Science Platform for the Next Generation of Data](https://datatags.org/publications/open-science-platform-next-generation-data). arXiv.org Computer Science, Computers and Society \[Internet\]. 2015.

Sweeney L, Crosas M, Bar-Sinai M. [Sharing Sensitive Data with Confidence: The Datatags System](https://datatags.org/publications/sharing-sensitive-data-confidence-datatags-system). Technology Science \[Internet\]. 2015.



 

 

 



### PolicyModels

 

 PolicyModels (formerly: the DataTags toolset) is a system for creating models of policies, e.g. for handling datasets or determining welfare entitlements. A policy model consists of a policy space, detailing all possible treatments within a policy, and a decision tree, which describes the process of getting to a specific treatment. Policy models can be used to perform interactive interviews which yield a concrete treatment that is both human readable and machine actionable. Models can also be visualized, and can be analyzed to find caveats or loopholes. For basic information about PolicyModels, please [watch this video](https://vimeo.com/239310791).
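To make the two parts of a policy model concrete, here is a small Python sketch of a policy space and a decision tree, plus a simple analysis pass. It is only an illustration under stated assumptions: the slots, values, and questions are invented, and PolicyModels defines its own modeling languages, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterator, List, Union

# Policy space: every slot and the values it may take (hypothetical slots).
POLICY_SPACE = {
    "storage": ["clear", "encrypted"],
    "access": ["public", "registered", "approved"],
}

Treatment = Dict[str, str]


@dataclass
class Ask:
    """Inner node of the decision tree: a yes/no question."""
    question: str
    yes: "Node"
    no: "Node"


Node = Union[Ask, Treatment]  # a leaf is a concrete treatment

DECISION_TREE: Node = Ask(
    "Does the dataset describe individuals?",
    yes=Ask(
        "Are the individuals identifiable?",
        yes={"storage": "encrypted", "access": "approved"},
        no={"storage": "clear", "access": "registered"},
    ),
    no={"storage": "clear", "access": "public"},
)


def all_treatments() -> List[Treatment]:
    """Enumerate every treatment the policy space allows."""
    slots = list(POLICY_SPACE)
    return [dict(zip(slots, values)) for values in product(*POLICY_SPACE.values())]


def in_space(t: Treatment) -> bool:
    """Check that a treatment assigns a legal value to every slot."""
    return t.keys() == POLICY_SPACE.keys() and all(t[s] in POLICY_SPACE[s] for s in t)


def interview(node: Node, answers: Dict[str, bool]) -> Treatment:
    """Walk the tree using the given answers until a treatment is reached."""
    while isinstance(node, Ask):
        node = node.yes if answers[node.question] else node.no
    return node


def leaves(node: Node) -> Iterator[Treatment]:
    """Yield every treatment that some interview path can produce."""
    if isinstance(node, Ask):
        yield from leaves(node.yes)
        yield from leaves(node.no)
    else:
        yield node


# Simple analysis: which treatments in the space can no interview ever yield?
reachable = list(leaves(DECISION_TREE))
unreachable = [t for t in all_treatments() if t not in reachable]
print(len(all_treatments()), len(reachable), len(unreachable))
```

Listing unreachable treatments is one basic example of the kind of model analysis mentioned above; a fuller analysis would depend on the semantics of the actual model.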



 

 

 



### Robot Lawyers for License Generation

 

The Robot Lawyers system is being developed to provide data repositories with expert system-like support for automating certain data handling decisions and generating custom data sharing agreements. It relies on a formalization of the privacy-relevant aspects of selected statutes, regulations, and best practices, supported by an analysis documented in legal memoranda. This formalization enables automated reasoning about the conditions under which a data transfer is permitted, based on facts learned about the data through an interview with the user depositing the data and the application of rules to these facts. The system uses this formalization to generate a custom data sharing agreement that accurately captures the relevant conditions on the data transfer. Transparency at each stage enables repository administrators, lawyers, institutional review boards, and other interested parties to examine the legal analysis and interpretation embodied in the formalization, as well as the rationale behind the generation of a particular license. Through integration with Dataverse, DataTags, and PolicyModels, this system will aim to help Dataverse users access and share data under tailored licenses, with confidence that the agreements reflect legal requirements and best practices with respect to privacy.
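The idea of generating an agreement from rules while keeping the rationale inspectable can be sketched in a few lines of Python. This is a toy under explicit assumptions: the facts, the clause wording, and the `memo-A`/`memo-B`/`memo-C` source labels are hypothetical and are not drawn from the project's legal memoranda or software.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Facts = Dict[str, bool]


@dataclass(frozen=True)
class ClauseRule:
    source: str                         # placeholder reference to a legal memorandum
    condition: Callable[[Facts], bool]  # when the clause is required
    clause: str                         # text added to the agreement


# Hypothetical rules; real ones would formalize statutes and best practices.
RULES = [
    ClauseRule("memo-A (hypothetical)", lambda f: f["contains_identifiers"],
               "Recipient shall not attempt to re-identify any individual."),
    ClauseRule("memo-B (hypothetical)", lambda f: f["consent_limits_reuse"],
               "Recipient shall use the data only for the purposes stated in the consent form."),
    ClauseRule("memo-C (hypothetical)", lambda f: True,
               "Recipient shall cite the dataset in resulting publications."),
]


def generate_agreement(facts: Facts) -> Tuple[str, List[Tuple[str, str]]]:
    """Return the agreement text and, for transparency, why each clause is present."""
    clauses: List[str] = []
    rationale: List[Tuple[str, str]] = []
    for rule in RULES:
        if rule.condition(facts):
            clauses.append(rule.clause)
            rationale.append((rule.clause, rule.source))
    text = "\n".join(f"{i}. {clause}" for i, clause in enumerate(clauses, start=1))
    return text, rationale


# Facts as they might be learned from an interview with the depositor.
facts = {"contains_identifiers": True, "consent_limits_reuse": False}
agreement, rationale = generate_agreement(facts)
print(agreement)
for clause, source in rationale:
    print(f"{source}: {clause}")
```

Returning the rationale alongside the text mirrors the transparency goal described above: a reviewer can see which rule, and which supporting analysis, produced each clause.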



 

 

 



### How have students contributed to the project?

 

Students have been involved throughout the development of DataTags. Law students contribute to the project by performing legal research and drafting memoranda analyzing how various privacy laws and regulations govern the collection, use, and sharing of personal data for research purposes. They also draft questions for the DataTags automated interview and terms for the custom license agreements. Undergraduates, graduate students, and postdocs in computer science contribute to the development of the DataTags software. This involves creating a custom language for the DataTags interview, inference, and tags assignment process, as well as tools for testing, verifying, and validating the software code.



 

 

 



### How can I get involved?

 

Please see our [open positions](/participate/positions) for interns, students, postdocs and visiting scholars.