Topics in DataTags





Instructor: Michael Bar-Sinai

Course Description: Making data widely available to researchers is good policy. It enables replication and validation of scientific findings and maximizes return on research investment. However, data containing sensitive information about individuals cannot be shared openly without appropriate safeguards. An extensive body of statutes, regulations, institutional policies, consent forms, data sharing agreements, and common practices governs how sensitive data should be used and disclosed in different contexts. DataTags is an algorithmic system that helps researchers create an appropriate data handling policy. It consists of a programming language for defining an interactive questionnaire, a runtime engine, a web application that presents the resulting interview to researchers, and other code-handling tools. The system is still being developed, and is part of the Privacy Tools for Sharing Research Data project at Harvard University. During this mini-project, students will work on a chosen aspect of the DataTags system, which offers challenges ranging from the design of the programming language itself, through code tools (validation, optimization, visualization), to the web application.