Differentially Private Ridge Regression

PDF1.04 MB


Studying problems of interest, like finding trends in medical data, can require analyzing data which contains sensitive and personally identifying information. As a result, it is often infeasible to release these datasets to researchers or to the general public. In this paper, we study algorithms that are differentially private, where there are theoretical guarantees that the mechanisms studied will reveal only limited amounts of information about individual people while still providing insights about large groups. This thesis discusses various forms of linear and ridge regression in this differentially private setting, with the goal of studying sensitive data to make predictions about future sensitive data. In particular, we will discuss the internal privacy-loss budgeting of the differentially private ridge regression technique adaSSP. This thesis provides 3 contributions. First, we discuss the existing SSP and adaSSP algorithms, and provide detailed proofs that they are each differentially private. Second, we introduce the two new algorithms adaSSPbudget and constSSPfull and prove that these are each differentially private. Third, we conduct experiments using synthetic and real world data to explore whether the precise privacy-loss budgeting used within these algorithms could improve their performance.

These experiments will explore the tradeoff between the accuracy of a hyperparameter and the accuracy of the other releases. Through the experimental results, we find that the performance is often insensitive to the particular privacy-loss budgeting and that for certain datasets, no choice of privacy-loss budget allows for the adaptive adaSSPbudget to outperform the standard SSP algorithm.

Last updated on 06/15/2021