An implementation of major general-purpose mechanisms for privatizing
statistics, models, and machine learners, within the framework of differential
privacy of Dwork et al. (2006).

The `diffpriv` package makes privacy-aware data science in R easy. `diffpriv` implements the formal framework of differential privacy: differentially-private mechanisms can safely release, to untrusted third parties, statistics computed, models fit, or arbitrary structures derived on privacy-sensitive data. Due to the worst-case nature of the framework, mechanism development typically requires involved theoretical analysis. `diffpriv` offers a turn-key approach to differential privacy by automating this process with sensitivity sampling in place of theoretical sensitivity analysis.

Obtaining `diffpriv` is easy. From within R:

    install.packages("devtools")
    devtools::install_github("brubinstein/diffpriv")

A typical example in differential privacy is privately releasing a simple `target` function of privacy-sensitive input data `X`, say the mean of `numeric` data:

    ## a target function we'd like to run on private data X, releasing the result
    target <- function(X) mean(X)

First load the `diffpriv` package (installed as above) and construct a chosen differentially-private mechanism for privatizing `target`.

    ## target seeks to release a numeric, so we'll use the Laplace mechanism---a
    ## standard generic mechanism for privatizing numeric responses
    library(diffpriv)
    mech <- DPMechLaplace(target = target)

To run `mech` on a dataset `X` we must first determine the sensitivity of `target` to small changes in the input dataset. One avenue is to bound sensitivity analytically (on paper; see the vignette) and supply it via the `sensitivity` argument of mechanism construction. In this case that is not hard if we assume bounded data, but in general sensitivity can be very non-trivial to calculate manually. The other approach, which we follow in this example, is sensitivity sampling: repeated probing of `target` to estimate sensitivity automatically. We need only specify a distribution for generating random probe datasets; `sensitivitySampler()` takes care of the rest. The price we pay for this convenience is a weaker guarantee, known as random differential privacy.

    ## set a dataset sampling distribution, then estimate target sensitivity with
    ## sufficient samples for subsequent mechanism responses to achieve random
    ## differential privacy with confidence 1-gamma
    distr <- function(n) rnorm(n)
    mech <- sensitivitySampler(mech, oracle = distr, n = 5, gamma = 0.1)
    #> Sampling sensitivity with m=285 gamma=0.1 k=285
    mech@sensitivity    ## DPMech and subclasses are S4: slots accessed via @
    #> [1] 0.8089517
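For intuition, the idea behind sensitivity sampling can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's implementation: the helper `sketch_sensitivity()` is hypothetical, neighboring datasets are assumed to differ in a single record, and the derivation of `m` and `k` from `gamma` is left out (the defaults simply mirror the `m=285 k=285` reported in the run above).

```r
## Simplified illustration of sensitivity sampling (NOT diffpriv's code).
## Assumption: neighboring datasets differ in exactly one record.
sketch_sensitivity <- function(target, oracle, n, m = 285, k = m) {
  diffs <- replicate(m, {
    records <- oracle(n + 1)                  # draw n + 1 random records
    D1 <- records[1:n]                        # first probe dataset
    D2 <- records[c(seq_len(n - 1), n + 1)]   # neighbor: last record swapped
    abs(target(D1) - target(D2))              # observed change in the target
  })
  sort(diffs)[k]                              # k-th smallest observed change
}

set.seed(1)
est <- sketch_sensitivity(mean, function(n) rnorm(n), n = 5)
est
```

With `k = m` the estimate is simply the largest change observed over the `m` probe pairs; it bounds sensitivity only with high probability over the sampling distribution, which is why the resulting guarantee is random differential privacy.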

With a sensitivity-calibrated mechanism in hand, we can release private responses on a dataset `X`, displayed alongside the non-private response for comparison:

    X <- c(0.328,-1.444,-0.511,0.154,-2.062) # length is sensitivitySampler() n
    r <- releaseResponse(mech, privacyParams = DPParamsEps(epsilon = 1), X = X)
    cat("Private response r$response: ", r$response,
      "\nNon-private response target(X):", target(X))
    #> Private response r$response:  -1.119506
    #> Non-private response target(X): -0.707
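To make the noise calibration concrete, here is a minimal base-R sketch of the Laplace mechanism, which adds Laplace noise with scale sensitivity/epsilon to the non-private response. It is an illustration only and does not reproduce the package's internals. The helpers `rlaplace_sketch()` and `laplace_release()` are hypothetical names, and the example assumes data bounded in [0, 1], so that swapping one record moves the mean by at most 1/n.

```r
## Sketch of the Laplace mechanism in base R (illustrative, NOT diffpriv's code).
## Base R has no Laplace sampler; the difference of two i.i.d. exponentials
## with mean `scale` is Laplace-distributed with that scale.
rlaplace_sketch <- function(n, scale) {
  rexp(n, rate = 1 / scale) - rexp(n, rate = 1 / scale)
}

## Release target(X) plus Laplace noise of scale sensitivity / epsilon.
laplace_release <- function(X, target, sensitivity, epsilon) {
  target(X) + rlaplace_sketch(1, scale = sensitivity / epsilon)
}

set.seed(42)
X_bounded <- runif(100)   # hypothetical data, assumed to lie in [0, 1]
private_mean <- laplace_release(X_bounded, mean,
                                sensitivity = 1 / length(X_bounded),
                                epsilon = 1)
cat("Private:", private_mean, " Non-private:", mean(X_bounded), "\n")
```

A smaller `epsilon` or a larger sensitivity both widen the noise, which is why a tight sensitivity estimate matters for utility.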

The above example demonstrates the main components of `diffpriv`:

- Virtual class `DPMech` for generic mechanisms that captures the non-private `target` and releases privatized responses from it. Current subclasses:
  - `DPMechLaplace`, `DPMechGaussian`: the Laplace and Gaussian mechanisms for releasing numeric responses with additive noise;
  - `DPMechExponential`: the exponential mechanism for privately optimizing over finite sets (which need not be numeric); and
  - `DPMechBernstein`: the Bernstein mechanism for privately releasing multivariate real-valued functions. See the bernstein vignette for more.
- Class `DPParamsEps` and subclasses for encapsulating privacy parameters.
- `sensitivitySampler()` method of `DPMech` subclasses estimates the target sensitivity necessary to run `releaseResponse()` of `DPMech` generic mechanisms. This provides an easy alternative to exact sensitivity bounds requiring mathematical analysis. The sampler repeatedly probes `mech@target` to estimate sensitivity to data perturbation. Running mechanisms with the obtained sensitivities yields random differential privacy.
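As an illustration of private optimization over a finite set, the exponential mechanism of McSherry and Talwar can be sketched in base R: each candidate is sampled with probability proportional to exp(epsilon * score / (2 * sensitivity)). The sketch below does not use `DPMechExponential` and makes no claim about its interface; `exp_mech_sketch()`, `count_score()` and the toy data are hypothetical.

```r
## Sketch of the exponential mechanism over a finite candidate set
## (illustrative, NOT diffpriv's code).
exp_mech_sketch <- function(X, candidates, score, sensitivity, epsilon) {
  scores <- vapply(candidates, function(cand) score(X, cand), numeric(1))
  ## Subtract the max score before exponentiating for numerical stability;
  ## this leaves the normalized sampling probabilities unchanged.
  weights <- exp(epsilon * (scores - max(scores)) / (2 * sensitivity))
  idx <- sample.int(length(candidates), size = 1, prob = weights)
  candidates[[idx]]
}

## Hypothetical task: privately pick the most common letter in X.
## Swapping one record changes any count by at most 1, so sensitivity = 1.
count_score <- function(X, cand) sum(X == cand)

set.seed(7)
X_letters <- c("a", "b", "a", "c", "a", "b")
winner <- exp_mech_sketch(X_letters, c("a", "b", "c"), count_score,
                          sensitivity = 1, epsilon = 1)
winner
```

Because the response is a sampled candidate instead of a noisy number, the candidates need not be numeric, matching the description of `DPMechExponential` above.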

Read the package vignette for more, or news for the latest release notes.

`diffpriv` is an open-source package offered with a permissive MIT License. Please acknowledge use of `diffpriv` by citing the paper on the sensitivity sampler:

Other relevant references to cite depending on usage:

- **Differential privacy and the Laplace mechanism:** Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. "Calibrating noise to sensitivity in private data analysis." In Theory of Cryptography Conference, pp. 265-284. Springer Berlin Heidelberg, 2006.
- **The Gaussian mechanism:** Cynthia Dwork and Aaron Roth. "The algorithmic foundations of differential privacy." Foundations and Trends in Theoretical Computer Science 9(3–4), pp. 211-407, 2014.
- **The exponential mechanism:** Frank McSherry and Kunal Talwar. "Mechanism design via differential privacy." In the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07), pp. 94-103. IEEE, 2007.
- **The Bernstein mechanism:** Francesco Aldà and Benjamin I. P. Rubinstein. "The Bernstein Mechanism: Function Release under Differential Privacy." In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI'2017), pp. 1705-1711, 2017.
- **Random differential privacy:** Rob Hall, Alessandro Rinaldo, and Larry Wasserman. "Random Differential Privacy." Journal of Privacy and Confidentiality, 4(2), pp. 43-59, 2012.

- Second vignette `bernstein` on Bernstein approximations and use of `DPMechBernstein` for private function release.
- Minor edits to docs.

- Expanding test coverage of Bernstein mechanism and function approximation code.

- Addition of `S3` constructor and `predict()` generic implementation for fitting (non-iterated) Bernstein polynomial function approximations.
- Addition of `DPMechBernstein` class implementing the Bernstein mechanism of Aldà and Rubinstein (AAAI'2017), for privately releasing functions.
- Bug fix in the Laplace random sampler affecting `DPMechLaplace`.
- Unit test coverage of new functionality; general documentation improvements.

- Addition of `DPMechGaussian` class for the generic Gaussian mechanism to README, Vignette. Resolves #2
- Minor test additions.

- Refactoring around `releaseResponse()` method in `DPMechNumeric`. Resolves #1
- Increased test coverage.

- New `DPMechGaussian` class implementing the Gaussian mechanism, which achieves (epsilon,delta)-differential privacy by adding Gaussian noise to numeric responses calibrated by L2-norm sensitivity.
- Refactoring of `DPMechGaussian` and `DPMechLaplace` underneath a new `VIRTUAL` class `DPMechNumeric` which contains common methods, `dims` slot (formerly `dim`, changed because `dim` is a special slot for S4).
- `DPMechLaplace` objects can now be initialized without specifying non-private `target` response `dim`. In such cases, the sensitivity sampler will perform an additional `target` probe to determine `dim`.

- Sensitivity sampler methods no longer require oracles that return lists. Acceptable oracles may now return lists, matrices, data frames, numeric vectors, or char vectors. As a consequence some example code in docs, README and vignette, is simplified.

- Initial release