An implementation of major general-purpose mechanisms for privatizing statistics, models, and machine learners, within the framework of differential privacy of Dwork et al. (2006).
The `diffpriv` package makes privacy-aware data science in R easy. `diffpriv` implements the formal framework of differential privacy: differentially-private mechanisms can safely release statistics computed, models fit, or arbitrary structures derived on privacy-sensitive data to untrusted third parties. Due to the worst-case nature of the framework, mechanism development typically requires involved theoretical analysis. `diffpriv` offers a turn-key approach to differential privacy, automating this process with sensitivity sampling in place of theoretical sensitivity analysis.
Obtaining `diffpriv` is easy. From within R:

```r
install.packages("devtools")
devtools::install_github("brubinstein/diffpriv")
```
A typical example in differential privacy is privately releasing a simple target function of privacy-sensitive input data `X`; say, the mean of numeric data:

```r
## a target function we'd like to run on private data X, releasing the result
target <- function(X) mean(X)
```
First load the `diffpriv` package (installed as above) and construct a chosen differentially-private mechanism for privatizing `target`.

```r
## target seeks to release a numeric, so we'll use the Laplace mechanism---a
## standard generic mechanism for privatizing numeric responses
library(diffpriv)
mech <- DPMechLaplace(target = target)
```
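For intuition, the Laplace mechanism perturbs the non-private response with zero-mean Laplace noise whose scale is sensitivity divided by epsilon. Below is a minimal base-R sketch of that idea; `rlaplace` and `laplace_release` are hypothetical helper names for illustration, not part of diffpriv's API:

```r
## Sample from Laplace(0, scale) via inverse-CDF sampling.
rlaplace <- function(n, scale) {
  u <- runif(n, min = -0.5, max = 0.5)
  -scale * sign(u) * log(1 - 2 * abs(u))
}

## Laplace mechanism sketch: add noise with scale = sensitivity / epsilon
## to the non-private response; smaller epsilon means stronger privacy
## and more noise.
laplace_release <- function(target, X, sensitivity, epsilon) {
  target(X) + rlaplace(1, scale = sensitivity / epsilon)
}
```

This is the calibration `DPMechLaplace` automates once a sensitivity is known or estimated.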
To run `mech` on a dataset `X` we must first determine the sensitivity of `target` to small changes in its input dataset. One avenue is to bound sensitivity analytically (on paper; see the vignette) and supply it via the `sensitivity` argument at mechanism construction: not hard in this case if we assume bounded data, but in general sensitivity can be highly non-trivial to calculate manually. The other approach, which we follow in this example, is sensitivity sampling: repeated probing of `target` to estimate sensitivity automatically. We need only specify a distribution for generating random probe datasets; `sensitivitySampler()` takes care of the rest. The price we pay for this convenience is the weaker guarantee of random differential privacy.
```r
## set a dataset sampling distribution, then estimate target sensitivity with
## sufficient samples for subsequent mechanism responses to achieve random
## differential privacy with confidence 1-gamma
distr <- function(n) rnorm(n)
mech <- sensitivitySampler(mech, oracle = distr, n = 5, gamma = 0.1)
#> Sampling sensitivity with m=285 gamma=0.1 k=285
mech@sensitivity ## DPMech and subclasses are S4: slots accessed via @
#> [1] 0.8089517
```
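Conceptually, the sampler draws many random datasets from the supplied oracle, perturbs one record in each, and estimates sensitivity from the observed response differences. A rough base-R sketch of the idea; `estimate_sensitivity` is a hypothetical name, and diffpriv's actual choice of order statistic is calibrated by `gamma`:

```r
## Sensitivity-sampling sketch: probe the target on pairs of neighboring
## datasets drawn from the oracle, and estimate sensitivity from the
## observed response differences.
estimate_sensitivity <- function(target, oracle, n, m) {
  deltas <- replicate(m, {
    X  <- oracle(n)
    X2 <- c(X[-n], oracle(1))      # neighbor: replace one record
    abs(target(X) - target(X2))    # L1 difference for scalar responses
  })
  max(deltas)  # conservative; in practice an order statistic tuned by gamma
}
```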
With a sensitivity-calibrated mechanism in hand, we can release private responses on a dataset `X`, displayed alongside the non-private response for comparison:
```r
X <- c(0.328, -1.444, -0.511, 0.154, -2.062) # length is sensitivitySampler() n
r <- releaseResponse(mech, privacyParams = DPParamsEps(epsilon = 1), X = X)
cat("Private response r$response:   ", r$response,
    "\nNon-private response target(X):", target(X))
#> Private response r$response:    -1.119506
#> Non-private response target(X): -0.707
```
The above example demonstrates the main components of `diffpriv`:

- `DPMech` for generic mechanisms: captures the non-private `target` and releases privatized responses from it. Current subclasses:
  - `DPMechLaplace` and `DPMechGaussian`: the Laplace and Gaussian mechanisms for releasing numeric responses with additive noise;
  - `DPMechExponential`: the exponential mechanism for privately optimizing over finite sets (which need not be numeric); and
  - `DPMechBernstein`: the Bernstein mechanism for privately releasing multivariate real-valued functions. See the bernstein vignette for more.
- `DPParamsEps` and subclasses for encapsulating privacy parameters.
- `sensitivitySampler()`: a method of `DPMech` subclasses that estimates the target sensitivity necessary to run `releaseResponse()`, providing an easy alternative to exact sensitivity bounds requiring mathematical analysis. The sampler repeatedly probes `mech@target` to estimate sensitivity to data perturbation. Running mechanisms with sampled sensitivities yields random differential privacy.

Read the package vignette for more, or the news for the latest release notes.
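For intuition on the exponential mechanism listed above: it selects from a finite candidate set with probability exponentially tilted toward high-quality candidates. A hedged base-R sketch under standard definitions; `exponential_release` is a hypothetical helper, not `DPMechExponential`'s API:

```r
## Exponential mechanism sketch: sample candidate r with probability
## proportional to exp(epsilon * quality(r, X) / (2 * sensitivity)),
## where sensitivity bounds how much quality can change between
## neighboring datasets.
exponential_release <- function(candidates, quality, X, sensitivity, epsilon) {
  scores <- vapply(candidates, function(r) quality(r, X), numeric(1))
  w <- exp(epsilon * (scores - max(scores)) / (2 * sensitivity))  # shift for stability
  sample(candidates, size = 1, prob = w / sum(w))
}
```

Larger epsilon concentrates the choice on the best-scoring candidate; smaller epsilon spreads probability more uniformly, giving stronger privacy.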
`diffpriv` is an open-source package offered under a permissive MIT License. Please acknowledge use of `diffpriv` by citing the paper on the sensitivity sampler:
Other relevant references to cite depending on usage:
- `bernstein` vignette on: Bernstein approximations and use of `DPMechBernstein` for private function release.
- `bernstein` S3 constructor and `predict()` generic implementation for fitting (non-iterated) Bernstein polynomial function approximations.
- `DPMechBernstein` class implementing the Bernstein mechanism of Alda and Rubinstein (AAAI'2017), for privately releasing functions.
- `DPMechLaplace`, `DPMechGaussian`: class for the generic Gaussian mechanism added to README, Vignette. Resolves #2.
- `releaseResponse()` method in `DPMechNumeric`. Resolves #1.
- `DPMechGaussian` class implementing the Gaussian mechanism, which achieves (epsilon, delta)-differential privacy by adding Gaussian noise to numeric responses calibrated by L2-norm sensitivity.
- Refactored `DPMechGaussian` and `DPMechLaplace` underneath a new `VIRTUAL` class `DPMechNumeric`, which contains common methods and a `dims` slot (formerly `dim`; changed because `dim` is a special slot for S4).
- `DPMechLaplace` objects can now be initialized without specifying the non-private `target` response `dims`. In such cases, the sensitivity sampler will perform an additional `target` probe to determine `dims`.
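The Gaussian mechanism mentioned in these notes can be sketched in a few lines. Under the classical analytic bound (valid for epsilon at most 1), Gaussian noise with standard deviation sqrt(2 log(1.25/delta)) times the L2-norm sensitivity over epsilon suffices for (epsilon, delta)-differential privacy; `gaussian_release` below is an illustrative helper, not diffpriv's implementation:

```r
## Gaussian mechanism sketch: add N(0, sigma^2) noise to each coordinate
## of the numeric response, with sigma calibrated by L2-norm sensitivity.
gaussian_release <- function(target, X, l2_sensitivity, epsilon, delta) {
  sigma <- sqrt(2 * log(1.25 / delta)) * l2_sensitivity / epsilon
  r <- target(X)
  r + rnorm(length(r), sd = sigma)
}
```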