We are excited to introduce our fourth guest author in this blog series, Chike Abuah, PhD student in computer science at the University of Vermont, whose research expands the state of the art in the subject of this blog post: static and dynamic analysis approaches to automatic proofs of differential privacy. - Joseph Near and David Darais
Previously, we have discussed some differentially private algorithms and implementation bugs which can cause these algorithms to fail to protect privacy in practice. Previous posts have described two ways of addressing this problem: automated testing and manual proofs.
In this post, we examine a third option: automatic proof. Automatic proof tools for differential privacy analyze the program and attempt to build a proof that the program satisfies differential privacy.
The testing tools described in the last post can be used on any program, written in any programming language. However, testing-based tools can only find bugs - they can never prove that no bugs exist. The automatic proof tools described in this post are less broadly applicable, but can produce a proof of privacy equivalent to the manual proofs written by privacy experts.
Automatic proof tools for differential privacy are divided into two categories: static analysis tools construct a proof without running the program (like the type checker in Java); dynamic analysis tools construct a proof while running the program, using a dynamic monitor executed by the unmodified runtime system (e.g. the Python interpreter or Java Virtual Machine). The resulting proof may apply only to that execution of the program. Figure 1 summarizes these two categories, and lists some of the tools that have been developed in each category. We will describe specific features of some of these tools later in the post.
Design Requirements for Automatic Proof
One of the most popular methods for achieving differential privacy is to convert a regular (deterministic) algorithm into a randomized version of the same algorithm. When constructing a proof of privacy for such an algorithm, there are three primary points to consider:
1. Determine the global sensitivity of the algorithm to confidential information.
2. Add random noise to the output of the algorithm, proportional to its sensitivity.
3. If the algorithm consists of multiple privacy-preserving algorithms, we must also consider composition.
Thus, automated proof tools have two main concerns: the determination of sensitivity for an "atomic" private algorithm, and the composition of multiple private algorithms. Ideally, our tools would accommodate both of these features.
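To make these steps concrete, here is a minimal Python sketch of the basic recipe (written for this post, not taken from any of the tools discussed below): determine the global sensitivity of each query, add Laplace noise with scale equal to the sensitivity divided by epsilon, and sum the privacy parameters under sequential composition.

import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon):
    # Release a numeric answer with Laplace noise of scale sensitivity/epsilon.
    return true_answer + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

ages = np.array([23, 45, 31, 62, 50])

# A counting query has global sensitivity 1: adding or removing one
# individual changes the count by at most 1.
noisy_count = laplace_mechanism(len(ages), sensitivity=1.0, epsilon=0.5)

# A sum over ages clipped to the range [0, 100] has global sensitivity 100.
noisy_sum = laplace_mechanism(np.clip(ages, 0, 100).sum(), sensitivity=100.0, epsilon=0.5)

# By sequential composition, the two releases together satisfy
# (0.5 + 0.5) = 1.0-differential privacy.
total_epsilon = 0.5 + 0.5

An automatic proof tool has to carry out exactly this bookkeeping of sensitivities and composition on our behalf, either before the program runs or while it runs.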
Static Analysis for Differential Privacy
The Fuzz language design proposes a brand-new programming language with a specialized type system that statically verifies ε-differential privacy of programs. Fuzz implements a "linear type system" to track and analyze variable usage in order to reason about the global sensitivity of confidential information. This approach is similar to the aspect of the Rust type system that ensures memory safety of programs; Rust does this via another form of linear types. The linear types required for sensitivity analysis are, however, more sophisticated than those in Rust, and do not currently exist in any mainstream programming language. Nevertheless, the Fuzz approach is very compelling, and has been extended and built upon in a number of subsequent research efforts, such as DFuzz and Duet.
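To give a flavor of the reasoning a linear type system automates, the hypothetical Python sketch below (invented for this post; Fuzz is its own language, and none of these names come from Fuzz, DFuzz, or Duet) mimics the sensitivity arithmetic such a type checker performs at compile time: applying a c-sensitive function to a k-sensitive expression yields a (c * k)-sensitive expression.

class SensFn:
    # A function annotated with its sensitivity: |f(x) - f(y)| <= sens * |x - y|.
    def __init__(self, fn, sens):
        self.fn = fn
        self.sens = sens

    def compose(self, other):
        # Composing a c-sensitive function with a k-sensitive one yields a
        # (c * k)-sensitive function, which is the rule a linear type checker applies.
        return SensFn(lambda x: self.fn(other.fn(x)), self.sens * other.sens)

double = SensFn(lambda x: 2 * x, sens=2)   # 2-sensitive
shift = SensFn(lambda x: x + 10, sens=1)   # 1-sensitive

pipeline = double.compose(shift)           # 2 * 1 = 2-sensitive overall
print(pipeline.sens)                       # 2

The crucial difference is that Fuzz derives these sensitivities from the program text alone, without ever running the program.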
The static approach in general is attractive because it allows programs to be verified as differentially private for confidential inputs, without ever having to run the program on any data. Intuitively, this tells us that when the program actually executes it will not introduce any privacy bugs regardless of the actual program inputs on that execution. Maintaining a similar data-independent guarantee in the dynamic analysis setting is more challenging, although possible with certain additional restrictions on the program, such as limitations on conditionals.
A barrier to adoption for Fuzz-like approaches is the lack of support for expressing these sophisticated type systems in mainstream programming languages.
Dynamic Analysis for Differential Privacy
The PINQ system is a tool for dynamic analysis of differentially private programs: it analyzes the privacy properties of the program during the program's execution, and throws an error if privacy might be violated. PINQ is designed for dynamic data analysis over a relational database while preserving differential privacy. It supports a subset of SQL-style operations such as Where, Select, GroupBy, and Join. The PINQ code sample shown below counts the distinct users (identified by IP address) who issued a given search query. The program loads a sensitive database as a PINQueryable, which allows access to the database records only through a restricted API that tracks the global sensitivity of the program.
var data = new PINQueryable( ... ... );

var users = from record in data
            where record.Query == argv[0]
            groupby record.IPAddress;

Console.WriteLine(argv[0] + ": " + users.NoisyCount(0.1));
Several subsequent research projects have built on PINQ, also targeting a relational database query model. Dynamic enforcement approaches for differential privacy must be implemented carefully to mitigate potential exploitation of program side effects and side channels; for this reason, the available programming model is usually heavily restricted.
More recently, general-purpose open-source privacy analysis projects such as εktelo, Smartnoise, and Diffprivlib aim to make dynamic privacy enforcement accessible as library utilities in mainstream dynamically typed languages. Other work has attempted to make privacy analysis pervasive throughout the core programming model of a language such as Python, using techniques similar to dynamic taint analysis, which has historically been used to enforce other security properties such as information flow control (confidentiality, integrity, etc.). Automatic enforcement of differential privacy "by design" in mainstream languages such as Python has the potential to be useful and convenient, but comes with unavoidable drawbacks, such as limitations on the use of conditionals.
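As a rough illustration of the taint-analysis style of enforcement, the hypothetical sketch below (invented for this post; it is not the design of εktelo, Smartnoise, or Diffprivlib) tags values derived from confidential data with a sensitivity that propagates through arithmetic, and rejects branching on tainted values, which is exactly the kind of restriction on conditionals mentioned above.

class Sensitive:
    # A tainted value carrying an upper bound on its sensitivity.
    def __init__(self, value, sensitivity=1.0):
        self.value = value
        self.sensitivity = sensitivity

    def __add__(self, other):
        if isinstance(other, Sensitive):
            return Sensitive(self.value + other.value,
                             self.sensitivity + other.sensitivity)
        # Adding a public constant does not change sensitivity.
        return Sensitive(self.value + other, self.sensitivity)

    def __mul__(self, k):
        # Scaling by a public constant k scales sensitivity by |k|.
        return Sensitive(self.value * k, self.sensitivity * abs(k))

    def __bool__(self):
        raise RuntimeError("branching on confidential data is not allowed")

x = Sensitive(42)       # one individual's value, sensitivity 1
y = (x * 3) + 7         # tracked sensitivity: 3
print(y.sensitivity)    # 3.0
# 'if x: ...' would raise at run time instead of silently leaking information.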
Another special challenge arises in the dynamic setting: the standard sequential composition theorems for differential privacy assume the privacy parameters are fixed up-front (statically), and do not necessarily allow these parameters to be chosen adaptively, based on the results of prior computations. However, privacy filters and odometers, developed in recent research, can be used in this adaptive setting. A privacy odometer lets the analyst compute an upper bound on the total privacy leakage at any point in a sequence of adaptive private computations, while a privacy filter lets the analyst place an upper bound on her desired privacy cost for such a sequence and halt the process immediately if the bound would be exceeded.
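The sketch below shows a simplified privacy filter (invented for this post; real filters and odometers from the research literature use tighter bounds than the plain sum of epsilons used here). The analyst fixes a total budget up front, chooses each query's epsilon adaptively, and the filter refuses any request that would push the running total (the odometer) past the budget.

class PrivacyFilter:
    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0             # running total: a simple odometer

    def request(self, epsilon):
        # Refuse the request if it would exceed the fixed overall budget.
        if self.spent + epsilon > self.total_epsilon:
            return False             # the adaptive sequence must halt here
        self.spent += epsilon
        return True

filt = PrivacyFilter(total_epsilon=1.0)
assert filt.request(0.4)        # first query, epsilon chosen adaptively
assert filt.request(0.4)        # second query, based on the first result
assert not filt.request(0.4)    # would exceed the budget, so it is refused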
Software Tools
Automatic proof tools (for differential privacy, and for other properties of programs) are an active area of research, and new tools are developed all the time. Because they are more easily integrated into existing programming languages, dynamic analysis tools tend to be more mature and closer to production-ready. The Smartnoise and Diffprivlib systems, for example, both provide well-maintained open-source frameworks for building differentially private analysis pipelines, with automated proofs of differential privacy based on dynamic analysis. A major advantage of building on systems like these is the assurance that analyses are free of privacy bugs. Open-source dynamic analysis tools include PINQ, εktelo, Smartnoise, and Diffprivlib.
Static approaches like Fuzz, as described earlier, are much more difficult to integrate with existing tools. However, as static program analysis techniques become more mature, and as mainstream programming languages gain more sophisticated type systems, static approaches may become just as easy to use as dynamic ones. Open-source static analysis tools include Fuzz, DFuzz, and Duet.
The majority of the posts in this series, including this one, have focused on technical approaches for achieving differential privacy. In our next post, we'll return to the subject of what differential privacy means, what it protects and doesn't protect, and the all-important question of how to set the privacy budget.
This post is part of a series on differential privacy. Learn more and browse all the posts published to date on the differential privacy blog series page in NIST's Privacy Engineering Collaboration Space.