Understanding privacy policies

(A study in empirical analysis of language usage)


Ralf Lämmel and Ekaterina Pek

There is growing recognition that users of web-based systems want to understand, if not control, what customer data is stored by whom, for what purpose, for what duration, and with whom it is shared. We inform current language-based privacy efforts with an empirical study of P3P, the W3C domain-specific language for privacy policies. We use methods of software language engineering to study usage profiles, correctness of policies, metrics, cloning, and language extensions. The study supports the conclusion that P3P's approach to policy validation is too weak to ensure correct use of the language. The study also discovers common, dominating policies, which may suggest a simpler approach to web privacy. Further, the study investigates a range of metrics for policies in an attempt to discover particularly interesting or complex policies. Finally, the study attempts to discover symptoms of a need for extending the P3P language, but the results on this point are inconclusive.

Web-based systems, privacy, privacy policies, P3P, language usage, empirical study, language understanding, domain-specific languages, software metrics, clone detection, software language engineering, software linguistics, policy compliance, policy enforcement

