Researchers take on complicated privacy policies

Credit: Photo illustration by Kelsey Scott

How does Amazon know what deals will catch your interest? How does Facebook know what ads to show you? Most people are unaware that websites collect information about them all the time. “A fundamental aspect of privacy is that people are supposed to have control over what happens to their information,” said Norman Sadeh, a Carnegie Mellon professor of computer science. He is the leader of the Usable Privacy Policy Project, a 42-month, $3.75-million project sponsored by the National Science Foundation whose goal is to improve internet privacy.

The research team working on this project includes Lorrie Cranor, associate professor of computer science and engineering and public policy; Alessandro Acquisti, associate professor of information technology and public policy; Travis Breaux, assistant professor of computer science; Aleecia McDonald, director of privacy at Stanford University’s Center for Internet and Society; and Joel Reidenberg, a Fordham University law professor.

Websites are supposed to inform users when they collect information, and they do so through privacy policies. These policies should answer consumers' questions about how and for what purpose a site collects user information, and with whom that information will be shared. Every major website should have a privacy policy that is easy to find.

But some question the accessibility of these policies. “It’s easily accessible if accessing means being able to get to the page, but it’s not easily accessible if access is supposed to also include the ability to quickly digest the text and make sense of it,” said Sadeh. The problem is that many privacy policies are not only lengthy, averaging five to seven pages, but also very complex and difficult for a layperson to understand. In fact, a 2008 study conducted by Cranor and McDonald estimated it would take 200 to 300 hours for a person to read the privacy policies of every website they visited in a year.

Another study conducted at Carnegie Mellon tested people’s comprehension of privacy policies. After reading a range of policies, individuals were asked very simple comprehension questions, such as whether the website collects email addresses or shares information with third parties. Although privacy policies are supposed to answer these questions clearly, 50 percent of the people who read them could not answer correctly.

In the past, some people have tried to impose standardized formats, such as presenting policies as multiple-choice questions or yes/no answers that could easily convey to users what type of information the website collects. Another attempt was to encourage websites to write privacy policies in machine-readable language. However, these efforts have met with considerable pushback from the companies that write these policies.

The Usable Privacy Policy Project team has been studying new methods of making privacy policies more transparent for consumers. One such method involves modeling people’s privacy preferences. “We’ve shown through research over the past few years that even though people care about privacy, they don’t actually care about everything these privacy policies talk about. They only care about a relatively small number of factors,” Sadeh said. As a result, only a few items in these policies need to be highlighted to convey the information consumers care about most.

Another direction that this group is taking is combining machine learning with crowdsourcing. The machine-learning aspect involves using computer programs that can detect patterns of text within privacy policies in order to extract information from them. Along with the contributions from crowdworkers, machine learning can ensure that privacy policies will be understood on a large scale. This information can then be presented to consumers in the form of user interfaces or browser add-ons that can rate the safety of websites with color codes or letter grades — almost like a nutrition label.
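To make the idea concrete, here is a minimal sketch of how pattern detection, crowdworker answers, and a letter grade might fit together. It is not the project's actual system: the patterns, function names, and grading scale are all hypothetical, and simple hand-written regular expressions stand in for the machine-learning step the article describes.

```python
import re
from collections import Counter

# Hypothetical hand-written patterns standing in for a learned model.
# They are naive: a sentence like "we do not share data with third
# parties" would still match, which is one reason human review helps.
PATTERNS = {
    "collects_email": re.compile(
        r"\bcollect\w*\b[^.]*\be-?mail address", re.IGNORECASE),
    "shares_with_third_parties": re.compile(
        r"\bshar\w*\b[^.]*\bthird[- ]part(?:y|ies)\b", re.IGNORECASE),
}


def detect_practices(policy_text):
    """Return {practice: bool}, True where a pattern matches the text."""
    return {name: bool(p.search(policy_text)) for name, p in PATTERNS.items()}


def majority_vote(answers):
    """Combine crowdworker yes/no answers; a tie counts as False."""
    counts = Counter(answers)
    return counts[True] > counts[False]


def combine(machine, crowd):
    """Use the crowd majority where answers exist, else the machine guess."""
    return {
        name: majority_vote(crowd[name]) if crowd.get(name) else machine[name]
        for name in machine
    }


def letter_grade(practices):
    """Assumed scale: A for no flagged practices, B for one, C for two or more."""
    flagged = sum(practices.values())
    return "ABC"[min(flagged, 2)]


if __name__ == "__main__":
    policy = ("We collect your email address when you register. "
              "We do not sell data.")
    machine = detect_practices(policy)
    crowd = {"shares_with_third_parties": [True, True, False]}
    final = combine(machine, crowd)
    print(final, letter_grade(final))
```

In this sketch the crowd answers override the automated guess wherever they are available, reflecting the article's point that the two sources are meant to work together. A browser add-on could then display the resulting grade next to the site.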

Privacy engineering is becoming an increasingly important field, particularly because the people who write privacy policies tend to be lawyers, while the people who build the products tend to be software engineers. Carnegie Mellon is launching a new master’s degree program in privacy engineering.

“The long-term hope is that when information becomes more easy to digest, companies will realize they have to be more aware of their privacy policies,” Sadeh said. The researchers hope that, at some point, companies will even start competing with one another, using safer privacy policies to set themselves apart and ultimately improving privacy on the web.