Privacy and this website
03 August 2015
Posted in Privacy & Security
Privacy on the internet has always been important to me. I guess this is because I grew up in Germany where, in no small part due to experiences with the Nazi government and, later, the Stasi in East Germany, people are generally more privacy conscious. For example, when the implications of the 180-day rule in the US Electronic Communications Privacy Act really sank in, I moved all my personal email hosting out of the United States. Last year, I finally realised that I should talk about privacy and mass surveillance more publicly, which I then did: in a talk, in a group interview on privacy and security, and in an interview about the Pixelated project.
At the same time I'm running this website. Do I put my money where my mouth is? What am I doing with this website regarding your privacy?
Datensparsamkeit and log files
Let's start with the idea of datensparsamkeit. Martin Fowler's article describes the issue of recording IP addresses in logfiles. On the one hand, recording the IP addresses of site visitors can be useful, for example, to get an understanding of where the visitors are coming from on a country/region level. On the other hand, IP addresses are potentially personally identifiable, especially in a home setting, so ideally a website shouldn't record them. A simple solution in this scenario is to blank out the fourth octet of the IP address, which still provides enough information for geo analysis, but leaves too little information to make the address personally identifiable.
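To make the fourth-octet idea concrete, here is a minimal sketch in Python. The function name `anonymize_ipv4` is made up for this illustration, and the sketch only covers IPv4. It is not taken from any particular web server's configuration; it merely shows the masking step on a single address.

```python
import ipaddress


def anonymize_ipv4(address: str) -> str:
    """Return the IPv4 address with its fourth octet replaced by zero.

    Hypothetical helper for illustration only.
    """
    ip = ipaddress.IPv4Address(address)  # raises ValueError on malformed input
    masked = int(ip) & 0xFFFFFF00        # keep the first three octets, zero the last
    return str(ipaddress.IPv4Address(masked))


# 203.0.113.0/24 is a range reserved for documentation examples.
print(anonymize_ipv4("203.0.113.77"))  # 203.0.113.0
```

The masked address still identifies the /24 network, which is usually enough for a country or region lookup, while up to 256 different hosts map to the same logged value.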
This website is now hosted at Uberspace, a German ISP that seems pretty serious about privacy. (For example, they allow you to create an anonymous account and pay by sending bank notes in the post.) The shared hosts are configured so that the fourth octet in IP addresses is replaced with a zero in the access log files, which is great. Unfortunately, this doesn't happen in the error logs, but these aren't even written unless specifically requested.
Commenting system
The commenting system is another obvious candidate to look at. I despise third-party cookies and I don't want to rely on a software-as-a-service platform to host your comments. (See below for third parties.) After some searching and experimentation I've now settled on hosting the Juvia commenting system myself, at the same ISP as the main website.
You can post comments without being forced to provide any input in the name or email fields. If you provide your email address, it is not displayed back; it's only used for the Gravatars, to create a more personal feel to the comment section. And, yes, I am aware of the weakness of Gravatar's approach, as described in this article for example.
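For readers who haven't looked at how Gravatar works, here is a minimal sketch in Python. It assumes the classic scheme, in which the avatar is looked up by the MD5 hash of the trimmed, lower-cased email address. The function names `gravatar_hash` and `guess_email` and the example.com addresses are made up for this illustration. One commonly cited weakness, which may or may not be the exact one the linked article focuses on, is that anyone who sees the hash can test candidate addresses against it.

```python
import hashlib
from typing import Iterable, Optional


def gravatar_hash(email: str) -> str:
    """MD5 hex digest of the trimmed, lower-cased address (classic Gravatar scheme)."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def guess_email(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate whose hash matches, or None if none does.

    Hypothetical helper: it shows why a published hash is not anonymous
    when the set of plausible addresses is small or guessable.
    """
    for candidate in candidates:
        if gravatar_hash(candidate) == target_hash:
            return candidate
    return None


# The hash that would end up in an avatar URL for this (made-up) commenter.
published = gravatar_hash("  Alice@Example.com ")

# Someone holding a list of likely addresses can recover the original.
print(guess_email(published, ["bob@example.com", "alice@example.com"]))
```

The hash hides the address from casual readers, but it behaves more like a stable pseudonym than like anonymisation.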
What about IP addresses, though? If you study the source code for Juvia you might stumble over this line (also shown in the picture at the top), where the system explicitly captures the remote address and records it in the database with the comment. I thought about changing this line, but in the end that wasn't even necessary: because of the way I'm hosting Juvia, a Rails app, it never gets to see the real remote IP address. It always sees 127.0.0.1 as the remote address.
Third parties
Websites like this one only come together in the browser, combining assets from potentially different sources. I have tried to make as much available from the main site as possible, including web fonts and JavaScript libraries, in an effort not to leak your visit to third-party sites. I must be honest, though, and state that there are three exceptions to this.
First, the front page uses JavaScript to dynamically load information about my GitHub repositories from GitHub. Second, on some pages, sometimes on the front page, I'm embedding videos of conference talks. These are iframes loading data from YouTube. Third, and last, on all pages a small snippet of JavaScript sends information about the visit to Statcounter, a web analytics provider. The Statcounter privacy policy implies that they could record full IP addresses, but what I can see when I look at the visitor logs are IP addresses with the last octet blanked out. They also have an opt-out for cookies that seems to work.
I'm not sure what to do about GitHub and YouTube, especially given that this is now a statically hosted website, but I am considering moving the web analytics to a self-hosted instance of Piwik when I find the time to do so.
So what?
To be honest, this is for you to decide. I've taken some steps, but I'm not there yet. Full visitor IP addresses are not recorded in the access logs and I've made an effort to avoid some of the biggest data collectors (Facebook Like buttons, Google Analytics, externally hosted web fonts, etc.), but I haven't been willing to sacrifice functionality that I feel is important to the site. And, of course, I should really buy that certificate and let you reach this site using HTTPS.