Some time ago I did some research on the effectiveness of the blockade of ThePirateBay. I tried to measure this by looking at the intended effect: are fewer Dutch people downloading torrents published on ThePirateBay? This turned out to be very easy to measure, and in this post I explain what kind of information you expose when you download a torrent.
Downloading the Torrent
The first step in downloading a torrent is to get the magnet link from a website such as ThePirateBay. As you would expect, the web server can see what you search for and which magnet link you select. A magnet link looks like the following (line breaks added for readability):
The first part is the hash identifier (the info-hash), which is used to look up the torrent in the BitTorrent Distributed Hash Table (DHT), a peer-to-peer lookup network. The second part is the human-readable description, and the last part is a set of URLs for trackers.
The BitTorrent client uses the trackers, together with some bootstrap addresses preconfigured in the client, to bootstrap its connection to the peer-to-peer network. The torrent file itself is then fetched through the DHT network, typically from peers found there, and only then does the actual downloading start. At this stage very few nodes can see what anyone is downloading.
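To make the three parts of a magnet link concrete, here is a minimal Python sketch that splits one apart. The link in EXAMPLE is made up for illustration (the hash, name and tracker hosts are placeholders, not a real torrent), and parse_magnet is a hypothetical helper name, not part of any BitTorrent client.

```python
from urllib.parse import urlparse, parse_qs

def parse_magnet(link):
    """Split a magnet link into info-hash, display name and tracker URLs."""
    parsed = urlparse(link)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet link")
    params = parse_qs(parsed.query)
    # 'xt' (exact topic) carries the hash, e.g. urn:btih:<40 hex characters>
    xt = params.get("xt", [""])[0]
    prefix = "urn:btih:"
    if not xt.startswith(prefix):
        raise ValueError("no BitTorrent info-hash found")
    return {
        "info_hash": xt[len(prefix):].lower(),
        "name": params.get("dn", [None])[0],   # 'dn' is the display name
        "trackers": params.get("tr", []),      # 'tr' may appear several times
    }

# Made-up link, for illustration only
EXAMPLE = (
    "magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567"
    "&dn=Example+File"
    "&tr=udp%3A%2F%2Ftracker.example.org%3A1337"
    "&tr=udp%3A%2F%2Ftracker.example.net%3A80"
)

if __name__ == "__main__":
    print(parse_magnet(EXAMPLE))
```

Note that everything in the link is visible to the website that serves it, and the tracker URLs tell the client whom to contact next.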
Downloading the Content
Once the client has the complete torrent file it knows what the actual content looks like. It then needs to find other peers to download the content from. To find these peers the client contacts the trackers, uses the DHT to discover others, and gossips with other peers. Once a swarm is formed, the participating peers negotiate to exchange content and collaboratively distribute it over the swarm.
At this stage any peer in the swarm can learn about the existence of other peers. This is limited somewhat, as the tracker only allows a peer to ask for a new list of peers every once in a while. The DHT can also contain a short list of peers that are active for that torrent, which is infrequently updated. Another possibility is to learn by gossip, also called Peer Exchange (PEX): once peers connect, they exchange a set of peers that they know about.
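Peer lists are commonly passed around in a compact binary form: six bytes per IPv4 peer, four for the address and two for the port in network byte order. The sketch below decodes such a list. It assumes you already have the raw bytes from a tracker response or a similar source; decode_compact_peers is a hypothetical helper name, and the addresses in the example come from documentation ranges.

```python
import socket
import struct

def decode_compact_peers(blob):
    """Decode a compact IPv4 peer list into (ip, port) tuples."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list must be a multiple of 6 bytes")
    peers = []
    for offset in range(0, len(blob), 6):
        # First four bytes: IPv4 address
        ip = socket.inet_ntoa(blob[offset:offset + 4])
        # Last two bytes: port, big-endian (network byte order)
        (port,) = struct.unpack("!H", blob[offset + 4:offset + 6])
        peers.append((ip, port))
    return peers

if __name__ == "__main__":
    sample = bytes([192, 0, 2, 1, 0x1A, 0xE1, 198, 51, 100, 7, 0xC8, 0xD5])
    print(decode_compact_peers(sample))
```

Every entry in such a list is the IP address and port of someone taking part in the swarm, which is exactly the information a monitor is after.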
At this stage, many peers know about each other’s existence. The tracker also knows, and the DHT may even advertise that someone is downloading the content. Someone monitoring this swarm can quickly see who is participating and for how long. What’s more, when a peer connects to the monitor, even more information is exchanged: the monitor can then see how much has been downloaded and how fast the peer is uploading and downloading, not only for this torrent but also in total for the client. Such a monitor can participate in the swarm for extended periods of time, charting almost every peer passing through.
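The bookkeeping side of such a monitor is small. The following sketch assumes the networking is handled elsewhere and only shows two things: estimating a peer's progress from the bitfield it sends after connecting (one bit per piece, highest bit first), and recording when each peer was first and last seen. SwarmLog and bitfield_progress are hypothetical names chosen for this example.

```python
import time

def bitfield_progress(bitfield, num_pieces):
    """Fraction of pieces a peer claims to have, from its bitfield payload."""
    have = 0
    for index in range(num_pieces):
        byte = bitfield[index // 8]
        # The high bit of the first byte corresponds to piece 0
        if byte & (0x80 >> (index % 8)):
            have += 1
    return have / num_pieces

class SwarmLog:
    """Remembers first/last sighting and latest known progress per peer."""

    def __init__(self):
        self.peers = {}

    def record(self, ip, port, progress=None, now=None):
        now = time.time() if now is None else now
        entry = self.peers.setdefault(
            (ip, port),
            {"first_seen": now, "last_seen": now, "progress": None},
        )
        entry["last_seen"] = now
        if progress is not None:
            entry["progress"] = progress
        return entry

    def seconds_observed(self, ip, port):
        entry = self.peers[(ip, port)]
        return entry["last_seen"] - entry["first_seen"]
```

Calling record every time a peer shows up in a tracker response, a PEX message or an incoming connection is enough to chart who was in the swarm and for roughly how long.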
Crawling the DHT
While some trackers are private, some of them still use magnet links instead of direct downloads of torrent files. This can be a problem: in 2010, Scott Wolchok and Alex Halderman of the University of Michigan showed that it is possible to crawl the complete DHT and retrieve all torrents in just a few hours. We can safely assume that the techniques have only improved since then. Using the torrents crawled from the DHT, it is then also possible to monitor each of the associated swarms as described above.
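To give a feeling for how little is needed to talk to a DHT, here is a sketch of building a single get_peers query as used in the Mainline DHT, where messages are bencoded dictionaries sent over UDP. This is not the crawling method from the research mentioned above, only an illustration of the query format; the node ID and info-hash are placeholder values, and sending the packet and parsing the reply are left out.

```python
def bencode(value):
    """Minimal bencoder for ints, byte strings, lists and dicts with bytes keys."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys must appear in sorted order
        items = sorted(value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))

def get_peers_query(node_id, info_hash, transaction_id=b"aa"):
    """Build a get_peers query asking who is active for a given info-hash."""
    if len(node_id) != 20 or len(info_hash) != 20:
        raise ValueError("node_id and info_hash must be 20 bytes each")
    return bencode({
        b"t": transaction_id,   # echoed back in the reply
        b"y": b"q",             # message type: query
        b"q": b"get_peers",
        b"a": {b"id": node_id, b"info_hash": info_hash},
    })

if __name__ == "__main__":
    # Placeholder 20-byte values, for illustration only
    print(get_peers_query(b"abcdefghij0123456789", b"mnopqrstuvwxyz123456"))
```

A crawler repeats queries like this against many nodes, collecting the peers and further nodes returned in each reply.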
Once you are downloading something using BitTorrent, you can safely assume that anyone who wants to know can find out. The torrent websites have some vague statistics, and the DHT can also be used for some rough monitoring. But it is not that hard to create a monitor that passively participates in the torrent swarm and silently records almost everyone participating. If we really want an anonymous way of sharing files, we either need to use private trackers that don’t use the DHT, or we have to use something different altogether.