Problem

A large number of Tor users report getting hit by infinite CAPTCHA loops when visiting webpages fronted by Cloudflare. This makes them feel punished for using Tor to protect their privacy and prevents them from legitimately accessing websites.

Proposal

For this project we would like to track in practice how often Cloudflare fronted webpages return CAPTCHAs to Tor clients.

Our proposed approach consists of:

  1. Setting up a very simple static webpage to be fronted by Cloudflare
  2. Write a web client to periodically fetch this static webpage via Tor and record how often a CAPTCHA is returned
  3. Record and graph CAPTCHA vs real page rates
  4. Using the pre-existing architecture, run a second client that does not fetch this webpage via Tor. This will allow us to contrast and compare how Cloudflare responds to Tor Browser vs other browsers.
  5. Track and publish these details publicly

There are two interesting metrics to track over time:

  • the fraction of exit relays that are getting hit with CAPTCHAs, and
  • the chance that a Tor client, choosing an exit relay in the normal weighted faction, will get hit by a CAPTCHA.

Then there are other interesting patterns to look for:

  • Are certain IP addresses punished consistently and others never punished?
  • Is whether you get a CAPTCHA much more probabilistic and transient?
  • Does that pattern change over time?

Getting Started

As this is a new project, in order to demonstrate your skills and familiarise yourself with this project you may want to:

  1. Set up the required infrastructure for this project e.g by setting up a very simple static webpage to be fronted by Cloudflare.
  2. Familiarise yourself and start experimenting with various web clients to fetch pages via Tor, bearing in mind you may need to adapt them to your needs if they are lacking the required functionality.
  3. Read the comments in ticket #33010 for more ideas.

Resources

There is pre-existing research by the Berkeley ICSI group which includes these sorts of checks:

For the original ticket and discussion, please see ticket #33010