
Load balancing topologies

An interesting property of Onion Services technology, from a service operator perspective, is that it allows for many possible network topologies.

First, because it's a portable technology: a service can be moved between servers by simply copying its keys and configuration anywhere the Tor network is reachable.

Second, because its execution can be split across multiple machines using the following approaches:

  1. By running many tor daemon instances in parallel to act as an Onion Service load balancing/failover layer.
  2. By splitting the Onion Services descriptor publisher from the actual backends.
  3. By combining both methods above, running a mix of tor daemon instances and publishers.

We'll discuss each approach below, but note that load balancing with Onion Services is tied to how Onion Services work: it depends on which introduction points a client picks to connect, and those are advertised through a descriptor document published in the Tor network. A descriptor usually lists several introduction points from a single tor daemon instance, so the load balancing strategy is based on either:

  1. Alternating the currently published descriptor between different tor daemon instances, by simply running these instances in parallel.
  2. Including introduction points from different tor daemon instances in the same descriptor, by splitting the publisher process from the backend instances.

Running multiple instances in parallel

This is the simpler approach: it consists of running multiple tor daemon instances in parallel on different servers (or on multiple CPUs of the same server, with limited effectiveness, as discussed in the topologies document).
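
A minimal sketch of this setup, assuming a local web backend on port 8080 and illustrative paths: each server carries an identical torrc fragment and a copy of the same service key directory, and whichever instance published its descriptor last is the one clients will use.

    # /etc/tor/torrc fragment, identical on every server
    HiddenServiceDir /var/lib/tor/my-service/
    HiddenServicePort 80 127.0.0.1:8080

    # copy the same key material to the other servers, for example:
    # rsync -a /var/lib/tor/my-service/ serverB:/var/lib/tor/my-service/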

Advantages:

  • Simpler to set up and operate, since it only involves running plain tor daemon instances.

Disadvantages:

  • Every tor server needs to have a copy of the .onion private key, so if one server is compromised then your service is compromised.
  • This is not "full" load balancing; it acts mostly as a simple failover, and may depend on the timing at which you start each tor daemon, plus a random internal timer on each tor instance, to ensure they (re)publish their descriptors at different times1.
  • The descriptor re-publishing interval on each instance is rather unpredictable, since it depends on a random interval timer (specified as between 60 and 120 minutes) and on any event that requires the descriptor to be republished, such as when Proof of Work is active.

With Onionspray, you can

  • Use hardmap to configure services.
  • And then copy/sync the whole project folder, with all configurations, keys, certificates, etc. to another machine, and run all instances in parallel, as sketched below.
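
A rough sketch of that workflow, with hypothetical project name, domain, paths and host names; the config keywords and commands are assumptions based on typical Onionspray usage and should be checked against the Onionspray documentation for your version:

    # demo.conf -- hypothetical project using hardmap
    set project demo
    hardmap %NEW_V3_ONION% example.com

    ./onionspray config demo.conf   # generate keys, tor and web proxy configuration
    ./onionspray start demo

    # sync the whole project folder (configs, keys, certificates) to a second machine,
    # then start the same project there as well
    rsync -a ~/onionspray/ serverB:~/onionspray/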

Splitting the publisher from the backends

Right now this is achieved with Onionbalance, a tool that combines introduction points from multiple backends into a single "superdescriptor" and publishes it in the Tor network, thereby providing load balancing and redundancy by distributing requests across multiple backend tor instances.
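
For reference, a minimal Onionbalance (v3) configuration looks roughly like the sketch below; the key path and backend addresses are placeholders, and the exact options should be checked against the Onionbalance documentation for your version:

    # config.yaml: the frontend's long-term key plus the backend instances to combine
    services:
    - key: master.key
      instances:
      - address: backendaddressone.onion
      - address: backendaddresstwo.onion

    # run the publisher against that configuration
    onionbalance -c config.yaml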

Advantages:

  • Fully implements load balancing/failover.
  • Provides better isolation of the main .onion keys, reducing the attack surface.

Disadvantages:

  • Not currently a good fit if you plan to deploy Proof of Work (PoW), as discussed below.

With Onionspray, you can

  • Use softmap to configure services to be used with Onionbalance, as in the sketch below.
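
A softmap-based project config might look like the following sketch (hypothetical names again; the keyword mirrors the hardmap example above, so check the Onionspray documentation for the details of your version):

    # demo.conf -- softmap variant: Onionspray sets up Onionbalance for the frontend
    set project demo
    softmap %NEW_V3_ONION% example.com

    ./onionspray config demo.conf
    ./onionspray start demo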

Combining both methods

You can combine both approaches in hybrid setups, for example:

  • Using softmap so your services rely on Onionbalance.
  • Replicating the whole project folder to other servers, and running Onionspray (and hence Onionbalance) in parallel, as sketched below.
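
As a sketch (hypothetical host and project names), the same softmap-based project folder replicated to two machines gives you two Onionbalance publishers and two sets of backends for the same frontend address:

    # the same softmap-based project folder is present on both machines
    ./onionspray start demo                                    # on server A
    ssh serverB 'cd ~/onionspray && ./onionspray start demo'   # start it on server B too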

The number of possible arrangements can easily get very large; the topologies page shows some examples.

Which one to choose?

It's hard to tell what's best for every scenario. Onionbalance is usually the best candidate, unless you plan to deploy Proof of Work (PoW).

If you prefer PoW over the other advantages offered by Onionbalance, you may want to start with the simpler method instead.
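
For reference, with the parallel-instances method PoW is enabled per tor instance in its torrc; this assumes a tor release built with PoW support (roughly 0.4.8 or later), and the option below should be double-checked against your tor version's manual:

    HiddenServiceDir /var/lib/tor/my-service/
    HiddenServicePort 80 127.0.0.1:8080
    HiddenServicePoWDefaultsEnabled 1   # turn on the Proof of Work defense with default parameters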

You can also switch anytime from one approach to the other, without disrupting the service.

As for the number of nodes, that will depend mostly on load/requests.

So our recommendation is to try what fits your situation, and report back to us on what worked best.


  1. See the discussion in the "Periodically republish the descriptor" issue.