Regulating Network Traffic with Worker Throttling

Version: Deadline 7.2

OVERVIEW

One of the things we know and love about Deadline is that once a job is submitted, render nodes quickly become aware of it and can begin working immediately. A caveat, however, is that every node needs access to the job’s auxiliary file(s) (e.g. a Maya or 3ds Max scene file) before it can start rendering. As a result, the network can become congested as render nodes all try to pull down the resources they need at the same time. This is especially noticeable for jobs with large auxiliary files.

Enter Worker Throttling. This is a feature which, when enabled, regulates network traffic by imposing a limit on how many Workers are allowed to copy resources to their local caches at once. In this post, I’ll briefly explain how Worker Throttling works, and how it can be configured in Deadline for your render farm.

HOW IT WORKS: THE THROTTLE QUEUE

The purpose of Worker Throttling is to ensure that, at any given time, no more than a certain number of Workers are copying files from the network. The maximum number of Workers allowed to copy files concurrently is called the Throttle Limit.

The regular workflow for a Worker is to pick up a job, copy the resources it needs to render the job, and then begin rendering tasks from that job. When Worker Throttling is enabled, Workers must first enter a Throttle Queue, where they wait their turn before they are allowed to copy files. Workers can only leave the Throttle Queue if fewer Workers than the Throttle Limit are currently copying files.

Note that Worker Throttling only applies when a Worker first picks up a job. Once a Worker has copied the necessary files for a job to its local cache, it can continue to render tasks from that job without reentering the Throttle Queue.
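To make the queueing behaviour concrete, here is a minimal in-memory sketch of the idea. It is not Deadline's implementation: the ThrottleQueue class, its method names, and the strict first-come, first-served ordering are all assumptions made for illustration.

```python
from collections import deque


class ThrottleQueue:
    """Toy model of a throttle queue (hypothetical, not Deadline's code)."""

    def __init__(self, throttle_limit):
        self.throttle_limit = throttle_limit
        self.waiting = deque()  # Workers waiting their turn, oldest first
        self.copying = set()    # Workers currently copying files

    def enqueue(self, worker):
        """A Worker reports that it is ready to copy files."""
        if worker not in self.copying and worker not in self.waiting:
            self.waiting.append(worker)

    def try_start_copy(self, worker):
        """Return True if the Worker may start (or is already) copying."""
        if worker in self.copying:
            return True
        # Assumption: only the Worker at the front of the queue may leave it.
        is_next = bool(self.waiting) and self.waiting[0] == worker
        if is_next and len(self.copying) < self.throttle_limit:
            self.waiting.popleft()
            self.copying.add(worker)
            return True
        return False

    def finish_copy(self, worker):
        """A Worker reports that it has finished copying files."""
        self.copying.discard(worker)


queue = ThrottleQueue(throttle_limit=2)
names = ("worker-a", "worker-b", "worker-c")
for name in names:
    queue.enqueue(name)

started = [queue.try_start_copy(name) for name in names]
print(started)                             # [True, True, False]

queue.finish_copy("worker-a")              # a slot frees up
print(queue.try_start_copy("worker-c"))    # True
```

With a Throttle Limit of 2, the third Worker stays in the queue until one of the first two reports that it has finished copying.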

MORE DETAILS AND A “WHAT IF?”

Each Worker is responsible for reporting to the Throttle Queue when it is ready to copy files, when it is copying, and when it has finished copying files. So, what if after a Worker has started copying files, it suddenly bursts into flames and becomes a small pile of ashes? A rational person’s first thought might be “Is everything okay?” by which they of course mean “The Worker never had a chance to report that it was finished. Will it hold up the Throttle Queue forever?”

The answer is a reassuring no: it won’t prevent other Workers from copying files in its place. Each Worker in the queue maintains a “heartbeat”, which is a signal to the Throttle Queue that everything is going okay. If a Worker hasn’t updated its heartbeat within a configurable timeout, it is assumed that the worst has happened, and it is purged from the queue.
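The heartbeat check can be sketched in a few lines. This is only an illustration under assumptions: HeartbeatTracker and its methods are hypothetical names, and timestamps are passed in explicitly as plain seconds so the example is deterministic.

```python
class HeartbeatTracker:
    """Tracks the last heartbeat per Worker (hypothetical helper)."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # worker name -> time of last heartbeat, in seconds

    def beat(self, worker, now):
        """Record a heartbeat from a Worker at time `now`."""
        self.last_seen[worker] = now

    def purge_stale(self, now):
        """Remove and return Workers whose heartbeat is older than the timeout."""
        stale = [
            worker
            for worker, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        ]
        for worker in stale:
            del self.last_seen[worker]
        return stale


tracker = HeartbeatTracker(timeout_seconds=60)
tracker.beat("worker-a", now=0)
tracker.beat("worker-b", now=0)
tracker.beat("worker-a", now=50)      # worker-a is still alive
print(tracker.purge_stale(now=70))    # ['worker-b']
```

Here worker-b stopped reporting after time 0, so by time 70 it has exceeded the 60 second timeout and is dropped, while worker-a keeps its place.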

SETTING UP WORKER THROTTLING

In Deadline 7.2, the Worker Throttling options can be found in Configure Repository Options by selecting the Pulse Settings option and switching to the Throttling tab. In order to use Worker Throttling in Deadline 7.2, Deadline Pulse must be running and Workers must be able to connect to it.

In Deadline 8.0 and beyond, Deadline Pulse is no longer necessary for Worker Throttling, and the options for it can be found in Worker Settings, rather than Pulse Settings.

After enabling Worker Throttling via the checkbox, three configurable options become available. The first is the Throttle Limit, which is the maximum number of Workers that can concurrently copy files.

The second controls how often a Worker checks the Throttle Queue to see if it can start copying files. In the default configuration, Workers wait 20 seconds between checks.
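From the Worker's side, this amounts to a simple polling loop. The sketch below is an assumption about the general shape of such a loop, not Deadline's code: wait_for_turn, can_start_copying, and max_polls are hypothetical names, and the sleep function is injectable so the example runs instantly.

```python
import time


def wait_for_turn(can_start_copying, interval_seconds=20,
                  sleep=time.sleep, max_polls=None):
    """Poll until allowed to copy; return the number of checks made."""
    polls = 0
    while True:
        polls += 1
        if can_start_copying():
            return polls
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError("gave up waiting for a turn")
        # Wait one interval before asking again (20 seconds by default).
        sleep(interval_seconds)


# The queue answers "not yet" twice, then "go ahead".
answers = iter([False, False, True])
slept = []  # record the requested sleeps instead of actually waiting
polls = wait_for_turn(lambda: next(answers), sleep=slept.append)
print(polls, slept)  # 3 [20, 20]
```

A shorter interval lets Workers notice a free slot sooner at the cost of more frequent checks, which is the trade-off this setting exposes.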

The last option controls how long a Worker can go without reporting in before it is assumed to have gone offline, at which point it loses its position in the Throttle Queue. This is a multiplier that is applied to the update interval. For instance, in the default configuration, a Worker that has not updated its heartbeat in the last 3 × 20 = 60 seconds is removed from the Throttle Queue.
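The arithmetic is simple enough to write down. In this small sketch the function and parameter names are hypothetical; only the default values (a 20 second interval and a multiplier of 3) come from the defaults described in this post, and whether a Worker at exactly the timeout counts as stale is an assumption.

```python
def stale_timeout_seconds(update_interval_seconds=20, stale_multiplier=3):
    """Seconds without a heartbeat before a Worker is considered offline."""
    return update_interval_seconds * stale_multiplier


def is_stale(seconds_since_heartbeat, update_interval_seconds=20,
             stale_multiplier=3):
    """Assumption: a Worker is stale only once it is strictly past the timeout."""
    timeout = stale_timeout_seconds(update_interval_seconds, stale_multiplier)
    return seconds_since_heartbeat > timeout


print(stale_timeout_seconds())        # 60
print(is_stale(45))                   # False
print(is_stale(61))                   # True
print(stale_timeout_seconds(10, 3))   # 30
```

Note that lowering the update interval also shortens the timeout unless the multiplier is raised to compensate.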

WRAPUP

And that’s all there is to it. If you’ve noticed your network getting bogged down when render nodes chaotically try to copy files all at once, give Worker Throttling a try.