Competing Resources in Deadline
Version: Deadline 8.0
A common problem with render farms is resource contention. Naturally, render jobs compete with each other for the render nodes and Deadline has a Job Scheduling System to handle this type of resource contention. However, that’s not the focus of this blog entry. Instead, we’ll be focusing on the resources that jobs compete for after they have been selected by the render nodes.
For example, it’s not uncommon for a render farm to have more render nodes than floating licenses for a specific rendering application or plugin. In this situation, contention for floating licenses can lead to license errors when those floating licenses have been exhausted. These license errors can lead to wasted render time and reduced productivity, so obviously we want a solution to limit access to this resource.
Another example is shared data on a file server. Many jobs could require access to the same data, like textures, particle caches, etc, which normally isn’t an issue (since they’re on a file server). However, perhaps the file server isn’t capable of serving that data to a large number of render nodes simultaneously, or perhaps the network doesn’t have the bandwidth to transfer that much data at once. This situation can lead to file server or network instability, which can lead to increased render times or errors. This is another resource that we would like to limit access to.
The good news is that Deadline already has a Limit System that allows you to limit access to arbitrary resources that can be shared by multiple jobs!
HOW LIMITS WORK
Here’s a general overview of how Limits work:
Limits in Deadline have a maximum value that you define.
Once a Limit has been created, it can then be assigned to render jobs.
When a Deadline Worker dequeues a job, it checks out a “Stub” for each Limit that the job requires.
The number of “Stubs” currently checked out for a Limit is that Limit’s “In Use” count.
When a Deadline Worker finishes the job, it checks the “Stubs” back in for the Limits it no longer requires.
The key thing is that if a Deadline Worker tries to check out a “Stub” for a Limit that is already maxed out, the check out will fail, and that will prevent the Worker from dequeuing that job. The Worker then simply moves on to the next job, but what’s important is that the Limit prevented the number of Workers from exceeding the Limit’s maximum value.
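The check-out and check-in behavior described above can be sketched as a small simulation. This is only a toy model, not Deadline's implementation: the `Limit` class, the `try_dequeue` function, and the node names are hypothetical, and the all-or-nothing rollback when a job requires several Limits is an assumption made for the sketch.

```python
class Limit:
    """Toy model of a Limit: a maximum value plus the set of Stub holders."""

    def __init__(self, name, maximum):
        self.name = name
        self.maximum = maximum
        self.holders = set()  # Workers that currently hold a Stub

    @property
    def in_use(self):
        # The "In Use" count is the number of Stubs checked out.
        return len(self.holders)

    def check_out(self, worker):
        """Return True if a Stub was checked out, False if the Limit is maxed out."""
        if self.in_use >= self.maximum:
            return False
        self.holders.add(worker)
        return True

    def check_in(self, worker):
        """Return the Worker's Stub (does nothing if it holds none)."""
        self.holders.discard(worker)


def try_dequeue(worker, required_limits):
    """Check out a Stub for every required Limit, or none at all (assumption)."""
    acquired = []
    for limit in required_limits:
        if not limit.check_out(worker):
            for held in acquired:
                held.check_in(worker)  # roll back Stubs taken so far
            return False
        acquired.append(limit)
    return True


nuke = Limit("nuke", maximum=10)
workers = [f"node-{n:02d}" for n in range(1, 21)]  # 20 render nodes

# Every node tries to pick up a Nuke job; only 10 succeed.
started = [w for w in workers if try_dequeue(w, [nuke])]
print(len(started), nuke.in_use)  # 10 10

# Once one Worker finishes and checks in, another can start.
nuke.check_in("node-01")
print(try_dequeue("node-11", [nuke]))  # True
```

The Workers that fail to check out a Stub are not blocked; in this model they would simply move on and try the next job in the queue.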
Before we start, here are the details of the render farm that we will refer to throughout this blog entry:
20 render nodes
20 floating licenses of Maya
10 floating licenses of Nuke
Maya jobs use a shared data cache on the file server
To follow along with this blog entry, you should have the Deadline Monitor open, and you should add a Limit panel to your Monitor layout. You can open a new Limit panel by selecting View -> New Panel -> Limits, and you can either leave the panel floating or dock it to the Monitor.
Unless you’ve created Limits in the past, this panel will be empty. In addition, this blog entry assumes you are in Super User Mode, which is required to access the necessary administrator options. To enter Super User Mode, select Tools -> Super User Mode, and enter your Super User password (if there is one).
Let’s start with floating licenses. Since we have 20 floating licenses of Maya and 20 render nodes, we don’t have to worry about running out of those licenses. However, we only have 10 licenses of Nuke, so we’ll need a Limit to ensure that no more than 10 render nodes are processing Nuke jobs at the same time.
CREATING THE NUKE LIMIT
To create this Limit, right-click anywhere in the list in the Limit panel and select New Limit. Alternatively, you can press the [+] button at the top of the Limit panel.
This will bring up the Create New Limit dialog. Now let’s set the following settings:
Name: Since this Limit is for Nuke, let’s simply set the name to “nuke”.
Usage Level: Nuke itself only consumes one license per machine, regardless of how many instances are running on that machine. Because of this, we’ll set the Usage Level to Machine. If Nuke were licensed per instance, we would choose Task instead.
Limit: This is the maximum value for the Limit. Since we have 10 Nuke licenses, we’ll set this to 10.
You can now press OK to create the new Limit.
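To make the difference between the Machine and Task Usage Levels concrete, here is a minimal sketch of how the “In Use” count would differ for the same set of running tasks. The `stubs_in_use` function and the machine names are hypothetical, and only the two Usage Levels discussed above are modeled.

```python
def stubs_in_use(usage_level, running_tasks):
    """Count Stubs for a list of (machine, task_id) pairs.

    "Machine": one Stub per machine, however many tasks it runs.
    "Task":    one Stub per running task.
    """
    if usage_level == "Machine":
        return len({machine for machine, _task in running_tasks})
    if usage_level == "Task":
        return len(running_tasks)
    raise ValueError(f"unsupported usage level: {usage_level}")


# node-01 runs two concurrent tasks, node-02 runs one.
running = [("node-01", 0), ("node-01", 1), ("node-02", 2)]

print(stubs_in_use("Machine", running))  # 2
print(stubs_in_use("Task", running))     # 3
```

With a per-machine license like Nuke's, the Machine count matches the number of licenses actually consumed, which is why it is the right choice here.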
ASSIGNING THE NUKE LIMIT
Creating the Nuke Limit only gets us halfway there. The next step is to ensure that Nuke jobs have the Nuke Limit assigned to them when they are submitted to Deadline. If a Nuke job doesn’t have the Nuke Limit assigned to it, Deadline can’t track the license resources that the job requires, which can result in licensing errors.
One way to assign the Nuke Limit is to specify it in the Nuke submitter. Simply click on the Browse button next to the Limits field and choose the Nuke Limit (note that all submitters that ship with Deadline support the ability to specify Limits).
In our case, we want every Nuke job to be assigned the Nuke Limit and we don’t want the Nuke artists to have to remember to set the Nuke Limit every time they submit a job. The good news is that Deadline allows us to specify Limits at the Render Plugin level, which means we can ensure that any job that uses Nuke will automatically use the Nuke Limit. To do this, select Tools -> Configure Plugins in the Monitor, and select Nuke from the list on the left.
Press the Set Limits button at the bottom of the Nuke plugin settings, and then drag and drop “nuke” to the Selected list. Press OK to set the Limits, and then press OK again to save the Nuke plugin configuration. Now every Nuke job will automatically use the Nuke Limit!
Note that the Nuke Limit isn’t actually assigned to the job, so it won’t show up in the Limits column in the Job Panel, and it won’t show up in the job’s Limit list when viewing its properties. Instead, Deadline detects at render time that the job’s Render Plugin requires the Nuke Limit. It works this way so that if the Nuke plugin’s Limit list changes, you don’t have to change the properties of every Nuke job that was previously submitted.
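One way to picture the render-time lookup is as a union of the job's own Limits and the Limits configured on its Render Plugin. The sketch below is purely illustrative: the dictionaries, the `effective_limits` function, the job name, and the extra “nukex” entry are all hypothetical and do not reflect how Deadline stores this data.

```python
def effective_limits(job, plugin_limits):
    """Combine job-level Limits with the Limits set on the job's plugin."""
    job_level = set(job["limits"])
    plugin_level = set(plugin_limits.get(job["plugin"], []))
    return sorted(job_level | plugin_level)


# Plugin-level configuration, as set through Configure Plugins.
plugin_limits = {"Nuke": ["nuke"]}

# A job submitted earlier, with no Limits of its own.
old_job = {"name": "comp_v001", "plugin": "Nuke", "limits": []}

print(effective_limits(old_job, plugin_limits))  # ['nuke']

# Changing the plugin's Limit list later affects the old job too,
# without touching the job's own properties.
plugin_limits["Nuke"].append("nukex")
print(effective_limits(old_job, plugin_limits))  # ['nuke', 'nukex']
print(old_job["limits"])                         # []
```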
NODE LOCKED LICENSES
While our example farm only consists of floating Nuke licenses, it’s possible for a render farm to have both floating and node locked licenses in place. For example, the render nodes themselves might share floating licenses, but a few artists could have node locked licenses, and they want their machines to join the render farm when they’re not using them.
The key thing is that we don’t want these workstations to count against the Nuke Limit’s maximum value. This is because they are using node locked licenses and won’t use up any of the floating licenses. This can be achieved by adding the workstations to the Nuke Limit’s Excluded List. Simply right-click on the Nuke Limit in the Limit panel and select Modify Limit Properties. In this example, all the “mobile” machines are workstations with node locked licenses, so they’ve been added to the Excluded List.
When a machine is in the Excluded List, it can still pick up jobs that require the Nuke Limit, but it won’t check out a “Stub”, and therefore won’t count against the Limit’s maximum value.
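The effect of the Excluded List can be sketched as follows. This is a simplified model under stated assumptions: the `try_start` function and the machine names are hypothetical, and the farm has been extended to 11 nodes so that one of them hits the Limit.

```python
def try_start(machine, holders, maximum, excluded):
    """Return True if the machine may start a job that requires the Limit."""
    if machine in excluded:
        return True  # renders without checking out a Stub
    if len(holders) >= maximum:
        return False  # Limit is maxed out
    holders.add(machine)
    return True


holders = set()                               # machines holding a Stub
excluded = {"mobile-01", "mobile-02"}         # node locked workstations
farm = [f"node-{n:02d}" for n in range(1, 12)]  # 11 farm nodes

results = {m: try_start(m, holders, 10, excluded) for m in farm + ["mobile-01"]}

print(len(holders))          # 10
print(results["node-11"])    # False
print(results["mobile-01"])  # True
```

The eleventh farm node is turned away because all 10 floating licenses are accounted for, while the workstation still renders because it never touches the “In Use” count.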
LIMITING DATA ACCESS
Now that we have a Nuke Limit managing our limited license resources, let’s handle the data access problem. After running numerous tests, we’ve confirmed that our network starts having issues if more than five Maya jobs start rendering at the same time. When six or more machines try to pull the data cache at the same time, the network slows down considerably, which can cause data access errors.
The initial solution is to create a Maya Limit the same way we created the Nuke Limit, but this time set the maximum value to five. We’ll set the Usage Level to Task this time, just in case any Maya jobs get submitted with Concurrent Tasks greater than one. If there are multiple instances of Maya on one machine, we want each one to count against the Limit’s maximum value.
We also want this Maya Limit to apply to all Maya jobs, so we’ll edit the MayaCmd and MayaBatch plugins like we did for the Nuke plugin above, except that we’ll add “maya” to the Selected list.
However, after rendering some jobs, it becomes clear that the farm isn’t being fully utilized for Maya jobs, because no more than five Maya jobs can render at the same time. Yes, this is saving the network from being overwhelmed, but it turns out the data cache is only accessed when initially loading the Maya scene file. So if we could somehow only apply the Maya Limit to the start of the job (when the scene data is being loaded from the server), that would allow more than five Maya jobs to render at the same time!
RELEASING LIMIT STUBS EARLY
When creating the Limits, you may have noticed a property called Release at Task Progress, followed by a percentage value. When enabled, a Deadline Worker will check in the Limit’s “Stub” when the rendering progress reaches the specified value. In the case of scene loading, the rendering progress is still at 0% during and after the scene is loaded, so we can’t use a value of 0%. But if we set the progress value to 1%, we can be confident that the Limit “Stub” will be returned after the scene has finished loading.
When a render reaches 1%, the Worker checks in its “Stub” so that another Worker can start a Maya job. This allows more than five machines to work on Maya jobs at the same time, while ensuring that no more than five machines are accessing the data cache.
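The benefit of releasing Stubs early can be illustrated with a small time-step simulation. It is a rough sketch only: the `simulate` function is hypothetical, every render is assumed to advance by exactly 1% per step, and a `release_at` of 100 stands in for the default behavior of returning the Stub when the task finishes.

```python
def simulate(num_workers, steps, release_at, maximum=5):
    """Return (peak Stub holders, peak rendering Workers) over the run."""
    waiting = [f"node-{n:02d}" for n in range(1, num_workers + 1)]
    progress = {}    # Worker -> render progress in percent
    holders = set()  # Workers currently holding a Stub
    peak_holders = peak_rendering = 0

    for _ in range(steps):
        # Advance every running render by 1% and release Stubs when due.
        for worker in progress:
            progress[worker] = min(100, progress[worker] + 1)
            if progress[worker] >= release_at:
                holders.discard(worker)

        # Waiting Workers start jobs while Stubs are available.
        while waiting and len(holders) < maximum:
            worker = waiting.pop(0)
            progress[worker] = 0
            holders.add(worker)

        rendering = sum(1 for pct in progress.values() if pct < 100)
        peak_holders = max(peak_holders, len(holders))
        peak_rendering = max(peak_rendering, rendering)

    return peak_holders, peak_rendering


print(simulate(20, 10, release_at=1))    # (5, 20)
print(simulate(20, 10, release_at=100))  # (5, 5)
```

In both runs the number of Stub holders never exceeds five, but with the 1% release all 20 nodes end up rendering, compared with only five when the Stub is held for the whole task.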
WHAT ELSE CAN WE DO WITH LIMITS?
While we’ve covered how Limits are primarily used, there are other things we can do with Limits.
3RD PARTY USAGE BASED LICENSING
Usage Based Licensing was introduced in Deadline 8.0. It’s a new on-demand licensing model that can be used as an alternative to traditional floating licenses, or as supplemental licensing to cover temporary increases in render nodes (cloud burst compute, rentals, artist machines overnight, etc).
Not only is Usage Based Licensing available to license Deadline, it can also be used to license select 3rd party products when using Deadline to render them. Limits play a key role in managing the 3rd party Usage Based Licensing system, and this is where the Limit Overage and Use Usage Based Third Party Licensing options come into play.
3rd party Usage Based Licensing was covered in a previous blog entry, but the information in that entry is a little outdated, so we recommend reading the Usage Based Licensing documentation for more information.
WHITELISTING OR BLACKLISTING RENDER NODES
There might be cases where we want to prevent certain machines from rendering jobs that require specific Limits. For example, the “support-01” render node on the farm doesn’t have a lot of RAM, and often struggles to render Nuke jobs. So to avoid potential problems, I want to prevent it from rendering Nuke jobs.
This can be done by editing the Nuke Limit and adding “support-01” to the list of Blacklisted Machines.
Alternatively, I could add every machine except for “support-01” to the Limit’s Whitelist. One reason for using a whitelist instead of a blacklist is that any new machines that are added to the farm won’t be able to pick up Nuke jobs by default. Once the machine is properly configured to render Nuke jobs, it can then be added to the Nuke Limit’s Whitelist.
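The practical difference between the two list types shows up when a new machine joins the farm. Here is a minimal sketch, assuming a hypothetical `can_render` helper and a hypothetical new machine named “node-21”:

```python
def can_render(machine, mode, listed):
    """Decide whether a machine may pick up jobs that require the Limit."""
    if mode == "blacklist":
        return machine not in listed
    if mode == "whitelist":
        return machine in listed
    raise ValueError(f"unknown list mode: {mode}")


farm = ["node-01", "node-02", "support-01"]
blacklist = {"support-01"}
whitelist = {"node-01", "node-02"}

# Both configurations keep support-01 away from Nuke jobs today.
print([m for m in farm if can_render(m, "blacklist", blacklist)])  # ['node-01', 'node-02']
print([m for m in farm if can_render(m, "whitelist", whitelist)])  # ['node-01', 'node-02']

# A machine added to the farm later is treated differently.
new_machine = "node-21"
print(can_render(new_machine, "blacklist", blacklist))  # True
print(can_render(new_machine, "whitelist", whitelist))  # False
```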
It is possible to use an existing Limit as the base for a new Limit. For example, let’s say I’ve purchased 5 floating NukeX licenses, as well as node locked NukeX licenses for artist workstations. I need to create a new NukeX Limit, but in this case, the only difference between it and the existing Nuke Limit is that the maximum value is 5 instead of 10.
So instead of creating the NukeX Limit from scratch, I can right-click on the existing Nuke Limit in the Limit Panel and select Clone Limit.
This will set the new Limit’s initial settings to match the existing Nuke Limit. All I have to do is change the Name to “nukex” and change the Limit value to 5. After pressing OK, I now have my new NukeX Limit!
A common requirement for render farms is to submit a job to a specific group of render nodes. Deadline has a Group feature designed specifically for this, and Groups are often used to organize render nodes based on software or hardware requirements. For example, if only certain render nodes have Maya installed, I can create a Maya Group and submit all Maya jobs to it. This ensures that my Maya jobs only render on machines that can render successfully.
However, this gets a bit more complex if I have machines with different amounts of cores and RAM. For example, let’s say that my Maya machines have either 4 or 8 cores, and 8, 16, or 32 GB of RAM. If I want a Group for each hardware and software combination, I end up with 6 Groups (2 core counts times 3 RAM sizes).
That’s not too bad to manage, but if I do the same for Nuke, I’m now up to 12 Groups! Not only does that present the user with 12 Groups to choose from during submission, but the system administrator will have to place the correct render nodes in each Group as well. A Maya machine with 8 cores and 32 GB of RAM could be placed in all 6 Maya Groups, since it meets the requirements of each.
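The Group counts above follow directly from multiplying out the options, which a few lines of Python can confirm. The Group naming scheme used here (for example “maya_8_cores_32_gb”) is made up for illustration, as are the two helper functions.

```python
from itertools import product

CORES = [4, 8]
RAM_GB = [8, 16, 32]


def group_names(apps):
    """One Group per application, core count, and RAM size combination."""
    return [f"{app}_{c}_cores_{r}_gb" for app, c, r in product(apps, CORES, RAM_GB)]


def groups_for_machine(app, cores, ram_gb):
    """Groups whose requirements a machine with this hardware satisfies."""
    return [
        f"{app}_{c}_cores_{r}_gb"
        for c, r in product(CORES, RAM_GB)
        if cores >= c and ram_gb >= r
    ]


print(len(group_names(["maya"])))              # 6
print(len(group_names(["maya", "nuke"])))      # 12
print(len(groups_for_machine("maya", 8, 32)))  # 6
print(groups_for_machine("maya", 4, 16))       # ['maya_4_cores_8_gb', 'maya_4_cores_16_gb']
```

Every additional application multiplies the number of Groups again, and each machine has to be placed in every Group it qualifies for.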
An alternative to using Groups in this situation is to use Limits. In this case, a Limit could be created for each resource: one for the software (“maya”), one for each core count (“4_cores” and “8_cores”), and one for each amount of RAM (“8_gb”, “16_gb”, and “32_gb”).
Within each Limit, the Whitelist would be set to only include render nodes that meet the resource requirement. For example, all machines with 4 or 8 cores would be whitelisted for the “4_cores” Limit, but only 8 core machines would be whitelisted for the “8_cores” Limit.
When the artist goes to submit the job, they simply need to select the Limits to match the resources they need. For example, they could select “maya”, “8_cores”, and “32_gb” for heavier Maya renders. For lighter renders, maybe they choose “maya”, “4_cores”, and “8_gb”.
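Used this way, each Limit's Whitelist acts like a tag, and the machines that can pick up a job are the ones that appear in every selected Limit's Whitelist. The sketch below models that intersection; the machine inventory, the `build_whitelists` and `eligible_machines` helpers, and the “16_gb” Limit name are assumptions made for the example.

```python
# Hypothetical inventory of three Maya render nodes.
MACHINES = {
    "node-01": {"apps": {"maya"}, "cores": 4, "ram_gb": 8},
    "node-02": {"apps": {"maya"}, "cores": 8, "ram_gb": 16},
    "node-03": {"apps": {"maya"}, "cores": 8, "ram_gb": 32},
}


def build_whitelists(machines):
    """Whitelist each machine for every resource Limit it satisfies."""
    whitelists = {name: set() for name in
                  ("maya", "4_cores", "8_cores", "8_gb", "16_gb", "32_gb")}
    for name, spec in machines.items():
        if "maya" in spec["apps"]:
            whitelists["maya"].add(name)
        for cores in (4, 8):
            if spec["cores"] >= cores:
                whitelists[f"{cores}_cores"].add(name)
        for ram in (8, 16, 32):
            if spec["ram_gb"] >= ram:
                whitelists[f"{ram}_gb"].add(name)
    return whitelists


def eligible_machines(job_limits, whitelists):
    """Machines whitelisted for every Limit the job requires (non-empty list)."""
    return sorted(set.intersection(*(whitelists[limit] for limit in job_limits)))


whitelists = build_whitelists(MACHINES)

print(eligible_machines(["maya", "8_cores", "32_gb"], whitelists))  # ['node-03']
print(eligible_machines(["maya", "4_cores", "8_gb"], whitelists))   # ['node-01', 'node-02', 'node-03']
print(len(whitelists))                                              # 6
```

Six Limits cover Maya here, and supporting another application would add one more Limit rather than six more Groups.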
While this still takes some initial effort to set up, it’s much easier to maintain going forward. For example, if I were to add Nuke, I would just have to create a Nuke Limit and whitelist the appropriate machines. As mentioned above, I would have to make 6 new Nuke Groups if I were using Groups.
Now I know what you’re thinking. Wouldn’t it be easier to make Resource Tagging a first class feature in Deadline, instead of having to use Limits for this? The answer is yes, and is something we will be considering for a future version of Deadline!
Limits can play a vital role in ensuring that your render farm is being used efficiently, and as shown in this blog entry, they can do more than simply manage floating licenses. For additional reading, check out the Limit Documentation and the Job Scheduling Documentation.