Job Scheduling in Deadline
Version: Deadline 9.0
Have you ever wondered why someone else's Job was rendering before one of yours? This is a common question for Deadline newcomers and veterans alike and the answer is not always straightforward, so let's take a closer look at how the Deadline Workers make their decisions.
There are two dimensions that are considered by the Worker when deciding which Job to start working on. The first is the notion of whether or not it can render a Job; this is largely just a set of 'yes' or 'no' questions that the Worker asks. Examples of questions asked when dequeuing a Job would be things like "Am I on this Job's Whitelist?", "Do I belong to this Job's Group?", or "Can this Worker acquire a Stub for this Limit?". If we were to think of the Job Queue as a list view, this would be akin to the Worker applying a Filter to the list, in order to reduce the number of Jobs there are to consider.
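To make the 'Filter' idea a bit more concrete, here is a minimal sketch of that first dimension. The `Job` and `Worker` classes, their fields, and the `can_render` helper are all hypothetical names invented for illustration; this is not Deadline's actual data model or API, and the real Worker asks more questions than these three.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    group: str
    # Assumption for this sketch: an empty whitelist means "any Worker may render".
    whitelist: set = field(default_factory=set)
    # Hypothetical stand-in for "can a Limit stub still be acquired?".
    limit_stubs_free: int = 1


@dataclass
class Worker:
    name: str
    groups: set


def can_render(worker: Worker, job: Job) -> bool:
    """Each check is a plain yes/no question; a single 'no' rules the Job out."""
    on_whitelist = not job.whitelist or worker.name in job.whitelist
    in_group = job.group in worker.groups
    stub_available = job.limit_stubs_free > 0
    return on_whitelist and in_group and stub_available


def eligible_jobs(worker: Worker, queue: list) -> list:
    """Apply the filter; ordering of the survivors is a separate step."""
    return [job for job in queue if can_render(worker, job)]


worker = Worker("w1", {"gpu"})
queue = [
    Job("a", "gpu"),
    Job("b", "cpu"),                      # wrong Group
    Job("c", "gpu", whitelist={"w2"}),    # w1 is not on the Whitelist
    Job("d", "gpu", limit_stubs_free=0),  # no Limit stub available
]
eligible_names = [job.name for job in eligible_jobs(worker, queue)]
```

Only Job "a" survives the filter for this Worker; the other three each fail exactly one of the questions.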
The second dimension is the notion of Scheduling Order, which effectively tells Workers the order in which they should attempt to work on Jobs. Continuing our list view metaphor, this would be similar to sorting the list based on one or more columns. Unexpected results in this dimension are where things get a bit more confusing, especially if there are several 'columns' involved. Given that this is the more interesting and variable of the two dimensions, let's explore it a bit further!
JOB SCHEDULING ORDER
We'll start our journey of exploring Deadline's Job Scheduling Order way down in the depths of the Configure Repository Options dialog. Under the Job Settings section, you can find the settings which we will spend the rest of this post exploring, the Scheduling Order settings:
The 'Job Scheduling Order' combo box depicted above controls how Jobs in the farm are ordered by Workers at dequeue time. While the contents of this combo box may seem intimidating at first, it really isn't too bad.
The first thing to know is that it is simply a comma-separated list of sort criteria for the Jobs. The leftmost of these is the primary criterion, and will generally be the determining factor for Job ordering. Any subsequent criteria in the list specify secondary (and tertiary, etc) sorting in the event that two Jobs yield the same value in the primary (or secondary, etc) criteria. In this way, anything beyond the primary is essentially just used as a tie-breaker for the previous mechanisms. The frequency at which these ties occur is very dependent on the actual mechanism, number of jobs/Workers, and general farm setup; secondary mechanisms may or may not end up being very relevant to overall ordering.
Now that we know the elements in the Scheduling Order combo box are just various permutations of different ordering mechanisms, all that's left is to go into those in a bit more detail, so let's jump right in!
The simplest of all the mechanisms, 'First-In First-Out' orders Jobs by Submit Date/Time in ascending order (older Jobs appearing first). If this is the only thing determining the Scheduling Order, what this achieves is a pure FIFO queue in which Jobs are rendered in the order they were submitted. On its own, this may not seem very helpful, but it can definitely be useful if you've got large compute loads and are really only interested in overall throughput. In most cases, though, 'First-In First-Out' is not used as the primary mechanism, and only serves as a tie-breaker for one of the others below.
This one is also relatively simple; the Priority mechanism will sort Jobs in descending order of their 'Priority' property, resulting in Jobs with higher priority numbers getting rendered first. With Priority as the primary ordering mechanism, important Jobs no longer have to wait behind everything that was submitted before them -- newer Jobs that need to be rendered first can simply be submitted at a higher priority under this scheme. However, this still may not be enough for your needs; both 'Priority' and 'First-In First-Out' will lead to every Worker making the same decisions in terms of prioritization of Jobs.
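As a rough illustration of how a 'Priority, First-In First-Out' order behaves, the sketch below sorts a small queue with a two-part key. The dictionary fields (`priority`, `submitted`) are hypothetical names chosen for this example, not Deadline's internal representation.

```python
from datetime import datetime

jobs = [
    {"name": "first_low", "priority": 50, "submitted": datetime(2024, 1, 1, 9, 0)},
    {"name": "second_high", "priority": 80, "submitted": datetime(2024, 1, 1, 10, 0)},
    {"name": "third_low", "priority": 50, "submitted": datetime(2024, 1, 1, 11, 0)},
    {"name": "fourth_high", "priority": 80, "submitted": datetime(2024, 1, 1, 12, 0)},
]


def scheduling_key(job):
    # Primary criterion: Priority, descending (negated so higher numbers sort first).
    # Secondary criterion: submit time, ascending; only matters when priorities tie.
    return (-job["priority"], job["submitted"])


ordered = sorted(jobs, key=scheduling_key)
ordered_names = [job["name"] for job in ordered]
```

Both priority-80 Jobs come out ahead of the priority-50 ones, and within each priority level the older Job goes first, which is exactly the tie-breaker role described above.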
Pools are arguably the most complex ordering mechanism; they have warranted their own feature blog post on a couple occasions, after all. This is largely because Pools tend to break the notion of an overall Job "Queue", allowing each individual Worker to be configured to have its own personalized Job ordering. As soon as you introduce Pools, each Worker will have its own idea of which Job in the Queue is the most important. They are useful for cases where you may want to dedicate a slice of your machines to always prioritize Jobs for Project A, while the rest of the farm prioritizes for Project B, for example.
Pools are particularly useful to have as the primary ordering mechanism, and this is the default in new Deadline installations (Pool, Priority, First-In First-Out). This is because the impact of Pools on priority is directly related to how much effort has been put into configuring them -- you may never configure Pools at all, in which case your Scheduling Order would effectively just be 'Priority, First-In First-Out'.
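Here is a simplified sketch of how Pools can give each Worker its own view of the queue under a 'Pool, Priority, First-In First-Out' order. It assumes that each Worker carries an ordered list of Pools and that Jobs in Pools missing from that list are skipped; both the data layout and the function names are assumptions made for illustration, not Deadline's implementation.

```python
def pool_rank(worker_pools, job_pool):
    """Position of the Job's Pool in this Worker's ordered Pool list (lower wins)."""
    return worker_pools.index(job_pool)


def order_for_worker(worker_pools, jobs):
    # Assumption: a Worker ignores Jobs whose Pool is not in its own list.
    visible = [job for job in jobs if job["pool"] in worker_pools]
    return sorted(
        visible,
        key=lambda job: (
            pool_rank(worker_pools, job["pool"]),  # Pool first
            -job["priority"],                      # then Priority, descending
            job["submitted"],                      # then First-In First-Out
        ),
    )


jobs = [
    {"name": "a1", "pool": "project_a", "priority": 50, "submitted": 1},
    {"name": "b1", "pool": "project_b", "priority": 90, "submitted": 2},
    {"name": "a2", "pool": "project_a", "priority": 70, "submitted": 3},
]

# Two Workers with the same Pools in a different order see different queues.
worker_x_order = [j["name"] for j in order_for_worker(["project_a", "project_b"], jobs)]
worker_y_order = [j["name"] for j in order_for_worker(["project_b", "project_a"], jobs)]
```

Worker X works through Project A's Jobs before touching the priority-90 Job in Project B, while Worker Y does the opposite, even though both are looking at the same three Jobs.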
The 'Balanced' mechanism is designed to attempt to spread available Workers across as many different Jobs as possible. This is essentially achieved by sorting the Jobs based on the number of their Tasks that are currently in the Rendering state. If you select a Scheduling Order that contains 'Balanced' in it, you'll notice that a couple extra options also become available to you.
The first of these is the 'Rendering Task Buffer' option -- this value will be subtracted from the Rendering Task count of the Worker's current Job when dequeuing new Tasks. This helps provide a bit of 'inertia' to Workers that are already rendering a Job, so that they aren't constantly thrashing between different Jobs. The second newly-available option is the 'Enhanced Balancing Logic'. This will try to maintain more up-to-date counts for Rendering Tasks, at the cost of slightly increased database traffic.
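The effect of the Rendering Task Buffer is easier to see with numbers. The sketch below is a simplified model of a pure Balanced order, with hypothetical field and function names; it only illustrates the "subtract the buffer from my current Job's count" rule described above, and leaves out Enhanced Balancing Logic entirely.

```python
def effective_rendering_count(job, current_job_name, task_buffer):
    """Rendering Task count as seen by one Worker at dequeue time."""
    count = job["rendering_tasks"]
    if job["name"] == current_job_name:
        # The Worker's current Job looks 'emptier' by the buffer amount,
        # which makes the Worker more inclined to stay on it.
        count -= task_buffer
    return count


def pick_balanced(jobs, current_job_name=None, task_buffer=0):
    """Pure Balanced: prefer the Job with the fewest (effective) rendering Tasks."""
    best = min(
        jobs,
        key=lambda job: effective_rendering_count(job, current_job_name, task_buffer),
    )
    return best["name"]


jobs = [
    {"name": "job_a", "rendering_tasks": 3},
    {"name": "job_b", "rendering_tasks": 2},
]

# A Worker currently rendering job_a, with and without a buffer:
without_buffer = pick_balanced(jobs, current_job_name="job_a", task_buffer=0)
with_buffer = pick_balanced(jobs, current_job_name="job_a", task_buffer=2)
```

With no buffer the Worker jumps to `job_b` as soon as it has fewer rendering Tasks; with a buffer of 2, `job_a` is treated as having only 1 rendering Task, so the Worker stays put.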
A pure Balanced Scheduling Order is useful if you're wanting to generally treat all Jobs as equals, and distribute available Worker resources evenly among them. If, on the other hand, you're interested in making some Jobs more equal than others, a "Priority, Balanced" or "Pool, Balanced" Scheduling Order might be more your speed.
The Weighted mechanism is a bit different in that it actually blends three of the previous mechanisms together in one, and gives you control over the relative weight of each.
The simplest way of thinking about the Weighted mechanism is that it's basically just a derivative of the Priority number that changes dynamically based on:
- The number of errors accumulated by the Job.
- The number of rendering Tasks currently in the Job (similar to Balanced).
- The amount of time that the Job has been in the queue (similar to First-In First-Out).
Specifically, the exact formula is:
A*(Priority) + B*(# of seconds in queue) + C*(# of rendering Tasks) + D*(# of errors)
This formula gives us the weight for a Job, and the Workers will prefer Jobs with higher weights. Each of these coefficients (A, B, C, D) can be configured via the additional fields that become editable when a Weighted Scheduling Order has been selected. I've labelled them below in red, for ease of reference:
Note that the 'Weighted' type also includes the Balanced configuration options, for adjusting the Task Buffer and enabling Enhanced Balancing Logic; these work identically to how they did with Balanced.
Observant readers may have noticed by now that with proper weight selection, the Weighted mechanism can actually behave identically to the Priority (A=1, B=C=D=0), First-In First-Out (B=1, A=C=D=0), and Balanced (C=-1, A=B=D=0) mechanisms. Note that C has to be negative for the Balanced case: Workers prefer higher weights, and Balanced prefers Jobs with fewer rendering Tasks. It could also be a really chaotic error-centric queue where Workers latch onto Jobs that generate more errors if D is a large positive number, so choose your values wisely! Hint: D should probably never be positive.
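These special cases can be checked with a small sketch of the formula. The `job_weight` and `preferred` helpers and the dictionary fields are hypothetical names for illustration only. One thing to keep in mind: because Workers prefer higher weights, a Balanced-like order (fewest rendering Tasks first) needs a negative C in this sketch.

```python
def job_weight(job, a, b, c, d):
    """A*(Priority) + B*(seconds in queue) + C*(rendering Tasks) + D*(errors)."""
    return (
        a * job["priority"]
        + b * job["seconds_in_queue"]
        + c * job["rendering_tasks"]
        + d * job["errors"]
    )


def preferred(jobs, a, b, c, d):
    """Name of the Job with the highest weight for the given coefficients."""
    return max(jobs, key=lambda job: job_weight(job, a, b, c, d))["name"]


jobs = [
    {"name": "urgent", "priority": 90, "seconds_in_queue": 60, "rendering_tasks": 5, "errors": 0},
    {"name": "oldest", "priority": 50, "seconds_in_queue": 3600, "rendering_tasks": 3, "errors": 0},
    {"name": "idle", "priority": 10, "seconds_in_queue": 600, "rendering_tasks": 0, "errors": 0},
]

priority_like = preferred(jobs, a=1, b=0, c=0, d=0)   # highest Priority wins
fifo_like = preferred(jobs, a=0, b=1, c=0, d=0)       # longest time in queue wins
balanced_like = preferred(jobs, a=0, b=0, c=-1, d=0)  # fewest rendering Tasks wins
```

Each set of coefficients picks a different Job from the same queue: `urgent`, `oldest`, and `idle` respectively.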
In all seriousness, though, picking good values for the coefficients can be a bit daunting, and might take a bit of trial and error at first, so let's go into an example. Again, I find it helpful to think of it as a modified Priority number, and look at what impact I'd like Errors, Rendering Tasks, and Time In Queue to have on that number. As an example, let's say I'd want the Priority to go up by 1 for every 10 minutes (600 seconds) a Job has spent in the queue, decrease the Priority by 5 for every Error the Job generates, and decrease it by 10 for every Task that's currently being rendered.
Since I'm just looking to modify the Priority, we can keep the Priority weight at 1. It's also fairly intuitive that the Error weight then needs to be -5, and the Rendering Task weight needs to be -10. The Submission Time weight is a bit trickier, and requires a tiny bit of math -- remembering that the value we're providing a coefficient for is in seconds, we want to ensure that B * 600 seconds = 1, so B should be 1 / 600 = 0.001666... (repeating), or roughly 0.0017.
Let's apply the formula to a couple Jobs with the same priority of 50:
- Job A: Priority 50, submitted 1 minute ago (60 seconds), 3 rendering tasks, 0 errors.
- Job B: Priority 50, submitted 2 minutes ago (120 seconds), 2 rendering tasks, 4 errors.
Now let's calculate the results:
- Job A: (1*50) + (0.0017*60) + (-10*3) + (-5*0) = 20.102
- Job B: (1*50) + (0.0017*120) + (-10*2) + (-5*4) = 10.204
Even though Job B is older and has fewer rendering tasks, Job A has the higher weight because it hasn't generated any errors. As a result, the Workers will prefer Job A over Job B.
Note that depending on what scale you want your numbers to be, it is very easy to scale these numbers up or down: multiplying every coefficient by the same positive factor changes the weights but not the resulting order. You could easily set A = 10, B = 0.0167, C = -100, D = -50 and have the same effect described above, just with bigger numbers. Once you get the hang of reasoning about these coefficients this way, they're definitely not as intimidating as they look!
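If you'd like to experiment with your own coefficients before touching the Repository Options, the worked example is easy to reproduce in a few lines. This is only a sketch of the formula from this post, with a hypothetical `job_weight` helper; it uses the exact value B = 1/600 rather than the rounded 0.0017, so the weights come out as 20.1 and 10.2 instead of 20.102 and 10.204.

```python
import math


def job_weight(priority, seconds_in_queue, rendering_tasks, errors, a, b, c, d):
    """A*(Priority) + B*(seconds in queue) + C*(rendering Tasks) + D*(errors)."""
    return a * priority + b * seconds_in_queue + c * rendering_tasks + d * errors


# +1 Priority per 600 seconds in queue, -10 per rendering Task, -5 per error.
A, B, C, D = 1, 1 / 600, -10, -5

job_a = dict(priority=50, seconds_in_queue=60, rendering_tasks=3, errors=0)
job_b = dict(priority=50, seconds_in_queue=120, rendering_tasks=2, errors=4)

weight_a = job_weight(**job_a, a=A, b=B, c=C, d=D)
weight_b = job_weight(**job_b, a=A, b=B, c=C, d=D)

# Scaling every coefficient by the same positive factor scales the weights
# by that factor, so the Job with the higher weight stays the same.
scale = 10
scaled_a = job_weight(**job_a, a=scale * A, b=scale * B, c=scale * C, d=scale * D)
scaled_b = job_weight(**job_b, a=scale * A, b=scale * B, c=scale * C, d=scale * D)

print(f"Job A: {weight_a:.3f}, Job B: {weight_b:.3f}")
print(f"Scaled x{scale} -> Job A: {scaled_a:.3f}, Job B: {scaled_b:.3f}")
```

Job A comes out ahead in both cases, which matches the result above.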
And that's all there is to it! Hopefully we were able to clear up some of the mysteries around Deadline's queueing logic, and you'll now be better equipped to answer tricky questions such as "Why do your Jobs always render before mine lately?". Or maybe not. Either way, I hope it's at least given you some idea of how to better customize Deadline's Job Ordering to suit your needs!
For additional reading, check out the Job Scheduling section of the Deadline User Manual.