Author: Adhityaa Chandrasekar (Google)
Jobs are a crucial part of
Kubernetes’ API. While other kinds of workloads, such as Deployments,
solve use-cases that require Pods to run forever, Jobs are useful when Pods need
to run to completion. Commonly used in parallel batch processing, Jobs can be
used in a variety of applications ranging from video rendering and database
maintenance to sending bulk emails and scientific computing.
While the amount of parallelism and the conditions for Job completion are
configurable, the Kubernetes API lacked the ability to suspend and resume Jobs.
This is often desired when cluster resources are limited and a higher-priority
Job needs to execute in place of another Job. Deleting the lower-priority
Job is a poor workaround, as Pod completion history and other metrics associated
with the Job will be lost.
With the recent Kubernetes 1.21 release, you will be able to suspend a Job by
updating its spec. The feature is currently in alpha and requires you to
enable the SuspendJob feature gate on the API server and the controller manager
in order to use it.
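As a minimal sketch, assuming a cluster bootstrapped with kubeadm (an assumption; how your cluster is managed may differ), the gate could be passed to both components through a ClusterConfiguration like the one below. On other setups, the equivalent is adding `--feature-gates=SuspendJob=true` to the kube-apiserver and kube-controller-manager command lines.

```yaml
# Sketch only: assumes kubeadm and its v1beta2 configuration API.
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.21.0
apiServer:
  extraArgs:
    # Enables the alpha gate on the API server.
    feature-gates: "SuspendJob=true"
controllerManager:
  extraArgs:
    # Enables the alpha gate on the controller manager.
    feature-gates: "SuspendJob=true"
```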
We introduced a new boolean field suspend into the .spec of Jobs. Let’s say
I create the following Job:
apiVersion: batch/v1
kind: Job
metadata:
  name: my-job
spec:
  suspend: true
  parallelism: 2
  completions: 10
  template:
    spec:
      containers:
      - name: my-container
        image: busybox
        command: ["sleep", "5"]
      restartPolicy: Never
Jobs are not suspended by default, so I’m explicitly setting the suspend field
to true in the .spec of the above Job manifest. In the above example, the
Job controller will refrain from creating Pods until I’m ready to start the Job,
which I can do by updating suspend to false.
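For illustration, resuming the Job above amounts to changing a single field. The fragment below shows only the part of the manifest that changes. How the update is submitted is up to you; for example, this fragment could serve as the body of a merge patch against my-job.

```yaml
# Only the changed part of my-job's manifest is shown.
spec:
  suspend: false   # was true; the Job controller may now create Pods
```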
As another example, consider a Job that was created with the suspend field
omitted. The Job controller will happily create Pods to work towards Job
completion. However, if I explicitly set the field to true with a Job update
before the Job completes, the Job controller will terminate all active Pods
and will wait indefinitely for the flag to be flipped back to false.
Typically, Pod termination is done by sending a SIGTERM signal to all container
processes in the Pod; the graceful termination period
defined in the Pod spec will be honoured. Pods terminated this way will not be
counted as failures by the Job controller.
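As a sketch of where that grace period lives, the Pod template of the earlier example could set `terminationGracePeriodSeconds`. The value of 60 is an arbitrary choice for illustration; when the field is omitted, Kubernetes uses a default of 30 seconds.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-job
spec:
  parallelism: 2
  completions: 10
  template:
    spec:
      # Time allowed between SIGTERM and a forced kill, including when
      # Pods are terminated because the Job was suspended.
      terminationGracePeriodSeconds: 60   # illustrative value
      containers:
      - name: my-container
        image: busybox
        command: ["sleep", "5"]
      restartPolicy: Never
```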
It is important to understand that succeeded and failed Pods from the past will
continue to exist after you suspend a Job. That is, they will count towards
Job completion once you resume it. You can verify this by looking at the Job’s
status before and after suspension.
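As a rough sketch of what to look for, a suspended Job's status (for example, as shown by `kubectl get job my-job -o yaml`) might resemble the fragment below. The counts and timestamp are made-up placeholders. The exact condition fields reflect my understanding of the 1.21 behaviour rather than anything stated above, so check them against your own cluster.

```yaml
# Illustrative status of a suspended Job; all values are placeholders.
status:
  conditions:
  - type: Suspended
    status: "True"
    reason: JobSuspended
    message: Job suspended
    lastTransitionTime: "2021-04-12T10:00:00Z"   # placeholder
  succeeded: 4   # placeholder: Pods that completed before suspension
  failed: 1      # placeholder: Pods that failed before suspension
```

The point of comparing before and after is that the succeeded and failed counts should stay the same across a suspend and resume cycle.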
Read the documentation
for a full overview of this new feature.
Where is this useful?
Let’s say I’m the operator of a large cluster. I have many users submitting Jobs
to the cluster, but not all Jobs are created equal — some Jobs are more
important than others. Cluster resources aren’t infinite either, so all users
must share resources. If all Jobs are created in the suspended state and placed
in a pending queue, I can achieve priority-based Job scheduling by resuming Jobs
in the right order.
As another motivational use-case, consider a cloud provider where compute
resources are cheaper at night than in the morning. If I have a long-running Job
that takes multiple days to complete, being able to suspend the Job in the
morning and then resume it in the evening every day can reduce costs.
Since this field is a part of the Job spec, CronJobs
automatically get this feature for free too.
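One way to read this, sketched below, is that the field can be set inside a CronJob's `jobTemplate`, so that every Job the CronJob creates starts out suspended. The name and schedule are hypothetical. Note that this is distinct from the CronJob's own top-level `.spec.suspend` field, which pauses the creation of new Jobs instead.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-cronjob          # hypothetical name
spec:
  schedule: "0 20 * * *"    # hypothetical: every day at 20:00
  jobTemplate:
    spec:
      suspend: true         # each created Job waits until it is resumed
      template:
        spec:
          containers:
          - name: my-container
            image: busybox
            command: ["sleep", "5"]
          restartPolicy: Never
```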
References and next steps
If you’re interested in a deeper dive into the rationale behind this feature and
the decisions we have taken, consider reading the enhancement proposal.
There’s more detail on suspending and resuming Jobs in the documentation for Job.
As previously mentioned, this feature is currently in alpha and is available
only if you explicitly opt-in through the
SuspendJob feature gate.
If this is a feature you’re interested in, please consider testing suspended
Jobs in your cluster and providing feedback. You can discuss this enhancement on GitHub.
The SIG Apps community also meets regularly
and can be reached through Slack or the mailing list.
Barring any unexpected changes to the API, we intend to graduate the feature to
beta in Kubernetes 1.22, so that the feature becomes available by default.