API Reference¶
Packages¶
batch.tensorstack.dev/v1beta1¶
Package v1beta1 contains API Schema definitions for the batch v1beta1 API group
Resource Types¶
Aggregate¶
Aggregate records the number of replica pods at each phase.
Appears in: - GenericJobStatus
Field | Description |
---|---|
creating integer | Pod has been created, but resources have not been scheduled. |
pending integer | Pod has been accepted by the system, but one or more of the containers has not been started. This includes time before being bound to a node, as well as time spent pulling images onto the host. |
running integer | Pod has been bound to a node and all of the containers have been started. At least one container is still running or is in the process of being restarted. |
succeeded integer | All containers in the pod have voluntarily terminated with a container exit code of 0, and the system is not going to restart any of these containers. |
failed integer | All containers in the pod have terminated, and at least one container has terminated in failure (exited with a non-zero exit code or was stopped by the system). |
unknown integer | For some reason the state of the pod could not be obtained, typically due to an error in communicating with the host of the pod. |
deleted integer | Pod has been deleted. |
CleanUpPolicy¶
Underlying type: string
CleanUpPolicy specifies the collection of replicas that are to be deleted upon job completion.
Appears in: - GenericJobSpec
ContainerStatus¶
ContainerStatus defines the observed state of the container.
Appears in: - ReplicaStatus
DebugMode¶
DebugMode configures whether and how to start a job in debug mode.
Appears in: - RunMode
Field | Description |
---|---|
enabled boolean | Whether to enable debug mode. |
replicaSpecs ReplicaDebugSet array | If provided, these specs override the corresponding values of the job replicas. |
FinishRule¶
A finishRule is a condition used to check if the job has finished. A finishRule identifies a set of replicas, and the controller determines the job's status by checking the status of all of these replicas.
Appears in: - GenericJobSpec
GenericJob¶
GenericJob represents the schema for a general-purpose batch job API. While it offers less automation compared to specialized APIs like PyTorchTrainingJob, it allows for greater flexibility in specifying parallel replicas/pods. This design serves as a comprehensive job definition mechanism when more specialized APIs are not applicable or available.
Appears in: - GenericJobList
Field | Description |
---|---|
apiVersion string | batch.tensorstack.dev/v1beta1 |
kind string | GenericJob |
metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. |
spec GenericJobSpec | |
status GenericJobStatus | |
GenericJobList¶
GenericJobList contains a list of GenericJob
Field | Description |
---|---|
apiVersion string | batch.tensorstack.dev/v1beta1 |
kind string | GenericJobList |
metadata ListMeta | Refer to Kubernetes API documentation for fields of metadata. |
items GenericJob array | |
GenericJobSpec¶
GenericJobSpec defines the desired state of GenericJob
Appears in: - GenericJob
Field | Description |
---|---|
successRules FinishRule array | Rules used to check whether a generic job has succeeded. The job succeeds when any one of the successRules is fulfilled. Each item of successRules may refer to a series of replicas, and that rule is fulfilled only if all of the replicas in the series complete successfully. |
failureRules FinishRule array | Rules used to check whether a generic job has failed. The job fails when any one of the failureRules is fulfilled. Each item of failureRules refers to a series of replicas, and that rule is fulfilled only if all of the replicas in the series fail. |
service ServiceOption | Details of v1/Service for replica pods. Optional: Defaults to empty, in which case no service is created. |
runMode RunMode | Job running mode. Defaults to Immediate mode. |
cleanUpPolicy CleanUpPolicy | To avoid wasting resources on completed tasks, the controller reclaims resources according to one of the following policies: None (default): no resources are reclaimed; Unfinished: only unfinished pods are deleted; All: all pods are deleted. |
scheduler SchedulePolicy | If specified, the pod will be dispatched by the specified scheduler. Otherwise, the pod will be dispatched by the default scheduler. |
replicaSpecs ReplicaSpec array | List of replica specs belonging to the job. At least one replica must be defined for a job. |
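As a rough illustration of how these fields fit together, the following is a minimal sketch of a GenericJob manifest. The job name, replica type, container image, and command are assumptions made up for this example. successRules and failureRules are left out because the fields of FinishRule are not listed in this reference, so a real job may need them.

```yaml
# Minimal sketch of a GenericJob. Names and values below are illustrative assumptions.
apiVersion: batch.tensorstack.dev/v1beta1
kind: GenericJob
metadata:
  name: generic-job-example        # hypothetical job name
spec:
  cleanUpPolicy: Unfinished        # one of None (default), Unfinished, All
  replicaSpecs:                    # at least one replica spec is required
    - type: worker                 # hypothetical replica type
      replicas: 2                  # defaults to 1 when omitted
      restartPolicy:
        policy: OnFailure          # one of Always, OnFailure, Never
        limit: 3                   # maximum number of restarts (defaults to 0)
      template:                    # standard Kubernetes PodTemplateSpec
        spec:
          containers:
            - name: main
              image: busybox:1.36  # placeholder image
              command: ["sh", "-c", "echo hello"]
```

Omitting runMode, service, and scheduler relies on the documented defaults: Immediate mode, no service, and the default scheduler.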
GenericJobStatus¶
GenericJobStatus defines the observed state of GenericJob
Appears in: - GenericJob
Field | Description |
---|---|
tasks Tasks array | An array of the statuses of individual tasks. |
phase JobPhase | Provides a simple, high-level summary of where the Job is in its lifecycle. Note that this is NOT intended to be a comprehensive state machine. |
aggregate Aggregate | Records the number of replicas at each phase. |
conditions JobCondition array | The latest available observations of a job's current state. |
JobCondition¶
JobCondition describes the current state of a job.
Appears in: - GenericJobStatus
Field | Description |
---|---|
type JobConditionType | Type of job condition; see JobConditionType for the possible values. |
status ConditionStatus | Status of the condition, one of True, False, Unknown. |
lastTransitionTime Time | Last time the condition transitioned from one status to another. |
reason string | Brief reason for the condition's last transition. |
message string | Human-readable message indicating details about the last transition. |
JobConditionType¶
Underlying type: string
JobConditionType defines all possible types of job conditions. Can be one of: Initialized, Running, ReplicaFailure, Completed, or Failed.
Appears in: - JobCondition
JobPhase¶
Underlying type: string
Appears in: - GenericJobStatus
PauseMode¶
PauseMode configures whether and how to start a job in pause mode.
Appears in: - RunMode
Field | Description |
---|---|
enabled boolean | Whether to enable pause mode. |
resumeSpecs ResumeSpec array | If provided, these specs override the corresponding values of the job replicas when resuming. |
ReplicaDebugSet¶
ReplicaDebugSet describes how to start replicas in debug mode.
Appears in: - DebugMode
Field | Description |
---|---|
type string | Replica type. |
skipInitContainer boolean | Skips creation of initContainer, if true. |
command string | Entrypoint array. Optional: Defaults to ["sleep", "inf"]. |
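The fragment below sketches how debug mode might be switched on for one replica type. The replica type name `worker` is hypothetical, and the example assumes `command` takes a list of strings, as the "Entrypoint array" wording and the documented default suggest.

```yaml
# Sketch of spec.runMode.debug; the replica type name is an assumption.
spec:
  runMode:
    debug:
      enabled: true
      replicaSpecs:
        - type: worker               # must match a type in spec.replicaSpecs
          skipInitContainer: true    # do not create the initContainer
          command: ["sleep", "inf"]  # the documented default, written out explicitly
```

With a setup like this, the pods are created but run the placeholder command instead of the regular workload, so they stay available for inspection.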
ReplicaSpec¶
ReplicaSpec defines the desired state of replicas.
Appears in: - GenericJobSpec
Field | Description |
---|---|
type string | Replica type. |
replicas integer | The desired number of replicas of this replica type. Defaults to 1. |
restartPolicy RestartPolicy | Restart policy for replicas of this replica type. One of Always, OnFailure, Never. Optional: Defaults to OnFailure. |
template PodTemplateSpec | Defines the template used to create pods. |
ReplicaStatus¶
ReplicaStatus defines the observed state of the pod.
Appears in: - Tasks
Field | Description |
---|---|
name string | Pod name. |
uid UID | Pod uid. |
phase PodPhase | Pod phase. The phase of a Pod is a simple, high-level summary of where the Pod is in its lifecycle. |
containers ContainerStatus array | Statuses of the pod's containers. |
RestartPolicy¶
RestartPolicy describes how the replica should be restarted.
Appears in: - ReplicaSpec
Field | Description |
---|---|
policy RestartPolicyType | The policy for restarting finished replicas. |
limit integer | The maximum number of restarts. Optional: Defaults to 0. |
RestartPolicyType¶
Underlying type: string
Appears in: - RestartPolicy
ResumeSpec¶
ResumeSpec describes how to resume replicas from pause mode.
Appears in: - PauseMode
Field | Description |
---|---|
type string | Replica type. |
skipInitContainer boolean | Skips creation of initContainer, if true. |
command string | Entrypoint array. Overrides the entrypoint if provided; otherwise, the values from immediate mode are used. |
args string | Arguments to the entrypoint. If not provided, the arguments from immediate mode are used. |
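As an illustration, a paused job with resume overrides could look roughly like the fragment below. The replica type, command, and arguments are made up for the example, and `command` and `args` are assumed to accept lists of strings. How the controller applies resumeSpecs when the job leaves pause mode is not described in this reference.

```yaml
# Sketch of spec.runMode.pause; all concrete values are assumptions.
spec:
  runMode:
    pause:
      enabled: true                  # the job is halted and its pods are deleted
      resumeSpecs:
        - type: worker               # hypothetical replica type
          skipInitContainer: true
          command: ["python"]        # overrides the immediate-mode entrypoint
          args: ["train.py"]         # overrides the immediate-mode arguments
```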
RunMode¶
RunMode defines the job's execution behavior:
- Immediate mode: (Default) Tasks are executed immediately upon submission.
- Debug mode: Job pods are created, but regular executions are replaced with null operations (e.g., sleep) for convenient debugging purposes.
- Pause mode: Job execution is halted, and pods are deleted to reclaim resources. A graceful pod termination process is initiated to allow pods to exit cleanly.
Appears in: - GenericJobSpec
Field | Description |
---|---|
debug DebugMode | Debug mode. |
pause PauseMode | Pause mode. |
SchedulePolicy¶
SchedulePolicy signals to K8s how the job should be scheduled.
Appears in: - GenericJobSpec
Field | Description |
---|---|
t9kScheduler T9kScheduler | T9k Scheduler. TODO: link to t9k scheduler docs. |
ServiceOption¶
Details of the replicas' service.
Appears in: - GenericJobSpec
Field | Description |
---|---|
ports ServicePort array | The list of ports that are exposed by this service. |
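A service for the replica pods might be requested as sketched below. This assumes that ServicePort is the Kubernetes core/v1 ServicePort type, so its usual fields apply. The port name and numbers are illustrative.

```yaml
# Sketch of spec.service, assuming ServicePort is the Kubernetes core/v1 type.
spec:
  service:
    ports:
      - name: http          # hypothetical port name
        port: 8080          # port exposed by the service
        targetPort: 8080    # port on the replica pods
        protocol: TCP
```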
T9kScheduler¶
T9kScheduler provides additional configurations needed for the scheduling process.
Appears in: - SchedulePolicy
Field | Description |
---|---|
queue string | Specifies the name of the queue that should be used for running this workload. TODO: link to t9k scheduler docs. |
priority integer | Indicates the priority of the PodGroup; valid range: [0, 100]. Optional: Defaults to 0. |
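For instance, dispatching a job through the T9k scheduler could be expressed as in this fragment. The queue name is a placeholder and must correspond to a queue that exists in the cluster.

```yaml
# Sketch of spec.scheduler; the queue name is an assumption.
spec:
  scheduler:
    t9kScheduler:
      queue: default    # hypothetical queue name
      priority: 50      # valid range is [0, 100]; defaults to 0
```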
Tasks¶
Tasks defines the observed state of a task.
Appears in: - GenericJobStatus
Field | Description |
---|---|
type string | Replica type. |
restartCount integer | The number of restarts that have been performed. |
replicas ReplicaStatus array | Array of replica statuses. |
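To show how Tasks, ReplicaStatus, and Aggregate relate, here is a sketch of what a status block could look like for a job with two replicas of one type. All values, including the pod names, are invented for illustration and do not come from a real cluster. Fields whose values are not documented here (phase of the job, conditions, containers, uid) are omitted.

```yaml
# Illustrative status only; counts and pod names are made up.
status:
  aggregate:              # number of replica pods in each phase
    creating: 0
    pending: 1
    running: 1
    succeeded: 0
    failed: 0
    unknown: 0
    deleted: 0
  tasks:
    - type: worker        # hypothetical replica type
      restartCount: 0
      replicas:
        - name: generic-job-example-worker-0   # hypothetical pod name
          phase: Running
        - name: generic-job-example-worker-1   # hypothetical pod name
          phase: Pending
```

In this sketch the aggregate counts agree with the per-replica phases: one pod is Running and one is Pending.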