API Reference
Packages
batch.tensorstack.dev/v1beta1
Package v1beta1 contains API Schema definitions for the batch v1beta1 API group
Resource Types
Aggregate
Aggregate records the number of replica pods at each phase.
Appears in: GenericJobStatus
Field | Description |
---|---|
creating integer | Pod has been created, but resources have not been scheduled. |
pending integer | Pod has been accepted by the system, but one or more of the containers has not been started. This includes time before being bound to a node, as well as time spent pulling images onto the host. |
running integer | Pod has been bound to a node and all of the containers have been started. At least one container is still running or is in the process of being restarted. |
succeeded integer | All containers in the pod have voluntarily terminated with a container exit code of 0, and the system is not going to restart any of these containers. |
failed integer | All containers in the pod have terminated, and at least one container has terminated in failure (exited with a non-zero exit code or was stopped by the system). |
unknown integer | For some reason the state of the pod could not be obtained, typically due to an error in communicating with the host of the pod. |
deleted integer | Pod has been deleted. |
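
As a small illustration of how these counters might be consumed, the sketch below totals an aggregate object and checks whether every counted replica has succeeded. It assumes the object has already been parsed into a Python dict keyed by the field names above; the helper names are hypothetical and not part of the API.

```python
# Field names taken from the Aggregate table above.
AGGREGATE_FIELDS = ("creating", "pending", "running", "succeeded",
                    "failed", "unknown", "deleted")

def total_replicas(aggregate):
    """Sum all phase counters; a missing counter is treated as 0 (an assumption)."""
    return sum(aggregate.get(field, 0) for field in AGGREGATE_FIELDS)

def all_succeeded(aggregate):
    """True when at least one replica is counted and all of them are in `succeeded`."""
    total = total_replicas(aggregate)
    return total > 0 and aggregate.get("succeeded", 0) == total

example = {"running": 2, "succeeded": 1}
print(total_replicas(example), all_succeeded(example))
```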
CleanUpPolicy
Underlying type: string
CleanUpPolicy specifies the collection of replicas that are to be deleted upon job completion.
Appears in: GenericJobSpec
ContainerStatus
ContainerStatus defines the observed state of the container.
Appears in: ReplicaStatus
DebugMode
DebugMode configures whether and how to start a job in debug mode.
Appears in:
Field | Description |
---|---|
enabled boolean | Whether to enable debug mode. |
replicaSpecs ReplicaDebugSet array | If provided, these specs provide overwriting values for job replicas. |
FinishRule
A finishRule is a condition used to check if the job has finished. A finishRule identifies a set of replicas, and the controller determines the job’s status by checking the status of all of these replicas.
Appears in: GenericJobSpec
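
The successRules and failureRules descriptions under GenericJobSpec imply an "any rule, all replicas" evaluation. The sketch below illustrates that logic only. The fields of FinishRule are not listed in this reference, so each rule is represented here as a plain list of replica identifiers, which is a hypothetical stand-in. Treating an empty rule as unfulfilled is also an assumption.

```python
def rule_fulfilled(rule, replica_states, wanted):
    """A rule is fulfilled when every replica it names is in the wanted state.

    `rule` is a list of replica identifiers (hypothetical representation).
    An empty rule is treated as not fulfilled (an assumption).
    """
    return len(rule) > 0 and all(replica_states.get(r) == wanted for r in rule)

def any_rule_fulfilled(rules, replica_states, wanted):
    """The job reaches the outcome when any single rule is fulfilled."""
    return any(rule_fulfilled(rule, replica_states, wanted) for rule in rules)

# Hypothetical replica names and states, for illustration only.
states = {"master-0": "Succeeded", "worker-0": "Running", "worker-1": "Failed"}
success_rules = [["master-0"], ["worker-0", "worker-1"]]
failure_rules = [["worker-0", "worker-1"]]

print(any_rule_fulfilled(success_rules, states, "Succeeded"))
print(any_rule_fulfilled(failure_rules, states, "Failed"))
```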
GenericJob
GenericJob represents the schema for a general-purpose batch job API. While it offers less automation compared to specialized APIs like PyTorchTrainingJob, it allows for greater flexibility in specifying parallel replicas/pods. This design serves as a comprehensive job definition mechanism when more specialized APIs are not applicable or available.
Appears in: GenericJobList
Field | Description |
---|---|
apiVersion string | batch.tensorstack.dev/v1beta1 |
kind string | GenericJob |
metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. |
spec GenericJobSpec | |
status GenericJobStatus | |
GenericJobList
GenericJobList contains a list of GenericJob
Field | Description |
---|---|
apiVersion string | batch.tensorstack.dev/v1beta1 |
kind string | GenericJobList |
metadata ListMeta | Refer to Kubernetes API documentation for fields of metadata. |
items GenericJob array | |
GenericJobSpec
GenericJobSpec defines the desired state of GenericJob
Appears in: GenericJob
Field | Description |
---|---|
successRules FinishRule array | Rules used to check whether a generic job has succeeded. The job succeeds when any one of the successRules is fulfilled. Each item of successRules may refer to a series of replicas, and the rule is fulfilled only if all of the replicas in this series complete successfully. |
failureRules FinishRule array | Rules used to check whether a generic job has failed. The job fails when any one of the failureRules is fulfilled. Each item of failureRules refers to a series of replicas, and the rule is fulfilled only if all of the replicas in this series have failed. |
service ServiceOption | Details of v1/Service for replica pods. Optional: Defaults to empty and no service will be created. |
runMode RunMode | Job running mode. Defaults to Immediate mode. |
cleanUpPolicy CleanUpPolicy | To avoid wasting resources on completed tasks, the controller reclaims resources according to one of the following policies: None (default): no resources are reclaimed; Unfinished: only unfinished pods are deleted; All: all pods are deleted. |
scheduler SchedulePolicy | If specified, the pod will be dispatched by the specified scheduler. Otherwise, the pod will be dispatched by the default scheduler. |
replicaSpecs ReplicaSpec array | List of replica specs belonging to the job. There must be at least one replica defined for a Job. |
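
To show how the fields above fit together, here is a minimal sketch that assembles a GenericJob manifest as a plain Python dict. It uses only field names listed in this reference plus standard Kubernetes PodTemplateSpec and ServicePort fields. The job name, image, queue name, port, and command are hypothetical placeholders. successRules and failureRules are left out because the fields of FinishRule are not listed here.

```python
import json

def make_generic_job(name, image, workers=2):
    """Build a GenericJob manifest as a dict (all concrete values are placeholders)."""
    return {
        "apiVersion": "batch.tensorstack.dev/v1beta1",
        "kind": "GenericJob",
        "metadata": {"name": name},
        "spec": {
            "cleanUpPolicy": "Unfinished",
            # Queue name and priority are hypothetical; priority must be in [0, 100].
            "scheduler": {"t9kScheduler": {"queue": "default", "priority": 50}},
            "service": {"ports": [{"name": "http", "port": 8080}]},
            "replicaSpecs": [
                {
                    "type": "worker",
                    "replicas": workers,
                    "restartPolicy": {"policy": "OnFailure", "limit": 3},
                    "template": {
                        "spec": {
                            "containers": [
                                {"name": "main", "image": image,
                                 "command": ["python", "train.py"]}
                            ]
                        }
                    },
                }
            ],
        },
    }

job = make_generic_job("demo-job", "example.com/demo:latest")
print(json.dumps(job, indent=2))
```

The JSON printed by the sketch is also valid YAML, so it could be saved to a file and adapted from there.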
GenericJobStatus
GenericJobStatus defines the observed state of GenericJob
Appears in: GenericJob
Field | Description |
---|---|
tasks Tasks array | An array of status of individual tasks. |
phase JobPhase | Provides a simple, high-level summary of where the Job is in its lifecycle. Note that this is NOT intended to be a comprehensive state machine. |
aggregate Aggregate | Records the number of replicas at each phase. |
conditions JobCondition array | The latest available observations of a job’s current state. |
JobCondition
JobCondition describes the current state of a job.
Appears in: GenericJobStatus
Field | Description |
---|---|
type JobConditionType | Type of job condition. See JobConditionType for the possible values. |
status ConditionStatus | Status of the condition, one of True, False, Unknown. |
lastTransitionTime Time | Last time the condition transitioned from one status to another. |
reason string | Brief reason for the condition’s last transition. |
message string | Human readable message indicating details about last transition. |
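
A common way to consume conditions is to look one up by type and check whether its status is True. The sketch below assumes a GenericJobStatus already parsed into a dict. The helper names and sample values are hypothetical.

```python
def find_condition(status, cond_type):
    """Return the first condition of the given type, or None if absent."""
    for cond in status.get("conditions", []):
        if cond.get("type") == cond_type:
            return cond
    return None

def is_condition_true(status, cond_type):
    """ConditionStatus is a string: "True", "False", or "Unknown"."""
    cond = find_condition(status, cond_type)
    return cond is not None and cond.get("status") == "True"

# Sample data for illustration only.
sample_status = {
    "conditions": [
        {"type": "Running", "status": "False",
         "reason": "Finished", "message": "job is no longer running"},
        {"type": "Completed", "status": "True",
         "reason": "Succeeded", "message": "a success rule was fulfilled"},
    ]
}

print(is_condition_true(sample_status, "Completed"))
print(is_condition_true(sample_status, "Failed"))
```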
JobConditionType
Underlying type: string
JobConditionType defines all possible types of JobStatus. Can be one of: Initialized, Running, ReplicaFailure, Completed, or Failed.
Appears in: JobCondition
JobPhase
Underlying type: string
Appears in: GenericJobStatus
PauseMode
PauseMode configures whether and how to start a job in pause mode.
Appears in:
Field | Description |
---|---|
enabled boolean | Whether to enable pause mode. |
resumeSpecs ResumeSpec array | If provided, these specs provide overwriting values for job replicas when resuming. |
ReplicaDebugSet
ReplicaDebugSet describes how to start replicas in debug mode.
Appears in: DebugMode
Field | Description |
---|---|
type string | Replica type. |
skipInitContainer boolean | Skips creation of initContainer, if true. |
command string | Entrypoint array. Optional: defaults to ["sleep", "inf"]. |
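
The sketch below illustrates how the debug-mode command for a replica type might be resolved from a DebugMode object parsed into a dict. It assumes that replica types without a matching entry, and entries without a command, fall back to the documented default of ["sleep", "inf"]. That fallback for unlisted types is an assumption, and the function name is hypothetical.

```python
DEFAULT_DEBUG_COMMAND = ["sleep", "inf"]

def debug_command_for(replica_type, debug_mode):
    """Return the command a replica type would run in debug mode.

    Returns None when debug mode is disabled. Replica types with no entry
    in replicaSpecs get the default command (an assumption of this sketch).
    """
    if not debug_mode.get("enabled", False):
        return None
    for entry in debug_mode.get("replicaSpecs", []):
        if entry.get("type") == replica_type:
            return list(entry.get("command") or DEFAULT_DEBUG_COMMAND)
    return list(DEFAULT_DEBUG_COMMAND)

debug_mode = {
    "enabled": True,
    "replicaSpecs": [
        {"type": "worker", "skipInitContainer": True,
         "command": ["sleep", "3600"]},
    ],
}

print(debug_command_for("worker", debug_mode))
print(debug_command_for("master", debug_mode))
```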
ReplicaSpec
ReplicaSpec defines the desired state of replicas.
Appears in: GenericJobSpec
Field | Description |
---|---|
type string | Replica type. |
replicas integer | The desired number of replicas of this replica type. Defaults to 1. |
restartPolicy RestartPolicy | Restart policy for replicas of this replica type. One of Always, OnFailure, Never. Optional: defaults to OnFailure. |
template PodTemplateSpec | Defines the template used to create pods. |
ReplicaStatus
ReplicaStatus defines the observed state of the pod.
Appears in: Tasks
Field | Description |
---|---|
name string | Pod name. |
uid UID | Pod uid. |
phase PodPhase | Pod phase. The phase of a Pod is a simple, high-level summary of where the Pod is in its lifecycle. |
containers ContainerStatus array | Containers status. |
RestartPolicy
RestartPolicy describes how the replica should be restarted.
Appears in: ReplicaSpec
Field | Description |
---|---|
policy RestartPolicyType | The policy for restarting finished replicas. |
limit integer | The maximum number of restarts. Optional: defaults to 0. |
RestartPolicyType
Underlying type: string
Appears in: RestartPolicy
ResumeSpec
ResumeSpec describes how to resume replicas from pause mode.
Appears in: PauseMode
Field | Description |
---|---|
type string | Replica type. |
skipInitContainer boolean | Skips creation of initContainer, if true. |
command string | Entrypoint array. Provides overwriting values if provided; otherwise, values in immediate mode are used. |
args string | Arguments to the entrypoint. Arguments in immediate mode are used if not provided. |
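
The fallback described for command and args can be sketched as follows. The function name is hypothetical, and both fields are modelled as lists of strings, following the "Entrypoint array" wording. An empty list is treated the same as "not provided", which is an assumption.

```python
def resolve_on_resume(immediate, resume_spec=None):
    """Pick the command and args a replica would use when resuming.

    Values from the ResumeSpec win when provided; otherwise the values
    used in immediate mode are kept.
    """
    resume_spec = resume_spec or {}
    return {
        "command": resume_spec.get("command") or immediate.get("command", []),
        "args": resume_spec.get("args") or immediate.get("args", []),
    }

# Placeholder values for illustration only.
immediate = {"command": ["python", "train.py"], "args": ["--epochs", "10"]}
resume_spec = {"type": "worker", "command": ["bash", "resume.sh"]}

print(resolve_on_resume(immediate, resume_spec))
print(resolve_on_resume(immediate))
```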
RunMode
RunMode defines the job’s execution behavior:
Immediate mode (default): Tasks are executed immediately upon submission.
Debug mode: Job pods are created, but regular executions are replaced with null operations (e.g., sleep) for convenient debugging purposes.
Pause mode: Job execution is halted, and pods are deleted to reclaim resources. A graceful pod termination process is initiated to allow pods to exit cleanly.
Appears in: GenericJobSpec
SchedulePolicy
SchedulePolicy signals to K8s how the job should be scheduled.
Appears in: GenericJobSpec
Field | Description |
---|---|
t9kScheduler T9kScheduler | T9k Scheduler. |
ServiceOption
Details of a replica’s service.
Appears in: GenericJobSpec
Field | Description |
---|---|
ports ServicePort array | The list of ports that are exposed by this service. |
T9kScheduler
T9kScheduler provides additional configurations needed for the scheduling process.
Appears in: SchedulePolicy
Field | Description |
---|---|
queue string | Specifies the name of the queue that should be used for running this workload. |
priority integer | Indicates the priority of the PodGroup; valid range: [0, 100]. Optional: defaults to 0. |
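
A client-side check of the priority range before submitting a job could look like the sketch below. The function name is hypothetical, and whether the API server performs the same validation is not stated in this reference.

```python
def normalize_priority(t9k_scheduler):
    """Return a copy of the config with priority defaulted to 0 and range-checked."""
    priority = t9k_scheduler.get("priority", 0)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise TypeError("priority must be an integer")
    if not 0 <= priority <= 100:
        raise ValueError("priority must be in the range [0, 100]")
    return {**t9k_scheduler, "priority": priority}

# "default" is a placeholder queue name.
print(normalize_priority({"queue": "default"}))
print(normalize_priority({"queue": "default", "priority": 80}))
```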
Tasks
Tasks defines the observed state of a task.
Appears in: GenericJobStatus
Field | Description |
---|---|
type string | Replica type. |
restartCount integer | The number of restarts that have been performed. |
replicas ReplicaStatus array | Replicas status array. |
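
As an illustration of reading the tasks array, the sketch below counts replicas per pod phase and totals restarts across tasks. It assumes the status has been parsed into Python dicts and that phase holds a Kubernetes PodPhase string such as Running or Succeeded. The helper names and sample data are hypothetical.

```python
from collections import Counter

def count_phases(tasks):
    """Count replicas by pod phase across all tasks."""
    counts = Counter()
    for task in tasks:
        for replica in task.get("replicas", []):
            # A replica without a phase is counted as Unknown (an assumption).
            counts[replica.get("phase", "Unknown")] += 1
    return dict(counts)

def total_restarts(tasks):
    """Sum restartCount over all tasks; a missing count is treated as 0."""
    return sum(task.get("restartCount", 0) for task in tasks)

# Sample data for illustration only.
tasks = [
    {"type": "master", "restartCount": 0,
     "replicas": [{"name": "job-master-0", "phase": "Succeeded"}]},
    {"type": "worker", "restartCount": 2,
     "replicas": [{"name": "job-worker-0", "phase": "Running"},
                  {"name": "job-worker-1", "phase": "Running"}]},
]

print(count_phases(tasks))
print(total_restarts(tasks))
```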