API Reference¶
Packages¶
batch.tensorstack.dev/v1beta1¶
Package v1beta1 contains API Schema definitions for the batch v1beta1 API group
Resource Types¶
Aggregate¶
Aggregate records the number of replica pods at each phase.
Appears in: - GenericJobStatus
| Field | Description |
|---|---|
| `creating` _integer_ | Pod has been created, but resources have not been scheduled. |
| `pending` _integer_ | Pod has been accepted by the system, but one or more of the containers has not been started. This includes time before being bound to a node, as well as time spent pulling images onto the host. |
| `running` _integer_ | Pod has been bound to a node and all of the containers have been started. At least one container is still running or is in the process of being restarted. |
| `succeeded` _integer_ | All containers in the pod have voluntarily terminated with a container exit code of 0, and the system is not going to restart any of these containers. |
| `failed` _integer_ | All containers in the pod have terminated, and at least one container has terminated in failure (exited with a non-zero exit code or was stopped by the system). |
| `unknown` _integer_ | For some reason the state of the pod could not be obtained, typically due to an error in communicating with the host of the pod. |
| `deleted` _integer_ | Pod has been deleted. |
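
To illustrate how these counters relate to individual replica phases, the following sketch tallies a list of phase names into a dictionary shaped like Aggregate. The `aggregate_phases` helper and the use of plain phase strings are assumptions made for this example; they are not part of the API.

```python
from collections import Counter

# Field names of Aggregate, as listed in the table above.
AGGREGATE_FIELDS = ("creating", "pending", "running", "succeeded",
                    "failed", "unknown", "deleted")


def aggregate_phases(phases):
    """Count replica phases into an Aggregate-shaped dict (hypothetical helper)."""
    counts = Counter(phase.lower() for phase in phases)
    unexpected = set(counts) - set(AGGREGATE_FIELDS)
    if unexpected:
        raise ValueError(f"unrecognized phases: {sorted(unexpected)}")
    # Report every field, including those with a zero count.
    return {field: counts.get(field, 0) for field in AGGREGATE_FIELDS}


print(aggregate_phases(["Running", "Running", "Pending", "Succeeded"]))
```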
CleanUpPolicy¶
Underlying type: string
CleanUpPolicy specifies the collection of replicas that are to be deleted upon job completion.
Appears in: - GenericJobSpec
ContainerStatus¶
ContainerStatus defines the observed state of the container.
Appears in: - ReplicaStatus
DebugMode¶
DebugMode configures whether and how to start a job in debug mode.
Appears in: - RunMode
| Field | Description |
|---|---|
| `enabled` _boolean_ | Whether to enable debug mode. |
| `replicaSpecs` _ReplicaDebugSet array_ | If provided, these specs override values for job replicas. |
FinishRule¶
A finishRule is a condition used to check if the job has finished. A finishRule identifies a set of replicas, and the controller determines the job's status by checking the status of all of these replicas.
Appears in: - GenericJobSpec
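
Read together with the `successRules` and `failureRules` descriptions in GenericJobSpec, the evaluation logic can be sketched as "any rule, all replicas". This reference does not list the fields of FinishRule, so the sketch below represents each rule as a plain list of replica names; that representation, the phase strings, and the choice to check success rules before failure rules are all assumptions.

```python
def rule_fulfilled(rule, phases, wanted):
    """A rule is fulfilled only if it names at least one replica and
    every replica it names is in the wanted phase."""
    return bool(rule) and all(phases.get(name) == wanted for name in rule)


def job_outcome(success_rules, failure_rules, phases):
    """Return "Succeeded", "Failed", or None if no rule is fulfilled yet.

    success_rules / failure_rules: lists of rules (hypothetical shape).
    phases: mapping of replica name -> phase string.
    """
    if any(rule_fulfilled(r, phases, "Succeeded") for r in success_rules):
        return "Succeeded"
    if any(rule_fulfilled(r, phases, "Failed") for r in failure_rules):
        return "Failed"
    return None


phases = {"master-0": "Succeeded", "worker-0": "Succeeded", "worker-1": "Running"}
print(job_outcome([["master-0"]], [["worker-0", "worker-1"]], phases))
```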
GenericJob¶
GenericJob represents the schema for a general-purpose batch job API. While it offers less automation compared to specialized APIs like PyTorchTrainingJob, it allows for greater flexibility in specifying parallel replicas/pods. This design serves as a comprehensive job definition mechanism when more specialized APIs are not applicable or available.
Appears in: - GenericJobList
| Field | Description |
|---|---|
| `apiVersion` _string_ | `batch.tensorstack.dev/v1beta1` |
| `kind` _string_ | `GenericJob` |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. |
| `spec` _GenericJobSpec_ | |
| `status` _GenericJobStatus_ | |
GenericJobList¶
GenericJobList contains a list of GenericJob
| Field | Description |
|---|---|
| `apiVersion` _string_ | `batch.tensorstack.dev/v1beta1` |
| `kind` _string_ | `GenericJobList` |
| `metadata` _ListMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. |
| `items` _GenericJob array_ | |
GenericJobSpec¶
GenericJobSpec defines the desired state of GenericJob
Appears in: - GenericJob
| Field | Description |
|---|---|
| `successRules` _FinishRule array_ | Rules used to check whether a generic job has succeeded. The job succeeds when any one of the successRules is fulfilled. Each rule may refer to a series of replicas, and the rule is fulfilled only if all of those replicas complete successfully. |
| `failureRules` _FinishRule array_ | Rules used to check whether a generic job has failed. The job fails when any one of the failureRules is fulfilled. Each rule refers to a series of replicas, and the rule is fulfilled only if all of those replicas fail. |
| `service` _ServiceOption_ | Details of the v1/Service for replica pods. Optional: defaults to empty, in which case no service is created. |
| `runMode` _RunMode_ | Job running mode. Defaults to Immediate mode. |
| `cleanUpPolicy` _CleanUpPolicy_ | To avoid wasting resources on completed tasks, the controller reclaims resources according to one of the following policies: None (default): no resources are reclaimed; Unfinished: only pods that have not finished are deleted; All: all pods are deleted. |
| `scheduler` _SchedulePolicy_ | If specified, pods are dispatched by the specified scheduler. Otherwise, pods are dispatched by the default scheduler. |
| `replicaSpecs` _ReplicaSpec array_ | List of replica specs belonging to the job. At least one replica must be defined for a job. |
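
As a minimal sketch of how these fields fit together, the following snippet assembles a GenericJob manifest as a plain dictionary and checks the documented requirement that `replicaSpecs` is non-empty. The job name, image, queue name, replica type, and container command are placeholders, and both helper functions are hypothetical. `successRules` and `failureRules` are left out because this reference does not list the fields of FinishRule.

```python
import json


def build_generic_job(name, image, replicas=1):
    """Return a GenericJob manifest as a plain dict (illustrative values only)."""
    return {
        "apiVersion": "batch.tensorstack.dev/v1beta1",
        "kind": "GenericJob",
        "metadata": {"name": name},
        "spec": {
            "cleanUpPolicy": "Unfinished",
            # Queue name and priority are placeholders.
            "scheduler": {"t9kScheduler": {"queue": "default", "priority": 50}},
            "replicaSpecs": [
                {
                    "type": "worker",
                    "replicas": replicas,
                    "restartPolicy": {"policy": "OnFailure", "limit": 3},
                    # Standard Kubernetes PodTemplateSpec.
                    "template": {
                        "spec": {
                            "containers": [
                                {"name": "main", "image": image,
                                 "command": ["python", "train.py"]}
                            ]
                        }
                    },
                }
            ],
        },
    }


def check_replica_specs(manifest):
    """Enforce the documented rule that a job defines at least one replica spec."""
    if not manifest["spec"].get("replicaSpecs"):
        raise ValueError("spec.replicaSpecs must contain at least one entry")
    return manifest


job = check_replica_specs(build_generic_job("demo-job", "example.com/demo:latest"))
print(json.dumps(job, indent=2))
```

The resulting JSON can be saved to a file and submitted with standard Kubernetes tooling, assuming the GenericJob custom resource is installed in the cluster.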
GenericJobStatus¶
GenericJobStatus defines the observed state of GenericJob
Appears in: - GenericJob
| Field | Description |
|---|---|
| `tasks` _Tasks array_ | An array of the statuses of individual tasks. |
| `phase` _JobPhase_ | Provides a simple, high-level summary of where the Job is in its lifecycle. Note that this is NOT intended to be a comprehensive state machine. |
| `aggregate` _Aggregate_ | Records the number of replicas at each phase. |
| `conditions` _JobCondition array_ | The latest available observations of a job's current state. |
JobCondition¶
JobCondition describes the current state of a job.
Appears in: - GenericJobStatus
| Field | Description |
|---|---|
| `type` _JobConditionType_ | Type of job condition. See JobConditionType for the possible values. |
| `status` _ConditionStatus_ | Status of the condition, one of True, False, Unknown. |
| `lastTransitionTime` _Time_ | Last time the condition transitioned from one status to another. |
| `reason` _string_ | Brief reason for the condition's last transition. |
| `message` _string_ | Human-readable message indicating details about the last transition. |
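
A client reading a job's status typically looks up a condition by type and checks whether its status is `True`. The sketch below shows one way to do that over a status dictionary; the `condition_is_true` helper and the sample values are hypothetical.

```python
def condition_is_true(status, condition_type):
    """Return True if the status has a condition of the given type
    whose status field is the string "True"."""
    for cond in status.get("conditions", []):
        if cond.get("type") == condition_type:
            return cond.get("status") == "True"
    # A missing condition is treated as not true.
    return False


# Sample status with made-up timestamps, reasons, and messages.
status = {
    "conditions": [
        {"type": "Running", "status": "False",
         "lastTransitionTime": "2024-01-01T00:10:00Z",
         "reason": "Finished", "message": "replicas are no longer running"},
        {"type": "Failed", "status": "True",
         "lastTransitionTime": "2024-01-01T00:10:00Z",
         "reason": "RuleFulfilled", "message": "a failure rule was fulfilled"},
    ]
}

print(condition_is_true(status, "Failed"))
```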
JobConditionType¶
Underlying type: string
JobConditionType defines all possible types of job condition. One of: Initialized, Running, ReplicaFailure, Completed, or Failed.
Appears in: - JobCondition
JobPhase¶
Underlying type: string
Appears in: - GenericJobStatus
PauseMode¶
PauseMode configures whether and how to start a job in pause mode.
Appears in: - RunMode
| Field | Description |
|---|---|
| `enabled` _boolean_ | Whether to enable pause mode. |
| `resumeSpecs` _ResumeSpec array_ | If provided, these specs override values for job replicas when resuming. |
ReplicaDebugSet¶
ReplicaDebugSet describes how to start replicas in debug mode.
Appears in: - DebugMode
| Field | Description |
|---|---|
| `type` _string_ | Replica type. |
| `skipInitContainer` _boolean_ | Skips creation of initContainer, if true. |
| `command` _string array_ | Entrypoint array. Optional: defaults to `["sleep", "inf"]`. |
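
The following sketch builds a `runMode` fragment that enables debug mode for a set of replica types, filling in the documented default command when none is given. The `debug_run_mode` helper is hypothetical; only the field names and the default command come from this reference.

```python
# Documented default entrypoint for replicas in debug mode.
DEFAULT_DEBUG_COMMAND = ["sleep", "inf"]


def debug_run_mode(replica_types, command=None, skip_init_container=False):
    """Build a spec.runMode fragment enabling debug mode (hypothetical helper)."""
    return {
        "debug": {
            "enabled": True,
            "replicaSpecs": [
                {
                    "type": replica_type,
                    "skipInitContainer": skip_init_container,
                    # Copy the list so callers cannot mutate the default.
                    "command": list(command or DEFAULT_DEBUG_COMMAND),
                }
                for replica_type in replica_types
            ],
        }
    }


print(debug_run_mode(["worker"]))
```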
ReplicaSpec¶
ReplicaSpec defines the desired state of replicas.
Appears in: - GenericJobSpec
| Field | Description |
|---|---|
| `type` _string_ | Replica type. |
| `replicas` _integer_ | The desired number of replicas of this replica type. Defaults to 1. |
| `restartPolicy` _RestartPolicy_ | Restart policy for replicas of this replica type. One of Always, OnFailure, Never. Optional: defaults to OnFailure. |
| `template` _PodTemplateSpec_ | Defines the template used to create pods. |
ReplicaStatus¶
ReplicaStatus defines the observed state of the pod.
Appears in: - Tasks
| Field | Description |
|---|---|
| `name` _string_ | Pod name. |
| `uid` _UID_ | Pod uid. |
| `phase` _PodPhase_ | Pod phase. The phase of a Pod is a simple, high-level summary of where the Pod is in its lifecycle. |
| `containers` _ContainerStatus array_ | Containers status. |
RestartPolicy¶
RestartPolicy describes how the replica should be restarted.
Appears in: - ReplicaSpec
| Field | Description |
|---|---|
| `policy` _RestartPolicyType_ | The policy used to restart a finished replica. |
| `limit` _integer_ | The maximum number of restarts. Optional: defaults to 0. |
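
One plausible reading of how `policy` and `limit` interact is sketched below. The policy names come from the ReplicaSpec description (Always, OnFailure, Never). The assumption that `limit` caps restarts under every policy, and the `should_restart` function itself, are not confirmed by this reference.

```python
def should_restart(policy, limit, restart_count, exit_code):
    """Decide whether a finished replica should be restarted.

    Assumption: `limit` is a hard cap on restarts regardless of policy.
    """
    if policy not in ("Always", "OnFailure", "Never"):
        raise ValueError(f"unknown restart policy: {policy!r}")
    if policy == "Never" or restart_count >= limit:
        return False
    if policy == "Always":
        return True
    # OnFailure: restart only after a non-zero exit code.
    return exit_code != 0


print(should_restart("OnFailure", limit=3, restart_count=1, exit_code=1))
```

Under this reading, the default `limit` of 0 means a replica is never restarted unless a higher limit is set explicitly.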
RestartPolicyType¶
Underlying type: string
Appears in: - RestartPolicy
ResumeSpec¶
ResumeSpec describes how to resume replicas from pause mode.
Appears in: - PauseMode
| Field | Description |
|---|---|
| `type` _string_ | Replica type. |
| `skipInitContainer` _boolean_ | Skips creation of initContainer, if true. |
| `command` _string array_ | Entrypoint array. Overrides the immediate-mode values if provided; otherwise, the values from immediate mode are used. |
| `args` _string_ | Arguments to the entrypoint. The immediate-mode arguments are used if not provided. |
RunMode¶
RunMode defines the job's execution behavior:
- Immediate mode (default): Tasks are executed immediately upon submission.
- Debug mode: Job pods are created, but regular executions are replaced with null operations (e.g., sleep) for convenient debugging.
- Pause mode: Job execution is halted, and pods are deleted to reclaim resources. A graceful pod termination process is initiated to allow pods to exit cleanly.
Appears in: - GenericJobSpec
| Field | Description |
|---|---|
| `debug` _DebugMode_ | Debug mode. |
| `pause` _PauseMode_ | Pause mode. |
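
The sketch below derives the effective mode from a `runMode` value, falling back to Immediate when neither mode is enabled. This reference does not say what happens when both `debug` and `pause` are enabled, so the hypothetical `effective_run_mode` helper rejects that combination rather than guessing a precedence.

```python
def effective_run_mode(run_mode):
    """Return "Immediate", "Debug", or "Pause" for a runMode dict (or None)."""
    run_mode = run_mode or {}
    debug = bool((run_mode.get("debug") or {}).get("enabled"))
    pause = bool((run_mode.get("pause") or {}).get("enabled"))
    if debug and pause:
        # Precedence is not documented; refuse instead of guessing.
        raise ValueError("both debug and pause are enabled")
    if pause:
        return "Pause"
    if debug:
        return "Debug"
    return "Immediate"


print(effective_run_mode({"debug": {"enabled": True}}))
```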
SchedulePolicy¶
SchedulePolicy signals to K8s how the job should be scheduled.
Appears in: - GenericJobSpec
| Field | Description |
|---|---|
| `t9kScheduler` _T9kScheduler_ | T9k Scheduler. TODO: link to t9k scheduler docs. |
ServiceOption¶
Details of the replicas' service.
Appears in: - GenericJobSpec
| Field | Description |
|---|---|
| `ports` _ServicePort array_ | The list of ports that are exposed by this service. |
T9kScheduler¶
T9kScheduler provides additional configurations needed for the scheduling process.
Appears in: - SchedulePolicy
| Field | Description |
|---|---|
| `queue` _string_ | Specifies the name of the queue that should be used for running this workload. TODO: link to t9k scheduler docs. |
| `priority` _integer_ | Indicates the priority of the PodGroup; valid range: [0, 100]. Optional: defaults to 0. |
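
A client can check the documented priority range before submitting a job. The sketch below builds a `scheduler` fragment and rejects out-of-range priorities; the `t9k_scheduler` helper and the requirement that the queue name be non-empty are assumptions.

```python
def t9k_scheduler(queue, priority=0):
    """Build a spec.scheduler fragment, validating priority against [0, 100]."""
    if not queue:
        raise ValueError("queue name must not be empty")  # assumed requirement
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise TypeError("priority must be an integer")
    if not 0 <= priority <= 100:
        raise ValueError("priority must be in the range [0, 100]")
    return {"t9kScheduler": {"queue": queue, "priority": priority}}


print(t9k_scheduler("default", priority=10))
```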
Tasks¶
Tasks defines the observed state of a task.
Appears in: - GenericJobStatus
| Field | Description |
|---|---|
| `type` _string_ | Replica type. |
| `restartCount` _integer_ | The number of restarts that have been performed. |
| `replicas` _ReplicaStatus array_ | Replicas status array. |
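
To show how the nested status types compose, the sketch below walks `status.tasks` and collects, per replica type, the restart count and the phase of each pod. The `summarize_tasks` helper and the sample data are hypothetical.

```python
def summarize_tasks(status):
    """Map each replica type to its restart count and per-pod phases."""
    summary = {}
    for task in status.get("tasks", []):
        replicas = task.get("replicas", [])
        summary[task["type"]] = {
            "restartCount": task.get("restartCount", 0),
            "pods": {r["name"]: r.get("phase") for r in replicas},
        }
    return summary


# Sample status with made-up pod names.
status = {
    "tasks": [
        {"type": "worker", "restartCount": 1,
         "replicas": [{"name": "demo-worker-0", "phase": "Running"},
                      {"name": "demo-worker-1", "phase": "Pending"}]},
    ]
}

print(summarize_tasks(status))
```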