API Reference
Packages
batch.tensorstack.dev/v1beta1
Package v1beta1 contains API Schema definitions for the batch v1beta1 API group
Resource Types
Aggregate
Aggregate records the number of replica pods at each phase.
Appears in: GenericJobStatus
| Field | Description |
|---|---|
| `creating` *integer* | Pod has been created, but resources have not been scheduled. |
| `pending` *integer* | Pod has been accepted by the system, but one or more of the containers has not been started. This includes time before being bound to a node, as well as time spent pulling images onto the host. |
| `running` *integer* | Pod has been bound to a node and all of the containers have been started. At least one container is still running or is in the process of being restarted. |
| `succeeded` *integer* | All containers in the pod have voluntarily terminated with a container exit code of 0, and the system is not going to restart any of these containers. |
| `failed` *integer* | All containers in the pod have terminated, and at least one container has terminated in failure (exited with a non-zero exit code or was stopped by the system). |
| `unknown` *integer* | The state of the pod could not be obtained, typically due to an error in communicating with the host of the pod. |
| `deleted` *integer* | Pod has been deleted. |
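As an illustration of how these counters relate to pod phases, the sketch below tallies a list of phase strings into a dictionary shaped like Aggregate. The helper name `aggregate_phases` and the case-insensitive matching of phase strings are assumptions made for this example; the controller's actual implementation is not described in this reference.

```python
from collections import Counter

# Field names taken from the Aggregate table above.
AGGREGATE_FIELDS = (
    "creating", "pending", "running", "succeeded",
    "failed", "unknown", "deleted",
)


def aggregate_phases(phases):
    """Count replica pods per phase (hypothetical helper, not part of the API)."""
    counts = Counter(phase.lower() for phase in phases)
    unexpected = set(counts) - set(AGGREGATE_FIELDS)
    if unexpected:
        raise ValueError(f"unexpected phases: {sorted(unexpected)}")
    # Always emit every field so that absent phases show up as 0.
    return {field: counts.get(field, 0) for field in AGGREGATE_FIELDS}
```

Calling `aggregate_phases(["Running", "Running", "Failed"])` yields a dictionary with `running` set to 2, `failed` set to 1, and every other field set to 0.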
CleanUpPolicy
Underlying type: string
CleanUpPolicy specifies the collection of replicas that are to be deleted upon job completion.
Appears in: GenericJobSpec
ContainerStatus
ContainerStatus defines the observed state of the container.
Appears in: ReplicaStatus
DebugMode
DebugMode configures whether and how to start a job in debug mode.
Appears in: RunMode
| Field | Description |
|---|---|
| `enabled` *boolean* | Whether to enable debug mode. |
| `replicaSpecs` *ReplicaDebugSet array* | If provided, these specs provide overwriting values for job replicas. |
FinishRule
A finishRule is a condition used to check if the job has finished. A finishRule identifies a set of replicas, and the controller determines the job’s status by checking the status of all of these replicas.
Appears in: GenericJobSpec
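The rule semantics described under `successRules` and `failureRules` in GenericJobSpec (a job finishes when any one rule is fulfilled, and a rule is fulfilled only when all of its replicas reach the target state) can be sketched as follows. This is a minimal illustration under stated assumptions: the fields of FinishRule are not listed in this reference, so each rule is modelled here as a plain list of replica names, and the function names, phase labels, and the choice to check success rules before failure rules are all hypothetical.

```python
def rule_fulfilled(rule_replicas, replica_phases, target_phase):
    """True when every replica named by the rule is in target_phase.

    rule_replicas: list of replica names (assumed representation of a rule).
    replica_phases: dict mapping replica name to its phase string.
    """
    return bool(rule_replicas) and all(
        replica_phases.get(name) == target_phase for name in rule_replicas
    )


def job_outcome(success_rules, failure_rules, replica_phases):
    """Return "Succeeded", "Failed", or None if no rule is fulfilled yet."""
    # Assumption: success rules are evaluated before failure rules.
    if any(rule_fulfilled(r, replica_phases, "Succeeded") for r in success_rules):
        return "Succeeded"
    if any(rule_fulfilled(r, replica_phases, "Failed") for r in failure_rules):
        return "Failed"
    return None
```

With phases `{"master-0": "Succeeded", "worker-0": "Running"}`, a success rule of `["master-0"]` is fulfilled, while a rule of `["master-0", "worker-0"]` is not, because one of its replicas is still running.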
GenericJob
GenericJob represents the schema for a general-purpose batch job API. While it offers less automation compared to specialized APIs like PyTorchTrainingJob, it allows for greater flexibility in specifying parallel replicas/pods. This design serves as a comprehensive job definition mechanism when more specialized APIs are not applicable or available.
Appears in: GenericJobList
| Field | Description |
|---|---|
| `apiVersion` *string* | `batch.tensorstack.dev/v1beta1` |
| `kind` *string* | `GenericJob` |
| `metadata` *ObjectMeta* | Refer to Kubernetes API documentation for fields of `metadata`. |
| `spec` *GenericJobSpec* | |
| `status` *GenericJobStatus* | |
GenericJobList
GenericJobList contains a list of GenericJob
| Field | Description |
|---|---|
| `apiVersion` *string* | `batch.tensorstack.dev/v1beta1` |
| `kind` *string* | `GenericJobList` |
| `metadata` *ListMeta* | Refer to Kubernetes API documentation for fields of `metadata`. |
| `items` *GenericJob array* | |
GenericJobSpec
GenericJobSpec defines the desired state of GenericJob
Appears in: GenericJob
| Field | Description |
|---|---|
| `successRules` *FinishRule array* | Rules used to check if a generic job has succeeded. The job succeeds when any one of the successRules is fulfilled. Each item of successRules may refer to a series of replicas, and the item is fulfilled only if all replicas in that series complete successfully. |
| `failureRules` *FinishRule array* | Rules used to check if a generic job has failed. The job fails when any one of the failureRules is fulfilled. Each item of failureRules refers to a series of replicas, and the item is fulfilled only if all replicas in that series fail. |
| `service` *ServiceOption* | Details of v1/Service for replica pods. Optional: Defaults to empty, in which case no service is created. |
| `runMode` *RunMode* | Job running mode. Defaults to Immediate mode. |
| `cleanUpPolicy` *CleanUpPolicy* | To avoid wasting resources on completed tasks, the controller reclaims resources according to one of the following policies: None (default): no resources are reclaimed; Unfinished: only unfinished pods are deleted; All: all pods are deleted. |
| `scheduler` *SchedulePolicy* | If specified, the pods are dispatched by the specified scheduler. Otherwise, the pods are dispatched by the default scheduler. |
| `replicaSpecs` *ReplicaSpec array* | List of replica specs belonging to the job. There must be at least one replica defined for a job. |
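To show how these fields fit together, the sketch below assembles a minimal GenericJob manifest as a Python dictionary and prints it as JSON. The job name, replica type, image, and command are placeholder values chosen for this example, and `build_generic_job` is a hypothetical helper. The `successRules` and `failureRules` fields are omitted because the fields of FinishRule are not listed in this reference; a real job may need them.

```python
import json


def build_generic_job(name, replica_type, replicas, image, command):
    """Assemble a minimal GenericJob manifest as a plain dict (illustrative only)."""
    return {
        "apiVersion": "batch.tensorstack.dev/v1beta1",
        "kind": "GenericJob",
        "metadata": {"name": name},
        "spec": {
            # At least one replica spec is required.
            "replicaSpecs": [
                {
                    "type": replica_type,
                    "replicas": replicas,
                    # Values match the documented defaults.
                    "restartPolicy": {"policy": "OnFailure", "limit": 0},
                    # Standard Kubernetes PodTemplateSpec.
                    "template": {
                        "spec": {
                            "containers": [
                                {
                                    "name": replica_type,
                                    "image": image,
                                    "command": command,
                                }
                            ]
                        }
                    },
                }
            ]
        },
    }


# Placeholder values; replace them with ones that fit your cluster.
manifest = build_generic_job("example-job", "worker", 2, "busybox", ["sleep", "60"])
print(json.dumps(manifest, indent=2))
```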
GenericJobStatus
GenericJobStatus defines the observed state of GenericJob
Appears in: GenericJob
| Field | Description |
|---|---|
| `tasks` *Tasks array* | An array of the statuses of individual tasks. |
| `phase` *JobPhase* | Provides a simple, high-level summary of where the job is in its lifecycle. Note that this is not intended to be a comprehensive state machine. |
| `aggregate` *Aggregate* | Records the number of replicas at each phase. |
| `conditions` *JobCondition array* | The latest available observations of a job’s current state. |
JobCondition
JobCondition describes the current state of a job.
Appears in: GenericJobStatus
| Field | Description |
|---|---|
| `type` *JobConditionType* | Type of job condition: Complete or Failed. |
| `status` *ConditionStatus* | Status of the condition, one of True, False, Unknown. |
| `lastTransitionTime` *Time* | Last time the condition transitioned from one status to another. |
| `reason` *string* | Brief reason for the condition’s last transition. |
| `message` *string* | Human-readable message indicating details about the last transition. |
JobConditionType
Underlying type: string
JobConditionType defines all possible types of JobStatus. Can be one of: Initialized, Running, ReplicaFailure, Completed, or Failed.
Appears in: JobCondition
JobPhase
Underlying type: string
Appears in: GenericJobStatus
PauseMode
PauseMode configures whether and how to start a job in pause mode.
Appears in: RunMode
| Field | Description |
|---|---|
| `enabled` *boolean* | Whether to enable pause mode. |
| `resumeSpecs` *ResumeSpec array* | If provided, these specs provide overwriting values for job replicas when resuming. |
ReplicaDebugSet
ReplicaDebugSet describes how to start replicas in debug mode.
Appears in: DebugMode
| Field | Description |
|---|---|
| `type` *string* | Replica type. |
| `skipInitContainer` *boolean* | Skips creation of initContainer, if true. |
| `command` *string* | Entrypoint array. Optional: Defaults to `["sleep", "inf"]`. |
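The defaulting behaviour of `command` can be sketched as a small lookup. This is an illustration only: `debug_command_for` is a hypothetical helper, and the assumption that replica types without a matching entry also fall back to the default command is not confirmed by this reference.

```python
# Default entrypoint stated in the ReplicaDebugSet table.
DEFAULT_DEBUG_COMMAND = ["sleep", "inf"]


def debug_command_for(replica_type, debug_sets):
    """Return the debug-mode entrypoint for a replica type (hypothetical helper).

    debug_sets: list of dicts shaped like ReplicaDebugSet.
    """
    for entry in debug_sets:
        if entry.get("type") == replica_type:
            # An explicit command overrides the default.
            return list(entry.get("command") or DEFAULT_DEBUG_COMMAND)
    # Assumption: types without an entry use the default command.
    return list(DEFAULT_DEBUG_COMMAND)
```

For example, with `debug_sets = [{"type": "worker", "command": ["bash"]}]`, the helper returns `["bash"]` for `worker` and the default command for any other replica type.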
ReplicaSpec
ReplicaSpec defines the desired state of replicas.
Appears in: GenericJobSpec
| Field | Description |
|---|---|
| `type` *string* | Replica type. |
| `replicas` *integer* | The desired number of replicas of this replica type. Defaults to 1. |
| `restartPolicy` *RestartPolicy* | Restart policy for replicas of this replica type. One of Always, OnFailure, Never. Optional: Defaults to OnFailure. |
| `template` *PodTemplateSpec* | Defines the template used to create pods. |
ReplicaStatus
ReplicaStatus defines the observed state of the pod.
Appears in: Tasks
| Field | Description |
|---|---|
| `name` *string* | Pod name. |
| `uid` *UID* | Pod UID. |
| `phase` *PodPhase* | Pod phase. The phase of a pod is a simple, high-level summary of where the pod is in its lifecycle. |
| `containers` *ContainerStatus array* | Container statuses. |
RestartPolicy
RestartPolicy describes how the replica should be restarted.
Appears in: ReplicaSpec
| Field | Description |
|---|---|
| `policy` *RestartPolicyType* | The policy for restarting finished replicas. |
| `limit` *integer* | The maximum number of restarts. Optional: Defaults to 0. |
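One way to read the interaction between `policy` and `limit` is sketched below. The policy values Always, OnFailure, and Never come from the ReplicaSpec table; everything else is an assumption made for illustration, including the helper name `should_restart` and the idea that `limit` caps restarts under every policy. The controller's real decision logic is not described in this reference.

```python
def should_restart(policy, limit, restart_count, exit_code):
    """Decide whether a finished replica would be restarted (hypothetical logic)."""
    if policy not in ("Always", "OnFailure", "Never"):
        raise ValueError(f"unknown restart policy: {policy!r}")
    # Assumption: the limit applies regardless of policy.
    if restart_count >= limit:
        return False
    if policy == "Always":
        return True
    if policy == "OnFailure":
        # A non-zero exit code counts as a failure.
        return exit_code != 0
    return False  # "Never"
```

Under this reading, the default `limit` of 0 means a replica is never restarted unless a higher limit is set explicitly.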
RestartPolicyType
Underlying type: string
Appears in: RestartPolicy
ResumeSpec
ResumeSpec describes how to resume replicas from pause mode.
Appears in: PauseMode
| Field | Description |
|---|---|
| `type` *string* | Replica type. |
| `skipInitContainer` *boolean* | Skips creation of initContainer, if true. |
| `command` *string* | Entrypoint array. Overrides the entrypoint if provided; otherwise, the values from immediate mode are used. |
| `args` *string* | Arguments to the entrypoint. If not provided, the arguments from immediate mode are used. |
RunMode
RunMode defines the job’s execution behavior:
- Immediate mode (default): Tasks are executed immediately upon submission.
- Debug mode: Job pods are created, but regular execution is replaced with null operations (e.g., sleep) for convenient debugging.
- Pause mode: Job execution is halted, and pods are deleted to reclaim resources. A graceful pod termination process is initiated to allow pods to exit cleanly.
Appears in: GenericJobSpec
SchedulePolicy
SchedulePolicy signals to K8s how the job should be scheduled.
Appears in: GenericJobSpec
| Field | Description |
|---|---|
| `t9kScheduler` *T9kScheduler* | T9k Scheduler. |
ServiceOption
Details of the replicas’ service.
Appears in: GenericJobSpec
| Field | Description |
|---|---|
| `ports` *ServicePort array* | The list of ports that are exposed by this service. |
T9kScheduler
T9kScheduler provides additional configurations needed for the scheduling process.
Appears in: SchedulePolicy
| Field | Description |
|---|---|
| `queue` *string* | Specifies the name of the queue that should be used for running this workload. |
| `priority` *integer* | Indicates the priority of the PodGroup; valid range: [0, 100]. Optional: Defaults to 0. |
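A client-side check of the documented `priority` range might look like the sketch below. `validate_priority` is a hypothetical helper; whether the API server itself rejects out-of-range values, and with what error, is not stated in this reference.

```python
def validate_priority(priority=0):
    """Check that a T9kScheduler priority lies in the documented range [0, 100]."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise TypeError("priority must be an integer")
    if not 0 <= priority <= 100:
        raise ValueError("priority must be in the range [0, 100]")
    return priority
```

Calling `validate_priority()` with no argument returns the documented default of 0.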
Tasks
Task defines the observed state of the task.
Appears in: GenericJobStatus
| Field | Description |
|---|---|
| `type` *string* | Replica type. |
| `restartCount` *integer* | The number of restarts that have been performed. |
| `replicas` *ReplicaStatus array* | Array of replica statuses. |