API Reference
Packages
batch.tensorstack.dev/v1beta1
Package v1beta1 contains API Schema definitions for the batch v1beta1 API group.
Resource Types
ColossalAIJob
ColossalAIJob is the Schema for the colossalaijobs API.
Appears in: ColossalAIJobList
Field | Description |
---|---|
apiVersion string | batch.tensorstack.dev/v1beta1 |
kind string | ColossalAIJob |
metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. |
spec ColossalAIJobSpec | |
status ColossalAIJobStatus | |
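
The top-level fields in the table above can be sketched as a plain Python dictionary that serializes to a manifest. This is a minimal illustration, not an official client: the helper `make_colossalai_job` and the job name `demo-job` are hypothetical, and `status` is left out on the assumption that, as observed state, it is populated by the controller rather than by the user.

```python
import json


def make_colossalai_job(name, spec):
    """Wrap a spec dict in the top-level ColossalAIJob fields."""
    return {
        "apiVersion": "batch.tensorstack.dev/v1beta1",
        "kind": "ColossalAIJob",
        "metadata": {"name": name},  # standard Kubernetes ObjectMeta field
        "spec": spec,
    }


# "demo-job" is a hypothetical name; the empty spec is a placeholder only.
job = make_colossalai_job("demo-job", {})
print(json.dumps(job, indent=2))
```

The resulting JSON is also valid YAML, so it can be written to a file and handed to whatever tooling submits manifests in your environment.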
ColossalAIJobList
ColossalAIJobList contains a list of ColossalAIJob.
Field | Description |
---|---|
apiVersion string | batch.tensorstack.dev/v1beta1 |
kind string | ColossalAIJobList |
metadata ListMeta | Refer to Kubernetes API documentation for fields of metadata. |
items ColossalAIJob array | |
ColossalAIJobSpec
ColossalAIJobSpec defines the configurations of a ColossalAI training job.
Appears in: ColossalAIJob
Field | Description |
---|---|
ssh SSHConfig | SSH configs. |
runMode RunMode | The desired running mode of the job. Defaults to Immediate. |
runPolicy RunPolicy | Controls the handling of completed replicas and other related processes. |
scheduler SchedulePolicy | Specifies the scheduler to request for resources. Defaults to cluster default scheduler. |
launcher Launcher | Specification for the launcher replica. |
worker Worker | Specification for the worker replicas. |
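
As a rough sketch of how the spec fields fit together, the helper below assembles a `spec` dictionary from a launcher and a worker section. The function name `make_spec` and the image names are hypothetical. `ssh` and `scheduler` are omitted so that the documented defaults apply; the structure of `SchedulePolicy` is not described in this section and is therefore not shown.

```python
def make_spec(launcher, worker, run_mode="Immediate", clean_up_workers=False):
    """Assemble a ColossalAIJobSpec-shaped dict from its documented fields."""
    return {
        "runMode": run_mode,  # "Immediate" is the documented default
        "runPolicy": {"cleanUpWorkers": clean_up_workers},
        "launcher": launcher,
        "worker": worker,
    }


# Image names below are hypothetical placeholders.
spec = make_spec(
    launcher={"image": "example.com/colossalai-launcher:latest"},
    worker={"replicas": 2, "procPerWorker": 1},
)
print(sorted(spec))
```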
ColossalAIJobStatus
ColossalAIJobStatus describes the observed state of ColossalAIJob.
Appears in: ColossalAIJob
Field | Description |
---|---|
tasks Tasks array | The statuses of individual tasks. |
aggregate Aggregate | The number of replicas in each phase. |
phase JobPhase | Provides a simple, high-level summary of where the Job is in its lifecycle. Note that this is NOT intended to be a comprehensive state machine. |
conditions JobCondition array | The latest available observations of an object's current state. |
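
A client that has already fetched a ColossalAIJob object as a dictionary might summarize its status along these lines. This is a sketch under assumptions: the helper `summarize_status` is hypothetical, the phase value `"Running"` is a placeholder (the set of valid `JobPhase` values is not listed in this section), and the inner structure of `tasks`, `aggregate`, and `conditions` entries is not inspected because it is not documented here.

```python
def summarize_status(job):
    """Return a small summary of a job's observed state.

    Tolerates a missing or empty status, as on a job that the
    controller has not yet processed.
    """
    status = job.get("status") or {}
    return {
        "phase": status.get("phase"),
        "num_tasks": len(status.get("tasks") or []),
        "num_conditions": len(status.get("conditions") or []),
    }


# "Running" is a hypothetical phase value used only for illustration.
sample = {"status": {"phase": "Running", "tasks": [{}, {}], "conditions": [{}]}}
print(summarize_status(sample))
print(summarize_status({}))
```

Since `phase` is described as a high-level summary rather than a state machine, code that needs detail would presumably consult `conditions` and `tasks` instead of branching on `phase` alone.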
Launcher
Specification of the launcher replica.
Appears in: ColossalAIJobSpec
Field | Description |
---|---|
image string | Container image name. |
workingDir string | Working directory of the launcher container. If not specified, the container runtime's default will be used, which might be configured in the container image. Cannot be updated. |
env EnvVar array | List of environment variables set for the container. Cannot be updated. |
resources ResourceRequirements | Compute Resources required by this container. Cannot be updated. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ |
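
The launcher fields can be illustrated with a small builder. The `env` entries follow the Kubernetes `EnvVar` shape (`name`, `value`) and `resources` follows `ResourceRequirements` (`requests`, `limits`). The helper `make_launcher`, the image name, the working directory, and the resource amounts are all hypothetical values chosen for the example.

```python
def make_launcher(image, working_dir=None, env=None, cpu="1", memory="1Gi"):
    """Build a Launcher-shaped dict; optional fields are only set when given."""
    launcher = {
        "image": image,
        "resources": {
            "requests": {"cpu": cpu, "memory": memory},
            "limits": {"cpu": cpu, "memory": memory},
        },
    }
    if working_dir is not None:
        launcher["workingDir"] = working_dir
    if env:
        # EnvVar values must be strings in Kubernetes manifests.
        launcher["env"] = [{"name": k, "value": str(v)} for k, v in env.items()]
    return launcher


# All concrete values here are placeholders for illustration.
launcher = make_launcher(
    "example.com/colossalai-launcher:latest",
    working_dir="/workspace",
    env={"LOG_LEVEL": "info", "SEED": 42},
)
print(launcher["env"])
```

Because `workingDir`, `env`, and `resources` are documented as "Cannot be updated", it is worth settling these values before the job is created.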
RunPolicy
RunPolicy dictates specific actions to be taken by the controller upon job completion.
Appears in: ColossalAIJobSpec
Field | Description |
---|---|
cleanUpWorkers boolean | Whether to clean up the worker replicas upon job completion. Defaults to false. |
SSHConfig
SSHConfig specifies various configurations for running the SSH daemon (sshd).
Appears in: ColossalAIJobSpec
Field | Description |
---|---|
authMountPath string | The directory where SSH keys are mounted. Defaults to "/root/.ssh". |
sshdPath string | The location of the sshd executable file. |
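
The defaulting behaviour of `authMountPath` can be mirrored on the client side as follows. This is only a sketch: the helper `effective_auth_mount_path` is hypothetical, and `/usr/sbin/sshd` is an assumed location for the sshd binary that depends on the container image. No default for `sshdPath` is documented above, so none is applied here.

```python
DEFAULT_AUTH_MOUNT_PATH = "/root/.ssh"  # documented default for authMountPath


def effective_auth_mount_path(ssh_config):
    """Return the key directory in effect for an SSHConfig-shaped dict."""
    return (ssh_config or {}).get("authMountPath") or DEFAULT_AUTH_MOUNT_PATH


# "/usr/sbin/sshd" is an assumption about the image, not a documented default.
ssh = {"sshdPath": "/usr/sbin/sshd"}
print(effective_auth_mount_path(ssh))
print(effective_auth_mount_path({"authMountPath": "/home/user/.ssh"}))
```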
Worker
Specification of the worker replicas.
Appears in: ColossalAIJobSpec
Field | Description |
---|---|
replicas integer | Number of replicas to launch. Defaults to 1. |
procPerWorker integer | The number of processes of a worker. Defaults to 1. |
command string array | Specifies the command used to start the workers. |
torchArgs string array | Args of torchrun. |
template PodTemplateSpec | Template defines the workers that will be created from this pod template. |
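
To make the relationship between `replicas` and `procPerWorker` concrete, the sketch below builds a worker section and computes the total number of training processes, assuming each worker replica starts `procPerWorker` processes. The helpers `make_worker` and `total_processes`, the image name, the container name `worker`, and the command are hypothetical. `--max-restarts` is an existing torchrun option, used here purely as an illustration of `torchArgs`.

```python
def make_worker(image, command, replicas=1, proc_per_worker=1, torch_args=None):
    """Build a Worker-shaped dict with a minimal pod template."""
    worker = {
        "replicas": replicas,
        "procPerWorker": proc_per_worker,
        "command": list(command),
        "template": {
            # Minimal PodTemplateSpec: one container with a name and an image.
            "spec": {"containers": [{"name": "worker", "image": image}]},
        },
    }
    if torch_args:
        worker["torchArgs"] = list(torch_args)
    return worker


def total_processes(worker):
    """replicas x procPerWorker, using the documented defaults of 1."""
    return worker.get("replicas", 1) * worker.get("procPerWorker", 1)


# Image and command are placeholders for illustration.
worker = make_worker(
    "example.com/colossalai-train:latest",
    command=["train.py"],
    replicas=4,
    proc_per_worker=8,
    torch_args=["--max-restarts=3"],
)
print(total_processes(worker))
```

With 4 replicas and 8 processes per worker, the sketch reports 32 processes; an empty worker section falls back to the documented defaults and yields 1.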