API Reference

Packages

batch.tensorstack.dev/v1beta1

Package v1beta1 contains API schema definitions for the batch v1beta1 API group.

Resource Types

ColossalAIJob

ColossalAIJob is the schema for the colossalaijobs API.

Appears in: ColossalAIJobList

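Field | Type | Description
apiVersion | string | batch.tensorstack.dev/v1beta1
kind | string | ColossalAIJob
metadata | ObjectMeta | Refer to the Kubernetes API documentation for the fields of metadata.
spec | ColossalAIJobSpec | Configuration of the ColossalAI training job.
status | ColossalAIJobStatus | Observed state of the job.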
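A ColossalAIJob is a Kubernetes custom resource, so a manifest carries the top-level fields listed above. The Python sketch below assembles one as a plain dictionary and serializes it to JSON, a format `kubectl apply -f` also accepts. The helper name `make_colossalai_job`, the job name and the namespace are hypothetical placeholders, and the spec is left empty here; its contents are described under ColossalAIJobSpec.

```python
import json


def make_colossalai_job(name: str, namespace: str, spec: dict) -> dict:
    """Wrap a ColossalAIJobSpec dict in the top-level ColossalAIJob envelope.

    Hypothetical helper for illustration only.
    """
    return {
        "apiVersion": "batch.tensorstack.dev/v1beta1",
        "kind": "ColossalAIJob",
        # Standard Kubernetes ObjectMeta; only name and namespace are set here.
        "metadata": {"name": name, "namespace": namespace},
        # status describes observed state, so a client does not set it.
        "spec": spec,
    }


if __name__ == "__main__":
    # "example-job" and "default" are placeholder values.
    job = make_colossalai_job("example-job", "default", spec={})
    print(json.dumps(job, indent=2))
```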

ColossalAIJobList

ColossalAIJobList contains a list of ColossalAIJob.

Field | Type | Description
apiVersion | string | batch.tensorstack.dev/v1beta1
kind | string | ColossalAIJobList
metadata | ListMeta | Refer to the Kubernetes API documentation for the fields of metadata.
items | ColossalAIJob array | The ColossalAIJob objects in the list.

ColossalAIJobSpec

ColossalAIJobSpec defines the configurations of a ColossalAI training job.

Appears in: ColossalAIJob

Field | Type | Description
ssh | SSHConfig | SSH configuration.
runMode | RunMode | The desired running mode of the job. Defaults to Immediate.
runPolicy | RunPolicy | Controls the handling of completed replicas and other related processes.
scheduler | SchedulePolicy | Specifies the scheduler used to request resources. Defaults to the cluster's default scheduler.
launcher | Launcher | Specification for the launcher replica.
worker | Worker | Specification for the worker replicas.
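To show how the spec fields nest, here is a minimal sketch of a ColossalAIJobSpec written as a Python dictionary. The image name, replica counts and the container name `worker` inside the pod template are assumptions, not values taken from this reference. `scheduler` is omitted so the cluster's default scheduler applies, and `command` and `torchArgs` are left out because this reference does not describe how they combine with torchrun.

```python
import json

# A hypothetical ColossalAIJobSpec. Image names and counts are placeholders.
spec = {
    "runMode": "Immediate",                  # the documented default
    "runPolicy": {"cleanUpWorkers": True},   # false when omitted
    "ssh": {"authMountPath": "/root/.ssh"},  # the documented default
    "launcher": {
        "image": "example.com/colossalai:latest",  # placeholder image
    },
    "worker": {
        "replicas": 2,
        "procPerWorker": 4,
        # A standard Kubernetes PodTemplateSpec; the container name is assumed.
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "worker",
                        "image": "example.com/colossalai:latest",
                    }
                ]
            }
        },
    },
}

print(json.dumps(spec, indent=2))
```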

ColossalAIJobStatus

ColossalAIJobStatus describes the observed state of ColossalAIJob.

Appears in: ColossalAIJob

Field | Type | Description
tasks | Tasks array | The statuses of individual tasks.
aggregate | Aggregate | The number of replicas in each phase.
phase | JobPhase | A simple, high-level summary of where the job is in its lifecycle. Note that this is NOT intended to be a comprehensive state machine.
conditions | JobCondition array | The latest available observations of the object's current state.

Launcher

Specification of the launcher replica.

Appears in: ColossalAIJobSpec

Field | Type | Description
image | string | Container image name.
workingDir | string | Working directory of the launcher container. If not specified, the container runtime's default is used, which may be configured in the container image. Cannot be updated.
env | EnvVar array | List of environment variables to set in the container. Cannot be updated.
resources | ResourceRequirements | Compute resources required by this container. Cannot be updated. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
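The `env` and `resources` fields reuse the standard Kubernetes EnvVar (`name`/`value`) and ResourceRequirements (`requests`/`limits`) shapes. The sketch below builds a launcher entry in Python; the helper `to_env_vars`, the image, the working directory and the resource amounts are all hypothetical placeholders chosen for illustration.

```python
import json
from typing import Dict, List


def to_env_vars(values: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert a plain mapping into a Kubernetes EnvVar array (hypothetical helper)."""
    return [{"name": name, "value": value} for name, value in values.items()]


launcher = {
    "image": "example.com/colossalai:latest",  # placeholder image
    "workingDir": "/workspace",                # placeholder; runtime default if omitted
    "env": to_env_vars({"PYTHONUNBUFFERED": "1"}),
    # Standard ResourceRequirements; the quantities are placeholders.
    "resources": {
        "requests": {"cpu": "1", "memory": "2Gi"},
        "limits": {"cpu": "1", "memory": "2Gi"},
    },
}

print(json.dumps(launcher, indent=2))
```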

RunPolicy

RunPolicy dictates specific actions to be taken by the controller upon job completion.

Appears in: ColossalAIJobSpec

Field | Type | Description
cleanUpWorkers | boolean | Whether to clean up the worker replicas upon job completion. Defaults to false.

SSHConfig

SSHConfig specifies various configurations for running the SSH daemon (sshd).

Appears in: ColossalAIJobSpec

Field | Type | Description
authMountPath | string | Directory where the SSH keys are mounted. Defaults to "/root/.ssh".
sshdPath | string | Path to the sshd executable.
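The only documented SSHConfig default is `authMountPath`. The sketch below mirrors that default on the client side for illustration; in practice the controller applies its own defaults. The function name `resolve_ssh_config` is hypothetical, and because `sshdPath` has no documented default, the sketch does not guess one and leaves it untouched.

```python
from typing import Optional

# Documented default for SSHConfig.authMountPath.
DEFAULT_AUTH_MOUNT_PATH = "/root/.ssh"


def resolve_ssh_config(ssh: Optional[dict]) -> dict:
    """Return a copy of an SSHConfig dict with the documented default applied.

    Hypothetical helper; sshdPath is passed through only if the caller set it.
    """
    resolved = dict(ssh or {})
    resolved.setdefault("authMountPath", DEFAULT_AUTH_MOUNT_PATH)
    return resolved
```

For example, `resolve_ssh_config(None)` returns `{"authMountPath": "/root/.ssh"}`, while an explicitly set path is kept as given.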

Worker

Specification of the worker replicas.

Appears in: ColossalAIJobSpec

Field | Type | Description
replicas | integer | Number of worker replicas to launch. Defaults to 1.
procPerWorker | integer | Number of processes per worker. Defaults to 1.
command | string array | Command used to start the workers.
torchArgs | string array | Arguments passed to torchrun.
template | PodTemplateSpec | Pod template from which the workers are created.
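Combining `replicas` and `procPerWorker` with their documented defaults gives a quick estimate of how many training processes a job starts. The sketch below assumes every worker replica runs exactly `procPerWorker` processes; that assumption, and the function name `total_processes`, are illustrative rather than taken from this reference.

```python
from typing import Optional


def total_processes(worker: Optional[dict]) -> int:
    """Estimate the total number of training processes for a Worker spec.

    Applies the documented defaults (replicas=1, procPerWorker=1) and assumes
    each replica runs procPerWorker processes. Hypothetical helper.
    """
    worker = worker or {}
    replicas = worker.get("replicas", 1)
    per_worker = worker.get("procPerWorker", 1)
    return replicas * per_worker
```

For example, `total_processes({"replicas": 2, "procPerWorker": 4})` returns 8, and an empty Worker spec yields 1.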