API Reference
Packages
batch.tensorstack.dev/v1beta1
Package v1beta1 contains API Schema definitions for the batch v1beta1 API group
Resource Types
ColossalAIJob
ColossalAIJob is the Schema for the colossalaijobs API
Appears in: - ColossalAIJobList
| Field | Description |
|---|---|
| apiVersion string | batch.tensorstack.dev/v1beta1 |
| kind string | ColossalAIJob |
| metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. |
| spec ColossalAIJobSpec | |
| status ColossalAIJobStatus | |
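
As an illustration of the top-level fields above, the following minimal sketch builds a ColossalAIJob manifest as a Python dictionary and serializes it to JSON, which kubectl accepts in addition to YAML. The helper `new_colossalai_job` and the job name `demo-job` are hypothetical and not part of the API. The `status` field is left out because it describes observed state rather than desired configuration.

```python
import json

API_VERSION = "batch.tensorstack.dev/v1beta1"
KIND = "ColossalAIJob"


def new_colossalai_job(name, spec=None):
    """Return a ColossalAIJob manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": API_VERSION,
        "kind": KIND,
        "metadata": {"name": name},  # ObjectMeta; only the name is set here
        "spec": spec if spec is not None else {},
    }


# "demo-job" is a placeholder name chosen for this example.
manifest = new_colossalai_job("demo-job")
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and submitted with `kubectl apply -f`, assuming the ColossalAIJob CRD is installed in the cluster and the spec has been filled in.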
ColossalAIJobList
ColossalAIJobList contains a list of ColossalAIJob.
| Field | Description |
|---|---|
| apiVersion string | batch.tensorstack.dev/v1beta1 |
| kind string | ColossalAIJobList |
| metadata ListMeta | Refer to Kubernetes API documentation for fields of metadata. |
| items ColossalAIJob array | |
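
A ColossalAIJobList is typically consumed by iterating over `items`. The sketch below extracts job names from a list object held as a Python dictionary, for example after parsing JSON returned by the API server. The function `job_names` and the sample data are hypothetical.

```python
def job_names(job_list):
    """Return metadata.name for every ColossalAIJob in a ColossalAIJobList dict."""
    if job_list.get("kind") != "ColossalAIJobList":
        raise ValueError("expected a ColossalAIJobList object")
    return [item["metadata"]["name"] for item in job_list.get("items", [])]


# Hypothetical sample data shaped like the table above.
sample_list = {
    "apiVersion": "batch.tensorstack.dev/v1beta1",
    "kind": "ColossalAIJobList",
    "metadata": {},
    "items": [
        {"kind": "ColossalAIJob", "metadata": {"name": "job-a"}},
        {"kind": "ColossalAIJob", "metadata": {"name": "job-b"}},
    ],
}

print(job_names(sample_list))
```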
ColossalAIJobSpec
ColossalAIJobSpec defines the configurations of a ColossalAI training job.
Appears in: - ColossalAIJob
| Field | Description |
|---|---|
| ssh SSHConfig | SSH configuration. |
| runMode RunMode | The desired running mode of the job. Defaults to Immediate. |
| runPolicy RunPolicy | Controls the handling of completed replicas and other related processes. |
| scheduler SchedulePolicy | Specifies the scheduler to request resources from. Defaults to the cluster's default scheduler. |
| launcher Launcher | Specification for the launcher replica. |
| worker Worker | Specification for the worker replicas. |
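
To show how these fields fit together, here is a minimal sketch of a spec assembled as a Python dictionary, with a small check that flags keys not listed in the table above. The helper `unknown_spec_fields`, the image names, and the command are hypothetical placeholders. `runMode` and `scheduler` are omitted so that their documented defaults apply.

```python
import json

# Top-level fields documented for ColossalAIJobSpec.
SPEC_FIELDS = {"ssh", "runMode", "runPolicy", "scheduler", "launcher", "worker"}


def unknown_spec_fields(spec):
    """Return the spec keys that are not documented ColossalAIJobSpec fields."""
    return sorted(set(spec) - SPEC_FIELDS)


# All images and commands below are hypothetical placeholders.
spec = {
    "ssh": {"authMountPath": "/root/.ssh"},
    "runPolicy": {"cleanUpWorkers": True},
    "launcher": {"image": "registry.example.com/colossalai:latest"},
    "worker": {
        "replicas": 2,
        "procPerWorker": 1,
        "command": ["train.py"],
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "worker",
                        "image": "registry.example.com/colossalai:latest",
                    }
                ]
            }
        },
    },
}

print(unknown_spec_fields(spec))
print(json.dumps(spec, indent=2))
```

A check like this catches misspelled keys such as `workers` before the manifest is submitted. It only inspects top-level names and does not validate nested values.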
ColossalAIJobStatus
ColossalAIJobStatus describes the observed state of ColossalAIJob.
Appears in: - ColossalAIJob
| Field | Description |
|---|---|
| tasks Tasks array | The statuses of individual tasks. |
| aggregate Aggregate | The number of replicas in each phase. |
| phase JobPhase | Provides a simple, high-level summary of where the Job is in its lifecycle. Note that this is NOT intended to be a comprehensive state machine. |
| conditions JobCondition array | The latest available observations of an object's current state. |
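
The status fields can be read from a job object once it has been fetched and parsed. The following sketch summarizes a status held as a Python dictionary. The function `summarize_status` is hypothetical, and the sample phase value `"Running"` is an assumption made for illustration, since the JobPhase values are not listed in this section. The contents of `tasks` and `conditions` are left empty because their fields are not described here.

```python
def summarize_status(job):
    """Return the phase plus task and condition counts from a ColossalAIJob dict."""
    status = job.get("status") or {}
    return {
        "phase": status.get("phase"),
        "task_count": len(status.get("tasks") or []),
        "condition_count": len(status.get("conditions") or []),
    }


# Hypothetical sample; "Running" is an assumed phase value.
sample_job = {
    "kind": "ColossalAIJob",
    "status": {"phase": "Running", "tasks": [{}, {}], "conditions": [{}]},
}

print(summarize_status(sample_job))
```

Because `phase` is only a high-level summary, code that needs detail should inspect `tasks` and `conditions` instead of relying on the phase alone.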
Launcher
Specification of the launcher replica.
Appears in: - ColossalAIJobSpec
| Field | Description |
|---|---|
| image string | Container image name. |
| workingDir string | Working directory of the launcher container. If not specified, the container runtime's default will be used, which might be configured in the container image. Cannot be updated. |
| env EnvVar array | List of environment variables set for the container. Cannot be updated. |
| resources ResourceRequirements | Compute resources required by this container. Cannot be updated. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ |
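
The `env` and `resources` fields use the standard Kubernetes EnvVar and ResourceRequirements shapes. The sketch below builds a launcher section as a Python dictionary. The helper `env_vars`, the image name, the working directory, the variable names, and the resource amounts are all hypothetical.

```python
def env_vars(mapping):
    """Convert a {name: value} dict into a Kubernetes EnvVar array."""
    # EnvVar values must be strings, so non-string values are converted.
    return [{"name": name, "value": str(value)} for name, value in mapping.items()]


# Image, paths, variables, and amounts are placeholders for illustration.
launcher = {
    "image": "registry.example.com/colossalai:latest",
    "workingDir": "/workspace",
    "env": env_vars({"LOG_LEVEL": "info", "EPOCHS": 3}),
    "resources": {
        "requests": {"cpu": "1", "memory": "1Gi"},
        "limits": {"cpu": "2", "memory": "2Gi"},
    },
}

print(launcher["env"])
```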
RunPolicy
RunPolicy dictates specific actions to be taken by the controller upon job completion.
Appears in: - ColossalAIJobSpec
| Field | Description |
|---|---|
| cleanUpWorkers boolean | Defaults to false. |
SSHConfig
SSHConfig specifies various configurations for running the SSH daemon (sshd).
Appears in: - ColossalAIJobSpec
| Field | Description |
|---|---|
| authMountPath string | The directory where SSH keys are mounted. Defaults to "/root/.ssh". |
| sshdPath string | The location of the sshd executable file. |
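
As a small illustration of the documented default, the sketch below fills in `authMountPath` when it is not set. The function `effective_ssh_config` is hypothetical, and the `sshdPath` value `/usr/sbin/sshd` is only an example location; no default for `sshdPath` is stated in this section, so none is applied.

```python
DEFAULT_AUTH_MOUNT_PATH = "/root/.ssh"  # documented default for authMountPath


def effective_ssh_config(ssh=None):
    """Return a copy of the ssh section with the documented default applied."""
    config = dict(ssh or {})
    config.setdefault("authMountPath", DEFAULT_AUTH_MOUNT_PATH)
    return config


# "/usr/sbin/sshd" is an example path, not a documented default.
print(effective_ssh_config({"sshdPath": "/usr/sbin/sshd"}))
```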
Worker
Specification of the worker replicas.
Appears in: - ColossalAIJobSpec
| Field | Description |
|---|---|
| replicas integer | Number of replicas to launch. Defaults to 1. |
| procPerWorker integer | Number of processes per worker. Defaults to 1. |
| command string array | Specifies the command used to start the workers. |
| torchArgs string array | Arguments passed to torchrun. |
| template PodTemplateSpec | Pod template from which the workers are created. |
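
The `replicas` and `procPerWorker` fields together determine how many training processes a job runs. The sketch below computes that total, assuming every worker runs the same number of processes, and applies the documented defaults of 1 when a field is omitted. The function `total_processes`, the command, and the image name are hypothetical.

```python
def total_processes(worker):
    """Return replicas * procPerWorker, using the documented defaults of 1."""
    replicas = worker.get("replicas", 1)
    proc_per_worker = worker.get("procPerWorker", 1)
    return replicas * proc_per_worker


# Command and image are placeholders; torchArgs is left empty here.
worker = {
    "replicas": 4,
    "procPerWorker": 2,
    "command": ["train.py"],
    "torchArgs": [],
    "template": {
        "spec": {
            "containers": [
                {"name": "worker", "image": "registry.example.com/colossalai:latest"}
            ]
        }
    },
}

print(total_processes(worker))
```

With 4 replicas and 2 processes per worker, the example describes 8 processes in total. An empty worker section falls back to a single process.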