API Reference
Packages
batch.tensorstack.dev/v1beta1
Package v1beta1 contains API schema definitions for the batch v1beta1 API group.
Resource Types
ColossalAIJob
ColossalAIJob is the Schema for the colossalaijobs API
Appears in:
- ColossalAIJobList
Field | Description |
---|---|
apiVersion string | batch.tensorstack.dev/v1beta1 |
kind string | ColossalAIJob |
metadata ObjectMeta | Refer to the Kubernetes API documentation for the fields of metadata. |
spec ColossalAIJobSpec | Configuration of the ColossalAI training job. |
status ColossalAIJobStatus | Observed state of the ColossalAIJob. |
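As an illustration, the sketch below assembles a minimal ColossalAIJob manifest as a Python dict using the field names from the tables in this reference. The helper `make_colossalai_job`, the job name, the image, and the command are hypothetical placeholders, and this reference does not state which fields are required, so treat it as a starting point rather than a validated manifest. The contents of the worker `template` (a single container named `worker`) are likewise an assumption.

```python
import json


def make_colossalai_job(name, image, command, replicas=1, proc_per_worker=1):
    """Return a ColossalAIJob manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "batch.tensorstack.dev/v1beta1",
        "kind": "ColossalAIJob",
        "metadata": {"name": name},
        "spec": {
            # In this sketch the launcher replica only specifies an image.
            "launcher": {"image": image},
            "worker": {
                "replicas": replicas,
                "procPerWorker": proc_per_worker,
                "command": command,
                # Assumption: the worker pod template carries the training
                # container; its required contents are not documented here.
                "template": {
                    "spec": {
                        "containers": [{"name": "worker", "image": image}]
                    }
                },
            },
        },
    }


job = make_colossalai_job(
    name="demo-job",                        # placeholder name
    image="example.com/colossalai:latest",  # placeholder image
    command=["python", "train.py"],         # placeholder command
    replicas=2,
    proc_per_worker=4,
)
print(json.dumps(job, indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed output could be saved to a file and submitted with `kubectl apply -f`.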
ColossalAIJobList
ColossalAIJobList contains a list of ColossalAIJob objects.
Field | Description |
---|---|
apiVersion string | batch.tensorstack.dev/v1beta1 |
kind string | ColossalAIJobList |
metadata ListMeta | Refer to the Kubernetes API documentation for the fields of metadata. |
items ColossalAIJob array | The ColossalAIJob objects in the list. |
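Tools that list jobs typically receive a ColossalAIJobList and read each item's `status.phase`. The following is a minimal sketch with a hypothetical helper `jobs_by_phase`; the phase value `Running` in the sample data is a placeholder, since this reference does not enumerate the JobPhase values, and `<none>` is a label chosen here for items without a reported status.

```python
from collections import defaultdict


def jobs_by_phase(job_list):
    """Group the job names in a ColossalAIJobList dict by status.phase."""
    groups = defaultdict(list)
    for job in job_list.get("items") or []:
        # A job the controller has not reported on yet may lack a status;
        # "<none>" is a placeholder label used only by this sketch.
        phase = (job.get("status") or {}).get("phase", "<none>")
        groups[phase].append(job["metadata"]["name"])
    return dict(groups)


# Hypothetical list; "Running" stands in for a real JobPhase value.
job_list = {
    "apiVersion": "batch.tensorstack.dev/v1beta1",
    "kind": "ColossalAIJobList",
    "metadata": {},
    "items": [
        {"metadata": {"name": "job-a"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "job-b"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "job-c"}},
    ],
}
print(jobs_by_phase(job_list))  # {'Running': ['job-a', 'job-b'], '<none>': ['job-c']}
```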
ColossalAIJobSpec
ColossalAIJobSpec defines the configuration of a ColossalAI training job.
Appears in:
- ColossalAIJob
Field | Description |
---|---|
ssh SSHConfig | SSH configuration. |
runMode RunMode | The desired running mode of the job. Defaults to Immediate. |
runPolicy RunPolicy | Controls the handling of completed replicas and other related processes. |
scheduler SchedulePolicy | Specifies the scheduler to request resources from. Defaults to the cluster's default scheduler. |
launcher Launcher | Specification of the launcher replica. |
worker Worker | Specification of the worker replicas. |
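The defaults documented in this reference (runMode here, plus those of the nested RunPolicy, SSHConfig, and Worker types) can be mirrored client-side, for example to show the effective configuration before submitting a job. This sketch assumes the controller applies the same defaults server-side; `with_documented_defaults` is a hypothetical helper, and the scheduler default is omitted because it depends on the cluster.

```python
import copy


def with_documented_defaults(spec):
    """Return a copy of a ColossalAIJobSpec dict with documented defaults filled in.

    Only mirrors the defaults listed in this reference; the controller is
    assumed to apply them itself, so this is for client-side inspection only.
    """
    out = copy.deepcopy(spec)  # never mutate the caller's spec
    out.setdefault("runMode", "Immediate")
    out.setdefault("runPolicy", {}).setdefault("cleanUpWorkers", False)
    out.setdefault("ssh", {}).setdefault("authMountPath", "/root/.ssh")
    worker = out.setdefault("worker", {})
    worker.setdefault("replicas", 1)
    worker.setdefault("procPerWorker", 1)
    return out


spec = {"worker": {"replicas": 4}}
filled = with_documented_defaults(spec)
print(filled["runMode"], filled["worker"])  # Immediate {'replicas': 4, 'procPerWorker': 1}
```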
ColossalAIJobStatus
ColossalAIJobStatus describes the observed state of a ColossalAIJob.
Appears in:
- ColossalAIJob
Field | Description |
---|---|
tasks Tasks array | The statuses of individual tasks. |
aggregate Aggregate | The number of replicas in each phase. |
phase JobPhase | A simple, high-level summary of where the job is in its lifecycle. Note that this is NOT intended to be a comprehensive state machine. |
conditions JobCondition array | The latest available observations of the object's current state. |
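Since `phase` is only a high-level summary, a client that needs more detail can also inspect `conditions`. This sketch assumes each JobCondition has the usual Kubernetes `type` and `status` string fields, with `status` being "True", "False", or "Unknown"; the helper `summarize_status` and the phase and condition types in the sample are hypothetical.

```python
def summarize_status(status):
    """Return (phase, types of conditions whose status is "True").

    Assumption: JobCondition entries follow the common Kubernetes shape
    {"type": ..., "status": "True" | "False" | "Unknown", ...}.
    """
    phase = status.get("phase")
    true_types = [
        cond["type"]
        for cond in status.get("conditions") or []
        if cond.get("status") == "True"
    ]
    return phase, true_types


# Hypothetical status; phase and condition type names are placeholders.
status = {
    "phase": "Running",
    "conditions": [
        {"type": "Created", "status": "True"},
        {"type": "Running", "status": "True"},
        {"type": "Succeeded", "status": "False"},
    ],
}
print(summarize_status(status))  # ('Running', ['Created', 'Running'])
```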
Launcher
Specification of the launcher replica.
Appears in:
- ColossalAIJobSpec
Field | Description |
---|---|
image string | Container image name. |
workingDir string | Working directory of the launcher container. If not specified, the container runtime's default is used, which might be configured in the container image. Cannot be updated. |
env EnvVar array | List of environment variables to set in the container. Cannot be updated. |
resources ResourceRequirements | Compute resources required by this container. Cannot be updated. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ |
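A Launcher entry can be generated from ordinary Python values. The sketch below converts a mapping into Kubernetes EnvVar objects (`name`/`value` pairs) and turns `cpu`/`memory` quantities into a ResourceRequirements object. The helper `make_launcher`, the image, and the path are hypothetical, and setting requests equal to limits is a choice made for this example, not a requirement of the API.

```python
def make_launcher(image, env=None, working_dir=None, cpu=None, memory=None):
    """Build a Launcher dict (hypothetical helper).

    env is a plain mapping converted to Kubernetes EnvVar objects; cpu and
    memory, if given, are used as both requests and limits.
    """
    launcher = {"image": image}
    if working_dir is not None:
        launcher["workingDir"] = working_dir
    if env:
        # EnvVar values are strings in the Kubernetes API.
        launcher["env"] = [{"name": k, "value": str(v)} for k, v in env.items()]
    quantities = {}
    if cpu is not None:
        quantities["cpu"] = cpu
    if memory is not None:
        quantities["memory"] = memory
    if quantities:
        launcher["resources"] = {
            "requests": dict(quantities),
            "limits": dict(quantities),
        }
    return launcher


launcher = make_launcher(
    "example.com/colossalai:latest",  # placeholder image
    env={"OMP_NUM_THREADS": 1},
    working_dir="/workspace",         # placeholder path
    cpu="1",
    memory="2Gi",
)
print(launcher)
```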
RunPolicy
RunPolicy dictates specific actions to be taken by the controller upon job completion.
Appears in:
- ColossalAIJobSpec
Field | Description |
---|---|
cleanUpWorkers boolean | Whether the controller cleans up the worker replicas upon job completion. Defaults to false. |
SSHConfig
SSHConfig specifies the configuration for running the SSH daemon (sshd).
Appears in:
- ColossalAIJobSpec
Field | Description |
---|---|
authMountPath string | The directory where SSH keys are mounted. Defaults to "/root/.ssh". |
sshdPath string | The location of the sshd executable. |
Worker
Specification of the worker replicas.
Appears in:
- ColossalAIJobSpec
Field | Description |
---|---|
replicas integer | Number of replicas to launch. Defaults to 1. |
procPerWorker integer | The number of processes per worker. Defaults to 1. |
command string array | Specifies the command used to start the workers. |
torchArgs string array | Arguments passed to torchrun. |
template PodTemplateSpec | Template from which the worker pods are created. |
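Assuming each of the `replicas` workers starts `procPerWorker` processes, the total number of training processes is their product, with both fields defaulting to 1. The helper below is a hypothetical sketch of that arithmetic. The `torchArgs` entry is an illustrative torchrun flag; how the controller combines `torchArgs` with `command` is not described in this reference.

```python
def expected_process_count(worker):
    """Total processes across all workers, using the documented defaults of 1.

    Assumption: every worker replica starts procPerWorker processes.
    """
    replicas = worker.get("replicas", 1)
    proc_per_worker = worker.get("procPerWorker", 1)
    # Sanity check added by this sketch; not a documented API constraint.
    if replicas < 1 or proc_per_worker < 1:
        raise ValueError("replicas and procPerWorker must be positive")
    return replicas * proc_per_worker


worker = {
    "replicas": 2,
    "procPerWorker": 4,
    "torchArgs": ["--max-restarts=3"],  # illustrative torchrun flag
}
print(expected_process_count(worker))  # 8
```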