API Reference

Packages

batch.tensorstack.dev/v1beta1

Package v1beta1 contains API schema definitions for the batch v1beta1 API group.

Resource Types

ColossalAIJob

ColossalAIJob is the schema for the colossalaijobs API.

Appears in: - ColossalAIJobList

Field Description
apiVersion string batch.tensorstack.dev/v1beta1
kind string ColossalAIJob
metadata ObjectMeta Refer to Kubernetes API documentation for fields of metadata.
spec ColossalAIJobSpec
status ColossalAIJobStatus

ColossalAIJobList

ColossalAIJobList contains a list of ColossalAIJob.

Field Description
apiVersion string batch.tensorstack.dev/v1beta1
kind string ColossalAIJobList
metadata ListMeta Refer to Kubernetes API documentation for fields of metadata.
items ColossalAIJob array

ColossalAIJobSpec

ColossalAIJobSpec defines the configurations of a ColossalAI training job.

Appears in: - ColossalAIJob

Field Description
ssh SSHConfig SSH configs.
runMode RunMode The desired running mode of the job, defaults to Immediate.
runPolicy RunPolicy Controls the handling of completed replicas and other related processes.
scheduler SchedulePolicy Specifies the scheduler to request for resources. Defaults to cluster default scheduler.
launcher Launcher Specification for the launcher replica.
worker Worker Specification for the worker replicas.
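
The fields above might be combined into a manifest along the following lines. This is only a sketch: the object name, image, container name, and training script are placeholders rather than values from this reference. `runMode` and `scheduler` are left out because the structure of their types is not described in this section.

```yaml
# Hypothetical ColossalAIJob manifest; names, images, and paths are placeholders.
apiVersion: batch.tensorstack.dev/v1beta1
kind: ColossalAIJob
metadata:
  name: colossalai-example            # placeholder name
spec:
  ssh:
    authMountPath: /root/.ssh         # documented default
  runPolicy:
    cleanUpWorkers: true              # documented default is false
  launcher:
    image: registry.example.com/colossalai:latest   # placeholder image
  worker:
    replicas: 2
    procPerWorker: 1
    command:
      - train.py                      # placeholder training script
    template:
      spec:
        containers:
          - name: worker              # placeholder container name
            image: registry.example.com/colossalai:latest
```

Which fields are required is not stated here, so the set shown should be treated as illustrative rather than minimal.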

ColossalAIJobStatus

ColossalAIJobStatus describes the observed state of ColossalAIJob.

Appears in: - ColossalAIJob

Field Description
tasks Tasks array The statuses of individual tasks.
aggregate Aggregate The number of replicas in each phase.
phase JobPhase Provides a simple, high-level summary of where the Job is in its lifecycle. Note that this is NOT intended to be a comprehensive state machine.
conditions JobCondition array The latest available observations of an object's current state.

Launcher

Specification of the launcher replica.

Appears in: - ColossalAIJobSpec

Field Description
image string Container image name.
workingDir string Working directory of the launcher container. If not specified, the container runtime's default will be used, which might be configured in the container image. Cannot be updated.
env EnvVar array List of environment variables set for the container. Cannot be updated.
resources ResourceRequirements Compute Resources required by this container. Cannot be updated. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
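
As an illustration, a `launcher` block using all four fields could look like the fragment below. The image, directory, environment variable, and resource quantities are assumed values chosen for the example. The `env` and `resources` entries follow the standard Kubernetes EnvVar and ResourceRequirements shapes.

```yaml
# Fragment of spec; all values are placeholders.
launcher:
  image: registry.example.com/colossalai:latest   # placeholder image
  workingDir: /workspace                          # placeholder directory
  env:
    - name: EXAMPLE_VAR                           # hypothetical variable
      value: "1"
  resources:
    requests:
      cpu: "1"
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 2Gi
```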

RunPolicy

RunPolicy dictates specific actions to be taken by the controller upon job completion.

Appears in: - ColossalAIJobSpec

Field Description
cleanUpWorkers boolean Whether the controller cleans up the worker replicas upon job completion. Defaults to false.
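
A job that overrides the default would set the field as in this small fragment (a sketch of the placement under `spec`, not a complete manifest):

```yaml
# Fragment of spec.
runPolicy:
  cleanUpWorkers: true   # overrides the documented default of false
```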

SSHConfig

SSHConfig specifies various configurations for running the SSH daemon (sshd).

Appears in: - ColossalAIJobSpec

Field Description
authMountPath string The directory where SSH keys are mounted. Defaults to "/root/.ssh".
sshdPath string The location of the sshd executable file.
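
For example, the `ssh` block might be written as follows. The `sshdPath` value is an assumption about where the image installs sshd and should be checked against the actual container image.

```yaml
# Fragment of spec.
ssh:
  authMountPath: /root/.ssh    # documented default
  sshdPath: /usr/sbin/sshd     # assumed location; depends on the image
```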

Worker

Specification of the worker replicas.

Appears in: - ColossalAIJobSpec

Field Description
replicas integer Number of replicas to launch. Defaults to 1.
procPerWorker integer Number of processes to run on each worker. Defaults to 1.
command string array Specifies the command used to start the workers.
torchArgs string array Arguments passed to torchrun.
template PodTemplateSpec Pod template from which the worker replicas are created.
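
A `worker` block using these fields could be sketched as below. The script name, container name, image, and resource request are placeholders, and `torchArgs` is left empty because the set of torchrun arguments the controller accepts is not listed in this reference.

```yaml
# Fragment of spec; all values are placeholders.
worker:
  replicas: 4
  procPerWorker: 2
  command:
    - train.py                 # placeholder training script
  torchArgs: []                # extra torchrun arguments, if any
  template:
    spec:
      containers:
        - name: worker         # placeholder container name
          image: registry.example.com/colossalai:latest
          resources:
            requests:
              cpu: "4"
              memory: 8Gi
```

If each process is meant to drive one accelerator, `procPerWorker` would normally match the number of devices requested per worker pod. That pairing is an assumption about typical usage, not something this reference states.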