API Reference¶
Packages¶
batch.tensorstack.dev/v1beta1¶
Package v1beta1 contains API schema definitions for the batch v1beta1 API group.
Resource Types¶
ReplicaSpec¶
ReplicaSpec describes the spec of a replica.
Appears in: - TensorFlowTrainingJobSpec
Field | Description |
---|---|
type ReplicaType | ReplicaType is the type of the replica, one of "chief", "worker", "ps", or "evaluator". |
replicas integer | The desired number of replicas created for the current replica type. If unspecified, defaults to 1. |
template PodTemplateSpec | Describes the pod that will be created for this replica. Note that RestartPolicy in PodTemplateSpec will always be set to Never, as the job controller will create new pods if a restart is required. |
restartPolicy RestartPolicy | The restart policy for this replica, one of Always, OnFailure, Never, or ExitCode. |
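As a minimal sketch of how the fields above fit together, the following Python helper builds a ReplicaSpec as a plain dictionary and checks type and restartPolicy against the values listed in this reference. The helper name `make_replica_spec`, the container name, and the image are hypothetical and serve only as illustration. The template is assumed to follow the usual Kubernetes PodTemplateSpec layout.

```python
# Allowed values, taken from the ReplicaType and RestartPolicy descriptions.
REPLICA_TYPES = {"chief", "worker", "ps", "evaluator"}
RESTART_POLICIES = {"Always", "OnFailure", "Never", "ExitCode"}


def make_replica_spec(replica_type, image, restart_policy, replicas=1):
    """Build a ReplicaSpec dict (hypothetical helper, not part of any SDK)."""
    if replica_type not in REPLICA_TYPES:
        raise ValueError(f"unknown replica type: {replica_type!r}")
    if restart_policy not in RESTART_POLICIES:
        raise ValueError(f"unknown restart policy: {restart_policy!r}")
    return {
        "type": replica_type,
        # Mirrors the documented default of 1 when not given explicitly.
        "replicas": replicas,
        "restartPolicy": restart_policy,
        # PodTemplateSpec; the container name and image are placeholders.
        "template": {
            "spec": {
                "containers": [{"name": "tensorflow", "image": image}],
            }
        },
    }


worker = make_replica_spec("worker", "example.com/train:latest", "OnFailure", replicas=2)
```

Validating the enumerated strings on the client side catches typos such as a trailing space or wrong capitalization before the manifest is submitted.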
ReplicaType¶
Underlying type: string
ReplicaType is the type of the replica, one of "chief", "worker", "ps", or "evaluator".
Appears in: - ReplicaSpec
RestartPolicy¶
Underlying type: string
RestartPolicy describes how the replicas should be restarted. Can be one of: Always, OnFailure, Never, or ExitCode.
Appears in: - ReplicaSpec
RunPolicy¶
RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.
Appears in: - TensorFlowTrainingJobSpec
Field | Description |
---|---|
activeDeadlineSeconds integer | Specifies the duration in seconds, relative to the startTime, that the job may be active before the system tries to terminate it. The value must be a positive integer. |
backoffLimit integer | Optional. The number of retries before the job is marked as failed. |
cleanUpPolicy CleanUpPolicy | The policy for cleaning up tasks after the training job has finished. |
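The constraint on activeDeadlineSeconds can be checked before a job is submitted. The sketch below is a hypothetical client-side check, not part of the API. It also assumes that backoffLimit must be non-negative, which this reference does not state. cleanUpPolicy is left unchecked because its allowed values are not listed in this section.

```python
def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_run_policy(run_policy):
    """Return a list of problems found in a RunPolicy dict (empty if none)."""
    problems = []

    deadline = run_policy.get("activeDeadlineSeconds")
    if deadline is not None and not (_is_int(deadline) and deadline > 0):
        problems.append("activeDeadlineSeconds must be a positive integer")

    # Assumption: a negative retry count is not meaningful.
    backoff = run_policy.get("backoffLimit")
    if backoff is not None and not (_is_int(backoff) and backoff >= 0):
        problems.append("backoffLimit must be a non-negative integer")

    return problems


ok = validate_run_policy({"activeDeadlineSeconds": 3600, "backoffLimit": 3})
bad = validate_run_policy({"activeDeadlineSeconds": 0})
```

Here `ok` is an empty list, while `bad` contains one message about activeDeadlineSeconds.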
TensorFlowTrainingJob¶
TensorFlowTrainingJob is the Schema for the TensorFlowTrainingJob API.
Appears in: - TensorFlowTrainingJobList
Field | Description |
---|---|
apiVersion string | batch.tensorstack.dev/v1beta1 |
kind string | TensorFlowTrainingJob |
metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. |
spec TensorFlowTrainingJobSpec | |
status TensorFlowTrainingJobStatus | |
TensorFlowTrainingJobList¶
TensorFlowTrainingJobList contains a list of TensorFlowTrainingJob.
Field | Description |
---|---|
apiVersion string | batch.tensorstack.dev/v1beta1 |
kind string | TensorFlowTrainingJobList |
metadata ListMeta | Refer to Kubernetes API documentation for fields of metadata. |
items TensorFlowTrainingJob array | |
TensorFlowTrainingJobSpec¶
TensorFlowTrainingJobSpec outlines the intended configuration and execution parameters for a TensorFlowTrainingJob.
Appears in: - TensorFlowTrainingJob
Field | Description |
---|---|
replicaSpecs ReplicaSpec array | Describes the spec of the replicas of the job. |
runMode RunMode | The job's execution behavior. If omitted, defaults to Immediate mode, in which tasks are executed immediately upon submission. |
tensorboardSpec TensorBoardSpec | Describes the TensorBoard to be created for showing training logs. |
runPolicy RunPolicy | Execution policy configurations governing the behavior of the TensorFlowTrainingJob. |
scheduler SchedulePolicy | Identifies the preferred scheduler for allocating resources to replicas. Defaults to the cluster's default scheduler. |
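To show how the resource types in this reference nest, here is a minimal sketch that assembles a TensorFlowTrainingJob manifest as a Python dictionary and serializes it to JSON, which Kubernetes tooling such as kubectl accepts alongside YAML. The job name, container name, image, and numeric values are hypothetical. runMode, tensorboardSpec, and scheduler are omitted because their structures are not described in this section.

```python
import json


def replica(replica_type, replicas, restart_policy):
    # Placeholder pod template; the container name and image are illustrative only.
    return {
        "type": replica_type,
        "replicas": replicas,
        "restartPolicy": restart_policy,
        "template": {
            "spec": {
                "containers": [
                    {"name": "tensorflow", "image": "example.com/train:latest"}
                ]
            }
        },
    }


job = {
    "apiVersion": "batch.tensorstack.dev/v1beta1",
    "kind": "TensorFlowTrainingJob",
    "metadata": {"name": "example-job"},  # hypothetical name
    "spec": {
        "replicaSpecs": [
            replica("worker", 2, "OnFailure"),
            replica("ps", 1, "OnFailure"),
        ],
        "runPolicy": {"activeDeadlineSeconds": 3600, "backoffLimit": 3},
    },
}

manifest = json.dumps(job, indent=2)
print(manifest)
```

Omitted optional fields fall back to the defaults described in the table above, for example Immediate mode for runMode.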
TensorFlowTrainingJobStatus¶
TensorFlowTrainingJobStatus defines the observed state of TensorFlowTrainingJob.
Appears in: - TensorFlowTrainingJob
Field | Description |
---|---|
tasks Tasks array | The statuses of individual tasks. |
tensorboard DependentStatus | The status of the TensorBoard. |
backoffCount integer | The number of restarts that have been performed. |
aggregate Aggregate | |
conditions JobCondition array | Represents the latest available observations of a TensorFlowTrainingJob's current state. |
phase JobPhase | Phase is the phase-style status of the TensorFlowTrainingJob. |
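A client that has fetched a TensorFlowTrainingJob object can read these status fields defensively, since status may be absent or only partly populated on a newly created job. The following is a sketch under that assumption. The function name `summarize_status` is hypothetical, and the phase string in the sample is a placeholder, because the JobPhase values and JobCondition fields are not listed in this section.

```python
def summarize_status(job):
    """Extract a few top-level status fields from a job object given as a dict."""
    status = job.get("status") or {}
    return {
        "phase": status.get("phase"),
        # Treat a missing counter as zero restarts.
        "backoffCount": status.get("backoffCount", 0),
        "numTasks": len(status.get("tasks") or []),
        "numConditions": len(status.get("conditions") or []),
    }


# Sample object; "SomePhase" is a placeholder, and the conditions are left
# empty because their fields are not described here.
sample = {
    "kind": "TensorFlowTrainingJob",
    "status": {"phase": "SomePhase", "backoffCount": 1, "conditions": [{}, {}]},
}

summary = summarize_status(sample)
empty = summarize_status({"kind": "TensorFlowTrainingJob"})
```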