SageMaker Pipeline Job

The SageMaker Pipeline job type starts an Amazon SageMaker Pipeline execution via StartPipelineExecution, passing declared pipeline parameters as (Name, Value) pairs. The pipeline is identified by its name (the SageMaker Pipelines API is name-based) — Polysync stores that in the Job's External Id.

This job type is supported on the Amazon SageMaker platform.

Required job fields

  • External Id — the SageMaker pipeline name.
  • Job Type — SageMaker Pipeline (set automatically on import).

Optional job attributes

  • SageMaker Execution Display Name — friendly display name shown in the SageMaker console.
  • SageMaker Execution Description — free-text description for this execution.
  • SageMaker Max Parallel Execution Steps — integer ≥ 1, applied as ParallelismConfiguration.MaxParallelExecutionSteps.
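
As a rough illustration of how these attributes could map onto StartPipelineExecution arguments, here is a minimal sketch. The helper name optional_start_args and its parameter names are hypothetical and not part of Polysync; only the request field names (PipelineExecutionDisplayName, PipelineExecutionDescription, ParallelismConfiguration) come from the SageMaker API.

```python
from typing import Any, Dict, Optional


def optional_start_args(
    display_name: Optional[str] = None,
    description: Optional[str] = None,
    max_parallel_steps: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the optional StartPipelineExecution arguments (hypothetical helper)."""
    args: Dict[str, Any] = {}
    if display_name:
        args["PipelineExecutionDisplayName"] = display_name
    if description:
        args["PipelineExecutionDescription"] = description
    if max_parallel_steps is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(max_parallel_steps, int) and not isinstance(max_parallel_steps, bool)
        if not is_int or max_parallel_steps < 1:
            raise ValueError("SageMaker Max Parallel Execution Steps must be an integer >= 1")
        args["ParallelismConfiguration"] = {"MaxParallelExecutionSteps": max_parallel_steps}
    return args
```

Attributes that are left unset are simply omitted from the request, so SageMaker applies its own defaults.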

Job discovery

Polysync calls ListPipelines (paginated, MaxResults=100), then DescribePipeline for each pipeline to capture its description and display name. If the pipeline definition is inlined in the response (PipelineDefinition), Polysync parses its Parameters array and imports the declared parameter names as suggested inputs. If the definition lives in S3 (PipelineDefinitionS3Location) or cannot be parsed, parameter import is silently skipped.
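
The discovery pass described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Polysync's importer: the function names and the returned dictionary keys are hypothetical, and client is assumed to behave like a boto3 SageMaker client (for example boto3.client("sagemaker")).

```python
import json
from typing import Any, Dict, Iterator, List


def declared_parameter_names(detail: Dict[str, Any]) -> List[str]:
    """Return parameter names from an inlined PipelineDefinition, or [] if unavailable."""
    raw = detail.get("PipelineDefinition")
    if not raw:
        return []  # definition not inlined: skip parameter import
    try:
        definition = json.loads(raw)
        return [p["Name"] for p in definition.get("Parameters", []) if "Name" in p]
    except (ValueError, TypeError, AttributeError):
        return []  # unparseable or unexpected shape: skip silently


def discover_pipelines(client: Any) -> Iterator[Dict[str, Any]]:
    """Yield one record per pipeline, following NextToken pagination."""
    kwargs: Dict[str, Any] = {"MaxResults": 100}
    while True:
        page = client.list_pipelines(**kwargs)
        for summary in page.get("PipelineSummaries", []):
            name = summary["PipelineName"]
            detail = client.describe_pipeline(PipelineName=name)
            yield {
                "external_id": name,  # hypothetical key names
                "display_name": detail.get("PipelineDisplayName"),
                "description": detail.get("PipelineDescription"),
                "suggested_inputs": declared_parameter_names(detail),
            }
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
```

Passing the client in as an argument keeps the sketch independent of credentials and makes it easy to exercise with a stub.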

Parameter handling

Input and Input&Output parameters are sent as PipelineParameters (Name, Value) pairs:

{
  "PipelineName": "<external-id>",
  "PipelineParameters": [
    { "Name": "<param-1>", "Value": "<value-as-string>" },
    { "Name": "<param-2>", "Value": "<value-as-string>" }
  ],
  "ClientRequestToken": "polysync-<guid>"
}

Direction       Sent in PipelineParameters   Updated from response
Input           Yes                          No
Output          No                           No (not supported)
Input&Output    Yes                          No (not supported)

All values are sent as strings — SageMaker resolves the declared parameter type (String / Integer / Float / Boolean) server-side based on the pipeline definition. Mismatches raise ValidationException at start time.
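
A minimal sketch of how such a request body might be assembled is shown below. The function build_start_request and the (name, direction, value) tuple layout are assumptions made for illustration. Values are converted with Python's str(), which may not match the string form SageMaker expects for every declared type (Boolean parameters in particular), so treat the conversion as a placeholder.

```python
import uuid
from typing import Any, Dict, Iterable, Tuple

# Directions whose values are sent in PipelineParameters.
SENT_DIRECTIONS = {"Input", "Input&Output"}


def build_start_request(
    external_id: str,
    parameters: Iterable[Tuple[str, str, Any]],
) -> Dict[str, Any]:
    """Build a StartPipelineExecution request from (name, direction, value) tuples."""
    pairs = [
        {"Name": name, "Value": str(value)}  # every value is sent as a string
        for name, direction, value in parameters
        if direction in SENT_DIRECTIONS  # Output parameters are never sent
    ]
    return {
        "PipelineName": external_id,
        "PipelineParameters": pairs,
        "ClientRequestToken": f"polysync-{uuid.uuid4()}",
    }
```

For example, build_start_request("churn", [("Epochs", "Input", 10)]) produces a single pair whose Value is the string "10"; SageMaker then checks it against the declared Integer type.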

Output parameters are not supported. DescribePipelineExecution returns only status and timing, so no output values can be read back from the response. Per-step artefacts (trained models, evaluation metrics, processing outputs) land in S3 and the Model Registry.

Execution flow

  1. Polysync calls StartPipelineExecution(PipelineName, PipelineParameters, ClientRequestToken, …), optionally with a display name, description, and ParallelismConfiguration.

  2. The RunId is the raw PipelineExecutionArn returned by StartPipelineExecution.

  3. Status is polled via DescribePipelineExecution(PipelineExecutionArn):

    SageMaker PipelineExecutionStatus    Polysync status
    Executing / Stopping                 Running
    Succeeded                            Success
    Failed                               Failed
    Stopped                              Cancelled
    (other / null)                       Unknown

    FailureReason, when present, is surfaced on the run message; otherwise PipelineExecutionDescription is used.

  4. Cancel is supported via StopPipelineExecution(PipelineExecutionArn, ClientRequestToken="polysync-cancel-<guid>").
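
The four steps above can be sketched as a small start, poll, and cancel loop. This is an illustration only: the function names, the 30-second poll interval, and the choice to stop polling on any status other than Running (including Unknown) are assumptions, and client is assumed to behave like a boto3 SageMaker client.

```python
import time
import uuid
from typing import Any, Dict, Tuple

# SageMaker PipelineExecutionStatus -> Polysync status (from the table above).
STATUS_MAP = {
    "Executing": "Running",
    "Stopping": "Running",
    "Succeeded": "Success",
    "Failed": "Failed",
    "Stopped": "Cancelled",
}


def read_status(client: Any, run_id: str) -> Tuple[str, str]:
    """Return (Polysync status, run message) for one execution."""
    resp = client.describe_pipeline_execution(PipelineExecutionArn=run_id)
    status = STATUS_MAP.get(resp.get("PipelineExecutionStatus"), "Unknown")
    # Prefer FailureReason; fall back to the execution description.
    message = resp.get("FailureReason") or resp.get("PipelineExecutionDescription") or ""
    return status, message


def run_to_completion(
    client: Any, start_request: Dict[str, Any], poll_seconds: float = 30.0
) -> Tuple[str, str, str]:
    """Start an execution and poll until it is no longer Running."""
    run_id = client.start_pipeline_execution(**start_request)["PipelineExecutionArn"]
    while True:
        status, message = read_status(client, run_id)
        if status != "Running":  # assumption: Unknown also ends polling
            return run_id, status, message
        time.sleep(poll_seconds)


def cancel(client: Any, run_id: str) -> None:
    """Request a stop; the execution then reports Stopping, later Stopped."""
    client.stop_pipeline_execution(
        PipelineExecutionArn=run_id,
        ClientRequestToken=f"polysync-cancel-{uuid.uuid4()}",
    )
```

Because Stopping maps to Running, a cancelled run keeps polling until SageMaker reports Stopped, at which point it surfaces as Cancelled.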

Monitor URL

https://{region}.console.aws.amazon.com/sagemaker/home?region={region}
  #/pipelines/{pipelineName}/executions/{executionArn}
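
For illustration, the template can be filled in with plain string substitution. The sketch assumes the two template lines join without whitespace and that the execution ARN is inserted as-is, without URL encoding; neither detail is confirmed here, and monitor_url is a hypothetical helper name.

```python
def monitor_url(region: str, pipeline_name: str, execution_arn: str) -> str:
    """Fill in the Monitor URL template (assumes no URL encoding is applied)."""
    return (
        f"https://{region}.console.aws.amazon.com/sagemaker/home?region={region}"
        f"#/pipelines/{pipeline_name}/executions/{execution_arn}"
    )
```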

Best practices

  • Declare every pipeline parameter explicitly in the pipeline definition's Parameters block — that lets Polysync's importer seed the parameter schema automatically.
  • Wrap SageMaker training, processing, transform, and tuning jobs inside pipeline steps (TrainingStep, ProcessingStep, …) rather than invoking them separately. Pipelines are the AWS-recommended orchestration surface and the only SageMaker resource Polysync currently exposes.
  • Use SageMaker Max Parallel Execution Steps to cap concurrency when fan-out steps would otherwise saturate quotas.

Troubleshooting

  • AccessDeniedException on StartPipelineExecution — the caller is missing sagemaker:StartPipelineExecution on the pipeline ARN, or iam:PassRole on the execution role (when one is supplied at start time).
  • ValidationException on parameter type — the supplied value doesn't match the declared type in the pipeline definition.
  • ResourceLimitExceeded — a SageMaker quota (concurrent executions, training jobs, endpoints) has been hit. Request a quota increase from AWS Service Quotas.