AWS Batch Job

The AWS Batch Job job type submits a job to an AWS Batch queue via SubmitJob, passing Polysync parameters that AWS Batch substitutes into Ref::<name> placeholders in the registered job definition's command and environment variables. The job definition is identified by its name — Polysync stores that in the Job's External Id, and Batch auto-resolves to the latest ACTIVE revision at submit time. To pin a specific revision, set the External Id to <name>:<revision> or the full ARN.

This job type is supported on the AWS Batch platform.

Required job fields

  • External Id — the Batch job definition name (bare name, or <name>:<revision>, or the full ARN).
  • Job Type — AWS Batch Job (set automatically on import).
  • Aws Batch Job Queue (job attribute) — required if the platform's Default Job Queue is not set; otherwise overrides it.

Optional job attributes

  • Aws Batch Share Identifier — fair-share bucket tag.
  • Aws Batch Scheduling Priority Override — integer; higher means sooner inside the fair-share group.
  • Aws Batch Attempt Duration Seconds — must be ≥ 60.
  • Aws Batch Retry Attempts — must be 1–10.
  • Aws Batch Array Size — must be 2–10000; enables array job mode.
  • Aws Batch Tags — JSON object with string values; applied as Batch tags.
  • Aws Batch Propagate Tags — bool; when true, tags propagate to ECS / EKS resources.
  • Aws Batch Depends On — JSON array of {jobId, type} objects; types are SEQUENTIAL or N_TO_N.

Container resource overrides (vCPUs, Memory, GPUs, command, environment variables) are intentionally not exposed by Polysync — define them on the job definition.
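
The ranges and JSON shapes listed for the optional attributes can be checked before submission. The following is a minimal sketch of such a check, not Polysync's actual validation: the function name validate_attributes and the idea of passing attributes as a dictionary keyed by attribute name are assumptions made for illustration.

```python
import json

ALLOWED_DEPENDENCY_TYPES = {"SEQUENTIAL", "N_TO_N"}


def validate_attributes(attrs):
    """Return a list of problems found; an empty list means the attributes look valid.

    `attrs` is a hypothetical dict keyed by the attribute names used in this document.
    """
    problems = []

    duration = attrs.get("Aws Batch Attempt Duration Seconds")
    if duration is not None and int(duration) < 60:
        problems.append("Attempt Duration Seconds must be >= 60")

    retries = attrs.get("Aws Batch Retry Attempts")
    if retries is not None and not 1 <= int(retries) <= 10:
        problems.append("Retry Attempts must be 1-10")

    size = attrs.get("Aws Batch Array Size")
    if size is not None and not 2 <= int(size) <= 10000:
        problems.append("Array Size must be 2-10000")

    tags = attrs.get("Aws Batch Tags")
    if tags is not None:
        parsed = json.loads(tags)
        if not isinstance(parsed, dict) or not all(
            isinstance(value, str) for value in parsed.values()
        ):
            problems.append("Tags must be a JSON object with string values")

    depends_on = attrs.get("Aws Batch Depends On")
    if depends_on is not None:
        parsed = json.loads(depends_on)
        if not isinstance(parsed, list):
            problems.append("Depends On must be a JSON array")
        else:
            for dep in parsed:
                # Only the dependency type is checked here; other fields pass through.
                if not isinstance(dep, dict) or (
                    "type" in dep and dep["type"] not in ALLOWED_DEPENDENCY_TYPES
                ):
                    problems.append("Depends On entries must use SEQUENTIAL or N_TO_N")
                    break

    return problems


print(validate_attributes({"Aws Batch Retry Attempts": 11}))
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid attribute at once.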

Job discovery

Polysync discovers job definitions by calling DescribeJobDefinitions(Status="ACTIVE", MaxResults=100), paginating via NextToken. The API returns every active revision, so Polysync deduplicates by name and keeps the highest revision per name. Default parameters declared on the definition (the Parameters field, which supplies default values for Ref:: placeholders) are imported as Polysync input parameters.
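
The deduplication step can be illustrated with a short sketch. It assumes the definitions have already been fetched and paginated, and that each entry carries the jobDefinitionName, revision, and parameters fields of the AWS API response. The function names latest_revisions and input_parameters are hypothetical and do not describe Polysync's internals.

```python
def latest_revisions(job_definitions):
    """Keep only the highest-revision entry for each job definition name."""
    latest = {}
    for definition in job_definitions:
        name = definition["jobDefinitionName"]
        if name not in latest or definition["revision"] > latest[name]["revision"]:
            latest[name] = definition
    return latest


def input_parameters(job_definition):
    """Return the definition's default parameters (name -> default value)."""
    return dict(job_definition.get("parameters") or {})


# Sample data standing in for the combined pages of a DescribeJobDefinitions call.
sample = [
    {"jobDefinitionName": "etl", "revision": 1, "parameters": {"date": "today"}},
    {"jobDefinitionName": "etl", "revision": 3, "parameters": {"date": "today", "mode": "full"}},
    {"jobDefinitionName": "report", "revision": 2},
    {"jobDefinitionName": "etl", "revision": 2},
]

deduped = latest_revisions(sample)
print(sorted((name, d["revision"]) for name, d in deduped.items()))
# → [('etl', 3), ('report', 2)]
```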

Parameter handling

Input and Input&Output parameters are sent as a flat string dictionary in the Parameters field of SubmitJob:

{
  "JobName": "Polysync-<guid>",
  "JobDefinition": "<external-id>",
  "JobQueue": "<resolved-queue>",
  "Parameters": {
    "<param-1>": "<value-as-string>",
    "<param-2>": "<value-as-string>"
  }
}

Direction      Sent in Parameters   Updated from response
Input          Yes                  No
Output         No                   (not supported)
Input&Output   Yes                  (not supported)

All values are strings — Batch substitutes them into Ref::<name> placeholders in the container command and environment at launch time.

Output parameters are not supported. DescribeJobs returns status, status reason, and the container exit code only. Persist outputs to S3 / EFS / CloudWatch Logs.
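
As a rough illustration of the request shape above, the sketch below filters parameters by direction and converts every value to a string. The helper name build_submit_request, the (name, direction, value) tuple layout, and the use of str() for conversion are assumptions. Polysync's real string formatting (for booleans or dates, for example) may differ. Field names use the same casing as the JSON shown above.

```python
import uuid

# Directions whose values are sent to Batch, per the table above.
SENT_DIRECTIONS = {"Input", "Input&Output"}


def build_submit_request(external_id, queue, params):
    """Build a SubmitJob-style request dict.

    `params` is a hypothetical iterable of (name, direction, value) tuples.
    """
    parameters = {
        name: str(value)  # assumption: plain str() conversion
        for name, direction, value in params
        if direction in SENT_DIRECTIONS
    }
    return {
        "JobName": f"Polysync-{uuid.uuid4()}",
        "JobDefinition": external_id,
        "JobQueue": queue,
        "Parameters": parameters,
    }


request = build_submit_request(
    "etl:3",
    "analytics-queue",
    [
        ("date", "Input", "2024-01-01"),
        ("rows", "Input&Output", 42),
        ("result", "Output", None),  # Output parameters are never sent
    ],
)
print(request["Parameters"])
# → {'date': '2024-01-01', 'rows': '42'}
```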

Execution flow

  1. Polysync resolves the effective queue (job attribute → platform default; error if neither set), generates a unique JobName (Polysync-<guid>, ≤ 128 chars), assembles parameters / tags / depends-on / retry / array configuration, and calls SubmitJob.

  2. The RunId is the raw JobId.

  3. Status is polled via DescribeJobs(Jobs=[JobId]):

    Batch JobStatus                                       Polysync status
    SUBMITTED / PENDING / RUNNABLE / STARTING / RUNNING   Running
    SUCCEEDED                                             Success
    FAILED + cancellation StatusReason                    Cancelled
    FAILED (any other reason)                             Failed
    (other / null)                                        Unknown

    Cancellation is detected by inspecting StatusReason for "Cancelled", "Canceled", "Terminated", "Job terminated", or the Polysync cancel reason. The message text is the StatusReason (or Container exit code: {n} if available).

  4. Cancel is supported via TerminateJob(JobId, Reason="Cancelled by Polysync") — works on both queued (SUBMITTED / PENDING / RUNNABLE) and running (STARTING / RUNNING) jobs.
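
The queue resolution in step 1 and the status mapping in step 3 can be sketched as two small functions. This is an illustration under stated assumptions, not Polysync's code: the function names are hypothetical, and the case-insensitive substring match on StatusReason is a guess at how the listed cancellation markers are compared.

```python
RUNNING_STATES = {"SUBMITTED", "PENDING", "RUNNABLE", "STARTING", "RUNNING"}

# Lower-cased markers; "Job terminated" and "Cancelled by Polysync" are covered
# by the "terminated" and "cancelled" substrings respectively.
CANCEL_MARKERS = ("cancelled", "canceled", "terminated")


def resolve_queue(job_attribute, platform_default):
    """Job attribute wins; fall back to the platform default; error if neither is set."""
    queue = job_attribute or platform_default
    if not queue:
        raise ValueError(
            "No job queue: set Aws Batch Job Queue or the platform's Default Job Queue"
        )
    return queue


def map_status(batch_status, status_reason=None):
    """Translate a Batch JobStatus (plus StatusReason) into a Polysync status."""
    if batch_status in RUNNING_STATES:
        return "Running"
    if batch_status == "SUCCEEDED":
        return "Success"
    if batch_status == "FAILED":
        reason = (status_reason or "").lower()
        if any(marker in reason for marker in CANCEL_MARKERS):
            return "Cancelled"
        return "Failed"
    return "Unknown"


print(map_status("RUNNABLE"))                         # → Running
print(map_status("FAILED", "Cancelled by Polysync"))  # → Cancelled
print(map_status("FAILED", "Essential container in task exited"))  # → Failed
```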

Monitor URL

https://{region}.console.aws.amazon.com/batch/home?region={region}#jobs/detail/{jobId}
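
For reference, filling in the template is a plain string substitution. The helper name monitor_url is hypothetical, and the sample region and job ID are made up.

```python
def monitor_url(region, job_id):
    """Fill the console URL template with a region and a Batch JobId."""
    return (
        f"https://{region}.console.aws.amazon.com/batch/home"
        f"?region={region}#jobs/detail/{job_id}"
    )


print(monitor_url("us-east-1", "1234abcd"))
# → https://us-east-1.console.aws.amazon.com/batch/home?region=us-east-1#jobs/detail/1234abcd
```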

Fair-share scheduling

When the target queue uses a fair-share policy, populate Aws Batch Share Identifier (required by the queue) and optionally Aws Batch Scheduling Priority Override. For non-fair-share queues, leave both blank.
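
A minimal sketch of how the two attributes might translate into SubmitJob fields follows. The helper name fair_share_fields is hypothetical, and dropping the priority override when no share identifier is set is an assumption made here to keep non-fair-share submissions clean. It is not documented Polysync behaviour.

```python
def fair_share_fields(share_identifier=None, priority_override=None):
    """Return the fair-share portion of a SubmitJob request, or {} if unused."""
    fields = {}
    if share_identifier:
        fields["ShareIdentifier"] = share_identifier
        if priority_override is not None:
            fields["SchedulingPriorityOverride"] = int(priority_override)
    # Assumption: without a share identifier, both fields are omitted entirely.
    return fields


print(fair_share_fields("team-a", 10))
# → {'ShareIdentifier': 'team-a', 'SchedulingPriorityOverride': 10}
print(fair_share_fields())
# → {}
```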

Array jobs

Set Aws Batch Array Size (2–10000) to submit an array job. AWS Batch tracks the array as a single parent job — Polysync's run status reflects the parent's aggregate state; per-child status is not yet exposed.
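
Because per-child status is not exposed, it can help to know how children are addressed. AWS Batch gives each child the ID <parentJobId>:<index>, with indexes starting at 0. The sketch below builds the array portion of a request and lists the child IDs. The helper names are hypothetical, and the ArrayProperties field shape follows the request casing used elsewhere in this document.

```python
def array_properties(array_size=None):
    """Return the array portion of a SubmitJob request, or {} for a regular job."""
    if array_size is None:
        return {}
    size = int(array_size)
    if not 2 <= size <= 10000:
        raise ValueError("Aws Batch Array Size must be 2-10000")
    return {"ArrayProperties": {"Size": size}}


def child_job_ids(parent_job_id, size):
    """List child job IDs, which AWS Batch forms as '<parentJobId>:<index>'."""
    return [f"{parent_job_id}:{index}" for index in range(size)]


print(array_properties(3))
# → {'ArrayProperties': {'Size': 3}}
print(child_job_ids("1234abcd", 3))
# → ['1234abcd:0', '1234abcd:1', '1234abcd:2']
```

The child IDs can be looked up directly in the AWS Batch console or API when the parent's aggregate state is not enough to diagnose a failure.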

Best practices

  • Pin a revision on the External Id (<name>:<revision> or full ARN) when running production workloads that need reproducibility.
  • Use Aws Batch Attempt Duration Seconds for runaway-job protection; Aws Batch Retry Attempts for transient-failure retries.
  • For job DAGs spanning multiple Batch jobs, use Polysync Task dependencies rather than Aws Batch Depends On — Polysync only surfaces Batch job IDs after submit, so cross-Batch chains are awkward to wire through the attribute.

Troubleshooting

  • ClientException: Queue does not exist — the queue name on the job attribute / platform default doesn't exist in this region.
  • ClientException: Job definition X does not exist — confirm the name and that at least one revision has Status=ACTIVE.
  • Jobs stuck in RUNNABLE — the compute environment can't satisfy resource requirements. Check vCPU quotas (EC2 / Fargate) and the environment's instance types.
  • Immediate FAILED with Essential container in task exited — the container failed at startup. The CloudWatch Logs stream is linked from the job detail page; common causes are missing Ref::X substitutions, missing IAM permissions, or a bad container image.